问题:本地数据库直接复制了原服务器上的数据文件,包含一个host表,但在本地mysql中,select显示结果总是乱码,而用php调用确可正常显示.而在本地新建的表(utf8),确能在utf8编码环境下的mysql终端显示,却无法在php中正常显示,通过在php中的mysql_connect()后,加入mysql_query("set names utf8")后,可正常显示,但host表又不可显示.
show create database 和show create table显示都是使用utf8编码.
原来服务器上的配置文件:
[mysqld]
character_set_server=ascii
我的机子上的配置文件:
[mysqld]
default-character-set = utf8
原来服务器上:
mysql> show variables like "character%";
+--------------------------+----------------------------------------+
| Variable_name | Value |
+--------------------------+----------------------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | gbk |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | gbk |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql/share/mysql/charsets/ |
+--------------------------+----------------------------------------+
我的机子上:
mysql> show variables like "character%";
+--------------------------+----------------------------------------+
| Variable_name | Value |
+--------------------------+----------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql/share/mysql/charsets/ |
+--------------------------+----------------------------------------+
8 rows in set (0.01 sec)
几个关键的变量含义如下:
(1)character_set_server:The server character set and collation are
used as default values if the database character set and collation
are not specified in CREATE DATABASE statements. They have no other purpose.
(2)character_set_database:The database character set and collation
are used as default values if the table character set and
collation are not specified in CREATE
TABLE statements. The database character set also is used by LOAD DATA INFILE. The character set and collation have no other purposes.
• What character set is the statement in when it leaves the client?
The server takes the character_set_client system variable to be the character set in which statements are sent by the client.
• What character set should the server translate a statement to after receiving it?
For this, the server uses the character_set_connection and collation_connection system variables. It converts
statements sent by the client from character_set_client to character_set_connection (except for string literals
that have an introducer such as _latin1 or _utf8). collation_connection is important for comparisons of literal strings.
For comparisons of strings with column values, collation_connection does not matter because columns have their own col-
lation, which has a higher collation precedence.
• What character set should the server translate to before shipping result sets or error messages back to the client?
The character_set_results system variable indicates the character set in which the server returns query results to the client.
This includes result data such as column values, and result metadata such as column names.
mysql根据以上变量,自动执行的编码转换(注意:此处转换只是按新的编码方式将原文字编码成新字符,并非修改编码):
(1)mysql接收查询语句时,如果有必要的话,自动执行将查询语句从character_set_client转换成character_set_connection编码.
(2)如果是insert语句,如果有必要的话,则还要根据表中定义的编码类型,完成一次转换,存入数据库中.
(3)返回结果时,如果有必要的话,完成表中编码类型向character_set_results编码类型的转换.
此处,"有必要"是指两端的两种编码方式不同.在使用php来和mysql通信的话,character_set_client只是起指示作用,并不一定保证php传过来的查询语句就是该编码.其正确性应由用户来保证.
问题应该是这样的:::
在原来的服务器上:
php发送查询语句(utf8格式)给mysql,
(1)由于character_set_client和character_set_connection都使用的是latin1编码,所以不用转换;
(2)如果是insert语句,由于表中数据为utf8编码,character_set_connection为latin1编码,数据(utf8)经历一次从latin1到utf8的转换,再存入!
(3)返回结果时,由于表中数据为utf8编码,character_set_results为latin1编码,数据(latin1_to_utf8(utf8))再经历一次从utf8->latin1的转换.最终结果还是utf8!
注意:(1)单纯的utf8->latin时,由于utf8"兼容"所有的字符集,故utf8中有些字符可能latin1中没有,这种转换可能导致信息丢失.但latin1->utf8不会.本例(1)中,utf8数据被错误的当成latin1被转换时,也可能出现某些字段无法对应latin1字符集,可能出错,所以虽然以上的方式一般显示的结果是正确的,但我觉得还是有必要将其改正过来.
(2)"For comparisons of strings with column values, collation_connection does not matter because columns have their own col-
lation, which has a higher collation precedence."--说明以上的转换不会影响对列值的匹配.因为列值的匹配有它自己的匹配规则,无论开始被转换成什么乱码,最终都会被转换成和列值同样的编码,再进行匹配(再次强调,编码转换只是改变解析方式,并不改变编码;同样的编码,不同的解析方式,被认为是不匹配的).
在我的机子上:
我机子上的表是复制服务器上的表,所以表里面的数据仍然是latin_to_utf8(utf8)格式的.但我的character set都设成了utf8.所以整个查询过程中,都不会进行编码转换,最终输出的就是latin_to_utf8(utf8)--乱码!无论你将终端界面的字符编码怎么设置,终会是乱码,只不过可能是不同形式的乱码而已.
另外,还有一个值得注意的地方,必须设置相应的collation_database和collation_server,否则在select查询时匹配每个列中的中文时,会显示空集.
原有服务器:
+----------------------+-------------------+
| Variable_name | Value |
+----------------------+-------------------+
| collation_connection | latin1_swedish_ci |
| collation_database | gbk_chinese_ci |
| collation_server | gbk_chinese_ci |
+----------------------+-------------------+
通过set @@global. collation_database=utf8_general_ci
set @@global. collation_server=utf8_general_ci,
重新登陆mysql可见,不加@@global.只对当前session有效.
最终在我的机子上:
mysql> show variables like "collation%";
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_general_ci |
+----------------------+-----------------+
好了,原理都弄清了,解决办法也就浮出水面了:
(1)保持原有mysql变量,set names latin1;
(2)mysqldump -uroot -p --default-character-set=latin1 hcdata host > /mysql/a.sql;
(3)更正环境变量:set names utf8;
(4)备份原有host表alter table host rename host_backup
(5)导入a.sql:进入mysql后,执行命令source /mysql/a.sql
或直接 mysql -uroot -p hcdata </mysql/a.sql
下面这篇文章都以上几个参数做了几个试验,很不错.
http://chenzs19850728.blog.163.com/blog/static/7629609200868104214954/
Thursday, March 12, 2009
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment