简明现代魔法 -> 数据库技术 -> 中文乱码显示问题,MySQL数据库字符集的原理和常见问题

中文乱码显示问题,MySQL数据库字符集的原理和常见问题

2009-08-21

 

Several character set and collation system variables relate to a client's interaction with the server. Some of these have been mentioned in earlier sections: 
一些字符集和collation,是用户client's对Server的相互作用。有些在前面已经提及了:

The server character set and collation can be determined from the values of the character_set_server and collation_server system variables.
服务器(server)的字符集和排序(collation)是被 character_set_server和 collation_server这两个系统变量system variables的值決定

The character set and collation of the default database can be determined from the values of thecharacter_set_database and collation_database system variables. 
数据库的默认(default) 字符集(character se)t和 排序(collation) 是被 character_set_database和 collation_database这两个系统变量system variables的值決定

Additional character set and collation system variables are involved in handling traffic for the connection between a client and the server. Every client has connection-related character set and collation system variables.
¯`'•.¸(•¿•).•:*¨¨*:•.(°_°)~*~get it ??? ~*~(°_°).•:*¨¨*:•.(•¿•)¸.•'´¯

Consider what a “connection” is: It's what you make when you connect to the server. The client sends SQL statements, such as queries, over the connection to the server. The server sends responses, such as result sets, over the connection back to the client. 
This leads to several questions about character set and collation handling for client connections, each of which can be answered in terms of system variables:
想想"连接connection"是什么:是你连到服务器Server时作的事情。客户端Client通过这条连接发送SQL语句(statements),比如查询(queries),通过 connection到server, 服务器则送回回应,像是结果集合 result sets, 通过连接给客户端client,
这导致了客户端连接client connections处理字符集和collation的一些问题,它们每个都可以按照系统变量system variables来回答:

For Example : system variables指的就是 X
SET character_set_client = x;
SET character_set_results = x;
SET character_set_connection = x;

What character set is the statement in when it leaves the client?
当查询 离开客户端时 是什么字符集?
The server takes the character_set_client system variable to be the character set in which statements are sent by the client.
服务器读取character_set_client 这个变量variable, 来作为客户端发送查询所用的字符集。


 

MySQL Server预期接收到的数据是 latin1
但Client( Bigfat) 端藉着php传来的资料却是gb语系
例如 :Client( Bigfat)送出一些中文资料, 却不使用character_set_client告诉MySQL Server 这送来的资料是gb2312或big5语系,结果 当然是乱码,
因为MySQL Server字符集的默认值default=latin1,
造成乱码的问题很多 光看这些 MySQL 中文參考手冊的翻譯, 还不足以解决你的问题,我尽可能多用例子說明
首先 你应该对电脑的编码方式 有清楚的概念

What character set should the server translate a statement to after receiving it?
服务器端在接收到了查询以后, 应该把翻译translate到什么字符集里?
For this, the server uses the character_set_connection and collation_connection system variables. It converts statements sent by the client from character_set_client to character_set_connection (except for string literals that have an introducer such as _latin1 or _utf8).
对于这个,服务器用的是character_set_connection 和 collation_connection 这两个变量variables。它把客户端送来的查询 从 character_set_client转换成 character_set_connection(除了latin1或者utf8 的字符串)
collation_connection is important for comparisons of literal strings. For comparisons of strings with column values, collation_connection does not matter because columns have their own collation, which has a higher collation precedence.
collation_connection 对于比较字符串literal strings非常重要,对于 列columns值比较字符串是没有关系does not matter,因为列columns有自己的collation(拥有更高优先)

mysql4.1.x及以后都增加了的语言编码自动转换功能,它把客户端送来的查询 从 character_set_client转换成
character_set_connection
光看文字的表面 我也搞不太清楚MySQL转换的机制,为什麽要转换 ?
MySQL数据库只是储存空间,放进去的是什么,那么拿出来的 就该是什么。
转换 这样不是容易出现乱码 ?
MySQL 4.0没有character sett字符集and排序(collation) 数据储存在表 table,
也没有support字符集的转换(converting).
要转换character set需要更改server's character set,然後使用外部工具去转换data.
MySQL 4.0升级到MySQL 4.1 由于大部分用户使用的是latin语系问题不大
但导致中文语系问题的出现和复杂化,

有了新规定character_set_client所以 MySQL 4.1 要求客户端必须指定这个字符集,
若你都不说(keep silence就是默认), MySQL就会按照默认值latin1 处理,
MySQL再问你: character_set_connection转换成什麽 ?
若你说:I have right to keep silence, MySQL就会把你的 "东东" ,转换为数据库默认的 "东西"
可能你存进MySQL是Big5的10000元, 出来变成latin1_swedish的10000元
如果 你不希望发生这种事, 可以去看看原作者的解释, 或继续看下去 明白下面的解释
Jim Winstead that lead web developer with MySQL AB

What character set should the server translate to before shipping result sets or error messages back to the client?
服务器要送回结果result集合或者错误信息给客户端时 应该用什么字符集?
The character_set_results system variable indicates the character set in which the server returns query results to the client. This includes result data such as column values, and result metadata such as column names.
character_set_results变量指示了送回给客户端 结果result集合的字符集,这包括了列column值,或者列column名 等结果数据result data
You can fine-tune the settings for these variables, or you can depend on the defaults (in which case, you can skip the rest of this section).
你可以调整这些变量的值,或者就使用默认的(那样你就可以跳过以下部分)
There are two statements that affect the connection character sets:
有两个语句statements作用(affect)连接字符集设置:
SET NAMES 'charset_name'
SET CHARACTER SET charset_name

在保存数据的时候,MySQL并不清楚网站提交的字符编码是gb2312big5,若你在发出信息前 不使用SET NAMES 'big5' MySQL保存时就不知道用什么编码来保存数据,因此产生了乱码!

SET NAMES indicates what character set the client will use to send SQL statements to the server.
Thus, SET NAMES 'cp1251' tells the server “future incoming messages from this client are in character set cp1251.”
It also specifies the character set that the server should use for sending results back to the client. (For example, it indicates what character set to use for column values if you use a SELECT statement.)
SET NAMES 指示:客户端送到server的SQL语句里是什么character set。
因此,SET NAMES 'cp1251' 就告诉server "将來future incoming的信息, 从这个客户端送来的信息将是使用'cp1251'这个字符集。
这也指定了server送回的结果所用的字符集,(例如 如果你用了一个select语句 它会指出列值column values 拥有的字符集)
A SET NAMES 'x' statement is equivalent to these three statements:
SET NAMES 'x'语句相当于下面三个语句:
SET character_set_client = x;
SET character_set_results = x;
SET character_set_connection = x;

Setting character_set_connection to x also sets collation_connection to the default collation for x.
SET CHARACTER SET is similar to SET NAMES but sets the connection character set and collation to be those of the default database.
但connection character set and collation 是转换成默认值, 当然是 瑞典文 latin1_swedish_ci
A SET CHARACTER SET x statement is equivalent to these three statements:
SET CHARACTER SET x语句相当于这三个语句:
SET character_set_client = x;
SET character_set_results = x;
SET collation_connection = @@collation_database;
SET CHARACTER SET is similar类似 to SET NAMES,
比较上面6句 红色statements 只有以下2句不同:
SET character_set_connection = x;
SET collation_connection = @@collation_database;

 

叁考下面例子:
mysql> SET NAMES 'gbk' ;
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW VARIABLES LIKE '%collation%' ;
+----------------------+-------------------+
| Variable_name | Value |
+----------------------+-------------------+
| collation_connection | gbk_chinese_ci |
| collation_database | latin1_swedish_ci |
| collation_server | latin1_swedish_ci |
+----------------------+-------------------+
3 rows in set (0.02 sec)
mysql> SET CHARACTER SET gbk ;
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW VARIABLES LIKE '%collation%' ;
+----------------------+-------------------+
| Variable_name | Value |
+----------------------+-------------------+
| collation_connection | latin1_swedish_ci |
| collation_database | latin1_swedish_ci |
| collation_server | latin1_swedish_ci |
+----------------------+-------------------+
3 rows in set (0.00 sec)
mysql> SHOW VARIABLES LIKE '%character%' ;
+--------------------------+-------------------------------+
| Variable_name | Value |
+--------------------------+-------------------------------+
| character_set_client | gbk |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_results | gbk |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | E:\wamp\mysql\share\charsets\ |
+--------------------------+-------------------------------+
7 rows in set (0.00 sec)

有些參考書上建议这麽做 :
mysql> SET collation_connection = 'big5_chinese_ci' ;
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW VARIABLES LIKE '%collation%' ;
+----------------------+-------------------+
| Variable_name | Value |
+----------------------+-------------------+
| collation_connection | big5_chinese_ci |
| collation_database | latin1_swedish_ci |
| collation_server | latin1_swedish_ci |
+----------------------+-------------------+
3 rows in set (0.00 sec)
It apply in PHP :
mysqli_query($link, 'SET CHARACTER SET big5');
mysqli_query($link, "SET collation_connection = 'big5_chinese_ci'");
Setting collation_connection also sets character_set_connection to the character set associated with the collation.
When a client connects, it sends to the server the name of the character set that it wants to use. The server uses the name to set the character_set_client, character_set_results, and character_set_connection system variables.
In effect, the server performs a SET NAMES operation using the character set name.
当一个客户连接,它向Server发送它想使用的字符集的名字,Server用这名字把character_set_client, character_set_results, and character_set_connection 这些变量设置成那个字符集,
实际上,Server使用字符集名字执行了SET NAMES 操作
With the mysql client, it is not necessary to execute SET NAMES every time you start up if you want to use a character set different from the default. You can add the --default-character-set option setting to your mysql statement line, or in your option file.
For example, the following option file setting changes the three character set variables set to koi8r each time you invoke mysql:
如果你不想用默认字符集, mysql 客户端 不需要每次启动时执行SET NAMES 。你可以加上 --default-character-set在 mysql执行语句行, 或在option file(configuration file)加上,
比如,下面的option file(my.ini)选项文件设置 使你每次执行mysql程序时把三句字符集变量改成 koi8r:
[mysql]
default-character-set=koi8r

当启动MySQL Server配置文件Configuration File会被读取,
Windows平台 放置在 C:\Program Files\MySQL\MySQL Server 5.1或 C:\WINDOWS\my.ini 或C:\wamp\mysql\my.ini
Unix平台 放置在 /etc/my.cnf
如果找不到可以用find命令找一下:
起动mysqld时须指定 --default-character-set=gb2312
#或default-character-set=utf8
#或default-character-set=gbk,gb2312,big5,utf8(加上 # 為註解,沒有作用)
然后 重新启动MySQL server,就生效了。
在mysqld配置项中加入下面配置:
# The default character set that will be used when a new schema or table is
# created and no character set is defined
default-character-set=latin1
若要正确显示中文繁、简、日文、韩文 使用utf8
default-character-set=utf8

注意:所有语言的名称用小写字母指定:utf8不是UTF8

当你建立一个新的表格,若未指定字集(character set),预设字型latin1将被使用that is the same as saying --character-set-server=latin1
创建数据库时,你可以指定字符集,如果没指定,就使用数据库服务器的字符集的默认值(default)
问 : 为什么 字符集 总是 默认 瑞典文 latin1_swedish_ci
答 : 因为MySQL是瑞典人发明
问 : 为什么 不能 是中文 ?
答 : 或许 未来MySQL是你改版成 MySee狗 默认 就是 中文
指定MySQL数据库服务器的字符集把 my.ini 改成 default-character-set=utf8
然后打入 show variables like '%character%';
就会看到下图, 注意 原本默认Value全是 latin1

mysql> show variables like '%character%';
+--------------------------+-------------------------------+
| Variable_name | Value |
+--------------------------+-------------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | utf8 |
| character_set_results | latin1 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | E:\wamp\mysql\share\charsets\ |
+--------------------------+-------------------------------+
7 rows in set (0.00 sec)
mysql> show variables like '%collation%';
+----------------------+-------------------+
| Variable_name | Value |
+----------------------+-------------------+
| collation_connection | latin1_swedish_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_general_ci |
+----------------------+-------------------+
3 rows in set (0.00 sec)
mysql>
可以看到 _database, _server的值 都改变, 这是MySQL server的my.ini 在你的掌控下
简单么 ?
但有时MySQL server在远方,就不简单 你就只能用PHP 去控制 MySQL server
Example: Suppose that column1 is defined as CHAR(5) CHARACTER SET latin2. If you do not say SET NAMES or SET CHARACTER SET, then for SELECT column1 FROM t, the server sends back all the values for column1 using the character set that the client specified when it connected. On the other hand, if you say SET NAMES 'latin1' or SET CHARACTER SET latin1 before issuing the SELECT statement, the server converts the latin2 values to latin1 just before sending results back. Conversion may be lossy if there are characters that are not in both character sets.
例如:假设column1定义是 CHAR(5) CHARACTER SET latin2。如果你不用SET NAMES 或者 SET CHARACTER SET,那么对于你的 SELECT column1 FROM t请求,Server 会把column1 的所有值用连接建立时 客户端指定的字符集 送回。另一方面,如果你用了 SET NAMES 'latin1' or SET CHARACTER SET latin1`,那么在送回结果之前, Server会把 latin2 的值转成latin1,如果 有字符 不在两种字符集里,转化Conversion可能会失败 。
If you do not want the server to perform any conversion of result sets, set character_set_results to NULL:
如果你不希望Server作任何转换,就把 character_set_results设置成 NULL
SET character_set_results = NULL;
上面提到 你若不希望MySQL server 把Big5的10000元 转换成latin1_swedish的10000元
可以把Conversion(转换)关掉,使用 mysql> SET character_set_results = NULL;
Example: column1的数据是gbk ,如果你用了SET NAMES 'gb2312' or
SET CHARACTER SET gb2312查询column1的数据 结果会是??? ??? ? 这种乱码
因为在送回结果之前,Server会把gbk 的值转成gb2312
Note: Currently, UCS-2 cannot be used as a client character set, which means that SET NAMES 'ucs2' does not work.
To see the values of the character set and collation system variables that apply to your connection, use these statements:
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';
Get it ?
清楚上面的游戏规则 ?
但不能 保证不乱码,还必须确定网页自身的编码,Web page里指定的编码,与MySQL处理的编码是统一的。

随机文章推荐
网站分类


注:如需转载本文,请注明出处(原文链接),谢谢。更多精彩内容,请进入简明现代魔法首页。

进入新博客
喜欢本文,就分享它吧
给我留言
您的名字:
您的邮件:
您的网站:


 

copyright © 2009 简明现代魔法    学习、分享、进步

power by Gonn 感谢所有关心和支持本站的朋友们