MySQL支持的字符集
mysql> show character set; +----------+-----------------------------+---------------------+--------+ | Charset | Description | Default collation | Maxlen | +----------+-----------------------------+---------------------+--------+ | big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 | | dec8 | DEC West European | dec8_swedish_ci | 1 | | cp850 | DOS West European | cp850_general_ci | 1 | | hp8 | HP West European | hp8_english_ci | 1 | | koi8r | KOI8-R Relcom Russian | koi8r_general_ci | 1 | | latin1 | cp1252 West European | latin1_swedish_ci | 1 | | latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 | | swe7 | 7bit Swedish | swe7_swedish_ci | 1 | | ascii | US ASCII | ascii_general_ci | 1 | | ujis | EUC-JP Japanese | ujis_japanese_ci | 3 | | sjis | Shift-JIS Japanese | sjis_japanese_ci | 2 | | hebrew | ISO 8859-8 Hebrew | hebrew_general_ci | 1 | | tis620 | TIS620 Thai | tis620_thai_ci | 1 | | euckr | EUC-KR Korean | euckr_korean_ci | 2 | | koi8u | KOI8-U Ukrainian | koi8u_general_ci | 1 | | gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci | 2 | | greek | ISO 8859-7 Greek | greek_general_ci | 1 | | cp1250 | Windows Central European | cp1250_general_ci | 1 | | gbk | GBK Simplified Chinese | gbk_chinese_ci | 2 | | latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 | | armscii8 | ARMSCII-8 Armenian | armscii8_general_ci | 1 | | utf8 | UTF-8 Unicode | utf8_general_ci | 3 | | ucs2 | UCS-2 Unicode | ucs2_general_ci | 2 | | cp866 | DOS Russian | cp866_general_ci | 1 | | keybcs2 | DOS Kamenicky Czech-Slovak | keybcs2_general_ci | 1 | | macce | Mac Central European | macce_general_ci | 1 | | macroman | Mac West European | macroman_general_ci | 1 | | cp852 | DOS Central European | cp852_general_ci | 1 | | latin7 | ISO 8859-13 Baltic | latin7_general_ci | 1 | | cp1251 | Windows Cyrillic | cp1251_general_ci | 1 | | cp1256 | Windows Arabic | cp1256_general_ci | 1 | | cp1257 | Windows Baltic | cp1257_general_ci | 1 | | binary | Binary pseudo charset | binary | 1 | | geostd8 | GEOSTD8 Georgian | geostd8_general_ci | 1 | | cp932 | SJIS for Windows Japanese | cp932_japanese_ci | 2 | | eucjpms | UJIS for Windows Japanese | eucjpms_japanese_ci | 3 | +----------+-----------------------------+---------------------+--------+ 36 rows in set (0.00 sec)查看MySQL的字符集
mysql> show variables like '%charac%'; +--------------------------+----------------------------+ | Variable_name | Value | +--------------------------+----------------------------+ | character_set_client | latin1 | | character_set_connection | latin1 | | character_set_database | latin1 | | character_set_filesystem | binary | | character_set_results | latin1 | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /usr/share/mysql/charsets/ | +--------------------------+----------------------------+ 8 rows in set (0.00 sec)
– character_set_client 客户端来源数据使用的字符集 – character_set_connection 连接层字符集 – character_set_database 当前选中数据库的默认字符集 – character_set_results 查询结果字符集 – character_set_server 默认的内部操作字符集 – character_set_system 系统元数据(字段名等)字符集 mysql> show variables like '%collation%'; +----------------------+-------------------+ | Variable_name | Value | +----------------------+-------------------+ | collation_connection | latin1_swedish_ci | | collation_database | latin1_swedish_ci | | collation_server | latin1_swedish_ci | +----------------------+-------------------+ 3 rows in set (0.00 sec) mysql>
mysql> show collation; +--------------------------+----------+-----+---------+----------+---------+ | Collation | Charset | Id | Default | Compiled | Sortlen | +--------------------------+----------+-----+---------+----------+---------+ | big5_chinese_ci | big5 | 1 | Yes | Yes | 1 | | big5_bin | big5 | 84 | | Yes | 1 | | dec8_swedish_ci | dec8 | 3 | Yes | Yes | 1 | | dec8_bin | dec8 | 69 | | Yes | 1 | | cp850_general_ci | cp850 | 4 | Yes | Yes | 1 | | cp850_bin | cp850 | 80 | | Yes | 1 | | hp8_english_ci | hp8 | 6 | Yes | Yes | 1 | | hp8_bin | hp8 | 72 | | Yes | 1 | | koi8r_general_ci | koi8r | 7 | Yes | Yes | 1 | | koi8r_bin | koi8r | 74 | | Yes | 1 | | latin1_german1_ci | latin1 | 5 | | Yes | 1 | | latin1_swedish_ci | latin1 | 8 | Yes | Yes | 1 | | latin1_danish_ci | latin1 | 15 | | Yes | 1 | | latin1_german2_ci | latin1 | 31 | | Yes | 2 | | latin1_bin | latin1 | 47 | | Yes | 1 | | latin1_general_ci | latin1 | 48 | | Yes | 1 | | latin1_general_cs | latin1 | 49 | | Yes | 1 | | latin1_spanish_ci | latin1 | 94 | | Yes | 1 | | latin2_czech_cs | latin2 | 2 | | Yes | 4 | | latin2_general_ci | latin2 | 9 | Yes | Yes | 1 | | latin2_hungarian_ci | latin2 | 21 | | Yes | 1 | | latin2_croatian_ci | latin2 | 27 | | Yes | 1 | | latin2_bin | latin2 | 77 | | Yes | 1 | | swe7_swedish_ci | swe7 | 10 | Yes | Yes | 1 | | swe7_bin | swe7 | 82 | | Yes | 1 | | ascii_general_ci | ascii | 11 | Yes | Yes | 1 | | ascii_bin | ascii | 65 | | Yes | 1 | | ujis_japanese_ci | ujis | 12 | Yes | Yes | 1 | | ujis_bin | ujis | 91 | | Yes | 1 | | sjis_japanese_ci | sjis | 13 | Yes | Yes | 1 | | sjis_bin | sjis | 88 | | Yes | 1 | | hebrew_general_ci | hebrew | 16 | Yes | Yes | 1 | | hebrew_bin | hebrew | 71 | | Yes | 1 | | tis620_thai_ci | tis620 | 18 | Yes | Yes | 4 | | tis620_bin | tis620 | 89 | | Yes | 1 | | euckr_korean_ci | euckr | 19 | Yes | Yes | 1 | | euckr_bin | euckr | 85 | | Yes | 1 | | koi8u_general_ci | koi8u | 22 | Yes | Yes | 1 | | koi8u_bin | koi8u | 75 | | Yes | 1 | | gb2312_chinese_ci | gb2312 | 24 | Yes | Yes | 1 | | gb2312_bin | gb2312 | 86 | | Yes | 1 | | greek_general_ci | greek | 25 | Yes | Yes | 1 | | greek_bin | greek | 70 | | Yes | 1 | | cp1250_general_ci | cp1250 | 26 | Yes | Yes | 1 | | cp1250_czech_cs | cp1250 | 34 | | Yes | 2 | | cp1250_croatian_ci | cp1250 | 44 | | Yes | 1 | | cp1250_bin | cp1250 | 66 | | Yes | 1 | | cp1250_polish_ci | cp1250 | 99 | | Yes | 1 | | gbk_chinese_ci | gbk | 28 | Yes | Yes | 1 | | gbk_bin | gbk | 87 | | Yes | 1 | | latin5_turkish_ci | latin5 | 30 | Yes | Yes | 1 | | latin5_bin | latin5 | 78 | | Yes | 1 | | armscii8_general_ci | armscii8 | 32 | Yes | Yes | 1 | | armscii8_bin | armscii8 | 64 | | Yes | 1 | | utf8_general_ci | utf8 | 33 | Yes | Yes | 1 | | utf8_bin | utf8 | 83 | | Yes | 1 | | utf8_unicode_ci | utf8 | 192 | | Yes | 8 | | utf8_icelandic_ci | utf8 | 193 | | Yes | 8 | | utf8_latvian_ci | utf8 | 194 | | Yes | 8 | | utf8_romanian_ci | utf8 | 195 | | Yes | 8 | | utf8_slovenian_ci | utf8 | 196 | | Yes | 8 | | utf8_polish_ci | utf8 | 197 | | Yes | 8 | | utf8_estonian_ci | utf8 | 198 | | Yes | 8 | | utf8_spanish_ci | utf8 | 199 | | Yes | 8 | | utf8_swedish_ci | utf8 | 200 | | Yes | 8 | | utf8_turkish_ci | utf8 | 201 | | Yes | 8 | | utf8_czech_ci | utf8 | 202 | | Yes | 8 | | utf8_danish_ci | utf8 | 203 | | Yes | 8 | | utf8_lithuanian_ci | utf8 | 204 | | Yes | 8 | | utf8_slovak_ci | utf8 | 205 | | Yes | 8 | | utf8_spanish2_ci | utf8 | 206 | | Yes | 8 | | utf8_roman_ci | utf8 | 207 | | Yes | 8 | | utf8_persian_ci | utf8 | 208 | | Yes | 8 | | utf8_esperanto_ci | utf8 | 209 | | Yes | 8 | | utf8_hungarian_ci | utf8 | 210 | | Yes | 8 | | utf8_general_mysql500_ci | utf8 | 223 | | Yes | 1 | | ucs2_general_ci | ucs2 | 35 | Yes | Yes | 1 | | ucs2_bin | ucs2 | 90 | | Yes | 1 | | ucs2_unicode_ci | ucs2 | 128 | | Yes | 8 | | ucs2_icelandic_ci | ucs2 | 129 | | Yes | 8 | | ucs2_latvian_ci | ucs2 | 130 | | Yes | 8 | | ucs2_romanian_ci | ucs2 | 131 | | Yes | 8 | | ucs2_slovenian_ci | ucs2 | 132 | | Yes | 8 | | ucs2_polish_ci | ucs2 | 133 | | Yes | 8 | | ucs2_estonian_ci | ucs2 | 134 | | Yes | 8 | | ucs2_spanish_ci | ucs2 | 135 | | Yes | 8 | | ucs2_swedish_ci | ucs2 | 136 | | Yes | 8 | | ucs2_turkish_ci | ucs2 | 137 | | Yes | 8 | | ucs2_czech_ci | ucs2 | 138 | | Yes | 8 | | ucs2_danish_ci | ucs2 | 139 | | Yes | 8 | | ucs2_lithuanian_ci | ucs2 | 140 | | Yes | 8 | | ucs2_slovak_ci | ucs2 | 141 | | Yes | 8 | | ucs2_spanish2_ci | ucs2 | 142 | | Yes | 8 | | ucs2_roman_ci | ucs2 | 143 | | Yes | 8 | | ucs2_persian_ci | ucs2 | 144 | | Yes | 8 | | ucs2_esperanto_ci | ucs2 | 145 | | Yes | 8 | | ucs2_hungarian_ci | ucs2 | 146 | | Yes | 8 | | ucs2_general_mysql500_ci | ucs2 | 159 | | Yes | 1 | | cp866_general_ci | cp866 | 36 | Yes | Yes | 1 | | cp866_bin | cp866 | 68 | | Yes | 1 | | keybcs2_general_ci | keybcs2 | 37 | Yes | Yes | 1 | | keybcs2_bin | keybcs2 | 73 | | Yes | 1 | | macce_general_ci | macce | 38 | Yes | Yes | 1 | | macce_bin | macce | 43 | | Yes | 1 | | macroman_general_ci | macroman | 39 | Yes | Yes | 1 | | macroman_bin | macroman | 53 | | Yes | 1 | | cp852_general_ci | cp852 | 40 | Yes | Yes | 1 | | cp852_bin | cp852 | 81 | | Yes | 1 | | latin7_estonian_cs | latin7 | 20 | | Yes | 1 | | latin7_general_ci | latin7 | 41 | Yes | Yes | 1 | | latin7_general_cs | latin7 | 42 | | Yes | 1 | | latin7_bin | latin7 | 79 | | Yes | 1 | | cp1251_bulgarian_ci | cp1251 | 14 | | Yes | 1 | | cp1251_ukrainian_ci | cp1251 | 23 | | Yes | 1 | | cp1251_bin | cp1251 | 50 | | Yes | 1 | | cp1251_general_ci | cp1251 | 51 | Yes | Yes | 1 | | cp1251_general_cs | cp1251 | 52 | | Yes | 1 | | cp1256_general_ci | cp1256 | 57 | Yes | Yes | 1 | | cp1256_bin | cp1256 | 67 | | Yes | 1 | | cp1257_lithuanian_ci | cp1257 | 29 | | Yes | 1 | | cp1257_bin | cp1257 | 58 | | Yes | 1 | | cp1257_general_ci | cp1257 | 59 | Yes | Yes | 1 | | binary | binary | 63 | Yes | Yes | 1 | | geostd8_general_ci | geostd8 | 92 | Yes | Yes | 1 | | geostd8_bin | geostd8 | 93 | | Yes | 1 | | cp932_japanese_ci | cp932 | 95 | Yes | Yes | 1 | | cp932_bin | cp932 | 96 | | Yes | 1 | | eucjpms_japanese_ci | eucjpms | 97 | Yes | Yes | 1 | | eucjpms_bin | eucjpms | 98 | | Yes | 1 | +--------------------------+----------+-----+---------+----------+---------+ 129 rows in set (0.00 sec) mysql>字符集文件
mysql> \! ls -l /usr/share/mysql/charsets/ total 232 -rw-r--r--. 1 root root 5491 May 11 12:02 armscii8.xml -rw-r--r--. 1 root root 5477 May 11 12:02 ascii.xml -rw-r--r--. 1 root root 8206 May 11 12:02 cp1250.xml -rw-r--r--. 1 root root 8365 May 11 12:02 cp1251.xml -rw-r--r--. 1 root root 5534 May 11 12:02 cp1256.xml -rw-r--r--. 1 root root 8867 May 11 12:02 cp1257.xml -rw-r--r--. 1 root root 5471 May 11 12:02 cp850.xml -rw-r--r--. 1 root root 5493 May 11 12:02 cp852.xml -rw-r--r--. 1 root root 5578 May 11 12:02 cp866.xml -rw-r--r--. 1 root root 6494 May 11 12:02 dec8.xml -rw-r--r--. 1 root root 5481 May 11 12:02 geostd8.xml -rw-r--r--. 1 root root 5693 May 11 12:02 greek.xml -rw-r--r--. 1 root root 5482 May 11 12:02 hebrew.xml -rw-r--r--. 1 root root 5467 May 11 12:02 hp8.xml -rw-r--r--. 1 root root 18272 May 11 12:02 Index.xml -rw-r--r--. 1 root root 5494 May 11 12:02 keybcs2.xml -rw-r--r--. 1 root root 5475 May 11 12:02 koi8r.xml -rw-r--r--. 1 root root 6497 May 11 12:02 koi8u.xml -rw-r--r--. 1 root root 9781 May 11 12:02 latin1.xml -rw-r--r--. 1 root root 7203 May 11 12:02 latin2.xml -rw-r--r--. 1 root root 5480 May 11 12:02 latin5.xml -rw-r--r--. 1 root root 7403 May 11 12:02 latin7.xml -rw-r--r--. 1 root root 8012 May 11 12:02 macce.xml -rw-r--r--. 1 root root 8023 May 11 12:02 macroman.xml -rw-r--r--. 1 root root 1749 May 11 12:02 README -rw-r--r--. 1 root root 6495 May 11 12:02 swe7.xml mysql>MySQL字符集建议
建库建表时,要尽量设定使用的字符集,而不是依赖于MySQL的默认设置
一般情况下将数据库和连接字符集都置为utf8是较好的选择,因为网页、MySQL CAPI、PHP API使用UTF8的情况较多
如何设置字符集
只对当前会话生效,全局变更,需加global
SET character_set_client = utf8; SET character_set_connection = utf8; SET character_set_database = utf8; SET character_set_results = utf8; SET character_set_server = utf8; SET collation_connection = utf8; SET collation_database = utf8; SET collation_server = utf8;
或修改my.cnf增加下面的内容
default-character-set = utf8
容易出现乱码的原因
没有正确的设置字符集是主要原因。
如果字段没有设置字符集,默认会使用表的数据集 如果表没有指定字符集,默认会使用数据库存的字符集 如果数据库在创建的时候没有指定字符集,默认会使用character_set_server设定值 如果我们修改character_set_server指定字符集,会使用mysql默认
检查顺序:字段字符集-->表的字符集-->数据库字符集-->character_set_server-->MySQL默认
比如:MySQL默认的字符集是latin1,我们使用了latin1字符集,如果连接层字符集(character_set_connection)是UTF-8,插入中文就会出现乱码
set names gbk的操作(相当于下面的三条命令)
SET character_set_client = gbk; SET character_set_results = gbk; SET character_set_connection = gbk; mysql> set names gbk; Query OK, 0 rows affected (0.00 sec) mysql> show variables like '%charac%'; +--------------------------+----------------------------+ | Variable_name | Value | +--------------------------+----------------------------+ | character_set_client | gbk | | character_set_connection | gbk | | character_set_database | latin1 | | character_set_filesystem | binary | | character_set_results | gbk | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /usr/share/mysql/charsets/ | +--------------------------+----------------------------+ 8 rows in set (0.00 sec) mysql>
创建库或表时指定字符集
mysql>create database if not exists ohs default charset utf8 collate utf8_general_ci;
mysql> use ohs Database changed mysql> CREATE TABLE test_table ( -> id INT(20) NOT NULL AUTO_INCREMENT, -> name VARCHAR(32) NOT NULL DEFAULT '', -> other INT(20) NOT NULL DEFAULT '0', -> PRIMARY KEY (id), -> INDEX name (name), -> INDEX other_key (other) -> ) -> ENGINE=MyISAM -> DEFAULT CHARSET=latin1; Query OK, 0 rows affected (0.00 sec) mysql> show create table test_table\G *************************** 1. row *************************** Table: test_table Create Table: CREATE TABLE `test_table` ( `id` int(20) NOT NULL AUTO_INCREMENT, `name` varchar(32) NOT NULL DEFAULT '', `other` int(20) NOT NULL DEFAULT '0', PRIMARY KEY (`id`), KEY `name` (`name`), KEY `other_key` (`other`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 1 row in set (0.00 sec) mysql>
注意一点就好:客户端、服务器端字符集保持一致就好
字符集转化
导出原表,通过iconv工具转化,然后导入即可
默认字符集Latin1
Latin1是ISO-8859-1的别名,有些环境下写作Latin-1。 ISO-8859-1编码是单字节编码,向下兼容ASCII,其编码范围是0x00-0xFF,0x00-0x7F之间完全和ASCII一致,0x80-0x9F之间是控制字符,0xA0-0xFF之间是文字符号。 ISO-8859-1收录的字符除ASCII收录的字符外,还包括西欧语言、希腊语、泰语、阿拉伯语、希伯来语对应的文字符号。欧元符号出现的比较晚,没有被收录在ISO-8859-1当中。 因为ISO-8859-1编码范围使用了单字节内的所有空间,在支持ISO-8859-1的系统中传输和存储其他任何编码的字节流都不会被抛弃。换言之,把其他任何编码的字节流当作ISO-8859-1编码看待都没有问题。这是个很重要的特性,MySQL数据库默认编码是Latin1就是利用了这个特性。ASCII编码是一个7位的容器,ISO-8859-1编码是一个8位的容器。
使用gb2312的建议
如果使用gb2312编码,建议使用latin1作为数据表的默认字符集,这样就能直接用中文在命令行工具中插入数据,并且可以直接显示出来。而不要使用gb2312或者gbk等字符集,如果担心查询排序等问题,可以使用binary属性约束。 create table my_table ( name varchar(20) binary not null default '')type=myisam default charset latin1;
MySQL字符集兼容性
http://imysql.com/2012/08/20/about-mysql-character-set-compatibility.html
http://baike.baidu.com/item/latin1
http://dev.mysql.com/doc/refman/5.7/en/charset-mysql.html
http://dev.mysql.com/doc/refman/5.7/en/data-types.html
http://imysql.com/charset_tips