正確的改變table字符集的語句是:
alter table xxx convert to character set utf8;
而不是想當然的:
alter table xxx default charset utf8;
字符集從GBK轉成utf8,
會增大字段所占用的空間,有可能會改變字段的類型:
比如text有可能會自動變成medium text
但是varchar沒有自動變成medium text.
下面看測試:
建一張GBK的表:
mysql> SHOW CREATE TABLE xxx;
CREATE TABLE `xxx` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`title` VARCHAR(255) NOT NULL DEFAULT '',
`body` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=gbk
用想當然的辦法把它轉成utf8:
mysql> ALTER TABLE xxx DEFAULT charset=utf8;
Query OK, 0 ROWS affected (0.13 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql>
mysql> SET names utf8;
Query OK, 0 ROWS affected (0.00 sec)
插入測試數據:
mysql> INSERT INTO xxx SET title='我愛北京天安門';
Query OK, 1 ROW affected (0.06 sec)
成功了。。。然後您就認為萬事OK了?
mysql> INSERT INTO xxx SET title='㤇';
Query OK, 1 ROW affected, 1 warning (0.03 sec)
Warning (Code 1366): Incorrect string VALUE: 'xE3xA4x87' FOR COLUMN 'title' at ROW 1
注意 title和body字段的CHARACTER SET 仍然為gbk
mysql> SHOW CREATE TABLE xxx;
| xxx | CREATE TABLE `xxx` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`title` VARCHAR(255) CHARACTER SET gbk NOT NULL DEFAULT '',
`body` text CHARACTER SET gbk,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8 |
正確的做法:
mysql> DROP TABLE xxx;
Query OK, 0 ROWS affected (0.02 sec)
mysql> CREATE TABLE `xxx` (
-> `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
-> `title` VARCHAR(255) NOT NULL DEFAULT '',
-> `body` text,
-> PRIMARY KEY (`id`)
-> ) ENGINE=InnoDB DEFAULT CHARSET=gbk;
Query OK, 0 ROWS affected (0.06 sec)
mysql> ALTER TABLE xxx CONVERT TO CHARACTER SET utf8;
Query OK, 0 ROWS affected (0.14 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> INSERT INTO xxx SET title='㤇';
Query OK, 1 ROW affected (0.04 sec)
mysql> SHOW CREATE TABLE xxx;
| xxx | CREATE TABLE `xxx` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`title` VARCHAR(255) NOT NULL DEFAULT '',
`body` mediumtext,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8
可是你注意到 body字段從text變成 mediumtext了嗎?
text字段,最多存儲65535字節,換成GBK的字符就是32767個字符,這32767個gbk字符轉成utf8卻要占用98301字節,已經超過text的存儲能力,所以被自動轉成了mediumtext。
下面測試VARCHAR的情況:
VARCHAR除了數據部分,還有1-2個字節用來保存數據的長度。如果只使用一個字節,那麼長度上限為255(2^8-1),如果使用二個字節,長度上限為65535(2^16-1)。
所以VARCHAR最多存儲65535字節,換成GBK字符為32767個:
mysql> DROP TABLE xxx;
Query OK, 0 ROWS affected (0.02 sec)
mysql>
mysql> CREATE TABLE `xxx` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`title` VARCHAR(32768) NOT NULL DEFAULT '',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=gbk;
ERROR 1074 (42000): COLUMN LENGTH too big FOR COLUMN 'title' (MAX = 32767); USE BLOB OR TEXT instead
mysql>
mysql>
可是 32767也是不成的。。還有每行記錄的總長度限制(不包括text和BLOB字段) 65535:
mysql> CREATE TABLE `xxx` ( `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT, `title` VARCHAR(32767) NOT NULL DEFAULT '', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=gbk;
ERROR 1118 (42000): ROW SIZE too LARGE. The maximum ROW SIZE FOR the used TABLE TYPE, NOT counting BLOBs, IS 65535. You have TO CHANGE SOME COLUMNS TO TEXT OR BLOBs
mysql> CREATE TABLE `xxx` ( `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT, `title` VARCHAR(32766) NOT NULL DEFAULT '', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=gbk;
ERROR 1118 (42000): ROW SIZE too LARGE. The maximum ROW SIZE FOR the used TABLE TYPE, NOT counting BLOBs, IS 65535. You have TO CHANGE SOME COLUMNS TO TEXT OR BLOBs
mysql> CREATE TABLE `xxx` ( `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT, `title` VARCHAR(32765) NOT NULL DEFAULT '', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=gbk;
ERROR 1118 (42000): ROW SIZE too LARGE. The maximum ROW SIZE FOR the used TABLE TYPE, NOT counting BLOBs, IS 65535. You have TO CHANGE SOME COLUMNS TO TEXT OR BLOBs
mysql> CREATE TABLE `xxx` ( `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT, `title` VARCHAR(32764) NOT NULL DEFAULT '', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=gbk;
Query OK, 0 ROWS affected (0.06 sec)
32764 * 2 + 2 + 4 = 65534 最接近於65535了,
32765 * 2 + 2 + 4 = 65536 超過65535。
*2是因為gbk字符占用2字節。
+2是VARCHAR還需要額外2字節保存數據的長度。
+4是id字段INT UNSIGNED占了4字節。
合理嗎?
轉換字符集為utf8之後,VARCHAR(32764) 已經不足以保存 32764個utf8字符:
mysql> ALTER TABLE xxx CONVERT TO CHARACTER SET utf8;
ERROR 1074 (42000): COLUMN LENGTH too big FOR COLUMN 'title' (MAX = 21845); USE BLOB OR TEXT instead
21845 * 3 = 65535.
後面附一些關於mysql數據庫字符集修改方法
MySQL:修改默認字符集,轉換字符集(MySQL 5.X)
1. 修改 MySQL 數據庫默認字符集(mysql database default character set)
alter database testdb default character set = gb2312; -www.2cto.com-
2. 修改 MySQL 數據表默認字符集(mysql table default character set)
alter table ip_to_country default character set = gb2312;
注意:修改 MySQL 的默認字符集,不管是在數據庫級別,還是數據表級別,對已經存儲的字符數據無任何改變。只是新增的表或列,開始使用新的字符集。