Storing emoji in a MySQL database that uses the utf8 character set fails with an error:
Incorrect string value: '\xF0\x9F\x98\x84\xF0\x9F...' for column 'content'
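To see why MySQL rejects the value, it can help to decode the bytes quoted in the error message. The following is a minimal Python sketch, not part of the original post; the helper name `needs_utf8mb4` is made up for illustration. It shows that the first sequence is a single character that takes 4 bytes in UTF-8, whereas MySQL's utf8 character set (an alias for utf8mb3) stores at most 3 bytes per character.

```python
# First complete byte sequence copied from the error message above.
raw = b"\xF0\x9F\x98\x84"

char = raw.decode("utf-8")                 # decodes to exactly one character
code_point = ord(char)                     # its Unicode code point
byte_len = len(char.encode("utf-8"))       # bytes needed in UTF-8

print(f"U+{code_point:04X} takes {byte_len} bytes in UTF-8")


def needs_utf8mb4(text: str) -> bool:
    """Hypothetical helper: True if any character lies above U+FFFF.

    Such characters need 4 bytes in UTF-8 and do not fit into
    MySQL's 3-byte utf8 character set.
    """
    return any(ord(ch) > 0xFFFF for ch in text)
```

A check like this could be run on user input before an INSERT to find out whether a given string would trigger the error.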
How to fix the error:
4 byte Unicode characters aren't yet widely used, so not every application out there fully supports them. MySQL 5.5 works fine with 4 byte characters when properly configured – check if your other components can work with them as well.

Here are a few other things to check out:

Make sure all your tables' default character sets and text fields are converted to utf8mb4, in addition to setting the client & server character sets, e.g. ALTER TABLE mytable charset=utf8mb4, MODIFY COLUMN textfield1 VARCHAR(255) CHARACTER SET utf8mb4, MODIFY COLUMN textfield2 VARCHAR(255) CHARACTER SET utf8mb4; and so on. If your data is already in the utf8 character set, it should convert to utf8mb4 in place without any problems. As always, back up your data before trying!

Also make sure your app layer sets its database connections' character set to utf8mb4. Double-check this is actually happening – if you're running an older version of your chosen framework's mysql client library, it may not have been compiled with utf8mb4 support and it won't set the charset properly. If not, you may have to update it or compile it yourself.

When viewing your data through the mysql client, make sure you're on a machine that can display emoji, and run a SET NAMES utf8mb4 before running any queries.

Once every level of your application can support the new characters, you should be able to use them without any corruption.
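In short: change the table structure to support 4-byte Unicode (utf8mb4), and use the same character set for the database connection. I have confirmed that this works.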
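The two steps can be sketched as a small Python helper that only builds the SQL text; it does not connect to a database. The function name `build_alter_statement` and the table and column names are hypothetical. The statement shape follows the ALTER TABLE example quoted above.

```python
# Statement to run on every new connection so the session uses utf8mb4.
SET_NAMES = "SET NAMES utf8mb4;"


def build_alter_statement(table: str, columns: dict) -> str:
    """Hypothetical helper: build one ALTER TABLE statement.

    `columns` maps column name -> column definition, for example
    {"textfield1": "VARCHAR(255)"}.

    Note: MODIFY COLUMN redefines the whole column, so attributes such as
    NOT NULL or DEFAULT must be included in the definition string if the
    column has them. Names are inserted as-is, so pass trusted values only.
    """
    parts = ["CHARSET=utf8mb4"]
    for name, definition in columns.items():
        parts.append(
            f"MODIFY COLUMN `{name}` {definition} CHARACTER SET utf8mb4"
        )
    return f"ALTER TABLE `{table}` " + ", ".join(parts) + ";"


if __name__ == "__main__":
    print(build_alter_statement("mytable", {"textfield1": "VARCHAR(255)"}))
    print(SET_NAMES)
```

As the quoted advice says, back up the data before running the generated statement against a real table.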
If other components cannot handle these characters, consider stripping them instead:
Since 4-byte UTF-8 sequences always start with the bytes 0xF0-0xF7, the following should work:
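// Byte-oriented: remove each lead byte 0xF0-0xF7 plus the three bytes that follow it.
$str = preg_replace('/[\xF0-\xF7].../s', '', $str);
// Unicode mode: remove every code point above U+FFFF. For valid UTF-8 input,
// either call should be enough on its own.
$str = preg_replace('/[\x{10000}-\x{10FFFF}]/u', '', $str);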
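For applications not written in PHP, the same idea can be sketched in Python. This is an illustrative equivalent of the Unicode-mode pattern, not code from the original answer; the function name `strip_4byte_chars` is made up. It works on decoded strings, so it removes every code point from U+10000 to U+10FFFF, which are exactly the characters that need 4 bytes in UTF-8.

```python
import re

# Matches any code point outside the Basic Multilingual Plane.
_NON_BMP = re.compile("[\U00010000-\U0010FFFF]")


def strip_4byte_chars(text: str) -> str:
    """Hypothetical helper: drop characters that need 4 bytes in UTF-8."""
    return _NON_BMP.sub("", text)


if __name__ == "__main__":
    # The emoji is removed; ASCII and 3-byte characters are kept.
    print(strip_4byte_chars("ok \U0001F604!"))
```

Stripping loses information silently, so it is best treated as a fallback for components that cannot be moved to utf8mb4.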