select - MySQL matching Unicode characters with ASCII version
I'm running MySQL 5.1.50 and have a table that looks like this:
CREATE TABLE `organizations` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` text CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
  `url` text CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `phone` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `id` (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=25837 DEFAULT CHARSET=utf8
The problem I'm having is that MySQL is matching Unicode characters with their ASCII versions. For example, when I search for a word that contains 'é', it matches the same word that has 'e' instead, and vice versa:
mysql> SET NAMES utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT id, name FROM `organizations` WHERE `name` = 'universite de montreal';
+-------+------------------------+
| id    | name                   |
+-------+------------------------+
| 16973 | université de montreal |
+-------+------------------------+
1 row in set (0.01 sec)
I get these results both from PHP and from the command-line console. How can I get accurate matches from my SELECT queries?
Thanks!
You specified the name column as text CHARACTER SET utf8 COLLATE utf8_unicode_ci, which tells MySQL to consider e and é equivalent in matching and sorting. That collation and utf8_general_ci both make a lot of things equivalent.
http://www.collation-charts.org/ is a great resource once you learn how to read the charts, which is pretty easy.
If you want e and é etc. to be considered different, then you must choose a different collation. To find out which collations are on your server (assuming you're limited to UTF-8 encoding):
mysql> SHOW COLLATION LIKE 'utf8%';
And choose one using the collation charts as a reference.
One more special collation is utf8_bin, in which there are no equivalencies: it's a binary match.
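As a rough sketch of how utf8_bin could be applied to the table from the question: either override the collation for a single comparison, or change the column's collation permanently. I have not run this against the asker's data, and it assumes the connection character set is utf8 (as set by SET NAMES utf8 in the question).

```sql
-- Option 1: force a binary comparison for this query only.
-- The explicit COLLATE on the literal takes precedence over the
-- column's implicit utf8_unicode_ci collation.
SELECT id, name
FROM organizations
WHERE name = 'universite de montreal' COLLATE utf8_bin;

-- Option 2: change the column itself so every comparison on it is binary.
-- Back up the table first; this rewrites the column definition.
ALTER TABLE organizations
  MODIFY name TEXT CHARACTER SET utf8 COLLATE utf8_bin NOT NULL;
```

Keep in mind that utf8_bin compares by code point, so it also distinguishes upper and lower case: 'Montreal' and 'montreal' would no longer match either.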
The only MySQL Unicode collations I'm aware of that are not language specific are utf8_unicode_ci, utf8_general_ci and utf8_bin. They are rather weird. The real purpose of a collation is to make the computer match and sort as a person from somewhere would expect. Hungarian and Turkish dictionaries have their entries ordered according to different rules. Specifying a collation allows you to sort and match according to such local rules.
For example, it seems Danes consider e and é equivalent but Icelanders don't:
mysql> SELECT _utf8'e' COLLATE utf8_danish_ci
    -> = _utf8'é' COLLATE utf8_danish_ci AS equal;
+-------+
| equal |
+-------+
|     1 |
+-------+

mysql> SELECT _utf8'e' COLLATE utf8_icelandic_ci
    -> = _utf8'é' COLLATE utf8_icelandic_ci AS equal;
+-------+
| equal |
+-------+
|     0 |
+-------+
Another handy trick is to fill a one-column table with a bunch of characters you're interested in (it's easier with a script), and then MySQL can tell you the equivalencies:
mysql> CREATE TABLE t (c CHAR(1) CHARACTER SET utf8);
mysql> INSERT INTO t VALUES ('a'), ('ä'), ('á');

mysql> SELECT GROUP_CONCAT(c) FROM t GROUP BY c COLLATE utf8_icelandic_ci;
+-----------------+
| group_concat(c) |
+-----------------+
| a               |
| á               |
| ä               |
+-----------------+

mysql> SELECT GROUP_CONCAT(c) FROM t GROUP BY c COLLATE utf8_danish_ci;
+-----------------+
| group_concat(c) |
+-----------------+
| a,á             |
| ä               |
+-----------------+

mysql> SELECT GROUP_CONCAT(c) FROM t GROUP BY c COLLATE utf8_general_ci;
+-----------------+
| group_concat(c) |
+-----------------+
| a,ä,á           |
+-----------------+