select - MySQL matching Unicode characters with ASCII version

I'm running MySQL 5.1.50 and have a table that looks like this:

CREATE TABLE `organizations` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` text CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
  `url` text CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `phone` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `id` (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=25837 DEFAULT CHARSET=utf8

The problem I'm having is that MySQL is matching Unicode characters with their ASCII versions. For example, when I search for a word that contains 'é', it matches the same word that has 'e' instead, and vice versa:

mysql> SET NAMES utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT id, name FROM `organizations` WHERE `name` = 'universite de montreal';
+-------+-------------------------+
| id    | name                    |
+-------+-------------------------+
| 16973 | université de montreal  |
+-------+-------------------------+
1 row in set (0.01 sec)

I get these results from both PHP and the command-line console. How can I get accurate matches from SELECT queries?

Thanks!

You specified the name column as text character set utf8 collate utf8_unicode_ci, which tells MySQL to consider e and é equivalent in matching and sorting. That collation and utf8_general_ci both make a lot of things equivalent.

http://www.collation-charts.org/ is a great resource. Once you learn how to read the charts, it's pretty easy.

If you want e and é etc. to be considered different, then you must choose a different collation. To find out which collations are on your server (assuming you're limited to UTF-8 encoding):

mysql> SHOW COLLATION LIKE 'utf8%';

Then choose one, using the collation charts as a reference.

One more special collation is utf8_bin, in which there are no equivalencies; it's a binary match.
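As a minimal sketch of that binary match, assuming the organizations table from the question and a utf8 connection (SET NAMES utf8), the collation can be overridden for a single comparison or changed on the column itself. Keep in mind that utf8_bin compares code points, so it also distinguishes upper and lower case.

```sql
-- Per-query override: compare with utf8_bin for this condition only.
SELECT id, name
FROM organizations
WHERE name = _utf8'universite de montreal' COLLATE utf8_bin;

-- Permanent change: make the column itself use utf8_bin.
ALTER TABLE organizations
  MODIFY name TEXT CHARACTER SET utf8 COLLATE utf8_bin NOT NULL;
```

With the row shown in the question, the first query should no longer match, because 'universite' and 'université' are different byte sequences.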

The only MySQL Unicode collations I'm aware of that are not language specific are utf8_unicode_ci, utf8_general_ci and utf8_bin. They are rather weird. The real purpose of a collation is to make the computer match and sort the way a person from somewhere would expect. Hungarian and Turkish dictionaries have their entries ordered according to different rules. Specifying a collation allows you to sort and match according to such local rules.
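To illustrate the sorting side, a collation can be applied to a single ORDER BY. This is only a sketch: it reuses the organizations table from the question and assumes utf8_hungarian_ci and utf8_turkish_ci appear in your server's SHOW COLLATION output.

```sql
-- Sort the names by Hungarian rules...
SELECT name FROM organizations ORDER BY name COLLATE utf8_hungarian_ci;

-- ...and by Turkish rules; the row order may differ between the two.
SELECT name FROM organizations ORDER BY name COLLATE utf8_turkish_ci;
```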

For example, it seems Danes consider e and é equivalent but Icelanders don't:

mysql> SELECT _utf8'e' COLLATE utf8_danish_ci
    -> = _utf8'é' COLLATE utf8_danish_ci AS equal;
+-------+
| equal |
+-------+
|     1 |
+-------+

mysql> SELECT _utf8'e' COLLATE utf8_icelandic_ci
    -> = _utf8'é' COLLATE utf8_icelandic_ci AS equal;
+-------+
| equal |
+-------+
|     0 |
+-------+

Another handy trick is to fill a one-column table with a bunch of characters you're interested in (it's easier with a script), and then MySQL can tell you the equivalencies:

mysql> CREATE TABLE t (c CHAR(1) CHARACTER SET utf8);
mysql> INSERT INTO t VALUES ('a'), ('ä'), ('á');
mysql> SELECT GROUP_CONCAT(c) FROM t GROUP BY c COLLATE utf8_icelandic_ci;
+-----------------+
| group_concat(c) |
+-----------------+
| a               |
| á               |
| ä               |
+-----------------+

mysql> SELECT GROUP_CONCAT(c) FROM t GROUP BY c COLLATE utf8_danish_ci;
+-----------------+
| group_concat(c) |
+-----------------+
| a,á             |
| ä               |
+-----------------+

mysql> SELECT GROUP_CONCAT(c) FROM t GROUP BY c COLLATE utf8_general_ci;
+-----------------+
| group_concat(c) |
+-----------------+
| a,ä,á           |
+-----------------+
