CakePHP encoding problem: storing an uppercase S with caron (Š) saves to the database but causes errors when processed by Cake
I'm working on a site that stores information about cuneiform tablets. We use Semitic characters for the transliteration.
In a script, I create a list of terms from the transliteration of a tablet.
My problem is with Š: the script creates 2 different terms because it thinks there is a space in the word, because of the way Cake treats the special character.
Example:
Partial contents of the tablet:
- utu-diŠ-nu-il2
Terms for the tablet after being processed by the script:
utu-diŠ, -nu-il2
It should be:
utu-diŠ-nu-il2
When I print the contents of the array in the course of processing, I see:
- utu-di� -nu-il2
So this means that incorrect parsing of the text creates a space, which the script interprets as 2 words instead of one.
In the database, the text is fine...
I get these errors:
warning (512): sql error: 1366: incorrect string value: '\xc5' column 'term' @ row 1 [core\cake\libs\model\datasources\dbo_source.php, line 684]
query: insert terms (term, lft, rght) values ('utu-di�', 449, 450)
query: insert terms (term, lft, rght) values ('a�', 449, 450)
query: insert terms (term, lft, rght) values ('xdi�', 449, 450)
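The 1366 error is consistent with the term no longer being valid UTF-8: a lone \xC5 byte is only the first half of a two-byte sequence. A minimal sketch, assuming the mbstring extension is loaded, of how a term could be checked before it is inserted (the variable names are only illustrative):

```php
<?php
// Š is U+0160, which UTF-8 encodes as the two bytes C5 A0.
$good = "utu-di\xC5\xA0"; // complete sequence
$bad  = "utu-di\xC5";     // second byte missing, as in the failing queries

var_dump(mb_check_encoding($good, 'UTF-8')); // bool(true)
var_dump(mb_check_encoding($bad, 'UTF-8'));  // bool(false)
```

A check like this only detects the damage; it does not say where the second byte was lost.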
Does anybody know how to make this work?
Thanks!
Added info:
$terms = $this->data['tablet']['translit'];
// Double quotes are required here: in single quotes, '\r\n' is a literal backslash sequence.
$terms = str_replace(array("\r\n", "\r", "\n", "\n\r", "\t"), ' ', $terms);
$terms = trim($terms, chr(173));
print_r($terms);
// Collapse runs of whitespace into a single space.
$terms = preg_replace('/\s+/', ' ', $terms);
$terms = explode(" ", $terms);
$terms = array_map('trim', $terms);
// Tokens that are markup or line numbers, not terms.
$anti_terms = array(
    '@tablet', '1.', '2.', '3.', '4.', '5.', '6.', '7.', '8.', '9.', '10.',
    '11.', '12.', '13.', '14.', '15.', '16.', '17.', '18.', '19.', '20.',
    'rev.', 'obv.', '@obverse', '@reverse',
    'c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7', 'c8', 'c9',
    "\r", "\n", "\r\n", "\t", '', ' ', null, chr(173), 'x', '[x]', '[...]'
);
foreach ($terms as $key => $term) {
    if (in_array($term, $anti_terms) || is_numeric($term)) {
        unset($terms[$key]);
    }
}
If I put the print_r before the preg_replace call, the Š is good; if I put it after, it displays as a black lozenge. I guess the preg function is the problem!
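One plausible explanation, not confirmed by anything above: Š is stored in UTF-8 as the two bytes C5 A0, and a byte-oriented whitespace match that counts 0xA0 (the Latin-1 non-breaking space) as whitespace would replace only the second byte. The sketch below does not call preg_replace at all, since its treatment of 0xA0 depends on the environment; it simulates the effect with str_replace to show that the result matches the symptoms.

```php
<?php
// Š written out as its two UTF-8 bytes, so the example does not depend
// on the encoding of this source file.
$s_caron = "\xC5\xA0";
$term    = "utu-di" . $s_caron . "-nu-il2";

echo bin2hex($s_caron), "\n"; // c5a0

// Simulation only: treat the byte 0xA0 as if it were a space.
$broken = str_replace("\xA0", ' ', $term);
$pieces = explode(' ', $broken);

echo count($pieces), "\n";      // 2
echo bin2hex($pieces[0]), "\n"; // 7574752d6469c5 ("utu-di" plus a lone C5)
echo $pieces[1], "\n";          // -nu-il2
```

The lone C5 byte is what shows up as the black lozenge and as '\xc5' in the SQL error.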
I just found this: http://www.php.net/manual/fr/function.preg-replace.php#84385
But it seems that mb_ereg_replace() on its own causes the same problem as preg_replace()...
Solution: set the mbstring encodings before calling mb_ereg_replace():
mb_internal_encoding("utf-8");
mb_regex_encoding("utf-8");
$terms = mb_ereg_replace('\s+', ' ', $terms);
and the error is gone!
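For what it's worth, preg_replace can also be told to treat the pattern and the subject as UTF-8 with the u modifier. This is an alternative sketch, not what was used above, and the sample string is made up for illustration:

```php
<?php
// Sample input (hypothetical): two terms containing Š plus mixed whitespace.
$terms = "utu-di\xC5\xA0-nu-il2\r\n  a\xC5\xA0\tx";

// With /u the subject is read as UTF-8 characters, so the A0 byte inside Š
// is never examined on its own.
$terms = preg_replace('/\s+/u', ' ', $terms);

// With /u, preg_replace returns null when the subject is not valid UTF-8.
if ($terms === null) {
    die('Input is not valid UTF-8');
}

$list = explode(' ', $terms);
print_r($list); // three elements: utu-diŠ-nu-il2, aŠ, x
```

The null check matters because a subject that is already damaged makes the call fail instead of returning a partially cleaned string.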