How to convert any character encoding to UTF-8 in PHP


I'm working on a web crawler that grabs data from sites around the world, so it has to deal with many different languages and encodings.

Currently I'm using the following function, and it works in 99% of cases. The remaining 1% is giving me headaches.

function convertencoding($str) {
    return iconv(mb_detect_encoding($str), "utf-8", $str);
}

Rather than blindly trying to detect the encoding, you should first check whether the downloaded page declares its character set. The character set may be given in the HTTP response header, for example:

Content-Type: text/html; charset=utf-8

or in an HTML meta tag, for example:

<meta http-equiv="content-type" content="text/html; charset=utf-8" />  

Only if neither is available should you try to guess the encoding with mb_detect_encoding() or other methods.
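
The order of checks described above could be sketched roughly as follows. This is only an illustration, not a definitive implementation: the function names charsetFromContentType(), charsetFromMeta() and convertToUtf8() are made up for this example, the 4096-byte scan window and the candidate list passed to mb_detect_encoding() are assumptions, and the code presumes PHP 7.1 or later with the mbstring and iconv extensions loaded.

```php
<?php
// Hypothetical helper: extract the charset from a Content-Type header value.
// Returns null when no header was supplied or it names no charset.
function charsetFromContentType(?string $contentType): ?string
{
    if ($contentType !== null
        && preg_match('/charset\s*=\s*["\']?([\w.:-]+)/i', $contentType, $m)) {
        return $m[1];
    }
    return null;
}

// Hypothetical helper: extract the charset declared in a <meta> tag.
// Handles both <meta charset="..."> and the http-equiv form.
function charsetFromMeta(string $html): ?string
{
    // Assumption: the declaration sits near the top of the document.
    $head = substr($html, 0, 4096);
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)/i', $head, $m)) {
        return $m[1];
    }
    return null;
}

// Convert a downloaded page body to UTF-8, preferring declared charsets
// over guessing. $contentType is the raw Content-Type header value, if any.
function convertToUtf8(string $body, ?string $contentType = null): string
{
    // 1. HTTP header, 2. meta tag.
    $charset = charsetFromContentType($contentType) ?? charsetFromMeta($body);

    // 3. Last resort: guess. The candidate list is an assumption and should
    //    be adapted to the sites being crawled.
    if ($charset === null) {
        $detected = mb_detect_encoding($body, ['UTF-8', 'ISO-8859-1'], true);
        $charset = $detected !== false ? $detected : 'UTF-8';
    }

    // Nothing to do if the page is already declared or detected as UTF-8.
    if (strcasecmp($charset, 'UTF-8') === 0) {
        return $body;
    }

    // A page may declare a charset name that iconv does not know; in that
    // case iconv() returns false and the body is returned unchanged.
    $converted = @iconv($charset, 'UTF-8', $body);
    return $converted !== false ? $converted : $body;
}
```

For example, convertToUtf8($body, 'text/html; charset=ISO-8859-1') would trust the header and skip detection entirely. Declared charsets can still be wrong, so a crawler may want to validate the result (for instance with mb_check_encoding()) before storing it.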

