How to convert any character encoding to UTF-8 in PHP
I'm working on a web crawler that grabs data from sites all over the world, so it has to deal with many different languages and encodings.
Currently I'm using the following function, and it works in 99% of cases. The remaining 1% is giving me headaches.
function convertEncoding($str) {
    return iconv(mb_detect_encoding($str), "UTF-8", $str);
}
Rather than blindly trying to detect the encoding, you should first check whether the downloaded page declares its character set. The character set may be set in the HTTP response header, for example:
Content-Type: text/html; charset=utf-8
or in an HTML meta tag, for example:
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
Only if neither is available should you try to guess the encoding with mb_detect_encoding() or other methods.
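The order of checks described above could be sketched roughly as follows. This is only an illustration, not a complete solution: the function names (charsetFromContentType, charsetFromMeta, convertToUtf8) are hypothetical, the regular expressions are deliberately simple, and the fallback candidate list for mb_detect_encoding() is an assumption you would adjust to the sites you crawl.

```php
<?php
// Hypothetical helpers for illustration; none of these names come from the original post.

/** Return the charset named in a Content-Type header value, or null if there is none. */
function charsetFromContentType(?string $contentType): ?string
{
    if ($contentType !== null
        && preg_match('/charset\s*=\s*["\']?([A-Za-z0-9._:-]+)/i', $contentType, $m)) {
        return $m[1];
    }
    return null;
}

/** Return the charset declared in an HTML meta tag, or null if there is none. */
function charsetFromMeta(string $html): ?string
{
    // Only scan the beginning of the document, where the declaration is expected.
    $head = substr($html, 0, 4096);
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._:-]+)/i', $head, $m)) {
        return $m[1];
    }
    return null;
}

/** Convert a downloaded page body to UTF-8, preferring declared charsets over guessing. */
function convertToUtf8(string $body, ?string $contentType = null): string
{
    // 1. HTTP header, 2. meta tag.
    $charset = charsetFromContentType($contentType) ?? charsetFromMeta($body);

    // 3. Last resort: guess. The candidate list is an assumption, not a recommendation.
    if ($charset === null) {
        $detected = mb_detect_encoding($body, ['UTF-8', 'ISO-8859-1'], true);
        $charset = $detected !== false ? $detected : 'UTF-8';
    }

    if (strcasecmp($charset, 'UTF-8') === 0) {
        return $body; // already UTF-8 according to the declaration or the guess
    }

    // iconv() returns false for an unknown charset or invalid input; keep the original then.
    $converted = @iconv($charset, 'UTF-8', $body);
    return $converted !== false ? $converted : $body;
}

// Usage: the header value would come from your HTTP client's response headers.
$utf8 = convertToUtf8("caf\xE9", 'text/html; charset=ISO-8859-1');
```

Note that a declared charset can itself be wrong, so a real crawler may still want to validate the result (for example with mb_check_encoding()) before trusting it.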