Converting UCS2 (Unknown LE or BE) in Numeric Hex Format to UTF-8 Using Perl
I'm hoping someone can point me in the direction of where I'm going wrong with this:
I have a string of (what I believe is) hex-encoded UCS2, and the provider cannot tell me whether it is UCS2-LE or UCS2-BE.
It looks like so: 0627062e062a062806270631
It translates to this: اختبار
which is Arabic, apparently. No matter whether I try converting it out of hex, using straight UCS2 (LE or BE), or practically anything else I can think of under the sun, I can't turn it into native Perl UTF-8 that I can then re-encode as standard UTF-8 (the native format of our system).
Code:
    my $string = "0627062e062a062806270631";
    $decodedhex = hex($string);
    # nearest
    $perldecodedutf8 = decode("ucs-2be", $decodedhex);
    $utf8 = encode('utf-8',$perldecodedutf8);
    open(arabictest,">ucs2test.txt");
    print(arabictest $perldecodedutf8);
    print("done!");
    close(arabictest);

It outputs gibberish characters at the moment.
Now, one idea that did come to mind was to split the string in question into 4-character sections (i.e. one per hex code), but trying that with an individual, known UCS2 hex value doesn't appear to work either.
I also tried forcing the output encoding, but no joy there either.
Thanks!
hex is not the way to decode a hex string into a byte sequence; pack is. (hex produces a single integer, not a string of bytes.) Other than that, you're close. Try this:
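    use strict;
    use warnings;
    use Encode;

    my $string = "0627062e062a062806270631";
    # pack 'H*' turns each pair of hex digits into one byte (high nybble first)
    my $decodedhex = pack('H*', $string);
    # decode the big-endian UCS-2 bytes into a Perl character string
    my $perldecodedutf8 = decode("UCS-2BE", $decodedhex);

    # the :utf8 layer encodes the characters as UTF-8 on output
    open(my $arabictest, ">:utf8", "ucs2test.txt") or die "Cannot open ucs2test.txt: $!";
    print $arabictest $perldecodedutf8;
    print("done!");
    close($arabictest);

Note: you may want to use UTF-16BE instead of UCS-2BE. They're mostly the same thing, but UTF-16BE allows surrogate pairs, and UCS-2BE doesn't. All UCS-2BE text is valid UTF-16BE, but not vice versa.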
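Since the provider could not say whether the data is LE or BE, one way to check is to decode the same bytes under both byte orders and see which result lands in the expected script. The following is only a sketch under assumptions: the variable names ($hex, $as_be, $as_le, $via_chunks) are made up for illustration, and counting \p{Arabic} matches is a heuristic that only makes sense because the text is expected to be Arabic. It also shows that the "split into 4-character sections" idea from the question can work when each chunk goes through chr(hex(...)).

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $hex = "0627062e062a062806270631";

# Hex digits -> raw bytes (two digits per byte, high nybble first).
my $bytes = pack('H*', $hex);

# Decode the same bytes under both byte orders.
my $as_be = decode('UTF-16BE', $bytes);
my $as_le = decode('UTF-16LE', $bytes);

# Heuristic (assumption: the text should be Arabic): count Arabic-script characters.
my $be_arabic = () = $as_be =~ /\p{Arabic}/g;
my $le_arabic = () = $as_le =~ /\p{Arabic}/g;

# Alternative: treat each 4-digit group as one big-endian code unit.
# hex() is fine here because each chunk is a single small integer.
my $via_chunks = join '', map { chr(hex($_)) } unpack('(A4)*', $hex);

# Re-encode the character string as UTF-8 bytes for storage or output.
my $utf8_bytes = encode('UTF-8', $as_be);

binmode(STDOUT, ':encoding(UTF-8)');
printf "BE: %s (Arabic chars: %d)\n",
    join(' ', map { sprintf 'U+%04X', ord($_) } split //, $as_be), $be_arabic;
printf "LE: %s (Arabic chars: %d)\n",
    join(' ', map { sprintf 'U+%04X', ord($_) } split //, $as_le), $le_arabic;
print "Chunked decode matches BE decode\n" if $via_chunks eq $as_be;
```

For this sample, the BE reading gives code points in the Arabic block (U+0627, U+062E, ...), while the LE reading gives unrelated symbols, which suggests the data is big-endian. The chunked approach only handles code units in the Basic Multilingual Plane; surrogate pairs would need the UTF-16BE decoder.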