Problem with decoding UTF-8 JSON in Perl


UTF-8 characters are destroyed when processed with the JSON library (maybe this is similar to "Problem with decoding unicode JSON in perl", but there setting binmode only creates another problem).

I have reduced the problem down to the following example:

    (hlovdal) localhost:/tmp/my_test>cat my_test.pl
    #!/usr/bin/perl -w
    use strict;
    use warnings;
    use JSON;
    use File::Slurp;
    use Getopt::Long;
    use Encode;

    my $set_binmode = 0;
    GetOptions("set-binmode" => \$set_binmode);

    if ($set_binmode) {
            binmode(STDIN,  ":encoding(UTF-8)");
            binmode(STDOUT, ":encoding(UTF-8)");
            binmode(STDERR, ":encoding(UTF-8)");
    }

    sub check {
            my $text = shift;
            return "is_utf8(): " . (Encode::is_utf8($text) ? "1" : "0") . ", is_utf8(1): " . (Encode::is_utf8($text, 1) ? "1" : "0"). ". ";
    }

    my $my_test = "hei på deg";
    my $json_text = read_file('my_test.json');
    my $hash_ref = JSON->new->utf8->decode($json_text);

    print check($my_test), "\$my_test = $my_test\n";
    print check($json_text), "\$json_text = $json_text";
    print check($$hash_ref{'my_test'}), "\$\$hash_ref{'my_test'} = " . $$hash_ref{'my_test'} .  "\n";

    (hlovdal) localhost:/tmp/my_test>

When running the test, the text is for some reason crippled into ISO-8859-1. Setting binmode sort of solves it, but then it causes double encoding of the other strings.

    (hlovdal) localhost:/tmp/my_test>cat my_test.json
    { "my_test" : "hei på deg" }
    (hlovdal) localhost:/tmp/my_test>file my_test.json
    my_test.json: UTF-8 Unicode text
    (hlovdal) localhost:/tmp/my_test>hexdump -c my_test.json
    0000000   {       "   m   y   _   t   e   s   t   "       :       "   h
    0000010   e   i       p 303 245       d   e   g   "       }  \n
    000001e
    (hlovdal) localhost:/tmp/my_test>
    (hlovdal) localhost:/tmp/my_test>perl my_test.pl
    is_utf8(): 0, is_utf8(1): 0. $my_test = hei på deg
    is_utf8(): 0, is_utf8(1): 0. $json_text = { "my_test" : "hei på deg" }
    is_utf8(): 1, is_utf8(1): 1. $$hash_ref{'my_test'} = hei p� deg
    (hlovdal) localhost:/tmp/my_test>perl my_test.pl --set-binmode
    is_utf8(): 0, is_utf8(1): 0. $my_test = hei pÃ¥ deg
    is_utf8(): 0, is_utf8(1): 0. $json_text = { "my_test" : "hei pÃ¥ deg" }
    is_utf8(): 1, is_utf8(1): 1. $$hash_ref{'my_test'} = hei på deg
    (hlovdal) localhost:/tmp/my_test>

What is causing this, and how do I solve it?


This is on a newly installed and up-to-date Fedora 15 system.

    (hlovdal) localhost:/tmp/my_test>perl --version | grep version
    This is perl 5, version 12, subversion 4 (v5.12.4) built for x86_64-linux-thread-multi
    (hlovdal) localhost:/tmp/my_test>rpm -q perl-JSON
    perl-JSON-2.51-1.fc15.noarch
    (hlovdal) localhost:/tmp/my_test>locale
    LANG=en_US.UTF-8
    LC_CTYPE="en_US.UTF-8"
    LC_NUMERIC="en_US.UTF-8"
    LC_TIME="en_US.UTF-8"
    LC_COLLATE="en_US.UTF-8"
    LC_MONETARY="en_US.UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_PAPER="en_US.UTF-8"
    LC_NAME="en_US.UTF-8"
    LC_ADDRESS="en_US.UTF-8"
    LC_TELEPHONE="en_US.UTF-8"
    LC_MEASUREMENT="en_US.UTF-8"
    LC_IDENTIFICATION="en_US.UTF-8"
    LC_ALL=
    (hlovdal) localhost:/tmp/my_test>

Update: Adding use utf8 does not solve it; the characters are still not processed correctly (although the output differs slightly from before):

    (hlovdal) localhost:/tmp/my_test>perl  my_test.pl
    is_utf8(): 1, is_utf8(1): 1. $my_test = hei p� deg
    is_utf8(): 0, is_utf8(1): 0. $json_text = { "my_test" : "hei på deg" }
    is_utf8(): 1, is_utf8(1): 1. $$hash_ref{'my_test'} = hei p� deg
    (hlovdal) localhost:/tmp/my_test>perl  my_test.pl --set-binmode
    is_utf8(): 1, is_utf8(1): 1. $my_test = hei på deg
    is_utf8(): 0, is_utf8(1): 0. $json_text = { "my_test" : "hei pÃ¥ deg" }
    is_utf8(): 1, is_utf8(1): 1. $$hash_ref{'my_test'} = hei på deg
    (hlovdal) localhost:/tmp/my_test>

As noted by perlunifaq:

Can I use Unicode in my Perl sources?

Yes, you can! If your sources are UTF-8 encoded, you can indicate that with the use utf8 pragma.

    use utf8;

This doesn't do anything to your input, or to your output. It only influences the way your sources are read. You can use Unicode in string literals, in identifiers (but they still have to be "word characters" according to \w), and even in custom delimiters.
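To make the point about source literals concrete, here is a minimal sketch, assuming the script file itself is saved as UTF-8. The variable names are hypothetical and only serve the illustration. The same literal is measured once without and once with the pragma in effect:

```perl
use strict;
use warnings;

my ($bytes_len, $chars_len);
{
    no utf8;                        # the literal is read as raw bytes
    $bytes_len = length('på');
}
{
    use utf8;                       # the literal is read as characters
    $chars_len = length('på');
}
print "without use utf8: $bytes_len, with use utf8: $chars_len\n";
```

In a UTF-8 encoded file the first length should come out as 3 (the å occupies two bytes) and the second as 2. Nothing here touches input or output, which is why the pragma alone does not change how the printed text is encoded.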

You saved your program in UTF-8, but forgot to tell Perl. Add use utf8;.

Also, your code is more complicated than it needs to be. The JSON functions DWYM (do what you mean). To inspect strings, use Devel::Peek.

    use utf8; # for the following line
    my $my_test = 'hei på deg';

    use Devel::Peek qw(Dump);
    use File::Slurp qw(read_file);
    use JSON qw(decode_json);

    # decode_json expects UTF-8 encoded bytes and returns character strings
    my $hash_ref = decode_json(read_file('my_test.json'));

    Dump $hash_ref; # Perl character strings
    Dump $my_test;  # Perl character string
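The overall flow (decode on input, work with character strings, encode once on output) can be sketched as below. This is a self-contained illustration under a few assumptions, not the code from the answer: it uses the core module JSON::PP as a stand-in for JSON, embeds the file contents as a byte string instead of reading my_test.json, and the names $json_bytes, $literal, $decoded and $raw are made up for the example.

```perl
use strict;
use warnings;
use utf8;                          # string literals in this file are UTF-8
use JSON::PP qw(decode_json);      # assumption: stands in for the JSON module
use Encode qw(encode);

binmode(STDOUT, ':encoding(UTF-8)');   # encode characters exactly once, on output

# Raw bytes as they would be read from my_test.json (å is C3 A5 in UTF-8).
my $json_bytes = qq({ "my_test" : "hei p\xC3\xA5 deg" }\n);

my $literal = 'hei på deg';                          # character string
my $decoded = decode_json($json_bytes)->{my_test};   # character string

print "literal and decoded value are equal\n" if $literal eq $decoded;
print "characters: ", length($decoded), "\n";
print "bytes once encoded: ", length(encode('UTF-8', $decoded)), "\n";

# Encoding bytes that are already UTF-8 encodes them a second time,
# which is what produces "pÃ¥" when the output layer is applied to raw input.
my $raw = "hei p\xC3\xA5 deg";
print "bytes when double encoded: ", length(encode('UTF-8', $raw)), "\n";
```

With everything held as character strings inside the program, the literal and the decoded JSON value compare equal and the output layer encodes each of them exactly once. The undecoded $json_text in the question is the one string that should not be printed through an :encoding(UTF-8) layer, because it is still bytes.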
