python - UnicodeDecodeError: 'ascii' codec can't decode -


I'm reading a file that contains Romanian words in Python with file.readline(). I have problems with many characters because of the encoding.

Example:

>>> a = "aberație"  # type 'str'
>>> a
'abera\xc8\x9bie'
>>> print sys.stdin.encoding
utf-8

I've tried encode() with utf-8, cp500, etc., but it doesn't work.

I can't figure out which character encoding I have to use.

Thanks in advance.

Edit: the aim is to store the words from the file in a dictionary and, when printing them, to obtain aberație, not 'abera\xc8\x9bie'.

What are you trying to do?

This is a set of bytes:

bytes = 'abera\xc8\x9bie' 

It's a set of bytes that represents the UTF-8 encoding of the string "aberație". You decode the bytes to get your Unicode string:

>>> bytes
'abera\xc8\x9bie'
>>> print bytes
aberaÈ›ie
>>> abberation = bytes.decode('utf-8')
>>> abberation
u'abera\u021bie'
>>> print abberation
aberație
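As a small sketch of the same idea that runs on both Python 2.6+ and Python 3: decoding with the right codec gives the intended text, while decoding with the wrong one gives the mojibake shown above. The names raw, text and garbled are illustrative, and cp1252 is only an assumption about what the terminal used; it is one codec that reproduces that output. It also explains why encode() did not help: in Python 2, calling encode() on a byte str first decodes it implicitly with the ascii codec, which is where the error in the title comes from.

```python
# -*- coding: utf-8 -*-
# Illustrative names; the byte string is the one from the example above.
raw = b'abera\xc8\x9bie'

# Bytes -> Unicode text, using the codec the bytes were written in.
text = raw.decode('utf-8')

# Same bytes read with the wrong codec (cp1252 is an assumption here):
# each of the two UTF-8 bytes of the letter becomes its own character.
garbled = raw.decode('cp1252')

# The letter occupies two bytes in UTF-8 but is a single code point.
byte_count = len(raw)
char_count = len(text)
```

The key point is the direction: decode() goes from bytes to text, encode() goes from text to bytes.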

If you want to store the Unicode string to a file, you have to encode it to a particular byte format of your choosing:

>>> abberation.encode('utf-8')
'abera\xc8\x9bie'
>>> abberation.encode('utf-16')
'\xff\xfea\x00b\x00e\x00r\x00a\x00\x1b\x02i\x00e\x00'
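For the goal stated in the question (reading words from a file into a dictionary), one possible approach is to let io.open do the decoding and encoding, so the program only ever handles Unicode text. This is a sketch, not the original poster's code: the file name words.txt, the temporary directory, and storing each word's length as the dictionary value are all made up for illustration.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical file location, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'words.txt')

# Writing: io.open encodes Unicode text to UTF-8 bytes on the way out.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(u'abera\u021bie\n')

# Reading: each line is decoded from UTF-8 to Unicode text on the way in.
words = {}
with io.open(path, 'r', encoding='utf-8') as f:
    for line in f:
        word = line.strip()
        if word:
            words[word] = len(word)  # illustrative value: the word length
```

Note that in Python 2, printing the dictionary itself shows the repr of its keys (u'abera\u021bie'), whereas printing an individual key displays aberație, provided the terminal encoding can represent it.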
