Email parsing - HeaderParseError in Python
I get a HeaderParseError if I try to parse a string with decode_header() in Python 2.6.5 (and 2.7). Here is the repr() of the string:

    '=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?='

This string comes from a MIME email which contains a JPEG picture. Thunderbird can decode the filename (which contains German umlauts).

    >>> from email.header import decode_header
    >>> decode_header('=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?=')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python2.6/email/header.py", line 101, in decode_header
        raise HeaderParseError
    email.errors.HeaderParseError
It seems to be an incompatibility between Python's character set for base64-encoded strings and the mail agent's:

    >>> from email.header import decode_header
    >>> a = '=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?='
    >>> decode_header(a)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.7/email/header.py", line 108, in decode_header
        raise HeaderParseError
    email.errors.HeaderParseError
    >>> a1 = a.replace('_', '/')
    >>> decode_header(a1)
    [('Anmeldung Netzanschluss S\xfcdring3p.jpg', 'iso-8859-1')]
    >>> print _[0][0].decode(_[0][1])
    Anmeldung Netzanschluss Südring3p.jpg

Python uses the character set that the Wikipedia article on Base64 suggests (i.e. 0-9, A-Z, a-z, +, /). The same article lists some alternatives, including ones that use the underscore that's the issue here; however, the underscore's value is ambiguous (it stands for 62 or 63, depending on the alternative). In this header, treating it as 63 (that is, as '/') is what yields the expected "ü".
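The ambiguity over the underscore's value can be checked directly on the payload from the question. The following is a minimal sketch in Python 3 (not the Python 2 of the session above); the variable names are illustrative only. It decodes the payload twice, once reading '_' as value 62 ('+') and once as value 63 ('/').

```python
import base64

# Base64 payload from the header in the question (without the =?...?= wrapper).
payload = "QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw=="

# Interpretation 1: '_' stands for value 62, i.e. '+' in the standard alphabet.
as_62 = base64.b64decode(payload.replace("_", "+")).decode("iso-8859-1")

# Interpretation 2: '_' stands for value 63, i.e. '/' in the standard alphabet.
as_63 = base64.b64decode(payload.replace("_", "/")).decode("iso-8859-1")

print(as_62)  # Anmeldung Netzanschluss Sìdring3p.jpg
print(as_63)  # Anmeldung Netzanschluss Südring3p.jpg
```

Only the second reading produces a plausible German filename, which suggests that this particular mail agent used '_' where the standard alphabet has '/'.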
I don't know whether Python can guess the intentions of b0rken mail agents; I suggest you do some appropriate guessing yourself whenever decode_header fails.
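One possible form of such guessing is sketched below, in Python 3. It assumes the sender used the URL-safe alphabet ('-' for 62, '_' for 63), which is only a guess about the mail agent. The helper names `normalize_b_words`, `decode_header_lenient` and `header_to_text` are hypothetical, not part of the `email` package. Instead of waiting for decode_header to fail, this variant rewrites the payload up front, since '-' and '_' are never valid in the standard alphabet and a decoder that skips unknown characters might not raise at all.

```python
import re
from email.header import decode_header

# Matches one RFC 2047 "B" encoded word: =?charset?B?payload?=
_B_WORD = re.compile(r"=\?([^?]+)\?([bB])\?([^?]*)\?=")

# Assumption: the sender used the URL-safe alphabet, so '-' means '+' and '_' means '/'.
_URLSAFE_TO_STD = str.maketrans("-_", "+/")


def normalize_b_words(header):
    """Map URL-safe characters in every B-encoded payload to the standard alphabet."""
    def fix(match):
        charset, encoding, payload = match.groups()
        return "=?%s?%s?%s?=" % (charset, encoding, payload.translate(_URLSAFE_TO_STD))
    return _B_WORD.sub(fix, header)


def decode_header_lenient(header):
    """Call decode_header() on the normalized header; Q-encoded words are left untouched."""
    return decode_header(normalize_b_words(header))


def header_to_text(header):
    """Join the decoded parts into a single text string."""
    return "".join(
        part.decode(charset or "ascii") if isinstance(part, bytes) else part
        for part, charset in decode_header_lenient(header)
    )


if __name__ == "__main__":
    broken = "=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?="
    print(header_to_text(broken))
```

If the normalized header still cannot be decoded, the HeaderParseError from decode_header propagates unchanged, so the caller can still detect input that is broken in some other way.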
I'm calling the mail agent “broken” because there is no need to escape either + or / in a message header: it's not a URL, so why not use the typical character set?