[Tutor] urllib.urlencode and unicode strings
Kent Johnson
kent37 at tds.net
Fri May 18 01:29:54 CEST 2007
Jon Crump wrote:
> Dear all,
>
> I've got a Python list of data, pulled via ElementTree from a UTF-8 XML
> file (<?xml version="1.0" encoding="utf-8"?>), that contains a mix of str
> and unicode strings, like this:
>
> [u'Jumi\xe9ge, Normandie', 'Farringdon, Hampshire',
> 'Ravensworth, Durham', 'La Suse, Anjou', 'Lions, Normandie',
> 'Lincoln, Lincolnshire', 'Chelmsford, Essex',
> u'Ch\xe2telerault, Poitou', 'Bellencombre, Normandie'] etc.
>
> When I try to use geopy to geocode these place names, I get the following
> traceback:
>
> Traceback (most recent call last):
>   File "<stdin>", line 2, in <module>
>   File "build/bdist.macosx-10.3-fat/egg/geopy/geocoders.py", line 327, in geocode
>   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib.py", line 1242, in urlencode
>     v = quote_plus(str(v))
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe2' in position 2: ordinal not in range(128)
>
> It appears that urlencode is choking on the unicode strings. Can anybody
> tell me how I can translate these strings into something like
> Ch%C3%A2telerault?
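urlencode() fails because it calls str() on each value (that's the
quote_plus(str(v)) line in your traceback), and in Python 2 str() encodes a
unicode string with the default ASCII codec. So it's two steps: first encode
the unicode string to UTF-8 yourself, then URL-quote the resulting bytes: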
>>> c = u'\xe2'
>>> c
u'\xe2'
>>> c.encode('utf-8')
'\xc3\xa2'
>>> import urllib
>>> urllib.quote(c.encode('utf-8'))
'%C3%A2'
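Since it is urlencode() itself that fails, the same fix can be applied to a
whole set of query parameters before they reach it. Here is a minimal sketch;
encode_values is a hypothetical helper, not part of urllib or geopy. In Python 2
a str is already a byte string, so only the unicode values get encoded. The
try/except import is only there so the sketch also runs on Python 3, where
urlencode() already encodes str values as UTF-8 by default.

```python
# -*- coding: utf-8 -*-
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

def encode_values(pairs):
    """Return (key, value) pairs with every text value encoded as UTF-8 bytes.

    encode_values is a hypothetical helper for illustration only.
    """
    result = []
    for key, value in pairs:
        # Byte strings (Python 2 str, Python 3 bytes) are passed through as-is;
        # text (Python 2 unicode, Python 3 str) is encoded to UTF-8 first.
        if not isinstance(value, bytes):
            value = value.encode('utf-8')
        result.append((key, value))
    return result

params = [('q', u'Ch\xe2telerault, Poitou')]
print(urlencode(encode_values(params)))
```

For a dict of parameters, pass its items: urlencode(encode_values(d.items())).
Encoding each place name with name.encode('utf-8') before handing it to geopy
should have the same effect, though that assumes geopy passes the name
unchanged to urlencode().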
Kent