[Tutor] urllib.urlencode and unicode strings

Kent Johnson kent37 at tds.net
Fri May 18 01:29:54 CEST 2007


Jon Crump wrote:
> Dear all,
> 
> I've got a python list of data pulled via ElementTree from an xml file 
> <?xml version="1.0" encoding="utf-8"?> that contains mixed str and unicode 
> strings, like this:
> 
> [u'Jumi\xe9ge, Normandie', 'Farringdon, Hampshire', 'Ravensworth, 
> Durham', 'La Suse, Anjou', 'Lions, Normandie', 'Lincoln, Lincolnshire', 
> 'Chelmsford, Essex', u'Ch\xe2telerault, Poitou', 'Bellencombre, 
> Normandie'] etc.
> 
> trying to use geopy to geocode these placenames I get the following 
> traceback:
> 
> Traceback (most recent call last):
>    File "<stdin>", line 2, in <module>
>    File "build/bdist.macosx-10.3-fat/egg/geopy/geocoders.py", line 327, in 
> geocode
>    File 
> "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib.py", 
> line 1242, in urlencode
>      v = quote_plus(str(v))
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe2' in 
> position 2: ordinal not in range(128)
> 
> It appears that urlencode is choking on the unicode literals. Can anybody 
> tell me how I can translate these strings into something like this: 
> Ch%C3%A2tellerault.

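It's two steps: first encode the unicode string to UTF-8, then URL-quote the
resulting byte string. (urlencode fails because it calls str(v) on each
value, and str() falls back to the ascii codec.) For a single character: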
 >>> c = u'\xe2'
 >>> c
u'\xe2'
 >>> c.encode('utf-8')
'\xc3\xa2'
 >>> import urllib
 >>> urllib.quote(c.encode('utf-8'))
'%C3%A2'
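
The same two steps can be applied to a whole list of mixed str and unicode values. What follows is only a minimal sketch, not geopy's own code: the helper name to_utf8, the shortened places list, and the 'q' parameter name are assumptions made for illustration. The try/except import is there so the snippet also runs on Python 3, where these functions live in urllib.parse.

```python
try:
    from urllib import quote, urlencode        # Python 2
except ImportError:
    from urllib.parse import quote, urlencode  # Python 3


def to_utf8(value):
    """Return value as a UTF-8 byte string (hypothetical helper).

    Byte strings pass through unchanged; unicode strings are encoded.
    """
    if isinstance(value, bytes):
        return value
    return value.encode('utf-8')


# A shortened copy of the list from the question.
places = [u'Jumi\xe9ge, Normandie',
          'Farringdon, Hampshire',
          u'Ch\xe2telerault, Poitou']

# Quote each place name on its own.
quoted = [quote(to_utf8(p)) for p in places]

# Or build a query string; 'q' is an assumed parameter name.
query = urlencode({'q': to_utf8(places[2])})
```

Here quoted[2] should come out as 'Ch%C3%A2telerault%2C%20Poitou'. urlencode quotes with quote_plus, so in query the space becomes '+' instead of '%20'.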

Kent

