Tuesday, March 1, 2011

django.utils.encoding.DjangoUnicodeDecodeError

I got the following error when I tried to add an entry to a Django model via generic relations.

django.utils.encoding.DjangoUnicodeDecodeError: 'utf8' codec can't decode byte 0xb8 in position 24: unexpected code byte. You passed in 'ASL/60Styles_Timeless-3_\xb8 CaLe.asl' (<type 'str'>)

The model is like this:

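from django.contrib.contenttypes import generic
from django.contrib.contenttypes.models import ContentType
from django.db import models

class MD5(models.Model):
    value = models.CharField(max_length=32, db_index=True)
    filename = models.CharField(max_length=100)
    content_type = models.ForeignKey(ContentType)
    object_id = models.PositiveIntegerField()
    content_object = generic.GenericForeignKey()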

The table's charset is utf8 and its collation is utf8_general_ci.

Does this mean that the filename is not a valid UTF-8 string? How can I fix this error, or convert the invalid string to a valid format?

From stackoverflow
  • Your file system is apparently not using UTF-8 encoding:

    >>> a = 'ASL/60Styles_Timeless-3_\xb8 CaLe.asl'
    >>> print a.decode('utf-8')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode
        return codecs.utf_8_decode(input, errors, True)
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xb8 in position 24: unexpected code byte
    >>> a.decode('iso8859-2')
    u'ASL/60Styles_Timeless-3_\xb8 CaLe.asl'
    >>> print a.decode('iso8859-2')
    ASL/60Styles_Timeless-3_¸ CaLe.asl
    

    I only now realized that the byte values in your string may already match the Unicode code points of the intended characters. If so, this gives you a unicode object:

    >>> a.decode('raw_unicode_escape')
    u'ASL/60Styles_Timeless-3_\xb8 CaLe.asl'
    
    jack : It's running on Ubuntu, and the system locale is set to en_US.UTF-8. How can I see the current file system encoding, and how do I set it to UTF-8?
    : I've edited to add a possible solution. Hope this helps.
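
To make the suggestions in the answer concrete, here is a minimal sketch of decoding a filename defensively before handing it to the model, plus a quick way to see which encodings Python reports for the system. The helper name to_text and the fallback order (UTF-8 first, then ISO 8859-2 as in the answer) are assumptions for illustration, not part of Django; the right fallback depends on where the files actually came from. The thread uses Python 2, but the sketch uses a byte literal so that it also runs on Python 3.

```python
import locale
import sys


def to_text(raw, encodings=("utf-8", "iso8859-2")):
    """Return text for a byte string, trying each encoding in order.

    to_text is a hypothetical helper; the default encodings are an
    assumption taken from the answer above.
    """
    if not isinstance(raw, bytes):
        # Already text (unicode on Python 2, str on Python 3).
        return raw
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Only reached if every listed encoding rejected the bytes:
    # keep what decodes as UTF-8 and mark the rest with U+FFFD.
    return raw.decode("utf-8", "replace")


# The byte string from the error message.
raw_name = b"ASL/60Styles_Timeless-3_\xb8 CaLe.asl"
text_name = to_text(raw_name)

# What Python believes the environment uses. On Linux, filenames are
# stored as raw bytes, so these values describe the locale, not a
# guarantee about every file on disk.
print(sys.getfilesystemencoding())
print(locale.getpreferredencoding())
```

A value such as text_name could then be assigned to the filename field without triggering the decode error. Since the locale here is already en_US.UTF-8, the offending name was probably created elsewhere (for example, unpacked from an archive made on another system), so changing the system encoding is unlikely to help; decoding or renaming the affected files is the more direct fix.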
