"If a worker wants to do his job well, he must first sharpen his tools." - Confucius, "The Analects of Confucius. Lu Linggong"
Front page > Programming > Why Does `utf-8` Decoding Fail on `\\xe9` While `latin-1` Succeeds?

Why Does `utf-8` Decoding Fail on `\\xe9` While `latin-1` Succeeds?

Posted on 2025-02-24
Browse:933

Why Does `utf-8` Decoding Fail on `\xe9` While `latin-1` Succeeds?

UnicodeDecodeError: Invalid Continuation Byte

When attempting to decode a string using the "utf-8" codec, the error "UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9..." may arise. This indicates an invalid continuation byte in the string.

In the provided code snippet:

o = "a test of \xe9 char"
v = o.decode("utf-8")

The string "a test of \xe9 char" contains a character represented by the byte \xe9. This byte is not a valid continuation byte in a UTF-8 sequence, so the "utf-8" codec cannot decode it.

However, when using the "latin-1" codec instead, the decoding succeeds:

v = o.decode("latin-1")

This is because the "latin-1" codec interprets \xe9 as a single-byte character, rather than as part of a UTF-8 sequence. Consequently, the string remains a string without encountering the UnicodeDecodeError.

Latest tutorial More>

Disclaimer: All resources provided are partly from the Internet. If there is any infringement of your copyright or other rights and interests, please explain the detailed reasons and provide proof of copyright or rights and interests and then send it to the email: [email protected] We will handle it for you as soon as possible.

Copyright© 2022 湘ICP备2022001581号-3