UnicodeDecodeError: Invalid Continuation Byte
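When decoding a byte string with the "utf-8" codec, the error "UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9..." may arise. The reason given, "invalid continuation byte", means the decoder found a byte that starts a multi-byte UTF-8 sequence, but the byte after it is not a valid continuation byte.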
Consider the following snippet (Python 2, where a plain string literal is a byte string):
o = "a test of \xe9 char"
v = o.decode("utf-8")
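The string "a test of \xe9 char" contains the byte \xe9. In UTF-8, \xe9 (binary 11101001) is the lead byte of a three-byte sequence, so it must be followed by two continuation bytes in the range \x80 to \xbf. Here it is followed by a space (\x20), which is not a continuation byte, so the "utf-8" codec cannot decode it.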
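The failure can be inspected byte by byte. The following is a minimal sketch, assuming Python 3, where the data has to be written as a bytes literal (the original snippet is Python 2); the variable names are illustrative and not part of the original code.

```python
# Assumption: Python 3, so the data is a bytes literal rather than a Python 2 str.
data = b"a test of \xe9 char"

pos = data.index(b"\xe9")      # position of the offending byte
lead = data[pos]               # 0xE9
following = data[pos + 1]      # 0x20, the space after it

# A byte of the form 1110xxxx announces a three-byte UTF-8 sequence.
is_three_byte_lead = (lead & 0xF0) == 0xE0
# A continuation byte must have the form 10xxxxxx (0x80 to 0xBF).
is_continuation = (following & 0xC0) == 0x80

try:
    data.decode("utf-8")
    reason = None
    error_start = None
except UnicodeDecodeError as exc:
    reason = exc.reason        # human-readable cause reported by the codec
    error_start = exc.start    # index of the byte the codec rejected

print(is_three_byte_lead, is_continuation, reason, error_start)
```

The lead byte promises a multi-byte sequence, the next byte does not continue it, and the codec reports the position of the lead byte.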
However, when using the "latin-1" codec instead, the decoding succeeds:
v = o.decode("latin-1")
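This is because the "latin-1" codec maps every byte to the Unicode code point with the same value, so \xe9 is decoded as the single character U+00E9 (é) rather than as part of a UTF-8 sequence. Decoding with "latin-1" therefore never raises UnicodeDecodeError, but it only yields the right text if the data really was encoded as Latin-1.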
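As a hedged illustration of the two ways out, the sketch below (again assuming Python 3 and a bytes literal; the variable names are illustrative) decodes the same data with "latin-1", re-encodes the result as UTF-8, and also shows the errors="replace" option of bytes.decode for cases where the real encoding is unknown.

```python
# Assumption: Python 3; the bytes literal stands in for the Python 2 str.
data = b"a test of \xe9 char"

# Latin-1 maps byte 0xE9 directly to U+00E9, so decoding cannot fail.
text = data.decode("latin-1")

# Re-encoding as UTF-8 turns U+00E9 into the two-byte sequence C3 A9.
utf8_bytes = text.encode("utf-8")

# If the source encoding is unknown, undecodable bytes can be replaced
# with U+FFFD instead of raising an exception.
replaced = data.decode("utf-8", errors="replace")

print(repr(text), utf8_bytes, repr(replaced))
```

Decoding with "latin-1" is only correct when the bytes were produced by a Latin-1 (or compatible) encoder; errors="replace" avoids the exception but loses the original character.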