UnicodeDecodeError
Python · ERROR · Critical · Unicode Error · HIGH confidence

Unicode decoding error

What this means

Raised when a sequence of bytes cannot be decoded into text with the specified codec. This almost always happens when you read bytes (e.g., from a file or network) and interpret them as text using the wrong encoding.

Why it happens
  1. Reading a file saved with one encoding (e.g., `latin-1`) while trying to decode it as another (e.g., `utf-8`).
  2. Receiving binary data from a network socket and trying to decode it as text without knowing the correct encoding.
  3. The data is corrupted and contains invalid byte sequences for the specified encoding.
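
The first cause can be sketched in a few lines. This is a minimal illustration, assuming a sample string `"café"` (not from any real data source): the same text yields different bytes under different encodings, so decoding with the wrong codec can fail.

```python
# The same text produces different bytes under different encodings.
text = "café"  # hypothetical sample text
latin1_bytes = text.encode("latin-1")  # b'caf\xe9'
utf8_bytes = text.encode("utf-8")      # b'caf\xc3\xa9'

# Decoding the Latin-1 bytes as UTF-8 fails: 0xe9 announces a
# multi-byte UTF-8 sequence, but no continuation bytes follow it.
try:
    latin1_bytes.decode("utf-8")
    mismatch_error = None
except UnicodeDecodeError as exc:
    mismatch_error = exc

print(mismatch_error)
```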
How to reproduce

This error is triggered when trying to decode a byte sequence that is not valid UTF-8, using the UTF-8 codec.

trigger — this will error
# 0xff is not a valid start byte in UTF-8
byte_sequence = b'\xff'
try:
    byte_sequence.decode('utf-8')
except UnicodeDecodeError as e:
    print(f"Caught UnicodeDecodeError: {e}")

expected output

Caught UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
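
When diagnosing a failure, it can help to inspect the exception object rather than only printing it. The sketch below uses the standard attributes of `UnicodeDecodeError` (`encoding`, `reason`, `start`, `end`, `object`); the input bytes and the `details` dictionary are illustrative assumptions.

```python
# Collect the diagnostic attributes carried by the exception.
try:
    b"abc\xffdef".decode("utf-8")  # hypothetical input with one bad byte
except UnicodeDecodeError as exc:
    details = {
        "encoding": exc.encoding,                     # codec that failed
        "reason": exc.reason,                         # short description
        "start": exc.start,                           # index of first bad byte
        "end": exc.end,                               # index just past the bad span
        "bad_bytes": exc.object[exc.start:exc.end],   # the offending bytes
    }

print(details)
```

Knowing the offset and the offending bytes often makes it easier to tell a wrong encoding apart from isolated corruption.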

Fix 1

Specify the correct encoding when opening files

WHEN: You know the encoding of the file you are reading.

# If you know the file is encoded in latin-1
try:
    with open('my_file.txt', 'r', encoding='latin-1') as f:
        content = f.read()
except FileNotFoundError:
    print("File not found.")

Why this works

Explicitly providing the correct encoding to `open()` tells Python how to interpret the bytes, preventing decode errors.
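
To see the difference end to end, here is a self-contained sketch that writes a small Latin-1 file to a temporary location and reads it back twice. The sample text and the variable names (`path`, `wrong_failed`, `content`) are assumptions for illustration only.

```python
import os
import tempfile

# Create a temporary file encoded as Latin-1 (hypothetical sample data).
with tempfile.NamedTemporaryFile(
    "w", encoding="latin-1", suffix=".txt", delete=False
) as tmp:
    tmp.write("café\n")
    path = tmp.name

try:
    # Wrong encoding: the byte 0xe9 is not part of a valid UTF-8 sequence here.
    try:
        with open(path, "r", encoding="utf-8") as f:
            f.read()
        wrong_failed = False
    except UnicodeDecodeError:
        wrong_failed = True

    # Correct encoding: the text is read back intact.
    with open(path, "r", encoding="latin-1") as f:
        content = f.read()
finally:
    os.remove(path)  # clean up the temporary file

print(wrong_failed, repr(content))
```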

Fix 2

Provide an error handling strategy

WHEN: A file might contain a few invalid characters that you can afford to ignore or replace.

byte_sequence = b'hello\xffworld'
# 'replace' will insert a placeholder for invalid bytes
text = byte_sequence.decode('utf-8', errors='replace')
print(text)
# 'ignore' will simply discard invalid bytes
text_ignored = byte_sequence.decode('utf-8', errors='ignore')
print(text_ignored)

Why this works

The `.decode()` method's `errors` parameter allows you to specify a policy for handling bytes that can't be decoded, such as replacing them (`'replace'`) or discarding them (`'ignore'`).
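
Besides `'replace'` and `'ignore'`, Python's standard error handlers include `'backslashreplace'` and `'surrogateescape'`, which keep the original byte values recoverable. The comparison below is a small sketch on an assumed sample input.

```python
data = b"hello\xffworld"  # hypothetical input with one invalid byte

# 'replace' substitutes U+FFFD (the replacement character) for each bad byte.
replaced = data.decode("utf-8", errors="replace")

# 'ignore' drops the bad byte entirely, so information is lost silently.
ignored = data.decode("utf-8", errors="ignore")

# 'backslashreplace' keeps a readable escape of the bad byte.
escaped = data.decode("utf-8", errors="backslashreplace")

# 'surrogateescape' maps the bad byte to a lone surrogate, so encoding
# with the same handler restores the original bytes exactly.
smuggled = data.decode("utf-8", errors="surrogateescape")
restored = smuggled.encode("utf-8", errors="surrogateescape")

print(repr(replaced), repr(ignored), repr(escaped), restored == data)
```

Which handler fits depends on whether the invalid bytes need to be preserved: `'ignore'` and `'replace'` are lossy, while `'surrogateescape'` allows a lossless round trip.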

Code examples
Trigger (python)
b"\xff".decode("utf-8")  # UnicodeDecodeError: invalid start byte
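Handle with errors param (python)
data = b"hello\xffworld"  # example input containing an invalid UTF-8 byte
try:
    text = data.decode("utf-8")
except UnicodeDecodeError:
    # Fall back to U+FFFD replacement characters for undecodable bytes
    text = data.decode("utf-8", errors="replace")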
Avoid by specifying encoding (python)
with open("file.txt", "r", encoding="utf-8", errors="replace") as f:
    content = f.read()  # errors="replace" substitutes U+FFFD instead of raising UnicodeDecodeError
What not to do

Guessing encodings one by one until something works

This is unreliable and can lead to silently corrupted text (mojibake). The correct solution is to find out the actual encoding of your data source.
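
A short sketch of why trial-and-error is risky: `latin-1` maps every possible byte to a character, so it never raises, and a guessing loop that tries it first will "succeed" with the wrong text. The candidate list and sample string are assumptions for illustration.

```python
utf8_bytes = "café".encode("utf-8")  # the data is really UTF-8

# A naive guessing loop: take the first codec that does not raise.
guessed_text = None
guessed_encoding = None
for candidate in ["latin-1", "utf-8"]:  # hypothetical candidate order
    try:
        guessed_text = utf8_bytes.decode(candidate)
        guessed_encoding = candidate
        break
    except UnicodeDecodeError:
        continue

# No exception was raised, yet the result is mojibake.
print(guessed_encoding, guessed_text)
```

The loop reports success with `latin-1`, but the decoded text no longer matches the original string, and nothing signals the corruption.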
