Unicode decoding error

Q: How do I fix a Python UnicodeDecodeError?

Specify the correct encoding when opening files: Explicitly providing the correct encoding to `open()` tells Python how to interpret the bytes, preventing decode errors.

Provide an error handling strategy: The `.decode()` method's `errors` parameter allows you to specify a policy for handling bytes that can't be decoded, such as replacing them (`'replace'`) or discarding them (`'ignore'`).

What this means

`UnicodeDecodeError` is raised when a byte sequence cannot be decoded into text with the given codec. It almost always means that bytes (e.g., from a file or network) are being interpreted as text using the wrong encoding.

Why it happens

1. Reading a file saved with one encoding (e.g., `latin-1`) while trying to decode it as another (e.g., `utf-8`).
2. Receiving binary data from a network socket and trying to decode it as text without knowing the correct encoding.
3. The data is corrupted and contains invalid byte sequences for the specified encoding.
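
The first cause can be sketched in a few lines. This is a minimal illustration, assuming a made-up sample string (`"café"`); the variable names are hypothetical and not part of any API.

```python
# Hypothetical sample text; any string with a non-ASCII character would do.
original = "café"

# Encoding with latin-1 turns 'é' into the single byte 0xE9.
latin1_bytes = original.encode("latin-1")

# Decoding those bytes as UTF-8 fails: 0xE9 announces a multi-byte
# sequence, but no continuation bytes follow it.
try:
    latin1_bytes.decode("utf-8")
    mismatch_raised = False
except UnicodeDecodeError:
    mismatch_raised = True

# Decoding with the encoding that produced the bytes restores the text.
recovered = latin1_bytes.decode("latin-1")
print(mismatch_raised, recovered)
```

The bytes themselves are fine here; only the codec chosen for decoding is wrong.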

How to reproduce

This error is triggered when trying to decode a byte sequence that is not valid UTF-8, using the UTF-8 codec.

trigger — this will error

# 0xff is not a valid start byte in UTF-8
byte_sequence = b'\xff'
try:
    byte_sequence.decode('utf-8')
except UnicodeDecodeError as e:
    print(f"Caught UnicodeDecodeError: {e}")

expected output

Caught UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
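
Beyond the message text, the exception object carries attributes that help locate the offending bytes. The sketch below uses a hypothetical input (`b"abc\xffdef"`) and reads the standard `encoding`, `object`, `start`, `end`, and `reason` attributes of `UnicodeDecodeError`.

```python
# Hypothetical input with one invalid byte in the middle.
data = b"abc\xffdef"

try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    # exc.object is the original bytes; start/end delimit the bad span.
    bad_bytes = exc.object[exc.start:exc.end]
    details = (exc.encoding, exc.start, exc.end, bad_bytes, exc.reason)
    print(details)
```

Logging these values alongside the data source is often enough to tell a wrong-encoding problem from a single corrupted byte.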

Fix 1

Specify the correct encoding when opening files

When: You know the encoding of the file you are reading.

# If you know the file is encoded in latin-1
try:
    with open('my_file.txt', 'r', encoding='latin-1') as f:
        content = f.read()
except FileNotFoundError:
    print("File not found.")

Why this works

Explicitly providing the correct encoding to `open()` tells Python how to interpret the bytes, preventing decode errors.
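
To see the difference end to end, the following sketch writes a throwaway `latin-1` file and reads it back twice. The file content (`"señor"`) and all variable names are assumptions made for illustration only.

```python
import os
import tempfile

# Create a temporary latin-1 file so the sketch runs anywhere.
with tempfile.NamedTemporaryFile(
    "w", encoding="latin-1", suffix=".txt", delete=False
) as tmp:
    tmp.write("señor")
    path = tmp.name

try:
    # Wrong encoding: the latin-1 byte for 'ñ' (0xF1) is not valid UTF-8 here.
    try:
        with open(path, "r", encoding="utf-8") as f:
            f.read()
        wrong_encoding_failed = False
    except UnicodeDecodeError:
        wrong_encoding_failed = True

    # Correct encoding: the text comes back intact.
    with open(path, "r", encoding="latin-1") as f:
        content = f.read()
finally:
    os.remove(path)

print(wrong_encoding_failed, content)
```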

Fix 2

Provide an error handling strategy

When: A file might contain a few invalid characters that you can afford to ignore or replace.

byte_sequence = b'hello\xffworld'
# 'replace' will insert a placeholder for invalid bytes
text = byte_sequence.decode('utf-8', errors='replace')
print(text)
# 'ignore' will simply discard invalid bytes
text_ignored = byte_sequence.decode('utf-8', errors='ignore')
print(text_ignored)

Why this works

The `.decode()` method's `errors` parameter allows you to specify a policy for handling bytes that can't be decoded, such as replacing them (`'replace'`) or discarding them (`'ignore'`).
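
`'replace'` and `'ignore'` both lose information. If the invalid bytes need to stay visible or recoverable, two other built-in handlers may be worth considering; this is a brief sketch, not a recommendation for every case.

```python
byte_sequence = b"hello\xffworld"

# 'backslashreplace' keeps the invalid byte visible as an escape sequence:
# the byte 0xFF becomes the four literal characters \xff.
escaped = byte_sequence.decode("utf-8", errors="backslashreplace")
print(escaped)

# 'surrogateescape' maps each invalid byte to a lone surrogate code point,
# so the original bytes can be restored when encoding with the same handler.
smuggled = byte_sequence.decode("utf-8", errors="surrogateescape")
round_tripped = smuggled.encode("utf-8", errors="surrogateescape")
print(round_tripped == byte_sequence)
```

Strings produced with `'surrogateescape'` cannot be encoded with the default strict handler, so they are best kept internal until encoded back with the same handler.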

Code examples

Trigger (python)

b"\xff".decode("utf-8")  # UnicodeDecodeError: invalid start byte

Handle with errors param (python)

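data = b"caf\xe9"  # example bytes that are not valid UTF-8
try:
    text = data.decode("utf-8")
except UnicodeDecodeError:
    # Fall back to substituting U+FFFD for the undecodable bytes
    text = data.decode("utf-8", errors="replace")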

Avoid by specifying encoding (python)

with open("file.txt", "r", encoding="utf-8", errors="replace") as f:
    content = f.read()  # never raises UnicodeDecodeError

What not to do

✕ Guessing encodings one by one until something works

This is unreliable and can lead to silently corrupted text (mojibake). The correct solution is to find out the actual encoding of your data source.
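
One piece of evidence that can sometimes be read from the data itself is a byte order mark (BOM). The helper below, `encoding_from_bom`, is a hypothetical name for a small heuristic: it only recognizes the BOM constants exposed by the standard `codecs` module and returns `None` when no BOM is present, in which case the encoding still has to come from the data source.

```python
import codecs

# Longer BOMs are checked first: the UTF-32 LE BOM begins with the
# same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def encoding_from_bom(data: bytes):
    """Return a codec name suggested by a leading BOM, or None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


sample = "hi".encode("utf-8-sig")  # hypothetical data with a UTF-8 BOM
print(encoding_from_bom(sample))
print(encoding_from_bom(b"plain ascii"))
```

Most files carry no BOM at all, so a `None` result says nothing about the encoding; it only means this particular hint is unavailable.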

Same error in other languages

TextDecoder error in JavaScript, MalformedInputException in Java, utf8.Invalid in Go

At a glance

Platform: Python Built-in Exceptions

Code: UnicodeDecodeError

Class: Unicode Error

Severity: ERROR

Tier: Critical

Confidence: HIGH
