Unicode encoding error
Python raises `UnicodeEncodeError` when a Unicode-related error occurs during encoding. This happens when you try to convert a string into a sequence of bytes using a specific encoding, but the string contains characters that the encoding cannot represent.
- Trying to save a string containing emoji (e.g., '😊') to a file opened with an encoding that doesn't support it, such as `'ascii'`.
- Sending a string with non-ASCII characters over a network protocol that expects `'ascii'` bytes.
- Serializing data with an encoding that covers only a limited character set (for example, `'latin-1'`).
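The first scenario can be reproduced with a minimal sketch, assuming a throwaway temporary directory and a hypothetical file name `note.txt`. Opening the file in text mode with `encoding="ascii"` makes writing the emoji fail with the same exception.

```python
import os
import tempfile

caught = None
with tempfile.TemporaryDirectory() as tmp:
    # "note.txt" is a hypothetical file name used only for this demo.
    path = os.path.join(tmp, "note.txt")
    try:
        # Text mode with encoding="ascii" routes every write through the ASCII codec.
        with open(path, "w", encoding="ascii") as f:
            f.write("Thanks! 😊")
    except UnicodeEncodeError as e:
        caught = e  # keep a reference; `e` is cleared when the except block ends
        print(f"Write failed: {e.reason}")
```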
This error is triggered when trying to encode a string containing a non-ASCII character (like 'é') into raw ASCII bytes.
```python
text = "café"
try:
    text.encode('ascii')
except UnicodeEncodeError as e:
    print(f"Caught UnicodeEncodeError: {e}")
```
Expected output:
```
Caught UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 3: ordinal not in range(128)
```
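To see exactly which character failed, you can inspect the attributes every `UnicodeEncodeError` carries (`encoding`, `object`, `start`, `end`, `reason`). A minimal sketch reusing the same string; the variable name `err` is only illustrative:

```python
text = "café"
try:
    text.encode("ascii")
except UnicodeEncodeError as e:
    err = e  # `e` is unbound after the except block, so keep a reference

# The exception object records which codec failed and where.
print(err.encoding)                   # ascii
print(err.start, err.end)             # 3 4
print(err.object[err.start:err.end])  # é
print(err.reason)                     # ordinal not in range(128)
```

The `start`/`end` slice is what error handlers use to decide what to substitute.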
Fix 1
Use a capable encoding like UTF-8
When to use: you need to save or transmit text that may contain any Unicode character.
```python
text = "café 😊"
# UTF-8 can handle any Unicode character
encoded_text = text.encode('utf-8')
print(encoded_text)  # b'caf\xc3\xa9 \xf0\x9f\x98\x8a'
```
Why this works
UTF-8 is a variable-width encoding designed to represent every character in the Unicode standard, making it the default safe choice for encoding text.
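The "variable-width" part can be checked directly. This short sketch prints how many bytes UTF-8 uses for characters from different ranges and confirms that decoding with the same codec restores the original text:

```python
chars = ["a", "é", "€", "😊"]

# Each character needs a different number of UTF-8 bytes (1 to 4).
for ch in chars:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded!r}")

# Decoding with the same codec restores the original string exactly.
text = "café 😊"
assert text.encode("utf-8").decode("utf-8") == text
```

ASCII characters keep their single-byte form, which is why UTF-8 output is backward compatible with plain ASCII text.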
Fix 2
Provide an error handling strategy
When to use: you must use a limited encoding and need to handle unsupported characters.
```python
text = "café"
# 'replace' will insert a '?' for the unsupported character
ascii_text = text.encode('ascii', errors='replace')
print(ascii_text)  # b'caf?'
# 'ignore' will discard the unsupported character
ascii_text_ignored = text.encode('ascii', errors='ignore')
print(ascii_text_ignored)  # b'caf'
```
Why this works
The `.encode()` method's `errors` parameter lets you define a fallback for characters that can't be encoded, such as replacing them or stripping them from the output.
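Besides `'replace'` and `'ignore'`, Python ships handlers that keep a readable trace of each lost character instead of a bare `?`. A minimal comparison on a string containing both an accented letter and an emoji:

```python
text = "café 😊"
handlers = ["replace", "xmlcharrefreplace", "backslashreplace", "namereplace"]

# Encode the same text with each built-in handler and show the resulting bytes.
results = {h: text.encode("ascii", errors=h) for h in handlers}
for handler, data in results.items():
    print(f"{handler:18} {data!r}")
```

`xmlcharrefreplace` suits HTML/XML output, while `backslashreplace` and `namereplace` are handy for logs where the original character should stay identifiable.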
```python
"café".encode("ascii")  # UnicodeEncodeError: ordinal not in range(128)
```

```python
try:
    data = text.encode("ascii")
except UnicodeEncodeError:
    data = text.encode("ascii", errors="replace")
```

```python
data = text.encode("utf-8")  # supports all Unicode characters
```

✕ Manually stripping out non-ASCII characters before encoding
This can silently corrupt user data (e.g., names, addresses). Using the `errors` parameter is a more explicit and safer way to handle this.
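If a lossy encoding is unavoidable, one way to keep the substitution visible rather than silent is a custom handler registered with `codecs.register_error`. The sketch below is a minimal illustration, assuming a hypothetical handler name `record_and_replace` and a module-level `dropped` list; it replaces each unencodable character with `?` but records what was lost.

```python
import codecs

dropped = []  # characters the handler had to substitute (hypothetical bookkeeping)

def record_and_replace(exc):
    # An error handler receives the exception and must return
    # (replacement, position_to_resume_encoding).
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]  # may cover a run of several characters
    dropped.append(bad)
    return ("?" * len(bad), exc.end)

# "record_and_replace" is a hypothetical name chosen for this sketch.
codecs.register_error("record_and_replace", record_and_replace)

data = "José Müller".encode("ascii", errors="record_and_replace")
print(data)     # b'Jos? M?ller'
print(dropped)  # ['é', 'ü']
```

The caller can then log or reject input when `dropped` is non-empty instead of storing a corrupted name unnoticed.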
Source: `cpython/Objects/unicodeobject.c`
Content generated with AI assistance and reviewed for accuracy. Found an error? hello@errcodes.dev