
CAPEC-80: Using UTF-8 Encoding to Bypass Validation Logic

Abstraction: Detailed
Status: Draft
Likelihood: High
Severity: High

Description

This attack is a specific variation on leveraging alternate encodings to bypass validation logic. The attacker encodes potentially harmful input in UTF-8 and submits it to an application that either does not expect this encoding or does not validate it effectively, which makes input filtering difficult. UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length character encoding for Unicode in which legal characters are one to four bytes long. However, early versions of the UTF-8 specification got some entries wrong; in some cases they permitted overlong characters. UTF-8 encoders are supposed to use the "shortest possible" encoding, but naive decoders may accept encodings that are longer than necessary. For example, the slash "/" (U+002F) is the single byte 0x2F, yet the overlong two-byte sequence 0xC0 0xAF decodes to the same character in a lenient decoder. According to RFC 3629, a particularly subtle form of this attack can be carried out against a parser that performs security-critical validity checks against the UTF-8 encoded form of its input but interprets certain illegal octet sequences as characters.
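To make the overlong-encoding problem concrete, here is a minimal Python sketch, assuming a filter that searches the raw bytes for "../" before decoding, which is the check-then-interpret ordering the description warns about. The functions `naive_utf8_decode`, `naive_path_filter` and `safe_path_check` are hypothetical illustrations, not code from any real product, and the "../" substring test is a deliberately simplified stand-in for real path validation.

```python
"""Sketch: an overlong UTF-8 '/' slips past a byte-level '../' filter.

All function names here are hypothetical illustrations.
"""


def naive_utf8_decode(data: bytes) -> str:
    """Lenient UTF-8 decoder that does NOT reject overlong forms.

    It follows the lead/continuation bit patterns but never checks whether
    the code point actually needed that many bytes, so 0xC0 0xAF and
    0xE0 0x80 0xAF both decode to '/' (U+002F).
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                   # 0xxxxxxx: single byte (ASCII)
            cp, n = b, 0
        elif (b & 0xE0) == 0xC0:       # 110xxxxx: lead of a 2-byte sequence
            cp, n = b & 0x1F, 1
        elif (b & 0xF0) == 0xE0:       # 1110xxxx: lead of a 3-byte sequence
            cp, n = b & 0x0F, 2
        elif (b & 0xF8) == 0xF0:       # 11110xxx: lead of a 4-byte sequence
            cp, n = b & 0x07, 3
        else:
            raise ValueError(f"invalid lead byte 0x{b:02x} at offset {i}")
        if i + n >= len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for k in range(1, n + 1):
            c = data[i + k]
            if (c & 0xC0) != 0x80:     # continuation bytes must be 10xxxxxx
                raise ValueError(f"bad continuation byte at offset {i + k}")
            cp = (cp << 6) | (c & 0x3F)
        # The flaw: no check that a shorter encoding of cp exists.
        out.append(chr(cp))            # chr() still rejects cp > 0x10FFFF
        i += n + 1
    return "".join(out)


def naive_path_filter(raw: bytes) -> bool:
    """Flawed check: looks for '../' in the raw bytes BEFORE decoding."""
    return b"../" not in raw


def safe_path_check(raw: bytes) -> bool:
    """Decode strictly first, then validate the decoded text."""
    try:
        text = raw.decode("utf-8")     # Python's codec rejects overlong forms
    except UnicodeDecodeError:
        return False
    return "../" not in text


if __name__ == "__main__":
    payload = b"..\xc0\xafetc/passwd"  # the '/' after '..' is overlong-encoded
    print(naive_path_filter(payload))   # True: no literal b"../" in the bytes
    print(naive_utf8_decode(payload))   # ../etc/passwd
    print(safe_path_check(payload))     # False: strict decoding fails
```

The design choice in `safe_path_check` is ordering: Python's built-in `bytes.decode("utf-8")` is strict and raises `UnicodeDecodeError` on overlong sequences, so decoding (canonicalizing) first and validating the decoded text second removes the gap between what the filter inspects and what the downstream component interprets.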

Related weaknesses · 9

CWE-173, CWE-172, CWE-180, CWE-181, CWE-73, CWE-74, CWE-20, CWE-697, CWE-692

Related attack patterns · 3

CAPEC-64 (PeerOf), CAPEC-71 (PeerOf), CAPEC-267 (ChildOf)

Exploits · 9

Type | Target | Confidence | Tier
Weakness | Incorrect Behavior Order: Validate Before Filter (CWE-181) | 100% | live
Weakness | Incomplete Denylist to Cross-Site Scripting (CWE-692) | 100% | live
Weakness | Improper Neutralization of Special Elements in Output Used by a Downstream Component ('Injection') (CWE-74) | 100% | live
Weakness | Improper Input Validation (CWE-20) | 100% | live
Weakness | External Control of File Name or Path (CWE-73) | 100% | live
Weakness | Incorrect Comparison (CWE-697) | 100% | live
Weakness | Improper Handling of Alternate Encoding (CWE-173) | 100% | live
Weakness | Incorrect Behavior Order: Validate Before Canonicalize (CWE-180) | 100% | live
Weakness | Encoding Error (CWE-172) | 100% | live

Related by meaning · 6

Nearest entities by semantic similarity across the cs-graph corpus.

CAPEC: Using Unicode Encoding to Bypass Validation Logic
CAPEC: Using Escaped Slashes in Alternate Encoding
CAPEC: Using Slashes and URL Encoding Combined to Bypass Validation Logic
CAPEC: Using Alternative IP Address Encodings
CAPEC: URL Encoding
CAPEC: Double Encoding
Sourced from MITRE CAPEC. Curated by Adam Lundqvist, SQUR.