CAPEC-80: Using UTF-8 Encoding to Bypass Validation Logic
Abstraction
Detailed
Status
Draft
Likelihood
High
Severity
High
Description
This attack is a specific variation on leveraging alternate encodings to bypass validation logic. It encodes potentially harmful input in UTF-8 and submits it to applications that do not expect this encoding or cannot validate it effectively, which makes input filtering difficult. UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length character encoding for Unicode. Legal UTF-8 characters are one to four bytes long. However, early versions of the UTF-8 specification got some entries wrong (in some cases they permitted overlong characters). UTF-8 encoders are supposed to use the "shortest possible" encoding, but naive decoders may accept encodings that are longer than necessary. According to RFC 3629, a particularly subtle form of this attack can be carried out against a parser that performs security-critical validity checks against the UTF-8 encoded form of its input but interprets certain illegal octet sequences as characters.
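The overlong-encoding problem described above can be illustrated with a minimal Python sketch. It is not taken from the CAPEC entry: `naive_decode`, `byte_level_filter`, `canonicalize_then_validate`, and the sample payload are hypothetical names and data chosen for illustration. The sketch assumes a filter that inspects raw bytes for `../` before a lenient decoder runs, so the two-byte sequence `C0 AF` (an overlong form of `/`, `2F`) slips past the check.

```python
def naive_decode(data: bytes) -> str:
    """Deliberately flawed decoder (hypothetical): it handles only 1- and
    2-byte sequences and never checks for overlong forms."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Single-byte (ASCII) character.
            out.append(chr(b))
            i += 1
        elif 0xC0 <= b <= 0xDF and i + 1 < len(data):
            # Two-byte sequence. A correct decoder would reject lead bytes
            # C0 and C1, because the result would fit in one byte.
            out.append(chr(((b & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        else:
            raise ValueError("sequence not supported in this sketch")
    return "".join(out)


def byte_level_filter(data: bytes) -> bool:
    """Hypothetical denylist check run on raw bytes BEFORE decoding."""
    return b"../" not in data


def canonicalize_then_validate(data: bytes) -> bool:
    """Safer order: decode strictly first, then validate the decoded text."""
    try:
        text = data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False  # malformed or overlong input is rejected outright
    return "../" not in text


# C0 AF is an overlong two-byte encoding of "/" (2F).
payload = b"..\xc0\xafetc\xc0\xafpasswd"

print(byte_level_filter(payload))           # the raw-byte filter sees no "../"
print(naive_decode(payload))                # the lenient decoder restores it
print(canonicalize_then_validate(payload))  # strict decoding rejects the input
```

The design point is the ordering: validation that runs before canonicalization examines a different string than the one the application eventually uses, whereas strict decoding followed by validation examines the same one.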
Related weaknesses (9)
| Type | Target | Confidence | Tier |
|---|---|---|---|
| Weakness | Incorrect Behavior Order: Validate Before Filter (CWE-181) | 100% | live |
| Weakness | Incomplete Denylist to Cross-Site Scripting (CWE-692) | 100% | live |
| Weakness | Improper Neutralization of Special Elements in Output Used by a Downstream Component ('Injection') (CWE-74) | 100% | live |
| Weakness | Improper Input Validation (CWE-20) | 100% | live |
| Weakness | External Control of File Name or Path (CWE-73) | 100% | live |
| Weakness | Incorrect Comparison (CWE-697) | 100% | live |
| Weakness | Improper Handling of Alternate Encoding (CWE-173) | 100% | live |
| Weakness | Incorrect Behavior Order: Validate Before Canonicalize (CWE-180) | 100% | live |
| Weakness | Encoding Error (CWE-172) | 100% | live |
Related by meaning (6)
Nearest entities by semantic similarity across the cs-graph corpus.