Recipe 5-3: Normalizing Unicode
This recipe demonstrates how to specify a Unicode code page mapping for use in decoding transaction data.
Ingredients
- OWASP AppSensor
- ModSecurity
  - SecUnicodeMapFile directive
  - SecUnicodeCodePage directive
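Before looking at the evasion technique itself, here is a minimal configuration sketch showing how these pieces fit together; the mapping-file path, code page number, rule ID, and regex are placeholder values to adapt to your own deployment, not settings taken from this recipe. SecUnicodeMapFile loads the best-fit mapping table that ships with ModSecurity, SecUnicodeCodePage selects the target code page, and the urlDecodeUni transformation then decodes %uXXXX sequences and applies that mapping before a rule's pattern is evaluated:

# Load the Unicode best-fit mapping table that ships with ModSecurity
# (the path is an assumption; adjust it to your installation)
SecUnicodeMapFile /etc/modsecurity/unicode.mapping

# Map decoded characters to code page 20127 (US-ASCII)
SecUnicodeCodePage 20127

# Illustrative rule: t:urlDecodeUni decodes %uXXXX sequences and, with
# the mapping loaded above, best-fit maps look-alike characters to
# their ASCII equivalents before the regex runs
SecRule ARGS "@rx <script" \
    "id:953100,phase:2,t:none,t:urlDecodeUni,t:lowercase,block,\
    msg:'XSS attempt using Unicode best-fit evasion'"

With this in place, the full-width and look-alike code points shown later in this recipe collapse to their ASCII equivalents during inspection, so a signature matches the normalized payload rather than the evasive original.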
Best-Fit Mapping
How should an application handle input that is Unicode-encoded using characters outside of the expected set (such as non-ASCII)? This brings up the issue of best-fit mapping, in which an application internally maps such characters to code points that look visually similar. Why is this a security concern? Let’s look at how it can be leveraged as part of a filter evasion technique. The following Unicode-encoded XSS payload uses various code points, including full-width characters:
%u3008scr%u0131pt%u3009%u212fval(%uFF07al%u212Frt(%22XSS%22)%u02C8)
%u2329/scr%u0131pt%u232A
This payload should be correctly Unicode-decoded to this:
〈scrıpt〉ℯval(＇alℯrt("XSS")ˈ)〈/scrıpt〉
This is simply text; a web browser would not treat it as executable code. If the target web application is running classic Microsoft ASP, however, the platform attempts best-fit mapping of these Unicode characters. Here is a short example of some of the mappings ASP makes for the left angle bracket and single quote (tick mark) characters:
〈(0x2329) ~= <(0x3c)
〈(0x3008) ~= <(0x3c)
＜(0xff1c) ~= <(0x3c)
ʹ(0x2b9) ~= '(0x27)
ʼ(0x2bc) ~= '(0x27)
ˈ(0x2c8) ~= '(0x27)
′(0x2032) ~= '(0x27)
＇(0xff07) ~= '(0x27)
With this mapping, ...