Recipe 5-3: Normalizing Unicode
This recipe demonstrates how to specify a Unicode code page for use in decoding transactional data.
Ingredients
  • OWASP AppSensor
    • Unexpected Encoding Used
  • ModSecurity
    • SecUnicodeMapFile directive
    • SecUnicodeCodePage directive
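In a ModSecurity configuration, these two directives are typically set as shown in the following sketch. The mapping-file path here is installation-specific, and 20127 selects the US-ASCII code page; note that ModSecurity 2.7 and later deprecate SecUnicodeCodePage and instead take the code page as a second argument to SecUnicodeMapFile:

# Sketch: enable Unicode-aware decoding. Point the directive at the
# unicode.mapping file shipped with ModSecurity (example path below).
SecUnicodeMapFile /etc/modsecurity/unicode.mapping
# Decode %uXXXX-encoded input against code page 20127 (US-ASCII).
SecUnicodeCodePage 20127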
Best-Fit Mapping
How should an application handle Unicode-encoded input that falls outside the expected character set (for example, non-ASCII characters)? This raises the issue of best-fit mapping, in which an application internally maps a character to a code point that looks visually similar. Why is this a security concern? Let’s look at how best-fit mapping can be leveraged as part of a filter evasion technique. The following Unicode-encoded XSS payload uses various code points, including full-width characters:
%u3008scr%u0131pt%u3009%u212fval(%uFF07al%u212Frt(%22XSS%22)%u02C8)
%u2329/scr%u0131pt%u232A
Correctly Unicode-decoded, this payload yields the following text:
〈scrıpt〉ℯval('alℯrt("XSS")ˈ)〈/scrıpt〉
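The decoding step itself is mechanical. As a minimal Python sketch (the decode_percent_u function name is ours), Microsoft’s nonstandard %uXXXX escapes can be expanded first and the standard %XX escapes second:

import re
from urllib.parse import unquote

def decode_percent_u(s):
    # Expand nonstandard %uXXXX escapes into Unicode characters,
    # then apply standard %XX percent-decoding (e.g., %22 -> ").
    s = re.sub(r'%u([0-9A-Fa-f]{4})',
               lambda m: chr(int(m.group(1), 16)), s)
    return unquote(s)

payload = ('%u3008scr%u0131pt%u3009%u212fval(%uFF07al%u212Frt(%22XSS%22)'
           '%u02C8)%u2329/scr%u0131pt%u232A')
print(decode_percent_u(payload))  # prints the harmless-looking text above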
This is plain text; a web browser would not treat it as executable code. If the target web application runs classic Microsoft ASP, however, the platform attempts best-fit mapping for these Unicode characters. Here is a short sample of the mappings ASP applies for the left angle bracket and apostrophe-like characters:
〈(0x2329) ~= <(0x3c)
〈(0x3008) ~= <(0x3c)
<(0xff1c) ~= <(0x3c)
ʹ(0x2b9) ~= '(0x27)
ʼ(0x2bc) ~= '(0x27)
ˈ(0x2c8) ~= '(0x27)
′(0x2032) ~= '(0x27)
'(0xff07) ~= '(0x27)
With this mapping, the decoded characters are silently converted to their ASCII look-alikes, and the once-harmless text becomes an executable <script> payload. This is precisely why a WAF must normalize Unicode input the same way the backend platform does, which is what the SecUnicodeMapFile and SecUnicodeCodePage directives listed in the Ingredients provide.
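To make the conversion concrete, here is a Python sketch of the translation. The first eight entries come straight from the table above; the right angle bracket, dotless i, and script e entries are assumptions filled in by analogy, since the table shows only a partial sample of the mappings:

# Entries marked "assumed" extend the partial table above by analogy.
BEST_FIT = str.maketrans({
    0x2329: '<', 0x3008: '<', 0xFF1C: '<',   # left angle brackets
    0x02B9: "'", 0x02BC: "'", 0x02C8: "'",   # apostrophe look-alikes
    0x2032: "'", 0xFF07: "'",
    0x232A: '>', 0x3009: '>',                # right angle brackets (assumed)
    0x0131: 'i', 0x212F: 'e',                # dotless i, script e (assumed)
})

decoded = ('\u3008scr\u0131pt\u3009\u212Fval(\uFF07al\u212Frt("XSS")\u02C8)'
           '\u2329/scr\u0131pt\u232A')
print(decoded.translate(BEST_FIT))
# -> <script>eval('alert("XSS")')</script>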
