A couple of unicode issues on PHP and Firefox
by dominee on Oct.19, 2009, under Uvahy
1.- Overlong UTF-8:
As REQUIRED by UNICODE 3.1, and noted in the Unicode Technical Report #36, UTF-8 is forbidden to interpretate a character’s non-shortest form.
2.- Ill formed sequences:
As REQUIRED by UNICODE 3.0, and noted in the Unicode Technical Report #36, if a leading byte is followed by an invalid successor byte, then it should NOT consume it.
3.- Integer overflow:
Unsigned short has a size of 16 bits (2 bytes), that is UNCAPABLE of storing unicode characters of 21 bits, and represented on UTF with 4 bytes (1111 0xxx 10xx xxxx 10xx xxxx 10xx xxxx). PHP attempts to sum a 21 bits value to a 16 bits-size variable, and then makes no checks on the value.
The firefox one
Firefox is supposed to consider the non-shortest form exception (point #1 in the PHP vulnerabilities), and section 3.1 of the Unicode Technical Report #36 but apparently there’s a flaw on it. This is specially problematic for the reasons that an overlong unicode sequence not taken into consideration may allow several types of filter bypasses.
http://sirdarckcat.blogspot.com/2009/10/couple-of-unicode-issues-on-php-and.html