Whitespace characters must always be escaped (JAVA-E1029)
The Unicode standard defines many whitespace characters other than the space character (' '). Using these characters in source code without escaping them can cause unintended behavior, bugs, or even security vulnerabilities.
One documented example is CVE-2021-42574 ("Trojan Source"), in which unescaped Unicode bidirectional control characters make source code look different to a reviewer than it does to the compiler.
Bad Practice
This issue is raised when any whitespace character other than the space character (' ') is used without an escape sequence.
String withATab = "A	B"; // A literal tab (U+0009) sits between A and B.
String withZeroWidthSpace = "abcdef"; // There's a character between abc and def here.
char tabChar = '	'; // A literal tab (U+0009), indistinguishable from a space at a glance.
System.out.println("5678,6776, 4321, USD");
Try selecting the text of the last line in this example; you may notice some strange behavior...
This is due to the Unicode right-to-left override character (U+202E), which forces the characters that follow it to be displayed right-to-left, and the pop directional formatting character (U+202C), which ends the current directional override.
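Because these characters are invisible in most editors, it can help to list the code points a string actually contains. The following is a minimal sketch, not the checker's own implementation: the class name `InvisibleCharFinder` and its methods are hypothetical, and the choice to flag every Unicode format character (category Cf) in addition to non-space whitespace is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports characters that are easy to miss when reading code.
class InvisibleCharFinder {

    /** True for whitespace other than ' ' and for invisible format characters. */
    public static boolean isSuspicious(int codePoint) {
        if (codePoint == ' ') {
            return false; // the plain space is the only whitespace left unflagged
        }
        return Character.isWhitespace(codePoint)
                || Character.isSpaceChar(codePoint)
                // Category Cf covers the zero-width space and the bidi controls
                || Character.getType(codePoint) == Character.FORMAT;
    }

    /** Describes each suspicious code point together with its index in the string. */
    public static List<String> findSuspicious(String text) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isSuspicious(cp)) {
                found.add(String.format("U+%04X at index %d", cp, i));
            }
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return found;
    }

    public static void main(String[] args) {
        // Escapes are used here so the sample itself stays readable.
        String sample = "abc\u200Bdef\u202Eghi";
        findSuspicious(sample).forEach(System.out::println);
    }
}
```

Running `main` reports the zero-width space at index 3 and the right-to-left override at index 7, neither of which would be visible if the string were printed directly.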
Recommended
Always escape whitespace characters which are not spaces.
char goodTab = '\t';
String goodStringWithTab = "A\tB";
String withZeroWidthSpace = "abc\u200Bdef"; // The zero-width space is now visible as an escape.
String bidiText = "\u202E5678, 6776, 4321\u202C, USD"; // U+202E and U+202C written as escapes (placement is illustrative).
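When an existing string value contains such characters, the escaped form can be produced mechanically. The sketch below is one possible approach under stated assumptions: `WhitespaceEscaper` and `escape` are hypothetical names, the helper works on UTF-16 code units (matching how Java's `\u` escapes are written), and it treats Unicode format characters as needing escapes alongside non-space whitespace.

```java
// Hypothetical helper: turns invisible characters into Java escape sequences.
class WhitespaceEscaper {

    /** Rewrites non-space whitespace and format characters as escape sequences. */
    public static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                // Common control characters have short, readable escapes.
                case '\t': out.append("\\t"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\f': out.append("\\f"); break;
                default:
                    boolean invisible = c != ' '
                            && (Character.isWhitespace(c)
                                || Character.isSpaceChar(c)
                                || Character.getType(c) == Character.FORMAT);
                    if (invisible) {
                        // Everything else falls back to a four-digit Unicode escape.
                        out.append(String.format("\\u%04X", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("A\tB"));
        System.out.println(escape("abc\u200Bdef"));
    }
}
```

The output of `escape` is text that can be pasted back into a string literal, so the tab appears as a backslash followed by `t` and the zero-width space as a backslash followed by `u200B`. Ordinary spaces and visible characters pass through unchanged.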
References
- Unicode Technical Reports - Bidirectional Algorithm Spec