Found characters which are made with multiple code points in character class syntax (JS-0036)
Unicode includes characters that are made up of multiple code points.
RegExp character class syntax (/[abc]/) cannot treat a character made up of multiple code points as a single character; such a character is split into its individual code points.
Probably the most important concept about Unicode in JavaScript is to treat strings as sequences of code units, as they really are. Confusion arises when a developer assumes that strings are composed of graphemes (or symbols) and ignores the underlying code unit sequence.
This assumption leads to mistakes when processing strings that contain surrogate pairs or combining character sequences:
- Getting the string length
- Character positioning
- Regular expression matching
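The three pitfalls listed above can be sketched in a few lines. This is a minimal illustration, not part of the rule itself; the variable names are hypothetical, and the sample strings are written with escapes so that the code points are unambiguous.

```javascript
// "👍" (U+1F44D) lies outside the Basic Multilingual Plane,
// so JavaScript stores it as a surrogate pair (two code units).
const thumbsUp = "\u{1F44D}";
// "Á" written as "A" followed by COMBINING ACUTE ACCENT (U+0301).
const aAcute = "A\u0301";

// Getting the string length: `length` counts code units, not symbols.
const thumbsUpLength = thumbsUp.length;          // 2
const aAcuteLength = aAcute.length;              // 2
// Spreading iterates by code point, which is still not "by grapheme".
const thumbsUpCodePoints = [...thumbsUp].length; // 1
const aAcuteCodePoints = [...aAcute].length;     // 2

// Character positioning: indexing returns a single code unit,
// here the lone high surrogate of the pair.
const firstUnit = thumbsUp[0];                   // "\uD83D"
const firstCodePoint = thumbsUp.codePointAt(0);  // 0x1F44D

// Regular expression matching: "." matches one code unit without
// the `u` flag and one code point with it.
const dotWithoutU = /^.$/.test(thumbsUp);        // false
const dotWithU = /^.$/u.test(thumbsUp);          // true
const dotOnCombining = /^.$/u.test(aAcute);      // false (two code points)
```

Note that the `u` flag fixes the surrogate pair case but not the combining sequence, which is still two separate code points.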
For example, ❇️ is made up of ❇ (U+2747) and VARIATION SELECTOR-16 (U+FE0F).
If this character appears in a RegExp character class, the class matches either ❇ (U+2747) or VARIATION SELECTOR-16 (U+FE0F) rather than ❇️ as a whole.
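The behavior described above can be checked directly. The following is a small sketch with hypothetical variable names; the class is written with escapes so that both code points are visible, which is equivalent to placing the literal character inside the brackets.

```javascript
// ❇️ is U+2747 followed by VARIATION SELECTOR-16 (U+FE0F).
const sparkle = "\u2747\uFE0F";

// The class holds two independent members, not one character.
const misleadingClass = /^[\u2747\uFE0F]$/u;

const matchesBase = misleadingClass.test("\u2747");     // true
const matchesSelector = misleadingClass.test("\uFE0F"); // true
// The full sequence is two code points, so a single class cannot match it.
const matchesSparkle = misleadingClass.test(sparkle);   // false
```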
Bad Practice
/^[Á]$/u
/^[❇️]$/u
/^[👶🏻]$/u
/^[🇯🇵]$/u
/^[👨👩👦]$/u
/^[👍]$/
Recommended
/^[abc]$/
/^[👍]$/u
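As a rough sketch of why the last bad example and the last recommended example differ only by the `u` flag, the snippet below builds both patterns from the same string. It also shows one possible way to match a multi-code-point sequence, using a non-capturing group instead of a character class; that alternative is an assumption about how one might rewrite such a pattern, not something this rule prescribes. All names are hypothetical.

```javascript
const thumb = "\u{1F44D}"; // 👍, stored as the surrogate pair \uD83D\uDC4D

// Without `u`, the class contains the two surrogate code units separately.
const classWithoutU = new RegExp(`^[${thumb}]$`);
// With `u`, the class contains the single code point U+1F44D.
const classWithU = new RegExp(`^[${thumb}]$`, "u");

const withoutUMatchesThumb = classWithoutU.test(thumb);    // false
const withoutUMatchesHalf = classWithoutU.test("\uD83D");  // true (lone surrogate)
const withUMatchesThumb = classWithU.test(thumb);          // true
const withUMatchesHalf = classWithU.test("\uD83D");        // false

// 🇯🇵 is two regional indicator symbols: U+1F1EF followed by U+1F1F5.
const flagJP = "\u{1F1EF}\u{1F1F5}";
// A group matches the whole sequence in order, unlike a class.
const flagSequence = /^(?:\u{1F1EF}\u{1F1F5})$/u;

const sequenceMatchesFlag = flagSequence.test(flagJP);       // true
const sequenceMatchesHalf = flagSequence.test("\u{1F1EF}");  // false
```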