
The type used for simple JavaScript strings. JavaScript strings expose characters as UCS2 code units. UCS2 is a fixed-size encoding that supports the Unicode codepoints from U+000000 to U+00FFFF (the Basic Multilingual Plane, or BMP). Displaying larger codepoints is a property of the environment, based on UTF-16 surrogate pairs.

Unicode does not, and will never, assign characters to the codepoints from U+00D800 to U+00DFFF. These reserved codepoints allow UTF-16 to combine code units from 0xd800 to 0xdfff in pairs (called surrogate pairs) to represent codepoints from the supplementary planes. In UTF-16, this merging happens when code units are decoded into codepoints. In UCS2, the code units from 0xd800 to 0xdfff directly produce codepoints in the range from U+00D800 to U+00DFFF. The display may then merge these codepoints into higher codepoints during rendering.

Let's take an example (all numbers are in hexadecimal):

                                        +---+---+---+---+---+---+
Bytes                                   | 00| 41| d8| 34| dd| 1e|
                                        +---+---+---+---+---+---+
UTF-16BE codeunits                      | 0x0041| 0xd834| 0xdd1e|
                                        +-------+-------+-------+
Codepoints (from UTF-16BE)              |  U+41 |   U+01D11E    |
                                        +-------+---------------+
Displayed (from UTF-16BE)               |   A   |       𝄞       |
                                        +-------+-------+-------+
UCS2 codeunits                          | 0x0041| 0xd834| 0xdd1e|
                                        +-------+-------+-------+
Codepoints (from UCS2BE)                |  U+41 | U+D834| U+DD1E|  <- This is what JavaScript sees
                                        +-------+-------+-------+
Displayed (from UCS2BE)                 |   A   |||  <- This is what the user may see
                                        +-------+-------+-------+
Displayed (from UCS2BE with surrogates) |   A   |       𝄞       |  <- This is what the user may see
                                        +-------+---------------+
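The surrogate pair in the table can be derived by hand: subtract 0x10000 from the codepoint, then put the top 10 bits of the remaining 20-bit value into a high surrogate (0xd800 + ...) and the bottom 10 bits into a low surrogate (0xdc00 + ...). The helpers below are a minimal illustrative sketch; `toSurrogatePair` and `fromSurrogatePair` are hypothetical names, not part of this library's API. The last lines use only built-in string methods to show what JavaScript itself reports.

```typescript
// Hypothetical helper: split a supplementary-plane codepoint (U+010000..U+10FFFF)
// into its UTF-16 surrogate pair.
function toSurrogatePair(codepoint: number): [number, number] {
  if (codepoint < 0x10000 || codepoint > 0x10ffff) {
    throw new RangeError(`Not a supplementary-plane codepoint: ${codepoint.toString(16)}`);
  }
  const offset = codepoint - 0x10000;    // 20-bit value
  const high = 0xd800 + (offset >> 10);  // top 10 bits -> high (lead) surrogate
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits -> low (trail) surrogate
  return [high, low];
}

// Hypothetical helper: merge a high/low surrogate pair back into one codepoint.
function fromSurrogatePair(high: number, low: number): number {
  return 0x10000 + ((high - 0xd800) << 10) + (low - 0xdc00);
}

const [hi, lo] = toSurrogatePair(0x1d11e);
console.log(hi.toString(16), lo.toString(16));               // d834 dd1e
console.log(fromSurrogatePair(0xd834, 0xdd1e).toString(16)); // 1d11e

// What JavaScript reports for the same string (U+1D11E MUSICAL SYMBOL G CLEF):
const clef = "\u{1D11E}";
console.log(clef.length);                       // 2
console.log(clef.charCodeAt(0).toString(16));   // d834 (a UCS2 code unit)
console.log(clef.codePointAt(0)!.toString(16)); // 1d11e (a full codepoint)
```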

The most important takeaway is that codepoints outside of the BMP are a property of the display, not of the JavaScript string. This causes several issues:

  • Surrogate halves are exposed as distinct characters: "𝄞".length === 2
  • Unmatched surrogate halves are allowed: "\ud834"
  • Surrogate pairs in the wrong order are allowed: "\udd1e\ud834"
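All three cases are ordinary JavaScript strings, so they can reach a UCS2-based type unchanged. The sketch below reproduces them and adds a hypothetical `isWellFormedUtf16` helper (not part of this library) that detects unmatched or misordered surrogates. Engines implementing ES2024 also offer the built-in `String.prototype.isWellFormed()` for the same check.

```typescript
// Each of these is a valid JavaScript string.
const clef = "\u{1D11E}";        // one codepoint, two code units
const lone = "\ud834";           // unmatched high surrogate
const reversed = "\udd1e\ud834"; // low surrogate before high surrogate

console.log(clef.length, lone.length, reversed.length); // 2 1 2

// Hypothetical helper: true if every surrogate code unit belongs to a correctly
// ordered high/low pair, i.e. the string is well-formed UTF-16.
function isWellFormedUtf16(str: string): boolean {
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: must be followed by a low surrogate.
      const next = str.charCodeAt(i + 1); // NaN past the end of the string
      if (!(next >= 0xdc00 && next <= 0xdfff)) {
        return false;
      }
      i++; // skip the low half of the pair
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      // Low surrogate without a preceding high surrogate.
      return false;
    }
  }
  return true;
}

console.log(isWellFormedUtf16(clef));     // true
console.log(isWellFormedUtf16(lone));     // false
console.log(isWellFormedUtf16(reversed)); // false
```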

If you need to support the full Unicode range by manipulating codepoints instead of UCS2 code units, you may want to use CodepointString or CodepointArray instead of Ucs2String.
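The difference between the UCS2 view and the codepoint view can be observed with built-in JavaScript APIs alone. This sketch does not use CodepointString or CodepointArray, whose APIs are not described on this page; it only shows the codepoint-level view those types are described as providing. Indexing and `length` work on code units, while string iteration (`for...of`, `Array.from`, spread) yields whole codepoints.

```typescript
const text = "A\u{1D11E}";

// UCS2 view: what String.prototype.length and charCodeAt expose.
const codeUnits: string[] = [];
for (let i = 0; i < text.length; i++) {
  codeUnits.push(text.charCodeAt(i).toString(16));
}
console.log(codeUnits.join(" ")); // 41 d834 dd1e

// Codepoint view: string iteration merges well-formed surrogate pairs.
const codepoints = Array.from(text, (ch) => ch.codePointAt(0)!.toString(16));
console.log(codepoints.join(" ")); // 41 1d11e

console.log(text.length, [...text].length); // 3 2
```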

PS: This type does not handle Unicode normalization either. Use CodepointString or CodepointArray if you need it.
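Without normalization, strings that render identically can still compare unequal. The example below uses only the built-in `String.prototype.normalize`; it does not show how CodepointString or CodepointArray perform normalization.

```typescript
const composed = "\u00e9";    // "é" as one codepoint (U+00E9)
const decomposed = "e\u0301"; // "e" + COMBINING ACUTE ACCENT (U+0301)

console.log(composed === decomposed);            // false
console.log(composed.length, decomposed.length); // 1 2

// Normalizing both sides to the same form (here NFC) makes them comparable.
console.log(composed.normalize("NFC") === decomposed.normalize("NFC")); // true
console.log(composed.normalize("NFD").length);                          // 2
```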

Hierarchy

  • Ucs2StringType

Implements

Index

Constructors

constructor

Properties

Private _options

allowUnicodeRegExp

allowUnicodeRegExp: boolean

lowerCase

lowerCase: boolean

maxLength

maxLength: number

Optional minLength

minLength: undefined | number

name

name: Name = name

Optional pattern

pattern: RegExp

trimmed

trimmed: boolean

Methods

Private _applyOptions

  • _applyOptions(): void

clone

  • clone(val: string): string

diff

  • diff(oldVal: string, newVal: string): Diff | undefined

equals

  • equals(val1: string, val2: string): boolean

patch

  • patch(oldVal: string, diff: Diff | undefined): string

read

  • read<R>(reader: Reader<R>, raw: R): string

reverseDiff

  • reverseDiff(diff: Diff | undefined): Diff | undefined

squash

  • squash(diff1: Diff | undefined, diff2: Diff | undefined): Diff | undefined
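The signatures of `diff`, `patch`, `reverseDiff`, and `squash` suggest a small algebra over `Diff` values: `undefined` means "no change", `patch` applies a diff, `reverseDiff` undoes it, and `squash` chains two consecutive diffs. This page does not define `Diff`, so the sketch below assumes, purely for illustration, a diff that stores the old and new strings as a tuple. `StringDiff` and the standalone functions are hypothetical; the real `Diff` type of this class may be different.

```typescript
// Hypothetical diff representation: [oldValue, newValue].
type StringDiff = [string, string];

function diff(oldVal: string, newVal: string): StringDiff | undefined {
  return oldVal === newVal ? undefined : [oldVal, newVal];
}

function patch(oldVal: string, d: StringDiff | undefined): string {
  return d === undefined ? oldVal : d[1];
}

function reverseDiff(d: StringDiff | undefined): StringDiff | undefined {
  return d === undefined ? undefined : [d[1], d[0]];
}

// Combine "a -> b" then "b -> c" into "a -> c" (undefined if a === c).
function squash(d1: StringDiff | undefined, d2: StringDiff | undefined): StringDiff | undefined {
  if (d1 === undefined) return d2;
  if (d2 === undefined) return d1;
  return diff(d1[0], d2[1]);
}

// Properties one would expect any implementation of these signatures to satisfy:
const a = "foo", b = "bar", c = "foo";
console.log(patch(a, diff(a, b)) === b);              // true
console.log(patch(b, reverseDiff(diff(a, b))) === a); // true
console.log(squash(diff(a, b), diff(b, c)));          // undefined
```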

test

  • test(val: string): boolean

testError

  • testError(val: string): Error | undefined
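The page lists the validation-related properties (`maxLength`, `minLength`, `lowerCase`, `trimmed`, `pattern`) but not how `testError` applies them. The sketch below is a hypothesis, not this class's implementation: it assumes lengths are counted in UCS2 code units (`String.prototype.length`, consistent with the type's name), that `trimmed` and `lowerCase` require the value to be unchanged by `trim()` and `toLowerCase()`, and that `test` is derived from `testError`. `Ucs2Options`, `testErrorSketch`, and `testSketch` are hypothetical names, and `allowUnicodeRegExp` is omitted because its exact role is not described here.

```typescript
// Hypothetical option bag mirroring some of the properties listed above.
interface Ucs2Options {
  maxLength: number;
  minLength?: number;
  lowerCase: boolean;
  trimmed: boolean;
  pattern?: RegExp;
}

// Returns an Error for the first failed check, or undefined if val is valid.
function testErrorSketch(options: Ucs2Options, val: string): Error | undefined {
  // Assumption: lengths are measured in UCS2 code units.
  if (val.length > options.maxLength) {
    return new Error(`Expected at most ${options.maxLength} code units, got ${val.length}`);
  }
  if (options.minLength !== undefined && val.length < options.minLength) {
    return new Error(`Expected at least ${options.minLength} code units, got ${val.length}`);
  }
  if (options.trimmed && val !== val.trim()) {
    return new Error("Expected a trimmed string");
  }
  if (options.lowerCase && val !== val.toLowerCase()) {
    return new Error("Expected a lower-case string");
  }
  // Note: a pattern with the g or y flag would make RegExp.prototype.test stateful.
  if (options.pattern !== undefined && !options.pattern.test(val)) {
    return new Error(`Expected a string matching ${options.pattern}`);
  }
  return undefined;
}

// test() expressed in terms of testError().
function testSketch(options: Ucs2Options, val: string): boolean {
  return testErrorSketch(options, val) === undefined;
}

const opts: Ucs2Options = { maxLength: 3, lowerCase: true, trimmed: true };
console.log(testSketch(opts, "abc"));       // true
console.log(testSketch(opts, "\u{1D11E}")); // true (2 code units)
console.log(testSketch(opts, "ABC"));       // false (not lower-case)
console.log(testSketch(opts, " ab"));       // false (not trimmed)
```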

toJSON

write

  • write<W>(writer: Writer<W>, value: string): W

Static fromJSON
