Ucs2StringType

The type used for simple JavaScript strings. JavaScript strings expose characters as UCS2 codeunits. UCS2 is a fixed-size encoding that supports the Unicode codepoints from U+0000 to U+FFFF (the Basic Multilingual Plane, or BMP). Whether larger codepoints are displayed depends on the environment, which may interpret UTF-16 surrogate pairs.

Unicode does not, and never will, assign characters to the codepoints from U+D800 to U+DFFF. These reserved codepoints allow UTF-16 to combine codeunits from 0xd800 to 0xdfff in pairs (called surrogate pairs) to represent codepoints from the supplementary planes. This transformation happens during the transition from codeunits to codepoints in UTF-16. In UCS2, the codeunits from 0xd800 to 0xdfff directly produce codepoints in the range from U+D800 to U+DFFF. The display may then merge these codepoints into higher codepoints during rendering.

Let's take an example (all the numbers are in hexadecimal):

                                        +---+---+---+---+---+---+
Bytes                                   | 00| 41| d8| 34| dd| 1e|
                                        +---+---+---+---+---+---+
UTF-16BE codeunits                      | 0x0041| 0xd834| 0xdd1e|
                                        +-------+-------+-------+
Codepoints (from UTF-16BE)              |  U+41 |   U+01D11E    |
                                        +-------+---------------+
Displayed (from UTF-16BE)               |   A   |       𝄞       |
                                        +-------+-------+-------+
UCS2 codeunits                          | 0x0041| 0xd834| 0xdd1e|
                                        +-------+-------+-------+
Codepoints (from UCS2BE)                |  U+41 | U+D834| U+DD1E|  <- This is what JavaScript sees
                                        +-------+-------+-------+
Displayed (from UCS2BE)                 |   A   |   �   |   �   |  <- This is what the user may see
                                        +-------+-------+-------+
Displayed (from UCS2BE with surrogates) |   A   |       𝄞       |  <- This is what the user may see
                                        +-------+---------------+
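
The two readings in the table can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not part of this library: the helper names toCodeUnits, ucs2CodePoints and utf16CodePoints are hypothetical and exist only for illustration.

```typescript
// Bytes from the table above, read as big-endian 16-bit codeunits.
const exampleBytes: number[] = [0x00, 0x41, 0xd8, 0x34, 0xdd, 0x1e];

// Combine each pair of bytes into one 16-bit codeunit (big-endian).
function toCodeUnits(input: readonly number[]): number[] {
  const units: number[] = [];
  for (let i = 0; i + 1 < input.length; i += 2) {
    units.push((input[i] << 8) | input[i + 1]);
  }
  return units;
}

// UCS2 reading: every codeunit is its own codepoint.
function ucs2CodePoints(units: readonly number[]): number[] {
  return units.slice();
}

// UTF-16 reading: a high surrogate (0xd800..0xdbff) followed by a
// low surrogate (0xdc00..0xdfff) is merged into a single codepoint.
function utf16CodePoints(units: readonly number[]): number[] {
  const result: number[] = [];
  for (let i = 0; i < units.length; i++) {
    const high = units[i];
    const low = i + 1 < units.length ? units[i + 1] : -1;
    if (high >= 0xd800 && high <= 0xdbff && low >= 0xdc00 && low <= 0xdfff) {
      result.push(0x10000 + ((high - 0xd800) << 10) + (low - 0xdc00));
      i++; // The low surrogate has been consumed.
    } else {
      result.push(high);
    }
  }
  return result;
}

const exampleUnits: number[] = toCodeUnits(exampleBytes); // [0x41, 0xd834, 0xdd1e]
const asUcs2: number[] = ucs2CodePoints(exampleUnits);    // [0x41, 0xd834, 0xdd1e]
const asUtf16: number[] = utf16CodePoints(exampleUnits);  // [0x41, 0x1d11e]
```

The same three codeunits yield three codepoints under the UCS2 reading and two under the UTF-16 reading, which matches the rows of the table.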

The most important takeaway is that codepoints outside of the BMP are a property of the display, not of the JavaScript string. This causes several issues:

- Surrogate halves are exposed as distinct characters: "𝄞".length === 2
- Unmatched surrogate halves are allowed: "\ud834"
- Surrogate pairs in the wrong order are allowed: "\udd1e\ud834"
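
These behaviours can be observed directly with plain JavaScript string operations. The sketch below is only an illustration; hasLoneSurrogate is a hypothetical helper and is not provided by this type.

```typescript
// The musical symbol G clef (U+1D11E), written as its two UTF-16 codeunits.
const clef: string = "\ud834\udd1e";

// Each surrogate half is exposed as a separate character.
const clefLength: number = clef.length;      // 2
const highHalf: number = clef.charCodeAt(0); // 0xd834
const lowHalf: number = clef.charCodeAt(1);  // 0xdd1e

// Both of these are accepted as JavaScript strings,
// although neither is well-formed UTF-16.
const unmatched: string = "\ud834";
const wrongOrder: string = "\udd1e\ud834";

// Hypothetical helper: reports whether a string contains a surrogate
// half that is not part of a correctly ordered pair.
function hasLoneSurrogate(value: string): boolean {
  for (let i = 0; i < value.length; i++) {
    const unit = value.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      const next = i + 1 < value.length ? value.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // Well-formed pair: skip the low half.
        continue;
      }
      return true; // High half without a following low half.
    }
    if (unit >= 0xdc00 && unit <= 0xdfff) {
      return true; // Low half without a preceding high half.
    }
  }
  return false;
}

const clefIsBroken: boolean = hasLoneSurrogate(clef);             // false
const unmatchedIsBroken: boolean = hasLoneSurrogate(unmatched);   // true
const wrongOrderIsBroken: boolean = hasLoneSurrogate(wrongOrder); // true
```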

If you need to support the full Unicode range by manipulating codepoints instead of UCS2 character codes, you may want to use CodepointString or CodepointArray instead of Ucs2String.
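
To see why the distinction matters for length checks and slicing, here is a small sketch that splits a string by codepoint rather than by codeunit. The helper splitByCodePoint is hypothetical and only illustrates the idea; it says nothing about how CodepointString or CodepointArray are implemented.

```typescript
// Hypothetical helper: split a string into one substring per codepoint,
// keeping correctly ordered surrogate pairs together.
function splitByCodePoint(value: string): string[] {
  const parts: string[] = [];
  for (let i = 0; i < value.length; i++) {
    const unit = value.charCodeAt(i);
    const next = i + 1 < value.length ? value.charCodeAt(i + 1) : 0;
    if (unit >= 0xd800 && unit <= 0xdbff && next >= 0xdc00 && next <= 0xdfff) {
      parts.push(value.substring(i, i + 2));
      i++; // Skip the low half of the pair.
    } else {
      parts.push(value.charAt(i));
    }
  }
  return parts;
}

const sample: string = "A\ud834\udd1e"; // "A" followed by U+1D11E

// Length counted in UCS2 codeunits versus codepoints.
const unitCount: number = sample.length;                        // 3
const codePointCount: number = splitByCodePoint(sample).length; // 2

// Slicing by codeunits can cut a surrogate pair in half.
const unitSlice: string = sample.slice(0, 2); // "A\ud834"
// Slicing by codepoints keeps the pair intact.
const codePointSlice: string = splitByCodePoint(sample).slice(0, 2).join(""); // "A\ud834\udd1e"
```

Modern runtimes also expose codepoint-aware primitives such as String.prototype.codePointAt and string iteration, which make a manual loop like this unnecessary in practice.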

PS: This type does not deal with Unicode normalization either. Use CodepointString or CodepointArray if you need it.
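
As a reminder of what normalization means here, the sketch below compares two canonically equivalent spellings of the same text. It uses only the standard String.prototype.normalize method from ES2015 and makes no claim about how the codepoint-based types handle normalization internally.

```typescript
// "é" as a single precomposed codepoint (U+00E9).
const composed: string = "\u00e9";
// "é" as "e" followed by a combining acute accent (U+0301).
const decomposed: string = "e\u0301";

// Compared codeunit by codeunit, the two strings differ.
const sameCodeUnits: boolean = composed === decomposed; // false
const composedLength: number = composed.length;         // 1
const decomposedLength: number = decomposed.length;     // 2

// After normalizing both to the same form (NFC here), they compare equal.
const sameAfterNfc: boolean =
  composed.normalize("NFC") === decomposed.normalize("NFC"); // true
```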

Hierarchy

Ucs2StringType

Implements

IoType<string>
VersionedType<string, Diff>

Constructors

constructor

new Ucs2StringType(options: Lazy<Ucs2StringTypeOptions>): Ucs2StringType

- Defined in types/ucs2-string.ts:119
Parameters
- options: Lazy<Ucs2StringTypeOptions>
Returns Ucs2StringType

Properties

Private _options

_options: Lazy<Ucs2StringTypeOptions>

allowUnicodeRegExp

allowUnicodeRegExp: boolean

lowerCase

lowerCase: boolean

maxLength

maxLength: number

Optional minLength

minLength: undefined | number

name

name: Name = name

Optional pattern

pattern: RegExp

trimmed

trimmed: boolean

Methods

Private _applyOptions

_applyOptions(): void

- Defined in types/ucs2-string.ts:253
Returns void

clone

clone(val: string): string

- Defined in types/ucs2-string.ts:228
Parameters
- val: string
Returns string

diff

diff(oldVal: string, newVal: string): Diff | undefined

- Defined in types/ucs2-string.ts:232
Parameters
- oldVal: string
- newVal: string
Returns Diff | undefined

equals

equals(val1: string, val2: string): boolean

- Defined in types/ucs2-string.ts:224
Parameters
- val1: string
- val2: string
Returns boolean

patch

patch(oldVal: string, diff: Diff | undefined): string

- Defined in types/ucs2-string.ts:236
Parameters
- oldVal: string
- diff: Diff | undefined
Returns string

read

read<R>(reader: Reader<R>, raw: R): string

Implementation of VersionedType.read
- Defined in types/ucs2-string.ts:168
Type parameters
- R
Parameters
- reader: Reader<R>
- raw: R
Returns string

reverseDiff

reverseDiff(diff: Diff | undefined): Diff | undefined

- Defined in types/ucs2-string.ts:240
Parameters
- diff: Diff | undefined
Returns Diff | undefined

squash

squash(diff1: Diff | undefined, diff2: Diff | undefined): Diff | undefined

- Defined in types/ucs2-string.ts:244
Parameters
- diff1: Diff | undefined
- diff2: Diff | undefined
Returns Diff | undefined

test

test(val: string): boolean

- Defined in types/ucs2-string.ts:220
Parameters
- val: string
Returns boolean

testError

testError(val: string): Error | undefined

- Defined in types/ucs2-string.ts:185
Parameters
- val: string
Returns Error | undefined

toJSON

toJSON(): Type

- Defined in types/ucs2-string.ts:150
Returns Type

write

write<W>(writer: Writer<W>, value: string): W

- Defined in types/ucs2-string.ts:181
Type parameters
- W
Parameters
- writer: Writer<W>
- value: string
Returns W

Static fromJSON

fromJSON(options: Type): Ucs2StringType

- Defined in types/ucs2-string.ts:134
Parameters
- options: Type
Returns Ucs2StringType
