May 2019
Beginner to intermediate
466 pages
10h 44m
English
In Julia, string literals are encoded using UTF-8. UTF-8 is a variable-width encoding, meaning that not all characters are represented using the same number of bytes. For example, ASCII characters are encoded using a single byte, while other characters can use up to four bytes. This means that not every byte index into a UTF-8 string is a valid index for a character. If you index into a string at such an invalid byte index, Julia throws a StringIndexError. Here is what I mean:
julia> str = "Søren Kierkegaard was a Danish Philosopher"
julia> str[1]
'S': ASCII/Unicode U+0053 (category Lu: Letter, uppercase)
The character at index 1 is retrieved correctly. Index 2 is also a valid index, because it is the first byte of 'ø':
julia> str[2]
'ø': Unicode U+00f8 (category Ll: Letter, ...
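To make the difference between byte indices and character positions concrete, here is a minimal sketch that assumes Julia 1.x and reuses the same string. The variable names (`nbytes`, `nchars`, `threw`, `first_five`) are illustrative only; the functions used are from Julia's Base library.

```julia
str = "Søren Kierkegaard was a Danish Philosopher"

# 'ø' takes two bytes in UTF-8, so the string has one more byte than characters.
nbytes = ncodeunits(str)   # size in bytes (code units)
nchars = length(str)       # number of characters

# isvalid reports whether a byte index is the start of a character.
starts_char_2 = isvalid(str, 2)   # true: first byte of 'ø'
starts_char_3 = isvalid(str, 3)   # false: second byte of 'ø'

# Indexing at byte 3 lands in the middle of 'ø' and throws.
threw = try
    str[3]
    false
catch err
    err isa StringIndexError
end

# nextind returns the next valid index after a given one.
after_o = nextind(str, 2)         # 4, the index of 'r'

# eachindex yields only valid indices, so it skips byte 3.
first_five = collect(eachindex(str))[1:5]

println((nbytes, nchars, starts_char_2, starts_char_3, threw, after_o, first_five))
```

When you need to walk a string character by character, iterating with `for c in str` or over `eachindex(str)` avoids invalid indices entirely, which is usually safer than doing arithmetic on byte indices.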