Building Character
To define your own property, you need to write a subroutine with the
name of the property you want (see Chapter 7). For
security reasons, this subroutine’s (unqualified) name must begin
with either Is or In. The subroutine should be defined in
the package that needs the property (see Chapter 10), which means that if you want to use it in
multiple packages, you’ll either have to import it from a module
(see Chapter 11), or inherit it as a class method
from the package in which it is defined (see Chapter 12).
Once you’ve got that all settled, the subroutine should return
data in the same format as the files in the
PATH_TO_PERLLIB/unicode/Is
directory. That is, just return a list of characters or character
ranges in hexadecimal, one per line. If there is a range, the two
numbers are separated by a tab. Suppose you wanted a property that
would be true if your character is in the range of either of the
Japanese syllabaries, known as hiragana and katakana (together
they’re known as kana). You can just put in the two ranges like
this:
sub InKana {
return <<'END';
3040 309F
30A0 30FF
END
}
Alternatively, you could define it in terms of existing property names:
sub InKana {
return <<'END';
+utf8::InHiragana
+utf8::InKatakana
END
}
You can also do set subtraction using a “-” prefix. Suppose you only wanted the
actual characters, not just the block ranges of characters. You
could weed out all the undefined ones like this:
sub IsKana {
return <<'END';
+utf8::InHiragana
+utf8::InKatakana
...
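To see a user-defined property in use, here is a minimal sketch built on the range-based InKana definition from earlier in this section. It assumes the subroutine lives in the same package as the pattern that refers to it, and it writes the tab separator as an explicit \t so that it cannot be lost when the code is copied. The helper is_kana and the sample characters are hypothetical and serve only as illustration.

```perl
use strict;
use warnings;

# Range-based property from the text: one range per line,
# start and end code points in hexadecimal, separated by a tab.
sub InKana {
    return "3040\t309F\n"
         . "30A0\t30FF\n";
}

# Hypothetical helper: true if the first character of its argument
# falls in either kana range.
sub is_kana {
    my ($char) = @_;
    return $char =~ /^\p{InKana}/ ? 1 : 0;
}

# HIRAGANA LETTER A, KATAKANA LETTER A, and a plain Latin "a".
for my $char ("\x{3042}", "\x{30A2}", "a") {
    printf "U+%04X %s\n", ord($char),
        is_kana($char) ? "is kana" : "is not kana";
}
```

Because the property is looked up by name when the pattern is used, the same \p{InKana} should work unchanged if the subroutine body is swapped for the version defined in terms of existing property names.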