Matching Letters
Problem
You want to see whether a value consists only of alphabetic characters.
Solution
The obvious character class for matching regular letters isn’t good enough in the general case:
if ($var =~ /^[A-Za-z]+$/) {
    # it is purely alphabetic
}
That’s because it doesn’t respect the user’s locale
settings. If you need to match letters with diacritics as well,
use locale and match against a negated character class:
use locale;
if ($var =~ /^[^\W\d_]+$/) {
    print "var is purely alphabetic\n";
}
Discussion
Perl can’t directly express “something alphabetic”
independent of locale, so we have to be more clever. The
\w regular expression notation matches one
alphabetic, numeric, or underscore character. Therefore,
\W matches any character that is not one of those. The negated character
class [^\W\d_] specifies a byte that must not be
a non-alphanumunder, a digit, or an underscore. That leaves us with
nothing but alphabetics, which is what we were looking for.
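The subtraction described above can be checked on plain ASCII input, where the result does not depend on the locale. The following is only a minimal sketch: the helper name is_alphabetic and the sample strings are made up for illustration and are not part of the recipe.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the recipe): true when $s consists only of
# characters matched by [^\W\d_], that is, word characters that are
# neither digits nor underscores.
sub is_alphabetic {
    my ($s) = @_;
    return $s =~ /^[^\W\d_]+$/ ? 1 : 0;
}

# Sample inputs are assumptions chosen to exercise each excluded group.
for my $sample ("silly", "abc123", "snake_case", "random!stuff") {
    printf "%-14s %s\n", $sample,
        is_alphabetic($sample) ? "alphabetic" : "not alphabetic";
}
```

Only the first sample should be reported as alphabetic: the others contain a digit, an underscore, or a non-word character, each of which the negated class rejects.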
Here’s how you’d use this in a program:
use locale;
use POSIX 'locale_h';

# the following locale string might be different on your system
unless (setlocale(LC_ALL, "fr_CA.ISO8859-1")) {
    die "couldn't set locale to French Canadian\n";
}

while (<DATA>) {
    chomp;
    if (/^[^\W\d_]+$/) {
        print "$_: alphabetic\n";
    } else {
        print "$_: line noise\n";
    }
}

__END__
silly
façade
coöperate
niño
Renée
Molière
hæmoglobin
naïve
tschüß
random!stuff#here
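Because the locale string differs between systems, it can help to try a few spellings before giving up. This is a hedged sketch rather than part of the recipe: the helper first_available_locale and the candidate names are assumptions, and the spellings your system accepts may differ.

```perl
use strict;
use warnings;
use POSIX qw(locale_h);

# Hypothetical helper: returns the first name that setlocale() accepts for
# LC_CTYPE, or nothing if every candidate is rejected. As a side effect,
# the accepted locale stays in force for the rest of the program.
sub first_available_locale {
    my @names = @_;
    for my $name (@names) {
        return $name if setlocale(LC_CTYPE, $name);
    }
    return;
}

# Candidate spellings are assumptions; adjust them for your platform.
my $chosen = first_available_locale("fr_CA.ISO8859-1", "fr_CA.iso88591", "fr_CA");

if (defined $chosen) {
    print "using locale $chosen\n";
} else {
    # Calling setlocale() with only a category reports the current locale.
    print "no candidate accepted; still in ", setlocale(LC_CTYPE), "\n";
}
```

Only LC_CTYPE is set here because character classification is all the pattern match depends on; the recipe itself sets LC_ALL.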
See Also
The treatment of locales in Perl in perllocale(1); your system’s
locale(3) manpage; we discuss locales in greater ...