Unicode::Semantics is a workaround for the Perl 5 Unicode bug. Although the internal encoding of a string is hidden from the Perl programmer, it does unfortunately affect semantics. Perl uses Unicode semantics when the internal encoding for a string is UTF8, but it uses ASCII semantics when the internal encoding is ISO-8859-1.
Because you shouldn't (and often don't) know what the internal encoding will be, it's hard to predict whether operations such as case mapping and regex matching will actually do what you want. Unicode::Semantics::us() gives you predictable results for your string.
Normally, the non-ASCII part of the character set is ignored for the following operations on a string whose internal encoding is ISO-8859-1:
* uc, lc, ucfirst, lcfirst, \U, \L, \u, \l
* \d, \s, \w, \D, \S, \W
* /.../i, (?i:...)
* /[[:posix:]]/
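The difference can be seen with a short script. This is a minimal sketch that uses only the built-in utf8::upgrade, so it runs without the module installed; the variable names are illustrative. It assumes a Perl where the unicode_strings feature is not enabled (that feature, which "use v5.12" and later switch on, changes the behaviour shown here).

```perl
use strict;
use warnings;
# Deliberately no "use feature 'unicode_strings'" and no "use v5.12" or later.

my $native   = "\xe9";       # e-acute; internal encoding is ISO-8859-1
my $upgraded = "\xe9";
utf8::upgrade($upgraded);    # same character; internal encoding is now UTF8

# The two strings hold the same character and compare equal...
my $same = $native eq $upgraded ? "yes" : "no";

# ...but case mapping and \w treat them differently.
my $uc_native   = sprintf "%02X", ord(uc($native));
my $uc_upgraded = sprintf "%02X", ord(uc($upgraded));
my $w_native    = $native   =~ /\w/ ? "match" : "no match";
my $w_upgraded  = $upgraded =~ /\w/ ? "match" : "no match";

print "strings equal: $same\n";
print "uc, native:    $uc_native\n";     # unchanged code point
print "uc, upgraded:  $uc_upgraded\n";   # upper-cased code point
print "\\w, native:    $w_native\n";
print "\\w, upgraded:  $w_upgraded\n";
```

Only the internal encoding differs between the two variables, which is why the results are hard to predict from the string's contents alone.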
This module exports us, which upgrades your string to UTF-8 internally and returns the string. An alias, up, is also exported by default. After initially releasing the module with us, I changed my mind and started liking up better.
You can also use the built-in function utf8::upgrade, which upgrades the string and returns the number of octets used for the internal UTF8 buffer.
Non-string values (like numbers, references, objects, and undef) are stringified on upgrade.
us, up, and utf8::upgrade mutate the variable's actual value. If you need to upgrade only a copy of a string, make the copy first:
up(my $copy = $original);
Upgrading an already upgraded variable does not re-upgrade, so it is safe.
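The copy-first idiom and the safety of repeated upgrades can be checked with the built-in utf8::upgrade and utf8::is_utf8. This is a small sketch under the assumption that us and up mutate their argument the same way utf8::upgrade does, as described above; the variable names are made up for the example.

```perl
use strict;
use warnings;

my $original = "\xe9";    # one character, ISO-8859-1 internally

# Copy first, then upgrade only the copy. The return value is the
# number of octets in the internal UTF8 buffer.
my $octets = utf8::upgrade(my $copy = $original);

my $original_flag = utf8::is_utf8($original) ? "upgraded" : "not upgraded";
my $copy_flag     = utf8::is_utf8($copy)     ? "upgraded" : "not upgraded";

# Upgrading again leaves the string as it is.
my $again = utf8::upgrade($copy);

print "octets:   $octets\n";
print "original: $original_flag\n";
print "copy:     $copy_flag\n";
print "again:    $again\n";
print "same text: ", ($copy eq $original ? "yes" : "no"), "\n";
```

The original keeps its internal encoding because the assignment produced a separate scalar before the upgrade ran.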
SYNOPSIS
$foo; # could be anything
up $foo; # force Unicode semantics
or:
up($foo) =~ s/\W/_/g; # Upgrade and use immediately
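To illustrate what the substitution in the synopsis gains from the upgrade, here is a sketch that performs the same replacement with and without upgrading first. It uses utf8::upgrade in place of up so that it runs without the module, and the sample string and the show helper are assumptions made for the example. As before, it assumes the unicode_strings feature is not enabled.

```perl
use strict;
use warnings;
# Deliberately no "use feature 'unicode_strings'" and no "use v5.12" or later.

# Hypothetical helper: render non-printable and non-ASCII characters as \xNN.
sub show {
    my ($s) = @_;
    $s =~ s/([^\x20-\x7E])/sprintf('\\x%02X', ord $1)/ge;
    return $s;
}

my $foo = "caf\xe9 au lait";    # ISO-8859-1 internally

# Without upgrading, the e-acute is not seen as a word character.
(my $plain_result = $foo) =~ s/\W/_/g;

# With upgrading, Unicode semantics apply and only the spaces are replaced.
my $fixed = $foo;
utf8::upgrade($fixed);
$fixed =~ s/\W/_/g;

print "not upgraded: ", show($plain_result), "\n";
print "upgraded:     ", show($fixed), "\n";
```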
Requirements:
· Perl