URI::Escape::XS is a drop-in replacement for URI::Escape.
# use it instead of URI::Escape
use URI::Escape::XS qw/uri_escape uri_unescape/;
$safe = uri_escape("10% is enough\n");
$verysafe = uri_escape("foo", "\0-\377");
$str = uri_unescape($safe);
# or use encodeURIComponent and decodeURIComponent
$safe = encodeURIComponent("10% is enough\n");
$str = decodeURIComponent("10%25%20is%20enough%0A");
$uri = encodeURIComponent("http://www.example.com/");
Note you cannot customize characters to escape. If you need to do so, use "uri_escape".
$str = decodeURIComponent("http%3A%2F%2Fwww.example.com%2F");
It decodes not only %HH sequences but also %uHHHH sequences, with surrogate pairs correctly decoded.
$str = decodeURIComponent("%uD869%uDEB2%u5F3E%u0061");
This function UNCONDITIONALLY returns the decoded string with the utf8 flag off. To get a utf8-decoded string, decode the result explicitly with Encode.
This is the correct behavior because you cannot tell whether the decoded bytes are actually UTF-8; they could just as well be in another encoding such as ISO-8859-1 or Shift_JIS.
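As a minimal sketch of that explicit decoding step, the snippet below uses only the core Encode module. The variable $bytes is a hand-written stand-in for the kind of value the text says decodeURIComponent returns (UTF-8 bytes with the utf8 flag off); it is an assumption for illustration, not output captured from this module.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Assumed stand-in for a decoded URI component: the three UTF-8 bytes
# of U+5F3E, held in a plain byte string (utf8 flag off).
my $bytes = "\xE5\xBC\xBE";

# Explicitly interpret the bytes as UTF-8 to get a Perl character string.
my $chars = decode_utf8($bytes);

printf "bytes: %d, flag %s\n", length($bytes), utf8::is_utf8($bytes) ? "on" : "off";
printf "chars: %d, flag %s\n", length($chars), utf8::is_utf8($chars) ? "on" : "off";
# bytes: 3, flag off
# chars: 1, flag on
```

If the bytes were in ISO-8859-1 or Shift_JIS instead, Encode::decode() with the matching encoding name would be the call to reach for, which is why the decoding is left to the caller.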
uri_escape does exactly the same as URI::Escape::uri_escape() except when a utf8-flagged string is fed.
URI::Escape::uri_escape() croaks and urges you to use uri_escape_utf8(), but that is pointless because a URI itself has no such thing as a utf8 flag. The function in this module ALWAYS TREATS the string as a byte sequence. That way you can safely use this function without worrying about utf8 flags.
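To illustrate what treating a string as a byte sequence means, here is a pure-Perl sketch. It is not this module's XS implementation: it assumes the bytes in question are the UTF-8 encoding of the string, and the set of characters left unescaped (letters, digits, "-", "_", ".", "~") is chosen for the example.

```perl
use strict;
use warnings;

# A character string holding U+5F3E (utf8 flag on).
my $string = "\x{5F3E}";

# Work on a copy reduced to its UTF-8 bytes, so only octets remain.
my $bytes = $string;
utf8::encode($bytes);

# Percent-escape every octet outside the assumed unreserved set.
(my $escaped = $bytes) =~ s/([^A-Za-z0-9\-_.~])/sprintf("%%%02X", ord($1))/ge;

print "$escaped\n";   # %E5%BC%BE
```

Once the input is reduced to octets, the escaping step never needs to consult the utf8 flag.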
Note this function is NOT EXPORTED by default. That way you can use URI::Escape and URI::Escape::XS simultaneously.
uri_unescape does exactly the same as URI::Escape::uri_unescape() except when %uHHHH is fed.
URI::Escape::uri_unescape() simply ignores %uHHHH sequences, while the function in this module decodes them into the corresponding UTF-8 byte sequence.
Like uri_escape, this function is NOT EXPORTED by default.
Note on the %uHHHH sequence
With this module the resulting strings never have the utf8 flag on. So if you want to decode them to perl utf8, you have to explicitly decode via Encode. Remember: URIs have always been byte sequences, not UTF-8 characters.
If the %uHHHH sequence had become standard, you could have safely told whether a given URI is in Unicode. But more fortunately than unfortunately, the RFC proposal was rejected, so you can't tell which encoding is used just by looking at the URI.
I said fortunately because %uHHHH can be nasty for non-BMP characters. Since each %uHHHH can hold only one 16-bit value, a character at U+10000 or above needs a surrogate pair to represent it.
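The surrogate-pair arithmetic can be sketched in pure Perl, using the pair %uD869%uDEB2 from the example earlier in this document. The helper name combine_surrogates is hypothetical and is not part of this module; it applies the standard UTF-16 formula.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical helper: combine a high and a low surrogate into one
# code point using the standard UTF-16 formula.
sub combine_surrogates {
    my ($hi, $lo) = @_;
    die "not a surrogate pair\n"
        unless $hi >= 0xD800 && $hi <= 0xDBFF
            && $lo >= 0xDC00 && $lo <= 0xDFFF;
    return 0x10000 + (($hi - 0xD800) << 10) + ($lo - 0xDC00);
}

my $code_point = combine_surrogates(0xD869, 0xDEB2);
my $utf8_bytes = encode_utf8(chr($code_point));

printf "U+%X\n", $code_point;                  # U+2A6B2
printf "%d UTF-8 bytes\n", length($utf8_bytes); # 4 UTF-8 bytes
```

So two %uHHHH escapes collapse into a single non-BMP character, which then occupies four bytes in UTF-8.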
In spite of that, there are a significant number of URIs with %uHHHH escapes. Therefore this module supports decoding only.
What's New in This Release:
· The fix in RT#39135 did not address %XX reported by @kazuho via twitter DM