Softpedia
 



    Unicode::Regex::Set 0.02

    Developer: SADAHIRO Tomoyuki
    License / Price: Perl Artistic License / FREE
    Last Updated: July 11th, 2007, 18:05 GMT
    Category: ROOT / Programming / Libraries


    Unicode::Regex::Set description

    Unicode::Regex::Set is a Perl module that provides subtraction and intersection of character sets in Unicode regular expressions.

    SYNOPSIS

    use Unicode::Regex::Set qw(parse);

    $regex = parse('[\p{Latin} & \p{L&} - A-Z]');

    Perl 5.8.0 lacks subtraction and intersection of character sets, which are described in Unicode Regular Expressions (UTS #18). This module provides a syntax that mimics character classes with subtraction and intersection, taking advantage of look-ahead assertions.
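
    The look-ahead idea can be sketched by hand in plain Perl, without the module. The pattern below is a hand-written stand-in for the synopsis set, not the regex that parse() actually returns (that output is not shown in this description), and the helper name in_set is hypothetical.

```perl
use strict;
use warnings;

# Hand-written stand-in for the set [\p{Latin} & \p{L&} - A-Z].
# Each look-ahead tests the next character without consuming it;
# the final \p{Latin} consumes that character only if every test passed.
my $set = qr/(?=\p{L&})(?![A-Z])\p{Latin}/;

# Hypothetical helper: true if the string is exactly one character in the set.
sub in_set {
    my ($char) = @_;
    return $char =~ /\A${set}\z/ ? 1 : 0;
}

# Labels are printed instead of the characters to keep the output ASCII.
my @samples = (
    [ 'a',      'a' ],          # Latin, cased, not in A-Z
    [ 'A',      'A' ],          # Latin, cased, but subtracted by "- A-Z"
    [ '1',      '1' ],          # not a letter at all
    [ 'U+0101', "\x{101}" ],    # LATIN SMALL LETTER A WITH MACRON
    [ 'U+03B1', "\x{3B1}" ],    # GREEK SMALL LETTER ALPHA
);

printf "%-8s %s\n", $_->[0], (in_set($_->[1]) ? 'in set' : 'not in set')
    for @samples;
```

    A positive look-ahead (?=...) acts as an intersection and a negative look-ahead (?!...) acts as a subtraction, because both constrain the same single character that the final class consumes.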

    The syntax provided by this module is considerably incompatible with standard Perl regex syntax.

    Any whitespace character (one that matches /\s/) is allowed between any tokens. Square brackets ('[' and ']') are used for grouping. Literal whitespace and literal square brackets must be escaped with a backslash ('\'). You cannot put a literal ']' at the start of a group.

    A POSIX-style character class like [:alpha:] is allowed, since its '[' is not a literal.
    Separators ('&' for intersection, '|' for union, and '-' for subtraction) should be surrounded by one or more whitespace characters. E.g. [A&Z] is a list of 'A', '&', 'Z'. [A-Z] is a character range from 'A' to 'Z'. [A-Z - Z] is the set obtained by removing [Z] from [A-Z].
    The union operator '|' may be omitted. E.g. [A-Z | a-z] is equivalent to [A-Z a-z], and also to [A-Za-z].
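
    To make the role of the surrounding whitespace concrete, here is a small sketch in plain Perl. The patterns are hand-written stand-ins that follow the semantics described above; they are not produced by the module, and the helper name members is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: list the printable ASCII characters that a
# single-character pattern matches, so two sets can be compared by eye.
sub members {
    my ($re) = @_;
    return join '', grep { /\A${re}\z/ } map { chr($_) } 0x20 .. 0x7E;
}

# Plain-Perl stand-ins for the module's bracket expressions
# (assumed semantics, taken from the description in the text).
my %stand_in = (
    '[A&Z]'       => qr/[A&Z]/,         # no spaces: a list of three literals
    '[A-Z]'       => qr/[A-Z]/,         # no spaces: a character range
    '[A-Z - Z]'   => qr/(?![Z])[A-Z]/,  # spaced '-': subtraction
    '[A-Z | a-z]' => qr/[A-Z]|[a-z]/,   # spaced '|': union
    '[A-Z a-z]'   => qr/[A-Za-z]/,      # union with '|' omitted
);

print "$_ => ", members($stand_in{$_}), "\n" for sort keys %stand_in;
```

    The first and third entries differ only in whitespace around the operator character, which is exactly what decides between a literal and a set operation.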

    The intersection operator '&' has high precedence, so [\p{A} \p{B} & \p{C} \p{D}] is equivalent to [\p{A} | [\p{B} & \p{C}] | \p{D}].

    The subtraction operator '-' has low precedence, so [\p{A} \p{B} - \p{C} \p{D}] is equivalent to [[\p{A} | \p{B}] - [\p{C} | \p{D}] ].

    [\p{A} - \p{B} - \p{C}] is the set obtained by removing \p{B} and \p{C} from \p{A}. It is equivalent to [\p{A} - [\p{B} \p{C}]] and [\p{A} - \p{B} \p{C}].
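
    The precedence rules can be checked on small concrete sets. In this sketch the lowercase ASCII classes $A, $B, $C, and $D are illustrative stand-ins for \p{A} through \p{D}, the look-ahead patterns are hand-written rather than generated by the module, and members is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: the lowercase ASCII letters a pattern matches.
sub members {
    my ($re) = @_;
    return join '', grep { /\A${re}\z/ } 'a' .. 'z';
}

# Illustrative sets standing in for \p{A}, \p{B}, \p{C}, \p{D}.
my ($A, $B, $C, $D) = (qr/[ab]/, qr/[c-f]/, qr/[e-h]/, qr/[z]/);

# [\p{A} \p{B} & \p{C} \p{D}]: '&' binds tighter than the implicit union,
# so only B and C are intersected.
my $tight_and = qr/$A|(?=$B)$C|$D/;

# [\p{A} \p{B} - \p{C} \p{D}]: '-' binds looser than the implicit union,
# so the whole right-hand union is removed from the whole left-hand union.
my $loose_minus = qr/(?!$C|$D)(?:$A|$B)/;

# [[a-h] - \p{B} - \p{C}]: chained subtraction removes both B and C.
my $chained = qr/(?!$B)(?!$C)[a-h]/;

print "A | (B & C) | D : ", members($tight_and),   "\n";
print "(A | B) - (C|D) : ", members($loose_minus), "\n";
print "[a-h] - B - C   : ", members($chained),     "\n";
```

    Reading the first expression the other way, as (A | B) & (C | D), would leave only the letters shared by [c-f] and [e-h], so the two readings are easy to tell apart on these sets.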

    Negation: when '^' comes just after a group-opening '[', i.e. when they are combined as '[^', all the tokens following are negated. E.g. [^A-Z a-z] matches any character that is in neither [A-Z] nor [a-z]. You can say this more clearly with grouping, as [^ [A-Z a-z]].

    If a '^' that is not next to '[' is prefixed to a sequence of literal characters, character ranges, and/or metacharacters, that '^' negates only that sequence; e.g. [A-Z ^\p{Latin}] matches A-Z or a non-Latin character. But [A-Z [^\p{Latin}]] (or [A-Z \P{Latin}], since this is a simple case) is recommended for clarity.

    To remove everything other than P, E, R, and L from [A-Z], you can use either [A-Z & PERL] or [A-Z - [^PERL]]. Similarly, to intersect [A-Z] with everything that is not J, U, N, or K, you can use either [A-Z - JUNK] or [A-Z & [^JUNK]].
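
    These equivalences between intersection, subtraction, and negation can be verified with hand-written look-ahead patterns in plain Perl. As before, this is only a sketch of the described semantics, not the module's generated output, and members is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: the printable ASCII characters a pattern matches.
sub members {
    my ($re) = @_;
    return join '', grep { /\A${re}\z/ } map { chr($_) } 0x20 .. 0x7E;
}

my $upper = qr/[A-Z]/;

# [A-Z & PERL] and [A-Z - [^PERL]] describe the same set.
my $and_perl       = qr/(?=[PERL])$upper/;
my $minus_not_perl = qr/(?![^PERL])$upper/;

# [A-Z - JUNK] and [A-Z & [^JUNK]] describe the same set.
my $minus_junk   = qr/(?![JUNK])$upper/;
my $and_not_junk = qr/(?=[^JUNK])$upper/;

# [A-Z ^\p{Latin}]: the inner '^' negates only \p{Latin},
# so the set is "A-Z, or any non-Latin character".
my $upper_or_non_latin = qr/[A-Z]|\P{Latin}/;

print "[A-Z & PERL]     : ", members($and_perl),       "\n";
print "[A-Z - [^PERL]]  : ", members($minus_not_perl), "\n";
print "[A-Z - JUNK]     : ", members($minus_junk),     "\n";
print "[A-Z & [^JUNK]]  : ", members($and_not_junk),   "\n";
print "[A-Z ^\\p{Latin}] : ", members($upper_or_non_latin), "\n";
```

    Subtracting a negated set is the same as intersecting with the set itself, which is why each pair prints identical members.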


    Requirements:

    · Perl

      

