Softpedia
 



    Jericho HTML Parser 3.3

    Downloads: 1,336
    User Rating: Fair (2.5/5), rated by 17 user(s)
    Developer: Martin Jericho
    License / Price: LGPL / FREE
    Last Updated: November 1st, 2012, 20:03 GMT
    Category: ROOT / Text Editing&Processing / Markup

    Jericho HTML Parser description

    A simple but powerful Java library

    Jericho HTML Parser is an open source library written entirely in Java, designed to be simple yet powerful.

    It allows programmers to analyse and manipulate parts of an HTML document.

    Jericho HTML Parser also incorporates high-level HTML form manipulation functions.


    Requirements:

    · Java 2 Standard Edition Runtime Environment

    What's New in This Release:

    Bug Fixes:
    · [3581664] CharacterReference.decode() does not decode entities whose names contain digits: &frac12; &frac14; &frac34; &sup1; &sup2; &sup3; &there4; (½ ¼ ¾ ¹ ² ³ ∴)
    · [3311286] SourceCompactor does not respect TEXTAREA
    · [3519131] Renderer output incorrect when constructed with an Element object.
    · [3538829] Renderer output of font decoration on block boundaries incorrect.
    · Segment.getAllStartTags(name) and Segment.getFirstElement(name) do not work if the argument contains upper case characters.
    · The end delimiter of a common server tag inside an escaped server tag is falsely recognised as the end delimiter of the escaped tag.
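
    To illustrate the first fix in the list above (entities whose names contain digits), here is a minimal standalone sketch in plain Java. It does not use Jericho and is not the library's implementation; the class name, the lookup table and the regular expression are assumptions made for illustration. The point is only that the name pattern must accept digits after the first letter, otherwise names such as frac12 or there4 are never matched.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; this is NOT Jericho's CharacterReference code.
final class EntityDecodeSketch {
    // Hypothetical lookup table covering only the entities named in the bug report.
    private static final Map<String, Integer> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("frac12", 0x00BD);
        ENTITIES.put("frac14", 0x00BC);
        ENTITIES.put("frac34", 0x00BE);
        ENTITIES.put("sup1", 0x00B9);
        ENTITIES.put("sup2", 0x00B2);
        ENTITIES.put("sup3", 0x00B3);
        ENTITIES.put("there4", 0x2234);
    }

    // An entity name starts with a letter and may continue with letters or digits.
    // A letters-only pattern such as "&([A-Za-z]+);" would miss every name above.
    private static final Pattern ENTITY = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    // Replaces known entity references with their characters; unknown ones are left as-is.
    static String decode(String text) {
        Matcher m = ENTITY.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Integer codePoint = ENTITIES.get(m.group(1));
            String replacement = (codePoint == null)
                ? m.group()
                : new String(Character.toChars(codePoint));
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

    Calling EntityDecodeSketch.decode("&frac12; cup") in this sketch yields "½ cup"; references missing from the small table, such as &amp;, pass through unchanged.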

    CHANGES THAT COULD AFFECT THE BEHAVIOUR OF EXISTING PROGRAMS:
    · [3427073] Segment.getStyleURISegments() now includes style element content as well as style attribute values.
    · [3427927] Segment.getURIAttributes() now includes the archive attributes of object and applet elements.
    · Comments no longer recognised inside script elements during full sequential parse. Previously they were recognised for compatibility with major browsers but modern browser behaviour has changed.
    · Changed the log level of all parsing errors from INFO to ERROR, and the log level of the Source.fullSequentialParse() advisory message from WARN to INFO. The previous levels gave the advisory message a higher severity than the parsing errors, preventing logging systems from hiding the advisory message while showing parsing errors. Character encoding warnings remain unchanged at WARN level.
    · Changed the behaviour of the Renderer.renderHyperlinkURL(StartTag) method so that relative URLs are not rendered.
    · Changed the behaviour of the Renderer so that hyperlink element content is not rendered if it is the same as the hyperlink URL, ignoring any http:// prefix or / suffix.
    · EndTag.tidy() now removes whitespace before the closing bracket.
    · Added Source(File) constructor.
    · Added OutputDocument.getSegment() method.
    · Added OutputDocument.remove(int begin, int end) method.
    · Added Renderer.setHRLineLength() method.
    · Added RenderToText.jsp webapp sample.
    · Added Segment.getRowColumnVector() method.
    · Encoding detection now ignores common encodings specified in meta tags that have a code unit size incompatible with the preliminary encoding.
    · Upgraded to the following logger APIs: slf4j-api-1.7.2, log4j-1.2.17
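
    The Renderer change listed above, where hyperlink content is not rendered when it matches the hyperlink URL apart from an http:// prefix or a / suffix, can be pictured with the following standalone sketch in plain Java. It is not the Renderer's code: the class and method names are hypothetical, and the "text <url>" output format and the case-sensitive comparison are assumptions.

```java
// Illustrative sketch only; names are hypothetical and this is NOT Jericho's Renderer API.
final class HyperlinkTextSketch {
    // Strips a leading "http://" and a single trailing "/" so both strings compare equal
    // when they differ only by that prefix or suffix (case-sensitive, by assumption).
    static String normalise(String s) {
        String t = s.trim();
        if (t.startsWith("http://")) {
            t = t.substring("http://".length());
        }
        if (t.endsWith("/")) {
            t = t.substring(0, t.length() - 1);
        }
        return t;
    }

    // True when the link text merely repeats the URL.
    static boolean contentRepeatsUrl(String linkText, String href) {
        return normalise(linkText).equals(normalise(href));
    }

    // Assumed output format: "text <url>", or just "<url>" when the text repeats the URL.
    static String render(String linkText, String href) {
        return contentRepeatsUrl(linkText, href)
            ? "<" + href + ">"
            : linkText + " <" + href + ">";
    }
}
```

    In this sketch, a link with text "example.com" pointing at "http://example.com/" is rendered as the URL alone, while a link with text "Example" keeps both its text and the URL.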

      

