The Java Mozilla Html Parser project is a Java package that lets you parse HTML pages into a Java Document object. The parser is a wrapper around Mozilla's HTML parser, giving the user a browser-quality HTML parser.
Limitations and known issues
The main limitation is performance-related: the parser serializes requests. At the moment, the parser runs on a separate thread, which receives requests one at a time, parses them, and returns the responses to the requester. This is a consequence of the mechanism Mozilla uses to keep its objects thread safe, which forces you to work with proxied objects instead of the real ones. My hope is that the open source community will take up the project and address these issues.
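To illustrate what serialized parsing means in practice, here is a minimal Java sketch of the pattern described above: a single worker thread that accepts requests, handles them one at a time, and returns each result to the caller. It is not the project's actual code or API. The class name SerializedParser and the parseFn parameter are hypothetical stand-ins, and the real parser's proxied Mozilla objects are not modeled here.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical illustration only; not part of the Java Mozilla Html Parser API.
final class SerializedParser<D> implements AutoCloseable {
    // One worker thread: requests queue up and are handled strictly in turn.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    // Stand-in for the real parsing routine (an assumption of this sketch).
    private final Function<String, D> parseFn;

    SerializedParser(Function<String, D> parseFn) {
        this.parseFn = parseFn;
    }

    // Submits the request to the worker thread and blocks until it is done.
    D parse(String html) throws InterruptedException, ExecutionException {
        Callable<D> task = () -> parseFn.apply(html);
        Future<D> result = worker.submit(task);
        return result.get();
    }

    // Stops the worker thread once already-queued requests have finished.
    @Override
    public void close() {
        worker.shutdown();
    }
}
```

Callers on any number of threads can invoke parse concurrently, but the work itself never overlaps, which is why throughput does not improve as more caller threads are added.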
Here are some key features of "Java Mozilla Html Parser":
· Real-world, browser-quality DOM parsing
· Compatibility with SAX parsers
· Sequential performance comparable to pure Java implementations of DOM parsers
· Win32, Linux and Mac OS X platforms are supported
What's New in This Release:
· This release has a major performance boost and a major encoding-related bugfix.