Libarc is a C library for accessing the contents of GZIP-compressed ARC files generated by the Internet Archive's Heritrix web crawler.
- You can open and scan the contents of a GZIP-compressed ARC file. The library does not currently read CDX index files, though this feature will be added in a future release.
- You can get an iterator to walk over the contents of the ARC file member by member. You can specify a media type to limit the types of members you see.
- You can access the information in the member's URL record and the response headers from the HTTP server.
- You can access the member's data in a single API call.
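
To make the URL record and media-type filtering mentioned above more concrete, here is a minimal sketch in plain C. It does not use Libarc's API; the struct and function names (`arc_url_record`, `parse_url_record`, `media_type_matches`) are hypothetical and exist only for illustration. It assumes the version 1 ARC URL-record layout, a single space-separated line of URL, IP address, archive date, content type, and length.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical type for illustration; NOT part of Libarc.
 * Fields follow the ARC v1 URL-record line:
 *   <url> <ip-address> <archive-date> <content-type> <length>
 */
struct arc_url_record {
    char url[2048];
    char ip[64];
    char date[32];
    char media_type[128];
    long length;          /* size of the member's data in bytes */
};

/* Parse one URL-record line into rec.
 * Returns 0 on success, -1 if the line does not have five valid fields. */
int parse_url_record(const char *line, struct arc_url_record *rec)
{
    assert(line != NULL && rec != NULL);

    /* Field widths are one less than each buffer to leave room for '\0'. */
    int n = sscanf(line, "%2047s %63s %31s %127s %ld",
                   rec->url, rec->ip, rec->date,
                   rec->media_type, &rec->length);

    return (n == 5 && rec->length >= 0) ? 0 : -1;
}

/* Return 1 if the record should be visited for the given media type.
 * Passing NULL for wanted means "no filter". The comparison is an exact,
 * case-sensitive match, which is a simplification. */
int media_type_matches(const struct arc_url_record *rec, const char *wanted)
{
    return wanted == NULL || strcmp(rec->media_type, wanted) == 0;
}
```

A caller would read the first line of each decompressed member, pass it to `parse_url_record`, and skip the member when `media_type_matches` returns 0. How Libarc itself exposes these fields and its iterator is not shown here.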