htmltotext For Linux


Last updated: Dec 17, 2011
License: GPL

1,142 downloads so far

Extract text and some metadata from HTML, coping with malformed pages as well as possible.

Tags: Extract textual content, Extract metadata, HTML to text, HTML2TXT, HTML, Text

Description


htmltotext is a Python package that was written for a search engine, to allow it to extract the textual content and metadata from HTML pages. It tries to cope with invalid markup and incorrectly specified character sets, and strips out HTML tags (splitting words at tags appropriately). It also discards the contents of script tags and style tags.
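The behaviour described above can be illustrated with a minimal sketch built on Python's standard `html.parser` module. This is not htmltotext's own API or its Xapian-derived parser; the names `TextExtractor` and `html_to_text` are hypothetical, and treating every tag as a word boundary is a simplifying assumption.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, dropping the contents of <script> and <style>.

    Illustrative only: a stand-in for the kind of extraction htmltotext
    performs, not its actual implementation.
    """

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        else:
            # Assumption: every tag separates words.
            self._chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            # Tolerate stray closing tags in malformed pages.
            self._skip_depth = max(0, self._skip_depth - 1)
        else:
            self._chunks.append(" ")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace into single spaces.
        return " ".join("".join(self._chunks).split())


def html_to_text(html):
    """Return the visible text of an HTML string (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text left over from unclosed markup
    return parser.text()
```

Calling `html_to_text("<p>Hello<br>world</p><script>var x = 1;</script>")` yields the text with the script body removed and the words on either side of `<br>` kept separate.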

As well as text from the body of the page, it extracts the page title and the content of the meta description and keywords tags. It also parses meta robots tags to determine whether the page should be indexed.
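As a rough sketch of the metadata side, the following uses the standard `html.parser` module to pull out the title, the meta description and keywords, and a robots decision. It is not htmltotext's API: `MetaExtractor`, `extract_meta`, and the `indexing_allowed` key are hypothetical names, and treating `noindex` or `none` as "do not index" is an assumption based on the common robots meta convention.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Gather title, meta description/keywords, and the robots verdict.

    Illustrative only; htmltotext's real parser comes from Xapian's omindex.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title = ""
        self.description = ""
        self.keywords = ""
        self.indexing_allowed = True
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Attribute values may be None for bare attributes.
            attr = {name: (value or "") for name, value in attrs}
            name = attr.get("name", "").strip().lower()
            content = attr.get("content", "")
            if name == "description":
                self.description = content.strip()
            elif name == "keywords":
                self.keywords = content.strip()
            elif name == "robots":
                directives = {d.strip().lower() for d in content.split(",")}
                # Assumption: "noindex" or "none" forbids indexing.
                if directives & {"noindex", "none"}:
                    self.indexing_allowed = False

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_meta(html):
    """Return a dict of page metadata (hypothetical helper)."""
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    return {
        "title": " ".join(parser.title.split()),
        "description": parser.description,
        "keywords": parser.keywords,
        "indexing_allowed": parser.indexing_allowed,
    }
```

An indexer could call `extract_meta(page)` first and skip the page entirely when `indexing_allowed` is false.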

The HTML parser used by this module was extracted from the Xapian search engine library (and specifically, from the omindex indexing utility in that library).

System requirements

Python

htmltotext 0.7.3


runs on: Linux
filename: htmltotext-0.7.3.tar.gz
main category: Text Editing & Processing
developer: Richard Boulton