Friday, January 18, 2008

Extract Text From PDF, .Doc, and other Files

Thanks to the folks at Lifehacker and elsewhere for this tip. I've started using it, and so far, at least in early use here in the office, it works. It's a program called Text Mining Tool:

From the Text Mining Tool site:

Text Mining Tool is a freeware program for extracting text from files of the following types:
pdf, doc, rtf, chm, html, without the need to install any other programs like Word, Acrobat, etc.

The beauty of the program is that it works, extremely simply, on almost all common document formats. That includes HTML web pages, DOC and RTF files from Microsoft Word and other word processors such as OpenOffice, Windows Help files ending in .CHM, and PDF (Portable Document Format) files.

Its key features, which make it comfortable and easy to use, are:

* No payment or license restrictions. The tool is completely free.
* Works as converter of PDF, DOC, RTF, CHM, HTML files to text.
* User-friendly interface with hotkeys available.
* Includes minetext, a console tool for automating text conversion.
* Based on the .NET Framework 2.0.
* No installation needed. Just unpack the program and use it.

If you don't have Lifehacker in your RSS feeds, you should.