I have a huge folder of documents in JPG, HTML, PDF, EPUB, and DOC/DOCX format. Does anyone know of a tool that can scan the files and build a nice keyword-based index of them, preferably as HTML output?
It would be great if the tool could pull keywords and metadata from the individual files. A flat index would be fine, but a series of index pages linked together would be even better. I'm trying to create an indexed archive, not unlike the old warez CDs of the days of yore.
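To make the flat version concrete, here is a rough sketch of the kind of output I mean, in Python with only the standard library. The function names (`keywords_for`, `build_index`, `render_html`) are made up for illustration, and it cheats by treating the words in each file name as the keywords; a real tool would need to read the actual content and metadata of each format.

```python
import html
from collections import defaultdict
from pathlib import Path
from urllib.parse import quote

# File types from the list above.
EXTENSIONS = {".jpg", ".html", ".pdf", ".epub", ".doc", ".docx"}


def keywords_for(path):
    """Placeholder assumption: the words in the file name are the keywords."""
    stem = path.stem.lower()
    for sep in "-_.":
        stem = stem.replace(sep, " ")
    return {word for word in stem.split() if len(word) > 2}


def build_index(root):
    """Map each keyword to the sorted relative paths of the files that match it."""
    root = Path(root)
    index = defaultdict(set)
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in EXTENSIONS:
            for word in keywords_for(path):
                index[word].add(path.relative_to(root).as_posix())
    return {word: sorted(paths) for word, paths in sorted(index.items())}


def render_html(index):
    """Render the keyword index as one flat HTML page of links."""
    lines = [
        "<!DOCTYPE html>",
        "<html><head><meta charset='utf-8'><title>Index</title></head><body>",
        "<h1>Index</h1>",
        "<dl>",
    ]
    for word, paths in index.items():
        lines.append(f"<dt>{html.escape(word)}</dt>")
        for rel in paths:
            href = html.escape(quote(rel), quote=True)
            lines.append(f'<dd><a href="{href}">{html.escape(rel)}</a></dd>')
    lines += ["</dl>", "</body></html>"]
    return "\n".join(lines)
```

Something like `Path("docs/index.html").write_text(render_html(build_index("docs")), encoding="utf-8")` would then drop the page into the folder itself (`docs` being a stand-in for my folder), so the relative links resolve. The linked-pages version would be the same idea with one page per keyword or per letter.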
Open source is always appreciated, of course.