we live in hell
Creating Archival CDs / DVDs


Creating Archival CDs / DVDs - FrodoSwaggins - 02-25-2025

I have a huge folder of documents in JPG, HTML, PDF, EPUB, and DOC / DOCX format. Does anyone know of a tool that can scan the files and build a nice keyword-based index of them, preferably as HTML output?
It would be great if the tool pulled keywords and metadata from the individual files. A flat index would be fine, but a series of linked index pages would be even better. I'm trying to create an indexed archive not unlike the old warez CDs of the days of yore.
Of course, open source always appreciated.


RE: Creating Archival CDs / DVDs - gorzek - 02-26-2025

For the PDFs, you might look at this first: https://github.com/henrystern/PDF-Metadata-to-CSV
For Word docs, you could use the python-docx library to pull out the document properties (title, author, keywords, and so on) and do whatever you want with them.
See: https://medium.com/@HeCanThink/python-docx-a-comprehensive-guide-to-creating-and-manipulating-word-documents-in-python-a765cf4b4cb9
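
Here's a rough sketch of how the Word side could feed a flat HTML index. It assumes python-docx is installed (pip install python-docx) and only handles .docx, not the older .doc format. The function names, the "archive" folder, and the comma/semicolon keyword splitting are my own assumptions for illustration, not something from a particular tool.

```python
from collections import defaultdict
from pathlib import Path
from urllib.parse import quote
import html


def split_keywords(raw):
    """Split a raw keyword string and normalise it to a sorted, unique list."""
    # Assumption: keywords are separated by commas or semicolons.
    parts = raw.replace(";", ",").split(",")
    return sorted({p.strip().lower() for p in parts if p.strip()})


def docx_properties(path):
    """Read title, author and keywords from a .docx file's core properties."""
    from docx import Document  # third-party: pip install python-docx

    props = Document(str(path)).core_properties
    return {
        "file": Path(path).as_posix(),
        "title": props.title or Path(path).stem,  # fall back to the file name
        "author": props.author or "",
        "keywords": split_keywords(props.keywords or ""),
    }


def build_index(entries):
    """Render a flat keyword -> files index as one HTML page."""
    by_keyword = defaultdict(list)
    for entry in entries:
        # Files without keywords are grouped under a placeholder heading.
        for kw in entry["keywords"] or ["(no keywords)"]:
            by_keyword[kw].append(entry)

    lines = ["<html><body><h1>Index</h1>"]
    for kw in sorted(by_keyword):
        lines.append(f"<h2>{html.escape(kw)}</h2><ul>")
        for entry in by_keyword[kw]:
            href = quote(entry["file"])  # percent-encode spaces etc. in the link
            title = html.escape(entry["title"])
            lines.append(f'<li><a href="{href}">{title}</a></li>')
        lines.append("</ul>")
    lines.append("</body></html>")
    return "\n".join(lines)


if __name__ == "__main__":
    root = Path("archive")  # hypothetical folder holding the documents
    entries = [docx_properties(p) for p in sorted(root.rglob("*.docx"))]
    Path("index.html").write_text(build_index(entries), encoding="utf-8")
```

The index only cares about a dict with "file", "title" and "keywords", so metadata from the PDFs or other formats could be dropped into the same list. Splitting into one page per keyword would mostly be a matter of writing each group to its own file and linking them from a top page.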