This is actually the second attempt at writing this post. I first wrote a complete post about how I organize my files and how I attempted to create a system for indexing each book. But after writing it I started to do some research and figured out how to find which of my 4000 pdfs didn’t already have OCR text, how to batch move them, how to batch convert them, and how to index them. I also figured out how to set the whole process up to be somewhat automated.
I’m sure this is super easy for many people, and there will be some who are scratching their heads saying, “What?”, so I’ll try to lay out the steps as best I can. I would suggest you do your own research into what is best for your own needs and consider this a preview of what you could do.
Now, if you are like me you probably have hundreds, thousands, or maybe tens of thousands of RPG-related pdfs and files. The last time I checked I had nearly four thousand files in my RPG folder. Sometimes when I look at that folder I feel like a little pack rat, gathering up all of my precious pdfs and squirrelling them away for that day when I really feel the need to play one. It is a problem. I know this. But I can’t stop. “Oh, is that pdf free? Oh, is that bundle 50% off?” Unfortunately the answer is usually yes. And yes usually leads to hitting the download button. And hitting the download button leads to more files in the bin. I try to keep things organized as I go but that hasn’t stopped me from filling up a couple folders with hundreds of files.
This leads to the perennial question, “How do I organize this mess so I know what I have and where it lives?” Further, is there a way to index these pdfs so I know where the resources are which will help me in the game I’m playing right now?
Originally, the rest of this was about how I organize my folder and how I try to keep track of where things are (indexing). But instead I’ll lay out the steps of what I did to make them searchable and to create a database of them.
Step 1.
My first step was to find all the files that needed OCR. This would be anything that was scanned in or was on the older side. As I mentioned above, I have around 4000 gaming pdfs. It would take way too long to check each file by hand to see whether it needs OCR. Instead I found a script for Windows online which checked each PDF in my RPG folder and, if it needed OCR, moved that file into another folder designated for PDFs that needed to be converted. I could give you the script I used if that would be helpful, but I’m no expert. I was definitely flying by the seat of my pants here, so I would recommend you do your own research.
Now that I have this script I’m planning on automating it so that every once in a while it will check my RPG download folder and sort files between OCR and not OCR.
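To give a rough idea of what a check like this involves, here is a minimal Python sketch. To be clear, it is not the Windows script mentioned above. It assumes the pypdf library is installed, and the function names, the 50-character threshold, and the five-page sample are all made up for illustration. The idea is simply: if almost no text can be pulled out of a PDF, it is probably a bare scan.

```python
import shutil
from pathlib import Path
from typing import Callable, List


def extract_text_with_pypdf(pdf_path: Path, max_pages: int = 5) -> str:
    """Return the text of the first few pages (assumes `pip install pypdf`)."""
    from pypdf import PdfReader  # imported here so the rest runs without pypdf

    reader = PdfReader(str(pdf_path))
    pages = list(reader.pages)[:max_pages]
    # extract_text() can come back empty for image-only pages
    return "".join(page.extract_text() or "" for page in pages)


def needs_ocr(pdf_path: Path,
              extract: Callable[[Path], str] = extract_text_with_pypdf,
              min_chars: int = 50) -> bool:
    """Guess that a PDF is a bare scan when almost no text can be extracted."""
    try:
        text = extract(pdf_path)
    except Exception:
        # Damaged or encrypted file: leave it where it is for a manual look.
        return False
    return len(text.strip()) < min_chars


def move_pdfs_needing_ocr(source: Path, needs_ocr_dir: Path,
                          extract: Callable[[Path], str] = extract_text_with_pypdf
                          ) -> List[Path]:
    """Move every PDF under `source` that looks like a scan into `needs_ocr_dir`."""
    needs_ocr_dir.mkdir(parents=True, exist_ok=True)
    # Build the full list first so moving files does not disturb the walk.
    candidates = sorted(
        p for p in source.rglob("*")
        if p.is_file()
        and p.suffix.lower() == ".pdf"
        and needs_ocr_dir not in p.parents
    )
    moved = []
    for pdf in candidates:
        target = needs_ocr_dir / pdf.name
        if target.exists():
            continue  # never overwrite a file with the same name
        if needs_ocr(pdf, extract):
            shutil.move(str(pdf), str(target))
            moved.append(target)
    return moved
```

With hypothetical folder names, a call would look like `move_pdfs_needing_ocr(Path("RPG"), Path("RPG/needs_ocr"))`. A character threshold is a blunt tool: a scan that carries a little real text, such as a stamped watermark, could slip past it, so it is worth testing on a copy of a few files first.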
Step 2.
With thousands of PDFs that needed OCR, I went back online and found a script which allowed me to convert them all using a couple of different apps. At this point, I was completely flying by the seat of my pants and the pilot was drunk. I used the Python-based OCRmyPDF with Tesseract OCR and Ghostscript. Please don’t ask what any of them really do. All I can tell you is that with those apps I batch converted my non-OCR PDFs into OCR PDFs. It took just short of 24 hours to complete the process, and in the end just about all of my PDFs are now OCR. This alone was huge because at this point I could search the contents of about 90% of my 4000 files.
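For a sense of what a batch conversion like this can look like, here is a minimal Python sketch that calls the `ocrmypdf` command line tool once per file. It is not the script I found online. It assumes OCRmyPDF (with Tesseract and Ghostscript) is already installed and on the PATH, and the function and folder names are invented for the example. The `--skip-text` option tells OCRmyPDF to leave pages that already contain text alone.

```python
import subprocess
from pathlib import Path
from typing import Callable, Dict, List


def build_ocr_command(src: Path, dst: Path) -> List[str]:
    """Command line for one file; --skip-text leaves pages with text untouched."""
    return ["ocrmypdf", "--skip-text", str(src), str(dst)]


def batch_ocr(in_dir: Path, out_dir: Path,
              run: Callable = subprocess.run) -> Dict[str, str]:
    """Run OCR on every PDF in `in_dir`, writing results into `out_dir`.

    Returns a small report mapping each file name to "ok", "skipped",
    or a failure note, so one bad file does not stop the whole batch.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    results = {}
    # Note: "*.pdf" is case-sensitive on some systems (".PDF" would be missed).
    for src in sorted(in_dir.glob("*.pdf")):
        dst = out_dir / src.name
        if dst.exists():
            # Already converted on an earlier run, so do not redo the work.
            results[src.name] = "skipped"
            continue
        completed = run(build_ocr_command(src, dst))
        if completed.returncode == 0:
            results[src.name] = "ok"
        else:
            results[src.name] = f"failed (exit code {completed.returncode})"
    return results
```

Skipping files that already exist in the output folder means a long run can be stopped and restarted, which matters when the whole job takes the better part of a day.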
Step 3.
Next I used DocFetcher to index the folders that held my converted-to-OCR files and my native OCR files. It took about an hour to index them, and now that they are indexed I can search all of my RPG files for every occurrence of “kobold”, find what I’m looking for, and double-click to open the file and read it.
Step 4.
Finally, I created a spreadsheet to catalogue all of my files. This I will do manually and my plan is to go slow and add to it as I actually use files. This way it really just captures the files I use the most and the ones buried in there might get added as they are used.
My spreadsheet has the following columns: File Name, System, Good For, Random Tables, Utility, Tone, Notes, Folder Path w/ Link. I then created a second tab with the same columns, which I started to fill up with tags. My plan is to separate the tags with commas and use filters to find the ones I’m looking for. I’m particularly excited to start using this for Random Tables, as I find many good ones and then completely forget about them.
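The comma-separated tag idea also works outside the spreadsheet. Here is a minimal Python sketch, assuming the catalogue has been exported as a CSV file with the column names listed above. The sample rows, file names, and tags are invented purely for the example.

```python
import csv
import io
from typing import Dict, Iterable, List


def split_tags(cell: str) -> List[str]:
    """Turn "traps, Treasure" into ["traps", "treasure"]."""
    return [t.strip().lower() for t in cell.split(",") if t.strip()]


def rows_with_tag(rows: Iterable[Dict[str, str]], column: str,
                  tag: str) -> List[Dict[str, str]]:
    """Keep only the rows whose `column` contains `tag` as a whole tag."""
    wanted = tag.strip().lower()
    return [row for row in rows
            if wanted in split_tags(row.get(column, "") or "")]


# Invented sample data standing in for an exported catalogue.
SAMPLE_CSV = """File Name,System,Random Tables
example_bestiary.pdf,System A,"encounters, treasure"
example_city_book.pdf,System B,"npc names, rumors"
example_dungeon_kit.pdf,System A,"traps, Treasure"
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for row in rows_with_tag(rows, "Random Tables", "treasure"):
    print(row["File Name"])
```

This prints the bestiary and the dungeon kit. Tags are matched whole and without regard to case, so “treasure” will not match a longer tag like “treasure maps”. That keeps results tight, but it does mean the tag list needs to stay consistent.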
There you have it. Between the index and spreadsheet, I should be able to find everything I need quickly and easily. I will even be able to find stuff that I didn’t even realize I have or which I had forgotten about.
Next Steps.
First, I’m going to automate my workflow as much as I can. This is the workflow that I’ve developed for moving forward:
- New files -> drop into 1_Originals folder
- Run the OCR Testing Script (put it on a schedule)
- Run the OCR Converting Script (put it on a schedule)
- Reindex (put it on a schedule)
- Tag/add entries to spreadsheet (on an as-used basis)
- Organize into subfolder (on an as-used basis)
Slowly over time I will populate my spreadsheet and organize my files into subfolders using a similar set-up as my spreadsheet categories. That way I can browse a little easier and will have everything in one area.
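The three scheduled steps in the workflow above could be chained by one small runner that a scheduler (something like Windows Task Scheduler) kicks off. This is only a sketch of that idea: the step names are placeholders, each step is whatever function or script you already have, and how re-indexing gets triggered depends on how your indexing tool is set up, so nothing here assumes a particular command for it.

```python
from typing import Callable, List, Tuple

# A step is a label plus a function that takes no arguments.
Step = Tuple[str, Callable[[], None]]


def run_pipeline(steps: List[Step]) -> List[str]:
    """Run the steps in order and stop at the first one that fails.

    Returns a short log so a scheduled run leaves a record of what happened.
    """
    log = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            # Later steps depend on earlier ones, so do not carry on.
            log.append(f"{name}: failed ({exc})")
            break
        log.append(f"{name}: ok")
    return log


def placeholder() -> None:
    """Stands in for a real step; swap in your own function."""


if __name__ == "__main__":
    for line in run_pipeline([
        ("OCR testing", placeholder),
        ("OCR converting", placeholder),
        ("Reindex", placeholder),
    ]):
        print(line)
```

Stopping at the first failure is a deliberate choice here: there is little point re-indexing if the conversion step fell over partway.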
Okay, that was a lot. I’m happy I did it. Was it needed? Of course not. Was it fun to research and learn? Very much.
So let’s hear it, how are you organizing your RPG files?
