The mega-firm where I work produces a lot of documents. A lot. Some of the more important ones are financial statements that tell the attorneys how much moolah they’re making (or not, as the case may be).
One of our vendors provides several tax documents as one big honkin’ PDF, many thousands of pages long, made up of individual partner statements all jammed together, one after the other. Year after year, our financial folks pulled their hair out trying to divide these big documents into individual statements for each attorney. They would literally spread the whole mess out on a huge conference room table and make little piles of paper.
This might explain why most of the people in our Finance dept have offices littered with little stacks of paper. But I digress.
It fell to me to solve this little problem. The solution turned out to be a technique called “bursting”, otherwise known as “splitting” a single document into multiple individual documents.
There are several open-source utilities that will take a single large document and split it up into multiple single-page documents, but most didn’t fit the bill at all. We needed a solution that would…
- Split a document based on text indicators (tags) that would let the splitter know where new documents should begin.
- Allow the user to specify at least a portion of each split document’s filename. Most bursters just generate a random filename for each split document, without giving you any control.
- Allow us to hide the tags by coloring them white, so they don’t appear to anyone reading the reports.
The best (and possibly only) solution I could find that matched these criteria was PDF-Explode. I say “and possibly only” because so far I haven’t found any competing product that comes close to meeting them.
To burst a document using PDF-Explode, you must have some control over the content of the document you’re going to burst. In our case, we got our vendors to modify the reports they sent us, and we changed the reports we produced internally. The modification is simple enough: to make PDF-Explode start a new document, you add a <pdfexplode> tag to the top of the page, thusly…
<pdfexplode>NewFileName</pdfexplode>
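To make the tag-driven idea concrete, here is a minimal Python sketch of the grouping logic a tag-based burster needs. It is not PDF-Explode’s actual code (I don’t know its internals); it assumes the text of each page has already been extracted with some PDF library, and it only decides which pages belong in which output file. Writing the actual PDFs is left out. The `default_name` bucket for pages that appear before the first tag is a hypothetical choice for this sketch.

```python
import re
from typing import List, Tuple

# Matches a tag such as <pdfexplode>NewFileName</pdfexplode> and captures the name,
# ignoring any whitespace just inside the tags.
TAG_PATTERN = re.compile(r"<pdfexplode>\s*(.*?)\s*</pdfexplode>", re.DOTALL)


def plan_bursts(page_texts: List[str], default_name: str = "unassigned") -> List[Tuple[str, List[int]]]:
    """Group zero-based page indices into output documents.

    A page containing a tag starts a new document named after the tag's text.
    A page without a tag is appended to the document currently being built.
    Pages before the first tag go into a document named `default_name`
    (a hypothetical convention for this sketch).
    """
    documents: List[Tuple[str, List[int]]] = []
    for index, text in enumerate(page_texts):
        match = TAG_PATTERN.search(text)
        if match:
            # Tagged page: begin a new output document.
            documents.append((match.group(1), [index]))
        elif documents:
            # Untagged page: continuation of the current document.
            documents[-1][1].append(index)
        else:
            # Untagged page before any tag has been seen.
            documents.append((default_name, [index]))
    return documents


if __name__ == "__main__":
    pages = [
        "<pdfexplode>12345_K1</pdfexplode> Partner statement ...",
        "... continued from the previous page",
        "<pdfexplode>67890_K1</pdfexplode> Partner statement ...",
    ]
    print(plan_bursts(pages))  # [('12345_K1', [0, 1]), ('67890_K1', [2])]
```

Because the tag only has to be present in the extracted text, coloring it white hides it from readers without affecting the split.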
Then you run PDF-Explode from the command line, passing it the name of the “master” PDF file, and a few seconds later you have a horde of new “bursted” files!
But wait a second… if you can automagically split a ginormous PDF into individual documents, and you can control the filename of each new document, why not go one step further and create an automagical delivery mechanism?
And that is exactly what I did. I built a custom ASP.NET application that, when an attorney visits it, scans a folder for files matching that attorney’s employee number and displays them as download links.
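The folder-scanning step is simple enough to sketch. The real application was ASP.NET; this is a rough Python equivalent, and the naming convention it relies on (each bursted file named `<employee number>_<description>.pdf`) is a hypothetical assumption, not necessarily the one we used. Requiring the underscore after the number keeps employee 123 from seeing files that belong to employee 1234.

```python
from pathlib import Path
from typing import List


def find_statements(folder: str, employee_number: str) -> List[str]:
    """Return the sorted names of PDF files in `folder` that belong to one employee.

    Assumes (hypothetically) that bursted files are named
    "<employee number>_<description>.pdf", so the tag text in each
    <pdfexplode> tag would start with the employee number.
    """
    prefix = f"{employee_number}_"
    matches = [
        path.name
        for path in Path(folder).iterdir()
        if path.is_file()
        and path.suffix.lower() == ".pdf"  # accept .pdf and .PDF
        and path.name.startswith(prefix)   # exact number plus separator
    ]
    return sorted(matches)
```

A web page would then render each returned name as a download link, after checking that the logged-in user really is that employee.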
Quick tangent: PDF-Explode can also email bursted documents automatically to an address you specify in each tag, if you so desire, but that was a little scary for us. Our users wanted time to review and reflect before publishing anything.
So now it takes a couple of hours (often much less) to burst a document and drop the files into a folder, versus about a week the old way. And attorneys love being able to get their financial documents from a web page whenever they like. (I did, in fact, implement several security measures, which I may go into in a future post.)
Does this make sense? Let me know in the comments if you have any questions about bursting, or want to know more about how this was implemented.