[REQUEST] Recommend a pdf batch file size compressor/reducer

Moltuae

Looking for recommendations for something (command line or GUI) to batch-compress about 50,000 pdf files, preferably with selectable compression/quality and preferably free or cheap too, since this may be a one-time requirement.

Googling provides numerous results but I'm looking for personal recommendations to save checking/testing them all.

I'm presently using Reduce PDF Size, which works and appears to be clean, but it stops every time it encounters a pdf that it can't reduce, meaning that I have to intervene every few hundred or so files.

Thanks in advance.
 
Thanks @dannyict

It looks like a good piece of software, but the free edition seems to use a 'cloud' conversion engine. To be fair, it uploads and downloads pretty quickly, but I'm not comfortable uploading my customers' PDF documents to a cloud service.

The on-premise edition is probably a little too much to pay for a one-time use (at $199). I think I may also need the server edition, because ideally I need to run this on Server 2012 R2, and at $999 that is certainly a tad expensive for a one-off compression job.
 
What's in the PDF files? If they're just text, you won't get much size reduction without remastering. If they also contain images, the images themselves can be further compressed at the cost of quality.

Ghostscript would be my go-to application, but I've never used it on Windows (though a Windows build is available). It's easy enough to run from a Linux live CD, if necessary. Rewrite the files for screen resolution (72 dpi); bear in mind that this is not a reversible process. You can also try removing embedded fonts if the files only use standard Windows fonts, but Ghostscript can't always do this. Ghostscript is fully scriptable.
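For what it's worth, here is a rough sketch of how a batch run like that could be scripted in Python. It assumes Ghostscript is installed and on the PATH, and it uses the standard `pdfwrite` device with the `/screen` preset. The function names (`build_gs_command`, `compress_tree`) are made up for illustration, and I haven't run it against a real 50,000-file archive. It logs failures and carries on rather than stopping, and it keeps the original whenever the rewritten file isn't smaller.

```python
import shutil
import subprocess
from pathlib import Path


def build_gs_command(src, dst, preset="screen", gs_exe="gs"):
    """Build a Ghostscript command line that rewrites one PDF.

    preset is one of Ghostscript's PDFSETTINGS presets:
    screen, ebook, printer, prepress or default.
    """
    return [
        gs_exe,
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS=/{preset}",
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={dst}",
        str(src),
    ]


def compress_tree(in_dir, out_dir, preset="screen", gs_exe="gs",
                  run=subprocess.run):
    """Compress every *.pdf under in_dir into a mirrored tree under out_dir.

    Assumes out_dir is NOT inside in_dir. The run parameter exists only so
    the Ghostscript call can be swapped out when testing.
    """
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    stats = {"compressed": 0, "kept_original": 0, "failed": 0}
    for src in sorted(in_dir.rglob("*.pdf")):
        dst = out_dir / src.relative_to(in_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        try:
            run(build_gs_command(src, dst, preset, gs_exe), check=True)
        except (subprocess.CalledProcessError, OSError) as exc:
            # Log the problem file and move on instead of halting the batch.
            print(f"FAILED {src}: {exc}")
            dst.unlink(missing_ok=True)
            stats["failed"] += 1
            continue
        if not dst.exists() or dst.stat().st_size >= src.stat().st_size:
            # No gain from rewriting: keep a copy of the original instead.
            shutil.copy2(src, dst)
            stats["kept_original"] += 1
        else:
            stats["compressed"] += 1
    return stats
```

On Windows the console executable is normally `gswin64c` rather than `gs`, so a call might look like `compress_tree(r"D:\scans", r"D:\scans_small", gs_exe="gswin64c")` (both paths are placeholders). Writing to a separate output tree leaves the originals untouched, which matters because the rewrite can't be undone.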
 
They're invoices and job sheets that have been scanned (at a relatively high resolution) and then OCR'd. Typically they'll compress down to between 1/20th and 1/50th of their original size while remaining perfectly readable. Logos and background images in the documents get very slightly grainy at that compression level, but that's unimportant. Collectively there's about 400GB of them so the space savings are quite significant, especially considering that there are multiple backups of the files being stored too.

I'll take a look at Ghostscript, thanks.
 
They're invoices and job sheets that have been scanned (at a relatively high resolution) and then OCR'd.
Okay, they'll probably compress well, but you may lose the OCR text. Honestly, I can't remember if I ever solved the problem of preserving the text underlay of a scanned PDF document when doing further processing. Losing the OCR text means that the documents are no longer searchable, if that's important.

Google will give plenty of suitable Ghostscript one-liners.
 
The OCR text seems to be unaffected by most compression processes I've tried, regardless of the chosen quality.

I have done a few tests with Ghostscript now (using the ps2pdf script) and it appears to work well, although I can't get anywhere near the compression ratio of the 'Reduce PDF Size' application I was using, at least not without a drastic loss of quality. Reduce PDF Size somehow manages to reduce files to about 5-10% of their original size with only a minor (and very acceptable) loss of quality, while Ghostscript gets them down to about 50% of their original size for a similar output quality.

Still, 50% is better than nothing and Ghostscript processes the files much faster. And, more importantly, it'll do so unattended (unlike 'Reduce PDF Size' which stops every time it encounters a pdf file it cannot compress). Unless I find anything better, Ghostscript appears to be the way to go, so thanks for the suggestion. :)
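One thing that might be worth trying before settling for 50%: instead of the fixed presets, Ghostscript's `pdfwrite` device accepts explicit image downsampling parameters, so the target resolution can be chosen per image type. Below is a minimal sketch that only builds the command line. The helper names and the 150/300 dpi defaults are assumptions picked for illustration, not tested recommendations, and there's no guarantee this narrows the gap to Reduce PDF Size.

```python
def downsample_args(dpi=150, mono_dpi=300):
    """Ghostscript switches that downsample colour, grey and mono images."""
    args = []
    for kind, res in (("Color", dpi), ("Gray", dpi), ("Mono", mono_dpi)):
        args += [
            f"-dDownsample{kind}Images=true",
            f"-d{kind}ImageResolution={res}",
        ]
    return args


def build_tuned_command(src, dst, dpi=150, mono_dpi=300, gs_exe="gs"):
    """Command line for one PDF using explicit resolutions, not a preset."""
    return [
        gs_exe,
        "-sDEVICE=pdfwrite",
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        *downsample_args(dpi, mono_dpi),
        f"-sOutputFile={dst}",
        str(src),
    ]
```

Running a handful of representative invoices at a few different `dpi` values and comparing size against readability by eye would show fairly quickly whether it's worth pursuing.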
 