Optimizing PDF Compression

Some PDF files use suboptimal image compression; this page lists ways to fix this by recompressing or recreating the file. The methods described should be lossless; if a file cannot be compressed losslessly, it is better to submit the larger file.

Methods

Using PDF Split and Merge

See here for information on this program.

Using pdfimages

This method attempts to reconstruct the PDF file by extracting its images and recreating the PDF file from them. A Ruby script that does this is given below.

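Prerequisites

The script calls the following programs, which must be installed and available on the PATH:

  • Ruby
  • pdfimages (part of Xpdf or Poppler)
  • convert (part of ImageMagick)
  • tiff2pdf (part of libtiff)
  • pdftk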

What PDF files this method cannot be used on

  • Grayscale and colour PDF files (Group 4 compression only handles black-and-white images)
  • PDF files that do not contain images (i.e. retypeset PDF files)
  • Other PDF files created by unusual methods (generally rare)
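
Whether a file falls into the first two categories can be checked before running the script. The following is a minimal sketch, assuming Poppler's version of pdfimages (which has a -list option) and assuming its usual column layout (page, num, type, width, height, color, comp, bpc, ...); the helper name bilevel_only? is hypothetical and not part of the script on this page.

```ruby
# Returns true when every image in the listing has 1 component and
# 1 bit per component, i.e. is black-and-white (bilevel).
# `listing` is the text printed by `pdfimages -list` (Poppler). The column
# positions used here are an assumption based on that tool's usual layout:
#   page num type width height color comp bpc enc ...
def bilevel_only?(listing)
  # Skip the header line and the dashed separator line
  rows = listing.lines.drop(2).map(&:split).reject(&:empty?)
  return false if rows.empty? # no images at all: nothing to recompress
  rows.all? { |cols| cols[6] == '1' && cols[7] == '1' }
end

# Hypothetical usage (requires Poppler's pdfimages on the PATH):
#   listing = IO.popen(['pdfimages', '-list', 'somefile.pdf'], &:read)
#   puts bilevel_only?(listing) ? 'candidate for this method' : 'skip'
```

A file for which this returns false either contains no images or contains at least one grayscale or colour image, so Group 4 recompression would not be lossless for it.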

Common problems and solutions

  • The resulting PDF file has colours inverted
    • Solution: run the script with -negate at the end, for example ./pdfcompress.rb somefile.pdf -negate on *nix, or ruby pdfcompress.rb somefile.pdf -negate on Windows

#!/usr/bin/ruby
require 'shellwords'
require 'tmpdir'

BASIC_CONVERT_OPTIONS = ['-compress', 'Group4'].freeze
# Automatically delete output files which grow in size after recompression?
DELETE_IGNORED_FILE = false

if ARGV[0].nil?
  $stderr.puts 'Syntax: pdfcompress.rb <PDF file> ( <additional convert options> )'
  exit 1
end

file = ARGV[0]
# Additional options (e.g. -negate) are placed before the basic options
convert_options = ARGV.drop(1).flat_map { |opt| Shellwords.split(opt) } + BASIC_CONVERT_OPTIONS
# The output is written to the current directory as <name>.2.pdf
output_filename = File.basename(file, '.*') + '.2.pdf'

# The temporary directory and its contents are removed when the block exits
Dir.mktmpdir('pdfcompress') do |tmpdir|
  $stderr.puts "Processing file #{file}..."
  # Extract the embedded images
  system('pdfimages', file, File.join(tmpdir, 'images')) or abort 'pdfimages failed'

  # Convert each image to a Group 4 TIFF, then to a single-page PDF.
  # Sorting keeps the pages in their original order.
  pages = Dir.glob(File.join(tmpdir, '*')).sort.map do |imagefile|
    $stderr.print "\rCompressing #{File.basename(imagefile)}..."
    tiff = imagefile.sub(/\.[^.]*$/, '.tiff')
    pdf = imagefile.sub(/\.[^.]*$/, '.pdf')
    system('convert', *convert_options, imagefile, tiff) or abort "\nconvert failed on #{imagefile}"
    system('tiff2pdf', '-o', pdf, tiff) or abort "\ntiff2pdf failed on #{tiff}"
    pdf
  end
  $stderr.print "\n"
  abort 'No images found in the PDF file' if pages.empty?

  # Put them all together now
  $stderr.print 'Combining PDF files... '
  system('pdftk', *pages, 'cat', 'output', output_filename) or abort 'pdftk failed'
  $stderr.puts 'Done'
end

# Compare the sizes
old_size = File.size(file)
new_size = File.size(output_filename)
if old_size > new_size
  puts "Compressed file #{File.basename(file)} - Compressed from #{old_size} to #{new_size}"
else
  puts "Ignored file #{File.basename(file)} - Changed from #{old_size} to #{new_size}"
  File.delete(output_filename) if DELETE_IGNORED_FILE
end
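
Since the methods on this page are meant to be lossless, it can be worth checking that a recompressed image is pixel-identical to the extracted original. The following is a minimal sketch, assuming ImageMagick's compare program is installed; the helper names differing_pixels and pixel_identical? are hypothetical and not part of the script above. It relies on compare -metric AE, which reports the number of differing pixels on standard error.

```ruby
require 'open3'

# Parses the text that `compare -metric AE` writes to stderr and returns the
# number of differing pixels. Some ImageMagick versions append a normalized
# value in parentheses, so only the first token is read.
def differing_pixels(metric_output)
  Float(metric_output.split.first).round
end

# Hypothetical helper: true when the two images are pixel-identical.
# Requires ImageMagick's `compare` on the PATH.
def pixel_identical?(original, recompressed)
  _out, err, _status = Open3.capture3('compare', '-metric', 'AE',
                                      original, recompressed, 'null:')
  differing_pixels(err).zero?
rescue ArgumentError, TypeError
  false # compare printed something other than a number (e.g. an error message)
end

# Hypothetical usage, comparing an extracted image with its Group 4 TIFF:
#   puts pixel_identical?('images-000.pbm', 'images-000.tiff')
```

If the script was run with -negate, the TIFF is intentionally inverted, so this comparison would report every pixel as different; in that case the original image would need to be negated before comparing.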