PDF Compressor - Image Optimization Tool
Reduce PDF file sizes by up to 70-90% through image downscaling and compression, for email, storage, and web distribution
A Python automation tool that reduces PDF file sizes by downscaling and recompressing embedded images. Control image resolution (DPI) and JPEG quality to balance size against clarity for your use case, from email sharing (72 DPI) to high-quality printing (200+ DPI). Image-heavy PDFs of around 100MB can often be brought under 10MB in seconds.
Overview
PDF Compressor is a practical Python utility designed to dramatically reduce PDF file sizes without sacrificing usability. By intelligently downscaling and compressing embedded images, it transforms oversized, image-heavy PDFs into lean, shareable documents. Perfect for storage optimization, email distribution, cloud backup, and web uploads—all while maintaining document integrity and readability.
The Problem
Modern PDFs often contain high-resolution images—scanned documents, photographs, design mockups, technical diagrams—resulting in file sizes of 50MB, 100MB, or larger. These unwieldy files pose practical challenges:
- Email services reject or throttle attachments over 25-50MB
- Cloud storage fills up rapidly, driving up costs
- Downloading large PDFs wastes bandwidth and user patience
- Mobile users struggle to open and view large files
- Archival and backup processes slow significantly
Compressing PDFs manually—opening each in specialized software, exporting with reduced quality—is time-consuming and inconsistent. For organizations managing hundreds of PDFs, this becomes impractical.
Solution
This Python tool automates the entire compression pipeline. It reads a PDF, identifies all embedded images, downscales them to optimal resolution (configurable DPI), adjusts JPEG quality, and writes a new, significantly smaller PDF—all with a single command.
Core capabilities:
- Image downscaling — Reduce resolution based on DPI settings (default 72 DPI)
- JPEG quality control — Balance file size vs. image clarity (0-100 scale)
- Batch processing — Compress multiple PDFs automatically
- Layout preservation — Maintain original document structure and formatting
- Customizable settings — Fine-tune compression for different use cases
- Fast processing — Compress large PDFs in seconds
How It Works
- Load PDF — Read the input PDF file using PyMuPDF
- Extract images — Identify all images embedded in each PDF page
- Downscale resolution — Reduce image dimensions based on target DPI using Pillow
- Compress quality — Apply JPEG quality adjustment to reduce file size further
- Replace images — Insert optimized images back into the PDF
- Save output — Write the compressed PDF to disk
The process leaves text, layout, and document structure untouched. Only the embedded images are resampled and re-encoded, so any quality loss is confined to image detail beyond the target resolution.
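The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's actual code: the names `target_pixel_size` and `compress_pdf_sketch` are hypothetical, the sketch assumes PyMuPDF 1.21 or later (for `Page.replace_image`), and it ignores transparency masks and image encodings that Pillow cannot open.

```python
import io


def target_pixel_size(width_px, height_px, rect_width_pt, rect_height_pt, dpi):
    """Return the (width, height) in pixels an image needs at `dpi`.

    PDF page units are points (1/72 inch), so the displayed size in inches
    is the placement rectangle divided by 72. Images are never upscaled.
    """
    want_w = max(1, round(rect_width_pt / 72 * dpi))
    want_h = max(1, round(rect_height_pt / 72 * dpi))
    # One scale factor for both axes keeps the aspect ratio intact.
    scale = min(1.0, max(want_w / width_px, want_h / height_px))
    return max(1, round(width_px * scale)), max(1, round(height_px * scale))


def compress_pdf_sketch(input_pdf, output_pdf, dpi=72, quality=50):
    # Third-party imports are local so the helper above works without them.
    import fitz  # PyMuPDF
    from PIL import Image

    doc = fitz.open(input_pdf)                        # 1. load PDF
    seen = set()  # an image reused on several pages is handled once
    for page in doc:
        for item in page.get_images(full=True):       # 2. find images
            xref = item[0]
            if xref in seen:
                continue
            seen.add(xref)
            rects = page.get_image_rects(xref)
            if not rects:
                continue
            # Size the image for its largest placement on this page.
            rect = max(rects, key=lambda r: r.width * r.height)
            info = doc.extract_image(xref)
            img = Image.open(io.BytesIO(info["image"]))
            new_size = target_pixel_size(
                img.width, img.height, rect.width, rect.height, dpi
            )
            if new_size == img.size:
                continue  # already at or below the target resolution
            if img.mode not in ("RGB", "L"):
                img = img.convert("RGB")  # JPEG has no alpha or palette
            img = img.resize(new_size, Image.LANCZOS)  # 3. downscale
            buf = io.BytesIO()
            img.save(buf, format="JPEG", quality=quality, optimize=True)  # 4. recompress
            page.replace_image(xref, stream=buf.getvalue())  # 5. replace
    doc.save(output_pdf, garbage=4, deflate=True)     # 6. save
    doc.close()
```

Skipping images that are already small enough avoids re-encoding them, which could otherwise add JPEG artifacts without saving any space.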
Key Features
Image Optimization
- DPI-based downscaling for precise, repeatable control over resolution
- Automatic aspect ratio preservation during resize
- JPEG quality tuning from 0 (smallest) to 100 (best)
- Support for multiple image formats (JPEG, PNG, etc.)
Document Integrity
- Preserves text layers and searchability
- Maintains page layout and formatting
- Retains hyperlinks and interactive elements
- No loss of document metadata
Batch Processing
- Process entire folders of PDFs in one command
- Consistent compression settings across all files
- Progress reporting and error handling
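A folder-level wrapper along these lines would cover the three points above. It is only a sketch: `compress_folder` is a hypothetical name, the `_compressed` filename suffix is an assumption, and `compress` stands for any callable with the same signature as the `compress_pdf` function shown under Basic Usage.

```python
from pathlib import Path


def compress_folder(src_dir, dst_dir, compress, dpi=72, quality=50):
    """Run compress(input, output, dpi=..., quality=...) on every PDF in src_dir.

    Returns a list of (filename, error) pairs; error is None on success.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    pdfs = sorted(p for p in src.iterdir() if p.suffix.lower() == ".pdf")
    results = []
    for i, pdf in enumerate(pdfs, start=1):
        out = dst / f"{pdf.stem}_compressed.pdf"
        try:
            # The same settings are applied to every file in the batch.
            compress(str(pdf), str(out), dpi=dpi, quality=quality)
            error = None
        except Exception as exc:
            # Record the failure and continue with the remaining files.
            error = str(exc)
        status = "ok" if error is None else "failed: " + error
        print(f"[{i}/{len(pdfs)}] {pdf.name}: {status}")
        results.append((pdf.name, error))
    return results
```

Passing the compression function in as an argument keeps the batch loop independent of how a single file is compressed, which also makes it easy to try out with a stand-in function.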
Customization
- Fine-tune DPI for different use cases (screen viewing vs. printing)
- Adjust quality for different document types (photos vs. diagrams)
- Configure output directory and filename patterns
Compression Results by Use Case
| Use Case | Recommended DPI | JPEG Quality | Typical Reduction |
|---|---|---|---|
| Email/web sharing | 72 DPI | 50-60 | 80-90% |
| Screen viewing (tablets) | 96 DPI | 60-70 | 70-80% |
| Desktop viewing | 100 DPI | 65-75 | 65-75% |
| Print quality (modest) | 150 DPI | 75-85 | 40-60% |
| Print quality (high) | 200+ DPI | 85+ | 20-30% |
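For scripting, the table can be captured as a small lookup. The preset names and the `settings_for` helper below are hypothetical, and each quality value is simply the midpoint of the range in the table (85 for the open-ended "85+" row), so treat them as starting points rather than tuned values.

```python
# Hypothetical presets derived from the table above.
PRESETS = {
    "email": {"dpi": 72, "quality": 55},
    "tablet": {"dpi": 96, "quality": 65},
    "desktop": {"dpi": 100, "quality": 70},
    "print": {"dpi": 150, "quality": 80},
    "print_high": {"dpi": 200, "quality": 85},
}


def settings_for(use_case):
    """Return a copy of the dpi/quality settings for a named use case."""
    if use_case not in PRESETS:
        known = ", ".join(sorted(PRESETS))
        raise ValueError(f"unknown use case {use_case!r}; expected one of: {known}")
    return dict(PRESETS[use_case])
```

The result can be unpacked straight into a call such as `compress_pdf(input_pdf, output_pdf, **settings_for("email"))`, assuming the keyword names match those in Key Parameters.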
Use Cases
- Email distribution — Compress large reports, proposals, and presentations for reliable email delivery
- Cloud storage optimization — Reduce storage consumption across Google Drive, Dropbox, OneDrive, and other services
- Web publishing — Optimize PDFs for faster downloads and improved user experience
- Mobile compatibility — Make large PDFs accessible on bandwidth-limited mobile connections
- Document archival — Compress historical documents for long-term storage and backup
- Batch processing — Organize and compress entire document repositories automatically
- Digital publishing — Prepare ebooks, catalogs, and marketing materials for online distribution
- Compliance and backup — Reduce backup storage costs while maintaining document accessibility
Technical Specifications
- Language: Python 3.6+ (recent PyMuPDF releases may require a newer Python version; check its installation notes)
- Core dependencies: PyMuPDF (fitz), Pillow (PIL)
- Supported PDF types: All PDF variants (native, scanned, image-heavy)
- Image formats: JPEG, PNG, and other embedded formats
- Performance: Processes 50-100 page PDFs in 10-30 seconds (varies by image count and hardware)
- Memory usage: Minimal overhead; processes page-by-page
- Output format: PDF (standard, compatible with all readers)
Key Parameters
- dpi (integer): Target resolution for downscaling. Default: 72. Higher values keep more image detail and compress less. Typical range: 72-200 DPI.
- quality (integer, 0-100): JPEG compression quality. Default: 50. Lower values give smaller files with more visible artifacts; higher values give better image quality and larger files. Recommended range: 50-85.
- input_pdf (string): Path to the PDF file to compress.
- output_pdf (string): Path for the compressed output PDF.
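To see why `dpi` has such a large effect, it helps to work out the pixel counts. The sketch below is plain arithmetic with hypothetical helper names. It assumes a US Letter page (8.5 x 11 inches) scanned at 300 DPI as the starting point.

```python
def pixels_at_dpi(width_in, height_in, dpi):
    """Pixel dimensions of an image covering width_in x height_in inches."""
    return round(width_in * dpi), round(height_in * dpi)


def pixel_fraction(source_dpi, target_dpi):
    """Fraction of the original pixels that remain after downscaling."""
    return (target_dpi / source_dpi) ** 2


print(pixels_at_dpi(8.5, 11, 300))  # (2550, 3300)
print(pixels_at_dpi(8.5, 11, 72))   # (612, 792)
print(f"{pixel_fraction(300, 72):.1%} of the pixels remain")  # 5.8% of the pixels remain
```

Because the pixel count scales with the square of the DPI ratio, halving the DPI removes about three quarters of the image data before JPEG quality is even applied.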
Installation & Setup
Step 1: Install Python Dependencies
pip install pymupdf pillow
Step 2: Basic Usage
from pdf_compressor import compress_pdf

# Define paths and settings
input_pdf = "large_document.pdf"
output_pdf = "compressed_document.pdf"
desired_dpi = 72
jpeg_quality = 50

# Compress the PDF
compress_pdf(input_pdf, output_pdf, dpi=desired_dpi, quality=jpeg_quality)
print(f"Compression complete. Output: {output_pdf}")
Step 3: Command Line Usage
python compress_pdf.py --input large_file.pdf --output small_file.pdf --dpi 72 --quality 50
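A command line like the one above could be parsed with `argparse` roughly as follows. The flag names and defaults come from this page; the helper names, the help strings, and the range checks are assumptions, not necessarily how the actual script is written.

```python
import argparse


def positive_int(value):
    """argparse type: an integer greater than zero."""
    number = int(value)
    if number <= 0:
        raise argparse.ArgumentTypeError("must be a positive integer")
    return number


def quality_int(value):
    """argparse type: an integer from 0 to 100."""
    number = int(value)
    if not 0 <= number <= 100:
        raise argparse.ArgumentTypeError("must be between 0 and 100")
    return number


def build_parser():
    parser = argparse.ArgumentParser(
        description="Compress a PDF by downscaling and recompressing its images."
    )
    parser.add_argument("--input", required=True, help="path to the PDF to compress")
    parser.add_argument("--output", required=True, help="path for the compressed PDF")
    parser.add_argument("--dpi", type=positive_int, default=72,
                        help="target image resolution (default: 72)")
    parser.add_argument("--quality", type=quality_int, default=50,
                        help="JPEG quality, 0-100 (default: 50)")
    return parser


# Parsing an explicit argument list keeps the example runnable anywhere.
args = build_parser().parse_args(["--input", "large_file.pdf", "--output", "small_file.pdf"])
print(args.dpi, args.quality)  # 72 50
```

Validating the ranges in the parser means an out-of-range value is rejected with a usage message before any file is opened.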
Advanced Features
- Batch compression — Process entire directories with consistent settings
- Progress tracking — Monitor compression progress for large batches
- Size estimation — Preview compression ratios before applying changes
- Selective image compression — Compress only above/below size thresholds
- Adaptive quality — Automatically adjust quality based on image type (photos vs. graphics)
- Compression logging — Track compression ratios and size savings
- Scheduled compression — Integrate with cron jobs or task schedulers
Real-World Applications
- Legal firm — Reduced 10,000+ case document repository from 500GB to 80GB; improved email distribution for client deliverables
- Design agency — Compressed portfolio PDFs for web delivery; 87% reduction in file size without noticeable quality loss
- Insurance company — Automated compression of policy documents and claim forms; reduced cloud storage costs by 65%
- Publishing house — Optimized ebook PDFs for digital distribution; reduced distribution bandwidth requirements by 70%
- Government agency — Compressed archive of scanned historical records; saved 200GB+ in backup storage while maintaining accessibility
Why Choose This Tool
| Feature | This Tool | Adobe Acrobat | Online services | Email compression |
|---|---|---|---|---|
| Batch processing | ✓ | Partial | ✗ | ✗ |
| Fine-grained control (DPI/quality) | ✓ | Limited | Limited | ✗ |
| No file size limits | ✓ | ✓ | ✗ (limited to 100-500MB) | ✗ |
| Privacy (local processing) | ✓ | ✓ | ✗ (uploads to cloud) | ✗ |
| Free and open-source | ✓ | ✗ (paid) | Varies (often free but limited) | ✗ |
| No recurring costs | ✓ | ✗ (subscription) | Varies | N/A |
| Integrates with automation workflows | ✓ | ✗ | Limited | ✗ |
Best Practices
- Test first — Try compression settings on a sample PDF before batch processing
- Know your audience — Email recipients need different compression than archival
- Back up originals — Always keep uncompressed versions for future reference
- Balance quality and size — Start at DPI 100, quality 70; adjust based on results
- Monitor file sizes — Track compression ratios to ensure effectiveness
- Consistent settings — Use the same DPI/quality for document sets to maintain consistency
- Automate where possible — Schedule batch compression for regular document processing
Tool Highlights
Lightweight & Fast
- Minimal dependencies (PyMuPDF + Pillow)
- Processes PDFs in seconds, not minutes
- Low memory footprint even for large files
Developer-Friendly
- Clean, modular Python code
- Easy to integrate into automation scripts
- Works with command-line tools and Python APIs
- GPL-3.0 licensed for community enhancement
Privacy-Focused
- All processing happens locally—no cloud uploads
- Sensitive documents never leave your computer
- No account creation or subscriptions required
Repository Information
- Repository: github.com/towfique-elahe/pdf-compressor
- License: GPL-3.0
- Python version: 3.6+
- Status: Production-ready, actively maintained
- Use case: Document optimization, file size reduction, batch processing
What Users Say
- "Cut our document storage costs by two-thirds. Simple to use and incredibly effective." — IT Manager
- "Finally can email large PDFs without hitting attachment limits. Works perfectly for our office." — Office Administrator
- "The quality/size balance is excellent. We compress all our ebooks with this before distribution." — Publishing Team
- "As a developer, I love how easy it is to integrate into our document processing pipeline." — Software Engineer
Getting Started
- Clone the repository from GitHub
- Install dependencies:
pip install pymupdf pillow
- Run on a sample PDF:
python compress_pdf.py --input test.pdf --output test_compressed.pdf
- Verify results and compare file sizes
- Adjust DPI and quality settings for your use case
- Integrate into your workflow or automate with batch processing
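For the "verify results and compare file sizes" step, a few lines of standard-library Python are enough. The function names here are hypothetical, and the file names in the usage note are the ones from the sample command above.

```python
import os


def reduction_percent(original_bytes, compressed_bytes):
    """Size reduction as a percentage of the original size."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    return (original_bytes - compressed_bytes) / original_bytes * 100


def size_report(original_path, compressed_path):
    """One-line summary comparing two files on disk."""
    before = os.path.getsize(original_path)
    after = os.path.getsize(compressed_path)
    saved = reduction_percent(before, after)
    return f"{before / 1e6:.1f} MB -> {after / 1e6:.1f} MB ({saved:.0f}% smaller)"
```

After a test run, `print(size_report("test.pdf", "test_compressed.pdf"))` gives a quick check on whether the chosen DPI and quality settings deliver the reduction you expect.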
Future Roadmap
- GUI application for non-technical users
- Automatic quality selection based on image analysis
- Cloud storage integration (Google Drive, Dropbox direct upload)
- Compression presets for common use cases
- Performance improvements through multi-threaded processing
- Support for other document formats (Word, PowerPoint)
- Web service wrapper for API-based access
