Project: Image Compression and Classification System

Implementation of DCT compression algorithms, ML classification with HOG, and mosaic generation

Project Summary

This academic project covered three areas of image processing and machine learning. I implemented from scratch a compression algorithm based on the Discrete Cosine Transform (DCT) that surpassed a JPEG baseline in three tested configurations. I also developed and evaluated several Machine Learning models for handwritten digit classification, raising the best accuracy from 98.31% to 98.94% with the HOG descriptor. Finally, I created a photographic mosaic generator that applies these concepts creatively.

  • Best PSNR Achieved: 24.65 dB
  • Compression Ratio at Best PSNR: 6.83x
  • Accuracy with SVM + HOG: 98.94%
  • Algorithms Compared: 3

Skills and Technologies

Languages & Libraries

  • Python
  • NumPy & SciPy
  • scikit-image
  • scikit-learn
  • Matplotlib
  • OpenCV

Technical Concepts

  • Discrete Cosine Transform (DCT)
  • Lossy Image Compression
  • Histogram of Oriented Gradients (HOG)
  • Machine Learning (KNN, SVM, Trees)
  • Digital Image Processing
  • Image Quality Metrics (PSNR, SSIM, ΔE)

Methodologies

  • Algorithm Implementation from Scratch
  • Parameter Optimization
  • Comparative Quantitative Analysis
  • Cross-Validation
  • Feature Engineering
  • Results Visualization

Part 1: Image Compression with DCT

Challenge

Implement an image compression algorithm that, for certain configurations, surpasses a JPEG baseline in both image quality (PSNR) and compression ratio.

Implementation

I developed a complete codec from scratch that includes:

  • Conversion from RGB to YCbCr color space
  • Image division into 8x8 pixel blocks
  • Application of Discrete Cosine Transform (DCT) to each block
  • Quantization using custom matrices
  • Encoding through zig-zag pattern and RLE (Run-Length Encoding)
  • Serialization of the encoded data using pickle
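
The block-level steps listed above (DCT, quantization, zig-zag scan, RLE) can be sketched for a single 8x8 block as follows. This is a minimal illustration, not the project's actual codec: the function names, the level shift by 128, and the [value, count] run format are assumptions, and color conversion and file storage are left out.

```python
import numpy as np

N = 8  # block size used by the codec described above


def dct_matrix(n=N):
    """Orthonormal DCT-II matrix C, so that C @ block @ C.T is the 2-D DCT."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    x = np.arange(n).reshape(1, -1)  # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def zigzag_indices(n=N):
    """(row, col) pairs of an n x n block in zig-zag order, low to high frequency."""
    return sorted(
        ((i, j) for i in range(n) for j in range(n)),
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )


def rle_encode(values):
    """Run-length encode a sequence of integers as [value, count] pairs."""
    runs = []
    for v in values:
        v = int(v)
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def rle_decode(runs):
    """Expand [value, count] pairs back into a flat list."""
    return [v for v, count in runs for _ in range(count)]


def encode_block(block, q):
    """DCT, quantize, zig-zag scan and run-length encode one 8x8 block."""
    c = dct_matrix()
    coeffs = c @ (block.astype(np.float64) - 128.0) @ c.T  # assumed level shift
    quantized = np.round(coeffs / q).astype(np.int32)
    return rle_encode(quantized[i, j] for i, j in zigzag_indices())


def decode_block(runs, q):
    """Invert encode_block: undo RLE and zig-zag, dequantize, inverse DCT."""
    c = dct_matrix()
    quantized = np.zeros((N, N))
    for (i, j), v in zip(zigzag_indices(), rle_decode(runs)):
        quantized[i, j] = v
    pixels = c.T @ (quantized * q) @ c + 128.0
    return np.clip(np.round(pixels), 0, 255).astype(np.uint8)


# Usage on a synthetic gradient block with a linear quantization matrix.
i, j = np.indices((N, N))
q = 17 + 3 * (i + j)
block = (100 + 5 * i + 3 * j).astype(np.uint8)
runs = encode_block(block, q)
restored = decode_block(runs, q)
error = np.abs(restored.astype(int) - block.astype(int)).max()
print(len(runs), "runs, max absolute error:", error)
```

Because most high-frequency coefficients quantize to zero, the zig-zag scan groups them into long runs, which is what makes the RLE step effective.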

Optimization

Instead of using the standard JPEG quantization matrix, I ran a random search over quantization matrices of the form Q(i,j) = a + b(i+j), where i and j are the row and column indices within the 8x8 block. The search covered 200 iterations, sampling a between 3 and 100 and b between 1 and 40.
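
A minimal sketch of such a search is shown below. It assumes a and b are sampled as integers (the reported configurations are all integers) and that a hypothetical callback `evaluate(q)` compresses the test image with matrix q and returns a (PSNR, compression ratio) pair. The `toy_evaluate` function only stands in for the real codec so the example runs on its own.

```python
import numpy as np


def linear_quant_matrix(a, b, n=8):
    """Build Q(i, j) = a + b * (i + j) for an n x n block."""
    i, j = np.indices((n, n))
    return a + b * (i + j)


def random_search(evaluate, iterations=200, seed=0):
    """Sample (a, b) pairs at random and record the metrics for each one.

    `evaluate` is a hypothetical callback returning (psnr_db, compression_ratio).
    """
    rng = np.random.default_rng(seed)
    trials = []
    for _ in range(iterations):
        a = int(rng.integers(3, 101))  # a in [3, 100], upper bound exclusive
        b = int(rng.integers(1, 41))   # b in [1, 40]
        psnr_db, ratio = evaluate(linear_quant_matrix(a, b))
        trials.append({"a": a, "b": b, "psnr": psnr_db, "ratio": ratio})
    return trials


def toy_evaluate(q):
    """Placeholder metrics, NOT real measurements: coarser matrices score
    lower quality and higher ratio."""
    return 50.0 - 0.1 * q.mean(), 0.05 * q.mean()


trials = random_search(toy_evaluate)
print(len(trials), "configurations evaluated")
```

With only two parameters, the matrix grows linearly with frequency: b controls how much more aggressively high frequencies are quantized than low ones.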

Key Results

Comparison with JPEG

I found 3 configurations that surpassed JPEG:

  • a=17, b=3: PSNR=24.65 dB, Ratio=6.83
  • a=24, b=2: PSNR=24.44 dB, Ratio=7.43
  • a=4, b=6: PSNR=24.44 dB, Ratio=6.24

JPEG baseline for comparison: PSNR=24.41 dB, Ratio=5.79
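
The comparison can be checked directly from the figures reported above. The following sketch keeps a configuration only if it exceeds the JPEG result in both PSNR and compression ratio; the dictionary layout and function name are illustrative.

```python
# JPEG result and the three configurations reported in this section.
JPEG_BASELINE = {"psnr": 24.41, "ratio": 5.79}

reported = [
    {"a": 17, "b": 3, "psnr": 24.65, "ratio": 6.83},
    {"a": 24, "b": 2, "psnr": 24.44, "ratio": 7.43},
    {"a": 4, "b": 6, "psnr": 24.44, "ratio": 6.24},
]


def beats_baseline(trial, baseline=JPEG_BASELINE):
    """True when a trial is strictly better in both PSNR and compression ratio."""
    return trial["psnr"] > baseline["psnr"] and trial["ratio"] > baseline["ratio"]


winners = [t for t in reported if beats_baseline(t)]
for t in winners:
    psnr_gain = t["psnr"] - JPEG_BASELINE["psnr"]
    ratio_gain = t["ratio"] / JPEG_BASELINE["ratio"] - 1.0
    print(f"a={t['a']}, b={t['b']}: {psnr_gain:+.2f} dB, {ratio_gain:+.1%} ratio")
```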

Visual Analysis

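Although visual differences are minimal, the metrics show a measurable gain. The best configuration (a=17, b=3) raises PSNR by 0.24 dB (24.65 dB against 24.41 dB) while increasing the compression ratio from 5.79 to 6.83, about 18% higher. The algorithm therefore achieves higher compression while maintaining or slightly improving image quality.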

This result shows that the standard JPEG quantization matrix is not universally optimal and can be improved for specific images.

PSNR vs Compression Ratio (200 Experiments)

This scatter plot compares JPEG against the 200 random quantization matrices tested. The green points mark the three configurations that surpassed JPEG in both PSNR and compression ratio.

Part 2: Handwritten Digit Classification

Challenge

Develop and evaluate Machine Learning models to classify handwritten digits from the MNIST dataset, comparing raw pixels versus HOG features.

Methodology

I used the MNIST dataset with 70,000 images of 28x28 pixels. I implemented and compared three algorithms:

  • K-Nearest Neighbors (KNN) with k=3
  • Decision Tree with Gini criterion
  • SVM with RBF kernel (C=5, gamma=0.05)

I evaluated the models through 5-fold cross-validation to obtain robust results.
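
As a rough sketch of this evaluation setup, the snippet below runs the three classifiers with the hyperparameters listed above through 5-fold cross-validation in scikit-learn. To keep it quick and self-contained it uses scikit-learn's small built-in 8x8 digits dataset as a stand-in for MNIST, so its accuracies are not comparable to the results reported in this project. Scaling the pixels to [0, 1] and the fixed `random_state` are also assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in data: 1,797 digits of 8x8 pixels with values 0..16 (not MNIST).
X, y = load_digits(return_X_y=True)
X = X / 16.0  # scale to [0, 1]

models = {
    "KNN (k=3)": KNeighborsClassifier(n_neighbors=3),
    "Decision Tree (Gini)": DecisionTreeClassifier(criterion="gini", random_state=0),
    "SVM (RBF)": SVC(kernel="rbf", C=5, gamma=0.05),
}

# cross_val_score uses stratified folds for classifiers when cv is an integer.
mean_accuracy = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

for name, acc in mean_accuracy.items():
    print(f"{name}: {acc:.4f}")
```

Averaging over five folds reduces the dependence of each score on one particular train/test split.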

HOG Feature Extraction

I implemented the Histogram of Oriented Gradients (HOG) descriptor, which captures the structure and shape of digits through the distribution of local gradient orientations. The resulting features are more robust to small shifts and intensity changes than raw pixels.
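
To illustrate the idea, here is a deliberately simplified HOG-style descriptor in NumPy. It is a sketch under stated assumptions, not the implementation used in the project: the 7x7 cell size, the 9 orientation bins, and the per-cell normalization are illustrative choices, and a full HOG normalizes over overlapping blocks of cells instead.

```python
import numpy as np


def simple_hog(image, cell=7, bins=9):
    """Simplified HOG: one magnitude-weighted orientation histogram per cell."""
    img = image.astype(np.float64)
    gy, gx = np.gradient(img)  # derivatives along rows and columns
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees.
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bin_idx = np.minimum((angle / (180.0 / bins)).astype(int), bins - 1)

    rows, cols = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((rows, cols, bins))
    for r in range(rows):
        for c in range(cols):
            window = (slice(r * cell, (r + 1) * cell), slice(c * cell, (c + 1) * cell))
            hist[r, c] = np.bincount(
                bin_idx[window].ravel(),
                weights=magnitude[window].ravel(),
                minlength=bins,
            )

    # Per-cell L2 normalization (simplification of block normalization).
    norms = np.linalg.norm(hist, axis=2, keepdims=True)
    return (hist / (norms + 1e-6)).ravel()


# A 28x28 image with a vertical edge gives 4 x 4 cells x 9 bins = 144 features.
edge = np.zeros((28, 28))
edge[:, 14:] = 1.0
features = simple_hog(edge)
print(features.shape)
```

Each cell summarizes which edge directions dominate locally, so a small shift of a stroke changes the features far less than it changes the raw pixel values.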

Results

Model Comparison (Average Accuracy)

Model                  Without HOG   With HOG   Improvement (percentage points)
KNN (k=3)              97.12%        97.37%     +0.25
Decision Tree (Gini)   87.01%        86.87%     -0.14
SVM (RBF)              98.31%        98.94%     +0.63

Results Analysis

SVM with RBF kernel proved to be the most effective model, achieving 98.94% accuracy with HOG features. HOG improved both KNN (+0.25 percentage points) and SVM (+0.63 percentage points), while the Decision Tree's accuracy dropped slightly (-0.14 percentage points).

The HOG descriptor provides more compact and less noisy features than raw pixels, which is consistent with the improvement seen in models that rely on distances between feature vectors, such as KNN and the RBF-kernel SVM.

HOG Visualization

Applying HOG to a PPM image made it possible to visualize how the descriptor captures edges and image structure, supporting its usefulness for classification tasks where shape matters more than exact pixel values.

HOG Feature Representation

This visualization highlights how HOG captures structural information from the images, producing more discriminative features for classification models.

Part 3: Photographic Mosaic Generation

Challenge

Create a mosaic composed of thumbnails that preserves the structure and appearance of a large reference image.

Implementation

I divided a large image (Tokyo.jpg) into 32x32 pixel blocks and used the CIFAR-10 dataset as thumbnail source. I implemented and compared three approaches to find the most similar thumbnail for each block:

  • RGB Mean: Calculates average color of each block and thumbnail
  • HSV Histogram: Uses color distribution in HSV space
  • Combined: Combines RGB mean with HSV histogram and color adjustment

For comparison, I used Euclidean and Cityblock (L1) distances.
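
The simplest of the three approaches, mean RGB matching, can be sketched as below with either distance. The array layout (thumbnails as an array of shape (n, 32, 32, 3), which matches the 32x32 size of CIFAR-10 images) and the function names are assumptions. The HSV histogram and the color adjustment of the combined method are not shown.

```python
import numpy as np


def block_means(image, size=32):
    """Mean RGB color of each size x size block, shape (rows, cols, 3)."""
    h, w = image.shape[0] // size, image.shape[1] // size
    blocks = image[: h * size, : w * size].reshape(h, size, w, size, 3)
    return blocks.mean(axis=(1, 3))


def match_tiles(image, tiles, size=32, metric="euclidean"):
    """Index of the closest thumbnail (by mean color) for every block."""
    targets = block_means(image, size).reshape(-1, 3)
    tile_means = tiles.reshape(len(tiles), -1, 3).mean(axis=1)
    # Pairwise differences, shape (blocks, tiles, 3). For very large inputs
    # this should be computed in chunks to limit memory use.
    diff = targets[:, None, :] - tile_means[None, :, :]
    if metric == "euclidean":
        dist = np.sqrt((diff ** 2).sum(axis=2))
    elif metric == "cityblock":
        dist = np.abs(diff).sum(axis=2)
    else:
        raise ValueError(f"unknown metric: {metric}")
    return dist.argmin(axis=1)


def build_mosaic(image, tiles, size=32, metric="euclidean"):
    """Replace every block of the image with its best-matching thumbnail."""
    h, w = image.shape[0] // size, image.shape[1] // size
    idx = match_tiles(image, tiles, size, metric).reshape(h, w)
    # (h, w, size, size, 3) -> (h, size, w, size, 3) -> full image
    return tiles[idx].transpose(0, 2, 1, 3, 4).reshape(h * size, w * size, 3)


# Toy example: three solid-color thumbnails and a 2 x 2 block image.
tiles = np.zeros((3, 32, 32, 3), dtype=np.uint8)
tiles[0, ..., 0] = 255  # red
tiles[1, ..., 1] = 255  # green
tiles[2, ..., 2] = 255  # blue
image = np.zeros((64, 64, 3), dtype=np.uint8)
image[:32, :32, 0] = 200
image[:32, 32:, 1] = 200
image[32:, :32, 2] = 200
image[32:, 32:, 0] = 180
mosaic = build_mosaic(image, tiles, metric="cityblock")
print(mosaic.shape)
```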

Evaluation

I compared results using three objective metrics:

  • PSNR (Peak Signal-to-Noise Ratio): Measures pixel-to-pixel similarity
  • SSIM (Structural Similarity): Evaluates structural similarity
  • ΔE (Delta E): Measures perceptual color difference
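
Of these metrics, PSNR is simple enough to write out directly. The sketch below assumes 8-bit images (peak value 255) and uses an illustrative function name. SSIM and ΔE involve local statistics and a conversion to the Lab color space, so they are usually taken from a library such as scikit-image instead of being written by hand.

```python
import numpy as np


def psnr(reference, test, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two images of equal shape."""
    diff = reference.astype(np.float64) - test.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)


# Example: every pixel differs from the reference by exactly one gray level.
reference = np.full((32, 32, 3), 100, dtype=np.uint8)
shifted = reference + 1
print(f"{psnr(reference, shifted):.2f} dB")
```

Since PSNR depends only on the mean squared pixel error, mosaics score low on it even when they look convincing from a distance, which is why the other two metrics are reported alongside it.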

Results

Method                               PSNR (dB)   SSIM    ΔE (avg)
Mosaic - Mean RGB                    11.531      0.117   19.010
Mosaic - HSV Histogram               10.022      0.145   22.124
Mosaic - Combined (RGB+HSV+Adjust)   11.622      0.131   18.411

Results Analysis

The combined method (RGB+HSV+Adjust) achieved the best PSNR (11.622) and the lowest color difference (ΔE = 18.411), making it the most faithful to the original image on those two metrics. The HSV histogram method achieved the best SSIM (0.145), but it also had the lowest PSNR and the highest ΔE, and its mosaic looked less natural because chromatic fidelity suffered.

The color adjustment applied in the combined method was crucial for improving chromatic correspondence between original blocks and selected thumbnails.

Visual Conclusion

Visual evaluation agreed with the quantitative results: the mosaic generated with the combined method offers the best balance of structural detail and chromatic fidelity, creating a more recognizable representation of the original image.

Visual Comparison of Mosaic Methods

This combined figure shows the original image next to the three mosaic versions created with different descriptors. The combined method offers the best balance between structure and color fidelity.

General Conclusions

Image Compression

The custom implementation of the DCT compression algorithm showed that, for the tested image, it is possible to surpass the JPEG result in both PSNR and compression ratio by optimizing the quantization matrix. The results suggest that adaptive matrices could offer further improvements over the standard approach.

Classification with ML

In these experiments, HOG features improved the performance of both KNN and SVM, while the Decision Tree saw no benefit. SVM with RBF kernel was the most effective algorithm for this task, reaching 98.94% accuracy.

Photographic Mosaics

The combination of multiple descriptors (RGB + HSV) with color adjustment produces the best overall results in mosaic generation, balancing structural and chromatic fidelity. In PSNR and ΔE, this hybrid approach surpasses both single-descriptor methods.

Next Steps

As future extensions of this work, I could explore adaptive quantization matrices based on image content, convolutional neural networks for digit classification, and semantic criteria in mosaic generation.