A machine learning system for classifying different types of documents using computer vision and pattern recognition techniques. The system can distinguish between comics, books, manuscripts, typewritten documents, and tickets.
- Multiple Classifier Configurations: Four different classifier setups with varying preprocessing techniques
- Document Scanning: Automatic document rectification and perspective correction
- Flexible Preprocessing: Support for grayscale conversion and image transformations
- Dimensionality Reduction: Optional Linear Discriminant Analysis (LDA) for feature reduction
- Model Persistence: Save and load trained models for reuse
- Comprehensive Evaluation: Detailed performance metrics and classification reports
The system classifies documents into five categories:
- Comics: Comic books and graphic novels
- Libros (Books): Regular printed books
- Manuscrito (Manuscript): Handwritten documents
- Mecanografiado (Typewritten): Typewritten documents
- Tickets: Receipts, tickets, and similar small documents
| Classifier | LDA | Document Scanner | Grayscale |
|---|---|---|---|
| C1 | ❌ | ❌ | ❌ |
| C2 | ✅ | ❌ | ❌ |
| C3 | ❌ | ✅ | ✅ |
| C4 | ✅ | ✅ | ✅ |
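The LDA column in the table can be read as an optional stage in a scikit-learn pipeline. The sketch below is a minimal, assumed assembly based on the algorithm notes later in this README (StandardScaler, optional LDA with 4 components, linear-kernel SVM). The `build_classifier` helper, the stage order, and the synthetic data are hypothetical and are not taken from `main.py`. The scanner and grayscale columns change the image before features are extracted, so they are not pipeline stages in this sketch.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_classifier(use_lda: bool) -> Pipeline:
    """Hypothetical helper: scaler -> optional LDA -> linear SVM."""
    steps = [("scaler", StandardScaler())]
    if use_lda:
        # Five classes allow at most 5 - 1 = 4 discriminant components.
        steps.append(("lda", LinearDiscriminantAnalysis(n_components=4)))
    steps.append(("svm", SVC(kernel="linear")))
    return Pipeline(steps)


# Synthetic stand-in for flattened image features: 5 classes, 20 samples each,
# with each class shifted by a different offset so the classes are separable.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50)) + np.repeat(np.arange(5), 20)[:, None]
y = np.repeat(["comics", "libros", "manuscrito", "mecanografiado", "tickets"], 20)

c1 = build_classifier(use_lda=False).fit(X, y)  # C1-style: no LDA
c2 = build_classifier(use_lda=True).fit(X, y)   # C2-style: with LDA

# Run only the scaler + LDA stages to inspect the reduced feature size.
print(c2[:-1].transform(X).shape)  # (100, 4)
```

In this sketch the LDA stage shrinks 50 input features to 4 before the SVM sees them, while C1 passes all features straight to the SVM.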
```
pip install opencv-python numpy scikit-learn joblib
```

- `opencv-python`: Image processing and computer vision
- `numpy`: Numerical computations
- `scikit-learn`: Machine learning algorithms
- `joblib`: Model serialization
- `argparse`: Command-line argument parsing (part of the Python standard library, so it does not need to be installed)
- Clone or download the repository
- Install the required dependencies
- Ensure you have the `scanner.py` module (`DocumentScanner` class)
- Create the following directory structure:
```
project/
├── data/
│   ├── Learning/
│   │   ├── comics/
│   │   ├── libros/
│   │   ├── manuscrito/
│   │   ├── mecanografiado/
│   │   └── tickets/
│   └── Test/
│       ├── comics/
│       ├── libros/
│       ├── manuscrito/
│       ├── mecanografiado/
│       └── tickets/
├── models/          (created automatically)
└── main.py
```
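Because each class folder name matches one of the five categories, a loader can walk the training or test directory and take each image's label from its parent folder. The sketch below assumes that convention; `collect_samples` and `CLASSES` are hypothetical names, not taken from `main.py`.

```python
from pathlib import Path

# Class folder names from the directory structure above (assumed to double as labels).
CLASSES = ["comics", "libros", "manuscrito", "mecanografiado", "tickets"]


def collect_samples(root):
    """Return (image_path, label) pairs, using each class folder name as the label."""
    root = Path(root)
    missing = [c for c in CLASSES if not (root / c).is_dir()]
    if missing:
        raise FileNotFoundError(f"Missing class folders under {root}: {missing}")
    samples = []
    for label in CLASSES:
        # Sort for a deterministic order; skip any sub-directories.
        for path in sorted((root / label).iterdir()):
            if path.is_file():
                samples.append((path, label))
    return samples
```

For example, `collect_samples("./data/Learning")` would list every training image with its class, and fails early if a class folder was not created.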
Train all classifier configurations:

```
python main.py --train
```

This will:
- Load training images from `./data/Learning/`
- Train four different classifier configurations
- Save models to the `./models/` directory
Evaluate all trained classifiers:

```
python main.py --test
```

This will:
- Load test images from `./data/Test/`
- Evaluate each classifier configuration
- Display accuracy scores and classification reports
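The accuracy line and per-class report shown during evaluation match what scikit-learn's `accuracy_score` and `classification_report` produce. That `main.py` calls these exact functions is an assumption; the label lists below are hypothetical.

```python
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical true and predicted labels for four test images.
y_true = ["comics", "libros", "libros", "tickets"]
y_pred = ["comics", "libros", "tickets", "tickets"]

accuracy = accuracy_score(y_true, y_pred)  # fraction of exact matches
print(f"Accuracy: {accuracy:.4f}")  # Accuracy: 0.7500

# Per-class precision, recall, F1 and support; zero_division=0 avoids warnings
# when a class is never predicted.
print(classification_report(y_true, y_pred, zero_division=0))
```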
Classify a single image:

```
python main.py path/to/image.jpg
```

This uses the best-performing classifier (C2 by default) to predict the document class.
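The three invocations above (`--train`, `--test`, or an image path) can be handled with `argparse`. This is a hypothetical sketch of such a command line, not the actual parser in `main.py`; `build_parser` and `parse_mode` are illustrative names.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the three documented invocations."""
    parser = argparse.ArgumentParser(description="Document type classifier (sketch)")
    parser.add_argument("--train", action="store_true", help="train all classifier configurations")
    parser.add_argument("--test", action="store_true", help="evaluate all trained classifiers")
    parser.add_argument("image", nargs="?", help="path of a single image to classify")
    return parser


def parse_mode(argv=None):
    """Return (mode, args), requiring exactly one of the three modes."""
    parser = build_parser()
    args = parser.parse_args(argv)
    chosen = [name for name, active in (
        ("train", args.train),
        ("test", args.test),
        ("predict", args.image is not None),
    ) if active]
    if len(chosen) != 1:
        # parser.error prints usage and exits with status 2.
        parser.error("choose exactly one of --train, --test, or an image path")
    return chosen[0], args
```

For example, `parse_mode(["path/to/image.jpg"])` returns `("predict", args)` with `args.image` set, while passing both `--train` and `--test` exits with a usage error.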
Configurable constants in `main.py`:
- `IMG_SIZE = (400, 300)`: Standard image dimensions for processing
- `BEST_CLASSIFIER = 1`: Index of the best-performing classifier, used for predictions (zero-based, so 1 selects C2)
- `TRAIN_DIR = "./data/Learning"`: Training data directory
- `TEST_DIR = "./data/Test"`: Test data directory
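With raw pixel features, `IMG_SIZE` fixes the feature-vector length: 400 × 300 × 3 = 360,000 values for a color image and 400 × 300 = 120,000 for a grayscale one. The NumPy sketch below shows that extraction step. It assumes `IMG_SIZE` is `(width, height)`, as `cv2.resize` expects for its `dsize` argument, and uses a nearest-neighbour resize as a stand-in for OpenCV; `resize_nearest` and `extract_features` are hypothetical helpers.

```python
import numpy as np

IMG_SIZE = (400, 300)  # value from the README; assumed to be (width, height) as in cv2.resize


def resize_nearest(image, size):
    """Nearest-neighbour resize; a NumPy stand-in for cv2.resize(image, size)."""
    width, height = size
    rows = np.arange(height) * image.shape[0] // height
    cols = np.arange(width) * image.shape[1] // width
    # Result has shape (height, width, channels): NumPy arrays are row-major.
    return image[rows[:, None], cols]


def extract_features(image, to_gray=False):
    """Resize, optionally convert BGR to grayscale, and flatten to a raw-pixel vector."""
    resized = resize_nearest(image, IMG_SIZE)
    if to_gray:
        # ITU-R BT.601 luma weights in BGR order (the weights OpenCV uses for BGR2GRAY).
        resized = resized @ np.array([0.114, 0.587, 0.299])
    return resized.astype(np.float32).ravel()
```

Note that OpenCV's `dsize` is `(width, height)` while NumPy shapes are `(height, width)`, so a 400x300 target produces arrays of shape `(300, 400, 3)`.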
Modify `CLASSIFIER_CONFIGS` to experiment with different preprocessing combinations:

```python
CLASSIFIER_CONFIGS = {
    'C1': {'use_lda': False, 'use_scanner': False, 'to_gray': False},
    'C2': {'use_lda': True, 'use_scanner': False, 'to_gray': False},
    'C3': {'use_lda': False, 'use_scanner': True, 'to_gray': True},
    'C4': {'use_lda': True, 'use_scanner': True, 'to_gray': True}
}
```

Organize your training and test data in the following structure:
```
data/
├── Learning/
│   ├── comics/
│   │   ├── comic1.jpg
│   │   ├── comic2.png
│   │   └── ...
│   ├── libros/
│   │   ├── book1.jpg
│   │   └── ...
│   └── ... (other classes)
└── Test/
    ├── comics/
    │   ├── test_comic1.jpg
    │   └── ...
    └── ... (other classes)
```
- Algorithm: Support Vector Machine (SVM) with linear kernel
- Features: Raw pixel values or LDA-transformed features
- Preprocessing: StandardScaler for feature normalization
- Document Scanner: Automatic document rectification using perspective transformation (C3 and C4)
- Linear Discriminant Analysis: Dimensionality reduction to 4 components, the maximum for five classes, since LDA yields at most one fewer component than the number of classes (C2 and C4)
- Grayscale Conversion: Color to grayscale transformation (C3 and C4)
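The internals of `scanner.py` are not shown here. Perspective rectification is usually done by computing a 3×3 homography that maps the four detected document corners onto a rectangle, which is the linear system `cv2.getPerspectiveTransform` solves before `cv2.warpPerspective` resamples the image. The NumPy sketch below solves that system directly; corner detection is omitted, and the corner coordinates and helper names are hypothetical.

```python
import numpy as np


def perspective_transform(src, dst):
    """Solve for the 3x3 homography H (with H[2, 2] = 1) mapping 4 src points onto 4 dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), and likewise for v with h3..h5.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(a, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def apply_homography(H, points):
    """Map (x, y) points through H using homogeneous coordinates."""
    pts = np.hstack([np.asarray(points, dtype=float), np.ones((len(points), 1))])
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:]


# Hypothetical corners of a tilted page: top-left, top-right, bottom-right, bottom-left.
corners = [(120, 80), (880, 140), (840, 700), (60, 640)]
target = [(0, 0), (399, 0), (399, 299), (0, 299)]  # a 400x300 output rectangle
H = perspective_transform(corners, target)
print(np.round(apply_homography(H, corners), 6))
```

Four point pairs give exactly eight equations for the eight unknowns, so the corners must be in a consistent order and no three may be collinear.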
Example training and evaluation output:

```
[TRAINING COMPLETED] Classifier C1
[TRAINING COMPLETED] Classifier C2
[TRAINING COMPLETED] Classifier C3
[TRAINING COMPLETED] Classifier C4
[EVALUATING] Classifier C1
Accuracy: 0.8542
              precision    recall  f1-score   support
...
=== RESULTS SUMMARY ===
C1: 0.8542
C2: 0.8790
C3: 0.8333
C4: 0.8611
```

Example prediction output:

```
Prediction: libros
```
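In the results summary above, C2 has the highest accuracy, which is consistent with `BEST_CLASSIFIER = 1` if the index is zero-based over C1 to C4. The README lists `BEST_CLASSIFIER` as a fixed constant; the sketch below only shows how that index could be derived from evaluation results, assuming the configurations are kept in C1 to C4 order.

```python
# Accuracies copied from the example results summary.
results = {"C1": 0.8542, "C2": 0.8790, "C3": 0.8333, "C4": 0.8611}

names = list(results)                          # insertion order: C1, C2, C3, C4
best_name = max(results, key=results.get)      # configuration with the highest accuracy
best_index = names.index(best_name)            # zero-based position of that configuration
print(best_name, best_index)  # C2 1
```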
- `main.py`: Main script with all functionality
- `scanner.py`: Document scanner module (required dependency)
- `models/`: Directory containing trained models
  - `model_C*.joblib`: Trained SVM models
  - `scaler_C*.joblib`: Feature scalers
  - `model_C*_lda.joblib`: LDA transformers (for applicable classifiers)
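The file names above follow a per-classifier pattern. The sketch below shows one way to save and reload those artifacts with `joblib.dump` and `joblib.load` under that naming convention; `artifact_paths`, `save_artifacts`, and `load_artifacts` are hypothetical helpers, and the dictionary keys are assumptions.

```python
from pathlib import Path

import joblib


def artifact_paths(name, use_lda, models_dir="./models"):
    """File names following the README's convention: model_C*, scaler_C*, model_C*_lda."""
    base = Path(models_dir)
    paths = {"model": base / f"model_{name}.joblib", "scaler": base / f"scaler_{name}.joblib"}
    if use_lda:
        paths["lda"] = base / f"model_{name}_lda.joblib"
    return paths


def save_artifacts(name, artifacts, models_dir="./models"):
    """Dump each fitted object, e.g. {'model': svm, 'scaler': scaler, 'lda': lda}."""
    Path(models_dir).mkdir(parents=True, exist_ok=True)  # create models/ if needed
    paths = artifact_paths(name, "lda" in artifacts, models_dir)
    for key, obj in artifacts.items():
        joblib.dump(obj, paths[key])
    return paths


def load_artifacts(name, use_lda, models_dir="./models"):
    """Load the objects back; raises FileNotFoundError if training has not been run."""
    return {key: joblib.load(path) for key, path in artifact_paths(name, use_lda, models_dir).items()}
```

A missing file raising `FileNotFoundError` here corresponds to the "Missing models" case: the training mode has to run first.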
Tips for best results:
- Image Quality: Use high-quality, well-lit images for better results
- Consistent Sizing: The system resizes all images to 400x300, so keep source images close to that aspect ratio where possible to avoid distortion
- Data Balance: Ensure balanced representation of all document classes in training data
- Scanner Usage: Use document scanner for images with perspective distortion or poor alignment
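To check the data-balance tip before training, you can count images per class. This is a hypothetical helper using only the standard library; the README does not prescribe a balance threshold, and the label counts below are invented for illustration.

```python
from collections import Counter


def class_balance(labels):
    """Return per-class counts and the ratio between the largest and smallest class."""
    counts = Counter(labels)
    ratio = max(counts.values()) / min(counts.values())
    return counts, ratio


# Hypothetical label list, e.g. the folder names of the training images.
labels = (["comics"] * 40 + ["libros"] * 38 + ["manuscrito"] * 12
          + ["mecanografiado"] * 35 + ["tickets"] * 40)
counts, ratio = class_balance(labels)
print(counts.most_common(), round(ratio, 2))
```

A ratio well above 1 (here the smallest class has under a third as many images as the largest) suggests adding images to the under-represented class.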
Troubleshooting:
- "Could not load image" error: Check the file path and image format compatibility
- "Could not rectify image" error: The document scanner failed; the image may be too distorted
- Low accuracy: Ensure sufficient training data and proper class balance
- Missing models: Run training mode before evaluation or prediction
Supported image formats:
- JPEG (.jpg, .jpeg)
- PNG (.png)
- BMP (.bmp)
- TIFF (.tiff)
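One way to catch unsupported files before loading is to filter by extension. Note that OpenCV's `cv2.imread` returns `None` instead of raising when it cannot read a file, which is one likely source of a "Could not load image" error. The helper below is a hypothetical sketch; treating extensions case-insensitively is an assumption.

```python
from pathlib import Path

# Extensions listed in this README; case-insensitive matching is an assumption.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tiff"}


def is_supported_image(path):
    """Return True if the file extension is one of the listed image formats."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `is_supported_image("scan.JPG")` is `True`, while `is_supported_image("notes.txt")` is `False`.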
This project is provided as-is for educational and research purposes.
When contributing to this project:
- Maintain code style and documentation standards
- Test new features with multiple classifier configurations
- Update this README for any new functionality or configuration options