This project is an in-depth analysis of the performance of Deep Learning algorithms, contrasting two fundamental architectural paradigms: Convolutional Neural Networks (CNNs) and Multilayer Perceptrons (MLPs). The study explores how each architecture processes images, reacts to data augmentation, and benefits from techniques such as Transfer Learning.
The project is modular and designed to run fully reproducible experiments driven by a single configuration file.
- **config.toml**: The "command center" of the project. All experiment parameters are defined here: the dataset (e.g. `imagebits`), the model type (`CNN` or `MLP`), the training hyperparameters (epochs, learning rate, dropout), and the desired augmentations.
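A hypothetical layout for such a file is shown below; the key names are illustrative, not the project's actual schema:

```toml
# Illustrative config.toml -- actual keys may differ
[experiment]
dataset = "imagebits"      # or "land_patches"
model = "CNN"              # or "MLP"

[training]
epochs = 50
learning_rate = 1e-3
dropout = 0.3
pretrained_weights = ""    # non-empty path enables Transfer Learning

[augmentations]
rotation = 10              # degrees
noise = true
```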
- **main.py**: The main execution script. It reads the configuration, initializes the datasets and the chosen model, applies the Transfer Learning logic (if a path to pre-trained weights is provided), and triggers the training and evaluation processes.
- **data_loader.py**: Handles data loading and preprocessing. It uses the `albumentations` library to apply advanced transformations (e.g. rotations, noise, cropping). It also includes the `TransformWrapper` class, which guarantees that image resizing (to the standard 64x64 size) is applied correctly before any further processing.
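The resize-first idea behind `TransformWrapper` can be sketched framework-free (the real class delegates to `albumentations`; the nearest-neighbour resize and the callable interface here are simplifications):

```python
def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D list-of-lists image."""
    in_h, in_w = len(img), len(img[0])
    return [[img[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]

class TransformWrapper:
    """Guarantee resizing to a fixed size before any other transform runs."""
    def __init__(self, transforms=(), size=(64, 64)):
        self.transforms = list(transforms)
        self.size = size

    def __call__(self, img):
        img = resize_nearest(img, *self.size)   # always resize first
        for t in self.transforms:               # then apply augmentations
            img = t(img)
        return img
```

Fixing the resize order this way means every augmentation sees inputs of the same, known shape.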
- **models.py**: Contains the network architectures and the training engine.
  - Defines `SimpleCNN` (based on 3 convolutional blocks with pooling) and `SimpleMLP` (a dense, pyramidal architecture with layers of 2048, 1024, and 512 neurons, stabilized by Batch Normalization).
  - The `train_engine` function manages the training loop, including Early Stopping and dynamic learning rate adjustment (scheduler) to prevent over-training.
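The Early Stopping logic used inside a training loop follows a standard pattern; this is a generic sketch, not the project's exact implementation in `train_engine`:

```python
class EarlyStopping:
    """Stop training once validation loss has not improved for `patience` epochs."""
    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.counter = 0

    def step(self, val_loss):
        """Call once per epoch; returns True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss      # new best -> reset the patience counter
            self.counter = 0
        else:
            self.counter += 1         # no improvement this epoch
        return self.counter >= self.patience
```

A learning-rate scheduler typically watches the same validation metric, shrinking the learning rate before the stopper gives up entirely.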
- **visualization.py**: Responsible for exploratory data analysis (EDA) and reporting. It can generate useful visualizations such as "ghosts" (the average image per class), RGB distributions and, at the end of evaluation, performance curves (loss/accuracy) and a confusion matrix.
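A per-class "ghost" is simply a pixel-wise mean. A minimal sketch over flattened images (the actual module works on RGB arrays and also plots the result):

```python
def class_ghosts(images, labels):
    """Pixel-wise mean image ("ghost") for each class.

    `images` are flattened pixel lists of equal length; `labels` are class ids.
    """
    by_class = {}
    for img, lbl in zip(images, labels):
        by_class.setdefault(lbl, []).append(img)
    # zip(*imgs) groups the values of each pixel position across the class
    return {lbl: [sum(px) / len(imgs) for px in zip(*imgs)]
            for lbl, imgs in by_class.items()}
```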
The experiments were run on two distinct datasets: ImageBits (geometric shapes and objects) and Land Patches (satellite textures). The results highlight major architectural differences:
- **Clear superiority of CNNs**: On the ImageBits set, the best CNN model achieved a test-set accuracy of 70.48%, while the best MLP model reached only 51.94%. The gap persists even though the MLP uses ~25 million parameters versus the CNN's ~590,000.
- **The "Rotation Catastrophe" for MLPs**: Experiments showed that geometric augmentations severely hurt MLP networks. A simple 10-degree rotation massively permutes pixels in the flattened input vector, making optimization and convergence virtually impossible for an MLP (accuracy dropped below 48%). CNNs, by contrast, tolerate these transformations far better: their shared local filters give them approximate translation invariance, so local patterns remain detectable after small geometric shifts.
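The flattening problem is easy to see with a toy example. A 90° rotation is used here purely because it stays on the pixel grid; any rotation angle similarly sends almost every pixel to a different index of the flattened vector, so an MLP's position-specific weights no longer line up:

```python
def flatten(img):
    """Row-major flattening, as an MLP input layer would see the image."""
    return [px for row in img for px in row]

def rotate90(img):
    """Rotate a list-of-lists image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]

before = flatten(img)
after = flatten(rotate90(img))
# Same pixel values, but only the centre pixel keeps its flattened index.
moved = sum(a != b for a, b in zip(before, after))
```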
- **Transfer Learning works differently**: On the Land Patches images, Transfer Learning boosted the CNN to an excellent 90.60% accuracy (a 3.5% improvement over training from scratch). For the MLP, the transfer brought a negligible improvement of only 0.7%, because its neurons learn position-specific pixel correlations that do not transfer between different image types.
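The usual Transfer Learning bookkeeping (copy pre-trained parameters whose names and shapes match, and leave the rest, e.g. a freshly sized classifier head, at their random initialization) can be sketched framework-free; `main.py`'s exact logic may differ:

```python
def load_matching_weights(model_state, pretrained_state):
    """Copy pretrained parameters into model_state when name and shape agree.

    Parameters are plain lists here, standing in for tensors; returns the
    names that were actually transferred.
    """
    transferred = []
    for name, weights in pretrained_state.items():
        if name in model_state and len(weights) == len(model_state[name]):
            model_state[name] = weights       # shape matches: reuse
            transferred.append(name)
    return transferred                        # mismatches stay freshly initialized
```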
- **Behavior on textures vs. shapes**: Although the MLP is weak at identifying geometric shapes, it performed surprisingly well on homogeneous textures (such as forests or industrial areas in Land Patches), where the global color distribution partially compensates for its lack of local shape detection.