The AMaC (Adaptive Momentum and Cosine Annealing) Optimizer is a custom PyTorch optimizer that combines several advanced optimization techniques to improve training stability and performance. By integrating adaptive momentum estimation, learning-rate warmup with cosine annealing, Stochastic Weight Averaging (SWA), and the other methods listed below, AMaC achieves faster convergence, better stability, and improved generalization.
- Learning Rate Warm-up: Gradual increase in learning rate at the beginning of training.
- Cosine Annealing: Smooth decrease in learning rate.
- Gradient Centralization: Centering gradients to improve stability.
- Momentum Updates: Using moving averages of the gradient and squared gradient.
- Adaptive Noise Injection: Injecting noise into updates to escape local minima.
- Gradient Clipping: Clipping gradient norms to prevent exploding gradients.
- Dynamic Weight Decay: Adjusting weight decay dynamically during training.
- Lookahead Mechanism: Periodically updating slow weights with the current weights.
- Stochastic Weight Averaging (SWA): Averaging model weights over time for a smoother training trajectory.
| Feature | Adam Optimizer | AMaC Optimizer |
|---|---|---|
| Learning Rate | Fixed or dynamically adjusted | Dynamically adjusted with warmup and cosine annealing |
| First Moment Estimate | Yes (moving average of gradients) | Yes (moving average of gradients) |
| Second Moment Estimate | Yes (moving average of squared gradients) | Yes (moving average of squared gradients) |
| Bias Correction | Yes | Yes |
| Weight Decay | Fixed | Dynamic |
| Gradient Clipping | No | Yes |
| Noise Injection | No | Yes (adaptive noise injection) |
| Momentum | Yes (via first moment estimate) | Yes (first moment combined with Lookahead mechanism) |
| Lookahead Mechanism | No | Yes |
| Stochastic Weight Averaging (SWA) | No | Yes |
| Learning Rate Warmup | Optional | Yes |
| Gradient Centralization | No | Yes |
| Optimizer Type | Adaptive | Adaptive |
| Parameter Update Rule | Updates parameters based on the average of past gradients (momentum) and current gradient direction. | Updates parameters based on a combination of momentum, adaptive learning rates, noise injection, and Lookahead mechanism. |
| Implementation Complexity | Moderate | High |
| Use Case | General-purpose optimization | Enhanced stability and convergence, particularly in noisy or complex optimization landscapes. |
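Both optimizers share the Adam-style moment estimates and bias correction listed in the table; AMaC layers its additional techniques on top of this core. A minimal sketch of that shared update, using the usual Adam hyperparameter names (`beta1`, `beta2`, `eps`):

```python
import torch

def moment_update(grad, m, v, step, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam-style first/second moment estimates with bias correction (illustrative)."""
    m = beta1 * m + (1 - beta1) * grad            # first moment: moving average of gradients
    v = beta2 * v + (1 - beta2) * grad * grad     # second moment: moving average of squared gradients
    m_hat = m / (1 - beta1 ** step)               # bias-corrected first moment (step starts at 1)
    v_hat = v / (1 - beta2 ** step)               # bias-corrected second moment
    direction = m_hat / (v_hat.sqrt() + eps)      # per-parameter adaptive update direction
    return direction, m, v
```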
Benefits
Improved Convergence Speed:
- Learning Rate Warmup and Cosine Annealing: AMaC starts with a warmup phase where the learning rate gradually increases, followed by a cosine annealing schedule. This helps avoid the pitfalls of high initial learning rates and provides a more stable training process.
- Adaptive Learning Rate: The dynamic adjustment of the learning rate ensures that the optimizer can converge quickly during different phases of training.
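A minimal sketch of such a warmup-plus-cosine schedule; the parameter names and default values are illustrative, not AMaC's exact settings:

```python
import math

def warmup_cosine_lr(step, base_lr=1e-3, warmup_steps=500, total_steps=10_000, min_lr=1e-6):
    """Linear warmup to base_lr, then cosine annealing down to min_lr (illustrative)."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * min(progress, 1.0)))
```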
Enhanced Stability:
- Gradient Clipping: By clipping gradients, AMaC prevents exploding gradients, which can destabilize the training process.
- Gradient Centralization: This technique subtracts the mean of the gradients to center them, which can improve the optimization process and lead to more stable updates.
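A sketch of how these two steps are commonly applied to a gradient tensor; the clipping threshold here is an illustrative choice:

```python
import torch

def centralize_and_clip(grad, clip_norm=1.0, eps=1e-12):
    """Gradient centralization followed by norm-based clipping (illustrative)."""
    if grad.dim() > 1:
        # Center: subtract the mean taken over all dimensions except the output dimension
        grad = grad - grad.mean(dim=tuple(range(1, grad.dim())), keepdim=True)
    norm = grad.norm()
    if norm > clip_norm:
        grad = grad * (clip_norm / (norm + eps))  # rescale so the norm equals clip_norm
    return grad
```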
Better Generalization:
- Stochastic Weight Averaging (SWA): By averaging the weights over multiple iterations, SWA helps obtain a smoother and more robust final model, improving generalization on unseen data.
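A sketch of the running weight average behind SWA (PyTorch also ships `torch.optim.swa_utils.AveragedModel` for this purpose):

```python
import torch

@torch.no_grad()
def swa_update(swa_params, params, n_averaged):
    """Fold the current weights into the running average after n_averaged snapshots (illustrative)."""
    for swa_p, p in zip(swa_params, params):
        # new_avg = (n * old_avg + p) / (n + 1)
        swa_p.mul_(n_averaged / (n_averaged + 1)).add_(p, alpha=1.0 / (n_averaged + 1))
    return n_averaged + 1
```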
Robust to Noisy Gradients:
- Noise Injection: AMaC introduces adaptive noise to the updates, which can help the optimizer escape local minima and explore the parameter space more effectively, especially in noisy environments.
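One common way to make the noise adaptive is to anneal its scale with the step count; a sketch of that idea (the decay rule is an assumption, not necessarily AMaC's exact formula):

```python
import torch

def inject_noise(update, step, base_std=1e-3, decay=0.55):
    """Add zero-mean Gaussian noise whose standard deviation decays as training progresses."""
    std = base_std / (1 + step) ** decay
    return update + torch.randn_like(update) * std
```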
Adaptive Momentum and Lookahead:
- Momentum with Lookahead Mechanism: The Lookahead mechanism periodically updates "slow" weights toward the current "fast" weights, which helps achieve a balance between fast convergence and stability.
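A sketch of the Lookahead synchronization step, typically run every k fast updates; the interpolation factor `alpha` is an illustrative default:

```python
import torch

@torch.no_grad()
def lookahead_sync(slow_params, fast_params, alpha=0.5):
    """Interpolate slow weights toward fast weights, then restart the fast weights (illustrative)."""
    for slow, fast in zip(slow_params, fast_params):
        slow.add_(fast - slow, alpha=alpha)  # slow <- slow + alpha * (fast - slow)
        fast.copy_(slow)                     # fast weights continue from the updated slow weights
```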
Dynamic Weight Decay:
- Adaptive Weight Decay: The weight decay in AMaC is dynamically adjusted, which can lead to better regularization and prevent overfitting.
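One simple way to make weight decay dynamic is to tie it to training progress; this particular schedule is an assumption for illustration, not taken from AMaC_optimizer.py:

```python
def dynamic_weight_decay(base_wd, step, total_steps):
    """Scale the weight-decay coefficient down linearly over training (one possible schedule)."""
    return base_wd * (1.0 - step / max(1, total_steps))
```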
Use Cases
The AMaC optimizer is particularly beneficial in scenarios involving:
- Deep Neural Networks: Where training stability and convergence speed are critical.
- Noisy Data or Gradients: Where robustness to noise can lead to better performance.
- Complex Optimization Landscapes: Where traditional optimizers might struggle to escape local minima or saddle points.
- Training with Limited Data: Where generalization to unseen data is crucial, and techniques like SWA can be highly beneficial.
Clone this repository and include `AMaC_optimizer.py` in your project:

```
git clone https://github.com/godsonj64/AMaC-Optimizer.git
```

Example output from a training run:

```
Epoch 1/10, Loss: 1.1543, Accuracy: 69.09%
Epoch 2/10, Loss: 0.4018, Accuracy: 87.40%
Epoch 3/10, Loss: 0.2845, Accuracy: 91.22%
Epoch 4/10, Loss: 0.2040, Accuracy: 93.73%
Epoch 5/10, Loss: 0.1585, Accuracy: 95.17%
Epoch 6/10, Loss: 0.1323, Accuracy: 96.00%
Epoch 7/10, Loss: 0.1124, Accuracy: 96.55%
Epoch 8/10, Loss: 0.0996, Accuracy: 97.06%
Epoch 9/10, Loss: 0.0929, Accuracy: 97.23%
Epoch 10/10, Loss: 0.0898, Accuracy: 97.33%
```
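A minimal usage sketch follows; the class name `AMaC` and the constructor arguments shown are assumptions based on the description above, so check `AMaC_optimizer.py` for the actual API:

```python
import torch
import torch.nn as nn
from AMaC_optimizer import AMaC  # class name assumed from the file name in this repository

model = nn.Linear(784, 10)
criterion = nn.CrossEntropyLoss()
optimizer = AMaC(model.parameters(), lr=1e-3)  # further hyperparameters omitted

# One illustrative training step on random data
inputs, targets = torch.randn(32, 784), torch.randint(0, 10, (32,))
optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()
optimizer.step()
```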