A 'from-scratch' Python neural network library implementing automatic gradient tracking via a computational graph.
This is a minimal, "from-scratch" PyTorch-like library, built with Python and NumPy. It provides:
- A dynamic computational graph with automatic gradient accumulation and computation
- A small PyTorch-like neural network API
- Core layers including Fully Connected layers and Conv2D layers
- Common activation functions, loss functions, and optimisers
A Node represents a tensor in the graph, holding:
- `value`: A NumPy array
- `grad`: The gradient of the value with respect to some final loss
- `parents`: Upstream nodes that the output value of this node relies on
- `op`: The operation that created this Node.
A leaf node, with no parents and no op. It is typically used to represent inputs or trainable parameters.
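As a rough illustration of the attributes described above, a node could be modelled as a small data class. This is a minimal sketch, not the library's actual class: the dataclass layout and the `is_leaf` helper are assumptions made for this example, and plain Python lists stand in for NumPy arrays to keep it dependency-free.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass(eq=False)  # eq=False keeps identity-based equality and hashing
class Node:
    """Hypothetical sketch of a graph node (not the library's real class)."""
    value: Any                       # a NumPy array in the library
    grad: Any = 0.0                  # gradient of the final loss w.r.t. value
    parents: List["Node"] = field(default_factory=list)  # upstream nodes
    op: Optional[object] = None      # operation that created this node

    def is_leaf(self) -> bool:
        # Hypothetical helper: a leaf has no parents and no op,
        # e.g. an input or a trainable parameter.
        return not self.parents and self.op is None


weights = Node(value=[0.5, -1.0])    # leaf: created directly
output = Node(value=[1.0, -2.0], parents=[weights], op="placeholder op")

print(weights.is_leaf())  # True
print(output.is_leaf())   # False
```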
The Operation base class defines two methods:
- `forward(input) -> output`
- `backward(output_gradients, node) -> [input_gradients]`
Child classes (Add, Mul, MatMul, etc.) implement these Operation methods.
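The following sketch shows what such an interface might look like for two element-wise operations. It assumes that `forward` receives a list of raw input values and that `backward` reads the parent values from `node.parents`; both are assumptions for illustration, and the library's real signatures may differ. Broadcasting is ignored, and `SimpleNamespace` objects stand in for graph nodes.

```python
from types import SimpleNamespace


class Operation:
    """Hypothetical base interface: subclasses implement both methods."""

    def forward(self, inputs):
        raise NotImplementedError

    def backward(self, output_gradient, node):
        raise NotImplementedError


class Add(Operation):
    def forward(self, inputs):
        a, b = inputs
        return a + b

    def backward(self, output_gradient, node):
        # d(a + b)/da = d(a + b)/db = 1: the gradient passes through unchanged.
        return [output_gradient, output_gradient]


class Mul(Operation):
    def forward(self, inputs):
        a, b = inputs
        return a * b

    def backward(self, output_gradient, node):
        # d(a * b)/da = b and d(a * b)/db = a.
        a, b = (parent.value for parent in node.parents)
        return [output_gradient * b, output_gradient * a]


# Stand-in nodes: only the attributes used by backward are provided.
x = SimpleNamespace(value=2.0)
y = SimpleNamespace(value=3.0)
product = SimpleNamespace(value=Mul().forward([x.value, y.value]), parents=[x, y])
grads = Mul().backward(1.0, product)

print(product.value, grads)  # 6.0 [3.0, 2.0]
```

Note that `backward` returns one gradient per parent, in the same order as `node.parents`.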
In a fully explicit graph, there would be 'Value Nodes' (holding tensors) and 'Operation Nodes' (implementing the logic of the operation). Sequences of edges would go Value -> Operation -> Value ...
In this implementation, the graph is compressed: each operation node is collapsed into the child Value Node it produces. The operation is kept as the child.op attribute, and child.parents holds the upstream nodes.
Conceptually, this implementation can be thought of as following the rules:
- Nodes carry data
- Operations are transformations on edges between nodes.
The directed edge from parent to child contains not just a link, but also the operation by which the parents' values combine to create the child node's value. For example, when two nodes are added, the resulting child node stores the Add operation in its op attribute.
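To make the compressed representation concrete, here is a small sketch in which adding two nodes creates a child that records both its parents and the operation. Building the graph through operator overloading (`__add__`) is an assumption of this example, as are the class bodies; plain floats are used in place of NumPy arrays.

```python
class Add:
    """Placeholder operation; only forward is needed for this sketch."""

    def forward(self, inputs):
        a, b = inputs
        return a + b


class Node:
    def __init__(self, value, parents=(), op=None):
        self.value = value
        self.parents = list(parents)  # upstream nodes (edges run parent -> child)
        self.op = op                  # operation collapsed into the child node

    def __add__(self, other):
        op = Add()
        # The child stores the link to its parents and the operation
        # that combined their values.
        value = op.forward([self.value, other.value])
        return Node(value, parents=(self, other), op=op)


a = Node(1.5)
b = Node(2.0)
c = a + b

print(c.value)              # 3.5
print(type(c.op).__name__)  # Add
print(c.parents == [a, b])  # True
print(a.op, a.parents)      # None []
```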
When loss.backward() is invoked, the following occurs:
- The Value nodes in the graph are topologically sorted. This is to ensure the child gradients required to calculate the gradient of a parent node are always available in time.
- All nodes' gradients are zeroed/cleared.
- The output node's gradient is set to 1 (as $\partial \mathcal{L} / \partial \mathcal{L} = 1$).
- The topologically sorted nodes are walked in reverse:
  - At each Value Node `n`, inspect `n.op`:
    - Call `n.op.backward(n.grad, n)`, which returns a list of gradients (one for each parent node)
    - Accumulate each computed gradient into the corresponding parent's `grad` attribute.
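The steps above can be sketched in a few lines. This is a minimal illustration rather than the library's code: the `Node`, `Add`, and `Mul` classes, the `topological_order` helper, and the free function `backward` are all hypothetical names, and scalar floats replace NumPy arrays so the snippet has no dependencies.

```python
class Node:
    def __init__(self, value, parents=(), op=None):
        self.value, self.parents, self.op = value, list(parents), op
        self.grad = 0.0


class Add:
    def backward(self, output_gradient, node):
        return [output_gradient, output_gradient]


class Mul:
    def backward(self, output_gradient, node):
        a, b = node.parents
        return [output_gradient * b.value, output_gradient * a.value]


def topological_order(root):
    """Return nodes so that every parent appears before its children."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent in node.parents:
            visit(parent)
        order.append(node)

    visit(root)
    return order


def backward(loss):
    order = topological_order(loss)      # 1. topological sort
    for node in order:                   # 2. clear old gradients
        node.grad = 0.0
    loss.grad = 1.0                      # 3. dL/dL = 1
    for node in reversed(order):         # 4. visit children before parents
        if node.op is None:
            continue                     # leaf: nothing to propagate
        parent_grads = node.op.backward(node.grad, node)
        for parent, grad in zip(node.parents, parent_grads):
            parent.grad = parent.grad + grad  # accumulate, do not overwrite


# loss = x * y + x, so dloss/dx = y + 1 and dloss/dy = x
x, y = Node(2.0), Node(3.0)
xy = Node(x.value * y.value, parents=(x, y), op=Mul())
loss = Node(xy.value + x.value, parents=(xy, x), op=Add())
backward(loss)

print(x.grad, y.grad)  # 4.0 2.0
```

Because `x` feeds into the loss along two paths, its gradient is the sum of two contributions (3.0 from the product and 1.0 from the addition), which is why gradients are accumulated rather than assigned. The recursive sort shown here is fine for small graphs; a very deep graph would need an iterative version to avoid Python's recursion limit.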
- Python >= 3.12
- Clone the repository: `git clone https://github.com/rates37/cerebra.git`, then `cd cerebra`
- Install dependencies using uv: `uv sync`
- Verify installation: `uv run python -c "import cerebra; print('cerebra installed successfully')"`

To run scripts or examples within the environment: `uv run python examples/your_script.py`

To add new dependencies: `uv add some-package`

To run tests: `uv run pytest`

When adding new features, please ensure that the test coverage is maintained above 80%. While coverage is not necessarily indicative of the quality of the tests, completely untested code is a significantly worse issue. To check the coverage, run: `uv run pytest --cov=cerebra --cov-report=term-missing`

Coming soon. For now, see the examples directory.
MIT License - See LICENSE file for details.
Contributions are always appreciated. Please feel free to fork this repo, add your features/bug fixes, and open a pull request. If contributing a new feature, be sure to add sufficient tests to ensure the correctness of the feature.