NoProp Diffusion-Based Learning
An experimental PyTorch implementation of the NoProp algorithm, exploring its application in two distinct domains: image classification and time-series forecasting. NoProp trains neural networks without any form of end-to-end gradient propagation, replacing backpropagation with a diffusion-based local learning rule.
The approach is grounded in denoising diffusion theory: each layer independently learns to reconstruct a clean label embedding from a noisy state, conditioned on the input data — enabling fully modular, parallel training of deep networks.
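The forward noising step that each layer learns to invert can be sketched as follows. This is a minimal illustration of the standard DDPM forward process, not the notebook's exact code; the function name and the example schedule value are assumptions.

```python
import torch

def noisy_label_state(z0: torch.Tensor, alpha_bar_t: float) -> torch.Tensor:
    """Sample z_t ~ q(z_t | z_0) for a clean label embedding z_0.

    Standard diffusion forward process:
    z_t = sqrt(alpha_bar_t) * z_0 + sqrt(1 - alpha_bar_t) * eps.
    """
    eps = torch.randn_like(z0)
    return alpha_bar_t ** 0.5 * z0 + (1.0 - alpha_bar_t) ** 0.5 * eps

# Example: 10-class one-hot label embeddings, heavily noised (alpha_bar = 0.1)
z0 = torch.nn.functional.one_hot(torch.tensor([3, 7]), num_classes=10).float()
zt = noisy_label_state(z0, alpha_bar_t=0.1)
print(zt.shape)  # torch.Size([2, 10])
```

Each block is then trained to map (input image, z_t) back to an estimate of z_0.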
Reference paper: Qinyu Li, Yee Whye Teh, and Razvan Pascanu, "NoProp: Training Neural Networks without Back-propagation or Forward-propagation," 2025. arXiv:2503.24322
Project Implementations
This repository is divided into two main experimental applications of the NoProp architecture:
NoProp for Classification (MNIST). A direct implementation of the paper’s core concept using the MNIST dataset. The goal is to classify digits using a forward diffusion process on label embeddings. The architecture consists of a sequential BlockNN model where gradients are detached between steps, demonstrating how the model learns to reconstruct labels from noisy states without end-to-end backpropagation.
NoProp for Time-Series Prediction (Autoencoder). An extension of the NoProp algorithm applied to sequential data. The model forecasts time-series data by embedding the NoProp architecture within an Autoencoder (AE) framework, using diffusion-based training principles to learn latent representations of time-series windows while avoiding standard propagation through the entire temporal sequence.
Model Architecture
The classification model consists of a sequence of T BlockNN modules. Each block merges information from three components: a CNN image encoder (2× Conv2d layers with ReLU activations and MaxPool2d, producing a 32-dimensional feature vector), a label-embedding MLP that processes the noisy label state z_{t−1} from the diffusion process, and a fusion MLP that concatenates both streams and outputs the denoised label estimate ẑ₀.
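A sketch of one such block is below. Layer sizes, channel counts, and names are illustrative stand-ins consistent with the description above, not the notebook's exact code.

```python
import torch
import torch.nn as nn

class BlockNN(nn.Module):
    """One NoProp denoising block (illustrative sketch, not the repo's exact code)."""

    def __init__(self, embed_dim: int = 10, feat_dim: int = 32):
        super().__init__()
        # CNN image encoder: 2x Conv2d + ReLU + MaxPool2d -> 32-dim feature vector
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, feat_dim), nn.ReLU(),
        )
        # MLP for the noisy label state z_{t-1}
        self.label_mlp = nn.Sequential(nn.Linear(embed_dim, feat_dim), nn.ReLU())
        # Fusion MLP: concatenated streams -> denoised estimate of z_0
        self.fusion = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, embed_dim),
        )

    def forward(self, x: torch.Tensor, z_prev: torch.Tensor) -> torch.Tensor:
        h = torch.cat([self.encoder(x), self.label_mlp(z_prev)], dim=-1)
        return self.fusion(h)

block = BlockNN()
x = torch.randn(4, 1, 28, 28)   # MNIST-shaped batch
z = torch.randn(4, 10)          # noisy label state z_{t-1}
out = block(x, z)
print(out.shape)                # torch.Size([4, 10])
```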
The key mechanism underlying the whole approach is gradient detachment: during training, the current block's input z_{t−1} is detached from the computation graph (`z[t-1].detach()`). The optimizer therefore updates only the current block's parameters, without backpropagating errors into previous blocks, making each module entirely self-contained.
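The detach-based local update can be sketched with simple stand-in blocks (linear layers here, purely for illustration; the real model uses the CNN/MLP blocks described above). Each block gets its own optimizer, and `detach()` guarantees no gradient ever crosses a block boundary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
T, dim = 3, 10
# Stand-in blocks: each takes (input features, noisy state) and predicts z_0
blocks = nn.ModuleList([nn.Linear(2 * dim, dim) for _ in range(T)])
opts = [torch.optim.Adam(b.parameters(), lr=1e-3) for b in blocks]

x = torch.randn(8, dim)    # stand-in input features
z0 = torch.randn(8, dim)   # clean label embedding target

z = torch.randn_like(z0)   # start from pure noise
for t in range(T):
    z_in = z.detach()                          # cut the graph: local learning only
    z0_hat = blocks[t](torch.cat([x, z_in], dim=-1))
    loss = ((z0_hat - z0) ** 2).mean()         # per-block denoising loss
    opts[t].zero_grad()
    loss.backward()                            # gradients stay inside block t
    opts[t].step()
    z = z0_hat                                 # next block refines this estimate
print(float(loss) >= 0)  # True
```

Because every `backward()` call stops at `z_in`, the T blocks could also be trained in parallel once noisy states are sampled from the schedule.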
Training & Results (MNIST)
The model is trained on the MNIST dataset (60,000 training samples, 10,000 test samples) with T = 10 diffusion steps, the Adam optimizer (learning rate 0.001), and an MSE loss weighted by the signal-to-noise ratio (SNR) of the noise schedule. The model reaches high accuracy within a few epochs:
| Epoch | Avg Loss | Train Accuracy | Test Accuracy |
|---|---|---|---|
| 1 | 0.5416 | 94.31% | 94.62% |
| 2 | 0.1529 | 95.98% | 96.11% |
| 3 | 0.1094 | 97.38% | 97.52% |
| 4 | 0.0878 | 97.79% | 97.98% |
| 5 | 0.0717 | 98.15% | 98.15% |
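The SNR weighting mentioned above can be sketched as follows. The linear ᾱ schedule and the use of consecutive-step SNR differences as weights are assumptions for illustration; the notebook's exact schedule and weighting may differ.

```python
import torch

T = 10
alpha_bar = torch.linspace(0.99, 0.01, T)    # fraction of signal kept at each step (illustrative)
snr = alpha_bar / (1.0 - alpha_bar)          # SNR(t) = alpha_bar_t / (1 - alpha_bar_t)
weights = (snr[:-1] - snr[1:]).clamp(min=0)  # per-step weight from SNR differences

def snr_weighted_mse(z0_hat: torch.Tensor, z0: torch.Tensor, t: int) -> torch.Tensor:
    """MSE between the denoised estimate and the clean embedding, scaled by the step weight."""
    return weights[t] * ((z0_hat - z0) ** 2).mean()

loss = snr_weighted_mse(torch.zeros(4, 10), torch.ones(4, 10), t=2)
print(loss.item() > 0)  # True
```

Weighting by SNR differences emphasizes steps where denoising recovers the most signal, so no single noise level dominates the objective.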
Usage
Clone the repository, then open NoProp.ipynb in Jupyter Notebook or Google Colab and run all cells to download the MNIST dataset and begin training. Prerequisites: Python 3.x, PyTorch, Torchvision, Matplotlib, NumPy, and Torchsummary.
```python
# Run inference on a single batch after training:
x_test, y_test = next(iter(test_loader))
prediction = classify_batch(x_test, blockNN)
```
File Structure:

- `NoProp.ipynb` — Main notebook containing the MNIST classification implementation.
- `NoProp_TimeSeries.ipynb` — Implementation of the Autoencoder for time-series forecasting.
- `data/` — Directory for datasets (auto-downloaded on first run).
- `slides/` — LaTeX slides explaining the math and diffusion process.