The work builds upon the Flatland environment by AIcrowd, using PPO and DDDQN policies and extending them with CommNet, a fully connected inter-agent message-passing model.
This research explores whether learned communication helps agents improve coordination and reduce deadlocks in spatially constrained environments with sparse rewards.
An overview of the CommNet model.
Left: view of module
Middle: a single communication step, where each agent's module propagates its internal state h and broadcasts a communication vector c on a common channel (shown in red).
Right: full model Φ, showing input states s for each agent, two communication steps and the output actions for each agent.
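The communication step described above can be sketched as follows. This is a minimal NumPy illustration, not the trained model: the weight matrices, hidden size, and mean-pooling of the other agents' states are assumptions based on the CommNet design, and the real model stacks several such steps with learned parameters.

```python
import numpy as np

def comm_step(H, W_h, W_c):
    """One CommNet-style communication step (illustrative sketch).

    H   : (n_agents, hidden) matrix of agent hidden states h
    W_h : (hidden, hidden) weight applied to each agent's own state
    W_c : (hidden, hidden) weight applied to the communication vector c
    """
    n = H.shape[0]
    # c_i = mean of the *other* agents' hidden states (the common channel)
    C = (H.sum(axis=0, keepdims=True) - H) / (n - 1)
    # h_i' = tanh(W_h h_i + W_c c_i)
    return np.tanh(H @ W_h + C @ W_c)

rng = np.random.default_rng(0)
n_agents, hidden = 3, 8
H = rng.normal(size=(n_agents, hidden))
W_h = rng.normal(size=(hidden, hidden))
W_c = rng.normal(size=(hidden, hidden))
H2 = comm_step(H, W_h, W_c)  # one step; the full model Φ applies several
```

Each agent's new state thus depends on its own previous state plus an aggregate of everyone else's, which is what lets gradients flow through the shared channel during training.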
├── flatland_base/ # Main project code base
│ ├── checkpoints/ # *Trained model checkpoints
│ ├── configurations/ # Configuration Files for environment, hyperparameter setup
│ ├── env_snapshots/ # *Saved environment states
│ ├── eval_logs/ # *Logs from evaluation runs (CSV files)
│ ├── eval_seed/ # Stores fixed map seeds for evaluation
│ ├── images/ # *Rendered images of evaluations
│ ├── logs/ # *General log output
│ ├── reinforcement_learning/ # Core RL implementation
│ │ ├── network/ # Policy network heads (actor, critic, etc.)
│ │ ├── policy/ # Models (PPO, DDDQN, CommNet)
│ │ └── multi_agent_training.py # Main training logic
│ ├── replay_buffers/ # *Stored experience replay buffers
│ ├── utils/ # Utility functions (e.g., observation wrappers)
│ └── wandb/ # *Weights & Biases logging data
├── docs/ # Scientific paper, notes and images
├── requirements.yaml # Conda environment configuration
├── run_evaluation.py # Script to evaluate saved models
└── run_experiment.py # Script to train models from scratch
Folders marked with a * are created automatically when running experiments and evaluations.
Step 1: Clone the Repository
Step 2: Create the virtual Conda Environment
Ensure Miniconda or Anaconda is installed.
conda env create --file requirements.yaml
conda activate commnet-fl

Step 1: Train a Model
To reproduce experiments, execute:
python run_experiment.py

By default, this runs the training based on your configuration (e.g., PPO, DDDQN, CommNet, etc.). You can change parameters in the config section of run_experiment.py.
Step 2: Evaluate a Trained Policy
Once training is complete, run:
python run_evaluation.py

This will load saved checkpoints and evaluate the policy on predefined maps, logging key metrics such as:
- Completion rate
- Deadlock frequency
- Score (reward)
CSV logs will be written to flatland_base/eval_logs/ and can be used for plotting or further analysis.
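As a starting point for that analysis, a small script like the following could summarize the evaluation logs. The column names (`completion_rate`, `deadlock_frequency`, `score`) are assumptions for illustration, so check them against the actual CSV headers before use.

```python
import pandas as pd
from pathlib import Path

LOG_DIR = Path("flatland_base/eval_logs")

def summarize(csv_source):
    """Return the mean of key metrics from one evaluation CSV.

    Column names are hypothetical -- adjust to the real headers.
    """
    df = pd.read_csv(csv_source)
    return df[["completion_rate", "deadlock_frequency", "score"]].mean()

# Print a one-line summary per evaluation run found in the log directory.
for csv_file in sorted(LOG_DIR.glob("*.csv")):
    print(csv_file.name, summarize(csv_file).to_dict())
```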
Check the JSON config files in the flatland_base/configurations/ directory. Adjust the hyperparameters or environment settings there before a run to set up your own experiments.
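A configuration file in that directory might look roughly like the fragment below. Every key and value here is purely illustrative; the actual schema is defined by the files shipped in flatland_base/configurations/.

```json
{
  "policy": "commnet",
  "n_agents": 4,
  "grid_width": 30,
  "grid_height": 30,
  "learning_rate": 0.0003,
  "n_episodes": 5000
}
```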
