- Author: Dylan Zelkin
- Employer: University of Colorado, Denver
- Supervisor: Mazen Al Borno
- Lab: http://cse.ucdenver.edu/~alborno/
This is an imitation learning project that uses reinforcement learning to train deep neural networks to control biomechanical and torque-driven models by minimizing the difference between a desired kinematic motion and the motion actually enacted by a network. In this implementation of imitation learning, the network takes the joint angles and velocities as input and outputs the muscle activations or torque activations, respectively; each network learns a single unique motion.
In addition, a variation of the environment can be set up to train a generalized kinematic model that can perform the motion of any desired kinematics on the fly by training on a large number of unique kinematics. It does this by appending a vector to the observation space whose values are the differences between the current position and a set number of future kinematic positions (see path_steps under the config parameters to enable this functionality).
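To make that observation layout concrete, here is a minimal sketch of how such a future-kinematics vector could be appended; the function and variable names are hypothetical stand-ins, not the repository's actual code:

```python
import numpy as np

def build_observation(qpos, qvel, target_path, t, path_steps):
    """Base observation is joint angles and velocities; when
    path_steps > 0, append the delta between the current position
    and each of the next path_steps target positions."""
    parts = [qpos, qvel]
    for k in range(1, path_steps + 1):
        # Clamp to the final frame once the target trajectory ends.
        idx = min(t + k, len(target_path) - 1)
        parts.append(target_path[idx] - qpos)
    return np.concatenate(parts)
```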
The mouse forelimb physics models have been adapted from the biomechanical mouse forelimb model of Gilmer et al. [1]; originally an OpenSim model, the torque and muscle models available here are implemented and simulated in MuJoCo [2]. The DRL library used here is Stable-Baselines3 [3], which offers a variety of reliable learning algorithms; the main algorithm in use here is PPO [4].
When training is completed, agents are saved in the ./agents/ folder and contain the following: training logs, the config file used when the model was created, and a zip archive managed by Stable-Baselines3. Currently the only supported model architecture is a shared LSTM [5] backbone that splits into dense-layered value and action heads.
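For reference, that zip can be loaded back through sb3-contrib's RecurrentPPO. A minimal sketch, assuming a hypothetical agent name and zip filename:

```python
from sb3_contrib import RecurrentPPO

# Hypothetical path; the real agent name and zip filename come from
# the ./agents/<name>/ folder created by the train script.
model = RecurrentPPO.load("./agents/my_agent/model.zip")

# Recurrent policies carry LSTM state between steps, so prediction
# threads that state through manually, e.g.:
# action, lstm_states = model.predict(obs, state=lstm_states,
#                                     episode_start=episode_starts)
```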
This project was created and tested on Linux (specifically Ubuntu); while it might work on other systems, this is not guaranteed.
Torque Driven Solution | Muscle Driven Solution
- Download Miniconda
  wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
- Run the Installer
  bash Miniconda3-latest-Linux-x86_64.sh
- Install Git (if not already installed)
  sudo apt update && sudo apt install -y git
- Clone the Repo from GitHub and Open It
  git clone https://github.com/Al-Borno-Lab/MouseArmImitationLearning.git
  cd MouseArmImitationLearning
- Create the Python Environment and Activate It
  conda env create -f environment.yml
  conda activate MouseArmImitationLearningEnv
- (Optional) Install TensorBoard for Numerical Results Visualization
  pip install tensorboard
- Install the Hugging Face Hub CLI (if not already installed)
  pip install -U huggingface_hub
- Download the MuJoCo Model
  hf download AlBornoLab/MouseArmModel --repo-type dataset --local-dir ./models
- (Optional) Download Dataset: MouseArmKinematics
  hf download AlBornoLab/MouseArmKinematics --repo-type dataset --local-dir ./MouseArmKinematics
- (Optional, requires authentication) Download Dataset: Welle
  hf download AlBornoLab/Welle --repo-type dataset --local-dir ./Welle
This section details the parameters that can be tuned for the imitation learning environment, policy, algorithm, and training and testing scripts; a combined example appears after the parameter lists.
General
- name: Name of the model. If there is no folder under ./agents/... with that name, the train script will create one instead of continuing training, while the test script will fail; if an existing model is used, all config data is pulled from its saved config file instead.
Environment
- model: MuJoCo model file to use
- kinematics: Kinematic data to use (can be a file for single kinematics, or a folder containing files for generalized kinematics)
- train_ratio: The training ratio used in splitting the kinematic data (only matters for generalized kinematics)
- seed: Random seed used when shuffling and splitting the kinematic data (only matters for generalized kinematics)
- path_steps: The number of future timesteps to sample kinematics from and include in the observation (0 for single kinematics, >1 for generalized kinematics)
- w_bone_diff: A weight on the average difference between tracked bone locations in the reward function (see the reward sketch after this list)
- w_elbow: A weight on the elbow in the bone average difference
- w_paw: A weight on the paw in the bone average difference
- w_effort: A weight on the effort used by all actuators in the reward function
- w_qvel: A weight on the difference in qvel on the joints in the reward function
- w_qpos: A weight on the difference in qpos on the joints in the reward function
- w_action: A weight on the difference between action outputs in the reward function
- control_dt: Total simulation time elapsed per environment step
- n_substeps: Simulation substeps per environment step (increasing this improves simulation stability)
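For intuition, the weights above plausibly combine into a per-step reward shaped like the sketch below; this is a hypothetical illustration of the structure, not the repository's exact reward function, and every argument name is illustrative:

```python
def imitation_reward(cfg, elbow_err, paw_err, qpos_err, qvel_err,
                     action_err, effort):
    """Negative weighted sum of tracking errors and regularizers.

    Each *_err argument is assumed to be a scalar error already
    computed from the simulation state versus the target kinematics;
    cfg holds the w_* weights from the config.
    """
    bone_err = cfg.w_elbow * elbow_err + cfg.w_paw * paw_err
    return -(cfg.w_bone_diff * bone_err
             + cfg.w_qpos * qpos_err
             + cfg.w_qvel * qvel_err
             + cfg.w_action * action_err
             + cfg.w_effort * effort)
```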
Policy
- lstm_hidden_size: Number of hidden units in the LSTM
- n_lstm_layers: Number of LSTM layers
- net_arch_pi: A list of hidden layer sizes for the action head
- net_arch_vf: A list of hidden layer sizes for the value head
Algorithm (there are more advanced parameters in the config that are unlisted here; see the Stable-Baselines3 RecurrentPPO API for more info, and see the sketch after this list for how the policy and algorithm parameters fit together)
- learning_rate: Learning rate for training
- n_steps: Total number of steps per environment per iteration
- batch_size: Total number of steps per batch
- n_epochs: Training epochs per iteration
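As a minimal sketch, assuming illustrative values and a placeholder environment (this is not the project's training script), the policy and algorithm parameters above map onto sb3-contrib's RecurrentPPO roughly like this:

```python
from sb3_contrib import RecurrentPPO
from stable_baselines3.common.env_util import make_vec_env

# Placeholder environment standing in for the project's MuJoCo env.
env = make_vec_env("Pendulum-v1", n_envs=4)

model = RecurrentPPO(
    "MlpLstmPolicy",
    env,
    learning_rate=3e-4,
    n_steps=256,       # steps collected per environment per iteration
    batch_size=512,
    n_epochs=10,
    policy_kwargs=dict(
        lstm_hidden_size=128,
        n_lstm_layers=1,
        shared_lstm=True,          # one LSTM backbone feeds both heads
        enable_critic_lstm=False,  # required when sharing the LSTM
        net_arch=dict(pi=[64, 64], vf=[64, 64]),  # action / value heads
    ),
    verbose=1,
)
model.learn(total_timesteps=100_000)
```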
Training
- timesteps: Total timesteps across all training
- num_envs: Number of environments running in parallel
- eval_freq: Timesteps between evaluations
Testing
- slowmo: Sleep time between frames (visual only); increase for a stronger slow-motion effect
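Putting the sections above together, the parameters could be collected along the lines of the hypothetical sketch below; all values and file names are illustrative, and the config file saved under ./agents/ is the authoritative format:

```python
# Hypothetical values; keys mirror the parameter list above.
config = {
    # General
    "name": "reach_demo",
    # Environment
    "model": "./models/mouse_arm.xml",      # illustrative file name
    "kinematics": "./MouseArmKinematics/",  # folder => generalized kinematics
    "train_ratio": 0.8,
    "seed": 0,
    "path_steps": 5,
    "w_bone_diff": 1.0,
    "w_elbow": 0.5,
    "w_paw": 0.5,
    "w_effort": 0.01,
    "w_qvel": 0.1,
    "w_qpos": 1.0,
    "w_action": 0.01,
    "control_dt": 0.02,
    "n_substeps": 10,
    # Policy
    "lstm_hidden_size": 128,
    "n_lstm_layers": 1,
    "net_arch_pi": [64, 64],
    "net_arch_vf": [64, 64],
    # Algorithm
    "learning_rate": 3e-4,
    "n_steps": 256,
    "batch_size": 512,
    "n_epochs": 10,
    # Training
    "timesteps": 5_000_000,
    "num_envs": 8,
    "eval_freq": 10_000,
    # Testing
    "slowmo": 0.01,
}
```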
- Train a Model
  python train.py
- Visualize Training Results with TensorBoard
  PORT=$(shuf -i 6006-9000 -n 1); tensorboard --logdir ./logs --port $PORT & sleep 2 && xdg-open http://localhost:$PORT
- Test a Model's Performance in a Live Viewer
  python test.py
[1] Gilmer, Jesse I., Susan K. Coltman, Geraldine Cuenu, John R. Hutchinson, Daniel Huber, Abigail L. Person, and Mazen Al Borno. "A novel biomechanical model of the proximal mouse forelimb predicts muscle activity in optimal control simulations of reaching movements." Journal of Neurophysiology 133, no. 4 (2025): 1266-1278.
[2] Todorov, Emanuel, Tom Erez, and Yuval Tassa. "MuJoCo: A physics engine for model-based control." 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (2012): 5026-5033.
[3] Raffin, Antonin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, and Noah Dormann. "Stable-Baselines3: Reliable Reinforcement Learning Implementations." Journal of Machine Learning Research 22, no. 268 (2021): 1-8.
[4] Schulman, John, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. "Proximal Policy Optimization Algorithms." arXiv preprint arXiv:1707.06347 (2017).
[5] Hochreiter, Sepp, and Jürgen Schmidhuber. "Long Short-Term Memory." Neural Computation 9, no. 8 (1997): 1735-1780.

