- Author: Dylan Zelkin
- Employer: University of Colorado, Denver
- Supervisor: Mazen Al Borno
- Lab: http://cse.ucdenver.edu/~alborno/
This is an imitation learning project that uses reinforcement learning to train deep neural networks to control biomechanical and torque-driven models by minimizing the difference between a desired kinematic motion and the motion actually enacted by a network. In this implementation of imitation learning, the network takes the joint angles and velocities as input and outputs the muscle activations or torque activations, respectively; each network learns a single unique motion.
In addition, a variation of the environment can be set up to train a generalized kinematic model that can perform the motion of any desired kinematics on the fly by training on a large number of unique kinematics. It does this by appending a vector to the observation space whose values are the differences between the current position and a set number of future kinematic positions (see path_steps under the config parameters to enable this functionality).
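To make that observation layout concrete, here is a minimal sketch of how such a future-kinematics vector could be appended; the function and variable names are hypothetical stand-ins, not the repository's actual code:

```python
import numpy as np

def build_observation(qpos, qvel, target_path, t, path_steps):
    """Base observation is joint angles and velocities; when
    path_steps > 0, append the delta between the current position
    and each of the next path_steps target positions."""
    parts = [qpos, qvel]
    for k in range(1, path_steps + 1):
        # Clamp to the final frame once the target trajectory ends.
        idx = min(t + k, len(target_path) - 1)
        parts.append(target_path[idx] - qpos)
    return np.concatenate(parts)
```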
The mouse forelimb physics models have been adapted from the biomechanical mouse forelimb model of Gilmer et al. [1]; originally an OpenSim model, the torque and muscle models available here are implemented and simulated in MuJoCo [2]. The DRL library used here is Stable-Baselines3 [3], which offers a variety of reliable learning algorithms; the main algorithm in use here is PPO [4].
When training is completed, agents are saved in the ./agents/ folder and contain the following: training logs, the config file used when the model was created, and a zip archive managed by Stable-Baselines3. Currently the only supported model architecture is a shared LSTM [5] backbone that splits into dense-layered value and action heads.
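For reference, that zip can be loaded back through sb3-contrib's RecurrentPPO. A minimal sketch, assuming a hypothetical agent name and zip filename:

```python
from sb3_contrib import RecurrentPPO

# Hypothetical path; the real agent name and zip filename come from
# the ./agents/<name>/ folder created by the train script.
model = RecurrentPPO.load("./agents/my_agent/model.zip")

# Recurrent policies carry LSTM state between steps, so prediction
# threads that state through manually, e.g.:
# action, lstm_states = model.predict(obs, state=lstm_states,
#                                     episode_start=episode_starts)
```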
This project was created and tested on Linux (specifically Ubuntu); while it might work on other systems, this is not guaranteed.
Torque Driven Solution | Muscle Driven Solution
- Download Miniconda
  wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
- Run the Installer
  bash Miniconda3-latest-Linux-x86_64.sh
- Install Git (if not already installed)
  sudo apt update && sudo apt install -y git
- Clone the Repo from GitHub and Open It
  git clone https://github.com/Al-Borno-Lab/MouseArmImitationLearning.git
  cd MouseArmImitationLearning
- Create the Python Environment and Activate It
  conda env create -f environment.yml
  conda activate MouseArmImitationLearningEnv
- (Optional) Install TensorBoard for Numerical Results Visualization
  pip install tensorboard
- Install the Hugging Face Hub CLI (if not already installed)
  pip install -U huggingface_hub
- Download the MuJoCo Model
  hf download AlBornoLab/MouseArmModel --repo-type dataset --local-dir ./models
- (Optional) Download Dataset: MouseArmKinematics
  hf download AlBornoLab/MouseArmKinematics --repo-type dataset --local-dir ./MouseArmKinematics
- (Optional, requires authentication) Download Dataset: Welle
  hf download AlBornoLab/Welle --repo-type dataset --local-dir ./Welle
This section details the parameters that can be tuned for the imitation learning environment, policy, algorithm, and training and testing scripts; a combined example appears after the parameter lists.
General
- name: Name of the model. If there is no folder under ./agents/... with that name, the train script will create one instead of continuing training, while the test script will fail; if an existing model is used, all config data is pulled from its saved config file instead.
Environment
- model: MuJoCo model file to use
- kinematics: Kinematic data to use (can be a file for single kinematics, or a folder containing files for generalized kinematics)
- train_ratio: The training ratio used in splitting the kinematic data (only matters for generalized kinematics)
- seed: Random seed used when shuffling and splitting the kinematic data (only matters for generalized kinematics)
- path_steps: The number of future timesteps to sample kinematics from and include in the observation (0 for single kinematics, >1 for generalized kinematics)
- w_bone_diff: A weight on the average difference between tracked bone locations in the reward function (see the reward sketch after this list)
- w_elbow: A weight on the elbow in the bone average difference
- w_paw: A weight on the paw in the bone average difference
- w_effort: A weight on the effort used by all actuators in the reward function
- w_qvel: A weight on the difference in qvel on the joints in the reward function
- w_qpos: A weight on the difference in qpos on the joints in the reward function
- w_action: A weight on the difference between action outputs in the reward function
- control_dt: Total simulation time elapsed per environment step
- n_substeps: Simulation substeps per environment step (increasing this improves simulation stability)
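For intuition, the weights above plausibly combine into a per-step reward shaped like the sketch below; this is a hypothetical illustration of the structure, not the repository's exact reward function, and every argument name is illustrative:

```python
def imitation_reward(cfg, elbow_err, paw_err, qpos_err, qvel_err,
                     action_err, effort):
    """Negative weighted sum of tracking errors and regularizers.

    Each *_err argument is assumed to be a scalar error already
    computed from the simulation state versus the target kinematics;
    cfg holds the w_* weights from the config.
    """
    bone_err = cfg.w_elbow * elbow_err + cfg.w_paw * paw_err
    return -(cfg.w_bone_diff * bone_err
             + cfg.w_qpos * qpos_err
             + cfg.w_qvel * qvel_err
             + cfg.w_action * action_err
             + cfg.w_effort * effort)
```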
Policy
- lstm_hidden_size: Number of hidden units in the LSTM
- n_lstm_layers: Number of LSTM layers
- net_arch_pi: A list of hidden layer sizes for the action head
- net_arch_vf: A list of hidden layer sizes for the value head
Algorithm (there are more advanced parameters in the config that are unlisted here; see the Stable-Baselines3 RecurrentPPO API for more info, and see the sketch after this list for how the policy and algorithm parameters fit together)
- learning_rate: Learning rate for training
- n_steps: Total number of steps per environment per iteration
- batch_size: Total number of steps per batch
- n_epochs: Training epochs per iteration
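As a minimal sketch, assuming illustrative values and a placeholder environment (this is not the project's training script), the policy and algorithm parameters above map onto sb3-contrib's RecurrentPPO roughly like this:

```python
from sb3_contrib import RecurrentPPO
from stable_baselines3.common.env_util import make_vec_env

# Placeholder environment standing in for the project's MuJoCo env.
env = make_vec_env("Pendulum-v1", n_envs=4)

model = RecurrentPPO(
    "MlpLstmPolicy",
    env,
    learning_rate=3e-4,
    n_steps=256,       # steps collected per environment per iteration
    batch_size=512,
    n_epochs=10,
    policy_kwargs=dict(
        lstm_hidden_size=128,
        n_lstm_layers=1,
        shared_lstm=True,          # one LSTM backbone feeds both heads
        enable_critic_lstm=False,  # required when sharing the LSTM
        net_arch=dict(pi=[64, 64], vf=[64, 64]),  # action / value heads
    ),
    verbose=1,
)
model.learn(total_timesteps=100_000)
```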
Training
- timesteps: Total timesteps across all training
- num_envs: Number of environments running in parallel
- eval_freq: Timesteps between evaluations
Testing
- slowmo: Sleep time between frames (visual only); increase for a stronger slow-motion effect
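Putting the sections above together, the parameters could be collected along the lines of the hypothetical sketch below; all values and file names are illustrative, and the config file saved under ./agents/ is the authoritative format:

```python
# Hypothetical values; keys mirror the parameter list above.
config = {
    # General
    "name": "reach_demo",
    # Environment
    "model": "./models/mouse_arm.xml",      # illustrative file name
    "kinematics": "./MouseArmKinematics/",  # folder => generalized kinematics
    "train_ratio": 0.8,
    "seed": 0,
    "path_steps": 5,
    "w_bone_diff": 1.0,
    "w_elbow": 0.5,
    "w_paw": 0.5,
    "w_effort": 0.01,
    "w_qvel": 0.1,
    "w_qpos": 1.0,
    "w_action": 0.01,
    "control_dt": 0.02,
    "n_substeps": 10,
    # Policy
    "lstm_hidden_size": 128,
    "n_lstm_layers": 1,
    "net_arch_pi": [64, 64],
    "net_arch_vf": [64, 64],
    # Algorithm
    "learning_rate": 3e-4,
    "n_steps": 256,
    "batch_size": 512,
    "n_epochs": 10,
    # Training
    "timesteps": 5_000_000,
    "num_envs": 8,
    "eval_freq": 10_000,
    # Testing
    "slowmo": 0.01,
}
```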
- Train a Model
  python train.py
- Visualize Training Results with TensorBoard
  PORT=$(shuf -i 6006-9000 -n 1); tensorboard --logdir ./logs --port $PORT & sleep 2 && xdg-open http://localhost:$PORT
- Test a Model's Performance in a Live Viewer
  python test.py
[1] Gilmer, Jesse I., Susan K. Coltman, Geraldine Cuenu, John R. Hutchinson, Daniel Huber, Abigail L. Person, and Mazen Al Borno. "A novel biomechanical model of the proximal mouse forelimb predicts muscle activity in optimal control simulations of reaching movements." Journal of Neurophysiology 133, no. 4 (2025): 1266-1278.
[2] Todorov, Emanuel, Tom Erez, and Yuval Tassa. "MuJoCo: A physics engine for model-based control." 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (2012): 5026-5033.
[3] Raffin, Antonin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, and Noah Dormann. "Stable-Baselines3: Reliable Reinforcement Learning Implementations." Journal of Machine Learning Research 22, no. 268 (2021): 1-8.
[4] Schulman, John, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. "Proximal Policy Optimization Algorithms." arXiv preprint arXiv:1707.06347 (2017).
[5] Hochreiter, Sepp, and Jürgen Schmidhuber. "Long Short-Term Memory." Neural Computation 9, no. 8 (1997): 1735-1780.

