Codes are now mainly copyed from
- google/dopamine
- openai/baselines
- openai/spinningup
- hill-a/stable-baselines
- DLR-RM/stable-baselines3
- LQNew/LWDRLC
The aim is to learn RL and tensorflow1.
Arguments:
usage: run_deepq_atari.py [-h] [--dir_name DIR_NAME]
[--alg {dqn,clipdqn,ddqn,per,duel,c51}] [--env ENV]
[--sticky] [--data_dir DATA_DIR] [--allow_eval]
[--save_model] [--total_steps TOTAL_STEPS]
optional arguments:
-h, --help show this help message and exit
--dir_name DIR_NAME Dir name
--alg {dqn,clipdqn,ddqn,per,duel,c51}
Algorithm name
--env ENV Env name
--sticky Sticky actions
--data_dir DATA_DIR Data disk dir
--allow_eval Whether to eval agent
--save_model Whether to save model
--total_steps TOTAL_STEPS
Total steps trainedRun one experiment:
CUDA_VISIBLE_DEVICES=0 python run_experiment.py --alg dqn
CUDA_VISIBLE_DEVICES=1 python run_experiment.py --sticky --alg c51Run six parallel experiments using script.
zsh run_deepq_atari.py dqn Breakout DQN-testtodo
Arguments:
usage: run_pg_mujoco.py [-h] [--dir_name DIR_NAME] [--data_dir DATA_DIR]
[--env ENV] [--alg {VPG,TRPO,PPO,PPO2,PPOM}]
[--allow_eval] [--save_model]
[--total_steps TOTAL_STEPS] [--num_env NUM_ENV]
optional arguments:
-h, --help show this help message and exit
--dir_name DIR_NAME Dir name
--data_dir DATA_DIR Data disk dir
--env ENV
--alg {VPG,TRPO,PPO,PPO2,PPOM}
Experiment name
--allow_eval Whether to eval agent
--save_model Whether to save model
--total_steps TOTAL_STEPS
Total steps trained
--num_env NUM_ENV Number of envs.Run one experiment:
CUDA_VISIBLE_DEVICES=0 python run_pg_mujoco.py --alg PPO2
CUDA_VISIBLE_DEVICES=1 python run_pg_mujoco.py --alg TRPO --allow_evalRun six parallel experiments using script.
zsh run_pg_mujoco.sh PPO2 Walker2d-v2 PPO-test 1Run six experiments for all envs.
zsh run_pg_mujoco_all.sh PPO2 PPO-testPlot for one env.
python rl-exercise/common/plot.py --logdir PPO-env baselines-PPO --xaxis=Step --value=AvgEpRet --legend PPO-env baselines-PPOPlot for all envs.
python rl-exercise/common/plot_all.py --logdir my_results --xaxis=Step --value=AvgEpRetAverage 6 seeds. 68% confidence interval.