RL-exercise

Introduction

Most of the code is currently copied from

The aim is to learn RL and TensorFlow 1.

Usage

Value-based

Arguments:

usage: run_deepq_atari.py [-h] [--dir_name DIR_NAME]
                          [--alg {dqn,clipdqn,ddqn,per,duel,c51}] [--env ENV]
                          [--sticky] [--data_dir DATA_DIR] [--allow_eval]
                          [--save_model] [--total_steps TOTAL_STEPS]

optional arguments:
  -h, --help            show this help message and exit
  --dir_name DIR_NAME   Dir name
  --alg {dqn,clipdqn,ddqn,per,duel,c51}
                        Algorithm name
  --env ENV             Env name
  --sticky              Sticky actions
  --data_dir DATA_DIR   Data disk dir
  --allow_eval          Whether to eval agent
  --save_model          Whether to save model
  --total_steps TOTAL_STEPS
                        Total steps trained
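As an illustration of how the options above fit together, here is a minimal sketch of an `argparse` parser that would accept the same flags. It is reconstructed from the help text only, not taken from `run_deepq_atari.py`; the `int` type for `--total_steps` and the absence of defaults are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags listed in the help text above."""
    parser = argparse.ArgumentParser(prog="run_deepq_atari.py")
    parser.add_argument("--dir_name", help="Dir name")
    parser.add_argument(
        "--alg",
        choices=["dqn", "clipdqn", "ddqn", "per", "duel", "c51"],
        help="Algorithm name",
    )
    parser.add_argument("--env", help="Env name")
    parser.add_argument("--sticky", action="store_true", help="Sticky actions")
    parser.add_argument("--data_dir", help="Data disk dir")
    parser.add_argument("--allow_eval", action="store_true",
                        help="Whether to eval agent")
    parser.add_argument("--save_model", action="store_true",
                        help="Whether to save model")
    # Assumption: the step count is parsed as an integer.
    parser.add_argument("--total_steps", type=int, help="Total steps trained")
    return parser


# Parse an explicit argument list instead of sys.argv, for demonstration.
args = build_parser().parse_args(["--alg", "c51", "--sticky"])
print(args.alg, args.sticky, args.allow_eval)
```

Boolean options such as `--sticky` take no value: passing the flag turns the option on, and omitting it leaves it off.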

Run one experiment:

CUDA_VISIBLE_DEVICES=0 python run_deepq_atari.py --alg dqn
CUDA_VISIBLE_DEVICES=1 python run_deepq_atari.py --sticky --alg c51

Run six parallel experiments using the script:

zsh run_deepq_atari.sh dqn Breakout DQN-test

Deterministic Policy gradient

todo

Policy gradient

Arguments:

usage: run_pg_mujoco.py [-h] [--dir_name DIR_NAME] [--data_dir DATA_DIR]
                        [--env ENV] [--alg {VPG,TRPO,PPO,PPO2,PPOM}]
                        [--allow_eval] [--save_model]
                        [--total_steps TOTAL_STEPS] [--num_env NUM_ENV]

optional arguments:
  -h, --help            show this help message and exit
  --dir_name DIR_NAME   Dir name
  --data_dir DATA_DIR   Data disk dir
  --env ENV
  --alg {VPG,TRPO,PPO,PPO2,PPOM}
                        Experiment name
  --allow_eval          Whether to eval agent
  --save_model          Whether to save model
  --total_steps TOTAL_STEPS
                        Total steps trained
  --num_env NUM_ENV     Number of envs.

Run one experiment:

CUDA_VISIBLE_DEVICES=0 python run_pg_mujoco.py --alg PPO2
CUDA_VISIBLE_DEVICES=1 python run_pg_mujoco.py --alg TRPO --allow_eval

Run six parallel experiments using the script:

zsh run_pg_mujoco.sh PPO2 Walker2d-v2 PPO-test 1
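The contents of `run_pg_mujoco.sh` are not shown here, so the following is only a rough sketch of what launching six parallel runs could look like from Python. It uses only flags listed in the usage text (`--alg`, `--env`, `--dir_name`); the per-run directory suffix, the `build_commands` and `launch` helpers, and the single-GPU assignment are all hypothetical.

```python
import os
import subprocess
import sys


def build_commands(alg, env, dir_name, num_runs=6):
    """Return one command line per run, using only documented flags."""
    return [
        [sys.executable, "run_pg_mujoco.py",
         "--alg", alg,
         "--env", env,
         # Hypothetical naming scheme: one output directory per run.
         "--dir_name", f"{dir_name}-{i}"]
        for i in range(num_runs)
    ]


def launch(commands, gpu="0", dry_run=True):
    """Start all commands at once and wait for every one to finish."""
    if dry_run:
        # Only print what would be executed.
        for cmd in commands:
            print(" ".join(cmd))
        return []
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    procs = [subprocess.Popen(cmd, env=env) for cmd in commands]
    return [p.wait() for p in procs]


commands = build_commands("PPO2", "Walker2d-v2", "PPO-test")
launch(commands)  # dry run: prints the six command lines
```

Calling `launch(commands, dry_run=False)` would start the processes for real; the dry run is the default so the sketch can be tried without MuJoCo installed.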

Run six experiments for all envs.

zsh run_pg_mujoco_all.sh PPO2 PPO-test

Plotting

Plot for one env.

python rl-exercise/common/plot.py --logdir PPO-env baselines-PPO --xaxis=Step --value=AvgEpRet --legend PPO-env baselines-PPO

Plot for all envs.

python rl-exercise/common/plot_all.py --logdir my_results --xaxis=Step --value=AvgEpRet

Training curves

Mujoco

Averaged over 6 seeds, with a 68% confidence interval.
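For readers unfamiliar with the shaded band, here is a small sketch of one common way to get a roughly 68% interval: the mean across seeds plus or minus one standard error. This is an assumption about the general technique, not necessarily how `common/plot.py` computes it, and the numbers below are made up for illustration.

```python
import math
import statistics


def mean_and_ci68(curves):
    """Return per-step mean, lower and upper bounds across seeds.

    curves: list of per-seed curves, all of the same length.
    The band is mean +/- one standard error, which is roughly a 68%
    interval under a normal approximation.
    """
    num_seeds = len(curves)
    means, lowers, uppers = [], [], []
    for values in zip(*curves):  # one tuple of values per time step
        m = statistics.mean(values)
        sem = statistics.stdev(values) / math.sqrt(num_seeds)
        means.append(m)
        lowers.append(m - sem)
        uppers.append(m + sem)
    return means, lowers, uppers


# Illustrative data only: 6 seeds, 2 evaluation points each.
curves = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0],
          [4.0, 10.0], [5.0, 10.0], [6.0, 10.0]]
means, lowers, uppers = mean_and_ci68(curves)
print(means, lowers, uppers)
```

With only six seeds the normal approximation is loose, so the band should be read as a rough indication of spread rather than an exact interval.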
