Deep Reinforcement Learning Algorithms

Model-Free
  DQN and DDQN
  SARSA and Q-Learning
  Policy Gradients
  SAC
  PPO
  DDPG
  REINFORCE

Model-Based
  Dyna-Q
  PlaNet
  Dynamic Programming Notebooks

Bandits
  Multi-Armed Bandits