Actor-Critic methods are a powerful family of reinforcement learning techniques that combine policy learning with value estimation.

Actor-critic methods are particularly useful in robotics because they allow software to output continuous, rather than discrete, actions. This makes it possible to control the electric motors that actuate movement in robotic systems, at the expense of increased computational complexity.
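For a concrete sense of that distinction, the short sketch below contrasts a discrete Gym action space with a continuous one; the specific environment names are just examples, not ones used in the course.

```python
import gym

# A discrete environment: the agent picks one of a fixed set of action indices.
discrete_env = gym.make('CartPole-v1')
print(discrete_env.action_space)      # Discrete(2): push the cart left or right

# A continuous environment: the agent outputs a real-valued vector,
# e.g. a motor torque, which is what actor-critic methods handle naturally.
continuous_env = gym.make('Pendulum-v1')
print(continuous_env.action_space)    # Box(-2.0, 2.0, (1,)): a real-valued torque
```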

We just released a comprehensive course on Actor-Critic methods on the freeCodeCamp.org YouTube channel.

Dr. Tabor developed this course. He is a physicist and former semiconductor engineer who is now a data scientist.

The basic idea behind actor-critic methods is that there are two deep neural networks. The actor network approximates the agent’s policy: a probability distribution that tells us the probability of selecting a (continuous) action given some state of the environment. The critic network approximates the value function: the agent’s estimate of future rewards that follow the current state. These two networks interact to shift the policy towards more profitable states, where profitability is determined by interacting with the environment.
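As a rough illustration of that two-network setup, here is a minimal PyTorch sketch assuming a Gaussian policy over continuous actions; the class names, layer sizes, and clamping bounds are illustrative choices, not the course's exact implementation.

```python
import torch
import torch.nn as nn

class ActorNetwork(nn.Module):
    """Maps a state to a probability distribution over continuous actions (the policy)."""
    def __init__(self, state_dim, action_dim, hidden_dim=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden_dim, action_dim)         # mean of each action dimension
        self.log_sigma = nn.Linear(hidden_dim, action_dim)  # log std of each action dimension

    def forward(self, state):
        h = self.body(state)
        mu = self.mu(h)
        sigma = self.log_sigma(h).clamp(-20, 2).exp()       # keep the std in a sane range
        return torch.distributions.Normal(mu, sigma)

class CriticNetwork(nn.Module):
    """Estimates the value of a state: the expected future (discounted) reward."""
    def __init__(self, state_dim, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, state):
        return self.net(state)
```

The critic's value estimate is what lets the actor tell a lucky action apart from a genuinely good one, which is why the two networks are trained together.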

This requires no prior knowledge of how our environment works, or any input regarding the rules of the game. All we have to do is let the algorithm interact with the environment and watch as it learns.
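To make that concrete, here is a minimal sketch of such an interaction loop, assuming the classic OpenAI Gym API (the pre-0.26 `reset`/`step` signatures) and a random action standing in for a trained actor; the environment name is just an example.

```python
import gym

# Any continuous-action environment works here; Pendulum-v1 is just an example.
env = gym.make('Pendulum-v1')
observation = env.reset()

for step in range(1000):
    # In the course, the actor network chooses the action; a random sample
    # stands in here purely to show the interaction loop the agent learns from.
    action = env.action_space.sample()
    observation, reward, done, info = env.step(action)
    if done:
        observation = env.reset()

env.close()
```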

This course also incorporates some useful innovations from deep Q-learning, such as experience replay buffers and target networks. These increase the stability and robustness of the learned policies, so that our agent is able to learn effective policies for navigating the OpenAI Gym environments.
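For context, here is a minimal sketch of those two tricks, a replay buffer and a soft target-network update, assuming NumPy and PyTorch; the class, function, and parameter names (e.g. `tau`) are illustrative rather than the course's exact code.

```python
import numpy as np
import torch

class ReplayBuffer:
    """Stores past transitions so the agent can learn from decorrelated mini-batches."""
    def __init__(self, capacity, state_dim, action_dim):
        self.capacity, self.idx, self.size = capacity, 0, 0
        self.states = np.zeros((capacity, state_dim), dtype=np.float32)
        self.actions = np.zeros((capacity, action_dim), dtype=np.float32)
        self.rewards = np.zeros(capacity, dtype=np.float32)
        self.next_states = np.zeros((capacity, state_dim), dtype=np.float32)
        self.dones = np.zeros(capacity, dtype=np.float32)

    def store(self, state, action, reward, next_state, done):
        i = self.idx
        self.states[i], self.actions[i], self.rewards[i] = state, action, reward
        self.next_states[i], self.dones[i] = next_state, float(done)
        self.idx = (self.idx + 1) % self.capacity          # overwrite oldest entries
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size):
        idxs = np.random.randint(0, self.size, size=batch_size)
        return (self.states[idxs], self.actions[idxs], self.rewards[idxs],
                self.next_states[idxs], self.dones[idxs])

def soft_update(target_net, online_net, tau=0.005):
    """Target network: a slowly moving copy of the online network used for stable learning targets."""
    for t_param, param in zip(target_net.parameters(), online_net.parameters()):
        t_param.data.copy_(tau * param.data + (1.0 - tau) * t_param.data)
```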

Here are the algorithms covered in this course:

  • Actor Critic
  • Deep Deterministic Policy Gradients (DDPG)
  • Twin Delayed Deep Deterministic Policy Gradients (TD3)
  • Proximal Policy Optimization (PPO)
  • Soft Actor Critic (SAC)
  • Asynchronous Advantage Actor Critic (A3C)

Watch the full course below or on the freeCodeCamp.org YouTube channel (6-hour watch).