A Beginner’s Guide to Reinforcement Learning

Introduction To Reinforcement Learning

Machine learning (ML) is a rapidly growing technology that supports applications in almost every field. It includes subfields such as deep learning, and it is the study and development of computer systems that learn from the data they are given and generalize to new, unseen data.

ML is all about the system learning from data. There are three main types of machine learning techniques: supervised, unsupervised, and reinforcement learning. These techniques differ in the way the machine processes the data.

Reinforcement learning (RL) is a branch of machine learning where an agent learns to make decisions by performing actions in an environment and receiving rewards or penalties in return. It differs from supervised and unsupervised learning in its action-reward mechanism. Key components include the agent, environment, states, actions, rewards, and policy. RL is applied in fields such as gaming (e.g., AlphaGo), autonomous vehicles, and healthcare. Its goal is to maximize the total reward the agent collects, which makes it well suited to decision-making tasks.

How Is Reinforcement Learning Different?

It is necessary to understand Supervised and Unsupervised learning techniques to interpret the concepts of reinforcement learning.

Supervised machine learning is like learning with a teacher. In this technique, the model is given labeled inputs together with their correct outputs. The model is supposed to learn the mapping between input and output so that it can generalize to new, unseen data.

In unsupervised learning, the machine has no information about the output, and the input is unlabeled. The model is supposed to learn the patterns and similarities within the data.

Reinforcement learning is a bit different from the methods discussed above. Consider this analogy: you are training your pet dog to follow basic commands. For every command the dog follows successfully, you give it a treat as a reward. If the dog does not follow the command, it gets a penalty (no treat). Reinforcement learning follows the same action-reward sequence.

Exploring the Agent’s Role in Reinforcement Learning

The relationship between the data and the algorithm in reinforcement learning is a bit different from that in the other two techniques. The algorithm (the actor) in reinforcement learning is called an agent. The agent is placed in an environment, where it is supposed to learn the conditions of the environment and complete certain tasks.

The agent explores the states in the environment and performs some actions in those states. For every action the agent performs in each state, the agent is awarded a positive reward for good actions and is penalized for every bad action. The goal of the agent is to maximize the positive rewards.
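The interaction described above can be sketched as a simple loop. This is only a minimal illustration: the `CorridorEnv` class, its reward values (+1.0 at the goal, -0.1 per other step), and the two policies are hypothetical names and numbers chosen for the example, not part of any library.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: positions 0..4, goal at position 4."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        """Put the agent back at the start and return the initial state."""
        self.state = 0
        return self.state

    def step(self, action):
        """Apply an action (0 = left, 1 = right); return (state, reward, done)."""
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else -0.1  # assumed values: goal bonus, step penalty
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the total reward the agent collected."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                   # the agent acts in the current state
        state, reward, done = env.step(action)   # the environment responds
        total_reward += reward
        if done:
            break
    return total_reward


def random_policy(state):
    """Ignore the state and pick a direction at random."""
    return random.choice([0, 1])


def always_right(state):
    """Always move toward the goal."""
    return 1


print(run_episode(CorridorEnv(), always_right))
print(run_episode(CorridorEnv(), random_policy))
```

In this toy setup, the policy that always moves right collects more reward than random wandering can, which is the kind of difference the agent is meant to discover for itself.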

Key Components in Reinforcement Learning

The main components of Reinforcement Learning are the agent, environment, state, action, and rewards. Let us take a look at these components.

  • Agent: The actor that explores the environment, makes decisions, and performs actions
  • Environment(E): The world, physical or simulated, in which the agent explores and performs actions
  • State(S): The situation the agent currently finds itself in within the environment
  • Action(A): The moves the agent can make, which it tries out through trial and error
  • Reward(R): The scalar feedback the agent receives for its action. A wrong move earns a penalty (a negative or missing reward)

These are the primary components, but there is one more crucial component called the policy. The policy is the mapping from states to actions: it tells the agent which action to take in each state. The policy therefore determines how an agent explores the environment.
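To make the idea of a policy concrete, here is a small sketch of two common forms: a fixed lookup table and an epsilon-greedy rule over estimated action values. The state names, action names, and the `epsilon` parameter are assumptions made for illustration only.

```python
import random

# A deterministic policy can be as simple as a lookup table from state to action.
# The state and action names below are hypothetical.
deterministic_policy = {"start": "right", "middle": "right", "near_goal": "right"}


def act(policy, state):
    """Return the action the table prescribes for this state."""
    return policy[state]


def epsilon_greedy(q_values, state, actions, epsilon=0.1, rng=random):
    """With probability epsilon pick a random action, else the best-valued one.

    q_values maps (state, action) pairs to estimated values; missing pairs
    are treated as 0.0.
    """
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_values.get((state, a), 0.0))


q_values = {("start", "left"): 0.2, ("start", "right"): 0.8}
print(act(deterministic_policy, "start"))
print(epsilon_greedy(q_values, "start", ["left", "right"], epsilon=0.0))
```

Setting `epsilon` to 0 makes the second policy purely greedy, while larger values make the agent explore more often.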

Figure: Components of Reinforcement Learning

Types of Reinforcement Learning

The variants of reinforcement learning can be categorized based on the policy or the type of reward received. When it comes to policy, reinforcement learning is categorized into two types.

  • On-policy: In on-policy reinforcement learning, the agent learns about the same policy it uses to choose its actions. The policy being improved is also the one generating the experience
    Examples – SARSA and REINFORCE
  • Off-policy: In this type of reinforcement learning, the agent learns about one policy (the target policy) while choosing its actions with a different one (the behavior policy)
    Examples – Q-Learning and Expected SARSA (which can also be run on-policy)
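One common way to see the on-policy versus off-policy distinction is in the update rules of SARSA and Q-Learning. The sketch below uses their standard tabular forms. The learning rate `alpha`, the discount factor `gamma`, and the tiny example values are assumptions for illustration and do not come from this article.

```python
from collections import defaultdict


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """On-policy: the target uses the action the agent actually takes next."""
    target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])


def q_learning_update(q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Off-policy: the target uses the best next action, whichever one is taken."""
    target = r + gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])


ACTIONS = ["left", "right"]


def make_q():
    """Hypothetical starting estimates: only (1, 'right') looks valuable."""
    q = defaultdict(float)
    q[(1, "right")] = 1.0
    return q


# Same transition for both learners: from state 0, go right, land in state 1,
# and then (while exploring) take the worse action "left" in state 1.
q_sarsa = make_q()
sarsa_update(q_sarsa, s=0, a="right", r=0.0, s_next=1, a_next="left")

q_qlearn = make_q()
q_learning_update(q_qlearn, s=0, a="right", r=0.0, s_next=1, actions=ACTIONS)

print(q_sarsa[(0, "right")], q_qlearn[(0, "right")])
```

SARSA's estimate reflects the exploratory move the agent really made, while Q-Learning's estimate assumes the best move will be made next. That is the practical meaning of learning about the acting policy versus learning about a different one.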

To put it simply, the ultimate goal of RL is to maximize rewards in the environment. The agent picks actions according to its current policy, through trial and error, and receives feedback in the form of a reward or a penalty. Based on the rewards it has observed so far, the agent has to decide which actions and states to explore further to maximize its rewards.
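As a rough end-to-end illustration of this trial-and-error process, here is a minimal tabular Q-Learning sketch on a hypothetical five-state corridor. The environment, the reward values, and the hyperparameters (`alpha`, `gamma`, `epsilon`, the episode count, the seed) are all assumptions chosen to keep the example small.

```python
import random
from collections import defaultdict

N_STATES = 5       # hypothetical corridor: states 0..4, goal at state 4
ACTIONS = [0, 1]   # 0 = left, 1 = right


def step(state, action):
    """Deterministic toy dynamics: +1.0 on reaching the goal, -0.1 otherwise."""
    move = 1 if action == 1 else -1
    next_state = min(max(state + move, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else -0.1), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a value estimate for every (state, action) pair by trial and error."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # cap the episode length
            # Explore occasionally, otherwise take the best-looking action.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            # Move the estimate toward the reward plus the discounted future value.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if done:
                break
    return q


q = train()
# Greedy action in each non-goal state after training (1 means "move right").
greedy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(greedy)
```

After training, the greedy action in every non-goal state should be to move right, toward the reward. The agent arrives at that behavior purely from the feedback it received.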

Markov Decision Process and Reinforcement Learning

A Markov Decision Process (MDP) is a controlled stochastic process used to model sequential decision-making. It serves as the foundation for many decision-making problems. The multi-armed bandit problem, for instance, can be viewed as an MDP with a single state.

The components we saw earlier map directly onto the framework of an MDP. In reinforcement learning, a problem is usually formulated as an MDP, which is defined as a tuple (S, A, P, R) where:

  • S: Set of states
  • A: Set of actions
  • P: State transition probabilities
  • R: A reward function
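The tuple above can be written out directly as data. The sketch below defines a hypothetical two-state MDP. The state names, action names, probabilities, and rewards are invented for illustration, and `sample_step` is just a helper name used here.

```python
import random

# A hypothetical two-state MDP written out as the tuple (S, A, P, R).
S = ["cool", "hot"]
A = ["slow", "fast"]

# P[(s, a)] maps each possible next state to its transition probability.
P = {
    ("cool", "slow"): {"cool": 1.0},
    ("cool", "fast"): {"cool": 0.5, "hot": 0.5},
    ("hot", "slow"): {"cool": 0.5, "hot": 0.5},
    ("hot", "fast"): {"hot": 1.0},
}

# R[(s, a)] is the immediate reward for taking action a in state s.
R = {
    ("cool", "slow"): 1.0,
    ("cool", "fast"): 2.0,
    ("hot", "slow"): 1.0,
    ("hot", "fast"): -10.0,
}


def sample_step(state, action, rng=random):
    """Sample a next state according to P and look up the reward in R."""
    outcomes = P[(state, action)]
    next_states = list(outcomes)
    weights = [outcomes[s] for s in next_states]
    next_state = rng.choices(next_states, weights=weights, k=1)[0]
    return next_state, R[(state, action)]


print(sample_step("cool", "fast"))
```

Keeping P and R as plain dictionaries makes it easy to check that every state-action pair is covered and that each row of probabilities sums to 1.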

An RL environment is typically formulated as an MDP, and the agent performs actions in the states of the MDP to maximize its rewards, which is ultimately the goal of the agent. The reward functions for many RL problems are likewise defined in terms of the MDP, as a function of the states and actions.

Real-World Applications of Reinforcement Learning

The use of words such as agent, environment, and actions must have given it away: RL finds its use in many online games and in computer implementations of traditional games such as chess and Go.

AlphaGo, the computer program developed by DeepMind, is proficient at playing Go. So proficient that in 2016 it defeated Lee Sedol, one of the strongest Go players in the world! Later versions and successors of the program, such as Master and AlphaZero, also performed very well.

Another key use of RL is in self-driving cars. Many organizations have developed self-driving cars in recent years, and some of these use reinforcement learning to make decisions.

One notable mention is AWS DeepRacer, which combines a small self-driving race car with a simulator. It offers a hands-on way to learn and teach the concepts of RL. RL also finds its use in marketing, the food industry, healthcare, manufacturing, and many other industries, where it helps make better decisions and save time.


To conclude, reinforcement learning is a type of machine learning where the agent teaches itself by exploring its surroundings and performing actions, each of which is rewarded or penalized based on its outcome.

RL goes hand in hand with decision-making, as the agent is supposed to make decisions at every point in time. It finds its use in many applications like games, autonomous cars, robots that can cook and serve, healthcare, manufacturing, and so on.