This repository is now maintained over here, so please switch to the latest version of the repository.
Socially aware path planning enables a robot to navigate through a crowded environment while causing the least possible discomfort to the surrounding people. The aim of this project is to account for the interactions between the different entities in the environment by adding interaction nodes between entities in the graph representation. Since the available simulators are heavy to train on, a light-weight OpenAI Gym style environment has been designed. Additionally, deep RL agents have been implemented in the `agents` directory, which contains implementations of DQN, DuelingDQN, A2C, PPO, DDPG, SAC, and CrowdNav, all using a custom transformer as the underlying architecture. Scripts using Stable Baselines 3 are also available (`stable_dqn.py` and `stable_ppo.py`).
The following dependencies can be installed using pip or Anaconda: `gym`, `matplotlib`, `opencv-python` (imported as `cv2`), `shapely`, and `Cython`.
To run a few agents, `stable-baselines3` is required. It can be installed using `pip install git+https://github.com/carlosluis/stable-baselines3@fix_tests`.

RVO2 can be installed from the following repository: https://github.com/sybrenstuvel/Python-RVO2/
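After installing the dependencies, a quick sanity check is to make sure the relevant modules import (note that `cv2` is provided by opencv-python and `rvo2` by Python-RVO2):

```python
# sanity check: these imports should succeed once the dependencies are installed
import gym
import matplotlib
import cv2        # provided by opencv-python
import shapely
import rvo2       # provided by Python-RVO2
import socnavgym
```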
```python
import socnavgym
import gym

env = gym.make('SocNavGym-v1', config="<PATH_TO_CONFIG>")
```
A minimal example of running an episode with random actions:

```python
import socnavgym
import gym

env = gym.make("SocNavGym-v1", config="./configs/temp.yaml")
obs, _ = env.reset()

for i in range(1000):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    env.render()
    if terminated or truncated:
        env.reset()
```
`SocNavGym-v1` is a highly customisable environment whose parameters can be controlled using the config files. You can have a look at the config files in `paper_configs/`; comments have been written against each parameter in the config file for better understanding. Other than the robot, the environment supports entities like plants, tables and laptops. The environment also models human-human and human-laptop interactions, and it can contain both moving and static crowds. The environment follows the OpenAI Gym format, implementing the `step`, `render` and `reset` functions, and uses the latest Gym API (gym 0.26.2).
- X-axis points in the direction of zero-angle.
- The orientation field of the humans and the robot stores the angle between the X-axis of the human/robot and the X-axis of the ground frame.
The observation returned when `env.step(action)` is called consists of the following (all in the robot's frame of reference). The observation is of type `gym.spaces.Dict`, and the dictionary has the following keys:
- `"robot"` : A vector of shape (9,). The first six values are the one-hot encoding of the robot, i.e. [1, 0, 0, 0, 0, 0]. The next two values are the goal's x and y coordinates in the robot frame, and the last value is the robot's radius.

- The other keys present in the observation are `"humans"`, `"plants"`, `"laptops"`, `"tables"` and `"walls"`. Every entity (human, plant, laptop, table, or wall) has an observation vector with the following structure:

| Index | Field | Group |
|---|---|---|
| 0-5 | enc0, enc1, enc2, enc3, enc4, enc5 | Encoding |
| 6-7 | x, y | Relative coordinates |
| 8-9 | sin(theta), cos(theta) | Relative angular details |
| 10 | radius | Radius |
| 11-12 | linear_vel, angular_vel | Relative speeds |
| 13 | gaze | Gaze |

Details of the field values:
- One-hot encoding of the object. The one-hot encodings are as follows:
    - human: [0, 1, 0, 0, 0, 0]
    - table: [0, 0, 1, 0, 0, 0]
    - laptop: [0, 0, 0, 1, 0, 0]
    - plant: [0, 0, 0, 0, 1, 0]
    - wall: [0, 0, 0, 0, 0, 1]
- x, y coordinates relative to the robot. For rectangular objects the coordinates correspond to the centre of geometry.
- theta: the orientation with respect to the robot.
- radius: radius of the object. For rectangular objects this is the radius of the circle that circumscribes the rectangle.
- relative linear speed.
- relative angular speed, calculated as the difference in the angles across two consecutive time steps divided by the time step.
- gaze value: for humans, it is 1 if the robot lies in the line of sight of the human, otherwise 0. For entities other than humans, the gaze value is 0. The line of sight of a human is determined by whether the robot lies between -gaze_angle/2 and +gaze_angle/2 in the human's frame. The gaze angle can be changed using the `gaze_angle` parameter in the config file.
The observation vectors of all entities of the same type are concatenated into a single vector, which is placed under the corresponding key in the dictionary. For example, if there are 4 humans, the four vectors of shape (14,) are concatenated into a vector of shape (56,), and the `"humans"` key in the observation dictionary contains this vector. Individual observations can be recovered by simply reshaping to (-1, 14). For walls, each wall is segmented into smaller walls of size `wall_segment_size` (from the config), and the observations of each segment are returned in `obs["walls"]`.
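For example, the per-entity layout above can be used to pull individual fields out of an observation. This is a short sketch assuming `env` was created as in the quick-start example; the variable names are only for illustration:

```python
obs, _ = env.reset()

# Robot vector: 6 one-hot values, goal (x, y) in the robot frame, robot radius.
robot_obs = obs["robot"]
goal_xy = robot_obs[6:8]
robot_radius = robot_obs[8]

# Every human contributes a 14-dimensional vector; reshape to access them individually.
humans = obs["humans"].reshape(-1, 14)
for h in humans:
    rel_x, rel_y = h[6], h[7]                     # position relative to the robot
    sin_t, cos_t = h[8], h[9]                     # relative orientation
    lin_vel, ang_vel, gaze = h[11], h[12], h[13]  # relative speeds and gaze flag
```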
The action space consists of three components: vx, vy and va, where the X axis is the robot's heading direction. For differential-drive robots, the vy component is 0. All three components take a value between -1 and 1, which is later mapped to the corresponding speed using the maxima set in the config file. If you want to use a discrete action space, you can use the `DiscreteActions` wrapper.
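A continuous action can also be constructed by hand; the following is a sketch assuming the action is a length-3 array of normalised [vx, vy, va] values:

```python
import numpy as np

# Drive forward at full speed, with no sideways motion and no rotation.
# The values are normalised to [-1, 1]; the environment maps them to real speeds
# using the maxima specified in the config file.
action = np.array([1.0, 0.0, 0.0], dtype=np.float32)
obs, reward, terminated, truncated, info = env.step(action)
```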
The environment also returns meaningful metrics in the info dict (returned by `env.step()`). On termination of an episode, the reason for termination can be found using the info dict; the corresponding key from the following is set to `True`:

- `"OUT_OF_MAP"`: whether the robot went out of the map
- `"REACHED_GOAL"`: whether the robot reached the goal
- `"COLLISION_HUMAN"`: whether the robot collided with a human
- `"COLLISION_OBJECT"`: whether the robot collided with an object (plant, wall, table)
- `"COLLISION"`: whether the robot collided with any entity
- `"MAX_STEPS"`: whether the episode terminated because the maximum episode length was reached

Apart from this, the info dict also provides a few metrics such as `"personal_space_compliance"` (percentage of time that the robot is not within the personal space of humans), `"success_weighted_by_time_length"` (success * (time_taken_by_orca_agent_to_reach_goal / time_taken_by_robot_to_reach_goal)), `"closest_human_dist"` (distance to the closest human), and `"closest_obstacle_dist"` (distance to the closest obstacle). Some additional metrics that are also provided are:

- `"DISCOMFORT_SNGNN"`: the SNGNN value (more about SNGNN in the section below)
- `"DISCOMFORT_DSRNN"`: the DSRNN reward value (more about the DSRNN reward function in the section below)
- `"sngnn_reward"`: SNGNN value - 1
- `"distance_reward"`: value of the distance reward

Note that the above four values are returned correctly only if the reward function used is `"dsrnn"` or `"sngnn"`. If a custom reward function is written, the user is required to fill in these values; otherwise 0s are returned for them.
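For example, the termination reason and a couple of these metrics can be read out at the end of an episode:

```python
obs, _ = env.reset()
terminated = truncated = False

while not (terminated or truncated):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())

# Find out why the episode ended.
for key in ["OUT_OF_MAP", "REACHED_GOAL", "COLLISION_HUMAN", "COLLISION_OBJECT", "MAX_STEPS"]:
    if info.get(key):
        print("Episode ended because of:", key)

print("Personal space compliance:", info["personal_space_compliance"])
print("Closest human distance:", info["closest_human_dist"])
```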
Lastly, information about the interactions is returned as an adjacency list. There are two types of interactions, `"human-human"` and `"human-laptop"`. For every interaction between human i and human j, the pair (i, j) is present in `info["interactions"]["human-human"]` (i and j are based on the order in which the humans' observations appear in the observation dictionary, so the ith human's observation can be extracted as `obs["humans"].reshape(-1, 14)[i]`). Similarly, for an interaction between the ith human and the jth laptop, (i, j) is present in `info["interactions"]["human-laptop"]`.
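A short sketch of reading the interaction lists, using the same (-1, 14) reshape described above:

```python
humans = obs["humans"].reshape(-1, 14)

# Pairs of interacting humans (indices follow the order of the human observations).
for i, j in info["interactions"]["human-human"]:
    print("human", i, "interacts with human", j, "- positions:", humans[i][6:8], humans[j][6:8])

# Human-laptop interactions: ith human interacting with jth laptop.
for i, j in info["interactions"]["human-laptop"]:
    print("human", i, "interacts with laptop", j)
```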
The environment provides implementations of the SNGNN reward function and the DSRNN reward function. To use these reward functions, the config passed to the environment should have the `reward_file` field set to "sngnn" or "dsrnn" respectively.
The environment also allows users to provide custom reward functions. Follow the guide below to create your own reward function.
- Create a new Python file in which you define a class named `Reward`. It must inherit from the `RewardAPI` class:

```python
from socnavgym.envs.rewards import RewardAPI

class Reward(RewardAPI):
    ...
```
- Override the `compute_reward` function with your custom reward function. The inputs of `compute_reward` are the action of the current timestep, the previous entity observations and the current entity observations. The previous and current observations are given as dictionaries whose keys are the ids of the entities and whose values are instances of the `EntityObs` namedtuple defined in this file. It contains the fields id, x, y, theta, sin_theta and cos_theta for each entity in the environment. Note that all these values are in the robot's frame of reference.
- If need be, you can also access the lists of humans, plants, interactions, etc. that the environment maintains through the `self.env` variable. An example of this can be found in the `dsrnn_reward.py` file.
- The `RewardAPI` class provides helper functions such as `check_collision`, `check_timeout`, `check_reached` and `check_out_of_map`. These boolean functions check whether the robot has collided with any entity, whether the maximum episode length has been reached, whether the robot has reached the goal, and whether the robot has moved out of the map, respectively. The last case can occur only when the environment is configured to have no walls.
- The `RewardAPI` class also has a helper function to compute the SNGNN reward: call `compute_sngnn_reward(actions, prev_obs, curr_obs)` to compute it. Also note that if you are using the SNGNN reward function in your custom reward function, set the variable `self.use_sngnn` to `True`.
- You can store any additional information that needs to be returned in the info dict of the step function by placing it in the variable `self.info` of the `Reward` class.
- Anything stored in a class variable persists across the steps of an episode; a new instance of the `Reward` class is created at the start of every episode.
- Provide the path to the file where you defined your custom reward function in the config file's `reward_file` field (a minimal sketch is given after this list).
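Putting these steps together, here is a minimal, illustrative sketch of a custom reward file. It follows the class and method names from the guide above, but the specific reward values, calling the helper checks without arguments, and the `custom_reward_value` info key are assumptions made purely for the example:

```python
# custom_reward.py -- a minimal sketch of a custom reward function (illustrative only)
from socnavgym.envs.rewards import RewardAPI


class Reward(RewardAPI):
    def compute_reward(self, action, prev_obs, curr_obs):
        # Helper checks provided by RewardAPI (assumed here to take no arguments).
        if self.check_reached():
            reward = 1.0            # robot reached its goal
        elif self.check_collision():
            reward = -1.0           # robot collided with some entity
        elif self.check_out_of_map() or self.check_timeout():
            reward = -0.5           # robot left the map or the episode timed out
        else:
            reward = -0.01          # small per-step penalty to encourage progress

        # Anything stored in self.info is returned in the info dict of env.step().
        self.info["custom_reward_value"] = reward
        return reward
```

The config file's `reward_file` would then be set to the path of this file.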
The behaviour of the environment is controlled using the config file, which needs to be passed as a parameter to `gym.make`. The parameters and their descriptions are as follows:
| Section | Parameter | Description |
|---|---|---|
| rendering | resolution_view | size of the window for rendering the environment |
| | milliseconds | delay parameter for waitKey() |
| episode | episode_length | maximum number of steps in an episode |
| | time_step | number of seconds that one step corresponds to |
| robot | robot_radius | radius of the robot |
| | goal_radius | radius of the robot's goal |
| | robot_type | accepted values are "diff-drive" (differential drive robot) and "holonomic" (holonomic robot) |
| human | human_diameter | diameter of the humans |
| | human_goal_radius | radius of a human's goal |
| | human_policy | policy of the humans. Can be "random", "sfm", or "orca". If "random" is set, one of "orca" or "sfm" is chosen at random |
| | gaze_angle | the gaze value (in the observation) for a human is set to 1 when the robot lies between -gaze_angle/2 and +gaze_angle/2 in the human's frame |
| | fov_angle | field-of-view angle of the humans |
| | prob_to_avoid_robot | probability that a human considers the robot in its policy |
| laptop | laptop_width | width of laptops |
| | laptop_length | length of laptops |
| plant | plant_radius | radius of plants |
| table | table_width | width of tables |
| | table_length | length of tables |
| wall | wall_thickness | thickness of walls |
| human-human-interaction | interaction_radius | radius of the human crowd |
| | interaction_goal_radius | radius of the human crowd's goal |
| | noise_varaince | a random noise of Normal(0, noise_variance) is applied to the humans' speeds to break uniformity |
| human-laptop-interaction | human_laptop_distance | distance between the human and the laptop |
| env | margin | margin for the environment |
| | max_advance_human | maximum speed of humans |
| | max_advance_robot | maximum linear speed of the robot |
| | max_rotation | maximum rotational speed of the robot |
| | wall_segment_size | size of a wall segment, used when segmenting walls |
| | speed_threshold | speed below which a human's speed is considered 0 |
| | crowd_dispersal_probability | probability of crowd dispersal |
| | human_laptop_dispersal_probability | probability of dispersing a human-laptop interaction |
| | crowd_formation_probability | probability of crowd formation |
| | human_laptop_formation_probability | probability of forming a human-laptop interaction |
| | reward_file | path to a custom reward file. To use the built-in SNGNN or DSRNN reward functions, set the value to "sngnn" or "dsrnn" respectively |
| | cuda_device | CUDA device to use (in case of multiple CUDA devices). If running on CPU or a single CUDA device, keep it as 0 |
| | min_static_humans | minimum number of static humans in the environment |
| | max_static_humans | maximum number of static humans in the environment |
| | min_dynamic_humans | minimum number of dynamic humans in the environment |
| | max_dynamic_humans | maximum number of dynamic humans in the environment |
| | min_tables | minimum number of tables in the environment |
| | max_tables | maximum number of tables in the environment |
| | min_plants | minimum number of plants in the environment |
| | max_plants | maximum number of plants in the environment |
| | min_laptops | minimum number of laptops in the environment |
| | max_laptops | maximum number of laptops in the environment |
| | min_h_h_dynamic_interactions | minimum number of dynamic human-human interactions. These crowds can disperse if crowd_dispersal_probability is greater than 0 |
| | max_h_h_dynamic_interactions | maximum number of dynamic human-human interactions. These crowds can disperse if crowd_dispersal_probability is greater than 0 |
| | min_h_h_dynamic_interactions_non_dispersing | minimum number of dynamic human-human interactions that never disperse, even if crowd_dispersal_probability is greater than 0 |
| | max_h_h_dynamic_interactions_non_dispersing | maximum number of dynamic human-human interactions that never disperse, even if crowd_dispersal_probability is greater than 0 |
| | min_h_h_static_interactions | minimum number of static human-human interactions. These crowds can disperse if crowd_dispersal_probability is greater than 0 |
| | max_h_h_static_interactions | maximum number of static human-human interactions. These crowds can disperse if crowd_dispersal_probability is greater than 0 |
| | min_h_h_static_interactions_non_dispersing | minimum number of static human-human interactions that never disperse, even if crowd_dispersal_probability is greater than 0 |
| | max_h_h_static_interactions_non_dispersing | maximum number of static human-human interactions that never disperse, even if crowd_dispersal_probability is greater than 0 |
| | min_human_in_h_h_interactions | minimum number of humans in a human-human interaction |
| | max_human_in_h_h_interactions | maximum number of humans in a human-human interaction |
| | min_h_l_interactions | minimum number of human-laptop interactions. These interactions can disperse if human_laptop_dispersal_probability is greater than 0 |
| | max_h_l_interactions | maximum number of human-laptop interactions. These interactions can disperse if human_laptop_dispersal_probability is greater than 0 |
| | min_h_l_interactions_non_dispersing | minimum number of human-laptop interactions that never disperse, even if human_laptop_dispersal_probability is greater than 0 |
| | max_h_l_interactions_non_dispersing | maximum number of human-laptop interactions that never disperse, even if human_laptop_dispersal_probability is greater than 0 |
| | get_padded_observations | flag indicating whether padded observations are required. It can also be changed at runtime using env.set_padded_observations(True/False) |
| | set_shape | sets the shape of the environment. Accepted values are "random", "square", "rectangle", "L" or "no-walls" |
| | add_corridors | True or False, whether there should be corridors in the environment |
| | min_map_x | minimum size of the map along the x direction |
| | max_map_x | maximum size of the map along the x direction |
| | min_map_y | minimum size of the map along the y direction |
| | max_map_y | maximum size of the map along the y direction |
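As noted for `get_padded_observations`, the padding behaviour can also be toggled at runtime; a short example using the sample config shipped with the repository:

```python
import socnavgym
import gym

# The config file controls all of the parameters listed above.
env = gym.make("SocNavGym-v1", config="./configs/temp.yaml")

# Toggle padding of the entity observations at runtime.
env.set_padded_observations(True)   # pass False to disable padding
```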
Gym wrappers are a convenient way to modify the observation space and/or the action space of an environment. SocNavGym implements the following 4 wrappers:
- `DiscreteActions` : changes the environment from a continuous action space to a discrete action space. The action space consists of 7 discrete actions:
    - Turn anti-clockwise (0)
    - Turn clockwise (1)
    - Turn anti-clockwise and move forward (2)
    - Turn clockwise and move forward (3)
    - Move forward (4)
    - Move backward (5)
    - Stay still (6)

As an example, to make the robot move forward throughout the episode, just do the following:
```python
import gym
import socnavgym
from socnavgym.wrappers import DiscreteActions

env = gym.make("SocNavGym-v1", config="paper_configs/exp1_no_sngnn.yaml")  # you can pass any config
env = DiscreteActions(env)  # creates an env with a discrete action space

# simulate an episode that always moves forward
done = False
env.reset()
while not done:
    obs, rew, terminated, truncated, info = env.step(4)  # 4 is "move forward"
    done = terminated or truncated
    env.render()
```
- `NoisyObservations` : adds noise to the observations so as to emulate real-world sensor noise. The wrapper takes in the parameters `mean` and `std_dev`. There is also a parameter called `apply_noise_to`, which defaults to `["robot", "humans", "tables", "laptops", "plants", "walls"]`, meaning all entity types. If you want to apply noise to only a few entity types, pass a list with only those entity types to this parameter. The noise value can be controlled using the `mean` and `std_dev` parameters: a Gaussian noise with the given `mean` and `std_dev` is added to the observations of all entities whose entity type is listed in `apply_noise_to`.

As an example, to add a small noise with 0 mean and 0.1 standard deviation to all entity types, do the following:

```python
import gym
import socnavgym
from socnavgym.wrappers import NoisyObservations

env = gym.make("SocNavGym-v1", config="paper_configs/exp1_no_sngnn.yaml")  # you can pass any config
env = NoisyObservations(env, mean=0, std_dev=0.1)

# simulate an episode with random actions
done = False
env.reset()
while not done:
    obs, rew, terminated, truncated, info = env.step(env.action_space.sample())  # obs is now a noisy observation
    done = terminated or truncated
    env.render()
```
- `PartialObservations` : returns only the observations of entities that lie within the robot's field of view and sensor range. Accordingly, the wrapper takes in two parameters, `fov_angle` and `range`. An example of using the `PartialObservations` wrapper:

```python
import gym
import socnavgym
from socnavgym.wrappers import PartialObservations
from math import pi

env = gym.make("SocNavGym-v1", config="paper_configs/exp1_no_sngnn.yaml")  # you can pass any config
env = PartialObservations(env, fov_angle=2*pi/3, range=1)  # robot with a 120 degree field of view and a 1 m sensor range

# simulate an episode with random actions
env.reset()
done = False
while not done:
    obs, rew, terminated, truncated, info = env.step(env.action_space.sample())
    done = terminated or truncated
    env.render()
```
- `WorldFrameObservations` : returns all the observations in the world frame. The observation under the `"robot"` key then has the structure below. The other entity observations remain the same, the only difference being that positions and velocities are given in the world frame of reference instead of the robot's frame of reference.

| Index | Field | Group |
|---|---|---|
| 0-5 | enc0, enc1, enc2, enc3, enc4, enc5 | Encoding |
| 6-7 | goal_x, goal_y | Robot goal coordinates |
| 8-9 | x, y | Robot coordinates |
| 10-11 | sin(theta), cos(theta) | Angular details |
| 12-14 | vel_x, vel_y, vel_a | Velocities |
| 15 | radius | Radius |

An example of using the `WorldFrameObservations` wrapper:

```python
import gym
import socnavgym
from socnavgym.wrappers import WorldFrameObservations

env = gym.make("SocNavGym-v1", config="paper_configs/exp1_no_sngnn.yaml")  # you can pass any config
env = WorldFrameObservations(env)

# simulate an episode with random actions
env.reset()
done = False
while not done:
    obs, rew, terminated, truncated, info = env.step(env.action_space.sample())  # obs contains observations in the world frame
    done = terminated or truncated
    env.render()
```