Creating OpenAI Gym Environments with PyBullet (Part 2)

Gerard Maggiolino
Oct 16, 2020

This is part two of a two-part series. Please see the first part if you’re unfamiliar with PyBullet.

We’ll want our environment to be neatly structured so that others can install it with pip and run it just like any other OpenAI Gym environment. A complete tutorial on packaging projects can be found in the Python documentation. Install OpenAI Gym through pip.

Basic Structure

The following segment is mostly boilerplate and can be skimmed; it is roughly what is covered here, tailored to our environment (viewing that link is encouraged). Completed code is provided, so you do not need to write any of the following yourself. Let’s create the following directory structure:

Simple-Driving/
    setup.py
    simple_driving/
        __init__.py
        envs/
            __init__.py
            simple_driving_env.py
        resources/
            __init__.py
            simplecar.urdf
            simpleplane.urdf
            car.py
            plane.py

The setup.py file signifies that this directory and its sub-contents are an installable package; it is run when pip installs the package, and it specifies metadata and dependencies. The __init__.py files mark a directory as a Python package and are run when the package is imported. Add this to our setup.py file:

from setuptools import setup

setup(
    name="simple_driving",
    version='0.0.1',
    install_requires=['gym', 'pybullet', 'numpy', 'matplotlib']
)

In simple_driving/__init__.py, add the following code:

from gym.envs.registration import register

register(
    id='SimpleDriving-v0',
    entry_point='simple_driving.envs:SimpleDrivingEnv'
)

When we import our simple_driving package, the code in __init__.py is executed and our environment is added to Gym’s registry. This allows us to create our environment through the standard gym.make() call. In simple_driving/envs/__init__.py, add:

from simple_driving.envs.simple_driving_env import SimpleDrivingEnv

Finally, in simple_driving/envs/simple_driving_env.py, add:

import gym
import numpy as np
import pybullet as p


class SimpleDrivingEnv(gym.Env):
    metadata = {'render.modes': ['human']}

    def __init__(self):
        pass

    def step(self, action):
        pass

    def reset(self):
        pass

    def render(self):
        pass

    def close(self):
        pass

    def seed(self, seed=None):
        pass

Installation and OpenAI Gym Interface

Clone the code; then, from the terminal, we can install our environment as a Python package by running the following in the top-level directory (where setup.py is):

pip install -e . 

Then, in Python:

import gym 
import simple_driving
env = gym.make("SimpleDriving-v0")

If you’re unfamiliar with the interface Gym provides (e.g. env.step(action), env.render(), env.reset()), it’s best to refer to the official documentation, which is concise and complete, before continuing with this tutorial.

In the above code, env is an instance of SimpleDrivingEnv. We will use SimpleDrivingEnv to maintain all state information of our environment.
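
For reference, a minimal episode loop against our environment might look like the following. This is a sketch of the classic Gym API used throughout this tutorial, with a random agent standing in for a trained policy:

import gym
import simple_driving

env = gym.make("SimpleDriving-v0")

obs = env.reset()                       # start an episode, get the first observation
for _ in range(200):
    action = env.action_space.sample()  # random throttle and steering angle
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
env.close()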

Before we view the implementation of our environment, we should take note of some useful classes and constructs. Gym uses spaces to describe the observation and action spaces. Instances of gym.spaces should be assigned to the action_space and observation_space attributes of our environment for consistency with other Gym environments.

    def __init__(self):
        self.action_space = gym.spaces.box.Box(
            low=np.array([0, -0.6]),
            high=np.array([1, 0.6]))
        self.observation_space = gym.spaces.box.Box(
            low=np.array([-10, -10, -1, -1, -5, -5, -10, -10]),
            high=np.array([10, 10, 1, 1, 5, 5, 10, 10]))
        self.np_random, _ = gym.utils.seeding.np_random()

We are starting to define our environment. We use a Box space, whose documentation is found in comments in the Gym source code. We dictate that the action space is two-dimensional and continuous: the first dimension lies in [0, 1] and the second in [-0.6, 0.6]. These correspond to throttle and steering angle. We will eventually train a policy that predicts the throttle and steering angle directly given an observation.

The observation space is eight-dimensional and continuous, consisting of the xy position of the car, the unit xy orientation of the car, the xy velocity of the car, and the xy position of a target we want to reach.

Gym also provides seeding utilities we can use to make training and demonstration runs reproducible. We obtain a randomly seeded numpy random number generator that we’ll use for all random operations.

    def seed(self, seed=None):
        self.np_random, seed = gym.utils.seeding.np_random(seed)
        return [seed]

Many Gym environments use multiple random number generators, and the documentation (again, only comments in the source) dictates that seed() return a list of the seeds used for each of these generators.
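
As a quick illustration of the seeding utility (not code from the repository), the same seed always reproduces the same generator state:

import gym.utils.seeding as seeding

rng, seed = seeding.np_random(42)   # the same call our environment makes
print(seed)                         # the seed actually used (here, 42)
print(rng.uniform(5, 9))            # identical across runs with the same seed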

In our constructor we have:

        self.client = p.connect(p.DIRECT)

and symmetrically:

    def close(self):
        p.disconnect(self.client)

Remember from the first part of this tutorial that p.connect() returns an integer which corresponds to our physics simulation in PyBullet. As we may have multiple simulations running at once, we want to hold onto this integer to load URDFs into the correct world. We also want to release the simulation when we close the environment.

We use p.DIRECT as we want to run our environment as quickly as possible when training a policy and only render when render() is called.

Let’s switch to simple_driving/resources/car.py:

import pybullet as p
import numpy as np
import os


class Car:
    def __init__(self, client):
        self.client = client
        f_name = os.path.join(os.path.dirname(__file__),
                              'simplecar.urdf')
        self.car = p.loadURDF(fileName=f_name,
                              basePosition=[0, 0, 0.1],
                              physicsClientId=client)

    def get_ids(self):
        return self.client, self.car

    def apply_action(self, action):
        pass

    def get_observation(self):
        pass

Car takes the client ID of a PyBullet simulation and loads our simplecar.urdf into it. We’ll implement apply_action() to take a throttle and steering angle as described above and translate them into motion in our simulation by changing the wheel joint velocities (using PyBullet’s joint motor control API, p.setJointMotorControl2() or p.setJointMotorControlArray()). We can then use get_observation() to query PyBullet for the state of the car after applying the wheel joint velocities and stepping the simulation.
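
To make this division of labor concrete, a stripped-down version of these two methods of Car might look like the following. This is a sketch rather than the repository’s exact code: the real apply_action() also models drag and friction, and the joint indices, velocity scaling, and force values below are assumptions about the URDF.

    def apply_action(self, action):
        # Clip the action into the ranges declared by the action space.
        throttle, steering_angle = action
        throttle = min(max(throttle, 0), 1)
        steering_angle = max(min(steering_angle, 0.6), -0.6)

        # Point the steering joints at the requested angle (joint indices
        # here are placeholders for the URDF's actual steering joints).
        p.setJointMotorControlArray(self.car, [0, 2],
                                    controlMode=p.POSITION_CONTROL,
                                    targetPositions=[steering_angle] * 2,
                                    physicsClientId=self.client)
        # Spin the drive wheels at a velocity proportional to the throttle.
        p.setJointMotorControlArray(self.car, [1, 3],
                                    controlMode=p.VELOCITY_CONTROL,
                                    targetVelocities=[throttle * 20] * 2,
                                    forces=[1.2] * 2,
                                    physicsClientId=self.client)

    def get_observation(self):
        # Base position and orientation of the car in the world frame.
        pos, ang = p.getBasePositionAndOrientation(self.car,
                                                   physicsClientId=self.client)
        yaw = p.getEulerFromQuaternion(ang)[2]
        ori = (np.cos(yaw), np.sin(yaw))    # unit heading vector
        vel = p.getBaseVelocity(self.car, physicsClientId=self.client)[0][:2]
        # (x, y, cos(theta), sin(theta), vx, vy) -- the env appends the goal.
        return pos[:2] + ori + vel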

Now that we have some basic structure laid out, the remaining code only glues Gym to PyBullet: calculating rewards, applying actions to joints, and querying the simulation for observations.

Code Overview

To avoid pasting excessive code, the finished repository is provided here. We will elaborate on segments of the code that are not straightforward and haven’t been covered so far.

First, skim through simple_driving/envs/simple_driving_env.py. This is the core logic for our environment. The function step() applies an action to the car and steps the simulation. It then creates a new observation through get_observation() from simple_driving/resources/car.py, which queries the PyBullet simulation for robot state. Using this observation, we compute the reward and whether the episode is done.

This will be the general format for all PyBullet environments — the step() function applies an action (indirectly, in our case) to joints in the PyBullet environment, steps the simulation, then queries the robot’s states for a new observation.
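
A bare-bones version of that pattern could look like this. This is only a sketch: the distance-based reward shaping and the self.goal and self.prev_dist_to_goal attributes are illustrative, so see the repository for the actual logic.

    def step(self, action):
        # 1. Apply the action to the car, then advance the physics.
        self.car.apply_action(action)
        p.stepSimulation(physicsClientId=self.client)

        # 2. Query the simulation for the new robot state.
        car_ob = self.car.get_observation()

        # 3. Compute reward and termination from that state.
        dist_to_goal = np.hypot(car_ob[0] - self.goal[0],
                                car_ob[1] - self.goal[1])
        reward = max(self.prev_dist_to_goal - dist_to_goal, 0)
        self.prev_dist_to_goal = dist_to_goal

        done = dist_to_goal < 1                         # reached the goal
        if abs(car_ob[0]) > 10 or abs(car_ob[1]) > 10:  # drove off the edge
            done = True
            reward = -10

        ob = np.array(car_ob + self.goal, dtype=np.float32)
        return ob, reward, done, {}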

In our Car class, we see that apply_action() manually calculates drag and friction, then sets the velocity of the joint directly (thus setting the wheel speed directly). While applying a torque to the joints rather than setting their final velocity would be more realistic, doing so would be unpredictable with the toy URDF model we have; a more mechanically accurate model would be necessary for torque control.

POV rendering from PyBullet simulation, angle and throttle predicted by TRPO

In render(), we obtain an image from the perspective of our car in simulation through p.getCameraImage(), with an example here. This is generally slow, and should be avoided during training, but is the canonical PyBullet method of obtaining simulation images through p.DIRECT.
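
For reference, the core of such a render() might look roughly like this. It is a sketch, not the repository’s exact camera setup; self.car_id, the camera placement, and the 100x100 resolution are assumptions.

    def render(self, mode='human'):
        # Place the camera just above the car, looking along its heading.
        pos, orn = p.getBasePositionAndOrientation(self.car_id,
                                                   physicsClientId=self.client)
        yaw = p.getEulerFromQuaternion(orn)[2]
        eye = [pos[0], pos[1], 0.3]
        target = [pos[0] + np.cos(yaw), pos[1] + np.sin(yaw), 0.3]

        view_matrix = p.computeViewMatrix(cameraEyePosition=eye,
                                          cameraTargetPosition=target,
                                          cameraUpVector=[0, 0, 1])
        proj_matrix = p.computeProjectionMatrixFOV(fov=60, aspect=1.0,
                                                   nearVal=0.01, farVal=100)
        # Works in p.DIRECT mode; returns (width, height, rgb, depth, seg).
        _, _, rgb, _, _ = p.getCameraImage(100, 100, view_matrix, proj_matrix,
                                           physicsClientId=self.client)
        frame = np.reshape(np.array(rgb, dtype=np.uint8), (100, 100, 4))

        plt.imshow(frame)   # requires matplotlib.pyplot imported as plt
        plt.pause(0.01)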

With these few functions implemented in our SimpleDrivingEnv, we have a complete OpenAI Gym compatible environment. This means that any generic RL algorithm designed for Gym environments is immediately usable. I have added my own implementation of Trust Region Policy Optimization (TRPO) to the repository; using Stable Baselines is a good alternative to designing your own RL agent.
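
For example, with Stable Baselines installed, training an off-the-shelf agent on the environment could be as short as the following sketch. PPO is used here simply as a stand-in for any Gym-compatible algorithm, including the repository’s TRPO:

import gym
import simple_driving
from stable_baselines import PPO2   # pip install stable-baselines

env = gym.make("SimpleDriving-v0")
model = PPO2('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=100000)     # trains against the Gym interface only

obs = env.reset()
for _ in range(500):
    action, _ = model.predict(obs)      # the trained policy drives the car
    obs, reward, done, _ = env.step(action)
    if done:
        obs = env.reset()
env.close()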

Rendering of simulation from PyBullet GUI, angle and throttle predicted by TRPO

While this environment is simple, the same skeleton structure will scale to arbitrarily complex environments. For an example, see the ArmManipulationEnv from Assistive Gym, a recently published set of environments for human-robot interaction (HRI) research. The URDFs, reward calculations, action application, and observation querying are much more complicated, but they follow the same familiar structure. Their code similarly serves as glue between the physics engine (PyBullet) and the Gym interface.

To familiarize yourself with the code in our repo, consider making the following changes:

  • Limit the maximum number of steps in an episode as part of the environment. In other words, if the goal isn’t reached or the car doesn’t go off the edge within some number of steps, set done to True.
  • The environment’s reward function does not penalize for moving away from the goal. How could we change this to encourage a direct route to the goal, but not prevent exploration? Keep in mind the car will need to move away from the goal initially to turn if the goal is behind the car.
  • We can change the environment’s action_space and Car’s apply_action() to add a third dimension for braking in [0, 1]. How should this affect joint_speed? Note joint_speed is the direct velocity of the wheels.
  • To make the environment harder, we could add obstacles, with the xy positions of the obstacles as part of the observation. We’d need to randomly initialize these obstacles’ positions in reset(), then figure out whether the car has collided with them in the step() function; if it collides, what’s an appropriate negative reward? Remember to set self.done to True on a collision.

A final note: physics simulation programming is a field in its own right, separate from reinforcement learning research. Unless you have a deeper interest in pursuing a career in that field, using readily available environments is far more productive than creating your own from scratch. Nonetheless, having a basic understanding of the tools is always useful, and I hope you’ve gained that from this tutorial!
