Creating OpenAI Gym Environments with PyBullet (Part 1)

Gerard Maggiolino
9 min readOct 22, 2019

--

This guide assumes rudimentary knowledge of reinforcement learning and the structure of OpenAI Gym environments, along with proficiency in Python.

Motivation:

Many of the standard environments for evaluating continuous control reinforcement learning algorithms are built on the MuJoCo physics engine, a paid and licensed software. Bullet Physics provides a free and open source alternative to physics simulation with OpenAI Gym offering a set of environments built upon it. PyBullet is a library designed to provide Python bindings to the lower level C-API of Bullet. We will use PyBullet to design our own OpenAI Gym environments.

Post Overview:

This post will be the first of a two part series.

  1. PyBullet and Building / Manipulating URDF files
  2. OpenAI Gym Structure and Implementation

We’ll go through building an environment step by step with enough explanations for you to learn how to independently build your own. Code will be displayed first, followed by explanation. Please follow along in Python on your command line.

PyBullet Intro

Install PyBullet with pip. The majority of PyBullet documentation comes in the form of the Quickstart Guide, which provides descriptions of all functions in the pybullet module. PyBullet uses a client-server style API; commands and queries are sent to an ongoing physics simulation.

import pybullet as p
# Can alternatively pass in p.DIRECT
client = p.connect(p.GUI)
p.setGravity(0, 0, -10, physicsClientId=client)

A simulation is started using the p.connect() method. Several connection modes are available, with p.GUI allowing for visualization and debugging, and p.DIRECT providing the fastest, non-visual connection. The returned value from p.connect() is an integer, and can be passed to other functions to specify which simulation to apply the command to. This is useful if you have multiple physics simulations, and will default to 0 (the first started simulation) if it’s not provided as an argument.

A command is sent to the server through function calls, such as p.setGravity(). Following the above code, let’s add on:

import pybullet_data
p.setAdditionalSearchPath(pybullet_data.getDataPath())
planeId = p.loadURDF("plane.urdf")

The module pybullet_data provides many example Universal Robotic Description Format (URDF) files. We’ll build our own URDF file soon, but for now, just understand that they specify the physical and visual characteristics of a robot.

The p.loadURDF() command tells the simulation to create a robot as specified in a URDF file. Find it in the QuickStart Guide, and view the arguments and return value. Note that the returned value is an integer, not a modifiable object. This integer is the ID passed into other functions to query the state of a robot and perform actions on it. Of course, the plane.urdf we’ve loaded is a very, very simple “robot”. Let’s load in a more complicated one.

Note: If you’re following along with the code on your command line, the simulation should appear hung. This is expected, and will resolve in future steps.

carId = p.loadURDF(“racecar/racecar.urdf”, basePosition=[0,0,0.2])

We’ve specified a base position to create the racecar slightly above the plane. Make sure you’ve checked the QuickStart Guide and viewed the p.loadURDF() function. Reading documentation will be crucial in becoming comfortable with PyBullet.

position, orientation = p.getBasePositionAndOrientation(carId)

Find the documentation for the above function in the QuickStart Guide. All PyBullet functions follow a similar style; using the ID of a loaded URDF, we can query information about the robot or specify actions to perform. Notice that as we have only a single physics simulation running, we can omit the physicsClientId argument, which defaults to zero. Running the following:

for _ in range(100): 
p.stepSimulation()

will run the simulation for 100 time steps. Each time step is 1/240 of a second. Having set the gravity to -10 in the z direction earlier, we should see the car fall to the ground. Using what we’ve learned above, let’s send commands to the client to move the car. View the documentation for the function p.applyExternalForce() — we’ll cover the linkIndex argument soon when building our own URDF. Try to fill in the labelled blanks below:

for _ in range(300): 
pos, ori = p.getBasePositionAndOrientation(__A__)
p.applyExternalForce(__B__, 0, [50, 0, 0], __C__, p.WORLD_FRAME)
p.stepSimulation()

The car should move forward in the simulation! Normally, we would not apply an external force to move the car. We would apply a force to the axle of the car to rotate the wheels. We’ll first build our own URDF file to understand how robots are structured. We’ll then manipulate the joints directly to move the toy car we’ve created.

Note: Answers for above are as follows. A = B = carId. This specifies which robot is the target of our query and command. C = pos. According to documentation, the 4th argument is where the force is applied. We want to apply the force to our car as it moves.

Building URDF Files

While PyBullet accepts several robot descriptor file formats, including the more powerful SDF and MJCF, URDF is the standard for Robot OS (ROS) and what we’ll be using. A complete tutorial can be found on ROS’s Wiki; we’ll focus on what’s necessary to build and understand a simple model.

Robots specified with URDF are created from two components, links and joints. These components are arranged in a tree structure originating from a base link. Forces are applied to joints through PyBullet to control robots.

The link and joint structure of a URDF file. Source: ROS

Links and joints have properties that define their shape and behavior in the simulation; the visual property in the code below is an example. URDF files are formatted with XML, a markup language similar in structure to HTML. Tags are used to specify and describe the properties of links and joints.

import pybullet as p 
from time import sleep
p.connect(p.GUI)
p.loadURDF("simplecar.urdf")
sleep(3)

Create “simplecar.urdf” as above. Use the p.GUI connection to visualize the model at each step. For now, our model is composed of one link with just a visual property, a box. Note that the box’s visual origin is at the center of it’s geometry, with half the box appearing below the grid.

The wheels of our car will each be their own link connected to the base link with a joint. Joints have exactly one parent and one child link that they connect. We’re going to create rear wheels as cylinder shaped links attached with a “continuous” type joint to the base link. Continuous type joints are rotated from an angle of positive to negative infinity. We’ll also use “revolute” joints, which allow rotational limits. A full list of joint types can be viewed here.

The origin of the continuous type joints specify that the joints attach to the rear left and right of the base link. The wheel’s visual property is rotated so the wheels point towards the ground. Note that we do not need to change the xyz of the wheel’s origin, as the default origin is the origin of the joint which connects it; we’ve already moved the origins of the joints to the back of the base. The axis of the joint is the axis of rotation for continuous and revolute joints.

Note: A “No inertial data for link” warning may be printed to console if following along with intermediate steps. This will resolve as we add inertial and collision properties later on.

The front two wheels will be attached with steering hinges. Each steering hinge is a link, attached to the base with a revolute joint. Each wheel is then attached to the steering hinge through a continuous joint. Let’s first create the steering hinges. We’ll also add a material element to all of our visual properties, which specifies a custom color.

The revolute joint has a required limit element, specifying the maximum angle the joint can be turned to. The origins are set so that both hinges are attached at the front center of the base. We’ll now attach the front wheels to the hinges with continuous joints. While there appears to be a lot of code, much is boilerplate tagging for simple and repeated specifications.

While this is a nice visual model of our car, we need to specify both collision and inertial properties. These properties will share the same geometry as the visual property; in other words, they’ll overlap with what we’ve defined visually. The collision properties determine how our car will collide with other objects like the ground. The inertial properties determine the mass and distribution of mass in each link of the car. For brevity, the inertial and collision elements are demonstrated for the left front wheel. The completed model can be viewed and downloaded here.

A plane that we’ll drive the car over is provided here. The only new property we use in this plane is contact coefficients, which specifies the interaction between two colliding objects. This will allow the wheels of our car to gain traction over the plane.

Note that this is a very basic URDF file. Complex robots can be specified with realistic physical properties, and meshes defined in STL or STP file formats can be imported to create non-primitive link shapes. If you’re interested in building more complex URDF models, I highly recommend the full ROS tutorial. This won’t be necessary for the remainder of our tutorial.

Manipulating URDF Files

To recap, each robot is just a tree structure of links connected by joints. We manipulate joints to move the robot, while links give the robot its structure. Download the completed car and plane into your working directory before continuing.

We want to use setJointMotorControl2(), which allows us to change the velocity, position, or apply a torque to a joint. This is the main method used to control robots. It takes both a robot ID and a joint ID; we’ll use getJointInfo() to figure out what joint IDs correspond to the axles of the car. We’ll also use getNumJoints(), which takes a robot ID and returns the number of joints it has.

import pybullet as p
p.connect(p.DIRECT)
car = p.loadURDF('simplecar.urdf')
number_of_joints = p.getNumJoints(car)
for joint_number in range(number_of_joints):
info = p.getJointInfo(car, joint_number)
print(info)

This prints out all the information returned by getJointInfo(). If you tried it, you’ll see that it’s a bit difficult to parse. Let’s read the documentation to figure out precisely what we want from “info”. Change the print line to the following:

    print(info[0], ": ", info[1])

From this, we know that the joints connecting to our front and back wheels are 1, 3, 4, and 5, and our steering joints are 0 and 2. We’ll pass these joint IDs into setJointMotorControl2() to control the velocity of the wheel joints and the position of the steering joints.

Let’s both load our robots and use the useful addUserDebugParameter() method available in PyBullet’s p.GUI mode. Copy the code into a file.

import pybullet as p
from time import sleep
p.connect(p.GUI)
p.setGravity(0, 0, -10)
angle = p.addUserDebugParameter('Steering', -0.5, 0.5, 0)
throttle = p.addUserDebugParameter('Throttle', 0, 20, 0)
car = p.loadURDF('simplecar.urdf', [0, 0, 0.1])
plane = p.loadURDF('simpleplane.urdf')
sleep(3)

Sliders are added to our GUI view that allow us to dynamically input values to our program through the simulation. The ID returned by addUserDebugParameter() is passed to readUserDebugParameter() to retrieve the value on the slider. If you’re a bit overwhelmed by the number of functions we’re showing you, don’t worry! All of PyBullet is essentially a collection of functions, and learning to use them is at first daunting.

Adding on to the code above:

wheel_indices = [1, 3, 4, 5]
hinge_indices = [0, 2]

we specify which indices correspond to which robotic joints. We can then use setJointMotorControl2(), readUserDebugParameter(), and stepSimulation() to finally control our car! Append the following code:

while True:
user_angle = p.readUserDebugParameter(angle)
user_throttle = p.readUserDebugParameter(throttle)
for joint_index in wheel_indices:
p.setJointMotorControl2(car, joint_index,
p.VELOCITY_CONTROL,
targetVelocity=user_throttle)
for joint_index in hinge_indices:
p.setJointMotorControl2(car, joint_index,
p.POSITION_CONTROL,
targetPosition=user_angle)
p.stepSimulation()

and run the file. The readUserDebugParameter() call returns the value of the slider at each timestep. This value is passed into setJointMotorControl2() with the car ID and joint ID to control either the velocity or position of the joints that articulate our robot. After changing the behavior of our joints, we call stepSimulation() for PyBullet to handle all of the physics regarding robotic collision and movement.

Running the program we built, we can adjust the sliders on our GUI to change the wheel joint’s velocity, in turn spinning the wheel links, which make contact with the plane. As we specified a friction property in the plane’s URDF file, this contact moves the car forward. We can adjust mass, friction, and shape properties our of URDF files to change how the car behaves in the simulation.

Recap

PyBullet simulates the physical dynamics of robots and their collisions. It uses a client-server API, where we send queries or commands to the PyBullet module to receive information about or manipulate robots.

Robots are defined as links connected by joints in URDF files. Joints have a parent and child link that they connect, and can rotate, slide, or be fixed.

We apply a torque to or change the velocity or position of joints to move our robots in the PyBullet simulation. As we move joints, PyBullet simulates motion and collision of our robots.

Now that we have a basic understanding of PyBullet, we’ll use our car and plane model to create an OpenAI Gym environment that we can use to train and test reinforcement learning agents. See part two for the final environment.

--

--