# Cart-Pole Control

Dynamics, Control, and “Learning”: Cart-Pole

## Introduction

Here we will explore dynamics, modern control methods, and trajectory optimization by implementing various methods to control the canonical underactuated system: the cart-pole. This is a classic control problem because deriving the dynamics is relatively straightforward, yet the system conceals real complexity due to its underactuated nature, meaning there are fewer actuated degrees of freedom than total degrees of freedom. In this case there is only 1 actuated DOF (the cart's translation) but 2 system DOF (the cart's translation and the pole's rotation). The concepts explored here can be extended to control any dynamic system, though not all systems are so straightforward to implement.

## Part 1. Deriving Inverse Dynamics

The dynamics of the cart-pole system are derived using Lagrangian mechanics. The Lagrangian is the general energy expression L = T − V, where L generalizes the total energy in the system, T is the kinetic energy, and V is the potential energy. For a cart of mass $m_c$ and a pole modeled as a point mass $m_p$ at distance $l$ from the pivot, with $\theta$ the pole angle measured from the upright vertical, the kinetic energy of the system is:

$$T = \tfrac{1}{2}(m_c + m_p)\dot{x}^2 + m_p l \dot{x}\dot{\theta}\cos\theta + \tfrac{1}{2} m_p l^2 \dot{\theta}^2$$

The potential energy of the system is that of the pole alone:

$$V = m_p g l \cos\theta$$

The nonlinear dynamics are then found from the Euler–Lagrange equations,

$$\frac{d}{dt}\frac{\partial L}{\partial \dot{q}_j} - \frac{\partial L}{\partial q_j} = Q_j, \qquad j = 1, 2,$$

where $q = [x, \theta]$ are the generalized coordinates and $Q = [F, 0]$ are the generalized forces, $F$ being the force applied to the cart. This set of partial and total derivatives is computed with the help of the MATLAB Symbolic Math Toolbox.

Solving the Euler–Lagrange equations for the accelerations, the nonlinear dynamics can then be represented as:

$$\ddot{x} = \frac{F + m_p \sin\theta\,(l\dot{\theta}^2 - g\cos\theta)}{m_c + m_p \sin^2\theta}, \qquad \ddot{\theta} = \frac{g\sin\theta - \ddot{x}\cos\theta}{l}$$

These dynamics need to be linearized in order to implement a linear controller. Linearization is achieved by computing the Jacobian in MATLAB and inserting the resulting equations into state-space form, $\dot{\mathbf{x}} = A\mathbf{x} + Bu$. The dynamics derived here are used through the majority of the simulations. For trajectory planning, however, the rotational inertia of the pole is also considered to increase the accuracy of the simulation; the primary change to the derivation is that the pole's rotational kinetic energy enters $T$, with $l$ taken as the distance from the pivot to the pole's center of mass.

## Part 2. Simulator Dynamics

Simscape Multibody in Simulink is used to implement the forward dynamics of the system. This toolbox is an extension to the Simulink software provided by MATLAB. It is a physics simulator built around the MATLAB ode45 solver that makes modeling rigid-body dynamics and contact forces relatively straightforward, while letting users harness the power of the MATLAB control-systems and matrix-algebra tools. Below is a Simscape Multibody diagram modeling the dynamics of the cart-pole system.

The cart is modeled as being attached to the world frame by a prismatic joint, letting it move freely in the x direction while being constrained in all other directions and orientations. The pole, in turn, is attached to the cart by a revolute joint, letting it spin freely about its base. Here is a video of the system in freefall:

## Part 3. Manual Control Gains

As we can see from Part 2, this system is unstable when the pole is upright above the cart. In order to keep the pole upright, we must drive the system to its point of unstable equilibrium, where the pole is directly above the cart's center of mass.

We accomplish this using closed-loop full-state feedback. Our state vector is taken to be [x, dx, theta, dtheta], which is multiplied by a feedback gain -K and fed back into our plant as the input force applied to the cart in the positive x direction. In this section we use a tried-and-true method, hand tuning, to find a gain vector K that successfully keeps the pole upright. After some trial and error, a K vector of [-1.5 -2.5 30 7.5] was found to successfully control the cart-pole system. Here is a video showing our results: State Feedback Cart-Pole Control with an Initial Theta of 45 degrees.
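The article runs this loop in Simscape; to make the closed-loop structure concrete, here is a minimal Python sketch of u = -Kx on the point-mass dynamics from Part 1. The parameter values (m_c = 1 kg, m_p = 0.2 kg, l = 0.5 m) are illustrative stand-ins, since the article does not list its own, and the angle-gain signs are flipped relative to the K quoted above because this sketch measures theta as positive toward +x (workable gain signs depend on the angle convention); the hand-tuned magnitudes are the same:

```python
import numpy as np

# Illustrative parameters (the article does not list its values)
MC, MP, L, G = 1.0, 0.2, 0.5, 9.81  # cart mass, pole mass, pole length, gravity

def cartpole_dynamics(state, u):
    """Point-mass cart-pole from Part 1; theta measured from upright toward +x."""
    x, dx, th, dth = state
    s, c = np.sin(th), np.cos(th)
    ddx = (u + MP * s * (L * dth**2 - G * c)) / (MC + MP * s**2)
    ddth = (G * s - ddx * c) / L
    return np.array([dx, ddx, dth, ddth])

def simulate(state0, K, t_final=20.0, dt=1e-3):
    """Closed-loop simulation of u = -K @ state with forward-Euler integration."""
    state = np.asarray(state0, dtype=float)
    traj = [state.copy()]
    for _ in range(int(t_final / dt)):
        u = -K @ state                                 # full-state feedback
        state = state + dt * cartpole_dynamics(state, u)
        traj.append(state.copy())
    return np.array(traj)

# Hand-tuned gains (angle-gain signs flipped for the convention used here)
K = np.array([-1.5, -2.5, -30.0, -7.5])
traj = simulate([0.0, 0.0, 0.2, 0.0], K)  # start with the pole tilted 0.2 rad
print("final state:", traj[-1])
```

Starting from a modest tilt, the pole settles back to upright and the cart returns toward x = 0.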

As can be seen, decent controlled behavior can be achieved on a simple system such as this using only hand-tuned gains. Unfortunately, in any real system the number of states grows, and the intuition for how each gain affects the system becomes much harder to build. There is also no guarantee that the gains found are optimal in any sense. Wouldn't it be great if we could automatically compute optimal feedback gains?

## Part 4. LQR Control Gains

That’s where LQR (the linear quadratic regulator) comes into the picture. LQR poses the control problem as an optimization in which we minimize a quadratic cost function combining a running state cost, an input cost, and (optionally) a final-state cost. This method is guaranteed to give the optimal controller for the cost function provided. The only hiccup in using LQR is that, as the name implies, the system being controlled must be linear, and the cart-pole is a nonlinear system. So how do we control it? By linearizing the system. This is done by taking the equations of motion derived above and computing the Taylor series of each about some operating point. For the cart-pole, we take the operating point to be the unstable equilibrium with the pole directly above the cart, which gives a state-vector operating point of [0 0 0 0].

After linearizing the dynamics, we put them into state-space form in order to easily compute the LQR controller. In terms of the system parameters, the state-space matrices we find are:

$$A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & -\frac{m_p g}{m_c} & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & \frac{(m_c + m_p) g}{m_c l} & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 \\ \frac{1}{m_c} \\ 0 \\ -\frac{1}{m_c l} \end{bmatrix}$$

Now we have to define the Q and R matrices. The Q matrix weights the states of the system in the cost function, while the R matrix weights the input. Because we are primarily concerned with keeping the pole upright (theta = 0) and driving the cart back to its starting point (x = 0), we give higher cost to these two states. In addition, we ideally want the input to be as small as possible, so we give a higher cost to R as well. This results in the following LQR weights:

Once we have a linear state-space model of our system around our control point and our LQR weights defined, finding the LQR state-feedback gain K is as easy as calling lqr(sys, Q, R) in MATLAB. Having done this, the K vector found results in the following behavior of the system:
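The same computation can be sketched outside MATLAB by solving the continuous-time algebraic Riccati equation directly with SciPy. The parameter values and the Q/R weights below are illustrative, not the article's:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

MC, MP, L, G = 1.0, 0.2, 0.5, 9.81  # illustrative parameters

# Linearization about the upright equilibrium, state ordered [x, dx, theta, dtheta]
A = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, -MP * G / MC, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, (MC + MP) * G / (MC * L), 0.0]])
B = np.array([[0.0], [1.0 / MC], [0.0], [-1.0 / (MC * L)]])

# Heavier weights on x and theta, as described in the text; values are made up
Q = np.diag([10.0, 1.0, 100.0, 1.0])
R = np.array([[1.0]])

# K = R^-1 B^T P, where P solves the continuous-time algebraic Riccati equation
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)
print("LQR gain K:", K.ravel())

# LQR guarantees the closed-loop poles land in the left half-plane
print("closed-loop eigenvalues:", np.linalg.eigvals(A - B @ K))
```

MATLAB's `lqr(sys, Q, R)` performs the equivalent Riccati solve internally.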

This same control synthesis process can be used to derive a controller for the double inverted pendulum cart-pole system:

## Part 5. Nonlinear Control to Optimize Basin of Attraction

Though LQR worked great for stabilizing the cart-pole system around its unstable equilibrium point, its basin of attraction is limited. The basin of attraction is defined as the set of initial conditions for which the controller will stabilize the system to an “attractor”, which here is the pole-upright position. In the cart-pole case, the most important states for system stability are theta and dtheta. With this in mind, and for more intuitive visualization, we will neglect x and dx when talking about the basin of attraction for this system. For the LQR case, the following plots show the basin of attraction of the system:

This plot shows the initial conditions for which the controller successfully stabilized the cart-pole system. As can be seen, the basin of attraction of the LQR-controlled cart-pole is limited. This is because the controller is derived from the system linearized around the zero operating point: when we stray too far from that point, the linear approximation of the full nonlinear system becomes poor enough that the controller no longer functions correctly.
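A basin plot like the one described can be approximated by brute force: sweep a grid of (theta, dtheta) initial conditions, simulate the closed loop from each, and record whether the state converges. A small Python sketch, reusing the point-mass dynamics from Part 1 with illustrative parameters, gains re-signed for the angle convention used here, and made-up grid extents and success tolerances:

```python
import numpy as np

MC, MP, L, G = 1.0, 0.2, 0.5, 9.81          # illustrative parameters
K = np.array([-1.5, -2.5, -30.0, -7.5])     # fixed linear full-state feedback gains

def step(state, dt=2e-3):
    """One forward-Euler step of the closed loop u = -K @ state."""
    x, dx, th, dth = state
    u = -K @ state
    s, c = np.sin(th), np.cos(th)
    ddx = (u + MP * s * (L * dth**2 - G * c)) / (MC + MP * s**2)
    ddth = (G * s - ddx * c) / L
    return state + dt * np.array([dx, ddx, dth, ddth])

def stabilized(th0, dth0, t_final=10.0, dt=2e-3):
    """Simulate from (0, 0, th0, dth0); success if the pole ends near upright."""
    state = np.array([0.0, 0.0, th0, dth0])
    for _ in range(int(t_final / dt)):
        state = step(state, dt)
        if abs(state[2]) > np.pi:           # pole fell past the downward vertical
            return False
    return abs(state[2]) < 0.05 and abs(state[3]) < 0.05

# Sweep a grid of initial (theta, dtheta) conditions, as in the basin plots
thetas = np.linspace(-1.0, 1.0, 9)
dthetas = np.linspace(-2.0, 2.0, 9)
basin = np.array([[stabilized(th, dth) for dth in dthetas] for th in thetas])
print(basin.astype(int))   # 1 = stabilized, 0 = escaped
```

Plotting the successful cells over the (theta, dtheta) plane reproduces the kind of basin-of-attraction figure described above.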

In order to increase the basin of attraction, and thus the robustness of the control policy, we implemented a nonlinear control policy inspired by the LQR-Trees approach. Instead of linearizing the system about only the one unstable equilibrium point ([0 0 0 0]), the system was linearized over a range of states. Once again, because theta and dtheta matter most for the stability of the system, the system was systematically linearized around a uniform range of both of these states while keeping the other two states constant, giving 81 linearization points in total. For each linearization, a state-space system was derived and an LQR feedback gain was computed, and these gains were stored in a lookup table of 81 feedback gain vectors.

At any given time, the controller uses the feedback gain computed for the operating point closest to the current state of the nonlinear system. Thus, although the system is still always using a linear controller, the linear approximation of the dynamics used to derive that controller is much closer to the real dynamics of the system at that point. This results in a significant increase in the basin of attraction, which is summarized in the plots below:

Comparison between the Basin of Attraction of a Normal LQR Controller (red) and an LQR-Trees Control Policy (blue)
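The gain-lookup idea can be sketched as follows (this is only the scheduling piece, not the full LQR-Trees algorithm, which also certifies regions of validity). For brevity only theta is gridded here, the Jacobians are taken by finite differences rather than symbolically, and all parameters, weights, and grid choices are illustrative:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

MC, MP, L, G = 1.0, 0.2, 0.5, 9.81  # illustrative parameters

def f(state, u):
    """Nonlinear point-mass cart-pole dynamics; theta measured from upright."""
    x, dx, th, dth = state
    s, c = np.sin(th), np.cos(th)
    ddx = (u + MP * s * (L * dth**2 - G * c)) / (MC + MP * s**2)
    ddth = (G * s - ddx * c) / L
    return np.array([dx, ddx, dth, ddth])

def linearize(x0, u0=0.0, eps=1e-6):
    """Central-difference Jacobians A = df/dx and B = df/du about (x0, u0)."""
    A = np.zeros((4, 4))
    for i in range(4):
        d = np.zeros(4)
        d[i] = eps
        A[:, i] = (f(x0 + d, u0) - f(x0 - d, u0)) / (2 * eps)
    B = ((f(x0, u0 + eps) - f(x0, u0 - eps)) / (2 * eps)).reshape(4, 1)
    return A, B

Q = np.diag([10.0, 1.0, 100.0, 1.0])
R = np.array([[1.0]])

# Build the lookup table: one LQR gain per theta operating point
theta_grid = np.linspace(-1.0, 1.0, 9)
gains = []
for th_op in theta_grid:
    A, B = linearize(np.array([0.0, 0.0, th_op, 0.0]))
    P = solve_continuous_are(A, B, Q, R)
    gains.append(np.linalg.solve(R, B.T @ P).ravel())
gains = np.array(gains)

def control(state):
    """Apply the gain computed for the operating point nearest the current theta."""
    i = np.argmin(np.abs(theta_grid - state[2]))
    return -gains[i] @ state
```

At runtime the nearest-neighbor lookup swaps gains as the state moves through the grid, so the linearization backing the active controller is always local to the current state.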

## Part 6. Trajectory Planning

Up to this point, every controller implemented drives the system from an unstable initial condition to the zero state, where the pole is directly above the cart. But what if we want to swing the pole up from its stable hanging position to the unstable equilibrium point? Traditional feedback controllers do not work well for this purpose, so trajectory optimization is used instead.

Trajectory optimization is the process of designing a trajectory that minimizes (or maximizes) some measure of performance while satisfying a set of constraints. Generally speaking, trajectory optimization is a technique for computing an open-loop solution to an optimal control problem.

In our case, we pose the problem as a minimization in which we minimize the control effort subject to the constraint that the motion obey the system dynamics. For this we use the dynamic equations of motion derived in Part 1.
The trajectory is discretized into 150 points between the initial state (the stable hanging position) and the target state (pole over cart). These 150 points are then optimized using the direct collocation method, and finally a continuous trajectory is interpolated between them. The problem as a whole is posed as a boundary-value problem with the trajectory duration as a constraint: by varying the amount of time we allow the trajectory to take, we can get different behaviors. These different swing-up behaviors can be seen below:
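The transcription step of direct collocation can be illustrated in Python. The sketch below builds the trapezoidal "defect" constraints, the minimum-effort objective, and the boundary conditions that an NLP solver (e.g. `scipy.optimize.minimize` with SLSQP) would consume; the knot count, duration, and parameter values are illustrative, and the article's MATLAB implementation uses 150 knots:

```python
import numpy as np

MC, MP, L, G = 1.0, 0.2, 0.5, 9.81  # illustrative parameters
N = 30                 # knot points (the article uses 150)
T_FINAL = 2.0          # fixed trajectory duration; varying it changes the swing-up
H = T_FINAL / (N - 1)  # time step between knots

X_START = np.array([0.0, 0.0, np.pi, 0.0])  # hanging down (theta from upright)
X_GOAL = np.array([0.0, 0.0, 0.0, 0.0])     # pole balanced over the cart

def f(state, u):
    """Point-mass cart-pole dynamics from Part 1."""
    x, dx, th, dth = state
    s, c = np.sin(th), np.cos(th)
    ddx = (u + MP * s * (L * dth**2 - G * c)) / (MC + MP * s**2)
    ddth = (G * s - ddx * c) / L
    return np.array([dx, ddx, dth, ddth])

def unpack(z):
    """Decision vector -> (N, 4) knot states and (N,) knot inputs."""
    return z[: 4 * N].reshape(N, 4), z[4 * N :]

def objective(z):
    """Minimum control effort: integrated squared input."""
    _, u = unpack(z)
    return H * np.sum(u**2)

def defects(z):
    """Trapezoidal collocation: each segment must satisfy the dynamics,
    x[k+1] - x[k] = (h/2) * (f(x[k], u[k]) + f(x[k+1], u[k+1]))."""
    xs, us = unpack(z)
    d = np.zeros((N - 1, 4))
    for k in range(N - 1):
        d[k] = xs[k + 1] - xs[k] - 0.5 * H * (f(xs[k], us[k]) + f(xs[k + 1], us[k + 1]))
    return d.ravel()

def boundary(z):
    """Pin the first knot to the hanging state and the last to the goal."""
    xs, _ = unpack(z)
    return np.concatenate([xs[0] - X_START, xs[-1] - X_GOAL])
```

`defects` and `boundary` become equality constraints and `objective` the cost; the optimized knots are then interpolated into the continuous swing-up trajectory described above.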