Newton’s laws and conservation of energy are two approaches to solving for the equations of motion of an object. We can make Newtonian mechanics more elegant by extending it to fields and potentials. But ultimately, Newtonian mechanics is still cumbersome to use. Here is an alternative, more beautiful approach - Lagrangian mechanics.
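Lagrangian mechanics is built around a single function, the Lagrangian $L(x, \dot x, t)$, which in classical mechanics is the difference between the kinetic energy $K$ and the potential energy $U$:

$$L = K - U$$

Note that the dots are used for the time derivatives - that is, $\dot x = \frac{dx}{dt}$. The action is a fundamental quantity of all physical systems and is given by the time integral of the Lagrangian:

$$S = \int_{t_1}^{t_2} L(x, \dot x, t)\, dt$$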
The principle of stationary action states that for any given system, the action is stationary. What does stationary mean? Recall the idea of stationary points in calculus - which include minima and maxima. For the action to be stationary, the trajectory $x(t)$ that the system follows must be a stationary function of the action: a small change in the trajectory produces no first-order change in $S$. Stationary functions are analogous to stationary points, just for the action, which is a function of functions (what we call a functional, which we’ll go more in-depth with later).
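But what condition does the Lagrangian have to satisfy along the trajectory to obey the principle of stationary action? The short answer is that it must obey the following equation, known as the Euler-Lagrange equation:

$$\frac{\partial L}{\partial x} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot x}\right) = 0$$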
where again, $\dot x=\frac{dx}{dt}$ is the velocity. This is one of the most fundamental and profound equations of physics, and works for any particle’s Lagrangian (remember that “particle” is a generic term in physics and can mean a big object like a planet or star). Once you write down the Euler-Lagrange equation, you just need to take the derivatives of the Lagrangian and substitute to get the equations of motion (the differential equations you use to solve for the trajectory of the particle). Applying it, at least conceptually, is fairly simple.
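As a hedged illustration of that recipe, the sketch below uses SymPy's `euler_equations` helper on a hypothetical example that does not appear in the text: a mass $m$ on a spring of stiffness $k$, with the assumed Lagrangian $L = \frac12 m\dot x^2 - \frac12 kx^2$. The symbols `m`, `k` and the choice of system are assumptions made only for illustration.

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

t = sp.symbols("t")
m, k = sp.symbols("m k", positive=True)
x = sp.Function("x")

# Assumed example Lagrangian (mass on a spring): L = K - U
xdot = x(t).diff(t)
L = sp.Rational(1, 2) * m * xdot**2 - sp.Rational(1, 2) * k * x(t)**2

# euler_equations builds dL/dx - d/dt(dL/dxdot) = 0 for us
equation_of_motion = euler_equations(L, x(t), t)[0]
print(equation_of_motion)
```

The printed equation should be equivalent to $m\ddot x = -kx$, the familiar equation of motion of a mass on a spring.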
But to gain a deeper understanding of why this equation works, we must first dive into the theory of functionals and variational calculus. If this section is too math-heavy, feel free to skip this section - it’s not required for applying the Euler-Lagrange equation. But for those that want the step-by-step derivation - let’s dive in!
A functional is a function that takes in other functions as input. In contrast to a function f which takes in a real number x and outputs f(x), a functional L takes in a function y(x) and outputs L(y(x),y′(x),x).
The derivative appearing in L(y,y′,x) and the mention of the word “calculus” suggests that functionals are based in differential operators, such as derivatives and integrals. Indeed, this is the case - a great number of functionals are in fact integrals.
Consider, for instance, a functional that appears - under a different name - in a first introduction to calculus. This is the functional expression for the arc length:

$$S = \int_a^b \sqrt{1 + y'^2}\, dx$$

While an introductory treatment of calculus may simply give this formula with the provided function $y$ and its derivative $y'$, the calculus of variations would consider this formula a functional of an arbitrary function $f$ in the form:

$$S(f, f', x) = \int_a^b \sqrt{1 + f'(x)^2}\, dx$$
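To make the idea of "a function that takes in a function" concrete, here is a minimal numerical sketch of the arc length functional. It assumes the curve is supplied through its slope $y'(x)$ as a Python callable; the helper name `arc_length` and the two sample curves are illustrative choices, not part of the text.

```python
import numpy as np
from scipy.integrate import quad

def arc_length(dy_dx, a, b):
    """Arc-length functional: maps a curve (given by its slope y'(x)) to a number."""
    integrand = lambda x: np.sqrt(1.0 + dy_dx(x) ** 2)
    value, _abs_err = quad(integrand, a, b)
    return value

# Two hypothetical curves joining (0, 0) to (1, 1)
line_length = arc_length(lambda x: 1.0, 0.0, 1.0)          # y = x
parabola_length = arc_length(lambda x: 2.0 * x, 0.0, 1.0)  # y = x**2
print(line_length, parabola_length)
```

Each call feeds a whole function into `arc_length` and gets back a single number, which is exactly what a functional does.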
The calculus of variations is concerned with optimizing functionals to find their stationary points. In many cases, we want to obtain the minimum or maximum of a functional, but remember that stationary points are more general and can include things like saddle points and other points of inflection (i.e. points around which the second derivative changes sign).
In our case, we want to figure out which path $y(x)$ gives the shortest distance between the points $x=a$ and $x=b$. Translated to mathematical terms, we can say that we want to optimize $S(y,y',x)$ for the function $y(x)$ that minimizes $S$. But how do we do so? The answer requires a fair bit of explaining, so this is a section to be read through slowly.
Consider a general functional $S(f,f',q)$ where the functional $S$ is a function of $f(q)$, $f'(q)$, and $q$. Here, $f(q)$ is a parametric function of one parameter $q$ - we will explore specific cases of $f(q)$ later (hint: one of these will be the position function $x(t)$ which is a parametric function where the parameter is $t$). Our functional $S(f,f',q)$ is given by:

$$S(f, f', q) = \int_{q_1}^{q_2} L(f, f', q)\, dq$$
All of this is certainly very abstract, so let us examine what it all means. $S$ is a functional, meaning that it takes some function $f(q)$ and outputs a number. The precise thing it does, in this case, is to integrate a composite function of $f$, its derivative $f'(q)$, and its input $q$, between two points in the domain of $f$. For notational clarity, we call this composite function $L(f,f',q)$. As we are taking the definite integral of the composite function, the result is a number. So to sum it all up, $S$ is a functional that, given any function $f(q)$ - whatever the function may be - returns the definite integral of a chosen composite function $L$ of $f$, its derivative $f'$, and its input $q$.
We want to find the function $f$ that minimizes or maximizes $S$. This means we want to find a function for which $S$ does not change with respect to $f$ (similar to how the derivative is zero at a critical point in normal calculus). To find this optimal function, let us vary $S$ by adding a function $\eta(q)$ multiplied by a tiny number $\varepsilon$ to $f$ between $q_1$ and $q_2$ - this represents adding a tiny shift, also called a variation, to $f$. Our particular shift is such that $\eta(q_1)=\eta(q_2)=0$, meaning that $\eta(q)$ vanishes at the endpoints, since we want this variation to only be between $q_1$ and $q_2$ (and nowhere outside of that range). We then have:

$$S(f + \varepsilon\eta, f' + \varepsilon\eta', q) = \int_{q_1}^{q_2} L(f + \varepsilon\eta, f' + \varepsilon\eta', q)\, dq$$
Our next step is to find the amount of change δS between S(f,f′,q) and S(f+εη,f′+εη′,q). As a first step, we want to compute L(f+εη,f′+εη′,q), as that will allow us to compute S(f+εη,f′+εη′,q), which we need in order to calculate δS.
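Recall how, in single-variable calculus, we can express a small shift $y(x+h)$ in a function $y(x)$ for some tiny number $h$ by:

$$y(x+h) \approx y(x) + h\frac{dy}{dx}$$

Applying the same first-order expansion to $L$ in both of its shifted arguments gives:

$$L(f+\varepsilon\eta, f'+\varepsilon\eta', q) \approx L(f,f',q) + \frac{\partial L}{\partial f}\eta\varepsilon + \frac{\partial L}{\partial f'}\frac{d\eta}{dq}\varepsilon$$

so the change in the functional is:

$$\delta S = S(f+\varepsilon\eta, f'+\varepsilon\eta', q) - S(f,f',q) = \int_{q_1}^{q_2}\left(\frac{\partial L}{\partial f}\eta\varepsilon + \frac{\partial L}{\partial f'}\frac{d\eta}{dq}\varepsilon\right)dq$$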
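In the limit as $\varepsilon\to 0$, we would expect that $\delta S=0$, as the function that maximizes (or minimizes) $S$, again, is the function for which $S$ does not change with respect to $f$. In formal language, this is called the process of varying $S$ by a variation $\varepsilon$, and then demanding that $\lim_{\varepsilon\to 0}\delta S=0$. This is why this form of calculus is called the calculus of variations or variational calculus. By setting $\delta S=0$ we have:

$$\int_{q_1}^{q_2}\left(\frac{\partial L}{\partial f}\eta\varepsilon + \frac{\partial L}{\partial f'}\frac{d\eta}{dq}\varepsilon\right)dq = 0$$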
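The condition $\delta S = 0$ can be checked numerically on a concrete case. The sketch below is only an illustration under stated assumptions: it takes the arc length functional $S=\int_0^1\sqrt{1+y'^2}\,dx$, the candidate curve $y=x$, and the hypothetical variation $\eta(x)=\sin(\pi x)$, which vanishes at both endpoints. It then estimates $dS/d\varepsilon$ at $\varepsilon=0$ with a central difference.

```python
import numpy as np
from scipy.integrate import quad

def varied_length(eps):
    """Arc length of y(x) = x + eps*eta(x) on [0, 1], with eta(x) = sin(pi x)."""
    # Slope of the varied curve: y'(x) = 1 + eps * eta'(x)
    slope = lambda x: 1.0 + eps * np.pi * np.cos(np.pi * x)
    value, _abs_err = quad(lambda x: np.sqrt(1.0 + slope(x) ** 2), 0.0, 1.0)
    return value

h = 1e-4
# Central-difference estimate of dS/d(eps) at eps = 0
first_variation = (varied_length(h) - varied_length(-h)) / (2 * h)
print(first_variation)
print(varied_length(0.1) - varied_length(0.0))
```

The first number should be zero to within numerical error, while the second should be positive: shifting the curve away from $y=x$ does not change the length to first order, but does increase it at second order.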
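We would, however, prefer some way to get rid of the added function $\eta(q)$ to obtain an equation that doesn’t depend on $\eta$. We can do this by explicitly performing the above integral. First, we split the integral into two parts for mathematical convenience in the following steps:

$$\int_{q_1}^{q_2}\frac{\partial L}{\partial f}\eta\varepsilon\, dq + \int_{q_1}^{q_2}\frac{\partial L}{\partial f'}\frac{d\eta}{dq}\varepsilon\, dq = 0$$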
We now simplify the second term by performing integration by parts. Recall that the integration by parts formula is as follows:

$$\int u\, dv = uv - \int v\, du$$
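If we let $u=\frac{\partial L}{\partial f'}\varepsilon$ and $dv=\frac{d\eta}{dq}\,dq$, then $v=\int\frac{d\eta}{dq}\,dq=\eta(q)$ and $du=\frac{d}{dq}\left(\frac{\partial L}{\partial f'}\right)\varepsilon\,dq$. By substituting these in (we keep the first term as it is and only perform integration by parts on the second term) we have:

$$\underbrace{\int_{q_1}^{q_2}\frac{\partial L}{\partial f}\eta\varepsilon\, dq}_{\text{no need to evaluate}} + \underbrace{\int_{q_1}^{q_2}\frac{\partial L}{\partial f'}\frac{d\eta}{dq}\varepsilon\, dq}_{\text{integrate by parts}} = \int_{q_1}^{q_2}\frac{\partial L}{\partial f}\eta\varepsilon\, dq + \underbrace{\left[\frac{\partial L}{\partial f'}\eta\varepsilon\,\bigg|_{q_1}^{q_2} - \int_{q_1}^{q_2}\eta\frac{d}{dq}\left(\frac{\partial L}{\partial f'}\right)\varepsilon\, dq\right]}_{\text{result of integration by parts}}$$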
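But recall from earlier that we defined $\eta(q)$ such that $\eta(q_2)=\eta(q_1)=0$, meaning that the $\frac{\partial L}{\partial f'}\eta\varepsilon\,\big|_{q_1}^{q_2}$ term goes to zero. Therefore, we are only left with:

$$\int_{q_1}^{q_2}\frac{\partial L}{\partial f}\eta\varepsilon\, dq - \int_{q_1}^{q_2}\eta\frac{d}{dq}\left(\frac{\partial L}{\partial f'}\right)\varepsilon\, dq = \int_{q_1}^{q_2}\left[\frac{\partial L}{\partial f}\eta\varepsilon - \eta\frac{d}{dq}\left(\frac{\partial L}{\partial f'}\right)\varepsilon\right]dq$$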
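Where in the last term we re-joined the sum of the integrals (which will make the next steps much easier). We know that the integral quantity we derived in the last step must be equal to zero, given that $\delta S=0$ is our fundamental requirement for finding the stationary points (minima, maxima, etc.) of functionals. Therefore we have:

$$\int_{q_1}^{q_2}\left[\frac{\partial L}{\partial f}\eta\varepsilon - \eta\frac{d}{dq}\left(\frac{\partial L}{\partial f'}\right)\varepsilon\right]dq = \int_{q_1}^{q_2}\left[\frac{\partial L}{\partial f} - \frac{d}{dq}\left(\frac{\partial L}{\partial f'}\right)\right]\eta\varepsilon\, dq = 0$$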
Where we factored the common terms $\eta\varepsilon$ out in the last step. But since our integral is zero for every admissible choice of $\eta(q)$, by the fundamental lemma of the calculus of variations, the rest of the integrand, that is, our quantity in the square brackets, must also be zero. That is:

$$\frac{\partial L}{\partial f} - \frac{d}{dq}\left(\frac{\partial L}{\partial f'}\right) = 0$$
The last result is the general form of the Euler-Lagrange equation for our functional $S$. Since our functional is a very general functional, the Euler-Lagrange equation applies to a huge set of functionals - indeed, all functionals in the form $S[f(q),f'(q),q]$ (it is customary to use square brackets when writing out the functional in its full form, but for our short form $S(f,f',q)$ it is permissible to simply use parentheses). Thus, it is an extremely crucial and useful equation, so let us write it down one more time:

$$\frac{\partial L}{\partial f} - \frac{d}{dq}\left(\frac{\partial L}{\partial f'}\right) = 0$$
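Let us now return to the arc length functional $S=\int_a^b\sqrt{1+y'^2}\,dx$. We want to find $y(x)$ that minimizes this functional, and for this we can use the Euler-Lagrange equation. In this case, $q=x$, $f=y(x)$, $f'=y'$, and $L=\sqrt{1+y'^2}$, so the Euler-Lagrange equation for this particular functional reads:

$$\frac{\partial}{\partial y}\sqrt{1+y'^2} - \frac{d}{dx}\left(\frac{\partial}{\partial y'}\sqrt{1+y'^2}\right) = 0$$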
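We may now compute the derivatives (which is much simplified by the fact that $L=\sqrt{1+y'^2}$ does not depend on $y$, but we must be careful to remember that $\frac{d}{dx}f(y')=f'(y')\,y''$ due to the chain rule):

$$\frac{\partial L}{\partial y} = 0, \qquad \frac{\partial L}{\partial y'} = \frac{y'}{\sqrt{1+y'^2}}, \qquad \frac{d}{dx}\left(\frac{y'}{\sqrt{1+y'^2}}\right) = \frac{y''}{(1+y'^2)^{3/2}}$$

Substituting into the Euler-Lagrange equation gives $\frac{y''}{(1+y'^2)^{3/2}}=0$, so $y''=0$, and integrating twice gives:

$$y(x) = mx + b$$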
Where m,b are constants. This is simply an equation of a straight line! By applying the calculus of variations, we have therefore shown that the shortest path between two points a,b - in functional terms, the path that minimizes the arc length - is a straight line. It may seem to be an obvious result, but proving it required quite a bit of calculus!
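The straight-line result can be cross-checked symbolically. The following sketch, which assumes SymPy is available and uses its `euler_equations` helper, builds the Euler-Lagrange expression for the arc length integrand and plugs in two candidate curves. The helper name `residual` and the parabola used for contrast are illustrative choices.

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

x, m, b = sp.symbols("x m b", real=True)
y = sp.Function("y")

# Arc-length integrand L(y, y', x) = sqrt(1 + y'^2)
L = sp.sqrt(1 + y(x).diff(x) ** 2)
el_lhs = euler_equations(L, y(x), x)[0].lhs

def residual(candidate):
    """Plug a candidate curve y(x) into the Euler-Lagrange expression."""
    return sp.simplify(el_lhs.subs(y(x), candidate).doit())

print(residual(m * x + b))  # straight line
print(residual(x ** 2))     # parabola, for contrast
```

A vanishing residual means the candidate satisfies the Euler-Lagrange equation; the straight line should give zero for any `m` and `b`, while the parabola should not.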
Note that the one restriction we must place on this result is that we assume $S=\int\sqrt{1+y'^2}\,dx$ is the right equation for the arc length. For regular Euclidean space, this is always the right equation, and Euclidean space is what we’ll work with 99% of the time. But in higher dimensions, and especially in non-Euclidean geometries, this formula is no longer the correct expression for the arc length. We must then use differential geometry to construct the right one. But that is a topic we will cover in Chapter 3.
In physics, we consider a specific case of the Euler-Lagrange equation, where (as mentioned at the beginning) $q=t$ is the time, $f=x(t)$ is the position, and $f'=\dot x=\frac{dx}{dt}$ is the velocity. Therefore, the Euler-Lagrange equation, in its common form used in physics (specifically, Lagrangian mechanics), becomes:

$$\frac{\partial L}{\partial x} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot x}\right) = 0$$
Again, the Euler-Lagrange equation can be used to solve for the equations of motion as long as the Lagrangian is known. Note that for a more general set of coordinate systems, where the system is not one-dimensional motion along the $x$ axis, there is an Euler-Lagrange equation that applies to each coordinate, each of which takes the following form:

$$\frac{\partial L}{\partial q_i} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot q_i}\right) = 0$$
where $q_i$ stands in for the particular coordinate, so $q_i$ can be any one of $x,y,z$ when working in Cartesian coordinates, or any one of $r,\theta,\phi$ when working in spherical coordinates. And in the specific case when we are interested in solving for the motion of a system of objects, and not just of one individual object, it should be noted that the kinetic and potential energies are those of the system - that is, the sum of the kinetic and potential energies of every object in the system:

$$L = K - U, \qquad K = \sum_n K_n, \qquad U = \sum_n U_n$$

where $n$ labels the objects in the system.
Note that the Euler-Lagrange equations apply primarily to closed systems, i.e. systems with no external force acting on them. If there is an external applied force on the system that does work W, then the Euler-Lagrange equations become:
Having examined the fundamental theory behind Lagrangian mechanics, we will now look at a few examples of increasing difficulty, to illustrate its usefulness and mathematical elegance.
Using Lagrangian mechanics to solve the simple pendulum
We consider the classical problem of the simple pendulum, an idealized model of a pendulum that is frequently introduced as an example of a harmonic oscillator. A diagram of the simple pendulum configuration is shown below:
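For the single pendulum problem, we first find the equations $x(t)$ and $y(t)$ given our coordinate system. Our coordinate system is based on the point $(0,0)$ located at the point where the pendulum is attached to the ceiling. Writing $\ell$ for the length of the pendulum, $m$ for its mass, and $\theta(t)$ for its angle from the vertical, basic trigonometry gives:

$$x(t) = \ell\sin\theta(t), \qquad y(t) = -\ell\cos\theta(t)$$

Where $y(t)$ is negative because the pendulum is at a negative height relative to our origin. Using our expressions for $x(t)$ and $y(t)$, we want to find the expression for the kinetic energy $K$. We know that:

$$K = \frac12 mv^2 = \frac12 m\left[\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2\right]$$

To do this, we solve for $\frac{dx}{dt}$ and $\frac{dy}{dt}$. This takes a bit of care, because $x(t)$ and $y(t)$ depend on $t$ only through $\theta(t)$, so we need the chain rule:

$$\frac{dx}{dt} = \ell\cos\theta\,\dot\theta, \qquad \frac{dy}{dt} = \ell\sin\theta\,\dot\theta$$

so that $K = \frac12 m\ell^2\dot\theta^2$.

Now we find the potential energy. Remember that close to Earth, the potential energy is determined by and only by the vertical distance between the origin (which is the reference height of zero) and the measured point. This means that:

$$U = mgy = -mg\ell\cos\theta$$

The Lagrangian is therefore $L = K - U = \frac12 m\ell^2\dot\theta^2 + mg\ell\cos\theta$, and the Euler-Lagrange equation for the coordinate $\theta$ gives:

$$\frac{\partial L}{\partial\theta} - \frac{d}{dt}\left(\frac{\partial L}{\partial\dot\theta}\right) = -mg\ell\sin\theta - m\ell^2\ddot\theta = 0 \quad\Rightarrow\quad \ddot\theta = -\frac{g}{\ell}\sin\theta$$

We have arrived at our answer. This is the differential equation of the simple pendulum. Note that while this equation has no solution in terms of elementary functions, we can use the small-angle approximation $\sin\theta\approx\theta$ to get:

$$\ddot\theta = -\frac{g}{\ell}\theta$$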
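To get a feel for how good the small-angle approximation is, the sketch below integrates the pendulum equation numerically and compares it with the small-angle solution. It assumes the standard form $\ddot\theta = -\frac{g}{\ell}\sin\theta$, a pendulum released from rest, and illustrative values for the length and initial angle; none of these numbers come from the text.

```python
import numpy as np
from scipy.integrate import solve_ivp

g = 9.81       # m/s^2
length = 1.0   # pendulum length in metres (assumed value)
theta0 = 0.1   # initial angle in radians (assumed, small)

def pendulum_rhs(t, state):
    """Full equation: theta'' = -(g / length) * sin(theta)."""
    theta, omega = state
    return [omega, -(g / length) * np.sin(theta)]

t = np.linspace(0.0, 5.0, 500)
full = solve_ivp(pendulum_rhs, (0.0, 5.0), [theta0, 0.0],
                 t_eval=t, rtol=1e-9, atol=1e-12)

# Small-angle solution for release from rest: theta0 * cos(sqrt(g/length) * t)
small_angle = theta0 * np.cos(np.sqrt(g / length) * t)
max_difference = np.max(np.abs(full.y[0] - small_angle))
print(max_difference)
```

For an initial angle this small the two curves should stay close over a few swings; increasing `theta0` makes the discrepancy grow, mostly as a slow drift in phase.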
Using Lagrangian mechanics to solve the orbit equation
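We want to derive the orbit of Earth around the Sun. To do so, we again first derive the expressions for $x(t)$ and $y(t)$ in terms of the Sun-Earth system, using polar coordinates $r(t)$ and $\theta(t)$ centered on the Sun:

$$x(t) = r\cos\theta, \qquad y(t) = r\sin\theta$$

Differentiating gives the kinetic energy $K = \frac12 m(\dot r^2 + r^2\dot\theta^2)$, and the gravitational potential energy of the Earth (mass $m$) in the field of the Sun (mass $M$) is $U = -\frac{GMm}{r}$. Applying the Euler-Lagrange equation to $L = K - U$ for each of the coordinates $r$ and $\theta$ gives the equations of motion:

$$\ddot r = r\dot\theta^2 - \frac{GM}{r^2}, \qquad \ddot\theta = -\frac{2\dot r\dot\theta}{r}$$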
These can be solved analytically, but for the sake of simplicity here they will be solved using a numerical differential equation solver:
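```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.integrate import solve_ivp


def newtonian_d_dt(t, X, G=6.67e-11, M=2e30):
    """Right-hand side of the polar orbit equations, X = (r, theta, dr/dt, dtheta/dt)."""
    r, theta, u, v = X
    dr_dt = u
    dtheta_dt = v
    du_dt = r * v ** 2 - (G * M) / (r ** 2)
    dv_dt = -(2 * u * v) / r
    return dr_dt, dtheta_dt, du_dt, dv_dt


tmax = 365 * 24 * 60 * 60  # 1 year in seconds
samples = 5000
t = np.linspace(0, tmax, samples)

# Assumed initial conditions (not given in the original listing), roughly Earth's:
# r = 1 au, theta = 0, no radial velocity, one revolution per year.
newtonian_initial = [1.496e11, 0.0, 0.0, 2 * np.pi / tmax]

# Tighter tolerance than the solver default (an assumption) so the orbit closes cleanly
newtonian = solve_ivp(newtonian_d_dt, (0, tmax), y0=newtonian_initial,
                      dense_output=True, rtol=1e-8)
sol = newtonian.sol(t)

fig = plt.figure()
ax = plt.axes()
r = sol[0]
theta = sol[1]
# Convert from polar to cartesian
x1 = r * np.cos(theta)
y1 = r * np.sin(theta)
ax.plot(x1, y1)
ax.set_aspect('equal')  # equal axis scaling so the shape of the orbit is not distorted
ax.set_title('Plot of Newtonian orbit')
plt.show()
```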
As can be seen, the orbit is an ellipse, and we have arrived at this result using Lagrangian mechanics!
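The right-hand sides used in `newtonian_d_dt` can be cross-checked against the Euler-Lagrange equations symbolically. The sketch below is a hedged reconstruction: it assumes the Lagrangian $L = \frac12 m(\dot r^2 + r^2\dot\theta^2) + \frac{GMm}{r}$ for a planet of mass $m$ orbiting a fixed Sun of mass $M$, and uses SymPy's `euler_equations` helper. The symbol names are chosen for illustration.

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

t = sp.symbols("t")
G, M, m = sp.symbols("G M m", positive=True)
r = sp.Function("r")(t)
theta = sp.Function("theta")(t)

# Assumed Lagrangian for a planet of mass m around a fixed Sun of mass M
K = sp.Rational(1, 2) * m * (r.diff(t) ** 2 + r ** 2 * theta.diff(t) ** 2)
U = -G * M * m / r
L = K - U

# One Euler-Lagrange equation per coordinate, in the order given
r_eq, theta_eq = euler_equations(L, [r, theta], t)

# Solve each equation for the second derivative it contains
r_ddot = sp.solve(r_eq, r.diff(t, 2))[0]
theta_ddot = sp.solve(theta_eq, theta.diff(t, 2))[0]
print(sp.simplify(r_ddot))
print(sp.simplify(theta_ddot))
```

The two printed expressions should match `du_dt` and `dv_dt` in the numerical code, with `u` standing for $\dot r$ and `v` for $\dot\theta$. Note that the planet's mass $m$ cancels out of both equations.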
Using Lagrangians to solve the double pendulum problem
We will now tackle a problem that would be very difficult to solve using Newton’s laws, but much easier with Lagrangian mechanics. Here we have a system as follows:
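Here, the notable difference is that we have a system as opposed to a single object, and we need to find the kinetic and potential energies of the entire system. To do this, we divide the kinetic and potential energies into two parts:

$$K = K_1 + K_2, \qquad U = U_1 + U_2$$

Where $K_1$ and $K_2$ are respectively the kinetic energies of the first pendulum mass and second pendulum mass, and likewise with $U_1$ and $U_2$ and their potential energies.

We will first derive the kinetic energies, because they are harder :( As we know, we first set up a coordinate system where the point $(0,0)$ is centered on the point where the double pendulum is attached to the ceiling. Writing $\ell_1, \ell_2$ for the lengths of the two rods, $m_1, m_2$ for the two masses, and $\theta_1(t), \theta_2(t)$ for the angles each rod makes with the vertical, we write the position functions of the first pendulum:

$$x_1(t) = \ell_1\sin\theta_1, \qquad y_1(t) = -\ell_1\cos\theta_1$$

We figure these out from basic trigonometry and the fact that $y_1(t)$ is negative, as it is below the origin. We then take the derivatives to find the $x$ and $y$ components of the velocity:

$$\dot x_1 = \ell_1\cos\theta_1\,\dot\theta_1, \qquad \dot y_1 = \ell_1\sin\theta_1\,\dot\theta_1$$

which gives $K_1 = \frac12 m_1(\dot x_1^2 + \dot y_1^2) = \frac12 m_1\ell_1^2\dot\theta_1^2$. The position functions of the second pendulum are:

$$x_2(t) = x_1(t) + \ell_2\sin\theta_2, \qquad y_2(t) = y_1(t) - \ell_2\cos\theta_2$$

Here, we add the $x$ and $y$ displacement of the second pendulum to the $x$ and $y$ displacement of the first to find the total displacement from the origin, because remember, we’re using the same coordinate system for both pendulums. If we sub in the values of $x_1(t)$ and $y_1(t)$, we have:

$$x_2(t) = \ell_1\sin\theta_1 + \ell_2\sin\theta_2, \qquad y_2(t) = -\ell_1\cos\theta_1 - \ell_2\cos\theta_2$$

Differentiating and substituting into $K_2 = \frac12 m_2(\dot x_2^2 + \dot y_2^2)$ gives:

$$K_2 = \frac12 m_2\left[\ell_1^2\dot\theta_1^2 + \ell_2^2\dot\theta_2^2 + 2\ell_1\ell_2\dot\theta_1\dot\theta_2\cos(\theta_1 - \theta_2)\right]$$

We use a similar approach for the potential energies - we add the potential energy of the first pendulum and the second to find the total system’s potential energy:

$$U = U_1 + U_2 = m_1gy_1 + m_2gy_2 = -(m_1 + m_2)g\ell_1\cos\theta_1 - m_2g\ell_2\cos\theta_2$$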
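The algebra for the double pendulum is tedious by hand, so it can help to let a computer algebra system carry it out. The sketch below is one possible way to do this with SymPy, under assumed notation: masses `m1`, `m2`, rod lengths `l1`, `l2`, and angles `theta1(t)`, `theta2(t)` measured from the vertical. It builds $K_1, K_2, U_1, U_2$ from the position functions and then forms one Euler-Lagrange equation per angle.

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

t = sp.symbols("t")
# Assumed notation: masses m1, m2, rod lengths l1, l2, gravitational acceleration g
m1, m2, l1, l2, g = sp.symbols("m1 m2 l1 l2 g", positive=True)
th1 = sp.Function("theta1")(t)
th2 = sp.Function("theta2")(t)

# Positions measured from the pivot at (0, 0); y is negative below the ceiling
x1, y1 = l1 * sp.sin(th1), -l1 * sp.cos(th1)
x2, y2 = x1 + l2 * sp.sin(th2), y1 - l2 * sp.cos(th2)

# Kinetic and potential energy of each mass, then of the whole system
K1 = sp.Rational(1, 2) * m1 * (x1.diff(t) ** 2 + y1.diff(t) ** 2)
K2 = sp.Rational(1, 2) * m2 * (x2.diff(t) ** 2 + y2.diff(t) ** 2)
U1 = m1 * g * y1
U2 = m2 * g * y2
K = sp.simplify(K1 + K2)
U = U1 + U2
L = K - U

# One Euler-Lagrange equation per angle
equations_of_motion = euler_equations(L, [th1, th2], t)
print(K)
```

The two entries of `equations_of_motion` are coupled second-order equations in the two angles; in practice they would be solved for $\ddot\theta_1$ and $\ddot\theta_2$ and handed to a numerical integrator.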
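Let’s see how we can recover Newton’s 2nd law from the Euler-Lagrange equation. Remember that the equation (in the case of one-dimensional motion along the $x$ axis) is given by:

$$\frac{\partial L}{\partial x} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot x}\right) = 0$$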
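A short worked sketch of that recovery, assuming the standard one-dimensional Lagrangian $L = \frac12 m\dot x^2 - U(x)$ with a potential $U$ that depends only on position (this specific form is an assumption made for the illustration):

$$
\begin{aligned}
\frac{\partial L}{\partial x} &= -\frac{dU}{dx} = F, \\
\frac{d}{dt}\left(\frac{\partial L}{\partial \dot x}\right) &= \frac{d}{dt}\left(m\dot x\right) = m\ddot x, \\
\frac{\partial L}{\partial x} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot x}\right) = 0 \quad &\Rightarrow \quad F = m\ddot x.
\end{aligned}
$$

Here $F = -\frac{dU}{dx}$ is the force associated with the potential, so under these assumptions the Euler-Lagrange equation reduces to Newton’s 2nd law.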
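We can even use Lagrangian mechanics on simple problems and check that it matches with Newtonian mechanics. Let’s do our freefall example from earlier. With $K=\frac12 m\dot y^2$ and $U=mgy$, so that $L = \frac12 m\dot y^2 - mgy$, we use the Euler-Lagrange equations to find:

$$\frac{\partial L}{\partial y} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot y}\right) = -mg - m\ddot y = 0 \quad\Rightarrow\quad \ddot y = -g$$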
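The same freefall calculation can be reproduced symbolically, as a hedged sketch rather than a required step: SymPy's `euler_equations` forms the equation of motion from $L=\frac12 m\dot y^2 - mgy$, and `dsolve` integrates it. The integration constants it introduces stand for the initial height and initial velocity.

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

t = sp.symbols("t")
m, g = sp.symbols("m g", positive=True)
y = sp.Function("y")

# Freefall Lagrangian: L = K - U = (1/2) m ydot^2 - m g y
L = sp.Rational(1, 2) * m * y(t).diff(t) ** 2 - m * g * y(t)

# Euler-Lagrange equation, then its general solution
equation_of_motion = euler_equations(L, y(t), t)[0]
trajectory = sp.dsolve(equation_of_motion, y(t))
print(equation_of_motion)
print(trajectory)
```

The solution should be a parabola in $t$ with constant downward acceleration $g$, matching the result from Newtonian mechanics.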
Besides working with particles and their trajectories, we are also often interested in fields in physics, such as the electromagnetic or gravitational fields. To find the differential equations that describe these fields, we need an Euler-Lagrange equation for fields rather than particles.
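Let us consider a generic field $\varphi(\mathbf{r},t)$. For reasons that will be elaborated in more detail in the special and general relativity sections (we will give a rough outline for why in a note further down this section), it is conventional to group the space and time components in one vector $X$ which has four components, one of time and three of space, and where $c$ is the speed of light:

$$X = (ct, x, y, z)$$

The action of the field is then an integral over both space and time:

$$S = \int_\Omega \mathcal{L}(\varphi, \partial_\mu\varphi, X)\, d^4x$$

Where $\mathcal{L}$ is our field Lagrangian, $d^4x=dV\,dt$ is an infinitesimal portion of space and time, and $\Omega$ is the domain of all space and all times. We will not repeat the full derivation of the Euler-Lagrange equation, but the steps are very similar to what we have already seen with the single-object Lagrangian case. The result is the Euler-Lagrange equation for fields, which takes the form:

$$\frac{\partial\mathcal{L}}{\partial\varphi} - \partial_\mu\left(\frac{\partial\mathcal{L}}{\partial(\partial_\mu\varphi)}\right) = 0$$

where $\partial_\mu = \frac{\partial}{\partial X^\mu}$ is the derivative with respect to the $\mu$-th component of $X$, and the repeated index $\mu$ is summed over all four components.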
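To see the field version in action, here is a small sketch using a hypothetical example that is not taken from the text: a free scalar field in one space dimension, with the assumed Lagrangian density $\mathcal{L} = \frac{1}{2c^2}(\partial_t\varphi)^2 - \frac12(\partial_x\varphi)^2$. SymPy's `euler_equations` accepts several independent variables, so it can form the field equation directly.

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

t, x = sp.symbols("t x", real=True)
c = sp.symbols("c", positive=True)
phi = sp.Function("phi")(t, x)

# Hypothetical example: a free scalar field in one space dimension
lagrangian_density = (sp.Rational(1, 2) * phi.diff(t) ** 2 / c ** 2
                      - sp.Rational(1, 2) * phi.diff(x) ** 2)

# Field Euler-Lagrange equation with t and x as the independent variables
field_equation = euler_equations(lagrangian_density, phi, [t, x])[0]
print(field_equation)
```

For this assumed Lagrangian density the result should be equivalent to the wave equation $\frac{1}{c^2}\partial_t^2\varphi = \partial_x^2\varphi$, a partial differential equation for the field rather than an ordinary differential equation for a trajectory.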
The Lagrangian formulation of classical mechanics is so powerful precisely because it relies on a differential equation that can be generalized. Beyond classical mechanics, the Lagrangian isn’t always necessarily $L=K-U$, but the Euler-Lagrange equations still hold true, and so does the principle of stationary action. Thus, a theory - including those that involve fields - can be written as a Lagrangian, as the Euler-Lagrange equations yield the equations of motion for each theory, on which the rest of the theory is built! This is the reason behind learning Lagrangian mechanics.
We will end with one final thought - one of the most successful theories in all of physics, the Standard model of particle physics (which is a quantum field theory), is encapsulated in one compact Lagrangian:
And one of the most mathematically beautiful theories, in fact one we will see very soon, General Relativity (which is a classical field theory for gravity), is described in another compact Lagrangian: