5.3 Understanding the Procedure

All the different steps in the separation of variables procedure as described may seem totally arbitrary. This section tries to explain why the steps are not arbitrary, but really quite logical. Understanding this section does require a good understanding of vectors and linear algebra. Otherwise you may as well skip it.

5.3.1 An ordinary differential equation as a model

Partial differential equations are relatively difficult to understand. Therefore we will instead consider an ordinary differential equation, but for a vector unknown:

$$\ddot{\vec u} = A \vec u$$

Here $A$ is some given constant matrix. The initial conditions are:

$$\vec u(0) = \vec f, \qquad \dot{\vec u}(0) = \vec g$$

If you want to solve this problem, the trick is to write $\vec u$ in terms of the so-called eigenvectors of matrix $A$:

$$\vec u = u_1 \vec e_1 + u_2 \vec e_2 + \ldots$$

Here $u_1$, $u_2$, ... are numerical coefficients that will depend on time. Further $\vec e_1$, $\vec e_2$, ... are the eigenvectors of matrix $A$. By definition, these satisfy

$$A \vec e_1 = \lambda_1 \vec e_1, \qquad A \vec e_2 = \lambda_2 \vec e_2, \qquad \ldots$$

where $\lambda_1$, $\lambda_2$, ... are numbers called the eigenvalues of matrix $A$. If $A$ is not a defective matrix, a complete set of independent eigenvectors will exist. That then means that the solution of the problem can indeed be written as a combination of the eigenvectors. For simplicity, in this discussion it will be assumed that $A$ is not defective.

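Now if you substitute the expression for $\vec u$ into the ordinary differential equation

$$\ddot{\vec u} = A \vec u$$

you get

$$\ddot u_1 \vec e_1 + \ddot u_2 \vec e_2 + \ldots = \lambda_1 u_1 \vec e_1 + \lambda_2 u_2 \vec e_2 + \ldots$$

Here the dots in the left hand side indicate time derivatives. Also, in the right hand side, use was made of the fact that $A \vec e_i$ is the same as $\lambda_i \vec e_i$ for every value of $i$.

The above equation can only be true if the coefficient of each individual eigenvector is the same in the left hand side as in the right hand side:

$$\ddot u_1 = \lambda_1 u_1, \qquad \ddot u_2 = \lambda_2 u_2, \qquad \ldots$$

These are ordinary differential equations. You can solve these particular ones relatively easily. However, each solution $u_1$, $u_2$, ... will have two integration constants that still remain unknown. To get them, use the initial conditions

$$\vec u(0) = \vec f, \qquad \dot{\vec u}(0) = \vec g$$

where $\vec f$ and $\vec g$ are given vectors. You need to write these vectors also in terms of the eigenvectors,

$$\vec f = f_1 \vec e_1 + f_2 \vec e_2 + \ldots, \qquad \vec g = g_1 \vec e_1 + g_2 \vec e_2 + \ldots$$

Then you can see that what you need is

$$u_1(0) = f_1, \quad \dot u_1(0) = g_1, \qquad u_2(0) = f_2, \quad \dot u_2(0) = g_2, \qquad \ldots$$

That allows you to figure out the integration constants. So $u_1$, $u_2$, ... are now fully determined. And that means that the solution

$$\vec u = u_1 \vec e_1 + u_2 \vec e_2 + \ldots$$

is now fully determined. Just perform the summation at any time you want. So that is it.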
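As a worked instance of the last few steps, here is a sketch for a single coefficient. It assumes, which the text above does not state, that the eigenvalue in question is negative, so that it can be written as $\lambda_i = -\omega_i^2$ with $\omega_i > 0$. The symbols $\omega_i$, $C_i$, and $D_i$ are introduced here for illustration only; $f_i$ and $g_i$ denote the coefficients of the two initial-condition vectors along eigenvector number $i$.

```latex
% Equation for one coefficient, with lambda_i = -omega_i^2 (assumed negative):
\ddot u_i = -\omega_i^2 u_i
\quad\Longrightarrow\quad
u_i(t) = C_i \cos(\omega_i t) + D_i \sin(\omega_i t)

% The two integration constants follow from the initial conditions:
u_i(0) = C_i = f_i ,
\qquad
\dot u_i(0) = \omega_i D_i = g_i

% So this coefficient is fully determined:
u_i(t) = f_i \cos(\omega_i t) + \frac{g_i}{\omega_i} \sin(\omega_i t)
```

For a positive eigenvalue the cosine and sine would be replaced by exponentials (or hyperbolic functions), and for a zero eigenvalue by a linear function of time; the bookkeeping for the two constants stays the same.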

The entire process becomes much easier if the matrix is what is called symmetric. For one, you do not have to worry about the matrix being defective. Symmetric matrices are never defective. Also, you do not have to worry about the eigenvalues possibly being complex numbers. The eigenvalues of a symmetric matrix are always real numbers.

And finally, the eigenvectors of a symmetric matrix can always be chosen to be unit vectors that are mutually orthogonal. In other words, they are like the unit vectors $\hat\imath'$, $\hat\jmath'$, $\hat k'$, ... of a rotated Cartesian coordinate system.

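The orthogonality helps greatly when you are trying to write $\vec f$ and $\vec g$ in terms of the eigenvectors. For example, you need to write $\vec f$ in the form

$$\vec f = f_1 \vec e_1 + f_2 \vec e_2 + \ldots$$

If the eigenvectors $\vec e_1$, $\vec e_2$, ... are orthonormal, then $f_1$, $f_2$, ... can simply be found using dot products:

$$f_1 = \vec e_1 \cdot \vec f, \qquad f_2 = \vec e_2 \cdot \vec f, \qquad \ldots$$

Usually, however, you do not normalize the eigenvectors to length one. In that case, you can still write

$$\vec f = f_1 \vec e_1 + f_2 \vec e_2 + \ldots$$

but now you must find the coefficients as

$$f_1 = \frac{\vec e_1 \cdot \vec f}{\vec e_1 \cdot \vec e_1}, \qquad f_2 = \frac{\vec e_2 \cdot \vec f}{\vec e_2 \cdot \vec e_2}, \qquad \ldots$$

In short you must divide by the square length of the eigenvector. The values for $g_1$, $g_2$, ... can be found similarly.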
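The whole procedure can be sketched in a few lines of Python for a small symmetric matrix. Everything specific below is an illustrative assumption rather than something taken from the text: the 2x2 matrix, its hand-computed eigenvectors and eigenvalues, the initial vectors, and the function names `coefficient` and `solve`. Both eigenvalues of this example matrix are negative, so each coefficient is a combination of a cosine and a sine.

```python
import math

# Hypothetical symmetric example matrix (not from the text).
A = [[-2.0, 1.0],
     [1.0, -2.0]]

# Its eigenvectors, deliberately NOT normalized to length one,
# with the matching eigenvalues: A e_i = lambda_i e_i.
eigvecs = [[1.0, 1.0], [1.0, -1.0]]
eigvals = [-1.0, -3.0]

def dot(v, w):
    """Ordinary dot product of two vectors."""
    return sum(vi * wi for vi, wi in zip(v, w))

def coefficient(e, v):
    """Coefficient of eigenvector e in v: divide by the square length of e."""
    return dot(e, v) / dot(e, e)

def solve(f, g, t):
    """Return u(t) for u'' = A u with u(0) = f and u'(0) = g."""
    u = [0.0, 0.0]
    for e, lam in zip(eigvecs, eigvals):
        f_i = coefficient(e, f)
        g_i = coefficient(e, g)
        omega = math.sqrt(-lam)  # valid only because lam < 0 in this example
        u_i = f_i * math.cos(omega * t) + (g_i / omega) * math.sin(omega * t)
        # Add this eigenvector's contribution u_i(t) e_i to the sum.
        u = [uk + u_i * ek for uk, ek in zip(u, e)]
    return u

f = [3.0, 4.0]   # hypothetical initial position vector
g = [0.0, 1.0]   # hypothetical initial velocity vector
print(solve(f, g, 0.0))   # reproduces f at time zero
print(solve(f, g, 1.5))
```

Because the eigenvectors are orthogonal, no linear system has to be solved to find the coefficients; two dot products per eigenvector are enough.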

The next subsections will now show how all of the above carries over directly to the method of separation of variables for simple partial differential equations.

5.3.2 Vectors versus functions

The previous subsection showed how to solve an example ordinary differential equation for a vector unknown. The procedure had clear similarities to the separation of variables procedure that was used to solve the example partial differential equation in section 5.1.

However, in the ordinary differential equation, the unknown $\vec u$ was a vector at any given time $t$. In the partial differential equation, the unknown $u(x,t)$ was a function of $x$ at any given time. Also, the initial conditions for the ordinary differential equation were given vectors $\vec f$ and $\vec g$. For the partial differential equation, the initial conditions were given functions $f(x)$ and $g(x)$. The ordinary differential equation problem had eigenvectors $\vec e_n$. The partial differential equation problem had eigenfunctions $X_n(x)$.

The purpose of this subsection is to illustrate that it does not make that much of a difference. The differences between vectors and functions are not really as great as they may seem.

Let's start with a vector in two dimensions, like say the vector $\vec v = (3, 4)$. You can represent this vector graphically as a point in a plane, but you can also represent it as a 'spike function', as in the left-hand sketch below:

The first coefficient, $v_1$, is 3. That corresponds to a spike of height 3 when the subscript, call it $i$, is 1. The second coefficient is $v_2 = 4$, so there is a spike of height 4 at $i = 2$. Similarly, a three-dimensional vector $\vec v = (v_1, v_2, v_3)$ can be graphed as the three-spike function in the middle figure. If you keep adding more dimensions, going to the limit of infinite-dimensional space, the spike graph approaches a function $f(x)$ with a continuous coordinate $x$ instead of $i$.

Phrased differently, you can think of a function $f(x)$ as an infinite column vector of numbers, with the numbers being the successive values of $f(x)$. In this way, vectors become functions. And vector analysis turns into functional analysis.

5.3.3 The inner product

You are not going to do much with vectors without the dot product. The dot product makes it possible to find the length of a vector, by multiplying the vector by itself and taking the square root. The dot product is also used to check if two vectors are orthogonal: if their dot product is zero, they are orthogonal. In this subsection, the dot product is generalized to functions.

The usual dot product of two arbitrary vectors $\vec f$ and $\vec g$ can be found by multiplying components with the same index $i$ together and summing that:

$$\vec f \cdot \vec g = f_1 g_1 + f_2 g_2 + f_3 g_3$$

The below figure shows multiplied components using equal colors.

The three term sum above can be written more compactly as:

$$\vec f \cdot \vec g = \sum_{\text{all } i} f_i g_i$$

The $\sum$ is called the “summation symbol.”

The dot (or “inner”) product of functions is defined in exactly the same way as for vectors, by multiplying values at the same $x$ position together and summing. But since there are infinitely many $x$-values, the sum becomes an integral:

$$(f, g) = \int_{\text{all } x} f(x)\, g(x)\, dx \qquad (5.1)$$

It is conventional to put a comma between the functions instead of a dot like for vectors. Also, people like to enclose the functions inside parentheses. But the idea is the same, as illustrated in the figure below:

As an example, the ordinary differential equation model problem involved a given initial condition $\vec f$ for $\vec u$. To solve the problem, vector $\vec f$ had to be written in the form

$$\vec f = f_1 \vec e_1 + f_2 \vec e_2 + \ldots$$

Here the vectors $\vec e_n$ were the eigenvectors of the matrix $A$ in the problem. The coefficients $f_n$ could be found using dot products:

$$f_n = \frac{\vec e_n \cdot \vec f}{\vec e_n \cdot \vec e_n}$$

This can be done this way as long as the eigenvectors are orthogonal. The dot product between any two different eigenvectors must be zero. The eigenvectors were indeed orthogonal, because it was assumed that the matrix $A$ in the problem was symmetric.

Similarly, the partial differential equation problem of section 5.1 involved a given initial condition $f(x)$ for $u(x,t)$. To solve the problem, this initial condition had to be written in the form:

$$f(x) = f_1 X_1(x) + f_2 X_2(x) + \ldots$$

Here $X_n(x)$ were the so-called eigenfunctions found in the separation of variables procedure. The coefficients $f_n$ can be found using inner products

$$f_n = \frac{(X_n, f)}{(X_n, X_n)}$$

This can be done this way as long as the eigenfunctions are orthogonal. The inner product between any two different eigenfunctions must be zero. The next section explains why that is indeed the case.
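The inner-product recipe can be tried out numerically. The sketch below replaces the integral by a midpoint-rule sum, which is just a long dot product of sampled values times the spacing. Several things in it are assumptions made for illustration: the interval is taken to be from 0 to `ELL`, the eigenfunctions are taken to be the cosines `cos((2n - 1) pi x / (2 ELL))` (zero slope at the left end, zero value at the right end), and the function `f` is a made-up initial condition that satisfies those same end conditions.

```python
import math

ELL = 1.0   # assumed interval length, 0 <= x <= ELL
N = 2000    # number of midpoint-rule sample points

def inner(f, g):
    """Approximate (f, g), the integral of f(x) g(x) over [0, ELL]."""
    dx = ELL / N
    return sum(f((k + 0.5) * dx) * g((k + 0.5) * dx) for k in range(N)) * dx

def X(n):
    """Assumed eigenfunction number n, returned as a function of x."""
    return lambda x: math.cos((2 * n - 1) * math.pi * x / (2 * ELL))

def f(x):
    """Hypothetical initial condition: zero slope at 0, zero value at ELL."""
    return ELL**2 - x**2

# Different eigenfunctions are orthogonal; (X_n, X_n) is the "square length".
print(inner(X(1), X(2)), inner(X(3), X(3)))

# Coefficients f_n = (X_n, f) / (X_n, X_n) for the first 20 eigenfunctions.
coeffs = [inner(X(n), f) / inner(X(n), X(n)) for n in range(1, 21)]

def partial_sum(x):
    """Sum of the first 20 terms f_n X_n(x)."""
    return sum(c * X(n)(x) for n, c in enumerate(coeffs, start=1))

# The partial sum should be close to the original function.
print(f(0.3), partial_sum(0.3))
```

Dividing by `inner(X(n), X(n))` plays exactly the role of dividing by the square length of a non-normalized eigenvector.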

5.3.4 Matrices versus operators

This section compares the solution procedure for the ordinary differential equation

$$\ddot{\vec u} = A \vec u$$

to that for the partial differential equation

$$u_{tt} = L u, \qquad L = a^2 \frac{\partial^2}{\partial x^2}$$

You may wonder whether that makes any sense. A matrix is basically a table of numbers. The “linear operator” $L$ is shorthand for “take two derivatives and multiply the resulting function by the constant $a^2$.”

But the difference between matrices and operators is not as great as it seems. One way of defining a matrix $A$ is as a thing that, given a vector $\vec v$, can produce a different vector $\vec w$:

$$\vec v \;\longrightarrow\; \vec w = A \vec v$$

Similarly you can define an operator $L$ as a thing that, given a function $f(x)$, produces another function $g(x)$:

$$f(x) \;\longrightarrow\; g(x) = L f(x) = a^2 f''(x)$$

After all, taking derivatives of functions simply produces another function. And multiplying a function by a constant simply produces another function.

Since it was already seen that vectors and functions are closely related, so are matrices and operators.

Like matrices have eigenvectors, linear operators have eigenfunctions. In particular, section 5.1 found the appropriate eigenfunctions of the operator above to be

$$X_n(x) = \cos\left(\frac{(2n-1)\pi x}{2\ell}\right), \qquad n = 1, 2, 3, \ldots$$

(This also depended on the boundary conditions, but that point will be ignored for now.) You can check by differentiation that for these eigenfunctions

$$L X_n = \lambda_n X_n, \qquad \lambda_n = -a^2 \frac{(2n-1)^2 \pi^2}{4 \ell^2}$$

So they are indeed eigenfunctions of operator $L$.
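To make the link between operators and matrices concrete, you can sample a function on a grid and replace the second derivative by a second difference. The operator then acts on the vector of samples just like a (tridiagonal) matrix would. The following is a rough sketch only. The values of `a` and `ELL`, the grid size `M`, and the names `apply_L` and `relative_error` are illustrative assumptions, and the eigenfunctions are assumed to be the cosines `cos((2n - 1) pi x / (2 ELL))`.

```python
import math

ELL = 1.0     # assumed interval length
a = 2.0       # assumed value of the constant a in the operator
M = 200       # number of grid intervals (illustrative)
h = ELL / M
xs = [j * h for j in range(M + 1)]

def apply_L(v):
    """Discrete stand-in for a^2 d^2/dx^2 at the interior grid points.

    Equivalent to multiplying the sample vector v by a tridiagonal
    matrix with rows (1, -2, 1) scaled by a^2 / h^2.
    """
    return [a**2 * (v[j - 1] - 2.0 * v[j] + v[j + 1]) / h**2
            for j in range(1, M)]

def relative_error(n):
    """Largest mismatch between L X_n and lambda_n X_n, relative to |lambda_n|."""
    k = (2 * n - 1) * math.pi / (2 * ELL)
    lam = -(a * k) ** 2                      # expected eigenvalue
    X = [math.cos(k * x) for x in xs]        # sampled eigenfunction
    LX = apply_L(X)
    worst = max(abs(LX[j - 1] - lam * X[j]) for j in range(1, M))
    return worst / abs(lam)

# Small numbers here mean the sampled cosines behave like eigenvectors
# of the difference matrix, up to discretization error.
print(relative_error(1), relative_error(3))
```

The mismatch shrinks as the grid is refined, which is the sense in which the matrix picture approaches the operator picture.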

But, as the previous subsection pointed out, it was also assumed that these eigenfunctions are orthogonal. And that is not automatic. For a matrix the eigenvectors can be taken to be orthogonal if the matrix is symmetric. Similarly, for an operator the eigenfunctions can be taken to be orthogonal if the operator is symmetric.

But how do you check that for an operator? For a matrix, you simply write down the matrix as a table of numbers and check that the rows of the table are the same as the columns. You cannot do that with an operator. But there is another way. A matrix $A$ is also symmetric if for any two vectors $\vec v$ and $\vec w$,

$$\vec v \cdot (A \vec w) = (A \vec v) \cdot \vec w$$

In other words, symmetric matrices can be taken to the other side in a dot product. (In terms of linear algebra

$$\vec v \cdot (A \vec w) = \vec v^T A \vec w, \qquad (A \vec v) \cdot \vec w = (A \vec v)^T \vec w = \vec v^T A^T \vec w$$

where the superscript $T$ indicates transpose. For the two expressions always to be the same requires $A = A^T$.)
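A quick numerical check of this property might look as follows. The two matrices and the two vectors are made-up examples, and `moves_across` is a hypothetical helper name. Note that a single pair of vectors can only disprove symmetry; the property must hold for every pair.

```python
def dot(v, w):
    """Ordinary dot product of two vectors."""
    return sum(vi * wi for vi, wi in zip(v, w))

def matvec(A, v):
    """Matrix-vector product, with A stored as a list of rows."""
    return [dot(row, v) for row in A]

def moves_across(A, v, w):
    """True if v . (A w) equals (A v) . w for this particular pair."""
    return abs(dot(v, matvec(A, w)) - dot(matvec(A, v), w)) < 1e-12

S = [[2.0, 1.0], [1.0, 3.0]]   # symmetric: rows equal columns
N = [[2.0, 1.0], [0.0, 3.0]]   # not symmetric
v, w = [1.0, 2.0], [3.0, -1.0]

print(moves_across(S, v, w), moves_across(N, v, w))   # prints: True False
```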

Symmetry for operators can be checked similarly by whether they can be taken to the other side in inner products involving any two functions $f$ and $g$:

$$(f, L g) = (L f, g)$$

To check that for the operator above, write out the first inner product:

$$(f, L g) = a^2 \int_0^\ell f(x)\, g''(x)\, dx$$

Now use integration by parts twice to get

$$(f, L g) = a^2 \int_0^\ell f''(x)\, g(x)\, dx = (L f, g)$$

So operator $L$ is symmetric and therefore it has orthogonal eigenfunctions. (It was assumed in the integrations by parts that the functions $f$ and $g$ satisfy the homogeneous boundary conditions at $x = 0$ and $x = \ell$ given in section 5.1. All functions of interest here must satisfy them.)
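For reference, the two integrations by parts can be written out as below. This sketch assumes the operator is $a^2$ times the second $x$-derivative on the interval from $0$ to $\ell$, and that the homogeneous boundary conditions of section 5.1 are zero slope at $x = 0$ and zero value at $x = \ell$. The latter is inferred from the discussion of boundary conditions in this section, not quoted from section 5.1.

```latex
% First integration by parts:
a^2 \int_0^\ell f g'' \, dx
  = a^2 \Big[ f g' \Big]_0^\ell - a^2 \int_0^\ell f' g' \, dx

% Second integration by parts:
  = a^2 \Big[ f g' - f' g \Big]_0^\ell + a^2 \int_0^\ell f'' g \, dx

% Boundary terms, under the assumed conditions
%   f'(0) = g'(0) = 0  and  f(\ell) = g(\ell) = 0 :
\Big[ f g' - f' g \Big]_{x=\ell} = 0 ,
\qquad
\Big[ f g' - f' g \Big]_{x=0} = 0

% Hence
(f, L g) = a^2 \int_0^\ell f'' g \, dx = (L f, g)
```

With other homogeneous boundary conditions the boundary terms would have to be rechecked, but the pattern of the argument is the same.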

5.3.5 Some limitations

Some limitations to the similarity between vectors and functions should be noted.

One difference is that the functions in partial differential equations must normally satisfy boundary conditions. The ones in the example problem were

$$u_x(0, t) = 0, \qquad u(\ell, t) = 0$$

Usually you do not have boundary conditions on vectors. But in principle you could create an analogue to the first boundary condition by demanding that the first component of vector $\vec u$ is the same as the second. An analogue to the second boundary condition would be that the very last component of vector $\vec u$ is zero.

As long as matrix $A$ respects these boundary conditions, there is no problem with that. In terms of linear algebra, you would be working in a subspace of the complete vector space: the subspace of vectors that satisfy the boundary conditions.

There is another problem with the analogy between vectors and functions. Consider the initial condition $\vec f$ for the solution $\vec u$ of the ordinary differential equation. You can give the components of $\vec f$ completely arbitrary values and you will still get a solution for $\vec u$.

But now consider the initial condition $f(x)$ for the solution $u(x,t)$ of the partial differential equation. If you simply give a random value to the function $f$ at every individual value of $x$, then the function will not be differentiable. The partial differential equation for such a function will then make no sense at all. For functions to be meaningful in the solution of a partial differential equation, they must have enough smoothness that derivatives make some sense.

Note that this does not mean that the initial condition cannot have some singularities, like kinks or jumps, say. Normally, you are OK if the initial conditions can be approximated in a meaningful sense by a sum of the eigenfunctions of the problem. Because the functions that can be approximated in this way exclude the extremely singular ones, a partial differential equation will always work in a subspace of all possible functions: a subspace of reasonably smooth functions. Often when you see partial differential equations in the literature, the subspace in which they apply is listed as well. That is beyond the scope of this book.