

MATHEMATICS OF RELATIVITY
LECTURE NOTES
BY

G. Y. RAINICH

(All Rights Reserved)


(Printed in U. S. A.)

EDWARDS BROTHERS, INC.
Lithoprinters and Publishers
ANN ARBOR, MICHIGAN

CONTENTS

Introduction

Chapter I. OLD PHYSICS
1. Motion of a Particle. The Inverse Square Law
2. Two Pictures of Matter
3. Vectors, Tensors, Operations
4. Maxwell's Equations
5. The Stress-Energy Tensor
6. General Equations of Motion. The Complete Tensor

Chapter II. NEW GEOMETRY
7. Analytic Geometry of Four Dimensions
8. Axioms of Four-Dimensional Geometry
9. Tensor Analysis
10. Complications Resulting From Imaginary Coordinate
11. Are the Equations of Physics Invariant
12. Curves in the New Geometry

Chapter III. SPECIAL RELATIVITY
13. Equations of Motion
14. Lorentz Transformations
15. Addition of Velocities
16. Light Corpuscles, or Photons
17. Electricity and Magnetism in Special Relativity

Chapter IV. CURVED SPACE
18. Curvature of Curves and Surfaces
19. Generalizations
20. The Riemann Tensor
21. Vectors in General Coordinates
22. Tensors in General Coordinates
23. Covariant and Contravariant Components
24. Physical Coordinates as General Coordinates
25. Curvilinear Coordinates in Curved Space
26. New Derivation of the Riemann Tensor
27. Differential Relations for the Riemann Tensor
28. Geodesics

Chapter V. GENERAL RELATIVITY
29. The Law of Geodesics
30. Solar System, Symmetry Conditions
31. Solution of the Field Equations
32. Equations of Geodesics
33. Newtonian Motion of a Planet
34. Relativity Motion of a Planet
35. Deflection of Light
36. Shift of Spectral Lines

INTRODUCTION
Since we are going to deal with applied
Mathematics, or Mathematics applied to Physics,
we have to state in the beginning the general
point of view we take on that subject. A mathematical theory consists of statements or propositions, some of which are written as formulas.
Some of these propositions are proved, that
means deduced from others, and some are not;
the latter are called definitions and axioms.
Furthermore most mathematical theories, in particular those in which we are interested, deal
with quantities, so that the propositions take
the form of relations between quantities.
Physical experiments also deal with quantities which are measured in definite prescribed
ways, and then empirical relations are established between these measured quantities.
In an application of Mathematics to Physics a correspondence is established between some
mathematical quantities and some physical quantities in such a way that the same relationship
exists (as a result of the mathematical theory)
between mathematical quantities as the experimentally established relation between the corresponding physical quantities. This view is
not new, it was emphatically formulated by H.
Hertz in the introduction to his Mechanics, and
then emphasized again by A. S. Eddington in application to Relativity. The process of establishing the correspondence between the physical
and the mathematical quantities we shall, following Eddington, call identification. An identification is successful, if the condition mentioned above is fulfilled, viz., if the relations deduced for the mathematical quantities
are experimentally proved to exist between the
physical quantities with which they have been
identified. From this point of view we do not
speak of true or false theories, still less of
absolute truth, etc.; truth for us is nothing
but a successful identification, and it is necessary to say expressly that there may exist at
the same time two successful identifications,
two theories, each of which may be applied within experimental errors to the known experimental
results; and that there may be times when no
such theory has been found; and also that an
identification which is successful at one time
may cease to be so later, when the experimental
precision will be increased.
Very often it happens that the quantities
of a theory are compared not with quantities which
are direct results of experiment but with quantities of another, less comprehensive, theory
whose identification with experimental quantities has proved successful.
In fact, it seldom happens that we have to deal with direct
results of experiment, since even an experimental paper usually contains a great amount of
theory.
We may consider Geometry as a first attempt
at a study of the outside world. It may be considered as a deductive system which reflects
(in the sense explained above, that is of the
existence of a correspondence, etc.) very well
our experiences with some features of the outside world, namely features connected with the
displacements of what we call rigid bodies. We
see at once how much is left out in such a study;
in the first place, time is almost entirely
left out: in trying to bring into coincidence
two triangles we are not interested in whether
we move one slowly or rapidly; in describing a
circle we are not concerned with uniformity of
motion. Another important feature that is left
out is the distinction between vertical and
horizontal lines although we know that this
distinction is a very real one. Then optical
and electromagnetic phenomena are left out. We
see thus that Geometry is not a complete theory
of the outside world; one method of building a
complete theory would be to introduce corrections into geometry, to introduce one by one
time, gravitation, ether, electricity, to patch
it up every time we discover a hole; this is a
roughly correct but disrespectful description
of the actual development of Mathematical Physics.
Another method would be to scrap Geometry,
and to build instead a new theory which would be
an organic whole, embracing the displacement
phenomena as a (very important, to be sure) special case. The purpose of the following discussion is to exhibit such a theory. Before we
come to the systematic exposition we want to
say something about the plan we are going to
follow.
It is possible to start with a complete statement of the general theory, and then
show how special features may be obtained by
specializations and approximations; instead, on
didactical grounds, we shall begin with special
cases and work up by modifications (these being
counterparts of approximations) and generalizations.
The essential difference between this
procedure and the development of classical Physics is that in the latter Geometry was considered as a fixed basis, not to be affected by
the upper structure, and we shall feel free to
modify geometry when necessary.

Chapter I.

OLD PHYSICS

The purpose of this chapter is to reformulate some of the fundamental equations of mechanics and electrodynamics, and to write them in
a new form which constitutes an appropriate basis
for the discussion that follows. The contents
of the chapter are classical; the modifications which are characteristic of the Relativity theory have not been introduced, but, as
was mentioned, the form is decidedly new.

1. Motion of a Particle. The Inverse Square Law.


The fundamental equations of Mechanics of
a particle are usually written in the form

$$ m\,\frac{d^2x}{dt^2} = X, \qquad m\,\frac{d^2y}{dt^2} = Y, \qquad m\,\frac{d^2z}{dt^2} = Z. \tag{1.1} $$


Here m denotes the mass of the particle, x,y,z
are functions of the time t whose values are the
coordinates of the particle at the corresponding time, and X,Y,Z are functions of the coordinates whose values are the components of the
force at the corresponding point. This system
of equations was the first example of what we
may call mathematical physics, and much that is
now mathematical physics may be conveniently
considered as a result of a development whose
germ is the system 1.1. This chapter will be
devoted to tracing out some lines of this development.
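We begin by writing the equations 1.1 in
the form

$$ \frac{d(mu)}{dt} = X, \qquad \frac{d(mv)}{dt} = Y, \qquad \frac{d(mw)}{dt} = Z, \tag{1.11} $$

where

$$ u = \frac{dx}{dt}, \qquad v = \frac{dy}{dt}, \qquad w = \frac{dz}{dt} \tag{1.2} $$

are the velocity components.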


The quantities mu, mv, mw
are called the momentum components, and in this
form our fundamental equations express the
statement that the time rate of change of the
momentum is equal to the force, the original
statement of Newton. Equations 1.11 are seen
to be equivalent to 1.1 if we use the notations
1.2 and the fact, usually tacitly assumed, that
the mass of a particle does not change with time
or in symbols

1.12

dm/dt = 0.

In the equations 1.1, x,y,z are usually unknown functions of the time and X,Y,Z are given
functions of the coordinates. The situation is
then this: first the field has to be described
by giving the forces X,Y,Z, and then the motion
in the given field is determined by solving
equations 1.1 (with some additional initial conditions).
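The two steps just described (give the field, then solve the equations of motion) can be illustrated numerically. The following is only a sketch under stated assumptions: the function names, the step size, and the choice of the velocity Verlet scheme are illustrative and are not taken from the text; the field used is the inverse square field of formula 1.3.

```python
import math

def inverse_square_force(x, y, z, c):
    """Force components X, Y, Z of an inverse square field centered at the origin."""
    r = math.sqrt(x * x + y * y + z * z)
    return c * x / r**3, c * y / r**3, c * z / r**3

def integrate_motion(m, c, position, velocity, dt, steps):
    """Integrate d(mu)/dt = X, d(mv)/dt = Y, d(mw)/dt = Z with constant mass m.

    Uses the velocity Verlet scheme (an assumption of this sketch, chosen
    because it keeps bound orbits stable over many steps).
    """
    x, y, z = position
    u, v, w = velocity
    fx, fy, fz = inverse_square_force(x, y, z, c)
    for _ in range(steps):
        # Half step for the velocity using the current force.
        u += 0.5 * dt * fx / m
        v += 0.5 * dt * fy / m
        w += 0.5 * dt * fz / m
        # Full step for the position.
        x += dt * u
        y += dt * v
        z += dt * w
        # Force at the new position, then the second half step for the velocity.
        fx, fy, fz = inverse_square_force(x, y, z, c)
        u += 0.5 * dt * fx / m
        v += 0.5 * dt * fy / m
        w += 0.5 * dt * fz / m
    return (x, y, z), (u, v, w)

# Hypothetical initial conditions: attraction (c < 0), unit mass, and the
# speed sqrt(|c| / (m r)) that gives a circular orbit of radius r = 1.
final_position, final_velocity = integrate_motion(
    m=1.0, c=-1.0, position=(1.0, 0.0, 0.0), velocity=(0.0, 1.0, 0.0),
    dt=0.001, steps=6283,
)
final_radius = math.sqrt(sum(p * p for p in final_position))
```

With these initial conditions the particle should stay close to the circle of radius 1, which gives a quick check that the field and the equations of motion have been coded consistently.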
We shall first discuss fields of a certain
simple type. One of the simplest fields of
force is the so-called inverse square field.
The field has a center, which is a singularity
of the field; in it the field is not determined; in every other point of the field the force
is directed toward the center (or away from it)
and the magnitude of the force is inversely
proportional to the square of the distance from the center.
As the most common realization of such a field
of force we may consider the gravitational field
of a mass particle or of a sphere. If Cartesian coordinates with the origin at the center
are introduced the force components are

$$ X = \frac{cx}{r^3}, \qquad Y = \frac{cy}{r^3}, \qquad Z = \frac{cz}{r^3}, \tag{1.3} $$

where c is a coefficient of proportionality,
negative when we have attraction and positive
in the case of repulsion, and

$$ r^2 = x^2 + y^2 + z^2; \tag{1.4} $$

taking the sum of squares of X,Y,Z we easily
find $c^2/r^4$, so that the magnitude of the force
is $|c|/r^2$; the force is inversely proportional to the
square of the distance. If the field is produced by several attracting particles the force
at every point (outside of the points where the
particles are located) is considered to be given by the sum of the forces due to the separate
particles. In this case the expressions become
quite complicated and it is easier to study the
general properties of such fields by using certain differential equations to which the force components are subjected rather than by studying the explicit expressions.
These differential equations are as follows:

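$$ \frac{\partial X}{\partial x} + \frac{\partial Y}{\partial y} + \frac{\partial Z}{\partial z} = 0, \tag{1.51} $$

$$ \frac{\partial Y}{\partial z} - \frac{\partial Z}{\partial y} = 0, \qquad \frac{\partial Z}{\partial x} - \frac{\partial X}{\partial z} = 0, \qquad \frac{\partial X}{\partial y} - \frac{\partial Y}{\partial x} = 0. \tag{1.52} $$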
The fact that the functions X,Y,Z given by
the formulas 1.3 satisfy these equations may be
proved by direct substitution; to facilitate
calculation we may notice that differentiation
of 1.4 gives

$$ r\,\frac{\partial r}{\partial x} = x, \qquad r\,\frac{\partial r}{\partial y} = y, \qquad r\,\frac{\partial r}{\partial z} = z. \tag{1.6} $$


Differentiating the first of 1.3, we have now

$$ \frac{\partial X}{\partial x} = \frac{c}{r^3} - \frac{3cx}{r^4}\cdot\frac{\partial r}{\partial x} = \frac{c}{r^3} - \frac{3cx^2}{r^5}. $$


Substituting this and two analogous expressions
into 1.51 we easily verify it. The verification
of 1.52 hardly presents any difficulty.
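The verification by substitution can also be checked numerically. The sketch below, which is an illustration and not part of the original argument, estimates the partial derivatives by central differences and evaluates the left hand sides of 1.51 and 1.52 for the inverse square field, and also for the sum of two such fields with different centers. All function names, the step h, and the sample points are assumptions of the sketch.

```python
import math

def force(x, y, z, c=-1.0, center=(0.0, 0.0, 0.0)):
    """Inverse square field of strength c whose center is at `center`."""
    dx, dy, dz = x - center[0], y - center[1], z - center[2]
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    return (c * dx / r**3, c * dy / r**3, c * dz / r**3)

def partial(field, component, axis, point, h=1e-4):
    """Central-difference estimate of d(field[component]) / d(coordinate[axis])."""
    plus, minus = list(point), list(point)
    plus[axis] += h
    minus[axis] -= h
    return (field(*plus)[component] - field(*minus)[component]) / (2.0 * h)

def divergence(field, point):
    """Left hand side of 1.51: dX/dx + dY/dy + dZ/dz."""
    return sum(partial(field, i, i, point) for i in range(3))

def symmetry_conditions(field, point):
    """The three left hand sides of 1.52."""
    return (
        partial(field, 1, 2, point) - partial(field, 2, 1, point),  # dY/dz - dZ/dy
        partial(field, 2, 0, point) - partial(field, 0, 2, point),  # dZ/dx - dX/dz
        partial(field, 0, 1, point) - partial(field, 1, 0, point),  # dX/dy - dY/dx
    )

def two_particle_field(x, y, z):
    """Sum of two inverse square fields (hypothetical strengths and centers)."""
    a = force(x, y, z, c=-1.0, center=(0.0, 0.0, 0.0))
    b = force(x, y, z, c=-2.5, center=(1.0, 0.5, -0.3))
    return tuple(p + q for p, q in zip(a, b))

sample_point = (0.7, -0.4, 1.2)
single_divergence = divergence(force, sample_point)
combined_divergence = divergence(two_particle_field, sample_point)
```

Both divergences and all three expressions of 1.52 should come out as zero up to the error of the finite differences, at any point that is not the location of a particle.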
It is known that equations 1.52 give a necessary and sufficient condition for the existence of a function $\varphi$ of which X,Y,Z are partial
derivatives. The derivative $\partial X/\partial x$ is then the
second derivative of this function, and the system 1.51-1.52 may be replaced by the equivalent
system

$$ X = \frac{\partial\varphi}{\partial x}, \qquad Y = \frac{\partial\varphi}{\partial y}, \qquad Z = \frac{\partial\varphi}{\partial z}, \tag{1.53} $$

$$ \frac{\partial^2\varphi}{\partial x^2} + \frac{\partial^2\varphi}{\partial y^2} + \frac{\partial^2\varphi}{\partial z^2} = 0. \tag{1.54} $$

The last equation is known as the equation
of Laplace.
In the particular case where X,Y,Z are given by the formulas 1.3 a function $\varphi$ of which X,
Y,Z are partial derivatives is (as it is easy
to verify)

$$ \varphi = -\frac{c}{r}. $$

We may say now that the field of force given by 1.3 satisfies the differential equations
1.51, 1.52 or 1.53, 1.54 which express the same
thing. It is easy to show that these equations
are satisfied not only by the field produced by
one particle at the origin, but also by that due
to any number of particles: first we notice
that if a particle is not at the origin this
results only in additive constants in the coordinates, and so does not affect partial derivatives which appear in the equations 1.51-1.52
which therefore remain true in this case;
secondly, these equations are linear and homogeneous, as a consequence of which the sum of two
solutions of these equations necessarily is a
solution, or if two fields satisfy these equations their sum also satisfies them; if then, as
is generally assumed, the field produced by several particles is the sum of the fields due to
the individual particles, such a field also satisfies equations 1.51, 1.52.
Conversely, it can be proved that any field
satisfying the differential equations 1.51-1.52
may be produced by a - finite or infinite - set
of particles each of which acts according to the
inverse square law. We shall not prove here
this fact (the proof is given in Potential Theory), but we shall show that these equations
furnish us back the inverse square law, if we
add the condition that the field must be symmetric with respect to one point.
A situation of this character, a situation
where we have to solve a system of partial differential equations with the "additional condition" of symmetry, will appear again later, and
in order to be clear about its significance
then, it is desirable to treat here this special case in detail.
To begin with we take the system 1.51, 1.52
in the form 1.53, 1.54, i.e., we state that X,
Y,Z are derivatives of a function $\varphi$. Then from
the condition of symmetry of the X,Y,Z field it
follows that the field represented by the function $\varphi$ also must be symmetric, i.e., that this
function may depend only on the distance from
the origin, because two points which are equidistant from the origin are symmetric with respect to it and, therefore, $\varphi$ must have the
same value in two such points.
We have thus to solve equation 1.54 with
the additional condition that $\varphi$ depends on x,
y,z only through r. Indicating differentiation
with respect to r by ' we have

$$ \frac{\partial\varphi}{\partial x} = \varphi'\cdot\frac{\partial r}{\partial x}, \quad \text{etc.,} \tag{1.56} $$

and

$$ \frac{\partial^2\varphi}{\partial x^2} = \varphi''\cdot\left(\frac{\partial r}{\partial x}\right)^2 + \varphi'\cdot\frac{\partial^2 r}{\partial x^2}, \quad \text{etc.} \tag{1.57} $$

Squaring each of the formulas 1.6 and taking
the sum we have

$$ r^2\left[\left(\frac{\partial r}{\partial x}\right)^2 + \left(\frac{\partial r}{\partial y}\right)^2 + \left(\frac{\partial r}{\partial z}\right)^2\right] = r^2 $$

or

$$ \left(\frac{\partial r}{\partial x}\right)^2 + \left(\frac{\partial r}{\partial y}\right)^2 + \left(\frac{\partial r}{\partial z}\right)^2 = 1; $$

on the other hand differentiating each formula
1.6 we have

$$ \left(\frac{\partial r}{\partial x}\right)^2 + r\,\frac{\partial^2 r}{\partial x^2} = 1, \quad \text{etc.;} $$

summing these we obtain

$$ r\left(\frac{\partial^2 r}{\partial x^2} + \frac{\partial^2 r}{\partial y^2} + \frac{\partial^2 r}{\partial z^2}\right) = 2. $$

Using now 1.57 we can give the equation 1.54
the form

$$ \varphi'' + \frac{2\varphi'}{r} = 0 $$

or

$$ \frac{\varphi''}{\varphi'} = -\frac{2}{r}, $$

whence, integrating,

$$ \varphi' = \frac{c}{r^2}, $$

where c is a constant of integration, which substituted in 1.56 together with
1.6 gives 1.3. We see thus that
the general equations 1.51-1.52 give us all the
general information we need about the fields of
force in question; we shall call this system
the system of equations of a Newtonian field or
simply the Newtonian system, although Newton
never considered the differential equations
that make it up.
We may comment briefly on the mathematical
character of the magnitudes and equations we
have been dealing with in this section. At every point X,Y,Z may be considered as the components of a vector, the force vector; we have
thus a vector at every point of space, and this
constitutes what is called a vector field. The
function $\varphi$ is an example of a scalar field.
These two fields are in the particular relation
that the first is derived from the second by
differentiation. The vector field satisfies
equation 1.51, the left hand side of which is
called the divergence of the vector field. To find
the divergence of a vector field we take the sum
of the derivatives of the components of the vector with respect to the corresponding coordinates, or we differentiate each component with
respect to the corresponding coordinate and add
up the results. The formula becomes more expressive if we number the coordinates by writing
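
$$ x = x_1, \qquad y = x_2, \qquad z = x_3 \tag{1.71} $$

and also the vector components, viz.,

$$ X = X_1, \qquad Y = X_2, \qquad Z = X_3. \tag{1.72} $$

The expression for the divergence may then be
written as

$$ \sum_{i=1}^{3} \frac{\partial X_i}{\partial x_i}. \tag{1.8} $$
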

The operation of forming a divergence is of


fundamental importance in what follows.
We abandon now for a while the study of
force fields and direct our attention to the
left hand sides of the equations of motion.

2. Two Pictures of Matter.

Our fundamental equations 1.1 connect matter as represented by the left hand sides with
forces as represented by the right hand sides.
There seems to be a fundamental difference in
the mathematical aspects of matter and force.
The quantities characterizing matter, the momentum components, for example, are functions
of one variable t, and are subjected to ordinary differential equations, whereas quantities
characterizing force, X,Y,Z are functions of
three variables x,y,z and, as a consequence, are
subjected to partial differential equations;
they are field quantities, whereas the matter
components are not; another way of saying this
is to say that force seems to be distributed
continuously through space but matter seems to
be connected with discrete points. This distinction is, however, not as essential as it
looks; it is merely the result of the point of
view we take. We could very well consider matter to be distributed continuously through
space; each of the two theories, the discrete
theory, according to which matter consists of
discrete particles, or material points, each of
which carries a finite mass, and the continuous
theory, according to which matter is distributed continuously through space, or certain portions of space, may be considered as the limiting case of the other. We may start with material points, then increase their number at the
same time decreasing the mass of each and so
approximate with any degree of precision a given continuous distribution; or we may start with
a continuous distribution, then make the density decrease everywhere except in the constantly decreasing neighborhoods of a discrete
number of points, and thus approximate, with
any precision, a given discrete distribution. It is
clear that there cannot be any question as to
which of the two theories is correct, since the
difference between the two can be made as small
as we please, and therefore the predictions
based on the two theories can be made to agree
as closely as we may wish, so that if one identification is successful within experimental
error the other will be likewise. Mathematically the difference will be largely that between ordinary differential equations, which
are used in treating the motion of discrete
particles, and partial differential equations,
which apply to continuous distributions.
We may remark here that although forces
are usually considered to be continuously distributed in space, it is possible to introduce
a discrete picture here also; this is being actually done sometimes in the electromagnetic
theory, when a field of force is represented by discrete lines of force, and the intensity is
characterized by the number of lines per square
inch; we shall not, however, have occasion to
use this picture.
We may also remark here that in the last
few years still a third point of view has appeared (in Quantum Theory) which in a way occupies an intermediate position; mathematically
the treatment is that used in the continuous
case (partial differential equations), but the
interpretation is given in terms of discrete
particles, the continuous quantities being considered as probabilities of a particle being
within a certain volume, and the like. This
point of view also will not be used in what follows, and is mentioned here only for the sake
of completeness.
We want now to translate the equations of
motion 1.11 into the language of the continuous
theory. Each point of space (or of a certain
portion of space) will be considered as occupied at each moment (or each moment during a
certain period) by a material particle. Here
we also denote by u,v,w the velocity components
of a particle of matter, but here they are
considered as functions of coordinates as well
as of time; by

$$ u(x,y,z,t), \qquad v(x,y,z,t), \qquad w(x,y,z,t) $$


we understand the velocity components of a particle which at the time t occupies the position
x,y,z. The fundamental quantity in this theory
corresponding to mass of the discrete theory is
density. A particle does not possess any finite
mass, a mass corresponds only to a finite volume (at a given time). To a point (at a given
time) we assign a density which may be explained as the limit of the mass of a sphere with
the center at the given point divided by the
volume of that sphere as the radius of the sphere
tends toward zero. A better way of putting it
is to say that we consider a point function
$\rho(x,y,z,t)$ called the density and that the mass
of matter occupying a given volume at a given
time is the integral

$$ \int \rho(x,y,z,t)\,dx\,dy\,dz $$

extended over the given volume. This integral
will, in general, be a function of time; the
mass in a given volume changes with time because
new matter may be coming in and old matter going
out, and they do not exactly balance each other.
But if we consider a certain volume at a given
time, and then consider at other moments the
volume which is occupied by the same matter,
then the mass of matter in that new volume must
be the same. That means that if we consider x,
y,z as functions of t, namely, as the coordinates of the same particle of matter at different times, and if we consider the region of integration as a variable volume but one that is
occupied by the same particles of matter at all
times, then the integral must be independent of
time, or

$$ \frac{d}{dt}\int \rho(x,y,z,t)\,dx\,dy\,dz = 0. \tag{2.1} $$

This may be written also in a differential form
as

$$ \frac{\partial(\rho u)}{\partial x} + \frac{\partial(\rho v)}{\partial y} + \frac{\partial(\rho w)}{\partial z} + \frac{\partial\rho}{\partial t} = 0. \tag{2.2} $$

We indicate two proofs for this fact; first
an easy but not rigorous proof.
For an infinitesimal volume V = dx·dy·dz we
may consider density as the same in all points
of the volume, so that mass will be the product
$V\rho$ and the derivative of this product will be

$$ \frac{dV}{dt}\cdot\rho + V\cdot\frac{d\rho}{dt}. $$

Then again, considering V as a product dx·dy·dz
we find

$$ \frac{dV}{dt} = \frac{d(dx)}{dt}\cdot dy\cdot dz + dx\cdot\frac{d(dy)}{dt}\cdot dz + dx\cdot dy\cdot\frac{d(dz)}{dt}. $$

Now set $dx = x_2 - x_1$, where $x_1$ and $x_2$ are the coordinates of the two ends of the interval; then

$$ \frac{d(dx)}{dt} = \frac{d(x_2 - x_1)}{dt} = \frac{dx_2}{dt} - \frac{dx_1}{dt} = u_2 - u_1, $$

where $u_1$ and $u_2$ are the values of u at the two ends, so that to the first order $u_2 - u_1 = \partial u/\partial x\cdot dx$, and similarly for dy and dz;
substituting this in the preceding relation we
find

$$ \frac{dV}{dt} = V\left(\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} + \frac{\partial w}{\partial z}\right), $$

and since

$$ \frac{d\rho}{dt} = \frac{\partial\rho}{\partial x}\cdot\frac{dx}{dt} + \frac{\partial\rho}{\partial y}\cdot\frac{dy}{dt} + \frac{\partial\rho}{\partial z}\cdot\frac{dz}{dt} + \frac{\partial\rho}{\partial t} = \frac{\partial\rho}{\partial x}\cdot u + \frac{\partial\rho}{\partial y}\cdot v + \frac{\partial\rho}{\partial z}\cdot w + \frac{\partial\rho}{\partial t}, $$

the expression for the derivative of mass gives

$$ \frac{dm}{dt} = V\cdot\left(\frac{\partial(\rho u)}{\partial x} + \frac{\partial(\rho v)}{\partial y} + \frac{\partial(\rho w)}{\partial z} + \frac{\partial\rho}{\partial t}\right) $$

so that constancy of mass is expressed by the
condition 2.2.
A rigorous proof would be based on expressing the integral for the moment t', which may be
written as $\int \rho(x',y',z',t')\,dx'\,dy'\,dz'$, using as
variables of integration the coordinates of the
corresponding particles at the moment t. The
formulas of transformation would be

$$ x' = f(x,y,z,t'), \qquad y' = g(x,y,z,t'), \qquad z' = h(x,y,z,t'), \tag{2.3} $$

where f(x,y,z,t') is the x coordinate at the
moment t' of the particle which at the moment
t was at x,y,z, etc. Using the formulas of
transformation of a multiple integral we would
obtain

$$ \int \rho(f,g,h,t')\cdot J\cdot dx\,dy\,dz $$

where J is the Jacobian of the functions 2.3
and the integration is over the volume occupied
at the moment t. Setting the derivative of
this integral with respect to t' equal to zero,
and then making t' = t and noticing that $\partial f/\partial t'$
for t' = t is the velocity component u, etc.,
we would find the same equation 2.2.
This equation is called the "continuity
equation of matter" or the "equation of conservation of matter." The corresponding equation
in the discrete theory is the equation (1.12)
dm/dt = 0 which is not usually included among
the fundamental equations of mechanics. The
continuity equation may be written in a very
simple form if we use the index notations for
the coordinates introduced before (1.71), introduce analogous notations for the velocity components, viz.,

$$ u = u_1, \qquad v = u_2, \qquad w = u_3, \tag{2.4} $$

and in addition write

$$ t = x_4 \tag{2.5} $$

and agree, when it is convenient, to write $u_4$
for unity so that

$$ 1 = u_4. \tag{2.41} $$

With these notations the continuity equation
becomes

$$ \sum_{i=1}^{4} \frac{\partial(\rho u_i)}{\partial x_i} = 0. \tag{2.21} $$

Noticing the analogy of this equation with equation 1.8 we are tempted to say that the continuity equation expresses the fact that the
"divergence" of the "vector" of components $\rho u_i$
is zero. This involves, of course, a generalization of the conceptions divergence and vector,
because the summation here goes from one to 4
instead of to 3 as in the above formula. This
generalization will be of extreme importance in
what follows. In the meantime we may notice
that the divergence of the vector $\rho u_i$ plays the
same role in the continuous theory as the time
derivative of the number m played in the discrete theory.
We now continue the translation of the
equations of the discrete theory into the continuous language. The equations of motion express the fact that the time derivatives of the
momentum components are equal to the force components; limiting our consideration to the left
hand sides of the equations we have therefore
to consider the time derivatives of the momentum components; in the first place the time derivative of mu; without repeating the reasoning
which led us to the continuity equation of matter, and noticing that the only change consists
in replacing m by mu we find that the time
rate of change of the first component of the momentum vector will be here

$$ \frac{\partial(\rho uu)}{\partial x} + \frac{\partial(\rho uv)}{\partial y} + \frac{\partial(\rho uw)}{\partial z} + \frac{\partial(\rho u)}{\partial t} \tag{2.61} $$

and the analogous expressions for the other components will be

$$ \frac{\partial(\rho vu)}{\partial x} + \frac{\partial(\rho vv)}{\partial y} + \frac{\partial(\rho vw)}{\partial z} + \frac{\partial(\rho v)}{\partial t} \tag{2.62} $$

$$ \frac{\partial(\rho wu)}{\partial x} + \frac{\partial(\rho wv)}{\partial y} + \frac{\partial(\rho ww)}{\partial z} + \frac{\partial(\rho w)}{\partial t}. \tag{2.63} $$

These expressions will have to be set equal to
the force components (or, rather, components of
the force density) in order to obtain the equations of motion. Such equations have been obtained by Euler for the motion of a fluid and are referred to as Euler's hydrodynamic equations; but
at present we are not so much interested in the
equations of motion as in the mathematical structure of the expressions involving matter components that we have written down. An attentive
inspection will help to discover a far-reaching
symmetry which again finds its best expression
if we use the index notations introduced above
(1.71, 2.4, 2.5). We may write, in fact, for
the last three expressions

$$ \sum_{i=1}^{4} \frac{\partial(\rho u_j u_i)}{\partial x_i}, \qquad j = 1,2,3, \tag{2.6} $$

and we note furthermore, that if we let j here
take the value 4 we obtain the expression appearing on the left hand side of the equation
of continuity (2.21). We come thus to the idea
of considering the quantities

$$ M_{ij} = \rho u_i u_j \tag{2.7} $$

and we see that the expression

$$ \sum_{i=1}^{4} \frac{\partial M_{ij}}{\partial x_i}, \qquad j = 1,2,3,4 \tag{2.65} $$

plays a very important part in our theory. The
first three components, i.e., the expressions
obtained for j = 1,2,3, give the time rate of
change of the momentum components, and the last
one, obtained by setting j = 4, gives the expression whose vanishing expresses conservation
of mass. The expressions 2.65 appear as a generalization of what we call a divergence, and
we shall call it divergence also, but it is
clear that the whole structure of our expressions deserves a closer study to which we shall
devote our next section.
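The continuity equation in its four-index form (2.21) lends itself to a simple numerical check. The sketch below is an illustration only: the flow chosen (a uniform stretching along x, with density falling as 1/(1+t)) is a hypothetical example not taken from the text, and the derivatives are estimated by central differences. The velocity carries a fourth component equal to unity and the time plays the part of the fourth coordinate, following 2.5 and 2.41.

```python
def density(x, y, z, t, rho0=2.0):
    """Hypothetical density of a medium stretching uniformly along x."""
    return rho0 / (1.0 + t)

def velocity(x, y, z, t):
    """Velocity components u_1, u_2, u_3, u_4 with u_4 = 1.

    The particle paths of this assumed flow are x = x0 * (1 + t).
    """
    return (x / (1.0 + t), 0.0, 0.0, 1.0)

def four_divergence(rho, vel, point, h=1e-5):
    """Central-difference estimate of the sum over i of d(rho * u_i) / d(x_i).

    `point` is (x, y, z, t); the time is treated as the fourth coordinate.
    """
    total = 0.0
    for i in range(4):
        plus, minus = list(point), list(point)
        plus[i] += h
        minus[i] -= h
        flux_plus = rho(*plus) * vel(*plus)[i]
        flux_minus = rho(*minus) * vel(*minus)[i]
        total += (flux_plus - flux_minus) / (2.0 * h)
    return total

event = (0.3, -1.0, 2.0, 0.5)

# For the matching density the four-divergence should vanish.
conserved = four_divergence(density, velocity, event)

# A constant density does not go with this velocity field, so the
# four-divergence is then different from zero.
not_conserved = four_divergence(lambda x, y, z, t: 1.0, velocity, event)
```

The first value should be zero up to the error of the finite differences; the second should be close to 1/(1+t), which shows that the vanishing of the divergence is a genuine condition connecting density and velocity.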

3. Vectors, Tensors, Operations.

We shall later treat the fundamental concepts of vector and tensor analysis in a systematic way. At present we shall show how the
language of this theory which for ordinary
space has been partly introduced in section 1
can be applied to the case of four independent
variables and extended so as to furnish a simple way of describing the relations introduced
in the preceding section.
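A quantity like $\rho$ which depends on the independent variables x,y,z,t, we shall call a
scalar field. The four quantities $u_1, u_2, u_3, u_4$
we shall consider as the components of a
vector (or of a vector field; the latter, if
we want to emphasize the dependence on the independent variables). The sixteen quantities
$\rho u_i u_j$ furnish an example of a tensor (or tensor
field). A convenient way to arrange the components of a tensor is in a square array; for
instance,

$$ \begin{matrix} \rho u_1 u_1 & \rho u_1 u_2 & \rho u_1 u_3 & \rho u_1 u_4 \\ \rho u_2 u_1 & \rho u_2 u_2 & \rho u_2 u_3 & \rho u_2 u_4 \\ \rho u_3 u_1 & \rho u_3 u_2 & \rho u_3 u_3 & \rho u_3 u_4 \\ \rho u_4 u_1 & \rho u_4 u_2 & \rho u_4 u_3 & \rho u_4 u_4 \end{matrix} \tag{3.1} $$

Another example of a tensor is furnished by the derivatives of the components of a vector with respect to the independent variables:

$$ \begin{matrix} \partial u_1/\partial x & \partial u_1/\partial y & \partial u_1/\partial z & \partial u_1/\partial t \\ \partial u_2/\partial x & \partial u_2/\partial y & \partial u_2/\partial z & \partial u_2/\partial t \\ \partial u_3/\partial x & \partial u_3/\partial y & \partial u_3/\partial z & \partial u_3/\partial t \\ \partial u_4/\partial x & \partial u_4/\partial y & \partial u_4/\partial z & \partial u_4/\partial t \end{matrix} \tag{3.2} $$

We want to mention here a very important tensor
the array of whose components is

$$ \begin{matrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{matrix} \tag{3.4} $$

its components are usually denoted by $\delta_{ij}$ so that
$\delta_{ij}$ is one if the indices have the same value,
and zero when they are different. The $\delta_{ij}$ are
often referred to as the Kronecker symbols.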
A tensor has been obtained above from a
vector by differentiation; the same process can
also be applied to a scalar in order to obtain
a vector.
From the scalar $\rho$, we would thus obtain a vector whose components are $\partial\rho/\partial x$, $\partial\rho/\partial y$, $\partial\rho/\partial z$,
$\partial\rho/\partial t$; this vector is often called the gradient of
the scalar $\rho$.
On the other hand the same process may be applied to a tensor.
For instance,
differentiating each of the sixteen components
$M_{ij} = \rho u_i u_j$ introduced in 2.7 with respect to
each of the independent variables $x_k$ we obtain
the 16 x 4 numbers (or functions) $\partial M_{ij}/\partial x_k$. We
call these numbers the components of a tensor
of rank three, and we may call now what we
called simply a tensor, a tensor of rank two, a
vector - a tensor of rank one, and a scalar - a
tensor of rank zero. The operation of differentiation leads then from a tensor (or, better,
a tensor field) to a tensor of the next higher
rank.
It is also convenient to introduce another
operation, the operation of contraction; it can
be applied to a tensor of at least rank two and
it lowers the rank by two; for a tensor of rank
two it consists in forming the sum of all the
components whose indices are equal, or, if the
components are arranged as explained before, in
taking the sum of all the components in the
main diagonal.
The operation of taking the divergence of
a vector (field) may be stated now to consist
of the operation of differentiation followed by
the operation of contraction applied to the resulting tensor of rank two.
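The statement that a divergence is a differentiation followed by a contraction can be followed step by step in a short numerical sketch. Everything named here is an assumption made for illustration: the helper functions, the sample vector field, and the use of central differences are not from the text.

```python
def derivative_tensor(field, point, h=1e-5):
    """Rank-two array T[i][j] = d(field_i) / d(x_j), estimated by central differences."""
    n = len(point)
    tensor = [[0.0] * n for _ in range(n)]
    for j in range(n):
        plus, minus = list(point), list(point)
        plus[j] += h
        minus[j] -= h
        values_plus, values_minus = field(*plus), field(*minus)
        for i in range(n):
            tensor[i][j] = (values_plus[i] - values_minus[i]) / (2.0 * h)
    return tensor

def contract(tensor):
    """Contraction of a tensor of rank two: the sum of the main diagonal."""
    return sum(tensor[i][i] for i in range(len(tensor)))

def sample_field(x, y, z):
    """Hypothetical vector field whose divergence is y + z + x."""
    return (x * y, y * z, z * x)

def position_field(x, y, z):
    """The field of components x, y, z; its derivatives form the Kronecker symbols."""
    return (x, y, z)

point = (1.0, 2.0, 3.0)
divergence_at_point = contract(derivative_tensor(sample_field, point))
kronecker = derivative_tensor(position_field, point)
```

For the sample field the contraction should agree with y + z + x at the chosen point, and the derivative tensor of the position field should reproduce the array of ones on the main diagonal and zeros elsewhere.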
A tensor of rank three may be contracted
in three different ways; in general,
a tensor of higher rank may be contracted in as many ways as
there are pairs of indices. To contract a tensor with respect to two of its indices means to
take the sum of those components in which the
two selected indices have the same values; for
instance, $\partial M_{ij}/\partial x_k$ is a tensor of rank three;
its contraction with respect to the first and
the third indices is the sum $\sum_i \partial M_{ij}/\partial x_i$; the
index j is allowed to take all the four values
1,2,3,4 so that we have four sums which are
considered as the components of a vector - the
divergence of the tensor $M_{ij}$.
We may finally mention the operation of
multiplication which has been applied several
times in what precedes. The vector $\rho u_i = q_i$
has been obtained as a result of multiplication
of the vector $u_i$ by the scalar $\rho$. The tensor
$M_{ij}$ has been obtained by multiplying the vector
$q_i$ by the vector $u_j$ (every component of the
first by every component of the second - that
is why we have to use different indices - i and
j are supposed to take independently of each other
the values 1,2,3,4).
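The two multiplications just described can be written out in a few lines. This is a sketch with hypothetical numerical values for the density and the velocity components; the helper names are likewise assumptions. The fourth velocity component is set equal to unity as agreed in 2.41.

```python
def scale(scalar, vector):
    """Multiplication of a vector by a scalar: q_i = rho * u_i."""
    return [scalar * component for component in vector]

def multiply(a, b):
    """Multiplication of two vectors: every component of a by every component of b."""
    return [[a_i * b_j for b_j in b] for a_i in a]

rho = 2.0                      # hypothetical density at one point
u = [0.5, -1.0, 0.25, 1.0]     # hypothetical velocity components, u_4 = 1

q = scale(rho, u)              # the vector rho * u_i
M = multiply(q, u)             # the tensor M_ij = rho * u_i * u_j

# Because u_4 = 1, the last column of the array repeats the vector q,
# and the component M_44 is the density itself.
last_column = [row[3] for row in M]
```

The array M is symmetric, since the product of the i-th and j-th velocity components does not depend on the order of the factors.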
The operation of contraction introduced
above will be performed very often; it is convenient, therefore, to simplify our notation;
this simplification consists in omitting the
symbol of summation, and in indicating that
summation takes place by using Greek letters
for indices with respect to which we sum. Thus
we shall write

$$ \frac{\partial(\rho u_\alpha)}{\partial x_\alpha} = 0 \tag{3.5} $$

and

$$ \frac{\partial M_{\alpha j}}{\partial x_\alpha} \tag{3.6} $$

for 2.21 and 2.65 respectively. The first gives
an example of a divergence of a vector, the second of a divergence of a tensor.
The Greek index in the above formulas has
no numerical value; any other Greek letter would
do just as well; in this respect a Greek index
may be compared with the variable of integration in a definite integral. The only case when
we have to pay some attention to the particular
Greek letters we are using is when two (or more)
summations occur in one expression - in such a
case different Greek letters have to be used
for every summation. If we have to write, for
example, $(\sum x_i y_i)^2$ using Greek indices we could
write it as $(x_\alpha y_\alpha)^2$, but if we want to write
out the two factors instead of using the exponent we have to write $x_\alpha y_\alpha x_\beta y_\beta$ because
$x_\alpha y_\alpha x_\alpha y_\alpha$ would have meant $\sum (x_i y_i)^2$.
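The difference between the two ways of writing the product can be made concrete with numbers. In the sketch below the vectors are arbitrary sample values and the function name is an assumption; the point is only that two independent summation indices give the square of the sum, whereas a single shared index gives the sum of the squares.

```python
def contracted(x, y):
    """The sum over the repeated index of x_alpha * y_alpha."""
    return sum(a * b for a, b in zip(x, y))

x = [1, 2, 3]
y = [4, 5, 6]
n = len(x)

# Two different Greek letters: two independent summations.
square_of_sum = sum(
    x[alpha] * y[alpha] * x[beta] * y[beta]
    for alpha in range(n)
    for beta in range(n)
)

# The same letter used for all four factors: a single summation.
sum_of_squares = sum((x[i] * y[i]) ** 2 for i in range(n))

# The first expression agrees with the square of the contracted product.
agrees_with_square = square_of_sum == contracted(x, y) ** 2
```

For these sample values the two results differ, which is why a fresh Greek letter is needed for every summation.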
The operation of contraction is used quite often. The formation of the scalar product of two vectors $u_i$ and $v_i$ may be considered as resulting from their multiplication followed by contraction; the multiplication gives the tensor of the second rank $u_i v_j$, and contracting this we get $u_\alpha v_\alpha = u_1 v_1 + u_2 v_2 + u_3 v_3$, which is the scalar product; the scalar product of two vectors could also be called the contracted product of the two vectors. In an analogous fashion we can form a contracted product of two tensors of the second rank. If the tensors are $a_{ij}$ and $b_{ij}$ the contracted product will be $a_{i\alpha} b_{\alpha j}$, also a tensor of rank two. It may be interesting to note that the formation of the contracted product of two tensors is essentially the same operation as that of multiplying two determinants corresponding to the arrays representing the tensors; to see that, it will be enough to consider two three-row determinants

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} \quad\text{and}\quad \begin{vmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{vmatrix}.$$

Their product, according to the theorem of multiplication of determinants, is

$$\begin{vmatrix} a_{1\alpha}b_{\alpha 1} & a_{1\alpha}b_{\alpha 2} & a_{1\alpha}b_{\alpha 3} \\ a_{2\alpha}b_{\alpha 1} & a_{2\alpha}b_{\alpha 2} & a_{2\alpha}b_{\alpha 3} \\ a_{3\alpha}b_{\alpha 1} & a_{3\alpha}b_{\alpha 2} & a_{3\alpha}b_{\alpha 3} \end{vmatrix}$$

and it is seen that the elements of this determinant are the components of the tensor of rank two which arises from the tensors $a_{ij}$ and $b_{ij}$ by first multiplying them and then contracting with respect to the two inside indices.
We could also speak of the contracted square
of a tensor, meaning by this the contracted
product of a tensor with itself.
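The connection between the contracted product and the multiplication of determinants can be tried out on a small case. The sketch below is only an illustration: the two arrays `a` and `b` are arbitrary, and the helper names `contracted_product` and `det3` are assumptions introduced here.

```python
def contracted_product(a, b):
    """Components c_ij = a_is b_sj, summed over the inside index s."""
    n = len(a)
    return [[sum(a[i][s] * b[s][j] for s in range(n)) for j in range(n)]
            for i in range(n)]

def det3(m):
    """Determinant of a three-row array, expanded along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Arbitrary example arrays (integers, so that the check is exact).
a = [[1, 2, 0], [0, 1, 3], [2, 0, 1]]
b = [[2, 0, 1], [1, 1, 0], [0, 3, 1]]

c = contracted_product(a, b)          # contracted product of a and b
a_squared = contracted_product(a, a)  # contracted square of a

print(c)
print(det3(a), det3(b), det3(c))
```

For these arrays the determinant of the contracted product equals the product of the two determinants, as the theorem of multiplication of determinants requires.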

4. Maxwell's Equations.

In section 1 we discussed from a formal point of view the inverse square law and the fields of a more general nature which can be derived from it; and we expressed the laws of these fields in terms of three-dimensional tensor analysis, i.e., we employed only three independent variables; after that we found that matter is best discussed (from the continuous point of view) by using four-dimensional tensor analysis. We have thus a discrepancy: two different mathematical tools are used in the treatment of the two sides of the fundamental equations of mechanics. This discrepancy will be removed in what follows; it will be removed by considering force fields that differ from those derived by composition of inverse square laws, by modifying in a sense this law; but the modifications will be different in the two cases in which the inverse square law has been applied in older physics, the two cases which we are going to mention now.
Originally, the inverse square law was introduced in the time of Newton in application to gravitational forces; we shall discuss the gravitational phenomena in chapter V, and see what modifications - radical in nature, but very slight as far as numerical values are concerned - the inverse square law will suffer. Later, it was recognized that the inverse square law applies also to the electrostatic and magnetostatic fields produced by one single electric, resp. magnetic, particle. Still later a more general law for electromagnetic fields was introduced by Faraday and Maxwell, which we shall have to study now.
If X,Y,Z denote the components of electric force, in the static symmetric case, as we just said, the inverse square law applies, and, as shown in section 1, it follows under assumption of additivity that for a field produced by any number of particles the divergence vanishes,

4.1   $\partial X/\partial x + \partial Y/\partial y + \partial Z/\partial z = 0$,

and the quantities $\partial Y/\partial z - \partial Z/\partial y$, $\partial Z/\partial x - \partial X/\partial z$, $\partial X/\partial y - \partial Y/\partial x$ also vanish; a static magnetic field does not interact with the electric field, but when a changing magnetic field is present the laws of the electric field are modified; viz., the quantities just mentioned are not zero any more but are proportional or, in appropriately chosen units, equal to the time derivatives of the components L,M,N of the magnetic field, so that we have, in addition

4.11
$\partial Y/\partial z - \partial Z/\partial y = \partial L/\partial t$
$\partial Z/\partial x - \partial X/\partial z = \partial M/\partial t$
$\partial X/\partial y - \partial Y/\partial x = \partial N/\partial t$.

In a similar fashion, the divergence of the magnetic force vanishes,

4.2   $\partial L/\partial x + \partial M/\partial y + \partial N/\partial z = 0$,

and the expressions $\partial M/\partial z - \partial N/\partial y$, $\partial N/\partial x - \partial L/\partial z$, $\partial L/\partial y - \partial M/\partial x$ are proportional to the time derivatives of the electric components; the factor of proportionality, however, cannot be reduced to one; by an appropriate choice of units it can be reduced to minus one, and no changing of directions or sense of coordinate axes can permit us to get rid of this minus sign without introducing a minus sign in the preceding equations; this minus sign is of extreme importance in what follows, as we shall have occasion to observe many times; in the meantime we write out the remaining equations

4.21
$\partial M/\partial z - \partial N/\partial y = -\partial X/\partial t$
$\partial N/\partial x - \partial L/\partial z = -\partial Y/\partial t$
$\partial L/\partial y - \partial M/\partial x = -\partial Z/\partial t$.
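The sign conventions of 4.1, 4.11, 4.2 and 4.21 can be tried out on a simple field. The following Python sketch is only an illustration under stated assumptions: the test field (a plane wave travelling along x with unit velocity, electric force along y, magnetic force along z) is chosen here and does not come from the text, and the helper names `fields` and `d` are hypothetical. Derivatives are estimated by central differences.

```python
import math

def fields(x, y, z, t):
    """Assumed test field: a plane wave along x in units where light has velocity one."""
    s = math.cos(x - t)
    return {"X": 0.0, "Y": s, "Z": 0.0, "L": 0.0, "M": 0.0, "N": s}

def d(name, var, p, h=1e-5):
    """Central-difference estimate of d(name)/d(var) at the point p = (x, y, z, t)."""
    k = "xyzt".index(var)
    hi, lo = list(p), list(p)
    hi[k] += h
    lo[k] -= h
    return (fields(*hi)[name] - fields(*lo)[name]) / (2 * h)

p = (0.3, -1.2, 0.7, 0.5)  # an arbitrary point and instant

# Each entry is "left side minus right side" of one equation; all should vanish.
residuals = [
    d("X", "x", p) + d("Y", "y", p) + d("Z", "z", p),   # 4.1
    d("Y", "z", p) - d("Z", "y", p) - d("L", "t", p),   # 4.11
    d("Z", "x", p) - d("X", "z", p) - d("M", "t", p),
    d("X", "y", p) - d("Y", "x", p) - d("N", "t", p),
    d("L", "x", p) + d("M", "y", p) + d("N", "z", p),   # 4.2
    d("M", "z", p) - d("N", "y", p) + d("X", "t", p),   # 4.21
    d("N", "x", p) - d("L", "z", p) + d("Y", "t", p),
    d("L", "y", p) - d("M", "x", p) + d("Z", "t", p),
]
print(max(abs(r) for r in residuals))
```

With the minus sign of 4.21 removed the wave would no longer satisfy the equations, which is one way of seeing that this sign cannot be disposed of.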

Several remarks must be made here concerning these equations, which will be referred to
as Maxwell's equations.
In the first place, these equations cannot be proved; they have to be regarded as the fundamental equations of a mathematical theory, whose justification lies in the fact that its quantities have been successfully identified with measured quantities of Physics, in the sense that for physical quantities the same relationships have been established experimentally as those deduced for the corresponding theoretical quantities from the fundamental equations. In the second place, the equations as they appear above present a simplified and idealized form of the fundamental equations, namely the fundamental equations for the case of free space, i.e., complete absence of matter.

In the third place, the choice of units which made the above simple form possible concerns not only units of electric and magnetic force, but also units of length and time; it was necessary to choose them in such a way that the velocity of light, which in ordinary units is 300,000 kilometers per second, becomes one. As a result of this, ordinary velocities, those we observe in everyday life, are expressed by very small quantities.
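To get a feeling for how small these quantities are, a few velocities may be converted to the new units. This is a minimal sketch: only the figure of 300,000 kilometers per second comes from the text, while the example velocities and the function name `in_light_units` are illustrative assumptions.

```python
C_KM_PER_S = 300_000.0  # velocity of light in ordinary units, as quoted in the text

def in_light_units(speed_km_per_s):
    """Express a velocity in units in which the velocity of light is one."""
    return speed_km_per_s / C_KM_PER_S

# Illustrative values only (rounded, not measurements from the text).
examples_km_per_s = {
    "fast train (about 0.03 km/s)": 0.03,
    "Earth in its orbit (about 30 km/s)": 30.0,
    "light": 300_000.0,
}
for name, v in examples_km_per_s.items():
    print(f"{name}: {in_light_units(v):.1e}")
```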
For our purposes it is convenient to arrange our equations in the following form, where differentiation with respect to a variable is indicated by a subscript,

4.3
$N_y - M_z - X_t = 0$
$L_z - N_x - Y_t = 0$
$M_x - L_y - Z_t = 0$
$X_x + Y_y + Z_z = 0$

$Z_y - Y_z + L_t = 0$
$X_z - Z_x + M_t = 0$
$Y_x - X_y + N_t = 0$
$L_x + M_y + N_z = 0$.

As mentioned before, the above equations describe the behavior of electric and magnetic forces in free space, that is, in regions where there is no matter, or where we may neglect matter. On the discrete theory of matter these equations still hold everywhere except at points occupied by matter - in this theory matter appears as a singularity of the field and some numerical characteristics of matter, such as electric charge, appear as residues corresponding to these singularities. We shall not discuss this point of view, although mathematically it is very interesting. On the continuous theory of matter some terms which represent matter have to be added to the preceding equations. The second set of Maxwell's equations (4.3) remains unaltered, but the first set is modified; the left hand sides do not vanish any more but are proportional to the velocity components of matter; the coefficient of proportionality is electric density, which we denote by $\epsilon$.
The equations of Maxwell for space with matter are thus

4.31
$N_y - M_z - X_t = \epsilon u$
$L_z - N_x - Y_t = \epsilon v$
$M_x - L_y - Z_t = \epsilon w$
$X_x + Y_y + Z_z = \epsilon$

$Z_y - Y_z + L_t = 0$
$X_z - Z_x + M_t = 0$
$Y_x - X_y + N_t = 0$
$L_x + M_y + N_z = 0$.

We come thus across a new scalar quantity - electric density. However, in most cases this density is proportional to the mass density $\rho$ we have considered before, the factor of proportionality being capable of only two numerical values - one negative for negative electricity, and the other positive for positive electricity.
Even these equations are not sufficient for the description of electromagnetic phenomena; they correspond to a certain idealization in which the dielectric constant and magnetic permeability are neglected, but we shall not go beyond this idealization.
In the above equations we have four independent variables x,y,z,t, as in the discussion of matter in section 2, and we may try now to apply to them the same notations which have been introduced in that section and section 3. The main question here is: how to treat the six quantities X,Y,Z,L,M,N? The question was solved by Minkowski in 1907 in the following way: it is clear that a vector has too few components to take care of these quantities; instead of using two vectors, Minkowski proposed to use a tensor of rank two; of course, a tensor has too many components; to be exact, it has, in the general case, 16 components - four in the main diagonal, six above, and six below; we set those in the main diagonal zero, and those under the main diagonal equal with opposite sign to those above the main diagonal symmetric to them; in this way we are left with six essentially different components; the restriction just introduced is expressed in one formula

4.4   $F_{ij} + F_{ji} = 0$;

in fact, the elements in the main diagonal correspond to equal indices; if we set $j = i$ the above formula becomes $F_{ii} + F_{ii} = 0$, whence $F_{ii} = 0$, as asserted. We try now to identify the components of this tensor with our electromagnetic force components in the following way:

4.5   $X = F_{41}$,  $Y = F_{42}$,  $Z = F_{43}$,  $L = F_{23}$,  $M = F_{31}$,  $N = F_{12}$;

using these notations together with 4.4, according to which, for example, $F_{14} = -X$, we can write the first set 4.3 in the highly satisfactory form,

4.6
$\partial F_{12}/\partial x_2 + \partial F_{13}/\partial x_3 + \partial F_{14}/\partial x_4 = 0$
$\partial F_{21}/\partial x_1 + \partial F_{23}/\partial x_3 + \partial F_{24}/\partial x_4 = 0$
$\partial F_{31}/\partial x_1 + \partial F_{32}/\partial x_2 + \partial F_{34}/\partial x_4 = 0$
$\partial F_{41}/\partial x_1 + \partial F_{42}/\partial x_2 + \partial F_{43}/\partial x_3 = 0$

or

$\partial F_{i\alpha}/\partial x_\alpha = 0$.
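The bookkeeping of 4.4 and 4.5 is easy to mechanize. The sketch below, in Python, fills a four-by-four array from six given force components; the function name `field_tensor` and the numerical values are assumptions made for illustration only.

```python
def field_tensor(X, Y, Z, L, M, N):
    """Array F[i][j] (stored 0-based) filled from 4.5 and the antisymmetry 4.4."""
    F = [[0.0] * 4 for _ in range(4)]
    # The six components named in 4.5, written with the 1-based indices of the text.
    assigned = {(4, 1): X, (4, 2): Y, (4, 3): Z, (2, 3): L, (3, 1): M, (1, 2): N}
    for (i, j), value in assigned.items():
        F[i - 1][j - 1] = value
        F[j - 1][i - 1] = -value   # 4.4: F_ji = -F_ij
    return F

# Arbitrary illustrative values for X, Y, Z, L, M, N.
F = field_tensor(X=1.0, Y=2.0, Z=3.0, L=4.0, M=5.0, N=6.0)
for row in F:
    print(row)
```

The printed array has zeros in the main diagonal, and $F_{14}$ comes out as $-X$, in agreement with the remark following 4.5.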

These four equations show a high degree of symmetry; moreover they show a very pronounced similarity to some of the equations we have been considering in section 2, and for which we prepared a mathematical theory in section 3; we can say that the four equations written above express the fact that the divergence of the tensor $F_{ij}$ just introduced vanishes in the case of free space. However, if we apply the same notations to the second set 4.3 of Maxwell's equations nothing very simple comes out; the minus sign mentioned after formula 4.2 above seems to cause trouble; but there exists a way out also from this difficulty; it has been indicated (before Minkowski's paper) in the work of Poincaré and Marcolongo, and foreshadowed in a private letter by Hamilton as early as 1845. We can overcome the difficulty if we allow ourselves to use imaginary quantities side by side with real quantities - this ought to cause no worry provided we know the formal rules of operations, since our new notations are of an entirely formal nature anyway; we set now instead of .5
4.7   $x_1 = x$,  $x_2 = y$,  $x_3 = z$,  $x_4 = it$,

and instead of 4.5

4.72   $iX = F_{41}$,  $iY = F_{42}$,  $iZ = F_{43}$,  $L = F_{23}$,  $M = F_{31}$,  $N = F_{12}$,

and then the first set (4.3) becomes (4.6) as before, but the second set (4.3) also acquires a highly satisfactory form, namely

4.61
$\partial F_{23}/\partial x_4 + \partial F_{34}/\partial x_2 + \partial F_{42}/\partial x_3 = 0$
$\partial F_{34}/\partial x_1 + \partial F_{41}/\partial x_3 + \partial F_{13}/\partial x_4 = 0$
$\partial F_{41}/\partial x_2 + \partial F_{12}/\partial x_4 + \partial F_{24}/\partial x_1 = 0$
$\partial F_{12}/\partial x_3 + \partial F_{23}/\partial x_1 + \partial F_{31}/\partial x_2 = 0$.

As mentioned before we consider the components $F_{ij}$ as the components of a tensor. We may say that we have sixteen of them: the six which appear in the relations 4.72, six more which result from them by interchanging the indices and whose values differ from those given in 4.72 only in sign, and four more with equal indices. According to the formula (4.4) the latter are zero. We may arrange them in a square array as follows:

4.8
$$\begin{matrix} F_{11} & F_{12} & F_{13} & F_{14} \\ F_{21} & F_{22} & F_{23} & F_{24} \\ F_{31} & F_{32} & F_{33} & F_{34} \\ F_{41} & F_{42} & F_{43} & F_{44} \end{matrix} \qquad = \qquad \begin{matrix} 0 & N & -M & -iX \\ -N & 0 & L & -iY \\ M & -L & 0 & -iZ \\ iX & iY & iZ & 0 \end{matrix}$$

We may compare the property $F_{ij} = -F_{ji}$ which our new tensor has with the property $M_{ij} = M_{ji}$ possessed by the tensor of matter (2.7) and which is simply the result of commutativity of multiplication. These two properties are manifested in the square arrays (3.1 and 4.8) in that the components of $M_{ij}$ which are symmetric with respect to the main diagonal are equal, and those of $F_{ij}$ which are symmetric with respect to the main diagonal are opposite. Tensors of the first type are called symmetric, those of the second - antisymmetric.
We want to see now whether the notations which permitted us to write Maxwell's equations in a nice form would not spoil the nice form which we previously gave to the hydrodynamical equations. But since here we have at our disposal the quantities $u_1, u_2, u_3, u_4$ we can arrange it so as to offset the $i$ in the $x_4$. In fact, since differentiation with respect to time occurs always in the presence of a $u_4$ it is enough to set

4.9   $u_4 = i$

instead of 1, as in .41, and everything will be all right, as far as the left hand sides of the equations are concerned, except that the left hand side of the continuity equation becomes imaginary. This, however, does not matter since we temporarily disregard the right hand sides of the equations.
And now we may consider the Maxwell equations with matter 4.31; the second set is not affected and may be written as 4.61, but the first will appear in a form which may be written simply as

4.62   $\partial F_{i\alpha}/\partial x_\alpha = \epsilon u_i$.

The equation of continuity of matter is a consequence of these equations; to obtain it differentiate, contract and use the property $F_{ij} = -F_{ji}$; the result gives the continuity equation of matter if we take into account that $\rho/\epsilon$ is a constant.

5. The Stress-Energy Tensor.

So far the equations to which we subjected force components have been linear equations, whereas operations performed on matter involved the squares and products of matter components; the similarity which we observed in the mathematical aspects of force and matter components makes it seem desirable to subject force components to operations analogous to those which we applied to matter, viz., multiplication.
In the static case we return to our notations (1.72) $X_1, X_2, X_3$ for X,Y,Z and form the tensor $X_i X_j$; this tensor, slightly modified, plays an important part in the theory; the modification consists in subtracting from it $\tfrac12 \delta_{ij} X_\alpha X_\alpha$, where $\delta_{ij}$ are the components of the tensor introduced in 3.4 and $X_\alpha X_\alpha$ stands for the contracted square of the vector $X_i$, i.e., $X^2 + Y^2 + Z^2$. We consider then the tensor

$X_i X_j - \tfrac12 \delta_{ij} X_\alpha X_\alpha$

whose array is

$$\begin{matrix} \tfrac12(X^2 - Y^2 - Z^2) & XY & XZ \\ XY & \tfrac12(Y^2 - X^2 - Z^2) & YZ \\ XZ & YZ & \tfrac12(Z^2 - X^2 - Y^2). \end{matrix}$$

Of this tensor we form the divergence (three-dimensional), and find as its components

$X(X_x + Y_y + Z_z) + Y(X_y - Y_x) + Z(X_z - Z_x)$,
$X(Y_x - X_y) + Y(X_x + Y_y + Z_z) + Z(Y_z - Z_y)$,
$X(Z_x - X_z) + Y(Z_y - Y_z) + Z(X_x + Y_y + Z_z)$;

the connection of these expressions with the Newtonian equations (1.51, 1.52) is obvious; the expressions in brackets are the left hand sides of the Newtonian equations, so that the divergence of our new tensor $X_i X_j - \tfrac12 \delta_{ij} X_\alpha X_\alpha$ vanishes as the result of the Newtonian equations. This again confirms us in our opinion that from the mathematical point of view force and matter components are of very similar nature. We have now in mind electric and magnetic forces; if X,Y,Z are the components of the electric force vector the tensor whose array is written out above is called the "electric stress tensor"; an analogous expression in magnetic components is called the "magnetic stress tensor"; the sum of the two, namely

$$\begin{matrix} \tfrac12(X^2 + L^2 - Y^2 - Z^2 - M^2 - N^2) & XY + LM & XZ + LN \\ XY + LM & \tfrac12(Y^2 + M^2 - X^2 - Z^2 - L^2 - N^2) & YZ + MN \\ XZ + LN & YZ + MN & \tfrac12(Z^2 + N^2 - X^2 - Y^2 - L^2 - M^2) \end{matrix}$$

is called the "electromagnetic stress tensor"; it has been introduced by Maxwell and plays some part in electromagnetic theory, for instance, in the discussion of light pressure; but its main applications and importance seem to be in the study of the fundamental questions, as part of a more general four-dimensional tensor.
We saw how nicely the system of indices worked in the case of Maxwell's equations; it is natural to express in index notation this tensor also. We assert that the required expression is given by

5.1   $E_{ij} = F_{i\rho}F_{\rho j} - \tfrac14 \delta_{ij} F_{\rho\sigma}F_{\sigma\rho}$

where i,j take on values 1,2,3, and the summations indicated by $\rho$ and $\sigma$ are extended from 1 to 4. In fact,

$F_{\rho\sigma}F_{\sigma\rho} = 2(-L^2 - M^2 - N^2 + X^2 + Y^2 + Z^2)$,
$F_{1\rho}F_{\rho 1} = F_{12}F_{21} + F_{13}F_{31} + F_{14}F_{41} = -N^2 - M^2 + X^2$,
$E_{11} = -N^2 - M^2 + X^2 - \tfrac12(X^2 + Y^2 + Z^2 - L^2 - M^2 - N^2) = \tfrac12(X^2 + L^2 - Y^2 - Z^2 - M^2 - N^2)$,

and similarly for the other components corresponding to different combinations of the indices 1,2,3 for i and j. There seems to be an inconsistency here; the summation indices $\rho$ and $\sigma$ we let run from 1 to 4, but we consider only the values 1,2,3 for i and j. It is interesting to see what comes out if we let i and j take the value 4. We get four new components, namely,

5.2
$E_{14} = F_{1\rho}F_{\rho 4} = F_{12}F_{24} + F_{13}F_{34} = i(NY - MZ)$,
$E_{24} = F_{2\rho}F_{\rho 4} = F_{21}F_{14} + F_{23}F_{34} = i(LZ - NX)$,
$E_{34} = F_{3\rho}F_{\rho 4} = F_{31}F_{14} + F_{32}F_{24} = i(MX - LY)$,
$E_{44} = F_{4\rho}F_{\rho 4} - \tfrac12(-L^2 - M^2 - N^2 + X^2 + Y^2 + Z^2) = \tfrac12(X^2 + Y^2 + Z^2 + L^2 + M^2 + N^2)$.

These quantities happen also to have physical meaning. The first three constitute (except for the factor i) the components of the so-called Poynting vector, and the last one is the so-called electromagnetic energy (or, energy density). We are thus led by the notations we have introduced in a purely mathematical way to some physical quantities; we may say that the entire tensor with its sixteen components - it is called "the electromagnetic stress-energy tensor" - unifies in a single expression all the second degree quantities appearing in the electromagnetic theory: the stress components, the Poynting components, and the energy.
The stress-energy tensor may be written out in the form of the following square array:

5.3
$$\begin{matrix} X^2 + L^2 - h & XY + LM & XZ + LN & i(NY - MZ) \\ XY + LM & Y^2 + M^2 - h & YZ + MN & i(LZ - NX) \\ XZ + LN & YZ + MN & Z^2 + N^2 - h & i(MX - LY) \\ i(NY - MZ) & i(LZ - NX) & i(MX - LY) & h \end{matrix}$$

where $h = \tfrac12(X^2 + Y^2 + Z^2 + L^2 + M^2 + N^2)$.

6. General Equations of Motion. The Complete Tensor.

We let ourselves be guided once more by what seems to be natural from the formal point of view, and form the divergence (four-dimensional divergence) of the new tensor. This can be done either in components or in index notation. We show how to do it the latter way, and leave it to the reader to write out the stress-energy tensor as an array and to form the divergence of the separate lines. Applying formula 3.6 to the tensor 5.1 we have

the first term on the right may be split up in two equal parts, one of which, writing $\beta$ for $\gamma$, may be written as

the other, writing $\alpha$ for $\gamma$ and $\beta$ for $\alpha$, and interchanging indices in both factors, which does not affect the value because it amounts to changing the sign twice, takes the form

We thus

Substituting for the second factors their values from Maxwell's equations 4.61 and 4.62 (in space with matter) we get

or in components without indices

6.1
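Looking back at section 5, the index formula 5.1 can be compared numerically with the arrays written out there. The following Python sketch builds the array 4.8 from 4.72 (so the fourth row and column carry the factor i) and forms $E_{ij}$ according to 5.1. It is an illustration under assumptions: the function names and the numerical force components are chosen here, and only the stress components and the energy component $E_{44}$ are compared with the array 5.3.

```python
def field_tensor(X, Y, Z, L, M, N):
    """Array F[i][j] (stored 0-based) from 4.72 and the antisymmetry F_ij = -F_ji."""
    F = [[0j] * 4 for _ in range(4)]
    assigned = {(4, 1): 1j * X, (4, 2): 1j * Y, (4, 3): 1j * Z,
                (2, 3): L, (3, 1): M, (1, 2): N}
    for (i, j), value in assigned.items():
        F[i - 1][j - 1] = value
        F[j - 1][i - 1] = -value
    return F

def stress_energy(F):
    """E_ij = F_ir F_rj - (1/4) delta_ij F_rs F_sr, all sums running over four values."""
    full = sum(F[r][s] * F[s][r] for r in range(4) for s in range(4))
    return [[sum(F[i][r] * F[r][j] for r in range(4))
             - (0.25 * full if i == j else 0)
             for j in range(4)] for i in range(4)]

# Arbitrary illustrative force components.
X, Y, Z, L, M, N = 1.0, 2.0, 3.0, 4.0, 5.0, 6.0
E = stress_energy(field_tensor(X, Y, Z, L, M, N))
h = 0.5 * (X * X + Y * Y + Z * Z + L * L + M * M + N * N)

print(E[0][0], X * X + L * L - h)   # stress component E_11
print(E[0][1], X * Y + L * M)       # stress component E_12
print(E[3][3], h)                   # energy component E_44
```

For these values the array is symmetric and the sum of its four diagonal components vanishes, as can also be read off from the array 5.3.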

Chapter II.

NEW GEOMETRY

In the preceding chapter we achieved, by introducing appropriate notations, a great simplicity and uniformity in our formulas. The notations in which indices take the values from 1 to 4 are modeled after those previously introduced in ordinary geometry, the two points of distinction being first that we have four independent variables instead of the three coordinates, and second, that the fourth variable is assigned imaginary values. In spite of these distinctions the analogy with ordinary geometry is very great, and we shall profit very much by pushing this analogy as far as possible, and using geometrical language, as well as notations modeled after those of geometry.
Physics seems to require, then, a mathematical theory analogous to geometry and differing from it only in that it must contain four coordinates, one of which is imaginary. The first purpose of this chapter will be to build a theory to these specifications. The remaining part of the chapter will be devoted to a more systematic treatment of tensor analysis.

7. Analytic Geometry of Four Dimensions.

In the present section we shall give a brief outline of properties which we may expect from a four-dimensional geometry, guided by analogy with two and three-dimensional geometries; of course, we shall lay stress mainly on those features which we shall need for the application to Physics that we have in mind. In this outline we shall disregard the fact that our fourth coordinate must be imaginary; certain peculiarities connected with this circumstance will be treated in section 10.
The equations of a straight line we expect to be written in the form

7.1   $\dfrac{x_1 - a_1}{v_1} = \dfrac{x_2 - a_2}{v_2} = \dfrac{x_3 - a_3}{v_3} = \dfrac{x_4 - a_4}{v_4}$,

entirely similar to that used in solid analytic geometry; but we may also use another form; as written out the equations state that for every point of the line the four ratios have the same value; denoting this value by p we may express the condition that a point belongs to the line by stating that its coordinates may be written as

7.11   $x_1 = a_1 + p v_1$,  $x_2 = a_2 + p v_2$,  $x_3 = a_3 + p v_3$,  $x_4 = a_4 + p v_4$;

giving here to p different values we obtain (for given $a_i$, $v_i$) the coordinates of all the different points of the line. The variable p is called the parameter, and this whole way of describing a line is called "parametric representation". Parametric representation is by no means peculiar to four dimensions; it may be, and is, often used in plane and solid analytic geometry. We present it here because we shall need it later, and it is not always sufficiently emphasized.
A straight line is determined by two points; the equations of the line through the points $a_i$ and $b_i$ are given by the above equations (7.1) in which

7.2   $v_1 = b_1 - a_1$,  $v_2 = b_2 - a_2$,  etc.

Two points determine a directed segment or vector, whose components are the differences between the corresponding coordinates of the points, so that $v_i$ are the components of the vector whose initial point is given by $a_i$ and whose final point is given by $b_i$.
A vector determined by two points of a straight line is said to belong to that line, and we may say that we can use as denominators in the equations 7.1, or as coefficients of p in 7.11, the components of any vector belonging to the straight line.
Two vectors are considered equal if they have equal components; a vector is multiplied by a number by multiplying its components by that number, and two vectors are added by adding their corresponding components.
Two lines are parallel if they contain equal vectors, and it is easy to see that a condition for parallelism of line 7.11 and

$\dfrac{x_1 - A_1}{V_1} = \dfrac{x_2 - A_2}{V_2} = \dfrac{x_3 - A_3}{V_3} = \dfrac{x_4 - A_4}{V_4}$

is

7.3   $v_1/V_1 = v_2/V_2 = v_3/V_3 = v_4/V_4$  or  $V_i = a v_i$,

where a is a number, so that proportionality of components of two vectors means parallelism.
A condition for perpendicularity of these two lines we expect to be the vanishing of the expression

7.4   $v_1 V_1 + v_2 V_2 + v_3 V_3 + v_4 V_4$

which is called the scalar product of the two vectors $v_i$ and $V_i$.
The distance between the points $a_1, a_2, a_3, a_4$ and $b_1, b_2, b_3, b_4$ is given by the square root of the expression

7.41   $(a_1 - b_1)^2 + (a_2 - b_2)^2 + (a_3 - b_3)^2 + (a_4 - b_4)^2$;

this distance is also considered as the length of the vector joining $a_i$ and $b_i$. The expression for the square of the length of a vector may be considered as a special case of the expression 7.4. We may say then, that the square of the length of a vector is the square of the vector, i.e., the product of the vector with itself.
We shall often use Roman letters to denote vectors. The scalar product of the vectors x and y will be denoted by x.y and the square of the vector x by $x^2$.
A vector whose length, or whose square, is unity we shall call a unit vector. Its components, we would expect, may be considered as the direction cosines of the line on which the vector lies. We also expect that the scalar product of two vectors is equal to the product of their lengths times the cosine of the angle between them; but, of course, an angle between two vectors in four-dimensional space has not been defined, so that we could simply define the angle between two vectors by this property, or by the formula

7.5   $\cos\varphi = \dfrac{x.y}{\sqrt{x^2}\,\sqrt{y^2}}$.

But if we want the angle to be a real quantity the absolute value of this expression cannot exceed unity, or

$(x.y)^2 \le x^2 . y^2$  or  $(x.y)^2 - x^2 y^2 \le 0$.

If we form the vector $\lambda x + \mu y$ where $\lambda$ and $\mu$ are two numbers, the square of that vector would be

$\lambda^2 x^2 + 2\lambda\mu\, x.y + \mu^2 y^2$

and the above inequality, which expresses the fact that the discriminant of this expression is negative, is seen to be a consequence of the assumption that a square of a vector is never negative.
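The formulas of this outline are easily tried out on numbers. The following Python sketch does so for real coordinates only (the imaginary fourth coordinate is disregarded, as in this section); the function names and the particular points and vectors are illustrative assumptions.

```python
import math

def dot(x, y):
    """Scalar product 7.4 of two four-component vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))

def point_on_line(a, v, p):
    """Parametric representation 7.11: x_i = a_i + p v_i."""
    return [ai + p * vi for ai, vi in zip(a, v)]

def angle(x, y):
    """Angle between two vectors as defined through 7.5."""
    return math.acos(dot(x, y) / math.sqrt(dot(x, x) * dot(y, y)))

# Two arbitrary points and the vector joining them (7.2).
a = [1.0, 0.0, 2.0, -1.0]
b = [3.0, 1.0, 2.0, 1.0]
v = [bi - ai for ai, bi in zip(a, b)]

midpoint = point_on_line(a, v, 0.5)   # parameter value 1/2
distance = math.sqrt(dot(v, v))       # square root of 7.41

# Two arbitrary vectors for the angle and for the inequality following 7.5.
x = [1.0, 1.0, 0.0, 0.0]
y = [1.0, 0.0, 1.0, 0.0]
print(midpoint, distance, math.degrees(angle(x, y)))
print(dot(x, y) ** 2 <= dot(x, x) * dot(y, y))
```

Giving the parameter the values 0 and 1 returns the two points $a_i$ and $b_i$ themselves; intermediate values give points of the segment between them.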
A plane we would expect to be determined by three points not in a line, or by two vectors with the same initial point, or by two lines through a point. Instead of characterizing a plane by equations we prefer to give it in parametric form; limiting ourselves to a plane through the origin we have

7.12   $x_1 = p a_1 + q b_1$,  $x_2 = p a_2 + q b_2$,  $x_3 = p a_3 + q b_3$,  $x_4 = p a_4 + q b_4$,

where $a_i$ and $b_i$ are the coordinates of two points in the plane or the components of two vectors of the plane whose initial points may be considered as at the origin. We shall write this formula also as

$x = \alpha a + \beta b$

where x, a, b stand for vectors whose components are $x_i$, $a_i$, $b_i$ and where we use Greek letters for parameters in order to avoid confusion with vectors which we denote now by Roman letters.
We always can choose two mutually perpendicular unit vectors as the two vectors determining a plane; if we call these vectors i and j the preceding formula becomes

7.6   $x = \alpha i + \beta j$.

The fact that i and j are perpendicular unit vectors may be written as

$i^2 = j^2 = 1$,  $i.j = 0$.

It is easy to see that in this case $\alpha$ and $\beta$ are the projections of x on the directions of i and j, or the scalar products of x with i and j respectively.
Every pair of coordinate axes determines a plane and since six pairs can be formed from four objects we have six coordinate planes.
In the same way that the direction of a straight line is determined by a configuration of two points on it - a vector, the "orientation" of a plane may be determined by the configuration of three points on it - a triangle. A vector is given by its components, which are the lengths of its projections on the coordinate axes; in the same way a triangle may be characterized (to a certain extent) by the areas of its projections on the coordinate planes. If, for example, we take a triangle one of whose vertices is at the origin 0,0,0,0 and the two others at the points $x_i$ and $y_i$ respectively, the areas of the projections will be the six quantities

$F_{ij} = \tfrac12 (x_i y_j - x_j y_i)$.

It is interesting to compare these numbers, which satisfy the relation

$F_{ij} = -F_{ji}$,

with the components of the tensor $F_{ij}$ (compare 4.4), three of which have been identified with the electric, and the other three with the magnetic force components. Our ten fundamental quantities $\rho u$, $\rho v$, $\rho w$, $\rho$, X, Y, Z, L, M, N seem to allow thus a geometrical interpretation; the first four are considered as the four projections of a part of a straight line on the coordinate axes, the remaining six, as the projections of a part of a plane on the coordinate planes.
A little against our expectations, however, these six quantities $F_{ij}$ are not independent; the reader will easily verify, using the above expressions, that

7.7   $F_{23}F_{41} + F_{31}F_{42} + F_{12}F_{43} = 0$.

We have here a relation that exists in the mathematical theory; at once the question arises: does a corresponding relation hold for the corresponding quantities in the physical theory? According to the formulas 4.5 this would mean

$L.X + M.Y + N.Z = 0$,

i.e., perpendicularity of the electric and magnetic force vectors; these vectors are, however, known not to be necessarily perpendicular to each other; our identification is therefore faulty; a slight modification would, however, help to overcome the difficulty; if instead of considering the areas of projections of a triangular contour, we consider the areas of projections of an arbitrary contour, not necessarily a flat one, then the six quantities are independent and the formal analogy holds perfectly.
Returning to the plane we may mention that, although it might seem strange at first glance, we should expect that two planes may have only one point in common - a situation which never occurs in three dimensions. An example of two planes with only one common point is given by the $x_1 x_2$ and the $x_3 x_4$ coordinate planes; the common point is, of course, the origin.
Four points not in a plane, or three vectors with a common origin, we expect to determine a "solid" which may be defined as the totality of points of three kinds: (1) points on the lines determined by the given vectors; (2) points on lines joining two points of the first kind; and (3) points of lines joining two points of the second kind. In ordinary geometry a configuration defined in this way exhausts all points, but not so in our four-dimensional geometry; as examples may serve the four coordinate solids, the totalities respectively of points satisfying the relations $x_1 = 0$, $x_2 = 0$, $x_3 = 0$, $x_4 = 0$.
A parametric representation of a solid is analogous to that of a plane. For a solid through the origin we have as such parametric representation

7.61   $x = \alpha i + \beta j + \gamma k$

where i,j,k have again been chosen as perpendicular unit vectors so that

$i.j = j.k = k.i = 0$;  $i^2 = j^2 = k^2 = 1$.

For all possible values of $\alpha$, $\beta$, $\gamma$ we obtain all the points of the solid through the origin determined by the vectors i,j,k.
Next we consider a configuration determined by five points not in a solid; we obtain all the points of our four-dimensional space. Incidentally, as a generalization of the formulas 7.6 and 7.61 we may write now

7.62   $x = \alpha i + \beta j + \gamma k + \delta l$;

this formula gives the expression of every vector with initial point at the origin in terms of four mutually perpendicular unit vectors. We have, of course,

7.8   $i.j = i.k = i.l = j.k = j.l = k.l = 0$,  $i^2 = j^2 = k^2 = l^2 = 1$.

8. Axioms of Four-Dimensional Geometry.

Until now we have been listing some propositions that we may expect to have in four-dimensional geometry. But what is four-dimensional geometry? As an abstract mathematical theory it is just a collection of statements of which some may be taken without proof and considered as axioms and definitions, and the others are deduced from them as theorems. It is not difficult then to pass from our expectations of a four-dimensional geometry to a realization of such a geometry; all we would have to do would be to pick out certain of the propositions listed above and consider them as axioms and to show that the others can be deduced from them.
But in so doing we do not want to include among our axioms propositions involving coordinates. In two and three-dimensional geometry we are accustomed to see analytic geometry based on the study of elementary geometry that precedes it. We have an idea of what a straight line is before we come to coordinate axes, and we choose three of these pre-existing straight lines to play the part of coordinate axes. We may choose these axes to a certain extent arbitrarily (this arbitrariness being restricted only by our desire to have a rectangular system); the coordinate axes play only an auxiliary role, and it would be awkward to include any reference to a particular coordinate system in the axioms. We look, therefore, among the propositions mentioned in the preceding section for some that are independent of a coordinate system and from which we can reconstruct the whole system.

As our fundamental undefined conception we


choose "vector". The axioms that follow will
in the main be rules of operations on vectors.

Axiom I. Every two vectors have a sum. Addition is commutative, a + b = b + a, and associative, i.e., a + (b + c) = (a + b) + c; subtraction is unique, i.e., for every two vectors a and b there exists one and only one vector x such that a = b + x. It follows that there exists a vector 0 that satisfies the relation a + 0 = a for every a.

Axiom II. Given a number $\alpha$ and a vector a there exists a vector $\alpha a$ or $a\alpha$ which is called their product. The associative law holds in the sense that $\alpha(\beta a) = (\alpha\beta)a$, and also the distributive laws $(\alpha + \beta)a = \alpha a + \beta a$ and $\alpha(a + b) = \alpha a + \alpha b$.

Axiom III. To every two vectors a and b corresponds a number a.b or ab called their (scalar) product. Scalar multiplication is commutative, a.b = b.a; it obeys, together with multiplication of vectors by numbers, the associative law, $\alpha(a.b) = (\alpha a.b)$; and, together with addition, the distributive law a.(b + c) = a.b + a.c.

Before we formulate the next axiom we introduce the definition: the vectors a,b,c,.... are called linearly dependent if there exist numbers $\alpha$, $\beta$, $\gamma$,.... not all zero, such that

8.1   $\alpha a + \beta b + \gamma c + ... = 0$;

they are called linearly independent when no such numbers exist.

Axiom IV. There exist four independent vectors, but every five vectors are dependent.

Axiom V. If a vector is not zero its square is positive.
To these axioms on vectors we have to add some statements concerning points if we want to have a geometry, and as such we may take:
Axiom VI. Every two points A,B have as their difference a vector, h; or in formulas B - A = h, B = A + h.
Axiom VII. (A - B) + (B - C) = A - C.
The body of propositions which may be deduced from these axioms we call four-dimensional geometry.
In order to prove that the whole of geometry can be deduced from the propositions I-VII we would have to actually deduce it. We shall not do it, of course, but we shall indicate how analytic geometry can be arrived at in the following discussion, which is meant to be entirely formal, i.e., during which it is not intended to invoke our intuition but only the properties stated in the axioms.
By length of a vector we mean the positive square root of its square; a vector is called a unit vector if its length is 1.

Lemma. There exist four mutually perpendicular unit vectors.

Proof. According to Axiom IV there exist four independent vectors a,b,c,d; i.e., such vectors that there are no four numbers $\alpha$, $\beta$, $\gamma$, $\delta$, not all zero, such that $\alpha a + \beta b + \gamma c + \delta d = 0$. It follows that $a \ne 0$, because if it were zero, the numbers $\alpha = 1$, $\beta = 0$, $\gamma = 0$, $\delta = 0$ would satisfy the above relation. Call a, multiplied by the reciprocal of its length, i, so that $i = a/|a|$. It is easy to see that i is a unit vector, and that i,b,c,d are independent. Consider the vector $b' = b - i(bi)$. It is easy to see that it is perpendicular to i; multiply $b'$ by the reciprocal of its length and call the result j; then, i and j are two perpendicular unit vectors and i,j,c,d are independent. Call $c' = c - i(ci) - j(cj)$; this vector is perpendicular to both i and j, and if we multiply it by the reciprocal of its length and call the result k, we have in i,j,k three perpendicular unit vectors; the fourth vector l may be obtained from d in an analogous way.

Lemma II. Given any vector x we have the identity

$x = i(ix) + j(jx) + k(kx) + l(lx)$.

Proof. Since every five vectors are dependent (Axiom IV), there exist five numbers $\alpha$, $\beta$, $\gamma$, $\delta$, $\epsilon$ not all zero such that
square root of the product of the vector by itself: |a| = /a 1 .
Two vectors are considered perpendicular, if their scalar product is zero. A

ex

+ PJ +

Axiom IV. There are four independent vectors,


but there are no five Independent vectors.

=0;

here e cannot be zero since otherwise i,J,k, t


would be dependent and we know that they are
not. Dividing by -e we have
=

(-f)i

(-f)J

multiplying by i, the three last terms on the


right vanish as the result of perpendicularity
of i to J, k and t and the first term becomes
whereas the left hand side is ix; in the
same way we prove that -f, -, -f have the values Jx, kx, ^x, respectively, and lemma II is
proved.
A set of four perpendicular unit vectors
we shall call a set of coordinate vectors. We
shall call the quantities Xi = ix, x, = Jx,
x 3 = kx, X4 = x the components of x with retogether with a
spect to i,J,k,. A point
a coordinate
call
we
vectors
set of coordinate
we can ascoordinate
a
Given
system
system.
sign to every point X four coordinates in the
by x
following way: denote the vector X
(according to Axiom VI); and call the components of the vector x the coordinates of the
1
point X. If now we choose another origin O
cothe
and the same set of coordinate vectors,
ordinates of the point X will be the components
of X - 0'; but since, according to Axiom VII,

-,

16

- (X - 0) + (0 - 0), the old coordiX nates will be equal to the new coordinates plus
the old coordinates of the new origin, and thus
a connection is established with ordinary
analytic geometry.
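The construction in the proof of the Lemma, and the identity of Lemma II, can be tried numerically. The following is only a sketch: it assumes that lists of four real numbers with the usual sum-of-products scalar product serve as a model of Axioms I-V, and the function names (`coordinate_vectors`, `reconstruct`) and the sample vectors are our own choices, not part of the notes.

```python
import math

def dot(a, b):
    """Scalar product of the model: sum of products of components."""
    return sum(p * q for p, q in zip(a, b))

def scale(s, a):
    return [s * p for p in a]

def sub(a, b):
    return [p - q for p, q in zip(a, b)]

def add(a, b):
    return [p + q for p, q in zip(a, b)]

def coordinate_vectors(vectors):
    """Follow the proof of the Lemma: remove from each vector its parts
    along the unit vectors already found, then divide by the length."""
    units = []
    for v in vectors:
        w = v
        for e in units:
            w = sub(w, scale(dot(v, e), e))  # b' = b - i(bi), c' = c - i(ci) - j(cj), ...
        units.append(scale(1.0 / math.sqrt(dot(w, w)), w))
    return units

def reconstruct(x, units):
    """Right hand side of 8.1: sum of e(ex) over the coordinate vectors e."""
    out = [0.0] * len(x)
    for e in units:
        out = add(out, scale(dot(e, x), e))
    return out

# Four independent vectors chosen for illustration only.
a, b, c, d = [1.0, 1.0, 0.0, 0.0], [1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 1.0, 1.0], [1.0, 2.0, 3.0, 5.0]
i, j, k, l = coordinate_vectors([a, b, c, d])

x = [2.0, -1.0, 0.5, 3.0]
components = [dot(e, x) for e in (i, j, k, l)]  # x_1 = ix, x_2 = jx, x_3 = kx, x_4 = lx
x_again = reconstruct(x, (i, j, k, l))
```

Here `components` holds the four components of `x` with respect to the new coordinate vectors, and `x_again` should agree with `x` up to rounding.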
Formula 8.1 may be compared with the formula 7.62. It may also be written as

8.2    $x = x_1 i + x_2 j + x_3 k + x_4 \ell$.

9. Tensor Analysis.


We want to substitute now for the preliminary definitions of tensor analysis that were
suggested by the formal developments in Chapter I, a definition that is more satisfactory. At
that time our point of view was simply that we shall consider symbols with two (or more)
indices as the tensor components in a way similar to that of using symbols with one index as
vector components. But in case of vectors (in ordinary space) we know what vectors are, and we
consider the components as a method of representing that known thing. In the case of tensors we
seem to have to take representation as the starting point of our study. The situation seems
complicated since the fact that there is not one but that there are many different
representations of the same vector, depending on the coordinate system we use, leads us to think
that the same general situation will obtain in the case of tensors, and the question arises
naturally: how shall we be able to find out, given two representations of a tensor, whether it
is the same tensor that is represented in the two cases; or, given a representation of a tensor
in one system of coordinates, how to find the representation of the same tensor in a given other
system. In order to be able to answer such questions intelligently we want to introduce the idea
of the tensor itself, to put it in the foreground and to consider the components as something
secondary. In the beginning we shall limit ourselves to the consideration of two dimensions.
We look then for some entity, of which the
components will be constituent parts. The first
thing that occurs to our minds in connection
with tensors of rank two is, of course, a determinant. It is a single number determined by its
elements, or components. However, it cannot be
used for our purposes because the determinant
does not, in turn, determine its components.
Another instance where two index symbols occur in mathematics is the case of quadratic
forms. The equation of a central conic may, for example, be written in the form
$ax^2 + 2bxy + cy^2 = 1$; or, introducing the notations $x_1$ for $x$, $x_2$ for $y$, $a_{11}$
for $a$, $a_{12}$ and $a_{21}$ for $b$, and $a_{22}$ for $c$, in the form

$a_{11}x_1x_1 + a_{12}x_1x_2 + a_{21}x_2x_1 + a_{22}x_2x_2 = 1$.

Let us consider the left hand side of this equation. Here the tensor components $a_{ij}$ are
combined (together with the variables $x_1$, $x_2$) into one expression; and they can, to a
certain extent, be gotten back from that expression. If we set $x_1 = 1$, $x_2 = 0$, for
instance, we get $a_{11}$ as the value of the expression; $a_{22}$ can be gotten in a similar
way, but it would be difficult to imagine how $a_{12}$ could be obtained; in fact, it is
impossible to get $a_{12}$ from this expression, because two expressions which differ in their
form but for which $a_{12} + a_{21}$ has the same value would give the same values for all
combinations of values of $x_1$, $x_2$. A slight generalization will, however, obviate this
difficulty. This generalization is suggested by the equation of the tangent to the above conic
and can be written as

9.1    $a_{11}x_1y_1 + a_{12}x_1y_2 + a_{21}x_2y_1 + a_{22}x_2y_2$;

here, for instance, setting $x_1 = 1$, $x_2 = 0$, $y_1 = 0$, $y_2 = 1$ we get $a_{12}$. We
shall therefore consider the bilinear form above as the tensor.
If we do that we may free ourselves of coordinates easily. The variables $x_1$, $x_2$ and
$y_1$, $y_2$ may be considered as the components of two vectors, and the above expression 9.1
furnishes us then a numerical value every time these two vectors are given; it may be considered
as defining a function $\varphi$; the arguments of that function are the two vectors and the
values are the numbers calculated by substituting the components of these vectors in the
expression 9.1. This functional dependence we may consider as the tensor, so that if we want to
use another coordinate system we shall have the same vectors given by different components
$x'_1$, $x'_2$ and $y'_1$, $y'_2$, and we expect to find another expression of the same type as
9.1 involving these new components, say

9.11    $\varphi = a'_{11}x'_1y'_1 + a'_{12}x'_1y'_2 + a'_{21}x'_2y'_1 + a'_{22}x'_2y'_2$,

which would assign the same values to the two vectors. The coefficients will, of course, be
different, and these new coefficients we shall consider as the new components of the same
tensor in the new coordinate system.
Let us perform the calculation. If we rotate our axes through an angle $\varphi$ the old
coordinates are expressed in the new coordinates by the formulas

9.2    $x_1 = x'_1 c - x'_2 s$,    $x_2 = x'_1 s + x'_2 c$,

where

$c = \cos\varphi$,    $s = \sin\varphi$;

the components of the other vector $y_1$, $y_2$ will be expressed by analogous formulas in
terms of the new components of the same vector. Substituting these expressions in the above
bilinear form (9.1) we get

9.3    $a_{11}(x'_1c - x'_2s)(y'_1c - y'_2s) + a_{12}(x'_1c - x'_2s)(y'_1s + y'_2c) + a_{21}(x'_1s + x'_2c)(y'_1c - y'_2s) + a_{22}(x'_1s + x'_2c)(y'_1s + y'_2c)$,

which may be written as 9.11 if we give to $a'_{ij}$ the values

9.21
$a'_{11} = a_{11}c^2 + a_{12}cs + a_{21}sc + a_{22}s^2$
$a'_{12} = -a_{11}cs + a_{12}c^2 - a_{21}s^2 + a_{22}sc$
$a'_{21} = -a_{11}sc - a_{12}s^2 + a_{21}c^2 + a_{22}cs$
$a'_{22} = a_{11}s^2 - a_{12}sc - a_{21}cs + a_{22}c^2$.

These are the new components of the tensor whose old components are the $a_{ij}$. The
equation of the conic section in the new coordinates has the same form as in the old system. We
may say that 9.11 expresses the same functional dependence on the two vectors using their new
components, as 9.1 does using their old components.
The components of a tensor change, in general, when we pass from one coordinate system to
another, but there are certain combinations which do not change; for instance, if we add
together the first and the last of the four above equalities we obtain, taking into account that

9.4    $c^2 + s^2 = 1$,

the relation

9.5    $a'_{11} + a'_{22} = a_{11} + a_{22}$;

also it is easy to prove that the expression

9.51    $\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}$

is not affected by the substitution of the primed components for the unprimed ones.
Expressions of this kind are called invariants.
We have thus achieved our purpose; although we use coordinates in the definition of a tensor
the result is independent of the particular coordinate system used. We can go a step farther,
however, and dispense with coordinates altogether in the definition of a tensor. Not every
functional dependence of a number $\varphi$ on two vectors we shall call a tensor; the
dependence on each vector must be linear (and homogeneous); by this we mean that the expression
involves only first powers of the components of each vector, and no products of components of
the same vector; our conception of linearity seems thus to involve components and we are not rid
of coordinate systems yet. We want therefore to define linearity independently of coordinate
representation and we shall see that the following definition, entirely independent of
coordinates, leads to the same results.
We say, in general, that $\varphi(x)$ depends on its argument $x$ linearly if

9.6    $\varphi(\lambda x + \mu y) = \lambda\varphi(x) + \mu\varphi(y)$.

It is easy to see that linearity defined in terms of coordinates as dependence involving
only first powers of the components satisfies this condition. We arrive thus at the following
definition of a tensor:
A tensor of rank r is a function which assigns to r vector arguments numerical values, the
dependence of the value on each argument being linear in the sense of 9.6.
We can prove that an expression of a tensor as a bilinear (or multilinear) form may be
gotten back from this general definition. In fact, if given, e.g., a tensor of rank two, i.e.,
with two vector arguments, $\varphi(x,y)$, we substitute for $x$ and $y$ their expressions in
terms of components and unit vectors (see 7.6)

$x = ix_1 + jx_2$,    $y = iy_1 + jy_2$;

we may write, using the above definition of a tensor and that of linearity (9.6):

9.7    $\varphi(x,y) = \varphi(i,i)x_1y_1 + \varphi(i,j)x_1y_2 + \varphi(j,i)x_2y_1 + \varphi(j,j)x_2y_2$.

We see that this expression differs from that given by 9.1 as a bilinear form only in
that $\varphi(i,i)$, $\varphi(i,j)$, $\varphi(j,i)$, $\varphi(j,j)$ appear instead of
$a_{11}$, $a_{12}$, $a_{21}$, and $a_{22}$. From this point of view the conception of a tensor
is entirely independent of a coordinate system and of components. We obtain tensor components
when we introduce a set of coordinate vectors; and transformation of coordinates corresponds
to replacing of one set of coordinate vectors by a new set.
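The calculation leading to 9.21 and to the invariants 9.5 and 9.51 may be checked numerically. The sketch below is ours and rests on assumptions: the components, the angle and the two vectors are arbitrary sample values, and the function names are hypothetical. It evaluates the bilinear form in the old and in the new components and compares the two invariants.

```python
import math

def bilinear(a, x, y):
    """Value of the form 9.1: sum of a[i][j] * x[i] * y[j]."""
    return sum(a[i][j] * x[i] * y[j] for i in range(2) for j in range(2))

def new_components(a, angle):
    """Formulas 9.21 for a rotation of the axes through `angle`."""
    c, s = math.cos(angle), math.sin(angle)
    (a11, a12), (a21, a22) = a
    return [
        [a11 * c * c + a12 * c * s + a21 * s * c + a22 * s * s,
         -a11 * c * s + a12 * c * c - a21 * s * s + a22 * s * c],
        [-a11 * s * c - a12 * s * s + a21 * c * c + a22 * c * s,
         a11 * s * s - a12 * s * c - a21 * c * s + a22 * c * c],
    ]

def new_vector_components(v, angle):
    """Formulas 9.2 solved for the new components of a vector."""
    c, s = math.cos(angle), math.sin(angle)
    return [v[0] * c + v[1] * s, -v[0] * s + v[1] * c]

def trace(a):
    return a[0][0] + a[1][1]               # expression 9.5

def determinant(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]   # expression 9.51

# Sample values, assumed for illustration only.
a = [[2.0, 1.0], [-3.0, 4.0]]
angle = 0.7
x, y = [1.0, 2.0], [-1.0, 0.5]

a_new = new_components(a, angle)
value_old = bilinear(a, x, y)
value_new = bilinear(a_new, new_vector_components(x, angle), new_vector_components(y, angle))
```

The two values `value_old` and `value_new` should agree up to rounding, which is the statement that 9.11 expresses the same functional dependence as 9.1.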
We pass now to the consideration of operations on tensors. We had three such operations:
multiplication, contraction and differentiation.
Multiplication is simple. If we have two tensors, say $f(x,y)$ and $g(z,u,v)$, we obtain a
tensor of rank five by multiplying these two together

$h(x,y,z,u,v) = f(x,y).g(z,u,v)$;

it is easy to see that the components of $h$ are obtained from the components of $f$ and $g$ in
the following fashion

$h_{ijklm} = f_{ij}\,g_{klm}$.


Next comes contraction. We have defined it in indices; i.e., given the components of a
tensor of rank two in a certain coordinate system we have a definite rule for obtaining a
scalar, viz., $a_{11} + a_{22}$; the question arises: will we obtain the same scalar if we use
another system of coordinates, in other words, is the definition independent of the system of
coordinates, is it invariant? Yes, this invariance has been proved above by formula 9.5. We have
now the right to use the definition of contraction in terms of components, knowing that it has
an intrinsic meaning, that the result is independent of the system of coordinates used.
We are in a position now to answer a question that must have arisen in the mind of the
reader. In the preceding chapter we agreed to consider a vector as a tensor of rank one. Here
with our new definition of tensor a vector and a tensor of rank one seem to be two entirely
different things; but we may consider together with every vector a tensor of rank one which
has the same components. To find a tensor $f(x)$ which has the same components as the vector
$v$ we have to make $f(i) = v_1$, $f(j) = v_2$, and we have

$f(x) = x_1 f(i) + x_2 f(j) = x_1v_1 + x_2v_2$,

so that the value of this tensor $f(x)$ is simply the scalar product of the vector to which it
corresponds by the vector argument.
Incidentally, this raises a question as to the nature of the scalar product; if we define
it as $x_\alpha v_\alpha$, is it invariant? It may be considered as resulting from two tensors
of rank one by first multiplying them and then contracting the resulting tensor of rank two.
We also might at this place say a few words about the symbols of Kronecker $\delta_{ij}$. We
may try to consider these symbols as the components of a tensor in some coordinate system. The
value of the tensor will then be given by $\delta_{\alpha\beta}x_\alpha y_\beta$, and this is
easily seen to be $x_\alpha y_\alpha$, the scalar product of the vectors $x$ and $y$, and thus
independent of the coordinate system. We may then speak of the tensor $\delta_{ij}$ without
mentioning the coordinate system because its components are the same in all coordinate systems.
The square of a vector may be defined as the scalar product of a vector with itself; it
is the sum of the squares of the components in any coordinate system.
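The two remarks above, that the scalar product arises by multiplication followed by contraction and that the Kronecker symbols have the same components in every coordinate system, can be illustrated in two dimensions. This is a sketch under our own assumptions: the vectors and the angle are sample values, and the helper names are hypothetical.

```python
import math

def rotation(angle):
    """Coefficients giving new components in terms of old ones (9.2 solved for x')."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, s], [-s, c]]

def transform_rank1(r, v):
    return [sum(r[i][k] * v[k] for k in range(2)) for i in range(2)]

def transform_rank2(r, a):
    # a'_ij = r_ik r_jm a_km, which reproduces formulas 9.21
    return [[sum(r[i][k] * r[j][m] * a[k][m] for k in range(2) for m in range(2))
             for j in range(2)] for i in range(2)]

def product(v, w):
    """Components of the product of two tensors of rank one."""
    return [[vi * wj for wj in w] for vi in v]

def contract(a):
    return a[0][0] + a[1][1]

r = rotation(1.1)
v, w = [3.0, -2.0], [0.5, 4.0]

scalar_old = contract(product(v, w))  # x_alpha v_alpha in the old system
scalar_new = contract(product(transform_rank1(r, v), transform_rank1(r, w)))

delta = [[1.0, 0.0], [0.0, 1.0]]
delta_new = transform_rank2(r, delta)  # should again be the Kronecker symbols
```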
We next take up the operation of differentiation. Let us begin with a scalar field $f$; $f$
is a function of the coordinates which we do not put in evidence. The coordinates of a point
$P$ are the components of the vector $OP$ which joins the origin to the point in question, and
they depend in the fashion discussed before on the choice of the coordinate vectors.
After choosing a definite coordinate system we may assign to $f$ in every point a vector by
agreeing that the components of this vector should be

9.71    $v_i = \partial f/\partial x_i$;

given another coordinate system, the relation between $x_i$ and $x'_i$ being given by formulas
9.2, we can form the derivatives

9.72    $\partial f/\partial x'_1$ and $\partial f/\partial x'_2$

and consider them as the components in the new coordinate system of a vector. The question
arises whether this will be the same vector as the one introduced above.
In order to settle this question let us see what the components of the vector whose
components in the old system were $v_i$ should be in the new coordinate system. According to
formulas 9.2 they are

$v'_1 = v_1c + v_2s = \partial f/\partial x_1 \cdot c + \partial f/\partial x_2 \cdot s$
$v'_2 = -v_1s + v_2c = -\partial f/\partial x_1 \cdot s + \partial f/\partial x_2 \cdot c$.

On the other hand

$\partial f/\partial x'_1 = \partial f/\partial x_1 \cdot \partial x_1/\partial x'_1 + \partial f/\partial x_2 \cdot \partial x_2/\partial x'_1$,

and since

$\partial x_1/\partial x'_1 = c$,    $\partial x_2/\partial x'_1 = s$,

we find that $v'_1 = \partial f/\partial x'_1$, and in the same way we find that
$v'_2 = \partial f/\partial x'_2$, which shows that the components 9.71 and 9.72 above are
the components in the two coordinate systems considered of one and the same vector. We
proved then that the operation of obtaining a vector by taking as its components the
derivatives of a scalar with respect to the coordinates is independent of the particular
system of coordinates used; that means, this operation is invariant.
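The invariance just proved can be tested on an example. The sketch below assumes a particular scalar field, a particular angle and a particular point, all chosen by us for illustration; the derivatives in the new coordinates are approximated by central differences, so the agreement is only up to a small numerical error.

```python
import math

ANGLE = 0.4
C, S = math.cos(ANGLE), math.sin(ANGLE)

def f(x1, x2):
    """A sample scalar field (an assumption made for this illustration)."""
    return x1 * x1 * x2 + 3.0 * x2

def grad_f(x1, x2):
    """Components 9.71 in the old system, computed by hand for the sample field."""
    return [2.0 * x1 * x2, x1 * x1 + 3.0]

def old_from_new(p1, p2):
    """Formulas 9.2: old coordinates in terms of new ones."""
    return p1 * C - p2 * S, p1 * S + p2 * C

def f_new(p1, p2):
    """The same field expressed through the new coordinates."""
    return f(*old_from_new(p1, p2))

def numeric_grad(g, p1, p2, h=1e-6):
    """Central-difference approximation to the derivatives 9.72."""
    return [(g(p1 + h, p2) - g(p1 - h, p2)) / (2 * h),
            (g(p1, p2 + h) - g(p1, p2 - h)) / (2 * h)]

p = (0.8, -1.3)                      # a point given by its new coordinates
v = grad_f(*old_from_new(*p))        # old components of the gradient vector
v_rotated = [v[0] * C + v[1] * S, -v[0] * S + v[1] * C]   # its new components
v_direct = numeric_grad(f_new, *p)   # derivatives taken in the new coordinates
```

The lists `v_rotated` and `v_direct` should agree to within the accuracy of the difference quotient.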
Before passing to differentiation of a tensor of rank higher than zero we may note
that the components $v_1$, $v_2$ we obtained may be considered as the components of a tensor of
rank one; denoting the components of the argument vector by $h_1$, $h_2$ we have as the values
of this tensor

$\partial f/\partial x_1 \cdot h_1 + \partial f/\partial x_2 \cdot h_2$;

this reminds us of a differential and suggests to write $dx_1$ for $h_1$ and $dx_2$ for $h_2$;
we have then the formula

$df = \partial f/\partial x_\alpha \cdot dx_\alpha$,

which leads to the interpretation of the differential as a tensor of rank one whose components
are the derivatives of the given function.

We next consider a tensor of rank one whose components in the old system are $f_i$, these
components being functions of coordinates. Differentiating with respect to $x_j$ we get

$\partial f_i/\partial x_j$;

can we consider these as the components of some tensor of rank two? In other words, if we
define a tensor by saying that in the old system it has these components, will its components in
the new system be obtained by differentiating with respect to the new coordinates the new
components $f'_i$ of the given tensor? A calculation analogous to the one preceding will
convince us that this is so.
We have thus introduced an operation which leads from a tensor of rank zero to one of rank
one, and from a tensor of rank one to one of rank two, and we could, continuing in the same
way, pass from any tensor to one of the next higher rank. In introducing this operation we
used components of tensors and coordinates of points, but we proved that the result is the
same no matter what particular coordinate system we might have used; the operation of
differentiation is thus independent of a coordinate system.
After this detailed treatment of tensors and operations on them in plane geometry it
will not be difficult to generalize to higher dimensions. We consider first three-dimensional
space - solid geometry - and begin with an equation of a central quadric surface. Using
notations similar to those introduced at the beginning of this section it may be written as

9.8    $a_{11}x_1x_1 + a_{12}x_1x_2 + a_{13}x_1x_3 + a_{21}x_2x_1 + a_{22}x_2x_2 + a_{23}x_2x_3 + a_{31}x_3x_1 + a_{32}x_3x_2 + a_{33}x_3x_3 = 1$,

or, using our notations for summation with Greek indices introduced in Section 3, as

$a_{\rho\sigma}x_\rho x_\sigma = 1$.

For the same reason as before (in the case of a conic), namely, because not all coefficients of
such an expression can be obtained as its values, we introduce a slightly more general
expression

9.81    $a_{\rho\sigma}x_\rho y_\sigma$

as the tensor; giving in it to the variables the values $x_i = \delta_{2i}$,
$y_i = \delta_{3i}$ (see definition of the symbol $\delta$ under 3.4) we obtain, for instance,
the coefficient $a_{23}$.
In coordinateless notation we were independent of the number of dimensions to begin
with, so that we may take over the definition of linearity 9.6 and the definition of tensor
following it word for word. In order to effectuate the transition from vector notation
to coordinate notation we write now, instead of $x = x_1i + x_2j$, $x = x_\rho i_\rho$, and
substituting this and an analogous expression for $y$ into $\varphi(x,y)$ we get, using 9.6,

$\varphi(x_\rho i_\rho, y_\sigma i_\sigma) = x_\rho y_\sigma \varphi(i_\rho, i_\sigma)$.

The notation

$\varphi(i_\rho, i_\sigma) = a_{\rho\sigma}$

brings us back to formula 9.81. There is no difficulty about tensors of higher ranks;
quantities with three indices give rise to trilinear forms, e.g.,

$a_{\rho\sigma\tau}x_\rho y_\sigma z_\tau$;

those with four indices to quadrilinear forms, etc.
The definitions of multiplication, contraction and differentiation hardly present any
difficulty, but we shall devote some time to the question of transformation of coordinates
for three and four dimensions. In solid analytic geometry the question is usually treated by
introducing formulas involving all coordinates at the same time, i.e., formulas permitting to
pass at once from one system of coordinates to any other with the same origin; these formulas
are quite complicated, they involve nine constants which are not independent, but are connected
by six relations, and the corresponding thing for four dimensions would be still more
unwieldy; we could handle it by introducing index notations, but we prefer another method. We
pass from one system of coordinates to another gradually, in steps, each step involving only
two of the coordinates - and one constant - the angle through which we rotate in the
corresponding plane. Three such steps are enough to pass from any system to any other in three
dimensions. For example, we may first perform a rotation in the xy-plane which brings the
x-axis into the new xy-plane; then a rotation in the so obtained yz-plane bringing the y-axis
into the new xy-plane, and finally we rotate the so obtained x and y axes until they coincide
with the new x and y axes.
The advantage of this point of view will be seen from the following proof of the invariance
of the operation of contraction in three dimensions. Given a tensor of rank two by its
components $a_{11}$, $a_{12}$, $a_{13}$, $a_{21}$, etc., the result of contraction, according to
our definition in Section 3, is

$a_{\rho\rho} = a_{11} + a_{22} + a_{33}$.

If we pass to another coordinate system the components will be changed into some components
$a'_{ij}$ and the result of contraction will be

$a'_{\rho\rho} = a'_{11} + a'_{22} + a'_{33}$;

in order to prove that contraction has an intrinsic meaning, independent of the system of
coordinates, we have to prove that the last two expressions are equal. If the transformation
involves only the $x_1$ and $x_2$ coordinates, but does not involve $x_3$, then $a'_{33}$,
which is the coefficient of $x'_3y'_3$, will be $a_{33}$, which is the coefficient of
$x_3y_3$, because $x'_3 = x_3$, $y'_3 = y_3$, and the other coordinates do not depend on $x'_3$
and $y'_3$; the coefficients $a'_{11}$, $a'_{22}$, on the other hand, will be transformed by
the same formulas (9.21) as in the two-dimensional case because $x_1$, $x_2$, $y_1$, $y_2$ are
transformed by the same formulas (9.2) as before. Therefore, formula 9.5 is applicable, and
this together with the fact that $a'_{33} = a_{33}$ establishes the invariance of contraction
under a transformation of coordinates involving $x_1$ and $x_2$ only. But the same reasoning
would apply to transformation involving $x_2$ and $x_3$ only, or $x_1$ and $x_3$ only, and
since we have proved that a general transformation of coordinates may be replaced by a
succession of transformations involving each only two coordinates, we have proved the
invariance of the operation of contraction of a tensor of rank two under a general coordinate
transformation.
Following the same principle we could prove
the invariance of contraction for tensors of any
rank and also, using the fact that the invariance of the operation of differentiation has
been proved for two dimensions, prove that it
has an intrinsic meaning in three dimensions.
We come now to four dimensions. Here it is
easy to prove that a general transformation can
be effectuated by a succession of six single rotations, i.e., rotations involving only two axes
each; in fact, a rotation in the xt-plane will
bring the x-axis into the new xyz-solid; a rotation in the yt-plane will bring the y-axis into the new xyz-solid, a rotation in the zt-plane
will bring there the z-axis; now the t-axis coincides with the new t-axis and the x,y,z axes
are all in the new xyz-solid and can be brought
into coincidence with the new x,y,z axes by
three more rotations as we saw before.
The reasoning indicated for the three-dimensional case will, therefore, prove the invariance of the fundamental operations of tensor
analysis also for four dimensions.
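The step-by-step argument can be imitated numerically in four dimensions. The sketch below does not carry out the decomposition of a given transformation into single rotations; it only applies six single rotations in the order described above (xt, yt, zt, then three in the xyz-solid), with angles and tensor components that are arbitrary sample values of ours, and records the result of contraction after each step.

```python
import math

def plane_rotation(n, p, q, angle):
    """Coefficients of a rotation involving only the axes p and q."""
    r = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    c, s = math.cos(angle), math.sin(angle)
    r[p][p], r[p][q], r[q][p], r[q][q] = c, s, -s, c
    return r

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def transform_rank2(r, a):
    """New components a'_ij = r_ik r_jm a_km of a tensor of rank two."""
    return matmul(matmul(r, a), transpose(r))

def trace(a):
    """Result of contraction, a_11 + a_22 + a_33 + a_44."""
    return sum(a[i][i] for i in range(len(a)))

# Axes are numbered x=0, y=1, z=2, t=3; the angles are assumed sample values.
steps = [(0, 3, 0.3), (1, 3, -0.7), (2, 3, 1.2), (0, 1, 0.5), (1, 2, -0.4), (0, 1, 0.9)]

a = [[1.0, 2.0, 0.0, -1.0],
     [3.0, -2.0, 5.0, 4.0],
     [0.0, 1.0, 7.0, 2.0],
     [6.0, -3.0, 1.0, -4.0]]

contractions = [trace(a)]
for p, q, angle in steps:
    a = transform_rank2(plane_rotation(4, p, q, angle), a)
    contractions.append(trace(a))
```

Every entry of `contractions` should equal the initial value up to rounding, in agreement with the reasoning of this section.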

10. Complications Resulting From Imaginary Coordinate.

The only new feature of the new geometry considered in the preceding sections was this,
that we have four coordinate axes instead of three; but we still have another departure from
ordinary geometry (due in the final count to the "minus sign" in the Maxwell equations), viz.,
the fact that the fourth coordinate is imaginary. The introduction of imaginaries helped us
to obtain a symmetrical form of Maxwell's equations, and seems to be beneficial from this
point of view. The formal part of the theory runs now smoothly; but there is a disadvantage
in this smoothness: it conceals very important peculiarities, and the present section will be
devoted to the consideration of some of these peculiarities.
The discussion may be conveniently attached to the consideration of the expression (comp.
7.41)

$(x_1 - x'_1)^2 + (x_2 - x'_2)^2 + (x_3 - x'_3)^2 + (x_4 - x'_4)^2$

which defines the square of the distance between the two points whose coordinates appear
in it; or the square of the vector joining these two points.
Since in the above formula the quantities $x_4$ and $x'_4$ and, therefore, their difference
are imaginary, the fourth square is negative, and the expression may, according to the relative
magnitudes of the terms, be positive, negative, or zero. There are thus three types of relative
positions of two points, or directions, or of vectors; there are vectors of positive square,
those of negative square, and those of zero square. Our geometry is thus more complicated
than what we would expect it to be if it would differ from the ordinary geometry in the number
of dimensions only. This complication, or this richness of our geometry, far from being
an undesirable feature, is, as we shall see, an advantage, because it corresponds to certain
features of the outside world, e.g., the existence of both matter and light, which are going
to be identified with two kinds of vectors. At present we only mention that the momentum
vector of a material particle, or the vector of components $u, v, w, i$, are vectors of
negative square; in fact, the square of the latter is $u^2 + v^2 + w^2 - 1$; the first three
terms representing the square of the velocity of the particle which, according to the third
remark preceding formula 4.3, is very small compared with unity, the expression is negative.
In some cases it may be desirable to sacrifice the formal advantages accruing from the
use of the imaginary coordinate in order to put in evidence the peculiarities we are
discussing; it is permissible to go back then to the old notations $x, y, z, t$, but it becomes
necessary to modify the formulas accordingly. (We shall see later how it is possible to use
index notations and still avoid imaginaries.) If we denote two points by $x, y, z, it$ and
$x', y', z', it'$, instead of by $x_i$ and $x'_i$, the formula for the square of the distance
will be

10.1    $(x - x')^2 + (y - y')^2 + (z - z')^2 - (t - t')^2$,

and the formula for the scalar product (see 7.4) of two vectors given by $a, b, c, id$ and
$a', b', c', id'$,

10.2    $aa' + bb' + cc' - dd'$.

We may say that the minus sign appearing in these formulas is the same as the one appearing
in the second set of Maxwell's equations (4.21), because it may be traced back to them.
We shall occasionally refer to the quantities $x_i$ and $u_i$ which carry indices and involve
the square root of -1 as mathematical coordinates and components, and to the quantities
$x, y, z, t$ and $a, b, c, d$ as physical coordinates and components. Four-dimensional space of
the character we are studying now, i.e., either characterized by three real and one imaginary
coordinates, or with four real coordinates but with scalar multiplication with a minus sign
given by formula 10.2, is often called four-dimensional space-time because of the
interpretation of the quantities $x, y, z$ and $t$ in ordinary physics.
Without going into detail we may mention a few consequences of the "minus sign". According
to our definition, two vectors are considered perpendicular if their scalar product vanishes.
But now a scalar product of two equal vectors may vanish, as happens, for instance, in the
case of two vectors with physical components 0,0,1,1. We must say then, that such a vector is
perpendicular to itself.
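The three types of vectors are easily told apart with formula 10.2. In the sketch below the vectors are given by their physical components $a, b, c, d$; the sample vectors and the function names are our own assumptions.

```python
def scalar_product(p, q):
    """Formula 10.2 for vectors given by physical components (a, b, c, d)."""
    return p[0] * q[0] + p[1] * q[1] + p[2] * q[2] - p[3] * q[3]

def kind(p):
    """Type of a vector according to the sign of its square."""
    square = scalar_product(p, p)
    if square > 0:
        return "positive"
    if square < 0:
        return "negative"
    return "zero"

self_perpendicular = [0.0, 0.0, 1.0, 1.0]   # the example mentioned in the text
slow_particle = [0.1, 0.2, 0.1, 1.0]        # components u, v, w, 1 with small velocity
space_direction = [1.0, 0.0, 0.0, 0.0]

kinds = [kind(v) for v in (self_perpendicular, slow_particle, space_direction)]
```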
We also may mention that corresponding to
the existence of three types of directions, those
which correspond to vectors of positive square,
negative square and zero square, there are three
types of orientations or planes. An orientation
may best be characterized by the number of zero
directions it contains and it is easy to prove
that there are orientations containing two, one
or no zero directions.
As the result of existence of vectors of
negative square our proof that the cosine of the
angle between two vectors as defined by formula
7.5 does not exceed unity in absolute value is
not applicable and we would have to consider
imaginary angles or else consider the cosine as
a hyperbolic cosine, but we shall not go into
this question.
The only peculiarity due to the "minus
sign" other than the existence of zero square
vectors that we shall have to use in what follows is connected with transformation of coordinates.
Formally our transformation formulas remain the same; we may write, for instance,

$x'_3 = x_3\cos\varphi - x_4\sin\varphi$
$x'_4 = x_3\sin\varphi + x_4\cos\varphi$;

but here $x_3$ is real and $x_4$ is imaginary and, of course, we expect the new coordinates to
be of the same character, so that $x'_3$ will be real and $x'_4$ imaginary. Giving to $x_3$ and
$x_4$ the values 1 and 0, respectively, we see that $\cos\varphi$ must be real, and
$\sin\varphi$ imaginary. We shall, however, prefer not to use imaginary trigonometric
functions; in order to avoid them we introduce a new notation as follows:

10.3    $\cos\varphi = \sigma$,    $\sin\varphi = i\tau$,

where $\sigma$ and $\tau$ are real quantities. The identity
$\cos^2\varphi + \sin^2\varphi = 1$ gives for $\sigma$ and $\tau$ the relation

10.4    $\sigma^2 - \tau^2 = 1$.

If we prefer to use one number, rather than two numbers connected by a relation, in
describing different transformations of coordinates in the $x_3x_4$ plane we may again resort
to trigonometry and interpret $\sigma$ and $\tau$ as the secant and tangent of a real angle
$\psi$:

10.5    $\sigma = \sec\psi$,    $\tau = \tan\psi$.

It must be noted, however, that $\psi$, so to say, has no geometrical significance.
If in the above formulas of transformation we substitute for $x_3$ and $x_4$ their
expressions in terms of $z$ and $t$ (4.7) and for $\cos\varphi$ and $\sin\varphi$ their
expressions (10.3) in terms of $\sigma$ and $\tau$ we obtain the following formulas of
transformation for the physical coordinates:

10.6    $z' = z\sigma + t\tau$,    $t' = z\tau + t\sigma$.

These formulas are called the Lorentz transformation formulas and their physical
interpretation will be discussed in the next chapter.
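Formulas 10.4 to 10.6 can be checked on numbers. In the sketch below the angle and the coordinates are sample values assumed by us, and the symbol `psi` stands for the real angle of 10.5; the quantity compared before and after the transformation is $z^2 - t^2$, the part of 10.1 that involves these two coordinates.

```python
import math

def lorentz(z, t, psi):
    """Formulas 10.6 with sigma and tau taken from 10.5."""
    sigma, tau = 1.0 / math.cos(psi), math.tan(psi)
    return z * sigma + t * tau, z * tau + t * sigma

psi = 0.6
sigma, tau = 1.0 / math.cos(psi), math.tan(psi)
relation = sigma ** 2 - tau ** 2          # left hand side of 10.4

z, t = 2.0, 5.0
z_new, t_new = lorentz(z, t, psi)
square_old = z * z - t * t
square_new = z_new * z_new - t_new * t_new
```

The value of `relation` should be 1 and the two squares should agree, both up to rounding.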

Concluding this section we may mention how


our axioms of Section 8 must be modified in order to produce a geometry with the desired peculiarities.
It is clear that our axiom V, according to
which a non-zero vector has positive square has
The proper modification is the
to be modified.
following:

Axiom

There are orientations containing


.
two zero directions, but there are no orientations containing more than two zero directions.
If we replace axiom V by this axiom and
keep the remaining axioms as they were stated
in Section 8 we obtain a geometry of the kind
desired. In order to show this, we first show
the existence of four mutually perpendicular
vectors three of which have squares equal to 1,
and the fourth one equal to -1. We begin by
picking out a plane with two zero square vectors a and b; we assert that a 1 * i(a + b) and
b = J(a - b) are two perpendicular vectors
with squares of opposite sign; In fact a'b =
8 a
and
b ) and this is zero because a* =
= 0; then, a = a
b 1 , squaring this and
8
keeping in mind that a'b' = 0, we have a =
8
=
* b';
it follows that
a'
since a
are of opposite
the squares of a
and b'

(a

but here x 3 is real and x 4 Is imaginary and, of


course, we expect the new coordinates to be of
the same character so that x& will be real and
x^ imaginary. Giving to ,x 3 and x 4 the values

l-c

H
1
sign. Dividing each of the vectors a and b' by
root
of
the
absolute
value
of Its
the square
square we obtain two mutually perpendicular vectors whose squares are +1 and -1, which we may
It is easy now to
call k and / respectively.
pick out two more vectors 1 and J which together with k and Jf constitute a set of four mutually perpendicular vectors; none of them can have
should be
a zero square, because if, say, i a
zero all the vectors of the plane determined by
a and i would have zero squares, contrary to

axiom V.
Using these four vectors we can, as in the other case, express every vector in the form (comp. 8.2)

10.7    $x = \alpha i + \beta j + \gamma k + \delta l.$

The numbers $\alpha, \beta, \gamma, \delta$ will be considered as the components of this vector. In order to show that we have what we wanted we shall express the square of x in terms of its components. Squaring 10.7 and taking into account that $l^2 = -1$ we obtain

$x^2 = \alpha^2 + \beta^2 + \gamma^2 - \delta^2.$

This shows that $\alpha, \beta, \gamma, \delta$ are what we call physical components of the vector; the mathematical components are obtained by setting

$x_1 = \alpha, \quad x_2 = \beta, \quad x_3 = \gamma, \quad x_4 = i\delta,$

and we see that we get the kind of geometry we expect to use in physics.
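The passage from physical to mathematical components can be illustrated with complex arithmetic. This is a small sketch, assuming the physical components are real numbers and the fourth mathematical component is the imaginary unit times the fourth physical one; the function names are hypothetical.

```python
def square_physical(alpha, beta, gamma, delta):
    # Square of the vector written in physical components.
    return alpha ** 2 + beta ** 2 + gamma ** 2 - delta ** 2

def mathematical_components(alpha, beta, gamma, delta):
    # x1, x2, x3 real; x4 = i * delta is imaginary.
    return (complex(alpha), complex(beta), complex(gamma), 1j * delta)

def square_mathematical(x):
    # Plain sum of squares, as in ordinary four-dimensional geometry.
    return sum(xi * xi for xi in x)

x = mathematical_components(1.0, 2.0, 3.0, 4.0)
print(square_mathematical(x), square_physical(1.0, 2.0, 3.0, 4.0))
```

The plain sum of squares of the mathematical components reproduces the expression with the minus sign, which is the point of introducing the imaginary fourth coordinate.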

11. Are the Equations of Physics Invariant?

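We return now to physics. In Chapter I we arrived at certain equations that we consider as fundamental; namely, the equation of continuity (3.5)

11.1    $\partial(\rho u_\alpha)/\partial x_\alpha = 0,$

two sets of Maxwell's equations (4.61 and 4.62)

11.2    $\partial F_{ij}/\partial x_k + \partial F_{jk}/\partial x_i + \partial F_{ki}/\partial x_j = 0,$

11.3    $\partial F_{i\alpha}/\partial x_\alpha = \varepsilon u_i,$

and the equations of motion (6.3)

11.4    $\partial T_{i\alpha}/\partial x_\alpha = 0 \qquad (i = 1, 2, 3)$

with (comp. 6.2, 5.1, 2.7)

11.5    $T_{ij} = E_{ij} \ldots, \qquad E_{ij} = F_{i\rho}F_{\rho j} \ldots$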

The fact that the indices run here from 1 to 4 (except in 11.4) suggested four-dimensional geometry, which we have introduced in Sections 7 and 8; the fact that $x_4$ in the above equations is imaginary (comp. 4.7) suggested the peculiarities discussed in Section 10. Now that we have followed these suggestions and built a mathematical theory we have to see to what results the application of our new theory leads. In addition to following the suggestions we have introduced into the theory a feature that was not directly called for by physics: we made our theory independent of coordinates. In order to bring out the importance of
this fact let us consider for a moment the case
of two dimensions and compare plane geometry
with a two-way diagram. Both in plane analytic
geometry and in a diagram we use coordinate axes, but in geometry the axes play an auxiliary
role, we find it convenient to express by referring to axes properties of configurations which
exist and can be treated independently of the
axes; the same properties can be expressed using any system of coordinate axes. The situation is different in the case when we use a
plane as a means of representing a functional
dependence between two quantities of different kind, when we have a diagram. We may use, for instance, the two axes to plot temperature and pressure, or the height of an individual and the number of individuals of that height. In the majority of such cases the axes play an essential part in the discussion; if we delete the axes the diagram loses its meaning, the question of rotation of axes does not arise.
Returning to physics we have to ask ourselves what we actually need for it, a diagram
or a geometry; in other words, are the coordinate axes essential or can they be changed at
will, or again, do the equations of physics express properties independent of the coordinate
axes; are they invariant, or not.
In order to answer this question let us
first consider the formal structure of the equations 11.1 to 11.5. The fundamental dependent
variables are here a scalar $\rho$, a vector $u_i$ and an anti-symmetric tensor $F_{ij}$.
The left hand side of equation 11.1 may be described as a result of multiplying the scalar $\rho$ by the vector $u_i$, then differentiating the resulting vector $\rho u_i$, then contracting the tensor so obtained; since the operations of multiplication, differentiation and contraction have been shown to be invariant, the scalar $\partial(\rho u_\alpha)/\partial x_\alpha$ is independent of the system of coordinates used, and if it is zero in one system of coordinates it is zero in all systems of coordinates. The continuity equation expresses, therefore, a fact independent of the system of coordinates employed.
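The invariance of a contracted derivative can be checked numerically in the simplest case of a rotation of two axes. The sketch below is only an illustration: the vector field is an arbitrary smooth stand-in for the product of density and velocity, the derivatives are approximated by central differences, and all names and sample values are assumptions made here.

```python
import math

def field(x, y):
    # Arbitrary smooth vector field (illustrative stand-in for rho * u).
    return (x * x * y, math.sin(x) + y ** 3)

def rotate(phi, v):
    # Components of v referred to axes rotated through the angle phi.
    c, s = math.cos(phi), math.sin(phi)
    return (c * v[0] + s * v[1], -s * v[0] + c * v[1])

def field_rotated(phi, xp, yp):
    # Same field, components and arguments both referred to the new axes.
    x, y = rotate(-phi, (xp, yp))    # new coordinates back to old ones
    return rotate(phi, field(x, y))  # old components to new components

def divergence(f, x, y, h=1e-5):
    # Central-difference approximation of the contracted derivative.
    dfx = (f(x + h, y)[0] - f(x - h, y)[0]) / (2 * h)
    dfy = (f(x, y + h)[1] - f(x, y - h)[1]) / (2 * h)
    return dfx + dfy

phi = 0.7
x, y = 0.3, -1.2
xp, yp = rotate(phi, (x, y))
div_old = divergence(field, x, y)
div_new = divergence(lambda p, q: field_rotated(phi, p, q), xp, yp)
print(div_old, div_new)
```

The two numbers agree to within the accuracy of the difference quotients, although the individual components of the field are different in the two systems.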
An analogous reasoning applied to 11.3 would show the invariant character of that system. The question of invariance of the first


system of Maxwell's equations requires a special discussion; it can best be treated by introducing a new anti-symmetric tensor $D_{ij}$ connected with $F_{ij}$ by the following relations:

11.6    $F_{12} = D_{34}, \quad F_{23} = D_{14}, \quad F_{31} = D_{24},$
        $F_{14} = D_{23}, \quad F_{24} = D_{31}, \quad F_{34} = D_{12}.$

Before we show how this is going to help us in connection with our equations we want to prove that these relations are independent of the coordinate system; i.e., that if 11.6 hold, relations of the same form, namely

11.6'   $F'_{12} = D'_{34},$ etc.,

will hold in any other coordinate system. Again, since a general transformation of coordinates can be achieved in steps it will be enough to test an $x_1 x_2$ rotation only. As a result of such a rotation $F_{12}$ becomes (comp. 9.21)

$F'_{12} = -F_{11}sc + F_{12}c^2 - F_{21}s^2 + F_{22}sc;$

using the fact that $F$ is anti-symmetric (4.4) we find

11.7    $F'_{12} = F_{12}$

and since obviously $D'_{34} = D_{34}$ because the $x_3$, $x_4$ axes are not affected we see that the first of the relations 11.6' follows from 11.6. In order to find $F'_{23}$ we have, according to the general rule following 9.31, to substitute $\delta_{2i}$ and $\delta_{3i}$ for $x'_i$ and $y'_i$ respectively in $F'_{\rho\sigma}x'_\rho y'_\sigma = F_{\rho\sigma}x_\rho y_\sigma$. As the corresponding values of $x_i$ and $y_i$ we find with the aid of 9.2, considering that $x'_3 = x_3$, $x'_4 = x_4$,

$x_1 = -s, \quad x_2 = c, \quad x_3 = 0, \quad x_4 = 0,$
$y_1 = 0, \quad y_2 = 0, \quad y_3 = 1, \quad y_4 = 0;$

so that

11.71   $F'_{23} = -sF_{13} + cF_{23};$

in a similar way we obtain

11.72   $D'_{14} = D_{14}c + D_{24}s;$

taking again into account the anti-symmetric property of $F$ we come to the conclusion that the second relation of 11.6' is a consequence of 11.6, and since the same reasoning applies to the remaining relations we conclude that the relations 11.6 are independent of the coordinate system; it is easy to see that they assign to every tensor $F_{ij}$ a tensor $D_{ij}$ (the tensor $D_{ij}$, or, rather, $\sqrt{-1}\,D_{ij}$ is often referred to as the dual of $F_{ij}$). Now if we express, using 11.6, the $F_{ij}$ in 11.2 in terms of the $D_{ij}$ that set becomes

11.2'   $\partial D_{i\alpha}/\partial x_\alpha = 0$

and its invariance follows from general considerations as in the case of 11.3.
Formula 11.5 contains only multiplications, contractions and additions, so that there is no doubt concerning its invariance, but the situation changes when we come to the set 11.4. The vector $\partial T_{i\alpha}/\partial x_\alpha$ has been obtained by invariant operations but 11.4 states that only three of its components are zero, a statement which obviously depends on the choice of coordinate axes and is not invariant.
We have now two courses open before us: one is that of resignation, we can say: we see that physics is not like geometry in this respect, that we can only use four-dimensional notations, a four-dimensional diagram but not four-dimensional geometry; the other course is that of adventure, we may try to play the game of geometry; let us pretend that we can apply the formulas of transformation of coordinates in this case; we know that there will be a difference between the theory we obtain and the physics which we undertook to translate into our language; but it may be that the difference will amount numerically to very little. Consider the fourth component of the vector $\partial T_{i\alpha}/\partial x_\alpha$; we found (comp. remark following 6.3) that one of the terms of this expression, $\partial M_{4\alpha}/\partial x_\alpha$, vanishes, and the other, $\partial E_{4\alpha}/\partial x_\alpha$, gives Xu+Yv+Zw, where u,v,w are the components of velocity, but in order to present the Maxwell equations in a simple form we had to choose our units in such a way that the velocity of light is unity; ordinary velocities are of the order of magnitude of one ten-millionth of the velocity of light, so that we see that by setting the fourth component of $\partial T_{i\alpha}/\partial x_\alpha$ equal to zero we would commit an error that is numerically very small. This encourages us to go on with our adventure and try to force the geometrical character on physics. In order to do that let us go beyond the formal structure of our formulas and recall what the meaning of our fundamental quantities was. The components of the vector $u_i$ were given (see 1.2, 2.4, 4.9) as

11.8    $u_1 = dx/dt, \quad u_2 = dy/dt, \quad u_3 = dz/dt, \quad u_4 = i.$

But this identification is obviously not independent of the coordinate system, it gives preference to the fourth coordinate. We may think that this is the source of our difficulty, and that this difficulty may be overcome if we find an invariant identification to take the place of 11.8. The next section will prepare the way for this.
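The claim made earlier in this section, that the relations connecting an anti-symmetric tensor F with its companion tensor D keep their form under a rotation of the first two axes, lends itself to a numerical check. The sketch below assumes the pairing F12 = D34, F23 = D14, F31 = D24, F14 = D23, F24 = D31, F34 = D12, uses real sample numbers for simplicity (the algebra does not depend on which components are imaginary), and transforms second-rank tensors by the usual rule with one rotation factor for each index. All function names and sample values are hypothetical.

```python
import math

def rotation_12(phi):
    # Rotation in the x1 x2 plane; the x3 and x4 axes are not affected.
    c, s = math.cos(phi), math.sin(phi)
    return [[c, s, 0, 0], [-s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def transform(A, T):
    # T'_ij = A_ip A_jq T_pq for a tensor of rank two.
    n = len(A)
    return [[sum(A[i][p] * A[j][q] * T[p][q] for p in range(n) for q in range(n))
             for j in range(n)] for i in range(n)]

def antisymmetric(t12, t13, t14, t23, t24, t34):
    # Build an anti-symmetric 4x4 array from its six independent components.
    T = [[0.0] * 4 for _ in range(4)]
    entries = {(0, 1): t12, (0, 2): t13, (0, 3): t14,
               (1, 2): t23, (1, 3): t24, (2, 3): t34}
    for (i, j), value in entries.items():
        T[i][j], T[j][i] = value, -value
    return T

def dual(F):
    # D12 = F34, D13 = -F24, D14 = F23, D23 = F14, D24 = F31, D34 = F12.
    return antisymmetric(F[2][3], -F[1][3], F[1][2], F[0][3], F[2][0], F[0][1])

F = antisymmetric(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
A = rotation_12(0.4)
F_new = transform(A, F)
D_new = transform(A, dual(F))
D_from_F_new = dual(F_new)
max_gap = max(abs(D_from_F_new[i][j] - D_new[i][j])
              for i in range(4) for j in range(4))
print(max_gap)
```

Forming the companion tensor first and then rotating gives the same array as rotating first and then forming the companion tensor, which is what independence of the coordinate system means here.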

12. Curves in the New Geometry.

The root of the difficulty is that our description of motion was not invariant; motion was described by giving the dependence of the coordinates x,y,z on time, that means by giving three of our coordinates $x_1, x_2, x_3$ as functions of the fourth $x_4$ which thus is given preference. The situation is analogous to that in plane analytic geometry where we give y as a function of x, or that in solid analytic geometry when we give y and z as functions of x; in both cases we represent curves; from our four-dimensional point of view we should then consider motion of a particle as a curve in four dimensions (using the word curve in a general sense so that straight line is a special case).
What we want then is a representation of
curves in four dimensions which would not give
preference to the fourth coordinate. We begin
by considering representations of curves in two
and three dimensions which give no preference to
one coordinate.
In the plane a line may be represented by

$x = ap + b, \quad y = cp + d,$

a circle by

$x = r\cos p, \quad y = r\sin p;$

in space a line by

$x = ap + b, \quad y = cp + d, \quad z = ep + f,$

a helix by

$x = r\cos p, \quad y = r\sin p, \quad z = kp,$ etc.

In all these cases to every value of the "parameter" p corresponds a point of the curve; in general, if we set

$x = f(p), \quad y = g(p), \quad z = h(p)$

we have what we call a parametric representation of a curve (comp. parametric form of equation of straight line in 7.1). In the same way we may represent a curve in four dimensions, which we take to mean motion of a particle, by giving $x_i$ as functions of a parameter p.
The defect of this method is that it contains a certain arbitrariness; we may substitute for p another parameter q by making p an arbitrary increasing function of q. We want now to standardize our parametric representation. The usual way is to choose the arc length along the curve as the parameter. Without going into detail we shall state that arc length between points corresponding to values $p_1$ and $p_2$ of the parameter is given by

12.1    $s = \int_{p_1}^{p_2} \sqrt{(dx/dp)^2 + (dy/dp)^2 + (dz/dp)^2}\; dp;$

in the special case when s is used as parameter p we differentiate both sides and obtain

12.2    $(dx/dp)^2 + (dy/dp)^2 + (dz/dp)^2 = 1.$

We may consider in general dx/dp, dy/dp, dz/dp as the components of a vector tangent to the curve; the change of parameter would multiply these derivatives by the same number, i.e., substitute another tangent vector for that one; the quantity $(dx/dp)^2 + (dy/dp)^2 + (dz/dp)^2$ gives the square of the length of the tangent vector; the above equality 12.2 expresses then the fact that if we use arc length as the parameter the length of the tangent vector whose components are the derivatives of the coordinates with respect to the parameter is unity.
We come thus to the idea of a unit tangent vector; it characterizes in every point the direction of the curve; its components are the direction cosines of the tangent.
We may try to go through an analogous process in the case of curves in four-dimensional space, which as we saw may be taken to represent motions; if we succeed, the vector at which we arrive will suggest itself as a natural thing to identify with the vector of components $u_i$ which appears in our formulas. Starting with any parametric representation $x_i = x_i(p)$, where p may be for example t, we try to change our parameter by introducing a new variable q and making p a function of q, choosing this function in such a way that

$(dx_1/dq)^2 + (dx_2/dq)^2 + (dx_3/dq)^2 + (dx_4/dq)^2 = 1;$

but $dx_i/dq = (dx_i/dp)\cdot(dp/dq)$; so that the function p(q) must be such that

$dp/dq = 1/\sqrt{(dx_1/dp)^2 + (dx_2/dp)^2 + (dx_3/dp)^2 + (dx_4/dp)^2}.$

Is this possible? In the case when the original p is t the expression under the radical sign will be $u^2 + v^2 + w^2 - 1$, and for motions whose velocities are smaller than the velocity of light (Section 4) this is negative, so that we would get an imaginary value for dp/dq. In order to avoid this unpleasantness we decide to standardize our parameter by requiring $(dx_\alpha/dq)\cdot(dx_\alpha/dq)$ to be -1 instead of 1; in this case we find for dq/dt the expression $\sqrt{1 - (u^2 + v^2 + w^2)}$ and we may write

12.3    $dx_1/dq = u/\sqrt{1-\beta^2}, \quad dx_2/dq = v/\sqrt{1-\beta^2}, \quad dx_3/dq = w/\sqrt{1-\beta^2}, \quad dx_4/dq = i/\sqrt{1-\beta^2},$

where $\beta$ stands for $\sqrt{u^2 + v^2 + w^2}$, i.e., for what we call speed (the length of the velocity vector). The quantities just written out we want to identify with the components of the vector $u_i$ which appears in our formulas. Since in ordinary cases $\beta$ is very small, the radical $\sqrt{1 - \beta^2}$ is very near to unity and our new identification differs from our old identification (11.8) numerically very little. On the other hand the new values for $u_i$ are according to their derivation the components of a vector, so that if we adopt this identification and also agree to set the fourth component of the divergence of the tensor $T_{ij}$ equal to zero we obtain an invariant theory whose statements differ only very slightly from those accepted in classical physics. It remains to be seen whether there are cases in which the discrepancy is large enough to be tested by experiment.
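The two properties claimed for the new identification, that the sum of the squares of the four components is -1 and that for ordinary speeds the components hardly differ from the old ones, can be illustrated numerically. The sketch assumes units in which the velocity of light is 1 and takes the components to be u, v, w and the imaginary unit, each divided by the square root of one minus the square of the speed; the function names and sample velocities are choices made for this example.

```python
import math

def four_velocity(u, v, w):
    # Mathematical components; the fourth one is imaginary.
    beta2 = u * u + v * v + w * w
    if beta2 >= 1.0:
        raise ValueError("speed must be below the velocity of light")
    g = 1.0 / math.sqrt(1.0 - beta2)
    return (complex(u * g), complex(v * g), complex(w * g), 1j * g)

def square(x):
    # Sum of the squares of the (complex) components.
    return sum(xi * xi for xi in x)

fast = four_velocity(0.3, 0.4, 0.0)   # half the velocity of light
slow = four_velocity(1e-7, 0.0, 0.0)  # an "ordinary" velocity

print(square(fast))           # close to -1
print(abs(slow[3] - 1j))      # departure from the old value i
```

For the slow motion the fourth component differs from the old value by a quantity of the order of the square of the speed, which is far below anything measurable in ordinary mechanics.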


Chapter III.
SPECIAL RELATIVITY

Guided by the point of view that the formulas of physics ought to be interpreted in four-dimensional geometry we were led to the interpretation of the motion of a particle as a curve in space-time. Following the analogy with a curve in ordinary space where arc length, s, is often used as a parameter, we have introduced a standard parameter, which we may also call arc length and denote by s, for curves in space-time. The derivatives $dx_i/ds$ of the coordinates of a point on the curve with respect to s may be considered as the components of a vector tangent to the curve (the square of this vector is -1 in every point - we shall refer to such a vector also as a unit vector). We have then at every point $x_1, x_2, x_3, x_4$ of such a curve a unit vector $dx_i/ds$, and we have agreed to identify this vector with the vector $u_i$ which appears in our fundamental laws of physics (11.1 to 11.5) so that

$u_i = dx_i/ds.$

In this chapter we want to consider some consequences of this identification.

13. Equations of Motion.

The one thing that was not satisfactory about the formulas of physics was the fact that according to 11.4 only three components of the vector $\partial T_{i\alpha}/\partial x_\alpha$ are equal to zero. In this section we shall see how this defect is corrected by the adoption of the new identification. But before we do that we have to study some immediate consequences of this identification.
Before, a motion of a particle was given by giving the position of the particle in different moments of time, i.e., the coordinates x,y,z as functions of t. Given these functions we can calculate for every moment the velocity vector of the particle - a vector of components

$u = dx/dt, \quad v = dy/dt, \quad w = dz/dt.$

Now, the same motion is described by giving $x_1, x_2, x_3, x_4$ as functions of s; and we have a vector of components $u_i = dx_i/ds$ which, due to the special choice of the parameter, satisfies the equation

13.1    $u_1^2 + u_2^2 + u_3^2 + u_4^2 = -1.$

Of course, we have merely two representations of the same thing. Given the $x_i(s)$ we can express s as a function of t from

$x_4(s) = it$

and substituting the expression of s so found into $x_1(s), x_2(s), x_3(s)$ we will have x,y,z as functions of t. Or given x,y,z as functions of t we can arrive at the representation $x_i(s)$ as indicated in Section 12.
Also the space-time vector $u_i$ and the space vector u,v,w describe the same thing. The formulas 12.3 show how to find the components $u_i$ in terms of the velocity vector u,v,w, and it is easy to find the u,v,w in terms of components $u_i$. We simply have

$u = dx/dt = (dx/ds)/(dt/ds) = i \cdot u_1/u_4$

and in a similar fashion

13.2    $v = i \cdot u_2/u_4, \quad w = i \cdot u_3/u_4.$

We see thus that the vector $u_i$ determines the velocity of motion, and we agree to call it the four-dimensional velocity vector. On the other hand, being a unit vector this vector characterizes the direction in four dimensions of the curve representing motion; its components $u_i$ may be considered as the direction cosines of the tangent (compare Section 7, between formulas 7.41 and 7.5).
But a velocity vector does not characterize the motion of a particle completely; it gives only the kinematical characterization; in dynamics we need, in addition, to know the mass of the particle, and then we form the momentum vector (compare beginning of Section 1) whose components are mu, mv, mw. By analogy we form the expressions $mu_i$ or $\rho u_i$ (depending on whether we use the discrete or the continuous picture of matter) and consider them as the components of the four-dimensional momentum vector. Using the formulas 12.3 we have for its components

13.3    $mu_1 = mu/\sqrt{1-\beta^2}, \quad mu_2 = mv/\sqrt{1-\beta^2}, \quad mu_3 = mw/\sqrt{1-\beta^2}, \quad mu_4 = im/\sqrt{1-\beta^2}.$

These are what we call the mathematical components of the momentum vector; its "physical components" are

13.31   $a = mu/\sqrt{1-\beta^2}, \quad b = mv/\sqrt{1-\beta^2}, \quad c = mw/\sqrt{1-\beta^2}, \quad d = m/\sqrt{1-\beta^2}.$

We obtain a relation between the momentum components and mass if we take the sum of the squares of the components 13.3 or 13.31 and use 13.1, viz.,

$a^2 + b^2 + c^2 - d^2 = -m^2;$

In words, the negative square of mass is the


square of the momentum vector, so that mass is
essentially given by the length of the momentum vector; we see here another advantage of the
four-dimensional representation: the four dynamical quantities of a particle which in classical physics are given by the three momentum
components and mass are here represented more
naturally by the four components of a vector.
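The relation between the momentum components and mass is easy to verify for a particular case. The sketch below assumes units in which the velocity of light is 1 and takes the physical components a, b, c to be mass times u, v, w divided by the square root of one minus the square of the speed, and d to be mass divided by the same radical; the sample mass and velocity are illustrative.

```python
import math

def momentum_physical(m, u, v, w):
    # Physical components a, b, c, d of the four-dimensional momentum.
    g = 1.0 / math.sqrt(1.0 - (u * u + v * v + w * w))
    return (m * u * g, m * v * g, m * w * g, m * g)

def invariant(p):
    # a^2 + b^2 + c^2 - d^2, which should equal minus the square of mass.
    a, b, c, d = p
    return a * a + b * b + c * c - d * d

p = momentum_physical(2.0, 0.6, 0.0, 0.0)
print(p, invariant(p))
```

Whatever velocity is chosen, the combination comes out as minus the square of the mass, so that the mass can be read off from the four components alone.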
As stated many times before, numerically $\sqrt{1-\beta^2}$ is in most applications very close to unity so that approximately the first three components of the four-dimensional momentum vector are equal to the components of the three-dimensional momentum vector and the last (physical) component of the four-dimensional momentum vector is, in first approximation, equal to mass.
Let us consider more in detail this fourth component of the momentum vector. If we want a better approximation we develop the last of 13.31 according to powers of $\beta$ and keep only two terms; we have thus the approximate equality

13.4    $d = m + \tfrac{1}{2}m\beta^2;$

the correction represented by the second term is nothing but kinetic energy; of course, if ordinary units are used this term has to be written

13.41   $\tfrac{1}{2}mV^2/c^2$

where c is the velocity of light, and V is the velocity of the particle measured in the same units, because $\beta$ is the ratio of the velocity of
the particle to that of light. We had better
say then that the correction is kinetic energy
divided by the square of the velocity of light.
Sometimes this fact is expressed by saying (neglecting the other terms, which are very, very
small) that when a body is in motion its mass is
increased by its kinetic energy (divided by the
square of the velocity of light) .
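How good the two-term approximation is can be seen by comparing it with the full expression for a few speeds. This is an illustrative sketch only, with the velocity of light taken as 1; the function names and the sample values of the speed are assumptions of the example.

```python
import math

def d_exact(m, beta):
    # Fourth physical component of momentum: m / sqrt(1 - beta^2).
    return m / math.sqrt(1.0 - beta * beta)

def d_two_terms(m, beta):
    # Mass plus kinetic energy: the first two terms of the development.
    return m + 0.5 * m * beta * beta

for beta in (0.01, 0.1, 0.5):
    print(beta, d_exact(1.0, beta), d_two_terms(1.0, beta))
```

For a speed of one hundredth of that of light the two expressions differ by a few parts in a thousand million; at half the velocity of light the neglected terms are no longer small.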
The interest of this lies in the close relationship which is thus established between
mass and energy - a relationship that plays a
prominent part in present physics.
Sometimes the whole expression $m/\sqrt{1-\beta^2}$ is referred to as energy of the particle; mass, from this point of view, appears then as part of the energy, that part that the particle possesses even when it is at rest; in other words, mass appears as the rest-energy of the particle. We could also call $m/\sqrt{1-\beta^2}$ generalized mass and say that mass changes as a result of motion (compare end of this section).

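We are ready now to discuss the equations of motion 11.4 or

13.5    $\partial(\rho u_i u_\alpha)/\partial x_\alpha = \ldots$

The left hand sides may be written as

$\partial(\rho u_\alpha)/\partial x_\alpha \cdot u_i + \rho u_\alpha \cdot \partial u_i/\partial x_\alpha;$

the first factor of the first term vanishes according to the continuity equation 11.1; the second term may be written, recalling the definition of $u_i$, as

$\rho \cdot \partial u_i/\partial x_\alpha \cdot dx_\alpha/ds = \rho \cdot du_i/ds;$

the right hand side of 13.5, according to our former calculation (Section 6) is $\varepsilon F_{i\alpha}u_\alpha$, so that the equations of motion become

$\rho \cdot du_i/ds = \varepsilon F_{i\alpha}u_\alpha$

or, if we use the discrete picture, considering both mass and electric density to be concentrated in one point, and denoting mass by m, electric charge by e,

13.51   $m \cdot du_i/ds = eF_{i\alpha}u_\alpha.$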

These are the equations we are going to discuss. In applying them to physics we give preference to time by writing

13.52   $m \cdot du_i/dt = eF_{i\alpha} \cdot dx_\alpha/dt$

which spoils the invariant form but does not change the contents of the statement because the transition from 13.51 to 13.52 is equivalent to multiplication by ds/dt. Using 4.72 (or 6.1) the last equations become

13.53   $m \cdot du_1/dt = e(X + Nv - Mw),$
        $m \cdot du_2/dt = e(Y + Lw - Nu),$
        $m \cdot du_3/dt = e(Z + Mu - Lv),$
        $m \cdot du_4/dt = ie(Xu + Yv + Zw).$

Multiplying the left hand side of the first of these equations by $i \cdot u_1/u_4$ and the right hand side by u (compare 13.2); using in the same way $i \cdot u_2/u_4 = v$ on the second, and $i \cdot u_3/u_4 = w$ on the third, and adding the results we get the fourth equation because the left-hand side comes out

$(im/u_4)\cdot(u_1 \cdot du_1/dt + u_2 \cdot du_2/dt + u_3 \cdot du_3/dt)$

and differentiating the identity 13.1, we find that the second factor is $-u_4 \cdot du_4/dt$. The fourth equation is thus a consequence of the first three, a great improvement over the situation as it was before the new identification. The fourth equation also has a definite physical meaning now; the left hand side may be said to represent the time rate of change of energy (since the variable part of $mu_4$ has been recognized as kinetic energy), and the right hand side has been recognized before (Section 6) as the rate at which the energy of the field (potential energy) is being expended in moving the body. The difficulty with the fourth equation has thus been settled in a most satisfactory fashion but the system as a whole, or the first three equations, have to be tested by experiment (the fourth, being a consequence of the first three, cannot be wrong if these three will be proved to be "true").
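The algebraic step on the right hand sides, namely that multiplying the three force components by u, v, w and adding leaves only the electric terms, can be confirmed with arbitrary numbers. In this sketch X, Y, Z and L, M, N denote the electric and magnetic components as in the text, while the sample field values, charge and velocity are made up for illustration.

```python
def force(e, electric, magnetic, velocity):
    # Right hand sides of the first three equations of motion.
    X, Y, Z = electric
    L, M, N = magnetic
    u, v, w = velocity
    return (e * (X + N * v - M * w),
            e * (Y + L * w - N * u),
            e * (Z + M * u - L * v))

def power(e, electric, velocity):
    # e (Xu + Yv + Zw): the right hand side of the fourth equation without i.
    X, Y, Z = electric
    u, v, w = velocity
    return e * (X * u + Y * v + Z * w)

e = 1.5
electric = (1.0, -2.0, 0.5)
magnetic = (0.3, 0.7, -1.1)
velocity = (0.2, -0.1, 0.4)

f = force(e, electric, magnetic, velocity)
combined = sum(fi * vi for fi, vi in zip(f, velocity))
print(combined, power(e, electric, velocity))
```

The magnetic terms cancel in pairs, so the combination depends on the electric components only, in agreement with the fourth equation.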
Since

$du_i/dt = \frac{d}{dt}(dx_i/ds) = \frac{d}{dt}\left(dt/ds \cdot dx_i/dt\right)$

we may write our equations as

13.6    $m' \cdot d^2x/dt^2 = e(X + Nv - Mw),$
        $m' \cdot d^2y/dt^2 = e(Y + Lw - Nu),$
        $m' \cdot d^2z/dt^2 = e(Z + Mu - Lv),$

where

13.7    $m' = m/\sqrt{1-\beta^2}.$

The right hand sides of these equations (as stated in Section 6) are the components of the force exerted by the electromagnetic field on the particle. Comparing the left hand sides with the classical expressions we see then that the correction resulting from our identification is equivalent to the substitution of $m'$ for m in the classical equations of motion. We may say then that our theory predicts that motion will be governed by the old equations in which mass has been replaced by a corrected mass, the correction being the kinetic energy (divided by the square of the velocity of light). In the vast majority of cases the factor $1/\sqrt{1-\beta^2}$ is very close to one, but there are a few cases where it is not, and these cases afford an opportunity to test the new theory and to see whether it or the old one is better adapted to give account of experimental results. In experiments with "cathode ray particles" by Bucherer the predictions of the new theory seem to have been verified.
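To see for which speeds the corrected mass departs noticeably from the ordinary one, the factor can simply be tabulated. The sketch below takes the corrected mass to be the mass divided by the square root of one minus the square of the ratio of the speed to that of light; the listed ratios are illustrative values and are not data from the experiments mentioned.

```python
import math

def corrected_mass(m, beta):
    # Mass divided by sqrt(1 - beta^2); beta is speed over velocity of light.
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must satisfy 0 <= beta < 1")
    return m / math.sqrt(1.0 - beta * beta)

for beta in (1e-7, 0.01, 0.3, 0.6, 0.9):
    print(beta, corrected_mass(1.0, beta))
```

For ordinary velocities the factor is indistinguishable from one; only for particles moving with a considerable fraction of the velocity of light does the correction become large enough to be looked for experimentally.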

14. Lorentz Transformations.

Now that we saw that the new identification removes the difficulty in connection with the fourth equation of motion we want to consider some other consequences, and in the first place we want to give a discussion of the physical significance of the transformation of coordinates promised in Section 10. The new feature about our coordinate systems is the greater arbitrariness in their choice. Before we were free to pass from one system to another with the same time axis; now we may change the time axis also (formulas 10.6), and we want to see what it means.
In general, in one system of coordinates a geometrical configuration is described by certain numbers (e.g., coordinates of its points) and certain equations (e.g., equations of its straight lines); in another system of coordinates the same configuration will be characterized by other numbers and other equations, but it will be another description of the same configuration; or, we may say, another identification which is theoretically just as good as the first, but may be more, or less, convenient for practical purposes. In general we make our choice of a coordinate system guided by the properties of the object we are studying and our own position in space. If we study an ellipsoid we would choose for coordinate axes its principal axes; or in another case we would choose the direction away from us as the y-axis, the direction to the right as the x-axis, the vertical direction as the z-axis; but in principle all axes are permitted. The same general situation obtains in physics, considered as four-dimensional geometry. We have many systems of coordinate axes at our disposal, and we want to investigate now what use we can make of this arbitrariness, how we can adjust the choice of axes to the requirements of a particular situation. In particular, we are interested in the choice of the t-axis.
The object we want to study in the first place is the motion of a particle. We represent such a motion by a curve in four-space (and a straight line we consider as a special case of a curve). At every point of that curve we have a unit tangent vector, the four-dimensional velocity vector of components a,b,c,d; or we may characterize it by the three-dimensional velocity components u,v,w; if we pass to another coordinate system the components a,b,c,d will be changed, and so will the components u,v,w. If the coordinate transformation affects only the space coordinates x,y,z then the component d will not be affected, and therefore $\beta$ will not change; in other words, u,v,w will be changed but not $u^2 + v^2 + w^2$; the velocity vector will have different components, but its absolute value, the speed, will be the same. This is essentially as in old physics; the new feature is in the existence of transformations affecting t, and the most striking result of it is expressed in the following theorem.
Theorem. For every motion it is possible for every moment to choose a system of space-time coordinates in such a way that the speed be zero.
Proof. Begin with any axes; then, without changing the time axis, change the space axes so that the motion, at the moment considered,

takes place along the z-axis; we have then

$a = b = 0;$

now consider a transformation involving z and t; denoting the new components of the four-dimensional velocity vector $a', b', c', d'$ we shall have (10.6)

$a' = 0, \quad b' = 0, \quad c' = c\sigma + d\tau, \quad d' = c\tau + d\sigma;$

if we want to make $c' = 0$, we have to choose the angle so that

$\tau/\sigma = -c/d;$

if c is in absolute value less than d, and this is so for all motions of bodies so far observed, an angle satisfying this relation and therefore a system of coordinates for which $a' = b' = c' = 0$
can be found. From formulas 12.2 it follows
that in such a system u = v = w = 0, and the
theorem is proved.
The theorem just proved is expressed often
by saying that every particle can be transformed
to rest.
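The step of the proof that removes the remaining space component can be carried out numerically. The sketch below rests on an assumption that should be stated plainly: the two coefficients of the z-t transformation are taken to be the hyperbolic cosine and hyperbolic sine of a real parameter, so that their ratio can have any value between -1 and 1; the physical components c and d of the velocity vector are computed for a sample speed along the z-axis, with the velocity of light as unit. The function name is hypothetical.

```python
import math

def to_rest(c, d):
    # Returns (c', d') with c' = c*sigma + d*tau, d' = c*tau + d*sigma,
    # where tau/sigma = -c/d; assumes sigma = cosh(psi), tau = sinh(psi).
    if abs(c) >= abs(d):
        raise ValueError("requires |c| < |d|")
    psi = math.atanh(-c / d)
    sigma, tau = math.cosh(psi), math.sinh(psi)
    return (c * sigma + d * tau, c * tau + d * sigma)

w = 0.6                              # speed along the z-axis
g = 1.0 / math.sqrt(1.0 - w * w)
c, d = w * g, g                      # physical components before the change
c_new, d_new = to_rest(c, d)
print(c_new, d_new)
```

The new space component vanishes and the new fourth component is unity, which is the velocity vector of a particle at rest. The condition that c be less than d in absolute value appears here as the condition for the inverse hyperbolic tangent to exist.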
After we have found a coordinate system in
which a particle is at rest we can perform any
transformations of space coordinates and the
property will not be destroyed; any transformation involving time, on the contrary will result

in introducing significant space components of


the velocity vector; we see thus that whether a
particle is at rest in a coordinate system depends exclusively on the choice of the time axis, so that the choice of the time axis is equivalent to the choice of a body which we desire to
consider at rest; in other words, the direction
of the time axis may be characterized by indicating what particle is at rest in the corresponding coordinate system.
What time axis we actually choose depends,
as in geometry, on circumstances; in many cases
we shall want to consider ourselves as being at
rest, or our laboratory, or the earth.
In what precedes we spoke of a motion of a
particle at a given moment; in a given system'
of space-time coordinates a particle may be at
rest at one moment and not at rest some other
time; but there exists a class of particles which
if transformed to rest for one moment will be at
rest always; these are those particles whose
representative four-dimensional curves have the
same tangent vector at all points, i.e., are
straight lines; it is clear that if the direction of such a straight line is taken as the direction of the t-axis the velocity in the so obtained coordinate system is zero. But if we
choose any other (cartesian) coordinate system,
the three-dimensional velocity u,v,w, will be
constant in absolute value and direction, so
that we have a rectilinear uniform motion. From
our point of view then the distinction between
uniform rectilinear motion and rest is a non-essential distinction; this distinction does not
exist until we introduce a coordinate system; it
is of the same nature as the distinction between
lines which are and those which are not parallel
to the x-axis in ordinary analytic geometry.
If a motion is not uniform and rectilinear
then there is no coordinate system in which the
particle is permanently at rest. But rather
than to make a strict distinction between particles which are and those which are not in uniform rectilinear motion or rest, it is more in
keeping with our point of view to speak of particles which may be (within experimental error)
considered at rest for a sufficiently long period of time.
We are now in a position to explain the
name and the origin of the theory we are studying. We saw that in this theory there is no
such thing as absolute rest or absolute motion
of a body. If it is at rest with respect to
one system of coordinates it may move with respect to another and vice versa; we can only
speak of relative motion; that is where the
name Relativity comes from.
If we adopt this point of view, we have to
consider as permissible all transformations of
coordinates from one to any other cartesian coordinate system. Later on we shall consider
other, more general, systems of coordinates, and
therefore more general coordinate transformations; we shall replace our equations by more
general ones which will be invariant under these
more general coordinate transformations; in comparison with this situation we may say that we
consider now only special coordinate transformations and invariance under them; therefore the
present theory is called "Special Relativity
Theory".
It may be mentioned that the historical
order of appearance of the ideas of our subject - as it happens so often - has been quite
different from the order which seems natural and
in which we have presented them. First the
formulas of transformation involving space coordinates and time have been introduced by Lorentz without, however, giving to them the meaning they have now; in Lorentz's theory there
exists one universal time t, and other times t'
play only an auxiliary part. The merit of making the decisive step and recognizing the fact
that all these variables are on the same footing belongs to Einstein (1905). The four-dimensional point of view, after some preliminary
work had been done by Poincaré and Marcolongo,
was most emphatically introduced by Minkowski
in 1908.

15. Addition of Velocities.

As explained in Section 13 we have two ways
of characterizing the velocity of a body: by
means of the three-dimensional velocity vector
and by means of the four-dimensional velocity
vector. We can pass from one representation of
velocity to the other without difficulty and
the two methods are equivalent as long as we do
not change our coordinate system. But if we
come to study the relative motion of one body
with respect to another and want to define the
relative velocity, the four-dimensional point
of view leads to conceptions which are at variance with commonly accepted ideas and we want to
devote this section to the clarification of this
situation. It is natural to reduce the definition of relative velocity of a body with respect to another body to the conception of the
velocity of a body in a coordinate system by
saying: By velocity of the body B with respect
to a body A we mean the velocity of B in a system of coordinates in which the velocity of A
is zero.
If we want to find the velocity of B with
respect to A we have to transform our coordinates so that in the transformed coordinate system A be at rest. It is clear that the meaning
of relative velocity is made to depend by the
preceding definition on what we mean by transformation of coordinates. If by transformation
of coordinates we mean only transformation of
three-dimensional coordinates - transition to
moving axes - we have the old idea of relative
velocity; if, on the other hand, we consider
four-dimensional coordinate axes and our transformation of coordinates involves the coordinate
$x_4$, or t, in the sense of the theorem of Section 14, it is clear that we give a new meaning
to relative velocity, and we should not be surprised if the so defined "relativistic" relative velocity will possess properties different
from those of the "classical" relative velocity.
Consider a body A and a body B that moves
with respect to A uniformly and rectilinearly
with a velocity $V_{BA}$; this means according to our
definition that the velocity of B in a coordinate system in which A is at rest is $V_{BA}$.
Introduce a coordinate system in which A is at
rest and B moves along the z-axis, and call the
coordinates $x_A, y_A, z_A, t_A$; introduce also a
system of coordinates $x_B, y_B, z_B, t_B$ in which B is at rest so
that (10.6)

$$x_A = x_B, \quad y_A = y_B, \quad z_A = z_B\sigma_{BA} + t_B\tau_{BA}, \quad t_A = z_B\tau_{BA} + t_B\sigma_{BA}. \tag{15.1}$$

Now describe the motion of the body B in each
system neglecting the x and y coordinates. In
the system A the motion of the body B which, we
assume, at the moment t = 0 was at z = 0 is
given by

$$z_A = V_{BA}\, t_A;$$

in the system B the body B is at all times at
the origin of the coordinate system, so that
$z_B = 0$; substituting this value in the
transformation formulas and eliminating $t_B$ we get

$$z_A = (\tau_{BA}/\sigma_{BA})\, t_A.$$

Comparing this with the preceding
equation we have (10.5)

$$V_{BA} = \tau_{BA}/\sigma_{BA}. \tag{15.2}$$

Solving the above transformation formulas
for $z_B$, $t_B$ we also find that $\tau_{AB} = -\tau_{BA}$,
$\sigma_{AB} = \sigma_{BA}$ so that

$$V_{AB} = -V_{BA}.$$

This result, that the relative velocity of A
with respect to B is the negative of the relative velocity of B with respect to A, is in keeping with the old ideas.
Now consider three bodies, A, B, C, all moving in one direction (more precisely B and C
moving in the same direction with respect to A).
Denote the velocities of B and C respectively
with respect to A by $V_{BA}$ and $V_{CA}$, and the velocity of C with respect to B by $V_{CB}$. We have
in addition to the above transformation formulas the formulas

$$z_A = z_C\sigma_{CA} + t_C\tau_{CA}, \quad t_A = z_C\tau_{CA} + t_C\sigma_{CA}, \tag{15.11}$$

$$z_B = z_C\sigma_{CB} + t_C\tau_{CB}, \quad t_B = z_C\tau_{CB} + t_C\sigma_{CB}, \tag{15.12}$$

and also

$$V_{CA} = \tau_{CA}/\sigma_{CA}, \quad V_{CB} = \tau_{CB}/\sigma_{CB}. \tag{15.21}$$

Express now $z_A$, $t_A$ in terms of $z_C$, $t_C$ by substituting the values for $z_B$, $t_B$ given by the
transformation formulas 15.12 into the transformation formulas 15.1; comparing the result
with the transformation formulas 15.11 we get

$$\sigma_{CA} = \sigma_{CB}\sigma_{BA} + \tau_{CB}\tau_{BA}, \quad \tau_{CA} = \tau_{CB}\sigma_{BA} + \sigma_{CB}\tau_{BA},$$

whence, using the above expressions of velocities in terms of transformation coefficients,
15.2 and 15.21,

$$V_{CA} = \frac{V_{BA} + V_{CB}}{1 + V_{BA} V_{CB}}. \tag{15.3}$$

This is the Einstein formula for addition of
velocities for the case of two motions in the
same direction. This formula should be compared to the formula of addition of classically defined relative velocities

$$V = V' + V''. \tag{15.4}$$

Of course, there is no contradiction between
the two formulas because they refer to different quantities. Still it is legitimate to ask
which formula is better from the point of view
of experiment, which - if any - is "correct"
for the relative velocities that we actually
measure.
In ordinary units the second term in the
denominator in formula 15.3 should be divided
by the square of the velocity of light, so that
for moderate velocities the formulas give results that differ numerically very little, and
it seems to be difficult to devise an experiment with high enough velocities of material
particles so that the formulas could be tested
directly. In the next section we shall consider
the case when one of the velocities is that
of light; in the meantime we may mention that
formula 15.3 is a special case of a more general formula which corresponds to the case when
the two motions are not in the same directions.
This general formula gained temporary importance
some years ago when it played a decisive role
in the early stages of the application of the
idea of the spinning electron to the explanation
of spectra.
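The derivation of formula 15.3 from the composition of two transformations, and the smallness of its departure from the classical formula at moderate velocities, can be checked with a short Python sketch. It is an illustration added here under stated assumptions: the function names are hypothetical, each transformation is represented by its pair (σ, τ) with relative velocity τ/σ, and the sample velocity of 30 km/s is chosen arbitrarily.

```python
import math

C_LIGHT = 299_792_458.0   # velocity of light in metres per second (SI value)

def boost(v):
    """Pair (sigma, tau) with tau/sigma = v and sigma^2 - tau^2 = 1 (units with c = 1)."""
    sigma = 1.0 / math.sqrt(1.0 - v * v)
    return sigma, v * sigma

def compose(cb, ba):
    """Coefficients of the C-to-A transformation obtained by substituting C-to-B into B-to-A."""
    s_cb, t_cb = cb
    s_ba, t_ba = ba
    return s_cb * s_ba + t_cb * t_ba, t_cb * s_ba + s_cb * t_ba

def add_einstein(v_ba, v_cb):
    """Formula 15.3 in units where the velocity of light is one."""
    return (v_ba + v_cb) / (1.0 + v_ba * v_cb)

# Check 15.3 against the composed coefficients.
v_ba, v_cb = 0.6, 0.7
sigma_ca, tau_ca = compose(boost(v_cb), boost(v_ba))
v_from_coefficients = tau_ca / sigma_ca
v_from_formula = add_einstein(v_ba, v_cb)

# Moderate velocities in ordinary units: the product term is divided by c^2.
u1 = u2 = 30_000.0                                   # metres per second
classical = u1 + u2
einstein = (u1 + u2) / (1.0 + u1 * u2 / C_LIGHT**2)
relative_difference = (classical - einstein) / classical

print(v_from_coefficients, v_from_formula)
print(relative_difference)    # of the order of one part in 10^8
```

The two values on the first printed line agree up to rounding, which is the content of the step from the coefficient relations to 15.3.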

16. Light Corpuscles, or Photons.

In studying curves in four-space representing motions of particles we succeeded (Section
1) in choosing a standard parameter, s, by considering the expression for the square of the tangent vector
and by setting ds/dp equal to the reciprocal of
the square root of minus the above expression.
This procedure would not work if the above expression were equal to zero. We can imagine in
our four-dimensional geometry curves and straight
lines for which the above expression is zero
(Section 10), and the question arises: what will
be the physical interpretation of such curves;
in other words: is there anything in physics
that could be identified with such curves in the
same way that motions of particles are identified with curves for which the above expression
is negative. In order to answer this question
let us calculate the three-dimensional velocity
corresponding to such a curve; if the above expression is zero for one choice of parameter it
will be zero for all choices; using t as parameter, and using physical coordinates we have
then

$$(dx/dt)^2 + (dy/dt)^2 + (dz/dt)^2 - 1 = 0$$

or

$$u^2 + v^2 + w^2 = 1,$$

i.e., we can say that the curves of zero square
tangent vectors correspond to what we have to
call from the three-dimensional point of view
particles moving with the velocity of light.
This suggests identifying such curves in some
way with propagation of light.
Since the time of Newton and Huygens two
theories of light have been vying for supremacy
with variable success; according to one,
the so-called corpuscular theory, light consists (like matter on the discrete theory)
of particles which lately (Wolfers, 1925) have
been called "photons"; according to the other
theory light is a wave phenomenon. For our
present purposes the former view seems to be
better adapted. If we adopt it we can make our
former statement more specific by saying that
we identify curves of zero square tangent vectors with photons, or with motion of photons.
In adopting thus the corpuscular theory of
light we do not in the least mean to say that
the corpuscular theory of light is correct, and
still less that the other theory - the wave theory - is wrong. We simply want to show that
the identification just mentioned permits us to
give account of certain light phenomena; and it
is enough to mention polarization in order to
see that other phenomena are left out.
To begin with we want to point out an advantage that the Relativity theory has compared
to classical theory in the matter of corpuscular theory of light. In classical theory difference in velocity is merely a quantitative
difference, in relativity this means an entirely different kind of curves, and there are other differences entirely of qualitative nature
that are consequences of our identification,
which is more in keeping with the nature of
light compared to matter as we know it from experiment. This seems to constitute a very
strong argument in favor of the adoption of the
point of view of Relativity in general, and of
the identification we are discussing now in
particular.
We want next to discuss what is usually
referred to as constancy of the velocity of
light. The reader may have noticed that a while
ago when we were calculating the three-dimensional velocity, corresponding to curves with
zero-square tangent vectors, we did not say in
what coordinate system we wanted to calculate
this three-dimensional velocity. As a matter
of fact, the result shows that it is independent of the coordinate system; i.e., no matter
what bodies we consider as being at rest, we
come out with the same value for the velocity
of light, in our units - one.
This seems surprising; it contradicts the
commonly accepted ideas concerning addition of
velocities; but we have been led to a different
formula for the addition of velocities, and we
can show that the constancy of velocity of light
is in agreement with that formula 15.3. In fact,
if we consider the case that C moves with the
velocity of light (that is one in our units)
with respect to B, that means that $V_{CB} = 1$; substituting this value in 15.3 we find that $V_{CA}$
is also one; that is, what is motion with velocity one in one system is motion with velocity
one in another system.
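The substitution can be carried out in exact arithmetic. The sketch below is an illustration added here; the function name `add_velocities` and the sample velocities are chosen arbitrarily, and Python's `fractions.Fraction` is used so that no rounding enters the comparison with one.

```python
from fractions import Fraction

def add_velocities(v_ba, v_cb):
    """Formula 15.3 in units where the velocity of light is one."""
    return (v_ba + v_cb) / (1 + v_ba * v_cb)

light = Fraction(1)

# Velocities of B with respect to A (sample values, all less than one in absolute value).
frame_velocities = [Fraction(0), Fraction(1, 10), Fraction(1, 2),
                    Fraction(99, 100), Fraction(-3, 4)]

# Velocity of light with respect to A when it is one with respect to B.
light_in_a = [add_velocities(v, light) for v in frame_velocities]

# Two material velocities below one compose to a velocity below one.
material = add_velocities(Fraction(9, 10), Fraction(9, 10))

print(light_in_a)   # every entry is exactly 1
print(material)     # 180/181
```

The last line shows the companion fact that composing velocities smaller than one never reaches one, whereas the classical sum 9/10 + 9/10 would exceed it.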
This discussion, of course, proves nothing
but the inner consistency of the theory.
Another question is whether constancy of
velocity of light, i.e., independence of this
velocity from the choice of the system which is
considered at rest, is consistent with experiment.
As a matter of fact, it appears that it
is; the weight of experimental evidence seems
to be for it. Historically, results of some
experiments by Michelson and Morley performed
in 1887 and pointing in the same direction played a great role in the creation of the Theory
of Relativity.
Having considered thus the question of velocity of light we pass to the discussion of another consequence of our identification.
We have decided in a general way to identify straight lines whose vectors are of zero
square with light or the motion of photons in
the same way that straight lines whose vectors
are of negative square are identified with matter or uniform rectilinear motion of material
particles. But a straight line (in four dimensions)
does not characterize the motion of the
particle completely - it only gives the velocity
of the motion, it characterizes it only kinematically; for a complete dynamical characterization we had to introduce (Section 13) the mass
of the particle, and that led us to introduce
the momentum vector, whose square we found to
be $-m^2$; the complete characterization of a material particle consists then of a line with a
vector (of negative square) on that line. In
the same way we shall characterize the motion of
a photon by a line with a vector of zero square
on it. We have thus the same picture for a material particle and a photon; in both cases we
have a line with a vector on it; only in the
first case it is a vector of negative square;
in the second of zero square; this difference
corresponds to the difference in the speeds of
the particles in the classical theory. But in
the classical theory this is a purely quantitative difference and here, as mentioned before,
it leads to qualitative differences, some of
which we are going to consider.
In the first place a photon cannot be transformed to rest. In fact transforming a photon
to rest would mean finding a coordinate system
such that in it the time axis will have the direction of the photon; but that would mean that
the vector 0,0,0,1 would have zero square which
is impossible.
Then there is this distinction:
two material particles may differ in mass, that means in
the squares of their momentum vectors, and this
is an essential difference because the square
of a vector is not affected by a transformation
of coordinates; all photons on the other hand
have vectors of the same square, namely zero.
We shall prove that as a consequence of this,
two photons never differ essentially,
that is,
given two photons there always exist two systems
of coordinates in which the descriptions of the
two photons are the same.
To begin with, we may
choose the origins of the two coordinate systems on the respective straight lines; next we
may consider the two lines in the respective
z-t planes. The momentum vectors of the corresponding photons will have now in their respective coordinate systems the components
$0, 0, q_3, q_4$ and $0, 0, p_3, p_4$ (contrary to our general agreement we use here subscripts with physical coordinates) and since both vectors are of
zero square we'll have

$$q_3^2 - q_4^2 = 0, \quad p_3^2 - p_4^2 = 0;$$

by choosing appropriately the sense on each coordinate axis we can reduce these conditions to

$$q_3 = q_4, \quad p_3 = p_4. \tag{16.1}$$

Now perform in the second system the transformation 10.6

$$z' = z\sigma + t\tau, \quad t' = z\tau + t\sigma,$$

which applied to the second vector and taking
into account 16.1 gives

$$p_3' = p_4' = p_3(\sigma + \tau). \tag{16.2}$$

But σ and τ are subject only to the condition
that $\sigma^2 - \tau^2 = 1$, so that we can choose σ + τ
arbitrarily; if we make the choice

$$\sigma + \tau = q_3/p_3, \tag{16.3}$$

we shall have $p_3' = q_3$ and the statement is
proved.
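The choice 16.3 determines σ and τ separately, because (σ + τ)(σ - τ) = 1 gives σ - τ as the reciprocal of σ + τ. The following Python sketch, added here as an illustration with hypothetical function names and arbitrary sample components, carries out the construction for two photons already brought to the form 16.1.

```python
def boost_with_sum(k):
    """Return (sigma, tau) with sigma + tau = k and sigma^2 - tau^2 = 1 (k must not be zero)."""
    if k == 0:
        raise ValueError("sigma + tau cannot be zero")
    # (sigma + tau) * (sigma - tau) = 1, hence sigma - tau = 1/k.
    sigma = (k + 1.0 / k) / 2.0
    tau = (k - 1.0 / k) / 2.0
    return sigma, tau

def transform(z, t, sigma, tau):
    """Apply z' = z*sigma + t*tau, t' = z*tau + t*sigma to the components (z, t)."""
    return z * sigma + t * tau, z * tau + t * sigma

q3 = q4 = 5.0    # first photon, components in its own system (form 16.1)
p3 = p4 = 2.0    # second photon, components in its own system (form 16.1)

sigma, tau = boost_with_sum(q3 / p3)              # choice 16.3
p3_new, p4_new = transform(p3, p4, sigma, tau)    # formula 16.2

print(sigma, tau)
print(p3_new, p4_new)    # both equal q3 up to rounding
```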
This theoretical conclusion, that any two
photons are not essentially different from each
other, must be confronted with experience. At
first sight it seems to contradict it. We know
that light differs from case to case; it differs
in intensity and color. For difference in intensity we account by assuming that every beam
of light consists of many photons so that intensity (for a given color) is proportional to
the number of photons in the beam. There remains
color. But experiments show that color actually depends on the state of motion of the observer; when an observer approaches a source of
light, color seemingly changes (Doppler effect)
and so the field is clear for our assertion.
Now let us see how it works out.
Before we treat the situation from the
point of view of the Relativity theory we have
to say a few words about how color appears in
physics as a measurable quantity. From the
point of view of the wave theory to light is
attached a certain measurable quantity ν, "frequency", which corresponds to color in the
sense that different colors correspond to different frequencies. On the corpuscular theory
photons are characterized by their energies, E,
and the fundamental relation between frequency
and energy is given by the formula

$$E = h\nu, \tag{16.4}$$

where h is the so-called Planck's quantum constant, which for us appears simply as coefficient of proportionality establishing the relation between the values of two quantities which
measure the same thing in different units, much
in the same way that c, the velocity of light,
appears in the formula connecting mass and energy (compare 15.41). Of the two quantities,
E and ν, which can be used to measure color, E
will be the one that is more convenient for our
purposes because we use the corpuscular theory
of light.
The question now is with what quantity in
our theory are we going to identify E. In order
to have a suggestion we notice that E is of the
character of kinetic energy; it plays for light
particles the same role that kinetic energy
plays for material particles. There (Section
13) we identified kinetic energy with the second
term in the development

$$\frac{m}{\sqrt{1 - (u^2 + v^2 + w^2)}} = m + \tfrac{1}{2} m (u^2 + v^2 + w^2) + \tfrac{3}{8} m (u^2 + v^2 + w^2)^2 + \dots$$

of the fourth component of the momentum vector;
of the other terms the third and the following
are negligible for material particles, and the
first is a constant so that it plays no part in
these considerations where only differences in
energy are important; besides, the corresponding constant for light is zero; everything leads
us thus to compare E with the fourth component
of the momentum vector of light, or photon. We
arrive in this way at a new identification; we
identify the mathematical quantity "time component of the momentum vector of a photon" with
the physical quantity E which, except for a factor of proportionality, is frequency and measures
color. This identification makes color
dependent on the coordinate system but this dependence,
as was said before, is to be expected,
and our next question is whether the character
of this dependence corresponds to experimental
facts.
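Suppose that E is the energy of a light
corpuscle in one system of coordinates; what
will it be in another? We have already calculated how the components of a zero square vector change under a transformation of coordinates
involving time. Formula 16.3 shows that the
ratio of the fourth components of the two vectors, and according to our identification this
means the ratio of the energies or frequencies,
is

$$\nu'/\nu = \sigma + \tau.$$

On the other hand, we saw before (15.2) that the
relative velocity of two bodies which are at rest
in the corresponding systems of coordinates is

$$V = \tau/\sigma;$$

this gives

$$\sigma + \tau = \sigma(1 + V),$$

and taking into account the identity

$$\sigma^2 - \tau^2 = 1$$

we find

$$\nu'/\nu = \frac{1 + V}{\sqrt{1 - V^2}}, \tag{16.4}$$

or, in first approximation, $1 + V$.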
Let us try to figure out the predicted
change of frequency on the classical (wave) theory.
If we have a wave of frequency ν that
means that there are ν vibrations per unit of
time, and since the velocity is unity, there
will be ν waves per unit of length. Now, if we
move toward the source with a velocity V we
shall travel in a unit of time the distance V
and we shall meet V·ν additional vibrations,
so that the number of vibrations our eye receives in a unit of time will be (1 + V)·ν, and
this will be the frequency for the moving observer.
The two theories give then the predictions

$$\frac{1 + V}{\sqrt{1 - V^2}} \quad \text{and} \quad 1 + V$$

for the change in frequency due to motion of
the observer, and the difference between these
two values is too small to be subjected to an
experimental test; within experimental error
both seem to fit observations equally well.
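
The size of the difference between the two predictions can be estimated with a few lines of Python. This is a sketch added for illustration, with hypothetical function names; it assumes units in which the velocity of light is one and an observer moving directly toward the source with velocity V.

```python
import math

def doppler_relativistic(v):
    """Ratio nu'/nu = sigma + tau, with tau/sigma = v and sigma^2 - tau^2 = 1."""
    sigma = 1.0 / math.sqrt(1.0 - v * v)
    tau = v * sigma
    return sigma + tau

def doppler_classical(v):
    """Ratio nu'/nu = 1 + v from counting the additional vibrations met per unit of time."""
    return 1.0 + v

for v in (1e-4, 1e-2, 0.5):
    rel = doppler_relativistic(v)
    cla = doppler_classical(v)
    # For small v the difference is close to v*v/2, i.e. of second order in v.
    print(v, rel, cla, rel - cla)
```

For a velocity of one ten-thousandth of that of light, roughly the orbital velocity of the earth, the two ratios differ by about five parts in a thousand million, which illustrates why the text regards a direct test as out of reach.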

17. Electricity and Magnetism in Special Relativity.

In the preceding sections of this chapter
we have discussed some modifications that are
brought about by the Theory of Relativity in
Kinematics, Mechanics, and Optics. There are
other modifications which have attracted a great
deal of popular attention due to their sensational and paradoxical character. We shall only
mention the so-called effects of motion on the
shape of bodies, lengths, measure of time, and
the fact that in the Theory of Relativity the
conception of simultaneity loses its absolute
character so that two events which are to be
considered simultaneous in one system of space-time coordinates need not be simultaneous in another. But we shall say a few words about electricity and magnetism. Even in the first chapter the components of the electric and the magnetic force vectors were combined into one tensor $F_{ij}$, so that electricity and magnetism seem
to be treated as two aspects, or manifestations
of a higher entity. But as long as we limit
ourselves to transformations of space coordinates the components of $F_{ij}$ corresponding to
electricity are transformed among themselves and
those corresponding to magnetism - among themselves, so that their unification in one tensor
$F_{ij}$ may be considered as artificial. However,
when we introduce transformations of
space-time coordinates (formulas 10.6) the situation changes radically.
Following the procedure used in Section 11
when we were proving the invariant character of
the relations between the tensors F and D we
can deduce the following formulas corresponding
to rotation in the $x_3 x_4$ plane.

$$\begin{aligned}
F'_{41} &= F_{31} s + F_{41} c, & F'_{23} &= F_{23} c - F_{24} s, \\
F'_{42} &= F_{32} s + F_{42} c, & F'_{31} &= F_{31} c - F_{41} s, \\
F'_{43} &= F_{43}, & F'_{12} &= F_{12}.
\end{aligned}$$

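From these mathematical formulas we can
pass to formulas involving physical components
and only real quantities by making use of 4.72,
10.3 and the fact that $F_{ij} = -F_{ji}$. We obtain
thus the relations

$$\begin{aligned}
X' &= \sigma X + \tau M, & L' &= \sigma L - \tau Y, \\
Y' &= \sigma Y - \tau L, & M' &= \sigma M + \tau X, \\
Z' &= Z, & N' &= N.
\end{aligned}$$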

The interpretation of these formulas is that
if the unprimed letters give the components of
electric and magnetic force in one system the
primed letters will give the components of electric and magnetic force in a system which moves
with respect to the first with velocity $V = \tau/\sigma$.
These formulas show that the distinction
between electric and magnetic forces is not an
absolute distinction, but depends on the coordinate system used; we might, for example, have
in the old coordinate system a purely electric
field, L = M = N = 0; in the new system the magnetic components will be different from zero,
viz., $-Y\tau$, $X\tau$, 0. What is the physical meaning
of this? It means that a field may have electric effects on one body but electric and magnetic effects on a body that moves with respect
to it. This prediction is verified by experiment.
The fact may be restated by saying that
an electric charge in motion has magnetic effects; it may, e.g., deflect a magnetic needle.
As an example, the magnetic field of a moving
electron may be easily calculated. We start
with an electron at rest. Its magnetic field
is supposed to be zero, its electric field is
supposed to be independent of time and symmetric
with respect to the point; under these conditions
Maxwell's equations reduce, as can be seen easily,
to Newton's equations which, as we know,
give the inverse square law for the electric
forces. The field of an electron in motion can
now be obtained by applying the above formulas.
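
A minimal numerical sketch of this calculation follows. It is added here for illustration and rests on assumptions that should be stated plainly: the function names are hypothetical, the relations between primed and unprimed components are taken as X' = σX + τM, Y' = σY - τL, Z' = Z, L' = σL - τY, M' = σM + τX, N' = N, and only the field components at one point are transformed (the coordinates of the point itself are not re-expressed in the new system).

```python
def transform_field(electric, magnetic, sigma, tau):
    """Transform field components for a transformation in the z-t plane.

    electric = (X, Y, Z), magnetic = (L, M, N); sigma^2 - tau^2 = 1.
    """
    X, Y, Z = electric
    L, M, N = magnetic
    new_electric = (sigma * X + tau * M, sigma * Y - tau * L, Z)
    new_magnetic = (sigma * L - tau * Y, sigma * M + tau * X, N)
    return new_electric, new_magnetic

def coulomb_field(e, x, y, z):
    """Inverse square electric field of a charge e at rest at the origin."""
    r_cubed = (x * x + y * y + z * z) ** 1.5
    return (e * x / r_cubed, e * y / r_cubed, e * z / r_cubed)

sigma, tau = 1.25, 0.75                    # sigma^2 - tau^2 = 1, velocity tau/sigma = 0.6
E = coulomb_field(1.0, 1.0, 2.0, 2.0)      # purely electric field at the point (1, 2, 2)
E_new, B_new = transform_field(E, (0.0, 0.0, 0.0), sigma, tau)

print(E_new)
print(B_new)    # (-Y*tau, X*tau, 0): magnetic components appear in the new system
```

The combination X² + Y² + Z² - L² - M² - N² comes out the same before and after the transformation, which gives a quick check on the signs used in the sketch.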

Chapter IV. CURVED SPACE

The theory that has been developed so far
may be said to consist of two parts: a general
part which may be called Geometry and which, in
addition to material analogous to that treated
in ordinary geometry, includes general rules of
operations on tensors, and a special part which
may be called Physics and which deals with three
definite tensor fields, a scalar field ρ, a vector field $u_i$, an antisymmetric tensor field $F_{ij}$,
which all have been combined into the tensor
field $T_{ij}$, and with special conditions which we
impose on these fields, viz., equations 11.1 to
11.5. The second part is independent of the
first in the sense that we could have built with
the same geometry a different "physics", we could
have chosen another set of tensors instead of
ρ, $u_i$, $F_{ij}$. The reason why our physics was independent
of our geometry is because the latter
does not furnish us any tensors, except the tensor $\delta_{ij}$, or the tensor of scalar multiplication
which is, so to say, the same in all points
(and at all times) and therefore cannot be used
to explain the variety and change which are
characteristic of the outside world. In other
words, our geometry does not possess any structure which seems to be necessary for the interpretation of the outside world and therefore we
had to superimpose on our geometry a certain
arbitrary structure by introducing special tensors, by filling, so to say, the empty space-time with these tensor fields. Our geometry
does not give us a landscape, it gives us, so
to say, only a frame for one, or only a stage
and the landscape can be constructed on it by
means of stage-settings which do not constitute
an organic part of the stage. Although some
success has been achieved with the theory just
described we may want to accomplish more, we may
want to have a geometry possessing a structure
of its own which might be used in interpreting
the outside world. Such a possibility is suggested
by the consideration of curved surfaces.
The space-time we have been working with is of
the same character as a plane (except for the
number of dimensions); it is as devoid of structure as a plane. A curved surface, on the other
hand, possesses a certain structure; it is
not necessarily the same in all points, there
may be a difference in curvature. We shall investigate the possibility of a four-dimensional
space which bears the same relationship to our
flat space-time as a curved surface bears to a
plane; we shall expect to find that it possesses a certain structure which we'll try to interpret in terms of our physical quantities; more
specifically, since all our physical quantities
have been combined (by formula 11.5) into a symmetric tensor of the second rank, viz., $T_{ij}$, we
shall expect to find a tensor of that character
connected with the curvature of our curved four-dimensional space.
The plan of our study will be to begin with
the simplest case, a case that is even simpler
than that of a surface, viz., with the case of
a curve, and then to work up gradually.

18. Curvature of Curves and Surfaces.

We consider a curve in the plane.
We assume that it possesses a tangent at every point,
and, furthermore, that if the origin of coordinates is chosen in any point of the curve, and
the tangent at the origin is chosen as the x-axis, the curve, in the neighborhood of the origin may be represented by a function

$$y = f(x),$$

which can be expanded into a power series converging in the neighborhood of the origin. The
constant term of this expansion vanishes because
the curve passes through the origin so that the
equation must be satisfied for x = 0, y = 0; the
coefficient of the first power of x also vanishes, since the slope of the tangent, which is
the x-axis, must be zero; if we write the next
term in the form

$$\tfrac{1}{2} a_2 x^2,$$

the coefficient $a_2$ is called the curvature of the
curve at the point considered, i.e., the point
chosen for the origin. Since every point can
be chosen for the origin this assigns to every
point of the curve a curvature. We may say
that if we drop all terms of the expansion following the one just written out, i.e., consider
the curve

$$y = \tfrac{1}{2} a_2 x^2, \tag{18.1}$$

this curve (a parabola) is an approximation to
the given curve in the neighborhood of the point
considered.
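As an illustration of this definition, the coefficient $a_2$ can be estimated numerically for a curve whose curvature is known beforehand. The Python sketch below is added here and is not part of the original notes; the function names and the choice of a circle of radius 2 touching the x-axis at the origin are assumptions made for the example, and the coefficient is estimated by a central second difference.

```python
import math

def second_coefficient(f, h=1e-3):
    """Estimate a_2 in f(x) = (1/2) a_2 x^2 + ... by a central second difference at x = 0.

    Assumes f(0) = 0 and f'(0) = 0: the origin lies on the curve and the
    tangent at the origin is the x-axis.
    """
    return (f(h) - 2.0 * f(0.0) + f(-h)) / (h * h)

R = 2.0   # radius of the sample circle

def lower_arc(x):
    """Lower arc of the circle of radius R that touches the x-axis at the origin."""
    return R - math.sqrt(R * R - x * x)

a2 = second_coefficient(lower_arc)
print(a2, 1.0 / R)    # the estimate is close to the reciprocal of the radius
```

The agreement with 1/R reflects the familiar fact that the curvature of a circle is the reciprocal of its radius; the factor one half in the quadratic term is what makes $a_2$ itself, and not twice $a_2$, equal to that value.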
We consider next a surface. Here we assume that it has a tangent plane at every point.
Taking a particular point on the surface as the
origin and the tangent plane at that point as
the xy-plane we may represent the surface by an
equation z = f(x,y). We again assume that for
every point on the surface this function may be
developed into a power series converging in the
neighborhood of that point. The constant term
and the coefficients of the first powers of the
variables will vanish as before. We write out
the next group of terms, those that are quadratic in the variables, in the form

$$z = a_{11} x_1^2 + 2 a_{12} x_1 x_2 + a_{22} x_2^2, \tag{18.2}$$

where we use $x_1$ for x, and $x_2$ for y.
We may consider the coordinates $x_1$, $x_2$ as
the components of a vector in the tangent plane
which joins the origin to the projection of the
point on the surface, whose coordinates are $x_1$,
$x_2$, $z = f(x_1, x_2)$. The expression 18.2 assigns
thus to every vector in the tangent plane a number (which may be considered as the ordinate of
a paraboloid approximating the surface). This
assignment is independent of the coordinate system, i.e., if we choose another system of coordinates we shall have the same number assigned
to the same vector although its components will
have changed; in fact in a rotation of the coordinate axes the degree of a polynomial is not
affected so that the group of second degree
terms in the expansion of z is transformed into
the group of second degree terms of that expansion in the new coordinates. We have then a
function whose values are numbers and whose argument is a vector; is it a tensor? Of course
not; but it is easy to introduce a tensor with
which our function 18.2 is closely connected.
We simply write, as in 9.1

$$a_{11} x_1 y_1 + a_{12} x_1 y_2 + a_{21} x_2 y_1 + a_{22} x_2 y_2 = s(x,y) \tag{18.3}$$

using the coefficients $a_{11}$, $a_{12}$, $a_{22}$ of our
function 18.2 and writing in the third term $a_{21}$
for $a_{12}$ for the sake of symmetry; this is a
(symmetric) tensor of the second rank depending
on two vector arguments x and y, and from which
our function is obtained by setting the vector
arguments equal to each other. We arrived thus
in the case of a surface at a tensor of rank
two which expresses the curvature properties,
i.e., the structure of the surface insofar as it
describes its deviation from its tangent plane
in the neighborhood of the point of contact.
This encourages us in our enterprise: if we
succeed in generalizing this result to higher
dimensions, we may try to identify the generalization of this symmetric tensor of rank two
with the symmetric tensor of rank two which, as
we saw, combines in itself matter and electricity.
We want to state at this time that we shall
be ultimately successful in our enterprise but
that everything will not run very smoothly, and
we shall have to make an effort in order to arrive in the general case at a tensor of rank
two. The configurations which will present themselves
immediately will not be exactly tensors,
and even after we shall arrive at a tensor it
will not be a tensor of rank two. We shall have
to overcome these obstacles, and in order to be
able to do that we shall need some preparation,
which we shall make by studying more attentively
the case of a surface before we pass to the consideration of more complicated cases.
The curvature of a curve is characterized
by a number; that of a surface is a more complicated
thing and is characterized by a tensor;
we know, however, that there are certain numbers connected with a tensor, in an intrinsic
way (that is, independent of the system of coordinates), viz., the numbers given by 9.5 and
9.51, and we may expect that they have geometrical significance. In fact the first one,
$a_{11} + a_{22}$, is known as the mean curvature, and
the second,

$$K = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}, \tag{18.4}$$

as the total curvature of the surface at the
point considered. We know that K is independent of the choice of a system of coordinates,
but we want to show how it can be obtained without the use of any coordinates at all. We have
introduced above a vector notation s(x,y) for
our tensor 18.3; we now write out the expression (the expressions in coordinates are written down for the sake of future references and
may be disregarded at present):

$$\begin{vmatrix} s(x,u) & s(x,v) \\ s(y,u) & s(y,v) \end{vmatrix} = \begin{vmatrix} a_{\rho\sigma} x_\rho u_\sigma & a_{\rho\sigma} x_\rho v_\sigma \\ a_{\rho\sigma} y_\rho u_\sigma & a_{\rho\sigma} y_\rho v_\sigma \end{vmatrix} \tag{18.5}$$


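where x, y, u, v are arbitrary vectors, and we assert that it is equal to

$$K \cdot \begin{vmatrix} x \cdot u & x \cdot v \\ y \cdot u & y \cdot v \end{vmatrix} = K \cdot \begin{vmatrix} x_\rho u_\rho & x_\rho v_\rho \\ y_\rho u_\rho & y_\rho v_\rho \end{vmatrix}; \tag{18.6}$$

to prove this consider the expression

$$\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} \cdot \begin{vmatrix} x_1 & x_2 \\ y_1 & y_2 \end{vmatrix} \cdot \begin{vmatrix} u_1 & u_2 \\ v_1 & v_2 \end{vmatrix};$$

multiplying by the law of multiplication of determinants the second and the third factors and writing K for the first, we get 18.6; applying the law of multiplication of determinants first to the first two factors, then to the resulting determinant and the third factor we get 18.5; we may thus write

$$\begin{vmatrix} s(x,u) & s(x,v) \\ s(y,u) & s(y,v) \end{vmatrix} = K \cdot \begin{vmatrix} x \cdot u & x \cdot v \\ y \cdot u & y \cdot v \end{vmatrix}; \tag{18.7}$$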
we may now obtain K without using any system of coordinates by dividing the left hand side by the second factor on the right; the vectors x, y, u, v may be any arbitrary four vectors, only such that the second factor on the right does not vanish. Setting x = i, u = i, y = j, v = j, where i, j are two coordinate vectors, we get formula 18.4; now it is seen to hold for any system of coordinate vectors, so that incidentally we have a new proof of the invariance of K.
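
The coordinate-free recipe for K can be checked numerically. The following is a minimal sketch in Python and not part of the original notes: the coefficients $a_{ij}$ and the four vectors are made-up values, and the helper names (`s`, `dot`, `det2`, `curvature_from_vectors`) are hypothetical. It forms the left hand side of 18.7, divides by the second factor on the right, and compares the quotient with the determinant 18.4.

```python
def s(a, x, y):
    """Bilinear form 18.3: s(x, y) = sum over i, j of a[i][j] * x[i] * y[j]."""
    return sum(a[i][j] * x[i] * y[j] for i in range(2) for j in range(2))

def dot(x, y):
    """Ordinary scalar product of two plane vectors."""
    return x[0] * y[0] + x[1] * y[1]

def det2(p, q, r, t):
    """Determinant of the array [[p, q], [r, t]]."""
    return p * t - q * r

def curvature_from_vectors(a, x, y, u, v):
    """K as the quotient of the two determinants appearing in 18.7."""
    left = det2(s(a, x, u), s(a, x, v), s(a, y, u), s(a, y, v))
    gram = det2(dot(x, u), dot(x, v), dot(y, u), dot(y, v))
    return left / gram  # gram must not vanish

# Illustrative symmetric coefficients and arbitrary vectors (assumed values).
a = [[2.0, 0.5], [0.5, -1.0]]
x, y, u, v = (1.0, 2.0), (3.0, -1.0), (2.0, 1.0), (-1.0, 4.0)

K_direct = det2(a[0][0], a[0][1], a[1][0], a[1][1])  # formula 18.4
K_ratio = curvature_from_vectors(a, x, y, u, v)      # quotient from 18.7
```

With these values both `K_direct` and `K_ratio` come out equal, whatever admissible vectors are chosen, which is the content of 18.7.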

Before we leave the topic of ordinary surfaces we want to establish a relation between curvature of surfaces and curvature of curves. The points common to our surface and the xz-plane constitute a plane curve whose equation in the xz-plane may be obtained from z = f(x,y) by setting y = 0. It is clear that the x-axis is a tangent to this curve and that the first term of the expansion of z as a function f(x,0) into a power series will be obtained by setting $x_2 = 0$ in 18.2. We have thus

the same plane, the form of the development*


will not be changed, of course, but the coefflcients a,, b t will assume new values; If, however, we consider these coefficients as components of a vector, the vector represented by
them will be the same in all
coordinate systems.
Calling this vector v we may say that
the curvature situation of the curve is characterized by the expression

18.11

This expression plays the part of the expressions 18.1 and 18.2 which have occurred in the
two preceding situations.

ll*l

as this first non-vanishing term, and comparing with 18.1 we see that the curvature of the curve is $a_{11}$, or (18.3) the value s(i,i). We can consider any plane passing through the z-axis as the xz-plane, or any unit vector in the tangent plane as the coordinate vector i; we have thus the result, that the curvature at the point of contact of a curve, resulting from the intersection of the surface with a normal plane, is s(i,i), if i is a unit vector common to the tangent plane and the normal plane considered. In other words to every direction in the tangent plane, characterized by a unit vector i, corresponds a normal plane containing it, and the curvature of the intersection of that plane with the surface is s(i,i).
We see thus that to every direction in the tangent plane corresponds a definite number s(i,i), the curvature in that direction. As an exercise the reader may try to express the curvature corresponding to a direction in the tangent plane in terms of the angle that direction makes with the x-axis.
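
One way to approach this exercise is to put the unit vector i = (cos θ, sin θ) into s(i,i). The Python sketch below is an illustration only, not the author's solution; the function name `normal_curvature` and the sample coefficients are assumptions. It evaluates s(i,i) for a given angle and checks that the curvatures in two perpendicular directions always add up to $a_{11} + a_{22}$.

```python
import math

def normal_curvature(a11, a12, a22, theta):
    """s(i, i) for the unit vector i = (cos(theta), sin(theta)),
    using the symmetric coefficients a11, a12 = a21, a22 of 18.3."""
    c, sn = math.cos(theta), math.sin(theta)
    return a11 * c * c + 2.0 * a12 * c * sn + a22 * sn * sn

# Illustrative coefficients (assumed values).
a11, a12, a22 = 2.0, 0.5, -1.0

k_x = normal_curvature(a11, a12, a22, 0.0)            # direction of the x-axis
k_y = normal_curvature(a11, a12, a22, math.pi / 2.0)  # direction of the y-axis

# Sum of the curvatures in two perpendicular directions at an arbitrary angle.
theta = 0.37
k_sum = (normal_curvature(a11, a12, a22, theta)
         + normal_curvature(a11, a12, a22, theta + math.pi / 2.0))
```

The sum `k_sum` does not depend on `theta`, in agreement with the invariance of the first of the two numbers attached to the tensor.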
As the next step of our discussion, whose general aim is to arrive at the most general situation as far as the number of dimensions is concerned both of the space from which we start and the configuration in it that we study, we take a skew curve in ordinary space; first we studied a curve (n = 1) in a plane (N = 2); then a surface (n = 2) in the ordinary space (N = 3); now we take up the case n = 1, N = 3. A curve may be given in general by two equations on the three coordinates x, y, z. Solving these equations for y and z, we represent the curve in the form y = f(x), z = g(x); we again make the assumption that a tangent exists for every point and that for every point, if we take this point as the origin and the tangent as the x-axis, it is possible to solve for y and z, and that the functions f and g can be developed into power series; as a result of the choice of the coordinate system, the two power series will begin with quadratic terms

18.8
If we change the y- and z-axes which fall into
the normal plane to the curve, to other axes in

18.9

19. Generalizations.

In the preceding section we discussed configurations in the ordinary space, and we could rely on our intuition; everybody can conceive a plane curve, a surface, a twisted curve; we have at our disposal physical objects (drawings, graphs, models) with measurements on which quantities of our theories may be identified successfully. In the investigation we undertake now, we cannot use our intuition any more, and the identifications, when they come, will be of a much less immediate character. We have then to rely on analogy with the configurations studied in the preceding section and on mathematical reasoning supported by formulas.
We begin with what seems the next simplest case, a surface in four-dimensional space; it may be considered as a generalization both of a surface and of a curve in ordinary space. Such one is given, in general, by two equations on the four coordinates; in other words, we define as a surface in four-space the totality of points whose coordinates satisfy two equations F(x,y,z,t) = 0, G(x,y,z,t) = 0, where F, G are two functions subjected to certain restrictions to be imposed presently. We define a plane in four dimensions as a surface which may be given by two linear equations (this definition, although given in terms of coordinates, is invariant, because it can be proved that if the equations are linear in one system of coordinates they remain linear after a transformation; the equivalence of this definition of the plane with that given in Section 7 is easily recognized). In the general case we choose a point on the surface as the origin of coordinates; we solve the two equations for two of the coordinates, and we define as the tangent plane at that point the plane whose equations result from omitting all but linear terms in the expansions. We next choose that plane as one of our coordinate planes; the lowest terms in the expansions are then the quadratic ones; denoting the coordinates for which the equations appear as solved by $x_3$, $x_4$, and the two other coordinates by $x_1$, $x_2$, we may write the groups of quadratic terms in the two expansions as
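$$\tfrac{1}{2}\left(a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2\right), \qquad \tfrac{1}{2}\left(b_{11}x_1^2 + 2b_{12}x_1x_2 + b_{22}x_2^2\right). \tag{19.1}$$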
For every vector in the tangent plane of components $x_1$, $x_2$ this gives us two numbers which may be considered as the components of a vector in the normal $x_3 x_4$ plane; or, we may consider this vector as given by a vector form

$$v_{11}x_1^2 + 2v_{12}x_1x_2 + v_{22}x_2^2, \tag{19.2}$$

the coefficients $v_{ij}$ being vectors of the normal plane whose components are $a_{ij}$ and $b_{ij}$ (along the $x_3$ and $x_4$ axes respectively). This expression assigns to every vector of the tangent plane a vector of the normal plane; we may substitute for it, as we did in an analogous case before (compare 18.3), a more general expression

$$s(x,y) = v_{11}x_1y_1 + v_{12}x_1y_2 + v_{21}x_2y_1 + v_{22}x_2y_2, \quad \text{where } v_{21} = v_{12}, \tag{19.3}$$

but although this is linear in each of the vectors x and y it is not a tensor, because the values of this expression are not numbers (they are vectors in the normal plane). We shall not introduce a special name for such expressions because we shall not have to deal with them much; the expression 19.3 we have denoted, as before, by s(x,y), but we must keep in mind that the values of s(x,y) are not numbers but vectors of the normal plane.
We may in this case form the expression 18.5 where it is understood that in the expansion of the determinant scalar products have to be used where ordinary products were used before; this change is made necessary by the more than once mentioned fact that the values of the elements are vectors. In all other respects we can apply to the expression the same reasoning as before and we come to the conclusion that the relation 18.7 remains true, where K is a number, independent of the coordinate systems in the tangent and normal planes, but which after such coordinate systems have been chosen can be calculated in terms of the coefficients of the vector form 19.3 by means of the formula

$$K = \begin{vmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{vmatrix}. \tag{19.4}$$

The important fact is that, although our expression 19.3 is a vector expression, and therefore does not furnish us a tensor, the invariant corresponding to 19.4 is still a number. The other invariant, which could be called mean curvature and written as $v_{11} + v_{22}$, is not a number in this case. The number K we call, as before, the total curvature of the surface at the point considered.
In terms of the coefficients of the numerical forms 19.1 the total curvature K may be expressed as follows: expanding the determinant 19.4 we have $K = v_{11} \cdot v_{22} - v_{21} \cdot v_{12}$; the term $v_{11} \cdot v_{22}$, for instance, is the scalar product of the vectors $v_{11}$ and $v_{22}$ whose components are respectively $a_{11}$, $b_{11}$ and $a_{22}$, $b_{22}$; the scalar product $v_{11} \cdot v_{22}$ is then $a_{11}a_{22} + b_{11}b_{22}$; in the same way, the term $v_{21} \cdot v_{12}$ in the expression for K is $a_{21}a_{12} + b_{21}b_{12}$, so that we have for K in terms of the a's and b's, rearranging terms and using determinant notation,

$$K = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} + \begin{vmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{vmatrix}. \tag{19.41}$$

The next generalization is an easy one; we still consider a surface (n = 2) but instead of a four-dimensional space we take a space of an arbitrary number of dimensions N; we denote N - 2 by r and we have a tangent plane, as before, but instead of a normal plane, we have now a normal r-dimensional space, an r-flat as we may say. We call the corresponding coordinates $x_3$, $x_4$, etc., or $x_{2+k}$, where k goes now from 1 to r instead of only taking the values 1 and 2; we have here a vector form which may be written as before (19.2) only the v's are vectors of the normal flat and have r components each; these components we may distinguish by upper indices in brackets; if we denote by $I_k$ the r coordinate vectors in the normal flat, and denote by $a^{(k)}_{ij}$ the components corresponding to $I_k$ of $v_{ij}$ we may write

$$v_{ij} = \sum_{k=1}^{r} a^{(k)}_{ij} I_k \tag{19.5}$$

and for s(x,y) we may write

$$s(x,y) = \sum_{k=1}^{r} a^{(k)}_{\alpha\beta}x_\alpha y_\beta I_k; \tag{19.6}$$

otherwise there will be no changes. We can form the expression 19.4 as before; it will be independent of the choice of the coordinates $x_{2+k}$ because the scalar products used in the expansion of the determinant are; substituting the values 19.5 and evaluating we will have

$$K = \sum_{k=1}^{r} \begin{vmatrix} a^{(k)}_{11} & a^{(k)}_{12} \\ a^{(k)}_{21} & a^{(k)}_{22} \end{vmatrix}, \tag{19.42}$$

an obvious generalization of formula 19.41 which may be obtained from this by taking r = 2, and writing a for $a^{(1)}$ and b for $a^{(2)}$ with proper subscripts. We may, if we wish, write out an expression analogous to 18.5. Substituting for s(x,u) etc., the expressions 19.6 and using scalar products in the evaluation of the determinant we shall find

$$\begin{vmatrix} s(x,u) & s(x,v) \\ s(y,u) & s(y,v) \end{vmatrix} = \sum_{k=1}^{r} \begin{vmatrix} a^{(k)}_{\alpha\beta}x_\alpha u_\beta & a^{(k)}_{\alpha\beta}x_\alpha v_\beta \\ a^{(k)}_{\alpha\beta}y_\alpha u_\beta & a^{(k)}_{\alpha\beta}y_\alpha v_\beta \end{vmatrix} \tag{19.43}$$

where the Greek letter subscripts imply summation from one to two corresponding to the tangent plane, and the summation with respect to k corresponding to the r coordinates of the normal flat is indicated in the usual fashion. Each of the determinants corresponding to the different values of k is exactly of the same nature as 18.5 so that the reasoning which led from it to 18.7 applies to each of the determinants without change, and it is easy to see that formula 18.7 continues to hold. We may use this formula to define K which we continue to call total curvature.
And now we come to the last generalization. We consider, in a space of an arbitrary number of dimensions N, a curved space of n dimensions with n also an arbitrary number < N, which by definition is the totality of points whose coordinates satisfy N - n = r equations

$$F_k(x_1, x_2, \ldots, x_N) = 0 \qquad (k = 1, 2, \ldots, r). \tag{19.7}$$

We assume that for every point in the curved space these equations can be solved for r of the coordinates and that these solutions can be expanded into power series in the remaining n coordinates, converging in the neighborhood of the point selected. By a transformation of coordinates we may arrange it so that these expansions begin with second degree terms so that we may write

$$x_{n+k} = \tfrac{1}{2} a^{(k)}_{\alpha\beta}x_\alpha x_\beta + \text{terms of higher degree} \tag{19.71}$$

where the summation indicated by the Greek indices now goes from 1 to n. The sub-space defined by the first n coordinate axes we call the tangent flat space at the point considered, and the sub-space corresponding to the remaining r coordinate axes - the normal flat space at that point.
As before, we use the coefficients $a^{(k)}_{ij}$ to form the expressions

$$a^{(k)}_{\alpha\beta}x_\alpha y_\beta \tag{19.8}$$

where $x_i$, $y_i$ are two vectors of the tangent flat; and we combine these expressions into a vector expression

$$s(x,y) = \sum_{k=1}^{r} a^{(k)}_{\alpha\beta}x_\alpha y_\beta I_k. \tag{19.9}$$

It is natural to try to generalize the concept of total curvature. We can form the expression 18.5 but, and this is important, the transformation 18.7 does not apply; it was based essentially on the fact n = 2, and it breaks down here.

20. The Riemann Tensor.

The way out of this difficulty is very simple. Although relation 18.7 does not hold we still may consider its left hand side; it is a function of the four vectors x, y, u, v, a function which has numerical values; it is easy to show that it is linear in each of the vector arguments (we leave this proof to the reader because the result will follow later from formula 20.8); it is therefore a tensor, a tensor of rank four; we call it the Riemann tensor, denote it by R(x,y;u,v) and write

$$R(x,y;u,v) = \begin{vmatrix} s(x,u) & s(x,v) \\ s(y,u) & s(y,v) \end{vmatrix}. \tag{20.1}$$

We have then at every point of the curved space a tensor of rank four instead of a number; it is connected with the second degree terms of the expansions 19.71 and therefore characterizes, at least in part, the deviation of the expressions of the $x_{n+k}$ from linearity, or of our space from flatness. The Riemann tensor tells us then something about the curvature of the curved space, and it is often called the curvature tensor.
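
The definition 20.1 can be tried out on numbers. The sketch below is an illustration in Python, not taken from the notes: the coefficient arrays standing for the $a^{(k)}_{ij}$ of 19.9, the test vectors, and the function names `s_vec`, `dot`, `riemann` are all assumptions. It evaluates the determinant with scalar products of the normal vectors, and checks linearity in the first argument together with the symmetry properties that the determinant form implies; integer data are used so that the comparisons are exact.

```python
def s_vec(a, x, y):
    """Vector-valued form 19.9: the k-th normal component is
    the sum over i, j of a[k][i][j] * x[i] * y[j]."""
    n = len(x)
    return [sum(a_k[i][j] * x[i] * y[j] for i in range(n) for j in range(n))
            for a_k in a]

def dot(p, q):
    """Scalar product of two vectors of the normal flat."""
    return sum(pi * qi for pi, qi in zip(p, q))

def riemann(a, x, y, u, v):
    """R(x, y; u, v) as the determinant 20.1, expanded with scalar products."""
    return (dot(s_vec(a, x, u), s_vec(a, y, v))
            - dot(s_vec(a, x, v), s_vec(a, y, u)))

# Assumed example: n = 3 tangent dimensions, r = 2 normal dimensions,
# each a[k] a symmetric array of integers.
a = [
    [[1, 2, 0], [2, -1, 3], [0, 3, 2]],
    [[0, 1, 1], [1, 4, -2], [1, -2, 1]],
]
x, y, u, v = [1, 0, 2], [0, 1, -1], [3, 1, 0], [2, -2, 1]
w = [1, 1, 1]

R = riemann(a, x, y, u, v)
x2 = [2 * xi + wi for xi, wi in zip(x, w)]  # the combination 2x + w
linear_ok = riemann(a, x2, y, u, v) == 2 * R + riemann(a, w, y, u, v)
antisym_ok = R == -riemann(a, y, x, u, v) and R == -riemann(a, x, y, v, u)
pair_ok = R == riemann(a, u, v, x, y)
cyclic_ok = R + riemann(a, x, u, v, y) + riemann(a, x, v, y, u) == 0

# A surface (n = 2, r = 1) with a_ij = delta_ij / 2, as for a sphere of
# radius 2 near one of its points (sign of the coefficients is immaterial).
a_sphere = [[[0.5, 0.0], [0.0, 0.5]]]
K_sphere = riemann(a_sphere, [1, 0], [0, 1], [1, 0], [0, 1])
```

For the sphere-like example the value `K_sphere` is the reciprocal of the squared radius, the familiar total curvature.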
The situation we have now reminds us of a situation in Section 18. The curvature of a curve was a number; when we passed to a surface we found that its curvature was characterized by a tensor; we have succeeded in deriving from this tensor a number K, so that we could express (at least partially) the curvature of a surface by a number. Now passing to higher curved spaces we again obtain a tensor. In the preceding situation we succeeded in interpreting the tensor s(x,y) given by formula 18.3 in terms of curvatures of certain curves on the surface; we found that the value s(i,i) gives the curvature of the normal section determined by the unit vector i and the normal to the surface. Is it possible to interpret the Riemann tensor in an analogous way as giving the total curvatures of some surfaces on our curved space? This is a natural question to ask, and the answer is affirmative. We shall prove, in fact, that certain values of the Riemann tensor give us total curvatures of surfaces situated on the curved space. Let i, j be two arbitrary mutually perpendicular unit vectors of the tangent flat; choose a set of coordinate vectors so that i, j be two of them. Pass through i, j and the normal flat, i.e., through the r vectors $I_k$,

a flat space of 2 + r dimensions; its points will be those points of the N-space whose coordinates $x_3, x_4, \ldots, x_n$ vanish; the intersection of this flat space with the given curved space will be a surface, i.e., a two-dimensional curved space, because the coordinates of its points must satisfy the r equations of the curved space (19.7) and n - 2 equations

$$x_3 = 0, \quad x_4 = 0, \quad \ldots, \quad x_n = 0, \tag{20.2}$$

which together is N - n + n - 2 = N - 2 equations. This surface we may consider as a surface of the r + 2 dimensional flat space 20.2; its equations in that space will be obtained by setting $x_3 = x_4 = \ldots = x_n = 0$ in the equations 19.71 of the curved space (just as the equation of the normal section of a surface in the xz-plane was obtained (preceding formula 18.11) by setting y = 0 in the equation of the surface); these equations will then become

$$x_{n+k} = \tfrac{1}{2}\left(a^{(k)}_{11}x_1^2 + 2a^{(k)}_{12}x_1x_2 + a^{(k)}_{22}x_2^2\right) + \text{terms of higher degree} \tag{20.3}$$

and the total curvature of this surface is

$$K = \begin{vmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{vmatrix} \quad \text{with} \quad v_{ij} = \sum_{k=1}^{r} a^{(k)}_{ij} I_k;$$

but $v_{11} = s(i,i)$, $v_{12} = s(i,j)$, $v_{21} = s(j,i)$, $v_{22} = s(j,j)$, so that we have

$$K = \begin{vmatrix} s(i,i) & s(i,j) \\ s(j,i) & s(j,j) \end{vmatrix},$$

which is R(i,j;i,j), and our statement is proved.
As we saw at the end of Section 18, a unit vector i in the tangent plane to an ordinary surface determines a direction, a straight line, which, together with the normal determines a normal plane, and the intersection of that normal plane with the surface is a normal section of curvature s(i,i); here we have the situation that two unit vectors i, j in the tangent flat to a curved space determine an orientation, a plane, which together with the normal r-flat determines a normal r + 2 flat, and the intersection of that normal r + 2 flat with the curved space is a normal section, a surface of curvature R(i,j;i,j). We see then that the Riemann tensor plays with respect to a curved space a role analogous to that played by the tensor s(x,y) with respect to an ordinary surface; our expectations then are fulfilled; we need, it is true, for the purposes of identification with the complete tensor T a tensor of the second rank, but we shall get one of rank two from R(x,y;u,v) by applying to it the operation of contraction.
In the meantime let us study the Riemann tensor, or the curvature tensor as it is called sometimes, as we have it. The Riemann tensor is not a general tensor of rank four. It satisfies the relations

$$R(x,y;u,v) = -R(y,x;u,v) = -R(x,y;v,u), \tag{20.41}$$

$$R(x,y;u,v) = R(u,v;x,y), \tag{20.42}$$

$$R(x,y;u,v) + R(x,u;v,y) + R(x,v;y,u) = 0, \tag{20.43}$$

which are easily verified by using 20.1. The first of these relations says that R is antisymmetric in each of the pairs of the vector arguments, and the second, that it is symmetric in the two pairs.
If we introduce a coordinate system in the tangent flat, by picking four coordinate vectors i, j, k, l or $i_1$, $i_2$, $i_3$, $i_4$, we may represent the vector arguments (as in Section 9) in the form $x = i_\alpha x_\alpha$, etc., substitute these expressions into R(x,y;u,v) and, by using linearity as defined in 9.6, write the Riemann tensor as

$$R(x,y;u,v) = R_{\alpha\beta;\gamma\delta}\, x_\alpha y_\beta u_\gamma v_\delta \tag{20.5}$$

where

$$R_{ab;cd} = R(i_a, i_b; i_c, i_d). \tag{20.6}$$

(We use here the first letters of the alphabet as subscripts, instead of i, etc., as before, in order to avoid confusion with the coordinate vectors which we denote by i.) These are the components of the Riemann tensor in the coordinate system chosen. The relations 20.4 can be written in components as

$$R_{ab;cd} = -R_{ba;cd} = -R_{ab;dc}, \tag{20.71}$$

$$R_{ab;cd} = R_{cd;ab}, \tag{20.72}$$

$$R_{ab;cd} + R_{ac;db} + R_{ad;bc} = 0. \tag{20.73}$$

Exercise. Prove that the number of independent components of a Riemann tensor for four dimensions is 20.
The vectors of the flat spaces tangent to the curved space may be considered as belonging to the curved space, they may be characterized in terms of the space itself, for instance, by giving direction and length; they are accessible, as we may say, to beings who live in the space and for whom points outside the space do not exist. Normal vectors, the function s(x,y) etc., are, on the contrary, not accessible to the inhabitants of the space. We shall confine ourselves for the most part to the consideration of these internal properties, properties accessible to the inhabitants; but later in the course of our investigation we shall have to use the expression of the Riemann tensor in terms of the coefficients $a^{(k)}_{ij}$ and we shall conclude this section by deducing it.
Substituting the expression 19.9 for s into 20.1 and using 20.5 for the left hand side, we find

$$R_{\alpha\beta;\gamma\delta}\, x_\alpha y_\beta u_\gamma v_\delta = \begin{vmatrix} \sum_k a^{(k)}_{\alpha\gamma}x_\alpha u_\gamma I_k & \sum_k a^{(k)}_{\alpha\delta}x_\alpha v_\delta I_k \\ \sum_k a^{(k)}_{\beta\gamma}y_\beta u_\gamma I_k & \sum_k a^{(k)}_{\beta\delta}y_\beta v_\delta I_k \end{vmatrix}; \tag{20.8}$$

this determinant may be presented as the sum of $r^2$ determinants, of which, however, only r are different from zero, namely those in which the same I appears in the two columns, because in the expansions of the others all terms vanish as involving products of different and therefore mutually perpendicular I's. What remains is (compare 19.43)

$$\sum_{k=1}^{r} \begin{vmatrix} a^{(k)}_{\alpha\gamma}x_\alpha u_\gamma & a^{(k)}_{\alpha\delta}x_\alpha v_\delta \\ a^{(k)}_{\beta\gamma}y_\beta u_\gamma & a^{(k)}_{\beta\delta}y_\beta v_\delta \end{vmatrix}$$

because the I's are unit vectors and $I_k \cdot I_k = 1$, or

$$\sum_{k=1}^{r} \left(a^{(k)}_{\alpha\gamma}a^{(k)}_{\beta\delta} - a^{(k)}_{\alpha\delta}a^{(k)}_{\beta\gamma}\right) x_\alpha y_\beta u_\gamma v_\delta.$$

Comparing this to the left hand side of 20.8 we have the required expression

$$R_{ab;cd} = \sum_{k=1}^{r} \left(a^{(k)}_{ac}a^{(k)}_{bd} - a^{(k)}_{ad}a^{(k)}_{bc}\right). \tag{20.9}$$

21. Vectors in General Coordinates.

In the last section we learned how to associate with every point of a curved space a tensor of rank four; for our physical interpretation we need one of rank two; but we know how to obtain from a tensor of rank four one of rank two; we have to apply the operation of contraction. The result we shall call the "contracted Riemann tensor" and we shall expect to identify it with the tensor T. The first question we have to ask ourselves in this connection is whether the contracted Riemann tensor satisfies the equation 11.4, viz., $\partial T_{i\alpha}/\partial x_\alpha = 0$. But before we do that we have to go through quite a lengthy development because at the present stage we do not know how to differentiate tensors on a curved space. In flat space we could consider the differential of a vector, or, more exactly, of a vector field, by (roughly) considering the difference of two vectors of the field in two neighboring points. In curved space, or on a curved surface, two vectors in two different points belong to two different tangent planes and their difference is not a vector of the surface at all. Or we could in a flat space adopt a cartesian system of coordinates and introduce as the components of the differential the derivatives of the components of the given tensor. This method also is not applicable directly to curved space because there is no such thing here as cartesian coordinates. Each method could be so modified as to apply to curved space - the geometrical method and the coordinate method. We shall develop here the coordinate method because in addition to permitting us to introduce differentiation - our immediate concern now - it is indispensable in treating special cases.
As we said before, there is no such thing as the cartesian system of coordinates in curved space, because there are no straight lines; so we shall have to use some other coordinates, let us say, general coordinates; the main difficulty in treating curved spaces is just this, viz., that rectilinear coordinates are not applicable here, or we may say: part of the difficulty lies in the fact that we have to use curvilinear coordinates (the greater part) and part - in the fact that the situation itself is so different from that we encounter in flat space and with which we are more or less familiar. Or to put it in a still different form: the difficulty is two-fold, we have a new material to work on and we have to use new tools. To obviate the difficulty we are going to divide it; we already have studied curved spaces in the preceding sections; now we shall try to become familiar with the new tool - the method of general coordinates, applying it to the old material - ordinary three-dimensional space; and then - beginning with Section 25, we shall study curved space by means of curvilinear coordinates.
The essential thing in the matter of coordinates is that points receive names, the names being composed of numbers, so that we can handle numbers, which we can do by means of formulas, instead of points themselves. There are many different ways of establishing a correspondence between points and triples of numbers; in the one that bears the name of Descartes (Cartesius) the three numbers which are assigned to a point are its distances from three mutually perpendicular planes; there does not seem to be anything that can take the place of this method in a general curved space because there are no planes and straight lines; still we may use coordinates; a system of coordinates on a special curved surface is known to everybody, even to those who never studied analytic geometry; we mean the system of specifying the position on the surface of the earth by means of latitude and longitude. Polar coordinates in the plane or in space furnish another example of a coordinate system which is not cartesian; in what follows we shall
use an entirely arbitrary system of coordinates; we shall assume that a one-to-one correspondence is established between the points of a certain portion of space (which may be the whole space) and the triples of a certain set of triples of numbers. We shall call these numbers $u_1$, $u_2$, $u_3$ or $u_i$ and we shall keep the notation $x_i$ for some definite system of cartesian coordinates. To every triple $u_1$, $u_2$, $u_3$ corresponds a point whose cartesian coordinates (in some definite system) are $x_i$; these numbers $x_i$ are therefore determined by the $u_i$'s; we have three functions

$$x_1 = X_1(u_1,u_2,u_3), \quad x_2 = X_2(u_1,u_2,u_3), \quad x_3 = X_3(u_1,u_2,u_3) \tag{21.1}$$

which are defined on a certain range of triples. Conversely, if $x_1$, $x_2$, $x_3$ are the cartesian coordinates of a point of the portion of our space for which general coordinates have been introduced, they determine three numbers $u_1$, $u_2$, $u_3$, which therefore are functions of the x's

$$u_1 = U_1(x_1,x_2,x_3), \quad u_2 = U_2(x_1,x_2,x_3), \quad u_3 = U_3(x_1,x_2,x_3), \tag{21.2}$$

which are defined for a certain range of triples $x_i$ and are the inverse functions of the functions 21.1.
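
As a concrete instance of the pair 21.1, 21.2 one may take polar coordinates in space. The Python sketch below is an illustration under that assumption (the notes leave the system entirely arbitrary); the names `X` and `U` merely echo the letters of 21.1 and 21.2, and the portion of space is taken with r > 0 and the polar angle strictly between 0 and π so that the correspondence is one-to-one.

```python
import math

def X(u):
    """Functions 21.1 for polar coordinates in space:
    u = (r, theta, phi) -> cartesian (x1, x2, x3)."""
    r, theta, phi = u
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def U(x):
    """Inverse functions 21.2: cartesian (x1, x2, x3) -> (r, theta, phi),
    valid away from the x3-axis."""
    x1, x2, x3 = x
    r = math.sqrt(x1 * x1 + x2 * x2 + x3 * x3)
    return (r, math.acos(x3 / r), math.atan2(x2, x1))

u = (2.0, 0.7, 1.1)   # an arbitrary triple of general coordinates
point = X(u)          # its cartesian name
u_back = U(point)     # the triple recovered from the cartesian name
```

Going from the triple to the point and back returns the original triple, which is what is meant by saying that 21.2 are the inverse functions of 21.1.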
We have to handle vectors even more often than we have to handle points, and we want to have a numerical representation also for vectors. Together with cartesian coordinates for points goes a very simple numerical representation for vectors; we represent a vector by three numbers which are the differences between the corresponding coordinates of its end-points, and are called the components of the vector; of course, in a different system of cartesian coordinates the same vector will have other components, but as long as we keep to a definite coordinate system, vectors, as well as points, have definite names. The method of representing vectors by their cartesian components has the great advantage that two equal vectors have equal components, that we can add vectors by adding their components, and multiply a vector by a number by multiplying the components by that number; these advantages are peculiar to the cartesian method and cannot be reproduced in other systems. The theory of curved space is differential geometry; we cannot handle immediately by its methods such things as a configuration consisting of two points at a finite distance; if we do we have to introduce intermediate points; instead of subtraction we have here differentiation. There are two ways in which vectors arise by differentiation, and each gives rise to a system of notation for vectors associated with a given coordinate system for points - only for the rectangular cartesian system do the two representations coincide. The two ways in which a vector appears as a result of differentiation are - the tangent vector of a curve and the gradient of a field. In this and the next section we shall take only the first of these two points of view. Given a curve in cartesian coordinates in parametric form

$$x = x(p), \quad y = y(p), \quad z = z(p), \tag{21.3}$$

the components of the tangent vector are obtained by differentiation

$$dx/dp, \quad dy/dp, \quad dz/dp. \tag{21.4}$$

This vector is determined not by the curve alone, but by the particular parametric representation we are using, but in this chapter we are not going to change the parameter often and we shall speak of a curve when we mean "curve in a given parametric representation", and of the tangent vector, when we mean "tangent vector resulting from differentiation with respect to that particular parameter". In cartesian coordinates, then, the components of the tangent vector are obtained by differentiating the coordinates of the points of the curve. This is certainly convenient, and we may ask ourselves whether we could not reproduce this advantage in general coordinates. Let us try; the parametric representation of the curve in the u's can be obtained by substituting $x_i(p)$ into 21.2; the $u_i$'s become then functions of p, and this gives a parametric representation of the same curve in general coordinates; let us agree to represent the vector which, when we used the cartesian system had the components 21.4, by the three numbers

$$du_1/dp, \quad du_2/dp, \quad du_3/dp. \tag{21.5}$$

We have then the required system of representation; but it is not necessary, every time when we want to represent a vector, to introduce a curve to which it is tangent; we shall show how to find the components 21.5 when we are given the cartesian components 21.4 without actually considering the curve.
We have, considering that $u_1$ depends on $x_1$, $x_2$, $x_3$ which in turn depend on p,

$$\frac{du_1}{dp} = \frac{\partial u_1}{\partial x_1}\frac{dx_1}{dp} + \frac{\partial u_1}{\partial x_2}\frac{dx_2}{dp} + \frac{\partial u_1}{\partial x_3}\frac{dx_3}{dp},$$

or using the summation convention and applying to $u_1$, $u_2$, $u_3$,

$$\frac{du_i}{dp} = \frac{\partial u_i}{\partial x_\alpha} \cdot \frac{dx_\alpha}{dp}, \tag{21.6}$$

so that, if we denote the quantities 21.4 by $v_i$ and the quantities 21.5 by $V^i$, we have

$$V^i = \frac{\partial u_i}{\partial x_\alpha} \cdot v_\alpha. \tag{21.7}$$
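
Formula 21.7 can be illustrated on a simple case. The sketch below, in Python, assumes cylindrical coordinates $u_1 = \rho$, $u_2 = \varphi$, $u_3 = z$ as the general system and a helix as the curve; both choices, and the names `partials_u_by_x` and `general_components`, are assumptions made only for this example. The general components computed from the cartesian ones by 21.7 are compared with the derivatives 21.5 read off directly from the helix written in cylindrical coordinates.

```python
import math

def partials_u_by_x(x):
    """Array of partial derivatives du_i/dx_j for cylindrical coordinates
    u = (rho, phi, z), evaluated at the cartesian point x."""
    x1, x2, _ = x
    rho2 = x1 * x1 + x2 * x2
    rho = math.sqrt(rho2)
    return [[x1 / rho, x2 / rho, 0.0],
            [-x2 / rho2, x1 / rho2, 0.0],
            [0.0, 0.0, 1.0]]

def general_components(x, v):
    """Formula 21.7: V^i = (du_i/dx_alpha) * v_alpha, summed over alpha."""
    d = partials_u_by_x(x)
    return [sum(d[i][k] * v[k] for k in range(3)) for i in range(3)]

# Helix x = cos p, y = sin p, z = p; in cylindrical coordinates it reads
# rho = 1, phi = p, z = p, so the derivatives 21.5 are (0, 1, 1).
p = 0.8
point = (math.cos(p), math.sin(p), p)
v = (-math.sin(p), math.cos(p), 1.0)   # cartesian components 21.4
V = general_components(point, v)       # general components from 21.7
```

The result `V` agrees with (0, 1, 1), although no curve was used inside `general_components`, only the point and the cartesian components of the vector.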

Introducing the abbreviation

where we use another summation letter In the


second expression to avoid complications in
what follows. How we write the scalar product
using the formula V YW and get

21.8

we may write the last formulas as

21.15
21.9

It will be explained later (Section 23) why we


use in the left hand side the index as a superThese are transformation formulas for
script.
vector components which are associated with the
transformation formulas 21. 2 for the coordinates
of points; the formula Just written out permits
to find the general components when the cartesian components are given. In a similar way we
can find the inverse transformation formulas by
starting with a parametric representation of a
curve in general coordinates, substituting the
u^p) into the formulas 21.1 and differentiating;
we arrive thus at

21.10
where
21.11

Before we go further we shall use the fact


that the formulas 21.9 and 21.10 are inverses
of each other to obtain some relations on the
a's and b's. Substituting 21.9 into 21.10 we
get
21.12

V 1 = a la v a .

v l = b ia a v
op p>

the left hand side may be written as


that
bio a

ap

=0
p

1Q v fl

so

10V

and since this is an identity (v being arbitrary)


we have

It follows that in order to be able to find


scalar products of rectors given by their general components we have to know the quantities

21.16

The quantities a's and b's help to pass fro* a


certain cartesian system of coordinates to the
general system; they express a relation between
the general system and that particular cartesian system; and thus are not of fundamental importance; the quantities gj* , on the contrary,
although they have been obtained by means of
the a's and b's, are independent, as their significance shows, from any particular system of
cartesian coordinates, they characterize the
system of coordinates we are using In itself
(and, as we shall see later, they characterize
it completely, so that the g's are all we need
to know in order to be able to handle vectors
The a's
given by their general components).
as well as the b's may be considered either as
functions of the x's, or as functions of the
u's.
The g's always will be considered as
functions of the u's.
Before we go any farther re note that, as
it immediately follows from the definition
21.16 the g's are symmetric in the indices:
= g
Ji*

21.16'

In order tb show the importance of the


g's let us deduce a formula for the length of
a curve given in general coordinates.
Let the
curve be given by

21.31

21.13

ia*aj=lj-

In the same way we may obtain

21.14

a iab

=
aj

C,

We want to be able to operate with general


components of vectors, for instance, find a
scalar product of two vectors given in their general components; it is easy to obtain a formula
answering this question by passing to cartesian
components, and then applying the formula for the
scalar product in cartesian components. Let the
general components of two vectors be V 1 and W*;
according to the formula 21.10 their cartesian
components are
and

n b rj

Ui(p).

For a curve given in cartesian coordinates we assume as known the formula (compare 12.1)

21.17  $s = \int \sqrt{(dx/dp)^2 + (dy/dp)^2 + (dz/dp)^2}\; dp,$

where s is the arc length between two points. This formula involves three inverse operations, that of integration, that of taking the square root, and that of division. It is not pleasant, in general, to have to do with these operations, and so we shall free our formula from them, and write it as

21.17'  $ds^2 = dx^2 + dy^2 + dz^2.$

The formula as just written is not essentially different from the one written above, and means exactly the same thing. The sign d may be taken to mean differentiation with respect to some unspecified parameter, since the correctness of the formula does not depend on what parameter we are using (provided that the same parameter is used on both sides). We now translate 21.17' into general components. Differentiating 21.2 we have

$dx = b_{1\alpha}\, du^\alpha, \quad dy = b_{2\alpha}\, du^\alpha, \quad dz = b_{3\alpha}\, du^\alpha;$

for $dx^2$ we may write $b_{1\alpha} du^\alpha \cdot b_{1\beta} du^\beta$; using similar expressions for the other terms of 21.17' we get

21.18  $ds^2 = g_{\alpha\beta}\, du^\alpha\, du^\beta,$

using the abbreviation 21.15 introduced before. We see that the quantities $g_{ij}$ appear again.
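The passage from the b's to the g's and then to arc length can be checked numerically. The following is a minimal sketch, assuming plane polar coordinates $u^1 = r$, $u^2 = \theta$ as the general system (an illustrative choice, not taken from the notes); the function names `b_polar`, `metric` and `arc_length` are hypothetical.

```python
import math

def b_polar(r, theta):
    # b[alpha][i] = d x_alpha / d u^i for x = r cos(theta), y = r sin(theta)
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta),  r * math.cos(theta)]]

def metric(b):
    # 21.16: g_ij = b_{alpha i} b_{alpha j}, summed over the cartesian index alpha
    n = len(b[0])
    return [[sum(row[i] * row[j] for row in b) for j in range(n)]
            for i in range(n)]

def arc_length(curve, dcurve, p0, p1, steps=2000):
    # Midpoint rule for s = integral of sqrt(g_ab (du^a/dp)(du^b/dp)) dp, cf. 21.18
    h = (p1 - p0) / steps
    total = 0.0
    for k in range(steps):
        p = p0 + (k + 0.5) * h
        r, theta = curve(p)
        du = dcurve(p)
        g = metric(b_polar(r, theta))
        ds2 = sum(g[a][c] * du[a] * du[c] for a in range(2) for c in range(2))
        total += math.sqrt(ds2) * h
    return total

# g at the point r = 2, theta = 0.7: expected diag(1, r^2)
g = metric(b_polar(2.0, 0.7))

# Circle r = 3 traversed once: u(p) = (3, p), du/dp = (0, 1)
circle = arc_length(lambda p: (3.0, p), lambda p: (0.0, 1.0), 0.0, 2 * math.pi)
```

In this sketch only the g's enter the length computation; the b's are used once, to build them.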

22. Tensors in General Coordinates.

We come now to the representation of tensors. We know that a tensor is a function which assigns numbers to vectors, and the question of representation will be simply this: given the general components of the vector arguments, how to find the corresponding value of the tensor. We have already solved this problem for one particular tensor, namely for the tensor of the scalar product, which we expressed in the preceding section by means of the g's, and we shall use the same method in the general case. Given the cartesian components $f_{ij}$ of a tensor and the general components $V^i$ and $W^i$ of two vector arguments, to find the corresponding value of the tensor. We pass from general components to cartesian components by formulas 21.10 and substitute these expressions into the expression $f_{\gamma\delta}\, v_\gamma w_\delta$ for the value of the tensor; the result is

$f_{\gamma\delta}\, b_{\gamma\alpha} V^\alpha\, b_{\delta\beta} W^\beta,$

and this may be written as

22.1  $F_{\alpha\beta} V^\alpha W^\beta$

if we set

22.2  $F_{ij} = f_{\gamma\delta}\, b_{\gamma i}\, b_{\delta j};$

we call $F_{ij}$ the general components of the tensor whose cartesian components are $f_{ij}$, and we see that the values of a tensor are expressed by formula 22.1 in its general components and the general components of the vector arguments in the same way as in terms of cartesian components of the tensor and the vector arguments. Formula 21.15 for the scalar product of two vectors may be considered as a special case of 22.1. The cartesian components of the tensor $g_{ij}$ are, of course, the $\delta_{ij}$, and substituting δ for f in 22.2 we get the expressions 21.16 for $g_{ij}$. We treated here as an example a tensor of rank two; similar calculations can be performed for a tensor of any rank; we give the results for rank one and three, leaving it to the reader to go through the calculations:

22.11  $F_i = f_\alpha\, b_{\alpha i}, \qquad F_{ijk} = f_{\alpha\beta\gamma}\, b_{\alpha i}\, b_{\beta j}\, b_{\gamma k}.$

Now naturally the inverse problem presents itself: given the general components of a tensor, to find its cartesian components. The problem can be solved by substituting in the expression for the value of a tensor (we again take a tensor of rank two as an example) given in terms of general components, $F_{\gamma\delta} V^\gamma W^\delta$, the expressions 21.9 for the general components of the vector arguments in terms of the cartesian components: $V^i = a_{i\alpha} v_\alpha$, $W^i = a_{i\beta} w_\beta$; the result is

$F_{\gamma\delta}\, a_{\gamma\alpha}\, a_{\delta\beta}\, v_\alpha w_\beta;$

comparing this to the expression $f_{\alpha\beta} v_\alpha w_\beta$ for the value of a tensor in cartesian components we derive the desired transformation formula for passing from general to cartesian components; here are these formulas for the first three ranks:

22.3  $f_i = F_\alpha\, a_{\alpha i}, \qquad f_{ij} = F_{\alpha\beta}\, a_{\alpha i}\, a_{\beta j}, \qquad f_{ijk} = F_{\alpha\beta\gamma}\, a_{\alpha i}\, a_{\beta j}\, a_{\gamma k}.$

We know now how to write tensors in general components, and we want to find out how to
perform operations on them. Of course, we could
always pass from general components to cartesian components, perform the required operations
and then, if the result is a tensor, pass back
to general components; but instead of following
this program in every special case as it presents itself we shall do it once for all and
derive general formulas whose application in
special cases is much more convenient than ad
hoc calculations.
We begin with the operation of contraction. Given again a tensor of rank two by its general components $F_{ij}$ we pass to its cartesian components by formula 22.3 and now we contract by taking the sum of components with equal indices according to the original definition,

22.4  $f_{\gamma\gamma} = F_{\alpha\beta}\, a_{\alpha\gamma}\, a_{\beta\gamma} = F_{\alpha\beta}\, g^{\alpha\beta},$

where we use the abbreviation

22.5  $g^{ij} = a_{i\gamma}\, a_{j\gamma}.$

For tensors of higher ranks (contraction is possible only for tensors whose rank is $\ge 2$) entirely analogous formulas may be obtained easily; indices which are not affected by contraction may be simply disregarded, as follows from similar calculations which are left to the student. For example, the result of contracting with respect to the second and fourth indices a tensor of the fourth rank $F_{ijkl}$ will be

22.41  $F_{i\alpha k\beta}\, g^{\alpha\beta}.$

The quantities $g^{ij}$ introduced a moment ago play quite an important role, comparable to that of the $g_{ij}$ with lower indices, and they are connected with them by the formula

22.6  $g^{i\alpha}\, g_{\alpha j} = \delta_{ij}.$

To prove it, it suffices to substitute the expressions 21.16 and 22.5 for the two kinds of g's and to apply formulas 21.13 and 21.14; we may also notice that, as follows from the definition,

$g^{ij} = g^{ji},$

so that formula 22.6 may also be written as

22.61  $g_{i\alpha}\, g^{\alpha j} = \delta_{ij}.$
Next comes the operation of differentiation. The result of differentiating a tensor is always a tensor of rank higher by one than the given tensor; its components will have one more index than those of the given tensor; we shall denote them by simply adding a new index, preceded by a comma, to the symbol of the given tensor. Because the situation is slightly more complicated, let us start in translating differentiation into general components with the simplest case of a tensor of rank one given in general components, $F_i$. We follow the same program: as a first step we pass to cartesian components by formula 22.3 and get $f_i = F_\alpha a_{\alpha i}$; we next find the cartesian components of the differential by simply differentiating with respect to cartesian coordinates, with the result

$f_{i,j} = \dfrac{\partial F_\alpha}{\partial x_j}\, a_{\alpha i} + F_\alpha \dfrac{\partial a_{\alpha i}}{\partial x_j}.$

As a third step we pass back to general components using the formula 22.2 and arriving at the result

$F_{i,j} = \left( \dfrac{\partial F_\alpha}{\partial x_\gamma}\, a_{\alpha\delta} + F_\alpha \dfrac{\partial a_{\alpha\delta}}{\partial x_\gamma} \right) b_{\delta i}\, b_{\gamma j};$

but $b_{\gamma j}$ according to formula 21.11 is $\partial x_\gamma / \partial u^j$, so that this expression reduces to

$F_{i,j} = \dfrac{\partial F_\alpha}{\partial u^j}\, a_{\alpha\delta}\, b_{\delta i} + F_\alpha \dfrac{\partial a_{\alpha\delta}}{\partial u^j}\, b_{\delta i};$

using relation 21.14 the first term reduces to $\partial F_i / \partial u^j$, just what we would expect from analogy with cartesian coordinates as an expression for the result of differentiation; however, this is not the whole answer because there is a second term, so that the final result is

22.8  $F_{i,j} = \dfrac{\partial F_i}{\partial u^j} - \Gamma^\alpha_{ij} F_\alpha,$

where we set

22.81  $\Gamma^k_{ij} = -\dfrac{\partial a_{k\alpha}}{\partial u^j}\, b_{\alpha i};$

the second term may be considered as being in the nature of a correction to the expected result; we call it the correction term, and we call $\Gamma^k_{ij}$ the correction coefficients. We see then that in general coordinates the components of the differential of a tensor of rank one consist of two parts: the first expresses the change (or rate of change) of components of the tensor, the second is due to the change of the coordinate system from point to point. In the case of the cartesian system the coordinate system is, so to say, the same in all points, and the second term is zero (the a's reduce in this case to constants, and their derivatives vanish); another extreme example is furnished by a tensor whose components are constants in some non-cartesian system of coordinates (for instance, polar coordinates); the derivatives of the components with respect to the coordinates are zero but the components of the differential are not; their values are given by the correction terms alone.
For tensors of rank higher than the first the calculations are slightly more complicated, but the principle is the same; we write out the results for tensors of rank two and three:

22.82  $F_{ij,k} = \dfrac{\partial F_{ij}}{\partial u^k} - \Gamma^\alpha_{ik} F_{\alpha j} - \Gamma^\alpha_{jk} F_{i\alpha},$

22.83  $F_{ijk,l} = \dfrac{\partial F_{ijk}}{\partial u^l} - \Gamma^\alpha_{il} F_{\alpha jk} - \Gamma^\alpha_{jl} F_{i\alpha k} - \Gamma^\alpha_{kl} F_{ij\alpha};$

the general rule ought to be clear now; there are as many correction terms as there are indices; each correction term corresponds to one index, the other indices being disregarded in its formation.

In order to be able to perform the operation of contraction (and the operation of scalar multiplication is a special case of it) in general coordinates we have to know, as we saw, the values of the g's; in order to be able to perform the operation of differentiation we have to know the values of the Γ's (the correction coefficients); if we know those we can perform all the necessary operations in general coordinates without going back to cartesian coordinates. We shall show now that the values of the Γ's can be derived from those of the g's.
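The correction coefficients were given originally by the formula 22.81; we can give to this another form by using the relation 21.14; writing it as $a_{k\alpha} b_{\alpha i} = \delta_{ki}$ and differentiating it with respect to $u^j$ we get, since the δ's are constants,

$\dfrac{\partial a_{k\alpha}}{\partial u^j}\, b_{\alpha i} + a_{k\alpha} \dfrac{\partial b_{\alpha i}}{\partial u^j} = 0,$

so that we have

$\Gamma^k_{ij} = a_{k\alpha} \dfrac{\partial b_{\alpha i}}{\partial u^j}.$

We multiply now both sides by $g_{kl}$ and sum with respect to k, writing for it a Greek index, e.g., σ. Taking into account 21.16 and 21.13 we have

$g_{\sigma l}\, \Gamma^\sigma_{ij} = b_{\alpha l} \dfrac{\partial b_{\alpha i}}{\partial u^j},$

or, recalling the definition 21.11 of the b's,

22.84  $g_{\sigma l}\, \Gamma^\sigma_{ij} = \dfrac{\partial x_\alpha}{\partial u^l}\, \dfrac{\partial^2 x_\alpha}{\partial u^i\, \partial u^j};$

from this expression it follows that Γ is not affected by interchanging the two lower indices, or

22.71  $\Gamma^k_{ij} = \Gamma^k_{ji}.$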

We may now show how the Γ's can be derived from the g's. We shall do that by using the following artifice. Consider the tensor of scalar multiplication, whose cartesian components are the $\delta_{ij}$ and whose general components were shown to be the $g_{ij}$; the components of the differential of this tensor in cartesian components are the derivatives of the δ's and therefore zero; the second formula 22.11 shows that the general components of this tensor of the third rank also must vanish, so that

22.85  $g_{ij,k} = 0$

(we did not promise that general components of tensors will always be given by capital letters, but since heretofore we have been using capital letters for them it may be well to emphasize that the g's are intended to represent, following the generally accepted custom, general components of the scalar multiplication tensor). On the other hand, we can calculate the components $g_{ij,k}$ by the application of formula 22.82, and so we get

$\dfrac{\partial g_{ij}}{\partial u^k} - \Gamma^\alpha_{ik}\, g_{\alpha j} - \Gamma^\alpha_{jk}\, g_{i\alpha} = 0;$

this is a system of equations connecting the Γ's with the g's and their derivatives; we want to solve them for the Γ's. For that purpose we write out the above relation in two more forms resulting from it by cyclic interchanges of indices:

$\dfrac{\partial g_{jk}}{\partial u^i} - \Gamma^\alpha_{ji}\, g_{\alpha k} - \Gamma^\alpha_{ki}\, g_{j\alpha} = 0,$

$\dfrac{\partial g_{ki}}{\partial u^j} - \Gamma^\alpha_{kj}\, g_{\alpha i} - \Gamma^\alpha_{ij}\, g_{k\alpha} = 0;$

subtracting the last two relations from the first we notice that as the result of symmetry of the g's and the Γ's in the lower indices (formulas 21.16' and 22.71) four of the terms containing the Γ's cancel and the remaining two are identical; we thus have

22.9  $\dfrac{1}{2} \left( \dfrac{\partial g_{jk}}{\partial u^i} + \dfrac{\partial g_{ki}}{\partial u^j} - \dfrac{\partial g_{ij}}{\partial u^k} \right) = g_{k\alpha}\, \Gamma^\alpha_{ij}.$

This shows how, given the g's, to calculate the Γ's. We see thus that if only we are given the g's as functions of the u's we can perform all the required operations on tensors. Very often the calculation of the Γ's is divided into two parts; first the left hand sides of 22.9 are calculated and listed; they are denoted by $\Gamma_{k,ij}$; and then the $\Gamma^k_{ij}$ are calculated using the formulas in the form

22.92  $\Gamma^k_{ij} = g^{k\alpha}\, \Gamma_{\alpha,ij}.$
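The two-part calculation just described (first the quantities with all indices down, then the raised ones) can be sketched numerically. This is a minimal illustration, assuming plane polar coordinates with $g_{11} = 1$, $g_{22} = r^2$ and central differences for the derivatives of the g's; both choices and all function names are assumptions made here for the example.

```python
def g_polar(u):
    # Metric of plane polar coordinates u = (r, theta): diag(1, r^2)
    r, _theta = u
    return [[1.0, 0.0], [0.0, r * r]]

def dg(g_fn, u, k, h=1e-5):
    # Central-difference approximation of d g_ij / d u^k
    up, dn = list(u), list(u)
    up[k] += h
    dn[k] -= h
    gp, gm = g_fn(up), g_fn(dn)
    n = len(u)
    return [[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(n)] for i in range(n)]

def inverse_2x2(m):
    # g^{ij} is the matrix inverse of g_ij (relation 22.6), here for n = 2 only
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def christoffel(g_fn, u):
    n = len(u)
    d = [dg(g_fn, u, k) for k in range(n)]  # d[k][i][j] = d g_ij / d u^k
    # First part: Gamma_{k,ij} = 1/2 (d_i g_jk + d_j g_ki - d_k g_ij)
    first = [[[0.5 * (d[i][j][k] + d[j][k][i] - d[k][i][j])
               for j in range(n)] for i in range(n)] for k in range(n)]
    g_up = inverse_2x2(g_fn(u))
    # Second part: Gamma^k_{ij} = g^{k a} Gamma_{a,ij}
    return [[[sum(g_up[k][a] * first[a][i][j] for a in range(n))
              for j in range(n)] for i in range(n)] for k in range(n)]

# gamma[k][i][j] holds Gamma^k_{ij} at r = 2
gamma = christoffel(g_polar, [2.0, 0.3])
```

For polar coordinates the only non-vanishing coefficients are $\Gamma^1_{22} = -r$ and $\Gamma^2_{12} = \Gamma^2_{21} = 1/r$, which the sketch reproduces to within the finite-difference error.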

23. Covariant and Contravariant Components.

We know (Section 9) that a vector is a tensor of rank one, or, more precisely, that to every vector v there corresponds a tensor of rank one v·x which has the same cartesian components. Now we have introduced general components for vectors and also for tensors; if we have cartesian components $v_i$ of a vector, to them correspond (21.9) the general components

$V^i = a_{i\alpha} v_\alpha;$

also, if we consider the $v_i$ as the components of a tensor, to them correspond (22.11) the general components

$V_i = b_{\alpha i} v_\alpha;$

to the same cartesian components $v_i$ there correspond thus two different sets of general components depending on whether we consider the $v_i$ as vector or as tensor components; it was in anticipation of this situation that we have been using the index for general vector components as a superscript. Essentially, a vector and a tensor of rank one are one and the same thing; and so we have two different systems of components for every vector (in a given general coordinate system); the components with subscripts are called covariant components, those with the superscript, contravariant components. It is clear from what precedes, but it may be worthwhile to repeat, that we have here two different representations of one and the same thing.
It was mentioned in Section 21 that there are two ways in which a vector results from differentiation; one, a vector considered as a tangent vector to a curve, was discussed before, and is the basis of what we have been doing all this time; it is interesting to consider now briefly the other. If we start with a scalar field f = f(x,y,z) we may derive a vector field by differentiation, and the cartesian components of this vector field will be

23.1  $\dfrac{\partial f}{\partial x}, \quad \dfrac{\partial f}{\partial y}, \quad \dfrac{\partial f}{\partial z};$

this vector is known as the gradient of the field f; now, we may give the same scalar field in general coordinates,

$f = f\big(x_1(u^1,u^2,u^3),\; x_2(u^1,u^2,u^3),\; x_3(u^1,u^2,u^3)\big);$

if we differentiate f with respect to $u^i$, will we obtain general components of the gradient vector field? The question is easily answered by computing these components; we have

23.2  $\dfrac{\partial f}{\partial u^i} = \dfrac{\partial f}{\partial x_\alpha}\, \dfrac{\partial x_\alpha}{\partial u^i} = \dfrac{\partial f}{\partial x_\alpha}\, b_{\alpha i};$

comparing this with 22.11 we see that the partial derivatives $\partial f / \partial u^i$ are the components with subscripts, the covariant components in general coordinates of the gradient. The two representations, the covariant and the contravariant, may be thus considered as corresponding to two ways in which a vector can be arrived at by differentiation; if we consider a vector as a gradient we arrive naturally at its representation by covariant components, if we consider it as a tangent vector we arrive at its representation by contravariant components. (The name covariant, by the way, is intended to indicate that these components change in the same way as, or have similar formulas of transformation with, partial derivatives.) In the case of cartesian coordinates, of course, covariant and contravariant components of a given vector coincide: in this case it is not necessary to make any distinction.
We shall have to use covariant as well as contravariant components, and it is important to be able to pass from one to the other representation; the necessary formulas can be found, of course, by passing through a cartesian representation. Let covariant components $F_i$ of a tensor of rank one (or vector) be given; formulas 22.3 show us that the corresponding cartesian components are $f_i = F_\alpha a_{\alpha i}$; in terms of these the contravariant components are obtained by formula 21.9, which gives here

23.3  $F^i = a_{i\beta}\, a_{\alpha\beta}\, F_\alpha = g^{i\alpha} F_\alpha$

if abbreviation 22.5 is used. In the same way it is easy to prove the following formula, which permits us to calculate covariant components when the contravariant components are given:

23.31  $F_i = g_{i\alpha} F^\alpha.$

It may seem that there is a wasteful redundancy in this double system of notations, that one representation is enough, and that to have two means to indulge in luxury; as a matter of fact this double notation is a defect from a didactical point of view: it makes it more difficult to learn the new language; but once mastered it makes the calculations much simpler and the formulas much shorter and more elegant, if properly used; as an example, we want to give the formula for the scalar product of two vectors, one of which is given in covariant, and the other in contravariant components. This formula can be obtained by the usual procedure, i.e., passing through cartesian components, but we have already reached a stage where we can dispense to a great extent with the use of cartesian coordinates. The required formula is simply

$V^\alpha W_\alpha,$

and it can be proved by simply substituting for $W_\alpha$ its expression according to formula 23.31, viz., $g_{\alpha\beta} W^\beta$, and comparing the result to 21.15; of course, the scalar product could also be written as $V_\alpha W^\alpha$, and also $g^{\alpha\beta} V_\alpha W_\beta$, as it is easy to verify.

In Section 22 we derived a system of representation for tensors starting with the contravariant representation of vectors; we could do the same thing starting with the covariant representation of vectors, and we shall do it so as to have a perfectly symmetrical system of notations.

Suppose we are given covariant components of two vectors $V_i$, $W_i$ and the components $F_{ij}$ of a tensor, and we want to find the value of the tensor corresponding to the given vectors as arguments; we know how to solve the problem if the vectors are given by their contravariant components; therefore, let us first calculate the contravariant components, viz., $V^i = g^{i\alpha} V_\alpha$, $W^i = g^{i\beta} W_\beta$, and then substitute them into expression 22.1 giving the value of the tensor. The result is

23.4  $F_{\alpha\beta}\, g^{\alpha\gamma} V_\gamma\, g^{\beta\delta} W_\delta,$

which may be written as

23.41  $F^{\gamma\delta} V_\gamma W_\delta$

if we introduce the notation

23.42  $F^{ij} = F_{\alpha\beta}\, g^{\alpha i}\, g^{\beta j}.$

We call $F^{ij}$ the contravariant components of the tensor $F_{ij}$, and the components with lower indices (subscripts) which we have been using for tensors heretofore we call covariant components. We have thus two representations not only for tensors of the first rank (vectors) but also for tensors of all ranks. In one case we have been using already a symbol with two superscripts, viz., the $g^{ij}$ (introduced by 22.5); we shall show now that this notation is in agreement with the general notation we are introducing now by proving that these g's with upper indices are the contravariant components of the tensor of scalar multiplication. In order to prove that, we notice that, according to formula 23.42, the contravariant components of a tensor of covariant components $g_{ij}$ are

23.43  $g_{\alpha\beta}\, g^{\alpha i}\, g^{\beta j};$

but according to 22.61 $g_{\alpha\beta}\, g^{\beta j} = \delta_{\alpha j}$, so that we get $\delta_{\alpha j}\, g^{\alpha i} = g^{ij}$, and the assertion is proved.

Next we want to learn how to differentiate a tensor given in contravariant components, but before we do that it seems necessary to introduce what we call mixed components. Suppose we are given one vector argument of a tensor of rank two in contravariant, and the other in covariant components, and we want to find the value of the tensor; if the components of the two given vectors are $V^i$ and $W_i$, and the cartesian components of the tensor are $f_{ij}$, we pass to cartesian components of the vector arguments

23.5  $v_i = b_{i\gamma} V^\gamma, \qquad w_j = W_\alpha\, a_{\alpha j},$

and express the value of the tensor as

23.51  $f_{\gamma\delta}\, v_\gamma w_\delta = f_{\gamma\delta}\, b_{\gamma\alpha}\, a_{\beta\delta}\, V^\alpha W_\beta = F_\alpha{}^\beta\, V^\alpha W_\beta,$

where the notation

23.52  $F_i{}^j = f_{\gamma\delta}\, b_{\gamma i}\, a_{j\delta}$

is introduced. The numbers $F_i{}^j$ with one lower and one upper index are called mixed components of the tensor of rank two whose cartesian components are $f_{ij}$. In this same way we may consider mixed components for a tensor of any rank with as many of the indices up as we may wish, and the others down.

We can pass from one kind of component to any other directly, without going through cartesian components. The transition from components in which a certain index (for example, the third) is used as a superscript to components in which the same index is used as a subscript is called the lowering of that index. This change does not affect the geometrical meaning of the tensor; it merely corresponds to a transition from an expression of the tensor in which the corresponding vector argument (in our example, the third) was given by its covariant components to an expression of the same tensor using contravariant components for that vector argument. The formula for the lowering of an index is easily found to be independent of all other indices, so that, disregarding them, we always may use 23.31. Formula 23.3 may be considered as a general formula for raising an index. Lowering and raising of indices is sometimes referred to as juggling with indices.

Again it may seem that the introduction of mixed components is superfluous, but there are advantages in using mixed components.

One advantage appears in connection with contraction. The result of contraction is given in terms of covariant components by formula 22.4 (or 22.41). But, according to 23.3, we may write $F_j{}^i$ for $F_{j\alpha}\, g^{\alpha i}$, so that 22.4 may be written as $F_\alpha{}^\alpha$, and for 22.41 we may write $F_{i\alpha k}{}^\alpha$. We see thus that if a tensor is given by its mixed components and the two indices with respect to which we contract appear on different levels (one as a subscript, the other as a superscript), contraction is performed (as in cartesian coordinates) by simply replacing each of the two indices by the same Greek letter.

Another case where there is great advantage in using mixed components is that of differentiation of a contravariant tensor (as we say sometimes for: "tensor given by its contravariant components"; a tensor in itself is, of course, neither contravariant nor covariant; covariance and contravariance are only properties, or types of representation, of tensors); the components of the differential will have one more index, and this index, as one derived by differentiation, will naturally be a subscript, whereas the old indices are superscripts; this does not mean that we cannot pull the new index up, or the old ones down, but the expressions resulting from that would be more complicated.
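The juggling of indices and the three forms of the scalar product can be tried on numbers. The sketch below assumes the polar-coordinate values $g_{11} = 1$, $g_{22} = 4$ (the point r = 2) and arbitrary sample components; these values and the helper names `lower` and `raise_index` are illustrative assumptions, not part of the notes.

```python
def lower(g, v_up):
    # 23.31: V_i = g_{i a} V^a
    n = len(v_up)
    return [sum(g[i][a] * v_up[a] for a in range(n)) for i in range(n)]

def raise_index(g_up, v_dn):
    # 23.3: V^i = g^{i a} V_a
    n = len(v_dn)
    return [sum(g_up[i][a] * v_dn[a] for a in range(n)) for i in range(n)]

g = [[1.0, 0.0], [0.0, 4.0]]       # assumed g_ij (polar coordinates at r = 2)
g_up = [[1.0, 0.0], [0.0, 0.25]]   # g^{ij}, the inverse of g_ij (22.6)

V_up, W_up = [1.0, 0.5], [2.0, 1.0]
V_dn, W_dn = lower(g, V_up), lower(g, W_up)

# Three equivalent expressions for the scalar product
p_contra = sum(g[a][b] * V_up[a] * W_up[b] for a in range(2) for b in range(2))
p_mixed = sum(V_dn[a] * W_up[a] for a in range(2))
p_co = sum(g_up[a][b] * V_dn[a] * W_dn[b] for a in range(2) for b in range(2))

# Contraction of a rank-two tensor: covariant form (22.4) versus mixed form
F = [[1.0, 2.0], [3.0, 4.0]]       # sample covariant components F_ij
F_mixed = [[sum(F[j][a] * g_up[a][i] for a in range(2)) for i in range(2)]
           for j in range(2)]      # F_mixed[j][i] = F_j^i
trace_cov = sum(g_up[a][b] * F[a][b] for a in range(2) for b in range(2))
trace_mixed = sum(F_mixed[a][a] for a in range(2))
```

With mixed components the contraction is a plain sum over the repeated index, with no g appearing explicitly, which is the advantage the text points out.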
Suppose the given contravariant tensor is of rank one (a vector) $V^i$; we pass to cartesian components:

$v_i = b_{i\alpha} V^\alpha;$

we differentiate this:

$\dfrac{\partial v_i}{\partial x_j} = \dfrac{\partial b_{i\alpha}}{\partial x_j}\, V^\alpha + b_{i\alpha} \dfrac{\partial V^\alpha}{\partial x_j},$

and we pass to mixed components by formula 23.52:

$V^i{}_{,j} = a_{i\gamma}\, b_{\delta j} \left( \dfrac{\partial b_{\gamma\alpha}}{\partial x_\delta}\, V^\alpha + b_{\gamma\alpha} \dfrac{\partial V^\alpha}{\partial x_\delta} \right) = a_{i\gamma} \dfrac{\partial b_{\gamma\alpha}}{\partial u^j}\, V^\alpha + a_{i\gamma}\, b_{\gamma\alpha} \dfrac{\partial V^\alpha}{\partial u^j},$

and this, using 21.14 and 22.71, reduces to

23.6  $V^i{}_{,j} = \dfrac{\partial V^i}{\partial u^j} + \Gamma^i_{\alpha j}\, V^\alpha.$

The reader should be able, following the examples given, to deduce formulas for differentiation of a tensor given in any form. We just mention, because we will have occasion to use it later, the formula for differentiation of a mixed tensor:

23.7  $F^i{}_{j,k} = \dfrac{\partial F^i{}_j}{\partial u^k} + \Gamma^i_{\alpha k}\, F^\alpha{}_j - \Gamma^\alpha_{jk}\, F^i{}_\alpha.$

We are in possession now of all the formal rules of operations on tensors in general coordinates. Although these rules were deduced by means of cartesian coordinates, these coordinates and components, together with all formulas involving the a's and the b's, form only a kind of scaffolding that can be removed after the building has been completed. All we have to know in order to operate on tensors are the g's. Using the g's we can lower and raise indices and contract and, as a special case, find the scalar product of two vectors; also find the angle between two vectors (using formula 7.5) and the length of a curve (using 21.18). Given the g's we can calculate the Γ's (end of Section 22), and with the aid of the Γ's we can differentiate tensors (formulas 22.8). We see thus that the g's play a fundamental part in all operations; the tensor of which they are components is often called the fundamental tensor.

Before we conclude we might state explicitly that all the formulas we have obtained are entirely independent of the number of dimensions.

24. Physical Coordinates as General Coordinates.

The principal purpose for the introduction of general coordinates was to make possible the treatment of tensors in curved space, but it happens that general coordinates may be used with great advantage also in Special Relativity Theory, namely, in connection with the situation arising from the "minus sign". We remedied this situation in Section 4 by introducing imaginary coordinates and tensor components; we know how, using these imaginary quantities, to write our formulas in a nice symmetrical form. The system of notations for general coordinates that we have introduced permits us now to reintroduce real quantities, and still to preserve symmetry in the formulas. We shall express our four mathematical coordinates $x_1, x_2, x_3, x_4$, of which the fourth is imaginary, in terms of four real coordinates which we may denote by $u^i$; we may choose as these four real numbers the physical coordinates x, y, z, t and consider the formulas

24.1  $x_1 = x, \quad x_2 = y, \quad x_3 = z, \quad x_4 = it$

as the transformation formulas, corresponding to 21.1; and

24.11  $x = x_1, \quad y = x_2, \quad z = x_3, \quad t = -i x_4$

as the inverse formulas corresponding to 21.2. The $a_{ij}$ and the $b_{ij}$ with different subscripts are easily seen to be zero, and we have (compare 21.8 and 21.11)

24.2  $a_{11} = a_{22} = a_{33} = 1, \quad a_{44} = -i; \qquad b_{11} = b_{22} = b_{33} = 1, \quad b_{44} = i;$

from these we obtain, using 21.16,

24.21  $g_{11} = g_{22} = g_{33} = 1, \quad g_{44} = -1,$ all others zero,

and the same values we obtain for the g's with upper indices, using 22.5. The x, y, z, t may be considered as the contravariant components of the radius vector leading from the origin to the point P; the covariant components of the same vector are seen, applying the formula 23.31, to be x, y, z, −t. The formula for the square of the distance from the origin may be obtained either from 21.15 or from 23.4; it is (compare 10.1)

24.3  $x^2 + y^2 + z^2 - t^2.$

We come next to Maxwell's equations, where the "minus sign trouble" originated. To conform with the notations of this chapter we should use for the cartesian components (the mathematical components of preceding chapters) small letters, so that formulas 4.72 will have to be written

$f_{41} = iX, \quad f_{42} = iY, \quad f_{43} = iZ, \quad f_{23} = L, \quad f_{31} = M, \quad f_{12} = N.$

Using formulas 22.2 and 24.2 we obtain the covariant components in physical coordinates (and we use here capital letters) as follows:

24.4  $F_{41} = -X, \quad F_{42} = -Y, \quad F_{43} = -Z, \quad F_{23} = L, \quad F_{31} = M, \quad F_{12} = N.$

Mixed and contravariant components may be obtained by raising indices (formula 23.3). Due to the simple character of the g's given by 24.21 it is easy to see that raising one of the indices 1, 2, 3 does not change the numerical value of a component, so that, for instance,

24.5  $F^1{}_4 = g^{1\alpha} F_{\alpha 4} = F_{14},$

and raising of the index 4 just changes the sign of the component, so that

24.6  $F^4{}_1 = g^{4\alpha} F_{\alpha 1} = -F_{41}.$

Which components shall we use in Maxwell's equations? It is clear that in the first set (11.2) all the indices must be on the same level, and since the last one must be a lower index we write all of them down. In the second set (11.3) again the one after the comma must be down; but the one with which we contract it must be on the other level, and therefore up; the position of the third index is arbitrary. We have thus, as the Maxwell equations for free space,

24.7  $F_{ij,k} + F_{jk,i} + F_{ki,j} = 0, \qquad F_i{}^\alpha{}_{,\alpha} = 0,$

and in the presence of matter the second set becomes

24.71  $F_i{}^\alpha{}_{,\alpha} = \varepsilon u_i.$

We notice here that no imaginary quantities appear, and in spite of this our formulas are symmetric. The raising of the index 4 is equivalent to changing the sign of a component, and this is how the minus sign is taken care of.
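The way the imaginary quantities disappear can be checked with ordinary complex arithmetic. The sketch below assumes $b_{44} = i$, as follows from $x_4 = it$, builds the g's from the b's by 21.16, and lowers the index of a sample radius vector; the numerical values are arbitrary illustrations.

```python
# b_ij = d x_i / d u^j for x_1 = x, x_2 = y, x_3 = z, x_4 = i t
b = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1j]]

# 21.16: g_ij = b_{alpha i} b_{alpha j}; the imaginary unit enters only squared
g_complex = [[sum(b[a][i] * b[a][j] for a in range(4)) for j in range(4)]
             for i in range(4)]
imag_parts = [complex(e).imag for row in g_complex for e in row]
g = [[complex(e).real for e in row] for row in g_complex]

# Contravariant components (x, y, z, t) of a sample radius vector
radius_up = [1.0, 2.0, 3.0, 5.0]

# Lowering the index (23.31) changes only the sign of the fourth component
radius_dn = [sum(g[i][a] * radius_up[a] for a in range(4)) for i in range(4)]

# Square of the distance from the origin: x^2 + y^2 + z^2 - t^2
interval = sum(radius_dn[a] * radius_up[a] for a in range(4))
```

All the g's come out real, so every later formula written with them is free of imaginary quantities even though the b's are not.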
We shall now write out the expressions for
the stress energy tensor and the equations of
motion; it is clear that the formulas 11.4 and
11.5 become
24.6

T la

24.9

=
<*

*o,jFF a

or

24.91

T iJ =

- ig 1J F

FP

= F FP - i
F FP
p
gij
lp
j

puSx

or

24.92

The continuity equation (11.1) will now be written as

24.10  $(\rho u^\alpha)_{,\alpha} = 0.$

25. Curvilinear Coordinates in Curved Space.

We want next to apply the general coordinate system that we have introduced for flat
space also to curved spaces. In flat space we
introduce the language of general coordinates
and components by translating from the language
of cartesian coordinates and components; in
curved space we have no cartesian components; we
shall have, therefore, to begin by introducing
something that will play the role of cartesian
coordinates; we shall introduce quasi-cartesian
coordinates, which will take that place; but
whereas cartesian coordinates are universal in
that the same system of coordinates works for
the whole plane, or flat space - the neighborhood of every point in curved space has its own
system of quasi-cartesian coordinates. They are
defined in the following way: Consider at a given point P the tangent flat; there will be in general a neighborhood of P such that no two points of that neighborhood have the same projection on the tangent flat (for a sphere, e.g., we may obtain such a neighborhood by drawing any small circle around the point of contact); for such a neighborhood there exists a one-to-one correspondence between the points of it and the points of the tangent flat which are their projections. We introduce now on the tangent flat a cartesian system of coordinates with origin at the point of contact, and we use the coordinates of a point of the flat as the quasi-cartesian coordinates of the point of the curved space whose projection it is; if, for instance, a surface is given by the equation

25.1  $z = \tfrac{1}{2}(ax^2 + 2bxy + cy^2) + \text{t.h.d.},$

x, y will be the quasi-cartesian coordinates of the point x, y, z of the surface for the neighborhood of 0, 0, 0; and in the general case, if the curved space is given by 19.71

25.11

t.h.d.

the $x_i$ (i = 1,...,n) will be the quasi-cartesian coordinates of the point $x_i$ (i = 1,...,N) for the neighborhood of the point 0,0,0,...,0.
When we were discussing curved space in Sections 18, 19, 20 we were speaking of vectors and tensors; these vectors were vectors of the tangent plane or tangent flat with initial point at the point of contact. We shall not consider any other vectors in connection with curved spaces and we shall refer to these vectors as the vectors of the curved space. To make this idea seem more natural we may remark that a tangent vector to a curve on a surface (or on a curved space) is such a vector, that is, a vector of the tangent plane (or flat) with initial point at the point of contact. In handling these vectors we have been using for the vectors at every point of the curved space a cartesian coordinate system in the flat tangent at that point, in fact, we may say, the same system that furnishes us the quasi-cartesian coordinates for the points of the curved space in the neighborhood of the point of contact. We shall, therefore, refer to the cartesian components of the vectors and tensors of the tangent flat, when they are considered as vectors and tensors of the curved space, as quasi-cartesian components of these vectors and tensors.
We have thus in connection with every point
P on the curved space a local coordinate system
which gives quasi-cartesian coordinates of the
neighboring points and the quasi-cartesian components of the tensors at P, and in some cases
these local coordinate systems are very useful,
but it will be necessary to introduce more general, more universal systems and learn how to
represent vectors in them. The necessity of
this last requirement will be clear if we consider that, although a quasi-cartesian system
of a point P may be used to represent points in
the neighborhood of P it is not quasi-cartesian
for these points and cannot be used as such to
represent vectors at such points.
There is no difficulty In introducing a
universal system of coordinates for the points
of a space - what we want is just as in ordinary
space a one-to-one correspondence between the
points and n-ples of numbers. A simple example
is furnished by the so-called geographical coordinates for the surface of a sphere. Another

51

approach is given by the so-called parametric representation of a curved space. If the coordinates $x_1, \ldots, x_N$ of a flat space are given as functions of one parameter we have a curve; when they are given as functions

25.2    $x_i = x_i(u_1, \ldots, u_n) \qquad (i = 1, \ldots, N)$

of n parameters $u_j$ we have what we have called an n-dimensional curved space because eliminating these n parameters from the N equations which express the coordinates in terms of them we find that the coordinates must satisfy N - n = r equations, and this was our definition of an n-dimensional curved space. Now since to every set of values $u_1, \ldots, u_n$ of the parameters there corresponds one point of the curved space the parameters $u_i$ may be used as coordinates for the curved space.
Suppose then that we have introduced in some way a general system of coordinates for the points on a curved space (the reader may always think of the special case of a surface). What will be a natural system of representation of vectors to go with it? Just as we use a cartesian system in the tangent flat to represent points on the surface we may, so to say, project the general coordinate system on each tangent flat and use it to represent vectors and tensors in that flat, in particular those with initial points at the point of contact, i.e., the vectors and tensors of the curved space. For the neighborhood of each point we have thus two coordinate systems: the general and the quasi-cartesian for that point - and the same two systems, or, rather their projections, we may use on the corresponding tangent flat. For the neighborhood of each point there will be transformation formulas for the coordinates of points, and from these we can derive transformation formulas for components of vectors and tensors involving the a's, the b's and the g's as introduced in Sections 21, 22 and 23. But since we consider only vectors with initial points at the point of contact we shall use the a's, the b's and the g's calculated from the correspondence between the quasi-cartesian and the general coordinates at a point only for that point itself. We know that the a's and b's are necessary only in the building up of a system so that all we shall need in order to be actually able to handle tensors and vectors in a given general coordinate system are the g's.
We want to explain now how to obtain the g's when a space is given in parametric form 25.2.
We consider a curve $u^i(t)$ on the space, and its tangent vector at some point; it may be considered either as a vector of the curved space, and then its contravariant components will be given, if we denote differentiation with respect to the parameter by a dot placed over a letter, by $\dot u^i$, and the square of its length will be

$g_{\alpha\beta}\, \dot u^\alpha \dot u^\beta;$

or it may be considered as a vector of the containing space. Its (cartesian) components will be given by $\dot x_i$ $(i = 1, \ldots, N)$ where the x are functions of t which are obtained from 25.2 by substituting for the u's the expressions characterizing our curve; we have thus

$\dot x_i = \frac{\partial x_i}{\partial u_\alpha}\, \dot u^\alpha$

and for the square of the vector

$\sum_{i=1}^{N} \dot x_i^2 = \sum_{i=1}^{N} \frac{\partial x_i}{\partial u_\alpha} \frac{\partial x_i}{\partial u_\beta}\, \dot u^\alpha \dot u^\beta.$

Equating this to the expression we obtained above we find

25.3    $g_{\alpha\beta} = \sum_{i=1}^{N} \frac{\partial x_i}{\partial u_\alpha} \frac{\partial x_i}{\partial u_\beta}.$
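As an illustration of formula 25.3, the following minimal sketch computes the g's of a sphere from its parametric representation. The function names, the choice of the sphere, and the use of central differences in place of exact derivatives are our assumptions for the example and are not part of the text.

```python
import math

def sphere(u, radius=1.0):
    """Parametric representation x_i(u_1, u_2) of a sphere in flat 3-space
    (u_1 = colatitude theta, u_2 = longitude phi); an illustrative choice."""
    theta, phi = u
    return [radius * math.sin(theta) * math.cos(phi),
            radius * math.sin(theta) * math.sin(phi),
            radius * math.cos(theta)]

def metric_from_embedding(x, u, h=1e-6):
    """g_ab = sum_i (dx_i/du_a)(dx_i/du_b), as in 25.3.
    The partial derivatives are approximated by central differences."""
    n = len(u)
    columns = []                      # columns[a][i] approximates dx_i/du_a
    for a in range(n):
        up, um = list(u), list(u)
        up[a] += h
        um[a] -= h
        xp, xm = x(up), x(um)
        columns.append([(p - m) / (2 * h) for p, m in zip(xp, xm)])
    return [[sum(ca * cb for ca, cb in zip(columns[a], columns[b]))
             for b in range(n)] for a in range(n)]

g = metric_from_embedding(sphere, [1.0, 0.5])
print(g)
```

For the unit sphere the result should agree, up to the error of the difference quotients, with the familiar line element in geographical coordinates, whose coefficients are 1, 0 and the square of the sine of the colatitude.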

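This formula ought to be compared to 21.16 of which it will be seen to be a generalization if account is taken of the values 21.11 of the b's.
The method of giving the curved space by means of the formulas 25.11 may be considered as a special case of the one used above; this will be clear if we write 25.11 in the form

$x_1 = u_1, \quad x_2 = u_2, \quad \ldots, \quad x_n = u_n,$

the remaining coordinates $x_{n+1}, \ldots, x_N$ being given by 25.11 as functions of $u_1, \ldots, u_n$.
It is seen that the parts of the u's are played by the first n of the x's. Differentiation of the formulas just written with respect to these variables gives

$\frac{\partial x_i}{\partial u_j} = \delta_{ij} \qquad (i = 1, \ldots, n),$

while for $i = n+1, \ldots, N$ the derivatives are those of the functions 25.11; substitution of these expressions into 25.3 gives

25.4    $g_{ij} = \delta_{ij} + \sum_{k=n+1}^{N} \frac{\partial x_k}{\partial x_i} \frac{\partial x_k}{\partial x_j}.$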

These formulas give the values for the g's in pseudocartesian coordinates for a neighborhood of the point of contact.
For the point of contact itself, i.e., for the origin of our system of coordinates we have

25.41    $(g_{ij})_0 = \delta_{ij}$

and from the formulas 22.61 we conclude easily that the g's with the upper indices are also the δ's:

25.42    $(g^{ij})_0 = \delta_{ij}.$

As a consequence of this the distinction between covariant and contravariant components vanishes for quasi-cartesian coordinates at the point of contact.
We come now to the operation of differentiation. In the case of flat space we were simply trying to find a system of notations for some operations that were defined independently. Here the situation is different; we have not defined differentiation; we cannot define it in what would seem to be the natural way, as the rate of change of a vector, for instance, because this would necessitate the consideration of the difference between two vectors at two different points and this conception is not defined for curved space.
Before we come to this definition let us formulate the situation in flat space as follows: a tensor field dF has been obtained by differentiation from a tensor field F if in every point the components of dF in a cartesian system are the derivatives of the components of F in that system.
In curved space there is no universal cartesian system but there is a quasi-cartesian system for every point; it is natural, therefore, to define differentiation in curved space by substituting in the above statement "quasi-cartesian system" for "cartesian system"; if we do that we arrive at the following definition:
Definition of Differentiation. We shall say that a tensor field dF has been obtained by differentiation from the tensor field F if at every point the components of dF, in a system of coordinates that is quasi-cartesian at that point, are the derivatives of the components of F in that system of coordinates.
Although this definition may sound complicated it is the simplest imaginable adaptation
of the idea of differentiation to curved spaces.
The complication arises from the fact that there
is no cartesian system in curved space but when
we apply this definition to flat space we see
that it brings us back to differentiation as we
knew it in flat space.
We shall not have actually to pass from
general coordinates to quasi-cartesian coordinates and then, after differentiation translate
the result back into the language of general coordinates in every special case. We can derive
the formulas in general coordinates once for
all, just as we did it in the case of flat space
in Sections 22 and 23, and we shall obtain exactly the same formulas. The only difference
may be in the derivation of the Γ's from the g's
(end of Section 22) which was based there on
the fact that the derivatives of the g's in cartesian coordinates vanish (22.84). Is this true
also in curved space? or, more precisely, do
the derivatives of the g's in quasi-cartesian
coordinates vanish at the point of contact?
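In these coordinates the g's are given by 25.4; differentiating these expressions we obtain

25.5    $\frac{\partial g_{ij}}{\partial x_l} = \sum_{k=n+1}^{N} \left( \frac{\partial^2 x_k}{\partial x_i \partial x_l} \frac{\partial x_k}{\partial x_j} + \frac{\partial x_k}{\partial x_i} \frac{\partial^2 x_k}{\partial x_j \partial x_l} \right)$

and for the point of contact, where the x's vanish, and with them the first derivatives $\partial x_k / \partial x_i$ $(k > n)$, this is zero so that

25.6    $\left( \frac{\partial g_{ij}}{\partial x_l} \right)_0 = 0.$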

We see thus that formally everything is just the same as in flat space so that we can take over into curved space the whole apparatus of formulas worked out in Sections 21, 22, 23.
Incidentally we may mention that, as follows easily from 25.6, the quantities Γ also vanish in quasi-cartesian coordinates at the point of contact. For future reference we put down the formula

25.7    $(\Gamma^{i}_{jk})_0 = 0.$

In general, the point of contact in quasi-cartesian coordinates is a place where we have the closest possible approach to the situation which obtains in flat space when we use cartesian coordinates.
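Another formula that can be easily obtained from 25.5 and that we need later is obtained by differentiating 25.5 once more and setting $x_i = 0$. We get thus

25.8    $\left( \frac{\partial^2 g_{ij}}{\partial x_l \partial x_m} \right)_0 = \sum_{k=n+1}^{N} \left( \frac{\partial^2 x_k}{\partial x_i \partial x_l} \frac{\partial^2 x_k}{\partial x_j \partial x_m} + \frac{\partial^2 x_k}{\partial x_i \partial x_m} \frac{\partial^2 x_k}{\partial x_j \partial x_l} \right)_0.$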
Given a curved space by the formulas 25.2 we know how to find the g's. The question now arises: suppose we are given $\tfrac12 n(n+1)$ functions of the coordinates; is it possible to find a space for which these functions serve as the g's? The question is that of solving the system of partial differential equations (25.3), and without going into details we shall state that such a system of equations in general can be solved if the number of unknown functions is equal to that of equations; since we have here $\tfrac12 n(n+1)$ equations we must have that many unknown functions; that means that the number of dimensions N of the containing space must be in general $\tfrac12 n(n+1)$; in special cases it may, of course, be less than that. We may say then: a two-dimensional curved space given by its g's may be always considered as immersed into a 3-dimensional space; a three-dimensional curved space may be always considered as part of a six-dimensional flat space; and a four-dimensional as part of a ten-dimensional flat.
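The count used above is simply the number of independent components of a symmetric array with n rows. A minimal sketch (the function name is ours, introduced only for illustration) reproduces the three cases quoted in the text:

```python
def independent_g_count(n):
    """Number of independent g_ij for an n-dimensional space: the g's are
    symmetric in i and j, which leaves n(n + 1)/2 of them."""
    return n * (n + 1) // 2

for n in (2, 3, 4):
    print(n, independent_g_count(n))
```

This is only the general count of equations in 25.3; as the text remarks, in special cases a containing space of fewer dimensions may suffice.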
Another question is, whether for given real g's the containing space will come out real; and this is by no means always the case. We know that for $g_{11} = g_{22} = g_{33} = -g_{44} = 1$, all others zero, the minimum containing cartesian space is four-dimensional with one imaginary coordinate and it is clear that no real cartesian space can contain it.
Henceforth we may consider the curved space
as given by its g's, and the g's may be considered as arbitrarily given functions of the u's.

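It may seem that we lost from view the original purpose of introducing curved space, which was that of obtaining a tensor which we could identify with $T_{ij}$. We introduced the Riemann tensor having this in mind, but now we seem to be immersed in an entirely formal theory and far removed from the Riemann tensor; as a matter of fact, it is just around the corner; differentiation, although performed according to formulas that are formally the same as in flat space, has, as we shall see, a new content; in trying to discover the difference we will be led to the Riemann tensor from a new point of view.

26. New Derivation of the Riemann Tensor.

We said that the meaning of differentiation in curved space is different from that in flat space. To show this difference in one of its most important manifestations we start out with a tensor of the first rank given in its contravariant components $F^i$; we differentiate it twice to obtain a tensor of third rank $F^{i}{}_{,jk}$; in flat space this tensor would not differ from $F^{i}{}_{,kj}$ because in cartesian components differentiation of a tensor reduces to ordinary differentiation of its components, so that the cartesian components of the two tensors mentioned are

$\frac{\partial^2 F_i}{\partial x_j \partial x_k}$  and  $\frac{\partial^2 F_i}{\partial x_k \partial x_j}$

respectively, and these are equal because the result of ordinary differentiation does not depend on the order; two tensors having equal components in one system of coordinates would be equal in all systems of coordinates and so we have

26.1    $F^{i}{}_{,jk} - F^{i}{}_{,kj} = 0$  in flat space.

This reasoning does not apply to curved space; in fact, to find $F^{i}{}_{,jk}$ in a point P we have to differentiate $F^{i}{}_{,j}$ and in order to do that we have to know $F^{i}{}_{,j}$ in different points of the neighborhood of P; the finding of $F^{i}{}_{,j}$ in these points involves the use of quasi-cartesian systems for each of these points; we do not have then one quasi-cartesian system in which we can perform all our operations and the reasoning that led us to 26.1 breaks down. In spite of this the result might still hold. In order to show that it does not let us calculate the left hand side of 26.1 using the formulas which we deduced for flat space in Sections 22 and 23 and which, as we proved in Section 25, apply to curved space also.
We start with the contravariant components $F^i$; we calculate the first differential according to formula 23.6 to obtain

26.2    $F^{i}{}_{,j} = \frac{\partial F^i}{\partial u^j} + \Gamma^{i}_{\alpha j} F^\alpha;$

next, we differentiate this again, and get, according to formula 23.7

26.21    $F^{i}{}_{,jk} = \frac{\partial F^{i}{}_{,j}}{\partial u^k} + \Gamma^{i}_{\alpha k} F^{\alpha}{}_{,j} - \Gamma^{\alpha}_{jk} F^{i}{}_{,\alpha};$

now we form the difference we want to investigate, viz.,

$F^{i}{}_{,jk} - F^{i}{}_{,kj} = \frac{\partial F^{i}{}_{,j}}{\partial u^k} - \frac{\partial F^{i}{}_{,k}}{\partial u^j} + \Gamma^{i}_{\alpha k} F^{\alpha}{}_{,j} - \Gamma^{i}_{\alpha j} F^{\alpha}{}_{,k} - (\Gamma^{\alpha}_{jk} - \Gamma^{\alpha}_{kj}) F^{i}{}_{,\alpha};$

the last bracket vanishes according to 22.71, and what remains becomes after the substitution of the above expression 26.2 for the first differential and rearrangement of terms

$\left( \frac{\partial^2 F^i}{\partial u^j \partial u^k} - \frac{\partial^2 F^i}{\partial u^k \partial u^j} \right) + \left( \Gamma^{i}_{\alpha j} \frac{\partial F^\alpha}{\partial u^k} - \Gamma^{i}_{\beta j} \frac{\partial F^\beta}{\partial u^k} \right) + \left( \Gamma^{i}_{\alpha k} \frac{\partial F^\alpha}{\partial u^j} - \Gamma^{i}_{\beta k} \frac{\partial F^\beta}{\partial u^j} \right) + \left( \frac{\partial \Gamma^{i}_{\alpha j}}{\partial u^k} - \frac{\partial \Gamma^{i}_{\alpha k}}{\partial u^j} \right) F^\alpha + \left( \Gamma^{i}_{\beta k} \Gamma^{\beta}_{\alpha j} - \Gamma^{i}_{\beta j} \Gamma^{\beta}_{\alpha k} \right) F^\alpha.$

Here cancellation takes place in the first three pairs of terms: in the first as a result of independence of ordinary differentiation on order, in the next two pairs as a result of the fact that the name of the index of summation is immaterial; we come out with

26.3    $F^{i}{}_{,jk} - F^{i}{}_{,kj} = \left( \frac{\partial \Gamma^{i}_{\alpha j}}{\partial u^k} - \frac{\partial \Gamma^{i}_{\alpha k}}{\partial u^j} + \Gamma^{i}_{\beta k} \Gamma^{\beta}_{\alpha j} - \Gamma^{i}_{\beta j} \Gamma^{\beta}_{\alpha k} \right) F^\alpha$

or

26.4    $F^{i}{}_{,jk} - F^{i}{}_{,kj} = B^{i}{}_{\alpha jk} F^\alpha$

where

26.5    $B^{i}{}_{\alpha jk} = \frac{\partial \Gamma^{i}_{\alpha j}}{\partial u^k} - \frac{\partial \Gamma^{i}_{\alpha k}}{\partial u^j} + \Gamma^{i}_{\beta k} \Gamma^{\beta}_{\alpha j} - \Gamma^{i}_{\beta j} \Gamma^{\beta}_{\alpha k}.$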

Before we discuss the question whether the expression vanishes we want to show that the B's are the components of a tensor. In fact, multiplying both sides of 26.4 by $X_i Y^j Z^k$, where $X_i$, $Y^j$, $Z^k$ are components of arbitrary vectors, and contracting we have

$(F^{i}{}_{,jk} - F^{i}{}_{,kj})\, X_i Y^j Z^k = B^{i}{}_{\alpha jk} F^\alpha X_i Y^j Z^k.$

The left hand side is a scalar that has been obtained by legitimate operations and is, therefore, independent from the coordinate system used; so is therefore the right hand side, and this proves that the B's are the components of a tensor. (In order to see that this is an essential point and that not every symbol with indices may be considered as a tensor, the reader might consider the expression $\Gamma^{i}_{jk} X_i Y^j Z^k$; this expression is, obviously, not independent from the choice of coordinates since in cartesian coordinates the Γ's vanish, and in other systems they do not; the Γ's furnish thus an example of symbols with indices that can not be interpreted as components of a tensor).
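Now we can settle our question as to the vanishing of 26.3 by showing that the B's are mixed components of the Riemann tensor which has been introduced in Section 20. Since we have proved that they are components of a tensor we may use any system of coordinates, and we decide to use a quasi-cartesian system. In such a system the Γ's vanish at the point of contact (25.7) so that we are left with the terms

$B^{i}{}_{\alpha jk} = \frac{\partial \Gamma^{i}_{\alpha j}}{\partial x_k} - \frac{\partial \Gamma^{i}_{\alpha k}}{\partial x_j};$

substituting for the Γ's with one upper index their expressions in terms of the g's with upper indices and the Γ's with all indices down (22.92) we get

$B^{i}{}_{\alpha jk} = \frac{\partial g^{i\beta}}{\partial x_k} \Gamma_{\beta, \alpha j} - \frac{\partial g^{i\beta}}{\partial x_j} \Gamma_{\beta, \alpha k} + g^{i\beta} \left( \frac{\partial \Gamma_{\beta, \alpha j}}{\partial x_k} - \frac{\partial \Gamma_{\beta, \alpha k}}{\partial x_j} \right);$

the first two terms vanish again because the Γ's vanish at the point of contact; taking into account (25.41) that the g's are for the point of contact equal to the δ's, and using the expressions 22.9 we find after a few cancellations

$B_{i \alpha jk} = \frac12 \left( \frac{\partial^2 g_{ij}}{\partial x_\alpha \partial x_k} + \frac{\partial^2 g_{\alpha k}}{\partial x_i \partial x_j} - \frac{\partial^2 g_{\alpha j}}{\partial x_i \partial x_k} - \frac{\partial^2 g_{ik}}{\partial x_\alpha \partial x_j} \right)$

(the index i appears here as a subscript because the distinction between contravariant and covariant quantities vanishes for quasi-cartesian coordinates at the point of contact). Using here for the second derivatives of the g's the expressions 25.8 we find

$B_{i \alpha jk} = \sum_{p=n+1}^{N} \left( \frac{\partial^2 x_p}{\partial x_i \partial x_k} \frac{\partial^2 x_p}{\partial x_\alpha \partial x_j} - \frac{\partial^2 x_p}{\partial x_i \partial x_j} \frac{\partial^2 x_p}{\partial x_\alpha \partial x_k} \right)_0.$

Comparing this to the expression for the Riemann tensor deduced at the end of Section 20 we convince ourselves of the identity of the two expressions.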
This shows that, if the Riemann tensor does not vanish, the second differential of a vector field actually may depend on the order of differentiation. This fact is very interesting in itself, it confirms our statement that in curved space differentiation has a new meaning and it has many important implications, on which, however, we cannot dwell here. For us it is important that we have obtained an expression of the Riemann tensor in terms of the g's alone; this means that those properties of the curvature of space which are expressed in the Riemann tensor are determined by the metric of the space, i.e., if distances along different curves are given, the curvature (as far as it is expressed in the Riemann tensor) is determined. According to our conception, the inhabitants of the space certainly can measure lengths; it follows that curvature, as expressed by the Riemann tensor, is accessible to the inhabitants; it is an internal property of the space. In particular, for N = 3, n = 2, i.e., for the ordinary surface we obtain the fact that the total curvature can be calculated from the expression for the line element; this is Gauss's Theorema Egregium.
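The statement that curvature is determined by the g's alone can be tried out numerically. The following minimal sketch starts from nothing but the line element of the unit sphere, forms the Γ's from the g's by the usual formula $\Gamma^{i}_{jk} = \tfrac12 g^{il} (\partial_j g_{lk} + \partial_k g_{lj} - \partial_l g_{jk})$, and then forms the combination $\partial_k \Gamma^{i}_{\alpha j} - \partial_j \Gamma^{i}_{\alpha k} + \Gamma^{i}_{\beta k} \Gamma^{\beta}_{\alpha j} - \Gamma^{i}_{\beta j} \Gamma^{\beta}_{\alpha k}$ that multiplies $F^\alpha$ in the difference of the two second differentials. The index order and sign chosen for this combination, all function names, and the replacement of derivatives by central differences are assumptions of the sketch, not the notation of the text.

```python
import math

def metric(u):
    """g_ij of the unit sphere, ds^2 = dtheta^2 + sin^2(theta) dphi^2."""
    theta, _ = u
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]

def inverse2(g):
    """Inverse of a 2 x 2 matrix (the g's with upper indices)."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]

def shifted(u, k, h):
    v = list(u)
    v[k] += h
    return v

def d_metric(u, k, h=1e-5):
    """Central-difference approximation of d g_ij / d u^k."""
    gp, gm = metric(shifted(u, k, h)), metric(shifted(u, k, -h))
    return [[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(2)]
            for i in range(2)]

def gamma(u):
    """gam[i][j][k] = Gamma^i_{jk} computed from the g's alone."""
    ginv = inverse2(metric(u))
    dg = [d_metric(u, k) for k in range(2)]      # dg[k][i][j] = d_k g_ij
    return [[[0.5 * sum(ginv[i][l] * (dg[j][l][k] + dg[k][l][j] - dg[l][j][k])
                        for l in range(2))
              for k in range(2)] for j in range(2)] for i in range(2)]

def curvature(u, h=1e-4):
    """B[i][a][j][k] in the convention stated in the lead-in (an assumption)."""
    gam = gamma(u)
    dgam = []                                    # dgam[k][i][a][j] = d_k Gamma^i_{aj}
    for k in range(2):
        gp, gm = gamma(shifted(u, k, h)), gamma(shifted(u, k, -h))
        dgam.append([[[(gp[i][a][j] - gm[i][a][j]) / (2 * h)
                       for j in range(2)] for a in range(2)] for i in range(2)])
    return [[[[dgam[k][i][a][j] - dgam[j][i][a][k]
               + sum(gam[i][b][k] * gam[b][a][j] - gam[i][b][j] * gam[b][a][k]
                     for b in range(2))
               for k in range(2)] for j in range(2)] for a in range(2)]
            for i in range(2)]

theta = 1.0
B = curvature([theta, 0.3])
g = metric([theta, 0.3])
det_g = g[0][0] * g[1][1] - g[0][1] * g[1][0]
total_curvature = g[0][0] * B[0][1][1][0] / det_g
print(B[0][1][1][0], total_curvature)
```

With these conventions the quotient in the last lines is the total curvature of the surface; for the unit sphere it should come out close to 1 at every point, although no use has been made of the containing space.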

27. Differential Relations for the Riemann Tensor.

The method of quasi-cartesian coordinates in proving a relation between tensors that we used in identifying the B's with the components of the Riemann tensor can be applied often and helps to avoid lengthy computations. We shall use it now to prove certain differential relations for the Riemann tensor that are very important for us because we know that the tensor $T_{ij}$, which we want to identify with the contracted Riemann tensor, satisfies a certain differential equation, namely, $\partial T_{i\alpha} / \partial x_\alpha = 0$, and, of course, we expect the tensor in our mathematical theory with which we are going to identify T to satisfy the same relations. In order to deduce differential relations on the contracted Riemann tensor we have to prove first some relations for the non-contracted tensor. These relations have been discovered by Ricci and then rediscovered by Bianchi and bear the latter's name; they are

27.1    $R^{i}{}_{jmn,p} + R^{i}{}_{jnp,m} + R^{i}{}_{jpm,n} = 0.$

The proof is very simple if we use quasi-cartesian coordinates. In these coordinates the Γ's at the point of contact vanish and although the first derivatives of the Γ's do not, the components of the tensor obtained by differentiating the B's (formula 26.5) which we have identified with the R's will contain the second derivatives only, because the first derivatives will be multiplied by the Γ's themselves that do vanish. With this remark in mind the proof of the Bianchi relations does not present any difficulty; we simply substitute for each of the three terms in 27.1 the difference of the two second order derivatives and find that the result vanishes identically.
Now, in order to deduce from 27.1 the relations for the contracted tensor we raise in 27.1 the second index so that we have

$R^{ij}{}_{mn,p} + R^{ij}{}_{np,m} + R^{ij}{}_{pm,n} = 0$

and here we contract i with m, and j with n. We obtain

$R^{\alpha\beta}{}_{\alpha\beta,p} + R^{\alpha\beta}{}_{\beta p,\alpha} + R^{\alpha\beta}{}_{p\alpha,\beta} = 0.$

The second term here may be written as $-R^{\alpha\beta}{}_{p\beta,\alpha}$ using the fact (20.71) that the Riemann tensor changes its sign when two indices of the same pair are interchanged; and the third term is equal to the second as we can see by interchanging α and β (which does not change the value of the expression since α and β are summation indices), and then interchanging the first two indices, i.e., β and α, and the next two, i.e., p and β (each of these interchanges changes the sign, so that nothing is changed in the result). We have thus

$R^{\alpha\beta}{}_{\alpha\beta,p} - 2 R^{\alpha\beta}{}_{p\beta,\alpha} = 0.$

But $R^{\alpha\beta}{}_{p\beta}$ are the mixed components of the contracted Riemann tensor which we denote by $R^{\alpha}{}_{p}$ so that, dividing by 2 and changing the sign we have

$R^{\alpha}{}_{p,\alpha} - \tfrac12 R^{\alpha\beta}{}_{\alpha\beta,p} = 0.$

Finally, $R^{\alpha}{}_{\alpha}$ is the result of contraction of the contracted Riemann tensor; we denote this scalar by R (it is called the twice contracted Riemann tensor); then we can write for $R^{\alpha\beta}{}_{\alpha\beta,p}$ simply $R_{,p}$ or $(\delta^{\alpha}_{p} R)_{,\alpha}$ and our relation becomes

$\left( R^{\alpha}{}_{p} - \tfrac12 \delta^{\alpha}_{p} R \right)_{,\alpha} = 0.$

28. Geodesics.

In concluding this fragmentary development of the mathematical theory that we intend to apply to Physics in the next chapter we shall study briefly a class of curves in curved space which play an important part in the study of motion. These curves may be considered as generalizations of straight lines in flat space, and we shall begin by considering these.
In agreement with the point of view of differential geometry (Section 21) we shall characterize a straight line by differential equations. If it is given in parametric form (7.11) we obtain by differentiating twice with respect to the parameter and indicating differentiation by a dot placed over the letter

28.1    $\ddot x_i = 0.$

Since the choice of the parameter is in a high degree arbitrary, and for another choice of a parameter the representation may cease to be linear and the equations (28.1) may not hold any more - we cannot say that they characterize a straight line; a complete statement would be: a straight line is a curve for which there exists a parametric representation such that 28.1 holds.
Next, we translate 28.1 into the language of curvilinear coordinates, still keeping to flat space. We have, as in Section 21, except that we write now in agreement with Section 23, the index as a superscript,

$dx^i/dp = b^{i}_{\alpha}\, \dot u^\alpha,$

and differentiating once more,

$b^{i}_{\alpha}\, \ddot u^\alpha + \frac{\partial b^{i}_{\alpha}}{\partial u^\beta}\, \dot u^\alpha \dot u^\beta = 0.$

Multiplying by $a^{j}_{i}$ and summing with respect to i we get

$a^{j}_{i} b^{i}_{\alpha}\, \ddot u^\alpha + a^{j}_{i} \frac{\partial b^{i}_{\alpha}}{\partial u^\beta}\, \dot u^\alpha \dot u^\beta = 0$

or, taking into account 21.14 and 22.82

28.2    $\ddot u^j + \Gamma^{j}_{\alpha\beta}\, \dot u^\alpha \dot u^\beta = 0.$

We pass now to curved space; in general, we have here no straight lines but we may consider the same equations and investigate the properties of the curves represented by them. We introduce as our definition:
Geodesics are curves which satisfy for an appropriate choice of parameter Equation 28.2.
In studying geodesics it is often more convenient to consider not a single geodesic but a portion of space filled with geodesics, so that through every point there passes one and only one geodesic of the bunch. If we have this situation we have a vector $\dot u^i$ in every point of that portion of space, so that we have a vector field, and the components $\dot u^i$ may be considered as functions of the coordinates. We may then write

$d\dot u^i/dp = \frac{\partial \dot u^i}{\partial u^\alpha}\, \dot u^\alpha$

and equation 28.2 becomes

$\left( \frac{\partial \dot u^i}{\partial u^\alpha} + \Gamma^{i}_{\alpha\beta}\, \dot u^\beta \right) \dot u^\alpha = 0$

or, according to 23.6,

28.3    $\dot u^{i}{}_{,\alpha}\, \dot u^\alpha = 0.$

This form is very convenient in some cases. We shall use it to prove two properties of geodesics. In the first place we may discuss the meaning of the parameter that we are using. Consider the square of the tangent vector $\dot u_\alpha \dot u^\alpha$; we can prove that this quantity is constant along a geodesic. In fact, differentiating with respect to the parameter, we have

$\frac{d}{dp} (\dot u_\alpha \dot u^\alpha) = 2\, \dot u_\alpha\, \dot u^{\alpha}{}_{,\beta}\, \dot u^\beta$

and this vanishes according to 28.3. Since arc length is given by the formula

$s = \int \sqrt{\dot u_\alpha \dot u^\alpha}\; dp$

we see that, as a result of the fact that $\dot u_\alpha \dot u^\alpha$ is constant, s is proportional to p, or p is proportional to the arc length s. Since multiplication of the parameter by a constant factor will not affect equation 28.2 or 28.3 we may always consider that the parameter is arc length. This discussion does not apply, however, when the quantity $\dot u_\alpha \dot u^\alpha$ is zero, i.e., when $\dot u^i$ is a zero square vector. If we agree to call curves whose tangent vectors have zero square singular curves we may state the following:
Theorem. In case of a non-singular geodesic, the parameter mentioned in the definition of a geodesic and used in 28.2 and 28.3 is proportional or equal to arc length.
Next we may give an interpretation to equation 28.2 which sheds some light on the geometrical nature of geodesics. We may assume now that in all geodesics of the bunch arc length has been chosen as the parameter; then the vectors $\dot u^j$ are unit vectors and they characterize in each point the direction of the geodesic. The derivative $\dot u^{j}{}_{,i}$ characterizes the change of direction as we move in the direction given by the coordinate $u^i$ and $\dot u^{j}{}_{,\alpha} \dot u^\alpha$ gives the change of direction in the direction of the vector $\dot u^i$, i.e., in the direction of the geodesic itself. Since, according to 28.2 this quantity is zero we have proved that the direction of a geodesic does not change as we move along it. (The above discussion applies, strictly speaking, only to non-singular geodesics.)
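The constancy of the square of the tangent vector along a geodesic can be observed on a simple example. The sketch below integrates the geodesic equation $\ddot u^j + \Gamma^{j}_{\alpha\beta} \dot u^\alpha \dot u^\beta = 0$ for the unit sphere with line element $d\theta^2 + \sin^2\theta\, d\varphi^2$, for which the only non-vanishing Γ's are $\Gamma^{\theta}_{\varphi\varphi} = -\sin\theta\cos\theta$ and $\Gamma^{\varphi}_{\theta\varphi} = \cot\theta$. The choice of the sphere, the initial values, the step, and the Runge-Kutta method are assumptions made for the illustration.

```python
import math

def rhs(state):
    """Geodesic equation on the unit sphere as a first-order system in
    (theta, phi, dtheta/dp, dphi/dp)."""
    theta, phi, dtheta, dphi = state
    ddtheta = math.sin(theta) * math.cos(theta) * dphi ** 2
    ddphi = -2.0 * math.cos(theta) / math.sin(theta) * dtheta * dphi
    return [dtheta, dphi, ddtheta, ddphi]

def rk4_step(state, dp):
    """One classical fourth-order Runge-Kutta step of size dp."""
    k1 = rhs(state)
    k2 = rhs([s + 0.5 * dp * k for s, k in zip(state, k1)])
    k3 = rhs([s + 0.5 * dp * k for s, k in zip(state, k2)])
    k4 = rhs([s + dp * k for s, k in zip(state, k3)])
    return [s + dp * (a + 2 * b + 2 * c + d) / 6
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

def speed_squared(state):
    """Square of the tangent vector, g_ab du^a/dp du^b/dp."""
    theta, _, dtheta, dphi = state
    return dtheta ** 2 + math.sin(theta) ** 2 * dphi ** 2

state = [1.0, 0.0, 0.3, 0.8]          # illustrative initial point and direction
initial = speed_squared(state)
for _ in range(1000):
    state = rk4_step(state, 0.001)
final = speed_squared(state)
print(initial, final)
```

The two printed values should agree up to the error of the numerical integration, in accordance with the statement that the parameter of a non-singular geodesic is proportional to arc length.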


Chapter V.
GENERAL RELATIVITY

In Chapter I we introduced certain fundamental quantities, and we combined them into the symmetric tensor of rank two, $T_{ij}$. We found that this tensor satisfies the differential equation

$\partial T_{i\alpha} / \partial x_\alpha = 0,$

first for i = 1, 2, 3 and then, in Chapter III we showed that, as a result of the new identification introduced there, the fourth equation is also satisfied. We thought it desirable to build a mathematical theory in which a tensor of the same formal properties will appear in a natural way, and in the preceding Chapter IV we succeeded in actually setting up such a theory - the theory of curved space-time.
The structure of such a space, we found, is expressed in a tensor of rank four - the Riemann tensor, but we obtained from it by contraction a tensor of rank two - the contracted Riemann tensor. In investigating the differential properties of the Riemann tensor we found in Section 27 a relation of the type desired; it is satisfied by a tensor which differs slightly from the contracted Riemann tensor, namely, the tensor $R^{i}_{j} - \tfrac12 \delta^{i}_{j} R$ which we may call the corrected contracted Riemann tensor, and this is the tensor which we are going to identify with the physical tensor T so that our fundamental assumption will be

$T^{i}_{j} = R^{i}_{j} - \tfrac12 \delta^{i}_{j} R.$
Thus we decide to interpret T, and therefore our fundamental quantities of matter and electricity ρ, u, v, w, X, Y, Z, L, M, N, which went into it, in terms of structure of curved space as it is reflected in the contracted corrected Riemann tensor. But in doing this we find ourselves before a radically new situation. As we wanted, the tensor is now an expression of the properties of space, i.e., the space is now different from the one we had before - geometry and physics are now an organic whole and it is not clear what changes this brings with it; together with the desirable feature, namely the fact that T grew out of space, so to say, we may have brought in some not desirable and hard to manage features. But then there would be no gain if we could merely say that T is a geometrical thing now; we expected to gain something essential in undertaking the merging together of our geometry and physics; and now we stand before an accomplished fact and we have to see what it brought with it. We conjured up something and we do not seem to be able to stop, we have to go ahead and hope that the changes will be beneficial.

It might seem strange that we find a physical interpretation only for the contracted Riemann tensor, only for ten combinations of its twenty components. But this is quite in order. Should all the components of the Riemann tensor be used up in interpreting matter and electricity that would mean that where there is no matter (and electricity) space-time is flat (as far as internal properties are concerned); that would mean that matter acts only where it is; but we know that matter makes itself felt, for instance, by the gravitational field that it produces, also outside the region which it occupies, and this is in accord with our identification as a result of which only part of the components of the Riemann tensor vanish where there is no matter, so that the remaining components may be interpreted as corresponding to gravitational forces.

29. The Law of Geodesics.

In questions of celestial mechanics which we are going to treat now the effects of the electromagnetic field are usually negligible and we shall begin by equating to zero our electromagnetic tensor. Equation 24.9 becomes then

29.1    $T^{i}_{j} = \rho\, u^i u_j.$

According to our fundamental assumption, this tensor has been identified with the corrected contracted Riemann tensor, and it must satisfy the equation

29.2    $T^{\alpha}_{j,\alpha} = 0$

which formally is the same as our old equation of motion 24.8 but differs from it in that it has to be interpreted in curved space. The last two equations impose certain conditions on the velocity components $u^i$ and we want to find these conditions or, in other words, we want to eliminate density from the equations 29.1, 29.2. (We have been using in Chapter IV the letter u for the general coordinates - in this chapter we go back to our notation of the first three chapters and denote by $u^i$ again the four-dimensional velocity vector, and we shall denote the general coordinates by $x^i$.)
First of all we shall prove the following theorem due to Mineur.
Theorem. If the field $u^i$ satisfies the equations 29.1, 29.2 the vectors $u^i$ may be considered as tangent unit vectors to a family of geodesics filling the space.

Proof. We consider first the case when $u^i$ is a unit vector (and not a zero square vector), i.e., $u_\rho u^\rho = -1$. Differentiating this relation we have

29.3    $u_{\rho,i}\, u^\rho = 0.$

Substituting 29.1 into 29.2 we get

29.4    $\frac{\partial \rho}{\partial x^\alpha}\, u^\alpha u_j + \rho\, u^{\alpha}{}_{,\alpha}\, u_j + \rho\, u^\alpha u_{j,\alpha} = 0.$

Dividing by ρ and introducing the notation

$A = \frac{\partial \log \rho}{\partial x^\alpha}\, u^\alpha + u^{\alpha}{}_{,\alpha}$

we may write 29.4 as

29.5    $A u_j + u^\alpha u_{j,\alpha} = 0.$

Multiplying this by $u^j$ and summing with respect to j, for which we write ρ, we have

$A\, u_\rho u^\rho + u^\alpha u_{\rho,\alpha}\, u^\rho = 0$

which, according to 29.3 gives A = 0. Substituting this into 29.5 we obtain

29.6    $u^\alpha u_{j,\alpha} = 0$

which, according to 28.3 proves the theorem in the case considered.
But we also have to consider the case of propagation of light. In this case we do not have to consider any density ρ, the momentum vector being given by $q_i$ with $q_\rho q^\rho = 0$. The preceding proof breaks down in this case, but continuity considerations lead us to the result that, in this case also we can find a scalar field ρ such that $q^i/\rho$ will be tangent vectors to geodesics.
We conclude that in a gravitational field matter and light particles follow geodesics.
In the present chapter we are going to apply this result to the investigation of the motion of a planet and the propagation of light in the Solar system. We shall see that the changed significance of differentiation takes care, in a way, of what is usually accounted for by gravitational forces.

30. Solar System. Symmetry Conditions.

Our equations 29.1 and 29.2 describe relations existing between matter and field. We proved that the motion of matter is characterized by the geodesics of the curved space, but the curvature is in turn determined by matter. Theoretically, we may have a complete description of the situation, but in practice we do not know how to handle it, we do not know where to begin. In such cases we often resort to the method of successive approximations. Let us try to apply this method here. In investigating the motion of a planet around the sun we neglect in the first place the motion of the sun. Then, in the first approximation we neglect the mass of the planet, i.e., assume that there is no matter outside the sun. Since we have already neglected electromagnetism we have then that outside the sun the tensor T is zero so that, according to the fundamental assumption,

$R^{i}_{j} - \tfrac12 \delta^{i}_{j} R = 0.$

Contracting we get $R - \tfrac12 \cdot 4R = 0$, so that R = 0 and we have simply

30.1    $R^{i}_{j} = 0.$

These equations are known as Einstein's equations. We see that the statement that the corrected contracted Riemann tensor vanishes is equivalent to the statement that the contracted Riemann tensor vanishes.
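The contraction step can be written out in full. The following worked lines are a sketch in our own notation; the only fact used beyond the text is that in four dimensions the contraction of the δ's gives $\delta^{\alpha}_{\alpha} = 4$.

```latex
\begin{align*}
R^{i}{}_{j} - \tfrac12\,\delta^{i}{}_{j}\,R &= 0
  && \text{(vanishing of the corrected contracted tensor)} \\
R^{\alpha}{}_{\alpha} - \tfrac12\,\delta^{\alpha}{}_{\alpha}\,R
  &= R - \tfrac12 \cdot 4\,R = -R = 0
  && \text{(contracting $i$ with $j$)} \\
R^{i}{}_{j} &= \tfrac12\,\delta^{i}{}_{j}\,R = 0
  && \text{(substituting $R = 0$ in the first line)}
\end{align*}
```

Conversely, if the contracted tensor vanishes its contraction R vanishes too, and with it the corrected tensor, which is the equivalence stated above.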
As a result of our first approximation we derived thus the field equations 30.1. In the next approximation we introduce the planet and assume that its action on the field is negligible but that the field acts on it, i.e., that the motion of the planet is given by the geodesics of the field which has been determined in the preceding step; the motions will then be given by the equations 28.2

30.2    $\frac{d^2 x^i}{ds^2} + \Gamma^{i}_{\alpha\beta} \frac{dx^\alpha}{ds} \frac{dx^\beta}{ds} = 0$

in which the Γ's are calculated from the g's which have been found to satisfy 30.1.
Our problem, therefore, falls in two: first, to find a field satisfying the equations 30.1, and second, to find the geodesics of this field.
In this form the problem is comparable to the problem in Newtonian mechanics as explained in Section 1. There the field was given by the potential which had to satisfy the Laplace equation; here the field is given by the g's which have to satisfy the equations 30.1.
There the motion, after the field had been determined, was described by second order ordinary equations, differentiation being taken with respect to time; here motion is also described by second order differential equations, differentiation being with respect to s.
It is possible by making some special assumptions, neglecting certain quantities, for instance the derivatives of all the g's except $g_{44}$ and dropping some terms, to obtain the general Newtonian equations as a special or limiting case of our equations. The equations 30.1 would thus reduce to the Laplace equation 1.54 for $g_{44}$ and the equations of a geodesic to the equations of motion 1.1 in which X, Y, Z are given by 1.53, so that we could consider the general Newtonian theory of motion in a gravitational field as a first approximation to the theory of Relativity, but it is quite difficult in the general case to estimate what we neglect and the error we commit, and we prefer to compare the two theories on some concrete special cases.
All these cases will refer to what corresponds to a gravitational field produced by a single attracting center. We found in Section 1 such a field by using the general equations and, in addition, the condition of symmetry. We intend to follow an analogous course here. Our general equations are 30.1 and now we want to find what will correspond to the conditions of symmetry. The situation is much more complicated here. There the field could be characterized by a scalar φ and the condition of symmetry with respect to a point was simply expressed by stating that φ is a function of distance from that point; here the field is characterized by a tensor $g_{ij}$. There, in the second place, we worked in ordinary space; here we have space-time which has an additional coordinate, t. Last, there the space was given and in it distances were well defined; on this space was superimposed a field whose symmetry we had to discuss; here the field is not superimposed on a space with a given metric, but the metric itself constitutes a field which has to be determined by the symmetry condition.
We shall take up these three difficulties one by one.
In the first place let us consider a tensor field in ordinary space, and let
us impose on it the condition of symmetry with respect to a point. A tensor we
may consider (Section 9) as the left hand side of an equation of a central
quadric surface (we are interested in a symmetric tensor here, since the g's
are symmetric in the indices i and j - this symmetry we must try not to
confuse with the symmetry with respect to a point which we impose on the
field - and a symmetric tensor is sufficiently characterized by a quadratic
form) which we may consider as an ellipsoid. Our tensor field will then be
represented by an ellipsoid at every point of space. The field must allow
rotations around a fixed center O, i.e., such a rotation must bring the field
into itself; in other words, if a rotation brings a point P into a point Q it
must bring the ellipsoid at P into the ellipsoid at Q. In particular, a
rotation which leaves P unchanged must not change the ellipsoid at P. It is
clear that every ellipsoid must be an ellipsoid of revolution and that its
axis must be directed along the radius vector from O to P.
The ellipsoid at the point x, 0, 0 will be seen to be

        $A(\xi - x)^2 + B(\eta^2 + \zeta^2) = 1$

and for a general point P, if we use polar coordinates for P and
(considering, if it helps, the ellipsoid as infinitesimal) their differentials
for the points of the quadric relative to P,

30.3    $A\,dr^2 + B(d\theta^2 + \sin^2\theta\,d\varphi^2) = 1.$

Since ellipsoids at points equidistant from O must have the same dimensions,
A and B must be functions of r alone.
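
The symmetry requirement just stated can be checked numerically. The sketch below is only an illustration: it writes the quadratic form 30.3 in Cartesian components and verifies that a rotation about O, applied both to the point P and to the displacement, leaves its value unchanged. The functions `A_example` and `B_example` are arbitrary assumptions, not taken from the text.

```python
import math

def quadratic_form(p, d, A, B):
    """Value of A(r) dr^2 + B(r) (dtheta^2 + sin^2(theta) dphi^2) for a small
    displacement d at the point p, both given in Cartesian components."""
    r = math.sqrt(sum(c * c for c in p))
    n = [c / r for c in p]                          # unit radius vector from O to P
    radial = sum(ni * di for ni, di in zip(n, d))   # the radial part dr
    total = sum(di * di for di in d)                # squared length of d
    angular = (total - radial * radial) / (r * r)   # dtheta^2 + sin^2(theta) dphi^2
    return A(r) * radial * radial + B(r) * angular

def rotate(v, axis, angle):
    """Rotate v about an axis through O by the given angle (Rodrigues' formula)."""
    norm = math.sqrt(sum(c * c for c in axis))
    k = [c / norm for c in axis]
    kv = sum(ki * vi for ki, vi in zip(k, v))
    cross = [k[1] * v[2] - k[2] * v[1],
             k[2] * v[0] - k[0] * v[2],
             k[0] * v[1] - k[1] * v[0]]
    c, s = math.cos(angle), math.sin(angle)
    return [v[i] * c + cross[i] * s + k[i] * kv * (1 - c) for i in range(3)]

# Arbitrary sample coefficient functions of r alone (assumptions for illustration).
def A_example(r):
    return 1.0 + 1.0 / r

def B_example(r):
    return r * r + 0.5 * r

if __name__ == "__main__":
    p = [1.0, 2.0, -0.5]
    d = [0.01, -0.02, 0.03]
    before = quadratic_form(p, d, A_example, B_example)
    after = quadratic_form(rotate(p, [1, 1, 0], 0.7), rotate(d, [1, 1, 0], 0.7),
                           A_example, B_example)
    print(before, after)
```

If A or B were allowed to depend on the direction of P and not only on r, the two printed values would in general differ.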
The left hand side of this equation gives a tensor field which satisfies the
condition of symmetry with respect to a point. Next, we consider the
complication resulting from the introduction of time. In Section 1 time was
not mentioned; this means that the field was considered as independent of
time, or static; we may say that the field must not be affected by a change in
t or, from the four-dimensional point of view, by a translation along the
t-axis. This is a requirement of the same character as that of symmetry with
respect to a point; from the four-dimensional point of view we may combine the
two requirements and say that the field must be symmetric with respect to a
line - the t-axis. But the field now is a field in four-space; it will be
represented by a quadratic form in $dr$, $d\theta$, $d\varphi$ and $dt$. For
$dt = 0$ it must reduce to the field given before; the coefficients must be
independent of t, corresponding to the requirement that the field be static;
and a change from t to -t must also not affect the field (reversibility of
time) so that terms of the quadratic form involving dt to the first power must
be absent. It follows that the addition of the fourth dimension results in the
addition of only one term to our tensor which now may be written as

30.4    $A\,dr^2 + B(d\theta^2 + \sin^2\theta\,d\varphi^2) + C\,dt^2$

where C, as well as A and B, are functions of r alone.
And now we have to overcome the last difficulty, that connected with the fact that our
space is curved and that we cannot define symmetry in terms of rotations because rotation
means a transformation in which distances are
preserved, and distances are defined only by the
field of the g's which we want to determine by
the requirement that it be not affected by rotations. To overcome this difficulty we have to
agree on some other definition of symmetry, and
it seems natural to adopt as such the following:
in order to define a symmetry for a curved space
we shall compare it with a flat space by establishing a one-to-one correspondence between the
points of the two spaces. Corresponding to every transformation of the flat space we will
have then a transformation of the curved space;
and we shall say that the curved space possesses the same symmetry as a field F in the flat
space if the metric of the curved space - as
given by the g's - is not affected by those
transformations of the curved space which correspond to the transformations in flat space
not affecting the field F.
Suppose now that we have such a curved space. This implies that we have a
one-to-one correspondence with the flat space, and we may use for the points
of the curved space the same coordinates that we use for the corresponding
points of the flat space. It is clear that 30.4 will satisfy the requirements,
so that we can take it for our fundamental tensor, or as we shall say (compare
21.8) for our $ds^2$.
But the quantities r, $\theta$, $\varphi$, t, which have definite geometrical
significance in flat space lose it in curved space - they are just numbers
which we use to characterize different points as we use numbers to
characterize houses on a street. There is no reason why we should not replace
them by other numbers, i.e., transform our coordinates, if it would simplify
our formulas. Now, it is clear that transformations involving $\theta$,
$\varphi$, t will make our expression 30.4 more complicated because it would
introduce these coordinates into the coefficients. But we could choose a
transformation on r alone so as to simplify that expression; we could, for
instance, reduce any one coefficient to a prescribed function of the new r. We
make this choice in such a way as to reduce B to $r^2$ because, in a way, it
restores to r a geometrical meaning as we shall see presently. If we write
$\xi(r)$ and $-\eta(r)$ for the functions of the new r which now appear
instead of A and C, and interpret 30.4 as giving $-ds^2$, in accordance with
the standardization of the parameter adopted in Section 12, our final formula
will be

30.5    $-ds^2 = \xi(r)\,dr^2 + r^2(d\theta^2 + \sin^2\theta\,d\varphi^2) - \eta(r)\,dt^2.$

Letting here r and t have constant values we have a surface, and a simple
calculation would show that $1/r^2$ is the total curvature of this surface,
which gives a geometrical meaning to r.
Our task is now accomplished, we have imposed on our space the conditions of
symmetry and we have next to impose on it the general equations 30.1.

31. Solution of the Field Equations.

We are now at a stage which corresponds to the assumption that the potential
$\varphi$ is a function of r alone in Section 1, and our next task corresponds
to the substitution of $\varphi(r)$ into Laplace's equation. Instead of one
unknown function $\varphi(r)$ we have here the ten g's determined by 30.5
which we may write out as

31.1    $g_{11} = \xi(r),\quad g_{22} = r^2,\quad g_{33} = r^2\sin^2\theta,\quad g_{44} = -\eta(r),$    all others zero,

and which involve two unknown functions. In order to determine these functions
we have to substitute 31.1 into 30.1. In the first place we have to calculate
the g's with the upper indices from the formulas (2.6)

        $g_{i\alpha}\,g^{j\alpha} = \delta_i^j.$

Since the g's with two distinct lower indices vanish, only those terms on the
left are not zero in which $\alpha = i$ and we have

        $g_{ii}\,g^{ji} = \delta_i^j$    (no summation);

for $i \neq j$ the right hand sides are zero, and since the first factors on
the left are not zero the second must vanish; we see thus that the g's with
two distinct upper indices also vanish. For $j = i$ we have unity on the right
and thus

31.2    $g^{11} = 1/\xi(r),\quad g^{22} = 1/r^2,\quad g^{33} = 1/(r^2\sin^2\theta),\quad g^{44} = -1/\eta(r),$    all others zero.

In what follows

        $x_1 = r,\quad x_2 = \theta,\quad x_3 = \varphi,\quad x_4 = t.$

Differentiation with respect to r will be denoted, as in Section 1, by $'$. We
next calculate the $\Gamma$'s with all indices down according to 22.9 and
obtain, omitting those that come out zero,

31.3    $\Gamma_{1,11} = \tfrac12\xi',\quad \Gamma_{1,22} = -r,\quad \Gamma_{2,12} = r,$
        $\Gamma_{1,33} = -r\sin^2\theta,\quad \Gamma_{3,13} = r\sin^2\theta,$
        $\Gamma_{2,33} = -r^2\sin\theta\cos\theta,\quad \Gamma_{3,23} = r^2\sin\theta\cos\theta,$
        $\Gamma_{1,44} = \tfrac12\eta',\quad \Gamma_{4,14} = -\tfrac12\eta'.$

Raising of an index is accomplished in this case simply by multiplying by the
g with the index to be raised appearing above twice, because the sum
$g^{i\alpha}\Gamma_{\alpha,jk}$ which, according to 23.3, is equal to
$\Gamma^i_{jk}$ reduces, as the result of the vanishing of the g's with two
distinct upper indices, to one term, namely $g^{ii}\Gamma_{i,jk}$. This
permits us to write out easily the $\Gamma$'s with one index above:

31.31   $\Gamma^1_{11} = \xi'/2\xi,\quad \Gamma^1_{22} = -r/\xi,\quad \Gamma^2_{12} = 1/r,$
        $\Gamma^1_{33} = -r\sin^2\theta/\xi,\quad \Gamma^3_{13} = 1/r,$
        $\Gamma^2_{33} = -\sin\theta\cos\theta,\quad \Gamma^3_{23} = \cot\theta,$
        $\Gamma^1_{44} = \eta'/2\xi,\quad \Gamma^4_{14} = \eta'/2\eta.$
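
The $\Gamma$'s of a diagonal fundamental tensor can be checked by numerical differentiation. The sketch below is a hedged illustration, not the author's procedure: it assumes the usual definition $\Gamma_{i,jk} = \tfrac12(\partial_j g_{ik} + \partial_k g_{ij} - \partial_i g_{jk})$, uses 0-based indices (so index 0 is r and index 3 is t), and takes two arbitrary sample functions `xi_example` and `eta_example` for the unknown coefficients.

```python
import math

def metric(x, xi, eta):
    """Diagonal components g_11, g_22, g_33, g_44 at x = (r, theta, phi, t)."""
    r, theta = x[0], x[1]
    return [xi(r), r * r, (r * math.sin(theta)) ** 2, -eta(r)]

def d_metric(x, k, i, xi, eta, h=1e-6):
    """Central-difference estimate of the derivative of g_ii with respect to x_k."""
    xp, xm = list(x), list(x)
    xp[k] += h
    xm[k] -= h
    return (metric(xp, xi, eta)[i] - metric(xm, xi, eta)[i]) / (2 * h)

def christoffel_first(x, i, j, k, xi, eta):
    """Gamma_{i,jk} for a diagonal metric (all indices 0-based)."""
    term = 0.0
    if i == k:
        term += d_metric(x, j, i, xi, eta)   # d_j g_ik, non-zero only when i == k
    if i == j:
        term += d_metric(x, k, i, xi, eta)   # d_k g_ij, non-zero only when i == j
    if j == k:
        term -= d_metric(x, i, j, xi, eta)   # d_i g_jk, non-zero only when j == k
    return 0.5 * term

def christoffel_second(x, i, j, k, xi, eta):
    """Gamma^i_{jk}: the index is raised by dividing by g_ii (diagonal metric)."""
    return christoffel_first(x, i, j, k, xi, eta) / metric(x, xi, eta)[i]

# Arbitrary sample functions standing in for the unknown xi(r) and eta(r).
def xi_example(r):
    return 1.0 + 1.0 / (r * r)

def eta_example(r):
    return 1.0 - 1.0 / r

if __name__ == "__main__":
    x = [3.0, 0.9, 0.4, 0.0]
    print(christoffel_first(x, 0, 1, 1, xi_example, eta_example))   # close to -r
    print(christoffel_second(x, 2, 1, 2, xi_example, eta_example))  # close to cot(theta)
```

Because only the diagonal g's are non-zero, each of the three derivative terms survives only when two of the indices coincide, which is why so few $\Gamma$'s differ from zero.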

Next we have to calculate those components of the Riemann tensor which appear
in the expressions for the components of the contracted Riemann tensor, i.e.,
those with the first index equal to the one before last or those of the type
$R^i_{jih}$. We do not write those out but state that the result of the
calculation with their aid of the components of the contracted Riemann tensor
is, that all these components with two distinct indices vanish and the others
are

        ...

It is more convenient to operate with the mixed components of the contracted
Riemann tensor (although it is not necessary, and the reader might for the
sake of practice go through the same calculations using covariant components),
and these are obtained from the last formulas by multiplication by the
corresponding g with upper indices; we obtain thus

31.4    ...

We come now to the ten equations 30.1 that we have to satisfy; six of them,
namely, those in which $i \neq j$ are satisfied identically because our R's as
well as the $\delta$'s vanish for distinct indices. Of the remaining four
equations the second and the third are identically the same because of the
equality of the corresponding values of R in 31.4. Three equations remain,
viz.,

31.5    ...

Subtracting the last one from the first we have

31.6    ...

whence

        $\xi\eta = \text{const.}$

By choosing our unit of time appropriately we can reduce this constant to 1,
so that

31.7    $\xi\eta = 1$    or    $\xi = 1/\eta.$

Using 31.6 and 31.7 in the second of the equations 31.5 we obtain

        $1 - \eta - \eta' r = 0,$

which gives

31.8    $\eta = 1 - \gamma/r,$

where $\gamma$ denotes a constant of integration.
Our field then is given by

31.9    $-ds^2 = dr^2/\eta + r^2 d\theta^2 + r^2\sin^2\theta\,d\varphi^2 - \eta\,dt^2,$

where $\eta$ is given by 31.8.

32. Equations of Geodesics.

We first consider the non-zero geodesics which correspond to a material
particle. We know that in this case arc-length can be taken as parameter so
that the curve in addition to the equations 30.2 must satisfy the equation
30.5 which we may write as

32.1    $\dot r^2/\eta + r^2\dot\theta^2 + r^2\sin^2\theta\,\dot\varphi^2 - \eta\,\dot t^2 = -1;$

we shall, however, make our discussion slightly more general and write A in
the right hand side with a view of using the results also in the case
corresponding to a light particle. We shall discuss this equation together
with the equations 30.2 which become here

32.21   $\ddot r - (\eta'/2\eta)\,\dot r^2 - r\eta\,\dot\theta^2 - r\eta\sin^2\theta\,\dot\varphi^2 + \tfrac12\eta\eta'\,\dot t^2 = 0,$

32.22   $\ddot\theta + (2/r)\,\dot r\dot\theta - \sin\theta\cos\theta\,\dot\varphi^2 = 0,$

32.23   $\ddot\varphi + (2/r)\,\dot r\dot\varphi + 2\cot\theta\,\dot\theta\dot\varphi = 0,$

32.24   $\ddot t + (\eta'/\eta)\,\dot r\dot t = 0.$

The choice of the coordinates $\theta$ and $\varphi$ is at our disposal. We
choose them in such a way that the initial position of the particle be on the
equator and that the tangent be tangent to the equator. In this case
$\theta = \pi/2$ and $\dot\theta$ = zero at the initial moment and the second
equation shows that $\theta = \pi/2$ always. Now the last two equations may be
integrated once each and they furnish

32.3    $r^2\dot\varphi = h,$

32.4    $\eta\,\dot t = k,$

where h and k are constants. Together with these two equations we have to
consider the one corresponding to 32.1; viz.,

32.5    $\dot r^2/\eta + r^2\dot\varphi^2 - \eta\,\dot t^2 = A.$
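
The statement that $r^2\dot\varphi$, $\eta\dot t$ and the left hand side of 32.5 stay constant along a geodesic can be tested numerically. The following is a minimal sketch under stated assumptions: it takes $\eta = 1 - \gamma/r$ with the arbitrary value $\gamma = 2$, restricts the motion to the plane $\theta = \pi/2$, writes the geodesic equations for r, $\varphi$, t in that plane, and integrates them with a classical Runge-Kutta step. The starting values are hypothetical.

```python
import math

GAMMA = 2.0  # assumed value of the constant in eta = 1 - GAMMA / r

def eta(r):
    return 1.0 - GAMMA / r

def eta_prime(r):
    return GAMMA / (r * r)

def rhs(state):
    """Derivatives of (r, phi, t, r', phi', t') with respect to the parameter,
    from the geodesic equations in the plane theta = pi/2."""
    r, phi, t, vr, vphi, vt = state
    e, de = eta(r), eta_prime(r)
    ar = de / (2 * e) * vr * vr + r * e * vphi * vphi - 0.5 * e * de * vt * vt
    aphi = -2.0 / r * vr * vphi
    at = -de / e * vr * vt
    return [vr, vphi, vt, ar, aphi, at]

def rk4_step(state, step):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(state)
    k2 = rhs([s + 0.5 * step * k for s, k in zip(state, k1)])
    k3 = rhs([s + 0.5 * step * k for s, k in zip(state, k2)])
    k4 = rhs([s + step * k for s, k in zip(state, k3)])
    return [s + step / 6.0 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

def integrate(state, step, n_steps):
    for _ in range(n_steps):
        state = rk4_step(state, step)
    return state

def invariants(state):
    """The quantities h, k and A that should stay constant along the curve."""
    r, _, _, vr, vphi, vt = state
    h = r * r * vphi
    k = eta(r) * vt
    A = vr * vr / eta(r) + r * r * vphi * vphi - eta(r) * vt * vt
    return h, k, A

def initial_state(r0, vphi0):
    """Start at a turning point (r' = 0) with t' chosen so that A = -1."""
    vt0 = math.sqrt((1.0 + r0 * r0 * vphi0 * vphi0) / eta(r0))
    return [r0, 0.0, 0.0, 0.0, vphi0, vt0]

if __name__ == "__main__":
    start = initial_state(20.0, 0.012)
    end = integrate(start, 0.1, 5000)
    print(invariants(start))
    print(invariants(end))
```

The two printed triples should agree to many decimal places, which is the numerical counterpart of the two first integrals and of the normalization A = -1.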

We simplify our system of equations in the following way: (a) we eliminate t
by means of 32.4; (b) we eliminate differentiation with respect to the
parameter by using $\dot r = (dr/d\varphi)\,\dot\varphi$ and 32.3; (c) we
introduce as a new variable, as is customary in celestial mechanics, the
inverse distance $u = 1/r$, instead of r, so that

32.6    $\dot r = -h\,du/d\varphi;$

and, (d) we substitute the value for $\eta$ from 31.8. We obtain in this way a
differential equation between u and $\varphi$; viz.,

32.7    $(du/d\varphi)^2 + u^2 = \alpha - (A\gamma/h^2)\,u + \gamma u^3,$

where $\alpha$ is a constant. This equation may be considered as the equation
of the orbit of a planet.

33. Newtonian Motion of a Planet.

Every reader knows, of course, that according to the Newtonian theory a planet
moves around the sun on an ellipse in one of whose foci the sun is situated,
although he may not be in the possession of a proof of that; we shall not give
a proof of that here either, but we shall discuss in detail only one feature
of the situation. The vertex of the ellipse which is nearest to the focus in
which the sun is located is called the perihelion, the other vertex - the
aphelion; the line joining the perihelion and the aphelion is the major axis,
and therefore passes through the sun.
Using the coordinates u and $\varphi$ corresponding to those of the preceding
section we may say that the perihelion corresponds to the maximum value of u,
and the aphelion to the minimum value of u, and that the transition from the
maximum to the minimum value of u corresponds to the change of $\varphi$ by
the amount $\pi$. It is this last fact that we shall deduce from the equations
of motion. We may (corresponding to the fact that we set $\theta = \pi/2$ in
the preceding section) consider a motion in the xy-plane characterized by the
equations (see 1.1 and 1.3)

33.1    $d^2x/dt^2 = -Mx/r^3,\qquad d^2y/dt^2 = -My/r^3.$

We have now to introduce variables corresponding to those used above in the
Relativity treatment, i.e., to set

        $x = \cos\varphi/u,\qquad y = \sin\varphi/u.$

We calculate the first and the second derivatives of x and y with respect to
t, substitute them into 33.1 and combining the terms with $\cos\varphi$ and
those with $\sin\varphi$ we get

33.2    $P\cos\varphi + Q\sin\varphi = 0,\qquad Q\cos\varphi - P\sin\varphi = 0,$

where, for brevity,

        $P = 2u^{-3}(du/dt)^2 - u^{-2}\,d^2u/dt^2 - u^{-1}(d\varphi/dt)^2 + Mu^2,$
        $Q = 2u^{-2}(du/dt)(d\varphi/dt) - u^{-1}\,d^2\varphi/dt^2.$

Multiplying the first of these equalities by $\sin\varphi$, the second by
$\cos\varphi$ and adding the results we obtain

        $2u^{-2}(du/dt)(d\varphi/dt) - u^{-1}\,d^2\varphi/dt^2 = 0.$

The last equation may be written as

        $\dfrac{d^2\varphi/dt^2}{d\varphi/dt} = \dfrac{2\,du/dt}{u},$

whence

33.3    $d\varphi/dt = Hu^2,$

where H is a constant of integration. We next want to eliminate t from 33.2
with the help of the last formula. Differentiating it we have

        $d^2\varphi/dt^2 = 2Hu\,du/dt,$

and then

        $\dfrac{du}{dt} = \dfrac{du}{d\varphi}\,\dfrac{d\varphi}{dt} = Hu^2\,\dfrac{du}{d\varphi},\qquad \dfrac{d^2u}{dt^2} = H^2u^4\,\dfrac{d^2u}{d\varphi^2} + 2H^2u^3\left(\dfrac{du}{d\varphi}\right)^2.$

Substituting into 33.2 we arrive at

        $2H^2u\,(du/d\varphi)^2 - H^2u^2\,d^2u/d\varphi^2 - 2H^2u\,(du/d\varphi)^2 - H^2u^3 + Mu^2 = 0$

and, after two terms cancel, at

33.4    $d^2u/d\varphi^2 + u = M/H^2.$

This, we easily see, may be obtained by differentiation from

33.5    $(du/d\varphi)^2 + u^2 = \alpha + 2Mu/H^2,$

where $\alpha$ is a constant. This corresponds to equation 32.7 obtained from
Relativity theory in the preceding section; in that last equation we have, of
course, to take A = -1 if we consider the motion of a planet so that it
becomes

33.51   $(du/d\varphi)^2 + u^2 = \alpha + \gamma u/h^2 + \gamma u^3,$

and we see that the difference is essentially in only one term. But before we
come to the comparison of the motions described by these two equations we have
to continue the discussion of 33.5. The character of motion described by it
depends on the values of the constants appearing in it, and also on the
initial conditions. We begin the discussion by writing 33.5 in the form

33.52   $(du/d\varphi)^2 = -(u - u_1)(u - u_2),$

where $u_1$ and $u_2$ are the two roots of the polynomial
$u^2 - 2Mu/H^2 - \alpha$. If the two roots are complex, or equal, the right
hand side of 33.52 is negative and we cannot have real motion. Also when both
roots are negative the right hand side is negative for positive values of u
(and u, being the inverse distance, must be positive). The case of one
positive and one negative root corresponds to u changing from 0 to a finite
value and then going back to zero, for instance, a comet approaching the sun
from an infinite distance and then receding back into infinity. But we want to
treat the case of a planet, and this will obviously correspond to the only
remaining case; viz., that of two distinct positive roots. If by $u_1$ we
denote the larger and by $u_2$ the smaller of the two roots it will be
convenient to write our equation as

33.53   $(du/d\varphi)^2 = (u_1 - u)(u - u_2),$

and we see that a real solution is only possible when u is between $u_2$ and
$u_1$. The motion will manifest itself in an oscillation of u between $u_2$
and $u_1$ and the sign of $du/d\varphi$ will change at these points. The
particular question we want to investigate is, as was mentioned at the
beginning of the section, to what change of $\varphi$ corresponds one
oscillation of u, between $u_2$ and $u_1$ say. In order to find this we solve
the equation for $d\varphi$, obtaining

        $d\varphi = \dfrac{du}{\sqrt{(u_1 - u)(u - u_2)}},$

whence

33.6    $\varphi_1 - \varphi_2 = \displaystyle\int_{u_2}^{u_1} \dfrac{du}{\sqrt{(u_1 - u)(u - u_2)}};$

a change of variable will help us to evaluate this integral. We put

33.7    $u = u_2 + (u_1 - u_2)\sin^2 x;$

when x changes from 0 to $\pi/2$, u will increase from $u_2$ to $u_1$ as
required. We have

33.8    $du = 2(u_1 - u_2)\sin x\cos x\,dx,\quad u - u_2 = (u_1 - u_2)\sin^2 x,\quad u_1 - u = (u_1 - u_2)\cos^2 x,$

and the integral becomes

33.9    $\varphi_1 - \varphi_2 = \displaystyle\int_0^{\pi/2} 2\,dx = \pi.$

The answer to our question is then, that $\varphi$ changes exactly through
$\pi$ while u performs an oscillation between its minimum and its maximum
values, which corresponds to the fact mentioned before that the aphelion and
the perihelion are on a straight line with the sun, which fact we thus proved.
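
The result of this section, that $\varphi$ changes by $\pi$ between two successive extreme values of u, can be reproduced numerically. The sketch below is an illustration only: it finds the two roots of $u^2 - 2Mu/H^2 - \alpha$, and evaluates the integral of $du/\sqrt{(u_1 - u)(u - u_2)}$ by the midpoint rule after replacing u by $u_2 + (u_1 - u_2)\sin^2 x$. The numerical values of M, H and $\alpha$ are arbitrary assumptions.

```python
import math

def apsidal_roots(M, H, alpha):
    """Roots (u2, u1), u2 < u1, of u^2 - 2Mu/H^2 - alpha = 0.
    Returns None when the roots are complex or equal (no real motion)."""
    b = M / (H * H)
    disc = b * b + alpha
    if disc <= 0.0:
        return None
    s = math.sqrt(disc)
    return b - s, b + s

def apsidal_angle(u2, u1, n=1000):
    """Midpoint-rule value of the integral of du / sqrt((u1 - u)(u - u2))
    from u2 to u1, using u = u2 + (u1 - u2) sin^2 x with 0 < x < pi/2."""
    dx = (math.pi / 2) / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        u = u2 + (u1 - u2) * math.sin(x) ** 2
        du_dx = 2.0 * (u1 - u2) * math.sin(x) * math.cos(x)
        total += du_dx / math.sqrt((u1 - u) * (u - u2)) * dx
    return total

if __name__ == "__main__":
    roots = apsidal_roots(M=1.0, H=4.0, alpha=-0.003)  # hypothetical planet-like case
    u2, u1 = roots
    print(u2, u1, apsidal_angle(u2, u1), math.pi)
```

With the sample values both roots are positive, which is the planetary case of the text; a positive $\alpha$ would give one negative root, the case of the comet.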

34. Relativity Motion of a Planet.

Following this excursion into celestial mechanics according to Newton we
return to our Relativity formulas which we shall treat by comparing them to
the formulas derived in the last section.
At this stage we come again upon a fundamental question: we have two theories;
the quantities of one of them have been identified with measured quantities,
and this identification proved, in the main, a splendid success; if the new
theory is to be applied successfully, it is clear that it has essentially to
agree with the old theory with which it may be compared instead of being
compared with results of measurement directly; that means that we have to
establish a correspondence between quantities of the two theories, and we have
to expect that the corresponding quantities of the two theories obey
approximately the same relations. This correspondence has been anticipated in
the preceding pages by using the same letters for quantities which it is
intended to identify. But it may not be superfluous to remind the reader that
the quantities u, $\varphi$, $\theta$ of the two theories are not the same;
there is a certain arbitrariness in choosing coordinates in curved space, and
it must be especially obvious in the case of r (of which u is the inverse); it
is possible to substitute for r some simple function of r, and, indeed, it has
been done; the criterion of correctness of choice must lie in the success of
the identification.

Next we must identify the constants of the new theory with those of the old.
It would seem as though we must, in order to reach an agreement, make
$\gamma = 0$ so as to get rid of the last term of the equation 33.51 by which
it differs essentially from 33.5. But this would annihilate also the preceding
term in the new formula and so spoil the correspondence altogether. We must,
therefore, ascribe to $\gamma$ a finite value, but we will expect that it will
be small; more precisely, it will be small in such a way that the term
$\gamma u^3$ will not affect the equation 33.51 essentially or will be small
in comparison with $u^2$. Next, let us compare 32.3 with 33.3. Of course, the
left hand sides differ by the factor dt/ds, but this is equal (Section 13) to
$1/\sqrt{1 - \beta^2}$ which is, even for motions of planets, very close to
one, so that, in the first approximation we may identify h with H. Comparing
now 33.5 and 33.51 we come to the conclusion that

34.1    $\gamma = 2M,$

so that we may write 33.51 as

34.2    $(du/d\varphi)^2 + u^2 = \alpha + 2Mu/h^2 + 2Mu^3.$

After we have made these identifications the situation is then this: if we
neglect the term $2Mu^3$ in the equation, and this term is negligible in most
cases, we have the same equation of the orbit as in Newtonian mechanics. This
result is very satisfactory; we have been able to obtain the equations of
motion of a planet without considering any gravitational forces, as a result
of our identification of the contracted Riemann tensor with the complete
tensor.
Still the term $2Mu^3$ is there; the Relativity theory predicts an orbit that
is slightly different from that predicted by the Newtonian theory; is the
difference within the error of observation? We shall consider now this
question, but instead of considering the motion as a whole, we shall consider
only the feature of it which for the case of Newtonian motion has been
considered in the preceding section; viz., we shall ask ourselves whether,
corresponding to an oscillation of u between a minimum and a maximum value,
the change in $\varphi$ will be exactly $\pi$. Of course, we are sure that in
the new theory there will be motions which differ but slightly from the motion
considered in the preceding section, so that the general character of the
motion will be the same, and u will oscillate between a minimum and maximum.
The value of $(du/d\varphi)^2$ will now be expressed by a polynomial of degree
three the first two terms of which are

        $2Mu^3 - u^2.$

The sum of the three roots of this polynomial is 1/2M so that if $u_1$, $u_2$
denote two roots, viz., those two roots which differ but slightly from the
roots denoted in the same way in the preceding case, the third root will be

        $1/2M - u_1 - u_2,$

and the integral corresponding to 33.6 will be

        $\varphi_1 - \varphi_2 = \displaystyle\int_{u_2}^{u_1} \dfrac{du}{\sqrt{(u_1 - u)(u - u_2)\,[1 - 2M(u_1 + u_2 + u)]}};$

the same substitution 33.7 as before will be applied. We only have to
calculate

        $u_1 + u_2 + u = u_1 + u_2 + u_2 + (u_1 - u_2)\sin^2 x = u_1 + u_2 + u_1\sin^2 x + u_2\cos^2 x,$

so that the integral becomes

        $\displaystyle\int_0^{\pi/2} \dfrac{2\,dx}{\sqrt{1 - 2M(u_1 + u_2 + u_1\sin^2 x + u_2\cos^2 x)}}.$

As we saw before, M is a very small quantity; before, we neglected it
altogether and obtained $\pi$ for the value of the integral; now, we shall go
to the next approximation; we shall develop the denominator according to the
powers of M and neglect all terms beyond the second (it would be a very easy
but not a worthwhile matter to estimate the value of the error); we get in
this way, as an approximate value for $\varphi_1 - \varphi_2$,

        $\displaystyle\int_0^{\pi/2} 2\,[1 + M(u_1 + u_2 + u_1\sin^2 x + u_2\cos^2 x)]\,dx,$

or

        $\pi + \tfrac{3}{2}\pi M(u_1 + u_2).$

The new theory predicts then that the angle $\varphi$ will have changed by
this amount while the distance from the sun changes from its minimum to its
maximum; i.e., that the perihelion and aphelion are not in a straight line
with the sun but that the planet moves through an additional angle of
$\tfrac{3}{2}\pi M(u_1 + u_2)$ after reaching the position opposite the one
where it was during the perihelion, before reaching the aphelion. Since the
same situation applies to the motion between an aphelion and the next
perihelion we see that between two consecutive perihelia the planet will have
moved through an angle $2\pi + 3\pi M(u_1 + u_2)$, or that the perihelion will
have moved through an angle $3\pi M(u_1 + u_2)$ during one revolution of the
planet. This is a very small amount, and it may be considered as a correction
to the classical result according to which the planet moves on an elliptic
orbit with the sun in one of the foci. If a is the major semi-axis and e the
eccentricity, the distance at perihelion is a - ae and the distance at
aphelion a + ae; we have then

        $u_1 + u_2 = 1/(a - ae) + 1/(a + ae) = 2/a(1 - e^2),$

and the final formula for the advance of the perihelion comes out

34.5    $6\pi M/a(1 - e^2).$
Here then we have two predictions: on the old theory the perihelion will
remain fixed in space; according to the new one it will advance by the amount
34.5 during one revolution. What are the results? In the case of most planets
either this amount is too small or the position of the perihelion too
uncertain to permit any decision, but in the case of the planet Mercury it was
known for a long time that there is a discrepancy between the prediction of
the Newtonian theory and actual observations; and it happens that the
discrepancy is very nearly the amount showing the discrepancy between the two
theories, so that the theory of Relativity predicts a result that has been
actually observed. This must be considered as a success of the new theory.
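
To give an idea of the size of the effect for Mercury, the advance $6\pi M/a(1 - e^2)$ per revolution can be converted into seconds of arc per century. The numerical constants below are not taken from the text; they are assumed modern values (M is the mass of the sun expressed as a length, $GM/c^2$), so the sketch should be read as an illustration only.

```python
import math

# Assumed values, not given in the text.
M_SUN = 1476.6          # G * (mass of the sun) / c^2, in metres
A_MERCURY = 5.791e10    # major semi-axis of Mercury's orbit, in metres
E_MERCURY = 0.2056      # eccentricity of Mercury's orbit
PERIOD_DAYS = 87.969    # period of revolution of Mercury, in days

def perihelion_advance(M, a, e):
    """Advance of the perihelion per revolution, in radians: 6 pi M / (a (1 - e^2))."""
    return 6.0 * math.pi * M / (a * (1.0 - e * e))

def arcsec_per_century(M, a, e, period_days):
    """The same advance accumulated over a century of 36525 days, in seconds of arc."""
    revolutions = 36525.0 / period_days
    return math.degrees(perihelion_advance(M, a, e) * revolutions) * 3600.0

if __name__ == "__main__":
    print(perihelion_advance(M_SUN, A_MERCURY, E_MERCURY))
    print(arcsec_per_century(M_SUN, A_MERCURY, E_MERCURY, PERIOD_DAYS))
```

With these assumed constants the result comes out at roughly 43 seconds of arc per century.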

35. Deflection of Light.

According to Section 29 a light particle also moves along a geodesic, only in
this case it is a zero geodesic, one along which the tangent vectors have zero
length. The equations for such a geodesic are the same as for the other kind
with the difference that the parameter is no longer arc length. As a result we
have to have zero instead of -1 in the right hand member of equation 32.1,
that is to make A = 0 in the equations 32.5 and 32.7. The equation of the
orbit will therefore be (34.1)

35.1    $(du/d\varphi)^2 + u^2 = \alpha + 2Mu^3,$

which will have to be compared with the same equation without the term
containing $u^3$ which is an equation of a straight line and characterizes the
propagation of a beam of light on the old theory. In fact, the equation of a
straight line whose distance from the origin is 1/p and which is perpendicular
to the polar axis is in our coordinates $u = p\cos\varphi$; we have then
$du/d\varphi = -p\sin\varphi$ and taking the sum of the squares of the last
two expressions we find that they add up to $p^2$ which we may identify with
$\alpha$. Again the term $2Mu^3$ is very small because the maximum value u can
take is the inverse of the minimum value of the distance from the center of
the sun, which is the radius of the sun; we treat the problem again as a
perturbation problem, that is, compare the required solution to that of the
equation without the $2Mu^3$ term. Again we are interested in the change of
the angle $\varphi$ corresponding to a transition between the two extreme
values of u. We shall be interested in a beam of light emitted from a star,
arriving into our telescope and passing on its way very near to the surface of
the sun. The distances of the star and even of the earth from the sun are very
large in comparison with the minimum distance, and we shall take them to be
infinite. The maximum value of u, corresponding to the minimum distance from
the sun we shall denote by $u_0$. Since $du/d\varphi$ changes its sign when
the light particle reaches this point it must vanish there so that the left
hand side of 35.1 reduces to $u_0^2$, and we have

        $u_0^2 - 2Mu_0^3 = \alpha;$

we may use $u_0$ instead of $\alpha$ in our equation and write it in the form

        $(du/d\varphi)^2 = 2M(u^3 - u_0^3) - (u^2 - u_0^2).$

Solving this for $d\varphi$ we find

        $d\varphi = \dfrac{du}{\sqrt{2M(u^3 - u_0^3) - (u^2 - u_0^2)}}.$

We introduce a new variable x letting

        $u = u_0\sin x$

and after the substitution develop according to powers of M and keep only two
terms; we thus get for $d\varphi$ approximately

        $\left[1 + Mu_0\,\dfrac{1 - \sin^3 x}{\cos^2 x}\right]dx;$

if we let x change from 0 to $\pi$, u will change from zero to $u_0$ and back
to zero, just the change that the inverse distance will experience during the
propagation of the light particle. The total change of the angle will then be
represented by the integral

        $\displaystyle\int_0^{\pi}\left[1 + Mu_0\,\dfrac{1 - \sin^3 x}{\cos^2 x}\right]dx;$

on the old theory which corresponds to the absence of the term with M in the
equation 35.1 we will have to omit the term with M in this integral and the
result is $\pi$. The approximate result according to the new theory will
differ from that by

        $Mu_0\displaystyle\int_0^{\pi}\dfrac{1 - \sin^3 x}{\cos^2 x}\,dx = 4Mu_0.$

The beam of light coming to us from a star will then be deflected by an angle
4M/r where r is its minimum distance from the sun, compared to the old theory,
or to the beam as it would go if the sun were absent. If then we observe a
star in a certain position on the sky while the sun is far away, and then
observe the same star when the sun is near the line of vision; i.e., when the
apparent position of the sun is near the apparent position of the star, this
latter position must appear shifted away from the sun approximately by the
angle 4M/r. Actual measurements are possible only during an eclipse of the
sun, because otherwise the light from the sun drowns out the fainter light
from the star, and are beset with difficulties, but the results seem to be in
favor of the prediction.
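
The value 4M/r can be checked against a direct numerical evaluation of the total change of the angle. The sketch below is a hedged illustration: it uses the substitution $u = u_0\sin x$, rewrites $(1 - \sin^3 x)/\cos^2 x$ as $(1 + s + s^2)/(1 + s)$ with $s = \sin x$ so that nothing is undefined at $x = \pi/2$, and integrates by the midpoint rule without expanding in powers of M. The solar constants at the end are assumed modern values, not taken from the text.

```python
import math

def total_angle(M, u0, n=4000):
    """Midpoint-rule value of the integral over 0 < x < pi of
    dx / sqrt(1 - 2 M u0 (1 - sin^3 x) / cos^2 x)."""
    dx = math.pi / n
    total = 0.0
    for i in range(n):
        s = math.sin((i + 0.5) * dx)
        f = (1.0 + s + s * s) / (1.0 + s)   # equals (1 - s^3) / (1 - s^2)
        total += dx / math.sqrt(1.0 - 2.0 * M * u0 * f)
    return total

def deflection_first_order(M, r_min):
    """The approximate deflection 4M/r for a minimum distance r_min."""
    return 4.0 * M / r_min

# Assumed values, not given in the text.
M_SUN = 1476.6       # G * (mass of the sun) / c^2, in metres
R_SUN = 6.957e8      # radius of the sun, in metres

if __name__ == "__main__":
    M, u0 = 1.0e-4, 1.0   # hypothetical values with M*u0 small
    print(total_angle(M, u0) - math.pi, 4.0 * M * u0)
    print(math.degrees(deflection_first_order(M_SUN, R_SUN)) * 3600.0)
```

With the assumed solar values the deflection for a beam grazing the sun comes out at about 1.75 seconds of arc.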
36. Shift of Spectral Lines.

We come to the third so-called test of the General Relativity Theory, that is,
a case where the predictions of the theory differ from those of older theories
by an amount exceeding the error of measurement, thus affording an opportunity
to prove or disprove the advantages of the new theory.
In this case again we deal with propagation of light in the gravitational
field of the sun, but this time the source is supposed to be on the sun
itself, and the observer is on the earth, so that the direction of the beam is
that of a radius of the sun; we take this to mean that $\theta$ = const. and
$\varphi$ = const. We have then according to 32.5 with A = 0

36.1    $\dot r^2/\eta - \eta\,\dot t^2 = 0.$

This gives

        $dr = \pm\eta\,dt;$

the double sign corresponds to two possible senses of the beam: from the sun
to the earth and from the earth to the sun. The former, in which we are now
interested, is characterized by the property that r increases as t increases;
the ratio dr/dt must therefore be positive, and since $\eta$ is positive we
must take

36.2    $dr = \eta\,dt.$

The orbit is thus determined by the equations

        $d\theta = 0,\qquad d\varphi = 0,\qquad dr = \eta\,dt.$

But we are interested in the color this time, and color, as we have agreed in
Section 16, is proportional to the time component of the momentum vector. As
the momentum vector we have to consider the vector of components $dx_i/dp$,
where p is a parameter appearing in the equations of geodesics, the one with
respect to which differentiation is denoted by a dot. In order to find this
parameter we have to go back to the original equations of geodesics; 32.21
becomes here

        $\ddot r - (\eta'/2\eta)\,\dot r^2 + \tfrac12\eta\eta'\,\dot t^2 = 0;$

equation 36.1 shows that the last two terms cancel leaving us with

        $\ddot r = 0.$

This means that r is a linear function of the parameter,

        $r = ap + b,$

so that

        $d/dp = a\,d/dr.$

The momentum vector is here therefore

        $a\,\dfrac{dr}{dr},\quad a\,\dfrac{d\theta}{dr},\quad a\,\dfrac{d\varphi}{dr},\quad a\,\dfrac{dt}{dr}\qquad\text{or}\qquad a,\ 0,\ 0,\ a/\eta,$

the last according to 36.2. What about the value of a? The answer is that it
is not and cannot be determined by the foregoing discussion. There are
different beams of light which satisfy all the conditions imposed so far; they
differ in color, and different colors correspond to different values of a.
Color, according to our definition, is the time component of the momentum
vector; i.e., the scalar product of the momentum vector of light and the unit
vector in the time direction. If we denote the (contravariant) components of
the latter by $T^i$, the condition that it has time direction will be given by

        $T^1 = T^2 = T^3 = 0$

and the condition that it is a unit vector - by

        $g_{\alpha\beta}T^\alpha T^\beta = -1,$

which, taking into account the relations just preceding and the values of the
g's becomes

        $\eta\,(T^4)^2 = 1.$

The scalar product of the vectors $u^i$ and $T^i$ calculated according to the
formula $g_{ij}u^iT^j$ becomes

        $-a/\sqrt{\eta},$

or, if we expand and keep only the first two terms,

        $-a(1 + M/r).$

The color of a beam of light is then not constant along the beam. We shall
compare the color as it appears near the surface of the sun, where r is equal
to the radius of the sun $r_s$, and near the surface of the earth, where we
may assume $r = \infty$. The frequencies in these two cases will be for a
given beam of light proportional to

        $1 + M/r_s$    and    $1;$

the change in frequency will be proportional to

        $(1 + M/r_s) - 1 = M/r_s,$

and this will be also the relative change in frequency.
If now we consider some source of light near the surface of the sun, whose
frequency we know, the light emitted by it when it is received at the surface
of the earth will have a frequency that is less, the amount of the relative
change being given by

        $M/r_s.$

If then we compare light coming from a terrestrial source, for instance,
emitted by an atom, and light emitted by a corresponding source on the sun,
for instance, emitted by an atom of the same kind, we would expect a change of
frequency of the amount $M/r_s$. Or, if we compare a Solar spectrum with a
Terrestrial spectrum, the lines of the former will be shifted toward the red
by the amount $M/r_s$. This is the prediction of the General Relativity
Theory.
Again the experimental evidence seems to favor this prediction.
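
The size of the predicted shift can be estimated with a few lines of arithmetic. The sketch below is only an illustration: it compares the unexpanded factor $1/\sqrt{\eta}$, with $\eta = 1 - 2M/r$, against the two-term approximation $1 + M/r$, for light leaving the surface of the sun and received very far away. The solar constants are assumed modern values and are not taken from the text.

```python
import math

# Assumed values, not given in the text.
M_SUN = 1476.6       # G * (mass of the sun) / c^2, in metres
R_SUN = 6.957e8      # radius of the sun, in metres

def frequency_factor(M, r):
    """The factor 1/sqrt(eta) to which the frequency at distance r is proportional."""
    return 1.0 / math.sqrt(1.0 - 2.0 * M / r)

def relative_shift_exact(M, r_emit):
    """Relative change of frequency between r_emit and infinity, without expansion."""
    return frequency_factor(M, r_emit) - 1.0

def relative_shift_first_order(M, r_emit):
    """The approximation M/r obtained by keeping two terms of the expansion."""
    return M / r_emit

if __name__ == "__main__":
    print(relative_shift_exact(M_SUN, R_SUN))
    print(relative_shift_first_order(M_SUN, R_SUN))
```

Both numbers are close to two parts in a million, and they differ from each other only by a quantity of the order of $(M/r)^2$.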
