
Entropy and Partial Differential Equations

Lawrence C. Evans
Department of Mathematics, UC Berkeley


Inspiring Quotations

A good many times I have been present at gatherings of people who, by the standards of traditional culture, are thought highly educated and who have with considerable gusto been expressing their incredulity at the illiteracy of scientists. Once or twice I have been provoked and have asked the company how many of them could describe the Second Law of Thermodynamics. The response was cold: it was also negative. Yet I was asking something which is about the scientific equivalent of: Have you read a work of Shakespeare's?
    (C. P. Snow, The Two Cultures and the Scientific Revolution)

. . . C. P. Snow relates that he occasionally became so provoked at literary colleagues who scorned the restricted reading habits of scientists that he would challenge them to explain the second law of thermodynamics. The response was invariably a cold negative silence. The test was too hard. Even a scientist would be hard-pressed to explain Carnot engines and refrigerators, reversibility and irreversibility, energy dissipation and entropy increase. . . all in the span of a cocktail party conversation.
    (E. E. Daub, "Maxwell's demon")

He began then, bewilderingly, to talk about something called entropy . . . She did gather that there were two distinct kinds of this entropy. One having to do with heat engines, the other with communication. . . "Entropy is a figure of speech then". . . "a metaphor."
    (T. Pynchon, The Crying of Lot 49)

CONTENTS

Introduction
  A. Overview
  B. Themes

I. Entropy and equilibrium
  A. Thermal systems in equilibrium
  B. Examples
    1. Simple fluids
    2. Other examples
  C. Physical interpretations of the model
    1. Equilibrium
    2. Positivity of temperature
    3. Extensive and intensive parameters
    4. Concavity of S
    5. Convexity of E
    6. Entropy maximization, energy minimization
  D. Thermodynamic potentials
    1. Review of Legendre transform
    2. Definitions
    3. Maxwell relations
  E. Capacities
  F. More examples
    1. Ideal gas
    2. Van der Waals fluid

II. Entropy and irreversibility
  A. A model material
    1. Definitions
    2. Energy and entropy
      a. Working and heating
      b. First Law, existence of E
      c. Carnot cycles
      d. Second Law
      e. Existence of S
    3. Efficiency of cycles
    4. Adding dissipation, Clausius inequality
  B. Some general theories
    1. Entropy and efficiency

      a. Definitions
      b. Existence of S
    2. Entropy, temperature and separating hyperplanes
      a. Definitions
      b. Second Law
      c. Hahn–Banach Theorem
      d. Existence of S, T

III. Continuum thermodynamics
  A. Kinematics
    1. Definitions
    2. Physical quantities
    3. Kinematic formulas
    4. Deformation gradient
  B. Conservation laws, Clausius–Duhem inequality
  C. Constitutive relations
    1. Fluids
    2. Elastic materials
  D. Workless dissipation

IV. Elliptic and parabolic equations
  A. Entropy and elliptic equations
    1. Definitions
    2. Estimates for equilibrium entropy production
      a. A capacity estimate
      b. A pointwise bound
    3. Harnack's inequality
  B. Entropy and parabolic equations
    1. Definitions
    2. Evolution of entropy
      a. Entropy increase
      b. Second derivatives in time
      c. A differential form of Harnack's inequality
    3. Clausius inequality
      a. Cycles
      b. Heating
      c. Almost reversible cycles

V. Conservation laws and kinetic equations
  A. Some physical PDE

    1. Compressible Euler equations
      a. Equations of state
      b. Conservation law form
    2. Boltzmann's equation
      a. A model for dilute gases
      b. H-Theorem
      c. H and entropy
  B. Single conservation law
    1. Integral solutions
    2. Entropy solutions
    3. Condition E
    4. Kinetic formulation
    5. A hydrodynamical limit
  C. Systems of conservation laws
    1. Entropy conditions
    2. Compressible Euler equations in one dimension
      a. Computing entropy/entropy flux pairs
      b. Kinetic formulation

VI. Hamilton–Jacobi and related equations
  A. Viscosity solutions
  B. Hopf–Lax formula
  C. A diffusion limit
    1. Formulation
    2. Construction of diffusion coefficients
    3. Passing to limits

VII. Entropy and uncertainty
  A. Maxwell's demon
  B. Maximum entropy
    1. A probabilistic model
    2. Uncertainty
    3. Maximizing uncertainty
  C. Statistical mechanics
    1. Microcanonical distribution
    2. Canonical distribution
    3. Thermodynamics

VIII. Probability and differential equations
  A. Continuous time Markov chains

    1. Generators and semigroups
    2. Entropy production
    3. Convergence to equilibrium
  B. Large deviations
    1. Thermodynamic limits
    2. Basic theory
      a. Rate functions
      b. Asymptotic evaluation of integrals
  C. Cramér's Theorem
  D. Small noise in dynamical systems
    1. Stochastic differential equations
    2. Itô's formula, elliptic PDE
    3. An exit problem
      a. Small noise asymptotics
      b. Perturbations against the flow

Appendices:
  A. Units and constants
  B. Physical axioms

References

INTRODUCTION

A. Overview

This course surveys various uses of entropy concepts in the study of PDE, both linear and nonlinear. We will begin in Chapters I–III with a recounting of entropy in physics, with particular emphasis on axiomatic approaches to entropy as (i) characterizing equilibrium states (Chapter I), (ii) characterizing irreversibility for processes (Chapter II), and (iii) characterizing continuum thermodynamics (Chapter III). Later we will discuss probabilistic theories for entropy as (iv) characterizing uncertainty (Chapter VII).

I will, especially in Chapters II and III, follow the mathematical derivation of entropy provided by modern rational thermodynamics, thereby avoiding many customary physical arguments. The main references here will be Callen [C], Owen [O], and Coleman–Noll [C-N].

In Chapter IV I follow Day [D] by demonstrating for certain linear second-order elliptic and parabolic PDE that various estimates are analogues of entropy concepts (e.g. the Clausius inequality). I as well draw connections with Harnack inequalities. In Chapter V (conservation laws) and Chapter VI (Hamilton–Jacobi equations) I review the proper notions of weak solutions, illustrating that the inequalities inherent in the definitions can be interpreted as irreversibility conditions. Chapter VII introduces the probabilistic interpretation of entropy and Chapter VIII concerns the related theory of large deviations. Following Varadhan [V] and Rezakhanlou [R], I will explain some connections with entropy, and demonstrate various PDE applications.

B. Themes

In spite of the longish time spent in Chapters I–III, VII reviewing physics, this is a mathematics course on partial differential equations. My main concern is PDE and how various notions involving entropy have influenced our understanding of PDE.
As we will cover a lot of material from many sources, let me explicitly write out here some unifying themes:
(i) the use of entropy in deriving various physical PDE,
(ii) the use of entropy to characterize irreversibility in PDE evolving in time, and

(iii) the use of entropy in providing variational principles. Another ongoing issue will be (iv) understanding the relationships between entropy and convexity. I am as usual very grateful to F. Yeager for her quick and accurate typing of these notes.

CHAPTER 1: Entropy and equilibrium

A. Thermal systems in equilibrium

We start, following Callen [C] and Wightman [W], by introducing a simple mathematical structure, which we will later interpret as modeling equilibria of thermal systems:

Notation. We denote by (X0, X1, . . . , Xm) a typical point of R^{m+1}, and hereafter write E = X0.

A model for a thermal system in equilibrium

Let us suppose we are given:
(a) an open, convex subset Σ of R^{m+1}, and
(b) a C¹-function
(1)  S : Σ → R
such that
(2)  (i) S is concave,
     (ii) ∂S/∂E > 0,
     (iii) S is positively homogeneous of degree 1.

We call Σ the state space and S the entropy of our system:
(3)  S = S(E, X1, . . . , Xm).

Here and afterwards we assume without further comment that S and other functions derived from S are evaluated only in open, convex regions where the various functions make sense. In particular, when we note that (2)(iii) means
(4)  S(λE, λX1, . . . , λXm) = λS(E, X1, . . . , Xm)   (λ > 0),

we automatically consider in (4) only those states for which both sides of (4) are defined. Owing to (2)(ii), we can solve (3) for E as a C¹ function of (S, X1, . . . , Xm):
(5)  E = E(S, X1, . . . , Xm).
We call the function E the internal energy.
Definitions.
(6)  T = ∂E/∂S = temperature,
     Pk = −∂E/∂Xk = k-th generalized force (or pressure).

Lemma 1
(i) The function E is positively homogeneous of degree 1:
(7)  E(λS, λX1, . . . , λXm) = λE(S, X1, . . . , Xm)   (λ > 0).

(ii) The functions T, Pk (k = 1, . . . , m) are positively homogeneous of degree 0:
(8)  T(λS, λX1, . . . , λXm) = T(S, X1, . . . , Xm),
     Pk(λS, λX1, . . . , λXm) = Pk(S, X1, . . . , Xm)   (λ > 0).

We will later interpret (2), (7) physically as saying that S, E are extensive parameters; we say also that X1, . . . , Xm are extensive. By contrast (8) says that T, Pk are intensive parameters.

Proof. 1. W = E(S(W, X1, . . . , Xm), X1, . . . , Xm) for all W, X1, . . . , Xm. Thus for λ > 0,
λW = E(S(λW, λX1, . . . , λXm), λX1, . . . , λXm)
   = E(λS(W, X1, . . . , Xm), λX1, . . . , λXm)   by (4).
Write S = S(W, X1, . . . , Xm), W = E(S, X1, . . . , Xm) to derive (7).
2. Since S is C¹, so is E. Differentiate (7) with respect to S, to deduce
λ ∂E/∂S(λS, λX1, . . . , λXm) = λ ∂E/∂S(S, X1, . . . , Xm).
The first equality in (8) follows from the definition T = ∂E/∂S. The other equalities in (8) are similar. □

Lemma 2
We have
(9)  ∂S/∂E = 1/T,   ∂S/∂Xk = Pk/T   (k = 1, . . . , m).

Proof. T = ∂E/∂S = (∂S/∂E)⁻¹. Also
W = E(S(W, X1, . . . , Xm), X1, . . . , Xm) for all W, X1, . . . , Xm.
Differentiate with respect to Xk:
0 = ∂E/∂S ∂S/∂Xk + ∂E/∂Xk = T ∂S/∂Xk − Pk. □

We record the definitions (6) by writing
(10)  dE = T dS − Σ_{k=1}^{m} Pk dXk   (Gibbs' formula).
Note carefully: at this point (10) means merely T = ∂E/∂S, Pk = −∂E/∂Xk (k = 1, . . . , m). We will later in Chapter II interpret T dS as "infinitesimal heating" and Σ_{k=1}^{m} Pk dXk as "infinitesimal working" for a process. In this chapter however there is no notion whatsoever of anything changing in time: everything is in equilibrium.

Terminology. The formula
S = S(E, X1, . . . , Xm)
is called the fundamental equation of our system, and by definition contains all the thermodynamic information. An identity involving other derived quantities (i.e. T, Pk (k = 1, . . . , m)) is an equation of state, which typically does not contain all the thermodynamic information.

B. Examples

In applications X1, . . . , Xm may measure many different physical quantities.

1. Simple fluid. An important case is a homogeneous simple fluid, for which
(1)  E = internal energy
     V = volume
     N = mole number
     S = S(E, V, N)
     T = ∂E/∂S = temperature
     P = −∂E/∂V = pressure
     μ = ∂E/∂N = chemical potential.
So here we take X1 = V, X2 = N, where N measures the amount of the substance comprising the fluid. Gibbs' formula reads:
(2)  dE = T dS − P dV + μ dN.

Remark. We will most often consider the situation that N is identically constant, say N = 1. Then we write
(3)  S(E, V) = S(E, V, 1) = entropy/mole,

and so
(4)  E = internal energy
     V = volume
     S = S(E, V) = entropy
     T = ∂E/∂S = temperature
     P = −∂E/∂V = pressure
with
(5)  dE = T dS − P dV.
Note that S(E, V) will not satisfy the homogeneity condition (2)(iii) however.

Remark. If we have instead a multicomponent simple fluid, which is a uniform mixture of r different substances with mole numbers N1, . . . , Nr, we write
S = S(E, V, N1, . . . , Nr),
μj = ∂E/∂Nj = chemical potential of the j-th component.

2. Other examples. Although we will for simplicity of exposition mostly discuss simple fluid systems, it is important to understand that many interpretations are possible. (See, e.g., Zemansky [Z].)

  Extensive parameter X    Intensive parameter P = −∂E/∂X
  length                   tension
  area                     surface tension
  volume                   pressure
  electric charge          electric force
  magnetization            magnetic intensity

Remark. Again to foreshadow, we are able in all these situations to interpret
P dX = infinitesimal work performed by the system during some process
     = (generalized force) × (infinitesimal displacement).

C. Physical interpretations of the model

In this section we provide some nonrigorous physical arguments supporting our model in §A of a thermal system in equilibrium. We wish therefore to explain why we suppose
(i) S is concave,
(ii) ∂S/∂E > 0,
(iii) S is positively homogeneous of degree 1.
(See Appendix B for statements of physical postulates.)

1. Equilibrium
First of all we are positing that the thermal system in equilibrium can be completely described by specifying the (m + 1) macroscopic parameters X0, X1, . . . , Xm, of which E = X0, the internal energy, plays a special role. Thus we imagine, for instance, a body of fluid, for which there is no temporal or spatial dependence for E, X1, . . . , Xm.

2. Positivity of temperature
Since ∂S/∂E = 1/T, hypothesis (ii) is simply that the temperature is always positive.

3. Extensive and intensive parameters
The homogeneity condition (iii) is motivated as follows. Consider for instance a fluid body in equilibrium for which the energy is E, the entropy is S, and the other extensive parameters are Xk (k = 1, . . . , m). Next consider a subregion #1, which comprises a λ-th fraction of the entire region (0 < λ < 1). Let S¹, E¹, . . . , Xk¹ be the extensive parameters for the subregion. Then
(1)  S¹ = λS,  E¹ = λE,  Xk¹ = λXk   (k = 1, . . . , m).

[Figure: the fluid body divided into subregion #1 (S¹, E¹, . . . , Xk¹, . . . ) and subregion #2 (S², E², . . . , Xk², . . . )]

Consider as well the complementary subregion #2, for which
S² = (1 − λ)S,  E² = (1 − λ)E,  Xk² = (1 − λ)Xk   (k = 1, . . . , m).
Thus
(2)  S = S¹ + S²,  E = E¹ + E²,  Xk = Xk¹ + Xk²   (k = 1, . . . , m).
The homogeneity assumption (iii) is just (1). As a consequence, we see from (2) that S, E, . . . , Xm are additive over subregions of our thermal system in equilibrium.
On the other hand, if T¹, Pk¹, . . . are the temperatures and generalized forces for subregion #1, and T², Pk², . . . are the same for subregion #2, we have
T = T¹ = T²,  Pk = Pk¹ = Pk²   (k = 1, . . . , m),
owing to Lemma 1 in §A. Hence T, . . . , Pk are intensive parameters, which take the same value on each subregion of our thermal system in equilibrium.

4. Concavity of S
Note very carefully that we are hypothesizing the additivity condition (2) only for subregions of a given thermal system in equilibrium. We next motivate the concavity hypothesis (i) by looking at the quite different physical situation that we have two isolated fluid bodies A, B of the same substance:

[Figure: two isolated fluid bodies A (S^A, E^A, . . . , Xk^A, . . . ) and B (S^B, E^B, . . . , Xk^B, . . . )]
Here
S^A = S(E^A, . . . , Xk^A, . . . ) = entropy of A,
S^B = S(E^B, . . . , Xk^B, . . . ) = entropy of B,

for the same function S(·, ·). The total entropy is S^A + S^B. We now ask what happens when we combine A and B into a new system C, in such a way that no work is done and no heat is transferred to or from the surrounding environment:

[Figure: the combined system C (S^C, E^C, . . . , Xk^C, . . . )]
(In Chapter II we will more carefully define "heat" and "work".) After C reaches equilibrium, we can meaningfully discuss S^C, E^C, . . . , Xk^C, . . . . Since no work has been done, we have
Xk^C = Xk^A + Xk^B   (k = 1, . . . , m),
and since, in addition, there has been no heat loss or gain,
E^C = E^A + E^B.
This is a form of the First Law of thermodynamics. We however do not write a similar equality for the entropy S. Rather we invoke the Second Law of thermodynamics, which implies that entropy cannot decrease during any irreversible process. Thus
(3)  S^C ≥ S^A + S^B.
But then
(4)  S(E^A + E^B, . . . , Xk^A + Xk^B, . . . ) = S(E^C, . . . , Xk^C, . . . ) = S^C ≥ S^A + S^B = S(E^A, . . . , Xk^A, . . . ) + S(E^B, . . . , Xk^B, . . . ).
This inequality implies S is a concave function of (E, X1, . . . , Xm). Indeed, if 0 < λ < 1, we have:
S(λE^A + (1 − λ)E^B, . . . , λXk^A + (1 − λ)Xk^B, . . . )
  ≥ S(λE^A, . . . , λXk^A, . . . ) + S((1 − λ)E^B, . . . , (1 − λ)Xk^B, . . . )   by (4)
  = λS(E^A, . . . , Xk^A, . . . ) + (1 − λ)S(E^B, . . . , Xk^B, . . . )   by (iii).

Thus S is concave.

5. Convexity of E
Next we show that
(5)  E is a convex function of (S, X1, . . . , Xm).

To verify (5), take any S^A, S^B, X1^A, . . . , Xm^A, X1^B, . . . , Xm^B, and 0 < λ < 1. Define
E^A := E(S^A, X1^A, . . . , Xm^A),
E^B := E(S^B, X1^B, . . . , Xm^B);
so that
S^A = S(E^A, X1^A, . . . , Xm^A),
S^B = S(E^B, X1^B, . . . , Xm^B).
Since S is concave,
(6)  S(λE^A + (1 − λ)E^B, . . . , λXk^A + (1 − λ)Xk^B, . . . )
       ≥ λS(E^A, . . . , Xk^A, . . . ) + (1 − λ)S(E^B, . . . , Xk^B, . . . ).
Now
W = E(S(W, . . . , Xk, . . . ), . . . , Xk, . . . ) for all W, X1, . . . , Xm.
Hence
λE^A + (1 − λ)E^B = E(S(λE^A + (1 − λ)E^B, . . . , λXk^A + (1 − λ)Xk^B, . . . ), . . . , λXk^A + (1 − λ)Xk^B, . . . )
  ≥ E(λS(E^A, . . . , Xk^A, . . . ) + (1 − λ)S(E^B, . . . , Xk^B, . . . ), . . . , λXk^A + (1 − λ)Xk^B, . . . ),
owing to (6), since ∂E/∂S = T > 0. Rewriting, we deduce
λE(S^A, . . . , Xk^A, . . . ) + (1 − λ)E(S^B, . . . , Xk^B, . . . ) ≥ E(λS^A + (1 − λ)S^B, . . . , λXk^A + (1 − λ)Xk^B, . . . ),
and so E is convex. □

6. Entropy maximization and energy minimization
Lastly we mention some physical variational principles (taken from Callen [C, p. 131 137]) for isolated thermal systems. Entropy Maximization Principle. The equilibrium value of any unconstrained internal parameter is such as to maximize the entropy for the given value of the total internal energy. Energy Minimization Principle. The equilibrium value of any unconstrained internal parameter is such as to minimize the energy for the given value of the total entropy.

graph of S = S(E,.,Xk,.)

E *,.) (E*,.,Xk

Xk E=E* (constraint) S graph of E = E(S,.,Xk,.) *,.) (S*,.,Xk

S=S* (constraint) E

Xk

15

The first picture illustrates the entropy maximization principle: given the energy constraint E = E*, the values of the unconstrained parameters (X1, . . . , Xm) are such as to maximize (X1, . . . , Xm) ↦ S(E*, X1, . . . , Xm). The second picture is the dual energy minimization principle: given the entropy constraint S = S*, the values of the unconstrained parameters (X1, . . . , Xm) are such as to minimize (X1, . . . , Xm) ↦ E(S*, X1, . . . , Xm).

D. Thermodynamic potentials

Since E is convex and S is concave, we can employ ideas from convex analysis to rewrite various formulas in terms of the intensive variables T = ∂E/∂S, Pk = −∂E/∂Xk (k = 1, . . . , m). The primary tool will be the Legendre transform. (See e.g. Sewell [SE], [E1, §III.C], etc.)

1. Review of Legendre transform
Assume that H : Rⁿ → (−∞, +∞] is a convex, lower semicontinuous function which is proper (i.e. not identically equal to infinity).
Definition. The Legendre transform of H is
(1)  L(q) = sup_{p ∈ Rⁿ} (p · q − H(p))   (q ∈ Rⁿ).

We usually write L = H*. It is not very hard to prove that L is likewise convex, lower semicontinuous and proper. Furthermore the Legendre transform of L = H* is H:
(2)  L = H*,  H = L*.

We say H and L are dual convex functions. Now suppose for the moment that H is C² and strictly convex (i.e. D²H > 0). Then, given q, there exists a unique point p which maximizes the right-hand side of (1), namely the unique point p = p(q) for which
(3)  q = DH(p).
Then
(4)  L(q) = p · q − H(p),  p = p(q) solving (3).


Furthermore
DL(q) = p + (q − DH(p)) · Dq p = p   by (3),
and so
(5)  p = DL(q).
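A quick numerical sketch of (1) and (5) (an illustration only; the test function H(p) = cosh p and all grid parameters are arbitrary choices, not from the text):

```python
import numpy as np

# Strictly convex test function H(p) = cosh(p); its transform is
# L(q) = q*arcsinh(q) - sqrt(1 + q^2), and DL(q) = arcsinh(q).
H = np.cosh
p = np.linspace(-5.0, 5.0, 5001)     # grid over which we take the sup in (1)
q = np.linspace(-2.0, 2.0, 401)

# Discrete version of (1): L(q) = sup_p (p*q - H(p))
L = np.max(np.outer(q, p) - H(p)[None, :], axis=1)
assert np.allclose(L, q * np.arcsinh(q) - np.sqrt(1 + q**2), atol=1e-4)

# (5): p = DL(q) = arcsinh(q); check with central differences, away from edges
dL = np.gradient(L, q)
assert np.allclose(dL[5:-5], np.arcsinh(q[5:-5]), atol=1e-3)
```

Here the sup in (1) is attained at p = arcsinh(q), consistent with (3), since q = DH(p) = sinh p.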

Remark. In mechanics, H often denotes the Hamiltonian and L the Lagrangian.

2. Definitions

The energy E and entropy S are not directly physically measurable, whereas certain of the intensive variables (e.g. T, P) are. It is consequently convenient to employ the Legendre transform to convert to functions of various intensive variables. Let us consider an energy function
E = E(S, V, X2, . . . , Xm),
where we explicitly take X1 = V = volume and regard the remaining parameters X2, . . . , Xm as being fixed. For simplicity of notation, we do not display (X2, . . . , Xm), and just write
(6)  E = E(S, V).

There are 3 possible Legendre transforms, according as to whether we transform in the variable S only, in V only, or in (S, V) together. Because of sign conventions (i.e. T = ∂E/∂S, −P = ∂E/∂V) and because it is customary in thermodynamics to take the negative of the mathematical Legendre transform, the relevant formulas are actually these:
Definitions. (i) The Helmholtz free energy F is
(7)  F(T, V) = inf_S (E(S, V) − TS).¹
(ii) The enthalpy H is
(8)  H(S, P) = inf_V (E(S, V) + PV).
(iii) The Gibbs potential (a.k.a. free enthalpy) is
(9)  G(T, P) = inf_{S,V} (E(S, V) + PV − ST).
The functions E, F, G, H are called thermodynamic potentials.

¹ The symbol A is also used to denote the Helmholtz free energy.

Remark. The inf in (7) is taken over those S such that (S, V) lies in the domain of E. A similar remark applies to (8), (9).
To go further we henceforth assume:
(10)  E is C², strictly convex,
and furthermore that for the range of values we consider
(11)  the inf in each of (7), (8), (9) is attained at a unique point in the domain of E.
We can then recast the definitions (7)–(9):
Thermodynamic potentials, rewritten:
(12)  F = E − TS, where T = ∂E/∂S,
(13)  H = E + PV, where P = −∂E/∂V,
(14)  G = E − TS + PV, where T = ∂E/∂S, P = −∂E/∂V.
More precisely, (12) says F(T, V) = E(S, V) − TS, where S = S(T, V) solves T = ∂E/∂S(S, V). We are assuming we can uniquely, smoothly solve for S = S(T, V).

Commentary. If E is not strictly convex, we cannot in general rewrite (7)–(9) as (12)–(14). In this case, for example when the graph of E contains a line or plane, the geometry has the physical interpretation of phase transitions: see Wightman [W].

Lemma 3
(i) E is locally strictly convex in (S, V).
(ii) F is locally strictly concave in T, locally strictly convex in V.
(iii) H is locally strictly concave in P, locally strictly convex in S.
(iv) G is locally strictly concave in (T, P).
Remark. From (9) we see that G is the inf of affine mappings of (T, P) and thus is concave. However to establish the strict concavity, etc., we will invoke (10), (11) and use the formulations (12)–(14). Note also that we say "locally strictly" convex, concave in (ii)–(iv), since what we really establish is the sign of various second derivatives.

Proof. 1. First of all, (i) is just our assumption (10).
2. To prove (ii), we recall (12) and write
(15)  F(T, V) = E(S(T, V), V) − T S(T, V),
where
(16)  T = ∂E/∂S (S(T, V), V).
Then (15) implies
∂F/∂T = ∂E/∂S ∂S/∂T − S − T ∂S/∂T = −S,
∂F/∂V = ∂E/∂S ∂S/∂V + ∂E/∂V − T ∂S/∂V = ∂E/∂V (= −P).
Thus
(17)  ∂²F/∂T² = −∂S/∂T,   ∂²F/∂V² = ∂²E/∂S∂V ∂S/∂V + ∂²E/∂V².
Next differentiate (16):
1 = ∂²E/∂S² ∂S/∂T,   0 = ∂²E/∂S² ∂S/∂V + ∂²E/∂S∂V.
Thus (17) gives:
∂²F/∂T² = −(∂²E/∂S²)⁻¹,   ∂²F/∂V² = ∂²E/∂V² − (∂²E/∂S∂V)² / (∂²E/∂S²).
Since E is strictly convex:
∂²E/∂S² > 0,  ∂²E/∂V² > 0,  (∂²E/∂S²)(∂²E/∂V²) > (∂²E/∂S∂V)².
Hence:
∂²F/∂T² < 0,  ∂²F/∂V² > 0.
This proves (ii), and (iii), (iv) are similar. □

3. Maxwell's relations

Notation. We will hereafter regard T, P in some instances as independent variables (and not, as earlier, as functions of S, V). We will accordingly need better notation when we compute partial derivatives, to display which independent variables are involved. The standard notation is to list the other independent variables outside parentheses.
For instance if we think of S as being a function of, say, T and V, we henceforth write
(∂S/∂T)_V
to denote the partial derivative of S in T, with V held constant, and
(∂S/∂V)_T
to denote the partial derivative of S in V, T constant. However we will not employ parentheses when computing the partial derivatives of E, F, G, H with respect to their natural arguments: thus if we are as usual thinking of F as a function of T, V, we write ∂F/∂T, ∂F/∂V, not (∂F/∂T)_V, (∂F/∂V)_T.
We next compute the first derivatives of the thermodynamic potentials:
Energy. E = E(S, V):
(18)  ∂E/∂S = T,  ∂E/∂V = −P.
Free energy. F = F(T, V):
(19)  ∂F/∂T = −S,  ∂F/∂V = −P.
Enthalpy. H = H(S, P):
(20)  ∂H/∂S = T,  ∂H/∂P = V.
Gibbs potential. G = G(T, P):
(21)  ∂G/∂T = −S,  ∂G/∂P = V.

Proof. The formulas (18) simply record our definitions of T, P. The remaining identities are variants of the duality (3), (5). For instance, F = E − TS, where T = ∂E/∂S and S = S(T, V). So
∂F/∂T = ∂E/∂S (∂S/∂T)_V − S − T (∂S/∂T)_V = −S,
as already noted earlier. □
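As a numerical sanity check of (19) (a sketch only; the explicit test energy E(S, V) = exp(S/CV) · V^{−R/CV} and the constants below are assumed for illustration, anticipating the ideal gas of §F), one can form F = E − TS by solving T = ∂E/∂S and then verify ∂F/∂T = −S by finite differences:

```python
import math

R, CV = 8.314, 12.47   # assumed illustrative constants (CV = 3R/2)

def S_of(T, V):
    # S = S(T, V) solving T = ∂E/∂S for the test energy
    # E(S, V) = exp(S/CV) * V**(-R/CV); at the minimum, E = CV*T.
    return CV * math.log(CV * T) + R * math.log(V)

def F(T, V):
    # Formula (12): F = E - T*S evaluated at the minimizing S
    return CV * T - T * S_of(T, V)

T, V, h = 300.0, 2.0, 1e-4
dFdT = (F(T + h, V) - F(T - h, V)) / (2 * h)
assert abs(dFdT + S_of(T, V)) < 1e-5     # (19): ∂F/∂T = -S
```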

We can now equate the mixed second partial derivatives of E, F, G, H to derive further identities. These are Maxwell's relations:
(22)  (∂T/∂V)_S = −(∂P/∂S)_V
(23)  (∂S/∂V)_T = (∂P/∂T)_V
(24)  (∂T/∂P)_S = (∂V/∂S)_P
(25)  (∂S/∂P)_T = −(∂V/∂T)_P
The equality (22) just says ∂²E/∂V∂S = ∂²E/∂S∂V; (23) says ∂²F/∂V∂T = ∂²F/∂T∂V; etc.

E. Capacities

For later reference, we record here some notation:
(1)  CP = T (∂S/∂T)_P = heat capacity at constant pressure
(2)  CV = T (∂S/∂T)_V = heat capacity at constant volume
(3)  ΛP = T (∂S/∂P)_T = latent heat with respect to pressure
(4)  ΛV = T (∂S/∂V)_T = latent heat with respect to volume
(5)  β = (1/V) (∂V/∂T)_P = coefficient of thermal expansion
(6)  KT = −(1/V) (∂V/∂P)_T = isothermal compressibility
(7)  KS = −(1/V) (∂V/∂P)_S = adiabatic compressibility.
(See [B-S, p. 786–787] for the origin of the terms "latent heat", "heat capacity".)
There are many relationships among these quantities:
Lemma 4
(i) CV = (∂E/∂T)_V
(ii) CP = (∂H/∂T)_P
(iii) CP ≥ CV > 0
(iv) ΛV = P + (∂E/∂V)_T.

Proof. 1. Think of E as a function of T, V; that is, E = E(S(T, V), V), where S(T, V) means S as a function of T, V. Then
(∂E/∂T)_V = ∂E/∂S (∂S/∂T)_V = T (∂S/∂T)_V = CV.
Likewise, think of H = H(S(T, P), P). Then
(∂H/∂T)_P = ∂H/∂S (∂S/∂T)_P = T (∂S/∂T)_P = CP,
where we used (20) from §D.
2. According to (19) in §D: S = −∂F/∂T. Thus
(8)  CV = T (∂S/∂T)_V = −T ∂²F/∂T² > 0,
since T ↦ F(T, V) is locally strictly concave. Likewise S = −∂G/∂T owing to (21) in §D; whence
(9)  CP = T (∂S/∂T)_P = −T ∂²G/∂T² > 0.
3. Now according to (12), (14) in §D:
G = F + PV;
that is, G(T, P) = F(T, V) + PV, where V = V(T, P) solves ∂F/∂V(T, V) = −P. Consequently:
∂G/∂T = ∂F/∂T + (∂F/∂V + P)(∂V/∂T)_P = ∂F/∂T,
and so
(10)  ∂²G/∂T² = ∂²F/∂T² + ∂²F/∂T∂V (∂V/∂T)_P.
But differentiating the identity ∂F/∂V(T, V(T, P)) = −P, we deduce
∂²F/∂V∂T + ∂²F/∂V² (∂V/∂T)_P = 0.
Substituting into (10) and recalling (8), (9), we conclude
(11)  CP − CV = −T(∂²G/∂T² − ∂²F/∂T²) = T (∂²F/∂V∂T)² / (∂²F/∂V²) ≥ 0,
since V ↦ F(T, V) is strictly convex. This proves (iii). Assertion (iv) is left as an easy exercise.
Remark. Using (19) in §D, we can write
(12)  CP − CV = −T (∂P/∂T)_V² (∂P/∂V)_T⁻¹   (Kelvin's formula).
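As a sketch of Kelvin's formula in action (the equation of state P = RT/V below is an assumed test case, anticipating the ideal gas of §F), a finite-difference evaluation of (12) returns the constant R:

```python
R = 8.314                        # gas constant, as in Appendix A

def P(T, V):
    return R * T / V             # assumed equation of state (ideal gas, §F below)

T, V, h = 300.0, 2.0, 1e-4
dPdT = (P(T + h, V) - P(T - h, V)) / (2 * h)   # (∂P/∂T)_V
dPdV = (P(T, V + h) - P(T, V - h)) / (2 * h)   # (∂P/∂V)_T

# (12): CP - CV = -T (∂P/∂T)_V^2 / (∂P/∂V)_T
gap = -T * dPdT**2 / dPdV
assert abs(gap - R) < 1e-5
```

This is consistent with Theorem 1(ii) derived in §F below.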

F. More examples

1. Ideal gas
An ideal gas is a simple fluid with the equation of state
(1)  PV = RT,

where R is the gas constant (Appendix A) and we have normalized by taking N = 1 mole. As noted in §A, such an expression does not embody the full range of thermodynamic information available from the fundamental equation S = S(E, V). We will see however that many conclusions can be had from (1) alone:

Theorem 1. For an ideal gas,
(i) CP, CV are functions of T only: CP = CP(T), CV = CV(T).
(ii) CP − CV = R.
(iii) E is a function of T only:
(2)  E = E(T) = ∫_{T0}^{T} CV(θ) dθ + E0.
(iv) S as a function of (T, V) is:
(3)  S = S(T, V) = R log V + ∫_{T0}^{T} CV(θ)/θ dθ + S0.
Formulas (2), (3) characterize E, S up to additive constants.

Proof. 1. Since E = E(S, V) = E(S(T, V), V), we have
(∂E/∂V)_T = ∂E/∂S (∂S/∂V)_T + ∂E/∂V = T (∂S/∂V)_T − P = T (∂P/∂T)_V − P,
where we utilized the Maxwell relation (23) in §D. But for an ideal gas, T (∂P/∂T)_V − P = TR/V − P = 0. Consequently (∂E/∂V)_T = 0. Hence if we regard E as a function of (T, V), E in fact depends only on T. But then, owing to Lemma 4,
CV = (∂E/∂T)_V = dE/dT
depends only on T.
2. Next, we recall Kelvin's formula (12) in §E:
CP − CV = −T (∂P/∂T)_V² (∂P/∂V)_T⁻¹.
Since PV = RT, we have
(∂P/∂V)_T = −RT/V²,  (∂P/∂T)_V = R/V.
Thus
CP − CV = T (R²/V²)(RT/V²)⁻¹ = R.
As R is constant and CV depends only on T, CP likewise depends only on T.
3. Finally, think of S as a function of T, V: S = S(E(T, V), V) = S(E(T), V). Then
(∂S/∂T)_V = ∂S/∂E dE/dT = (1/T) CV(T),
(∂S/∂V)_T = ∂S/∂V = P/T = R/V.
Formula (3) follows. □
Remark. We can solve (2) for T as a function of E and so determine S = S(T(E), V) as a function of (E, V). Let us check that (E, V) ↦ S is concave, provided CV > 0. Now
∂S/∂E = (∂S/∂T)_V ∂T/∂E = (1/T) CV(T) (CV(T))⁻¹ = 1/T,
∂S/∂V = P/T = R/V.
Thus
∂²S/∂E² = −(1/T²) ∂T/∂E = −1/(T² CV(T)) < 0,
∂²S/∂E∂V = 0,
∂²S/∂V² = −R/V² < 0,
and so (E, V) ↦ S is concave.
Remark. Recalling §B, we can write for N > 0 moles of ideal gas:
(4)  PV = NRT,


and
S(E, V, N) = N S(E/N, V/N, 1) = N S(E/N, V/N),
where S(E, V, N) is the function S of (E, V, N) from §B and S(E/N, V/N) is the function S of (E, V) from §B. We note next that
(5)  (E, V, N) ↦ S is concave.
Indeed if 0 < λ < 1 and N, N̂ > 0, then:
S(λE + (1 − λ)Ê, λV + (1 − λ)V̂, λN + (1 − λ)N̂)
  = (λN + (1 − λ)N̂) S( (λE + (1 − λ)Ê)/(λN + (1 − λ)N̂), (λV + (1 − λ)V̂)/(λN + (1 − λ)N̂) )
  = (λN + (1 − λ)N̂) S( μ E/N + (1 − μ) Ê/N̂, μ V/N + (1 − μ) V̂/N̂ ),
where
μ = λN/(λN + (1 − λ)N̂),  1 − μ = (1 − λ)N̂/(λN + (1 − λ)N̂).
Since (E, V) ↦ S is concave, we deduce:
S(λE + (1 − λ)Ê, λV + (1 − λ)V̂, λN + (1 − λ)N̂)
  ≥ (λN + (1 − λ)N̂) [ μ S(E/N, V/N) + (1 − μ) S(Ê/N̂, V̂/N̂) ]
  = λN S(E/N, V/N) + (1 − λ)N̂ S(Ê/N̂, V̂/N̂)
  = λS(E, V, N) + (1 − λ)S(Ê, V̂, N̂).
This proves (5).
A simple ideal gas is an ideal gas for which CV (and so CP ) are positive constants. Thus (6) := CP > 1. CV

From (2), (3), we deduce for a simple ideal gas that (7) E = CV T S = R log V + CV log T + S0 26

where we have set E0 = 0. Thus for a simple ideal gas (8) (N = 1) S (E, V ) = R log V + CV log E + S0 V E S (E, V, N ) = N R log N + N CV log N + S0 N.

(The constants S0 in (7), (8) and below dier.) For later reference we record: (9) where =
CP CV

S = CV log(T V 1 ) + S0 S = CV log(P V ) + S0 , CP CV = R, N = 1.

2. Van der Waals uid A van der Waals uid is a simple uid with the equation of state (10) for constants a, b > 0. Theorem 2 For a van der Waals uid, (i) CV is a function of T only: CV = CV (T ). (ii) E as a function of (T, V ) is:
T

P =

a RT 2 V b V

(V > b, N = 1)

(11)

E = E (T, V ) =
T0

CV ()d

a + E0 . V

(iii) S as a function of (T, V ) is (12) S = S (T, V ) = R log(V b) +


T T0

CV () d + S0 .

Proof. 1. As in the previous proof, E V But P = deduce


RT V b

=T
T

P T

P.
V

a V2

and so

E V T

a . V2

Hence if we think of E as a function of (T, V ), we

E=

a + (a function of T alone). V 27

But then CV = depends only on T . Formula (11) follows. 2. As before, S = S (E (T, V ), V ). Then
S T V S V T

E T

= = = =

S E E T S E E V 1 a + T V2 R . V b

V T P T

1 =T CV (T ), S + V

Formula (12) results upon integration. Note. CP depends on both T and V for a van der Waals uid. We can dene a simple van der Waals uid, for which CV is a constant. Then
a E = CV T V + E0 S = R log(V b) + CV log T + S0 ,

(N = 1).

However if we solve for S = S (E, V ), S is not concave everywhere. Thus a van der Waals uid ts into the foregoing framework only if we restrict attention to regions where (E, V ) S is concave. Remark. More generally we can replace S by its concave envelope (= the smallest concave function greater than or equal to S in some region). See Callen [C] for a discussion of the physical meaning of all this.

28

CHAPTER 2: Entropy and irreversibility

In Chapter I we began with an axiomatic model for a thermal system in equilibrium, and so could immediately discuss energy, entropy, temperature, etc. This point of view is static in time. In this chapter we introduce various sorts of processes, involving changes in time of the parameters in what we now call a thermodynamic system. These, in conjunction with the First and Second Laws of thermodynamics, will allow us to construct E and S.

A. A model material

We begin by turning our attention again to the example of simple fluids, but now we reverse the point of view of Chapter I and ask rather: How can we construct the energy E and entropy S? We will follow Owen [O] (but see also Bharatha–Truesdell [B-T]).

1. Definitions
Since we intend to build E, S, we must start with other variables, which we take to be T, V.

A model for a homogeneous fluid body (without dissipation)

Assume we are given:
(a) an open, simply connected subset Ω ⊂ (0, ∞) × (0, ∞) (Ω is the state space and elements of Ω are called states), and
(b) C¹-functions P, ΛV, CV defined on Ω (P is the pressure, ΛV the latent heat with respect to volume, CV the heat capacity at constant volume).

Notation. We write P = P(T, V), ΛV = ΛV(T, V), CV = CV(T, V) to display the dependence of the functions P, ΛV, CV on (T, V).
We further assume:
(1)  ∂P/∂V < 0,  ΛV ≠ 0,  CV > 0  in Ω.

2. Energy and entropy

a. Working and heating


We dene a path for our model to be an oriented, continuous, piecewise C 1 curve in . A path is a cycle if its starting and endpoints coincide.
[Figure: a path Γ = {(T(t), V(t))} in the (T, V)-plane.]
Notation. We parameterize Γ by writing Γ = {(T(t), V(t)) | a ≤ t ≤ b}, where a < b and T, V : [a, b] → R are C¹.

Definitions. (i) We define the working 1-form

(2)  đW = P dV

and define the work done by the fluid along Γ to be

(3)  W(Γ) = ∫_Γ đW = ∫_Γ P dV.

(ii) We likewise define the heating 1-form

(4)  đQ = C_V dT + Λ_V dV

and define the net heat gained by the fluid along Γ to be

(5)  Q(Γ) = ∫_Γ đQ = ∫_Γ C_V dT + Λ_V dV.

Remarks. (a) Thus

W(Γ) = ∫_a^b P(T(t), V(t)) V̇(t) dt

and

Q(Γ) = ∫_a^b C_V(T(t), V(t)) Ṫ(t) + Λ_V(T(t), V(t)) V̇(t) dt.

We call

(6)  w(t) = P(T(t), V(t)) V̇(t)

the rate of working and

(7)  q(t) = C_V(T(t), V(t)) Ṫ(t) + Λ_V(T(t), V(t)) V̇(t)

the rate of heating at time t (a ≤ t ≤ b).

(b) Note very carefully that there do not in general exist functions W, Q of (T, V) whose differentials are the working, heating 1-forms. The slash through the d in đW, đQ emphasizes this. Consequently W(Γ), Q(Γ) depend upon the path traversed by Γ and not merely upon its endpoints. However, W(Γ), Q(Γ) do not depend upon the parameterization of Γ.

Physical interpretations. (1) If we think of our homogeneous fluid body as occupying the region U(t) ⊆ R³ at time t, then the rate of work at time t is

w(t) = ∫_{∂U(t)} P v · ν dS,

v denoting the velocity field and ν the outward unit normal field along ∂U(t). Since we assume P is independent of position, we have

w(t) = P ∫_{∂U(t)} v · ν dS = P d/dt ∫_{U(t)} dx = P V̇(t),

in accordance with (6).

(2) Similarly, Λ_V records the gain of heat owing to the volume change (at fixed temperature T) and C_V records the gain of heat due to temperature change (at fixed volume V).

Definitions. Let Γ = {(T(t), V(t)) | a ≤ t ≤ b} be a path in Σ.
(i) Γ is called isothermal if T(t) is constant (a ≤ t ≤ b).
(ii) Γ is called adiabatic if q(t) = 0 (a ≤ t ≤ b).

Construction of adiabatic paths. Since q(t) = C_V(T(t), V(t)) Ṫ(t) + Λ_V(T(t), V(t)) V̇(t) (a ≤ t ≤ b), we can construct adiabatic paths by introducing the parameterization (T, V(T)) and solving the ODE

(8)  dV/dT = −C_V(T, V)/Λ_V(T, V)   (ODE for adiabatic paths)

for V as a function of T, V = V(T). Taking different initial conditions for (8) gives different adiabatic paths (a.k.a. adiabats). Any C¹ parameterization of the graph of V = V(T) gives an adiabatic path.

b. The First Law, existence of E

We turn now to our basic task, building E, S for our fluid system. The existence of these quantities will result from physical principles, namely the First and Second Laws of thermodynamics. We begin with a form of the First Law: we hereafter assume that for every cycle Γ of our homogeneous fluid body, we have

(9)  W(Γ) = Q(Γ).

This is conservation of energy: the work done by the fluid along any cycle equals the heat gained by the fluid along the cycle.

Remark. We assume in (9) that the units of work and heat are the same. If not, e.g. if heat is measured in calories and work in Joules (Appendix A), we must include in (9) a multiplicative factor on the right-hand side called the mechanical equivalent of heat (= 4.184 J/calorie).

We deduce this immediate mathematical corollary:

Theorem 1. For our homogeneous fluid body, there exists a C² function E : Σ → R such that

(10)  ∂E/∂V = Λ_V − P,  ∂E/∂T = C_V.

We call E = E(T, V) the internal energy.

Proof. According to (3), (5), (9):

∫_Γ C_V dT + (Λ_V − P) dV = 0

for each cycle Γ in Σ. The 1-form C_V dT + (Λ_V − P) dV is thus exact, since Σ is open, simply connected. This means there exists a C² function E with

(11)  dE = C_V dT + (Λ_V − P) dV.

This statement is the same as (10).
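The adiabats defined by (8) are easy to compute numerically. The following sketch (an ideal gas with constant C_V; the constants are illustrative assumptions) integrates (8) by the classical fourth-order Runge–Kutta scheme and compares with the exact first integral V T^(C_V/R) = const.

```python
import numpy as np

# Adiabat ODE (8) for an ideal gas: Λ_V = RT/V, constant CV (assumed values).
R, CV = 1.0, 1.5
f = lambda T, V: -CV*V/(R*T)     # right-hand side of (8): dV/dT = -CV/Λ_V

def rk4(T0, V0, T1, n=1000):
    # classical RK4 march in the independent variable T
    h = (T1 - T0)/n
    T, V = T0, V0
    for _ in range(n):
        k1 = f(T, V)
        k2 = f(T + h/2, V + h*k1/2)
        k3 = f(T + h/2, V + h*k2/2)
        k4 = f(T + h, V + h*k3)
        V += h*(k1 + 2*k2 + 2*k3 + k4)/6
        T += h
    return V

T0, V0, T1 = 400.0, 1.0, 300.0
V_num = rk4(T0, V0, T1)
V_exact = V0*(T0/T1)**(CV/R)     # exact adiabat: V*T**(CV/R) = const
print(V_num, V_exact)            # the two agree closely
```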

Notation. From (11), it follows that

(12)  dE = đQ − đW

(the exact 1-form dE is the difference of the non-exact 1-forms đQ, đW).

c. Carnot cycles

Definition. A Carnot cycle Γ for our fluid is a cycle consisting of two distinct adiabatic paths and two distinct isothermal paths, as drawn:
[Figure: a Carnot cycle Γ = Γa ∪ Γb ∪ Γc ∪ Γd in the (T, V)-plane, with isotherms at T1 < T2.]

(We assume Λ_V > 0 for this picture and take a counterclockwise orientation.) We have Q(Γb) = Q(Γd) = 0, since Γb, Γd are adiabatic paths.

Notation.

Q⁻ = −Q(Γc) = heat emitted at temperature T1,
Q⁺ = Q(Γa) = heat gained at temperature T2,
W = W(Γ) = Q⁺ − Q⁻ = work.

Definition. A Carnot cycle Γ is a Carnot heat engine if

(13)  Q⁺ > 0 and Q⁻ > 0

(heat is gained at the higher temperature T2, heat is lost at the lower temperature T1).

The picture above illustrates the correct orientation of Γ for a Carnot heat engine, provided Λ_V > 0.

Example. Suppose our fluid body is in fact an ideal gas (discussed in I.F). Then PV = RT if we consider N = 1 mole, and

(14)  P(T, V) = RT/V,  Λ_V(T, V) = RT/V,  C_V(T, V) = C_V(T).

(The formula for Λ_V is motivated by our recalling from I.E that we should have Λ_V = T ∂S/∂V = T ∂P/∂T = RT/V.)

Consider a Carnot heat engine, as drawn:

[Figure: an ideal-gas Carnot heat engine in the (T, V)-plane, with adiabats V = V1(T), V = V2(T), corners V1, V2, V3, V4 and isotherms T1 < T2.]

We compute

(15)  Q⁺ = ∫_{V1}^{V2} Λ_V dV = R T2 log(V2/V1).

The equation for the adiabatic parts of the cycle, according to (8), is:

dV/dT = −C_V(T)/Λ_V = −(C_V(T)/(RT)) V.

Hence the formulas for the lower and upper adiabats are:

V1(T) = V1 exp(−∫_{T2}^{T} C_V(θ)/(Rθ) dθ),
V2(T) = V2 exp(−∫_{T2}^{T} C_V(θ)/(Rθ) dθ),

and so

V4 = V1(T1) = V1 exp(−∫_{T2}^{T1} C_V(θ)/(Rθ) dθ),
V3 = V2(T1) = V2 exp(−∫_{T2}^{T1} C_V(θ)/(Rθ) dθ).

Therefore

Q⁻ = −∫_{V3}^{V4} Λ_V dV = −R T1 log(V4/V3) = −R T1 log(V1/V2) = R T1 log(V2/V1) > 0.

The work is

W = Q⁺ − Q⁻ = R(T2 − T1) log(V2/V1) > 0;

and for later reference we deduce from (15) that

(16)  W = (1 − T1/T2) Q⁺

for a Carnot cycle of an ideal gas.

d. The Second Law

We next hypothesize the following form of the Second Law of thermodynamics: for each Carnot heat engine Γ of our homogeneous fluid body, operating between temperatures T1 < T2, we have

W > 0 and

(17)  W = (1 − T1/T2) Q⁺.
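As a sanity check of (16), one can integrate the working and heating 1-forms numerically around an ideal-gas Carnot cycle; the constants below are illustrative assumptions.

```python
import numpy as np

# Ideal-gas Carnot cycle (N = 1, assumed units) checking W = (1 - T1/T2) Q+.
R, CV = 1.0, 1.5
T1, T2 = 300.0, 400.0
V1, V2 = 1.0, 2.0

def integral(f, x):
    # trapezoid rule for ∫ f dx along a sampled path
    return float(np.sum(0.5*(f[1:] + f[:-1]) * np.diff(x)))

def leg(Ts, Vs):
    P = R*Ts/Vs                              # ideal gas: P = RT/V = Λ_V
    W = integral(P, Vs)                      # ∫ P dV
    Q = CV*(Ts[-1] - Ts[0]) + integral(P, Vs)  # ∫ CV dT + Λ_V dV (CV constant)
    return W, Q

n = 20001
s = np.linspace(0.0, 1.0, n)
# isothermal expansion at T2 from V1 to V2
W_a, Q_a = leg(np.full(n, T2), V1 + (V2 - V1)*s)
# adiabat down: V(T) = V2*(T2/T)**(CV/R), from T2 to T1
T = T2 + (T1 - T2)*s
W_b, Q_b = leg(T, V2*(T2/T)**(CV/R))
# isothermal compression at T1 from V3 to V4
V3, V4 = V2*(T2/T1)**(CV/R), V1*(T2/T1)**(CV/R)
W_c, Q_c = leg(np.full(n, T1), V3 + (V4 - V3)*s)
# adiabat back up along V(T) = V1*(T2/T)**(CV/R), from T1 to T2
T = T1 + (T2 - T1)*s
W_d, Q_d = leg(T, V1*(T2/T)**(CV/R))

W = W_a + W_b + W_c + W_d
Qplus = Q_a                        # heat is gained only on the T2 isotherm
print(W, (1 - T1/T2)*Qplus)        # both ≈ 69.31
```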

In other words we are assuming that formula (17), which we showed above holds for any Carnot heat engine for an ideal gas, in fact holds for any Carnot heat engine for our general homogeneous fluid body.

Physical interpretation. The precise relation (17) can be motivated as follows from this general, if vague, statement, due essentially to Clausius:

(18) no system which employs homogeneous fluid bodies operating through cycles can absorb heat at one temperature T1 and emit the same amount of heat at a higher temperature T2 > T1, without doing work on its environment.

Let us first argue physically that (18) implies this assertion:

(19) if Γ, Γ̃ are two Carnot heat engines (for possibly different homogeneous fluid bodies) and Γ, Γ̃ both operate between the same temperatures T2 > T1, then W = W̃ implies Q⁺ = Q̃⁺.

This says that any two Carnot cycles which operate between the same temperatures and which perform the same work must absorb the same heat at the higher temperature.

Physical derivation of (19) from (18). To see why (19) is in some sense a consequence of (18), suppose not. Then for two fluid bodies we could find Carnot heat engines Γ, Γ̃ operating between the temperatures T2 > T1, such that W = W̃, but Q⁺ > Q̃⁺. Since W = W̃, we observe

Q̃⁻ − Q⁻ = Q̃⁺ − Q⁺ < 0.

Imagine now the process Δ consisting of Γ̃ followed by the reversal of Γ. Then Δ would absorb Q̄ := −(Q̃⁻ − Q⁻) > 0 units of heat at the lower temperature T1 and emit the same Q̄ units of heat at the higher temperature. But since W̃ − W = 0, no work would be performed by Δ. This would all contradict (18), however.

Physical derivation of (17) from (19). Another way of stating (19) is that for a Carnot heat engine, Q⁺ is some function φ(T1, T2, W) of the operating temperatures T1, T2 and the work W, and further this function is the same for all fluid bodies. But (16) says

Q⁺ = (T2/(T2 − T1)) W = φ(T1, T2, W)

for an ideal gas. Hence (19) implies we have the same formula for any homogeneous fluid body. This is (17).

Remark. See Owen [O], Truesdell [TR, Appendix 1A], Bharatha–Truesdell [B-T] for a more coherent discussion.

e. Existence of S

We next exploit (17) to build an entropy function S for our model homogeneous fluid body:

Theorem 2. For our homogeneous fluid body, there exists a C² function S : Σ → R such that

(20)  ∂S/∂V = Λ_V/T,  ∂S/∂T = C_V/T.

We call S = S(T, V) the entropy.

Proof. 1. Fix a point (T*, V*) in Σ and consider a Carnot heat engine Γ as drawn (assuming Λ_V > 0):
We call S = S (T, V ) the entropy. Proof. 1. Fix a point (T , V ) in and consider a Carnot heat engine as drawn (assuming V > 0): 36

[Figure: a Carnot heat engine through (T*, V*), with adiabats V = V1(T), V = V2(T), corners V1, V2, V3, V4 and isotherms T1 < T2.]

Now

(21)  Q⁺ = ∫_{V1}^{V2} Λ_V(V, T2) dV.

Furthermore

W = ∮_Γ P dV = ∫_{T1}^{T2} ∫_{V1(T)}^{V2(T)} ∂P/∂T dV dT

by the Gauss–Green Theorem. This identity, (17) and (21) imply

(1 − T1/T2) ∫_{V1}^{V2} Λ_V(V, T2) dV = ∫_{T1}^{T2} ∫_{V1(T)}^{V2(T)} ∂P/∂T dV dT.

Divide by T2 − T1 and let T1 → T2 = T*:

(1/T*) ∫_{V1}^{V2} Λ_V(V, T*) dV = ∫_{V1}^{V2} ∂P/∂T(V, T*) dV.

Divide by V2 − V1 and let V2 → V1 = V*, to deduce

(22)  Λ_V = T ∂P/∂T   (Clapeyron's formula)

at the point (T*, V*). Since this point was arbitrary, the identity (22) is valid everywhere in Σ.

2. Recall from (10) that

(23)  ∂E/∂V = Λ_V − P,  ∂E/∂T = C_V.

Consequently:

∂/∂T (Λ_V/T) = (1/T) ∂Λ_V/∂T − Λ_V/T²
  = (1/T)(∂²E/∂T∂V + ∂P/∂T) − (1/T) ∂P/∂T   by (22), (23)
  = (1/T) ∂²E/∂V∂T = ∂/∂V (C_V/T)   by (23) again.

Thus the form

đQ/T = (C_V/T) dT + (Λ_V/T) dV

is exact: there exists a C²-function S such that

(24)  dS = (C_V/T) dT + (Λ_V/T) dV.

This is (20). □
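Numerically, one can watch 1/T act as an integrating factor. For a van der Waals pressure with hypothetical constants (an assumption of this sketch), the cross partials of đQ = C_V dT + Λ_V dV disagree, while those of đQ/T agree, exactly as in the proof above.

```python
import numpy as np

# Van der Waals pressure with assumed constants; Λ_V from Clapeyron's formula (22).
R, CV, a, b = 1.0, 1.5, 3.0, 1.0/3.0
P   = lambda T, V: R*T/(V - b) - a/V**2
lam = lambda T, V: R*T/(V - b)          # Λ_V = T ∂P/∂T

def cross(M, N, T, V, h=1e-5):
    # cross partials (∂M/∂V, ∂N/∂T) of the form M dT + N dV, by central differences
    dMdV = (M(T, V+h) - M(T, V-h)) / (2*h)
    dNdT = (N(T+h, V) - N(T-h, V)) / (2*h)
    return dMdV, dNdT

T0, V0 = 2.0, 1.0
m1, n1 = cross(lambda T, V: CV, lam, T0, V0)                        # đQ
m2, n2 = cross(lambda T, V: CV/T, lambda T, V: lam(T, V)/T, T0, V0)  # đQ/T
print(m1, n1)   # unequal: đQ is not exact
print(m2, n2)   # equal: đQ/T is exact
```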

Notation. From (24) it follows that

(25)  dS = đQ/T,

and so (12) becomes Gibbs' formula:

dE = T dS − P dV.

S as a function of (E, V). We have determined E, S as functions of (T, V). To be consistent with the axiomatic approach in Chapter I, however, we should consider S as a function of the extensive variables (E, V). First, since ∂E/∂T = C_V > 0, we can solve for T = T(E, V). Then P = P(T(E, V), V) gives P as a function of (E, V). Also the formula S = S(T, V) = S(T(E, V), V) displays S as a function of (E, V). Consequently

(26)  ∂S/∂E|_V = ∂S/∂T ∂T/∂E = (C_V/T)(1/C_V) = 1/T   by (20)

and

(27)  ∂S/∂V|_E = ∂S/∂T ∂T/∂V + ∂S/∂V|_T = (C_V/T) ∂T/∂V|_E + Λ_V/T   by (20).

But E(T(W, V), V) = W for all W and so

∂E/∂T ∂T/∂V + ∂E/∂V = 0.

Hence (10) implies

(28)  C_V ∂T/∂V|_E = P − Λ_V.

Consequently (27) says

(29)  ∂S/∂V|_E = (P − Λ_V)/T + Λ_V/T = P/T.

In summary:

∂S/∂E|_V = 1/T,  ∂S/∂V|_E = P/T,

as expected from the general theory in Chapter I. Finally we check that

(30)  S is a concave function of (E, V).

For proving this, we deduce first from (26) that for S = S(E, V):

(31)  ∂²S/∂E² = −1/(C_V T²) < 0.

Also

∂²S/∂V² = ∂/∂V (P/T)|_E = (1/T) ∂P/∂V|_E − (P/T²) ∂T/∂V|_E.

Now

∂P/∂V|_E = ∂P/∂V|_T + ∂P/∂T ∂T/∂V|_E,

and so

∂²S/∂V² = (1/T) ∂P/∂V|_T + ((1/T) ∂P/∂T − P/T²) ∂T/∂V|_E.

But

∂T/∂V|_E = (P − Λ_V)/C_V   by (10),

and (22) says: Λ_V = T ∂P/∂T. Thus

(32)  ∂²S/∂V² = (1/T) ∂P/∂V − (1/(C_V T²))(P − Λ_V)² < 0,

since ∂P/∂V < 0, C_V > 0. Lastly,

∂²S/∂E∂V = −(1/T²) ∂T/∂V|_E = (Λ_V − P)/(T² C_V).

Consequently (31), (32) imply

(33)  (∂²S/∂E²)(∂²S/∂V²) − (∂²S/∂E∂V)²
   = (−1/(C_V T²)) ((1/T) ∂P/∂V − (1/(C_V T²))(P − Λ_V)²) − (1/(C_V² T⁴))(Λ_V − P)²
   = −(1/(C_V T³)) ∂P/∂V > 0.

Owing to (31), (32), (33), S is a concave function of (E, V). □

3. Efficiency of cycles

Recall from 2 that

q(t) = rate of heating at time t = C_V(T(t), V(t)) Ṫ(t) + Λ_V(T(t), V(t)) V̇(t),

where Γ = {(T(t), V(t)) | a ≤ t ≤ b} is a path in Σ.

Notation.
(i) q⁺(t) = q(t) if q(t) ≥ 0, 0 if q(t) ≤ 0; q⁻(t) = 0 if q(t) ≥ 0, −q(t) if q(t) ≤ 0. Thus q(t) = q⁺(t) − q⁻(t) (a ≤ t ≤ b).
(ii) Q⁺(Γ) = ∫_a^b q⁺(t) dt = heat gained along Γ; Q⁻(Γ) = ∫_a^b q⁻(t) dt = heat emitted along Γ.
(iii) W(Γ) = Q⁺(Γ) − Q⁻(Γ) = work performed along Γ.

Definition. Assume Γ is a cycle. The efficiency of Γ is

(34)  η = W(Γ)/Q⁺(Γ),

the ratio of the work performed to the heat absorbed. Note 0 ≤ η ≤ 1.


Example. If Γ is a Carnot heat engine operating between temperatures 0 < T1 < T2, we have

(35)  η = 1 − T1/T2,

according to (16).

Notation. Let Γ be an arbitrary cycle in Σ, with parameterization {(T(t), V(t)) | a ≤ t ≤ b}. Let

(36)  T1 = min{T(t) | a ≤ t ≤ b},  T2 = max{T(t) | a ≤ t ≤ b}

denote the lowest and highest temperatures occurring in the cycle.

[Figure: a (non-Carnot) cycle Γ in the (T, V)-plane, lying between the isotherms T1 and T2.]

Theorem 3. Let Γ be a cycle as above, and let η denote its efficiency. Then

(37)  η ≤ 1 − T1/T2.

Proof. According to (20),

q/T = (C_V/T) Ṫ + (Λ_V/T) V̇ = d/dt S(T(t), V(t)).

Since Γ is a cycle, we therefore have

(38)  0 = ∮_Γ đQ/T = ∫_a^b q/T dt.

Then

(39)  0 = ∫_a^b (q⁺ − q⁻)/T dt ≥ (1/T2) ∫_a^b q⁺ dt − (1/T1) ∫_a^b q⁻ dt = Q⁺(Γ)/T2 − Q⁻(Γ)/T1,

since q⁺ ≥ 0, q⁻ ≥ 0, T1 ≤ T ≤ T2 on Γ. Consequently:

η = W(Γ)/Q⁺(Γ) = 1 − Q⁻(Γ)/Q⁺(Γ) ≤ 1 − T1/T2. □
We will later (in B) take the efficiency estimate (37) as a starting point for more general theory.

4. Adding dissipation, Clausius inequality

We propose next, following Owen [O] and Serrin [S1], to modify our model to include irreversible, dissipative effects.

Notation. Remember that we represent a parametric curve by writing

(40)  (T(t), V(t)) for a ≤ t ≤ b,

where a < b and T, V : [a, b] → R are C¹. We will henceforth call Γ a process to emphasize that the following constructions depend on the parameterization. We call Γ a cyclic process if (T(a), V(a)) = (T(b), V(b)).

A model for a homogeneous fluid body (with dissipation)

Assume we are given:
(a) a convex open subset Σ ⊆ (0, ∞) × (0, ∞) (Σ is the state space), and
(b) two C² functions W, Q defined on Σ × R².

Notation. We will write

W = W(T, V, A, B),  Q = Q(T, V, A, B),

where (T, V) ∈ Σ, (A, B) ∈ R². We assume further that W, Q have the form:

(41)  W(T, V, A, B) = P(T, V)B + R1(T, V, A, B),
      Q(T, V, A, B) = C_V(T, V)A + Λ_V(T, V)B + R2(T, V, A, B)

for all (T, V) ∈ Σ, (A, B) ∈ R², where the remainder terms R1, R2 satisfy:

(42)  |R1(T, V, A, B)|, |R2(T, V, A, B)| ≤ C(A² + B²)

for some constant C and all (T, V), (A, B) as above. Lastly we suppose that P, C_V, Λ_V satisfy:

(43)  ∂P/∂V < 0, Λ_V ≠ 0, C_V > 0 in Σ.
Given a process Γ as above, we define the work done by the fluid along Γ to be

(44)  W(Γ) = ∫_a^b W(T(t), V(t), Ṫ(t), V̇(t)) dt

and the net heat gained by the fluid along Γ to be

(45)  Q(Γ) = ∫_a^b Q(T(t), V(t), Ṫ(t), V̇(t)) dt.

Notation. We call

(46)  w(t) = W(T(t), V(t), Ṫ(t), V̇(t))

the rate of working and

(47)  q(t) = Q(T(t), V(t), Ṫ(t), V̇(t))   (a ≤ t ≤ b)

the rate of heating at time t. In this model the expressions W(Γ), Q(Γ) are defined by (44), (45), but đW and đQ are not defined.

Remark. Note very carefully: W(Γ), Q(Γ) depend not only on the path described by Γ but also on the parameterization.

Our intention is to build energy and entropy functions for our new model fluid with dissipation. As before we start with a form of the First Law: for each cyclic process Γ of our homogeneous fluid body with dissipation, we assume

(48)  W(Γ) = Q(Γ).

This is again conservation of energy, now hypothesized for every cyclic process, i.e. for each cycle as a path and each parameterization.

Theorem 4. There exists a C² function E : Σ → R such that

(49)  ∂E/∂T = C_V,  ∂E/∂V = Λ_V − P.

Proof. Consider any cyclic process Γ. We may assume a = 0, b > 0. Then (48) says

∫_0^b W(T(t), V(t), Ṫ(t), V̇(t)) dt = ∫_0^b Q(T(t), V(t), Ṫ(t), V̇(t)) dt.

Owing to (41), we can rewrite:

∫_0^b C_V Ṫ + (Λ_V − P) V̇ dt = ∫_0^b R1(T, V, Ṫ, V̇) − R2(T, V, Ṫ, V̇) dt.

This identity must hold for each parameterization. So fix ε > 0 and set

(T_ε(t), V_ε(t)) := (T(εt), V(εt))   (0 ≤ t ≤ b/ε).

Then

(50)  ∫_0^{b/ε} C_V Ṫ_ε + (Λ_V − P) V̇_ε dt = ∫_0^{b/ε} R1(T_ε, V_ε, Ṫ_ε, V̇_ε) − R2(T_ε, V_ε, Ṫ_ε, V̇_ε) dt.

Since Ṫ_ε(t) = εṪ(εt), V̇_ε(t) = εV̇(εt), we can rewrite the left-hand side of (50) as

(51)  ∫_0^b C_V Ṫ + (Λ_V − P) V̇ dt = ∫_Γ C_V dT + (Λ_V − P) dV,

where we use 1-form notation to emphasize that this term does not depend on the parameterization. However (42) implies

|∫_0^{b/ε} R1(T_ε, V_ε, Ṫ_ε, V̇_ε) − R2(T_ε, V_ε, Ṫ_ε, V̇_ε) dt| ≤ C (b/ε) max_{0 ≤ t ≤ b/ε} [(Ṫ_ε)² + (V̇_ε)²] ≤ Cε.

Sending ε → 0, we conclude

∫_Γ C_V dT + (Λ_V − P) dV = 0

for each cycle in Σ, and the existence of E follows. □

Remark. In fact (48) implies

(52)  R1(T, V, A, B) = R2(T, V, A, B)

for all (T, V) ∈ Σ, (A, B) ∈ R². To see this, consider any process Γ parameterized by (T(t), V(t)), where a ≤ t ≤ 0. Extend Γ to a cyclic process Δ parameterized by (T(t), V(t)) for a ≤ t ≤ 1. Then by the proof above:

(53)  ∫_a^1 R1(T, V, Ṫ, V̇) dt = ∫_a^1 R2(T, V, Ṫ, V̇) dt.

Reparameterize Δ by writing

T_ε(t) = T(t) (a ≤ t ≤ 0), T(εt) (0 ≤ t ≤ 1/ε);
V_ε(t) = V(t) (a ≤ t ≤ 0), V(εt) (0 ≤ t ≤ 1/ε).

Formula (53) holds for any parameterization and so is valid with (T_ε, V_ε, Ṫ_ε, V̇_ε) replacing (T, V, Ṫ, V̇). Making this change and letting ε → 0, we deduce

∫_a^0 R1(T, V, Ṫ, V̇) dt = ∫_a^0 R2(T, V, Ṫ, V̇) dt

for any process (T(t), V(t)), a ≤ t ≤ 0. This is only possible if R1 ≡ R2. □
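The slow-time-scale limit used in the two proofs above can be seen numerically: along the rescaled process (T(εt), V(εt)) on [0, b/ε], a remainder that is quadratic in the rates contributes O(ε). The remainder R1 = −c(A² + B²) and all constants below are illustrative assumptions.

```python
import numpy as np

# A smooth cyclic process on [0, 1] (b = 1), with assumed amplitudes.
c = 0.7
T = lambda t: 300.0 + 50.0*np.sin(2*np.pi*t)
V = lambda t: 2.0 + 0.5*np.cos(2*np.pi*t)

def remainder_integral(eps, n=200001):
    # |∫_0^{1/eps} R1(T_eps, V_eps, dT_eps/dt, dV_eps/dt) dt| for R1 = -c(A^2+B^2)
    t = np.linspace(0.0, 1.0/eps, n)
    h = t[1] - t[0]
    Tdot = np.gradient(T(eps*t), h)   # rates of the rescaled process, O(eps)
    Vdot = np.gradient(V(eps*t), h)
    R1 = -c*(Tdot**2 + Vdot**2)
    return abs(np.sum(R1)*h)

vals = [remainder_integral(e) for e in (1.0, 0.1, 0.01)]
print(vals)    # decreases linearly in eps: the remainder contributes O(eps)
```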

The foregoing proofs demonstrate that our model with dissipation in some sense approximates an ideal model without dissipation in the limit as we consider our processes on slower and slower time scales. The model without dissipation corresponds simply to setting R1 ≡ 0, R2 ≡ 0.

To further develop our model we next assume as an instance of the Second Law that

W > 0 and

(54)  W = (1 − T1/T2) Q⁺

for any Carnot heat engine of the ideal model without dissipation acting between temperatures T2 > T1. This assertion as before implies there exists a C² function S : Σ → R with

(55)  ∂S/∂T = C_V/T,  ∂S/∂V = Λ_V/T.

Finally it seems reasonable to assume

(56)  R1(T, V, A, B) = R2(T, V, A, B) ≤ 0

for all (T, V) ∈ Σ, (A, B) ∈ R². Indeed we can interpret for any process Γ the term

−∫_a^b R1(T, V, Ṫ, V̇) dt ≥ 0
as representing the lost amount of work performed by the fluid along Γ owing to velocity-dependent internal friction (= dissipation).

Combining (55), (56), we deduce for any cyclic process Γ that

(57)  ∫_a^b Q(T, V, Ṫ, V̇)/T dt = ∫_a^b d/dt S(T(t), V(t)) dt + ∫_a^b R2(T, V, Ṫ, V̇)/T dt
    = ∫_a^b R2(T, V, Ṫ, V̇)/T dt ≤ 0.

We employ earlier notation to rewrite:

(58)  ∮_Γ đQ/T ≤ 0   (Γ a cyclic process).

This is a form of Clausius' inequality. If Γ is a process from the state α = (T0, V0) to β = (T1, V1), we likewise deduce

(59)  ∫_Γ đQ/T ≤ S(β) − S(α).

Lastly, note for our model that if Γ is a cyclic process with maximum temperature T2 and minimum temperature T1, its efficiency is

η ≤ 1 − T1/T2.

The proof is precisely like that in 3, except that we write q(t) = Q(T(t), V(t), Ṫ(t), V̇(t)) and start with the inequality

0 ≥ ∮_Γ đQ/T = ∫_a^b q/T dt.
B. Some general theories

In this section we discuss two representative modern theories for thermodynamics.

1. Entropy and efficiency

First we follow Day–Šilhavý [D-S] and introduce a new mathematical model for thermodynamic systems.

a. Definitions

Notation. (Y1, ..., Yn) = Y = typical point in R^n.

A model for a thermodynamic system in many variables

We are given:
(a) an open, simply connected subset Σ of R^n (Σ is the state space and elements of Σ are called states), and
(b) two C² functions T : Σ → (0, ∞), Λ : Σ → R^n. T is the temperature and the components of Λ are the generalized latent heats, Λ = (Λ1, ..., Λn).

Notation. (a) A path Γ is an oriented, continuous, piecewise C¹ curve in Σ. A path is a cycle if its starting and end points coincide.
(b) If Γ is a path, its reversal Γ̂ is the path taken with opposite orientation.
(c) If Γ1 and Γ2 are two paths such that the endpoint of Γ1 is the starting point of Γ2, we write Γ2 ∗ Γ1 to denote the path consisting of Γ1, followed by Γ2.

[Figure: a path Y(t), a ≤ t ≤ b, in the state space Σ ⊆ R^n.]

47

Note. We parameterize Γ by writing Γ = {Y(t) | a ≤ t ≤ b}, Y(t) = (Y1(t), ..., Yn(t)) = state at time t. Γ̂, Γ2 ∗ Γ1 have the obvious parameterizations.

Definitions.
(i) Q(Γ) = ∫_Γ đQ = ∫_Γ Λ · dY = ∫_a^b Λ(Y(t)) · Ẏ(t) dt = heat absorbed along Γ.
(ii) q(t) = Λ(Y(t)) · Ẏ(t) = heating at time t;
q⁺(t) = q(t) if q(t) ≥ 0, 0 if q(t) ≤ 0;
q⁻(t) = 0 if q(t) ≥ 0, −q(t) if q(t) ≤ 0.
Thus q(t) = q⁺(t) − q⁻(t) (a ≤ t ≤ b).
(iii) If Γ is a path,
Q⁺(Γ) = ∫_a^b q⁺ dt = heat gained along Γ;
Q⁻(Γ) = ∫_a^b q⁻ dt = heat lost along Γ.
If Γ is a cycle,
W(Γ) = Q(Γ) = Q⁺(Γ) − Q⁻(Γ) = work performed along Γ.
(iv) t⁺(Γ) = {t ∈ [a, b] | Ẏ(t) exists, q(t) > 0} = times at which heat is gained;
t⁻(Γ) = {t ∈ [a, b] | Ẏ(t) exists, q(t) < 0} = times at which heat is emitted.
(v) T⁺(Γ) = sup{T(Y(t)) | t ∈ t⁺(Γ) ∪ t⁻(Γ)} = maximum temperature at which heat is absorbed or emitted;
T⁻(Γ) = min{T(Y(t)) | t ∈ t⁺(Γ) ∪ t⁻(Γ)} = minimum temperature at which heat is absorbed or emitted.

48

Remark.

Q⁺(Γ̂) = Q⁻(Γ), Q⁻(Γ̂) = Q⁺(Γ), Q(Γ̂) = −Q(Γ), T⁺(Γ̂) = T⁺(Γ), T⁻(Γ̂) = T⁻(Γ).

Terminology. A path Γ is called:
(a) adiabatic if t⁺(Γ), t⁻(Γ) = ∅.  [Abbreviation: A]
(b) absorbing and essentially isothermal if t⁻(Γ) = ∅, t⁺(Γ) ≠ ∅ and T(Y(t)) is constant on t⁺(Γ).  [M⁺]
(c) emitting and essentially isothermal if t⁺(Γ) = ∅, t⁻(Γ) ≠ ∅ and T(Y(t)) is constant on t⁻(Γ).  [M⁻]
(d) monotonic and essentially isothermal if Γ is one of types (a), (b), (c) above.  [M]
(e) essentially isothermal if T(Y(t)) is constant on t⁺(Γ) ∪ t⁻(Γ).  [I]
(f) a Carnot path if there exist T1 ≤ T2 such that T(Y(t)) = T1 if t ∈ t⁻(Γ), T(Y(t)) = T2 if t ∈ t⁺(Γ).  [C]

Notation. If α, β ∈ Σ:
P(α, β) = collection of all paths from α to β in Σ;
A(α, β) = collection of all adiabatic paths from α to β.
M⁺(α, β), M⁻(α, β), M(α, β), I(α, β), C(α, β) are similarly defined.

b. Existence of S

Main Hypothesis. We assume that for each pair of states α, β ∈ Σ and for each temperature level θ attained in Σ, there exists a monotone, essentially isothermal path Γ ∈ M(α, β) such that

(1)  T(Y(t)) = θ on t⁺(Γ) ∪ t⁻(Γ).

49

[Figure: a monotone, essentially isothermal path in R^n: adiabats joined by an isotherm lying in the level set {T = θ}, along which q has one sign.]

Theorem. Assume

(2)  W(Γ) ≤ (1 − T⁻(Γ)/T⁺(Γ)) Q⁺(Γ)

for each cycle Γ in Σ, and

(3)  W(Γ) = (1 − T⁻(Γ)/T⁺(Γ)) Q⁺(Γ)

for each Carnot cycle Γ in Σ. Then there exists a C² function S : Σ → R such that

(4)  ∂S/∂Y_k = Λ_k/T in Σ   (k = 1, 2, ..., n).

This result says that the efficiency estimates (2), (3) (special cases of which we have earlier encountered) in fact imply the existence of the entropy S. Conversely it is easy to check using the arguments in A.3 that (4) implies (2), (3).

Proof. 1. Define the crude entropy change

(5)  σ(Γ) := 0 if Γ is adiabatic; Q⁺(Γ)/T⁺(Γ) − Q⁻(Γ)/T⁻(Γ) if not.


Then (2), (3) say:

(6)  σ(Γ) ≤ 0 for each cycle Γ,  σ(Γ) = 0 for each Carnot cycle Γ.

2. Claim # 1. For each α, β ∈ Σ:

(7)  M(α, β) = A(α, β) or M(α, β) = M⁺(α, β) or M(α, β) = M⁻(α, β).

To prove this, fix α, β ∈ Σ, Γ1, Γ2 ∈ M(α, β). Assume first Γ1 ∈ M⁺(α, β), but Γ2 ∉ M⁺(α, β). Then Γ2 ∈ A(α, β) ∪ M⁻(α, β), and so Q⁺(Γ2) = 0, Q⁻(Γ1) = 0. Hence

Q⁻(Γ̂2 ∗ Γ1) = Q⁺(Γ2) + Q⁻(Γ1) = 0,

but Γ̂2 ∗ Γ1 is not adiabatic. Thus

σ(Γ̂2 ∗ Γ1) = Q⁺(Γ̂2 ∗ Γ1)/T⁺(Γ̂2 ∗ Γ1) = (Q⁻(Γ2) + Q⁺(Γ1))/T⁺(Γ̂2 ∗ Γ1) > 0.

Since Γ̂2 ∗ Γ1 is a cycle, we have a contradiction to (6). Hence Γ1 ∈ M⁺(α, β) implies Γ2 ∈ M⁺(α, β). Likewise Γ1 ∈ M⁻(α, β) implies Γ2 ∈ M⁻(α, β). This proves (7).

3. Claim # 2. If Γ1, Γ2 ∈ M(α, β), then

(8)  σ(Γ1) = σ(Γ2).

According to Claim # 1, Γ1, Γ2 ∈ A(α, β) or Γ1, Γ2 ∈ M⁺(α, β) or Γ1, Γ2 ∈ M⁻(α, β). The first possibility immediately gives (8). Suppose now Γ1, Γ2 ∈ M⁺(α, β), with Q⁺(Γ1) ≥ Q⁺(Γ2). Then Γ = Γ̂2 ∗ Γ1 is cyclic, with

T⁺(Γ) = T⁺(Γ1), T⁻(Γ) = T⁺(Γ2), Q⁺(Γ) = Q⁺(Γ1), Q⁻(Γ) = Q⁺(Γ2).

Thus (6) implies

0 = σ(Γ) = Q⁺(Γ)/T⁺(Γ) − Q⁻(Γ)/T⁻(Γ) = Q⁺(Γ1)/T⁺(Γ1) − Q⁺(Γ2)/T⁺(Γ2) = σ(Γ1) − σ(Γ2).

This is (8) and a similar proof is valid if Γ1, Γ2 ∈ M⁻(α, β).

4. Claim # 3. If Γ ∈ P(α, β) and Δ ∈ M(α, β), then

(9)  σ(Γ) ≤ σ(Δ).

To prove (9), recall the possibilities listed in (7). If M(α, β) = A(α, β), then Δ is adiabatic and σ(Δ) = 0. Also Δ̂ ∗ Γ is a cycle, and σ(Δ̂ ∗ Γ) = σ(Γ). Thus (6) implies

σ(Γ) = σ(Δ̂ ∗ Γ) ≤ 0 = σ(Δ).

This is (9). If M(α, β) = M⁺(α, β), then A(α, β) = M⁻(α, β) = ∅ and so Γ is not adiabatic. Thus T⁻(Γ) is defined. According to the Main Hypothesis, there exists Π ∈ M(α, β) = M⁺(α, β) such that T⁺(Π) = T⁻(Γ). Set Θ := Π̂ ∗ Γ. Then Θ is a cycle, with

T⁺(Θ) = T⁺(Γ), T⁻(Θ) = T⁺(Π) = T⁻(Γ), Q⁺(Θ) = Q⁺(Γ), Q⁻(Θ) = Q⁻(Γ) + Q⁺(Π).

Thus (6) says

0 ≥ σ(Θ) = Q⁺(Θ)/T⁺(Θ) − Q⁻(Θ)/T⁻(Θ) = Q⁺(Γ)/T⁺(Γ) − Q⁻(Γ)/T⁻(Γ) − Q⁺(Π)/T⁺(Π).

Then σ(Γ) ≤ σ(Π) = σ(Δ), where we employed Claim # 2 for the last equality. The proof of (9) if M(α, β) = M⁻(α, β) is similar.

5. Claim # 4. If Γ ∈ P(α, β) and Δ ∈ I(α, β), then

(10)  σ(Γ) ≤ σ(Δ).

To verify (10), select any Π ∈ M(α, β). Then Π̂ ∈ M(β, α), Δ̂ ∈ P(β, α), and so σ(Δ̂) ≤ σ(Π̂) according to Claim # 3. Since Δ, Π ∈ I(α, β),

σ(Δ̂) = −σ(Δ),  σ(Π̂) = −σ(Π).

Thus σ(Π) ≤ σ(Δ). Owing to Claim # 3, then, σ(Γ) ≤ σ(Π) ≤ σ(Δ). This is (10).

6. Claim # 5. There is a function φ : Σ → R such that

(11)  σ(Γ) ≤ φ(β) − φ(α)

for all α, β ∈ Σ, Γ ∈ P(α, β). To prove this, note first that Claim # 4 implies σ(Δ1) = σ(Δ2) if Δ1, Δ2 ∈ I(α, β). Thus we can define

π(α, β) := σ(Δ)   (Δ ∈ I(α, β)).

Then according to Claim # 4

σ(Γ) ≤ π(α, β)   (Γ ∈ P(α, β)),

and so to derive (11) we must show that we can write

(12)  π(α, β) = φ(β) − φ(α) for all α, β ∈ Σ.

For this fix a state γ ∈ Σ and a temperature level θ. Owing to the Main Hypothesis, there exist Δ1 ∈ M(α, γ), Δ2 ∈ M(γ, β) such that

T(Y(t)) = θ on t⁺(Δ1) ∪ t⁻(Δ1),  T(Y(t)) = θ on t⁺(Δ2) ∪ t⁻(Δ2).

Then Δ2 ∗ Δ1 ∈ I(α, β) and

σ(Δ2 ∗ Δ1) = (Q⁺(Δ2) + Q⁺(Δ1) − Q⁻(Δ2) − Q⁻(Δ1))/θ = σ(Δ2) + σ(Δ1).

Hence

(13)  π(α, β) = π(α, γ) + π(γ, β).

Define φ(β) := π(γ, β), to deduce (12) from (13).

7. Finally we build the entropy function S. Take any cycle Γ in Σ, parameterized by {Y(t) | 0 ≤ t ≤ 1}. Fix ε > 0. Then take N so large that

(14)  |1/T(Y(t1)) − 1/T(Y(t2))| ≤ ε if t1, t2 ∈ [(k−1)/N, k/N]

(k = 1, ..., N). Thus we have

0 ≤ 1/T⁻(Γk) − 1/T⁺(Γk) ≤ ε   (k = 1, ..., N),

where Γk is parameterized by {Y(t) | (k−1)/N ≤ t ≤ k/N}. We here and afterwards assume each Γk is not adiabatic, as the relevant estimates are trivial for any adiabatic Γk. Thus

∫_Γ đQ/T := ∫_0^1 (1/T(Y(t))) Λ(Y(t)) · Ẏ(t) dt
 = Σ_{k=1}^N ∫_{(k−1)/N}^{k/N} q(t)/T(Y(t)) dt
 ≤ Σ_{k=1}^N [Q⁺(Γk)/T⁺(Γk) − Q⁻(Γk)/T⁻(Γk) + ε(Q⁺(Γk) + Q⁻(Γk))]
 = Σ_{k=1}^N σ(Γk) + ε Σ_{k=1}^N (Q⁺(Γk) + Q⁻(Γk))
 = Σ_{k=1}^N σ(Γk) + ε ∫_0^1 |q(t)| dt
 ≤ Σ_{k=1}^N σ(Γk) + Cε.

Now by Claim # 5, with αk = Y((k−1)/N), αk+1 = Y(k/N):

σ(Γk) ≤ φ(αk+1) − φ(αk).

Since Γ is a cycle, αN+1 = α1, and so

Σ_{k=1}^N σ(Γk) ≤ 0.


Consequently the calculation above forces:

∫_0^1 (Λ(Y(t)) · Ẏ(t))/T(Y(t)) dt ≤ Cε,

and thus

∫_Γ đQ/T = ∫_0^1 q(t)/T(Y(t)) dt ≤ 0.

Applying the same reasoning to Γ̂ in place of Γ, we conclude

∫_Γ đQ/T = 0 for each cycle Γ.

As Σ is simply connected, this fact implies the existence of S : Σ → R with DS = Λ/T. □

2. Entropy, temperature and supporting hyperplanes

Modern rigorous approaches to thermodynamics vastly extend the realm of applicability of the foregoing notions to extremely diverse systems of various sorts. See for instance Serrin [S2], [S3]. As an interesting second illustration of a modern theory we next present Feinberg and Lavine's derivation [F-L1] of entropy and temperature for an abstract thermal system, as a consequence of the Hahn–Banach Theorem.

a. Definitions

Notation.
(i) Σ = a compact metric space.
(ii) C(Σ) = space of continuous functions φ : Σ → R, with

‖φ‖_{C(Σ)} = max_{σ ∈ Σ} |φ(σ)|.

(iii) M(Σ) = space of signed Radon measures on Σ; M⁺(Σ) = space of nonnegative Radon measures; M₀(Σ) = {μ ∈ M(Σ) | μ(Σ) = 0}.

(iv) We endow M(Σ) with the weak∗ topology, which is the weakest topology for which the mappings

ν ↦ ∫_Σ φ dν

are continuous on M(Σ) for all φ ∈ C(Σ).
(v) M₀(Σ) × M(Σ) = {(μ, ν) | μ ∈ M₀(Σ), ν ∈ M(Σ)}. We give M(Σ) the weak∗ topology, M₀(Σ) the inherited subspace topology and M₀(Σ) × M(Σ) the product topology.

A model for an abstract thermodynamic system

We are given:
(a) a compact metric space Σ, as above (Σ is the state space and elements of Σ are called states), and
(b) a nonempty set P ⊆ M₀(Σ) × M(Σ), such that

(15)  P is a closed, convex cone.

(Elements Π of P are called processes. If Π = (μ, ν) ∈ P is a process, then μ ∈ M₀(Σ) is the change of condition and ν ∈ M(Σ) is the heating measure for Π.)

Definitions. (i) A cyclic process for our system is a process whose change of condition is the zero measure. (ii) Let C denote the set of measures which are heating measures for cyclic processes; that is,

(16)  C = {ν ∈ M(Σ) | (0, ν) ∈ P}.

Physical interpretation. The abstract apparatus above is meant to model the following situation.

(a) Suppose we have a physical body U made of various material points. At any given moment of time, we assign to each point x ∈ U a state σ(x) ∈ Σ.


[Figure: the physical body U mapped by σ into the abstract state space Σ; a Borel set E ⊆ Σ has preimage σ⁻¹(E) ⊆ U.]

The condition of the body at this fixed time is then defined to be the measure μ ∈ M⁺(Σ) such that

μ(E) = mass of σ⁻¹(E)

for each Borel subset E ⊆ Σ.

(b) We now imagine a process which somehow acts on U, thereby changing the initial condition μᵢ to a final condition μ_f. The change of condition is μ = μ_f − μᵢ. Observe

μ(Σ) = μ_f(Σ) − μᵢ(Σ) = mass of U − mass of U = 0.

Thus μ ∈ M₀(Σ). If the process is cyclic, then μ_f = μᵢ; that is, μ = 0.

(c) We record also the heat received during the process by defining

ν(E) = net amount of heat received from the exterior during the entire process, by the material points with initial states lying in E,

for each Borel subset E ⊆ Σ. The signed measure ν is the heating measure.

b. Second Law

We next assume that our abstract system satisfies a form of the Second Law, which for us means:

(17)  C ∩ M⁺(Σ) = {0}.


Physical interpretation. Condition (17) is a mathematical interpretation of the Kelvin–Planck statement of the Second Law, namely:

(18) if, while suffering a cyclic process, a body absorbs heat from its exterior, that body must also emit heat to its exterior during the process.

This is quoted from [F-L1, p. 223]. In other words, the heat supplied to the body undergoing the cyclic process cannot be converted entirely into work: there must be emission of heat from the body as well. The cyclic process cannot operate with perfect efficiency. Our condition (17) is a mathematical interpretation of all this: the heating measure for a cyclic process cannot be a nonzero, nonnegative measure.

c. The Hahn–Banach Theorem

We will need a form of the Hahn–Banach Theorem. Let X be a Hausdorff, locally convex topological vector space. Suppose K1, K2 are two nonempty, disjoint, closed convex subsets of X and that K1 is a cone. Then there exists a continuous linear function

(19)  Φ : X → R

such that

(20)  Φ(k1) ≤ 0 for all k1 ∈ K1,  Φ(k2) > 0 for all k2 ∈ K2.

We can think of the set {Φ = 0} as a separating hyperplane between K1, K2.


[Figure: the hyperplane {Φ = 0}, with K1 ⊆ {Φ ≤ 0} and K2 ⊆ {Φ > 0}.]

To utilize the Hahn–Banach Theorem, we will need an explicit characterization of Φ in certain circumstances:

Lemma. (i) Let X = M(Σ) and suppose

(21)  Φ : X → R

is continuous and linear. Then there exists φ ∈ C(Σ) such that

(22)  Φ(ν) = ∫_Σ φ dν for all ν ∈ X.

(ii) Let X = M₀(Σ) × M(Σ) and suppose Φ : X → R is continuous and linear. Then there exist φ, ψ ∈ C(Σ) such that

(23)  Φ((μ, ν)) = ∫_Σ φ dμ + ∫_Σ ψ dν

for all (μ, ν) ∈ X.

Proof. 1. Suppose X = M(Σ) and Φ : X → R

and so k weakly as measures. This means k in M(). As is continuous, (k ) = (k ) ( ) = (). Thus is continuous. 3. Since is linear,
m m

(24)

k=1

ak k

=
k=1

ak (k )

m for all {k }m k=1 , {ak }k=1 R. Finally take any measure M(). We can nd measures {m }m=1 of the form m

m =
k=1

am k m k

(m = 1, . . . )

such that (25) Then (24) implies


m m am k (k ) = (m ) () as m . k=1

m in M().

But since is continuous, (25) says


m

dm =
k=1

m am k (k )

d as m .

60

Thus () =

d.

This proves the representation formula (22) and the proof of (23) is then immediate. d. Existence of S, T

Theorem. Assume that our abstract thermodynamic system satisfies (17). Then there exist continuous functions

S : Σ → R,  T : Σ → (0, ∞)

such that

(26)  ∫_Σ dν/T ≤ ∫_Σ S dμ

for each process Π = (μ, ν) ∈ P.

We will later interpret (26) as a form of Clausius' inequality.

Proof. 1. Hereafter set X = M₀(Σ) × M(Σ),

K1 = P,  K2 = {0} × M¹₊(Σ),

where M¹₊(Σ) = {ρ ∈ M⁺(Σ) | ρ(Σ) = 1}. By hypothesis (15), K1 is a closed, convex cone in X. K2 is also closed, convex in X. In addition our form of the Second Law (17) implies that K1 ∩ K2 = ∅. We may therefore invoke the Hahn–Banach Theorem, in the formulation (20), (23): there exist φ, ψ ∈ C(Σ) with

(27)  Φ((μ, ν)) = ∫_Σ φ dμ + ∫_Σ ψ dν ≤ 0 for all Π = (μ, ν) ∈ P

and

(28)  Φ((0, ρ)) = ∫_Σ ψ dρ > 0 for all ρ ∈ M¹₊(Σ).
61

Taking ρ = δ_σ in (28), where σ ∈ Σ, we deduce that ψ > 0 on Σ. Change notation by writing

S := −φ,  T := 1/ψ.

Then (27) reads

∫_Σ dν/T ≤ ∫_Σ S dμ

for all processes Π = (μ, ν) ∈ P. □

Physical interpretation. We can repackage inequality (26) into more familiar form by first of all defining

(29)  ∫_Π đQ/T := ∫_Σ dν/T

for each process Π = (μ, ν). Then (26) says

(30)  ∮ đQ/T ≤ 0   (Π a cyclic process).

Fix now any two states α, β ∈ Σ. Then if there exists a process Π = (δ_β − δ_α, ν) ∈ P, (26) and (29) read

(31)  ∫_Π đQ/T ≤ S(β) − S(α)   (Π a process from α to β).

Clearly (30) is a kind of Clausius inequality for our abstract thermodynamic system. Finally let us say a process Π = (δ_β − δ_α, ν) ∈ P is reversible if Π̃ := (δ_α − δ_β, −ν) ∈ P. Then

∫_Π đQ/T = S(β) − S(α)   (Π a reversible process from α to β).

Remark. See Serrin [S2], [S3], Coleman–Owen–Serrin [C-O-S], Owen [O] for another general approach based upon a different, precise mathematical interpretation of the Second Law. Note also the very interesting forthcoming book by Man and Serrin [M-S].
62

CHAPTER 3: Continuum thermodynamics

Since our primary subject in these notes is the application of entropy ideas to PDE theory, we must next confront a basic issue: the classical physical theory from Chapter I and from A, B.1 in Chapter II is concerned with entropy defined as an extensive parameter over an entire system. That is, S does not have spatial dependence. The rather abstract framework in B.2 of Chapter II does on the other hand allow for variations of S over a material body, and this prospect points the way to other modern approaches to continuum thermodynamics, for which the entropy, internal energy, temperature, etc. are taken to be functions of both position and time. We will present here the axiomatic method of Coleman–Noll [C-N].

A. Kinematics

1. Deformations

We model the motion of a material body by introducing first a smooth bounded region U, the reference configuration, a typical point of which we denote by X. We suppose the moving body occupies the region U(t) ⊆ R³ at time t ≥ 0, where for simplicity we take U(0) = U.

x=(X,t) X

U=U(0)

U(t)

Let us describe the motion by introducing a smooth mapping : U [0, ) R3 so that (1) x = (X, T )

is the location at time t 0 of the material particle initially at X U . We require that for each t 0, (, t) : U U (t) is an orientation preserving dieomorphism. Write (, t) = 1 (, t); so that (2) X = (x, t).

63

Then (3) v(x, t) = (X, t) t

is the velocity field, where X, x are associated by (1), (2).

2. Physical quantities

We introduce as well:
(i) the mass density ρ(x, t)
(ii) the stress tensor T(x, t)
(iii) the body force/unit mass b(x, t)
(iv) the internal energy/unit mass e(x, t)
(v) the heat flux vector q(x, t)
(vi) the heat supply/unit mass r(x, t)
(vii) the entropy/unit mass s(x, t)
(viii) the local temperature θ(x, t).

The functions ρ, θ are assumed to be positive.

Remarks. (1) We henceforth assume these are all smooth functions. Note

  ρ, e, r, s and θ are real valued,
  b, q take values in R³,
  T takes values in S³ (= space of 3 × 3 symmetric matrices).

(2) We will sometimes think of these quantities as being functions of the material point X rather than the position x. In this case we will write ρ = ρ(X, t), T = T(X, t), etc., where X, x are related by (1), (2).

Notation. We will often write

(4)  dm = ρ dx.

Note. The kinetic energy at time t ≥ 0 is

  K(t) = ∫_{U(t)} |v|²/2 dm;

the internal energy is

  E(t) = ∫_{U(t)} e dm;

and the entropy is

  S(t) = ∫_{U(t)} s dm.

3. Kinematic formulas

Take V to be any smooth subregion of U and write V(t) = φ(V, t) (t ≥ 0). If f = f(x, t) (x ∈ R³, t ≥ 0) is smooth, we compute:

  d/dt ∫_{V(t)} f dx = ∫_{V(t)} ∂f/∂t dx + ∫_{∂V(t)} f v·ν dS,

where dS denotes 2-dimensional surface measure on ∂V(t), v is the velocity field, and ν is the unit outer normal vector field to ∂V(t). Applying the Gauss–Green Theorem we deduce:

(5)  d/dt ∫_{V(t)} f dx = ∫_{V(t)} ∂f/∂t + div(f v) dx.

Take f = ρ above. Then

  d/dt ∫_{V(t)} ρ dx = ∫_{V(t)} ∂ρ/∂t + div(ρv) dx = 0,

as the total mass within the regions V(t) moving with the deformation is unchanging in time. Since the region V(t) is arbitrary, we deduce

(6)  ∂ρ/∂t + div(ρv) = 0  conservation of mass.
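As a sanity check, conservation of mass (6) can be verified symbolically for an explicit motion; the particular deformation and initial density below are illustrative choices, not taken from the text:

```python
import sympy as sp

x1, t = sp.symbols('x1 t', positive=True)

# Illustrative motion (an assumption for this example): phi(X, t) = ((1+t) X1, X2, X3),
# a uniform stretch along x1. Then F = diag(1+t, 1, 1), det F = 1 + t, and the
# inverse motion gives X1 = x1 / (1 + t).
detF = 1 + t
rho0 = 2 + sp.sin(x1 / (1 + t))      # illustrative rho0, evaluated at X1 = x1/(1+t)
rho = rho0 / detF                    # formula (11) below: rho = (det F)^(-1) rho0(X)
v1 = x1 / (1 + t)                    # Eulerian velocity: v = (x1/(1+t), 0, 0)

# Conservation of mass (6): rho_t + div(rho v) = 0 (only the x1-term is nonzero here)
residual = sp.simplify(sp.diff(rho, t) + sp.diff(rho * v1, x1))
print(residual)   # 0
```

The residual simplifies to zero identically in x1 and t, as (6) requires.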

This PDE holds in U(t) at each time t ≥ 0. Now again take f = f(x, t) and compute

  d/dt ∫_{V(t)} f dm = d/dt ∫_{V(t)} f ρ dx  by (4)
    = ∫_{V(t)} ∂(ρf)/∂t + div(ρf v) dx  by (5)
    = ∫_{V(t)} ρ (∂f/∂t + v·Df) dx  by (6),

where

  Df = D_x f = (f_{x₁}, f_{x₂}, f_{x₃}) = gradient of f with respect to the spatial variables x = (x₁, x₂, x₃).

Recalling (4) we deduce

(7)  d/dt ∫_{V(t)} f dm = ∫_{V(t)} Df/Dt dm,

where

  Df/Dt := ∂f/∂t + v·Df

is the material derivative. Observe that

  d/dt f(φ(X, t), t) = Df/Dt.

4. Deformation gradient

We define as well the deformation gradient

(8)  F(X, t) = D_X φ(X, t) = the 3 × 3 matrix with entries ∂φᵢ/∂Xⱼ (1 ≤ i, j ≤ 3)

and the velocity gradient

(9)  L(X, t) = ∂F/∂t (X, t) F⁻¹(X, t)

for X ∈ U, t ≥ 0.

To understand the name of L, recall

  v(x, t) = ∂φ/∂t (X, t) = ∂φ/∂t (ψ(x, t), t).

Thus

  ∂vᵢ/∂xⱼ = Σ_{k=1}^3 (∂²φᵢ/∂t∂X_k) ∂ψ_k/∂xⱼ  (1 ≤ i, j ≤ 3).

As F⁻¹ = D_x ψ, we see that

(10)  Dv(x, t) = ∂F/∂t (X, t) F⁻¹(X, t) = L(X, t).

Thus L is indeed the gradient of the velocity.

Remark. Let ρ(·, 0) = ρ₀ denote the mass density at time t = 0. Then

  ρ(x, t) = (det D_x ψ(x, t)) ρ₀(ψ(x, t))

is the density at x ∈ U(t), t ≥ 0. Since F⁻¹ = D_x ψ, we see that

(11)  ρ(x, t) = (det F(X, t))⁻¹ ρ₀(X)

for t ≥ 0, X, x associated by (1), (2).

B. Conservation laws; Clausius–Duhem inequality

We now further assume for each moving subregion V(t) (t ≥ 0) as above that we have:

I. Balance of linear momentum

(1)  d/dt ∫_{V(t)} v dm = ∫_{V(t)} b dm + ∫_{∂V(t)} Tν dS.

This says the rate of change of linear momentum within the moving region V(t) equals the body force acting on V(t) plus the contact force acting on ∂V(t). We employ (6), (7) in A and the Gauss–Green Theorem to deduce from (1) that

(2)  ρ Dv/Dt = ρb + div T  balance of momentum.

We additionally suppose:

II. Energy balance

(3)  d/dt ∫_{V(t)} (|v|²/2 + e) dm = ∫_{V(t)} (v·b + r) dm + ∫_{∂V(t)} (v·Tν − q·ν) dS.

This identity asserts that the rate of change of the total energy (= kinetic energy + internal energy (including potential energy)) within the moving region V(t) equals the rate of work performed by the body and contact forces, plus the body heat supplied, minus the heat flux outward through ∂V(t). Since T is symmetric, v·Tν = (Tv)·ν. Hence (3) and the Gauss–Green Theorem imply:

  ρ D/Dt (|v|²/2 + e) = ρ(v·b + r) + div(Tv − q).

Simplify, using (2), to deduce:

(4)  ρ De/Dt = ρr − div q + T : Dv  energy balance.

Notation. If A, B are 3 × 3 matrices, we write

  A : B = Σ_{i,j=1}^3 a_{ij} b_{ij}.

Lastly we hypothesize the

III. Clausius–Duhem inequality

(5)  d/dt ∫_{V(t)} s dm ≥ ∫_{V(t)} r/θ dm − ∫_{∂V(t)} (q·ν)/θ dS.

This asserts that the rate of entropy increase within the moving region V(t) is greater than or equal to the heat supply divided by temperature, integrated over the body, plus an entropy flux term integrated over ∂V(t). Observe that (5) is a kind of continuum version of the various forms of the Clausius inequality we have earlier encountered.

As before, (5) implies

(6)  ρ Ds/Dt ≥ ρr/θ − div(q/θ)  entropy inequality.

Notation. We define the local production of entropy per unit mass to be:

(7)  γ := Ds/Dt − r/θ + (1/ρ) div(q/θ)
       = Ds/Dt − (1/ρθ)(ρr − div q) − (q·Dθ)/(ρθ²)
       ≥ 0.

We call a motion φ and a collection of functions ρ, T, b, etc. satisfying I–III above an admissible thermodynamic process.

C. Constitutive relations

A particular material is defined by adding to the foregoing additional constitutive relations, which are restrictions on the functions T, b, etc. describing the thermodynamic process.

1. Fluids

Notation. We refer to

(1)  v = 1/ρ

as the specific volume. Note that then

  ∫_{U(t)} v dm = |U(t)| = volume of U(t).

a. We call our body a perfect fluid with heat conduction if there exist four functions ê, θ̂, T̂, q̂ such that

(2)
  (a) e = ê(s, v)
  (b) θ = θ̂(s, v)
  (c) T = T̂(s, v)
  (d) q = q̂(s, v, Dθ).

These are the constitutive relations.

Notation. Formula (a) means

  e(x, t) = ê(s(x, t), v(x, t))  (x ∈ U(t), t ≥ 0),

where ê : R × R → R. Equations (b)–(d) have similar interpretations. Below we will sometimes omit writing the circumflex, and so regard e as a function of (s, v), etc.

The key question is this: What restrictions on the constitutive relations (2) are imposed by the Clausius–Duhem inequality? To deduce useful information, let us first combine (4), (6) from B:

(3)  0 ≤ ρ Ds/Dt − (ρ/θ) De/Dt + (1/θ) T : Dv − (1/θ²) q·Dθ.

Owing to (2)(a) we have:

(4)  De/Dt = (∂e/∂s) Ds/Dt + (∂e/∂v) Dv/Dt.

Now the conservation of mass ((6) in A) implies Dρ/Dt = −ρ div v. Thus

(5)  Dv/Dt = D(1/ρ)/Dt = −(1/ρ²) Dρ/Dt = (1/ρ) div v = v div v.

Insert (4), (5) into (3):

(6)  0 ≤ ρ (1 − (1/θ) ∂e/∂s) Ds/Dt + (1/θ) (T − (∂e/∂v) I) : Dv − (1/θ²) q·Dθ.

The main point is that (6) must hold for all admissible thermodynamic processes, and here is how we can build them. First we select any deformation φ as in A and any function s. Then v = ∂φ/∂t, F = D_X φ, ρ = (det F)⁻¹ρ₀. Thus we have v = 1/ρ, and can then employ (2) to compute e, θ, T, q. Lastly the balance of momentum and energy equations ((2), (4) from B) allow us to define b, r.

We exploit this freedom in choosing φ, s as follows. First fix any time t₀ > 0. Then choose φ as above so that ρ(·, t₀) = (det F(·, t₀))⁻¹ρ₀ is constant, and s so that s(·, t₀) ≡ s₀ is constant. Then v(·, t₀) is constant, whence (2)(b) implies Dθ(·, t₀) = 0. Fix any point x₀ ∈ U(t₀). We can choose φ, s as above so that Dv = L = (∂F/∂t)F⁻¹ and Ds/Dt at (x₀, t₀) are arbitrary. As (6) must then hold at (x₀, t₀) for all choices of Ds/Dt and Dv, we conclude, dropping the circumflex:

(7)  θ = ∂e/∂s  temperature formula

and

(8)  T = −pI,

for

(9)  p = −∂e/∂v  pressure formula.

Equation (9) is the definition of the pressure p = p̂(s, v). Thus in (7)–(9) we regard e, T, p as functions of (s, v).

Remarks. (i) The balance of momentum equation (for b ≡ 0) becomes the compressible Euler equations

(10)  ρ Dv/Dt = −Dp,

about which more later. If the flow is incompressible, then ρ is constant (say ρ ≡ 1), and thus (6) in A implies div v = 0. We obtain then the incompressible Euler equations

(11)  Dv/Dt = −Dp,  div v = 0.

(ii) Note the similarity of (7), (9) with the classical formulas

  ∂E/∂S = T,  ∂E/∂V = −P

for a simple fluid in Chapter I.

Returning to (6), we conclude from (7)–(9) that

  q̂(s, v, Dθ)·Dθ ≤ 0.

As s, v, and Dθ are arbitrary, we deduce:

(12)  q(s, v, p)·p ≤ 0  heat conduction inequality

for all s, v and all p ∈ R³. Since the mapping p ↦ q(s, v, p)·p has a maximum at p = 0, we see as well that

  q(s, v, 0) = 0,

which means there is no heat flow without a temperature gradient.

b. We call our body a viscous fluid with heat conduction if there exist five functions ê, θ̂, T̂, l, q̂ such that

(13)
  (a) e = ê(s, v)
  (b) θ = θ̂(s, v)
  (c) T = T̂(s, v) + l(s, v)[Dv]
  (d) q = q̂(s, v, Dθ).

Here for each (s, v), we assume l(s, v)[·] is a linear mapping from M³ˣ³ into S³. This term models the fluid viscosity. As before, we conclude from the Clausius–Duhem inequality that

  0 ≤ ρ (1 − (1/θ) ∂e/∂s) Ds/Dt + (1/θ) (T̂ − (∂e/∂v) I) : Dv + (1/θ) l[Dv] : Dv − (1/θ²) q·Dθ.

Since this estimate must be true for all φ and s, we deduce as before the temperature formula (7). We then pick φ, s appropriately to deduce:

  0 ≤ (T̂ − (∂e/∂v) I) : Dv + l[Dv] : Dv.

In this expression Dv is arbitrary. Replacing Dv by τDv (τ > 0) and dividing by τ, we find

  0 ≤ (T̂ − (∂e/∂v) I) : Dv + τ l[Dv] : Dv

for each τ > 0; letting τ → 0 and noting that Dv may be replaced by −Dv, we see (T̂ − (∂e/∂v)I) : Dv = 0 for all Dv, and then l[Dv] : Dv ≥ 0. If we define p by (9), the identity

(14)  T = −pI + l[Dv]

follows, as does the inequality

(15)  l(L) : L ≥ 0  dissipation inequality

for all L ∈ M³ˣ³. The heat conduction inequality (12) then results.

Remark. As explained in [C-N], constitutive relations must also satisfy the principle of material objectivity, which means that an admissible process must remain so after a (possibly time dependent) orthogonal change of frame. This principle turns out to imply that l[·] must have the form

(16)  l[Dv] = μ(Dv + (Dv)ᵀ) + λ(div v)I,

where μ, λ are scalar functions of (s, v). The dissipation inequality (15) turns out to force

  μ ≥ 0,  λ + (2/3)μ ≥ 0.

Taking μ, λ to be constant in the balance of motion equations (for b ≡ 0) gives the compressible Navier–Stokes equations

(17)  ρ Dv/Dt = −Dp + μΔv + (λ + μ)D(div v).

If the flow is incompressible, then ρ is constant (say ρ ≡ 1), and so the conservation of mass equation implies div v ≡ 0. We thereby derive the incompressible Navier–Stokes equations

(18)  Dv/Dt = −Dp + μΔv,  div v = 0.

2. Elastic materials

We can also model elastic solids, for which it is appropriate to display certain quantities in Lagrangian coordinates (i.e., in terms of X, not x).

a. We call our body a perfect elastic material with heat conduction provided there exist four functions ê, θ̂, T̂, q̂ such that

(19)
  (a) e = ê(s, F)
  (b) θ = θ̂(s, F)
  (c) T = T̂(s, F)
  (d) q = q̂(s, F, Dθ).

These are the constitutive relations.

Notation. Equation (a) means

  e(x, t) = ê(s(x, t), F(X, t))  (x ∈ U(t), t ≥ 0),

where ê : R × M³ˣ³ → R and X = ψ(x, t). Equations (b)–(d) have similar interpretations. Recall from A that F = D_X φ is the deformation gradient.

According to (3):

(20)  0 ≤ ρ Ds/Dt − (ρ/θ) De/Dt + (1/θ) T : Dv − (1/θ²) q·Dθ.

Owing to (19)(a) we have

  De/Dt = (∂e/∂s) Ds/Dt + (∂e/∂F) : D(F)/Dt,

where the material derivative of F, regarded as a spatial field F(ψ(x, t), t), must be computed. (In coordinates: (∂e/∂F)_{ij} = ∂e/∂F_{ij}, 1 ≤ i, j ≤ 3.) Differentiate the identity ψ(φ(X, t), t) = X with respect to t, to deduce Dψ/Dt = 0. So

  De/Dt = (∂e/∂s) Ds/Dt + (∂e/∂F) : ∂F/∂t.

Recalling from (9), (10) in A that Dv = L = (∂F/∂t)F⁻¹, we substitute into (20), thereby deducing

(21)  0 ≤ ρ (1 − (1/θ) ∂e/∂s) Ds/Dt + (1/θ) (T − ρ (∂e/∂F) Fᵀ) : Dv − (1/θ²) q·Dθ.

Fix t₀ > 0. We take φ so that D_X φ(·, t₀) = F(·, t₀) = F₀, an arbitrary matrix. Next we pick s so that s(·, t₀) ≡ s₀ is constant. Thus (19)(b) forces Dθ(·, t₀) = 0. As Dv = L = (∂F/∂t)F⁻¹ and Ds/Dt can take any values at t = t₀, we deduce from (21) the temperature formula (7), as well as the identity

(22)  T = ρ (∂e/∂F) Fᵀ  stress formula,

where, we recall from (11) in A, ρ = (det F)⁻¹ρ₀. Here we have again dropped the circumflex, and so are regarding T, e as functions of (s, F). Next the heat conduction inequality (12) follows as usual.

Note. Let us check that (22) reduces to (8), (9) in the case of a fluid, i.e. e = ê(s, v), v = 1/ρ = (det F)/ρ₀. Then the (i, j)-th entry of ρ (∂e/∂F) Fᵀ is

  ρ Σ_{k=1}^3 (∂e/∂F_{ik}) F_{jk} = (ρ/ρ₀) (∂e/∂v) Σ_{k=1}^3 (∂(det F)/∂F_{ik}) F_{jk}.

Now

  ∂(det F)/∂F_{ik} = (cof F)_{ik},

cof F denoting the cofactor matrix of F. Thus

  T = ρ (∂e/∂F) Fᵀ = (ρ/ρ₀) (∂e/∂v) (cof F) Fᵀ = (ρ/ρ₀) (∂e/∂v) (det F) I.

As ρ = (det F)⁻¹ρ₀, we conclude

  T = (∂e/∂v) I = −pI,

and this identity is equivalent to (8), (9).
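The cofactor identity ∂(det F)/∂F_{ik} = (cof F)_{ik} used above is easy to confirm symbolically; a quick sketch with sympy, writing cof F = (det F) F⁻ᵀ:

```python
import sympy as sp

# Fully symbolic 3x3 matrix F with independent entries F_ij.
F = sp.Matrix(3, 3, lambda i, j: sp.Symbol(f'F{i}{j}'))
d = F.det()

# Entrywise gradient of det F with respect to F:
grad = sp.Matrix(3, 3, lambda i, k: sp.diff(d, F[i, k]))

# Cofactor matrix, written as (det F) * F^{-T}:
cof = d * F.inv().T

residual = sp.simplify(grad - cof)
print(residual == sp.zeros(3, 3))   # True
```

The simplification cancels the det F factors introduced by the symbolic inverse, leaving the polynomial cofactor entries on both sides.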

b. We have a viscous elastic material with heat conduction if these constitutive relations hold:

(23)
  (a) e = ê(s, F)
  (b) θ = θ̂(s, F)
  (c) T = T̂(s, F) + l(s, F)[Dv]
  (d) q = q̂(s, F, Dθ).

We deduce as above the temperature formula (7), the heat conduction inequality (12), the dissipation inequality (15), and the identity

  T = ρ (∂e/∂F) Fᵀ + l[Dv].

Remark. Our constitutive rules (2), (13), (19), (23) actually violate the general principle of equipresence, according to which a quantity present as an independent variable in one constitutive equation should be present in all, unless its presence contradicts some rule of physics or an invariance rule. See Truesdell–Noll [T-N, p. 359–363] for an analysis incorporating this principle.

D. Workless dissipation

Our model from II.A of a homogeneous fluid body without dissipation illustrated an idealized physical situation in which work, but not dissipation, can occur. This is dissipationless work. We now provide a mathematical framework for workless dissipation (from Gurtin [GU, Chapter 14]).

We adopt the terminology from A–C above, with the additional proviso that our material body is now assumed to be rigid. So U(t) = U (for all t ≥ 0) and

(1)  v ≡ 0,  b ≡ 0.

We simplify further by supposing the mass density is constant:

(2)  ρ ≡ 1.

The remaining relevant physical quantities are thus e, q, r, s and θ. Under assumption (1) the momentum balance equation is trivial. The energy balance equation ((4) in B) now reads:

(3)  ∂e/∂t = r − div q,

and the entropy flux inequality ((6) in B) becomes:

(4)  ∂s/∂t ≥ r/θ − div(q/θ).

The local production of entropy is

(5)  γ = ∂s/∂t − r/θ + div(q/θ) = ∂s/∂t − (1/θ)(r − div q) − (q·Dθ)/θ² ≥ 0.

Combining (3)–(5) as before, we deduce:

(6)  0 ≤ θ ∂s/∂t − ∂e/∂t − (q·Dθ)/θ.

It is convenient to introduce the free energy/unit mass:

(7)  f = e − θs,

a relation reminiscent of the formula F = E − TS from Chapter I. In terms of f, (6) becomes:

(8)  ∂f/∂t + s ∂θ/∂t + (q·Dθ)/θ ≤ 0.

For our model of heat conduction in a rigid body we introduce next the constitutive relations

(9)
  (a) e = ê(θ, Dθ)
  (b) s = ŝ(θ, Dθ)
  (c) q = q̂(θ, Dθ),

where ê, ŝ, q̂ are given functions. We seek to determine what (8) implies about these structural functions. First, define

(10)  f̂(θ, p) := ê(θ, p) − θ ŝ(θ, p),

so that (7) says

  f = f̂(θ, Dθ).

Therefore

  ∂f/∂t = (∂f̂/∂θ) ∂θ/∂t + D_p f̂ · D(∂θ/∂t).

Plug into (8):

(11)  (∂f̂/∂θ + s) ∂θ/∂t + D_p f̂ · D(∂θ/∂t) + (q·Dθ)/θ ≤ 0.

As before, we can select θ so that ∂θ/∂t and D(∂θ/∂t) are arbitrary at any given point (x, t). Consequently we deduce

(12)  ∂f/∂θ = −s  free energy formula,

an analogue of the classical relation

  ∂F/∂T = −S

discussed in Chapter I. Also we conclude

(13)  D_p f̂ ≡ 0,

and so (10) implies

  D_p ê(θ, p) = θ D_p ŝ(θ, p) for all θ, p.

But (12), (13) allow us to deduce

  0 = D_p(∂f̂/∂θ) = D_p(−ŝ) = −D_p ŝ.

Hence D_p ê ≡ D_p ŝ ≡ 0, and so ê, ŝ do not depend on p. Thus (9)(a), (b) become

(14)  e = e(θ),  s = s(θ).

The energy and entropy thus depend only on θ and not Dθ. Finally we conclude from (11) that

(15)  q(θ, p)·p ≤ 0  heat conduction inequality

for all θ, p. The free energy is

  f = f(θ) = e(θ) − θ s(θ).

Finally we define the heat capacity/unit mass:

(16)  c_v(θ) := e′(θ)  (′ = d/dθ),

in analogy with the classical formula C_V = (∂E/∂T)_V.

Let us compute f′ = e′ − s − θs′; whence (12), (16) imply

  c_v = θ s′ = −θ f″.

In summary:

(17)  c_v(θ) = θ s′(θ) = −θ f″(θ).

In particular, f is a strictly concave function of θ if and only if c_v(θ) > 0.

Finally we derive from (3), (16) the general heat conduction equation

(18)  c_v(θ) ∂θ/∂t + div(q(θ, Dθ)) = r.

This is a PDE for the temperature θ = θ(x, t) (x ∈ U, t ≥ 0). The local production of entropy is

(19)  γ = −(q(θ, Dθ)·Dθ)/θ².

Remark. The special case that

(20)  q(θ, p) = −Ap

is called Fourier's Law, where A ∈ M³ˣ³, A = ((a_{ij})). Owing to the heat conduction inequality, we must then have p·(Ap) ≥ 0 for all p, and so

(21)  Σ_{i,j=1}^3 a_{ij} ξ_i ξ_j ≥ 0  (ξ ∈ R³).
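A finite-difference sketch of the heat conduction equation (18) in one space dimension, with Fourier's law q = −kθ_x (constant k > 0), c_v ≡ 1 and r ≡ 0; since s′(θ) = c_v/θ, the entropy density is s = log θ up to a constant, and with insulated ends the total entropy ∫ s dx should never decrease. All numerical parameters below are illustrative choices:

```python
import numpy as np

k = 1.0
N, L = 200, 1.0
dx = L / N
dt = 0.4 * dx * dx / k                 # explicit-scheme stability restriction
x = (np.arange(N) + 0.5) * dx
theta = 2.0 + np.cos(np.pi * x)        # positive initial temperature (illustrative)

def step(th):
    # Flux q = -k * theta_x at cell faces; q = 0 at both ends (insulated).
    q = np.zeros(N + 1)
    q[1:-1] = -k * (th[1:] - th[:-1]) / dx
    return th - dt * (q[1:] - q[:-1]) / dx   # theta_t = -div q

S = [np.sum(np.log(theta)) * dx]       # discrete total entropy
for _ in range(2000):
    theta = step(theta)
    S.append(np.sum(np.log(theta)) * dx)
S = np.array(S)

print(S[0], S[-1], np.min(np.diff(S)))
```

The explicit update is a symmetric, doubly stochastic averaging of the cell values, so by concavity of log the discrete entropy increases at every step, mirroring the Clausius–Duhem inequality.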

CHAPTER 4: Elliptic and parabolic equations

In this chapter we will analyze linear PDE of elliptic and parabolic type in light of the physical theory set forth in Chapters I–III above. We will mostly follow Day [D].

Generic PDE notation.

  U, V, W = open subsets, usually of Rⁿ
  u, v, w = typical functions
  u_t = ∂u/∂t,  u_{x_i} = ∂u/∂x_i,  u_{x_ix_j} = ∂²u/∂x_i∂x_j,  etc.

A. Entropy and elliptic equations

1. Definitions

We will first study the linear PDE

(1)  −Σ_{i,j=1}^n (a^{ij}(x) u_{x_i})_{x_j} = f in U,

where U ⊂ Rⁿ is a bounded, connected open set with smooth boundary ∂U, f : Ū → R, and A : Ū → Sⁿ, A = ((a^{ij})). The unknown is u = u(x) (x ∈ Ū), and u is smooth. We assume u > 0 in Ū.

Physical interpretation. We henceforth regard (1) as a time-independent heat conduction PDE, having the form of (18) from III.D, with

(2)
  u = temperature
  q = −ADu = heat flux
  f = heat supply/unit mass.

We will additionally assume in the heat conduction PDE that the heat capacity c_v is a constant, say c_v ≡ 1. Then, up to additive constants, we see from formula (17) in III.D that

(3)
  u = internal energy/unit mass
  log u = entropy/unit mass.

The local production of entropy is

(4)  γ = Σ_{i,j=1}^n a^{ij} u_{x_i}u_{x_j} / u².

We will henceforth assume the PDE (1) is uniformly elliptic, which means that there exists a positive constant θ such that

(5)  Σ_{i,j=1}^n a^{ij}(x) ξ_i ξ_j ≥ θ|ξ|²

for all x ∈ Ū, ξ ∈ Rⁿ. We assume as well that the coefficient functions {a^{ij}}_{i,j=1}^n are smooth. Note that (5) implies γ ≥ 0.

Notation. (i) If V is any smooth region, the outer unit normal vector field is denoted ν = (ν₁, …, ν_n). The outer A-normal field is

(6)  ν_A := Aν.

(ii) If u : V̄ → R is smooth, the A-normal derivative of u on ∂V is

(7)  ∂u/∂ν_A := Du·(Aν) = Σ_{i,j=1}^n a^{ij} ν_i u_{x_j}.

Note. According to (5), ν·ν_A > 0 on ∂V, and so ν_A is an outward pointing vector field.

Definitions. Let V ⊂ U be any smooth region. We define

(8)  F(V) = ∫_V f/u dx = entropy supply to V

(9)  G(V) = ∫_V γ dx = ∫_V Σ_{i,j=1}^n a^{ij} u_{x_i}u_{x_j}/u² dx = internal entropy production

(10)  R(V) = −∫_{∂V} (1/u) ∂u/∂ν_A dS = entropy flux outward across ∂V.

Lemma. For each subregion V ⊂ U, we have:

(11)  F(V) + G(V) = R(V).

This is the entropy balance equation.

Proof. Divide the PDE (1) by u and rewrite:

  −Σ_{i,j=1}^n (a^{ij} u_{x_i}/u)_{x_j} = Σ_{i,j=1}^n a^{ij} u_{x_i}u_{x_j}/u² + f/u.

Integrate over V and employ the Gauss–Green Theorem:

  −∫_{∂V} (1/u) Σ_{i,j=1}^n a^{ij} u_{x_i} ν_j dS = ∫_V Σ_{i,j=1}^n a^{ij} u_{x_i}u_{x_j}/u² dx + ∫_V f/u dx;

the term on the left is R(V), and the two terms on the right are G(V) and F(V). □
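The entropy balance (11) is easy to check numerically in one dimension (n = 1, a¹¹ ≡ 1), where the PDE is −u″ = f; the particular solution u(x) = 2 + x(1 − x) with f ≡ 2 and the subinterval V = (0.2, 0.7) are illustrative choices:

```python
import numpy as np

a, b = 0.2, 0.7                      # V = (a, b), a subinterval of U = (0, 1)
x = np.linspace(a, b, 20001)
u = 2.0 + x * (1.0 - x)              # positive solution of -u'' = f with f = 2
du = 1.0 - 2.0 * x                   # u'
f = 2.0

def trapz(y, xs):
    # simple trapezoid rule (avoids NumPy-version issues with np.trapz)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(xs)) * 0.5)

F = trapz(f / u, x)                  # entropy supply F(V)
G = trapz(du**2 / u**2, x)           # internal entropy production G(V)

# R(V) = -[(1/u) du/dnu_A] over the two boundary points: the outer normal
# is +1 at x = b and -1 at x = a, so:
R = -(du[-1] / u[-1] - du[0] / u[0])

print(F + G, R)                      # the two numbers agree
```

In one dimension the lemma reduces to the identity (u′/u)′ = u″/u − (u′/u)² integrated over (a, b), which the quadrature reproduces to high accuracy.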

We henceforth assume

(12)  f ≥ 0 in U,

meaning physically that only heating is occurring.

2. Estimates of equilibrium entropy production

The PDE (1) implies for each region V ⊂ U that

(13)  ∫_{∂V} q·ν dS = ∫_V f dx

for q = −ADu. This equality says that the heat flux outward through ∂V balances the heat production within V. Likewise the identity (11) says

(14)  ∫_{∂V} (q·ν)/u dS = ∫_V γ dx + ∫_V f/u dx,

which means the entropy flux outward through ∂V balances the entropy production within V, the latter consisting of ∫_V γ dx, the rate of internal entropy generation, and the entropy supply ∫_V f/u dx.

a. A capacity estimate

Now (13) shows clearly that the heat flux outward through ∂U can be made arbitrarily large if we take f large enough. In contrast, if V ⊂⊂ U, there is a bound on the size of the entropy flux and the rate of internal entropy production, valid no matter how large f is.

Assume that V ⊂⊂ U and let w solve the boundary value problem

(15)
  −Σ_{i,j=1}^n (a^{ij}(x) w_{x_i})_{x_j} = 0 in U − V̄,
  w = 1 on ∂V,
  w = 0 on ∂U.

Definition. We define the capacity of V (with respect to U and the matrix A) to be

(16)  Cap_A(V, U) := ∫_{U−V̄} Σ_{i,j=1}^n a^{ij} w_{x_i}w_{x_j} dx,

w solving (15). Integrating by parts shows:

(17)  Cap_A(V, U) = −∫_{∂V} ∂w/∂ν_A dS,

ν_A denoting the outer A-normal vector field to ∂V.

Theorem 1. We have

(18)  R(V) ≤ Cap_A(V, U)

for all choices of f ≥ 0.

Proof. Take w as in (15) and compute in U − V̄:

  Σ_{i,j=1}^n ((w²/u) a^{ij} u_{x_i})_{x_j}
    = (w²/u) Σ_{i,j=1}^n (a^{ij} u_{x_i})_{x_j} − (w²/u²) Σ_{i,j=1}^n a^{ij} u_{x_i}u_{x_j} + (2w/u) Σ_{i,j=1}^n a^{ij} u_{x_i}w_{x_j}
    = −(w²/u) f − Σ_{i,j=1}^n a^{ij} ((w/u)u_{x_i} − w_{x_i})((w/u)u_{x_j} − w_{x_j}) + Σ_{i,j=1}^n a^{ij} w_{x_i}w_{x_j},

where we employed the PDE (1). Since f ≥ 0 and the ellipticity condition (5) holds, we deduce

  Σ_{i,j=1}^n ((w²/u) a^{ij} u_{x_i})_{x_j} ≤ Σ_{i,j=1}^n a^{ij} w_{x_i}w_{x_j} in U − V̄.

Integrate over U − V̄:

  −∫_{∂V} (w²/u) ∂u/∂ν_A dS ≤ ∫_{U−V̄} Σ_{i,j=1}^n a^{ij} w_{x_i}w_{x_j} dx.

Since w ≡ 1 on ∂V, the term on the left is R(V), whereas by the definition (16) the term on the right is Cap_A(V, U). □

Note that the entropy balance equation and (18) imply:

(19)  G(V) ≤ Cap_A(V, U)

for all f ≥ 0, the term on the right depending solely on A and the geometry of V, U. The above calculation is from Day [D].

b. A pointwise bound

Next we demonstrate that in fact we have an internal pointwise bound on γ within any region V ⊂⊂ U, provided

(20)  f ≡ 0 in U.

Theorem 2. Assume f ≡ 0 in U and V ⊂⊂ U. Then there exists a constant C, depending only on dist(V, ∂U) and the coefficients, such that

(21)  sup_V γ ≤ C

for all positive solutions u of (1).

In physical terms, if the heat supply is zero, we can estimate the local production of entropy pointwise within V, completely irrespective of the boundary conditions for the temperature on ∂U.

The following calculation is technically difficult, but — as we will see later in VIII.D — is important in other contexts as well.

Proof. 1. Let

(22)  v = log u

denote the entropy. Then the PDE

  −Σ_{i,j=1}^n (a^{ij} u_{x_i})_{x_j} = 0 in U

becomes

(23)  −Σ_{i,j=1}^n (a^{ij} v_{x_i})_{x_j} = Σ_{i,j=1}^n a^{ij} v_{x_i}v_{x_j} = γ.

So

(24)  −Σ_{i,j=1}^n a^{ij} v_{x_ix_j} + Σ_{i=1}^n b^i v_{x_i} = γ in U

for

  b^i := −Σ_{j=1}^n a^{ij}_{x_j}  (1 ≤ i ≤ n).

Differentiate (24) with respect to x_k:

(25)  −Σ_{i,j=1}^n a^{ij} v_{x_kx_ix_j} + Σ_{i=1}^n b^i v_{x_kx_i} = γ_{x_k} + R₁,

where R₁ denotes an expression satisfying the estimate

(26)  |R₁| ≤ C(|D²v| + |Dv|).

2. Now γ = Σ_{k,l=1}^n a^{kl} v_{x_k}v_{x_l}. Thus

(27)
  γ_{x_i} = Σ_{k,l=1}^n 2a^{kl} v_{x_kx_i}v_{x_l} + a^{kl}_{x_i} v_{x_k}v_{x_l},
  γ_{x_ix_j} = Σ_{k,l=1}^n 2a^{kl} v_{x_kx_ix_j}v_{x_l} + 2a^{kl} v_{x_kx_i}v_{x_lx_j} + R₂,

where

(28)  |R₂| ≤ C(|D²v||Dv| + |Dv|²).

Hence

(29)  −Σ_{i,j=1}^n a^{ij} γ_{x_ix_j} + Σ_{i=1}^n b^i γ_{x_i}
    = 2 Σ_{k,l=1}^n a^{kl} v_{x_l} (−Σ_{i,j=1}^n a^{ij} v_{x_kx_ix_j} + Σ_{i=1}^n b^i v_{x_kx_i}) − 2 Σ_{i,j,k,l=1}^n a^{ij}a^{kl} v_{x_kx_i}v_{x_lx_j} + R₃,

R₃ another remainder term satisfying an estimate like (28). In view of the uniform ellipticity condition (5), we have

  Σ_{i,j,k,l=1}^n a^{ij}a^{kl} v_{x_kx_i}v_{x_lx_j} ≥ θ²|D²v|².

This estimate and the equality (25) allow us to deduce from (29) that

(30)  −Σ_{i,j=1}^n a^{ij} γ_{x_ix_j} + Σ_{i=1}^n b^i γ_{x_i} ≤ 2 Σ_{k,l=1}^n a^{kl} v_{x_l} γ_{x_k} − 2θ²|D²v|² + R₄,

R₄ satisfying an estimate like (28):

  |R₄| ≤ C(|D²v||Dv| + |Dv|²).

Recall now Cauchy's inequality with ε:

  ab ≤ εa² + b²/(4ε)  (a, b, ε > 0),

and further note |Dv|² ≤ γ/θ. Thus

  |R₄| ≤ θ²|D²v|² + Cγ.

Consequently (30) implies

(31)  θ²|D²v|² − Σ_{i,j=1}^n a^{ij} γ_{x_ix_j} ≤ C(1 + γ^{1/2})|Dγ| + Cγ.

Next observe that the PDE (24) implies

  γ ≤ C(|D²v| + |Dv|) ≤ C(|D²v| + γ^{1/2}) ≤ C|D²v| + C + γ/2,

where we again utilized Cauchy's inequality. Thus

(32)  γ ≤ C(|D²v| + 1).

This estimate incorporated into (31) yields:

(33)  εγ² − Σ_{i,j=1}^n a^{ij} γ_{x_ix_j} ≤ C(1 + γ^{1/2})|Dγ| + C

for some ε > 0.

3. We have managed to show that γ satisfies within U the differential inequality (33), where the positive constants C, ε depend only on the coefficients. Now we demonstrate that this estimate, owing to the quadratic term on the left, implies a bound from above for γ on any subregion V ⊂⊂ U.

So take V ⊂⊂ U and select a cutoff function ζ : Ū → R satisfying

(34)  0 ≤ ζ ≤ 1,  ζ ≡ 1 on V,  ζ ≡ 0 near ∂U.

Write

(35)  η := ζ⁴γ

and compute

(36)
  η_{x_i} = ζ⁴γ_{x_i} + 4ζ³ζ_{x_i}γ,
  η_{x_ix_j} = ζ⁴γ_{x_ix_j} + 4ζ³(ζ_{x_j}γ_{x_i} + ζ_{x_i}γ_{x_j}) + 4(ζ³ζ_{x_i})_{x_j}γ.

Select a point x₀ ∈ Ū where η attains its maximum. If η(x₀) = 0, then η ≡ 0. Otherwise η(x₀) > 0, x₀ ∈ U, and so

  Dη(x₀) = 0,  D²η(x₀) ≤ 0.

Consequently

(37)  ζ⁴Dγ = −4ζ³γDζ at x₀.

So at the point x₀:

  0 ≤ −Σ_{i,j=1}^n a^{ij} η_{x_ix_j} = −ζ⁴ Σ_{i,j=1}^n a^{ij} γ_{x_ix_j} + R₅,

where

(38)  |R₅| ≤ C(ζ³|Dγ| + ζ²γ).

Invoking (33) we compute

(39)  εζ⁴γ² ≤ Cζ⁴(1 + γ^{1/2})|Dγ| + C + R₆,

R₆ being estimated as in (38). Now (37) implies

  ζ⁴γ^{1/2}|Dγ| ≤ 4ζ³γ^{3/2}|Dζ| ≤ (ε/4)ζ⁴γ² + C,

where we employed Young's inequality with ε:

  ab ≤ εa^p + C(ε)b^q  (1/p + 1/q = 1, a, b, ε > 0)

for p = 4/3, q = 4. Also

  |R₆| ≤ C(ζ³|Dγ| + ζ²γ) ≤ (ε/4)ζ⁴γ² + C.

These estimates and (39) imply

(40)  (ε/2)ζ⁴γ² ≤ C at x₀,

the constants C, ε depending only on ζ and the coefficients of the PDE. As η = ζ⁴γ attains its maximum over Ū at x₀, (40) provides a bound on η. In particular then, since ζ ≡ 1 on V, estimate (21) follows. □

3. Harnack inequality

As earlier noted, the pointwise estimate (21) is quite significant from several viewpoints. As a first illustration we note that (21) implies

Theorem 3. For each connected region V ⊂⊂ U there exists a constant C, depending only on U, V, and the coefficients, such that

(41)  sup_V u ≤ C inf_V u

for each nonnegative solution u of

(42)  −Σ_{i,j=1}^n (a^{ij} u_{x_i})_{x_j} = 0 in U.

Remark. Estimate (41) is Harnack's inequality and is important since it is completely independent of the boundary values of u on ∂U.

Proof. Take V ⊂⊂ W ⊂⊂ U and r > 0 so small that B(x, r) ⊂ W for each x ∈ V. Let ε > 0. Since u_ε := u + ε > 0 solves (42), Theorem 2 implies

(43)  sup_W |Du|/(u + ε) ≤ C

for some constant C depending only on W, U, etc. Take any points y, z ∈ B(x, r) ⊂ W. Then

  |log(u(y) + ε) − log(u(z) + ε)| ≤ sup_{B(x,r)} |D log(u + ε)| |y − z| ≤ 2Cr =: C₁,

owing to (43). So

  u(y) + ε ≤ C₂(u(z) + ε)

for C₂ := e^{C₁}. Let ε → 0 to deduce:

(44)  max_{B(x,r)} u ≤ C₂ min_{B(x,r)} u.

As V is connected, we can cover V̄ by finitely many overlapping balls {B(x_i, r)}_{i=1}^N. We repeatedly apply (44), to deduce (41), with C := C₂^N. □

Corollary (Strong Maximum Principle). Assume u ∈ C²(Ū) solves the PDE (42), and U is bounded, connected. Then either

(45)  min_{∂U} u < u(x) < max_{∂U} u  (x ∈ U)

or else

(46)  u is constant on Ū.

Proof. 1. Take M := max_{∂U} u, ũ := M − u. Then

(47)  −Σ_{i,j=1}^n (a^{ij} ũ_{x_i})_{x_j} = 0 in U,  ũ ≥ 0 on ∂U.

Multiply the PDE by ũ⁻ := −min(ũ, 0) and integrate by parts:

(48)  θ ∫_{U∩{ũ<0}} |Dũ|² dx ≤ −Σ_{i,j=1}^n ∫_U a^{ij} ũ_{x_i} ũ⁻_{x_j} dx = 0.

Here we used the fact that

  Dũ⁻ = 0 a.e. on {ũ ≥ 0},  Dũ⁻ = −Dũ a.e. on {ũ < 0}.

Then (48) implies Dũ⁻ = 0 a.e. in U. As ũ⁻ = 0 on ∂U, we deduce ũ⁻ ≡ 0 in U, and so

(49)  ũ ≥ 0 in U.

This is a form of the weak maximum principle. 2. Next take any connected V ⊂⊂ U. Harnack's inequality implies

  sup_V ũ ≤ C inf_V ũ.

Thus either ũ > 0 everywhere on V or else ũ ≡ 0 on V. This conclusion is true for each V as above, and a similar argument applies to u − min_{∂U} u: the dichotomy (45), (46) follows. □

Remark. We have deduced Harnack's inequality, and thus the strong maximum principle, from our interior pointwise bound (21) on the local production of entropy. An interesting philosophical question arises: Are the foregoing PDE computations really entropy calculations from physics? Purely mathematically, the point is that the change of variables v = log u converts the linear PDE

  −Σ_{i,j=1}^n (a^{ij} u_{x_i})_{x_j} = 0 in U

into the nonlinear PDE

  −Σ_{i,j=1}^n (a^{ij} v_{x_i})_{x_j} = Σ_{i,j=1}^n a^{ij} v_{x_i}v_{x_j} in U,

which, owing to the estimate

  Σ_{i,j=1}^n a^{ij} v_{x_i}v_{x_j} ≥ θ|Dv|²,

admits better than linear interior estimates. Is all this just a mathematical accident (since log is an important elementary function), or is it an instance of basic physical principles (since entropy is a fundamental thermodynamic concept)? We urgently need a lavishly funded federal research initiative to answer this question.

B. Entropy and parabolic equations

1. Definitions

We turn our attention now to the time-dependent PDE
(1)  u_t − Σ_{i,j=1}^n (a^{ij} u_{x_i})_{x_j} = f in U_T,

where U is as before and

  U_T := U × (0, T]

for some T > 0. We are given

  f : Ū_T → R and A : Ū → Sⁿ, A = ((a^{ij})),

and the unknown is

  u = u(x, t)  (x ∈ Ū, 0 ≤ t ≤ T).

We always suppose u > 0.

Physical interpretation. We henceforth think of (1) as a heat conduction PDE, having the form of (18) from III.D, with

(2)
  u = temperature
  q = −ADu = heat flux
  f = heat supply/unit mass,

and the heat capacity is taken to be c_v ≡ 1. Also, up to additive constants, we have

(3)
  u = internal energy/unit mass
  log u = entropy/unit mass.

The local production of entropy is

  γ = Σ_{i,j=1}^n a^{ij} u_{x_i}u_{x_j}/u².

In the special case that A = I, f ≡ 0, our PDE (1) reads

(4)  u_t − Δu = 0 in U_T.

This is the heat equation.

Definitions. Let t ≥ 0 and let V ⊂ U be any smooth subregion. We define

(5)  S(t, V) = ∫_V log u(·, t) dx = entropy within V at time t

(6)  F(t, V) = ∫_V f(·, t)/u(·, t) dx = entropy supply to V at time t

(7)  G(t, V) = ∫_V γ(·, t) dx = ∫_V Σ_{i,j=1}^n a^{ij} u_{x_i}(·, t)u_{x_j}(·, t)/u(·, t)² dx
      = rate of internal entropy generation in V at time t

(8)  R(t, V) = −∫_{∂V} (1/u(·, t)) ∂u(·, t)/∂ν_A dS = entropy flux outward across ∂V at time t.

Lemma. For each t ≥ 0 and each subregion V ⊂ U we have

(9)  d/dt S(t, V) = F(t, V) + G(t, V) − R(t, V).

This is the entropy production equation.

Proof. Divide the PDE (1) by u and rewrite:

  u_t/u − Σ_{i,j=1}^n (a^{ij} u_{x_i}/u)_{x_j} = Σ_{i,j=1}^n a^{ij} u_{x_i}u_{x_j}/u² + f/u.

Integrate over V:

  d/dt S(t, V) = ∫_V u_t/u dx = ∫_{∂V} (1/u) ∂u/∂ν_A dS + ∫_V γ dx + ∫_V f/u dx
    = −R(t, V) + G(t, V) + F(t, V). □

2. Evolution of entropy

In this section we suppose that

(10)  f ≡ 0 in U × (0, ∞)

and also

(11)  ∂u/∂ν_A = 0 on ∂U × [0, ∞).

The boundary condition (11) means there is no heat flux across ∂U: the boundary is insulated.

a. Entropy increase

Define

  S(t) := ∫_U log u(·, t) dx  (t ≥ 0).

Theorem 1. Assume u > 0 solves (1) and conditions (10), (11) hold. Then

(12)  dS/dt ≥ 0 on [0, ∞).

The proof is trivial: take V = U in the entropy production equation (9) and note that (11) implies R(t, U) = 0.

Remarks. (i) Estimate (12) is of course consistent with the physical principle that entropy cannot decrease in any isolated system. But in fact the same proof shows that

  t ↦ ∫_U Φ(u(·, t)) dx

is nondecreasing, if Φ : (0, ∞) → R is any smooth function with Φ′ ≥ 0, Φ″ ≤ 0. Indeed

(13)  d/dt ∫_U Φ(u) dx = ∫_U Φ′(u) u_t dx
    = ∫_U Φ′(u) (Σ_{i,j=1}^n (a^{ij} u_{x_i})_{x_j} + f) dx
    = −∫_U Φ″(u) Σ_{i,j=1}^n a^{ij} u_{x_i}u_{x_j} dx + ∫_U Φ′(u) f dx
    ≥ 0,

the boundary term vanishing by (11). If f ≡ 0, the same conclusion holds if only Φ″ ≤ 0, i.e. if Φ is concave. So the entropy growth inequality (12) is just the special case Φ(z) = log z of a general concavity argument. Is there anything particularly special about the physical case Φ(z) = log z?

(ii) There is a partial answer, which makes sense physically if we change our interpretation. For simplicity now take a^{ij} = δ^{ij} and regard the PDE

(14)  u_t − Δu = 0 in U × (0, ∞)

as a diffusion equation. So now u = u(x, t) > 0 represents the density of some physical quantity (e.g. a chemical concentration) diffusing within U as time evolves. If V ⊂⊂ U, we hypothesize that

(15)  d/dt ∫_V u(·, t) dx = ∫_{∂V} ∂u/∂ν dS,

which says that the rate of change of the total quantity within V equals the inward flux through ∂V normal to the surface. The identity (15), holding for all t ≥ 0 and all V ⊂⊂ U, implies u solves the diffusion equation (14).

Next think of the physical quantity (whose density is u) as being passively transported by a flow with velocity field v = v(x, t). As in III.A, we have

  0 = d/dt ∫_{V(t)} u dx = ∫_{V(t)} u_t + div(uv) dx,

V(t) denoting a region moving with the flow. Then

(16)  u_t + div(uv) = 0.

Equations (14), (16) are compatible if

(17)  v = −Du/u = −Ds,

s = log u denoting the entropy density. So we can perhaps imagine the density as moving microscopically with velocity equaling the negative of the gradient of the entropy density. Locally such a motion should tend to decrease s, but globally S(t) = ∫_U s(·, t) dx increases.

b. Second derivatives in time

Since dS/dt ≥ 0 and S is bounded, it seems reasonable to imagine the graph of t ↦ S(t) this way:

[Figure: a bounded, increasing, concave-looking graph of S(t) against t.]

We accordingly might conjecture that t ↦ S(t) is concave and so d²S/dt² ≤ 0 on [0, ∞). This is true in dimension n = 1, but false in general: Day in [D, Chapter 5] constructs an example where n = 2, U is a square and d²S/dt²(0) > 0.

We turn our attention therefore to a related physical quantity for which a concavity in time assertion is true. Let us write

(18)  H(t) := ∫_U u(·, t) log u(·, t) dx.

Remark. Owing to (3) we should be inclined to think of

  ∫_U u(·, t) − u(·, t) log u(·, t) dx

as representing the free energy at time t. This is not so good however, as Φ(z) = z log z − z is convex, and so if f ≡ 0, t ↦ ∫_U u(·, t) − u(·, t) log u(·, t) dx is nondecreasing. This conclusion is physically wrong: the free (a.k.a. available) energy should be diminishing during an irreversible process. It is worth considering expressions of the form (18) however, as they often appear in the PDE literature under the name "entropy": see e.g. Chow [CB], Hamilton [H], and see also V.A following.

Theorem 2. Assume u > 0 solves the heat equation (14), with ∂u/∂ν = 0 on ∂U × [0, ∞).

(i) We have

(19)  dH/dt ≤ 0 on [0, ∞).

(ii) If U is convex, then

(20)  d²H/dt² ≥ 0 on [0, ∞).

We will need this

Lemma. Let U ⊂ Rⁿ be smooth, bounded, convex. Suppose u ∈ C²(Ū) satisfies

(21)  ∂u/∂ν = 0 on ∂U.

Then

(22)  ∂(|Du|²)/∂ν ≤ 0 on ∂U.

Proof. 1. Fix any point x₀ ∈ ∂U. We may without loss assume that x₀ = 0 and that near 0, ∂U is the graph in the e_n-direction of a convex function γ : Rⁿ⁻¹ → R, with

(23)  Dγ(0) = 0.

Thus a typical point x ∈ ∂U near 0 is written as (x′, γ(x′)) for x′ ∈ Rⁿ⁻¹, x′ near 0. Let h denote any smooth function which vanishes on ∂U. Then

  h(x′, γ(x′)) ≡ 0 near 0.

Thus if i ∈ {1, …, n − 1},

  h_{x_i} + h_{x_n} γ_{x_i} = 0 near 0,

and consequently

(24)  h_{x_i}(0) = 0  (i = 1, …, n − 1).

2. Set h = ∂u/∂ν = Σ_{j=1}^n u_{x_j} ν^j, where ν = (ν¹, …, νⁿ). Then (21), (24) imply

(25)  Σ_{j=1}^n u_{x_ix_j} ν^j + u_{x_j} ν^j_{x_i} = 0  (i = 1, …, n − 1)

at 0. Now

  ∂(|Du|²)/∂ν = 2 Σ_{i,j=1}^n u_{x_i} u_{x_ix_j} ν^j,

and consequently (25) says

(26)  ∂(|Du|²)/∂ν = −2 Σ_{i,j=1}^n u_{x_i} u_{x_j} ν^j_{x_i} at 0.

3. But since ∂U is the graph of γ near 0, we have

  ν^j = γ_{x_j}/(1 + |Dγ|²)^{1/2}  (j = 1, …, n − 1),  ν^n = −1/(1 + |Dγ|²)^{1/2}.

So for 1 ≤ i, j ≤ n − 1:

  ν^j_{x_i} = γ_{x_ix_j}/(1 + |Dγ|²)^{1/2} − Σ_{k=1}^{n−1} γ_{x_j}γ_{x_k}γ_{x_kx_i}/(1 + |Dγ|²)^{3/2}.

As (21) implies u_{x_n}(0) = 0, we conclude from (23), (26) that

  ∂(|Du|²)/∂ν = −2 Σ_{i,j=1}^{n−1} u_{x_i} u_{x_j} γ_{x_ix_j} ≤ 0 at 0,

since γ is a convex function. □

Proof of Theorem 2. 1. In light of (18),
$$\frac{dH}{dt} = \int_U u_t\log u + u_t\,dx = \int_U \Delta u\,\log u + \Delta u\,dx = -\int_U \frac{|Du|^2}{u}\,dx \le 0,$$

where we used the no-flux boundary condition $\frac{\partial u}{\partial \nu} = 0$ on $\partial U$.

2. Suppose now $U$ is convex. Then

$$\frac{d^2H}{dt^2} = -\int_U \frac{2\,Du\cdot Du_t}{u} - \frac{|Du|^2}{u^2}\,u_t\,dx = -\int_U \frac{2}{u}\sum_{i,j=1}^n u_{x_i}u_{x_i x_j x_j} - \frac{|Du|^2}{u^2}\sum_{j=1}^n u_{x_j x_j}\,dx,$$

since $u_t = \Delta u$. Integrate by parts:

$$\frac{d^2H}{dt^2} = \int_U 2\sum_{i,j=1}^n\frac{u_{x_i x_j}^2}{u} - 4\sum_{i,j=1}^n\frac{u_{x_i}u_{x_j}u_{x_i x_j}}{u^2} + 2\frac{|Du|^4}{u^3}\,dx - \int_{\partial U}\frac{2}{u}\sum_{i,j=1}^n u_{x_i}u_{x_i x_j}\nu^j\,dS.$$

The boundary term is

$$-\int_{\partial U}\frac{1}{u}\,\frac{\partial}{\partial \nu}|Du|^2\,dS \ge 0,$$

according to the Lemma. Consequently, since $\big|\sum_{i,j} u_{x_i}u_{x_j}u_{x_i x_j}\big| \le |Du|^2|D^2u|$:

$$\frac{d^2H}{dt^2} \ge \int_U 2\frac{|D^2u|^2}{u} - 4\frac{|Du|^2|D^2u|}{u^2} + 2\frac{|Du|^4}{u^3}\,dx \ge 0,$$

owing to the inequality

$$\frac{|Du|^2|D^2u|}{u^2} = \frac{|Du|^2}{u^{3/2}}\cdot\frac{|D^2u|}{u^{1/2}} \le \frac{1}{2}\frac{|Du|^4}{u^3} + \frac{1}{2}\frac{|D^2u|^2}{u}. \qquad\square$$
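Both conclusions of Theorem 2 can be seen numerically. The sketch below (my addition, not part of the notes; the grid sizes and initial data are arbitrary choices) solves the heat equation on the convex domain $[0,1]$ with no-flux boundary conditions by an explicit finite-difference scheme and tracks the discrete $H(t)$, which should be nonincreasing by (19) and convex in $t$ by (20).

```python
import numpy as np

# Solve u_t = u_xx on [0,1] with du/dx = 0 at both ends (explicit scheme),
# tracking H(t) = \int u log u dx.  Theorem 2 gives dH/dt <= 0, and since
# [0,1] is convex, d^2H/dt^2 >= 0.
N = 100
dx = 1.0 / N
x = (np.arange(N) + 0.5) * dx
u = 1.0 + 0.5 * np.cos(np.pi * x)      # positive data with zero flux at ends
dt = 0.4 * dx**2                       # stable explicit time step
H = []
for _ in range(3000):
    H.append(dx * np.sum(u * np.log(u)))
    # reflecting ghost cells enforce the no-flux condition
    uext = np.concatenate(([u[0]], u, [u[-1]]))
    u = u + dt * (uext[2:] - 2.0 * u + uext[:-2]) / dx**2
H = np.array(H)
assert np.all(np.diff(H) <= 1e-10)       # H nonincreasing, cf. (19)
assert np.all(np.diff(H, 2) >= -1e-10)   # H convex in time, cf. (20)
```

Day's counterexample shows the analogous second-difference test for $S(t) = -H(t)$ can fail on a square in two dimensions, but in one dimension both discrete monotonicity and convexity hold down to rounding error.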

c. A differential form of Harnack's inequality

Again we consider positive solutions $u$ of the heat equation

(27) $u_t - \Delta u = 0$ in $U \times (0,\infty),$

with

(28) $\frac{\partial u}{\partial \nu} = 0$ on $\partial U \times [0,\infty).$

We assume $U$ is convex.

Lemma. We have

(29) $\frac{|Du|^2}{u^2} - \frac{u_t}{u} \le \frac{n}{2t}.$

This estimate, derived in a different context by Li-Yau, is a time-dependent version of the pointwise estimate on $\gamma$ from A.2. Note that we can rewrite (29) to obtain the pointwise estimate

$$s_t + \frac{n}{2t} \ge \gamma,$$

where as usual $s = \log u$ is the entropy density and $\gamma = \frac{|Du|^2}{u^2}$ is the local rate of entropy production.

Proof. 1. Write $v = \log u$, so that (27) becomes

(30) $v_t - \Delta v = |Dv|^2.$

Set

(31) $w = \Delta v.$

Then (30) implies

(32) $w_t - \Delta w = \Delta(|Dv|^2) = 2|D^2v|^2 + 2Dv\cdot Dw.$

But $w^2 = (\Delta v)^2 \le n|D^2v|^2$, and so (32) implies

(33) $w_t - \Delta w \ge 2Dv\cdot Dw + \frac{2}{n}w^2.$

2. We will exploit the good term on the right-hand side of (33). Set

(34) $\tilde w = tw + \frac{n}{2}.$

Then

(35) $\tilde w_t - \Delta\tilde w - 2Dv\cdot D\tilde w = w + t(w_t - \Delta w - 2Dv\cdot Dw) \ge w + \frac{2t}{n}w^2.$

Now (34) says

$$w = \frac{1}{t}\Big(\tilde w - \frac{n}{2}\Big), \quad\text{and so}\quad w^2 = \frac{1}{t^2}\Big(\tilde w^2 - n\tilde w + \frac{n^2}{4}\Big).$$

Thus (35) implies

(36) $\tilde w_t - \Delta\tilde w - 2Dv\cdot D\tilde w \ge \frac{2}{nt}\tilde w^2 - \frac{\tilde w}{t}.$

3. Now

$$\frac{\partial\tilde w}{\partial \nu} = t\frac{\partial w}{\partial \nu} = t\frac{\partial}{\partial \nu}(v_t - |Dv|^2) \text{ on } \partial U \times [0,\infty),$$

owing to (30), (31). Since $\frac{\partial u}{\partial \nu} = 0$ on $\partial U \times [0,\infty)$,

$$\frac{\partial v}{\partial \nu} = \frac{1}{u}\frac{\partial u}{\partial \nu} = 0 \text{ on } \partial U \times [0,\infty),$$

and hence $\frac{\partial}{\partial \nu}(v_t) = \big(\frac{\partial v}{\partial \nu}\big)_t = 0$. Also, the Lemma in B gives $\frac{\partial}{\partial \nu}|Dv|^2 \le 0$ on $\partial U \times [0,\infty)$. Thus

(37) $\frac{\partial\tilde w}{\partial \nu} \ge 0$ on $\partial U \times [0,\infty).$

4. Fix $\varepsilon > 0$ so small that

(38) $\tilde w = tw + \frac{n}{2} > 0$ on $\bar U \times \{t = \varepsilon\}.$

Then (36)-(38) and the maximum principle for parabolic PDE imply $\tilde w \ge 0$ in $U \times [\varepsilon,\infty)$. This is true for all small $\varepsilon > 0$, and so

$$tw + \frac{n}{2} \ge 0 \text{ in } U \times (0,\infty).$$

But

$$w = \Delta v = v_t - |Dv|^2 = \frac{u_t}{u} - \frac{|Du|^2}{u^2}.$$

Estimate (29) follows. $\square$
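The Li-Yau estimate (29) is sharp: the fundamental solution of the heat equation on all of $\mathbb{R}^n$ (no boundary, so (28) is vacuous) attains equality. This is easy to confirm with a computer algebra system; the following check is my addition, not part of the notes.

```python
import sympy as sp

# The heat kernel u = (4*pi*t)^(-n/2) * exp(-|x|^2/(4t)) on R^3 solves the
# heat equation and saturates the Li-Yau bound (29):
#     |Du|^2/u^2 - u_t/u = n/(2t)   exactly.
x1, x2, x3, t = sp.symbols('x1 x2 x3 t', positive=True)
n = 3
r2 = x1**2 + x2**2 + x3**2
u = (4 * sp.pi * t) ** sp.Rational(-n, 2) * sp.exp(-r2 / (4 * t))

heat_residual = u.diff(t) - (u.diff(x1, 2) + u.diff(x2, 2) + u.diff(x3, 2))
assert sp.simplify(heat_residual) == 0                  # u_t = Laplacian(u)

lhs = (u.diff(x1)**2 + u.diff(x2)**2 + u.diff(x3)**2) / u**2 - u.diff(t) / u
assert sp.simplify(lhs - sp.Rational(n, 2) / t) == 0    # equality in (29)
```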

The following form of Harnack's inequality is a consequence.

Corollary. For $x_1, x_2 \in \bar U$, $0 < t_1 < t_2$, we have

(39) $u(x_1,t_1) \le \Big(\frac{t_2}{t_1}\Big)^{n/2} e^{\frac{|x_2-x_1|^2}{4(t_2-t_1)}}\, u(x_2,t_2).$

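Before the proof, a quick numerical illustration (my addition): the explicit heat kernel on $\mathbb{R}^3$ satisfies (39) at randomly chosen points and times, with equality along the ray $x_1 = x_2 = 0$.

```python
import numpy as np

# Check Harnack's inequality (39) for the heat kernel on R^3 at random
# points.  (A sketch: the Corollary is stated on a bounded convex U, but
# the kernel on R^n is the model case, where Li-Yau holds with equality.)
def kernel(x, t, n=3):
    return (4 * np.pi * t) ** (-n / 2) * np.exp(-(x @ x) / (4 * t))

rng = np.random.default_rng(0)
for _ in range(1000):
    x1, x2 = rng.normal(size=3), rng.normal(size=3)
    t1 = rng.uniform(0.1, 1.0)
    t2 = t1 + rng.uniform(0.1, 1.0)
    d = x2 - x1
    rhs = (t2 / t1) ** 1.5 * np.exp((d @ d) / (4 * (t2 - t1))) * kernel(x2, t2)
    assert kernel(x1, t1) <= rhs * (1 + 1e-9)   # inequality (39)
```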
Proof. As before, take $v = \log u$. Then

(40)
$$v(x_2,t_2) - v(x_1,t_1) = \int_0^1 \frac{d}{ds}\,v(sx_2 + (1-s)x_1,\ st_2 + (1-s)t_1)\,ds$$
$$= \int_0^1 Dv\cdot(x_2 - x_1) + v_t(t_2 - t_1)\,ds$$
$$\ge \int_0^1 -|Dv|\,|x_2 - x_1| + \Big(|Dv|^2 - \frac{n}{2(st_2 + (1-s)t_1)}\Big)(t_2 - t_1)\,ds$$
$$\ge -\frac{n}{2}\log\frac{t_2}{t_1} - \frac{|x_2 - x_1|^2}{4(t_2 - t_1)}.$$

Here we used the Lemma to bound $v_t$ from below, the inequality

$$ab \le \epsilon a^2 + \frac{1}{4\epsilon}b^2 \qquad (\epsilon = t_2 - t_1,\ a = |Dv|,\ b = |x_2 - x_1|)$$

and the identity

$$\int_0^1 \frac{t_2 - t_1}{t_1 + s(t_2 - t_1)}\,ds = \log\frac{t_2}{t_1}.$$
Exponentiate both sides of (40) to obtain (39). $\square$

3. Clausius inequality

Day [D, Chapters 3, 4] has noted that the heat equation (and related second-order parabolic PDE) admit various estimates which are reminiscent of the classical Clausius inequality from Chapter II. We recount in this section some simple cases of his calculations. We hereafter assume $u > 0$ is a smooth solution of

(1) $u_t - \sum_{i,j=1}^n (a^{ij}u_{x_i})_{x_j} = 0$ in $U \times (0,\infty),$

subject now to the prescribed temperature condition

(2) $u(\cdot,t) = \tau(t)$ on $\partial U,$

where $\tau: [0,\infty) \to (0,\infty)$ is a given, smooth function.

a. Cycles

Let us assume that $T > 0$ and $\tau$ is $T$-periodic:

(3) $\tau(t + T) = \tau(t)$ for all $t \ge 0.$

We call a T -periodic solution of (1), (2) a cycle. Lemma 1 Corresponding to each T -periodic as above, there exists a unique cycle u. (0, ), with g = (0) on U , we denote by u Proof. 1. Given a smooth function g : U the unique smooth solution of ij ut i,j =1 (a uxi )xj = 0 in UT (4) u = on U [0, T ] u = g on U {t = 0}. 2. Let g be another smooth function and dene u similarly. Then
d dt U

(u u )2 dx

= 2 U (ut u t )(u u )dx n ij = 2 U i,j =1 a (uxi u xi )(uxj u xj )dx 2


U

|D(u u )|2 dx,

there being no boundary term when we integrate by parts, as u u = = 0 on U . A version of Poincar es inequality states w2 dx C
U U

|Dw|2 dx

100

R, with w = 0 on U . Thus for all smooth w : U d dt (u u )2 dx


U

(u u )2 dx

for some > 0 and all 0 t T . Hence (5)


U

(u(, T ) u (, T ))2 dx eT
U

(g g )2 dx.

Dene (g ) = u(, T ), ( g) = u (, T ). As eT /2 < 1, extends to a strict contraction from 2 2 L (U ) into L (U ). Thus has a unique xed point g L2 (U ). Parabolic regularity theory implies g is smooth, and the corresponding smooth solution u of (4) is the cycle. b. Heating Let u be the unique cycle corresponding to and recall from A.1 that q = ADu = heat ux. Thus (6) Q(t) =
U

q dS =
U

u dS A

represents the total heat ux into U from its exterior at time t 0. We dene as well (7) + = sup{ (t) | 0 t T, Q(t) > 0} = inf { (t) | 0 t T, Q(t) < 0}

to denote, respectively, the maximum temperature at which heat is absorbed and minimum temperature at which heat is emitted. Theorem 1 (i) We have
T

(8)
0

Q dt 0,

which strict inequality unless is constant. (ii) Furthermore if is not constant, (9) < +.

Notice that (8) is an obvious analogue of Clausius inequality and (9) is a variant of classical estimates concerning the eciency of cycles. In particular (9) implies that it is not 101

possible for heat to be absorbed only at temperatures below some given temperature 0 and to be emitted only at temperatures above 0 . This is of course a form of the Second Law. Proof. 1. Write v = log u, so that
n n

vt
i,j =1

(a vxi )xj =
i,j =1

ij

aij vxi vxj = 0.

Then

d dt

v (, t)dx

= =

v dS + U A 1 u dS U u A Q(t) , (t)

dx

since u(, t) = (t) on U . As t v (, t) is T -periodic, we deduce (8) upon integrating the above inequality for 0 t T . We obtain as well a strict inequality in (8) unless
T

dxdt = 0,
0 U

which identity implies


T 0 U

|Dv |2 dxdt = 0.

Thus x v (x, t) is constant for each 0 t T and so u(x, t) (t) (x U )

for each 0 t T . But then the PDE (1) implies ut 0 in UT and so t (t) is constant. 2. We now adapt a calculation from II.A.3. If is not constant, then 0> (10)
T Q(t) dt 0 (t)

= = = +

T Q+ (t)Q (t) dt (t) 0 T T 1 Q+ (t)dt 1 Q (t)dt + 0 0 T |Q(t)|+Q(t) T |Q(t)|Q(t) 1 dt 1 dt + 0 2 2 0 T 1 1 1 |Q(t)|dt 2 + 0 T 1 1 + 1 Q(t)dt. 2 + 0

But the PDE (1) implies


T T

Q(t)dt =
0 0

d dt 102

u(, t)dx dt = 0,
U

since u is T -periodic. Hence (10) forces 1 1 < 0. + See Day [D, p. 64] for an interesting estimate from below on + . c. Almost reversible cycles Recall from Chapter II that the Clausius inequality becomes an equality for any reversible cycle. Furthermore the model discussed in II.A.4 suggests that real, irreversible cycles approximate ideal, reversible cycles if the former are traversed very slowly. Following Day [D, Chapter 3], we establish next an analogue for our PDE (1). Denition. We say a family of functions { }0<1 is slowly varying if there exist times {T }0<1 and constants 0 < c C so that (a) : [0, ) (0, ) is T -periodic (b) c (11) (c) T C/ (d) | | C, | | C2 for all > 0, t 0. For any as above, let u be the corresponding cycle, and set Q (t) =
U

u (, t)dS. A

Theorem 2 We have
T

(12)
0

Q dt = O() as 0.

Estimate (12) is a sort of approximation to the Clausius equality for reversible cycles. Proof. 1. Let w = w(x) be the unique smooth solution of (13) and set (14) u (x, t) := u (x, t) (t) + w(x) (t) 103
n ij i,j =1 (a wxi )xj

= 1 in U

w = 0 on U ,

for x U , 0 t T . Then (1), (13) imply (15) Now (16) for (17) 2. We now assert that
T

u t

n ij xi )xj i,j =1 (a u

= w(x) (t) in UT on U [0, T ].

u = 0

Q (t) =

u dS U A

d dt U

u (, t)dx

= |U | (t)

w dx (t) + R(t),

R(t) =
U

u t (, t)dx.

(18)
0

R2 (t)dt = O(3 ) as 0.

To verify this claim, multiply the PDE (15) by u t and integrate over UT : (19)
T 0 U

u 2 t dxdt +
U

T 0

n i,j =1

aij u xi u xj t dxdt

T 0

w u t dxdt.

The second term on the left is


T 0

d dt

1 2

aij u xi u xj dx dt = 0,
U i,j =1

. The expression on the right hand side of (19) owing to the periodicity of u , and thus u is estimated by T 1 T u 2 dxdt + C 0 | |2 dt 2 0 U t
1 2 T 0 U 3 u 2 t dxdt + O ( ),

| C2 . So (19) implies since T C/, |


T 3 u 2 t dxdt = O ( ), 0 U T 0 2 u dx dt U t T u 2 dxdt 0 U t

and thus

T 0

R2 (t)dt =

|U |

= O(3 ).

104

This proves (18). 3. Return now to (16). We have (20)


T Q dt 0

= |U |

T dt 0 T wdx 0 dt U

T R dt. 0

d = dt (log ) and is T -periodic. The second The rst term on the right is zero, since term is estimated by C |T | sup | | = O(),

and the third term is less than or equal T c


1 /2 0 T 1/2

R dt

C 3/ 2 = O(), 1/2

according to (18). This all establishes (12). ... Remark. Under the addition assumption that | | C3 , Day proves:
T 0

Q dt + A A=

T 0

dt = O(2 )

where

1 |U |2

wdx.
U

See [D, p. 5361].

105

CHAPTER 5: Conservation laws and kinetic equations

The previous chapter considered linear, second-order PDE which directly model heat conduction. This chapter by contrast investigates various nonlinear, first-order PDE, mostly in the form of conservation laws. The main theme is the use of entropy-inspired concepts to understand irreversibility in time.

A. Some physical PDE

1. Compressible Euler equations

We recall here from III.C the compressible Euler equations

(1) $\begin{cases} \frac{D\rho}{Dt} + \rho\operatorname{div}\mathbf v = 0\\ \rho\frac{D\mathbf v}{Dt} = -Dp \end{cases}$

where $\mathbf v = (v^1,v^2,v^3)$ is the velocity, $\rho$ is the mass density, and $p$ is the pressure.

a. Equation of state

Our derivation in Chapter III shows that $p$ can be regarded as a function of $s$ (the entropy density/unit mass) and $v$ (the specific volume). Now if the fluid motion is isentropic, we can take $s$ to be constant, and so regard $p$ as a function only of $v$. But $v = \frac{1}{\rho}$ and so we may equivalently take $p$ as a function of $\rho$:

(2) $p = p(\rho).$

This is an equation of state, the precise form of which depends upon the particular fluid we are investigating. Assume further that the fluid is a simple ideal gas. Recall now formula (9) from I.F, which states that at equilibrium

(3) $PV^\gamma = \text{constant},$

where $\gamma = \frac{C_P}{C_V} > 1$. Since we are assuming our flow is isentropic, it seems plausible to assume a local version of (3) holds:

(4) $pv^\gamma = \kappa,$

$\kappa$ denoting some positive constant. Thus for an isentropic flow of a simple ideal gas the general equation of state (2) reads

(5) $p = \kappa\rho^\gamma.$

b. Conservation law form

For later reference, we recast Euler's equations (1). To do so, let us note that the second equation in (1) says

$$\rho\Big(v^i_t + \sum_{j=1}^3 v^j v^i_{x_j}\Big) = -p_{x_i} \qquad (i = 1,2,3).$$

Hence

$$(\rho v^i)_t = \rho_t v^i + \rho v^i_t = -\sum_{j=1}^3 (\rho v^j)_{x_j} v^i - \rho\sum_{j=1}^3 v^j v^i_{x_j} - p_{x_i} = -\sum_{j=1}^3 (\rho v^i v^j)_{x_j} - p_{x_i} \qquad (i = 1,2,3).$$

Therefore we can rewrite Euler's equations (1) to read

(6) $\begin{cases} \rho_t + \operatorname{div}(\rho\mathbf v) = 0\\ (\rho\mathbf v)_t + \operatorname{div}(\rho\,\mathbf v\otimes\mathbf v + pI) = 0 \end{cases}$

where v v = ((v i v j )) and p = p(). This is the conservation law form of the equations, expressing conservation of mass and linear momentum. 2. Boltzmanns equation Eulers equations (1), (6) are PDE describing the macroscopic behavior of a uid in terms of v and . On the other hand the microscopic behavior is presumably dictated by the dynamics of a huge number ( NA 6.02 1023 ) of interacting particles. The key point is this: of the really large number of coordinates needed to characterize the details of particle motion, only a very few parameters persist at the macroscopic level, after averaging over the microscopic dynamics. Understanding this transition from small to large scale dynamics is a fundamental problem in mathematical physics. One important idea is to study as well the mesoscopic behavior, which we can think of as describing the uid behavior at middle-sized scales. The relevant PDE here are generically called kinetic equations, the most famous of which is Boltzmanns equation. a. A model for dilute gases The unknown in Boltzmanns equation is a function f : R3 R3 [0, ) [0, ), such that f (x, v, t) is the density of the number of particles at time t 0 and position x R3 , with velocity v R3 . 107

Assume first that the particles do not interact. Then for each velocity $v \in \mathbb{R}^3$

$$\frac{d}{dt}\int_{V(t)} f(x,v,t)\,dx = 0,$$

where $V(t) = V + tv$ is the region occupied at time $t$ by those particles initially lying within $V$ and possessing velocity $v$. As in III.A, we deduce

$$0 = \int_{V(t)} f_t + \operatorname{div}_x(fv)\,dx = \int_{V(t)} f_t + v\cdot D_x f\,dx.$$

Consequently

(7) $f_t + v\cdot D_x f = 0$ in $\mathbb{R}^3\times\mathbb{R}^3\times(0,\infty)$

if the particles do not interact. Suppose now interactions do occur in the form of collisions, which we model as follows. Assume two particles, with velocities $v$ and $v_*$, collide and after the collision have velocities $v'$ and $v'_*$.

[Figure: incoming velocities $v$, $v_*$ and outgoing velocities $v'$, $v'_*$]

We assume the particles all have the same mass $m$, so that conservation of momentum and kinetic energy dictate:

(8) (a) $v + v_* = v' + v'_*$, (b) $|v|^2 + |v_*|^2 = |v'|^2 + |v'_*|^2,$

where $v, v_*, v', v'_* \in \mathbb{R}^3$. We now change variables, by writing

$$v - v' = \alpha w$$

for $w \in S^2$ (= unit sphere in $\mathbb{R}^3$) and $\alpha = |v - v'| \ge 0$. Then (8)(a) implies $v'_* - v_* = \alpha w$. Furthermore

$$|v'|^2 + |v'_*|^2 = |v - \alpha w|^2 + |v_* + \alpha w|^2 = |v|^2 - 2\alpha v\cdot w + \alpha^2 + |v_*|^2 + 2\alpha v_*\cdot w + \alpha^2.$$

Consequently (8)(b) yields the identity $\alpha = (v - v_*)\cdot w$. Hence

(9) $\begin{cases} v' = v - [(v - v_*)\cdot w]\,w\\ v'_* = v_* + [(v - v_*)\cdot w]\,w. \end{cases}$

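The algebra behind (9) is easy to spot-check numerically (this snippet is an illustration of mine, not part of the notes): for random incoming velocities and a random unit vector $w$, the outgoing velocities conserve momentum and kinetic energy as demanded by (8).

```python
import numpy as np

# Verify that the collision rule (9) respects the conservation laws (8).
rng = np.random.default_rng(1)
for _ in range(1000):
    v, vs = rng.normal(size=3), rng.normal(size=3)   # incoming v, v_*
    w = rng.normal(size=3)
    w /= np.linalg.norm(w)                           # w on the unit sphere S^2
    a = (v - vs) @ w                                 # alpha = (v - v_*) . w
    vp = v - a * w                                   # v'    from (9)
    vsp = vs + a * w                                 # v'_*  from (9)
    assert np.allclose(v + vs, vp + vsp)                          # momentum (8)(a)
    assert np.isclose(v @ v + vs @ vs, vp @ vp + vsp @ vsp)       # energy   (8)(b)
```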
We next introduce the quadratic collision operator

(10) $Q(f,f)(v,\cdot) = \int_{S^2}\int_{\mathbb{R}^3} [f(v',\cdot)f(v'_*,\cdot) - f(v,\cdot)f(v_*,\cdot)]\,B(v - v_*, w)\,dv_*\,dS(w),$

where $dS$ means surface measure over $S^2$, $v', v'_*$ are defined in terms of $v, v_*, w$ by (9), and $B: \mathbb{R}^3\times S^2 \to (0,\infty)$ is given. We assume also that $B(z,w)$ in fact depends only on $|z|, |z\cdot w|$, this ensuring that the model is rotationally invariant. Boltzmann's equation is the integro-differential equation

(11) $f_t + v\cdot D_x f = Q(f,f)$ in $\mathbb{R}^3\times\mathbb{R}^3\times(0,\infty),$

which differs from (7) by the addition of the collision operator $Q(f,f)$ on the right-hand side. This term models the decrease in $f(v,x,t)$ and $f(v_*,x,t)$ and the gain in $f(v',x,t)$ and $f(v'_*,x,t)$, owing to collisions of the type illustrated above. The term $B$ models the rate of collisions which start with velocity pairs $v, v_*$ and result in velocity pairs $v', v'_*$ given by (9). See Huang [HU, Chapter 3] for the physical derivation of (11).

b. H-Theorem

We henceforth assume $f$ is a smooth solution of (11), with $f \ge 0$. We suppose as well that $f \to 0$ as $|x|, |v| \to \infty$, fast enough to justify the following calculations. Define then Boltzmann's H-function

(12) $H(t) = \int_{\mathbb{R}^3}\int_{\mathbb{R}^3} f\log f\,dv\,dx \qquad (t \ge 0),$

concerning the form of which we will comment later.

Theorem 1. We have

(13) $\frac{dH}{dt} \le 0$ on $[0,\infty)$.

Proof. 1. Let us as shorthand notation hereafter write f = f (v , ), f = f (v , ), f = f (v , ). Thus Q(f, f ) =


S2 3 R3

[f f f f ]B (v v , w)dv dS.

2. We now claim that if : R R is smooth, = (v ), then (14) (v )Q(f, f )(v )dv =


R3

1 4

S2

R3

R3

(f f f f )( + )Bdvdv dS.

This identity is valid since 1. interchanging v with v does not change the integrand on the left hand side of (14), 2. interchanging (v, v ) with (v , v ) changes the sign of the integrand, and 3. interchanging (v, v ) with (v , v ) changes the sign as well. More precisely, write B1 to denote the left hand side of (14). Then B2 := =
S2 S2 R3 R3

(f f f f )B (v v , w)dvdv dS (f f f f )B (v v, w)dv dvdS, R3


R3

where we relabeled variables by interchanging v and v . Since B (z, w) depends only on |z |, |z w|, we deduce B2 = B1 . Next set B3 :=
S2 R3 R3

(f f f f )B (v v , w)dvdv dS.

For each xed w S 2 , we change variables in R3 R3 by mapping (v, v ) to (v , v ), using the formulas (9). Then (v , v ) = (v, v ) I ww ww 110 ww I ww
6 6

and so

(v , v ) = 1. (v, v ) B3 =
S2

Consequently
R3 R3

(f f f f )B (v v , w)dv dv dS.

The integrand is (v )(f (v , )f (v , ) f (v , )f (v, ))B (v v , w) and we can now regard v, v as functions of v , v : v = v [(v v ) w]w v = v + [(v v ) w]w. Next we simply relabel the variables above, and so write v, v for v , v , and vice-versa: B3 =
S2 R3 R3

(f f f f )B (v v , w)dvdv dS.

Now (9) implies |v v | = |v v | (v v ) w = (v v ) w; and so, since B (z, w) depends only on |z |, |z w|, we deduce B3 = B1 . Similarly we have B4 = B1 , for B4 :=
S2 R3 R3

(f f f f )B (v v , w)dvdv dS.

Combining everything, we discover 4B1 = B1 + B2 B3 B4 , and this is the identity (14). 3. Now set (v ) = log f (v, ) in (14). Then (15)
R3

log f (v, )Q(f, f )(v, )dv = 1 4 0,

S2

R3

R3

(f f f f )[log(f f ) log(f f )]Bdvdv dS

111

since B 0 and log is increasing. Also put 1, to conclude (16)


R3

Q(f, f )(v, )dv = 0.

4. Thus

d H (t) dt

= R3 R3 ft (log f + 1)dvdx = R3 R3 [v Dx f + Q(f, f )](log f + 1)dvdx R3 R3 v Dx f (log f + 1)dvdx by (15), (16) = R3 v R3 Dx (f log f )dx dv = 0.

Remark. A smooth function $f: \mathbb{R}^3 \to [0,\infty)$, $f = f(v)$, is called a Maxwellian if

(17) $Q(f,f) \equiv 0$ on $\mathbb{R}^3.$

It is known that each Maxwellian has the form

(18) $f(v) = a\,e^{-b|v-c|^2} \qquad (v \in \mathbb{R}^3)$

for constants $a, b \in \mathbb{R}$, $c \in \mathbb{R}^3$: see Truesdell-Muncaster [T-M]. According to the proof of Theorem 1, we have

$$\frac{dH}{dt}(t) < 0$$

unless $v \mapsto f(x,v,t)$ is a Maxwellian for all $x \in \mathbb{R}^3$. This observation suggests that as $t \to \infty$ solutions of Boltzmann's equation will approach Maxwellians, that is

(19) $f(x,v,t) \approx a(x,t)e^{-b(x,t)|v-c(x,t)|^2} \qquad (x, v \in \mathbb{R}^3)$

for $t \gg 1$.

c. H and entropy

We provide in this section some physical arguments suggesting a connection between the H-function and entropy. The starting point is to replace (19) by the equality

(20) $f(x,v,t) = a(x,t)e^{-b(x,t)|v-c(x,t)|^2} \qquad (x, v \in \mathbb{R}^3,\ t \ge 0),$

where $a, b: \mathbb{R}^3\times[0,\infty) \to (0,\infty)$, $c: \mathbb{R}^3\times[0,\infty) \to \mathbb{R}^3$. In other words we are assuming that at each point $x$ in space and instant $t$ in time, the distribution $v \mapsto f(x,v,t)$ is a Maxwellian

and is thus determined by the macroscopic parameters $a = a(x,t)$, $b = b(x,t)$, $c = c(x,t)$. We intend to interpret these physically.

(i) It is first of all convenient to rewrite (20):

(21) $f(x,v,t) = \frac{n}{(2\pi\theta)^{3/2}}\,e^{-\frac{|v-\bar v|^2}{2\theta}},$

where $n, \theta, \bar v$ are functions of $(x,t)$, $n, \theta > 0$. Then

(22) (a) $\int_{\mathbb{R}^3} f\,dv = n$, (b) $\int_{\mathbb{R}^3} vf\,dv = n\bar v$, (c) $\int_{\mathbb{R}^3} |v - \bar v|^2 f\,dv = 3n\theta.$

Thus (22)(a) says:

(23) $\int_{\mathbb{R}^3} f(x,v,t)\,dv = n(x,t),$

where $n(x,t)$ is the particle density at $x \in \mathbb{R}^3$, $t \ge 0$. Introduce also $m$ = mass/particle. Then

(24) $\int_{\mathbb{R}^3} mf(x,v,t)\,dv = mn(x,t) =: \rho(x,t),$

for $\rho(x,t)$ the mass density. Then (22)(b) implies

(25) $\int_{\mathbb{R}^3} mvf(x,v,t)\,dv = \rho(x,t)\bar v(x,t),$

and thus $\bar v(x,t)$ is the macroscopic velocity. Using (22)(c) we deduce:

(26) $\int_{\mathbb{R}^3} \frac{m}{2}|v|^2 f(x,v,t)\,dv = \frac{1}{2}\rho(x,t)|\bar v(x,t)|^2 + \frac{3}{2}\rho(x,t)\theta(x,t).$

The term on the left is the total energy at $(x,t)$ (since $\frac{m}{2}|v|^2$ is the kinetic energy of a particle with mass $m$ and velocity $v$, and $f(x,v,t)$ is the number of such particles at $(x,t)$). The expression $\frac{1}{2}\rho|\bar v|^2$ on the right is the macroscopic kinetic energy. Thus the term $\frac{3}{2}\rho\theta$ must somehow model macroscopic internal energy.

(ii) To further the interpretation we now suppose that our gas can be understood macroscopically as a simple ideal gas, as earlier discussed in I.F. From that section we recall the equilibrium entropy function

(27) $S = R\log V + C_V\log T + S_0,$

valid for mole number N = 1. Here S = CV = entropy/mole heat capacity/mole.

Now a mole of any substance contains Avagadros number NA of molecules: see Appendix A. Thus the mass of a mole of our gas is NA m. Dene now s = cv = Then (28) Recall also from I.F that (29) Then (27)(29) imply: s = (30) = = where (31) k= R NA
R log V +CV log T + s0 NA m R (log V +( 1) 1 log T ) + s0 NA m k (log V + ( 1)1 log T ) m

entropy/unit mass heat capacity/unit mass.

s = S/NA m cv = CV /NA m.

CP > 1, CP CV = R. CV

+ s0 ,

is Boltzmanns constant. We now further hypothesize (as in Chapter III) that formula (30) makes sense at each point x R3 , t 0 for our nonequilibrium gas. That is, in (30) we replace s by s(x, t) = entropy density/unit mass T by (x, t) = local temperature (32) V by 1 = volume/unit mass. (x,t) Inserting (32) into (30) gives: (33) s= k (( 1)1 log log ) + s0 . m 114

(iii) Return now to (21). We compute for x R3 , t 0: h(x, t) := = =


R3

f (x, v, t) log f (x, v, t)dv


R3

n (2)3/2

|v v|2 2

log n 3 log(2) 2

|v v | 2 2

dv

log + r0 , n log n 3 2

r0 denoting a constant. Since nm = , we can rewrite: (34) h(x, t) = m log 3 log + r0 , 2

r0 now a dierent constant. Comparing now (33), (34), we deduce that up to arbitrary constants (35) (x, t)s(x, t) = kh(x, t) (x R3 , t 0),

provided is proportional to , (36) and ( 1)1 = 3/2, that is (37) 5 = . 3 (x, t) = (x, t) ( > 0),

Making these further assumptions, we compute using (35) that S (t) := R3 s(, t)dm = R3 s(, t)(, t)dx = k R3 h(, t)dx = kH (t). Hence (38) S (t) = kH (t) (t 0).

So the total entropy at time t is just kH at time t and consequently the estimate (13) that dH/dt 0 is yet another version of Clausius inequality. (iv) It remains to compute . Owing to (26), we expect 3 3 = 2 2 should be the internal energy/unit to represent the total internal energy at (x, t). That is, 3 2 mass. 115

Now we saw in I.F that the internal energy/mole at equilibrium is $C_V T$. Thus we can expect

(39) $\frac{3}{2}\theta = c_v\Theta,$

where we recall $c_v$ is the heat capacity/unit mass and $\Theta$ is the local temperature from (32). Thus (28), (39) imply

$$\theta = \frac{2C_V}{3N_A m}\,\Theta.$$

But (29), (37) imply $C_V = \frac{3}{2}R$, and so

(40) $\theta = \frac{R}{N_A m}\,\Theta = \frac{k}{m}\,\Theta.$

We summarize by returning to (21):

(41) $f(x,v,t) = n\Big(\frac{m}{2\pi k\Theta}\Big)^{3/2} e^{-\frac{m|v-\bar v|^2}{2k\Theta}},$

where

$$n(x,t) = \text{particle density at } (x,t), \quad \Theta(x,t) = \text{local temperature at } (x,t), \quad \bar v(x,t) = \text{macroscopic fluid velocity at } (x,t).$$
This formula is the Maxwell-Boltzmann distribution for $v$.

Remark. For reference later in Chapter VII, we rewrite (41) as

(42) $f = n\,\frac{e^{-\beta H}}{Z},$

for

(43) $\beta = \frac{1}{k\Theta}, \qquad H = \frac{m}{2}|v - \bar v|^2, \qquad Z = \Big(\frac{2\pi k\Theta}{m}\Big)^{3/2} = \int_{\mathbb{R}^3} e^{-\beta H}\,dv.$

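The moment identities (22) behind this whole interpretation can be spot-checked numerically. The sketch below is my addition (the particular values of $n$, $\theta$, $\bar v$ are arbitrary test choices): it treats the Maxwellian (21) as a Gaussian with mean $\bar v$ and covariance $\theta I$ and total mass $n$, checking (22)(a) by quadrature and (22)(b), (c) by Monte Carlo.

```python
import numpy as np

# Spot-check of the moment identities (22) for the Maxwellian (21).
n, theta = 2.0, 0.7                        # test values: density, temperature parameter
vbar = np.array([1.0, -2.0, 0.5])          # test macroscopic velocity

# (22)(a): each 1-d Gaussian factor integrates to 1 (trapezoidal quadrature),
# so by separability  \int f dv = n.
v1 = np.linspace(-10.0, 10.0, 20001)
g1 = np.exp(-v1**2 / (2 * theta)) / np.sqrt(2 * np.pi * theta)
mass = np.sum((g1[1:] + g1[:-1]) * np.diff(v1)) / 2
assert abs(mass - 1.0) < 1e-8

# (22)(b), (22)(c) by Monte Carlo:  \int g(v) f dv  ~  n * mean(g(samples)).
rng = np.random.default_rng(2)
samples = vbar + np.sqrt(theta) * rng.normal(size=(400000, 3))
assert np.allclose(n * samples.mean(axis=0), n * vbar, atol=0.02)               # (22)(b)
assert abs(n * ((samples - vbar)**2).sum(axis=1).mean() - 3 * n * theta) < 0.05  # (22)(c)
```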
B. Single conservation law Eulers and Boltzmanns equations are among the most important physical PDE but, unfortunately, there is no fully satisfactory mathematical theory available concerning the 116

existence, uniqueness and regularity of solutions for either. Much effort has focused therefore on certain simpler model PDE, the rigorous analysis of which presumably provides clues about the structure of solutions of Euler's and Boltzmann's PDE. In this section we discuss a type of nonlinear first-order PDE called a conservation law and describe how entropy-inspired concepts are employed to record irreversibility phenomena and in fact to define appropriate weak solutions. Following Lions, Perthame, Tadmor [L-P-T1] we introduce also a kinetic approximation, which is a sort of simpler analogue of Boltzmann's equation. In C following we discuss systems of conservation laws.

1. Integral solutions

A PDE of the form

(1) $u_t + \operatorname{div}\mathbf F(u) = 0$ in $\mathbb{R}^n\times(0,\infty)$

is called a scalar conservation law. Here the unknown is $u: \mathbb{R}^n\times[0,\infty) \to \mathbb{R}$ and we are given the flux function $\mathbf F: \mathbb{R} \to \mathbb{R}^n$, $\mathbf F = (F^1,\dots,F^n)$.

Physical interpretation. We regard $u = u(x,t)$ ($x \in \mathbb{R}^n$, $t \ge 0$) as the density of some scalar conserved quantity. If $V$ represents a fixed subregion of $\mathbb{R}^n$, then

(2) $\frac{d}{dt}\int_V u(\cdot,t)\,dx$

represents the rate of change of the total amount of the quantity within $V$, and we assume this change equals the flux inward through $\partial V$:

(3) $-\int_{\partial V} \mathbf F\cdot\nu\,dS.$

We hypothesize that $\mathbf F$ is a function of $u$. Equating (2), (3) for any region $V$ yields the conservation law (1). Note that (1) is a caricature of Euler's equations (6) in A.

Notation. We will sometimes rewrite (1) into the nondivergence form

(4) $u_t + \mathbf b(u)\cdot Du = 0$ in $\mathbb{R}^n\times(0,\infty),$

where

(5) $\mathbf b = \mathbf F',\ \mathbf b = (b^1,\dots,b^n).$
We can interpret (4) as a nonlinear transport equation, for which the velocity $\mathbf v = \mathbf b(u)$ depends on the unknown density $u$. The left-hand side of (4) is thus a sort of nonlinear variant of the material derivative $\frac{Du}{Dt}$ from Chapter III. The characteristics of (1) are solutions $x(\cdot)$ of the ODE

(6) $\dot x(t) = \mathbf v(x(t),t) = \mathbf b(u(x(t),t)) \qquad (t \ge 0).$

We will study the initial value problem

(7) $\begin{cases} u_t + \operatorname{div}\mathbf F(u) = 0 & \text{in } \mathbb{R}^n\times(0,\infty)\\ u = g & \text{on } \mathbb{R}^n\times\{t=0\}, \end{cases}$

where $g \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$ is the initial density. The first difficulty in this subject is to understand the right way to interpret a function $u$ as solving (7).

Definition. We say $u \in L^1_{\mathrm{loc}}(\mathbb{R}^n\times(0,\infty))$ is an integral solution of (7) provided:

(8) $\int_0^\infty\int_{\mathbb{R}^n} uv_t + \mathbf F(u)\cdot Dv\,dx\,dt + \int_{\mathbb{R}^n} gv(\cdot,0)\,dx = 0$

for all $v \in C^1_c(\mathbb{R}^n\times[0,\infty))$.

Examples. (a) If

$$g(x) = \begin{cases} 1 & x < 0\\ 0 & x > 0, \end{cases}$$

then

$$u(x,t) = \begin{cases} 1 & x < \frac{t}{2}\\ 0 & x > \frac{t}{2} \end{cases}$$

is an integral solution of

(9) $\begin{cases} u_t + \big(\frac{u^2}{2}\big)_x = 0 & \text{in } \mathbb{R}\times(0,\infty)\\ u = g & \text{on } \mathbb{R}\times\{t=0\}. \end{cases}$

(b) If, instead,

$$g(x) = \begin{cases} 0 & x < 0\\ 1 & x > 0, \end{cases}$$

then

$$u^1(x,t) = \begin{cases} 0 & x < \frac{t}{2}\\ 1 & x > \frac{t}{2} \end{cases} \qquad\text{and}\qquad u^2(x,t) = \begin{cases} 0 & x < 0\\ \frac{x}{t} & 0 < x < t\\ 1 & x > t \end{cases}$$

are both integral solutions of (9). As explained in Smoller [S], [E1, Chapter III], etc., the physically correct integral solution of (b) is $u^2$. The function $u$ from Example (a) admits a physical shock, with which the characteristics collide. The function $u^1$ from Example (b) is wrong since it has a nonphysical shock, from which characteristics emanate.

(c) If

$$g(x) = \begin{cases} 1 & x < 0\\ 1 - x & 0 < x < 1\\ 0 & x > 1, \end{cases}$$

then

$$u(x,t) = \begin{cases} 1 & x \le t,\ 0 \le t \le 1\\ \frac{1-x}{1-t} & t \le x \le 1,\ 0 \le t \le 1\\ 0 & x \ge 1,\ 0 \le t \le 1, \end{cases} \qquad u(x,t) = \begin{cases} 1 & x < \frac{1+t}{2},\ t \ge 1\\ 0 & x > \frac{1+t}{2},\ t \ge 1 \end{cases}$$

is an integral solution of (9). The function $u$ is the physically correct solution of (9). Note carefully that although $g = u(\cdot,0)$ is continuous, $u(\cdot,t)$ is discontinuous for times $t > 1$. Note also that this example illustrates irreversibility. If

$$\tilde g(x) = \begin{cases} 1 & x \le \frac{1}{2}\\ 0 & x > \frac{1}{2}, \end{cases}$$

then the corresponding physically correct solution is

$$\tilde u(x,t) = \begin{cases} 1 & x < \frac{1+t}{2}\\ 0 & x > \frac{1+t}{2}. \end{cases}$$

But then $u \equiv \tilde u$ for times $t \ge 1$.

2. Entropy solutions

We next introduce some additional mathematical structure which will clarify what we mean above by a physically correct solution.

Definition. We call $(\Phi,\Psi)$ an entropy/entropy flux pair for the conservation law (1) provided

(i) $\Phi: \mathbb{R} \to \mathbb{R}$ is convex

and

(ii) $\Psi: \mathbb{R} \to \mathbb{R}^n$, $\Psi = (\Psi^1,\dots,\Psi^n)$ satisfies

(10) $\Psi' = \mathbf b\,\Phi'.$

Thus

(11) $\Psi^i(z) = \int_0^z b^i(v)\Phi'(v)\,dv \qquad (i = 1,\dots,n),$

up to additive constants.
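As a concrete illustration (mine, not from the notes), take Burgers' flux $F(z) = z^2/2$, so $b = F' = \mathrm{id}$, and the entropy $\Phi(z) = z^2$. Then (10) gives $\Psi'(z) = 2z^2$, so $\Psi(z) = \frac{2}{3}z^3$, which the quadrature formula (11) reproduces:

```python
import numpy as np

# Build the entropy flux Psi from (11) for Burgers' flux F(z) = z^2/2
# (b = F' = identity) and the entropy Phi(z) = z^2, so Psi(z) = 2 z^3 / 3.
def Psi(z, m=20001):
    v = np.linspace(0.0, z, m)
    integrand = v * (2.0 * v)            # b(v) * Phi'(v)
    # trapezoidal quadrature of (11)
    return float(np.sum((integrand[1:] + integrand[:-1]) * np.diff(v)) / 2)

for z in (-1.0, 0.5, 2.0):
    assert abs(Psi(z) - 2.0 * z**3 / 3.0) < 1e-6
```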

Motivation. Notice that if $u$ is a $C^1$-solution of (1) in some region of $\mathbb{R}^n\times(0,\infty)$, then (10) implies

$$(\Phi(u))_t + \operatorname{div}\Psi(u) = 0 \text{ there}.$$

We interpret this equality to mean that there is no entropy production within such a region. On the other hand our examples illustrate that integral solutions can be discontinuous, and in particular Examples (a), (b) suggest certain sorts of discontinuities are physically acceptable, others not. We intend to employ entropy/entropy flux pairs to provide an inequality criterion, a kind of analogue of the Clausius-Duhem inequality, for selecting the proper integral solution. The easiest way to motivate all this is to introduce the regularized PDE

(12) $u^\varepsilon_t + \operatorname{div}\mathbf F(u^\varepsilon) = \varepsilon\Delta u^\varepsilon$ in $\mathbb{R}^n\times(0,\infty),$

where $\varepsilon > 0$. By analogy with the incompressible Navier-Stokes equations ((18) in III.C) we can regard the term $\varepsilon\Delta u^\varepsilon$ as modelling viscous effects, which presumably tend to smear out discontinuities. And indeed it is possible to prove under various technical hypotheses that (12) has a smooth solution $u^\varepsilon$, subject to the initial condition $u^\varepsilon = g$ on $\mathbb{R}^n\times\{t=0\}$. Take a smooth entropy/entropy flux pair $\Phi, \Psi$ and compute:

(13)
$$(\Phi(u^\varepsilon))_t + \operatorname{div}\Psi(u^\varepsilon) = \Phi'(u^\varepsilon)u^\varepsilon_t + \Psi'(u^\varepsilon)\cdot Du^\varepsilon$$
$$= \Phi'(u^\varepsilon)(-\mathbf b(u^\varepsilon)\cdot Du^\varepsilon + \varepsilon\Delta u^\varepsilon) + \Phi'(u^\varepsilon)\mathbf b(u^\varepsilon)\cdot Du^\varepsilon \quad \text{by (10)}$$
$$= \varepsilon\operatorname{div}(\Phi'(u^\varepsilon)Du^\varepsilon) - \varepsilon\Phi''(u^\varepsilon)|Du^\varepsilon|^2$$
$$\le \varepsilon\operatorname{div}(\Phi'(u^\varepsilon)Du^\varepsilon),$$

the inequality holding since $\Phi$ is convex. Now take $v \in C^1_c(\mathbb{R}^n\times(0,\infty))$, $v \ge 0$. Then (13) implies

$$\int_0^\infty\int_{\mathbb{R}^n} \Phi(u^\varepsilon)v_t + \Psi(u^\varepsilon)\cdot Dv\,dx\,dt = -\int_0^\infty\int_{\mathbb{R}^n} v\,((\Phi(u^\varepsilon))_t + \operatorname{div}\Psi(u^\varepsilon))\,dx\,dt$$
$$\ge -\varepsilon\int_0^\infty\int_{\mathbb{R}^n} v\operatorname{div}(\Phi'(u^\varepsilon)Du^\varepsilon)\,dx\,dt = \varepsilon\int_0^\infty\int_{\mathbb{R}^n} \Phi'(u^\varepsilon)Dv\cdot Du^\varepsilon\,dx\,dt.$$

Assume now that as $\varepsilon \to 0$,

(14) $u^\varepsilon \to u$ boundedly, a.e.,

and further we have the estimate

(15) $\sup_\varepsilon\ \varepsilon\int_0^\infty\int_{\mathbb{R}^n} |Du^\varepsilon|^2\,dx\,dt < \infty.$

Then the right-hand side above is $O(\varepsilon^{1/2})$, and sending $\varepsilon \to 0$:

(16) $\int_0^\infty\int_{\mathbb{R}^n} \Phi(u)v_t + \Psi(u)\cdot Dv\,dx\,dt \ge 0.$

This inequality motivates the following

Definition. We say that $u \in C([0,\infty); L^1(\mathbb{R}^n))$ is an entropy solution of

(17) $\begin{cases} u_t + \operatorname{div}\mathbf F(u) = 0 & \text{in } \mathbb{R}^n\times(0,\infty)\\ u = g & \text{on } \mathbb{R}^n\times\{t=0\} \end{cases}$

provided

(18) $(\Phi(u))_t + \operatorname{div}\Psi(u) \le 0$

in the weak sense for each entropy/entropy flux pair $(\Phi,\Psi)$, and

(19) $u(\cdot,0) = g.$
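The vanishing-viscosity mechanism behind this definition can be demonstrated numerically (my addition, not part of the notes): a monotone finite-difference scheme carries numerical viscosity, so, like the limit $\varepsilon \to 0$ in (12), it selects the entropy solution. Applied to the Riemann data of Example (b) in section 1 it converges to the rarefaction $u^2$, not the nonphysical shock $u^1$.

```python
import numpy as np

# A Godunov scheme for Burgers' equation u_t + (u^2/2)_x = 0.
def godunov_flux(ul, ur):
    # exact Riemann flux for the convex flux F(z) = z^2/2
    if ul <= ur:                                   # rarefaction: minimize F on [ul, ur]
        return 0.0 if ul <= 0.0 <= ur else min(ul**2, ur**2) / 2.0
    return max(ul**2, ur**2) / 2.0                 # shock: maximize F on [ur, ul]

N = 400
dx = 2.0 / N
x = -1.0 + (np.arange(N) + 0.5) * dx
u = (x > 0).astype(float)                          # Example (b) initial data
t, T = 0.0, 0.5
while t < T:
    dt = min(0.4 * dx, T - t)                      # CFL: |F'(u)| <= 1 here
    F = np.array([godunov_flux(u[i], u[i + 1]) for i in range(N - 1)])
    u[1:-1] -= (dt / dx) * (F[1:] - F[:-1])
    t += dt
exact = np.clip(x / T, 0.0, 1.0)                   # the rarefaction u^2 at time T
assert np.abs(u - exact).max() < 0.1               # converges to the entropy solution
```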
Remarks. (i) The meaning of (18) is that the integral inequality (16) holds for all v 1 (Rn (0, )), v 0. Cc (ii) We can regard (18) as a form of the ClausiusDuhem inequality, except that the sign is reversed. Note carefully: if , is an entropy/entropy ux pair, then s = (u) acts like a physical entropy density. The customary mathematical and physical usages of the term entropy dier in sign. (iii) Since we have seen that (u)t + div (u) = 0 in any region where u is C 1 , the possible inequalities in (18) must arise only where u is not C 1 , e.g. along shocks. We motivated (18) 121

by the vanishing viscosity method of sending 0 in (12). This is somewhat related to the model thermodynamic system introduced in II.A.4 where we added dissipation to an ideal dissipationless model. In that setting, the ideal model arises if conversely we send the dissipation terms R1 , R2 0. By contrast, for conservation laws if we obtain u as the limit (14) of solutions u of the viscous approximations (12), then u remembers this vanishing viscosity process in the structure of its discontinuities. For instance in examples (a)(b) above, the physically correct shocks do arise as the limit 0, whereas the physically incorrect shock cannot. (To be consistent, we should redene the g s and thus the us for large |x|, so that u L1 .) The existence of an entropy solution of (17) follows via the vanishing viscosity process, and the really important assertion is this theorem of Kruzkov: Theorem Assume that u, u are entropy solutions of (20) and (21) Then (22) for each 0 s t. In particular an entropy solution of the initial value problem (20) is unique. See for instance [E1, 11.4.3] for proof. 3. Condition E We illustrate the meaning of the entropy condition for n = 1 by examining more closely the case that u is a piecewise smooth solution of (23) ut + F (u)x = 0 in R (0, ). u(, t) u (, t)
L1 (Rn )

ut + div F(u) = 0 in Rn (0, ) u = g on Rn {t = 0}

u t + div F( u) = 0 in Rn (0, ) u = g on Rn {t = 0}.

u(, s) u (, s)

L1 (Rn )

More precisely assume that some region V R (0, ) is subdivided into a left region Vl and a right region Vr by a smooth curve C , parametrized by {x = s(t)}:

122

C={x=s(t)} t

Vl

Vr

l and V r , and further u satises the condition Assume that u is smooth in both V (24) (u)t + (u)x 0 in V

in the weak sense, for each entropy/entropy ux pair (, ). Thus (25)


V 1 (V ), with v 0. Here for each v C0

(u)vt + (u)vx dxdt 0

(26)

: R R is convex, and = b for b = F .

1 First take v Cc (Vl ), v 0. Then, since u is smooth in Vl , we can integrate by parts in (25) to deduce

[(u)t + (u)x ]vdxdt 0.


Vl

This inequality is valid for each v as above, whence (u)t + (u)x 0 in Vl . Take (z ) = z , (z ) = F (z ), to conclude ut + F (u)x = 0 in Vl . But then (26) implies (27) (u)t + (u)x = 0 in Vl . 123

Similarly ut + F (u)x = 0 in Vr , (28) (u)t + (u)x = 0 in Vr .

1 (V ), v 0. Then (25) says Next take v Cc

(u)vt + (u)vx dxdt +


Vt Vr

(u)vt + (u)vx dxdt 0.

Integrate by parts in each term, recalling (27), (28), to deduce (29)


C

v [((ul ) (ur )) 2 + ((ul ) (ur )) 1 ]dl 0

where = ( 1 , 2 ) is the outer unit normal to Vl along C , ul is the limit of u from the left along C , and ur is the limit from the right. Since = 1 (1, s ) (1 + (s )2 )1/2

and v 0 is arbitrary, we conclude from (29) that (30) s ((ur ) (ul )) (ur ) (ul ) along C .

Taking (z ) = z , (z ) = F (z ), we obtain the RankineHugoniot jump condition (31) s (ur ul ) = F (ur ) F (ul ).

Select now a time t, and suppose ul < ur . Fix ul < u < ur and dene the entropy/entropy ux pair (z ) = (z u)+ z (z ) = ul sgn+ (v u)F (v )dv. Then (ur ) (ul ) = ur u (ur ) (ul ) = F (ur ) F (u).

Consequently (30) implies (32) Combine (31), (32): (33) F (u) F (ur ) F (ul ) (u ur ) + F (ur ) ur ul 124 (ul u ur ). s (u ur ) F (u) F (ur ).

This inequality holds for each u_l ≤ u ≤ u_r and says that the graph of F on the interval [u_l, u_r] lies above the line segment connecting (u_l, F(u_l)) to (u_r, F(u_r)). A similar proof shows that if u_l > u_r, then

(34)   F(u) ≤ [(F(u_r) − F(u_l))/(u_r − u_l)] (u − u_r) + F(u_r)   (u_r ≤ u ≤ u_l).

The inequalities (33), (34) are called Oleinik's condition E.

Remarks. (i) In particular, if F is strictly convex, then u_l ≥ u_r; and if F is strictly concave, then u_l ≤ u_r.

(ii) If (33) or (34) holds, then

F′(u_r) ≤ ṡ = (F(u_r) − F(u_l))/(u_r − u_l) ≤ F′(u_l).
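The jump conditions just derived are easy to test numerically. The following sketch (function names and tolerances ours) computes the Rankine–Hugoniot speed (31) and checks condition E, (33)–(34), by sampling the chord, here for the strictly convex Burgers flux F(z) = z²/2:

```python
def rankine_hugoniot_speed(F, ul, ur):
    """Shock speed from (31): s' = [F(ur) - F(ul)] / (ur - ul)."""
    return (F(ur) - F(ul)) / (ur - ul)

def satisfies_condition_E(F, ul, ur, samples=1000):
    """Condition E, (33)-(34): for ul < ur the graph of F on [ul, ur]
    must lie above the chord through the endpoints; for ul > ur, below it."""
    lo, hi = min(ul, ur), max(ul, ur)
    slope = rankine_hugoniot_speed(F, ul, ur)
    for k in range(samples + 1):
        u = lo + (hi - lo) * k / samples
        chord = slope * (u - ur) + F(ur)
        if ul < ur and F(u) < chord - 1e-12:
            return False
        if ul > ur and F(u) > chord + 1e-12:
            return False
    return True

F = lambda z: z * z / 2.0   # Burgers' flux, strictly convex

# A shock jumping down (ul > ur) is admissible (here a stationary shock, s' = 0)...
print(satisfies_condition_E(F, 1.0, -1.0))   # True
# ...while the reverse jump violates condition E (a rarefaction forms instead).
print(satisfies_condition_E(F, -1.0, 1.0))   # False
```

Consistent with Remark (i), a convex flux admits only downward jumps u_l ≥ u_r across shocks.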

As characteristics move with speed b = F′, we see that characteristics can only collide with the shock and cannot emanate from it. The same conclusion follows from (34). This geometric observation records the irreversibility inherent in the entropy condition (24).

4. Kinetic formulation

Our intention next is to introduce and study a sort of kinetic formulation of the conservation law u_t + div F(u) = 0. If we think of this PDE as a simplified version of Euler's equations, the corresponding kinetic PDE is then a rough analogue of Boltzmann's equation. The following is taken from Perthame–Tadmor [P-T] and Lions–Perthame–Tadmor [L-P-T1]. We will study the kinetic equation

(35)   w_t + b(y) · D_x w = ∂m/∂y in Rⁿ × R × (0,∞),

where w : Rⁿ × R × (0,∞) → R, w = w(x,y,t), is the unknown, b = F′ as in §2, and m is a nonnegative Radon measure on Rⁿ × R × (0,∞). Here m_y = ∂m/∂y.

We interpret w as solving (35) in the weak (i.e. distribution) sense. We can think of y as a variable parameterizing the velocity field v = b(y) and so, in analogy with Boltzmann's equation, interpret w(x,y,t) as the density of particles with velocity v = b(y) at the point x ∈ Rⁿ and time t ≥ 0. Then

(36)   u(x,t) := ∫_R w(x,y,t) dy

should represent the density at (x,t). The idea will be to show under certain circumstances that u solves the conservation law u_t + div F(u) = 0. To facilitate this interpretation, we introduce the pseudo-Maxwellian

(37)   χ_a(y) = { 1 if 0 < y ≤ a; −1 if a ≤ y < 0; 0 otherwise }

for each parameter a ∈ R. As a sort of crude analogue with the theory set forth in §A, we might guess that w, u are further related by the formula

(38)   w(x,y,t) = χ_{u(x,t)}(y).
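The pseudo-Maxwellian (37) has the moment properties that drive the calculations below: integrating χ_a in y returns a, and integrating Φ′(y)χ_a(y) returns Φ(a) − Φ(0) for any C¹ function Φ. A quick quadrature check (helper functions and grid ours), with Φ(z) = z²:

```python
def chi(a, y):
    """Pseudo-Maxwellian (37): chi_a(y) = 1 on (0, a], -1 on [a, 0), else 0."""
    if 0 < y <= a:
        return 1.0
    if a <= y < 0:
        return -1.0
    return 0.0

def integrate(f, lo, hi, n=200000):
    """Midpoint rule; adequate for the piecewise-constant integrands here."""
    h = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * h) for k in range(n)) * h

a = -1.3
# Zeroth moment: the y-integral of chi_a recovers the macroscopic value a, as in (36)...
u = integrate(lambda y: chi(a, y), -2.0, 2.0)
# ...and for Phi(z) = z^2 the moment against Phi'(y) = 2y gives Phi(a) - Phi(0) = a^2.
m2 = integrate(lambda y: 2.0 * y * chi(a, y), -2.0, 2.0)
print(round(u, 3), round(m2, 3))   # approximately -1.3 and 1.69
```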

This equality says that on the mesoscopic scale the velocity distribution is governed by the pseudo-Maxwellian, with macroscopic parameter a = u(x,t) at each point x ∈ Rⁿ, time t ≥ 0. It is remarkable that this rough interpretation can be made quite precise.

Theorem. (i) Let u be a bounded entropy solution of

(39)   u_t + div F(u) = 0 in Rⁿ × (0,∞)

and define

(40)   w(x,y,t) := χ_{u(x,t)}(y)   (x ∈ Rⁿ, y ∈ R, t ≥ 0).

Then

w ∈ C([0,∞); L¹(Rⁿ × R)) ∩ L^∞(Rⁿ_x × (0,∞); L¹(R_y))

solves the kinetic equation

(41)   w_t + b(y) · D_x w = m_y in Rⁿ × R × (0,∞)

for some nonnegative Radon measure m, supported in Rⁿ × [−R₀, R₀] × (0,∞), where R₀ = ‖u‖_{L^∞}.

(ii) Conversely, let w ∈ C([0,∞); L¹(Rⁿ × R)) ∩ L^∞(Rⁿ_x × (0,∞); L¹(R_y)) solve (41) for some measure m as above. Assume also that w has the form w = χ_{u(x,t)}. Then

(42)   u(x,t) = ∫_R w(x,y,t) dy

is an entropy solution of (39).

Proof. 1. First we prove (ii). Let Φ : R → R be convex, with Φ(0) = 0. Temporarily assume as well that Φ is C². Let ζ ∈ C_c^∞(R) satisfy

(43)   0 ≤ ζ ≤ 1, ζ ≡ 1 on [−R₀, R₀], ζ ≡ 0 on R − [−R₀ − 1, R₀ + 1].

Take v ∈ C¹_c(Rⁿ × (0,∞)), v ≥ 0. We employ

(44)   v(x,t)Φ′(y)ζ(y)

as a test function in the definition of w as a weak solution of the transport equation (41):

(45)   ∫₀^∞ ∫_R ∫_{Rⁿ} w[(vΦ′ζ)_t + b(y) · D_x(vΦ′ζ)] dxdydt = ∫₀^∞ ∫_R ∫_{Rⁿ} (vΦ′ζ)_y dm.

We must examine each term in this identity.

2. Now

∫₀^∞ ∫_R ∫_{Rⁿ} w(vΦ′ζ)_t dxdydt = ∫₀^∞ ∫_{Rⁿ} v_t ( ∫_R wΦ′ζ dy ) dxdt.

By hypothesis w = χ_{u(x,t)}, and therefore

(46)   ∫_R w(x,y,t)Φ′(y)ζ(y) dy = ∫_R χ_{u(x,t)}(y)Φ′(y)ζ(y) dy = ∫₀^{u(x,t)} Φ′(y)ζ(y) dy = Φ(u(x,t)) if u(x,t) ≥ 0,

since Φ(0) = 0 and ζ ≡ 1 on [0, u(x,t)]. A similar computation is valid if u(x,t) ≤ 0. Hence

(47)   ∫₀^∞ ∫_R ∫_{Rⁿ} w(vΦ′ζ)_t dxdydt = ∫₀^∞ ∫_{Rⁿ} v_t Φ(u) dxdt.

Similarly,

∫₀^∞ ∫_R ∫_{Rⁿ} wb(y) · D_x(vΦ′ζ) dxdydt = ∫₀^∞ ∫_{Rⁿ} D_x v · ( ∫_R b(y)wΦ′ζ dy ) dxdt.

Now if u(x,t) ≥ 0, then

∫_R wb(y)Φ′ζ dy = ∫₀^{u(x,t)} b(y)Φ′(y)ζ(y) dy = Ψ(u(x,t)),

for

(48)   Ψ(z) := ∫₀^z b(y)Φ′(y) dy.

The same calculation is valid if u(x,t) ≤ 0. Thus

(49)   ∫₀^∞ ∫_R ∫_{Rⁿ} wb(y) · D_x(vΦ′ζ) dxdydt = ∫₀^∞ ∫_{Rⁿ} Dv · Ψ(u) dxdt.

3. We investigate now the term on the right-hand side of (45):

(50)   ∫₀^∞ ∫_R ∫_{Rⁿ} (vΦ′ζ)_y dm = ∫₀^∞ ∫_R ∫_{Rⁿ} v(Φ″ζ + Φ′ζ′) dm ≥ ∫₀^∞ ∫_R ∫_{Rⁿ} vΦ′ζ′ dm,

since Φ″ ≥ 0, ζ ≥ 0, v ≥ 0 and m is nonnegative. Additionally, since ζ′ ≡ 0 on the support of m, we have

(51)   ∫₀^∞ ∫_R ∫_{Rⁿ} vΦ′ζ′ dm = 0.

4. Combine (45), (47), (49), (51), to conclude

(52)   ∫₀^∞ ∫_{Rⁿ} Φ(u)v_t + Ψ(u) · Dv dxdt ≥ 0

for all v as above. An easy approximation removes the requirement that Φ be C². Thus for each entropy/entropy flux pair we have

Φ(u)_t + div Ψ(u) ≤ 0 in Rⁿ × (0,∞)

in the weak sense, and consequently u is an entropy solution of (39).

5. Now we prove assertion (i) of the Theorem. Let u be a bounded entropy solution of u_t + div F(u) = 0 and define

(53)   w(x,y,t) := χ_{u(x,t)}(y)   (x ∈ Rⁿ, y ∈ R, t ≥ 0).

Define also the distribution T on Rⁿ × R × (0,∞) by

(54)   ⟨T, φ⟩ := −∫₀^∞ ∫_R ∫_{Rⁿ} w(φ_t + b(y) · D_x φ) dxdydt

for all φ ∈ C_c^∞(Rⁿ × R × (0,∞)). That is,

(55)   T = w_t + b(y) · D_x w in the distribution sense.

Observe T = 0 off Rⁿ × [−R₀, R₀] × (0,∞), where R₀ = ‖u‖_{L^∞}. Define now another distribution M by

(56)   ⟨M, φ⟩ := −⟨T, ∫_{−∞}^y φ(x,z,t) dz⟩

for φ as above. Then

(57)   T = ∂M/∂y in the distribution sense.

6. We now claim that

(58)   ⟨M, φ⟩ ≥ 0

for all φ ∈ C_c^∞(Rⁿ × R × (0,∞)) with φ ≥ 0. To verify this, first suppose

(59)   φ(x,y,t) = α(x,t)β(y),

with

α ≥ 0, α ∈ C_c^∞(Rⁿ × (0,∞));   β ≥ 0, β ∈ C_c^∞(R).

Take

(60)   Φ(y) := ∫₀^y ∫₀^z β(w) dw dz.

Then

(61)   ⟨M, φ⟩ = ⟨M, αΦ″⟩
     = −⟨T, αΦ′⟩ by (56), (60)
     = ∫₀^∞ ∫_R ∫_{Rⁿ} w[α_t + b(y) · D_x α]Φ′ dxdydt by (54)
     = ∫₀^∞ ∫_{Rⁿ} ∫_R χ_u Φ′[α_t + b(y) · D_x α] dydxdt
     = ∫₀^∞ ∫_{Rⁿ} Φ(u)α_t + Ψ(u) · D_x α dxdt,

where Ψ′ = bΦ′. The last equality results from calculations like those in steps 1, 2 above. Now Φ is convex and so, since u is an entropy solution, the last term in (61) is nonnegative. Thus

(62)   ⟨M, φ⟩ ≥ 0 if φ(x,y,t) = α(x,t)β(y), where α ≥ 0, β ≥ 0.

Next, take ψ(x,y,t) = α(x,t)β(y), where α, β are smooth, nonnegative and have compact support. Assume also

∫₀^∞ ∫_R ∫_{Rⁿ} ψ dxdydt = 1.

Let

ψ_ε(·) = ε^{−(n+2)} ψ(·/ε),

and then, given φ ∈ C_c^∞(Rⁿ × R × (0,∞)), φ ≥ 0, set

φ_ε(x,y,t) = (φ ∗ ψ_ε)(x,y,t) = ∫₀^∞ ∫_R ∫_{Rⁿ} φ(x̄, ȳ, t̄) ψ_ε(x − x̄, y − ȳ, t − t̄) dx̄dȳdt̄.

We have

⟨M, φ_ε⟩ = ∫₀^∞ ∫_R ∫_{Rⁿ} φ(x̄, ȳ, t̄) ⟨M, ψ_ε(· − x̄, · − ȳ, · − t̄)⟩ dx̄dȳdt̄ ≥ 0,

owing to (62). Send ε → 0 to establish the claim (58).

7. Finally we recall that (58) implies M is represented by a nonnegative Radon measure. That is, there exists m as stated in the Theorem such that

⟨M, φ⟩ = ∫₀^∞ ∫_R ∫_{Rⁿ} φ dm.

(See e.g. [E-G].) Thus

w_t + b(y) · D_x w = T = ∂M/∂y = m_y

in the distribution sense. □

Remark. For each entropy/entropy flux pair,

Φ(u)_t + div Ψ(u) ≤ 0

in the distribution sense, and so, as above, we can represent

Φ(u)_t + div Ψ(u) = −μ_Φ,

where μ_Φ is a nonnegative Radon measure on Rⁿ × (0,∞), depending on Φ. This measure is supported outside of any open region where u is C¹, and so records the change of the entropy Φ(u) along the shocks and other places where u is not smooth. The measure m on the right-hand side of the kinetic equation (41) for w = χ_u somehow records simultaneously the information encoded in μ_Φ for all the entropies Φ.

5. A hydrodynamical limit

To illustrate the usefulness of the kinetic formulation of the conservation law introduced in §4, we consider in this section the problem of understanding the limit as ε → 0 of solutions of the scaled transport equation

(63)   w^ε_t + b(y) · D_x w^ε = (1/ε)(χ_{u^ε} − w^ε) in Rⁿ × R × (0,∞),

where

(64)   u^ε(x,t) := ∫_R w^ε(x,y,t) dy.

This is a nonlocal PDE for the unknown w^ε.

Physical interpretation. We may think of (63) as a scaled, simplified version of Boltzmann's equation, the parameter ε being a crude approximation to the mean free path length between particle collisions. The right-hand side of (63) is a sort of analogue of the collision operator Q(·,·). If we similarly rescale Boltzmann's equation,

f^ε_t + v · D_x f^ε = (1/ε) Q(f^ε, f^ε),

and send ε → 0, we may expect the particle density f^ε(·, v, ·) to approach a Maxwellian distribution, controlled by macroscopic parameters which in turn should satisfy macroscopic PDE. See for instance Bardos–Golse–Levermore [B-G-L] for more on this. This is called a hydrodynamical limit. Our scaled problem (63), (64) is a vastly simplified variant, for which it is possible to understand rigorously the limit ε → 0.

First let us adjoin to (63), (64) the initial condition

(65)   w^ε = χ_g on Rⁿ × R × {t = 0},

where g : Rⁿ → R is a given, smooth function, with compact support.

Theorem. As ε → 0,

(66)   w^ε ⇀ w weakly* in L^∞(Rⁿ × R × (0,∞)),

where w solves

(67)   w_t + b(y) · D_x w = m_y in Rⁿ × R × (0,∞),  w = χ_u,  w = χ_g on Rⁿ × R × {t = 0},

for m a nonnegative Radon measure and u the unique entropy solution of

(68)   u_t + div F(u) = 0 in Rⁿ × (0,∞),  u = g on Rⁿ × {t = 0}.

We say the conservation law (68) is the hydrodynamical limit of the scaled kinetic equations (63) as ε → 0.

Proof (Outline). 1. It is shown in [P-T] that

(69)   supt(w^ε) ⊆ Rⁿ × [−R₀, R₀] × (0,∞), |w^ε| ≤ 1 a.e., w^ε ≥ 0 on {y ≥ 0}, w^ε ≤ 0 on {y ≤ 0},

where R₀ = ‖g‖_{L^∞}, and further

(70)   {u^ε}_{0<ε≤1} is strongly precompact in L¹_loc(Rⁿ × [0,∞)).

We will use these facts below.

2. We now claim that we can write

(71)   (1/ε)(χ_{u^ε} − w^ε) = ∂m^ε/∂y

for some nonnegative function m^ε supported in Rⁿ × [−R₀, R₀] × (0,∞). To confirm this, fix −R₀ ≤ a ≤ R₀ and assume h ∈ L^∞(R) satisfies

(72)   supt(h) ⊆ [−R₀, R₀];  −1 ≤ h ≤ 0 if y ≤ 0;  0 ≤ h ≤ 1 if y ≥ 0;  ∫_R h dy = a.

Then

(73)   χ_a(y) − h(y) = q′(y)   (a.e. y ∈ R)

for

q(y) := ∫_{−∞}^y χ_a(z) − h(z) dz.

Recall that

χ_a(z) = { 1 if 0 < z ≤ a; −1 if a ≤ z < 0; 0 otherwise }.

Thus if a ≥ 0, we deduce from (72), (73) that

q′ ≥ 0 for a.e. −∞ < y < a,
q′ ≤ 0 for a.e. a < y < ∞,

and the same inequalities are true if a ≤ 0. Furthermore q(−R₀) = 0 and

q(R₀) = ∫_{−R₀}^{R₀} χ_a(z) − h(z) dz = a − ∫_R h dz = 0.

Hence

(74)   q ≥ 0 on R.
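The construction of q in step 2 can be checked numerically. Below, h is a sample profile satisfying (72) with R₀ = 2 and a = 1 (profile, grid and tolerances ours); the cumulative integral of χ_a − h indeed stays nonnegative and returns to zero at R₀:

```python
def chi(a, y):
    """Pseudo-Maxwellian: 1 on (0, a], -1 on [a, 0), else 0."""
    if 0 < y <= a:
        return 1.0
    if a <= y < 0:
        return -1.0
    return 0.0

R0, a = 2.0, 1.0

def h(y):
    """Sample profile satisfying (72): 0 <= h <= 1 on [0, R0], integral = a = 1."""
    return 1.0 - y / 2.0 if 0.0 <= y <= 2.0 else 0.0

n = 4000
dy = 2.0 * R0 / n
q, qmin = 0.0, 0.0
for k in range(n):
    y = -R0 + (k + 0.5) * dy
    q += (chi(a, y) - h(y)) * dy     # running value of q(y), the primitive in (73)
    qmin = min(qmin, q)
print(round(q, 4), round(qmin, 4))  # both approximately 0: q(R0) = 0 and q >= 0, as in (74)
```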

3. Recall (72) and apply the results in step 2 to

h(y) = w^ε(x,y,t),  a = u^ε(x,t) = ∫_R w^ε(x,y,t) dy.

According to (69), this choice of h satisfies conditions (72). Then (73), (74) say

(1/ε)(χ_{u^ε} − w^ε) = ∂m^ε/∂y,

where supt(m^ε) ⊆ [−R₀, R₀] and m^ε ≥ 0 for each (x,t). This is assertion (71).

4. Next we assert:

(75)   sup_{0<ε≤1} ‖m^ε‖_{L¹(Rⁿ×R×(0,∞))} < ∞.

A formal calculation leading to (75) is this:

∫₀^∞ ∫_R ∫_{Rⁿ} m^ε dxdydt = −∫₀^∞ ∫_R ∫_{Rⁿ} (∂m^ε/∂y) y dxdydt
= −∫₀^∞ ∫_R ∫_{Rⁿ} (w^ε_t + b(y) · D_x w^ε) y dxdydt
= ∫_R ∫_{Rⁿ} w^ε(x,y,0) y dydx
= ∫_R ∫_{Rⁿ} χ_{g(x)}(y) y dydx by (65)
= ∫_{Rⁿ} g²/2 dx
< ∞.

We omit the detailed proof of (75).

5. Employing now (69), (70), (75), we extract a sequence ε_r → 0 so that

w^{ε_r} ⇀ w weakly* in L^∞,
u^{ε_r} → u strongly in L¹_loc,
m^{ε_r} ⇀ m weakly as measures.

Hence (63), (71) give

(76)   w_t + b(y) · D_x w = m_y in Rⁿ × R × (0,∞)

in the weak sense. Furthermore, for each φ ∈ C_c^∞(Rⁿ × R × (0,∞)),

∫₀^∞ ∫_R ∫_{Rⁿ} φ(χ_{u^ε} − w^ε) dxdydt = −ε ∫₀^∞ ∫_R ∫_{Rⁿ} φ_y m^ε dxdydt → 0.

Consequently

(77)   χ_{u^ε} ⇀ w weakly* in L^∞.

Now

χ_{u^{ε_r}}(y) = { 1 if 0 < y ≤ u^{ε_r}; −1 if u^{ε_r} ≤ y < 0; 0 otherwise },

and so

∫_R |χ_{u^{ε_r}} − χ_u| dy = |u^{ε_r} − u|.

Since u^{ε_r} → u strongly in L¹_loc, we see χ_{u^{ε_r}} → χ_u in L¹_loc. Thus w = χ_u. Hence (67) holds and so, according to the kinetic formulation in §4, u solves the conservation law (68). □

C. Systems of conservation laws

A system of conservation laws is written

(1)   u_t + div F(u) = 0 in Rⁿ × (0,∞),

where the unknown is u : Rⁿ × [0,∞) → R^m, u = (u¹, ..., u^m), and F : R^m → M^{m×n} is given.
Here

F = ((F_i^k)) (1 ≤ k ≤ m, 1 ≤ i ≤ n);

that is, F has rows (F₁^k, ..., F_n^k), k = 1, ..., m.

Notation. (i) We can rewrite (1) into the nondivergence form

(2)   u_t + B(u)ᵀ : Du = 0 in Rⁿ × (0,∞)

for

B = DF,  B : R^m → L(R^m, M^{m×n}).

We sometimes write F = F(z), B = B(z) for z ∈ R^m.

(ii) In terms of the components of u, (1) says

(3)   u^k_t + Σ_{i=1}^n (F_i^k(u))_{x_i} = 0   (k = 1, ..., m),

and (2) means

(4)   u^k_t + Σ_{i=1}^n Σ_{l=1}^m (∂F_i^k(u)/∂z_l) u^l_{x_i} = 0   (k = 1, ..., m).

We are interested in properly formulating the initial value problem

(5)   u_t + div F(u) = 0 in Rⁿ × (0,∞),  u = g on Rⁿ × {t = 0},

where g : Rⁿ → R^m, g = (g¹, ..., g^m), is given.

1. Entropy conditions

Definition. We say u ∈ L¹_loc(Rⁿ × (0,∞); R^m) is an integral solution of (5) provided

(6)   ∫₀^∞ ∫_{Rⁿ} u · v_t + F(u) : Dv dxdt + ∫_{Rⁿ} g · v(·, 0) dx = 0

for each v ∈ C¹_c(Rⁿ × [0,∞); R^m).

Notation. We write v = (v¹, ..., v^m), Dv = ((v^k_{x_i})) (1 ≤ k ≤ m, 1 ≤ i ≤ n), and

F(u) : Dv = Σ_{k=1}^m Σ_{i=1}^n F_i^k(u) v^k_{x_i}.

As for scalar conservation laws, this is an inadequate notion of solution, and so we introduce this additional

Definition. We call (Φ, Ψ) an entropy/entropy flux pair for the conservation law (1) provided

(i) Φ : R^m → R is convex, and

(ii) Ψ : R^m → Rⁿ, Ψ = (Ψ¹, ..., Ψⁿ), satisfies

(7)   DΨ = BDΦ.

Notation. The identity (7) means:

(8)   ∂Ψ^i/∂z_k = Σ_{l=1}^m (∂F_i^l/∂z_k)(∂Φ/∂z_l)   (1 ≤ i ≤ n, 1 ≤ k ≤ m).

Motivation. Suppose u is a C¹ solution of (1) in some region of Rⁿ × (0,∞). Then

(9)   Φ(u)_t + div Ψ(u) = 0

there. Indeed, we compute:

Φ(u)_t + div Ψ(u) = Σ_{k=1}^m Φ_{z_k}(u) u^k_t + Σ_{k=1}^m Σ_{i=1}^n Ψ^i_{z_k}(u) u^k_{x_i}
= −Σ_{k,l=1}^m Σ_{i=1}^n Φ_{z_k}(u)(∂F_i^k(u)/∂z_l) u^l_{x_i} + Σ_{k=1}^m Σ_{i=1}^n Ψ^i_{z_k}(u) u^k_{x_i}   (according to (4))
= Σ_{k=1}^m Σ_{i=1}^n [ Ψ^i_{z_k}(u) − Σ_{l=1}^m (∂F_i^l(u)/∂z_k) Φ_{z_l}(u) ] u^k_{x_i}
= 0, owing to (8).

Unlike the situation for scalar conservation laws (i.e. m = 1), there need not exist any entropy/entropy flux pairs for a given system of conservation laws. For physically derived PDE, on the other hand, we can hope to discern at least some such pairs.

2. Compressible Euler equations in one space dimension

We return to A.1 and consider now the compressible, isentropic Euler equations in one space dimension. According to (6) in A.1, the relevant PDE are

(10)   ρ_t + (ρv)_x = 0,  (ρv)_t + (ρv² + p)_x = 0 in R × (0,∞),

where ρ is the density, v the velocity, and

(11)   p = p(ρ)

is the pressure. Observe that (10) is of the form u_t + (F(u))_x = 0 for m = 2,

(12)   u = (ρ, ρv),  F = (z₂, z₂²/z₁ + p(z₁)).

Remark. We have

B = DF = [ 0, 1 ; −z₂²/z₁² + p′(z₁), 2z₂/z₁ ].

The eigenvalues of B are

λ = z₂/z₁ ± (p′(z₁))^{1/2},

assuming

(13)   p′ > 0.

Reverting to physical variables, we see λ = v ± (p′(ρ))^{1/2}. It follows that the speed of sound for isentropic flow is p′(ρ)^{1/2}.

a. Computing entropy/entropy flux pairs

We attempt now to discover entropy/entropy flux pairs (Φ, Ψ), where to simplify subsequent calculations we look for Φ, Ψ as functions of (ρ, v) (and not (u¹, u²) = (ρ, ρv)). Thus we seek Φ = Φ(ρ, v), Ψ = Ψ(ρ, v) such that

(14)   the mapping (ρ, ρv) → Φ is convex

and

(15)   Φ_t + Ψ_x = 0 in any region where (ρ, v) are C¹-solutions of (10).

So let us assume (ρ, v) solve (10), which we recast into nondivergence form:

(16)   ρ_t + ρ_x v + ρv_x = 0,  v_t + vv_x = −(1/ρ)p_x = −(p′/ρ)ρ_x.
Observe that the second line here is just the second line of formula (1) from A.1. So if = (, v ), = (, v ), we can compute: t + x = t + v vt + x + v vx = (x v vx ) + v vvx p + x + v vx = x v
p x

+vx [v v v ]. Consequently, t + x 0 for all smooth solutions (, v ) of (15) if and only if (17) = v +
p

v = + v v .

We proceed further by noting v = v : v + Hence + v v + and consequently (18) = p () vv 2 ( > 0, v R). p v = ( + v v ) .


v

p vv = + + v v ,

In summary, if solves (18) and we compute from (17), then (, ) satises t +x = 0, whenever (, v ) are smooth solutions of Eulers equations (10). Since p > 0, (18) is a linear nonhomogeneous wave equation. Denition. is called a weak entropy function if solves (18), with the initial conditions (19) = 0, = g on R { > 0},

for some given g : R R, g = g (v ). To go further, let us take from A.1 the explicit equation of state (20) p() = , where = ( 1)2 , >1 4

the constant so selected to simplify the algebra. 139

Lemma (i) The solution of (17), (18) for g = {0} = Dirac mass at the origin is (21) (, v ) = ( 1 v 2 ) +, = 3 . 2( 1)

(ii) The general solution of (17), (18) is (22) (, v ) =


R

g (y )(, y v )dy

( > 0, v R).

(iii) Furthermore, dened by (21) is convex in (, v ) if and only if g is convex. (iv) The entropy ux associated with is (23) for =
1 . 2

(, v ) =
R

g (y )(y + (1 )v )(, y v )dy

See [L-P-T2] for proof. We will momentarily see that we can regard as a sort of pseudo-Maxwellian, parameterized by the macroscopic parameters , v . Example. Take g (v ) = v 2 . Then (24) (, v ) = = y 2 ( 1 (y v )2 ) + dy 1 k v 2 + . 2 1
R

k The term 1 v 2 is the density of the kinetic energy, and is the density of the internal 2 1 energy. Hence is the energy density. If (, v ) is an entropy solution of (10), then

t + x 0, and so (25) sup


t0 R

k 1 (x, t)v 2 (x, t) + (x, t)dx < , 2 1

provided the initial conditions satisfy this bound. b. Kinetic formulation

140

Theorem Let (, v ) L ((0, ); L1 (R, R2 )) have nite energy and suppose 0 a.e. Then (, v ) is an entropy solution of Eulers equations t + (v )x = 0 in R (0, ) (26) 2 (v )t + (v + p)x = 0 if and only if there exists a nonpositive measure m on R R (0, ) such that (27) satises (28) wt + [(y + (1 )v )w]x = myy in R R (0, ). w = (, y v ) ( = (x, t), v = v (x, t), y R)

We call (27), (28) a kinetic formulation of (26). Proof. 1. As in B.4 dene the distributions (29) and (30) 2M = T. y 2 T = wt + [(y + (1 )v )w]x

2. Take , to be a weak entropy/entropy ux pair as above. That is, (, v ) = (, v ) = Then (31) Suppose now (x, y, t) = (x, t) (y ) where
0, Cc 0, Cc .

g (y )(, y v )dy g (y )(y + (1 )v )(, y v )dy. R


R

t + x =

g (y )(wt + [(y + (1 )v )w]x )dy.

Take g so that (32) g = . 141

Then (30) implies


0 R

t + x dxdt = = = =

g (wt + [(y + (1 )v )w]x )dxdydt

T, g M, by (29), (31) M, .

3. Now if (, v ) is an entropy solution, then

(33)
0 R

t + x dxdt 0

since 0, and thus M, 0. This holds for all = as above and so, as in B.4, (34)
, 0. M, 0 for all Cc

Thus M is represented by a nonpositive measure m. Conversely if (33) holds, then (32) is 1 valid for all 0, Cc . 4. Lastly note the estimate
0 R R

dm = = = = =

1 2 1 2 1 2 1 2 1 2

(y 2 )yy dm y 2 [wt + [(y + (1 )v )w]x dxdydt R R + x dxdt R t ((, 0), v (, 0))dx R 1 k (, 0)v (, 0)2 + (, 0) dx 1 R 2
R R

0 0 0

< . See LionsPerthameTadmor [L-P-T2] and LionsPerthameSouganidis [L-P-S] for remarkable applications.

142

CHAPTER 6: HamiltonJacobi and related equations A. Viscosity solutions A PDE of the form (1) ut + H (Du) = 0 in Rn (0, )

is called a HamiltonJacobi equation. The unknown is u : Rn [0, ) R and the Hamiltonian H : Rn R is a given continuous function. Here Du = (Dx u) = (ux1 , . . . , uxn ). In this short chapter we introduce the notion of viscosity solutions of (1), which are dened in terms of various inequalities involving smooth test functions. The relevant theory will not seem to have anything much to do with our ongoing themes concerning entropy and PDE, but connections will be established later, in Chapter VIII. Following CrandallLions [C-L] and [C-E-L] let us make the following Denition. A bounded uniformly continuous function u is called a viscosity solution of (1) provided for each v C (Rn (0, )) if u v has a local maximum at a point (x0 , t0 ) Rn (0, ), (2) then v t (x0 , t0 ) + H (Dv (x0 , t0 )) 0 and (3) if u v has a local minimum at a point (x0 , t0 ) Rn (0, ), then vt (x0 , t0 ) + H (Dv (x0 , t0 )) 0.

Motivation. If u happens to be a C 1 solution of (1) in some region of Rn (0, ), then in fact vt (x0 , t0 ) + H (Dv (x0 , t0 )) = 0 at any point in that region where u v has a local maximum or minimum. This follows since ut = vt , Du = Dv at such a point. 143

The interest in (2), (3) is consequently the possibility of the inequalities holding at points where u is not C 1 . This is all obviously some kind of vague analogue of the theory from Chapter V. As in that chapter let us motivate (2), (3) by the vanishing viscosity method. So x > 0 and consider the regularized PDE (4)
n u t + H (Du ) = u in R (0, ).

Let us assume that as 0, (5) u u locally uniformly

and further suppose for some v C that u v has a strict local maximum at some point (x0 , t0 ) Rn (0, ). Then u v has a local maximum at a nearby point (x , t ), with (6) (x , t ) (x0 , t0 ) as 0.

As v and our solution u of the regularized problem (4) are smooth, we have (7)
2 2 u t = vt , Du = Dv, D u D v at (x , t ),

the third expression recording the ordering of symmetric matrices. Then


vt (x , t ) + H (Dv (x , t )) = u t (x , t ) + H (Du (x , t )) by (7) = u (x , t ) by (4) v (x , t ) by (7).

Let 0 and recall (6):

vt (x0 , t0 ) + H (Dv (x0 , t0 )) 0.

It is easy to modify this proof if u v has a local maximum which is not strict at (x0 , t0 ). A similar proof shows that the reverse inequality holds should u v have a local minimum at a point (x0 , t0 ). Hence if the u (or a subsequence) converge locally uniformly to a limit u, then u is a viscosity solution of (1). This construction by the vanishing viscosity method accounts for the name.3 We will not develop here the theory of viscosity solutions, other than to state the fundamental theorem of CrandallLions: Theorem Assume that u, u are viscosity solutions of (8)
3

ut + H (Du) = 0 in Rn (0, ) u = g on Rn {t = 0}
In fact, Crandall and Lions originally considered the name entropy solutions.

144

and (9) Then u(, t) u (, t) for each 0 s t. In particular a viscosity solution of the initial value problem (8) is unique. See [C-E-L] for proof (cf. also [E1, 10.2]). B. HopfLax formula For use later we record here a representation formula for the viscosity solution of (1) in the special case that (2) and (3) g : Rn R is bounded, Lipschitz. H : Rn R is convex ut + H (Du) = 0 in Rn (0, ) u = g on Rn {t = 0},
L (Rn )

u t + H (Du ) = 0 in Rn (0, ) u = g on Rn {t = 0}. u(, s) u (, s)

L (Rn )

We have then the HopfLax formula: u(x, t) = infn tL


y R

xy t

+ g (y )

(x Rn , t 0)

for the unique viscosity solution of (1). Here L is the Legendre transform of H : L(q ) = sup {p q H (p)}.
pRn

See [E1, 10.3.4] for a proof. We will invoke this formula in VIII.C. C. A diusion limit In Chapter VIII we will employ viscosity solution methods to study several asymptotic problems, involvingas we will seeentropy considerations. As these developments must wait, it is appropriate to include here a rather dierent application. We introduce for each 145

> 0 a coupled linear rst-order transport PDE, with terms of orders O 1 , O 12 and study under appropriate hypotheses the limit of w as 0. This will be a diusion limit, a sort of twin of the hydrodynamical limit from V.B.5. 1. Formulation Our PDE system is (1) The unknown is w : Rn (0, ) Rm , w = (w1, , . . . , wm, ). Notation. In (1) we are given the matrix C = ((ckl ))mm and also B = diag(b1 , . . . , bm ),
n k k k where the vectors {bk }m k=1 in R are given, b = (b1 , . . . , bn ). In terms of the components of w , our system (1) reads:

1 1 wt + BDw = 2 C w in Rn (0, ).

(2) for k = 1, . . . , m.

k, wt

1 1 + bk Dwk, = 2

ckl wl,
l=1

Remark. We can think of (2) as a variant of the PDE (63) in V.B.5, where the velocity parameterization variable y is now discrete. Thus (1) is a scalar PDE for w = w (x, y, t), y {1, . . . , m}. The left hand side of (1) is for each k a linear, constant coecient transport operator, and the right hand side of (1) represents linear coupling. As 0, the velocity 1 bk on the left becomes bigger and the coupling 12 C on the right gets bigger even faster. What happens in the limit? To answer we introduce some hypotheses, the meaning of which will be revealed only later. Let us rst assume:
(3)   c_{kl} > 0 if k ≠ l,  Σ_{l=1}^m c_{kl} = 0 (k = 1, ..., m).

It follows from Perron–Frobenius theory for matrices that there exists a unique vector π = (π₁, ..., π_m) satisfying

(4)   π_k > 0 (k = 1, ..., m),  Σ_{k=1}^m π_k = 1,  Σ_{k=1}^m c_{kl}π_k = 0 (l = 1, ..., m).

(See for instance Gantmacher [G] for Perron–Frobenius theory, and also look at VIII.A below.) We envision π as a probability vector on Ω = {1, ..., m} and then make the assumption of average velocity balance:

(5)   Σ_{k=1}^m π_k b^k = 0.

2. Construction of diffusion coefficients

Our goal is to prove that for each k ∈ {1, ..., m}, w^{k,ε} → u as ε → 0, u solving a diffusion equation of the form

u_t − Σ_{i,j=1}^n a^{ij} u_{x_i x_j} = 0 in Rⁿ × (0,∞).

We must construct the matrix A = ((a^{ij})). First, write 1 = (1, ..., 1) ∈ R^m. Then (2), (3) say

(6)   C1 = 0,  πC = 0.

Perron–Frobenius theory tells us that the nullspace of C is one-dimensional and so is spanned by 1. Likewise π spans the nullspace of Cᵀ. In view of (5), for each j ∈ {1, ..., n} the vector b_j = (b_j¹, ..., b_j^m) ∈ R^m is perpendicular to the nullspace of Cᵀ and thus lies in the range of C. Consequently there exists a unique vector d^j ∈ R^m solving

(7)   Cd^j = −b_j   (j = 1, ..., n),

normalized by our requiring

d^j · 1 = 0   (j = 1, ..., n).

We write d^j = (d_j¹, ..., d_j^m), and then define the diffusion coefficients

(8)   a^{ij} = Σ_{k=1}^m π_k b_i^k d_j^k   (1 ≤ i, j ≤ n).

Lemma. The matrix A = ((a^{ij})) is nonnegative definite; that is,

(9)   Σ_{i,j=1}^n a^{ij} ξ_i ξ_j ≥ 0 for each ξ ∈ Rⁿ.

Proof. Take ξ = (ξ₁, ..., ξ_n) ∈ Rⁿ and write

η_k := Σ_{j=1}^n d_j^k ξ_j   (k = 1, ..., m).

Observe further that (7) says

(10)   b_i^k = −Σ_{l=1}^m c_{kl} d_i^l   (1 ≤ k ≤ m, 1 ≤ i ≤ n).

Consequently (8) implies

(11)   Σ_{i,j=1}^n a^{ij} ξ_i ξ_j = −Σ_{i,j=1}^n Σ_{k,l=1}^m π_k c_{kl} d_i^l ξ_i d_j^k ξ_j = −Σ_{k,l=1}^m π_k c_{kl} η_l η_k = −Σ_{k,l=1}^m s_{kl} η_k η_l,

for

s_{kl} := (π_k c_{kl} + π_l c_{lk})/2   (1 ≤ k, l ≤ m).

The matrix S = ((s_{kl}))_{m×m} is symmetric, with s_{kl} > 0 (k ≠ l) and

S1 = 0,

owing to (3), (4). Since the entries of 1 = (1, ..., 1) are positive, Perron–Frobenius theory asserts that every other eigenvalue of S has real part less than or equal to the eigenvalue (namely 0) associated with 1. But as S is symmetric, each eigenvalue is real. So λ ≤ 0 for each eigenvalue λ of S. Consequently (9) follows from (11). □
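For a concrete feel for the construction, here is a tiny example with m = 2, n = 1 (matrix, velocities and exact arithmetic ours), using the sign convention Cd^j = −b_j adopted in (7):

```python
from fractions import Fraction as Fr

# Coupling matrix C: off-diagonal entries positive, rows summing to zero, as in (3).
C = [[Fr(-1), Fr(1)],
     [Fr(2), Fr(-2)]]

# Invariant probability vector pi from (4): pi C = 0, pi > 0, entries summing to 1.
pi = [Fr(2, 3), Fr(1, 3)]
assert all(sum(pi[k] * C[k][l] for k in range(2)) == 0 for l in range(2))

# Velocities satisfying the average balance (5): pi . b = 0.
b = [Fr(1), Fr(-2)]
assert sum(pi[k] * b[k] for k in range(2)) == 0

# Solve C d = -b with the normalization d . (1,1) = 0, as in (7).
# For this 2x2 rank-one system the solution can be written down by hand:
d = [Fr(1, 2), Fr(-1, 2)]
assert all(sum(C[k][l] * d[l] for l in range(2)) == -b[k] for k in range(2))
assert d[0] + d[1] == 0

# Diffusion coefficient (8); the Lemma guarantees a >= 0.
a = sum(pi[k] * b[k] * d[k] for k in range(2))
print(a)   # 2/3
```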

3. Passage to limits

Assume now g : Rⁿ → R is smooth, with compact support. We introduce the initial value problem for (1):

(12)   w^{k,ε}_t + (1/ε) b^k · Dw^{k,ε} = (1/ε²) Σ_{l=1}^m c_{kl} w^{l,ε} in Rⁿ × (0,∞),  w^{k,ε} = g on Rⁿ × {t = 0},

for k = 1, ..., m. Linear PDE theory implies there exists a unique smooth solution w^ε.

Theorem. As ε → 0, we have for k = 1, ..., m:

(13)   w^{k,ε} → u locally uniformly in Rⁿ × [0,∞),

where u is the unique solution of

(14)   u_t − Σ_{i,j=1}^n a^{ij} u_{x_i x_j} = 0 in Rⁿ × (0,∞),  u = g on Rⁿ × {t = 0}.

Remark. We are asserting that each component w^{k,ε} of w^ε converges to the same limit function u : Rⁿ × [0,∞) → R and that u solves the diffusion equation (14).

Proof. 1. See [E2] for a proof that {w^ε}_{0<ε≤1} is bounded, uniformly continuous on compact subsets of Rⁿ × [0,∞). Thus we can find a subsequence ε_r → 0 such that

w^{ε_r} → w locally uniformly, w = (w¹, ..., w^m).

2. We first claim that

(15)   w¹ = w² = ··· = w^m

at each point x ∈ Rⁿ, t ≥ 0, or, in other words,

w = u1

for some scalar function u = u(x,t). To verify this, take any v ∈ C_c^∞(Rⁿ × (0,∞); R^m) and observe from (12) that

∫₀^∞ ∫_{Rⁿ} v · (Cw^ε) dxdt = O(ε).

It follows that

∫₀^∞ ∫_{Rⁿ} v · (Cw) dxdt = 0

for all v as above, and so Cw ≡ 0 in Rⁿ × (0,∞). Since the nullspace of C is the span of 1, (15) follows.

3. Thus

(16)   w^{k,ε_r} → u locally uniformly (k = 1, ..., m).

We next claim that

(17)   u is a viscosity solution of (14).

This means that if v C 2 (Rn (0, )) and (18) Then


n

u v has a local maximum (resp. minimum) at a point (x0 , t0 ) Rn (0, ),

(19)

vt (x0 , t0 )
i,j =1

aij vxi xj (x0 , t0 ) 0 (resp. 0).

4. To prove this, let us take v as above and suppose u v has a strict local maximum at some point (x0 , t0 ). Dene then the perturbed test functions v := (v 1, , . . . , v m, ), where
n

(20)

k,

:= v
j =1

dk j v xj

(k = 1, . . . , m),

the constants dk j (1 j n, 1 k m) satisfying (7). Clearly (21) v k,r v locally uniformly (k = 1, . . . , m). 150

Since u v has a strict local maximum at (x0 , t0 ), it follows from (16), (21) that (22) for = r , and (23)
k (xk , t ) (x0 , t0 ) as = r 0, k = 1, . . . , m.

wk, v k, has a local maximum near (x0 , t0 ) k at a point (xk , t ) (k = 1, . . . , m),

Since w and v are smooth functions, it follows from (22) and the PDE (12) that: (24)
k, vt

1 1 + bk Dv k, = 2

ckl w,l
l=1

k at the point (xk , t ), = r . Recalling (20), we conclude from (24) that

vt (x0 , t0 ) (25) = +

n k k i,j =1 bi dj vxi xj (x0 , t0 ) n k k k 1 i=1 bi vxi (x , t ) m 1 ,l k k l=1 ckl w (x , t ) + 2

o(1)

as = r 0, k = 1, . . . , m. l 5. Now since w,l v ,l has its local maximum near (x0 , t0 ) at (xl , t ), we have (26)
l ,l k v ,l )(xk (w,l v ,l )(xl , t ) (w , t ),

= r . Recalling that ckl > 0 for k = l, we can employ the inequalities (26) in (25): vt (x0 , t0 ) + + o(1). But (10) says out. Thus
m l l=1 ckl di n k k i,j =1 bi dj vxi xj (x0 , t0 ) n k k k 1 i=1 bi vxi (x , t ) m 1 ,l l v ,l )(xl , t ) l=1 ckl (w 2

k + v (xk , t )

n i=1

k k dl i vxi (x , t )

= bk i , and so the O

terms in the foregoing expression cancel

vt (x0 , t0 )

n k k i,j =1 bi dj vxi xj (x0 , t0 ) m 1 ,l l v ,l )(xl , t ) l=1 ckl [(w 2

k + v (xk , t )].

Multiply by k > 0 and sum k = 1, . . . , m, recalling (2), (3) to deduce:


n

vt (x0 , t0 )

n i,j =1 k=1

k k bk i dj aij

vxi xj (x0 , t0 )

o(1). 151

Let = r 0 to derive the inequality (19). A simple approximation removes the requirement at u v have a strict maximum, and a similar argument derives the opposite inequality should u v have a minimum at (x0 , t0 ). Commentary. The linear system (12) for each xed > 0 represents a system of linear transport PDE with simple linear coupling. This PDE is reversible in time and yet the diusion equation (14) is not. The interesting question is this: where did the irreversibility come from? Section VIII.A will provide some further insights. See also Pinsky [P] for other techniques, mostly based upon interpreting (12) as a random evolution.

152

CHAPTER 7: Entropy and uncertainty In this and the subsequent chapter we consider various probabilistic aspects of entropy and some implications for PDE theory. The present chapter is a quick introduction to entropy in statistical mechanics. A. Maxwells demon Let us begin with a simple physical situation, consideration of which will soon suggest that there is some kind of connection between entropy, information, and uncertainty.

gas

vacuum

initial state

gas

final state

Take one mole of a simple ideal gas, and suppose it is initially at equilibrium, being held by a partition in half of a thermally insulated cylinder. The initial volume is Vi , and the initial temperature is Ti . We remove the partition, the gas lls the entire cylinder, and, after coming to equilibrium, it has nal volume Vf , nal temperature Tf . What is the change of entropy? According to I.F, we have (1) Si = CV log Ti + R log Vi + S0 Sf = CV log Tf + R log Vf + S0 ,

so being an arbitrary constant. As there is no heat transfer nor work done to or from the exterior, the internal energy is unchanged. Since, furthermore, the energy depends only on the temperature (see I.F), we deduce Ti = Tf . 153

As Vf = 2Vi , we deduce that the change of entropy is Sf Si = R log 2 > 0, in accordance with the Second Law. The mole of gas contains NA molecules, and so (2) change of entropy/particle = k log 2,

since k = R/NA . As the last sentence suggests, it is convenient now to shift attention to the microscopic level, at which the gas can be thought of as a highly complex, random motion of NA molecules. We next imagine that we reinstall the partition, but now with (a) a small gate and (b) a nanotechnology-built robot, which acts as a gatekeeper.

Our robot is programmed to open the door whenever a gas molecule approaches the door from the right, but to close the door if a gas molecule approaches from the left. After our robot has been at work for awhile, we will see more particles in the left region than in the right. This is close to our initial situation.

154

The eect of our tiny robot has thus been to decrease the entropy, with a very small expenditure of energy on its part. We have here an apparent contradiction of the Second Law. Maxwell in 1867 proposed this thought experiment, with an intelligent creature (called Maxwells demon by Kelvin) in place of our nanoscale robot. Generations of physicists have reconsidered this problem, most notably L. Szilard [SZ], who argued that the Second Law is not violated provided the overall entropy of the system increases by k log 2 each time the robot measures the direction of an incoming molecule in order to decide whether or not to open the gate. As (2) presumably implies the entropy decreases by k log 2 once a particle is trapped on the left, the Second Law is saved, providedto repeatwe appropriately assign an entropy to the robots gaining information about molecule velocities. We will not attempt to pursue such reasoning any further, being content to learn from this thought experiment that there seems to be some sort of connection between entropy and our information about random systems. Remark. The book [L-R], edited by Le and Rex, is a wonderful source for more on Maxwells demon, entropy concepts in computer science, etc. See also the website www.math.washington.edu/~hillman/entropy.html. B. Maximum entropy This section introduces a random model for thermal systems and a concept of entropy as a measure of uncertainty. The following is based upon Huang [HU], Jaynes [J], Bamberg Sternberg [B-S]. 1. A probabilistic model A probabilistic model for thermal systems in equilibrium 155

We are given:
(i) a triple (Ω, F, μ), consisting of a set Ω, a σ-algebra F of subsets of Ω, and a nonnegative measure μ defined on F. (We call (Ω, F, μ) the system, and μ the reference measure. A typical point ω ∈ Ω is a microstate.)
(ii) the collection of all F-measurable functions ρ: Ω → [0, ∞) such that

(1)  ∫_Ω ρ dμ = 1.

(We call such a ρ the density of the microstate measure ρ dμ.)
And (iii) an F-measurable function

(2)  X: Ω → R^{m+1},  X = (X^0, …, X^m).

(We call each X^k an observable.)

Notation.

(3)  E(X, ρ) = ⟨X⟩ = ∫_Ω X ρ dμ = expected value of X, given the microstate distribution ρ.

Physical interpretation. We think of Ω as consisting of a huge number of microstates ω, each of which is equivalent to a precise, detailed microscopic description of some physical system, e.g. an exact description of the behavior of all the particles in a mole of gas. The main point is that ω is not observable physically. We instead model the state of the system by the probability measure ρ dμ, where ρ satisfies (1). Thus if E ∈ F, then

∫_E ρ dμ

is the probability that the true (but unobserved) microstate is in E, given the density ρ. Our goal is to determine, or more precisely to estimate, ρ, given certain macroscopic physical measurements. These we model using the observables X^0, …, X^m.

[Diagram: the observable map X = (X^0, …, X^m) sends Ω into R^{m+1}.]

Given ρ as above, we assume that we can physically measure the values

(4)  E(X, ρ) = ⟨X⟩ = X̄.

Think of the point X̄ = (X̄^0, …, X̄^m) as lying in some region of R^{m+1}, which we may interpret as the macroscopic state space. A point X̄ thus corresponds to m+1 physical measurements, presumably of extensive parameters as in Chapter I. To accord with the notation from I.A, we will often write

(5)  Ē = X̄^0.

The fundamental problem is this. Given the macroscopic measurements X̄ = (X̄^0, X̄^1, …, X̄^m), there are generally many, many microstate distributions ρ satisfying (4). How do we determine the physically correct distribution?

2. Uncertainty

To answer the question just posed, let us first consider the special case that

(6)  Ω = {ω_1, …, ω_N} is a finite set, F = 2^Ω, and μ is counting measure.


Then each distribution ρ as above corresponds to our assigning ρ(ω_i) = p_i (i = 1, …, N), where

(7)  0 ≤ p_i ≤ 1 (i = 1, …, N),  ∑_{i=1}^N p_i = 1.

Thus p_i is the probability of ω_i. We propose now to find a function S = S(p_1, …, p_N) which somehow measures the uncertainty or disorder inherent in the probability distribution {p_1, …, p_N}. Let us imagine S is defined for all N = 1, 2, … and all N-tuples {p_1, …, p_N} as above. We will ordain these

Axioms for S:
A. Continuity. For each N, the mapping (p_1, …, p_N) ↦ S(p_1, …, p_N) is continuous.
B. Monotonicity. The mapping N ↦ S(1/N, …, 1/N) is monotonically increasing.
C. Composition. For each N and each probability distribution (p_1, …, p_N), set q_1 = p_1 + ⋯ + p_{k_1}, …, q_j = p_{k_{j−1}+1} + ⋯ + p_{k_j}, …, where 1 = k_0 ≤ k_1 ≤ ⋯ ≤ k_M = N. Then

(8)  S(p_1, …, p_N) = S(q_1, …, q_M) + ∑_{j=1}^M q_j S(p_{k_{j−1}+1}/q_j, …, p_{k_j}/q_j).

Probabilistic interpretation. The monotonicity rule B says that if all the points in Ω = {ω_1, …, ω_N} have equal probability 1/N, then there is more uncertainty the bigger N is. The composition rule C applies if we think of subdividing Ω = ∪_{j=1}^M Ω_j, where Ω_j = {ω_{k_{j−1}+1}, …, ω_{k_j}}. Then q_j is the probability of the event Ω_j. Further, (8) says that the uncertainty inherent in the distribution {p_1, …, p_N} on Ω should equal the uncertainty of the induced probability distribution {q_1, …, q_M} on {Ω_1, …, Ω_M} plus the average of the uncertainties within each Ω_j. This last expression is the sum on j of q_j, the probability of Ω_j, times S computed for the induced probability distribution {p_{k_{j−1}+1}/q_j, …, p_{k_j}/q_j} on Ω_j. If some q_j = 0, we omit this term.
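The composition axiom is exactly the grouping identity satisfied by −∑ p_i log p_i. As a quick numerical sanity check (not part of the notes; the distribution and the grouping are arbitrary, and we take K = 1):

```python
import math

def S(p):
    # uncertainty with K = 1; terms with p_i = 0 are omitted, as in the text
    return -sum(x * math.log(x) for x in p if x > 0)

p = [0.1, 0.2, 0.3, 0.25, 0.15]        # a distribution on N = 5 points
blocks = [p[:2], p[2:]]                # subdivision into Omega_1, Omega_2
q = [sum(blk) for blk in blocks]       # induced distribution (q_1, q_2)

# composition rule (8): S(p) = S(q) + sum_j q_j S(block_j / q_j)
rhs = S(q) + sum(qj * S([x / qj for x in blk]) for qj, blk in zip(q, blocks))
assert abs(S(p) - rhs) < 1e-12
```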

Lemma. The axioms imply S has the form

(9)  S(p_1, …, p_N) = −K ∑_{i=1}^N p_i log p_i

for some positive constant K.

Proof. 1. We follow Jaynes [J]. Suppose S satisfies Axioms A-C, and define

(10)  A(N) := S(1/N, …, 1/N)  (N terms).

Take p_i = 1/N (i = 1, …, N), and

(11)  q_j = n_j/N  (j = 1, …, M),

where {n_j}_{j=1}^M are integers satisfying

(12)  ∑_{j=1}^M n_j = N.

Then (8) implies

S(1/N, …, 1/N) = S(q_1, …, q_M) + ∑_{j=1}^M q_j S(1/n_j, …, 1/n_j).
In terms of (10), this equality reads

(13)  A(N) = S(q_1, …, q_M) + ∑_{j=1}^M q_j A(n_j).

Now select N of the form N = ML, and set n_j = L for j = 1, …, M. Then (13) implies

A(ML) = S(1/M, …, 1/M) + ∑_{j=1}^M (1/M) A(L) = A(M) + A(L).

Thus in particular

(14)  A(N^a) = a A(N)  for positive integers a, N.

Axiom B implies then A(N) > 0 (N = 2, …).
2. We claim that in fact

(15)  A(N) = K log N  (N = 1, …)  for some positive constant K.

To prove this, let M, N be any integers greater than one. Given any large integer a_k, choose the integer b_k so that

(16)  M^{b_k} ≤ N^{a_k} ≤ M^{b_k + 1}.

Then b_k log M ≤ a_k log N ≤ (b_k + 1) log M, and so

(17)  b_k/a_k ≤ (log N)/(log M) ≤ (b_k/a_k)(1 + 1/b_k).

Now since N ↦ A(N) is increasing according to Axiom B, (16) and (14) imply

b_k A(M) ≤ a_k A(N) ≤ (b_k + 1) A(M).

Then, since A(M) > 0,

(18)  b_k/a_k ≤ A(N)/A(M) ≤ (b_k + 1)/a_k.

Sending a_k → ∞ and thus b_k → ∞, we conclude from (17), (18) that

A(N)/A(M) = (log N)/(log M)  (M, N ≥ 2).

This identity implies A(N) = K log N for some constant K, and necessarily K > 0 in light of Axiom B. This proves (15).
3. Now drop the assumption that n_j = L (j = 1, …, M). We then deduce from (11)-(15) that

S(n_1/N, …, n_M/N) = A(N) − ∑_{j=1}^M (n_j/N) A(n_j)
  = K log N − K ∑_{j=1}^M (n_j/N) log n_j
  = −K ∑_{j=1}^M (n_j/N) log(n_j/N),

provided n_1, …, n_M are nonnegative integers summing to N. In view of Axiom A, formula (9) follows.

We henceforth agree to take K = k, Boltzmann's constant, this choice being suggested by the physical calculations in V.A.2. Thus

S(p_1, …, p_N) = −k ∑_{i=1}^N p_i log p_i

provided 0 ≤ p_i ≤ 1 (i = 1, …, N), ∑_{i=1}^N p_i = 1.

Return now to the general probabilistic model in 1 for a thermal system in equilibrium. Motivated both by the above formula and our earlier study of Boltzmann's equation in V.A.2, we hereafter define

(19)  S(ρ) = −k ∫_Ω ρ log ρ dμ

to be the entropy of the microstate density ρ, with respect to the reference measure μ. We interpret S(ρ) as measuring the uncertainty or disorder inherent in the probability distribution ρ dμ.

C. Maximizing uncertainty

We can now provide an answer to the question posed at the end of 1, namely how to select the physically correct microstate distribution ρ satisfying the macroscopic constraints

(20)  E(X^k, ρ) = X̄^k  (k = 0, …, m)?

Here is the idea: Since all we really know about ρ are these identities, we should select the distribution which maximizes the uncertainty (= entropy) S(ρ), subject to the constraints (20).

Remark. This is both a principle of physics (that we should seek maximum entropy configurations (I.C.6)) and a principle of statistics (that we must employ unbiased estimators). See Jaynes [J] for a discussion of the latter.

We analyze the foregoing entropy maximization principle by introducing the admissible class:

(21)  A = { ρ: Ω → [0, ∞) | ρ is F-measurable, ∫_Ω ρ dμ = 1, E(X, ρ) = X̄ },

where X̄ = (X̄^0, …, X̄^m) is given. Recall that we write X: Ω → R^{m+1}, X = (X^0, …, X^m).

Theorem. (i) Assume there exist β ∈ R^{m+1} and Z > 0 such that

(22)  ρ* = e^{β·X}/Z

belongs to A. Then

(23)  S(ρ*) = max_{ρ∈A} S(ρ).

(ii) Any other maximizer of S(·) over A differs from ρ* only on a set of μ-measure zero.

Remark. Observe

(24)  Z = ∫_Ω e^{β·X} dμ.
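The Theorem is easy to observe numerically in a small discrete setting. The sketch below is not from the notes: the five-point state space, single observable X(ω) = ω, and multiplier β are all invented for illustration (with k = 1). It verifies that the exponential density e^{βX}/Z has at least the entropy of nearby densities satisfying the same normalization and mean constraints.

```python
import math, random

# Hypothetical finite system: Omega = {0,...,4}, counting measure, one observable
X = [0.0, 1.0, 2.0, 3.0, 4.0]
beta = -0.3                                           # arbitrary multiplier
Z = sum(math.exp(beta * x) for x in X)
rho_star = [math.exp(beta * x) / Z for x in X]        # (22)
Xbar = sum(p * x for p, x in zip(rho_star, X))        # constraint value realized by rho*

def entropy(p):                                       # S(rho), with k = 1
    return -sum(q * math.log(q) for q in p if q > 0)

# perturbation directions preserving both constraints: sum v = 0 and sum v*X = 0
v1 = [1.0, -2.0, 1.0, 0.0, 0.0]
v2 = [0.0, 1.0, -2.0, 1.0, 0.0]

rng = random.Random(0)
for _ in range(200):
    t1, t2 = rng.uniform(-0.02, 0.02), rng.uniform(-0.02, 0.02)
    rho = [r + t1 * a + t2 * b for r, a, b in zip(rho_star, v1, v2)]
    if min(rho) <= 0:
        continue                                      # stay inside the admissible class
    assert abs(sum(rho) - 1.0) < 1e-12                # normalization (1)
    assert abs(sum(p * x for p, x in zip(rho, X)) - Xbar) < 1e-9   # constraint (20)
    assert entropy(rho) <= entropy(rho_star) + 1e-12  # (23): rho* maximizes entropy
```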

Proof. 1. First note that φ(x) := log x + 1/x satisfies

φ'(x) = 1/x − 1/x²  { > 0 if x ≥ 1;  < 0 if 0 < x ≤ 1 }.

Hence φ(x) ≥ φ(1) = 1 for all x > 0, and so

(25)  ψ(x) := x log x − x + 1 ≥ 0  (x > 0),

with equality only for x = 1.
2. Define ρ* by (22) and take ρ ∈ A. Then

(26)  ρ log ρ ≥ ρ log ρ* + ρ − ρ*  on Ω,

since this inequality is equivalent to ψ(ρ/ρ*) = (ρ/ρ*) log(ρ/ρ*) − ρ/ρ* + 1 ≥ 0. In view of (25) then, (26) holds.
3. Integrate (26) over Ω:

(27)  ∫_Ω ρ log ρ dμ ≥ ∫_Ω ρ log ρ* dμ,

since ∫_Ω ρ dμ = ∫_Ω ρ* dμ = 1. But in light of (22):

log ρ* = −log Z + β·X,

and so

∫_Ω ρ log ρ* dμ = −log Z + β·∫_Ω X ρ dμ = −log Z + β·X̄,

since ρ ∈ A. Since ρ* ∈ A as well,

∫_Ω ρ* log ρ* dμ = −log Z + β·X̄ = ∫_Ω ρ log ρ* dμ.

Consequently (27) implies

(28)  S(ρ) ≤ S(ρ*).

We have a strict inequality here unless we have equality in (26) μ-a.e., and this in turn holds only if ρ = ρ* μ-a.e.

D. Statistical mechanics

1. Microcanonical distribution

We consider in somewhat more detail first of all the case that there are no observables, in which case we deduce from Theorem 1 in B that the entropy S(ρ) is maximized by the constant microstate distribution

(1)  ρ* ≡ 1/Z,

where

(2)  Z = μ(Ω).

Thus each microstate is equally probable. This is the microcanonical distribution (a.k.a. microcanonical ensemble).

Example 1. Let us take Ω to be a finite set, Ω = {ω_1, …, ω_N}, and take the reference measure μ to be counting measure. Then (1), (2) imply Z = N, ρ*(ω_i) = 1/N for i = 1, …, N. The entropy is

S(ρ*) = −k ∑_{i=1}^N ρ*(ω_i) log(ρ*(ω_i)) = k log N.

This formula is usually written

(3)  S = k log W,

where W = N = |Ω|.

Example 2. Assume we are given a smooth function H: R^n → R, called the Hamiltonian. Fix a number E ∈ R and set Σ_E = {x ∈ R^n | H(x) = E}. We assume that Σ_E is a smooth, (n−1)-dimensional surface in R^n. Consider now the energy band

Ω_Δ = {x ∈ R^n | E ≤ H(x) ≤ E + Δ}

for small Δ > 0 and note

|Ω_Δ| = ∫_E^{E+Δ} ( ∫_{{H=t}} dS/|DH| ) dt

according to the Coarea Formula. (See [E-G].) It follows that

lim_{Δ→0} |Ω_Δ|/Δ = ∫_{Σ_E} dS/|DH|,

assuming Σ_E is a smooth surface and |DH| > 0 on Σ_E. Here dS is (n−1)-dimensional surface measure. Now take

dμ = (1/|DH|) dS,  Φ(E) = ∫_{Σ_E} dS/|DH|.

The entropy is then

(4)  S(E) = k log Φ(E).

Physical interpretation. The Hamiltonian gives us the energy of each microstate. We are here assuming our system is thermally isolated, so that all attainable microstates lie on the energy surface {H = E}, where E is the macroscopic energy. We can, as in Chapter I, define the temperature T by

1/T = ∂S/∂E.

Remark. Notice that our choice dμ = dS/|DH| depends not only on the geometry of the level set {H = E}, but also on |DH|. Another plausible possibility would therefore be simply to take dμ = dS.

The issue here is to understand the physical differences between taking the hard constraint H = E versus the limit as Δ → 0 of the softer constraints E ≤ H ≤ E + Δ. The expository paper [vK-L] by van Kampen and Lodder discusses this and some related issues.

2. Canonical distribution

Next we apply the results of B to the case that we have one observable X^0 = H, where H: Ω → R is the Hamiltonian. As before we write Ē for the macroscopic energy:

(5)  Ē = ⟨H⟩.

We now invoke Theorem 1 from B to deduce that the entropy S(ρ) is maximized by the microstate distribution

(6)  ρ* = e^{−βH}/Z  for some β ∈ R,

where

(7)  Z = ∫_Ω e^{−βH} dμ.

This is the canonical distribution (a.k.a. Gibbs distribution, canonical ensemble). We assume the integral in (7) converges.

Physical interpretation. We should in this context imagine our system as not being thermally isolated, but rather as being in thermal contact with a heat reservoir and so being held at a constant temperature T. In this setting energy can be transferred in and out of our system. Thus the energy level H of the various microstates is not constant (as in Example 2 in 1), but rather its average value ⟨H⟩ = Ē is determined, as we will see, by T.

Example. Let us take Ω = R³, μ to be Lebesgue measure, and

H = (m/2)|v|²  (v ∈ R³).

H is the kinetic energy of a particle with mass m > 0, velocity v. The canonical distribution is then

ρ* = (1/Z) e^{−βm|v|²/2}.

But this is essentially the Boltzmann distribution (43) from V.A.2, with the macroscopic velocity v = 0, the macroscopic particle density n = 1, and θ = 1/(kβ) being the temperature.

3. Thermodynamics

We next show how to recover aspects of classical equilibrium thermodynamics (as in Chapter I) from the canonical distribution (6), (7). The point is that all the relevant information is encoded within

(8)  Z = ∫_Ω e^{−βH} dμ.

We regard (8) as a formula for Z as a function of β and call Z the partition function. Remember H: Ω → R.

Definitions of thermodynamic quantities in terms of β and Z. We define
(i) the temperature T by the formula

(9)  β = 1/(kT),

(ii) the energy

(10)  E = −(∂/∂β) log Z,

(iii) the entropy

(11)  S = k(βE + log Z),

and (iv) the free energy

(12)  F = −(1/β) log Z.

Note carefully: we regard (10)-(12) as defining E, S, F as functions of β. We must check that these definitions are consistent with everything before:

Theorem. (i) We have

(13)  E = ⟨H⟩,

the expected value of H being computed with respect to the canonical distribution ρ* = (1/Z) e^{−βH}.
(ii) Furthermore

(14)  S = S(ρ*) = −k ∫_Ω ρ* log ρ* dμ

and

(15)  ∂S/∂E = 1/T.

(iii) Finally,

(16)  F = E − ST

and

(17)  ∂F/∂T = −S.

Proof. 1. Using the definition (10) we calculate

E = −(1/Z) ∂Z/∂β = (1/Z) ∫_Ω H e^{−βH} dμ = ∫_Ω H ρ* dμ = ⟨H⟩.

This is assertion (i) of the Theorem.
2. We compute as well

S(ρ*) = −k ∫_Ω ρ* log ρ* dμ = −(k/Z) ∫_Ω e^{−βH}(−βH − log Z) dμ = kβE + k log Z.

From (11) we conclude that the identity (14) is valid. Furthermore

∂S/∂E = k( E ∂β/∂E + β + (∂(log Z)/∂β) ∂β/∂E )
  = k( E ∂β/∂E + β − E ∂β/∂E )  by (10)
  = kβ = 1/T  by (9).

3. Since (12) says −βF = log Z, formulas (9), (11) imply F = E − TS. This is (16). We next rewrite (12) to read:

(18)  ∫_Ω e^{−βH} dμ = e^{−βF},

and so

∫_Ω e^{β(F−H)} dμ = 1.

Differentiate with respect to β, recalling that F is a function of β:

∫_Ω ( F + β ∂F/∂β − H ) e^{β(F−H)} dμ = 0.

Thus (13), (18) imply

(19)  F + β ∂F/∂β − E = 0.

Now since β = 1/(kT),

(20)  ∂β/∂T = −1/(kT²).

Therefore

∂F/∂T = (∂F/∂β)(∂β/∂T) = −(1/(kT²)) ∂F/∂β = −(β/T) ∂F/∂β.

Then (19) says F = E + T ∂F/∂T. Owing to formula (16), we deduce

S = −∂F/∂T.

Remark. We can define as well the heat capacity

(21)  C_V = ∂E/∂T.

Thus (10), (20) say

(22)  C_V = kβ² (∂²/∂β²)(log Z),

consistently with classical thermodynamics. Now since

E = ⟨H⟩ = (1/Z) ∫_Ω H e^{−βH} dμ = ∫_Ω H e^{β(F−H)} dμ,

we have

∫_Ω (E − H) e^{β(F−H)} dμ = 0.

Differentiate with respect to β:

∂E/∂β + ∫_Ω (E − H)( F + β ∂F/∂β − H ) e^{β(F−H)} dμ = 0.

But F + β ∂F/∂β = E by (19), and so

⟨(E − H)²⟩ = −∂E/∂β = (∂²/∂β²)(log Z).

Rewriting, we obtain the formula

(23)  ⟨(E − H)²⟩ = kT² C_V,

the average on the left computed with respect to the canonical density. This is a probabilistic interpretation of the heat capacity, recording the variance of the microstate energy H from its macroscopic mean value E = ⟨H⟩.

Remark. Finally we record the observations that the mapping

(24)  β ↦ log Z = log ∫_Ω e^{−βH} dμ

is uniformly convex,

(25)  S = min_β k(βE + log Z),

and

(26)  log Z = max_E ( −βE + S/k ),

where in (25), (26) we regard S = S(E), Z = Z(β). Indeed, we just computed (∂²/∂β²)(log Z) = ⟨(E − H)²⟩ > 0 (unless H ≡ E). Consequently the minimum on the right-hand side of (25) is attained at the unique β for which (∂/∂β)(log Z) = −E, in accordance with (10). Thus (25) follows from (11). Formula (26) is dual to (25). Observe that (25), (26) imply: E ↦ S is concave, β ↦ log Z is convex.
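These identities are easy to test numerically. The sketch below is not from the notes: it uses an invented two-level system (with k = 1 and an arbitrary β) to check that S agrees with the entropy of the canonical density, that F = E − ST, and that the variance formula (23) holds with C_V computed by a finite difference.

```python
import math

# Hypothetical two-level system: Omega = {0,1}, H(0) = 0, H(1) = 1, k = 1
H = [0.0, 1.0]
beta = 1.7                                          # arbitrary inverse temperature
T = 1.0 / beta

Z = sum(math.exp(-beta * h) for h in H)             # (8) partition function
rho = [math.exp(-beta * h) / Z for h in H]          # (6) canonical density

E = sum(p * h for p, h in zip(rho, H))              # <H>
S = beta * E + math.log(Z)                          # (11), with k = 1
F = -math.log(Z) / beta                             # (12)

# (14): S equals -sum rho log rho for the canonical density
S_direct = -sum(p * math.log(p) for p in rho)
assert abs(S - S_direct) < 1e-12

# (16): F = E - S T
assert abs(F - (E - S * T)) < 1e-12

# (23): Var(H) = T^2 C_V, with C_V = dE/dT computed by a central difference
var_H = sum(p * (h - E) ** 2 for p, h in zip(rho, H))
def energy(b):
    z = sum(math.exp(-b * h) for h in H)
    return sum(h * math.exp(-b * h) / z for h in H)
dT = 1e-6
C_V = (energy(1.0 / (T + dT)) - energy(1.0 / (T - dT))) / (2 * dT)
assert abs(var_H - T ** 2 * C_V) < 1e-4
```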

CHAPTER 8: Probability and PDE

This chapter introduces some further probabilistic viewpoints that employ entropy notions. These in turn give rise in certain settings to various PDE.

A. Continuous time Markov chains

We begin however with a simple system of linear ODE. This example both illustrates how entropy controls convergence to equilibrium for Markov chains and also partially answers the question left open in VI.C (about the appearance of irreversibility in the diffusion limit).

Continuous time Markov chain. We are given
(i) a finite set Σ (called the state space) and
(ii) a function p: [0, ∞) × Σ × Σ → [0, 1] such that

(1)  t ↦ p(t, ξ, η) is C¹  (ξ, η ∈ Σ),

(2)  p(0, ξ, η) = δ_ξ(η) = { 1 if ξ = η;  0 otherwise },

(3)  ∑_{η∈Σ} p(t, ξ, η) = 1,

and

(4)  p(t + s, ξ, η) = ∑_{γ∈Σ} p(t, ξ, γ) p(s, γ, η).

We call p a Markov transition function and (4) the Chapman-Kolmogorov formula.

1. Generators and semigroups

Definitions. (i) Define c: Σ × Σ → R

by

(5)  c(ξ, η) = lim_{t→0} ( p(t, ξ, η) − p(0, ξ, η) )/t.

(ii) We further write

(6)  d(ξ, η) = { c(ξ, η) if ξ ≠ η;  0 if ξ = η }.

Remark. Owing to (2), (3),

c(ξ, η) ≥ 0 if ξ ≠ η,  ∑_{η∈Σ} c(ξ, η) = 0.

Thus d(ξ, η) ≥ 0.

Definitions. (i) If f: Σ → R, we define Lf: Σ → R by

(7)  [Lf](ξ) = ∑_{η∈Σ} c(ξ, η) f(η), or equivalently [Lf](ξ) = ∑_{η∈Σ} d(ξ, η)(f(η) − f(ξ))  (ξ ∈ Σ).

We call L the generator of the Markov process.
(ii) We define also the semigroup {S(t)}_{t≥0} generated by L by

(8)  [S(t)f](ξ) = ∑_{η∈Σ} p(t, ξ, η) f(η).

Probabilistic interpretation. Think of a randomly jumping particle whose position at time t ≥ 0 is X(t) ∈ Σ. Thus {X(t)}_{t≥0} is a stochastic process and we may interpret p(t, ξ, η) as the probability that X(t) = η, given that X(0) = ξ. According to (2), (5),

p(t, ξ, η) = δ_ξ(η) + t c(ξ, η) + o(t)  as t → 0;

and so if ξ ≠ η, c(ξ, η) is the rate of jumps per unit time from ξ to η. Furthermore

[S(t)f](ξ) = E( f(X(t)) | X(0) = ξ ),

the expected value of f(X(t)), given that X(0) = ξ. Owing to (4), {X(t)}_{t≥0} is a continuous time Markov process, with generator L.

Properties of S(t). For s, t ≥ 0, we have

(9)  S(0) = I,  S(t + s) = S(t)S(s) = S(s)S(t),  (d/dt)(S(t)f) = L(S(t)f) = S(t)(Lf),  S(t)1 = 1,

where 1 denotes the function identically equal to 1 on Σ.

Definition. Let ν be a probability measure on Σ. We define then the probability measure S(t)*ν by requiring

(10)  ∫_Σ S(t)f dν = ∫_Σ f dS(t)*ν  (t ≥ 0)

for each f: Σ → R. We call {S(t)*}_{t≥0} the dual semigroup.

Now fix a reference probability measure μ on Σ, with μ(ξ) > 0 for all ξ ∈ Σ.

Notation. Given a probability measure ν, write

(11)  ρ(·, t) = dS(t)*ν/dμ,

so that

(12)  [S(t)*ν](ξ) = ρ(ξ, t) μ(ξ)  (ξ ∈ Σ, t ≥ 0).

Lemma. We have

(13)  ∂ρ/∂t = L*ρ  on Σ × [0, ∞),

where L* is the adjoint of L with respect to μ. We call (13) the forward equation.

Proof. Let us first note

∫_Σ f ρ dμ = ∫_Σ f dS(t)*ν = ∫_Σ S(t)f dν.

Thus

∫_Σ f ρ_t dμ = (d/dt) ∫_Σ S(t)f dν = ∫_Σ S(t)Lf dν = ∫_Σ Lf dS(t)*ν = ∫_Σ (Lf) ρ dμ = ∫_Σ f L*ρ dμ.

This identity is valid for all f: Σ → R and so (13) follows.

2. Entropy production

Definition. We say the probability measure σ is invariant provided

(14)  S(t)*σ = σ  for all t ≥ 0.

Remark. It is easy to see that σ is invariant if and only if

(15)  ∫_Σ Lf dσ = 0

or, equivalently,

(16)  ∫_Σ S(t)f dσ = ∫_Σ f dσ  for all f: Σ → R.

We wish to identify circumstances under which S(t)*ν converges to an invariant measure as t → ∞. The key will be certain estimates about the rate of entropy production.

Definition. Let σ be a probability measure on Σ, with σ(ξ) > 0 for each ξ ∈ Σ. If ν is another probability measure, we define the entropy of ν with respect to σ to be

(17)  H(ν, σ) = ∫_Σ ρ log ρ dσ,

where ρ = dν/dσ.

Remark. Since Σ is finite, (17) says

(18)  H(ν, σ) = ∑_{ξ∈Σ} log( ν(ξ)/σ(ξ) ) ν(ξ).

Clearly ν ↦ H(ν, σ) is continuous.

Lemma. Let σ be an invariant probability measure, with σ(ξ) > 0 for all ξ ∈ Σ. Take ν to be any probability measure. Then

(19)  (d/dt) H(S(t)*ν, σ) ≤ 0.

Proof. 1. Write

φ(x) := x log x − x + 1  (x ≥ 0).

As noted in VII,

(20)  φ is convex, φ ≥ 0 for x ≥ 0.

2. Take any t > 0 and set ρ(·, t) = dS(t)*ν/dσ. Assume ρ(·, t) > 0. Then

(d/dt) H(S(t)*ν, σ) = (d/dt) ∫_Σ ρ log ρ dσ = ∫_Σ ρ_t (log ρ + 1) dσ = ∫_Σ (L*ρ)(log ρ + 1) dσ.

Now

∫_Σ L*ρ dσ = ∫_Σ ρ (L1) dσ = 0,

owing to (7). Thus

(d/dt) H(S(t)*ν, σ) = ∫_Σ (L*ρ) log ρ dσ = ∫_Σ ρ L(log ρ) dσ
  = ∑_{ξ,η} d(ξ, η) log( ρ(η,t)/ρ(ξ,t) ) ρ(ξ,t) σ(ξ)
  = −∑_{ξ,η} d(ξ, η) φ( ρ(ξ,t)/ρ(η,t) ) ρ(η,t) σ(ξ) + ∑_{ξ,η} d(ξ, η)( ρ(η,t) − ρ(ξ,t) ) σ(ξ).

The last expression is

∑_{ξ,η} d(ξ, η)( ρ(η,t) − ρ(ξ,t) ) σ(ξ) = ∫_Σ Lρ(·, t) dσ = 0,

since σ is invariant. Since φ ≥ 0, estimate (19) results. If ρ is not everywhere positive, we omit the sites where ρ = 0 in the foregoing calculation.
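The entropy decay in this lemma can be watched numerically. The sketch below is not from the notes: the 3-state generator matrix is invented for illustration. It computes the transition matrices p(t) = e^{tC}, the invariant measure σ, and checks that t ↦ H(S(t)*ν, σ) is nonincreasing and tends to zero.

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical 3-state chain: rows of the generator C hold the jump rates
# c(xi, eta), with zero row sums, as required by (2)-(3) and (5)
C = np.array([[-3.0,  2.0,  1.0],
              [ 1.0, -1.5,  0.5],
              [ 2.0,  1.0, -3.0]])

# invariant probability measure sigma: solves sigma C = 0 with sum(sigma) = 1
A = np.vstack([C.T, np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
sigma = np.linalg.lstsq(A, b, rcond=None)[0]

def H(nu):
    # relative entropy (18) of nu with respect to sigma; 0 log 0 = 0
    mask = nu > 0
    return float(np.sum(nu[mask] * np.log(nu[mask] / sigma[mask])))

nu0 = np.array([1.0, 0.0, 0.0])                 # start concentrated at one site
ts = np.linspace(0.0, 5.0, 40)
vals = [H(nu0 @ expm(t * C)) for t in ts]       # dual semigroup: nu -> nu p(t)

assert all(x >= y - 1e-10 for x, y in zip(vals, vals[1:]))  # (19): H nonincreasing
assert vals[-1] < 1e-6                                      # convergence to sigma
```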

Remark. We call

(21)  G(t) = ∑_{ξ,η} d(ξ, η) φ( ρ(ξ,t)/ρ(η,t) ) ρ(η,t) σ(ξ) ≥ 0

the rate of entropy production at time t. Then

(d/dt) H(S(t)*ν, σ) = −G(t) ≤ 0.

3. Convergence to equilibrium

Definition. The Markov chain is called irreducible if

(22)  p(t, ξ, η) > 0  (t > 0, ξ, η ∈ Σ).

Remark. It is straightforward to show that the Markov chain is irreducible if and only if for each pair ξ, η ∈ Σ, ξ ≠ η, there exists a path ξ = ξ_0, ξ_1, …, ξ_m = η with d(ξ_i, ξ_{i+1}) > 0 (i = 0, …, m−1).

Theorem. Assume the Markov chain is irreducible. Then
(i) there exists a unique invariant probability measure σ > 0, and
(ii) for each probability measure ν,

(23)  lim_{t→∞} S(t)*ν = σ.

Proof. 1. First we build σ. Fix any site ξ_0 ∈ Σ and write

σ(t, η) = (1/t) ∫_0^t p(s, ξ_0, η) ds  (t > 0).

Define then

(24)  σ(η) := lim_{t_k→∞} σ(t_k, η),

the sequence t_k → ∞ selected so that the limit (24) exists for each η ∈ Σ. Clearly

(25)  0 ≤ σ(η) ≤ 1,  ∑_{η∈Σ} σ(η) = 1.

2. In addition, if h > 0 the Chapman-Kolmogorov formula (4) implies

∑_η [ (1/t) ∫_0^t p(s, ξ_0, η) ds ] p(h, η, γ)
  = (1/t) ∫_0^t p(s + h, ξ_0, γ) ds
  = (1/t) ∫_h^{t+h} p(s, ξ_0, γ) ds
  = (1/t) ∫_0^t p(s, ξ_0, γ) ds + (1/t) ∫_t^{t+h} p(s, ξ_0, γ) ds − (1/t) ∫_0^h p(s, ξ_0, γ) ds.

Let t = t_k → ∞ and recall (24):

(26)  ∑_η σ(η) p(h, η, γ) = σ(γ)  (γ ∈ Σ).

Then (25) and the irreducibility condition (22) imply

(27)  σ(γ) > 0  for each γ ∈ Σ.

Next differentiate (26) with respect to h and set h = 0:

∑_η σ(η) c(η, γ) = 0  (γ ∈ Σ).

This identity implies

∫_Σ Lf dσ = ∑_{η,γ} c(η, γ) f(γ) σ(η) = 0

for all f: Σ → R, and so σ is an invariant measure.
3. Next fix any ξ ∈ Σ and define

ν_ξ(η) = { 1 if η = ξ;  0 if η ≠ ξ }.

Then

(28)  p(t, ξ, ·) = S(t)*ν_ξ.

According to the Lemma,

(29)  t ↦ H(p(t, ξ, ·), σ) is nonincreasing in t.

Now select any sequence t_l → ∞ such that the limit

(30)  ν̄(η) := lim_{t_l→∞} p(t_l, ξ, η)

exists for each η. Then (29) implies

inf_{t≥0} H(p(t, ξ, ·), σ) = lim_{t→∞} H(p(t, ξ, ·), σ) = lim_{t_l→∞} H(p(t_l, ξ, ·), σ) = H(ν̄, σ).

Also

S(t)*ν̄ = lim_{t_l→∞} S(t)*S(t_l)*ν_ξ = lim_{t_l→∞} p(t + t_l, ξ, ·).

Consequently

H(S(t)*ν̄, σ) = lim_{t_l→∞} H(p(t + t_l, ξ, ·), σ) = H(ν̄, σ).

Thus

(31)  t ↦ H(S(t)*ν̄, σ) is constant.

Set ρ(·, t) = dS(t)*ν̄/dσ > 0. Then (21), (31) imply the rate of entropy production

G(t) = ∑_{ξ,η} d(ξ, η) φ( ρ(ξ,t)/ρ(η,t) ) ρ(η,t) σ(ξ) = 0

for t ≥ 0. Since σ, ρ > 0, we have φ( ρ(ξ,t)/ρ(η,t) ) = 0 and so ρ(ξ, t) = ρ(η, t)

for each pair ξ, η with d(ξ, η) > 0. The Remark after (22) thus implies ρ(·, t) is constant, and so S(t)*ν̄ = σ for all t ≥ 0. So ν̄ = σ and therefore S(t)*ν_ξ → σ
for each . Assertion (23) follows. B. Large deviations 1. Thermodynamic limits

We turn next to the theory of large deviations, which will provide links between certain limit problems in probability, statistical mechanics, and various linear and nonlinear PDE. Physical motivation. To motivate the central issues, let us rst recall from Chapter VII these formulas for the canonical distribution: 1 F = log Z, Z = eH d.

Now it is most often the case in statistical mechanics that we are interested in a sequence of free energies and partition functions: (1) 1 FN = log ZN , ZN = eHN dN
N

(N = 1, 2, . . . ).

Typically (1) represents the free energy and partition function for a system of N interacting particles (described by microstates in the system (N , FN , N ), with Hamiltonian HN : N R). We often wish to compute the limit as N of the free energy per particle: (2) f ( ) = lim 1 1 FN = lim log ZN . N N N N

Understanding in various models the behavior of the mapping f ( ) is a central problem in statistical mechanics; see for instance Thompson [T, 3.6]. We call (2) a thermodynamic limit. To help us understand the mathematical and physical issues here, let us rewrite ZN =
N

eHN dN =

eN dPN ,
1 H N N

where the state space is R1 and PN is the distribution of (3) for R1 . Setting = PN (, ] = N N |

on ; that is,

HN ( ) N

1 , N

178

we recast (2) as (4) 1 f ( ) = lim log 0 e dP .

Let us next suppose that for small > 0: (5) dP eI/ dQ

in some unspecied sense, where I : R, Q is a reference measure on . Inserting (5) into (4), we may expect that lim log

I ()

dQ

= sup( I ( )).

Consequentlysupposing the foregoing computations are somehow legitimatewe deduce (6) f ( ) = 1 inf ( + I ( )) ( R).

What is the physical meaning of this formula? First note that the energy per particle is
1 E N N

= = = =

1 N 1 N

HN

N HN eHN d ZN N N eN dP ZN e dP Z I () e dQ . Z

Hence as N , we may expect (7) where (8) e + I (e) = inf ( + I ( )).

EN e, N

Let us therefore interpret e as the energy in the thermodynamic limit. Next recall from VIII.C.3 the formula log Z = sup E +
E

S k

179

and remember

1 F = log Z. S k

Thus (9) On the other hand, (6) says (10) f = sup( I ( )).

F = sup E +
E

In view of (8)(10) we might then conjecture that S denoting the entropy and deduce then (11)

S k

= I ,

the Legendre transform. As S is presumably concave, we S = kI .

Should I also be convex, then (11) reduces to (12) S = kI.

In this case our supposition (5), which we now rewrite as (13) dP e k dQ,
S

says that entropy S controls the asymptotics of {P }0<1 in the thermodynamic limit and that the most likely states for small > 0 are those which maximize the entropy. 2. Basic theory We now follow DonskerVaradhan (see e.g. Varadhan [V], DemboZeitouni [D-Z], etc.) and provide a general probabilistic framework within which to understand and generalize the foregoing heuristics. a. Rate functions Notation. Hereafter denotes a separable, complete, metric space, 180

and {P }0<1 is a family of Borel probability measures on . Denition. We say that {P }0<1 satises the large deviation principle with rate function I provided (i) I : [0, ] is lower semicontinuous, I +, (ii) for each l R, (14) the set { | 0 I ( ) l} is compact,

(iii) for each closed set C , (15) lim sup log P (C ) inf I
0 C

and (iv) for each open set U , (16) lim inf log P (U ) inf I.
0 U

Remarks. (i) If E is a Borel subset of for which inf I = inf I = inf I, 0


E E E

then
0

lim log P (E ) = inf I.


E

This gives a precise meaning to the heuristic (5) from the previous section (without the unnecessary introduction of the reference measure). (ii) The rate function I is called the entropy function in Ellis [EL]. This book contains clear explanations of the connections with statistical mechanics. (iii) It is sometimes convenient to consider instead of 0 an index n . Thus we say {Pn } n=1 satises the large deviation principle with rate function I if 1 log Pn (C ) inf C I (C closed) lim supn n and (17) 1 lim inf n n log Pn (U ) inf U I (U open). b. Asymptotic evaluations of integrals 181

We now make clearer the connections between large deviation theory and the heuristics in 1. Theorem 1. Let {P }0<1 satisfy the large deviation principle with rate function I . Let g:R be bounded, continuous. Then (18)
0

lim log

e dP

= sup(g I ).

Proof. 1. Fix > 0. We write =

Ci ,
i=1

where each Ci is closed and the oscillation of g on Ci is less than or equal to (i = 1, . . . , N ). (Assuming without loss that g 0, we can for instance take Ci = { | (i 1) g ( ) i }.) Then

e dP

N i=1 N i=1

Ci Ci

e dP e
gi +

dP ,

where gi = inf g
Ci

(i = 1, . . . , N ).
gi +

Thus log

e dP

log N max1iN e = log N + max1iN

P (Ci ) + log P (Ci ) ,

gi +

and so (15) implies lim sup0 log

e dP

max1iN [(gi + ) inf Ci I ] max1iN supCi (g I ) + = sup (g I ) + .

Consequently (19) lim sup log


0

e dP 182

sup(g I ).

2. Again x > 0. There exists with (20) g ( ) I ( ) sup(g I ) . 2

Since g is continuous, there exists an open neighborhood U of such that (21) Then lim inf 0 log

g ( ) g ( )

for U. 2 e dP
g ( ) 2 g

e dP

lim inf 0 log lim inf 0 log =

e U

dP

by (21)

g ( ) 2 + lim inf 0 P (U ) inf U I by (16) g ( ) 2 g ( ) I ( ) 2 sup (g I ) by (20).

Hence lim inf log


0

e dP

sup(g I ).

This bound and (19) nish the proof of (18). We will require in the next section the converse statement.

Theorem 2. Assume the limit (18) holds for each bounded, Lipschitz function g : R, where I : [0, ] is lower semicontinuous, I +. Suppose also the sets {0 I l} are compact for each l R. Then {P }0<1 satises the large deviation principle with rate function I . Proof. We must prove (15), (16) 1. Let C be closed and set (22) gm ( ) = max(m, m dist(, C )) (m = 1, . . . ).

Then gm is bounded, Lipschitz and log

egm / dP

log P (C ).

Thus

lim sup0 log P (C ) lim0 log egm / dP = sup (gm I ). 183

Let m , noting from (21) that sup(gm I ) sup(I ) = inf I,


C C

since C is closed and I is lower semicontinuous. The limit (15) is proved. 2. Next suppose U open and take Ck = U | dist(, U ) 1 k .

Then Ck U , CK is closed (k = 1, . . . ). Dene gm ( ) = k max Note that m gm 0, gm = 0 on Ck , gm = m on U. Then log

m , m dist(, Ck ) . k

egm / dP

log(P (U ) + em/ P ( U )) log(P (U ) + em/ ).

Thus (23) But supCk (I ) sup (gm I ) = lim0 log egm / dP lim inf 0 log(P (U ) + em/ ). log(P (U ) + em/ ) log(2 max(P (U ), em/ ) = log 2 + max( log P (U ), m), lim inf log(P (U ) + em/ ) max lim inf log P (U ), m .
0 0

and so

Combining this calculation with (23) and sending m , we deduce inf I lim inf log P (U ).
Ck 0

Since U =

k=1

Ck , the limit (16) follows.

C. Cramers Theorem In this section we illustrate the use of PDE methods by presenting an unusual proof, due to R. Jensen, of Cramers Theorem, characterizing large deviations for sums of i.i.d. random variables. 184

More precisely, take (, F , ) to be a probability space and suppose m (k = 1, . . . ) Yk : R are independent, identically distributed (1) random variables. We write Y = Y1 and assume as well that (2) the exponential generating function Z = E (epY ) is nite for each p Rm ,

where E () denotes expected value. Thus Z=

epY d.

We turn attention now to the partial sums (3) Sn = Y1 + + Yn n

and their distributions Pn on = Rm (n = 1, . . . ). Next dene (4) that is, (5) F (p) = log E (epY ) = log

F := log Z,

epY d .

We introduce also the Legendre transform of F : (6) L(q ) = sup (p q F (p))


pRm

(q Rm )

which turns out to be the rate function: Theorem. The probability measures {Pn } n=1 satisfy the large deviation principle with rate function I () = L(). Remark. By the Law of Large Numbers Sn E (Y) =: y a.s. as n . 185

As we will compute in the following proof, DF (0) = y and so DL(y ) = 0, provided L is smooth at y . Hence x L has its minimum at y , and in fact L(y ) = 0 L(x) > 0 (x = y ) Take a Borel set E . Assuming inf L = inf L = inf L, 0
E E E

we deduce

Pn (E ) = en( inf E L+o(1)) .

, Pn (E ) 0 exponentially fast as n . So if y /E Proof. 1. Write Y = (Y 1 , . . . , Y m ). Then for 1 k, l m:


F pk 2F pk pl E (Y k epY ) , E (epY ) k l p Y E (Y Y e ) E (epY )

= =

E (Y k epY )E (Y l epY ) . E (epY )2

Thus if Rm ,
m k,l=1

Fpk pl k l =

E ((Y )2 epY )E (epY )E ((Y )epY )2 E (epY )2

0, since Hence (7) Clearly also (8) F (0) = log E (e0 ) = 0. p F (p) is smooth, convex. E ((Y )epY ) E ((Y )2 epY )1/2 E (epY )1/2 .

Dene L by the Legendre transformation (6). Then L(q ) = sup(p q F (p)) F (0) = 0
p

186

for all q , and so (9) In addition L : Rm [0, ] is convex, lower semicontinuous. L(q ) = |q | |q | lim

(cf. [E1, III.3]), and thus for each l R, (10) the set {q Rm | 0 L(q ) l} is compact.

2. Next take g : Rm R to be bounded, Lipschitz. We intend to prove (11)


n

lim

1 log n

Rm

eng dPn

= sup(g L),
Rm

+ Yn where Pn is the distribution of Sn = Y1 + on Rm . The idea is to associate somehow the n left hand side of (11) with the unique viscosity solution of the HamiltonJacobi PDE

(12)

ut F (Du) = 0 in Rm (0, ) u = g on Rm {t = 0}.

The right hand side of (11) will appear when we invoke the HopfLax formula for the solution of (12). To carry out this program, we x any point x Rm and then write tk = k/n We dene (13) where (14) 3. We rst claim: (15) wn (x, tk+1 ) = E wn x + Yk+1 , tk n (k = 0, . . . ). hn := eng . wn (x, tk ) := E hn Y1 + + Yk +x n , (k = 0, . . . ).

187

This identity is valid since the random variables {Yk } k=1 are independent. Indeed, wn (x, tk+1 ) = E hn
Y1 ++Yk n

Yk+1 n

+x +x Yk+1

= E E hn = E wn

Y1 ++Yk n

+ ,

Yk+1 n

Yk+1 n

+ x, tk

the last equality holding by independence. More precisely, we used here the formula E ((X, Y ) | Y ) = (Y ) a.s., where X, Y are independent random variables, is continuous, bounded, and (y ) := E ((X, y )). See, e.g., Breiman [BN, 4.2]. 4. Next dene (16) un (x, tk ) := 1 log wn (x, tk ) n

for n = 1, . . . , k = 0, . . . . We assert next that (17) un


L

L ,

Dun

Dg

L ,

D as usual denoting the gradient in the spatial variable x. Let us check (17) by rst noting from (13), (14) that wn L hn L = en g L .
k . Then for a.e. x Rm we may The rst inequality in (17) follows. Now x a time tk = n compute from (13), (14) that + Yk Dwn (x, tk ) = E Dhn Y1 + +x n X1 ++Yk +x = nE Dghn n

Consequently (18) |Dwn | n Dg = n Dg


L E

hn L wn .

Y1 ++Yk n

+x

Recalling (16) we deduce the second inequality in (17).

188

Next take a point $x \in \mathbb{R}^m$ and compute:

$$\begin{aligned} w_n(x,t_{k+1}) &= E\left(e^{ng\left(\frac{Y_1+\cdots+Y_{k+1}}{n}+x\right)}\right)\\ &\le E\left(e^{ng\left(\frac{Y_1+\cdots+Y_k}{n}+x\right)+\|Dg\|_{L^\infty}|Y_{k+1}|}\right)\\ &= E\left(e^{ng\left(\frac{Y_1+\cdots+Y_k}{n}+x\right)}\right)E\left(e^{\|Dg\|_{L^\infty}|Y_{k+1}|}\right) \end{aligned}$$

by independence. Thus

(19) $w_n(x,t_{k+1}) \le w_n(x,t_k)\,E\left(e^{\|Dg\|_{L^\infty}|Y|}\right)$

for $Y = Y_1$, as the $\{Y_k\}_{k=1}^\infty$ are identically distributed. Assumption (2) implies

$$E\left(e^{\|Dg\|_{L^\infty}|Y|}\right) =: e^C < \infty.$$

Therefore (16), (19) imply

$$u_n(x,t_{k+1}) \le u_n(x,t_k) + \frac{1}{n}C,$$

and a similar calculation verifies that

$$u_n(x,t_{k+1}) \ge u_n(x,t_k) - \frac{1}{n}C.$$

Consequently

(20) $|u_n(x,t_k) - u_n(x,t_l)| \le C|t_k - t_l| \quad (k,l \ge 1).$

5. Extend $u_n(x,t)$ to be linear in $t$ for $t \in [t_k,t_{k+1}]$ $(k=0,\dots)$. Then estimates (17), (20) imply there exists a sequence $n_r \to \infty$ such that

(21) $u_{n_r} \to u$ locally uniformly in $\mathbb{R}^m \times [0,\infty)$.

Obviously $u = g$ on $\mathbb{R}^m \times \{t=0\}$. We assert as well that $u$ is a viscosity solution of the PDE

(22) $u_t - F(Du) = 0$ in $\mathbb{R}^m \times (0,\infty)$.

To verify this, we recall the relevant definitions from Chapter VI, take any $v \in C^2(\mathbb{R}^m \times (0,\infty))$ and suppose $u - v$ has a strict local maximum at a point $(x_0,t_0)$.
We must prove:

(23) $v_t(x_0,t_0) - F(Dv(x_0,t_0)) \le 0.$

We may also assume, upon redefining $v$ outside some neighborhood of $(x_0,t_0)$ if necessary, that $u(x_0,t_0) = v(x_0,t_0)$,

(24) $\sup|v,\,Dv,\,D^2v| < \infty,$

and $v > \sup_n(u_n)$ except in some region near $(x_0,t_0)$. In view of (21) we can find for $n = n_r$ points $(x_n,t_{k_n})$, $t_{k_n} = \frac{k_n}{n}$, such that

(25) $u_n(x_n,t_{k_n}) - v(x_n,t_{k_n}) = \max_{x\in\mathbb{R}^m,\,k=0,\dots}\,[u_n(x,t_k) - v(x,t_k)]$

and

(26) $(x_n,t_{k_n}) \to (x_0,t_0)$ as $n = n_r \to \infty$.

Write

(27) $\sigma_n := u_n(x_n,t_{k_n}) - v(x_n,t_{k_n}).$

Then for $n = n_r$:

$$\begin{aligned} e^{n(\sigma_n+v(x_n,t_{k_n}))} &= e^{nu_n(x_n,t_{k_n})} = w_n(x_n,t_{k_n}) &&\text{by (16)}\\ &= E\left(w_n\left(x_n+\frac{Y_{k_n}}{n},\,t_{k_n-1}\right)\right) &&\text{by (15)}\\ &= E\left(e^{nu_n\left(x_n+\frac{Y_{k_n}}{n},\,t_{k_n-1}\right)}\right) &&\text{by (16)}\\ &\le E\left(e^{n\left(\sigma_n+v\left(x_n+\frac{Y_{k_n}}{n},\,t_{k_n-1}\right)\right)}\right), \end{aligned}$$

the last inequality holding according to (25), (27). Thus

$$e^{nv(x_n,t_{k_n})} \le E\left(e^{nv\left(x_n+\frac{Y}{n},\,t_{k_n-1}\right)}\right)$$

for $n = n_r$, writing $Y = Y_{k_n}$. Now

$$v\left(x_n+\frac{Y}{n},\,t_{k_n-1}\right) = v(x_n,t_{k_n-1}) + Dv(x_n,t_{k_n-1})\cdot\frac{Y}{n} + \eta_n,$$

where

(28) $\eta_n := v\left(x_n+\frac{Y}{n},\,t_{k_n-1}\right) - v(x_n,t_{k_n-1}) - Dv(x_n,t_{k_n-1})\cdot\frac{Y}{n}.$

Thus

$$e^{nv(x_n,t_{k_n})} \le e^{nv(x_n,t_{k_n-1})}\,E\left(e^{Dv(x_n,t_{k_n-1})\cdot Y+n\eta_n}\right),$$

and hence

(29) $\frac{v(x_n,t_{k_n}) - v(x_n,t_{k_n-1})}{1/n} \le \log E\left(e^{Dv(x_n,t_{k_n-1})\cdot Y+n\eta_n}\right).$

Now (24), (27) imply

$$\lim_{n\to\infty} n\eta_n = 0 \quad\text{a.s.},$$

and furthermore

$$e^{Dv\cdot Y+n\eta_n} \le e^{C|Y|}.$$

Our assumption (2) implies also that $E(e^{C|Y|}) < \infty$. We consequently may invoke the Dominated Convergence Theorem and pass to limits in (29) as $n = n_r \to \infty$:

$$v_t(x_0,t_0) \le \log E\left(e^{Dv(x_0,t_0)\cdot Y}\right) = F(Dv(x_0,t_0)).$$

This is (23), and the reverse inequality likewise holds should $u-v$ have a strict local minimum at a point $(x_0,t_0)$. We have therefore proved $u$ is a viscosity solution of (22). Since $u = g$ on $\mathbb{R}^m \times \{t=0\}$, we conclude that $u$ is the unique viscosity solution of the initial value problem (12). In particular $u_n \to u$.

6. We next transform (12) into a different form, by noting that $\hat u = -u$ is the unique viscosity solution of

(30) $\hat u_t + \hat F(D\hat u) = 0$ in $\mathbb{R}^m \times (0,\infty)$, $\quad \hat u = \hat g$ on $\mathbb{R}^m \times \{t=0\},$

for

(31) $\hat g = -g, \qquad \hat F(p) = F(-p).$

Indeed if $\hat u - v$ has a local maximum at $(x_0,t_0)$, then $u - \hat v$ has a local minimum, where $\hat v = -v$. Thus

(32) $\hat v_t(x_0,t_0) - F(D\hat v(x_0,t_0)) \ge 0,$

since $u$ is a viscosity solution of (22). But $D\hat v = -Dv$, $\hat v_t = -v_t$. Consequently (32) says

$$v_t(x_0,t_0) + \hat F(Dv(x_0,t_0)) \le 0.$$

The reverse inequality obtains if $\hat u - v$ has a local minimum at $(x_0,t_0)$. This proves (30). According to the Hopf–Lax formula from Chapter V.B:

(33) $\hat u(x,t) = \inf_y\left\{t\hat L\left(\frac{x-y}{t}\right) + \hat g(y)\right\},$

where $\hat L$ is the Legendre transform of the convex function $\hat F$. But then

$$\hat L(q) = \sup_p\,(p\cdot q - \hat F(p)) = \sup_p\,(p\cdot q - F(-p)) = \sup_p\,((-p)\cdot(-q) - F(-p)) = L(-q).$$

Therefore

$$u(x,t) = -\hat u(x,t) = \sup_y\left\{g(y) - t\hat L\left(\frac{x-y}{t}\right)\right\} = \sup_y\left\{g(y) - tL\left(\frac{y-x}{t}\right)\right\}.$$

In particular

(34) $u(0,1) = \sup_y\,\{g(y) - L(y)\}.$

But

(35) $u_n(0,1) = \frac{1}{n}\log w_n(0,t_n) = \frac{1}{n}\log E\left(h_n\left(\frac{Y_1+\cdots+Y_n}{n}\right)\right) = \frac{1}{n}\log E\left(e^{ng(S_n)}\right) = \frac{1}{n}\log\int_{\mathbb{R}^m} e^{ng}\,dP_n.$
As $u_n(0,1) \to u(0,1)$, (34) and (35) confirm the limit (11). The second theorem in §B thus implies that $I = L$ is the rate function for $\{P_n\}_{n=1}^\infty$.
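As a quick numerical illustration of the limit just confirmed, the sketch below (a hypothetical check, not part of the notes) takes $Y \sim N(0,1)$, so that $F(p) = p^2/2$ and $L(q) = q^2/2$, picks the sample nonlinearity $g(y) = \sin y$, and compares the left side of (35) for large $n$, computed by quadrature against the $N(0,1/n)$ density of $S_n$, with the supremum in (34):

```python
import math

# Illustrative check of (34)-(35) for Y ~ N(0,1): F(p) = p^2/2, so the
# Legendre transform is L(q) = q^2/2.  The choice g(y) = sin(y) is hypothetical.

def g(y):
    return math.sin(y)

def L(q):                      # Legendre transform of F(p) = p^2/2
    return q * q / 2.0

# Right-hand side of (34): sup_y { g(y) - L(y) }, by grid search
rhs = max(g(0.0001 * i) - L(0.0001 * i) for i in range(-50000, 50001))

# Left-hand side, via (35): (1/n) log E(e^{n g(S_n)}) with S_n ~ N(0, 1/n),
# computed as a Riemann sum against the Gaussian density of S_n
n, h = 400, 0.0001
total = sum(math.exp(n * (g(0.0001 * i) - (0.0001 * i) ** 2 / 2.0))
            for i in range(-50000, 50001)) * math.sqrt(n / (2 * math.pi)) * h
lhs = math.log(total) / n

print(abs(lhs - rhs))          # small; both sides are near 0.40
```

The two quantities agree up to the $O(\log n / n)$ Laplace-method correction, exactly as the theorem predicts.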

Remark. This proof illustrates the vague belief that rate functions, interpreted as functions of appropriate parameters, are viscosity solutions of Hamilton–Jacobi type nonlinear PDE. The general validity of this principle is unclear, but there are certainly many instances in the research literature. See for instance the next section of these notes, and look also in the book by Freidlin–Wentzell [F-W]. If we accept from §B.1 the identification of rate functions and entropy (up to a minus sign and Boltzmann's constant), then the foregoing provides us with a quite new interplay between entropy ideas and nonlinear PDE.

D. Small noise in dynamical systems

In this last section we discuss another PDE approach to a large deviations problem, this one involving the small noise asymptotics of stochastic ODE.

1. Stochastic differential equations

We rapidly recount in this subsection the rudiments of stochastic ODE theory: see, e.g., Arnold [A], Freidlin [FR] or Oksendal [OK] for more.

Notation. (i) $(\Omega,\mathcal{F},P)$ is a probability space.

(ii) $\{\mathbf{W}(t)\}_{t\ge0}$ is an $m$-dimensional Wiener process (a.k.a. Brownian motion) defined on $(\Omega,\mathcal{F},P)$. We write $\mathbf{W}(t) = (W^1(t),\dots,W^m(t))$.

(iii) $\mathbf{b}:\mathbb{R}^n\to\mathbb{R}^n$, $\mathbf{b} = (b^1,\dots,b^n)$, and $B:\mathbb{R}^n\to\mathbb{M}^{n\times m}$, $B = ((b^{ij}))$, are given Lipschitz continuous functions.

(iv) $X_0$ is an $\mathbb{R}^n$-valued random variable defined on $(\Omega,\mathcal{F},P)$.

(v) $\mathcal{F}(t) = \sigma(X_0,\,\mathbf{W}(s)\ (0\le s\le t))$, the smallest $\sigma$-algebra with respect to which $X_0$ and $\mathbf{W}(s)$ for $0\le s\le t$ are measurable.

We intend to study the stochastic differential equation

(1) $\begin{cases} d\mathbf{X}(t) = \mathbf{b}(\mathbf{X}(t))\,dt + B(\mathbf{X}(t))\,d\mathbf{W}(t) & (t>0)\\ \mathbf{X}(0) = X_0 \end{cases}$

for the unknown $\mathbb{R}^n$-valued stochastic process $\{\mathbf{X}(t)\}_{t\ge0}$ defined on $(\Omega,\mathcal{F},P)$.

Remarks. (i) We say $\{\mathbf{X}(t)\}_{t\ge0}$ solves (1) provided this process is progressively measurable with respect to $\{\mathcal{F}(t)\}_{t\ge0}$ and

(2) $\mathbf{X}(t) = X_0 + \int_0^t \mathbf{b}(\mathbf{X}(s))\,ds + \int_0^t B(\mathbf{X}(s))\,d\mathbf{W}(s)$

a.s., for each time $t\ge0$. The last term on the right is an Itô stochastic integral, defined for instance in [A], [FR], etc.

(ii) We may heuristically rewrite (1) to read

(3) $\begin{cases} \dot{\mathbf{X}}(t) = \mathbf{b}(\mathbf{X}(t)) + B(\mathbf{X}(t))\boldsymbol{\xi}(t) & (t\ge0)\\ \mathbf{X}(0) = X_0, \end{cases}$

where

(4) $\boldsymbol{\xi}(t) = \frac{d\mathbf{W}(t)}{dt} = m\text{-dimensional white noise}.$

(iii) If we additionally assume that $X_0$ is independent of $\{\mathbf{W}(t)\}_{t\ge0}$ and $E(|X_0|^2) < \infty$, then there exists a unique solution $\{\mathbf{X}(t)\}_{t\ge0}$ of (1), such that

(5) $E\left(\int_0^T |\mathbf{X}(t)|^2\,dt\right) < \infty$

for each time $T > 0$. Unique here means that if $\{\tilde{\mathbf{X}}(t)\}_{t\ge0}$ is another process solving (1) and satisfying an estimate like (5), then

(6) $P(\mathbf{X}(t) = \tilde{\mathbf{X}}(t) \text{ for all } 0\le t\le T) = 1$

for each $T > 0$. Furthermore, the sample paths $t\mapsto\mathbf{X}(t)$ are continuous, with probability one.

2. Itô's formula, elliptic PDE

Solutions of (1) are connected to solutions of certain linear elliptic PDE of second order. The key is Itô's chain rule, which states that if $u:\mathbb{R}^n\to\mathbb{R}$ is a $C^2$-function, then

(7) $du(\mathbf{X}(t)) = Du(\mathbf{X}(t))\cdot d\mathbf{X}(t) + \frac{1}{2}A(\mathbf{X}(t)):D^2u(\mathbf{X}(t))\,dt \quad (t\ge0),$

where $A:\mathbb{R}^n\to\mathbb{M}^{n\times n}$ is defined by $A = BB^T$.

Remarks. (i) If we write $A = ((a^{ij}))$, then

(8) $\sum_{i,j=1}^n a^{ij}\xi_i\xi_j \ge 0 \quad (\xi\in\mathbb{R}^n).$
(ii) Formula (7) means u(X(t)) = u(X(0)) (9) + + for each time t 0. Next assume u solves the PDE4 (10)
1 2 n i,j =1 t 0 t 0 n n 1 ij i=1 bi (X(s))uxi (X(s)) + 2 i,j =1 a (X(s))uxi xj (X(s))ds n m k i=1 k=1 ik (X(s))uxi (X(s))dW (s)

aij uxi xj +
n i=1 bi uxi

= 0 in U

u = g on U ,

where U Rn is a bounded, connected open set with smooth boundary, and g : U R is given. In view of (8) this is a (possibly degenerate) elliptic PDE.

sample path of X( )

Fix a point x U and let {X(t)}t0 solve the stochastic DE (11) dX(t) = b(X(t))dt + B (X(t))dW(t) X(0) = x. (t > 0)

Dene also the hitting time (12) x := min{t 0 | X(t) U }.

os formula, with Assume, as will be the case in 3,4 following, that x < a.s. We apply It u a solution of (10) and the random variable x replacing t:
x

u(X(x )) = u(X(0)) +
0
4

Du B dW.

Note that there is no minus sign in front of the term involving the second derivatives: this diers from the convention in [E1, Chapter VI].

195

But X(0) = x and u(X(x )) = g (X(x )). Thus u(x) = g (X(x ))


0 x

Du B dW.

We take expected values, and recall from [A], [FR], etc. that
x

E
0

Du B dW

= 0,

to deduce this stochastic representation formula for u: (13) u(x) = E (g (X(x ))) (x U ).

Note that X and x here depend on x. 3. An exit problem We hereafter assume the uniform ellipticity condition
n

(14)
i,j =1

aij (x)i j | |2

and suppose also that bi , aij (1 i, j n) are smooth. a. Small noise asymptotics Take > 0. We rescale the noise term in (11): (15) dX (t) = b(X (t))dt + B (X (t)) dW(t) X (0) = x. (t 0)

Now as 0, we can expect the random trajectories t X (t) to converge somehow to the deterministic trajectories t x(t), where (16) (t) = b(x(t)) x x(0) = x. (t 0)

We are therefore interpreting (15) as modeling the dynamics of a particle moving with velocity v = b plus a small noise term. What happens when 0? This problem ts into the large deviations framework. We take = C ([0, T ]; Rm ) for some T > 0 and write P to denote the distribution of the process X () on . Freidlin 196

and Wentzell have shown that {P }0<1 satisfying the large deviation principle with a rate function I [], dened this way: (17) I [y()] =
1 2 T 0 n i,j =1

i (s) bi (y(s)))(y j (s) bj (y(s)))ds if y() H 1 ([0, T ]; Rn ) aij (y(s))(y otherwise.

Here ((aij )) = A1 is the inverse of the matrix A = DDT and H 1 ([0, T ]; Rn ) denotes the Sobolev space of mappings from [0, T ] Rn which are absolutely continuous, with square integrable derivatives. We write y() = (y1 (), . . . , yn ()). b. Perturbations against the ow We present now a PDE method for deriving an interesting special case of the aforementioned large derivation result, and in particular demonstrate how I [] above arises. We follow Fleming [FL] and [E-I]. To set up this problem, take U Rn as above, x a point x U , and let {X (t)}t0 solve the stochastic ODE (15). We now also select a smooth, relatively open subregion U and ask: For small > 0, what is the probability that X (t) rst exits U through the (18) region ? This is in general a very dicult problem, and so we turn attention to a special case, by hypothesizing concerning the vector eld b that 1 n if y() Hloc ([0, ); R ) and (19) y(t) U for all t 0, then (t) b(y(t))|2 dt = +. |y 0 Condition (19) says that it requires an innite amount of energy for a curve y() to resist being swept along with the ow x() determined by b, staying within U for all times t 0.

197

flow lines of the ODE x=b(x)

a random trajectory which exits U against the deterministic flow

Intuitively we expect that for small > 0, the overwhelming majority of the sample paths of X () will stay close to x() and so be swept out of U in nite time. If, on the other hand we take for a smooth window within U lying upstream from x, the probability that a sample path of X () will move against the ow and so exit U through should be very small. Notation. (i) (20) (ii) (21) g = = 1 on 0 on U . Then (22)

u (x) = probability that X () rst exits U through = (X (x ) ).

u (x) = E (g (X (x )))

(x U ). problem = 0 in U = 1 on = 0 on U .

But according to b, u () solves the boundary value 2 n n ij 2 i,j =1 a uxi xj + i=1 bi uxi (23) u u 198

We are interested in the asymptotic behavior of the function u as 0. Theorem. Assume U is connected. We then have (24) u (x) = e
w(x)+o(1) 2

as 0,

uniformly on compact subsets of U , where (25) w(x) := inf


A

1 2

i (s) bi (y(s)))(y j (s) bj (y(s)))ds , aij (y(s))(y


i,j =1

the inmum taken among curves in the admissible class (26)


1 ([0, ); Rn ) | y(t) U for 0 t < , y( ) if < }. A = {y() Hloc

Proof (Outline). 1. We introduce a rescaled version of the log transform from Chapter IV, by setting (27) w (x) := 2 log u (x) (x U ).

According to the Strong Maximum Principle, 0 < u (x) < 1 in U and so the denition (27) makes sense, with w > 0 in U. We compute:
wx = 2 i u xi , u u xi xj u

2 w xi xj =

+ 2

u xi uxj

(u )2

Thus our PDE (23) becomes (28) 2 2


n i,j =1 aij wx + i xj 1 2 n i,j =1 aij wx wx i j n i=1 bi wxi

in U

w = 0 on w at U .

2. We intend to estimate $|Dw^\varepsilon|$ on compact subsets of $U$, as in §IV.A.2. For this let us first differentiate the PDE (28) with respect to $x_k$:

(29) $-\frac{\varepsilon^2}{2}\sum_{i,j=1}^n a^{ij}w^\varepsilon_{x_kx_ix_j} + \sum_{i,j=1}^n a^{ij}w^\varepsilon_{x_j}w^\varepsilon_{x_kx_i} - \sum_{i=1}^n b^iw^\varepsilon_{x_kx_i} = R_1,$

where the remainder term $R_1$ satisfies the estimate

$$|R_1| \le C(\varepsilon^2|D^2w^\varepsilon| + |Dw^\varepsilon|^2 + 1).$$

Now set

(30) $\zeta := |Dw^\varepsilon|^2,$

so that

$$\zeta_{x_i} = 2\sum_{k=1}^n w^\varepsilon_{x_k}w^\varepsilon_{x_kx_i}, \qquad \zeta_{x_ix_j} = 2\sum_{k=1}^n w^\varepsilon_{x_k}w^\varepsilon_{x_kx_ix_j} + 2\sum_{k=1}^n w^\varepsilon_{x_kx_i}w^\varepsilon_{x_kx_j}.$$

Thus, multiplying (29) by $2w^\varepsilon_{x_k}$ and summing over $k$,

(31) $-\frac{\varepsilon^2}{2}\sum_{i,j=1}^n a^{ij}\zeta_{x_ix_j} + \sum_{i,j=1}^n a^{ij}w^\varepsilon_{x_j}\zeta_{x_i} - \sum_{i=1}^n b^i\zeta_{x_i} = 2\sum_{k=1}^n w^\varepsilon_{x_k}R_1 - \varepsilon^2\sum_{k=1}^n\sum_{i,j=1}^n a^{ij}w^\varepsilon_{x_kx_i}w^\varepsilon_{x_kx_j}.$

Now, by the uniform ellipticity (14),

$$\sum_{k=1}^n\sum_{i,j=1}^n a^{ij}w^\varepsilon_{x_kx_i}w^\varepsilon_{x_kx_j} \ge \theta|D^2w^\varepsilon|^2.$$

This inequality and (29) imply:

(32) $-\frac{\varepsilon^2}{2}\sum_{i,j=1}^n a^{ij}\zeta_{x_ix_j} + \sum_{i,j=1}^n a^{ij}w^\varepsilon_{x_j}\zeta_{x_i} - \sum_{i=1}^n b^i\zeta_{x_i} \le -\theta\varepsilon^2|D^2w^\varepsilon|^2 + R_2,$

where

$$|R_2| \le C(\varepsilon^2|D^2w^\varepsilon||Dw^\varepsilon| + |Dw^\varepsilon|^3 + 1) \le \frac{\theta\varepsilon^2|D^2w^\varepsilon|^2}{2} + C(|Dw^\varepsilon|^3 + 1) = \frac{\theta\varepsilon^2|D^2w^\varepsilon|^2}{2} + C(\zeta^{3/2} + 1).$$

Consequently (32) yields the inequality:

(33) $\frac{\theta}{2}\varepsilon^2|D^2w^\varepsilon|^2 - \frac{\varepsilon^2}{2}\sum_{i,j=1}^n a^{ij}\zeta_{x_ix_j} + \sum_{i,j=1}^n a^{ij}w^\varepsilon_{x_j}\zeta_{x_i} - \sum_{i=1}^n b^i\zeta_{x_i} \le C(\zeta^{3/2} + 1).$

Now the PDE (28) implies

$$\zeta \le C(\varepsilon^2|D^2w^\varepsilon| + |Dw^\varepsilon|) = C(\varepsilon^2|D^2w^\varepsilon| + \zeta^{1/2}) \le C(\varepsilon^2|D^2w^\varepsilon| + 1) + \frac{\zeta}{2},$$

and so $\zeta \le C(\varepsilon^2|D^2w^\varepsilon| + 1)$. This inequality and (33) give us the estimate:

(34) $\frac{\gamma}{\varepsilon^2}\zeta^2 - \frac{\varepsilon^2}{2}\sum_{i,j=1}^n a^{ij}\zeta_{x_ix_j} \le C(|D\zeta| + \zeta^{3/2}) + C,$
for some constant $\gamma > 0$.

3. We employ this differential inequality to estimate $\zeta$. Take any subregion $V\subset\subset U$ and select then a smooth cutoff function $\eta$ such that

$$0\le\eta\le1, \qquad \eta\equiv1 \text{ on } V, \qquad \eta\equiv0 \text{ near } \partial U\setminus\Gamma.$$

Write

(35) $\xi := \eta^4\zeta$

and compute

(36) $\xi_{x_i} = \eta^4\zeta_{x_i} + 4\eta^3\eta_{x_i}\zeta, \qquad \xi_{x_ix_j} = \eta^4\zeta_{x_ix_j} + 4\eta^3(\eta_{x_j}\zeta_{x_i} + \eta_{x_i}\zeta_{x_j}) + 4(\eta^3\eta_{x_i})_{x_j}\zeta.$

Select a point $x_0\in\bar U$ where $\xi$ attains its maximum. Consider first the case that $x_0\in U$, $\xi(x_0) > 0$. Then

(37) $D\xi(x_0) = 0, \qquad D^2\xi(x_0)\le0.$

Owing to (36),

$$\eta^4 D\zeta = -4\eta^3 D\eta\,\zeta \quad\text{at } x_0,$$

and also

$$\sum_{i,j=1}^n a^{ij}\xi_{x_ix_j} \le 0 \quad\text{at } x_0.$$

Thus at $x_0$:

$$0 \le -\frac{\varepsilon^2}{2}\sum_{i,j=1}^n a^{ij}\xi_{x_ix_j} = -\frac{\varepsilon^2\eta^4}{2}\sum_{i,j=1}^n a^{ij}\zeta_{x_ix_j} + R_3,$$

where

$$|R_3| \le \varepsilon^2C(\eta^3|D\zeta| + \eta^2\zeta) \le \varepsilon^2C\eta^2\zeta.$$

Therefore (34) implies

$$\frac{\gamma}{\varepsilon^2}\eta^4\zeta^2 \le \eta^4C(|D\zeta| + \zeta^{3/2}) + \varepsilon^2C\eta^2\zeta + C \le \frac{\gamma}{2\varepsilon^2}\eta^4\zeta^2 + C.$$

Thus we can estimate $\xi = \eta^4\zeta$ at $x_0$, and so bound $|Dw^\varepsilon(x_0)|$.

4. If on the other hand $x_0\in\partial U$, $\xi(x_0) > 0$, then we note $u^\varepsilon = 1$ on $\Gamma$ near $x_0$. In this case we employ a standard barrier argument to obtain the estimate

$$|Du^\varepsilon(x_0)| \le \frac{C}{\varepsilon^2},$$

from which it follows that

(38) $|Dw^\varepsilon(x_0)| = \varepsilon^2\,\frac{|Du^\varepsilon(x_0)|}{u^\varepsilon(x_0)} \le C.$

Hence we can also estimate $\xi = \eta^4\zeta = \eta^4|Dw^\varepsilon|^2$ if $x_0\in\partial U$. It follows that

(39) $\sup_V |Dw^\varepsilon| \le C$

for each $V\subset\subset U$, the constant $C$ depending only on $V$ and not on $\varepsilon$.

5. As $w^\varepsilon = 0$ on $\Gamma$, we deduce from (39) that

(40) $\sup_V |w^\varepsilon| \le C.$

In view of (39), (40) there exists a sequence $\varepsilon_r\to0$ such that

$$w^{\varepsilon_r} \to \tilde w \quad\text{uniformly on compact subsets of } U.$$

It follows from (28) that

(41) $\tilde w = 0 \quad\text{on } \Gamma$

and

(42) $\frac{1}{2}\sum_{i,j=1}^n a^{ij}\tilde w_{x_i}\tilde w_{x_j} - \sum_{i=1}^n b^i\tilde w_{x_i} = 0 \quad\text{in } U,$
in the viscosity sense: the proof is a straightforward adaptation of the vanishing viscosity calculation in §VI.A. Since the PDE (42) holds a.e., we conclude that

$$|D\tilde w| \le C \quad\text{a.e. in } U,$$

and so $\tilde w \in C^{0,1}(U)$. We must identify $\tilde w$.

6. For this, we recall the definition

(43) $w(x) = \inf_{\mathbf{y}(\cdot)\in\mathcal{A}}\left\{\frac{1}{2}\int_0^\tau \sum_{i,j=1}^n a_{ij}(\mathbf{y}(s))(\dot y^i(s)-b^i(\mathbf{y}(s)))(\dot y^j(s)-b^j(\mathbf{y}(s)))\,ds\right\},$

the admissible class $\mathcal{A}$ defined by (26). Clearly then

(44) $w = 0 \quad\text{on } \Gamma.$

We claim that in fact

(45) $\frac{1}{2}\sum_{i,j=1}^n a^{ij}w_{x_i}w_{x_j} - \sum_{i=1}^n b^iw_{x_i} = 0 \quad\text{in } U$

in the viscosity sense. To prove this, take a smooth function $v$ and suppose

(46) $w - v$ has a local maximum at a point $x_0\in U$.

We must show

(47) $\frac{1}{2}\sum_{i,j=1}^n a^{ij}v_{x_i}v_{x_j} - \sum_{i=1}^n b^iv_{x_i} \le 0 \quad\text{at } x_0.$

To establish (47), note that (46) implies

(48) $w(x) - v(x) \le w(x_0) - v(x_0) \quad\text{if } x\in B(x_0,r)$

for $r$ small enough. Fix any $\xi\in\mathbb{R}^n$ and consider the ODE

(49) $\begin{cases} \dot{\mathbf{y}}(s) = \mathbf{b}(\mathbf{y}(s)) + A(\mathbf{y}(s))\xi & (s>0)\\ \mathbf{y}(0) = x_0. \end{cases}$

Let $t>0$ be so small that $\mathbf{y}(t)\in B(x_0,r)$. Then (43) implies

$$w(x_0) \le \frac{1}{2}\int_0^t \sum_{i,j=1}^n a_{ij}(\mathbf{y}(s))(\dot y^i - b^i(\mathbf{y}))(\dot y^j - b^j(\mathbf{y}))\,ds + w(\mathbf{y}(t)).$$

Therefore (48), (49) give the inequality

$$v(x_0) - v(\mathbf{y}(t)) \le w(x_0) - w(\mathbf{y}(t)) \le \frac{1}{2}\int_0^t \sum_{i,j=1}^n a^{ij}(\mathbf{y}(s))\xi_i\xi_j\,ds.$$

Divide by $t$ and let $t\to0$, recalling the ODE (49):

$$-Dv\cdot(\mathbf{b} + A\xi) \le \frac{1}{2}(A\xi)\cdot\xi \quad\text{at } x_0.$$

This is true for all vectors $\xi\in\mathbb{R}^n$, and consequently

(50) $\sup_{\xi\in\mathbb{R}^n}\left\{-Dv\cdot(\mathbf{b} + A\xi) - \frac{1}{2}(A\xi)\cdot\xi\right\} \le 0 \quad\text{at } x_0.$
But the supremum above is attained for $\xi = -Dv$, and so (50) says

$$\frac{1}{2}(ADv)\cdot Dv - \mathbf{b}\cdot Dv \le 0 \quad\text{at } x_0.$$

This is (47).

7. Next let us suppose

(51) $w - v$ has a local minimum at a point $x_0\in U$,

and prove

(52) $\frac{1}{2}\sum_{i,j=1}^n a^{ij}v_{x_i}v_{x_j} - \sum_{i=1}^n b^iv_{x_i} \ge 0 \quad\text{at } x_0.$

To verify this inequality, we assume instead that (52) fails, in which case

(53) $\frac{1}{2}\sum_{i,j=1}^n a^{ij}v_{x_i}v_{x_j} - \sum_{i=1}^n b^iv_{x_i} \le -\theta < 0 \quad\text{near } x_0$
for some constant $\theta > 0$. Now take a small time $t>0$. Then the definition (43) implies that there exists $\mathbf{y}(\cdot)\in\mathcal{A}$ such that

$$w(x_0) \ge w(\mathbf{y}(t)) + \frac{1}{2}\int_0^t \sum_{i,j=1}^n a_{ij}(\mathbf{y})(\dot y^i - b^i(\mathbf{y}))(\dot y^j - b^j(\mathbf{y}))\,ds - \frac{\theta}{2}t.$$

In view of (51), therefore

(54) $v(x_0) - v(\mathbf{y}(t)) \ge w(x_0) - w(\mathbf{y}(t)) \ge \frac{1}{2}\int_0^t \sum_{i,j=1}^n a_{ij}(\mathbf{y})(\dot y^i - b^i(\mathbf{y}))(\dot y^j - b^j(\mathbf{y}))\,ds - \frac{\theta}{2}t.$

Now define

(55) $\xi(s) := A^{-1}(\mathbf{y}(s))(\dot{\mathbf{y}}(s) - \mathbf{b}(\mathbf{y}(s))),$

so that

$$\begin{cases} \dot{\mathbf{y}}(s) = \mathbf{b}(\mathbf{y}(s)) + A(\mathbf{y}(s))\xi(s) & (s>0)\\ \mathbf{y}(0) = x_0. \end{cases}$$

Then

$$v(x_0) - v(\mathbf{y}(t)) = -\int_0^t \frac{d}{ds}v(\mathbf{y}(s))\,ds = -\int_0^t Dv(\mathbf{y}(s))\cdot[\mathbf{b}(\mathbf{y}(s)) + A(\mathbf{y}(s))\xi(s)]\,ds.$$

Combine this identity with (54):

$$-\frac{\theta}{2}t \le \int_0^t \left(-Dv\cdot(\mathbf{b}(\mathbf{y}) + A\xi(s)) - \frac{1}{2}(A\xi(s))\cdot\xi(s)\right)ds \le \int_0^t \sup_{\xi}\left(-Dv\cdot(\mathbf{b}(\mathbf{y}) + A\xi) - \frac{1}{2}(A\xi)\cdot\xi\right)ds = \int_0^t \left(\frac{1}{2}(ADv)\cdot Dv - \mathbf{b}\cdot Dv\right)ds \le -\theta t,$$

according to (53), provided $t>0$ is small enough. This is a contradiction however.

We have verified (52).

8. To summarize, we have so far shown that $w^{\varepsilon_r}\to\tilde w$, $\tilde w$ solving the nonlinear first-order PDE (42). Likewise $w$ defined by (43) solves the same PDE. In addition $\tilde w = w = 0$ on $\Gamma$. We wish finally to prove that

(56) $\tilde w \equiv w \quad\text{in } U.$

This is in fact true: the proof in [E-I] utilizes various viscosity solution tricks as well as the condition (19). We omit the details here. Finally then, our main assertion (24) follows from (56).

Appendix A: Units and constants

1. Fundamental quantities and their units: time — hold: time, seconds (s); length, meters (m); mass, kilograms (kg); temperature, Kelvin (K); quantity, mole (mol).

2. Derived quantities and their units: force, kg m s⁻² = newton (N); pressure, N m⁻² = pascal (Pa); work and energy, N m = joule (J); power, J s⁻¹ = watt (W); entropy, J K⁻¹; heat, 4.1840 J = calorie. Here pressure = force/unit area, work = force × distance = pressure × volume, and power = rate of work.

3. Constants: R = gas constant = 8.314 J mol⁻¹ K⁻¹; k = Boltzmann's constant = 1.3806 × 10⁻²³ J K⁻¹; N_A = Avogadro's number = R/k = 6.02 × 10²³ mol⁻¹.
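The three constants tabulated above are not independent; a two-line check confirms the relation N_A = R/k:

```python
# Consistency check of the constants in Appendix A: N_A = R/k.
R = 8.314          # gas constant, J mol^-1 K^-1
k = 1.3806e-23     # Boltzmann's constant, J K^-1
NA = R / k
print(NA)          # ≈ 6.02e23 mol^-1, Avogadro's number
```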


Appendix B: Physical axioms

We record from Callen [C, p. 283–284] these physical axioms for a thermal system in equilibrium.

Postulate I. There exist particular states (called equilibrium states) that, macroscopically, are characterized completely by the specification of the internal energy E and a set of extensive parameters X₁, ..., X_m, later to be specifically enumerated.

Postulate II. There exists a function (called the entropy) of the extensive parameters, defined for all equilibrium states, and having the following property. The values assumed by the extensive parameters in the absence of a constraint are those that maximize the entropy over the manifold of constrained equilibrium states.

Postulate III. The entropy of a composite system is additive over the constituent subsystems (whence the entropy of each constituent system is a homogeneous first-order function of the extensive parameters). The entropy is continuous and differentiable and is a monotonically increasing function of the energy.

Postulate IV. The entropy of any system vanishes in the state for which T = (∂E/∂S)_{X₁,...,X_m} = 0.

These statements are quoted verbatim, except for minor changes of notation. Postulate IV is the Third Law of thermodynamics, and is not included in our models.


References

[A] L. Arnold, Stochastic Differential Equations, Wiley.
[B-G-L] C. Bardos, F. Golse and D. Levermore, Fluid dynamic limits of kinetic equations I, J. Stat. Physics 63 (1991), 323–344.
[BN] L. Breiman, Probability, Addison-Wesley, 1968.
[B-S] S. Bamberg and S. Sternberg, A Course in Mathematics for Students of Physics, Vol. 2, Cambridge, 1990.
[B-T] S. Bharatha and C. Truesdell, Classical Thermodynamics as a Theory of Heat Engines, Springer, 1977.
[C] H. Callen, Thermodynamics and an Introduction to Thermostatistics (2nd ed.), Wiley, 1985.
[CB] B. Chow, On Harnack's inequality and entropy for Gaussian curvature flow, Comm. Pure Appl. Math. 44 (1991), 469–483.
[C-E-L] M. G. Crandall, L. C. Evans and P. L. Lions, Some properties of viscosity solutions of Hamilton–Jacobi equations, Trans. AMS 282 (1984), 487–502.
[C-L] M. G. Crandall and P. L. Lions, Viscosity solutions of Hamilton–Jacobi equations, Trans. AMS 277 (1983), 1–42.
[C-N] B. Coleman and W. Noll, The thermodynamics of elastic materials with heat conduction and viscosity, Arch. Rat. Mech. Analysis 13 (1963), 167–178.
[C-O-S] B. Coleman, D. Owen and J. Serrin, The second law of thermodynamics for systems with approximate cycles, Arch. Rat. Mech. Analysis 77 (1981), 103–142.
[D] W. Day, Entropy and Partial Differential Equations, Pitman Research Notes in Mathematics, Series 295, Longman, 1993.
[DA] E. E. Daub, Maxwell's demon, in [L-R].
[D-S] W. Day and M. Šilhavý, Efficiency and existence of entropy in classical thermodynamics, Arch. Rat. Mech. Analysis 66 (1977), 73–81.
[D-Z] A. Dembo and O. Zeitouni, Large Deviation Techniques and Applications, Jones and Bartlett Publishers, 1993.
[EL] R. S. Ellis, Entropy, Large Deviations and Statistical Mechanics, Springer, 1985.
[E1] L. C. Evans, Partial Differential Equations, AMS Press.
[E2] L. C. Evans, The perturbed test function method for viscosity solutions of nonlinear PDE, Proc. Royal Soc. Edinburgh 111 (1989), 359–375.
[E-G] L. C. Evans and R. F. Gariepy, Measure Theory and Fine Properties of Functions, CRC Press, 1992.
[E-I] L. C. Evans and H. Ishii, A PDE approach to some asymptotic problems concerning random differential equations with small noise intensities, Ann. Inst. H. Poincaré 2 (1985), 1–20.
[F] E. Fermi, Thermodynamics, Dover, 1956.
[F-L1] M. Feinberg and R. Lavine, Thermodynamics based on the Hahn–Banach Theorem: the Clausius inequality, Arch. Rat. Mech. Analysis 82 (1983), 203–293.
[F-L2] M. Feinberg and R. Lavine, Foundations of the Clausius–Duhem inequality, Appendix 2A of [TR].
[FL] W. H. Fleming, Exit probabilities and optimal stochastic control, Appl. Math. Optimization 4 (1978), 327–346.
[FR] M. I. Freidlin, Functional Integration and Partial Differential Equations, Princeton University Press, 1985.
[F-W] M. I. Freidlin and A. D. Wentzell, Random Perturbations of Dynamical Systems, Springer, 1984.
[G] F. R. Gantmacher, The Theory of Matrices, Chelsea, 1960.
[GU] M. Gurtin, Thermodynamics of Evolving Phase Boundaries in the Plane, Oxford, 1993.
[G-W] M. Gurtin and W. Williams, An axiomatic foundation for continuum thermodynamics, Arch. Rat. Mech. Analysis 26 (1968), 83–117.
[H] R. Hamilton, Remarks on the entropy and Harnack estimates for Gauss curvature flow, Comm. in Analysis and Geom. 1 (1994), 155–165.
[HU] K. Huang, Statistical Mechanics (2nd ed.), Wiley, 1987.
[J] E. T. Jaynes, E. T. Jaynes: Papers on Probability, Statistics and Statistical Physics (ed. by R. D. Rosenkrantz), Reidel, 1983.
[vK-L] N. G. van Kampen and J. J. Lodder, Constraints, Amer. J. Physics 52 (1984), 419–424.
[L-P-S] P. L. Lions, B. Perthame and P. E. Souganidis, Existence and stability of entropy solutions for the hyperbolic systems of isentropic gas dynamics in Eulerian and Lagrangian coordinates, Comm. Pure Appl. Math. 49 (1996), 599–638.
[L-P-T1] P. L. Lions, B. Perthame and E. Tadmor, A kinetic formulation of multidimensional conservation laws and related equations, J. Amer. Math. Soc. 7 (1994), 169–191.
[L-P-T2] P. L. Lions, B. Perthame and E. Tadmor, Kinetic formulation of isentropic gas dynamics and p-systems, Comm. Math. Physics 163 (1994), 415–431.
[L-R] H. Leff and A. Rex (ed.), Maxwell's Demon: Entropy, Information, Computing, Princeton U. Press, 1990.
[M-S] C.-S. Man and J. Serrin, Book in preparation.
[OK] B. Oksendal, Stochastic Differential Equations, Springer, 1989.
[O] D. Owen, A First Course in the Mathematical Foundations of Thermodynamics, Springer, 1984.
[P-T] B. Perthame and E. Tadmor, A kinetic equation with kinetic entropy functions for scalar conservation laws, Comm. Math. Physics 136 (1991), 501–517.
[P] M. A. Pinsky, Lectures on Random Evolution, World Scientific, 1992.
[R] F. Rezakhanlou, Lecture notes from Math 279 (UC Berkeley).
[S1] J. Serrin, Foundations of Classical Thermodynamics, Lecture Notes, Math. Department, U. of Chicago, 1975.
[S2] J. Serrin, Conceptual analysis of the classical second laws of thermodynamics, Arch. Rat. Mech. Analysis 70 (1979), 353–371.
[S3] J. Serrin, An outline of thermodynamical structure, in New Perspectives in Thermodynamics (ed. by Serrin), Springer, 1986.
[SE] M. J. Sewell, Maximum and Minimum Principles, Cambridge, 1987.
[S] J. Smoller, Shock Waves and Reaction-Diffusion Equations, Springer.
[ST] D. W. Stroock, Probability Theory: An Analytic View, Cambridge University Press, 1993.
[SZ] L. Szilard, On the decrease of entropy in a thermodynamic system by the intervention of intelligent beings, in [L-R].
[T] C. J. Thompson, Mathematical Statistical Mechanics, Princeton University Press, 1972.
[TR] C. Truesdell, Rational Thermodynamics (2nd ed.), Springer.
[T-M] C. Truesdell and R. G. Muncaster, Fundamentals of Maxwell's Kinetic Theory of a Simple Monatomic Gas, Academic Press, 1980.
[T-N] C. Truesdell and W. Noll, Nonlinear Field Theories of Mechanics, Springer, 1965.
[V] S. R. S. Varadhan, Large Deviations and Applications, SIAM, 1984.
[W] A. Wightman, Convexity and the notion of equilibrium state in thermodynamics and statistical mechanics, Introduction to R. B. Israel, Convexity in the Theory of Lattice Gases, Princeton U. Press, 1979.
[Z] M. Zemansky, Heat and Thermodynamics, McGraw-Hill.
