
(c_1L_1 + ··· + c_nL_n)(f_i) = 0, i = 1, ..., n.

Since f_1, ..., f_n form a basis for X, this means that

(c_1L_1 + ··· + c_nL_n)(f) = 0

for all f ∈ X. That is, c_1L_1 + ··· + c_nL_n = 0 (the zero functional), and so L_1, ..., L_n are linearly dependent.
Conversely, if L_1, ..., L_n are linearly dependent, just reverse the steps in the first part of the proof to see that [L_i(f_j)] is singular.
Theorem. Let X be an n-dimensional vector space and let L_1, ..., L_n be linear functionals on X. Then, the interpolation problem

L_i(f) = y_i, i = 1, ..., n (∗)

always has a (unique) solution f ∈ X for any choice of scalars y_1, ..., y_n if and only if L_1, ..., L_n are linearly independent.
Proof. Let f_1, ..., f_n be a basis for X. Then, (∗) is equivalent to the system of equations

a_1L_1(f_1) + ··· + a_nL_1(f_n) = y_1
a_1L_2(f_1) + ··· + a_nL_2(f_n) = y_2
 ⋮ (∗∗)
a_1L_n(f_1) + ··· + a_nL_n(f_n) = y_n

by taking f = a_1f_1 + ··· + a_nf_n. Thus, (∗) always has a solution if and only if (∗∗) always has a solution, if and only if [L_i(f_j)] is nonsingular, if and only if L_1, ..., L_n are linearly independent. In any of these cases, note that the solution must be unique.
In the case of Lagrange interpolation, X = P_n and L_i is evaluation at x_i; i.e., L_i(f) = f(x_i), which is easily seen to be linear in f. Moreover, L_0, ..., L_n are linearly independent provided that x_0, ..., x_n are distinct. (Why?)
In the case of Hermite interpolation, the linear functionals are of the form L_{x,k}(f) = f^{(k)}(x), differentiation composed with a point evaluation. If x ≠ y, then L_{x,k} and L_{y,m} are independent for any k and m; if k ≠ m, then L_{x,k} and L_{x,m} are independent. (How would you check this?)
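One concrete way to answer both parenthetical questions is to form the matrix [L_i(f_j)] against a basis and check that it is nonsingular. A minimal numerical sketch (the nodes and the particular functionals are illustrative choices, not taken from the notes):

```python
import numpy as np

# Lagrange functionals L_i(f) = f(x_i) tested against the monomial basis
# f_j(x) = x^j: the matrix [L_i(f_j)] is then the Vandermonde matrix, which
# is nonsingular exactly when the nodes are distinct.
x = np.array([0.0, 0.5, 1.0, 2.0])        # distinct nodes (illustrative)
V = np.vander(x, increasing=True)          # V[i, j] = x_i ** j
print(np.linalg.matrix_rank(V))            # 4, so these functionals are independent

# Hermite-type functionals f(0), f'(0), f(1) applied to the basis 1, x, x^2:
M = np.array([[1.0, 0.0, 0.0],             # row for L(f) = f(0)
              [0.0, 1.0, 0.0],             # row for L(f) = f'(0)
              [1.0, 1.0, 1.0]])            # row for L(f) = f(1)
print(np.linalg.matrix_rank(M))            # 3, so these are independent too
```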
Problem Set: Lagrange Interpolation
Math 682 6/2/98

Throughout, x_0, x_1, ..., x_n are distinct points in some interval [a, b], and V(x_0, x_1, ..., x_n) denotes the Vandermonde determinant:

V(x_0, x_1, ..., x_n) = | 1 x_0 x_0² ··· x_0ⁿ |
                        | 1 x_1 x_1² ··· x_1ⁿ |
                        | ⋮  ⋮   ⋮        ⋮   |
                        | 1 x_n x_n² ··· x_nⁿ |
51. Show by induction that V(x_0, x_1, ..., x_n) = ∏_{0≤j<i≤n} (x_i − x_j).
[Hint: In order to reduce to the n × n case, replace c_j, the j-th column, by c_j − x_0·c_{j−1}, starting on the right with j = n. Factor and use the induction hypothesis.]
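A quick numerical spot check of the product formula (the nodes are arbitrary test values):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.5, 4.0])         # x_0, ..., x_n (test values)
V = np.vander(x, increasing=True)           # rows (1, x_i, x_i^2, ..., x_i^n)
det_V = np.linalg.det(V)

# The product of (x_i - x_j) over all pairs with 0 <= j < i <= n.
prod = np.prod([x[i] - x[j] for i in range(len(x)) for j in range(i)])
print(det_V, prod)                          # the two values agree (up to rounding)
```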
52. Let y_0, y_1, ..., y_n ∈ R be given. Show that the polynomial p ∈ P_n satisfying p(x_i) = y_i, i = 0, 1, ..., n, may be written as

p(x) = c · | 0   1 x   x²   ··· xⁿ   |
           | y_0 1 x_0 x_0² ··· x_0ⁿ |
           | ⋮   ⋮  ⋮   ⋮        ⋮   |
           | y_n 1 x_n x_n² ··· x_nⁿ |

where c is a certain constant. Find c and prove the formula.
53. Given f ∈ C[a, b], let L_n(f) denote the polynomial of degree at most n that agrees with f at the x_i's. Prove that L_n is a linear projection onto P_n. That is, show that L_n(αf + βg) = αL_n(f) + βL_n(g), and that L_n(f) = f if and only if f ∈ P_n.
54. Let ℓ_i(x), i = 0, 1, ..., n, denote the Lagrange interpolating polynomials of degree at most n associated with the nodes x_0, x_1, ..., x_n; that is, ℓ_i(x_j) = δ_{i,j}. Show that Σ_{i=0}^n ℓ_i(x) ≡ 1 and, more generally, that Σ_{i=0}^n x_iᵏ ℓ_i(x) = xᵏ, for k = 0, 1, ..., n.
55. If ℓ_i and L_n are as above, show that the error in the Lagrange interpolation formula is (L_n(f) − f)(x) = Σ_{i=0}^n [f(x_i) − f(x)] ℓ_i(x).
56. With ℓ_i and L_n as above, show that ‖L_n(f)‖ ≤ Λ_n‖f‖, where Λ_n = max_{a≤x≤b} Σ_{i=0}^n |ℓ_i(x)|. Show that no smaller number has this property for all f ∈ C[a, b].
Approximation on Finite Sets
Math 682 6/3/98

In this section we consider a question of computational interest: Since best approximations are often very hard to find, how might we approximate the best approximation? The answer to this question lies in approximations over finite sets. Here's the plan:
(1) Fix a finite subset X_m of [a, b] consisting of m distinct points a ≤ x_1 < ··· < x_m ≤ b, and find the best approximation to f out of P_n, considered as a subspace of C(X_m). In other words, if we call the best approximation p_n(X_m), then

max_{1≤i≤m} |f(x_i) − p_n(X_m)(x_i)| = min_{p∈P_n} max_{1≤i≤m} |f(x_i) − p(x_i)| =: E_n(f; X_m).

(A code sketch of this computation follows this list.)
(2) Argue that this process converges (in some sense) to the best approximation on all of [a, b] provided that X_m "gets big" as m → ∞. In actual practice, there's no need to worry about p_n(X_m) converging to p_n (the best approximation on all of [a, b]); rather, we will argue that E_n(f; X_m) → E_n(f) and appeal to "abstract nonsense."
(3) Find an efficient strategy for carrying out items (1) and (2).
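Item (1) is a finite linear program: minimize E over (a_0, ..., a_n, E) subject to −E ≤ f(x_i) − p(x_i) ≤ E for each i. A minimal sketch using scipy's linprog (the function, grid, and degree are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def best_poly_on_grid(f_vals, x, n):
    """Minimax approximation out of P_n on the finite grid x, as an LP."""
    m = len(x)
    V = np.vander(x, n + 1, increasing=True)      # p(x_i) = sum_j a_j x_i^j
    # p(x_i) - E <= f(x_i)  and  -p(x_i) - E <= -f(x_i):
    A = np.block([[V, -np.ones((m, 1))], [-V, -np.ones((m, 1))]])
    b = np.concatenate([f_vals, -f_vals])
    c = np.zeros(n + 2)
    c[-1] = 1.0                                   # minimize the deviation E
    res = linprog(c, A_ub=A, b_ub=b, bounds=[(None, None)] * (n + 2))
    return res.x[:-1], res.x[-1]                  # coefficients and E_n(f; X_m)

x = np.linspace(-1, 1, 30)
coef, E = best_poly_on_grid(np.abs(x), x, n=3)    # f(x) = |x| (illustrative)
print(E)
```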

Observations
1. If m ≤ n + 1, then E_n(f; X_m) = 0. That is, we can always find a polynomial p ∈ P_n that agrees with f at n + 1 (or fewer) points. (How?) Of course, p won't be unique if m < n + 1. (Why?) In any case, we might as well assume that m ≥ n + 2. In fact, as we'll see, the case m = n + 2 is all that we really need to worry about.
2. If X ⊆ Y ⊆ [a, b], then E_n(f; X) ≤ E_n(f; Y) ≤ E_n(f). Indeed, if p ∈ P_n is the best approximation on Y, then

E_n(f; X) ≤ max_{x∈X} |f(x) − p(x)| ≤ max_{x∈Y} |f(x) − p(x)| = E_n(f; Y).

Consequently, we expect E_n(f; X_m) to increase to E_n(f) as X_m "gets big."
Now if we were to repeat our earlier work on characterizing best approximations, restricting ourselves to X_m everywhere, here's what we'd get:
Theorem. Let m ≥ n + 2. Then,
(i) p ∈ P_n is a best approximation to f on X_m if and only if f − p has an alternating set containing n + 2 points out of X_m; that is, f − p = ±E_n(f; X_m), alternately, on X_m.
(ii) p_n(X_m) is unique.

Next let's see how this reduces our study to the case m = n + 2.

Theorem. Fix n, m ≥ n + 2, and f ∈ C[a, b].
(i) If p_n ∈ P_n is best on all of [a, b], then there is a subset X_{n+2} of [a, b], containing n + 2 points, such that p_n = p_n(X_{n+2}). Moreover, E_n(f; X′_{n+2}) ≤ E_n(f) = E_n(f; X_{n+2}) for any other subset X′_{n+2} of [a, b], with equality if and only if p_n(X′_{n+2}) = p_n.
(ii) If p_n(X_m) ∈ P_n is best on X_m, then there is a subset X_{n+2} of X_m such that p_n(X_m) = p_n(X_{n+2}) and E_n(f; X_m) = E_n(f; X_{n+2}). For any other X′_{n+2} ⊆ X_m we have E_n(f; X′_{n+2}) ≤ E_n(f; X_{n+2}) = E_n(f; X_m), with equality if and only if p_n(X′_{n+2}) = p_n(X_m).
Proof. (i): Let X_{n+2} be an alternating set for f − p_n over [a, b] containing exactly n + 2 points. Then, X_{n+2} is also an alternating set for f − p_n over X_{n+2}. That is, for x ∈ X_{n+2},

±(f(x) − p_n(x)) = E_n(f) = max_{y∈X_{n+2}} |f(y) − p_n(y)|.

So, by uniqueness of best approximations on X_{n+2}, we must have p_n = p_n(X_{n+2}) and E_n(f) = E_n(f; X_{n+2}). The second assertion follows from a similar argument using the uniqueness of p_n on [a, b].
(ii): This is just (i) with [a, b] replaced everywhere by X_m.

Here's the point: Through some as yet undisclosed method, we choose X_m with m ≥ n + 2 (in fact, m ≫ n + 2) such that E_n(f; X_m) ≤ E_n(f) ≤ E_n(f; X_m) + ε, and then we search for the "best" X_{n+2} ⊆ X_m, meaning the largest value of E_n(f; X_{n+2}). We then take p_n(X_{n+2}) as an approximation for p_n. As we'll see momentarily, p_n(X_{n+2}) can be computed directly and explicitly.
Now suppose that the elements of X_{n+2} are a ≤ x_0 < x_1 < ··· < x_{n+1} ≤ b, let p = p_n(X_{n+2}) be p(x) = a_0 + a_1x + ··· + a_nxⁿ, and let

E = E_n(f; X_{n+2}) = max_{0≤i≤n+1} |f(x_i) − p(x_i)|.

In order to compute p and E, we use the fact that f(x_i) − p(x_i) = ±E, alternately, and write (for instance)

f(x_0) = E + p(x_0)
f(x_1) = −E + p(x_1)
 ⋮
f(x_{n+1}) = (−1)ⁿ⁺¹E + p(x_{n+1})

(where the "E column" might, instead, read −E, E, ..., (−1)ⁿE). That is, in order to find p and E, we need to solve a system of n + 2 linear equations in the n + 2 unknowns E, a_0, ..., a_n. The determinant of this system is (up to sign)
|  1        1 x_0     ··· x_0ⁿ     |
| −1        1 x_1     ··· x_1ⁿ     |
|  ⋮        ⋮  ⋮           ⋮      | = A_0 + A_1 + ··· + A_{n+1} > 0,
| (−1)ⁿ⁺¹   1 x_{n+1} ··· x_{n+1}ⁿ |
where we have expanded by cofactors along the first column and have used the fact that each minor A_k is a Vandermonde determinant (and hence each A_k > 0). If we apply Cramer's rule to find E we get

E = [f(x_0)A_0 − f(x_1)A_1 + ··· + (−1)ⁿ⁺¹f(x_{n+1})A_{n+1}] / [A_0 + A_1 + ··· + A_{n+1}]
  = λ_0 f(x_0) − λ_1 f(x_1) + ··· + (−1)ⁿ⁺¹ λ_{n+1} f(x_{n+1}),

where λ_i > 0 and Σ_{i=0}^{n+1} λ_i = 1. Moreover, these same λ_i's satisfy Σ_{i=0}^{n+1} (−1)ⁱλ_i q(x_i) = 0 for every polynomial q ∈ P_n, since E = E_n(q; X_{n+2}) = 0 for polynomials of degree at most n (and since Cramer's rule supplies the same coefficients for all f's).
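Concretely, the i-th equation reads a_0 + a_1x_i + ··· + a_nx_iⁿ + (−1)ⁱE = f(x_i), so p and E come from one call to a linear solver. A minimal sketch (the nodes and test function are illustrative):

```python
import numpy as np

def levelled_solution(f_vals, x):
    """Solve f(x_i) = (-1)^i E + p(x_i) on n + 2 nodes for p in P_n and E."""
    m = len(x)                                   # m = n + 2 nodes
    V = np.vander(x, m - 1, increasing=True)     # columns 1, x_i, ..., x_i^n
    signs = (-1.0) ** np.arange(m)               # the alternating "E column"
    sol = np.linalg.solve(np.column_stack([V, signs]), f_vals)
    return sol[:-1], sol[-1]                     # coefficients of p, signed E

x = np.array([-1.0, -0.4, 0.4, 1.0])             # n + 2 = 4 nodes (illustrative)
coef, E = levelled_solution(np.exp(x), x)        # levelled fit to f(x) = e^x
print(coef, abs(E))                              # p in P_2 and E_2(f; X_4)
```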
It may be instructive to see a more explicit solution to this problem. For this, recall that since we have n + 2 points we may interpolate exactly out of P_{n+1}. Given this, our original problem can be rephrased quite succinctly.
Let p be the (unique) polynomial in P_{n+1} satisfying p(x_i) = f(x_i), i = 0, 1, ..., n + 1, and let e be the (unique) polynomial in P_{n+1} satisfying e(x_i) = (−1)ⁱ, i = 0, 1, ..., n + 1. If it is possible to find a scalar λ so that p − λe ∈ P_n, then p − λe = p_n(X_{n+2}) and |λ| = E_n(f; X_{n+2}). Why? Because f − (p − λe) = λe = ±λ, alternately, on X_{n+2}, and so |λ| = max_{x∈X_{n+2}} |f(x) − (p(x) − λe(x))|. Thus, we need to compare leading coefficients of p and e.
Now if p has degree less than n + 1, then p = p_n(X_{n+2}) and E_n(f; X_{n+2}) = 0. Thus, λ = 0 would do nicely in this case. Otherwise, p has degree exactly n + 1 and the question is whether e does too. Now,
e(x) = Σ_{i=0}^{n+1} (−1)ⁱ W(x) / [W′(x_i)(x − x_i)],

where W(x) = ∏_{i=0}^{n+1} (x − x_i), and so the leading coefficient of e is Σ_{i=0}^{n+1} (−1)ⁱ/W′(x_i).
We'll be done if we can convince ourselves that this is nonzero. But

W′(x_i) = ∏_{j≠i} (x_i − x_j) = (−1)ⁿ⁻ⁱ⁺¹ ∏_{j=0}^{i−1} (x_i − x_j) ∏_{j=i+1}^{n+1} (x_j − x_i),

hence (−1)ⁱ/W′(x_i) is of constant sign (−1)ⁿ⁺¹. Finally, since

p(x) = Σ_{i=0}^{n+1} f(x_i) W(x) / [W′(x_i)(x − x_i)],

p has leading coefficient Σ_{i=0}^{n+1} f(x_i)/W′(x_i), and it's easy to find the value of λ.

Conclusion. p_n(X_{n+2}) = p − λe, where

λ = [Σ_{i=0}^{n+1} f(x_i)/W′(x_i)] / [Σ_{i=0}^{n+1} (−1)ⁱ/W′(x_i)] = Σ_{i=0}^{n+1} (−1)ⁱ λ_i f(x_i)

and

λ_i = [1/|W′(x_i)|] / [Σ_{j=0}^{n+1} 1/|W′(x_j)|],

and |λ| = E_n(f; X_{n+2}). Moreover, Σ_{i=0}^{n+1} (−1)ⁱ λ_i q(x_i) = 0 for every q ∈ P_n.
Example
Find the best linear approximation to f(x) = x² on X₄ = {0, 1/3, 2/3, 1} ⊆ [0, 1].
We seek p(x) = a_0 + a_1x, and we need only consider subsets of X₄ of size n + 2 = 3. There are four:

X₄,₁ = {0, 1/3, 2/3}, X₄,₂ = {0, 1/3, 1}, X₄,₃ = {0, 2/3, 1}, X₄,₄ = {1/3, 2/3, 1}.

In each case we find a p and a λ (= ±E in our earlier setup). For instance, in the case of X₄,₂ we would solve the system of equations f(x) = ±λ + p(x) for x = 0, 1/3, 1:

0 = λ⁽²⁾ + a_0                        λ⁽²⁾ = 1/9
1/9 = −λ⁽²⁾ + a_0 + (1/3)a_1    ⟹    a_0 = −1/9
1 = λ⁽²⁾ + a_0 + a_1                  a_1 = 1

In the other three cases you would find that λ⁽¹⁾ = 1/18, λ⁽³⁾ = 1/9, and λ⁽⁴⁾ = 1/18. Since we need the largest λ, we're done: X₄,₂ (or X₄,₃) works, and p₁(X₄)(x) = x − 1/9. (Recall that the best approximation on all of [0, 1] is p₁(x) = x − 1/8.)
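The whole example can be checked by brute force over the four 3-point references (a small sketch; it simply re-solves the levelled system on each subset):

```python
import numpy as np
from itertools import combinations

def levelled(s):
    """Solve x^2 = (-1)^i lam + a0 + a1 x on the 3-point reference s."""
    A = np.column_stack([np.ones(3), s, (-1.0) ** np.arange(3)])
    a0, a1, lam = np.linalg.solve(A, s ** 2)
    return a0, a1, lam

X4 = np.array([0.0, 1 / 3, 2 / 3, 1.0])
refs = [np.array(c) for c in combinations(X4, 3)]
sols = [levelled(s) for s in refs]
for s, (a0, a1, lam) in zip(refs, sols):
    print(s, abs(lam))                      # |lambda|: 1/18, 1/9, 1/9, 1/18

a0, a1, lam = max(sols, key=lambda t: abs(t[2]))
print(a0, a1)                               # -1/9 and 1, i.e. p(x) = x - 1/9
```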

Where does this leave us? We still need to know that there is some hope of finding an initial set X_m with E_n(f) − ε ≤ E_n(f; X_m) ≤ E_n(f), and we need a more efficient means of searching through the (m choose n + 2) subsets X_{n+2} ⊆ X_m. In order to attack the problem of finding an initial X_m, we'll need a few classical inequalities. We won't directly attack the second problem; instead, we'll outline an algorithm that begins with an initial set X⁰_{n+2}, containing exactly n + 2 points, which is then "improved" to some X¹_{n+2} by changing only a single point.

The Inequalities of Markov and Bernstein
In order to discuss the convergence of approximations over finite sets, we will need to know that differentiation is bounded on P_n (a fact that is nearly obvious by itself). The inequality we'll use is due to A. A. Markov from 1889:

Theorem. If p ∈ P_n, and if |p(x)| ≤ 1 for |x| ≤ 1, then |p′(x)| ≤ n² for |x| ≤ 1. Moreover, |p′(x)| = n² can only occur at x = ±1, and only when p = ±T_n, the Chebyshev polynomial of degree n.

Markov's brother, V. A. Markov, later improved on this, in 1916, by showing that |p⁽ᵏ⁾(x)| ≤ T_n⁽ᵏ⁾(1). We've alluded to this fact already (see Rivlin, p. 31), and even more is true. For our purposes, it's enough to have some bound on differentiation; in particular, we'll only use

‖p′‖ ≤ n²‖p‖ and ‖p″‖ ≤ n⁴‖p‖,

where ‖·‖ is the norm in C[−1, 1].
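Both bounds are easy to test numerically, with T_n as the extremal case (a sketch using numpy's Chebyshev class; the degree is illustrative):

```python
import numpy as np
from numpy.polynomial import Chebyshev

n = 5
Tn = Chebyshev.basis(n)                     # T_5, with max |T_5| = 1 on [-1, 1]
x = np.linspace(-1.0, 1.0, 20001)
print(np.max(np.abs(Tn.deriv()(x))))        # 25.0 = n^2, attained at x = +-1
print(np.max(np.abs(Tn.deriv(2)(x))))       # 200.0, well under n^4 = 625
```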
About 20 years after Markov, in 1912, Bernstein asked for a similar bound for the derivative of a complex polynomial over the unit disk |z| ≤ 1. Now the maximum modulus theorem tells us that we may reduce to the case |z| = 1, that is, z = e^{iθ}, and so Bernstein was able to restate the problem in terms of trig polynomials.

Theorem. If S ∈ T_n, and if |S(θ)| ≤ 1, then |S′(θ)| ≤ n. Equality is only possible for S(θ) = sin n(θ − θ_0).
Our plan is to deduce Markov's inequality from Bernstein's inequality by a method of proof due to Pólya and Szegő in 1928. To begin, let's consider the Lagrange interpolation formula in the case where x_i = cos((2i − 1)π/2n), i = 1, ..., n, are the zeros of the Chebyshev polynomial T_n. Recall that we have −1 < x_n < x_{n−1} < ··· < x_1 < 1.

Lemma 1. Each polynomial p ∈ P_{n−1} may be written

p(x) = (1/n) Σ_{i=1}^n p(x_i) (−1)ⁱ⁻¹ √(1 − x_i²) T_n(x)/(x − x_i).

Proof. We know that the Lagrange interpolation formula is exact for polynomials of degree < n, and we know that, up to a constant multiple, T_n(x) is the product W(x) = (x − x_1)···(x − x_n). All that remains is to compute T_n′(x_i). But recall that for x = cos θ we have

T_n′(x) = n sin nθ / sin θ = n sin nθ / √(1 − cos²θ) = n sin nθ / √(1 − x²).

But for x_i = cos((2i − 1)π/2n), i.e., for θ_i = (2i − 1)π/2n, it follows that sin nθ_i = sin((2i − 1)π/2) = (−1)ⁱ⁻¹; that is,

1/T_n′(x_i) = (−1)ⁱ⁻¹ √(1 − x_i²)/n.
Lemma 2. For any polynomial p ∈ P_{n−1}, we have

max_{−1≤x≤1} |p(x)| ≤ max_{−1≤x≤1} |n √(1 − x²) p(x)|.

Proof. To save wear and tear, let's write M = max_{−1≤x≤1} |n √(1 − x²) p(x)|.
First consider an x in the interval [x_n, x_1]; that is, |x| ≤ cos(π/2n) = x_1. In this case we can estimate √(1 − x²) from below:

√(1 − x²) ≥ √(1 − x_1²) = √(1 − cos²(π/2n)) = sin(π/2n) ≥ (2/π)(π/2n) = 1/n,

because sin θ ≥ 2θ/π for 0 ≤ θ ≤ π/2 (from the mean value theorem). Hence, for |x| ≤ cos(π/2n), we get |p(x)| ≤ n √(1 − x²) |p(x)| ≤ M.
Now, for x's outside the interval [x_n, x_1], we apply our interpolation formula. In this case, each of the factors x − x_i is of the same sign. Thus,

|p(x)| = |(1/n) Σ_{i=1}^n p(x_i) (−1)ⁱ⁻¹ √(1 − x_i²) T_n(x)/(x − x_i)|
       ≤ (M/n²) Σ_{i=1}^n |T_n(x)/(x − x_i)| = (M/n²) |Σ_{i=1}^n T_n(x)/(x − x_i)|.

But,

Σ_{i=1}^n T_n(x)/(x − x_i) = T_n′(x) (why?),

and we know that |T_n′(x)| ≤ n². Thus, |p(x)| ≤ M.
We next turn our attention to trig polynomials. As usual, given an algebraic polynomial p ∈ P_n, we will sooner or later consider S(θ) = p(cos θ). In this case, S′(θ) = −p′(cos θ) sin θ is an odd trig polynomial of degree at most n and |S′(θ)| = |p′(cos θ) sin θ| = |p′(x) √(1 − x²)|. Conversely, if S ∈ T_n is an odd trig polynomial, then S(θ)/sin θ is even, and so may be written S(θ)/sin θ = p(cos θ) for some algebraic polynomial p of degree at most n − 1. From Lemma 2,

max_{0≤θ≤2π} |S(θ)/sin θ| = max_{0≤θ≤2π} |p(cos θ)| ≤ n max_{0≤θ≤2π} |p(cos θ) sin θ| = n max_{0≤θ≤2π} |S(θ)|.

This proves

Corollary. If S ∈ T_n is an odd trig polynomial, then

max_{0≤θ≤2π} |S(θ)/sin θ| ≤ n max_{0≤θ≤2π} |S(θ)|.

Now we're ready for Bernstein's inequality.

Bernstein's Inequality. If S ∈ T_n, then

max_{0≤θ≤2π} |S′(θ)| ≤ n max_{0≤θ≤2π} |S(θ)|.

Proof. We first define an auxiliary function f(θ, φ) = [S(θ + φ) − S(θ − φ)]/2. For θ fixed, f(θ, φ) is an odd trig polynomial in φ of degree at most n. Consequently,

|f(θ, φ)/sin φ| ≤ n max_{0≤φ≤2π} |f(θ, φ)| ≤ n max_{0≤θ≤2π} |S(θ)|.

But

S′(θ) = lim_{φ→0} [S(θ + φ) − S(θ − φ)]/(2 sin φ) = lim_{φ→0} f(θ, φ)/sin φ,

and hence |S′(θ)| ≤ n max_{0≤θ≤2π} |S(θ)|.

Finally, we prove Markov's inequality.
Markov's Inequality. If p ∈ P_n, then

max_{−1≤x≤1} |p′(x)| ≤ n² max_{−1≤x≤1} |p(x)|.
Proof. We know that S(θ) = p(cos θ) is a trig polynomial of degree at most n satisfying

max_{−1≤x≤1} |p(x)| = max_{0≤θ≤2π} |p(cos θ)|.

Since S′(θ) = −p′(cos θ) sin θ is also a trig polynomial of degree at most n, Bernstein's inequality yields

max_{0≤θ≤2π} |p′(cos θ) sin θ| ≤ n max_{0≤θ≤2π} |p(cos θ)|.

In other words,

max_{−1≤x≤1} |p′(x) √(1 − x²)| ≤ n max_{−1≤x≤1} |p(x)|.

Since p′ ∈ P_{n−1}, the desired inequality now follows easily from Lemma 2:

max_{−1≤x≤1} |p′(x)| ≤ n max_{−1≤x≤1} |p′(x) √(1 − x²)| ≤ n² max_{−1≤x≤1} |p(x)|.

Convergence of Approximations over Finite Sets
In order to simplify things here, we will make several assumptions: For one, we will consider only approximation over the interval I = [−1, 1]. As before, we consider a fixed f ∈ C[−1, 1] and a fixed integer n = 0, 1, 2, .... For each integer m ≥ 1 we choose a finite subset X_m ⊆ I, consisting of m points −1 ≤ x_1 < ··· < x_m ≤ 1; in addition, we will assume that x_1 = −1 and x_m = 1. If we put

δ_m = max_{x∈I} min_{1≤i≤m} |x − x_i| > 0,

then each x ∈ I is within δ_m of some x_i. If X_m consists of equally spaced points, for example, it's easy to see that δ_m ≤ 1/(m − 1).
Our goal is to prove

Theorem. If δ_m → 0, then E_n(f; X_m) → E_n(f).

And we would hope to accomplish this in such a way that δ_m is a measurable quantity, depending on f, m, and a prescribed tolerance ε = E_n(f) − E_n(f; X_m).
As a first step in this direction, let's bring Markov's inequality into the picture.
Lemma. Suppose that γ_m := δ_m²n⁴/2 < 1. Then, for any p ∈ P_n, we have

(1) max_{−1≤x≤1} |p(x)| ≤ (1 − γ_m)⁻¹ max_{1≤i≤m} |p(x_i)|, and

(2) ω_p([−1, 1]; δ_m) ≤ δ_m n² (1 − γ_m)⁻¹ max_{1≤i≤m} |p(x_i)|.

Proof. (1): Take a in [−1, 1] with |p(a)| = ‖p‖. If a = ±1 ∈ X_m, we're done (since (1 − γ_m)⁻¹ > 1). Otherwise, we'll have −1 < a < 1 and p′(a) = 0. Next, choose x_i ∈ X_m with |a − x_i| ≤ δ_m and apply Taylor's theorem:

p(x_i) = p(a) + (x_i − a) p′(a) + [(x_i − a)²/2] p″(c)

for some c in (−1, 1). Re-writing, we have

|p(a)| ≤ |p(x_i)| + (δ_m²/2) |p″(c)|.

And now we bring in Markov:

‖p‖ ≤ max_{1≤i≤m} |p(x_i)| + (δ_m²n⁴/2) ‖p‖,

which is what we need.
(2): The real point here is that each p ∈ P_n is Lipschitz with constant n²‖p‖. Indeed,

|p(s) − p(t)| = |(s − t) p′(c)| ≤ |s − t| ‖p′‖ ≤ n²‖p‖ |s − t|

(from the mean value theorem and Markov's inequality). Thus, ω_p(δ) ≤ δn²‖p‖ and, combining this with (1), we get

ω_p(δ_m) ≤ δ_m n² ‖p‖ ≤ δ_m n² (1 − γ_m)⁻¹ max_{1≤i≤m} |p(x_i)|.
Now we're ready to compare E_n(f; X_m) to E_n(f). Our result won't be as good as Rivlin's (he uses a fancier version of Markov's inequality), but it will be a bit easier to prove. As in the Lemma, we'll suppose that

γ_m = δ_m²n⁴/2 < 1,

and we'll set

α_m = δ_m n²/(1 − γ_m).

[Note that as δ_m → 0 we also have γ_m → 0 and α_m → 0.]
Theorem. For f ∈ C[−1, 1],

E_n(f; X_m) ≤ E_n(f) ≤ (1 + α_m) E_n(f; X_m) + ω_f([−1, 1]; δ_m) + α_m‖f‖.

Consequently, if δ_m → 0, then E_n(f; X_m) → E_n(f) (as m → ∞).

Proof. Let p = p_n(X_m) ∈ P_n be the best approximation to f on X_m. Recall that

E_n(f; X_m) = max_{1≤i≤m} |f(x_i) − p(x_i)| ≤ E_n(f) ≤ ‖f − p‖.

Our plan is to estimate ‖f − p‖.
Let x ∈ [−1, 1] and choose x_i ∈ X_m with |x − x_i| ≤ δ_m. Then,

|f(x) − p(x)| ≤ |f(x) − f(x_i)| + |f(x_i) − p(x_i)| + |p(x_i) − p(x)|
            ≤ ω_f(δ_m) + E_n(f; X_m) + ω_p(δ_m)
            ≤ ω_f(δ_m) + E_n(f; X_m) + α_m max_{1≤i≤m} |p(x_i)|,

where we've used (2) from the previous Lemma to estimate ω_p(δ_m). All that remains is to revise this last estimate, eliminating reference to p. For this we use the triangle inequality again:

max_{1≤i≤m} |p(x_i)| ≤ max_{1≤i≤m} |f(x_i) − p(x_i)| + max_{1≤i≤m} |f(x_i)| ≤ E_n(f; X_m) + ‖f‖.

Putting all the pieces together gives us our result:

E_n(f) ≤ ω_f(δ_m) + E_n(f; X_m) + α_m [E_n(f; X_m) + ‖f‖].


As Rivlin points out, it is quite possible to give a lower bound on m in the case of, say, equally spaced points, which will give E_n(f; X_m) ≤ E_n(f) ≤ E_n(f; X_m) + ε, but this is surely an inefficient approach to the problem. Instead, we'll discuss the one point exchange algorithm.
The One Point Exchange Algorithm
We're given f ∈ C[−1, 1], n, and ε > 0; a code sketch of the full loop follows these steps.
1. Pick a starting "reference" X_{n+2}. A convenient choice is the set x_i = cos((n + 1 − i)π/(n + 1)), i = 0, 1, ..., n + 1. These are the "peak points" of T_{n+1}; that is, T_{n+1}(x_i) = (−1)ⁿ⁺¹⁻ⁱ (and so T_{n+1} is the polynomial e from our "Conclusion").
2. Find p = p_n(X_{n+2}) and λ (by solving a system of linear equations). Recall that

|λ| = |f(x_i) − p(x_i)| ≤ ‖f − p∗‖ ≤ ‖f − p‖,

where p∗ is the best approximation to f on all of [−1, 1].
3. Find (approximately, if necessary) the "error function" e(x) = f(x) − p(x) and any point η where |f(η) − p(η)| = ‖f − p‖. (According to Powell, this can be accomplished using "local quadratic fits.")
4. Replace an appropriate x_i by η so that the new reference X′_{n+2} = {x′_0, x′_1, ...} has the properties that f(x′_i) − p(x′_i) alternates in sign, and that |f(x′_i) − p(x′_i)| ≥ |λ| for all i. The new polynomial p′ = p_n(X′_{n+2}) and new λ′ must then satisfy

|λ| ≤ min_{0≤i≤n+1} |f(x′_i) − p(x′_i)| ≤ max_{0≤i≤n+1} |f(x′_i) − p′(x′_i)| = |λ′|.

This is an observation due to de la Vallée Poussin: Since f − p alternates in sign on an alternating set for f − p′, it follows that f − p′ increases the minimum error over this set. (See the Theorem on page 53 of "Characterization of Best Approximation" for a precise statement.) Again according to Powell, the new p′ and λ′ can be found quickly through matrix "updating" techniques. (Since we've only changed one of the x_i's, only one row of the matrix on page 82 needs to be changed.)
5. The new λ′ satisfies |λ′| ≤ ‖f − p∗‖ ≤ ‖f − p′‖, and the calculation stops when

‖f − p′‖ − |λ′| = |f(η′) − p′(η′)| − |λ′| < ε.
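Here is a compact sketch of the whole loop. The crude grid search standing in for step 3, the stopping tolerance, and the test function are all illustrative; a production version would use Powell's local quadratic fits and matrix updating:

```python
import numpy as np

def one_point_exchange(f, n, eps=1e-9, max_iter=100):
    """Approximate the best p in P_n on [-1, 1] by single-point exchanges."""
    # Step 1: peak points of T_{n+1} as the starting reference.
    ref = np.cos((n + 1 - np.arange(n + 2)) * np.pi / (n + 1))
    grid = np.linspace(-1.0, 1.0, 4001)            # stand-in for step 3
    for _ in range(max_iter):
        # Step 2: solve the levelled system for p and lambda.
        V = np.vander(ref, n + 1, increasing=True)
        signs = (-1.0) ** np.arange(n + 2)
        *coef, lam = np.linalg.solve(np.column_stack([V, signs]), f(ref))
        p = np.polynomial.Polynomial(coef)
        # Step 3: find a point where |f - p| is (nearly) maximal.
        err = f(grid) - p(grid)
        eta = grid[np.argmax(np.abs(err))]
        # Step 5: stop when the reference error matches the global error.
        if np.max(np.abs(err)) - abs(lam) < eps:
            return p, abs(lam)
        # Step 4: swap eta into the reference, keeping the signs alternating.
        s, res = np.sign(f(eta) - p(eta)), f(ref) - p(ref)
        if ref[0] <= eta <= ref[-1]:
            k = int(np.argmin(np.abs(ref - eta)))  # nearest reference point
            if np.sign(res[k]) != s:
                k += 1 if eta > ref[k] else -1     # its sign-matching neighbor
            ref[k] = eta
        elif eta < ref[0]:
            if np.sign(res[0]) == s:
                ref[0] = eta
            else:
                ref = np.concatenate([[eta], ref[:-1]])
        else:
            if np.sign(res[-1]) == s:
                ref[-1] = eta
            else:
                ref = np.concatenate([ref[1:], [eta]])
    return p, abs(lam)

p, lam = one_point_exchange(np.exp, 3)             # f(x) = e^x (illustrative)
print(lam)                                         # roughly E_3(e^x), about 5.5e-3
```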
A Brief Introduction to Fourier Series
Math 682 6/8/98

The Fourier series of a 2π-periodic (bounded, integrable) function f is

a_0/2 + Σ_{k=1}^∞ (a_k cos kx + b_k sin kx),

where the coefficients are defined by

a_k = (1/π) ∫_{−π}^{π} f(t) cos kt dt and b_k = (1/π) ∫_{−π}^{π} f(t) sin kt dt.

Please note that if f is Riemann integrable on [−π, π], then each of these integrals is well-defined and finite; indeed,

|a_k| ≤ (1/π) ∫_{−π}^{π} |f(t)| dt,

and so, for example, we would have |a_k| ≤ 2‖f‖ for f ∈ C^{2π}.
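A minimal sketch computing a few coefficients by numerical quadrature (the test function is illustrative; it reappears below as Problem 57):

```python
import numpy as np
from scipy.integrate import quad

def fourier_coefficients(f, n):
    """a_0, ..., a_n and b_1, ..., b_n via a_k = (1/pi) * int f(t) cos(kt) dt."""
    a = [quad(lambda t: f(t) * np.cos(k * t), -np.pi, np.pi)[0] / np.pi
         for k in range(n + 1)]
    b = [quad(lambda t: f(t) * np.sin(k * t), -np.pi, np.pi)[0] / np.pi
         for k in range(1, n + 1)]
    return np.array(a), np.array(b)

f = lambda t: (np.pi - t % (2 * np.pi)) ** 2   # 2pi-periodic test function
a, b = fourier_coefficients(f, 4)
print(a)            # a_0 = 2 pi^2 / 3 and a_k = 4 / k^2
print(b)            # essentially zero, since f is even
```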
We write the partial sums of the series as

s_n(f)(x) = a_0/2 + Σ_{k=1}^n (a_k cos kx + b_k sin kx).
Now while sn(f) need not converge pointwise to f (in fact, it may even diverge at a given
point), and while sn (f) is not typically a good uniform approximation to f, it is still a
very natural choice for an approximation to f in the \least-squares" sense (which we'll
make precise shortly). Said in other words, the Fourier series for f provides a useful
representation for f even if it fails to converge pointwise to f.
Observations
1. The functions 1, cos x, cos 2x, ..., sin x, sin 2x, ..., are orthogonal on [−π, π]. That is,

∫_{−π}^{π} cos mx cos nx dx = ∫_{−π}^{π} sin mx sin nx dx = ∫_{−π}^{π} cos mx sin nx dx = 0

for any m ≠ n (and the last equation even holds for m = n),

∫_{−π}^{π} cos² mx dx = ∫_{−π}^{π} sin² mx dx = π

for any m ≠ 0, and, of course, ∫_{−π}^{π} 1 dx = 2π.
2. What this means is that if T(x) = α_0/2 + Σ_{k=1}^n (α_k cos kx + β_k sin kx), then

(1/π) ∫_{−π}^{π} T(x) cos mx dx = (α_m/π) ∫_{−π}^{π} cos² mx dx = α_m

for m ≠ 0, while

(1/π) ∫_{−π}^{π} T(x) dx = (α_0/2π) ∫_{−π}^{π} dx = α_0.

That is, if T ∈ T_n, then T is actually equal to its own Fourier series.
3. The partial sum operator s_n is a linear projection from C^{2π} onto T_n.
4. If T(x) = α_0/2 + Σ_{k=1}^n (α_k cos kx + β_k sin kx) is a trig polynomial, then

(1/π) ∫_{−π}^{π} f(x) T(x) dx = (α_0/2π) ∫_{−π}^{π} f(x) dx + Σ_{k=1}^n (α_k/π) ∫_{−π}^{π} f(x) cos kx dx + Σ_{k=1}^n (β_k/π) ∫_{−π}^{π} f(x) sin kx dx
 = α_0a_0/2 + Σ_{k=1}^n (α_k a_k + β_k b_k),

where (a_k) and (b_k) are the Fourier coefficients for f. [This should remind you of the dot product of the coefficients.]
5. Motivated by 1, 2, and 4, we define the inner product of two elements f, g ∈ C^{2π} by

⟨f, g⟩ = (1/π) ∫_{−π}^{π} f(x) g(x) dx.

Note that from 4 we have ⟨f, s_n(f)⟩ = ⟨s_n(f), s_n(f)⟩ for any n. (Why?)
6. If some f ∈ C^{2π} has a_k = b_k = 0 for all k, then f ≡ 0.
Indeed, by 4 (or linearity of the integral), this means that

∫_{−π}^{π} f(x) T(x) dx = 0

for any trig polynomial T. But from Weierstrass's second theorem we know that f is the uniform limit of some sequence of trig polynomials (T_n). Thus,

∫_{−π}^{π} f(x)² dx = lim_{n→∞} ∫_{−π}^{π} f(x) T_n(x) dx = 0.

Since f is continuous, this easily implies that f ≡ 0.
7. If f, g ∈ C^{2π} have the same Fourier series, then f ≡ g. Hence, the Fourier series for an f ∈ C^{2π} provides a representation for f (even if the series fails to converge to f).
8. The coefficients a_0, a_1, ..., a_n and b_1, b_2, ..., b_n minimize the expression

φ(a_0, a_1, ..., b_n) = ∫_{−π}^{π} [f(x) − s_n(f)(x)]² dx.

It's not hard to see, for example, that

∂φ/∂a_k = −2 ∫_{−π}^{π} [f(x) − s_n(f)(x)] cos kx dx = 0

precisely when a_k satisfies

∫_{−π}^{π} f(x) cos kx dx = a_k ∫_{−π}^{π} cos² kx dx.
9. The partial sum s_n(f) is the best approximation to f out of T_n relative to the L₂ norm

‖f‖₂ = ⟨f, f⟩^{1/2} = [(1/π) ∫_{−π}^{π} f(x)² dx]^{1/2}.

(Be forewarned: Some authors prefer 1/2π in place of 1/π.) That is,

‖f − s_n(f)‖₂ = min_{T∈T_n} ‖f − T‖₂.

Moreover, using 4 and 5, we have

‖f − s_n(f)‖₂² = ⟨f − s_n(f), f − s_n(f)⟩
 = ⟨f, f⟩ − 2⟨f, s_n(f)⟩ + ⟨s_n(f), s_n(f)⟩
 = ‖f‖₂² − ‖s_n(f)‖₂²
 = (1/π) ∫_{−π}^{π} f(x)² dx − a_0²/2 − Σ_{k=1}^n (a_k² + b_k²).

[This should remind you of the Pythagorean theorem.]
10. It follows from 9 that

(1/π) ∫_{−π}^{π} s_n(f)(x)² dx = a_0²/2 + Σ_{k=1}^n (a_k² + b_k²) ≤ (1/π) ∫_{−π}^{π} f(x)² dx.

In other symbols, ‖s_n(f)‖₂ ≤ ‖f‖₂. In particular, the Fourier coefficients of any f ∈ C^{2π} are square summable. (Why?)
11. If f ∈ C^{2π}, then its Fourier coefficients (a_n) and (b_n) tend to zero as n → ∞.
12. It follows from 10 and Weierstrass's second theorem that s_n(f) → f in the L₂ norm whenever f ∈ C^{2π}. Indeed, given ε > 0, choose a trig polynomial T such that ‖f − T‖ < ε. Then, since s_n(T) = T for large enough n, we have

‖f − s_n(f)‖₂ ≤ ‖f − T‖₂ + ‖s_n(T − f)‖₂ ≤ 2‖f − T‖₂ ≤ 2√2 ‖f − T‖ < 2√2 ε.

(Compare this calculation with Lebesgue's Theorem, page 74.)
By way of comparison, let's give a simple class of functions whose Fourier partial sums
provide good uniform approximations.
Theorem. If f″ ∈ C^{2π}, then the Fourier series for f converges absolutely and uniformly to f.
Proof. First notice that integration by parts leads to an estimate on the order of growth of the Fourier coefficients of f:

πa_k = ∫_{−π}^{π} f(x) cos kx dx = ∫_{−π}^{π} f(x) d(sin kx/k) = −(1/k) ∫_{−π}^{π} f′(x) sin kx dx

(because f is 2π-periodic). Thus, |a_k| ≤ 2‖f′‖/k → 0 as k → ∞. Now we integrate by parts again:

−πka_k = ∫_{−π}^{π} f′(x) sin kx dx = −∫_{−π}^{π} f′(x) d(cos kx/k) = (1/k) ∫_{−π}^{π} f″(x) cos kx dx

(because f′ is 2π-periodic). Thus, |a_k| ≤ 2‖f″‖/k² → 0 as k → ∞. More importantly, this inequality (along with the Weierstrass M-test) implies that the Fourier series for f is both uniformly and absolutely convergent:

|a_0/2| + Σ_{k=1}^∞ |a_k cos kx + b_k sin kx| ≤ |a_0/2| + Σ_{k=1}^∞ (|a_k| + |b_k|) ≤ C Σ_{k=1}^∞ 1/k² < ∞.
But why should the series actually converge to f? Well, if we call the sum

g(x) = a_0/2 + Σ_{k=1}^∞ (a_k cos kx + b_k sin kx),

then g ∈ C^{2π} (why?) and g has the same Fourier coefficients as f (why?). Hence (by 7), g = f.
Our next chore is to find a closed expression for s_n(f). For this we'll need a couple of trig identities; the first two need no explanation.

cos kt cos kx + sin kt sin kx = cos k(t − x)

2 cos α sin β = sin(α + β) − sin(α − β)

1/2 + cos θ + cos 2θ + ··· + cos nθ = sin(n + 1/2)θ / (2 sin(θ/2))

Here's a short proof for the third:

sin(θ/2) + Σ_{k=1}^n 2 cos kθ sin(θ/2) = sin(θ/2) + Σ_{k=1}^n [sin(k + 1/2)θ − sin(k − 1/2)θ] = sin(n + 1/2)θ.

The function

D_n(t) = sin(n + 1/2)t / (2 sin(t/2))

is called Dirichlet's kernel. It plays an important role in our next calculation.
Now we're ready to re-write our formula for s_n(f).

s_n(f)(x) = a_0/2 + Σ_{k=1}^n (a_k cos kx + b_k sin kx)
 = (1/π) ∫_{−π}^{π} f(t) [1/2 + Σ_{k=1}^n (cos kt cos kx + sin kt sin kx)] dt
 = (1/π) ∫_{−π}^{π} f(t) [1/2 + Σ_{k=1}^n cos k(t − x)] dt
 = (1/π) ∫_{−π}^{π} f(t) sin(n + 1/2)(t − x) / [2 sin((t − x)/2)] dt
 = (1/π) ∫_{−π}^{π} f(t) D_n(t − x) dt = (1/π) ∫_{−π}^{π} f(x + t) D_n(t) dt.
It now follows easily that s_n(f) is linear in f (because integration against D_n is linear), that s_n(f) ∈ T_n (because D_n ∈ T_n), and, in fact, that s_n(T_m) = T_{min(m,n)}. In other words, s_n is indeed a linear projection onto T_n.
While we know that s_n(f) is a good approximation to f in the L₂ norm, a better understanding of its effectiveness as a uniform approximation will require a better understanding of the Dirichlet kernel D_n. Here are a few pertinent facts:

Lemma. (a) D_n is even,
(b) (1/π) ∫_{−π}^{π} D_n(t) dt = (2/π) ∫_0^{π} D_n(t) dt = 1,
(c) |D_n(t)| ≤ n + 1/2 and D_n(0) = n + 1/2,
(d) |sin(n + 1/2)t|/t ≤ |D_n(t)| ≤ (π/2)·|sin(n + 1/2)t|/t for 0 < t < π,
(e) If λ_n = (1/π) ∫_{−π}^{π} |D_n(t)| dt, then (4/π²) log n ≤ λ_n ≤ 3 + log n.
Proof. (a), (b), and (c) are relatively clear from the fact that

D_n(t) = 1/2 + cos t + cos 2t + ··· + cos nt.

(Notice, too, that (b) follows from the fact that s_n(1) = 1.) For (d) we use a more delicate estimate: Since 2θ/π ≤ sin θ for 0 ≤ θ ≤ π/2, it follows that 2t/π ≤ 2 sin(t/2) ≤ t for 0 < t < π. Hence,

|sin(n + 1/2)t|/t ≤ |sin(n + 1/2)t| / [2 sin(t/2)] ≤ (π/2)·|sin(n + 1/2)t|/t

for 0 < t < π. Next, the upper estimate in (e) is easy:

(2/π) ∫_0^{π} |D_n(t)| dt = (2/π) ∫_0^{π} |sin(n + 1/2)t| / [2 sin(t/2)] dt
 ≤ (2/π) ∫_0^{1/n} (n + 1/2) dt + (2/π) ∫_{1/n}^{π} (π/2t) dt
 = (2n + 1)/(nπ) + log π + log n < 3 + log n.
The lower estimate takes some work:

(2/π) ∫_0^{π} |D_n(t)| dt = (2/π) ∫_0^{π} |sin(n + 1/2)t| / [2 sin(t/2)] dt ≥ (2/π) ∫_0^{π} |sin(n + 1/2)t| / t dt
 = (2/π) ∫_0^{(n+1/2)π} |sin x| / x dx ≥ (2/π) ∫_0^{nπ} |sin x| / x dx
 = (2/π) Σ_{k=1}^n ∫_{(k−1)π}^{kπ} |sin x| / x dx
 ≥ (2/π) Σ_{k=1}^n (1/kπ) ∫_{(k−1)π}^{kπ} |sin x| dx = (4/π²) Σ_{k=1}^n 1/k
 ≥ (4/π²) log n,

because Σ_{k=1}^n 1/k ≥ log n.
The numbers λ_n = ‖D_n‖₁ = (1/π) ∫_{−π}^{π} |D_n(t)| dt are called the Lebesgue numbers associated to this process (compare this to the terminology we used for interpolation). The point here is that λ_n gives the norm of the partial sum operator (projection) s_n on C^{2π} and (just as with interpolation) λ_n → ∞ as n → ∞. As a matter of no small curiosity, notice that, from Observation 10, the norm of s_n as an operator on L₂ is 1.
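Both bounds in (e) are easy to watch numerically (a sketch; the quadrature settings are illustrative):

```python
import numpy as np
from scipy.integrate import quad

def lebesgue_number(n):
    """lambda_n = (1/pi) * integral of |D_n| over [-pi, pi]."""
    D = lambda t: np.sin((n + 0.5) * t) / (2 * np.sin(t / 2))
    val, _ = quad(lambda t: abs(D(t)), 0, np.pi, limit=400)
    return 2 * val / np.pi                     # D_n is even

for n in [4, 16, 64, 256]:
    lower, upper = 4 / np.pi ** 2 * np.log(n), 3 + np.log(n)
    print(n, lower, lebesgue_number(n), upper) # lower <= lambda_n <= upper
```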
Corollary. If f ∈ C^{2π}, then

|s_n(f)(x)| ≤ (1/π) ∫_{−π}^{π} |f(x + t)| |D_n(t)| dt ≤ λ_n‖f‖. (∗)

In particular, ‖s_n(f)‖ ≤ λ_n‖f‖ ≤ (3 + log n)‖f‖.
If we approximate the function sgn D_n by a continuous function f of norm one, then

s_n(f)(0) ≈ (1/π) ∫_{−π}^{π} |D_n(t)| dt = λ_n.

Thus, λ_n is the smallest constant that works in (∗). The fact that the partial sum operators are not uniformly bounded on C^{2π}, along with the Baire category theorem, tells us that there must be some f ∈ C^{2π} for which ‖s_n(f)‖ is unbounded. But, as we've seen, this has more to do with projections than it does with Fourier series:
Theorem. (Kharshiladze, Lozinski) For each n, let L_n be a continuous, linear projection from C^{2π} onto T_n. Then, there is some f ∈ C^{2π} for which ‖L_n(f) − f‖ is unbounded.
Although our last Corollary may not look very useful, it does give us some information about the effectiveness of s_n(f) as a uniform approximation to f. Specifically, we have Lebesgue's theorem:
Theorem. If f ∈ C^{2π}, and if we set E_n^T(f) = min_{T∈T_n} ‖f − T‖, then

E_n^T(f) ≤ ‖f − s_n(f)‖ ≤ (4 + log n) E_n^T(f).

Proof. Let T∗ be the best uniform approximation to f out of T_n. Then, since s_n(T∗) = T∗, we get

‖f − s_n(f)‖ ≤ ‖f − T∗‖ + ‖s_n(T∗ − f)‖ ≤ (4 + log n) ‖f − T∗‖.
As an application of Lebesgue's theorem, let's speak briefly about "Chebyshev series," a notion that fits neatly in between our discussions of approximation by algebraic polynomials and by trig polynomials.

Theorem. Suppose that f ∈ C[−1, 1] is twice continuously differentiable. Then, f may be written as a uniformly and absolutely convergent Chebyshev series; that is, f(x) = Σ_{k=0}^∞ a_k T_k(x), where Σ_{k=0}^∞ |a_k| < ∞.
Proof. As usual, consider φ(θ) = f(cos θ) ∈ C^{2π}. Since φ is even and twice differentiable, its Fourier series is an absolutely and uniformly convergent cosine series:

f(cos θ) = φ(θ) = Σ_{k=0}^∞ a_k cos kθ = Σ_{k=0}^∞ a_k T_k(cos θ),

where |a_k| ≤ 2‖φ″‖/k². Thus, f(x) = Σ_{k=0}^∞ a_k T_k(x).
If we write S_n(f)(x) = Σ_{k=0}^n a_k T_k(x), we get an interesting consequence of this Theorem. First, notice that

S_n(f)(cos θ) = s_n(φ)(θ).

Thus, from Lebesgue's theorem,

E_n(f) ≤ ‖f − S_n(f)‖_{C[−1,1]} = ‖φ − s_n(φ)‖_{C^{2π}} ≤ (4 + log n) E_n^T(φ) = (4 + log n) E_n(f).

For n < 400, this reads

E_n(f) ≤ ‖f − S_n(f)‖ ≤ 10 E_n(f).
That is, for numerical purposes, the error incurred by using Σ_{k=0}^n a_k T_k(x) to approximate f is within one decimal place accuracy of the best approximation! Notice, too, that E_n(f) would be very easy to estimate in this case since

E_n(f) ≤ ‖f − S_n(f)‖ = ‖Σ_{k>n} a_k T_k‖ ≤ Σ_{k>n} |a_k| ≤ 2‖φ″‖ Σ_{k>n} 1/k².
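numpy can produce near-best Chebyshev partial sums directly, which makes the "within one decimal place" claim easy to watch (a sketch; the function and degrees are illustrative):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

f = lambda x: np.exp(x)                        # smooth test function
# Interpolation at Chebyshev points gives coefficients close to the a_k's.
coef = C.chebinterpolate(f, 30)
x = np.linspace(-1, 1, 2001)
for n in [2, 4, 8]:
    Sn = C.Chebyshev(coef[:n + 1])             # the partial sum S_n(f)
    print(n, np.max(np.abs(f(x) - Sn(x))))     # decays rapidly, like the tail sum
```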
Lebesgue's theorem should remind you of our "fancy" version of Bernstein's theorem: if we knew that E_n^T(f) log n → 0 as n → ∞, then we'd know that s_n(f) converged uniformly to f. Our goal, then, is to improve our estimates on E_n^T(f), and the idea behind these improvements is to replace D_n by a better kernel (with regard to uniform approximation). Before we pursue anything quite so delicate as an estimate on E_n^T(f), though, let's investigate a simple (and useful) replacement for D_n.
Since the sequence of partial sums (s_n) need not converge to f, we might try looking at their arithmetic means (or Cesàro sums):

σ_n = (s_0 + s_1 + ··· + s_{n−1})/n.

(These averages typically have better convergence properties than the partial sums themselves. Consider σ_n in the (scalar) case s_n = (−1)ⁿ, for example.) Specifically, we set

σ_n(f)(x) = (1/n)[s_0(f)(x) + ··· + s_{n−1}(f)(x)]
 = (1/π) ∫_{−π}^{π} f(x + t) [(1/n) Σ_{k=0}^{n−1} D_k(t)] dt = (1/π) ∫_{−π}^{π} f(x + t) K_n(t) dt,

where K_n = (D_0 + D_1 + ··· + D_{n−1})/n is called Fejér's kernel. The same techniques we used earlier can be applied to find a closed form for σ_n(f) which, of course, reduces to simplifying (D_0 + D_1 + ··· + D_{n−1})/n. As before, we begin with a trig identity:

Σ_{k=0}^{n−1} 2 sin θ sin(2k + 1)θ = Σ_{k=0}^{n−1} [cos 2kθ − cos(2k + 2)θ] = 1 − cos 2nθ = 2 sin² nθ.

Thus,

K_n(t) = (1/n) Σ_{k=0}^{n−1} sin(2k + 1)(t/2) / [2 sin(t/2)] = sin²(nt/2) / [2n sin²(t/2)].
Please note that K_n is even, nonnegative, and (1/π) ∫_{−π}^{π} K_n(t) dt = 1. Thus, σ_n(f) is a positive, linear map from C^{2π} into T_n (but it's not a projection; why?), satisfying ‖σ_n(f)‖₂ ≤ ‖f‖₂ (why?).
Now the arithmetic mean operator σ_n(f) is still a good approximation to f in the L₂ norm. Indeed,

‖f − σ_n(f)‖₂ = ‖(1/n) Σ_{k=0}^{n−1} (f − s_k(f))‖₂ ≤ (1/n) Σ_{k=0}^{n−1} ‖f − s_k(f)‖₂ → 0

as n → ∞ (since ‖f − s_k(f)‖₂ → 0). But, more to the point, σ_n(f) is actually a good uniform approximation to f, a fact that we'll call Fejér's theorem:

Theorem. If f ∈ C^{2π}, then σ_n(f) converges uniformly to f as n → ∞.

Note that, since σ_n(f) ∈ T_n, Fejér's theorem implies Weierstrass's second theorem. Curiously, Fejér was only 19 years old when he proved this result (about 1900), while Weierstrass was 75 at the time he proved his approximation theorems.
We'll give two proofs of Fejér's theorem: one with details, one without. But both follow from quite general considerations. First:

Theorem. Suppose that k_n ∈ C^{2π} satisfies
(a) k_n ≥ 0,
(b) (1/π) ∫_{−π}^{π} k_n(t) dt = 1, and
(c) ∫_{δ≤|t|≤π} k_n(t) dt → 0 for every δ > 0.
Then, (1/π) ∫_{−π}^{π} f(x + t) k_n(t) dt → f(x) uniformly, for each f ∈ C^{2π}.

Proof. Let ε > 0. Since f is uniformly continuous, we may choose δ > 0 so that |f(x) − f(x + t)| < ε, for any x, whenever |t| < δ. Next, we use the fact that k_n is nonnegative and integrates to 1 to write

|f(x) − (1/π) ∫_{−π}^{π} f(x + t) k_n(t) dt| = |(1/π) ∫_{−π}^{π} [f(x) − f(x + t)] k_n(t) dt|
 ≤ (1/π) ∫_{−π}^{π} |f(x) − f(x + t)| k_n(t) dt
 ≤ (ε/π) ∫_{|t|<δ} k_n(t) dt + (2‖f‖/π) ∫_{δ≤|t|} k_n(t) dt
 < ε + ε = 2ε

for n sufficiently large.
To see that Fejér's kernel satisfies the conditions of the Theorem is easy: In particular, (c) follows from the fact that K_n(t) → 0 uniformly on the set δ ≤ |t| ≤ π. Indeed, since sin(t/2) increases on [δ, π], we have

K_n(t) = sin²(nt/2) / [2n sin²(t/2)] ≤ 1 / [2n sin²(δ/2)] → 0.
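A short experiment contrasting σ_n(f) with s_n(f), with the coefficients computed by a crude trapezoidal rule (everything about the setup is illustrative):

```python
import numpy as np

t = np.linspace(-np.pi, np.pi, 20001)
f = np.abs(t) / np.pi                           # a continuous 2pi-periodic test f

def a(k): return np.trapz(f * np.cos(k * t), t) / np.pi
def b(k): return np.trapz(f * np.sin(k * t), t) / np.pi

def s(n):                                       # partial sum s_n(f) on the grid
    return a(0) / 2 + sum(a(k) * np.cos(k * t) + b(k) * np.sin(k * t)
                          for k in range(1, n + 1))

N = 32
partial_sums = [s(n) for n in range(N)]
sigma = sum(partial_sums) / N                   # the Cesaro mean sigma_N(f)
print(np.max(np.abs(f - partial_sums[-1])))     # error of s_{N-1}
print(np.max(np.abs(f - sigma)))                # error of sigma_N: also -> 0,
                                                # as Fejer's theorem guarantees
```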
Our second proof, or sketch, really, is based on a variant of the Bohman-Korovkin theorem for C^{2π}, due to Korovkin. In this setting, the three "test cases" are

f_0(x) = 1, f_1(x) = cos x, and f_2(x) = sin x.

Theorem. Let (L_n) be a sequence of positive, linear maps on C^{2π}. If L_n(f) → f uniformly for each of the three functions f_0(x) = 1, f_1(x) = cos x, and f_2(x) = sin x, then L_n(f) → f uniformly for every f ∈ C^{2π}.

We won't prove this theorem; rather, we'll check that σ_n(f) → f uniformly in each of the three test cases. Since s_n is a projection, this is painfully simple!
σ_n(f_0) = (1/n)(f_0 + f_0 + ··· + f_0) = f_0
σ_n(f_1) = (1/n)(0 + f_1 + f_1 + ··· + f_1) = ((n − 1)/n) f_1
σ_n(f_2) = (1/n)(0 + f_2 + f_2 + ··· + f_2) = ((n − 1)/n) f_2.

Kernel operators abound in analysis; for example, Landau's proof of the Weierstrass theorem uses the kernel L_n(x) = c_n(1 − x²)ⁿ. And, in the next section, we'll encounter Jackson's kernel J_n(t) = c_n sin⁴ nt/(n³ sin⁴ t), which is essentially the square of Fejér's kernel. While we will have no need for a general theory of such operators, please note that the key to their utility is the fact that they're nonnegative!
Lastly, a word or two about Fourier series involving complex coefficients. Most modern textbooks consider the case of a 2π-periodic, integrable function f : R → C and define the Fourier series of f by

Σ_{k=−∞}^{∞} c_k e^{ikt},

where now we have only one formula for the c_k's:

c_k = (1/2π) ∫_{−π}^{π} f(t) e^{−ikt} dt,

but, of course, the c_k's may well be complex. This somewhat simpler approach has other advantages; for one, the exponentials e^{ikt} are now an orthonormal set (relative to the normalizing constant 1/2π). And, if we remain consistent with this choice and define the L₂ norm by

‖f‖₂ = [(1/2π) ∫_{−π}^{π} |f(t)|² dt]^{1/2},

then we have the simpler estimate ‖f‖₂ ≤ ‖f‖ for f ∈ C^{2π}.
The Dirichlet and Fejér kernels are essentially the same in this case, too, except that we would now write s_n(f)(x) = Σ_{k=−n}^n c_k e^{ikx}. Given this, the Dirichlet and Fejér kernels can be written

D_n(x) = Σ_{k=−n}^n e^{ikx} = 1 + Σ_{k=1}^n (e^{ikx} + e^{−ikx})
 = 1 + 2 Σ_{k=1}^n cos kx
 = sin(n + 1/2)x / sin(x/2)

and

K_n(x) = (1/n) Σ_{m=0}^{n−1} Σ_{k=−m}^m e^{ikx} = Σ_{k=−n}^n (1 − |k|/n) e^{ikx}
 = (1/n) Σ_{m=0}^{n−1} sin(m + 1/2)x / sin(x/2)
 = sin²(nx/2) / [n sin²(x/2)].

In other words, each is twice its real-coefficient counterpart. Since the choice of normalizing constant (1/π versus 1/2π, and sometimes even 1/√π or 1/√2π) has a (small) effect on these formulas, you may find some variation in other textbooks.
Problem Set: Fourier Series
Math 682 6/8/98

57. Define f(x) = (π − x)² for 0 ≤ x ≤ 2π, and extend f to a 2π-periodic continuous function on R in the obvious way. Check that the Fourier series for f is π²/3 + 4 Σ_{n=1}^∞ cos nx/n². Since this series is uniformly convergent, it actually converges to f. In particular, note that setting x = 0 yields the familiar formula Σ_{n=1}^∞ 1/n² = π²/6.
58. (a) Given n ≥ 1 and ε > 0, show that there is a continuous function f ∈ C^{2π} satisfying ‖f‖ = 1 and (1/π) ∫_{−π}^{π} |f(t) − sgn D_n(t)| dt < ε/(n + 1).
(b) Show that s_n(f)(0) ≥ λ_n − ε and, hence, that ‖s_n(f)‖ ≥ λ_n − ε.
59. (a) If f, k ∈ C^{2π}, prove that g(x) = ∫_{−π}^{π} f(x + t) k(t) dt is also in C^{2π}.
(b) If we only assume that f is 2π-periodic and Riemann integrable on [−π, π] (but still k ∈ C^{2π}), is g still continuous?
(c) If we simply assume that f and k are 2π-periodic and Riemann integrable on [−π, π], is g still continuous?
60. Suppose that k_n ∈ C^{2π} satisfies

k_n ≥ 0, (1/π) ∫_{−π}^{π} k_n(t) dt = 1, and ∫_{δ≤|t|≤π} k_n(t) dt → 0 (n → ∞)

for every δ > 0. If f is Riemann integrable, show that (1/π) ∫_{−π}^{π} f(x + t) k_n(t) dt → f(x) pointwise, as n → ∞, at each point of continuity of f. In particular, σ_n(f)(x) → f(x) at each point of continuity of f.
61. Given f, g ∈ C^{2π}, we define the convolution of f and g, written f ∗ g, by

(f ∗ g)(x) = (1/π) ∫_{−π}^{π} f(t) g(x − t) dt.

(Compare this integral with that used in problem 59.)
(a) Show that f ∗ g = g ∗ f and that f ∗ g ∈ C^{2π}.
(b) If one of f or g is a trig polynomial, show that f ∗ g is again a trig polynomial (of the same degree).
(c) If one of f or g is continuously differentiable, show that f ∗ g is likewise continuously differentiable and find an integral formula for (f ∗ g)′(x).
Jackson's Theorems
Math 682 6/16/98

We continue our investigations of the "middle ground" between algebraic and trigonometric approximation by presenting several results due to the great American mathematician Dunham Jackson (from roughly 1911-1912). The first of these results will give us the best possible estimate of E_n(f) in terms of ω_f and n.

Jackson's Theorem 1. If f ∈ C^{2π}, then E_n^T(f) ≤ 6 ω_f([−π, π]; 1/n).

Theorem 1 should be viewed as an improvement over Bernstein's Theorem, which stated that E_n(f) ≤ (3/2) ω_f(1/√n) for f ∈ C[−1, 1]. As we'll see, the proof of Theorem 1 not only mimics the proof of Bernstein's result, but also uses some of the ideas we talked about in the last section. In particular, the proof we'll give involves integration against an "improved" Dirichlet kernel.
Before we dive into the proof, let's list several immediate and important Corollaries:

Corollary. Weierstrass's second theorem (since ω_f(1/n) → 0 for any f ∈ C^{2π}).

Corollary. The Dini-Lipschitz theorem: If ω_f(1/n) log n → 0 as n → ∞, then the Fourier series for f converges uniformly to f.

Proof. From Lebesgue's theorem,

‖f − s_n(f)‖ ≤ (4 + log n) E_n^T(f) ≤ 6 (4 + log n) ω_f(1/n) → 0.
Jackson's Theorem 2. If f ∈ C[−1, 1], then E_n(f) ≤ 6 ω_f([−1, 1]; 1/n).

Proof. Let φ(θ) = f(cos θ). Then, as we've seen,

E_n(f) = E_n^T(φ) ≤ 6 ω_φ([−π, π]; 1/n) ≤ 6 ω_f([−1, 1]; 1/n),

where the last inequality follows from the fact that

|φ(α) − φ(β)| = |f(cos α) − f(cos β)| ≤ ω_f(|cos α − cos β|) ≤ ω_f(|α − β|).
Corollary. If f ∈ lip_K α on [−1, 1], then E_n(f) ≤ 6K n^{−α}. (Recall that Bernstein's theorem gives only n^{−α/2}.)

Corollary. If f ∈ C[−1, 1] has a bounded derivative, then E_n(f) ≤ (6/n) ‖f′‖.

Corollary. If f ∈ C[−1, 1] has a continuous derivative, then E_n(f) ≤ (6/n) E_{n−1}(f′).

Proof. Let p∗ ∈ P_{n−1} be the best uniform approximation to f′ and consider p(x) = ∫_{−1}^x p∗(t) dt ∈ P_n. From the previous Corollary,

E_n(f) = E_n(f − p) (Why?) ≤ (6/n) ‖f′ − p∗‖ = (6/n) E_{n−1}(f′).

Iterating this last inequality will give the following result:

Corollary. If f ∈ C[−1, 1] is k-times continuously differentiable, then

E_n(f) ≤ [6ᵏ⁺¹ / (n(n − 1)···(n − k + 1))] ω_k(1/(n − k)),

where ω_k is the modulus of continuity of f⁽ᵏ⁾.
Well, enough corollaries. It's time we proved Jackson's Theorem 1. Now Jackson's approach was to show that

(1/π) ∫_{−π}^{π} f(x + t) c_n (sin nt/sin t)⁴ dt → f(x) uniformly,

where J_n(t) = c_n(sin nt/sin t)⁴ is the "improved" kernel we alluded to earlier (it's essentially the square of Fejér's kernel). The approach we'll take, due to Korovkin, proves the existence of a suitable kernel without giving a tidy formula for it. On the other hand, it's relatively easy to outline the idea. The key here is that J_n(t) should be an even, nonnegative trig polynomial of degree n with (1/π) ∫_{−π}^{π} J_n(t) dt = 1. In other words,

J_n(t) = 1/2 + Σ_{k=1}^n ρ_{k,n} cos kt

(why is the first term 1/2?), where ρ_{1,n}, ..., ρ_{n,n} must be chosen so that J_n(t) ≥ 0. Assuming we can find such ρ_{k,n}'s, here's what we get:
Lemma. If f ∈ C^{2π}, then

|f(x) − (1/π) ∫_{−π}^{π} f(x + t) J_n(t) dt| ≤ ω_f(1/n) [1 + nπ √((1 − ρ_{1,n})/2)].

Proof. We already know how the first several lines of the proof will go:

|f(x) − (1/π) ∫_{−π}^{π} f(x + t) J_n(t) dt| = |(1/π) ∫_{−π}^{π} [f(x) − f(x + t)] J_n(t) dt|
 ≤ (1/π) ∫_{−π}^{π} |f(x) − f(x + t)| J_n(t) dt
 ≤ (1/π) ∫_{−π}^{π} ω_f(|t|) J_n(t) dt.

Next we borrow a trick from Bernstein. We replace ω_f(|t|) by

ω_f(|t|) = ω_f(n|t| · (1/n)) ≤ (1 + n|t|) ω_f(1/n),

and so the last integral on the right-hand side, above, is dominated by

ω_f(1/n) (1/π) ∫_{−π}^{π} (1 + n|t|) J_n(t) dt = ω_f(1/n) [1 + n (1/π) ∫_{−π}^{π} |t| J_n(t) dt].

All that remains is to estimate (1/π) ∫_{−π}^{π} |t| J_n(t) dt, and for this we'll appeal to the Cauchy-Schwarz inequality (again, compare this to the proof of Bernstein's theorem).

(1/π) ∫_{−π}^{π} |t| J_n(t) dt = (1/π) ∫_{−π}^{π} |t| J_n(t)^{1/2} J_n(t)^{1/2} dt
 ≤ [(1/π) ∫_{−π}^{π} |t|² J_n(t) dt]^{1/2} [(1/π) ∫_{−π}^{π} J_n(t) dt]^{1/2}
 = [(1/π) ∫_{−π}^{π} |t|² J_n(t) dt]^{1/2}.

But,

|t|² ≤ π² sin²(t/2) = (π²/2)(1 − cos t) for |t| ≤ π.

So,

(1/π) ∫_{−π}^{π} |t| J_n(t) dt ≤ [(π²/2) (1/π) ∫_{−π}^{π} (1 − cos t) J_n(t) dt]^{1/2} = π √((1 − ρ_{1,n})/2).
Now we still have to prove that we can actually find a suitable choice of scalars ρ_{1,n}, ..., ρ_{n,n}. We already know that we need to choose the ρ_{k,n}'s so that J_n(t) will be nonnegative, but now it's clear that we also want ρ_{1,n} to be very close to 1. To get us started, let's first see why it's easy to generate nonnegative cosine polynomials. Given real numbers c_0, ..., c_n, note that

0 ≤ |Σ_{k=0}^n c_k e^{ikx}|² = (Σ_{k=0}^n c_k e^{ikx})(Σ_{j=0}^n c_j e^{−ijx}) = Σ_{k,j} c_k c_j e^{i(k−j)x}
 = Σ_{k=0}^n c_k² + Σ_{k>j} c_k c_j (e^{i(k−j)x} + e^{i(j−k)x})
 = Σ_{k=0}^n c_k² + 2 Σ_{k>j} c_k c_j cos(k − j)x
 = Σ_{k=0}^n c_k² + 2 Σ_{k=0}^{n−1} c_k c_{k+1} cos x + ··· + 2 c_0 c_n cos nx. (∗)

In particular, we need to find c_0, ..., c_n with

Σ_{k=0}^n c_k² = 1/2 and ρ_{1,n} = 2 Σ_{k=0}^{n−1} c_k c_{k+1} ≈ 1.
What we'll do is find c_k's with Σ_{k=0}^{n−1} c_k c_{k+1} ≈ Σ_{k=0}^n c_k², and then normalize. But, in fact, we won't actually find anything; we'll simply write down a choice of c_k's that happens to work! Consider:
Σ_{k=0}^n sin((k+1)π/(n+2)) sin((k+2)π/(n+2)) = Σ_{k=0}^n sin(kπ/(n+2)) sin((k+1)π/(n+2))
 = (1/2) Σ_{k=0}^n [sin(kπ/(n+2)) + sin((k+2)π/(n+2))] sin((k+1)π/(n+2)).

By changing the index of summation (and using sin 0 = 0 = sin((n+2)π/(n+2))), it's easy to see that the first two sums are equal and, hence, each is equal to the average of the two. Next we re-write this last sum, using the trig identity (1/2)[sin A + sin B] = cos((A−B)/2) sin((A+B)/2), to get

Σ_{k=0}^n sin((k+1)π/(n+2)) sin((k+2)π/(n+2)) = cos(π/(n+2)) Σ_{k=0}^n sin²((k+1)π/(n+2)).
Since cos(π/(n+2)) ≈ 1 for large n, we've done it! If we define c_k = c sin((k+1)π/(n+2)), where c is chosen so that Σ_{k=0}^n c_k² = 1/2, and if we define J_n(x) using (∗), then J_n(x) ≥ 0 and ρ_{1,n} = cos(π/(n+2)) (why?). The estimate needed in our Lemma becomes
1 + nπ √((1 − cos(π/(n+2)))/2) = 1 + nπ sin(π/(2(n+2))) ≤ 1 + nπ²/(2(n+2)) ≤ 1 + π²/2 < 6,

since π²/2 ≈ 4.93 < 5. This completes the proof of Jackson's Theorem 1.
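The construction is easy to verify numerically: build c_k = c·sin((k+1)π/(n+2)), form J_n via (∗) as |Σ_k c_k e^{ikt}|², and check nonnegativity and ρ_{1,n} = cos(π/(n+2)). A sketch (the degree is illustrative):

```python
import numpy as np

n = 8
theta = np.pi / (n + 2)
c = np.sin((np.arange(n + 1) + 1) * theta)
c /= np.sqrt(2 * np.sum(c ** 2))               # normalize: sum c_k^2 = 1/2

t = np.linspace(-np.pi, np.pi, 5001)
# J_n(t) = |sum_k c_k e^{ikt}|^2 = 1/2 + sum_k rho_{k,n} cos(kt), i.e. (*)
J = np.abs(np.exp(1j * np.outer(t, np.arange(n + 1))) @ c) ** 2
print(J.min() >= 0)                            # True: nonnegative by construction
print(np.trapz(J, t) / np.pi)                  # 1.0: (1/pi) * integral of J_n

rho1 = 2 * np.sum(c[:-1] * c[1:])              # the coefficient of cos t
print(rho1, np.cos(theta))                     # equal: rho_{1,n} = cos(pi/(n+2))
```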