1
00:00:01,550 --> 00:00:03,920
The following content is
provided under a Creative

2
00:00:03,920 --> 00:00:05,310
Commons license.

3
00:00:05,310 --> 00:00:07,520
Your support will help
MIT OpenCourseWare

4
00:00:07,520 --> 00:00:11,610
continue to offer high-quality
educational resources for free.

5
00:00:11,610 --> 00:00:14,180
To make a donation or to
view additional materials

6
00:00:14,180 --> 00:00:18,140
from hundreds of MIT courses,
visit MIT OpenCourseWare

7
00:00:18,140 --> 00:00:19,026
at ocw.mit.edu.

8
00:00:26,090 --> 00:00:27,410
GILBERT STRANG: OK.

9
00:00:27,410 --> 00:00:31,070
Let me just say our next
topic, moving really

10
00:00:31,070 --> 00:00:35,210
toward machine
learning, would be

11
00:00:35,210 --> 00:00:38,990
the last section
of chapter 6 about

12
00:00:38,990 --> 00:00:40,730
stochastic gradient descent.

13
00:00:43,280 --> 00:00:49,810
I think Professor Sra
from computer science,

14
00:00:49,810 --> 00:00:54,410
if he is able, he will take
the class Friday and tell us

15
00:00:54,410 --> 00:01:01,130
about SGD, Stochastic Gradient
Descent, a critical algorithm.

16
00:01:01,130 --> 00:01:07,550
I mean that, at a quick look,
stochastic gradient descent

17
00:01:07,550 --> 00:01:11,720
does smaller batches at a time,
so it's computationally faster

18
00:01:11,720 --> 00:01:14,810
than pure gradient descent.

19
00:01:14,810 --> 00:01:17,150
But also, stochastic
gradient descent

20
00:01:17,150 --> 00:01:20,990
is able to solve
probabilistic problems where

21
00:01:20,990 --> 00:01:24,300
you're trying to minimize
an expected value instead

22
00:01:24,300 --> 00:01:25,310
of a function.

23
00:01:25,310 --> 00:01:30,220
Anyway, it's an important thing.

24
00:01:30,220 --> 00:01:35,450
Actually, there is
a lecture at 2:00

25
00:01:35,450 --> 00:01:42,020
in neuroscience about the
gradient descent algorithms.

26
00:01:42,020 --> 00:01:45,110
And then Professor Boyd
is speaking here at 4:30

27
00:01:45,110 --> 00:01:46,040
this afternoon.

28
00:01:46,040 --> 00:01:49,330
It's overwhelming.

29
00:01:49,330 --> 00:01:57,350
So I'm going to talk today
about several topics that are

30
00:01:57,350 --> 00:02:01,250
specific parts of optimization.

31
00:02:01,250 --> 00:02:04,910
Linear programming is a very
famous part of optimization.

32
00:02:04,910 --> 00:02:07,170
I don't know if you've met it.

33
00:02:07,170 --> 00:02:13,400
I think it's worth knowing
what the inputs are

34
00:02:13,400 --> 00:02:17,270
and what's the key fact
about linear programming.

35
00:02:17,270 --> 00:02:20,870
And, of course, also,
what are the algorithms.

36
00:02:20,870 --> 00:02:23,150
That's usually what
we want to know.

37
00:02:23,150 --> 00:02:25,460
What's the problem?

38
00:02:25,460 --> 00:02:27,980
What are the results,
the mathematical results?

39
00:02:27,980 --> 00:02:30,910
And what are the
computational tools?

40
00:02:30,910 --> 00:02:33,220
Yes, so that's very established.

41
00:02:33,220 --> 00:02:37,200
Max-flow min-cut is one
specific linear program

42
00:02:37,200 --> 00:02:41,780
that I think maybe would
serve as the best example.

43
00:02:41,780 --> 00:02:46,310
And then two-person
games, have you met those?

44
00:02:46,310 --> 00:02:48,830
So would you know
what the rules are

45
00:02:48,830 --> 00:02:53,330
in a two-person zero-sum game?

46
00:02:53,330 --> 00:02:57,930
In game theory, those are
the ones that are clear--

47
00:02:57,930 --> 00:03:04,230
everything's
well-established there.

48
00:03:04,230 --> 00:03:07,850
And, in fact, they're
equivalent to a linear program.

49
00:03:07,850 --> 00:03:10,640
So that's why those two
are coming in today.

50
00:03:10,640 --> 00:03:15,860
And then the key fact about
two-person games and about

51
00:03:15,860 --> 00:03:17,870
linear programming is duality.

52
00:03:17,870 --> 00:03:20,060
So if there's any
word on that board,

53
00:03:20,060 --> 00:03:25,640
it's the one in capital
letters that has math content.

54
00:03:25,640 --> 00:03:28,190
I'm just going to start
with linear programming

55
00:03:28,190 --> 00:03:35,190
and then move on.

56
00:03:35,190 --> 00:03:37,965
So what's a linear program?

57
00:03:37,965 --> 00:03:41,610
We're optimizing
a linear cost function.

58
00:03:41,610 --> 00:03:46,580
So we're minimizing
the cost c transpose x.

59
00:03:46,580 --> 00:03:50,810
So that vector x is the
unknown that we're looking for,

60
00:03:50,810 --> 00:03:53,210
and this vector c
is the cost vector.

61
00:03:53,210 --> 00:03:59,540
So that is c1 x1
plus ... plus cn xn.

62
00:03:59,540 --> 00:04:02,450
So you can see why it's
called linear programming.

63
00:04:02,450 --> 00:04:04,130
The cost is linear.

64
00:04:04,130 --> 00:04:06,560
The constraints are also linear.

65
00:04:06,560 --> 00:04:19,820
So the constraints on x are a set
of linear equations, Ax equals b.

66
00:04:19,820 --> 00:04:23,480
Of course, I'm not thinking of
A as being a square invertible

67
00:04:23,480 --> 00:04:24,070
matrix.

68
00:04:24,070 --> 00:04:25,100
No way.

69
00:04:25,100 --> 00:04:27,680
If it were, that would
tell us what x had to be

70
00:04:27,680 --> 00:04:29,420
and our problem would be over.

71
00:04:33,350 --> 00:04:36,630
We have n unknowns.

72
00:04:36,630 --> 00:04:39,470
This is m by n, of course.

73
00:04:39,470 --> 00:04:41,060
x is n by 1.

74
00:04:41,060 --> 00:04:44,000
That's our unknown vector.

75
00:04:44,000 --> 00:04:49,520
And I'm thinking that m
would be smaller than n.

76
00:04:49,520 --> 00:04:52,250
But we do have
constraints, and now comes

77
00:04:52,250 --> 00:04:56,780
the thing that makes linear
programming not actually

78
00:04:56,780 --> 00:04:58,280
linear.

79
00:04:58,280 --> 00:05:03,410
And that's the constraint
x greater or equal to 0.

80
00:05:03,410 --> 00:05:05,960
And I've written
that as a vector

81
00:05:05,960 --> 00:05:14,840
but that means x1 greater
or equal to 0, on to xn

82
00:05:14,840 --> 00:05:16,110
greater or equal to 0.

83
00:05:16,110 --> 00:05:19,010
So it's a vector inequality.

84
00:05:19,010 --> 00:05:24,080
So we have minimizing
a very simple function

85
00:05:24,080 --> 00:05:28,170
with pretty straightforward
constraints but inequality

86
00:05:28,170 --> 00:05:29,570
constraints.

87
00:05:29,570 --> 00:05:32,690
So the set of--

88
00:05:32,690 --> 00:05:36,210
these two together
are the constraints.

89
00:05:36,210 --> 00:05:38,060
And in linear algebra--

90
00:05:38,060 --> 00:05:45,710
in linear programming language
this is called a feasible set

91
00:05:45,710 --> 00:05:48,680
of x's.

92
00:05:48,680 --> 00:05:54,250
It's the constraint set,
capital K in the notes.

93
00:05:54,250 --> 00:06:01,660
So let me draw a
picture to show--

94
00:06:01,660 --> 00:06:03,910
or maybe I'll just ask.

95
00:06:03,910 --> 00:06:06,040
How many of you are
already familiar

96
00:06:06,040 --> 00:06:08,060
with linear programming?

97
00:06:08,060 --> 00:06:09,700
Oh, quite a lot.

98
00:06:09,700 --> 00:06:11,900
So I won't belabor the point.

99
00:06:11,900 --> 00:06:16,600
But let me just draw a picture.

100
00:06:16,600 --> 00:06:22,340
So my picture will be just here.

101
00:06:22,340 --> 00:06:25,730
So that's n equals 3.

102
00:06:25,730 --> 00:06:30,880
So this is the x1, x2, x3 space.

103
00:06:30,880 --> 00:06:34,450
And x greater or
equal to 0 means what?

104
00:06:34,450 --> 00:06:40,830
It means that I'm in this
1/8 of this space, right?

105
00:06:40,830 --> 00:06:44,310
It's the non-negative--
well, I would say quadrant

106
00:06:44,310 --> 00:06:48,190
but I really should
say octant because it's

107
00:06:48,190 --> 00:06:51,630
1/8 of the full 3D space.

108
00:06:51,630 --> 00:06:53,250
So I'm in here.

109
00:06:53,250 --> 00:06:59,250
And then maybe I have m as
maybe just one constraint.

110
00:06:59,250 --> 00:07:08,880
So let me take the cost to be,
say, x1 plus 2 x2 plus 5 x3,

111
00:07:08,880 --> 00:07:10,360
say.

112
00:07:10,360 --> 00:07:14,100
And the constraint, I'm just
going to have one equation--

113
00:07:14,100 --> 00:07:16,110
x1 plus x2 plus x3 equals 3.

114
00:07:16,110 --> 00:07:22,380
Let me make it easy, make that
constraint an easy one to--

115
00:07:22,380 --> 00:07:27,960
so this is what we
want to minimize.

116
00:07:27,960 --> 00:07:33,210
And that's what we have
to satisfy, as well

117
00:07:33,210 --> 00:07:35,070
as x greater or equal to 0.

118
00:07:35,070 --> 00:07:38,150
So that's a plane.

119
00:07:38,150 --> 00:07:41,410
One equation gives
us a plane in R3.

120
00:07:41,410 --> 00:07:43,085
The plane would go through--

121
00:07:47,222 --> 00:07:57,110
would hit the axes at 3, 0, 0.

122
00:07:57,110 --> 00:07:59,990
And so the min--

123
00:07:59,990 --> 00:08:02,840
the point has to
lie on that plane.

124
00:08:02,840 --> 00:08:07,250
And it has to lie in the octant,
so that plane is chopped off

125
00:08:07,250 --> 00:08:08,960
to be a triangle.

126
00:08:08,960 --> 00:08:13,820
This is a good visualization
of linear programming.

127
00:08:13,820 --> 00:08:15,470
And what's the conclusion?

128
00:08:15,470 --> 00:08:17,210
Well, our cost is linear.

129
00:08:19,880 --> 00:08:24,800
So the result is that one
of these three corners

130
00:08:24,800 --> 00:08:27,860
is the winner.

131
00:08:27,860 --> 00:08:29,040
It could happen.

132
00:08:29,040 --> 00:08:32,570
It could happen that maybe I
have equal values on those two

133
00:08:32,570 --> 00:08:35,480
corners, and therefore
all the way along

134
00:08:35,480 --> 00:08:37,309
would be also winners.

135
00:08:37,309 --> 00:08:39,559
But when I have a
linear function,

136
00:08:39,559 --> 00:08:41,960
its maximum or minimum is at the ends.

137
00:08:41,960 --> 00:08:46,370
And these are the ends,
those three corners.

138
00:08:46,370 --> 00:08:56,330
So (3, 0, 0), (0, 3, 0), or
(0, 0, 3) are the three corners,

139
00:08:56,330 --> 00:08:59,640
and one of those is the winner
and the problem is solved.

140
00:08:59,640 --> 00:09:03,290
And, in fact, for this
case, since I'm minimizing,

141
00:09:03,290 --> 00:09:06,230
I guess it would be
that corner that wins.

142
00:09:06,230 --> 00:09:09,080
So let me give it
a star, not an x.

143
00:09:09,080 --> 00:09:10,480
Yes.

144
00:09:10,480 --> 00:09:12,980
3, 0, 0 because that--

145
00:09:12,980 --> 00:09:13,600
oh.

146
00:09:13,600 --> 00:09:16,100
No, that's 3.

147
00:09:16,100 --> 00:09:24,980
So the value turned out to
be 3 at the point 3, 0, 0.

148
00:09:24,980 --> 00:09:26,050
And that's x star.

149
00:09:30,290 --> 00:09:30,875
Good.

150
00:09:30,875 --> 00:09:33,770
Good, good, good.

151
00:09:33,770 --> 00:09:36,290
What more do I
want to do, I said?

152
00:09:36,290 --> 00:09:37,530
Yes.

153
00:09:37,530 --> 00:09:39,322
AUDIENCE: Shouldn't it
be at point 0, 0, 3?

154
00:09:41,780 --> 00:09:43,650
GILBERT STRANG: Should it be--

155
00:09:43,650 --> 00:09:44,870
oh, no.

156
00:09:44,870 --> 00:09:47,730
0, 0, 3 would have x3 is 3.

157
00:09:47,730 --> 00:09:48,350
That's fine.

158
00:09:48,350 --> 00:09:49,630
It gives us a chance, I think.

159
00:09:49,630 --> 00:09:52,030
That would give me a cost of 15.

160
00:09:52,030 --> 00:09:53,930
This would give me a
cost of 6, and that

161
00:09:53,930 --> 00:09:55,270
would give me a cost of 3.

162
00:09:55,270 --> 00:09:55,770
Yes.
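
The corner comparison just described can be checked with a few lines of Python. This is only a sketch of the arithmetic in the example: the cost vector (1, 2, 5) and the three corners come from the lecture, while the names `c`, `corners`, and `cost` are chosen here for illustration.

```python
# Cost vector and corners from the example in the lecture:
# minimize x1 + 2*x2 + 5*x3 over the triangle with corners
# (3, 0, 0), (0, 3, 0), (0, 0, 3).
c = (1, 2, 5)
corners = [(3, 0, 0), (0, 3, 0), (0, 0, 3)]

def cost(x):
    """Linear cost c^T x."""
    return sum(ci * xi for ci, xi in zip(c, x))

# Evaluate the cost at every corner and keep the cheapest one.
costs = {corner: cost(corner) for corner in corners}
x_star = min(costs, key=costs.get)

print(costs)                   # cost at each corner
print(x_star, costs[x_star])   # the winning corner and its cost
```

Listing every corner works here only because there are three of them. It is not a practical method once m and n are large.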

163
00:09:58,690 --> 00:10:02,750
It's obvious that if we could
enumerate all the corners,

164
00:10:02,750 --> 00:10:09,290
we would have a super fast
way to get the answer.

165
00:10:09,290 --> 00:10:14,260
But the trouble is, of course,
that for large values of m

166
00:10:14,260 --> 00:10:17,810
and n, there are
exponentially many corners,

167
00:10:17,810 --> 00:10:21,830
and we don't want
to see them all.

168
00:10:21,830 --> 00:10:30,050
So that's what makes linear
programming take time and need

169
00:10:30,050 --> 00:10:31,200
ideas.

170
00:10:31,200 --> 00:10:33,800
So, for algorithms,
there are two types

171
00:10:33,800 --> 00:10:39,370
of algorithms, two types
of codes that solve these.

172
00:10:39,370 --> 00:10:45,320
The older, well-established
ones are the--

173
00:10:45,320 --> 00:10:56,860
so one way is the simplex
method, which finds a corner.

174
00:10:56,860 --> 00:11:00,730
We know the optimum
is one of the corners.

175
00:11:00,730 --> 00:11:07,020
So it will find one corner, and
it will go to the next corner.

176
00:11:09,970 --> 00:11:15,280
So if it starts at
one of these corners,

177
00:11:15,280 --> 00:11:21,610
it will travel along an
edge that lowers the cost.

178
00:11:21,610 --> 00:11:25,050
And it has to stop at the
end because it's linear.

179
00:11:25,050 --> 00:11:27,610
The cost will keep going
down and going down

180
00:11:27,610 --> 00:11:32,200
until it bumps
into the end point,

181
00:11:32,200 --> 00:11:33,910
and then it can't go further.

182
00:11:33,910 --> 00:11:36,940
And then we would, from
that next corner, we

183
00:11:36,940 --> 00:11:38,185
would go to the next corner.

184
00:11:42,940 --> 00:11:49,090
Each time, it's like steepest
descent on the edges.

185
00:11:49,090 --> 00:11:52,550
From the first corner,
we go the steepest way

186
00:11:52,550 --> 00:11:56,320
till we can't go further,
we've hit another corner.

187
00:11:56,320 --> 00:11:59,020
And we recompute
the linear algebra,

188
00:11:59,020 --> 00:12:03,880
find which direction
is steepest from there.

189
00:12:03,880 --> 00:12:06,490
So that's the idea of
the simplex method,

190
00:12:06,490 --> 00:12:09,190
which was invented by Dantzig.

191
00:12:15,900 --> 00:12:20,310
And the algebra is not going
to be in today's lecture

192
00:12:20,310 --> 00:12:23,020
but it's straightforward.

193
00:12:23,020 --> 00:12:25,340
Well, people have to
optimize it because that's

194
00:12:25,340 --> 00:12:28,200
a highly frequently used method.

195
00:12:28,200 --> 00:12:29,580
Yes.

196
00:12:29,580 --> 00:12:35,250
But then, about 20 years
later maybe, or 30, I

197
00:12:35,250 --> 00:12:40,980
remember going to a lecture in
downtown Boston by Karmarkar.

198
00:12:40,980 --> 00:12:44,360
So I have to put his name down--

199
00:12:44,360 --> 00:12:44,995
Karmarkar.

200
00:12:50,670 --> 00:12:55,950
And he was in The New York
Times, all the newspapers,

201
00:12:55,950 --> 00:12:58,890
so it was a big deal.

202
00:12:58,890 --> 00:13:04,190
He had an alternative algorithm.

203
00:13:04,190 --> 00:13:09,860
And the exact
algorithm he proposed

204
00:13:09,860 --> 00:13:15,200
hasn't survived until
today but the idea has.

205
00:13:15,200 --> 00:13:21,920
And the idea was to travel
inside the feasible set and not

206
00:13:21,920 --> 00:13:23,420
around the edges.

207
00:13:23,420 --> 00:13:27,800
So his idea-- because
in here, would maybe

208
00:13:27,800 --> 00:13:31,670
travel down near that edge,
start again, travel again.

209
00:13:31,670 --> 00:13:38,900
So it's steepest descent in the
constraint set, the feasible

210
00:13:38,900 --> 00:13:39,920
set.

211
00:13:39,920 --> 00:13:46,310
And you can use
calculus and this idea.

212
00:13:46,310 --> 00:13:51,440
So this is the interior
point method.

213
00:13:51,440 --> 00:13:54,440
So I'll just use
the word interior.

214
00:13:54,440 --> 00:14:01,550
That's telling us that we're
inside the feasible set.

215
00:14:01,550 --> 00:14:04,490
We don't hit, come all
the way to the boundary

216
00:14:04,490 --> 00:14:06,920
intentionally,
because we want room

217
00:14:06,920 --> 00:14:10,490
to move, and to find
derivatives, and to use

218
00:14:10,490 --> 00:14:13,780
Newton's method to minimize.

219
00:14:13,780 --> 00:14:17,930
You choose a search direction,
just as all of optimization

220
00:14:17,930 --> 00:14:18,950
does.

221
00:14:18,950 --> 00:14:20,780
And in that search
direction, you

222
00:14:20,780 --> 00:14:25,130
track it and you
stop before you bump

223
00:14:25,130 --> 00:14:27,800
into the edge of
the feasible set.

224
00:14:27,800 --> 00:14:30,010
And then you compute
derivatives again.

225
00:14:30,010 --> 00:14:32,270
So you can use calculus.

226
00:14:32,270 --> 00:14:35,120
And it's a method that--

227
00:14:35,120 --> 00:14:37,130
it's an idea that--

228
00:14:37,130 --> 00:14:40,760
the interior idea
came before Karmarkar.

229
00:14:40,760 --> 00:14:46,460
But he got super publicity,
so it really got attention,

230
00:14:46,460 --> 00:14:48,020
got new thinking.

231
00:14:48,020 --> 00:14:52,080
And new ideas came partly
from people at MIT.

232
00:14:56,140 --> 00:15:01,180
And these two are now still
locked in competition.

233
00:15:01,180 --> 00:15:06,710
One hasn't beaten the
other in all problems.

234
00:15:06,710 --> 00:15:11,390
So linear programming is here.

235
00:15:11,390 --> 00:15:13,760
But then, for
nonlinear programming,

236
00:15:13,760 --> 00:15:17,330
quadratic programming,
where the cost is quadratic,

237
00:15:17,330 --> 00:15:21,320
nonlinear programming,
semi-definite programming--

238
00:15:21,320 --> 00:15:24,890
that's where you have a
matrix unknown and matrix

239
00:15:24,890 --> 00:15:27,200
constraints--

240
00:15:27,200 --> 00:15:31,250
those are all-- the more
complicated you get,

241
00:15:31,250 --> 00:15:35,960
the more it tends to be
interior point methods.

242
00:15:35,960 --> 00:15:39,147
That's a summary of
linear programming.

243
00:15:41,890 --> 00:15:44,500
Now I'd like to give an example.

244
00:15:44,500 --> 00:15:46,390
And then I haven't
told you the main--

245
00:15:46,390 --> 00:15:48,525
well, let me tell you the
main fact about duality.

246
00:15:51,430 --> 00:15:55,090
This is really-- and I'll write
that maybe here next to it.

247
00:15:58,550 --> 00:16:05,230
So duality is there's a
dual program, a dual LP.

248
00:16:05,230 --> 00:16:12,770
And that dual is going to do
a maximum instead of a minimum.

249
00:16:12,770 --> 00:16:20,030
And the cost is going to involve
the b from the primal problem.

250
00:16:20,030 --> 00:16:24,980
So this is now called the primal
problem, and this is its dual.

251
00:16:24,980 --> 00:16:27,320
So they're twin problems.

252
00:16:27,320 --> 00:16:30,230
They use the same data
but quite differently.

253
00:16:30,230 --> 00:16:33,030
It's like transposing things.

254
00:16:33,030 --> 00:16:36,890
So let's call it the unknown y.

255
00:16:36,890 --> 00:16:47,030
So for y1 to ym, I guess,
because b, the right-hand side

256
00:16:47,030 --> 00:16:49,940
over here, is m by 1.

257
00:16:53,240 --> 00:16:59,190
So that's maximize that subject
to-- what are the constraints?

258
00:16:59,190 --> 00:17:03,080
Well, the cost function over
here, c, the cost vector,

259
00:17:03,080 --> 00:17:06,260
goes into the constraints.

260
00:17:06,260 --> 00:17:10,220
And I think the greater or
equal sign probably becomes

261
00:17:10,220 --> 00:17:12,190
a less or equal to sign.

262
00:17:12,190 --> 00:17:15,650
The A gets transposed.

263
00:17:15,650 --> 00:17:20,660
I think that's probably
the constraint in the dual.

264
00:17:20,660 --> 00:17:23,020
And it happens it doesn't need
a sign constraint on y.

265
00:17:23,020 --> 00:17:26,690
We don't have y greater
or equal to 0 over here.

266
00:17:26,690 --> 00:17:28,820
So that's a dual problem.

267
00:17:28,820 --> 00:17:31,910
It has a linear cost.

268
00:17:31,910 --> 00:17:35,720
It has linear
inequality constraints.

269
00:17:35,720 --> 00:17:38,480
It can be solved by
the simplex method.

270
00:17:38,480 --> 00:17:41,570
You could choose whether you
solve that one or this one,

271
00:17:41,570 --> 00:17:45,761
because if you solve
one, you solve the other.

272
00:17:45,761 --> 00:17:48,530
The two are closely
connected and that's

273
00:17:48,530 --> 00:17:51,140
the key idea of duality.
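
Written out, the primal and dual pair described in words above would look roughly as follows. This is reconstructed from the spoken description (A is m by n, x has n components, y has m components), so the exact board layout is an assumption.

```latex
\begin{aligned}
\text{Primal:}\quad & \min_{x}\; c^{T}x
  \quad \text{subject to } Ax = b,\; x \ge 0, \\
\text{Dual:}\quad & \max_{y}\; b^{T}y
  \quad \text{subject to } A^{T}y \le c .
\end{aligned}
```

The same data A, b, c appears in both problems, with b and c trading places between cost and constraint.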

274
00:17:51,140 --> 00:17:56,745
So maybe I'll put the idea of,
first of all, a weak duality.

275
00:18:03,030 --> 00:18:08,160
Which says that this quantity
that we're trying to maximize--

276
00:18:08,160 --> 00:18:11,180
we're getting it as
large as possible--

277
00:18:11,180 --> 00:18:18,630
is always less or equal
to any c transpose x.

278
00:18:18,630 --> 00:18:28,020
This is for any
feasible, allowed--

279
00:18:28,020 --> 00:18:32,430
any x and y that
satisfy the constraints.

280
00:18:32,430 --> 00:18:35,640
Remember, feasible means
satisfies the constraints.

281
00:18:35,640 --> 00:18:39,300
So, in other words, this
problem, this maximization

282
00:18:39,300 --> 00:18:42,190
problem, you're trying
to push this up.

283
00:18:42,190 --> 00:18:46,020
The minimization problem,
you're trying to pull this down.

284
00:18:46,020 --> 00:18:49,950
And it's easier to show
that the one, the b

285
00:18:49,950 --> 00:18:52,860
transpose-- it's easier
to show that inequality.

286
00:18:52,860 --> 00:18:54,360
Let's do that.

287
00:18:54,360 --> 00:18:56,150
So b transpose y proof.

288
00:18:58,730 --> 00:19:01,560
It should be a one-line proof.

289
00:19:01,560 --> 00:19:04,290
So b transpose y--

290
00:19:06,990 --> 00:19:07,830
what do I do?

291
00:19:07,830 --> 00:19:11,950
So b transpose y-- well,
I look over here.

292
00:19:11,950 --> 00:19:14,280
That's where b shows up.

293
00:19:14,280 --> 00:19:24,900
That's x transpose A
transpose y, because b is Ax.

294
00:19:24,900 --> 00:19:29,730
I'm feasible, so my b is Ax.

295
00:19:29,730 --> 00:19:32,700
And for any x-- this
is for any

296
00:19:32,700 --> 00:19:35,930
x that b is going to be Ax.

297
00:19:35,930 --> 00:19:38,340
So am I good so far?

298
00:19:38,340 --> 00:19:40,730
And now what do I do?

299
00:19:40,730 --> 00:19:43,440
Now I look here.

300
00:19:43,440 --> 00:19:46,920
I see A transpose y sitting
right in front of me.

301
00:19:46,920 --> 00:19:50,760
So I say, well, OK,
less or equal to.

302
00:19:50,760 --> 00:19:52,560
And I guess I'm done.

303
00:19:52,560 --> 00:19:54,870
A transpose y is
less or equal to c.

304
00:19:54,870 --> 00:19:59,880
So I have x transpose
c, c transpose x.

305
00:19:59,880 --> 00:20:04,800
So this equals this,
less or equal that.

306
00:20:04,800 --> 00:20:05,780
I've got it.

307
00:20:05,780 --> 00:20:11,820
Weak duality is just put
together the requirements

308
00:20:11,820 --> 00:20:13,650
on x and y and you have it.

309
00:20:13,650 --> 00:20:19,600
But was this important?

310
00:20:22,260 --> 00:20:24,300
If the mathematics
is right, everything

311
00:20:24,300 --> 00:20:27,840
has to have its
place, play its role.

312
00:20:27,840 --> 00:20:32,460
And so what's the role of x
greater or equal to 0 there?

313
00:20:32,460 --> 00:20:34,260
Why do we need x--

314
00:20:34,260 --> 00:20:40,160
If we just even think of
n equal 1, just a number,

315
00:20:40,160 --> 00:20:43,970
where do we use the fact that
x is greater or equal to 0?

316
00:20:43,970 --> 00:20:45,020
Do we use it?

317
00:20:45,020 --> 00:20:52,975
This looks so smooth,
because A transpose y--

318
00:20:52,975 --> 00:20:58,880
I'll write that as x
transpose c for the moment,

319
00:20:58,880 --> 00:21:03,720
just so your eye says
that's the same x transpose,

320
00:21:03,720 --> 00:21:06,010
and the A transpose y
is less or equal to c.

321
00:21:06,010 --> 00:21:08,926
Where is x greater or
equal to 0 coming in?

322
00:21:08,926 --> 00:21:10,732
AUDIENCE: [INAUDIBLE].

323
00:21:10,732 --> 00:21:11,565
GILBERT STRANG: Yes.

324
00:21:11,565 --> 00:21:12,060
AUDIENCE: Like above.

325
00:21:12,060 --> 00:21:12,905
It's from that.

326
00:21:12,905 --> 00:21:14,280
GILBERT STRANG:
Yes, that's true.

327
00:21:14,280 --> 00:21:17,070
But I want to see--

328
00:21:17,070 --> 00:21:19,860
so here's my point.

329
00:21:19,860 --> 00:21:21,720
The fact that A
transpose y is less

330
00:21:21,720 --> 00:21:23,880
or equal to c, that's fine.

331
00:21:23,880 --> 00:21:25,420
Highly important.

332
00:21:25,420 --> 00:21:30,300
But if x was negative, it
would be the other way.

333
00:21:30,300 --> 00:21:31,890
You would go the
other way and you

334
00:21:31,890 --> 00:21:33,780
wouldn't have what you want.

335
00:21:33,780 --> 00:21:38,220
So we really do use the fact
that x is greater or equal to 0

336
00:21:38,220 --> 00:21:41,250
to say that I have this
less or equal this.

337
00:21:41,250 --> 00:21:44,310
And then I multiply
by something non-negative

338
00:21:44,310 --> 00:21:46,290
and then I still
have less or equal.

339
00:21:46,290 --> 00:21:47,620
OK, good.

340
00:21:47,620 --> 00:21:50,800
So the math is right.

341
00:21:50,800 --> 00:21:53,160
Everything does its part.
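
The one-line proof of weak duality can be written as a single chain. It holds for any feasible x and y; the labels under the symbols are added here to mark where each constraint is used.

```latex
b^{T}y \;=\; (Ax)^{T}y \;=\; x^{T}\bigl(A^{T}y\bigr)
\;\le\; x^{T}c \;=\; c^{T}x
\qquad \text{using } Ax = b,\quad A^{T}y \le c,\quad x \ge 0 .
```

The inequality step multiplies each component of A^T y ≤ c by the matching component of x, which keeps the direction only because every x_j is non-negative.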

342
00:21:53,160 --> 00:21:56,700
And then, of course,
the beautiful result,

343
00:21:56,700 --> 00:21:59,680
the important
result is that there

344
00:21:59,680 --> 00:22:03,330
is strong duality,
just called duality,

345
00:22:03,330 --> 00:22:06,810
which is that at the maximum--

346
00:22:06,810 --> 00:22:10,380
now this is not for any
x and y but this is for--

347
00:22:10,380 --> 00:22:14,700
this is going to be for
y star, the winner,

348
00:22:14,700 --> 00:22:16,560
and x star, the winner--

349
00:22:16,560 --> 00:22:18,570
equality holds.

350
00:22:18,570 --> 00:22:19,500
So that's duality.

351
00:22:22,010 --> 00:22:24,540
The maximum in the
dual problem is

352
00:22:24,540 --> 00:22:27,550
the same as the minimum
in the primal problem.

353
00:22:27,550 --> 00:22:29,640
The two have met.

354
00:22:29,640 --> 00:22:32,550
There is no duality gap.

355
00:22:32,550 --> 00:22:36,360
In some cooked up
nonlinear problems,

356
00:22:36,360 --> 00:22:41,620
there could be a gap between
the maximum and the minimum,

357
00:22:41,620 --> 00:22:43,920
but you hope not.

358
00:22:43,920 --> 00:22:48,010
And here the big theorem is not.

359
00:22:48,010 --> 00:22:49,390
They're equal.
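
As a small numerical check, both statements can be tried on the earlier three-variable example. This sketch assumes that example has cost vector (1, 2, 5) and the single constraint x1 + x2 + x3 = 3, so its dual is: maximize 3y subject to y ≤ 1, y ≤ 2, y ≤ 5. The helper names and the sample points are chosen here for illustration.

```python
from fractions import Fraction

# Data of the small example: minimize c^T x subject to Ax = b, x >= 0.
c = [1, 2, 5]
A = [[1, 1, 1]]
b = [3]

def primal_cost(x):
    return sum(cj * xj for cj, xj in zip(c, x))

def dual_value(y):
    return sum(bi * yi for bi, yi in zip(b, y))

def primal_feasible(x):
    rows_ok = all(sum(a * xj for a, xj in zip(row, x)) == bi
                  for row, bi in zip(A, b))
    return rows_ok and all(xj >= 0 for xj in x)

def dual_feasible(y):
    # Column j of A dotted with y must not exceed c_j (A^T y <= c).
    return all(sum(A[i][j] * y[i] for i in range(len(A))) <= c[j]
               for j in range(len(c)))

# A few feasible points for each problem (illustrative choices).
xs = [[3, 0, 0], [0, 3, 0], [0, 0, 3], [1, 1, 1],
      [Fraction(3, 2), Fraction(3, 2), 0]]
ys = [[-2], [0], [Fraction(1, 2)], [1]]
assert all(primal_feasible(x) for x in xs)
assert all(dual_feasible(y) for y in ys)

# Weak duality: every dual value sits below every primal cost.
weak_duality_holds = all(dual_value(y) <= primal_cost(x)
                         for x in xs for y in ys)

# Strong duality: at the winners the two values meet.
x_star, y_star = [3, 0, 0], [1]
gap = primal_cost(x_star) - dual_value(y_star)
print(weak_duality_holds, gap)  # prints: True 0
```

Here y = 1 is the largest value allowed by the tightest dual constraint, and it gives the dual value 3, which matches the primal minimum.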

360
00:22:49,390 --> 00:22:53,580
You push this up,
push that down.

361
00:22:53,580 --> 00:22:59,100
Another way to say that
duality is that this is

362
00:22:59,100 --> 00:23:01,050
pushing that up at some max--

363
00:23:01,050 --> 00:23:06,660
I could write that as a maximum
of a minimum equaling a minimum

364
00:23:06,660 --> 00:23:08,660
of a maximum, if I wanted.

365
00:23:12,366 --> 00:23:16,270
So the duality in
linear programming

366
00:23:16,270 --> 00:23:20,710
was the same as von
Neumann's minimax theorem.

367
00:23:20,710 --> 00:23:25,300
And his theorem applied
to two-person games.

368
00:23:25,300 --> 00:23:34,090
So the key math result is
duality for linear programming,

369
00:23:34,090 --> 00:23:38,110
and it's going to be-- you'll
see the same thing happening

370
00:23:38,110 --> 00:23:41,200
for two-person games.

371
00:23:41,200 --> 00:23:44,590
And it's a minimax
theorem, or a saddle point.

372
00:23:44,590 --> 00:23:50,740
Or it's just things
come out right.

373
00:23:50,740 --> 00:23:51,410
Yes.

374
00:23:51,410 --> 00:23:58,440
So just to mention that
mathematical programming,

375
00:23:58,440 --> 00:24:03,440
of course, includes much
more difficult problems.

376
00:24:03,440 --> 00:24:05,850
This is linear programming.

377
00:24:05,850 --> 00:24:10,790
That problem, as you see, has
a beautiful, simple theory.

378
00:24:10,790 --> 00:24:16,410
And the paying attention
is paying attention

379
00:24:16,410 --> 00:24:19,410
to the algorithm, because
you've got two important choices

380
00:24:19,410 --> 00:24:21,770
and they both get
highly developed.

381
00:24:24,840 --> 00:24:25,490
Now, OK.

382
00:24:25,490 --> 00:24:28,740
So for game, now I'm
going to turn to--

383
00:24:28,740 --> 00:24:33,930
well, I'll do an example
of max flow equal min cut,

384
00:24:33,930 --> 00:24:35,900
just see what--

385
00:24:35,900 --> 00:24:38,960
and then go to two-person game.

386
00:24:38,960 --> 00:24:42,100
So here's an example of a--

387
00:24:52,920 --> 00:24:56,200
so I start with a graph.

388
00:24:56,200 --> 00:24:58,400
Let me just imagine.

389
00:25:04,330 --> 00:25:06,400
So this is the source.

390
00:25:09,220 --> 00:25:11,680
This is the sink.

391
00:25:11,680 --> 00:25:15,770
And I'm sending flow
through the network.

392
00:25:15,770 --> 00:25:19,810
So it's a network that
I'm sending flow through.

393
00:25:19,810 --> 00:25:21,760
So my job is to maximize--

394
00:25:24,790 --> 00:25:27,380
I'll set x at the
source to be 0.

395
00:25:32,620 --> 00:25:36,130
And then the flow, the total
flow will come into the sink.

396
00:25:36,130 --> 00:25:42,520
So I want to maximize
x at the sink.

397
00:25:42,520 --> 00:25:45,080
So it's a linear program
and there are constraints.

398
00:25:45,080 --> 00:25:47,370
So what are the constraints?

399
00:25:47,370 --> 00:25:50,130
Every edge has a capacity.

400
00:25:50,130 --> 00:25:51,240
So let's see.

401
00:25:51,240 --> 00:25:57,460
Suppose that edge
has capacity 5.

402
00:25:57,460 --> 00:26:01,710
Let me put capacities
on all these edges.

403
00:26:01,710 --> 00:26:07,860
2, 1, 3, 4.

404
00:26:07,860 --> 00:26:09,780
I'm just stabbing around here.

405
00:26:12,980 --> 00:26:15,710
1,000.

406
00:26:15,710 --> 00:26:22,310
And let me say 2 and 4.

407
00:26:22,310 --> 00:26:25,570
I have no idea what's
happening here.

408
00:26:25,570 --> 00:26:31,170
But if we see the problem, we'll
probably be able to solve it.

409
00:26:31,170 --> 00:26:36,710
So these constraints are
that the flow variable,

410
00:26:36,710 --> 00:26:42,520
which would be the y that I'm
trying to maximize, cannot--

411
00:26:42,520 --> 00:26:45,340
the amount of flow,
just in ordinary words,

412
00:26:45,340 --> 00:26:50,170
the amount of flow on
the edges can't go higher

413
00:26:50,170 --> 00:26:51,520
than the capacity.

414
00:26:51,520 --> 00:26:54,880
I could send 11 along this
edge, but then I've got nowhere

415
00:26:54,880 --> 00:26:57,870
to send it after that.

416
00:26:57,870 --> 00:27:00,670
I could send, well,
900 on that edge,

417
00:27:00,670 --> 00:27:04,980
but, obviously, it
would get stuck there.

418
00:27:04,980 --> 00:27:09,250
So the question is,
what's the maximum I

419
00:27:09,250 --> 00:27:11,620
can send through that network?

420
00:27:11,620 --> 00:27:15,670
It's a classical problem.

421
00:27:15,670 --> 00:27:20,400
And, in fact, it's an
integer programming problem.

422
00:27:20,400 --> 00:27:22,950
These are all
integer capacities.

423
00:27:22,950 --> 00:27:27,560
I could insist on integer flow.

424
00:27:27,560 --> 00:27:32,880
But it's a very
remarkable integer problem

425
00:27:32,880 --> 00:27:37,080
because I could allow
fractions and the answer

426
00:27:37,080 --> 00:27:40,020
would not be a-- would
not involve fractions.

427
00:27:40,020 --> 00:27:44,730
In other words, if I keep
it as an integer problem,

428
00:27:44,730 --> 00:27:49,420
then the mathematics
is definitely

429
00:27:49,420 --> 00:27:51,660
harder for an integer problem.

430
00:27:51,660 --> 00:27:53,380
What's different?

431
00:27:53,380 --> 00:28:02,720
So the x's could be integers
or they could be real numbers.

432
00:28:07,480 --> 00:28:11,000
Over here, they
were real numbers.

433
00:28:11,000 --> 00:28:13,040
I happened to get
an integer for this,

434
00:28:13,040 --> 00:28:20,540
but if I had 10 crossing
planes in 15 dimensions,

435
00:28:20,540 --> 00:28:24,330
the integers would
be totally lost.

436
00:28:24,330 --> 00:28:32,460
But the point is, here, that
if I allow real numbers,

437
00:28:32,460 --> 00:28:36,360
it doesn't get me any more
flow, that the winning

438
00:28:36,360 --> 00:28:38,490
flow is an integer anyway.

439
00:28:38,490 --> 00:28:43,350
So it's an integer problem
which can be safely relaxed,

440
00:28:43,350 --> 00:28:48,240
and you can safely use simplex
method, or Karmarkar's method,

441
00:28:48,240 --> 00:28:53,250
or any interior point
method with non-integers,

442
00:28:53,250 --> 00:28:57,600
because in the interior here
you're not starting or ending

443
00:28:57,600 --> 00:28:59,070
at integers.

444
00:28:59,070 --> 00:29:04,650
You can do it because
the integer answer

445
00:29:04,650 --> 00:29:07,340
will be, in the end, better.

446
00:29:07,340 --> 00:29:09,480
And what is that answer?

447
00:29:09,480 --> 00:29:16,470
I think I've made those
capacities too easy to--

448
00:29:16,470 --> 00:29:17,940
oh, I didn't do this one.

449
00:29:24,000 --> 00:29:27,710
So what shall I-- shall
I make it large, like 9?

450
00:29:27,710 --> 00:29:29,280
I don't know.

451
00:29:29,280 --> 00:29:40,460
What's the best we can
do through that network?

452
00:29:40,460 --> 00:29:42,620
Can you-- I can't
see it from here yet.

453
00:29:45,515 --> 00:29:47,580
What should I do?

454
00:29:47,580 --> 00:29:49,885
Obviously, in this
simple problem

455
00:29:49,885 --> 00:29:54,360
I can get anything I want
pretty much that far.

456
00:29:54,360 --> 00:29:56,040
But then what can I do?

457
00:29:58,990 --> 00:30:00,430
Let me even make that 19.

458
00:30:05,180 --> 00:30:10,710
So I'm not imposing much of
a limit on that edge either.

459
00:30:10,710 --> 00:30:12,920
But these edges are
quite tight limits,

460
00:30:12,920 --> 00:30:15,810
and these are sort of
intermediate limits.

461
00:30:15,810 --> 00:30:22,140
What do you think is the
most I can send through?

462
00:30:22,140 --> 00:30:25,800
And how would you show me
that I couldn't send more?

463
00:30:25,800 --> 00:30:28,020
That's the key question.

464
00:30:28,020 --> 00:30:31,530
I want to get a bound on the--

465
00:30:31,530 --> 00:30:34,250
an upper bound on this maximum.

466
00:30:34,250 --> 00:30:36,860
This maximum getting
into the sink

467
00:30:36,860 --> 00:30:41,900
is less or equal to-- and
what number would you propose?

468
00:30:41,900 --> 00:30:44,270
Could I get 1,000 through?

469
00:30:44,270 --> 00:30:49,460
I could start, but of course
it would pile up there.

470
00:30:49,460 --> 00:30:50,810
I couldn't get further.

471
00:30:54,010 --> 00:30:57,574
What do you think is
the best I can do?

472
00:30:57,574 --> 00:30:58,550
AUDIENCE: 10?

473
00:30:58,550 --> 00:30:59,860
GILBERT STRANG: 10?

474
00:30:59,860 --> 00:31:01,380
I can't do more than 10?

475
00:31:01,380 --> 00:31:04,340
How could I do 10?

476
00:31:04,340 --> 00:31:07,140
AUDIENCE: So you go 12--

477
00:31:07,140 --> 00:31:08,710
GILBERT STRANG: 12 this way?

478
00:31:08,710 --> 00:31:09,293
AUDIENCE: Yes.

479
00:31:09,293 --> 00:31:10,242
19.

480
00:31:10,242 --> 00:31:11,200
GILBERT STRANG: Oh, OK.

481
00:31:11,200 --> 00:31:13,070
AUDIENCE: And then
split the 8 and 2.

482
00:31:13,070 --> 00:31:14,733
Can you split?

483
00:31:14,733 --> 00:31:15,650
GILBERT STRANG: I see.

484
00:31:15,650 --> 00:31:19,280
10 that way, 10 this-- yes,
absolutely you can split.

485
00:31:19,280 --> 00:31:23,650
And then this could go here, and
that would be able to go there.

486
00:31:23,650 --> 00:31:25,910
So I can get 10 through.

487
00:31:25,910 --> 00:31:27,220
Correct.

488
00:31:27,220 --> 00:31:29,180
AUDIENCE: [INAUDIBLE].

489
00:31:29,180 --> 00:31:31,190
GILBERT STRANG: Can
I do anything better?

490
00:31:31,190 --> 00:31:35,900
Oh, I could be sending some
up this way at the same time.

491
00:31:35,900 --> 00:31:38,950
So I could get 3 along the top.

492
00:31:38,950 --> 00:31:41,440
So this is like an auction.

493
00:31:41,440 --> 00:31:43,720
We can get up to 13.

494
00:31:43,720 --> 00:31:53,300
Can we get-- so that was 3 going
this way and 10 going this way.

495
00:31:53,300 --> 00:31:58,248
Is 13 an-- can I not exceed 13?

496
00:31:58,248 --> 00:31:58,790
AUDIENCE: 14?

497
00:31:58,790 --> 00:32:00,670
AUDIENCE: 14.

498
00:32:00,670 --> 00:32:04,010
GILBERT STRANG: Do I hear 14?

499
00:32:04,010 --> 00:32:06,470
Oh, I've got a lot
of room for one more.

500
00:32:06,470 --> 00:32:07,670
So, OK.

501
00:32:07,670 --> 00:32:10,440
So 14 in any case.

502
00:32:10,440 --> 00:32:10,940
All right.

503
00:32:10,940 --> 00:32:12,227
Do I hear 15?

504
00:32:12,227 --> 00:32:14,060
AUDIENCE: Can you do
two more on the bottom,

505
00:32:14,060 --> 00:32:15,710
like 12 instead of 10?

506
00:32:15,710 --> 00:32:18,350
GILBERT STRANG: If I did
12, then what would I do?

507
00:32:18,350 --> 00:32:20,380
AUDIENCE: Split
the other two up.

508
00:32:20,380 --> 00:32:23,170
GILBERT STRANG: I'd
send two up here.

509
00:32:23,170 --> 00:32:24,920
What am I going to do
with them from here?

510
00:32:24,920 --> 00:32:26,027
AUDIENCE: [INAUDIBLE].

511
00:32:26,027 --> 00:32:27,110
GILBERT STRANG: Send them?

512
00:32:27,110 --> 00:32:28,070
AUDIENCE: Up again.

513
00:32:28,070 --> 00:32:30,630
GILBERT STRANG: Up
again, and along.

514
00:32:30,630 --> 00:32:37,850
But then the 3 that I had right
now would be cut back to 1.

515
00:32:41,430 --> 00:32:46,780
It's a lot of fun,
this max flow problem.

516
00:32:46,780 --> 00:32:49,260
And I'm looking
for a bound to know

517
00:32:49,260 --> 00:32:51,990
when to quit, to know
when I've optimized.

518
00:32:51,990 --> 00:32:53,850
That's the whole
idea of duality,

519
00:32:53,850 --> 00:33:00,940
is to find some upper thing
that I'm trying to push down

520
00:33:00,940 --> 00:33:03,350
but I can't go beyond it.

521
00:33:03,350 --> 00:33:06,540
So I don't think I
could get more than--

522
00:33:06,540 --> 00:33:09,040
you see, everything has
to cross this middle.

523
00:33:09,040 --> 00:33:10,610
So 3 and 4.

524
00:33:10,610 --> 00:33:13,540
And I don't think
I could beat 23.

525
00:33:13,540 --> 00:33:18,360
If somebody said more than
23, I would be very doubtful,

526
00:33:18,360 --> 00:33:21,352
because I couldn't get
it across the middle.

527
00:33:21,352 --> 00:33:24,390
But then, can I get 23?

528
00:33:24,390 --> 00:33:25,920
I doubt it.

529
00:33:25,920 --> 00:33:28,840
Maybe 14 is the best possible.

530
00:33:28,840 --> 00:33:31,330
So how would we show that
14 is the best possible?

531
00:33:35,330 --> 00:33:37,760
I think if I could find--

532
00:33:37,760 --> 00:33:43,610
so this is called a cut,
a cut in this network.

533
00:33:43,610 --> 00:33:45,185
Oh, yes, I see a cut.

534
00:33:45,185 --> 00:33:46,985
A cut like there.

535
00:33:50,310 --> 00:33:51,750
You see that?

536
00:33:51,750 --> 00:33:58,180
Every bit of flow has
got to cross that cut.

537
00:33:58,180 --> 00:34:03,640
And the total capacity crossing
is the 3 and the 1 and the 2

538
00:34:03,640 --> 00:34:06,510
and the 8, which is 14.

539
00:34:06,510 --> 00:34:08,590
So I can't get more
than 14 through.

540
00:34:08,590 --> 00:34:13,620
And somehow that cut
is loaded to capacity.

541
00:34:13,620 --> 00:34:19,494
Probably those edges
all have to be fully up

542
00:34:19,494 --> 00:34:23,530
to capacity that cross the cut.

543
00:34:23,530 --> 00:34:27,290
So the cut is a
separation of edges

544
00:34:27,290 --> 00:34:29,800
that go with the source and--

545
00:34:29,800 --> 00:34:32,110
sorry-- nodes that go
with the source, nodes

546
00:34:32,110 --> 00:34:33,510
that go with the sink.

547
00:34:33,510 --> 00:34:34,010
Yes.

548
00:34:37,030 --> 00:34:39,219
And then it's the
edges across the cut.

549
00:34:42,820 --> 00:34:45,739
Is that OK for an example?

550
00:34:45,739 --> 00:34:47,679
So that's the duality.

551
00:34:47,679 --> 00:34:51,679
The maximum flow
turned out to be 14,

552
00:34:51,679 --> 00:34:54,139
and the minimum cut
turned out to be 14.

553
00:34:54,139 --> 00:34:58,110
And when those match, 14
equal 14, I know I'm through.

554
00:34:58,110 --> 00:35:00,560
I know I'm through because
I'm able to get 14 through

555
00:35:00,560 --> 00:35:02,560
and I could never get more.
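
[Editor's sketch] The stopping rule just described (a flow of 14 matched by a cut of capacity 14) can be checked on a small example. The blackboard network is not recoverable from the transcript, so the `network` below is a hypothetical stand-in, chosen so that the edges crossing the cut have capacities 3, 1, 2 and 8; the node names and the function `max_flow_min_cut` are illustrative assumptions. The method is the standard Edmonds-Karp augmenting-path algorithm, not anything specific to the lecture.

```python
from collections import deque


def max_flow_min_cut(capacity, source, sink):
    """Edmonds-Karp: return (max flow value, source side of a minimum cut)."""
    # Residual capacities, with a zero-capacity reverse edge for every edge.
    residual = {u: dict(edges) for u, edges in capacity.items()}
    for u, edges in capacity.items():
        for v in edges:
            residual.setdefault(v, {}).setdefault(u, 0)

    def reachable_from_source():
        # Breadth-first search over edges that still have spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = reachable_from_source()
        if sink not in parent:
            # No augmenting path is left: the reachable nodes form the
            # source side of a minimum cut.
            return flow, set(parent)
        # Find the bottleneck capacity along the path found by the search.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push that much flow along the path and update the residuals.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Hypothetical network (NOT the one drawn in the lecture).
network = {
    "s": {"a": 3, "b": 19},
    "a": {"t": 4},
    "b": {"a": 1, "c": 2, "d": 8},
    "c": {"t": 6},
    "d": {"t": 8},
}
value, source_side = max_flow_min_cut(network, "s", "t")
cut_capacity = sum(
    cap
    for u in source_side
    for v, cap in network.get(u, {}).items()
    if v not in source_side
)
print(value, cut_capacity)  # 14 14
```

When the two printed numbers agree, the flow is optimal: no flow can exceed the capacity of any cut, so a matching cut is a certificate that the search can stop.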

556
00:35:02,560 --> 00:35:03,060
Yes.

557
00:35:03,060 --> 00:35:04,880
3 and 4 and 6 and 8.

558
00:35:04,880 --> 00:35:06,770
Yes.

559
00:35:06,770 --> 00:35:11,180
And, of course,
in a big network,

560
00:35:11,180 --> 00:35:15,110
the minimum cut is not
going to be visible.

561
00:35:15,110 --> 00:35:18,590
Well, you couldn't-- it would
have thousands of edges.

562
00:35:18,590 --> 00:35:21,080
You couldn't see
what you were doing.

563
00:35:21,080 --> 00:35:26,660
But you could solve this
problem fast, actually fast.

564
00:35:26,660 --> 00:35:28,440
And it's an important--

565
00:35:28,440 --> 00:35:31,210
in practice, it's an
important example.

566
00:35:31,210 --> 00:35:36,740
A lot of other things fit into
the max flow min cut example.

567
00:35:36,740 --> 00:35:40,310
And therefore solving
it in faster than-- oh,

568
00:35:40,310 --> 00:35:44,150
I didn't even say about speed.

569
00:35:44,150 --> 00:35:49,760
So the simplex method, almost
always it's average case.

570
00:35:49,760 --> 00:35:51,790
Dan Spielman, who
was on the faculty

571
00:35:51,790 --> 00:35:58,520
here, who's just
terrific in this area,

572
00:35:58,520 --> 00:36:03,260
was maybe among the first
to study the average case.

573
00:36:03,260 --> 00:36:08,960
So for an average
choice of A and b and c,

574
00:36:08,960 --> 00:36:11,180
instead of making
the worst choice--

575
00:36:11,180 --> 00:36:15,050
for the worst choice, you
can create a feasible set

576
00:36:15,050 --> 00:36:17,030
that it has to go
corner, corner, corner,

577
00:36:17,030 --> 00:36:22,310
corner to get to the end,
so the simplex method would

578
00:36:22,310 --> 00:36:24,890
take exponentially long.

579
00:36:24,890 --> 00:36:27,020
But that's extremely rare.

580
00:36:27,020 --> 00:36:30,020
It doesn't happen in
practice, I think.

581
00:36:30,020 --> 00:36:33,590
And the average case
is polynomial,

582
00:36:33,590 --> 00:36:40,580
so it's a fast
method on average.

583
00:36:40,580 --> 00:36:43,880
And we can show-- actually,
that was a famous result that

584
00:36:43,880 --> 00:36:51,260
came from Russia, that linear
programming is in the big P

585
00:36:51,260 --> 00:36:55,220
versus NP world,
linear programming

586
00:36:55,220 --> 00:36:57,620
goes with P. Linear
programming is

587
00:36:57,620 --> 00:37:01,480
a problem that can be
solved in polynomial time.

588
00:37:01,480 --> 00:37:05,700
The simplex method won't always
do it but it can be done.

589
00:37:05,700 --> 00:37:06,200
Yes.

590
00:37:06,200 --> 00:37:10,430
There came into the world,
the ellipsoid methods,

591
00:37:10,430 --> 00:37:15,650
and it was an exciting time to
decide that linear programming

592
00:37:15,650 --> 00:37:18,400
was actually in P.

593
00:37:22,900 --> 00:37:27,130
So this is a case of duality
where you can understand

594
00:37:27,130 --> 00:37:32,620
duality by the statement
that the flow cannot exceed

595
00:37:32,620 --> 00:37:34,720
the capacity of the cut.

596
00:37:34,720 --> 00:37:42,460
So that's what duality
is, that any flow has

597
00:37:42,460 --> 00:37:51,955
to be less than or equal to
the capacity of any cut,

598
00:37:51,955 --> 00:37:56,650
the capacity being the sum
over the edges that cross it.

599
00:37:56,650 --> 00:37:58,720
So, of course, there
are other cuts.

600
00:37:58,720 --> 00:38:01,810
That cut has to be
crossed, but that's

601
00:38:01,810 --> 00:38:05,710
got 3, 7, 14, 22 capacity.

602
00:38:05,710 --> 00:38:11,030
So the 14 capacity
was the minimum,

603
00:38:11,030 --> 00:38:12,620
and that gave the maximum.

604
00:38:12,620 --> 00:38:13,120
Nice.

605
00:38:13,120 --> 00:38:14,060
Isn't it nice?

606
00:38:14,060 --> 00:38:14,560
Yes.

607
00:38:17,100 --> 00:38:22,860
So there is another world
here of two-person games

608
00:38:22,860 --> 00:38:26,190
that also has duality
and could be expressed.

609
00:38:26,190 --> 00:38:31,680
So let me just talk finally
today about two-person games.

610
00:38:31,680 --> 00:38:35,430
So the two persons
are x and y of course.

611
00:38:35,430 --> 00:38:37,770
What other names
could they have?

612
00:38:37,770 --> 00:38:44,670
And there is a matrix involved.

613
00:38:44,670 --> 00:38:47,325
This is the payoff matrix.

614
00:38:51,690 --> 00:38:53,930
So let me take a very
simple game first.

615
00:38:56,690 --> 00:39:01,890
So x is going to choose
one of these rows,

616
00:39:01,890 --> 00:39:05,040
and y is going to choose
one of these columns.

617
00:39:05,040 --> 00:39:07,920
And let's say
payoff from x to y.

618
00:39:11,760 --> 00:39:16,980
I like it that way because,
then, x, who's paying,

619
00:39:16,980 --> 00:39:21,090
is going to be minimizing,
as x was up here.

620
00:39:21,090 --> 00:39:24,170
And y, who's
collecting, is going

621
00:39:24,170 --> 00:39:28,070
to be maximizing,
as in the dual.

622
00:39:28,070 --> 00:39:30,660
So the payoff-- well, let's see.

623
00:39:30,660 --> 00:39:37,050
I think maybe 1, 4, 2, 8 would
be probably a fairly easy game

624
00:39:37,050 --> 00:39:39,900
to play.

625
00:39:39,900 --> 00:39:43,290
So it's a two-person
game, a zero-sum game.

626
00:39:43,290 --> 00:39:54,630
Zero sum means--
that means all pay--

627
00:39:54,630 --> 00:39:58,026
what x pays goes to y.

628
00:39:58,026 --> 00:40:03,840
y gets all that x pays.

629
00:40:03,840 --> 00:40:08,250
There's no third party here.

630
00:40:08,250 --> 00:40:09,300
No lawyers involved.

631
00:40:13,310 --> 00:40:16,280
And y is going to
choose a column,

632
00:40:16,280 --> 00:40:18,870
and x is going to choose a row.

633
00:40:18,870 --> 00:40:23,790
And x wants to make it small,
and y wants to make it big.

634
00:40:23,790 --> 00:40:26,100
So what happens in this game?

635
00:40:26,100 --> 00:40:28,230
What does y choose
to make it big?

636
00:40:28,230 --> 00:40:29,760
The second column.

637
00:40:29,760 --> 00:40:31,930
What does x choose
to make it small?

638
00:40:31,930 --> 00:40:33,420
The first row.

639
00:40:33,420 --> 00:40:37,510
So it's going to be
focused in on that 2,

640
00:40:37,510 --> 00:40:42,920
because if y keeps choosing
that column, 2 is the best--

641
00:40:42,920 --> 00:40:46,170
the least that-- x is
going to have to pay 2,

642
00:40:46,170 --> 00:40:50,550
and he achieves that by
choosing the first row.

643
00:40:50,550 --> 00:40:52,820
So it's a simple game.

644
00:40:52,820 --> 00:40:54,890
That's a saddle point.

645
00:40:54,890 --> 00:40:59,000
It's the minimum
in y's column,

646
00:40:59,000 --> 00:41:03,090
and it's the maximum
in x's row.

647
00:41:03,090 --> 00:41:05,400
But, of course, a
matrix, another matrix

648
00:41:05,400 --> 00:41:08,380
might not have such
a saddle point.

649
00:41:08,380 --> 00:41:09,630
So let me-- do you see--

650
00:41:09,630 --> 00:41:12,960
OK with this game?

651
00:41:12,960 --> 00:41:15,090
That would be a sort
of straightforward game

652
00:41:15,090 --> 00:41:17,460
where simple strategy--

653
00:41:17,460 --> 00:41:22,320
column 2 every time, row 1
every time is the optimal.

654
00:41:22,320 --> 00:41:24,570
But now let me
just exchange that.

655
00:41:29,980 --> 00:41:33,440
So, again, x1 and x2.

656
00:41:33,440 --> 00:41:36,230
y1 and y2 are the two columns.

657
00:41:36,230 --> 00:41:37,100
What happens now?

658
00:41:40,740 --> 00:41:43,641
Well, y kind of likes
the big number 8 there.

659
00:41:46,940 --> 00:41:48,550
So it goes for the second.

660
00:41:48,550 --> 00:41:51,580
The second column still
has the bigger numbers.

661
00:41:51,580 --> 00:41:55,180
So y aims for column 2.

662
00:41:55,180 --> 00:41:58,440
But then what does x do?

663
00:41:58,440 --> 00:42:03,143
What row does x choose
if y is in the column 2?

664
00:42:03,143 --> 00:42:04,010
AUDIENCE: Second.

665
00:42:04,010 --> 00:42:05,520
GILBERT STRANG: The second.

666
00:42:05,520 --> 00:42:06,160
OK.

667
00:42:06,160 --> 00:42:09,600
So have I found a saddle point?

668
00:42:09,600 --> 00:42:14,640
Where y chooses this
column, x chooses this row,

669
00:42:14,640 --> 00:42:16,810
is that a saddle point?

670
00:42:16,810 --> 00:42:17,310
No.

671
00:42:17,310 --> 00:42:20,100
Because what will y do now?

672
00:42:20,100 --> 00:42:24,480
He sees x choosing the
second row all the time,

673
00:42:24,480 --> 00:42:29,280
and y sees a 4, a very
tempting 4, in that row.

674
00:42:29,280 --> 00:42:35,040
So y is going to choose y1,
the first column, pretty often.

675
00:42:35,040 --> 00:42:38,580
Not all the time, because
what happens if he just--

676
00:42:38,580 --> 00:42:42,300
if y chooses this
column all the time,

677
00:42:42,300 --> 00:42:45,270
then x will choose
this row and the payoff

678
00:42:45,270 --> 00:42:50,880
would only be 1, so that things
went downhill for y there.

679
00:42:50,880 --> 00:42:53,950
So this is not a saddle point.
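
[Editor's sketch] The saddle-point test used for both games can be written out in a few lines. The matrices follow the lecture's convention (x picks a row and pays, y picks a column and collects). The entries of the second matrix are read off from the mixed column computed later in the lecture, and the function names are illustrative assumptions.

```python
def pure_strategy_bounds(payoff):
    """payoff[i][j] is what x (row i) pays to y (column j).

    Returns (best guarantee for y, best guarantee for x) when both
    players are restricted to pure strategies.
    """
    # y picks the column whose smallest entry is as large as possible.
    max_of_column_min = max(min(column) for column in zip(*payoff))
    # x picks the row whose largest entry is as small as possible.
    min_of_row_max = min(max(row) for row in payoff)
    return max_of_column_min, min_of_row_max


def has_saddle_point(payoff):
    """A saddle point exists exactly when the two guarantees coincide."""
    lower, upper = pure_strategy_bounds(payoff)
    return lower == upper


first_game = [[1, 2], [4, 8]]   # row 1, column 2 is the saddle point
second_game = [[1, 8], [4, 2]]  # after the exchange: no saddle point

print(pure_strategy_bounds(first_game))   # (2, 2)
print(pure_strategy_bounds(second_game))  # (2, 4)
```

In the second game the gap between 2 and 4 is what mixed strategies close: the value of the game has to land somewhere in between.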

680
00:42:53,950 --> 00:42:54,840
And what do we do?

681
00:43:00,340 --> 00:43:07,390
Mixed strategy, which is y
will choose the two columns

682
00:43:07,390 --> 00:43:10,770
with some probabilities
that add to 1.

683
00:43:10,770 --> 00:43:19,360
So there will be a third
possibility, that p x1 and 1--

684
00:43:19,360 --> 00:43:22,840
I'm sorry-- p y1.

685
00:43:22,840 --> 00:43:28,120
p times this column and 1 minus
p times the second column.

686
00:43:28,120 --> 00:43:38,410
So that will be p plus 1
minus p times 8, and 4p plus 1

687
00:43:38,410 --> 00:43:41,250
minus p times 2.

688
00:43:41,250 --> 00:43:44,170
That's this mixed column.

689
00:43:44,170 --> 00:43:51,550
By mixing his strategy,
we have a strategy

690
00:43:51,550 --> 00:43:53,050
like a third person, y3.

691
00:43:55,630 --> 00:44:03,520
And, of course, the x is going
to notice after a while what

692
00:44:03,520 --> 00:44:06,350
the strategy is.

693
00:44:06,350 --> 00:44:09,650
This is an open competition.

694
00:44:09,650 --> 00:44:13,040
You're not hiding-- you're
not able to hide anything.

695
00:44:13,040 --> 00:44:16,220
You might think, well,
maybe y will jump around,

696
00:44:16,220 --> 00:44:19,620
but that's foolishness.

697
00:44:19,620 --> 00:44:24,470
y is going to end
up finding the best

698
00:44:24,470 --> 00:44:26,880
choice of p between 0 and 1.

699
00:44:29,510 --> 00:44:32,060
And it will actually
be between 0 and 1

700
00:44:32,060 --> 00:44:38,010
because the extreme strategies
of column 1 and column 2

701
00:44:38,010 --> 00:44:39,760
were not winners.

702
00:44:39,760 --> 00:44:42,290
But now x is going
to do the same thing.

703
00:44:42,290 --> 00:44:44,660
p x1-- he's going to--

704
00:44:44,660 --> 00:44:51,160
he or she is going to
combine those rows.

705
00:44:51,160 --> 00:44:58,420
So this would be p1 for
the first row, and 1--

706
00:44:58,420 --> 00:45:03,900
sorry-- p-- oh, I better
choose another letter.

707
00:45:03,900 --> 00:45:05,510
What's another letter than p?

708
00:45:05,510 --> 00:45:06,460
AUDIENCE: q.

709
00:45:06,460 --> 00:45:08,620
GILBERT STRANG: q?

710
00:45:08,620 --> 00:45:09,810
OK, q.

711
00:45:14,820 --> 00:45:21,360
So this row is a combination
of those rows, so it's q plus 1

712
00:45:21,360 --> 00:45:32,500
minus q times 4 in the first
position, and p times 8 and 1

713
00:45:32,500 --> 00:45:35,760
minus p-- q times 8.

714
00:45:35,760 --> 00:45:39,940
q times 8 and 1 minus q times 2.

715
00:45:39,940 --> 00:45:44,242
I'm sorry that I've
written this too small.

716
00:45:44,242 --> 00:45:46,126
But what's going to happen?

717
00:45:50,370 --> 00:45:54,200
What's going to determine
p and q and solve the game?

718
00:45:57,840 --> 00:46:04,100
Well, if these are
equal, then what happens?

719
00:46:04,100 --> 00:46:08,180
Then x is OK with either one.

720
00:46:10,790 --> 00:46:14,330
x has nothing to choose
if those are equal

721
00:46:14,330 --> 00:46:18,740
and y is staying with his
mixed strategy, which gives him

722
00:46:18,740 --> 00:46:22,640
this third combination column.

723
00:46:22,640 --> 00:46:24,620
Then x is good.

724
00:46:24,620 --> 00:46:28,730
Now, you might say, well, then
x could do what it wanted.

725
00:46:28,730 --> 00:46:31,850
But x couldn't do
what it wanted.

726
00:46:31,850 --> 00:46:37,850
If x doesn't stick with the
optimal strategy q for x,

727
00:46:37,850 --> 00:46:40,040
then y will take advantage.

728
00:46:40,040 --> 00:46:45,460
So, really, the game
settles down to once--

729
00:46:45,460 --> 00:46:47,630
so these should be equal.

730
00:46:47,630 --> 00:46:51,770
So when those are
equal, what do I get?

731
00:46:51,770 --> 00:46:57,110
I have p-- it looks
like I have 8 minus 7p

732
00:46:57,110 --> 00:46:59,000
there for the first one.

733
00:46:59,000 --> 00:47:01,670
And here, I have 4p minus 2p.

734
00:47:01,670 --> 00:47:04,640
That's 2p plus 1.

735
00:47:04,640 --> 00:47:07,074
Did I get that possibly right?

736
00:47:07,074 --> 00:47:08,038
AUDIENCE: Plus 2.

737
00:47:11,412 --> 00:47:15,550
GILBERT STRANG: 2, 4p
minus 2p is the 2p.

738
00:47:15,550 --> 00:47:16,690
Oh, it's 2.

739
00:47:16,690 --> 00:47:17,640
Yes.

740
00:47:17,640 --> 00:47:19,190
OK.

741
00:47:19,190 --> 00:47:20,770
So those are equal.

742
00:47:20,770 --> 00:47:25,635
And that tells me that
9p is 6, and p is 2/3.

743
00:47:29,300 --> 00:47:33,800
So it turned out that
the best strategy for y

744
00:47:33,800 --> 00:47:39,770
is 2/3 on the first column
which didn't look so promising,

745
00:47:39,770 --> 00:47:42,950
and 1/3 on the second column.

746
00:47:42,950 --> 00:47:48,210
And by creating that strategy,
what do these numbers come out

747
00:47:48,210 --> 00:47:48,710
to be?

748
00:47:48,710 --> 00:47:50,730
And they're supposed
to come out equal.

749
00:47:50,730 --> 00:47:57,040
So 4p plus-- so 4
times 2/3 is 8/3,

750
00:47:57,040 --> 00:48:00,320
plus 1 minus p is 1/3 times--

751
00:48:00,320 --> 00:48:03,140
plus 2/3 is 10/3.

752
00:48:03,140 --> 00:48:06,260
So I think that this is 10/3.

753
00:48:06,260 --> 00:48:10,250
And if that one isn't
10/3, I'm very surprised.

754
00:48:10,250 --> 00:48:14,220
So that's 2/3-- help.

755
00:48:14,220 --> 00:48:16,172
Is it?

756
00:48:16,172 --> 00:48:23,140
4p, 2p, 4/3.

757
00:48:23,140 --> 00:48:27,860
Yes, it's telling
me they're the same.

758
00:48:27,860 --> 00:48:29,070
Did I set them equal?

759
00:48:29,070 --> 00:48:29,570
Yes.

760
00:48:29,570 --> 00:48:30,920
I set them equal to find p.

761
00:48:30,920 --> 00:48:33,330
So they have to be the same.

762
00:48:33,330 --> 00:48:38,070
Except for instructor mistakes,
which never happen, I suppose.

763
00:48:38,070 --> 00:48:38,910
OK.

764
00:48:38,910 --> 00:48:41,580
So those should be equal.

765
00:48:41,580 --> 00:48:48,330
Then x has no way to do
better, choose these columns.

766
00:48:48,330 --> 00:48:50,220
But he can't choose
those columns freely--

767
00:48:50,220 --> 00:48:55,590
those rows freely because y
could take advantage, unless--

768
00:48:55,590 --> 00:49:00,990
so this would give the
best strategy for q--

769
00:49:00,990 --> 00:49:04,030
sorry-- the best q for x.
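
[Editor's sketch] The equalizing argument can be carried out exactly with fractions. The helper `solve_2x2_mixed` is a hypothetical name, and it assumes a 2 by 2 game with no saddle point, so that both probabilities fall strictly between 0 and 1. The value of q is not worked out in the lecture; it follows from the same "set the two payoffs equal" step applied to the columns.

```python
from fractions import Fraction


def solve_2x2_mixed(payoff):
    """Equalizing mixed strategies for a 2x2 zero-sum game.

    payoff[i][j] is what x (row i) pays to y (column j).
    Returns (p, q, value): p is y's probability of column 1,
    q is x's probability of row 1.
    Assumes the game has no saddle point.
    """
    (a, b), (c, d) = payoff
    denom = a - b - c + d
    if denom == 0:
        raise ValueError("no equalizing mixed strategy for this matrix")
    # y chooses p so both rows cost x the same:
    #   a*p + b*(1 - p) = c*p + d*(1 - p)
    p = Fraction(d - b, denom)
    # x chooses q so both columns pay y the same:
    #   a*q + c*(1 - q) = b*q + d*(1 - q)
    q = Fraction(d - c, denom)
    value = a * p + b * (1 - p)
    return p, q, value


p, q, value = solve_2x2_mixed([[1, 8], [4, 2]])
print(p, q, value)  # 2/3 2/9 10/3
```

So y plays column 1 two thirds of the time, x plays row 1 two ninths of the time, and the expected payment from x to y is 10/3 either way.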

770
00:49:04,030 --> 00:49:04,530
Yes.

771
00:49:07,533 --> 00:49:08,700
Do you see that picture now?

772
00:49:08,700 --> 00:49:10,860
There could be other--

773
00:49:10,860 --> 00:49:13,470
it could be a much
bigger matrix, of course.

774
00:49:13,470 --> 00:49:15,150
There could be other columns.

775
00:49:15,150 --> 00:49:19,830
Suppose there was a 0, 0 column.

776
00:49:19,830 --> 00:49:21,630
What difference-- what
would be the effect

777
00:49:21,630 --> 00:49:25,620
on the optimal strategy of
having that additional column,

778
00:49:25,620 --> 00:49:29,820
that additional option for y3?

779
00:49:29,820 --> 00:49:32,130
Well, he wouldn't take it.

780
00:49:32,130 --> 00:49:34,850
So let's make it more tempting.

781
00:49:34,850 --> 00:49:38,730
10, 10, 10.

782
00:49:38,730 --> 00:49:41,360
Oh, he would take that, right?

783
00:49:41,360 --> 00:49:41,860
Yes.

784
00:49:41,860 --> 00:49:45,760
That was not-- so I'm not sure
what I am trying to say here.

785
00:49:49,360 --> 00:49:51,150
Yes.

786
00:49:51,150 --> 00:49:54,900
There certainly could be--
there could be rows that--

787
00:49:54,900 --> 00:49:59,770
or columns, or rows for x,
that don't enter the mixed,

788
00:49:59,770 --> 00:50:02,040
the optimal mixed strategy.

789
00:50:02,040 --> 00:50:04,060
A mixed strategy
is some combination

790
00:50:04,060 --> 00:50:08,350
of strategies, some
combination of pure strategies,

791
00:50:08,350 --> 00:50:11,500
like choose this
column, choose this row.

792
00:50:11,500 --> 00:50:15,820
But some columns may
not-- or rows-- may not

793
00:50:15,820 --> 00:50:17,800
show up in the best mixture.

794
00:50:17,800 --> 00:50:22,342
So I won't complete that
partial thought there.

795
00:50:26,700 --> 00:50:31,260
So do you see that we
have a duality theorem?

796
00:50:31,260 --> 00:50:35,470
First of all, you could write
that as a linear program.

797
00:50:35,470 --> 00:50:39,030
The unknown is p for x--

798
00:50:39,030 --> 00:50:40,190
for y.

799
00:50:40,190 --> 00:50:42,630
The unknown is q for x.

800
00:50:42,630 --> 00:50:50,370
So, actually, you just have one
unknown in this simple problem,

801
00:50:50,370 --> 00:50:52,200
for this small problem.

802
00:50:52,200 --> 00:50:55,690
But you have a minimization,
a maximization.

803
00:50:55,690 --> 00:50:58,260
They meet at the optimum.

804
00:50:58,260 --> 00:51:05,460
And the duality theorem
becomes the theorem

805
00:51:05,460 --> 00:51:08,040
for two-person games.

806
00:51:08,040 --> 00:51:12,330
Linear programming
matches two-person games.
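
[Editor's sketch] One standard way to write the game as a pair of linear programs is given below. The symbols v, w and the all-ones vector are notation introduced here, not taken from the board. A is the payoff matrix, y holds the column probabilities and x the row probabilities.

```latex
% y (collecting) maximizes the payoff v that is guaranteed against every row:
\max_{y,\,v}\; v
\quad\text{subject to}\quad
A y \ge v\,\mathbf{1},\qquad \mathbf{1}^{T} y = 1,\qquad y \ge 0 .

% x (paying) minimizes the payment w that is never exceeded by any column:
\min_{x,\,w}\; w
\quad\text{subject to}\quad
A^{T} x \le w\,\mathbf{1},\qquad \mathbf{1}^{T} x = 1,\qquad x \ge 0 .

% Duality (the minimax theorem): \max v = \min w.
% For the example
A = \begin{pmatrix} 1 & 8 \\ 4 & 2 \end{pmatrix},\qquad
y = \begin{pmatrix} 2/3 \\ 1/3 \end{pmatrix},\qquad
x = \begin{pmatrix} 2/9 \\ 7/9 \end{pmatrix},\qquad
A y = A^{T} x = \begin{pmatrix} 10/3 \\ 10/3 \end{pmatrix},
\qquad v = w = \tfrac{10}{3}.
```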

807
00:51:12,330 --> 00:51:16,430
The point about two
persons is very important.

808
00:51:16,430 --> 00:51:20,400
Three-person games are
incredibly complicated.

809
00:51:20,400 --> 00:51:27,600
No theory at this simple level
would solve three-person games.

810
00:51:27,600 --> 00:51:32,760
So that's where John
Nash's Nobel Prize

811
00:51:32,760 --> 00:51:35,830
comes into the picture.

812
00:51:35,830 --> 00:51:38,340
So his Nobel Prize
was in economics

813
00:51:38,340 --> 00:51:40,710
because of the
wide applications,

814
00:51:40,710 --> 00:51:45,060
but he was able to
analyze the problem of--

815
00:51:45,060 --> 00:51:47,895
and for functions more
than for matrices--

816
00:51:47,895 --> 00:51:53,760
of a three-person
or n-person game.

817
00:51:53,760 --> 00:51:57,070
So you know the
story of John Nash?

818
00:51:57,070 --> 00:52:01,070
The book A Beautiful Mind and
the movie A Beautiful Mind.

819
00:52:01,070 --> 00:52:03,560
So it's one of those
movies that involves MIT

820
00:52:03,560 --> 00:52:05,750
because he was here.

821
00:52:05,750 --> 00:52:09,760
But you can't recognize
anything about MIT in the movie.

822
00:52:09,760 --> 00:52:12,762
It's like Good Will Hunting.

823
00:52:12,762 --> 00:52:17,360
You're maybe in some basement
of some remote building.

824
00:52:17,360 --> 00:52:20,360
But anyway.

825
00:52:20,360 --> 00:52:22,550
It was of course a tragic--

826
00:52:22,550 --> 00:52:26,750
tragic, and then cheer, then
wonderful, and then tragic

827
00:52:26,750 --> 00:52:29,060
again for John Nash.

828
00:52:29,060 --> 00:52:33,650
He had-- so when
I met him, he was

829
00:52:33,650 --> 00:52:36,470
going to teach linear
algebra, one section,

830
00:52:36,470 --> 00:52:39,530
but that never developed.

831
00:52:39,530 --> 00:52:41,310
That was when he was--

832
00:52:41,310 --> 00:52:45,710
his mental state
was going downhill.

833
00:52:45,710 --> 00:52:53,010
And he moved to Princeton
and just stayed.

834
00:52:53,010 --> 00:52:55,640
And then the wonderful
thing was that he improved,

835
00:52:55,640 --> 00:52:58,820
and then the very sad
thing was his death

836
00:52:58,820 --> 00:53:04,640
in a car accident on the way
home from the Abel Prize.

837
00:53:04,640 --> 00:53:09,090
So quite a story,
an amazing story.

838
00:53:09,090 --> 00:53:10,430
Yes.

839
00:53:10,430 --> 00:53:17,180
So that's specific optimization
problems, linear programming

840
00:53:17,180 --> 00:53:19,250
and two-person games.

841
00:53:19,250 --> 00:53:27,240
And I hope that Friday it
will be Professor Sra. Anyway,

842
00:53:27,240 --> 00:53:30,540
the lecture will certainly
be about stochastic gradient

843
00:53:30,540 --> 00:53:31,200
descent.

844
00:53:31,200 --> 00:53:31,800
Good.

845
00:53:31,800 --> 00:53:33,350
Thanks.