GILBERT STRANG: OK, so I was speaking about eigenvalues and eigenvectors for a square matrix. And then I said that for data, and for many other applications, the matrices are not square. We need something that replaces eigenvalues and eigenvectors. And what replaces them -- and it's perfect -- is singular values and singular vectors. So may I explain singular values and singular vectors? This slide shows a lot of them.

The point is that there will be -- now I don't say eigenvectors -- two different sets. The left singular vectors will go into this matrix U. The right singular vectors will go into V. It was the other case that was so special: when the matrix was symmetric, the left singular vectors are the same as the right ones. That's sort of sensible. But for a general matrix, and certainly a rectangular matrix -- well, we don't call them eigenvectors, because that would be confusing -- we call them singular vectors. And then, in between are not eigenvalues, but singular values.

Oh, right -- hiding over here is the key: A times the v's gives sigma times the u's. So Av = sigma u is the replacement for Ax = lambda x, which had the same x on both sides. OK, now we've got two. But the beauty is that with two sets to work with, we can make all the u's orthogonal to each other, and all the v's orthogonal to each other. We can do what only symmetric matrices could do for eigenvectors. We can do it now for all matrices, not even square ones -- this is where life is, OK. And these numbers, instead of the lambdas, are called singular values. And we use the letter sigma for those.

And here is a picture of the geometry, if we had a 2 by 2 matrix. So you remember, a factorization breaks up a matrix into separate small parts, each doing its own thing. So if I multiply a vector x, the first thing that's going to hit it is V transpose. V transpose is an orthogonal matrix.
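That relation, A v_i = sigma_i u_i, is easy to check numerically. Here is a minimal NumPy sketch; the 2 by 2 matrix is an illustrative choice of mine, not the one on the slide:

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])        # illustrative 2x2 matrix (not the slide's)

U, S, Vt = np.linalg.svd(A)       # A = U @ np.diag(S) @ Vt

for i in range(2):
    v = Vt[i]                     # i-th right singular vector (row of Vt)
    u = U[:, i]                   # i-th left singular vector
    print(np.allclose(A @ v, S[i] * u))   # A v_i = sigma_i u_i  -> True
```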
Remember, I said we can make these singular vectors perpendicular. That's what an orthogonal matrix is -- it's just like a rotation. So V transpose just turns the vector. Then I'm multiplying by the lambdas -- but they're not lambdas now, they're sigmas. That matrix is capital Sigma, so there is sigma 1 and sigma 2. What they do is stretch the circle. It's a diagonal matrix, so it doesn't turn things. But it stretches the circle to an ellipse, because it has the two different singular values in it -- sigma 1 and sigma 2. And then the last guy, U, is going to hit last. It takes the ellipse and turns it again. It's again a rotation -- rotation, stretch, rotation. I'll say it again -- rotation, stretch, rotation. That's what singular values and singular vectors do -- the singular value decomposition.

And it's got the best of all worlds here. It's got the rotations, the orthogonal matrices. And it's got the stretches, the diagonal matrices. Compared to everything else, those two are the greatest. Triangular matrices were good when we were young, an hour ago. Now we're seeing the best.

OK, now let me just show you where they come from. Oh, so how to find these v's. Well, if I'm looking for orthogonal vectors, the great idea is: find a symmetric matrix, and take its eigenvectors. So these v's that I want for A are actually the eigenvectors of the symmetric matrix A transpose times A. That's just nice. So we can find those singular vectors just as fast as we can find eigenvectors for a symmetric matrix. And we know, because A transpose A is symmetric, that its eigenvectors are perpendicular to each other -- orthonormal. OK, and now what about the other ones? Because remember, we have two sets. The u's -- well, we just multiply by A, and we've got the u's.
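In code, that recipe -- take the eigenvectors of A transpose A, multiply by A, and (as the next remark explains) rescale by the sigmas -- looks like this. A hedged NumPy sketch, again with my own example matrix:

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])            # illustrative matrix, not the slide's

# The v's are eigenvectors of the symmetric matrix A^T A.
lam, V = np.linalg.eigh(A.T @ A)      # eigh returns eigenvalues in ascending order
order = np.argsort(lam)[::-1]         # reorder so sigma_1 is the biggest
lam, V = lam[order], V[:, order]

sigma = np.sqrt(lam)                  # singular values
U = (A @ V) / sigma                   # u_i = A v_i / sigma_i (unit vectors)

print(np.allclose(U.T @ U, np.eye(2)))    # the u's come out orthonormal
print(np.allclose((U * sigma) @ V.T, A))  # and A = U Sigma V^T
```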
Well, and divide by the sigmas, because these vectors, the u's and v's, are unit vectors, length one. So we have to scale them properly. And it was a little key bit of algebra to check that not only are the v's orthogonal, but the u's are orthogonal too. Yeah, it just comes out -- comes out.

So this singular value decomposition is maybe, well, say 100 years old, maybe a bit more. But it's really in the last 20, 30 years that singular values have become so important. This is the best factorization of them all. And that's not always reflected in linear algebra courses. So part of my goal today is to say: get to singular values. If you've done symmetric matrices and their eigenvalues, then you can do singular values. And I think that's absolutely worth doing, OK, yeah. So, remembering down here that capital Sigma stands for the diagonal matrix of these positive numbers sigma 1, sigma 2, down to sigma r. The rank r, which came way back in the first slides, tells you how many there are. Good, good.

Oh, here's an example. So I took a small matrix, because I'm doing this by pencil and paper and actually showing you the singular values. So there is my matrix, 2 by 2. Here are the u's. Do you see that those are orthogonal -- 1, 3 against minus 3, 1? Take the dot product, and you get 0. The v's are orthogonal. The Sigma is diagonal. And then the pieces from that add back to the matrix. So it's really broken my matrix into a couple of pieces -- one for the first singular value and vector, and the other for the second singular value and vector. And that's what data science wants. Data science wants to know: what's important in the matrix? Well, what's important is sigma 1, the big guy. Sigma 2, you see, was 3 times smaller -- 3/2 versus 1/2. So if I had a 100 by 100 matrix, or 100 by 1,000, I'd have 100 singular values, and maybe the first five I'd keep.
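Those "pieces" are the rank-one matrices sigma_i u_i v_i^T, and keeping only the big-sigma pieces is exactly the truncation data science uses (the Eckart-Young theorem says the truncation is optimal). A small NumPy sketch of that idea, with an illustrative matrix since the slide's exact entries aren't reproduced here:

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])                 # illustrative matrix

U, S, Vt = np.linalg.svd(A)

# The rank-one pieces sigma_i * u_i * v_i^T ...
pieces = [S[i] * np.outer(U[:, i], Vt[i]) for i in range(len(S))]
print(np.allclose(sum(pieces), A))         # ... add back to the matrix A

# Keeping only the piece with the big sigma_1 gives the best rank-1 approximation.
A1 = pieces[0]
print(np.isclose(np.linalg.norm(A - A1, 2), S[1]))   # error is exactly sigma_2
```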
If I'm in the financial market, those first numbers are telling me maybe what bond prices are going to do over time. And it's a mixture of a few features, but not all 1,000 features, right? So the singular value decomposition picks out the important part of a data matrix. And you cannot ask for more than that.

Here's what you do when a matrix is just totally enormous -- too big to multiply, too big to compute. Then you randomly sample it. Yeah, maybe the next slide even mentions that phrase, randomized numerical linear algebra. So, I'll go back to this. The singular value decomposition -- this is what we just talked about, with the u's and the v's and the sigmas. Sigma 1 is the biggest. Sigma r is the smallest. So in data science, you very often keep just these first ones -- maybe the first k, the k largest ones. And then you've got a matrix that has rank only k, because you're only working with k vectors. And it turns out that's the closest rank-k matrix to the big matrix A. So the singular value decomposition, among other things, is picking out -- putting in order of importance -- the little pieces of the matrix. And then you can just pick a few pieces to work with. Yeah, yeah. And the idea of norms is how to measure the size of a matrix. Yeah, but I'll leave that for the future.

And randomized linear algebra I just want to mention. It seems a little crazy that by just randomly sampling a matrix, we could learn anything about it. But typically data is sort of organized. It's not just totally random stuff. So, for example, my friend at the Broad Institute was doing the ancient history of man -- data from thousands of years ago. So he had a giant matrix -- a lot of data, too much data. And he said, how can we find the singular value decomposition? Pick out the important thing. So you had to sample the data. Statistics is a beautiful, important subject. And it leans on linear algebra.
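The lecture only gestures at the idea, but one standard way to make "randomly sample it" concrete is the randomized SVD in the style of Halko, Martinsson, and Tropp: multiply A by a thin random matrix, orthogonalize the result, and take the SVD of the small projected matrix. A hedged NumPy sketch -- sizes, seed, and oversampling are my own choices, not the lecture's:

```python
import numpy as np

rng = np.random.default_rng(0)

# A large matrix that is secretly low rank -- "data is sort of organized."
A = rng.standard_normal((2000, 50)) @ rng.standard_normal((50, 2000))

k, p = 5, 10                                   # keep k values, oversample by p
G = rng.standard_normal((A.shape[1], k + p))   # thin random test matrix
Q, _ = np.linalg.qr(A @ G)                     # orthonormal basis for the sampled range
B = Q.T @ A                                    # small (k+p) x 2000 matrix
Ub, S, Vt = np.linalg.svd(B, full_matrices=False)
U = Q @ Ub                                     # approximate left singular vectors of A

print(S[:k])   # close to the k largest singular values of A, at a fraction of the cost
```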
Data science leans on linear algebra. You are seeing the tool. Calculus would be functions, continuous curves. Linear algebra is about vectors -- just n components. And that's where you compute. And that's where you understand. OK.

Oh, this is maybe the last slide, just to help orient you in the courses. So at MIT, 18.06 is the linear algebra course. And maybe you know 18.06 and also 18.06SC, Scholar, on OpenCourseWare. And then this is the new course with the new book, 18.065. So its number is sort of indicating a second course in linear algebra. That's the one I'm actually teaching now, Monday, Wednesday, Friday. And so that starts with linear algebra, but it's mostly about deep learning -- learning from data. So you need statistics. You need optimization -- minimizing big functions, so calculus comes into it. So that's a lot of fun to teach and to learn. And, of course, it's tremendously important in industry now. Google and Facebook and ever so many companies need people who understand this.

And, oh, then I am repeating 18.06, because there is this new book coming, I hope -- did some more this morning -- Linear Algebra for Everyone. So I have optimistically put 2021. And you're the first people that know about it. So these are the websites for the two books that we have. That's the website for the linear algebra book, math.mit.edu. And this is the website for the Learning from Data book. So you see there the table of contents and solutions to problems -- lots of things.

Thanks for listening to this -- what -- maybe four or five pieces in this 2020 Vision, to update the videos that have been watched so much on OpenCourseWare. Thank you.