GILBERT STRANG: OK, so I was speaking about eigenvalues and eigenvectors for a square matrix. And then I said that for data, and for many other applications, the matrices are not square. We need something that replaces eigenvalues and eigenvectors. And what replaces them -- and it's perfect -- is singular values and singular vectors. So may I explain singular values and singular vectors? This slide shows a lot of them.

The point is that there will be -- now I don't say eigenvectors -- two different sets. The left singular vectors will go into this matrix U. The right singular vectors will go into V. It was the other case that was so special: when the matrix was symmetric, the left singular vectors are the same as the right ones. That's sort of sensible. But for a general matrix, and certainly a rectangular matrix -- well, we don't call them eigenvectors, because that would be confusing -- we call them singular vectors. And then, in between are not eigenvalues, but singular values.

Oh, right -- hiding over here is the key: A times the v's gives sigma times the u's. So Av = sigma u is the replacement for Ax = lambda x, which had the same x on both sides. OK, now we've got two. But the beauty is that with two sets to work with, we can make all the u's orthogonal to each other, and all the v's orthogonal to each other. We can do what only symmetric matrices could do for eigenvectors. We can do it now for all matrices, not even square ones -- this is where life is, OK. And these numbers, instead of the lambdas, are called singular values. And we use the letter sigma for those.

And here is a picture of the geometry, if we had a 2 by 2 matrix. So you remember, a factorization breaks up a matrix into separate small parts, each doing its own thing. So if I multiply a vector x, the first thing that's going to hit it is V transpose. V transpose is an orthogonal matrix.
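That relation, A v_i = sigma_i u_i, is easy to check numerically. Here is a minimal NumPy sketch; the 2 by 2 matrix is an illustrative choice of mine, not the one on the slide:

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])        # illustrative 2x2 matrix (not the slide's)

U, S, Vt = np.linalg.svd(A)       # A = U @ np.diag(S) @ Vt

for i in range(2):
    v = Vt[i]                     # i-th right singular vector (row of Vt)
    u = U[:, i]                   # i-th left singular vector
    print(np.allclose(A @ v, S[i] * u))   # A v_i = sigma_i u_i  -> True
```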
Remember, I said we can make these singular vectors perpendicular. That's what an orthogonal matrix is -- it's just like a rotation. So V transpose just turns the vector. Then I'm multiplying by the lambdas -- but they're not lambdas now, they're sigmas. That matrix is capital Sigma, so there is sigma 1 and sigma 2. What they do is stretch the circle. It's a diagonal matrix, so it doesn't turn things. But it stretches the circle to an ellipse, because it has the two different singular values in it -- sigma 1 and sigma 2. And then the last guy, U, is going to hit last. It takes the ellipse and turns it again. It's again a rotation -- rotation, stretch, rotation. I'll say it again -- rotation, stretch, rotation. That's what singular values and singular vectors do -- the singular value decomposition.

And it's got the best of all worlds here. It's got the rotations, the orthogonal matrices. And it's got the stretches, the diagonal matrices. Compared to everything else, those two are the greatest. Triangular matrices were good when we were young, an hour ago. Now we're seeing the best.

OK, now let me just show you where they come from. Oh, so how to find these v's. Well, if I'm looking for orthogonal vectors, the great idea is: find a symmetric matrix, and take its eigenvectors. So these v's that I want for A are actually the eigenvectors of the symmetric matrix A transpose times A. That's just nice. So we can find those singular vectors just as fast as we can find eigenvectors for a symmetric matrix. And we know, because A transpose A is symmetric, that its eigenvectors are perpendicular to each other -- orthonormal. OK, and now what about the other ones? Because remember, we have two sets. The u's -- well, we just multiply by A, and we've got the u's.
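In code, that recipe -- take the eigenvectors of A transpose A, multiply by A, and (as the next remark explains) rescale by the sigmas -- looks like this. A hedged NumPy sketch, again with my own example matrix:

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])            # illustrative matrix, not the slide's

# The v's are eigenvectors of the symmetric matrix A^T A.
lam, V = np.linalg.eigh(A.T @ A)      # eigh returns eigenvalues in ascending order
order = np.argsort(lam)[::-1]         # reorder so sigma_1 is the biggest
lam, V = lam[order], V[:, order]

sigma = np.sqrt(lam)                  # singular values
U = (A @ V) / sigma                   # u_i = A v_i / sigma_i (unit vectors)

print(np.allclose(U.T @ U, np.eye(2)))    # the u's come out orthonormal
print(np.allclose((U * sigma) @ V.T, A))  # and A = U Sigma V^T
```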
Well, and divide by the sigmas, because these vectors, the u's and v's, are unit vectors, length one. So we have to scale them properly. And it was a little key bit of algebra to check that not only are the v's orthogonal, but the u's are orthogonal too. Yeah, it just comes out -- comes out.

So this singular value decomposition is maybe, well, say 100 years old, maybe a bit more. But it's really in the last 20, 30 years that singular values have become so important. This is the best factorization of them all. And that's not always reflected in linear algebra courses. So part of my goal today is to say: get to singular values. If you've done symmetric matrices and their eigenvalues, then you can do singular values. And I think that's absolutely worth doing, OK, yeah. So, remembering down here that capital Sigma stands for the diagonal matrix of these positive numbers sigma 1, sigma 2, down to sigma r. The rank r, which came way back in the first slides, tells you how many there are. Good, good.

Oh, here's an example. So I took a small matrix, because I'm doing this by pencil and paper and actually showing you the singular values. So there is my matrix, 2 by 2. Here are the u's. Do you see that those are orthogonal -- 1, 3 against minus 3, 1? Take the dot product, and you get 0. The v's are orthogonal. The Sigma is diagonal. And then the pieces from that add back to the matrix. So it's really broken my matrix into a couple of pieces -- one for the first singular value and vector, and the other for the second singular value and vector. And that's what data science wants. Data science wants to know: what's important in the matrix? Well, what's important is sigma 1, the big guy. Sigma 2, you see, was 3 times smaller -- 3/2 versus 1/2. So if I had a 100 by 100 matrix, or 100 by 1,000, I'd have 100 singular values, and maybe the first five I'd keep.
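Those "pieces" are the rank-one matrices sigma_i u_i v_i^T, and keeping only the big-sigma pieces is exactly the truncation data science uses (the Eckart-Young theorem says the truncation is optimal). A small NumPy sketch of that idea, with an illustrative matrix since the slide's exact entries aren't reproduced here:

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])                 # illustrative matrix

U, S, Vt = np.linalg.svd(A)

# The rank-one pieces sigma_i * u_i * v_i^T ...
pieces = [S[i] * np.outer(U[:, i], Vt[i]) for i in range(len(S))]
print(np.allclose(sum(pieces), A))         # ... add back to the matrix A

# Keeping only the piece with the big sigma_1 gives the best rank-1 approximation.
A1 = pieces[0]
print(np.isclose(np.linalg.norm(A - A1, 2), S[1]))   # error is exactly sigma_2
```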
If I'm in the financial market, those first numbers are telling me maybe what bond prices are going to do over time. And it's a mixture of a few features, but not all 1,000 features, right? So the singular value decomposition picks out the important part of a data matrix. And you cannot ask for more than that.

Here's what you do when a matrix is just totally enormous -- too big to multiply, too big to compute. Then you randomly sample it. Yeah, maybe the next slide even mentions that phrase, randomized numerical linear algebra. So, I'll go back to this. The singular value decomposition -- this is what we just talked about, with the u's and the v's and the sigmas. Sigma 1 is the biggest. Sigma r is the smallest. So in data science, you very often keep just these first ones -- maybe the first k, the k largest ones. And then you've got a matrix that has rank only k, because you're only working with k vectors. And it turns out that's the closest rank-k matrix to the big matrix A. So the singular value decomposition, among other things, is picking out -- putting in order of importance -- the little pieces of the matrix. And then you can just pick a few pieces to work with. Yeah, yeah. And the idea of norms is how to measure the size of a matrix. Yeah, but I'll leave that for the future.

And randomized linear algebra I just want to mention. It seems a little crazy that by just randomly sampling a matrix, we could learn anything about it. But typically data is sort of organized. It's not just totally random stuff. So, for example, my friend at the Broad Institute was doing the ancient history of man -- data from thousands of years ago. So he had a giant matrix -- a lot of data, too much data. And he said, how can we find the singular value decomposition? Pick out the important thing. So you had to sample the data. Statistics is a beautiful, important subject. And it leans on linear algebra.
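The lecture only gestures at the idea, but one standard way to make "randomly sample it" concrete is the randomized SVD in the style of Halko, Martinsson, and Tropp: multiply A by a thin random matrix, orthogonalize the result, and take the SVD of the small projected matrix. A hedged NumPy sketch -- sizes, seed, and oversampling are my own choices, not the lecture's:

```python
import numpy as np

rng = np.random.default_rng(0)

# A large matrix that is secretly low rank -- "data is sort of organized."
A = rng.standard_normal((2000, 50)) @ rng.standard_normal((50, 2000))

k, p = 5, 10                                   # keep k values, oversample by p
G = rng.standard_normal((A.shape[1], k + p))   # thin random test matrix
Q, _ = np.linalg.qr(A @ G)                     # orthonormal basis for the sampled range
B = Q.T @ A                                    # small (k+p) x 2000 matrix
Ub, S, Vt = np.linalg.svd(B, full_matrices=False)
U = Q @ Ub                                     # approximate left singular vectors of A

print(S[:k])   # close to the k largest singular values of A, at a fraction of the cost
```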
Data science leans on linear algebra. You are seeing the tool. Calculus would be functions, continuous curves. Linear algebra is about vectors -- just n components. And that's where you compute. And that's where you understand. OK.

Oh, this is maybe the last slide, just to help orient you in the courses. So at MIT, 18.06 is the linear algebra course. And maybe you know 18.06 and also 18.06SC, Scholar, on OpenCourseWare. And then this is the new course with the new book, 18.065. So its number is sort of indicating a second course in linear algebra. That's the one I'm actually teaching now, Monday, Wednesday, Friday. And so that starts with linear algebra, but it's mostly about deep learning -- learning from data. So you need statistics. You need optimization -- minimizing big functions, so calculus comes into it. So that's a lot of fun to teach and to learn. And, of course, it's tremendously important in industry now. Google and Facebook and ever so many companies need people who understand this.

And, oh, then I am repeating 18.06, because there is this new book coming, I hope -- did some more this morning -- Linear Algebra for Everyone. So I have optimistically put 2021. And you're the first people that know about it. So these are the websites for the two books that we have. That's the website for the linear algebra book, math.mit.edu. And this is the website for the Learning from Data book. So you see there the table of contents and solutions to problems -- lots of things.

Thanks for listening to this -- what -- maybe four or five pieces in this 2020 Vision, to update the videos that have been watched so much on OpenCourseWare. Thank you.