The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

JULIAN SHUN: Good afternoon, everyone. Let's get started. Welcome to the 11th lecture of 6.172. It seems that there are many fewer people here today than on Tuesday.

[LAUGHTER]

All right. So today we're going to talk about storage allocation. It turns out that storage allocation is about both allocating memory and also freeing it, but in the literature it's just called storage allocation, so that's the term we're going to use. Whenever you do a malloc or a free, you're doing storage allocation. So how many of you have used malloc or free before? Hopefully all of you, since you needed it for the projects and homeworks.

The simplest form of storage is a stack. In a stack, you just have an array and a pointer. Here we have an array, which we call A; some portion of this array is used for memory, and the rest of it is free. And then there's a pointer, sp, that points to the end of the used region of the stack. If you want to allocate x bytes on the stack, all you do is increment the sp pointer by x. Of course, you should also check for overflow, to make sure that you don't actually go off the end of the array, because if you do, you'll get a segmentation fault. But nowadays compilers don't really check for stack overflow, because your stack is usually big enough for most programs, and when you do get a stack overflow, you'll just get a segfault and then go debug your program. So for efficiency reasons, stack overflow isn't actually checked. The allocation then returns a pointer to the beginning of the memory that you just allocated, which is just sp minus x.
So that's pretty simple. And in fact, this is how the C call stack works; it also uses a stack discipline. When you call a function, you save local variables and registers on the stack, and you also save the return address of the function that's calling another function. Then, when you return, you pop things off the stack.

You can also free things from the stack: you just decrement sp by x if you want to free x bytes. Here we decremented sp by x, and everything after sp is now considered to be free. And again, if you're careful, you would check for stack underflow, but the compiler usually doesn't do this, because if you do have a stack underflow, there's a bug in your program, and you'll get a segfault, and you should go fix it.

So allocating and freeing in a stack take constant time, because all you have to do is manipulate the stack pointer, so it's pretty efficient. However, you have to free consistently with the stack discipline, so the stack has limited applicability. Does anybody see why you can't do everything with just a stack? What's one limitation of the stack? Yes?

AUDIENCE: [INAUDIBLE]

JULIAN SHUN: So it turns out that you can actually pass a compile-time constant to make the stack bigger if you wanted to. There's actually a more fundamental limitation of the stack. Yes?

AUDIENCE: You can only read things in the reverse order in which you allocate them.

JULIAN SHUN: Yeah, so the answer is that you can only free the last thing that you allocated; there's no way to free anything in the middle of the used region. You have to free the last thing, because the stack doesn't keep track of the objects in the middle of the used region. So there's limited applicability, but it's great when it works, because it's very efficient, and all of this code can essentially be inlined. You don't have to have any function calls, so it's very fast.
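Here is a minimal sketch of this stack discipline in C (the array size and names are illustrative, and the checks the lecture says compilers omit are included for clarity):

```c
#include <stddef.h>

#define STACK_SIZE (1 << 20)  // illustrative fixed capacity

static char A[STACK_SIZE];  // the storage array
static char *sp = A;        // points just past the used region

// Allocate x bytes; returns a pointer to the new block, or NULL on overflow.
void *stack_alloc(size_t x) {
  if (sp + x > A + STACK_SIZE) return NULL;  // overflow check (often omitted)
  sp += x;
  return sp - x;  // beginning of the block just allocated
}

// Free the x most recently allocated bytes (must follow stack discipline).
void stack_free(size_t x) {
  // An underflow check (sp - x >= A) would catch bugs but costs time.
  sp -= x;
}
```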
And it also turns out that you can allocate on the call stack using a function called alloca. It's actually not a function; it's just a keyword that the compiler recognizes, and the compiler will transform it into instructions that manipulate the stack. However, alloca is now deprecated, because it turns out that the compiler is actually more efficient when it's dealing with fixed-size stack frames, if you just allocate a pointer on the stack that points to some piece of memory on the heap. But nevertheless, if you want to allocate on the call stack, you can call this alloca function; you should just check that it is doing the right thing, since it's deprecated and the implementation is compiler-dependent.

So what's another type of storage besides the stack? You can't do everything with a stack, so what else can we use? Yes?

AUDIENCE: Heap.

JULIAN SHUN: Yes, so we also have the heap, which is more general than the stack. A stack looks very nice and tidy, and it's very efficient to use, but it doesn't work for everything. That's why we have the heap. The heap is much more general, but it's very messy: it's very hard to organize it and work with it efficiently. For the rest of this lecture, I'm going to be talking about how to manage memory in the heap. And I found these pictures on Stack Overflow, so maybe they're biased towards stacks.

[CHUCKLES]

OK, so how do we do heap allocation? Let's first start with fixed-size heap allocation, where we assume that all of the objects we're dealing with are of the same size. In general this isn't true, but let's just start with this simpler case first.

OK, so as I said earlier, if you use malloc and free in C, then you're doing heap allocation. C++ has the new and delete operators, which work similarly to malloc and free.
They also call the object constructor and destructor, which the C functions don't do. And unlike Java and Python, C and C++ don't provide a garbage collector, so the programmer has to manage memory themselves. This is one of the reasons for the efficiency of C and C++: there's no garbage collector running in the background. However, this makes it much harder to write correct programs in C, because you have to be careful of memory leaks, dangling pointers, and double freeing.

A memory leak is when you allocate something and forget to free it, and your program keeps running, allocating more and more stuff without freeing it. Eventually you're going to run out of memory, and your program is going to crash. So you need to be careful of memory leaks.

Dangling pointers are pointers to pieces of memory that you have already freed. If you try to dereference a dangling pointer, the behavior is going to be undefined: maybe you'll get a segmentation fault, or maybe you won't see anything until later on in your program, because that memory might have been reallocated for something else, in which case it's actually legal to dereference that memory. So dangling pointers are very annoying when you're using C. If you're lucky, you'll get a segfault right away and you can go fix your bug, but sometimes these bugs are very hard to find.

There's also double freeing. This is when you free something more than once, and again, this will lead to undefined behavior. Maybe you'll get a segfault, or maybe that piece of memory was allocated for something else, and then when you free it again, it's actually legal-- but your program is going to be incorrect. So you need to be careful that you don't free something more than once.
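As a quick illustration, here is a small, intentionally buggy C snippet exhibiting all three kinds of bugs (the names and sizes are made up for the example):

```c
#include <stdlib.h>
#include <string.h>

void three_bugs(void) {
  char *a = malloc(100);
  a = malloc(100);    // memory leak: the first block is now unreachable

  char *b = malloc(100);
  free(b);
  strcpy(b, "oops");  // dangling pointer: writing to memory already freed

  char *c = malloc(100);
  free(c);
  free(c);            // double free: undefined behavior
}
```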
And this is why some people prefer to use a language like Java or Python that provides a built-in garbage collector. However, these languages are less efficient, because they have a general-purpose garbage collector running in the background.

In this class, we're going to use C, because we want to be able to write the fastest programs possible, so we need to study how to manage memory. And there are some tools you can use to reduce the number of memory bugs in your program. There are memory checkers like AddressSanitizer and Valgrind, which can assist you in finding these pernicious bugs. AddressSanitizer is a compiler instrumentation tool: when you compile your program, you pass a flag, and then, when you run your program, it's going to report possible memory bugs in your program. Valgrind works directly off the binary, so you don't need to do anything special when you compile; you can just pass your binary to Valgrind, and if there is a memory bug, it might find it. But Valgrind tends to be slower than AddressSanitizer, and it tends to catch fewer bugs, because it knows less about the program: AddressSanitizer sees the source code of the program and has more information, whereas Valgrind just works directly off the binary.

Also, don't confuse the heap with the heap data structure that you might have seen before in your algorithms or data structures courses. These are two different concepts. The heap data structure in your algorithms course was a data structure used to represent a priority queue, where you can efficiently extract the highest-priority element and also update the priorities of elements in the set, and it could be used for algorithms like sorting or graph search. But today we're going to be talking about another heap: the heap that's used for storage allocation. So don't get confused.

Any questions so far? OK, all right. So we're going to first start with fixed-size allocations, since that's the easier case. We're going to assume that every piece of storage has the same size. Some of these blocks are used, and some of them are unused.
Among the unused blocks, we're going to keep a list that we call the free list, and each block in the free list has a pointer to the next block in the free list. Since this memory is unused, we can actually use the memory itself to store a pointer, as part of our storage allocator implementation.

There's actually another way to do fixed-size allocations: instead of using a free list, you could place a bit for each block saying whether or not it's free, and then when you do allocation, you can use bit tricks. But today I'm going to talk about the free list implementation.

To allocate one object from the free list, you set the pointer x equal to free, so x is pointing to the first object in the free list. Then you set the free pointer to point to the next thing in the free list, so this is doing free = free->next. And then, finally, you return x, which is a pointer to the first object in the free list.

So here's an animation. x is going to point to what free points to. You also need to check whether free is equal to null, because if free is null, that means there are no more free blocks in the free list, and the programmer should know this-- you should return a special value. Otherwise, we set the free pointer to point to the next thing in the free list, and then, finally, we return x to the program, and now the program has a block of memory that it can use.

There is still a garbage pointer in the block that we pass back to the program, because we didn't clear it. The implementation of the storage allocator could decide to zero this out, or it can just pass it back to the program and leave it up to the programmer to do whatever it wants with it. In the latter case, the programmer should be careful not to dereference this pointer.

OK, so how about deallocation? Let's say we want to free some object x. What we do is, we set the next pointer of x equal to free, so it's going to point to the first thing in the free list, and then we set free equal to x. So now free is pointing to x, and the object x that we wanted to free has a pointer to the first object in the original free list.
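A minimal sketch of this fixed-size free-list allocator in C (the struct layout and names are illustrative):

```c
#include <stddef.h>

typedef struct Block {
  struct Block *next;  // next block in the free list (meaningful only while free)
} Block;

static Block *free_list = NULL;  // the "free" pointer: head of the free list

// Allocate one fixed-size object from the free list.
void *fl_alloc(void) {
  if (free_list == NULL) return NULL;  // no free blocks left: signal the caller
  Block *x = free_list;                // x points to the first free block
  free_list = free_list->next;         // free = free->next
  return x;  // note: *x still contains a garbage next pointer
}

// Free an object by pushing it onto the front of the free list.
void fl_free(void *p) {
  Block *x = p;
  x->next = free_list;  // x now points to the first thing in the free list
  free_list = x;        // free = x
}
```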
So pretty simple. Any questions on this?

This sort of acts like a stack, in that the last thing that you freed is going to be the first thing that you allocate, so you get temporal locality in that way. But unlike a stack, you can actually free any of the blocks, not just the last block that you allocated.

So with a free list, allocating and freeing take constant time, because you're just adjusting some pointers. It has good temporal locality because, as I said, the things that you freed most recently are going to be the things that get allocated next. It has poor spatial locality due to external fragmentation, which means that your blocks of used memory are spread out all over the place in the space of all memory. And this can be bad for performance, because it can increase the size of the page table, and it can also cause disk thrashing.

If you recall, whenever you access a page in virtual memory, the system has to do address translation to the physical memory address. And if your memory is spread out across many pages in virtual memory, then you're going to have a lot of entries in the page table, because the page table stores the mapping between the virtual memory address of each page and the physical memory address of that page. This can complicate the page table and make it less efficient to do lookups in it. And if you have more pages than you can fit in your main memory, then this can cause disk thrashing, because you have to move pages in and out of disk.
The Translation Lookaside Buffer, or TLB, can also be a problem. Does anybody know what a TLB is? Yes?

AUDIENCE: A cache of the result of translating from virtual memory to physical memory.

JULIAN SHUN: Yeah, so the TLB is essentially a cache for the page table: it caches the results of the translations from virtual memory addresses to physical memory addresses for the most recent translations. Looking up a translation in the TLB is much more efficient than going through the page table. And if you have a lot of external fragmentation, then you have a lot of pages that you might access, and this means that when you go to the TLB, it's more likely you'll get a TLB miss and have to go to the page table to look up the appropriate address. So that's why external fragmentation is bad.

So let's look at some ways to mitigate external fragmentation. One way to do this is to keep a free list or a bitmap per disk page, and then, when you want to allocate something, you allocate from the free list of the fullest page. So you sort of skew the memory that's being used onto as few pages as possible. When you free a block of storage, you just return it to the page on which that block resides. And if a page becomes completely empty-- there are no more items in use on that page-- then the virtual memory system can page it out without affecting the program's performance, because you're not going to access that page anyway.

This might seem counterintuitive: why do we want to skew the items onto as few pages as possible? Let's look at a simple example to convince ourselves why this is actually good for dealing with external fragmentation. Here I have two cases. In the first case, I have 90% of my blocks on one page and 10% of the blocks on the other page. In the second case, I have half of my blocks on one page and half on the other page. Now let's look at the probability that two random accesses will hit the same page, assuming that all of the random accesses go to one of the two pages.
In the first case, the probability that both of the accesses hit the first page is 0.9 times 0.9, and the probability that they both hit the second page is 0.1 times 0.1; if you sum these up, you get 0.82. That's the probability that both of the random accesses hit the same page. In the other case, the probability that both accesses hit the first page is 0.5 times 0.5, and likewise for the second page, so that sums to 0.5, which means there's only a 50% chance that two random accesses hit the same page. So in the first case, you actually have a higher chance that the two random accesses hit the same page, and that's why we want to skew the items as much as possible: to mitigate the effects of external fragmentation.

Any questions?

OK, so that was fixed-size heap allocation, and obviously you can't use that for many programs, since they allocate memory of different sizes. So now let's look at variable-size heap allocation.

We're going to look at one allocation scheme called binned free lists. The idea is to leverage the efficiency of free lists while accepting a bounded amount of internal fragmentation. Internal fragmentation is wasted space within a block: when you allocate possibly more space than you're using, there's some wasted space in there. In binned free lists, we're going to have a whole bunch of bins, and each bin is going to store blocks of a particular size. Here I'm going to say that bin k holds memory blocks of size 2 to the k, so I'm going to store blocks whose sizes are powers of 2.

So why don't I just store a bin for every possible size? Does anybody know why? Why am I rounding up to powers of 2 here?

AUDIENCE: You'd have too many bins.

JULIAN SHUN: Yes, if I wanted a bin for every possible size, I would have way too many bins, and just the pointers to these bins are not going to fit in memory.
So that's why I'm only using bins that store blocks of size 2 to the k.

Now let's look at how I'm going to allocate x bytes from a binned free list. What I'm going to do is look up the bin from which I should take a block. To get that bin, I take k equal to the ceiling of log base 2 of x-- recall that lg denotes log base 2. If that bin is nonempty, then I can just return a block from that bin. However, if that bin is empty, then I need to go to the next highest bin that's nonempty, take a block from that bin, split it up into smaller chunks, and place them into smaller bins; I'll also get a chunk that is of the right size.

For this example, let's say I wanted to allocate 3 bytes. The ceiling of log base 2 of 3 is 2, so I go to bin 2. But bin 2 is empty, so I need to look for the next bin that's not empty, and that's going to be bin 4, and I'm going to split up its block into smaller powers of 2. In particular, I'm going to find a nonempty bin k' greater than k and split up a block into sizes of 2 to the k' minus 1, 2 to the k' minus 2, all the way down to 2 to the k-- that is, into all the powers of 2 less than 2 to the k' and greater than or equal to 2 to the k. I'm actually going to end up with two blocks of size 2 to the k, and one of those will be returned to the program. So here I split up the block, place one of the smaller blocks in bin 3 and one into bin 2, and then I have another block that I just return to the program.

So any questions on how this scheme works?

OK. And if no larger blocks exist-- that means all of the bins higher than the bin I'm looking at are empty-- then I need to go to the OS to request more memory. After I get that memory, I'll split it up so I can satisfy my allocation request.
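Here is a rough sketch of the bin-selection and splitting logic in C (NBINS, the names, and the free-list representation are illustrative, and the request to the OS is left as a stub):

```c
#include <stddef.h>

#define NBINS 32

typedef struct Block { struct Block *next; } Block;

static Block *bins[NBINS];  // bins[k] holds free blocks of size 2^k bytes

// Smallest k such that 2^k >= x, i.e., the ceiling of lg x.
static int bin_index(size_t x) {
  int k = 0;
  while (((size_t)1 << k) < x) k++;
  return k;
}

void *binned_alloc(size_t x) {
  int k = bin_index(x);
  int kp = k;
  while (kp < NBINS && bins[kp] == NULL) kp++;  // next highest nonempty bin
  if (kp == NBINS) return NULL;  // here we would ask the OS for more memory

  char *block = (char *)bins[kp];  // take a block of size 2^kp
  bins[kp] = bins[kp]->next;

  // Split off pieces of size 2^(kp-1), ..., 2^k and bin them; this leaves
  // two chunks of size 2^k, one binned and one returned to the program.
  for (int j = kp - 1; j >= k; j--) {
    Block *piece = (Block *)(block + ((size_t)1 << j));
    piece->next = bins[j];
    bins[j] = piece;
  }
  return block;
}
```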
In practice, this exact scheme isn't used-- there are many variants of it. It turns out that efficiency is very important for small allocations, because there's not that much work performed on these small pieces of memory, and the overheads of the storage allocation scheme could cause a performance bottleneck. So in practice, you usually don't go all the way down to blocks of size 1; you might stop at blocks of size 8 bytes, so that you don't have that much overhead. This does increase the internal fragmentation by a little bit, because now you have some wasted space. And then-- one second-- you can also group blocks into pages, as I said before, so that all of the blocks on the same page have the same size, and then you don't have to store the size of each block. Yes?

AUDIENCE: How do you--

JULIAN SHUN: Yeah, so there are two commands you can use: one is called mmap, and the other one is called sbrk. Those are system calls. You just call them, and then the OS will give you more memory, and then your storage allocator can use it. Yes?

AUDIENCE: They don't have to use something like this in order to implement those?

JULIAN SHUN: No, the standard implementation of malloc internally uses these commands, mmap and sbrk, to get memory from the OS. The OS just gives you a huge chunk of memory; it doesn't split it up into smaller blocks or anything-- that's up to the storage allocator to do. It just gives you a big chunk of memory, and then the storage allocator will break it up into smaller blocks. There are similar commands for freeing memory back to the OS when you're not using it anymore.
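As a rough sketch, an allocator on Linux might grab a large chunk from the OS like this (the wrapper name and the choice of flags are illustrative):

```c
#include <stddef.h>
#include <sys/mman.h>

// Ask the OS for a large chunk of zeroed memory for the allocator to carve up.
static void *os_get_chunk(size_t bytes) {
  void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return NULL;
  return p;  // munmap(p, bytes) would later return the chunk to the OS
}
```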
AUDIENCE: Can you explain the paging thing [INAUDIBLE]?

JULIAN SHUN: Yeah, so what I said was that you can keep blocks of different sizes on different pages, and then you don't actually have to store the size of each block: you can just look up what page a block resides on when you get its memory address, and then, for each page, you have one field that stores the size of the blocks on that page. This saves you the overhead of storing information per block to figure out its size. Yeah?

AUDIENCE: --changing the size of the blocks.

JULIAN SHUN: Yeah, so if you do change the size of the blocks, then you can't actually use this scheme-- this is a variant where you don't change the size of the blocks. If you do change the size, then you have to change it for the entire page.

Yeah, so there are many variants of memory allocators out there; this is just the simplest one that I described. But this exact scheme isn't the one that's used in practice-- there are many variants. Some allocators, instead of using powers of 2, use Fibonacci numbers to determine the different bin sizes.

Any other questions? You'll actually get a chance to play around with implementing some allocators in project 3 and homework 6.

So let's briefly look at the storage layout of a program. This is how our virtual memory address space is laid out. We have the stack all the way at the top, and the stack grows downward, so the high addresses are up top and the low addresses are below. Then we have the heap, which grows upward; the heap and the stack basically grow towards each other, and this space is dynamically allocated as the program runs. Then there are the bss segment, the data segment, and the text segment, which all reside below the heap.

The text segment stores the code for your program: when you load your program, the loader puts your program's code into the text segment. Then there's the data segment, which stores all of the global variables and static variables-- the constants that you defined in your program. When you load your program, this data also has to be read from disk and stored into the data segment.
Then there's the bss segment. This segment is used to store all the uninitialized variables in your program, and it's just initialized to zero at the start of your program-- since your program hasn't initialized these variables, it doesn't matter what we set them to. And then the heap: this is the memory that we're using when we're calling malloc and free. And then we have the stack, which we talked about.

In practice, the stack and the heap are never actually going to hit each other, because we're working with 64-bit addresses. So even though they're growing towards each other, you don't have to worry about them actually colliding.

Another point to note is that if you're doing a lot of precomputation in your program-- for example, generating huge tables of constants-- those all have to be read from disk when you start your program and stored in this data segment. So if you have a lot of these constants, it's actually going to make your program's loading time much higher. It's usually OK to do a little bit of precomputation, especially if you can save a lot of computation at runtime, but in some cases it might actually be faster overall to just compute the tables in memory when you start your program, because then you don't have to read as much from disk.

So here's a question. Since a 64-bit address space takes over a century to write at a rate of 4 billion bytes per second, we're effectively never going to run out of virtual memory. So why don't we just allocate out of virtual memory and never free anything?
Yes?

AUDIENCE: If you allocate a bunch of small things in random places, then it's harder to update than a large segment?

JULIAN SHUN: Yeah, so one thing is that you have this issue of fragmentation. The blocks of memory that you're using are not going to be contiguous in memory, and that makes it harder for you to find large blocks. This is called external fragmentation, which I mentioned earlier. If you do this, external fragmentation is going to be very bad: the performance of the page table is going to degrade tremendously, because the memory that you're using is going to be spread all over virtual memory, and you're going to use many pages, and this leads to disk thrashing-- you have to do a lot of swaps of pages in and out of disk. Your TLB hit rate is also going to be very low. And another reason is that you're going to run out of physical memory if you never free anything.

So one of the goals of storage allocation is to try to use as little virtual memory as possible and to try to keep the used portions of memory relatively compact.

Any questions so far?

OK, so let's do an analysis of the binned free list storage allocation scheme. Here's a theorem. Suppose that the maximum amount of heap memory in use at any time by a program is M. If the heap is managed by a binned free list allocator, then the amount of virtual memory consumed by the heap storage is upper bounded by order M log M. Does anybody have an intuition about why this theorem could be true? How many bins do we have, at most?

AUDIENCE: [INAUDIBLE]

JULIAN SHUN: Right, so the number of bins we have is upper bounded by log base 2 of M, and each bin is going to use order M memory. Let's look at this more formally. An allocation request for a block of size x is going to consume 2 to the ceiling of log base 2 of x storage, which is upper bounded by 2x, so we're only wasting a factor of 2 of storage here. Therefore, the amount of virtual memory devoted to blocks of size 2 to the k is at most 2M. And since there are at most log base 2 of M free lists, the theorem holds just by multiplying the two terms. You can only have log base 2 of M free lists because M is the maximum amount of memory you're using, and therefore your largest bin is only going to hold blocks of size M.
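Written out, the same argument is just (a sketch):

```latex
% A request of size x consumes 2^{\lceil \lg x \rceil} \le 2x bytes, and at most
% M bytes are ever in use, so each size class occupies at most 2M bytes:
\mathrm{VM}_{\mathrm{heap}} \;\le\; \sum_{k=1}^{\lg M} 2M \;=\; 2M \lg M \;=\; O(M \lg M).
```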
And it turns out that the binned free list allocation scheme is theta of 1 competitive with the optimal allocator, where the optimal allocator knows all of the memory requests in the future, so it can do a lot of clever things to optimize the memory allocation process. But the binned free list is only going to be a constant factor worse than the optimal allocator. This is assuming that we don't coalesce blocks together, which I'll talk about on the next slide. It turns out that this constant is 6-- Charles Leiserson has a paper describing this result-- and there's also a lower bound of 6, so this is tight.

So, coalescing. Coalescing is when you splice together smaller blocks into a larger block. You can do this if you have two free blocks that are contiguous in memory; this allows you to put them together into a larger block. Binned free lists can sometimes be heuristically improved by doing coalescing, and there are many clever schemes for trying to find adjacent blocks efficiently. There's something called the buddy system, where each block has a buddy that's contiguous in memory. However, it turns out that the buddy system scheme in particular has pretty high overhead, so it's usually going to be slower than the standard binned free list algorithm.

There are no good theoretical bounds that prove the effectiveness of coalescing, but it does seem to work pretty well in practice at reducing fragmentation, because heap storage tends to be deallocated as a stack or in batches. What I mean by this is that the objects that you free tend to be pretty close together in memory. If you deallocate as a stack, then all of them are going to be near the top of the stack. And when you deallocate in batches-- this is when you deallocate a whole bunch of things that you allocated together in your program.
For example, if you have a graph data structure and you allocated the data for the vertices all at the same time, then when you deallocate them all together, this is going to give you a chunk of contiguous memory that you can splice together.

OK, so now let's look at garbage collection. This is going to be slightly different from storage allocation. The idea of garbage collection is to free the programmer from having to free objects. Languages like Java and Python have built-in garbage collectors, so the programmer doesn't have to free stuff themselves, and this makes it easier to write programs, because you don't have to worry about double freeing, dangling pointers, and so forth.

A garbage collector is going to identify and recycle the objects that the programmer can no longer access, so that these memory objects can be used for future allocations. And in addition to using a language with a built-in garbage collector, you can also create your own garbage collector in C, which doesn't have one. If you have a particular application, you can create a special-purpose garbage collector for it that might be more efficient than a general-purpose garbage collector. Yes?

AUDIENCE: This is the previous topic, but why [? INAUDIBLE ?] order of M memory?

JULIAN SHUN: Why is it not order M?

AUDIENCE: Yeah.

JULIAN SHUN: Because for each of the bins, you could use up to order M memory. If you don't do coalescing, I could basically do a bunch of small allocations so that I chop up all of my blocks, and they all go into smaller bins. Then, when I want to allocate something larger, I can't just splice those together-- I have to make another memory allocation. So if you order your memory requests in a certain way, you can make it so that each of the bins holds order M memory.

OK, so for garbage collection, let's go over some terminology. There are three types of memory objects: roots, live objects, and dead objects.
Roots are objects that are directly accessible by the program-- global variables, things on the stack, and so on. Live objects are reachable from the roots by following pointers. And finally, dead objects are inaccessible via any sequence of pointers, and these can be recycled, because the programmer can no longer reach them.

In order for garbage collection to work in general, the garbage collector needs to be able to identify pointers, and this requires strong typing. Languages like Python and Java have strong typing, but C doesn't. This means that when you have a pointer, you don't actually know whether it's a pointer, because a pointer just looks like an integer-- it could be either a pointer or an integer. You can cast things in C, and you can also do pointer arithmetic. In contrast, in languages with strong typing, once you declare something to be a pointer, it's always going to be a pointer, and that makes it much easier to do garbage collection.

You also need to prohibit pointer arithmetic on the pointers, because if you do pointer arithmetic and change the location of a pointer, then the garbage collector no longer knows where the memory region starts. In C, you sometimes do do pointer arithmetic, and that's why you can't actually have a general-purpose garbage collector in C that works well.

So let's look at one simple form of garbage collection, called reference counting. The idea is that, for each object, I'm going to keep a count of the number of pointers referencing that object, and if the count ever goes to 0, then I can free that object, because there are no more pointers that can reach it.

So here I have a bunch of roots, which are directly accessible by my program.
770 00:39:54,420 --> 00:39:57,440 So here, I have a bunch of roots. 771 00:39:57,440 --> 00:40:00,200 So these are directly accessible by my program.
772 00:40:00,200 --> 00:40:01,640 And then I have a bunch of objects 773 00:40:01,640 --> 00:40:05,750 that can be reached by following pointers 774 00:40:05,750 --> 00:40:07,100 starting from the roots.
775 00:40:07,100 --> 00:40:09,440 And then each of them has a reference count 776 00:40:09,440 --> 00:40:12,290 that indicates how many incoming pointers it has.
777 00:40:16,430 --> 00:40:19,120 So let's say now I change one of these pointers.
778 00:40:19,120 --> 00:40:21,400 So initially, I had a pointer going to here, 779 00:40:21,400 --> 00:40:24,550 but now I changed it so that it goes down here.
780 00:40:24,550 --> 00:40:27,700 So what happens now is I have to adjust the reference counts 781 00:40:27,700 --> 00:40:29,780 of both of these objects.
782 00:40:29,780 --> 00:40:32,500 So this object here, now it doesn't 783 00:40:32,500 --> 00:40:35,260 have any incoming pointers, so I have to decrement its reference 784 00:40:35,260 --> 00:40:36,350 count.
785 00:40:36,350 --> 00:40:37,750 So that goes to 0.
786 00:40:37,750 --> 00:40:40,400 And then for this one, I have to increment its reference count, 787 00:40:40,400 --> 00:40:42,470 so now it's 3.
788 00:40:42,470 --> 00:40:45,790 And now I have an object that has a reference count of 0, 789 00:40:45,790 --> 00:40:47,740 and with this reference counting algorithm, 790 00:40:47,740 --> 00:40:50,062 I can free this object.
791 00:40:50,062 --> 00:40:52,450 So let's go ahead and free this object.
792 00:40:52,450 --> 00:40:54,310 But when I free this object, it actually 793 00:40:54,310 --> 00:40:56,350 has pointers to other objects, so I also 794 00:40:56,350 --> 00:41:00,520 have to decrement the reference counts of these other objects 795 00:41:00,520 --> 00:41:02,420 when I free this object.
796 00:41:02,420 --> 00:41:05,010 So I'm going to decrement the counts.
797 00:41:05,010 --> 00:41:07,630 And now it turns out that this object also 798 00:41:07,630 --> 00:41:10,750 has a reference count of 0, so I can free that, as well.
799 00:41:10,750 --> 00:41:13,300 And in general, I just keep doing this process 800 00:41:13,300 --> 00:41:15,070 until the reference counts of the objects 801 00:41:15,070 --> 00:41:18,340 don't change anymore, and whenever I encounter an object 802 00:41:18,340 --> 00:41:23,110 with a reference count of 0, I can free it immediately.
803 00:41:23,110 --> 00:41:26,260 And the memory that I freed can be recycled.
804 00:41:26,260 --> 00:41:30,420 It can be used for future memory allocations.
805 00:41:30,420 --> 00:41:32,273 So questions on how the reference counting 806 00:41:32,273 --> 00:41:32,940 procedure works?
807 00:41:40,760 --> 00:41:43,670 So there's one issue with reference counting.
808 00:41:43,670 --> 00:41:47,150 Does anybody see what the issue is?
809 00:41:47,150 --> 00:41:47,780 Yes?
810 00:41:47,780 --> 00:41:50,852 AUDIENCE: What if it has a reference to itself?
811 00:41:50,852 --> 00:41:51,560 JULIAN SHUN: Yes.
812 00:41:51,560 --> 00:41:55,970 So what if it has a reference to itself?
813 00:41:55,970 --> 00:41:59,930 More generally, what if it has a cycle?
814 00:41:59,930 --> 00:42:03,200 You can't ever garbage collect a cycle when 815 00:42:03,200 --> 00:42:06,440 you're using reference counts.
816 00:42:06,440 --> 00:42:10,100 So here we have a cycle of length 3.
817 00:42:10,100 --> 00:42:12,020 They all have a reference count of 1, 818 00:42:12,020 --> 00:42:16,100 but you can never reach the cycle by following pointers 819 00:42:16,100 --> 00:42:18,440 from the root, and therefore, you 820 00:42:18,440 --> 00:42:21,215 can never delete any object in the cycle, 821 00:42:21,215 --> 00:42:23,590 and the reference counts are always going to be non-zero.
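Here is a tiny sketch, reusing the hypothetical helpers from the earlier reference-counting sketch, of how a cycle defeats the scheme:

```c
// Build a two-object cycle reachable from a root pointer, then
// drop the root. Both counts get stuck at 1, so neither object
// is ever freed: a leak under pure reference counting.
static void cycle_leak_demo(void) {
    obj_t *a = calloc(1, sizeof(obj_t));
    obj_t *b = calloc(1, sizeof(obj_t));
    a->rc = 1;                   // the root pointer held below
    a->nchildren = 1;
    b->nchildren = 1;

    rc_assign(&a->child[0], b);  // a -> b, so b->rc becomes 1
    rc_assign(&b->child[0], a);  // b -> a, so a->rc becomes 2

    obj_t *root = a;
    rc_assign(&root, NULL);      // drop the root: a->rc falls to 1
    // Now a->rc == 1 and b->rc == 1, but nothing can reach either
    // object anymore, and neither count will ever hit 0.
}
```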
822 00:42:27,410 --> 00:42:29,660 So let's just illustrate the cycle.
823 00:42:29,660 --> 00:42:37,200 And furthermore, any object that's pointed to by objects 824 00:42:37,200 --> 00:42:42,450 in the cycle cannot be garbage collected either, 825 00:42:42,450 --> 00:42:44,580 because you can't garbage collect the cycle, 826 00:42:44,580 --> 00:42:48,600 so all the pointers going out of the objects in the cycle are 827 00:42:48,600 --> 00:42:50,680 always going to be there.
828 00:42:50,680 --> 00:42:52,770 So there could be a lot of objects downstream 829 00:42:52,770 --> 00:42:55,800 from this object here that can't be garbage collected, 830 00:42:55,800 --> 00:42:57,320 so this makes it very bad.
831 00:42:59,980 --> 00:43:03,220 And as we all know, uncollected garbage stinks, 832 00:43:03,220 --> 00:43:05,420 so we don't want that.
833 00:43:05,420 --> 00:43:09,530 So let's see if we can come up with another garbage collection 834 00:43:09,530 --> 00:43:10,030 scheme.
835 00:43:13,195 --> 00:43:15,320 So it turns out that reference counting is actually 836 00:43:15,320 --> 00:43:18,200 pretty good when it does work because it's 837 00:43:18,200 --> 00:43:20,420 very efficient and simple to implement.
838 00:43:20,420 --> 00:43:22,730 So if you know that your program doesn't 839 00:43:22,730 --> 00:43:25,760 have these cycles among pointers, 840 00:43:25,760 --> 00:43:29,840 then you can use a reference counting scheme.
841 00:43:29,840 --> 00:43:32,000 There are some languages, like Objective-C, 842 00:43:32,000 --> 00:43:35,000 that have two different types of pointers, strong pointers 843 00:43:35,000 --> 00:43:37,040 and weak pointers.
844 00:43:37,040 --> 00:43:39,560 And if you're doing reference counting 845 00:43:39,560 --> 00:43:42,260 in a language with these two types of pointers, 846 00:43:42,260 --> 00:43:45,020 the reference count only stores the number 847 00:43:45,020 --> 00:43:48,050 of incoming strong pointers.
848 00:43:48,050 --> 00:43:51,590 And therefore, if you define the pointers inside a cycle 849 00:43:51,590 --> 00:43:53,510 to be weak pointers, they're not going 850 00:43:53,510 --> 00:43:55,220 to contribute to the reference count, 851 00:43:55,220 --> 00:43:58,330 and you can still garbage collect.
852 00:43:58,330 --> 00:44:01,520 However, programming with strong and weak pointers 853 00:44:01,520 --> 00:44:03,230 can be kind of tricky, because you 854 00:44:03,230 --> 00:44:06,710 need to make sure that you're not dereferencing something 855 00:44:06,710 --> 00:44:08,960 that a weak pointer points to, because that thing might 856 00:44:08,960 --> 00:44:10,670 have been garbage collected already, 857 00:44:10,670 --> 00:44:12,380 so you need to be careful.
858 00:44:12,380 --> 00:44:16,580 And C doesn't have these two types of pointers, 859 00:44:16,580 --> 00:44:19,520 so we need to use another method of garbage collection 860 00:44:19,520 --> 00:44:22,880 to make sure we can garbage collect these cycles.
861 00:44:22,880 --> 00:44:25,230 So we're going to look at two more garbage collection 862 00:44:25,230 --> 00:44:25,730 schemes.
863 00:44:25,730 --> 00:44:27,650 The first one is called mark-and-sweep, 864 00:44:27,650 --> 00:44:32,510 and the second one is called stop-and-copy.
865 00:44:32,510 --> 00:44:37,520 So first we need to define a graph abstraction.
866 00:44:37,520 --> 00:44:42,620 So let's say we have a graph with vertices V and edges E. 867 00:44:42,620 --> 00:44:45,260 And the vertex set V contains all of the 868 00:44:45,260 --> 00:44:48,860 objects in memory, and the edge set E contains 869 00:44:48,860 --> 00:44:53,510 directed edges between objects.
870 00:44:53,510 --> 00:44:55,640 So there's a directed edge from object A 871 00:44:55,640 --> 00:45:02,190 to object B if object A has a pointer to object B.
872 00:45:02,190 --> 00:45:05,980 And then, as we said earlier, the live objects are the ones 873 00:45:05,980 --> 00:45:07,750 that are reachable from the roots, 874 00:45:07,750 --> 00:45:09,730 so we can use a breadth-first-search-like 875 00:45:09,730 --> 00:45:11,922 procedure to find all of the live objects.
876 00:45:11,922 --> 00:45:13,630 So we just start our breadth-first search 877 00:45:13,630 --> 00:45:16,465 from the roots, and we'll mark all 878 00:45:16,465 --> 00:45:18,850 of the objects that are reachable from the roots.
879 00:45:18,850 --> 00:45:22,330 And then everything else that isn't reached, 880 00:45:22,330 --> 00:45:27,430 those are available to be reclaimed.
881 00:45:27,430 --> 00:45:30,130 So we're going to have a FIFO queue, 882 00:45:30,130 --> 00:45:33,550 First-In, First-Out queue, for our breadth-first search.
883 00:45:33,550 --> 00:45:35,390 This is represented as an array.
884 00:45:35,390 --> 00:45:38,740 And we have two pointers, one to the head of the queue 885 00:45:38,740 --> 00:45:41,440 and one to the tail of the queue.
886 00:45:41,440 --> 00:45:44,650 And here let's look at this code, which essentially 887 00:45:44,650 --> 00:45:46,600 is like a breadth-first search.
888 00:45:46,600 --> 00:45:51,030 So we're first going to go over all the vertices in our graph, 889 00:45:51,030 --> 00:45:53,770 and we're going to check if each vertex v is a root.
890 00:45:53,770 --> 00:45:58,150 If it is a root, we're going to set its mark to be 1, 891 00:45:58,150 --> 00:46:02,100 and we're going to place the vertex onto the queue.
892 00:46:02,100 --> 00:46:06,490 And otherwise, we're going to set the mark of v to be 0.
893 00:46:06,490 --> 00:46:08,390 And then while the queue is not empty, 894 00:46:08,390 --> 00:46:10,840 we're going to dequeue the first thing from the queue.
895 00:46:10,840 --> 00:46:12,350 Let that be u.
896 00:46:12,350 --> 00:46:15,250 Then we're going to look at all the outgoing neighbors of u.
897 00:46:15,250 --> 00:46:17,730 So these are vertices v such that there 898 00:46:17,730 --> 00:46:21,730 is a directed edge from u to v. We're going to check 899 00:46:21,730 --> 00:46:24,550 if v's mark is equal to 0.
900 00:46:24,550 --> 00:46:26,680 If it is, that means we haven't explored it yet, 901 00:46:26,680 --> 00:46:30,520 so we'll set its mark to be 1, and we place it onto the queue.
902 00:46:30,520 --> 00:46:32,530 And if the neighbor has already been explored, 903 00:46:32,530 --> 00:46:33,947 then we don't have to do anything.
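Here is a hedged sketch of that mark stage in C. The object-graph representation (an objects table with explicit mark bits and adjacency lists) and the array-backed FIFO queue are invented scaffolding for illustration, not code from the lecture.

```c
#include <stdbool.h>
#include <stdlib.h>

// Hypothetical object-graph representation for illustration.
typedef struct gc_obj {
    bool mark;             // set to 1 once known reachable
    bool is_root;          // directly accessible by the program
    size_t nout;           // number of outgoing pointers
    struct gc_obj **out;   // outgoing pointers (the directed edges)
} gc_obj;

// Mark stage: breadth-first search from the roots, using an
// array as the FIFO queue with head and tail indices.
void mark(gc_obj **objects, size_t n) {
    gc_obj **queue = malloc(n * sizeof(gc_obj *));
    size_t head = 0, tail = 0;

    for (size_t i = 0; i < n; i++) {
        objects[i]->mark = objects[i]->is_root;
        if (objects[i]->is_root)
            queue[tail++] = objects[i];   // enqueue every root
    }
    while (head != tail) {                // queue not empty
        gc_obj *u = queue[head++];        // dequeue the first thing
        for (size_t j = 0; j < u->nout; j++) {
            gc_obj *v = u->out[j];
            if (v && !v->mark) {          // not explored yet
                v->mark = true;
                queue[tail++] = v;        // place it onto the queue
            }
        }
    }
    free(queue);
}
```

Each object is enqueued at most once because its mark is set before it goes on the queue, so a queue of n slots is always large enough.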
904 00:46:36,560 --> 00:46:39,320 So let's illustrate how this algorithm works 905 00:46:39,320 --> 00:46:41,600 on this simple graph here.
906 00:46:41,600 --> 00:46:43,220 And for this example, I'm just going 907 00:46:43,220 --> 00:46:45,710 to assume that I have one root, vertex r.
908 00:46:45,710 --> 00:46:47,360 In general, I can have multiple roots, 909 00:46:47,360 --> 00:46:49,310 and I just place all of them onto the queue 910 00:46:49,310 --> 00:46:51,090 at the beginning, but for this example, 911 00:46:51,090 --> 00:46:53,810 I'm just going to have a single root.
912 00:46:53,810 --> 00:46:57,650 So I'm going to place it onto the queue, and the location 913 00:46:57,650 --> 00:47:00,890 that I place it is going to be where the tail pointer points 914 00:47:00,890 --> 00:47:01,610 to.
915 00:47:01,610 --> 00:47:03,068 And after I place it on the queue, 916 00:47:03,068 --> 00:47:05,780 I increment the tail pointer.
917 00:47:05,780 --> 00:47:07,730 Now I'm going to take the first thing off 918 00:47:07,730 --> 00:47:11,780 of my queue, which is r, and I'll explore its neighbors.
919 00:47:11,780 --> 00:47:14,360 So the neighbors are b and c here.
920 00:47:14,360 --> 00:47:18,065 Neither of them has been marked yet, so I'm going to mark them, 921 00:47:18,065 --> 00:47:19,940 and I'm going to indicate the marked vertices 922 00:47:19,940 --> 00:47:23,150 with shaded blue.
923 00:47:23,150 --> 00:47:26,840 And I'll place them onto the queue.
924 00:47:26,840 --> 00:47:28,750 Now I'm going to take the next thing, b.
925 00:47:28,750 --> 00:47:30,950 I'm going to check its neighbors.
926 00:47:30,950 --> 00:47:34,190 It only has an edge to c, but c is already on the queue.
927 00:47:34,190 --> 00:47:37,460 It's already marked, so I don't have to do anything.
928 00:47:37,460 --> 00:47:41,450 Now I dequeue c, and c has neighbors d and e, 929 00:47:41,450 --> 00:47:43,796 so I place them onto the queue.
930 00:47:43,796 --> 00:47:46,310 d doesn't have any outgoing neighbors, 931 00:47:46,310 --> 00:47:48,560 so I don't have to do anything.
932 00:47:48,560 --> 00:47:51,950 Now when I dequeue e, it has neighbor f.
933 00:47:51,950 --> 00:47:54,103 When I dequeue f, it has a neighbor g, 934 00:47:54,103 --> 00:47:56,270 and when I dequeue g, it doesn't have any neighbors.
935 00:47:56,270 --> 00:47:59,115 So now my queue is empty, and my breadth-first search procedure 936 00:47:59,115 --> 00:47:59,615 finishes.
937 00:48:02,780 --> 00:48:04,870 So at this point, I've marked all 938 00:48:04,870 --> 00:48:08,800 of the objects that are accessible from the root, 939 00:48:08,800 --> 00:48:12,043 and all of the unmarked objects can now 940 00:48:12,043 --> 00:48:13,960 be garbage collected because there is no way I 941 00:48:13,960 --> 00:48:16,840 can access them in the program.
942 00:48:16,840 --> 00:48:20,892 So the mark-and-sweep procedure has two stages.
943 00:48:20,892 --> 00:48:22,600 The first stage is called the mark stage, 944 00:48:22,600 --> 00:48:24,550 where I use a breadth-first search 945 00:48:24,550 --> 00:48:27,670 to mark all of the live objects.
946 00:48:27,670 --> 00:48:30,160 And the sweep stage will scan over memory 947 00:48:30,160 --> 00:48:33,700 to free the unmarked objects.
948 00:48:33,700 --> 00:48:37,150 So this is a pretty simple scheme.
949 00:48:37,150 --> 00:48:39,730 There is one issue with this scheme.
950 00:48:39,730 --> 00:48:43,030 Does anybody see what the possible issue is?
951 00:48:47,130 --> 00:48:47,640 Yes?
952 00:48:47,640 --> 00:48:49,540 AUDIENCE: You have to scan over all the [INAUDIBLE].
953 00:48:49,540 --> 00:48:51,990 JULIAN SHUN: Yeah, so that's one issue, where you have 954 00:48:51,990 --> 00:48:54,640 to scan over all of memory.
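To make that cost visible, here is what the sweep stage might look like over the same hypothetical representation used in the mark-stage sketch above; note that the loop visits every object, live or dead.

```c
// Sweep stage: scan over all objects and free the unmarked ones.
// With this layout the scan touches every object, which is
// exactly the overhead raised in the question above.
void sweep(gc_obj **objects, size_t *n) {
    size_t kept = 0;
    for (size_t i = 0; i < *n; i++) {
        if (objects[i]->mark) {
            objects[i]->mark = false;    // clear for the next collection
            objects[kept++] = objects[i];
        } else {
            free(objects[i]->out);       // unreachable: free its edge array
            free(objects[i]);            // and recycle the object itself
        }
    }
    *n = kept;   // only the live objects remain in the table
}
```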
955 00:48:54,640 --> 00:48:56,790 There are some variants of mark-and-sweep 956 00:48:56,790 --> 00:49:00,650 that keep track of just the allocated objects, 957 00:49:00,650 --> 00:49:02,490 so you only have to scan over those instead 958 00:49:02,490 --> 00:49:05,160 of the entire memory space.
959 00:49:05,160 --> 00:49:08,010 Besides that, are there any other possible issues 960 00:49:08,010 --> 00:49:10,990 with this?
961 00:49:10,990 --> 00:49:11,530 Yes?
962 00:49:11,530 --> 00:49:16,190 AUDIENCE: This also requires that you [INAUDIBLE] 963 00:49:16,190 --> 00:49:17,622 strong typing.
964 00:49:17,622 --> 00:49:19,080 JULIAN SHUN: Right, so let's assume 965 00:49:19,080 --> 00:49:21,900 that we do have strong typing.
966 00:49:21,900 --> 00:49:25,840 Any other possible limitations?
967 00:49:25,840 --> 00:49:27,210 Anybody else?
968 00:49:27,210 --> 00:49:28,210 Think I called on-- 969 00:49:28,210 --> 00:49:29,247 yeah.
970 00:49:29,247 --> 00:49:30,955 AUDIENCE: [INAUDIBLE] reference counting, 971 00:49:30,955 --> 00:49:34,190 you can see the object that has a reference to it, 972 00:49:34,190 --> 00:49:36,982 whereas for here you can find everything that would not be 973 00:49:36,982 --> 00:49:38,550 garbage collected [INAUDIBLE].
974 00:49:42,780 --> 00:49:45,280 JULIAN SHUN: Yeah, so for the scheme that I described, 975 00:49:45,280 --> 00:49:49,210 you have to look over all of the things that 976 00:49:49,210 --> 00:49:51,100 don't have references to them.
977 00:49:51,100 --> 00:49:55,040 So that is another overhead.
978 00:49:55,040 --> 00:49:57,880 So those are all issues.
979 00:49:57,880 --> 00:49:58,380 Good.
980 00:49:58,380 --> 00:50:00,070 The issue I want to get at is that 981 00:50:00,070 --> 00:50:02,260 the mark-and-sweep algorithm that I presented here 982 00:50:02,260 --> 00:50:04,460 doesn't deal with fragmentation.
983 00:50:04,460 --> 00:50:07,210 So it doesn't compact the live objects 984 00:50:07,210 --> 00:50:09,450 to be contiguous in memory.
985 00:50:09,450 --> 00:50:11,310 It just frees the ones that are unreachable, 986 00:50:11,310 --> 00:50:16,000 but it doesn't do anything with the ones that are reachable.
987 00:50:16,000 --> 00:50:18,070 So let's look at another procedure that 988 00:50:18,070 --> 00:50:23,020 does deal with fragmentation.
989 00:50:23,020 --> 00:50:26,260 This is called the stop-and-copy garbage collection procedure.
990 00:50:30,730 --> 00:50:34,030 At a high level, it's pretty similar to 991 00:50:34,030 --> 00:50:36,632 the mark-and-sweep algorithm.
992 00:50:36,632 --> 00:50:38,590 We're still going to use a breadth-first search 993 00:50:38,590 --> 00:50:40,915 to identify all of the live objects.
994 00:50:43,690 --> 00:50:46,180 But if you look at how this breadth-first search is 995 00:50:46,180 --> 00:50:48,340 implemented, is there any information 996 00:50:48,340 --> 00:50:51,580 you can use here to try to get the live objects to be 997 00:50:51,580 --> 00:50:54,980 contiguous in memory?
998 00:50:54,980 --> 00:50:57,740 Does anybody see anything here that we can use 999 00:50:57,740 --> 00:50:59,450 to try to reduce fragmentation?
1000 00:50:59,450 --> 00:51:00,202 Yes?
1001 00:51:00,202 --> 00:51:01,380 AUDIENCE: [INAUDIBLE]
1002 00:51:01,380 --> 00:51:03,630 JULIAN SHUN: Yes, so the answer is 1003 00:51:03,630 --> 00:51:07,230 that the objects that we visited are contiguous on the queue.
1004 00:51:07,230 --> 00:51:11,820 So in the mark-and-sweep algorithm, 1005 00:51:11,820 --> 00:51:14,190 I just place the IDs of the vertices on the queue, 1006 00:51:14,190 --> 00:51:17,880 but if I just place the actual objects onto the queue instead, 1007 00:51:17,880 --> 00:51:21,810 then I can just use my queue as my new memory.
1008 00:51:21,810 --> 00:51:25,200 And then all of the objects that are unreachable 1009 00:51:25,200 --> 00:51:27,660 will be implicitly deleted.
1010 00:51:27,660 --> 00:51:31,450 So this procedure here will deal with external fragmentation.
1011 00:51:31,450 --> 00:51:33,880 So let's see how this works.
1012 00:51:33,880 --> 00:51:36,510 So we're going to have two separate memory 1013 00:51:36,510 --> 00:51:40,770 spaces, the FROM space and the TO space.
1014 00:51:40,770 --> 00:51:44,550 So in the FROM space, I'm just going to do allocation 1015 00:51:44,550 --> 00:51:46,795 and freeing until it becomes full.
1016 00:51:46,795 --> 00:51:48,420 So when I allocate something, I place it 1017 00:51:48,420 --> 00:51:50,340 at the end of this space.
1018 00:51:50,340 --> 00:51:53,490 When I free something, I just mark it as free, 1019 00:51:53,490 --> 00:51:55,740 but I don't compact it out yet.
1020 00:51:55,740 --> 00:51:59,760 And when this FROM space becomes full, 1021 00:51:59,760 --> 00:52:03,960 then I'm going to run my garbage collection algorithm, 1022 00:52:03,960 --> 00:52:06,938 and I'm going to use the TO space as my queue 1023 00:52:06,938 --> 00:52:08,355 when I do my breadth-first search.
1024 00:52:10,930 --> 00:52:15,910 So after I run my breadth-first search, all of the live objects 1025 00:52:15,910 --> 00:52:19,900 are going to appear in the TO space in contiguous memory 1026 00:52:19,900 --> 00:52:22,210 since I used the TO space as my queue.
1027 00:52:26,060 --> 00:52:28,780 Right, and then I just keep allocating stuff 1028 00:52:28,780 --> 00:52:31,600 from the TO space and also marking things 1029 00:52:31,600 --> 00:52:34,300 as deleted when I free them until the TO space 1030 00:52:34,300 --> 00:52:35,290 becomes full.
1031 00:52:35,290 --> 00:52:37,300 Then I do the same thing, but I swap the roles 1032 00:52:37,300 --> 00:52:40,340 of the TO and the FROM spaces.
1033 00:52:40,340 --> 00:52:43,420 So this is called the stop-and-copy algorithm.
1034 00:52:43,420 --> 00:52:46,690 There is one problem with this algorithm 1035 00:52:46,690 --> 00:52:48,160 which we haven't addressed yet.
1036 00:52:48,160 --> 00:52:51,515 Does anybody see what the potential problem is?
1037 00:52:51,515 --> 00:52:52,015 Yes?
1038 00:52:52,015 --> 00:52:53,752 AUDIENCE: If nothing is dead, then you're 1039 00:52:53,752 --> 00:52:56,727 copying over your entire storage every single time.
1040 00:52:56,727 --> 00:52:58,810 JULIAN SHUN: Yeah, so that's one good observation.
1041 00:52:58,810 --> 00:53:01,780 If nothing is dead, then you're wasting a lot of work 1042 00:53:01,780 --> 00:53:04,848 because you have to copy all of it.
1043 00:53:04,848 --> 00:53:06,640 Although with the mark-and-sweep algorithm, 1044 00:53:06,640 --> 00:53:08,050 you still have to do some copying, 1045 00:53:08,050 --> 00:53:10,008 although you're not copying entire objects; 1046 00:53:10,008 --> 00:53:11,830 you're just copying the IDs.
1047 00:53:11,830 --> 00:53:15,460 There's actually a correctness issue here.
1048 00:53:15,460 --> 00:53:17,620 So does anybody see what the correctness issue is?
1049 00:53:28,450 --> 00:53:29,050 Yes?
1050 00:53:29,050 --> 00:53:31,966 AUDIENCE: So maybe the pointers in the TO space 1051 00:53:31,966 --> 00:53:36,340 have to be changed in order to point to the new [INAUDIBLE].
1052 00:53:36,340 --> 00:53:38,060 JULIAN SHUN: Yeah, so the answer is 1053 00:53:38,060 --> 00:53:43,480 that if you had pointers that pointed to objects in the FROM 1054 00:53:43,480 --> 00:53:47,020 space, if you move your objects to the TO space, 1055 00:53:47,020 --> 00:53:49,280 those pointers aren't going to be correct anymore.
1056 00:53:49,280 --> 00:53:52,100 So if I had a pointer to a live object before 1057 00:53:52,100 --> 00:53:55,030 and I moved my live object to a different memory address, 1058 00:53:55,030 --> 00:53:58,230 I need to also update that pointer.
1059 00:53:58,230 --> 00:54:02,220 So let's see how we can deal with this.
1060 00:54:02,220 --> 00:54:05,160 So the idea is that, when an object is copied 1061 00:54:05,160 --> 00:54:07,860 to the TO space, we're going to store a forwarding 1062 00:54:07,860 --> 00:54:11,250 pointer in the corresponding object 1063 00:54:11,250 --> 00:54:14,880 in the FROM space, and this implicitly marks 1064 00:54:14,880 --> 00:54:16,950 that object as moved.
1065 00:54:16,950 --> 00:54:19,650 And then when I remove an object from the FIFO 1066 00:54:19,650 --> 00:54:23,430 queue in my breadth-first search, in the TO space 1067 00:54:23,430 --> 00:54:25,110 I'm going to update all of its pointers 1068 00:54:25,110 --> 00:54:28,340 by following these forwarding pointers.
1069 00:54:28,340 --> 00:54:32,640 So let's look at an example of how this works.
1070 00:54:32,640 --> 00:54:35,640 So let's say I'm executing the breadth-first search, 1071 00:54:35,640 --> 00:54:39,070 and this is my current queue right now.
1072 00:54:42,540 --> 00:54:45,180 What I'm going to do is, when I dequeue an element 1073 00:54:45,180 --> 00:54:51,000 from my queue, first I'm going to place the neighboring 1074 00:54:51,000 --> 00:54:54,120 objects that haven't been explored yet onto the queue.
1075 00:54:54,120 --> 00:54:55,710 So here it actually has two neighbors, 1076 00:54:55,710 --> 00:54:57,877 but the first one has already been placed on the queue, 1077 00:54:57,877 --> 00:54:59,130 so I can ignore it.
1078 00:54:59,130 --> 00:55:01,470 And the second one hasn't been placed on the queue yet, 1079 00:55:01,470 --> 00:55:02,960 so I place it onto the queue.
1080 00:55:06,350 --> 00:55:09,240 And then I'm also going to-- 1081 00:55:09,240 --> 00:55:12,695 oh, so this object also has a pointer 1082 00:55:12,695 --> 00:55:14,570 to something in the FROM space, which I'm not 1083 00:55:14,570 --> 00:55:16,250 going to change at this time.
1084 00:55:16,250 --> 00:55:18,890 But I am going to store a forwarding pointer 1085 00:55:18,890 --> 00:55:21,980 from the object that I moved from the FROM space 1086 00:55:21,980 --> 00:55:25,700 to the TO space, so now it has a pointer 1087 00:55:25,700 --> 00:55:28,080 that tells me the new address.
1088 00:55:28,080 --> 00:55:32,270 And then, for the object that I just dequeued, 1089 00:55:32,270 --> 00:55:33,740 I'm going to follow the forwarding 1090 00:55:33,740 --> 00:55:36,800 pointers of its neighbors, and that will give me 1091 00:55:36,800 --> 00:55:39,970 the correct addresses now.
1092 00:55:39,970 --> 00:55:42,982 So I'm going to update the pointers by just following 1093 00:55:42,982 --> 00:55:43,940 the forwarding pointers.
1094 00:55:43,940 --> 00:55:47,360 So the first pointer pointed to this object, 1095 00:55:47,360 --> 00:55:51,770 which has a forwarding pointer to this one, so I just make it point 1096 00:55:51,770 --> 00:55:53,830 to this object in the TO space.
1097 00:55:53,830 --> 00:55:56,870 And then similarly for the other pointer, 1098 00:55:56,870 --> 00:55:58,700 I'm going to make it point to this object.
1099 00:56:02,500 --> 00:56:05,560 So that's the idea 1100 00:56:05,560 --> 00:56:08,530 of how to adjust the pointers.
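Here is a hedged sketch of this evacuation-and-fixup idea in C, in the style of Cheney's algorithm; the object layout, the fixed-size pointer array, and the helper names are assumptions made for illustration, not code from the lecture.

```c
#include <stddef.h>
#include <string.h>

#define MAXPTRS 4

// Hypothetical heap object: a forwarding pointer plus its
// outgoing pointers (fixed-size objects to keep the sketch short).
typedef struct cobj {
    struct cobj *forward;        // non-NULL once copied to the TO space
    size_t nptr;
    struct cobj *ptr[MAXPTRS];   // outgoing pointers
} cobj;

static cobj *to_space;           // the TO space doubles as the BFS queue
static size_t queue_head, queue_tail;

// Copy an object into the TO space (enqueueing it) and leave a
// forwarding pointer behind in the FROM-space original, which
// implicitly marks it as moved.
static cobj *evacuate(cobj *o) {
    if (o == NULL) return NULL;
    if (o->forward == NULL) {               // not moved yet
        cobj *copy = &to_space[queue_tail++];
        memcpy(copy, o, sizeof(cobj));
        o->forward = copy;                  // record the new address
    }
    return o->forward;
}

// One collection over a single root: evacuate the root, then
// dequeue objects and fix up their pointers by following the
// forwarding pointers of their neighbors.
void collect(cobj **root) {
    queue_head = queue_tail = 0;
    *root = evacuate(*root);
    while (queue_head != queue_tail) {
        cobj *u = &to_space[queue_head++];       // dequeue from TO space
        for (size_t i = 0; i < u->nptr; i++)
            u->ptr[i] = evacuate(u->ptr[i]);     // enqueue if needed, then
                                                 // update to the new address
    }
}
```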
1101 00:56:08,530 --> 00:56:11,460 One question is, why can't we just adjust the pointers 1102 00:56:11,460 --> 00:56:13,385 when we enqueue the object?
1103 00:56:13,385 --> 00:56:15,010 So why do I have to adjust the pointers 1104 00:56:15,010 --> 00:56:16,052 when I dequeue an object?
1105 00:56:19,630 --> 00:56:20,581 Yes?
1106 00:56:20,581 --> 00:56:23,410 AUDIENCE: Because we haven't processed its neighbors yet.
1107 00:56:23,410 --> 00:56:25,510 JULIAN SHUN: Yeah, so the answer is 1108 00:56:25,510 --> 00:56:28,752 that, when you enqueue an object, you don't actually 1109 00:56:28,752 --> 00:56:30,210 know where its neighbors are going 1110 00:56:30,210 --> 00:56:32,080 to reside in the TO space.
1111 00:56:32,080 --> 00:56:35,380 And you only know that when you dequeue the object, 1112 00:56:35,380 --> 00:56:38,170 because when you dequeue the object, 1113 00:56:38,170 --> 00:56:39,820 you must have explored its neighbors, 1114 00:56:39,820 --> 00:56:42,070 and therefore you can generate these forwarding pointers.
1115 00:56:44,860 --> 00:56:46,720 So any questions on this scheme?
1116 00:56:54,690 --> 00:56:59,460 So how much time does it take to do the stop-and-copy procedure?
1117 00:56:59,460 --> 00:57:03,960 So let's say n is the number of objects 1118 00:57:03,960 --> 00:57:05,430 and the number of pointers I have, 1119 00:57:05,430 --> 00:57:09,600 so it's the sum of the number of objects and the number of pointers.
1120 00:57:09,600 --> 00:57:12,479 How much time would it take to run this algorithm?
1121 00:57:20,143 --> 00:57:22,828 AUDIENCE: [INAUDIBLE]
1122 00:57:22,828 --> 00:57:24,370 JULIAN SHUN: Yeah, so it's just going 1123 00:57:24,370 --> 00:57:26,995 to be linear time because we're running a breadth-first search, 1124 00:57:26,995 --> 00:57:28,680 and that takes linear time.
1125 00:57:32,400 --> 00:57:34,410 You also have to do work in order 1126 00:57:34,410 --> 00:57:38,730 to copy these objects to the TO space, 1127 00:57:38,730 --> 00:57:40,410 so you also have to do work proportional 1128 00:57:40,410 --> 00:57:44,200 to the number of bytes that you're copying over.
1129 00:57:44,200 --> 00:57:46,660 So it's linear in the number of objects, 1130 00:57:46,660 --> 00:57:50,220 the number of pointers, and the total amount of space 1131 00:57:50,220 --> 00:57:53,180 you're copying over.
1132 00:57:53,180 --> 00:57:56,900 And the advantage of this scheme is 1133 00:57:56,900 --> 00:58:00,560 that you don't actually need to go over the objects that 1134 00:58:00,560 --> 00:58:03,230 aren't reachable, because those are going to be implicitly 1135 00:58:03,230 --> 00:58:06,470 deleted since they're not copied over to the TO space, 1136 00:58:06,470 --> 00:58:08,420 whereas in the mark-and-sweep procedure, 1137 00:58:08,420 --> 00:58:12,050 you had to actually go through your entire memory 1138 00:58:12,050 --> 00:58:14,840 and then free all the objects that aren't reachable.
1139 00:58:14,840 --> 00:58:20,000 So this makes the stop-and-copy procedure more efficient, 1140 00:58:20,000 --> 00:58:22,480 and it also deals with the external fragmentation issue.
1141 00:58:28,630 --> 00:58:34,400 So what happens when the FROM space becomes full?
1142 00:58:34,400 --> 00:58:37,630 So what you do is, you're going to request a new heap 1143 00:58:37,630 --> 00:58:40,660 space equal to the used space, so you're just 1144 00:58:40,660 --> 00:58:42,910 going to double the size of your FROM space.
1145 00:58:44,980 --> 00:58:47,230 And then you're going to consider the FROM space to be 1146 00:58:47,230 --> 00:58:49,997 full when the newly allocated space becomes full, 1147 00:58:49,997 --> 00:58:51,580 so essentially what you're going to do 1148 00:58:51,580 --> 00:58:53,260 is, you're going to double the space, 1149 00:58:53,260 --> 00:58:54,927 and when that becomes full, you're going 1150 00:58:54,927 --> 00:58:56,690 to double it again, and so on.
1151 00:58:56,690 --> 00:58:58,750 And with this method, you can amortize 1152 00:58:58,750 --> 00:59:02,290 the cost of garbage collection over the size of the new heap 1153 00:59:02,290 --> 00:59:05,770 space, so it's going to be amortized constant overhead 1154 00:59:05,770 --> 00:59:08,900 per byte of memory.
1155 00:59:08,900 --> 00:59:10,510 And this is assuming that the user 1156 00:59:10,510 --> 00:59:13,630 program is going to touch all of the memory that it allocates.
1157 00:59:17,300 --> 00:59:19,340 And furthermore, the virtual memory space 1158 00:59:19,340 --> 00:59:22,460 required by this scheme is just a constant times 1159 00:59:22,460 --> 00:59:25,910 the optimal if you locate the FROM and the TO spaces 1160 00:59:25,910 --> 00:59:27,620 in different regions of virtual memory 1161 00:59:27,620 --> 00:59:30,420 so that they can't interfere with each other.
1162 00:59:30,420 --> 00:59:32,820 And the reason why it's a constant times the optimal 1163 00:59:32,820 --> 00:59:36,350 is that you lose one factor of 2 1164 00:59:36,350 --> 00:59:39,080 from maintaining two separate spaces.
1165 00:59:39,080 --> 00:59:41,330 And then another factor of 2 comes from the fact 1166 00:59:41,330 --> 00:59:46,370 that you're doubling the size of your space when it becomes full 1167 00:59:46,370 --> 00:59:50,390 and up to half of it will be unused.
1168 00:59:50,390 --> 00:59:52,580 But it's constant times optimal since we're just 1169 00:59:52,580 --> 00:59:55,100 multiplying constants together.
1170 00:59:55,100 --> 00:59:57,980 And similarly, when your FROM space 1171 00:59:57,980 --> 01:00:00,260 becomes relatively empty-- 1172 01:00:00,260 --> 01:00:02,030 for example, if it's less than half full-- 1173 01:00:02,030 --> 01:00:05,990 you can also release memory back to the OS, 1174 01:00:05,990 --> 01:00:10,350 and then the analysis of the amortized constant overhead 1175 01:00:10,350 --> 01:00:10,988 is similar.
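As a small sketch of that resizing policy (the names and the shrink threshold here are illustrative, not from the lecture):

```c
#include <stddef.h>

static size_t space_size;   // current size of the FROM (and TO) space

// Called when the FROM space becomes full: request new heap space
// equal to the used space, i.e., double the space.
static size_t grow_space(void) {
    return space_size *= 2;
}

// Called after a collection: if the space is less than half full,
// memory can be released back to the OS; the amortized analysis
// works out the same way.
static size_t maybe_shrink_space(size_t live_bytes) {
    if (2 * live_bytes < space_size)
        space_size /= 2;
    return space_size;
}
```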
1176 01:00:15,970 --> 01:00:17,626 OK, any other questions?
1177 01:00:25,840 --> 01:00:30,230 OK, so there's a lot more that's known and also unknown 1178 01:00:30,230 --> 01:00:31,940 about dynamic storage allocation, 1179 01:00:31,940 --> 01:00:34,370 so I've only scratched the surface of dynamic storage 1180 01:00:34,370 --> 01:00:35,990 allocation today.
1181 01:00:35,990 --> 01:00:37,190 There are many other topics.
1182 01:00:37,190 --> 01:00:42,960 For example, there's the buddy system for doing coalescing.
1183 01:00:42,960 --> 01:00:46,370 There are many variants of the mark-and-sweep procedure.
1184 01:00:46,370 --> 01:00:51,560 So there are optimizations to improve its performance.
1185 01:00:51,560 --> 01:00:53,420 There's generational garbage collection, 1186 01:00:53,420 --> 01:00:56,510 and this is based on the idea that many objects are 1187 01:00:56,510 --> 01:00:59,600 short-lived, so a lot of the objects 1188 01:00:59,600 --> 01:01:01,550 are going to be freed pretty close to the time 1189 01:01:01,550 --> 01:01:02,770 when you allocate them.
1190 01:01:02,770 --> 01:01:05,060 And the ones that aren't freed quickly 1191 01:01:05,060 --> 01:01:07,662 tend to be pretty long-lived.
1192 01:01:07,662 --> 01:01:09,620 And the idea of generational garbage collection 1193 01:01:09,620 --> 01:01:12,500 is, instead of scanning your whole memory every time, 1194 01:01:12,500 --> 01:01:16,285 you just do work on the younger objects most of the time.
1195 01:01:16,285 --> 01:01:17,660 And then once in a while, you try 1196 01:01:17,660 --> 01:01:20,480 to collect the garbage from the older objects 1197 01:01:20,480 --> 01:01:22,880 because those tend to not change that often.
1198 01:01:25,620 --> 01:01:27,520 There's also real-time garbage collection.
1199 01:01:27,520 --> 01:01:30,450 So the methods I talked about today 1200 01:01:30,450 --> 01:01:34,320 assume that the program isn't running while the garbage 1201 01:01:34,320 --> 01:01:37,710 collection procedure is running, but in practice, you 1202 01:01:37,710 --> 01:01:41,300 might want to actually have your garbage collector running 1203 01:01:41,300 --> 01:01:43,260 in the background when your program is running.
1204 01:01:43,260 --> 01:01:45,300 But this can lead to correctness issues, 1205 01:01:45,300 --> 01:01:48,660 because the static algorithms I just described 1206 01:01:48,660 --> 01:01:51,990 assume that the graph of the objects and pointers 1207 01:01:51,990 --> 01:01:55,413 isn't changing, and when the objects and pointers are 1208 01:01:55,413 --> 01:01:57,330 changing, you need to make sure that you still 1209 01:01:57,330 --> 01:01:59,400 get a correct answer.
1210 01:01:59,400 --> 01:02:03,620 Real-time garbage collection tends to be conservative, 1211 01:02:03,620 --> 01:02:08,580 so it doesn't always free everything that's garbage.
1212 01:02:08,580 --> 01:02:10,740 But for the things that it does decide to free, 1213 01:02:10,740 --> 01:02:13,280 those can actually be reclaimed.
1214 01:02:13,280 --> 01:02:16,500 And there are various techniques to make real-time garbage 1215 01:02:16,500 --> 01:02:19,110 collection efficient.
1216 01:02:19,110 --> 01:02:21,540 One possible way is, instead of just having one FROM 1217 01:02:21,540 --> 01:02:24,570 and TO space, you can have multiple FROM and TO spaces, 1218 01:02:24,570 --> 01:02:27,660 and then you just work on one of the spaces at a time 1219 01:02:27,660 --> 01:02:29,730 so that it doesn't actually take that long 1220 01:02:29,730 --> 01:02:31,100 to do garbage collection.
1221 01:02:31,100 --> 01:02:35,130 You can do it incrementally throughout your program.
1222 01:02:35,130 --> 01:02:37,710 There's also multithreaded storage allocation 1223 01:02:37,710 --> 01:02:39,900 and parallel garbage collection.
1224 01:02:39,900 --> 01:02:42,060 So this is, when you have multiple threads running, 1225 01:02:42,060 --> 01:02:47,100 how you allocate memory, and also how you collect garbage 1226 01:02:47,100 --> 01:02:47,850 in the background.
1227 01:02:47,850 --> 01:02:52,940 So the algorithms become much trickier because there 1228 01:02:52,940 --> 01:02:55,320 are multiple threads running, and you 1229 01:02:55,320 --> 01:02:57,390 have to deal with races and correctness issues 1230 01:02:57,390 --> 01:02:58,380 and so forth.
1231 01:02:58,380 --> 01:03:01,560 And that's actually the topic of the next lecture.
1232 01:03:04,860 --> 01:03:06,920 So in summary, these are the things 1233 01:03:06,920 --> 01:03:08,280 that we talked about today.
1234 01:03:08,280 --> 01:03:12,860 So we have the most basic form of storage, which is a stack.
1235 01:03:12,860 --> 01:03:16,400 The limitation of a stack is that you can only free things 1236 01:03:16,400 --> 01:03:17,960 at the top of the stack.
1237 01:03:17,960 --> 01:03:20,570 You can't free arbitrary things in the stack, 1238 01:03:20,570 --> 01:03:24,140 but it's very efficient when it works because the code is 1239 01:03:24,140 --> 01:03:24,735 very simple.
1240 01:03:24,735 --> 01:03:26,360 And it can be inlined, and in fact this 1241 01:03:26,360 --> 01:03:30,530 is what the C calling convention uses.
1242 01:03:30,530 --> 01:03:33,440 It places local variables and the return address 1243 01:03:33,440 --> 01:03:35,380 of the function on the stack.
1244 01:03:35,380 --> 01:03:37,790 The heap is the more general form of storage, 1245 01:03:37,790 --> 01:03:41,020 but it's much more complicated to manage.
1246 01:03:41,020 --> 01:03:43,850 And we talked about various ways to do allocation 1247 01:03:43,850 --> 01:03:45,480 and deallocation for the heap.
1248 01:03:45,480 --> 01:03:47,570 We have fixed-size allocation using 1249 01:03:47,570 --> 01:03:50,330 free lists, variable-size allocation using 1250 01:03:50,330 --> 01:03:54,500 binned free lists, and then many variants of these ideas 1251 01:03:54,500 --> 01:03:57,170 are used in practice.
1252 01:03:57,170 --> 01:03:59,480 For garbage collection, this is where 1253 01:03:59,480 --> 01:04:04,070 you want to free the programmer from having to free objects.
1254 01:04:04,070 --> 01:04:06,470 And garbage collection algorithms 1255 01:04:06,470 --> 01:04:10,100 are supported in languages like Java and Python.
1256 01:04:10,100 --> 01:04:12,690 We talked about various ways to do this. Reference counting 1257 01:04:12,690 --> 01:04:14,120 suffers from the limitation 1258 01:04:14,120 --> 01:04:16,580 that it can't free cycles.
1259 01:04:16,580 --> 01:04:18,500 Mark-and-sweep and stop-and-copy-- these 1260 01:04:18,500 --> 01:04:20,110 can free cycles.
1261 01:04:20,110 --> 01:04:21,620 The mark-and-sweep procedure doesn't 1262 01:04:21,620 --> 01:04:23,300 deal with external fragmentation, 1263 01:04:23,300 --> 01:04:26,840 but the stop-and-copy procedure does.
1264 01:04:26,840 --> 01:04:29,730 We also talked about internal and external fragmentation.
1265 01:04:29,730 --> 01:04:33,950 So external fragmentation is when your memory blocks are all 1266 01:04:33,950 --> 01:04:35,480 over the place in virtual memory.
1267 01:04:35,480 --> 01:04:38,570 This can cause performance issues like disk thrashing 1268 01:04:38,570 --> 01:04:40,040 and TLB misses.
1269 01:04:40,040 --> 01:04:42,440 Then there's internal fragmentation, 1270 01:04:42,440 --> 01:04:44,795 where you're actually not using all 1271 01:04:44,795 --> 01:04:47,060 of the space in the block that you allocate.
1272 01:04:47,060 --> 01:04:49,700 So for example, in the binned free list algorithm, 1273 01:04:49,700 --> 01:04:51,950 you do have a little bit of internal fragmentation 1274 01:04:51,950 --> 01:04:55,063 because you're always rounding up to the nearest power of 2 1275 01:04:55,063 --> 01:04:57,230 that is at least the size you want, so you're wasting up 1276 01:04:57,230 --> 01:04:59,450 to a factor of 2 in space.
1277 01:04:59,450 --> 01:05:01,310 And in project 3, you're going to look 1278 01:05:01,310 --> 01:05:05,030 much more at these storage allocation schemes, 1279 01:05:05,030 --> 01:05:09,770 and then you'll also get to try some of these in homework 6.
1280 01:05:09,770 --> 01:05:13,030 So any other questions?
1281 01:05:13,030 --> 01:05:16,060 So that's all I have for today's lecture.