MICHALE FEE: All right, let's go ahead and get started. So we're starting a new topic today. This is actually one of my favorite lectures, one of my favorite subjects in computational neuroscience.

All right, so a brief recap of what we've been doing. We've been working on circuit models of neural networks. And we've been working on what we call a rate model, in which we replaced all the spikes of a neuron with, essentially, a single number that characterizes the rate at which a neuron fires. We introduced a simple network in which we have an input neuron and an output neuron with a synaptic connection of weight w between them. And that synaptic connection leads to a synaptic input that's proportional to w times the firing rate of the input neuron. And then we talked about how we can characterize the output, the firing rate of the output neuron, as some nonlinear function of the total input to this output neuron.

We've talked about different F-I curves. We've talked about having what's called a binary threshold unit, which has zero firing below some threshold. And actually, there are different versions of the binary threshold unit. Sometimes the firing rate is zero for inputs below the threshold, and in other models we use a minus 1, and then a constant firing rate of one above that threshold. And we also talked about linear neurons, where we can write down the firing rate of the output neuron just as a weighted sum of the inputs. And remember that these neurons are kind of special in that they can have negative firing rates, which is not really biophysically plausible, but mathematically it's very convenient to have neurons like this.

So we took this simple model and we expanded it to the case where we have many input neurons and many output neurons. So now we have a vector of input firing rates, u, and a vector of output firing rates, v.
And for the case of linear neurons, we talked about how you can write down the vector of firing rates of the output neurons simply as a matrix product of a weight matrix times the vector of input firing rates. And we talked about how this can produce transformations of this vector of input firing rates. So in this high-dimensional space of inputs, we can imagine stretching that input vector along different directions to amplify certain directions that may be more important than others. We talked about how you can do that, stretch in arbitrary directions, not just along the axes. And we talked about how that vector of-- that, sorry, matrix of weights can produce a rotation. So we can have some set of inputs where, let's say, we have clusters of different input values corresponding to different things. And you can rotate that to put certain features in particular output neurons. So now you can discriminate one class of objects from another class of objects by looking at just one dimension and not the whole high-dimensional space.

So today, we're going to look at a new kind of network called a recurrent neural network, where not only do we have inputs to our output neurons from an input layer, but we also have connections between the neurons in the output layer. So these neurons in a recurrent network talk to each other. And that imbues some really cool properties onto these networks. So we're going to develop the math and describe how these things work to develop an intuition for how recurrent networks respond to their inputs. We're going to get into some of the computations that recurrent networks can do. They can act as amplifiers in particular directions. They can act as integrators, so they can accumulate information over time. They can generate sequences. They can act as short-term memories of either continuous variables or discrete variables. It's a very powerful kind of circuit architecture.
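To make that recap concrete, here is a minimal numpy sketch (not from the lecture; the weight matrix and input are made-up values) of the linear feedforward transformation v = Wu, where the weight matrix rotates the input and stretches one output direction:

```python
import numpy as np

# Made-up 2x2 weight matrix: rotate the input by 45 degrees,
# then stretch the first output direction by a factor of 3.
theta = np.pi / 4
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
stretch = np.diag([3.0, 1.0])
W = stretch @ rotation            # combined weight matrix

u = np.array([1.0, 0.5])          # input firing-rate vector
v = W @ u                         # output firing rates of the linear neurons

print("input  u =", u)
print("output v =", v)
```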
And on top of that, in order to describe these mathematically, we're going to use all of the linear algebra tools that we've been developing so far. So, hopefully, a bunch of things will kind of connect together.

OK, so, mathematical description of recurrent networks. We're going to talk about dynamics in these recurrent networks, and we're going to start with the very simplest kind of recurrent network, called an autapse network. Then we're going to extend that to the general case of recurrent connectivity. And then we're going to talk about how recurrent networks store memories. So we'll start talking about specific circuit models for storing short-term memories. And I'll touch on recurrent networks for decision-making. And this will kind of lead into the last few lectures of the class, where we get into specific cases of how networks can store memories.

OK, mathematical description. All right, so the first thing that we need to do is-- the really cool thing about recurrent networks is that their activity can evolve over time. So we need to talk about dynamics, all right? The feed-forward networks that we've been talking about, we just put in an input. It gets weighted by synaptic strength, and we get a firing rate in the output, just sort of instantaneously. We've been thinking of it as: you put in an input, and you get an output. In general, neural networks don't do that. You put in an input, and things change over time until you settle at some output, maybe, or it starts doing something interesting, all right? So the time course of the activity becomes very important, all right?

So neurons don't respond instantaneously to inputs. There are synaptic delays. There's integration of the membrane potential. Things change over time. And a specific example of this that we saw in the past is that if you have an input spike, you can produce a postsynaptic current that jumps up abruptly as the synaptic conductance turns on.
And then the synaptic conductance decays away as the neurotransmitter unbinds from the neurotransmitter receptor, and you get a synaptic current that decays away over time, OK? So that's a simple kind of time dependence that you would get. And that could lead to time dependence in the firing rate of the output neuron. OK, dendritic propagation, membrane time constant-- other examples of how things can take time in a neural network.

All right, so we're going to model the firing rate of our output neuron in the following way. If we have an input firing rate that's zero and then steps up to some constant and then steps down, we're going to model the output, the firing rate of the output neuron, using exactly the same kind of first-order linear differential equation that we've been using all along for the membrane potential, for the Hodgkin-Huxley gating variables-- the same kind of differential equation that you've seen over and over again. So that's the differential equation we're going to use. We're going to say that the time derivative of the firing rate of the output neuron times the time constant is just equal to minus the firing rate of the output neuron plus v infinity. And so you know that the solution to this equation is that the firing rate of the output neuron will just relax exponentially to some new v infinity. And the v infinity that we're going to use is just this nonlinear function of the weighted input to our neuron.

So we're going to take the formalism that we developed for our feed-forward networks to say, what is the firing rate of the output neuron as a function of the inputs? And we're going to use that firing rate that we've been using before as the v infinity for our network with dynamics. Any questions about that?

All right, so that becomes our differential equation now for this recurrent network, all right?
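Restating the firing-rate dynamics just described for the single input-output pair, as a worked equation:

```latex
\tau \frac{dv}{dt} = -v + v_\infty, \qquad v_\infty = F(w\,u),
```

so for a step input held constant, the firing rate relaxes exponentially toward the steady state,
$v(t) = v_\infty + \bigl(v(0) - v_\infty\bigr)\, e^{-t/\tau}$.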
So it's just a first-order linear differential equation, where the v infinity, the steady-state firing rate of the output neuron, is just this nonlinear function of the weighted sum of all the inputs. All right, and actually, for most of what we do today, we're going to just take the case of a linear neuron. All right.

So this I've already said. This I've already said. And actually, what I'm doing here is just extending this. So this was the case for a single output neuron and a single input neuron. What we're doing now is we're just extending this to the case where we have a vector of input neurons with a firing rate represented by a firing rate vector u, and a vector of output neurons with a firing rate vector v. And we're just going to use this same differential equation, but we're going to write it in vector notation. So each one of these output neurons has an equation like this, and we're going to combine them all together into a single vector. Does that make sense?

All right, so there is our vector notation of the activity in this recurrent network. Sorry, I forgot to put the recurrent connections in there. So the time dependence is really simple in this feed-forward network, right? So in a feed-forward network, the dynamics just look like this. But in a recurrent network, this thing can get really interesting and start doing interesting stuff.

All right, so let's add recurrent connections now and add these recurrent connections to our equation. So in addition to this weight matrix w that describes the connections from the input layer to the output layer, we're going to have another weight matrix that describes the connections between the neurons in the output layer. And this weight matrix, of course, has to be able to describe a connection from any one of these neurons to any other of these neurons.
And so this weight matrix is going to be a function of the postsynaptic neuron, the weight-- the synaptic strength is going to be a function of the identity of the postsynaptic neuron and the identity of the presynaptic neuron. Does that make sense? OK, so there are two kinds of input-- a feed-forward input from the input layer and a recurrent input due to connections within the output layer. Any questions about that?

OK, so there is the equation now that describes the time rate of change of the firing rates in the output layer. It's just this first-order linear differential equation. And the v infinity is just this nonlinear function of the inputs, of the net input to this neuron, to each neuron. And the net input to this set of neurons is a contribution from the feed-forward inputs, given by this weight matrix w, and this contribution from the recurrent inputs, given by this weight matrix, m. So that is the crux of it, all right? So I want to make sure that we understand where we are. Does anybody have any questions about that? No? All right, then I'll push ahead.

All right, so what is this? So we've seen this before. This product of this weight matrix times this vector of input firing rates just looks like this. You can see that the input to this neuron, this first output neuron, is just the dot product of these weights onto the first neuron-- the dot product of that vector of weights, that row of the weight matrix, with the vector of input firing rates. And the feed-forward contribution to this neuron is just the dot product of that row of this input weight matrix with the vector of input firing rates, and so on. If we look at the recurrent input to these neurons, the recurrent input to this first neuron is just going to be the dot product of this row of the recurrent weight matrix and the vector of firing rates in the output layer.
The recurrent input to the second neuron is going to be the dot product of this row of the weight matrix and the vector of firing rates. Yes?

AUDIENCE: So I guess I'm a little confused, because I thought it was from A. Oh, to A. OK.

MICHALE FEE: Yeah, it's always post, pre. Post, pre in a weight matrix. That's because we're usually writing down these vectors the way that I'm defining this notation. This vector is a column matrix, a column vector.

All right, so we're going to make one simplification to this. When we work with the recurrent networks, we're usually going to simplify this input. And rather than write down this complex feed-forward component, writing this out as this matrix product, we're just going to simplify the math. And rather than carry around this w times u, we're just going to replace that with a vector of inputs onto each one of those neurons, OK? So we're just going to pretend that the input to this neuron is just coming from one input, OK? And the input to this neuron is coming from another single input. And so we're just going to replace that feed-forward input onto this network with this vector h. So that's the equation that we're going to use moving forward, all right? It just simplifies things a little bit so we're not carrying around this w u.

So now, that's our equation that we're going to use to describe this recurrent network. This is a system of coupled equations. What does that mean? You can see that the time derivative of the firing rate of this first neuron is given by a contribution from the input layer and a contribution from other neurons in the output layer. So the time rate of change of this neuron depends on the activity of all the other neurons in the network. And the time rate of change of this neuron depends on the activity of all the other neurons in the network. So that's a set of coupled equations.
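Here is a minimal simulation sketch of the coupled equations just described, for the linear case tau dv/dt = -v + Mv + h. The particular weight matrix, input vector, time constant, and step size are assumed values, not taken from the lecture:

```python
import numpy as np

tau = 10.0   # firing-rate time constant (ms), assumed
dt = 0.1     # forward-Euler integration step (ms), assumed

# Assumed recurrent weight matrix M (post, pre) and constant input vector h.
M = np.array([[0.0, 0.4],
              [0.4, 0.0]])
h = np.array([1.0, 0.5])

v = np.zeros(2)                      # output firing-rate vector, starts at rest
for step in range(int(200 / dt)):
    dvdt = (-v + M @ v + h) / tau    # coupled linear rate equations
    v = v + dt * dvdt                # forward-Euler update

print("steady-state v (simulated) =", v)
print("steady-state v (analytic)  =", np.linalg.solve(np.eye(2) - M, h))
```

The analytic check comes from setting dv/dt = 0, which gives (I - M) v = h.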
And that, in general, can be-- you know, it's not obvious, when you look at it, what the solution is, all right? So we're going to develop the tools to solve this equation and get some intuition about how networks like this behave in response to their inputs.

So the first thing we're going to do is to simplify this network to the case of linear neurons. So we don't have-- so the neurons just fire. Their firing rate is just linear with their input. And so that's the equation for the linear case. All we've done is we've just gotten rid of this nonlinear function f.

All right, so now let's take a very simple case of a recurrent network and use this equation to see how it behaves, all right? So the simplest case of a recurrent network is the case where the recurrent connections within this layer are given by-- the weight matrix is given by a diagonal matrix. Now, what does that correspond to? What that corresponds to is this neuron making a connection onto itself with a synapse of weight lambda one, right there. And that kind of recurrent connection of a neuron onto itself is called an autapse, like an auto-synapse. And we're going to put one of those autapses on each one of these neurons in our output layer, in our recurrent layer.

So now we can write down the equation for this network, all right? And what we're going to do is simply replace-- sorry, let me just bring up that equation again. Sorry, there's the equation. And we're simply going to replace this weight matrix m, this recurrent weight matrix, with that diagonal matrix that I just showed you. So there it is. So the time rate of change of this vector of output neurons is just minus v plus this diagonal matrix times v plus the inputs.

So now you can see that if we write out the equation separately for each one of these output neurons-- so here it is in vector notation. We can just write that out for each one of our output neurons.
So there's a separate equation like this for each one of these neurons. But you can see that these are all uncoupled. So we can understand how this network responds just by studying this equation for one of those neurons. OK, so let's do that. We have an independent equation. The firing rate change-- the time derivative of the firing rate of neuron one depends only on the firing rate of neuron one. It doesn't depend on any other neurons. As you can see, it's not connected to any of the other neurons.

OK, so let's write this equation. And let's see what that equation looks like. So we're going to rewrite this a little bit. We're just going to factor out the v sub a right here. This parameter, 1 minus lambda a, controls what kind of solutions this equation has. And there are three different cases that we need to consider. We need to consider the case where 1 minus lambda is greater than zero, equal to zero, or less than zero. Those three different values of that parameter 1 minus lambda give three different kinds of solutions to this equation.

We're going to start with the case where lambda is less than one. And if lambda is less than 1, then this term right here is greater than zero. If we do that, then we can rewrite this equation as follows. We're going to divide both sides of this equation by 1 minus lambda, and that's what we have here. And you can see that this equation starts looking very familiar, very simple. We have a first-order linear differential equation, where we have a time constant here, tau over 1 minus lambda, and a v infinity here, which is the input, the effective input onto that neuron, divided by 1 minus lambda. So that's tau dv dt equals minus v plus v infinity. But now you can see that the time constant and the v infinity depend on lambda, depend on the strength of that connection, all right? And the solution to that equation we've seen before. It's just exponential relaxation toward v infinity. OK, so here's our v infinity.
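Writing out the algebra just described for one autapse neuron, as a worked equation:

```latex
\tau \frac{dv_a}{dt} = -v_a + \lambda_a v_a + h_a = -(1-\lambda_a)\, v_a + h_a
\;\;\Longrightarrow\;\;
\frac{\tau}{1-\lambda_a} \frac{dv_a}{dt} = -v_a + \frac{h_a}{1-\lambda_a},
```

so the effective time constant is $\tau_{\mathrm{eff}} = \tau/(1-\lambda_a)$ and the steady state is $v_\infty = h_a/(1-\lambda_a)$, valid for $\lambda_a < 1$.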
There's our tau. True for the case of lambda between-- let's just look at these solutions for the case of lambda between zero and one. So I'm going to plot v as a function of time when we have an input that goes from zero and then steps up and then is held constant.

All right, so let's look at the case of lambda equals zero. So lambda of zero means there's no autapse. It's just not connected. So you can see that, in this case, the solution is very simple. It's just exponential relaxation toward v infinity. v infinity is just given by h, the input, and tau is just the original tau, divided by 1 minus 0, right? So it's just exponential relaxation to h. Does that make sense? And it relaxes with a time constant tau, tau m.

We're going to now turn up the synapse a little bit so that it has a little bit of strength. You see what happens when lambda is 0.5: that v infinity gets bigger. v infinity goes to 2h. Why? Because it's h divided by 1 minus 0.5. So it's h over 0.5, so 2h. And what happens to the time constant? Well, it becomes two tau. All right, and if we make lambda equal to 0.3-- sorry, 0.66-- we turn it up a little bit. You can see that the response of this neuron gets even bigger.

So you can see that what's happening is that when we start letting this neuron feed back to itself-- positive feedback-- the response of the neuron to a fixed input-- the input is the same for all of those-- the response of the neuron gets bigger. And so having positive feedback of that neuron onto itself through an autapse just amplifies the response of this neuron to its input.

Now, let's consider the case where-- so positive feedback amplifies the response. And what else does it do? It slows the response down. The time constants are getting longer, which means the response is slower. All right, let's look at what happens when the lambdas are less than zero.
What does lambda less than zero correspond to here?

AUDIENCE: [INAUDIBLE]

MICHALE FEE: Yeah, which is, in neurons, what does that correspond to?

AUDIENCE: [INAUDIBLE]

MICHALE FEE: Inhibition. So this neuron, when you put an input in, it tries to activate the neuron. But that neuron inhibits itself. So what do you think's going to happen? So positive feedback made the response bigger. Here, the neuron is kind of inhibiting itself. So what's going to happen? You put in that same h that we had before-- what's going to happen when we have inhibition?

AUDIENCE: Response is [INAUDIBLE].

MICHALE FEE: What's that?

AUDIENCE: The response is going to be smaller.

MICHALE FEE: The response will just be smaller, that's right. So let's look at that. So here's the firing rate of this neuron as a function of time for a step input. You can see for lambda equals zero, we're going to respond with an amount h-- in a time constant tau. But if we put in a lambda of negative one-- that means you put this input in-- that neuron starts inhibiting itself, and you can see the response is smaller. But another thing that's really interesting is that you can see that the response of the neuron is actually faster.

So if the feedback-- if the lambda is minus one, you can see that v infinity is h over 1 minus negative 1. So it's h over 2. All right, and so on. The more we turn up that inhibition, the more suppressed the neuron is, the weaker the response of that neuron to its input, but the faster it is. So negative feedback suppresses the response of the neuron and speeds up the response.
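To put numbers on the amplification and speed-up/slow-down just described, here is a small sketch that tabulates the steady-state gain 1/(1 - lambda) and the effective time constant tau/(1 - lambda), using the lambda values mentioned as examples in the lecture:

```python
# Gain and effective time constant of an autapse neuron:
# v_infinity = h / (1 - lambda),  tau_eff = tau / (1 - lambda).
for lam in [0.0, 0.5, 0.66, -1.0]:
    gain = 1.0 / (1.0 - lam)      # steady-state response in units of h
    tau_eff = 1.0 / (1.0 - lam)   # effective time constant in units of tau
    print(f"lambda = {lam:5.2f}   gain = {gain:.2f}   tau_eff = {tau_eff:.2f} tau")
```

Positive lambda gives a gain and time constant larger than one (amplified, slower); negative lambda gives values smaller than one (suppressed, faster).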
OK, now, there's one other really important thing about recurrent networks in this regime, where this lambda is less than one. And that is that the activity always relaxes back to zero when you turn the input off. OK, so you put a step input in, the neuron responds, relaxing exponentially to some v infinity. But when you turn the input off, the network relaxes back to zero, OK?

So now let's go to the more general case of recurrent connections. Oh, and first, I just want to show you how we actually show graphically how a neuron responds-- sorry, how one of these networks responds. And a typical way that we do that is we plot the firing rate of one neuron versus the firing rate of another neuron. That's called a state-space trajectory. And we plot that response as a function of time after we put in an input. So we can put an input in described as some vector. So we put in some h1 and h2, and we then plot the response of the neuron-- the response of the network in this output state space.

So let me show you an example of what that looks like. So here is the output of this little network for different kinds of inputs. So Daniel made this nice little movie for us. Here, you can see that if you put an input into neuron one, neuron one responds. If you put a negative input into neuron one, the neuron goes negative. If you put an input into neuron two, the neuron responds. And if you put a negative input into neuron two, it responds. Now, why did it respond bigger in this direction than in this direction?

AUDIENCE: That's [INAUDIBLE].

MICHALE FEE: Good. Because neuron one had--

AUDIENCE: Positive?

MICHALE FEE: Positive feedback. And neuron two had negative feedback. So neuron one, this neuron one, amplified its input and gave a big response. Neuron two suppressed the response to its input, and so it had a weak response.

Let's look at another interesting case. Let's put an input into these neurons-- not one at a time, but simultaneously. So now we're going to put an input into both neurons one and two simultaneously. It's like Spirograph. Did you guys play with Spirograph?
It's kind of weird, right? It's like making little butterflies for spring. So why does the output-- why does the response of this network to an input, a positive input to both h1 and h2, look like this? Let's just break this down into one of these little branches. We start at zero. We put an input into h1 and h2, and the response goes quickly like this and then relaxes up to here. So why is that? Lena?

AUDIENCE: [INAUDIBLE] so there was [INAUDIBLE] and then because it's negative, it's shorter.

MICHALE FEE: Yup. The response in the v2 direction is weak but fast.

AUDIENCE: Yeah.

MICHALE FEE: So it goes up quickly. And then the response in the v1 direction is?

AUDIENCE: Slow, but [INAUDIBLE].

MICHALE FEE: Good. That's it. It's slow, but [AUDIO OUT]. It's amplified in this direction, suppressed in this direction. But the response is fast this way and slow this way. So it traces this out. Now, when you turn the input off, again, it relaxes. v2 relaxes quickly back to zero, and v1 relaxes slowly back to zero. So it kind of traces out this kind of hysteretic loop. It's not really hysteresis. Then it's exactly the mirror image when you put in a negative input. And when you put in h1 positive and h2 negative, it just looks like a mirror image.

All right, so any questions about that? Yes, Lena?

AUDIENCE: If there was nothing, like no kind of amplified or [INAUDIBLE], would it just be like a [INAUDIBLE]?

MICHALE FEE: Yeah, so if you took out the recurrent connections, what would it look like?

AUDIENCE: An x?

MICHALE FEE: Yeah, the output-- so let's say that you just literally set those to zero. Then the response will be the identity matrix, right? You get the output as a function of input. Let's just go back to the equation. We can always, always get the answer by looking at the equation.
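As an illustration of the trajectory just described, here is a minimal simulation sketch of a two-neuron autapse network with positive feedback on neuron one and negative feedback on neuron two. The specific weights, input, and time step are assumed values, not the ones used in the class demo:

```python
import numpy as np

tau, dt = 1.0, 0.01
M = np.diag([0.8, -1.0])          # assumed autapse weights: amplify v1, suppress v2
h_on = np.array([1.0, 1.0])       # assumed step input to both neurons

v = np.zeros(2)
trajectory = []
for step in range(int(20.0 / dt)):
    h = h_on if step < int(10.0 / dt) else np.zeros(2)   # input on, then off
    v = v + dt / tau * (-v + M @ v + h)                   # forward-Euler update
    trajectory.append(v.copy())

trajectory = np.array(trajectory)
print("peak  v1, v2:", trajectory.max(axis=0))   # v1 amplified and slow, v2 suppressed and fast
print("final v1, v2:", trajectory[-1])           # both relax back toward zero after input off
```

Plotting v1 against v2 over time would trace out the loop described above: fast and weak in the v2 direction, slow and amplified in the v1 direction.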
Too many animations. No, it's a very good question. Here we go. There it is right there. So you're asking about-- let's just ask about the steady-state response. So we can set dv dt equal to zero. And you're asking, what is v? And you're saying, let's set lambda to zero, right? We're going to set all these diagonal elements to zero. And so now v equals h. OK, great question.

Now, let's go to the case of fully recurrent networks. We've been working with this simplified case of just having neurons with autapses. And the reason we've been doing that is because the answer you get for the autapse case kind of captures almost all the intuition that you need to have. What we're going to do is we're going to take a fully recurrent neural network, and we're going to do a mathematical trick that just turns it into an autapse network. And the answer for the fully recurrent network is just going to be just as simple as what you saw here.

All right, so let's do that. Let's take this fully recurrent network. Our weight matrix m now, instead of just having diagonal elements, also has off-diagonal elements. And I'll say that one of the things that we're going to do today is just consider the simplest case of this fully recurrent network, where the connections are symmetric, where the connection from v1 to v2 is equal to the connection from v2 to v1, all right? We're going to do that because that's the next thing to do to build our intuition, and it's also mathematically simpler than the fully general case, OK?

So we saw how the behavior of this network is very simple if m is diagonal. So what we're going to do is we're going to take this arbitrary matrix m, and we're going to just make it diagonal. So let's do that. So we're going to rewrite our weight matrix m as-- so we're going to rewrite m in this form, where this phi-- sorry, where this lambda is a diagonal matrix.
So we're going to take this network with recurrent connections between different neurons in the network, and we're going to transform it into sort of an equivalent network that just has autapses. So how do we write m in this form, with a rotation matrix times a diagonal matrix times a rotation matrix? We just solve this eigenvalue equation, OK? Does that make sense? We're just going to do exactly the same thing we did in PCA, where we found the covariance matrix and we rewrote the covariance matrix like this. Now we're going to take the weight matrix of this recurrent network, and we're going to rewrite it in exactly the same way. So that process is called diagonalizing the weight matrix.

So the elements of lambda here are the eigenvalues of m. And the columns of phi are the eigenvectors of m. And we're going to use these quantities, these elements, to build a new network that has the same properties as our recurrent network. So let me just show you how we do that.

So remember what this eigenvalue-- this is an eigenvalue equation written in matrix notation. What this means is that this is a set of eigenvalue equations-- it's a set of n eigenvalue equations like this, where there's one of these for each neuron in the network. OK, so let me just go through that. OK, so here's the eigenvalue equation. If M is a symmetric matrix, then the eigenvalues are real and phi is a rotation matrix. And the eigenvectors give us an orthogonal basis, all right? So everybody remember this from a few lectures ago?

If M is symmetric-- and this is why we're going to, from this point on, consider just the case where M is symmetric-- then the eigenvectors, the columns of that matrix phi, give us an orthogonal set of vectors, and they're unit vectors. So it satisfies this orthonormal condition. And phi transpose phi is an identity matrix, which means phi is a rotation matrix.
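Here is a minimal numpy sketch of the diagonalization step just described for a symmetric weight matrix. The matrix entries are made up, and numpy's eigh plays the role of the eig call mentioned a bit later in the lecture:

```python
import numpy as np

# Made-up symmetric recurrent weight matrix (post, pre).
M = np.array([[0.2, 0.6],
              [0.6, 0.2]])

# For a symmetric matrix, eigh returns real eigenvalues and an
# orthonormal set of eigenvectors (the columns of Phi).
eigenvalues, Phi = np.linalg.eigh(M)
Lambda = np.diag(eigenvalues)

print("Phi^T Phi =\n", Phi.T @ Phi)                  # identity: Phi is a rotation matrix
print("Phi Lambda Phi^T =\n", Phi @ Lambda @ Phi.T)  # reconstructs M
```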
OK, so now what we're going to do is rewrite. The first thing we're going to do to use this trick to rewrite our matrix, our network, is to rewrite the vector of firing rates v in this new basis. What are we going to do? We'll take the vector, and all we're going to do is rewrite that vector in this new basis set. We're just going to do a change of basis of our firing rate vector into a new basis set that's given by the columns of phi. Another way of saying it is that we're going to rotate this firing rate vector v using the phi rotation matrix.

So we're going to project v onto each one of those new basis vectors. So there's v in the standard basis. There's our new basis, f1 and f2. We're going to project v onto f1 and f2 and write down that scalar projection, c1 and c2. So we're going to write down the scalar projection of v onto each one of those basis vectors. So we can write that c sub alpha-- that's the alpha-th component-- is just v dot the alpha-th basis vector. So now we can express v as a linear combination in this new basis. So it's c1 times f1 plus c2 times f2 plus c3-- that's supposed to be a three-- times f3, and so on.

And of course, remember, we're doing all of this because we want to understand the dynamics. So these things are time dependent. So v changes in time. We're not going to be changing our basis vectors in time. So if we want to write down a time-dependent v, it's really these coefficients that are changing in time, right? Does that make sense? So we can now write our vector v, our firing rate vector, as a sum of contributions in all these different directions corresponding to the new basis. And each one of those coefficients, c, is just the time-dependent v projected onto one of those basis vectors. Any questions? No? OK.

And remember, we can write that in matrix notation using this formalism that we developed in the lecture on basis sets.
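In symbols, the change of basis just described, with the f_alpha being the columns of phi:

```latex
c_\alpha(t) = \mathbf{f}_\alpha \cdot \mathbf{v}(t), \qquad
\mathbf{v}(t) = \sum_\alpha c_\alpha(t)\, \mathbf{f}_\alpha,
\qquad\text{or, in matrix form,}\qquad
\mathbf{c} = \Phi^{\mathsf T}\mathbf{v}, \quad \mathbf{v} = \Phi\,\mathbf{c}.
```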
So v is just phi c, and c is just phi transpose v. So we're just taking this vector v, and we're rotating it into a new basis set, and we can rotate it back.

All right, so now what we're going to do is we're going to take this v expressed in this new basis set, and we're going to rewrite our equation in that new basis set. Watch this. This is so cool. All right, you ready? We're going to take this, and we're going to plug it into here. So dv dt is phi dc dt. v is just phi c. v is phi c, and h doesn't change. So now what is that? Do you remember?

AUDIENCE: Phi [INAUDIBLE].

MICHALE FEE: Right. We got phi as the solution to the eigenvalue equation. What was the eigenvalue equation? The eigenvalue equation was m phi equals phi lambda. So the phi here, this rotation matrix, is the solution to this equation, all right? So we're given m, and we're saying we're going to find a phi and a lambda such that we can write m phi is equal to phi lambda. So when we take that matrix m and we run eig on it in MATLAB, MATLAB sends us back a phi and a lambda such that this equation is true. So literally, we can take the weight matrix m, stick it into MATLAB, and get a phi and a lambda such that m phi is equal to phi lambda.

So m phi is equal to what? Phi lambda. That becomes this. Now, all of a sudden, this thing is just going to simplify. So how would we simplify this equation? We can get rid of all of these things, all of these phi's, by doing what? How do you get rid of phi's?

AUDIENCE: Multiply [INAUDIBLE] phi transpose.

MICHALE FEE: You multiply by phi transpose, exactly. So we're going to multiply each term in this equation by phi transpose. So what do you have? Phi transpose phi, phi transpose phi, phi transpose phi. What is phi transpose phi equal to? The identity matrix. Because it's a rotation matrix, phi transpose is just the inverse of phi.
784 00:44:54,730 --> 00:44:58,550 So phi inverse phi is just equal to the identity matrix. 785 00:44:58,550 --> 00:45:00,680 And all those things disappear. 786 00:45:00,680 --> 00:45:02,800 And you're left with this equation-- 787 00:45:02,800 --> 00:45:09,370 tau dc dt equals minus c plus lambda c plus hf. 788 00:45:09,370 --> 00:45:10,450 And what is hf? 789 00:45:10,450 --> 00:45:13,580 hf is just h rotated into the new basis set. 790 00:45:16,370 --> 00:45:20,980 So this is the equation for a recurrent network 791 00:45:20,980 --> 00:45:28,120 with just autapses, which we just understood. 792 00:45:28,120 --> 00:45:30,610 We just wrote down what the solution is, right? 793 00:45:30,610 --> 00:45:33,130 And we plotted it for different values of lambda. 794 00:45:40,380 --> 00:45:44,260 So now let's just look at what some of these look like. 795 00:45:44,260 --> 00:45:52,360 So we've rewritten our weight matrix in a new basis set. 796 00:45:52,360 --> 00:45:55,540 We've rebuilt our network in a new basis set, 797 00:45:55,540 --> 00:45:59,860 in a rotated basis set where everything simplifies. 798 00:45:59,860 --> 00:46:02,380 So we've taken this complicated network 799 00:46:02,380 --> 00:46:07,540 with recurrent connections and we've rewritten it 800 00:46:07,540 --> 00:46:10,600 as a new network, where each of these neurons 801 00:46:10,600 --> 00:46:13,480 in our new network corresponds to what's 802 00:46:13,480 --> 00:46:18,820 called a mode of the fully recurrent network. 803 00:46:22,200 --> 00:46:28,850 So the activities c alpha-- c1 and c2-- of the network modes 804 00:46:28,850 --> 00:46:33,770 represent kind of an activity in a linear combination 805 00:46:33,770 --> 00:46:35,180 of these neurons. 806 00:46:35,180 --> 00:46:40,360 So we're going to go through what that means now. 807 00:46:40,360 --> 00:46:42,970 So the first thing I want to do is just calculate 808 00:46:42,970 --> 00:46:46,960 what the steady state response is in this neuron. 809 00:46:46,960 --> 00:46:48,770 And I'll just do it mathematically, 810 00:46:48,770 --> 00:46:51,550 and then I'll show you what it looks like graphically. 811 00:46:54,400 --> 00:46:56,320 So there's our original network equation. 812 00:46:56,320 --> 00:47:00,380 We've rewritten it as a set of differential equations 813 00:47:00,380 --> 00:47:03,320 for the modes of this network. 814 00:47:06,470 --> 00:47:10,270 I'm just rewriting this by putting an I here, 815 00:47:10,270 --> 00:47:12,040 minus I times c. 816 00:47:12,040 --> 00:47:14,150 That's the only change I made here. 817 00:47:14,150 --> 00:47:15,475 I just rewrote it like this. 818 00:47:20,450 --> 00:47:21,670 Let's find a steady state. 819 00:47:21,670 --> 00:47:24,760 So we're going to set dc dt equal to zero. 820 00:47:24,760 --> 00:47:28,850 We're going to ask, what is c in steady state? 821 00:47:28,850 --> 00:47:33,310 So we're going to call that c infinity, all right? 822 00:47:33,310 --> 00:47:37,520 I minus lambda times c infinity equals phi transpose h. 823 00:47:37,520 --> 00:47:38,740 OK, don't panic. 824 00:47:38,740 --> 00:47:41,480 It's all going to be very simple in a second. 825 00:47:41,480 --> 00:47:47,230 c infinity is just I minus lambda inverse phi transpose h. 826 00:47:47,230 --> 00:47:49,300 But I is diagonal. 827 00:47:49,300 --> 00:47:50,560 Lambda is diagonal.
828 00:47:50,560 --> 00:47:53,730 So I minus lambda inverse is just the-- 829 00:47:53,730 --> 00:47:58,600 it's a diagonal matrix with one 830 00:47:58,600 --> 00:48:00,145 over all those diagonal elements. 831 00:48:04,290 --> 00:48:06,870 Now let's calculate v infinity. v infinity 832 00:48:06,870 --> 00:48:09,430 is just phi times c infinity. 833 00:48:09,430 --> 00:48:12,390 So here, we're multiplying on the left by phi. 834 00:48:12,390 --> 00:48:14,710 That's just v infinity. 835 00:48:14,710 --> 00:48:16,750 So v infinity is just this. 836 00:48:16,750 --> 00:48:18,330 So what is this? 837 00:48:18,330 --> 00:48:21,960 This just says v infinity is some matrix-- 838 00:48:21,960 --> 00:48:23,940 it's a rotated stretch matrix-- 839 00:48:23,940 --> 00:48:25,170 times the input. 840 00:48:25,170 --> 00:48:30,500 So v infinity is just this matrix times h. 841 00:48:30,500 --> 00:48:32,050 And now let's look at what that is. 842 00:48:34,580 --> 00:48:37,610 v infinity is a matrix times h. 843 00:48:37,610 --> 00:48:39,830 We're going to call that g. 844 00:48:39,830 --> 00:48:42,600 g is a gain matrix. 845 00:48:42,600 --> 00:48:45,270 We're going to think of that as a gain times the input. 846 00:48:45,270 --> 00:48:50,800 So it's just a matrix operation on the input. 847 00:48:50,800 --> 00:48:55,390 This matrix has exactly the same eigenvectors as m. 848 00:48:55,390 --> 00:48:59,290 And the eigenvalues are just 1 over 1 minus lambda. 849 00:49:01,870 --> 00:49:03,350 Hang in there. 850 00:49:03,350 --> 00:49:07,060 So what this means is that if an input is parallel 851 00:49:07,060 --> 00:49:09,940 to one of the eigenvectors of the weight matrix, 852 00:49:09,940 --> 00:49:12,520 that means the output is parallel to the input. 853 00:49:16,640 --> 00:49:19,240 So if the input is in the direction 854 00:49:19,240 --> 00:49:25,720 of one of the eigenvectors, v infinity is g times f. 855 00:49:25,720 --> 00:49:28,651 But g times f-- 856 00:49:28,651 --> 00:49:31,310 f is an eigenvector of g. And what that means 857 00:49:31,310 --> 00:49:35,900 is that v infinity is parallel to f with a scaling factor 858 00:49:35,900 --> 00:49:39,310 1 over 1 minus lambda. 859 00:49:39,310 --> 00:49:39,810 All right? 860 00:49:39,810 --> 00:49:41,030 So hang in there. 861 00:49:41,030 --> 00:49:43,720 I'm going to show you what this looks like. 862 00:49:43,720 --> 00:49:48,480 So in steady state, the output will be parallel to the input 863 00:49:48,480 --> 00:49:50,490 if the input is in the direction of one 864 00:49:50,490 --> 00:49:52,950 of the eigenvectors of the network. 865 00:49:57,610 --> 00:50:00,750 So if the input is in the direction of one 866 00:50:00,750 --> 00:50:02,370 of the eigenvectors of the network, 867 00:50:02,370 --> 00:50:07,770 that means you're activating only one mode of the network. 868 00:50:07,770 --> 00:50:11,155 And only that one mode responds, and none of the other modes 869 00:50:11,155 --> 00:50:11,655 respond. 870 00:50:15,840 --> 00:50:17,760 The response of the network will be 871 00:50:17,760 --> 00:50:20,340 in the direction of that input, and it 872 00:50:20,340 --> 00:50:24,480 will be amplified or suppressed by this gain factor. 873 00:50:24,480 --> 00:50:28,260 And the time constant will also be increased or decreased 874 00:50:28,260 --> 00:50:30,350 by that factor. 875 00:50:30,350 --> 00:50:32,370 So now let's look at-- so I just kind of whizzed 876 00:50:32,370 --> 00:50:33,370 through a bunch of math.
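As a quick check of that result, here is a minimal NumPy sketch (not from the lecture itself, which works in Matlab; the matrix values and the input are made up for illustration). It shows that the gain matrix built from the eigenvectors and the factors 1 over 1 minus lambda gives the same steady state as setting dv dt to zero and solving the network equation directly:

    import numpy as np

    # Steady state of tau dv/dt = -v + M v + h, computed two equivalent ways.
    M = np.array([[0.0, 0.8],
                  [0.8, 0.0]])        # illustrative symmetric weight matrix
    h = np.array([0.3, 1.0])          # illustrative input

    lam, Phi = np.linalg.eig(M)       # numpy's counterpart of Matlab's eig
    # Gain matrix G: same eigenvectors as M, eigenvalues 1 / (1 - lambda).
    G = Phi @ np.diag(1.0 / (1.0 - lam)) @ Phi.T
    v_inf = G @ h

    # Same answer as solving (I - M) v = h directly.
    print(np.allclose(v_inf, np.linalg.solve(np.eye(2) - M, h)))   # True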
877 00:50:33,370 --> 00:50:36,400 Let's look at what this looks like graphically 878 00:50:36,400 --> 00:50:39,050 for a few simple cases. 879 00:50:39,050 --> 00:50:41,440 And then I think it will become much more clear. 880 00:50:41,440 --> 00:50:43,600 Let's just look at a simple network, 881 00:50:43,600 --> 00:50:47,740 where we have two neurons with an excitatory connection 882 00:50:47,740 --> 00:50:51,520 from neuron one to neuron two, an excitatory connection 883 00:50:51,520 --> 00:50:53,440 from neuron two to neuron one. 884 00:50:53,440 --> 00:50:56,640 And we're going to make that weight 0.8. 885 00:50:56,640 --> 00:51:00,220 OK, so what does the weight matrix M look like? 886 00:51:00,220 --> 00:51:03,409 Just tell me what the entries are for M. 887 00:51:03,409 --> 00:51:05,630 AUDIENCE: Does it not have the autapse? 888 00:51:05,630 --> 00:51:09,370 MICHALE FEE: No, so there's no connection 889 00:51:09,370 --> 00:51:13,450 of any of these neurons onto themselves. 890 00:51:13,450 --> 00:51:15,640 AUDIENCE: So you have, like, zeros on the diagonal. 891 00:51:15,640 --> 00:51:17,098 MICHALE FEE: Zeros on the diagonal. 892 00:51:17,098 --> 00:51:18,080 Good. 893 00:51:18,080 --> 00:51:19,840 AUDIENCE: All the diagonals. 894 00:51:19,840 --> 00:51:20,720 MICHALE FEE: Good. 895 00:51:20,720 --> 00:51:22,120 Like that? 896 00:51:22,120 --> 00:51:23,030 Good. 897 00:51:23,030 --> 00:51:26,510 Connection from neuron one to itself is zero. 898 00:51:26,510 --> 00:51:32,330 The connection from post, pre is row, column. 899 00:51:32,330 --> 00:51:37,220 So onto neuron one from neuron two is 0.8. 900 00:51:37,220 --> 00:51:40,460 Onto neuron two from neuron one is 0.8. 901 00:51:40,460 --> 00:51:43,310 And neuron two onto neuron two is zero. 902 00:51:46,400 --> 00:51:51,070 So now we are just going to diagonalize this weight matrix. 903 00:51:51,070 --> 00:51:58,660 We're going to find the eigenvectors and eigenvalues. 904 00:51:58,660 --> 00:52:02,380 The eigenvectors are the columns of phi. 905 00:52:02,380 --> 00:52:04,865 And the eigenvalues are the diagonal elements of lambda. 906 00:52:08,140 --> 00:52:10,720 Let's take a look at what those eigenvectors are. 907 00:52:10,720 --> 00:52:13,860 So this vector here is f1. 908 00:52:13,860 --> 00:52:16,370 This vector here is another eigenvector, f2. 909 00:52:19,780 --> 00:52:20,785 And how did I get this? 910 00:52:24,260 --> 00:52:26,993 How did I get this from this? 911 00:52:26,993 --> 00:52:27,910 How would you do that? 912 00:52:27,910 --> 00:52:32,044 If I gave you this matrix, how would you find phi? 913 00:52:32,044 --> 00:52:33,940 AUDIENCE: Eig M. 914 00:52:33,940 --> 00:52:37,580 MICHALE FEE: Good, eig of M. Now, 915 00:52:37,580 --> 00:52:39,700 remember in the last lecture when 916 00:52:39,700 --> 00:52:45,350 we were talking about some simple cases of matrices 917 00:52:45,350 --> 00:52:49,240 that are really easy to find the eigenvectors of? 918 00:52:49,240 --> 00:52:53,350 If you have a symmetric matrix, where the diagonal elements are 919 00:52:53,350 --> 00:52:56,260 equal to each other, the eigenvectors 920 00:52:56,260 --> 00:53:01,070 are always 45 degrees here and 45 degrees there. 921 00:53:01,070 --> 00:53:07,000 And the eigenvalues are just the diagonal elements plus or minus 922 00:53:07,000 --> 00:53:08,270 the off-diagonal elements. 923 00:53:08,270 --> 00:53:15,460 So the eigenvalues here are 0.8 and minus 0.8.
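As a sketch of that step (the lecture runs eig in Matlab; numpy.linalg.eig plays the same role here, and the order in which the eigenvalues come back may differ):

    import numpy as np

    M = np.array([[0.0, 0.8],
                  [0.8, 0.0]])
    lam, Phi = np.linalg.eig(M)

    print(lam)    # the eigenvalues, 0.8 and -0.8
    print(Phi)    # columns are the 45-degree eigenvectors f1 and f2
    print(np.allclose(M @ Phi, Phi @ np.diag(lam)))   # the eigenvalue equation holds
    print(np.allclose(Phi.T @ Phi, np.eye(2)))        # phi transpose phi is the identity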
924 00:53:15,460 --> 00:53:22,350 All right, so those are the two eigenvectors of this matrix, 925 00:53:22,350 --> 00:53:23,055 of this network. 926 00:53:25,830 --> 00:53:29,860 Those are the modes of the network. 927 00:53:29,860 --> 00:53:34,410 Notice that one of the modes corresponds to neuron one 928 00:53:34,410 --> 00:53:37,560 and neuron two firing together. 929 00:53:37,560 --> 00:53:40,650 The other mode corresponds to neuron one and neuron 930 00:53:40,650 --> 00:53:43,200 two firing with opposite sign-- 931 00:53:46,680 --> 00:53:50,020 minus one, one. 932 00:53:50,020 --> 00:53:54,550 So the lambda-- the diagonal elements of the lambda matrix 933 00:53:54,550 --> 00:53:56,120 are the eigenvalues. 934 00:53:56,120 --> 00:54:04,410 They're 0.8 and minus 0.8, a plus or minus b. 935 00:54:04,410 --> 00:54:07,500 Now, this gain factor, what this says 936 00:54:07,500 --> 00:54:11,960 is that if I have an input in the direction of f1, 937 00:54:11,960 --> 00:54:14,650 the response is going to be amplified by a gain. 938 00:54:14,650 --> 00:54:17,720 And remember, we just derived, on the previous slide, 939 00:54:17,720 --> 00:54:20,990 that that gain factor is just 1 over 1 940 00:54:20,990 --> 00:54:26,360 minus the eigenvalue for that eigenvector. 941 00:54:26,360 --> 00:54:34,270 In this case, the eigenvalue for mode one is 0.8. 942 00:54:34,270 --> 00:54:38,680 So 1 over 1 minus 0.8 is 5. 943 00:54:38,680 --> 00:54:43,180 So the gain in this direction is 5. 944 00:54:43,180 --> 00:54:47,650 The gain for an input in this direction 945 00:54:47,650 --> 00:54:56,380 is 1 over 1 minus negative 0.8, which is 1 over 1.8. 946 00:54:56,380 --> 00:54:58,640 Does that make sense? 947 00:54:58,640 --> 00:55:00,370 OK, let's keep going, because I think 948 00:55:00,370 --> 00:55:01,960 it will make even more sense once we 949 00:55:01,960 --> 00:55:04,195 see how the network responds to its inputs. 950 00:55:10,910 --> 00:55:12,780 So zero input. 951 00:55:12,780 --> 00:55:16,220 Now we're going to put an input in the direction of this mode 952 00:55:16,220 --> 00:55:16,870 one. 953 00:55:16,870 --> 00:55:20,890 And you can see the mode responds a lot. 954 00:55:20,890 --> 00:55:23,000 Put a negative input in, it responds a lot. 955 00:55:23,000 --> 00:55:27,910 If we put an input in this direction or this direction, 956 00:55:27,910 --> 00:55:37,340 the response is suppressed by an amount of about 0.5. 957 00:55:37,340 --> 00:55:39,440 Because here, the gain is small. 958 00:55:39,440 --> 00:55:41,360 Here, the gain is big. 959 00:55:41,360 --> 00:55:43,414 So you see what's happening? 960 00:55:43,414 --> 00:55:50,070 This network looks just like an autapse network, 961 00:55:50,070 --> 00:55:53,910 but where we've taken this input and output space and just 962 00:55:53,910 --> 00:56:00,410 rotated it into a new coordinate system, into this new basis. 963 00:56:00,410 --> 00:56:00,969 Yes? 964 00:56:00,969 --> 00:56:02,636 AUDIENCE: Why did it kind of loop around 965 00:56:02,636 --> 00:56:04,650 on the one side [INAUDIBLE]? 966 00:56:04,650 --> 00:56:08,970 MICHALE FEE: OK, it's because these things are relaxing 967 00:56:08,970 --> 00:56:10,710 exponentially back to zero. 968 00:56:10,710 --> 00:56:12,630 And we got a little bit impatient 969 00:56:12,630 --> 00:56:16,560 and started the next input before it had quite gone away. 970 00:56:16,560 --> 00:56:19,320 OK, good question.
971 00:56:19,320 --> 00:56:21,750 It's just that if you really wait for a long time for it 972 00:56:21,750 --> 00:56:24,240 to settle, then the movie just takes a long time. 973 00:56:24,240 --> 00:56:26,850 But maybe it would be better to do that. 974 00:56:26,850 --> 00:56:30,510 So input this way and this way lead to a large response, 975 00:56:30,510 --> 00:56:35,280 because those inputs activate mode one, which has a big gain. 976 00:56:35,280 --> 00:56:38,540 Inputs in this direction and this direction 977 00:56:38,540 --> 00:56:41,450 have a small response, because they activate 978 00:56:41,450 --> 00:56:46,150 mode two, which has small gain. 979 00:56:46,150 --> 00:56:51,730 But notice that when you activate mode one-- 980 00:56:51,730 --> 00:56:54,230 when you put an input in this direction, 981 00:56:54,230 --> 00:56:58,000 it only activates mode one. 982 00:56:58,000 --> 00:57:01,060 And it doesn't activate mode two at all. 983 00:57:01,060 --> 00:57:03,830 If you put an input in this direction, 984 00:57:03,830 --> 00:57:06,070 then it only activates mode two, and it doesn't 985 00:57:06,070 --> 00:57:07,780 activate mode one at all. 986 00:57:11,220 --> 00:57:15,360 So it's just like the autapse network, but rotated. 987 00:57:18,490 --> 00:57:27,830 So now let's do the case where we have an input that 988 00:57:27,830 --> 00:57:29,840 activates both modes. 989 00:57:29,840 --> 00:57:33,300 So let's say we put an input in this direction. 990 00:57:33,300 --> 00:57:37,730 What does that direction correspond to, h up? 991 00:57:37,730 --> 00:57:41,330 What does that input mean here in terms of h1 and h2? 992 00:57:46,560 --> 00:57:49,590 Let's say we just put an input-- remember, 993 00:57:49,590 --> 00:57:55,050 this is a plot on axes h1 versus h2. 994 00:57:55,050 --> 00:57:57,570 So this input vector h corresponds 995 00:57:57,570 --> 00:58:04,220 to just putting an input on h2, into this neuron. 996 00:58:04,220 --> 00:58:08,120 So you can see that when we put an input in this direction, 997 00:58:08,120 --> 00:58:09,500 we're activating-- 998 00:58:09,500 --> 00:58:14,210 that input has a projection onto mode one and mode two. 999 00:58:14,210 --> 00:58:16,070 So we're activating both modes. 1000 00:58:19,200 --> 00:58:23,280 You can see that the input h has a projection 1001 00:58:23,280 --> 00:58:27,900 onto f1 and projection onto f2. 1002 00:58:27,900 --> 00:58:28,860 So what you do is-- 1003 00:58:34,090 --> 00:58:36,340 well, here, I'm just showing you what the steady state 1004 00:58:36,340 --> 00:58:39,490 response is mathematically. 1005 00:58:39,490 --> 00:58:42,280 Let me just show you what that looks like. 1006 00:58:42,280 --> 00:58:46,250 What this says is that if we put an h in this direction, 1007 00:58:46,250 --> 00:58:50,140 it's going to activate a little bit of mode one 1008 00:58:50,140 --> 00:58:54,940 with a big gain and a little bit of mode two 1009 00:58:54,940 --> 00:58:56,380 with a very small gain. 1010 00:58:56,380 --> 00:59:01,880 And so the steady state response will be the sum of those two. 1011 00:59:01,880 --> 00:59:04,240 It'll be up here. 1012 00:59:04,240 --> 00:59:09,120 So the steady state response to this input in this direction 1013 00:59:09,120 --> 00:59:10,810 is going to be over here. 1014 00:59:10,810 --> 00:59:11,670 Why? 1015 00:59:11,670 --> 00:59:16,680 Because that input activates mode one and mode two both.
1016 00:59:16,680 --> 00:59:20,180 But the response of mode one is big, 1017 00:59:20,180 --> 00:59:23,150 and the response of mode two is really small. 1018 00:59:23,150 --> 00:59:24,830 And so the steady state response is 1019 00:59:24,830 --> 00:59:29,180 going to be way over here because 1020 00:59:29,180 --> 00:59:32,330 of the big response, the amplified response of mode one, 1021 00:59:32,330 --> 00:59:35,750 which is in this direction, OK? 1022 00:59:35,750 --> 00:59:37,442 So when we put an input straight up, 1023 00:59:37,442 --> 00:59:38,900 the response of the network's going 1024 00:59:38,900 --> 00:59:40,760 to be all the way over here. 1025 00:59:40,760 --> 00:59:43,640 How is it going to get there? 1026 00:59:43,640 --> 00:59:44,390 Let's take a look. 1027 00:59:52,570 --> 00:59:55,110 We're going to put an input-- 1028 00:59:55,110 --> 00:59:58,283 sorry, that was first in this direction. 1029 00:59:58,283 --> 00:59:59,700 Now let's see what happens when we 1030 00:59:59,700 --> 01:00:01,720 put an input in this direction. 1031 01:00:01,720 --> 01:00:06,150 You can see the response is really big along the mode one 1032 01:00:06,150 --> 01:00:08,250 direction, in this direction, and it's 1033 01:00:08,250 --> 01:00:12,550 really small in this direction. 1034 01:00:12,550 --> 01:00:18,380 So input up in the upward direction onto just this neuron 1035 01:00:18,380 --> 01:00:21,690 produces a large response in mode one, 1036 01:00:21,690 --> 01:00:24,020 which is this way, and a very small response 1037 01:00:24,020 --> 01:00:26,570 in mode two, which is this way. 1038 01:00:26,570 --> 01:00:32,380 The response in mode two is very fast, because the factor, 1039 01:00:32,380 --> 01:00:37,192 the 1 over 1 minus lambda, is small, 1040 01:00:37,192 --> 01:00:39,810 which makes the time constant faster 1041 01:00:39,810 --> 01:00:43,230 and the response smaller. 1042 01:00:43,230 --> 01:00:45,990 So, again, it's just like the response 1043 01:00:45,990 --> 01:00:50,422 of the autapse network, but rotated 1044 01:00:50,422 --> 01:00:51,630 into a new coordinate system. 1045 01:00:56,670 --> 01:00:58,290 All right, any questions about that? 1046 01:01:02,610 --> 01:01:06,060 So you can see we basically understood everything 1047 01:01:06,060 --> 01:01:09,660 we needed to know about recurrent networks 1048 01:01:09,660 --> 01:01:17,920 just by understanding simple networks with just autapses. 1049 01:01:17,920 --> 01:01:21,830 And all these more complicated networks 1050 01:01:21,830 --> 01:01:25,190 are just nothing but rotated versions 1051 01:01:25,190 --> 01:01:27,710 of the response of a network with just autapses. 1052 01:01:36,998 --> 01:01:38,040 Any questions about that? 1053 01:01:41,990 --> 01:01:44,350 OK, let's do another network now where 1054 01:01:44,350 --> 01:01:46,210 we have inhibitory connections. 1055 01:01:46,210 --> 01:01:50,350 That's called mutual inhibition. 1056 01:01:50,350 --> 01:01:52,870 And let's make that inhibition minus 0.8. 1057 01:01:52,870 --> 01:01:55,690 The weight matrix is just zeros on the diagonals, 1058 01:01:55,690 --> 01:01:57,940 because there's no autapse here. 1059 01:01:57,940 --> 01:02:03,230 And minus 0.8 on the off-diagonals. 1060 01:02:03,230 --> 01:02:10,338 What are the eigenvectors for this matrix, for this network? 1061 01:02:10,338 --> 01:02:11,790 AUDIENCE: The same.
1062 01:02:11,790 --> 01:02:13,890 MICHALE FEE: Yeah, because the diagonal 1063 01:02:13,890 --> 01:02:15,430 elements are equal to each other, 1064 01:02:15,430 --> 01:02:18,070 and the off-diagonal elements are equal to each other. 1065 01:02:18,070 --> 01:02:21,990 It's a symmetric network with equal diagonal elements. 1066 01:02:21,990 --> 01:02:26,440 The eigenvectors are always at 45 degrees. 1067 01:02:26,440 --> 01:02:28,000 And what are the eigenvalues? 1068 01:02:30,940 --> 01:02:34,370 AUDIENCE: [INAUDIBLE] 1069 01:02:34,370 --> 01:02:36,458 MICHALE FEE: Well, the two numbers 1070 01:02:36,458 --> 01:02:37,500 are going to be the same. 1071 01:02:37,500 --> 01:02:44,320 It's zero plus and minus 0.8, plus and minus negative 0.8, 1072 01:02:44,320 --> 01:02:47,400 which is just 0.8 and minus 0.8, right? 1073 01:02:47,400 --> 01:02:47,910 Good. 1074 01:02:47,910 --> 01:02:51,390 So the eigenvalues are just 0.8 and minus 0.8. 1075 01:02:51,390 --> 01:02:55,590 But the eigenvalues correspond to different eigenvectors. 1076 01:02:55,590 --> 01:02:59,760 So now the eigenvalue for the mode in the 1, 1077 01:02:59,760 --> 01:03:04,170 1 direction is now minus 0.8, which 1078 01:03:04,170 --> 01:03:09,270 means it's suppressing the response in this direction. 1079 01:03:09,270 --> 01:03:12,980 And the eigenvalue for the eigenvector in the minus 1, 1080 01:03:12,980 --> 01:03:16,530 1 direction is now close to 1, which 1081 01:03:16,530 --> 01:03:20,880 means that mode has a lot of recurrent feedback. 1082 01:03:20,880 --> 01:03:25,480 And so its response in this direction is going to be big. 1083 01:03:25,480 --> 01:03:26,970 It's going to be amplified. 1084 01:03:26,970 --> 01:03:31,580 So unlike the case where we had positive recurrent synapses, 1085 01:03:31,580 --> 01:03:35,070 where we had amplification in this direction, now 1086 01:03:35,070 --> 01:03:37,920 we're going to have amplification 1087 01:03:37,920 --> 01:03:39,565 in this direction. 1088 01:03:39,565 --> 01:03:40,440 Does that make sense? 1089 01:03:43,500 --> 01:03:44,730 Think of it this way-- 1090 01:03:44,730 --> 01:03:48,320 if we go back to this network here, 1091 01:03:48,320 --> 01:03:51,950 you can see that when these two neurons-- 1092 01:03:51,950 --> 01:03:56,690 when this neuron is active, it tends to activate this neuron. 1093 01:03:56,690 --> 01:03:58,230 And when this neuron is active, 1094 01:03:58,230 --> 01:04:00,150 it tends to activate that neuron. 1095 01:04:00,150 --> 01:04:05,150 So this network, if you were to activate one of these neurons, 1096 01:04:05,150 --> 01:04:08,480 it tends to drive the other neuron also. 1097 01:04:08,480 --> 01:04:13,040 And so the activity of those two neurons likes to go together. 1098 01:04:13,040 --> 01:04:15,520 When one is big, the other one wants to be big. 1099 01:04:15,520 --> 01:04:21,800 And that's why there's a lot of gain in this direction. 1100 01:04:21,800 --> 01:04:23,110 Does that make sense? 1101 01:04:23,110 --> 01:04:26,440 With these recurrent excitatory connections, 1102 01:04:26,440 --> 01:04:29,860 it's hard to make this neuron fire 1103 01:04:29,860 --> 01:04:32,470 and make that neuron not fire. 1104 01:04:32,470 --> 01:04:36,300 And that's why the response is suppressed in this direction, 1105 01:04:36,300 --> 01:04:36,910 OK? 1106 01:04:36,910 --> 01:04:41,920 With this network, when this neuron is active, 1107 01:04:41,920 --> 01:04:43,860 it's trying to suppress that neuron.
1108 01:04:47,100 --> 01:04:49,050 When that neuron has positive firing rate, 1109 01:04:49,050 --> 01:04:51,870 it's trying to make that neuron have a negative firing rate. 1110 01:04:51,870 --> 01:04:53,850 When that neuron is negative, it tries 1111 01:04:53,850 --> 01:04:55,470 to make that one go positive. 1112 01:04:55,470 --> 01:04:58,140 And so this network likes to have 1113 01:04:58,140 --> 01:05:04,990 one firing positive and the other neuron going negative. 1114 01:05:04,990 --> 01:05:06,550 And so that's what happens. 1115 01:05:06,550 --> 01:05:16,580 What you find is that if you put an input into the first neuron, 1116 01:05:16,580 --> 01:05:20,330 it tends to suppress the activity in the second neuron, 1117 01:05:20,330 --> 01:05:21,980 in v2. 1118 01:05:21,980 --> 01:05:27,390 If you put input into neuron two, 1119 01:05:27,390 --> 01:05:29,220 it tends to suppress the activity, 1120 01:05:29,220 --> 01:05:32,040 or make v1 go negative. 1121 01:05:32,040 --> 01:05:36,810 So it's, again, exactly like the autapse network, 1122 01:05:36,810 --> 01:05:42,590 but just, in this case, rotated minus 45 degrees instead 1123 01:05:42,590 --> 01:05:44,950 of plus 45 degrees, OK? 1124 01:05:51,750 --> 01:05:55,000 Any questions about that? 1125 01:05:55,000 --> 01:05:55,780 All right. 1126 01:05:55,780 --> 01:05:59,830 So now let's talk about how-- 1127 01:05:59,830 --> 01:06:00,989 yes, Linda? 1128 01:06:00,989 --> 01:06:03,489 AUDIENCE: So we just did, those were all symmetric matrices, 1129 01:06:03,489 --> 01:06:04,390 right? 1130 01:06:04,390 --> 01:06:05,098 MICHALE FEE: Yes. 1131 01:06:05,098 --> 01:06:08,397 AUDIENCE: So [INAUDIBLE] can we not do this strategy 1132 01:06:08,397 --> 01:06:09,380 if it's not symmetric? 1133 01:06:09,380 --> 01:06:11,690 MICHALE FEE: You can do it for non-symmetric matrices, 1134 01:06:11,690 --> 01:06:15,260 but non-symmetric matrices start doing 1135 01:06:15,260 --> 01:06:17,330 all kinds of other cool stuff that 1136 01:06:17,330 --> 01:06:20,730 is a topic for another day. 1137 01:06:20,730 --> 01:06:25,650 So symmetric matrices are special in that they 1138 01:06:25,650 --> 01:06:31,380 have very simple dynamics. 1139 01:06:31,380 --> 01:06:37,930 They just relax to a steady state solution. 1140 01:06:37,930 --> 01:06:40,980 Weight matrices that are not symmetric, or even 1141 01:06:40,980 --> 01:06:43,230 anti-symmetric, tend to do really cool things 1142 01:06:43,230 --> 01:06:46,590 like oscillating. 1143 01:06:46,590 --> 01:06:50,670 And we'll get to that in another lecture, all right? 1144 01:06:50,670 --> 01:06:55,170 OK, so now let's talk about using recurrent networks 1145 01:06:55,170 --> 01:06:57,150 to store memories. 1146 01:06:57,150 --> 01:07:00,360 So, remember, all of the cases we've just 1147 01:07:00,360 --> 01:07:03,960 described, all of the networks we've just described, 1148 01:07:03,960 --> 01:07:08,340 had the property that the lambdas were less than one. 1149 01:07:08,340 --> 01:07:10,190 So what we've been looking at are 1150 01:07:10,190 --> 01:07:13,970 networks for which lambda is less than one 1151 01:07:13,970 --> 01:07:18,320 and they're symmetric weight matrices. 1152 01:07:18,320 --> 01:07:20,008 So that was kind of a special case, 1153 01:07:20,008 --> 01:07:21,800 but it's a good case for building intuition 1154 01:07:21,800 --> 01:07:24,050 about what goes on. 1155 01:07:24,050 --> 01:07:25,800 But now we're going to start branching out 1156 01:07:25,800 --> 01:07:30,310 into more interesting behavior.
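Before branching out, the responses shown in these movies can be reproduced with a short Euler simulation of tau dv/dt = -v + M v + h. This is a rough sketch with an illustrative time constant and the 0.8 mutual-excitation weights from the example above; flipping the sign of the off-diagonal weights gives the mutual-inhibition case:

    import numpy as np

    M = np.array([[0.0, 0.8],
                  [0.8, 0.0]])     # mutual excitation; use -0.8 for mutual inhibition
    tau, dt = 10.0, 0.1            # illustrative time constant (ms) and time step

    def settle(h, t_max=3000.0):
        """Euler-integrate tau dv/dt = -v + M v + h until it reaches steady state."""
        v = np.zeros(2)
        for _ in range(int(t_max / dt)):
            v += (dt / tau) * (-v + M @ v + h)
        return np.round(v, 2)

    print(settle(np.array([1.0, 1.0])))    # ~[ 5.    5.  ]  mode-one input, gain 5
    print(settle(np.array([-1.0, 1.0])))   # ~[-0.56  0.56]  mode-two input, gain 1/1.8
    print(settle(np.array([0.0, 1.0])))    # ~[ 2.22  2.78]  both modes; mostly along (1, 1)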
1157 01:07:33,040 --> 01:07:37,090 So let's take a look at what happens to our equation. 1158 01:07:37,090 --> 01:07:41,170 This is now our equation for the different modes of the network. 1159 01:07:41,170 --> 01:07:43,930 What happens to this equation when lambda is actually 1160 01:07:43,930 --> 01:07:46,670 equal to one? 1161 01:07:46,670 --> 01:07:52,210 So when lambda is equal to one, this term goes to zero, right? 1162 01:07:52,210 --> 01:07:58,170 So we can just cross this out and rewrite our equation 1163 01:07:58,170 --> 01:08:06,710 as tau dc1 dt equals f1 dot h. 1164 01:08:06,710 --> 01:08:09,238 So what is this? 1165 01:08:09,238 --> 01:08:10,280 What does that look like? 1166 01:08:13,850 --> 01:08:21,130 What's the solution to c for this differential equation? 1167 01:08:21,130 --> 01:08:25,420 Does this exponentially relax toward a v infinity? 1168 01:08:29,640 --> 01:08:31,840 What is v infinity here? 1169 01:08:31,840 --> 01:08:34,770 It's not even defined. 1170 01:08:34,770 --> 01:08:38,399 If you set dc dt equal to zero, there's not even a c 1171 01:08:38,399 --> 01:08:39,899 to solve for, right? 1172 01:08:39,899 --> 01:08:41,399 So what is this? 1173 01:08:46,290 --> 01:08:49,890 The derivative of c is just equal to-- 1174 01:08:49,890 --> 01:08:55,238 if we put in an input that's constant, what is c? 1175 01:08:55,238 --> 01:08:57,510 AUDIENCE: [INAUDIBLE] 1176 01:08:57,510 --> 01:09:00,290 MICHALE FEE: This is an integrator, right? 1177 01:09:00,290 --> 01:09:04,609 This c, the solution to this equation, 1178 01:09:04,609 --> 01:09:10,960 is that c is the integral of this input. 1179 01:09:10,960 --> 01:09:16,960 c is some initial c plus the integral of the input over time. 1180 01:09:22,279 --> 01:09:25,180 So if we have an input-- 1181 01:09:25,180 --> 01:09:28,050 and again, what we're plotting here 1182 01:09:28,050 --> 01:09:34,370 is the activity of one of the modes of our network, c1, 1183 01:09:34,370 --> 01:09:37,430 which is a function of the projection 1184 01:09:37,430 --> 01:09:42,350 of the input along the eigenvector of mode one. 1185 01:09:42,350 --> 01:09:46,189 So we're going to plot h dot f1, which is just how much the input 1186 01:09:46,189 --> 01:09:50,000 overlaps with mode one. 1187 01:09:50,000 --> 01:09:53,810 And as a function of time, let's start at t equals zero. 1188 01:09:53,810 --> 01:09:54,890 What will this look like? 1189 01:09:59,710 --> 01:10:02,250 This will just increase linearly. 1190 01:10:02,250 --> 01:10:03,514 And then what happens? 1191 01:10:06,993 --> 01:10:08,120 What happens here? 1192 01:10:13,650 --> 01:10:14,357 Raymundo? 1193 01:10:14,357 --> 01:10:15,690 AUDIENCE: It just stays constant. 1194 01:10:15,690 --> 01:10:18,400 MICHALE FEE: Good. 1195 01:10:18,400 --> 01:10:21,730 We've been through that, like, 100 times in this class. 1196 01:10:25,220 --> 01:10:33,600 Now, what's special about this network is that remember, 1197 01:10:33,600 --> 01:10:37,350 when lambda was less than one, the network 1198 01:10:37,350 --> 01:10:39,120 would respond to the input. 1199 01:10:39,120 --> 01:10:41,370 And then what would it do when we took the input away? 1200 01:10:44,830 --> 01:10:47,300 It would decay back to zero. 1201 01:10:47,300 --> 01:10:51,070 But this network does something really special. 1202 01:10:51,070 --> 01:10:53,620 This network, you put an input in and then 1203 01:10:53,620 --> 01:10:58,360 take the input away, this network stays active. 1204 01:10:58,360 --> 01:11:02,920 It remembers what the input was.
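A minimal sketch of that integrator mode, with made-up pulse timing and time constant: a constant input makes c ramp linearly, and when the input is removed, c simply holds its value.

    # Integrator mode (lambda = 1): tau dc/dt = f1 . h
    tau, dt = 10.0, 0.1                       # illustrative values (ms)
    c = 0.0
    for step in range(int(300 / dt)):
        t = step * dt
        hf = 1.0 if 50 <= t < 150 else 0.0    # input pulse from t = 50 to 150 ms
        c += (dt / tau) * hf
    print(c)   # ramps up to 10 during the pulse, then holds that value afterward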
1205 01:11:02,920 --> 01:11:06,220 Whereas, if you have a network where lambda is less than one, 1206 01:11:06,220 --> 01:11:12,237 the network very quickly forgets what the input was. 1207 01:11:12,237 --> 01:11:14,570 All right, what happens when lambda is greater than one? 1208 01:11:14,570 --> 01:11:18,920 So when lambda is greater than one, this term is now-- 1209 01:11:18,920 --> 01:11:20,780 this thing inside the parentheses 1210 01:11:20,780 --> 01:11:23,490 is negative, multiplied by a negative number. 1211 01:11:23,490 --> 01:11:27,290 This whole coefficient in front of the c1 becomes positive. 1212 01:11:27,290 --> 01:11:31,050 So we're just going to write it as lambda minus one. 1213 01:11:31,050 --> 01:11:33,980 And so this becomes positive. 1214 01:11:33,980 --> 01:11:35,750 And what does that solution look like? 1215 01:11:35,750 --> 01:11:37,670 Does anyone know what that looks like? 1216 01:11:37,670 --> 01:11:40,760 dc dt equals a positive number times c. 1217 01:11:49,790 --> 01:11:50,990 Nobody? 1218 01:11:50,990 --> 01:11:53,495 Are we all just sleepy? 1219 01:11:57,278 --> 01:11:57,820 What happens? 1220 01:12:00,700 --> 01:12:04,950 So if this is negative, if this coefficient were negative, dc-- 1221 01:12:04,950 --> 01:12:07,650 if c is positive, then dc dt is negative, 1222 01:12:07,650 --> 01:12:11,620 and it relaxes to zero, right? 1223 01:12:11,620 --> 01:12:13,270 Let's think about this for a minute. 1224 01:12:13,270 --> 01:12:15,380 What happens if this quantity is positive? 1225 01:12:15,380 --> 01:12:16,740 So if c is positive-- 1226 01:12:19,760 --> 01:12:20,720 cover that up. 1227 01:12:20,720 --> 01:12:24,090 If this is positive and c is positive, 1228 01:12:24,090 --> 01:12:26,760 then dc dt is positive. 1229 01:12:26,760 --> 01:12:31,790 So that means if c is positive, it just keeps getting bigger, 1230 01:12:31,790 --> 01:12:32,360 right? 1231 01:12:32,360 --> 01:12:36,320 And so what happens is you get exponential growth. 1232 01:12:36,320 --> 01:12:39,670 So if we now take an input and we put it into this network, 1233 01:12:39,670 --> 01:12:41,740 where lambda is greater than one, 1234 01:12:41,740 --> 01:12:44,550 you get exponential growth. 1235 01:12:44,550 --> 01:12:47,000 And now what happens when you turn that input off? 1236 01:12:53,800 --> 01:12:55,720 Does it go away? 1237 01:13:06,031 --> 01:13:07,013 What happens? 1238 01:13:12,420 --> 01:13:14,400 Somebody draw with their hand what happens here. 1239 01:13:18,360 --> 01:13:20,690 So just look at the equation. 1240 01:13:20,690 --> 01:13:26,450 Again, h dot f1 is zero here, so that's gone. 1241 01:13:26,450 --> 01:13:28,190 This is positive. 1242 01:13:28,190 --> 01:13:30,270 c is positive. 1243 01:13:30,270 --> 01:13:31,140 So what is dc dt? 1244 01:13:33,950 --> 01:13:34,490 Good. 1245 01:13:34,490 --> 01:13:35,550 It's positive. 1246 01:13:35,550 --> 01:13:36,290 And so what is-- 1247 01:13:36,290 --> 01:13:36,850 AUDIENCE: [INAUDIBLE] 1248 01:13:36,850 --> 01:13:38,100 MICHALE FEE: It keeps growing. 1249 01:13:41,710 --> 01:13:43,620 So you can see that this network also 1250 01:13:43,620 --> 01:13:49,020 remembers that it had input. 1251 01:13:51,940 --> 01:13:54,550 So this network also has a memory.
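And a matching sketch for lambda greater than one, again with made-up numbers: a brief input pulse kicks the mode off zero, and after the pulse ends the activity keeps growing exponentially on its own.

    # Unstable mode (lambda > 1): tau dc/dt = (lambda - 1) c + f1 . h
    tau, dt, lam = 10.0, 0.1, 1.2       # illustrative values
    c = 0.0
    for step in range(int(300 / dt)):
        t = step * dt
        hf = 1.0 if t < 20 else 0.0     # brief input pulse at the start
        c += (dt / tau) * ((lam - 1.0) * c + hf)
    print(c)   # far larger than when the pulse ended: it keeps growing on its own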
1252 01:13:54,550 --> 01:13:58,990 So anytime you have lambda less than one, the network 1253 01:13:58,990 --> 01:14:01,060 just-- as soon as the input goes away, 1254 01:14:01,060 --> 01:14:03,190 the network activity goes to zero, 1255 01:14:03,190 --> 01:14:06,220 and it just completely forgets that it ever had input. 1256 01:14:06,220 --> 01:14:09,860 Whereas, as long as lambda is equal to or greater than one, 1257 01:14:09,860 --> 01:14:15,640 then this network remembers that it had input. 1258 01:14:15,640 --> 01:14:18,790 So if lambda is less than one, then the network 1259 01:14:18,790 --> 01:14:23,330 relaxes exponentially back to zero after the input goes away. 1260 01:14:23,330 --> 01:14:26,680 If you have lambda equal to one, you have an integrator, 1261 01:14:26,680 --> 01:14:29,350 and the network activity persists 1262 01:14:29,350 --> 01:14:31,550 after the input goes away. 1263 01:14:31,550 --> 01:14:33,550 And if you have exponential growth, 1264 01:14:33,550 --> 01:14:36,400 the network activity also persists 1265 01:14:36,400 --> 01:14:37,700 after the input goes away. 1266 01:14:40,770 --> 01:14:45,020 And so that right there is one of the best 1267 01:14:45,020 --> 01:14:52,560 models for short-term memory in the brain. 1268 01:14:52,560 --> 01:14:58,440 The idea that you have neurons that get input, 1269 01:14:58,440 --> 01:15:02,700 become activated, and then hold that memory 1270 01:15:02,700 --> 01:15:08,340 by reactivating themselves and holding their own activity high 1271 01:15:08,340 --> 01:15:11,310 through recurrent excitation. 1272 01:15:11,310 --> 01:15:14,430 But that excitation has to be big enough 1273 01:15:14,430 --> 01:15:17,520 to either just barely maintain the activity 1274 01:15:17,520 --> 01:15:22,240 or continue increasing their activity. 1275 01:15:22,240 --> 01:15:25,870 OK, now, that's not necessarily such a great model 1276 01:15:25,870 --> 01:15:26,710 for a memory, right? 1277 01:15:26,710 --> 01:15:28,990 Because we can't have neurons whose activity is 1278 01:15:28,990 --> 01:15:31,750 exploding exponentially, right? 1279 01:15:31,750 --> 01:15:32,980 So that's not so great. 1280 01:15:32,980 --> 01:15:40,070 But it is quite commonly thought that in neural networks 1281 01:15:40,070 --> 01:15:43,400 involved in memory, the lambda is actually greater than one. 1282 01:15:43,400 --> 01:15:46,020 And how would we rescue this situation? 1283 01:15:46,020 --> 01:15:48,890 How would we save our network from having neurons 1284 01:15:48,890 --> 01:15:51,004 that blow up exponentially? 1285 01:15:53,610 --> 01:15:59,430 Well, remember, this was the solution 1286 01:15:59,430 --> 01:16:02,880 for a network with linear neurons. 1287 01:16:02,880 --> 01:16:07,450 But neurons in the brain are not really linear, are they? 1288 01:16:07,450 --> 01:16:09,370 They have firing rates that saturate. 1289 01:16:09,370 --> 01:16:12,140 At higher inputs, firing rates tend to saturate. [AUDIO OUT] 1290 01:16:12,140 --> 01:16:12,640 Why? 1291 01:16:12,640 --> 01:16:14,995 Because sodium channels become inactivated, 1292 01:16:14,995 --> 01:16:18,850 and the neurons can't respond that fast, right? 1293 01:16:29,430 --> 01:16:31,650 All right, this I've already said. 1294 01:16:31,650 --> 01:16:37,760 So we use what are called saturating non-linearities.
1295 01:16:37,760 --> 01:16:41,000 So it's very common to write down 1296 01:16:41,000 --> 01:16:45,230 models in which we can still have neurons that are-- 1297 01:16:45,230 --> 01:16:47,380 we can still have them approximately linear. 1298 01:16:47,380 --> 01:16:50,510 So it's quite common to have neurons that are 1299 01:16:50,510 --> 01:16:52,460 linear for small inputs. 1300 01:16:52,460 --> 01:16:55,130 They can go plus and minus, but they saturate 1301 01:16:55,130 --> 01:16:57,170 on the plus side or the minus side. 1302 01:16:57,170 --> 01:17:00,050 So now you can have an input to a neuron 1303 01:17:00,050 --> 01:17:03,230 that activates the neuron. 1304 01:17:03,230 --> 01:17:08,670 You can see what happens is you start activating this neuron. 1305 01:17:08,670 --> 01:17:14,730 It keeps activating itself, even as the input goes away. 1306 01:17:14,730 --> 01:17:17,790 But now, what happens is that activity 1307 01:17:17,790 --> 01:17:20,490 starts getting up into the regime where the neuron can't 1308 01:17:20,490 --> 01:17:23,130 fire any faster. 1309 01:17:23,130 --> 01:17:28,070 And so the activity becomes stable at some high value 1310 01:17:28,070 --> 01:17:29,400 of firing. 1311 01:17:29,400 --> 01:17:31,040 Does that make sense? 1312 01:17:31,040 --> 01:17:32,600 And this kind of neuron, for example, 1313 01:17:32,600 --> 01:17:38,330 can remember a plus input, or it can remember a minus input. 1314 01:17:41,415 --> 01:17:42,290 Does that make sense? 1315 01:17:42,290 --> 01:17:46,050 So that's how we can build a simple network 1316 01:17:46,050 --> 01:17:52,950 with a neuron that can remember its previous inputs 1317 01:17:52,950 --> 01:17:56,700 with a lambda that's greater than one. 1318 01:17:56,700 --> 01:18:00,540 And this right here, that basic thing, 1319 01:18:00,540 --> 01:18:05,730 is one of the models for how the hippocampus stores 1320 01:18:05,730 --> 01:18:08,820 memories, that you have hippocampal neurons that 1321 01:18:08,820 --> 01:18:11,490 connect to each other with a lot of recurrent 1322 01:18:11,490 --> 01:18:13,500 connections-- [AUDIO OUT] part of the hippocampus 1323 01:18:13,500 --> 01:18:15,960 has a lot of recurrent connections. 1324 01:18:15,960 --> 01:18:20,100 And the idea is that those neurons activate each other, 1325 01:18:20,100 --> 01:18:25,060 but then those neurons saturate so they can't fire anymore, 1326 01:18:25,060 --> 01:18:29,230 and now you can have a stable memory of some prior input. 1327 01:18:37,020 --> 01:18:38,870 And I think we should stop there. 1328 01:18:38,870 --> 01:18:42,080 But there are other very interesting topics 1329 01:18:42,080 --> 01:18:50,990 that we're going to get to on how these kinds of networks 1330 01:18:50,990 --> 01:18:54,290 can also make decisions and how they 1331 01:18:54,290 --> 01:18:58,190 can store continuous memories-- not just discrete memories, 1332 01:18:58,190 --> 01:19:00,740 plus or minus, on or off, but can 1333 01:19:00,740 --> 01:19:07,540 store a value for a long period of time using this integrator. 1334 01:19:07,540 --> 01:19:10,050 OK, so we'll stop there.
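Finally, a small sketch of the saturating-memory idea described above, assuming a tanh-style nonlinearity (the lecture only says "saturating," so the specific function and the numbers here are illustrative): a self-exciting unit with lambda greater than one latches onto a high or a low stable state depending on the sign of a transient input.

    import numpy as np

    # One self-exciting unit with a saturating nonlinearity:
    # tau dv/dt = -v + tanh(lambda * v + h), with lambda > 1.
    tau, dt, lam = 10.0, 0.1, 1.5

    def run(pulse):
        v = 0.0
        for step in range(int(500 / dt)):
            t = step * dt
            h = pulse if t < 50 else 0.0      # transient input, then nothing
            v += (dt / tau) * (-v + np.tanh(lam * v + h))
        return v

    print(run(+1.0), run(-1.0))   # settles near +0.86 and -0.86: a one-bit memory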