MICHALE FEE: All right, let's go ahead and get started. So we're starting a new topic today. This is actually one of my favorite lectures, one of my favorite subjects in computational neuroscience.

All right, so a brief recap of what we've been doing. We've been working on circuit models of neural networks. And we've been working on what we call a rate model, in which we replaced all the spikes of a neuron with, essentially, a single number that characterizes the rate at which a neuron fires. We introduced a simple network in which we have an input neuron and an output neuron with a synaptic connection of weight w between them. And that synaptic connection leads to a synaptic input that's proportional to w times the firing rate of the input neuron. And then we talked about how we can characterize the output, the firing rate of the output neuron, as some nonlinear function of the total input to this output neuron.

We've talked about different F-I curves. We've talked about having what's called a binary threshold unit, which has zero firing below some threshold. And actually, there are different versions of the binary threshold unit. Sometimes the firing rate is zero for inputs below the threshold, and in other models we use a minus 1, and then a constant firing rate of one above that threshold. And we also talked about linear neurons, where we can write down the firing rate of the output neuron just as a weighted sum of the inputs. And remember that these neurons are kind of special in that they can have negative firing rates, which is not really biophysically plausible, but mathematically it's very convenient to have neurons like this.

So we took this simple model and we expanded it to the case where we have many input neurons and many output neurons. So now we have a vector of input firing rates, u, and a vector of output firing rates, v.
And for the case of linear neurons, we talked about how you can write down the vector of firing rates of the output neurons simply as a matrix product of a weight matrix times the vector of input firing rates. And we talked about how this can produce transformations of this vector of input firing rates. So in this high-dimensional space of inputs, we can imagine stretching that input vector along different directions to amplify certain directions that may be more important than others. We talked about how you can do that, stretch in arbitrary directions, not just along the axes. And we talked about how that vector of-- that, sorry, matrix of weights can produce a rotation. So we can have some set of inputs where, let's say, we have clusters of different input values corresponding to different things. And you can rotate that to put certain features in particular output neurons. So now you can discriminate one class of objects from another class of objects by looking at just one dimension and not the whole high-dimensional space.

So today, we're going to look at a new kind of network called a recurrent neural network, where not only do we have inputs to our output neurons from an input layer, but we also have connections between the neurons in the output layer. So these neurons in a recurrent network talk to each other. And that imbues some really cool properties onto these networks. So we're going to develop the math and describe how these things work to develop an intuition for how recurrent networks respond to their inputs. We're going to get into some of the computations that recurrent networks can do. They can act as amplifiers in particular directions. They can act as integrators, so they can accumulate information over time. They can generate sequences. They can act as short-term memories of either continuous variables or discrete variables. It's a very powerful kind of circuit architecture.
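To make that recap concrete, here is a minimal numpy sketch (not from the lecture; the weight matrix and input are made-up values) of the linear feedforward transformation v = Wu, where the weight matrix rotates the input and stretches one output direction:

```python
import numpy as np

# Made-up 2x2 weight matrix: rotate the input by 45 degrees,
# then stretch the first output direction by a factor of 3.
theta = np.pi / 4
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
stretch = np.diag([3.0, 1.0])
W = stretch @ rotation            # combined weight matrix

u = np.array([1.0, 0.5])          # input firing-rate vector
v = W @ u                         # output firing rates of the linear neurons

print("input  u =", u)
print("output v =", v)
```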
And on top of that, in order to describe these mathematically, we're going to use all of the linear algebra tools that we've been developing so far. So, hopefully, a bunch of things will kind of connect together.

OK, so, mathematical description of recurrent networks. We're going to talk about dynamics in these recurrent networks, and we're going to start with the very simplest kind of recurrent network, called an autapse network. Then we're going to extend that to the general case of recurrent connectivity. And then we're going to talk about how recurrent networks store memories. So we'll start talking about specific circuit models for storing short-term memories. And I'll touch on recurrent networks for decision-making. And this will kind of lead into the last few lectures of the class, where we get into specific cases of how networks can store memories.

OK, mathematical description. All right, so the first thing that we need to do is-- the really cool thing about recurrent networks is that their activity can evolve over time. So we need to talk about dynamics, all right? The feed-forward networks that we've been talking about, we just put in an input. It gets weighted by synaptic strength, and we get a firing rate in the output, just sort of instantaneously. We've been thinking of it as: you put in an input, and you get an output. In general, neural networks don't do that. You put in an input, and things change over time until you settle at some output, maybe, or it starts doing something interesting, all right? So the time course of the activity becomes very important, all right?

So neurons don't respond instantaneously to inputs. There are synaptic delays. There's integration of the membrane potential. Things change over time. And a specific example of this that we saw in the past is that if you have an input spike, you can produce a postsynaptic current that jumps up abruptly as the synaptic conductance turns on.
And then the synaptic conductance decays away as the neurotransmitter unbinds from the neurotransmitter receptor, and you get a synaptic current that decays away over time, OK? So that's a simple kind of time dependence that you would get. And that could lead to time dependence in the firing rate of the output neuron. OK, dendritic propagation, membrane time constant-- other examples of how things can take time in a neural network.

All right, so we're going to model the firing rate of our output neuron in the following way. If we have an input firing rate that's zero and then steps up to some constant and then steps down, we're going to model the output, the firing rate of the output neuron, using exactly the same kind of first-order linear differential equation that we've been using all along for the membrane potential, for the Hodgkin-Huxley gating variables-- the same kind of differential equation that you've seen over and over again. So that's the differential equation we're going to use. We're going to say that the time derivative of the firing rate of the output neuron times the time constant is just equal to minus the firing rate of the output neuron plus v infinity. And so you know that the solution to this equation is that the firing rate of the output neuron will just relax exponentially to some new v infinity. And the v infinity that we're going to use is just this nonlinear function of the weighted input to our neuron.

So we're going to take the formalism that we developed for our feed-forward networks to say, what is the firing rate of the output neuron as a function of the inputs? And we're going to use that firing rate that we've been using before as the v infinity for our network with dynamics. Any questions about that?

All right, so that becomes our differential equation now for this recurrent network, all right?
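Restating the firing-rate dynamics just described for the single input-output pair, as a worked equation:

```latex
\tau \frac{dv}{dt} = -v + v_\infty, \qquad v_\infty = F(w\,u),
```

so for a step input held constant, the firing rate relaxes exponentially toward the steady state,
$v(t) = v_\infty + \bigl(v(0) - v_\infty\bigr)\, e^{-t/\tau}$.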
So it's just a first-order linear differential equation, where the v infinity, the steady-state firing rate of the output neuron, is just this nonlinear function of the weighted sum of all the inputs. All right, and actually, for most of what we do today, we're going to just take the case of a linear neuron. All right.

So this I've already said. This I've already said. And actually, what I'm doing here is just extending this. So this was the case for a single output neuron and a single input neuron. What we're doing now is we're just extending this to the case where we have a vector of input neurons with a firing rate represented by a firing rate vector u, and a vector of output neurons with a firing rate vector v. And we're just going to use this same differential equation, but we're going to write it in vector notation. So each one of these output neurons has an equation like this, and we're going to combine them all together into a single vector. Does that make sense?

All right, so there is our vector notation of the activity in this recurrent network. Sorry, I forgot to put the recurrent connections in there. So the time dependence is really simple in this feed-forward network, right? So in a feed-forward network, the dynamics just look like this. But in a recurrent network, this thing can get really interesting and start doing interesting stuff.

All right, so let's add recurrent connections now and add these recurrent connections to our equation. So in addition to this weight matrix w that describes the connections from the input layer to the output layer, we're going to have another weight matrix that describes the connections between the neurons in the output layer. And this weight matrix, of course, has to be able to describe a connection from any one of these neurons to any other of these neurons.
And so this weight matrix is going to be a function of the postsynaptic neuron, the weight-- the synaptic strength is going to be a function of the identity of the postsynaptic neuron and the identity of the presynaptic neuron. Does that make sense? OK, so there are two kinds of input-- a feed-forward input from the input layer and a recurrent input due to connections within the output layer. Any questions about that?

OK, so there is the equation now that describes the time rate of change of the firing rates in the output layer. It's just this first-order linear differential equation. And the v infinity is just this nonlinear function of the inputs, of the net input to this neuron, to each neuron. And the net input to this set of neurons is a contribution from the feed-forward inputs, given by this weight matrix w, and this contribution from the recurrent inputs, given by this weight matrix, m. So that is the crux of it, all right? So I want to make sure that we understand where we are. Does anybody have any questions about that? No? All right, then I'll push ahead.

All right, so what is this? So we've seen this before. This product of this weight matrix times this vector of input firing rates just looks like this. You can see that the input to this neuron, this first output neuron, is just the dot product of these weights onto the first neuron-- the dot product of that vector of weights, that row of the weight matrix, with the vector of input firing rates. And the feed-forward contribution to this neuron is just the dot product of that row of this input weight matrix with the vector of input firing rates, and so on. If we look at the recurrent input to these neurons, the recurrent input to this first neuron is just going to be the dot product of this row of the recurrent weight matrix and the vector of firing rates in the output layer.
The recurrent input to the second neuron is going to be the dot product of this row of the weight matrix and the vector of firing rates. Yes?

AUDIENCE: So I guess I'm a little confused, because I thought it was from A. Oh, to A. OK.

MICHALE FEE: Yeah, it's always post, pre. Post, pre in a weight matrix. That's because we're usually writing down these vectors the way that I'm defining this notation. This vector is a column matrix, a column vector.

All right, so we're going to make one simplification to this. When we work with the recurrent networks, we're usually going to simplify this input. And rather than write down this complex feed-forward component, writing this out as this matrix product, we're just going to simplify the math. And rather than carry around this w times u, we're just going to replace that with a vector of inputs onto each one of those neurons, OK? So we're just going to pretend that the input to this neuron is just coming from one input, OK? And the input to this neuron is coming from another single input. And so we're just going to replace that feed-forward input onto this network with this vector h. So that's the equation that we're going to use moving forward, all right? It just simplifies things a little bit so we're not carrying around this w u.

So now, that's our equation that we're going to use to describe this recurrent network. This is a system of coupled equations. What does that mean? You can see that the time derivative of the firing rate of this first neuron is given by a contribution from the input layer and a contribution from other neurons in the output layer. So the time rate of change of this neuron depends on the activity of all the other neurons in the network. And the time rate of change of this neuron depends on the activity of all the other neurons in the network. So that's a set of coupled equations.
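Here is a minimal simulation sketch of the coupled equations just described, for the linear case tau dv/dt = -v + Mv + h. The particular weight matrix, input vector, time constant, and step size are assumed values, not taken from the lecture:

```python
import numpy as np

tau = 10.0   # firing-rate time constant (ms), assumed
dt = 0.1     # forward-Euler integration step (ms), assumed

# Assumed recurrent weight matrix M (post, pre) and constant input vector h.
M = np.array([[0.0, 0.4],
              [0.4, 0.0]])
h = np.array([1.0, 0.5])

v = np.zeros(2)                      # output firing-rate vector, starts at rest
for step in range(int(200 / dt)):
    dvdt = (-v + M @ v + h) / tau    # coupled linear rate equations
    v = v + dt * dvdt                # forward-Euler update

print("steady-state v (simulated) =", v)
print("steady-state v (analytic)  =", np.linalg.solve(np.eye(2) - M, h))
```

The analytic check comes from setting dv/dt = 0, which gives (I - M) v = h.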
And that, in general, can be-- you know, it's not obvious, when you look at it, what the solution is, all right? So we're going to develop the tools to solve this equation and get some intuition about how networks like this behave in response to their inputs.

So the first thing we're going to do is to simplify this network to the case of linear neurons. So we don't have-- so the neurons just fire. Their firing rate is just linear with their input. And so that's the equation for the linear case. All we've done is we've just gotten rid of this nonlinear function f.

All right, so now let's take a very simple case of a recurrent network and use this equation to see how it behaves, all right? So the simplest case of a recurrent network is the case where the recurrent connections within this layer are given by-- the weight matrix is given by a diagonal matrix. Now, what does that correspond to? What that corresponds to is this neuron making a connection onto itself with a synapse of weight lambda one, right there. And that kind of recurrent connection of a neuron onto itself is called an autapse, like an auto-synapse. And we're going to put one of those autapses on each one of these neurons in our output layer, in our recurrent layer.

So now we can write down the equation for this network, all right? And what we're going to do is simply replace-- sorry, let me just bring up that equation again. Sorry, there's the equation. And we're simply going to replace this weight matrix m, this recurrent weight matrix, with that diagonal matrix that I just showed you. So there it is. So the time rate of change of this vector of output neurons is just minus v plus this diagonal matrix times v plus the inputs.

So now you can see that if we write out the equation separately for each one of these output neurons-- so here it is in vector notation. We can just write that out for each one of our output neurons.
So there's a separate equation like this for each one of these neurons. But you can see that these are all uncoupled. So we can understand how this network responds just by studying this equation for one of those neurons. OK, so let's do that. We have an independent equation. The firing rate change-- the time derivative of the firing rate of neuron one depends only on the firing rate of neuron one. It doesn't depend on any other neurons. As you can see, it's not connected to any of the other neurons.

OK, so let's write this equation. And let's see what that equation looks like. So we're going to rewrite this a little bit. We're just going to factor out the v sub a right here. This parameter, 1 minus lambda a, controls what kind of solutions this equation has. And there are three different cases that we need to consider. We need to consider the case where 1 minus lambda is greater than zero, equal to zero, or less than zero. Those three different values of that parameter 1 minus lambda give three different kinds of solutions to this equation.

We're going to start with the case where lambda is less than one. And if lambda is less than 1, then this term right here is greater than zero. If we do that, then we can rewrite this equation as follows. We're going to divide both sides of this equation by 1 minus lambda, and that's what we have here. And you can see that this equation starts looking very familiar, very simple. We have a first-order linear differential equation, where we have a time constant here, tau over 1 minus lambda, and a v infinity here, which is the input, the effective input onto that neuron, divided by 1 minus lambda. So that's tau dv dt equals minus v plus v infinity. But now you can see that the time constant and the v infinity depend on lambda, depend on the strength of that connection, all right? And the solution to that equation we've seen before. It's just exponential relaxation toward v infinity. OK, so here's our v infinity.
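Writing out the algebra just described for one autapse neuron, as a worked equation:

```latex
\tau \frac{dv_a}{dt} = -v_a + \lambda_a v_a + h_a = -(1-\lambda_a)\, v_a + h_a
\;\;\Longrightarrow\;\;
\frac{\tau}{1-\lambda_a} \frac{dv_a}{dt} = -v_a + \frac{h_a}{1-\lambda_a},
```

so the effective time constant is $\tau_{\mathrm{eff}} = \tau/(1-\lambda_a)$ and the steady state is $v_\infty = h_a/(1-\lambda_a)$, valid for $\lambda_a < 1$.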
There's our tau. True for the case of lambda between-- let's just look at these solutions for the case of lambda between zero and one. So I'm going to plot v as a function of time when we have an input that goes from zero and then steps up and then is held constant.

All right, so let's look at the case of lambda equals zero. So lambda of zero means there's no autapse. It's just not connected. So you can see that, in this case, the solution is very simple. It's just exponential relaxation toward v infinity. v infinity is just given by h, the input, and tau is just the original tau, divided by 1 minus 0, right? So it's just exponential relaxation to h. Does that make sense? And it relaxes with a time constant tau, tau m.

We're going to now turn up the synapse a little bit so that it has a little bit of strength. You see what happens when lambda is 0.5: that v infinity gets bigger. v infinity goes to 2h. Why? Because it's h divided by 1 minus 0.5. So it's h over 0.5, so 2h. And what happens to the time constant? Well, it becomes two tau. All right, and if we make lambda equal to 0.3-- sorry, 0.66-- we turn it up a little bit. You can see that the response of this neuron gets even bigger.

So you can see that what's happening is that when we start letting this neuron feed back to itself-- positive feedback-- the response of the neuron to a fixed input-- the input is the same for all of those-- the response of the neuron gets bigger. And so having positive feedback of that neuron onto itself through an autapse just amplifies the response of this neuron to its input.

Now, let's consider the case where-- so positive feedback amplifies the response. And what else does it do? It slows the response down. The time constants are getting longer, which means the response is slower. All right, let's look at what happens when the lambdas are less than zero.
What does lambda less than zero correspond to here?

AUDIENCE: [INAUDIBLE]

MICHALE FEE: Yeah, which is, in neurons, what does that correspond to?

AUDIENCE: [INAUDIBLE]

MICHALE FEE: Inhibition. So this neuron, when you put an input in, it tries to activate the neuron. But that neuron inhibits itself. So what do you think's going to happen? So positive feedback made the response bigger. Here, the neuron is kind of inhibiting itself. So what's going to happen? You put in that same h that we had before-- what's going to happen when we have inhibition?

AUDIENCE: Response is [INAUDIBLE].

MICHALE FEE: What's that?

AUDIENCE: The response is going to be smaller.

MICHALE FEE: The response will just be smaller, that's right. So let's look at that. So here's the firing rate of this neuron as a function of time for a step input. You can see for lambda equals zero, we're going to respond with an amount h-- in a time constant tau. But if we put in a lambda of negative one-- that means you put this input in-- that neuron starts inhibiting itself, and you can see the response is smaller. But another thing that's really interesting is that you can see that the response of the neuron is actually faster.

So if the feedback-- if the lambda is minus one, you can see that v infinity is h over 1 minus negative 1. So it's h over 2. All right, and so on. The more we turn up that inhibition, the more suppressed the neuron is, the weaker the response of that neuron to its input, but the faster it is. So negative feedback suppresses the response of the neuron and speeds up the response.
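To put numbers on the amplification and speed-up/slow-down just described, here is a small sketch that tabulates the steady-state gain 1/(1 - lambda) and the effective time constant tau/(1 - lambda), using the lambda values mentioned as examples in the lecture:

```python
# Gain and effective time constant of an autapse neuron:
# v_infinity = h / (1 - lambda),  tau_eff = tau / (1 - lambda).
for lam in [0.0, 0.5, 0.66, -1.0]:
    gain = 1.0 / (1.0 - lam)      # steady-state response in units of h
    tau_eff = 1.0 / (1.0 - lam)   # effective time constant in units of tau
    print(f"lambda = {lam:5.2f}   gain = {gain:.2f}   tau_eff = {tau_eff:.2f} tau")
```

Positive lambda gives a gain and time constant larger than one (amplified, slower); negative lambda gives values smaller than one (suppressed, faster).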
OK, now, there's one other really important thing about recurrent networks in this regime, where this lambda is less than one. And that is that the activity always relaxes back to zero when you turn the input off. OK, so you put a step input in, the neuron responds, relaxing exponentially to some v infinity. But when you turn the input off, the network relaxes back to zero, OK?

So now let's go to the more general case of recurrent connections. Oh, and first, I just want to show you how we actually show graphically how a neuron responds-- sorry, how one of these networks responds. And a typical way that we do that is we plot the firing rate of one neuron versus the firing rate of another neuron. That's called a state-space trajectory. And we plot that response as a function of time after we put in an input. So we can put an input in described as some vector. So we put in some h1 and h2, and we then plot the response of the neuron-- the response of the network in this output state space.

So let me show you an example of what that looks like. So here is the output of this little network for different kinds of inputs. So Daniel made this nice little movie for us. Here, you can see that if you put an input into neuron one, neuron one responds. If you put a negative input into neuron one, the neuron goes negative. If you put an input into neuron two, the neuron responds. And if you put a negative input into neuron two, it responds. Now, why did it respond bigger in this direction than in this direction?

AUDIENCE: That's [INAUDIBLE].

MICHALE FEE: Good. Because neuron one had--

AUDIENCE: Positive?

MICHALE FEE: Positive feedback. And neuron two had negative feedback. So neuron one, this neuron one, amplified its input and gave a big response. Neuron two suppressed the response to its input, and so it had a weak response.

Let's look at another interesting case. Let's put an input into these neurons-- not one at a time, but simultaneously. So now we're going to put an input into both neurons one and two simultaneously. It's like Spirograph. Did you guys play with Spirograph?
It's kind of weird, right? It's like making little butterflies for spring. So why does the output-- why does the response of this network to an input, a positive input to both h1 and h2, look like this? Let's just break this down into one of these little branches. We start at zero. We put an input into h1 and h2, and the response goes quickly like this and then relaxes up to here. So why is that? Lena?

AUDIENCE: [INAUDIBLE] so there was [INAUDIBLE] and then because it's negative, it's shorter.

MICHALE FEE: Yup. The response in the v2 direction is weak but fast.

AUDIENCE: Yeah.

MICHALE FEE: So it goes up quickly. And then the response in the v1 direction is?

AUDIENCE: Slow, but [INAUDIBLE].

MICHALE FEE: Good. That's it. It's slow, but [AUDIO OUT]. It's amplified in this direction, suppressed in this direction. But the response is fast this way and slow this way. So it traces this out. Now, when you turn the input off, again, it relaxes. v2 relaxes quickly back to zero, and v1 relaxes slowly back to zero. So it kind of traces out this kind of hysteretic loop. It's not really hysteresis. Then it's exactly the mirror image when you put in a negative input. And when you put in h1 positive and h2 negative, it just looks like a mirror image.

All right, so any questions about that? Yes, Lena?

AUDIENCE: If there was nothing, like no kind of amplified or [INAUDIBLE], would it just be like a [INAUDIBLE]?

MICHALE FEE: Yeah, so if you took out the recurrent connections, what would it look like?

AUDIENCE: An x?

MICHALE FEE: Yeah, the output-- so let's say that you just literally set those to zero. Then the response will be the identity matrix, right? You get the output as a function of input. Let's just go back to the equation. We can always, always get the answer by looking at the equation.
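As an illustration of the trajectory just described, here is a minimal simulation sketch of a two-neuron autapse network with positive feedback on neuron one and negative feedback on neuron two. The specific weights, input, and time step are assumed values, not the ones used in the class demo:

```python
import numpy as np

tau, dt = 1.0, 0.01
M = np.diag([0.8, -1.0])          # assumed autapse weights: amplify v1, suppress v2
h_on = np.array([1.0, 1.0])       # assumed step input to both neurons

v = np.zeros(2)
trajectory = []
for step in range(int(20.0 / dt)):
    h = h_on if step < int(10.0 / dt) else np.zeros(2)   # input on, then off
    v = v + dt / tau * (-v + M @ v + h)                   # forward-Euler update
    trajectory.append(v.copy())

trajectory = np.array(trajectory)
print("peak  v1, v2:", trajectory.max(axis=0))   # v1 amplified and slow, v2 suppressed and fast
print("final v1, v2:", trajectory[-1])           # both relax back toward zero after input off
```

Plotting v1 against v2 over time would trace out the loop described above: fast and weak in the v2 direction, slow and amplified in the v1 direction.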
Too many animations. No, it's a very good question. Here we go. There it is right there. So you're asking about-- let's just ask about the steady-state response. So we can set dv dt equal to zero. And you're asking, what is v? And you're saying, let's set lambda to zero, right? We're going to set all these diagonal elements to zero. And so now v equals h. OK, great question.

Now, let's go to the case of fully recurrent networks. We've been working with this simplified case of just having neurons with autapses. And the reason we've been doing that is because the answer you get for the autapse case kind of captures almost all the intuition that you need to have. What we're going to do is we're going to take a fully recurrent neural network, and we're going to do a mathematical trick that just turns it into an autapse network. And the answer for the fully recurrent network is just going to be just as simple as what you saw here.

All right, so let's do that. Let's take this fully recurrent network. Our weight matrix m now, instead of just having diagonal elements, also has off-diagonal elements. And I'll say that one of the things that we're going to do today is just consider the simplest case of this fully recurrent network, where the connections are symmetric, where the connection from v1 to v2 is equal to the connection from v2 to v1, all right? We're going to do that because that's the next thing to do to build our intuition, and it's also mathematically simpler than the fully general case, OK?

So we saw how the behavior of this network is very simple if m is diagonal. So what we're going to do is we're going to take this arbitrary matrix m, and we're going to just make it diagonal. So let's do that. So we're going to rewrite our weight matrix m as-- so we're going to rewrite m in this form, where this phi-- sorry, where this lambda is a diagonal matrix.
So we're going to take this network with recurrent connections between different neurons in the network, and we're going to transform it into sort of an equivalent network that just has autapses. So how do we write m in this form, with a rotation matrix times a diagonal matrix times a rotation matrix? We just solve this eigenvalue equation, OK? Does that make sense? We're just going to do exactly the same thing we did in PCA, where we found the covariance matrix and we rewrote the covariance matrix like this. Now we're going to take the weight matrix of this recurrent network, and we're going to rewrite it in exactly the same way. So that process is called diagonalizing the weight matrix.

So the elements of lambda here are the eigenvalues of m. And the columns of phi are the eigenvectors of m. And we're going to use these quantities, these elements, to build a new network that has the same properties as our recurrent network. So let me just show you how we do that.

So remember what this eigenvalue-- this is an eigenvalue equation written in matrix notation. What this means is that this is a set of eigenvalue equations-- it's a set of n eigenvalue equations like this, where there's one of these for each neuron in the network. OK, so let me just go through that. OK, so here's the eigenvalue equation. If M is a symmetric matrix, then the eigenvalues are real and phi is a rotation matrix. And the eigenvectors give us an orthogonal basis, all right? So everybody remember this from a few lectures ago?

If M is symmetric-- and this is why we're going to, from this point on, consider just the case where M is symmetric-- then the eigenvectors, the columns of that matrix phi, give us an orthogonal set of vectors, and they're unit vectors. So it satisfies this orthonormal condition. And phi transpose phi is an identity matrix, which means phi is a rotation matrix.
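Here is a minimal numpy sketch of the diagonalization step just described for a symmetric weight matrix. The matrix entries are made up, and numpy's eigh plays the role of the eig call mentioned a bit later in the lecture:

```python
import numpy as np

# Made-up symmetric recurrent weight matrix (post, pre).
M = np.array([[0.2, 0.6],
              [0.6, 0.2]])

# For a symmetric matrix, eigh returns real eigenvalues and an
# orthonormal set of eigenvectors (the columns of Phi).
eigenvalues, Phi = np.linalg.eigh(M)
Lambda = np.diag(eigenvalues)

print("Phi^T Phi =\n", Phi.T @ Phi)                  # identity: Phi is a rotation matrix
print("Phi Lambda Phi^T =\n", Phi @ Lambda @ Phi.T)  # reconstructs M
```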
OK, so now what we're going to do is rewrite. The first thing we're going to do to use this trick to rewrite our matrix, our network, is to rewrite the vector of firing rates v in this new basis. What are we going to do? We'll take the vector, and all we're going to do is rewrite that vector in this new basis set. We're just going to do a change of basis of our firing rate vector into a new basis set that's given by the columns of phi. Another way of saying it is that we're going to rotate this firing rate vector v using the phi rotation matrix.

So we're going to project v onto each one of those new basis vectors. So there's v in the standard basis. There's our new basis, f1 and f2. We're going to project v onto f1 and f2 and write down that scalar projection, c1 and c2. So we're going to write down the scalar projection of v onto each one of those basis vectors. So we can write that c sub alpha-- that's the alpha-th component-- is just v dot the alpha-th basis vector. So now we can express v as a linear combination in this new basis. So it's c1 times f1 plus c2 times f2 plus c3-- that's supposed to be a three-- times f3, and so on.

And of course, remember, we're doing all of this because we want to understand the dynamics. So these things are time dependent. So v changes in time. We're not going to be changing our basis vectors in time. So if we want to write down a time-dependent v, it's really these coefficients that are changing in time, right? Does that make sense? So we can now write our vector v, our firing rate vector, as a sum of contributions in all these different directions corresponding to the new basis. And each one of those coefficients, c, is just the time-dependent v projected onto one of those basis vectors. Any questions? No? OK.

And remember, we can write that in matrix notation using this formalism that we developed in the lecture on basis sets.
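In symbols, the change of basis just described, with the f_alpha being the columns of phi:

```latex
c_\alpha(t) = \mathbf{f}_\alpha \cdot \mathbf{v}(t), \qquad
\mathbf{v}(t) = \sum_\alpha c_\alpha(t)\, \mathbf{f}_\alpha,
\qquad\text{or, in matrix form,}\qquad
\mathbf{c} = \Phi^{\mathsf T}\mathbf{v}, \quad \mathbf{v} = \Phi\,\mathbf{c}.
```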
So v is just phi c, and c is just phi transpose v. So we're just taking this vector v, and we're rotating it into a new basis set, and we can rotate it back.

All right, so now what we're going to do is we're going to take this v expressed in this new basis set, and we're going to rewrite our equation in that new basis set. Watch this. This is so cool. All right, you ready? We're going to take this, and we're going to plug it into here. So dv dt is phi dc dt. v is just phi c. v is phi c, and h doesn't change. So now what is that? Do you remember?

AUDIENCE: Phi [INAUDIBLE].

MICHALE FEE: Right. We got phi as the solution to the eigenvalue equation. What was the eigenvalue equation? The eigenvalue equation was m phi equals phi lambda. So the phi here, this rotation matrix, is the solution to this equation, all right? So we're given m, and we're saying we're going to find a phi and a lambda such that we can write m phi is equal to phi lambda. So when we take that matrix m and we run eig on it in MATLAB, MATLAB sends us back a phi and a lambda such that this equation is true. So literally, we can take the weight matrix m, stick it into MATLAB, and get a phi and a lambda such that m phi is equal to phi lambda.

So m phi is equal to what? Phi lambda. That becomes this. Now, all of a sudden, this thing is just going to simplify. So how would we simplify this equation? We can get rid of all of these things, all of these phi's, by doing what? How do you get rid of phi's?

AUDIENCE: Multiply [INAUDIBLE] phi transpose.

MICHALE FEE: You multiply by phi transpose, exactly. So we're going to multiply each term in this equation by phi transpose. So what do you have? Phi transpose phi, phi transpose phi, phi transpose phi. What is phi transpose phi equal to? The identity matrix. Because it's a rotation matrix, phi transpose is just the inverse of phi.
784 00:44:54,730 --> 00:44:58,550 So phi inverse phi is just equal to the identity matrix. 785 00:44:58,550 --> 00:45:00,680 And all those things disappear. 786 00:45:00,680 --> 00:45:02,800 And you're left with this equation-- 787 00:45:02,800 --> 00:45:09,370 tau dc dt equals minus c plus lambda c plus hf. 788 00:45:09,370 --> 00:45:10,450 And what is hf? 789 00:45:10,450 --> 00:45:13,580 hf is just h rotated into the new basis set. 790 00:45:16,370 --> 00:45:20,980 So this is the equation for a recurrent network 791 00:45:20,980 --> 00:45:28,120 with just autapses, which we just understood. 792 00:45:28,120 --> 00:45:30,610 We just wrote down what the solution is, right? 793 00:45:30,610 --> 00:45:33,130 And we plotted it for different values of lambda. 794 00:45:40,380 --> 00:45:44,260 So now let's just look at what some of these look like. 795 00:45:44,260 --> 00:45:52,360 So we've rewritten our weight matrix in a new basis set. 796 00:45:52,360 --> 00:45:55,540 We've rebuilt our network in a new basis set, 797 00:45:55,540 --> 00:45:59,860 in a rotated basis set where everything simplifies. 798 00:45:59,860 --> 00:46:02,380 So we've taken this complicated network 799 00:46:02,380 --> 00:46:07,540 with recurrent connections and we've rewritten it 800 00:46:07,540 --> 00:46:10,600 as a new network, where each of these neurons 801 00:46:10,600 --> 00:46:13,480 in our new network corresponds to what's 802 00:46:13,480 --> 00:46:18,820 called a mode of the fully recurrent network. 803 00:46:22,200 --> 00:46:28,850 So the activities c alpha-- c1 and c2-- of the network modes 804 00:46:28,850 --> 00:46:33,770 represent kind of an activity in a linear combination 805 00:46:33,770 --> 00:46:35,180 of these neurons. 806 00:46:35,180 --> 00:46:40,360 So we're going to go through what that means now. 807 00:46:40,360 --> 00:46:42,970 So the first thing I want to do is just calculate 808 00:46:42,970 --> 00:46:46,960 what the steady state response is in this neuron. 809 00:46:46,960 --> 00:46:48,770 And I'll just do it mathematically, 810 00:46:48,770 --> 00:46:51,550 and then I'll show you what it looks like graphically. 811 00:46:54,400 --> 00:46:56,320 So there's our original network equation. 812 00:46:56,320 --> 00:47:00,380 We've rewritten it as a set of differential equations 813 00:47:00,380 --> 00:47:03,320 for the modes of this network. 814 00:47:06,470 --> 00:47:10,270 I'm just rewriting this by putting an I here, 815 00:47:10,270 --> 00:47:12,040 minus I times c. 816 00:47:12,040 --> 00:47:14,150 That's the only change I made here. 817 00:47:14,150 --> 00:47:15,475 I just rewrote it like this. 818 00:47:20,450 --> 00:47:21,670 Let's find a steady state. 819 00:47:21,670 --> 00:47:24,760 So we're going to set dc dt equal to zero. 820 00:47:24,760 --> 00:47:28,850 We're going to ask, what is c in steady state? 821 00:47:28,850 --> 00:47:33,310 So we're going to call that c infinity, all right? 822 00:47:33,310 --> 00:47:37,520 I minus lambda times c infinity equals phi transpose h. 823 00:47:37,520 --> 00:47:38,740 OK, don't panic. 824 00:47:38,740 --> 00:47:41,480 It's all going to be very simple in a second. 825 00:47:41,480 --> 00:47:47,230 c infinity is just I minus lambda inverse phi transpose h. 826 00:47:47,230 --> 00:47:49,300 But I is diagonal. 827 00:47:49,300 --> 00:47:50,560 Lambda is diagonal.
828 00:47:50,560 --> 00:47:53,730 So I minus lambda inverse is just the-- 829 00:47:53,730 --> 00:47:58,600 it's a diagonal matrix with one 830 00:47:58,600 --> 00:48:00,145 over all those diagonal elements. 831 00:48:04,290 --> 00:48:06,870 Now let's calculate v infinity. v infinity 832 00:48:06,870 --> 00:48:09,430 is just phi times c infinity. 833 00:48:09,430 --> 00:48:12,390 So here, we're multiplying on the left by phi. 834 00:48:12,390 --> 00:48:14,710 That's just v infinity. 835 00:48:14,710 --> 00:48:16,750 So v infinity is just this. 836 00:48:16,750 --> 00:48:18,330 So what is this? 837 00:48:18,330 --> 00:48:21,960 This just says v infinity is some matrix-- 838 00:48:21,960 --> 00:48:23,940 it's a rotated stretch matrix-- 839 00:48:23,940 --> 00:48:25,170 times the input. 840 00:48:25,170 --> 00:48:30,500 So v infinity is just this matrix times h. 841 00:48:30,500 --> 00:48:32,050 And now let's look at what that is. 842 00:48:34,580 --> 00:48:37,610 v infinity is a matrix times h. 843 00:48:37,610 --> 00:48:39,830 We're going to call that g. 844 00:48:39,830 --> 00:48:42,600 g is a gain matrix. 845 00:48:42,600 --> 00:48:45,270 We're going to think of that as a gain times the input. 846 00:48:45,270 --> 00:48:50,800 So it's just a matrix operation on the input. 847 00:48:50,800 --> 00:48:55,390 This matrix has exactly the same eigenvectors as m. 848 00:48:55,390 --> 00:48:59,290 And the eigenvalues are just 1 over 1 minus lambda. 849 00:49:01,870 --> 00:49:03,350 Hang in there. 850 00:49:03,350 --> 00:49:07,060 So what this means is that if an input is parallel 851 00:49:07,060 --> 00:49:09,940 to one of the eigenvectors of the weight matrix, 852 00:49:09,940 --> 00:49:12,520 that means the output is parallel to the input. 853 00:49:16,640 --> 00:49:19,240 So if the input is in the direction 854 00:49:19,240 --> 00:49:25,720 of one of the eigenvectors, v infinity is g times f. 855 00:49:25,720 --> 00:49:28,651 But g times f-- 856 00:49:28,651 --> 00:49:31,310 f is an eigenvector of g. And what that means 857 00:49:31,310 --> 00:49:35,900 is that v infinity is parallel to f with a scaling factor 858 00:49:35,900 --> 00:49:39,310 1 over 1 minus lambda. 859 00:49:39,310 --> 00:49:39,810 All right? 860 00:49:39,810 --> 00:49:41,030 So hang in there. 861 00:49:41,030 --> 00:49:43,720 I'm going to show you what this looks like. 862 00:49:43,720 --> 00:49:48,480 So in steady state, the output will be parallel to the input 863 00:49:48,480 --> 00:49:50,490 if the input is in the direction of one 864 00:49:50,490 --> 00:49:52,950 of the eigenvectors of the network. 865 00:49:57,610 --> 00:50:00,750 So if the input is in the direction of one 866 00:50:00,750 --> 00:50:02,370 of the eigenvectors of the network, 867 00:50:02,370 --> 00:50:07,770 that means you're activating only one mode of the network. 868 00:50:07,770 --> 00:50:11,155 And only that one mode responds, and none of the other modes 869 00:50:11,155 --> 00:50:11,655 respond. 870 00:50:15,840 --> 00:50:17,760 The response of the network will be 871 00:50:17,760 --> 00:50:20,340 in the direction of that input, and it 872 00:50:20,340 --> 00:50:24,480 will be amplified or suppressed by this gain factor. 873 00:50:24,480 --> 00:50:28,260 And the time constant will also be increased or decreased 874 00:50:28,260 --> 00:50:30,350 by that factor. 875 00:50:30,350 --> 00:50:32,370 So now let's look at-- so I just kind of whizzed 876 00:50:32,370 --> 00:50:33,370 through a bunch of math.
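As a quick check of that result, here is a minimal NumPy sketch (not from the lecture itself, which works in Matlab; the matrix values and the input are made up for illustration). It shows that the gain matrix built from the eigenvectors and the factors 1 over 1 minus lambda gives the same steady state as setting dv dt to zero and solving the network equation directly:

    import numpy as np

    # Steady state of tau dv/dt = -v + M v + h, computed two equivalent ways.
    M = np.array([[0.0, 0.8],
                  [0.8, 0.0]])        # illustrative symmetric weight matrix
    h = np.array([0.3, 1.0])          # illustrative input

    lam, Phi = np.linalg.eig(M)       # numpy's counterpart of Matlab's eig
    # Gain matrix G: same eigenvectors as M, eigenvalues 1 / (1 - lambda).
    G = Phi @ np.diag(1.0 / (1.0 - lam)) @ Phi.T
    v_inf = G @ h

    # Same answer as solving (I - M) v = h directly.
    print(np.allclose(v_inf, np.linalg.solve(np.eye(2) - M, h)))   # True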
877 00:50:33,370 --> 00:50:36,400 Let's look at what this looks like graphically 878 00:50:36,400 --> 00:50:39,050 for a few simple cases. 879 00:50:39,050 --> 00:50:41,440 And then I think it will become much more clear. 880 00:50:41,440 --> 00:50:43,600 Let's just look at a simple network, 881 00:50:43,600 --> 00:50:47,740 where we have two neurons with an excitatory connection 882 00:50:47,740 --> 00:50:51,520 from neuron one to neuron two, an excitatory connection 883 00:50:51,520 --> 00:50:53,440 from neuron two to neuron one. 884 00:50:53,440 --> 00:50:56,640 And we're going to make that weight 0.8. 885 00:50:56,640 --> 00:51:00,220 OK, so what does the weight matrix M look like? 886 00:51:00,220 --> 00:51:03,409 Just tell me what the entries are for M. 887 00:51:03,409 --> 00:51:05,630 AUDIENCE: Does it not have the autapse? 888 00:51:05,630 --> 00:51:09,370 MICHALE FEE: No, so there's no connection 889 00:51:09,370 --> 00:51:13,450 of any of these neurons onto themselves. 890 00:51:13,450 --> 00:51:15,640 AUDIENCE: So you have, like, zeros on the diagonal. 891 00:51:15,640 --> 00:51:17,098 MICHALE FEE: Zeros on the diagonal. 892 00:51:17,098 --> 00:51:18,080 Good. 893 00:51:18,080 --> 00:51:19,840 AUDIENCE: All the diagonals. 894 00:51:19,840 --> 00:51:20,720 MICHALE FEE: Good. 895 00:51:20,720 --> 00:51:22,120 Like that? 896 00:51:22,120 --> 00:51:23,030 Good. 897 00:51:23,030 --> 00:51:26,510 Connection from neuron one to itself is zero. 898 00:51:26,510 --> 00:51:32,330 The connection from post, pre is row, column. 899 00:51:32,330 --> 00:51:37,220 So onto neuron one from neuron two is 0.8. 900 00:51:37,220 --> 00:51:40,460 Onto neuron two from neuron one is 0.8. 901 00:51:40,460 --> 00:51:43,310 And neuron two onto neuron two is zero. 902 00:51:46,400 --> 00:51:51,070 So now we are just going to diagonalize this weight matrix. 903 00:51:51,070 --> 00:51:58,660 We're going to find the eigenvectors and eigenvalues. 904 00:51:58,660 --> 00:52:02,380 The eigenvectors are the columns of phi. 905 00:52:02,380 --> 00:52:04,865 And the eigenvalues are the diagonal elements of lambda. 906 00:52:08,140 --> 00:52:10,720 Let's take a look at what those eigenvectors are. 907 00:52:10,720 --> 00:52:13,860 So this vector here is f1. 908 00:52:13,860 --> 00:52:16,370 This vector here is another eigenvector, f2. 909 00:52:19,780 --> 00:52:20,785 And how did I get this? 910 00:52:24,260 --> 00:52:26,993 How did I get this from this? 911 00:52:26,993 --> 00:52:27,910 How would you do that? 912 00:52:27,910 --> 00:52:32,044 If I gave you this matrix, how would you find phi? 913 00:52:32,044 --> 00:52:33,940 AUDIENCE: Eig M. 914 00:52:33,940 --> 00:52:37,580 MICHALE FEE: Good, eig of M. Now, 915 00:52:37,580 --> 00:52:39,700 remember in the last lecture when 916 00:52:39,700 --> 00:52:45,350 we were talking about some simple cases of matrices 917 00:52:45,350 --> 00:52:49,240 that are really easy to find the eigenvectors of? 918 00:52:49,240 --> 00:52:53,350 If you have a symmetric matrix, where the diagonal elements are 919 00:52:53,350 --> 00:52:56,260 equal to each other, the eigenvectors 920 00:52:56,260 --> 00:53:01,070 are always 45 degrees here and 45 degrees there. 921 00:53:01,070 --> 00:53:07,000 And the eigenvalues are just the diagonal elements plus or minus 922 00:53:07,000 --> 00:53:08,270 the off-diagonal elements. 923 00:53:08,270 --> 00:53:15,460 So the eigenvalues here are 0.8 and minus 0.8.
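As a sketch of that step (the lecture runs eig in Matlab; numpy.linalg.eig plays the same role here, and the order in which the eigenvalues come back may differ):

    import numpy as np

    M = np.array([[0.0, 0.8],
                  [0.8, 0.0]])
    lam, Phi = np.linalg.eig(M)

    print(lam)    # the eigenvalues, 0.8 and -0.8
    print(Phi)    # columns are the 45-degree eigenvectors f1 and f2
    print(np.allclose(M @ Phi, Phi @ np.diag(lam)))   # the eigenvalue equation holds
    print(np.allclose(Phi.T @ Phi, np.eye(2)))        # phi transpose phi is the identity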
924 00:53:15,460 --> 00:53:22,350 All right, so those are the two eigenvectors of this matrix, 925 00:53:22,350 --> 00:53:23,055 of this network. 926 00:53:25,830 --> 00:53:29,860 Those are the modes of the network. 927 00:53:29,860 --> 00:53:34,410 Notice that one of the modes corresponds to neuron one 928 00:53:34,410 --> 00:53:37,560 and neuron two firing together. 929 00:53:37,560 --> 00:53:40,650 The other mode corresponds to neuron one and neuron 930 00:53:40,650 --> 00:53:43,200 two firing with opposite sign-- 931 00:53:46,680 --> 00:53:50,020 minus one, one. 932 00:53:50,020 --> 00:53:54,550 So the lambda-- the diagonal elements of the lambda matrix 933 00:53:54,550 --> 00:53:56,120 are the eigenvalues. 934 00:53:56,120 --> 00:54:04,410 They're 0.8 and minus 0.8, a plus or minus b. 935 00:54:04,410 --> 00:54:07,500 Now, this gain factor, what this says 936 00:54:07,500 --> 00:54:11,960 is that if I have an input in the direction of f1, 937 00:54:11,960 --> 00:54:14,650 the response is going to be amplified by a gain. 938 00:54:14,650 --> 00:54:17,720 And remember, we just derived, on the previous slide, 939 00:54:17,720 --> 00:54:20,990 that that gain factor is just 1 over 1 940 00:54:20,990 --> 00:54:26,360 minus the eigenvalue for that eigenvector. 941 00:54:26,360 --> 00:54:34,270 In this case, the eigenvalue for mode one is 0.8. 942 00:54:34,270 --> 00:54:38,680 So 1 over 1 minus 0.8 is 5. 943 00:54:38,680 --> 00:54:43,180 So the gain in this direction is 5. 944 00:54:43,180 --> 00:54:47,650 The gain for an input in this direction 945 00:54:47,650 --> 00:54:56,380 is 1 over 1 minus negative 0.8, which is 1 over 1.8. 946 00:54:56,380 --> 00:54:58,640 Does that make sense? 947 00:54:58,640 --> 00:55:00,370 OK, let's keep going, because I think 948 00:55:00,370 --> 00:55:01,960 it will make even more sense once we 949 00:55:01,960 --> 00:55:04,195 see how the network responds to its inputs. 950 00:55:10,910 --> 00:55:12,780 So zero input. 951 00:55:12,780 --> 00:55:16,220 Now we're going to put an input in the direction of this mode 952 00:55:16,220 --> 00:55:16,870 one. 953 00:55:16,870 --> 00:55:20,890 And you can see the mode responds a lot. 954 00:55:20,890 --> 00:55:23,000 Put a negative input in, it responds a lot. 955 00:55:23,000 --> 00:55:27,910 If we put an input in this direction or this direction, 956 00:55:27,910 --> 00:55:37,340 the response is suppressed by an amount of about 0.5. 957 00:55:37,340 --> 00:55:39,440 Because here, the gain is small. 958 00:55:39,440 --> 00:55:41,360 Here, the gain is big. 959 00:55:41,360 --> 00:55:43,414 So you see what's happening? 960 00:55:43,414 --> 00:55:50,070 This network looks just like an autapse network, 961 00:55:50,070 --> 00:55:53,910 but where we've taken this input and output space and just 962 00:55:53,910 --> 00:56:00,410 rotated it into a new coordinate system, into this new basis. 963 00:56:00,410 --> 00:56:00,969 Yes? 964 00:56:00,969 --> 00:56:02,636 AUDIENCE: Why did it kind of loop around 965 00:56:02,636 --> 00:56:04,650 on the one side [INAUDIBLE]? 966 00:56:04,650 --> 00:56:08,970 MICHALE FEE: OK, it's because these things are relaxing 967 00:56:08,970 --> 00:56:10,710 exponentially back to zero. 968 00:56:10,710 --> 00:56:12,630 And we got a little bit impatient 969 00:56:12,630 --> 00:56:16,560 and started the next input before it had quite gone away. 970 00:56:16,560 --> 00:56:19,320 OK, good question.
971 00:56:19,320 --> 00:56:21,750 It's just that if you really wait for a long time for it 972 00:56:21,750 --> 00:56:24,240 to settle, then the movie just takes a long time. 973 00:56:24,240 --> 00:56:26,850 But maybe it would be better to do that. 974 00:56:26,850 --> 00:56:30,510 So input this way and this way lead to a large response, 975 00:56:30,510 --> 00:56:35,280 because those inputs activate mode one, which has a big gain. 976 00:56:35,280 --> 00:56:38,540 Inputs in this direction and this direction 977 00:56:38,540 --> 00:56:41,450 have a small response, because they activate 978 00:56:41,450 --> 00:56:46,150 mode two, which has small gain. 979 00:56:46,150 --> 00:56:51,730 But notice that when you activate mode one-- 980 00:56:51,730 --> 00:56:54,230 when you put an input in this direction, 981 00:56:54,230 --> 00:56:58,000 it only activates mode one. 982 00:56:58,000 --> 00:57:01,060 And it doesn't activate mode two at all. 983 00:57:01,060 --> 00:57:03,830 If you put an input in this direction, 984 00:57:03,830 --> 00:57:06,070 then it only activates mode two, and it doesn't 985 00:57:06,070 --> 00:57:07,780 activate mode one at all. 986 00:57:11,220 --> 00:57:15,360 So it's just like the autapse network, but rotated. 987 00:57:18,490 --> 00:57:27,830 So now let's do the case where we have an input that 988 00:57:27,830 --> 00:57:29,840 activates both modes. 989 00:57:29,840 --> 00:57:33,300 So let's say we put an input in this direction. 990 00:57:33,300 --> 00:57:37,730 What does that direction correspond to, h up? 991 00:57:37,730 --> 00:57:41,330 What does that input mean here in terms of h1 and h2? 992 00:57:46,560 --> 00:57:49,590 Let's say we just put an input-- remember, 993 00:57:49,590 --> 00:57:55,050 this is a plot on axes h1 versus h2. 994 00:57:55,050 --> 00:57:57,570 So this input vector h corresponds 995 00:57:57,570 --> 00:58:04,220 to just putting an input on h2, into this neuron. 996 00:58:04,220 --> 00:58:08,120 So you can see that when we put an input in this direction, 997 00:58:08,120 --> 00:58:09,500 we're activating-- 998 00:58:09,500 --> 00:58:14,210 that input has a projection onto mode one and mode two. 999 00:58:14,210 --> 00:58:16,070 So we're activating both modes. 1000 00:58:19,200 --> 00:58:23,280 You can see that the input h has a projection 1001 00:58:23,280 --> 00:58:27,900 onto f1 and projection onto f2. 1002 00:58:27,900 --> 00:58:28,860 So what you do is-- 1003 00:58:34,090 --> 00:58:36,340 well, here, I'm just showing you what the steady state 1004 00:58:36,340 --> 00:58:39,490 response is mathematically. 1005 00:58:39,490 --> 00:58:42,280 Let me just show you what that looks like. 1006 00:58:42,280 --> 00:58:46,250 What this says is that if we put an h in this direction, 1007 00:58:46,250 --> 00:58:50,140 it's going to activate a little bit of mode one 1008 00:58:50,140 --> 00:58:54,940 with a big gain and a little bit of mode two 1009 00:58:54,940 --> 00:58:56,380 with a very small gain. 1010 00:58:56,380 --> 00:59:01,880 And so the steady state response will be the sum of those two. 1011 00:59:01,880 --> 00:59:04,240 It'll be up here. 1012 00:59:04,240 --> 00:59:09,120 So the steady state response to this input in this direction 1013 00:59:09,120 --> 00:59:10,810 is going to be over here. 1014 00:59:10,810 --> 00:59:11,670 Why? 1015 00:59:11,670 --> 00:59:16,680 Because that input activates mode one and mode two both.
1016 00:59:16,680 --> 00:59:20,180 But the response of mode one is big, 1017 00:59:20,180 --> 00:59:23,150 and the response of mode two is really small. 1018 00:59:23,150 --> 00:59:24,830 And so the steady state response is 1019 00:59:24,830 --> 00:59:29,180 going to be way over here because 1020 00:59:29,180 --> 00:59:32,330 of the big response, the amplified response of mode one, 1021 00:59:32,330 --> 00:59:35,750 which is in this direction, OK? 1022 00:59:35,750 --> 00:59:37,442 So when we put an input straight up, 1023 00:59:37,442 --> 00:59:38,900 the response of the network's going 1024 00:59:38,900 --> 00:59:40,760 to be all the way over here. 1025 00:59:40,760 --> 00:59:43,640 How is it going to get there? 1026 00:59:43,640 --> 00:59:44,390 Let's take a look. 1027 00:59:52,570 --> 00:59:55,110 We're going to put an input-- 1028 00:59:55,110 --> 00:59:58,283 sorry, that was first in this direction. 1029 00:59:58,283 --> 00:59:59,700 Now let's see what happens when we 1030 00:59:59,700 --> 01:00:01,720 put an input in this direction. 1031 01:00:01,720 --> 01:00:06,150 You can see the response is really big along the mode one 1032 01:00:06,150 --> 01:00:08,250 direction, in this direction, and it's 1033 01:00:08,250 --> 01:00:12,550 really small in this direction. 1034 01:00:12,550 --> 01:00:18,380 So input up in the upward direction onto just this neuron 1035 01:00:18,380 --> 01:00:21,690 produces a large response in mode one, 1036 01:00:21,690 --> 01:00:24,020 which is this way, and a very small response 1037 01:00:24,020 --> 01:00:26,570 in mode two, which is this way. 1038 01:00:26,570 --> 01:00:32,380 The response in mode two is very fast, because the factor, 1039 01:00:32,380 --> 01:00:37,192 the 1 over 1 minus lambda, is small, 1040 01:00:37,192 --> 01:00:39,810 which makes the time constant faster 1041 01:00:39,810 --> 01:00:43,230 and the response smaller. 1042 01:00:43,230 --> 01:00:45,990 So, again, it's just like the response 1043 01:00:45,990 --> 01:00:50,422 of the autapse network, but rotated 1044 01:00:50,422 --> 01:00:51,630 into a new coordinate system. 1045 01:00:56,670 --> 01:00:58,290 All right, any questions about that? 1046 01:01:02,610 --> 01:01:06,060 So you can see we basically understood everything 1047 01:01:06,060 --> 01:01:09,660 we needed to know about recurrent networks 1048 01:01:09,660 --> 01:01:17,920 just by understanding simple networks with just autapses. 1049 01:01:17,920 --> 01:01:21,830 And all these more complicated networks 1050 01:01:21,830 --> 01:01:25,190 are just nothing but rotated versions 1051 01:01:25,190 --> 01:01:27,710 of the response of a network with just autapses. 1052 01:01:36,998 --> 01:01:38,040 Any questions about that? 1053 01:01:41,990 --> 01:01:44,350 OK, let's do another network now where 1054 01:01:44,350 --> 01:01:46,210 we have inhibitory connections. 1055 01:01:46,210 --> 01:01:50,350 That's called mutual inhibition. 1056 01:01:50,350 --> 01:01:52,870 And let's make that inhibition minus 0.8. 1057 01:01:52,870 --> 01:01:55,690 The weight matrix is just zeros on the diagonals, 1058 01:01:55,690 --> 01:01:57,940 because there's no autapse here. 1059 01:01:57,940 --> 01:02:03,230 And minus 0.8 on the off-diagonals. 1060 01:02:03,230 --> 01:02:10,338 What are the eigenvectors for this matrix, for this network? 1061 01:02:10,338 --> 01:02:11,790 AUDIENCE: The same.
1062 01:02:11,790 --> 01:02:13,890 MICHALE FEE: Yeah, because the diagonal 1063 01:02:13,890 --> 01:02:15,430 elements are equal to each other, 1064 01:02:15,430 --> 01:02:18,070 and the off-diagonal elements are equal to each other. 1065 01:02:18,070 --> 01:02:21,990 It's a symmetric network with equal diagonal elements. 1066 01:02:21,990 --> 01:02:26,440 The eigenvectors are always at 45 degrees. 1067 01:02:26,440 --> 01:02:28,000 And what are the eigenvalues? 1068 01:02:30,940 --> 01:02:34,370 AUDIENCE: [INAUDIBLE] 1069 01:02:34,370 --> 01:02:36,458 MICHALE FEE: Well, the two numbers 1070 01:02:36,458 --> 01:02:37,500 are going to be the same. 1071 01:02:37,500 --> 01:02:44,320 It's zero plus and minus 0.8, plus and minus negative 0.8, 1072 01:02:44,320 --> 01:02:47,400 which is just 0.8 and minus 0.8, right? 1073 01:02:47,400 --> 01:02:47,910 Good. 1074 01:02:47,910 --> 01:02:51,390 So the eigenvalues are just 0.8 and minus 0.8. 1075 01:02:51,390 --> 01:02:55,590 But the eigenvalues correspond to different eigenvectors. 1076 01:02:55,590 --> 01:02:59,760 So now the eigenvalue for the mode in the 1, 1077 01:02:59,760 --> 01:03:04,170 1 direction is now minus 0.8, which 1078 01:03:04,170 --> 01:03:09,270 means it's suppressing the response in this direction. 1079 01:03:09,270 --> 01:03:12,980 And the eigenvalue for the eigenvector in the minus 1, 1080 01:03:12,980 --> 01:03:16,530 1 direction is now close to 1, which 1081 01:03:16,530 --> 01:03:20,880 means that mode has a lot of recurrent feedback. 1082 01:03:20,880 --> 01:03:25,480 And so its response in this direction is going to be big. 1083 01:03:25,480 --> 01:03:26,970 It's going to be amplified. 1084 01:03:26,970 --> 01:03:31,580 So unlike the case where we had positive recurrent synapses, 1085 01:03:31,580 --> 01:03:35,070 where we had amplification in this direction, now 1086 01:03:35,070 --> 01:03:37,920 we're going to have amplification 1087 01:03:37,920 --> 01:03:39,565 in this direction. 1088 01:03:39,565 --> 01:03:40,440 Does that make sense? 1089 01:03:43,500 --> 01:03:44,730 Think of it this way-- 1090 01:03:44,730 --> 01:03:48,320 if we go back to this network here, 1091 01:03:48,320 --> 01:03:51,950 you can see that when these two neurons-- 1092 01:03:51,950 --> 01:03:56,690 when this neuron is active, it tends to activate this neuron. 1093 01:03:56,690 --> 01:03:58,230 And when this neuron is active, 1094 01:03:58,230 --> 01:04:00,150 it tends to activate that neuron. 1095 01:04:00,150 --> 01:04:05,150 So this network, if you were to activate one of these neurons, 1096 01:04:05,150 --> 01:04:08,480 it tends to drive the other neuron also. 1097 01:04:08,480 --> 01:04:13,040 And so the activity of those two neurons likes to go together. 1098 01:04:13,040 --> 01:04:15,520 When one is big, the other one wants to be big. 1099 01:04:15,520 --> 01:04:21,800 And that's why there's a lot of gain in this direction. 1100 01:04:21,800 --> 01:04:23,110 Does that make sense? 1101 01:04:23,110 --> 01:04:26,440 With these recurrent excitatory connections, 1102 01:04:26,440 --> 01:04:29,860 it's hard to make this neuron fire 1103 01:04:29,860 --> 01:04:32,470 and make that neuron not fire. 1104 01:04:32,470 --> 01:04:36,300 And that's why the response is suppressed in this direction, 1105 01:04:36,300 --> 01:04:36,910 OK? 1106 01:04:36,910 --> 01:04:41,920 With this network, when this neuron is active, 1107 01:04:41,920 --> 01:04:43,860 it's trying to suppress that neuron.
1108 01:04:47,100 --> 01:04:49,050 When that neuron has positive firing rate, 1109 01:04:49,050 --> 01:04:51,870 it's trying to make that neuron have a negative firing rate. 1110 01:04:51,870 --> 01:04:53,850 When that neuron is negative, it tries 1111 01:04:53,850 --> 01:04:55,470 to make that one go positive. 1112 01:04:55,470 --> 01:04:58,140 And so this network likes to have 1113 01:04:58,140 --> 01:05:04,990 one firing positive and the other neuron going negative. 1114 01:05:04,990 --> 01:05:06,550 And so that's what happens. 1115 01:05:06,550 --> 01:05:16,580 What you find is that if you put an input into the first neuron, 1116 01:05:16,580 --> 01:05:20,330 it tends to suppress the activity in the second neuron, 1117 01:05:20,330 --> 01:05:21,980 in v2. 1118 01:05:21,980 --> 01:05:27,390 If you put input into neuron two, 1119 01:05:27,390 --> 01:05:29,220 it tends to suppress the activity, 1120 01:05:29,220 --> 01:05:32,040 or make v1 go negative. 1121 01:05:32,040 --> 01:05:36,810 So it's, again, exactly like the autapse network, 1122 01:05:36,810 --> 01:05:42,590 but just, in this case, rotated minus 45 degrees instead 1123 01:05:42,590 --> 01:05:44,950 of plus 45 degrees, OK? 1124 01:05:51,750 --> 01:05:55,000 Any questions about that? 1125 01:05:55,000 --> 01:05:55,780 All right. 1126 01:05:55,780 --> 01:05:59,830 So now let's talk about how-- 1127 01:05:59,830 --> 01:06:00,989 yes, Linda? 1128 01:06:00,989 --> 01:06:03,489 AUDIENCE: So we just did, those were all symmetric matrices, 1129 01:06:03,489 --> 01:06:04,390 right? 1130 01:06:04,390 --> 01:06:05,098 MICHALE FEE: Yes. 1131 01:06:05,098 --> 01:06:08,397 AUDIENCE: So [INAUDIBLE] can we not do this strategy 1132 01:06:08,397 --> 01:06:09,380 if it's not symmetric? 1133 01:06:09,380 --> 01:06:11,690 MICHALE FEE: You can do it for non-symmetric matrices, 1134 01:06:11,690 --> 01:06:15,260 but non-symmetric matrices start doing 1135 01:06:15,260 --> 01:06:17,330 all kinds of other cool stuff that 1136 01:06:17,330 --> 01:06:20,730 is a topic for another day. 1137 01:06:20,730 --> 01:06:25,650 So symmetric matrices are special in that they 1138 01:06:25,650 --> 01:06:31,380 have very simple dynamics. 1139 01:06:31,380 --> 01:06:37,930 They just relax to a steady state solution. 1140 01:06:37,930 --> 01:06:40,980 Weight matrices that are not symmetric, or even 1141 01:06:40,980 --> 01:06:43,230 anti-symmetric, tend to do really cool things 1142 01:06:43,230 --> 01:06:46,590 like oscillating. 1143 01:06:46,590 --> 01:06:50,670 And we'll get to that in another lecture, all right? 1144 01:06:50,670 --> 01:06:55,170 OK, so now let's talk about using recurrent networks 1145 01:06:55,170 --> 01:06:57,150 to store memories. 1146 01:06:57,150 --> 01:07:00,360 So, remember, all of the cases we've just 1147 01:07:00,360 --> 01:07:03,960 described, all of the networks we've just described, 1148 01:07:03,960 --> 01:07:08,340 had the property that the lambdas were less than one. 1149 01:07:08,340 --> 01:07:10,190 So what we've been looking at are 1150 01:07:10,190 --> 01:07:13,970 networks for which lambda is less than one 1151 01:07:13,970 --> 01:07:18,320 and they're symmetric weight matrices. 1152 01:07:18,320 --> 01:07:20,008 So that was kind of a special case, 1153 01:07:20,008 --> 01:07:21,800 but it's a good case for building intuition 1154 01:07:21,800 --> 01:07:24,050 about what goes on. 1155 01:07:24,050 --> 01:07:25,800 But now we're going to start branching out 1156 01:07:25,800 --> 01:07:30,310 into more interesting behavior.
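Before branching out, the responses shown in these movies can be reproduced with a short Euler simulation of tau dv/dt = -v + M v + h. This is a rough sketch with an illustrative time constant and the 0.8 mutual-excitation weights from the example above; flipping the sign of the off-diagonal weights gives the mutual-inhibition case:

    import numpy as np

    M = np.array([[0.0, 0.8],
                  [0.8, 0.0]])     # mutual excitation; use -0.8 for mutual inhibition
    tau, dt = 10.0, 0.1            # illustrative time constant (ms) and time step

    def settle(h, t_max=3000.0):
        """Euler-integrate tau dv/dt = -v + M v + h until it reaches steady state."""
        v = np.zeros(2)
        for _ in range(int(t_max / dt)):
            v += (dt / tau) * (-v + M @ v + h)
        return np.round(v, 2)

    print(settle(np.array([1.0, 1.0])))    # ~[ 5.    5.  ]  mode-one input, gain 5
    print(settle(np.array([-1.0, 1.0])))   # ~[-0.56  0.56]  mode-two input, gain 1/1.8
    print(settle(np.array([0.0, 1.0])))    # ~[ 2.22  2.78]  both modes; mostly along (1, 1)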
1157 01:07:33,040 --> 01:07:37,090 So let's take a look at what happens to our equation. 1158 01:07:37,090 --> 01:07:41,170 This is now our equation for the different modes of the network. 1159 01:07:41,170 --> 01:07:43,930 What happens to this equation when lambda is actually 1160 01:07:43,930 --> 01:07:46,670 equal to one? 1161 01:07:46,670 --> 01:07:52,210 So when lambda is equal to one, this term goes to zero, right? 1162 01:07:52,210 --> 01:07:58,170 So we can just cross this out and rewrite our equation 1163 01:07:58,170 --> 01:08:06,710 as tau dc1 dt equals f1 dot h. 1164 01:08:06,710 --> 01:08:09,238 So what is this? 1165 01:08:09,238 --> 01:08:10,280 What does that look like? 1166 01:08:13,850 --> 01:08:21,130 What's the solution to c for this differential equation? 1167 01:08:21,130 --> 01:08:25,420 Does this exponentially relax toward a v infinity? 1168 01:08:29,640 --> 01:08:31,840 What is v infinity here? 1169 01:08:31,840 --> 01:08:34,770 It's not even defined. 1170 01:08:34,770 --> 01:08:38,399 If you set dc dt equal to zero, there's not even a c 1171 01:08:38,399 --> 01:08:39,899 to solve for, right? 1172 01:08:39,899 --> 01:08:41,399 So what is this? 1173 01:08:46,290 --> 01:08:49,890 The derivative of c is just equal to-- 1174 01:08:49,890 --> 01:08:55,238 if we put in an input that's constant, what is c? 1175 01:08:55,238 --> 01:08:57,510 AUDIENCE: [INAUDIBLE] 1176 01:08:57,510 --> 01:09:00,290 MICHALE FEE: This is an integrator, right? 1177 01:09:00,290 --> 01:09:04,609 This c, the solution to this equation, 1178 01:09:04,609 --> 01:09:10,960 is that c is the integral of this input. 1179 01:09:10,960 --> 01:09:16,960 c is some initial c plus the integral of the input over time. 1180 01:09:22,279 --> 01:09:25,180 So if we have an input-- 1181 01:09:25,180 --> 01:09:28,050 and again, what we're plotting here 1182 01:09:28,050 --> 01:09:34,370 is the activity of one of the modes of our network, c1, 1183 01:09:34,370 --> 01:09:37,430 which is a function of the projection 1184 01:09:37,430 --> 01:09:42,350 of the input along the eigenvector of mode one. 1185 01:09:42,350 --> 01:09:46,189 So we're going to plot h dot f1, which is just how much the input 1186 01:09:46,189 --> 01:09:50,000 overlaps with mode one. 1187 01:09:50,000 --> 01:09:53,810 And as a function of time, let's start at t equals zero. 1188 01:09:53,810 --> 01:09:54,890 What will this look like? 1189 01:09:59,710 --> 01:10:02,250 This will just increase linearly. 1190 01:10:02,250 --> 01:10:03,514 And then what happens? 1191 01:10:06,993 --> 01:10:08,120 What happens here? 1192 01:10:13,650 --> 01:10:14,357 Raymundo? 1193 01:10:14,357 --> 01:10:15,690 AUDIENCE: It just stays constant. 1194 01:10:15,690 --> 01:10:18,400 MICHALE FEE: Good. 1195 01:10:18,400 --> 01:10:21,730 We've been through that, like, 100 times in this class. 1196 01:10:25,220 --> 01:10:33,600 Now, what's special about this network is that remember, 1197 01:10:33,600 --> 01:10:37,350 when lambda was less than one, the network 1198 01:10:37,350 --> 01:10:39,120 would respond to the input. 1199 01:10:39,120 --> 01:10:41,370 And then what would it do when we took the input away? 1200 01:10:44,830 --> 01:10:47,300 It would decay back to zero. 1201 01:10:47,300 --> 01:10:51,070 But this network does something really special. 1202 01:10:51,070 --> 01:10:53,620 This network, you put an input in and then 1203 01:10:53,620 --> 01:10:58,360 take the input away, this network stays active. 1204 01:10:58,360 --> 01:11:02,920 It remembers what the input was.
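A minimal sketch of that integrator mode, with made-up pulse timing and time constant: a constant input makes c ramp linearly, and when the input is removed, c simply holds its value.

    # Integrator mode (lambda = 1): tau dc/dt = f1 . h
    tau, dt = 10.0, 0.1                       # illustrative values (ms)
    c = 0.0
    for step in range(int(300 / dt)):
        t = step * dt
        hf = 1.0 if 50 <= t < 150 else 0.0    # input pulse from t = 50 to 150 ms
        c += (dt / tau) * hf
    print(c)   # ramps up to 10 during the pulse, then holds that value afterward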
1205 01:11:02,920 --> 01:11:06,220 Whereas, if you have a network where lambda is less than one, 1206 01:11:06,220 --> 01:11:12,237 the network very quickly forgets what the input was. 1207 01:11:12,237 --> 01:11:14,570 All right, what happens when lambda is greater than one? 1208 01:11:14,570 --> 01:11:18,920 So when lambda is greater than one, this term is now-- 1209 01:11:18,920 --> 01:11:20,780 this thing inside the parentheses 1210 01:11:20,780 --> 01:11:23,490 is negative, multiplied by a negative number. 1211 01:11:23,490 --> 01:11:27,290 This whole coefficient in front of the c1 becomes positive. 1212 01:11:27,290 --> 01:11:31,050 So we're just going to write it as lambda minus one. 1213 01:11:31,050 --> 01:11:33,980 And so this becomes positive. 1214 01:11:33,980 --> 01:11:35,750 And what does that solution look like? 1215 01:11:35,750 --> 01:11:37,670 Does anyone know what that looks like? 1216 01:11:37,670 --> 01:11:40,760 dc dt equals a positive number times c. 1217 01:11:49,790 --> 01:11:50,990 Nobody? 1218 01:11:50,990 --> 01:11:53,495 Are we all just sleepy? 1219 01:11:57,278 --> 01:11:57,820 What happens? 1220 01:12:00,700 --> 01:12:04,950 So if this is negative, if this coefficient were negative, dc-- 1221 01:12:04,950 --> 01:12:07,650 if c is positive, then dc dt is negative, 1222 01:12:07,650 --> 01:12:11,620 and it relaxes to zero, right? 1223 01:12:11,620 --> 01:12:13,270 Let's think about this for a minute. 1224 01:12:13,270 --> 01:12:15,380 What happens if this quantity is positive? 1225 01:12:15,380 --> 01:12:16,740 So if c is positive-- 1226 01:12:19,760 --> 01:12:20,720 cover that up. 1227 01:12:20,720 --> 01:12:24,090 If this is positive and c is positive, 1228 01:12:24,090 --> 01:12:26,760 then dc dt is positive. 1229 01:12:26,760 --> 01:12:31,790 So that means if c is positive, it just keeps getting bigger, 1230 01:12:31,790 --> 01:12:32,360 right? 1231 01:12:32,360 --> 01:12:36,320 And so what happens is you get exponential growth. 1232 01:12:36,320 --> 01:12:39,670 So if we now take an input and we put it into this network, 1233 01:12:39,670 --> 01:12:41,740 where lambda is greater than one, 1234 01:12:41,740 --> 01:12:44,550 you get exponential growth. 1235 01:12:44,550 --> 01:12:47,000 And now what happens when you turn that input off? 1236 01:12:53,800 --> 01:12:55,720 Does it go away? 1237 01:13:06,031 --> 01:13:07,013 What happens? 1238 01:13:12,420 --> 01:13:14,400 Somebody draw with their hand what happens here. 1239 01:13:18,360 --> 01:13:20,690 So just look at the equation. 1240 01:13:20,690 --> 01:13:26,450 Again, h dot f1 is zero here, so that's gone. 1241 01:13:26,450 --> 01:13:28,190 This is positive. 1242 01:13:28,190 --> 01:13:30,270 c is positive. 1243 01:13:30,270 --> 01:13:31,140 So what is dc dt? 1244 01:13:33,950 --> 01:13:34,490 Good. 1245 01:13:34,490 --> 01:13:35,550 It's positive. 1246 01:13:35,550 --> 01:13:36,290 And so what is-- 1247 01:13:36,290 --> 01:13:36,850 AUDIENCE: [INAUDIBLE] 1248 01:13:36,850 --> 01:13:38,100 MICHALE FEE: It keeps growing. 1249 01:13:41,710 --> 01:13:43,620 So you can see that this network also 1250 01:13:43,620 --> 01:13:49,020 remembers that it had input. 1251 01:13:51,940 --> 01:13:54,550 So this network also has a memory.
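And a matching sketch for lambda greater than one, again with made-up numbers: a brief input pulse kicks the mode off zero, and after the pulse ends the activity keeps growing exponentially on its own.

    # Unstable mode (lambda > 1): tau dc/dt = (lambda - 1) c + f1 . h
    tau, dt, lam = 10.0, 0.1, 1.2       # illustrative values
    c = 0.0
    for step in range(int(300 / dt)):
        t = step * dt
        hf = 1.0 if t < 20 else 0.0     # brief input pulse at the start
        c += (dt / tau) * ((lam - 1.0) * c + hf)
    print(c)   # far larger than when the pulse ended: it keeps growing on its own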
1252 01:13:54,550 --> 01:13:58,990 So anytime you have lambda less than one, the network 1253 01:13:58,990 --> 01:14:01,060 just-- as soon as the input goes away, 1254 01:14:01,060 --> 01:14:03,190 the network activity goes to zero, 1255 01:14:03,190 --> 01:14:06,220 and it just completely forgets that it ever had input. 1256 01:14:06,220 --> 01:14:09,860 Whereas, as long as lambda is equal to or greater than one, 1257 01:14:09,860 --> 01:14:15,640 then this network remembers that it had input. 1258 01:14:15,640 --> 01:14:18,790 So if lambda is less than one, then the network 1259 01:14:18,790 --> 01:14:23,330 relaxes exponentially back to zero after the input goes away. 1260 01:14:23,330 --> 01:14:26,680 If you have lambda equal to one, you have an integrator, 1261 01:14:26,680 --> 01:14:29,350 and the network activity persists 1262 01:14:29,350 --> 01:14:31,550 after the input goes away. 1263 01:14:31,550 --> 01:14:33,550 And if you have exponential growth, 1264 01:14:33,550 --> 01:14:36,400 the network activity also persists 1265 01:14:36,400 --> 01:14:37,700 after the input goes away. 1266 01:14:40,770 --> 01:14:45,020 And so that right there is one of the best 1267 01:14:45,020 --> 01:14:52,560 models for short-term memory in the brain. 1268 01:14:52,560 --> 01:14:58,440 The idea that you have neurons that get input, 1269 01:14:58,440 --> 01:15:02,700 become activated, and then hold that memory 1270 01:15:02,700 --> 01:15:08,340 by reactivating themselves and holding their own activity high 1271 01:15:08,340 --> 01:15:11,310 through recurrent excitation. 1272 01:15:11,310 --> 01:15:14,430 But that excitation has to be big enough 1273 01:15:14,430 --> 01:15:17,520 to either just barely maintain the activity 1274 01:15:17,520 --> 01:15:22,240 or continue increasing their activity. 1275 01:15:22,240 --> 01:15:25,870 OK, now, that's not necessarily such a great model 1276 01:15:25,870 --> 01:15:26,710 for a memory, right? 1277 01:15:26,710 --> 01:15:28,990 Because we can't have neurons whose activity is 1278 01:15:28,990 --> 01:15:31,750 exploding exponentially, right? 1279 01:15:31,750 --> 01:15:32,980 So that's not so great. 1280 01:15:32,980 --> 01:15:40,070 But it is quite commonly thought that in neural networks 1281 01:15:40,070 --> 01:15:43,400 involved in memory, the lambda is actually greater than one. 1282 01:15:43,400 --> 01:15:46,020 And how would we rescue this situation? 1283 01:15:46,020 --> 01:15:48,890 How would we save our network from having neurons 1284 01:15:48,890 --> 01:15:51,004 that blow up exponentially? 1285 01:15:53,610 --> 01:15:59,430 Well, remember, this was the solution 1286 01:15:59,430 --> 01:16:02,880 for a network with linear neurons. 1287 01:16:02,880 --> 01:16:07,450 But neurons in the brain are not really linear, are they? 1288 01:16:07,450 --> 01:16:09,370 They have firing rates that saturate. 1289 01:16:09,370 --> 01:16:12,140 At higher inputs, firing rates tend to saturate. [AUDIO OUT] 1290 01:16:12,140 --> 01:16:12,640 Why? 1291 01:16:12,640 --> 01:16:14,995 Because sodium channels become inactivated, 1292 01:16:14,995 --> 01:16:18,850 and the neurons can't respond that fast, right? 1293 01:16:29,430 --> 01:16:31,650 All right, this I've already said. 1294 01:16:31,650 --> 01:16:37,760 So we use what are called saturating non-linearities.
1295 01:16:37,760 --> 01:16:41,000 So it's very common to write down 1296 01:16:41,000 --> 01:16:45,230 models in which we can still have neurons that are-- 1297 01:16:45,230 --> 01:16:47,380 we can still have them approximately linear. 1298 01:16:47,380 --> 01:16:50,510 So it's quite common to have neurons that are 1299 01:16:50,510 --> 01:16:52,460 linear for small inputs. 1300 01:16:52,460 --> 01:16:55,130 They can go plus and minus, but they saturate 1301 01:16:55,130 --> 01:16:57,170 on the plus side or the minus side. 1302 01:16:57,170 --> 01:17:00,050 So now you can have an input to a neuron 1303 01:17:00,050 --> 01:17:03,230 that activates the neuron. 1304 01:17:03,230 --> 01:17:08,670 You can see what happens is you start activating this neuron. 1305 01:17:08,670 --> 01:17:14,730 It keeps activating itself, even as the input goes away. 1306 01:17:14,730 --> 01:17:17,790 But now, what happens is that activity 1307 01:17:17,790 --> 01:17:20,490 starts getting up into the regime where the neuron can't 1308 01:17:20,490 --> 01:17:23,130 fire any faster. 1309 01:17:23,130 --> 01:17:28,070 And so the activity becomes stable at some high value 1310 01:17:28,070 --> 01:17:29,400 of firing. 1311 01:17:29,400 --> 01:17:31,040 Does that make sense? 1312 01:17:31,040 --> 01:17:32,600 And this kind of neuron, for example, 1313 01:17:32,600 --> 01:17:38,330 can remember a plus input, or it can remember a minus input. 1314 01:17:41,415 --> 01:17:42,290 Does that make sense? 1315 01:17:42,290 --> 01:17:46,050 So that's how we can build a simple network 1316 01:17:46,050 --> 01:17:52,950 with a neuron that can remember its previous inputs 1317 01:17:52,950 --> 01:17:56,700 with a lambda that's greater than one. 1318 01:17:56,700 --> 01:18:00,540 And this right here, that basic thing, 1319 01:18:00,540 --> 01:18:05,730 is one of the models for how the hippocampus stores 1320 01:18:05,730 --> 01:18:08,820 memories, that you have hippocampal neurons that 1321 01:18:08,820 --> 01:18:11,490 connect to each other with a lot of recurrent 1322 01:18:11,490 --> 01:18:13,500 connections-- [AUDIO OUT] part of the hippocampus 1323 01:18:13,500 --> 01:18:15,960 has a lot of recurrent connections. 1324 01:18:15,960 --> 01:18:20,100 And the idea is that those neurons activate each other, 1325 01:18:20,100 --> 01:18:25,060 but then those neurons saturate so they can't fire anymore, 1326 01:18:25,060 --> 01:18:29,230 and now you can have a stable memory of some prior input. 1327 01:18:37,020 --> 01:18:38,870 And I think we should stop there. 1328 01:18:38,870 --> 01:18:42,080 But there are other very interesting topics 1329 01:18:42,080 --> 01:18:50,990 that we're going to get to on how these kinds of networks 1330 01:18:50,990 --> 01:18:54,290 can also make decisions and how they 1331 01:18:54,290 --> 01:18:58,190 can store continuous memories-- not just discrete memories, 1332 01:18:58,190 --> 01:19:00,740 plus or minus, on or off, but can 1333 01:19:00,740 --> 01:19:07,540 store a value for a long period of time using this integrator. 1334 01:19:07,540 --> 01:19:10,050 OK, so we'll stop there.
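Finally, a small sketch of the saturating-memory idea described above, assuming a tanh-style nonlinearity (the lecture only says "saturating," so the specific function and the numbers here are illustrative): a self-exciting unit with lambda greater than one latches onto a high or a low stable state depending on the sign of a transient input.

    import numpy as np

    # One self-exciting unit with a saturating nonlinearity:
    # tau dv/dt = -v + tanh(lambda * v + h), with lambda > 1.
    tau, dt, lam = 10.0, 0.1, 1.5

    def run(pulse):
        v = 0.0
        for step in range(int(500 / dt)):
            t = step * dt
            h = pulse if t < 50 else 0.0      # transient input, then nothing
            v += (dt / tau) * (-v + np.tanh(lam * v + h))
        return v

    print(run(+1.0), run(-1.0))   # settles near +0.86 and -0.86: a one-bit memory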