1
00:00:01,685 --> 00:00:04,040
The following content is
provided under a Creative

2
00:00:04,040 --> 00:00:05,580
Commons license.

3
00:00:05,580 --> 00:00:07,880
Your support will help
MIT OpenCourseWare

4
00:00:07,880 --> 00:00:12,270
continue to offer high quality
educational resources for free.

5
00:00:12,270 --> 00:00:14,870
To make a donation or
view additional materials

6
00:00:14,870 --> 00:00:18,830
from hundreds of MIT courses,
visit MIT OpenCourseWare

7
00:00:18,830 --> 00:00:20,000
at ocw.mit.edu.

8
00:00:21,881 --> 00:00:23,630
ETHAN MEYERS: What I'm
talking about today

9
00:00:23,630 --> 00:00:26,180
is neural population
decoding, which

10
00:00:26,180 --> 00:00:28,790
is very similar to what
Rebecca was talking about,

11
00:00:28,790 --> 00:00:32,330
except now I'm talking
more on the single neuron level

12
00:00:32,330 --> 00:00:35,810
and I'll also talk a bit about
some MEG at the end.

13
00:00:35,810 --> 00:00:39,740
But kind of to tie it to what
was previously discussed,

14
00:00:39,740 --> 00:00:43,130
Rebecca talked a lot about, at
the end, the big catastrophe.

15
00:00:43,130 --> 00:00:45,350
Well, you don't know
if something is not

16
00:00:45,350 --> 00:00:49,340
there in the fMRI signal because
things could be masked when

17
00:00:49,340 --> 00:00:51,470
you're averaging
over a large region

18
00:00:51,470 --> 00:00:54,860
as you do when you're recording
from those BOLD signals.

19
00:00:54,860 --> 00:00:58,790
And when you're doing
decoding on single neurons,

20
00:00:58,790 --> 00:01:00,920
that is not really an issue
because you're actually

21
00:01:00,920 --> 00:01:03,420
going down and recording
those individual neurons.

22
00:01:03,420 --> 00:01:07,040
And so while in general
in hypothesis testing

23
00:01:07,040 --> 00:01:10,240
you can never really say
something doesn't exist,

24
00:01:10,240 --> 00:01:12,830
here you can feel fairly
confident that it probably

25
00:01:12,830 --> 00:01:14,460
doesn't, unless you--

26
00:01:14,460 --> 00:01:18,320
I mean, you could do
a Bayesian analysis.

27
00:01:18,320 --> 00:01:20,170
Anyway, all right.

28
00:01:20,170 --> 00:01:23,900
So kind of the very basic
motivation behind what I do

29
00:01:23,900 --> 00:01:28,400
is, you know, I'm interested in
all the questions the CBMM is

30
00:01:28,400 --> 00:01:33,050
interested in, how can we
algorithmically solve problems

31
00:01:33,050 --> 00:01:34,880
and perform behaviors.

32
00:01:34,880 --> 00:01:38,150
And so, you know,
basically as motivation,

33
00:01:38,150 --> 00:01:42,800
you know, as a theoretician,
we might have some great idea

34
00:01:42,800 --> 00:01:44,400
about how the brain works.

35
00:01:44,400 --> 00:01:47,450
And so what we do is we
come up with an experiment

36
00:01:47,450 --> 00:01:48,180
and we run it.

37
00:01:48,180 --> 00:01:50,382
And we record a
bunch of neural data.

38
00:01:50,382 --> 00:01:52,340
And then at the end of
it, what we're left with

39
00:01:52,340 --> 00:01:54,320
is just a bunch of data.

40
00:01:54,320 --> 00:01:56,820
It's not really an
answer to our question.

41
00:01:56,820 --> 00:02:00,012
So for example, if
you recorded spikes,

42
00:02:00,012 --> 00:02:01,970
you might end up with
something called a raster

43
00:02:01,970 --> 00:02:03,510
where you have trials and time.

44
00:02:03,510 --> 00:02:07,040
And you just end up with
little indications of the

45
00:02:07,040 --> 00:02:09,530
times at which the neuron spiked.

46
00:02:09,530 --> 00:02:11,480
Or if you did an
MEG experiment, you

47
00:02:11,480 --> 00:02:14,450
might end up with a bunch
of kind of waveforms

48
00:02:14,450 --> 00:02:16,550
that are kind of noisy.

49
00:02:16,550 --> 00:02:19,740
And so this is a
good first step,

50
00:02:19,740 --> 00:02:21,410
but obviously what
you need to do

51
00:02:21,410 --> 00:02:24,407
is take this and turn it
into some sort of answer

52
00:02:24,407 --> 00:02:25,115
to your question.

53
00:02:27,299 --> 00:02:29,090
Because if you can't
turn it into an answer

54
00:02:29,090 --> 00:02:30,556
to your question,
there is no point

55
00:02:30,556 --> 00:02:32,180
in doing that experiment
to begin with.

56
00:02:34,880 --> 00:02:37,430
So basically, what
I'm looking for is

57
00:02:37,430 --> 00:02:39,560
clear answers to questions.

58
00:02:39,560 --> 00:02:41,760
In particular I'm
interested in two things.

59
00:02:41,760 --> 00:02:43,730
One is neural content.

60
00:02:43,730 --> 00:02:46,610
And that is what information
is in a particular region

61
00:02:46,610 --> 00:02:49,029
of the brain, and at what time.

62
00:02:49,029 --> 00:02:50,570
And the other thing
I'm interested in

63
00:02:50,570 --> 00:02:55,130
is neural coding, or what
features of the neural activity

64
00:02:55,130 --> 00:02:58,980
contain that information.

65
00:02:58,980 --> 00:03:01,550
And so the idea is, basically,
if we can make recordings

66
00:03:01,550 --> 00:03:03,530
from a number of
different brain regions

67
00:03:03,530 --> 00:03:07,580
and tell what content was
in different parts, then

68
00:03:07,580 --> 00:03:10,520
we could, basically,
trace the information flow

69
00:03:10,520 --> 00:03:12,500
through the brain
and try to unravel

70
00:03:12,500 --> 00:03:15,997
the algorithms that enable us
to perform particular tasks.

71
00:03:15,997 --> 00:03:18,080
And then if we can do that,
we can do other things

72
00:03:18,080 --> 00:03:20,660
that the CBMM likes
to do, such as build

73
00:03:20,660 --> 00:03:24,920
helpful robots that will
either bring us drinks

74
00:03:24,920 --> 00:03:25,940
or create peace.

75
00:03:28,580 --> 00:03:30,770
So the outline
for the talk today

76
00:03:30,770 --> 00:03:34,256
is I'm going to talk about what
neural population decoding is.

77
00:03:34,256 --> 00:03:35,630
I'm going to show
you how you can

78
00:03:35,630 --> 00:03:38,960
use it to get at neural
content, so what information

79
00:03:38,960 --> 00:03:40,370
is in brain regions.

80
00:03:40,370 --> 00:03:42,911
Then I'm going to show how you
can use it to answer questions

81
00:03:42,911 --> 00:03:46,070
about neural coding, or how do
neurons contain information.

82
00:03:46,070 --> 00:03:48,930
And then I'm going to show you
a little bit how you can use

83
00:03:48,930 --> 00:03:50,130
it to analyze your own data.

84
00:03:50,130 --> 00:03:53,000
So very briefly, a
toolbox I created

85
00:03:53,000 --> 00:03:54,995
that makes it easy
to do these analyses.

86
00:03:57,970 --> 00:04:02,950
All right, so the basic
idea behind neural decoding

87
00:04:02,950 --> 00:04:04,630
is that what you
want to do is you

88
00:04:04,630 --> 00:04:07,300
want to take neural activity
and try to predict something

89
00:04:07,300 --> 00:04:09,160
about the stimulus
itself or about,

90
00:04:09,160 --> 00:04:11,150
let's say, an animal's behavior.

91
00:04:11,150 --> 00:04:14,560
So it's a function that
goes from neural activity

92
00:04:14,560 --> 00:04:15,430
to a stimulus.

93
00:04:19,430 --> 00:04:22,760
And decoding approaches
have been used for maybe

94
00:04:22,760 --> 00:04:24,130
about 30 years.

95
00:04:24,130 --> 00:04:27,110
So Rebecca was saying
MVPA goes back to 2001.

96
00:04:27,110 --> 00:04:29,160
Well, this goes
back much further.

97
00:04:29,160 --> 00:04:33,530
So in 1986, Georgopoulos
did some studies

98
00:04:33,530 --> 00:04:35,800
with monkeys showing
that he could

99
00:04:35,800 --> 00:04:38,540
decode where a monkey
was moving its arm based

100
00:04:38,540 --> 00:04:40,130
on neural activity.

101
00:04:40,130 --> 00:04:43,340
And there were other
studies in '93

102
00:04:43,340 --> 00:04:46,230
by Matt Wilson and McNaughton.

103
00:04:46,230 --> 00:04:49,040
Matt gave a talk here,
I think, as well.

104
00:04:49,040 --> 00:04:50,510
And what he tried
to do is decode

105
00:04:50,510 --> 00:04:53,180
where a rat is in a maze.

106
00:04:53,180 --> 00:04:55,160
So again, recording
from the hippocampus,

107
00:04:55,160 --> 00:04:57,440
trying to tell
where that rat is.

108
00:04:57,440 --> 00:05:02,120
And there's also been a large
amount of computational work,

109
00:05:02,120 --> 00:05:06,530
such as work by Salinas and
Larry Abbott, kind of comparing

110
00:05:06,530 --> 00:05:09,350
different decoding methods.

111
00:05:09,350 --> 00:05:12,200
But despite all of this work,
it's still not widely used.

112
00:05:12,200 --> 00:05:15,910
So Rebecca was saying that
MVPA has really taken off.

113
00:05:15,910 --> 00:05:17,870
Well, I'm still waiting
for population decoding

114
00:05:17,870 --> 00:05:19,389
in neural activity to take off.

115
00:05:19,389 --> 00:05:20,930
And so part of me
being up here today

116
00:05:20,930 --> 00:05:22,442
is to say you really
should do this.

117
00:05:22,442 --> 00:05:23,150
It's really good.

118
00:05:25,670 --> 00:05:28,160
And just a few other
names for decoding

119
00:05:28,160 --> 00:05:31,250
is MVPA, multivariate
pattern analysis.

120
00:05:31,250 --> 00:05:34,280
This is the terminology that
people in the fMRI community

121
00:05:34,280 --> 00:05:36,170
use and what Rebecca was using.

122
00:05:36,170 --> 00:05:37,781
It's also called readout.

123
00:05:37,781 --> 00:05:39,530
So if you've heard
those terms, it kind of

124
00:05:39,530 --> 00:05:40,571
refers to the same thing.

125
00:05:43,770 --> 00:05:47,090
All right, so let
me show you what

126
00:05:47,090 --> 00:05:51,390
decoding looks like in terms of
an experiment with, let's say,

127
00:05:51,390 --> 00:05:52,910
a monkey.

128
00:05:52,910 --> 00:05:55,700
So here we'd have an experiment
where we're showing the monkey

129
00:05:55,700 --> 00:05:58,220
different images on a screen.

130
00:05:58,220 --> 00:06:00,744
And so for example, we could
show it a picture of a kiwi.

131
00:06:00,744 --> 00:06:02,660
And then we'd be making
some neural recordings

132
00:06:02,660 --> 00:06:04,035
from this monkey,
so we'd get out

133
00:06:04,035 --> 00:06:05,990
a pattern of neural activity.

134
00:06:05,990 --> 00:06:08,120
And what we do in decoding
is we feed that pattern

135
00:06:08,120 --> 00:06:10,460
of neural activity
into a machine learning

136
00:06:10,460 --> 00:06:13,280
algorithm, which we call
pattern classifiers.

137
00:06:13,280 --> 00:06:16,532
Again, you've all
heard a lot about that.

138
00:06:16,532 --> 00:06:17,990
And so what this
algorithm does, is

139
00:06:17,990 --> 00:06:21,200
it learns to make an association
between this particular

140
00:06:21,200 --> 00:06:25,460
stimulus and this particular
pattern of neural activity.

141
00:06:25,460 --> 00:06:27,710
And so then we repeat that
process with another image,

142
00:06:27,710 --> 00:06:30,020
get another pattern of
neural activity out.

143
00:06:30,020 --> 00:06:31,850
Feed that into the classifier.

144
00:06:31,850 --> 00:06:34,190
And again, it learns
that association.

145
00:06:34,190 --> 00:06:37,130
And so we do that for every
single stimulus in our stimulus

146
00:06:37,130 --> 00:06:37,760
set.

147
00:06:37,760 --> 00:06:41,870
And for multiple repetitions
of each stimulus.

148
00:06:41,870 --> 00:06:43,870
So you know, once this
association is learned,

149
00:06:43,870 --> 00:06:46,140
what we do is we
use the classifier

150
00:06:46,140 --> 00:06:47,606
or test the classifier.

151
00:06:47,606 --> 00:06:48,730
Here we show another image.

152
00:06:48,730 --> 00:06:51,220
We get another pattern
of neural activity out.

153
00:06:51,220 --> 00:06:52,854
We feed that into
the classifier.

154
00:06:52,854 --> 00:06:54,520
But this time, instead
of the classifier

155
00:06:54,520 --> 00:06:57,430
learning the association,
it makes a prediction.

156
00:06:57,430 --> 00:07:00,730
And here it predicted the
kiwi, so we'd say it's correct.

157
00:07:00,730 --> 00:07:02,830
And then we can repeat
that with a car,

158
00:07:02,830 --> 00:07:04,802
get another pattern
of activity out.

159
00:07:04,802 --> 00:07:07,219
Feed it to the classifier,
get another prediction.

160
00:07:07,219 --> 00:07:09,010
And this time the
prediction was incorrect.

161
00:07:09,010 --> 00:07:11,350
It predicted a face, but
it was actually a car.

162
00:07:11,350 --> 00:07:14,380
And so what we do is
we just note how often

163
00:07:14,380 --> 00:07:15,820
our predictions are correct.

164
00:07:15,820 --> 00:07:17,920
And we can plot that
as a function of time

165
00:07:17,920 --> 00:07:20,302
and kind of see the
evolution of information

166
00:07:20,302 --> 00:07:21,760
as it flows through
a brain region.

167
00:07:27,750 --> 00:07:30,690
All right, so in reality,
what we usually do is actually

168
00:07:30,690 --> 00:07:32,690
we run the full experiment.

169
00:07:32,690 --> 00:07:35,340
So we actually have collected
all the data beforehand.

170
00:07:35,340 --> 00:07:39,420
And then what we do is we split
it up into different splits.

171
00:07:39,420 --> 00:07:43,080
So here we had, you know,
this experiment, let's say,

172
00:07:43,080 --> 00:07:45,220
was faces and cars or something.

173
00:07:45,220 --> 00:07:47,370
So we have different
splits that have

174
00:07:47,370 --> 00:07:50,340
two repetitions of the
activity of different neurons

175
00:07:50,340 --> 00:07:53,490
to two faces and two cars, and
there are three different splits.

176
00:07:53,490 --> 00:07:56,492
And so what we do is we
take two of the splits

177
00:07:56,492 --> 00:07:58,950
and train the classifier, and
then have the remaining split

178
00:07:58,950 --> 00:08:00,000
and test it.

179
00:08:00,000 --> 00:08:05,940
And we do that for all
permutations of leaving out

180
00:08:05,940 --> 00:08:08,190
a different test split.

181
00:08:08,190 --> 00:08:11,390
So you all heard about
cross-validation before?

182
00:08:11,390 --> 00:08:14,040
OK.
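Roughly, that leave-one-split-out procedure might look like the following Python sketch (illustrative only, with assumed variable names and helper functions, not the actual toolbox code):

    import numpy as np

    # splits: list of (X, y) pairs, one per split, where X is an
    # [n_trials x n_neurons] array of firing rates and y holds the stimulus
    # labels (an assumed data layout for illustration).
    def cross_validated_accuracy(splits, train_fn, predict_fn):
        accuracies = []
        for test_idx in range(len(splits)):
            # train on every split except the held-out one
            train = [s for i, s in enumerate(splits) if i != test_idx]
            X_train = np.vstack([X for X, _ in train])
            y_train = np.concatenate([y for _, y in train])
            model = train_fn(X_train, y_train)
            # test on the remaining split and record the fraction correct
            X_test, y_test = splits[test_idx]
            accuracies.append(np.mean(predict_fn(model, X_test) == y_test))
        return np.mean(accuracies)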

183
00:08:14,040 --> 00:08:16,560
One thing to note about
neural populations

184
00:08:16,560 --> 00:08:18,240
is when you're
doing decoding, you

185
00:08:18,240 --> 00:08:22,050
don't actually need to record
all the neurons simultaneously.

186
00:08:22,050 --> 00:08:24,342
So I think this might be one
reason why a lot of people

187
00:08:24,342 --> 00:08:26,966
haven't jumped on the technique
because they feel like you need

188
00:08:26,966 --> 00:08:28,410
to do these massive recordings.

189
00:08:28,410 --> 00:08:30,660
But you can actually do
something called pseudo

190
00:08:30,660 --> 00:08:33,690
populations, where you build up
a virtual population that you

191
00:08:33,690 --> 00:08:35,940
pretend was recorded
simultaneously but really

192
00:08:35,940 --> 00:08:37,020
wasn't.

193
00:08:37,020 --> 00:08:40,779
So what you do with that is
you just, if on the first day

194
00:08:40,779 --> 00:08:42,570
you recorded one neuron,
and the second day

195
00:08:42,570 --> 00:08:44,891
you recorded the
second neuron, etc.

196
00:08:44,891 --> 00:08:46,890
What you can do is you
can just randomly select,

197
00:08:46,890 --> 00:08:48,390
let's say, one trial
when a kiwi was

198
00:08:48,390 --> 00:08:50,980
shown from the first
day, another trial

199
00:08:50,980 --> 00:08:52,785
from the second day, et cetera.

200
00:08:52,785 --> 00:08:54,270
You randomly pick them.

201
00:08:54,270 --> 00:08:58,721
And then you can just build
up this virtual population.

202
00:08:58,721 --> 00:09:00,720
And you can do that for
a few examples of kiwis,

203
00:09:00,720 --> 00:09:02,130
a few examples of cars.

204
00:09:02,130 --> 00:09:05,690
And then you just train and test
your classifier like normal.
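As a rough sketch of that resampling idea (hypothetical variable names, assuming each neuron's trials are stored separately by stimulus):

    import numpy as np

    rng = np.random.default_rng(0)

    # trials_by_neuron[i][label]: 1-D array of firing rates for neuron i on
    # all trials where stimulus `label` was shown (possibly on different days).
    def make_pseudo_trial(trials_by_neuron, label):
        # draw one randomly chosen trial of this stimulus from each neuron and
        # stack them as if they had been recorded simultaneously
        return np.array([rng.choice(trials_by_neuron[i][label])
                         for i in range(len(trials_by_neuron))])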

205
00:09:05,690 --> 00:09:08,280
But this kind of broadens
the applicability.

206
00:09:08,280 --> 00:09:10,260
And then you can ask
questions about what

207
00:09:10,260 --> 00:09:13,110
is being lost by doing
this process versus if you

208
00:09:13,110 --> 00:09:15,382
had actually done the
simultaneous recordings.

209
00:09:15,382 --> 00:09:17,340
And we'll discuss that
a little bit more later.

210
00:09:21,710 --> 00:09:24,009
So I'll give you an
example of one classifier,

211
00:09:24,009 --> 00:09:25,550
again, I'm sure
you've seen much more

212
00:09:25,550 --> 00:09:27,091
sophisticated and
interesting methods

213
00:09:27,091 --> 00:09:29,390
but I'll show you a
very basic one that I

214
00:09:29,390 --> 00:09:31,482
have used a bit in the past.

215
00:09:31,482 --> 00:09:33,440
It's called the maximum
correlation coefficient

216
00:09:33,440 --> 00:09:34,390
classifier.

217
00:09:34,390 --> 00:09:36,890
It's, again, very similar to
what Rebecca was talking about.

218
00:09:36,890 --> 00:09:39,080
But all you do is--

219
00:09:39,080 --> 00:09:41,040
let's say this is
our training set.

220
00:09:41,040 --> 00:09:46,310
So we have four
vectors for each image,

221
00:09:46,310 --> 00:09:47,814
each thing we want to classify.

222
00:09:47,814 --> 00:09:49,230
And all we're going
to do is we're

223
00:09:49,230 --> 00:09:52,430
going to take the
average across trials

224
00:09:52,430 --> 00:09:56,270
to reduce these four
vectors into a single vector

225
00:09:56,270 --> 00:09:57,790
for each stimulus.

226
00:09:57,790 --> 00:10:01,820
OK, so if we did that we'd
get one kind of prototype

227
00:10:01,820 --> 00:10:03,320
of each of the stimuli.

228
00:10:03,320 --> 00:10:06,050
And then to test the classifier,
all we're going to do

229
00:10:06,050 --> 00:10:07,610
is we're going to
take a test point

230
00:10:07,610 --> 00:10:09,901
and we're going to do the
correlation between this test

231
00:10:09,901 --> 00:10:13,220
point and each of the
kind of prototype vectors.

232
00:10:13,220 --> 00:10:15,830
Whichever one has the
highest correlation,

233
00:10:15,830 --> 00:10:19,910
we're going to say
that's the prediction.

234
00:10:19,910 --> 00:10:21,720
Hopefully pretty simple.
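A minimal Python sketch of that classifier (assuming X is trials by neurons and y holds the stimulus labels; this is an illustration, not the toolbox implementation):

    import numpy as np

    def train_max_correlation(X_train, y_train):
        # one prototype per class: the mean of that class's training vectors
        classes = np.unique(y_train)
        prototypes = np.array([X_train[y_train == c].mean(axis=0)
                               for c in classes])
        return classes, prototypes

    def predict_max_correlation(model, X_test):
        classes, prototypes = model
        preds = []
        for x in X_test:
            # correlate the test point with each prototype; the class whose
            # prototype has the highest correlation is the prediction
            r = [np.corrcoef(x, p)[0, 1] for p in prototypes]
            preds.append(classes[int(np.argmax(r))])
        return np.array(preds)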

235
00:10:25,940 --> 00:10:28,520
The reason we often use
fairly simple classifiers,

236
00:10:28,520 --> 00:10:31,280
such as the maximum correlation
coefficient classifier,

237
00:10:31,280 --> 00:10:32,555
is because--

238
00:10:32,555 --> 00:10:34,490
or at least one
motivation is because it

239
00:10:34,490 --> 00:10:38,240
can be translated into what
information is directly

240
00:10:38,240 --> 00:10:42,650
available to a downstream
population that

241
00:10:42,650 --> 00:10:45,840
is reading the information
in the population you

242
00:10:45,840 --> 00:10:47,540
have recordings from.

243
00:10:47,540 --> 00:10:50,000
So you could actually
view what the classifier

244
00:10:50,000 --> 00:10:53,390
learns as synaptic
weights to a neuron.

245
00:10:53,390 --> 00:10:56,180
You could view the
pattern of activity

246
00:10:56,180 --> 00:10:59,360
you're trying to classify as
the pre-synaptic activity.

247
00:10:59,360 --> 00:11:02,870
And then by doing this dot
product multiplication, perhaps

248
00:11:02,870 --> 00:11:04,760
pass through some
non-linearity, you

249
00:11:04,760 --> 00:11:09,400
can kind of output a
prediction about whether there

250
00:11:09,400 --> 00:11:12,230
is evidence for a particular
stimulus being present.
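In symbols, that readout is something like y = f(sum_i w_i * x_i), where x_i is the activity of presynaptic neuron i, w_i is the weight the classifier learned for that neuron, and f is an optional nonlinearity (a paraphrase of the idea, not a formula from the talk).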

251
00:11:12,230 --> 00:11:14,900
All right, so let's go into
talking about neural content,

252
00:11:14,900 --> 00:11:17,030
or what information
is in a brain region

253
00:11:17,030 --> 00:11:20,910
and how to use
decoding to get at that.

254
00:11:20,910 --> 00:11:23,390
So as motivation, I'm
going to be talking

255
00:11:23,390 --> 00:11:26,210
about a very simple experiment.

256
00:11:26,210 --> 00:11:29,270
Basically, this experiment
involves a monkey

257
00:11:29,270 --> 00:11:32,114
fixating on a point for--

258
00:11:32,114 --> 00:11:33,780
well, through the
duration of the trial.

259
00:11:33,780 --> 00:11:35,840
But first, there's
a blank screen.

260
00:11:35,840 --> 00:11:40,520
And then after 500 milliseconds,
up is going to come a stimulus.

261
00:11:40,520 --> 00:11:42,350
And for this
experiment, there are

262
00:11:42,350 --> 00:11:45,039
going to be 7 different possible
stimuli that are shown here.

263
00:11:45,039 --> 00:11:46,580
And what we're going
to try to decode

264
00:11:46,580 --> 00:11:49,820
is which of these stimuli
was present on one

265
00:11:49,820 --> 00:11:51,470
particular trial.

266
00:11:51,470 --> 00:11:55,400
And we're going to do that
as a function of time.

267
00:11:55,400 --> 00:11:57,320
And the data I'm
going to use comes

268
00:11:57,320 --> 00:11:59,210
from the inferior
temporal cortex.

269
00:11:59,210 --> 00:12:02,690
We're going to look at 132
neuron pseudo populations.

270
00:12:02,690 --> 00:12:07,315
This was data recorded by Ying
Zhang in Bob Desimone's lab.

271
00:12:07,315 --> 00:12:09,440
It's actually part of a
more complicated experiment

272
00:12:09,440 --> 00:12:12,380
but I've just reduced it here to
the simplest kind of bare bones

273
00:12:12,380 --> 00:12:12,880
nature.

274
00:12:15,374 --> 00:12:16,790
So what we're going
to do is we're

275
00:12:16,790 --> 00:12:21,770
going to basically train
the classifier on one time

276
00:12:21,770 --> 00:12:24,590
point with the average
firing rate in some bin.

277
00:12:24,590 --> 00:12:27,200
I think in this case
it's 100 milliseconds.

278
00:12:27,200 --> 00:12:29,270
And then we're going to
test at that time point.

279
00:12:29,270 --> 00:12:31,430
And then I'm going to slide
over by a small amount

280
00:12:31,430 --> 00:12:32,940
and repeat that process.

281
00:12:32,940 --> 00:12:35,690
So each time we are repeating
training and testing

282
00:12:35,690 --> 00:12:36,695
the classifier.

283
00:12:40,580 --> 00:12:42,830
Again, 100 milliseconds
sampled every 10 milliseconds,

284
00:12:42,830 --> 00:12:44,300
or sliding every 10 milliseconds.

285
00:12:44,300 --> 00:12:46,680
And this will give us a flow
of information over time.
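A rough sketch of that sliding-bin loop (assuming a [trials x neurons x milliseconds] spike array and a decode_fn that does the cross-validated decoding at one time bin; names are illustrative):

    import numpy as np

    def sliding_window_decoding(spikes, labels, decode_fn,
                                bin_ms=100, step_ms=10):
        accuracies, centers = [], []
        for start in range(0, spikes.shape[2] - bin_ms + 1, step_ms):
            # mean firing rate in this bin for every trial and neuron
            X = spikes[:, :, start:start + bin_ms].mean(axis=2)
            # train and test the classifier within this one time bin
            accuracies.append(decode_fn(X, labels))
            centers.append(start + bin_ms / 2)
        return np.array(centers), np.array(accuracies)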

286
00:12:46,680 --> 00:12:49,160
So during the baseline
period we should not

287
00:12:49,160 --> 00:12:51,320
be able to decode
what's about to be seen,

288
00:12:51,320 --> 00:12:54,662
unless the monkey is
psychic, in which case

289
00:12:54,662 --> 00:12:56,870
either there is something
wrong with your experiment,

290
00:12:56,870 --> 00:12:57,500
most likely.

291
00:12:57,500 --> 00:13:00,504
Or you should go to Wall
Street with your monkey.

292
00:13:00,504 --> 00:13:02,420
But you know, you shouldn't
get anything here.

293
00:13:02,420 --> 00:13:04,253
And then we should see
some sort of increase

294
00:13:04,253 --> 00:13:06,720
here if there is information.

295
00:13:06,720 --> 00:13:09,500
And this is kind of what it
looks like from the results.

296
00:13:09,500 --> 00:13:11,390
So this is zero.

297
00:13:11,390 --> 00:13:13,190
After here, we should
see information.

298
00:13:13,190 --> 00:13:16,670
This is chance, or 1 over 7.

299
00:13:16,670 --> 00:13:18,560
And so if we try this
decoding experiment,

300
00:13:18,560 --> 00:13:21,050
what we find is
during the baseline,

301
00:13:21,050 --> 00:13:23,390
our monkey is not psychic.

302
00:13:23,390 --> 00:13:26,630
But when we put
on a stimulus, we

303
00:13:26,630 --> 00:13:31,530
can tell what it is pretty
well, like almost perfectly.

304
00:13:31,530 --> 00:13:33,120
Pretty simple.

305
00:13:33,120 --> 00:13:36,000
All right, we can also
do some statistics

306
00:13:36,000 --> 00:13:40,410
to tell you when the decoding
results are above chance by doing

307
00:13:40,410 --> 00:13:43,140
some sort of permutation test
where we shuffle the labels

308
00:13:43,140 --> 00:13:45,510
and try to do the decoding
on shuffled labels where

309
00:13:45,510 --> 00:13:47,940
we should get chance
decoding performance.

310
00:13:47,940 --> 00:13:51,690
And then we can see where is our
real result relative to chance,

311
00:13:51,690 --> 00:13:53,510
and get p values and
things like that.
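A hedged sketch of that shuffle test (decode_fn is assumed to return cross-validated accuracy; the +1 correction is one common convention, not necessarily the one used here):

    import numpy as np

    rng = np.random.default_rng(0)

    def permutation_p_value(X, y, decode_fn, n_shuffles=200):
        real = decode_fn(X, y)
        # null distribution: decoding accuracy when the labels are shuffled
        null = np.array([decode_fn(X, rng.permutation(y))
                         for _ in range(n_shuffles)])
        # p value: how often shuffled accuracy is at least as large as real
        return real, (np.sum(null >= real) + 1) / (n_shuffles + 1)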

312
00:13:56,640 --> 00:13:57,720
It's pretty simple.

313
00:13:57,720 --> 00:14:00,120
How does this stack up
against other methods

314
00:14:00,120 --> 00:14:02,200
that people commonly use?

315
00:14:02,200 --> 00:14:06,150
So here's our decoding
result. Here's another method.

316
00:14:06,150 --> 00:14:09,270
Here I'm applying an
ANOVA to each neuron

317
00:14:09,270 --> 00:14:12,000
individually and counting
the number of neurons that

318
00:14:12,000 --> 00:14:16,620
are deemed to be selective.

319
00:14:16,620 --> 00:14:18,954
And so what you see is that
there's basically no neurons

320
00:14:18,954 --> 00:14:19,911
in the baseline period.

321
00:14:19,911 --> 00:14:21,280
And then we have a huge number.

322
00:14:21,280 --> 00:14:25,200
OK, so it looks
pretty much identical.

323
00:14:25,200 --> 00:14:28,380
We can compute mutual
information on each neuron

324
00:14:28,380 --> 00:14:31,500
and then average that together
over a whole bunch of neurons.

325
00:14:31,500 --> 00:14:33,780
Again, looks pretty simple.

326
00:14:33,780 --> 00:14:35,490
Or similar, I should say.

327
00:14:35,490 --> 00:14:39,085
Or we can compute a
selectivity index.

328
00:14:39,085 --> 00:14:41,460
Take the best stimulus, subtract
the worst stimulus,

329
00:14:41,460 --> 00:14:42,640
divide by the sum.
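In symbols, that index is roughly SI = (r_best - r_worst) / (r_best + r_worst), where r_best and r_worst are a neuron's mean firing rates to its most and least effective stimuli (a standard form of the index; the exact definition used here isn't spelled out in the talk).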

330
00:14:42,640 --> 00:14:43,770
Again, looks similar.

331
00:14:43,770 --> 00:14:47,560
So there's two
takeaway messages here.

332
00:14:47,560 --> 00:14:51,120
First of all, why do decoding if
all the other methods work just

333
00:14:51,120 --> 00:14:52,330
as well?

334
00:14:52,330 --> 00:14:54,990
And I'll show you in a
bit, they don't always.

335
00:14:54,990 --> 00:14:57,150
And then the other
take away message

336
00:14:57,150 --> 00:14:59,684
though is as a reassurance,
it is giving you

337
00:14:59,684 --> 00:15:00,600
the same thing, right?

338
00:15:00,600 --> 00:15:02,520
So you know we're
not completely crazy.

339
00:15:02,520 --> 00:15:04,645
It's a sensible thing to
do in the most basic case.

340
00:15:06,859 --> 00:15:08,400
One other thing
decoding can give you

341
00:15:08,400 --> 00:15:11,640
that these other methods can't
is something called a confusion

342
00:15:11,640 --> 00:15:13,090
matrix.

343
00:15:13,090 --> 00:15:16,020
So a confusion matrix,
Rebecca kind of talked

344
00:15:16,020 --> 00:15:18,450
a little bit about
related concepts,

345
00:15:18,450 --> 00:15:22,050
basically what you have is you
have the true classes here.

346
00:15:22,050 --> 00:15:25,380
So this is what was actually
shown on each trial.

347
00:15:25,380 --> 00:15:28,830
And this is what your
classifier predicted.

348
00:15:28,830 --> 00:15:30,930
So the diagonal elements
mean correct predictions.

349
00:15:30,930 --> 00:15:34,320
There actually was a car
shown and you predicted a car.

350
00:15:34,320 --> 00:15:36,780
But you can look at the
off diagonal elements

351
00:15:36,780 --> 00:15:41,340
and you can see what was
commonly made as a mistake.

352
00:15:41,340 --> 00:15:43,440
And this can tell you,
oh, these two stimuli

353
00:15:43,440 --> 00:15:47,280
are represented in a similar
way in a brain region, where

354
00:15:47,280 --> 00:15:48,476
the mistakes are happening.
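As a small illustrative sketch of building that matrix (assumed label arrays, not tied to any particular dataset):

    import numpy as np

    def confusion_matrix(y_true, y_pred, classes):
        # rows: what was actually shown; columns: what the classifier predicted
        cm = np.zeros((len(classes), len(classes)), dtype=int)
        index = {c: i for i, c in enumerate(classes)}
        for t, p in zip(y_true, y_pred):
            cm[index[t], index[p]] += 1
        return cm  # diagonal = correct predictions, off-diagonal = confusions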

355
00:15:53,310 --> 00:15:56,700
So another kind of
methods issue is,

356
00:15:56,700 --> 00:16:00,599
what is the effect of using
different classifiers?

357
00:16:00,599 --> 00:16:02,890
If the method is highly
dependent on the classifier you

358
00:16:02,890 --> 00:16:06,240
use, then that's
not a good thing

359
00:16:06,240 --> 00:16:08,540
because you're not
learning

360
00:16:08,540 --> 00:16:10,290
anything about the
data, but you're really

361
00:16:10,290 --> 00:16:11,873
learning something
about the method

362
00:16:11,873 --> 00:16:13,510
you use to extract that data.

363
00:16:13,510 --> 00:16:17,160
But in general, for at least
simple decoding questions,

364
00:16:17,160 --> 00:16:20,434
it's pretty robust to the choice
of classifier you would use.

365
00:16:20,434 --> 00:16:22,350
So here is the maximum
correlation coefficient

366
00:16:22,350 --> 00:16:24,290
classifier I told you about.

367
00:16:24,290 --> 00:16:25,740
Here's a support vector machine.

368
00:16:25,740 --> 00:16:28,350
You can see like almost
everything looks similar.

369
00:16:28,350 --> 00:16:30,930
And like when there's
something not working as well,

370
00:16:30,930 --> 00:16:33,300
it's generally a
slight downward shift.

371
00:16:33,300 --> 00:16:35,610
So you get the same
kind of estimation

372
00:16:35,610 --> 00:16:37,860
of how much information is
in a brain region flowing

373
00:16:37,860 --> 00:16:39,330
as a function of time.

374
00:16:39,330 --> 00:16:43,136
But maybe your absolute accuracy
is just a little bit lower

375
00:16:43,136 --> 00:16:44,760
if you're not using
the optimal method.

376
00:16:44,760 --> 00:16:47,301
But really, it seems like we're
assessing what is in the data

377
00:16:47,301 --> 00:16:50,250
and not so much
about the algorithm.

378
00:16:50,250 --> 00:16:51,960
So that was decoding
basic information

379
00:16:51,960 --> 00:16:54,480
in terms of content.

380
00:16:54,480 --> 00:16:56,760
But I think one of the most
powerful things decoding

381
00:16:56,760 --> 00:16:59,520
can do is it can
decode what I call

382
00:16:59,520 --> 00:17:02,094
abstract or
invariant information

383
00:17:02,094 --> 00:17:04,510
where you can get an assessment
of whether that's present.

384
00:17:04,510 --> 00:17:06,190
So what does that mean?

385
00:17:06,190 --> 00:17:09,810
Well, basically you can think of
something like the word hello.

386
00:17:09,810 --> 00:17:11,760
It has many different
pronunciations

387
00:17:11,760 --> 00:17:13,030
in different languages.

388
00:17:13,030 --> 00:17:14,821
But if you speak these
different languages,

389
00:17:14,821 --> 00:17:17,010
you can kind of
translate that word

390
00:17:17,010 --> 00:17:19,230
into some sort of meaning
that it's a greeting.

391
00:17:19,230 --> 00:17:21,460
And you know how to
respond appropriately.

392
00:17:21,460 --> 00:17:23,230
So that's kind of a
form of abstraction.

393
00:17:23,230 --> 00:17:26,250
It's going from very
different sounds

394
00:17:26,250 --> 00:17:28,496
into some sort of
abstract representation

395
00:17:28,496 --> 00:17:30,870
where I know how to respond
appropriately by saying hello

396
00:17:30,870 --> 00:17:33,330
back in that language.

397
00:17:33,330 --> 00:17:37,260
Or another example of this kind
of abstraction or invariance

398
00:17:37,260 --> 00:17:41,050
is the invariance of
the pose of a head.

399
00:17:41,050 --> 00:17:43,890
So for example, here is a bunch
of pictures of Hillary Clinton.

400
00:17:43,890 --> 00:17:46,590
You can see her head is
at very different angles.

401
00:17:46,590 --> 00:17:49,170
But we can still tell
it's Hillary Clinton.

402
00:17:49,170 --> 00:17:51,570
So we have some sort of
representation of Hillary

403
00:17:51,570 --> 00:17:54,270
that's abstracted from the
exact pose of her head,

404
00:17:54,270 --> 00:17:56,550
and also abstracted from
the color of her pantsuit.

405
00:17:56,550 --> 00:18:00,120
It's very highly
abstract, right?

406
00:18:00,120 --> 00:18:03,780
So that's pretty powerful to
know how the brain is dropping

407
00:18:03,780 --> 00:18:06,560
information in order to build
up these representations that

408
00:18:06,560 --> 00:18:08,220
are useful for behavior.

409
00:18:08,220 --> 00:18:09,720
And I think if we
were, again, going

410
00:18:09,720 --> 00:18:11,790
to build intelligent
robotic system,

411
00:18:11,790 --> 00:18:14,100
we'd want to build it
to have representations

412
00:18:14,100 --> 00:18:20,340
that have become more abstract
so it can perform correctly.

413
00:18:20,340 --> 00:18:22,590
So let's show you
the example of how

414
00:18:22,590 --> 00:18:28,220
we can assess abstract
representations in neural data.

415
00:18:28,220 --> 00:18:31,420
What I'm going to look at
is position invariance.

416
00:18:31,420 --> 00:18:33,720
So this is similar
to a study that

417
00:18:33,720 --> 00:18:36,571
was done in 2005 by Hung
and Kreiman in Science.

418
00:18:36,571 --> 00:18:38,070
And what I'm going
to do here is I'm

419
00:18:38,070 --> 00:18:42,730
going to train the classifier
with data at an upper location.

420
00:18:42,730 --> 00:18:44,790
So in this experiment,
the stimuli

421
00:18:44,790 --> 00:18:47,230
were shown at three
different locations.

422
00:18:47,230 --> 00:18:49,170
So on any given
trial, one stimulus

423
00:18:49,170 --> 00:18:50,701
was shown at one location.

424
00:18:50,701 --> 00:18:52,200
And these three
locations were used,

425
00:18:52,200 --> 00:18:55,410
so the 7 objects were all
shown at the upper location,

426
00:18:55,410 --> 00:18:57,290
or at the middle, at the lower.

427
00:18:57,290 --> 00:18:58,790
And here I'm training
the classifier

428
00:18:58,790 --> 00:19:01,130
using just the trials
when the stimuli were

429
00:19:01,130 --> 00:19:02,880
shown in the upper location.

430
00:19:02,880 --> 00:19:05,430
And then what we can do is we
can then test the classifier

431
00:19:05,430 --> 00:19:07,260
on those trials where
the stimuli were just

432
00:19:07,260 --> 00:19:09,080
shown at the lower location.

433
00:19:09,080 --> 00:19:11,350
And we can see, if we train
at the upper location,

434
00:19:11,350 --> 00:19:13,830
does it generalize to
the lower location.

435
00:19:13,830 --> 00:19:16,530
And if it does, it means there
is a representation that's

436
00:19:16,530 --> 00:19:18,580
invariant to position.
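Schematically, that train-at-one-location, test-at-another analysis might look like this (illustrative Python with assumed variable names):

    import numpy as np

    # X_upper, X_lower: [n_trials x n_neurons] responses to the objects shown
    # at the upper and lower locations; y_upper, y_lower: the object labels.
    def cross_position_accuracy(X_upper, y_upper, X_lower, y_lower,
                                train_fn, predict_fn):
        model = train_fn(X_upper, y_upper)   # train at the upper location
        preds = predict_fn(model, X_lower)   # test at the lower location
        return np.mean(preds == y_lower)     # above chance suggests invariance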

437
00:19:18,580 --> 00:19:22,320
Does that make
sense to everyone?

438
00:19:22,320 --> 00:19:23,910
So let's take a
look at the results

439
00:19:23,910 --> 00:19:27,810
for training at the upper
and testing at the lower.

440
00:19:27,810 --> 00:19:29,597
They're down here.

441
00:19:29,597 --> 00:19:31,680
So here again, I'm training
at the upper location.

442
00:19:31,680 --> 00:19:33,763
And this is the results
from testing at the lower.

443
00:19:33,763 --> 00:19:34,470
Here is chance.

444
00:19:34,470 --> 00:19:37,600
And you can see we're well
above chance in the decoding.

445
00:19:37,600 --> 00:19:40,680
So it's generalizing from the
upper location to the lower.

446
00:19:40,680 --> 00:19:45,150
We can also train at the upper
and test at the same upper location,

447
00:19:45,150 --> 00:19:47,040
or at the middle location.

448
00:19:47,040 --> 00:19:48,894
And what we find is
this pattern of results.

449
00:19:48,894 --> 00:19:51,060
So we're getting best results
when we train and test

450
00:19:51,060 --> 00:19:52,860
at exactly the same position.

451
00:19:52,860 --> 00:19:55,440
But we can see it does
generalize to other positions

452
00:19:55,440 --> 00:19:57,770
as well.

453
00:19:57,770 --> 00:20:00,692
And so we can do this full
permutations of things.

454
00:20:00,692 --> 00:20:02,150
So here we trained
at the upper, we

455
00:20:02,150 --> 00:20:05,282
could also train at the middle,
or train at the lower location.

456
00:20:05,282 --> 00:20:06,740
And here if we
train at the middle,

457
00:20:06,740 --> 00:20:08,720
we get the best decoding
performance when

458
00:20:08,720 --> 00:20:10,040
we decode at that same middle.

459
00:20:10,040 --> 00:20:12,680
But again, it's generalizing to
the upper and lower locations,

460
00:20:12,680 --> 00:20:14,250
and the same for
training at lower.

461
00:20:14,250 --> 00:20:15,875
Get the best performance
testing lower,

462
00:20:15,875 --> 00:20:18,350
but it again generalizes.

463
00:20:18,350 --> 00:20:22,340
So if you want to just conclude
this one mini study here,

464
00:20:22,340 --> 00:20:24,620
you know, information in
IT is position invariant

465
00:20:24,620 --> 00:20:26,150
but not you know 100%.

466
00:20:30,642 --> 00:20:31,850
So we can use this technique.

467
00:20:31,850 --> 00:20:33,050
I'll show you a
few other examples

468
00:20:33,050 --> 00:20:35,360
of how it can be used in
slightly more powerful ways,

469
00:20:35,360 --> 00:20:38,760
maybe, or to answer slightly
more interesting questions.

470
00:20:38,760 --> 00:20:42,520
So what another question
we might want to ask,

471
00:20:42,520 --> 00:20:46,280
actually we did ask in this
paper that just came out,

472
00:20:46,280 --> 00:20:50,150
was about the question
of pose invariant

473
00:20:50,150 --> 00:20:51,950
identity information,
so that same question

474
00:20:51,950 --> 00:20:55,820
about can a brain region
respond to Hillary Clinton

475
00:20:55,820 --> 00:20:58,550
regardless of where
she's looking.

476
00:20:58,550 --> 00:21:01,795
And so this is data recorded
by Winrich Freiwald and Doris

477
00:21:01,795 --> 00:21:02,295
Tsao.

478
00:21:02,295 --> 00:21:05,460
Winrich probably already
talked about this experiment.

479
00:21:05,460 --> 00:21:08,390
But what they did was they
had the face system here

480
00:21:08,390 --> 00:21:10,010
where they found
these little patches

481
00:21:10,010 --> 00:21:13,620
through fMRI that respond more
to faces than other stimuli.

482
00:21:13,620 --> 00:21:16,070
They went in and they
recorded from these patches.

483
00:21:16,070 --> 00:21:19,740
And in this study that we're
going to look at, they did a--

484
00:21:19,740 --> 00:21:23,840
they used these stimuli that
had 25 different individuals

485
00:21:23,840 --> 00:21:26,151
shown at eight different
head orientations.

486
00:21:26,151 --> 00:21:28,400
So this is Doris at eight
different head orientations,

487
00:21:28,400 --> 00:21:33,014
but there were 24 other
people who also were shown.

488
00:21:33,014 --> 00:21:34,430
And so what I'm
going to try to do

489
00:21:34,430 --> 00:21:37,730
is decode between the 25
different people and see,

490
00:21:37,730 --> 00:21:40,790
can it generalize if I
train at one orientation

491
00:21:40,790 --> 00:21:43,290
and test at a different one.

492
00:21:43,290 --> 00:21:46,010
And the three brain
regions we're going to use

493
00:21:46,010 --> 00:21:47,790
is the most posterior region.

494
00:21:47,790 --> 00:21:50,340
So in this case, the eyes
out here, this is like V1.

495
00:21:50,340 --> 00:21:51,960
This is the ventral pathway.

496
00:21:51,960 --> 00:21:55,200
So the most posterior region,
we can combine ML and MF.

497
00:21:55,200 --> 00:21:58,820
We compare that to AL and to AM.

498
00:21:58,820 --> 00:22:01,374
I'm going to see how much
pose invariance there is.

499
00:22:01,374 --> 00:22:03,290
So again, like I said,
let's start by training

500
00:22:03,290 --> 00:22:05,540
on the left profile
and then we can

501
00:22:05,540 --> 00:22:08,000
test on the left profile
in different trials.

502
00:22:08,000 --> 00:22:11,690
Or we can test on a
different set of images

503
00:22:11,690 --> 00:22:15,620
where the individuals were
looking straight forward.

504
00:22:15,620 --> 00:22:19,520
So here are the results from the
most posterior region, ML/MF.

505
00:22:19,520 --> 00:22:22,100
What we see is if we
train in the left profile

506
00:22:22,100 --> 00:22:24,440
and test on the
left profile here,

507
00:22:24,440 --> 00:22:27,140
we're getting results that
are above chance, as indicated

508
00:22:27,140 --> 00:22:30,530
by the lighter blue trace.

509
00:22:30,530 --> 00:22:33,050
But if we train on the
left profile and test

510
00:22:33,050 --> 00:22:34,670
in the straight
results, we're getting

511
00:22:34,670 --> 00:22:37,790
results that are at chance.

512
00:22:37,790 --> 00:22:42,587
So this patch here is not
showing very much pose

513
00:22:42,587 --> 00:22:43,540
invariance.

514
00:22:43,540 --> 00:22:45,540
So let's take a look at
the rest of the results.

515
00:22:45,540 --> 00:22:47,270
So this is ML/MF.

516
00:22:47,270 --> 00:22:49,779
If we look at AL,
what we see is,

517
00:22:49,779 --> 00:22:52,070
again, there's a big advantage
for training and testing

518
00:22:52,070 --> 00:22:53,390
at that same orientation.

519
00:22:53,390 --> 00:22:55,010
But now we're seeing
generalization

520
00:22:55,010 --> 00:22:56,870
to the other orientations.

521
00:22:56,870 --> 00:22:59,390
You're also seeing this "U"
pattern where you're actually

522
00:22:59,390 --> 00:23:01,550
generalizing better
from one profile

523
00:23:01,550 --> 00:23:03,920
to the opposite profile,
which was reported in some

524
00:23:03,920 --> 00:23:06,710
of their earlier papers.

525
00:23:06,710 --> 00:23:08,925
But yeah, here you're
seeing, statistically,

526
00:23:08,925 --> 00:23:09,800
that is above chance.

527
00:23:09,800 --> 00:23:12,622
Now it's not huge,
but it's above what

528
00:23:12,622 --> 00:23:13,580
you'd expect by chance.

529
00:23:13,580 --> 00:23:16,580
And if we look at
AM as well, we're

530
00:23:16,580 --> 00:23:19,640
seeing a higher degree
of invariance, again,

531
00:23:19,640 --> 00:23:23,390
a slight advantage to the exact
pose, but still pretty good.

532
00:23:23,390 --> 00:23:25,070
Again, this "U" a
little bit but yeah,

533
00:23:25,070 --> 00:23:26,300
we're going to the
back of the head.

534
00:23:26,300 --> 00:23:27,390
So what would that
tell you, the fact

535
00:23:27,390 --> 00:23:29,473
that it's going to the
back of the head, tells you

536
00:23:29,473 --> 00:23:31,510
it's probably representing
something about hair.

537
00:23:31,510 --> 00:23:32,900
What I'm going to do next,
rather than just training

538
00:23:32,900 --> 00:23:35,990
at the left profile, I'm going
to take the results of training

539
00:23:35,990 --> 00:23:40,070
at each of the profiles and
either testing at the same

540
00:23:40,070 --> 00:23:41,900
or testing at a
different profile.

541
00:23:41,900 --> 00:23:45,080
And then I'm going to plot
it as a function of time.

542
00:23:45,080 --> 00:23:49,070
So here are the results
of training and testing

543
00:23:49,070 --> 00:23:50,850
at the same pose.

544
00:23:50,850 --> 00:23:52,310
So the non-invariant case.

545
00:23:52,310 --> 00:23:54,650
This is ML/MF.

546
00:23:54,650 --> 00:23:56,135
And this AL and AM.

547
00:23:56,135 --> 00:23:59,420
So this is going from the
back of the head anterior.

548
00:23:59,420 --> 00:24:00,890
And what you see
is there is a kind

549
00:24:00,890 --> 00:24:05,430
of an increase in this
pose-specific information.

550
00:24:05,430 --> 00:24:07,850
Here the increase
is fairly small.

551
00:24:07,850 --> 00:24:10,100
But there is just
generally more information

552
00:24:10,100 --> 00:24:11,120
as you're going down.

553
00:24:11,120 --> 00:24:14,120
But the big increase is
really in this pose invariant

554
00:24:14,120 --> 00:24:14,756
information.

555
00:24:14,756 --> 00:24:16,880
When you train at one
location and test at another,

556
00:24:16,880 --> 00:24:18,260
that's these red traces here.

557
00:24:18,260 --> 00:24:21,740
And here you can see it's
really accelerating a lot.

558
00:24:21,740 --> 00:24:25,250
It's really that these
areas downstream are maybe

559
00:24:25,250 --> 00:24:27,380
pooling over the different
poses to create a pose

560
00:24:27,380 --> 00:24:30,770
invariant representation.

561
00:24:30,770 --> 00:24:35,240
So to carry on with this more
general concept of testing

562
00:24:35,240 --> 00:24:37,494
invariant representations
or abstract representations,

563
00:24:37,494 --> 00:24:39,410
let me just give you one
more example of that.

564
00:24:39,410 --> 00:24:41,460
Here was one of my
earlier studies.

565
00:24:41,460 --> 00:24:47,520
What I did was this study was
looking at categorization.

566
00:24:47,520 --> 00:24:49,460
It was a study done
in Earl Miller's lab.

567
00:24:49,460 --> 00:24:50,969
David Freedman
collected the data.

568
00:24:50,969 --> 00:24:52,760
And what they did was
they trained a monkey

569
00:24:52,760 --> 00:24:55,670
to group a bunch of images
together and called them cats.

570
00:24:55,670 --> 00:24:58,260
And then to group a number
of images together and call

571
00:24:58,260 --> 00:24:59,730
them dogs.

572
00:24:59,730 --> 00:25:01,620
It wasn't clear that
the images necessarily

573
00:25:01,620 --> 00:25:03,661
were more similar to each
other within a category

574
00:25:03,661 --> 00:25:05,070
versus out of the category.

575
00:25:05,070 --> 00:25:07,080
But through this
training, the monkeys

576
00:25:07,080 --> 00:25:11,170
could quite well group the
images together in a delayed

577
00:25:11,170 --> 00:25:12,900
match to sample task.

578
00:25:12,900 --> 00:25:14,790
And so what I
wanted to know was,

579
00:25:14,790 --> 00:25:17,250
is there information that is
kind of about the animal's

580
00:25:17,250 --> 00:25:21,630
category that is abstracted
away from the low level

581
00:25:21,630 --> 00:25:23,310
of visual features.

582
00:25:23,310 --> 00:25:25,440
OK, so was this
learning process,

583
00:25:25,440 --> 00:25:27,570
did they build neural
representations that

584
00:25:27,570 --> 00:25:29,940
are more similar to each other?

585
00:25:29,940 --> 00:25:34,770
So what I did here was I
trained the classifier on two

586
00:25:34,770 --> 00:25:36,720
of the prototype images.

587
00:25:36,720 --> 00:25:40,100
And then I tested it on
a left out prototype.

588
00:25:40,100 --> 00:25:42,450
And so if it's making
correct predictions here,

589
00:25:42,450 --> 00:25:45,180
then it is generalizing
to something

590
00:25:45,180 --> 00:25:47,930
that would only be available
in the data if the monkey had--

591
00:25:47,930 --> 00:25:51,750
due to the monkey's training.

592
00:25:51,750 --> 00:25:55,750
Modulo any low-level confounds.

593
00:25:55,750 --> 00:25:59,280
And so here is decoding of
this abstract or invariant

594
00:25:59,280 --> 00:26:00,649
information from the two areas.

595
00:26:00,649 --> 00:26:02,190
And what you see,
indeed, there seems

596
00:26:02,190 --> 00:26:04,710
to be this kind of
grouping effect, where

597
00:26:04,710 --> 00:26:08,190
the category is represented
both in IT and PFC

598
00:26:08,190 --> 00:26:10,517
in this abstract way.

599
00:26:10,517 --> 00:26:12,600
So the same method can be
used to assess learning.

600
00:26:16,190 --> 00:26:19,370
So just to summarize
the neural content part,

601
00:26:19,370 --> 00:26:22,247
decoding offers a way to clearly
see what information is there

602
00:26:22,247 --> 00:26:24,080
and how it is flowing
through a brain region

603
00:26:24,080 --> 00:26:27,170
as a function of time.

604
00:26:27,170 --> 00:26:29,760
We can assess basic
information and often it

605
00:26:29,760 --> 00:26:32,090
yields similar results
to other methods.

606
00:26:32,090 --> 00:26:34,990
But we can also do
things like assess

607
00:26:34,990 --> 00:26:36,830
abstract or invariant
information, which

608
00:26:36,830 --> 00:26:38,570
is not really possible
with other methods,

609
00:26:38,570 --> 00:26:41,180
at least as far as I can see.

610
00:26:44,330 --> 00:26:48,800
So for neural coding, my
motivation is the game poker.

611
00:26:48,800 --> 00:26:50,079
This one study I did.

612
00:26:50,079 --> 00:26:52,370
Basically, when I moved to
Boston I learned how to play

613
00:26:52,370 --> 00:26:54,200
Texas Hold'em.

614
00:26:54,200 --> 00:26:56,917
It's a card game where, you
know-- it's a variant of poker,

615
00:26:56,917 --> 00:26:59,000
I'm sure most of you know,
I didn't know the rules

616
00:26:59,000 --> 00:27:01,386
before but I learned the rules.

617
00:27:01,386 --> 00:27:03,260
And I could play the
game pretty successfully

618
00:27:03,260 --> 00:27:05,480
in terms of at least applying
those rules correctly,

619
00:27:05,480 --> 00:27:07,610
not necessarily in
terms of winning money.

620
00:27:07,610 --> 00:27:09,530
But I knew what to do.

621
00:27:09,530 --> 00:27:11,480
And prior to that, I
had known other games

622
00:27:11,480 --> 00:27:14,390
like Go Fish, or
War, or whatever.

623
00:27:14,390 --> 00:27:15,860
And me learning
how to play poker

624
00:27:15,860 --> 00:27:19,070
did not disrupt my
ability to play Go Fish.

625
00:27:19,070 --> 00:27:21,290
I was still bad at that as well.

626
00:27:21,290 --> 00:27:26,180
So somehow this information that
allowed me to play this game

627
00:27:26,180 --> 00:27:28,610
had to be added
into my brain if we

628
00:27:28,610 --> 00:27:30,929
believe brains cause behavior.

629
00:27:30,929 --> 00:27:33,470
And so in this study, we're kind
of getting at that question,

630
00:27:33,470 --> 00:27:37,490
what changed about a brain to
allow it to perform a new task?

631
00:27:40,740 --> 00:27:45,230
And so to do this in an
experiment with monkeys,

632
00:27:45,230 --> 00:27:46,730
basically, they
used a paradigm that

633
00:27:46,730 --> 00:27:49,084
had two different phases to it.

634
00:27:49,084 --> 00:27:51,500
The first phase, what they
did, was they had a monkey just

635
00:27:51,500 --> 00:27:54,050
do a passive fixation task.

636
00:27:54,050 --> 00:27:56,300
So what the monkey
did was, there

637
00:27:56,300 --> 00:27:58,700
would be a fixation
dot that came up.

638
00:27:58,700 --> 00:28:00,560
Up would come a stimulus.

639
00:28:00,560 --> 00:28:01,790
There would be a delay.

640
00:28:01,790 --> 00:28:03,560
There would be a
second stimulus.

641
00:28:03,560 --> 00:28:05,570
And there would
be a second delay.

642
00:28:05,570 --> 00:28:07,047
And then there
would be a reward.

643
00:28:07,047 --> 00:28:09,380
And the reward was given just
for the monkey maintaining

644
00:28:09,380 --> 00:28:09,994
fixation.

645
00:28:09,994 --> 00:28:11,660
The monkey did not
need to pay attention

646
00:28:11,660 --> 00:28:14,120
to what the stimuli were at all.

647
00:28:14,120 --> 00:28:16,630
And on some trials the
stimuli were the same.

648
00:28:16,630 --> 00:28:18,780
On other trials,
they were different.

649
00:28:18,780 --> 00:28:21,736
But the monkey did not
need to care about that.

650
00:28:21,736 --> 00:28:23,110
So the monkey does
this passive task.

651
00:28:23,110 --> 00:28:26,690
They record like
over 750 neurons

652
00:28:26,690 --> 00:28:29,287
from the prefrontal cortex.

653
00:28:29,287 --> 00:28:31,370
And then what they did was
they trained the monkey

654
00:28:31,370 --> 00:28:34,220
to do a delayed
match to sample task.

655
00:28:34,220 --> 00:28:37,290
And the delayed match to
sample task ran very similarly.

656
00:28:37,290 --> 00:28:39,500
So it had a fixation.

657
00:28:39,500 --> 00:28:40,790
There was a first stimulus.

658
00:28:40,790 --> 00:28:44,297
There was a delay, a second
stimulus, a second delay.

659
00:28:44,297 --> 00:28:46,130
So up to this point,
the sequence of stimuli

660
00:28:46,130 --> 00:28:48,570
was exactly the same.

661
00:28:48,570 --> 00:28:51,080
But now after the
second delay, up came

662
00:28:51,080 --> 00:28:55,340
a choice target, a choice
image, and the monkey

663
00:28:55,340 --> 00:28:57,860
needed to make a saccade
to the green stimulus

664
00:28:57,860 --> 00:29:01,279
if these two stimuli
were matches.

665
00:29:01,279 --> 00:29:03,320
And needed to make a
saccade to the blue stimulus

666
00:29:03,320 --> 00:29:05,684
if they were different.

667
00:29:05,684 --> 00:29:07,850
And so what we wanted to
know was when the monkey is

668
00:29:07,850 --> 00:29:10,340
performing this task, it
needs to remember the stimuli

669
00:29:10,340 --> 00:29:12,000
and whether they
were matched or not,

670
00:29:12,000 --> 00:29:15,740
is there a change in
the monkey's brain.

671
00:29:15,740 --> 00:29:17,510
And so the way we're
going to get at this

672
00:29:17,510 --> 00:29:21,405
is, not surprisingly,
doing a decoding approach.

673
00:29:21,405 --> 00:29:23,780
And what we do is we're going
to use the same thing where

674
00:29:23,780 --> 00:29:25,760
we train to classify
at one point in time,

675
00:29:25,760 --> 00:29:28,070
test, and move on.

676
00:29:28,070 --> 00:29:31,190
And what we should
find is that we're

677
00:29:31,190 --> 00:29:33,200
going to try to decode
whether two stimuli

678
00:29:33,200 --> 00:29:34,280
matched or did not match.

679
00:29:34,280 --> 00:29:37,100
And so at the time when the
second stimulus was shown,

680
00:29:37,100 --> 00:29:39,020
we should have some sort
of information about

681
00:29:39,020 --> 00:29:40,478
whether it was a
match or non-match

682
00:29:40,478 --> 00:29:42,070
if any information is present.

683
00:29:42,070 --> 00:29:43,820
And we can see, was
that information there

684
00:29:43,820 --> 00:29:46,202
before when the monkey was
just passively fixating,

685
00:29:46,202 --> 00:29:48,410
or does that information
come on only after training.

686
00:29:51,530 --> 00:29:55,400
So here is a schematic of
the results for decoding.

687
00:29:55,400 --> 00:29:57,220
It's a binary task,
whether a trial

688
00:29:57,220 --> 00:29:58,700
was a match or a non-match.

689
00:29:58,700 --> 00:30:01,970
So chance is 50% if
you were guessing.

690
00:30:01,970 --> 00:30:04,220
This light gray shaded
region is the time

691
00:30:04,220 --> 00:30:05,780
when the first stimulus came on.

692
00:30:05,780 --> 00:30:09,440
This second region is the time
the second stimulus came on.

693
00:30:09,440 --> 00:30:11,840
And here is where we're
kind of going to ignore,

694
00:30:11,840 --> 00:30:14,060
this was either the
monkey was making a choice

695
00:30:14,060 --> 00:30:15,450
or got a juice reward.

696
00:30:15,450 --> 00:30:17,420
We just ignore that.

697
00:30:17,420 --> 00:30:19,400
So let's make this interactive.

698
00:30:19,400 --> 00:30:21,257
How many people thought
there was-- or think

699
00:30:21,257 --> 00:30:23,840
there might be information about
whether the two stimuli match

700
00:30:23,840 --> 00:30:27,740
or do not match prior to the
monkey doing the task, so

701
00:30:27,740 --> 00:30:30,840
just in the passive fixation task?

702
00:30:30,840 --> 00:30:33,530
Two, three, four, five--

703
00:30:33,530 --> 00:30:36,440
how many people
think there was not?

704
00:30:36,440 --> 00:30:38,610
OK, I'd say it's
about a 50/50 split.

705
00:30:38,610 --> 00:30:42,090
OK, so let's look at the
passive fixation task.

706
00:30:42,090 --> 00:30:45,300
And what we find is that there
really wasn't any information.

707
00:30:45,300 --> 00:30:47,580
So there's no blue
bar down here.

708
00:30:47,580 --> 00:30:50,270
So as far as the
decoding could tell,

709
00:30:50,270 --> 00:30:52,610
I cannot tell whether the two
stimuli match or not match

710
00:30:52,610 --> 00:30:55,190
in the passive fixation.

711
00:30:55,190 --> 00:30:58,340
What about in the active
delayed match to sample task,

712
00:30:58,340 --> 00:31:00,860
how many people think--

713
00:31:00,860 --> 00:31:03,511
it would be a pretty boring
talk if there wasn't.

714
00:31:03,511 --> 00:31:04,010
What area?

715
00:31:04,010 --> 00:31:06,920
We're talking about
dorsolateral--

716
00:31:06,920 --> 00:31:10,780
actually, both dorsolateral and
ventrolateral prefrontal cortex.

717
00:31:17,310 --> 00:31:20,310
Yeah, indeed there
was information there.

718
00:31:20,310 --> 00:31:23,180
In fact, we could
decode nearly perfectly

719
00:31:23,180 --> 00:31:25,880
from that brain region.

720
00:31:25,880 --> 00:31:29,060
So way up here at the time when
the second stimulus was shown.

721
00:31:29,060 --> 00:31:33,212
So clearly performing
the task, or learning

722
00:31:33,212 --> 00:31:34,670
how to perform the
task, influenced

723
00:31:34,670 --> 00:31:37,331
what information was present
in the prefrontal cortex.

724
00:31:37,331 --> 00:31:39,080
I'm pretty convinced
that this information

725
00:31:39,080 --> 00:31:40,640
is present and real.

726
00:31:40,640 --> 00:31:43,100
Now the question is,
and why I'm using this

727
00:31:43,100 --> 00:31:45,170
as an example of coding,
how did this information

728
00:31:45,170 --> 00:31:48,350
get added into the population.

729
00:31:48,350 --> 00:31:50,577
We believe it's there
for real and probably

730
00:31:50,577 --> 00:31:52,660
contributing to behavior;
it's a pretty big effect.

731
00:31:55,190 --> 00:31:58,610
All right, so here is just
some single neuron results.

732
00:31:58,610 --> 00:32:00,440
What I've plotted
here is this is

733
00:32:00,440 --> 00:32:02,990
a measure of how much of
the variability of a neuron

734
00:32:02,990 --> 00:32:08,060
is predicted by whether a
trial is a match or non-match.

735
00:32:08,060 --> 00:32:10,550
And I've plotted
each dot as a neuron.

736
00:32:10,550 --> 00:32:12,530
I've plotted each
neuron at the time

737
00:32:12,530 --> 00:32:14,990
where it had this
maximum value of being

738
00:32:14,990 --> 00:32:17,680
able to predict whether a
trial is match or non-match.

739
00:32:17,680 --> 00:32:19,180
And so this is the passive case.

740
00:32:19,180 --> 00:32:20,930
And so this is kind
of a null distribution

741
00:32:20,930 --> 00:32:23,900
because we didn't
see any information

742
00:32:23,900 --> 00:32:27,840
present about match or
non-match in the passive case.

743
00:32:27,840 --> 00:32:29,840
When the monkey was
performing the delayed match

744
00:32:29,840 --> 00:32:31,970
to sample task, what
you see is that there's

745
00:32:31,970 --> 00:32:34,280
kind of a small
number of neurons

746
00:32:34,280 --> 00:32:38,570
that become selective after
the second stimulus is shown.

747
00:32:38,570 --> 00:32:41,900
So it seems like a few
neurons are carrying

748
00:32:41,900 --> 00:32:43,631
a bunch of the information.

749
00:32:43,631 --> 00:32:46,130
Let's see if we can quantify
this just maybe a little better

750
00:32:46,130 --> 00:32:47,970
using decoding.

751
00:32:47,970 --> 00:32:50,600
So what we're going
to do is we're

752
00:32:50,600 --> 00:32:53,870
going to take the
training set and we're

753
00:32:53,870 --> 00:32:58,580
going to do an ANOVA to find,
let's say, the eight neurons

754
00:32:58,580 --> 00:33:01,570
that carry the most information
out of the whole population.

755
00:33:01,570 --> 00:33:03,120
So out of the 750 neurons,
let's just find

756
00:33:03,120 --> 00:33:07,980
the eight that had the
smallest p value in an ANOVA.

757
00:33:07,980 --> 00:33:09,680
And so we can find
those neurons.

758
00:33:09,680 --> 00:33:10,760
And we can keep them.

759
00:33:10,760 --> 00:33:13,431
And we can delete all
the other neurons.

760
00:33:13,431 --> 00:33:14,930
And then now we
found those neurons,

761
00:33:14,930 --> 00:33:18,380
we'll also go to the test set
and delete all the other neurons there too.

762
00:33:18,380 --> 00:33:22,340
And now we'll try doing the
whole decoding procedure

763
00:33:22,340 --> 00:33:24,811
on the smaller population.

764
00:33:24,811 --> 00:33:26,810
And because we selected the neurons
using only the training set,

765
00:33:26,810 --> 00:33:28,550
we're not really
biasing our results

766
00:33:28,550 --> 00:33:32,690
when we start doing
the classification.
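
(To make that select-on-the-training-set-only step concrete, here is a minimal Python sketch, not the actual analysis code: it assumes X holds binned firing rates of shape trials by neurons, y holds the match/non-match labels, and it uses a linear SVM simply as a stand-in classifier.)

import numpy as np
from scipy.stats import f_oneway
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import LinearSVC

def decode_with_best_neurons(X, y, k_best=8, n_folds=5):
    # X: (n_trials, n_neurons) binned firing rates; y: match / non-match labels
    accuracies = []
    for train_idx, test_idx in StratifiedKFold(n_folds).split(X, y):
        X_tr, y_tr, X_te, y_te = X[train_idx], y[train_idx], X[test_idx], y[test_idx]
        # ANOVA on the training data only: smallest p-values = most selective neurons
        pvals = np.array([f_oneway(*(X_tr[y_tr == c, i] for c in np.unique(y_tr))).pvalue
                          for i in range(X_tr.shape[1])])
        keep = np.argsort(pvals)[:k_best]
        # The same neurons are kept in the test set, so nothing about the test data
        # influenced the selection and the accuracy estimate is not biased upward
        clf = LinearSVC().fit(X_tr[:, keep], y_tr)
        accuracies.append(clf.score(X_te[:, keep], y_te))
    return float(np.mean(accuracies))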

767
00:33:32,690 --> 00:33:36,200
So here are the results
using all 750 neurons

768
00:33:36,200 --> 00:33:38,300
that I showed you before.

769
00:33:38,300 --> 00:33:42,050
And here are the results using
just the eight best neurons.

770
00:33:42,050 --> 00:33:44,270
And what you can see is
that the eight best neurons

771
00:33:44,270 --> 00:33:48,327
are doing almost as well
as using all 750 neurons.

772
00:33:48,327 --> 00:33:50,410
Now I should say, there
might be a different eight

773
00:33:50,410 --> 00:33:51,770
best at each point
in time because I'm

774
00:33:51,770 --> 00:33:52,910
shifting that bin around.

775
00:33:52,910 --> 00:33:54,368
But still, at any
one point in time

776
00:33:54,368 --> 00:33:57,770
there are eight neurons that
are really, really good.

777
00:33:57,770 --> 00:34:00,920
So clearly there is kind of
this compact or small subset

778
00:34:00,920 --> 00:34:05,361
of neurons that carry the whole
information of the population.

779
00:34:05,361 --> 00:34:06,860
Once you've done
that, you might now

780
00:34:06,860 --> 00:34:09,830
want to know the flip of that,
how many redundant neurons are

781
00:34:09,830 --> 00:34:12,239
there that also carry
that information.

782
00:34:12,239 --> 00:34:15,949
So here are the results,
again, showing all 750 neurons

783
00:34:15,949 --> 00:34:16,812
as a comparison.

784
00:34:16,812 --> 00:34:18,270
And what I'm going
to do now is I'm

785
00:34:18,270 --> 00:34:20,311
going to take those eight
best neurons, find them

786
00:34:20,311 --> 00:34:22,560
in the training
set, throw them out.

787
00:34:22,560 --> 00:34:24,080
I'm also going to
throw out another 120

788
00:34:24,080 --> 00:34:27,415
of the best neurons just to
get rid of a lot of stuff.

789
00:34:27,415 --> 00:34:29,040
So I'm going to throw
out the best 128.

790
00:34:29,040 --> 00:34:30,800
And then we'll look at the
remaining neurons and see,

791
00:34:30,800 --> 00:34:33,050
is there redundant
information in those neurons.

792
00:34:33,050 --> 00:34:36,949
It's still like 600
neurons or more.

793
00:34:36,949 --> 00:34:38,810
And so here are the
results from that.

794
00:34:38,810 --> 00:34:41,690
What you see is that there
is also redundant information

795
00:34:41,690 --> 00:34:43,130
in this kind of weaker tail.

796
00:34:43,130 --> 00:34:45,500
It's not quite as good as
the eight best, not as

797
00:34:45,500 --> 00:34:47,480
high decoding
accuracy, but there

798
00:34:47,480 --> 00:34:48,829
is redundant information there.

799
00:34:51,679 --> 00:34:54,380
Just to summarize this
part, what we see here

800
00:34:54,380 --> 00:34:56,270
is that there are
a few neurons that

801
00:34:56,270 --> 00:34:58,775
really became highly, highly
selective due to this process.

802
00:35:02,420 --> 00:35:04,240
So we see that there's
a lot of information

803
00:35:04,240 --> 00:35:06,640
in this small, compact set.

804
00:35:06,640 --> 00:35:08,740
Here are the results from
a related experiment.

805
00:35:08,740 --> 00:35:10,885
This was in a task
where the monkey had

806
00:35:10,885 --> 00:35:12,760
to remember the spatial
location of a stimulus

807
00:35:12,760 --> 00:35:16,570
rather than what an image
was, like a square or circle.

808
00:35:16,570 --> 00:35:18,220
But anyway, small detail.

809
00:35:18,220 --> 00:35:20,710
Here's this big effect: this
is match information,

810
00:35:20,710 --> 00:35:23,536
this is non-match
information being decoded.

811
00:35:23,536 --> 00:35:24,910
So these are the
decoding results

812
00:35:24,910 --> 00:35:27,420
that I showed you before.

813
00:35:27,420 --> 00:35:31,870
Here's an ROC analysis
that was done on this data.

814
00:35:31,870 --> 00:35:33,850
So for each neuron,
they calculated

815
00:35:33,850 --> 00:35:36,910
how well does an individual
neuron separate the match

816
00:35:36,910 --> 00:35:38,920
and the non-match trials.

817
00:35:38,920 --> 00:35:41,040
And again, pre
and post training.

818
00:35:41,040 --> 00:35:44,560
And what you see is here, they
did not see this big split

819
00:35:44,560 --> 00:35:46,990
that I saw with the decoding.

820
00:35:46,990 --> 00:35:49,460
And this was published.

821
00:35:49,460 --> 00:35:53,830
So the question is, why
did they not see it.

822
00:35:53,830 --> 00:35:57,490
And the reason is because there
were only a few neurons that

823
00:35:57,490 --> 00:35:59,140
were really highly selective.

824
00:35:59,140 --> 00:36:00,770
That was enough to
drive the decoding

825
00:36:00,770 --> 00:36:03,370
but it wasn't enough if you
averaged over all the neurons

826
00:36:03,370 --> 00:36:04,910
to see this effect.

827
00:36:04,910 --> 00:36:07,600
So essentially, there's kind
of like two populations here.

828
00:36:07,600 --> 00:36:09,100
There's a huge
population of neurons

829
00:36:09,100 --> 00:36:10,840
that did not pick up the
match information,

830
00:36:10,840 --> 00:36:12,280
or picked it up very weakly.

831
00:36:12,280 --> 00:36:14,020
And then there's a
small set of neurons

832
00:36:14,020 --> 00:36:16,780
that are very selective.

833
00:36:16,780 --> 00:36:20,930
And so if you take an average
of the nonselective population,

834
00:36:20,930 --> 00:36:22,440
it's just here.

835
00:36:22,440 --> 00:36:24,850
Let's say this is the
pre-training population.

836
00:36:24,850 --> 00:36:26,710
If you take an average
of post-training

837
00:36:26,710 --> 00:36:28,810
over all the
neurons, the average

838
00:36:28,810 --> 00:36:30,400
would shift slightly
to the right.

839
00:36:30,400 --> 00:36:32,440
But it might not
be very detectable

840
00:36:32,440 --> 00:36:34,960
from the pre-training
amount of information.

841
00:36:34,960 --> 00:36:38,020
But if you have weights on just
the highly selective neurons,

842
00:36:38,020 --> 00:36:39,634
you see a huge effect.

843
00:36:39,634 --> 00:36:41,800
So it's really important
that you don't average over

844
00:36:41,800 --> 00:36:45,280
all your neurons but you treat
the neurons as individuals,

845
00:36:45,280 --> 00:36:49,390
or maybe classes, because
they're doing different things.
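
(A toy numerical illustration of that point, with made-up selectivity values rather than the real data: averaging a selectivity measure over the whole population hides the handful of highly selective neurons that decoding can exploit.)

import numpy as np

rng = np.random.default_rng(0)
# Made-up selectivity values (e.g., ROC areas): 742 neurons near chance, 8 highly selective
weak = rng.normal(0.51, 0.02, 742)
strong = rng.normal(0.95, 0.02, 8)
selectivity = np.concatenate([weak, strong])

print(round(selectivity.mean(), 3))   # population average: barely above 0.5, easy to dismiss
print(np.sort(selectivity)[-8:])      # the top eight neurons: clearly selective, enough to drive decoding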

846
00:36:49,390 --> 00:36:52,540
So the next coding
question I wanted to ask

847
00:36:52,540 --> 00:36:54,880
was, is information
contained in what I

848
00:36:54,880 --> 00:36:57,580
call a dynamic population code.

849
00:36:57,580 --> 00:37:00,980
OK, so let me explain
what that means.

850
00:37:00,980 --> 00:37:05,190
If we showed a stimulus, such
as a kiwi, which I like showing,

851
00:37:05,190 --> 00:37:08,800
we saw that there might be a
unique pattern for that kiwi.

852
00:37:08,800 --> 00:37:10,615
And that pattern
is what enables me

853
00:37:10,615 --> 00:37:12,490
to discriminate between
all the other stimuli

854
00:37:12,490 --> 00:37:14,089
and do the classification.

855
00:37:14,089 --> 00:37:15,880
But it might turn out
that there's not just

856
00:37:15,880 --> 00:37:18,340
one pattern for that
kiwi, but there's actually

857
00:37:18,340 --> 00:37:19,790
a sequence of patterns.

858
00:37:19,790 --> 00:37:22,270
So if we plotted the
patterns in time,

859
00:37:22,270 --> 00:37:24,940
they would actually change.

860
00:37:24,940 --> 00:37:27,340
So it's a sequence of patterns
that represents one thing.

861
00:37:29,890 --> 00:37:33,215
And this kind of thing has
been shown a little bit.

862
00:37:33,215 --> 00:37:34,840
And actually now it's
been shown a lot.

863
00:37:34,840 --> 00:37:38,350
But when I first did this
in 2008, the kind of one

864
00:37:38,350 --> 00:37:40,360
study I knew of
that kind of showed

865
00:37:40,360 --> 00:37:44,704
this was this paper by Ofer
Mazor and Gilles Laurent

866
00:37:44,704 --> 00:37:46,370
where they did kind
of the PCA analysis.

867
00:37:46,370 --> 00:37:49,030
And this is in the locust
olfactory system, I think.

868
00:37:49,030 --> 00:37:51,446
And they showed that there
were these kind of trajectories

869
00:37:51,446 --> 00:37:53,830
in space where a particular
odor was represented

870
00:37:53,830 --> 00:37:57,400
by maybe different neurons.

871
00:37:57,400 --> 00:38:00,250
And again, I had a paper in
2008 where I examined this.

872
00:38:00,250 --> 00:38:02,710
And there's a review
paper by King and Dehaene

873
00:38:02,710 --> 00:38:03,980
in 2014 about this.

874
00:38:03,980 --> 00:38:06,830
And there's a lot of
people looking at this now.

875
00:38:06,830 --> 00:38:10,172
So how can we get at this
kind of thing in decoding?

876
00:38:10,172 --> 00:38:12,130
What you can do is you
can train the classifier

877
00:38:12,130 --> 00:38:14,530
at one point in time, and
test it at a point in time

878
00:38:14,530 --> 00:38:15,790
like we were doing before.

879
00:38:15,790 --> 00:38:19,260
But you can also test
at other points in time.

880
00:38:19,260 --> 00:38:22,000
And so what happens is if you
train at a point in time that

881
00:38:22,000 --> 00:38:24,760
should have the information,
and things are contained

882
00:38:24,760 --> 00:38:27,880
in a static code where there's
just one pattern, then if you

883
00:38:27,880 --> 00:38:30,544
test at other points in
time, you should do well.

884
00:38:30,544 --> 00:38:33,210
Because you capture that pattern
where there's good information,

885
00:38:33,210 --> 00:38:35,590
you should do well at
other points in time.

886
00:38:35,590 --> 00:38:38,292
However, if it's a changing
pattern of neural activity,

887
00:38:38,292 --> 00:38:40,000
then when you train
at one point in time,

888
00:38:40,000 --> 00:38:43,305
you won't do well at
other points in time.

889
00:38:43,305 --> 00:38:44,180
Does that make sense?
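
(Here is a minimal sketch of that train-at-one-time, test-at-all-times idea in Python, again assuming a data layout of trials by neurons by time bins, with a generic linear classifier standing in for whatever classifier you prefer.)

import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import LinearSVC

def temporal_cross_training(X, y, n_folds=5):
    # X: (n_trials, n_neurons, n_time_bins) binned rates; y: labels such as match vs. non-match
    n_bins = X.shape[2]
    acc = np.zeros((n_bins, n_bins))            # rows: training time, columns: test time
    for train_idx, test_idx in StratifiedKFold(n_folds).split(X[:, :, 0], y):
        for t_train in range(n_bins):
            clf = LinearSVC().fit(X[train_idx, :, t_train], y[train_idx])
            for t_test in range(n_bins):        # test the same classifier at every other time
                acc[t_train, t_test] += clf.score(X[test_idx, :, t_test], y[test_idx])
    return acc / n_folds
# A static code fills out a broad square of high accuracy;
# a dynamic code shows up as a narrow band along the diagonal.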

890
00:38:50,550 --> 00:38:54,900
So here are the results--

891
00:38:54,900 --> 00:38:55,950
if that will go away.

892
00:38:55,950 --> 00:38:57,400
Let me just orient you here.

893
00:38:57,400 --> 00:38:59,700
So this is the same
experiment, you know,

894
00:38:59,700 --> 00:39:03,390
time of the first stimulus, time
of the second stimulus, chance.

895
00:39:03,390 --> 00:39:05,880
This black trace is what we
saw before that I was always

896
00:39:05,880 --> 00:39:06,870
plotting in red.

897
00:39:06,870 --> 00:39:09,161
This is the standard decoding
when I trained and tested

898
00:39:09,161 --> 00:39:11,250
at each point in time.

899
00:39:11,250 --> 00:39:13,590
This blue trace is
where I train here

900
00:39:13,590 --> 00:39:16,920
and I tested all
other points in time.

901
00:39:16,920 --> 00:39:19,530
So if it's the case that
there's one pattern coding

902
00:39:19,530 --> 00:39:21,330
the information, what
you're going to find

903
00:39:21,330 --> 00:39:24,000
is that as soon as that
information becomes present,

904
00:39:24,000 --> 00:39:26,940
it will fill out
this whole curve.

905
00:39:26,940 --> 00:39:29,340
Conversely, if it's
changing, what you might see

906
00:39:29,340 --> 00:39:34,254
is just a localized
information just at one spot.

907
00:39:34,254 --> 00:39:35,670
So let's take a
look at the movie,

908
00:39:35,670 --> 00:39:36,878
if that moves out of the way.

909
00:39:36,878 --> 00:39:39,720
OK, here is the moment of truth.

910
00:39:39,720 --> 00:39:42,070
Information is rising.

911
00:39:42,070 --> 00:39:46,845
And what you see in
this second delay period

912
00:39:46,845 --> 00:39:50,646
is clearly we see this
little peak moving along.

913
00:39:50,646 --> 00:39:52,020
So it's not that
there's just one

914
00:39:52,020 --> 00:39:55,884
pattern that
contains information

915
00:39:55,884 --> 00:39:56,800
at all points in time.

916
00:39:56,800 --> 00:39:59,070
But in fact, it's a
sequence of patterns

917
00:39:59,070 --> 00:40:00,864
that each contain
that information.

918
00:40:07,640 --> 00:40:11,410
So here are the results just
plotted in a different format.

919
00:40:11,410 --> 00:40:13,570
This is what we call a
temporal cross training

920
00:40:13,570 --> 00:40:16,010
plot because I train
at one point and test

921
00:40:16,010 --> 00:40:18,062
at a different point in time.

922
00:40:18,062 --> 00:40:20,020
So this is the time I'm
testing the classifier.

923
00:40:20,020 --> 00:40:22,180
This is the time I'm
training the classifier.

924
00:40:22,180 --> 00:40:23,980
This is the passive
fixation stage,

925
00:40:23,980 --> 00:40:26,710
so there was no information
in the population.

926
00:40:26,710 --> 00:40:28,322
And this is just
how I often plot it.

927
00:40:28,322 --> 00:40:30,280
What you see is there's
this big diagonal band.

928
00:40:30,280 --> 00:40:31,930
Here you see it's
like widening a bit

929
00:40:31,930 --> 00:40:36,350
so it might be hitting some
sort of stationary point there.

930
00:40:36,350 --> 00:40:38,710
But you can see
that clearly there's

931
00:40:38,710 --> 00:40:40,979
these dynamics happening.

932
00:40:40,979 --> 00:40:43,145
And we can go and we can
look at individual neurons.

933
00:40:43,145 --> 00:40:45,700
So these are actually the
three most selective neurons.

934
00:40:45,700 --> 00:40:48,700
They're not randomly chosen.

935
00:40:48,700 --> 00:40:51,430
Red is the firing rate
to the non-match trials.

936
00:40:51,430 --> 00:40:53,380
Blue is the firing rate
to the match trials.

937
00:40:53,380 --> 00:40:56,910
This neuron has a pretty
wide window of selectivity.

938
00:40:56,910 --> 00:41:00,160
This other neuron here
has a really small window.

939
00:41:00,160 --> 00:41:02,650
There's just this little blip
where it's more selective

940
00:41:02,650 --> 00:41:05,652
or has a higher firing rate to
non-match compared to match.

941
00:41:05,652 --> 00:41:08,110
And it's these neurons that
have these little kind of blips

942
00:41:08,110 --> 00:41:10,510
that are giving rise
to that dynamics.

943
00:41:10,510 --> 00:41:13,420
Here's something else we can
ask about with the paradigm

944
00:41:13,420 --> 00:41:17,111
of asking coding questions.

945
00:41:17,111 --> 00:41:18,610
What we're going
to do here is we're

946
00:41:18,610 --> 00:41:21,007
going to try a bunch of
different classifiers.

947
00:41:21,007 --> 00:41:22,840
And here, you know,
these are some questions

948
00:41:22,840 --> 00:41:23,570
that kind of came up.

949
00:41:23,570 --> 00:41:26,194
But can we tweak the classifier
to understand a little bit more

950
00:41:26,194 --> 00:41:27,290
about the population code.

951
00:41:27,290 --> 00:41:29,320
So here is a fairly
simple example.

952
00:41:29,320 --> 00:41:31,702
But I compared three
different classifiers.

953
00:41:31,702 --> 00:41:33,160
And the question
I wanted to get at

954
00:41:33,160 --> 00:41:37,330
was, is information coded in the
total activity of a population.

955
00:41:37,330 --> 00:41:40,630
Or is it coded more so
in the relative activity

956
00:41:40,630 --> 00:41:42,080
of different neurons.

957
00:41:42,080 --> 00:41:44,920
So you know, in particular,
in the face patches,

958
00:41:44,920 --> 00:41:49,900
we see that information of all
neurons increases to faces.

959
00:41:49,900 --> 00:41:51,625
But if you think
about that from a--

960
00:41:51,625 --> 00:41:53,500
or maybe not information,
but the firing rate

961
00:41:53,500 --> 00:41:55,270
increases to all faces.

962
00:41:55,270 --> 00:41:57,340
But if the firing rate
increases to all faces,

963
00:41:57,340 --> 00:41:59,800
you've lost dynamic range
and you can't really

964
00:41:59,800 --> 00:42:02,237
tell what's happening
for individual faces.

965
00:42:02,237 --> 00:42:03,820
So what I wanted to
know was, how much

966
00:42:03,820 --> 00:42:06,279
information is coded by this
overall shift versus patterns.

967
00:42:06,279 --> 00:42:08,903
So what I did here was I used a
Poisson Naive Bayes classifier,

968
00:42:08,903 --> 00:42:11,950
which takes into account both
the overall magnitude and also

969
00:42:11,950 --> 00:42:12,940
the patterns.

970
00:42:12,940 --> 00:42:15,610
I used a minimum-angle
classifier

971
00:42:15,610 --> 00:42:17,740
that took only the
patterns into account.

972
00:42:17,740 --> 00:42:20,140
And I used a classifier
called the total population

973
00:42:20,140 --> 00:42:23,200
activity that only took
the average activity

974
00:42:23,200 --> 00:42:25,440
of the whole population.

975
00:42:25,440 --> 00:42:28,270
This classifier's pretty
dumb, but in a certain sense,

976
00:42:28,270 --> 00:42:30,670
it's what fMRI is
doing, just averaging

977
00:42:30,670 --> 00:42:33,640
all your neurons together.

978
00:42:33,640 --> 00:42:36,050
So it's a little bit of a proxy.

979
00:42:36,050 --> 00:42:38,190
There's a paper,
also, by Elias Issa

980
00:42:38,190 --> 00:42:41,255
and Jim DiCarlo where they show
that fMRI is actually fairly--

981
00:42:41,255 --> 00:42:43,630
or somewhat strongly correlated
with the average activity

982
00:42:43,630 --> 00:42:45,003
of a whole population.
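
(For concreteness, here are minimal versions of the three decision rules, assuming the rows of X_tr and X_te are spike-count vectors; these are illustrative sketches, not the toolbox implementations.)

import numpy as np

def class_means(X_tr, y_tr):
    classes = np.unique(y_tr)
    return classes, np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])

def predict_min_angle(X_tr, y_tr, X_te):
    # Pattern only: pick the class whose mean pattern makes the smallest angle with the test vector
    classes, mu = class_means(X_tr, y_tr)
    mu_n = mu / (np.linalg.norm(mu, axis=1, keepdims=True) + 1e-12)
    x_n = X_te / (np.linalg.norm(X_te, axis=1, keepdims=True) + 1e-12)
    return classes[np.argmax(x_n @ mu_n.T, axis=1)]

def predict_total_activity(X_tr, y_tr, X_te):
    # Overall magnitude only: compare mean rates averaged over all neurons (a rough fMRI-like proxy)
    classes, mu = class_means(X_tr, y_tr)
    totals = mu.mean(axis=1)
    return classes[np.argmin(np.abs(X_te.mean(axis=1)[:, None] - totals[None, :]), axis=1)]

def predict_poisson_nb(X_tr, y_tr, X_te):
    # Poisson Naive Bayes: sensitive to both the pattern and the overall magnitude of spike counts
    classes, lam = class_means(X_tr, y_tr)
    lam = np.clip(lam, 1e-3, None)                      # avoid log(0)
    loglik = X_te @ np.log(lam).T - lam.sum(axis=1)     # log-likelihood up to a constant in x
    return classes[np.argmax(loglik, axis=1)]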

983
00:42:47,590 --> 00:42:50,350
So let's see how these
classifiers compare

984
00:42:50,350 --> 00:42:52,540
to each other to see
where the information is

985
00:42:52,540 --> 00:42:54,180
being coded in the activity.

986
00:42:54,180 --> 00:42:58,570
Again, I'm going to use this
study from Doris and Winrich

987
00:42:58,570 --> 00:43:01,750
where we're going to be looking
at the pose-specific face

988
00:43:01,750 --> 00:43:03,440
information, just as an example.

989
00:43:03,440 --> 00:43:05,500
So this is decoding
those 25 individuals

990
00:43:05,500 --> 00:43:07,160
when the classifier is
trained and tested on

991
00:43:07,160 --> 00:43:08,350
that exact same head pose.

992
00:43:11,440 --> 00:43:15,730
And so what we see is
that when we use the Poisson

993
00:43:15,730 --> 00:43:19,030
Naive Bayes classifier that
took the pattern and also

994
00:43:19,030 --> 00:43:22,030
the total activity
into account, and when

995
00:43:22,030 --> 00:43:24,830
we used the classifier that took
just the pattern into account,

996
00:43:24,830 --> 00:43:28,610
the minimum angle, we're
getting similar results.

997
00:43:28,610 --> 00:43:31,305
So the overall activity
was not really adding much.

998
00:43:31,305 --> 00:43:33,430
But if you just use the
overall activity by itself,

999
00:43:33,430 --> 00:43:35,380
it was pretty poor.

1000
00:43:35,380 --> 00:43:37,257
So this is, again,
touching on something

1001
00:43:37,257 --> 00:43:39,340
about what Rebecca said,
when you start averaging,

1002
00:43:39,340 --> 00:43:40,580
you can lose a lot.

1003
00:43:40,580 --> 00:43:43,130
And so you might be blind
to a lot of what's going on

1004
00:43:43,130 --> 00:43:45,190
if you're just using voxels.

1005
00:43:49,060 --> 00:43:53,890
There are reasons to do
invasive recordings.

1006
00:43:53,890 --> 00:43:58,180
All right, and I think this
might be my last point in terms

1007
00:43:58,180 --> 00:43:59,470
of neural coding.

1008
00:43:59,470 --> 00:44:02,830
But this is the question of
the independent neuron code.

1009
00:44:02,830 --> 00:44:05,950
So is there more information
if you take into account

1010
00:44:05,950 --> 00:44:09,005
the joint activity of all
neurons simultaneously,

1011
00:44:09,005 --> 00:44:11,015
so if you had
simultaneous recordings

1012
00:44:11,015 --> 00:44:13,390
and took that into account,
versus the pseudo populations

1013
00:44:13,390 --> 00:44:16,150
I'm doing where you are
treating each neuron as if they

1014
00:44:16,150 --> 00:44:18,880
were statistically independent.

1015
00:44:18,880 --> 00:44:21,910
And so this is a very,
very simple analysis.

1016
00:44:21,910 --> 00:44:25,450
Here I just did the
decoding in an experiment

1017
00:44:25,450 --> 00:44:27,130
where we had
simultaneous recordings

1018
00:44:27,130 --> 00:44:29,560
and compared it to using
that same data but using

1019
00:44:29,560 --> 00:44:35,050
pseudo populations on that data,
using very simple classifiers.
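
(One simple way to build such a pseudo-population from simultaneously recorded data is to shuffle the trial order independently for each neuron within each label, which keeps every neuron's tuning but removes the trial-by-trial correlations. A hypothetical sketch:)

import numpy as np

def make_pseudo_population(X, y, seed=0):
    # X: (n_trials, n_neurons) simultaneously recorded responses; y: trial labels.
    # Shuffling the trial order independently for each neuron, within each label,
    # preserves every neuron's tuning but destroys the trial-by-trial correlations.
    rng = np.random.default_rng(seed)
    X_pseudo = X.copy()
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        for n in range(X.shape[1]):
            X_pseudo[idx, n] = X[rng.permutation(idx), n]
    return X_pseudo
# Decoding X versus X_pseudo with the same classifier asks how much extra information
# the joint (correlated) activity carries beyond the independent-neuron code.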

1020
00:44:35,050 --> 00:44:36,460
And so here are the results.

1021
00:44:36,460 --> 00:44:38,980
What I found was
that in this one case

1022
00:44:38,980 --> 00:44:40,660
there was a little
bit extra information

1023
00:44:40,660 --> 00:44:42,370
in the simultaneous
recordings as

1024
00:44:42,370 --> 00:44:44,534
compared to the
pseudo populations.

1025
00:44:44,534 --> 00:44:47,200
But you know, it wouldn't really
change many of your conclusions

1026
00:44:47,200 --> 00:44:48,158
about what's happening.

1027
00:44:48,158 --> 00:44:50,780
It's like, you know, maybe
a 5% increase or something.

1028
00:44:50,780 --> 00:44:53,590
And then this has been seen
in a lot of the literature.

1029
00:44:53,590 --> 00:44:56,110
This is the question
of temporal precision

1030
00:44:56,110 --> 00:44:58,412
or what is sometimes
called temporal coding.

1031
00:44:58,412 --> 00:45:00,370
You know, in some
of the experiments

1032
00:45:00,370 --> 00:45:03,130
I was using a 100-millisecond
bin, sometimes I was using 500.

1033
00:45:03,130 --> 00:45:05,141
What happens when you
change the bin size?

1034
00:45:05,141 --> 00:45:06,890
What happens, this is
pretty clear, again,

1035
00:45:06,890 --> 00:45:08,835
from a lot of studies
that I've done,

1036
00:45:08,835 --> 00:45:11,460
when you increase the bin size,
generally the decoding accuracy

1037
00:45:11,460 --> 00:45:13,350
goes up.

1038
00:45:13,350 --> 00:45:15,270
What you lose is
temporal precision,

1039
00:45:15,270 --> 00:45:17,730
because now you're blurring
over a much bigger area.

1040
00:45:17,730 --> 00:45:21,330
So in terms of your
understanding what's going on,

1041
00:45:21,330 --> 00:45:24,840
you have to find the
right trade-off between having

1042
00:45:24,840 --> 00:45:27,226
a very clear result with
a larger bin versus

1043
00:45:27,226 --> 00:45:28,600
caring about the
time information

1044
00:45:28,600 --> 00:45:29,600
and using a smaller bin.
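
(A small helper like the following, assuming a 1 ms spike raster per trial, is all it takes to explore that trade-off by re-binning the same data at different widths.)

import numpy as np

def bin_spike_raster(raster, bin_width, step=None):
    # raster: (n_trials, n_ms) array of 0/1 spike indicators at 1 ms resolution.
    # Returns firing-rate estimates in bins of bin_width ms, optionally sliding by step ms.
    step = step or bin_width
    starts = np.arange(0, raster.shape[1] - bin_width + 1, step)
    return np.stack([raster[:, s:s + bin_width].mean(axis=1) for s in starts], axis=1)

# bin_spike_raster(r, 500) usually gives higher decoding accuracy but coarse timing;
# bin_spike_raster(r, 100, step=50) keeps finer temporal precision at some cost in accuracy.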

1045
00:45:32,700 --> 00:45:36,356
And I haven't seen that I need
like one millisecond resolution

1046
00:45:36,356 --> 00:45:37,980
or a very complicated
classifier that's

1047
00:45:37,980 --> 00:45:40,440
taking every single spike
time into account to help me.

1048
00:45:40,440 --> 00:45:42,930
But again, I haven't explored
this as fully as I could.

1049
00:45:42,930 --> 00:45:44,700
So it would be
interesting for someone

1050
00:45:44,700 --> 00:45:47,280
to use a method [INAUDIBLE]
that people really

1051
00:45:47,280 --> 00:45:50,940
love to claim that things are
coded in patterns in time.

1052
00:45:50,940 --> 00:45:53,220
You know, if you
want to, go for it.

1053
00:45:53,220 --> 00:45:54,260
Show me it.

1054
00:45:54,260 --> 00:45:55,650
I've got some data available.

1055
00:45:55,650 --> 00:45:58,650
Build a classifier that does
that and we can compare it.

1056
00:45:58,650 --> 00:46:01,950
But I haven't seen it yet.

1057
00:46:01,950 --> 00:46:03,645
So a summary of
the neural coding.

1058
00:46:03,645 --> 00:46:07,290
Decoding allows you to examine
many questions, such as is

1059
00:46:07,290 --> 00:46:08,540
there a compact code.

1060
00:46:08,540 --> 00:46:11,040
So are there just a few neurons
that have all the information.

1061
00:46:11,040 --> 00:46:12,180
Is there a dynamic code.

1062
00:46:12,180 --> 00:46:13,971
So is the pattern of
activity that's coding

1063
00:46:13,971 --> 00:46:16,350
information changing in time.

1064
00:46:16,350 --> 00:46:19,670
Are neurons independent or is
there more information coded

1065
00:46:19,670 --> 00:46:21,990
in their joint activity.

1066
00:46:21,990 --> 00:46:24,010
And what is the
temporal precision.

1067
00:46:24,010 --> 00:46:25,756
And this is, again,
not everything,

1068
00:46:25,756 --> 00:46:27,750
there are many other
questions you could ask.

1069
00:46:30,072 --> 00:46:31,905
Any other questions
about the neural coding?

1070
00:46:37,670 --> 00:46:40,250
Just a few other
things to mention.

1071
00:46:40,250 --> 00:46:43,490
So you know, I was talking all
about, basically, spiking data.

1072
00:46:43,490 --> 00:46:46,880
But you can also do
decoding from MEG data.

1073
00:46:46,880 --> 00:46:49,760
So there was a
great study by Leyla

1074
00:46:49,760 --> 00:46:53,570
where she tried to
decode from MEG signals.

1075
00:46:53,570 --> 00:46:56,170
Here's just one example
from that paper where

1076
00:46:56,170 --> 00:46:59,740
she was trying to decode
which letter of the alphabet,

1077
00:46:59,740 --> 00:47:01,730
or at least 25 of
the 26 letters,

1078
00:47:01,730 --> 00:47:05,660
was shown to a subject, a human
subject in an MEG scanner.

1079
00:47:05,660 --> 00:47:08,360
You can see it's
very nice, you know,

1080
00:47:08,360 --> 00:47:10,522
people are not psychic either.

1081
00:47:10,522 --> 00:47:12,980
And then at the time, slightly
after the stimulus is shown,

1082
00:47:12,980 --> 00:47:14,780
you can decode quite well.

1083
00:47:14,780 --> 00:47:17,030
And things are above chance.

1084
00:47:17,030 --> 00:47:20,684
And then she went on to
examine position invariance

1085
00:47:20,684 --> 00:47:22,850
in different parts of the
brain, the timing of that.

1086
00:47:22,850 --> 00:47:24,590
So you can check out
that paper as well.

1087
00:47:27,290 --> 00:47:34,280
And as Rebecca mentioned,
this kind of approach

1088
00:47:34,280 --> 00:47:35,984
has really taken off in fMRI.

1089
00:47:35,984 --> 00:47:37,400
Here are three
different toolboxes

1090
00:47:37,400 --> 00:47:40,087
you could use if
you're doing fMRI.

1091
00:47:40,087 --> 00:47:42,170
So I wrote a toolbox I
will talk about in a minute

1092
00:47:42,170 --> 00:47:44,400
to do neural decoding, and
I recommend it for that.

1093
00:47:44,400 --> 00:47:46,377
But if you're going
to do fMRI decoding,

1094
00:47:46,377 --> 00:47:48,710
you probably are better off
using one of these toolboxes

1095
00:47:48,710 --> 00:47:51,660
because they have certain
things that are fMRI specific,

1096
00:47:51,660 --> 00:47:54,470
such as mapping back to voxels
that my toolbox doesn't have.

1097
00:47:54,470 --> 00:47:56,590
Although you could,
in principle,

1098
00:47:56,590 --> 00:47:58,580
throw fMRI data into
my toolbox as well.

1099
00:48:02,440 --> 00:48:05,100
And then all these studies
so far I've mentioned

1100
00:48:05,100 --> 00:48:08,760
have had kind of structure
where every trial is exactly

1101
00:48:08,760 --> 00:48:12,630
the same length, as
Tyler pointed out.

1102
00:48:12,630 --> 00:48:14,040
And if you wanted
to do something

1103
00:48:14,040 --> 00:48:16,500
where it wasn't that structured
that well, such as decoding

1104
00:48:16,500 --> 00:48:19,050
from a rat running around a maze
where it wasn't always doing

1105
00:48:19,050 --> 00:48:21,860
things in the same
amount of time,

1106
00:48:21,860 --> 00:48:27,060
there's a toolbox that came
out of Emery Brown's lab that

1107
00:48:27,060 --> 00:48:28,710
should hopefully
enable you to do some

1108
00:48:28,710 --> 00:48:30,108
of those kinds of analyses.

1109
00:48:33,034 --> 00:48:35,450
All right, let me just briefly
talk about some limitations

1110
00:48:35,450 --> 00:48:39,610
to decoding, just like Rebecca
did with the downer at the end.

1111
00:48:39,610 --> 00:48:43,710
So some limitations are, this
is a hypothesis-based method.

1112
00:48:43,710 --> 00:48:46,702
So we have specific questions
in mind that we want to test.

1113
00:48:46,702 --> 00:48:49,160
And then we can assess whether
those questions are answered

1114
00:48:49,160 --> 00:48:52,264
or not, to a certain degree.

1115
00:48:52,264 --> 00:48:54,680
So that's kind of a good thing,
but it's also a downside.

1116
00:48:54,680 --> 00:48:56,750
Like if we didn't think
about the right question,

1117
00:48:56,750 --> 00:48:58,160
then we're not going to see it.

1118
00:48:58,160 --> 00:48:59,300
So there could be
a lot happening

1119
00:48:59,300 --> 00:49:01,883
in our neural activity that we
just didn't think to ask about.

1120
00:49:04,250 --> 00:49:05,967
And so unsupervised
learning methods

1121
00:49:05,967 --> 00:49:07,050
might get at some of that.

1122
00:49:07,050 --> 00:49:09,560
And you could see how much
the variable of interest

1123
00:49:09,560 --> 00:49:11,976
you care about accounts
for the total variability

1124
00:49:11,976 --> 00:49:14,090
in a population.

1125
00:49:14,090 --> 00:49:16,070
Also, I hinted at this
throughout the talk,

1126
00:49:16,070 --> 00:49:19,337
just because information is
present doesn't mean it's used.

1127
00:49:19,337 --> 00:49:21,920
The back of the head stuff might
be an example of that or not,

1128
00:49:21,920 --> 00:49:22,790
I don't know.

1129
00:49:22,790 --> 00:49:24,710
But you just have to be careful
how you interpret the results

1130
00:49:24,710 --> 00:49:27,110
and not conclude, "the
information is there,

1131
00:49:27,110 --> 00:49:29,650
therefore, this is the
brain region doing x."

1132
00:49:29,650 --> 00:49:32,510
A lot of stuff can
kind of sneak in.

1133
00:49:32,510 --> 00:49:36,810
Timing information can
also be really interesting.

1134
00:49:36,810 --> 00:49:38,200
I've been exploring that this summer.

1135
00:49:38,200 --> 00:49:41,390
So if you can know the relative
timing, when information

1136
00:49:41,390 --> 00:49:43,010
is in one brain
region versus another,

1137
00:49:43,010 --> 00:49:46,440
it can tell you a lot about
kind of the flow of information

1138
00:49:46,440 --> 00:49:48,920
and the computation that brain
regions might be doing.

1139
00:49:48,920 --> 00:49:53,930
So I think that's another very
promising area to explore.

1140
00:49:53,930 --> 00:49:57,290
Also, decoding kind of focuses
on the computational level

1141
00:49:57,290 --> 00:50:00,202
or algorithmic level, or
really neural representations

1142
00:50:00,202 --> 00:50:01,910
if you thought about
Marr's three levels.

1143
00:50:01,910 --> 00:50:04,535
It doesn't talk about this kind
of implementational mechanistic

1144
00:50:04,535 --> 00:50:05,240
level.

1145
00:50:05,240 --> 00:50:07,670
So [INAUDIBLE] that's not
something it can do.

1146
00:50:07,670 --> 00:50:09,789
Now if you have the flow
of information going

1147
00:50:09,789 --> 00:50:12,080
through an area and you
understand that well and what's

1148
00:50:12,080 --> 00:50:13,580
being represented,
I think you might

1149
00:50:13,580 --> 00:50:16,580
be able to back out some of
these mechanisms or processes

1150
00:50:16,580 --> 00:50:18,020
of how that can be built up.

1151
00:50:18,020 --> 00:50:22,864
But in and of itself, decoding
doesn't give you that.

1152
00:50:22,864 --> 00:50:24,530
Also, decoding methods
can be computationally

1153
00:50:24,530 --> 00:50:25,820
intensive.

1154
00:50:25,820 --> 00:50:27,645
An analysis can take up to an hour.

1155
00:50:27,645 --> 00:50:29,270
If you do something
really complicated,

1156
00:50:29,270 --> 00:50:31,832
it can take you a week to
run something very elaborate.

1157
00:50:31,832 --> 00:50:33,290
You know, sometimes
it can be quick

1158
00:50:33,290 --> 00:50:35,040
and you can do it
in a few minutes,

1159
00:50:35,040 --> 00:50:38,000
but it's certainly a lot
slower than doing something

1160
00:50:38,000 --> 00:50:41,154
like an activity index where
you're done in two seconds

1161
00:50:41,154 --> 00:50:43,070
and then you have the
wrong answer right away.

1162
00:50:48,410 --> 00:50:50,780
Let me just spend like
five more minutes talking

1163
00:50:50,780 --> 00:50:52,580
about this toolbox
and then you can all

1164
00:50:52,580 --> 00:50:54,990
go work on your projects
and do what you want to do.

1165
00:50:54,990 --> 00:50:57,260
So this is a toolbox I made
called the neural decoding

1166
00:50:57,260 --> 00:50:57,779
toolbox.

1167
00:50:57,779 --> 00:50:59,320
There's a paper
about it in Frontiers

1168
00:50:59,320 --> 00:51:01,505
in Neuroinformatics in 2013.

1169
00:51:01,505 --> 00:51:04,130
And the whole point of it was to
try to make it easy for people

1170
00:51:04,130 --> 00:51:07,400
to do these analyses
because [INAUDIBLE]..

1171
00:51:07,400 --> 00:51:10,550
And so basically, here
is like six lines of code

1172
00:51:10,550 --> 00:51:13,599
that if you ran it would do
one of those analyses for you.

1173
00:51:13,599 --> 00:51:15,140
And not only is it
six lines of code,

1174
00:51:15,140 --> 00:51:17,972
but it's almost literally these
exact same six lines of code.

1175
00:51:17,972 --> 00:51:19,430
The only thing
you'd, like, replace

1176
00:51:19,430 --> 00:51:22,850
would be your data rather
than this data file.

1177
00:51:22,850 --> 00:51:31,393
And the whole
idea behind it

1178
00:51:31,393 --> 00:51:33,660
is a kind of open
science idea, you know,

1179
00:51:33,660 --> 00:51:36,299
I want more transparency
so I'm sharing my code.

1180
00:51:36,299 --> 00:51:38,840
If you use my code, ultimately,
if you could share your data,

1181
00:51:38,840 --> 00:51:40,970
that would be great
because I think

1182
00:51:40,970 --> 00:51:42,380
I wouldn't have been able
to develop any of this stuff

1183
00:51:42,380 --> 00:51:43,940
if people hadn't
shared data with me.

1184
00:51:43,940 --> 00:51:46,670
I think we'll make a lot
more progress in science

1185
00:51:46,670 --> 00:51:49,877
if we're open and share.

1186
00:51:49,877 --> 00:51:50,960
There you go, I'm a hippy.

1187
00:51:55,920 --> 00:52:01,010
And here's the website for
the toolbox, www.readout.info.

1188
00:52:01,010 --> 00:52:03,570
Just talk briefly a little
bit more about the toolbox.

1189
00:52:03,570 --> 00:52:08,670
The way it was designed is
around four abstract classes.

1190
00:52:08,670 --> 00:52:11,759
So these are kind of
major pieces or objects

1191
00:52:11,759 --> 00:52:13,300
that you can kind
of swap in and out.

1192
00:52:13,300 --> 00:52:14,810
They're like
components that allow

1193
00:52:14,810 --> 00:52:16,950
you to do different things.

1194
00:52:16,950 --> 00:52:20,100
So for example, one of the
components is a data source.

1195
00:52:20,100 --> 00:52:23,487
This creates the training
and test set of data.

1196
00:52:23,487 --> 00:52:25,320
You can separate that
out in different ways,

1197
00:52:25,320 --> 00:52:28,430
like there's just a standard
one but you can swap it out

1198
00:52:28,430 --> 00:52:32,600
to do that invariance
or abstract analysis.

1199
00:52:32,600 --> 00:52:34,760
Or you can do things
like, I guess, change

1200
00:52:34,760 --> 00:52:38,310
the different binning schemes
within that piece of code.

1201
00:52:38,310 --> 00:52:40,310
So that's one component
you can swap in and out.

1202
00:52:40,310 --> 00:52:42,560
Another one are
these preprocessors.

1203
00:52:42,560 --> 00:52:45,140
What they do is they apply
pre-processing to your training

1204
00:52:45,140 --> 00:52:47,690
data, and then use
those parameters that

1205
00:52:47,690 --> 00:52:51,410
were learned on the training
set to apply the same transformation

1206
00:52:51,410 --> 00:52:53,370
to the test set as well.

1207
00:52:53,370 --> 00:52:55,720
So for example, when I was
selecting the best neurons,

1208
00:52:55,720 --> 00:52:58,490
I used a preprocessor
that just eliminated--

1209
00:52:58,490 --> 00:53:00,620
found good neurons
in the training set,

1210
00:53:00,620 --> 00:53:02,510
just used those, and
then also kept only

1211
00:53:02,510 --> 00:53:04,030
those neurons in the test set.

1212
00:53:04,030 --> 00:53:05,446
And so there are
different, again,

1213
00:53:05,446 --> 00:53:07,400
components you can swap
in and out with that.

1214
00:53:07,400 --> 00:53:10,640
An obvious component you can
swap in and out: classifiers.

1215
00:53:10,640 --> 00:53:13,040
You could throw in a classifier
that takes correlations

1216
00:53:13,040 --> 00:53:14,502
into account or doesn't.

1217
00:53:14,502 --> 00:53:15,710
Or do whatever you want here.

1218
00:53:15,710 --> 00:53:18,860
You know, use some highly
nonlinear or somewhat nonlinear

1219
00:53:18,860 --> 00:53:23,100
thing and see is the
brain doing it that way.

1220
00:53:23,100 --> 00:53:26,930
And there's this final piece
called cross validator.

1221
00:53:26,930 --> 00:53:29,330
It basically runs the whole
cross validation loop.

1222
00:53:29,330 --> 00:53:31,520
It pulls data from
the data source,

1223
00:53:31,520 --> 00:53:33,270
creating training and test sets.

1224
00:53:33,270 --> 00:53:35,190
It applies the
feature preprocessor.

1225
00:53:35,190 --> 00:53:37,437
It trains and tests the classifier
and reports the results.

1226
00:53:37,437 --> 00:53:40,020
Generally, I've only written one
of these and it's pretty long

1227
00:53:40,020 --> 00:53:41,010
and does a lot of
different things,

1228
00:53:41,010 --> 00:53:42,759
like gives you different
types of results.

1229
00:53:42,759 --> 00:53:44,610
So it doesn't just tell you
whether there is information,

1230
00:53:44,610 --> 00:53:46,600
but gives you mutual information
and all these other things.

1231
00:53:46,600 --> 00:53:48,808
But again, if you wanted
to, you could expand on that

1232
00:53:48,808 --> 00:53:50,740
and do the cross-validation
in different ways.
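
(The real toolbox is written in MATLAB, so purely as a schematic analogue, the four components fit together roughly like this in Python; the class and function names here are illustrative, not the toolbox's.)

import numpy as np

class ZScorePreprocessor:
    # A feature preprocessor: statistics are fit on the training set only,
    # then the same transformation is applied to both training and test data
    def fit(self, X_train):
        self.mu, self.sd = X_train.mean(axis=0), X_train.std(axis=0) + 1e-12
        return self
    def transform(self, X):
        return (X - self.mu) / self.sd

def run_cross_validation(data_source, preprocessors, classifier):
    # data_source yields (X_train, y_train, X_test, y_test) splits, e.g. pseudo-population resamples;
    # classifier is any object with sklearn-style fit and score methods
    accuracies = []
    for X_tr, y_tr, X_te, y_te in data_source:
        for fp in preprocessors:
            fp.fit(X_tr)
            X_tr, X_te = fp.transform(X_tr), fp.transform(X_te)
        classifier.fit(X_tr, y_tr)
        accuracies.append(classifier.score(X_te, y_te))
    return float(np.mean(accuracies))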

1233
00:53:54,690 --> 00:53:57,770
If you wanted to get
started on your own data,

1234
00:53:57,770 --> 00:54:00,570
you just have to put your data
in a fairly simple format.

1235
00:54:00,570 --> 00:54:03,770
It's a format I
call raster format.

1236
00:54:03,770 --> 00:54:05,120
It's just in a raster.

1237
00:54:05,120 --> 00:54:06,770
So you just have
trials going this way.

1238
00:54:06,770 --> 00:54:07,604
Time going this way.

1239
00:54:07,604 --> 00:54:09,061
And if it was
spikes, it would just

1240
00:54:09,061 --> 00:54:11,690
be the ones and zeros that
happen on the different trials.

1241
00:54:11,690 --> 00:54:14,390
If this was MEG data,
you'd have your MEG

1242
00:54:14,390 --> 00:54:16,350
actual continuous
values in there.

1243
00:54:16,350 --> 00:54:19,240
Again, trials and time.

1244
00:54:19,240 --> 00:54:20,620
Or fMRI or whatever.

1245
00:54:20,620 --> 00:54:25,010
fMRI might just be one vector
if you didn't have any time.

1246
00:54:25,010 --> 00:54:26,970
And so again, this
is just blown up.

1247
00:54:26,970 --> 00:54:27,890
This was trials.

1248
00:54:27,890 --> 00:54:28,700
This is time.

1249
00:54:28,700 --> 00:54:31,970
You can have the little
ones where a spike occurred.

1250
00:54:31,970 --> 00:54:33,800
And then what corresponds
to each trial,

1251
00:54:33,800 --> 00:54:36,500
you need to give the
labels about what happened.

1252
00:54:36,500 --> 00:54:39,220
So you'd have just something
called raster labels.

1253
00:54:39,220 --> 00:54:39,987
It's a structure.

1254
00:54:39,987 --> 00:54:42,320
And you'd say, OK, on the
first trial I showed a flower.

1255
00:54:42,320 --> 00:54:43,528
Second trial I showed a face.

1256
00:54:43,528 --> 00:54:45,677
Third trial I showed a couch.

1257
00:54:45,677 --> 00:54:47,760
And these could be numbers
or whatever you wanted.

1258
00:54:47,760 --> 00:54:49,490
But it's just indicating
different things are

1259
00:54:49,490 --> 00:54:50,810
happening in different trials.

1260
00:54:50,810 --> 00:54:53,910
And you can also have
multiple ones of these.

1261
00:54:53,910 --> 00:54:56,120
So if I want to decode
position, I also

1262
00:54:56,120 --> 00:54:57,470
have upper, middle, lower.

1263
00:54:57,470 --> 00:54:59,928
And so you can use the same
data and decode different types

1264
00:54:59,928 --> 00:55:02,060
of things from that data set.

1265
00:55:02,060 --> 00:55:03,950
And then there's this
final information

1266
00:55:03,950 --> 00:55:05,160
that's kind of optional.

1267
00:55:05,160 --> 00:55:06,530
It's just raster site info.

1268
00:55:06,530 --> 00:55:09,200
So for each site you could
have just meta information.

1269
00:55:09,200 --> 00:55:12,500
This is the recording
I made on January 14

1270
00:55:12,500 --> 00:55:15,470
and it was recorded from IT.
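
(Schematically, and only as an illustration since the actual toolbox uses MATLAB structures, raster-format data for one site might look like this; the label names stimulus_ID and position are made-up examples.)

import numpy as np

n_trials, n_ms = 60, 1000
# trials x time matrix of 0/1 spike indicators (random numbers here, just to show the shape)
raster_data = np.random.binomial(1, 0.01, size=(n_trials, n_ms))

raster_labels = {
    "stimulus_ID": ["flower", "face", "couch"] * 20,   # what was shown on each trial
    "position":    ["upper", "middle", "lower"] * 20,  # a second variable you could decode instead
}

raster_site_info = {
    "recording_date": "made-up example",
    "brain_area": "IT",
}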

1271
00:55:18,054 --> 00:55:19,970
So you just define these
three things and then

1272
00:55:19,970 --> 00:55:23,040
the toolbox is plug and play.

1273
00:55:23,040 --> 00:55:26,360
So with some experience you
should be able to do that.

1274
00:55:26,360 --> 00:55:27,540
So that's it.

1275
00:55:27,540 --> 00:55:30,410
I want to thank the Center
for Brains, Minds and Machines

1276
00:55:30,410 --> 00:55:32,180
for funding this work.

1277
00:55:32,180 --> 00:55:35,480
And all my collaborators who
collected the data or who

1278
00:55:35,480 --> 00:55:38,180
worked with me to analyze it.

1279
00:55:38,180 --> 00:55:41,840
And there is the
URL for the toolbox

1280
00:55:41,840 --> 00:55:44,290
if you want to download it.