1
00:00:01,725 --> 00:00:04,080
The following content is
provided under a Creative

2
00:00:04,080 --> 00:00:05,620
Commons license.

3
00:00:05,620 --> 00:00:07,920
Your support will help
MIT OpenCourseWare

4
00:00:07,920 --> 00:00:12,280
continue to offer high quality,
educational resources for free.

5
00:00:12,280 --> 00:00:14,910
To make a donation or
view additional materials

6
00:00:14,910 --> 00:00:18,840
from hundreds of MIT courses,
visit MIT OpenCourseWare

7
00:00:18,840 --> 00:00:19,760
at ocw.mit.edu.

8
00:00:22,322 --> 00:00:24,030
TOMER ULLMAN: So yeah,
I'm going to spend

9
00:00:24,030 --> 00:00:26,700
the rest of the tutorial
talking about Amazon Mechanical

10
00:00:26,700 --> 00:00:30,420
Turk, although some of the stuff
just applies to, in general,

11
00:00:30,420 --> 00:00:33,030
sort of making large scale
experiments on people

12
00:00:33,030 --> 00:00:35,320
and crowds and things like that.

13
00:00:35,320 --> 00:00:36,630
There are other alternatives.

14
00:00:36,630 --> 00:00:38,745
I think Google has
some in-house things.

15
00:00:38,745 --> 00:00:40,980
There's like
CrowdFlower or crowd

16
00:00:40,980 --> 00:00:42,370
clicker or things like that.

17
00:00:42,370 --> 00:00:43,870
But most of the
psychologists I know

18
00:00:43,870 --> 00:00:46,770
that are doing stuff online
and large scale things

19
00:00:46,770 --> 00:00:48,150
use Amazon Mechanical Turk.

20
00:00:48,150 --> 00:00:49,927
So I'll be talking about that.

21
00:00:49,927 --> 00:00:51,510
This is a crowdsourcing
platform which

22
00:00:51,510 --> 00:00:54,550
is designed to do all
sorts of small tasks,

23
00:00:54,550 --> 00:00:57,180
not necessarily psychophysics.

24
00:00:57,180 --> 00:01:00,130
It was invented in around
or built in around 2005.

25
00:01:00,130 --> 00:01:02,040
And the signature
tagline for it is

26
00:01:02,040 --> 00:01:03,870
"Artificial artificial
intelligence."

27
00:01:03,870 --> 00:01:05,850
So these are all sorts
of tasks that you

28
00:01:05,850 --> 00:01:07,706
wish computers could do.

29
00:01:07,706 --> 00:01:09,330
You don't want to
put people through it

30
00:01:09,330 --> 00:01:11,120
but computers can't do it yet.

31
00:01:11,120 --> 00:01:13,840
So let's have people
do it for now.

32
00:01:13,840 --> 00:01:16,200
Amazon sort of invented
it for in-house purposes

33
00:01:16,200 --> 00:01:18,180
to get rid of duplicate pages.

34
00:01:18,180 --> 00:01:20,370
They really wanted
an algorithm to do

35
00:01:20,370 --> 00:01:23,160
that for duplicate postings,
but there really wasn't one.

36
00:01:23,160 --> 00:01:25,800
So they just paid some
people very little money

37
00:01:25,800 --> 00:01:27,510
to do a lot of these pages.

38
00:01:27,510 --> 00:01:29,220
And then they figured, wait
a minute, a lot of companies

39
00:01:29,220 --> 00:01:30,636
probably want this
sort of service

40
00:01:30,636 --> 00:01:34,380
so they offered it to
the general public.

41
00:01:34,380 --> 00:01:36,472
Do people know what
the original Turk was,

42
00:01:36,472 --> 00:01:37,680
the original Mechanical Turk?

43
00:01:37,680 --> 00:01:39,660
Who doesn't know,
raise their hands.

44
00:01:39,660 --> 00:01:44,760
OK, so Mechanical Turk, the name
comes from this 18th century

45
00:01:44,760 --> 00:01:48,090
mechanical contraption
called the Turk,

46
00:01:48,090 --> 00:01:51,480
which was a chess playing
device that supposedly ran

47
00:01:51,480 --> 00:01:54,840
on clockwork and could beat some
of the finest minds in Europe.

48
00:01:54,840 --> 00:01:58,320
I think at some point it played
like the Austrian duchess

49
00:01:58,320 --> 00:02:00,570
or whatever it was the empress
and Napoleon and things

50
00:02:00,570 --> 00:02:01,119
like that.

51
00:02:01,119 --> 00:02:02,910
And it floored people,
how could this work?

52
00:02:02,910 --> 00:02:04,050
And you know, the
inventor said, well,

53
00:02:04,050 --> 00:02:05,383
I've invented a thinking device.

54
00:02:05,383 --> 00:02:06,990
And the clockwork
just solves it.

55
00:02:06,990 --> 00:02:09,840
And a lot of people of course
figured that it must be a hoax

56
00:02:09,840 --> 00:02:12,750
but they couldn't be sure
exactly how it worked.

57
00:02:12,750 --> 00:02:14,030
And it was, of course, a hoax.

58
00:02:14,030 --> 00:02:15,780
All of these gears and
boxes and clockwork

59
00:02:15,780 --> 00:02:18,660
is just designed to
distract you from the fact

60
00:02:18,660 --> 00:02:20,242
that you could fit
a person inside.

61
00:02:20,242 --> 00:02:22,200
I mean, people have
thought of that, obviously,

62
00:02:22,200 --> 00:02:24,690
but it was cleverly designed
so that you couldn't quite

63
00:02:24,690 --> 00:02:26,280
find it.

64
00:02:26,280 --> 00:02:29,280
And nobody knows for sure
because the original Turk was

65
00:02:29,280 --> 00:02:32,550
destroyed before people put
forward the exact hypothesis,

66
00:02:32,550 --> 00:02:35,200
but the consensus
now is that it must

67
00:02:35,200 --> 00:02:39,002
have been just some person
inside making the moves.

68
00:02:39,002 --> 00:02:40,460
At some point people
have suggested

69
00:02:40,460 --> 00:02:42,720
it must have been a well-trained
child or a small person

70
00:02:42,720 --> 00:02:44,845
or something like that
because the compartment must

71
00:02:44,845 --> 00:02:45,830
be really small.

72
00:02:45,830 --> 00:02:47,240
That's not thinking right now.

73
00:02:47,240 --> 00:02:51,240
It was just some more magic, but
not magic in the Harry Potter

74
00:02:51,240 --> 00:02:54,750
sense, magic in the
stage magic sense.

75
00:02:54,750 --> 00:02:56,910
So, what sorts of
tasks do people

76
00:02:56,910 --> 00:02:58,350
run on Amazon Mechanical Turk?

77
00:02:58,350 --> 00:02:59,820
Well actually, the
majority is not

78
00:02:59,820 --> 00:03:01,260
psychophysics or psychologists.

79
00:03:01,260 --> 00:03:03,426
There are a lot of companies
using Amazon Mechanical

80
00:03:03,426 --> 00:03:06,130
Turk to do things like, hey,
garner positive reviews for us.

81
00:03:06,130 --> 00:03:08,190
Go to this website and
write something nice

82
00:03:08,190 --> 00:03:10,110
about our product.

83
00:03:10,110 --> 00:03:12,150
Or you know, translate
some text for me instead

84
00:03:12,150 --> 00:03:14,010
of hiring some
professional to do it,

85
00:03:14,010 --> 00:03:16,937
most people who-- there's a lot
of people on Amazon Mechanical

86
00:03:16,937 --> 00:03:19,020
Turk that can probably
translate something for you

87
00:03:19,020 --> 00:03:21,180
from English to Spanish and
back, English to Chinese

88
00:03:21,180 --> 00:03:22,304
and back, things like that.

89
00:03:22,304 --> 00:03:23,970
And they can do it
for much less money

90
00:03:23,970 --> 00:03:26,460
than you would need to pay
a professional translator.

91
00:03:26,460 --> 00:03:29,460
Another thing that may be more
up your alley is, you know,

92
00:03:29,460 --> 00:03:32,280
you've heard a lot about
supervised training and things

93
00:03:32,280 --> 00:03:33,990
like that, and big
data sets that are

94
00:03:33,990 --> 00:03:36,530
used to train things like
convolutional neural networks.

95
00:03:36,530 --> 00:03:39,390
The supervised part usually
comes from somewhere.

96
00:03:39,390 --> 00:03:40,920
Somebody has to
tag those images.

97
00:03:40,920 --> 00:03:42,660
Somebody has to go
over a million images

98
00:03:42,660 --> 00:03:47,220
and say dog, not dog, dog, not
dog, not dog, two dogs, one

99
00:03:47,220 --> 00:03:48,900
dog.

100
00:03:48,900 --> 00:03:50,840
You want somebody to
go ahead and do that.

101
00:03:50,840 --> 00:03:52,590
And you don't have
artificial intelligence

102
00:03:52,590 --> 00:03:53,757
to do it for you, yet.

103
00:03:53,757 --> 00:03:55,590
I mean, CNNs are getting
better but someone

104
00:03:55,590 --> 00:03:56,850
had to tag it for them.

105
00:03:56,850 --> 00:03:57,870
And that's the sort of
thing that you would

106
00:03:57,870 --> 00:03:59,359
use Amazon Mechanical Turk for.

107
00:03:59,359 --> 00:04:00,150
Look at this image.

108
00:04:00,150 --> 00:04:02,400
Tell me, is there a social
interaction or isn't there?

109
00:04:02,400 --> 00:04:03,930
Now do that for the
next 100 images.

110
00:04:03,930 --> 00:04:05,640
Let's get 1,000
people to do that.

111
00:04:05,640 --> 00:04:08,100
You get the sense of like why
you would want to crowdsource

112
00:04:08,100 --> 00:04:10,680
this kind of problem.

113
00:04:10,680 --> 00:04:13,996
But there's also, you know,
just psychology, psychophysics,

114
00:04:13,996 --> 00:04:15,870
the sort of thing that
you would bring people

115
00:04:15,870 --> 00:04:20,350
into the lab to do but you don't
want to do for various reasons.

116
00:04:20,350 --> 00:04:23,089
So the idea is you could collect
a lot of people responding

117
00:04:23,089 --> 00:04:25,380
to stuff on the screen right,
exactly the sort of thing

118
00:04:25,380 --> 00:04:27,880
that you would bring them into
the lab and measure something

119
00:04:27,880 --> 00:04:29,190
that they're doing to a screen.

120
00:04:29,190 --> 00:04:31,860
You could just have them doing
that to the screen at home,

121
00:04:31,860 --> 00:04:35,290
or wherever the hell it is
that they're doing this thing.

122
00:04:35,290 --> 00:04:38,734
So let's see, you could
do things like perception.

123
00:04:38,734 --> 00:04:39,900
You know, just look at this.

124
00:04:39,900 --> 00:04:40,530
Is it a dog?

125
00:04:40,530 --> 00:04:41,100
Yes or no.

126
00:04:41,100 --> 00:04:43,980
Or rather things
like the Stroop task.

127
00:04:43,980 --> 00:04:45,660
If nobody had invented
the Stroop task,

128
00:04:45,660 --> 00:04:48,200
you could do that on Amazon
Mechanical Turk and get famous.

129
00:04:48,200 --> 00:04:51,509
I was going to say rich and
famous, but just famous.

130
00:04:51,509 --> 00:04:53,050
You could do various
attention tasks.

131
00:04:53,050 --> 00:04:55,560
You could do things like
learning and categorization

132
00:04:55,560 --> 00:04:56,130
and bias.

133
00:04:56,130 --> 00:04:58,630
Learning things like, here's
a tufa, here's a tufa,

134
00:04:58,630 --> 00:04:59,790
here's a tufa.

135
00:04:59,790 --> 00:05:01,260
Is this a tufa?

136
00:05:01,260 --> 00:05:03,250
I mean, it's very easy
to do that on a screen.

137
00:05:03,250 --> 00:05:06,180
You can do that on
Amazon Mechanical Turk.

138
00:05:06,180 --> 00:05:07,830
You could do things
like implicit bias.

139
00:05:07,830 --> 00:05:09,480
A lot of social
psychologists are

140
00:05:09,480 --> 00:05:11,515
interested in that
sort of thing.

141
00:05:11,515 --> 00:05:12,390
Let's see, what else.

142
00:05:12,390 --> 00:05:15,217
You could do things like
morality, the trolley problem.

143
00:05:15,217 --> 00:05:17,550
You could do things like
decision making, and economics,

144
00:05:17,550 --> 00:05:20,082
and prisoner dilemmas,
and making predictions,

145
00:05:20,082 --> 00:05:22,540
and tell us which one of these
two movies based on trailers

146
00:05:22,540 --> 00:05:24,970
do you think will
win the best actor

147
00:05:24,970 --> 00:05:28,446
award and things like that.

148
00:05:28,446 --> 00:05:30,820
How would you actually run
something on Amazon Mechanical

149
00:05:30,820 --> 00:05:31,150
Turk?

150
00:05:31,150 --> 00:05:33,483
The first thing you would do
is register as a requester.

151
00:05:33,483 --> 00:05:35,609
You would go to
requester.amazon.--

152
00:05:35,609 --> 00:05:36,650
I'll send you the slides.

153
00:05:36,650 --> 00:05:39,233
You can just click on that link
and it'll take you to the page

154
00:05:39,233 --> 00:05:40,540
to register as a requester.

155
00:05:40,540 --> 00:05:42,725
You would then
do one of two things.

156
00:05:42,725 --> 00:05:44,350
You could either do
the vanilla version

157
00:05:44,350 --> 00:05:46,187
which is to just use
the Amazon template.

158
00:05:46,187 --> 00:05:47,770
Amazon has made it
very simple for you

159
00:05:47,770 --> 00:05:50,560
as a requester to bring up a
new experiment and just say,

160
00:05:50,560 --> 00:05:52,660
I want to ask a simple
set of questions.

161
00:05:52,660 --> 00:05:53,250
Now go.

162
00:05:53,250 --> 00:05:55,333
And there are some parameters
that you need to set

163
00:05:55,333 --> 00:05:58,450
and I'll show you those
in a second, what they do.

164
00:05:58,450 --> 00:06:00,670
And if you're only interested
in very simple things,

165
00:06:00,670 --> 00:06:02,470
like fill out this
box or click on one

166
00:06:02,470 --> 00:06:04,653
of these two images or something
like that, that's perfect

167
00:06:04,653 --> 00:06:05,240
and it's fine.

168
00:06:05,240 --> 00:06:06,880
And Amazon takes care of
a lot of things for you

169
00:06:06,880 --> 00:06:08,110
like saving the data.

170
00:06:08,110 --> 00:06:11,500
You don't have to mess around
with like SQL databases

171
00:06:11,500 --> 00:06:13,425
and things like
that on your own.

172
00:06:13,425 --> 00:06:14,800
The other thing
that you could do

173
00:06:14,800 --> 00:06:16,960
is to point people to
an external website.

174
00:06:16,960 --> 00:06:19,750
Then you would have
to host it somehow.

175
00:06:19,750 --> 00:06:21,550
The advantage there
is that then you

176
00:06:21,550 --> 00:06:24,047
could do any sort of
fanciness that you want.

177
00:06:24,047 --> 00:06:25,630
You could show them
custom animations,

178
00:06:25,630 --> 00:06:28,870
have them play a game, record
how they're playing that game,

179
00:06:28,870 --> 00:06:31,450
send them new things based on
how they're playing that game.

180
00:06:31,450 --> 00:06:33,160
Or you could do
things like, you wait

181
00:06:33,160 --> 00:06:34,630
until you recruit two people.

182
00:06:34,630 --> 00:06:35,770
You recruit two people.

183
00:06:35,770 --> 00:06:37,020
If you just recruit
one person, it just

184
00:06:37,020 --> 00:06:38,320
says waiting, waiting, waiting.

185
00:06:38,320 --> 00:06:39,370
Now you wait for
the next person.

186
00:06:39,370 --> 00:06:41,203
Now you have them pitted
against one another

187
00:06:41,203 --> 00:06:42,310
in some sort of game.

188
00:06:42,310 --> 00:06:44,061
This has become more
popular in economics.

189
00:06:44,061 --> 00:06:45,643
That's not the sort
of thing you could

190
00:06:45,643 --> 00:06:47,050
do with the Amazon template.

191
00:06:47,050 --> 00:06:48,640
But if you're good
at coding or you

192
00:06:48,640 --> 00:06:50,590
can hire someone that's
reasonable at coding,

193
00:06:50,590 --> 00:06:52,923
you can do that by pointing
them to an external website.

194
00:06:52,923 --> 00:06:56,050
And we can show some
examples of that.
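
[Editor's sketch] The "wait until you recruit two people" idea above can be illustrated with a tiny waiting room that an external website might keep on its server. This is only a minimal sketch: the class name `WaitingRoom`, the `join` method, and the worker IDs are hypothetical, and a real site would also need to handle people who leave while waiting.

```python
from collections import deque


class WaitingRoom:
    """Holds participants until a second one arrives, then pairs them."""

    def __init__(self):
        self._waiting = deque()

    def join(self, worker_id):
        """Return a (first, second) pair if a partner is waiting, else None."""
        if self._waiting:
            partner = self._waiting.popleft()
            return (partner, worker_id)
        # Nobody to play against yet: show "waiting, waiting, waiting".
        self._waiting.append(worker_id)
        return None


room = WaitingRoom()
first = room.join("worker_A")   # no partner yet
second = room.join("worker_B")  # paired with worker_A
```

The first caller gets `None` and keeps waiting; the second caller gets the pair, at which point the game between the two can start.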

195
00:06:56,050 --> 00:06:59,324
Then once you decide-- you
register as a requester.

196
00:06:59,324 --> 00:07:01,240
You decide which one of
these things you want.

197
00:07:01,240 --> 00:07:04,030
You build your website, either
you build the external website

198
00:07:04,030 --> 00:07:06,730
or you just use the
Amazon template.

199
00:07:06,730 --> 00:07:08,860
And then you test
it on a sandbox.

200
00:07:08,860 --> 00:07:11,430
Don't run your experiments
live immediately.

201
00:07:11,430 --> 00:07:13,530
I'll be giving sort of
tips throughout the thing.

202
00:07:13,530 --> 00:07:16,210
This might be redundant for
some of you but not redundant

203
00:07:16,210 --> 00:07:17,420
for others.

204
00:07:17,420 --> 00:07:19,930
So there's a sandbox
where you can just

205
00:07:19,930 --> 00:07:21,940
run it and get sort
of false responses

206
00:07:21,940 --> 00:07:24,640
that don't count exactly where
people can sort of fill it in.

207
00:07:24,640 --> 00:07:26,556
You might want to use
that before you go live

208
00:07:26,556 --> 00:07:28,150
with 1,000 people
and say, oh god,

209
00:07:28,150 --> 00:07:30,820
I miscoded that variable
and nothing is working.

210
00:07:30,820 --> 00:07:33,097
So test it ahead of time.
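
[Editor's sketch] For requesters who script their experiments rather than use the web page, one way to make "sandbox first" the default is to have the code choose the sandbox unless told otherwise. The two URLs below are, to the best of my knowledge, the requester API endpoints Amazon documents for the sandbox and the live site; please verify them against the current documentation. The function name `endpoint` and the `go_live` flag are hypothetical.

```python
# Assumed endpoint URLs; check Amazon's current documentation before use.
SANDBOX = "https://mturk-requester-sandbox.us-east-1.amazonaws.com"
LIVE = "https://mturk-requester.us-east-1.amazonaws.com"


def endpoint(go_live=False):
    """Return the sandbox endpoint unless the caller explicitly goes live."""
    return LIVE if go_live else SANDBOX


test_url = endpoint()              # sandbox by default
real_url = endpoint(go_live=True)  # only when you are finally, finally done
```

Making the live site an explicit opt-in means a forgotten argument costs nothing, instead of costing 1,000 paid assignments.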

211
00:07:33,097 --> 00:07:34,930
And then, once you're
finally, finally done,

212
00:07:34,930 --> 00:07:36,010
you would submit it.

213
00:07:36,010 --> 00:07:38,180
You would just click, you
know, submit this thing.

214
00:07:38,180 --> 00:07:40,030
You would pay the
money to Amazon.

215
00:07:40,030 --> 00:07:42,940
And you also want to announce
it on several Mechanical Turk

216
00:07:42,940 --> 00:07:44,530
forums, which I'll
get to in the end.

217
00:07:44,530 --> 00:07:45,940
These are very helpful people.

218
00:07:45,940 --> 00:07:47,940
They're very nice to you
if you're nice to them.

219
00:07:47,940 --> 00:07:51,100
And it gets you a lot
more results very fast.

220
00:07:51,100 --> 00:07:54,010
OK, let's see.

221
00:07:54,010 --> 00:07:55,390
Why don't I show
you an example--

222
00:07:57,695 --> 00:08:00,070
let me show you an example of
what a requester page looks

223
00:08:00,070 --> 00:08:01,900
like, just to get a sense of it
for those of you who have not

224
00:08:01,900 --> 00:08:03,464
seen this sort of thing before.

225
00:08:03,464 --> 00:08:05,380
You can see our experiment
is almost finished.

226
00:08:05,380 --> 00:08:08,360
I asked for 100 people on that.

227
00:08:12,440 --> 00:08:14,790
So let's go to
something like create.

228
00:08:14,790 --> 00:08:16,780
So this is what the
requester page looks like.

229
00:08:16,780 --> 00:08:18,238
Here's all sorts
of projects that I

230
00:08:18,238 --> 00:08:19,990
run on Amazon Mechanical Turk.

231
00:08:19,990 --> 00:08:21,640
And you would do
something like new

232
00:08:21,640 --> 00:08:23,390
or you would copy
something from something

233
00:08:23,390 --> 00:08:24,431
old that you already did.

234
00:08:24,431 --> 00:08:25,810
Let's just edit
that and show you

235
00:08:25,810 --> 00:08:27,520
some examples of
what you can do.

236
00:08:27,520 --> 00:08:29,440
You give a name,
you know, a title

237
00:08:29,440 --> 00:08:32,100
for your own internal
title, like AI estimate.

238
00:08:32,100 --> 00:08:34,630
You then give a title that
Amazon Mechanical Turkers see,

239
00:08:34,630 --> 00:08:36,669
something like artificial
intelligence estimate,

240
00:08:36,669 --> 00:08:38,260
short psychology study.

241
00:08:38,260 --> 00:08:40,360
There, you probably want
to give a time estimate.

242
00:08:40,360 --> 00:08:43,045
Turkers would prefer it if
you give them a time estimate.

243
00:08:43,045 --> 00:08:44,169
They care about their time.

244
00:08:44,169 --> 00:08:46,630
They care about their money.

245
00:08:46,630 --> 00:08:48,049
They do like doing psychology.

246
00:08:48,049 --> 00:08:49,840
They don't like filling
in endless bubbles,

247
00:08:49,840 --> 00:08:51,640
you know, the standard
psychology things

248
00:08:51,640 --> 00:08:55,150
where you rate 100 things like,
I feel this way or that way.

249
00:08:55,150 --> 00:08:56,110
Don't do that.

250
00:08:56,110 --> 00:08:58,693
But the sort of fun psychology
they're actually on board with.

251
00:08:58,693 --> 00:09:01,810
It's much more fun than
writing show reviews.

252
00:09:01,810 --> 00:09:04,360
You might want to give it a nice
title that will entice them,

253
00:09:04,360 --> 00:09:06,190
an honest description,
like you know,

254
00:09:06,190 --> 00:09:07,690
you'll answer a few
simple questions

255
00:09:07,690 --> 00:09:09,898
or you'll watch a movie and
then answer two questions

256
00:09:09,898 --> 00:09:10,800
or things like that.

257
00:09:10,800 --> 00:09:13,690
Key words like easy, fun,
something descriptive

258
00:09:13,690 --> 00:09:15,670
about the task, like AI, short.

259
00:09:15,670 --> 00:09:17,710
Again, this is sort
of luring, and it's

260
00:09:17,710 --> 00:09:20,440
good to do that if
you're honest about it.

261
00:09:20,440 --> 00:09:22,490
Some people do things
like easy, fun,

262
00:09:22,490 --> 00:09:24,987
100 pages of filling in
bubbles and things like that.

263
00:09:24,987 --> 00:09:25,570
Don't do that.

264
00:09:25,570 --> 00:09:27,695
They publish it in the
forums, like, this is a lie.

265
00:09:27,695 --> 00:09:29,179
Don't do that.

266
00:09:29,179 --> 00:09:31,720
This is where you say how much
you want to pay per assignment

267
00:09:31,720 --> 00:09:34,290
and we'll get to how much you
should pay per assignment.

268
00:09:34,290 --> 00:09:36,010
You'll notice it's very little.

269
00:09:36,010 --> 00:09:37,900
You're paying these
people very little.

270
00:09:37,900 --> 00:09:39,910
How many assignments
you want per HIT.

271
00:09:39,910 --> 00:09:42,070
HIT is just the
name for your task.

272
00:09:42,070 --> 00:09:44,350
How much time you are
allotting them per assignment.

273
00:09:44,350 --> 00:09:46,450
So someone has
accepted your HIT, now

274
00:09:46,450 --> 00:09:49,120
how long do they
have to carry it out.

275
00:09:49,120 --> 00:09:51,070
You might think,
oh, my task only

276
00:09:51,070 --> 00:09:53,085
takes two to three minutes,
so only give people

277
00:09:53,085 --> 00:09:53,960
two to three minutes.

278
00:09:53,960 --> 00:09:55,570
I don't want them
like taking the HIT

279
00:09:55,570 --> 00:09:57,195
and then going and
drinking some coffee

280
00:09:57,195 --> 00:09:59,740
and then coming back to
it or something like that.

281
00:09:59,740 --> 00:10:01,914
Consider still giving them
a whole bunch of time,

282
00:10:01,914 --> 00:10:04,330
because a lot of the time,
you'll find that people if they

283
00:10:04,330 --> 00:10:06,440
see that there's only a
five minute mark on it,

284
00:10:06,440 --> 00:10:07,760
and if they don't
complete it in five minutes

285
00:10:07,760 --> 00:10:09,926
they're sort of concerned,
like, what is this thing?

286
00:10:09,926 --> 00:10:12,820
And if I won't get done through
it in five minutes, you know,

287
00:10:12,820 --> 00:10:15,357
I'll get disqualified
or something like that.

288
00:10:15,357 --> 00:10:16,190
Give them some time.

289
00:10:16,190 --> 00:10:19,030
Give them more than the ample
time to finish this thing.

290
00:10:19,030 --> 00:10:21,190
You can, yourself,
keep around some timer

291
00:10:21,190 --> 00:10:23,315
within the external website
or something like that,

292
00:10:23,315 --> 00:10:25,810
are they actually actively
participating right now.

293
00:10:25,810 --> 00:10:28,000
How long will this
HIT stay up for?

294
00:10:28,000 --> 00:10:30,850
And auto approve,
and things like that.
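
[Editor's sketch] The settings walked through above (title, description, keywords, pay per assignment, number of assignments, time allotted, how long the HIT stays up, auto approve) can be collected in one place. The dictionary keys below follow the parameter names of the `create_hit` call in boto3's MTurk client as I understand them; treat that mapping as an assumption and check the library documentation. The class `HitSettings` and all example values (the $0.50 reward, 30 minutes, 7 days, 2 days) are hypothetical, and nothing here contacts Amazon.

```python
from dataclasses import dataclass


@dataclass
class HitSettings:
    title: str              # what Turkers see; include a time estimate
    description: str        # an honest description of the task
    keywords: list          # e.g. easy, fun, short
    reward_usd: str         # pay per assignment, as a string
    assignments: int        # how many people you want
    minutes_allotted: int   # be generous; more than ample time
    days_listed: int        # how long the HIT stays up
    auto_approve_days: int  # approve quickly: a day or two

    def as_request(self):
        """Convert to keyword arguments with durations in seconds."""
        return {
            "Title": self.title,
            "Description": self.description,
            "Keywords": ", ".join(self.keywords),
            "Reward": self.reward_usd,
            "MaxAssignments": self.assignments,
            "AssignmentDurationInSeconds": self.minutes_allotted * 60,
            "LifetimeInSeconds": self.days_listed * 24 * 3600,
            "AutoApprovalDelayInSeconds": self.auto_approve_days * 24 * 3600,
        }


settings = HitSettings(
    title="Artificial intelligence estimate, short psychology study (3 min)",
    description="You'll watch a movie and then answer two questions.",
    keywords=["easy", "fun", "AI", "short"],
    reward_usd="0.50",
    assignments=100,
    minutes_allotted=30,
    days_listed=7,
    auto_approve_days=2,
)
request = settings.as_request()
```

Note that the allotted time (30 minutes) is deliberately much longer than the expected task time, following the advice above.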

295
00:10:30,850 --> 00:10:33,395
I'll talk a little bit
about rewards and incentives

296
00:10:33,395 --> 00:10:34,270
and things like that.

297
00:10:34,270 --> 00:10:37,479
Obviously, they care a lot
about things like money.

298
00:10:37,479 --> 00:10:39,520
They care a lot about
things like how much you're

299
00:10:39,520 --> 00:10:40,160
going to pay them.

300
00:10:40,160 --> 00:10:42,034
They care about doing
it quickly so that they

301
00:10:42,034 --> 00:10:43,720
can move on and get more money.

302
00:10:43,720 --> 00:10:45,370
They care about it
being somewhat fun,

303
00:10:45,370 --> 00:10:47,270
but that's not such
a big deal for them.

304
00:10:47,270 --> 00:10:49,510
And they care about getting
it approved quickly.

305
00:10:49,510 --> 00:10:52,210
OK, so you as a
requester, you will

306
00:10:52,210 --> 00:10:55,110
get reviewed on various
forums and things like that.

307
00:10:55,110 --> 00:10:57,610
If you get bad reviews, people
don't want to do your things.

308
00:10:57,610 --> 00:11:00,370
One of the things that people
care a lot about is something

309
00:11:00,370 --> 00:11:02,157
like getting approved quickly.

310
00:11:02,157 --> 00:11:04,490
And quickly can be in a day
or two, something like that.

311
00:11:04,490 --> 00:11:07,031
They don't want to have to wait
two weeks for that $0.50 that

312
00:11:07,031 --> 00:11:09,160
you were supposed to give them.

313
00:11:09,160 --> 00:11:10,570
You can do that if you want to.

314
00:11:10,570 --> 00:11:12,160
You have the power
as the requester.

315
00:11:12,160 --> 00:11:14,770
But if you want to
incentivize people,

316
00:11:14,770 --> 00:11:16,810
try to make sure that
you approve them quickly

317
00:11:16,810 --> 00:11:19,990
and let them know that you're
going to approve them quickly.

318
00:11:19,990 --> 00:11:23,830
So that was just a
very general statement.

319
00:11:23,830 --> 00:11:25,680
So who are the
sort of people that

320
00:11:25,680 --> 00:11:27,032
are on Amazon Mechanical Turk?

321
00:11:27,032 --> 00:11:29,490
Have people read these sort of
papers and things like that?

322
00:11:29,490 --> 00:11:30,580
Do you know more or less?

323
00:11:30,580 --> 00:11:31,205
Some of you do.

324
00:11:31,205 --> 00:11:32,590
Some of you don't.

325
00:11:32,590 --> 00:11:34,664
The US and India
make up about 80%--

326
00:11:34,664 --> 00:11:35,830
by the way, this is in flux.

327
00:11:35,830 --> 00:11:37,920
Like a study came out two years
ago about this sort of thing.

328
00:11:37,920 --> 00:11:39,080
It's changed since then.

329
00:11:39,080 --> 00:11:42,819
But the study two years ago, the
US and India make up about 80%,

330
00:11:42,819 --> 00:11:44,860
with the US taking up
50-something percent, India

331
00:11:44,860 --> 00:11:46,195
taking up the rest.

332
00:11:46,195 --> 00:11:48,820
There are slightly more females
than males on Amazon Mechanical

333
00:11:48,820 --> 00:11:50,920
Turk, and at least
the US population

334
00:11:50,920 --> 00:11:54,310
is biased towards young and
educated people, educated

335
00:11:54,310 --> 00:11:56,110
meaning a bachelor's degree.

336
00:11:56,110 --> 00:11:57,880
It's certainly
more representative

337
00:11:57,880 --> 00:12:00,130
of the general population
than just hiring,

338
00:12:00,130 --> 00:12:03,010
you know, college students
during their bachelor's degree.

339
00:12:03,010 --> 00:12:06,310
But it is still skewed,
keep that in mind.

340
00:12:06,310 --> 00:12:08,560
It's skewed towards, basically,
the sort of population

341
00:12:08,560 --> 00:12:11,150
that you would expect to find
on the internet in the United

342
00:12:11,150 --> 00:12:11,650
States.

343
00:12:11,650 --> 00:12:15,542
OK, so somewhat younger
people, somewhat more educated.

344
00:12:15,542 --> 00:12:17,500
In general, you might
want to look at something

345
00:12:17,500 --> 00:12:20,070
like Mason and
Siddharth were looking

346
00:12:20,070 --> 00:12:21,070
at these sort of things.

347
00:12:21,070 --> 00:12:22,940
I can send you the links later.

348
00:12:22,940 --> 00:12:24,690
I was talking a little
bit about payments.

349
00:12:24,690 --> 00:12:26,190
There's been a whole
load of studies

350
00:12:26,190 --> 00:12:29,920
looking at querying the
Amazon Mechanical Turk pool.

351
00:12:29,920 --> 00:12:30,460
Who are you?

352
00:12:30,460 --> 00:12:31,376
What's your education?

353
00:12:31,376 --> 00:12:32,680
Why are you doing this?

354
00:12:32,680 --> 00:12:34,500
And then they ask why are you
doing this in various ways.

355
00:12:34,500 --> 00:12:35,400
Are you doing it for fun?

356
00:12:35,400 --> 00:12:36,460
Are you doing it to kill time?

357
00:12:36,460 --> 00:12:37,780
Are you doing this as
a supplementary thing?

358
00:12:37,780 --> 00:12:38,840
They're in it for money.

359
00:12:38,840 --> 00:12:40,798
It's very obvious that
they're in it for money.

360
00:12:40,798 --> 00:12:41,980
That's OK.

361
00:12:41,980 --> 00:12:45,220
And keep that in mind
when you post HITs.

362
00:12:45,220 --> 00:12:48,090
You have a lot of
power as a requester.

363
00:12:48,090 --> 00:12:49,090
You have a ton of power.

364
00:12:49,090 --> 00:12:51,256
You have a ton of power to
dictate the terms of what

365
00:12:51,256 --> 00:12:52,300
you're going to pay them.

366
00:12:52,300 --> 00:12:54,600
You have a ton of power
to reject their work.

367
00:12:54,600 --> 00:12:57,100
Once they basically do the HIT
for you, you can then go back

368
00:12:57,100 --> 00:12:59,200
and say, you failed
this question

369
00:12:59,200 --> 00:13:00,970
or you didn't quite
get what we wanted,

370
00:13:00,970 --> 00:13:03,130
or you actually did the
study two months ago

371
00:13:03,130 --> 00:13:05,999
but I didn't implement checks
for that, I just asked you,

372
00:13:05,999 --> 00:13:08,290
did you take the study before
and they didn't remember.

373
00:13:08,290 --> 00:13:09,159
Something like that.

374
00:13:09,159 --> 00:13:10,450
And then you reject their work.

375
00:13:10,450 --> 00:13:12,320
And if you reject it too many
times, then they get banned.

376
00:13:12,320 --> 00:13:13,970
First of all, if you
reject, they don't get paid.

377
00:13:13,970 --> 00:13:15,980
If you reject it too many
times, they get banned.

378
00:13:15,980 --> 00:13:18,396
These are people that are doing
it either as supplementary

379
00:13:18,396 --> 00:13:21,210
or as their main income.

380
00:13:21,210 --> 00:13:23,617
I don't know-- this may
seem obvious to some of you.

381
00:13:23,617 --> 00:13:25,783
If this doesn't seem obvious
to at least one of you,

382
00:13:25,783 --> 00:13:30,860
then I'll count it as
worthwhile to stress this.

383
00:13:30,860 --> 00:13:33,020
You don't care about the $0.20.

384
00:13:33,020 --> 00:13:35,292
These people do care
about the $0.20.

385
00:13:35,292 --> 00:13:37,250
And again, it's not
because they're necessarily

386
00:13:37,250 --> 00:13:39,010
from poor economical
backgrounds,

387
00:13:39,010 --> 00:13:40,760
but they're in it for
the money and that's

388
00:13:40,760 --> 00:13:42,410
what they're doing this for.

389
00:13:42,410 --> 00:13:44,854
Try to give them fair pay.

390
00:13:44,854 --> 00:13:46,520
And we'll stress that
again and tell you

391
00:13:46,520 --> 00:13:48,080
what I mean by fair pay.

392
00:13:48,080 --> 00:13:49,520
Try not to reject them.

393
00:13:49,520 --> 00:13:51,620
OK, except in
extreme situations.

394
00:13:51,620 --> 00:13:52,520
Even if they failed.

395
00:13:52,520 --> 00:13:54,311
Even if they didn't do
your catch question.

396
00:13:54,311 --> 00:13:56,480
Even if you think that they
just zoomed through this

397
00:13:56,480 --> 00:13:58,490
or something like that,
that's usually on you

398
00:13:58,490 --> 00:14:01,130
to catch that, as a
psychology researcher.

399
00:14:01,130 --> 00:14:02,190
You're not a company.

400
00:14:02,190 --> 00:14:04,040
You're a psychology researcher.

401
00:14:04,040 --> 00:14:05,600
Try not to reject people.

402
00:14:05,600 --> 00:14:07,730
Make sure you have some
ways set up ahead of time,

403
00:14:07,730 --> 00:14:10,040
and I'll get to that,
to know who to reject.

404
00:14:10,040 --> 00:14:13,100
But don't actually reject
people except in really extreme

405
00:14:13,100 --> 00:14:16,204
situations.
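
[Editor's sketch] One way to follow the advice above is to separate "who gets paid" from "whose data gets analyzed": everyone is approved, and a failed catch question only excludes the data. This is a minimal sketch; the function name `review` and the record fields `worker_id` and `passed_catch` are hypothetical, as are the sample submissions.

```python
def review(submissions):
    """Approve everyone for payment; keep only catch-question passes for analysis."""
    to_pay, to_analyze = [], []
    for s in submissions:
        to_pay.append(s["worker_id"])          # nobody is rejected here
        if s["passed_catch"]:
            to_analyze.append(s["worker_id"])  # exclusion happens in the data
    return to_pay, to_analyze


submissions = [
    {"worker_id": "w1", "passed_catch": True},
    {"worker_id": "w2", "passed_catch": False},
    {"worker_id": "w3", "passed_catch": True},
]
to_pay, to_analyze = review(submissions)
```

Here all three workers are paid, but only `w1` and `w3` enter the analysis; actual rejection would be a separate, manual step reserved for the extreme cases.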

406
00:14:16,204 --> 00:14:18,620
Something about payments is
that Amazon takes up about 10%.

407
00:14:18,620 --> 00:14:22,940
They actually raised this to
20% to 40% of the payments.

408
00:14:22,940 --> 00:14:24,440
And this is what I
was going to say.

409
00:14:24,440 --> 00:14:26,898
There have been some attempts
within the psychologists that

410
00:14:26,898 --> 00:14:28,910
have been doing Amazon
Mechanical Turk studies,

411
00:14:28,910 --> 00:14:31,380
and this has also been fueled
by the community of Amazon

412
00:14:31,380 --> 00:14:34,640
Mechanical Turkers, or
Turkers, to establish some sort

413
00:14:34,640 --> 00:14:37,430
of guidelines for minimum pay.

414
00:14:37,430 --> 00:14:38,990
So if people come
into the lab, there

415
00:14:38,990 --> 00:14:41,323
are some guidelines on what
you're supposed to pay them.

416
00:14:41,323 --> 00:14:43,170
There are no exact guidelines.

417
00:14:43,170 --> 00:14:46,314
There's no enforcing
guidelines, at least not--

418
00:14:46,314 --> 00:14:48,230
maybe there are within
particular universities

419
00:14:48,230 --> 00:14:50,676
but there's no cross
university one guideline

420
00:14:50,676 --> 00:14:52,550
to tell you you have to
pay people this much.

421
00:14:52,550 --> 00:14:54,000
And it might be
tempting to say,

422
00:14:54,000 --> 00:14:55,580
oh, I'll just pay
people, you know,

423
00:14:55,580 --> 00:14:58,100
the minimum I can get away with.

424
00:14:58,100 --> 00:15:00,680
I mean, if I can pay people
$0.05 to do a 10 minute task,

425
00:15:00,680 --> 00:15:01,310
I'll do that.

426
00:15:01,310 --> 00:15:03,350
Fine, I can get the
20 subjects I need.

427
00:15:03,350 --> 00:15:05,240
It's a free market.

428
00:15:05,240 --> 00:15:09,470
We're not trying to live here in
some sort of capitalist fantasy

429
00:15:09,470 --> 00:15:10,140
of some sort.

430
00:15:10,140 --> 00:15:12,807
I'm not going to get into
economics too much because I

431
00:15:12,807 --> 00:15:15,140
think you guys know that
better than I do. You don't need

432
00:15:15,140 --> 00:15:16,880
my lecturing in that sense.

433
00:15:16,880 --> 00:15:20,540
But a lot of people who have
looked into the ethics of this

434
00:15:20,540 --> 00:15:22,760
recommend that you try
to estimate ahead of time

435
00:15:22,760 --> 00:15:26,240
through a pilot how long
this task is going to take.

436
00:15:26,240 --> 00:15:30,950
Based on that, pay them such
that it matches minimum wage.

437
00:15:30,950 --> 00:15:32,930
Minimum wage being
somewhat in flux, like,

438
00:15:32,930 --> 00:15:35,182
you know, I forget if
it's like $10 an hour

439
00:15:35,182 --> 00:15:36,140
or something like that.

440
00:15:36,140 --> 00:15:37,348
It's probably less than that.

441
00:15:37,348 --> 00:15:39,399
But something like that.

442
00:15:39,399 --> 00:15:40,940
I'm not going to
tell you exactly how

443
00:15:40,940 --> 00:15:41,981
much you should pay them.

444
00:15:41,981 --> 00:15:44,840
But try to figure out more or
less minimum wage, more or less

445
00:15:44,840 --> 00:15:47,200
how long your task takes and
pay them according to that.
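
A minimal sketch of the pay arithmetic described above, not an official calculator. The function names are hypothetical, the $9.00 hourly wage is an assumed placeholder (the talk gives no exact figure), and the 20% fee is the low end of the 20% to 40% range mentioned earlier. Amounts are in whole cents to avoid rounding surprises.

```python
import math

def reward_cents(pilot_minutes, hourly_wage_cents):
    """Smallest whole-cent reward that meets the target hourly wage."""
    return math.ceil(pilot_minutes * hourly_wage_cents / 60)

def implied_hourly_cents(reward, minutes):
    """Hourly rate (in cents) implied by a reward for a task of this length."""
    return reward * 60 / minutes

def total_cost_cents(reward, n_participants, fee_percent):
    """Requester's total cost, with the platform fee added on top of payments."""
    return math.ceil(reward * n_participants * (100 + fee_percent) / 100)

# Assumed figures, for illustration only: a 10 minute pilot, $9.00 per hour.
reward = reward_cents(pilot_minutes=10, hourly_wage_cents=900)
print(reward)                             # 150 cents, i.e. $1.50 per participant
print(total_cost_cents(reward, 100, 20))  # 18000 cents, i.e. $180 for 100 people
print(implied_hourly_cents(5, 10))        # 30.0 cents per hour for $0.05 per 10 minutes
```

The last line shows why the $0.05 for 10 minutes example is a problem: it works out to about $0.30 an hour.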

446
00:15:49,919 --> 00:15:51,710
Some general advantages
of Mechanical Turk,

447
00:15:51,710 --> 00:15:55,430
in case you guys have not been
persuaded yet, let me say.

448
00:15:55,430 --> 00:15:57,020
You can run large
scale experiments.

449
00:15:57,020 --> 00:15:59,405
I started running
this experiment

450
00:15:59,405 --> 00:16:01,530
on 100 people that would
have taken me a long time.

451
00:16:01,530 --> 00:16:03,440
It's a silly experiment
as Nori pointed out.

452
00:16:03,440 --> 00:16:04,939
It's not even exactly
an experiment.

453
00:16:04,939 --> 00:16:08,480
But I wanted to check
how people's responses

454
00:16:08,480 --> 00:16:10,430
compared to people in CBMM.

455
00:16:10,430 --> 00:16:13,290
First of all, I wouldn't
do that in the lab.

456
00:16:13,290 --> 00:16:15,497
So there's that.

457
00:16:15,497 --> 00:16:18,080
But even if I were to do it in
a lab, getting 100 participants

458
00:16:18,080 --> 00:16:19,850
would take a long, long time.

459
00:16:19,850 --> 00:16:21,800
And for your more
serious experiments,

460
00:16:21,800 --> 00:16:24,440
getting 100 participants
would take a long, long time.

461
00:16:24,440 --> 00:16:26,220
Each one has to come in and
you have to talk to them.

462
00:16:26,220 --> 00:16:28,553
And you have to explain to
them exactly what's going on.

463
00:16:28,553 --> 00:16:30,380
And you usually can't
run them in parallel,

464
00:16:30,380 --> 00:16:32,546
or at least you can run
only one or two in parallel.

465
00:16:32,546 --> 00:16:35,260
Here we run 100
subjects in an hour.

466
00:16:35,260 --> 00:16:36,760
And that's still amazing.

467
00:16:36,760 --> 00:16:38,900
That's still flooring me.

468
00:16:38,900 --> 00:16:41,040
And we can do it, so we
can do it very quickly.

469
00:16:41,040 --> 00:16:42,421
A lot of people very quickly.

470
00:16:42,421 --> 00:16:44,420
And what you can do with
large scale experiments

471
00:16:44,420 --> 00:16:47,342
is usually test some very
fine-grained things in your model.

472
00:16:47,342 --> 00:16:48,800
If your model has
some things like,

473
00:16:48,800 --> 00:16:51,260
well, I need to show people
all of these different things

474
00:16:51,260 --> 00:16:52,490
and make all of these
different predictions.

475
00:16:52,490 --> 00:16:54,500
And I just need 300
people to do that.

476
00:16:54,500 --> 00:16:56,960
Or for example, what my dad
was presenting yesterday,

477
00:16:56,960 --> 00:16:58,292
these minimal images.

478
00:16:58,292 --> 00:16:59,750
Did he mention how
many people they

479
00:16:59,750 --> 00:17:01,416
had to recruit on
Amazon Mechanical Turk

480
00:17:01,416 --> 00:17:03,320
to do those minimal images?

481
00:17:03,320 --> 00:17:04,180
Yes?

482
00:17:04,180 --> 00:17:05,900
It's like it's thousands.

483
00:17:05,900 --> 00:17:08,949
I think it's over 10,000 at this
point, or something like that.

484
00:17:08,949 --> 00:17:10,490
And the reason is
because once you've

485
00:17:10,490 --> 00:17:12,874
seen that thing, once you've
seen the minimal image,

486
00:17:12,874 --> 00:17:14,040
you already know what it is.

487
00:17:14,040 --> 00:17:14,869
You're biased.

488
00:17:14,869 --> 00:17:17,660
You know it's a horse even
though a naive participant

489
00:17:17,660 --> 00:17:18,950
wouldn't know it's a horse.

490
00:17:18,950 --> 00:17:21,140
So you want to make sure
that you want 10,000

491
00:17:21,140 --> 00:17:22,599
participants for this thing.

492
00:17:22,599 --> 00:17:24,890
You're not going to get 10,000
participants in the lab.

493
00:17:24,890 --> 00:17:27,329
No way.

494
00:17:27,329 --> 00:17:29,330
So another thing
is that, as I said,

495
00:17:29,330 --> 00:17:31,040
even if you're paying
people minimum wage

496
00:17:31,040 --> 00:17:32,360
and things like that,
it's still pretty cheap

497
00:17:32,360 --> 00:17:33,420
to get 100 subjects.

498
00:17:33,420 --> 00:17:34,430
It's cheapish.

499
00:17:34,430 --> 00:17:37,100
The ish is because you should
pay people some minimum wage.

500
00:17:37,100 --> 00:17:40,070
It's replicable, ish.

501
00:17:40,070 --> 00:17:43,202
What I mean here by replicable
is not what you might think,

502
00:17:43,202 --> 00:17:44,660
which I'll get to
in another slide.

503
00:17:44,660 --> 00:17:47,810
It's just if you want to hand
it off to another person.

504
00:17:47,810 --> 00:17:50,670
Someone says, I don't quite
understand your protocol,

505
00:17:50,670 --> 00:17:53,330
or I don't quite believe it, or
I want to tweak it in some way.

506
00:17:53,330 --> 00:17:55,460
It's much harder, usually,
with lab protocols and things

507
00:17:55,460 --> 00:17:55,960
like that.

508
00:17:55,960 --> 00:17:58,490
We certainly know that
in baby experiments.

509
00:17:58,490 --> 00:18:01,579
Wouldn't it be nice if we
could just port, you know,

510
00:18:01,579 --> 00:18:03,620
I won't say who because
it doesn't really matter,

511
00:18:03,620 --> 00:18:04,940
but some experiments of people.

512
00:18:04,940 --> 00:18:06,648
They describe their
methods in the paper.

513
00:18:06,648 --> 00:18:07,773
It's not really that great.

514
00:18:07,773 --> 00:18:09,314
Wouldn't it be great
if we could just

515
00:18:09,314 --> 00:18:11,990
copy paste their experiment
and run it with some tweaking.

516
00:18:11,990 --> 00:18:13,550
With this sort of
thing, you can.

517
00:18:13,550 --> 00:18:15,530
I mean, you need to be somewhat
on good terms with the person

518
00:18:15,530 --> 00:18:17,240
you're asking for, but they can
just tell you, oh yeah, sure.

519
00:18:17,240 --> 00:18:19,220
Here's the code for my website.

520
00:18:19,220 --> 00:18:20,450
Just run it again.

521
00:18:20,450 --> 00:18:23,990
Run it with your tweaks
and things like that.

522
00:18:23,990 --> 00:18:27,555
There's an ish there and
I'll get to it in a second.

523
00:18:27,555 --> 00:18:29,680
The participant pool, as
I said, it's more diverse.

524
00:18:29,680 --> 00:18:31,820
So I was harping
before about this point

525
00:18:31,820 --> 00:18:35,210
that the pool doesn't quite
represent the US in general,

526
00:18:35,210 --> 00:18:37,760
it's more like the US
population on the internet.

527
00:18:37,760 --> 00:18:39,920
That's still a lot more
diverse than recruiting

528
00:18:39,920 --> 00:18:41,570
college students.

529
00:18:41,570 --> 00:18:44,510
It's a lot more diverse, let's
see, I have here, with respect

530
00:18:44,510 --> 00:18:47,721
to gender, socioeconomic status,
geographic region, and age.

531
00:18:47,721 --> 00:18:49,220
On all these things
that people have

532
00:18:49,220 --> 00:18:50,594
tested on Amazon
Mechanical Turk,

533
00:18:50,594 --> 00:18:52,089
the sample is a
lot more diverse,

534
00:18:52,089 --> 00:18:54,380
is a lot more representative
of the general population,

535
00:18:54,380 --> 00:18:56,750
is a lot less WEIRD.

536
00:18:56,750 --> 00:19:02,390
WEIRD being like Western,
educated, industrialized, rich,

537
00:19:02,390 --> 00:19:04,370
democracies, which is
usually the pool that's

538
00:19:04,370 --> 00:19:05,930
been studied in psychology.

539
00:19:05,930 --> 00:19:07,970
These pools are a
lot more diverse.

540
00:19:07,970 --> 00:19:09,977
They're not diverse
enough for certain things

541
00:19:09,977 --> 00:19:12,560
in social psychology, and that's
this paper by Weinberg, where

542
00:19:12,560 --> 00:19:15,710
he says, you know, sometimes
social psychologists really,

543
00:19:15,710 --> 00:19:17,630
really, really want to
control for making sure

544
00:19:17,630 --> 00:19:20,070
that age is not a factor,
or something like that.

545
00:19:20,070 --> 00:19:22,430
Or they really want to get
at what the population is,

546
00:19:22,430 --> 00:19:24,930
or age they think plays a
factor, or something like that.

547
00:19:24,930 --> 00:19:26,929
So you need a population
where they call it

548
00:19:26,929 --> 00:19:28,220
sort of like knowledge experts.

549
00:19:28,220 --> 00:19:29,666
You build some pool.

550
00:19:29,666 --> 00:19:31,040
You build some
pool that you say,

551
00:19:31,040 --> 00:19:32,990
OK, the reason
this pool exists is

552
00:19:32,990 --> 00:19:35,040
because it's representative
of the population.

553
00:19:35,040 --> 00:19:36,140
And now we're going
to go to this pool

554
00:19:36,140 --> 00:19:38,030
and just try them again
and again and again

555
00:19:38,030 --> 00:19:39,330
on many different experiments.

556
00:19:39,330 --> 00:19:41,240
It's sort of like
your, you know,

557
00:19:41,240 --> 00:19:44,120
not exactly private but shared
between some universities

558
00:19:44,120 --> 00:19:48,309
pool of social
psychology participants.

559
00:19:48,309 --> 00:19:49,850
They've tested that
and they've shown

560
00:19:49,850 --> 00:19:51,266
that Mechanical
Turkers are better

561
00:19:51,266 --> 00:19:53,990
than those pools in terms
of things like attention,

562
00:19:53,990 --> 00:19:55,430
filling out the
correct responses,

563
00:19:55,430 --> 00:19:56,360
and things like that.

564
00:19:56,360 --> 00:19:58,170
So yay Mechanical Turk.

565
00:19:58,170 --> 00:20:00,770
But there are some things like
implicit biases and things

566
00:20:00,770 --> 00:20:02,480
that social
psychologists care about,

567
00:20:02,480 --> 00:20:05,722
where you don't know if the
effect is something like age,

568
00:20:05,722 --> 00:20:06,680
or something like that.

569
00:20:06,680 --> 00:20:08,513
I don't know if this
matters to a lot of you

570
00:20:08,513 --> 00:20:11,110
but it's important to keep in
mind for those of you who do.

571
00:20:11,110 --> 00:20:12,860
Here's this point about
will it replicate.

572
00:20:12,860 --> 00:20:14,540
You know, some people
who are being introduced

573
00:20:14,540 --> 00:20:16,880
to Amazon Mechanical Turk,
or thinking about it, usually

574
00:20:16,880 --> 00:20:18,180
say, yeah, that's fine.

575
00:20:18,180 --> 00:20:20,360
But how do I know that
people are actually doing

576
00:20:20,360 --> 00:20:22,340
what will happen in the lab?

577
00:20:22,340 --> 00:20:23,744
I might do it on
Mechanical Turk,

578
00:20:23,744 --> 00:20:25,160
but then if I do
it in the lab, it

579
00:20:25,160 --> 00:20:27,000
won't replicate or
things like that.

580
00:20:27,000 --> 00:20:30,000
So people have tried a
bunch of the psychophysics

581
00:20:30,000 --> 00:20:32,360
that Leyla was talking
about before, and more.

582
00:20:32,360 --> 00:20:35,060
They tried Stroop, switching,
flanker, Simon, Posner cueing,

583
00:20:35,060 --> 00:20:37,944
attentional blink, subliminal
priming, and category learning.

584
00:20:37,944 --> 00:20:39,360
And they've done
a whole lot more.

585
00:20:39,360 --> 00:20:43,340
This is just from one
study by Crump et al.,

586
00:20:43,340 --> 00:20:45,510
where one of the et al.
is Todd Gureckis, who

587
00:20:45,510 --> 00:20:47,180
we'll get to in a second.

588
00:20:47,180 --> 00:20:49,370
And what they found was
basically replication

589
00:20:49,370 --> 00:20:51,724
on all of this classic
psychology stuff, the sort

590
00:20:51,724 --> 00:20:52,890
of effects you would expect.

591
00:20:52,890 --> 00:20:54,945
The sort of effect
sizes you would expect.

592
00:20:54,945 --> 00:20:56,570
The only thing that
was a bit different

593
00:20:56,570 --> 00:20:58,903
was in category learning where
you show people something

594
00:20:58,903 --> 00:21:00,025
like, this is a tufa.

595
00:21:00,025 --> 00:21:00,650
This is a tufa.

596
00:21:00,650 --> 00:21:01,275
This is a tufa.

597
00:21:01,275 --> 00:21:02,450
Is this a tufa?

598
00:21:02,450 --> 00:21:04,310
Where there are different
types of learning,

599
00:21:04,310 --> 00:21:06,770
type one being easier, type
two being a little harder,

600
00:21:06,770 --> 00:21:08,270
type three being much harder.

601
00:21:08,270 --> 00:21:11,830
Where the classic finding was
something like a graded thing.

602
00:21:11,830 --> 00:21:14,030
And for Amazon Mechanical
Turk, it was more--

603
00:21:14,030 --> 00:21:16,610
it was really hard for
them beyond type one.

604
00:21:16,610 --> 00:21:18,170
Now is that a failure
of replication

605
00:21:18,170 --> 00:21:19,878
or is that because
the original study was

606
00:21:19,878 --> 00:21:22,204
done on college students
who are young and educated?

607
00:21:22,204 --> 00:21:23,870
And this is actually
more representative

608
00:21:23,870 --> 00:21:26,210
of the population, but
this is harder to learn.

609
00:21:26,210 --> 00:21:27,530
I'm not quite sure.

610
00:21:27,530 --> 00:21:31,010
The takeaway here is that, yeah,
it seems like, in general, it

611
00:21:31,010 --> 00:21:33,020
will replicate,
at least certainly

612
00:21:33,020 --> 00:21:37,370
for simple perceptual stuff.

613
00:21:37,370 --> 00:21:41,260
Concerns of running things
on Amazon Mechanical Turk.

614
00:21:41,260 --> 00:21:44,150
And I can send a whole
bunch of recent papers

615
00:21:44,150 --> 00:21:45,387
that are very nice about it.

616
00:21:45,387 --> 00:21:46,970
One of them was
specifically-- there's

617
00:21:46,970 --> 00:21:48,720
been a bunch of like
New York Times papers

618
00:21:48,720 --> 00:21:50,442
on Amazon Mechanical
Turk in general.

619
00:21:50,442 --> 00:21:52,400
There's been a very recent
one a few months ago

620
00:21:52,400 --> 00:21:55,160
on using it for psychophysics
experiments in particular.

621
00:21:55,160 --> 00:21:57,520
It's called "The Internet's
Hidden Science Factory."

622
00:21:57,520 --> 00:21:59,210
It's a very nice
paper to check out.

623
00:21:59,210 --> 00:22:01,501
And they make all these points
about the sort of things

624
00:22:01,501 --> 00:22:03,700
that you probably thought
about as a researcher

625
00:22:03,700 --> 00:22:05,654
but it bears
thinking about again,

626
00:22:05,654 --> 00:22:07,070
which is, people
don't necessarily

627
00:22:07,070 --> 00:22:08,840
pay that much
attention to your task.

628
00:22:08,840 --> 00:22:10,140
You have no control.

629
00:22:10,140 --> 00:22:11,510
They're not in the lab.

630
00:22:11,510 --> 00:22:13,700
They give some quotes
there, which is, you know,

631
00:22:13,700 --> 00:22:16,169
Nancy's employees don't know--

632
00:22:16,169 --> 00:22:17,210
yeah, I think it's Nancy.

633
00:22:17,210 --> 00:22:18,350
I changed the name.

634
00:22:18,350 --> 00:22:20,030
Nancy's employers
don't know that Nancy

635
00:22:20,030 --> 00:22:22,190
works while negotiating
her toddler's milk bottles

636
00:22:22,190 --> 00:22:23,265
and giving him hugs.

637
00:22:23,265 --> 00:22:24,890
They don't know that
she's seen studies

638
00:22:24,890 --> 00:22:28,280
similar to theirs, maybe
hundreds, possibly thousands

639
00:22:28,280 --> 00:22:30,110
of times.

640
00:22:30,110 --> 00:22:32,240
So that brings us,
actually, to another thing,

641
00:22:32,240 --> 00:22:34,580
which is repeated exposure.

642
00:22:34,580 --> 00:22:37,587
This is a big concern.

643
00:22:37,587 --> 00:22:39,920
By the way, sorry, I'm going
sort of back and forth here

644
00:22:39,920 --> 00:22:41,150
because before I
leave attention,

645
00:22:41,150 --> 00:22:42,774
I want to mention
just one thing, which

646
00:22:42,774 --> 00:22:45,430
is, attention is a problem. You
want to put in attention cues.

647
00:22:45,430 --> 00:22:47,910
And I'll talk about how
to do that in a second.

648
00:22:47,910 --> 00:22:50,210
But we've had this a lot
with people in the lab.

649
00:22:50,210 --> 00:22:51,860
I'm sure that some of you
have experienced this as well.

650
00:22:51,860 --> 00:22:53,870
You put them in a room because
they need to have privacy

651
00:22:53,870 --> 00:22:56,203
while they're doing the task,
you go in to check on them

652
00:22:56,203 --> 00:22:57,769
and they're on their phone.

653
00:22:57,769 --> 00:22:58,560
This happens a lot.

654
00:22:58,560 --> 00:23:00,680
So attention is something that
you want to check in the lab

655
00:23:00,680 --> 00:23:01,040
as well.

656
00:23:01,040 --> 00:23:03,510
A lot of these concerns are
not just about Mechanical Turk,

657
00:23:03,510 --> 00:23:05,801
but it's certainly easier
for people in Mechanical Turk

658
00:23:05,801 --> 00:23:08,480
to not quite pay attention.

659
00:23:08,480 --> 00:23:11,570
Repeated exposure
is a huge problem.

660
00:23:11,570 --> 00:23:15,270
And it's a problem for
two different reasons.

661
00:23:15,270 --> 00:23:17,597
One is that it
destroys intuition.

662
00:23:17,597 --> 00:23:19,930
I was asking about the trolley
problem and you all went,

663
00:23:19,930 --> 00:23:22,000
oh, the trolley problem.

664
00:23:22,000 --> 00:23:25,170
People on Turk are doing
that even more than you are.

665
00:23:25,170 --> 00:23:27,100
They see the trolley
problem, they've seen it.

666
00:23:27,100 --> 00:23:29,290
I guarantee you,
it's very difficult

667
00:23:29,290 --> 00:23:31,624
to find a Turker that has
not seen the trolley problem.

668
00:23:31,624 --> 00:23:32,290
They've seen it.

669
00:23:32,290 --> 00:23:34,180
They've seen it 1,000 times.

670
00:23:34,180 --> 00:23:35,876
They've seen all the variations.

671
00:23:35,876 --> 00:23:37,000
And they complain about it.

672
00:23:37,000 --> 00:23:37,690
And they're satiated.

673
00:23:37,690 --> 00:23:38,470
And they're sick of it.

674
00:23:38,470 --> 00:23:40,600
They will say things like, if
I see one more trolley problem,

675
00:23:40,600 --> 00:23:41,404
just kill them all.

676
00:23:41,404 --> 00:23:43,570
Is there a way to kill the
five and the other person

677
00:23:43,570 --> 00:23:44,861
on the other side of the track?

678
00:23:44,861 --> 00:23:46,420
I don't care anymore.

679
00:23:46,420 --> 00:23:48,190
OK, they're completely satiated.

680
00:23:48,190 --> 00:23:50,565
It's kind of like saying,
hammer, hammer, hammer, hammer.

681
00:23:50,565 --> 00:23:52,300
It loses all meaning
at some point.

682
00:23:52,300 --> 00:23:54,850
You don't have the gut
intuitive response.

683
00:23:54,850 --> 00:23:56,406
And even if they're
doing their best,

684
00:23:56,406 --> 00:23:58,030
even if they're not
trying to fool you,

685
00:23:58,030 --> 00:24:00,488
even if they're honestly trying
to answer, they just can't.

686
00:24:00,488 --> 00:24:02,680
They don't have the
gut intuitive response

687
00:24:02,680 --> 00:24:05,560
anymore for the stuff
that you're asking them.

688
00:24:05,560 --> 00:24:09,362
Some ways to get around that
is to try even simple changes.

689
00:24:09,362 --> 00:24:11,320
Just don't call it the
trolley problem anymore.

690
00:24:11,320 --> 00:24:12,880
Set up something
else, which is not

691
00:24:12,880 --> 00:24:18,160
5 versus 1, which is 10 versus
2 and it involves pineapples.

692
00:24:18,160 --> 00:24:18,890
Like something.

693
00:24:18,890 --> 00:24:22,270
You know, these small
changes can matter a lot.

694
00:24:22,270 --> 00:24:24,390
So that's one thing
about repeated exposure,

695
00:24:24,390 --> 00:24:28,176
that it destroys intuition
and related to that,

696
00:24:28,176 --> 00:24:29,800
there's something
called super Turkers.

697
00:24:29,800 --> 00:24:31,716
These are people that,
you know, 1% of Turkers

698
00:24:31,716 --> 00:24:34,210
is responsible for about
10% to 15% of all studies.

699
00:24:34,210 --> 00:24:36,307
So when I say people
have seen it a lot,

700
00:24:36,307 --> 00:24:37,390
that's part of the reason.

701
00:24:37,390 --> 00:24:39,301
There's a lot of people doing--

702
00:24:39,301 --> 00:24:41,050
the same small group
of people is probably

703
00:24:41,050 --> 00:24:43,750
responsible for a
lot of these studies.

704
00:24:43,750 --> 00:24:47,160
The other reason that repeated
exposure ruins things for us

705
00:24:47,160 --> 00:24:53,040
is because, you know, it just
ruins basic correlations.

706
00:24:53,040 --> 00:24:54,730
So let me
give you an example.

707
00:24:54,730 --> 00:24:58,450
Who here has heard of the
ball and bat question?

708
00:24:58,450 --> 00:25:00,640
Who here has not heard of
the ball and bat question?

709
00:25:00,640 --> 00:25:02,560
OK, let me pose it to you.

710
00:25:02,560 --> 00:25:05,280
Those of you who suddenly
say, oh yeah, I know this.

711
00:25:05,280 --> 00:25:06,527
Sh.

712
00:25:06,527 --> 00:25:08,860
I'm interested in people who
have not heard this before.

713
00:25:08,860 --> 00:25:12,040
So it's a simple question.

714
00:25:12,040 --> 00:25:16,190
A ball and a bat
together cost $1.10.

715
00:25:16,190 --> 00:25:19,510
The bat costs $1
more than the ball.

716
00:25:19,510 --> 00:25:21,900
How much does the ball cost?

717
00:25:21,900 --> 00:25:22,900
I'll explain that again.

718
00:25:22,900 --> 00:25:25,930
A bat and a ball
costs $1.10 together.

719
00:25:25,930 --> 00:25:29,830
The bat costs a dollar
more than the ball.

720
00:25:29,830 --> 00:25:31,660
How much does the ball cost?

721
00:25:31,660 --> 00:25:32,300
Anyone?

722
00:25:32,300 --> 00:25:33,774
Shout it out.

723
00:25:33,774 --> 00:25:34,440
AUDIENCE: $0.05.

724
00:25:34,440 --> 00:25:35,523
TOMER ULLMAN: Yeah, $0.05.

725
00:25:35,523 --> 00:25:38,080
Who would have said
$0.10? Don't be shy.

726
00:25:38,080 --> 00:25:39,340
Thank you.

727
00:25:39,340 --> 00:25:41,500
Thank you for
being brave enough.

728
00:25:41,500 --> 00:25:43,060
A lot of people say $0.10.

729
00:25:43,060 --> 00:25:43,960
They say it immediately.

730
00:25:43,960 --> 00:25:45,060
The ball costs $0.10.

731
00:25:45,060 --> 00:25:46,360
The bat costs $1.

732
00:25:46,360 --> 00:25:47,390
Wait, but it costs $1.

733
00:25:47,390 --> 00:25:49,270
No, no, it costs $1 more.

734
00:25:49,270 --> 00:25:52,634
So it's-- does everyone get
why $0.10 is not the answer,

735
00:25:52,634 --> 00:25:53,425
it should be $0.05?

736
00:25:53,425 --> 00:25:54,760
OK, good.
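
A small sketch of the algebra behind the answer, written out for anyone reading along. The function name is hypothetical. Calling the ball's price x, the two statements are x + (x + 1.00) = 1.10, so x = (1.10 - 1.00) / 2. Exact fractions are used so the cents come out without floating-point noise.

```python
from fractions import Fraction

def solve_bat_and_ball(total, difference):
    """Solve ball + bat = total with bat = ball + difference."""
    # Substituting the second equation into the first gives
    # 2 * ball + difference = total.
    ball = (total - difference) / 2
    bat = ball + difference
    return ball, bat

ball, bat = solve_bat_and_ball(Fraction(110, 100), Fraction(1))
print(ball, bat)  # 1/20 21/20, i.e. $0.05 and $1.05

# The intuitive answer fails the check: a $0.10 ball implies a $1.10 bat.
guess = Fraction(10, 100)
print(guess + (guess + 1))  # 6/5, i.e. $1.20 rather than $1.10
```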

737
00:25:54,760 --> 00:25:56,262
This was sort of
a classic question

738
00:25:56,262 --> 00:25:58,720
along with two other questions
called the lily pad question

739
00:25:58,720 --> 00:26:00,220
and the widget question.

740
00:26:00,220 --> 00:26:04,330
The widget question is something
like five widget machines

741
00:26:04,330 --> 00:26:07,220
make five widgets
in five minutes.

742
00:26:07,220 --> 00:26:12,870
How long does it take 100 widget
machines to make 100 widgets?

743
00:26:12,870 --> 00:26:13,810
Five minutes, right.

744
00:26:13,810 --> 00:26:15,590
Not 100.

745
00:26:15,590 --> 00:26:17,820
These sort of three
questions were found out

746
00:26:17,820 --> 00:26:20,730
to correlate a lot better
with things like IQ tests

747
00:26:20,730 --> 00:26:25,330
or even better, sorry, than IQ
on a lot of different things.

748
00:26:25,330 --> 00:26:27,930
They've been found, you
know, many smart people

749
00:26:27,930 --> 00:26:31,080
from MIT and Harvard included
have failed this question.

750
00:26:34,272 --> 00:26:35,730
And the point was,
you know, people

751
00:26:35,730 --> 00:26:36,979
have made a big deal about it.

752
00:26:36,979 --> 00:26:39,750
Like this is even better than IQ
tests on all sorts of measures.

753
00:26:39,750 --> 00:26:40,740
And it's so simple.

754
00:26:40,740 --> 00:26:42,584
We don't have to run
100 IQ test questions.

755
00:26:42,584 --> 00:26:44,250
Let's just ask the
ball and bat question

756
00:26:44,250 --> 00:26:45,985
and then see what
that correlates with.

757
00:26:45,985 --> 00:26:47,360
And it also relates
to, you know,

758
00:26:47,360 --> 00:26:49,180
Kahneman makes a
big deal about it.

759
00:26:49,180 --> 00:26:50,910
It's system one
versus system two.

760
00:26:50,910 --> 00:26:54,360
System one really wants to
answer that it costs $0.10.

761
00:26:54,360 --> 00:26:55,350
And you could solve it.

762
00:26:55,350 --> 00:26:56,460
Of course you
could all solve it.

763
00:26:56,460 --> 00:26:58,350
It's not hard for you if you
wrote down the simple equation,

764
00:26:58,350 --> 00:26:59,220
it's trivial.

765
00:26:59,220 --> 00:27:00,990
Right, you wrote
it down as like x.

766
00:27:00,990 --> 00:27:02,062
And x costs this.

767
00:27:02,062 --> 00:27:03,270
And you could all solve this.

768
00:27:03,270 --> 00:27:05,460
You could all solve
this in middle school.

769
00:27:05,460 --> 00:27:06,240
But you don't.

770
00:27:06,240 --> 00:27:08,260
Like system two-- some of you do.

771
00:27:08,260 --> 00:27:11,450
But usually the first time
you hear it, you don't.

772
00:27:11,450 --> 00:27:14,039
And you don't-- even
if people are warned,

773
00:27:14,039 --> 00:27:16,080
like this is a bit of a
trick question, you know,

774
00:27:16,080 --> 00:27:19,020
think about it,
they don't, usually.

775
00:27:19,020 --> 00:27:21,619
And people have used this as
like ways to test system one--

776
00:27:21,619 --> 00:27:23,160
do people know what
I mean when I say

777
00:27:23,160 --> 00:27:25,140
system one versus system two?

778
00:27:25,140 --> 00:27:25,700
Who doesn't?

779
00:27:25,700 --> 00:27:27,750
Raise your hand.

780
00:27:27,750 --> 00:27:29,490
So very, very quickly
because it's not

781
00:27:29,490 --> 00:27:32,230
that relevant to Mechanical
Turk, but very, very quickly,

782
00:27:32,230 --> 00:27:34,380
you should read Thinking
Fast and Slow by Kahneman.

783
00:27:34,380 --> 00:27:36,270
The point is that the
mind can generally

784
00:27:36,270 --> 00:27:39,100
be categorized into two systems,
even Kahneman doesn't quite

785
00:27:39,100 --> 00:27:41,100
believe that, but it's
sort of this thing that's

786
00:27:41,100 --> 00:27:42,180
easy to talk about.

787
00:27:42,180 --> 00:27:45,720
System one is the fast,
heuristic system that gives you

788
00:27:45,720 --> 00:27:46,910
the cached response.

789
00:27:46,910 --> 00:27:49,680
System two is the slow, laborious,
can't do many things

790
00:27:49,680 --> 00:27:50,472
at one time system.

791
00:27:50,472 --> 00:27:52,096
That's the sort of
thing that you would

792
00:27:52,096 --> 00:27:53,380
use to solve algebra problems.

793
00:27:53,380 --> 00:27:55,350
That's the reason you need to
slow down when you're thinking

794
00:27:55,350 --> 00:27:56,434
about something very hard.

795
00:27:56,434 --> 00:27:57,933
System one is the
sort of thing that

796
00:27:57,933 --> 00:28:00,270
gives you biases and heuristics
and things like that.

797
00:28:00,270 --> 00:28:02,590
This was given as an
example of the bat and ball

798
00:28:02,590 --> 00:28:04,590
was like one of the prime
examples of system one

799
00:28:04,590 --> 00:28:06,370
wants to do this, system
two wants to do that.

800
00:28:06,370 --> 00:28:07,870
System two is lazy,
doesn't actually

801
00:28:07,870 --> 00:28:11,430
get engaged unless it
really, really has to.

802
00:28:11,430 --> 00:28:12,690
Why am I bringing this up?

803
00:28:12,690 --> 00:28:14,190
The reason I bring this
up is because people

804
00:28:14,190 --> 00:28:16,290
thought it was a great idea
to put it on Amazon Mechanical

805
00:28:16,290 --> 00:28:17,670
Turk and ask people
a lot of questions

806
00:28:17,670 --> 00:28:18,961
to see what it correlates with.

807
00:28:18,961 --> 00:28:21,390
Everybody knows about
the ball and bat problem

808
00:28:21,390 --> 00:28:22,800
on Amazon Mechanical Turk.

809
00:28:22,800 --> 00:28:24,239
Everybody.

810
00:28:24,239 --> 00:28:25,780
Don't think that
you're being unique.

811
00:28:25,780 --> 00:28:27,196
Don't ask them the
widget problem.

812
00:28:27,196 --> 00:28:29,100
Don't ask them the
lily pad problem.

813
00:28:29,100 --> 00:28:31,230
They all know about it.

814
00:28:31,230 --> 00:28:33,450
And here it's not a
problem for intuitive,

815
00:28:33,450 --> 00:28:35,610
you know, being satiated
and things like that.

816
00:28:35,610 --> 00:28:38,049
Even if they want to,
right, I said before,

817
00:28:38,049 --> 00:28:40,590
like they want to tell you what
the thing is, they just don't

818
00:28:40,590 --> 00:28:41,910
have the gut response anymore.

819
00:28:41,910 --> 00:28:43,320
Here they just know it.

820
00:28:43,320 --> 00:28:46,680
The reason most of you
answered, or those of you

821
00:28:46,680 --> 00:28:48,570
who knew about it
said, haha, $0.05.

822
00:28:48,570 --> 00:28:49,290
I know that one.

823
00:28:49,290 --> 00:28:50,490
I've solved it before.

824
00:28:50,490 --> 00:28:52,350
People on Mechanical
Turk know that too.

825
00:28:52,350 --> 00:28:54,266
And it's sort of destroyed
any sort of measure

826
00:28:54,266 --> 00:28:58,504
it had for whatever the heck
it was trying to measure.

827
00:28:58,504 --> 00:29:00,420
In general, there's sort
of this growing sense

828
00:29:00,420 --> 00:29:03,110
that since 2010,
people have been

829
00:29:03,110 --> 00:29:04,860
trying to make Amazon
Mechanical Turk more

830
00:29:04,860 --> 00:29:07,110
popular for a few years.

831
00:29:07,110 --> 00:29:09,420
It feels almost like an
overexploited resource,

832
00:29:09,420 --> 00:29:10,140
a little bit.

833
00:29:10,140 --> 00:29:12,889
Like you have this tribe which
doesn't know about numbers.

834
00:29:12,889 --> 00:29:15,430
Let's all go and study them and
ask them a billion questions.

835
00:29:15,430 --> 00:29:16,500
And the reason they
don't know about numbers

836
00:29:16,500 --> 00:29:17,610
is because they
don't know English.

837
00:29:17,610 --> 00:29:19,860
And by the time we are done with
them, they will know English.

838
00:29:19,860 --> 00:29:21,090
And the one thing
they'll know in English

839
00:29:21,090 --> 00:29:22,548
is how to count to
10 because we've

840
00:29:22,548 --> 00:29:24,362
asked them all these questions.

841
00:29:24,362 --> 00:29:26,070
Amazon Mechanical Turk
feels a little bit

842
00:29:26,070 --> 00:29:29,460
like this overexploited
resource sometimes.

843
00:29:29,460 --> 00:29:31,050
These have been more
concerns for you

844
00:29:31,050 --> 00:29:33,390
as a requester working on
Amazon Mechanical Turk, things

845
00:29:33,390 --> 00:29:35,620
to sort of keep in
mind and watch out for.

846
00:29:35,620 --> 00:29:38,220
Here are some concerns for
people on Mechanical Turk

847
00:29:38,220 --> 00:29:39,767
that you can try to alleviate.

848
00:29:39,767 --> 00:29:41,100
And these are sort of things about--

849
00:29:41,100 --> 00:29:44,970
I've already mentioned sort
of two of them about low pay

850
00:29:44,970 --> 00:29:48,204
and rejecting people for no good
reason and things like that.

851
00:29:48,204 --> 00:29:49,620
One more thing I
want to point out

852
00:29:49,620 --> 00:29:52,170
is this thing about
no de-briefing.

853
00:29:52,170 --> 00:29:53,970
When people come
to the lab, you

854
00:29:53,970 --> 00:29:56,040
can tell them why they
were in the study.

855
00:29:56,040 --> 00:29:58,920
That's a basic part of the
protocol of psychology.

856
00:29:58,920 --> 00:30:00,674
When people are on
Amazon Mechanical Turk,

857
00:30:00,674 --> 00:30:02,590
it's a good thing to put
it in the experiment,

858
00:30:02,590 --> 00:30:05,010
why were you in this experiment
if you have such a thing,

859
00:30:05,010 --> 00:30:06,691
and you're running
an experiment.

860
00:30:06,691 --> 00:30:08,190
People might drop
out in the middle,

861
00:30:08,190 --> 00:30:10,232
might decide it's not for
me, actually, I'm done.

862
00:30:10,232 --> 00:30:11,856
And they never actually
figure out what

863
00:30:11,856 --> 00:30:13,200
the point of the experiment was.

864
00:30:13,200 --> 00:30:14,497
You might say, well, who cares?

865
00:30:14,497 --> 00:30:16,830
But you might say, well, who
cares about de-briefing people

866
00:30:16,830 --> 00:30:17,430
in the lab.

867
00:30:17,430 --> 00:30:19,300
If there's a reason for
de-briefing people in the lab,

868
00:30:19,300 --> 00:30:21,510
there's probably a good
reason for de-briefing people

869
00:30:21,510 --> 00:30:22,470
on Mechanical Turk.

870
00:30:22,470 --> 00:30:24,180
If they drop out
before their de-brief,

871
00:30:24,180 --> 00:30:25,540
that's kind of a problem.

872
00:30:25,540 --> 00:30:27,040
I don't have a good
solution for it.

873
00:30:27,040 --> 00:30:30,340
It's something to keep in mind.

874
00:30:30,340 --> 00:30:32,070
There's also the
problem that you

875
00:30:32,070 --> 00:30:34,940
should keep in mind
and report, probably,

876
00:30:34,940 --> 00:30:37,440
if it's a problem, the number
of people who have dropped out

877
00:30:37,440 --> 00:30:39,231
in the middle because
otherwise it can lead

878
00:30:39,231 --> 00:30:41,700
to all sorts of small effects.

879
00:30:41,700 --> 00:30:44,280
Like if your task is really
hard, and a lot of people

880
00:30:44,280 --> 00:30:46,170
drop out, you had
like a 90% dropout

881
00:30:46,170 --> 00:30:48,630
and then you say, oh, people
are brilliant at my task

882
00:30:48,630 --> 00:30:50,730
because the 10% who actually
stuck through with it

883
00:30:50,730 --> 00:30:53,580
are the sort of crazy people
who are willing to do it.

884
00:30:53,580 --> 00:30:55,410
That's actually
really, really skewed.

885
00:30:55,410 --> 00:30:57,770
So keep in mind the
dropout should be low.

886
00:30:57,770 --> 00:31:00,337
It should be like a few
percent or something like that.

887
00:31:00,337 --> 00:31:02,670
The flip side, by the way,
of the no de-briefing problem

888
00:31:02,670 --> 00:31:04,790
is a coercion problem.

889
00:31:04,790 --> 00:31:07,290
So when you bring people into
the lab and you say, you know,

890
00:31:07,290 --> 00:31:08,020
do the study.

891
00:31:08,020 --> 00:31:09,210
You can stop at any time.

892
00:31:09,210 --> 00:31:12,900
No problem at all. Let me shut
the door and wait over here.

893
00:31:12,900 --> 00:31:14,934
There's sort of a slight
feeling of coercion,

894
00:31:14,934 --> 00:31:16,350
even if the study
is not something

895
00:31:16,350 --> 00:31:18,570
they really are enjoying doing.

896
00:31:18,570 --> 00:31:20,890
They'll still do it because
they feel pressured to.

897
00:31:20,890 --> 00:31:22,390
That problem doesn't
exist, at least

898
00:31:22,390 --> 00:31:23,370
on Amazon Mechanical Turk.

899
00:31:23,370 --> 00:31:24,744
I mean, they're
in it for the pay

900
00:31:24,744 --> 00:31:26,500
and there's some
cost and all that.

901
00:31:26,500 --> 00:31:28,320
But if they really
don't want to do it,

902
00:31:28,320 --> 00:31:30,550
if it offends them in
some way, they'll stop.

903
00:31:30,550 --> 00:31:33,480
So that's actually a bonus.

904
00:31:33,480 --> 00:31:35,620
Some general tips.

905
00:31:35,620 --> 00:31:36,220
Let's see.

906
00:31:36,220 --> 00:31:37,860
And I think I'll
have two more slides

907
00:31:37,860 --> 00:31:38,970
and then we'll wrap it up.

908
00:31:38,970 --> 00:31:39,780
Some general tips.

909
00:31:39,780 --> 00:31:41,940
In general, when you're thinking
about your task, the lower

910
00:31:41,940 --> 00:31:43,530
level it is, the
closer level it is

911
00:31:43,530 --> 00:31:45,210
to-- think about
the Stroop task.

912
00:31:45,210 --> 00:31:48,000
OK, let's put the Stroop task
in one case and the question

913
00:31:48,000 --> 00:31:49,550
I actually asked
you in another case.

914
00:31:49,550 --> 00:31:52,020
The Stroop task is low
level, hard to beat,

915
00:31:52,020 --> 00:31:53,850
even if you've seen
it a million times,

916
00:31:53,850 --> 00:31:55,690
you will still find that effect.

917
00:31:55,690 --> 00:31:58,860
If your task is like that, you
should expect to replicate it.

918
00:31:58,860 --> 00:32:01,110
If it's much higher
level, the sort of thing that

919
00:32:01,110 --> 00:32:04,719
relies on them not
having seen that before,

920
00:32:04,719 --> 00:32:05,760
it's harder to replicate.

921
00:32:05,760 --> 00:32:07,593
You want to make sure
that people who see it

922
00:32:07,593 --> 00:32:08,737
have not seen it before.

923
00:32:08,737 --> 00:32:11,070
If it's high level and relies
on some sort of zeitgeist,

924
00:32:11,070 --> 00:32:14,070
like what people think about AI
right now, in two years you'll

925
00:32:14,070 --> 00:32:16,170
find a different result.
Don't expect the thing

926
00:32:16,170 --> 00:32:18,570
that I put online
today to replicate

927
00:32:18,570 --> 00:32:21,840
in some sense in two years.

928
00:32:21,840 --> 00:32:23,875
Have participants give comments.

929
00:32:23,875 --> 00:32:25,500
At the end of your
survey, once they're

930
00:32:25,500 --> 00:32:28,200
done, before the de-briefing,
collect some demographics

931
00:32:28,200 --> 00:32:30,246
and leave them something
optional to just say,

932
00:32:30,246 --> 00:32:31,620
what did you think
of this study?

933
00:32:31,620 --> 00:32:32,730
Do you have any comments?

934
00:32:32,730 --> 00:32:33,750
Or anything like that.

935
00:32:33,750 --> 00:32:35,766
Most of the time you
won't get any comments.

936
00:32:35,766 --> 00:32:37,140
After that, the
most likely thing

937
00:32:37,140 --> 00:32:38,250
is they'll say, fun study.

938
00:32:38,250 --> 00:32:39,750
Thanks, if you've
done it correctly.

939
00:32:39,750 --> 00:32:41,035
Interesting, stuff like that.

940
00:32:41,035 --> 00:32:43,410
Some studies we've done have
been crazy and they're very,

941
00:32:43,410 --> 00:32:44,659
you know, they really like it.

942
00:32:44,659 --> 00:32:46,290
And it's nice to
get good feedback.

943
00:32:46,290 --> 00:32:47,760
Or they'll tell
you something like,

944
00:32:47,760 --> 00:32:49,640
that was really interesting,
could you tell me a bit more.

945
00:32:49,640 --> 00:32:50,670
Here's my email address.

946
00:32:50,670 --> 00:32:52,470
Or this button that
didn't work for me.

947
00:32:52,470 --> 00:32:54,886
Or I was actually a bit bothered
by the fact that you said

948
00:32:54,886 --> 00:32:56,250
that you would kill the robot.

949
00:32:56,250 --> 00:32:57,570
Things like that.

950
00:32:57,570 --> 00:33:00,497
Or you ask them things like--
give them comments like, why.

951
00:33:00,497 --> 00:33:02,580
You don't put that necessarily
in your experiment.

952
00:33:02,580 --> 00:33:04,440
But you tell them,
like, you know, do this.

953
00:33:04,440 --> 00:33:05,310
Do that.

954
00:33:05,310 --> 00:33:06,501
Make a decision.

955
00:33:06,501 --> 00:33:07,000
Why?

956
00:33:07,000 --> 00:33:08,220
Like the trolley problem.

957
00:33:08,220 --> 00:33:08,735
Why?

958
00:33:08,735 --> 00:33:10,110
OK, it's the sort
of thing that's

959
00:33:10,110 --> 00:33:11,490
not likely to be
published immediately,

960
00:33:11,490 --> 00:33:12,660
but it's definitely
the sort of thing

961
00:33:12,660 --> 00:33:14,034
that will help
you think of where

962
00:33:14,034 --> 00:33:15,970
to take your experiment next.

963
00:33:15,970 --> 00:33:18,880
It's very, very
important to communicate.

964
00:33:18,880 --> 00:33:21,810
And what I mean by that is give
them an email at the beginning

965
00:33:21,810 --> 00:33:24,310
to reach you in some
way, say like, you

966
00:33:24,310 --> 00:33:26,212
know, in the consent statement.

967
00:33:26,212 --> 00:33:27,420
You're doing this experiment.

968
00:33:27,420 --> 00:33:28,740
You're doing it for x.

969
00:33:28,740 --> 00:33:31,546
Here's a way to reach
x if you want to.

970
00:33:31,546 --> 00:33:33,420
AUDIENCE: Do you give
them your actual email?

971
00:33:33,420 --> 00:33:35,753
TOMER ULLMAN: You can set up
a bogus email in the sense

972
00:33:35,753 --> 00:33:37,350
of, like, tomer@mechanicalturk.

973
00:33:37,350 --> 00:33:39,080
I personally give
them my email at MIT.

974
00:33:39,080 --> 00:33:39,580
I do.

975
00:33:42,560 --> 00:33:44,705
And make sure that you
respond to their concerns.

976
00:33:44,705 --> 00:33:46,830
And they will write to you,
especially if something

977
00:33:46,830 --> 00:33:47,880
goes technically wrong.

978
00:33:47,880 --> 00:33:50,670
They'll say, like, you know,
the screen didn't load for me.

979
00:33:50,670 --> 00:33:53,190
I forgot to paste in the
code that you wanted,

980
00:33:53,190 --> 00:33:54,190
things like that.

981
00:33:54,190 --> 00:33:56,430
Or this didn't work quite well.

982
00:33:56,430 --> 00:33:57,330
Write back to them.

983
00:33:57,330 --> 00:33:58,380
Explain what happened.

984
00:33:58,380 --> 00:34:00,255
If they want to know
what the study is about,

985
00:34:00,255 --> 00:34:02,692
you should explain to them
what this study is about.

986
00:34:02,692 --> 00:34:04,150
You should do that
for two reasons.

987
00:34:04,150 --> 00:34:05,400
It feels silly to mention this.

988
00:34:05,400 --> 00:34:06,900
I'm sure you're all, you
know, you've figured it out

989
00:34:06,900 --> 00:34:07,470
by yourselves.

990
00:34:07,470 --> 00:34:09,090
But I'll mention it
anyway, just on the chance

991
00:34:09,090 --> 00:34:11,040
that there's one person
that says, oh yeah,

992
00:34:11,040 --> 00:34:12,085
that's a good reason.

993
00:34:12,085 --> 00:34:14,460
For two reasons, one is that
they'll like you a lot more.

994
00:34:14,460 --> 00:34:16,920
OK, these people, they
go to their own forums.

995
00:34:16,920 --> 00:34:18,600
There's a lot of
Mechanical Turk forums.

996
00:34:18,600 --> 00:34:19,530
There's hundreds of them.

997
00:34:19,530 --> 00:34:21,600
And they tell each other
what things they should do.

998
00:34:21,600 --> 00:34:23,350
They do a good job of
policing each other.

999
00:34:23,350 --> 00:34:27,049
They're like, they try never
to post answers to things,

1000
00:34:27,049 --> 00:34:29,340
or like, oh you can do this
by answering this question.

1001
00:34:29,340 --> 00:34:29,670
No.

1002
00:34:29,670 --> 00:34:31,878
It's like, they don't tolerate
that because they know

1003
00:34:31,878 --> 00:34:35,250
that we don't tolerate that.

1004
00:34:35,250 --> 00:34:36,960
You want them to like
you in that sense.

1005
00:34:36,960 --> 00:34:38,920
You want to get good
reviews on these things.

1006
00:34:38,920 --> 00:34:42,056
And one way to garner good
favor is to communicate.

1007
00:34:42,056 --> 00:34:44,139
The other reason is because
it's just a good idea.

1008
00:34:44,139 --> 00:34:45,250
These are people in the public.

1009
00:34:45,250 --> 00:34:46,469
You wouldn't think
about not answering

1010
00:34:46,469 --> 00:34:48,260
a question of someone
who came into the lab

1011
00:34:48,260 --> 00:34:49,650
and asked you something.

1012
00:34:49,650 --> 00:34:52,260
Keep in mind, there are real
people behind the screen.

1013
00:34:52,260 --> 00:34:54,194
Make sure that you treat
them as real people.

1014
00:34:54,194 --> 00:34:56,610
I don't mean-- I sound like
I'm berating you or something,

1015
00:34:56,610 --> 00:34:58,530
like that you guys have not been
communicating and it's awful.

1016
00:34:58,530 --> 00:34:59,160
No.

1017
00:34:59,160 --> 00:35:00,035
That's not the point.

1018
00:35:00,035 --> 00:35:02,040
I'm sure you all mean
to do that. I'm just

1019
00:35:02,040 --> 00:35:04,200
trying to emphasize it.

1020
00:35:04,200 --> 00:35:06,120
Like I said before,
don't reject unless it's

1021
00:35:06,120 --> 00:35:07,500
an extreme situation.

1022
00:35:07,500 --> 00:35:10,200
Also, decide ahead of time
how you're going to reject.

1023
00:35:10,200 --> 00:35:11,970
Decide ahead of time
on a catch question,

1024
00:35:11,970 --> 00:35:13,416
something I'll get
to in a second.

1025
00:35:13,416 --> 00:35:15,540
And say, I'm going to reject
people if they do this

1026
00:35:15,540 --> 00:35:17,220
and stick to it.

1027
00:35:17,220 --> 00:35:19,350
Because otherwise, if you
don't do that, then when

1028
00:35:19,350 --> 00:35:21,766
it comes time to actually try
to write a paper you'll say,

1029
00:35:21,766 --> 00:35:24,374
well, I think I'll try throwing
out all the people that did it

1030
00:35:24,374 --> 00:35:26,790
in under 20 seconds because I
don't think they were paying

1031
00:35:26,790 --> 00:35:28,110
attention that much.

1032
00:35:28,110 --> 00:35:29,154
Maybe 30 seconds.

1033
00:35:29,154 --> 00:35:31,070
Yeah, this test should
really take 40 seconds.

1034
00:35:31,070 --> 00:35:32,250
You got the point.

1035
00:35:32,250 --> 00:35:35,310
Decide ahead of time on
the rejection criteria.

1036
00:35:35,310 --> 00:35:37,000
Have good catch questions.

1037
00:35:37,000 --> 00:35:39,390
This is good both for, you
know, knowing who to reject

1038
00:35:39,390 --> 00:35:41,340
and making sure that
they're paying attention.

1039
00:35:41,340 --> 00:35:42,510
Catch questions are
the sort of thing

1040
00:35:42,510 --> 00:35:43,920
that you would put in
the middle of your survey

1041
00:35:43,920 --> 00:35:45,840
or at the end of the survey, or at
the start of the survey, just

1042
00:35:45,840 --> 00:35:47,760
to make sure that they
are paying attention.

1043
00:35:47,760 --> 00:35:49,872
Ideally it would be also
that they have actually

1044
00:35:49,872 --> 00:35:51,330
read the instructions
and know what

1045
00:35:51,330 --> 00:35:52,900
they're supposed to be doing.

1046
00:35:52,900 --> 00:35:54,240
So sometimes even if
they're paying attention,

1047
00:35:54,240 --> 00:35:56,280
they didn't get the instructions
or something like that.

1048
00:35:56,280 --> 00:35:58,279
There's a bunch of different
ways of doing this.

1049
00:35:58,279 --> 00:36:00,312
One way of doing
it, you know, I'm

1050
00:36:00,312 --> 00:36:02,520
sure you guys can come up
with your own ways. I'm just

1051
00:36:02,520 --> 00:36:04,853
giving you some examples of
the stuff that people I know

1052
00:36:04,853 --> 00:36:05,550
have done.

1053
00:36:05,550 --> 00:36:07,530
Toby, for example,
that you've seen

1054
00:36:07,530 --> 00:36:09,305
doing some counterfactual
stuff maybe,

1055
00:36:09,305 --> 00:36:11,430
he just gives people a
screen with the instructions

1056
00:36:11,430 --> 00:36:12,380
and asks them some questions.

1057
00:36:12,380 --> 00:36:13,650
And until they get
the question right,

1058
00:36:13,650 --> 00:36:15,191
they don't move onto
the next screen.

1059
00:36:15,191 --> 00:36:16,960
So he doesn't reject them later.

1060
00:36:16,960 --> 00:36:18,270
He just says, in order to
pass to the next screen,

1061
00:36:18,270 --> 00:36:20,061
you have to answer this
question correctly.

1062
00:36:20,061 --> 00:36:22,761
And he has some way
of checking that.

1063
00:36:22,761 --> 00:36:23,760
That's been really good.

1064
00:36:23,760 --> 00:36:25,680
Like once he implemented
that, the data is

1065
00:36:25,680 --> 00:36:27,654
much, much better and cleaner.

1066
00:36:27,654 --> 00:36:29,070
Here's the sort
of catch questions

1067
00:36:29,070 --> 00:36:31,520
you don't necessarily
want to do.

1068
00:36:31,520 --> 00:36:32,790
They're very popular.

1069
00:36:32,790 --> 00:36:34,373
You don't necessarily
want to do them.

1070
00:36:34,373 --> 00:36:36,990
They're things like, have you
ever had a fatal heart attack.

1071
00:36:36,990 --> 00:36:39,500
And the answer
is, of course, no.

1072
00:36:39,500 --> 00:36:41,430
Have you ever eaten
a sandwich on Mars.

1073
00:36:41,430 --> 00:36:42,270
It's the sort of
thing that like you're

1074
00:36:42,270 --> 00:36:44,400
trying to catch people that are
going through it very quickly

1075
00:36:44,400 --> 00:36:46,320
and are just marking
things randomly.

1076
00:36:46,320 --> 00:36:47,520
One of the reasons you
don't want to do that

1077
00:36:47,520 --> 00:36:49,853
is because even if they're
answering randomly yes or no,

1078
00:36:49,853 --> 00:36:53,590
you'll still miss the 50% who
just got it right by chance.

1079
00:36:53,590 --> 00:36:55,619
The other reason is
the standard stuff.

1080
00:36:55,619 --> 00:36:57,910
I mean, I'm sure you guys
could come up with something.

1081
00:36:57,910 --> 00:36:59,100
But there's a lot of
examples out there.

1082
00:36:59,100 --> 00:37:00,475
The two examples
I just gave you,

1083
00:37:00,475 --> 00:37:02,280
the Martian, the
fatal heart attack,

1084
00:37:02,280 --> 00:37:05,340
this is stuff that gets used
over and over and over again.

1085
00:37:05,340 --> 00:37:08,190
And they sort of just know it.

1086
00:37:08,190 --> 00:37:09,750
One of them said
like, any time I

1087
00:37:09,750 --> 00:37:11,306
see-- the person I
told you about before, who was

1088
00:37:11,306 --> 00:37:13,680
like juggling kids and trying
to answer at the same time.

1089
00:37:13,680 --> 00:37:15,750
He says, oh yeah, whenever
I see the word vacuum,

1090
00:37:15,750 --> 00:37:17,450
I know it's time for
an attention check

1091
00:37:17,450 --> 00:37:18,450
because it's going to
be like, have you ever

1092
00:37:18,450 --> 00:37:20,824
eaten a sandwich in a vacuum
or like something like that.

1093
00:37:20,824 --> 00:37:23,460
But whenever I see vacuum, it's
obviously an attention check.

1094
00:37:23,460 --> 00:37:24,552
You don't want to do that.

1095
00:37:24,552 --> 00:37:26,010
Ideally, you want
to have something

1096
00:37:26,010 --> 00:37:27,810
that relates to the task.

1097
00:37:27,810 --> 00:37:29,520
So in one of our
examples, we were

1098
00:37:29,520 --> 00:37:31,260
doing some sort of Turing task.

1099
00:37:31,260 --> 00:37:34,290
And we just wanted
to say, like, here,

1100
00:37:34,290 --> 00:37:35,970
complete the following sentence.

1101
00:37:35,970 --> 00:37:38,760
You were playing
the game against a--

1102
00:37:38,760 --> 00:37:40,080
and then it's an open text box.

1103
00:37:40,080 --> 00:37:42,205
OK, some of these people
have like automatic robots

1104
00:37:42,205 --> 00:37:43,170
that fill it in.

1105
00:37:43,170 --> 00:37:45,210
So they'll do
something like, thanks.

1106
00:37:45,210 --> 00:37:46,110
Or yes.

1107
00:37:46,110 --> 00:37:47,410
Or something like that.

1108
00:37:47,410 --> 00:37:50,010
Then they just hope
that yes will match.

1109
00:37:50,010 --> 00:37:52,490
But here the correct
answer was robot.

1110
00:37:52,490 --> 00:37:54,240
You were playing the
game against a robot,

1111
00:37:54,240 --> 00:37:56,700
or against a human, or
something like that.

1112
00:37:56,700 --> 00:37:58,690
Did people get that example?

1113
00:37:58,690 --> 00:37:59,232
OK.

1114
00:37:59,232 --> 00:38:00,690
So ideally, the
good catch question

1115
00:38:00,690 --> 00:38:02,482
is an open field,
something that's

1116
00:38:02,482 --> 00:38:04,440
not just you can click
and get right by mistake

1117
00:38:04,440 --> 00:38:06,148
and relates to something
in the instructions

1118
00:38:06,148 --> 00:38:08,190
that you were giving.

1119
00:38:08,190 --> 00:38:09,060
This is not a tip.

1120
00:38:09,060 --> 00:38:11,800
This is something you should do.

1121
00:38:11,800 --> 00:38:12,634
Again, it's trivial.

1122
00:38:12,634 --> 00:38:14,841
You'll have to do it. If
you're thinking of running it

1123
00:38:14,841 --> 00:38:16,650
in your own university
and your university

1124
00:38:16,650 --> 00:38:18,570
has never done
Mechanical Turk before,

1125
00:38:18,570 --> 00:38:22,830
get IRB approval specifically
for Mechanical Turk.

1126
00:38:22,830 --> 00:38:24,644
So just make sure
you get IRB approval.

1127
00:38:24,644 --> 00:38:26,060
Make sure you get
informed consent

1128
00:38:26,060 --> 00:38:28,200
at the beginning of your
study, say like, we're

1129
00:38:28,200 --> 00:38:29,640
going to be doing this.

1130
00:38:29,640 --> 00:38:32,460
If it's OK click on this
button that says, I agree.

1131
00:38:32,460 --> 00:38:35,700
Usually the IRB will force
you to do that anyway.

1132
00:38:35,700 --> 00:38:39,210
And as I said, ethical pay, I
just keep going back to that.

1133
00:38:39,210 --> 00:38:45,080
OK, there's various helpful
tools for running experiments.

1134
00:38:45,080 --> 00:38:47,580
If any of you are interested
in this, reading more about it,

1135
00:38:47,580 --> 00:38:49,180
much more in depth
how to actually run

1136
00:38:49,180 --> 00:38:50,260
an experiment and
things like that,

1137
00:38:50,260 --> 00:38:51,390
come talk to me afterwards.

1138
00:38:51,390 --> 00:38:54,182
Or look at Todd
Gureckis's website.

1139
00:38:54,182 --> 00:38:55,890
Other websites that
you should check out.

1140
00:38:55,890 --> 00:38:58,380
These are the forums
you should probably

1141
00:38:58,380 --> 00:39:01,507
know about, TurkerNation,
mTurkGrind, and TurkOpticon.

1142
00:39:01,507 --> 00:39:03,840
These are useful both to get
a sense of how your task is

1143
00:39:03,840 --> 00:39:05,787
doing, are people sort
of responding to it

1144
00:39:05,787 --> 00:39:06,870
saying something about it.

1145
00:39:06,870 --> 00:39:09,150
It's also a good place
to publicize your study.

1146
00:39:09,150 --> 00:39:11,910
If I need 300 participants
within two hours,

1147
00:39:11,910 --> 00:39:14,430
I can put it on Turk and
hope the pay is enough.

1148
00:39:14,430 --> 00:39:15,810
Or I can put it on mTurkGrind.

1149
00:39:15,810 --> 00:39:17,619
People there have
liked our tasks before

1150
00:39:17,619 --> 00:39:19,410
and they'll give you
a thumbs up and you'll

1151
00:39:19,410 --> 00:39:21,570
get 200 people within an
hour just because they

1152
00:39:21,570 --> 00:39:24,240
know about you and they know
that you're an OK person,

1153
00:39:24,240 --> 00:39:25,920
and you communicate.

1154
00:39:25,920 --> 00:39:29,550
So it's a good practice
to get a user name

1155
00:39:29,550 --> 00:39:31,600
on one of these things,
or both of them,

1156
00:39:31,600 --> 00:39:33,120
when you run an
actual experiment.

1157
00:39:33,120 --> 00:39:34,290
Explain what you're doing.

1158
00:39:34,290 --> 00:39:35,640
Put it on there.

1159
00:39:35,640 --> 00:39:37,850
Be willing to answer questions.

1160
00:39:37,850 --> 00:39:40,657
TurkOpticon is the usual
thing that will then be used--

1161
00:39:40,657 --> 00:39:42,990
this is what one of these
forums looks like, by the way.

1162
00:39:42,990 --> 00:39:46,050
They're like, you
know, oh it's terrible.

1163
00:39:46,050 --> 00:39:46,611
It's Tuesday.

1164
00:39:46,611 --> 00:39:47,610
What should we do today?

1165
00:39:47,610 --> 00:39:49,110
Here's how much I've done today.

1166
00:39:49,110 --> 00:39:51,840
Somebody says, I didn't
see this posted anywhere.

1167
00:39:51,840 --> 00:39:54,846
Very interesting to those of
you who want to do 3D printing.

1168
00:39:54,846 --> 00:39:56,720
Then there's like the
title of the experiment

1169
00:39:56,720 --> 00:39:58,036
and a description of it.

1170
00:39:58,036 --> 00:39:59,910
And people can just
click directly from there

1171
00:39:59,910 --> 00:40:01,074
to the experiment.

1172
00:40:01,074 --> 00:40:03,240
Usually the experiments
that they post on the forums

1173
00:40:03,240 --> 00:40:04,380
look something like this.

1174
00:40:04,380 --> 00:40:06,630
They're getting these numbers
off of that last website

1175
00:40:06,630 --> 00:40:08,110
that I just mentioned,
TurkOpticon.

1176
00:40:08,110 --> 00:40:10,740
So usually when people want to
rate you on Mechanical Turk,

1177
00:40:10,740 --> 00:40:12,630
they're not going to complain
to Amazon because Amazon's not

1178
00:40:12,630 --> 00:40:13,997
going to do anything about it.

1179
00:40:13,997 --> 00:40:16,080
Like I said before,
requesters have a lot of power

1180
00:40:16,080 --> 00:40:19,470
and Amazon doesn't bother
to arbitrate, usually.

1181
00:40:19,470 --> 00:40:22,110
So one of the ways that they
do have of either rewarding

1182
00:40:22,110 --> 00:40:24,660
or punishing you is
to go to TurkOpticon

1183
00:40:24,660 --> 00:40:26,970
and give you a rating for
communication, generosity,

1184
00:40:26,970 --> 00:40:29,580
fairness, and promptness.

1185
00:40:29,580 --> 00:40:32,010
And those numbers will then
be used on most of the forums

1186
00:40:32,010 --> 00:40:33,134
when you publish your task.

1187
00:40:33,134 --> 00:40:36,000
People can see, like, these guys
have actually cheated people

1188
00:40:36,000 --> 00:40:38,310
before, or something like that.

1189
00:40:38,310 --> 00:40:40,780
So you can go, once you've
registered, go to TurkOpticon

1190
00:40:40,780 --> 00:40:44,770
and you can check out your own
ratings and things like that.

1191
00:40:44,770 --> 00:40:49,104
So yeah, in summary, Mechanical
Turk is a wonderful resource.

1192
00:40:49,104 --> 00:40:51,520
I hope I haven't scared you
away from it, those of you who

1193
00:40:51,520 --> 00:40:52,830
are thinking about doing it.

1194
00:40:52,830 --> 00:40:54,990
It amplifies everything.

1195
00:40:54,990 --> 00:40:57,480
You can do a lot
more a lot faster.

1196
00:40:57,480 --> 00:40:59,310
But it doesn't get
rid of concerns,

1197
00:40:59,310 --> 00:41:00,640
it amplifies those concerns.

1198
00:41:00,640 --> 00:41:02,849
So things like ethical
concerns and payment concerns,

1199
00:41:02,849 --> 00:41:05,140
and things like that, the
same sort of concerns

1200
00:41:05,140 --> 00:41:07,680
that you would have in the
lab, those are included there,

1201
00:41:07,680 --> 00:41:10,140
and are magnified
100-fold because you're

1202
00:41:10,140 --> 00:41:12,890
recruiting a lot more people.