1
00:00:00,060 --> 00:00:02,500
The following content is
provided under a Creative

2
00:00:02,500 --> 00:00:04,019
Commons license.

3
00:00:04,019 --> 00:00:06,360
Your support will help
MIT OpenCourseWare

4
00:00:06,360 --> 00:00:10,730
continue to offer high quality
educational resources for free.

5
00:00:10,730 --> 00:00:13,340
To make a donation or
view additional materials

6
00:00:13,340 --> 00:00:17,217
from hundreds of MIT courses,
visit MIT OpenCourseWare

7
00:00:17,217 --> 00:00:17,842
at ocw.mit.edu.

8
00:00:21,520 --> 00:00:25,260
PROFESSOR: OK, so
good afternoon.

9
00:00:25,260 --> 00:00:30,820
Today, we will review
probability theory.

10
00:00:30,820 --> 00:00:36,090
So I will mostly focus on-- I'll
give you some distributions.

11
00:00:36,090 --> 00:00:38,830
So, probability distributions
that will be of interest to us

12
00:00:38,830 --> 00:00:40,830
throughout the course.

13
00:00:40,830 --> 00:00:44,610
And I will talk about
moment-generating function

14
00:00:44,610 --> 00:00:46,120
a little bit.

15
00:00:46,120 --> 00:00:50,660
Afterwards, I will talk
about law of large numbers

16
00:00:50,660 --> 00:00:52,210
and central limit theorem.

17
00:00:56,310 --> 00:01:00,680
Who has heard of all
of these topics before?

18
00:01:00,680 --> 00:01:02,150
OK.

19
00:01:02,150 --> 00:01:04,120
That's good.

20
00:01:04,120 --> 00:01:06,624
Then I'll try to focus
a little bit more

21
00:01:06,624 --> 00:01:07,540
on the advanced stuff.

22
00:01:10,890 --> 00:01:13,830
Then a big part of it
will be review for you.

23
00:01:13,830 --> 00:01:18,260
So first of all, just to
agree on terminology, let's

24
00:01:18,260 --> 00:01:21,490
review some definitions.

25
00:01:21,490 --> 00:01:32,670
So a random variable
X-- we will talk

26
00:01:32,670 --> 00:01:38,900
about discrete and
continuous random variables.

27
00:01:43,310 --> 00:01:47,240
Just to set up the notation,
I will write a discrete random variable as X

28
00:01:47,240 --> 00:01:50,130
and a continuous one
as Y for now.

29
00:01:50,130 --> 00:01:52,820
So they are given by their
probability distributions--

30
00:01:52,820 --> 00:01:57,070
discrete random variable is
given by its probability mass

31
00:01:57,070 --> 00:02:02,490
function, which I
will denote f sub X.

32
00:02:02,490 --> 00:02:06,900
And a continuous one is given
by its probability density

33
00:02:06,900 --> 00:02:07,399
function.

34
00:02:11,530 --> 00:02:17,745
I will denote by f
sub Y. So pmf and pdf.

35
00:02:22,210 --> 00:02:23,930
Here, I just use a
subscript because I

36
00:02:23,930 --> 00:02:26,030
wanted to distinguish
f sub X and f sub Y.

37
00:02:26,030 --> 00:02:29,140
But when it's clear which random
variable we're talking about,

38
00:02:29,140 --> 00:02:32,190
I'll just say f.

39
00:02:32,190 --> 00:02:33,740
So what is this?

40
00:02:33,740 --> 00:02:42,980
A probability mass function is
a function from the sample space

41
00:02:42,980 --> 00:02:50,290
to non-negative reals such
that the sum over all points

42
00:02:50,290 --> 00:02:54,480
in the domain equals 1.

43
00:02:54,480 --> 00:02:57,110
The probability density function
is very similar.

44
00:02:59,730 --> 00:03:02,890
It's a function from the
sample space to the non-negative

45
00:03:02,890 --> 00:03:07,500
reals, but now the integral
over the domain equals 1.

46
00:03:11,780 --> 00:03:16,650
So it's pretty much safe to
consider our sample space

47
00:03:16,650 --> 00:03:20,570
to be the real numbers for
continuous random variables.

48
00:03:20,570 --> 00:03:23,960
Later in the course, you
will see some examples where

49
00:03:23,960 --> 00:03:25,230
it's not the real numbers.

50
00:03:25,230 --> 00:03:29,217
But for now, just consider
it as real numbers.

51
00:03:34,840 --> 00:03:39,412
For example, probability
mass function.

52
00:03:39,412 --> 00:03:46,810
If X takes 1 with
probability 1/3,

53
00:03:46,810 --> 00:03:53,010
minus 1 with probability 1/3,
and 0 with probability 1/3.

54
00:03:56,070 --> 00:04:01,464
Then our probability mass
function is f_X(1) equals

55
00:04:01,464 --> 00:04:08,370
f_X(-1) equals f_X(0) equals 1/3, just like that.

56
00:04:08,370 --> 00:04:11,820
An example of a
continuous random variable

57
00:04:11,820 --> 00:04:17,470
is if-- let's say, for
example, if f sub Y is

58
00:04:17,470 --> 00:04:25,420
equal to 1 for all
y in [0,1], then

59
00:04:25,420 --> 00:04:36,305
this is pdf of uniform
random variable

60
00:04:36,305 --> 00:04:39,800
where the space is [0,1].

61
00:04:39,800 --> 00:04:41,850
So this random variable
just picks one out

62
00:04:41,850 --> 00:04:44,330
of the three numbers
with equal probability.

63
00:04:44,330 --> 00:04:47,450
This picks one out of this,
all the real numbers between 0

64
00:04:47,450 --> 00:04:51,600
and 1, with equal probability.

65
00:04:51,600 --> 00:04:54,956
These are just some basic stuff.

66
00:04:54,956 --> 00:04:56,330
You should be
familiar with this,

67
00:04:56,330 --> 00:05:00,934
but I wrote it down just so
that we agree on the notation.
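As a quick illustration of these two definitions, here is a minimal sketch (not from the lecture; all names are illustrative) checking that the example pmf sums to 1 and the uniform pdf integrates to 1:

```python
from scipy.integrate import quad

# Discrete example: X takes -1, 0, 1, each with probability 1/3 (a pmf).
pmf = {-1: 1/3, 0: 1/3, 1: 1/3}
assert abs(sum(pmf.values()) - 1.0) < 1e-12  # a pmf must sum to 1

# Continuous example: Y uniform on [0, 1], with pdf f(y) = 1 on [0, 1].
f = lambda y: 1.0
integral, _ = quad(f, 0.0, 1.0)  # a pdf must integrate to 1 over the domain
assert abs(integral - 1.0) < 1e-8
```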

68
00:05:00,934 --> 00:05:01,858
OK.

69
00:05:01,858 --> 00:05:03,353
Both of the boards don't slide.

70
00:05:03,353 --> 00:05:06,311
That's good.

71
00:05:06,311 --> 00:05:08,490
A few more things.

72
00:05:08,490 --> 00:05:14,530
Expectation-- probability first.

73
00:05:14,530 --> 00:05:22,092
Probability of an event can be
computed as probability of A

74
00:05:22,092 --> 00:05:28,200
is equal to either the sum over all
points in A of the probability

75
00:05:28,200 --> 00:05:36,700
mass function-- or the integral
over the set A of the density,

76
00:05:36,700 --> 00:05:39,540
depending on which you're using.

77
00:05:39,540 --> 00:05:50,050
And expectation, or mean
is-- expectation of X

78
00:05:50,050 --> 00:05:55,410
is equal to the sum over
all x of x times f_X(x).

79
00:05:55,410 --> 00:06:01,110
And expectation of Y is
the integral over omega.

80
00:06:01,110 --> 00:06:02,580
Oh, sorry.

81
00:06:02,580 --> 00:06:04,540
Over the sample space.

82
00:06:04,540 --> 00:06:05,538
Of y times f_Y(y) dy.
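In symbols, the formulas just stated are (writing f_X for the pmf and f_Y for the pdf):

```latex
\mathbb{P}(A)=\sum_{x\in A} f_X(x)
\quad\text{or}\quad
\mathbb{P}(A)=\int_A f_Y(y)\,dy,
\qquad
\mathbb{E}[X]=\sum_x x\,f_X(x),
\quad
\mathbb{E}[Y]=\int_\Omega y\,f_Y(y)\,dy.
```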

83
00:06:11,016 --> 00:06:12,520
OK.

84
00:06:12,520 --> 00:06:16,850
And one more basic
concept I'd like to review

85
00:06:16,850 --> 00:06:32,150
is two random variables X_1, X_2
are independent if probability

86
00:06:32,150 --> 00:06:38,220
that X_1 is in A and
X_2 is in B equals

87
00:06:38,220 --> 00:06:48,898
the product of the
probabilities, for all events A

88
00:06:48,898 --> 00:06:54,222
and B. OK.

89
00:06:57,610 --> 00:06:59,570
All agreed?

90
00:06:59,570 --> 00:07:01,910
So for independence, I will
talk about independence

91
00:07:01,910 --> 00:07:04,570
of several random
variables as well.

92
00:07:04,570 --> 00:07:09,290
There are two concepts
of independence--

93
00:07:09,290 --> 00:07:10,760
not two, but several.

94
00:07:10,760 --> 00:07:17,220
The two most popular are
mutually independent events

95
00:07:17,220 --> 00:07:19,110
and pairwise independent events.

96
00:07:23,583 --> 00:07:27,060
Can somebody tell me the
difference between these two

97
00:07:27,060 --> 00:07:28,865
for several variables?

98
00:07:33,230 --> 00:07:34,200
Yes?

99
00:07:34,200 --> 00:07:35,655
AUDIENCE: So
usually, independent

100
00:07:35,655 --> 00:07:38,640
means all the random
variables are independent,

101
00:07:38,640 --> 00:07:42,550
like X_1 is independent
of every other.

102
00:07:42,550 --> 00:07:46,610
But pairwise means X_1
and X_2 are independent,

103
00:07:46,610 --> 00:07:51,677
but X_1, X_2, and X_3, they
may not be independent.

104
00:07:51,677 --> 00:07:52,260
PROFESSOR: OK.

105
00:07:52,260 --> 00:07:54,940
Maybe-- yeah.

106
00:07:54,940 --> 00:07:57,020
So that's good.

107
00:07:57,020 --> 00:08:04,420
So let's see-- for the example
of three random variables,

108
00:08:04,420 --> 00:08:07,770
it might be the case that
each pair is independent.

109
00:08:07,770 --> 00:08:10,110
X_1 is independent of X_2,

110
00:08:10,110 --> 00:08:12,940
X_1 is independent of
X_3, and X_2 of X_3.

111
00:08:12,940 --> 00:08:15,290
But all together, they're
not independent.

112
00:08:15,290 --> 00:08:20,780
What that means is, this type
of statement is not true.

113
00:08:20,780 --> 00:08:25,200
So there are, say, events A_1, A_2, A_3
for which this does not hold.

114
00:08:25,200 --> 00:08:28,150
But that's just some
technical detail.

115
00:08:28,150 --> 00:08:30,960
We will mostly just consider
mutually independent events.

116
00:08:30,960 --> 00:08:32,960
So when we say that several
random variables are

117
00:08:32,960 --> 00:08:36,630
independent, it just means
whatever collection you take,

118
00:08:36,630 --> 00:08:37,742
they're all independent.
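A standard example of this distinction (not given in the lecture), sketched in Python: two fair coin flips X_1, X_2 and their XOR X_3 are pairwise independent but not mutually independent.

```python
from itertools import product

# X1, X2 are fair coin flips; X3 = X1 XOR X2.
outcomes = [(x1, x2, x1 ^ x2) for x1, x2 in product([0, 1], repeat=2)]
p = 1 / len(outcomes)  # the 4 outcomes are equally likely

def prob(event):
    return sum(p for o in outcomes if event(o))

# Pairwise independence: P(Xi=1, Xj=1) = 1/4 = P(Xi=1) P(Xj=1) for each pair.
for i, j in [(0, 1), (0, 2), (1, 2)]:
    assert prob(lambda o: o[i] == 1 and o[j] == 1) == 0.25

# Mutual independence fails: P(X1=1, X2=1, X3=1) = 0, not 1/8.
assert prob(lambda o: o == (1, 1, 1)) == 0.0
```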

119
00:08:43,995 --> 00:08:44,960
OK.

120
00:08:44,960 --> 00:08:47,780
So a little bit more fun
stuff in this overview.

121
00:08:50,640 --> 00:08:54,275
So we defined random variables.

122
00:08:54,275 --> 00:08:59,060
And one of the most
universal random variables,

123
00:08:59,060 --> 00:09:02,310
or distributions, is the
normal distribution.

124
00:09:10,920 --> 00:09:14,450
It's a continuous
random variable.

125
00:09:14,450 --> 00:09:21,160
A continuous random variable
is said to have the normal

126
00:09:21,160 --> 00:09:29,835
distribution N(mu, sigma)
if its probability

127
00:09:29,835 --> 00:09:40,380
density function is given
as 1 over sigma

128
00:09:40,380 --> 00:09:46,820
square root 2 pi, times
e to the minus (x

129
00:09:46,820 --> 00:09:50,830
minus mu) squared
over 2 sigma squared.

130
00:09:57,270 --> 00:10:01,194
For all reals.
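Written out in full (with sigma as the standard deviation), the density is:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,
       \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad x \in \mathbb{R}.
```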

131
00:10:01,194 --> 00:10:04,146
OK?

132
00:10:04,146 --> 00:10:12,500
So mu is the mean, sigma the standard
deviation-- that's one of the most

133
00:10:12,500 --> 00:10:17,050
universal random variables--
distributions, the most

134
00:10:17,050 --> 00:10:18,100
important one as well.

135
00:10:28,990 --> 00:10:29,870
OK.

136
00:10:29,870 --> 00:10:33,150
So this is what the distribution
looks like-- I'm sure

137
00:10:33,150 --> 00:10:36,043
you saw this bell curve before.

138
00:10:36,043 --> 00:10:42,351
It looks like this if
it's N(0,1), let's say.

139
00:10:42,351 --> 00:10:45,420
And that's your y.

140
00:10:45,420 --> 00:10:48,360
So it's centered
around the origin,

141
00:10:48,360 --> 00:10:52,090
and it's symmetric
about the origin.

142
00:10:52,090 --> 00:10:55,290
So now let's look
at our purpose.

143
00:10:55,290 --> 00:10:56,850
Let's think about our purpose.

144
00:10:56,850 --> 00:11:01,940
We want to model a financial
product or a stock,

145
00:11:01,940 --> 00:11:05,350
the price of the stock,
using some random variable.

146
00:11:05,350 --> 00:11:09,065
The first thing you can try
is to use normal distribution.

147
00:11:09,065 --> 00:11:10,690
The price itself being normal
doesn't make sense,

148
00:11:10,690 --> 00:11:19,586
but we can say the price at
day n minus the price at day n

149
00:11:19,586 --> 00:11:21,615
minus 1 has a normal distribution.

150
00:11:25,575 --> 00:11:29,440
Is this a sensible definition?

151
00:11:29,440 --> 00:11:30,637
Not really.

152
00:11:30,637 --> 00:11:31,720
So it's not a good choice.

153
00:11:31,720 --> 00:11:35,810
You can model it like this,
but it's not a good choice.

154
00:11:35,810 --> 00:11:38,050
There may be several
reasons, but one reason

155
00:11:38,050 --> 00:11:40,860
is that it doesn't take into
account the order of magnitude

156
00:11:40,860 --> 00:11:42,110
of the price itself.

157
00:11:42,110 --> 00:11:49,487
So the stock-- let's say
you have a stock price that

158
00:11:49,487 --> 00:11:52,730
goes something like that.

159
00:11:52,730 --> 00:11:58,620
And say it was $10
here, and $50 here.

160
00:11:58,620 --> 00:12:01,890
Regardless of where
your position is at,

161
00:12:01,890 --> 00:12:05,900
it says that the increment,
the absolute value of increment

162
00:12:05,900 --> 00:12:11,080
is identically distributed at
this point and at this point.

163
00:12:11,080 --> 00:12:14,770
But if you observed
how it works,

164
00:12:14,770 --> 00:12:18,040
usually that's not
normally distributed.

165
00:12:18,040 --> 00:12:21,800
What's normally distributed
is the percentage

166
00:12:21,800 --> 00:12:24,610
of how much it changes daily.

167
00:12:24,610 --> 00:12:32,125
So this is not a sensible
model, not a good model.

168
00:12:35,910 --> 00:12:41,200
But still, we can use
normal distribution

169
00:12:41,200 --> 00:12:42,830
to come up with a
pretty good model.

170
00:12:49,170 --> 00:13:06,130
So instead, what we want
is the relative difference

171
00:13:06,130 --> 00:13:07,892
to be normally distributed.

172
00:13:15,680 --> 00:13:16,720
That is, the percentage change.

173
00:13:26,760 --> 00:13:33,150
The question is, what is
the distribution of price?

174
00:13:33,150 --> 00:13:34,826
What will the
distribution of the price be?

175
00:13:45,750 --> 00:13:48,660
So it's not a very
good explanation.

176
00:13:48,660 --> 00:13:52,860
Because I'm giving just
discrete increments while

177
00:13:52,860 --> 00:13:55,770
these are continuous
random variables and so on.

178
00:13:55,770 --> 00:13:59,030
But what I'm trying to say here
is that normal distribution

179
00:13:59,030 --> 00:14:00,500
is not good enough.

180
00:14:00,500 --> 00:14:03,360
Instead, we want the
percentage change

181
00:14:03,360 --> 00:14:05,450
to be normally distributed.
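A toy Python sketch of the two modeling choices discussed here (numbers and names are made up for illustration): additive normal increments can drive the price negative, while normally distributed percentage changes keep it positive.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p0 = 1000, 50.0

# Additive model: P_n - P_{n-1} is normal; the path can go negative.
additive = p0 + np.cumsum(rng.normal(0.0, 1.0, size=n))

# Multiplicative model: (P_n - P_{n-1}) / P_{n-1} is normal;
# the path is p0 times a product of positive factors.
returns = rng.normal(0.0, 0.02, size=n)
multiplicative = p0 * np.cumprod(1.0 + returns)

print(additive.min())        # may well dip below 0 over long horizons
print(multiplicative.min())  # stays positive as long as returns > -1
```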

182
00:14:05,450 --> 00:14:11,300
And if that is the case,
what will be the distribution

183
00:14:11,300 --> 00:14:13,066
of the random variable?

184
00:14:13,066 --> 00:14:15,440
In this case, what will be
the distribution of the price?

185
00:14:27,420 --> 00:14:30,250
One thing I should
mention is, in this case,

186
00:14:30,250 --> 00:14:34,230
if each increment is
normally distributed,

187
00:14:34,230 --> 00:14:39,530
then the price at
day n will still

188
00:14:39,530 --> 00:14:44,270
be a normal random variable
distributed like that.

189
00:14:47,440 --> 00:14:53,900
So if there's no tendency-- if
the average daily increment is

190
00:14:53,900 --> 00:14:56,832
0, then no matter
how far you go,

191
00:14:56,832 --> 00:14:58,915
your random variable will
be normally distributed.

192
00:15:02,230 --> 00:15:06,110
But here, that will
not be the case.

193
00:15:06,110 --> 00:15:08,785
So we want to see what
the distribution of P_n

194
00:15:08,785 --> 00:15:11,981
will be in this case.

195
00:15:11,981 --> 00:15:12,480
OK.

196
00:15:17,820 --> 00:15:29,300
To do that-- let me formally
write down what I want to say.

197
00:15:29,300 --> 00:15:34,008
What I want to say is this.

198
00:15:34,008 --> 00:15:46,030
I want to define a
log-normal distribution Y,

199
00:15:46,030 --> 00:16:07,274
or log-normal random variable
Y, such that log of Y

200
00:16:07,274 --> 00:16:08,762
is normally distributed.

201
00:16:24,170 --> 00:16:26,670
So to derive the probability
distribution of this

202
00:16:26,670 --> 00:16:28,220
from the normal
distribution, we can

203
00:16:28,220 --> 00:16:40,010
use the change of
variable formula, which

204
00:16:40,010 --> 00:16:47,340
says the following:
suppose X and Y

205
00:16:47,340 --> 00:17:16,781
are random variables such that
the probability that X is at most x

206
00:17:16,781 --> 00:17:26,262
equals the probability that Y
is at most h(x), for all x.

207
00:17:32,250 --> 00:17:48,218
Then the density f sub X
of x is equal to

208
00:17:48,218 --> 00:17:52,709
f sub Y of h(x) times

209
00:17:58,198 --> 00:17:59,196
h'(x).
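Compactly, for an increasing differentiable h, the formula just stated reads:

```latex
\mathbb{P}(X \le x) = \mathbb{P}\big(Y \le h(x)\big)\ \text{for all } x
\quad\Longrightarrow\quad
f_X(x) = f_Y\big(h(x)\big)\,h'(x).
```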

210
00:18:07,200 --> 00:18:11,930
So let's try to fit
into this story.

211
00:18:11,930 --> 00:18:14,920
We want to have a
random variable Y such

212
00:18:14,920 --> 00:18:18,510
that log Y is
normally distributed.

213
00:18:18,510 --> 00:18:26,430
Here-- so you can
put log of x here.

214
00:18:26,430 --> 00:18:30,300
If Y is normally distributed,
X will be the distribution

215
00:18:30,300 --> 00:18:32,890
that we're interested in.

216
00:18:32,890 --> 00:18:37,870
So using this formula, we can
find probability distribution

217
00:18:37,870 --> 00:18:40,650
function of the log-normal
distribution using

218
00:18:40,650 --> 00:18:43,720
the probability
distribution of normal.

219
00:18:43,720 --> 00:18:44,810
So let's do that.

220
00:19:05,669 --> 00:19:10,659
AUDIENCE: [INAUDIBLE], right?

221
00:19:10,659 --> 00:19:12,910
PROFESSOR: Yes.

222
00:19:12,910 --> 00:19:15,006
So it's not a good choice.

223
00:19:15,006 --> 00:19:16,380
Locally, it might
be a good choice.

224
00:19:16,380 --> 00:19:20,357
But if it's taken
over a long time,

225
00:19:20,357 --> 00:19:21,440
it won't be a good choice.

226
00:19:21,440 --> 00:19:24,398
Because it will also take
negative values, for example.

227
00:19:28,517 --> 00:19:30,100
So if you just take
this model, what's

228
00:19:30,100 --> 00:19:31,849
going to happen over
a long period of time

229
00:19:31,849 --> 00:19:35,730
is it's going to hit
this square root of n,

230
00:19:35,730 --> 00:19:38,090
negative square root of
n line infinitely often.

231
00:19:42,050 --> 00:19:44,620
And then it can
go up to infinity,

232
00:19:44,620 --> 00:19:47,470
or it can go down to
infinity eventually.

233
00:19:47,470 --> 00:19:49,720
So it will take negative
values and positive values.

234
00:19:53,310 --> 00:19:55,460
That's one reason, but
there are several reasons

235
00:19:55,460 --> 00:19:57,970
why that's not a good choice.

236
00:19:57,970 --> 00:19:59,440
If you look at a
very small scale,

237
00:19:59,440 --> 00:20:03,610
it might be OK, because the base
price doesn't change that much.

238
00:20:03,610 --> 00:20:05,490
So if you model
in terms of ratio,

239
00:20:05,490 --> 00:20:07,930
or if you model it
in an absolute way,

240
00:20:07,930 --> 00:20:09,830
it doesn't matter that much.

241
00:20:09,830 --> 00:20:13,850
But if you want to do it a
little bit more large scale,

242
00:20:13,850 --> 00:20:17,890
then that's not a
very good choice.

243
00:20:17,890 --> 00:20:20,120
Other questions?

244
00:20:20,120 --> 00:20:21,745
Do you want me to
add some explanation?

245
00:20:25,322 --> 00:20:25,822
OK.

246
00:20:29,580 --> 00:20:32,720
So let me get this right.

247
00:20:37,120 --> 00:20:45,440
Y. I want X to be-- yes.

248
00:20:45,440 --> 00:20:49,950
I want X to be the
log-normal distribution.

249
00:20:56,950 --> 00:21:04,580
And I want Y to be
normal distribution

250
00:21:04,580 --> 00:21:07,190
or a normal random variable.

251
00:21:07,190 --> 00:21:12,572
Then the probability
that X is at most x

252
00:21:12,572 --> 00:21:24,500
equals the probability
that Y is at most

253
00:21:24,500 --> 00:21:29,070
log x.

254
00:21:29,070 --> 00:21:33,160
That's the definition of
log-normal distribution.

255
00:21:33,160 --> 00:21:39,130
Then by using this change
of variable formula,

256
00:21:39,130 --> 00:21:41,780
probability density
function of X

257
00:21:41,780 --> 00:21:46,980
is equal to probability
density function of Y at log

258
00:21:46,980 --> 00:21:54,440
x times the derivative
of log x, which is 1 over x.

259
00:21:54,440 --> 00:22:00,460
So it becomes 1 over
x sigma square root

260
00:22:00,460 --> 00:22:07,704
2 pi, times e to the minus (log
x minus mu) squared over 2 sigma squared.
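A quick numerical sanity check of this density (a sketch, not from the lecture; mu, sigma, and the grid are arbitrary): sample Y ~ N(mu, sigma^2), set X = e^Y, and compare a histogram of X with the formula.

```python
import numpy as np

mu, sigma = 0.3, 0.8
rng = np.random.default_rng(1)
x = np.exp(rng.normal(mu, sigma, size=200_000))  # log X is normal

def f_X(t):
    # Derived log-normal density: f_Y(log t) * (1/t).
    return np.exp(-(np.log(t) - mu)**2 / (2 * sigma**2)) / (t * sigma * np.sqrt(2 * np.pi))

hist, edges = np.histogram(x, bins=400, range=(0.01, 10), density=True)
centers = (edges[:-1] + edges[1:]) / 2
for t in [0.5, 1.0, 2.0, 4.0]:
    i = np.argmin(np.abs(centers - t))
    print(t, round(hist[i], 3), round(f_X(t), 3))  # the two columns should roughly agree
```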

261
00:22:11,610 --> 00:22:13,430
So log-normal
distribution can also

262
00:22:13,430 --> 00:22:15,380
be defined as the
distribution which has

263
00:22:15,380 --> 00:22:17,246
this probability density function.

264
00:22:22,650 --> 00:22:26,160
You can use either definition.

265
00:22:26,160 --> 00:22:29,391
Let me just make sure that I
didn't mess up in the middle.

266
00:22:32,800 --> 00:22:33,780
Yes.

267
00:22:33,780 --> 00:22:39,187
And that only works
for x greater than 0.

268
00:22:39,187 --> 00:22:39,687
Yes?

269
00:22:39,687 --> 00:22:41,714
AUDIENCE: [INAUDIBLE]?

270
00:22:41,714 --> 00:22:42,380
PROFESSOR: Yeah.

271
00:22:42,380 --> 00:22:43,940
So all logs are natural log.

272
00:22:43,940 --> 00:22:46,171
It should be ln.

273
00:22:46,171 --> 00:22:46,670
Yeah.

274
00:22:46,670 --> 00:22:48,320
Thank you.

275
00:22:48,320 --> 00:22:49,810
OK.

276
00:22:49,810 --> 00:22:58,370
So question-- what's the mean
of this distribution here?

277
00:22:58,370 --> 00:22:58,870
Yeah?

278
00:22:58,870 --> 00:23:00,970
AUDIENCE: 1?

279
00:23:00,970 --> 00:23:02,460
PROFESSOR: Not 1.

280
00:23:02,460 --> 00:23:04,820
It might be mu.

281
00:23:04,820 --> 00:23:07,500
Is it mu?

282
00:23:07,500 --> 00:23:08,260
Oh, sorry.

283
00:23:08,260 --> 00:23:09,850
It might be e to the mu.

284
00:23:09,850 --> 00:23:15,470
Because log X, the normal
distribution had mean mu.

285
00:23:15,470 --> 00:23:17,630
log x equals mu
might be the center.

286
00:23:17,630 --> 00:23:20,850
If that's the case, x is e
to the mu will be the mean.

287
00:23:20,850 --> 00:23:23,915
Is that the case?

288
00:23:23,915 --> 00:23:24,415
Yes?

289
00:23:24,415 --> 00:23:27,890
AUDIENCE: Can you get
the mu minus [INAUDIBLE]?

290
00:23:27,890 --> 00:23:29,760
PROFESSOR: Probably right.

291
00:23:29,760 --> 00:23:31,070
I don't remember what's there.

292
00:23:31,070 --> 00:23:32,490
There is a correcting factor.

293
00:23:32,490 --> 00:23:34,292
I don't remember
exactly what that is,

294
00:23:34,292 --> 00:23:37,210
but I think you're right.
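For reference, the standard formulas for X = e^Y with Y ~ N(mu, sigma^2), stated here without derivation, are:

```latex
\mathbb{E}[X] = e^{\mu + \sigma^2/2},
\qquad
\operatorname{Var}(X) = \big(e^{\sigma^2}-1\big)\,e^{2\mu+\sigma^2}.
```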

295
00:23:37,210 --> 00:23:39,770
So one very important
thing to remember

296
00:23:39,770 --> 00:23:43,500
is that the log-normal
distribution is referred

297
00:23:43,500 --> 00:23:48,150
to in terms of the
parameters mu and sigma,

298
00:23:48,150 --> 00:23:50,510
because that's the mu and
sigma up here and here coming

299
00:23:50,510 --> 00:23:52,600
from the normal distribution.

300
00:23:52,600 --> 00:23:57,580
But those are not the
mean and variance anymore,

301
00:23:57,580 --> 00:24:01,900
because you skew
the distribution.

302
00:24:01,900 --> 00:24:03,700
It's no longer centered at mu.

303
00:24:03,700 --> 00:24:07,490
log X is centered at mu, but
when you take the exponential,

304
00:24:07,490 --> 00:24:08,590
it becomes skewed.

305
00:24:08,590 --> 00:24:12,630
And when you take the average,
you'll see that the mean

306
00:24:12,630 --> 00:24:13,930
is no longer e to the mu.

307
00:24:13,930 --> 00:24:16,365
So that doesn't give the mean.

308
00:24:16,365 --> 00:24:18,490
That doesn't imply that
the mean is e to the mu.

309
00:24:18,490 --> 00:24:20,870
That doesn't imply
that the variance is

310
00:24:20,870 --> 00:24:23,242
something like e to the sigma.

311
00:24:23,242 --> 00:24:27,040
That's just totally nonsense.

312
00:24:27,040 --> 00:24:30,080
Just remember-- these are just
parameters, some parameters.

313
00:24:30,080 --> 00:24:32,450
It's no longer mean or variance.

314
00:24:35,670 --> 00:24:39,794
And in your homework,
in one exercise,

315
00:24:39,794 --> 00:24:41,710
we'll ask you to compute
the mean and variance

316
00:24:41,710 --> 00:24:44,490
of the random variable.

317
00:24:44,490 --> 00:24:48,560
But really, just try to
have it stick in your mind

318
00:24:48,560 --> 00:24:53,160
that mu and sigma are no
longer the mean and variance.

319
00:24:53,160 --> 00:24:56,230
That's only the case for
normal random variables.

320
00:24:56,230 --> 00:24:58,380
And the reason we are
still using mu and sigma

321
00:24:58,380 --> 00:25:00,680
is because of this derivation.

322
00:25:00,680 --> 00:25:02,390
And it's easy to
describe it in those terms.

323
00:25:05,830 --> 00:25:07,940
OK.

324
00:25:07,940 --> 00:25:11,800
So the normal distribution
and log-normal distribution

325
00:25:11,800 --> 00:25:13,720
will probably be
the distributions

326
00:25:13,720 --> 00:25:15,742
that you'll see the most
throughout the course.

327
00:25:15,742 --> 00:25:17,325
But there are some
other distributions

328
00:25:17,325 --> 00:25:18,500
that you'll also see.

329
00:25:23,460 --> 00:25:24,948
I need this.

330
00:25:32,884 --> 00:25:35,650
I will not talk
about it in detail.

331
00:25:35,650 --> 00:25:38,540
It will be some
exercise questions.

332
00:25:38,540 --> 00:25:44,939
For example, you have Poisson
distribution or exponential

333
00:25:44,939 --> 00:25:45,522
distributions.

334
00:25:52,130 --> 00:25:56,550
These are some other
distributions that you'll see.

335
00:25:56,550 --> 00:25:59,060
And all of these-- normal,
log-normal, Poisson,

336
00:25:59,060 --> 00:26:01,060
and exponential,
and a lot more can

337
00:26:01,060 --> 00:26:04,400
be grouped into a
family of distributions

338
00:26:04,400 --> 00:26:05,798
called exponential family.

339
00:26:18,490 --> 00:26:24,026
So a distribution is said to
be in an exponential family--

340
00:26:24,026 --> 00:26:36,590
A distribution belongs
to exponential family

341
00:26:36,590 --> 00:26:50,890
if there exists a theta,
a vector that parametrizes

342
00:26:50,890 --> 00:27:05,520
the distribution such that
the probability density

343
00:27:05,520 --> 00:27:10,670
function for this choice
of parameter theta

344
00:27:10,670 --> 00:27:16,480
can be written as h
of x times c of theta

345
00:27:16,480 --> 00:27:22,498
times the exponential of the
sum from i equals 1 to k of w_i(theta) t_i(x).
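So, written out, the exponential-family form is:

```latex
f(x \mid \theta) = h(x)\,c(\theta)\,
\exp\!\Big(\sum_{i=1}^{k} w_i(\theta)\,t_i(x)\Big).
```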

346
00:27:35,446 --> 00:27:35,970
Yes.

347
00:27:35,970 --> 00:27:40,100
So here, when I write
only x, h should only

348
00:27:40,100 --> 00:27:43,400
depend on x, not on theta.

349
00:27:43,400 --> 00:27:45,090
When I write some
function of theta,

350
00:27:45,090 --> 00:27:48,020
it should only depend
on theta, not on x.

351
00:27:48,020 --> 00:28:01,070
So h(x) and t_i(x) depend only
on x, and c(theta) and w_i(theta)

352
00:28:01,070 --> 00:28:04,679
depend only on theta.

353
00:28:04,679 --> 00:28:05,720
That's an abstract thing.

354
00:28:05,720 --> 00:28:07,830
It's not clear why
this is so useful,

355
00:28:07,830 --> 00:28:10,140
at least from the definition.

356
00:28:10,140 --> 00:28:14,955
But you're going to talk
about some distribution

357
00:28:14,955 --> 00:28:16,650
for an exponential
family, right?

358
00:28:16,650 --> 00:28:17,150
Yeah.

359
00:28:17,150 --> 00:28:19,840
So you will see
something about this.

360
00:28:19,840 --> 00:28:21,770
But one good thing
is, they exhibit

361
00:28:21,770 --> 00:28:25,360
some good statistical
behavior--

362
00:28:25,360 --> 00:28:28,330
all distributions

363
00:28:28,330 --> 00:28:31,460
in the exponential family
have some nice statistical

364
00:28:31,460 --> 00:28:35,590
properties, which makes it good.

365
00:28:35,590 --> 00:28:37,270
That's too abstract.

366
00:28:37,270 --> 00:28:42,140
Let's see how log-normal
distribution actually falls

367
00:28:42,140 --> 00:28:43,631
into the exponential family.

368
00:28:47,607 --> 00:28:49,444
AUDIENCE: So, let
me just comment.

369
00:28:49,444 --> 00:28:50,360
PROFESSOR: Yeah, sure.

370
00:28:50,360 --> 00:28:53,976
AUDIENCE: The notion of
independent random variables,

371
00:28:53,976 --> 00:28:58,687
you went over how the--
well, the probability density

372
00:28:58,687 --> 00:29:00,520
functions of collections
of random variables

373
00:29:00,520 --> 00:29:01,936
if they're mutually
independent is

374
00:29:01,936 --> 00:29:05,640
the product of the
probability densities

375
00:29:05,640 --> 00:29:07,132
of the individual variables.

376
00:29:07,132 --> 00:29:10,240
And so with this
exponential family,

377
00:29:10,240 --> 00:29:12,685
if you have random variables
from the same exponential

378
00:29:12,685 --> 00:29:18,380
family, products of this
density function factor out

379
00:29:18,380 --> 00:29:19,700
into a very simple form.

380
00:29:19,700 --> 00:29:21,360
It doesn't get more
complicated as you

381
00:29:21,360 --> 00:29:24,430
look at the joint density
of many variables,

382
00:29:24,430 --> 00:29:27,510
and in fact simplifies to
the same exponential family.

383
00:29:27,510 --> 00:29:30,210
So that's where that
becomes very useful.

384
00:29:30,210 --> 00:29:32,305
PROFESSOR: So it's designed
so that it factors out

385
00:29:32,305 --> 00:29:33,180
when it's multiplied.

386
00:29:33,180 --> 00:29:34,644
It factors out well.
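In symbols, for n independent samples from the same exponential-family distribution, the joint density keeps the same shape:

```latex
\prod_{j=1}^{n} f(x_j \mid \theta)
= \Big(\prod_{j=1}^{n} h(x_j)\Big)\, c(\theta)^{n}\,
  \exp\!\Big(\sum_{i=1}^{k} w_i(\theta) \sum_{j=1}^{n} t_i(x_j)\Big).
```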

387
00:29:37,990 --> 00:29:38,650
OK.

388
00:29:38,650 --> 00:29:43,000
So-- sorry about that.

389
00:29:43,000 --> 00:29:44,960
Yeah, log-normal distribution.

390
00:29:44,960 --> 00:29:49,970
So take h(x), 1 over x.

391
00:29:49,970 --> 00:29:52,350
Before that, let's just rewrite
that in a different way.

392
00:29:52,350 --> 00:29:58,804
So 1 over x sigma square
root 2 pi, e to the minus (log

393
00:29:58,804 --> 00:30:03,430
x minus mu) squared

394
00:30:03,430 --> 00:30:04,530
over 2 sigma squared.

395
00:30:04,530 --> 00:30:10,546
Can be rewritten as 1
over x, times 1 over sigma

396
00:30:10,546 --> 00:30:18,215
square root 2 pi, e to
the minus log x square

397
00:30:18,215 --> 00:30:30,590
over 2 sigma square plus
mu log x over sigma square

398
00:30:30,590 --> 00:30:33,065
minus mu squared over 2 sigma squared.

399
00:30:37,050 --> 00:30:38,730
Let's write it like that.

400
00:30:38,730 --> 00:30:42,464
Set up h(x) equals 1 over x.

401
00:30:42,464 --> 00:30:51,422
c of theta-- sorry,
theta equals (mu, sigma).

402
00:30:51,422 --> 00:30:55,932
c(theta) is equal to 1 over
sigma square root 2 pi, e

403
00:30:55,932 --> 00:30:57,163
to the minus mu squared over 2 sigma squared.

404
00:31:01,510 --> 00:31:03,920
So you will
parametrize this family

405
00:31:03,920 --> 00:31:06,870
in terms of mu and sigma.

406
00:31:06,870 --> 00:31:09,490
Your h of x here
will be 1 over x.

407
00:31:09,490 --> 00:31:14,000
Your c(theta) will be this
term and the last term here,

408
00:31:14,000 --> 00:31:16,960
because this
doesn't depend on x.

409
00:31:16,960 --> 00:31:21,630
And then you have to
figure out what w and t are.

410
00:31:21,630 --> 00:31:24,970
You can let t_1 of
x be log x squared

411
00:31:29,180 --> 00:31:38,940
and w_1 of theta be minus 1

412
00:31:38,940 --> 00:31:41,392
over 2 sigma squared.

413
00:31:41,392 --> 00:31:44,080
And similarly, you
can let t_2 equals log

414
00:31:44,080 --> 00:31:51,404
x and w_2 equals mu over sigma squared.

415
00:31:54,580 --> 00:31:56,570
It's just some technicality,
but at least you

416
00:31:56,570 --> 00:31:59,974
can see it really fits in.
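Summarizing the identification for the log-normal density, with theta = (mu, sigma):

```latex
h(x)=\frac{1}{x},\qquad
c(\theta)=\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\mu^2/(2\sigma^2)},\qquad
t_1(x)=(\ln x)^2,\ w_1(\theta)=-\frac{1}{2\sigma^2},\qquad
t_2(x)=\ln x,\ w_2(\theta)=\frac{\mu}{\sigma^2}.
```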

417
00:32:02,690 --> 00:32:05,200
OK.

418
00:32:05,200 --> 00:32:07,380
So that's all
about distributions

419
00:32:07,380 --> 00:32:10,080
that I want to talk about.

420
00:32:10,080 --> 00:32:12,640
And then let's talk
a little bit more

421
00:32:12,640 --> 00:32:15,340
about more interesting
stuff, in my opinion.

422
00:32:15,340 --> 00:32:16,705
I like this stuff better.

423
00:32:19,440 --> 00:32:23,340
There are two main things
that we're interested in.

424
00:32:23,340 --> 00:32:30,650
When we have a random variable,
at least for our purpose, what

425
00:32:30,650 --> 00:32:42,766
we want to study is given
a random variable, first,

426
00:32:42,766 --> 00:32:44,015
we want to study its statistics.

427
00:32:50,710 --> 00:32:54,826
So we want to study these
statistics, whatever

428
00:32:54,826 --> 00:32:55,798
that means.

429
00:32:59,690 --> 00:33:02,567
And that will be represented
by the k-th moments

430
00:33:02,567 --> 00:33:03,525
of the random variable.

431
00:33:10,340 --> 00:33:15,370
The k-th moment is defined
as the expectation of X to the k.

432
00:33:20,600 --> 00:33:24,000
And a good way to study
all the moments together

433
00:33:24,000 --> 00:33:26,855
in one function is a
moment-generating function.

434
00:33:34,300 --> 00:33:36,480
So this moment-generating
function

435
00:33:36,480 --> 00:33:40,340
encodes all the k-th moments
of a random variable.

436
00:33:40,340 --> 00:33:43,130
So it contains all the
statistical information

437
00:33:43,130 --> 00:33:45,339
of a random variable.

438
00:33:45,339 --> 00:33:46,880
That's why
moment-generating function

439
00:33:46,880 --> 00:33:48,060
will be interesting to us.

440
00:33:48,060 --> 00:33:50,050
Because when you
want to study it,

441
00:33:50,050 --> 00:33:52,760
you don't have to consider
each moment separately.

442
00:33:52,760 --> 00:33:54,090
It gives a unified way.

443
00:33:54,090 --> 00:33:58,050
It gives a very good
feeling about your random variable.

444
00:33:58,050 --> 00:33:59,560
That will be our first topic.

445
00:33:59,560 --> 00:34:02,200
Our second topic will
be we want to study

446
00:34:02,200 --> 00:34:10,140
its long-term or
large-scale behavior.

447
00:34:18,190 --> 00:34:21,199
So for example, assume that you
have a normal distribution--

448
00:34:21,199 --> 00:34:24,449
one random variable with
normal distribution.

449
00:34:24,449 --> 00:34:28,800
If we just have a
single random variable,

450
00:34:28,800 --> 00:34:30,760
you really have no control.

451
00:34:30,760 --> 00:34:31,870
It can be anywhere.

452
00:34:31,870 --> 00:34:39,260
The outcome can be anything
according to that distribution.

453
00:34:39,260 --> 00:34:41,429
But if you have several
independent random variables

454
00:34:41,429 --> 00:34:44,540
with the exact
same distribution,

455
00:34:44,540 --> 00:34:49,530
if the number is super large--
let's say 100 million--

456
00:34:49,530 --> 00:34:55,320
and you plot how many random
variables fall into each point

457
00:34:55,320 --> 00:34:58,150
into a graph,
you'll know that it

458
00:34:58,150 --> 00:35:01,672
has to look very
close to this curve.

459
00:35:01,672 --> 00:35:04,160
It will be more dense
here, sparser there,

460
00:35:04,160 --> 00:35:06,720
and sparser there.

461
00:35:06,720 --> 00:35:09,050
So you don't have
individual control on each

462
00:35:09,050 --> 00:35:10,150
of the random variables.

463
00:35:10,150 --> 00:35:12,185
But when you look
at large scale,

464
00:35:12,185 --> 00:35:16,860
you know, at least with
very high probability,

465
00:35:16,860 --> 00:35:19,990
it has to look like this curve.

466
00:35:19,990 --> 00:35:22,480
Those kind of things are
what we want to study.

467
00:35:22,480 --> 00:35:25,720
When we look at this long-term
behavior or large scale

468
00:35:25,720 --> 00:35:28,500
behavior, what can we say?

469
00:35:28,500 --> 00:35:30,130
What kind of events
are guaranteed

470
00:35:30,130 --> 00:35:35,110
to happen with probability,
let's say, 99.9%?

471
00:35:35,110 --> 00:35:38,680
And actually, some interesting
things are happening.

472
00:35:38,680 --> 00:35:44,800
As you might already know, two
typical theorems of this type

473
00:35:44,800 --> 00:35:46,850
will be, in this
topic will be law

474
00:35:46,850 --> 00:35:53,282
of large numbers and
central limit theorem.

475
00:36:02,520 --> 00:36:04,590
So let's start with
our first topic--

476
00:36:04,590 --> 00:36:05,975
the moment-generating function.

477
00:36:26,310 --> 00:36:28,800
The moment-generating
function of a random variable

478
00:36:28,800 --> 00:36:31,540
is defined as-- I
write it as m sub

479
00:36:31,540 --> 00:36:39,330
X. It's defined as expectation
of e to the t times X

480
00:36:39,330 --> 00:36:41,090
where t is some parameter.

481
00:36:41,090 --> 00:36:42,510
t can be any real.

482
00:36:47,372 --> 00:36:48,330
You have to be careful.

483
00:36:48,330 --> 00:36:51,680
It doesn't always converge.

484
00:36:51,680 --> 00:36:58,360
So remark: it does not
necessarily exist.

485
00:37:09,900 --> 00:37:12,960
So for example, one of the
distributions you already saw

486
00:37:12,960 --> 00:37:15,010
does not have
moment-generating function.

487
00:37:15,010 --> 00:37:22,101
The log-normal
distribution does not

488
00:37:22,101 --> 00:37:23,600
have any moment-generating
function.
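A one-line reason (a sketch, not worked out in the lecture): writing X = e^Y with Y ~ N(mu, sigma^2), for any t > 0,

```latex
\mathbb{E}\big[e^{tX}\big]
= \int_{-\infty}^{\infty} e^{t e^{y}}\,
  \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(y-\mu)^2/(2\sigma^2)}\,dy
= \infty,
```

since e^{t e^y} grows doubly exponentially in y while the Gaussian factor decays only like e^{-y^2}.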

489
00:37:30,650 --> 00:37:33,720
And that's one thing
you have to be careful about.

490
00:37:33,720 --> 00:37:35,870
It's not just some
theoretical thing.

491
00:37:38,329 --> 00:37:40,120
The statement is not
something theoretical.

492
00:37:40,120 --> 00:37:42,670
It actually happens for
some random variables

493
00:37:42,670 --> 00:37:45,548
that you encounter in your life.

494
00:37:45,548 --> 00:37:48,190
So be careful.

495
00:37:48,190 --> 00:37:54,460
And that will actually show
some very interesting thing

496
00:37:54,460 --> 00:37:57,220
I will later explain.

497
00:37:57,220 --> 00:37:59,796
Some very interesting
facts arise from this fact.

498
00:38:03,900 --> 00:38:06,277
Before going into
that, first of all,

499
00:38:06,277 --> 00:38:08,110
why is it called
moment-generating function?

500
00:38:08,110 --> 00:38:14,540
It's because if you
take the k-th derivative

501
00:38:14,540 --> 00:38:26,280
of this function,
then it actually

502
00:38:26,280 --> 00:38:33,131
gives the k-th moment
of your random variable.

503
00:38:33,131 --> 00:38:34,505
That's where the
name comes from.

504
00:38:43,235 --> 00:38:45,225
That's for all non-negative integers k.

505
00:38:58,320 --> 00:39:00,040
And that gives a
different way of writing

506
00:39:00,040 --> 00:39:01,248
a moment-generating function.

507
00:39:11,230 --> 00:39:18,090
Because of that, we may write
the moment-generating function

508
00:39:18,090 --> 00:39:24,992
as the sum from k equals
0 to infinity, t to the k over

509
00:39:24,992 --> 00:39:29,912
k factorial, times
the k-th moment.
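Collecting the definition and the two facts just stated:

```latex
M_X(t) = \mathbb{E}\big[e^{tX}\big],
\qquad
M_X^{(k)}(0) = \mathbb{E}\big[X^k\big],
\qquad
M_X(t) = \sum_{k=0}^{\infty} \frac{t^k}{k!}\,\mathbb{E}\big[X^k\big].
```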

510
00:39:37,790 --> 00:39:40,469
That's like the
Taylor expansion.

511
00:39:40,469 --> 00:39:42,010
Because you know
all the derivatives,

512
00:39:42,010 --> 00:39:43,551
you know what the
functions would be.

513
00:39:43,551 --> 00:39:45,300
Of course, only if it exists.

514
00:39:45,300 --> 00:39:46,300
This might not converge.

515
00:39:55,080 --> 00:39:58,360
So if moment-generating
function exists,

516
00:39:58,360 --> 00:40:01,120
it pretty much classifies
your random variable.

517
00:40:04,630 --> 00:40:09,020
So if two random
variables, X, Y,

518
00:40:09,020 --> 00:40:16,120
have the same
moment-generating function,

519
00:40:16,120 --> 00:40:24,835
then X and Y have the
same distribution.

520
00:40:30,020 --> 00:40:32,550
I will not prove this theorem.

521
00:40:32,550 --> 00:40:35,080
But it says that
moment-generating function,

522
00:40:35,080 --> 00:40:39,600
if it exists, encodes
really all the information

523
00:40:39,600 --> 00:40:41,516
about your random variables.

524
00:40:41,516 --> 00:40:42,990
You're not losing anything.

525
00:40:46,320 --> 00:40:50,540
However, be very careful when
you're applying this theorem.

526
00:40:50,540 --> 00:40:59,920
Because remark,
it does not imply

527
00:40:59,920 --> 00:41:20,740
that all random variables
with identical k-th moments

528
00:41:20,740 --> 00:41:26,790
for all k have the
same distribution.

529
00:41:37,418 --> 00:41:40,030
Do you see it?

530
00:41:40,030 --> 00:41:43,330
If X and Y have a
moment-generating function,

531
00:41:43,330 --> 00:41:49,210
and they're the same, then they
have the same distribution.

532
00:41:49,210 --> 00:41:52,710
This looks a little bit
contradictory to that theorem.

533
00:41:52,710 --> 00:41:56,890
It says that it's not
necessarily the case

534
00:41:56,890 --> 00:42:01,000
that two random variables, which
have identical moments-- so

535
00:42:01,000 --> 00:42:04,750
all k-th moments are the
same for two variables--

536
00:42:04,750 --> 00:42:06,710
even if that's the case,
they don't necessarily

537
00:42:06,710 --> 00:42:10,060
have to have the
same distribution.

538
00:42:10,060 --> 00:42:12,014
Which seems like it
doesn't make sense

539
00:42:12,014 --> 00:42:13,180
if you look at this theorem.

540
00:42:13,180 --> 00:42:14,596
Because moment-generating
function

541
00:42:14,596 --> 00:42:16,650
is defined in terms
of the moments.

542
00:42:16,650 --> 00:42:18,742
If two random variables
have the same moments,

543
00:42:18,742 --> 00:42:20,575
they have the same
moment-generating function.

544
00:42:20,575 --> 00:42:22,616
If they have the same
moment-generating function,

545
00:42:22,616 --> 00:42:24,970
they have the same distribution.

546
00:42:24,970 --> 00:42:28,450
There is a hole
in this argument.

547
00:42:28,450 --> 00:42:31,850
Even if they have
the same moments,

548
00:42:31,850 --> 00:42:33,792
it doesn't necessarily
imply that they

549
00:42:33,792 --> 00:42:35,500
have the same
moment-generating function.

550
00:42:35,500 --> 00:42:39,520
They might both not have
moment-generating functions.

551
00:42:39,520 --> 00:42:42,620
That's the glitch.

552
00:42:42,620 --> 00:42:44,040
Be careful.

553
00:42:44,040 --> 00:42:47,587
So just remember that even if
they have the same moments,

554
00:42:47,587 --> 00:42:49,670
they don't necessarily
have the same distribution.

555
00:42:49,670 --> 00:42:51,740
And the reason is
because-- one reason

556
00:42:51,740 --> 00:42:56,110
is because the moment-generating
function might not exist.

557
00:42:56,110 --> 00:42:57,930
And if you look
into Wikipedia, you'll

558
00:42:57,930 --> 00:43:00,850
see an example of
when it happens,

559
00:43:00,850 --> 00:43:03,345
of two random variables
where this happens.

560
00:43:10,310 --> 00:43:13,380
So that's one thing
we will use later.

561
00:43:13,380 --> 00:43:17,660
Another thing that
we will use later,

562
00:43:17,660 --> 00:43:20,950
it's a statement
very similar to that,

563
00:43:20,950 --> 00:43:25,820
but it says something about a
sequence of random variables.

564
00:43:25,820 --> 00:43:39,406
So if X_1, X_2, up to X_n is
a sequence of random variables

565
00:43:39,406 --> 00:43:48,470
such that the moment-generating
function exists,

566
00:43:48,470 --> 00:43:52,580
and, as n goes to infinity,
it converges--

567
00:43:57,542 --> 00:44:03,250
tends to the
moment-generating function

568
00:44:03,250 --> 00:44:05,380
of some random variable

569
00:44:05,380 --> 00:44:13,091
X, for all t.

570
00:44:16,250 --> 00:44:18,970
Here, we're assuming that all
moment-generating functions

571
00:44:18,970 --> 00:44:20,280
exist.

572
00:44:20,280 --> 00:44:22,050
So again, the
situation is, you have

573
00:44:22,050 --> 00:44:24,900
a sequence of random variables.

574
00:44:24,900 --> 00:44:27,600
Their moment-generating
function exists.

575
00:44:27,600 --> 00:44:31,790
And in each point
t, it converges

576
00:44:31,790 --> 00:44:33,967
to the value of the
moment-generating function

577
00:44:33,967 --> 00:44:35,300
of some other random variable x.

578
00:44:38,270 --> 00:44:41,310
And what should happen?

579
00:44:41,310 --> 00:44:43,880
In light of this theorem,
it should be the case

580
00:44:43,880 --> 00:44:47,490
that the distribution
of this sequence

581
00:44:47,490 --> 00:44:49,240
gets closer and closer
to the distribution

582
00:44:49,240 --> 00:44:53,360
of this random variable x.

583
00:44:53,360 --> 00:45:00,220
And to make it formal, to make
that information formal, what

584
00:45:00,220 --> 00:45:09,760
we can conclude is, for
all x, the probability

585
00:45:09,760 --> 00:45:15,440
X_n is less than or equal to
x tends to the probability

586
00:45:15,440 --> 00:45:17,300
that X is less than or equal to x.

587
00:45:20,090 --> 00:45:22,990
So in this sense,
the distributions

588
00:45:22,990 --> 00:45:25,940
of these random variables
converge to the distribution

589
00:45:25,940 --> 00:45:27,216
of that random variable.

590
00:45:30,090 --> 00:45:32,330
So it's just a technical issue.

591
00:45:32,330 --> 00:45:38,890
You can just think of it as
these random variables converge

592
00:45:38,890 --> 00:45:41,200
to that random variable.

593
00:45:41,200 --> 00:45:43,230
If you take some graduate
probability course,

594
00:45:43,230 --> 00:45:47,100
you'll see that there's
several possible ways

595
00:45:47,100 --> 00:45:48,730
to define convergence.

596
00:45:48,730 --> 00:45:50,740
But that's just
some technicality.

597
00:45:50,740 --> 00:45:53,397
And the spirit
here is just really

598
00:45:53,397 --> 00:45:55,730
the sequence converges if its
moment-generating function

599
00:45:55,730 --> 00:45:56,229
converges.

600
00:45:59,790 --> 00:46:02,470
So as you can see from
these two theorems,

601
00:46:02,470 --> 00:46:04,440
moment-generating
function, if it exists,

602
00:46:04,440 --> 00:46:08,270
is a really powerful
tool that allows you

603
00:46:08,270 --> 00:46:09,480
to control the distribution.

604
00:46:13,060 --> 00:46:16,407
You'll see some applications
later in central limit theorem.

605
00:46:16,407 --> 00:46:16,990
Any questions?

606
00:46:21,530 --> 00:46:22,446
AUDIENCE: [INAUDIBLE]?

607
00:46:28,557 --> 00:46:29,390
PROFESSOR: This one?

608
00:46:32,870 --> 00:46:34,154
Why?

609
00:46:34,154 --> 00:46:35,612
AUDIENCE: Because
it starts with t,

610
00:46:35,612 --> 00:46:38,162
and the right-hand side
has no t in it.

611
00:46:40,777 --> 00:46:41,360
PROFESSOR: Ah.

612
00:46:44,318 --> 00:46:47,180
Thank you.

613
00:46:47,180 --> 00:46:48,350
We evaluate it at t equals zero.

614
00:46:53,230 --> 00:46:54,694
Other questions?

615
00:46:54,694 --> 00:46:56,646
Other corrections?

616
00:46:56,646 --> 00:46:59,086
AUDIENCE: When you say the
moment-generating function

617
00:46:59,086 --> 00:47:01,526
doesn't exist, do you mean
that it isn't analytic

618
00:47:01,526 --> 00:47:03,010
or it doesn't converge?

619
00:47:03,010 --> 00:47:04,580
PROFESSOR: It
might not converge.

620
00:47:04,580 --> 00:47:08,130
So log-normal distribution,
it does not converge.

621
00:47:08,130 --> 00:47:10,412
So for all non-zero
t, it does not

622
00:47:10,412 --> 00:47:12,109
converge, for
log-normal distribution.

623
00:47:12,109 --> 00:47:13,025
AUDIENCE: [INAUDIBLE]?

624
00:47:16,350 --> 00:47:17,140
PROFESSOR: Here?

625
00:47:17,140 --> 00:47:17,640
Yes.

626
00:47:17,640 --> 00:47:19,822
Pointwise convergence of the moment-generating
functions implies pointwise convergence of the distributions.

627
00:47:22,420 --> 00:47:22,945
No, no.

628
00:47:26,760 --> 00:47:30,474
Because it's pointwise, this
conclusion is also rather weak.

629
00:47:30,474 --> 00:47:32,640
It's almost the weakest
convergence in distribution.

630
00:48:01,024 --> 00:48:01,524
OK.

631
00:48:01,524 --> 00:48:12,480
The law of large numbers.

632
00:49:04,100 --> 00:49:06,940
So now we're talking about
large-scale behavior.

633
00:49:06,940 --> 00:49:09,630
Let X_1 up to X_n be
independent random variables

634
00:49:09,630 --> 00:49:11,334
with identical distribution.

635
00:49:11,334 --> 00:49:13,250
We don't really know
what the distribution is,

636
00:49:13,250 --> 00:49:15,270
but we know that
they're all the same.

637
00:49:15,270 --> 00:49:18,620
In short, I'll just refer
to this condition as i.i.d.

638
00:49:18,620 --> 00:49:21,990
random variables later.

639
00:49:21,990 --> 00:49:25,048
Independent, identically
distributed random variables.

640
00:49:29,040 --> 00:49:36,530
And let mean be mu,
variance be sigma square.

641
00:49:44,470 --> 00:49:50,740
Let's also define X as the
average of n random variables.

642
00:49:54,590 --> 00:50:22,986
Then for all positive epsilon, the
probability that X deviates from mu

643
00:50:22,986 --> 00:50:23,486
by more than epsilon tends to 0 as n goes to infinity.
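A tiny simulation of the statement (a sketch; the coin-flip distribution and sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
mu = 0.5  # true mean of a fair coin flip

for n in [10, 1_000, 100_000]:
    x_bar = rng.integers(0, 2, size=n).mean()  # average of n i.i.d. flips
    print(n, x_bar, abs(x_bar - mu))  # the deviation typically shrinks with n
```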

644
00:50:31,590 --> 00:50:35,100
So whenever you have independent,
identically distributed random variables, when

645
00:50:35,100 --> 00:50:39,050
you take their average, if
you take a large enough number

646
00:50:39,050 --> 00:50:43,430
of samples, they will be
very close to the mean, which

647
00:50:43,430 --> 00:50:44,144
makes sense.

648
00:51:04,420 --> 00:51:06,270
So what's an example of this?

649
00:51:06,270 --> 00:51:14,010
Before proving it, an example
of this theorem in practice

650
00:51:14,010 --> 00:51:16,605
can be seen in the casino.

651
00:51:22,530 --> 00:51:25,120
So for example, if
you're playing blackjack

652
00:51:25,120 --> 00:51:38,890
in a casino, when you're
playing against the casino,

653
00:51:38,890 --> 00:51:42,700
you have a very
small disadvantage.

654
00:51:42,700 --> 00:51:52,500
If you're playing at
the optimal strategy,

655
00:51:52,500 --> 00:51:56,380
you have-- does anybody
know the probability?

656
00:51:56,380 --> 00:52:00,460
It's about 48%, 49%.

657
00:52:00,460 --> 00:52:04,520
About 48% chance of winning.

658
00:52:09,160 --> 00:52:14,340
That means if you bet $1 at
the beginning of each round,

659
00:52:14,340 --> 00:52:22,605
the expected amount
you'll win is $0.48.

660
00:52:22,605 --> 00:52:28,060
The expected amount that the
casino will win is $0.52.
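A sketch of the casino's side of this (assuming, for illustration, an even-money $1 bet won by the player with probability 0.48):

```python
import numpy as np

rng = np.random.default_rng(3)
p_win = 0.48       # player's chance of winning a round
n = 1_000_000      # rounds seen by the casino

# Player's profit per round: +1 on a win, -1 on a loss.
player_pnl = np.where(rng.random(n) < p_win, 1.0, -1.0)

print(player_pnl[:20].sum())  # few rounds: the variance dominates, any sign
print(player_pnl.mean())      # many rounds: close to 2*p_win - 1 = -0.04
```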

661
00:52:28,060 --> 00:52:30,760
But it's designed so
that the variance is

662
00:52:30,760 --> 00:52:37,030
so big that this expectation
is hidden, the mean is hidden.

663
00:52:37,030 --> 00:52:39,390
From the player's
point of view, you only

664
00:52:39,390 --> 00:52:41,390
have a very small sample.

665
00:52:41,390 --> 00:52:44,960
So it looks like the
mean doesn't matter,

666
00:52:44,960 --> 00:52:48,710
because the variance takes
over in a very short scale.

667
00:52:48,710 --> 00:52:50,730
But from the casino's
point of view,

668
00:52:50,730 --> 00:52:54,680
they're taking a
very large n there.

669
00:52:54,680 --> 00:53:02,720
So for each round, let's
say from the casino's

670
00:53:02,720 --> 00:53:13,500
point of view, it's
like they

671
00:53:13,500 --> 00:53:20,520
are taking an enormous value of n.

672
00:53:26,640 --> 00:53:27,660
n here.

673
00:53:27,660 --> 00:53:32,380
And that means as long as they
have the slightest advantage,

674
00:53:32,380 --> 00:53:34,993
they'll be winning money,
and a huge amount of money.

675
00:53:38,240 --> 00:53:41,690
And most games played in the
casinos are designed like this.

676
00:53:41,690 --> 00:53:45,730
It looks like the mean
is really close to 50%,

677
00:53:45,730 --> 00:53:47,840
but it's hidden,
because they designed it

678
00:53:47,840 --> 00:53:51,000
so the variance is big.

679
00:53:51,000 --> 00:53:53,180
But from the casino's
point of view,

680
00:53:53,180 --> 00:53:55,010
they have enough
players to play the game

681
00:53:55,010 --> 00:54:02,120
so that the law of large
numbers just makes them money.

682
00:54:07,770 --> 00:54:09,530
The moral is, don't
play blackjack.

683
00:54:12,240 --> 00:54:15,360
Play poker.

684
00:54:15,360 --> 00:54:19,790
The reason that the law
of large numbers

685
00:54:19,790 --> 00:54:23,010
doesn't apply, at least
in this sense, to poker--

686
00:54:23,010 --> 00:54:24,220
can anybody explain why?

687
00:54:27,100 --> 00:54:32,000
It's because poker, you're
playing against other players.

688
00:54:32,000 --> 00:54:36,500
If you have an advantage, if
your skill-- if you believe

689
00:54:36,500 --> 00:54:38,980
that there is skill in poker--
if your skill is better

690
00:54:38,980 --> 00:54:41,330
than the other
player by, let's say,

691
00:54:41,330 --> 00:54:47,010
5% chance, then you have
an edge over that player.

692
00:54:47,010 --> 00:54:48,010
So you can win money.

693
00:54:48,010 --> 00:54:53,870
The only problem is that
because-- poker, you're

694
00:54:53,870 --> 00:54:55,691
not playing against the casino.

695
00:55:00,390 --> 00:55:04,770
Don't play against casino.

696
00:55:04,770 --> 00:55:06,530
But they still
have to make money.

697
00:55:06,530 --> 00:55:08,770
So what they do instead
is they take rake.

698
00:55:08,770 --> 00:55:12,350
So for each round
that the players play,

699
00:55:12,350 --> 00:55:15,740
they pay some fee to the casino.

700
00:55:15,740 --> 00:55:19,920
And how the casino makes
money at the poker table

701
00:55:19,920 --> 00:55:22,870
is by accumulating those fees.

702
00:55:22,870 --> 00:55:25,291
They're not taking
chances there.

703
00:55:25,291 --> 00:55:26,790
But from the player's
point of view,

704
00:55:26,790 --> 00:55:32,405
if you're better than the other
player, and the amount of edge

705
00:55:32,405 --> 00:55:35,630
you have over the other
player is larger than the fee

706
00:55:35,630 --> 00:55:38,000
that the casino
charges to you, then

707
00:55:38,000 --> 00:55:41,380
now you can apply law of large
numbers to yourself and win.

708
00:55:45,420 --> 00:55:50,360
And if you take poker
as an example,

709
00:55:50,360 --> 00:55:54,372
it looks like-- OK, I'm
not going to play poker.

710
00:55:54,372 --> 00:55:59,320
But if it's a hedge
fund, or if you're

711
00:55:59,320 --> 00:56:04,850
doing high-frequency trading,
that's the moral behind it.

712
00:56:04,850 --> 00:56:07,860
So that's the belief
you should have.

713
00:56:07,860 --> 00:56:10,760
You have to believe
that you have an edge.

714
00:56:10,760 --> 00:56:13,660
Even if you have a
tiny edge, if you

715
00:56:13,660 --> 00:56:16,400
can have a large enough
number of trials,

716
00:56:16,400 --> 00:56:21,000
if you can trade enough times
using some strategy that you

717
00:56:21,000 --> 00:56:26,580
believe is winning over time,
then law of large numbers

718
00:56:26,580 --> 00:56:31,266
will take it from there and
will bring you money, profit.

719
00:56:34,920 --> 00:56:41,770
Of course, the problem is,
when the variance is big,

720
00:56:41,770 --> 00:56:45,210
your belief starts to fall.

721
00:56:45,210 --> 00:56:48,660
At least, that was the case for
me when I was playing poker.

722
00:56:48,660 --> 00:56:51,650
Because I believed
that I had an edge,

723
00:56:51,650 --> 00:56:55,520
but when there is
a really big swing, it

724
00:56:55,520 --> 00:56:59,680
looks like your
expectation is negative.

725
00:56:59,680 --> 00:57:01,885
And that's when you have
to believe in yourself.

726
00:57:05,590 --> 00:57:07,690
Yeah.

727
00:57:07,690 --> 00:57:09,480
That's when your
faith in mathematics

728
00:57:09,480 --> 00:57:11,929
is being challenged.

729
00:57:11,929 --> 00:57:12,720
It really happened.

730
00:57:15,290 --> 00:57:17,290
I hope it doesn't happen to you.

731
00:57:17,290 --> 00:57:22,730
Anyway, let's prove the
law of large numbers.

732
00:57:22,730 --> 00:57:23,690
How do you prove it?

733
00:57:23,690 --> 00:57:24,690
The proof is quite easy.

734
00:57:27,840 --> 00:57:32,940
First of all, one observation--
expectation of X is just

735
00:57:32,940 --> 00:57:37,640
expectation of 1 over
n times sum of X_i's.

736
00:57:41,400 --> 00:57:52,471
And that, by linearity,
just becomes 1 over n times

737
00:57:52,471 --> 00:57:55,883
the sum of the expectations-- and that's mu.

738
00:57:55,883 --> 00:57:56,383
OK.

739
00:57:56,383 --> 00:57:59,317
That's good.

740
00:57:59,317 --> 00:58:01,610
And then the variance,
what's the variance of X?

741
00:58:04,430 --> 00:58:09,750
That's the expectation
of (X minus mu)

742
00:58:09,750 --> 00:58:20,976
squared, which is the expectation of
(1 over n sum over all i of X_i, minus mu)

743
00:58:20,976 --> 00:58:21,476
squared.

744
00:58:24,344 --> 00:58:26,260
I'll group them.

745
00:58:26,260 --> 00:58:33,584
That's the expectation of (1 over
n sum of (X_i minus mu)) squared.

746
00:58:33,584 --> 00:58:35,580
i is from 1 to n.

747
00:58:43,570 --> 00:58:44,800
What did I do wrong?

748
00:58:44,800 --> 00:58:46,610
1 over n is inside the square.

749
00:58:46,610 --> 00:58:50,720
So I can take it out
as 1 over n squared.

750
00:58:50,720 --> 00:58:53,660
And then, since the X_i are independent,
you're summing n terms of sigma square.

751
00:58:53,660 --> 00:58:57,145
So that is equal to
sigma square over n.

752
00:59:02,450 --> 00:59:04,110
That means the
effect of averaging

753
00:59:04,110 --> 00:59:08,600
n terms does not
change your mean,

754
00:59:08,600 --> 00:59:10,020
but it does change your variance.

755
00:59:13,510 --> 00:59:15,802
It divides your variance by n.

756
00:59:15,802 --> 00:59:18,890
If you take larger and
larger n, your variance

757
00:59:18,890 --> 00:59:20,080
gets smaller and smaller.

758
00:59:22,590 --> 00:59:25,970
And using that, we can
prove this statement.

759
00:59:25,970 --> 00:59:27,840
There's only one thing
you have to notice--

760
00:59:27,840 --> 00:59:30,510
that the probability
that |X minus mu|

761
00:59:30,510 --> 00:59:32,620
is greater than epsilon,

762
00:59:32,620 --> 00:59:35,840
when you multiply it
by epsilon squared,

763
00:59:35,840 --> 00:59:41,230
will be less than or
equal to the variance of X.

764
00:59:41,230 --> 00:59:42,780
The reason this
inequality holds is

765
00:59:42,780 --> 00:59:46,290
because the variance of X is defined
as the expectation of (X minus mu)

766
00:59:46,290 --> 00:59:48,200
squared.

767
00:59:48,200 --> 00:59:52,340
For all the events where you have
|X minus mu| at least epsilon,

768
00:59:52,340 --> 00:59:54,260
your multiplying factor
(X minus mu) squared will

769
00:59:54,260 --> 00:59:56,780
be at least epsilon squared.

770
00:59:56,780 --> 01:00:00,350
This term will be at
least epsilon square

771
01:00:00,350 --> 01:00:03,520
when you fall into this event.

772
01:00:03,520 --> 01:00:07,100
So your variance has
to be at least that.

773
01:00:07,100 --> 01:00:11,971
And this is known to
be sigma square over n.

774
01:00:11,971 --> 01:00:15,704
So the probability that
|X minus mu| is greater

775
01:00:15,704 --> 01:00:21,980
than epsilon is at most sigma
square over n epsilon squared.

776
01:00:21,980 --> 01:00:26,140
That means if you take n to go
to infinity, that goes to zero.

777
01:00:26,140 --> 01:00:29,590
So the probability that
you deviate from the mean

778
01:00:29,590 --> 01:00:33,187
by more than epsilon goes to 0.
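
In symbols, the argument just given is Chebyshev's inequality applied to the average X-bar:

\[
\varepsilon^2\, P\bigl(|\bar X - \mu| \ge \varepsilon\bigr)
\le \mathbb{E}\bigl[(\bar X - \mu)^2\bigr]
= \operatorname{Var}(\bar X) = \frac{\sigma^2}{n},
\quad\text{so}\quad
P\bigl(|\bar X - \mu| \ge \varepsilon\bigr) \le \frac{\sigma^2}{n\varepsilon^2} \xrightarrow{\,n \to \infty\,} 0.
\]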

779
01:00:33,187 --> 01:00:35,645
You can actually read out a
little bit more from the proof.

780
01:00:38,690 --> 01:00:41,635
It also tells a little bit
about the speed of convergence.

781
01:00:44,260 --> 01:00:50,230
So let's say you have a random
variable X. Your mean is 50.

782
01:00:50,230 --> 01:00:53,930
Your epsilon is 0.1.

783
01:00:53,930 --> 01:00:55,830
So you want to know
the probability

784
01:00:55,830 --> 01:01:00,480
that you deviate from your
mean by more than 0.1.

785
01:01:00,480 --> 01:01:06,010
Let's say you want
to be 99% sure.

786
01:01:06,010 --> 01:01:14,812
You want to be 99% sure that X
minus mu is less than 0.1,

787
01:01:14,812 --> 01:01:18,120
or X minus 50 is less than 0.1.

788
01:01:18,120 --> 01:01:23,060
In that case, what you can do
is-- you want this to be 0.01.

789
01:01:23,060 --> 01:01:26,360
It has to be 0.01.

790
01:01:26,360 --> 01:01:29,800
So plug in that, plug in your
variance, plug in your epsilon.

791
01:01:29,800 --> 01:01:32,230
That will give you
some bound on n.

792
01:01:32,230 --> 01:01:34,190
If you have more than
that number of trials,

793
01:01:34,190 --> 01:01:38,113
you can be 99% sure that you
don't deviate from your mean

794
01:01:38,113 --> 01:01:40,680
by more than epsilon.
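
A minimal Python sketch of plugging into the Chebyshev bound; the variance sigma square = 100 here is a hypothetical value (the lecture leaves it unspecified), while epsilon and the 99% confidence follow the example:

    # Chebyshev: P(|Xbar - mu| >= eps) <= sigma^2 / (n * eps^2).
    # We want this failure probability to be at most delta = 0.01.
    sigma_sq = 100.0   # hypothetical variance, not given in the lecture
    eps = 0.1          # allowed deviation from the mean
    delta = 0.01       # 1 - 0.99

    # Solve sigma^2 / (n * eps^2) <= delta for n.
    n = sigma_sq / (eps ** 2 * delta)
    print(f"need n >= {n:,.0f} trials")   # 1,000,000 -- "close to millions"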

795
01:01:40,680 --> 01:01:42,700
So that does give
some estimate, but I

796
01:01:42,700 --> 01:01:46,150
should mention that this
is a very bad estimate.

797
01:01:46,150 --> 01:01:47,990
There are much more
powerful estimates

798
01:01:47,990 --> 01:01:48,970
that can be done here.

799
01:01:48,970 --> 01:01:50,770
That will give the order of
magnitude-- I didn't really

800
01:01:50,770 --> 01:01:53,440
calculate here, but it looks
like it's close to millions.

801
01:01:53,440 --> 01:01:55,900
It has to be close to millions.

802
01:01:55,900 --> 01:02:00,360
But in practice, if you use
a much more powerful tool

803
01:02:00,360 --> 01:02:05,008
for estimating it, it should
only be hundreds or at most

804
01:02:05,008 --> 01:02:05,508
thousands.

805
01:02:13,460 --> 01:02:15,960
So the tool you'll use there
is moment-generating functions,

806
01:02:15,960 --> 01:02:18,360
something similar to
moment-generating functions.

807
01:02:18,360 --> 01:02:20,412
But I will not go into it.

808
01:02:20,412 --> 01:02:20,995
Any questions?

809
01:02:23,610 --> 01:02:25,090
OK.

810
01:02:25,090 --> 01:02:28,552
For those who already saw
the law of large numbers before,

811
01:02:28,552 --> 01:02:30,510
the name suggests there's
also something called

812
01:02:30,510 --> 01:02:32,250
strong law of large numbers.

813
01:02:35,982 --> 01:02:41,380
In that theorem, your
conclusion is stronger.

814
01:02:41,380 --> 01:02:45,005
So the convergence is stronger
than this type of convergence.

815
01:02:47,810 --> 01:02:51,610
And also, the
condition I gave here

816
01:02:51,610 --> 01:02:53,580
is a very strong condition.

817
01:02:53,580 --> 01:02:56,020
The same conclusion
is true even if you

818
01:02:56,020 --> 01:02:58,840
weaken some of the conditions.

819
01:02:58,840 --> 01:03:01,580
So for example, the variance
does not have to exist.

820
01:03:01,580 --> 01:03:06,480
It can be replaced by some
other condition, and so on.

821
01:03:06,480 --> 01:03:08,860
But here, I just want
it to be a simple form

822
01:03:08,860 --> 01:03:11,350
so that it's easy to prove.

823
01:03:11,350 --> 01:03:14,274
And you at least get the
spirit of what's happening.

824
01:03:20,480 --> 01:03:26,140
Now let's move on to the next
topic-- central limit theorem.

825
01:04:11,240 --> 01:04:16,880
So weak law of
large numbers says

826
01:04:16,880 --> 01:04:22,210
that if you have IID random
variables, 1 over n times

827
01:04:22,210 --> 01:04:27,400
sum over X_i's converges to mu,
the mean, in some weak sense.

828
01:04:31,210 --> 01:04:33,730
And the reason it happened
was because this had

829
01:04:33,730 --> 01:04:39,157
mean mu and variance
sigma square over n.

830
01:04:43,660 --> 01:04:49,730
We've exploited the fact that
the variance vanishes to get this.

831
01:04:49,730 --> 01:04:53,560
So the question is, what
happens if you replace 1 over n

832
01:04:53,560 --> 01:04:54,903
by 1 over square root n?

833
01:04:59,250 --> 01:05:04,590
What happens if-- for
the random variable

834
01:05:04,590 --> 01:05:08,300
1 over square root n times sum of X_i's?

835
01:05:14,180 --> 01:05:16,990
The reason I'm making this
choice of 1 over square root n

836
01:05:16,990 --> 01:05:19,310
is because if you
make this choice,

837
01:05:19,310 --> 01:05:26,330
now the average has mean mu
and variance sigma square just

838
01:05:26,330 --> 01:05:28,770
as in X_i's.

839
01:05:28,770 --> 01:05:34,981
So this is the same as X_i.

840
01:05:40,910 --> 01:05:44,330
Then what should it look like?

841
01:05:44,330 --> 01:05:46,730
If the random variable has the
same mean and same variance

842
01:05:46,730 --> 01:05:52,120
as your original random
variable, the distribution

843
01:05:52,120 --> 01:05:54,795
of this, should it look like
the distribution of X_i?

844
01:06:00,530 --> 01:06:01,290
If mean is mu.

845
01:06:01,290 --> 01:06:04,170
Thank you very much.

846
01:06:04,170 --> 01:06:05,535
The case when mean is 0.

847
01:06:13,160 --> 01:06:13,660
OK.

848
01:06:13,660 --> 01:06:17,620
For this special case,
will it look like X_i,

849
01:06:17,620 --> 01:06:20,820
or will it not look like X_i?

850
01:06:20,820 --> 01:06:24,260
If it doesn't look like X_i,
can we say anything interesting

851
01:06:24,260 --> 01:06:27,590
about the distribution of this?

852
01:06:27,590 --> 01:06:31,480
And central limit theorem
answers this question.

853
01:06:31,480 --> 01:06:34,980
When I first saw it, I thought
it was really interesting.

854
01:06:34,980 --> 01:06:37,161
Because normal
distribution comes up here.

855
01:06:40,250 --> 01:06:42,050
And that's probably
one of the reasons

856
01:06:42,050 --> 01:06:45,010
that normal distribution
is so universal.

857
01:06:45,010 --> 01:06:50,310
Because when you take
many independent events

858
01:06:50,310 --> 01:06:53,270
and take the average
in this sense,

859
01:06:53,270 --> 01:06:56,765
their distribution converges
to a normal distribution.

860
01:06:56,765 --> 01:06:57,265
Yes?

861
01:06:57,265 --> 01:06:59,660
AUDIENCE: How did you get
mean equals [INAUDIBLE]?

862
01:06:59,660 --> 01:07:00,970
PROFESSOR: I didn't get it.

863
01:07:00,970 --> 01:07:02,678
I assumed it if X-- yeah.

864
01:07:29,600 --> 01:07:41,480
So theorem: let
X_1, X_2, to X_n be

865
01:07:41,480 --> 01:07:51,960
IID random variables with mean,
this time, mu and variance

866
01:07:51,960 --> 01:07:55,020
sigma squared.

867
01:07:55,020 --> 01:07:59,308
And let X-- or Y_n.

868
01:08:01,940 --> 01:08:10,023
Y_n be square root n times
1 over n sum of X_i, minus mu.

869
01:08:24,813 --> 01:08:41,080
Then the distribution
of Y_n converges

870
01:08:41,080 --> 01:08:50,056
to that of normal distribution
with mean 0 and variance sigma square.

871
01:08:55,050 --> 01:08:57,350
What this means-- I'll
write it down again--

872
01:08:57,350 --> 01:09:01,790
it means for all x,
probability that Y_n

873
01:09:01,790 --> 01:09:03,790
is less than or
equal to x converges to

874
01:09:03,790 --> 01:09:07,722
the probability that the normal
distribution is less than

875
01:09:07,722 --> 01:09:08,910
or equal to x.
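
In symbols, the statement is:

\[
Y_n = \sqrt{n}\left(\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right),
\qquad
P(Y_n \le x) \longrightarrow P(Z \le x) \text{ for all } x,
\quad Z \sim N(0, \sigma^2).
\]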

876
01:09:14,140 --> 01:09:16,220
What's really
interesting here is,

877
01:09:16,220 --> 01:09:20,340
no matter what distribution
you had in the beginning,

878
01:09:20,340 --> 01:09:24,090
if we average it
out in this sense,

879
01:09:24,090 --> 01:09:25,965
then you converge to
the normal distribution.

880
01:09:35,429 --> 01:09:37,720
Any questions about this
statement, or any corrections?

881
01:09:40,490 --> 01:09:43,545
Any mistakes that I made?

882
01:09:43,545 --> 01:09:46,015
OK.

883
01:09:46,015 --> 01:09:47,003
Here's the proof.

884
01:09:50,970 --> 01:09:54,400
I will prove it when the
moment-generating function

885
01:09:54,400 --> 01:09:54,900
exists.

886
01:09:54,900 --> 01:09:56,816
So assume that the
moment-generating function

887
01:09:56,816 --> 01:09:58,010
exists.

888
01:09:58,010 --> 01:10:04,963
So proof assuming
M of X_i exists.

889
01:10:16,810 --> 01:10:19,860
So remember that theorem.

890
01:10:19,860 --> 01:10:22,160
Try to recall that
theorem where if you

891
01:10:22,160 --> 01:10:25,130
know that the moment-generating
function of Y_n's converges

892
01:10:25,130 --> 01:10:29,250
to the moment-generating
function of the normal, then

893
01:10:29,250 --> 01:10:30,210
we have the statement.

894
01:10:30,210 --> 01:10:31,400
The distribution converges.

895
01:10:31,400 --> 01:10:34,328
So that's the statement
we're going to use.

896
01:10:34,328 --> 01:10:37,100
That means our goal is to prove
that the moment-generating

897
01:10:37,100 --> 01:10:43,020
function of these Y_n's converge
to the moment-generating

898
01:10:43,020 --> 01:10:51,088
function of the normal for
all t, pointwise convergence.

899
01:10:56,360 --> 01:11:00,080
And this part is well known.

900
01:11:00,080 --> 01:11:01,455
I'll just write it down.

901
01:11:01,455 --> 01:11:06,094
It's known to be e to the t
square sigma square over 2.

902
01:11:08,818 --> 01:11:11,173
That just can be computed.
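
For completeness, the standard computation behind this, completing the square in the exponent:

\[
M_Z(t) = \int_{-\infty}^{\infty} e^{tz}\,\frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-z^2/(2\sigma^2)}\,dz
= e^{\sigma^2 t^2/2} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(z-\sigma^2 t)^2/(2\sigma^2)}\,dz
= e^{\sigma^2 t^2/2},
\]

since the remaining integrand is the density of a normal with mean sigma square t and variance sigma square, which integrates to 1.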

903
01:11:18,610 --> 01:11:21,270
So we want to somehow show that
the moment-generating function

904
01:11:21,270 --> 01:11:25,738
of this Y_n converges to that.

905
01:11:25,738 --> 01:11:29,440
The moment-generating
function of Y_n

906
01:11:29,440 --> 01:11:36,102
is equal to expectation
of e to the t Y_n.

907
01:11:42,544 --> 01:11:50,496
e to the t, 1 over square
root n, sum of X_i minus mu.

908
01:11:54,490 --> 01:11:57,680
And then because each of
the X_i's are independent,

909
01:11:57,680 --> 01:11:59,403
this sum will split
into products.

910
01:12:02,650 --> 01:12:14,059
Product of-- let
me split it better.

911
01:12:14,059 --> 01:12:19,240
It's the expectation-- we
didn't use independence yet.

912
01:12:19,240 --> 01:12:26,504
The sum becomes a product of e to
the t, 1 over square root n, X_i

913
01:12:26,504 --> 01:12:27,462
minus mu.

914
01:12:34,650 --> 01:12:36,380
And then because
they're independent,

915
01:12:36,380 --> 01:12:37,530
this product can go out.

916
01:12:40,925 --> 01:12:49,996
Equal to the product from 1 to
n expectation e to the t over

917
01:12:49,996 --> 01:12:50,984
square root n--

918
01:12:56,160 --> 01:12:56,660
OK.

919
01:12:56,660 --> 01:12:58,159
Now they're identically
distributed,

920
01:12:58,159 --> 01:13:00,900
so you just have to take
the n-th power of that.

921
01:13:00,900 --> 01:13:03,923
That's equal to the
expectation of e

922
01:13:03,923 --> 01:13:11,920
to the t over square root n,
X_i minus mu, to the n-th power.

923
01:13:11,920 --> 01:13:15,420
Now we'll do some estimation.

924
01:13:15,420 --> 01:13:19,450
So use the Taylor
expansion of this.

925
01:13:19,450 --> 01:13:30,002
What we get is expectation of 1
plus that, t over square root n

926
01:13:30,002 --> 01:13:36,990
X_i minus mu, plus 1 over
2 factorial, that squared,

927
01:13:36,990 --> 01:13:43,760
t over square root n,
X_i minus mu squared,

928
01:13:43,760 --> 01:13:48,748
plus 1 over 3 factorial,
that cubed plus so on.

929
01:13:55,050 --> 01:13:57,990
Then that's equal to 1--
Ah, to the n-th power.

930
01:14:02,920 --> 01:14:06,890
By linearity of
expectation, 1 comes out.

931
01:14:06,890 --> 01:14:12,830
Second term is 0,
because the X_i have mean mu.

932
01:14:12,830 --> 01:14:15,020
So that disappears.

933
01:14:15,020 --> 01:14:26,930
This term-- we have 1 over 2,
t squared over n, X_i minus mu

934
01:14:26,930 --> 01:14:29,370
square.

935
01:14:29,370 --> 01:14:31,590
X_i minus mu square, when
you take expectation,

936
01:14:31,590 --> 01:14:35,550
that will be sigma square.

937
01:14:35,550 --> 01:14:39,720
And then the terms after
that, because we're

938
01:14:39,720 --> 01:14:42,850
only interested in
proving that for fixed t,

939
01:14:42,850 --> 01:14:46,160
this converges-- so we're only
proving pointwise convergence.

940
01:14:46,160 --> 01:14:49,030
You may consider t
as a fixed number.

941
01:14:49,030 --> 01:14:52,540
So as n goes to infinity--
if n is really, really large,

942
01:14:52,540 --> 01:14:56,730
all these terms will be
smaller order of magnitude

943
01:14:56,730 --> 01:15:00,830
than 1 over n.

944
01:15:00,830 --> 01:15:02,270
Something like that happens.

945
01:15:08,530 --> 01:15:11,250
And that's happening
because t is fixed.

946
01:15:11,250 --> 01:15:14,260
For fixed t, we
have to prove it.

947
01:15:14,260 --> 01:15:16,292
So if we're saying
something uniformly about t,

948
01:15:16,292 --> 01:15:18,390
that's no longer true.

949
01:15:18,390 --> 01:15:21,060
Now we go back to
the exponential form.

950
01:15:21,060 --> 01:15:26,540
So this is pretty much
just e to that term,

951
01:15:26,540 --> 01:15:30,900
1 over 2 t square
sigma square over n

952
01:15:30,900 --> 01:15:37,370
plus little o of 1 over
n to the n-th power.

953
01:15:37,370 --> 01:15:42,980
Now, that n can be
multiplied to cancel out.

954
01:15:42,980 --> 01:15:46,640
And we see that it's e to t
square sigma square over 2

955
01:15:46,640 --> 01:15:48,342
plus the little o of 1.

956
01:15:48,342 --> 01:15:50,370
So if you take n
to go to infinity,

957
01:15:50,370 --> 01:15:55,840
that term disappears,
and we prove

958
01:15:55,840 --> 01:15:57,410
that it converges to that.
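
Collecting the steps, for each fixed t:

\[
M_{Y_n}(t) = \left(\mathbb{E}\!\left[e^{\frac{t}{\sqrt{n}}(X_1-\mu)}\right]\right)^{n}
= \left(1 + \frac{t^2\sigma^2}{2n} + o\!\left(\tfrac{1}{n}\right)\right)^{n}
\longrightarrow e^{t^2\sigma^2/2} = M_{N(0,\sigma^2)}(t).
\]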

959
01:16:00,100 --> 01:16:04,516
And then by the theorem that I
stated before, if we have this,

960
01:16:04,516 --> 01:16:06,182
we know that the
distribution converges.

961
01:16:09,880 --> 01:16:10,500
Any questions?

962
01:16:13,760 --> 01:16:14,260
OK.

963
01:16:14,260 --> 01:16:15,515
I'll make one final remark.

964
01:16:29,009 --> 01:16:42,640
So suppose there is a random
variable X whose mean we do not

965
01:16:42,640 --> 01:16:44,865
know, whose mean is unknown.

966
01:16:53,670 --> 01:16:55,710
Our goal is to
estimate the mean.

967
01:16:58,970 --> 01:17:02,730
And one way to do that is by
taking many independent trials

968
01:17:02,730 --> 01:17:05,220
of this random variable.

969
01:17:05,220 --> 01:17:21,680
So take independent trials X_1,
X_2, to X_n, and use 1 over n times--

970
01:17:21,680 --> 01:17:22,250
X_1 plus...

971
01:17:22,250 --> 01:17:23,565
X_n as our estimator.
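
As a small Python sketch of this estimator; the normal(5, 2) source below is a hypothetical stand-in for the distribution with unknown mean:

    import numpy as np

    rng = np.random.default_rng(1)
    true_mu = 5.0                    # unknown to the estimator
    x = rng.normal(loc=true_mu, scale=2.0, size=10_000)

    estimate = x.mean()              # (X_1 + ... + X_n) / n
    print(estimate)                  # close to 5.0 for large n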

972
01:17:32,960 --> 01:17:34,990
Then the law of large
numbers says that this

973
01:17:34,990 --> 01:17:36,750
will be very close to the mean.

974
01:17:36,750 --> 01:17:39,840
So if you take n
to be large enough,

975
01:17:39,840 --> 01:17:42,100
you will more than likely
have some value which

976
01:17:42,100 --> 01:17:44,190
is very close to the mean.

977
01:17:44,190 --> 01:17:47,050
And then the central
limit theorem

978
01:17:47,050 --> 01:17:53,530
tells you how the
distribution of this variable

979
01:17:53,530 --> 01:17:55,915
is around the mean.

980
01:17:55,915 --> 01:17:57,920
So we don't know what
the real value is,

981
01:17:57,920 --> 01:18:00,620
but we know that
the distribution

982
01:18:00,620 --> 01:18:02,980
of the value that
we will obtain here

983
01:18:02,980 --> 01:18:05,048
is something like
that around the mean.

984
01:18:09,340 --> 01:18:17,080
And because the normal distribution
has very small tails,

985
01:18:17,080 --> 01:18:21,900
the tail probability
is really small,

986
01:18:21,900 --> 01:18:23,950
we will get really
close really fast.

987
01:18:27,290 --> 01:18:34,387
And this is known as the maximum
likelihood estimator, is it?

988
01:18:37,670 --> 01:18:38,310
OK, yeah.

989
01:18:38,310 --> 01:18:39,980
For some distributions,
it's better

990
01:18:39,980 --> 01:18:44,080
to take some other estimator.

991
01:18:44,080 --> 01:18:47,280
Which is quite interesting.

992
01:18:47,280 --> 01:18:50,015
At least my intuition is to
take this in every single case--

993
01:18:50,015 --> 01:18:52,890
looks like that will
be a good choice.

994
01:18:52,890 --> 01:18:54,680
But it turns out that
that's not the case;

995
01:18:54,680 --> 01:18:59,492
for some distributions there's
a better choice than this.

996
01:18:59,492 --> 01:19:03,210
And Peter will
later talk about it.

997
01:19:06,340 --> 01:19:09,960
If you're interested,
come back.

998
01:19:09,960 --> 01:19:13,960
And that's it for
today, any questions?

999
01:19:13,960 --> 01:19:17,875
So next Tuesday we will
have an outside speaker,

1000
01:19:17,875 --> 01:19:21,256
and it will be on bonds.

1001
01:19:21,256 --> 01:19:24,883
And I don't think anything from
linear algebra will be needed.