1
00:00:00,000 --> 00:00:07,000
So just trying to remind you that
the replication fork looks something

2
00:00:07,000 --> 00:00:15,000
like this where 5 prime to 3 prime
and 5 prime to 3 prime.

3
00:00:15,000 --> 00:00:22,000
This is what's known as the leading
strand because DNA,

4
00:00:22,000 --> 00:00:30,000
the synthesis of the new strand can
go --

5
00:00:30,000 --> 00:00:37,000
Which is going 5 prime to 3 prime is

6
00:00:37,000 --> 00:00:43,000
going in the same direction as the
movement of the replication fork.

7
00:00:43,000 --> 00:00:49,000
The other strand, which is known as
the lagging strand,

8
00:00:49,000 --> 00:00:55,000
the DNA synthesis is actually going
backwards relative to the movement of the

9
00:00:55,000 --> 00:01:01,000
replication fork,
which means it has to go and then

10
00:01:01,000 --> 00:01:06,000
start up here and go again.
And it's continually jumping.

11
00:01:06,000 --> 00:01:10,000
And I told you that the little RNA
primer is used to start each strand.

12
00:01:10,000 --> 00:01:15,000
And then the DNA polymerase is able
to elongate that.

13
00:01:15,000 --> 00:01:20,000
And then at the end these little
nicks in here,

14
00:01:20,000 --> 00:01:24,000
the RNA has to be removed,
the gap filled in, and then it's sealed

15
00:01:24,000 --> 00:01:29,000
up by the enzyme DNA ligase,
which we'll talk about when we talk

16
00:01:29,000 --> 00:01:34,000
about recombinant DNA.
Someone asked,

17
00:01:34,000 --> 00:01:38,000
since I had mentioned it, why this strategy of
using RNA was beneficial,

18
00:01:38,000 --> 00:01:42,000
and that has to do with the fact
that the fidelity,

19
00:01:42,000 --> 00:01:46,000
which is going to be the next thing
I'm going to focus on, of DNA

20
00:01:46,000 --> 00:01:50,000
replication is such that
you can get a much higher accuracy

21
00:01:50,000 --> 00:01:54,000
if you have the end of a primer
already there and then carry out the

22
00:01:54,000 --> 00:01:58,000
chemistry in there.
No enzyme has ever achieved the

23
00:01:58,000 --> 00:02:02,000
accuracy that you see in DNA
replication if it's

24
00:02:02,000 --> 00:02:06,000
starting a strand.
So RNA polymerase,

25
00:02:06,000 --> 00:02:10,000
which constantly starts strands to
make RNA copies,

26
00:02:10,000 --> 00:02:14,000
as we'll talk about,
is not as accurate as DNA

27
00:02:14,000 --> 00:02:18,000
replication.  And by putting a
little bit of RNA,

28
00:02:18,000 --> 00:02:22,000
because the cell has to start a new
strand.  Before it gets here there's

29
00:02:22,000 --> 00:02:26,000
no strand at all on this lagging
strand so it needs to make this

30
00:02:26,000 --> 00:02:30,000
little RNA primer.
It needs to make a little primer.

31
00:02:30,000 --> 00:02:34,000
And by making it out of RNA then it
can tell what doesn't belong there.

32
00:02:34,000 --> 00:02:39,000
It doesn't matter if it's not quite
as accurate as the rest of DNA

33
00:02:39,000 --> 00:02:43,000
replication because it's going to
take it out anyway and fill it in

34
00:02:43,000 --> 00:02:48,000
using the DNA polymerase.
And if you think about that maybe

35
00:02:48,000 --> 00:02:52,000
you can see one of the reasons that
the cell has chosen or nature has

36
00:02:52,000 --> 00:02:57,000
chosen through evolution to use
little RNAs to begin the strands.

37
00:02:57,000 --> 00:03:01,000
OK.  Well, in any case, the
fidelity of DNA replication is

38
00:03:01,000 --> 00:03:05,000
really pretty amazing.
Incidentally, just speaking of DNA,

39
00:03:05,000 --> 00:03:09,000
many of you wrote some very
thoughtful things about Vernon

40
00:03:09,000 --> 00:03:12,000
Ingram's visit.
I didn't give him a whole lot of

41
00:03:12,000 --> 00:03:15,000
warning and he had to go and change
his schedule and move meetings

42
00:03:15,000 --> 00:03:19,000
around in order to come to talk to
you.  And it was very nice of him.

43
00:03:19,000 --> 00:03:22,000
Many of you wrote some very
thoughtful things,

44
00:03:22,000 --> 00:03:25,000
which I'm going to pass on to him.
I want him to know that many of you

45
00:03:25,000 --> 00:03:29,000
appreciated his visit.
I also saw that a lot of you reacted to

46
00:03:29,000 --> 00:03:33,000
his advice about crowded labs.
That has been my experience,

47
00:03:33,000 --> 00:03:37,000
too.  And one thing about the
scientific process is it's not just

48
00:03:37,000 --> 00:03:41,000
one person.  You're in with a group
of people, just as Vernon described,

49
00:03:41,000 --> 00:03:45,000
and that group of people becomes the
creative engine that drives all the

50
00:03:45,000 --> 00:03:49,000
science within that lab.
And so you're not only picking your

51
00:03:49,000 --> 00:03:53,000
project, you're looking for a group
of people to work with.

52
00:03:53,000 --> 00:03:57,000
And, as Vernon said, if the lab is
really doing hot stuff they tend to

53
00:03:57,000 --> 00:04:01,000
attract a lot of people.
So a crowded lab can sometimes be a

54
00:04:01,000 --> 00:04:04,000
really good indicator.
No absolutes, and there's an

55
00:04:04,000 --> 00:04:08,000
exception to everything,
but that was a good piece of advice

56
00:04:08,000 --> 00:04:12,000
he gave you if you're looking for
UROPs sometimes.

57
00:04:12,000 --> 00:04:16,000
OK.  So, anyway,
DNA fidelity.  Remember I said we've

58
00:04:16,000 --> 00:04:20,000
gone from, our bodies have somewhere
from like 10 to 20 billion miles of

59
00:04:20,000 --> 00:04:24,000
DNA in them if we could take all the
human DNA and stretch it out?

60
00:04:24,000 --> 00:04:28,000
But that fidelity is done at an
error rate of about one mistake in

61
00:04:28,000 --> 00:04:32,000
every ten to the tenth
nucleotides replicated.

62
00:04:32,000 --> 00:04:37,000
Which I said if you were typing all
the time it would be like sort of

63
00:04:37,000 --> 00:04:43,000
making one mistake every 38 years.
So it's an astonishing degree of

64
00:04:43,000 --> 00:04:49,000
fidelity.  Something that's beyond
anything within our experience.

65
00:04:49,000 --> 00:04:55,000
And there are three principles that
go into this.  One is that the polymerase is really

66
00:04:55,000 --> 00:05:01,000
good at base pair recognition,
telling that an A is paired with a T

67
00:05:01,000 --> 00:05:06,000
or a G is paired with a C,
and discriminating against

68
00:05:06,000 --> 00:05:12,000
everything else.  Then there's a phenomenon
known as proofreading,

69
00:05:12,000 --> 00:05:18,000
and I'll tell you how that works.
And then there's a third system

70
00:05:18,000 --> 00:05:24,000
called mismatch repair.
And all three of these contribute

71
00:05:24,000 --> 00:05:30,000
to this very, very low frequency of
errors, one mistake for

72
00:05:30,000 --> 00:05:36,000
approximately every ten to the tenth
nucleotides replicated.

73
00:05:36,000 --> 00:05:40,000
So the first thing is I've pointed
out to you several times that if you

74
00:05:40,000 --> 00:05:45,000
draw the two hydrogen bonds between
an A and a T base pair,

75
00:05:45,000 --> 00:05:49,000
or the three hydrogen bonds
between a G and C

76
00:05:49,000 --> 00:05:54,000
base pair, the shapes of this
pair and that pair are virtually

77
00:05:54,000 --> 00:05:59,000
identical.  You can pick them up and
lay one right down on top of the other.

78
00:05:59,000 --> 00:06:02,000
Now, if you actually look at it
you'll see you could draw some base

79
00:06:02,000 --> 00:06:05,000
pairs between,
for example, a G and a T.

80
00:06:05,000 --> 00:06:09,000
In fact, you can draw two hydrogen
bonds, which is the same as between

81
00:06:09,000 --> 00:06:12,000
an A and a T.  But the one thing I
hope you can see,

82
00:06:12,000 --> 00:06:16,000
just from the shapes even without
being able to see the individual

83
00:06:16,000 --> 00:06:19,000
atoms, is that a GT base pair
doesn't have the same shape as the

84
00:06:19,000 --> 00:06:23,000
correct base pairs.
So when I showed you that little

85
00:06:23,000 --> 00:06:26,000
movie the other day where this is
the template nucleotide,

86
00:06:26,000 --> 00:06:30,000
this is the incoming nucleotide and
there's this alpha helix

87
00:06:30,000 --> 00:06:34,000
that's swinging up.
What's happening in there is that

88
00:06:34,000 --> 00:06:40,000
the enzyme is checking that
the incoming nucleotide is the

89
00:06:40,000 --> 00:06:45,000
correct shape to form the base
pair.  And you can sort of see it's

90
00:06:45,000 --> 00:06:50,000
flipping it right into a very narrow
little slot in the enzyme.

91
00:06:50,000 --> 00:06:56,000
So it's not only asking for sort of
hydrogen bonds, it's asking

92
00:06:56,000 --> 00:07:01,000
for the exact shape.
If you just did it on thermodynamic

93
00:07:01,000 --> 00:07:07,000
grounds you'd make about one mistake
in a hundred because that's about

94
00:07:07,000 --> 00:07:13,000
the discrimination between the
correct base pairs and some of these

95
00:07:13,000 --> 00:07:19,000
other ones.  This works so well
that you get more like one mistake in

96
00:07:19,000 --> 00:07:25,000
ten to the fourth or ten to the
fifth.  We're still quite a distance

97
00:07:25,000 --> 00:07:31,000
away from the ten to the tenth,
but this is one of the things.  It's

98
00:07:31,000 --> 00:07:37,000
looking for the correct shape
of the base pair.

99
00:07:37,000 --> 00:07:41,000
Now, the second thing that helps
with fidelity is a phenomenon known

100
00:07:41,000 --> 00:07:54,000
as proofreading --

101
00:07:54,000 --> 00:08:00,000
-- exonuclease.  It's
a kind of nuclease.

102
00:08:00,000 --> 00:08:12,000
That means it can degrade DNA.
And the exo means it works at an end.

103
00:08:12,000 --> 00:08:19,000
And, furthermore,

104
00:08:19,000 --> 00:08:24,000
the directionality of this
proofreading was something that

105
00:08:24,000 --> 00:08:30,000
puzzled people initially because
it's going 3 prime to 5 prime.

106
00:08:30,000 --> 00:08:35,000
And when people started to purify
DNA polymerases or complexes of DNA

107
00:08:35,000 --> 00:08:40,000
polymerases involved in replication
there seemed to be a puzzle because

108
00:08:40,000 --> 00:08:46,000
the polymerase,
as I've told you,

109
00:08:46,000 --> 00:08:51,000
goes 5 prime to 3 prime,
but the same enzyme complex had an

110
00:08:51,000 --> 00:08:57,000
exonuclease that went in
the opposite direction.

111
00:08:57,000 --> 00:09:01,000
So this seemed very peculiar at
first in the sense if you were

112
00:09:01,000 --> 00:09:06,000
trying to polymerize DNA in this way
why in that same enzyme would you

113
00:09:06,000 --> 00:09:10,000
have something that wanted to
degrade DNA in the other way?

114
00:09:10,000 --> 00:09:15,000
And the answer turned out to be that this
was what's known as a proofreading

115
00:09:15,000 --> 00:09:20,000
exonuclease, as I've put up here.
And here's the principle of how it

116
00:09:20,000 --> 00:09:24,000
works.  Suppose you were replicating
the DNA and there was a G.

117
00:09:24,000 --> 00:09:29,000
And if you put a C in there it very
quickly goes on and continues

118
00:09:29,000 --> 00:09:34,000
the replication.
If it puts in a T,

119
00:09:34,000 --> 00:09:38,000
let's say, this is not a very good
base pair.  It wouldn't have the

120
00:09:38,000 --> 00:09:42,000
right shape.  So when the enzyme
came up looking for that 3 prime

121
00:09:42,000 --> 00:09:47,000
hydroxyl, which would be right at
the end of that T,

122
00:09:47,000 --> 00:09:51,000
things are not in the right place.
And so the polymerase activity

123
00:09:51,000 --> 00:09:56,000
slows down.  And if that primer
terminus sits there for a

124
00:09:56,000 --> 00:10:00,000
little bit, it's able to just peel
off the DNA, flip up,

125
00:10:00,000 --> 00:10:04,000
and there's this function that does
just what you'd do if you were

126
00:10:04,000 --> 00:10:09,000
typing and you made a mistake.
You'd just hit the delete key and

127
00:10:09,000 --> 00:10:13,000
take off the last nucleotide that
you added.  And I have a little movie

128
00:10:13,000 --> 00:10:17,000
showing you that.
This is a crystal structure.

129
00:10:17,000 --> 00:10:21,000
This is the DNA template.  And the
polymerase catalytic site

130
00:10:21,000 --> 00:10:25,000
is right here.
And in this little movie it's just

131
00:10:25,000 --> 00:10:30,000
added an incorrect base pair and the
polymerase is sort of stalled.

132
00:10:30,000 --> 00:10:35,000
And the actual nuclease function is
physically separate on the protein

133
00:10:35,000 --> 00:10:41,000
structure.  But what you'll see in
the movie is that if the polymerase

134
00:10:41,000 --> 00:10:47,000
cannot go very well, eventually this
thing will come up and it will chop

135
00:10:47,000 --> 00:10:53,000
off one nucleotide,
come back and try it again.

136
00:10:53,000 --> 00:10:59,000
Let's see.  I think if we do this,
oopsy-daisy.  Let me see if I can

137
00:10:59,000 --> 00:11:04,000
get this to work here.
Nope, it's not working.

138
00:11:04,000 --> 00:11:09,000
OK.  Well, anyway, I'm going to
skip it for right now.

139
00:11:09,000 --> 00:11:13,000
I don't want to waste time.
But, in any case, the end would go

140
00:11:13,000 --> 00:11:18,000
up here and it would take off one
nucleotide.  So there are at least

141
00:11:18,000 --> 00:11:23,000
two of the ways that the polymerase
is able to work with such fidelity.

142
00:11:23,000 --> 00:11:28,000
It selects for the correct base
pair shape.

143
00:11:28,000 --> 00:11:32,000
And then, after it's made an addition,
it sort of looks back,

144
00:11:32,000 --> 00:11:36,000
just as if you were a very slow
typist, and every time you typed a

145
00:11:36,000 --> 00:11:40,000
letter you looked back and said did
I make a mistake?

146
00:11:40,000 --> 00:11:44,000
And if you made a mistake then
you'd delete and then just try again.

147
00:11:44,000 --> 00:11:48,000
And that gets the cell another
maybe two orders of magnitude of

148
00:11:48,000 --> 00:11:52,000
accuracy.  So we're up to about one
mistake in ten to the seventh base

149
00:11:52,000 --> 00:11:56,000
pairs replicated.
The third system,

150
00:11:56,000 --> 00:12:00,000
which is called mismatch repair,
turns out to be very important for a

151
00:12:00,000 --> 00:12:04,000
whole variety of reasons.
And before I tell you about it,

152
00:12:04,000 --> 00:12:08,000
I want to first introduce the idea
of DNA repair in general.

153
00:12:08,000 --> 00:12:12,000
One of the things that's wonderful
about DNA --

154
00:12:12,000 --> 00:12:19,000
-- as you've learned,

155
00:12:19,000 --> 00:12:22,000
is it's got the information in two
copies.  It's in a complementary

156
00:12:22,000 --> 00:12:25,000
form but it's like having the
photograph and the negative.

157
00:12:25,000 --> 00:12:29,000
And if your kid sister pokes a hole
with a pair of scissors through the

158
00:12:29,000 --> 00:12:32,000
picture of your boyfriend or your
girlfriend, you're not really in

159
00:12:32,000 --> 00:12:35,000
trouble as long as you've got the
negative because you can get the

160
00:12:35,000 --> 00:12:39,000
information back again.
And that same principle applies in

161
00:12:39,000 --> 00:12:43,000
DNA repair.  So if you have some
kind of lesion in DNA,

162
00:12:43,000 --> 00:12:48,000
and this might have come from going
outside in the sunlight,

163
00:12:48,000 --> 00:12:53,000
your DNA absorbs in the UV and it
undergoes photoreactions,

164
00:12:53,000 --> 00:12:57,000
they tend, for the most part,
to just affect one of the two

165
00:12:57,000 --> 00:13:03,000
strands of DNA.
Or if you smoke,

166
00:13:03,000 --> 00:13:09,000
which I hope none of you do,
there are many chemicals in smoke

167
00:13:09,000 --> 00:13:15,000
that will react with DNA,
and they'll modify one strand.

168
00:13:15,000 --> 00:13:21,000
And so the cell has
many kinds of repair

169
00:13:21,000 --> 00:13:27,000
systems, but it has a special type
of repair system known as nucleotide

170
00:13:27,000 --> 00:13:33,000
excision repair.
And you could think of this as a

171
00:13:33,000 --> 00:13:39,000
protein machine that constantly
scans the DNA looking for little

172
00:13:39,000 --> 00:13:44,000
distortions.  And if it finds one,
then what it needs to do is

173
00:13:44,000 --> 00:13:50,000
make cuts, remove the DNA and
make a little gap.

174
00:13:50,000 --> 00:13:55,000
And now you can see what it can do.
Once it's got a little gap, the

175
00:13:55,000 --> 00:14:01,000
information is still over here,
in complementary form.

176
00:14:01,000 --> 00:14:05,000
So if a DNA polymerase were to come
along it could fill in that gap and

177
00:14:05,000 --> 00:14:10,000
seal it up and then you'd be back to
ordinary DNA, the lesion would be

178
00:14:10,000 --> 00:14:15,000
gone.  And I made a silly little
PowerPoint thing here to show it.

179
00:14:15,000 --> 00:14:20,000
So if you were to, say, damage the
guanine with something,

180
00:14:20,000 --> 00:14:25,000
say one of the carcinogens you find
in cigarette smoke,

181
00:14:25,000 --> 00:14:30,000
you could think of this protein
machine as being a sort of pair of

182
00:14:30,000 --> 00:14:35,000
scissors that have a conditionality
in them.

183
00:14:35,000 --> 00:14:38,000
As this protein machine scans along
the DNA the scissors aren't

184
00:14:38,000 --> 00:14:42,000
activated until it recognizes
there's a distortion here,

185
00:14:42,000 --> 00:14:46,000
at which point then it senses that
there's some bump in the DNA.

186
00:14:46,000 --> 00:14:50,000
And it's very clever the way it
does it because the nuclease

187
00:14:50,000 --> 00:14:54,000
activities, the things that are
going to cut the DNA are actually

188
00:14:54,000 --> 00:14:58,000
some distance away,
a few nucleotides away from the

189
00:14:58,000 --> 00:15:02,000
lesion.
So even if this is distorting the

190
00:15:02,000 --> 00:15:06,000
DNA, the scissors are able to work
out here and out here.

191
00:15:06,000 --> 00:15:10,000
It makes two cuts.  That was a huge
surprise.  Nobody expected that when

192
00:15:10,000 --> 00:15:14,000
they started to do the biochemistry.
And then in principle once you cut

193
00:15:14,000 --> 00:15:18,000
it, you can remove this little
oligonucleotide and then a DNA polymerase

194
00:15:18,000 --> 00:15:22,000
can just come in,
and, following the rules A pairs with T,

195
00:15:22,000 --> 00:15:26,000
G pairs with C, copy it along, and
then it would be sealed up at the

196
00:15:26,000 --> 00:15:30,000
end.  And I've actually shown you a
picture of what happens if a human

197
00:15:30,000 --> 00:15:35,000
is missing that system.
When I was showing you how profound

198
00:15:35,000 --> 00:15:39,000
an effect you could get from just
losing one single gene or a mutation

199
00:15:39,000 --> 00:15:44,000
affecting one single gene,
this disease called xeroderma

200
00:15:44,000 --> 00:15:48,000
pigmentosum.  There are a variety of
different groups.

201
00:15:48,000 --> 00:15:53,000
And the one on the left is an
example.  That's someone who is

202
00:15:53,000 --> 00:15:57,000
missing one of the genes that
encodes one of the proteins involved

203
00:15:57,000 --> 00:16:01,000
in nucleotide excision repair.
And this is really,

204
00:16:01,000 --> 00:16:05,000
really important for fixing up the
damage we get all the time in

205
00:16:05,000 --> 00:16:08,000
sunlight.  So if you're missing that
repair system and you go out in the

206
00:16:08,000 --> 00:16:12,000
sun then you get all kinds of
lesions and people are very

207
00:16:12,000 --> 00:16:16,000
susceptible to skin cancer.
And I told you fortunately now you

208
00:16:16,000 --> 00:16:19,000
don't find people with this disease
looking like that because at least

209
00:16:19,000 --> 00:16:23,000
in developed countries we recognize
it.  They're kept out of the sun.

210
00:16:23,000 --> 00:16:26,000
And these were the kids who I said
are called "children of the moon"

211
00:16:26,000 --> 00:16:30,000
because they, for example,
go to summer camps where they do

212
00:16:30,000 --> 00:16:34,000
everything at night so they won't
get exposed to sunlight.

213
00:16:34,000 --> 00:16:39,000
But that's what happens to us if we
lack nucleotide excision repair.

214
00:16:39,000 --> 00:16:45,000
And, again, what makes that
possible is that the information is

215
00:16:45,000 --> 00:16:50,000
there twice in a double-stranded DNA.
I also showed you a little movie

216
00:16:50,000 --> 00:16:56,000
early on when I was showing you,
I'm going to actually run this in

217
00:16:56,000 --> 00:17:02,000
QuickTime because it works a little
more smoothly, I think.

218
00:17:02,000 --> 00:17:07,000
So I showed you this when we were

219
00:17:07,000 --> 00:17:11,000
talking about DNA because I wanted
you to sort of get that sense of

220
00:17:11,000 --> 00:17:14,000
what it was like to kind of fly down
the groove of a DNA molecule.

221
00:17:14,000 --> 00:17:18,000
But what I didn't emphasize was
this protein that was bound to the

222
00:17:18,000 --> 00:17:22,000
DNA.  That's a protein that's a DNA
repair protein.

223
00:17:22,000 --> 00:17:25,000
And it's one of these things that
looks for lesions in the DNA.

224
00:17:25,000 --> 00:17:29,000
And as we fly along the major
groove this little green thing is

225
00:17:29,000 --> 00:17:33,000
actually the lesion that that
protein is looking for.

226
00:17:33,000 --> 00:17:38,000
And it sort of puts fingers down
into the groove and it's able to

227
00:17:38,000 --> 00:17:43,000
sense that.  And you can sort of see
how this protein is bound to DNA.

228
00:17:43,000 --> 00:17:49,000
This is a lesion that we get all
the time from oxidative damage.

229
00:17:49,000 --> 00:17:54,000
And remember I said oxygen is bad
for DNA?  So our bodies have to have

230
00:17:54,000 --> 00:18:00,000
systems that are able to repair that damage.
So DNA repair is very important for

231
00:18:00,000 --> 00:18:05,000
life.
We'll just finish flying down the

232
00:18:05,000 --> 00:18:09,000
major groove one more time here.
OK.  I'm going to go back to

233
00:18:09,000 --> 00:18:20,000
PowerPoint.

234
00:18:20,000 --> 00:18:30,000
OK.  So mismatch repair is a form
of repair that's based on

235
00:18:30,000 --> 00:18:37,000
that same idea.
Let's think about it if we had a

236
00:18:37,000 --> 00:18:42,000
replication fork here,
and let's say there was a G here and

237
00:18:42,000 --> 00:18:46,000
a T got misincorporated opposite it,
but in this case it wasn't removed

238
00:18:46,000 --> 00:18:51,000
by the proofreading which happens
about one in ten to the seventh

239
00:18:51,000 --> 00:18:56,000
times.  Now if that strand is
simply continued,

240
00:18:56,000 --> 00:19:01,000
then you'd end up with
a GT base pair.

241
00:19:01,000 --> 00:19:05,000
And the next time you copied it this
strand would give rise to a GC but

242
00:19:05,000 --> 00:19:09,000
this one would give rise to an AT.
And then you'd have a mutation: the

243
00:19:09,000 --> 00:19:13,000
sequence would now have changed.
And if it affected an important

244
00:19:13,000 --> 00:19:17,000
gene that could be bad for you.
So the cell has what's known as a

245
00:19:17,000 --> 00:19:25,000
mismatch repair system --

246
00:19:25,000 --> 00:19:29,000
-- that works on exactly the same
logic as here.  That it

247
00:19:29,000 --> 00:19:34,000
basically comes along.
It scans the DNA.

248
00:19:34,000 --> 00:19:41,000
It finds the bump because this is
not a proper base pair.

249
00:19:41,000 --> 00:19:48,000
And then it cuts that stretch out, fills it in, and you're
back to ordinary DNA with a GC base

250
00:19:48,000 --> 00:19:55,000
pair.  There's one little wrinkle.
For this system to work it has to

251
00:19:55,000 --> 00:20:03,000
do one other thing that's different
from that kind of DNA repair.

252
00:20:03,000 --> 00:20:08,000
Can anybody see what it is?
Why don't you talk to the person

253
00:20:08,000 --> 00:20:14,000
next to you and see if you can
figure it out.

254
00:20:14,000 --> 00:20:19,000
This system must be doing something
else in order for this to work.

255
00:20:19,000 --> 00:20:25,000
OK, you can ask somebody.  What do
you think?

256
00:20:25,000 --> 00:20:35,000
What if I removed the G?

257
00:20:35,000 --> 00:20:43,000
Would that work?

258
00:20:43,000 --> 00:20:47,000
What would happen if I took the G
instead?  Say I made the little gap

259
00:20:47,000 --> 00:20:52,000
over on this strand instead,
cut it here?

260
00:20:52,000 --> 00:20:59,000
Yeah.  So which one is the one

261
00:20:59,000 --> 00:21:03,000
that's right, the old strand
or the new strand?

262
00:21:03,000 --> 00:21:07,000
The old strand,
yeah.  See, this is the old and this

263
00:21:07,000 --> 00:21:11,000
is the new.  And the term that's
usually used for the new strand is the

264
00:21:11,000 --> 00:21:16,000
daughter strand.
So the other thing

265
00:21:16,000 --> 00:21:20,000
this system has to do is it not only
has to be able to detect that

266
00:21:20,000 --> 00:21:25,000
there's an incorrect little base
pair in there,

267
00:21:25,000 --> 00:21:29,000
but it also has to know which is the
parental strand,

268
00:21:29,000 --> 00:21:33,000
the template strand,
and which is the daughter strand,

269
00:21:33,000 --> 00:21:37,000
the newly synthesized strand.
And this system makes the assumption

270
00:21:37,000 --> 00:21:41,000
that the strand that's old is the
one that's correct and the mistake

271
00:21:41,000 --> 00:21:44,000
is on the new one.
You guys see that?

272
00:21:44,000 --> 00:21:48,000
OK.  So that gets another two or
three orders of magnitude in

273
00:21:48,000 --> 00:21:52,000
accuracy, and that's what
brings it up to about one in ten to the tenth.

274
00:21:52,000 --> 00:21:55,000
Now, the people who made this,
who formulated this model for

275
00:21:55,000 --> 00:21:59,000
mismatch repair,
complete with the feature that it

276
00:21:59,000 --> 00:22:03,000
needed to recognize the old and new
strand, that's a bit of a trick,

277
00:22:03,000 --> 00:22:07,000
if you think about it, because it's
DNA on both sides.

278
00:22:07,000 --> 00:22:10,000
And there are several different
ways used in nature,

279
00:22:10,000 --> 00:22:14,000
so I'm not going to go into it,
but there's at least a couple of

280
00:22:14,000 --> 00:22:18,000
different ways of doing that trick.
You could sort of see that if you were

281
00:22:18,000 --> 00:22:22,000
tied to the replication fork,
then,

282
00:22:22,000 --> 00:22:26,000
just from the geometry of that,
you could probably

283
00:22:26,000 --> 00:22:29,000
keep track of which strand is old and which is new.
E. coli has a very cute trick,

284
00:22:29,000 --> 00:22:33,000
but it's not universal so I won't go
into it, but the people who did the

285
00:22:33,000 --> 00:22:36,000
seminal stuff,
I just had to quickly show you a

286
00:22:36,000 --> 00:22:39,000
couple of pictures.
When I showed you that picture of

287
00:22:39,000 --> 00:22:43,000
the DNA 50th anniversary, the guy sitting in the
front row was Miroslav Radman who

288
00:22:43,000 --> 00:22:46,000
was one of the two people.
He's a European scientist

289
00:22:46,000 --> 00:22:49,000
originally from Croatia.
And he collaborated with someone

290
00:22:49,000 --> 00:22:53,000
you've heard about before,
Matt Meselson, who was up at Harvard.

291
00:22:53,000 --> 00:22:56,000
He was the Meselson of the Meselson-Stahl
experiment that showed the

292
00:22:56,000 --> 00:23:00,000
semi-conservative mechanism
of DNA replication.

293
00:23:00,000 --> 00:23:03,000
This was a little reception.
And Matt was talking to Alex Rich

294
00:23:03,000 --> 00:23:07,000
who's in the MIT Biology Department.
And I was amused because remember

295
00:23:07,000 --> 00:23:11,000
how Vernon told you how Francis
Crick would run up and down the

296
00:23:11,000 --> 00:23:15,000
stairs in the Cambridge lab and he
was talking all the time?

297
00:23:15,000 --> 00:23:19,000
And I've heard Vernon say you could
never really tell whether an idea

298
00:23:19,000 --> 00:23:23,000
came from Watson or Crick because
they'd just talk,

299
00:23:23,000 --> 00:23:27,000
talk, talk all the time.
So this was at a sort of nice

300
00:23:27,000 --> 00:23:31,000
reception at the DNA 50th anniversary.
And within a couple of minutes,

301
00:23:31,000 --> 00:23:35,000
I looked over and there were
Miroslav Radman and Matt Meselson

302
00:23:35,000 --> 00:23:39,000
talk, talk, talk.
They were in the corner drawing

303
00:23:39,000 --> 00:23:43,000
pictures on a board.
I also showed you actually a

304
00:23:43,000 --> 00:23:47,000
picture of one of the genes that's
involved in recognizing this

305
00:23:47,000 --> 00:23:51,000
mismatch, because there's a protein
that recognizes that mismatch and

306
00:23:51,000 --> 00:23:55,000
it's given the name MutS.
And when I was showing you some

307
00:23:55,000 --> 00:23:59,000
proteins, there was one that had a lot
of alpha helices.

308
00:23:59,000 --> 00:24:04,000
This is actually a picture of MutS.
It's a dimer.

309
00:24:04,000 --> 00:24:09,000
That's why some of it's green and
some of it's blue.

310
00:24:09,000 --> 00:24:14,000
And this is DNA viewed end on and
it's recognizing a GT mismatch in

311
00:24:14,000 --> 00:24:19,000
DNA in that picture.
Now, this may sound very esoteric,

312
00:24:19,000 --> 00:24:24,000
you know, and obviously important
for life and an important part of

313
00:24:24,000 --> 00:24:29,000
sort of understanding how life works
if you're interested in studying

314
00:24:29,000 --> 00:24:34,000
molecular biology.
It may not seem to have very much

315
00:24:34,000 --> 00:24:39,000
connection to your real life.
But, in fact, in this case mismatch

316
00:24:39,000 --> 00:24:44,000
repair does, because it affects the
frequency of errors:

317
00:24:44,000 --> 00:24:49,000
if you lose it, then when you
replicate your DNA you're going to

318
00:24:49,000 --> 00:24:54,000
make more mistakes.
And I need to just give you a very

319
00:24:54,000 --> 00:24:59,000
quick introduction to cancer so you
can see why this is important.

320
00:24:59,000 --> 00:25:03,000
Cancer comes from the fact that
remember, a human, or any other

321
00:25:03,000 --> 00:25:07,000
multicellular organism like us that has many
different kinds of cells, starts out

322
00:25:07,000 --> 00:25:11,000
from one cell.
And I talked about first you get

323
00:25:11,000 --> 00:25:15,000
the embryonic stem cells that can
become anything.

324
00:25:15,000 --> 00:25:19,000
And the cells become successively
more and more and more specialized

325
00:25:19,000 --> 00:25:23,000
as they go along.
So ultimately a cell that's in your

326
00:25:23,000 --> 00:25:27,000
retina or in, say,
the lining of your colon needs to

327
00:25:27,000 --> 00:25:32,000
know that's where it belongs.
And it also needs to know that it

328
00:25:32,000 --> 00:25:37,000
cannot just keep replicating.
So this is actually showing a

329
00:25:37,000 --> 00:25:42,000
little picture of the lining of your
intestine.  And there's a single

330
00:25:42,000 --> 00:25:46,000
layer of cells right along the
inside edge of your intestines.

331
00:25:46,000 --> 00:25:51,000
These are the cells through which all
the nutrient exchange happens and

332
00:25:51,000 --> 00:25:56,000
everything else when your body
extracts nutrients as foodstuff

333
00:25:56,000 --> 00:26:01,000
passes through your intestine.
And here's what happens with cancer.  A

334
00:26:01,000 --> 00:26:05,000
cell that's normally a part of your
body has to obey a whole set of

335
00:26:05,000 --> 00:26:10,000
rules.  And what you can think of
when someone starts to develop

336
00:26:10,000 --> 00:26:15,000
cancer is that what started out as
an ordinary cell undergoes some kind

337
00:26:15,000 --> 00:26:19,000
of successive changes in its DNA
that gradually causes it to forget

338
00:26:19,000 --> 00:26:24,000
the rules that make it be part of an
organized body system.

339
00:26:24,000 --> 00:26:29,000
So if we take a look here at all
these different cells.

340
00:26:29,000 --> 00:26:33,000
But let's imagine just one of them
gets a change that makes it forget

341
00:26:33,000 --> 00:26:38,000
to stop, or it should know to stop
replicating when it touches its

342
00:26:38,000 --> 00:26:42,000
neighbors, but if a cell were to
lose that control what would happen?

343
00:26:42,000 --> 00:26:47,000
Well, it would then begin to
proliferate.  And then what happens

344
00:26:47,000 --> 00:26:52,000
in cancer is that, now there are
more of them, one

345
00:26:52,000 --> 00:26:56,000
cell will acquire an additional
mutation that will lead to a further

346
00:26:56,000 --> 00:27:01,000
loss of growth control.
You can see now the cells are

347
00:27:01,000 --> 00:27:05,000
starting to become sort of funny
shapes.  And then one of the cells

348
00:27:05,000 --> 00:27:09,000
in here will undergo yet another
change.  And right at this point,

349
00:27:09,000 --> 00:27:13,000
up until now, the cancer has, even
though the cells are dividing and

350
00:27:13,000 --> 00:27:18,000
have lost some of their growth
control they're still staying in the

351
00:27:18,000 --> 00:27:22,000
same place.  So that would be sort
of, you know, like a wart or

352
00:27:22,000 --> 00:27:26,000
something like that,
or what you would hear called a benign

353
00:27:26,000 --> 00:27:30,000
tumor.
You can go in surgically and take it

354
00:27:30,000 --> 00:27:34,000
away.  But then the other thing that
can happen is cells can forget where

355
00:27:34,000 --> 00:27:38,000
they're supposed to be in the body.
And when that happens they say the

356
00:27:38,000 --> 00:27:42,000
cells metastasize and become
metastatic or a malignant tumor.

357
00:27:42,000 --> 00:27:46,000
And what that means is the cell is
beginning to, it's acquired yet

358
00:27:46,000 --> 00:27:50,000
another change that's made it forget
which part of the body it's supposed

359
00:27:50,000 --> 00:27:54,000
to be in.  And they've indicated it
here as being a change in this cell

360
00:27:54,000 --> 00:27:58,000
that then leads to,
you can see here right now it's

361
00:27:58,000 --> 00:28:02,000
starting to invade into
the whole intestine.

362
00:28:02,000 --> 00:28:06,000
Or if one of those cells comes
loose in your bloodstream, it can land

363
00:28:06,000 --> 00:28:10,000
somewhere else in your body and then
start to grow there.

364
00:28:10,000 --> 00:28:15,000
And that's what happens when
somebody has metastatic cancer.

365
00:28:15,000 --> 00:28:19,000
You cannot really cure it because
now there are cancer cells all over

366
00:28:19,000 --> 00:28:24,000
the body.  And that usually is a
very difficult situation to get any

367
00:28:24,000 --> 00:28:28,000
kind of cure on.
So to put this in perspective,

368
00:28:28,000 --> 00:28:32,000
you needed to have a number of
changes to go from an ordinary cell

369
00:28:32,000 --> 00:28:37,000
to a metastatic cancer cell.
So for each one of these changes there

370
00:28:37,000 --> 00:28:43,000
was some kind of change in the DNA.
Either there was a mutation or

371
00:28:43,000 --> 00:28:48,000
maybe a chromosome was lost or
something like this so that you need

372
00:28:48,000 --> 00:28:53,000
a series of successive genetic
alterations.  So there was a very

373
00:28:53,000 --> 00:28:59,000
key insight that a number of people
had after we understood the

374
00:28:59,000 --> 00:29:04,000
mechanism of mismatch repair.
Because some people realized that if

375
00:29:04,000 --> 00:29:09,000
a human cell had lost mismatch
repair then the frequency of each

376
00:29:09,000 --> 00:29:14,000
one of these changes would go up.
It wouldn't affect what the change

377
00:29:14,000 --> 00:29:19,000
was.  It wouldn't actually have
anything to do,

378
00:29:19,000 --> 00:29:24,000
if you lost mismatch repair it
wouldn't affect directly the ability

379
00:29:24,000 --> 00:29:29,000
of this cell to stop dividing when
it touches its neighbors.

380
00:29:29,000 --> 00:29:35,000
But it would increase the chances
that a mutation somewhere would have

381
00:29:35,000 --> 00:29:41,000
that effect.  And if every one of
these steps goes now a hundred or a

382
00:29:41,000 --> 00:29:47,000
thousand times faster,
you can see that if somebody loses

383
00:29:47,000 --> 00:29:54,000
mismatch repair in a cell then the
chances of that cell developing into a

384
00:29:54,000 --> 00:30:00,000
cancer are very high.
So there was a kind of human cancer,

385
00:30:00,000 --> 00:30:06,000
it's a susceptibility to colon
cancer called hereditary

386
00:30:06,000 --> 00:30:12,000
nonpolyposis colon cancer.
You don't need to remember the name.

387
00:30:12,000 --> 00:30:18,000
It's often abbreviated HNPCC for
people who cannot remember the name.

388
00:30:18,000 --> 00:30:23,000
But it was a kind of susceptibility
to cancer that ran in families.

389
00:30:23,000 --> 00:30:29,000
So it was thought to be genetically
determined in some way.

390
00:30:29,000 --> 00:30:33,000
And one of the interesting things
was a number of the people who had

391
00:30:33,000 --> 00:30:38,000
this disease would show a kind of
instability of the genome if they

392
00:30:38,000 --> 00:30:42,000
looked in the tumors.
They just looked at the DNA.

393
00:30:42,000 --> 00:30:47,000
It seemed to be undergoing changes
at a much faster rate.

394
00:30:47,000 --> 00:30:51,000
And the insight that came out was
that the people who had this disease

395
00:30:51,000 --> 00:30:56,000
had, for example,
a mutation affecting what we can

396
00:30:56,000 --> 00:31:01,000
think of as a human
homolog of MutS.

397
00:31:01,000 --> 00:31:06,000
And we'll talk about genetics of
humans in a small number of weeks,

398
00:31:06,000 --> 00:31:11,000
but I think most of you know that
for most genes,

399
00:31:11,000 --> 00:31:16,000
except for the genes associated with
the sex chromosomes,

400
00:31:16,000 --> 00:31:21,000
you get one copy of a gene from mom
and another copy of a gene from dad.

401
00:31:21,000 --> 00:31:26,000
So under most circumstances we
would have two good copies of this

402
00:31:26,000 --> 00:31:32,000
gene encoding a human
homolog of MutS.

403
00:31:32,000 --> 00:31:36,000
What does that human homolog of MutS
do?  The same thing as in

404
00:31:36,000 --> 00:31:41,000
bacteria.  It recognizes a mismatch
in DNA and fixes it up.

405
00:31:41,000 --> 00:31:45,000
So it turned out that what the
people with this disease have is

406
00:31:45,000 --> 00:31:50,000
one broken copy of the gene:
either the one they got from mom or the

407
00:31:50,000 --> 00:31:55,000
one they got from dad.
So they're still OK. They have one

408
00:31:55,000 --> 00:32:00,000
good copy of the mismatch repair gene
in every cell.
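
A rough way to see why inheriting one broken copy matters so much: suppose losing one working copy in a given cell lineage has some small probability q (a made-up illustrative number, not from the lecture). A cell from someone with two good copies has to lose both, roughly q squared, while a cell from an HNPCC carrier has to lose only the remaining one, roughly q. This is the familiar "two-hit" picture, used here as a simplifying assumption that treats the two losses as independent.

```python
# Hypothetical two-hit sketch: q is an assumed per-lineage probability of
# losing one working copy of a mismatch repair gene (illustrative only).

def prob_lose_all_copies(q: float, working_copies: int) -> float:
    """Probability a lineage loses every working copy, assuming independent losses."""
    return q ** working_copies

q = 1e-4  # assumed value, for illustration
normal = prob_lose_all_copies(q, working_copies=2)   # two good copies
carrier = prob_lose_all_copies(q, working_copies=1)  # one inherited broken copy
print(f"two good copies: {normal:.0e}")
print(f"one good copy:   {carrier:.0e}")
print(f"carrier lineage is ~{carrier / normal:.0f}x more likely to lose repair")
```

Under these assumptions the carrier's disadvantage is a factor of 1/q, which is why a small q translates into a large difference in risk.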

409
00:32:00,000 --> 00:32:06,000
But if a cell ever lost that
good copy, then that

410
00:32:06,000 --> 00:32:13,000
cell and all of its descendants
would mutate at something like a

411
00:32:13,000 --> 00:32:20,000
hundred or a thousand times the
normal probability.

412
00:32:20,000 --> 00:32:27,000
And so they would progress down
this pathway.
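
The earlier point, that losing mismatch repair makes every step go a hundred or a thousand times faster, compounds across steps. A deliberately oversimplified model: if reaching a metastatic cell takes k successive rare changes, each independent with probability p in some window, the chance of acquiring all k scales roughly like p to the k. Multiplying each step's probability by a factor f then multiplies the overall chance by about f to the k. The values of p and k below are assumptions for illustration, not measurements.

```python
# Oversimplified multistep model (illustrative assumptions, not measured data):
# k independent rare changes, each with probability p; losing mismatch repair
# multiplies each step's probability by a factor f (p * f must stay well below 1).

def prob_all_steps(p: float, k: int) -> float:
    """Chance that one lineage acquires all k changes, treating steps as independent."""
    return p ** k

def overall_speedup(p: float, k: int, f: float) -> float:
    """Ratio of the k-step probability with and without an f-fold per-step increase."""
    return prob_all_steps(p * f, k) / prob_all_steps(p, k)

p, k = 1e-6, 3  # assumed per-step probability and number of steps
for f in (100, 1000):
    print(f"{f}x per step over {k} steps -> ~{overall_speedup(p, k, f):.0e}x overall")
```

With three steps, a 100-fold per-step increase becomes roughly a million-fold overall increase, and a 1000-fold increase becomes roughly a billion-fold one. Real tumor progression is not this tidy, but the compounding is the reason mismatch repair loss is so dangerous.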

413
00:32:27,000 --> 00:32:33,000
And so the polyposis means that if
they look in the colons of people

414
00:32:33,000 --> 00:32:39,000
who have this disease they find lots
and lots of little growths or polyps

415
00:32:39,000 --> 00:32:45,000
that are on their way to progressing
down this pathway.

416
00:32:45,000 --> 00:32:51,000
Even in these people it takes quite
a while.  And so once they knew that

417
00:32:51,000 --> 00:32:57,000
they were able to go in and through
colonoscopies find these cancers

418
00:32:57,000 --> 00:33:02,000
and remove them.
And most of you will not have that

419
00:33:02,000 --> 00:33:06,000
disease, but this is now a kind of
cancer that's pretty much

420
00:33:06,000 --> 00:33:11,000
preventable as long as it gets
detected early.  It can take, in a normal

421
00:33:11,000 --> 00:33:15,000
person as long as 20 years or
something for an initial cell that

422
00:33:15,000 --> 00:33:19,000
underwent this initial change to go
all the way down to becoming

423
00:33:19,000 --> 00:33:24,000
metastatic.  So when you get older,
and this certainly applies to most

424
00:33:24,000 --> 00:33:28,000
of your parents or anyone in that age group,
you should ask them if they've

425
00:33:28,000 --> 00:33:32,000
had a colonoscopy.
It's not the world's most fun

426
00:33:32,000 --> 00:33:35,000
procedure because,
you know, they stick a probe and

427
00:33:35,000 --> 00:33:39,000
look inside your intestine,
but it isn't that bad.  And what

428
00:33:39,000 --> 00:33:42,000
they do is if they see one of these
little polyps they can catch it

429
00:33:42,000 --> 00:33:45,000
before it's progressed far enough to
be metastatic.

430
00:33:45,000 --> 00:33:49,000
And then there's no problem.
I had my first one done about,

431
00:33:49,000 --> 00:33:52,000
I don't know, three or four years
ago and they found one.

432
00:33:52,000 --> 00:33:55,000
And they took it out and I'm fine.
But if it had been left there and

433
00:33:55,000 --> 00:33:59,000
allowed to progress then some years
down the line I would have

434
00:33:59,000 --> 00:34:02,000
gotten colon cancer.
And I'm going to have to go back and

435
00:34:02,000 --> 00:34:06,000
get checked again in another year or
two.  But it is something that you

436
00:34:06,000 --> 00:34:10,000
should check with your parents
because everybody should have a

437
00:34:10,000 --> 00:34:13,000
colonoscopy.  My hope is by the time
you guys reach an age when this

438
00:34:13,000 --> 00:34:17,000
comes they'll probably have some
kind of little blood test or

439
00:34:17,000 --> 00:34:20,000
something where you won't have to go
through this indignity.

440
00:34:20,000 --> 00:34:24,000
But right at the moment it's
something everyone should do,

441
00:34:24,000 --> 00:34:28,000
I think.  I just wanted to make one
other comment about basic research

442
00:34:28,000 --> 00:34:32,000
because there's another thing here.
Actually, my lab was the first lab

443
00:34:32,000 --> 00:34:36,000
to clone the MutS gene.
We cloned it, we sequenced it,

444
00:34:36,000 --> 00:34:40,000
and we looked in the databases.  And
at that time in the late eighties

445
00:34:40,000 --> 00:34:44,000
there was nothing else that looked
like it.  I thought it would be like,

446
00:34:44,000 --> 00:34:48,000
there were some sort of similar
mutants, and here's what it looked

447
00:34:48,000 --> 00:34:52,000
like.  This is a culture of E.
coli.  And there are about ten to

448
00:34:52,000 --> 00:34:56,000
the ninth cells per mil.
And we plated about ten to the

449
00:34:56,000 --> 00:35:00,000
ninth or ten to the eighth on a
plate with a drug on it.

450
00:35:00,000 --> 00:35:03,000
And you can see they almost all died,
but there were maybe three or four

451
00:35:03,000 --> 00:35:07,000
that survived.
And then their descendents were

452
00:35:07,000 --> 00:35:11,000
able to grow up and form a colony.
This is how we recognized something

453
00:35:11,000 --> 00:35:15,000
was defective in what we now know
as mismatch repair.
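
The plating experiment is a frequency estimate: drug-resistant colonies divided by cells plated. Using the lecture's rough numbers (about 3 or 4 survivors from roughly 10^8 to 10^9 cells plated), the sketch below computes that frequency. Applying the lecture's 100-fold and 1000-fold figures for losing mismatch repair to predict colony counts is purely illustrative, not real plate data.

```python
# Mutant frequency = resistant colonies / viable cells plated.
# Numbers are the lecture's rough figures; the mutator factors are illustrative.

def mutant_frequency(resistant_colonies: int, cells_plated: float) -> float:
    """Fraction of plated cells that gave rise to a drug-resistant colony."""
    if cells_plated <= 0:
        raise ValueError("cells_plated must be positive")
    return resistant_colonies / cells_plated

for plated in (1e8, 1e9):
    wild_type = mutant_frequency(4, plated)
    print(f"plated {plated:.0e}: wild-type frequency ~{wild_type:.0e}")
    for factor in (100, 1000):
        # Colonies expected on the same plate if a mutator raised the frequency `factor`-fold.
        expected = wild_type * factor * plated
        print(f"  {factor}x mutator -> expect ~{expected:.0f} resistant colonies")
```

The point of the comparison is that a mutator strain turns a plate with a handful of colonies into one with hundreds or thousands, which is how such mutants stand out in a screen.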

454
00:35:15,000 --> 00:35:18,000
If you took this mutant of E.
coli and plated it out, you'd see

455
00:35:18,000 --> 00:35:22,000
you got a lot more drug-resistant
colonies.  That's the difference

456
00:35:22,000 --> 00:35:26,000
that I was describing,
the importance of mismatch repair.

457
00:35:26,000 --> 00:35:30,000
If you don't have mismatch repair
you can see, you get a lot more

458
00:35:30,000 --> 00:35:33,000
mistakes that show up as mutants.
So I was studying that.

459
00:35:33,000 --> 00:35:37,000
And we cloned the MutS and MutL
genes, MutL being another gene that's

460
00:35:37,000 --> 00:35:41,000
involved in this.
Didn't see anything in the database,

461
00:35:41,000 --> 00:35:44,000
but there were very similar mutants
in Streptococcus pneumoniae that

462
00:35:44,000 --> 00:35:48,000
people had isolated.
Remember Streptococcus pneumoniae from

463
00:35:48,000 --> 00:35:51,000
the transformation experiments?
So I thought, well, maybe these are

464
00:35:51,000 --> 00:35:55,000
the same genes on an evolutionary
basis.  So I phoned some labs,

465
00:35:55,000 --> 00:35:59,000
and I found one that was sequencing
what turned out to be

466
00:35:59,000 --> 00:36:02,000
a homolog of MutS.
We tried to publish our papers in a

467
00:36:02,000 --> 00:36:06,000
medium fancy journal because I
thought this was a pretty cool

468
00:36:06,000 --> 00:36:10,000
result that two bacteria that were
evolutionarily very diverged had

469
00:36:10,000 --> 00:36:14,000
this conserved mechanism for mismatch
repair, but the reviewer said,

470
00:36:14,000 --> 00:36:18,000
you know, this is a pretty
specialized topic,

471
00:36:18,000 --> 00:36:22,000
it's not of general interest,
it should go in, the phrase they use

472
00:36:22,000 --> 00:36:26,000
is “a more specialized journal.”
So it was published in the Journal

473
00:36:26,000 --> 00:36:30,000
of Bacteriology which is a really
wonderful journal,

474
00:36:30,000 --> 00:36:33,000
but it basically deals with bacteria.
And about a week after that paper

475
00:36:33,000 --> 00:36:36,000
came out my phone rang and it was a
guy from Emory.

476
00:36:36,000 --> 00:36:39,000
And he said, “I work on mouse.
We were sequencing a gene,” it

477
00:36:39,000 --> 00:36:42,000
doesn't matter what,
“and we sequenced in the wrong

478
00:36:42,000 --> 00:36:45,000
direction.  And we seem to have
something called MutS.

479
00:36:45,000 --> 00:36:48,000
Do you know anything about MutS?”
And a couple of days after that I

480
00:36:48,000 --> 00:36:51,000
got a phone call from somebody at
NIH.  And they said the same thing,

481
00:36:51,000 --> 00:36:54,000
“We were trying to sequence this
gene in humans.

482
00:36:54,000 --> 00:36:57,000
We kind of sequenced in the wrong
direction and found MutS.”

483
00:36:57,000 --> 00:37:00,000
So within a week of the paper
coming out I knew there were mouse

484
00:37:00,000 --> 00:37:04,000
and human homologs.
And that led from these sorts of

485
00:37:04,000 --> 00:37:09,000
studies, which my first graduate
student worked on,

486
00:37:09,000 --> 00:37:14,000
to the identification of the human
homologs.  And then not me but

487
00:37:14,000 --> 00:37:19,000
others made the connection between
mismatch repair and cancer.

488
00:37:19,000 --> 00:37:24,000
But this is the way a lot of things
happen with basic research.

489
00:37:24,000 --> 00:37:29,000
This doesn't look like anything
that's very important.

490
00:37:29,000 --> 00:37:32,000
And it sure doesn't look like it's
going to lead to an insight into

491
00:37:32,000 --> 00:37:36,000
cancer, but this is very much the
way it goes.  I've had this happen

492
00:37:36,000 --> 00:37:40,000
twice with another set of genes in
my life that turned out to be

493
00:37:40,000 --> 00:37:44,000
important for cancer as well.
And, as I said, what happens,

494
00:37:44,000 --> 00:37:47,000
if you lose mismatch repair, then
all these alterations happen much

495
00:37:47,000 --> 00:37:51,000
more quickly and the cells can
become cancerous.

496
00:37:51,000 --> 00:37:55,000
I've included a couple of outtakes
because I actually made this slide

497
00:37:55,000 --> 00:37:59,000
with my son's pillowcase on our
dining room counter.

498
00:37:59,000 --> 00:38:04,000
And our cats, who you saw at some
point earlier in the year,

499
00:38:04,000 --> 00:38:09,000
thought this was the weirdest thing
they had ever seen,

500
00:38:09,000 --> 00:38:14,000
when I brought these plates home.
So, OK, anyway.  All right.  So one

501
00:38:14,000 --> 00:38:19,000
other thing to tell you about DNA
replication before I move

502
00:38:19,000 --> 00:38:28,000
on, and that is --

503
00:38:28,000 --> 00:38:33,000
-- the initiation of DNA replication.
In E. coli there's one great big

504
00:38:33,000 --> 00:38:39,000
piece of DNA.  And it's all one
giant circular chromosome.

505
00:38:39,000 --> 00:38:45,000
And if you realize what I've told
you about DNA replication,

506
00:38:45,000 --> 00:38:50,000
I've only talked to you about how, once
you have a replication fork

507
00:38:50,000 --> 00:38:56,000
established, you keep it going.
But, as you might guess, a really

508
00:38:56,000 --> 00:39:02,000
important point of biological
control is the initiation

509
00:39:02,000 --> 00:39:07,000
of DNA replication.
And so the way cells do that is they

510
00:39:07,000 --> 00:39:12,000
have a special sequence in their DNA.
It's written just with Gs and Cs

511
00:39:12,000 --> 00:39:17,000
and As and Ts,
but it's a word sort of written in a

512
00:39:17,000 --> 00:39:23,000
different language than the kind of
genetic code we're going to be

513
00:39:23,000 --> 00:39:28,000
talking about in the next couple of
lectures.  And what it means is

514
00:39:28,000 --> 00:39:34,000
“start replication here.”
And these sequences are

515
00:39:34,000 --> 00:39:41,000
called origins of DNA replication.
And, for example, in E. coli it's a

516
00:39:41,000 --> 00:39:48,000
stretch of DNA that's about 250 base
pairs long.  And it's got a sequence

517
00:39:48,000 --> 00:39:55,000
that lets proteins bind and they
kind of are able to make a little

518
00:39:55,000 --> 00:40:01,000
bubble like this.
And it's at the edges of this little

519
00:40:01,000 --> 00:40:05,000
bubble where it's able to start a
replication fork.
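
To make the idea of a "word" written in the DNA concrete, here is a sketch that scans a sequence for a short binding motif on both strands. The motif, TTATNCACA (N = any base), is the commonly cited consensus for the "DnaA box" bound by the initiator protein at the E. coli origin; the lecture does not name it, so treat it as an assumption, and the input sequence is invented. Real proteins recognize origins through more than an exact string match.

```python
import re

# Assumed motif: commonly cited DnaA-box consensus; N means any base.
MOTIF = "TTATNCACA"
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def motif_regex(motif: str) -> "re.Pattern[str]":
    """Turn N into a wildcard; the lookahead lets overlapping hits be reported."""
    return re.compile("(?=(" + motif.replace("N", "[ACGT]") + "))")

def find_motif(seq: str, motif: str = MOTIF) -> list[tuple[int, str]]:
    """Return (0-based start in seq, strand) for each hit on either strand."""
    seq = seq.upper()
    hits = [(m.start(), "+") for m in motif_regex(motif).finditer(seq)]
    # A hit for the reverse complement on this strand is a motif on the other strand.
    hits += [(m.start(), "-") for m in motif_regex(reverse_complement(motif)).finditer(seq)]
    return sorted(hits)

# Made-up sequence with one site on each strand.
example = "GGATTATCCACAGGCTTGTGGATAACC"
print(find_motif(example))  # [(3, '+'), (16, '-')]
```

Searching for the reverse complement handles the fact that a protein-binding site can sit on either strand while we only store one.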

520
00:40:05,000 --> 00:40:09,000
And one of the secrets to control
of cell division is that cells are

521
00:40:09,000 --> 00:40:13,000
able then to control whether the
protein that sees the origin is

522
00:40:13,000 --> 00:40:17,000
there or not.  And it won't start a
new round of replication unless

523
00:40:17,000 --> 00:40:21,000
everything is right.
Then it can make the things that

524
00:40:21,000 --> 00:40:25,000
initiate a new round.
And after that it finishes.

525
00:40:25,000 --> 00:40:30,000
Our eukaryotic cells, with a lot
more DNA, use the same thing.

526
00:40:30,000 --> 00:40:34,000
The same idea,
but there tend to be multiple

527
00:40:34,000 --> 00:40:39,000
origins.  And you get a little
bubble and another little one down

528
00:40:39,000 --> 00:40:44,000
here.  And once you get the
replication forks established then

529
00:40:44,000 --> 00:40:49,000
these kind of merge.
And then eventually we end up with

530
00:40:49,000 --> 00:40:53,000
two complete copies of the DNA.
But I just mention that in passing

531
00:40:53,000 --> 00:40:58,000
because it's an example of how even
though the DNA is nothing but Gs and

532
00:40:58,000 --> 00:41:03,000
Cs and As and Ts,
you can kind of write words in there

533
00:41:03,000 --> 00:41:08,000
that mean different things.
Some of them, in the genetic code,

534
00:41:08,000 --> 00:41:13,000
tell you what the order of amino
acids in a protein is,

535
00:41:13,000 --> 00:41:17,000
but everything else has to be
encoded in the DNA,

536
00:41:17,000 --> 00:41:22,000
too.  And here's a really nice
example of how that works.
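
Why larger eukaryotic chromosomes use many origins can be shown with back-of-the-envelope arithmetic. Each origin opens a bubble with two forks moving apart, so with evenly spaced origins the copying time is roughly the length divided by (2 × number of origins × fork speed). The genome size and fork speeds below are approximate values assumed for illustration (about 4.6 million base pairs and roughly 1000 bases per second for E. coli, and a much slower assumed eukaryotic fork on a hypothetical 100 Mb chromosome); none of them come from the lecture.

```python
# Back-of-the-envelope replication time, assuming evenly spaced origins that
# all fire at once and forks that move at a constant speed (both simplifications).

def replication_time_s(length_bp: float, origins: int, fork_speed_bp_per_s: float) -> float:
    """Each origin starts two forks moving apart, so 2 * origins forks share the work."""
    if origins < 1 or fork_speed_bp_per_s <= 0:
        raise ValueError("need at least one origin and a positive fork speed")
    return length_bp / (2 * origins * fork_speed_bp_per_s)

# Approximate, assumed values (not from the lecture).
ecoli = replication_time_s(4.6e6, origins=1, fork_speed_bp_per_s=1000)
print(f"E. coli, one origin: ~{ecoli / 60:.0f} min")

chromosome_bp = 1e8   # hypothetical 100 Mb eukaryotic chromosome
euk_speed = 50        # assumed eukaryotic fork speed, bp per second
for n in (1, 100, 1000):
    t = replication_time_s(chromosome_bp, n, euk_speed)
    print(f"100 Mb chromosome, {n} origin(s): ~{t / 3600:.1f} h")
```

Under these assumptions a single origin would leave the hypothetical chromosome replicating for over ten days, while a thousand origins bring it down to minutes.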

537
00:41:22,000 --> 00:41:27,000
Now, we're going to switch at this
point from worrying about how DNA is

538
00:41:27,000 --> 00:41:32,000
replicated to how information is
stored and interpreted.

539
00:41:32,000 --> 00:41:39,000
And there's a figure that most of

540
00:41:39,000 --> 00:41:44,000
you have probably seen,
DNA goes to RNA goes to protein.

541
00:41:44,000 --> 00:41:48,000
This is the usual direction of
information flow.

542
00:41:48,000 --> 00:41:53,000
The information for making proteins
is encoded in the DNA,

543
00:41:53,000 --> 00:41:58,000
as we'll talk about in more detail,
and an RNA copy of some piece of

544
00:41:58,000 --> 00:42:03,000
that, one gene's worth usually,
gets made in RNA.
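
The DNA-to-RNA step can be sketched as string manipulation: the RNA copy is complementary to the template strand, so it reads like the other (coding) strand with U in place of T. This ignores everything RNA polymerase actually has to do (finding promoters, synthesizing 5' to 3', stopping at terminators), and the template sequence is invented for illustration.

```python
# Minimal transcription sketch: build the RNA complementary to a DNA template strand.
PAIR_DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (written 5'->3') made from a template written 3'->5'."""
    return "".join(PAIR_DNA_TO_RNA[base] for base in template_3_to_5.upper())

template = "TACAAACCGATT"       # hypothetical template strand, written 3'->5'
rna = transcribe(template)
coding = rna.replace("U", "T")  # the coding strand has the same sequence, with T
print(rna)     # AUGUUUGGCUAA
print(coding)  # ATGTTTGGCTAA
```

Writing the template 3' to 5' keeps the two sequences aligned base by base, so the complement can be read off position by position.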

545
00:42:03,000 --> 00:42:12,000
And then that information in the RNA
is used to direct the sequence of

546
00:42:12,000 --> 00:42:22,000
amino acids that appear in a protein.
And this is a four letter alphabet,

547
00:42:22,000 --> 00:42:31,000
if you want, A, G, T and C.
This is a four letter alphabet,

548
00:42:31,000 --> 00:42:39,000
A, G, U and C, where the uracil and
the thymine have the same base

549
00:42:39,000 --> 00:42:47,000
pairing capacity.
And this is a 20 letter alphabet.
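
The three alphabets set a minimum word length for the genetic code. With 4 letters, words of length n give 4^n combinations: 4^2 = 16 is too few for 20 amino acids, while 4^3 = 64 is enough, so each amino acid needs at least three bases. The sketch checks that, then translates an invented RNA using only a handful of entries from the standard codon table (deliberately partial, not the full code).

```python
import math

def min_word_length(source_letters: int, target_symbols: int) -> int:
    """Smallest n with source_letters ** n >= target_symbols."""
    n = 1
    while source_letters ** n < target_symbols:
        n += 1
    return n

print(min_word_length(4, 20))  # 3, since 4**2 = 16 < 20 <= 64 = 4**3
print(f"{math.log2(20):.2f} bits per amino acid vs 2 bits per base")

# A few entries from the standard genetic code (partial; "*" marks a stop codon).
CODONS = {"AUG": "M", "UUU": "F", "GGC": "G", "AAA": "K", "UGG": "W",
          "UAA": "*", "UAG": "*", "UGA": "*"}

def translate(rna: str) -> str:
    """Read codons from the first base until a stop; codons missing from the table raise KeyError."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODONS[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

print(translate("AUGUUUGGCUAA"))  # MFG
```

Because 64 codons cover only 20 amino acids plus stop signals, the real code is redundant, with several codons often specifying the same amino acid.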

550
00:42:47,000 --> 00:42:54,000
All those 20 amino acids that you

551
00:42:54,000 --> 00:42:58,000
were looking at,
on the chart at the back of the

552
00:42:58,000 --> 00:43:02,000
exam.
So from the point of view of

553
00:43:02,000 --> 00:43:06,000
information storage and information
flow there are some interesting

554
00:43:06,000 --> 00:43:11,000
things that had to come up in order
for the information to flow in that

555
00:43:11,000 --> 00:43:15,000
way.  But before I do that I want to
just get you to think about DNA as

556
00:43:15,000 --> 00:43:20,000
an information storage device.
This is MIT.  I'm almost sure in

557
00:43:20,000 --> 00:43:24,000
this room there are some people that
are experts in high density

558
00:43:24,000 --> 00:43:29,000
information storage.
And even if you're not, most of us

559
00:43:29,000 --> 00:43:35,000
now have a lot of experience with it.
Your computer can store gigabytes of

560
00:43:35,000 --> 00:43:40,000
information.  Your iPod probably has
a 40 gigabyte hard drive in it or

561
00:43:40,000 --> 00:43:46,000
something like that.
So you have some experience with

562
00:43:46,000 --> 00:43:51,000
high density information storage.
So here's the question.  How much

563
00:43:51,000 --> 00:43:57,000
DNA would it take to encode
everybody who's alive on earth today,

564
00:43:57,000 --> 00:44:02,000
6 billion and a bit people?
And let's argue that all we need is

565
00:44:02,000 --> 00:44:08,000
a single cell's worth of DNA because
everybody started out a single

566
00:44:08,000 --> 00:44:14,000
fertilized egg and went on.
Yeah?  OK.  Enough DNA to fill one

567
00:44:14,000 --> 00:44:19,000
human being.  Anybody else got any
sense?  All right.

568
00:44:19,000 --> 00:44:25,000
This is, I think, the most amazing
demo.  I did this when I was

569
00:44:25,000 --> 00:44:30,000
teaching for the first time.
The amount of DNA it would take to

570
00:44:30,000 --> 00:44:34,000
encode everybody who's alive on
earth, one cell of everybody who's

571
00:44:34,000 --> 00:44:39,000
alive on earth today is this little
thing in here,

572
00:44:39,000 --> 00:44:43,000
which you probably cannot see even,
but I took a picture of it.  There

573
00:44:43,000 --> 00:44:48,000
are about six times ten to the minus
twelfth grams of DNA in a human cell.

574
00:44:48,000 --> 00:44:53,000
And if you multiply that out by 6
billion people it comes out to 36

575
00:44:53,000 --> 00:44:57,000
milligrams of DNA.
And I weighed out 40 something

576
00:44:57,000 --> 00:45:02,000
milligrams of DNA.
So there's actually more DNA there

577
00:45:02,000 --> 00:45:06,000
than you need to encode everybody
who's alive on earth today.
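
The demo's arithmetic is a single multiplication, shown here with the lecture's round numbers (about 6 × 10^-12 g of DNA per cell and about 6 billion people). The cross-check estimates the per-cell mass from sequence length instead. The roughly 6.4 × 10^9 base pairs per diploid human cell and the average of about 650 g/mol per base pair are approximate textbook values assumed here, not figures from the lecture.

```python
AVOGADRO = 6.022e23      # molecules per mole

grams_per_cell = 6e-12   # lecture's figure for DNA in one human cell
people = 6e9             # lecture's round population figure
total_mg = grams_per_cell * people * 1e3
print(f"DNA for one cell of everyone: ~{total_mg:.0f} mg")

# Cross-check from sequence length (approximate values assumed, not from the lecture).
bp_per_diploid_cell = 6.4e9
grams_per_mol_bp = 650
estimated = bp_per_diploid_cell * grams_per_mol_bp / AVOGADRO
print(f"estimated DNA per cell: ~{estimated * 1e12:.1f} pg")

# Raw information content if each base pair carries 2 bits (4-letter alphabet).
gigabytes = bp_per_diploid_cell * 2 / 8 / 1e9
print(f"~{gigabytes:.1f} GB of raw sequence per diploid cell")
```

The two mass estimates agree to within about 15 percent (6 pg versus roughly 7 pg), and each cell holds on the order of a gigabyte or two of raw sequence, which is what makes 36 mg for the whole planet so striking.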

578
00:45:06,000 --> 00:45:11,000
And I don't know how this hits you,
but I've been working on DNA my

579
00:45:11,000 --> 00:45:16,000
entire life.  And every time I do
this, you know,

580
00:45:16,000 --> 00:45:20,000
I think I understand this molecule,
but I don't really think I do at

581
00:45:20,000 --> 00:45:25,000
some more fundamental level.
It's absolutely amazing how much

582
00:45:25,000 --> 00:45:30,000
information is stored
in that molecule.

583
00:45:30,000 --> 00:45:34,000
So the one point I will,
actually, I think it's close enough.

584
00:45:34,000 --> 00:45:37,000
Why don't we just call it a day,
and I'll pick this stuff --