Practice English Speaking&Listening with: Lecture - 05 Protein Structure III

Difficulty: 0

Welcome, this is lecture number five on Protein structure. Previously we discussed about the

importance of protein structure. So here you have difference in the folding of the protein

because of a specific genetic mutation. An amino acid mutation is where we have Glutamic

acid goes to Valine. Now we have a hydrophobic residue on the surface because of this mutation.

This is entirely different from the Glutamic acid because of this property. We have these

sequestering together or sticking together to form a fiber which will gives disease.

The folding of the protein is extremely important and there are certain forces associated with

a folding of a protein. Now we will understand more of how we can actually get to solving

protein structures because we already know that we have this amino acid sequence, we

know that we can get a secondary structure out of it from that we can get a tertiary

structure finally to a quaternary structure.

So, if we want to look at the process to get a protein structure, there are only two kinds

of techniques are available to get atomic resolution pictures of macromolecules because

you want to know exactly where the atoms are. These will give the information about the

protein structure. It is very difficult to pin point where the atoms are if you were

to take a snap shot of an extremely large system which is a macro molecule. Now there

are two techniques available to get atomic resolution pictures. One is X-ray Crystallography

and the other is NMR Spectroscopy.

X-ray Crystallography is the only method that is very useful in solving a protein structure.

But NMR Spectroscopy is actually very fast catching up. Now the difficulty of the X-ray

Crystallography is getting a crystal because you should have a crystal of the protein to

do the Crystallography study. But getting a protein crystal is very difficult because

of its large size. It actually either forms a powder or just sticks together and does

not form crystal at all. And that is why it is very difficult to get protein crystals.

So you cannot do Crystallography without having a protein crystal. Due to this reason people

will do a protein structure prediction because it is easy to get the sequence of the protein.

We will be studying a method to find out the amino acid sequence of the protein very easily.

Now we should need to know the structures due to the following reasons. The structure

will help us to understand the function, the mechanism evolution etc. it will help us in

structure based drug design and it is also will help us basically to solve the protein

folding problem because we will have more structures from which we can identify which

amino acid sequence folds into which structure.

Here we have a random coil in which we have an a-helix, we have a ß-strand and then we

have a-helix. Here we have a sequence in one letter code in which we will have to know

whether H means helix, C means coil and E means extended sheet that is the nomenclature

here is H C E. Here H is helix, C is coil, E is extended sheet. Now if we have this sequence

we will have to know actually which part will be a helix, which part will be a coil and

which part will be a sheet that is going to help us in analysis. Here we are considering about the Protein Folding problem.

If you have a hundred residues protein you will know what is that mean it means you have

hundred amino acids in your polypeptide chain. Now, if we consider that each residue can

take only three positions then you know that you can have rotations about the bonds in

the amino acids that connect amino acids together.

So, actually it can take on many more positions but if I consider that this hundred residue

polypeptide chain can actually take only three positions then there are three to the hundred

possible conformations for this polypeptide chain which is about ten to the forty seven

possible conformations. Now the protein folds in one single structure and that is the only

structure folds into. So in that ten to forty seven possible conformations that are available

for a certain protein that is only hundred amino acid residues long if the protein decides

to fold into a specific protein and it took less than a 1012 second to determine whether

it was fold into anyone of these possible conformations. Then it would take 1027 years

for a single protein to fold which it does in a matter of mille seconds. So it knows

exactly how it supports to fold and where is this information? All the information will

be avoilable in this sequence. And this is the big question that is still unanswered.

We do not know how a particular sequence of amino acids residues that is the primary structure

will go to which tertiary structure. Because you understand based on the conformational

flexibility there are a very large number of conformations available to it but it will

fold to it into a single structure. It is like that the example I give you of the necklace.

You have a necklace of beads you pick it up, drop it on the table. It is never going to

fall in the same conformation twice you have even 2D it will not do that and forget about

the 3D.

Therefore the whole problem of protein folding is what is a tertiary structure will be for

a given particular sequence of amino acid? But what we can do is we can go for just small

predictions. We can say from the structures that are already available how this particular

sequence might form a helix or we can find out which part is going to be in the central

region of the protein by determining which are hydrophobic in nature. So if I find a

stretch of amino acids that are going to be hydrophobic in nature I can say that they

might be forming the central part of the folded protein. That is the some information I can

get which I will be a bit better of then just the primary sequence of the protein.

This is what is called a hydropathy plot. The hydropathy plot is a graphical display

of the local hydrophobicity of the amino acids side chains in a protein. Why do I want to

do that? I know remember I showed you a table which gives you the hydrophobicity values

of different amino acid residues. If you have a positive value then you have hydrophobic

residues, if you have a negative value then you have a water expose region or a hydrophilic

region. These hydropathy plots are actually most useful in predicting trans membrane segments

but we will have to know how we can find a hydrophobic region. And what I may do with

a hydrophobic region? I will predict that this hydrophobic region forms the center of

the protein because I know the protein has a hydrophobic core with a hydrophilic surface

to it. Then we learnt in the previous class that how we can explain whether a helix that

is on the surface, we can tell which part is going to be inside which part is going

to be outside. So now I will be slightly better of in saying about the whole protein sequence

as to determining which part will be in the middle of the protein and which part is likely

to be on the surface of the protein. So here we have a hydrophobicity scale which I showed

earlier. And we have hydrophobic residues for the positive values, hydrophilic residues

for the negative values.

Now what can I do with this? I can go through a hydropathy plot. it is called a Sliding

Window Approach. I will go through it very slowly and we will have to plot a hydropathy

plot to determine whether a part of the protein is going to be on the surface or whether the

part of the protein is going to be in the center of the protein. So what we have to

do is we have to calculate the property for a sub sequence. What do mean by a sub sequence?

Say I have this as the amino acid sequence.

We have I where I is Isoleucine then we have Leucine and another Isoleucine, Lysine, Glutamic

acid, Isoleucine, Arginine. Now what I need to know is from the table how I can determine

which part is inside which part is outside.

I have a specific sequence that I have here. Now I take the values for these amino acid

residues and take the average of them. So what we will do is let us just put another

amino acid for say Gly and Ala. So the first thing that I do here is I add the value for

isoleucine. Now the value for Isoleucine is 4.5, the value for Leucine is 3.8, again 4.5,

for Lysine is -3.9 why is it minus? Because it is hydrophilic in nature, for Glutamic

acid is -3.5, Isoleucine is 4.5, Arginine is -4.5, Glycine is -0.4 and Alanine is 1.8.

Now this is called Sliding Window Approach. So this is residue number 1 then 2,3,4,5,6,7,8

and 9. I take the first seven residues that are called the window. I take a window of

seven, I take the average of these so that I have to add all them and divide by 7. We

can work that out and we find out that we get a value. We add all these up together

from one through seven and then we get a value. We have to add 4.5 + 3.8 + 4.53.9 -3.5

+4.54.5 the total comes to 5.40. So here we have the total as 5.4 we want the average of this. So we divide

by 7 then we get 0.77 this is assigned to the central residue. So in this case the residue

number 4 will have an average hydropathy index value of 0.77.

Then we move to next window, a sliding window approach. So what do I have to do now? I have

to go from two to eight. When I go from two to eight I have to add all these numbers from

Leucine, Isoleucine, Lysine, and Glutamic acid, Isoleucine, Arginine and Glycine together

and then I have to divide by 7 again. Here 5.40 / 7 gave me 0.77 that is assigned to

the central residue. So this is assigned to residue number #4. Then I take the other set

then here I am going to loose 4.5 from this and add -0.4 basically. So what I am going

to loose from 5.40, what is the value going to be for number #5? I will have a specific

value here. So if I add all these values together from two that is 3.8 + 4.53.9 -3.5 +4.5

4.5 -0.4.

I will get 0.5 then what do I have to do is have to divide by 7 that is going to give

the value 0.07 so this is asign to residue number #5. Then I have to slide my window

once more. Actually you will have to do this for the whole protein but we are not going

to do it now. So I have to go from residue number 3 to 9. Then I get another value that

I have assigned to residue number #6 and so on. Eventually what I may go to get? I am

going to get values from leaving out the first three residues and the last three residues

I will get values for the average hydrophobicity for the set of amino acids that formed this

particular window. Then what you can do is make a plot. Basically you have understood

that you can change the window size. We can make it in a nine residue window or an eleven

residue window but we make it an odd residue window so that we can assign it to the central

amino acid.

Usually if you have small windows then you have noisy at plots. This is usually nine

or eleven is used, here we have used seven but that is fine. Now this is when we have

membrane in this ease. We have a lipid bilayer, we have the Cytoplasmic face and we have the

inside basically and the outside.

Now, if we look at the types of residues we understand from the helical wheel which will

be hydrophilic in nature and which will be hydrophobic in nature. If we have a membrane

that is around 30 Aº we know that the rise per amino acid residue is 1.5 Aº. What is

that? That is the vertical rise per amino acid residue. The pitch that we saw was for

a complete turn was 5.4 Aº and for a single amino acid we have 1.5 Aº. Now if know that

my membrane is 30 Aº thick then how many hydrophobic residues should I have there?

Twenty, because each 1.5 Aº is for every amino acid residue, I have to span 30 Aºs.

So if I have a stretch of amino acid residues that actually form a helix here that to spam

the whole membrane I know if it were a single helix all of them would be hydrophobic in

nature because I have my lipid hydrophobic tales that have to interact with the helix.

So what can I do? I can say that when I am spamming the membrane with a helix then the

nature of the residue in this helix is hydrophobic in nature. So all the ones that are sticking

out here, all the side chains that are out here are going to be hydrophobic in nature.

Now what I need in such a specific protein sequence is a stretch of twenty amino acids

because I have a 30 Aº stretch and I know that for every amino acid I traverse 1.5 Aº

angstroms in height where 30 Aº is the thickness of the membrane. So the rise is 1.5 Aº per

amino acid. So I need twenty hydrophobic amino acids to construct a hydropathy plot.

Now for the hydropathy plot here on the Y-axis we will have an index and on the X-axis we

will have the sequence of the protein. So we have the amino acid sequence on the X-axis

and we have index on the Y-axis. What is this index? It is the average index that we found

out earlier. So I have residue 1 here and then 2, 3, 4, 5, 6, 7, 8 and so on. Then I

had a sliding window where the window size was seven. So now I assigned the first value to residue number 4 which was 0.77 in

this case. This is positive this is negative. So somewhere say if this is 1 this is 2 and

this is 3 so 0.77 is some where here. I just make a plot. Then when the windows slide over

instead of from residue one through seven which I assigned to residue number 4 and I

went from two to eight which I assigned to residue number 5 that came out be 0.07 so

that was very low down here. So I can complete a whole plot for protein. What do you need

to construct a hydropathy plot? You need the sequence of the protein and you need the hydrophobicity

values construct a hydropathy plot. Therefore we have the amino acid sequence and we also

have the hydrophobicity in this ease. Then I have to find the average depending upon

my window size. Then here I have a possible plot like this. This region is positive, this

region is negative.

Now what can I say about the positive regions? They are hydrophobic in nature. Now you understand

that when you take the average, a hydrophilic residue counteracts the effect of a hydrophilic

residue. But if you have only a stretch of hydrophobic amino acids this value would be

a high positive value. If you had a high positive value then you can have a stretch of highly

hydrophobic amino acid residues. So, when we look at this I can say this is a highly

hydrophobic stretch. When I am talking about a normal protein that is not a membrane protein

I can safely say that this part is going to be the central part of the protein or the

central core of the protein because it is hydrophobic in nature, it will not be on the


But usually when we do these hydropathy plots they are mainly done for membranes because

it tells you that this region is probably spamming the membrane. Why because if I have

said the residue is from approximately number 20 to number 45 here. So what is my stretch

of amino acids? I have approximately 25 amino acids which are hydrophobic in nature and

I know this is a membrane protein and I can that this part forms the helix. I can very

safely say that it is this part that is forming the helix of the membrane protein because

this part is hydrophobic in nature. And I know if I have a single Transmembrane helix

all of the residues have to interact with the lipid bilayer which is hydrophobic in


So I can plot a hydropathy plot that tells which region will be hydrophobic in nature

and which region will be hydrophilic in nature. So I can say that these regions are going

to be on the surface and I can say that these regions are going to be varied in the core

of the protein. And I can say that a Transmembrane helix is going to be on the membrane side.

Usually the hydropathy analysis is used to locate Transmembrane segments. But you can

also do it for a regular protein because the reason being that not many structures of Transmembrane

helix proteins are known. And the main signal is a stretch of hydrophobic and helix loving

amino acids. What do you mean by helix loving amino acids? Residues those are likely to

form a-helix. So that is what a hydropathy plot would look like. This is a hydropathy

plot for a rhodopsin. So I can say all the positive parts if see this is the residue

number 50 to 100, 150, 200, 250, 300 and so on. So these are stretches that are larger

than twenty amino acid residues. Based on the scale, if it goes form 0 to 350 then these

are larger than 20.

I can say I have 1, 2, 3, 4, 5, 6, 7 probable helices. What are these helices? They are

interacting with lipid bilayer of the membrane because rhodopsin is a membrane protein. So

this is a typical hydropathy plot. And what is the information you can get from this?

Now you understand that if you have a stretch of a hydrophobic amino acid then this is the

region that will be the helix part of the Transmembrane protein or rather this will

be Transmembrane segment. This will be the helix that is going to interact with the lipid

bilayer of the membrane. So again this is a very simple plot.

All the information you need is the sequence and the table. You can also construct a helical

wheel. What is the information you need to construct the helical wheel? Just the sequence

because for every amino acid you will get rotation is 100º. So you need the amino acid

sequence for the construction of the helical wheel. You need additionally the hydrophobicity

values of the amino acid residues for the hydropathy plot.

These are two other proteins where this is BACTERIORHODOPSIN and this is GLYCOPHORIN.

Now you know which stretch is hydrophobic in nature. We know which stretch is hydrophobic

in nature, which are mostly hydrophilic in nature. If this was for a normal protein then

you could say the region number 1, 2, 3, 4, 5, 6, 7 would form the inner core of the protein

the central part of the protein. So you could safely say these probably were on the outside.

So you would be better of just having the sequence of the protein and no idea is about

how the protein is folding.

We know

the primary sequence for over two hundred thousand proteins and we know the crystal

structures for twenty five thousand proteins which is miserable. If you know only the structure

of the protein, can you say the function or can you a design drug that is going to act

on it? You con not say and it does not lead you anywhere with knowing the sequence for

only two hundred thousand proteins and the crystal structures for only five thousand

proteins. So we have to know what the structures of the all these proteins will be and we have

to go for these prediction methods.

What does this give you an idea of? This gives you an idea that all we learned a helical

wheel from just the sequence. If I know the sequence which will form a helix then I can

say whether which part of the helix is going to be inside which is going to be outside.

Now, if we are looking at the sequence of this and I know from the hydrophobicity in

disease where I have a hydrophobic region I can safely say that this hydrophobic region

will form part of the protein core of the protein. But in this case when we are talking

about Transmembrane segments these are the regions that traverse the mapping.

Now we want to go for secondary structure prediction. I want to know where a helix will

form, so I am bit bolder now I had the sequence and from the sequence I could construct a

helical wheel. But what is the idea of constructing a helical wheel if you dont know where

the helix is going to be? You can not keep on doing with for the whole protein.

You can do the hydropathy index plot for the whole protein and then figure out which regions

are inside or outside. But you have the protein sequence always available to do a secondary

structure prediction. You do not have the structure always available.

If we want to construct a three state model we have the helix, the strand and the coil.

So we have basically a, a ß and a turn. These are just some numbers so we need another table

if we want to go for a secondary structure prediction. This is a very famous way or rather

very easy to predict whether you have a helix or not. These are called the Chou -Fasman

Parameters. It tells you the chance or rather the propensity that you are going to have

an Alanine in a-helix. Here this value is called the propensity. The larger the number

will have the larger probability that a helix is going to be in that specific secondary


Therefore what this table tells you is that you see all the twenty amino acids here, here

the numbers tells you whether these amino acids will form as that is alpha helices

or bs that is beta strands or coils that is turns because you have your protein sequence.

This is L that is another notation that is also used apart from C. So you want to know

where the coils are or the turns of the loops are, you want to know where the helices are

and you to know where the sheets are because that will give you a better idea of how a

protein is going to fold. Because you will have some information from hydrophobicity,

you will have some information about the secondary structure. So that will lead you into a better

idea of how a protein is actually going to form its tertiary structure from its amino

acid sequence. So we have the sequence of the protein, from the sequence of the protein

we just have to look at the numbers.

If six out of six contiguous numbers or rather six contiguous amino acids in which if four

of them have p(a) > 100 then a helix will form. If I had a helix, we know whether helix

begins here so MQGVVT. Here M >100 so we have one amino acid greater than hundred. The Q

where it is Glutamine which is also greater than hundred so I have two out of two greater

than hundred. G is Glycine which is 57 which is less than hundred. So I have two out of

three. Then V which is 163 so three out of four, again V is greater than hundred then

four out of five, T which is Threonine is 83. So after go bit further I may write here

MQGVVT. So we should have four that are greater than hundred. I have MQVV out of the six MQGVVT

which are greater than hundred. So here I have a helix HHHHHH.

What about the next one? Then we extend the region until four amino acids with P(a) is

less than hundred are found. So all you have to do is you just have a slide, again you

are sliding window where you are looking at a window size of six. This six telling you

that if you have four out of the six which are P(a) > 100 then you have an a-helix. If

the P(a) > 100 then you do not have an a-helix any more.

How do you look for a ß-sheet? If the P(b) > 100 for four out of six than you have a

ß-sheet. The problem arises when the a and the ß regions overlapped. Then you have to

do some mathematics. you have to some P(a) value of the six residues, some P(b) value

of the six residues which ever is higher it is going to be that.

So what information do you have? You have a lot of information from the amino acid sequence.

Now actually you can say whether you have a helix or not. So now it will make sense

from an understanding of whether you have a helix or not. Then you can construct a helical

wheel and you can say which part is going to be outside and inside. So we gradually

getting into know more and more of the structure. We have a helix which is 4 out of 6 residues

with high helix propensity. Now I am talking about propensity, it is sought of a probability

but you see the numbers are greater than hundred. In some tables you might see like we have

the P(a)Ala = 142 or some times put it as 1??42. The way Chou and Fasman got all these

numbers was by a statistical analysis on the structures that are available. What do I mean

by a structures are available? The crystal structures are solved for proteins are available

in protein data bank. Now it is freely available where you can download protein structures.

The protein structures are you have the x, y and z coordinates for all the atoms except

the hydrogen atoms because X-ray Crystallography cannot look at hydrogen atoms.

So I am looking at residue number #1 then I will have nitrogen for residue number #1.

Residue number #1 will also have a Ca atom associated with it, residue number #1 will

also have carbon atom associated to it where it is part of the carboxylic group and residue

number #1 will also have oxygen associated with it. What do I need to draw it? I need

these values. Only if I have these values I can draw it in three dimensional spaces.

So the protein data bank gives you these values for the twenty five thousand structures that

are available in it.

So when I go to residue number two it starts again with nitrogen because I am going from

the amino terminus to the carboxylic terminus. Then if you had a side chain you would have

apart from a Ca, you would also have a Cß. so the C beta would be return after this.

So this would be the back bone. Then you would have a Cß and so on but what we need to know

is there are a set of structures available for which you can do an analysis. Here for

say you look at all the helices that are there in the proteins and you have to count the

number of Alanines that are there in the alpha helices. You have to count all the residues

that are there in alpha a-helix only.

Then you have to count the number of Alanine in the database including the ones in the alpha. So you have

to calculate all the Alanines that are present. You have to calculate the number of residues

that are there in the database which ever database you are using. Propensity is a ratio.

It is a ratio of the number of Ala in a divided by the number of residues in a to the number

of Ala in database divided by the number of residues in the database which we can write

it as Propensity = [#Alaa / #Resa] / [#Aladb / #Resdb]. So this is your propensity calculation.

This number is greater than one because you have to remember that you looking at a large

sequence which is a polypeptide sequence of a large set of proteins.

You want to whether alpha is preferred in helices. If I look at this, this will give

me some idea about whether alpha is preferred in helices or not because this gives me the

number of Alanine in the whole database. So if I have say 8% of Alanine in the whole database

then I can calculate these as a percentage. For say I have thousand residues in the database

and hundred of them are Alanine, From that thousand residues in the database two hundred

are there in the alpha helices in which twenty are Alanine. So what would the value be, (20/200)

/ (100/1000) so this is equal to 1. This is nothing great that I have in the helices.

Just I have ten percent as I have in the rest of the protein. I do not have any information

about it. But if I had fifty Alanines then the value would be greater than one. Then

I can see more of Alanine in helices which makes it significant than in the normal case

of a protein. That is how they came up with these numbers in the slide here. So 1??42

means that this number is greater than one which means the Alanine would like to be in


But let us think about a Proline, all of you know how a Proline looks like. It will break

the helix because you cannot have a turn properly and it is an amino acid that bends on to itself.

So the propensity of it to forms a in an a-helix should be very low. Here the value of Glycine

is 57 which is also a very low value why because it does not like to be an a-helix or it is

not seen rather in a-helix for the analysis that has been done for the set of proteins

which is true for mostly all the cells. Now if I look at a turn where these have mostly

Glycine and Proline, the Glycine is because of its flexibility and Proline is because

it basically helps in the turn back of chained at times. So look at these numbers 152 and

156 which are pretty high. And Asparagine is also high which is also 156.

So that is how these propensity values were actually determined. The propensity values

have since been determined again for a very larger set of amino acids but this table is

still used today for a rough prediction of where a-helix or ß-sheet will be. We just

need to have the table to figure out where are helix is going to be and where are sheet

is going to be and the rest of it will going be coil. Here you have a turn set also, you

have a p turn also.

So this is our table and this is our sequence. So I have the T S P T A E here I have put

in values S is 77, T is suppose to be 83 where is 69. So we have Threionine, Serine, Proline

and so and so forth. Now when I am at this point how many do I have that are greater

than hundred just two. So I cannot say that I have a helix formation. I slide my window

down to serine. I have an additional one greater than one hundred but it is still three out

of six which is not good enough. Then I slide it again so now I have four out of six. So

what can I say now? The helix begins.

Here the helix begins basically just after the Proline but you need to know is you have

a sequence. And you can say from the sequence and from the table that where the helix is

going to be, where the sheet is going to be and where the turn is going to be. Then based

on that what we can do is from this information we can roughly determine the sequence of the

protein. So now I have the sequence 1 2 3 4…. and so on and what can I say is I have

a helix here, I have a turn here and then I have a sheet here so we can say what we

have. Then what do I do? I can do a hydropathy plot. What is hydropathetic plot going to

tell me? It will tell me that which parts of these are hydrophobic in nature. So I can

say this part is hydrophobic and again this part is hydrophobic.

So I can say that this part is going to be inside, this part is going to be outside and

then again this part is going to be inside. So I have some information of how the protein

is going to be fold. Now I can construct a helical wheel for this, I can also construct

a helical wheel for this.

This face would be hydrophobic because this turn can rotate basically, I can have a rotation

about this which would make either face in or out. Then what would I have to do? I have

to construct a helical wheel. If I know that this face is hydrophobic I have to turn it

around to make it to come to the core of the protein. So I have a hydrophobic region and

a hydrophilic region which is therefore on the outside.

Now I am better of in determining how the protein is going to interact, how it is going

to fold into giving the final tertiary structure. So what we learnt is that we can determined

where we might have the helix from the Chou -Fasman parameters, we can determine where

we can have the hydrophobic regions or where we can have Transmembrane segments form a

hydropathy plot and we can determined whether this part is outside or inside from the helical

wheel. So we are better of just on the primary amino acid sequence of the protein. Thank


The Description of Lecture - 05 Protein Structure III