Exploring Modular Protein Architecture
Exercises
April 2nd 2008
Exercise 1
One of the following two sequences is an enzyme, the other is a
scaffold protein (used by the cell to bind a large number of different
proteins) containing many linear intereaction motifs.
Use the protein order/disorder predictors linked to below to decide
which of the two sequences is which.
>P24588
METTISEIHVENKDEKRSAEGSPGAERQKEKASMLCFKRRKKAAKALKPKAGSEAADVAR
KCPQEAGASDQPEPTRGAWASLKRLVTRRKRSESSKQQKPLEGEMQPAINAEDADLSKKK
AKSRLKIPCIKFPRGPKRSNHSKIIEDSDCSIKVQEEAEILDIQTQTPLNDQATKAKSTQ
DLSEGISQKDGDEVCESNVSNSITSGEKVISVELGLDNGHSAIQTGTLILEEIETIKEKQ
DVQPQQASPLETSETDHQQPVLSDVPPLPAIPDQQIVEEASNSTLESAPNGKDYESTEIV
AEETKPKDTELSQESDFKENGITEEKSKSEESKRMEPIAIIITDTEISEFDVTKSKNVPK
QFLISAENEQVGVFANDNGFEDRTSEQYETLLIETASSLVKNAIQLSIEQLVNEMASDDN
KINNLLQ
>P61626
MKALIVLGLVLLSVTVQGKVFERCELARTLKRLGMDGYRGISLANWMCLAKWESGYNTRA
TNYNAGDRSTDYGIFQINSRYWCNDGKTPGAVNACHLSCSALLQDNIADAVACAKRVVRD
PQGIRAWVAWRNRCQNRDVRQYVQGCGV
If you'd like to read more about the two proteins, and visit the ExPASY site to search for the
swissprot entries of them (using the identifiers for each which are
given in the description lines above)
Exercise 2
The following sequence is predominantly IUP, with a globular region at
one end. Use the disorder prediction servers to determine the
approximate size of the globular region, and where the transition
between IUP and globular structure occurs (at the C terminus/furtherst
right region of the protein or at the N terminus/furthest left region)
>P02462
MGPRLSVWLLLLPAALLLHEEHSRAAAKGGCAGSGCGKCDCHGVKGQKGERGLPGLQGVI
GFPGMQGPEGPQGPPGQKGDTGEPGLPGTKGTRGPPGASGYPGNPGLPGIPGQDGPPGPP
GIPGCNGTKGERGPLGPPGLPGFAGNPGPPGLPGMKGDPGEILGHVPGMLLKGERGFPGI
PGTPGPPGLPGLQGPVGPPGFTGPPGPPGPPGPPGEKGQMGLSFQGPKGDKGDQGVSGPP
GVPGQAQVQEKGDFATKGEKGQKGEPGFQGMPGVGEKGEPGKPGPRGKPGKDGDKGEKGS
PGFPGEPGYPGLIGRQGPQGEKGEAGPPGPPGIVIGTGPLGEKGERGYPGTPGPRGEPGP
KGFPGLPGQPGPPGLPVPGQAGAPGFPGERGEKGDRGFPGTSLPGPSGRDGLPGPPGSPG
PPGQPGYTNGIVECQPGPPGDQGPPGIPGQPGFIGEIGEKGQKGESCLICDIDGYRGPPG
PQGPPGEIGFPGQPGAKGDRGLPGRDGVAGVPGPQGTPGLIGQPGAKGEPGEFYFDLRLK
GDKGDPGFPGQPGMTGRAGSPGRDGHPGLPGPKGSPGSVGLKGERGPPGGVGFPGSRGDT
GPPGPPGYGPAGPIGDKGQAGFPGGPGSPGLPGPKGEPGKIVPLPGPPGAEGLPGSPGFP
GPQGDRGFPGTPGRPGLPGEKGAVGQPGIGFPGPPGPKGVDGLPGDMGPPGTPGRPGFNG
LPGNPGVQGQKGEPGVGLPGLKGLPGLPGIPGTPGEKGSIGVPGVPGEHGAIGPPGLQGI
RGEPGPPGLPGSVGSPGVPGIGPPGARGPPGGQGPPGLSGPPGIKGEKGFPGFPGLDMPG
PKGDKGAQGLPGITGQSGLPGLPGQQGAPGIPGFPGSKGEMGVMGTPGQPGSPGPVGAPG
LPGEKGDHGFPGSSGPRGDPGLKGDKGDVGLPGKPGSMDKVDMGSMKGQKGDQGEKGQIG
PIGEKGSRGDPGTPGVPGKDGQAGQPGQPGPKGDPGISGTPGAPGLPGPKGSVGGMGLPG
TPGEKGVPGIPGPQGSPGLPGDKGAKGEKGQAGPPGIGIPGLRGEKGDQGIAGFPGSPGE
KGEKGSIGIPGMPGSPGLKGSPGSVGYPGSPGLPGEKGDKGLPGLDGIPGVKGEAGLPGT
PGPTGPAGQKGEPGSDGIPGSAGEKGEPGLPGRGFPGFPGAKGDKGSKGEVGFPGLAGSP
GIPGSKGEQGFMGPPGPQGQPGLPGSPGHATEGPKGDRGPQGQPGLPGLPGPMGPPGLPG
IDGVKGDKGNPGWPGAPGVPGPKGDPGFQGMPGIGGSPGITGSKGDMGPPGVPGFQGPKG
LPGLQGIKGDQGDQGVPGAKGLPGPPGPPGPYDIIKGEPGLPGPEGPPGLKGLQGLPGPK
GQQGVTGLVGIPGPPGIPGFDGAPGQKGEMGPAGPTGPRGFPGPPGPDGLPGSMGPPGTP
SVDHGFLVTRHSQTIDDPQCPSGTKILYHGYSLLYVQGNERAHGQDLGTAGSCLRKFSTM
PFLFCNINNVCNFASRNDYSYWLSTPEPMPMSMAPITGENIRPFISRCAVCEAPAMVMAV
HSQTIQIPPCPSGWSSLWIGYSFVMHTSAGAEGSGQALASPGSCLEEFRSAPFIECHGRG
TCNYYANAYSFWLATIERSEMFKKPTPSTLKAGELRTHVSRCQVCMRRT
Exercise 3
If you have time, try predicting order/disorder in some of your own
sequences of interest.
Otherwise look at the following sequences to see if you can find a
consensus on the distribution of regions of order and disorder within
them. Compare the results of disorder predictions with those from
PFAM/SMART/CDD. Look also in ELM to see if there are any known
instances of linear motifs within these proteins. If so, do they fall
within regions you expect to be ordered or disordered?
>P00519|ABL1_HUMAN Proto-oncogene tyrosine-protein kinase ABL1 -
Homo sapiens (Human).
MLEICLKLVGCKSKKGLSSSSSCYLEEALQRPVASDFEPQGLSEAARWNSKENLLAGPSE
NDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVN
SLEKHSWYHGPVSRNAAEYLLSSGINGSFLVRESESSPGQRSISLRYEGRVYHYRINTAS
DGKLYVSSESRFNTLAELVHHHSTVADGLITTLHYPAPKRNKPTVYGVSPNYDKWEMERT
DITMKHKLGGGQYGEVYEGVWKKYSLTVAVKTLKEDTMEVEEFLKEAAVMKEIKHPNLVQ
LLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEVNAVVLLYMATQISSAMEYLEKKNFI
HRDLAARNCLVGENHLVKVADFGLSRLMTGDTYTAHAGAKFPIKWTAPESLAYNKFSIKS
DVWAFGVLLWEIATYGMSPYPGIDLSQVYELLEKDYRMERPEGCPEKVYELMRACWQWNP
SDRPSFAEIHQAFETMFQESSISDEVEKELGKQGVRGAVSTLLQAPELPTKTRTSRRAAE
HRDTTDVPEMPHSKGQGESDPLDHEPAVSPLLPRKERGPPEGGLNEDERLLPKDKKTNLF
SALIKKKKKTAPTPPKRSSSFREMDGQPERRGAGEEEGRDISNGALAFTPLDTADPAKSP
KPSNGAGVPNGALRESGGSGFRSPHLWKKSSTLTSSRLATGEEEGGGSSSKRFLRSCSAS
CVPHGAKDTEWRSVTLPRDLQSTGRQFDSSTFGGHKSEKPALPRKRAGENRSDQVTRGTV
TPPPRLVKKNEEAADEVFKDIMESSPGSSPPNLTPKPLRRQVTVAPASGLPHKEEAGKGS
ALGTPAAAEPVTPTSKAGSGAPGGTSKGPAEESRVRRHKHSSESPGRDKGKLSRLKPAPP
PPPAASAGKAGGKPSQSPSQEAAGEAVLGAKTKATSLVDAVNSDAAKPSQPGEGLKKPVL
PATPKPQSAKPSGTPISPAPVPSTLPSASSALAGDQPSSTAFIPLISTRVSLRKTRQPPE
RIASGAITKGVVLDSTEALCLAISRNSEQMASHSAVLEAGKNLYTFCVSYVDSIQQMRNK
FAFREAINKLENNLRELQICPATAGSGPAATQDFSKLLSSVKEISDIVQR
>P35568|IRS1_HUMAN Insulin receptor substrate 1 - Homo sapiens
(Human).
MASPPESDGFSDVRKVGYLRKPKSMHKRFFVLRAASEAGGPARLEYYENEKKWRHKSSAP
KRSIPLESCFNINKRADSKNKHLVALYTRDEHFAIAADSEAEQDSWYQALLQLHNRAKGH
HDGAAALGAGGGGGSCSGSSGLGEAGEDLSYGDVPPGPAFKEVWQVILKPKGLGQTKNLI
GIYRLCLTSKTISFVKLNSEAAAVVLQLMNIRRCGHSENFFFIEVGRSAVTGPGEFWMQV
DDSVVAQNMHETILEAMRAMSDEFRPRSKSQSSSNCSNPISVPLRRHHLNNPPPSQVGLT
RRSRTESITATSPASMVGGKPGSFRVRASSDGEGTMSRPASVDGSPVSPSTNRTHAHRHR
GSARLHPPLNHSRSIPMPASRCSPSATSPVSLSSSSTSGHGSTSDCLFPRRSSASVSGSP
SDGGFISSDEYGSSPCDFRSSFRSVTPDSLGHTPPARGEEELSNYICMGGKGPSTLTAPN
GHYILSRGGNGHRCTPGTGLGTSPALAGDEAASAADLDNRFRKRTHSAGTSPTITHQKTP
SQSSVASIEEYTEMMPAYPPGGGSGGRLPGHRHSAFVPTRSYPEEGLEMHPLERRGGHHR
PDSSTLHTDDGYMPMSPGVAPVPSGRKGSGDYMPMSPKSVSAPQQIINPIRRHPQRVDPN
GYMMMSPSGGCSPDIGGGPSSSSSSSNAVPSGTSYGKLWTNGVGGHHSHVLPHPKPPVES
SGGKLLPCTGDYMNMSPVGDSNTSSPSDCYYGPEDPQHKPVLSYYSLPRSFKHTQRPGEP
EEGARHQHLRLSTSSGRLLYAATADDSSSSTSSDSLGGGYCGARLEPSLPHPHHQVLQPH
LPRKVDTAAQTNSRLARPTRLSLGDPKASTLPRAREQQQQQQPLLHPPEPKSPGEYVNIE
FGSDQSGYLSGPVAFHSSPSVRCPSQLQPAPREEETGTEEYMKMDLGPGRRAAWQESTGV
EMGRLGPAPPGAASICRPTRAVPSSRGDYMTMQMSCPRQSYVDTSPAAPVSYADMRTGIA
AEEVSLPRATMAAASSSSAASASPTGPQGAAELAAHSSLLGGPQGPGGMSAFTRVNLSPN
RNQSAKVIRADPQGCRRRHSSETFSSTPSATRVGNTVPFGAGAAVGGGGGSSSSSEDVKR
HSSASFENVWLRPGELGGAPKEPAKLCGAAGGLENGLNYIDLDLVKDFKQCPQECTPEPQ
PPPPPPPHQPLGSGESSSTRRSSEDLSAYASISFQKQPEDRQ
Some Protein Order/Disorder Predictors
Remember - usually safest to input just "raw" protein sequence - i.e.
"GHYYITS..."
Rather than inputing as FASTA format i.e. with description line
">P1248923 protein description
GHYYITS..."
Run the servers using default options unless otherwise indicated
Some Protein Family/Structural Domain Predictors
Linear Motif Investigation
Find Sequences
- ExPASY
- ENTREZ
- GOOGLE - you'd be surprised
how often it's easiest just to google to find the information e.g.
"swissprot src human" as search terms...
Back
to Gibson Team course pages at EMBL.