Exploring Modular Protein Architecture

Aidan Budd

Exercises

April 2nd 2008


Exercise 1

One of the following two sequences is an enzyme, the other is a scaffold protein (used by the cell to bind a large number of different proteins) containing many linear intereaction motifs.

Use the protein order/disorder predictors linked to below to decide which of the two sequences is which.

>P24588
METTISEIHVENKDEKRSAEGSPGAERQKEKASMLCFKRRKKAAKALKPKAGSEAADVAR
KCPQEAGASDQPEPTRGAWASLKRLVTRRKRSESSKQQKPLEGEMQPAINAEDADLSKKK
AKSRLKIPCIKFPRGPKRSNHSKIIEDSDCSIKVQEEAEILDIQTQTPLNDQATKAKSTQ
DLSEGISQKDGDEVCESNVSNSITSGEKVISVELGLDNGHSAIQTGTLILEEIETIKEKQ
DVQPQQASPLETSETDHQQPVLSDVPPLPAIPDQQIVEEASNSTLESAPNGKDYESTEIV
AEETKPKDTELSQESDFKENGITEEKSKSEESKRMEPIAIIITDTEISEFDVTKSKNVPK
QFLISAENEQVGVFANDNGFEDRTSEQYETLLIETASSLVKNAIQLSIEQLVNEMASDDN
KINNLLQ

>P61626
MKALIVLGLVLLSVTVQGKVFERCELARTLKRLGMDGYRGISLANWMCLAKWESGYNTRA
TNYNAGDRSTDYGIFQINSRYWCNDGKTPGAVNACHLSCSALLQDNIADAVACAKRVVRD
PQGIRAWVAWRNRCQNRDVRQYVQGCGV

If you'd like to read more about the two proteins, and visit the ExPASY site to search for the swissprot entries of them (using the identifiers for each which are given in the description lines above)

Exercise 2

The following sequence is predominantly IUP, with a globular region at one end. Use the disorder prediction servers to determine the approximate size of the globular region, and where the transition between IUP and globular structure occurs (at the C terminus/furtherst right region of the protein or at the N terminus/furthest left region)

>P02462
MGPRLSVWLLLLPAALLLHEEHSRAAAKGGCAGSGCGKCDCHGVKGQKGERGLPGLQGVI
GFPGMQGPEGPQGPPGQKGDTGEPGLPGTKGTRGPPGASGYPGNPGLPGIPGQDGPPGPP
GIPGCNGTKGERGPLGPPGLPGFAGNPGPPGLPGMKGDPGEILGHVPGMLLKGERGFPGI
PGTPGPPGLPGLQGPVGPPGFTGPPGPPGPPGPPGEKGQMGLSFQGPKGDKGDQGVSGPP
GVPGQAQVQEKGDFATKGEKGQKGEPGFQGMPGVGEKGEPGKPGPRGKPGKDGDKGEKGS
PGFPGEPGYPGLIGRQGPQGEKGEAGPPGPPGIVIGTGPLGEKGERGYPGTPGPRGEPGP
KGFPGLPGQPGPPGLPVPGQAGAPGFPGERGEKGDRGFPGTSLPGPSGRDGLPGPPGSPG
PPGQPGYTNGIVECQPGPPGDQGPPGIPGQPGFIGEIGEKGQKGESCLICDIDGYRGPPG
PQGPPGEIGFPGQPGAKGDRGLPGRDGVAGVPGPQGTPGLIGQPGAKGEPGEFYFDLRLK
GDKGDPGFPGQPGMTGRAGSPGRDGHPGLPGPKGSPGSVGLKGERGPPGGVGFPGSRGDT
GPPGPPGYGPAGPIGDKGQAGFPGGPGSPGLPGPKGEPGKIVPLPGPPGAEGLPGSPGFP
GPQGDRGFPGTPGRPGLPGEKGAVGQPGIGFPGPPGPKGVDGLPGDMGPPGTPGRPGFNG
LPGNPGVQGQKGEPGVGLPGLKGLPGLPGIPGTPGEKGSIGVPGVPGEHGAIGPPGLQGI
RGEPGPPGLPGSVGSPGVPGIGPPGARGPPGGQGPPGLSGPPGIKGEKGFPGFPGLDMPG
PKGDKGAQGLPGITGQSGLPGLPGQQGAPGIPGFPGSKGEMGVMGTPGQPGSPGPVGAPG
LPGEKGDHGFPGSSGPRGDPGLKGDKGDVGLPGKPGSMDKVDMGSMKGQKGDQGEKGQIG
PIGEKGSRGDPGTPGVPGKDGQAGQPGQPGPKGDPGISGTPGAPGLPGPKGSVGGMGLPG
TPGEKGVPGIPGPQGSPGLPGDKGAKGEKGQAGPPGIGIPGLRGEKGDQGIAGFPGSPGE
KGEKGSIGIPGMPGSPGLKGSPGSVGYPGSPGLPGEKGDKGLPGLDGIPGVKGEAGLPGT
PGPTGPAGQKGEPGSDGIPGSAGEKGEPGLPGRGFPGFPGAKGDKGSKGEVGFPGLAGSP
GIPGSKGEQGFMGPPGPQGQPGLPGSPGHATEGPKGDRGPQGQPGLPGLPGPMGPPGLPG
IDGVKGDKGNPGWPGAPGVPGPKGDPGFQGMPGIGGSPGITGSKGDMGPPGVPGFQGPKG
LPGLQGIKGDQGDQGVPGAKGLPGPPGPPGPYDIIKGEPGLPGPEGPPGLKGLQGLPGPK
GQQGVTGLVGIPGPPGIPGFDGAPGQKGEMGPAGPTGPRGFPGPPGPDGLPGSMGPPGTP
SVDHGFLVTRHSQTIDDPQCPSGTKILYHGYSLLYVQGNERAHGQDLGTAGSCLRKFSTM
PFLFCNINNVCNFASRNDYSYWLSTPEPMPMSMAPITGENIRPFISRCAVCEAPAMVMAV
HSQTIQIPPCPSGWSSLWIGYSFVMHTSAGAEGSGQALASPGSCLEEFRSAPFIECHGRG
TCNYYANAYSFWLATIERSEMFKKPTPSTLKAGELRTHVSRCQVCMRRT

Exercise 3

If you have time, try predicting order/disorder in some of your own sequences of interest.

Otherwise look at the following sequences to see if you can find a consensus on the distribution of regions of order and disorder within them. Compare the results of disorder predictions with those from PFAM/SMART/CDD. Look also in ELM to see if there are any known instances of linear motifs within these proteins. If so, do they fall within regions you expect to be ordered or disordered?

>P00519|ABL1_HUMAN Proto-oncogene tyrosine-protein kinase ABL1 - Homo sapiens (Human).
MLEICLKLVGCKSKKGLSSSSSCYLEEALQRPVASDFEPQGLSEAARWNSKENLLAGPSE
NDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVN
SLEKHSWYHGPVSRNAAEYLLSSGINGSFLVRESESSPGQRSISLRYEGRVYHYRINTAS
DGKLYVSSESRFNTLAELVHHHSTVADGLITTLHYPAPKRNKPTVYGVSPNYDKWEMERT
DITMKHKLGGGQYGEVYEGVWKKYSLTVAVKTLKEDTMEVEEFLKEAAVMKEIKHPNLVQ
LLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEVNAVVLLYMATQISSAMEYLEKKNFI
HRDLAARNCLVGENHLVKVADFGLSRLMTGDTYTAHAGAKFPIKWTAPESLAYNKFSIKS
DVWAFGVLLWEIATYGMSPYPGIDLSQVYELLEKDYRMERPEGCPEKVYELMRACWQWNP
SDRPSFAEIHQAFETMFQESSISDEVEKELGKQGVRGAVSTLLQAPELPTKTRTSRRAAE
HRDTTDVPEMPHSKGQGESDPLDHEPAVSPLLPRKERGPPEGGLNEDERLLPKDKKTNLF
SALIKKKKKTAPTPPKRSSSFREMDGQPERRGAGEEEGRDISNGALAFTPLDTADPAKSP
KPSNGAGVPNGALRESGGSGFRSPHLWKKSSTLTSSRLATGEEEGGGSSSKRFLRSCSAS
CVPHGAKDTEWRSVTLPRDLQSTGRQFDSSTFGGHKSEKPALPRKRAGENRSDQVTRGTV
TPPPRLVKKNEEAADEVFKDIMESSPGSSPPNLTPKPLRRQVTVAPASGLPHKEEAGKGS
ALGTPAAAEPVTPTSKAGSGAPGGTSKGPAEESRVRRHKHSSESPGRDKGKLSRLKPAPP
PPPAASAGKAGGKPSQSPSQEAAGEAVLGAKTKATSLVDAVNSDAAKPSQPGEGLKKPVL
PATPKPQSAKPSGTPISPAPVPSTLPSASSALAGDQPSSTAFIPLISTRVSLRKTRQPPE
RIASGAITKGVVLDSTEALCLAISRNSEQMASHSAVLEAGKNLYTFCVSYVDSIQQMRNK
FAFREAINKLENNLRELQICPATAGSGPAATQDFSKLLSSVKEISDIVQR

>P35568|IRS1_HUMAN Insulin receptor substrate 1 - Homo sapiens (Human).
MASPPESDGFSDVRKVGYLRKPKSMHKRFFVLRAASEAGGPARLEYYENEKKWRHKSSAP
KRSIPLESCFNINKRADSKNKHLVALYTRDEHFAIAADSEAEQDSWYQALLQLHNRAKGH
HDGAAALGAGGGGGSCSGSSGLGEAGEDLSYGDVPPGPAFKEVWQVILKPKGLGQTKNLI
GIYRLCLTSKTISFVKLNSEAAAVVLQLMNIRRCGHSENFFFIEVGRSAVTGPGEFWMQV
DDSVVAQNMHETILEAMRAMSDEFRPRSKSQSSSNCSNPISVPLRRHHLNNPPPSQVGLT
RRSRTESITATSPASMVGGKPGSFRVRASSDGEGTMSRPASVDGSPVSPSTNRTHAHRHR
GSARLHPPLNHSRSIPMPASRCSPSATSPVSLSSSSTSGHGSTSDCLFPRRSSASVSGSP
SDGGFISSDEYGSSPCDFRSSFRSVTPDSLGHTPPARGEEELSNYICMGGKGPSTLTAPN
GHYILSRGGNGHRCTPGTGLGTSPALAGDEAASAADLDNRFRKRTHSAGTSPTITHQKTP
SQSSVASIEEYTEMMPAYPPGGGSGGRLPGHRHSAFVPTRSYPEEGLEMHPLERRGGHHR
PDSSTLHTDDGYMPMSPGVAPVPSGRKGSGDYMPMSPKSVSAPQQIINPIRRHPQRVDPN
GYMMMSPSGGCSPDIGGGPSSSSSSSNAVPSGTSYGKLWTNGVGGHHSHVLPHPKPPVES
SGGKLLPCTGDYMNMSPVGDSNTSSPSDCYYGPEDPQHKPVLSYYSLPRSFKHTQRPGEP
EEGARHQHLRLSTSSGRLLYAATADDSSSSTSSDSLGGGYCGARLEPSLPHPHHQVLQPH
LPRKVDTAAQTNSRLARPTRLSLGDPKASTLPRAREQQQQQQPLLHPPEPKSPGEYVNIE
FGSDQSGYLSGPVAFHSSPSVRCPSQLQPAPREEETGTEEYMKMDLGPGRRAAWQESTGV
EMGRLGPAPPGAASICRPTRAVPSSRGDYMTMQMSCPRQSYVDTSPAAPVSYADMRTGIA
AEEVSLPRATMAAASSSSAASASPTGPQGAAELAAHSSLLGGPQGPGGMSAFTRVNLSPN
RNQSAKVIRADPQGCRRRHSSETFSSTPSATRVGNTVPFGAGAAVGGGGGSSSSSEDVKR
HSSASFENVWLRPGELGGAPKEPAKLCGAAGGLENGLNYIDLDLVKDFKQCPQECTPEPQ
PPPPPPPHQPLGSGESSSTRRSSEDLSAYASISFQKQPEDRQ


Some Protein Order/Disorder Predictors

Remember - usually safest to input just "raw" protein sequence - i.e.

"GHYYITS..." 

Rather than inputing as FASTA format i.e. with description line

">P1248923 protein description
GHYYITS..." 


Run the servers using default options unless otherwise indicated

Some Protein Family/Structural Domain Predictors


Linear Motif Investigation


Find Sequences

Back to Gibson Team course pages at EMBL.