Proteomics Core Facility: Sample Preparation
What is the minimum amount of protein required for MS analysis?
This is the single most frequent question we get, and the hardest one to answer. The boring (but correct) answer is that it depends on the sample and on the question you want to address. If you want to identify a single protein from a gel, an amount that produces a Coomassie-stainable band will almost always give you a protein identification (as long as your protein exists in a database that we can search); this should be in the range of 10-20 ng. If you have a silver-stained band (<10 ng), we may still be in a good position to get a protein identification, but there is no guarantee of success: there are some caveats with respect to protein recovery and interference with the MS signal. In general, always send us as much as you can spare. If you are thinking of sending half of a sample that gave you a faint Coomassie band, consider what you would do with the remaining half; it might be better to send us everything from the start. If you aim to find post-translational modifications, the amount in a silver-stained band is definitely not enough, and you will need to send as much as you can. For identification of proteins in complex mixtures (e.g. separated over a gel lane), send at least 10 µg of protein.
To determine the weight of intact proteins, you should send at least 10 µg at a concentration not lower than 1 mg/ml. Usually this is sufficient to get a molecular weight determined under denaturing conditions. The situation for so-called ‘native’ conditions is entirely different. The required amount is usually much higher, but will depend entirely on the composition, size, and purity of the protein. There are no standard conditions for this type of experiment, so it will be a matter of trial and error, and of optimizing MS conditions. Another very critical factor is the buffer composition (see below).
In what buffer should I send my sample?
Gels can be sent well-sealed in plastic foils (see here), gel bands in closed Eppendorf tubes (see here). In both cases, adding large volumes of water or buffer is not required: keeping them moist is sufficient. There is also no need to ship them on ice. For MW determination of intact proteins, the buffer composition is not highly critical, as long as it does not contain detergents or more than 2% glycerol.
How can I be sure you correctly identified my protein?
Protein identification by mass spectrometry is probabilistic, meaning that the best match is sought between an experimental spectrum and a theoretical spectrum for a peptide in the database. The score assigned to this match, and therefore the probability that the match is right, depends on a number of parameters, such as spectral quality, mass accuracy, the size of the database, and the algorithm used for database searching. In addition, the more peptides are assigned to a given protein, the higher the protein score will be. For the identification of single proteins, we report an E-value indicating the likelihood that the identification was generated by chance. For large datasets, we report a false discovery rate, indicating the estimated proportion of wrong identifications in the entire dataset (usually <1%).
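To make the false discovery rate more concrete, here is a minimal sketch of one widely used way to estimate it, the target-decoy approach. This is an assumption made for illustration: the answer above does not say which method or software the Facility uses. The function name `estimate_fdr` and all scores are invented.

```python
def estimate_fdr(matches, threshold):
    """Estimate the FDR among target matches scoring at or above `threshold`.

    `matches` is a list of (score, is_decoy) tuples. Decoy matches come from
    searching deliberately wrong (e.g. reversed) sequences, so every decoy hit
    is known to be false and serves as a stand-in for the false target hits.
    """
    targets = sum(1 for score, is_decoy in matches
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches
                 if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0
    return decoys / targets


# Invented peptide-spectrum match scores, for illustration only.
matches = [(95, False), (90, False), (88, False), (80, False),
           (75, True), (70, False), (60, True), (55, False)]

for threshold in (80, 70, 50):
    print(threshold, round(estimate_fdr(matches, threshold), 3))
```

Raising the score threshold lowers the estimated FDR but also discards identifications, which is why a dataset is typically filtered to a fixed rate such as 1%.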
Can you identify proteins from organisms with unsequenced genomes?
We identify proteins by matching spectra to protein sequences in a database. Thus, in principle, if a protein is not in the database it will not be identified. Full genome sequences are therefore very helpful for protein identification, even if the protein as such has never been observed before. For organisms with unsequenced genomes, the set of known proteins (or genes) can be far from complete, seriously hampering identification of novel proteins. We can search against DNA databases, so any additional information (partial genome sequences, or initial attempts at genome annotation, e.g. scaffolds) may be used to improve our chances. Remember that protein identification by MS is NOT a BLAST search, and thus proteins cannot be identified ‘by homology’: even a single amino acid substitution will change the mass of a peptide, thus prohibiting its identification. In some cases, we can perform peptide de novo sequencing, either alone or in combination with a database search for higher confidence. Please remember that de novo sequencing will not piece back together the positions of the peptides within the protein: without a database sequence, this information is lost once the protein is digested into peptides.
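The effect of a single substitution on peptide mass can be illustrated with a short calculation. The sketch below sums standard monoisotopic residue masses; the peptide sequences are invented and the function name `peptide_mass` is hypothetical. One caveat from general mass spectrometry knowledge, not from the text above: leucine and isoleucine have identical masses, so that particular substitution leaves the peptide mass unchanged.

```python
# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water per peptide (free N- and C-terminus)


def peptide_mass(sequence):
    """Monoisotopic mass of an unmodified, uncharged peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


database_peptide = "SAMPLER"   # invented sequence present in the database
observed_variant = "SGMPLER"   # same peptide with an A -> G substitution

shift = peptide_mass(observed_variant) - peptide_mass(database_peptide)
print(f"mass shift caused by A -> G: {shift:.5f} Da")
```

A shift of this size is far larger than the mass tolerance of a typical database search, so the variant peptide would not be matched to the database entry.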
Can you identify phosphorylation sites?
In principle we are set up to do this, although detection of phosphopeptides and site-localisation of the modification is never straightforward. First, phosphorylation is often substoichiometric, meaning that phosphopeptides can be of very low abundance, and that they may even go unnoticed among non-phosphorylated peptides from the same or other proteins. Second, ionization of phosphopeptides is less efficient than for ‘normal’ peptides, which also does not help to detect them. Third, fragmentation of phosphopeptides does not obey the same rules that apply to normal peptides, which sometimes makes spectra hard to interpret. Finally, the phosphorylation site may be present in a peptide that is too large or too small to be detected and fragmented efficiently. If you know your protein and the expected modification site, choosing a different protease may be a good alternative for mapping the desired domain. As a consequence of all of this, it is not uncommon that we can tell a peptide is phosphorylated, but find it difficult to pinpoint exactly which site is modified. Enrichment of phosphopeptides (actually the process involves depleting the sample of non-phosphorylated peptides) by techniques such as IMAC or TiO2 is often helpful, especially for complex samples.
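The site-localisation problem described above can be sketched in a few lines. All positional isomers of a phosphopeptide have the same intact mass, so only fragment ions can distinguish them. The sketch assumes phosphorylation on serine, threonine, or tyrosine only, which is a simplification; the peptide and the function names are invented.

```python
from itertools import combinations

# Monoisotopic mass added by one phosphate group (HPO3), in Da.
PHOSPHO_SHIFT = 79.96633


def candidate_sites(peptide):
    """Return (position, residue) for every S, T or Y, counting from 1."""
    return [(i + 1, aa) for i, aa in enumerate(peptide) if aa in "STY"]


def positional_isomers(peptide, n_phospho):
    """All ways of placing `n_phospho` phosphate groups on the candidate sites."""
    return list(combinations(candidate_sites(peptide), n_phospho))


peptide = "AGSPTSYLK"  # invented example peptide
print(candidate_sites(peptide))
for isomer in positional_isomers(peptide, 1):
    # Every isomer carries the same +79.96633 Da, so the intact mass alone
    # cannot tell them apart.
    print(isomer, f"+{PHOSPHO_SHIFT} Da")
```

For this peptide a single phosphate can sit on any of four residues, three of which lie within a five-residue stretch, which is the kind of situation where a peptide is clearly phosphorylated but the exact site stays ambiguous.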
What about other post-translational modifications?
We can identify other PTMs. It will be helpful if you tell us which one(s) you expect to occur, so that we can look for them specifically. Some are easier to detect than phosphorylation, because they tend to be more stable in the mass spectrometer (e.g. acetylation, methylation). Looking for any of the ~300 known PTMs creates a combinatorial problem, usually decreasing the likelihood of finding any one of them with high confidence. Nevertheless, there may be ways to approach this iteratively, thus identifying unexpected modifications.
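A rough count shows how quickly the search space grows when many modification types are allowed. The sketch below deliberately simplifies: it assumes that any modification type can sit on any residue, which overestimates the real number because most PTMs are restricted to specific amino acids. The function name and parameters are hypothetical.

```python
from math import comb


def modified_forms(n_residues, n_mod_types, max_mods):
    """Count peptide forms carrying 0..max_mods modifications.

    Simplifying assumption: every modification type can occur on every
    residue, and each residue carries at most one modification.
    """
    return sum(comb(n_residues, k) * n_mod_types ** k
               for k in range(max_mods + 1))


# A 10-residue peptide with at most two modifications per peptide.
print(modified_forms(10, 1, 2))    # one expected PTM type
print(modified_forms(10, 300, 2))  # any of ~300 PTM types
```

Each additional candidate form is another chance for a random match, which is why naming the expected modification in advance makes a confident identification more likely.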
I expected 1 protein in my gel band, you give me 25. How come?
If you excised a single band from a gel, it is not unlikely that it contains several proteins of (almost) the same size, some of which may be below the detection level of the staining method used. Mass spectrometers are far more sensitive than Coomassie and silver staining.
I see a lot of keratins in my list of proteins. Where do they come from?
Most likely they were introduced during sample preparation, or (in case you used gels) during staining or cutting. Dust in the lab is the most likely source, so make sure you work in a clean area. The User Guide has some suggestions on how to minimise contamination.
You could not identify a single protein in my gel band. Why?
Right, that’s embarrassing, or maybe not. There are several reasons why this might occur. The most likely one is that the amount of starting material was simply not enough. Another reason might be that the protein does not contain (a sufficient number of) cleavage sites for trypsin, our work-horse protease. As a result, no peptides will be generated (and detected). This may be the case for some small proteins, but it may also occur with ‘exotic’ (e.g. highly acidic) proteins that contain fewer lysines and arginines than the average protein. The reverse might also be true: if a protein contains very many closely spaced cleavage sites, it will be digested into peptides that are too small to be detected, or to be sequenced with high confidence (e.g. histone tails). The alternative might be to select another protease, which may be viable if you know the protein you are working with. A further reason for not finding a protein may be that it is not present in the database. This is not unusual for proteins originating from poorly characterized organisms. Finally, if you Coomassie-stained your gel in a microwave or scanned it on an overhead projector foil, these are the most likely reasons for not finding anything: microwaving bakes proteins into the gel, and there is no way to get them out, however intense the stain. Polymers in overhead projector foils stop trypsin from working. Therefore, do not use a microwave to speed up staining or destaining, and only scan on glass plates (see also User Guide).
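The trypsin argument above can be checked for a known sequence with a simple in-silico digest. The sketch uses the common rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P). The length window of 7 to 30 residues for usable peptides is an illustrative assumption, not a Facility setting, and the test sequence is a lysine- and arginine-rich stretch modelled on a histone tail.

```python
def tryptic_digest(protein):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without K/R
    return peptides


def usable(peptides, min_len=7, max_len=30):
    """Keep peptides inside an assumed length window for detection."""
    return [p for p in peptides if min_len <= len(p) <= max_len]


# K/R-rich sequence modelled on a histone tail (illustrative only).
tail = "ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHR"

peptides = tryptic_digest(tail)
print(peptides)
print(usable(peptides))
```

Most of the resulting peptides are only one to five residues long, so very little of the sequence remains observable. Running the same functions on a sequence without K or R returns the whole protein as one piece, which is the opposite failure described above.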
Can you quantify proteins by stable isotope labeling, such as SILAC?
Yes, we support full workflows, including data analysis, for various stable isotope labeling strategies such as SILAC, TMT and dimethyl labeling. The latter two are peptide-based and are offered as a service by the Facility. SILAC labels are introduced during cell culture, which is typically carried out by the user/biologist before submitting samples to the Facility. If you are considering SILAC labeling, we are happy to advise on setting up and optimising the experiment, e.g. by verifying by mass spectrometry that your heavy amino acids are fully incorporated before a mixing experiment with different conditions is carried out.
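Checking label incorporation comes down to a simple ratio per peptide: the heavy signal divided by the sum of heavy and light signals, measured in a sample grown in heavy medium only. The sketch below is an illustration under stated assumptions: the intensities are invented, and the 95% cut-off is a hypothetical threshold rather than a Facility criterion.

```python
from statistics import median


def incorporation(heavy, light):
    """Fraction of a peptide's signal carried by the heavy-labelled form."""
    total = heavy + light
    if total <= 0:
        raise ValueError("intensities must sum to a positive value")
    return heavy / total


# Invented (heavy, light) intensity pairs for four peptides from a culture
# grown in heavy medium only.
pairs = [(9.8e6, 2.0e5), (4.9e6, 1.0e5), (1.9e6, 1.0e5), (9.9e5, 1.0e4)]

per_peptide = [incorporation(heavy, light) for heavy, light in pairs]
overall = median(per_peptide)

THRESHOLD = 0.95  # hypothetical cut-off for "sufficiently incorporated"
print(f"median incorporation: {overall:.3f}")
print("proceed to mixing experiment" if overall >= THRESHOLD
      else "continue labelling")
```

The median is used so that a single poorly measured peptide does not dominate the estimate.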
Can you analyse non-covalent protein interactions?
We can do this in exceptional cases only, mostly depending on sample preparation on your side (sorry…). Seriously, analysis of proteins (and complexes) under native conditions is restricted to samples devoid of salts and detergents. Salt cannot be tolerated because in the electrospray process it takes up charges much more easily than proteins, thus suppressing the protein signal. Volatile buffers like ammonium acetate work best, but not many protein complexes are stable under these conditions. If you read papers on native MS, realize that the authors often spend an awful lot of time (often many months or years) getting their protein pure, at a high concentration, and in an MS-compatible buffer.
What is the level of detail you can report for large-scale MS results?
For most users, an Excel list of identified proteins is usually sufficient, with an accession number to an appropriate database (e.g. UniProt). However, we can provide much more than that, including sequences of identified peptides, peptide scores, sequence coverage, position of the peptide in the protein, annotated spectra for all peptides, etc. Annotated spectra may be required by some journals if you report identification of PTMs. In addition, we provide details on protein and peptide quantification, using either label-free or stable isotope labeling approaches.