Proteomics Core Facility
Sample Preparation
What is the minimum amount of protein required for MS analysis?
This is the single most frequent question we get, and the hardest one to answer. The boring (but correct) answer is that it depends on the sample and on the question you want to address. If you want to identify a single protein from a gel, an amount that produces a Coomassie-stainable band will be enough in most cases (this should be in the range of 10-20 ng). If you have a silver-stained band (<10 ng), we may still be in a good position to get a protein identification, but there is no guarantee of success: there are some caveats with respect to protein recovery and interference with the MS signal. In general, always send us as much as you can spare. If you are thinking of sending half of a sample that gave you a faint Coomassie band, consider what you would do with the remaining half; it might be better to send us everything from the start. If you aim to find posttranslational modifications, a silver-stained band is definitely not enough, and you will need to send as much as you can. For identification of proteins in complex mixtures (e.g. separated over a gel lane), the amount of protein should be in the range of 20-50 µg.
To determine the molecular weight of intact proteins, you should send at least 10 µg at a concentration not lower than 1 mg/ml. Usually this is sufficient to get a molecular weight determined under denaturing conditions. The situation for so-called ‘native’ conditions is entirely different. The required amount is usually much higher, but depends entirely on the composition, size, and purity of the protein. There are no standard conditions for this type of experiment, so this will be a matter of trial and error, and optimization of MS conditions. Another critical factor is the buffer composition (see below).
In what buffer should I send my sample?
Gels can be sent sealed in plastic, gel bands in closed Eppendorf tubes. In both cases, no addition of water or buffer is required. Also, there is no need to ship them on ice. For MW determination of intact proteins, the buffer composition is not highly critical, as long as it contains no detergents and less than 2% glycerol. For MS under non-denaturing conditions, proteins must be in a volatile buffer such as ammonium acetate. Specific conditions will depend on the protein and the question to be addressed, so ask staff in the Facility for advice.
How can I be sure you correctly identified my protein?
Protein identification by mass spectrometry is probabilistic, meaning that the best match is sought between an experimental spectrum and a theoretical spectrum for a peptide in the database. The score assigned to this match, and therefore the probability that the match is right, depends on a number of parameters, such as spectral quality, mass accuracy, the size of the database, and the algorithm used for database searching. In addition, the more peptides are assigned to a given protein, the higher the protein score will be. For the identification of single proteins, we report an E-value indicating the likelihood that the identification was generated by chance. For large datasets, we report a false discovery rate, indicating the estimated proportion of wrong identifications in the entire dataset (usually <1%).
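To make the false discovery rate a bit more concrete, here is a minimal sketch of one common way to estimate it, the target-decoy approach. This is an assumption for illustration only: the text above does not say which method the Facility uses, and the function names and score values are hypothetical. The idea is that matches to deliberately wrong (decoy) sequences tell you roughly how many wrong matches hide among the real (target) hits above a given score.

```python
def fdr_at_threshold(psms, threshold):
    """Estimate the FDR among target matches scoring at or above `threshold`.

    `psms` is a list of (score, is_decoy) pairs; higher scores are better.
    The estimate is (number of decoy hits) / (number of target hits).
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0
    return decoys / targets


def lowest_threshold_for_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    Returns None if no cutoff satisfies the requirement.
    """
    best = None
    # Walk from the strictest cutoff to the most permissive one and
    # remember the last cutoff that still meets the FDR requirement.
    for cutoff in sorted({score for score, _ in psms}, reverse=True):
        if fdr_at_threshold(psms, cutoff) <= max_fdr:
            best = cutoff
    return best


# Hypothetical scores: four target matches and one decoy match.
example = [(90, False), (80, False), (70, False), (60, True), (50, False)]
cutoff = lowest_threshold_for_fdr(example, max_fdr=0.01)
```

With real data the list would hold many thousands of matches, and the cutoff would be chosen so that the reported list stays below the stated 1%.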
Can you identify proteins from organisms with unsequenced genomes?
We identify proteins by matching spectra to protein sequences in a database. This means that we do not determine peptide sequences de novo from the spectrum alone. Thus, in principle, if a protein is not in the database it will not be identified. Full genome sequences are therefore very helpful for protein identification, even if the protein as such has never been observed before. For organisms with unsequenced genomes, the set of known proteins (or genes) can be far from complete, seriously hampering identification of novel proteins. We can search DNA databases, so any additional information (partial genome sequences, or initial attempts at genome annotation, e.g. scaffolds) may be used to improve our chances. Remember that protein identification by MS is NOT a BLAST search, and thus proteins cannot be identified ‘by homology’: even a single amino acid substitution will almost always change the mass of a peptide, thus prohibiting its identification.
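The point about substitutions can be checked with a few lines of arithmetic. The sketch below sums standard monoisotopic residue masses to get a peptide mass; the two peptides are hypothetical examples, not sequences from any real sample. It also shows the one notable exception: leucine and isoleucine have identical masses, so that particular swap is invisible at the mass level.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added for the free N- and C-termini


def peptide_mass(sequence):
    """Return the monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


# Hypothetical peptide and two single-residue variants.
reference = peptide_mass("AVGSLK")
ser_to_thr = peptide_mass("AVGTLK")   # S -> T adds one CH2 group
leu_to_ile = peptide_mass("AVGSIK")   # L -> I is isobaric

shift_ser_thr = ser_to_thr - reference
shift_leu_ile = leu_to_ile - reference
```

A shift of about 14 Da is far outside the mass tolerance of a typical database search, which is why the substituted peptide would simply not match the database entry.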
Can you identify phosphorylation sites?
In principle we are set up to do this. I say ‘in principle’, because there may be several practical reasons why we would not find them. First, phosphorylation is often substoichiometric, meaning that phosphopeptides can be of very low abundance, and that they may even go unnoticed among non-phosphorylated peptides from the same or other proteins. Second, ionization of phosphopeptides is less efficient than for ‘normal’ peptides, which also does not help to detect them. Third, fragmentation of phosphopeptides does not obey the same rules that apply to normal peptides, which sometimes makes spectra hard to interpret. Finally, the phosphorylation site may be present in a peptide that is too large or too small to be detected and fragmented efficiently. If you know your protein, and the expected modification site, the choice of a different protease may be a good alternative for mapping the desired domain. As a consequence of all this, it is not uncommon that we can tell a peptide is phosphorylated, but find it difficult to pinpoint exactly which site is modified. In this whole process, enrichment of phosphopeptides by techniques such as IMAC may be helpful in some cases (especially for isolating them from complex samples). We don’t offer this yet, but are planning to do so in the near future.
What about other posttranslational modifications?
We can identify other PTMs. It will be helpful if you tell us which one you expect to occur, so that we can look for it specifically. Some are easier to detect than phosphorylation, because they tend to be more stable in the mass spectrometer (e.g. acetylation, methylation). Looking for any of the ~300 known PTMs creates a combinatorial problem, usually decreasing the likelihood of finding any one of them with high confidence. Nevertheless, there may be ways to approach this in an iterative way, thus identifying unexpected modifications.
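The combinatorial problem is easy to underestimate, so here is a small counting sketch. It assumes a deliberately tiny table of four modifications and the residues they commonly target (an illustrative subset, not the Facility's search settings), and a hypothetical five-residue peptide. Each residue can be unmodified or carry one of the applicable modifications, and the number of candidate forms is the product over all residues.

```python
from math import prod

# Illustrative subset of modifications and their usual target residues.
# A real search considering hundreds of PTMs would have a far longer table.
MODIFICATIONS = {
    "phosphorylation": "STY",
    "acetylation": "K",
    "methylation": "KR",
    "oxidation": "M",
}


def count_modified_forms(peptide, modifications):
    """Count candidate forms of a peptide, including the unmodified one.

    Each residue is either unmodified or carries exactly one of the
    modifications that can target it.
    """
    states_per_residue = []
    for aa in peptide:
        applicable = sum(1 for targets in modifications.values()
                         if aa in targets)
        states_per_residue.append(1 + applicable)
    return prod(states_per_residue)


# Hypothetical peptide with several modifiable residues.
all_four = count_modified_forms("SKTMR", MODIFICATIONS)
phospho_only = count_modified_forms(
    "SKTMR", {"phosphorylation": "STY"})
```

Every extra candidate form is one more theoretical spectrum that could match by chance, which is why telling us the expected modification keeps the search small and the confidence high.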
I expected 1 protein in my gel band, you give me 25. How come?
If you excised a single band from a gel, it is not unlikely that it contains several proteins of (almost) the same size, some of which may be below the detection level of the staining method used. Mass spectrometers exceed the sensitivity of Coomassie and silver staining.
I see a lot of keratins in my list of proteins. Where do they come from?
Most likely they were introduced during sample preparation, or (in case you used gels) during staining or cutting. Dust in the lab is the most likely source, so make sure you work in a clean area. In the User Guide we have some suggestions on how to minimise contamination.
You could not identify a single protein in my gel band. Why?
Right, that’s embarrassing, or maybe not. Actually, there may be several reasons why this might occur. The most likely reason is that the amount of starting material was simply not enough. Another reason might be that the protein does not contain (a sufficient number of) cleavage sites for trypsin, our work-horse protease, which cleaves after lysine and arginine. As a result, no peptides will be generated (and detected). This may be the case for some small proteins, but it may also occur with ‘exotic’ (e.g. highly acidic) proteins that contain fewer lysines and arginines than the average protein. The reverse might also be true: if a protein contains many closely spaced cleavage sites, it will be digested into peptides that are too small to be detected, or to be sequenced with high confidence (e.g. histone tails). An alternative is to select another protease, which may be viable if you know the protein you are working with. Another reason for not finding a protein may be that it is not present in the database. This is not unusual for proteins originating from poorly characterized organisms.
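Both digestion problems described above can be seen in a simple in-silico digest. The sketch below cleaves after every lysine (K) and arginine (R) unless the next residue is proline, which is the commonly used textbook rule for trypsin; it ignores missed cleavages. The length window of 7 to 30 residues for a "usable" peptide is an assumed illustrative range, not a Facility specification, and the sequences are hypothetical.

```python
def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P."""
    peptides = []
    start = 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    # Keep whatever is left after the last cleavage site.
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def usable_peptides(peptides, min_len=7, max_len=30):
    """Keep peptides inside an assumed length window for detection."""
    return [p for p in peptides if min_len <= len(p) <= max_len]


# Hypothetical lysine/arginine-rich stretch: cut into tiny fragments.
basic_fragments = tryptic_digest("SGRGKGGK")

# Hypothetical acidic stretch without K or R: one long uncut piece.
acidic_fragments = tryptic_digest("DEDEDEDEDEDEDEDEDEDEDEDEDEDEDEDEDEDE")

usable_basic = usable_peptides(basic_fragments)
usable_acidic = usable_peptides(acidic_fragments)
```

In both cases nothing survives the length filter: the basic stretch gives only two- and three-residue fragments, and the acidic stretch stays as one piece longer than the window. Running the same check with the cleavage rule of another protease is a quick way to see whether an alternative enzyme would help.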
I read a cool paper about SILAC - do you do this kind of quantitative proteomics?
Yes, we also read these papers. Unfortunately, we don’t offer quantitative analysis yet, but we are working on this.
Can you analyse non-covalent protein interactions?
We can do this in exceptional cases only, mostly depending on sample preparation on your side (sorry…). Seriously, analysis of proteins (and complexes) under native conditions is restricted to samples devoid of salts and detergents. Salt cannot be tolerated because in the electrospray process it takes up charges much more easily than proteins, thus suppressing the protein signal. Volatile buffers like ammonium acetate work best, but not many proteins are stable under these conditions. If you read papers on native MS, realize that the authors often spend an awful lot of time getting their protein pure, at a high concentration, and in an MS-compatible buffer.
What is the level of detail you can report for large-scale MS results?
For most users, an Excel list of identified proteins is sufficient, with an accession number to an appropriate database (e.g. UniProt). However, we can provide much more than that, including sequences of identified peptides, peptide scores, sequence coverage, position of the peptide in the protein, annotated spectra for all peptides, etc. The latter may be required by some journals if you report identification of PTMs.