Multiple Sequence Alignment How-Tos



Collecting sequences by keyword search

NCBI Entrez Keyword Query

Go to the NCBI Entrez website.

Set the search database to "Proteins" (assuming you are searching for protein sequences).

Enter your query into the "for" box, and execute it using the "Go" button.

This will identify those records within the Entrez protein sequence databases that match your query.

For help building appropriate queries, see the following pages provided by NCBI:

Entrez Help
Entrez sequence database searching tutorial
Entrez Search Field Descriptions and Qualifiers

To alter the way the records are displayed:

View the sequences in FASTA format by switching "Display" to "fasta".
(Other formats, e.g. "genpept", can be used if you want more information about the sequences.)

To save the results:

Save the results into a file by switching "send to" to "file".
Be sure to store the file somewhere you will find it again, and to give it an appropriate name.
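
The same keyword search and FASTA download can also be scripted through NCBI's E-utilities web interface (ESearch to find matching record IDs, EFetch to retrieve them). The following is a minimal sketch, not an official client: the function names and the default of 20 records are assumptions made for illustration, and it only covers the simple case of one query saved to one file.

```python
import json
import urllib.parse
import urllib.request

# Base address of the NCBI E-utilities service.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(query, retmax=20):
    """Build the ESearch URL that looks up protein record IDs for a keyword query."""
    params = {"db": "protein", "term": query, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urllib.parse.urlencode(params)}"


def efetch_url(ids):
    """Build the EFetch URL that returns the given protein records as FASTA text."""
    params = {"db": "protein", "id": ",".join(ids), "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?{urllib.parse.urlencode(params)}"


def fetch_fasta(query, out_path, retmax=20):
    """Run the query, save matching sequences to out_path, return how many IDs matched."""
    with urllib.request.urlopen(esearch_url(query, retmax)) as resp:
        ids = json.load(resp)["esearchresult"]["idlist"]
    if not ids:
        return 0  # nothing matched, so no file is written
    with urllib.request.urlopen(efetch_url(ids)) as resp, open(out_path, "wb") as out:
        out.write(resp.read())
    return len(ids)
```

For example, fetch_fasta("calmodulin AND human[Organism]", "calmodulin_human.fasta") would save up to 20 matching protein sequences to that file; both the query and the file name here are purely illustrative.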

SRS Keyword Query


Automatic Alignment of Multiple Sequences (command line - default settings)

MUSCLE

./muscle -in in_filename.fasta -out out_filename.fasta
(MUSCLE 3.x syntax; MUSCLE 5 uses "-align" and "-output" in place of "-in" and "-out")

MAFFT

./mafft infile.fasta > outfile.fasta

PROBCONS

./probcons infile.fasta > outfile.fasta

CLUSTALW

./clustalw infile.fasta
(creates the output file "infile.aln")
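
The four default-settings commands above differ mainly in how the output is delivered: MUSCLE takes an explicit output file, MAFFT and PROBCONS write to standard output, and CLUSTALW names its own output file. A small wrapper can hide that difference. The sketch below is an illustration only: build_command and run_aligner are hypothetical helper names, and it assumes the programs are installed on the PATH rather than sitting in the current directory as in the "./" commands above.

```python
import shutil
import subprocess


def build_command(tool, infile, outfile):
    """Return (argv, stdout_path) for a default-settings run of one aligner.

    stdout_path is the file that should receive standard output, or None
    when the tool writes its output file itself.
    """
    if tool == "muscle":
        # MUSCLE 3.x style, as in the command shown in this how-to.
        return [tool, "-in", infile, "-out", outfile], None
    if tool in ("mafft", "probcons"):
        # Both print the alignment to standard output.
        return [tool, infile], outfile
    if tool == "clustalw":
        # CLUSTALW derives the output name from the input; outfile is ignored.
        return [tool, infile], None
    raise ValueError(f"unknown aligner: {tool}")


def run_aligner(tool, infile, outfile):
    """Run one aligner with default settings, raising if it is missing or fails."""
    if shutil.which(tool) is None:
        raise FileNotFoundError(f"{tool} not found on PATH")
    argv, stdout_path = build_command(tool, infile, outfile)
    if stdout_path is None:
        subprocess.run(argv, check=True)
    else:
        with open(stdout_path, "w") as out:
            subprocess.run(argv, stdout=out, check=True)
```

With this in place, run_aligner("mafft", "infile.fasta", "outfile.fasta") would correspond to the MAFFT command line shown above.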


Automatic Alignment of Multiple Sequences (Webservers)

MAFFT

Open the link to the Japanese MAFFT webserver.

To collect the alignment, use the "Fasta format" link at the top left of the page.
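
Once the alignment has been saved in FASTA format, a quick sanity check is that every sequence has the same length, since gaps pad all rows of an alignment to a common width. The sketch below shows one way to do that check; read_fasta and is_aligned are hypothetical helper names, and the parser is deliberately simple (it assumes plain ">"-headed FASTA records).

```python
def read_fasta(path):
    """Parse a FASTA file into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new record starts; store the previous one, if any.
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def is_aligned(records):
    """True if there are at least two sequences and all have the same length."""
    lengths = {len(seq) for _, seq in records}
    return len(records) >= 2 and len(lengths) == 1
```

For a downloaded file, is_aligned(read_fasta("outfile.fasta")) returning False would suggest that the file holds unaligned sequences or was truncated during download.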



Back to Gibson Team course pages at EMBL.