Multiple Sequence Alignment "How To"s

Collecting sequences by keyword search

NCBI Entrez Keyword Query

Go to the NCBI Entrez website.

Switch the search to "Proteins" (assuming you are searching for protein sequences).

Enter your query into the "for" box, and execute it using the "Go" button.

This will identify those records within the Entrez protein sequence databases that match your query.

For help building appropriate queries, see the following pages provided by NCBI:

Entrez Help
Entrez sequence database searching tutorial
Entrez Search Field Descriptions and Qualifiers

To alter the way the records are displayed:

View the sequences in fasta format by switching "Display" to "fasta".
Other formats, e.g. "genpept", can be used if you want more information about the sequences.

To save the results:

Save the results to a file by switching "send to" to "file".
Store the file somewhere you will find it again, and give it an appropriate name.
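
The same keyword search can also be scripted instead of clicked through. The sketch below is one possible way to do it, and it rests on assumptions that are not part of the steps above: it uses NCBI's E-utilities service (esearch and efetch) and the curl program, and the function names are made up for illustration. Only spaces in the query are encoded, so queries with other special characters may need extra care.

```shell
#!/bin/sh
# Assumption: base URL of the NCBI E-utilities (not part of the web steps above).
EUTILS="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Build the esearch URL for a keyword query against the protein database.
# $1 = query (spaces are replaced by '+'), $2 = maximum number of hits (default 20).
esearch_url() {
    term=$(printf '%s' "$1" | tr ' ' '+')
    printf '%s/esearch.fcgi?db=protein&term=%s&retmax=%s\n' "$EUTILS" "$term" "${2:-20}"
}

# Build the efetch URL that returns the given comma-separated IDs in fasta format.
efetch_url() {
    printf '%s/efetch.fcgi?db=protein&id=%s&rettype=fasta&retmode=text\n' "$EUTILS" "$1"
}

# Run the query and save the matching sequences to a fasta file.
# $1 = query, $2 = output file. Needs network access and curl.
# curl -g switches off URL globbing so that "[...]" field qualifiers pass through.
entrez_to_fasta() {
    ids=$(curl -s -g "$(esearch_url "$1")" |
          sed -n 's:.*<Id>\([0-9]*\)</Id>.*:\1:p' | paste -sd, -)
    [ -n "$ids" ] || { echo "no records matched" >&2; return 1; }
    curl -s -g "$(efetch_url "$ids")" > "$2"
}
```

For example, entrez_to_fasta "hemoglobin AND human[Organism]" hemoglobin.fasta would save up to 20 matching protein records, in the same fasta format that the "Display" setting produces on the website.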

SRS Keyword Query


Automatic Alignment of Multiple Sequences (command line - default settings)

MUSCLE:
./muscle -in in_filename.fasta -out out_filename.fasta

MAFFT:
./mafft infile.fasta > outfile.fasta

ProbCons:
./probcons infile.fasta > outfile.fasta

ClustalW:
./clustalw infile.fasta
(writes the alignment to the file "infile.aln")
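
The four default invocations above differ only in how the input and output files are passed. As a convenience, a small wrapper along the following lines could print the right command for a chosen program. This is only a sketch: the function name and the output naming scheme ("infile.mafft.fasta" and so on) are hypothetical, the programs are assumed to sit in the current directory as in the commands above, and file names containing spaces are not handled.

```shell
#!/bin/sh
# Print the default-settings command line for one of the aligners shown above.
# $1 = program (muscle, mafft, probcons or clustalw), $2 = input fasta file.
# Hypothetical convention: output goes to "<input minus .fasta>.<program>.fasta".
align_cmd() {
    out="${2%.fasta}.$1.fasta"
    case "$1" in
        muscle)         echo "./muscle -in $2 -out $out" ;;
        mafft|probcons) echo "./$1 $2 > $out" ;;          # both write to standard output
        clustalw)       echo "./clustalw $2" ;;           # chooses its own ".aln" output name
        *)              echo "unknown aligner: $1" >&2; return 1 ;;
    esac
}
```

Running align_cmd mafft infile.fasta prints the command so it can be checked first; piping the output to sh (align_cmd mafft infile.fasta | sh) then executes it.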

Automatic Alignment of Multiple Sequences (Webservers)


Open the link to the Japanese MAFFT webserver.

To collect the alignment, use the "Fasta format" link at the top left of the page.
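
Once the alignment has been saved from the webserver, it can be worth checking that the file really is an alignment, i.e. that every sequence has the same length once gaps are counted. The following is a minimal sketch of such a check using awk; the function name is made up for illustration, and the check says nothing about the quality of the alignment.

```shell
#!/bin/sh
# Succeed if every record in a fasta file has the same total length
# (gap characters included), as expected of an alignment. $1 = fasta file.
is_aligned_fasta() {
    awk '
        /^>/ { if (n) check(); n++; len = 0; next }      # header: close the previous record
             { gsub(/[ \t\r]/, ""); len += length($0) }  # sequence line (may be wrapped)
        END  { if (n) check(); exit (n == 0 || bad) }    # fail on empty input or a mismatch
        function check() { if (first == "") first = len; else if (len != first) bad = 1 }
    ' "$1"
}
```

A typical use would be is_aligned_fasta outfile.fasta && echo "looks aligned", which also works on the output files of the command line programs that write fasta.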
