Using RAxML

Getting RAxML

RAxML is a fast implementation of maximum-likelihood (ML) phylogeny estimation that operates on both nucleotide and protein sequence alignments.

You can download the latest version of RAxML from the RAxML website. Once downloaded and installed, the software can be run from the command line.

It can also be used remotely from the webserver hosted at CIPRES.

However, these ML analyses can be rather time-consuming and, particularly when teaching, we would prefer not to overload a webserver with too many jobs. The instructions below therefore describe using a local command-line installation of RAxML.

Note also that the instructions below cover only a very limited subset of RAxML's functionality: running a non-parametric bootstrap analysis on a protein sequence alignment.

The documentation for RAxML version 7.0.4 is available as a PDF.

Tree and Alignment Input Format

RAxML accepts both trees and alignments in PHYLIP format.
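
As a rough illustration of the alignment layout: a PHYLIP file begins with a header line giving the number of sequences and the number of alignment columns, followed by one line per sequence holding a name and the aligned residues. The sketch below writes a small made-up protein alignment and checks it against its header. The file name toy.phy, the sequence names and the residues are all invented for illustration, and the names are separated from the sequences by whitespace (the "relaxed" variant of the format, assumed here to be acceptable; strict PHYLIP pads names to exactly ten characters).

```shell
# Write a toy alignment: 4 sequences, 12 columns each (all values invented).
cat > toy.phy <<'EOF'
4 12
seqA  MKTAYIAKQRQI
seqB  MKTAYLAKQRQI
seqC  MRTA-IAKQKQI
seqD  MRTAYIARQKQL
EOF

# Sanity check: every sequence should have the length declared in the header.
awk 'NR == 1 { n = $2; next } length($2) != n { bad = 1 } END { exit bad }' toy.phy \
  && echo "toy.phy looks consistent"
```

The same awk check can be pointed at a real alignment, provided each sequence sits on a single line (it does not handle interleaved files).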

Estimating a Single Maximum-Likelihood Tree from Protein Sequences

An example of a command-line string used to estimate such a tree:
raxmlHPC -s TF105399.phy -n TF105399.raxml.singleTree -c 4 -f d -m PROTGAMMAJTT

Estimating a Set of Non-Parametric Bootstrap Trees

An example of a command-line string used to estimate a set of such trees:
raxmlHPC -s TF105399.phy -n TF105399.raxml -c 4 -f d -m PROTGAMMAJTT -b 234534251 -N 10
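
After the run finishes, it can be worth confirming that the expected number of bootstrap trees was written. The sketch below assumes the trees end up in RAxML_bootstrap.TF105399.raxml (the file name used in the next step) and that each tree ends with a semicolon, as Newick trees do. The count_trees helper is a hypothetical name introduced here, not part of RAxML.

```shell
# Hypothetical helper: count the trees in a file by counting the
# semicolons that terminate each Newick tree.
count_trees() {
  grep -o ';' "$1" | wc -l | tr -d ' '
}

bootstrap_file="RAxML_bootstrap.TF105399.raxml"
expected=10   # the value passed to -N above

if [ -f "$bootstrap_file" ]; then
  found=$(count_trees "$bootstrap_file")
  if [ "$found" -eq "$expected" ]; then
    echo "all $expected bootstrap trees are present"
  else
    echo "expected $expected trees but found $found" >&2
  fi
else
  echo "$bootstrap_file not found; run the bootstrap command first" >&2
fi
```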

Projecting Bootstrap Confidence Values onto ML Tree

The command-line string below operates on the input alignment (-s), the set of bootstrap trees (-z) and the single ML tree (-t), and integrates this information to output the ML tree with both ML branch lengths and the frequencies with which the splits in this tree are observed among the bootstrap trees (i.e. the "bootstrap confidence values"):
raxmlHPC -f b -m PROTGAMMAJTT -c 4 -s TF105399.phy -z RAxML_bootstrap.TF105399.raxml -t RAxML_result.TF105399.raxml.singleTree -n BS_TREE
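
The three steps on this page can be chained into a single script. The sketch below is one possible arrangement rather than a prescribed workflow: the run and pipeline helpers and the DRY_RUN variable are hypothetical names introduced here, and by default the script only prints the three commands so that they can be checked before anything is executed. It assumes raxmlHPC is on the PATH and that RAxML names its output files RAxML_result.<run name>, RAxML_bootstrap.<run name> and RAxML_bipartitions.<run name>.

```shell
# Sketch only: run, pipeline and DRY_RUN are hypothetical names, not part of RAxML.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = execute them

aln="TF105399.phy"                   # input alignment
model="PROTGAMMAJTT"                 # substitution model string
ml_name="TF105399.raxml.singleTree"  # run name for the single ML tree
bs_name="TF105399.raxml"             # run name for the bootstrap trees

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

pipeline() {
  # 1. Single ML tree, written to RAxML_result.$ml_name
  run raxmlHPC -s "$aln" -n "$ml_name" -c 4 -f d -m "$model" || return 1
  # 2. Ten bootstrap trees, written to RAxML_bootstrap.$bs_name
  run raxmlHPC -s "$aln" -n "$bs_name" -c 4 -f d -m "$model" -b 234534251 -N 10 || return 1
  # 3. Bootstrap frequencies drawn onto the ML tree, written to RAxML_bipartitions.BS_TREE
  run raxmlHPC -f b -m "$model" -c 4 -s "$aln" \
    -z "RAxML_bootstrap.$bs_name" -t "RAxML_result.$ml_name" -n BS_TREE
}

pipeline
```

Setting DRY_RUN=0 in the environment before running the script executes the commands for real. RAxML stops if output files with the same run name already exist, so earlier results need to be removed or renamed before re-running.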

List of Relevant Command-Line Parameters

The parameters used in the command lines on this page are:
  -s  name of the input alignment file
  -n  name for this run, which is appended to the names of the output files
  -m  substitution model
  -f  algorithm to run: "d" for the default hill-climbing ML search, "b" to draw bootstrap frequencies onto a tree
  -b  random-number seed; supplying it turns on non-parametric bootstrapping
  -N  number of runs (here, the number of bootstrap replicates)
  -z  name of a file containing a set of trees (here, the bootstrap trees)
  -t  name of a file containing a single tree (here, the ML tree)
  -c  number of distinct rate categories


Specifying the Amino-Acid Substitution Model

The model is specified by supplying a string to the -m option. As we will be using it (there are additional options, which we will ignore), this string is made up of several components, which must be concatenated in this order:
  1. "PROT" to indicate that a protein/amino acid model is being specified
  2. "GAMMA" if a discrete-gamma distribution is being used to account for between-site rate variation
  3. the name of one of the amino-acid matrices e.g. "JTT"
  4. "F" if amino-acid frequencies should be estimated from the input alignment
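For example, the string "PROTGAMMAJTT" specifies a protein model that uses the JTT matrix with discrete-gamma rate variation and, because there is no trailing "F", does not estimate amino-acid frequencies from the input alignment.
Note that while RAxML allows rate variation to be modeled using a discrete-gamma distribution combined with an estimate of a proportion of invariant sites (using "GAMMAI" instead of just "GAMMA"), it does not provide models that estimate only a proportion of invariant sites.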
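
The concatenation rule above can be sketched as a small shell function. build_protein_model is a hypothetical helper written for this page, not something RAxML provides, and it does not check that the matrix name is one RAxML recognizes.

```shell
# Hypothetical helper: build_protein_model RATE MATRIX [F]
#   RATE    rate-variation component, e.g. GAMMA
#   MATRIX  amino-acid matrix name, e.g. JTT
#   F       optional literal "F": estimate amino-acid frequencies
#           from the input alignment
build_protein_model() {
  rate="$1"
  matrix="$2"
  freqs="${3:-}"
  if [ -n "$freqs" ] && [ "$freqs" != "F" ]; then
    echo "third argument must be F or omitted" >&2
    return 1
  fi
  # Components are joined in the required order: PROT, rate, matrix, F.
  printf 'PROT%s%s%s\n' "$rate" "$matrix" "$freqs"
}

build_protein_model GAMMA JTT      # prints PROTGAMMAJTT
build_protein_model GAMMA JTT F    # prints PROTGAMMAJTTF
```

The result can be passed straight to the -m option, for example -m "$(build_protein_model GAMMA JTT)".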

List of Amino-Acid Matrices

There are 10 different matrices that can be used with this option, each estimated from a different set of alignments. The RAxML manual lists their names and provides more information about them.

Author: Aidan Budd