Root

The root is the most recent common ancestor of all of the taxa in the tree. It is therefore the oldest part of the tree and tells us the direction of evolution, with genetic information flowing from the root towards the tips with each successive generation.

Most methods of phylogenetic reconstruction do not estimate the position of the root, partly because doing so increases the number of possible trees and therefore the time it takes to calculate the tree. An example of an unrooted tree that has subsequently been rooted is shown below in Figure 10.
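To give a sense of how much larger the rooted search space is, the sketch below counts the possible fully bifurcating trees for n labelled taxa using the standard formulas: (2n-5)!! unrooted trees and (2n-3)!! rooted trees. It only counts topologies; it is not how any particular reconstruction method explores tree space, and the function names are illustrative.

```python
def double_factorial(k: int) -> int:
    """Return k * (k-2) * (k-4) * ... down to 1 (or 1 if k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def n_unrooted(n: int) -> int:
    """Number of unrooted, fully bifurcating labelled trees for n >= 3 taxa."""
    return double_factorial(2 * n - 5)


def n_rooted(n: int) -> int:
    """Number of rooted, fully bifurcating labelled trees for n >= 2 taxa."""
    return double_factorial(2 * n - 3)


# Compare the two counts for small numbers of taxa.
for n in range(3, 11):
    print(f"{n:2d} taxa: {n_unrooted(n):>10d} unrooted, {n_rooted(n):>10d} rooted")
```

An unrooted binary tree with n tips has 2n-3 branches, and placing the root on any one of them gives a distinct rooted tree, so the rooted count is always 2n-3 times the unrooted count. For 10 taxa that is 2,027,025 unrooted versus 34,459,425 rooted trees.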

Figure 10 Rooting a tree: before and after (note the branch lengths are not to scale).

If you have difficulty visualising the rooting process (shown above in Figure 10), imagine that the tree is made from string and that you push a pin into the string at the root position, then rotate the remaining branches around the pin-point. The arrow indicates the direction of evolution implied by the root position.
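The pin analogy translates into a small procedure: pick a branch, place the root somewhere along it, and orient every other branch away from that point. The Python sketch below assumes a hypothetical representation of an unrooted tree as a dictionary mapping each node to its neighbours and branch lengths, and writes the rooted result in Newick format. The helper names (`from_edges`, `root_on_branch`) and the four-taxon example tree are illustrative assumptions, not taken from Figure 10 or from any particular package.

```python
from typing import Dict, List, Tuple

# Unrooted tree (hypothetical representation): node -> {neighbour: branch_length}
Tree = Dict[str, Dict[str, float]]


def from_edges(edges: List[Tuple[str, str, float]]) -> Tree:
    """Build a symmetric adjacency map from (node, node, length) triples."""
    tree: Tree = {}
    for a, b, length in edges:
        tree.setdefault(a, {})[b] = length
        tree.setdefault(b, {})[a] = length
    return tree


def root_on_branch(tree: Tree, u: str, v: str, frac: float = 0.5) -> str:
    """Put the root `frac` of the way along branch u-v (measured from u),
    orient all branches away from it, and return the rooted tree as Newick."""
    length = tree[u][v]

    def newick(node: str, came_from: str, blen: float) -> str:
        # Children are every neighbour except the one nearer the root.
        kids = [(nb, bl) for nb, bl in tree[node].items() if nb != came_from]
        if kids:
            label = "(" + ",".join(newick(nb, node, bl) for nb, bl in kids) + ")"
        else:
            label = node  # a tip keeps its taxon name
        return f"{label}:{blen:g}"

    # The chosen branch is split in two at the root (the "pin").
    return f"({newick(u, v, frac * length)},{newick(v, u, (1 - frac) * length)});"


# Hypothetical unrooted tree ((A,B),(C,D)); x and y are internal nodes.
tree = from_edges([("A", "x", 1.0), ("B", "x", 2.0), ("x", "y", 3.0),
                   ("C", "y", 1.0), ("D", "y", 1.0)])
print(root_on_branch(tree, "x", "y"))  # root on the internal branch
print(root_on_branch(tree, "A", "x"))  # root on A's terminal branch
```

Rooting this tree on its internal branch gives `((A:1,B:2):1.5,(C:1,D:1):1.5);`, whereas rooting it on A's terminal branch gives `(A:0.5,(B:2,(C:1,D:1):3):0.5);`. The branches and their total length are unchanged; only the implied direction of evolution differs.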

Rooting a tree affects its meaning

Deciding on an appropriate root position is critical for phylogenetic interpretation because the root fixes the direction of evolution and therefore affects any statement we make about patterns of relatedness. For example, in the unrooted tree above (Figure 10, left) we cannot say that “A is more closely related to B than it is to C”, because this would not be true if the root lay anywhere on the branches that connect A and B.
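The dependence of relatedness statements on the root can be checked mechanically. In a rooted tree, A is more closely related to B than to C when the most recent common ancestor (MRCA) of A and B is a more recent ancestor of A than the MRCA of A and C. The sketch below assumes a hypothetical dictionary representation of an unrooted tree and an illustrative four-taxon tree (not the one in Figure 10); it roots the same tree in three places and evaluates the statement each time. The helper names and the `"ROOT"` node label are assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Unrooted tree (hypothetical representation): node -> {neighbour: branch_length}
Tree = Dict[str, Dict[str, float]]


def from_edges(edges: List[Tuple[str, str, float]]) -> Tree:
    """Build a symmetric adjacency map from (node, node, length) triples."""
    tree: Tree = {}
    for a, b, length in edges:
        tree.setdefault(a, {})[b] = length
        tree.setdefault(b, {})[a] = length
    return tree


def parents_after_rooting(tree: Tree, u: str, v: str) -> Dict[str, str]:
    """Root on branch u-v via a new node 'ROOT'; return a child -> parent map."""
    parent = {u: "ROOT", v: "ROOT"}
    stack = [(u, v), (v, u)]  # (node, neighbour we arrived from)
    while stack:
        node, came_from = stack.pop()
        for nb in tree[node]:
            if nb != came_from:
                parent[nb] = node
                stack.append((nb, node))
    return parent


def ancestors(node: str, parent: Dict[str, str]) -> List[str]:
    """Lineage from `node` up to ROOT, starting with `node` itself."""
    lineage = [node]
    while lineage[-1] in parent:
        lineage.append(parent[lineage[-1]])
    return lineage


def mrca(a: str, b: str, parent: Dict[str, str]) -> str:
    """First ancestor of `a` that is also an ancestor of `b`."""
    b_lineage = set(ancestors(b, parent))
    return next(n for n in ancestors(a, parent) if n in b_lineage)


def closer_to(a: str, b: str, c: str, parent: Dict[str, str]) -> bool:
    """True if `a` shares a strictly more recent common ancestor with `b` than with `c`."""
    lineage = ancestors(a, parent)
    return lineage.index(mrca(a, b, parent)) < lineage.index(mrca(a, c, parent))


# Hypothetical unrooted tree ((A,B),(C,D)); x and y are internal nodes.
tree = from_edges([("A", "x", 1.0), ("B", "x", 2.0), ("x", "y", 3.0),
                   ("C", "y", 1.0), ("D", "y", 1.0)])

for branch in [("x", "y"), ("B", "x"), ("A", "x")]:
    p = parents_after_rooting(tree, *branch)
    print(branch, "A closer to B than C:", closer_to("A", "B", "C", p),
          "| A closer to C than B:", closer_to("A", "C", "B", p))
```

Rooted on the internal branch, A is closer to B; rooted on B's terminal branch (one of the branches connecting A and B), A is instead closer to C; rooted on A's terminal branch, neither statement holds because A is equally related to B and C.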

Where to root a tree? 

There are two main approaches that we can use to root a tree:

Outgroup rooting: A generally preferred approach is to include in our analysis one or more sequences that we know are more distantly related to our sequences of interest than those sequences are to one another. These sequences are usually referred to as ‘outgroups’. The root is then simply placed at the point where the outgroup(s) join the rest of the tree. The best outgroups are the most closely related sequences available that still fall definitely outside the group of interest. Outgroups that are too distantly related can be unreliable, because they may be difficult to align or their sequences may have become saturated with substitutions.

Midpoint rooting: This method assumes that all of your sequences are evolving at the same rate. Apply it cautiously, because this assumption does not hold for many biological datasets. The root is positioned at the midpoint of the longest path between any two tips in the tree, that is, halfway between the two most divergent sequences. If your taxa were not all sampled at the same time point, a slight modification of this method is required to take into account the time elapsed between samples.
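Both approaches reduce to choosing a branch on the unrooted tree. The Python sketch below illustrates them on one small tree, assuming a hypothetical dictionary representation (node mapped to neighbours and branch lengths) and illustrative helper names. Outgroup rooting looks for the branch whose far side contains exactly the outgroup tips; midpoint rooting finds the longest tip-to-tip path and walks halfway along it. The midpoint sketch assumes all tips were sampled at the same time, so it makes no correction for sampling dates.

```python
from typing import Dict, List, Set, Tuple

# Unrooted tree (hypothetical representation): node -> {neighbour: branch_length}
Tree = Dict[str, Dict[str, float]]


def from_edges(edges: List[Tuple[str, str, float]]) -> Tree:
    """Build a symmetric adjacency map from (node, node, length) triples."""
    tree: Tree = {}
    for a, b, length in edges:
        tree.setdefault(a, {})[b] = length
        tree.setdefault(b, {})[a] = length
    return tree


def tips_beyond(tree: Tree, node: str, came_from: str) -> Set[str]:
    """Tip labels reachable from `node` without stepping back to `came_from`."""
    kids = [nb for nb in tree[node] if nb != came_from]
    if not kids:
        return {node}
    return set().union(*(tips_beyond(tree, nb, node) for nb in kids))


def outgroup_branch(tree: Tree, outgroups: Set[str]) -> Tuple[str, str]:
    """Outgroup rooting: the branch (u, v) whose v side holds exactly the outgroup tips."""
    for u in tree:
        for v in tree[u]:
            if tips_beyond(tree, v, u) == outgroups:
                return u, v
    raise ValueError("the outgroups do not form a single clade on this tree")


def paths_from(tree: Tree, start: str) -> Dict[str, Tuple[float, List[str]]]:
    """Distance and node path from `start` to every node (paths in a tree are unique)."""
    found = {start: (0.0, [start])}
    stack = [start]
    while stack:
        node = stack.pop()
        dist, path = found[node]
        for nb, length in tree[node].items():
            if nb not in found:
                found[nb] = (dist + length, path + [nb])
                stack.append(nb)
    return found


def midpoint(tree: Tree) -> Tuple[str, str, float]:
    """Midpoint rooting: return (u, v, offset), the root lying `offset` along
    branch u->v, halfway along the longest tip-to-tip path."""
    tips = {n for n in tree if len(tree[n]) == 1}
    longest, path = -1.0, []
    for t in sorted(tips):  # sorted so ties are broken deterministically
        for other, (d, p) in paths_from(tree, t).items():
            if other in tips and d > longest:
                longest, path = d, p
    half, walked = longest / 2, 0.0
    for u, v in zip(path, path[1:]):
        if walked + tree[u][v] >= half:
            return u, v, half - walked
        walked += tree[u][v]
    raise ValueError("need at least two tips")


# Hypothetical tree: ingroup ((A,B),(C,D)) plus outgroup O; x, y, z are internal nodes.
tree = from_edges([("A", "x", 1.0), ("B", "x", 2.5), ("x", "z", 1.0), ("z", "y", 2.0),
                   ("z", "O", 4.0), ("C", "y", 1.0), ("D", "y", 1.0)])
print("outgroup root on branch:", outgroup_branch(tree, {"O"}))
print("midpoint root (u, v, offset from u):", midpoint(tree))
```

In this toy tree both methods place the root on the branch leading to O, because the outgroup is also the most divergent sequence. With unequal rates of evolution the two methods can disagree, which is one reason outgroup rooting is generally preferred when a suitable outgroup is available.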