This manual describes how to use the distance
program described in the article. For
installation instructions, see the source code
page.
distance [-v] [-k] [-s] [-o <outfile>] [-x <infofile>] <infiles>
The program reads sequences from one or more input files, computes distances,
errors and substitution rate matrix estimates, and prints the result to
stdout. Any messages (comments, warnings, errors) during the execution
are written to stderr.
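Because results and messages go to different streams, a calling script can collect them separately. The following is a minimal sketch, not part of the distance distribution: the helper name run_capture and the file name seqs.fasta are hypothetical, and it assumes distance is installed on the search path.

```python
import subprocess

def run_capture(cmd, stdin_text=None):
    """Run a command and return (stdout, stderr, exit status) as separate values."""
    proc = subprocess.run(cmd, input=stdin_text, capture_output=True, text=True)
    return proc.stdout, proc.stderr, proc.returncode

# Hypothetical usage, assuming distance is on the PATH and seqs.fasta exists:
#   matrices, messages, status = run_capture(["distance", "seqs.fasta"])
# matrices then holds the printed result, messages any comments or warnings.
```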
The input files are given either as filenames or by the special name
-- which instructs distance to read from stdin. Leaving
out the input file names will also cause distance to read from stdin.
The sequence files are read using the SEQIO
package by James
Knight. This package allows you to use a multitude
of sequence formats, including every common one.
The input sequences must be aligned. Each site with an insertion or deletion, as well as each site with an ambiguity symbol, will be removed. If the sequences are of different lengths, they will be cut off to the minimum length.
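The cleaning rule can be illustrated with a short sketch. This is not the program's own code. It assumes that truncation happens before site removal, that sequences are compared in upper case, and that any symbol other than A, C, G and T counts as a gap or ambiguity symbol. The name clean_alignment is hypothetical.

```python
# Sketch of the cleaning rule; an illustration, not the program's actual code.
VALID = set("ACGT")  # assumption: everything else is a gap or ambiguity symbol

def clean_alignment(seqs):
    """Truncate to the shortest sequence, then drop every column that
    contains a non-ACGT symbol in any sequence."""
    if not seqs:
        return []
    min_len = min(len(s) for s in seqs)
    trimmed = [s[:min_len].upper() for s in seqs]
    keep = [i for i in range(min_len)
            if all(s[i] in VALID for s in trimmed)]
    return ["".join(s[i] for i in keep) for s in trimmed]

cleaned = clean_alignment(["ACGT-ACN", "ACGTTGCGAA", "AC-TTACG"])
print(cleaned)  # ['ACTAC', 'ACTGC', 'ACTAC']
```

Here the sequences are first cut to eight sites, and the three columns holding "-" or "N" are then dropped from all of them.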
If the sequences are in a format without any information on sequence
names or identifiers, distance will create names of the form anonX,
where X is an integer, representing the order of the sequence (among the
unnamed sequences) in the input.
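The naming scheme can be sketched as follows. The function assign_names is hypothetical, and starting the count at 1 is an assumption, since the text does not say whether X begins at 0 or 1.

```python
def assign_names(names):
    """Replace missing names (None or empty) with anon1, anon2, ...

    The counter advances only for unnamed sequences, so X reflects the
    order among the unnamed sequences. Starting at 1 is an assumption.
    """
    result = []
    counter = 0
    for name in names:
        if name:
            result.append(name)
        else:
            counter += 1
            result.append(f"anon{counter}")
    return result

print(assign_names(["human", None, "", "mouse"]))
# ['human', 'anon1', 'anon2', 'mouse']
```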
The output consists of three matrices: the distance estimates, the error estimates and an estimate of the nucleotide substitution matrix, in that order. The output is designed so that the distances may be used as input to tree-building programs in the PHYLIP package (such as neighbor) without stripping off the error and substitution rate matrices. The rows in the substitution rate matrix are ordered as ACGT.
stdout.
distance to emit extra information about the calculations.
The program does not handle sequences that are too similar. If a nucleotide substitution has never occurred in the dataset, there will be a division by zero (handled gracefully), and the results will be corrupt.