User's Manual

This manual describes how to use the distance program described in the article. For installation instructions, see the source code page.

Synopsis

distance [-v] [-k] [-s] [-o <outfile>] [-x <infofile>] <infiles> 

Description

The program reads sequences from one or more input files, computes distances, errors and substitution rate matrix estimates, and prints the result to stdout. Any messages (comments, warnings, errors) during the execution are written to stderr.

The input files are given either as filenames or by the special name --, which instructs distance to read from stdin. Leaving out the input file names also causes distance to read from stdin. The sequence files are read using the SEQIO package by James Knight. This package supports a multitude of sequence formats, including every common one.

The input sequences must be aligned. Each site containing an insertion or deletion, as well as each site containing an ambiguity symbol, is removed. If the sequences are of different lengths, they are cut off to the minimum length.
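
The cleaning step described above can be sketched roughly as follows. This is a minimal Python illustration, not the program's actual code: the function name clean_alignment is hypothetical, and it assumes that any character other than A, C, G or T (compared case-insensitively) marks a site to be dropped.

```python
def clean_alignment(seqs):
    """Cut all sequences to the shortest length, then drop every column
    that contains a gap or an ambiguity symbol.

    Assumption: any character other than A, C, G, T counts as a gap or
    ambiguity symbol. Input is upper-cased before comparison.
    """
    if not seqs:
        return []
    valid = set("ACGT")
    min_len = min(len(s) for s in seqs)
    # Cut every sequence off at the minimum length.
    trimmed = [s[:min_len].upper() for s in seqs]
    # Keep only the columns where every sequence has a plain nucleotide.
    keep = [i for i in range(min_len)
            if all(s[i] in valid for s in trimmed)]
    return ["".join(s[i] for i in keep) for s in trimmed]


if __name__ == "__main__":
    for row in clean_alignment(["ACGTAC", "AC-TNCGG", "ACGTAC"]):
        print(row)
```

In this example the third column (a gap) and the fifth column (an N) are removed, and the trailing GG of the second sequence is cut off because the other sequences are shorter.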

If the sequences are in a format without any information on sequence names or identifiers, distance will create names of the form anonX, where X is an integer, representing the order of the sequence (among the unnamed sequences) in the input.
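
The naming rule can be illustrated with a short Python sketch. The function name assign_names is hypothetical, and the starting value of X is an assumption: the manual does not say whether the numbering begins at 0 or 1.

```python
def assign_names(names, start=1):
    """Replace missing names (None or empty string) with anonX.

    X counts only the unnamed sequences, in input order.
    Assumption: numbering starts at `start` (1 here); the real program
    may start at a different value.
    """
    result = []
    counter = start
    for name in names:
        if name:
            result.append(name)
        else:
            result.append(f"anon{counter}")
            counter += 1
    return result


if __name__ == "__main__":
    print(assign_names(["human", None, "mouse", ""]))
```

Note that the counter advances only for unnamed sequences, so named sequences in between do not leave holes in the numbering.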

The output consists of three matrices: the distance estimates, the error estimates, and an estimate of the nucleotide substitution rate matrix, in that order. The output is designed so that the distances may be used as input to tree-building programs in the PHYLIP package (such as neighbor) without stripping off the errors and the substitution rate matrix. The rows in the substitution rate matrix are ordered as ACGT.

Options

-k
Keep the prefixes in sequence identifiers introduced by the SEQIO package. The default is to remove them, but beware that ambiguity among the identifiers is not checked.
-o
Write the output to <outfile>. Default is to use stdout.
-s
Ensure that the sequence identifiers are at most 10 characters long, which some programs (e.g. in the PHYLIP suite) demand. Note that uniqueness among the identifiers is not checked.
-v
Verbose. Causes distance to emit extra information about the calculations.
-x
Specifies a file in which to put information about the computation. If not given, all extra information goes to stderr.
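
Since the -s option does not check identifiers for uniqueness, it may be worth checking them beforehand. The following Python sketch shows one way to do so. The function name find_collisions is hypothetical, and the sketch assumes that shortening means keeping the first 10 characters; the manual does not state how distance actually shortens identifiers.

```python
from collections import defaultdict


def find_collisions(identifiers, max_len=10):
    """Group identifiers that become identical when cut to max_len characters.

    Assumption: shortening is plain truncation to the first max_len
    characters; the real program may shorten identifiers differently.
    """
    groups = defaultdict(list)
    for ident in identifiers:
        groups[ident[:max_len]].append(ident)
    # Report only the shortened forms shared by more than one identifier.
    return {short: full for short, full in groups.items() if len(full) > 1}


if __name__ == "__main__":
    ids = ["Homo_sapiens_1", "Homo_sapiens_2", "Mus_musculus"]
    print(find_collisions(ids))
```

An empty result means that no two identifiers collide under this truncation rule.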

Bugs

The program does not handle sequences that are too similar. If a nucleotide substitution has never occurred in the dataset, there will be a division by zero (handled gracefully), and the results will be corrupt.
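
A dataset can be screened for this situation before running distance. The Python sketch below lists the nucleotide pairs that are never observed as a difference between two aligned sequences. It is only an approximation under stated assumptions: the function name missing_substitutions is hypothetical, substitutions are treated as unordered pairs, and the manual does not specify exactly which counts trigger the division by zero.

```python
from itertools import combinations


def missing_substitutions(seqs):
    """Return the unordered nucleotide pairs (e.g. "AG") that never appear
    at the same site in any two of the aligned sequences.

    Assumption: a substitution is "observed" when two sequences differ at
    a site and both symbols are plain nucleotides.
    """
    nucleotides = set("ACGT")
    all_pairs = {frozenset(p) for p in combinations("ACGT", 2)}
    seen = set()
    for a, b in combinations(seqs, 2):
        for x, y in zip(a, b):
            if x != y and x in nucleotides and y in nucleotides:
                seen.add(frozenset((x, y)))
    return sorted("".join(sorted(p)) for p in all_pairs - seen)


if __name__ == "__main__":
    print(missing_substitutions(["ACGT", "CAGT"]))
```

If the returned list is non-empty, the dataset may be too similar for reliable results, and the output of distance should be treated with caution.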

