Your internet browser stores recently viewed pages in a
disk buffer and may load the old version from your disk instead of
loading the updated page from our website. Please click on
Refresh/Reload (in the View menu of Internet Explorer or Firefox) to
force the browser to load the most recent version of this page.
On some computer systems it is not possible to download EXE files correctly. In that case, check the browser settings
(usually under security options). As a last resort, contact the system administrator or use a private internet facility with its own dial-in node and browser (AOL, CompuServe, or others).
1. DOS version limitations
Network 2.x will run on Macs with a DOS emulator as well as under Windows 95/98 and Windows 2000. Since June 2001 a Windows version (NETWORK 3.x or 4.x) has been available for download. The DOS version will therefore continue to be provided for free, but will not be extensively refined.
1.1 DOS version startup problems
A few users have reported problems with running Network 2.x on their
computers. In one case the solution was to download and run the program
from drive A. When too much RAM is occupied by other Windows programs
running in parallel, the program may time out or a subroutine may fail.
1.2 DOS pathname conventions
Due to a compiler bug, NETWORK 2.x often cannot be run if it is installed
in directories which do not conform to old DOS pathname conventions.
The directory names should not be longer than 8 characters and should
best not contain lower case characters, spaces, or special characters.
For example, the pathname
will be ok, but
can lead to an error, because "PROGRAMME" has 9 characters, and
can lead to an error, because "MYNETWORK" has 9 characters, and
can lead to an error, because "Programs" contains lower case characters.
The strange thing about this compiler bug is that these problems do
not always occur. (This bug has also been encountered
with other compilers which we have used.)
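The naming rules above can be checked with a short script before installing. The following Python sketch is not part of Network. The function name problem_directories, the example paths, and the choice to treat everything except A-Z, 0-9 and underscore as a "special character" are assumptions made for illustration only.

```python
import re

# Assumption: a safe directory name has 1 to 8 characters taken from
# upper case letters, digits and the underscore.
SAFE_NAME = re.compile(r"[A-Z0-9_]{1,8}")


def problem_directories(path):
    """Return the directory names in a DOS-style path that break the rules."""
    # Remove a leading drive letter such as "C:" before splitting.
    without_drive = re.sub(r"^[A-Za-z]:", "", path)
    names = [part for part in without_drive.split("\\") if part]
    return [name for name in names if not SAFE_NAME.fullmatch(name)]


# Hypothetical installation paths, used only as examples.
print(problem_directories(r"C:\NETWORK"))            # []
print(problem_directories(r"C:\PROGRAMME\NETWORK"))  # ['PROGRAMME']
print(problem_directories(r"C:\Programs\NETWORK"))   # ['Programs']
```

An empty result means that no directory name in the path breaks these rules. Because the compiler bug does not always show itself, a path that the script flags may still work on some computers.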
1.3 Japanese fonts
When NETWORK 2.x is run on computers using Japanese fonts (and presumably also
some other non-Western fonts), the text in the program fails to appear.
Please run the DOS program on PCs with US or other
Western codepage and keyboard settings, or use the Windows version of Network.
1.4 Hardware problems
We believe that problems with NETWORK 2.x happened on some (but not necessarily all) computers on
some local area networks (LANs) and that these were due to timing problems in the LAN-drivers
which are part of the operating system.
This update of Network 2.x attempts to circumvent these LAN-problems
using techniques which have been successful in the engineering software B2.
If problems nevertheless occur, please try running Network 2.x on a different
computer or a non-networked computer.
2. Compatibility between WIN and DOS versions
Network3.x for Windows accepts all input and output files generated by
Network2.x for DOS. Reverse compatibility (from 3.x back to 2.x) also
works, with the following exceptions:
(a) One file name ending has been renamed: *.mat (version 2.x) is now
*.rmf (version 3.x)
(b) Deletion coding (dash) is available in 3.x, but not in 2.x.
(c) Branch lengths are limited to 70 mutations in 3.x, and 25 mutations in 2.x.
(d) File names may be up to 255 characters long in version 3.x , but only
8 characters in version 2.x.
(e) Network3.x displays text on Japanese computers, whereas Network2.x does not.
3. Choice of data format
The network methods are designed for non-recombining DNA haplotypes, RNA or
amino acid sequences. The mutating units (characters) should be known and
coded at the highest possible resolution: for example, artefacts were often
produced in the human mtDNA RFLP literature by measuring independent
mutations at 3 adjacent nucleotides with only two endonucleases (Fig. 1 in
Bandelt et al. 1999).
Similarly, in human Y STRs, a compound STR such as DYS389II should be
resolved into its mutational subcomponents m, n, and q
(Forster et al. 2000) to avoid artefacts.
Multistate data typically are amino acid sequences, and also DNA sequences
containing nucleotide positions with more than two different nucleotides.
STR data are generally binary (if a single-repeat mutation mechanism
has generated the STR alleles). Multistate data can be analysed only by the
Median-Joining (MJ) network method, which is unreliable for longer branches, and not by the more robust Reduced Median (RM)
network method. Therefore, code your data in a multistate format (multistate *.rdf for DNA and multistate *.ami for amino acids) only if you are sure that they are not binary. Furthermore, it is good practice to explore every way in which multistate
data can be represented as binary data (for example by omitting multistate
nucleotide positions, or by grouping variants into transitions and
transversions) to run an exploratory RM analysis.
Binary data typically are STRs and closely related DNA sequences (within a species). Furthermore, RFLPs are always binary; for these you can choose the Torroni RFLP format (*.tor). If you have a mixture of STR
data, RFLP data and/or binary point mutation DNA data from one chromosomal segment, then
choose the Y-STR data format (*.ych) and pretend that each RFLP and/or point mutation
is an STR with two length variants. DNA sequences which happen to be
binary at each nucleotide position can be
entered as binary rdf format (*.rdf), or alternatively as Torroni RFLP format (*.tor) if it is more convenient to pretend that each point mutation is a recognition site gain.
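Whether a DNA alignment is binary at every nucleotide position can be checked before choosing a format. The Python sketch below is not part of Network. It assumes that the sequences are already aligned and held as plain strings, and it counts every distinct symbol in a column (including any ambiguity code or gap symbol) as a separate state. The function name multistate_positions is hypothetical.

```python
def multistate_positions(sequences):
    """Return the 0-based positions at which more than two states occur."""
    lengths = {len(sequence) for sequence in sequences}
    if len(lengths) > 1:
        raise ValueError("sequences must be aligned to the same length")
    # zip(*sequences) walks through the alignment column by column.
    return [
        position
        for position, column in enumerate(zip(*sequences))
        if len(set(column)) > 2
    ]


# Made-up example sequences: only the last column has three states.
print(multistate_positions(["ACGT", "ACGA", "ATGC"]))  # [3]
```

An empty result suggests that the data can be entered in a binary format. The positions listed otherwise are the candidates for omission or for regrouping into transitions and transversions in an exploratory RM analysis.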
4. Data entry
Small data sets are best re-entered manually using the explicit data entry
options in the File menu.
Remember that you cannot code deletions in Network2.0; instead, code deletions with any
nucleotide and replace them with a dash (-) afterwards using Notepad/Editor if you wish to
use the file in Network4.x.
For entering FASTA format into Network, the new software DNA Alignment
Alternatively, users may wish to reformat their existing files to
Network specifications using for example Editor/Notepad. Please consult the example files in the Network download:
five acceptable entry formats are multistate *.rdf, binary *.rdf, *.ych, *.ami and *.tor. Two of these formats, multistate *.rdf and amino acid format (*.ami) are not included as examples in the DOS version download.
Taxon names in any file format should not be longer
than 6 characters, and each taxon name MUST be unique in the file.
Nucleotide, RFLP and STR names must not be longer than 5
characters in any of the three file formats. Sequence length (number of
characters) must not be longer than 500 positions in any file format.
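The limits on taxon names and sequence length can be checked before a file is loaded. The Python sketch below is not part of Network and does not read Network files. It assumes that the caller has already extracted (taxon name, sequence) pairs by some other means. The function name check_limits and the wording of the messages are hypothetical.

```python
from collections import Counter

MAX_TAXON_NAME = 6   # characters per taxon name
MAX_POSITIONS = 500  # characters (positions) per sequence


def check_limits(records):
    """Check (taxon_name, sequence) pairs and return a list of problems."""
    problems = []
    # Each taxon name must be unique in the file.
    name_counts = Counter(name for name, _ in records)
    for name, count in name_counts.items():
        if count > 1:
            problems.append(f"{name}: name used {count} times")
    for name, sequence in records:
        if len(name) > MAX_TAXON_NAME:
            problems.append(f"{name}: name longer than {MAX_TAXON_NAME} characters")
        if len(sequence) > MAX_POSITIONS:
            problems.append(f"{name}: sequence longer than {MAX_POSITIONS} positions")
    return problems


# Made-up records: the same taxon name appears twice.
print(check_limits([("TIB01", "ACGT"), ("TIB01", "ACGA")]))
```

An empty list means that none of the three limits checked here is exceeded. The limit of 5 characters for nucleotide, RFLP and STR names is not checked in this sketch.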
For Roehl data format (*.rdf), consult the example in the download.
For Y STR format (*.ych) note that there is a limit of 100 STR loci.
Furthermore, beware that each STR entered in ych-format will be broken
up into several characters when Network converts it into rdf-format,
possibly exceeding the limit of 500 positions per sequence; a trial run is
Torroni RFLP format (*.tor) is the simplest to generate; consult the self-explanatory *.tor example file in the download.
Common formatting errors.
A single format error may cause Network to produce artefacts without an
Beware that MS Word and Windows Wordpad are unsuitable for editing your
data, because they can insert/delete spaces unpredictably. This causes
Network to produce artefacts.
It is safe to use the Windows text editor NOTEPAD (called
EDITOR on some non-US-language Windows versions).
In Network for Windows, the file format is not recognised if the appropriate file ending (*.rdf or *.ych or *.tor) is missing. Network for DOS is flexible in this respect.
Files generate error messages if there are empty lines at the end of a file (often the case when converting from Excel).
Files generate error messages if values are placed in quotation marks (often the case when converting from Excel).
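The two Excel-related problems above (quotation marks and empty lines at the end of the file) can be removed with a few lines of code instead of by hand. The Python sketch below is not part of Network. It assumes that double quotation marks never form a legitimate part of the data and that a file ending in exactly one line break is acceptable. The function name clean_export is hypothetical. Spaces inside lines are left untouched.

```python
def clean_export(text):
    """Remove double quotation marks and trailing empty lines from text."""
    lines = text.replace('"', "").splitlines()
    # Drop empty or whitespace-only lines from the end of the file.
    while lines and not lines[-1].strip():
        lines.pop()
    return "\n".join(lines) + "\n" if lines else ""


# Made-up export with quoted values and two empty lines at the end.
exported = '"TIB01" "ACGT"\n"TIB02" "ACGA"\n\n\n'
print(repr(clean_export(exported)))  # 'TIB01 ACGT\nTIB02 ACGA\n'
```

It is advisable to inspect the cleaned file in NOTEPAD before loading it, because this sketch corrects only these two problems and does not check the file format as a whole.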
5. Example files
Example files for Torroni RFLP format (Tibetan mtDNA RFLPs and east Asian mtDNA RFLPs), Y STR format
(Amerind Y STRs) and Roehl data format (Nuu Chah Nulth mtDNA control
region) are included in the download. All four analyses are discussed in
the literature: consult Figs 1 and 6 in Bandelt et al. (1999) for the
Tibetans; Fig 3 in Forster et al. (2001) for the east Asians; Fig 4 in
Forster et al. (2000) for the Amerinds; and Fig 7 in Bandelt et al. (1995)
for the Nuu Chah Nulth.
6. Network calculation
If your data are binary and you expect branches which are more than a few
mutations long (you can get an idea by displaying the mismatch distribution
available in the File menu), then preferentially use the RM algorithm,
otherwise resort to the MJ algorithm.
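For readers who want to see what a mismatch distribution contains, the idea can be sketched in Python as a count of pairwise differences between sequences. This is an illustration only and is not the code used by Network. It assumes aligned sequences of equal length, counts each sequence once (ignoring how often it occurs in the sample), and gives every position the same weight. The function name mismatch_distribution is hypothetical.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(sequences):
    """Map each number of pairwise differences to the number of pairs."""
    counts = Counter(
        sum(a != b for a, b in zip(first, second))
        for first, second in combinations(sequences, 2)
    )
    return dict(sorted(counts.items()))


# Made-up sequences: two pairs differ at 1 position, one pair at 2.
print(mismatch_distribution(["AAAA", "AAAT", "AATT"]))  # {1: 2, 2: 1}
```

A distribution in which many pairs differ at more than a few positions points to longer branches, which is the situation in which RM is preferred for binary data.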
The first thing you should do with any data file is to call up the Change
Weights option within RM or MJ to check whether all your characters or
nucleotide positions were entered and weighted correctly. We suggest
running an initial analysis with the default settings, that is, r set to 2
if you choose RM, or epsilon set to zero if you choose MJ.
If the networks turn out to be clean (i.e. treelike, and without large
cycles), you should experiment with slightly higher settings to visualise
the extent of homoplasy (potentially due to recurrent mutations, sequence
errors, recombination etc.).
If on the other hand the initial network is messy (high-dimensional cubes)
or contains an empty cycle larger than a rectangle (only in MJ networks),
then something is amiss (recurrent mutations, sequence errors,
recombination). To explore or overcome the problem, activate the
frequency>1 option before running the algorithms; this option will select
only those sequences confirmed at least twice in the data set. If the
network is still messy, you can investigate whether this may be due to a
few rapidly mutating characters by consulting the statistics option. These
characters are candidates for downweighting before running another
analysis. Weighting may be particularly relevant for STRs: the program
internally codes the entered STRs assuming a single-repeat mutation
mechanism. If this is known to be unrealistic for a given STR, the
offending STR should be dealt with by downweighting it as a whole, or by
differentially weighting its length transitions (labeled with a, b, c...).
In general, weights often are most effective when chosen conservatively,
e.g. a known tenfold higher mutation rate for a nucleotide position should
be translated into a much less extreme than tenfold lower weight setting in the
If despite these efforts the network still contains many high-dimensional
features, then (for binary data files), RM and MJ can be applied
sequentially. First, use RM to generate a *.rmf file (the *.out file will
also be generated but is of no consequence here); then, apply MJ to the
7. Large data sets
If your data set contains hundreds of sequences and the corresponding network is consequently difficult to visualise, use the star contraction option prior to the phylogenetic analysis. The star contraction option reduces large data sets to smaller data sets by identifying and contracting any starlike phylogenetic cluster into one ancestral type. The reduced data set can then be run in a phylogenetic algorithm (either our network methods or other tree-building methods) to produce a simplified skeleton phylogeny. In the graphical display, the algorithm remembers which sequences were contracted. An example star contraction analysis is included in the download and is discussed in Forster et al. (2001). Note that the publication contains some typographical errors (which do not influence the presented values or conclusions) on pages 1870/1871: for the Asian analysis the value for delta is set to 5, and the first round of star contraction reduces the data from 245 sequence types to 113 sequence types.
8. Time estimates
Time estimates for nodes in the phylogeny are not available in the DOS version, only in the Windows version.
9. Known bugs
In the case of multistate DNA files, the "Calculate Network" menu of
Network2.0c offers an option to change the transition/transversion ratio
(default setting 1:1). This option should be used with caution however, as
a different ratio can cause the MJ method to enter a continuous loop during
the network calculation.
Remedy: If differential weighting for transitions and transversions is
desired, identify and weight the relevant nucleotide positions manually.
In multistate DNA file format, there is no coding for deleted nucleotides:
"D" stands for "A or G or T" as recommended by the nomenclature committee
of the International Union of Biochemistry. (Note however that in the
distribution option, "D" DOES stand for "deletion".)
Remedy: Choose a nucleotide (A, G, C, or T) to code a deletion.
In the MJ algorithm for epsilon > 0, network links may be erroneously omitted when calculating complex data sets. Remedy: use Network 4.x.
Please cite this website (fluxus-engineering.com), as well as
Bandelt et al. (1995)
when using RM,
Bandelt et al. (1999)
when using MJ, or
Forster et al. (2001)