Comparative Study
Genome Res. 2004 Jul;14(7):1394-403.
doi: 10.1101/gr.2289704.

Mauve: multiple alignment of conserved genomic sequence with rearrangements

Aaron C E Darling et al. Genome Res. 2004 Jul.

Abstract

As genomes evolve, they undergo large-scale evolutionary processes that present a challenge to sequence comparison not posed by short sequences. Recombination causes frequent genome rearrangements, horizontal transfer introduces new sequences into bacterial chromosomes, and deletions remove segments of the genome. Consequently, each genome is a mosaic of unique lineage-specific segments, regions shared with a subset of other genomes and segments conserved among all the genomes under consideration. Furthermore, the linear order of these segments may be shuffled among genomes. We present methods for identification and alignment of conserved genomic DNA in the presence of rearrangements and horizontal transfer. Our methods have been implemented in a software package called Mauve. Mauve has been applied to align nine enterobacterial genomes and to determine global rearrangement structure in three mammalian genomes. We have evaluated the quality of Mauve alignments and drawn comparison to other methods through extensive simulations of genome evolution.

Figures

Figure 1
A pictorial representation of greedy breakpoint elimination in three genomes. (A) The algorithm begins with the initial set of matching regions (multi-MUMs) represented as connected blocks. Blocks below a genome's center line are inverted relative to the reference sequence. (B) The matches are partitioned into a minimum set of collinear blocks. Each sequence of identically colored blocks represents a collinear set of matching regions. One connecting line is drawn per collinear block. Block 3 (yellow) has a low weight relative to other collinear blocks. (C) As low-weight collinear blocks are removed, adjacent collinear blocks coalesce into a single block, potentially eliminating one or more breakpoints. Gray regions within collinear blocks are targeted by recursive anchoring.
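The procedure in Figure 1 can be illustrated with a minimal Python sketch restricted to two genomes. It is not the Mauve implementation. The `Match` record, the function names, the use of summed match length as block weight, and the example coordinates are all assumptions made for illustration. Mauve itself works on multi-MUMs across several genomes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """Hypothetical pairwise match record (not a Mauve data structure)."""
    pos0: int    # start coordinate in genome 0 (the reference)
    pos1: int    # start coordinate in genome 1
    length: int  # match length, used here as the match weight (assumption)
    strand: int  # +1 forward, -1 inverted relative to the reference


def collinear_blocks(matches):
    """Partition matches into maximal collinear runs (panel B)."""
    by0 = sorted(matches, key=lambda m: m.pos0)
    rank1 = {m: r for r, m in enumerate(sorted(matches, key=lambda m: m.pos1))}
    blocks = []
    for m in by0:
        if blocks:
            prev = blocks[-1][-1]
            # Neighbours in genome 0 stay in one block only if they share a
            # strand and are also neighbours in genome 1, in the direction
            # that strand implies (+1 step forward, -1 step when inverted).
            if prev.strand == m.strand and rank1[m] - rank1[prev] == m.strand:
                blocks[-1].append(m)
                continue
        blocks.append([m])
    return blocks


def block_weight(block):
    """Weight of a block, taken here as the summed match length."""
    return sum(m.length for m in block)


def greedy_breakpoint_elimination(matches, min_weight):
    """Drop the lightest block until every block reaches min_weight (panel C)."""
    matches = list(matches)
    while matches:
        blocks = collinear_blocks(matches)
        lightest = min(blocks, key=block_weight)
        if block_weight(lightest) >= min_weight:
            return blocks
        dropped = set(lightest)
        matches = [m for m in matches if m not in dropped]
    return []


# Made-up example: a short match (length 15) sits between two well-supported
# regions in genome 0 but lies elsewhere in genome 1.
example = [
    Match(0, 0, 100, 1),
    Match(150, 150, 120, 1),
    Match(300, 900, 15, 1),
    Match(400, 320, 90, 1),
    Match(550, 470, 110, 1),
]
initial_blocks = collinear_blocks(example)
final_blocks = greedy_breakpoint_elimination(example, min_weight=50)
```

In this example the initial partition has three blocks. Removing the low-weight block lets its two neighbours coalesce into a single block, which eliminates the breakpoints it introduced.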
Figure 2
An unrooted phylogenetic tree relating the nine enterobacterial genomes in Table 1. The tree is a phylogenetic guide tree calculated using Neighbor Joining by the Mauve alignment system.
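For readers unfamiliar with Neighbor Joining, the sketch below shows only its pair-selection step: given a distance matrix, it picks the two taxa that minimize the standard Q criterion. The function name and the example distances are illustrative assumptions. This page does not state how Mauve derives its distances, so the matrix here is not taken from the enterobacterial data.

```python
def nj_pick_pair(dist):
    """Return ((i, j), q) for the pair minimizing the Neighbor Joining Q value.

    dist is a symmetric matrix (list of lists) with a zero diagonal.
    Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
    """
    n = len(dist)
    totals = [sum(row) for row in dist]
    best_pair, best_q = None, None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - totals[i] - totals[j]
            if best_q is None or q < best_q:
                best_pair, best_q = (i, j), q
    return best_pair, best_q


# Made-up distances among five taxa, for illustration only.
example_dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
pair, q_value = nj_pick_pair(example_dist)
```

A full Neighbor Joining run repeats this step, replacing the chosen pair with a new internal node and recomputing distances until the tree is resolved.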
Figure 3
The performance of Mauve (left) and Multi-LAGAN (right) when aligning sequences evolved with increasing amounts of nucleotide substitution and indels. The multi-MUM anchoring technique used by Mauve limits its ability to align distantly related sequences. Multi-LAGAN version 1.2 did not complete the alignments of genomes without indels, resulting in the black row at the bottom. The substitution and indel rate observed in the enterobacteria is denoted by an asterisk (*).
Figure 4
The performance of Mauve (left) and Shuffle-LAGAN (right) when aligning two sequences evolved with increasing amounts of nucleotide substitution and inversions. Mauve is clearly more accurate than Shuffle-LAGAN at lower substitution rates. Shuffle-LAGAN version 1.2 did not complete some alignments without rearrangements, resulting in black entries. The observed substitution and inversion rate in the enterobacteria is denoted by an asterisk (*).
Figure 5
The performance of Mauve when aligning sequences evolved with rates similar to those observed among the group of nine enterobacteria. In this experiment, the substitution, indel, and inversion frequencies were held constant at rates similar to those observed in the enterobacteria. The asterisk (*) denotes the combination of large and small horizontal transfer rates observed in the enterobacteria. As the rate of large horizontal transfer increases, the amount of lineage-specific sequence relative to backbone grows. Because Mauve cannot align large lineage-specific regions, the alignment score drops. When scored only on regions considered backbone sequence, the accuracy is consistently above 98%.
Figure 6
Locally collinear blocks identified among the nine enterobacterial genomes listed in Table 1. Each contiguously colored region is a locally collinear block, a region without rearrangement of homologous backbone sequence. LCBs below a genome's center line are in the reverse complement orientation relative to the reference genome. Lines between genomes trace each orthologous LCB through every genome. Large gray regions within an LCB signify the presence of lineage-specific sequence at that site. Each of the 45 blocks has a minimum weight of 69. The Shigella and Salmonella genomes have undergone more genome rearrangements than the E. coli, possibly because of the presence of specific mobile genetic elements. The computation consumed ∼3 h on a 2.4-GHz workstation with 1 GB of memory. The figure was generated by the Mauve rearrangement viewer.
Figure 7
Mauve visualization of locally collinear blocks identified between concatenated chromosomes of the mouse, rat, and human genomes. Each of the 1251 blocks has a minimum weight of 90. Red vertical bars demarcate interchromosomal boundaries. The Mauve rearrangement viewer enables users to interactively zoom in on regions of interest and examine the local rearrangement structure. The computation consumed ∼12 h on a 1.6-GHz workstation with 2.5 GB of memory.

WEB SITE REFERENCES

    1. http://gel.ahabs.wisc.edu/mauve; the Mauve alignment system and visualization environment.
