Rainbow: An integrated tool for efficient clustering and assembling RAD-seq reads

Academic Article

Abstract

  • Motivation: The restriction-site associated DNA sequencing (RAD-seq) method is an innovation that takes full advantage of next-generation sequencing technology. By clustering paired-end short reads into groups, each with its own unique tag, the RAD-seq assembly problem is divided into subproblems. Clustering and assembling millions of RAD-seq reads quickly and accurately, in the presence of sequencing errors, different levels of heterozygosity and repetitive sequences, is a challenging problem.
  • Results: Rainbow is developed to provide an ultra-fast and memory-efficient solution for clustering and assembling short reads produced by RAD-seq. First, Rainbow clusters reads using a spaced seed method. Then, it applies a strategy similar to heterozygote calling to divide potential groups into haplotypes in a top-down manner. Along a guide tree, it then iteratively merges sibling leaves in a bottom-up manner if they are similar enough, where similarity is defined by comparing the second reads of a RAD segment. This approach tries to collapse heterozygotes while discriminating repetitive sequences. Finally, Rainbow uses a greedy algorithm to locally assemble the merged reads into contigs. Rainbow outputs not only the optimal but also suboptimal assembly results. Based on simulated data and a real guppy RAD-seq dataset, we show that Rainbow is more competent than other tools in dealing with RAD-seq data. © 2012 The Author.
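The idea behind spaced-seed clustering can be illustrated with a minimal Python sketch. It is not Rainbow's implementation: the interleaved seed masks, the hypothetical names `spaced_seed_cluster` and `hamming`, the `max_mismatch` parameter, and the single-linkage union-find grouping are all illustrative assumptions. Seed i keeps only the read positions p with p mod (max_mismatch + 1) = i. Two equal-length reads that differ at no more than max_mismatch positions must leave at least one of these max_mismatch + 1 position classes untouched (pigeonhole principle), so they share at least one seed key exactly and are found as candidates in a hash lookup; each candidate pair is then verified by Hamming distance.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length reads."""
    return sum(x != y for x, y in zip(a, b))


class UnionFind:
    """Disjoint-set structure used to merge reads into clusters."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def spaced_seed_cluster(reads: list[str], max_mismatch: int = 2) -> list[list[int]]:
    """Hypothetical sketch: single-linkage clustering of equal-length reads
    whose Hamming distance is <= max_mismatch, using interleaved spaced seeds."""
    n_seeds = max_mismatch + 1
    uf = UnionFind(len(reads))
    for i in range(n_seeds):
        # Seed i: the read's characters at positions i, i + n_seeds, i + 2*n_seeds, ...
        buckets: dict[str, list[int]] = defaultdict(list)
        for idx, read in enumerate(reads):
            buckets[read[i::n_seeds]].append(idx)
        # Reads sharing a seed key are candidates; verify with the full distance.
        for members in buckets.values():
            for a, b in combinations(members, 2):
                if uf.find(a) != uf.find(b) and hamming(reads[a], reads[b]) <= max_mismatch:
                    uf.union(a, b)
    clusters: dict[int, list[int]] = defaultdict(list)
    for idx in range(len(reads)):
        clusters[uf.find(idx)].append(idx)
    return sorted(clusters.values())


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "ACGTACGTAA", "ACGTTCGTAA", "TTTTGGGGCC"]
    print(spaced_seed_cluster(reads, max_mismatch=1))  # [[0, 1, 2], [3]]
```

In the usage example, reads 0 and 2 differ at two positions but are linked through read 1, which is one mismatch away from each; read 3 stays alone. Comparing every pair inside a bucket is quadratic in bucket size, so this sketch only shows the candidate-finding logic, not the memory and speed engineering the abstract claims for Rainbow.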

  • Published In: Bioinformatics (journal)
  • Author List: Chong Z; Ruan J; Wu CI
  • Start Page: 2732
  • End Page: 2737
  • Volume: 28
  • Issue: 21