8017 PHAST: Phage Assembly Suite and Tutorial

Saturday, February 18, 2012
Exhibit Hall A-B1 (VCC West Building)
D. Leland Taylor, Davidson College, Gainesville, GA
Laurie Heyer, Davidson College, Davidson, NC
Background: Next-generation sequencing technologies have greatly reduced the cost of sequencing genomes. With current technology, a genome must be broken into fragments that are sequenced individually, producing short sequences called “reads.” Software then pieces these reads back together in a process known as genome assembly. PHAST (Phage Assembly Suite and Tutorial) is an online set of modules designed to teach the genome assembly process. The website includes a tutorial detailing the choices and issues involved in genome assembly, and it also lets users assemble small genomes of their own. PHAST uses phage genomes sequenced through HHMI’s phage hunter program. Phages are well suited to real-time assembly because their genomes are very small and can be assembled with relatively high confidence using modest computational resources.
Methods: PHAST simplifies the process of cleaning and assembling reads, allowing users to assemble genomes in real time while focusing on the basics of the mathematical and computational algorithms used in assembly. The website provides a simple interface that cleans and assembles a phage genome in a seamless workflow. All of the phages on PHAST were sequenced on Roche’s 454 sequencing platform, so PHAST is optimized for small genomes sequenced with 454 technology. PHAST cleans the reads with sff_extract, an open-source script, and assembles the resulting FASTA-format reads with MIRA, an open-source genome assembler.
Results: PHAST consists of a set of online tutorials and an assembly workflow. The tutorial includes a brief overview of sequencing technologies, an introduction to assembly terminology, an overview of assembly methods, and a detailed description of the more popular genome assembly methods. Users can assemble small genomes, changing the parameters based on the information in the tutorial, and PHAST assembles the selected genome in real time.
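The basic idea of assembly, cutting a genome into overlapping reads and stitching them back together, can be sketched with a toy greedy overlap assembler. This is a teaching illustration only and not how MIRA or PHAST actually assemble reads: it assumes error-free, fixed-length reads from a short made-up sequence, and the function names `shotgun`, `overlap`, and `greedy_assemble` are hypothetical.

```python
def shotgun(genome: str, read_len: int, step: int) -> list[str]:
    """Cut a genome into overlapping, error-free reads of fixed length (toy model)."""
    reads = [genome[i:i + read_len] for i in range(0, len(genome) - read_len + 1, step)]
    if (len(genome) - read_len) % step:
        reads.append(genome[-read_len:])  # make sure the last bases are covered
    return reads


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`, or 0 if below min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of contigs with the longest overlap until none remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    # Drop reads that lie entirely inside another read; they add no new sequence.
    contigs = [r for r in contigs if not any(r != s and r in s for s in contigs)]
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:  # no overlaps left: return whatever contigs remain
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


genome = "ATGCGTACCTGAAGTCCGAT"  # made-up 20-base sequence without repeats
reads = shotgun(genome, read_len=8, step=4)[::-1]  # reads arrive in no particular order
contigs = greedy_assemble(reads)
print(reads)
print(contigs)  # a single contig equal to the original sequence
```

Greedy merging succeeds here only because the toy sequence has no repeats; when repeats are longer than the overlap threshold, this approach can join reads in the wrong order.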
In the workflow, users can, for example, skip read cleaning or assemble only a subset of the reads, then compare the effects of these choices on the final assembly. Users thereby see directly how changing the parameters explained in the tutorial affects the assembled genome. A simple menu gives each user access to all of their assemblies, along with data on each one. The website is accessible at compbio.davidson.edu.
Conclusions: PHAST is designed to expose undergraduates early in their college careers to an application of computational biology, a field that blends mathematics, computer science, and biology. The tool teaches students about genome assembly and sequencing technologies, encourages them to continue studying in STEM fields, and prepares them for the future of genome sequencing. PHAST will enhance the comprehension of students participating in the HHMI-funded phage genome-sequencing project.
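Assembling only a subset of the reads lowers sequencing depth, which in turn lowers how much of the genome the reads cover. A small simulation can give a rough feel for that trade-off. It assumes uniformly placed, error-free reads, and the genome length, read length, read count, and function names below are hypothetical, not taken from PHAST. Under this idealized model, mean depth is (number of reads × read length) / genome length, and the expected fraction of bases covered at least once is roughly 1 - exp(-depth).

```python
import random


def sample_read_starts(genome_len: int, n_reads: int, read_len: int,
                       rng: random.Random) -> list[int]:
    """Uniformly random start positions for error-free reads (idealized shotgun model)."""
    return [rng.randrange(genome_len - read_len + 1) for _ in range(n_reads)]


def coverage_stats(starts: list[int], genome_len: int, read_len: int) -> tuple[float, float]:
    """Return (mean depth, fraction of genome positions covered by at least one read)."""
    covered = [False] * genome_len
    for s in starts:
        covered[s:s + read_len] = [True] * read_len
    mean_depth = len(starts) * read_len / genome_len
    return mean_depth, sum(covered) / genome_len


GENOME_LEN = 50_000  # hypothetical phage-sized genome, in bases
READ_LEN = 400       # hypothetical read length
rng = random.Random(42)
all_starts = sample_read_starts(GENOME_LEN, 2_000, READ_LEN, rng)

results = {}
for fraction in (0.1, 0.25, 1.0):
    subset = all_starts[: int(len(all_starts) * fraction)]  # nested subsets of one read set
    depth, covered = coverage_stats(subset, GENOME_LEN, READ_LEN)
    results[fraction] = (depth, covered)
    print(f"{fraction:4.0%} of reads: mean depth {depth:.1f}x, {covered:.1%} of bases covered")
```

Because the subsets are nested, coverage can only grow as more reads are added. Uncovered stretches become gaps in the assembly, so a 10% subset would be expected to give a more fragmented result than the full read set.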