Scientific Software Support and Bioinformatics Core Facility

  • Jonathan Epstein, MS, Head, Unit on Biologic Computation

We provide end-user scientific software support for laboratories within NICHD's Division of Intramural Research. Operating as a core facility, we perform most of our work in conjunction with or on behalf of these laboratories.

Next-generation sequencing and a new Molecular Genetics Laboratory

Since the DIR recently acquired its own Applied Biosystems SOLiD 4 next-generation sequencing instrument, we have been actively involved in producing meaningful analyses of the data it generates. The primary goal of the new Molecular Genetics Laboratory is to operate as a certified clinical (CLIA) laboratory for sequencing human genes. Thus far, we have sequenced the DNA of several individuals, using Agilent SureSelect technology to capture only the genes of interest.

Our analyses to date have included calculating exon coverage for the regions targeted by the SureSelect assay, as well as the corresponding coverage for whole exomes. In addition, we developed web-friendly tools that let investigators examine observed polymorphisms in industry-standard viewers such as the UCSC (Santa Cruz) Genome Browser and the Broad Institute's Integrative Genomics Viewer (IGV). More recently, we automated the design of the "baits" for the SureSelect method to save time and reduce human error.
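
The two computations described above can be illustrated with a minimal sketch. It is not the facility's actual software: the per-base depth dictionary keyed by (chromosome, position), the 0-based half-open coordinates, the 20x coverage threshold, the 120-nt bait length, and the 60-nt tiling step are all assumptions chosen for illustration, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical region type: (chromosome, start, end), 0-based, half-open.
Region = Tuple[str, int, int]


def exon_coverage(depth: Dict[Tuple[str, int], int], exon: Region,
                  min_depth: int = 20) -> Tuple[float, float]:
    """Return (mean depth, fraction of bases with depth >= min_depth) for one exon.

    Positions missing from `depth` are treated as zero coverage.
    """
    chrom, start, end = exon
    depths = [depth.get((chrom, pos), 0) for pos in range(start, end)]
    if not depths:
        return 0.0, 0.0
    mean = sum(depths) / len(depths)
    covered = sum(1 for d in depths if d >= min_depth) / len(depths)
    return mean, covered


def tile_baits(region: Region, bait_len: int = 120, step: int = 60) -> List[Region]:
    """Tile fixed-length baits across a target region (assumed bait length and step)."""
    chrom, start, end = region
    if end - start <= bait_len:
        # Short target: center a single bait on it, clamped at position 0.
        s = max(0, (start + end) // 2 - bait_len // 2)
        return [(chrom, s, s + bait_len)]
    baits = []
    pos = start
    while pos + bait_len < end:
        baits.append((chrom, pos, pos + bait_len))
        pos += step
    # Final bait sits flush with the region end so no target bases are left untiled.
    baits.append((chrom, end - bait_len, end))
    return baits
```

Overlapping tiles (step smaller than bait length) are a common choice because they give every target base at least two chances of capture; the real design rules for SureSelect baits may differ.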

Mass spectrometry

We continue to collaborate with the Mass Spectrometry Core Facility. Our earlier software for the de novo identification of peptides included a large data set (LIPCUT) that exhaustively enumerates all possible amino acid combinations falling within a given mass range. We recently developed software that enumerates the elemental compositions of peptides in an analogous fashion; favorable combinatorics allowed us to extend this enumeration to higher-mass peptides (4,500 daltons, versus 1,750 daltons for LIPCUT). We are seeking to combine these resources with a newly developed isotopic clustering algorithm to improve existing peptide fragmentation database searches by submitting refined versions of the original spectrum to the search engine.
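
The idea behind an exhaustive composition table such as LIPCUT can be sketched as a bounded search over multisets of residues. This is a hypothetical illustration, not the LIPCUT code: it uses standard monoisotopic residue masses, merges the isobaric residues I and L, and is only practical for narrow mass windows, since the number of compositions grows rapidly with mass.

```python
from typing import Dict, List, Tuple

# Standard monoisotopic residue masses in daltons; I and L are isobaric, so "L" stands for both.
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water per peptide for the free N- and C-termini


def compositions_in_range(lo: float, hi: float) -> List[Tuple[str, ...]]:
    """Enumerate every amino acid composition whose neutral peptide mass lies in [lo, hi]."""
    residues = sorted(RESIDUE_MASS, key=RESIDUE_MASS.get)  # lightest first
    results: List[Tuple[str, ...]] = []

    def extend(start: int, mass: float, comp: List[str]) -> None:
        total = mass + WATER
        if comp and lo <= total <= hi:
            results.append(tuple(comp))
        # Only residues at index >= start are added, so each multiset appears once.
        for i in range(start, len(residues)):
            m = RESIDUE_MASS[residues[i]]
            if total + m > hi:
                break  # later residues are heavier, so they overshoot as well
            comp.append(residues[i])
            extend(i, mass + m, comp)
            comp.pop()

    extend(0, 0.0, [])
    return results
```

For example, `compositions_in_range(132.0, 132.1)` returns the compositions GG and N, which share the elemental formula C4H6N2O2 and therefore the same monoisotopic mass; this kind of ambiguity is exactly what an exhaustive table makes explicit.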

Our current work also uses a database of known human proteins to help judge whether a given isotope cluster is likely to derive from an unmodified peptide or from a peptide carrying one or more common post-translational modifications.
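
One simple way to frame this check is to ask which known peptide masses, plus some combination of modification mass shifts, fall within a mass tolerance of the observed cluster. The sketch below assumes peptide masses have already been computed (for example, from an in-silico digest of the protein database), and it uses a small illustrative set of modifications; the function name, the 10 ppm tolerance, and the two-modification limit are hypothetical.

```python
from itertools import combinations_with_replacement
from typing import Dict, List, Tuple

# Monoisotopic mass shifts (Da) for a few common modifications; an illustrative subset only.
PTM_SHIFT: Dict[str, float] = {
    "Acetyl": 42.01057,
    "Oxidation": 15.99491,
    "Phospho": 79.96633,
}


def explain_mass(observed: float, peptides: Dict[str, float],
                 max_mods: int = 2, tol_ppm: float = 10.0) -> List[Tuple[str, Tuple[str, ...]]]:
    """List (peptide, modifications) pairs whose expected mass matches `observed`.

    An empty modification tuple means the cluster is consistent with the unmodified peptide.
    """
    hits = []
    for name, base in peptides.items():
        for k in range(max_mods + 1):
            # Unordered combinations, so (Oxidation, Phospho) is tried once, not twice.
            for mods in combinations_with_replacement(sorted(PTM_SHIFT), k):
                expected = base + sum(PTM_SHIFT[m] for m in mods)
                if abs(observed - expected) / expected * 1e6 <= tol_ppm:
                    hits.append((name, mods))
    return hits
```

Several hits for one cluster would indicate genuine ambiguity, which is where a prior based on the protein database could help rank the alternatives.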

We are also working on a Bayesian approach to add value to mass-spectral database searching by considering which peptides are associated with each candidate protein and the relative spectral intensity of those peptides.
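
The text does not specify the model, so the following is only a toy illustration of the general idea. Each candidate protein has a prior, and each observed peptide contributes to the likelihood according to whether that protein could have produced it, weighted by the peptide's relative intensity. The parameters `p_hit` and `p_miss`, the intensity weighting, and the function name are all hypothetical.

```python
import math
from typing import Dict, Set


def protein_posteriors(candidates: Dict[str, Set[str]],
                       observed: Dict[str, float],
                       prior: Dict[str, float],
                       p_hit: float = 0.9,
                       p_miss: float = 0.05) -> Dict[str, float]:
    """Toy posterior over candidate proteins.

    candidates: protein -> set of peptides it could produce
    observed:   peptide -> relative spectral intensity (assumed scaled to 0..1)
    prior:      protein -> prior probability
    """
    log_scores = {}
    for protein, peptides in candidates.items():
        score = math.log(prior[protein])
        for pep, intensity in observed.items():
            # Peptides the protein can explain get p_hit, others p_miss;
            # brighter peptides carry more weight in the (tempered) likelihood.
            p = p_hit if pep in peptides else p_miss
            score += intensity * math.log(p)
        log_scores[protein] = score
    # Log-sum-exp normalization so the posteriors sum to 1 without underflow.
    top = max(log_scores.values())
    norm = sum(math.exp(s - top) for s in log_scores.values())
    return {prot: math.exp(s - top) / norm for prot, s in log_scores.items()}
```

With equal priors, a protein that accounts for all intense peptides outranks one that leaves a bright peptide unexplained, which captures the intuition of using both peptide membership and intensity.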

Short-read genome assembly and analysis

We are actively involved in a joint project with the Program in Genomics of Differentiation's Section on Molecular and Cell Biology. So far, we have aligned Solexa reads of S. pombe to its reference genome and have remediated a set of third-party corrections to the S. pombe genome. We are working with three strains of S. pombe, one of which is the parent of the other two. We have identified differences between the parent strain and the canonical reference strain and, more importantly, differences between the parent strain and its two mutants. We resolved previously identified artifacts of this next-generation sequencing and applied de novo assembly methods to resolve regions of poor sequence coverage. To learn more about the basis of S. pombe's natural resistance to rapamycin, we recently began work with a new set of strains.
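
Once variants have been called for each strain against the reference, the parent-versus-mutant comparison reduces to set operations. The sketch below assumes each call is a (chromosome, position, reference allele, alternate allele) tuple; the function and report key names are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical variant key: (chromosome, 1-based position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def compare_strains(parent: Set[Variant],
                    mutants: Dict[str, Set[Variant]]) -> Dict[str, Set[Variant]]:
    """Summarize differences between a parent strain, the reference, and derived mutants."""
    report: Dict[str, Set[Variant]] = {"parent_vs_reference": set(parent)}
    gained_sets = []
    for name, calls in mutants.items():
        gained = calls - parent   # candidate new mutations in this strain
        lost = parent - calls     # parent variants not called; often low coverage, not reversion
        report[f"{name}_gained"] = gained
        report[f"{name}_lost"] = lost
        gained_sets.append(gained)
    report["gained_in_all_mutants"] = set.intersection(*gained_sets) if gained_sets else set()
    return report
```

The "lost" sets are worth checking against coverage before interpreting them, since a region of poor coverage in one mutant can make a parental variant appear to be missing.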

Radiation hybrid mapping

Despite the availability of a draft zebrafish genome, we continue to see demand for radiation hybrid mapping in zebrafish. We recently decommissioned the LN54 radiation hybrid panel mapping website for technical reasons and now perform radiation hybrid mapping on a semi-manual basis, upon request.
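
The core two-point calculation in radiation hybrid mapping can be sketched as follows, assuming the simplest model in which every marker is retained independently with the same frequency r. Under that model, two markers with breakage probability theta are discordant in a hybrid with probability 2r(1 - r)theta, and their distance in centirays (cR) is -100 ln(1 - theta). The function below is hypothetical and is not the software behind the former LN54 site; production mapping tools typically use multipoint likelihoods and marker-specific retention rates.

```python
import math
from typing import List, Optional, Tuple


def rh_two_point(a: List[Optional[int]], b: List[Optional[int]]) -> Tuple[float, float]:
    """Return (breakage probability theta, distance in cR) for two markers.

    a, b: per-hybrid scores, 1 = retained, 0 = absent, None = not scored.
    """
    # Use only hybrids scored for both markers.
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("no hybrids scored for both markers")
    n = len(pairs)
    discordant = sum(1 for x, y in pairs if x != y) / n
    r = sum(x + y for x, y in pairs) / (2 * n)  # pooled retention frequency
    if r in (0.0, 1.0):
        raise ValueError("retention frequency of 0 or 1 is uninformative")
    theta = discordant / (2 * r * (1 - r))
    if theta >= 1.0:
        return 1.0, math.inf  # no detectable linkage at this panel's resolution
    return theta, -100.0 * math.log(1.0 - theta)
```

For example, two markers each retained in 5 of 10 hybrids and discordant in 2 of them give r = 0.5 and theta = 0.2 / 0.5 = 0.4, or about 51 cR.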

General bioinformatics

We provide ongoing consultation on DNA and protein sequence analysis and on general bioinformatics issues to the Program in Genomics of Differentiation, including advice on evolving high-throughput DNA sequencing technology.

Publications

  • Potrykus K, Murphy M, Chen X, Epstein JA, Cashel M. Imprecise transcription termination within Escherichia coli greA leader gives rise to an array of short transcripts, GraL. Nucleic Acids Research 2010 38(5):1636-1651.

Collaborators

  • Peter Backlund, PhD, Mass Spectrometry Core Facility, NICHD, Bethesda, MD
  • Igor Dawid, PhD, Program in Genomics of Differentiation, NICHD, Bethesda, MD
  • Bruce Howard, MD, Program in Genomics of Differentiation, NICHD, Bethesda, MD
  • James Iben, PhD, Program in Genomics of Differentiation, NICHD, Bethesda, MD
  • Richard Maraia, MD, Program in Genomics of Differentiation, NICHD, Bethesda, MD
  • Matthew Olson, MD, The Johns Hopkins University, Baltimore, MD
  • Dan Sackett, PhD, Program in Physical Biology, NICHD, Bethesda, MD
  • Steven Salzberg, PhD, University of Maryland, College Park, MD
  • Reiko Toyama, PhD, Program in Genomics of Differentiation, NICHD, Bethesda, MD
  • Alfred Yergey, PhD, Mass Spectrometry Core Facility, NICHD, Bethesda, MD
  • Anelia Horvath, PhD, Molecular Genetics Laboratory, NICHD, Bethesda, MD
  • Christopher Wassif, MS, Molecular Genetics Laboratory, NICHD, Bethesda, MD
  • Forbes D. Porter, MD, Molecular Genetics Laboratory, NICHD, Bethesda, MD
