
National Institutes of Health

Eunice Kennedy Shriver National Institute of Child Health and Human Development

2021 Annual Report of the Division of Intramural Research

Bioinformatics and Scientific Programming Core Facility

  • Ryan Dale, PhD, Scientific Information Officer, Head, Bioinformatics and Scientific Programming Core
  • Caroline Esnault, PhD, Staff Scientist
  • Apratim Mitra, PhD, Staff Scientist
  • Hongen (Henry) Zhang, PhD, Staff Scientist
  • Gennady Margolin, PhD, Bioinformatics Scientist
  • Mira Sohn, PhD, Bioinformatics Scientist
  • Kiersten Campbell, BS, Postbaccalaureate Fellow
  • Gus Fridell, BS, Postbaccalaureate Fellow
  • Eva Jason, BS, Postbaccalaureate Fellow
  • Nicholas Johnson, BS, Postbaccalaureate Fellow
  • Arjun Mittal, BS, Postbaccalaureate Fellow

The goal of the Bioinformatics and Scientific Programming Core (BSPC) is to provide expert bioinformatics support to NICHD researchers, assisting at all stages, from experimental design through several iterations of analysis to final manuscript preparation. In addition, we develop software tools that can be applied to a wide range of bioinformatics, genomics, and general data analysis tasks, both at NICHD and in the larger international scientific community. We also coordinate training for staff and trainees in basic programming and genomic analyses to help build bioinformatics support directly within labs.

Structure

The BSPC uses a “hub and spoke” model, consisting of a central core of staff in Building 6A coordinating with embedded bioinformaticians (currently in Buildings 6, 49, and 35) working directly in laboratories. This allows us to build a centralized infrastructure that can be re-used across many research programs, while at the same time maintaining focused and custom local support in labs. Joint meetings and discussion allow everyone, central and embedded, to share lessons learned and identify new tools and methods.

Projects overview

In 2021, the BSPC worked on 97 projects, collaborating with principal investigators (PIs), fellows, staff scientists, and staff clinicians across 33 laboratories. Of these, 55 were new projects and 42 were carried over from the previous year. The projects included assays such as bulk RNA-Seq, single-cell RNA-Seq, ChIP-Seq, whole-exome sequencing, whole-genome sequencing, DNA methylation, CUT&RUN, bulk ATAC-Seq, and single-cell ATAC-Seq. In addition, new projects this year included analysis of TRIP (Thousands of Reporters in Parallel) data, CRISPRi methods development, and long-read assembly. Some projects involved custom algorithm and tool development, and many required integration with published studies. Also new this year are several R Shiny web applications that our collaborators use to interactively explore and dig deeper into the analysis results we provide.

Projects often begin with an in-depth discussion with researchers to understand the background and goals of the project. It is important for us to understand the underlying biology and details of the experimental design (when applicable) for each project, so that we can make the most informed analysis decisions. We then provide a prioritized plan for the first round of analysis and schedule the work. There are often several iterations of analysis as a project progresses. Each iteration may add more sophisticated analyses, new data generated by the lab, or integrate results with published data. As expected for a no-cost shared resource, the time it takes for one iteration on one project is highly dependent on the existing workload across all other projects that we are handling in the Institute.

After each iteration, we meet to discuss the results in detail. The meeting includes a walk-through of the results, the computational background, discussion of how to use and interpret the tables, figures, and other output, and recommendations for next steps. Depending on the researcher’s interests, this can also include a discussion of the code and help with running it or adapting it to other projects in the lab. The next iteration of analysis is then planned, prioritized, and scheduled.

The BSPC’s collaboration includes writing the manuscript, producing figures and tables, consulting on interpretation, writing detailed computational methods, reviewing code, and submitting code to public repositories along with the complete software environments required to make the analyses reproducible.

Projects: computation and code

Most projects run for several weeks or months, span many iterations of analysis, and often require authoring substantial amounts of custom R and Python code. We work closely with NICHD's Molecular Genomics Core, where much of the raw high-throughput sequencing data for NICHD are generated. We can access these data directly, avoiding the need to coordinate data transfer and/or storage space with researchers. Analysis performed by the BSPC makes extensive use of NIH’s Biowulf high-performance computing cluster, and there is no direct cost to researchers for work done by the BSPC.

To ensure long-term computational reproducibility, we build a complete software environment for each project. This allows us to track all versions of software and dependencies, and to update any one project’s environment without affecting the others. All source code is kept under version control so that the entire history of the project can be tracked. We also build reproducible workflows for each project that keep track of which results have been updated and, wherever possible, provide output as standalone, interactive HTML files, so that researchers can easily explore their results.
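As an illustration of how a workflow can keep track of which results need updating, the following is a minimal Python sketch. It is not the BSPC's actual implementation, which this report does not describe. The function names (file_digest, is_stale, record_run) and the JSON manifest file are hypothetical, chosen only for this example. The idea is to record a checksum of every input when a result is generated and to regenerate the result only when an input has changed.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def is_stale(output, inputs, manifest_path):
    """Return True if `output` needs to be regenerated.

    An output is considered stale when it is missing, when the manifest has
    no entry for it, or when any input's digest differs from the digest
    recorded at the last run. (Hypothetical helper for illustration.)
    """
    output = Path(output)
    manifest_path = Path(manifest_path)
    if not output.exists() or not manifest_path.exists():
        return True
    manifest = json.loads(manifest_path.read_text())
    recorded = manifest.get(str(output))
    if recorded is None:
        return True
    current = {str(p): file_digest(p) for p in inputs}
    return current != recorded


def record_run(output, inputs, manifest_path):
    """Store the current input digests for `output` in the JSON manifest."""
    manifest_path = Path(manifest_path)
    if manifest_path.exists():
        manifest = json.loads(manifest_path.read_text())
    else:
        manifest = {}
    manifest[str(output)] = {str(p): file_digest(p) for p in inputs}
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
```

In this sketch, a workflow step would call is_stale before running and record_run after it succeeds. Content checksums are used rather than file timestamps so that copying or re-downloading an unchanged input does not trigger a rerun. Dedicated workflow managers provide this kind of bookkeeping with many more features.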

We also maintain R Shiny applications into which we load analysis results. After our collaborators authenticate in the system, they are able to explore their results with interactive plots and tables, which allow them to dig deeper without requiring additional computational resources or bioinformatics skills. These applications are continuously updated based on feedback from our collaborators to ensure that they remain easy to use and helpful.

Additional software development and computational resources

The BSPC continues to develop and maintain publicly available open-source tools. One example is lcdb-wf, a system of workflows and pipelines that run on NIH’s Biowulf computing cluster to process high-throughput sequencing data, run extensive quality control, and perform differential ChIP-Seq or RNA-Seq analyses. We also continue to contribute to the Bioconda project, a system used by bioinformaticians worldwide to easily install biology-related software tools.

The BSPC maintains an RStudio Connect Server instance, which allows us to publish interactive applications that researchers can use to explore and plot their data. We also maintain a GitLab instance in NICHD’s data center, which provides source-code version control, issue tracking, and documentation for our projects in a form that can be shared with collaborators. These repositories currently store tens of thousands of lines of Python and R code and documentation written by the BSPC and used in various projects.

Publications

  1. Adams PP, Baniulyte G, Esnault C, Chegireddy K, Singh N, Monge M, Dale RK, Storz G, Wade JT. Regulatory roles of Escherichia coli 5' UTR and ORF-internal RNAs detected by 3' end mapping. eLife 2021; doi.org/10.7554/eLife.62438.
  2. Rodriguez-Gil JL, Baxter LL, Watkins-Chow DE, Johnson NL, Davidson CD, Carlson SR, Incao AA, NISC Comparative Sequencing Program, Wallom KL, Farhat NY, Platt FM, Dale RK, Porter FD, Pavan WJ. Transcriptome of HPβCD-treated Niemann-Pick disease type C1 cells highlights GPNMB as a biomarker for therapeutics. Human Mol Genet 2021; doi.org/10.1093/hmg/ddab194.
  3. Mahadevan V, Mitra AK, Zhang Y, Yuan X, Peltekian AA, Chittajallu VK, Esnault CM, Maric D, Rhodes CT, Pelkey KA, Dale RK, Petros T, McBain CJ. NMDARs drive the expression of neuropsychiatric disorder risk genes within GABAergic interneuron subtypes in the juvenile brain. Front Mol Neurosci 2021; doi.org/10.3389/fnmol.2021.712609.
  4. Tseng WC, Johnson Escauriza AJ, Tsai-Morris CH, Feldman B, Dale RK, Wassif CA, Porter FD. The role of Niemann-Pick type C2 in zebrafish embryonic development. Development 2021; 148(7):dev194258; doi.org/10.1242/dev.194258.
  5. Gaikwad S, Ghobakhlou F, Young DJ, Visweswaraiah J, Zhang H, Hinnebusch AG. Reprogramming of translation in yeast cells impaired for ribosome recycling favors short, efficiently translated mRNAs. eLife 2021; doi.org/10.7554/eLife.64283.

Collaborators

  • Joanna Klubo-Gwiezdzinska, MD, PhD, Metabolic Diseases Branch, NIDDK
  • William J. Pavan, PhD, Genomics, Development and Disease Section, NHGRI, Bethesda, MD
  • Shyamal Peddada, PhD, Biostatistics and Bioinformatics Branch, DIPHR, NICHD, Bethesda, MD
  • Michael E. Ward, MD, PhD, Inherited Neurodegenerative Diseases Unit, NINDS, Bethesda, MD

Contact

For more information, email ryan.dale@nih.gov or visit https://www.nichd.nih.gov/about/org/dir/other-facilities/cores/bioinformatics.
