Bioinformatics and Biostatistics Core

Overview

The Bioinformatics and Biostatistics Core at Joslin Diabetes Center, an affiliate of Harvard Medical School (HMS), helps scientists convert biomedical questions into statistically testable hypotheses and extract meaningful insights from diverse data modalities and disease areas.

Team

Director Jonathan Dreyfuss holds a PhD in bioinformatics from Boston University. He is an Instructor at HMS and an Assistant Investigator at Joslin. He has published novel methods for more powerful mediation analysis in Nature Communications, more powerful replication analysis (a type of meta-analysis) in Communications Biology, targeted analysis of mass spectrometry data in Analytical and Bioanalytical Chemistry, metabolic network reconstruction and analysis in PLoS Computational Biology and in Cell Reports, and identification of the predictive limits of machine learning applied to genetic data in BMC Genomics.

Senior bioinformatician & biostatistician Hui Pan holds a PhD in genetics from Fudan University, has published a first-author paper in Nature, and has led the analysis of many datasets, including single-cell multiome and spatial transcriptomics.

Together, we have over 20 years of experience in biostatistics and bioinformatics, and we have collaborated with labs across disease areas and data types to publish papers in Nature, Nature Medicine, Genome Medicine, Cell Metabolism, Nature Metabolism, Molecular Metabolism, Molecular Cancer, Clinical Cancer Research, JCI, Journal of Investigative Dermatology, PLoS Biology, PNAS, eLife, Clinical Epigenetics, and NEJM.

Free consultation

Free initial consultations (about 1 hour) are available to everyone to discuss experimental design and analysis strategies. Would you like a free consultation? Email us.

For Harvard Medical School labs, free consultations that require slightly more time or limited data analysis, such as power calculations, can be requested by visiting https://catalyst.harvard.edu/biostatistics/consultations/, clicking the Request Consultation button, and then checking the Biostatistics Consultation box in the form (NOT the Bioinformatics Consultation box). Biostatistics requests from Joslin researchers are routed automatically to Jonathan Dreyfuss, whereas non-Joslin HMS researchers should specify that they want to work with Jonathan Dreyfuss.

Services

For omic and non-omic data, we offer sample size and power calculations, experimental design recommendations, and multiple analysis approaches, such as ANOVA, ANCOVA, repeated measures ANOVA, handling of missing values, nonparametric analyses, causal inference (e.g., mediation analysis), and machine learning.
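As a rough illustration of what a sample size calculation involves, the sketch below estimates the number of samples per group needed to compare two group means. It is a minimal example under stated assumptions, not the core's own procedure: it uses the normal approximation for a two-sided, two-sample comparison with equal group sizes, and the function name `samples_per_group` and its parameters are hypothetical. An exact t-test based calculation gives slightly larger numbers for small studies.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate samples per group for a two-sided, two-sample comparison.

    effect_size is the standardized difference in means (Cohen's d).
    Uses the normal approximation: n = 2 * ((z_alpha + z_power) / d) ** 2.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must be between 0 and 1")

    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = standard_normal.inv_cdf(power)          # quantile for the target power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    for d in (0.5, 0.8, 1.0):
        print(f"effect size {d}: {samples_per_group(d)} samples per group")
```

Smaller expected effects require many more samples, which is why an estimate of the effect size (for example, from pilot data) is useful to bring to a consultation.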

We offer omic analysis of all data types, including next-generation sequencing (of bulk tissue, single cells, or single nuclei, as well as ATAC-seq, 10x multiome, DNA-seq, ChIP-seq, ribo-seq, m6A-seq, GRO-seq, PRO-seq, whole-genome bisulfite sequencing (WGBS), and spatial RNA-seq), mass spectrometry (proteomics, metabolomics, lipidomics, phosphoproteomics), microarrays (methylation arrays, SNP chips, SomaLogic), and qPCR. We also meta-analyze multiple experiments and integrate data types, construct and analyze networks such as gene networks, and apply metabolic flux analysis, for example to infer fluxes from Seahorse flux analyzer data.

We can analyze and integrate public data sets, which you can cite and include in your manuscript. One helpful resource to find public data of all types is Omics Discovery Index, which includes data from the NIH NCBI Gene Expression Omnibus (GEO). Using GEO Profiles, you can search for the expression of a gene across GEO's curated data sets.

The typical bioinformatics pipeline includes normalization, quality control, Principal Component Analysis (PCA), differential abundance, pathway analysis, and visualization. This pipeline takes about 10 hours for most data types, such as bulk RNA-seq, proteomics (including SomaLogic), metabolomics, and phosphoproteomics. It takes about half as much time for a normalized table of counts from high-quality samples, and about twice as much time (about 20 hours) for a dataset of raw scRNA-seq data. Processing 10x Genomics data so that it can be viewed in the 10x Genomics Loupe Browser takes about 5 hours. The number of hours required is mostly independent of the number of samples. Post-pipeline requests often involve accounting for sample quality, new comparisons, subgroup analyses, and additional visualizations.

A common turnaround time is 1-2 weeks, but we accommodate urgent requests (e.g., for grant deadlines) when possible.

Pricing

Our bioinformatics and biostatistics services cost $110/hour for Joslin investigators and $160/hour for all others, including commercial clients.
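For budgeting, the hourly rates can be combined with the approximate pipeline durations listed under Services. The sketch below is only an illustration of that arithmetic and not a quote: the dictionary keys and the function name `estimate_cost` are hypothetical labels, the hours are the approximate figures given above, and actual hours depend on the project and any post-pipeline requests.

```python
# Hourly rates in US dollars, as listed in the Pricing section.
HOURLY_RATE = {
    "joslin": 110,    # Joslin investigators
    "external": 160,  # all others, including commercial clients
}

# Approximate hours from the Services section (illustrative labels).
TYPICAL_HOURS = {
    "standard_pipeline": 10,        # e.g. bulk RNA-seq, proteomics, metabolomics
    "normalized_counts_table": 5,   # about half the standard pipeline
    "raw_scrna_seq": 20,            # about twice the standard pipeline
    "loupe_browser_processing": 5,  # 10x Genomics data prepared for Loupe Browser
}


def estimate_cost(task, client_type):
    """Return a rough cost estimate in dollars: typical hours times hourly rate."""
    return TYPICAL_HOURS[task] * HOURLY_RATE[client_type]


if __name__ == "__main__":
    for task in TYPICAL_HOURS:
        joslin = estimate_cost(task, "joslin")
        external = estimate_cost(task, "external")
        print(f"{task}: about ${joslin} (Joslin), about ${external} (other)")
```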

Authorship

Acknowledging help from our core is sufficient; we do not require authorship. If you choose to include us as authors, service charges still apply.

Free services for Joslin

We offer seminars that teach the free R language and environment for biostatistics and bioinformatics based on a public interactive website we built at https://jdreyf.shinyapps.io/zero2bioinfo-interactively.

We maintain an in-house gene expression database of published data where a user can search for a gene and see its expression across approximately 75 studies. The Joslin intranet has instructions for logging into the database. The snapshot below shows an example profile; the fold-change (FC), p-value (P), and Benjamini-Hochberg (BH) false discovery rate for each comparison are listed beneath the graphs.

Joslin Gene Expression Database Profile Example.
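For readers unfamiliar with the BH false discovery rate reported alongside each p-value, the sketch below shows the standard Benjamini-Hochberg step-up adjustment in plain Python. It is a minimal illustration of the general technique, not the database's actual code: the function name `bh_adjust` and the example p-values are hypothetical.

```python
def bh_adjust(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, scaling each by m / rank
    # and enforcing that adjusted values never increase as p-values decrease.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


if __name__ == "__main__":
    example_p = [0.01, 0.04, 0.03, 0.005]  # made-up p-values for illustration
    for p, q in zip(example_p, bh_adjust(example_p)):
        print(f"P = {p:.3f}  BH FDR = {q:.3f}")
```

A comparison with a BH value below a chosen threshold (commonly 0.05) is expected to be a false positive at no more than that rate among all comparisons called significant.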

Location


Joslin Diabetes Center, 3 Blackfan St #549, Boston, MA 02115

Links and Resources

Bioinformatics and Biostatistics Core on Joslin site

Contacts

Name	Role	Phone	Email	Location
Jonathan Dreyfuss, PhD.	Director		Jonathan.Dreyfuss@joslin.harvard.edu
Hui Pan, PhD.	Senior Bioinformatician III		Hui.Pan@joslin.harvard.edu