The Bioinformatics and Biostatistics Core at Joslin Diabetes Center, an affiliate of Harvard Medical School (HMS), applies the latest statistical approaches to data-driven projects in basic, clinical, and translational research across disease areas and biological data types. We also collaborate to develop new methods as needed.
Director Jonathan Dreyfuss holds a PhD in bioinformatics from Boston University. He is an Instructor at HMS, an Assistant Investigator at Joslin, and an Associate Director of Harvard Catalyst Biostatistics Consulting. He has published novel methods for more powerful mediation analysis in Nature Communications, more powerful replication analysis (a type of meta-analysis) in Communications Biology, targeted analysis of mass spectrometry data in Analytical and Bioanalytical Chemistry, metabolic network reconstruction and analysis in PLoS Computational Biology and in Cell Reports, and identifying predictive limits of machine learning applied to genetic data in BMC Genomics.
Senior bioinformatician and biostatistician Hui Pan holds a PhD in genetics from Fudan University, has published a first-author paper in Nature, and has led the analysis of many datasets, including single-cell multiome and spatial transcriptomics.
Together, we have over 20 years of experience in biostatistics and bioinformatics and have co-authored more than 10 papers, as can be seen on Google Scholar.
Free initial consultations (about 1 hour) are available to everyone to discuss experimental design and analysis strategies. Would you like a free consultation? Email us.
For HMS labs, free consultations that require slightly more time or limited data analysis, such as power calculations, can be requested through Harvard Catalyst Biostatistics Consulting: click the Request Consultation button, then check the Biostatistics Consultation box in the form (NOT the Bioinformatics Consultation box). Biostatistics consultation requests from Joslin are routed automatically to Jonathan Dreyfuss.
For omic and non-omic data, we offer sample size and power calculations, experimental design recommendations, and a range of analysis approaches, including ANOVA, ANCOVA, repeated measures ANOVA, handling of missing values, nonparametric analyses, causal inference such as mediation analysis, and machine learning.
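As a rough illustration of the kind of sample size calculation mentioned above, the sketch below uses the standard normal approximation for a two-sided, two-sample comparison of means. The function name `n_per_group` and its defaults (alpha of 0.05, power of 0.80) are assumptions chosen for this example, not the core's actual procedure; a real calculation would depend on the study design and typically uses the t distribution, which gives slightly larger numbers.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate samples per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * (z_{1 - alpha/2} + z_{power})^2 / d^2,
    where d is the standardized effect size (difference in means / SD).
    Hypothetical helper for illustration only.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


if __name__ == "__main__":
    for d in (0.5, 0.8, 1.0):
        print(f"effect size {d}: about {n_per_group(d)} samples per group")
```

Smaller expected effects require many more samples, which is one reason it helps to discuss design before collecting data.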
We offer omic analysis of all data types, including next-generation sequencing (bulk tissue, single-cell, single-nucleus, or spatial), mass spectrometry, microarrays (e.g., methylchip, SomaLogic), and qPCR. We also meta-analyze multiple experiments, integrate data types, construct and analyze networks such as gene networks, and apply metabolic flux analysis, such as inferring fluxes from Seahorse flux analyzer data.
The typical bioinformatics pipeline includes normalization, quality control, principal component analysis (PCA), differential abundance analysis, pathway analysis, and visualization. This pipeline takes about 10 hours for most data types, such as bulk RNA-seq, proteomics (including SomaLogic), metabolomics, and phosphoproteomics. It takes about half as long (about 5 hours) for a normalized table of counts from high-quality samples, and about twice as long (about 20 hours) for a dataset of raw scRNA-seq data. Processing 10x Genomics data so that it can be viewed in the 10x Genomics Loupe Browser takes about 5 hours. The number of hours required is mostly independent of the number of samples. Post-pipeline requests often involve accounting for sample quality, new comparisons, subgroup analyses, and additional visualizations.
A common turnaround time is 1-2 weeks, but urgent requests (e.g., for grant deadlines) are accommodated when possible.
Our bioinformatics and biostatistics services cost $110/hour for Joslin investigators and $160/hour for all others, including commercial clients.
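To get a rough sense of what a project might cost, the hourly rates can be combined with the approximate pipeline hours given above. The sketch below is only a back-of-the-envelope illustration: the dictionary keys and the `estimate_cost` function are hypothetical names, the hours are approximations, and actual costs depend on the project and any post-pipeline requests.

```python
# Hourly rates in USD, as listed above.
RATES_USD_PER_HOUR = {"joslin": 110, "external": 160}

# Approximate pipeline hours, as listed above (keys are hypothetical labels).
PIPELINE_HOURS = {
    "standard": 10,           # e.g. bulk RNA-seq, proteomics, metabolomics
    "normalized_counts": 5,   # normalized count table from high-quality samples
    "raw_scrnaseq": 20,       # raw scRNA-seq data
    "loupe_processing": 5,    # 10x Genomics data prepared for the Loupe Browser
}


def estimate_cost(dataset, client, extra_hours=0):
    """Return an approximate cost in USD: (pipeline hours + extra hours) * hourly rate."""
    hours = PIPELINE_HOURS[dataset] + extra_hours
    return hours * RATES_USD_PER_HOUR[client]


if __name__ == "__main__":
    print(estimate_cost("standard", "joslin"))        # 10 h at $110/h
    print(estimate_cost("raw_scrnaseq", "external"))  # 20 h at $160/h
```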
Acknowledging help from our core is sufficient -- we do not require authorship.
We can analyze and integrate public data sets, which you can cite and include in your manuscript. One helpful resource to find public data of all types is Omics Discovery Index, which includes data from the NIH NCBI Gene Expression Omnibus (GEO). Using GEO Profiles, you can search for the expression of a gene across GEO's curated data sets.
We offer seminars that teach the free R language and environment for biostatistics and bioinformatics based on a public interactive website we built at https://jdreyf.shinyapps.io/zero2bioinfo-interactively.
We maintain an in-house gene expression database of published data where a user can search for a gene and see its expression across approximately 75 studies. The Joslin intranet has instructions for logging into the database. We show a snapshot below, where fold-change (FC), p-value (P), and Benjamini-Hochberg (BH) false discovery rate per comparison are shown below the graphs.
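For readers unfamiliar with the Benjamini-Hochberg adjustment mentioned above, the following is a minimal sketch of how BH-adjusted p-values can be computed from a list of raw p-values. The function name `benjamini_hochberg` is an assumption for this example, and this is not the database's actual code; established implementations, such as `p.adjust(..., method = "BH")` in R, are preferable in practice.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order.

    Each p-value is multiplied by m / rank (rank 1 = smallest), and a running
    minimum taken from the largest p-value downward keeps the adjusted values
    monotone and capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"P = {p:.3f}  BH = {q:.3f}")
```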
Joslin Diabetes Center

Name | Role | Email | Phone | Location
---|---|---|---|---
Jonathan Dreyfuss, PhD | Director | Jonathan.Dreyfuss@joslin.harvard.edu | |
Hui Pan, PhD | Senior Bioinformatician III | Hui.Pan@joslin.harvard.edu | |