BIF-31806 Data-Driven Discovery in the Life Sciences: Hypothesis Generation from Omics Data

Course

Credits 6.00

Teaching method     Contact hours
Individual Paper    3
Lecture             23
Practical           55
Group work          13

Course coordinator(s): dr. MH Medema

Lecturer(s):
dr. M Suarez Diez
dr. E Saccenti
dr. JA Hageman
dr. M Bosse
dr. ir. GJEJ Hooiveld
dr. ir. CA Maliepaard
dr. C Furlan
dr. JJJ van der Hooft
MFL Derks
R Holmer

Examiner(s): prof. dr. ir. D de Ridder

Language of instruction:

English

Assumed knowledge on:

Statistics and data analysis as applied to omics data, as treated in BIF-51306 Data Analysis and Visualization or SSB-30306 Molecular Systems Biology and in MAT-32806 Statistics for Data Scientists.

Continuation courses:

MSc thesis / internship.

Contents:

Across the life sciences, scientists use omics data to study biological phenomena in humans, plants, animals and microbes. This results in large and heterogeneous data sets that can be analyzed using a variety of algorithms and statistical methods. Making sense of the data, extracting biological knowledge from the results of these analyses, and composing new hypotheses and research questions from that knowledge is not trivial. However, when basic data science skills are combined with domain knowledge, either from literature or from databases accessed in a high-throughput manner, omics data constitute a goldmine for data-driven discovery of novel insights and hypotheses that can be tested in follow-up experiments. This course trains students to link domain knowledge to data using data science techniques and skills, in order to design omics experiments, evaluate the quality of the resulting data, interpret them in the light of literature and domain databases, and mine them to make discoveries and compose new research questions and hypotheses. Domain-specific case studies allow students to apply their skills directly to data relevant to their specialization.

Learning outcomes:

After successful completion of this course students are expected to be able to:
- explain the advantages and limitations of different types of omics data;
- access, in a high-throughput manner, databases commonly used in the life sciences for the interpretation of omics data;
- design effective omics experiments, with appropriate replicates and controls, accounting for batch effects, etc.;
- evaluate the quality and limitations of omics data (quality of the raw data, technical/biological variation, etc.) based on the outcomes of statistical analyses;
- interpret (processed) omics analysis results using domain knowledge, data mining (of commonly used biological databases) and literature mining;
- extract knowledge from the data, synthesize it into a (possible) biological story, and compose new, data-driven research questions and hypotheses based on it.

Activities:

The course comprises two blocks: a general block of four weeks and a domain-specific case study of four weeks. In the first block, students will work in pairs on a single case study with pre-processed multi-omics data. After a set of lectures/tutorials on the different omics techniques and on the design of omics experiments, but before seeing the actual data and experimental setup, they will write short research proposals in which they design the experiments themselves, under realistic constraints. Supported by intermittent plenary lectures and practical exercises, the students will then go through the steps of a typical omics analysis with the actual data, from quality evaluation to data mining, biological interpretation and composing new hypotheses. The block will close with an exam.
In the second block, students will work in teams on domain-specific case studies (focusing on a plant, animal or microbiome omics dataset). Each team will be composed of students studying within this domain, supplemented with bioinformatics/methodology-oriented students. Coached by the teachers, the teams will explore and interpret the data to gain new biological insights and compose research questions and hypotheses for follow-up experimental work, on which each student will write an individual report.

Examination:

- proposal / experimental design (10%);
- exam (40%);
- individual report, including interdisciplinary group performance (50%).
A minimum of 5.5 for each element is required for passing the course.
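
As a worked illustration of the weighting above, the following minimal sketch computes an overall mark from the three elements. It assumes that the final mark is the weighted average of the element marks and that the 5.5 minimum is checked for each element separately; the function name `final_mark`, the dictionary keys and the example marks are hypothetical and are not taken from the course regulations.

```python
# Weights are taken from the examination list above; everything else is assumed.
WEIGHTS = {"proposal": 0.10, "exam": 0.40, "report": 0.50}
MINIMUM = 5.5  # minimum mark required for each element


def final_mark(marks):
    """Return (weighted_mark, passed) for a dict of element marks.

    Assumption: the final mark is the weighted average of the elements,
    and the course is passed only if every element reaches the minimum.
    """
    missing = set(WEIGHTS) - set(marks)
    if missing:
        raise ValueError(f"missing marks for: {sorted(missing)}")
    weighted = sum(WEIGHTS[name] * marks[name] for name in WEIGHTS)
    passed = all(marks[name] >= MINIMUM for name in WEIGHTS)
    return round(weighted, 2), passed


# Hypothetical marks for one student.
example = {"proposal": 7.0, "exam": 6.0, "report": 8.0}
print(final_mark(example))
```

Under these assumptions, a student with a high weighted average but a mark of 5.0 on any single element would still not pass.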

Literature:

Materials will be provided in Brightspace.

Restricted Optional for:

Programme                   Phase   Specialization                            Period
MBI Biology                 MSc     -                                         6MO
MPS Plant Sciences          MSc     -                                         6MO
MPB Plant Biotechnology     MSc     -                                         6MO
MNH Nutrition and Health    MSc     C: Molecular Nutrition and Toxicology     6MO
MBF Bioinformatics          MSc     -                                         6MO