INF-34306 Data Science Concepts

Course

Credits 6.00

Teaching method  Contact hours
Lecture          16
Tutorial         16
Practical        20
Group work       6

Course coordinator(s): prof. dr. ir. B Tekinerdogan
Lecturer(s): prof. dr. ir. D de Ridder, prof. dr. ir. B Tekinerdogan
Examiner(s): prof. dr. ir. D de Ridder, prof. dr. ir. B Tekinerdogan

Language of instruction:

English

Assumed knowledge on:

This course assumes a basic working knowledge of mathematics and statistics, as treated in Mathematics 1 and 2 (MAT-14803/903), Mathematics for Social Sciences (MAT-12806) and Statistics 1 and 2 (MAT-15303/403).
Students who have completed the BSc minor in Bioinformatics or Data Science do not need to follow this course.

Continuation courses:

INF-21306 Data Management
INF-22306 Programming in Python
INF-33306 Linked Data
BIF-31806 Data-Driven Discovery in the Life Sciences: Hypothesis Generation from Omics Data
INF-33806 Big Data
FTE-35306 Machine learning
MAT-32806 Statistics for Data Scientists
CPT-30503 Data Science Ethics
SSB-30306 Molecular Systems Biology


Contents:

The amount and variety of data in the domains of living environment, food, health, society and natural resources are increasing rapidly. Data thus plays an ever more central role in these areas, and careful processing and analysis can help extract information and infer new knowledge, eventually leading to new insights and a better understanding of the problem at hand. Knowledge of core concepts in data science (acquisition, manipulation, governance, presentation, exploration, analysis and interpretation) and elementary data science skills have become essential for researchers and professionals in most scientific disciplines. This course is an introduction to data science concepts, combining computer science, mathematics and domain expertise: acquiring and manipulating raw data, obtaining information by processing and exploration, and finally reaching understanding by analysis and modelling. These concepts are complemented by elementary skills in data wrangling, exploration and analysis. The content of the course is strongly embedded in a number of provided domain-specific cases from biology, health and nutrition, and the environment, allowing students from many disciplines to appreciate the relevance of data science in their domains.


Learning outcomes:

After successful completion of this course students are expected to be able to:
- explain the relevance of data and data science in research and application within their field of study;
- recognize key concepts as used in data science practice and elaborated in continuation courses;
- discuss the need for and describe approaches to data acquisition, manipulation, storage, governance, exploration, presentation, analysis and modeling;
- apply a number of basic techniques for data wrangling, exploration and analysis in use cases related to their field of study, including practicing elementary scripting skills.

Activities:

In six blocks, students will learn data science concepts and obtain elementary practical skills, connected by a continuing use case that fits their study domain:

  1. Data science: its role in research and application. Thinking as a data scientist. Measurement and acquisition, data types, structured vs. unstructured data.
  2. Manipulation: integration, cleaning, selection. Wrangling: from spreadsheet to elementary scripting. 
  3. Storage and governance: databases, cloud, standards, governance, management and ethics. 
  4. Exploration and presentation: hypothesis tests, correlations, reporting, visualization, interaction.
  5. Analysis: machine learning, clustering, classification, regression, reasoning, interpretation.
  6. Interpretation: modeling, networks, processes, agents, limitations.

Each block combines generic lectures with practical work in the context of a specific case; for each block, students either write an essay on the current topic or report on the practical work performed. For specific blocks, inspirational guest speakers from research or industrial data science practice will be invited.

Examination:

- essays and reports (50%);
- written exam (50%).
Each component needs a minimum mark of 5.5 to pass.
This is a new course and the information above may change. Final information will be published in the study guide.

Literature:

See www.wur.eu/inf

Restricted Optional for:

Programme                             Phase  Specialization                                 Period
BSW Soil, Water, Atmosphere           BSc    -                                              4WD
MBI Biology                           MSc    -                                              4WD
MEE Earth and Environment             MSc    -                                              4WD
MFN Forest and Nature Conservation    MSc    -                                              4WD
MES Environmental Sciences            MSc    -                                              4WD
MPS Plant Sciences                    MSc    -                                              4WD
MPB Plant Biotechnology               MSc    -                                              4WD
MNH Nutrition and Health              MSc    B: Nutritional Physiology and Health Status    4WD
MNH Nutrition and Health              MSc    D: Sensory Science                             4WD
MNH Nutrition and Health              MSc    C: Molecular Nutrition and Toxicology          4WD
MNH Nutrition and Health              MSc    A: Nutritional and Public Health Epidemiology  4WD
MTO Tourism, Society and Environment  MSc    -                                              4WD