HNH-31606 Analytical Epidemiology II


Credits 6.00

Teaching method: Group work (14 contact hours)
Course coordinator(s): dr. ir. JCM Verhoef
Lecturer(s): dr. ir. A Kuijsten, prof. dr. HC Boshuizen, dr. ir. JCM Verhoef, prof. dr. ir. EJM Feskens
Examiner(s): dr. ir. JCM Verhoef

Language of instruction:


Assumed knowledge on:

Introduction to Epidemiology and Public Health, Basic and Advanced statistics, Study Designs, Analytical Epidemiology I 

Continuation courses:

MSc thesis with a focus on nutritional or public health epidemiology


Does overweight lead to death? How do I model the association between body fat mass and body mass index, between the number of cigarettes smoked and death, or dose-response relationships in studies of nutritional supplementation, given that these relationships are non-linear? How do I analyse the frequency of food intake by age, or the number of cardiovascular events as a function of blood pressure? How can I predict in early pregnancy which women will deliver a child with low birth weight? Does ‘body mass index’ improve the diagnosis of tuberculosis when added to the existing arsenal of laboratory tests? These are examples of common questions in nutritional epidemiological research. Researchers, policy makers, health professionals and the public are bombarded with conflicting messages about nutrition and health, often based on reports from questionably designed and executed epidemiological studies. Whether you wish to pursue a career in nutrition research, consultancy or public health, you need to be able to interpret data. For research to be useful, data analysis needs to be done well and with integrity.

In this course, you will learn state-of-the-art solutions for problems that are commonly encountered in the analysis and interpretation of epidemiological studies on (global) nutrition and health. Theory is important, but we will focus on insights, understanding, practical steps and skills. We will keep formulas and computation to a minimum and take your intuition as a starting point. We will provide worked examples with code in Stata, a statistical software package that is both powerful and user-friendly. You will be surprised how easy it is to conduct analyses with this package. In the exercises, you will adapt and combine such code and apply it to real-life datasets from past or ongoing studies. We may also conduct some analyses in R.

Although the course is primarily intended for MSc students of Nutrition and Health specialising in Nutritional and Public Health Epidemiology, we believe that MSc students in other specialisations, or even in other MSc programmes, will find it useful. This course is mandatory for MSc Nutrition and Health students who wish to be considered for registration in The Netherlands as ‘Epidemiologist A’.

Modules include (tentative): 

Module 1 Analysis of confounding including Directed Acyclic Graphs (DAGs)

The objective of etiological research is to make causal inferences about the effect of an exposure. Traditional methods of adjusting for potential confounders can be inadequate. Directed acyclic graphs (DAGs) are increasingly used in modern epidemiology to present causal assumptions visually. By making these assumptions explicit, DAGs help to identify whether confounding is present for the causal question at hand and to determine which variables should be adjusted for to control it.
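A minimal sketch of what a DAG implies for adjustment, written in Python purely for illustration (the course itself uses Stata and/or R). It simulates data from a hypothetical DAG E ← C → Y, in which a common cause C affects both the exposure E and the outcome Y while E has no effect on Y; all probabilities are made-up assumptions. Because C opens a backdoor path between E and Y, the crude risk difference is biased, whereas standardising over the levels of C, the adjustment this DAG calls for, should give a difference close to zero.

```python
import random

def simulate(n, seed=1):
    """Simulate the hypothetical DAG E <- C -> Y (no arrow from E to Y)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = rng.random() < 0.5                  # confounder C (hypothetical binary factor)
        e = rng.random() < (0.7 if c else 0.2)  # arrow C -> E
        y = rng.random() < (0.4 if c else 0.1)  # arrow C -> Y; E does not affect Y
        rows.append((c, e, y))
    return rows

def risk(rows, keep):
    """Proportion with Y = 1 among the rows for which keep(c, e) is true."""
    ys = [y for c, e, y in rows if keep(c, e)]
    return sum(ys) / len(ys)

def crude_rd(rows):
    """Unadjusted risk difference, exposed (E = 1) minus unexposed (E = 0)."""
    return risk(rows, lambda c, e: e) - risk(rows, lambda c, e: not e)

def standardised_rd(rows):
    """Stratum-specific risk differences averaged over the distribution of C."""
    p_c = sum(c for c, _, _ in rows) / len(rows)
    rd = {}
    for level in (True, False):
        exposed = risk(rows, lambda c, e: c == level and e)
        unexposed = risk(rows, lambda c, e: c == level and not e)
        rd[level] = exposed - unexposed
    return p_c * rd[True] + (1 - p_c) * rd[False]

rows = simulate(200_000)
print(f"crude RD: {crude_rd(rows):.3f}")
print(f"C-standardised RD: {standardised_rd(rows):.3f}")
```

With these assumed probabilities, the crude risk difference works out to roughly 0.15 in expectation, even though the exposure has no effect by construction.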

Modules 2-4 Modelling with continuous exposure variables
A common problem in epidemiology is how to deal with non-linear relationships between the outcome and independent variables. The problem is often ignored by assuming linearity, or continuous independent variables are categorised (e.g., dichotomised or divided into quartiles). We will review why these approaches are highly undesirable and introduce fractional polynomial regression analysis as a relatively new alternative.
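A minimal Python sketch (for illustration only; the course materials use Stata and/or R) of the core idea behind first-degree fractional polynomials (FP1): rather than forcing a straight line or cutting x into categories, fit y = b0 + b1·x^p for each power p in a conventional set, where p = 0 means log(x), and keep the best-fitting power. The sketch selects the power with the smallest residual sum of squares, which for normally distributed errors and models with equal numbers of parameters is equivalent to the smallest deviance. It is a simplification: the full procedure also considers two-term (FP2) models and uses formal deviance-based tests, including against the linear model, none of which are shown here. In Stata, the `fp` command implements the full approach.

```python
import math

# Conventional FP power set; by convention, power 0 denotes log(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)

def fp_transform(x, p):
    """FP1 transformation of a positive value x."""
    return math.log(x) if p == 0 else x ** p

def fit_line(ts, ys):
    """Ordinary least squares y = b0 + b1 * t; returns (b0, b1, residual sum of squares)."""
    n = len(ts)
    mt, my = sum(ts) / n, sum(ys) / n
    stt = sum((t - mt) ** 2 for t in ts)
    sty = sum((t - mt) * (y - my) for t, y in zip(ts, ys))
    b1 = sty / stt
    b0 = my - b1 * mt
    rss = sum((y - b0 - b1 * t) ** 2 for t, y in zip(ts, ys))
    return b0, b1, rss

def best_fp1(xs, ys):
    """Fit one model per power and return the power with the smallest RSS, plus its fit."""
    fits = {p: fit_line([fp_transform(x, p) for x in xs], ys) for p in FP_POWERS}
    best = min(fits, key=lambda p: fits[p][2])
    return best, fits[best]

# Hypothetical, noise-free data on a BMI-like scale with a logarithmic true curve.
xs = [15 + 0.5 * i for i in range(60)]
ys = [2 + 3 * math.log(x) for x in xs]
power, (b0, b1, rss) = best_fp1(xs, ys)
print(power, round(b0, 3), round(b1, 3))
```

Because the example data are hypothetical and noise-free, the logarithmic transformation (p = 0) should be selected with coefficients 2 and 3; with real, noisy data, several powers often fit almost equally well.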

Modules 5-10 Modelling count data
Count data are observations that take non-negative integer values (0, 1, 2, ...) with no fixed upper limit. They are often recorded within a period of time, a geographical area or a volume, or within a population at risk of disease (e.g., number of food items consumed per 24 hours, number of resident physicians per square kilometer, number of episodes of fever within a 6-month follow-up period). In these modules, we will review how count data are modelled as a function of independent variables.
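Poisson regression is the usual starting point for modelling counts. The Python sketch below (illustrative only; the course itself uses Stata and/or R) fits log(expected count) = b0 + b1·x + log(person-time) by Newton-Raphson, so that exp(b1) is a rate ratio. The data (fever episodes per child, with x = 1 for a hypothetical supplement group and follow-up t in months) are invented for illustration. Extensions such as negative binomial models for overdispersed counts are not shown.

```python
import math

def poisson_fit(xs, ys, ts, max_iter=50, tol=1e-12):
    """Poisson regression log(mu_i) = b0 + b1 * x_i + log(t_i), fitted by Newton-Raphson.

    xs: covariate values; ys: observed counts; ts: person-time at risk (the offset).
    Returns (b0, b1) on the log scale.
    """
    b0, b1 = math.log(sum(ys) / sum(ts)), 0.0  # start from the overall crude rate
    for _ in range(max_iter):
        mu = [t * math.exp(b0 + b1 * x) for x, t in zip(xs, ts)]  # expected counts
        # Score vector (first derivatives of the log-likelihood)
        u0 = sum(y - m for y, m in zip(ys, mu))
        u1 = sum(x * (y - m) for x, y, m in zip(xs, ys, mu))
        # Fisher information matrix [[i00, i01], [i01, i11]]
        i00 = sum(mu)
        i01 = sum(x * m for x, m in zip(xs, mu))
        i11 = sum(x * x * m for x, m in zip(xs, mu))
        det = i00 * i11 - i01 * i01
        step0 = (i11 * u0 - i01 * u1) / det
        step1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1

# Hypothetical data: fever episodes per child, x = 1 for a supplement group, t in months.
xs = [0, 0, 0, 0, 1, 1, 1, 1]
ys = [3, 1, 4, 2, 1, 0, 2, 1]
ts = [6, 6, 5, 6, 6, 6, 6, 4]
b0, b1 = poisson_fit(xs, ys, ts)
print(f"rate in x = 0: {math.exp(b0):.4f} per month; rate ratio: {math.exp(b1):.4f}")
```

With a single binary covariate, the fitted rate ratio should equal the ratio of the crude rates, (4/22)/(10/23), which is a useful check on the fitting routine.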

Module 11 Prognosis and diagnosis using discriminant analysis
Discrimination is a measure of how well a test or model separates those with and without a disease or disorder (diagnosis), or those who will and will not develop a disorder (prognosis). Most published studies assess or compare the diagnostic performance of single tests. The results of these studies usually have limited applicability because tests are generally not done in isolation: in most settings, information is available from an array of diagnostic or predictive tests. The key question is thus what a diagnostic or predictive test adds when it is applied in combination with (or after) other tests.
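The most common summary of discrimination is the c-statistic, which for a binary outcome equals the area under the ROC curve: the probability that a randomly chosen person with the disorder receives a higher score than a randomly chosen person without it. The Python sketch below (illustrative only) computes it by comparing all pairs and applies it to hypothetical predicted risks from a base model and an extended model with one added predictor, mirroring the question of what a test adds to existing information. In practice the scores would come from fitted models, and the change in the c-statistic is only one of several ways to quantify added value.

```python
def c_statistic(case_scores, noncase_scores):
    """Concordance (c-statistic, equal to the area under the ROC curve).

    Share of all case/non-case pairs in which the case has the higher score;
    tied pairs count as one half.
    """
    concordant = 0.0
    for a in case_scores:
        for b in noncase_scores:
            if a > b:
                concordant += 1.0
            elif a == b:
                concordant += 0.5
    return concordant / (len(case_scores) * len(noncase_scores))

# Hypothetical predicted risks for the same 3 cases and 3 non-cases from two models:
# a base model (existing tests) and an extended model with one extra predictor.
base = c_statistic([0.7, 0.5, 0.3], [0.4, 0.3, 0.2])
extended = c_statistic([0.75, 0.55, 0.45], [0.5, 0.3, 0.15])
print(f"base c = {base:.3f}, extended c = {extended:.3f}, change = {extended - base:.3f}")
```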

Modules 12-13 Dealing with missing values
Missing data occur in almost all datasets and can cause biased results and reduced statistical precision. Simple solutions (e.g., last observation carried forward) are often used, but they only partially redress these problems. Multiple imputation has recently been put forward as a method that can yield valid results even with large numbers of missing values, provided that its assumptions (typically, that data are missing at random given the variables in the imputation model) are reasonable.
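Multiple imputation creates several completed datasets, analyses each one, and then combines the results. The Python sketch below (illustrative only; the imputation step itself, for example by chained equations, is not shown) implements that final combining step with Rubin's rules: the pooled estimate is the mean of the m estimates, and its total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). The three estimates and variances in the usage lines are hypothetical.

```python
import math
import statistics

def rubin_pool(estimates, variances):
    """Pool m completed-data estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    w = statistics.fmean(variances)       # within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b               # total variance
    if b == 0:
        df = math.inf
    else:
        r = (1 + 1 / m) * b / w           # relative increase in variance due to missingness
        df = (m - 1) * (1 + 1 / r) ** 2   # Rubin's degrees of freedom
    return q_bar, math.sqrt(t), df

# Hypothetical coefficient estimates and squared standard errors from m = 3 imputed datasets.
est, se, df = rubin_pool([1.0, 1.2, 1.1], [0.04, 0.05, 0.06])
print(f"pooled estimate {est:.3f}, SE {se:.3f}, df {df:.1f}")
```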

Module 14 Meta-analysis
Meta-analysis pools and synthesises the results from multiple studies, with the aim of increasing statistical precision, improving estimates of effect size and/or resolving uncertainties when reports disagree. Although some consider meta-analyses to provide the highest level of evidence about the causal determinants of health and disease, others regard them merely as marketing tools that are unnecessary, misleading and susceptible to bias due to flawed methodologies, researchers’ subjective decisions, or personal or professional conflicts of interest.
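Pooling in meta-analysis is usually a weighted average in which each study is weighted by the inverse of its variance. The Python sketch below (illustrative only) computes a fixed-effect inverse-variance estimate and a DerSimonian and Laird random-effects estimate, which adds an estimate of the between-study variance tau² to each study's variance, together with the I² heterogeneity statistic derived from Cochran's Q. The three log rate ratios are hypothetical and deliberately heterogeneous.

```python
import math

def pool(estimates, ses):
    """Inverse-variance fixed-effect and DerSimonian-Laird random-effects pooling."""
    k = len(estimates)
    w = [1 / s ** 2 for s in ses]                                     # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))     # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)                                # between-study variance
    w_re = [1 / (s ** 2 + tau2) for s in ses]                         # random-effects weights
    random_effects = sum(wi * y for wi, y in zip(w_re, estimates)) / sum(w_re)
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    return {
        "fixed": fixed, "se_fixed": math.sqrt(1 / sw),
        "random": random_effects, "se_random": math.sqrt(1 / sum(w_re)),
        "tau2": tau2, "I2": i2,
    }

# Hypothetical log rate ratios and standard errors from three studies.
res = pool([0.0, 0.5, 1.0], [0.1, 0.1, 0.1])
print({k: round(v, 3) for k, v in res.items()})
```

Here the pooled estimate is the same under both models because the studies are equally precise, but the random-effects standard error is much larger, reflecting the heterogeneity between studies.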

Module 15 p-values and confidence intervals revisited
P-values, statistical significance, power and confidence intervals are widely used in epidemiology (and other sciences), but they are seldom understood and usually misinterpreted. In this module, you will gain a thorough understanding of these terms and of how (not) to use them.
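The Python sketch below (illustrative only, using the standard library's `statistics.NormalDist`) computes a two-sided Wald p-value and the matching confidence interval from an estimate and its standard error, plus the approximate power of a two-sided z-test. The estimates, on a log rate ratio scale, are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def wald_summary(estimate, se, level=0.95):
    """Two-sided Wald p-value for H0: effect = 0 and the matching confidence interval."""
    z = estimate / se
    p = 2 * (1 - Z.cdf(abs(z)))
    crit = Z.inv_cdf(1 - (1 - level) / 2)
    return p, (estimate - crit * se, estimate + crit * se)

def power_two_sided(true_effect, se, alpha=0.05):
    """Approximate power of a two-sided z-test when the true effect is true_effect."""
    crit = Z.inv_cdf(1 - alpha / 2)
    shift = abs(true_effect) / se
    return Z.cdf(shift - crit) + Z.cdf(-shift - crit)

p1, ci1 = wald_summary(0.2, 0.1)   # hypothetical log rate ratio 0.2, SE 0.1
p2, ci2 = wald_summary(0.3, 0.2)   # larger hypothetical estimate, but less precise
print(f"p = {p1:.3f}, 95% CI {ci1[0]:.3f} to {ci1[1]:.3f}")
print(f"p = {p2:.3f}, 95% CI {ci2[0]:.3f} to {ci2[1]:.3f}")
print(f"power if the true effect were 0.2 with SE 0.1: {power_two_sided(0.2, 0.1):.2f}")
```

The first two results illustrate that a two-sided p-value below 0.05 corresponds to a 95% interval that excludes zero, and that a "non-significant" estimate can still be compatible with effects larger than a "significant" one. The last line shows that even if the true effect equalled the first estimate, a study with the same precision would reach p < 0.05 only about half the time.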

Learning outcomes:

After successful completion of this course students are expected to be able to:
- apply state-of-the-art methods to solve problems that are commonly encountered in epidemiological studies on (global) nutrition and health;
- adapt and combine selected functions in Stata.

Remarks:
1. This course is currently under development: topics may be added or replaced, and module learning objectives and contents may change;
2. This course is based in part on the course HNH-30606 Analytical Epidemiology II that was offered until the 2019/2020 academic year. If you attended that course and also want to take the present course, please consult your study adviser.
All data analyses will be carried out in R, so students will implicitly learn how to use R for (nutritional) epidemiology and data analysis.

Activities:

Lectures, software demonstrations, literature study, data analysis practicals and (individual and/or group) assignments, group discussions.

Assessment:

(Tentative) Portfolio (i.e., a collection of worked solutions to exercises) and a final written examination.


To be announced.

Compulsory for: MNH Nutrition and Health, MSc, A: Spec. A - Nutritional and Public Health Epidemiology, 6MO