INF-33806 Big Data


Code last year: (INF-51306)

Course

Credits 6.00

Teaching method    Contact hours
Lecture            14
Practical          24
Group work         9

Course coordinator(s):
dr. I Athanasiadis
prof. dr. ir. B Tekinerdogan

Lecturer(s):
prof. dr. ir. B Tekinerdogan
dr. I Athanasiadis
dr. C Catal

Examiner(s):
dr. I Athanasiadis
prof. dr. ir. B Tekinerdogan

Language of instruction:

English.

Assumed knowledge on:

Familiarity with relational databases (like INF-21306 Data Management) or computer programming (like INF-22306 Programming in Python) is helpful.

Contents:

Big Data usually refers to data sets too large for commonly used software tools to capture, curate, manage, and process within a tolerable elapsed time. With advances in computing, building Big Data systems has become feasible and can trigger innovation and growth in various application domains, including most Wageningen University and Research domains.
This course discusses the key concepts of Big Data and provides hands-on experience in developing and using Big Data systems. We introduce concepts related to Big Data system architectures, distributed filesystems, the Map-Reduce framework, Resilient Distributed Datasets, and scalable linear and machine learning models, and show how they are made available through technologies such as the Hadoop Distributed File System and Apache Spark. Students practice with the tools in individual tutorials and gain hands-on experience by working on a group project organized as a "data challenge". In the project, students demonstrate their use of the tools learned in the course as well as their creativity as data scientists, which includes communicating the value of their findings with visualization tools. The course is designed to be accessible to students from a diverse range of disciplines at Wageningen University, such as geo-information science, environmental sciences, biosystems engineering, bioinformatics and social sciences.
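To give a flavour of the Map-Reduce idea mentioned above, the following is a minimal sketch in plain Python. It is not course material and does not use Hadoop or Spark; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample sentences are hypothetical, chosen only for illustration. It counts words by first emitting key-value pairs (map), then grouping them by key (shuffle), and finally combining each group (reduce).

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into a single count."""
    return {key: reduce(lambda a, b: a + b, values)
            for key, values in groups.items()}


def word_count(documents):
    """Run map, shuffle and reduce in sequence on a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(pairs))


# Hypothetical sample input; in a real system each document could be
# processed on a different machine during the map phase.
counts = word_count(["big data big ideas", "data beats opinions"])
```

In a distributed setting the map and reduce steps would run in parallel across many machines, which is what frameworks such as Hadoop and Spark take care of; here everything runs in a single process to keep the idea visible.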

Learning outcomes:

After successful completion of this course students are expected to be able to:
- discuss the basic concepts related to Big Data and data-driven value-creation in the environmental, social and life sciences;
- use Big Data methods for designing scalable applications in the environmental, social and life sciences;
- discuss the role of various tools in the Big Data ecosystem and have hands-on experience with some of them;
- explore data analytics for discovery, and data visualization for communication of meaningful patterns in data;
- show insight into the value of data-driven innovation, and associate it with their own course of studies.

Activities:

- lectures;
- tutorials;
- group assignments;
- excursion (half day).

Examination:

- quiz (3 times, closed book) (50%);
- group work (50%).
A minimum mark of 5.5 for the group work is required to pass.

Literature:

A collection of recent scientific papers will be made available at no cost through the library.

Restricted Optional for:
Programme                          Phase    Specialization    Period
MBE Biosystems Engineering         MSc                        2MO
MGI Geo-Information Science        MSc                        2MO

Compulsory for:
Minor                              Period
WUDSC BSc Minor Data Science       2MO