INF-33806 Big Data

Course

Credits: 6.00

Teaching method (contact hours):
- Lecture: 14
- Practical: 24
- Group work: 9

Course coordinator(s): ir. MA Zijp
Lecturer(s): dr. Q Liu
Examiner(s): dr. C Catal

Language of instruction:

English.

Assumed knowledge on:

Familiarity with relational databases (like INF-21306 Data Management) or computer programming (like INF-22306 Programming in Python) is helpful.

Contents:

Big Data usually refers to data sets whose size exceeds the ability of commonly used software tools to capture, curate, manage, and process them within a tolerable elapsed time. Advances in computing have now made Big Data systems feasible, and such systems can trigger innovation and growth in many application domains, including most of the domains covered by Wageningen University and Research.
This course discusses the key concepts of Big Data and provides hands-on experience in developing and using Big Data systems. We introduce concepts related to Big Data system architectures, distributed file systems, the Map-Reduce framework, Resilient Distributed Datasets, and scalable linear and machine learning models, and show how these are made available through technologies such as the Hadoop Distributed File System and Apache Spark. Students work with these tools in individual tutorials and gain further hands-on experience in a group project framed as a "data challenge". In this project, students demonstrate not only their use of the tools learned in the course but also their creativity as data scientists, which includes communicating the value of their findings with visualization tools. The course is designed to be accessible to students from a diverse range of disciplines at Wageningen University, such as geo-information science, environmental sciences, biosystems engineering, bioinformatics and social sciences.
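
To make the Map-Reduce idea mentioned above more concrete, the following is a minimal, single-machine sketch in Python of the classic word-count example. It is an illustration only and not course material: the input strings and the function names `map_phase`, `shuffle_phase`, `reduce_phase` and `map_reduce_word_count` are hypothetical, and the "partitions" are processed one after another rather than on separate workers in a cluster.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input: each string stands in for one partition (block) of a larger file.
partitions = [
    "big data needs big tools",
    "spark and hadoop are big data tools",
]


def map_phase(partition):
    """Map: emit a (word, 1) pair for every word in one partition."""
    return [(word, 1) for word in partition.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle: group all values that share the same key (word)."""
    groups = defaultdict(list)
    for word, count in mapped_pairs:
        groups[word].append(count)
    return groups


def reduce_phase(groups):
    """Reduce: combine the grouped values for each key into a single total."""
    return {word: sum(counts) for word, counts in groups.items()}


def map_reduce_word_count(partitions):
    # On a real cluster each partition would be mapped by a different worker;
    # in this sketch the map calls simply run sequentially.
    mapped = chain.from_iterable(map_phase(p) for p in partitions)
    return reduce_phase(shuffle_phase(mapped))


counts = map_reduce_word_count(partitions)
print(sorted(counts.items()))
```

Because the map step looks at one partition at a time and the reduce step only needs the values grouped under one key, both steps can in principle be spread over many machines. In Apache Spark, a comparable job is commonly written with RDD transformations such as `flatMap`, `map` and `reduceByKey`, which let Spark handle the distribution and the shuffle.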

Learning outcomes:

After successful completion of this course students are expected to be able to:
- discuss the basic concepts related to Big Data and data-driven value-creation in the environmental, social and life sciences;
- use Big Data methods for designing scalable applications in the environmental, social and life sciences;
- discuss the role of various tools in the Big Data ecosystem and have hands-on experience with some of them;
- explore data analytics for discovery, and use data visualization to communicate meaningful patterns in data;
- show insight into the value of data-driven innovation and relate it to their own field of study.

Activities:

- lectures;
- practicals;
- group assignments.

Examination:

- three closed-book quizzes (50%);
- group work (50%).
Both the quiz component and the group work require a minimum mark of 5.5 to pass.

Literature:

A collection of recent scientific papers will be provided; these are available at no cost through the library.

Programmes (phase, period):

Restricted Optional for:
- BSW Soil, Water, Atmosphere (BSc, period 2MO);
- MEE Earth and Environment (MSc, period 2MO);
- MBE Biosystems Engineering (MSc, period 2MO);
- MGI Geo-Information Science (MSc, period 2MO).

Minors (period):

Compulsory for:
- WUDSC BSc Minor Data Science (period 2MO).