This course will be offered to:
- Intake Y2023 in Summer 2025 (Term 5)
- Intake Y2022 in Spring 2025 (Term 6)
- Intake Y2021 in Spring 2025 (Term 8)
Course Description
Database systems manage data, which is at the heart of modern computing applications. We are in the era of big data, in which data is generated from many sources, at high velocity, and with great variety. This poses numerous challenges for using and improving database technologies. Big data systems designed to support analytics are maturing and becoming increasingly important to many applications.
This course covers the fundamentals of traditional databases, such as Oracle and MySQL, and the core ideas of recent big data systems. Students will learn the important data management problems that these systems are designed to solve. They will gain experience building applications on top of a traditional database, namely SQLite, and on state-of-the-art big data platforms, namely MongoDB and Apache Spark. These systems will run both locally and on the Amazon cloud (Amazon Web Services). Students will thus be able to judge for themselves the advantages and disadvantages of different systems.
Prerequisites
- 10.014 Computational Thinking for Design (for AY2020 and subsequent batches)
- 50.004 Algorithms
- 50.005 Computer System Engineering
Learning Objectives
- Design and implement a database application on top of a relational database management system (RDBMS).
- Identify major components of database and big data systems.
- Estimate the costs of different database operations.
- Explain how state-of-the-art big data systems differ from one another.
- Implement a cloud‐based big data application.
- Explain how database and big data systems fit together in real‐world applications.
- Use cloud‐based systems.
Measurable Outcomes
- Develop a database design for an application.
- List and explain major components of database and big data systems.
- Write complex SQL queries.
- Estimate the cost of different database operations.
- Compare different classes of big data systems.
- Write MapReduce and Spark jobs.
- Explain how a database differs from a big data system.
- Design, implement, and deploy database and big data systems on AWS.
Topics Covered
- Database system internals
- Relational model
- ER to Relational Model
- Relational Algebra
- SQL
- Functional Dependency
- DB normalization
- Data Storage
- Indexing
- Query evaluation
- Query optimization
- Transaction Management
- Big data systems architecture
- AWS
- HDFS
- Spark
Textbook(s) and/or Other Required Material
- Abraham Silberschatz, Henry Korth, S. Sudarshan. Database System Concepts, 6th edition.
- Raghu Ramakrishnan, Johannes Gehrke. Database Management Systems, 3rd edition.
- Thomas Erl. Big Data Fundamentals: Concepts, Drivers & Techniques. 2016.
- Armbrust et al. “Above the Clouds: A Berkeley View of Cloud Computing”. EECS Technical Report, UC Berkeley. 2009.