This course provides students with the necessary background and experience in data science technology and concepts. Students will gain experience in tackling a complete data science project, from data gathering and preprocessing to data analysis with machine learning tools. As a foundation for their project, students will learn to apply fundamental concepts in machine learning, data storage and distributed processing.
Prerequisites:
- 10.009 The Digital World (for intake AY2019)
- Data Driven World (for intake AY2020 onwards)
- One of: 50.034 Introduction to Probability and Statistics, 30.003 Introduction to Probability & Statistics, or 40.001 Probability together with 40.004 Statistics
Learning objectives:
- Be aware of the main goals of data science, its main application domains and current challenges.
- Apply tools to build basic models for solving typical data analytics problems.
- Visualise the structure of big data in order to uncover hidden patterns.
- Design and implement distributed database systems for managing heterogeneous data.
- Perform basic operations on a moderately complex distributed computation system such as Spark.
- Explain the fundamentals of statistical machine learning and deep learning.
- Appreciate the technical skills necessary to be a capable data scientist.
Measurable outcomes:
- Identify important concepts and current challenges in data science.
- Design feature representations for image, text and time series data.
- Analyse data and build simple models using tools such as Weka, Python and Tableau.
- Implement distributed computation models using Spark.
- Evaluate the performance of different models using empirical benchmarks.
- Mathematically explain common machine learning models such as support vector machines (SVMs), logistic regression and neural networks.
- Implement machine learning algorithms using tools such as R, C++ and PyTorch.
- Manage big data using Hadoop and MapReduce.