This course provides students with the necessary background and experience in data science technology and concepts. Students will gain experience in tackling a complete data science project, from data gathering and preprocessing to data analysis with machine learning tools. As a foundation for their project, students will learn to apply fundamental concepts ranging from machine learning to data storage and distributed processing.
- 10.009 The Digital World
- 50.034 Probability and Statistics or 30.003 Introduction to Probability & Statistics or 40.001 Probability and 40.004 Statistics
- Be aware of the main goals of data science, its main application domains and current challenges.
- Apply tools to build basic models for solving typical data analytics problems.
- Visualise the structure of big data in order to uncover hidden patterns.
- Design and implement distributed database systems for managing heterogeneous data.
- Perform basic operations on a moderately complex distributed computation system, such as Spark.
- Explain the fundamentals of statistical machine learning and deep learning.
- Appreciate the technical skills necessary to be a capable data scientist.
- Identify important concepts and current challenges in data science.
- Design feature representations for image, text and time series data.
- Analyse data and build simple models in tools such as Weka, Python and Tableau.
- Implement a distributed computation model using Spark.
- Evaluate the performance of different models using empirical benchmarks.
- Mathematically explain common machine learning models such as SVMs, logistic regression and neural networks.
- Implement machine learning algorithms using software such as R, C++ and PyTorch.
- Manage big data using Hadoop and MapReduce.