MACHINE LEARNING AND DATA ANALYTICS

SHE Level 4
SCQF Credit Points 20.00
ECTS Credit Points 10.00
Module Code MHI225680
Module Leader Yan Zhang
School School of Computing, Engineering and Built Environment
Subject Computing
Trimester
  • A (September start)

Pre-Requisite Knowledge

Database Development or equivalent

Summary of Content

This module introduces the challenges of manipulating and analysing datasets of different shapes and formats, together with possible solutions. It is designed to help students learn to build and apply the tools required to derive business value from data. These tools range from data preprocessing and analysis to visualization.

Syllabus

Overview
  • Introduction to Artificial Intelligence (AI) and Machine Learning
  • Introduction to supervised learning
  • Introduction to unsupervised learning
  • Introduction to other types of learning (reinforcement, transfer, etc.)
  • Business intelligence

Basic Data Analytic and Machine Learning Techniques
  • Loading/importing data for analysis from various sources
  • Data preparation and pre-processing
  • Exploratory Data Analysis (EDA)
  • Feature engineering
  • Linear Regression
  • Logistic Regression
  • Decision Trees
  • Association Rules
  • Cluster Analysis

Advanced Data Analytic and Machine Learning Techniques
  • Support Vector Machines
  • Bayesian Network Classifiers
  • Ensemble methods (bootstrapping, bagging, boosting)
  • Parameter optimization
  • Neural Networks
  • Text Analysis
  • Deep Learning

Model Evaluation, Comparison and Deployment
  • Quantify model performance (e.g. accuracy, precision, lift curves, ROC curves, etc.)
  • APIs for model deployment, e.g. REST API, pickle, etc.
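As an illustration only of the final syllabus topic (quantifying model performance and serialising a model with pickle), the following minimal Python sketch may be helpful. It is not part of the module's prescribed material: the threshold "model", the function names and the data values are all hypothetical and chosen purely for demonstration, and only the Python standard library is assumed.

```python
import pickle


def predict(params, xs):
    """Hypothetical toy model: predict 1 when a feature exceeds the threshold."""
    return [1 if x > params["threshold"] else 0 for x in xs]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision(y_true, y_pred, positive=1):
    """Of the items predicted positive, the fraction that are truly positive."""
    predicted_positive = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not predicted_positive:
        return 0.0  # no positive predictions, so report 0 by convention
    true_positive = sum(t == positive for t in predicted_positive)
    return true_positive / len(predicted_positive)


# Made-up data for illustration only.
features = [0.2, 0.8, 0.5, 0.9, 0.1, 0.7]
labels = [0, 1, 0, 1, 0, 0]

# The "trained" model is just a dictionary of parameters (an assumption).
params = {"threshold": 0.6}
predictions = predict(params, features)

print(f"accuracy:  {accuracy(labels, predictions):.2f}")
print(f"precision: {precision(labels, predictions):.2f}")

# Serialise the model parameters to bytes and restore them, as a deployed
# service might do when loading a previously saved model.
blob = pickle.dumps(params)
restored = pickle.loads(blob)
assert predict(restored, features) == predictions
```

In practice a library model object would usually be pickled rather than a plain dictionary, and pickle files should only be loaded from trusted sources.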

Learning Outcomes

On completion of the module the student should be able to:
1 - Demonstrate an understanding of the scope and methodology of artificial intelligence (AI) by applying it to different types and sizes of data
2 - Build or apply existing state-of-the-art tools to prepare data for analysis or visualization
3 - Demonstrate an understanding of the role of machine learning, and of the different algorithms and approaches used to address different data analysis goals, and apply these algorithms to real-world problems and datasets of varying size
4 - Critically appraise the strategic importance of data and analytics in a business sense

Teaching / Learning Strategy

The university's 'Strategy for Learning' documentation has informed the learning and teaching strategy for this module. Material will be introduced through lectures, and practical programming and problem-solving exercises based on the lecture material will be given to students in their laboratory sessions. All lecture and laboratory material will be made available on GCU Learn. Many of the technologies and approaches presented in the module have a large amount of external material online (e.g. open-source toolkits, data sources, videos and tutorials), and links to these will be provided to students. This also ensures that students have access to the most up-to-date technologies and tools used in the area of big data.

During all laboratory sessions students will receive formative feedback on their performance in the laboratory exercises. Summative feedback and grades for the coursework undertaken as part of the module will be provided through GCU Learn. GCU Learn will also be used to provide module-specific forums and wikis to stimulate student and lecturer interaction outside the normal lecture and laboratory sessions.

Indicative Reading

  • Deep Learning (Adaptive Computation and Machine Learning Series). Ian Goodfellow and Yoshua Bengio (2016)
  • Data Mining: Practical Machine Learning Tools and Techniques, Third Edition. Ian H. Witten, Eibe Frank, Mark A. Hall (2016)
  • Data Mining: Concepts and Techniques, Third Edition. Jiawei Han, Micheline Kamber, Jian Pei (2011)
  • Mining of Massive Datasets. Jure Leskovec, Anand Rajaraman and Jeff Ullman (2015)
  • Doing Data Science: Straight Talk from the Frontline (O'Reilly). Cathy O'Neil, Rachel Schutt (2013)
  • Hadoop: The Definitive Guide (4th Edition). Tom White (2015)
  • Python for Data Analysis. W. McKinney (2017)
  • Modern Information Retrieval: The Concepts and Technology Behind Search (ACM Press). Ricardo Baeza-Yates, Berthier Ribeiro-Neto

Transferable Skills

  • Specialist knowledge and application
  • Critical thinking and problem solving
  • Critical analysis
  • Communication skills, written, oral and listening
  • Numeracy
  • Computer literacy
  • Self confidence, self discipline & self reliance (independent working)
  • Creativity, innovation & independent thinking
  • Ability to prioritise tasks and time management
  • Commercial awareness

Module Structure

Activity Total Hours
Independent Learning (FT) 132.00
Practicals (FT) 24.00
Lectures (FT) 24.00
Assessment (FT) 20.00

Assessment Methods

Component      Duration   Weighting (%)   Threshold   Description
Coursework 1   n/a        50.00           35%         Class assessment: written paper/on-line test
Coursework 2   n/a        50.00           35%         Problem based assessment