BIG DATA

SHE Level 4
SCQF Credit Points 20.00
ECTS Credit Points 10.00
Module Code MHI224186
Module Leader Yan Zhang
School School of Computing, Engineering and Built Environment
Subject Computing
Trimesters
  • A (September start)
  • B (January start)
  • C (May start)

Pre-Requisite Knowledge

Database Systems Development or equivalent

Summary of Content

This module introduces the challenges of big data and possible solutions to them. Big data is the common term for a collection of data sets so large and complex that it becomes difficult to process using commonly available relational database management systems, desktop statistics and visualisation packages. The size and complexity of big data create challenges in storage, search, sharing, transfer, analysis and visualisation. Students will be introduced to the problems of big data and to technologies that help address them, e.g. Hadoop. They will also gain an understanding of data mining and its applications, e.g. search systems, business intelligence and recommender systems, thereby gaining the practical and theoretical skills to make sense of large volumes of data.

Syllabus

  • Introduction to problems in big data
      o Characteristics
      o Terminology and concepts
      o Emerging trends
      o Issues
      o Benefits
      o Limitations
  • Data warehousing
      o Dimensional approach
      o Normalised approach
      o Bottom up approach
      o Top down approach
  • Business intelligence
      o OLAP
      o Forecasting
      o Predictive modelling
      o Metadata
      o Structured vs. unstructured data
  • Data mining
      o Map reduce
      o Different technologies e.g. Hadoop, R, Weka etc.
      o Statistical analysis
      o Mining data streams
      o Clustering
      o Large scale machine learning
  • Search systems
      o Link analysis
      o Indexing
      o Language models
      o Graph search
  • Recommender systems
      o Nearest neighbour search
      o Collaborative filtering
      o Content based recommenders
      o Dimensionality reduction
  • Data visualisation
      o Aggregation
      o Drill-down
      o Filter
      o Roll-up
      o What-if analysis

Learning Outcomes

On successful completion of the module the student should be able to:
1. Describe the basic principles of the relational data model and the issues of managing large data or big data using the relational data model.
2. Understand the role of data mining, the different algorithms and approaches used to address different data mining goals, and the application of these algorithms to real-world problems for datasets of varying size.
3. Appreciate the strategic importance of business analytics and data, and demonstrate an ability to extract, cleanse and manage data to derive business knowledge by using systems thinking and applying quantitative analytical techniques.
4. Apply their knowledge using various state-of-the-art tools such as Hadoop, R etc.

Teaching / Learning Strategy

The university's 'Strategy for Learning' documentation has informed the learning and teaching strategy for this module. The module's material will be introduced through lectures, while practical programming and problem-solving exercises based on the lecture material will be given to students in their laboratory sessions. All lecture and laboratory material will be made available on GCU Learn. A number of the technologies and approaches presented in the module are supported by a large amount of external online material, e.g. open source toolkits, data sources, videos and tutorials, and links to these will be provided to students. This also ensures that students have access to the most up-to-date technologies and tools being used in the area of big data. During all laboratory sessions students will receive formative feedback on their performance in the laboratory exercises. Summative feedback and grades for the coursework assignment undertaken as part of the module will also be provided through GCU Learn. GCU Learn will also be used to provide students with module-specific forums and wikis to stimulate student and lecturer interaction outside the normal lecture and laboratory sessions.

Indicative Reading

  • Data Mining: Practical Machine Learning Tools and Techniques, Third Edition. Ian H. Witten, Eibe Frank, Mark A. Hall
  • Data Mining: Concepts and Techniques, Third Edition. Jiawei Han, Micheline Kamber, Jian Pei
  • Mining of Massive Datasets. Jure Leskovec, Anand Rajaraman and Jeff Ullman
  • Doing Data Science: Straight Talk from the Frontline (O'Reilly). Cathy O'Neil, Rachel Schutt
  • Hadoop: The Definitive Guide (3rd Edition). Tom White
  • The Adaptive Web: Methods and Strategies of Web Personalization (Lecture Notes in Computer Science). Peter Brusilovsky, Alfred Kobsa, Wolfgang Nejdl
  • Modern Information Retrieval: The Concepts and Technology Behind Search (ACM Press). Dr Ricardo Baeza-Yates, Dr Berthier Ribeiro-Neto

Transferrable Skills

  • D1 Specialist knowledge and application
  • D2 Critical thinking and problem solving
  • D3 Critical analysis
  • D4 Communication skills, written, oral and listening
  • D5 Numeracy
  • D7 Computer literacy
  • D8 Self confidence, self discipline & self reliance (independent working)
  • D10 Creativity, innovation & independent thinking
  • D15 Ability to prioritise tasks and time management
  • D18 Commercial awareness

Module Structure

Activity Total Hours
Practicals (FT) 24.00
Assessment (FT) 18.00
Independent Learning (FT) 134.00
Lectures (FT) 24.00

Assessment Methods

Component        Duration   Weighting (%)   Threshold   Description
Course Work 01   n/a        40.00           n/a         Practically based assignment
Course Work 02   n/a        60.00           n/a         Practically based assignment