BIG DATA

SHE Level 4
SCQF Credit Points 20.00
ECTS Credit Points 10.00
Module Code MHI225101
Module Leader Yan Zhang
School School of Computing, Engineering and Built Environment
Subject Computing
Trimesters
  • A (September start)
  • B (January start)

Pre-Requisite Knowledge

Database Systems Development or equivalent

Summary of Content

This module introduces the challenges of big data and possible solutions to them. Big data is the common term for a collection of data sets so large and complex that they become difficult to process using commonly available relational database management systems, desktop statistics and visualisation packages. The size and complexity of big data create challenges in storage, search, sharing, transfer, analysis and visualisation. Students will be introduced to the problems of big data and to technologies that help address them, e.g. Hadoop. They will also gain an understanding of data mining and its different applications (e.g. search systems, business intelligence and recommender systems), thereby gaining the practical and theoretical skills to make sense of large volumes of data.

The percentage of Work Based Learning for this module, as represented by the proportion of the Activity Types which take place off campus, is 79%. The percentage of Work Based Assessment or equivalent activity for this module is 40%.

Syllabus

  • Introduction to problems in big data
      o Characteristics
      o Terminology and concepts
      o Emerging trends
      o Issues
      o Benefits
      o Limitations
  • Data warehousing
      o Dimensional approach
      o Normalised approach
      o Bottom up approach
      o Top down approach
  • Business intelligence
      o OLAP
      o Forecasting
      o Predictive modelling
      o Metadata
      o Structured vs. unstructured data
  • Data mining
      o MapReduce
      o Different technologies e.g. Hadoop, R, Weka etc.
      o Statistical analysis
      o Mining data streams
      o Clustering
      o Large scale machine learning
  • Search systems
      o Link analysis
      o Indexing
      o Language models
      o Graph search
  • Recommender Systems
      o Nearest neighbour search
      o Collaborative filtering
      o Content based recommenders
      o Dimensionality reduction
  • Data visualisation
      o Aggregation
      o Drill-down
      o Filter
      o Roll-up
      o What-if analysis

Learning Outcomes

On successful completion of the module the student should be able to:
1 - Describe the basic principles of the relational data model and the issues of managing large data or big data using the relational data model.
2 - Understand the role of data mining and different algorithms and approaches to address different data mining goals and the application of these algorithms to real-world problems for datasets of varying size.
3 - Have gained an appreciation of the strategic importance of business analytics and data and demonstrated an ability to extract, cleanse and manage it to derive business knowledge by using systems thinking and applying quantitative analytical techniques.
4 - Apply their knowledge using various state of the art tools such as Hadoop, R etc.

Teaching / Learning Strategy

Work based education aims to maximise the direct and digitally mediated contact time with students by practising teaching and learning strategies that use authentic work based scenarios and encourage action learning, enquiry based learning, problem based learning and peer learning. All these approaches aim to involve the students directly in the process of learning and to encourage sharing of learning between students. The module team will check the level and accuracy of knowledge acquisition at key points in the delivery, intervening when necessary either directly or with the support of external experts, who will add to the authenticity, credibility and workplace application of the education and learning.

The course material is introduced through lectures in the form of online presentations, while practical programming and problem solving exercises, based on the lecture material, will be given to students for their laboratory sessions. All lecture and laboratory material will be made available on GCU Learn. A number of the technologies and approaches presented in the course have a large amount of external material online, e.g. open source toolkits, data sources, videos and tutorials, and links to these will be provided to the students. This also ensures that students have access to the most up to date technologies and tools being used in the area of big data.

During all laboratory sessions students will receive formative feedback on their performance in undertaking the laboratory exercises. Summative feedback and grades for the coursework assignment undertaken as part of the module will also be provided using GCU Learn. GCU Learn will also be used to provide the students with module specific forums and wikis to stimulate student and lecturer interaction outside the normal lecture and laboratory sessions.

Indicative Reading

  • Data Mining: Practical Machine Learning Tools and Techniques, Third Edition. Ian H. Witten, Eibe Frank, Mark A. Hall
  • Data Mining: Concepts and Techniques, Third Edition. Jiawei Han, Micheline Kamber, Jian Pei
  • Mining of Massive Datasets. Jure Leskovec, Anand Rajaraman, Jeff Ullman
  • Doing Data Science: Straight Talk from the Frontline (O'Reilly). Cathy O'Neil, Rachel Schutt
  • Hadoop: The Definitive Guide (3rd Edition). Tom White
  • The Adaptive Web: Methods and Strategies of Web Personalization (Lecture Notes in Computer Science). Peter Brusilovsky, Alfred Kobsa, Wolfgang Nejdl
  • Modern Information Retrieval: The Concepts and Technology Behind Search (ACM Press). Ricardo Baeza-Yates, Berthier Ribeiro-Neto

Transferable Skills

  • Specialist knowledge and application
  • Critical thinking and problem solving
  • Critical analysis
  • Communication skills, written, oral and listening
  • Numeracy
  • Computer literacy
  • Self confidence, self discipline & self reliance (independent working)
  • Creativity, innovation & independent thinking
  • Ability to prioritise tasks and time management
  • Commercial awareness
  • Develop an understanding of the practical considerations that constrain the application of theory in the workplace

Module Structure

Activity                     Total Hours
Assessment (FT)              18.00
Independent Learning (FT)    134.00
Practicals (FT)              24.00
Lectures (FT)                24.00

Assessment Methods

Component         Duration   Weighting (%)   Threshold   Description
Course Work 02    n/a        60.00           n/a         Practically based assignment
Course Work 01    n/a        40.00           n/a         Practically based assignment