BIG DATA PLATFORMS

SHE Level 5
SCQF Credit Points 20.00
ECTS Credit Points 10.00
Module Code MMI223995
Module Leader James Paterson
School School of Computing, Engineering and Built Environment
Subject Computing
Trimester
  • B (January start)

Summary of Content

This module covers the platforms that support data storage, processing and analytics in Big Data scenarios. It focuses on highly scalable platforms that provide operational capabilities for real-time, interactive processing and on platforms that provide analytical capabilities for retrospective, complex analysis. Students will gain an advanced understanding of the principles on which these platforms are based, and their strengths, weaknesses and applicability to different types of scenario. They will also gain advanced practical skills in the design and implementation of scalable Big Data platform solutions.

Syllabus

Big Data challenges: Volume, Velocity, Variety

Big Data platform concepts:
  • Distributed storage
  • Distributed processing
  • CAP Theorem and eventual consistency
  • Operational vs. analytical workloads
  • Big Data and traditional enterprise data warehouses
  • Big Data architecture
  • Cloud deployment and cloud platform support

NoSQL data storage:
  • General characteristics, strengths and weaknesses of NoSQL
  • NoSQL data store types and their applicability, including key-value, document, columnar and graph
  • NoSQL modeling and schema design
  • NoSQL querying

Big Data processing frameworks:
  • MapReduce
  • Hadoop
  • Hadoop ecosystem, including Hive, Pig and Mahout
  • Apache Spark
  • Commercial implementations/services

Examples of tasks undertaken by students in practical sessions are:
  • Configuration of a database system and its associated software tools to allow implementation of a Big Data storage solution.
  • Design of a data model for a specific database system to meet the needs of a representative Big Data storage scenario.
  • Creation, deployment and test of a solution to a representative Big Data analytic processing scenario using an appropriate set of software tools.
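By way of illustration, the MapReduce model named in the syllabus can be sketched as a minimal in-memory word count in plain Python. This is an illustrative sketch of the map/shuffle/reduce phases only (the function names and sample documents are invented for the example); a real deployment would run on a framework such as Hadoop or Spark, with the shuffle performed by the framework across a cluster rather than by a local dictionary:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as the framework does
    between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the grouped values for each key (here, by summing)."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data platforms", "big data processing"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
# counts == {"big": 2, "data": 2, "platforms": 1, "processing": 1}
```

The same three-phase structure underlies Hadoop MapReduce jobs and, in a more general form, Spark transformations such as `map` and `reduceByKey`.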

Learning Outcomes

On successful completion of this module a student should be able to:
  • Demonstrate a detailed understanding of the requirements, concepts and principles of data storage, processing and analytics in Big Data scenarios.
  • Critically appraise the platform choices available for designing and implementing solutions for data storage, processing and analytics in Big Data scenarios.
  • Design a responsive, scalable and robust solution for a Big Data scenario, making use of appropriate platforms and practices.
  • Implement a prototype system to demonstrate and evaluate a representative design solution.

Teaching / Learning Strategy

The learning and teaching strategy for this module has been informed by the university's 'Strategy for Learning' design principles. The course material is introduced through lectures and laboratory sessions that draw upon and extend the lecture material to deepen students' knowledge. The laboratory sessions are designed as a set of formative exercises and a substantial summative exercise spanning several weeks. The formative exercises introduce a range of technologies, allowing students to gain confidence and build knowledge of the range of solutions that can be applied to particular problems. The summative exercise provides experience in real-world problem solving and challenges students to demonstrate analytical skills and a capacity for divergent thinking.

Tutorials will be used to explain and elaborate on both the lecture material and the laboratory exercises; these will include a range of case studies that bring a global perspective to the subject matter. During all lab and tutorial sessions students receive formative feedback on their performance in undertaking the laboratory and tutorial exercises. Summative feedback and grades are also provided for the coursework assignments undertaken as part of the module, using GCULearn. GCULearn is also used to provide students with module-specific forums and wikis to stimulate student and lecturer interaction outwith the normal lecture, laboratory and tutorial sessions.

Flexible learning is encouraged and supported. All teaching materials and self-testing exercises are made available on GCULearn, and links are provided to external materials such as podcasts, MOOCs, videos and relevant literature. All the computing resources used for laboratories are made available either as virtual machine images (supplied to students for use on their own computers) or online using industry-standard cloud computing services provided by major global computing industry vendors. Because all material and computing facilities are provided online, the module is suitable for use where Flexible and Distributed Learning (FDL) is required.

Indicative Reading

  • Marz, N. (2015) Big Data: Principles and Best Practices of Scalable Realtime Data Systems, Manning
  • Fowler, M. (2012) NoSQL Distilled, Addison-Wesley
  • Redmond, E. & Wilson, J. (2012) Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement, Pragmatic Bookshelf
  • White, T. (2015) Hadoop: The Definitive Guide, O'Reilly
  • Karau, H., Konwinski, A., Wendell, P. & Zaharia, M. (2015) Learning Spark: Lightning-Fast Big Data Analysis, O'Reilly

Transferable Skills

  • D1 Specialist knowledge and application
  • D2 Critical thinking and problem solving
  • D3 Critical analysis
  • D4 Communication skills, written, oral and listening
  • D7 Computer literacy
  • D8 Self-confidence, self-discipline & self-reliance (independent working)
  • D10 Creativity, innovation & independent thinking
  • D14 Ability to prioritise tasks and time management
  • D17 Commercial awareness

Module Structure

Activity Total Hours
Independent Learning (FT) 120.00
Practicals (FT) 24.00
Lectures (FT) 24.00
Assessment (FT) 20.00
Tutorials (FT) 12.00

Assessment Methods

Component Duration Weighting Threshold Description
Coursework 1 n/a 50.00 45% Class assessment, e.g. class test or report
Coursework 2 n/a 50.00 45% Design of a solution to a Big Data storage/processing scenario: report and demonstration of prototype