DATA ANALYTICS

SHE Level: 5
SCQF Credit Points: 20.00
ECTS Credit Points: 10.00
Module Code: MMI223997
Module Leader: Frances Garven
School: School of Computing, Engineering and Built Environment
Subject: Cyber Security and Networks
Trimesters:
  • A (September start)
  • B (January start)

Summary of Content

This module covers the basic statistical concepts needed to understand the key concepts of data mining, machine learning and predictive analytics used in the visualisation and analysis of data, particularly Big Data. Big Data is the term used for collections of structured and unstructured data sets so large and complex that they are difficult to process using commonly available relational database management systems and statistical software packages. Data mining is the process of discovering useful patterns and trends in large data sets. Predictive analytics is the process of extracting information from large data sets in order to make predictions and estimates about future outcomes. Students will gain an understanding of data preparation, the process models used in analytics, the algorithms and their requirements, the implementation of these algorithms using current technologies, and their applicability to different types of scenario. They will also gain advanced practical skills in the design, implementation and evaluation of analytical solutions to problems involving Big Data.

Syllabus

Overview: What is Data Analytics and Big Data? Data types: Structured and Unstructured data. Terminology: Data Mining, Machine Learning, Predictive Analytics, Business Intelligence, Data Science, Prediction, Classification, Segmentation. Supervised and Unsupervised Learning. Applications and Use Cases.
Data Architecture and Analytics Process Models: Data acquisition and data integration. Data from different types of data sources. Analytics sandboxes. Data Warehousing and OLAP, Data Marts, Data Lakes, Data Staging area, Data Streams. Symmetric Multiprocessing (SMP) vs. Massively Parallel Processing (MPP). Data Analytics Lifecycle, IBM CRISP, SAS-SEMMA.
Basic Data Analytic Methods: Loading/importing data for analysis, sampling, data preparation and pre-processing. Variable selection and transformation. Outliers. Categorisation. Treatment of missing values. Exploring the data: exploratory data analysis, descriptive statistics and graphical visualisations.
Statistical Inference: Hypothesis testing framework. Chi-Square test. Linear Regression.
Advanced Analytical Methods: Logistic Regression, Decision Trees, Association Rules, Cluster Analysis, Support Vector Machines, Bayesian Network Classifiers, Neural Networks, Text Analysis, Deep Learning.
Model Evaluation, Comparison and Deployment: Splitting the data set. Quantifying the performance of the models (lift curves, ROC curves). Ensemble methods (bootstrapping, bagging, boosting). Deploying, monitoring and backtesting analytical models.
Big Data Analytics Technologies: Hadoop, Cloud-based, R-based tools, Python-based tools, Java-based tools (WEKA), SAS Data Analytics, IBM Watson Analytics.

Examples of tasks undertaken by students in practical sessions are:
  • Loading/importing and preparation of data for analysis using appropriate software, including SAS.
  • Design, construction and interpretation of visual representations of statistics and data.
  • Applying various analytical methods such as regression and decision trees using SAS and IBM Watson Analytics.
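
As an illustration of two of the model evaluation topics listed in the syllabus (splitting the data set and summarising a ROC curve), the following is a minimal Python sketch using only the standard library. It is not taken from the module materials: the function names `train_test_split` and `roc_auc`, the 30% test fraction, the fixed seed and the sample labels and scores are all assumptions made for the example.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of rows and return (train, test) lists.

    Hypothetical helper for illustration; test_fraction and seed are
    arbitrary defaults, not values prescribed by the module.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def roc_auc(labels, scores):
    """Area under the ROC curve for binary labels (0 or 1).

    Computed as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied scores count as half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up example data: ten record identifiers and four scored cases.
train, test = train_test_split(range(10))
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(len(train), len(test), auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes. In practice, tools such as SAS or the Python-based libraries named in the syllabus provide equivalent functionality.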

Learning Outcomes

On successful completion of this module a student should be able to:
  • Demonstrate an in-depth understanding of how key algorithms and models are applied in developing analytical solutions.
  • Demonstrate an in-depth understanding of how analytical solutions can deliver benefits to organisations.
  • Critically appraise the approach, the selected data, the fitted models and evaluations used to solve Big Data problems.
  • Apply their knowledge using various state-of-the-art tools such as Hadoop and SAS.
  • Implement the various Data Mining steps to develop and evaluate appropriate models for a Big Data scenario.

Teaching / Learning Strategy

The learning and teaching strategy for this module has been informed by the university's 'Strategy for Learning' design principles. The course material is introduced through lectures and laboratory sessions that draw upon and extend the lecture material to deepen students' knowledge. The laboratory sessions are designed as a set of formative exercises and a substantial summative exercise spanning several weeks. The formative exercises introduce a range of technologies that allow students to gain confidence and build knowledge of the range of solutions that can be applied to particular problems. Summative exercises provide experience in real-world problem-solving and challenge students to demonstrate analytical skills and capacity for divergent thinking. Tutorials will be used to help explain and elaborate on both the lecture material and the laboratory exercises; these will include a range of case studies that bring a global perspective to the subject matter.
During all lab and tutorial sessions students receive formative feedback on their performance in undertaking the laboratory and tutorial exercises. Summative feedback and grades are also provided for the coursework assignments undertaken as part of the module, using GCULearn. GCULearn is also used to provide the students with module-specific Forums and Wikis to stimulate student and lecturer interaction outwith the normal lecture, laboratory and tutorial sessions.
Flexible learning is encouraged and supported. All teaching materials and self-testing exercises are made available on GCULearn and links are provided to external materials such as podcasts, MOOCs, videos and relevant literature. All the computing resources used for laboratories are made available either as virtual machine images (supplied to students for use on their own computers) or online using industry-standard cloud computing services provided by major global computing industry vendors.
The strategy for the delivery of this module is suitable for both an in-attendance mode of delivery, as described above, and an on-line mode of delivery where Flexible and Distributed Learning (FDL) is required. Specifically, the on-line delivery mode for this module will include the same e-books and materials that are used for the in-attendance mode, but these materials will be formatted into structured, self-paced, interactive e-learning units containing lecture content and embedded interactive tutorial material with solutions and contextual feedback. Each unit will also contain interactive quizzes and application simulations for tasks that include lab work using analytics software. These units will be developed using available e-learning software such as Adobe Captivate and Camtasia; the units will be available via GCULearn.

Indicative Reading

  • Data Mining and Predictive Analytics (Wiley Series on Methods and Applications in Data Mining). Daniel T. Larose, Chantal D. Larose (24 Apr 2015)
  • Data Mining: Concepts and Techniques (The Morgan Kaufmann Series in Data Management Systems), 3rd Revised edition. Jiawei Han (8 Mar 2011)
  • Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data. EMC Education Services (17 Mar 2015)
  • Analytics in a Big Data World: The Essential Guide to Data Science and its Applications (Wiley and SAS Business Series). Bart Baesens (1 Jul 2014)
  • Big Data Analytics For Beginners. Faraz Rabbani, Ali Roghani (26 Jan 2015)
  • Practical Business Analytics Using SAS: A Hands-on Guide. Shailendra Kadre, Venkat Reddy Konasani (30 Jan 2015)
  • Hadoop: The Definitive Guide, 3rd Edition. Tom White (29 May 2012)
  • Data Analytics with Hadoop. Benjamin Bengfort, Jenny Kim (25 Jan 2016)
  • http://www.kdnuggets.com/

Transferable Skills

  • D1 Specialist knowledge and application
  • D2 Critical thinking and problem solving
  • D3 Critical analysis
  • D4 Communication skills, written, oral and listening
  • D5 Numeracy
  • D6 Effective information retrieval and research skills
  • D7 Computer literacy
  • D8 Self-confidence, self-discipline & self-reliance (independent working)
  • D10 Creativity, innovation & independent thinking
  • D14 Ability to prioritise tasks and time management
  • D16 Presentation skills

Module Structure

Activity Total Hours
Assessment (FT) 20.00
Independent Learning (FT) 120.00
Practicals (FT) 24.00
Lectures (FT) 24.00
Tutorials (FT) 12.00
Independent Learning (FDL) 200.00

Assessment Methods

Component Duration Weighting Threshold Description
Exam (School) n/a 50.00 45% Class test: Written paper/on-line class test
Exam (School) n/a 50.00 45% Lab test: Practical lab test involving an application/on-line practical lab test**

**To obtain SAS certification the student must obtain a minimum aggregate mark of 60%.