INTRODUCTION TO DATA SCIENCE

SHE Level: 3
SCQF Credit Points: 20.00
ECTS Credit Points: 10.00
Module Code: M3I326702
Module Leader: Gordon Morison
School: School of Computing, Engineering and Built Environment
Subject: Computing
Trimester: A (September start)

Summary of Content

This module will introduce the development of software programming skills used for data science using an appropriate programming language. The module will give the student the opportunity to explore software development methods for the key aspects of data science such as data capture, wrangling, analysis, processing and data visualisation. The student will gain an understanding of the various data science software ecosystems in order to apply statistical data analysis techniques (descriptive and inferential), machine learning and information visualisation techniques. This will be introduced via practical examples using both data simulation and real-world datasets to allow the student to make decisions that are supported by data. The percentage of Work Based Learning for this module, as represented by the proportion of the Activity Types which take place off campus, is 80%. The percentage of Work Based Assessment for this module is 10%.

Syllabus

Data Science basics - Matrix and Vector Representations
Python Data Science Software Ecosystem: NumPy - SciPy - scikit-learn - Matplotlib - Pandas
Exploratory Data Analysis
Basic Statistics: Population vs Sample, mean, median, mode, standard deviation, skewness, variance, correlation, covariance. Hypothesis testing, statistical distributions, standard error and confidence interval, type 1 and type 2 errors, p-value
Data Manipulation: Software implementation using Pandas
Data Visualization: Software implementation using Matplotlib
Data Science Applications - Statistical Models - Analysis Techniques - Line Fitting - Prediction - Applying Algorithms - Forecasting
Introduction to Machine Learning - Classification and Regression

Learning Outcomes

On successful completion of this module students should be able to:
1. Demonstrate familiarity with the problems and issues surrounding real-world data sources
2. Apply filtering, cleaning and transformation techniques to data
3. Understand how probability theory and statistical methods are applied to data
4. Detail how the output of statistical models is interpreted to form an insight into data
5. Understand the basics of Machine Learning including Classification and Regression
6. Implement Data Science methods in an appropriate software programming language

Teaching / Learning Strategy

Work based Education aims to maximise the direct and digitally mediated contact time with students by practicing teaching and learning strategies that use authentic work based scenarios and encourage action learning, enquiry based learning, problem based learning and peer learning. All these approaches aim to directly involve the students in the process of learning and to encourage sharing of learning between students. The module team will determine the level and accuracy of knowledge acquisition at key points in the delivery, inputting when necessary either directly or with the support of external experts who will add to the authenticity, credibility and application of the education and learning in the workplace.
The course material is introduced through lectures in the form of online presentations. Students will engage with practical and tutorial activities, including during sessions on campus, which will allow students to discuss key concepts and issues with peers and with instructors. Students will be expected to undertake a significant level of independent study within the workplace, including practical activities, and links will be provided to appropriate external material such as podcasts, MOOCs, videos and literature to supplement the module content. Students will also be encouraged to reflect upon the theoretical learning within the workplace and the application of newly learned concepts to the work environment.
Full use will be made of GCU Learn to provide lecture-based and related study materials, along with sample solutions of tutorial and laboratory exercises, thus encouraging the development of independent learning and allowing self-reflective feedback on student performance. Staff-based feedback on student performance for submitted work will be provided in line with the University feedback policy, with summative feedback and grades on the coursework assessment utilising GCU Learn. The additional interactive discussion features of GCU Learn will be utilised, as appropriate to the module, to stimulate independent and flexible student learning outwith scheduled class time.
Lectures are supplemented by directed reading of relevant sources in both hard-copy and electronic formats, and varied further reading is encouraged. Hands-on experience is gained in the process of completing lab exercises tailored to demonstrate the required tool. Students are supported in their studies by both face-to-face and online tutorials and online quiz material. Learning and teaching strategies will be developed and implemented, appropriate to students' needs, to enable all students to participate fully in the module.

Indicative Reading

An Introduction to Data Science - by Jeffrey S. Saltz and Jeffrey M. Stanton, 21 Dec 2017
Data Science (MIT Press Essential Knowledge series) - by John D. Kelleher
Python Data Science Handbook: Essential Tools for Working with Data - by Jake VanderPlas, 2017

Transferrable Skills

D1 Critical thinking and problem solving
D6 Time management (organising and planning work)
D10 Information retrieval skills
D13 IT Skills
D14 Communication skills, written, oral and listening

Module Structure

Activity Total Hours
Assessment (FT) 28.00
Lectures (FT) 36.00
Tutorials (FT) 24.00
Seminars (FT) 12.00
Independent Learning (FT) 100.00

Assessment Methods

Component Duration Weighting Threshold Description
Course Work 01 n/a 50.00 35% Practical Assignment
Course Work 02 n/a 50.00 35% Practical Assignment