CIS7031 - Programming for Data Analysis
Semester 2, Cardiff School of Technologies
Assessment Title: Employment in Wales
Learning Outcomes
This assessment is designed to demonstrate a student’s completion of the following Learning Outcomes:
- Critically analyse and evaluate various statistical and computational techniques for analysing datasets and determine the most appropriate technique for a business problem;
- Critically evaluate, develop and implement solutions for processing datasets and solving complex problems in various environments using relevant programming paradigms;
- Evaluate and apply key steps and issues involved in data preparation, cleaning, exploring, creating, optimizing and evaluating models;
- Evaluate and apply aspects of data science applications and their use.
EDGE
The Cardiff Met EDGE supports students in graduating with the knowledge, skills, and attributes that allow them to contribute positively and effectively to the communities in which they live and work.
This module assessment provides opportunities for students to demonstrate development of the following EDGE Competencies:
ETHICAL | Students will be required to consider the ethical implications of their analysis and follow the necessary ethical approval processes while addressing problems associated with the assessment.
DIGITAL | Students will be required to demonstrate digital skills in the collation and analysis of data for their project.
GLOBAL | Students will demonstrate an awareness of the global context and apply this to their assessment.
ENTREPRENEURIAL | Students will also demonstrate their entrepreneurial skills by working on their own initiative, formulating and presenting recommendations in order to solve an authentic and complex problem associated with the module.
Assessment Requirements / Tasks (include all guidance notes)
This assignment uses employment data for Wales from the StatsWales data source. The dataset provides workplace employment estimates (estimates of total jobs) for Wales and its NUTS2 areas, along with comparable UK data, disaggregated by industry section.
For this assignment students will undertake a data analysis and machine learning approach to reveal the workplace employment landscape of Wales.
1. Data processing
1.1. Download the dataset for the period 2009 – 2018 and create a dataframe that concatenates only the Wales (total) employment values.
1.2. Check for null values and outliers. Replace any that are found with the mean value.
1.3. Rename the industries as shown below.
Use Jupyter Notebook for this work.
The final dataframe should look like the following:
Industry | 2018 | 2017 | 2016 | 2015 | 2014 | 2013 | 2012 | 2011 | 2010 | 2009
Agriculture | | | | | | | | | |
Production | | | | | | | | | |
Construction | | | | | | | | | |
Retail | | | | | | | | | |
ICT | | | | | | | | | |
Finance | | | | | | | | | |
Real_Estate | | | | | | | | | |
Professional_Service | | | | | | | | | |
Public_Adminstration | | | | | | | | | |
Other_Service | | | | | | | | | |
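Steps 1.1 to 1.3 can be sketched roughly as follows. This is a minimal illustration, not a model answer: the layout of the StatsWales export is not specified in this brief, so the per-year series, the raw industry label, the shortened set of years and every number below are made-up placeholders. The 1.5 × IQR outlier rule and the use of each industry's own mean as the replacement are also assumptions; the brief only says "mean value".

```python
import numpy as np
import pandas as pd

# Placeholder values only; these are NOT real StatsWales figures.
labels = ["Agriculture, forestry and fishing", "Production", "Construction"]
yearly = {
    "2018": pd.Series([30.0, 170.0, np.nan], index=labels),  # one missing value
    "2017": pd.Series([31.0, 168.0, 95.0], index=labels),
    "2016": pd.Series([29.0, 900.0, 93.0], index=labels),    # 900.0 acts as an outlier
    "2015": pd.Series([30.0, 166.0, 92.0], index=labels),
    "2014": pd.Series([32.0, 165.0, 91.0], index=labels),
}

# 1.1: concatenate the per-year Wales totals, one column per year.
raw = pd.concat(yearly, axis=1)

# 1.3: rename industries (the raw label on the left is a hypothetical example).
df = raw.rename(index={"Agriculture, forestry and fishing": "Agriculture"})
df.index.name = "Industry"


def replace_with_row_mean(frame, k=1.5):
    """Replace nulls and IQR outliers with the mean of the row's other values."""
    q1 = frame.quantile(0.25, axis=1)
    q3 = frame.quantile(0.75, axis=1)
    iqr = q3 - q1
    # Assumed rule: a value is an outlier if it lies more than k * IQR
    # outside the row's interquartile range.
    is_outlier = frame.lt(q1 - k * iqr, axis=0) | frame.gt(q3 + k * iqr, axis=0)
    cleaned = frame.mask(is_outlier)      # outliers become NaN
    row_mean = cleaned.mean(axis=1)       # mean of the remaining values per industry
    return cleaned.apply(lambda col: col.fillna(row_mean))


# 1.2: handle nulls and outliers.
df_clean = replace_with_row_mean(df)
print(df_clean)
```

Whether the replacement mean should be taken per industry, per year or over the whole table is a decision to justify in the notebook.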
2. Data analysis
For each question, provide a graph or chart along with your own interpretation (~50 words).
2.1. Which industries employed the most and the fewest workers over the period?
2.2. Which industries had the highest and lowest overall growth over the period?
2.3. Which years were the best and worst performing in terms of total employment (highest and lowest employment)?
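One possible way to compute the quantities behind these three questions is sketched below. The frame is a three-industry, three-year stand-in with invented numbers, and the definitions used (mean employment per industry, percentage change from first to last year, total employment per year) are assumptions about how the questions could be read.

```python
import pandas as pd

# Placeholder values only; these are NOT real StatsWales figures.
df = pd.DataFrame(
    {
        "2011": [30.0, 170.0, 90.0],
        "2010": [29.0, 172.0, 88.0],
        "2009": [28.0, 175.0, 85.0],
    },
    index=pd.Index(["Agriculture", "Production", "Construction"], name="Industry"),
)
df = df[sorted(df.columns)]  # put the year columns in chronological order

# Average employment per industry over the period.
mean_by_industry = df.mean(axis=1)
highest_employer = mean_by_industry.idxmax()
lowest_employer = mean_by_industry.idxmin()

# Overall growth, read here as percentage change from first to last year.
growth_pct = (df.iloc[:, -1] - df.iloc[:, 0]) / df.iloc[:, 0] * 100
fastest_growing = growth_pct.idxmax()
slowest_growing = growth_pct.idxmin()

# Best and worst year by total employment across all industries.
total_by_year = df.sum()
best_year = total_by_year.idxmax()
worst_year = total_by_year.idxmin()

print(highest_employer, lowest_employer)
print(fastest_growing, slowest_growing)
print(best_year, worst_year)
```

Each intermediate Series (`mean_by_industry`, `growth_pct`, `total_by_year`) can be passed to a bar chart, for example with `Series.plot.bar()` when matplotlib is installed.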
3. Visual analysis
Create a dynamic scatter/bubble plot, using Plotly Express, showing the change in workforce numbers over the period.
4. Correlation
4.1. Taking the average employment number for each industry over the period, show and identify the most and least correlated industries.
4.2. Compute a year-wise correlation for each industry. Are the aforementioned industries also correlated in each year? Explain your answer.
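The correlation tasks admit more than one reading, so the sketch below is only one hedged interpretation: industries are compared through the Pearson correlation of their yearly series, and the "year-wise" view is taken to be the correlation between year columns. The data are invented placeholders chosen so that the result is easy to check by eye.

```python
import numpy as np
import pandas as pd

# Placeholder values only; these are NOT real StatsWales figures.
df = pd.DataFrame(
    {
        "2009": [28.0, 120.0, 175.0],
        "2010": [29.0, 122.0, 174.0],
        "2011": [30.0, 124.0, 173.0],
        "2012": [31.0, 126.5, 172.0],
    },
    index=["Agriculture", "Retail", "Production"],
)

# Industry-by-industry Pearson correlation over the years.
industry_corr = df.T.corr()

# Keep each pair once: upper triangle only, diagonal excluded.
upper = np.triu(np.ones(industry_corr.shape, dtype=bool), k=1)
pairs = industry_corr.where(upper).stack().dropna()

highest_pair = pairs.idxmax()  # most positively correlated pair
lowest_pair = pairs.idxmin()   # most negatively correlated pair

# One possible "year-wise" view: correlation between the year columns.
year_corr = df.corr()

print(highest_pair, lowest_pair)
```

Here "lowest" is taken as the most negative coefficient; if it is meant as the weakest relationship, `pairs.abs().idxmin()` would be the comparable call.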
5. Clustering (k-means & hierarchical)
5.1. Using the employment data from the best and worst performing year columns (2.3), undertake a k-means clustering analysis (K = 2 and K = 3) and identify which industries cluster together. Write your own interpretation (~100 words).
5.2. Using the same dataset (best and worst performing years), create a hierarchical clustering. Compare its clusters with the k-means clusters.
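The two clustering tasks could be approached along the following lines, using scikit-learn for k-means and SciPy for the hierarchical clustering. The column names `best_year` and `worst_year`, the Ward linkage, the absence of feature scaling and all numbers are assumptions made for illustration.

```python
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans

# Placeholder values only; these are NOT real StatsWales figures.
X = pd.DataFrame(
    {
        "best_year": [30.0, 20.0, 40.0, 340.0, 420.0],
        "worst_year": [28.0, 18.0, 35.0, 330.0, 400.0],
    },
    index=["Agriculture", "Real_Estate", "ICT", "Retail", "Public_Adminstration"],
)

# K-means for K = 2 and K = 3; random_state fixes the initialisation.
kmeans_labels = {}
for k in (2, 3):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    kmeans_labels[k] = pd.Series(model.labels_, index=X.index)

# Agglomerative (hierarchical) clustering with Ward linkage,
# cut so that at most two flat clusters remain.
Z = linkage(X.to_numpy(), method="ward")
hier_labels = pd.Series(fcluster(Z, t=2, criterion="maxclust"), index=X.index)

# Cross-tabulate the two partitions: one non-zero cell per row
# means both methods group the industries the same way.
comparison = pd.crosstab(
    kmeans_labels[2], hier_labels, rownames=["kmeans"], colnames=["hierarchical"]
)
print(comparison)
```

Cluster label numbers are arbitrary in both methods, so the comparison should look at which industries share a cluster and not at the label values themselves. `scipy.cluster.hierarchy.dendrogram(Z)` can draw the tree when matplotlib is available.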
6. Discussion
Provide a brief discussion (~300 words) of the employment landscape of Wales, based on the results of the employment data analysis.
Assessment Criteria
1.1 Data preparation | 05
1.2 Data preparation | 05
1.3 Data preparation | 05
2.1 Data analysis | 05
2.2 Data analysis | 05
2.3 Data analysis | 05
3 Visual analysis | 20
4.1 Correlation | 10
4.2 Correlation | 10
5.1 Clustering | 10
5.2 Clustering | 10
6 Discussion | 10
Feedback
Feedback for the assessment will be provided electronically via Moodle, and will normally be available 4 working weeks after initial submission. The feedback return date will be confirmed on Moodle.
Feedback will be provided in the form of a rubric, supported by comments on your strengths and the areas in which you can improve.
All marks are preliminary and are subject to quality assurance processes and confirmation at the Examination Board.
Further information on the Academic and Feedback Policy is available in the Academic Handbook (Vol 1, Section 4.0).
Marking Criteria
70-100% (1st) | Addressed all sections and provided correct answers with elegant presentation of results. Applied correct data analysis approaches and provided excellent interpretation of each section.
60-69% (2:1) | Addressed all sections and provided correct answers with good presentation of results. Applied mostly correct data analysis approaches and provided very good interpretation of each section.
50-59% (2:2) | Addressed most of the sections and provided mostly correct answers with average presentation of results. Applied some correct data analysis approaches and provided an average interpretation of each section.
40-49% (3rd) | Addressed few sections, with few correct answers and with or without any presentation of results. Applied mostly incorrect data analysis approaches and provided poor interpretation of each section.
35-39% (Narrow Fail) | Addressed few sections and provided mostly incorrect answers with poor presentation of results. Applied incorrect data analysis approaches and provided poor interpretation.
<35% (Fail) | Very poor report, missing one or more required parts.
Additional Information
Referencing Requirements (Harvard)
The Harvard (or author-date) format should be used for all references (including images).
Further information on Referencing can be found at Cardiff Met’s Academic Skills website.