This section contains the material for part 2 of the coursework.
- Important Dates
You must ensure you meet the deadlines for each element.
|Portfolio Element||Type||Due Date|
|1||Draft||Labs in Week 22|
|2||Final Report and Viva||Labs in Week 23/24|
By Week 22, you should have a draft report ready to discuss with your tutor for feedback.
Plagiarism in data warehouse design is easy to spot. Please be aware that the penalties are severe and your final degree award classification is at risk.
- Coursework Content
You are required to work in a group (maximum 3 and minimum 1). Some sections are to be carried out by the whole group, and some sections MUST be carried out individually.
|Project Name||Student ID||Student Name||Contribution|
You are required to carry out a data-mining project on the Adult census data set, which can be found at
The data-mining project with results and analysis must be submitted in the form of a complete report.
Your project may use any combination of data-mining algorithms and software that has been covered in the module. You may also apply them to any aspect(s) of the dataset for knowledge discovery. Examples of techniques you could use are classification, regression, clustering, visualisation, etc.
Please see below the aspects that you should consider:
- Data Audit (Group) (15 marks)
- Describe your data (give an overview summary of your data set)
- Identify your input and class variables (which variable are you going to use as your class variable)
- Analyse your variables (for each variable, you need to discuss the variable type, calculate relevant summary statistics and visually display the data)
- Discuss any anomalies in the data (for each variable you need to discuss missing values, outliers etc.)
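As an illustration only, the audit steps above could be sketched in Python as follows. The sample rows, the column names and the use of "?" as a missing-value marker are assumptions made for this sketch, not part of the brief; you may use any software covered in the module. Outliers are flagged here with the common 1.5 x IQR rule, which is one possible choice among several.

```python
import statistics
from collections import Counter

# Hypothetical sample rows; column names and values are illustrative only.
rows = [
    {"age": 39, "workclass": "State-gov", "hours_per_week": 40},
    {"age": 50, "workclass": "Self-emp", "hours_per_week": 13},
    {"age": 38, "workclass": "?", "hours_per_week": 40},
    {"age": 52, "workclass": "Private", "hours_per_week": 45},
    {"age": 31, "workclass": "Private", "hours_per_week": 50},
    {"age": 90, "workclass": "Private", "hours_per_week": 99},
]

MISSING = "?"  # assumed marker for a missing value


def audit_numeric(rows, column):
    """Summary statistics, missing count and IQR-based outliers for one numeric column."""
    values = [r[column] for r in rows if r[column] != MISSING]
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "count": len(values),
        "missing": len(rows) - len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "outliers": [v for v in values if v < low or v > high],
    }


def audit_categorical(rows, column):
    """Frequency of each level and missing count for one categorical column."""
    counts = Counter(r[column] for r in rows)
    missing = counts.pop(MISSING, 0)
    return {"missing": missing, "levels": dict(counts)}


age_audit = audit_numeric(rows, "age")
workclass_audit = audit_categorical(rows, "workclass")
```

The level counts returned by `audit_categorical` can feed a bar chart, and the numeric values a histogram or box plot, to cover the visual display asked for above.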
- Pre-process the Data (Group) (15 marks)
- Discuss and carry out the appropriate handling of any anomalies identified in section 1.4
- Do you need to discard any of your input variables? Justify your reasons.
- Carry out appropriate Correlation Coefficient Analysis of the variables
- Carry out appropriate pre-processing of the data set
- Carry out any appropriate transformation of any of the input variables
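A minimal sketch of three pre-processing steps of this kind is given below, under stated assumptions: missing categorical values are marked with "?" and replaced by the most frequent level, the correlation coefficient is Pearson's r (suitable for numeric variables only), and the transformation is min-max scaling to the range 0 to 1. These are illustrative choices, not the required method, and the function names are hypothetical.

```python
import math
from collections import Counter


def impute_mode(values, missing="?"):
    """Replace the assumed missing marker with the most frequent observed level."""
    observed = [v for v in values if v != missing]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v == missing else v for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def min_max_scale(values):
    """Rescale numeric values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical columns, for illustration only.
workclass = impute_mode(["Private", "?", "Private", "State-gov"])
r = pearson([25, 35, 45, 55], [20, 30, 40, 50])
scaled_age = min_max_scale([25, 35, 45, 55])
```

A pair of input variables with |r| close to 1 carries largely the same information, which is one possible justification for discarding one of them.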
- Data Mining (Individual) (35 marks)
- Appropriate use of data-mining algorithms and software (you need to use at least 2 different techniques)
- Appropriate selection and presentation of results
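To show what "two different techniques" might look like in practice, here is a small sketch that fits a k-nearest-neighbour classifier and a one-split decision stump to the same data and reports held-out accuracy for each. The records, the two scaled features and the choice of these two techniques are assumptions for illustration; the algorithms and software you use should be ones covered in the module.

```python
import math
from collections import Counter

# Hypothetical, already scaled records: (features, class label).
train = [
    ((0.20, 0.30), "<=50K"),
    ((0.25, 0.35), "<=50K"),
    ((0.30, 0.20), "<=50K"),
    ((0.70, 0.80), ">50K"),
    ((0.80, 0.75), ">50K"),
    ((0.75, 0.90), ">50K"),
]
test = [((0.22, 0.28), "<=50K"), ((0.78, 0.82), ">50K")]


def knn_predict(train, x, k=3):
    """Majority vote among the k training records closest to x (Euclidean distance)."""
    neighbours = sorted(train, key=lambda rec: math.dist(rec[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def fit_stump(train, feature):
    """Decision stump on one feature; assumes exactly two class labels."""
    first, second = sorted({y for _, y in train})
    best = None
    for threshold in sorted({x[feature] for x, _ in train}):
        for below, above in ((first, second), (second, first)):
            errors = sum(
                (below if x[feature] <= threshold else above) != y
                for x, y in train
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, below, above)
    _, threshold, below, above = best
    return lambda x: below if x[feature] <= threshold else above


def accuracy(predict, data):
    """Fraction of records in data whose predicted label matches the true label."""
    return sum(predict(x) == y for x, y in data) / len(data)


knn_accuracy = accuracy(lambda x: knn_predict(train, x), test)
stump_accuracy = accuracy(fit_stump(train, feature=0), test)
```

Evaluating both models on the same held-out records keeps the comparison fair; with real data a larger test split or cross-validation would be more informative than the two test records used here.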
- Analysis of Results (Individual) (20 marks)
- Discussion and interpretation of the data-mining results (you need to compare the results you get from the data mining with the results of other members of your group)
- Discussion of the business intelligence that can be obtained from the results.
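When comparing your results with those of other group members, a confusion matrix with precision and recall per class is often easier to discuss than accuracy alone. The sketch below is one way to compute these; the label lists are made-up values for illustration, and the function names are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count of each (actual label, predicted label) pair."""
    return Counter(zip(actual, predicted))


def precision_recall(matrix, positive):
    """Precision and recall for the class treated as positive."""
    tp = matrix[(positive, positive)]
    fp = sum(n for (a, p), n in matrix.items() if p == positive and a != positive)
    fn = sum(n for (a, p), n in matrix.items() if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels for illustration only.
actual = [">50K", ">50K", "<=50K", "<=50K", "<=50K", ">50K"]
predicted = [">50K", "<=50K", "<=50K", "<=50K", ">50K", ">50K"]

matrix = confusion_matrix(actual, predicted)
precision, recall = precision_recall(matrix, positive=">50K")
```

Computing the same measures for each group member's model, on the same test records, gives a like-for-like basis for the comparison.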
- Presentation (Group with each group member contributing individually) (15 marks)
Where to submit
Each individual group member should submit one report via the drop box in your learning environment. Make sure you clearly identify yourself by name and student ID.
Do not hand in written assessed coursework directly to your tutor, and do not submit it to your tutor by email.