MDS502: ability to create a database system and design a model for comprehensive data management: Data Management Assignment, CU, Malaysia
|University|Curtin University (CU)|
|Subject|MDS502: Data Management|
This assignment has been designed to assess students’ ability to create a database system and design a model for comprehensive data management. This assignment addresses the following learning objectives for this module:
CLO1: Create database systems using appropriate design techniques and scripting languages (C6, PLO1, MQF1)
CLO2: Design models for storing, cleansing, transforming, and integrating data to meet organizational standards (P7, PLO2, MQF2)
B. ASSIGNMENT/PROJECT EVALUATION
These criteria will be used to evaluate your assignment/project submission:
• The assignment/project is well presented (Professional standard)
• The proposed data management system is suitable and feasible
• The database creation and meeting organizational requirements are well addressed
• Assumptions are listed
You are to create a comprehensive data extraction, transformation, and loading (ETL) project based on a real dataset. You are free to choose any dataset of your preference as long as it is a freely shared valid dataset generated by a legitimate organization or company.
The ETL project consists mainly of data extraction, data transformation, and data loading, along with the supporting activities described in the tasks below. Data may be extracted from an internal or an external source. Accordingly, data can be obtained through web scraping or from a database management system.
Extract, Transform, Load (ETL) Case Study
You have been assigned to an ETL project that processes a dataset through ETL and analytics activities. You start with a web scraping activity that extracts public data from a website, for example a weather website or a Covid-19 pandemic-related website. For web scraping, you may use an automation tool such as the Python-based Beautiful Soup. The related tasks are divided into the sections that follow.
Completing these tasks properly means successful completion of this project. You may use the Python programming language for the tasks, together with an Integrated Development Environment (IDE) for Python such as Jupyter Notebook. At the end of the extraction, the file must have a sufficient number of attributes (guide: 10 attributes) covering a range of data types: character, integer, number/float, date, and one other type. In addition, the extracted file may contain missing values and outliers.
Task 1: Data Extraction and Understanding the Dataset
Select a source website to scrape and justify your choice. You may refer to this web scraping tutorial or, if you prefer, use the Beautiful Soup library. After you complete your web scraping activity, list all the available column names and explain each of them.
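A minimal sketch of the extraction step, assuming the target page exposes its data as an HTML table. It uses Python's built-in `html.parser` so that it runs without extra installs; Beautiful Soup, mentioned in the case study, could play the same role. `SAMPLE_HTML`, `TableParser`, and `scrape_table` are hypothetical names, and the inline HTML stands in for a page you would actually download.

```python
from html.parser import HTMLParser

# Hypothetical page fragment standing in for a downloaded weather page.
SAMPLE_HTML = """
<table id="readings">
  <tr><th>city</th><th>date</th><th>temp_c</th></tr>
  <tr><td>Kuala Lumpur</td><td>2023-01-01</td><td>31.5</td></tr>
  <tr><td>Miri</td><td>2023-01-01</td><td></td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table as a list of dicts keyed by the header row."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = scrape_table(SAMPLE_HTML)
column_names = list(records[0].keys())
print(column_names)
```

An empty cell comes back as an empty string, which is one way missing values can enter the extracted file.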
Task 2: Identifying the Variables/Columns
1. Analyze your dataset and recommend how it could be used for decision making. For example, whether it would be appropriate for prediction, classification, or other types of analysis.
2. Explain the relevance of each variable in the dataset as an independent or dependent variable for the proposed model.
Task 3: Data Categorization and Data Profiling
1. Data categorization is part of data management. In your code, assign the columns or variables to two categories, i.e., independent and dependent variables. Include a screenshot of this activity.
2. Explain data profiling and data profiling components.
3. Appraise and evaluate your dataset with data profiling components.
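The categorization and profiling steps above could be sketched as follows. This is only an illustration: the sample rows, the choice of `temp_c` as the dependent variable, and the `profile` helper are hypothetical, and the profiling components shown (completeness, uniqueness, data type) are common ones rather than a list prescribed by this brief.

```python
# Hypothetical sample rows; None marks a missing value.
rows = [
    {"city": "Kuala Lumpur", "humidity": 80, "temp_c": 31.5},
    {"city": "Miri", "humidity": 85, "temp_c": None},
    {"city": "Kuala Lumpur", "humidity": 78, "temp_c": 30.0},
]

# Data categorization: an assumed split with temp_c as the dependent variable.
dependent = "temp_c"
independent = [name for name in rows[0] if name != dependent]


def profile(rows):
    """Return per-column row count, missing count, distinct count, and types."""
    report = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        present = [v for v in values if v is not None]
        report[name] = {
            "count": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "types": sorted({type(v).__name__ for v in present}),
        }
    return report


report = profile(rows)
for name, stats in report.items():
    print(name, stats)
```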
Task 4: Data Transformation [20 marks, CLO2]
Your data will likely need transformation to prepare it for future analytics. Examine your data and decide whether to eliminate duplicate rows, and whether to drop rows with missing or null values or replace those values with the median or mean. Include your justifications. Also convert any categorical columns to numerical columns.
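A minimal sketch of these cleaning decisions in plain Python, assuming rows are held as dictionaries and `None` marks a missing value. The helper names and sample data are hypothetical, and median imputation with simple integer codes is only one of the options the task allows. pandas offers comparable operations (for example `DataFrame.drop_duplicates` and `DataFrame.fillna`).

```python
from statistics import median

# Hypothetical extracted rows.
rows = [
    {"city": "Kuala Lumpur", "temp_c": 31.5},
    {"city": "Miri", "temp_c": None},
    {"city": "Kuala Lumpur", "temp_c": 31.5},  # exact duplicate of the first row
    {"city": "Kuching", "temp_c": 29.5},
]


def drop_duplicates(rows):
    """Keep the first occurrence of each identical row."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row.items())
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def fill_missing_with_median(rows, column):
    """Replace None in one column with the median of the present values."""
    fill = median(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def encode_categories(rows, column):
    """Map each distinct label to an integer code (labels sorted alphabetically)."""
    codes = {label: i for i, label in enumerate(sorted({r[column] for r in rows}))}
    return [{**r, column: codes[r[column]]} for r in rows], codes


unique = drop_duplicates(rows)
filled = fill_missing_with_median(unique, "temp_c")
encoded, codes = encode_categories(filled, "city")
print(encoded)
```

Integer codes imply an ordering between categories; whether that is acceptable depends on the model chosen in Task 2 and should be part of the justification.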
Furthermore, detect the rows with outliers and decide what to do with them.
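One way to detect outlier rows is the interquartile range (IQR) rule, sketched below. The brief does not prescribe a method, so the rule, the multiplier `k=1.5`, and the sample values are assumptions made for illustration.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [i for i, v in enumerate(values) if v < low or v > high]


# Hypothetical temperature column with one implausible reading.
temps = [29.5, 30.0, 30.5, 31.0, 31.5, 30.2, 95.0]
outlier_rows = iqr_outliers(temps)
print(outlier_rows)
```

The returned indices identify the rows to inspect; whether to drop, cap, or keep them remains a decision to justify.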
Moreover, standardize (rescale) the data so that every column has the same range of values, e.g., 0-1 or 1-10, to support the performance of the analytics algorithm. Include screenshots of the above activities.
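Bringing every column into a common range such as 0-1 or 1-10 corresponds to min-max scaling. A minimal sketch, with a hypothetical `min_max_scale` helper and sample column:

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly map values so the smallest becomes new_min and the largest new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to new_min.
        return [new_min for _ in values]
    span = new_max - new_min
    return [new_min + (v - lo) * span / (hi - lo) for v in values]


humidity = [70, 80, 90]  # hypothetical column
print(min_max_scale(humidity))         # range 0-1
print(min_max_scale(humidity, 1, 10))  # range 1-10
```

Because the minimum and maximum drive the mapping, it is usually sensible to handle outliers before scaling.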
Task 5: Analysis (Factor Analysis / Classification)
Describe your data with justifications so that you can present it to your management. The storytelling can be built by discovering the correlations between the variables through factor analysis. Alternatively, you may conduct classification. Afterwards, explain how the relevant factors you discovered affect the dependent variable, and plot or visualize these factors. Finally, drop the unnecessary variables from your data and keep the relevant ones. Include screenshots of the above activities.
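As a first look at which variables relate to the dependent variable, pairwise Pearson correlation can be computed as sketched below. This is plain correlation screening, not a full factor analysis, and the column names, sample values, and the 0.5 cut-off are hypothetical choices for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical cleaned columns; temp_c is the assumed dependent variable.
columns = {
    "humidity": [60, 70, 80, 90],
    "station_id": [3, 1, 4, 2],
    "temp_c": [34.0, 32.0, 30.0, 28.0],
}
target = "temp_c"

scores = {
    name: pearson(values, columns[target])
    for name, values in columns.items()
    if name != target
}
# Keep variables whose absolute correlation reaches the assumed cut-off.
relevant = [name for name, r in scores.items() if abs(r) >= 0.5]
print(scores, relevant)
```

Variables that fall below the cut-off are candidates for dropping, subject to the justification the task asks for.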
Task 6: Data Loading
A relational database supports simple and meaningful data queries, which makes for better data management practice. Load the previously transformed data frame into a local SQL database that you create, e.g., MySQL, PostgreSQL, or similar. Include a table specification using the following template:
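As a sketch of the loading step itself (not the table specification template), the example below creates a table and inserts transformed rows. It assumes SQLite through Python's built-in `sqlite3` module as a stand-in for MySQL or PostgreSQL, with an in-memory database; the `weather` table, its columns, and the sample rows are hypothetical. A pandas data frame could alternatively be written with `DataFrame.to_sql`.

```python
import sqlite3

# Hypothetical transformed rows: (city, observation date, temperature).
rows = [
    ("Kuala Lumpur", "2023-01-01", 31.5),
    ("Miri", "2023-01-01", 30.5),
]

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
conn.execute(
    """
    CREATE TABLE weather (
        city     TEXT NOT NULL,
        obs_date TEXT NOT NULL,
        temp_c   REAL,
        PRIMARY KEY (city, obs_date)
    )
    """
)
# Parameterized insert of all rows in one call.
conn.executemany(
    "INSERT INTO weather (city, obs_date, temp_c) VALUES (?, ?, ?)", rows
)
conn.commit()

# Query the data back and read the column definitions from the database.
loaded = conn.execute("SELECT city, temp_c FROM weather ORDER BY city").fetchall()
spec = conn.execute("PRAGMA table_info(weather)").fetchall()
conn.close()

print(loaded)
for _, name, col_type, not_null, _, pk in spec:
    print(name, col_type, not_null, pk)
```

The `PRAGMA table_info` output lists each column's name, type, nullability, and primary-key position, which can feed a table specification.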