UECM1534: Programming Techniques for Data Processing Assignment, NUM, Malaysia
University | The National University of Malaysia (NUM) |
Subject | UECM1534: Programming Techniques for Data Processing |
Question 1
The file “Sports Sales.csv” contains data on the sales of products by sports companies around the world. Write a Python script that performs the following tasks in the given order. If you are using Jupyter Notebook, your script must be self-contained in a single code cell. That is, all the given tasks are performed without any error or warning when your script is run once from a single code cell. The tasks are:
- Read the dataset into a DataFrame called df. Then, display the first 5 rows of the dataset.
- Print out the number of records in the dataset and the total number of missing values.
- Remove the records where the Date, Customer ID, Customer Gender, Country, or Product Category fields have missing data. Save the result in a DataFrame called df_cleaned. Print out the total number of records removed this way.
- Fill in the remaining missing data in the fields of df_cleaned with the mean of the field. Print out the total number of values filled this way.
- Convert the column “Date” of df_cleaned to DateTime datatype (assume that the dates are day first). Then, set the column “Date” as the index and sort these dates in descending order.
- Convert the datatype of the numeric columns in df_cleaned to integer datatype. Note that the numbers should be rounded to the nearest integer after the conversion. Print out the data types of all the columns for confirmation.
- Add columns “Year” and “Quarter” to df_cleaned, where the column “Year” contains the year of the date in the index, and the column “Quarter” contains the quarter of the year of the date in the index. Then, display the first 5 rows of the dataset.
- Using df_cleaned, create a DataFrame called df_customers that keeps 6 sums (Order Quantity, Unit Cost, Unit Price, Cost, Revenue, and Profit) for each customer. Note that each customer is identified by his or her unique Customer ID. Then, sort the dataset by Revenue in descending order and display the first 5 rows of the dataset.
- Using df_cleaned, create a dictionary called df_countries that keeps the unique values in the column “Country” as its keys and keeps the dataset for each country as its values. For example, df_countries["United States"] should reference the DataFrame containing the data for only the United States. The column “Country” should be dropped from this DataFrame. You should test this and display the resulting DataFrame. Extra marks will be given for automation.
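The tasks above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a model answer: the small DataFrame is made-up data standing in for “Sports Sales.csv”, and it carries only two of the numeric columns named in the question (Order Quantity and Revenue). With the real file, the hand-built DataFrame would be replaced by a `pd.read_csv` call.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_csv("Sports Sales.csv"); the real file
# has more columns (Unit Cost, Unit Price, Cost, Profit, ...) and more rows.
df = pd.DataFrame({
    "Date": ["03/01/2016", "15/02/2016", None, "20/07/2015", "05/11/2015"],
    "Customer ID": [101, 102, 103, 101, None],
    "Customer Gender": ["M", "F", "F", "M", "F"],
    "Country": ["United States", "France", "France", "United States", "Canada"],
    "Product Category": ["Bikes", "Clothing", "Bikes", "Accessories", "Bikes"],
    "Order Quantity": [2, np.nan, 1, 4, 3],
    "Revenue": [100.4, 59.5, 80.0, np.nan, 45.0],
})
print(df.head())

# Record count and total number of missing values.
print("Records:", len(df), "| Missing values:", int(df.isna().sum().sum()))

# Drop rows with missing key fields; .copy() gives an independent DataFrame.
key_cols = ["Date", "Customer ID", "Customer Gender", "Country", "Product Category"]
df_cleaned = df.dropna(subset=key_cols).copy()
n_removed = len(df) - len(df_cleaned)
print("Records removed:", n_removed)

# Fill the remaining gaps in the numeric columns with each column's mean.
num_cols = df_cleaned.select_dtypes(include="number").columns
n_filled = int(df_cleaned[num_cols].isna().sum().sum())
df_cleaned = df_cleaned.fillna(df_cleaned[num_cols].mean())
print("Values filled:", n_filled)

# Parse day-first dates, use them as the index, newest first.
df_cleaned["Date"] = pd.to_datetime(df_cleaned["Date"], dayfirst=True)
df_cleaned = df_cleaned.set_index("Date").sort_index(ascending=False)

# Round first, then cast: astype(int) alone would truncate toward zero.
df_cleaned[num_cols] = df_cleaned[num_cols].round()
df_cleaned = df_cleaned.astype({col: "int64" for col in num_cols})
print(df_cleaned.dtypes)

# Year and quarter derived from the DatetimeIndex.
df_cleaned["Year"] = df_cleaned.index.year
df_cleaned["Quarter"] = df_cleaned.index.quarter
print(df_cleaned.head())

# Per-customer sums, sorted by Revenue (only two sum columns in this toy data).
sum_cols = ["Order Quantity", "Revenue"]
df_customers = (
    df_cleaned.groupby("Customer ID")[sum_cols]
    .sum()
    .sort_values("Revenue", ascending=False)
)
print(df_customers.head())

# One DataFrame per country, built automatically, without the Country column.
df_countries = {
    country: group.drop(columns="Country")
    for country, group in df_cleaned.groupby("Country")
}
print(df_countries["United States"])
```

Building df_countries with a dictionary comprehension over `groupby` is one way to meet the automation remark, since no country name is typed by hand. Rounding before the integer cast matters because a plain cast discards the fractional part instead of rounding it.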
Question 2
The file “Survey.csv” is a dataset that contains the results of a survey on social media users. The questions ask about:
- the background (demographics) of the respondent,
- the types of social media that are consumed by the respondent, and
- the types of issues that the respondent takes interest in on social media.
Each column (except the first) in the dataset corresponds to a question in the survey. The questions are given in row 8 and the category of the questions is in row 7. From row 9 onward, each row in the dataset corresponds to a respondent of the survey. The possible answers to each question in the survey are given in the top rows, that is, from row 1 up to row 6.
In addition, the types of issues that the respondents are asked about are categorized into:
- national issues, and
- local issues.
In particular, the columns “Living Costs” up to “NationalOthers” belong to national issues, and the columns “Land” up to “LocalOther” belong to local issues.
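The row layout described above could be separated along these lines. This is only a sketch: the miniature CSV text is invented to mimic the described layout of “Survey.csv” (the answer codes, the "ID" and "Gender" columns, and the category labels are all assumptions), and the row numbers are taken to be 1-based with no extra header line.

```python
import io

import pandas as pd

# Hypothetical miniature of "Survey.csv": rows 1-6 hold the answer options,
# row 7 the question categories, row 8 the questions, rows 9+ the respondents.
raw_text = """\
,1=Male,1=Yes,1=Yes,1=Yes,1=Yes
,2=Female,2=No,2=No,2=No,2=No
,,,,,
,,,,,
,,,,,
,,,,,
,Background,National,National,Local,Local
ID,Gender,Living Costs,NationalOthers,Land,LocalOther
1,1,1,2,1,2
2,2,2,2,1,1
"""

# Read everything as text with no header so each block can be sliced by position.
raw = pd.read_csv(io.StringIO(raw_text), header=None, dtype=str)

options = raw.iloc[:6]             # rows 1-6: possible answers per question
categories = raw.iloc[6]           # row 7: category of each question
questions = raw.iloc[7].tolist()   # row 8: question names, used as headers

# Rows 9 onward: one row per respondent, labelled with the question names.
responses = raw.iloc[8:].reset_index(drop=True)
responses.columns = questions

# Label-based column slices are inclusive at both ends.
national = responses.loc[:, "Living Costs":"NationalOthers"]
local = responses.loc[:, "Land":"LocalOther"]

print(national)
print(local)
```

If only the responses are needed, passing `skiprows=7` to `pd.read_csv` should make row 8 the header directly. The label slices rely on the column names being unique.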