3.1 You are a data analyst for an e-commerce retailer. Management wants to segm...

3.1 You are a data analyst for an e-commerce retailer. Management wants to segment customers so that marketing can target each group with tailored campaigns. You will use K-means clustering on basic purchasing-behaviour data. You have the Mall_cutomers.csv dataset on your computer. The dataset has the following columns:

| Column | Description |
| --- | --- |
| CustomerID | Unique ID (integer) |
| Gender | Male / Female |
| Age | Age of customer in years |
| Annual_Income($) | Reported annual income (thousands) |
| Spending_Score | 1–100 score assigned by the mall based on purchasing behaviour |

Using the following imported libraries:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import LabelEncoder
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

(a) Write a Python code snippet to load Mall_cutomers.csv from your computer into a pandas DataFrame and display the first five rows and the shape of the DataFrame.
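One possible approach to part (a) is sketched here. The function name `load_customers` and the default path are assumptions for illustration: the path assumes the file sits in the current working directory, so adjust it to wherever the file is stored on your machine.

```python
import pandas as pd


def load_customers(path="Mall_cutomers.csv"):
    """Read the customer CSV into a DataFrame and print a quick preview.

    `path` is an assumed location (current working directory).
    """
    df = pd.read_csv(path)
    print(df.head())   # first five rows (head() defaults to n=5)
    print(df.shape)    # tuple: (number of rows, number of columns)
    return df
```

Calling `df = load_customers()` would then print the preview and return the DataFrame for the later steps.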

(b) Write Python code for data pre-processing:

· Drop the CustomerID column.
· One-hot encode the Gender column.
· Check for missing values.
· Standardise all numeric features (Age, Annual_Income($), Spending_Score) with StandardScaler.
· Show the final dataframe shape.
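The pre-processing steps listed in part (b) could be sketched as follows. The function name `preprocess` and the constant `NUMERIC_COLS` are hypothetical, and `pd.get_dummies` is one of several ways to one-hot encode (this sketch keeps both resulting Gender columns rather than dropping one).

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Column names taken from the question's column table.
NUMERIC_COLS = ["Age", "Annual_Income($)", "Spending_Score"]


def preprocess(df):
    """Apply the part (b) steps and return the processed DataFrame."""
    out = df.drop(columns=["CustomerID"])                      # drop the identifier
    out = pd.get_dummies(out, columns=["Gender"], dtype=int)   # one-hot encode Gender
    print(out.isnull().sum())                                  # missing values per column
    scaler = StandardScaler()
    # Rescale each numeric column to mean 0 and unit variance.
    out[NUMERIC_COLS] = scaler.fit_transform(out[NUMERIC_COLS])
    print(out.shape)                                           # final shape
    return out
```

With the five columns described in the question, the result should have five columns as well: the three scaled numeric features plus `Gender_Female` and `Gender_Male`.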

(c) Write Python code to choose k (the number of clusters).
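The question does not name a method for choosing k. A common choice is the elbow method: fit K-means for a range of k values, record the inertia (within-cluster sum of squared distances), and look for the point where the curve stops dropping sharply. The sketch below assumes that method. The helper names, the range 1 to 10, and `n_init=10` are illustrative assumptions, and `X` stands for the scaled feature matrix.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans


def elbow_inertias(X, k_values=range(1, 11), random_state=42):
    """Fit K-means once per candidate k and collect the inertia values."""
    inertias = []
    for k in k_values:
        # n_init=10 is an assumption; the question does not specify it.
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        model.fit(X)
        inertias.append(model.inertia_)
    return inertias


def plot_elbow(k_values, inertias):
    """Plot inertia against k; the 'elbow' suggests a reasonable k."""
    fig, ax = plt.subplots()
    ax.plot(list(k_values), inertias, marker="o")
    ax.set_xlabel("Number of clusters (k)")
    ax.set_ylabel("Inertia")
    ax.set_title("Elbow method")
    return ax
```

After calling `plot_elbow(range(1, 11), elbow_inertias(X))` followed by `plt.show()`, the k at the bend of the curve is a candidate for the number of clusters. Reading the elbow is a judgment call, so a second criterion such as the silhouette score can be used as a cross-check.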

(d) Write Python code to train the K-means clustering algorithm, KMeans(n_clusters=5, random_state=42), on the scaled features.
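A minimal sketch of part (d), using exactly the constructor arguments the question specifies. The wrapper `fit_kmeans` is a hypothetical helper, and `X` stands for the scaled feature matrix.

```python
from sklearn.cluster import KMeans


def fit_kmeans(X, n_clusters=5, random_state=42):
    """Train K-means on X and return the fitted model and cluster labels."""
    model = KMeans(n_clusters=n_clusters, random_state=random_state)
    labels = model.fit_predict(X)   # fit the model, then assign each row to a cluster
    return model, labels
```

The returned `labels` array holds one cluster index (0 to 4) per customer, and `model.cluster_centers_` holds the centroid coordinates in the scaled feature space. Some scikit-learn versions may emit a warning about the default value of `n_init` when it is not passed explicitly.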

(e) Visualise the clusters using Python code. (25 marks)
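For part (e), one option is a two-dimensional scatter plot coloured by cluster label, with the centroids overlaid. The sketch below is an illustration under assumptions: `plot_clusters` is a hypothetical helper, and which two feature columns to plot (`x_idx`, `y_idx`) is a choice the question leaves open.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_clusters(X, labels, centers=None, x_idx=0, y_idx=1,
                  axis_names=("Feature 1", "Feature 2")):
    """Scatter two feature columns of X, coloured by cluster label."""
    X = np.asarray(X)
    fig, ax = plt.subplots(figsize=(7, 5))
    ax.scatter(X[:, x_idx], X[:, y_idx], c=labels, cmap="viridis", s=40)
    if centers is not None:
        centers = np.asarray(centers)
        # Mark each centroid with a large red cross.
        ax.scatter(centers[:, x_idx], centers[:, y_idx],
                   c="red", marker="X", s=200, label="Centroids")
        ax.legend()
    ax.set_xlabel(axis_names[0])
    ax.set_ylabel(axis_names[1])
    ax.set_title("Customer segments (K-means)")
    return ax
```

For this dataset, a natural pairing is scaled annual income against scaled spending score. Pass the matching column positions as `x_idx` and `y_idx`, the fitted model's `cluster_centers_` as `centers`, and call `plt.show()` afterwards to display the figure.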
