3.1 You are a data analyst for an e-commerce retailer. Management wants to segment customers so that marketing can target each group with tailored campaigns. You will use K-means clustering on basic purchasing-behaviour data. You have the Mall_cutomers.csv dataset on your computer. The dataset has the following columns:
Column | Description
CustomerID | Unique ID (integer)
Gender | Male / Female
Age | Age of customer in years
Annual_Income($) | Reported annual income (thousands)
Spending_Score | 1–100 score assigned by the mall based on purchasing behaviour
Using the following imported libraries:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import LabelEncoder
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
(a) Write a Python code snippet to load Mall_cutomers.csv from your computer into a pandas DataFrame and display the first five rows and the shape of the DataFrame.
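One possible answer to part (a) is sketched below. It assumes the CSV file sits in the current working directory; the helper name load_customers is illustrative and does not come from the question.

```python
from pathlib import Path

import pandas as pd


def load_customers(csv_path):
    """Read the CSV into a DataFrame, then print its first five rows and its shape."""
    df = pd.read_csv(csv_path)
    print(df.head())   # head() returns the first five rows by default
    print(df.shape)    # tuple: (number of rows, number of columns)
    return df


# Assumption: Mall_cutomers.csv is in the current working directory.
# The existence check only keeps this sketch from failing when the file is elsewhere.
csv_path = Path("Mall_cutomers.csv")
if csv_path.exists():
    df = load_customers(csv_path)
```

If the file lives somewhere else, passing the full path to load_customers should work the same way.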
(b) Write Python code for Data pre-processing:
· Drop the CustomerID column.
· One-hot encode the Gender column.
· Check for missing values.
· Standardise all numeric features (Age, Annual_Income($), Spending_Score) with StandardScaler.
· Show the final dataframe shape.
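The pre-processing steps in part (b) could be sketched as follows. The function name preprocess and the six sample rows are made up for illustration; only the column names come from the question. Using drop_first=True is a design choice rather than a requirement: it keeps one 0/1 column for Gender instead of two redundant ones.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

NUMERIC_COLS = ["Age", "Annual_Income($)", "Spending_Score"]


def preprocess(df):
    """Run the part (b) steps in order and return the processed DataFrame."""
    # 1. Drop the identifier, which carries no behavioural information
    out = df.drop(columns=["CustomerID"])

    # 2. One-hot encode Gender; drop_first=True (a choice) leaves only Gender_Male
    out = pd.get_dummies(out, columns=["Gender"], drop_first=True, dtype=int)

    # 3. Count missing values per column
    print(out.isnull().sum())

    # 4. Standardise the numeric columns to zero mean and unit variance
    out[NUMERIC_COLS] = StandardScaler().fit_transform(out[NUMERIC_COLS])

    # 5. Show the final shape
    print(out.shape)
    return out


# Made-up rows that reuse the question's column names, only to exercise the function
sample = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4, 5, 6],
    "Gender": ["Male", "Female", "Female", "Male", "Female", "Male"],
    "Age": [19, 21, 20, 23, 31, 22],
    "Annual_Income($)": [15, 15, 16, 16, 17, 17],
    "Spending_Score": [39, 81, 6, 77, 40, 76],
})
processed = preprocess(sample)
```

On the real dataset the same function would be called on the DataFrame loaded in part (a). If any missing values turn up in step 3, they would need to be dropped or imputed before scaling.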
(c) Write Python code to choose k (the number of clusters).
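A common way to approach part (c) is the elbow method: fit K-means for a range of k values and look for the point where the inertia (within-cluster sum of squares) stops dropping sharply. The sketch below assumes that approach, since the question does not name one. The synthetic array X is a stand-in for the scaled customer features, and inertia_by_k is a hypothetical helper name.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans


def inertia_by_k(X, k_values):
    """Fit K-means once per k and return the inertia of each fitted model."""
    inertias = []
    for k in k_values:
        model = KMeans(n_clusters=k, random_state=42, n_init=10)
        model.fit(X)
        inertias.append(model.inertia_)
    return inertias


# Stand-in data (an assumption): five synthetic blobs in three dimensions,
# used here in place of the scaled customer features.
rng = np.random.default_rng(0)
centres = np.array([[-2, -2, 0], [-2, 2, 0], [0, 0, 0], [2, -2, 0], [2, 2, 0]])
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(40, 3)) for c in centres])

k_values = list(range(1, 11))
inertias = inertia_by_k(X, k_values)

# Plot inertia against k; the "elbow" suggests a reasonable number of clusters
fig, ax = plt.subplots()
ax.plot(k_values, inertias, marker="o")
ax.set_xlabel("Number of clusters k")
ax.set_ylabel("Inertia")
ax.set_title("Elbow method")
# Call plt.show() to display the figure when running this as a script
```

The elbow is read off the plot by eye, so it is a heuristic rather than a definitive answer. A silhouette score could be used as a cross-check.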
(d) Write Python code to train the K-means clustering algorithm, KMeans(n_clusters=5, random_state=42), on the scaled features.
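Part (d) could be answered along these lines. The model is constructed exactly as the question specifies; the array X_scaled is synthetic stand-in data (an assumption) that takes the place of the scaled features from part (b).

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in data (an assumption): five synthetic blobs in three dimensions,
# used here in place of the scaled customer features.
rng = np.random.default_rng(0)
centres = np.array([[-2, -2, 0], [-2, 2, 0], [0, 0, 0], [2, -2, 0], [2, 2, 0]])
X_scaled = np.vstack([rng.normal(loc=c, scale=0.3, size=(40, 3)) for c in centres])

# Model as given in the question
kmeans = KMeans(n_clusters=5, random_state=42)

# fit_predict trains the model and returns one cluster label per row
labels = kmeans.fit_predict(X_scaled)

print(labels[:10])               # cluster index of the first ten rows
print(kmeans.cluster_centers_)   # one centroid per cluster, in feature space
print(kmeans.inertia_)           # within-cluster sum of squared distances
```

On the real data, the labels can be attached to the DataFrame as a new column so that each customer carries its segment.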
(e) Write Python code to visualise the clusters. (25 marks)
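For part (e), one reasonable visualisation is a scatter plot of two features coloured by cluster, with the centroids overlaid. The sketch below assumes the feature order Age, Annual_Income($), Spending_Score and plots income against spending score. The helper plot_clusters and the synthetic data are illustrative assumptions, not part of the question.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans


def plot_clusters(X, labels, centres, x_col, y_col, x_label, y_label):
    """Scatter two feature columns coloured by cluster, with centroids overlaid."""
    fig, ax = plt.subplots(figsize=(7, 5))
    ax.scatter(X[:, x_col], X[:, y_col], c=labels, cmap="viridis", s=30)
    ax.scatter(centres[:, x_col], centres[:, y_col],
               c="red", marker="X", s=200, label="Centroids")
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    ax.set_title("Customer segments (K-means, k=5)")
    ax.legend()
    return ax


# Stand-in data (an assumption): columns play the role of scaled
# Age, Annual_Income($) and Spending_Score, in that order.
rng = np.random.default_rng(0)
blob_centres = np.array([[0, -1.5, -1.5], [0, -1.5, 1.5], [0, 0, 0],
                         [0, 1.5, -1.5], [0, 1.5, 1.5]])
X_scaled = np.vstack([rng.normal(loc=c, scale=0.3, size=(40, 3)) for c in blob_centres])

kmeans = KMeans(n_clusters=5, random_state=42)
labels = kmeans.fit_predict(X_scaled)

ax = plot_clusters(X_scaled, labels, kmeans.cluster_centers_,
                   x_col=1, y_col=2,
                   x_label="Annual income (standardised)",
                   y_label="Spending score (standardised)")
# Call plt.show() to display the figure when running this as a script
```

A two-dimensional plot shows only two of the features at a time, so clusters that differ mainly in age may overlap in this view.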