
MACHINE LEARNING 700(2025S1MAL700D)


3.2 You are provided with access to the Iris dataset (please click on the link), which contains the classic 150-row iris data. The dataset has the following features:

· Sepal_length: sepal length in cm

· Sepal_width: sepal width in cm

· Petal_length: petal length in cm

· Petal_width: petal width in cm

· Species: species (Setosa, Versicolor, Virginica)

(a) Data preparation

Write Python code to:

· Load Iris_dataset.csv into a pandas DataFrame and print the first eight rows.

· Apply label encoding or one-hot encoding to the Species column, and split the dataset into 80% train and 20% test sets.

(b) Write Python code to train the following algorithms on the 80% training portion: Logistic Regression, Support Vector Machine, Decision Tree and Random Forest.

(c) Write Python code to make predictions with each of the following algorithms: Logistic Regression, Support Vector Machine, Decision Tree and Random Forest. (25 marks)
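A minimal sketch of parts (a) to (c), assuming Iris_dataset.csv uses the column names listed above. So that it runs without the file, the sketch rebuilds the same 150 rows from scikit-learn's bundled copy (load_iris); with the exam file you would use pd.read_csv("Iris_dataset.csv") instead. The hyperparameters (max_iter=1000, n_estimators=100, random_state=42) and the stratified split are illustrative choices, not values given in the question.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

FEATURES = ["Sepal_length", "Sepal_width", "Petal_length", "Petal_width"]

# (a) Load the data. With the exam file this is simply:
#     df = pd.read_csv("Iris_dataset.csv")
# Stand-in: rebuild the same 150 rows from scikit-learn's bundled copy.
iris = load_iris()
df = pd.DataFrame(iris.data, columns=FEATURES)
df["Species"] = iris.target_names[iris.target]
print(df.head(8))

# (a) Label-encode Species (alphabetical: setosa=0, versicolor=1, virginica=2), split 80/20
le = LabelEncoder()
df["Species_encoded"] = le.fit_transform(df["Species"])
X = df[FEATURES]
y = df["Species_encoded"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# (b) Train the four classifiers on the 80% training portion
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Support Vector Machine": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}
predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # (c) Predict on the held-out 20% and report accuracy
    predictions[name] = model.predict(X_test)
    print(f"{name}: accuracy = {accuracy_score(y_test, predictions[name]):.3f}")
```

For one-hot encoding instead, pd.get_dummies(df["Species"]) produces one indicator column per species; scikit-learn classifiers accept the single label-encoded column directly as the target, which is why label encoding is the simpler choice here.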

             


3.1 You are a data analyst for an e-commerce retailer. Management wants to segment customers so that marketing can target each group with tailored campaigns. You will use K-means clustering on basic purchasing-behaviour data. You have the Mall_cutomers.csv dataset on your computer. The dataset has the following columns:

· CustomerID: unique ID (integer)

· Gender: Male / Female

· Age: age of customer in years

· Annual_Income($): reported annual income (in thousands)

· Spending_Score: 1–100 score assigned by the mall based on purchasing behaviour

Using the following imported libraries:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import LabelEncoder
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

(a) Write a Python code snippet to load Mall_cutomers.csv from your computer into a pandas DataFrame and display the first five rows and the shape of the DataFrame.

(b) Write Python code for data pre-processing:

· Drop the CustomerID column.

· One-hot encode the Gender column.

· Check for missing values.

· Standardise all numeric features (Age, Annual_Income($), Spending_Score) with StandardScaler.

· Show the final DataFrame shape.
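A minimal sketch of parts (a) and (b). The real file is not available here, so the sketch builds a hypothetical stand-in with the same column names from seeded random numbers; with the real data you would keep only the pd.read_csv line. Using pd.get_dummies with drop_first=True is one reasonable way to one-hot encode Gender; keeping both indicator columns is equally valid.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# (a) Load the data. With the real file this is simply:
#     df = pd.read_csv("Mall_cutomers.csv")
# HYPOTHETICAL stand-in with the same columns, built from seeded random numbers.
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "CustomerID": np.arange(1, n + 1),
    "Gender": rng.choice(["Male", "Female"], size=n),
    "Age": rng.integers(18, 70, size=n),
    "Annual_Income($)": rng.integers(15, 140, size=n),
    "Spending_Score": rng.integers(1, 101, size=n),
})
print(df.head())
print(df.shape)

# (b) Pre-processing
df = df.drop(columns=["CustomerID"])                          # an ID carries no behavioural signal
df = pd.get_dummies(df, columns=["Gender"], drop_first=True)  # adds a Gender_Male indicator
print(df.isnull().sum())                                      # missing values per column

numeric_cols = ["Age", "Annual_Income($)", "Spending_Score"]
scaler = StandardScaler()
df[numeric_cols] = scaler.fit_transform(df[numeric_cols])     # mean 0, std 1 per column
print(df.shape)
```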

(c) Write Python code to choose k (the number of clusters).

(d) Write Python code to train the K-means clustering algorithm, KMeans(n_clusters=5, random_state=42), on the scaled features.

(e) Write Python code to visualise the clusters. (25 marks)
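One way to approach parts (c) to (e) is sketched below: the elbow method for choosing k (plot the inertia for k = 1 to 10 and look for the point where the curve stops dropping sharply), then the KMeans(n_clusters=5, random_state=42) model the question specifies, then a scatter plot coloured by cluster. The array X_scaled is a hypothetical stand-in for the scaled Age, Annual_Income($) and Spending_Score columns from part (b); plotting income against spending score is an illustrative choice.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to show plots interactively
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# HYPOTHETICAL stand-in for the scaled Age, Annual_Income($), Spending_Score columns
rng = np.random.default_rng(0)
raw = np.column_stack([
    rng.integers(18, 70, 200),   # Age
    rng.integers(15, 140, 200),  # Annual income (thousands)
    rng.integers(1, 101, 200),   # Spending score
]).astype(float)
X_scaled = StandardScaler().fit_transform(raw)

# (c) Elbow method: inertia (within-cluster sum of squares) for k = 1..10
k_values = list(range(1, 11))
inertias = [KMeans(n_clusters=k, random_state=42).fit(X_scaled).inertia_ for k in k_values]
plt.figure()
plt.plot(k_values, inertias, marker="o")
plt.xlabel("Number of clusters k")
plt.ylabel("Inertia")
plt.title("Elbow method")

# (d) Train the model specified in the question
kmeans = KMeans(n_clusters=5, random_state=42)
labels = kmeans.fit_predict(X_scaled)

# (e) Visualise clusters on income vs spending score (columns 1 and 2)
centres = kmeans.cluster_centers_
plt.figure()
plt.scatter(X_scaled[:, 1], X_scaled[:, 2], c=labels, cmap="viridis", s=20)
plt.scatter(centres[:, 1], centres[:, 2], c="red", marker="X", s=200, label="Centroids")
plt.xlabel("Annual income (scaled)")
plt.ylabel("Spending score (scaled)")
plt.title("Customer segments (k = 5)")
plt.legend()
# plt.show()  # display the figures when running interactively
```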


2.4 You are working on a binary classification problem to predict whether patients have COVID-19 (Positive) or not (Negative) using a logistic regression model. After training your model, you obtain the following confusion matrix on the test data:

                 Predicted Positive  Predicted Negative
Actual Positive  40                  10
Actual Negative  20                  30

Calculate the following performance metrics: accuracy, precision, recall, F1-score and specificity. (10)
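The arithmetic can be checked with a few lines of Python, treating Positive as the positive class, so the matrix gives TP = 40, FN = 10, FP = 20 and TN = 30.

```python
# Counts from the confusion matrix (Positive = has COVID-19)
TP, FN = 40, 10  # Actual Positive row
FP, TN = 20, 30  # Actual Negative row

accuracy = (TP + TN) / (TP + TN + FP + FN)          # 70 / 100
precision = TP / (TP + FP)                          # 40 / 60
recall = TP / (TP + FN)                             # 40 / 50, also called sensitivity
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
specificity = TN / (TN + FP)                        # 30 / 50

for name, value in [("Accuracy", accuracy), ("Precision", precision), ("Recall", recall),
                    ("F1-score", f1), ("Specificity", specificity)]:
    print(f"{name}: {value:.4f}")
# Accuracy: 0.7000
# Precision: 0.6667
# Recall: 0.8000
# F1-score: 0.7273
# Specificity: 0.6000
```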


2.3 Using examples, differentiate between model overfitting and underfitting. Describe in detail FOUR (4) ways to mitigate each of these problems. (10)
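A small illustration, assuming toy data (a noisy sine curve) and using decision-tree depth as the complexity knob: a depth-1 tree scores relatively poorly on both training and test data (underfitting), while an unrestricted tree fits the training data perfectly but scores lower on the test data (overfitting). Capping the depth, a form of pruning, is just one of the mitigation strategies the question asks about.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# HYPOTHETICAL toy data: a noisy sine curve
rng = np.random.default_rng(42)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

results = {}
for label, depth in [("underfit", 1), ("balanced", 4), ("overfit", None)]:
    model = DecisionTreeRegressor(max_depth=depth, random_state=42).fit(X_train, y_train)
    train_r2 = model.score(X_train, y_train)  # R^2 on data the model has seen
    test_r2 = model.score(X_test, y_test)     # R^2 on unseen data
    results[label] = (train_r2, test_r2)
    print(f"{label:9s} max_depth={depth}: train R2 = {train_r2:.3f}, test R2 = {test_r2:.3f}")
```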

                                                                                                                                       


2.2 Write a Python code snippet to show how to standardize a numeric feature called Age, which is part of a DataFrame df, using StandardScaler from sklearn.preprocessing. (5)
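A minimal sketch; the Age values are hypothetical. StandardScaler expects 2-D input, so the column is selected with double brackets and the (n, 1) result is flattened with .ravel() before being assigned back.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({"Age": [23, 35, 47, 52, 61, 29]})  # hypothetical ages

scaler = StandardScaler()
# Select Age as a 2-D frame, scale it, then flatten back to 1-D for assignment
df["Age"] = scaler.fit_transform(df[["Age"]]).ravel()

print(df)
print("mean:", scaler.mean_[0], "std:", scaler.scale_[0])  # statistics used for scaling
```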


2.1 How can a dataset without a target variable be used with supervised learning algorithms? (5)
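One common approach is to create labels first, for example by clustering the data and treating the cluster IDs as pseudo-labels for a supervised model. The sketch below assumes synthetic unlabelled data with three well-separated groups; K-means and a decision tree are illustrative choices, and labelling by hand or by domain rules are other options.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

# HYPOTHETICAL unlabelled data: 300 points in three groups, no target column
rng = np.random.default_rng(7)
X = np.vstack([rng.normal(loc=centre, scale=0.5, size=(100, 2)) for centre in (0, 4, 8)])

# Step 1: derive pseudo-labels with an unsupervised method
pseudo_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# Step 2: use the pseudo-labels as the target for a supervised classifier
clf = DecisionTreeClassifier(random_state=42).fit(X, pseudo_labels)
print(clf.predict([[0.2, -0.1], [7.9, 8.3]]))
```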


Question 4 (25 marks)

4.1 Write pseudocode (or an algorithm) for the K-means clustering algorithm. (10)
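The steps of K-means (Lloyd's algorithm) can be sketched in NumPy as below. The helper name kmeans is hypothetical, and initialising with k randomly chosen data points is an assumption; scikit-learn's KMeans uses k-means++ initialisation by default.

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """Hypothetical from-scratch K-means; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # 1. Initialise: pick k distinct data points as the starting centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # 2. Assignment step: each point joins its nearest centroid (Euclidean distance)
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # 3. Update step: move each centroid to the mean of its assigned points
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # 4. Stop when the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)
```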

4.2 Explain why accuracy is not always the ideal metric for model evaluation. (5)
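A quick illustration with hypothetical, heavily imbalanced labels (95 negatives, 5 positives): a "model" that always predicts the majority class reaches 95% accuracy yet finds none of the positive cases, which recall exposes immediately.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

y_true = np.array([0] * 95 + [1] * 5)  # hypothetical imbalanced labels
y_pred = np.zeros_like(y_true)         # always predict the majority class (0)

acc = accuracy_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)     # TP / (TP + FN) = 0 / 5
print(f"Accuracy: {acc:.2f}")          # Accuracy: 0.95
print(f"Recall:   {rec:.2f}")          # Recall:   0.00
```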

 

4.3 Use Table 4.3 to answer the following questions:

     Table 4.3

· age: patient age (years).

· blood_pressure: systolic blood pressure (mmHg).

· cholesterol: serum cholesterol level (mg/dL).

· has_disease: binary target indicating whether the patient has the disease (1 = yes, 0 = no).

Complete the following code snippet to train decision tree and logistic regression models and use them to make predictions.

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

# Load the data into a DataFrame
data = {
    "age": [40, 55, 60, 45, 50, 35, 70, 65, 54, 75],
    "blood_pressure": [120, 140, 130, 135, 142, 110, 160, 150, 138, 170],
    "cholesterol": [200, 210, 250, 240, 190, 180, 260, 210, 225, 280],
    "has_disease": [0, 1, 0, 1, 0, 0, 1, 1, 0, 1]
}
df = pd.DataFrame(data)

# Split into features (X) and target (y)
X = df[["age", "blood_pressure", "cholesterol"]]
y = df["has_disease"]

# Dataset split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Code to train the decision tree and make predictions (5)

………………………………………………………………………………………………………………………………………………………………

………………………………………………………………………………………………………………………………………………………………

………………………………………………………………………………………………………………………………………………………………

………………………………………………………………………………………………………………………………………………………………

………………………………………………………………………………………………………………………………………………………………

………………………………………………………………………………………………………………………………………………………………

………………………………………………………………………………………………………………………………………………………………

# Code to train the logistic regression model and make predictions (5)

………………………………………………………………………………………………………………………………………………………………

………………………………………………………………………………………………………………………………………………………………

………………………………………………………………………………………………………………………………………………………………

………………………………………………………………………………………………………………………………………………………………

………………………………………………………………………………………………………………………………………………………………

………………………………………………………………………………………………………………………………………………………………

………………………………………………………………………………………………………………………………………………………………
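One way to fill in the two blanks, reproducing the setup code above so the sketch runs on its own. DecisionTreeClassifier needs an extra import; random_state=42 and max_iter=1000 are illustrative choices. With only 10 rows, the 3-row test set makes the reported metrics very noisy.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

data = {
    "age": [40, 55, 60, 45, 50, 35, 70, 65, 54, 75],
    "blood_pressure": [120, 140, 130, 135, 142, 110, 160, 150, 138, 170],
    "cholesterol": [200, 210, 250, 240, 190, 180, 260, 210, 225, 280],
    "has_disease": [0, 1, 0, 1, 0, 0, 1, 1, 0, 1],
}
df = pd.DataFrame(data)
X = df[["age", "blood_pressure", "cholesterol"]]
y = df["has_disease"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Decision tree: fit on the training split, predict the test split
dt_model = DecisionTreeClassifier(random_state=42)
dt_model.fit(X_train, y_train)
dt_pred = dt_model.predict(X_test)
print(confusion_matrix(y_test, dt_pred))
print(classification_report(y_test, dt_pred, zero_division=0))

# Logistic regression: same pattern; max_iter raised because features are unscaled
lr_model = LogisticRegression(max_iter=1000)
lr_model.fit(X_train, y_train)
lr_pred = lr_model.predict(X_test)
print(confusion_matrix(y_test, lr_pred))
print(classification_report(y_test, lr_pred, zero_division=0))
```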


Question 3 (25 marks)

3.1 Use Table 3.1 to answer the following questions:

Table 3.1

square_footage  num_bedrooms  age  location  price
800             2             10   Urban     350000
1200            3             5    Suburban  245000
1500            4             10   Urban     400000
1800            3             20   Urban     300000
2500            4             7    Rural     200000
900             2             25   Urban     250000
1100            5             15   Suburban  320000
2200            4             17   Rural     280000

square_footage is the area of the house in square feet.

num_bedrooms is the number of bedrooms.

age is the age of the house in years.

location is the location of the house: Urban, Suburban or Rural.

price is the cost of the house in ZAR (rand).

Using the following imported libraries:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import LabelEncoder

Each question has a weighting of 5 marks.

(a) Write a Python code snippet to load the house data into a DataFrame.

(b) Write a Python code snippet to separate the independent features (attributes) from the target feature (price).

(c) Write a Python code snippet to apply label encoding (or one-hot encoding) to the location column.

(d) Write a Python code snippet to split the dataset into 80% train and 20% test.

(e) Write a Python code snippet to train a linear regression model on the training dataset and predict the price of a house with square_footage: 2000, num_bedrooms: 5, age: 5 and location: Urban.
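A minimal sketch of parts (a) to (e), using the Table 3.1 rows and the LabelEncoder from the given imports. LabelEncoder assigns codes alphabetically (Rural = 0, Suburban = 1, Urban = 2). The model is fitted on the 80% training split only; with eight rows, the two-row test set gives a very rough error estimate. random_state=42 is an illustrative choice.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import LabelEncoder

# (a) Load Table 3.1 into a DataFrame
df = pd.DataFrame({
    "square_footage": [800, 1200, 1500, 1800, 2500, 900, 1100, 2200],
    "num_bedrooms": [2, 3, 4, 3, 4, 2, 5, 4],
    "age": [10, 5, 10, 20, 7, 25, 15, 17],
    "location": ["Urban", "Suburban", "Urban", "Urban", "Rural", "Urban", "Suburban", "Rural"],
    "price": [350000, 245000, 400000, 300000, 200000, 250000, 320000, 280000],
})

# (b) Separate the independent features from the target
X = df.drop(columns=["price"])
y = df["price"]

# (c) Label-encode location (alphabetical: Rural=0, Suburban=1, Urban=2)
le = LabelEncoder()
X["location"] = le.fit_transform(X["location"])

# (d) 80/20 train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# (e) Train on the training split, check the test error, then predict the new house
model = LinearRegression()
model.fit(X_train, y_train)
print("Test MSE:", mean_squared_error(y_test, model.predict(X_test)))

new_house = pd.DataFrame({
    "square_footage": [2000],
    "num_bedrooms": [5],
    "age": [5],
    "location": le.transform(["Urban"]),  # same encoding as the training data
})
predicted_price = model.predict(new_house)[0]
print(f"Predicted price: R{predicted_price:,.0f}")
```

Label encoding implies an order (Rural < Suburban < Urban) that a linear model will treat as numeric; pd.get_dummies(X, columns=["location"]) avoids this, at the cost of building the new house with the same indicator columns.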


1.4 Using an example, describe the bias-variance trade-off in machine learning. (5)


1.3 List FIVE (5) Python machine learning libraries. (5)
