Test questions for Data Analytics (Eng) / Data Analitika (Ing) - 344 at stemlearn.sun.ac.za.
What is the second smallest distance from an instance in SBLTestingData to an instance in SBLTrainingData? Write your answer correct to 2 decimal places.
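One way to approach this is to compute the distance for every (testing, training) pair and sort the results. The sketch below assumes Euclidean distance over numeric descriptive features; the rows are made-up placeholders, not values from SBLTrainingData or SBLTestingData.

```python
import math

# Hypothetical toy rows standing in for the numeric descriptive features.
train = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
test = [(0.0, 1.0), (3.0, 0.0)]

# Euclidean distance for every (test, train) pair, sorted ascending.
distances = sorted(math.dist(q, x) for q in test for x in train)

# Position 1 holds the second smallest pairwise distance.
second_smallest = round(distances[1], 2)
print(second_smallest)
```

If two pairs share the smallest distance, this version treats them as separate entries; whether ties should count once or twice is an interpretation the question does not settle.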
What is the mean value of the chlorides descriptive feature in SBLTrainingData? Write your answer correct to 2 decimal places.
What is the median value of the free.sulfur.dioxide descriptive feature in SBLTrainingData? Write your answer correct to 2 decimal places.
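Both summary statistics can be computed with Python's standard `statistics` module. The lists below are hypothetical stand-ins for the `chlorides` and `free.sulfur.dioxide` columns; the real values would come from SBLTrainingData.

```python
import statistics

# Hypothetical values standing in for two columns of SBLTrainingData.
chlorides = [0.076, 0.098, 0.092, 0.075, 0.069]
free_sulfur_dioxide = [11.0, 25.0, 15.0, 17.0, 13.0]

# Mean and median, each rounded to 2 decimal places as the questions ask.
mean_chlorides = round(statistics.mean(chlorides), 2)
median_fsd = round(statistics.median(free_sulfur_dioxide), 2)

print(mean_chlorides, median_fsd)
```

If the data is held in a pandas DataFrame instead, the `mean()` and `median()` methods on the relevant column give the same statistics.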
Use SBLTestingData to calculate the accuracy of a KD-tree-based KNN classifier (K=1) after training it on SBLTrainingData. What is the accuracy as a decimal number between 0 and 1, correct to 2 decimal places?
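A minimal sketch of the accuracy calculation follows, assuming the question refers to nearest-neighbour classification backed by a KD-tree index. A KD-tree only speeds up the neighbour search, so this sketch swaps in a plain brute-force search, which should return the same neighbours apart from tie-breaking. The data and the `predict_1nn` helper are hypothetical.

```python
import math

# Hypothetical toy data: (features, target) pairs.
train = [((1.0, 1.0), 0), ((1.5, 2.0), 0), ((5.0, 5.0), 1), ((6.0, 5.5), 1)]
test = [((1.2, 1.1), 0), ((5.5, 5.0), 1), ((3.0, 3.0), 1), ((2.0, 1.0), 1)]


def predict_1nn(query, training):
    """Return the target of the training instance closest to query."""
    _, label = min(training, key=lambda row: math.dist(query, row[0]))
    return label


# Accuracy = correctly classified test instances / all test instances.
correct = sum(predict_1nn(x, train) == y for x, y in test)
accuracy = round(correct / len(test), 2)
print(accuracy)
```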
What is the index of the SBLTrainingData instance that is most similar to the first instance in SBLTestingData?
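The sketch below reads "most similar" as "smallest Euclidean distance", which is an assumption; the rows are hypothetical. It also shows why the indexing convention matters: Python positions start at 0, whereas a 1-based environment would report the same row one higher.

```python
import math

# Hypothetical training rows and a hypothetical first testing instance.
train = [(7.4, 0.70), (7.8, 0.88), (11.2, 0.28), (7.3, 0.65)]
first_test = (7.3, 0.66)

# Distance from the query to each training row, in row order.
distances = [math.dist(first_test, x) for x in train]

# Position of the smallest distance (0-based).
nearest_pos = min(range(len(train)), key=distances.__getitem__)

print(nearest_pos)      # 0-based position
print(nearest_pos + 1)  # the same row under 1-based indexing
```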
It is suspected that the target values of the first 2 and last 2 instances in SBLTestingData were toggled by error. That is, 0s were written instead of 1s (and vice versa) for those instances. Rectify this error in your dataframe (not in the CSV file), then calculate the F1 score of a KNN (K=3) classifier trained on SBLTrainingData and tested on the corrected dataframe. What is this F1 score, correct to 3 decimal places? Assume a target value of 1 represents True and a target value of 0 represents False.
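The correction and the metric can be sketched as follows. The recorded targets and the predictions are hypothetical placeholders (in practice the predictions would come from the K=3 model), and `f1_score` here is a small hand-written helper rather than a library function.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (1), computed from confusion counts."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical recorded targets and hypothetical model predictions.
y_recorded = [0, 1, 1, 0, 1, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]

# Flip the first 2 and last 2 targets on a copy, leaving the source untouched.
corrected = list(y_recorded)
for i in (0, 1, -2, -1):
    corrected[i] = 1 - corrected[i]

f1 = round(f1_score(corrected, y_pred), 3)
print(corrected, f1)
```

Working on a copy mirrors the instruction to fix the dataframe rather than the CSV file.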
It is suspected that the target values of the 9th up to the 14th instances in SBLTestingData were toggled by error. That is, 0s were written instead of 1s (and vice versa) for those instances. Rectify this error in your dataframe (not in the CSV file), then calculate the precision of a KNN (K=3) classifier trained on SBLTrainingData and tested on the corrected dataframe. What is this precision, correct to 3 decimal places? Assume a target value of 1 represents True and a target value of 0 represents False.
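A sketch of the range correction and the precision calculation is given below. It assumes "9th up to the 14th" is inclusive at both ends, which in 0-based Python corresponds to the slice `[8:14]`. The targets and predictions are hypothetical, and `precision` is a hand-written helper.

```python
def precision(y_true, y_pred):
    """Precision for the positive class (1): TP / (TP + FP)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical recorded targets and hypothetical model predictions.
y_recorded = [1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0]

# Flip instances 9..14 (1-based, inclusive) on a copy of the targets.
corrected = list(y_recorded)
corrected[8:14] = [1 - t for t in corrected[8:14]]

prec = round(precision(corrected, y_pred), 3)
print(prec)
```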
A classification tree is induced on four input features A, B, C, and D. Features A and D are both continuous, and thresholding (as opposed to binning) is used to determine the split criteria for both features. Feature A has values ranging from 0 to 10 and feature D has values ranging from 1 to 100. Features B and C are categorical, with feature B having 5 outcomes and feature C having 3 outcomes. We have the following entropy-based information gains for the four features:
gain(A) = 0.7
gain(B) = 0.75
gain(C) = 0.7
gain(D) = 0.75
On which of the features should the dataset be split?
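For reference, the sketch below shows how an entropy-based information gain is typically computed and what a plain "pick the largest gain" rule returns for the values listed above. The `entropy` and `information_gain` helpers and the four-label example are illustrative assumptions. Note that the listed gains tie between B and D, and the question itself does not state a tie-breaking rule; information gain is known to favour features with many outcomes, which is the usual motivation for alternatives such as gain ratio.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, partitions):
    """Entropy before the split minus the weighted entropy after it."""
    n = len(labels)
    remainder = sum(len(p) / n * entropy(p) for p in partitions)
    return entropy(labels) - remainder


# Illustrative check: a split that separates the classes perfectly.
example_gain = information_gain([1, 1, 0, 0], [[1, 1], [0, 0]])

# Gains as stated in the question; collect every feature with the top gain.
gains = {"A": 0.7, "B": 0.75, "C": 0.7, "D": 0.75}
best = max(gains.values())
candidates = sorted(f for f, g in gains.items() if g == best)
print(example_gain, candidates)
```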