Test questions for Data Analytics (Eng) / Data Analitika (Ing) - 344 at stemlearn.sun.ac.za.
Assume that the k-medoids clustering algorithm has been applied to a data set described by two descriptive features. The table below shows the instances assigned to cluster 1. Assuming that Manhattan distance is used, what is the ID of the instance that should represent the cluster (its medoid)?
ID  d1  d2
4   3   4
5   6   2
9   6   4
13  7   3
15  7   4
17  7   6
19  8   5
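One way to check the answer is to compute, for every instance in the cluster, the sum of its Manhattan distances to all the others and pick the smallest. The sketch below assumes the usual definition of a medoid (the cluster member with the minimum total distance to the other members); the names `cluster`, `manhattan`, and `total_distance` are illustrative only.

```python
# Instances assigned to cluster 1, copied from the table above: ID -> (d1, d2).
cluster = {
    4: (3, 4),
    5: (6, 2),
    9: (6, 4),
    13: (7, 3),
    15: (7, 4),
    17: (7, 6),
    19: (8, 5),
}


def manhattan(p, q):
    """Manhattan (L1) distance between two points of equal dimension."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Total distance from each candidate to every instance in the cluster.
# The distance of a point to itself is 0, so including it changes nothing.
total_distance = {
    cid: sum(manhattan(point, other) for other in cluster.values())
    for cid, point in cluster.items()
}

# Assumed definition: the medoid is the member with the smallest total distance.
medoid_id = min(total_distance, key=total_distance.get)

print(total_distance)
print("medoid ID:", medoid_id)  # prints "medoid ID: 15"
```

With the data in the table, instance 15 has the smallest total (13), narrowly ahead of instance 9 (14).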
What is a significant advantage of using the X-Means clustering algorithm over the standard k-means algorithm?
Given the instances , the randomly selected points , and the randomly sampled points , calculate the Hopkins statistic. Round your answer to three decimal places.
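The point sets for this question did not survive on the page, so the sketch below uses small made-up data purely to illustrate the calculation. It also assumes one common convention for the Hopkins statistic, H = sum(u) / (sum(u) + sum(w)), where u holds the distances from each randomly generated point to its nearest instance and w holds the distances from each sampled instance to its nearest other instance. Some texts define the ratio the other way round, so check the course notes before relying on it. All names here are hypothetical.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest_distance(point, candidates):
    """Distance from `point` to its nearest neighbour in `candidates`."""
    return min(euclidean(point, c) for c in candidates)


def hopkins(instances, random_points, sampled_indices):
    """Hopkins statistic under the ASSUMED convention H = U / (U + W)."""
    # U: random points measured against the real instances.
    u = sum(nearest_distance(p, instances) for p in random_points)
    # W: sampled instances measured against the *other* real instances.
    w = sum(
        nearest_distance(instances[i], instances[:i] + instances[i + 1:])
        for i in sampled_indices
    )
    return u / (u + w)


# Hypothetical data, NOT the data from the question.
instances = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
random_points = [(0.0, 3.0)]   # nearest instance is (0, 1), distance 2
sampled_indices = [0]          # instance (0, 0); nearest other is (0, 1), distance 1

h = hopkins(instances, random_points, sampled_indices)
print(round(h, 3))  # prints 0.667
```

Under this convention, values near 0.5 suggest data that looks uniformly random, while values approaching 1 suggest a clustering tendency.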
The table below shows a small data set in which each instance is described by three features. The k-means clustering algorithm is to be applied to this data set with k = 2 and using Euclidean distance. The initial centroids for the two clusters C1 and C2 are c1 = (0.4; 0.3; 0.2) and c2 = (0.7; 0.8; 0.7). The last two columns of the table give each instance's Euclidean distance to c1 and to c2.
d1     d2     d3     dist(c1)  dist(c2)
0.392  1.258  0.666  1.065356  0.552977
0.251  1.781  1.495  1.972964  1.340144
0.823  0.042  1.254  1.164650  0.946894
0.917  0.961  0.055  0.851607  0.699310
0.736  1.694  0.686  1.514044  0.894834
1.204  0.605  0.351  0.873065  0.643306
0.778  0.436  0.220  0.402219  0.607437
1.075  1.199  0.141  1.125747  0.782500
0.854  0.654  0.771  0.810847  0.223770
Assign the data points to the appropriate clusters and calculate the new cluster centroids. What is the d1 coordinate of the centroid of the first cluster?
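The assignment and update steps can be reproduced with a few lines of plain Python. This is a minimal sketch of a single k-means iteration on the table's data, not a full implementation; the names `points`, `members`, and `new_centroids` are illustrative.

```python
import math

# Feature values (d1, d2, d3) copied from the table above.
points = [
    (0.392, 1.258, 0.666),
    (0.251, 1.781, 1.495),
    (0.823, 0.042, 1.254),
    (0.917, 0.961, 0.055),
    (0.736, 1.694, 0.686),
    (1.204, 0.605, 0.351),
    (0.778, 0.436, 0.220),
    (1.075, 1.199, 0.141),
    (0.854, 0.654, 0.771),
]

# Initial centroids given in the question.
centroids = {"C1": (0.4, 0.3, 0.2), "C2": (0.7, 0.8, 0.7)}


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Assignment step: each point joins the cluster of its nearest centroid.
members = {name: [] for name in centroids}
for p in points:
    nearest = min(centroids, key=lambda name: euclidean(p, centroids[name]))
    members[nearest].append(p)

# Update step: each new centroid is the per-feature mean of its members.
# Empty clusters are skipped here to avoid dividing by zero.
new_centroids = {
    name: tuple(sum(column) / len(pts) for column in zip(*pts))
    for name, pts in members.items()
    if pts
}

for name, centroid in new_centroids.items():
    print(name, len(members[name]), [round(v, 3) for v in centroid])
```

Only the instance (0.778, 0.436, 0.220) is closer to c1 than to c2, so the first cluster contains a single point and its new centroid equals that point, giving a d1 coordinate of 0.778.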
We want to use clustering to transmit a compressed input signal over a network. Our network consists of an encoder and a decoder. The encoder uses pre-computed cluster centroids found with k-means clustering and Euclidean distance to find a representative example of the input signal. The ID of the representative example is then transmitted over the network. Given the cluster centroids:
ID  d1   d2   d3
1   0.5  0.3  0.4
2   0.7  0.8  0.3
3   0.6  0.6  0.5
4   0.2  0.1  0.15
What ID will be transmitted if the input signal is (0.4, 0.4, 0.4)?
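The encoder described above is a nearest-centroid lookup. The sketch below shows one possible shape for it, using the centroids from the table; `codebook`, `encode`, and `decode` are hypothetical names, not part of the original question.

```python
import math

# Pre-computed cluster centroids from the table above: ID -> (d1, d2, d3).
codebook = {
    1: (0.5, 0.3, 0.4),
    2: (0.7, 0.8, 0.3),
    3: (0.6, 0.6, 0.5),
    4: (0.2, 0.1, 0.15),
}


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def encode(signal, codebook):
    """Return the ID of the centroid nearest to the input signal."""
    return min(codebook, key=lambda cid: euclidean(signal, codebook[cid]))


def decode(centroid_id, codebook):
    """Return the representative example for a transmitted ID."""
    return codebook[centroid_id]


transmitted_id = encode((0.4, 0.4, 0.4), codebook)
print(transmitted_id)                    # prints 1
print(decode(transmitted_id, codebook))  # prints (0.5, 0.3, 0.4)
```

The squared distances from (0.4, 0.4, 0.4) to centroids 1 to 4 are 0.02, 0.26, 0.09, and 0.1925, so centroid 1 is the nearest.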
Given the instances and the randomly selected points , calculate the distance from each point in to its nearest neighbour in . Then, compute the average value.
Consider two instances a and b each described by the categorical feature (Feature A) and the categorical feature (Feature B). Given that the values for a are ("Yes", "<20") and the values for b are ("Yes", "<20"), what is the Gower distance between a and b?
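For purely categorical features, the Gower distance is commonly computed as the fraction of features on which the two instances disagree. The sketch below assumes equal feature weights and a 0/1 mismatch score per feature; `gower_categorical` is a hypothetical helper name.

```python
def gower_categorical(a, b):
    """Gower distance for instances with only categorical features.

    Assumes equal weights: each feature contributes 0 if the two values
    match and 1 if they differ, and the result is the mean contribution.
    """
    if len(a) != len(b):
        raise ValueError("instances must have the same number of features")
    mismatches = [0 if x == y else 1 for x, y in zip(a, b)]
    return sum(mismatches) / len(mismatches)


a = ("Yes", "<20")
b = ("Yes", "<20")
distance = gower_categorical(a, b)
print(distance)  # prints 0.0
```

Because a and b agree on both features, every per-feature term is 0 and the distance is 0. If they differed on one of the two features, the same function would return 0.5.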