Data Mining and Decision Support - Lecture, Section 1 - Fall 2025 (moodle.nu.edu.kz): collected exam and quiz questions with verified answers and explanations.
During your MLP cross-validation process, you observed a clear overfitting pattern: a large gap between the training and test errors. Which of the following would be the right direction(s) to resolve it?
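For reference, here is a minimal sketch (assuming scikit-learn and a synthetic dataset) of two common directions for closing such a train/test gap in an MLP: a stronger L2 penalty via `alpha` and early stopping on an internal validation split.

```python
# A minimal sketch, assuming scikit-learn and a synthetic dataset: two common
# ways to reduce an MLP's train/test gap are a stronger L2 penalty (alpha)
# and early stopping on an internal validation split.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

mlp = MLPClassifier(
    hidden_layer_sizes=(50,),  # a smaller network also lowers model capacity
    alpha=1e-2,                # L2 regularization; larger values penalize big weights more
    early_stopping=True,       # hold out part of the training data and stop when it stops improving
    max_iter=500,
    random_state=0,
)
mlp.fit(X_train, y_train)
print("train accuracy:", mlp.score(X_train, y_train))
print("test accuracy: ", mlp.score(X_test, y_test))
```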
Compared to SVC in sklearn, SVR has one additional important parameter that defines a margin of tolerance: no penalty is added to the training loss for points predicted within this distance of the actual value. What is this parameter?
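The parameter being described is `epsilon` in sklearn's `SVR`. A brief sketch (synthetic regression data assumed) of how it controls the penalty-free tube:

```python
# A minimal sketch, assuming scikit-learn: SVR's `epsilon` sets the width of the
# tube around the prediction within which errors add no penalty to the training loss.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# A larger epsilon tolerates bigger deviations without penalty, which usually
# leaves fewer support vectors (points outside the tube).
for eps in (0.01, 0.1, 0.5):
    model = SVR(kernel="rbf", C=1.0, epsilon=eps).fit(X, y)
    print(f"epsilon={eps}: support vectors = {len(model.support_)}")
```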
How do we prevent overfitting?
Choose all the correct evaluation metrics for classification.
(Note: TP = True Positive, FN = False Negative, FP = False Positive, TN = True Negative.)
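For reference, a small worked sketch (with hypothetical confusion-matrix counts) of the standard classification metrics expressed in terms of TP, FP, FN and TN, cross-checked against scikit-learn:

```python
# A small worked sketch with hypothetical confusion-matrix counts: the usual
# classification metrics written in terms of TP, FP, FN and TN.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# hypothetical counts
TP, FP, FN, TN = 40, 10, 5, 45

accuracy  = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall    = TP / (TP + FN)          # a.k.a. sensitivity / true positive rate
f1        = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, f1)

# Equivalent check with scikit-learn on labels that reproduce those counts
y_true = [1] * (TP + FN) + [0] * (TN + FP)
y_pred = [1] * TP + [0] * FN + [0] * TN + [1] * FP
print(accuracy_score(y_true, y_pred),
      precision_score(y_true, y_pred),
      recall_score(y_true, y_pred),
      f1_score(y_true, y_pred))
```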
For the binary classification examples below, which error is more important: 'a false negative' or 'a false positive'? (NOTE: Points are deducted for wrong answer(s); 0 points for 'Not Answering'. If you make specific/unique assumptions, please write them down on your blank paper.)
Example 1: What if a jury or judge decides to let a criminal go free? (Assuming 'positive' is the prediction of 'criminal'.)
Example 2: Assume there is an airport 'A' that has received high-security threats and, based on certain characteristics, identifies whether a particular passenger is a threat. Due to a staff shortage, the airport decides to scan only the passengers predicted as risk-positive by its predictive model. What will happen if a true threat is flagged/predicted as non-threat by the airport's model? (Assuming 'positive' is the prediction of 'threat/risk'.)
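As an illustration of why this matters in practice, here is a minimal sketch (assuming scikit-learn and synthetic imbalanced data): when a false negative is the costlier error, as in the airport example, lowering the decision threshold trades precision for recall so that fewer true positives are missed.

```python
# A minimal sketch, assuming scikit-learn and synthetic imbalanced data: when a
# false negative is costlier, a lower decision threshold raises recall at the
# expense of precision.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]

for threshold in (0.5, 0.2):
    pred = (proba >= threshold).astype(int)
    print(f"threshold={threshold}:",
          f"precision={precision_score(y_te, pred):.2f}",
          f"recall={recall_score(y_te, pred):.2f}")
```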
While high bias and low variance could result in over-fitting, low bias and high variance could result in under-fitting.
(NOTE: Points are deducted for wrong answer(s); 0 points for 'Not Answering'.)
Based on unlabeled training data, supervised learning (SL) algorithm(s) can be used to build an optimized model.
(NOTE: Points are deducted for wrong answer(s); 0 points for 'Not Answering'.)
Let's assume that a data scientist splits a dataset into training and test sets using a test size of 0.3, and then adopts a K-fold (k = 5) cross-validation technique. If so, which dataset(s) can be used as the validation dataset? (NOTE: Points are deducted for wrong answers; 0 points for 'Not Answering'.)
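For reference, a minimal sketch (assuming scikit-learn and the iris dataset) of the standard setup: the 5-fold validation folds are carved out of the 70% training portion, while the 30% test set is held back for the final evaluation.

```python
# A minimal sketch, assuming scikit-learn and the iris dataset: after a 70/30
# train/test split, 5-fold cross-validation draws its validation folds from the
# training portion only; the test set is kept for the final evaluation.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000)
# Each of the 5 iterations fits on 4/5 of X_train and validates on the remaining 1/5.
scores = cross_val_score(clf, X_train, y_train, cv=5)
print("validation scores:", scores)

# The held-out test set is used only once, after model/hyperparameter selection.
clf.fit(X_train, y_train)
print("final test score:", clf.score(X_test, y_test))
```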
In the general ML/DM process, feature selection/reduction should be performed after model selection.
Regarding the learning rate in gradient descent: if it is too big, training will require a lot of time.
(NOTE: Points are deducted for wrong answer(s); 0 points for 'Not Answering'.)
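For reference, a minimal sketch (plain Python, toy quadratic f(w) = w^2) showing how the learning rate affects gradient descent: a very small rate converges slowly, while an overly large rate makes the updates overshoot and diverge.

```python
# A minimal sketch on the toy objective f(w) = w**2: gradient descent with a
# small, a moderate, and an overly large learning rate.
def gradient_descent(lr, steps=20, w0=5.0):
    w = w0
    for _ in range(steps):
        grad = 2 * w      # derivative of f(w) = w**2
        w = w - lr * grad
    return w

for lr in (0.01, 0.1, 1.1):   # slow convergence, good convergence, divergence
    print(f"lr={lr}: w after 20 steps = {gradient_descent(lr):.4f}")
```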