Test questions for Introduction to Data Science (LTAT.02.002) at moodle.ut.ee.
Your task is to train a regression model on a dataset that contains 2000 numeric features. You have found that using all of these features results in massive overfitting. Can the following methods help?
You are given 10 people’s shopping lists:
ID | Items |
1 | Shirt, Lantern, Meat, Milk |
2 | Bread, Shampoo, Ice Cream, Meat |
3 | Eggs, Frozen Pizza, Ice Cream |
4 | Ice Cream, Lantern, Meat, Milk |
5 | Milk, Bread, Sugar |
6 | Salt, Milk, Muesli |
7 | Chewing gum, Milk, Sour Cream |
8 | Sweets, Nuts, Cookies |
9 | Ice Cream, Bread |
10 | Meat, Milk, Butter, Bread |
Using these data, determine the confidence of the association rule Milk -> Meat.
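As a way to check the arithmetic, the confidence of a rule A -> B can be computed as the number of lists containing both A and B divided by the number of lists containing A. The sketch below encodes the ten shopping lists from the table; the names `transactions` and `confidence` are illustrative choices, not part of the original task.

```python
# The ten shopping lists from the table above, keyed by ID.
transactions = {
    1: {"Shirt", "Lantern", "Meat", "Milk"},
    2: {"Bread", "Shampoo", "Ice Cream", "Meat"},
    3: {"Eggs", "Frozen Pizza", "Ice Cream"},
    4: {"Ice Cream", "Lantern", "Meat", "Milk"},
    5: {"Milk", "Bread", "Sugar"},
    6: {"Salt", "Milk", "Muesli"},
    7: {"Chewing gum", "Milk", "Sour Cream"},
    8: {"Sweets", "Nuts", "Cookies"},
    9: {"Ice Cream", "Bread"},
    10: {"Meat", "Milk", "Butter", "Bread"},
}


def confidence(transactions, antecedent, consequent):
    """Confidence of antecedent -> consequent: count(A and B) / count(A)."""
    with_antecedent = [items for items in transactions.values() if antecedent <= items]
    if not with_antecedent:
        raise ValueError("antecedent never occurs, confidence is undefined")
    with_both = [items for items in with_antecedent if consequent <= items]
    return len(with_both) / len(with_antecedent)


milk_to_meat = confidence(transactions, {"Milk"}, {"Meat"})
print(milk_to_meat)  # 0.5 (Milk is in 6 lists, 3 of which also contain Meat)
```

Note that confidence is not symmetric: Meat -> Milk uses the number of lists containing Meat as the denominator, so it generally gives a different value.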
You decide to use grid search to tune the hyperparameters of your RandomForest model on your test set. You then estimate the accuracy of the tuned model on that same test set. Your estimate of the accuracy will be:
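To see what reusing a test set for tuning does to the estimate, here is a minimal simulation sketch. It does not use a RandomForest: as a stand-in assumption, the "model" is a simple threshold rule on a pure-noise feature, so every hyperparameter setting has a true accuracy of 0.5. The grid, the data generator, and all names are hypothetical.

```python
import random

rng = random.Random(0)


def make_data(n):
    # The feature is pure noise: each label is an independent coin flip,
    # so no classifier can truly do better than 50% accuracy.
    return [(rng.random(), rng.randint(0, 1)) for _ in range(n)]


def accuracy(threshold, flipped, data):
    # Toy "model": predict 1 if x > threshold (or the opposite when flipped).
    correct = 0
    for x, y in data:
        pred = int(x > threshold)
        if flipped:
            pred = 1 - pred
        correct += pred == y
    return correct / len(data)


test_set = make_data(100)       # reused for both tuning and evaluation
fresh_set = make_data(10_000)   # never seen during tuning

# "Grid search": try every (threshold, flipped) pair and keep the best on the test set.
grid = [(t / 50, flipped) for t in range(50) for flipped in (False, True)]
best_threshold, best_flipped = max(grid, key=lambda cfg: accuracy(cfg[0], cfg[1], test_set))

best_test_acc = accuracy(best_threshold, best_flipped, test_set)
fresh_acc = accuracy(best_threshold, best_flipped, fresh_set)

print(f"accuracy on the test set used for tuning: {best_test_acc:.3f}")
print(f"accuracy on fresh data:                   {fresh_acc:.3f}")
```

Because the winning setting was chosen for how well it fits the test set, its score there is at least 0.5 and usually noticeably higher, while on fresh data it falls back to roughly 0.5. Keeping a separate validation set (or using cross-validation) for tuning avoids this.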
You have a dataset that contains information about land prices in different areas of the world. Your task is to create a model that would be capable of predicting the price of any piece of land. Which of the following methods could be used for this task?
Which tasks can be achieved with principal component analysis (PCA) without having to combine with other methods?
The practice exam is only for practicing, and the points gathered here do not contribute to the total points in this course!
The practice exam consists of two parts:
Part 1 has a time limit of 15 minutes and consists of 7 short questions (2 questions each worth 0.5 points and 5 questions each worth 0.8 points).
Part 2 has a time limit of 15 minutes and consists of 1 long question (worth 5 points).
The tasks of the practice exam will be made available for all of you and there is no need to save them for yourself.
On the topic of statistical hypothesis testing, please do the following:
1. Come up with and describe a data mining scenario where statistical hypothesis testing could be used.
2. Define a null hypothesis and the corresponding alternative hypothesis in this scenario.
3. Explain what confidence level and p-value would mean in this scenario.
4. What are the two different possible results that the statistical test can have?
5. What can the data miner conclude from one or the other result?
Note that we expect about 1-3 sentences of text for each of the above points. Please start the answer to each point from a new line and with the respective number (1., 2., 3., 4., 5.).
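One possible scenario for the points above can be sketched as a two-sample permutation test. Everything here is an assumption made for illustration: the scenario (comparing a numeric measurement between two customer groups), the toy values, and the names `group_a`, `group_b`, and `permutation_p_value`. The null hypothesis is that both groups come from the same distribution; the alternative is that their means differ.

```python
import random

# Hypothetical toy measurements for two groups (made-up values for illustration).
group_a = [12, 15, 14, 10, 13, 16, 11, 14]
group_b = [18, 21, 17, 20, 19, 22, 18, 20]


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis group labels are exchangeable,
        # so reshuffling them simulates the null distribution.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


alpha = 0.05  # significance level, i.e. a 95% confidence level
p_value = permutation_p_value(group_a, group_b)
reject_null = p_value < alpha

print(f"p-value: {p_value:.4f}")
print("reject the null hypothesis" if reject_null else "fail to reject the null hypothesis")
```

The two possible outcomes map onto the last two points: if the p-value is below the chosen significance level, the null hypothesis is rejected in favour of the alternative; otherwise the test fails to reject it, which is not the same as proving the null hypothesis true.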