Quiz questions from ETC1010 - ETC5510 - Introduction to data analysis - S1 2025 (learning.monash.edu).
The following question is about tidy data. The table below looks at crime occurrence in different locations across Victoria:
| entry_point | Location | crime_type | count |
|---|---|---|---|
| FRONT DOOR | Oakleigh | violent | 67 |
| FRONT DOOR | Clayton | violent | 53 |
| WINDOW | Oakleigh | burglary | NA |
| WINDOW | Clayton | burglary | 6 |
| Roof | Oakleigh | Others | 17 |
| Roof | Clayton | Others | 22 |
If you would like to calculate the proportion of the different crime types by location, which code do you need to use?
Hint: Missing values are typically denoted by "NA" in the dataset. We can ignore these values by passing the option "na.rm = TRUE" to the appropriate R command.
Incorrect answers will be penalised.
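The answer options for this question are not reproduced here, so the following is only a minimal sketch of one way to compute such proportions with dplyr, using the hint about `na.rm = TRUE`. The object names `crime` and `crime_prop` and the column name `prop` are assumptions made for illustration.

```r
library(dplyr)

# Recreate the table from the question (names `crime` and `prop` are illustrative).
crime <- tibble(
  entry_point = c("FRONT DOOR", "FRONT DOOR", "WINDOW", "WINDOW", "Roof", "Roof"),
  Location    = c("Oakleigh", "Clayton", "Oakleigh", "Clayton", "Oakleigh", "Clayton"),
  crime_type  = c("violent", "violent", "burglary", "burglary", "Others", "Others"),
  count       = c(67, 53, NA, 6, 17, 22)
)

# Within each location, divide each count by the location total.
# na.rm = TRUE keeps the missing burglary count from turning the total into NA.
crime_prop <- crime %>%
  group_by(Location) %>%
  mutate(prop = count / sum(count, na.rm = TRUE)) %>%
  ungroup()

crime_prop
```

Without `na.rm = TRUE`, the Oakleigh total would be `NA`, and so would every Oakleigh proportion. The row with the missing count still has an `NA` proportion either way.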
This question is about visualising temporal data.
The example data is on pedestrian counts in the city of Melbourne. The plot below looks at the distribution of the pedestrian counts over weekdays in March across 24 hours, comparing 2019 to 2020.
```r
ped %>%
  ggplot(aes(x = Time, y = Count, group = Date, colour = as.factor(year))) +
  geom_boxplot() +
  facet_wrap(~ year, ncol = 1, scales = "free") +
  scale_colour_brewer("", palette = "Dark2") +
  theme(legend.position = "bottom", legend.title = element_blank())
```
[Plot not shown: image failed to load.]
By looking at the above plots, select all statements that are TRUE.
The following question is about visualisation.
The data shows the calories of a selection of chocolate bars, in 100g equivalents. Calories are mapped to the vertical axis. For the following statement:
Dark chocolates are higher in calories than milk chocolates.
The following question is about tidy data. The table below looks at crime data in different locations across New South Wales:
| entry_point | lga | crime_type | count |
|---|---|---|---|
| FRONT DOOR | Paddington | arson | 100 |
| FRONT DOOR | CBD | arson | 60 |
| FRONT DOOR | Newtown | arson | 90 |
| WINDOW | Paddington | burglary | 65 |
| WINDOW | CBD | burglary | 55 |
| WINDOW | Newtown | burglary | 100 |
| ROOF | Paddington | burglary | 10 |
| ROOF | CBD | burglary | NA |
| ROOF | Newtown | burglary | NA |
What is the total number of arson crime incidents recorded in this data set for Newtown with entry point being ROOF?
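One way to check a question like this is to filter the table on all three conditions and see what is left. The sketch below assumes dplyr and uses the illustrative names `crime_nsw` and `roof_arson_newtown`. It is not presented as the official solution.

```r
library(dplyr)

# Recreate the table from the question (the name `crime_nsw` is illustrative).
crime_nsw <- tibble(
  entry_point = rep(c("FRONT DOOR", "WINDOW", "ROOF"), each = 3),
  lga         = rep(c("Paddington", "CBD", "Newtown"), times = 3),
  crime_type  = c(rep("arson", 3), rep("burglary", 6)),
  count       = c(100, 60, 90, 65, 55, 100, 10, NA, NA)
)

# Keep only rows matching all three conditions.
roof_arson_newtown <- crime_nsw %>%
  filter(lga == "Newtown", crime_type == "arson", entry_point == "ROOF")

nrow(roof_arson_newtown)                      # number of matching rows
sum(roof_arson_newtown$count, na.rm = TRUE)   # total over the matching rows
```

No row combines arson with a ROOF entry point, so the filter returns zero rows. That is different from the ROOF burglary rows, where a row exists but its count is `NA`.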
This question is about working with temporal data. The example data is on pedestrian counts in the city of Melbourne. What time periods of Melbourne pedestrian traffic are NOT extracted by the code below?
Select all answers that apply. Incorrect answers will be penalised.
```r
library(lubridate)
library(rwalkr)
ped_2020 <- melb_walk(from = Sys.Date() - 7L)
ped_2019 <- melb_walk(from = Sys.Date() - 30L - years(1), to = Sys.Date() - years(1))
```
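To see which dates these expressions produce without downloading any data, the date arithmetic can be evaluated on its own. The sketch below swaps `Sys.Date()` for a fixed, hypothetical reference date (`today`) so the result is reproducible. It does not call `melb_walk()`.

```r
library(lubridate)

# Hypothetical stand-in for Sys.Date(), chosen only to make the example reproducible.
today <- as.Date("2020-04-15")

# Start of the first request: seven days before the reference date.
from_2020 <- today - 7L

# Second request: from 30 days back and one year earlier, up to one year earlier.
from_2019 <- today - 30L - years(1)
to_2019   <- today - years(1)

c(from_2020 = from_2020, from_2019 = from_2019, to_2019 = to_2019)
```

With this reference date, the first request starts on 2020-04-08 and the second covers 2019-03-16 to 2019-04-15. The end of the first range depends on the default value of `to` in `melb_walk()`, which is not shown in the question.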
The following question is about workflow and reproducibility. Suppose you are writing a report with R Markdown that will be presented to an important client. You have a time-consuming calculation that is required by downstream chunks for making tables and charts, but that isn’t necessary to show the client.
Which of the following chunks will compute the output but not print the resulting code in the report? Note there may be more than one correct answer. Incorrect answers are penalised.
{r chunk-A, eval = FALSE, echo = FALSE}
{r chunk-B, eval = FALSE, echo = TRUE}
{r chunk-C, eval = TRUE, echo = FALSE}
{r chunk-D, include = FALSE}
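The effect of `eval` and `echo` can be checked by knitting a tiny one-chunk document from a string. The sketch below assumes the knitr package is installed. The helper `render_chunk()` and the variable `slow_result` are hypothetical names introduced only for this illustration.

```r
library(knitr)

# Build the three-backtick fence programmatically so this sketch is easy to embed.
fence <- strrep("`", 3)

# Knit a one-chunk document with the given chunk header options and
# return the rendered markdown.
render_chunk <- function(options) {
  rmd <- c(
    paste0(fence, "{r ", options, "}"),
    "slow_result <- sum(1:10)  # hypothetical stand-in for the slow calculation",
    "slow_result",
    fence
  )
  knit(text = rmd, quiet = TRUE)
}

out_c <- render_chunk("chunk-C, eval = TRUE, echo = FALSE")
out_a <- render_chunk("chunk-A, eval = FALSE, echo = FALSE")

# chunk-C: is the source code shown, and is the computed value (55) shown?
grepl("slow_result", out_c, fixed = TRUE)
grepl("55", out_c, fixed = TRUE)

# chunk-A: is the computed value shown?
grepl("55", out_a, fixed = TRUE)
```

In knitr, `eval` controls whether the chunk is run and `echo` controls whether its source code is printed. `include = FALSE` still runs the chunk but keeps both the code and its output out of the report.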
This question is about visualising temporal data.
The example data is on pedestrian counts in the city of Melbourne. The plot below looks at the pedestrian counts over weekdays in March, comparing 2019 to 2020.
```r
ped %>%
  ggplot(aes(x = Time, y = Count, group = Date, colour = as.factor(year))) +
  geom_line() +
  facet_wrap(~ wday, ncol = 7) +
  scale_colour_brewer("", palette = "Dark2") +
  theme(legend.position = "bottom", legend.title = element_blank())
```
[Plot not shown: image failed to load.]
By looking at the above plots, select all statements that are TRUE. Incorrect answers are penalised.
The following question is about tidy data. The table below looks at crime occurrence in different locations across Victoria:
| entry_point | lga | crime_type | count |
|---|---|---|---|
| FRONT DOOR | Paddington | arson | 100 |
| FRONT DOOR | CBD | arson | 60 |
| FRONT DOOR | Newtown | arson | 90 |
| WINDOW | Paddington | burglary | 65 |
| WINDOW | CBD | burglary | 55 |
| WINDOW | Newtown | burglary | 100 |
| ROOF | Paddington | burglary | 10 |
| ROOF | CBD | burglary | NA |
| ROOF | Newtown | burglary | NA |
Which of the following statements about the variable count are TRUE?
Incorrect answers will be penalised.
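The statements to be judged are not reproduced here, but basic properties of `count` can be checked directly in base R. The sketch below enters the column by hand. The vector name `count` simply mirrors the column name.

```r
# The count column from the table, in row order.
count <- c(100, 60, 90, 65, 55, 100, 10, NA, NA)

class(count)               # storage type of the variable
sum(is.na(count))          # how many values are missing
sum(count)                 # NA, because missing values propagate
sum(count, na.rm = TRUE)   # total over the recorded values only
```

Summaries such as `sum()` and `mean()` return `NA` whenever the input contains a missing value, unless `na.rm = TRUE` is supplied.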