FIT3182 Big data management and processing - MUM S1 2026

Test questions for FIT3182 Big data management and processing - MUM S1 2026 at learning.monash.edu.

How does Kafka achieve write scalability while also achieving high availability? Choose the option below that is NOT correct.

In streaming joins, a traditional hash join (for batch/static data) differs from a symmetric hash join (for unbounded streams). Which of the following statements are true about these join methods?

i) A traditional hash join for two tables (R ⋈ S) typically hashes one entire table (say, S) into memory and then probes with each row from the other table (R).

ii) A symmetric hash join for streams maintains a hash table for each input stream and continuously checks incoming tuples from each side against the stored tuples from the other side.

iii) A single, global hash table can be used to join any number of streams simultaneously without missing any matches.

iv) With two input streams, a symmetric hash join ensures that if a tuple arrives on one stream, it will be matched with any buffered tuples from the other stream that fall into the join window, so no valid pair is missed.
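The contrast between statements i) and ii) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production operator: the names `hash_join` and `SymmetricHashJoin` are hypothetical, tuples are `(key, value)` pairs, and no join window or eviction is modelled, so the stored state would grow without bound on a real stream.

```python
from collections import defaultdict


def hash_join(r_rows, s_rows):
    """Traditional hash join: build a table on all of S, then probe with R."""
    table = defaultdict(list)
    for key, s_val in s_rows:          # build phase needs the whole of S first
        table[key].append(s_val)
    return [(key, r_val, s_val)
            for key, r_val in r_rows   # probe phase
            for s_val in table.get(key, [])]


class SymmetricHashJoin:
    """Symmetric hash join: one hash table per input stream (no windowing)."""

    def __init__(self):
        self.tables = {"R": defaultdict(list), "S": defaultdict(list)}

    def insert(self, side, key, value):
        """Store the new tuple, probe the other side, return new matches."""
        other = "S" if side == "R" else "R"
        self.tables[side][key].append(value)
        matches = self.tables[other].get(key, [])
        if side == "R":
            return [(key, value, m) for m in matches]
        return [(key, m, value) for m in matches]


# Tuples arrive interleaved from both streams.
arrivals = [("R", 1, "r1"), ("S", 1, "s1"), ("S", 2, "s2"),
            ("R", 2, "r2"), ("R", 1, "r1b")]

join = SymmetricHashJoin()
streamed = []
for side, key, value in arrivals:
    streamed.extend(join.insert(side, key, value))

# The same data joined as two static tables.
batch = hash_join([(k, v) for s, k, v in arrivals if s == "R"],
                  [(k, v) for s, k, v in arrivals if s == "S"])
```

In this sketch each result is emitted as soon as the second tuple of a matching pair arrives, whereas the batch version cannot start probing until all of S has been hashed.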

Which of the following are true about a Spark DataFrame?

i) A DataFrame is a Dataset organized into named columns

ii) A DataFrame is conceptually equivalent to a table in a relational database

iii) Spark does not evaluate DataFrame lazily

iv) DataFrame computation happens only when an action is invoked (e.g., displaying a result, saving output)

When applying a time-based sliding window to streaming data in order to join two streams, what needs to be done? Choose the statement that is NOT correct.

A data stream application processes incoming data from a data source. The data source emits a single tuple per second (constant rate). The data are received by the data stream application in the format (eventTimestamp, value). Once a tuple arrives at the server, the processing timestamp is added to it. The final tuple format after the addition of the processing timestamp is (eventTimestamp, value, processingTimestamp).

Examples of tuples are as follows:

{1,a,1}

{2,b,4}

{3,c,4}

{5,e,6}

{6,f,8}

{7,g,8}

{8,h,9}

{9,i,10}

{10,j,11}

Is the underlying network connecting the data source and the stream server bursty?
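One way to inspect the sample is to compare event and processing timestamps and count arrivals per processing second. The following is a minimal sketch; the variable names are illustrative, and it only summarizes the tuples listed above rather than applying any formal burstiness test.

```python
from collections import Counter

# (eventTimestamp, value, processingTimestamp), copied from the examples above
tuples = [(1, "a", 1), (2, "b", 4), (3, "c", 4), (5, "e", 6), (6, "f", 8),
          (7, "g", 8), (8, "h", 9), (9, "i", 10), (10, "j", 11)]

# Delay between emission at the source and arrival at the server
delays = [proc - event for event, _, proc in tuples]

# Number of tuples arriving in each processing second
arrivals = Counter(proc for _, _, proc in tuples)

first, last = min(arrivals), max(arrivals)
# Processing seconds in which nothing arrived
idle_seconds = [t for t in range(first, last + 1) if t not in arrivals]
# Processing seconds in which more than one tuple arrived
burst_seconds = sorted(t for t, n in arrivals.items() if n > 1)

print("delays:", delays)
print("idle seconds:", idle_seconds)
print("seconds with multiple arrivals:", burst_seconds)
```

For this sample, nothing arrives at processing times 2, 3, 5 and 7, while two tuples arrive together at times 4 and 8, even though the source emits at a constant rate of one tuple per second.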

Which of the following correctly describe Apache Kafka's replication and brokers?

i) A replicated partition contains exactly the same data as the leader. So the same message is stored multiple times.

ii) With multiple brokers, one unique message is located on a separate broker.

iii) With a stateless broker, the information about message consumption is maintained by the broker. 

iv) Kafka solves the problem of message deletion by using simple time-based SLA for the retention policy.

In the Spark Structured Streaming programming model, the "Output" is defined as what gets written out to external storage. The output can be defined in three different modes. However, different types of streaming queries support different output modes, as described by the compatibility matrix. For the following query types, which are paired with the correct output modes?

i) Queries with aggregation (Aggregation on event-time with watermark) - Append, Update, Complete

ii) Queries with aggregation (Other aggregations) - Append

iii) Queries with mapGroupsWithState - Complete

iv) Queries with flatMapGroupsWithState - Append, Update

v) Queries with joins - Append

Which of the following is NOT true about Apache Kafka’s design?

In a stream join, it is important that the ACID transaction properties are adhered to. Which of the following correctly describe the use of the ACID properties in a stream join?

i) ACID property guarantees the validity of the stream join results.

ii) ACID property does not guarantee the consistency of the join results.

iii) When a tuple arrives at a stream and is ready to perform a join operation, these steps can be carried out in an interleaved manner.

iv) The join operation, although consisting of many operations, is considered one operation (atomic).

What is the purpose of watermarking in Spark Structured Streaming (or similar event-time based streaming systems)?
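As a rough illustration of the mechanism the question refers to, a watermark can be modelled as the maximum event time seen so far minus an allowed lateness, with tuples older than the watermark treated as too late. This is a simplified sketch: the function `filter_late` and its parameters are hypothetical, and the watermark here advances per tuple, whereas real engines typically advance it per batch or trigger.

```python
def filter_late(events, delay):
    """Split (event_time, value) tuples into accepted and dropped lists.

    The watermark is assumed to be max_event_time_seen - delay.
    """
    max_event_time = None
    accepted, dropped = [], []
    for event_time, value in events:
        watermark = None if max_event_time is None else max_event_time - delay
        if watermark is not None and event_time < watermark:
            # Older than the watermark: state for this time may be gone
            dropped.append((event_time, value))
        else:
            accepted.append((event_time, value))
            if max_event_time is None or event_time > max_event_time:
                max_event_time = event_time
    return accepted, dropped


# Out-of-order events with an allowed lateness of 3 time units
events = [(10, "a"), (12, "b"), (7, "c"), (15, "d"), (11, "e"), (9, "f")]
accepted, dropped = filter_late(events, delay=3)
print("accepted:", accepted)
print("dropped:", dropped)
```

The allowed lateness trades completeness against state size: a larger `delay` admits more late tuples but forces the engine to keep old state around for longer.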
