How does Kafka achieve write scalability while also achieving high availability? Choose the option below that is NOT correct.
In streaming joins, a traditional hash join (for batch/static data) differs from a symmetric hash join (for unbounded streams). Which of the following statements are true about these join methods?
i) A traditional hash join for two tables (R ⋈ S) typically hashes one entire table (say, S) into memory and then probes with each row from the other table (R).
ii) A symmetric hash join for streams maintains a hash table for each input stream and continuously checks incoming tuples from each side against the stored tuples from the other side.
iii) A single, global hash table can be used to join any number of streams simultaneously without missing any matches.
iv) With two input streams, a symmetric hash join ensures that if a tuple arrives on one stream, it will be matched with any buffered tuples from the other stream that fall into the join window, so no valid pair is missed.
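The symmetric hash join described in the statements above can be illustrated with a small sketch. This is a minimal, hypothetical illustration and not the implementation used by any particular engine: the class name `SymmetricHashJoin`, the `insert` method, the stream labels "R" and "S", and the integer timestamps are all assumptions made for the example. It keeps one hash table per stream and probes the opposite table on every arrival; eviction of expired tuples is deliberately left out.

```python
from collections import defaultdict


class SymmetricHashJoin:
    """Two-stream equi-join that keeps one hash table per input stream.

    Hypothetical sketch: names and the integer-timestamp window are
    assumptions for illustration. Expired tuples are never evicted here.
    """

    def __init__(self, window):
        # Maximum allowed timestamp gap between two joined tuples.
        self.window = window
        # One hash table per stream: join key -> list of (timestamp, value).
        self.tables = {"R": defaultdict(list), "S": defaultdict(list)}

    def insert(self, side, key, ts, value):
        """Process one arriving tuple and return the (r, s) pairs it produces."""
        other = "S" if side == "R" else "R"
        matches = []
        # Probe the OTHER stream's table with the new tuple's key.
        for other_ts, other_value in self.tables[other].get(key, ()):
            if abs(ts - other_ts) <= self.window:
                pair = (value, other_value) if side == "R" else (other_value, value)
                matches.append(pair)
        # Store the new tuple in its OWN table so later arrivals can find it.
        self.tables[side][key].append((ts, value))
        return matches


join = SymmetricHashJoin(window=5)
out1 = join.insert("R", "k1", 1, "r1")   # nothing buffered on S yet
out2 = join.insert("S", "k1", 3, "s1")   # probes R and finds r1
out3 = join.insert("R", "k1", 4, "r2")   # probes S and finds s1
out4 = join.insert("S", "k1", 20, "s2")  # same key, but outside the window
print(out1, out2, out3, out4)
```

Because each tuple is both probed against the other table and stored in its own, a matching pair is found whichever side arrives first. A production version would also purge tuples that have fallen out of the window to bound memory.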
Which of the following is true about Spark DataFrame?
i) A DataFrame is a Dataset organized into named columns
ii) A DataFrame is conceptually equivalent to a table in a relational database
iii) Spark does not evaluate DataFrame lazily
iv) DataFrame computation happens only when action appears (e.g., display result, save output)
When applying a time-based sliding window to join two data streams, what needs to be done? Choose the statement that is NOT correct.
A data stream application processes incoming data from a data source. The data source emits a single tuple per second (constant rate). The data are received by the data stream application in the format (eventTimestamp, value). Once a tuple arrives at the server, the processing timestamp is added to it. The final tuple format after the addition of the processing timestamp is (eventTimestamp, value, processingTimestamp).
Examples of tuples are as follows:
{1,a,1}
{2,b,4}
{3,c,4}
{5,e,6}
{6,f,8}
{7,g,8}
{8,h,9}
{9,i,10}
{10,j,11}
Is the underlying network connecting the data source and the stream server bursty?
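One way to inspect the listed tuples is to tabulate the delay between event time and processing time, and to count how many tuples arrive in each processing second. The sketch below is only an illustration under stated assumptions: the variable names are hypothetical, timestamps are treated as whole seconds, and the tuple list is copied from the example as given (it contains no tuple with event timestamp 4).

```python
from collections import Counter

# (eventTimestamp, value, processingTimestamp), copied from the example.
tuples = [
    (1, "a", 1), (2, "b", 4), (3, "c", 4), (5, "e", 6), (6, "f", 8),
    (7, "g", 8), (8, "h", 9), (9, "i", 10), (10, "j", 11),
]

# Delay between emission and arrival for each tuple.
delays = [proc - event for event, _, proc in tuples]

# Number of tuples that arrived in each processing second.
arrivals_per_second = Counter(proc for _, _, proc in tuples)

# Processing seconds with no arrivals, and seconds with more than one.
first, last = tuples[0][2], tuples[-1][2]
idle_seconds = [t for t in range(first, last + 1) if t not in arrivals_per_second]
batched_seconds = sorted(t for t, n in arrivals_per_second.items() if n > 1)

print("delays:", delays)
print("idle seconds:", idle_seconds)
print("seconds with several arrivals:", batched_seconds)
```

With a source that emits at a constant rate, a perfectly smooth network would show a constant delay and exactly one arrival per second. Varying delays, together with idle seconds followed by seconds with several arrivals, are the pattern to look for when judging burstiness.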
Which of the following correctly describes Apache Kafka's replication and brokers?
i) A replicated partition contains exactly the same data as the leader. So the same message is stored multiple times.
ii) With multiple brokers, one unique message is located on a separate broker.
iii) With a stateless broker, the information about message consumption is maintained by the broker.
iv) Kafka solves the problem of message deletion by using simple time-based SLA for the retention policy.
In the Spark Structured Streaming programming model, the "Output" is defined as what gets written out to the external storage. The output can be defined in three different modes. However, different types of streaming queries support different output modes, as summarized in the compatibility matrix. Based on the following list of query types, which are the correct output modes for these queries?
i) Queries with aggregation (Aggregation on event-time with watermark) - Append, Update, Complete
ii) Queries with aggregation (Other aggregations) - Append
iii) Queries with mapGroupsWithState - Complete
iv) Queries with flatMapGroupsWithState - Append, Update
v) Queries with joins - Append
Which of the following is NOT true about Apache Kafka’s design?
In a stream join, it is important that the ACID transaction property is adhered to. Which of the following correctly describes the usage of the ACID property in a stream join?
i) ACID property guarantees the validity of the stream join results.
ii) ACID property does not guarantee the consistency of the join results.
iii) When a tuple arrives at a stream and is ready to perform a join operation, these steps can be carried out in an interleaved manner.
iv) The join operation, although consisting of many operations, is considered one operation (atomic).
What is the purpose of watermarking in Spark Structured Streaming (or similar event-time based streaming systems)?