FIT3182 Big data management and processing - MUM S1 2026 (learning.monash.edu)
Consider the following PySpark code snippet for performing a streaming join:
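from pyspark.sql.functions import expr

# Join the two streams on key, with a time-range condition on the timestamps
joinedStream = stream1.join(
    stream2,
    expr("""
        stream1.key = stream2.key AND
        stream2.timestamp >= stream1.timestamp AND
        stream2.timestamp <= stream1.timestamp + interval 5 minutes
    """)
)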
Which type of join is demonstrated above?
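To make the join condition in the snippet above concrete, here is a minimal plain-Python sketch that applies the same predicate to two small in-memory lists instead of streams. The sample rows and the helper name `range_join` are made up for illustration; only the predicate (equal keys, and the second timestamp within 5 minutes after the first) comes from the snippet.

```python
from datetime import datetime, timedelta

def range_join(left, right, max_delay=timedelta(minutes=5)):
    """Pair rows with equal keys where right.ts lies in [left.ts, left.ts + max_delay]."""
    out = []
    for l_key, l_ts in left:
        for r_key, r_ts in right:
            if l_key == r_key and l_ts <= r_ts <= l_ts + max_delay:
                out.append((l_key, l_ts, r_ts))
    return out

t0 = datetime(2026, 1, 1, 12, 0)
stream1_rows = [("a", t0), ("b", t0)]
stream2_rows = [
    ("a", t0 + timedelta(minutes=3)),  # within 5 minutes: matches
    ("a", t0 + timedelta(minutes=7)),  # more than 5 minutes later: no match
    ("b", t0 - timedelta(minutes=1)),  # earlier than the stream1 row: no match
]
matches = range_join(stream1_rows, stream2_rows)
```

A real streaming engine cannot scan all rows like this; it has to buffer rows from each side and decide when buffered state can be dropped, which this sketch deliberately ignores.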
The following is an example of a Handshake Join between two streams, R and S. (The window is denoted by the shaded area, and a handshake between two tuples is denoted by a double arrow.)
What is the problem with the Handshake Join method?
Given the tuples below (t_event_time, value, t_arrival_time), which tuples are processed in the 3rd window if we use a time-based window with a window size of 2 time units and a slide of 1 time unit? The tuples boxed in red are those processed in the first window.
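As a rough illustration of how a time-based sliding window selects tuples, the sketch below uses hypothetical sample data (the tuples from the original figure are not available here). It assumes that windows are half-open intervals [start, start + size), that the first window starts at time 0, and that membership is decided by event time; the actual question may use different conventions.

```python
def time_window(tuples, index, size=2, slide=1, origin=0):
    """Return the tuples whose event time falls in the index-th window (1-based)."""
    start = origin + (index - 1) * slide
    end = start + size
    return [t for t in tuples if start <= t[0] < end]

# (event_time, value, arrival_time): hypothetical sample data
data = [(0, "a", 0), (1, "b", 1), (2, "c", 3), (3, "d", 3), (4, "e", 5)]

first = time_window(data, 1)   # covers event times [0, 2)
third = time_window(data, 3)   # covers event times [2, 4)
```

With these assumptions the window boundaries depend only on the clock, so the number of tuples per window can vary from one window to the next.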
Given the tuples below (t_event_time, value, t_arrival_time), which tuples are processed in the 3rd window if we use a tuple-based window with a window size of 3 tuples and a slide of 1 tuple? The tuples boxed in red are those processed in the first window.
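For a tuple-based window, the boundaries are defined by tuple counts rather than by time. The sketch below uses hypothetical sample data and assumes that tuples are ordered by arrival time before windowing; the helper name `tuple_window` is illustrative only.

```python
def tuple_window(tuples, index, size=3, slide=1):
    """Return the index-th window (1-based) of `size` consecutive tuples."""
    start = (index - 1) * slide
    return tuples[start:start + size]

# (event_time, value, arrival_time): hypothetical sample data
data = [(0, "a", 0), (1, "b", 1), (2, "c", 3), (3, "d", 3), (4, "e", 5)]
ordered = sorted(data, key=lambda t: t[2])  # assumption: order by arrival time

first = tuple_window(ordered, 1)   # tuples 1 to 3
third = tuple_window(ordered, 3)   # tuples 3 to 5
```

Here every full window holds exactly `size` tuples regardless of how much time separates them.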
Which of the following correctly explains AM-Join and M-Join?
i) Both M-Join and AM-Join use one hash table for each stream.
ii) In M-Join, when an incoming tuple arrives at a stream, it probes only its own stream's hash table in order to find a match.
iii) AM-Join has a BiHT that summarizes all the hash tables in one bitmap table.
iv) In AM-Join, when an incoming tuple arrives at a stream, it always probes both the BiHT and the stream hash tables.
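The statements above mention two data structures: one hash table per stream, and a bitmap table summarizing which streams hold a given key. The following is a heavily simplified sketch of how such a bitmap could let a multi-way hash join skip probing when no result is possible. It is not the published M-Join or AM-Join algorithm; the class name, the `probes` counter, and the absence of windowing and eviction are all simplifying assumptions.

```python
from collections import defaultdict
from itertools import product

class BitmapMultiJoin:
    """Toy multi-stream equi-join: one hash table per stream plus a key bitmap."""

    def __init__(self, num_streams):
        self.n = num_streams
        # One hash table per stream: key -> list of values seen on that stream
        self.tables = [defaultdict(list) for _ in range(num_streams)]
        # Bitmap summary: key -> bitmask of the streams that hold this key
        self.bitmap = defaultdict(int)
        self.probes = 0  # number of hash tables probed so far

    def insert(self, stream_id, key, value):
        self.tables[stream_id][key].append(value)
        self.bitmap[key] |= 1 << stream_id
        all_streams = (1 << self.n) - 1
        if self.bitmap[key] != all_streams:
            return []  # some stream has no tuple for this key: skip probing
        # Every stream holds the key, so probe the other streams' hash tables
        others = [self.tables[i][key] for i in range(self.n) if i != stream_id]
        self.probes += len(others)
        return [(key, value) + combo for combo in product(*others)]

join = BitmapMultiJoin(3)
r1 = join.insert(0, "k", "a")   # only stream 0 has "k"
r2 = join.insert(1, "k", "b")   # stream 2 is still missing "k"
probes_before = join.probes
r3 = join.insert(2, "k", "c")   # all three streams now hold "k"
```

In this toy version the first two inserts consult only the bitmap, and the hash tables are probed only once the bitmap shows that every stream holds the key.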
If a Spark Structured Streaming job experiences a sudden spike in incoming data and cannot process the data as fast as it arrives, what will happen by default?
Which statement is true about Kafka consumer groups?
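As background for this question: within a single Kafka consumer group, each partition of a topic is assigned to at most one consumer at a time, so consumers beyond the number of partitions stay idle. The sketch below mimics that rule with a simple round-robin assignment in plain Python. It is a stand-in for illustration, not Kafka's actual assignor code, and the function name `assign_partitions` is hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Round-robin partitions over the consumers of one group (simplified)."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment

# Three partitions, two consumers: one consumer reads two partitions
fewer_consumers = assign_partitions([0, 1, 2], ["c1", "c2"])

# Two partitions, three consumers: the third consumer gets nothing
extra_consumer = assign_partitions([0, 1], ["c1", "c2", "c3"])
```

Different consumer groups are independent of each other, so a second group reading the same topic would receive its own full assignment.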
How can you handle the insertion of tuple(s) in tuple-based window streams?
Which stream join algorithm is characterised by dividing the join window into two sets of alternating sub-windows (so that one sub-window can slide forward while the other holds tuples for matching), thereby avoiding missed matches in a sliding window join?
In a fixed-size time-based window of K time units, we will have _________________ when the slide time is less than the window size.
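The effect of the slide being smaller than the window size can be checked with a small sketch that lists every window containing a given timestamp. It assumes integer time units, half-open windows [start, start + size), and window starts at multiples of the slide from time 0; the function name is illustrative.

```python
def windows_containing(t, size, slide, origin=0):
    """List the [start, end) windows that contain timestamp t."""
    result = []
    k = 0
    while origin + k * slide <= t:
        start = origin + k * slide
        if t < start + size:
            result.append((start, start + size))
        k += 1
    return result

sliding = windows_containing(3, size=2, slide=1)    # slide < size
tumbling = windows_containing(3, size=2, slide=2)   # slide == size
```

With a slide of 1 and a size of 2, timestamp 3 falls into two windows, whereas with a slide equal to the size it falls into exactly one.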