FIT3182 Big data management and processing - MUM S1 2026

Consider the following PySpark code snippet for performing a streaming join:

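from pyspark.sql.functions import expr

# stream1 and stream2 are assumed to be streaming DataFrames aliased "stream1" and "stream2"
joinedStream = stream1.join(
    stream2,
    expr("""
        stream1.key = stream2.key AND
        stream2.timestamp >= stream1.timestamp AND
        stream2.timestamp <= stream1.timestamp + interval 5 minutes
    """)
)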

Which type of join is demonstrated above?

Inner equi-join

Outer join

Interval join

100%

Arbitrary stateful join
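
The join condition above can be mimicked in plain Python to see which pairs it admits: equal keys, and a stream2 timestamp no more than 5 minutes after the stream1 timestamp. This is only a sketch; `interval_join` and the sample rows are hypothetical, and it ignores the watermarking and state cleanup a real streaming engine would need.

```python
from datetime import datetime, timedelta

def interval_join(left, right, max_delay=timedelta(minutes=5)):
    """Pair rows with equal keys where the right timestamp lies in
    [left timestamp, left timestamp + max_delay]."""
    return [
        (l, r)
        for l in left
        for r in right
        if l["key"] == r["key"]
        and l["timestamp"] <= r["timestamp"] <= l["timestamp"] + max_delay
    ]

# Hypothetical sample rows, not taken from the course material.
t0 = datetime(2026, 1, 1, 12, 0)
stream1_rows = [{"key": "a", "timestamp": t0}]
stream2_rows = [
    {"key": "a", "timestamp": t0 + timedelta(minutes=3)},  # inside the interval
    {"key": "a", "timestamp": t0 + timedelta(minutes=7)},  # outside the interval
    {"key": "b", "timestamp": t0 + timedelta(minutes=1)},  # key mismatch
]

matches = interval_join(stream1_rows, stream2_rows)
print(len(matches))
```

Only the first stream2 row survives, because it is the only one that satisfies both the key equality and the time bound.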

The following is an example of a Handshake Join between two streams, R and S. The window is denoted by the shaded area, and a handshake between two tuples is denoted by a double arrow:

[Figure: handshake join between streams R and S]

What is the problem with the Handshake Join method?

After a handshake of a pair of tuples is done, both streams R and S move forward at the same time.

100%

After a handshake happens, the streams do not move forward, but each tuple in the stream performs a handshake with the next tuple, and then after that, both streams move forward.

The main window consisting of basic windows is divided into two categories of basic windows. Each basic window is used in an alternate fashion. When the streams move, they will move to the empty basic window.

None

Given the tuples below (t_{event_time}, value, t_{arrival_time}), which tuples are processed in the 3rd window if we use a time-based window with a window size of 2 time units and a slide of 1 time unit? The tuples boxed in red are those processed in the first window.

[Figure not available: stream tuples, with the first window's tuples boxed in red]

(3,c,4),(4,d,6),(5,e,8)

(5,e,8)

(2,b,4),(3,c,4),(4,d,6)

100%

(4,d,6),(5,e,8)

Given the tuples below (t_{event_time}, value, t_{arrival_time}), which tuples are processed in the 3rd window if we use a tuple-based window with a window size of 3 tuples and a slide of 1 tuple? The tuples boxed in red are those processed in the first window.

[Figure not available: stream tuples, with the first window's tuples boxed in red]

(3,c,4),(4,d,6),(5,e,8)

100%

(6,f,12),(7,g,12),(8,h,12)

(4,d,6),(5,e,8),(6,f,12)

(2,b,4),(3,c,4),(4,d,6)
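
A tuple-based sliding window can be sketched with list slicing. The figure is missing, so the stream below is reconstructed from the answer options; the first tuple `(1, "a", 2)` is an assumption (its arrival time in particular is a guess), and `tuple_windows` is a hypothetical helper name.

```python
def tuple_windows(tuples, size, slide):
    """Return count-based windows of `size` tuples, advancing by `slide` tuples."""
    return [tuples[i:i + size] for i in range(0, len(tuples) - size + 1, slide)]

# Reconstructed stream: (event_time, value, arrival_time).
# The first tuple is an assumption; the rest appear in the answer options.
stream = [
    (1, "a", 2),
    (2, "b", 4),
    (3, "c", 4),
    (4, "d", 6),
    (5, "e", 8),
    (6, "f", 12),
    (7, "g", 12),
    (8, "h", 12),
]

windows = tuple_windows(stream, size=3, slide=1)
third_window = windows[2]
print(third_window)
```

Under that assumption the third window starts at the third tuple, since each slide drops one tuple from the front and admits one at the back.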

Which of the following correctly explains AM-Join and M-Join?

i) Both M-Join and AM-Join use one hash table for each stream.

ii) In M-Join, when an incoming tuple arrives at a stream, it probes only its own hash table to find a match.

iii) AM-Join has a BiHT that summarizes all the hash tables in one bit-map table.

iv) In AM-Join, when an incoming tuple arrives at a stream, it always probes the BiHT and the stream hash tables.

i, iii, iv

100%

i, ii, iii

i, iii

ii, iv

If a Spark Structured Streaming job experiences a sudden spike in incoming data that it cannot process as fast as data is arriving, what will happen by default?

The streaming query will immediately fail if it falls behind schedule.

The streaming job will drop any incoming records that it cannot process in real-time to keep up.

Spark will automatically add more executors (nodes) to handle the higher data volume in real-time.

The excess data will accumulate in memory or in the source buffer, creating a backlog and increased latency.

100%
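
The backlog effect can be illustrated with a toy model in which each micro-batch processes at most a fixed number of records. This is not how Spark schedules work internally; `simulate_backlog`, the arrival counts, and the capacity are all made-up values for illustration.

```python
def simulate_backlog(arrivals_per_batch, capacity_per_batch):
    """Track how many records are still waiting after each batch interval."""
    backlog = 0
    history = []
    for arrived in arrivals_per_batch:
        backlog += arrived                            # new records join the queue
        processed = min(backlog, capacity_per_batch)  # capacity is the bottleneck
        backlog -= processed
        history.append(backlog)
    return history

# Hypothetical load: a spike of 500 records per batch against a capacity of 200.
history = simulate_backlog([100, 100, 500, 500, 100, 100], capacity_per_batch=200)
print(history)
```

Nothing is dropped and nothing fails in this model: the queue simply grows during the spike and drains slowly afterwards, which shows up as increased latency.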

Which statement is true about Kafka consumer groups?

If one consumer in the group crashes, the partition it was consuming halts and will not be processed by anyone until it comes back.

Each partition in a topic is consumed by at most one consumer within a given consumer group.

100%

A consumer group can only have one consumer process all partitions of a topic.

All consumers in the same group receive every message published to the topic.
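
The one-consumer-per-partition rule can be sketched with a toy round-robin assignment. This is a simulation in plain Python, not Kafka's actual assignor or client API; `assign_partitions` and the consumer names are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer in the group, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment

partitions = [0, 1, 2, 3]

# Three consumers in one group share four partitions.
before = assign_partitions(partitions, ["c1", "c2", "c3"])

# If c2 leaves the group, its partition is redistributed rather than left idle.
after = assign_partitions(partitions, ["c1", "c3"])

print(before)
print(after)
```

In both assignments every partition has exactly one owner within the group, and a consumer may own several partitions.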

How is the insertion of tuples handled in tuple-based window streams?

All

Many in, but only keep one; hence one will leave

Many in, many out

One in, one out

100%
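
A fixed-size tuple-based window behaves like a bounded queue: once it is full, each arriving tuple pushes the oldest one out. A minimal sketch using Python's `collections.deque`, with a window size of 3 and sample values chosen arbitrarily:

```python
from collections import deque

window = deque(maxlen=3)  # window size of 3 tuples (arbitrary choice)
evicted = []

for item in ["a", "b", "c", "d", "e"]:
    if len(window) == window.maxlen:
        evicted.append(window[0])  # the oldest tuple is about to leave
    window.append(item)            # deque drops the oldest item automatically

print(list(window), evicted)
```

After the window fills, every insertion is matched by exactly one eviction, so the window size stays constant.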

Which stream join algorithm is characterised by dividing the join window into two sets of alternating sub-windows (so that one sub-window can slide forward while the other holds tuples for matching), thereby avoiding missed matches in a sliding window join?

Symmetric Hash Join

Handshake Join

M-Join using alternative windows

100%

Nested Loops Join

In a fixed-size time-based window of K time units, we will have _________________ when the slide time is less than the window size.

Transformed Windows

Overlapped Windows

100%

Non-Overlapped Windows

Dynamic Windows
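
The relationship between slide and window size can be checked numerically. The sketch below treats windows as half-open intervals [start, end); `window_bounds` and `has_overlap` are hypothetical helpers, and the sizes are arbitrary.

```python
def window_bounds(start, size, slide, count):
    """Return (start, end) bounds of `count` consecutive time-based windows."""
    return [(start + i * slide, start + i * slide + size) for i in range(count)]

def has_overlap(bounds):
    """True if any window begins before the previous window has ended."""
    return any(
        next_start < current_end
        for (_, current_end), (next_start, _) in zip(bounds, bounds[1:])
    )

sliding = window_bounds(start=0, size=4, slide=2, count=3)   # slide < size
tumbling = window_bounds(start=0, size=4, slide=4, count=3)  # slide == size

print(sliding, has_overlap(sliding))
print(tumbling, has_overlap(tumbling))
```

With a slide smaller than the window size, consecutive windows share part of their time range; with a slide equal to the size, they sit end to end.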