Getting Started with Kafka: How It Works


Kafka is a powerful distributed event streaming platform designed to handle real-time data feeds. It enables applications to publish and subscribe to streams of records, supporting a wide range of use cases such as data integration, stream processing, and real-time analytics. Kafka’s architecture is built for high throughput, fault tolerance, and scalability, making it a popular choice for handling large volumes of data in modern data infrastructures.

Unlike traditional messaging systems, Kafka stores streams of records in categories called topics and retains messages for a configurable amount of time, allowing consumers to process data at their own pace. This unique design supports various consumption patterns, including real-time processing and batch analytics.

Core Kafka Components and Their Roles

Kafka operates through a set of key components that work together to facilitate efficient data streaming:

  • Producers: These are client applications that publish data to Kafka topics. Producers decide which topic and partition their messages should be sent to, often based on message keys to maintain order within partitions.
  • Brokers: Brokers are Kafka servers that receive messages from producers, store them, and serve them to consumers. They form a Kafka cluster, distributing partitions across multiple brokers to achieve scalability and fault tolerance.
  • Topics and Partitions: Topics categorize messages, while partitions subdivide topics into ordered, immutable sequences of messages. Partitions allow Kafka to parallelize processing by distributing data across brokers.
  • Consumers: Consumers subscribe to topics to read and process messages. They track their progress using offsets and can be organized into consumer groups to enable scalable parallel processing.

Each of these components plays a specific role in the Kafka ecosystem, and their interaction defines Kafka’s overall functionality.

The Kafka Protocol: Communication Between Components

Kafka uses a binary protocol over TCP to enable communication between producers, brokers, and consumers. This protocol is designed to be efficient and supports operations such as producing data, fetching messages, managing offsets, and controlling topic metadata.

Producers and consumers interact with brokers through this protocol, ensuring that data can flow seamlessly in real-time. The protocol supports batching, compression, and acknowledgment mechanisms to optimize network usage and data durability.

Because Kafka’s protocol is simple yet powerful, it allows for the implementation of clients in various programming languages, facilitating integration with diverse systems and applications.

Distributed Architecture for Scalability and Reliability

Kafka’s strength as a distributed streaming platform lies in its carefully designed architecture that enables both scalability and reliability. The distributed nature of Kafka means that its components run across multiple servers or nodes, working together as a cluster. This setup allows Kafka to handle massive amounts of data in real time, support fault tolerance, and provide high availability. Let’s explore in detail how Kafka’s distributed architecture achieves these goals.

Kafka Cluster: Multiple Brokers Working in Unison

At the core of Kafka’s distributed architecture is the Kafka cluster, which consists of multiple Kafka brokers. Each broker is a server that stores data and serves client requests. A Kafka cluster can be as small as a few brokers or scale to hundreds of brokers, depending on workload demands.

The cluster works as a collective unit, with each broker responsible for a subset of partitions within various topics. By distributing partitions across brokers, Kafka ensures workload balancing and parallel data processing. This distribution enables Kafka to scale horizontally — simply adding more brokers to the cluster allows Kafka to handle more data and more clients.

Each broker in the cluster operates independently but coordinates with others to maintain a consistent state. This coordination is vital to keep track of which broker leads a given partition, replicates data, and manages client connections.

Partitioning: The Key to Scalability

Partitioning is fundamental to Kafka’s ability to scale. Each topic is divided into partitions, and these partitions are spread across brokers in the cluster. This division enables Kafka to handle very high throughput because multiple partitions can be read and written in parallel.

When producers send data to Kafka, they write to specific partitions based on a key or a partitioning strategy. Consumers read data from partitions, and because partitions are independent, multiple consumers can process partitions concurrently without conflicting with each other.

Partitions also provide Kafka with fault isolation. If a single partition or its broker fails, other partitions continue to operate normally, minimizing system-wide impact.

Replication: Ensuring Data Durability and Availability

Reliability in Kafka’s distributed architecture is largely achieved through replication. Each partition in Kafka is replicated across multiple brokers, with one broker acting as the leader and others as followers.

The leader broker handles all read and write requests for that partition, while the follower brokers keep their copies current by continuously fetching new records from the leader. This replication guarantees that even if one broker fails, the data remains available on other brokers.

Kafka’s replication factor is configurable. For critical data, a higher replication factor is recommended to maximize durability and availability. The trade-off is increased storage and network overhead.

If the leader broker fails, Kafka’s controller broker automatically elects a new leader from the in-sync replicas (followers that have fully caught up with the leader). This failover process is transparent to producers and consumers, ensuring minimal disruption.
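
To see this structure directly, the Java AdminClient can describe a topic and report, for each partition, which broker is the leader and which replicas are in sync. A minimal sketch, assuming a broker reachable at localhost:9092, a hypothetical topic named orders, and a 3.1+ kafka-clients library (older clients use all() instead of allTopicNames()):

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;
import java.util.List;
import java.util.Properties;

public class DescribeTopicReplication {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        try (AdminClient admin = AdminClient.create(props)) {
            // Describe the hypothetical "orders" topic and print leader, replicas, and ISR per partition
            TopicDescription desc = admin.describeTopics(List.of("orders"))
                    .allTopicNames().get()   // Map<topic name, TopicDescription>
                    .get("orders");
            for (TopicPartitionInfo p : desc.partitions()) {
                System.out.printf("partition %d: leader=%s, replicas=%s, isr=%s%n",
                        p.partition(), p.leader(), p.replicas(), p.isr());
            }
        }
    }
}
```

During a broker failure you would see the leader for affected partitions move to another replica, while the replica list stays the same.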

Controller Broker: The Cluster Manager

Within the Kafka cluster, one broker is designated as the controller. The controller broker manages the overall state of the cluster, including leadership elections for partitions and monitoring broker failures.

The controller coordinates partition assignments and oversees rebalancing when brokers join or leave the cluster. It also manages metadata propagation, ensuring all brokers have updated information about topic configurations, partition leaders, and cluster health.

The controller role is critical for maintaining cluster consistency and coordinating complex distributed operations across brokers.

Zookeeper and Metadata Management

Historically, Kafka relied on Apache ZooKeeper to manage cluster metadata and coordinate distributed processes. ZooKeeper tracks brokers, elects the controller, manages topic and partition metadata, and handles leader elections.

ZooKeeper ensures that the cluster’s state is consistent and reliable, even as brokers fail or network partitions occur. It provides a centralized service for configuration management and coordination.

In newer Kafka versions, the cluster has transitioned to a self-managed metadata quorum, known as KRaft, which eliminates the dependency on ZooKeeper, simplifying operations and speeding up metadata propagation and controller failover. Regardless of the method, managing metadata is vital for Kafka’s distributed consistency.

High Throughput and Low Latency Through Distributed Design

Kafka’s distributed architecture supports very high throughput and low-latency message handling. By spreading partitions across multiple brokers and allowing parallel reads and writes, Kafka can ingest and deliver millions of messages per second.

The design minimizes disk I/O by using sequential writes to append-only logs and efficiently uses network resources by batching messages between producers, brokers, and consumers.

Additionally, Kafka supports configurable acknowledgment policies that balance latency and durability. Producers can choose to receive acknowledgments only from partition leaders or from all replicas, influencing write latency and fault tolerance.

Fault Tolerance: Handling Failures Gracefully

In any distributed system, failures are inevitable. Kafka is built to handle failures gracefully without losing data or causing downtime.

When a broker fails, Kafka’s controller detects the failure through heartbeat monitoring. Partitions hosted on the failed broker trigger a leader election among the in-sync replicas. This failover mechanism quickly restores the availability of data partitions.

Consumer groups also benefit from fault tolerance. If a consumer within a group fails, the remaining consumers rebalance partition assignments so processing continues; with careful offset management, messages are neither lost nor needlessly reprocessed.

Kafka’s design supports recovery from transient network issues, disk failures, and process crashes, making it highly resilient in production environments.

Scalability: Growing with Demand

Kafka’s architecture supports seamless scaling. To increase capacity, organizations add more brokers to the cluster and then reassign partitions to them using Kafka’s partition reassignment tooling, balancing load and storage across the expanded cluster and raising overall throughput.

Scaling consumers is equally straightforward. Consumer groups allow multiple consumers to share the workload of processing topic partitions, enabling near-linear scaling of consumption.

Kafka’s partition-based approach means scaling is granular and controlled. Administrators can increase the number of partitions for a topic to match expected growth in message volume or consumer demand.
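
Adding partitions to an existing topic is an online operation. Below is a sketch using the Java AdminClient; the topic name and broker address are assumptions. Note that partition counts can only grow, and that increasing them changes which partition a given key hashes to:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitions;
import java.util.Map;
import java.util.Properties;

public class ExpandTopicPartitions {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        try (AdminClient admin = AdminClient.create(props)) {
            // Grow the hypothetical "orders" topic to 12 partitions.
            // Partition counts can only be increased, never decreased.
            admin.createPartitions(Map.of("orders", NewPartitions.increaseTo(12)))
                 .all().get();
            System.out.println("Partition count increased to 12");
        }
    }
}
```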

Balancing Consistency and Availability

Kafka’s distributed design navigates the trade-offs described by the CAP theorem: it provides strong, strictly ordered consistency within each partition, while replication and configurable acknowledgment settings let operators tune the balance between availability and durability during failures.

Data is consistent within each partition because messages are strictly ordered by offsets. However, across partitions, ordering is not guaranteed, which is a trade-off for scalability and availability.

Replication ensures that data remains available even if some brokers fail, but there is a small window during failover when new leaders may not be fully up to date. Kafka mitigates this by using in-sync replica sets and acknowledgment policies to control data durability.

Monitoring and Managing a Kafka Cluster

Running a distributed Kafka cluster requires careful monitoring and management. Kafka provides extensive metrics on broker health, topic throughput, consumer lag, and replication status.

Tools and dashboards track key indicators like request rates, latency, partition distribution, and under-replicated partitions. Monitoring helps detect imbalances, resource bottlenecks, and potential failures early.

Automated cluster management tools assist with partition reassignment, topic configuration, and broker maintenance, enabling reliable operation at scale.

Real-World Impact of Kafka’s Distributed Architecture

Kafka’s distributed architecture makes it suitable for mission-critical applications that require real-time data streaming with high availability and durability. Use cases range from fraud detection in banking, log aggregation for IT operations, to powering large-scale event-driven microservices.

Enterprises run Kafka deployments with dozens or hundreds of brokers per cluster, collectively managing millions of partitions and handling trillions of messages per day. The architecture supports their needs for scalability, fault tolerance, and continuous data availability.

Message Storage and Retention

Kafka stores messages on disk in a log format, where new messages are appended to the end of the log. Unlike many messaging systems that delete messages once consumed, Kafka retains messages for a configurable retention period or until a specified storage size is reached.

This approach allows consumers to read messages at any point within the retention window and to reprocess data if needed. It also supports multiple consumers with different processing speeds without impacting the producer’s ability to continue publishing data.

Kafka’s storage mechanism is optimized for sequential disk access, which improves performance and reduces latency. The combination of durable storage and retention policies makes Kafka suitable for both streaming and batch processing use cases.

Kafka Brokers: The Servers Behind the System

Kafka brokers are the fundamental servers that make up the Kafka cluster. Each broker is a separate Kafka server responsible for storing data and serving client requests. When a Kafka cluster is deployed, it typically consists of multiple brokers working together to provide a scalable and fault-tolerant system.

Each broker manages a subset of partitions and topics. Brokers handle incoming messages from producers and write them to disk, as well as serve consumers by responding to read requests. The distribution of partitions among brokers ensures that the workload is spread evenly across the cluster, which helps in balancing performance and resource utilization.

Brokers also participate in coordination tasks, such as partition leader elections (orchestrated by the controller broker) and maintaining metadata about the cluster state. This coordination is essential to ensure consistency and high availability across the cluster.

Partitioning: Organizing Data for Scalability

Kafka topics are divided into partitions, which are fundamental to Kafka’s scalability and parallel processing capabilities. Each partition is an ordered, immutable sequence of messages, and partitions are distributed across brokers.

Partitions allow Kafka to scale horizontally because multiple brokers can host different partitions of the same topic. This means producers and consumers can work with partitions in parallel, significantly improving throughput.

Because each partition has a single leader broker that handles its writes, Kafka can guarantee message ordering within that partition. This ordering is crucial for applications where the sequence of events matters.

The number of partitions for a topic can be configured, allowing administrators to balance between parallelism and resource consumption. More partitions typically improve scalability but may increase overhead.

Replication and Fault Tolerance

To achieve fault tolerance, Kafka replicates each partition across multiple brokers. Each partition has one leader broker and one or more follower brokers. The leader handles all read and write requests, while followers replicate the leader’s data to maintain an up-to-date copy.

Replication ensures that if a broker hosting a leader partition fails, one of the followers can be promoted to leader without data loss. This failover process is managed automatically by Kafka’s controller, a special broker that oversees cluster management.

Replication also helps improve read throughput, as some consumers can fetch data from follower replicas in certain configurations, though typically reads are served from leaders to maintain consistency.

The replication factor, or the number of copies of each partition, is configurable. Higher replication factors provide greater durability but require more storage and network resources.

Kafka’s Storage Model: Logs and Segments

Kafka stores messages in a log-structured format. Each partition corresponds to a log, which is an append-only sequence of records stored on disk. This design enables Kafka to efficiently write data sequentially, minimizing disk seek times and maximizing throughput.

Logs are divided into segments, which are smaller files that help manage storage and retention. When a segment reaches a configured size or age, Kafka closes it and starts a new one. This segmentation facilitates efficient data cleanup and recovery.

Because Kafka retains data for a configurable retention period, logs are not immediately deleted after consumption. Instead, Kafka deletes log segments based on retention policies, such as time-based or size-based limits.

This storage model supports various use cases, including replaying events for debugging, reprocessing data streams, and supporting multiple consumers with different consumption speeds.

Metadata Management and Cluster Coordination

Kafka brokers maintain metadata about the cluster, including information about topics, partitions, leaders, and consumer group offsets. This metadata is crucial for routing requests correctly and maintaining cluster health.

The Kafka controller broker plays a key role in managing this metadata and orchestrating cluster-wide actions such as leader elections and partition assignments.

Kafka uses ZooKeeper or its internal metadata quorum (in newer versions) to store and coordinate metadata. This coordination layer ensures consistency and helps brokers detect failures and recover quickly.

By efficiently managing metadata and coordinating cluster activities, Kafka provides a resilient and scalable infrastructure capable of supporting high-volume, real-time data streams.

Role of Producers in Kafka

Producers are the clients or applications responsible for sending data into Kafka. They generate the messages that form the core of Kafka’s event streaming capabilities. The producer’s job is to publish records to specific Kafka topics, which then become available for consumers to process.

Producers interact with Kafka brokers using the Kafka protocol, establishing connections to one or more brokers depending on the cluster configuration. They typically send data in batches to improve network efficiency and throughput. Batching reduces overhead by grouping multiple messages before transmission.

The producer is also responsible for deciding how messages are distributed within topics. Since topics can be divided into multiple partitions, the producer must select which partition to send each message to. This decision affects message ordering and load balancing across the Kafka cluster.
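
As a concrete illustration, here is a minimal Java producer sketch. The broker address (localhost:9092), topic name (orders), and key (customer-42) are assumptions for the example; the callback simply reports which partition and offset the broker assigned to the record.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

public class OrderEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The key ("customer-42") determines the partition; the callback reports
            // where the broker actually stored the record.
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("orders", "customer-42", "{\"orderId\":1001,\"status\":\"created\"}");
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace();
                } else {
                    System.out.printf("Stored in partition %d at offset %d%n",
                            metadata.partition(), metadata.offset());
                }
            });
            producer.flush(); // ensure the batched record is sent before the producer closes
        }
    }
}
```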

Message Categorization Using Topics

In Kafka, topics serve as the foundational organizational structure for data streams. They act as named channels or categories into which producers send messages and from which consumers read. Understanding how topics function is essential to grasp how Kafka manages and categorizes vast amounts of data in real-time.

The Concept of Topics in Kafka

A topic in Kafka is essentially a logical stream name. It represents a particular category or feed to which data records belong. For example, a company could have separate topics for user activities, application logs, financial transactions, or system monitoring metrics. Each of these topics holds messages related to its specific domain, allowing applications to handle and process data in a more organized way.

Topics are durable and can store data for configurable retention periods. This means that messages remain accessible in a topic long after they have been produced, enabling consumers to read at their own pace, replay messages if needed, and support multiple independent applications consuming the same data stream.

Why Categorize Messages Using Topics?

Categorizing messages into topics brings clarity and separation to data flows. Without topics, all messages would be mixed in a single stream, making it difficult to distinguish different types of data or apply specialized processing rules.

Topics allow teams to define clear boundaries between different data sources and use cases. For instance, analytics applications may only need to subscribe to topics related to sales or marketing data, while operational monitoring tools consume messages from system logs topics. This segmentation improves maintainability and reduces complexity in data pipelines.

Moreover, topics provide a mechanism to control access and security. Administrators can set permissions at the topic level, granting or restricting read and write access based on organizational policies. This enables Kafka to serve as a secure, multi-tenant event streaming platform.

Structure and Characteristics of Topics

A topic is not just a simple collection of messages; it has an internal structure designed for performance and scalability. Topics are divided into partitions, which are ordered sequences of records. Each partition is led by a single broker in the Kafka cluster, with replica copies held on other brokers when replication is enabled.

The number of partitions determines the parallelism available for processing a topic’s messages. More partitions mean more consumers can read from the topic concurrently, improving throughput. However, this also increases resource consumption and management complexity.

Partitions within a topic maintain message order only at the partition level, not across the entire topic. This means that messages within a partition are strictly ordered by offset, but Kafka does not guarantee ordering between different partitions. Understanding this behavior is critical when designing applications that rely on message order.

Topics and Data Retention

Kafka topics have configurable retention policies that define how long messages are stored. This retention can be based on time (e.g., retain messages for seven days) or on the size of data (e.g., retain up to 100 GB of messages).

Retention policies allow Kafka to balance between resource usage and data availability. Even after consumers have processed messages, Kafka can keep them for further use cases such as reprocessing, debugging, or feeding additional downstream applications.

Kafka’s ability to retain data independently from consumption status differentiates it from traditional messaging systems that delete messages immediately once consumed. This feature supports multiple consumers with varying processing speeds and enables use cases requiring event replay.
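
Retention is configured per topic. As a hedged sketch using the Java AdminClient (the topic name orders and the broker address are assumptions), the following sets both a time limit and a per-partition size limit; whichever is reached first triggers deletion of old log segments:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class SetTopicRetention {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "orders");
            List<AlterConfigOp> ops = List.of(
                    // keep messages for 7 days...
                    new AlterConfigOp(new ConfigEntry("retention.ms", "604800000"), AlterConfigOp.OpType.SET),
                    // ...or until a partition holds about 100 GB, whichever limit is hit first
                    new AlterConfigOp(new ConfigEntry("retention.bytes", "107374182400"), AlterConfigOp.OpType.SET));
            admin.incrementalAlterConfigs(Map.of(topic, ops)).all().get();
        }
    }
}
```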

Creating and Managing Topics

Topics can be created manually by administrators or automatically by Kafka when a producer or consumer attempts to use a topic that does not yet exist. Manual topic creation allows for precise control over configuration settings such as the number of partitions, replication factor, and retention policies.

Proper topic configuration is essential for optimizing Kafka performance. For example, a topic expected to handle high message volumes should have more partitions to enable parallel processing. Similarly, replication factors ensure fault tolerance but consume more cluster resources.

Kafka provides tools and APIs to manage topics dynamically, allowing administrators to modify configurations, add partitions, or delete topics as system requirements evolve.
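
For example, a topic can be created programmatically with an explicit partition count, replication factor, and configuration overrides. This is a sketch using the Java AdminClient; the topic name, sizing, and broker address are illustrative assumptions rather than recommendations:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class CreateOrdersTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions for parallelism, replication factor 3 for fault tolerance
            NewTopic orders = new NewTopic("orders", 6, (short) 3)
                    .configs(Map.of("min.insync.replicas", "2")); // replicas required for acks=all writes
            admin.createTopics(List.of(orders)).all().get();
            System.out.println("Topic created");
        }
    }
}
```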

Topic Naming Conventions and Best Practices

Choosing clear and consistent topic names is crucial for managing Kafka at scale. Names should be descriptive and follow organizational standards to help developers and operators quickly understand the data contained within a topic.

A common approach is to use a hierarchical naming convention that reflects the data domain and type, such as sales.transactions, user.activity.login, or system.metrics.cpu. This structure aids in filtering and organizing topics, especially in large deployments.

Additionally, naming conventions often include environment indicators like dev, test, or prod to distinguish between topics used in different stages of development and deployment.

Interaction Between Producers, Topics, and Partitions

When a producer sends a message, it specifies the topic to which the message belongs. The producer may also provide a key that Kafka uses to determine the partition within the topic where the message should be stored. This key-based partitioning ensures that messages with the same key always go to the same partition, preserving order for related events.

For example, in an e-commerce system, all orders for a particular customer might share the same key, guaranteeing that these order events are processed in sequence by consumers.

If no key is provided, Kafka distributes messages across partitions in a round-robin fashion, balancing load evenly. This flexibility allows producers to optimize for either ordering or throughput, depending on their use case.

Topics and Multi-Tenancy

Kafka’s topic model supports multi-tenancy, meaning multiple teams or applications can share the same Kafka cluster without interfering with each other’s data streams. Topics provide the logical separation needed to isolate data access and processing.

Each tenant can own specific topics, with permissions enforced at the broker level. This separation facilitates resource sharing while maintaining data security and governance.

Kafka clusters can scale to handle thousands of topics, making them suitable for large organizations with diverse data streaming needs.

Challenges and Considerations in Topic Management

While topics provide many advantages, managing them effectively requires attention to several challenges:

  • Partition Count Decisions: Selecting the right number of partitions is crucial. Too few partitions limit scalability and throughput, while too many increase overhead and complexity.
  • Retention and Storage Costs: Longer retention policies increase storage requirements and may impact performance. Organizations need to balance data availability with resource costs.
  • Topic Explosion: In large environments, the number of topics can grow rapidly, making monitoring and management more complex. Naming conventions and lifecycle policies help mitigate this.
  • Data Schema Evolution: As data structures evolve, maintaining compatibility across producers and consumers is essential. Kafka often integrates with schema registries to manage schema versions and ensure data integrity.

Kafka Topics in Real-World Applications

In practice, Kafka topics enable various architectures and applications. For instance:

  • Event Sourcing: Each domain event is published to a topic, creating an immutable record of all changes. Consumers reconstruct the application state by processing these events.
  • Stream Processing: Applications consume topics to perform transformations, aggregations, and analytics in real-time, producing results to other topics.
  • Data Integration: Kafka acts as a central hub where data from multiple sources is categorized into topics and then routed to databases, data warehouses, or other systems.

Through topics, Kafka facilitates decoupling between data producers and consumers, enabling highly flexible and scalable data ecosystems.

Partitioning and Message Routing

Partitions enable Kafka to scale by dividing topics into multiple ordered sequences of messages. Each partition has a single leader broker at any given time, and the producer determines which partition a message belongs to.

The partitioning decision is often based on a message key. When a key is provided, Kafka’s default partitioner uses a hashing function to map the key to a specific partition. This approach ensures that all messages with the same key go to the same partition, preserving their order.

If no key is provided, Kafka distributes messages across partitions in a round-robin fashion to balance load evenly. This flexibility allows producers to control message ordering and distribution depending on application needs.
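
The routing logic can be pictured as "hash the key, then take it modulo the number of partitions." The sketch below illustrates only that idea; it is not Kafka's actual implementation, which hashes the serialized key bytes with murmur2:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PartitionRoutingSketch {
    // Illustrative only: Kafka's default partitioner hashes the serialized key with
    // murmur2 and takes the result modulo the partition count. Here a JDK array hash
    // stands in, just to show the "same key -> same partition" behavior.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);
        return (hash & 0x7fffffff) % numPartitions; // mask keeps the result non-negative
    }

    public static void main(String[] args) {
        int partitions = 6;
        for (String key : new String[] {"customer-42", "customer-42", "customer-7"}) {
            System.out.printf("key=%s -> partition %d%n", key, partitionFor(key, partitions));
        }
    }
}
```

Running this prints the same partition twice for customer-42, which is exactly the property that preserves per-key ordering.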

Ensuring Reliable Message Delivery

Kafka offers several mechanisms to guarantee the reliable delivery of messages from producers to brokers. Producers can configure acknowledgment settings that control when a message is considered successfully sent.

For instance, a producer can request no acknowledgments, acknowledgment from the leader broker only, or acknowledgment from all in-sync replicas. The last option ensures the strongest durability but adds some latency to each write.

When failures or network issues occur, producers retry sending messages automatically. Combined with the idempotent producer feature, these retries cannot introduce duplicates, giving exactly-once delivery semantics to a partition within a producer session.
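
In the Java client, these guarantees map to a handful of producer settings. The lines below are a sketch of durability-oriented options that could be added to the Properties object from the earlier producer example; the values are illustrative:

```java
// Durability-oriented settings (added to the producer Properties shown earlier)
props.put(ProducerConfig.ACKS_CONFIG, "all");                   // wait for all in-sync replicas
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");    // retries cannot create duplicates
props.put(ProducerConfig.RETRIES_CONFIG, Integer.toString(Integer.MAX_VALUE)); // retry transient failures
props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, "120000"); // give up on a record after 2 minutes
```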

Optimizations: Compression and Batching

To maximize performance, Kafka producers support compression and batching. Compression reduces the size of messages sent over the network and stored on disk, which saves bandwidth and storage space.

Kafka supports several compression algorithms, such as gzip, Snappy, and LZ4. Producers can choose the compression method that best suits their performance and resource trade-offs.

Batching groups multiple messages into a single request, reducing network round-trip overhead. Producers can tune batch sizes and linger times to balance latency and throughput according to their application’s needs.
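
These optimizations are likewise plain producer settings. A sketch of throughput-oriented options, again extending the earlier Properties object, with illustrative values:

```java
// Throughput-oriented settings (also added to the producer Properties shown earlier)
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");  // or "gzip", "snappy", "zstd"
props.put(ProducerConfig.BATCH_SIZE_CONFIG, "65536");      // up to 64 KB of records per batch
props.put(ProducerConfig.LINGER_MS_CONFIG, "10");          // wait up to 10 ms to fill a batch
```

Larger batches and longer linger times raise throughput at the cost of a small delay before each send.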

Understanding Kafka Consumers

Consumers are applications or services that subscribe to Kafka topics to retrieve and process messages. They represent the end users of the data streams produced and stored within Kafka. Consumers can be simple programs that process data in real-time or more complex systems that analyze, transform, or route the incoming data.

Consumers connect to Kafka brokers to fetch messages from one or more partitions of a topic. Kafka supports multiple consumers reading from the same topic concurrently, enabling scalable and flexible data processing pipelines.

Consumer Groups and Load Balancing

To scale message consumption and ensure efficient processing, Kafka uses the concept of consumer groups. A consumer group is a set of consumers that work together to consume messages from a topic. Within a group, each partition’s messages are delivered to only one consumer, preventing duplicate processing of the same message.

This model allows Kafka to distribute workload evenly among consumers, enabling horizontal scaling. If a new consumer joins the group, Kafka rebalances partition assignments so the workload is shared. Similarly, if a consumer leaves or fails, the partitions it was responsible for are reassigned to the remaining consumers.

Consumer groups are fundamental for fault tolerance and parallel processing in Kafka, allowing applications to process data streams efficiently without losing messages or duplicating efforts.
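
A minimal Java consumer sketch makes the group mechanics concrete: every instance started with the same (assumed) group.id of order-processors shares the partitions of the hypothetical orders topic, and rebalancing happens automatically as instances come and go. The broker address is also an assumption:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class OrderEventConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-processors");         // consumers sharing this id split the partitions
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                // poll() pulls the next batch; partition assignment and rebalancing
                // happen transparently inside the poll loop
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                            record.partition(), record.offset(), record.key(), record.value());
                }
            }
        }
    }
}
```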

Offset Management: Tracking Progress in Kafka

Kafka consumers keep track of their position within each partition using offsets. An offset is a unique identifier for each message within a partition and represents the consumer’s progress in processing data.

Consumers commit their offsets periodically, either automatically or manually. Committing an offset signals to Kafka that the consumer has successfully processed messages up to that point, allowing the consumer to resume from the last committed offset after a restart or failure.

Offsets can be stored within Kafka itself, in the internal __consumer_offsets topic, or externally, depending on the application’s design. Proper offset management is critical for ensuring that messages are neither lost nor processed more than once.
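
For finer control, offsets can be committed manually. The fragment below assumes the consumer from the previous sketch with enable.auto.commit set to false; process() stands in for hypothetical application logic. Committing after every record is shown only for clarity, since real applications usually commit less often for performance:

```java
// Fragment of the poll loop above, with enable.auto.commit=false on the consumer.
// Requires org.apache.kafka.common.TopicPartition and
// org.apache.kafka.clients.consumer.OffsetAndMetadata.
for (ConsumerRecord<String, String> record : records) {
    process(record); // hypothetical application logic
    // Commit the *next* offset to read, per partition, only after processing succeeds
    consumer.commitSync(Map.of(
            new TopicPartition(record.topic(), record.partition()),
            new OffsetAndMetadata(record.offset() + 1)));
}
```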

Pull-Based Message Consumption

Kafka uses a pull model for delivering messages to consumers. Unlike push-based systems, where the server sends data as soon as it is available, Kafka consumers request batches of messages from brokers when they are ready to process them.

This pull-based approach gives consumers control over the rate of message consumption, allowing them to handle backpressure effectively. If a consumer is slow or busy, it can delay fetching new messages, preventing overload.

Consumers fetch messages in batches, which improves throughput by reducing the number of network requests. Kafka also allows consumers to specify parameters such as batch size and fetch timeouts to fine-tune performance.

Exactly-Once Processing and Transactions

Kafka supports exactly-once processing semantics when used with Kafka Streams or with transactional producers and consumers. This capability ensures that each message is processed once and only once, even in the presence of failures or retries.

Transactional messaging in Kafka involves grouping multiple writes and reads into atomic operations. For example, a producer can write messages to multiple topics within a single transaction, guaranteeing that all writes either succeed or fail together.

Consumers can participate in transactions to commit offsets only if the processing of messages is successful. This coordination prevents data loss and duplication, which is critical for financial applications, event sourcing, and other sensitive use cases.
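
A transactional producer sketch in Java shows the atomic write pattern described above. The transactional.id, the topic names (payments, audit-log), and the broker address are assumptions, and the error handling is deliberately simplified:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

public class TransactionalWriter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "payments-writer-1"); // enables transactions (and idempotence)

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            try {
                producer.beginTransaction();
                // Both writes become visible together, or not at all
                producer.send(new ProducerRecord<>("payments", "customer-42", "{\"amount\":99.95}"));
                producer.send(new ProducerRecord<>("audit-log", "customer-42", "payment recorded"));
                producer.commitTransaction();
            } catch (KafkaException e) {
                // Simplified: fatal errors such as a fenced producer should close the producer instead
                producer.abortTransaction();
                throw e;
            }
        }
    }
}
```

Consumers that set isolation.level to read_committed will only ever see the two records after the transaction commits.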

Consumer Configurations and Optimization

Kafka consumers provide various configuration options to optimize consumption based on the application’s needs. These options include settings for fetch sizes, timeouts, retry policies, and deserialization.

Consumers can also implement custom logic for error handling, message filtering, and processing guarantees. This flexibility allows Kafka to support diverse workloads, from high-throughput event processing to complex stream transformations.

Through this rich set of features and a robust consumption model, Kafka enables efficient and reliable processing of real-time data streams at scale.

Final Thoughts 

Kafka is a groundbreaking platform that has transformed how organizations handle real-time data streaming and processing. Its distributed, scalable, and fault-tolerant architecture allows it to manage massive volumes of data with high throughput and low latency, making it suitable for modern data-driven applications.

By breaking down data streams into topics and partitions, and by leveraging brokers to distribute and replicate data, Kafka achieves a balance between performance and reliability. Producers and consumers play distinct roles that enable flexible, efficient data ingestion and processing workflows, while Kafka’s protocol ensures smooth communication between all components.

Kafka’s design to retain messages for configurable periods rather than deleting them immediately after consumption opens up possibilities for complex use cases like event replay, auditing, and multiple parallel consumers operating independently.

The concept of consumer groups, offset management, and transactional processing ensures that Kafka can provide exactly-once semantics and strong guarantees, which are essential for critical business applications.

In summary, Kafka is more than just a messaging system; it is a powerful streaming platform that supports real-time analytics, event sourcing, and seamless data integration across diverse systems. Its robust architecture and flexible features make it a key component in the infrastructure of many modern enterprises aiming to harness the power of continuous data streams.