#008 Demystify Streaming vs Messaging (Kafka vs RabbitMQ)
🎬 Movie Analogy: Baahubali
RabbitMQ = King’s Messenger 📨
In Baahubali, when the King wants to deliver an important message (say, orders to a commander), he sends a single messenger.
The messenger runs, hands over the message, and once delivered — his job is done.
That’s RabbitMQ: one message, one delivery, task completed.
Kafka = Katappa during battle ⚔️
In war scenes, Katappa is constantly shouting live updates — “Arrows from the left!”, “Elephants approaching!”, “Shields up!”
Everyone — Baahubali, soldiers, commanders — hears the same information at the same time.
Even if someone missed it live, the battlefield log (like war records) can replay what was said.
That’s Kafka: continuous event streaming, replayable, shared by many at once.
First, we'll explain two components that exist only in Kafka (the Schema Registry and offsets), then move through architecture, business fit, and code for the key components.
1) Schema Registry & Offsets (first)
Kafka
Schema Registry (Confluent / open-source equivalents)
Manages Avro/Protobuf/JSON Schema with compatibility rules (BACKWARD/FORWARD/FULL).
Each record carries a schema ID; consumers fetch/deserialize safely as schemas evolve.
Enables independent deploys and strong data contracts between teams.
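For example, compatibility can be set and checked per subject through the Schema Registry's REST API. A minimal sketch, assuming a registry at localhost:8081 and a subject named orders.v1-value (both illustrative):
# pip install requests
# Hypothetical sketch: set and verify compatibility via the Schema Registry REST API
import json, requests
REGISTRY = "http://localhost:8081"
SUBJECT = "orders.v1-value"                         # assumed subject name
HEADERS = {"Content-Type": "application/vnd.schemaregistry.v1+json"}
# Enforce BACKWARD compatibility for this subject
requests.put(f"{REGISTRY}/config/{SUBJECT}", headers=HEADERS,
             json={"compatibility": "BACKWARD"})
# Test whether a candidate schema is compatible with the latest registered version
candidate = {"type": "record", "name": "Order",
             "fields": [{"name": "order_id", "type": "string"},
                        {"name": "amount", "type": "double"},
                        {"name": "currency", "type": "string", "default": "USD"}]}
resp = requests.post(f"{REGISTRY}/compatibility/subjects/{SUBJECT}/versions/latest",
                     headers=HEADERS, json={"schema": json.dumps(candidate)})
print(resp.json())  # e.g. {"is_compatible": true}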
Offsets (per topic-partition, per consumer group)
Consumers pull and track offsets (stored in __consumer_offsets or externally).
Can seek to an exact offset or timestamp → backfill, point-in-time reprocessing, audits.
Powers replayability and exactly-once (with idempotent producers & transactions).
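To make the exactly-once point concrete, here is a minimal sketch of a transactional producer in confluent-kafka (the transactional.id and topic values are assumptions): combined with idempotence, a batch of records either commits atomically or is aborted.
# Minimal transactional-producer sketch (transactional.id and topic are illustrative)
from confluent_kafka import Producer
p = Producer({
    "bootstrap.servers": "localhost:9092",
    "enable.idempotence": True,             # no duplicates from retries
    "transactional.id": "orders-writer-1"   # identifies this producer across restarts
})
p.init_transactions()        # fences any zombie producer with the same transactional.id
p.begin_transaction()
try:
    for i in range(3):
        p.produce("orders.v1", key=str(i), value=f'{{"order_id": "{i}"}}')
    p.commit_transaction()   # all records become visible atomically to read_committed consumers
except Exception:
    p.abort_transaction()    # none of the records are exposed to read_committed consumers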
RabbitMQ
Schema: No built-in registry. You carry the schema as app-level contracts (e.g., a schemaVersion header + tolerant readers).
Replay/Offsets: Push + ack model; a message is removed on ack, so there are no “offsets” to seek. For replay, use dead-letter exchanges (DLX) or keep a persistent copy (e.g., S3/DB) and re-enqueue.
2) Architecture (side-by-side)
RabbitMQ (AMQP broker)
Producer → Exchange (direct/topic/fanout/headers) → Binding → Queue → Consumer(s).
Push delivery; manual ack deletes message.
Routing-first (rich patterns via exchange type + routing keys).
Ordering: per queue but competing consumers can change perceived order.
HA/Scale: clustered brokers, mirrored/quorum queues; scale via more queues/consumers.
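As a small sketch of the HA side, a replicated quorum queue is just a regular declaration with one extra argument (queue name is illustrative):
# Hypothetical sketch: declare a Raft-replicated quorum queue for HA
import pika
conn = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
ch = conn.channel()
ch.queue_declare(queue='orders.q', durable=True,
                 arguments={"x-queue-type": "quorum"})  # replicated across cluster nodes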
Kafka (distributed commit log / streaming)
Producer → Topic → Partitions (leader + replicas) → Consumer groups reading by offset.
Pull model; ordering within a partition; replay by resetting/setting offsets.
Scale via partitions (parallelism) and brokers; replication for HA; KRaft for metadata.
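A short sketch of how partition count and replication factor are fixed at topic creation (topic name and counts are assumptions):
# Hypothetical sketch: create a topic with explicit partition/replication counts
from confluent_kafka.admin import AdminClient, NewTopic
admin = AdminClient({"bootstrap.servers": "localhost:9092"})
futures = admin.create_topics([
    NewTopic("orders.v1", num_partitions=6, replication_factor=3)  # 6-way parallelism, HA via 3 replicas
])
for topic, fut in futures.items():
    fut.result()  # raises if creation failed (e.g., topic already exists)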
3) Business problem fit
Transactional task queues / request–reply / per-message retries
Examples: email/PDF jobs, webhook fan-out with guarantees.
✅ RabbitMQ: push, prefetch (back-pressure), DLX, routing patterns.
Event streaming / analytics / CDC / audit / many readers
Examples: clickstream, IoT/vitals, change-data-capture, ML features.
✅ Kafka: partitions for throughput, replay & time-travel, multi-consumer groups.
Schema governance across teams
✅ Kafka + Schema Registry (safe evolution at scale).
Strict “exactly once” end-to-end
✅ Kafka (idempotent producer + transactions + Streams EOS).
➖ RabbitMQ requires app-level idempotency.
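A minimal sketch of what that app-level idempotency can look like on the consumer side (the in-memory dedupe store is an assumption; Redis or a unique-constraint DB table is typical in production):
# Hypothetical sketch: consumer-side dedupe keyed by a message id
import json
processed_ids = set()  # in production: Redis SETNX or a unique-constraint DB table
def handle_once(ch_, method, props, body):
    msg_id = props.message_id or method.delivery_tag  # producer should set message_id
    if msg_id in processed_ids:
        ch_.basic_ack(delivery_tag=method.delivery_tag)  # duplicate: ack and skip
        return
    # ...process json.loads(body) exactly once...
    processed_ids.add(msg_id)
    ch_.basic_ack(delivery_tag=method.delivery_tag)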
4) Code covering key architectural components
Kafka — Producer with Schema Registry (Avro) + idempotence
# pip install confluent-kafka confluent-kafka[avro]
from confluent_kafka import SerializingProducer
from confluent_kafka.serialization import StringSerializer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
import socket
sr = SchemaRegistryClient({"url": "http://localhost:8081"})
order_schema = """
{
  "type": "record", "name": "Order",
  "fields": [
    {"name": "order_id", "type": "string"},
    {"name": "amount",   "type": "double"},
    {"name": "currency", "type": "string", "default": "USD"}
  ]
}
"""
value_ser = AvroSerializer(sr, order_schema)
producer = SerializingProducer({
    "bootstrap.servers": "localhost:9092",
    "client.id": socket.gethostname(),
    "key.serializer": StringSerializer("utf_8"),
    "value.serializer": value_ser,     # serialize/register against Schema Registry
    "enable.idempotence": True,        # EOS building block
    "acks": "all",
    "linger.ms": 10,                   # batching
    "compression.type": "lz4"
})
topic = "orders.v1"
for i in range(1000, 1005):
    key = str(i % 3)  # key → partition (ordering within a partition)
    producer.produce(topic=topic, key=key,
                     value={"order_id": str(i), "amount": 49.5, "currency": "USD"})
producer.flush()
Kafka — Consumer with manual commits + offset replay by timestamp
# pip install confluent-kafka confluent-kafka[avro]
from confluent_kafka import DeserializingConsumer
from confluent_kafka.serialization import StringDeserializer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroDeserializer
from datetime import datetime, timedelta
sr = SchemaRegistryClient({"url": "http://localhost:8081"})
value_deser = AvroDeserializer(sr)  # schema resolved by the message's schema id
c = DeserializingConsumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "orders-analytics",
    "key.deserializer": StringDeserializer("utf_8"),
    "value.deserializer": value_deser,   # deserialize via Schema Registry
    "auto.offset.reset": "earliest",
    "enable.auto.commit": False
})
topic = "orders.v1"
c.subscribe([topic])
# --- Seek to a timestamp for backfill (last 24h) ---
while not c.assignment():    # poll until partitions are assigned
    c.poll(1.0)
assignments = c.assignment()
cutoff_ms = int((datetime.utcnow() - timedelta(hours=24)).timestamp() * 1000)
for tp in assignments:
    tp.offset = cutoff_ms    # offsets_for_times() reads the timestamp from .offset
offsets = c.offsets_for_times(assignments, timeout=5.0)
for tp in offsets:
    if tp.offset >= 0:       # -1 → no message at/after that timestamp
        c.seek(tp)
try:
    while True:
        msg = c.poll(1.0)
        if msg is None or msg.error():
            continue
        record = msg.value()  # Avro → dict
        # ...process record...
        c.commit(msg)         # manual commit, after successful processing
finally:
    c.close()
Kafka Streams — stateful aggregation with EOS (topology backbone)
// Gradle: implementation "org.apache.kafka:kafka-streams:3.7.0"
StreamsBuilder b = new StreamsBuilder();
KStream<String, String> orders = b.stream("orders.v1");
// assume value contains amount field, parsed by your serde
orders
.mapValues(v -> parseAmount(v)) // extract numeric amount
.groupByKey() // key = productId/orderKey
.reduce((a, x) -> a + x) // running total (state store)
.toStream()
.to("revenue.v1");
KafkaStreams app = new KafkaStreams(b.build(), props); // props include EOS configs
app.start();
Key Kafka hooks you’re demonstrating: schema governance, key→partition ordering, offset control/replay, manual commits, Streams state.
RabbitMQ — Producer/Consumer (topic exchange, versioned schema header) + DLX
Producer (pika) with “schemaVersion” header
# pip install pika
import pika, json
conn = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
ch = conn.channel()
# Main exchange + queue; failures are dead-lettered to the 'dlx' exchange
ch.exchange_declare(exchange='events', exchange_type='topic', durable=True)
ch.queue_declare(queue='email.q', durable=True,
                 arguments={"x-dead-letter-exchange": "dlx"})  # DLX for failures
ch.queue_bind(queue='email.q', exchange='events', routing_key='notify.email')
# Declare the DLX and a dead-letter queue so rejected messages are kept for replay
# (exchange/queue names are illustrative)
ch.exchange_declare(exchange='dlx', exchange_type='fanout', durable=True)
ch.queue_declare(queue='email.dlq', durable=True)
ch.queue_bind(queue='email.dlq', exchange='dlx')
payload = {"userId": 42, "subject": "Hello", "body": "Welcome!"}
props = pika.BasicProperties(
    content_type="application/json",
    delivery_mode=2,                   # persistent
    headers={"schemaVersion": "2.1"}   # app-level schema contract
)
ch.basic_publish(exchange='events', routing_key='notify.email',
                 body=json.dumps(payload), properties=props)
conn.close()
Consumer with prefetch, manual ack, and DLX on failure
import pika, json
conn = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
ch = conn.channel()
ch.basic_qos(prefetch_count=20)  # back-pressure
def handle(ch_, method, props, body):
    try:
        version = (props.headers or {}).get("schemaVersion", "1.0")
        data = json.loads(body)
        # branch by version or use tolerant parser...
        # process...
        ch_.basic_ack(delivery_tag=method.delivery_tag)
    except Exception:
        # reject & dead-letter for later replay/inspection
        ch_.basic_nack(delivery_tag=method.delivery_tag, requeue=False)
ch.basic_consume(queue='email.q', on_message_callback=handle, auto_ack=False)
ch.start_consuming()
Replay pattern (RabbitMQ)
Re-publish from the dead-letter queue back to the main exchange after fixing the handler, or write a one-off “replayer” script (see the sketch below).
Or maintain a persistent copy of messages (e.g., S3) to regenerate queues.
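A minimal one-off replayer sketch, assuming the email.dlq queue bound to the dlx exchange declared in the producer above (names are illustrative): it drains the DLQ and re-publishes each message to the original exchange and routing key.
# Hypothetical replayer: drain the DLQ and re-publish to the main exchange
import pika
conn = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
ch = conn.channel()
while True:
    method, props, body = ch.basic_get(queue='email.dlq', auto_ack=False)
    if method is None:  # DLQ drained
        break
    ch.basic_publish(exchange='events', routing_key='notify.email',
                     body=body, properties=props)   # re-enqueue for the fixed handler
    ch.basic_ack(delivery_tag=method.delivery_tag)  # remove from DLQ only after re-publish
conn.close()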
Key RabbitMQ hooks you’re demonstrating: exchange/queue/binding, push + prefetch, ACK/DLX, app-level schema versioning.
5) Quick design checklist (you can say this out loud)
Kafka: “We enforce schemas via Schema Registry, choose partitions for parallelism and ordering scope, use idempotent producers and transactions for EOS, and leverage offset seeks for backfills and audits.”
RabbitMQ: “We design exchange types and routing keys, set durable queues/messages, tune prefetch for back-pressure, and use DLX + persistent storage for replays. Schema is handled in headers/versioning at the app layer.”

