Operating Kafka in Rails with Karafka: Production Architecture, Consumers, and DLQs (Part 1)
by Syed Sibtain, System Analyst
A small but important note before we start:
This architecture was built by an incredible team of engineers who deeply understood Kafka, Rails, warehouse operations, and the legacy systems we were integrating with.
I was fortunate to be part of that team — contributing to design discussions, writing some of the consumers, building sync flows, and helping operate it in production.
This blog is my way of documenting and appreciating the architecture that our team built, and sharing the lessons I learned from working on it.
There are some jargon-heavy sections in this post, so I’ve added a vocabulary section at the end to help with the concepts.
Introduction
Every distributed system eventually faces the same pain: messages that retry endlessly, payloads that poison queues, and duplicate writes that quietly corrupt data.
A warehouse management system (WMS) we were building for a client faced similar pressures.
Business data flowed continuously between SAP, the WMS (Rails application), and a legacy Microsoft SQL Server 2009 database. As integrations expanded and throughput increased, the system needed a foundation that could move events reliably while keeping both sides perfectly in sync — even under heavy operational load.
Why We Chose This Architecture
Our team didn't adopt Kafka reactively.
The architecture was designed before development began, because the shape of the problem was already clear:
- High-volume inbound purchase orders and stock transfers
- Real-time Receiving, Quality Control, Putaway, and Armada operations
- Bi-directional sync between SAP and WMS
- Frequent updates to 20+ warehouse tables and other business entities
- Zero tolerance for data loss or out-of-order writes
Our Technology Stack
To solve this, we standardized on the following components:
- Redpanda/Kafka → the backbone for distributed events
- Karafka → Rails-native producers and consumers
- Dead-letter queues (DLQs) → safety net for corrupted messages and retries
- Karafka Web UI → observability, monitoring, and replay during incidents
What This Post Covers
This post isn't a tutorial — it's a field story.
A walkthrough of how Kafka is operated from Rails in real production, how the sync-in and sync-out pipelines were set up, how the legacy adapter integrates into the flow, and how DLQs saved hours during operational incidents.
Note: If you need the basics first, start here: Real-Time Event Streaming in Rails with Apache Kafka and Karafka.
The Real Problem We Were Trying to Solve
Our team didn't adopt Kafka because we wanted "event-driven architecture." We adopted it because the warehouse had no room for delays or retry-loop chaos.
The Complexity of Warehouse Operations
Even a single inbound flow (Purchase Order → Armada Operation → Receive → Quality Control → Putaway → Verify and Close) touches multiple business domains and must be:
- Idempotent — Each step must be safe to repeat without changing the result
- Ordered — Events must be processed in order
- Captured — Failures must be captured without blocking operations
- Conflict-free — Database writes must not conflict or duplicate
Rails background jobs weren't enough.
If SQL Server lagged or SAP sent malformed data, Sidekiq would stall the entire pipeline. We needed a system that:
- Absorbed bursts of traffic without breaking
- Isolated failures so one bad message didn't stop everything
- Retried safely without creating duplicates
- Guaranteed order within a Purchase Order
- Didn't overwhelm SQL Server with sudden spikes
- Let us replay anything we wanted
- Scaled horizontally with multiple consumers
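Ordering deserves a concrete note. Kafka only guarantees order within a partition, so every event for a given Purchase Order has to be produced with the order number as its message key; that way all of its events land on the same partition and are consumed in sequence. A minimal sketch of that keying, with an assumed topic name, payload shape, and variable names rather than our production producer code:

# Same key => same partition => events for one Purchase Order stay in order.
# Topic name, payload fields, and variables are illustrative.
purchase_order_number = "PO-1001"
line_items = [{ sku: "SKU-1", qty: 10 }]

Karafka.producer.produce_async(
  topic: 'purchase_order_docs',
  key: purchase_order_number,
  payload: {
    document_number: purchase_order_number,
    items: line_items
  }.to_json
)

Consumers can then scale horizontally across partitions while each Purchase Order is still processed strictly in order.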
Our Solution
That led us to:
- Kafka/Redpanda for ingestion and buffering
- Karafka for operating the pipeline inside Rails
- DLQs for safety and observability
- Karafka Web UI for debugging in minutes instead of hours
Our Architecture at a High Level

Before building anything, we made a deliberate decision: Rails would be the operational brain, Kafka would be the event backbone, and both SAP and the legacy system would integrate only through Kafka.
To achieve this, we built two independent pipelines:
- Sync-In (External Systems → Rails)
- Sync-Out (Rails → Legacy SQL Server)
Both run through Kafka, both use Karafka, and both solve different business problems.
Sync-In — How Data Enters Rails (And Why It Comes From Two Places)
A lot of people assume Sync-In means "SAP sends us data."
But real-world systems are never that simple.
In our case, we had two independent sources pushing data into Kafka.
A) SAP (IDoc-style structured payloads)
SAP sent large IDoc-style payloads containing three top-level fields: SyncName, DocumentNumber, and SyncData.
From SAP we consumed:
- purchase_order_docs → Purchase orders
- delivery_docs → Delivery documents
- vendor_general → Vendor information
- vendor_address → Vendor locations
These messages followed a strict contract and required dry-validation before processing.
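As a sketch of what that looks like with dry-validation (the rules and sample values below are assumptions, not the production contract), the contract simply rejects anything that doesn't match the IDoc-style shape:

require "dry/validation"

# Illustrative contract for an IDoc-style payload; the field rules are assumed
class PurchaseOrderDocContract < Dry::Validation::Contract
  params do
    required(:SyncName).filled(:string)
    required(:DocumentNumber).filled(:string)
    required(:SyncData).hash
  end
end

payload = {
  "SyncName"       => "PURCHASE_ORDER",
  "DocumentNumber" => "450001234",
  "SyncData"       => { "items" => [] }
}

result = PurchaseOrderDocContract.new.call(payload)
result.success? # => true; malformed payloads fail here and never reach Postgres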
B) Our Metadata Adapter (lean, domain-focused payloads)
The second source wasn't SAP at all—it was our internal metadata adapter, responsible for keeping warehouse reference data up to date.
These events were small, clean hashes that mapped directly to domain models. We needed this metadata for our warehouse operations to function:
- Business units — required during receiving to identify which business unit the goods belonged to
- Units of measure (UOM) — needed to specify the quantity we were receiving (e.g., pallets, cases, pieces)
- Putaway zones — required to determine where received goods should be stored
- Armada types — required when scheduling vehicles for warehouse operations
- LPN prefix codes — needed to generate license plate numbers for tracking pallets and containers
- Lottable validation — required to validate lot numbers during receiving and quality control
- Docks — needed to schedule and manage dock assignments for incoming and outgoing shipments
- Dock operational hours — required to know when docks are available for scheduling
- Warehouses — needed to identify which warehouse facility operations are running in
- Locations — required to track where inventory is stored within the warehouse
- Quality check types — needed to determine what quality control inspections are required for received goods
SAP didn't own this metadata—yet our warehouse operations depended on it.
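For contrast with the IDoc envelopes, a metadata event from the adapter was just a flat hash. The field names below are illustrative, but the shape is representative:

# Illustrative business_unit event from the metadata adapter (field names assumed)
{
  "code"   => "BU-001",
  "name"   => "Central Distribution",
  "active" => true
}

Payloads like this map almost one-to-one onto the corresponding Rails models, which is what keeps the metadata consumers small.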
Why Two Sources?
Because they solved two different problems:
- SAP → provided transactional documents (Purchase Orders, Delivery Documents, Vendor Information)
- Metadata adapter → provided all operational data (Zones, Docks, Locations, Quality Control Types, etc.)
The WMS cannot run Receiving, Quality Control, Armada operations, or Putaway unless this metadata exists. Both sources were essential for our warehouse operations to function.
Sync-In Routing in Karafka
Rails consumes from both sources using Karafka:
class KarafkaApp < Karafka::App
  setup do |config|
    config.kafka = { 'bootstrap.servers': ENV.fetch("KAFKA_SERVERS") }
    config.client_id = "wms_rails_app"
  end

  routes.draw do
    # SAP events (IDoc-style)
    topic :purchase_order_docs do
      consumer PurchaseOrderConsumer
    end

    topic :delivery_docs do
      consumer DeliveryDocConsumer
    end

    topic :vendor_general do
      consumer VendorConsumer
    end

    topic :vendor_address do
      consumer VendorAddressConsumer
    end

    # Metadata events (lean domain payloads)
    topic :business_unit do
      consumer BusinessUnitConsumer
    end

    topic :uom do
      consumer UOMConsumer
    end

    topic :putaway_zone do
      consumer PutawayZoneConsumer
    end

    topic :armada_type do
      consumer ArmadaTypeConsumer
    end

    topic :dock do
      consumer DockConsumer
    end

    topic :location do
      consumer LocationConsumer
    end

    # ...other operational metadata consumers
  end
end
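In production, these routes are served by the standard Karafka consumer process (bundle exec karafka server), and scaling out is a matter of running more of those processes. The same routing DSL can also attach a dead-letter queue to a topic; the DLQ topic name and retry count below are assumptions, but the mechanism looks like this:

# Sketch: declaring a per-topic DLQ in the routes (names and retry count are illustrative)
topic :purchase_order_docs do
  consumer PurchaseOrderConsumer
  dead_letter_queue(topic: 'purchase_order_docs_dlq', max_retries: 3)
end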
Consumer Responsibilities
Each consumer:
- Validates — ensures payload structure matches contract
- Transforms — converts external format to internal domain models
- Writes to Postgres — persists data to Rails database
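As a sketch of that pattern (the contract, attribute names, and model are illustrative rather than the production code), a metadata consumer can stay very small:

# Illustrative metadata consumer: validate, transform, write
class BusinessUnitConsumer < Karafka::BaseConsumer
  def consume
    messages.each do |message|
      payload = message.payload

      # Validate: skip payloads that don't match the expected contract
      # (the production flow would route failures to a DLQ rather than silently skip)
      result = BusinessUnitContract.new.call(payload)
      next unless result.success?

      # Transform: map external fields onto domain attributes
      attributes = { code: payload["code"], name: payload["name"] }

      # Write to Postgres: upsert keyed on the external code,
      # so a redelivered message never creates a duplicate row
      BusinessUnit.find_or_initialize_by(code: attributes[:code]).update!(attributes)

      mark_as_consumed(message)
    end
  end
end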
The Result
Once data lands in Rails, every warehouse operation runs from there:
- Dock scheduling
- Armada operations
- Receiving
- Quality Control
- Putaway
- Adjustments
Rails becomes the operational source of truth—and once Rails owns that truth, the next step is syncing it back into the legacy SQL Server system.
Continue reading: Operating Kafka in Rails with Karafka: Production Architecture, Consumers, and DLQs (Part 2)
Vocabulary
This post uses several technical terms and warehouse-specific jargon. Here's a quick reference:
Event Streaming & Messaging
- Kafka/Redpanda — Distributed event streaming platforms that act as a message broker. They store events in topics and allow multiple consumers to read from them independently.
- Karafka — A Ruby framework that provides Kafka integration for Rails applications. It handles producers (publishers) and consumers (subscribers) of Kafka messages.
- Topic — A category or feed name in Kafka where messages are published. Think of it as a channel for specific types of events (e.g., purchase_order_docs, business_unit).
- Producer — A component that publishes messages to Kafka topics. In our case, external systems (SAP, metadata adapter) act as producers.
- Consumer — A component that reads and processes messages from Kafka topics. Our Karafka consumers validate, transform, and write data to databases.
Data Flow & Architecture
- Sync-In — The pipeline that brings data from external systems (SAP, metadata adapter) into Rails. Data flows: External Systems → Kafka → Rails.
- Adapter — A service that translates between different systems. Our metadata adapter provides warehouse reference data.
- Contract Validation — Using a schema validation library (like dry-validation) to ensure message payloads match the expected structure before processing. Prevents malformed data from causing errors downstream.
Warehouse Operations
- Inbound Operations — Operations that bring goods into the warehouse, such as receiving, quality control, and putaway.
- Bi-directional Sync — The process of syncing data between two systems, such as SAP and the warehouse management system.
- WMS (Warehouse Management System) — Software that manages warehouse operations including receiving, storage, picking, and shipping.
- SAP — Enterprise resource planning (ERP) software that manages business processes. In our case, it sent purchase orders and delivery documents.
- Receiving — The process of accepting incoming goods into the warehouse, verifying quantities, and recording them in the system.
- Quality Control (QC) — Inspection and testing of received goods to ensure they meet specifications before being put into storage.
- Putaway — The process of moving received goods from the receiving area to their storage locations in the warehouse.
- Armada Operations — Vehicle/fleet management operations for trucks and drivers entering and leaving the warehouse facility.
- LPN (License Plate Number) — A unique identifier assigned to a pallet or container in the warehouse. Used to track inventory locations.
Technical Concepts
- Idempotent — An operation that can be safely repeated multiple times with the same result. Critical for handling retries without creating duplicates.
- Horizontal Scaling — Adding more consumer instances to handle increased load, rather than making a single instance more powerful.
- IDoc (Intermediate Document) — SAP's standard format for exchanging business documents between systems. Contains structured data with SyncName, DocumentNumber, and SyncData fields.
- Metadata — Reference data that describes other data. In our system, this includes zones, docks, locations, and quality check types that warehouse operations depend on.