Operating Kafka in Rails with Karafka: Production Architecture, Consumers, and DLQs (Part 1)
by Syed Sibtain, System Analyst
A small but important note before we start:
This architecture was built by an incredible team of engineers who deeply understood Kafka, Rails, warehouse operations, and the legacy systems we were integrating with.
I was fortunate to be part of that team — contributing to design discussions, writing some of the consumers, building sync flows, and helping operate it in production.
This blog is my way of documenting and appreciating the architecture that our team built, and sharing the lessons I learned from working on it.
There are some jargon-heavy sections in this post, so I’ve added a vocabulary section at the end to help with the concepts.
Introduction
Every distributed system eventually faces the same pain: messages that retry endlessly, payloads that poison queues, and duplicate writes that quietly corrupt data.
A warehouse management system (WMS) we were building for a client faced similar pressures.
Business data flowed continuously between SAP, the WMS (Rails application), and a legacy Microsoft SQL Server 2009 database. As integrations expanded and throughput increased, the system needed a foundation that could move events reliably while keeping both sides perfectly in sync — even under heavy operational load.
Why We Chose This Architecture
Our team didn't adopt Kafka reactively.
The architecture was designed before development began, because the shape of the problem was already clear:
- High-volume inbound purchase orders and stock transfers
- Real-time Receiving, Quality Control, Putaway, and Armada operations
- Bi-directional sync between SAP and WMS
- Frequent updates to 20+ warehouse tables and other business entities
- Zero tolerance for data loss or out-of-order writes
Our Technology Stack
To solve this, we standardized on the following components:
- Redpanda/Kafka → the backbone for distributed events
- Karafka → Rails-native producers and consumers
- Dead-letter queues (DLQs) → safety net for corrupted messages and retries
- Karafka Web UI → observability, monitoring, and replay during incidents
What This Post Covers
This post isn't a tutorial — it's a field story.
A walkthrough of how Kafka is operated from Rails in real production, how the sync-in and sync-out pipelines were set up, how the legacy adapter integrates into the flow, and how DLQs saved hours during operational incidents.
Note: If you need the basics first, start here: Real-Time Event Streaming in Rails with Apache Kafka and Karafka.
The Real Problem We Were Trying to Solve
Our team didn't adopt Kafka because we wanted "event-driven architecture." We adopted it because the warehouse had no room for delays or retry-loop chaos.
The Complexity of Warehouse Operations
Even a single inbound flow (Purchase Order → Armada Operation → Receive → Quality Control → Putaway → Verify and Close) touches multiple business domains and must be:
- Idempotent — Each step must be safe to repeat without changing the result
- Ordered — Events must be processed in order
- Captured — Failures must be captured without blocking operations
- Conflict-free — Database writes must not conflict or duplicate
Rails background jobs weren't enough.
If SQL Server lagged or SAP sent malformed data, Sidekiq would stall the entire pipeline. We needed a system that:
- Absorbed bursts of traffic without breaking
- Isolated failures so one bad message didn't stop everything
- Retried safely without creating duplicates
- Guaranteed order within a Purchase Order
- Didn't overwhelm SQL Server with sudden spikes
- Let us replay anything we wanted
- Scaled horizontally with multiple consumers
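Ordering deserves a concrete note. Kafka only guarantees order within a partition, so every event for a given Purchase Order has to be produced with the order number as its message key; that way all of its events land on the same partition and are consumed in sequence. A minimal sketch of that keying, with an assumed topic name, payload shape, and variable names rather than our production producer code:

# Same key => same partition => events for one Purchase Order stay in order.
# Topic name, payload fields, and variables are illustrative.
purchase_order_number = "PO-1001"
line_items = [{ sku: "SKU-1", qty: 10 }]

Karafka.producer.produce_async(
  topic: 'purchase_order_docs',
  key: purchase_order_number,
  payload: {
    document_number: purchase_order_number,
    items: line_items
  }.to_json
)

Consumers can then scale horizontally across partitions while each Purchase Order is still processed strictly in order.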
Our Solution
That led us to:
- Kafka/Redpanda for ingestion and buffering
- Karafka for operating the pipeline inside Rails
- DLQs for safety and observability
- Karafka Web UI for debugging in minutes instead of hours
Our Architecture at a High Level

Before building anything, we made a deliberate decision: Rails would be the operational brain, Kafka would be the event backbone, and both SAP and the legacy system would integrate only through Kafka.
To achieve this, we built two independent pipelines:
- Sync-In (External Systems → Rails)
- Sync-Out (Rails → Legacy SQL Server)
Both run through Kafka, both use Karafka, and both solve different business problems.
Sync-In — How Data Enters Rails (And Why It Comes From Two Places)
A lot of people assume Sync-In means "SAP sends us data."
But real-world systems are never that simple.
In our case, we had two independent sources pushing data into Kafka.
A) SAP (IDoc-style structured payloads)
SAP sent large IDoc-style payloads containing three top-level fields: SyncName, DocumentNumber, and SyncData.
From SAP we consumed:
- purchase_order_docs → Purchase orders
- delivery_docs → Delivery documents
- vendor_general → Vendor information
- vendor_address → Vendor locations
These messages followed a strict contract and required dry-validation before processing.
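As a sketch of what that looks like with dry-validation (the rules and sample values below are assumptions, not the production contract), the contract simply rejects anything that doesn't match the IDoc-style shape:

require "dry/validation"

# Illustrative contract for an IDoc-style payload; the field rules are assumed
class PurchaseOrderDocContract < Dry::Validation::Contract
  params do
    required(:SyncName).filled(:string)
    required(:DocumentNumber).filled(:string)
    required(:SyncData).hash
  end
end

payload = {
  "SyncName"       => "PURCHASE_ORDER",
  "DocumentNumber" => "450001234",
  "SyncData"       => { "items" => [] }
}

result = PurchaseOrderDocContract.new.call(payload)
result.success? # => true; malformed payloads fail here and never reach Postgres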
B) Our Metadata Adapter (lean, domain-focused payloads)
The second source wasn't SAP at all—it was our internal metadata adapter, responsible for keeping warehouse reference data up to date.
These events were small, clean hashes that mapped directly to domain models. We needed this metadata for our warehouse operations to function:
- Business units — required during receiving to identify which business unit the goods belonged to
- Units of measure (UOM) — needed to specify the quantity we were receiving (e.g., pallets, cases, pieces)
- Putaway zones — required to determine where received goods should be stored
- Armada types — required when scheduling vehicles for warehouse operations
- LPN prefix codes — needed to generate license plate numbers for tracking pallets and containers
- Lottable validation — required to validate lot numbers during receiving and quality control
- Docks — needed to schedule and manage dock assignments for incoming and outgoing shipments
- Dock operational hours — required to know when docks are available for scheduling
- Warehouses — needed to identify which warehouse facility operations are running in
- Locations — required to track where inventory is stored within the warehouse
- Quality check types — needed to determine what quality control inspections are required for received goods
SAP didn't own this metadata—yet our warehouse operations depended on it.
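For contrast with the IDoc envelopes, a metadata event from the adapter was just a flat hash. The field names below are illustrative, but the shape is representative:

# Illustrative business_unit event from the metadata adapter (field names assumed)
{
  "code"   => "BU-001",
  "name"   => "Central Distribution",
  "active" => true
}

Payloads like this map almost one-to-one onto the corresponding Rails models, which is what keeps the metadata consumers small.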
Why Two Sources?
Because they solved two different problems:
- SAP → provided transactional documents (Purchase Orders, Delivery Documents, Vendor Information)
- Metadata adapter → provided all operational data (Zones, Docks, Locations, Quality Control Types, etc.)
The WMS cannot run Receiving, Quality Control, Armada operations, or Putaway unless this metadata exists. Both sources were essential for our warehouse operations to function.
Sync-In Routing in Karafka
Rails consumes from both sources using Karafka:
class KarafkaApp < Karafka::App
  setup do |config|
    config.kafka = { 'bootstrap.servers': ENV.fetch("KAFKA_SERVERS") }
    config.client_id = "wms_rails_app"
  end

  routes.draw do
    # SAP events (IDoc-style)
    topic :purchase_order_docs do
      consumer PurchaseOrderConsumer
    end

    topic :delivery_docs do
      consumer DeliveryDocConsumer
    end

    topic :vendor_general do
      consumer VendorConsumer
    end

    topic :vendor_address do
      consumer VendorAddressConsumer
    end

    # Metadata events (lean domain payloads)
    topic :business_unit do
      consumer BusinessUnitConsumer
    end

    topic :uom do
      consumer UOMConsumer
    end

    topic :putaway_zone do
      consumer PutawayZoneConsumer
    end

    topic :armada_type do
      consumer ArmadaTypeConsumer
    end

    topic :dock do
      consumer DockConsumer
    end

    topic :location do
      consumer LocationConsumer
    end

    # ...other operational metadata consumers
  end
end
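In production, these routes are served by the standard Karafka consumer process (bundle exec karafka server), and scaling out is a matter of running more of those processes. The same routing DSL can also attach a dead-letter queue to a topic; the DLQ topic name and retry count below are assumptions, but the mechanism looks like this:

# Sketch: declaring a per-topic DLQ in the routes (names and retry count are illustrative)
topic :purchase_order_docs do
  consumer PurchaseOrderConsumer
  dead_letter_queue(topic: 'purchase_order_docs_dlq', max_retries: 3)
end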
Consumer Responsibilities
Each consumer:
- Validates — ensures payload structure matches contract
- Transforms — converts external format to internal domain models
- Writes to Postgres — persists data to Rails database
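As a sketch of that pattern (the contract, attribute names, and model are illustrative rather than the production code), a metadata consumer can stay very small:

# Illustrative metadata consumer: validate, transform, write
class BusinessUnitConsumer < Karafka::BaseConsumer
  def consume
    messages.each do |message|
      payload = message.payload

      # Validate: skip payloads that don't match the expected contract
      # (the production flow would route failures to a DLQ rather than silently skip)
      result = BusinessUnitContract.new.call(payload)
      next unless result.success?

      # Transform: map external fields onto domain attributes
      attributes = { code: payload["code"], name: payload["name"] }

      # Write to Postgres: upsert keyed on the external code,
      # so a redelivered message never creates a duplicate row
      BusinessUnit.find_or_initialize_by(code: attributes[:code]).update!(attributes)

      mark_as_consumed(message)
    end
  end
end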
The Result
Once data lands in Rails, every warehouse operation runs from there:
- Dock scheduling
- Armada operations
- Receiving
- Quality Control
- Putaway
- Adjustments
Rails becomes the operational source of truth—and once Rails owns that truth, the next step is syncing it back into the legacy SQL Server system.
Continue reading: Operating Kafka in Rails with Karafka: Production Architecture, Consumers, and DLQs (Part 2)
Vocabulary
This post uses several technical terms and warehouse-specific jargon. Here's a quick reference:
Event Streaming & Messaging
- Kafka/Redpanda — Distributed event streaming platforms that act as a message broker. They store events in topics and allow multiple consumers to read from them independently.
- Karafka — A Ruby framework that provides Kafka integration for Rails applications. It handles producers (publishers) and consumers (subscribers) of Kafka messages.
- Topic — A category or feed name in Kafka where messages are published. Think of it as a channel for specific types of events (e.g., purchase_order_docs, business_unit).
- Producer — A component that publishes messages to Kafka topics. In our case, external systems (SAP, metadata adapter) act as producers.
- Consumer — A component that reads and processes messages from Kafka topics. Our Karafka consumers validate, transform, and write data to databases.
Data Flow & Architecture
- Sync-In — The pipeline that brings data from external systems (SAP, metadata adapter) into Rails. Data flows: External Systems → Kafka → Rails.
- Adapter — A service that translates between different systems. Our metadata adapter provides warehouse reference data.
- Contract Validation — Using a schema validation library (like dry-validation) to ensure message payloads match the expected structure before processing. Prevents malformed data from causing errors downstream.
Warehouse Operations
- Inbound Operations — Operations that bring goods into the warehouse, such as receiving, quality control, and putaway.
- Bi-directional Sync — The process of syncing data between two systems, such as SAP and the warehouse management system.
- WMS (Warehouse Management System) — Software that manages warehouse operations including receiving, storage, picking, and shipping.
- SAP — Enterprise resource planning (ERP) software that manages business processes. In our case, it sent purchase orders and delivery documents.
- Receiving — The process of accepting incoming goods into the warehouse, verifying quantities, and recording them in the system.
- Quality Control (QC) — Inspection and testing of received goods to ensure they meet specifications before being put into storage.
- Putaway — The process of moving received goods from the receiving area to their storage locations in the warehouse.
- Armada Operations — Vehicle/fleet management operations for trucks and drivers entering and leaving the warehouse facility.
- LPN (License Plate Number) — A unique identifier assigned to a pallet or container in the warehouse. Used to track inventory locations.
Technical Concepts
- Idempotent — An operation that can be safely repeated multiple times with the same result. Critical for handling retries without creating duplicates.
- Horizontal Scaling — Adding more consumer instances to handle increased load, rather than making a single instance more powerful.
- IDoc (Intermediate Document) — SAP's standard format for exchanging business documents between systems. Contains structured data with SyncName, DocumentNumber, and SyncData fields.
- Metadata — Reference data that describes other data. In our system, this includes zones, docks, locations, and quality check types that warehouse operations depend on.