Dec 9, 2025 12:55:50 AM
We often see new users implement Qlik Stitch in under five minutes. While the user experience for adding a data source and a destination is simple, there is a lot of complexity behind the scenes.
Let’s pull back the curtain and see how each record travels from its source to its destination through Qlik Stitch’s data pipeline.
The journey typically begins in a SaaS application or database. From there, data is either pushed to Stitch via our API or a webhook, or pulled on a schedule by the Singer-based replication engine that Qlik Stitch runs against data sources such as APIs, databases, and flat files.
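The Singer specification that this replication engine builds on is deliberately simple: taps emit newline-delimited JSON messages (SCHEMA, RECORD, and STATE) on stdout for a target to consume. A minimal sketch, with an illustrative stream and field names:

```python
import json

# Sketch of the three core Singer message types. The "users" stream and
# its fields are illustrative, not from any particular tap.
def record_message(stream, record):
    return {"type": "RECORD", "stream": stream, "record": record}

messages = [
    # SCHEMA describes the stream and its primary key.
    {"type": "SCHEMA", "stream": "users",
     "schema": {"properties": {"id": {"type": "integer"},
                               "email": {"type": "string"}}},
     "key_properties": ["id"]},
    # RECORD carries one row of data for that stream.
    record_message("users", {"id": 1, "email": "ada@example.com"}),
    # STATE bookmarks replication progress so extraction can resume.
    {"type": "STATE", "value": {"bookmarks": {"users": {"id": 1}}}},
]

# A tap writes each message as one JSON object per line on stdout.
for message in messages:
    print(json.dumps(message))
```

Because the interface is just JSON over stdout, any tap can be paired with any target, which is what lets one replication engine cover APIs, databases, and flat files alike.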
The image below outlines the internal architecture.
The data’s next stop is the Import API, a Clojure web service that accepts JSON and Transit, either one record at a time or in large batches. The Import API validates each request and authenticates its API token before writing the data to a central Apache Kafka queue.
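A client-side push to an Import API-style endpoint might look like the following sketch. The URL, payload fields, and token are assumptions for illustration, not the exact Stitch contract:

```python
import json

# Assumed endpoint; consult the Import API docs for the real contract.
API_URL = "https://api.stitchdata.com/v2/import/push"

def build_push_request(token, client_id, table, record, key_names):
    """Build the headers and JSON body for a single-record upsert push."""
    headers = {
        "Authorization": f"Bearer {token}",  # API token the service authenticates
        "Content-Type": "application/json",
    }
    body = [{
        "client_id": client_id,
        "table_name": table,      # destination table for this record
        "sequence": 1,            # ordering hint for upserts
        "action": "upsert",
        "key_names": key_names,   # primary-key columns
        "data": record,
    }]
    return headers, json.dumps(body)

headers, body = build_push_request(
    "secret-token", 1234, "users",
    {"id": 1, "email": "ada@example.com"}, ["id"])
```

The built request could then be sent with any HTTP client; on a non-2xx response the client should retry, which matters for the durability guarantees described next.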
At this point, Qlik Stitch has accepted the data. Qlik's system is architected to meet our most important service-level target: don’t lose data. To meet this goal, we replicate our Kafka cluster across three different data centers and require each data point to be written to two of them before it is accepted. Should the write fail, the requester will try again until it’s successful.
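The accept-or-retry behavior described above can be sketched as an acknowledged-write loop. Here `send` is an illustrative stand-in for the replicated Kafka produce (which only acknowledges once two data centers have the write), and the attempt count and backoff values are arbitrary:

```python
import time

def write_with_retry(send, record, max_attempts=5, base_delay=0.1):
    """Retry a durable write until it is acknowledged or attempts run out."""
    for attempt in range(max_attempts):
        try:
            # send() returns normally only once the write is acknowledged.
            return send(record)
        except IOError:
            # Back off exponentially before retrying the same write.
            time.sleep(base_delay * 2 ** attempt)
    raise IOError("write not acknowledged after %d attempts" % max_attempts)
```

Because the requester keeps retrying until the write is acknowledged by enough replicas, a transient failure in one data center delays a record rather than dropping it.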
Under normal conditions, data is read off the queue seconds later by the streamery, a multithreaded Clojure application that writes the data to files on S3 in batches, separated according to the database tables the data is destined for. We have the capacity to retain data in Kafka for multiple days to ensure nothing is lost in the event that downstream processing is delayed. The streamery cuts batches after reaching either a memory limit or an amount of time elapsed since the last batch. Its low-latency design aims to maximize throughput while guarding against data loss or data leaking between data sets.
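The streamery's batch-cutting rule (flush when a batch hits a size limit or has been open too long) can be sketched like this; the limits and the flush callback are illustrative, and a real implementation would keep one batch per destination table:

```python
import time

class Batcher:
    """Collect records and cut a batch on a size limit or a time limit."""

    def __init__(self, flush, max_records=1000, max_age_seconds=60.0):
        self.flush = flush                # called with each completed batch
        self.max_records = max_records    # stand-in for the memory limit
        self.max_age = max_age_seconds    # max time a batch stays open
        self.records = []
        self.opened_at = None

    def add(self, record, now=None):
        now = time.monotonic() if now is None else now
        if not self.records:
            self.opened_at = now          # batch opens with its first record
        self.records.append(record)
        if (len(self.records) >= self.max_records
                or now - self.opened_at >= self.max_age):
            self._cut()

    def _cut(self):
        self.flush(self.records)
        self.records = []
        self.opened_at = None
```

Cutting on whichever limit is hit first keeps latency bounded for low-volume tables while still producing efficiently sized batches for high-volume ones.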
Batches that have been written to S3 enter the spool, which is a queue of work waiting to be processed by one of our loaders. These Clojure applications read data from S3 and do any processing necessary (such as converting data into the appropriate data types and structure for the destination) before loading the data into the customer’s data warehouse. We currently have loaders for Redshift, Postgres, BigQuery, Snowflake, and S3. Each is a separate codebase and runtime because of the variation in preprocessing steps required for each destination. Operating them separately also allows each to scale and fail independently, which is important when one of the cloud-based destinations has downtime or undergoes maintenance.
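The preprocessing a loader performs can be pictured as coercing each incoming value to the column type its destination expects. The schema format and coercion rules below are illustrative, not any loader's actual logic:

```python
from datetime import datetime, timezone

# Illustrative map from a destination column type to a coercion function.
COERCERS = {
    "integer": int,
    "float": float,
    "string": str,
    # Render an epoch-seconds value as an ISO 8601 UTC timestamp string.
    "timestamp": lambda v: datetime.fromtimestamp(
        v, tz=timezone.utc).isoformat(),
}

def prepare_row(schema, record):
    """Coerce a record's values to the destination's column types."""
    return {col: COERCERS[col_type](record[col])
            for col, col_type in schema.items() if col in record}

row = prepare_row({"id": "integer", "signup": "timestamp"},
                  {"id": "42", "signup": 0})
```

Because each destination needs different rules like these (plus its own bulk-load mechanics), keeping each loader as a separate codebase and runtime is a natural fit.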
All of this infrastructure allows us to process more than a billion records per day, and allows our customers to scale their data volumes up or down by more than 100X at any time. Qlik Stitch customers don’t need to worry about any of this, however. They just connect a source, connect a destination, and then let Qlik Stitch worry about making sure the data ends up where it needs to be.
For additional information, see the official Stitch Getting Started documentation.