Generative Artificial Intelligence (GenAI) and related applications have exploded onto the tech scene over the last couple of years. While the technology shows great promise, building data pipelines that leverage customers' structured and unstructured data is a challenging, high-effort integration activity.
Qlik Talend Cloud (QTC) Knowledge Mart data capabilities enable customers to simplify and accelerate the work needed to get their data flowing to Large Language Model (LLM) Retrieval Augmented Generation (RAG) based GenAI applications. In this blog we'll cover this exciting new capability that simplifies using your data with GenAI applications.
Watch a DEMO of this capability HERE!
Background – GenAI, LLM, RAG, Vector stores
Before diving into how QTC Knowledge Mart data capabilities use automation to make enterprise data seamlessly available to RAG based GenAI applications, let's outline the technologies involved and the complexities found when building GenAI applications from scratch.
RAG is a method of implementing GenAI applications that grounds the LLM with the data context it must use when answering a query. It is used in conjunction with LLMs both to avoid the need to train an LLM on customer-specific data and to limit the scope of the data the LLM will use to answer the questions posed to it. While LLM based chat interfaces, such as ChatGPT, are the most readily recognizable element of a GenAI application, several precursor technologies and processes need to be selected and integrated, typically with complex code-based methods.
Anatomy of a RAG based solution
A typical RAG based GenAI solution contains the following components and process flow.
For the RAG application or chat bot to service a user's query against enterprise data, that data first needs to be loaded into a vector store with appropriate LLM embeddings. An LLM embedding is a high-dimensional numerical vector representation of a piece of text (such as a word, sentence, or document) generated by an LLM like GPT, BERT, or other advanced models. Embeddings capture the semantic meaning of the text so that semantically similar pieces of data are closer together in the vector space, which allows the model to perform tasks such as similarity search, classification, or language generation efficiently.
At query time, an embedding is generated from the user's query text and used to retrieve the most similar vectors from the store. The retrieved content is then passed to the LLM, along with the text of the user query, as the context against which the LLM generates the response back to the user.
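The retrieve-then-generate flow described above can be sketched in a few lines of Python. This is a toy illustration only: real embeddings are high-dimensional vectors produced by an embedding model, while the hand-made 3-dimensional vectors and document names below are invented purely to show the mechanics of similarity search and prompt grounding.

```python
import math

# Invented toy "document" embeddings (real ones come from an LLM and
# have hundreds or thousands of dimensions).
DOC_VECTORS = {
    "invoice policy":  [0.9, 0.1, 0.0],
    "travel policy":   [0.1, 0.9, 0.0],
    "security policy": [0.0, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Similarity of two vectors: 1.0 means same direction in vector space."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_vector, top_k=1):
    """Return the top_k document keys closest to the query embedding."""
    ranked = sorted(
        DOC_VECTORS,
        key=lambda d: cosine_similarity(query_vector, DOC_VECTORS[d]),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(question, context_docs):
    """Ground the LLM: retrieved text becomes the only answer context."""
    context = "\n".join(context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# A query embedding close to the "invoice policy" vector retrieves it.
query_vec = [0.8, 0.2, 0.1]
hits = retrieve(query_vec)
prompt = build_prompt("How do I submit an invoice?", hits)
```

In a production pipeline the `retrieve` step is performed by the vector store's own indexed search, not a full scan as here; the grounding pattern in `build_prompt` is the same.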
RAG based solution technology components
For this process to work, several technology decisions and integrations need to be made in advance.
All of this together paints the following picture of the required integration.
An implementation of this solution requires a large scripting/coding effort and specialized knowledge. As we'll see next, Qlik Talend Cloud automates most of the integration and requires only configuration and selection of the technologies to be utilized.
Qlik Talend Cloud – Knowledge Marts
Qlik Talend Cloud (QTC) is purpose-built to simplify and accelerate the implementation of RAG based GenAI data integration pipelines by using a no-code approach. Let's cover each of the features in detail and how they leverage automation to enable this capability.
Data source connectivity
QTC offers no-code connectivity to hundreds of data sources, including enterprise systems, mainframes, SAP, databases, and SaaS applications. It offers efficient, zero-footprint, minimal-impact near real-time log-based Change Data Capture (CDC) or incremental API reads to send data and changes only once, from source to target, without the need to reload the same data over and over. The intuitive interface allows for an easy implementation of this connectivity and movement process, as shown below.
More information is available at the following link on qlik.com
Data preparation/transformation
Once the data is in the target cloud platform, the next step is to prepare it for vectorization. This entails creating derived data sets, with the appropriate field and record joining and filtering, that feed the relevant bits of data for the LLM to use. QTC offers a multi-modal transformation design experience ranging from no-code Transformation Flows to pro-code GenAI-assisted query crafting. Learn more about these features on the Qlik Community blog and online guide.
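The kind of derived data set described above can be sketched with plain SQL against an in-memory database. The table and column names here are invented for illustration: the point is that joining and filtering source tables yields exactly the slice of data the LLM should see.

```python
import sqlite3

# Hypothetical source tables (names and data invented for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT, region TEXT);
    CREATE TABLE tickets (id INTEGER, customer_id INTEGER,
                          summary TEXT, status TEXT);
    INSERT INTO customers VALUES (1, 'Acme', 'EMEA'), (2, 'Globex', 'AMER');
    INSERT INTO tickets VALUES
        (10, 1, 'Login failure', 'open'),
        (11, 2, 'Invoice question', 'closed');
""")

# Derived data set: only open tickets, with the customer name joined in.
rows = conn.execute("""
    SELECT c.name, t.summary
    FROM tickets t
    JOIN customers c ON c.id = t.customer_id
    WHERE t.status = 'open'
""").fetchall()
```

In QTC this join-and-filter logic is expressed through no-code Transformation Flows or GenAI-assisted queries rather than hand-written SQL, but the resulting derived data set plays the same role.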
Data modeling
Once the necessary data sets have been generated, we then define relationship metadata between them. This allows the subsequent Knowledge Mart step to recognize the potential building blocks for the documents to prepare and store in the Vector DB.
Knowledge Marts and Vector DB/LLM integration
The data to be vectorized needs to go through a process of parsing, chunking, embedding, and indexing. Structured data (from tables and columns) needs to be converted to document format prior to these steps. QTC shines in this area with an intuitive interface for determining the elements to include in the document. For an example of the level of effort that QTC Knowledge Mart tasks automate, for just one point-solution LLM and vector store integration, please refer to the following article.
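Of the steps named above, chunking is the one most easily shown in isolation: long documents are split into fixed-size, overlapping pieces so that text cut at a chunk boundary still appears whole in a neighboring chunk. A minimal sketch (chunk size and overlap values are arbitrary; production pipelines often chunk on sentence or token boundaries instead of raw characters):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with overlap, so that
    content cut at one boundary is repeated at the start of the next chunk."""
    assert 0 <= overlap < chunk_size
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# A 500-character toy document yields overlapping chunks:
chunks = chunk_text("abcdefghij" * 50)
```

Each chunk is then embedded and indexed individually, so a query can retrieve just the relevant passage rather than the whole document.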
We can store vectors in either:
3. Specify the LLM connection. This connection and the specified models will be used both to create the embeddings for storing the document data in the Vector DB and to power the completions of the chat interface available to the implementer for testing the LLM. The options here depend on the prior choice of Vector DB.
Note: This interface is intended for the Knowledge Mart data implementer to test the integration of the data and processing components (LLM, Vector DB, etc.). It's not intended to be an end-user chat interface.
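The dual role of the LLM connection described in step 3 — embedding at indexing time, completion at test-chat time — can be sketched with a stub client. The class and method names below are invented stand-ins for whatever provider the connection points at; the point is that a single configured connection serves both call paths.

```python
class StubLLMConnection:
    """Hypothetical stand-in for a configured LLM connection that serves
    both embedding (when loading the Vector DB) and completion (when the
    implementer uses the test chat interface)."""

    def embed(self, text: str) -> list[float]:
        # Deterministic toy "embedding": vowel-frequency features.
        # A real connection would call the provider's embedding model.
        return [text.count(c) / max(len(text), 1) for c in "aeiou"]

    def complete(self, prompt: str) -> str:
        # A real connection would call the provider's chat/completion model.
        return f"[stub completion for {len(prompt)}-char prompt]"

conn = StubLLMConnection()
vec = conn.embed("knowledge mart")            # used at indexing time
reply = conn.complete("test chat question")   # used by the test chat
```

Because both paths share one connection, the embeddings stored in the Vector DB and the embeddings generated for incoming queries are guaranteed to come from the same model, which is what makes the similarity search meaningful.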
The completed pipeline looks like the following.
Conclusion – Accelerating your GenAI journey
GenAI offers new and exciting capabilities for interacting with data. Building the workflow that combines all the data sources, processing, and technologies typically entails a large effort. QTC accelerates enterprise GenAI implementations and allows for a faster time to value, at lower effort and cost than building from scratch.
Whether through automatic ingestion of data from structured or unstructured sources, transformation into the required data sets, creation of vector records with appropriate LLM embeddings, or testing of chat answers, QTC lowers the barrier to entry and adoption for delivering RAG based GenAI solutions on your data. Reach out to your account team today to take advantage of this groundbreaking functionality.
Watch a DEMO of this capability HERE!
NOTE: Initial GA release (July 8th 2025) supports Snowflake/Cortex, OpenAI, Azure OpenAI, Amazon Bedrock, Elasticsearch, OpenSearch, and Pinecone. Support for other platforms mentioned will come in subsequent releases.