damienedwards (Employee)

Discover Qlik Talend Cloud Data Integration Pipelines with Databricks AI Functions

Qlik Talend Cloud enables you to build data pipelines that capture data from numerous sources, including both streaming and traditional data sources, and transform it to feed data lakes, lakehouses, or data warehouses. Adding AI capabilities to their data lakehouses is at the top of many IT organizations’ priority lists because these capabilities save data consumers time and manual effort. Databricks is an extremely popular and versatile lakehouse platform, based on Delta tables, for which Qlik provides seamless, out-of-the-box data integration solutions. With Qlik Talend Cloud’s advanced smart data pipeline execution capabilities, changes at the source can be automatically applied throughout the pipeline to Databricks Delta tables. Together, Qlik and Databricks provide a platform that helps customers leverage complex analytic capabilities throughout their data lifecycle.

 

Introducing Databricks AI SQL functionality 

Databricks AI SQL functions enhance traditional SQL with advanced capabilities for predictive data analysis and transformation. These functions integrate machine learning models directly into SQL queries, allowing users to perform complex operations, predictions, and analytics with SQL alone. They include, for instance, built-in support for model inference and seamless application of AI models to data stored within Databricks Delta tables. Some of these functions can be leveraged for complex tasks such as sentiment analysis, grammar correction, language translation, data summarization, and data masking. The AI functions streamline the process of integrating AI directly into data stored within the Databricks platform.
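
For illustration, here is a rough sketch of how a few of these functions could be called in a Databricks SQL query. The customer_reviews table and its columns are hypothetical, and availability of the ai_* functions depends on the Databricks SQL warehouse and region you are using:

-- Illustrative only: apply several Databricks AI SQL functions to a hypothetical customer_reviews table.
SELECT
  review_id,
  ai_analyze_sentiment(review_text)              AS sentiment,       -- sentiment analysis
  ai_fix_grammar(review_text)                    AS corrected_text,  -- grammar correction
  ai_translate(review_text, 'en')                AS review_en,       -- translation to English
  ai_summarize(review_text, 20)                  AS short_summary,   -- summary of at most ~20 words
  ai_mask(review_text, array('person', 'email')) AS masked_text      -- masking of names and email addresses
FROM customer_reviews;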

How Databricks AI SQL functions work with Qlik Talend Cloud 

 

Below is a depiction of a data flow in which Qlik Talend Cloud ingests data into Databricks and generates SQL with Databricks AI SQL functions. Qlik Talend Cloud (QTC) Data Integration transformations generate SQL within the transformation flow, which is executed on the Databricks platform, with the results of the AI functions stored in Databricks Delta tables.

damienedwards_0-1730403140880.png

 

 

Qlik offers a no-code solution for using the Databricks AI functions. Your organization can use Qlik Talend Cloud for real-time CDC data ingestion, data transformation, data quality, and governance while feeding data into your Databricks lakehouse. Within the data pipeline, you can use the AI function processors by simply dragging and dropping them onto the transformation flow canvas.

Get Started with Databricks AI SQL functions in Qlik Talend Cloud

 

Setting up and running Databricks AI functions 

In Qlik Talend Cloud, you can use AI SQL functions in transformation tasks within a data pipeline project. Transformations can be leveraged whether you use Qlik Talend Cloud Data Integration to onboard data or another tool to ingest data into Databricks.

 

Below is an example of a typical Qlik Talend Cloud Data Integration pipeline. It consumes data from multiple sources and transforms that data into analytics-ready structures such as SQL-derived data views and automated data marts. 

 

damienedwards_1-1730403140882.png

Within the data pipeline, transformation workflows can be accessed by creating a transformation data task. 

damienedwards_2-1730403140884.png

 

 

The transform view provides an interface to map onboarded source data to the target dataset. To build a transformation, select the source datasets and add a transformation flow. 

 

damienedwards_0-1730405120732.png

 

 

From within the transformation flow, AI processors can be dragged to the canvas to create a data flow (see below). 

 

damienedwards_15-1730403432498.png

 

 

With the new AI processor selected, the properties of the Databricks AI function can be configured using the properties panel on the right edge of the screenshot below. Select the Databricks function name, the column to use as input to the function, and the name of the output column. (Some of the Databricks AI functions require additional input parameters.)

The example below illustrates using the AI processor to call the ai_summarize SQL function, generating a summary for product categories in a table within a transformation workflow.

The screenshot shows how the Databricks ai_summarize function can be configured. The function uses AI to summarize the items found within each category in a single line of descriptive text. This helps downstream data consumers understand the data within a specific table, in this case product categories, without having to spend time and effort on additional research or writing additional queries.

damienedwards_1-1730405170150.png
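
Conceptually, the processor configuration shown above maps to a function call along the following lines. The PRODUCT_CATEGORIES table and its columns are hypothetical stand-ins for the dataset used in this example:

-- Illustrative only: summarize each product category description into one short line of text.
SELECT
  CATEGORY_ID,
  CATEGORY_NAME,
  ai_summarize(DESCRIPTION, 20) AS DESCRIPTION_SUMMARY  -- output column configured in the AI processor
FROM PRODUCT_CATEGORIES;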

 

 

Qlik Talend Cloud will generate the SQL to be executed downstream by the Databricks SQL Warehouse. 

damienedwards_8-1730403140893.png
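
The exact statement depends on how the flow is defined, but the generated SQL conceptually resembles a CREATE TABLE AS SELECT that materializes the AI function output into a Delta table. The catalog, schema, and table names below are hypothetical, not the literal output of Qlik Talend Cloud:

-- Sketch of the kind of statement executed on the Databricks SQL Warehouse;
-- the SQL actually generated by Qlik Talend Cloud will differ in naming and structure.
CREATE OR REPLACE TABLE my_catalog.my_schema.PRODUCT_CATEGORIES_SUMMARIZED AS
SELECT
  CATEGORY_ID,
  CATEGORY_NAME,
  DESCRIPTION,
  ai_summarize(DESCRIPTION, 20) AS DESCRIPTION_SUMMARY
FROM my_catalog.my_schema.PRODUCT_CATEGORIES;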

 

If data preview is enabled in the Qlik Talend Cloud tenant, a sample of the data results will be shown in the canvas. The DESCRIPTION_SUMMARY column shows the results of the AI_SUMMARIZE function.

damienedwards_16-1730403489626.png

 

The transformation flow will show an output dataset with a primary key. The dataset will be created in Databricks once the task is prepared and ready for data to be loaded. 

damienedwards_11-1730403140896.png

 

After the task finishes running, the data in Databricks will be loaded with the results of the AI function. (The results of the function are shown in the DESCRIPTION_SUMMARY column.)

damienedwards_17-1730403572710.png
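
To spot-check the loaded results directly in Databricks, a quick query against the target table (again using a hypothetical name) shows the summarized text alongside its source column:

-- Illustrative validation query against the target Delta table.
SELECT CATEGORY_NAME, DESCRIPTION, DESCRIPTION_SUMMARY
FROM my_catalog.my_schema.PRODUCT_CATEGORIES_SUMMARIZED
LIMIT 10;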

 

Conclusion 

Databricks AI functions can be leveraged today in Qlik Talend Cloud to apply AI capabilities directly to your data using transformation flows, without writing code. Alternatively, AI SQL functions can also be leveraged in custom code written within your transformation task. Using Qlik to build your data pipeline can reduce the complexity of integrating AI capabilities with your data by providing a graphical interface for implementing your transformation flow. This will help organizations quickly harness the power of AI for problem-solving on the Databricks platform.
