Qlik Talend Cloud enables you to build data pipelines that capture data from numerous sources, including streaming and traditional data sources, and transform it to feed data lakes, lakehouses, or data warehouses. Adding AI capabilities to their data lakehouses is at the top of many IT organizations' priority lists because these capabilities save data consumers time and manual effort. Databricks is a popular and versatile data lakehouse platform built on Delta tables, and Qlik provides out-of-the-box data integration for it. With Qlik Talend Cloud's smart data pipeline execution, changes at the source can be applied automatically throughout the pipeline to Databricks Delta tables. Together, Qlik and Databricks provide a platform that helps customers apply advanced analytic capabilities throughout their data lifecycle.
Introducing Databricks AI SQL functionality
Databricks AI SQL functions extend standard SQL with AI capabilities for data analysis and transformation. They bring machine learning models directly into SQL queries, allowing users to perform complex operations, predictions, and analytics in SQL. This includes built-in support for model inference and the ability to apply AI models to data stored in Databricks Delta tables. Individual functions handle tasks such as sentiment analysis, grammar correction, language translation, data summarization, and data masking. Because the functions run on the Databricks platform itself, they simplify applying AI directly to the data stored there.
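The tasks listed above map to Databricks AI functions such as `ai_analyze_sentiment`, `ai_fix_grammar`, `ai_translate`, `ai_summarize`, and `ai_mask`. The Python sketch below only renders the SQL expression text for each call. It does not connect to Databricks. The task keys (`"sentiment"`, `"summarize"`, and so on) and the helper names are hypothetical. The argument lists reflect the Databricks AI Functions documentation as understood at the time of writing, so check them against the current reference for your workspace.

```python
# Sketch: render Databricks AI SQL function calls as SQL text.
# Task keys ("sentiment", "grammar", ...) are hypothetical labels for this example.


def quote_identifier(name: str) -> str:
    # Databricks SQL delimits identifiers with backticks; double any embedded backtick.
    return "`" + name.replace("`", "``") + "`"


def quote_string(value: str) -> str:
    # Databricks SQL string literals escape ' and \ with a backslash.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def ai_expression(task: str, column: str, **params) -> str:
    """Return the AI function call for `task` applied to `column`."""
    col = quote_identifier(column)
    if task == "sentiment":
        return f"ai_analyze_sentiment({col})"
    if task == "grammar":
        return f"ai_fix_grammar({col})"
    if task == "translate":
        # to_lang is a target language code such as 'es' or 'fr'.
        return f"ai_translate({col}, {quote_string(params['to_lang'])})"
    if task == "summarize":
        # max_words is optional in Databricks; 50 is the documented default (assumption).
        max_words = int(params.get("max_words", 50))
        return f"ai_summarize({col}, {max_words})"
    if task == "mask":
        # labels lists the kinds of entities to mask, e.g. 'person', 'email'.
        labels = ", ".join(quote_string(label) for label in params["labels"])
        return f"ai_mask({col}, array({labels}))"
    raise ValueError(f"unsupported task: {task!r}")


if __name__ == "__main__":
    print(ai_expression("summarize", "DESCRIPTION", max_words=30))
    print(ai_expression("mask", "notes", labels=["person", "email"]))
```

Column names are quoted as identifiers, and parameter values are quoted as string literals. This keeps a value such as a mask label from being read as SQL.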
How Databricks AI SQL functions work with Qlik Talend Cloud
Below is a depiction of the data flow in which Qlik Talend Cloud ingests data into Databricks and generates SQL that calls Databricks AI SQL functions. Qlik Talend Cloud (QTC) Data Integration transformations generate SQL within the transformation flow. That SQL is executed on the Databricks platform, and the results of the AI functions are stored in Databricks Delta tables.
Qlik offers a no-code way to use the Databricks AI functions. Your organization can use Qlik Talend Cloud for real-time change data capture (CDC) ingestion, data transformation, data quality, and governance while feeding data into your Databricks lakehouse. Within the data pipeline, you add the AI function processors by dragging and dropping them onto the transformation flow canvas.
Get Started with Databricks AI SQL functions with Qlik Talend Cloud
Setting up and running Databricks AI functions
In Qlik Talend Cloud, you can use AI SQL functions in transformation tasks within a data pipeline project. Transformations work whether you use Qlik Talend Cloud Data Integration to onboard data or another tool to ingest data into Databricks.
Below is an example of a typical Qlik Talend Cloud Data Integration pipeline. It consumes data from multiple sources and transforms that data into analytics-ready structures such as SQL-derived data views and automated data marts.
Within the data pipeline, transformation workflows can be accessed by creating a transformation data task.
The transform view provides an interface to map onboarded source data to the target dataset. To build a transformation, select the source datasets and add a transformation flow.
From within the transformation flow, AI processors can be dragged to the canvas to create a data flow (see below).
With the new AI processor selected, configure the Databricks AI function in the properties panel on the right edge of the screenshot below. Select the Databricks function and the column to use as its input, then set the name of the output column. (Some Databricks AI functions require additional input parameters.)
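The settings collected by the properties panel (function, input column, output column, and any extra parameters) can be modeled as a small configuration object. The Python sketch below is hypothetical and is not part of any Qlik API. It checks that a configuration names a known function, both columns, and the extra parameters that function needs. The required-parameter table is an assumption based on the Databricks function signatures (`ai_translate` needs a target language, `ai_mask` needs labels).

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical mapping of each AI function to the extra parameters it needs
# beyond the input column (assumed from Databricks AI function signatures).
REQUIRED_PARAMS: dict[str, set[str]] = {
    "ai_analyze_sentiment": set(),
    "ai_fix_grammar": set(),
    "ai_summarize": set(),  # max_words is optional
    "ai_translate": {"to_lang"},
    "ai_mask": {"labels"},
}


@dataclass
class AIProcessorConfig:
    """Settings an AI processor collects: function, input column, output column."""

    function: str
    input_column: str
    output_column: str
    params: dict = field(default_factory=dict)

    def validate(self) -> list[str]:
        """Return human-readable problems; an empty list means the config is usable."""
        if self.function not in REQUIRED_PARAMS:
            return [f"unknown function: {self.function}"]
        errors = []
        if not self.input_column:
            errors.append("input column is required")
        if not self.output_column:
            errors.append("output column name is required")
        # Report any required extra parameter that was not supplied.
        missing = REQUIRED_PARAMS[self.function] - set(self.params)
        for name in sorted(missing):
            errors.append(f"{self.function} requires parameter '{name}'")
        return errors
```

For example, `AIProcessorConfig("ai_translate", "REVIEW", "REVIEW_ES").validate()` returns `["ai_translate requires parameter 'to_lang'"]`. A summarize configuration such as `AIProcessorConfig("ai_summarize", "DESCRIPTION", "DESCRIPTION_SUMMARY")` returns an empty list.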
In the example below, we use the AI processor to call the ai_summarize SQL function and generate a summary for product categories in a table within a transformation workflow.
The screenshot shows how the Databricks Summarize AI function can be configured. The function uses AI to summarize the items found within each category in a single line of descriptive text. This helps downstream data consumers understand the data within a specific table (in this case, product categories) without spending time and effort on additional research or writing additional queries.
Qlik Talend Cloud will generate the SQL to be executed downstream by the Databricks SQL Warehouse.
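To illustrate the kind of statement involved, the Python sketch below renders a `CREATE OR REPLACE TABLE ... AS SELECT` statement that adds a DESCRIPTION_SUMMARY column computed with `ai_summarize`. The table names, the key column, and the single-statement shape are assumptions for illustration. The SQL that Qlik Talend Cloud actually generates, for example for incremental loads, may look different.

```python
from __future__ import annotations

# Sketch of a statement that materializes the summarize step as a table.
# Table and column names are hypothetical; Qlik Talend Cloud generates its own SQL.


def summarize_statement(
    source: str,
    target: str,
    key_columns: list[str],
    input_column: str,
    output_column: str,
    max_words: int = 50,
) -> str:
    """Build CREATE OR REPLACE TABLE ... AS SELECT with an ai_summarize column."""
    if max_words < 0:
        raise ValueError("max_words must be non-negative (0 means no limit)")
    select_items = [
        *key_columns,
        input_column,
        f"ai_summarize({input_column}, {max_words}) AS {output_column}",
    ]
    select_list = ",\n       ".join(select_items)
    # On Databricks, a table created without a USING clause defaults to Delta.
    return (
        f"CREATE OR REPLACE TABLE {target} AS\n"
        f"SELECT {select_list}\n"
        f"FROM {source}"
    )


if __name__ == "__main__":
    print(summarize_statement(
        "retail.product_categories",          # hypothetical source table
        "retail.product_categories_summary",  # hypothetical target table
        ["CATEGORY_ID"],                      # hypothetical primary key
        "DESCRIPTION",
        "DESCRIPTION_SUMMARY",
    ))
```

The key column is carried through so the output dataset keeps a primary key alongside the generated summary.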
If data preview is enabled in the Qlik Talend Cloud tenant, a sample of the results is shown in the canvas. The DESCRIPTION_SUMMARY column shows the results of the ai_summarize function.
The transformation flow will show an output dataset with a primary key. The dataset will be created in Databricks once the task is prepared and ready for data to be loaded.
After the task runs, the Databricks table is loaded with the results of the AI function (shown in the DESCRIPTION_SUMMARY column).
Conclusion
Databricks AI functions can be used today in Qlik Talend Cloud to apply AI capabilities to data directly through transformation flows, without writing code. If you prefer code, AI SQL functions can also be used in custom SQL written within your transformation task. Building your data pipeline with Qlik reduces the complexity of integrating AI capabilities with your data because the transformation flow is implemented through a graphical interface. This helps organizations quickly put AI to work on problem-solving with the Databricks platform.