Hi Team,
We have a Spark cluster where we process data and apply business logic using PySpark.
This cluster is secure and no pull requests are allowed into it; we are only allowed to create data and share it with the outside world.
We have a requirement to share data with Qlik Sense for data visualisation.
Is there any way to convert the data stored in a PySpark DataFrame to QVD files? Is there any Python library that can be used to create .qvd files?
Also, is there any connector that can be used to share data from the Spark cluster to a QS server or QS location?
Please let me know if you need more information.
To add: Final_df contains all the data I would like to send over to the QS server/storage location as a QVD file.
QlikView Data (QVD) files are a proprietary format used by Qlik to store data for use with QlikView and Qlik Sense applications. There is no direct method or official Python library provided by Qlik to create QVD files from a PySpark DataFrame. However, there are alternative methods to share data from a Spark cluster to Qlik Sense:
1. CSV or Parquet Files: Use the `dataframe.write.csv('path')` or `dataframe.write.parquet('path')` methods in PySpark to save your data to files that Qlik Sense can load.
2. Direct Database Write: Write the data into a database that Qlik Sense can connect to, using the `dataframe.write.jdbc(url, table, properties)` method.
3. Qlik REST Connector: Expose the data through a REST API that Qlik Sense can pull from with its REST Connector.
4. Qlik Web Connectors: Connect Qlik Sense to web-based data sources.
5. Custom Connectors: Build a custom connector if your environment requires a bespoke integration.
6. Qlik DataTransfer: Use Qlik's utility for uploading on-premises data to Qlik Sense SaaS.
7. Qlik Replicate (formerly Attunity): Use Qlik's data replication tool to move data from your sources into a target that Qlik Sense can read.
For any of these methods, you will need to ensure that your Spark cluster's security policies allow for the appropriate method of data transfer. Always confirm that the approach you choose complies with your organization's data governance and security standards.
Thanks a lot @Scotchy
It's helpful and detailed information.
I will try these solutions and share which one worked for me.
In my environment, only the options below are possible. Let me explore them.
CSV or Parquet Files:
Qlik Data Transfer:
On the other hand, is it "qlikconnect" that we have to use to share data with QS?
Installation: You first install Qlik DataTransfer on a local machine that can access the data sources you want to transfer data from.
Configuration: Once installed, you configure Qlik DataTransfer to connect to your local data sources. These sources can be files (like CSV, Excel), databases, or even folder paths where files will be dropped.
Are you thinking of the Python library qlikconnect?
Yes, I was thinking of the Python library qlikconnect.
I am not sure if we can use this to send data to QS.
It seems to be read-only: it can be used for things like fetching Qlik chart data or evaluating an expression.
Do you have any idea?
As this solution will be for a Prod automation job, I am pretty sure I would not be allowed to use Qlik DataTransfer.
So there will be one production job that takes care of fetching data from the source using PySpark (Final_df contains all the data) and sending it over to the QS server/storage location in QVD file format.
As you correctly mentioned, such libraries or connectors are typically used to interact with Qlik Sense APIs for operations like fetching data, not for pushing data to Qlik Sense.
To automate the process of transferring data from a PySpark DataFrame to a Qlik Sense environment, you would generally follow these steps:
Data Export: Export the PySpark DataFrame to a format that is compatible with Qlik Sense. This could be CSV, Excel, or other file formats that Qlik Sense can read.
File Transfer: Move the exported file to a location accessible by Qlik Sense. If you're using Qlik Sense Enterprise on Windows, this could be a network share. For Qlik Sense SaaS, you may need to upload the file to a cloud storage service that Qlik Sense can connect to, like Amazon S3 or Google Cloud Storage.
Qlik Sense Data Load: Use Qlik Sense data load scripts to import the data from the files into Qlik Sense applications.
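Steps 1 and 2 above can be sketched in plain Python. The export and drop-folder paths are placeholders for whatever share or staging folder Qlik Sense actually reads from:

```python
# Sketch of steps 1-2: collect Spark's CSV part-files and copy them into a
# folder Qlik Sense can load from. Paths are placeholders, not real shares.
import glob
import shutil
from pathlib import Path

def publish_export(export_dir: str, qlik_drop_folder: str) -> list:
    """Copy every CSV part-file Spark wrote into the Qlik drop folder."""
    drop = Path(qlik_drop_folder)
    drop.mkdir(parents=True, exist_ok=True)
    copied = []
    for part in sorted(glob.glob(f"{export_dir}/part-*.csv")):
        dest = drop / Path(part).name
        shutil.copy2(part, dest)  # copy2 keeps timestamps for auditability
        copied.append(str(dest))
    return copied
```

For Qlik Sense SaaS, the same step would instead upload to cloud storage (for example with boto3 for S3), which needs its own credentials and bucket configuration.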
Unfortunately, there is no direct method to convert a DataFrame to a QVD file in Python, as the QVD format is proprietary to Qlik. QVD files are typically generated by Qlik Sense or QlikView as part of the data load and transformation process.
For your production automation job, you would need to script the export and transfer steps above. Here's a basic example of how you might export a DataFrame to CSV in PySpark:
```python
final_df.write.csv(path='path_to_export_directory', header=True, mode='overwrite')
```
And here's a simple example of a Qlik Sense load script that would read from a CSV file:
```qlik
LOAD
    column1,
    column2,
    column3
FROM [path_to_csv_file]
(txt, codepage is 1252, embedded labels, delimiter is ',', msq);
```
This load script would be part of a Qlik Sense app and would run within the Qlik Sense environment, not from an external Python script.
If you're looking for a more direct integration between a Spark cluster and Qlik Sense without intermediate file storage, you would typically look into APIs or connectors, but this would require an interface on the Spark cluster side that Qlik Sense can read from.
Thanks a lot @Scotchy for detailing this out.
My search also narrowed down to the steps you mentioned.
Let me propose this to my stakeholders.