jeevansalunke
Contributor II

Creating QVD file from PySpark dataframe

Hi Team,

We have a Spark cluster where we process data and apply business logic using PySpark.

This cluster is secured: no pull requests into it are allowed, but we are permitted to create data and share it with the outside world.

We have a requirement to share data with Qlik Sense for data visualisation.

Is there any way to convert the data stored in a PySpark DataFrame to QVD files? Is there any Python library that can be used to create .qvd files?

Also, is there any connector that can be used to share data from the Spark cluster to a QS server or QS location?

Please let me know if you need more information.

To add: Final_df contains all the data I would like to send over to the QS server/storage location as a QVD file.

 


6 Replies
Scotchy
Partner - Creator

QlikView Data (QVD) files are a proprietary format used by Qlik to store data for use with QlikView and Qlik Sense applications. There is no direct method or official Python library provided by Qlik to create QVD files from a PySpark DataFrame. However, there are alternative methods to share data from a Spark cluster to Qlik Sense:

  1. CSV or Parquet Files:

    • You can write the PySpark DataFrame to CSV or Parquet files, which Qlik Sense can easily import.
    • Use the dataframe.write.csv('path') or dataframe.write.parquet('path') methods in PySpark to save your data.
  2. Direct Database Write:

    • If Qlik Sense can connect to a database, you can write your DataFrame to a database table using JDBC, and then connect Qlik Sense to this database.
    • PySpark can write to databases using the dataframe.write.jdbc(url, table, properties) method.
  3. Qlik REST Connector:

    • If you have a web service interface to access your Spark cluster data, you can use the Qlik REST Connector to pull data into Qlik Sense from your service.
  4. Qlik Web Connectors:

    • Depending on where you store your output data from Spark, you might be able to use Qlik Web Connectors to connect to various web data sources and services.
  5. Custom Connectors:

    • Develop a custom connector using the Qlik Connectors SDK if the data needs to be retrieved in real-time or if there is a specific requirement that the existing connectors do not meet.
  6. Qlik Data Transfer:

    • Qlik DataTransfer is a utility that securely transfers on-premises data into Qlik Sense SaaS editions. If you can export your data to accessible on-premises storage, this utility can automate the transfer to Qlik Sense.
  7. Qlik Replicate (formerly Attunity):

    • Qlik Replicate can move data in real-time from various sources to many targets, and it might support transferring data from your environment into a Qlik-friendly format.

For any of these methods, you will need to ensure that your Spark cluster's security policies allow for the appropriate method of data transfer. Always confirm that the approach you choose complies with your organization's data governance and security standards.
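To make option 2 concrete, here is a minimal local sketch of the "land the data in a relational table" idea. It uses Python's built-in sqlite3 purely as a stand-in so it can run anywhere; on the cluster you would call dataframe.write.jdbc(url, table, properties) against a database Qlik Sense can connect to. The table name, columns, and rows below are hypothetical:

```python
import sqlite3

# Hypothetical rows standing in for the contents of the PySpark DataFrame.
rows = [(1, "EMEA", 1200.5), (2, "APAC", 830.0)]

# Stand-in for dataframe.write.jdbc(url, table, properties):
# land the data in a relational table that Qlik Sense can later query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer_id INTEGER, region TEXT, revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
conn.commit()

# A BI tool on the other end would now see the landed table.
total = conn.execute("SELECT SUM(revenue) FROM sales").fetchone()[0]
print(total)
```

In the real setup, Qlik Sense would connect to that database with its standard ODBC/JDBC data connection and load the table in its own load script.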

jeevansalunke
Contributor II
Author

Thanks a lot @Scotchy 

This is helpful and detailed information.

I will try these solutions and share which one worked for me.

In my environment only the options below are possible. Let me explore:

CSV or Parquet Files:

Qlik Data Transfer:

On the other hand, is it "qlikconnect" that we have to use to share data over to QS?

Scotchy
Partner - Creator

  1. Installation: You first install Qlik DataTransfer on a local machine that can access the data sources you want to transfer data from.

  2. Configuration: Once installed, you configure Qlik DataTransfer to connect to your local data sources. These sources can be files (like CSV, Excel), databases, or even folder paths where files will be dropped.

Are you thinking of the Python library qlikconnect?

jeevansalunke
Contributor II
Author

Yes, I was thinking of the Python library qlikconnect.

I am not sure if we can use it to send data to QS.

It seems to be read-only: it can be used to do things like fetch Qlik chart data or evaluate expressions.

Do you have any idea?

 

As this solution will be a production automation job, I am pretty sure I would not be allowed to use Qlik DataTransfer.

So there will be one production job that fetches data from the source using PySpark (Final_df contains all the data) and sends it over to the QS server/storage location in QVD format.

Scotchy
Partner - Creator

 

As you correctly mentioned, such libraries or connectors are typically used to interact with Qlik Sense APIs for operations like fetching data, not for pushing data to Qlik Sense.

To automate the process of transferring data from a PySpark DataFrame to a Qlik Sense environment, you would generally follow these steps:

  1. Data Export: Export the PySpark DataFrame to a format that is compatible with Qlik Sense. This could be CSV, Excel, or other file formats that Qlik Sense can read.

  2. File Transfer: Move the exported file to a location accessible by Qlik Sense. If you're using Qlik Sense Enterprise on Windows, this could be a network share. For Qlik Sense SaaS, you may need to upload the file to a cloud storage service that Qlik Sense can connect to, like Amazon S3 or Google Cloud Storage.

  3. Qlik Sense Data Load: Use Qlik Sense data load scripts to import the data from the files into Qlik Sense applications.
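Steps 1 and 2 can be sketched end to end with standard-library stand-ins (step 3 runs inside Qlik Sense, not in Python). The paths here are hypothetical: a temporary directory stands in for the network share or cloud bucket your Qlik Sense environment would actually read from, and shutil stands in for whatever transfer mechanism your security policy allows:

```python
import csv
import shutil
import tempfile
from pathlib import Path

# Step 1: export. On the cluster this would be
# final_df.write.csv(path, header=True, mode='overwrite').
staging = Path(tempfile.mkdtemp())
export_file = staging / "final_df.csv"
with open(export_file, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["column1", "column2", "column3"])  # embedded labels for Qlik
    writer.writerow(["a", "b", "c"])

# Step 2: transfer to a folder Qlik Sense can read.
# A temp dir stands in for e.g. \\server\qlik_landing or an S3 bucket.
qlik_share = Path(tempfile.mkdtemp())
landed = Path(shutil.copy2(export_file, qlik_share))

# Step 3 would be a Qlik Sense load script that reads `landed`
# on a schedule or trigger after the transfer completes.
print(landed.name)
```

The design point is simply that the Python side ends at the hand-off: Qlik Sense owns the final load (and the QVD creation, if you store the loaded table as a QVD in the load script).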

Unfortunately, there is no direct method to convert a DataFrame to a QVD file in Python, as the QVD format is proprietary to Qlik. QVD files are typically generated by Qlik Sense or QlikView as part of the data load and transformation process.

For your production automation job, you would need to:

  • Export the DataFrame to a CSV or another supported file format using PySpark's DataFrame writer.
  • Transfer the file to a location that your Qlik Sense environment can access.
  • Create a Qlik Sense data load script that reads from this file and loads the data into Qlik Sense, which can be triggered after the file transfer is complete.

Here's a basic example of how you might export a DataFrame to CSV in PySpark:

python

# Write the DataFrame as CSV part files with a header row,
# replacing any previous export in the target directory.
final_df.write.csv(path='path_to_export_directory', header=True, mode='overwrite')

And here's a simple example of a Qlik Sense load script that would read from a CSV file:

qlik

LOAD
    column1,
    column2,
    column3
FROM [path_to_csv_file]
(txt, codepage is 1252, embedded labels, delimiter is ',', msq);

This load script would be part of a Qlik Sense app and would run within the Qlik Sense environment, not from an external Python script.

If you're looking for a more direct integration between a Spark cluster and Qlik Sense without using intermediate file storage, you would typically look into utilizing APIs or connectors, but this would require a read-compatible interface on the Spark cluster side.

jeevansalunke
Contributor II
Author

Thanks a lot @Scotchy for detailing this out.

My search also narrowed down to the steps you mentioned. 

  • Export the DataFrame to a CSV or another supported file format using PySpark's DataFrame writer.
  • Transfer the file to a location that your Qlik Sense environment can access.
  • Create a Qlik Sense data load script that reads from this file and loads the data into Qlik Sense, which can be triggered after the file transfer is complete.

Let me propose this to my stakeholders.