Qlik Community

Qlik Replicate parquet output format for S3 endpoint

ResMed_LesB
Contributor II

Our Data Analytics, Data Science, and AI Factory teams all work with Parquet files as their preferred and current output format. We are a new Data Integrations team trying to automate unified data pipelines for these teams while also building a persistence layer. Without the ability to deliver data to these teams in their preferred, industry-standard format directly from Qlik Replicate, we will not be able to use the product and will be forced to find an alternative technology. JSON and CSV are fine for some things, but the lack of direct Parquet output is a blocker for delivering data to our most important customers. Is Parquet output for S3 endpoints (and others) on the near-term roadmap?

6 Comments
Nathan1
Contributor III

Hey there,

This is actually something that we've been discussing with Qlik as well. The feedback we got, though, was that the process flow they've settled on is to push data into a write-optimised format and then process the deltas into Parquet using Compose.

From your side, what are the issues with Replicate ---> Compose ---> S3-Parquet as opposed to Replicate ---> S3-Parquet? Is it a performance-related concern?

Shelley_Brennan
Employee

Thank you for the suggestion. We would like to get feedback from others as well and will consider this for a future release. We will also need to consider the performance aspects of having Replicate generate Parquet files. Have you considered Compose as a solution here?

Status changed to: Open - Collecting Feedback
Prabodh
Creator
ResMed_LesB
Contributor II

@Nathan1, the orchestration components needed to use Compose for this are a little outside the scope of what our team does as strictly an Integration and Automation team. We don't manage what consumers do downstream, so we don't have components like Databricks or other EMR solutions in our workspaces, and at this point we have no other need to manage them. I completely understand the perspective of going to a write-optimized format, though.

jjames
Partner

This is a feature that one of our customers has requested as well.

Thanks,

jjames

Shelley_Brennan
Employee

Support for Parquet file format is on the Replicate roadmap.

Status changed to: Open - On Roadmap