Skip to main content
Announcements
Introducing Qlik Answers: A plug-and-play, Generative AI powered RAG solution. READ ALL ABOUT IT!
cancel
Showing results for 
Search instead for 
Did you mean: 
janda
Contributor II
Contributor II

QCDI Daily snaphot of DB table/view?

Hello everybody,

is there a possibility to make daily snapshots of a view in database (PostgreSQL, in this case) using QCDI and store it in an AWS S3 bucket?

Right now I can schedule a reload, but that just overwrites existing file LOAD00000001.snappy.parquet. I would assume that there is a functionality that allows me to create LOAD00000002.snappy.parquet, LOAD00000003.snappy.parquet,... but I cannot find it.

Thanks in advance for any suggestions

Petr

Labels (3)
2 Replies
Dennisleon
Contributor
Contributor


@jandaHealthCare gov wrote:

Hello everybody,

is there a possibility to make daily snapshots of a view in database (PostgreSQL, in this case) using QCDI and store it in an AWS S3 bucket?

Right now I can schedule a reload, but that just overwrites existing file LOAD00000001.snappy.parquet. I would assume that there is a functionality that allows me to create LOAD00000002.snappy.parquet, LOAD00000003.snappy.parquet,... but I cannot find it.

Thanks in advance for any suggestions

Petr


Hello,

You're right, reloading the data using QCDI will overwrite the existing file. Here are a couple of approaches to achieve daily snapshots of your PostgreSQL view in S3 using QCDI:

Approach 1: Utilize S3 Versioning

Enable S3 Versioning: Enable versioning on the S3 bucket where you want to store the snapshots. This allows S3 to keep previous versions of the file even when you overwrite it.

QCDS Configuration: In your QCDI configuration file (.qcdprc), set the output_overwrite_mode parameter to append. This will instruct QCDI to append data to the existing file in S3. However, since the file is overwritten with new data each day, S3 versioning will automatically create a new version each time.

 

[qcd]
output_overwrite_mode = append

 

Approach 2: Daily Load with Unique Filenames

QCDS Script: Write a script using QCDI that incorporates the date in the output filename. This makes each snapshot unique and avoids overwriting previous ones.

Date in Filename: Use Python's datetime module or a similar library to generate the date and integrate it into the filename within your QCDI script.

Here's a basic Python code example (assuming your script is written in Python):

 

from datetime import datetime

# Generate unique filename with date
today = datetime.today().strftime("%Y-%m-%d")
filename = f"LOAD_{today}.snappy.parquet"

# Use filename in your qcdi.output.path configuration
qcdi.output.path = f"s3://your-bucket/{filename}"

# Perform your QCDI load here

 

By implementing one of these approaches, you can achieve daily snapshots of your PostgreSQL view in S3 using QCDI.

I hope the solution may help you. 

 

 

 

janda
Contributor II
Contributor II
Author

Hello,

thanks for the suggestion. The 2nd approach is exactly what i would like to do, unfortunately we are using SAAS Qlik with Data integration

janda_0-1718174028585.png

 

a inside of it, i cannot find a way to modify code of the pipeline. 

Petr