Qlik Community

Catalog and Lineage

Discussion Board for collaboration around Catalog and Lineage.

akshaye_c_navale

How to archive and delete entity partitions based on a time frame (i.e., keep only the last 180 days of partitions)

Hi Community,

In our Qlik Data Catalyst environment we are using REGISTER entities, but Prepare creates the target entity as a MANAGED entity. Our security policy requires us to keep only the last 180 days of partitions in the Receiving directory for that entity. Partitions older than 180 days must be moved to a specific backup location before they are deleted from the Receiving directory in the dock. After a partition is deleted, we also want to delete the metadata associated with it.
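To make the retention rule concrete, here is a minimal Python sketch of the 180-day split. It assumes, hypothetically, that partition values are dates in YYYYMMDD form; the function name split_partitions is illustrative and not part of QDC.

```python
from datetime import date, datetime, timedelta

RETENTION_DAYS = 180  # retention window required by the security policy


def split_partitions(partitions, today, retention_days=RETENTION_DAYS):
    """Split date-named partitions into (keep, archive) lists.

    Hypothetical assumption: each partition value is a YYYYMMDD date string.
    Partitions dated on or after the cutoff stay in Receiving; older ones
    are candidates for backup and then deletion.
    """
    cutoff = today - timedelta(days=retention_days)
    keep, archive = [], []
    for value in partitions:
        partition_date = datetime.strptime(value, "%Y%m%d").date()
        if partition_date >= cutoff:
            keep.append(value)
        else:
            archive.append(value)
    return keep, archive
```

In a real run you would pass today=date.today(); taking the date as a parameter keeps the cutoff boundary easy to check.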

Do you know how I can achieve this, or is it already available in the QDC application?

Thanks,
Akshaye

Christopher_Ortega

What you are describing can be achieved through the APIs; that is, you can build a process on top of the APIs that handles it. There is no built-in function in Qlik Catalog that does this automatically.

akshaye_c_navale
Author

Thanks for the reply.

Could you please provide details on which APIs I need to use to achieve this?

JitenderR
Employee

Hi @akshaye_c_navale, which version of QDC are you running? The steps below are written for a multi-node environment, but they should apply to a single-node environment as well.

The steps below should get you started on scripting against the APIs. All of these APIs are described in the API Documentation section of the UI; the screenshot below shows how to get there.
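The endpoints used in the steps are ordinary REST calls, so any HTTP client will do. Below is a minimal Python sketch using only the standard library. The base URL is a hypothetical placeholder for your QDC API root, authentication is omitted (check the API documentation for what your installation requires), and leaving commas unescaped assumes, unverified, that the {entityIds} and {workOrderIds} path parameters accept comma-separated lists.

```python
import json
import urllib.request
from urllib.parse import quote


def build_url(base_url, template, **params):
    """Fill an endpoint template such as "/entity/v1/loadLogs/{entityIds}".

    base_url is a hypothetical placeholder for the QDC API root. Commas are
    left unescaped in case an endpoint accepts "101,102" (an assumption).
    """
    encoded = {name: quote(str(value), safe=",") for name, value in params.items()}
    return base_url.rstrip("/") + template.format(**encoded)


def call_api(url, method="GET", timeout=60):
    """Send one request and decode the JSON reply (authentication omitted)."""
    request = urllib.request.Request(url, method=method,
                                     headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    return json.loads(body) if body else None
```

For example, call_api(build_url(base, "/entity/v1/getEntities")) would fetch the entity list for step 1, and call_api(build_url(base, "/entity/v1/dataLoadCleanUp/{bitMask}/{workOrderIds}", bitMask=15, workOrderIds=wid), method="PUT") would issue one cleanup call for step 3.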

1. Capture the details of all External Entities only, for example as Source_Name|Entity_Name|Entity_id. Use the GET /entity/v1/getEntities API.

2. Using each entity ID, capture the list of load logs for that entity with the GET /entity/v1/loadLogs/{entityIds} API. Sort the output to find the partitions older than 180 days; the deliverId field in the JSON gives you the partition value.

3. Delete the data you no longer need for a specific partition with the PUT /entity/v1/dataLoadCleanUp/{bitMask}/{workOrderIds} API. For your requirement you would use a bitMask of 15. Pass each partition identified in step 2 as older than 180 days to this API, one at a time.

4. Repeat steps 2 and 3 for all entity IDs.

5. With the information returned in step 2, you should be able to construct the HDFS path of the receiving folder for each entity/partition, and then use a distcp command to copy the contents of that HDFS folder to another HDFS location. Do this copy before running the cleanup in step 3, since the partitions need to be backed up before they are deleted.
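Putting steps 2 to 5 together, a dry-run planner makes the ordering explicit: copy each expired partition with distcp first, then call the cleanup endpoint for it. This is a minimal Python sketch under stated assumptions. Only deliverId, the endpoint path and the bitmask value 15 come from the steps; the load-log fields id and startTime, the receiving/backup path layout, and the function names are hypothetical and must be checked against the real loadLogs JSON and your HDFS layout. The sketch only builds the actions; it does not run them.

```python
from datetime import date, datetime, timedelta

CLEANUP_BITMASK = 15  # bitmask value suggested in step 3


def plan_entity(source_name, entity_name, load_logs, today,
                receiving_root, backup_root, retention_days=180):
    """Return ordered (kind, detail) actions for one entity.

    Hypothetical assumptions, to be checked against the real loadLogs JSON:
      * each log dict has "id" (work order ID), "deliverId" (partition value)
        and "startTime" (ISO-8601 date or datetime string);
      * receiving data lives in <receiving_root>/<source>/<entity>/<partition>.
    """
    cutoff = today - timedelta(days=retention_days)
    actions = []
    for entry in sorted(load_logs, key=lambda e: e["startTime"]):
        loaded_on = datetime.fromisoformat(entry["startTime"]).date()
        if loaded_on >= cutoff:
            continue  # inside the retention window: leave it in Receiving
        partition = str(entry["deliverId"])
        src = f"{receiving_root}/{source_name}/{entity_name}/{partition}"
        dst = f"{backup_root}/{source_name}/{entity_name}/{partition}"
        # Back up first (step 5) so nothing is deleted before it is copied...
        actions.append(("copy", ["hadoop", "distcp", src, dst]))
        # ...then clean up that single work order (step 3).
        actions.append(("cleanup",
                        f"/entity/v1/dataLoadCleanUp/{CLEANUP_BITMASK}/{entry['id']}"))
    return actions


def plan_all(entities, fetch_logs, today, receiving_root, backup_root):
    """Repeat the per-entity plan for every entity (step 4).

    entities: (source_name, entity_name, entity_id) tuples from step 1.
    fetch_logs: callable(entity_id) -> list of load-log dicts, for example a
    hypothetical wrapper around GET /entity/v1/loadLogs/{entityIds}.
    """
    plan = []
    for source_name, entity_name, entity_id in entities:
        plan.extend(plan_entity(source_name, entity_name, fetch_logs(entity_id),
                                today, receiving_root, backup_root))
    return plan
```

Printing the plan before executing it is a cheap safety check. Once it looks right, each copy could be run with subprocess.run(cmd, check=True), which raises on a non-zero exit so a failed copy stops the script before its cleanup call is sent, and each cleanup path could be sent as a PUT request to your QDC API root.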

The first run will take longer, depending on how much old data you have. Schedule the script outside business hours to avoid conflicts.

Wishing you the best. If you build a working solution, please post it here; it will help others in the community. Finally, if you are scripting in Linux, you can refer to the solution below, developed by one of our clients. Note that it is not supported by Qlik, but I am sure it will give you a very good start with API scripting.

Good luck, and please let us know if you have any further questions.

https://github.com/maurice1408/pd_shell

Regards

JR 

[Screenshot: navigating to the API Documentation section of the UI]

 

akshaye_c_navale
Author

Hi @JitenderR ,

Thanks for your detailed explanation.

We are currently using a single-node environment, and our QDC version is June 2020 SR2.

I will try to implement it following the steps you described.

Thanks,
Akshaye