Loading Only Unique Data From a Folder

gorkemakinci · ‎2023-11-24

Hello there,

I do have a workflow which generates a .qvx that runs daily. When it first runs it includes all groups and it also controls if a group is complete with this workflow and note it down if it is. To sum it up let's say I do have 3 groups A, B and C. A's data are complete but B and C's are not. When it first runs it generates all data and note it down that A is complete, on the 2nd run since it noted down that A was complete it doesn't run the data of A but it does the same thing for B and C which it did a day ago and so on.

After all I have this folder which has lots of .qvx's that looks like this:

Groups	ID	data
A	1	a
A	2	aa
B	1	b
B	2	bb
C	1	c
C	2	cc
C	3	ccc

(first qvx)

Groups	ID	data
B	1	b
B	2	bb
C	1	c
C	2	cc
C	3	ccc

(second qvx)

When I load this data to Qlik Sense Enterprise of course I do want it to be as fast as possible, And right now it loads all qvx files completely to the table. I do want to learn if there's any possibility to load from a folder and specify it to not load the same data twice (Distinct didn't work for me maybe I missed sth), or else can I specify some Group and ID combinations to be load once or something like that. Every help will be appreciated, I'm ready to try new things out, thanks in advance!

Ahidhar · ‎2023-11-24

try using incremental load

vincent_ardiet_ · ‎2023-11-24

You could maybe create a first loop of files with (for each .. filelist...) storing the paths and the times (using FileTime).
You create an empty table FilterGroups containing the field Groups (with autogenerate 0 for example).
Then, you loop on this list from the newest file to the oldest and you load data renaming Groups to LoadedGroups and filtering on Not Exists (Groups).
On each iteration, you enrich FilterGroups with the distinct LoadedGroups you just loaded (renamed to Groups of course).
This should allow you to keep optimized loads.

gorkemakinci · ‎2023-11-27

This makes perfect sense, I just wasn't able to write the script of it. Is there an example of it or some sort of small scaled version somewhere which I can follow it like a guide to help me out?

Parthiban · ‎2023-11-27

You can create a key of those fields(like concatenate main fields) in first qvx and then use where not exists(here also use same fields like used in first qvx ) in second qvx. This makes duplicate data will not be loaded in second qvx data file.

gorkemakinci · ‎2023-11-27

That is my target, but I have multiple qvx files in a folder and I really need the most optimized script on this one. I'm trying to write @vincent_ardiet_ 's script, hopefully I'll manage to do so.

Aasir · ‎2023-11-27

// Load data from all .qvx files in the folder
Files:
LOAD
FileBaseName() as FileName
FROM [lib://YourFolderPath/*.qvx];

// Loop through each file and load unique data
For Each FileName in FieldValueList('FileName')
Load *
From [lib://YourFolderPath/$(FileName).qvx]
Where NOT Exists(Groups & ID); // Check if the combination of Groups and ID doesn't exist in the previously loaded data
Next;

// Concatenate the loaded data into a single table
Concatenate
LOAD * Resident Files;

// Drop the temporary Files table
Drop Table Files;

gorkemakinci · ‎2023-11-27

I did not have any previously loaded data, should I load the oldest file in the folder first and then use this script, would it work that way?

Parthiban · ‎2023-11-27

Above script are in loop. That means 1st loaded file are previously loaded file, then compare second load file with first loaded file suppose duplicate will happens means automatically remove with the help of where not exists. This script will work perfectly.

Ahidhar · ‎2023-11-27

assuming if we take above two .qvx files try this

Data:
load
Distinct "Groups",
ID,data,key;
load
"Groups",
ID,
data,
autonumber("Groups"&ID) as key
FROM [lib://DataFiles/*.qvx]
(qvx);

Data Load Editor

Developers

General Question

QlikWorld 2023

Script