ioannagr
Creator III

Help! Concatenating daily QVDs into one

Hello guys!

I have made QVDs by day, like qvd_01.01.2000 through qvd_24.05.2021, so as to handle a big table.

Now I want to concatenate them into one.

I tried

LOAD *
FROM [$(vPathQVD)qvd_*.qvd]
(qvd);

Yet it returns 0 rows.

(Qlik stored them in nested folders along the path in a very weird way: first the days, then the months, then the years.

For example: qvd_01 -> qvd_01.01, qvd_01.02, etc. (first day of all months) -> qvd_01.01.2000, qvd_01.01.2001, qvd_01.01.2002, etc.;

qvd_02 -> qvd_02.01, qvd_02.02, etc. (second day of all months) -> qvd_02.01.2000, qvd_02.01.2001, qvd_02.01.2002, etc.)

 

Help please?

17 Replies
marcus_sommer

I never handle files and folders by DD.MM.YYYY or use separate folders for daily/monthly data. Instead I create YYYYMM files in folders that span all the data, or at least several years - often the only reason to split folders at all is a different data structure within the files, to avoid extra logic to detect them and react differently. I just keep my logic as simple as possible. IMO all other ways to structure the data within the file system have more disadvantages than benefits. But if you now have such a structure and you can handle it, there is no need to change it - it was just a suggestion.

To loop over multiple folders you may wrap the call statement like this:

for each vFolder in 'folderpath1', 'folderpath2', 'folderpath3'
   Call ScanFolder('$(vFolder)');
next

I don't know if it works directly with a lib:// connection - in legacy mode it should work in any case.
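
(For reference: the ScanFolder routine called above isn't defined anywhere in this thread. A minimal sketch of the usual recursive pattern, assuming legacy-mode paths and QVDs of identical structure - ScanFolder and Data are placeholder names:)

Sub ScanFolder(Root)
    // load every QVD in this folder; files with identical
    // structure auto-concatenate into the one Data table
    For Each FoundFile in FileList(Root & '\*.qvd')
        Data:
        LOAD * FROM [$(FoundFile)] (qvd);
    Next FoundFile

    // then recurse into each subfolder
    For Each SubDirectory in DirList(Root & '\*')
        Call ScanFolder(SubDirectory)
    Next SubDirectory
End Sub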

- Marcus

ioannagr
Creator III
Author

@marcus_sommer I thought of creating month- and year-wise QVDs, but the available hardware could only handle daily ones. But should you ever need daily ones, how would you store them so as not to have this chaos of subfolders?

(I never knew Qlik would do such complex storing when I didn't ask for it! I was expecting a plain long list of QVDs by date that I could easily pick up with a qvd_* wildcard.)

marcus_sommer

I don't think Qlik does any complex storing - it's actually quite simple. If you got the structure you mentioned, it was probably caused by the kind of variables you created to load + filter + store all the data.

I'm not absolutely sure how I would handle daily data, but I would tend to store them all within a single folder - why distribute them on a daily/monthly/yearly level? Direct file access is rather rare, and with an appropriate name pattern, sorting/filtering within the file manager is quite easy. Reading the folder content with functions like FileList() is very fast, so the overhead can be neglected.
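
(As an illustration - a minimal sketch assuming a single flat folder and the qvd_* name pattern from the first post:)

// read every matching QVD from one flat folder; identical file
// structures auto-concatenate into the AllDays table
For Each vFile in FileList('$(vPathQVD)qvd_*.qvd')
    AllDays:
    LOAD * FROM [$(vFile)] (qvd);
Next vFile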

Besides this: if the hardware was too weak to create YYYYMM QVDs, it will probably also become difficult to merge all the daily files into one big one. Maybe review your original creation process again, in case some unsuitable logic prevented better performance.

- Marcus 

ioannagr
Creator III
Author

@marcus_sommer Here is my code for these daily QVDs that led to this weird storing:

// distinct list of all dates to loop over
TempDate:
NoConcatenate
LOAD Distinct date(floor(Mydate),'DD/MM/YYYY') as Date;
SQL SELECT Mydate
FROM Table1;

LET NumRows = NoOfRows('TempDate');
TRACE $(NumRows);

// one iteration per day: load, aggregate, store, drop
For n = 1 To $(NumRows)

    LET vDate = Peek('Date', $(n)-1, 'TempDate');
    TRACE $(vDate);

    TempSize:
    NoConcatenate
    LOAD
        id,
        size,
        date(floor(Mydate),'DD/MM/YYYY') as DateOfInterest;
    SQL SELECT
        tab1.`id`, tab1.`Mydate`,
        tab2.`fk`, tab2.`size`
    FROM Table1 tab1
    INNER JOIN Table2 tab2
        ON tab1.`id` = tab2.`fk`
    WHERE date_format(tab1.`Mydate`, "%d/%m/%Y") = '$(vDate)';

    // aggregate per id, then store one QVD per day
    TotalSize:
    LOAD
        id,
        sum(size) as Total
    RESIDENT TempSize
    GROUP BY id;

    STORE TotalSize INTO [$(vPathQVD)/TotalSize_$(vDate).qvd];
    DROP TABLE TempSize;
    DROP TABLE TotalSize;

Next n

DROP TABLE TempDate;

 

 

I can't see how this storing along the path happened... Can you see something, from experience, that I don't?

marcus_sommer

All the files should go into one folder, because the path variable $(vPathQVD) isn't touched within the loading/storing loop. The nesting comes from the filename itself: $(vDate) is formatted as DD/MM/YYYY, and the slashes are treated as folder separators when storing, which creates the day/month/year subfolders. To use a different name pattern for the files, you could just create another variable, $(vDate2), to store them with.

Loading + aggregating just two columns should also be possible on a YYYYMM level without stressing the hardware too much. And if there aren't many values for the field size, you should consider whether an aggregation is really worthwhile - if the reduction rate isn't at least 1:3, I wouldn't do it.
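
(One way to check that rate is a quick count before aggregating - a hypothetical sketch, run inside the loop from the script above while TempSize still exists; RateCheck and vRate are placeholder names:)

// pre-check of the reduction rate: rows before vs. distinct ids after GROUP BY
RateCheck:
LOAD
    Count(id) as RowCount,
    Count(DISTINCT id) as IdCount
RESIDENT TempSize;

LET vRate = Peek('RowCount', 0, 'RateCheck') / Peek('IdCount', 0, 'RateCheck');
TRACE reduction rate is roughly 1:$(vRate);
DROP TABLE RateCheck;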

- Marcus

ioannagr
Creator III
Author

@marcus_sommer They did all go into $(vPathQVD), but stored themselves in this strange way inside it.

Can you please elaborate on how the script should look with this $(vDate2)?

I'm not sure it's really clear to me how it would serve me. (By that I understand that the storing behaviour is due to the DD/MM/YYYY format.)

As for the rate, yes, it is almost always 1:15-20. Anything other than the daily level took days and eventually failed. And now I have them daily but can't concatenate them, heeeeeeeeeeeeelp 😓

marcus_sommer

You may add it in this way:

...
let vDate = Peek('Date',$(n)-1,'TempDate');
let vDate2 = date(date#('$(vDate)', 'DD/MM/YYYY'), 'YYYYMMDD');  // parse explicitly, then reformat without slashes
...
store TotalSize into [$(vPathQVD)/TotalSize_$(vDate2).qvd];
...
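
(With the slash-free $(vDate2) names, all files land flat in one folder, so the wildcard load from the first post should then work - a minimal sketch:)

// with slash-free names in a single flat folder, one wildcard load
// concatenates all days into one table
Total:
LOAD * FROM [$(vPathQVD)/TotalSize_*.qvd] (qvd);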

Aggregating anything within a load is one of the heaviest operations that can be done in the script. This applies to CPU as well as to consumed RAM, and should be measured and monitored during the run-time of the loads. Especially if there isn't enough RAM available for the task, it can slow things down very significantly.

Your approach of slicing the buckets smaller is of course possible, but it's often not the only measure and not always the most suitable one. In any case you should know which tools/tasks/processes run in parallel and how many resources they consume, the same for your own task, plus how many records/field values are loaded - and how many resources are left. With any kind of performance issue, monitoring the hardware resources should be the first step.

Besides this, and as already hinted, an aggregation of the data may not always be useful - especially if it doesn't reduce the number of distinct field values, which may happen in your case when aggregating against a rather granular ID. Because of Qlik's special storage logic - only distinct field values are kept, in their own symbol tables, with bit-stuffed pointers into the data tables - your aggregation may reduce the number of records at a rate of 1:20, down to 5%, while the needed RAM may only drop to 90%, and calculations on it in the UI may likewise show rather small, maybe unnoticed, performance benefits of around 10%.

IMO there is no general rule for which dataset, with which requirements and which views to show, which approach returns the best results - but aggregating the data should be rather a last measure to ensure a certain degree of performance, not something applied prophylactically.

- Marcus 

ioannagr
Creator III
Author

Dear @marcus_sommer, this solves the problem I had before with reading from folders and subfolders, so thank you very much. I couldn't have known that my date format affects the way the data is organised! Thank you so much. 😊

As for the rest, there are some things I haven't worked out yet, so I'm learning each day and still don't know all the best practices. Learning day by day! I'm very thankful for this community.

 Thank you!