Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Qlik Open Lakehouse is Now Generally Available! Discover the key highlights and partner resources here.
cancel
Showing results for 
Search instead for 
Did you mean: 
Amit_B
Creator II
Creator II

Reducing Load Time while Adding a Flag

Hi,

I have a data table from a car rental company. The table contains around 60 million records showing the status of each vehicle on each day (not every vehicle appears every day).

I'm trying to add a flag field that marks the first occurrence of each vehicle with a 1.

I tried doing this using Min(Date), and also by sorting and using Previous(). The result is correct, but the load time is very long (over an hour).

Are there any other ways to add this flag without such a long load time?

Thanks!

Labels (4)
8 Replies
ramazanerduran
Partner - Contributor III
Partner - Contributor III

Hi @Amit_B,

 

If the entire data is not in a changing/updating structure, I recommend that you do these operations by dividing your data into batches.

For example, if the data of the last month changes, you do this operation once for the data other than the first month and save it as a QVD. Then, you do the same operation quickly for the data of the first month and concatenate it with the old data.

Or, instead of saving it with QVD, you run the old data once in a new application and leave it as a data model and combine it with the new data you have processed with a binary load.

 

Sincerely,

dmnas
Contributor II
Contributor II


@Amit_B wrote:

Hi, daisy management reviews

I have a data table from a car rental company. The table contains around 60 million records showing the status of each vehicle on each day (not every vehicle appears every day).

I'm trying to add a flag field that marks the first occurrence of each vehicle with a 1.

I tried doing this using Min(Date), and also by sorting and using Previous(). The result is correct, but the load time is very long (over an hour).

Are there any other ways to add this flag without such a long load time?

Thanks!


ChatGPT said:

Use Qlik’s FirstSortedValue() with a pre-aggregated table or apply RowNo() over a sorted resident table grouped by VehicleID—it's faster than Previous() or full joins and helps reduce load time.

dmnas
Contributor II
Contributor II

Thanks for sharing. It really helpful for me. Cabinet Company


Reinstatement contractor Singapore
MyCryptoParadise Review

marcus_sommer

I think this are already the main-approaches to get this kind of information. But there may some potential in the way how these approaches are implemented - respectively how these measurements could be included within an incremental logic. Because aggregations and also interrecord-functions are in general heavy and against 60 M of records it must take some time. I could imagine doing some thing like this:

m:
mapping load ID & '|min|' & date(min(date)), 1 from X group by ID;
mapping load ID & '|max|' & date(max(date)), -1 from X group by ID;

and then a call like:

applymap('m', ID& '|min|' & Date, applymap('m', ID& '|max|' & Date, 0)) as Flag

and in scenarios in which a numbering and/or any kind of offsets between the records is needed then applying interrecord-function (important - referencing the current table and not the resident one).

Amit_B
Creator II
Creator II
Author

Hi,

I'm using binary loading then loading data from the DB, that won't help me in the case I mentioned (from time to time, I do a full load from the DB).

I'd be happy to know if you have a way this could help me.

Amit_B
Creator II
Creator II
Author

Hi,

I tried to do it but it didn't help, and the loading time even increased.

Can you add an example so I can see that I did it correctly?

marcus_sommer

Each incremental approach has no one-way-logic else needs at least two steps - the initial load and then incremental ones. It's even quite normal that there more branches within the load- and/or the task-chains to switch between the initial load and then conditionally for full/partial/incremental loads whereby the full/partial ones could run within other time-frames.  

RafaelBarrios
Partner - Specialist
Partner - Specialist

Hi @Amit_B 

you can try using a join, something like this

 

Main_Table:
Load
    num(DateField) as DateNumValue,  //convert date to num to make it a bit faster
    *
FROM <where ever you are pulling from>
 
//load from previous tables with only one record per car with min date
left join(Main_Table)
Load
    min(DateNumValue) as DateNumValue,
    CarCode,
    1 as flagField
RESIDENT Main_Table
GROUP BY CarCode;
 
Result will look like this one
RafaelBarrios_0-1748204020271.png

in my case, only car 90 and 100 apeard a couple of times

for sure, the join will be faster than using previous or a for/while loop.

 

Hope this helps.

BEST,

help users find answers! Don't forget to mark a solution that worked for you & to smash the like button!