Hi,
I have a data table from a car rental company. The table contains around 60 million records showing the status of each vehicle on each day (not every vehicle appears every day).
I'm trying to add a flag field that marks the first occurrence of each vehicle with a 1.
I tried doing this using Min(Date), and also by sorting and using Previous(). The result is correct, but the load time is very long (over an hour).
Are there any other ways to add this flag without such a long load time?
Thanks!
Hi @Amit_B,
If the data as a whole is not constantly changing/updating, I recommend doing these operations in batches.
For example, if only the last month's data changes, run this operation once over everything except the last month and save the result as a QVD. Then, on each reload, run the same operation quickly on the last month's data only and concatenate it with the old data.
Alternatively, instead of saving a QVD, process the old data once in a separate application and leave it there as a finished data model; then bring it in with a binary load and combine it with the newly processed data.
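A minimal sketch of the QVD variant; the field names (VehicleID, Date, Status) and the lib:// paths are placeholders:

// one-off job: flag the static history and store it as a QVD
History:
LOAD VehicleID, Date, Status
FROM [lib://Data/vehicle_history.qvd] (qvd);

// earliest date per vehicle, joined back as the flag
LEFT JOIN (History)
LOAD VehicleID, Date(Min(Date)) as Date, 1 as FirstFlag
RESIDENT History
GROUP BY VehicleID;

STORE History INTO [lib://Data/vehicle_history_flagged.qvd] (qvd);
DROP TABLE History;

The recurring reload then only has to process the changed month and concatenate it onto vehicle_history_flagged.qvd. One caveat: a vehicle whose first occurrence already lies in the history must not be flagged again in the new slice, so the new slice needs to be checked against the stored vehicles (for example with Exists(VehicleID) after loading the history first).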
Sincerely,
Use Qlik's FirstSortedValue() with a pre-aggregated table, or apply RowNo() over a sorted resident table grouped by VehicleID. Either is faster than Previous() or full joins and helps reduce load time.
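A sketch of the pre-aggregation variant, assuming the big table is already loaded as Vehicles with fields VehicleID and Date (names are placeholders):

FirstDates:
LOAD
    VehicleID,
    FirstSortedValue(Date, Date) as Date,  // the Date with the lowest sort weight, i.e. the earliest one
    1 as FirstFlag
RESIDENT Vehicles
GROUP BY VehicleID;

// attach the flag to the matching rows of the big table
LEFT JOIN (Vehicles)
LOAD VehicleID, Date, FirstFlag
RESIDENT FirstDates;

DROP TABLE FirstDates;

After the join, only the first occurrence of each vehicle carries FirstFlag = 1; all other rows keep a null.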
I think these are already the main approaches for getting this kind of information. But there may be some potential in how these approaches are implemented, respectively in how these measures could be embedded within an incremental logic, because aggregations as well as inter-record functions are heavy in general, and against 60 M records they must take some time. I could imagine doing something like this:
m:
mapping load ID & '|min|' & date(min(date)) as Key, 1 as Flag from X group by ID;
mapping load ID & '|max|' & date(max(date)) as Key, -1 as Flag from X group by ID;
and then a call like:
applymap('m', ID & '|min|' & Date, applymap('m', ID & '|max|' & Date, 0)) as Flag
And in scenarios where a numbering and/or some kind of offset between the records is needed, apply the inter-record functions there (important: reference the table currently being loaded, not a resident copy).
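A sketch of that pattern with a preceding load, using the placeholder names from above and assuming X is sorted by ID and Date:

result:
load *,
    // inter-record functions evaluated against the table being built
    if(ID = previous(ID), peek('SeqNo') + 1, 1) as SeqNo;
load ID, Date
resident X
order by ID, Date;

previous() looks at the input record of the preceding load and peek() at the rows already written, so no second resident pass over the 60 M rows is needed for the numbering.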
Hi,
I'm already using a binary load and then loading the data from the DB, so that won't help me in the case I mentioned (from time to time I do a full load from the DB).
I'd be happy to know if you have a way this could still help me.
Hi,
I tried to do it but it didn't help, and the loading time even increased.
Can you add an example so I can see that I did it correctly?
An incremental approach never has a single one-way logic; it needs at least two steps, the initial load and then the incremental ones. It's quite normal to have several branches within the load and/or task chains to switch between the initial load and then, conditionally, full/partial/incremental loads, whereby the full/partial ones could run in other time frames.
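A sketch of such a branching, assuming a variable vLoadMode set by the task chain, placeholder QVD paths, and the mapping table m from the snippet above:

if '$(vLoadMode)' = 'initial' then

    // initial branch: one heavy pass over the complete history
    facts:
    load ID, Date, Status
    from [lib://Data/facts_full.qvd] (qvd);

else

    // incremental branch: reuse the already flagged result and
    // apply the heavy logic to the small delta only
    facts:
    load ID, Date, Status, Flag
    from [lib://Data/facts_flagged.qvd] (qvd);

    concatenate (facts)
    load ID, Date, Status,
        applymap('m', ID & '|min|' & Date, 0) as Flag
    from [lib://Data/facts_delta.qvd] (qvd);

end if

The occasional full load from the DB then simply runs the initial branch again and overwrites the stored QVDs.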
Hi @Amit_B
you can try using a join, something like the sketch below.
In my case, only cars 90 and 100 appeared a couple of times.
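A hedged reconstruction of the join idea, with inline sample data in which only cars 90 and 100 occur more than once (field names are illustrative):

Rentals:
LOAD VehicleID, Date(Date#(RentalDate, 'YYYY-MM-DD')) as RentalDate INLINE [
VehicleID, RentalDate
90, 2024-01-01
90, 2024-01-05
100, 2024-01-02
100, 2024-01-07
110, 2024-01-03
];

// earliest date per vehicle, joined back onto the matching rows
LEFT JOIN (Rentals)
LOAD VehicleID, Date(Min(RentalDate)) as RentalDate, 1 as FirstFlag
RESIDENT Rentals
GROUP BY VehicleID;

After the join, the first occurrence of each vehicle carries FirstFlag = 1 and every other row a null; Alt(FirstFlag, 0) turns the nulls into an explicit 0 if required.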
For sure, the join will be faster than using Previous() or a for/while loop.
Hope this helps.
BEST,