stevelord
Specialist

Functional question, concatenating dozens of tables?

Hi, I have a request for a single date-selection solution for a dashboard with many date fields, each specific to the kind of activity it came from. End users are also fine having dates with no activity factored into averages and the like: if exerciselog has data for 6 of the 7 calendar days, they want those no-activity days to deflate the average.

My understanding is that concatenating tables wherever possible is less demanding than a link table when millions of records are involved. The short story is that I have tens of millions of rows, dozens of columns, and maybe 15 tables, and I'm worried that concatenating it all into one super table might crash something. If the short answer is 'yes, concatenate like crazy and rename every date field to the same field name', I'll go for it. I thought I'd run this idea by the community before spending much time implementing it and crashing things. Much more detail is below.

I have these basic data tables:

Clients - basic info about clients, links to offices on client id

Offices - basic info about offices within clients; links to clients on client id and to users on office id

Users - contains user id and basic data about users, links to offices on office id

From there I have these activity tables

exerciselog

sleeplog

foodlog

(and many more)

The activity tables all connect back to users on user id, and their fields are all unique to them (for instance exerciselog has exercisedate and sleeplog has sleepdate).

Master calendar table I'm going to throw in:

// Find the overall date range of the concatenated fact table
Temp:
Load
    Min(Num(ActivityDate)) as minDate,
    Max(Num(ActivityDate)) as maxDate
Resident AllData;

Let varMinDate = Num(Peek('minDate', 0, 'Temp'));
Let varMaxDate = Num(Peek('maxDate', 0, 'Temp'));
DROP Table Temp;

// Generate one row per calendar day between the min and max dates
TempCalendar:
LOAD
    $(varMinDate) + IterNo() - 1 as Num,
    Date($(varMinDate) + IterNo() - 1) as TempDate
AutoGenerate 1
While $(varMinDate) + IterNo() - 1 <= $(varMaxDate);

// Build the calendar dimensions from the generated dates
MasterCalendar:
Load
    Date(TempDate) as FileDate,
    Year(TempDate) as FileYear,
    Month(TempDate) as FileMonth
Resident TempCalendar
Order By TempDate ASC;

Drop Table TempCalendar;

My idea is this:

1> rename all the activity-specific date fields to 'activitydate' in all of those tables

2> stick the word Concatenate in front of every load after the first table (I'd give up the ability to locate dimensions by table name in the object wizards, but the entire rest of the dashboard is finished, so I'm not too worried)

3> make a master calendar that links to the above supertable on the activitydate field, with the standard run of year(activitydate) as activityyear, month(activitydate) as activitymonth, etcetera - see the sketch after this list.
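
For illustration, the concatenation step might look like the sketch below. This is a minimal sketch assuming hypothetical QVD names and field lists; the point is only that each activity table renames its own date field to the shared ActivityDate before being concatenated:

AllData:
LOAD
    userid,
    exercisedate as ActivityDate,  // rename to the shared date field
    exerciseminutes                // hypothetical activity-specific measure
FROM exerciselog.qvd (qvd);

Concatenate (AllData)
LOAD
    userid,
    sleepdate as ActivityDate,
    sleephours                     // hypothetical
FROM sleeplog.qvd (qvd);

// ...repeat for foodlog and the remaining activity tables.
// Fields missing from a given table are simply null in the supertable.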

Will anything explode or cause me problems down the road? I've done this with smaller data sets, but I'm wary of doing it with millions of records populating dozens of fields.

8 Replies
ramoncova06
Specialist III

You should be fine doing it; QV will not explode.

I would just add a distinct load to your script so your max and min dates are calculated faster:

// Reduce to distinct dates first so the min/max aggregation scans fewer rows
TempLoad:
Load distinct
    ActivityDate
Resident AllData;

Temp:
Load
    Min(Num(ActivityDate)) as minDate,
    Max(Num(ActivityDate)) as maxDate
Resident TempLoad;

Let varMinDate = Num(Peek('minDate', 0, 'Temp'));
Let varMaxDate = Num(Peek('maxDate', 0, 'Temp'));

DROP Tables Temp, TempLoad;

Anonymous
Not applicable

For best performance, load each table separately and use the STORE command to write them out as QVD files from a separate qvw, typically referred to as a generator. Then, in a separate qvw (your dashboard), concatenate the QVD files.

You can load a million rows in under 20 seconds even with an unoptimized load, and faster still if the load is QVD-optimized.
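
As a minimal sketch of that generator pattern (assuming a database connection has already been established; the table and file names here are placeholders):

// Generator qvw: extract once, write to QVD, drop the in-memory table
ExerciseLog:
SQL SELECT * FROM exerciselog;

STORE ExerciseLog INTO exerciselog.qvd (qvd);
DROP Table ExerciseLog;

The dashboard qvw then loads from exerciselog.qvd (qvd) instead of hitting the database; a plain LOAD with no transformations keeps the QVD read optimized.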

stevelord
Specialist
Author

Thanks Ramon, load distinct is a nice touch.

Thanks Mike. These are being loaded from qvds that were built by earlier QlikView apps.

We have exactly one qvw pulling everything from the databases (it's like a SQL library unto itself); it stores the results into various qvds of the appropriate kind and drops its tables when done.

From there we have a few qvws transforming some of the data to fit business needs that were identified later: a few very large crosstables and one merging a couple of other tables.

Dashboards are just pulling stuff from the qvds.  Most of the process happens overnight, but the qvds are there for us to access data on our server without burdening the database servers during business hours.

For reporting we were mostly fine answering one or two questions for large groups of people using separate date fields on each sheet related to that topic.

For a one-on-one consulting function, there was a need for a person to see everything from every topic about a single individual on one sheet.  Having many different date fields on that one sheet looked crazy, so we began the master calendar work.

Anonymous
Not applicable

If I'm working with many date fields, I personally like to create a date link table. For example, if you have a book date, review date, and close date in a table, you can link them all through a common date field to a master calendar.

For example:

Fact:
Load
    ID,
    BookDate,
    ReviewDate,
    CloseDate,
    #Amount
From Table;

// One link row per ID per date role, all feeding a single Date field
LinkTable:
Load
    ID,
    'Booked' as DateType,
    BookDate as Date
Resident Fact;

Concatenate (LinkTable)
Load
    ID,
    'Reviewed' as DateType,
    ReviewDate as Date
Resident Fact;

Concatenate (LinkTable)
Load
    ID,
    'Closed' as DateType,
    CloseDate as Date
Resident Fact;

Then proceed to create your MasterCalendar from the Date Field in the LinkTable.
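
That step can follow the same shape as the master calendar script earlier in the thread; a minimal sketch, deduplicating on the link table's Date field:

MasterCalendar:
Load distinct
    Date,
    Year(Date) as Year,
    Month(Date) as Month
Resident LinkTable;

Note this only covers dates that actually occur; for stevelord's requirement that no-activity days count against averages, feeding the Date field into the min/max plus AutoGenerate pattern shown above gives a gap-free calendar instead.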

Now in the dashboard, if you want to see the amount you booked vs the amount you closed, you can simply write Sum({<DateType = {'Booked'}>} #Amount) and Sum({<DateType = {'Closed'}>} #Amount), and use your generic master calendar as the listbox for date selections.

stevelord
Specialist
Author

Thanks Mike.  I was also thinking ahead to a possible need to separate the dates specific to activities once we get to the other side of the master calendar.  This is a handy approach to mark those dates.

Edit: for the main activities, the fields whose values we sum are actually named for the activity, so it's easy to sum(fieldname) on that front. I was planning to tie activitylogdate to the master calendar. This link table approach would likely be used to deal with the fact that we have not only activitydate for the date the person logged activity, but also createddate and modifieddate for when the entries were actually entered or changed - see the sketch below.
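
Adapting that pattern to the activity logs might look like the following sketch - hypothetical field names based on the description, and it assumes each log row has its own unique key (logid here), since linking on userid alone would fan out across every row for that user:

// One link row per log entry per date role
DateLink:
Load
    logid,
    'Activity' as DateType,
    activitydate as Date
Resident exerciselog;

Concatenate (DateLink)
Load
    logid,
    'Created' as DateType,
    createddate as Date
Resident exerciselog;

Concatenate (DateLink)
Load
    logid,
    'Modified' as DateType,
    modifieddate as Date
Resident exerciselog;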

stevelord
Specialist
Author

Update: I may be discovering a limit on the size of the concatenated table after all.

Physical memory usage is at 93%, and I've run it higher than that before, but the load appears to have stopped between tables 6 and 7 with a few more tables to go. It may be at 11-12 million rows, and I haven't counted how many dozens of fields.

I did go through and store all the tables that involved join or crosstable functions into qvds, then reloaded them plain for the purposes of the master calendar concatenation. And I'm pretty sure all of the temporary tables are dropped. (The only failed attempt before this was when I forgot to update a reference to a table name for one of the tables I had stored into a qvd.)

I had added 2 fields to each table: one referencing the date field I wanted, calling it ActivityDate, and another with a tag like 'Activity' as ActivityDateType, then concatenated the wazzoo out of them. I may fall back on setting up the date-and-datetype link table - it may be the same millions of rows, but with only a few columns in the table, so maybe that will help.

PS> Nothing exploded. It just stopped. I used Task Manager to kill the process, but I had saved successfully before attempting that reload, so all of the data was still there. (I had to reorganize a lot. Now I just need to remove the concatenates and set up the link table.)

ramoncova06
Specialist III

I worked with a data set that had around 1 billion records, and I didn't have any issues with how much data I concatenated into the fact table. I did have some issues once the file size got to around 8 GB (this was corrected with QV 11.0.2). If you feel you should still have plenty of RAM, check for a Cartesian product.
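
One quick way to spot the usual culprit - a key that is unexpectedly non-unique on both sides of a join - is a grouped count before joining; a minimal sketch with a hypothetical table and key name:

// If KeyCount > 1 here and the key also repeats in the other table,
// a join on userid will multiply rows (Cartesian-style blowup).
KeyCheck:
Load
    userid,
    Count(userid) as KeyCount
Resident Users
Group By userid;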

stevelord
Specialist
Author

The qvw is around 360 MB on disk. The physical memory it was reaching when concatenating tables was around 3.5 GB. My theory is it just hit some maximum number of columns and decided to stop. It didn't come very close to the physical memory capacity of the server; I've done that before and just split things into smaller batches, and I've also crashed the server once or twice with calculations that ran amok unexpectedly. Here the load just got to a point far into the concatenates, a few tables from the end, and stopped doing anything for 10 minutes before I abandoned it.