Qlik Community

Qlik Sense Advanced Authoring

Discussion board where members can learn more about Qlik Sense Advanced Authoring.

Announcements
See why Qlik is recognized as a Leader for the 10th year in a row – and discover how Qlik can help put your business in the lead. Get Report
Highlighted
Partner
Partner

Loading only the necessary data in script

Hi all,

I get a large daily file with no date stamp in it. It has 20 fields in it and roughly 500,000 lines everyday. It is an extract of the state of the data for that particular date in time.

Example:

ID            Total Remaining                    Area

1               $100                                          Site 1
2               $75                                            Site 1
3               $60                                            Site 2

Tomorrow I might get another file that looks like this:

ID            Total Remaining                    Area Owed

1               $100                                          Site 1
2               $40                                            Site 1
3               $20                                            Site 2

Where the information for ID 1 did not change, and the "Total Remaining" changed for the other 2 IDs.
The ID field is an identifier, all other fields are descriptive of the ID and can change. 

I'd like to create a trend overtime, so I understand that I could just add a made up field each day that is Date() as "Loaded Date" to trend it. However, because the files are so large and I need to load ~5 years worth of daily files, it gets extremely large. I was wondering if there was a way to somehow only load data if it is different, but if it was the same, somehow mark that it is the same as the day before for the "loaded date". Or if there was another solution to reducing the size of the load, I'd love to hear it.

 

9 Replies
Highlighted
Luminary
Luminary

Re: Loading only the necessary data in script

Hello! It would be a little involved at the beginning, but you could create a QVD of all of the initial files, and create an ID out of ID&'-'&Total Remaining&'-'&Area.  once this is created, you can load this qvd first each day, then load the remaining files with a "Where Not Exists(NEW ID FIELD) in the load statement.  this will bring in only values that are not yet represented. This could also incorporate your add date if needed. Once its all loaded, you overwrite the original big qvd for the next day's reload. 

Highlighted
Partner
Partner

Re: Loading only the necessary data in script

Would I still be able to trend via that load date this way? 

Highlighted
Partner
Partner

Re: Loading only the necessary data in script

I don't understand how that would accommodate the date without creating full separate instances of things?

Highlighted
Partner
Partner

Re: Loading only the necessary data in script

Hi Melissa,

Do you really need it to be on ID level? Or can you group it on site level? This would reduce the amount of rows sufficiently.

Jordy

Climber

Work smarter, not harder
Highlighted
Partner
Partner

Re: Loading only the necessary data in script

Hi Jordy,

I unfortunately do really need it on an ID level.

Highlighted
Partner
Partner

Re: Loading only the necessary data in script

Hi Melissa,

That means that you need all the data, so you can't aggregate. Another solution would be to replace all the values in your columns with integers and create dimensions.

Example:

Old:
ID            Total Remaining                    Area Owed
1               $100                                          Site 1
2               $40                                            Site 1
3               $20                                            Site 2

Turn into:

ID            Total Remaining                    %KeyArea
1               $100                                         1
2               $40                                          1
3               $20                                          2

AreaTable
Load
  %KeyArea
  Area Owed
From [YourSource]  (qvd)
;

This way you reduce the amount of characters in your data and will make it faster.

Jordy

Climber

Work smarter, not harder
Highlighted
Partner
Partner

Re: Loading only the necessary data in script

Thanks, Jordy. I'll try that today!

Highlighted
Partner
Partner

Re: Loading only the necessary data in script

@jheasleythat way the data would not load if the value ever changes back to what it was in past.

eg:

Day1

ID Total Remaining Area

1           100                      1

Day2

ID Total Remaining Area

1            80                      1

Day3

ID Total Remaining Area

1           100                      1

Day3 data would not load because there would already be id 1-100-1

 

Luminary
Luminary

Re: Loading only the necessary data in script

Agreed - but it would if you included the date as part of the ID.