Qlik Community

Qlik Design Blog

All about product and Qlik solutions: scripting, data modeling, visual design, extensions, best practices, etc.

Employee
Employee

How to populate a sparsely populated field

Sometimes when you load data into QlikView you find that a field is sparsely populated, i.e. it has discrete enumerable values where some values are missing.

 

JoinPeekTable1.pngIt could be like in the table to the right where you have three dates, each with some kind of conversion rate. The table only contains the dates where the conversion rate changed, not the dates between the changes.

 

However, the user will want to ask the question: “What was the status on this specific day?” In other words, the user wants to be able to click on a reference date to see the number that is associated with this date – but the date might not exist in the source data.

 

In such a situation, you need to generate the missing dates between the changes as individual records and use the “Rate” value from the previous date.

 

There are several ways to do this in QlikView, and all of them involve some script programming, using temporary tables. One algorithm is

 

  1. Load the source table containing the rates (below called “Rates”).
  2. Find largest and smallest date in the “Rates” table.
  3. Generate all dates between the largest and smallest dates (below called “Dates”)..
  4. Join the “Dates” table (outer join) onto the “Rates” table.
  5. Sort the resulting table according to date.
  6. Propagate the value of “Rate” downwards to all records that have NULL in the “Rate” field, using the Peek() function.

 

Visually, the join and peek steps of the algorithm look like this:

 

JoinPeekProcess100.png

 

In the QlikView script, the algorithm would look like the following:

 

     TempTable_Rates:
     Load Date, Rate From Rates ;

 

     MinMaxDate:
     Load Min (Date ) as MinDate , Max (Date ) as MaxDate resident TempTable_Rates;
     Let vMinDate = Peek('MinDate',-1,'MinMaxDate') - 1;
     Let vMaxDate = Peek('MaxDate',-1,'MinMaxDate')    ;

 

     Join (TempTable_Rates)

     Load Date(recno()+$(vMinDate)) as Date Autogenerate vMaxDate - vMinDate;

 

     Rates:

     NoConcatenate Load Date,
          If( IsNull( Rate ), Peek( Rate ), Rate ) as Rate
          Resident TempTable_Rates
          Order By Date ; /* so that above values can be propagated downwards */

 

     Drop Table MinMaxDate, TempTable_Rates;

 

Problem solved!

 

This method can be adapted for most situations when you need to insert additional records in existing data: Warehouse balances, Exchange rates, etc.

 

See more in the Technical Brief: Generating Missing Data In QlikView.

 

HIC

41 Comments
MVP
MVP

I like the use of the inline Peek here, not something I have tried before.

Something I have found is with large amounts of data that doing a Min and Max on the table to get the first and last dates can be time consuming in the load.  In these cases I will sometimes let the DB do the work and call a separate SQL query to give the Min and Max values rather than a Resident load.

Thanks for the ever interesting posts Henrich!

0 Likes
595 Views
Employee
Employee

Glad you like it. The inline Peek() is something I use quite often. It's also very useful for making accumulations in the script.

Concerning the Min and Max: I absolutely agree! If you have large amounts of data and a good DB, there is no reason to do anything else than an SQL query.

HIC

0 Likes
595 Views
mr_barriesmith
Contributor

Hi. If you have large volume Resident table and you want to get max or min then take a look at the field functions... it is hard to believe how fast this is.

fieldvalue() - QlikView is storing the field as a column of data and this function allows you to get to that column and fieldvaluecount() can get you to the max.

0 Likes
595 Views
MVP
MVP

Good post as usual Henric,

I've addressed this before using IntervalMatch() as in this post (and variations on the same theme). What I have not tested though is how does this performs agains large datasets.

Regards.

0 Likes
595 Views
jeffmartins
Contributor II

Hi,

I've had time consuming problems using resident load to get min and max dates and the fieldvalue() function saved me a lot of time.

Instead of using resident load I've created MinMaxDate table as follow:

MinMaxDate:

Load min(Date) as MinDate,

        max(Date) as MaxDate;

Load FieldValue('Date',IterNo()) as Date

autogenerate(1)

while not IsNull(FieldValue('Date',Iterno()));

Nice post HIC

0 Likes
595 Views
jeffmartins
Contributor II

Hi,

Does anyone know if there is a better way to get min and max dates ?

The previous MinMaxDate table was the fastest I got after doing some tests.

Regards

0 Likes
595 Views

Jeff,

I found the FieldValues() method to be the fastest. However, I've always done it like this:

LOAD max(FieldValue('Date', recno())) AUTOGENERATE FieldValueCount('Date');

I don't know if this is detectably faster or slower than your while loop.

If the field exists in more than one table, and you want the values from a specific table, of course you have to use to RESIDENT method. In the subroutine posted here http://community.qlik.com/message/313808#313808, I'm using RESIDENT because I can't be sure if the field appears in more than one table.

-Rob

595 Views
Not applicable

Hi Henric

Can we extend this table till today? Assuming today is 2013-01-12 and the rate has not changed since 2013-01-08

0 Likes
595 Views
Employee
Employee

Yes, just change the definition of vMaxDate, e.g:

Let vMaxDate = Num(Today())    ;

HIC

595 Views
Not applicable

Hello Henric,

I'm a beginner at Qlikview, but I had a basic question about your suggested algorithm.

I am specifically having difficulty with your temp table:

TempTable_Rates:
Load Date, Rate From Rates ;

When I've tried to replicate this table for my own purposes, I get the following error:

Cannot open file 'C:\Users\Steve Giardina\Desktop\Rates

Is there some syntax I'm missing to get my script to reference the Rates table at the end of the algorithm?

Thanks!

0 Likes
595 Views
gussfish
Contributor II

Steve,

the "FROM" keyword is for when you want to load from a file; to load

from an internal table, use "RESIDENT" i.e.

TempTable_Rates:

Load Date, Rate RESIDENT Rates ;

Kind Regards,

Angus.

"This e-mail and any attachments to it (the "Communication") is, unless otherwise stated, confidential, may contain copyright material and is for the use only of the intended recipient. If you receive the Communication in error, please notify the sender immediately by return e-mail, delete the Communication and the return e-mail, and do not read, copy, retransmit or otherwise deal with it. Any views expressed in the Communication are those of the individual sender only, unless expressly stated to be those of Australia and New Zealand Banking Group Limited ABN 11 005 357 522, or any of its related entities including ANZ Bank New Zealand Limited (together "ANZ"). ANZ does not accept liability in connection with the integrity of or errors in the Communication, computer virus, data corruption, interference or delay arising from or in respect of the Communication."

0 Likes
595 Views
Not applicable

Hi Henric,

Is the reverse population possible? I mean I have data as below (as of month end). I want to populate it on all month dates, for example, 01-01-2013 to 31-01-2013 has rate of 0.98

Date               rate

31-01-2013     0.98

28-02-2013     0.9776

31-03-2013     0.9766

0 Likes
595 Views
Employee
Employee

Absolutely! The above algorithm can be used. All you need to do is the following:

1. The ORDER BY should be descending, i.e.

... ORDER BY Date DESC ;

2. The definition of the field Rate should be made slightly differently, probably something like:

... If( Peek( Month(Date) )=Month(Date), Peek( Rate ), Rate ) as Rate

HIC

595 Views
Not applicable

Great Share Henric!!! 

-Yojas

0 Likes
595 Views
Not applicable

Very good step-by-step description of how to solve this commonly recurring problem! Thanks for sharing!

0 Likes
595 Views
sunilkumarqv
Valued Contributor II

What incase if multiple data i.e columns in database here you taken only rates. How to generate dates b/w them?

0 Likes
595 Views
Employee
Employee

Can you post an example of what your input table looks like?

HIC

0 Likes
595 Views
sunilkumarqv
Valued Contributor II

Date,DataID,ProdA,ProdB,SubProducts,Rate,SubValue,Revenu

31-03-2012,Pen,Soap,Lux,10,50

31-03-2013,Pen,Soap,Renolds,20,60

There what you defined Dates b/w Rate .But here we have no of columns with diff categoirs that not unique.

How can solve that ?

0 Likes
595 Views
Not applicable

Henric Cronström

Hi HIC

Can we use it to generate monthly straight line depreciation/amortization schedules? I mean if we give Start Date Field, End Date Field, Cost, Life in months.

Thanks in advance!

BR,

SAK

0 Likes
595 Views
swijnans
New Contributor III

Hi Henric,

I used your technical brief in an attempt to generate missing data on a multiple component table, but failed miserably ;-) I created a post. Would you be able to find the time to assist me on this matter? http://community.qlik.com/thread/128328

Regards, Suzan

0 Likes
595 Views
swijnans
New Contributor III

Hi Sunil.

I have a similar challenge. Did you solve it? http://community.qlik.com/thread/128328

Regards, Suzan

0 Likes
595 Views
sunilkumarqv
Valued Contributor II

Hi Suzan,

After several attempts. I'm able to crack that .

In My scenario :

as follows  synchronize this script into your scenario based

MinMaxTable:

  Load min(Date) as MinDate,

  Max(Date) as MaxDate

  resident Tablename;

  // floor converts to integer to strip time component

  Let vDateZero =floor(Num(peek('MinDate',0,'MinMaxTable'))) - 1 ;

  Let vLastDate = floor(Num(peek('MaxDate',0,'MinMaxTable'))) ;

  Let vFirstDate = floor(Num(peek('MinDate', 0, 'MinMaxDate'))) ;

  Drop Table MinMaxTable ;

Join(TableName)

  // This load runs 2nd to generate date components

  Load *,

  MonthEnd(Date) as MonthEndDate,

  AutoNumber(MonthEnd(Date)) as MonthEndID,

  year(Date) as Year,

  Month(Date) as Month;

 

  // This load runs 1st

  Load Date($(vDateZero) + recno()) as Date 

  Autogenerate $(vLastDate) - $(vDateZero);

PLS:

     NoConcatenate Load Date,

          If( IsNull( FieldName ), Peek( FieldName ), FieldName ) as FieldName ,

         If( IsNull( FieldName 2), Peek( FieldName 2), FieldName2 ) as FieldName

       

           Resident TableName

          Order By Date ; /* so that above values can be propagated downwards */

0 Likes
595 Views
Not applicable

Hi Hendric,

I have read your generating missing data pdf file.We have a situation similar to,  Populating a table with warehouse balances. But in our case we have a three dimension data instead of two. We have date, store and Item. The item price comes from SalesOrder table which only give the price on a particular date. I have tired to adopt your example given for two dimension, but till now not able succeed. We need to populate the price of an item in a store for all the missing dates. The price will be same from the date the salesorder is released to next sales order date. if there is no sales order for the last one month, then we need to populate the last month's price to till date. Any pointers to deal with three dimension is appreciated.

Thanks & Regards

Kichenin

0 Likes
595 Views
Not applicable

Excellent blog just what I was looking for!

0 Likes
595 Views
annabergendal
Contributor

Hi,

I use your method but end up with a script error:

Error in expression: ')' expected

Join (Qty_Tmp)

     Load Date(recno()+) as MonthYear

     Autogenerate vMaxDatePeriod - vMinDatePeriod;

What could be the problem? Any help is appreciated. The code the cause the failure:

Join (Qty_Tmp)

     Load Date(recno()+$(vMinDatePeriod)) as MonthYear

     Autogenerate vMaxDatePeriod - vMinDatePeriod;

0 Likes
595 Views
Employee
Employee

You have an unassigned variable in

     Load Date(recno()+$(vMinDatePeriod)) as MonthYear

The plus sign cannot be followed by a right parenthesis.

Run the script in the debugger to find out why vMinDatePeriod doesn't have a value.

HIC

0 Likes
595 Views
annabergendal
Contributor

Thanks,

it was like you said, the variable had no value because it was never assigned.

Now that I solved that I have another problem.

Instead of date I want to use month as time, since the table should be on aggregated level (sum of quantities):

Join (Period_Qty_in_stock)

     Load Date((recno()+$(vMinDatePeriod)),'YYYY-MM') as MonthYear

     Autogenerate vMaxDatePeriod - vMinDatePeriod;

Qty_in_stock_per_period:

     NoConcatenate Load MonthYear,

          If( IsNull( Qty_in_stock_sum ), Peek( Qty_in_stock_sum ), Qty_in_stock_sum ) as Qty_in_stock_sum_per_period

          Resident Period_Qty_in_stock

          Order By MonthYear ;

But it doesn't work... In the join I get all dates instead of only the months... What should I do? Can I use Autogenerate and only get the months, not all dates?

0 Likes
595 Views
Employee
Employee

In principle, your method should work.

However, you must make sure that the MonthYear is defined the same way in both places. If you use a date serial number as month (~ 41000) then you can use

     Date( MonthStart( Date ), 'YYYY-MM' ) as MonthYear

But your Autogenerate creates one record per recno(), i.e. one record per date, so this must be changed to one record per month. One solution could be:

     Load distinct

          Date( MonthStart(recno()+$(vMinDatePeriod)),'YYYY-MM') as MonthYear

     Autogenerate vMaxDatePeriod - vMinDatePeriod;

Here the Autogenerate creates one record per month, but the distinct reduces it to one record per month. 

HIC

0 Likes
595 Views
annabergendal
Contributor

Thanks for the help, I realize I don't need Autogenerate though. I already have my TempTable (joining quantity table with periodcounter with months). But when I run

Qty_in_stock_per_period:

     NoConcatenate Load MonthYear,

          If( IsNull( Qty_in_stock_sum ), Peek(Qty_in_stock_sum), Qty_in_stock_sum ) as Qty_in_stock_sum_per_period,

          If( IsNull( Qty_in_stock_sum ), Previous(Qty_in_stock_sum), Qty_in_stock_sum ) as Qty_in_stock_sum_per_period_2

          Resident Period_Qty_in_stock

          Order By MonthYear;

         

          drop table Period_Qty_in_stock;

I still get null values for Qty_in_stock_sum_per_period, whilst in Qty_in_stock_sum_per_period_2 I get a value in the month with null after a month with value... Why can't the peek/previous find the values when null?

0 Likes
595 Views
Employee
Employee

Because Peek() looks in the output fields of the Load, whereas Previous() looks in the input fields of the Load. Peek(Qty_in_stock_sum) doesn't work since Qty_in_stock_sum doesn't belong to the output fields.

You need to use

     Qty_in_stock_per_period:

     NoConcatenate Load MonthYear,

          If( IsNull( InputQty ), Peek( OutputQty ), InputQty ) as OutputQty

          Resident Period_Qty_in_stock

          Order By MonthYear;

HIC

0 Likes
595 Views