Hi,
I'm a QV novice so please be patient with me...
I'm building a solution for an energy company, and was asked to provide a report that gives an accurate count of active connections for each calendar day.
The interval table, named 'Link' contains the following fields:
%Connection_ID - Key to Connection table
%SupplierRegister_ID - "PK" for 'Link' table
StartSupplyDate_ID - Connection start date (Integer form of date. e.g. 20090728)
EndSupplyDate_ID - Connection end date (Integer form of date. e.g. 20090728)
%Customer_ID - Key to Customer table
This table currently contains 1.6M records. It is connected, using IntervalMatch, to the Master date table.
The Master date table contains a row for each date since January 1st, 2008 (570 records).
Code:
// Replace NULL end dates with $(vDateKey), so open-ended connections still match
// (NullAsValue substitutes the contents of the NullValue variable for NULLs)
SET NullValue = $(vDateKey);
NullAsValue EndSupplyDate_ID;
Link:
LOAD %Connection_ID,
%SupplierRegister_ID,
StartSupplyDate_ID,
StartSupplyMessage_ID,
EndSupplyDate_ID,
EndSupplyMessage_ID,
%Customer_ID,
_Link_created_date
FROM [QVDs\LinkTable.qvd] (qvd);
Connection:
LOAD %Connection_ID,
EAN,
[Connection Product],
[Connection Street],
[Connection HouseNo],
[Connection HouseNoExtra],
[Connection Postcode],
[Connection City]
FROM [QVDs\Connection.qvd] (qvd);
Customer:
LOAD %Customer_ID,
[Customer Name],
[Customer PhoneNo],
[Customer Email],
[Customer BirthDate],
_Customer_created_date
FROM [QVDs\Customer.qvd] (qvd);
Date:
LOAD
%Date_ID,
Date,
CY_Flag,
Weekday,
[Week Number],
Year,
Month,
Day,
Quarter,
(Year * 10000) + (Month * 100) + Day AS DateKey
FROM [QVDs\DATE.qvd] (qvd);
// Extended IntervalMatch: the match field (DateKey) comes first, then the key field;
// the LOAD must list the interval start, the interval end, then the key, in that order.
LEFT JOIN (Link)
INTERVALMATCH (DateKey, %Connection_ID)
LOAD StartSupplyDate_ID,
EndSupplyDate_ID,
%Connection_ID
RESIDENT Link;
Because of the sheer volume of data, I keep getting the notorious VIRTUAL MEMORY error. After the join, the Link table exceeds 50M records, which is a major performance overhead. As this is part of a POC project, we would like to show that QV can handle large amounts of data without killing the server...
Maybe IntervalMatch is not the right approach? Is there another solution?
Help will be greatly appreciated!
Dror
Hi,
I can see from the code that you are creating multiple records for each connection ID. For example, if a connection has a start date in Jan 2008 and an end date of today, it will create that many records for that ID (570 records for the same connection, each with a distinct DateKey). My question is: do you really need data at this grain? If yes, what purpose does it serve?
Thanks
Amit
Amit,
I'm aware of the fact that a single connection will appear once for each distinct DateKey within its date span. The idea is to be able to provide analysis on a per-day basis. You could limit the subset to week intervals, but assuming there are roughly 700,000 active connections at any given time, even limiting to the current year yields a result set of 36.4M records. Still too much...
The purpose of this analysis is to show a breakdown of active connections across various period intervals, broken down by several dimensional properties (that's why I've used the link (star) approach).
Thanks
Dror
First, I hope you are trying this in 64bit, not 32bit.
Is the only purpose of the IntervalMatch to link %Connection_ID to various periods? That is, select a [Week Number] and show all the %Connection_ID that were active that week? If so, can you use a simple IntervalMatch instead of the extended form? Meaning:
LEFT JOIN (Link)
INTERVALMATCH (DateKey)
LOAD StartSupplyDate_ID, EndSupplyDate_ID
RESIDENT Link;
Don't know if that may work out better. IntervalMatch does generate a lot of rows. Set Analysis is another approach you may consider.
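If you go the Set Analysis route, something like this as a chart expression might do it (untested, and assuming a variable vSelectedDate that holds the selected DateKey; the variable name and expression are just a sketch, not code from your app):
Count({<StartSupplyDate_ID = {"<=$(vSelectedDate)"}, EndSupplyDate_ID = {">=$(vSelectedDate)"}>} DISTINCT %Connection_ID)
That counts the connections whose supply interval spans the selected date without expanding the Link table at all.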
Any chance you can post a QVW with a small amount of records? If you can upload a qvw, maybe we can make a better analysis of the problem. See http://community.qlik.com/wikis/qlikview-wiki/preparing-examples-for-upload-reduction-and-data-scram... if you have concerns about privacy.
-Rob
Hello Dror,
Sorry, I accidentally hit the 'Suggest as Answer' button on your post.
I had a similar situation in one of my QV applications: we were using IntervalMatch, which created millions of records and began causing memory issues. Below is what we did in that case:
1) Capture one record per ID for each month from the start date till the end date, keeping the first day of the month as the date key.
2) If there is a change in any attribute of the ID, capture that as a new record, with the date key set to the actual date the change happened.
Take the example of a stock 'XYZ' that an analyst rated from Jan 08 till Jan 09. In the current scenario, we would create 365 records for this stock. However, if its rating changed only twice during that period, we can work with just 14 records (12 records for the 12 months plus 2 records where the rating changed).
Next, you create a date island table which is not linked to any other table in the model and has all the date attributes. The chart where you want to show data across intervals should then compare its date key with the date from the island table.
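Roughly, the monthly expansion looked something like this (a sketch from memory, reusing your field names; not the exact script we used):
// One row per calendar month the connection was active,
// keyed on the first day of that month (IterNo() drives the WHILE loop):
MonthlyLink:
LOAD %Connection_ID,
Date(MonthStart(Date#(Text(StartSupplyDate_ID), 'YYYYMMDD'), IterNo() - 1)) AS MonthKey
RESIDENT Link
WHILE MonthStart(Date#(Text(StartSupplyDate_ID), 'YYYYMMDD'), IterNo() - 1)
<= Date#(Text(EndSupplyDate_ID), 'YYYYMMDD');
With ~700K connections averaging a year or two of supply, that's on the order of 12-24 rows per connection instead of one per day.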
Hope this might be helpful in your case....
Thanks
Amit
Amit, Rob
Thanks for your help. It has certainly given me a lead...
I will try to post a scaled-down app later.
Dror
Hi Dror,
Did you get any solution to this? I am facing the same issue.
Thanks,
Rajni Batra