Hello guys,
I've been developing in QlikView for the last 8 months on and off (doing SQL and Java as well), and I've stumbled upon a major problem.
My company deals with user data for cellular companies.
We have a subscriber base of 10M+ subscribers for the medium-sized companies,
and for each subscriber we retain usage data (SMS, minutes of use, spending, ...) for the last 90 days,
so far = 900M rows.
Each subscriber can participate in ~700 marketing activities, and can be invited to each activity several times during the 90 days.
Using this data, I need to calculate and present, in various charts and tables, the usage for each subscriber, per marketing activity, relative to the distance of the usage date from the invitation date.
The hypothesis is that each subscriber can enter each MA 15 times at most,
so at worst we are talking about ~15 BILLION records.
The requirement is for this granularity of data (subscriber, MA, days from invitation).
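The row counts can be sanity-checked with quick back-of-envelope arithmetic. This Python sketch reproduces the 900M figure and shows how a 15x fan-out lands near the quoted ~15 billion; the exact worst-case formula is the poster's own assumption, not stated precisely in the thread:

```python
# Back-of-envelope sizing for the data model under discussion.
# All figures come from the thread; the 15x fan-out is the poster's
# own worst-case assumption (up to 15 MA entries relating to a usage row).

SUBSCRIBERS = 10_000_000      # medium-sized operator
DAYS_OF_USAGE = 90            # retained usage history per subscriber

usage_rows = SUBSCRIBERS * DAYS_OF_USAGE   # one row per subscriber-day
print(f"usage rows: {usage_rows:,}")       # 900,000,000

MAX_ENTRIES = 15              # worst-case invitation fan-out per usage row
worst_case = usage_rows * MAX_ENTRIES      # subscriber x MA x day granularity
print(f"worst case: {worst_case:,}")       # 13,500,000,000 (~15 billion)
```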
All the data model approaches I have tested (generic loads, crosstables, etc.) met with little success.
The data is simply huge.
I want to start a discussion on how to design the data model for the best possible performance.
Should I use one fact crosstable of some design, break it down into dimensions and a fact table, use key tables, try hashing, ApplyMap... I don't know.
I need to be able to load and process the data in ~3 hours and then present it. A QlikView table object takes forever to load that much data; charts behave better.
Please help; this is as much a data modeling problem as it is a technical challenge.
Thank you in advance,
Matan.
I haven't mentioned: the data is currently loaded from two fact tables:
Marketing activity (keys: MA_ID, SUBSCRIBER, INVITATION_DATE), ~3 GB
and
subscriber panel (keys: SUBSCRIBER_ID, PERIOD_START_DATE), ~18 GB.
Why would anyone analysing 10M users ever want it at the subscriber level?
It sounds like you might be trying to serve multiple purposes in the same document.
Would it be possible for you to change the requirements to something that is presentable, i.e. something like:
Count(Users)
Where MA is not null within last x days
MA, Count(Users) as TotalMaUsers
Date, MA, Count(Users)
Calendar
Although if you really want this, I would only load:
Subscriber_Integer, MA_Integer, Days_Integer
Create a mapping table from MA to integer.
Create a mapping table from user info to integer.
Create a mapping table from Days_Integer to the calendar.
Edit:
Also try and elicit anything from the users about what they don't want to analyze on, and cut it from the load.
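In QlikView the mapping tables above would typically be built with MAPPING LOAD and applied with ApplyMap() (or AutoNumber for dense keys). As a language-neutral illustration of the surrogate-key idea, here is a small Python sketch; all names and sample data are invented for the example:

```python
# Illustration of the suggestion: replace wide natural keys with small
# integers and keep the lookups in separate mapping tables, so the huge
# fact table carries only integers. In QlikView this would be
# MAPPING LOAD + ApplyMap() / AutoNumber; this sketch just shows the idea.

def build_mapping(values):
    """Assign a dense integer id to each distinct value (like AutoNumber)."""
    mapping = {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping) + 1
    return mapping

# Invented sample rows: (subscriber, MA name, days from invitation).
raw_fact = [
    ("SUB-0001", "Spring SMS Promo", 3),
    ("SUB-0002", "Spring SMS Promo", 7),
    ("SUB-0001", "Data Upsell",      1),
]

ma_map  = build_mapping(row[1] for row in raw_fact)   # MA -> integer
sub_map = build_mapping(row[0] for row in raw_fact)   # subscriber -> integer

# The fact table now carries only three small integers per row.
fact = [(sub_map[s], ma_map[m], d) for s, m, d in raw_fact]
print(fact)   # [(1, 1, 3), (2, 1, 7), (1, 2, 1)]
```

The mapping dicts play the role of the dimension/mapping tables; only they need to store the wide text values, once per distinct value.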
Thank you Anders, for your quick response.
All the keys have already been converted to integers in the extraction process.
We've explained to the marketers that the granularity is overwhelming. They gave us several concessions regarding the data volume, and somewhat regarding the granularity.
The data to be presented will be aggregated by weeks (div 7), and only 6 weeks (div 2):
2 weeks prior to marketing activity entry and 4 weeks max post entry.
The last weeks should be aggregated but must indicate how many days have been aggregated (in case the marketing activity is presently active and the subscriber entered, say, 2 weeks ago).
That's basically it.
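The agreed aggregation (week buckets via integer division by 7, a -2..+4 week window around the invitation, and partial weeks flagged by the number of days actually aggregated) could be sketched like this; the function names and sample data are invented for illustration:

```python
# Sketch of the agreed aggregation: bucket "days from invitation" into
# week offsets (integer division by 7), keep only the -2..+4 week window,
# and report how many days fell into each bucket so a partially elapsed
# week is visible. Names and sample data are invented.

from collections import Counter

def week_bucket(days_from_invitation):
    """Week offset relative to the invitation; negative = before entry."""
    return days_from_invitation // 7   # floor division: days 0-6 -> week 0, etc.

def aggregate(day_offsets, min_week=-2, max_week=4):
    """Count days per week bucket inside the agreed -2..+4 week window."""
    counts = Counter()
    for d in day_offsets:
        w = week_bucket(d)
        if min_week <= w <= max_week:
            counts[w] += 1
    return dict(sorted(counts.items()))

# A subscriber who entered an MA 10 days ago: weeks 0 and 1 exist,
# but week 1 has only 3 of its 7 days so far.
print(aggregate(range(0, 10)))   # {0: 7, 1: 3}
```

A bucket count below 7 is exactly the "how many days have been aggregated" indicator for a still-active week.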
I don't know the solution for your issue, but still... how do you manage
900M rows,
15 billion records... the users don't know what they are doing.
What RAM size are you using?
Don't mind me, forgive the comment.
0.5TB...
also, new requirement
10 permutations per MA. so... multiply by 10.
that's 150 BILLION.
lol
I haven't the slightest clue how to design such a data model.
Ideas?
Oh man... you're having a hard time.
So this will be a challenging requirement for QlikTech to develop a new version of QlikView...
"Important feature: 150B records processed in seconds..."