Not applicable

Big Data Problem

Hello guys,

I've been developing for QlikView for the last 8 months on and off (doing SQL and Java as well), and I've stumbled upon a major problem.

My company deals with user data for cellular companies.

We have a subscriber base of 10M+ subscribers for the medium-sized companies.

For each subscriber we retain usage data (SMS, minutes of use, spending...) for the last 90 days,

so far = 900M rows.

Each subscriber can participate in ~700 marketing activities, and can be invited to each activity several times during those 90 days.

Using this data, I need to calculate and present, in various charts and tables, the usage for each subscriber, per marketing activity, relative to the distance of the usage date from the invitation date.

The working assumption is that each subscriber can enter each MA 15 times at most,

so at worst we are talking about ~15 BILLION records.

The requirement is for this granularity of data (subscriber, MA, days from invitation).

All the data model approaches I have tested (generic loads, crosstables, etc.) have had little success.

The data is huge.

I want to start a discussion on how to design the data model for the best possible performance.

Should I use one fact crosstable of some design, break it down into dimensions and a fact table, use key tables, try hashing, ApplyMap... I don't know.

I need to be able to load and process the data in ~3 hours and then present it. QlikView table objects take forever to load that much data; charts behave better.

Please help. This is as much a data modeling problem as it is a technical challenge.

Thank you in advance,

Matan.

10 Replies
Not applicable
Author

I haven't mentioned: the data is currently loaded from 2 fact tables:

Marketing activity (keys: MA_ID, SUBSCRIBER, INVITATION_DATE), ~3 GB

and

Subscriber panel (keys: subscriber_id, period_start date), ~18 GB.
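For illustration, here is a rough load-script sketch of how the days-from-invitation granularity could be built from these two tables. The QVD paths, measure fields and renamed keys are assumptions for the example, not the actual schema, and the inner join is exactly the step that multiplies the row count:

// Sketch only: file names, measure fields and key renames are assumed.
Fact:
LOAD
    SUBSCRIBER        as SUBSCRIBER_ID,
    MA_ID,
    INVITATION_DATE
FROM MarketingActivity.qvd (qvd);

// Attach the subscriber panel rows to every matching invitation.
// This join is where the explosion to billions of rows happens.
Inner Join (Fact)
LOAD
    subscriber_id     as SUBSCRIBER_ID,
    period_start_date as PERIOD_START_DATE,
    SMS_COUNT,
    MINUTES_OF_USE,
    SPENDING
FROM SubscriberPanel.qvd (qvd);

// Derive the distance of each usage period from the invitation date.
Fact2:
NoConcatenate LOAD
    *,
    PERIOD_START_DATE - INVITATION_DATE as DaysFromInvitation
RESIDENT Fact;
DROP TABLE Fact;
RENAME TABLE Fact2 TO Fact;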

calvindk
Creator III

Why would anyone analysing 10M users ever want the data at the subscriber level?

It sounds like you might be trying to serve multiple purposes in the same document.

Would it be possible for you to change the requirements to something that is presentable, i.e. something like:

Count(Users) where MA is not null within the last x days

MA, Count(Users) as TotalMaUsers

Date, MA, Count(Users)

Calendar

Although if you really want this, I would only load

Subscriber_Integer, MA_Integer, Days_Integer

Create a mapping table from MA to integer.

Create a mapping table from user info to integer.

Create a mapping table from Days_Integer to the calendar (see the sketch below).
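A minimal load-script sketch of that layout, with placeholder QVD names and example descriptive fields (the real field list would come from your source):

// Sketch only: table, file and descriptive field names are placeholders.
// The fact keeps just the three integer keys plus the measures.
Fact:
LOAD
    Subscriber_Integer,
    MA_Integer,
    Days_Integer,
    SMS_COUNT,
    MINUTES_OF_USE,
    SPENDING
FROM Fact.qvd (qvd);

// Dimension tables keep all descriptive fields out of the big fact.
MA_Dim:
LOAD DISTINCT
    MA_Integer,
    MA_NAME,
    MA_TYPE
FROM MarketingActivities.qvd (qvd);

Subscriber_Dim:
LOAD DISTINCT
    Subscriber_Integer,
    SUBSCRIBER_SEGMENT
FROM Subscribers.qvd (qvd);

// Days_Integer resolved to a readable calendar once, in a small side table.
Calendar:
LOAD DISTINCT
    Days_Integer,
    Date(Days_Integer)     as CalendarDate,
    WeekName(Days_Integer) as CalendarWeek
RESIDENT Fact;

The point is that the huge fact then holds only low-cardinality integers, which compress well in QlikView, while everything descriptive lives in small side tables.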

Edit:

Also try to elicit from the users anything they don't want to analyse on, and cut it from the load.

Not applicable
Author

Thank you Anders, for your quick response.

All the keys have already been converted to integers in the extraction process.

We've explained to the marketers that the granularity is overwhelming. They gave us several concessions regarding the data volume, and some regarding the granularity.

The data to be presented will be aggregated by weeks (div 7), and only 6 weeks:

2 weeks prior to marketing activity entry and 4 weeks max post entry.

The last week should be aggregated but must indicate how many days have been aggregated (in case the marketing activity is presently active and the subscriber entered it, say, 2 weeks ago).

That's basically it.
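As a sketch of that aggregation (assuming a detail fact with a DaysFromInvitation field and the measure names from the earlier sketch; all names are illustrative):

// First pass: derive the week bucket as a real field so it can be grouped on.
TmpWeeks:
LOAD
    SUBSCRIBER_ID,
    MA_ID,
    Floor(DaysFromInvitation / 7) as WeekFromInvitation,
    DaysFromInvitation,
    SMS_COUNT,
    MINUTES_OF_USE,
    SPENDING
RESIDENT Fact
WHERE DaysFromInvitation >= -14 AND DaysFromInvitation < 28;  // 2 weeks before, 4 weeks after

// Second pass: one row per subscriber / MA / week, plus a count of the
// days behind each bucket so a partial last week is visible in the front end.
WeeklyFact:
LOAD
    SUBSCRIBER_ID,
    MA_ID,
    WeekFromInvitation,
    Sum(SMS_COUNT)                     as SMS_COUNT,
    Sum(MINUTES_OF_USE)                as MINUTES_OF_USE,
    Sum(SPENDING)                      as SPENDING,
    Count(DISTINCT DaysFromInvitation) as DaysAggregated
RESIDENT TmpWeeks
GROUP BY SUBSCRIBER_ID, MA_ID, WeekFromInvitation;

DROP TABLE TmpWeeks;
DROP TABLE Fact;  // the detail fact is no longer needed once the weekly grain exists

Aggregating to the weekly grain in the script, before the front end ever sees the data, is what keeps the subscriber-level requirement tractable: at most 6 week buckets per subscriber/MA instead of one row per day.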

Not applicable
Author

up

Not applicable
Author

up

Not applicable
Author

I don't know the solution for your issue, but still... how do you manage

900M rows

and 15 billion records? Oh Jesus, please forgive them... the users don't know what they are doing.

What RAM size are you using?

Don't mind me, forgive me for the comment.

Not applicable
Author

0.5 TB...

Not applicable
Author

Also, a new requirement:

10 permutations per MA. So... multiply by 10.

That's 150 BILLION.

lol

I haven't the slightest clue how to design such a data model.
Ideas?

Not applicable
Author

Oh man... you're having a hard time...

So this will be a challenging requirement for QlikTech's next version of QlikView...

"Important feature: 150B records processed in seconds..."