Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

One dimension made from two different dimensioins

Hi everyone,

I already searched the web and this forum and asked a bunch of people, but no one seemed to have an answer to the following:

Our users come from a source, click through our system and go to a sink afterward. Now sink and source are from the same pool of partners, so we would like to have a table with the partners as dimension, where we can show certain clicks sorted by source in one column and those clicks sorted by sink in the next column.

The data is loaded in the following way:

user

sourcesink
User 1Partner 1Partner 3
User 2Partner 2Partner 1
User 3Partner 1Partner 2
User 4Partner 2Partner 2

and we would like to have the following table of this:

Partner# as source
# as sink
Partner 12

1

Partner 222
Partner 301

Partner# as source# as sinksum
Partner 1213
Partner 2224
Partner 3011

Without the sum, we could use two different tables, but like this, it doesn't work unfortunately.

I tried to use source and sink as dimensions, then limit the second dimension by the first and restrict the sums by the first resp. by the second dimension, but this doesn't seem to help.

We have quite large tables (a couple of Gigabytes per table) and we already did an aggregation per day and still need up to 20GB RAM, so we would like to avoid loading double data. Is there a way to get the second table from the first one?

Thank you for your help

Ulrich

1 Solution

Accepted Solutions
Gysbert_Wassenaar

Load the data so it is in this form instead:

User, Partner, Type

User 1, Partner 1, Source

User 1, Partner 3, Sink

User 2, Partner 2, Source

User 2, Partner 1, Sink

...etc

You can then use Partner as dimension and expressions like:

Source: count({<Type={'Source'}>} User)

Sink: count({<Type={'Sink'}>} User)

Total: count(User)


talk is cheap, supply exceeds demand

View solution in original post

6 Replies
Gysbert_Wassenaar

Load the data so it is in this form instead:

User, Partner, Type

User 1, Partner 1, Source

User 1, Partner 3, Sink

User 2, Partner 2, Source

User 2, Partner 1, Sink

...etc

You can then use Partner as dimension and expressions like:

Source: count({<Type={'Source'}>} User)

Sink: count({<Type={'Sink'}>} User)

Total: count(User)


talk is cheap, supply exceeds demand
Not applicable
Author

Thank you for your reply. I thought about a reformatting like this as well. But unfortunately a refomatting won't work, since our table is much bigger than this example and the redundancies will make the needs for RAM explode.

If it is the only solution though, we need a better system, I just hoped, there could be a solution avoiding this.

Not applicable
Author

I agree with Gysbert, and I don't think that a crosstable is too demanding for the system!!

If you want you can post your script.

Gio

Gysbert_Wassenaar

Have you actually tested that? Try it on a limited set of a couply of million records. Another thing you could try is to generate id numbers for users and partners with the autonumber function. Such fields compress pretty good. And you could use a flag field for Source/Sink. Though it has a cardinality of only two already. So QV will use one bit per record for the pointers to the symbol table. See this blog post: Symbol Tables and Bit-Stuffed Pointers

Anyway, I don't think you have an alternative. Trying to 'fix' this in the UI is going to perform far worse.


talk is cheap, supply exceeds demand
Not applicable
Author

Very interesting link, thanks for that

My first quick test with loading those data a second time resulted in a very large file. But probably there is very much to be optimized.

Not applicable
Author

I did it and it worked. Thanks a lot for your help.