Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi everyone,
I already searched the web and this forum and asked a bunch of people, but no one seemed to have an answer to the following:
Our users come from a source, click through our system and go to a sink afterward. Now sink and source are from the same pool of partners, so we would like to have a table with the partners as dimension, where we can show certain clicks sorted by source in one column and those clicks sorted by sink in the next column.
The data is loaded in the following way:
user | source | sink |
---|---|---|
User 1 | Partner 1 | Partner 3 |
User 2 | Partner 2 | Partner 1 |
User 3 | Partner 1 | Partner 2 |
User 4 | Partner 2 | Partner 2 |
and we would like to have the following table of this:
Partner | # as source | # as sink |
---|---|---|
Partner 1 | 2 | 1 |
Partner 2 | 2 | 2 |
Partner 3 | 0 | 1 |
Partner | # as source | # as sink | sum |
---|---|---|---|
Partner 1 | 2 | 1 | 3 |
Partner 2 | 2 | 2 | 4 |
Partner 3 | 0 | 1 | 1 |
Without the sum, we could use two different tables, but like this, it doesn't work unfortunately.
I tried to use source and sink as dimensions, then limit the second dimension by the first and restrict the sums by the first resp. by the second dimension, but this doesn't seem to help.
We have quite large tables (a couple of Gigabytes per table) and we already did an aggregation per day and still need up to 20GB RAM, so we would like to avoid loading double data. Is there a way to get the second table from the first one?
Thank you for your help
Ulrich
Load the data so it is in this form instead:
User, Partner, Type
User 1, Partner 1, Source
User 1, Partner 3, Sink
User 2, Partner 2, Source
User 2, Partner 1, Sink
...etc
You can then use Partner as dimension and expressions like:
Source: count({<Type={'Source'}>} User)
Sink: count({<Type={'Sink'}>} User)
Total: count(User)
Load the data so it is in this form instead:
User, Partner, Type
User 1, Partner 1, Source
User 1, Partner 3, Sink
User 2, Partner 2, Source
User 2, Partner 1, Sink
...etc
You can then use Partner as dimension and expressions like:
Source: count({<Type={'Source'}>} User)
Sink: count({<Type={'Sink'}>} User)
Total: count(User)
Thank you for your reply. I thought about a reformatting like this as well. But unfortunately a refomatting won't work, since our table is much bigger than this example and the redundancies will make the needs for RAM explode.
If it is the only solution though, we need a better system, I just hoped, there could be a solution avoiding this.
I agree with Gysbert, and I don't think that a crosstable is too demanding for the system!!
If you want you can post your script.
Gio
Have you actually tested that? Try it on a limited set of a couply of million records. Another thing you could try is to generate id numbers for users and partners with the autonumber function. Such fields compress pretty good. And you could use a flag field for Source/Sink. Though it has a cardinality of only two already. So QV will use one bit per record for the pointers to the symbol table. See this blog post: Symbol Tables and Bit-Stuffed Pointers
Anyway, I don't think you have an alternative. Trying to 'fix' this in the UI is going to perform far worse.
Very interesting link, thanks for that
My first quick test with loading those data a second time resulted in a very large file. But probably there is very much to be optimized.
I did it and it worked. Thanks a lot for your help.