Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Cross Tabulation of census style data

I have "questions", "respondents" and "answers". Below is an example with 5 questions, two respondents and 8 sets of answers (yes = 1/no = 0). Is it feasible to try and get an output table that looks something like:


female? | <30? | 1
female? | >=30&<=50? | 1


(ie the total of females by age groups)? Multiply questions and respondents by ~20 000, there might be things like (urban?/rural?, state, town etc).

The script:




question:
LOAD * INLINE [
question no, question
q1, male?
q2, female?
q3, <30?
q4, >=30&<=50?
q5, >50?
];

answer:
LOAD * INLINE [
question no, respondent, answer
q1, r1, 0
q2, r1, 1
q3, r1, 1
q4, r1, 0
q5, r2, 0
q1, r2, 0
q2, r2, 1
q3, r2, 0
q4, r2, 1
q5, r2, 0
];




Thanks for any help, I am a noob so please forgive me if I am missing the completely obvious. I can structure the input in any way I like, so I could duplicate the question tables if that were to help with cross tables (for example).

Regards

Alex

5 Replies
Miguel_Angel_Baeyens

Hello Alex,

Check this file with sample code about crosstables. IT may be helpful if your data format is similar to the inline tables I used in it. As far as I see it, any question should be a separate field with their correspondings 1 or 0, so you could accumulate them.

Hope it helps!

EDIT: You're likely to find more interesting the GENERIC load with the sample data you provided. There are some useful posts and blog entries about them in the forum as well.

Not applicable
Author

Hello Miguel

I am exploring this problem using the personal edition which means I cannot look at your sample. However I suspect that from your response that I may not be able to manipulate the questions as I would like (you say "question should be a separate field").

I did experiment with loading the question table twice so that I can use "question x" as an x co-ordinate and "question y" as a y co-ordinate. This looked promising - and looked a bit like the table I posted, but I need to have some cross table query as an expression in col 3. To rephrase my question, is it possible to do a "join" operation on in-memory tables (like SQL)?:

so "Sum (answer)" would be replaced with a SQL like query of my own which can reference the values in column 1 and column 2. I have been reading the manual quite hard, but still can't see the wood for the trees.

Thanks

Alex

Miguel_Angel_Baeyens

Hello Alex,

Let's see if Jason can take a look at the files and make them available for personal edition.

Similar to how you join queries in SQL you can do with tables loaded in QlikView. According to your script above

question:LOAD * INLINE [ question no, question q1, male? q2, female? q3, <30? q4, >=30&<=50? q5, >50?]; answer:JOIN LOAD * INLINE [ question no, respondent, answer q1, r1, 0 q2, r1, 1 q3, r1, 1 q4, r1, 0 q5, r2, 0 q1, r2, 0 q2, r2, 1 q3, r2, 0 q4, r2, 1 q5, r2, 0]; T:GENERIC LOAD respondent, question, answerRESIDENT question;


Now create a new chart, pivot table, add "<30?" as dimension and

Sum({< question = {'female?'} >} answer)


As expression.

Is this closer to what you are looking?

Not applicable
Author

Thank you again for such a thorough attempt to help me, it works well for that smallish set of questions.

I think that I am going to hit a rock wall with the number of questions: I must cater for about 20,000.

So, unless I am missing something, the "programmability" of the solution is not there, as I don't think it is practical to do the join the way you have, and to have so many field names? The final user should have the ability to grab a bunch of questions from x, and a bunch from y and simply crosstab.

I made an assumption that QlikView would provide a query language for joining tables already in memory and it seems I am wrong. I wasn't sure even that would scale, but I was going to test that.

Regards

Alex

Not applicable
Author

Hi Miguel

I have done something that works, but is a bit odd, maybe it exploits a bug?

Note that in the straight chart and the pivot table, the Sum expression is invalid! I really have not got my head around the set notation at all. However, it does what I want!?!

Maybe you can explain, or refine it a bit?

Essentially I duplicated all input:



questionx:
LOAD * INLINE [
questionx
male?
female?
<30?
>=30&<=50?
>50?
];

questiony:
LOAD * INLINE [
questiony
male?
female?
<30?
>=30&<=50?
>50?
];

answerx:
LOAD * INLINE [
questionx, respondent, answer
male? , r1, 1
female? , r1, 0
<30? , r1, 1
>=30&<=50?, r1, 0
>50? , r1, 0
male? , r2, 0
female? , r2, 1
<30? , r2, 0
>=30&<=50?, r2, 1
>50? , r2, 0
male? , r3, 0
female? , r3, 1
<30? , r3, 0
>=30&<=50?, r3, 0
>50? , r3, 1
male? , r4, 0
female? , r4, 1
<30? , r4, 0
>=30&<=50?, r4, 0
>50? , r4, 1
male? , r5, 1
female? , r5, 0
<30? , r5, 0
>=30&<=50?, r5, 0
>50? , r5, 1
];

answery:
LOAD * INLINE [
questiony, respondent, answer
male? , r1, 1
female? , r1, 0
<30? , r1, 1
>=30&<=50?, r1, 0
>50? , r1, 0
male? , r2, 0
female? , r2, 1
<30? , r2, 0
>=30&<=50?, r2, 1
>50? , r2, 0
male? , r3, 0
female? , r3, 1
<30? , r3, 0
>=30&<=50?, r3, 0
>50? , r3, 1
male? , r4, 0
female? , r4, 1
<30? , r4, 0
>=30&<=50?, r4, 0
>50? , r4, 1
male? , r5, 1
female? , r5, 0
<30? , r5, 0
>=30&<=50?, r5, 0
>50? , r5, 1
];


Thanks and regards

Alex