Skip to main content
Announcements
See what Drew Clarke has to say about the Qlik Talend Cloud launch! READ THE BLOG
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Big table of users (efficient sql select)

Hi guys,

I have a User table with almost 500 million records and I have been including it partially in the model like this:

left Keep (Fact)

load * ;

sql select * from Users;

(Both tables join on user ID)

However, this approach looks into the entire 500M record table, before it joins with the users contained in the fact table, which takes quite a while. Is there a better approach? Is there a way to SQL Select with a where clause, using the distinct users from the Fact table, for example?

All help is appreciated.

Thank you.

Nuno

1 Solution

Accepted Solutions
maxgro
MVP
MVP

you can build a dynamic sql where condition

T: Load

Pick(Ceil(6*Rand1),'a','b','c','d','e','f') as User,

Round(1000*Rand()*Rand()*Rand1) as Expression1;

Load  Rand() as Rand1, IterNo() as TransLineID, RecNo() as TransID

Autogenerate 1000  While Rand()<=0.5 or IterNo()=1;

LOAD chr(39) & Concat( DISTINCT User , chr(39) & ',' & chr(39)) & chr(39) AS UserList

Resident T;

LET vUsers = Peek('UserList');

trace $(vUsers);

and then use in SQL

SQL select *

from Users

WHERE User IN ($(vUsers ));

but for 200k distinct users I think you can't build an efficient where clause;

example in Oracle a condition where user in (user1, user2, ....) will works only for 1000 users

You can try with an OR instead of IN

where (user = 'user1' or user = 'user2' ...............)

Maybe an option to load all the users in Qlik with an incremental load?

View solution in original post

10 Replies
sunny_talwar

You can store the distinct list of user ID in a variable and use that variable in your SQL Where clause. I have done something similar before, will check if I can get hold of that example.

Anonymous
Not applicable
Author

Does your Fact table exist in SQL outside of QlikView without a great deal of QVS load logic? If so, perhaps just

SELECT user.* FROM user INNER JOIN fact ON user.userid = fact.userid;

Let the DBMS do the heavy lifting.

Colin-Albert

Do you need all fields from the Users table?

Select * can be slower than selecting just the fields you need.

Also consider using a mapping table and applymap rather than joining the data.

It can be quicker to create and apply several mapping tables rather than joining.

Not applicable
Author

That seems like the way I want to go. If you find a good example, please let me know.

Thanks,

Nuno

maxgro
MVP
MVP

Can you join in the database (SQL select .....)  the fact table and the users table?

Not applicable
Author

Hi,

the fact table is created within qlikview. Getting it to sql again (or creating it in sql to begin with) would beat the purpose of what I am trying to achieve.

Thanks,

Nuno

petter
Partner - Champion III
Partner - Champion III

How many distinct User-IDs do you have in your fact table?

Not applicable
Author

Around 200,000.

petter
Partner - Champion III
Partner - Champion III

The only way I can see that is really efficient and still not very complicated is:

1) Create a table in your DB called FACT_USERID having one column UserID.

     Make sure that you have indexes on the user id columns in the involved tables.

2) Load Script:

    

       FACT_USERID:

       LOAD DISTINCT

          UserID

       RESIDENT

          FACT

       ORDER BY

          UserID

       ;

       STORE FACT_USERID INTO FACT_USERID.CSV (txt);

       DROP TABLE FACT_USERID;

       EXECUTE cmd.exe /c cmd-file-to-execute-import-of-CSV-into-the-DB.cmd FACT_USERID.CSV;

       // all popular SQL databases has some sort of command-line import tool to retrieve txt/csv-files ...

        LEFT KEEP (FACT)

        LOAD * ;

        SQL

          SELECT

               *

          FROM

               Users

          WHERE EXISTS( SELECT USER_ID FROM FACT_USERID);