SQL SELECT is necessary for retrieving data from an external SQL data source. Using functions like MAX into the SQL statements will use for instance the ODBC driver instead of QlikView for doing the calculation. I think it is always better and perhaps faster to use these functions in QlikView but it could be different for each connection and data source type.
The SQL statement is executed FIRST, passing a database data retrieval command to the database via your ODBC driver. When the data has been fetched from the database, it can be further processed by QlikView. This is what happens in the LOAD statement.
There is an extreme overlap in what data transformations a database can do for you and what QlikView can do for you. For example:
SQL SELECT left(myField,3) as FieldPrefix FROM myTable;
would ask the database to pick the 4 first characters from 'myField', call it FieldPrefix and THEN send it back to QlikView.
If you do not want to incurr additional load on your database source system, you could have instead left QlikView to do the same by the statements:
LOAD left(myField,3) as FieldPrefix; SQL SELECT myField FROM myTable;
The database would FIRST retrieve myField, after which the data is passed UP through the preceding LOAD statement in QlikView.
The SQL statement is always below any QlikView LOAD statements. You cannot switch the order around. The way to read the scope of a full pass of data is from BOTTOM to the TOP. My hypothesis of why this is would be because QlikView needs a template of what to do once data is received PRIOR to receiving the data, thus prior to the SQL statement being issued it needs to understand the intended QlikView transformations to be performed.
yes, your understanding is correct - LOAD is used for QlikVIew functions and proper case renaming (some databases can only work with upper case), while SELECT is executed on the database and is using the database syntax.
For large data volumnes, it's preferable to make the initial selections, groupping and any complex calculations on the database side, to avoid bringing over and then discarding a lot of data.
You don't lose any flexibility. If the database engine can't do a particular operation, you can still do it in QlikView. But you would generally want to do operations where they can be handled most efficiently. I don't generally worry about how large the data set is - I just try to make a habit of doing operations in the most efficient spot, even when it doesn't matter. I figure it's a good habit to be in for when it really matters. How large the database has to be for it to matter will depend on the specifics in your shop. In our shop, I'd say it only starts getting critical when we're retrieving millions of rows of data, but could probably make a noticeable difference with hundreds of thousands of rows.
In practice, by far the most common thing I do on the database side is filtering the records being retrieved. Other manipulation is quite rare for me.
You should always load data and the lowest level that your analysis requires. If you keep this rule, you'll never lose any flexibility.
As far as "what's large data volumes" - it's all relative to your environment. For a small company with a 32 bit server, - a few millions of rows is "large". For a large company with 64-way beast with 128 GB of RAM, a few hundred of million rows is "large".
To generalize, I'd say, every time you cross the "few millions of rows" mark, you should begin worrying about performance and thinking about the most efficient way of loading and presenting data.
How would you select all tables? I cant find any examples where anything other than reading from only one table is used. I need to read all tables in a database and count the number of rows in each. Example of how this works for one table is below: