gines_rueda
Contributor III

Large Fields in Fact Tables with millions of rows

Hello devs,

I want to ask about the suitability of having very large fields in QlikView, e.g. an RTF field with 4,000 characters, especially when these fields are in fact tables with millions of rows shown in a pivot table. I have noticed that performance drops in this case.

Is there a document or guide that covers this? Or maybe a general document explaining that the issues with large datasets come not only from the number of rows, but also from the number of fields and their size?

The best approach I have thought of is document chaining, switching documents once the number of available rows drops below a limit.

Any information on this topic will be appreciated.

Thanks.

12 Replies
marcus_sommer

The critical factor will be how many distinct values this field contains, because QlikView stores only distinct values. If the number of distinct values is very large, it will definitely have a big impact on performance.

Perhaps there are other possibilities. I assume these "RTF" values are descriptions of other fields; this could also be solved with hyperlinks to external files.
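A minimal sketch of one common variant of this idea: keeping the large text field out of the fact table itself, linked only by a key. All table and field names below are invented for illustration, not taken from this thread. The symbol table for the field is the same size either way, but the fact table stays narrow, and objects that never reference the description do not touch the wide table:

```
// Hypothetical model: the RTF text lives in its own table, linked by key.
Facts:
LOAD TransactionID,
     Amount,
     TransactionDate
FROM Transactions.qvd (qvd);

Descriptions:
LOAD TransactionID,
     Description      // the 4,000-char RTF text lives only here
FROM Descriptions.qvd (qvd);
```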

- Marcus

gines_rueda
Contributor III
Author

Hello Marcus, thanks for answering.

The thing is, as you said, these fields are descriptions, so in some fact tables we have one distinct description per row.

For example, we have a transaction with all the numerical data and some descriptions; these description fields are unique to the transaction.

If there is a big performance impact as you said, do you know where I can get some kind of documentation to show my bosses as evidence?

Ginés.

marcus_sommer

I don't know of any special documentation on this topic, but I think you could find such information somewhere in the QlikView whitepapers. As a rough forecast, a string character needs at least 1 byte, so in your case:

4,000 chars * 1,000,000 rows = 4,000,000,000 bytes, or about 4 GB for this field alone, which will lead to anything from suboptimal to really bad performance.

It will depend on your hardware, network resources and the number of users. You should simply try whether it works or not. If not, you need alternative solutions such as hyperlinks, or handling scenarios in which only part of this field is loaded.
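A sketch of the "load only part of the field" idea (field and file names are assumptions for illustration). Note one trade-off: applying a function such as Left() turns an optimized QVD load into a non-optimized one, so reload time may increase even though RAM use drops:

```
// Store only a short preview in memory; the full text stays in the source.
Facts:
LOAD TransactionID,
     Amount,
     Left(Description, 200) as DescriptionPreview  // first 200 chars only
FROM Transactions.qvd (qvd);
```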

- Marcus

gines_rueda
Contributor III
Author

Hello again Marcus,

I was thinking about the handling scenarios and the possibility of document chaining between a main report (the current one, without the large RTF fields) and another one that only shows a detailed view of a subset (maybe only one record), triggered when the current selection drops to that subset size.

In this first document-chaining approach, the detailed view would load all the data from the database but not allow selections or transformations on it (trying to reduce RAM consumption); then, if you clear your selection so that more than this subset becomes available, you are sent back to the main report.

Would it be much better to load less data, or to reduce the data the report loads? Is that even possible with good performance? I was thinking of altering "Initial Data Reduction Based on Section Access", for example. Are there other solutions within this approach?

Or maybe you recommend another approach?

Thanks,

Ginés.

christian77
Partner - Specialist

Hi.

You may want to use DIRECT SELECT. It is a new feature in QV 11.2.

By using this feature you load information from big tables on demand only, that is, when a new filter is set or when a new calculation is needed.

Use DIRECT SELECT and enjoy that relaxing café con leche in the Plaza Mayor.

christian77
Partner - Specialist

If your field is 4,000 characters long, you can't usefully look at more than one value at a time anyway.

With DIRECT SELECT you can bring them in one by one when needed.

To use it, install QV 11.2 (the latest version should be SR4). Write DIRECT SELECT instead of SQL SELECT and that's it.

 

Activo:
LOAD
    '$(Compañia)' as "Company",
    Name          as "Budget Name",
    Activo        as "Activo"
WHERE Activo = 1;
SQL SELECT *
FROM "$(Compañia)$G_L Budget Name";

Activo:
LOAD
    '$(Compañia)' as "Company",
    Name          as "Budget Name",
    Activo        as "Activo"
WHERE Activo = 1;
DIRECT SELECT *
FROM "$(Compañia)$G_L Budget Name";

Of course, you need an open connection to your DB at all times. The same connector will do it if the lines are open.

I haven't tried this feature yet. If you try it, please send some feedback.

Good Luck

gines_rueda
Contributor III
Author

Thanks for your answers, Christian. I am going to check out this new feature and see how it works.

I will let you know how it goes.

PS: I will enjoy that relaxing café con leche but not in the Plaza Mayor, maybe in the "Malagueta".

Cheers.

Ginés

marcus_sommer

I don't believe DIRECT SELECT could be fast enough to ensure a satisfactory user response, since the database must still go through many records, and the result traffic is not really any smaller either.

Besides working with hyperlinks, an alternative could be to generate the descriptions on the fly. I assume that many parts of the descriptions are redundant and could be assembled by an algorithm.
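If the descriptions really can be rebuilt from redundant parts, one way to sketch this on-the-fly idea is a chart expression that concatenates small stored fields only at render time (the field names below are invented for illustration, not from this thread):

```
// Only the short component fields are stored in memory;
// the long text is assembled on demand in the chart expression.
= 'Order ' & OrderNo & ' from ' & CustomerName & ': ' & StatusText
```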

- Marcus

gines_rueda
Contributor III
Author

Hello Marcus

I am currently working on an approach that only shows the large fields when the current selection leaves a single entity available (as if it were a detailed view), and I am using DIRECT SELECT at that moment to obtain these fields, so only one at a time.
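For reference, the "only when a single entity is left" condition can be expressed as a conditional show condition on the detail object, along these lines (assuming the key field is called TransactionID, which is an invented name):

```
// Conditional Show for the detail object:
// visible only when exactly one transaction remains selectable.
= GetPossibleCount(TransactionID) = 1
```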

What do you mean by an approach with hyperlinks? Do you mean opening another report with a detailed view, or do you have in mind something else that I am not aware of?

Thanks.