Re: Optimal computer configuration for QlikView de... - Qlik Community

marko_rakar · ‎2017-09-29

I have tried to read a number of forum articles about choosing optimal computer configuration for running QlikView desktop and I am still a bit confused.

What would you do to build your ultimate QlikView desktop computer (the one you use to develop your apps)? If you have a number of large data sets and during development you need to play a lot with data models, imports and other stuff, what is the critical component of your computer?

We all know RAM size is important, and the disk system (M.2 SSDs) matters too, but what about the number of cores and/or threads? Should I go for maximising processor cores, or maybe even go to a multi-processor configuration? Or should I go for maximum core speed?

Some articles state that I should disable hyperthreading and use only cores (is this still true with version 12)?

What about Intel vs. AMD? AMD now has a number of processors with a large number of cores; there used to be complaints about Qlik on AMD (is this still a valid complaint)?

Disk system, has anyone tried to work with Optane memory?

If I work with a lot of loads, I notice that my workstation will start using all cores, but each of them is fairly lightly loaded (about 10-20%). Since I use an M.2 SSD, I was wondering what it is that stops my processor from crunching data. How do I figure out what the bottleneck is?

How would you build your development workstation?

marko_rakar · ‎2018-05-21

This thread is now nine months old, there is some number of reads and there is no reply, so I would like to ask this question again.

When choosing optimal QlikView desktop workstation (for ETL, dashboard design and testing) what should I look for:

number of cores/threads? in some documents it states that HT should be disabled, in some (newer ones) it appears that HT is ok
memory speed, quite obvious, but would I benefit from more memory channels (which might lead me to consider AMD vs Intel platform)
AMD vs Intel, it used to be a problem in the past, I guess it works ok now? (new threadripper and EPYC processors from AMD look great)
L3 cache, we should also go for as much as possible, but is there some measurement on what to look for (is it better to have a 2 GHz machine with 40 MB L3 cache or a 3 GHz machine with 12 MB L3 cache)
what is the influence of graphic cards, how much of dashboard presentation is offloaded to the graphics card? (if I do not use web view)
I noticed also that on my system where I have both sata and m.2 SSD, system does not appear to be significantly faster (in saving, loading) if I use m.2 disk, it appears that disk utilization is quite low (even on huge 40gb+ data sets)

So, anyone willing to add some thoughts to this, how to make optimal qlikview desktop workstation?

marko_rakar · ‎2018-08-27

Here I am again with some additional thoughts and actual comparison results (bear in mind this is QlikView Desktop, latest version, all updates):

I have created simple load script which loads chicago parking ticket data (https://www.propublica.org/nerds/download-chicago-parking-ticket-data) which is 28mil rows csv file, when this csv is loaded I will make one more load resident from the same data (in order to remove hard disk system from equation)
I would test other stuff as well (I did variations with some calculations while doing resident load so to move from multi threaded operation to single threaded) but I was unable to find other functions which would push computer to the limit and be measurable at the same time.

The results are as follows:

on my Intel Xeon 1650v2 system (which uses sata SSD as primary disk), I could run this script in 3 minutes 11 seconds
on testbed AMD 2990WX system (which uses nvme SSD storage which is waaay faster), I could run same script in 2 minutes 34 seconds
on testbed AMD 2950X system (same as above, I only changed processor), I could run same script in 2 minutes 5 seconds
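To put the three timings side by side, here is a small Python sketch (not part of the original test) that converts them to seconds and derives a speedup relative to the Xeon and a rough rows-per-second figure. It assumes the row count is exactly 28 million, which is only the approximate figure quoted above, and the timings cover the whole script (CSV load, QVD store and resident load), so the throughput number is only a coarse indicator.

```python
# Timings copied from the results above; each covers the whole script run.
ROWS = 28_000_000  # assumption: "28mil rows" taken as exactly 28 million

TIMINGS = {
    "Xeon 1650v2 (SATA SSD)": (3, 11),
    "AMD 2990WX (NVMe SSD)": (2, 34),
    "AMD 2950X (NVMe SSD)": (2, 5),
}


def to_seconds(minutes, seconds):
    """Convert a (minutes, seconds) pair to total seconds."""
    return minutes * 60 + seconds


def summarize(timings, rows=ROWS, baseline="Xeon 1650v2 (SATA SSD)"):
    """Return seconds, speedup vs. the baseline and rows/second per system."""
    base = to_seconds(*timings[baseline])
    summary = {}
    for name, (minutes, seconds) in timings.items():
        total = to_seconds(minutes, seconds)
        summary[name] = {
            "seconds": total,
            "speedup": base / total,
            "rows_per_second": rows / total,
        }
    return summary


if __name__ == "__main__":
    for name, r in summarize(TIMINGS).items():
        print(f"{name}: {r['seconds']} s, "
              f"{r['speedup']:.2f}x vs Xeon, "
              f"{r['rows_per_second']:,.0f} rows/s")
```

Seen this way, the 2950X is roughly one and a half times faster than the Xeon, which is a much smaller gain than the difference in core count alone would suggest.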

Now, in theory the brand new AMD 2990WX, which is a 32-core processor, should be much faster, but in reality it is not; this is likely due to its (at the moment) unique NUMA-only architecture where not all cores have direct access to the memory. In practice, the 2950X with its slightly faster cores and UMA architecture ends up being faster.

What is interesting, is that processor utilization during load was about 50%, maybe slightly above that (on all cores). I wonder what I have to do to force processor to go above that because I think it should be able to import data much faster (in theory, my disk should be able to read whole csv file in 2 seconds max, so disk is not an issue here, something else is).
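One way to check the "disk is not the issue" assumption outside of QlikView is to time a plain sequential read of the file. The following is a minimal Python sketch using only the standard library; the function name `read_throughput` and the chunk size are my own choices, and the file path is passed on the command line. Note that a second run over the same file is usually served from the operating system's file cache, so only the first run after a reboot (or with a file larger than RAM) says much about the disk itself.

```python
import sys
import time


def read_throughput(path, chunk_size=16 * 1024 * 1024):
    """Read a file sequentially; return (bytes_read, seconds, MB_per_second)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python's own buffer; the OS cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    mb_per_s = (total / 1e6) / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, mb_per_s


if __name__ == "__main__" and len(sys.argv) > 1:
    size, seconds, rate = read_throughput(sys.argv[1])
    print(f"read {size / 1e6:.0f} MB in {seconds:.2f} s ({rate:.0f} MB/s)")
```

If the raw read really finishes in a couple of seconds while the script takes minutes, the time is going into parsing and building the in-memory model rather than into the storage device.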

Does anybody else have in mind what I could test on QlikView Desktop in order to test its speed, or how can I optimize my machine to run even faster? (my work is mostly data forensics and connecting various data sets in previously unimaginable ways, so I spend most of my time loading, reloading and testing various scenarios so this is why it is important to me)

p.s. just for the record, code I used is here:

Let MyMessage = Now(1) & ' time';
Trace $(MyMessage);

Directory;

//first 1000
LOAD ticket_number,
     issue_date,
     violation_location,
     license_plate_number,
     license_plate_state,
     license_plate_type,
     zipcode,
     violation_code,
     violation_description,
     unit,
     unit_description,
     vehicle_make,
     fine_level1_amount,
     fine_level2_amount,
     current_amount_due,
     total_payments,
     ticket_queue,
     ticket_queue_date,
     notice_level,
     hearing_disposition,
     notice_number,
     officer,
     address
FROM
data\processed\parking_tickets.csv
(txt, codepage is 1252, embedded labels, delimiter is ',', no eof);

Let MyMessage = Now(1) & ' time';
Trace $(MyMessage);

store parking_tickets into parking_tickets.qvd;

Let MyMessage = Now(1) & ' time';
Trace $(MyMessage);

test1:
load
     hash128(license_plate_number) as H1,
     AutoNumber(license_plate_number) as A1,
     AutoNumberHash256(license_plate_number) as A2
resident parking_tickets;

Let MyMessage = Now(1) & ' time';
Trace $(MyMessage);

marcus_sommer · ‎2018-08-27

I think there won't be an optimal configuration because each system will have strengths and weaknesses and it will depend on your data and what do you want to do with them which one might be more suitable. And these data and your requirements might change from project to project. This means your decision will be always a compromise.

I cannot give you definite suggestions on processor types, the number of cores or their speed (the maximum single-core speed shouldn't be neglected, because not all calculations are multi-threaded), because my hardware is much too old to deduce any suggestions from it for new hardware. But unless there are some valid tests with appropriate results, I would avoid the AMD platform, because more cores do not necessarily lead to better performance: the cores must communicate with each other properly, and this was a bottleneck on the old Opteron platform.
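The point about single-threaded work can be illustrated with Amdahl's law, which is a general rule of thumb and not something measured on QlikView. The sketch below is in Python; the parallel fraction of 0.5 is purely a hypothetical value chosen for illustration, and the core counts are just example numbers.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Theoretical speedup when only part of the work scales with cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical: assume half of the reload can run in parallel.
    for cores in (6, 16, 32):
        speedup = amdahl_speedup(0.5, cores)
        print(f"{cores} cores: {speedup:.2f}x")
```

With half of the work serial, the speedup can never reach 2x no matter how many cores are added, which is why the per-core speed still matters.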

If you do performance tests while monitoring core utilization, you should test multiple scenarios: not only loading data but also transforming it and calculating results within the UI.

Further, I would suggest that your development machine be somewhat slower than the later production environment, to ensure that everything which runs satisfactorily on your machine will also run on the faster production system. Estimating the opposite will be considerably more difficult.

Besides this, I would suggest using at least a QVD layer, with or without an incremental approach, to avoid loading huge and/or multiple datasets from flat files or databases again and again. I'm not sure, but I assume that an optimized load of a QVD will be more suitable for finding the biggest bottleneck between SSD, RAM and CPU performance.

- Marcus

firestream · ‎2019-09-25

Found this thread.

Just wondering whether anyone has updated test data for the newer AMD EPYC processors based on Rome?
