30 Replies. Latest reply: Jun 27, 2013 12:58 PM by Aji Paul

    QlikView Server slower with more CPUs

    Karl Pover

      Hello all,

      I have a machine with 1 CPU, 4 cores, 8 system threads and 8 GB RAM which calculates the same graphs in half the time as a machine with 2 CPUs, 8 cores, 32 system threads and 64 GB RAM. Both have QV Server V9 SR5 and neither machine is virtualized.

      Has anybody experienced something similar?

      Thanks for the help.

      Regards,

      Karl

        • QlikView Server slower with more CPUs
          Clever Anjos

          It's really kind of weird.

          Are the CPUs the same model?

          Did you compare the CPU graphs while the chart is being calculated?

          • QlikView Server slower with more CPUs
            Karl Pover

            The slower machine has the following characteristics:

            Memory: 1066 MHz DDR3. Bus: QPI at 6.4 GT/s (there is no front-side bus)
            Spec Floating Point rate: SPEC2006 FP_RATE 2 CPU X7560 = 283

            I've attached the server architecture.

            Regards.

            • QlikView Server slower with more CPUs

              Hi,

              Try disabling hyperthreading on the CPU. See here: http://andpointsbeyond.com/category/qlikview/

              "Hyperthreading is designed to help deal with unoptimized applications and the limitations of operating system schedulers. My understanding from the last time that hyperthreading was actively marketed is that QlikView does not benefit and can actually suffer when hyperthreading is enabled. QlikView has highly optimized code and uses its own threading algorithms to maintain peak performance. Hopefully someone from QlikTech can confirm in the comments that hyperthreading is not advisable."

               

              -Alex

                • QlikView Server slower with more CPUs
                  Karl Pover

                  Thanks, Alex. I'm going to look into hyperthreading.

                    • QlikView Server slower with more CPUs
                      Karl Pover

                      Just to give some kind of closure to this issue, I'd like to add that disabling hyperthreading didn't change the performance of the QV application on the server. IBM delivered another server that is a slightly older model, and the performance of QV was finally acceptable. After confirming another similar case that also involved an IBM server, we've come to the conclusion that the newest IBM server models don't seem to work well with QlikView. We're waiting for IBM to give us a solution.

                        • QlikView Server slower with more CPUs

                          Hi Karl,

                          Is Windows Server 2008 running on both IBM machines?

                          Are you using the R2 edition of W2K8?

                           

                           


                          The basic notion behind SMT parking is that the Windows scheduler will attempt to schedule threads so that all physical cores are occupied before any core gets two threads scheduled on its two front-ends (or logical cores). Since Hyper-Threading involves some cache partitioning and other forms of resource sharing, this is a potentially important feature. We've seen scheduler quirks cause poor and oddly unpredictable performance on Core i7 processors in the past. Based on our limited experience testing with Windows 7 and a cadre of SMT-enabled processors for this review, our initial impressions of SMT parking are positive. We've seen performance results for executables that rely on the Windows scheduler for thread allocation that match the performance of executables with explicit, SMT-aware thread affinity built in. Our initial sense is that SMT parking blunts some potential disadvantages of Hyper-Threading, making it more of an unqualified win, even on the desktop.


                          Source: http://techreport.com/articles.x/17545/2

                            • QlikView Server slower with more CPUs
                              Karl Pover

                              Yeah, they both have Win 2008 R2. IBM is continuing to do tests to see what is wrong.

                              Regards.

                                • QlikView Server slower with more CPUs
                                  Jay Jakosky

                                  Definitely sounds like a hardware issue. I'm glad that IBM is taking care of it. I've been working on a similarly spec'd system from Dell and performance has been wonderful. There are very few of these systems out there, so if you need a benchmark on our server I may be able to help.

                                  • QlikView Server slower with more CPUs
                                    Jorge Silva

                                    Hi Karl, did you get any answer from IBM to resolve this issue?

                                    I have a scenario similar to yours... did you solve it?

                                    Thanks,

                                    Jorge

                                      • QlikView Server slower with more CPUs

                                        I think QV10 takes multiple cores into account a lot better than QV9.

                                        • QlikView Server slower with more CPUs
                                          Karl Pover

                                          No news yet from IBM. They were able to improve the performance, but it still doesn't run as fast as my laptop. If and when I find out something, I'll post it.

                                          We tested QV 10 and we didn't see any improvement.

                                          Regards.

                                            • QlikView Server slower with more CPUs
                                              Karl Pover

                                              OK, so we've tried two other large, industrial-strength servers, one from another Intel family and one from HP, and every server processes a QlikView chart even slower than a smaller, older server and my laptop. You might say that the server would run better with more users, but with 2 users (56 seconds per chart) the servers take almost twice as long as with 1 user (34 seconds per chart). I guess it might run better with 100 users, but this is crazy.

                                              I apparently have a magic laptop, because the same chart takes 15 seconds on it.

                                              Does anybody know about any of the following:

                                              - Has anybody had any contact with the QlikTech Scalability Center?

                                              - Who else is having a similar situation?
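To put the timings quoted in this post side by side, here is a minimal Python sketch. The function name `slowdown` and the constant names are hypothetical and chosen only for illustration; the numbers are the per-chart timings reported above.

```python
# Per-chart timings quoted in the post above, in seconds.
SERVER_ONE_USER = 34.0
SERVER_TWO_USERS = 56.0
LAPTOP = 15.0


def slowdown(slow_seconds: float, fast_seconds: float) -> float:
    """Return how many times longer the slow run takes than the fast run."""
    if fast_seconds <= 0:
        raise ValueError("fast_seconds must be positive")
    return slow_seconds / fast_seconds


if __name__ == "__main__":
    two_vs_one = slowdown(SERVER_TWO_USERS, SERVER_ONE_USER)
    server_vs_laptop = slowdown(SERVER_ONE_USER, LAPTOP)
    print(f"Server, 2 users vs 1 user: {two_vs_one:.2f}x")
    print(f"Server (1 user) vs laptop: {server_vs_laptop:.2f}x")
```

On these figures the second concurrent user costs roughly 1.65x per chart, and the server with a single user is a little over 2x slower than the laptop.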

                                              Regards.

                                               

                                                • QlikView Server slower with more CPUs

                                                  Hi,

                                                  Are there some complicated formulas in those charts, or just a lot of data?

                                                  Disable CPU hyperthreading. I suppose there is a sizeable amount of memory, so don't let Windows automatically manage the size of the pagefile. Use a fixed pagefile size, e.g. 2 GB. The old mantra "swap = two times the memory" does not apply to a tool billed as "in-memory analytics"; you really need the data to fit in memory.

                                                  To do a one-to-one comparison with the laptop, stop the QVS service from using all CPU cores in QEMC : System : Setup : QlikView Servers : Performance : CPU affinity. Choose the same number of cores as in the laptop. Use the Windows Task Manager to verify that the rest of the CPU cores really are idle.
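As an aside on what "CPU affinity" means at the operating-system level: Windows represents processor affinity as a bitmask with one bit per logical processor. The minimal Python sketch below only illustrates that idea. The helper names are hypothetical, the 4-core count is an assumed example, and nothing here is taken from QlikView itself; in QEMC the cores are simply selected in the Performance tab.

```python
from typing import List


def affinity_mask(core_count: int) -> int:
    """Return a bitmask selecting the first `core_count` logical processors."""
    if core_count < 1:
        raise ValueError("core_count must be at least 1")
    return (1 << core_count) - 1


def cores_in_mask(mask: int) -> List[int]:
    """Return the zero-based indexes of the logical processors set in `mask`."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


if __name__ == "__main__":
    # Assumed example: restrict a process to 4 cores to mimic a 4-core laptop.
    mask = affinity_mask(4)
    print(f"mask = {mask:#x}, cores = {cores_in_mask(mask)}")
```

With the mask applied, Task Manager should show activity only on the selected cores while the chart is calculated.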

                                                  -Alex

                                                   

                                                    • QlikView Server slower with more CPUs
                                                      Karl Pover

                                                      Thanks for the suggestions. I just got a list of things to review from QlikTech, and it mentions many of the same things that are mentioned here. As Iván said, we're testing all the suggestions we can.

                                                      The test that seems to have the biggest impact is the number of cores that we allow QlikView to use. With 1 user, QlikView runs twice as fast when we restrict it to 4 cores as when we allow it to use 32 cores. With 2 users and 32 cores, QlikView takes almost twice as long as with 1 user. All CPUs available to QlikView are used at 100%, so QlikView doesn't seem to know how to restrict itself.

                                                      Maybe once I have 10 concurrent users in the system QlikView will work better. So, against all logic, if a user experiences poor QlikView performance, they should ask other users to log in and use QlikView. That's weird.

                                                        • QlikView Server slower with more CPUs

                                                          From your original question : "a machine with 2 CPUs, 8 cores, 32 system threads"

                                                          If you disabled CPU hyper-threading, there would be 8 cores = 8 system threads in the machine.

                                                          -Alex

                                                            • QlikView Server slower with more CPUs

                                                              Hi,

                                                              We've had an internal discussion regarding this, and this post is only for completeness.

                                                              (This statement comes from a consultant colleague.)

                                                              This QPI Wrap Card is needed only when the server has 4 CPUs installed, but here the x3850 box has only 2 CPUs.

                                                              This could possibly cause a performance degradation.

                                                              Here is an excerpt from a whitepaper on the eX5:

                                                               

                                                              These wrap cards complete the full QPI mesh to allow all four processors to be connected to each other. The QPI Wrap Cards are not needed in two-processor configurations and not needed when a MAX5 is connected.

                                                              3.4.1 QPI Wrap Card

                                                              In the x3850 X5, QPI links are used for inter-processor communication, both in a single node and two-node system. They are also used to connect the system to a MAX5 memory expansion drawer. In a single node x3850 X5, the QPI links are connected in a full mesh between all CPUs. To complete this mesh, the QPI Wrap Card is used.

                                                              Tip: The QPI Wrap Cards are only for single-node configurations with three or four processors installed. They are not necessary for any of the following items:

                                                              - Single-node configurations with two processors
                                                              - Configurations with MAX5 memory expansion units
                                                              - Two-node configurations

                                                               

                                                               

                                                              A further performance issue can appear when RAM modules are not optimally distributed over the available DIMM slots.

                                                              Here is the link to the whitepaper from IBM: http://www.redbooks.ibm.com/redpapers/pdfs/redp4650.pdf

                                                               

                                                                • QlikView Server slower with more CPUs

                                                                   


                                                                  olli wrote:
                                                                  This QPI Wrap Card is needed only when the server has 4 CPUs installed, but here the x3850 box has only 2 CPUs.
                                                                  This could possibly cause a performance degradation.


                                                                   

                                                                  Thanks for the suggestion; we got the same comment from QlikTech. Now that you have brought this up as well, we feel more comfortable asking the provider to remove the card. We will let you know the outcome as soon as we have tested without the QPI Wrap Card.

                                                                  Thanks, again.

                                                                   

                                                                • QlikView Server slower with more CPUs

                                                                   


                                                                  Alexandru Toth wrote:
                                                                  From your original question : "a machine with 2 CPUs, 8 cores, 32 system threads"
                                                                  If you disabled CPU hyper-threading, there would be 8 cores = 8 system threads in the machine.
                                                                  -Alex


                                                                  The server has 2 CPUs with 8 cores each, for a total of 16 physical cores. We did see an increase in performance when disabling hyper-threading, thanks. I'm guessing we might be able to get decent performance with a combination of all your suggestions.
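The core and thread counts discussed in this exchange can be checked with a short worked calculation. This is only a sketch in Python: the function name is hypothetical, and it assumes two hardware threads per core when hyper-threading is on and one when it is off.

```python
def logical_processors(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Return the number of logical processors the operating system will see."""
    if min(sockets, cores_per_socket, threads_per_core) < 1:
        raise ValueError("all counts must be at least 1")
    return sockets * cores_per_socket * threads_per_core


if __name__ == "__main__":
    # 2 CPUs with 8 cores each, as described in the post above.
    print("Hyper-threading on: ", logical_processors(2, 8, 2))
    print("Hyper-threading off:", logical_processors(2, 8, 1))
```

With hyper-threading enabled this gives the 32 system threads from the original question; with it disabled, 16 logical processors remain, one per physical core.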

                                                                  Regards

                                            • QlikView Server slower with more CPUs

                                              Hi Karl,

                                              Here is a suggestion from my colleagues:

                                              Some time ago we had some performance problems with newer machines as well. They were caused by the Win2008 default power plan "Balanced". We experienced much better (script) performance when we switched the power plan to "High Performance".
                                              Maybe it's the case in your scenario as well.
                                              I changed the power plan from "Balanced" (which is the default, so check it after installing the OS) to "High Performance", since Windows Server 2008 R2 does not support Intel Turbo Boost in other modes.
                                              So it now takes 23 minutes (a little slower than on our server, but that's plausible because of the lower CPU clock / smaller cache).


                                              from MS Performance Tuning Guidelines for Windows Server 2008 R2, Page 14:

                                              Processor Performance Boost Policy :
                                              Intel Turbo Boost Technology is a feature that allows Intel processors to achieve
                                              additional performance when it is most useful (that is, at high system loads).
                                              However, this feature increases CPU core power consumption, so we configure Turbo
                                              Boost based on the power policy that is in use. Turbo Boost is enabled for High
                                              Performance power plans and disabled on Balanced and Power Saver plans for the
                                              current generation of processors. For future processors, this default setting might
                                              change depending on the power efficiency of such features. To use the Turbo Boost
                                              feature under the Balanced or Power Saver plans, you must configure the Processor
                                              Performance Boost Policy parameter.
                                              The Processor Performance Boost Policy is a percentage value from 0 to 100. The
                                              default value of this parameter is 35 percent on Balanced and Power Saver plans. Any
                                              value lower than 51 disables Turbo mode. To enable Turbo Mode, set this value to 51
                                              or higher.
                                              The following commands set Processor Performance Boost Policy to 100 on the
                                              current power plan. Specify the policy by using a GUID string, as shown below:
                                              Powercfg -setacvalueindex scheme_current sub_processor 45bcc044-d885-43e2-8605-ee0ec6e96b59 100
                                              Powercfg -setactive scheme_current

                                              Note that you must run the powercfg -setactive command to enable the new
                                              settings. You do not need to reboot the server.
                                              To set this value for power plans other than the current selected plan, you can use
                                              aliases such as SCHEME_MAX (Power Saver), SCHEME_MIN (High Performance), and
                                              SCHEME_BALANCED (Balanced) in place of SCHEME_CURRENT. Replace "scheme_current"
                                              in the powercfg -setactive commands shown above with the desired alias to
                                              enable that power plan. For example, to adjust the Boost Policy in the Power Saver
                                              plan and make Power Saver the current plan, run the following commands:
                                              Powercfg -setacvalueindex scheme_max sub_processor 45bcc044-d885-43e2-8605-ee0ec6e96b59 100
                                              Powercfg -setactive scheme_max
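The rule in the excerpt above (a Boost Policy value below 51 disables Turbo mode) and the two powercfg commands can be sketched in Python as follows. The function names are hypothetical; the GUID, the aliases and the command syntax are copied from the quoted guide. The script only prints the commands, it does not run them.

```python
from typing import List

# GUID of the Processor Performance Boost Policy setting, as quoted in the excerpt.
BOOST_POLICY_GUID = "45bcc044-d885-43e2-8605-ee0ec6e96b59"
# Per the excerpt: any value lower than 51 disables Turbo mode.
TURBO_THRESHOLD = 51


def turbo_enabled(policy_percent: int) -> bool:
    """Return True if the given Boost Policy percentage enables Turbo mode."""
    if not 0 <= policy_percent <= 100:
        raise ValueError("policy_percent must be between 0 and 100")
    return policy_percent >= TURBO_THRESHOLD


def boost_policy_commands(scheme_alias: str, policy_percent: int) -> List[str]:
    """Build the two powercfg command lines for a scheme alias and policy value."""
    turbo_enabled(policy_percent)  # called only to validate the 0..100 range
    return [
        f"Powercfg -setacvalueindex {scheme_alias} sub_processor "
        f"{BOOST_POLICY_GUID} {policy_percent}",
        f"Powercfg -setactive {scheme_alias}",
    ]


if __name__ == "__main__":
    print("Default of 35 enables Turbo:", turbo_enabled(35))
    for line in boost_policy_commands("scheme_balanced", 100):
        print(line)
```

Switching the whole plan to High Performance, as described earlier in this post, is the simpler route; the commands matter only if the Balanced or Power Saver plan has to stay active.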

                                              Good luck!

                                              Rainer

                                              • QlikView Server slower with more CPUs

                                                Hello all,

                                                Thanks, everyone, for your suggestions; they have been useful. We just finished tuning the server with the following configuration changes:

                                                - Power options. Changed from Balanced to High Performance.

                                                - Hyper-threading. Disabled.

                                                - Virtualization capabilities. Disabled.

                                                - RAM configuration. RAM modules evenly distributed among the three channels. Density per module: 8 GB @ 1333 MHz.

                                                - Hardware prefetch. QlikTech suggested disabling this option; however, we found that prefetch benefits performance, so we turned it back on.

                                                The server now completes our test in 8.8 s, compared with 17 s initially.

                                                Best Regards, thanks everybody.

                                                  • QlikView Server slower with more CPUs
                                                    Karl Pover

                                                    Just as a final note. It is important that this information be relayed to partners and customers. We only discovered this issue after starting to use 64-bit laptops. There is probably plenty of improvement that could have been made in past projects and now when the customer asks for server requirements we will deliver these requirements along with the QlikView standard requirements.

                                                    The QlikTech Scalability Center should post a white paper on what they've seen.

                                                    Thank you for all your posts.

                                                    • QlikView Server slower with more CPUs

                                                      Hello ivan_cruz, how can I disable hyper-threading? Is it done in the BIOS or from the Windows Control Panel? I would appreciate it if you could give some more input on this. Thanks in advance.