Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
We work with Postgresql DB to retrieve data into Qlik Sense.
We identified (in the attached example - retrieving 4 million records) that the query at the SQL level ends in around 40 seconds, while another process takes 4 minutes.
20240717T110810.305+0300 0024 LIB CONNECT TO 'PostgreSQL_localhost'
20240717T110811.328+0300 Connected.
20240717T110811.330+0300 0028 SQL SELECT *
20240717T110811.330+0300 0029 FROM table1
20240717T110811.330+0300 0030 WHERE "timestamp" > '2024-06-27 20:04:08'
20240717T110855.187+0300 43 fields found: field1, field2, field3, field4, field5, field6, field7, field8, field9, field10, field11, field12, field13, field14, field15, field16, field17, field18, field19, field20, field21, field22, field23, field24, field25, field26, field27, field28, field29, field30, field31, field32, field33, field34, field35, field36, field37, field38, field39, field40, field41, field42, timestamp
20240717T111256.239+0300 4,002,000 lines fetched
20240717T111257.239+0300 Execution finished.
Is it possible to get an explanation of what exactly happens after the query ends on Qlik side?
What exactly is done in the fetching phase that takes 4 minutes?
Is it possible to debug the matter using logs?
Thanks,
Eli.
I'm not sure that this interpretation of the document-log is correct. I could imagine that it means that the data-base needs the 40 seconds to start the return of the first records which are then finished after further 4 minutes.
You may a bit playing with this thought by changing the number of records with an adjusted where-clause to load against yesterday and/or the yearstart or ...
@EliGohar not sure what you mean by 'another process' but it seems that once the tables/records are extracted, Qlik might be indexing those records which might add to the overall extraction time.
Also agree with @marcus_sommer , interpretation seems incorrect.
@Vishal_Gupta - By indexing do you mean search index using CreateSearchIndexOnReload parameter? because we disabled it in the script.
@marcus_sommer - I may be wrong in my interpretation, but I saw that on the Postgresql side the query that Qlik sends through the script ends in 40 seconds. So what exactly happens next? Is there a way to improve the retrieval speed?
Thanks!
Like above hinted I assume that the 40 seconds is the query on the data-base side and the other time is the network-transfer per driver to Qlik and it's loading into the system- and data-tables.
I suggest to apply the mentioned adjustments to the where-clause and/or also some playing with 1 / 3 / 10 / n fields - there should be some insightful pattern ...
Assuming that really all fields and records are mandatory needed you may consider to apply any incremental logic and/or to divide the load into n loads of n fields. There are more possibilities but the effects are rather small.
To start out with, Qlik doesn't have the data when it reports "43 fields found: ". Here's a debug look into what is happening under the hood on each side with default settings:
The brief version of this is that Qlik will report the fields found in the first batch of information from PostgreSQL then continue to process the stream of data from PostgreSQL.
As background, PostgreSQL's ODBC driver (and Qlik's specific implementation) supports 4 modes:
Reference docs: https://docs.posit.co/drivers/1.7.0/pdf/Simba%20PostgreSQL%20ODBC%20Driver%20Install%20and%20Configu...
Ages ago I did some basic testing with these results:
The default behavior of the driver is to use a Use Declare / Fetch mode with a cache or page size of 200 rows of data.
Retrieve Entire Result Set In Memory, as the name implies will queue up a batch of data from the database and deliver it en masse to the Qlik Engine. We can see that the memory usage is relatively high compared to the Single Row and Use Declare / Fetch methods. It seems to be the fastest option to be sure, but only slightly in this relatively simple test.
Use Multiple Statements behaves a bit unexpectedly to my eyes. The memory usage is highest and the reload time is near the worst. It isn’t obvious how this mode is particularly relevant to a BI use case as our Engine will only issue a single SELECT statement so our queries will never come in multiples to take advantage of the theoretical benefits here.
Single Row Mode implies that the entire result set is stored in memory by PostgreSQL and the driver accepts a single row at a time. In terms of performance, it is slower than the fastest mode but not significantly so.
Notes:
How to implement: Set the key value pairs in the Advanced section of the connector config. Example setting SingleRowMode
Hi Levi,
Thanks for the detailed answer!
Our data source in Postgres DB is a complex view, each record in view involved some data calculations: concatenation of strings, and some math actions.
So when using batch fetching (by cursor) it takes a long time.
When we tried to use SingleRowMode - the total time reduced to ~1.5min, but we see that the stress moved to db (1.2min is the fetching from DB)
Do you have any idea how to improve this scenario?
Thanks again,
Eli.
Before delving into the technical, I would recommend assessing what you are trying to achieve. Via the advanced parameters, you have increased the throughput to Qlik, but at a cost of undesired strain on the database.
Let’s presume that you want to have as fast a reload as possible simultaneously lowering the stress on the underlying data source as much as possible. The standard approach in Qlik is to use a technique called an incremental load:
The simple description of this is that you will:
There are a number of articles across Community on this concept.
@Levi_Turner We already implemented incremental load. Data is loaded incrementally every 5 minutes (Insert only). The incremental loads insert about 100K new records (maximum), so they are done quite quickly.
Our interest is with the initial loading, which as mentioned can include millions of records, and therefore the purpose of the post - to understand if something can be improved in the initial loading.
Eli.