EliGohar
Partner - Creator III

Pulling data from a PostgreSQL DB - Question about performance

Hi,

We work with a PostgreSQL DB to retrieve data into Qlik Sense.
We noticed (in the attached example, which retrieves 4 million records) that the query itself finishes on the SQL side in around 40 seconds, while another process takes a further 4 minutes.

20240717T110810.305+0300 0024 LIB CONNECT TO 'PostgreSQL_localhost'
20240717T110811.328+0300 Connected.
20240717T110811.330+0300 0028 SQL SELECT *
20240717T110811.330+0300 0029 FROM table1
20240717T110811.330+0300 0030 WHERE "timestamp" > '2024-06-27 20:04:08'
20240717T110855.187+0300 43 fields found: field1, field2, field3, field4, field5, field6, field7, field8, field9, field10, field11, field12, field13, field14, field15, field16, field17, field18, field19, field20, field21, field22, field23, field24, field25, field26, field27, field28, field29, field30, field31, field32, field33, field34, field35, field36, field37, field38, field39, field40, field41, field42, timestamp
20240717T111256.239+0300 4,002,000 lines fetched
20240717T111257.239+0300 Execution finished.


Is it possible to get an explanation of what exactly happens on the Qlik side after the query ends?

What exactly is done in the fetching phase that takes 4 minutes?

Is it possible to debug this using logs?

Thanks,

Eli.

5 Replies
marcus_sommer

I'm not sure that this interpretation of the document log is correct. It could mean that the database needs the 40 seconds to start returning the first records, and the remaining records are then delivered over the following 4 minutes.

You could play with this idea a bit by changing the number of records with an adjusted where-clause, for example loading only data since yesterday and/or since the start of the year (a sketch follows below).
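A minimal sketch of such a test, assuming the table and timestamp field from the log above (the variable names are only illustrative):

LET vYesterday = Date(Today() - 1, 'YYYY-MM-DD');
LET vYearStart = Date(YearStart(Today()), 'YYYY-MM-DD');

LIB CONNECT TO 'PostgreSQL_localhost';

// Smaller slice: only data since yesterday
Test_Yesterday:
SQL SELECT *
FROM table1
WHERE "timestamp" > '$(vYesterday)';

// Repeat with '$(vYearStart)' or other cut-offs and compare the
// "lines fetched" timestamps in the reload log.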

Vishal_Gupta
Partner - Creator

@EliGohar I'm not sure what you mean by 'another process', but it seems that once the tables/records are extracted, Qlik might be indexing those records, which could add to the overall extraction time.

I also agree with @marcus_sommer that the interpretation seems incorrect.

 

If a post helps to resolve your issue, please accept it as a SOLUTION and leave a LIKE!
EliGohar
Partner - Creator III
Author

@Vishal_Gupta - By indexing, do you mean the search index controlled by the CreateSearchIndexOnReload parameter? Because we disabled it in the script.
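For reference, disabling it in the script is typically a single SET statement near the top of the load script, something like:

// Disable search index creation during reload
SET CreateSearchIndexOnReload = 0;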

@marcus_sommer - I may be wrong in my interpretation, but I saw that on the PostgreSQL side the query that Qlik sends through the script finishes in 40 seconds. So what exactly happens next? Is there a way to improve the retrieval speed?

Thanks!

marcus_sommer

As hinted above, I assume that the 40 seconds is the query on the database side and the remaining time is the network transfer through the driver to Qlik and its loading into the system and data tables.

I suggest applying the mentioned adjustments to the where-clause and/or experimenting with 1 / 3 / 10 / n fields - an insightful pattern should emerge (a reduced field list is sketched below).
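For example, a reduced field list using the field names from the log above:

SQL SELECT field1, field2, field3, "timestamp"
FROM table1
WHERE "timestamp" > '2024-06-27 20:04:08';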

Assuming that all fields and records really are needed, you could consider applying incremental logic and/or dividing the load into n loads of n fields (an incremental approach is sketched below). There are more possibilities, but their effects are rather small.
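A minimal sketch of the incremental idea, assuming the data is append-only, the timestamp field can serve as the delta key, and a previously stored QVD exists (the QVD path is only illustrative):

LIB CONNECT TO 'PostgreSQL_localhost';

// Highest timestamp already stored locally
MaxStamp:
LOAD Max([timestamp]) AS MaxTimestamp
FROM [lib://Data/table1.qvd] (qvd);

LET vMaxTimestamp = Timestamp(Peek('MaxTimestamp', 0, 'MaxStamp'), 'YYYY-MM-DD hh:mm:ss');
DROP TABLE MaxStamp;

// Only the new records travel over the network
Table1:
SQL SELECT *
FROM table1
WHERE "timestamp" > '$(vMaxTimestamp)';

// Append the already stored history and write the combined result back
Concatenate (Table1)
LOAD * FROM [lib://Data/table1.qvd] (qvd);

STORE Table1 INTO [lib://Data/table1.qvd] (qvd);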

Levi_Turner
Employee

To start out with, Qlik doesn't have the data when it reports "43 fields found: ". Here's a debug look into what is happening under the hood on each side with default settings:

[Image: Levi_Turner_0-1721413177247.png - debug trace of the Qlik connector and PostgreSQL under default settings]

The brief version is that Qlik reports the fields found in the first batch of information from PostgreSQL and then continues to process the stream of data from PostgreSQL.

As background, PostgreSQL's ODBC driver (and Qlik's specific implementation) supports 4 modes:

  1. Retrieve Entire Result Set in Memory
  2. Use Multiple Statements
  3. Single Row Mode
  4. Use Declare / Fetch 

Reference docs: https://docs.posit.co/drivers/1.7.0/pdf/Simba%20PostgreSQL%20ODBC%20Driver%20Install%20and%20Configu... 

Ages ago I did some basic testing with these results:

[Image: Levi_Turner_1-1721413394671.png - test results comparing reload time and memory usage across the four modes]

The default behavior of the driver is to use Use Declare / Fetch mode with a cache (page) size of 200 rows of data.

Retrieve Entire Result Set In Memory, as the name implies, will queue up the data from the database and deliver it en masse to the Qlik Engine. We can see that the memory usage is relatively high compared to the Single Row and Use Declare / Fetch methods. It is certainly the fastest option, but only slightly so in this relatively simple test.

Use Multiple Statements behaves a bit unexpectedly to my eyes. The memory usage is the highest and the reload time is near the worst. It isn't obvious how this mode is particularly relevant to a BI use case, as our Engine will only issue a single SELECT statement, so our queries will never come in multiples to take advantage of the theoretical benefits here.

Single Row Mode implies that the entire result set is stored in memory by PostgreSQL and the driver accepts a single row at a time. In terms of performance, it is slower than the fastest mode but not significantly so.

Notes:

  1. This is a synthetic test where Qlik Sense Desktop is connecting to a local PostgreSQL instance. My guess is that Use Declare / Fetch (which is the default setup) would perform worse when more realistic networking is involved, as the batches are rendered on the PostgreSQL side and then delivered to our ODBC Connector.
  2. This was a very simple data set, so I suspect the connector's memory usage will show greater differences with more realistic data. I expect Single Row Mode to be the lowest in terms of resource usage, as the ODBC connector only needs to process a single row of data, vs. the Use Declare / Fetch method (batches of multiple records) or the Entire Result Set In Memory method, which would likely falter at scale in many environments, including Qlik SaaS.

How to implement: set the key-value pairs in the Advanced section of the connector configuration. For example, setting SingleRowMode:

[Image: Levi_Turner_2-1721413433065.png - Advanced section of the connection settings with SingleRowMode configured]
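In text form, that amounts to a single key-value pair in the Advanced section, roughly like the following (the value shown is an assumption; check the linked driver documentation for the exact accepted values):

Name:  SingleRowMode
Value: 1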