Anonymous
Not applicable

Integrate Qlik with Elasticsearch

Is it possible to connect QlikView with Elasticsearch? That is, to have QlikView as the front-end visualization system that connects to and directly executes queries against big data hosted in Elasticsearch.

Best Regards,

Reem

11 Replies
Gysbert_Wassenaar

Yes, certainly. Elasticsearch has a REST API that you can use with the Qlik Sense REST Connector.
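For example, a minimal first call against the _search endpoint could look like the sketch below; the connection name, URL, and selected fields are placeholders, not from a real setup, and the connection must have "Allow WITH CONNECTION" enabled:

LIB CONNECT TO 'elasticsearch_rest_connection';	// hypothetical connection name

RestConnectorMasterTable:
SQL SELECT
	"took",
	"timed_out",
	"__KEY_root"
FROM JSON (wrap on) "root" PK "__KEY_root"
WITH CONNECTION (
	URL "https://myelasticsearchsite/myindex/_search",
	QUERY "size" "100"
);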


yuvalos3
Contributor II

Hi, can you give an example of how to achieve this?

I was trying to use pagination, but Elasticsearch has a limit of 10K docs per query.

Is there a known way to fetch a large index from ES using the Qlik REST connector?

 

Gysbert_Wassenaar

You can use the scroll API to get all the data from an index. I use two subroutines to do this. The first gets the first batch and calculates the number of iterations needed to scroll through all the data; for example, 23,000 hits at 5,000 per batch means ceil(23000/5000) = 5 iterations, i.e. the first batch plus four scroll calls. The second subroutine is called iteratively in a loop.

CALL GetData_FirstBatch;	// loads batch 1 and sets vIterations

For i = 2 To $(vIterations)
    CALL GetData_NextBatch(i);
Next

 

SUB GetData_FirstBatch

    // Reset state variables from any previous run
    SET vTotal=;
    SET vCalls=;
    SET vScroll_ID=;
    SET vBody=;

    LET vSize = 5000;	// batch size per scroll request

	LIB CONNECT TO 'elasticsearch_rest_connection'; // just a template; the URL is overridden in WITH CONNECTION below

    RestConnectorMasterTable:
    LOAD
    	*,
        '$(vNow)|1' as BatchNo
        ;
    SQL SELECT 
		...loads of stuff...
    FROM JSON (wrap on) "root" PK "__KEY_root"
   
    WITH CONNECTION (
    URL "https://myelasticsearchsite/myindex-*/_search",
    HTTPHEADER "Authorization" "Bearer $(vSessionToken)",
    QUERY "scroll" "10m",
    QUERY "size" "$(vSize)"
    );


	[hits_u0]:
    LOAD	
    	[total_u0]
    RESIDENT
    	RestConnectorMasterTable
    WHERE
    	NOT IsNull([__FK_hits])
        ;

    // Total hit count reported by ES determines how many scroll calls are needed
    LET vTotal = Peek('total_u0', -1, 'hits_u0');
    LET vIterations = Ceil($(vTotal)/$(vSize));

	DROP TABLE [hits_u0];
   

END SUB


SUB GetData_NextBatch(Iteration)

    // Pick up the scroll id returned by the previous request
    [root]:
    LOAD	
    	[_scroll_id]
    RESIDENT
    	RestConnectorMasterTable
    WHERE
    	NOT IsNull([__KEY_root])
        ;

    LET vScroll_ID = Peek('_scroll_id', -1, 'root');
    LET vBody = '{ "scroll" : "10m", "scroll_id" : "$(vScroll_ID)" }';
    // Double the quotes so the JSON body survives inside WITH CONNECTION BODY "..."
    LET vBody = Replace('$(vBody)', '"', Chr(34)&Chr(34));

    DROP TABLE [root]; 
        
    
    LIB CONNECT TO 'elasticsearch_rest_connection';

    CONCATENATE (RestConnectorMasterTable)
    
    LOAD
    	*,
        '$(vNow)|$(Iteration)' as BatchNo
        ;
    SQL SELECT 
		...lots of stuff...
    FROM JSON (wrap on) "root" PK "__KEY_root"

    WITH CONNECTION (
    URL "https://myelasticsearchsite/_search/scroll",    
    HTTPHEADER "Authorization" "Bearer $(vSessionToken)",
    BODY "$(vBody)"
    );
   
    
    
END SUB    
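Note that both subroutines assume vNow and vSessionToken have been set earlier in the script, for example:

LET vNow = Timestamp(Now());	// reload timestamp used in the BatchNo tag
LET vSessionToken = 'xxxx';	// placeholder; obtain a real bearer token from your authentication flow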

yuvalos3
Contributor II

Thanks a lot, it was very helpful!

 

yuvalos3
Contributor II

Hi, I'm having a problem with this script for large amounts of data.

I was retrieving from an index with 500K docs, and it worked fine; it took about a minute.

But then I tried it on an index with 8.5 million records, and it takes too long. Qlik also gets stuck on "Loading table" when I go to the Data manager, and the 'Data load progress' gets slower as time passes.

I think it has something to do with the concatenation in each loop.

But even if I didn't write it explicitly, Qlik would concatenate implicitly anyway, because the columns are the same.
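For illustration (toy inline data), this is the auto-concatenation I mean:

T1:
LOAD * INLINE [
A, B
1, x ];

// Identical field names, so Qlik appends these rows to T1 automatically
LOAD * INLINE [
A, B
2, y ];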

I do know that our architect once achieved this using regular Elasticsearch pagination, simply using the connector's 'Offset' pagination with 'Count': size, 'from': 0, 'Total': 'root/hits/total'.

I'm not sure what I'm doing differently; he told me he did the very same thing.

Though I can't blame Qlik for this, since the pagination limitation is indeed on the Elasticsearch side.

So, can you think of another way to load such large amounts of data?

 

Thanks

 

Gysbert_Wassenaar

The RestConnectorMasterTable table can get very big. Make sure you're not running out of RAM.

If your ES index contains a field that allows you to partition the data into blocks, perhaps you can retrieve all the data in smaller sets, for example 1 million records at a time: retrieve the first million, store them into a QVD, retrieve the next million, concatenate them with the QVD contents, store the QVD with now 2 million records, and so on, until you've got all the records.
Or perhaps you can partition your ES index into several indices, load from each separate index one at a time, and again store intermediate results into QVDs. A rough sketch of that second variant is below.
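This sketch reuses the placeholder connection and URL from the scroll example; the monthly index suffixes and the QVD path are hypothetical:

FOR Each vMonth in '2019-01', '2019-02', '2019-03'

	LIB CONNECT TO 'elasticsearch_rest_connection';

	Block:
	SQL SELECT
		"took",			// select your real fields here, as in the scroll example
		"timed_out",
		"__KEY_root"
	FROM JSON (wrap on) "root" PK "__KEY_root"
	WITH CONNECTION (
		URL "https://myelasticsearchsite/myindex-$(vMonth)/_search"
	);

	// Append everything accumulated so far, then store the grown table back
	// Note: delete the QVD before a full reload, or old contents are appended again
	IF Not IsNull(FileTime('lib://qvd_folder_connector/myindex.qvd')) THEN
		CONCATENATE (Block) LOAD * FROM [lib://qvd_folder_connector/myindex.qvd] (qvd);
	END IF

	STORE Block INTO [lib://qvd_folder_connector/myindex.qvd] (qvd);
	DROP TABLE Block;

NEXT vMonth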

yuvalos3
Contributor II

I just want to share an update on how we managed to solve the issue.

We increased the index setting index.max_result_window to 10M.
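For reference, that setting can be raised with an index settings update, e.g. in Kibana Dev Tools syntax (the index name is a placeholder):

PUT /myindex/_settings
{
  "index" : { "max_result_window" : 10000000 }
}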

After that, we used regular pagination (no scroll).

In the REST connector we configured the following:

URL: our index URL
Pagination: Offset
'Start' parameter name: from
'Start' initial value: 0
'Count' parameter name: size
'Count' initial value: 10000
'Total records' path: root/hits/total

 

This took about 100 minutes to load 8.5M records from ES.

Afterwards we had another issue: we couldn't open the 'Data manager' window.

We bypassed it by adding a STORE command to a QVD file at the end of the script:

STORE [TableName] INTO [lib://qvd_folder_connector/FileName.qvd] (qvd);

Then we created a separate app and loaded the data from the QVD file instead.
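The load script of that second app then reduces to something like this (same placeholder path as above):

Data:
LOAD * FROM [lib://qvd_folder_connector/FileName.qvd] (qvd);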

 

Gysbert_Wassenaar

Thanks for the feedback!

Hanan
Contributor

Hello,
I want to make a REST connection to ES.

I have the same problem with the number of lines (10,000). I implemented your code, but I get the following error message:

"HTTP protocol error 401 (Unauthorized): 'Bearer' authentication schema provided by the web-service is not supported or your credentials are not valid."

The credentials are OK (the connection works).

Maybe you know why?

Best regards (sorry for my English),

Hanan