since_1995
Contributor III

Distance matrix api request

Hi,

I have address data from a table, filtered by address_type: rows with address_type = origin feed the main flow and rows with address_type = destination feed the lookup table.

Now I need to calculate the distance between each origin and destination address.

Since this is a Cartesian join, I am not adding any expression key, and the join model is set to all rows (left outer join).

Each origin and destination pair is then passed to the Distance Matrix API.

I have 25 origins and 3,000 destinations, so after the Cartesian join there are 75,000 rows, which means 75,000 Distance Matrix API requests. This flow doesn't work: because the row count is so large, the job runs for hours and then fails. Is there a way to fix this?

Lookup
  |
main --- tMap --- tFlowToIterate --- tREST --- tExtractJSONFields --- tMap --- tDBOutput

9 Replies
Anonymous
Not applicable

Hi

You can try activating the 'Enable parallel execution' option on the iterate connection. Too many interactions with the API may cause timeout errors, and the API may also have a limit on the maximum number of calls.

Regards

Shong

since_1995
Contributor III
Author

Hi Shong, I modified my job a bit and tried parallel execution, setting the number of parallel executions to 10000. It increased the speed of the API calls, but tDBOutput now throws an error saying the statement was aborted because the number of waiters exceeds the 20 statement limit.

tDBInput ---> Iterate (x10000) ---> tREST --- tExtractJSONFields --- tMap --- tDBOutput

Anonymous
Not applicable

You set a very big number; usually we set it to 2 or 4.

since_1995
Contributor III
Author

Okay, I set it to 5, but no luck. The job still takes more than 2 hours to finish.

Anonymous
Not applicable

Both data processing and data transmission take time; after all, you have many rows that need to be processed iteratively and written to a cloud DB. I can't say how much time is correct, as long as it is within an acceptable range. I think you need to double-check whether you really need a left outer join without keys, and whether it is possible to filter the rows before calling the API.

since_1995
Contributor III
Author

I removed the left outer join from the Talend job and am doing the Cartesian join in the SQL itself. Is there a way to pass 100 rows per API request?

This is the request:

context.url+context.key+"&destinations="+((String)globalMap.get("row2.Destination"))+"&origins="+((String)globalMap.get("row2.Origin"))+"&units=imperial"

tDBInput ---> Iterate (x10) ---> tREST --- tExtractJSONFields --- tMap --- tDBOutput
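On batching: the Google Distance Matrix API does accept multiple pipe-separated origins and destinations in a single request, subject to per-request limits on the total number of elements (origins x destinations), so check the quota documentation for your plan. Below is a minimal Java sketch of building batched request URLs, assuming the destinations arrive as a list; the class and method names and the batch size of 25 are illustrative, not part of the original job.

import java.util.ArrayList;
import java.util.List;

public class DistanceMatrixBatcher {

    // Illustrative batch size; the API caps how many destinations fit in one
    // request, so verify the current limit in the quota documentation.
    private static final int BATCH_SIZE = 25;

    // Builds one request URL per batch of destinations for a single origin,
    // instead of one URL per origin/destination pair.
    public static List<String> buildBatchedUrls(String baseUrl, String key,
                                                String origin, List<String> destinations) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < destinations.size(); i += BATCH_SIZE) {
            List<String> batch =
                    destinations.subList(i, Math.min(i + BATCH_SIZE, destinations.size()));
            // Multiple destinations are separated by '|' within one request.
            String joined = String.join("|", batch);
            urls.add(baseUrl + key
                    + "&origins=" + origin
                    + "&destinations=" + joined
                    + "&units=imperial");
        }
        return urls;
    }
}

Each batched response then contains one row with several elements, so the tExtractJSONFields step would need to loop over the elements array instead of reading a single value.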

Anonymous
Not applicable

The latency here is caused by the individual API calls. API calls are slow compared to a process running entirely on your machine/server. You say 75,000 address combinations take more than 2 hours. Even if that were 3 hours, that is just under 7 address combinations a second.

You may need to think about an alternative way of calculating these distances. If you simply need the distance as the crow flies (not along roads), there are algorithms that calculate the straight-line distance between GPS coordinates. That would be MUCH faster, but you would first need the GPS coordinates of your addresses. To collect those, you would only need 3,025 API calls (25 origins plus 3,000 destinations).

since_1995
Contributor III
Author

Yes, I already used the Geocoding API and got the GPS coordinates.

Anonymous
Not applicable

@since_1995 OK, take a look at this article to see if this is what you need:

https://en.wikipedia.org/wiki/Vincenty%27s_formulae

I have used this and it is pretty easy to implement. Here is a Java implementation of this:

https://stackoverflow.com/a/9822531
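For reference, here is a minimal sketch of Vincenty's inverse formula in Java, following the Wikipedia article above with the WGS-84 ellipsoid constants; the class and method names and the iteration cap are my own choices, not taken from the linked answer.

public final class Vincenty {

    // WGS-84 ellipsoid parameters
    private static final double A = 6378137.0;           // semi-major axis (m)
    private static final double B = 6356752.314245;      // semi-minor axis (m)
    private static final double F = 1.0 / 298.257223563; // flattening

    // Returns the geodesic distance in meters between two points
    // given as latitude/longitude in degrees.
    public static double distanceMeters(double lat1, double lon1,
                                        double lat2, double lon2) {
        double L = Math.toRadians(lon2 - lon1);
        double U1 = Math.atan((1 - F) * Math.tan(Math.toRadians(lat1)));
        double U2 = Math.atan((1 - F) * Math.tan(Math.toRadians(lat2)));
        double sinU1 = Math.sin(U1), cosU1 = Math.cos(U1);
        double sinU2 = Math.sin(U2), cosU2 = Math.cos(U2);

        double lambda = L, lambdaPrev;
        double sinSigma, cosSigma, sigma, cosSqAlpha, cos2SigmaM;
        int iterations = 0;
        do {
            double sinLambda = Math.sin(lambda), cosLambda = Math.cos(lambda);
            sinSigma = Math.sqrt(Math.pow(cosU2 * sinLambda, 2)
                    + Math.pow(cosU1 * sinU2 - sinU1 * cosU2 * cosLambda, 2));
            if (sinSigma == 0) return 0; // coincident points
            cosSigma = sinU1 * sinU2 + cosU1 * cosU2 * cosLambda;
            sigma = Math.atan2(sinSigma, cosSigma);
            double sinAlpha = cosU1 * cosU2 * sinLambda / sinSigma;
            cosSqAlpha = 1 - sinAlpha * sinAlpha;
            // Guard against division by zero for points on the equator
            cos2SigmaM = (cosSqAlpha == 0) ? 0
                    : cosSigma - 2 * sinU1 * sinU2 / cosSqAlpha;
            double C = F / 16 * cosSqAlpha * (4 + F * (4 - 3 * cosSqAlpha));
            lambdaPrev = lambda;
            lambda = L + (1 - C) * F * sinAlpha
                    * (sigma + C * sinSigma * (cos2SigmaM
                        + C * cosSigma * (-1 + 2 * cos2SigmaM * cos2SigmaM)));
        } while (Math.abs(lambda - lambdaPrev) > 1e-12 && ++iterations < 200);
        // Note: the iteration may fail to converge for nearly antipodal points.

        double uSq = cosSqAlpha * (A * A - B * B) / (B * B);
        double bigA = 1 + uSq / 16384 * (4096 + uSq * (-768 + uSq * (320 - 175 * uSq)));
        double bigB = uSq / 1024 * (256 + uSq * (-128 + uSq * (74 - 47 * uSq)));
        double deltaSigma = bigB * sinSigma * (cos2SigmaM + bigB / 4
                * (cosSigma * (-1 + 2 * cos2SigmaM * cos2SigmaM)
                   - bigB / 6 * cos2SigmaM * (-3 + 4 * sinSigma * sinSigma)
                     * (-3 + 4 * cos2SigmaM * cos2SigmaM)));
        return B * bigA * (sigma - deltaSigma);
    }
}

Since this runs entirely in-process, the 75,000 origin/destination combinations become a local computation instead of 75,000 API calls.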