Hi,
I have address data from a table, filtered on address_type: the 'origin' rows feed the main flow and the 'destination' rows feed the lookup table.
I now need to calculate the distance between each origin and destination address,
so I am using a Cartesian join: no expression key, with the join model set to 'All rows' and a left outer join,
then passing each origin/destination pair to the Distance Matrix API.
I have 25 origins and 3,000 destinations, so after the Cartesian join there are 75,000 rows, which means 75,000 API requests. This flow doesn't work: with that many rows the job runs for hours and eventually fails. Is there a way to fix this?
Lookup
  |
main --- tMap --- tFlowToIterate --- tREST --- tExtractJSONFields --- tMap --- tDBOutput
Hi
You can try activating the 'Enable parallel execution' option on the Iterate connector. Too many interactions with the API may cause timeout errors, and the API may also have a limit on the maximum number of calls.
Regards
Shong
Hi Shong, I have modified my job a bit and tried parallel execution with 10000 as the number of parallel executions. It increased the speed of the API calls, but now the job fails at tDBOutput with an error stating it was aborted because the number of waiters exceeds the 20 statement limit.
tDBInput ---> Iterate (x10000) ---> tREST --- tExtractJSONFields --- tMap --- tDBOutput
You set a very big number; usually we set it to 2 or 4.
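Outside Talend, the same idea, a small fixed degree of parallelism, can be sketched in plain Java with a bounded thread pool. This is only an illustration of why 2 or 4 workers is usually enough; the `fetch` method is a hypothetical stand-in for the tREST call, not part of the job:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThrottledCalls {

    // Hypothetical stand-in for the tREST component; replace with a real HTTP request.
    static String fetch(String url) {
        return "response-for-" + url;
    }

    // Runs all calls with a small, fixed parallelism (e.g. 2 or 4),
    // mirroring the Iterate connector's 'number of parallel executions' setting.
    public static List<String> callAll(List<String> urls, int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String url : urls) {
                futures.add(pool.submit(() -> fetch(url)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                try {
                    results.add(f.get()); // blocks until this call finishes
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

A pool far larger than the downstream database can absorb (like 10000) just moves the bottleneck to the DB connection, which is exactly the "20 statement limit" error above.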
Okay, I set it to 5, but no luck. The job still takes more than 2 hours to finish.
Both data processing and data transmission take time; after all, you have a lot of rows that need to be processed iteratively and written to a cloud DB. I have no opinion on how much time is "correct", as long as it is within an acceptable range. I think you need to double-check whether you really need a left outer join without keys, and whether you can filter the rows before calling the API.
I removed the left outer join from Talend and am doing the Cartesian join in the SQL itself. Is there a way I can pass 100 rows per API request?
This is the request
context.url+context.key+"&destinations="+((String)globalMap.get("row2.Destination"))+"&origins="+((String)globalMap.get("row2.Origin"))+"&units=imperial"
tDBInput ---> Iterate (x10) ---> tREST --- tExtractJSONFields --- tMap --- tDBOutput
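On batching: the Google Distance Matrix API does accept multiple destinations in a single request, separated by `|` (subject to per-request limits, commonly 25 origins or destinations and 100 elements per request; check the current quota documentation). A minimal sketch of building the batched, URL-encoded `destinations` parameter; the helper name is illustrative, not an existing API:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class BatchedRequest {

    // Joins a batch of destination addresses into one pipe-separated,
    // URL-encoded parameter value, so a single API call covers many pairs.
    public static String destinationsParam(List<String> destinations) {
        String joined = String.join("|", destinations);
        return URLEncoder.encode(joined, StandardCharsets.UTF_8);
    }
}
```

With 3,000 destinations batched 25 at a time, each of the 25 origins needs 120 requests instead of 3,000, i.e. 3,000 requests in total instead of 75,000.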
The latency here is caused by the individual API calls. API calls are slow compared to a process running entirely on your machine/server. You say 75,000 address combinations take more than 2 hours; even if that were 3 hours, it would be just under 7 address combinations per second.
You may need to think about an alternative way of calculating these distances. If you are simply calculating the distances as the crow flies (not using roads), there are algorithms you can use to calculate the straight line distances between GPS coordinates. That would be MUCH faster, but you would first need to get the GPS coordinates of your addresses. But, to do that you would only need 3025 API calls to collect the GPS coordinates.
Yes, I already used the Geocoding API and got the GPS coordinates.
@Since 1995 OK, take a look at this article to see if this is what you need.....
https://en.wikipedia.org/wiki/Vincenty%27s_formulae
I have used this and it is pretty easy to implement. Here is a Java implementation of this.....
https://stackoverflow.com/a/9822531
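For reference, if straight-line distance is good enough, an even simpler alternative to Vincenty is the haversine formula, which assumes a spherical Earth and is typically accurate to within about 0.5%. A self-contained sketch (class and method names are illustrative):

```java
public class GeoDistance {

    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    // Great-circle distance in km between two lat/lon points (degrees),
    // using the haversine formula.
    public static double haversineKm(double lat1, double lon1,
                                     double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }
}
```

Running this over all 75,000 coordinate pairs is pure in-memory arithmetic, so it finishes in well under a second, compared to hours of API calls.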