Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
krisjanis
Contributor II
Contributor II

Calculate distances between many points (large data)

Hi,

Basically I have all functions two calculate distances between two data points, but I have a problem because I have large data set and my server get 100% cpu and 100% Ram because of large data set.

There are 2 two tables:

1st is member_addresses table, where are 2 millions unique rows:

MemberId, Lat, lng

10001, 56.95, 24.25

10002, 56.96, 24.24

etc.


2nd table is store location addresses, where are 20'000 unique stores


Storeid, lat, lng

5001, 60.12, 24.86

5002, 60.13, 24.85



I need to find out for each member nearest 5 stores, when I try to join both table together, it creates 2'000'000* 20*000 rows, and it freezes my server. I know, it's not efficient, but can you help to suggest which will be the best way to calculate it?



1 Solution

Accepted Solutions
kamal_sanguri
Specialist
Specialist

Hi,

You may have to apply some effort, pls refer this qlikview doc.

You can try with grouping of either Long or Lat that will help you to avoid unwanted join.

EDIT:

So the idea is to divide Lon or Lat in group and assign them a key, do it for both the tables (members & stores). Don't join Qlik will make an association itself.

View solution in original post

7 Replies
Patric_Nordstrom
Employee
Employee

Hi,

Split up the request, send perhaps 10 000 members at the time.

Thanks,

Patric

kamal_sanguri
Specialist
Specialist

Hi,

You may have to apply some effort, pls refer this qlikview doc.

You can try with grouping of either Long or Lat that will help you to avoid unwanted join.

EDIT:

So the idea is to divide Lon or Lat in group and assign them a key, do it for both the tables (members & stores). Don't join Qlik will make an association itself.

krisjanis
Contributor II
Contributor II
Author

That's possible, but I am thinking about to create loop.

Taking just one member and running through each store table and then find 5 the nearest stores and drop other 19k rows.

Patric_Nordstrom
Employee
Employee

You can write your own code or just use the loop in the code that the connector produces.

Move the header for the inline table and move server call inside of the "chunk" iteration.

Thanks,

Patric

krisjanis
Contributor II
Contributor II
Author

Thanks.

I tried 3 different loads.

2 with loops and your solution. Seems your solutions is the fastest. 6x faster than loop.

kamal_sanguri
Specialist
Specialist

Loops with big data sets are never good. Glad you could resolve your issue.

krisjanis
Contributor II
Contributor II
Author

Yes. Thank again !

Maybe you can help me with one more thing about loading only top 5 the nearest POS points.

I think code can be optimized.

Right now it is slowing down at table: Member_POS_stores_order because of autonumber and ordering.

first 1500

Actual_address:

LOAD distinct

  MembershipNo,

  MemberAddress,

     status,

     type,

     formatted_address,

     place_id,

     location_type, 

     [location/lat],

     [location/lng],

     If([location/lat]>=0 and [location/lat]<=10,'A',

  If([location/lat]>10 and [location/lat]<=20,'B',

  If([location/lat]>20 and [location/lat]<=30,'C',

  If([location/lat]>30 and [location/lat]<=40,'D',

  If([location/lat]>40 and [location/lat]<=50,'E',

  If([location/lat]>50 and [location/lat]<=60,'F','Z')))))) as key

    

FROM

[servername\QVD\All_members_geocoded.qvd]

(qvd)

where 1=1

and exists (Key,MembershipNo&'-'&MemberAddress)

and Match(status, 'OK');

inner join

LOAD

     id,

     name as POS_name,

     street_address,

     postal_code,

     city,

     country,

     latitude as POS_lat,

     longitude as POS_lng,

     brand_name,

     If(latitude>=0 and latitude<=10,'A',

  If(latitude>10 and latitude<=20,'B',

  If(latitude>20 and latitude<=30,'C',

  If(latitude>30 and latitude<=40,'D',

  If(latitude>40 and latitude<=50,'E',

  If(latitude>50 and latitude<=60,'F','Z')))))) as key

FROM

[servername\QVD\Partner_POS_Locator.qvd] (qvd);

Address_POS_distances:

Load

  MembershipNo as MembershipNo1,

  MemberAddress as MemberAddress1,

     status as status1,

     type as type1,

     formatted_address as formatted_address1,

     place_id as place_id1,

     location_type as location_type1, 

     [location/lat] as [location/lat1],

     [location/lng] as [location/lng1],

     id as id1,

     POS_name as POS_name1,

     street_address as street_address1,

     postal_code as postal_code1,

     city as city1,

     country as country1,

     POS_lat as POS_lat1,

     POS_lng as POS_lng1,

     brand_name as brand_name1,

     ACOS(SIN([location/lat]*Pi()/180)*SIN(POS_lat*Pi()/180)

     +COS([location/lat]*Pi()/180)*COS(POS_lat*Pi()/180)

     *COS((POS_lng*Pi()/180)-([location/lng]*Pi()/180)))

     *6371000 as [Distance (metres)]

     

resident Actual_address;

drop table Actual_address;

Member_POS_stores_order:

load

  MembershipNo1 as MembershipNo,

  MemberAddress1 as MemberAddress,

  [location/lat1] as [location/lat],

  [location/lng1] as [location/lng],

  id1 as id,

     POS_name1 as POS_name,

     street_address1 as street_address,

     postal_code1 as postal_code,

     city1 as city,

     country1 as country,

     POS_lat1 as POS_lat,

     POS_lng1 as POS_lng,

     brand_name1 as brand_name,

     [Distance (metres)],

     AutoNumber(MembershipNo1&id1&[Distance (metres)],MembershipNo1) as Rank

    

resident  Address_POS_distances

order by  [Distance (metres)] asc;

drop table Address_POS_distances;

Member_POS_Stores_Top5:

load

  MembershipNo,

  MemberAddress,

  [location/lat],

  [location/lng],

  id as POSid,

     POS_name,

     street_address,

     postal_code,

     city,

     country,

     POS_lat,

     POS_lng,

     brand_name,

     [Distance (metres)],

     Rank

resident Member_POS_stores_order

where Rank <=5;

drop table Member_POS_stores_order;