7 Replies Latest reply: May 29, 2017 6:28 AM by Krisjanis Berzs RSS

    Calculate distances between many points (large data)

    Krisjanis Berzs

      Hi,

       

      Basically I have all functions two calculate distances between two data points, but I have a problem because I have large data set and my server get 100% cpu and 100% Ram because of large data set.

       

      There are 2 two tables:

       

      1st is member_addresses table, where are 2 millions unique rows:

       

      MemberId, Lat, lng

      10001, 56.95, 24.25

      10002, 56.96, 24.24

      etc.


      2nd table is store location addresses, where are 20'000 unique stores


      Storeid, lat, lng

      5001, 60.12, 24.86

      5002, 60.13, 24.85



      I need to find out for each member nearest 5 stores, when I try to join both table together, it creates 2'000'000* 20*000 rows, and it freezes my server. I know, it's not efficient, but can you help to suggest which will be the best way to calculate it?



        • Re: Calculate distances between many points (large data)
          Patric Nordström

          Hi,

           

          Split up the request, send perhaps 10 000 members at the time.

           

          Thanks,

          Patric

            • Re: Calculate distances between many points (large data)
              Kamal Kumar Sanguri

              Hi,

               

              You may have to apply some effort, pls refer this qlikview doc.

               

              You can try with grouping of either Long or Lat that will help you to avoid unwanted join.

               

              EDIT:

              So the idea is to divide Lon or Lat in group and assign them a key, do it for both the tables (members & stores). Don't join Qlik will make an association itself.

                • Re: Calculate distances between many points (large data)
                  Krisjanis Berzs

                  Thanks.

                   

                  I tried 3 different loads.

                  2 with loops and your solution. Seems your solutions is the fastest. 6x faster than loop.

                    • Re: Calculate distances between many points (large data)
                      Kamal Kumar Sanguri

                      Loops with big data sets are never good. Glad you could resolve your issue.

                        • Re: Calculate distances between many points (large data)
                          Krisjanis Berzs

                          Yes. Thank again !

                           

                          Maybe you can help me with one more thing about loading only top 5 the nearest POS points.

                          I think code can be optimized.

                           

                          Right now it is slowing down at table: Member_POS_stores_order because of autonumber and ordering.

                           

                          first 1500

                          Actual_address:

                          LOAD distinct

                            MembershipNo,

                            MemberAddress,

                               status,

                               type,

                               formatted_address,

                               place_id,

                               location_type, 

                               [location/lat],

                               [location/lng],

                               If([location/lat]>=0 and [location/lat]<=10,'A',

                            If([location/lat]>10 and [location/lat]<=20,'B',

                            If([location/lat]>20 and [location/lat]<=30,'C',

                            If([location/lat]>30 and [location/lat]<=40,'D',

                            If([location/lat]>40 and [location/lat]<=50,'E',

                            If([location/lat]>50 and [location/lat]<=60,'F','Z')))))) as key

                           

                           

                              

                          FROM

                          [servername\QVD\All_members_geocoded.qvd]

                          (qvd)

                          where 1=1

                          and exists (Key,MembershipNo&'-'&MemberAddress)

                          and Match(status, 'OK');

                           

                          inner join

                          LOAD

                               id,

                               name as POS_name,

                               street_address,

                               postal_code,

                               city,

                               country,

                               latitude as POS_lat,

                               longitude as POS_lng,

                               brand_name,

                               If(latitude>=0 and latitude<=10,'A',

                            If(latitude>10 and latitude<=20,'B',

                            If(latitude>20 and latitude<=30,'C',

                            If(latitude>30 and latitude<=40,'D',

                            If(latitude>40 and latitude<=50,'E',

                            If(latitude>50 and latitude<=60,'F','Z')))))) as key

                           

                           

                          FROM

                          [servername\QVD\Partner_POS_Locator.qvd] (qvd);

                           

                          Address_POS_distances:

                          Load

                            MembershipNo as MembershipNo1,

                            MemberAddress as MemberAddress1,

                               status as status1,

                               type as type1,

                               formatted_address as formatted_address1,

                               place_id as place_id1,

                               location_type as location_type1, 

                               [location/lat] as [location/lat1],

                               [location/lng] as [location/lng1],

                               id as id1,

                               POS_name as POS_name1,

                               street_address as street_address1,

                               postal_code as postal_code1,

                               city as city1,

                               country as country1,

                               POS_lat as POS_lat1,

                               POS_lng as POS_lng1,

                               brand_name as brand_name1,

                               ACOS(SIN([location/lat]*Pi()/180)*SIN(POS_lat*Pi()/180)

                               +COS([location/lat]*Pi()/180)*COS(POS_lat*Pi()/180)

                               *COS((POS_lng*Pi()/180)-([location/lng]*Pi()/180)))

                               *6371000 as [Distance (metres)]

                               

                          resident Actual_address;

                          drop table Actual_address;

                           

                           

                           

                          Member_POS_stores_order:

                          load

                            MembershipNo1 as MembershipNo,

                            MemberAddress1 as MemberAddress,

                            [location/lat1] as [location/lat],

                            [location/lng1] as [location/lng],

                            id1 as id,

                               POS_name1 as POS_name,

                               street_address1 as street_address,

                               postal_code1 as postal_code,

                               city1 as city,

                               country1 as country,

                               POS_lat1 as POS_lat,

                               POS_lng1 as POS_lng,

                               brand_name1 as brand_name,

                               [Distance (metres)],

                               AutoNumber(MembershipNo1&id1&[Distance (metres)],MembershipNo1) as Rank

                              

                          resident  Address_POS_distances

                          order by  [Distance (metres)] asc;

                          drop table Address_POS_distances;

                           

                           

                          Member_POS_Stores_Top5:

                          load

                            MembershipNo,

                            MemberAddress,

                            [location/lat],

                            [location/lng],

                            id as POSid,

                               POS_name,

                               street_address,

                               postal_code,

                               city,

                               country,

                               POS_lat,

                               POS_lng,

                               brand_name,

                               [Distance (metres)],

                               Rank

                          resident Member_POS_stores_order

                          where Rank <=5;

                          drop table Member_POS_stores_order;

                    • Re: Calculate distances between many points (large data)
                      Krisjanis Berzs

                      That's possible, but I am thinking about to create loop.

                      Taking just one member and running through each store table and then find 5 the nearest stores and drop other 19k rows.