Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
Basically I have all functions two calculate distances between two data points, but I have a problem because I have large data set and my server get 100% cpu and 100% Ram because of large data set.
There are 2 two tables:
1st is member_addresses table, where are 2 millions unique rows:
MemberId, Lat, lng
10001, 56.95, 24.25
10002, 56.96, 24.24
etc.
2nd table is store location addresses, where are 20'000 unique stores
Storeid, lat, lng
5001, 60.12, 24.86
5002, 60.13, 24.85
I need to find out for each member nearest 5 stores, when I try to join both table together, it creates 2'000'000* 20*000 rows, and it freezes my server. I know, it's not efficient, but can you help to suggest which will be the best way to calculate it?
Hi,
You may have to apply some effort, pls refer this qlikview doc.
You can try with grouping of either Long or Lat that will help you to avoid unwanted join.
EDIT:
So the idea is to divide Lon or Lat in group and assign them a key, do it for both the tables (members & stores). Don't join Qlik will make an association itself.
Hi,
Split up the request, send perhaps 10 000 members at the time.
Thanks,
Patric
Hi,
You may have to apply some effort, pls refer this qlikview doc.
You can try with grouping of either Long or Lat that will help you to avoid unwanted join.
EDIT:
So the idea is to divide Lon or Lat in group and assign them a key, do it for both the tables (members & stores). Don't join Qlik will make an association itself.
That's possible, but I am thinking about to create loop.
Taking just one member and running through each store table and then find 5 the nearest stores and drop other 19k rows.
You can write your own code or just use the loop in the code that the connector produces.
Move the header for the inline table and move server call inside of the "chunk" iteration.
Thanks,
Patric
Thanks.
I tried 3 different loads.
2 with loops and your solution. Seems your solutions is the fastest. 6x faster than loop.
Loops with big data sets are never good. Glad you could resolve your issue.
Yes. Thank again !
Maybe you can help me with one more thing about loading only top 5 the nearest POS points.
I think code can be optimized.
Right now it is slowing down at table: Member_POS_stores_order because of autonumber and ordering.
first 1500
Actual_address:
LOAD distinct
MembershipNo,
MemberAddress,
status,
type,
formatted_address,
place_id,
location_type,
[location/lat],
[location/lng],
If([location/lat]>=0 and [location/lat]<=10,'A',
If([location/lat]>10 and [location/lat]<=20,'B',
If([location/lat]>20 and [location/lat]<=30,'C',
If([location/lat]>30 and [location/lat]<=40,'D',
If([location/lat]>40 and [location/lat]<=50,'E',
If([location/lat]>50 and [location/lat]<=60,'F','Z')))))) as key
FROM
[servername\QVD\All_members_geocoded.qvd]
(qvd)
where 1=1
and exists (Key,MembershipNo&'-'&MemberAddress)
and Match(status, 'OK');
inner join
LOAD
id,
name as POS_name,
street_address,
postal_code,
city,
country,
latitude as POS_lat,
longitude as POS_lng,
brand_name,
If(latitude>=0 and latitude<=10,'A',
If(latitude>10 and latitude<=20,'B',
If(latitude>20 and latitude<=30,'C',
If(latitude>30 and latitude<=40,'D',
If(latitude>40 and latitude<=50,'E',
If(latitude>50 and latitude<=60,'F','Z')))))) as key
FROM
[servername\QVD\Partner_POS_Locator.qvd] (qvd);
Address_POS_distances:
Load
MembershipNo as MembershipNo1,
MemberAddress as MemberAddress1,
status as status1,
type as type1,
formatted_address as formatted_address1,
place_id as place_id1,
location_type as location_type1,
[location/lat] as [location/lat1],
[location/lng] as [location/lng1],
id as id1,
POS_name as POS_name1,
street_address as street_address1,
postal_code as postal_code1,
city as city1,
country as country1,
POS_lat as POS_lat1,
POS_lng as POS_lng1,
brand_name as brand_name1,
ACOS(SIN([location/lat]*Pi()/180)*SIN(POS_lat*Pi()/180)
+COS([location/lat]*Pi()/180)*COS(POS_lat*Pi()/180)
*COS((POS_lng*Pi()/180)-([location/lng]*Pi()/180)))
*6371000 as [Distance (metres)]
resident Actual_address;
drop table Actual_address;
Member_POS_stores_order:
load
MembershipNo1 as MembershipNo,
MemberAddress1 as MemberAddress,
[location/lat1] as [location/lat],
[location/lng1] as [location/lng],
id1 as id,
POS_name1 as POS_name,
street_address1 as street_address,
postal_code1 as postal_code,
city1 as city,
country1 as country,
POS_lat1 as POS_lat,
POS_lng1 as POS_lng,
brand_name1 as brand_name,
[Distance (metres)],
AutoNumber(MembershipNo1&id1&[Distance (metres)],MembershipNo1) as Rank
resident Address_POS_distances
order by [Distance (metres)] asc;
drop table Address_POS_distances;
Member_POS_Stores_Top5:
load
MembershipNo,
MemberAddress,
[location/lat],
[location/lng],
id as POSid,
POS_name,
street_address,
postal_code,
city,
country,
POS_lat,
POS_lng,
brand_name,
[Distance (metres)],
Rank
resident Member_POS_stores_order
where Rank <=5;
drop table Member_POS_stores_order;