Looping in load script
I have the following data:
Customerid | Age | Income (k) | Purchased |
1 | 45 | 46 | Book |
2 | 39 | 100 | TV |
3 | 35 | 38 | DVD |
4 | 69 | 150 | Car Cover |
5 | 58 | 51 | CD player |
What I'd like to do is find the nearest neighbor, meaning which customer IDs are relatively close to each other.
The formula I'm using is a Euclidean distance in which each attribute is scaled by its range (max minus min):
SQRT( ((Age(X) - Age(Y)) / (MAX(Age) - MIN(Age)))^2 + ((Income(X) - Income(Y)) / (MAX(Income) - MIN(Income)))^2 )
where X and Y are the two customers being compared.
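To check the formula outside Qlik, here is a minimal Python sketch of the same range-scaled distance, using the five sample rows from the table. The names `CUSTOMERS` and `normalized_distance` are hypothetical and only for illustration; this is not Qlik script.

```python
import math

# Sample rows from the question: customer id -> (age, income in thousands).
CUSTOMERS = {
    1: (45, 46),
    2: (39, 100),
    3: (35, 38),
    4: (69, 150),
    5: (58, 51),
}


def normalized_distance(x, y, age_range, income_range):
    """Euclidean distance after dividing each difference by that attribute's (max - min) range."""
    age_x, income_x = x
    age_y, income_y = y
    d_age = (age_x - age_y) / age_range
    d_income = (income_x - income_y) / income_range
    return math.sqrt(d_age ** 2 + d_income ** 2)


ages = [age for age, _ in CUSTOMERS.values()]
incomes = [income for _, income in CUSTOMERS.values()]
age_range = max(ages) - min(ages)            # 69 - 35 = 34
income_range = max(incomes) - min(incomes)   # 150 - 38 = 112

score = normalized_distance(CUSTOMERS[5], CUSTOMERS[1], age_range, income_range)
print(round(score, 5))
```

Comparing customer 5 with customer 1 this way gives about 0.38495, matching the hand calculation in the example. If every row had the same age or income, the range would be zero and the division would fail, so real data may need a guard for that case.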
What I'd like to do is run this in a loop in the load script and get the nearest neighbor for each customer ID.
My expected output has these columns:
Customer, Neighbor, Score
For example, for customer 5:
SQRT((((58 - 45)/(69-35))^2) + ((51 - 46)/(150-38))^2 ) = 0.38495
Customer = 5
Neighbor = 1
Score = 0.38495
Checking customer 5 against the other customers results in higher scores, so in the end I need the minimum score for each customer.
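The loop described above can be sketched as a brute-force Python function that scores every customer against every other customer and keeps the pair with the lowest score. This is an illustration, not the script in the attached qvw (which isn't shown here); `nearest_neighbors` is a hypothetical name. In a Qlik load script, one common way to get every pair is to join the table to a renamed copy of itself with no common field, which produces a Cartesian product, and then keep the minimum score per customer. That approach is an assumption and is not taken from this thread.

```python
import math

# Sample rows from the question: customer id -> (age, income in thousands).
CUSTOMERS = {
    1: (45, 46),
    2: (39, 100),
    3: (35, 38),
    4: (69, 150),
    5: (58, 51),
}


def nearest_neighbors(customers):
    """Return one (customer, neighbor, score) row per customer.

    Brute force: each customer is scored against every other customer
    with the range-scaled distance, and only the lowest score is kept.
    """
    ages = [age for age, _ in customers.values()]
    incomes = [income for _, income in customers.values()]
    age_range = max(ages) - min(ages)
    income_range = max(incomes) - min(incomes)

    rows = []
    for cid, (age_x, income_x) in customers.items():
        best_id, best_score = None, math.inf
        for other_id, (age_y, income_y) in customers.items():
            if other_id == cid:
                continue  # a customer is not its own neighbor
            score = math.sqrt(
                ((age_x - age_y) / age_range) ** 2
                + ((income_x - income_y) / income_range) ** 2
            )
            if score < best_score:  # strict '<' keeps the first neighbor found on ties
                best_id, best_score = other_id, score
        rows.append((cid, best_id, best_score))
    return rows


for customer, neighbor, score in nearest_neighbors(CUSTOMERS):
    print(customer, neighbor, round(score, 5))
```

For customer 5 this returns neighbor 1 with a score of about 0.38495. With n customers the function does n × (n − 1) distance calculations, so the cost grows quadratically as the table grows.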
Thanks for your help,
Tomer
See attached qvw. I don't think this is suitable for a very large number of records, because every customer has to be compared with every other customer. You may want to use a specialized tool for this kind of analysis, perhaps R with the RWeka package.
Thanks, it's very helpful.
I will give it a try with Weka as well.
Cheers,
Tomer