Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi, I have a table with two fields Address1 and Address2 and i need to use a wildmatch, word by word like a cartesian produt to know how many words in Address2 exists in Address1.
Ex:
Input:
Address1 | Address2 |
AV+AMERICAS+1700+BARRA+DA+TIJUCA+RIO+DE+JANEIRO | Av.+das+Américas+1650+Bloco+4+Loja+119+Barra+da+Tijuca+Rio+de+Janeiro |
AV+AMERICAS+1700+BARRA+DA+TIJUCA+RIO+DE+JANEIRO | Av+das+Americas+1650+Blc+1+Lj+102+Barra+da+Tijuca+Rio+de+Janeiro |
AV+AMERICAS+1700+BARRA+DA+TIJUCA+RIO+DE+JANEIRO | Avenida+das+Américas+1600+Barra+da+Tijuca+Rio+de+Janeiro |
AV+AMERICAS+1700+BARRA+DA+TIJUCA+RIO+DE+JANEIRO | Avenida+das+Américas+1650+Barra+da+Tijuca+Rio+de+Janeiro |
AV+AMERICAS+1700+BARRA+DA+TIJUCA+RIO+DE+JANEIRO | Avenida+das+Américas+1650+Bloco+04/+Loja+118+Recreio+dos+Bandeirantes+Rio+de+Janeiro |
AV+AMERICAS+1700+BARRA+DA+TIJUCA+RIO+DE+JANEIRO | Avenida+das+Américas+1690+Barra+da+Tijuca+Rio+de+Janeiro |
Output
Address1 | Num_WildMatch |
AV+AMERICAS+1700+BARRA+DA+TIJUCA+RIO+DE+JANEIRO | 7 |
AV+AMERICAS+1700+BARRA+DA+TIJUCA+RIO+DE+JANEIRO | 8 |
AV+AMERICAS+1700+BARRA+DA+TIJUCA+RIO+DE+JANEIRO | 6 |
AV+AMERICAS+1700+BARRA+DA+TIJUCA+RIO+DE+JANEIRO | 6 |
AV+AMERICAS+1700+BARRA+DA+TIJUCA+RIO+DE+JANEIRO | 4 |
AV+AMERICAS+1700+BARRA+DA+TIJUCA+RIO+DE+JANEIRO | 8 |
Hello Eduardo, you could start with something like the following:
[ADDRESSES]:
LOAD AutoNumberHash128(Address1, Address2) AS COMBO_ID, Upper(Address1) AS Address1, Upper(Address2) AS Address2 INLINE [
Address1, Address2
AV+AMERICAS+1700+BARRA+DA+TIJUCA+RIO+DE+JANEIRO, Av.+das+Américas+1650+Bloco+4+Loja+119+Barra+da+Tijuca+Rio+de+Janeiro
AV+AMERICAS+1700+BARRA+DA+TIJUCA+RIO+DE+JANEIRO, Av+das+Americas+1650+Blc+1+Lj+102+Barra+da+Tijuca+Rio+de+Janeiro
AV+AMERICAS+1700+BARRA+DA+TIJUCA+RIO+DE+JANEIRO, Avenida+das+Américas+1600+Barra+da+Tijuca+Rio+de+Janeiro
AV+AMERICAS+1700+BARRA+DA+TIJUCA+RIO+DE+JANEIRO, Avenida+das+Américas+1650+Barra+da+Tijuca+Rio+de+Janeiro
AV+AMERICAS+1700+BARRA+DA+TIJUCA+RIO+DE+JANEIRO, Avenida+das+Américas+1650+Bloco+04/+Loja+118+Recreio+dos+Bandeirantes+Rio+de+Janeiro
AV+AMERICAS+1700+BARRA+DA+TIJUCA+RIO+DE+JANEIRO, Avenida+das+Américas+1690+Barra+da+Tijuca
];
[SUBCOMPONENTS]:
LOAD COMBO_ID, Subfield(Address1 & '+' & Address2,'+') AS SUBCOMPONENTS RESIDENT ADDRESSES;
[COUNT_SUBCOMPONENTS]:
LOAD DISTINCT COMBO_ID, SUBCOMPONENTS, Count(SUBCOMPONENTS) AS COUNT RESIDENT [SUBCOMPONENTS] GROUP BY COMBO_ID, SUBCOMPONENTS;
[ISOLATE_REPEATERS]:
LOAD COMBO_ID, Count(DISTINCT SUBCOMPONENTS) AS SUBCOMPONENT_MATCHES RESIDENT [COUNT_SUBCOMPONENTS] WHERE COUNT > 1 GROUP BY COMBO_ID;
Hello Eduardo, you could start with something like the following:
[ADDRESSES]:
LOAD AutoNumberHash128(Address1, Address2) AS COMBO_ID, Upper(Address1) AS Address1, Upper(Address2) AS Address2 INLINE [
Address1, Address2
AV+AMERICAS+1700+BARRA+DA+TIJUCA+RIO+DE+JANEIRO, Av.+das+Américas+1650+Bloco+4+Loja+119+Barra+da+Tijuca+Rio+de+Janeiro
AV+AMERICAS+1700+BARRA+DA+TIJUCA+RIO+DE+JANEIRO, Av+das+Americas+1650+Blc+1+Lj+102+Barra+da+Tijuca+Rio+de+Janeiro
AV+AMERICAS+1700+BARRA+DA+TIJUCA+RIO+DE+JANEIRO, Avenida+das+Américas+1600+Barra+da+Tijuca+Rio+de+Janeiro
AV+AMERICAS+1700+BARRA+DA+TIJUCA+RIO+DE+JANEIRO, Avenida+das+Américas+1650+Barra+da+Tijuca+Rio+de+Janeiro
AV+AMERICAS+1700+BARRA+DA+TIJUCA+RIO+DE+JANEIRO, Avenida+das+Américas+1650+Bloco+04/+Loja+118+Recreio+dos+Bandeirantes+Rio+de+Janeiro
AV+AMERICAS+1700+BARRA+DA+TIJUCA+RIO+DE+JANEIRO, Avenida+das+Américas+1690+Barra+da+Tijuca
];
[SUBCOMPONENTS]:
LOAD COMBO_ID, Subfield(Address1 & '+' & Address2,'+') AS SUBCOMPONENTS RESIDENT ADDRESSES;
[COUNT_SUBCOMPONENTS]:
LOAD DISTINCT COMBO_ID, SUBCOMPONENTS, Count(SUBCOMPONENTS) AS COUNT RESIDENT [SUBCOMPONENTS] GROUP BY COMBO_ID, SUBCOMPONENTS;
[ISOLATE_REPEATERS]:
LOAD COMBO_ID, Count(DISTINCT SUBCOMPONENTS) AS SUBCOMPONENT_MATCHES RESIDENT [COUNT_SUBCOMPONENTS] WHERE COUNT > 1 GROUP BY COMBO_ID;
Hey Evan, thank you for your time.
Your solution doesnt work like i needed but was very clever the way that you splited word by word after concatenate the two strings, but with that i can easely use a wildmatch.
I saw that you used a AutoNumberHash128 and i'm sorry for that extra effort, cause i already have a primary key, i just din't wrote cause i thought thats no use here (and i was wrong).
I'll end the code e post here, but the right answer is your. Thank you
Here the final code:
NoConcatenate Aux: load CNPJ, UPPER(EnderecoCompleto) AS Address1, UPPER(End) AS Address2 Resident PlaceId; [SUBCOMPONENTS]: LOAD CNPJ, Address1, Address2, Subfield(Address2,'+') AS SUBCOMPONENTS RESIDENT Aux; [COUNT_SUBCOMPONENTS]: LOAD DISTINCT CNPJ, Address1, Address2, SUBCOMPONENTS, WildMatch(Address1,'*'&SUBCOMPONENTS&'*') AS Match RESIDENT [SUBCOMPONENTS] ; [COUNT_SUBCOMP]: LOAD DISTINCT CNPJ, Address1, Address2, Count(SUBCOMPONENTS) AS COUNT RESIDENT [SUBCOMPONENTS] GROUP BY CNPJ, Address1,Address2; DROP TABLE Aux,SUBCOMPONENTS; [RESULT]: LOAD DISTINCT CNPJ, Address1, Address2, sum(Match) AS Macth RESIDENT [COUNT_SUBCOMPONENTS] Group by CNPJ, Address1, Address2, ; Left Join(RESULT) LOAD DISTINCT CNPJ, Address1, Address2, COUNT Resident [COUNT_SUBCOMP]; DROP TABLE [COUNT_SUBCOMPONENTS],COUNT_SUBCOMP; [SCORE]: LOAD DISTINCT CNPJ, Address1, Address2, Macth/COUNT as Score RESIDENT [RESULT]; DROP TABLE RESULT;
And Output
Now with just a Min() function i can found the likely Address