Skip to main content
Announcements
Global Transformation Awards! Applications are now open. Submit Entry
cancel
Showing results for 
Search instead for 
Did you mean: 
martinholm
Contributor
Contributor

Search for similar field values, replace and remove duplicates

Hello

I'm working on a project where categories have been saved with similar names multiple times. An issue which should have been fixed at an earlier stage, but cannot be done.

here's an example of values that should all be named "Solumstrand"

clipboard_image_0.png

As there is quite a lot of these mistakes, i would like to give a column of correct names which should be (wild?)matched with the faulty names and replace them.

I have attempted like this for joining and i suppose i can remove the faulty names in a new load statement

//Names i want to correct to
Prosjekttabell:
Load Prosjektnavn;
LOAD Prosjektnavn INLINE [
Prosjektnavn
'Solumstrand',
'Muusøya',
'Kim Karlsen'
'Sande',
'Svelvik'
'Elvika',
'Lier',
'Linnes',
'Mjøndalen',
'Vidar Hansen',
'Åsgårdstrand'
];

//I want to search Prosjekt for substrings found in Prosjekttabell and join the fitting name to all the correct rows
join (Prosjekttabell)
Load distinct PROSJEKT_ID,
PROSJEKT_NR,
PROSJEKT
Resident 'Drammen_test'
where WildMatch(PROSJEKT, [Prosjektnavn])= [Prosjektnavn];

I get the error message "Cannot find the field name Prosjektnavn"

Thanks in advance!

Labels (1)
1 Solution

Accepted Solutions
JGMDataAnalysis
Creator III
Creator III

A slightly shorter version...

ProsjektnavnMap:
MAPPING LOAD Lower(Prosjektnavn), '<' & Prosjektnavn & '>'
INLINE [
        Prosjektnavn
        Solumstrand
        Muusøya
        Kim Karlsen
        Sande
        Svelvik
        Elvika
        Lier
        Linnes
        Mjøndalen
        Vidar Hansen
        Åsgårdstrand
];

Test:
NOCONCATENATE
LOAD * WHERE Len(Trim(newPROSJEKT)); // Comment this line to load all rows.
LOAD DISTINCT
    PROSJEKT_ID,
    PROSJEKT_NR,
    PROSJEKT AS oldPROSJEKT,
    TextBetween(
    	MapSubstring('ProsjektnavnMap', Lower(PROSJEKT)), '<', '>'
    ) AS newPROSJEKT
RESIDENT Drammen_test;

View solution in original post

4 Replies
Brett_Bleess
Former Employee
Former Employee

Have a look at the following Design Blog post, that may be a better way to go at things potentially.

https://community.qlik.com/t5/Qlik-Design-Blog/Mapping-and-not-the-geographical-kind/ba-p/1463192

If you want to search for other blog posts, use the following link:

https://community.qlik.com/t5/Qlik-Design-Blog/bg-p/qlik-design-blog

Regards,
Brett

To help users find verified answers, please do not forget to use the "Accept as Solution" button on any post(s) that helped you resolve your problem or question.
I now work a compressed schedule, Tuesday, Wednesday and Thursday, so those will be the days I will reply to any follow-up posts.
JGMDataAnalysis
Creator III
Creator III

What criteria do you use to classify "duplicates"?

Try this script and tell me about it...

Prosjekttabell:
NOCONCATENATE
LOAD * INLINE [
    Prosjektnavn
    Solumstrand
    Muusøya
    Kim Karlsen
    Sande
    Svelvik
    Elvika
    Lier
    Linnes
    Mjøndalen
    Vidar Hansen
    Åsgårdstrand
];

ProsjektnavnMap:
MAPPING LOAD Lower(Prosjektnavn), '<' & Prosjektnavn & '>'
RESIDENT Prosjekttabell;

Test:
NOCONCATENATE
LOAD * WHERE Len(Trim(newPROSJEKT)); // Comment this line to load all rows.
LOAD DISTINCT
    PROSJEKT_ID,
    PROSJEKT_NR,
    PROSJEKT AS oldPROSJEKT,
    TextBetween(
    	MapSubstring('ProsjektnavnMap', Lower(PROSJEKT)), '<', '>'
    ) AS newPROSJEKT
RESIDENT Drammen_test;

DROP TABLE Prosjekttabell;

 

JGMDataAnalysis
Creator III
Creator III

A slightly shorter version...

ProsjektnavnMap:
MAPPING LOAD Lower(Prosjektnavn), '<' & Prosjektnavn & '>'
INLINE [
        Prosjektnavn
        Solumstrand
        Muusøya
        Kim Karlsen
        Sande
        Svelvik
        Elvika
        Lier
        Linnes
        Mjøndalen
        Vidar Hansen
        Åsgårdstrand
];

Test:
NOCONCATENATE
LOAD * WHERE Len(Trim(newPROSJEKT)); // Comment this line to load all rows.
LOAD DISTINCT
    PROSJEKT_ID,
    PROSJEKT_NR,
    PROSJEKT AS oldPROSJEKT,
    TextBetween(
    	MapSubstring('ProsjektnavnMap', Lower(PROSJEKT)), '<', '>'
    ) AS newPROSJEKT
RESIDENT Drammen_test;
martinholm
Contributor
Contributor
Author

Hello! Sorry for late answer.

This works like a charm, thanks a lot!