Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Qlik Open Lakehouse is Now Generally Available! Discover the key highlights and partner resources here.
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Data cleansing on existing table

Dear All,

I've got a problem with creating new data based on existing.

There is a Excel file with just two columns (No and Name):

No, Name
1, Entre-
2, os-Rios
3, Albergavia-
4, a-Velha
5, Albergavia-
6, a-Novo
7, Riba-
8, mandego
9, Puebla de
10, Sanabria
11,
12, ert2
13, 1234
14, Vale
15, de Canba
16, Varzea
17, Cova
18, PORTO
19, FARO
20, Moledo do Minho

The result should look like..

Nr, Name
1, Entre-os-Rios
2, Albergavia-a-Velha
3, Albergavia-a-Novo
4, Ribamandego
5, Puebla de Sanabria
6, Vale de Canba
7, Varzea Cova
8, Porto
9, Faro
10, Moledo do Minho

Any idea how to accomplish this in QV?

Thank you.

Regards,

Nick

1 Solution

Accepted Solutions
Not applicable
Author

Hi Nick,

Obviously it is better to organize the source data betterGeeked, but since it is not a perfect world ....

Maybe my suggestion will help you with your problem.

The steps I took: 1: Determine where a new name (could) starts and calculate a potential namegroup index

2; group the table on the namegroup and concatenate the name-strings

3: check against an existing table if the names are valid

In the script it looks like:

Temp1:

Load * inline

[x, NameString

1, Entre-

2, os-Rios

3, Albergavia-

4, a-Velha

5, Albergavia-

6, a-Novo

7, Riba-

8, mandego

9, Puebla de

10, Sanabria

11,

12, ert2

13, 1234

14, Vale

15, de Canba

16, Varzea

17, Cova

18, PORTO

19, FARO

20, Moledo do Minho];

// Start a new namegroup for all records that start with an Uppercase character.

// this could be any condition of course

Temp2:

Load *

, NameString & spacing as NameString2;

Load *

, if (Upper(Left(NameString,1)) = Left(NameString,1)

, rangesum(1,peek('namegroup'))

, peek('namegroup') ) as namegroup

, if (right(NameString,1)='-'

, ''

, ' ') as spacing

RESIDENT Temp1

where IsText(NameString);

// group and concat the string

Temp3:

Load namegroup

, Concat(NameString2) as newName

resident Temp2

group by namegroup;

// capitalize if the second character is in uppercase.

// check against another table (if applicable)

Temp4:

Load *

, exists(Name, TheName) as nameFound;

Load namegroup

, if (Upper(Mid(newName,2,1)) = Mid(newName,2,1)

, Capitalize(newName)

, newName) as TheName

resident Temp3;

Good Luck,
Regards, Jan


View solution in original post

2 Replies
Not applicable
Author

Hi Nick,

Obviously it is better to organize the source data betterGeeked, but since it is not a perfect world ....

Maybe my suggestion will help you with your problem.

The steps I took: 1: Determine where a new name (could) starts and calculate a potential namegroup index

2; group the table on the namegroup and concatenate the name-strings

3: check against an existing table if the names are valid

In the script it looks like:

Temp1:

Load * inline

[x, NameString

1, Entre-

2, os-Rios

3, Albergavia-

4, a-Velha

5, Albergavia-

6, a-Novo

7, Riba-

8, mandego

9, Puebla de

10, Sanabria

11,

12, ert2

13, 1234

14, Vale

15, de Canba

16, Varzea

17, Cova

18, PORTO

19, FARO

20, Moledo do Minho];

// Start a new namegroup for all records that start with an Uppercase character.

// this could be any condition of course

Temp2:

Load *

, NameString & spacing as NameString2;

Load *

, if (Upper(Left(NameString,1)) = Left(NameString,1)

, rangesum(1,peek('namegroup'))

, peek('namegroup') ) as namegroup

, if (right(NameString,1)='-'

, ''

, ' ') as spacing

RESIDENT Temp1

where IsText(NameString);

// group and concat the string

Temp3:

Load namegroup

, Concat(NameString2) as newName

resident Temp2

group by namegroup;

// capitalize if the second character is in uppercase.

// check against another table (if applicable)

Temp4:

Load *

, exists(Name, TheName) as nameFound;

Load namegroup

, if (Upper(Mid(newName,2,1)) = Mid(newName,2,1)

, Capitalize(newName)

, newName) as TheName

resident Temp3;

Good Luck,
Regards, Jan


Not applicable
Author

Very good solution. This will help!

Thank you very much!

Regards, Nick