Skip to main content
Announcements
Qlik Connect 2024! Seize endless possibilities! LEARN MORE
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

substringcount multiple words

Hi all,

I'm looking for a solution to deal with the following issue.

I have a table of names (e.g. "Joe Bloggs", "Mary Smith") - I have around 300,000 distinct values in this table.

I also have around 10 tables, each containing an ID field, <some other fields> and a narrative field which can range in size from a few lines to a few thousand lines.  These tables can be over 100,000 rows.

I need to loop through each narrative field in each of my 10 tables and pull out the IDs for each table where the narrative contains any of the names in my Name table.

e.g

NameTable:

ID, Name

1, Joe Bloggs

2, Mary Smith

3, Sam Taylor

DataTable:

ID,.....,NarrativeField

1,......,"...This is some data, Mary Smith likes the colour red..."

2,......,"..Sam Taylor loves QlikView.."

3,......,"..Joe Bloggs is married to Mary Smith.."

I would like my results to show the following:

ResultsTable:

ID, Name, FoundInID

1, Joe Bloggs, 3

2, Mary Smith, 1

2, Mary Smith, 3

3, Sam Taylor, 2

I've found examples where this can easily be done using a subfield solution (with an inner join back to my NameTable) but as I have more than 1 word for a name, I can't subfield on a space.  I've also found a nice regex solution, but as I have so many names to check, it's taking too long and I'm getting memory issues.

Can anyone think of any other solution?  I basically need to do a substringcount on my narrative field but can't iterate over every name as there are so many.

Many thanks

1 Solution

Accepted Solutions
swuehl
MVP
MVP

Maybe something like

NameTable:

LOAD * INLINE [

NameID, Name

1, Joe Bloggs

2, Mary Smith

3, Sam Taylor

];

MAP:

MAPPING

LOAD Name, '//'&NameID&'\\'

Resident NameTable;

DataTable:

LOAD * INLINE [

DataID,.....,NarrativeField

1,......,"...This is some data, Mary Smith likes the colour red..."

2,......,"..Sam Taylor loves QlikView.."

3,......,"..Joe Bloggs is married to Mary Smith.."

];

ResultsTable:

LOAD DataID, TextBetween(Search,'//','\\',iterno()) as NameID, 1 as Count

WHILE iterno() <= SubStringCount(Search,'//');

LOAD DataID, MapSubString('MAP',NarrativeField) as Search

Resident DataTable;

View solution in original post

5 Replies
Not applicable
Author

Sorry - forgot to say, I need to also count how many times each of the names appears in the narrative text

e.g.

Results table

ID, Name, FoundInID, Count

1, Joe Bloggs, 3, 1

2, Mary Smith, 1, 1

2, Mary Smith, 3, 1

3, Sam Taylor, 2, 1

swuehl
MVP
MVP

Maybe something like

NameTable:

LOAD * INLINE [

NameID, Name

1, Joe Bloggs

2, Mary Smith

3, Sam Taylor

];

MAP:

MAPPING

LOAD Name, '//'&NameID&'\\'

Resident NameTable;

DataTable:

LOAD * INLINE [

DataID,.....,NarrativeField

1,......,"...This is some data, Mary Smith likes the colour red..."

2,......,"..Sam Taylor loves QlikView.."

3,......,"..Joe Bloggs is married to Mary Smith.."

];

ResultsTable:

LOAD DataID, TextBetween(Search,'//','\\',iterno()) as NameID, 1 as Count

WHILE iterno() <= SubStringCount(Search,'//');

LOAD DataID, MapSubString('MAP',NarrativeField) as Search

Resident DataTable;

Not applicable
Author

Amazing.  Thanks so much Stefan

swuehl
MVP
MVP

You are welcome.

Just out of curiosity, I haven't tried this approach with thousands of key words searching in huge text base.

How is it performing on your data?

(I don't think there is anything better within Qlik scripting only, so if the performance is not acceptable, you probably need to use some other tool for the search and index building)

Not applicable
Author

Great on all but one of my tables, that only has 27,000 rows but the narrative field on that table contains a huge amount of data (that field contains the text from a word document).  I'm getting an 'out of memory' error on that one so need to think of something different for that table.

Works perfectly on the rest though!