Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi all,
I'm looking for a solution to deal with the following issue.
I have a table of names (e.g. "Joe Bloggs", "Mary Smith") - I have around 300,000 distinct values in this table.
I also have around 10 tables, each containing an ID field, <some other fields> and a narrative field which can range in size from a few lines to a few thousand lines. These tables can be over 100,000 rows.
I need to loop through each narrative field in each of my 10 tables and pull out the IDs for each table where the narrative contains any of the names in my Name table.
e.g
NameTable:
ID, Name
1, Joe Bloggs
2, Mary Smith
3, Sam Taylor
DataTable:
ID,.....,NarrativeField
1,......,"...This is some data, Mary Smith likes the colour red..."
2,......,"..Sam Taylor loves QlikView.."
3,......,"..Joe Bloggs is married to Mary Smith.."
I would like my results to show the following:
ResultsTable:
ID, Name, FoundInID
1, Joe Bloggs, 3
2, Mary Smith, 1
2, Mary Smith, 3
3, Sam Taylor, 2
I've found examples where this can easily be done using a subfield solution (with an inner join back to my NameTable) but as I have more than 1 word for a name, I can't subfield on a space. I've also found a nice regex solution, but as I have so many names to check, it's taking too long and I'm getting memory issues.
Can anyone think of any other solution? I basically need to do a substringcount on my narrative field but can't iterate over every name as there are so many.
Many thanks
Maybe something like
NameTable:
LOAD * INLINE [
NameID, Name
1, Joe Bloggs
2, Mary Smith
3, Sam Taylor
];
MAP:
MAPPING
LOAD Name, '//'&NameID&'\\'
Resident NameTable;
DataTable:
LOAD * INLINE [
DataID,.....,NarrativeField
1,......,"...This is some data, Mary Smith likes the colour red..."
2,......,"..Sam Taylor loves QlikView.."
3,......,"..Joe Bloggs is married to Mary Smith.."
];
ResultsTable:
LOAD DataID, TextBetween(Search,'//','\\',iterno()) as NameID, 1 as Count
WHILE iterno() <= SubStringCount(Search,'//');
LOAD DataID, MapSubString('MAP',NarrativeField) as Search
Resident DataTable;
Sorry - forgot to say, I need to also count how many times each of the names appears in the narrative text
e.g.
Results table
ID, Name, FoundInID, Count
1, Joe Bloggs, 3, 1
2, Mary Smith, 1, 1
2, Mary Smith, 3, 1
3, Sam Taylor, 2, 1
Maybe something like
NameTable:
LOAD * INLINE [
NameID, Name
1, Joe Bloggs
2, Mary Smith
3, Sam Taylor
];
MAP:
MAPPING
LOAD Name, '//'&NameID&'\\'
Resident NameTable;
DataTable:
LOAD * INLINE [
DataID,.....,NarrativeField
1,......,"...This is some data, Mary Smith likes the colour red..."
2,......,"..Sam Taylor loves QlikView.."
3,......,"..Joe Bloggs is married to Mary Smith.."
];
ResultsTable:
LOAD DataID, TextBetween(Search,'//','\\',iterno()) as NameID, 1 as Count
WHILE iterno() <= SubStringCount(Search,'//');
LOAD DataID, MapSubString('MAP',NarrativeField) as Search
Resident DataTable;
Amazing. Thanks so much Stefan
You are welcome.
Just out of curiosity, I haven't tried this approach with thousands of key words searching in huge text base.
How is it performing on your data?
(I don't think there is anything better within Qlik scripting only, so if the performance is not acceptable, you probably need to use some other tool for the search and index building)
Great on all but one of my tables, that only has 27,000 rows but the narrative field on that table contains a huge amount of data (that field contains the text from a word document). I'm getting an 'out of memory' error on that one so need to think of something different for that table.
Works perfectly on the rest though!