Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I have a dataset/log of web urls visited and would like to compare this against a spreadsheet of known sites to create a column that is a yes/no flag. Can someone illustrate how this can be done in the load script or provide an example?
If I had exact matches, I would load the data, then left join the spreadsheet, then load into a final table with YES/NO values based on match or null, but the problem is that I need to match on "contains" so that
"microsoft.com" matches "http://www.microsoft.com/index.html" for example
The spreadsheet has ~3000 rows...
I thought of stripping down the url field to a hostname, but I really need the general pattern match approach if that is possible.
I think for this you could use a mapping with mapsubstring(). Here an example how it in general worked: Re: Passing parameter strings that contain special characters. Based on this approach you could modify it to your real aim. This could be done by replacing the match-string with nothing - '' - and then making a check on the length like:
....
len(Url) = len(mapsubstring('MapTable', Url, '')) as Flag
...
which would return true or false respectively 0 or -1.
- Marcus