Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi all,
I'm having an issue loading thousands of HTML files into Qlikview. This was initially working until recently, when the data was altered, my original load script looked like this:
LOAD *
FROM
[..\Source\HTML\*.html]
(html, codepage is 1252, no labels, table is @1);
Now, the client has altered the data so some of the HTML files now contain a second table at the bottom of the page, pushing the tables of the data I need to extract from to be called @2. The problem is that not all HTML files will contain this second table so when I changed the load script to what it is below, I get a 'cannot locate table in HTML file' because if the new table isn't present, the information I need is in table @1.
LOAD *
FROM
[..\Source\HTML\*.html]
(html, codepage is 1252, no labels, table is @2);
What is the best way to check for the right table and only load the relevant data in the initial load?
Thanks.
Thanks for your suggestion Jonathan. I ended up going with a very similar approach:
SET ErrorMode=0;
Customer:
LOAD *
FROM
[..\Source\HTML\*.html]
(html, codepage is 1252, no labels, table is @1)
where len(@36)>1;
Concatenate
LOAD *
FROM
[..\Source\HTML\*.html]
(html, codepage is 1252, no labels, table is @2);
SET ErrorMode=1;
The newly added tables only have 12 fields where the older ones have 36. Because these are being loaded from HTML files, NULL fields are initially loaded as ' ' (then transformed later) so the last column of the older files (@36) will always contain data. That why I added the "where len(@36)>1;"
Thanks to both of you for your help!
Have you tried using the Error variables like ErrorMode and ScriptError to
a) disable the Error when the table 2 is not found, so the script continues
b) check the returned ScriptError code and load table 1 when table 2 not exists
Hi Swuehl,
Thanks for your suggestion. I've set ErrorMode = 0 like you said so Table @2 is now loading, thank you. As for your suggestion about ScriptError, I'm not sure how that would look.
Due to the nature of the HTML tables, I cannot load embedded labels, so each column for both tables are called @1, @2, @3, @4 etc. I can't load all columns from table @1 as the data would then be mixed up. Would your suggestion still be feasible knowing this?
Thank you.
Hi
What I would do for the field names is to concatenate all the values loaded from @1 into one table, create a mapping table (eg mapRename1) to rename the fields (using rename fields using mapRename1;)...
Then concatenate the values from @2 into a second table. Create a mapping tables for the @2 fields (mapRename2) to rename @2 fields...
and combine then combine them by concatenating the table from @2 (now with fields renamed) onto the table from @1 (with fields renamed).
HTH
Jonathan
Hi Swuehl,
It looks like the ScriptError function is not working here. Below is what my script looks like but the data from table @1 is not being loaded:
Customer:
LOAD *
FROM
[..\Source\HTML\*.html]
(html, codepage is 1252, no labels, table is @2);
IF ScriptError <> 0 THEN
Customer:
LOAD *
FROM
[..\Source\HTML\*.html]
(html, codepage is 1252, no labels, table is @1);
END IF
Just to test the function wasn't working, I removed the second load script and replaced it with exit script;. The entire load script ran to completion i.e. the exit script; command wasn't executed.
Any thoughts?
Hi
I think the problem is that you are performing a wild card load, so the scripterror will be for the last file attempted - if that was successful, then the scripterror will be 0, even though one or more of the the previous files did experience the error.
You will need to loop over the files using ForEach vFile in Filelist() syntax - like this pseudocode:
mapRename1:
Mapping LOAD ...
mapRename2:
Mapping LOAD ...
Qualify *;
ForEach vFile in Filelist(.....)
Table2:
LOAD ... FROM @2 ...;
If ScriptError <> 0 then
Table1:
LOAD ... FROM @1 ...;
End If
End If
Unqualify *;
Rename fields using mapRename1;
Rename fields using mapRename2;
Concatenate (Table1)
LOAD * Resident Table2;
DROP Table Table2;
Notes:
HTH
Jonathan
Thanks for your suggestion Jonathan. I ended up going with a very similar approach:
SET ErrorMode=0;
Customer:
LOAD *
FROM
[..\Source\HTML\*.html]
(html, codepage is 1252, no labels, table is @1)
where len(@36)>1;
Concatenate
LOAD *
FROM
[..\Source\HTML\*.html]
(html, codepage is 1252, no labels, table is @2);
SET ErrorMode=1;
The newly added tables only have 12 fields where the older ones have 36. Because these are being loaded from HTML files, NULL fields are initially loaded as ' ' (then transformed later) so the last column of the older files (@36) will always contain data. That why I added the "where len(@36)>1;"
Thanks to both of you for your help!
Thanks for you help Jonathan, I really appreciate it. I ended up not going for this approach at all, but instead used a variation of your original post.
Thanks again for your help!