Skip to main content
Announcements
Introducing Qlik Answers: A plug-and-play, Generative AI powered RAG solution. READ ALL ABOUT IT!
cancel
Showing results for 
Search instead for 
Did you mean: 
ciaran_mcgowan
Partner - Creator III
Partner - Creator III

Loading From Multiple HTML Files Where Table Varies

Hi all,

I'm having an issue loading thousands of HTML files into Qlikview. This was initially working until recently, when the data was altered, my original load script looked like this:

LOAD *

FROM

[..\Source\HTML\*.html]

(html, codepage is 1252, no labels, table is @1);

Now, the client has altered the data so some of the HTML files now contain a second table at the bottom of the page, pushing the tables of the data I need to extract from to be called @2. The problem is that not all HTML files will contain this second table so when I changed the load script to what it is below, I get a 'cannot locate table in HTML file' because if the new table isn't present, the information I need is in table @1.

LOAD *

FROM

[..\Source\HTML\*.html]

(html, codepage is 1252, no labels, table is @2);

What is the best way to check for the right table and only load the relevant data in the initial load?

Thanks.

1 Solution

Accepted Solutions
ciaran_mcgowan
Partner - Creator III
Partner - Creator III
Author

Thanks for your suggestion Jonathan. I ended up going with a very similar approach:

SET ErrorMode=0;

Customer:

LOAD *

FROM

[..\Source\HTML\*.html]

(html, codepage is 1252, no labels, table is @1)

where len(@36)>1;

Concatenate

LOAD *

FROM

[..\Source\HTML\*.html]

(html, codepage is 1252, no labels, table is @2);


SET ErrorMode=1;

The newly added tables only have 12 fields where the older ones have 36. Because these are being loaded from HTML files, NULL fields are initially loaded as '&nbsp' (then transformed later) so the last column of the older files (@36) will always contain data. That why I added the "where len(@36)>1;"

Thanks to both of you for your help!

View solution in original post

7 Replies
swuehl
MVP
MVP

Have you tried using the Error variables like ErrorMode and ScriptError to

a) disable the Error when the table 2 is not found, so the script continues

b) check the returned ScriptError code and load table 1 when table 2 not exists

ciaran_mcgowan
Partner - Creator III
Partner - Creator III
Author

Hi Swuehl,

Thanks for your suggestion. I've set ErrorMode = 0 like you said so Table @2 is now loading, thank you. As for your suggestion about ScriptError, I'm not sure how that would look.


Due to the nature of the HTML tables, I cannot load embedded labels, so each column for both tables are called @1, @2, @3, @4 etc. I can't load all columns from table @1 as the data would then be mixed up. Would your suggestion still be feasible knowing this?


Thank you.

jonathandienst
Partner - Champion III
Partner - Champion III

Hi

What I would do for the field names is to concatenate all the values loaded from @1 into one table, create a mapping table (eg mapRename1) to rename the fields (using rename fields using mapRename1;)...

Then concatenate the values from @2 into a second table. Create a mapping tables for the @2 fields (mapRename2) to rename @2 fields...

and combine then combine them by concatenating the table from @2 (now with fields renamed) onto the table from @1 (with fields renamed).

HTH

Jonathan

Logic will get you from a to b. Imagination will take you everywhere. - A Einstein
ciaran_mcgowan
Partner - Creator III
Partner - Creator III
Author

Hi Swuehl,

It looks like the ScriptError function is not working here. Below is what my script looks like but the data from table @1 is not being loaded:

Customer:

LOAD *

FROM

[..\Source\HTML\*.html]

(html, codepage is 1252, no labels, table is @2);

IF ScriptError <> 0 THEN

Customer:

LOAD *

FROM

[..\Source\HTML\*.html]

(html, codepage is 1252, no labels, table is @1);

END IF

Just to test the function wasn't working, I removed the second load script and replaced it with exit script;. The entire load script ran to completion i.e. the exit script; command wasn't executed.

Any thoughts?

jonathandienst
Partner - Champion III
Partner - Champion III

Hi

I think the problem is that you are performing a wild card load, so the scripterror will be for the last file attempted - if that was successful, then the scripterror will be 0, even though one or more of the the previous files did experience the error.

You will need to loop over the files using ForEach vFile in Filelist() syntax - like this pseudocode:

mapRename1:

Mapping LOAD ...

mapRename2:

Mapping LOAD ...

Qualify *;

ForEach vFile in Filelist(.....)

     Table2:

     LOAD ... FROM @2 ...;

     If ScriptError <> 0 then

          Table1:

          LOAD ... FROM @1 ...;

     End If

End If

Unqualify *;

Rename fields using mapRename1;

Rename fields using mapRename2;  

Concatenate (Table1)

LOAD * Resident Table2;

DROP Table Table2;

Notes:

  • The Qualify separate the loaded data into Table1.@1, Table1.@2.... and Table2.@1, Table2.@2, ....
  • Thee mapping tables will need to use these field names for the the "from" name in the mapping table
  • You may need to combine the two mapping tables into a single table

HTH

Jonathan

Logic will get you from a to b. Imagination will take you everywhere. - A Einstein
ciaran_mcgowan
Partner - Creator III
Partner - Creator III
Author

Thanks for your suggestion Jonathan. I ended up going with a very similar approach:

SET ErrorMode=0;

Customer:

LOAD *

FROM

[..\Source\HTML\*.html]

(html, codepage is 1252, no labels, table is @1)

where len(@36)>1;

Concatenate

LOAD *

FROM

[..\Source\HTML\*.html]

(html, codepage is 1252, no labels, table is @2);


SET ErrorMode=1;

The newly added tables only have 12 fields where the older ones have 36. Because these are being loaded from HTML files, NULL fields are initially loaded as '&nbsp' (then transformed later) so the last column of the older files (@36) will always contain data. That why I added the "where len(@36)>1;"

Thanks to both of you for your help!

ciaran_mcgowan
Partner - Creator III
Partner - Creator III
Author

Thanks for you help Jonathan, I really appreciate it. I ended up not going for this approach at all, but instead used a variation of your original post.

Thanks again for your help!