Skip to main content
Woohoo! Qlik Community has won “Best in Class Community” in the 2024 Khoros Kudos awards!
Announcements
Nov. 20th, Qlik Insider - Lakehouses: Driving the Future of Data & AI - PICK A SESSION
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

For each table in document

  • I'm working with a lot of HTML documents. These include between 1 and N tables.
  • The content from each of these tables can be very different:
    • 1 to N columns
    • Different column titles / field names.
  • Now I'm trying to write an import script for these files.
  • My approach is:
    • For each file
      • Get number of tables
      • For each table
        • Get number of columns
        • For each column/field
          • Get first entry
          • Do something with this column/field if keyword is "foo"
          • Do something with this table if keyword is "bar"
        • .
      • .
    • .
  • I think I have to iterate tables @1, @2, @3, ... and in this tables iterate the columns again @1, @2, @3, @4, ....
  • I don't know how to itarate unknown number of tables and columns.

My current code is:

SET dirpath = [.];

SET filescheme = [*3*];

SET fileext = [html htm];

sub workInDirectory(Root, Namescheme, Extlist)

    for i=1 to SubStringCount(Extlist, ' ')

        for each File in FileList(Root & '\' & Namescheme & SubField(Extlist, ' ', '$(i)'))

            if FindOneOf('$(File)', '~$') = 0 then // Protects opening open documents.

                Filelist:

                LOAD

                    '$(File)'               as [FilePath],

                    FileSize('$(File)')     as [FileSize],

                    FileTime('$(File)')     as [FileTime]

                    AutoGenerate(1);

                Pathinfos:

                LOAD

                    '$(File)'           as [FilePath],

                    FileDir('$(File)')  as [FileDir],

                    FileName()          as [FileName],

                    FileBaseName()      as [FileBaseName],       

                    FileExtension()        as [FileExtension]

                FROM $(File)

                (fix, codepage is 1252);

               

                // TODO:

                // For each document.

                    // Count number of tables in document.

                    // For each table in document.

                       // Get number of columns

                       // For each column

                          // Get first entry

                          // Check these for key words.

                          // If key word is foo do 1.

                          // If key word is bar do 2.

                // Do 1: foo-function.

                // Do 2: bar-function.

               

               // TODO: Change this

                LOAD

                    '$(File)' as [FilePath],

                    @1,

                     @2,

                     @3,

                     @4

                FROM $(File)

                (html, codepage is 1252, no labels, table is @4);

            end if

        next File

    next i

    for each Dir in DirList(Root&'\*')

        call workInDirectory(Dir, Namescheme, Extlist)

    next Dir

end sub

call workInDirectory('$(dirpath)', '$(filescheme)', '$(fileext)')

1 Solution

Accepted Solutions
Not applicable
Author

I've got a solution. I don't like this, but it works.

Sub loadAllExistingTablesFrom(inFile)

    LET vContinueLoop = True();

    SET vTblNum = 1;

    SET vTable = '';

    LET col1_row0 = '';

    Do While vContinueLoop = True()

        vTable = '@' & $(vTblNum)

        // TODO: Find a table check without error suppression.

        SET ErrorMode = 0; // Because we load possibly unexisting tables.

            TableHead:

            First 1 LOAD

                '$(inFile)' as [FilePath],

                @1

            FROM $(inFile)

            (Html, Codepage Is 1252, No Labels, Table Is $(vTable) );

            col1_row0 = Peek('@1', 0, 'TableHead');

            DROP TABLE TableHead;

        SET ErrorMode = 1;

        If SubStringCount(Trim('$(col1_row0)'), ' ') > 0 Or SubField(Trim('$(col1_row0)'), ' ', '1') <> '' Then

            Call mainLoader(inFile, '$(vTable)')

        Else

            vContinueLoop = False();

        End If

        vTblNum = $(vTblNum) + 1;

    Loop

End Sub

View solution in original post

4 Replies
Gysbert_Wassenaar

  • I think I have to iterate tables @1, @2, @3, ... and in this tables iterate the columns again @1, @2, @3, @4, ....
  • I don't know how to itarate unknown number of tables and columns.

I don't know either. I just use a large number that I know will be larger than the maximum number of tables in a document: Re: know the number of tables.

To find the number of columns of a table you can load the first record of the table and then use the NoOfFields function using the table name as argument.


talk is cheap, supply exceeds demand
Not applicable
Author

Thank you for your answer.

This approach does not work for me. QlikView is running into an endless loop.

I was using 12 to 1 in the for loop (from your approach) at 6 test files including 1 to 7 tables.

Not applicable
Author

I've got a solution. I don't like this, but it works.

Sub loadAllExistingTablesFrom(inFile)

    LET vContinueLoop = True();

    SET vTblNum = 1;

    SET vTable = '';

    LET col1_row0 = '';

    Do While vContinueLoop = True()

        vTable = '@' & $(vTblNum)

        // TODO: Find a table check without error suppression.

        SET ErrorMode = 0; // Because we load possibly unexisting tables.

            TableHead:

            First 1 LOAD

                '$(inFile)' as [FilePath],

                @1

            FROM $(inFile)

            (Html, Codepage Is 1252, No Labels, Table Is $(vTable) );

            col1_row0 = Peek('@1', 0, 'TableHead');

            DROP TABLE TableHead;

        SET ErrorMode = 1;

        If SubStringCount(Trim('$(col1_row0)'), ' ') > 0 Or SubField(Trim('$(col1_row0)'), ' ', '1') <> '' Then

            Call mainLoader(inFile, '$(vTable)')

        Else

            vContinueLoop = False();

        End If

        vTblNum = $(vTblNum) + 1;

    Loop

End Sub

Gysbert_Wassenaar

Looks like a perfectly acceptable solution to me. If it can't be done the way it ought, it ought to be done the way it can.


talk is cheap, supply exceeds demand