6 Replies Latest reply: Jul 23, 2015 3:15 PM by Sasidhar Parupudi RSS

    Extract Datas from Html non parsed

    German Torres

      Hi!!!

      Im needing a little help again with other file to extract data.

       

      I have an html file and i need the following datas:

       

      UNIVERSIDAD XXXXX  into a Field called "Obra"

      Hasta: (in this case the value its) 16/07/2015

      Neto Rendido: (in this case the value its) $51.814,10

       

      This Extraction its not only for this File. I have many files with the same format, and i need to read them all

       

      Can Someone help me?

      Regards!!!

       

      Germán

        • Re: Extract Datas from Html non parsed
          Srikanth P

          Hi Please find the below sample script.

           

          TEMP:

          LOAD

            RowNo() AS RN ,

            LN1 &'|'& LN2 &'|'& LN3 &'|'& LN4 &'|'& LN5 &'|'& LN6 &'|'& LN7 &'|'& LN8 AS LINE

          ;

          LOAD trim(@1) AS LN1,

               trim(@2) AS LN2,

               trim(@3) AS LN3,

               trim(@4) AS LN4,

               trim(@5) AS LN5,

               trim(@6) AS LN6,

               trim(@7) AS LN7,

               trim(@8) AS LN8

          FROM [PL0716_15628_3.HTM] (html, codepage is 1252, no labels, table is @1);

           

           

          TEMP2:

          LOAD

            * ,

            IF(Previous(RN)=RN, IF(Previous(LINES)= LINES , Peek(SN) , RangeSum(1,Peek(SN)) ) ,1) AS SN

          Where  Len(Trim(LINES))

          ;

          LOAD

            RN,

            Trim(SubField(LINE ,'|')) AS LINES

          Resident TEMP ;

           

           

          DROP Table TEMP;

           

           

          SourceFile:

          LOAD

            RN,

            Concat( DISTINCT LINES , ' ' , SN ) AS HLINE

          Resident TEMP2

          Where Len(Trim(LINES))

          Group By RN

          ;

           

           

          DROP Table TEMP2;

           

           

          FinalTemp:

          LOAD

            DATASTRING ,

            SubField(DATASTRING,'|',1) AS Obra,

            subfield(SubField(DATASTRING,2),':',-1) AS Hasta ,

            subfield(SubField(DATASTRING,3),':',-1) AS [Neto Rendido]

          ;

          LOAD

            Concat( HLINE , '|' , RN ) AS DATASTRING

          Resident SourceFile

          Where WildMatch( HLINE , 'UNIVERSIDAD*', '*Hasta*' ,'*Neto Rendido*');

           

           

          DROP Table SourceFile;

            • Re: Extract Datas from Html non parsed
              German Torres

              Hi Dathu!!!

              Thanks!!

              But this only works for one file, isnt it?

              I need the same but for many files into the same folder.

              How could do i?

               

              Regards!!!

                • Re: Extract Datas from Html non parsed
                  Sasidhar Parupudi

                  May be this post can help you to loop through the html files in a folder

                  Re: loop through to load all files from a folder and its subfolders?

                  • Re: Extract Datas from Html non parsed
                    Srikanth P

                    Hi, If all files have same format create a for loop to load the data from each file.

                     

                    Since we have the html files we can't load like sub folders or folders.

                     

                    First Create a small excel or Inline table having all the html file names. (This must be a manual or some how you get from other source)

                     

                    HtmlFilesList:

                    LOAD * INLINE [

                    HtmlFileName

                    file1

                    file2

                    file3

                    .

                    .

                    ];

                     

                    SET vTableName = TransformedData:;

                     

                    For i=1 to FieldvalueCount('HtmlFileName')

                     

                    Let vHtmlFile = FieldValue('HtmlFileName') ;

                     

                    // **** Your Load Script Logic for each html page ****

                     

                    TEMP:

                    LOAD

                      RowNo() AS RN ,

                      LN1 &'|'& LN2 &'|'& LN3 &'|'& LN4 &'|'& LN5 &'|'& LN6 &'|'& LN7 &'|'& LN8 AS LINE

                    ;

                    LOAD trim(@1) AS LN1,

                         trim(@2) AS LN2,

                         trim(@3) AS LN3,

                         trim(@4) AS LN4,

                         trim(@5) AS LN5,

                         trim(@6) AS LN6,

                         trim(@7) AS LN7,

                         trim(@8) AS LN8

                    FROM [$(vHtmlFile)] (html, codepage is 1252, no labels, table is @1);

                     

                    TEMP2:

                    LOAD

                      * ,

                      IF(Previous(RN)=RN, IF(Previous(LINES)= LINES , Peek(SN) , RangeSum(1,Peek(SN)) ) ,1) AS SN

                    Where  Len(Trim(LINES))

                    ;

                    LOAD

                      RN,

                      Trim(SubField(LINE ,'|')) AS LINES

                    Resident TEMP ;

                     

                    DROP Table TEMP;

                     

                    SourceFile:

                    LOAD

                      RN,

                      Concat( DISTINCT LINES , ' ' , SN ) AS HLINE

                    Resident TEMP2

                    Where Len(Trim(LINES))

                    Group By RN

                    ;

                     

                    DROP Table TEMP2;

                     

                    $(vTableName)

                    LOAD

                      DATASTRING ,

                      SubField(DATASTRING,'|',1) AS Obra,

                      subfield(SubField(DATASTRING,2),':',-1) AS Hasta ,

                      subfield(SubField(DATASTRING,3),':',-1) AS [Neto Rendido]

                    ;

                    LOAD

                      Concat( HLINE , '|' , RN ) AS DATASTRING

                    Resident SourceFile

                    Where WildMatch( HLINE , 'UNIVERSIDAD*', '*Hasta*' ,'*Neto Rendido*');

                     

                    DROP Table SourceFile;

                     

                    Let vHtmlFile = ;

                    SETvTableName = Concatenate(TransformedData);

                    Next i

                     

                    Drop Table HtmlFilesList ;

                • Re: Extract Datas from Html non parsed
                  German Torres

                  I Apreciate your help but something dosnt look good on datas. Its not parsing datas in the correct way as you can see on the screen attached.

                   

                  For instance, look the date data, it dosnt include the year. Or there's some Neto Value that its $29.67 when you can see on DataString that this value its $29.673,86

                  Qlikview_Ripsa.jpg

                  Something its not working.

                  Thanks anyway.