Not applicable

Remove useless information from a log file before import

Dear all,

I am trying to analyze a log file. The structure of the file looks like this:

******* Block A ***********

01/01/2014 16:18:47 useless header infos

01/01/2014 16:18:47 useless header infos

....

******* Block B ***********

01/01/2014 16:18:47 useless header infos

|-|------------------------------|----------|---|----|----|

| PARAMETERS 'INPUT'

|-|------------------------------|----------|----|----|----|

|1|COL2                          |COL3      |COL4|COL5|COL6|

|-|------------------------------|----------|---|----|----|

|0|123456789012345678901234567890|1234567890|000|0000|0000|

|-|------------------------------|----------|---|----|----|

...

******* Block C ***********

01/01/2014 16:18:47 useless header infos

|-|------------------------------|----------|---|----|----|

| TABLE 'ABC'

|------------------|------------------|--------------------|

|COL1              |COL2              |COL3                |

|------------------|------------------|--------------------|

|012345678901234567|012345678901234567|012345678901234567  |

|------------------|------------------|--------------------|

|blabla            |                  |                    |

|blabla            |                  |                    |

....

|-|------------------------------|----------|---|----|----|

...

Blocks A, B, and C recur throughout the log file for other dates as well, at no fixed position or line.

Now I am looking for a good way to import the data into QlikView. I am only interested in the table data in Block C.

What is the best way to get rid of the useless information in Blocks A and B? I want only Block C with its fixed structure to remain, so that I can import it as simply as a CSV file.

I hope you can help me on this issue. Thanks for your help in advance.

Best regards,

Dan

9 Replies
marcus_sommer

You could use '|' as the delimiter and set a header of x rows. If your data structure is more complicated, you could use functions like peek(), len(), subfield(), keepchar(), ... to create a "normal" data structure.

There are also native QlikView functionalities (within the load wizard) to transform unstructured data - a very good explanation of this is in the book QlikView 11 for Developers (generally a big recommendation).

- Marcus
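For readers outside QlikView, the delimiter-plus-header idea suggested above can be sketched in Python. The delimiter and the number of header rows to skip are assumptions chosen to match the sample log, not part of Marcus's post:

```python
# Sketch of the delimiter approach: skip a fixed number of header rows,
# then split each remaining line on '|' into its cells.

def parse_delimited(lines, delimiter="|", skip_rows=2):
    """Split each line on the delimiter after skipping header rows."""
    return [
        [cell.strip() for cell in line.strip(delimiter + "\n").split(delimiter)]
        for line in lines[skip_rows:]
    ]

print(parse_delimited(["header 1", "header 2", "|a|b|", "|c|d|"]))
# → [['a', 'b'], ['c', 'd']]
```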

martynlloyd
Partner - Creator III

You could use something like this to tag the start and end of Block C:

Log:
LOAD A
FROM
[..\Community2.xlsx]
(ooxml, no labels, table is Sheet1);

Temp1:
LOAD
A,
If(WildMatch(A, '*BLOCK C*'), RowNo()) as BlockCStart,
If(WildMatch(A, '*BLOCK A*'), RowNo()) as BlockCEnd
Resident Log;

You'd then have a table with the start and end of all the Block C's, which you can use with Peek or a Where clause...

If someone knows how to set a variable in the middle of a Load, then you would be able to tag the rows as good/bad...

Regards,

Marty.

Not applicable
Author

This seems to be an interesting approach. I will try it in the next few days and tell you if it worked for me.

Thanks.

Best regards

Dan

Not applicable
Author

The WildMatch function has worked well so far. I transformed the information into a new temp table:

TableNr | LineNrStart | LineNrEnd
1       | 755         | 839
2       | 912         | 1200

This means that the relevant information (Block C) is in lines 755 to 839 and 912 to 1200 of my log file.

I created the following table - containing my original log file:

LogFile:
LOAD
  RowNo() as lineNr, A as lineContent
FROM [logfile.txt]
...

Now I wanted to extract the relevant information (Block C) from it using the new temp table. What is the best way to do so? In SQL I would write something like this:

SELECT lineContent
FROM LogFile
WHERE (lineNr BETWEEN 755 AND 839)
   OR (lineNr BETWEEN 912 AND 1200);

Thanks in advance for your help.

Best regards

Dan

martynlloyd
Partner - Creator III

Hi Dan,

I haven't tested this, but it will be something like:

// loop through the temp table (Peek row numbers are zero-based)
FOR vBlock = 0 to NoOfRows('TempTable') - 1

  Let vStart = Peek('LineNrStart', vBlock, 'TempTable');
  Let vEnd = Peek('LineNrEnd', vBlock, 'TempTable');

Log:
LOAD
  lineContent
Resident ....
Where lineNr >= $(vStart) and lineNr <= $(vEnd);

NEXT vBlock

QV will automatically concatenate the individual loads, so you will end up with one table; drop everything you don't need as the last step.

Regards,

Marty.
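The loop-over-ranges idea above can be sketched outside QlikView as well; here it is in Python, with hypothetical example ranges (the function name and sample values are illustrations, not from any reply):

```python
# Keep only lines whose 1-based line number falls inside one of the
# (start, end) ranges found earlier, concatenating all matches into one list.

def extract_ranges(lines, ranges):
    """Return lines whose 1-based index lies inside any (start, end) range."""
    keep = []
    for i, line in enumerate(lines, start=1):
        if any(start <= i <= end for start, end in ranges):
            keep.append(line)
    return keep

lines = ["line %d" % n for n in range(1, 11)]
print(extract_ranges(lines, [(2, 3), (8, 9)]))
# → ['line 2', 'line 3', 'line 8', 'line 9']
```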

maxgro
MVP

If you want to filter Block 'C':

tmp:
LOAD F1, RowNo() as id,
  if(index(F1, 'Block'), PurgeChar(replace(F1, 'Block', ''), ' *')) as block
FROM [http://community.qlik.com/thread/128625] (html, codepage is 1252, embedded labels, table is @1, filters(
ColSplit(1, IntArray())
));

// preceding load: the lower load forward-fills the block label,
// the upper load keeps only the Block C rows
t:
NoConcatenate load *
where block = 'C';
load id, F1,
  if(len(trim(block)) = 0, peek('block'), block) as block
Resident tmp;

DROP Table tmp;
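The forward-fill trick in maxgro's script (carry the last seen block marker down to the following rows, then filter) can be sketched in Python; the marker format is an assumption based on the sample log:

```python
# Two-step approach: tag each line with the block it belongs to by
# forward-filling the last block marker seen, then keep only Block C lines.

def tag_blocks(lines):
    """Yield (block_label, line) pairs, forward-filling the last marker seen."""
    current = None
    for line in lines:
        if "Block" in line:
            # e.g. "******* Block C ***********" -> "C"
            current = line.replace("Block", "").replace("*", "").strip()
        yield current, line

log = [
    "******* Block A ***********",
    "header",
    "******* Block C ***********",
    "|data row 1|",
    "|data row 2|",
]
block_c = [line for block, line in tag_blocks(log) if block == "C"]
print(block_c)
```

As in the QlikView script, the marker line itself is tagged 'C' as well, so it survives the filter and can be dropped in a later step if unwanted.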

evan_kurowski
Specialist

Hello online-dan,

Here's a routine (log_parse.qvs) that will extract the table in the 3rd Block from the varying-structure flat file log.txt, and export it to ABC.CSV.


It uses the premises that:

  • the start indicator and block header size for Block 3 remain consistent
  • the end-of-table indicator for Block 3 remains consistent
  • rows with more than 5 consecutive dashes are spacers

Also, it assigns the table and column names based on hard-coded values. If you expected the structure of the table in Block 3 to vary, and wanted to dynamically parse a changing set of attributes each time (table name, field names, and number of columns), that kind of flexibility would need to be added.
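The attached log_parse.qvs is not reproduced in the thread, but the stated premises can be sketched in Python. The marker strings below are assumptions inferred from the sample log, not Evan's actual script:

```python
import re

# Premises, per the post above:
# - a line containing "TABLE 'ABC'" starts the Block C table (assumed marker),
# - a ruler of the "|-|---..." style ends it (assumed marker),
# - rows with more than 5 consecutive dashes are spacers and are skipped.

SPACER = re.compile(r"-{6,}")  # more than 5 consecutive dashes

def extract_block_c(lines):
    """Return the rows of the TABLE 'ABC' section, each split on '|'."""
    rows, inside = [], False
    for line in lines:
        if "TABLE 'ABC'" in line:
            inside = True
            continue
        if inside:
            if line.startswith("|-|"):       # end-of-table indicator
                break
            if SPACER.search(line):          # spacer row, skip it
                continue
            # split "|a|b|c|" into its cells
            rows.append([cell.strip() for cell in line.strip("|\n").split("|")])
    return rows
```

The first returned row is the header (COL1, COL2, COL3); the rest are data rows ready to be written out as CSV.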

evan_kurowski
Specialist

Massimo, I really like how you referenced back to the HTML table in this thread to source your script. That could be a convenient, step-saving technique I would use again (vs. copying the table from the thread, saving it to disk, etc.).

Not applicable
Author

Thank you all for your scripts and hints. I will try them this week and give you feedback.

Best regards

Dan