Not applicable

Remove useless information from a log file before import

Dear all,

I am trying to analyze a log file. The structure of the file looks like this:

******* Block A ***********

01/01/2014 16:18:47 useless header infos

01/01/2014 16:18:47 useless header infos

....

******* Block B ***********

01/01/2014 16:18:47 useless header infos

|-|------------------------------|----------|---|----|----|

| PARAMETERS 'INPUT'

|-|------------------------------|----------|----|----|----|

|1|COL2                          |COL3      |COL4|COL5|COL6|

|-|------------------------------|----------|---|----|----|

|0|123456789012345678901234567890|1234567890|000|0000|0000|

|-|------------------------------|----------|---|----|----|

...

******* Block C ***********

01/01/2014 16:18:47 useless header infos

|-|------------------------------|----------|---|----|----|

| TABLE 'ABC'

|------------------|------------------|--------------------|

|COL1              |COL2              |COL3                |

|------------------|------------------|--------------------|

|012345678901234567|012345678901234567|012345678901234567  |

|------------------|------------------|--------------------|

|blabla            |                  |                    |

|blabla            |                  |                    |

....

|-|------------------------------|----------|---|----|----|

...

Blocks A, B, and C recur throughout the log file for other dates as well, at no fixed position or line.

Now I am looking for a good way to import the data into QlikView. I am only interested in the table data in Block C.

What is the best way to get rid of the useless information in Blocks A and B? I want only Block C with its fixed structure to remain, so that I can import it as simply as a CSV file.

I hope you can help me on this issue. Thanks for your help in advance.

Best regards,

Dan

9 Replies
marcus_sommer

You could use '|' as the delimiter and set a header of x rows. If your data structure is more complicated, you could use functions like peek(), len(), subfield(), keepchar(), ... to create a "normal" data structure.

There are also native QlikView functionalities (within the load wizard) to transform unstructured data - a very good explanation of this is in the book QlikView 11 for Developers (generally a big recommendation).

- Marcus
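For readers outside QlikView, the delimiter-plus-header idea suggested above can be sketched in Python. The delimiter and the number of header rows to skip are assumptions chosen to match the sample log, not part of Marcus's post:

```python
# Sketch of the delimiter approach: skip a fixed number of header rows,
# then split each remaining line on '|' into its cells.

def parse_delimited(lines, delimiter="|", skip_rows=2):
    """Split each line on the delimiter after skipping header rows."""
    return [
        [cell.strip() for cell in line.strip(delimiter + "\n").split(delimiter)]
        for line in lines[skip_rows:]
    ]

print(parse_delimited(["header 1", "header 2", "|a|b|", "|c|d|"]))
# → [['a', 'b'], ['c', 'd']]
```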

martynlloyd
Partner - Creator III

You could use something like this to tag the start and end of Block C:

Log:
LOAD A
FROM
[..\Community2.xlsx]
(ooxml, no labels, table is Sheet1);

Temp1:
LOAD
A,
If(WildMatch(A, '*BLOCK C*'), RowNo()) as BlockCStart,
If(WildMatch(A, '*BLOCK A*'), RowNo()) as BlockCEnd
Resident Log;

You'd then have a table with the start and end of all the Block C's, which you can use with Peek or a Where clause...

If someone knows how to set a variable in the middle of a Load, then you would be able to tag the rows as good/bad...

Regards,

Marty.

Not applicable
Author

This seems to be an interesting approach. I will try it in the next few days and tell you if it worked for me.

Thanks.

Best regards

Dan

Not applicable
Author

The WildMatch function has worked well so far. I transformed the information into a new temp table:

TableNr | LineNrStart | LineNrEnd
1       | 755         | 839
2       | 912         | 1200

This means that the relevant information (Block C) is in lines 755 to 839 and 912 to 1200 of my log file.

I created the following table - containing my original log file:

LogFile:
LOAD
  RowNo() as lineNr, A as lineContent
FROM [logfile.txt]
...

Now I wanted to extract the relevant information (Block C) from it using the new temp table. What is the best way to do so? In SQL I would write something like this:

SELECT lineContent
FROM LogFile
WHERE (lineNr BETWEEN 755 AND 839)
   OR (lineNr BETWEEN 912 AND 1200);

Thanks in advance for your help.

Best regards

Dan

martynlloyd
Partner - Creator III

Hi Dan,

I haven't tested this, but it will be something like:

// loop through the temp table (Peek row numbers are zero-based)
FOR vBlock = 0 to NoOfRows('TempTable') - 1

  Let vStart = Peek('LineNrStart', vBlock, 'TempTable');
  Let vEnd = Peek('LineNrEnd', vBlock, 'TempTable');

Log:
LOAD
  lineContent
Resident ....
Where lineNr >= $(vStart) and lineNr <= $(vEnd);

NEXT vBlock

QV will automatically concatenate the individual loads, so you will end up with one table; drop everything you don't need as the last step.

Regards,

Marty.
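The loop-over-ranges idea above can be sketched outside QlikView as well; here it is in Python, with hypothetical example ranges (the function name and sample values are illustrations, not from any reply):

```python
# Keep only lines whose 1-based line number falls inside one of the
# (start, end) ranges found earlier, concatenating all matches into one list.

def extract_ranges(lines, ranges):
    """Return lines whose 1-based index lies inside any (start, end) range."""
    keep = []
    for i, line in enumerate(lines, start=1):
        if any(start <= i <= end for start, end in ranges):
            keep.append(line)
    return keep

lines = ["line %d" % n for n in range(1, 11)]
print(extract_ranges(lines, [(2, 3), (8, 9)]))
# → ['line 2', 'line 3', 'line 8', 'line 9']
```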

maxgro
MVP

If you want to filter Block 'C':

tmp:
LOAD F1, RowNo() as id,
  if(index(F1, 'Block'), PurgeChar(replace(F1, 'Block', ''), ' *')) as block
FROM [http://community.qlik.com/thread/128625] (html, codepage is 1252, embedded labels, table is @1, filters(
ColSplit(1, IntArray())
));

// preceding load: the lower load forward-fills the block label,
// the upper load keeps only the Block C rows
t:
NoConcatenate load *
where block = 'C';
load id, F1,
  if(len(trim(block)) = 0, peek('block'), block) as block
Resident tmp;

DROP Table tmp;
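The forward-fill trick in maxgro's script (carry the last seen block marker down to the following rows, then filter) can be sketched in Python; the marker format is an assumption based on the sample log:

```python
# Two-step approach: tag each line with the block it belongs to by
# forward-filling the last block marker seen, then keep only Block C lines.

def tag_blocks(lines):
    """Yield (block_label, line) pairs, forward-filling the last marker seen."""
    current = None
    for line in lines:
        if "Block" in line:
            # e.g. "******* Block C ***********" -> "C"
            current = line.replace("Block", "").replace("*", "").strip()
        yield current, line

log = [
    "******* Block A ***********",
    "header",
    "******* Block C ***********",
    "|data row 1|",
    "|data row 2|",
]
block_c = [line for block, line in tag_blocks(log) if block == "C"]
print(block_c)
```

As in the QlikView script, the marker line itself is tagged 'C' as well, so it survives the filter and can be dropped in a later step if unwanted.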

evan_kurowski
Specialist

Hello online-dan,

Here's a routine (log_parse.qvs) that will extract the table in the 3rd Block from the varying-structure flat file log.txt, and export it to ABC.CSV.


It uses the premises that:

  • the start indicator and block header size for Block 3 remain consistent
  • the end-of-table indicator for Block 3 remains consistent
  • rows with more than 5 consecutive dashes are spacers

Also, it assigns the table and column names based on hard-coded values. If you expected the structure of the table in Block 3 to vary, and wanted to dynamically parse a changing set of attributes each time (table name, field names, and number of columns), that kind of flexibility would need to be added.
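The attached log_parse.qvs is not reproduced in the thread, but the stated premises can be sketched in Python. The marker strings below are assumptions inferred from the sample log, not Evan's actual script:

```python
import re

# Premises, per the post above:
# - a line containing "TABLE 'ABC'" starts the Block C table (assumed marker),
# - a ruler of the "|-|---..." style ends it (assumed marker),
# - rows with more than 5 consecutive dashes are spacers and are skipped.

SPACER = re.compile(r"-{6,}")  # more than 5 consecutive dashes

def extract_block_c(lines):
    """Return the rows of the TABLE 'ABC' section, each split on '|'."""
    rows, inside = [], False
    for line in lines:
        if "TABLE 'ABC'" in line:
            inside = True
            continue
        if inside:
            if line.startswith("|-|"):       # end-of-table indicator
                break
            if SPACER.search(line):          # spacer row, skip it
                continue
            # split "|a|b|c|" into its cells
            rows.append([cell.strip() for cell in line.strip("|\n").split("|")])
    return rows
```

The first returned row is the header (COL1, COL2, COL3); the rest are data rows ready to be written out as CSV.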

evan_kurowski
Specialist

Massimo, I really like how you referenced back to the HTML table in this thread to source your script. That could be a convenient, step-saving technique I would use again (vs. copying the table from the thread, saving it to disk, etc.).

Not applicable
Author

Thank you all for your scripts and hints. I will try them this week and give you feedback.

Best regards

Dan