Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Dear all,
I try to analyze a log file. The structure of the file looks like this:
******* Block A ***********
01/01/2014 16:18:47 useless header infos
01/01/2014 16:18:47 useless header infos
....
******* Block B ***********
01/01/2014 16:18:47 useless header infos
|-|------------------------------|----------|---|----|----|
| PARAMETERS 'INPUT'
|-|------------------------------|----------|----|----|----|
|1|COL2 |COL3 |COL4|COL5|COL6|
|-|------------------------------|----------|---|----|----|
|0|123456789012345678901234567890|1234567890|000|0000|0000|
|-|------------------------------|----------|---|----|----|
...
******* Block C ***********
01/01/2014 16:18:47 useless header infos
|-|------------------------------|----------|---|----|----|
| TABLE 'ABC'
|------------------|------------------|--------------------|
|COL1 |COL2 |COL3 |
|------------------|------------------|--------------------|
|012345678901234567|012345678901234567|012345678901234567 |
|------------------|------------------|--------------------|
|blabla | | |
|blabla | | |
....
|-|------------------------------|----------|---|----|----|
...
Block A, B and C recur in the logfile on no fixed position or line for other dates, too.
Now I am looking for a good way to import the data into QlikView. I am only interessted in the table data in Block C.
What is the best way to get rid of the useless infos in Block A and B? I want only Block C with its fixed structure to remain, so that I can import it as simple as a CSV file.
I hope you can help me on this issue. Thanks for your help in advance.
Best regards,
Dan
You could use '|' as delimiter and set a header from x rows. If it is more complicated with your data-structure you could use a bunch of functions like peek(), len(), subfield(), keepchar() .... to create a "normal" data-structure.
There are also native qlikview-functionalities (within the load-wizard) to transform unstructered data - a very good explantion for this is in the book QlikView 11 for Developers (is generally a big recommendation).
- Marcus
You could use something like this to tag the start and end of Block C
LOAD A
FROM
[..\Community2.xlsx]
(ooxml, no labels, table is Sheet1);
Temp1:
LOAD
A,
If(WildMatch(A, '*BLOCK C*'), RowNo()) as BlockCStart,
If(WildMatch(A, '*BLOCK A*'), RowNo()) as BlockCEnd
Resident Log;
You'd then have a table of start/end of all block C's to then use Peek or Where...
If someone knows how to set a variable in the middle of a Load, then you would be able to tag the rows as good/bad...
Regards,
Marty.
This seems to be an interesting approach. I will try it the next days and tell you if it worked for me.
Thanks.
Best regards
Dan
The WildMatch function worked good so far. I transformed the information into a new temp table:
TableNr | LineNrStart | LineNrEnd
1 | 755 | 839
2 | 912 | 1200
This means that the relevant information (Block C) is in the lines 755 to 839 and 912 to 1200 in my log file.
I created the following table - containing my original log file:
LogFile:
LOAD
RowNo() as lineNr, A as lineContent
FROM [logfile.txt]
...
Now I wanted to extract the relevant information (Block C) from it using the new temp table. What is the best way to do so? In SQL I would start something like this:
SELECT lineContent
FROM LogFile
WHERE lineNr between 755 and 839
AND lineNr between 912 and 1200;
Thanks in advance for your help.
Best regards
Dan
Hi Dan,
I haven't tested this but it will be something like:
// loop through temp table
FOR vblock = 1 to NoOfRows('TempTable')
Let vStart = Peek('LineNrStart',vBlock,'TempTable')
Let vEnd = Peek('LineNrEnd',vBlock,'TempTable')
Log:
LOAD
lineContent
Resident ....
Where lineNr >= $(vStart) and <= $(vEnd)
;
NEXT vblock
QV will automatically concatenate the individual loads and you will end up with one table, drop everything you don't need as the last thing.
Regards,
Marty.
if you want to filter block 'C'
tmp:
LOAD F1, rowno() as id,
if(index(F1, 'Block'), PurgeChar(replace(F1, 'Block', ''), ' *')) as block
FROM [http://community.qlik.com/thread/128625] (html, codepage is 1252, embedded labels, table is @1, filters(
ColSplit(1, IntArray())
));
t:
NoConcatenate load *
where block = 'C';
NoConcatenate load id, F1,
if(len(trim(block))=0, peek(block), block) as block
Resident tmp;
DROP Table tmp;
Hello online-dan,
Here's a routine (log_parse.qvs) that will extract the table in the 3rd Block from the varying structure flat-file log.txt, and export to ABC.CSV.
It uses the premises that:
Also, it is assigning the table and column names based on a hardcode. If you expected the structure of the table in Block 3 to be varying, and wanted to dynamically parse for a changing set of attributes each time (table name, field names, and number of columns), that kind of flexibility would need to be added.
Massimo, I really like how you referenced back to the html table in this thread to source your script. That could be a convenient and step-saving technique I would use again (vs. sometimes copying the table from the thread, saving to disk, etc..)
Thank you all for your scripts and hints. I will try them this week an give you feedback.
Best regards
Dan