evan_kurowski
Specialist

Adding a static header row to multiple files split by AWK

I am splitting up several .txt data files, each of which contains a different year's worth of data. It turns out the application will actually use data files stored in weekly increments, not yearly, so to test the functionality I needed to break the data in the yearly files out into separate weekly files.

There were several step-intensive approaches (i.e. cut & paste) I started down before deciding there must be a more condensed solution. I managed to configure an awk statement that splits my data files matching the naming convention into separate, appropriately named files at the weekly level.

The problem I'm trying to fix is that I haven't been able to insert the column headers as the first row of each week's output file. Ideally I'd like to get this into a single awk statement, or perhaps a chained set of command-line directives. The statement and the data structure are below:

awk -F "\t" '{close(f); f="Inventory"$2".txt"}
{print > f}' Inventory*.txt

Cust# WkYr SKU InvWeek InvYear OnHand
0000101348 201101 00105D 01 2011 25.000
0000101348 201101 00106D 01 2011 4.000
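For reference, the header can be written into every weekly file in the same awk pass by remembering line 1 of each input and emitting it whenever a new output file is first opened. This is a sketch, untested against the real files; it assumes each yearly file begins with the tab-delimited header row shown above, and the tiny sample file below is hypothetical stand-in data:

```shell
# Hypothetical sample yearly file standing in for Inventory2011.txt:
printf 'Cust#\tWkYr\tSKU\n0000101348\t201101\t00105D\n0000101348\t201102\t00106D\n' > Inventory2011.txt

awk -F '\t' '
FNR == 1 { hdr = $0; next }          # remember the header row of each input
{
  f = "Inventory" $2 ".txt"          # e.g. Inventory201101.txt
  if (!(f in seen)) {                # first record for this week:
    print hdr > f                    #   start the file with the header
    seen[f] = 1
  }
  print > f
}' Inventory????.txt                 # 4-digit years only, so the 6-digit
                                     # weekly outputs are never re-read
```

If the number of distinct weeks is large, an explicit close() per finished file may be needed on awk implementations with low open-file limits; gawk manages this automatically.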

evan_kurowski
Specialist
Author

Maybe I should add: I am able to load from all my various text files as long as none of them contains a header row, using delimiter field-number assignments within QlikView. I have a feeling that since the header row is not present in the many individual .txt files, a small bit of performance may even be gained?

LOAD
@1 As Cust#,
@2 As WkYr,
@3 As SKU,
@5 As InvYear,
@6 As OnHand
FROM
Mytextfiles.txt
(txt, codepage is 1252, no labels, delimiter is '\t', no quotes)

I always considered this an option but was reluctant to modify the scripts to accommodate it. I would rather have adjusted the data sources to fit the scripts than the opposite in this case.

If anyone has any tips or insight as to how I could automate adding the field-names header row to a large number of .txt files being split apart from a larger file, I would appreciate it. Thanks so much! ~Evan

Anonymous
Not applicable

Hello Qlik Community members - this discussion was posted previously but accidentally deleted, so we are recreating the thread. If you have any helpful answers, please feel free to respond.


Oleg Troyansky Apr 11, 2011 11:50 PM (in response to EvanKurowski)

I'm not familiar with awk, so I can't quite help with a solution outside of QlikView. If I had to do something like this, I'd read all the fields into QlikView and then create a "for" loop running across all week numbers, extracting the information for each week and storing it into txt files with the STORE command. This way, I think, the headers will be in...

--------------------------

EvanKurowski Apr 12, 2011 5:41 PM (in response to Oleg Troyansky)

That seems like a good suggestion to pursue, Oleg, because I would like to know how to do this 'splitting' technique within QlikView. Thank you for your response.

Originally I had started to construct a concatenated string of all the different WkYr values to use with a 'For..Each' loop, but then QlikView got stubborn and refused to chain-concatenate my string variable. Everything was pretty much a tangle of variables & chr(39)'s & ','s & whatnot...

Then I realized I could try it within a single loop; no second loop was needed. My attempt is below. When the script arrives at this portion, the table 'Temp_Sort' holds a full listing of all the data I would like to break out into weekly-increment text files.


MyWkYrs:
LOAD DISTINCT
WkYr
RESIDENT Temp_Sort
ORDER BY WkYr DESC;

Let vWkYrCount = NoOfRows('MyWkYrs');
TRACE NoOfRows('MyWkYrs') = $(vWkYrCount);

FOR i = 0 to $(vWkYrCount) - 1
Let curWeek = Peek('WkYr', $(i), 'MyWkYrs');
TRACE curWeek = $(curWeek) and i = $(i);

//Put STORE ~> INTO statements here
//START section to CREATE, STORE, then DROP each weekly increment table

[Purgatory$(curWeek)]:
NOCONCATENATE LOAD *
RESIDENT Temp_Sort
WHERE WkYr = $(curWeek);

STORE * FROM [Purgatory$(curWeek)] into [Purgatory$(curWeek).txt] (txt, delimiter is '\t');
DROP Table [Purgatory$(curWeek)];

//END section to CREATE, STORE, then DROP each weekly increment table

NEXT


Any thoughts on optimization, efficiency, etc.. would be welcome. Thank you.

--------------------------

EvanKurowski Apr 12, 2011 5:43 PM (in response to EvanKurowski)

Also, any thoughts on how to suppress the double spacing when cutting and pasting code would be nice. What setting am I missing? It sure makes the code harder to read.


-------------------------

Oleg Troyansky Apr 12, 2011 5:55 PM (in response to EvanKurowski)

Your code looks pretty good - this is exactly what I meant...

The double-spacing has been a sore issue for me, too - I never know how to post code correctly, and nothing has worked quite well for me so far...

-------------------------


alxtoth Apr 13, 2011 4:31 AM (in response to Oleg Troyansky)

Delimit the script part with [ c o d e ] and [ / c o d e ] .

-Alex


------------------------------

alxtoth Apr 13, 2011 4:29 AM (in response to EvanKurowski)

Yeeee, I am not the only one around using awk 🙂


#get the header
head -n 1 origfile.txt > header.txt

#use the header
cat header.txt weekly.txt > result.txt



-Alex
www.snowflakejoins.com

----------------------------------------

EvanKurowski Apr 13, 2011 10:50 AM (in response to alxtoth)

Excellent suggestions, Alexandru.. and these steps will work great when there is only a small number of files to repeat them on. The trouble now is incorporating the naming convention across many output files. We are taking what could be several yearly-level flat-file data sources and breaking them out into many weekly-level files, so I need some sort of naming-convention functionality for the output files.

You will see in the example I posted in the first post that it takes any files with the 'Inventory' string prefix, so it will grab Inventory2000, Inventory2001, Inventory2002 and break them all up. The output file names are based on the field value being read ($2), so the outputs are going to be Inventory200201, Inventory200202, ..etc.. If I could just get the header row written in that loop, everything would be perfect!

Here are some examples that came tantalizingly close, but I still couldn't work out how to adapt them.

http://unstableme.blogspot.com/2009/09/split-file-using-awk-few-examples.html

Example #1 there takes a flat file and breaks it up whenever it finds a repeating header string inside the file. In my case it is all raw data.


http://unstableme.blogspot.com/2008/08/split-file-add-headers-awk-and-bash.html

This example will insert a header row, but it assumes each breakout file has a fixed, predetermined length (4 rows). In my case the number of rows written to each breakout file varies.

So this left me very, very close but still so far.. Also, I did wrap the post above in the [ c o d e ] tags, and I even copied and pasted from Word, but... still got double spacing.

----------------------------------

alxtoth Apr 13, 2011 3:31 PM (in response to EvanKurowski)

hi,

First split the files with awk. Then add the header to all the weekly files in a for loop.

-Alex
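That two-step approach might be scripted like this - a sketch, where the sample data below is hypothetical, and the six-character glob is an assumption that the weekly files carry six digits after 'Inventory' while the yearly sources carry four, so the loop never touches the originals:

```shell
# Hypothetical stand-ins for the saved header and one weekly split file:
printf 'Cust#\tWkYr\tSKU\n' > header.txt            # i.e. head -n 1 origfile.txt > header.txt
printf '0000101348\t201101\t00105D\n' > Inventory201101.txt

# Prepend the saved header row to every weekly file:
for f in Inventory??????.txt; do                    # six digits = weekly files only
  cat header.txt "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done
```

The temp-file shuffle is needed because `cat header.txt "$f" > "$f"` would truncate the input before reading it.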

-----------------------------------

alxtoth Apr 13, 2011 3:33 PM (in response to alxtoth)

You might want to use another text editor. Well, anything except Word.