Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
I've got the problem, that I have CSV file with the structure:
Data |
Remark |
0001 2012, 12345678 |
Time: 05.01.2012 |
Pos Time Value |
1 12 41,5625 |
2 41 41,5687522888184 |
3 72 41,5687522888184 |
0002 2012, 12345678 |
Time: 06.01.2012 |
Pos Time Value |
1 29 39,0499992370605 |
2 59 39,1312522888184 |
3 89 39,1937522888184 |
The list of values continues a lot of times. So this is only an example.
On loading the CSV, I would like to have a table like this:
Pos | 0001 2012 | 0002 2012 |
1 | 41,5625 | 39,0499 |
2 | 41,5687 | 39,1312 |
3 | 41,5687 | 39,1937 |
Any other blocks in the CSV file should result in an additional column. The header rows and the blank lines should be skip!
Does anybody has an idea to accomplish that.
Many thanks
Michael
Hi Michael,
please see attached solution. The whole ETL process is done with several steps:
1. load related lines
2. mark blocks for columns
3. split data lines into fields
4. transpose with generic load
5. create target table
6. collect auto created generic tables
7. join everything together to finalize target table
- Ralf
Hi Michael,
this is possible. You have to read the whole reacord as line and parse the columns of the data lines with subfield().
You have to switch between header and data section on a condition.
After you have read the whole file you can transpose the lines into positions and columns.
It makes only sense to develop an example on the real data. Maybe you can attache a file.
- Ralf
Hi Ralf,
here is an example file from the original data.
Michael
Hi Michael,
please see attached solution. The whole ETL process is done with several steps:
1. load related lines
2. mark blocks for columns
3. split data lines into fields
4. transpose with generic load
5. create target table
6. collect auto created generic tables
7. join everything together to finalize target table
- Ralf
Hi Ralf,
wow, that's really great and exactly what I was looking for!
But there is an additional question. The code only reads data from 2012. Is there a more general way, which also works with 2013 data?
Thanks a lot for your effort!
Michael
Yes, of course. Just replace the wildmatch in Step1 with:
Step1:
LOAD if(wildmatch(Line, '* 201?,*', -1, 0), subfield(Line, ',', 1), peek(Col)) as Col, Line
Resident RawData;
- Ralf
Thank Ralf,
now it's exactly what I needed!
Michael