G203abcd1234efgh20090805
G204abcd1234jhdf20090805
G205abcd1234idpe20090805
G206abcd1234leyc20090805
G203wxyz5678efgh20090805
G203jsdf92342urfj20090805
I need to combine each G203 line and the G204, G205 and G206 lines that follow it into one line.
The record types G204, G205 and G206 are not mandatory.
Thanks and regards,
Amirths
Hi SHONG,
Input file
=======
G203abcd1234efgh20090805
G204abcd1234jhdf20090805
G205abcd1234idpe20090805
G206abcd1234leyc20090805
G203wxyz5678efgh20090805
G203jsdf92342urfj20090805
G203abcd1234efgh20090805
G204abcd1234jhdf20090805
G205abcd1234idpe20090805
G206abcd1234leyc20090805
G203abcd1234efgh20090805
G204abcd1234jhdf20090805
G203abcd1234efgh20090805
G204abcd1234jhdf20090805
G205abcd1234idpe20090805
Output file
========
G203abcd1234efgh20090805|G204abcd1234jhdf20090805|G205abcd1234idpe20090805|G206abcd1234leyc20090805
G203wxyz5678efgh20090805
G203jsdf92342urfj20090805
G203abcd1234efgh20090805|G204abcd1234jhdf20090805|G205abcd1234idpe20090805|G206abcd1234leyc20090805
G203abcd1234efgh20090805|G204abcd1234jhdf20090805
G203abcd1234efgh20090805|G204abcd1234jhdf20090805|G205abcd1234idpe20090805
Imagine this as a set of transactions for an account.
The transaction details continue in G204, G205 and G206 if the transaction length is more than the per-line limit.
Here I don't have a unique key to merge the lines, as I can have more than one set of transactions for an account.
Basically, I need to scan the records in a loop after a G203 until I reach a G206 or another G203.
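The loop described above (start a new output line at each G203, append the G204/G205/G206 lines that follow, and close the set at a G206 or at the next G203) can be sketched in plain Java, outside Talend. This is a minimal sketch, assuming the record type is always the first 4 characters of the positional line and that a continuation line with no open G203 should simply be passed through; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class MergeTransactions {

    // Assumption: the record type is the first 4 characters of a positional line.
    static String recordType(String line) {
        return line.substring(0, Math.min(4, line.length()));
    }

    // Merges each G203 line with the optional G204/G205/G206 lines that follow it,
    // joining them with "|". A set is closed by a G206 or by the next G203.
    static List<String> merge(List<String> lines) {
        List<String> out = new ArrayList<>();
        StringBuilder current = null; // the set of transactions being built, if any
        for (String line : lines) {
            String type = recordType(line);
            if (type.equals("G203")) {
                // A new G203 closes the previous set, if one is still open.
                if (current != null) {
                    out.add(current.toString());
                }
                current = new StringBuilder(line);
            } else if (current != null) {
                // Any other record (G204, G205 or G206 here) continues the open set.
                current.append('|').append(line);
                if (type.equals("G206")) {
                    // G206 is the last possible continuation, so the set is complete.
                    out.add(current.toString());
                    current = null;
                }
            } else {
                // Assumption: a continuation with no open G203 is passed through as-is.
                out.add(line);
            }
        }
        if (current != null) {
            out.add(current.toString()); // flush the last set
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
                "G203abcd1234efgh20090805",
                "G204abcd1234jhdf20090805",
                "G205abcd1234idpe20090805",
                "G206abcd1234leyc20090805",
                "G203wxyz5678efgh20090805",
                "G203jsdf92342urfj20090805");
        merge(input).forEach(System.out::println);
    }
}
```

Run on the first sample, `main` prints the first three lines of the requested output file. Because only one set is held in memory at a time, the same loop works unchanged if the lines come from a reader instead of a list.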
Thanks and regards,
Amirths
I think the fastest route (if your records don't carry any information that correlates them) is to generate a synthetic key. For your job, I think the quickest approach is a flow like:
input file --> tJavaRow --> tDenormalize on the full line --> file output
where tJavaRow creates a key that correlates the records. In tJavaRow, place code like:
----------------------------------
// Read the current key (null, i.e. 0, before the first G203 is seen)
Integer stored = (Integer) globalMap.get("MYSYNTHKEY");
int mykey = (stored == null) ? 0 : stored;
if (input_row.Column0.equals("G203")) {
    // A G203 starts a new set of transactions: generate a new key and store it
    mykey++;
    globalMap.put("MYSYNTHKEY", mykey);
}
// Pass the whole line through, plus the key that groups the related Gxxx records
output_row.FULLROW = input_row.Column0 + input_row.Column1;
output_row.SYNTHKEY = mykey;
----------------------------------
Then the output contains each full line of the input file plus a key that correlates all the related Gxxx records, so you can easily obtain your output file with tDenormalize on the FULLROW column.
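To see what the tJavaRow code above does without running a Talend job, here is a plain-Java sketch of the same key logic. It is an illustration under assumptions, not Talend code: a local counter stands in for the `MYSYNTHKEY` entry in `globalMap`, the positional split (Column0 = first 4 characters, Column1 = the rest) is done by hand, and the names `SyntheticKey`, `KeyedRow` and `assignKeys` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SyntheticKey {

    // Hypothetical stand-in for the tJavaRow output schema (SYNTHKEY, FULLROW).
    static final class KeyedRow {
        final int synthKey;
        final String fullRow;

        KeyedRow(int synthKey, String fullRow) {
            this.synthKey = synthKey;
            this.fullRow = fullRow;
        }
    }

    // Gives every line the key of the most recent G203; lines before any G203 get key 0.
    static List<KeyedRow> assignKeys(List<String> lines) {
        List<KeyedRow> out = new ArrayList<>();
        int mykey = 0; // plays the role of the MYSYNTHKEY entry in globalMap
        for (String line : lines) {
            // Positional split: Column0 = first 4 characters, Column1 = the rest.
            String column0 = line.substring(0, Math.min(4, line.length()));
            String column1 = line.substring(column0.length());
            if (column0.equals("G203")) {
                mykey++; // a G203 starts a new set of transactions
            }
            out.add(new KeyedRow(mykey, column0 + column1));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
                "G203abcd1234efgh20090805",
                "G204abcd1234jhdf20090805",
                "G203wxyz5678efgh20090805",
                "G203jsdf92342urfj20090805");
        for (KeyedRow row : assignKeys(input)) {
            // Prints the key followed by the full row, e.g. "1G203abcd1234efgh20090805"
            System.out.println(row.synthKey + row.fullRow);
        }
    }
}
```

The key only changes on a G203 row, so every G204/G205/G206 row inherits the key of the G203 that opened its set; that shared key is what the grouping step relies on.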
Hi,
Thanks for the reply.
But the problem in my case is that the input file is a positional file.
The key is a generated sequence number, so the position of the key varies for each set of transactions, which I cannot handle dynamically.
Thanks and regards,
Amirths
Amirths,
I assure you that the design I suggested produces the output file you described and works with a positional file.
It is correct; try it.
The generated key does not increase on every row; it increases only when a G203 record is read.
For example:
Input, positional: the first column is the first 4 characters, the second column is the rest of the line:
-------
G203abcd1234efgh20090805
G204abcd1234jhdf20090805
G205abcd1234idpe20090805
G206abcd1234leyc20090805
G203wxyz5678efgh20090805
G203jsdf92342urfj20090805
G203abcd1234efgh20090805
G204abcd1234jhdf20090805
G205abcd1234idpe20090805
G206abcd1234leyc20090805
-----
Synthetic key added with tJavaRow (as a separate field):
-----
1G203abcd1234efgh20090805
1G204abcd1234jhdf20090805
1G205abcd1234idpe20090805
1G206abcd1234leyc20090805
2G203wxyz5678efgh20090805
3G203jsdf92342urfj20090805
4G203abcd1234efgh20090805
4G204abcd1234jhdf20090805
4G205abcd1234idpe20090805
4G206abcd1234leyc20090805
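Then tDenormalize on the full row, grouping on the key (the key itself is not written out; the delimiter is left empty here, use "|" to match the output in your question):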
------
G203abcd1234efgh20090805G204abcd1234jhdf20090805G205abcd1234idpe20090805G206abcd1234leyc20090805
G203wxyz5678efgh20090805
G203jsdf92342urfj20090805
G203abcd1234efgh20090805G204abcd1234jhdf20090805G205abcd1234idpe20090805G206abcd1234leyc20090805
-----
You can handle the synthetic key dynamically with the tDenormalize component.
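The grouping that tDenormalize performs on FULLROW can be illustrated in plain Java. This is a sketch of the idea only, not Talend's implementation: it assumes the rows arrive as two parallel lists (keys and full rows, hypothetical names), that groups are emitted in the order their key first appears, and that the delimiter is a parameter. An empty delimiter reproduces the example output above; "|" gives the format from the original question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

class DenormalizeByKey {

    // Concatenates all rows that share a synthetic key, in the order their key
    // first appears, separated by the given delimiter.
    static List<String> denormalize(List<Integer> keys, List<String> fullRows, String delimiter) {
        if (keys.size() != fullRows.size()) {
            throw new IllegalArgumentException("keys and fullRows must have the same size");
        }
        Map<Integer, StringJoiner> groups = new LinkedHashMap<>(); // preserves first-seen order
        for (int i = 0; i < keys.size(); i++) {
            groups.computeIfAbsent(keys.get(i), k -> new StringJoiner(delimiter))
                  .add(fullRows.get(i));
        }
        List<String> out = new ArrayList<>();
        for (StringJoiner joined : groups.values()) {
            out.add(joined.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> keys = List.of(1, 1, 1, 1, 2, 3);
        List<String> rows = List.of(
                "G203abcd1234efgh20090805",
                "G204abcd1234jhdf20090805",
                "G205abcd1234idpe20090805",
                "G206abcd1234leyc20090805",
                "G203wxyz5678efgh20090805",
                "G203jsdf92342urfj20090805");
        // "" matches the example output in the thread; use "|" for the requested format.
        denormalize(keys, rows, "").forEach(System.out::println);
    }
}
```

This is why a growing key is not a problem for the positional layout: the key is only a grouping value and never appears in the merged lines.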
No problem. To optimize, you can:
- process only the changed data, if you need to schedule the job
- or process the data in chunks
- or process everything in one shot (though for more than 10M records I think you should use the 64-bit version of Talend)
Bye
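If you process the data in chunks, a chunk boundary must never fall inside a set of transactions, or a G203 would be separated from its G204/G205/G206 lines. A minimal sketch in plain Java, assuming the record type is in the first 4 characters: cut only just before a G203, once the current chunk has reached a target size. The names `ChunkAtG203`, `chunk` and `targetSize` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ChunkAtG203 {

    // Splits lines into chunks of at least targetSize lines (except possibly the last),
    // cutting only just before a G203 so that no set of transactions is split.
    static List<List<String>> chunk(List<String> lines, int targetSize) {
        if (targetSize <= 0) {
            throw new IllegalArgumentException("targetSize must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String line : lines) {
            // Assumption: the record type is the first 4 characters of the line.
            boolean startsNewSet = line.startsWith("G203");
            if (startsNewSet && current.size() >= targetSize) {
                chunks.add(current);
                current = new ArrayList<>();
            }
            current.add(line);
        }
        if (!current.isEmpty()) {
            chunks.add(current); // remaining lines form the last chunk
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
                "G203abcd1234efgh20090805",
                "G204abcd1234jhdf20090805",
                "G205abcd1234idpe20090805",
                "G206abcd1234leyc20090805",
                "G203wxyz5678efgh20090805",
                "G203jsdf92342urfj20090805",
                "G204abcd1234jhdf20090805");
        for (List<String> c : chunk(input, 2)) {
            System.out.println(c.size() + " lines, first: " + c.get(0));
        }
    }
}
```

With a target of 2, the sample splits into a 4-line chunk and a 3-line chunk, each starting with a G203. A chunk can run past the target, but never by more than one set, and each chunk can then be merged on its own.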