Anonymous
Not applicable

Best practice for splitting a file into 'n' number of files (split every 'x' number of rows)

I'm testing how to split a file into 'n' files with a fixed number of rows each (e.g. 3 lines per file). The setup below works, except that an extra empty (null) file gets created, which I still need to figure out. Is there a better, simpler way to achieve this output? My real-world input file will have many columns (around 100), so I doubt it's best practice to copy every value from row2 into row3 in tJava_3.

image.png

image.png

tJava_3

final int ROWS_PER_FILE = 3;

// Iteration counter published by tFlowToIterate_1
int currentIteration = (Integer)globalMap.get("tFlowToIterate_1_CURRENT_ITERATION");
// Integer division already truncates, so no Math.floor is needed
int fileSuffix = currentIteration / ROWS_PER_FILE;
String fileSuffixStr = String.format("%04d", fileSuffix);

// Make a new filename every ROWS_PER_FILE records
String filename = "SomeFile.txt";
filename = filename.replace(".txt", "-" + fileSuffixStr + ".txt");
globalMap.put("filename", filename);

// Copy row data to output flow (one assignment per column)
row3.FirstName = (String)globalMap.get("row2.FirstName");
5 Replies
Anonymous
Not applicable
Author

Hi,

 

    The easiest way is to add a sequence number to each record as the data passes through tMap. Take the remainder of that sequence number and, based on its value, route the data to one of three output flows in tMap itself.
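
To make the remainder idea concrete, here is a minimal plain-Java sketch of it, not Talend-generated code. The class name `RemainderRouting` and the method `route` are hypothetical, and the 1-based sequence is an assumption about how the tMap sequence would be configured.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: route rows to a fixed number of output flows
// based on the remainder of a running sequence number.
final class RemainderRouting {

    static List<List<String>> route(List<String> rows, int outputs) {
        if (outputs <= 0) {
            throw new IllegalArgumentException("outputs must be positive");
        }
        List<List<String>> flows = new ArrayList<>();
        for (int i = 0; i < outputs; i++) {
            flows.add(new ArrayList<>());
        }
        int sequence = 0; // assumed to start at 1 for the first row
        for (String row : rows) {
            sequence++;
            // The remainder selects the output flow for this row
            flows.get(sequence % outputs).add(row);
        }
        return flows;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("r1", "r2", "r3", "r4", "r5", "r6", "r7");
        // prints [[r3, r6], [r1, r4, r7], [r2, r5]]
        System.out.println(route(rows, 3));
    }
}
```

Note that remainder routing interleaves the rows across a fixed number of outputs; it does not produce consecutive blocks of 'x' rows, so it fits best when the number of output files is known up front.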

 

Warm Regards,
Nikhil Thampi


manodwhb
Champion II

@Moe, since you are writing the file with tFileOutputDelimited, there is a check box that splits the output into several files, where you specify the number of rows to put in each output file. I believe this is the best way when you are using tFileOutputDelimited.

Untitled.png
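
For readers who want to see what "split every 'x' rows" amounts to, here is a rough plain-Java sketch of the grouping logic. It is not how tFileOutputDelimited is implemented; the class `RowChunker` and method `chunk` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: group rows into consecutive blocks of a fixed size,
// one block per output file.
final class RowChunker {

    static List<List<String>> chunk(List<String> rows, int rowsPerFile) {
        if (rowsPerFile <= 0) {
            throw new IllegalArgumentException("rowsPerFile must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        // Step through the input one block at a time; the loop never starts
        // a block when no rows remain, so no empty trailing block is created.
        for (int start = 0; start < rows.size(); start += rowsPerFile) {
            int end = Math.min(start + rowsPerFile, rows.size());
            chunks.add(new ArrayList<>(rows.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("r1", "r2", "r3", "r4", "r5", "r6", "r7");
        // prints [[r1, r2, r3], [r4, r5, r6], [r7]]
        System.out.println(chunk(rows, 3));
    }
}
```

Because a block is only opened when there is at least one row left, an input whose row count is an exact multiple of 'x' does not yield an extra empty file.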

Anonymous
Not applicable
Author

Thanks. I have that option enabled currently, but the suffix it adds is just a sequential number. I need a differently formatted suffix (0001 vs 1).

akumar2301
Specialist II

The solution Manohar suggested is the simplest.

If you need to rename the files, you can do so afterwards using tFileList and tFileCopy.

Anonymous
Not applicable
Author

Thanks, let me try renaming the files after the fact. I'm trying to pad the sequence number to 4 positions with leading zeros:

before: FileName-1.txt
after: FileName-0001.txt
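
As a sketch of the rename step discussed above, the following plain Java pads a trailing numeric suffix with leading zeros and renames matching files in a directory. The class `SuffixPadder`, its methods, and the assumed `Name-<number>.<ext>` naming pattern are illustrative assumptions; in a Talend job the same string logic would presumably sit in the rename expression or a custom routine.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: FileName-1.txt -> FileName-0001.txt
final class SuffixPadder {

    // Assumed pattern: anything ending in '-', then 1 to 9 digits, then an extension
    private static final Pattern NUMBERED =
            Pattern.compile("^(.*-)(\\d{1,9})(\\.[^.]+)$");

    static String padSuffix(String fileName, int width) {
        Matcher m = NUMBERED.matcher(fileName);
        if (!m.matches()) {
            return fileName; // leave names that do not fit the pattern untouched
        }
        int number = Integer.parseInt(m.group(2));
        return m.group(1) + String.format("%0" + width + "d", number) + m.group(3);
    }

    static void renameAll(Path dir, int width) throws IOException {
        // Collect first, so files are not renamed while the directory is being listed
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.txt")) {
            for (Path p : stream) {
                files.add(p);
            }
        }
        for (Path p : files) {
            String oldName = p.getFileName().toString();
            String newName = padSuffix(oldName, width);
            if (!newName.equals(oldName)) {
                // Fails with an exception if the target name already exists
                Files.move(p, p.resolveSibling(newName));
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(padSuffix("FileName-1.txt", 4)); // prints FileName-0001.txt
    }
}
```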