Hi
I have a subjob that processes some data and writes it to a CSV with tFileOutputDelimited (1167 rows). I need to pull this file into a tFileInputDelimited further down the flow, but every time it imports 0 rows.
So I split the job so that the tFileInputDelimited is the first component of a second subjob, triggered when the first subjob completes; however, now it only pulls in 57 rows!
If I run the second subjob independently, it pulls in every row from the file. Is there any reason it won't do this when triggered via onSubjobOk?
The job is essentially like this:
tMap --> tFileOutputDelimited
   |
   v  onSubjobOk
   |
   v
tFileInputDelimited --> tMap
Any help would be appreciated. The output file is working fine, and I can't understand why the input won't read all the data rows unless the subjob is run independently! I have attached images of the input and output settings...
You may be running into a timing issue where the output file has not yet been flushed to disk and closed. If you want to keep the current design, put something like a 30-second tSleep between the two subjobs to give the data enough time to flush to disk.
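To picture what is happening, here is a minimal standalone Java sketch (plain Java, not Talend-generated code; the file name and row format are made up) of how an unflushed write buffer can leave a reader seeing only part of the file:

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class FlushDemo {
    public static void main(String[] args) throws IOException {
        // Hypothetical path standing in for the tFileOutputDelimited target
        Path path = Path.of("out.csv");

        BufferedWriter writer = new BufferedWriter(new FileWriter(path.toFile()));
        for (int i = 1; i <= 1167; i++) {
            writer.write("row;" + i + "\n");
        }
        // At this point some of the data may still sit in the writer's
        // internal buffer. A reader that opens the file now only sees
        // the rows flushed so far; the rest are invisible to it.
        try (Stream<String> lines = Files.lines(path)) {
            System.out.println("Rows visible before close: " + lines.count());
        }

        writer.close(); // flushes the remaining buffer to disk
        try (Stream<String> lines = Files.lines(path)) {
            System.out.println("Rows visible after close:  " + lines.count());
        }
    }
}

Until close() (or flush()) runs, whatever is still buffered never reaches the file, which is consistent with a partial count like your 57 rows.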
However, a better solution is to not use an external file at all. For this small number of rows, take a look at tHashOutput and tHashInput. These components keep the rows in an in-memory hash table, which gives better performance, operates in real time, and does not rely on the disk at all.
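As a rough analogy for what the tHash pair gives you (this is illustrative Java, not the actual Talend component internals; RowCache is a made-up name), the rows simply live in JVM memory between the two subjobs, so there is no flush/close window at all:

import java.util.ArrayList;
import java.util.List;

// Rough analogy for tHashOutput/tHashInput: rows are cached in JVM
// memory instead of being written to and re-read from a file.
// (RowCache is a hypothetical name, not part of the Talend API.)
public class RowCache {
    private static final List<String[]> CACHE = new ArrayList<>();

    // "tHashOutput" side: the first subjob appends rows as they arrive
    public static void write(String[] row) {
        CACHE.add(row);
    }

    // "tHashInput" side: the second subjob reads the completed cache
    public static List<String[]> readAll() {
        return List.copyOf(CACHE); // defensive copy
    }

    public static void main(String[] args) {
        write(new String[] {"id1", "value1"});
        write(new String[] {"id2", "value2"});
        System.out.println("Cached rows: " + readAll().size());
    }
}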
Be aware of things like memory consumption when using the tHash components. I have jobs that cache several hundred thousand rows of data and they are just fine. Your mileage may vary, depending on your environment.