Hi All,
I am essentially trying to do a SELECT DISTINCT to get the unique rows from a relatively small data set: 420,000 rows x 55 columns. I am using the tUniqRow component and persistently getting Java heap space errors.
I have tried a number of options: increasing the JVM parameter up to 2048; increasing the page file; using tHashOutput and tHashInput; running the unique on a single column (ideally I would like to do it across all columns); and writing the data set out to a delimited file in my parent job, moving the tUniqRow into a separate job, and reading the delimited file back in there.
I first tried the tUniqRow component with its standard settings for all of the options above, and then with the component set to use disk with a buffer size of 1000, again for all of the options above. This seems to make little difference to the final outcome.
With the disk and buffer size settings, the job manages to load all rows into the tUniqRow component, but then fails with the Java heap space error before outputting any results. I have tried output to a delimited file (preferred), to tHashOutput, and even to tLogRow, in case it was writing to the delimited file that caused the error.
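Since raising the JVM parameter made no visible difference, it may be worth confirming that the job really runs with the larger heap. The following is only a minimal sketch: the class name `HeapCheck` and the method `maxHeapMb` are made up for illustration, and the assumption is that the same `Runtime` call could be placed in a tJava component at the start of the job.

```java
public class HeapCheck {

    /** Returns the maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMb() {
        // maxMemory() reports the upper heap limit in bytes (what -Xmx controls).
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MB): " + maxHeapMb());
    }
}
```

If the printed value is well below 2048, the setting is probably not reaching the JVM that executes the job.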
I suspect the large number of columns is the problem, but I am not sure how to remedy this easily.
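If row width really is the issue, one direction that might be worth testing is to deduplicate on a fixed-size digest of each row instead of holding the full 55-column rows in memory. The sketch below is not how tUniqRow works and is not a Talend API; it is plain Java, the names `DigestDedup` and `dedup` are hypothetical, and it assumes the data is already in a delimited file with exactly one record per line (no line breaks inside quoted fields). It compares rows as raw text, so rows that differ only in formatting count as different.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.HashSet;
import java.util.Set;

public class DigestDedup {

    /**
     * Streams the input file line by line and writes each distinct line once,
     * keeping the first occurrence. Returns the number of lines written.
     */
    static long dedup(Path in, Path out) throws IOException, NoSuchAlgorithmException {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        // Only the digests are kept in memory, not the rows themselves.
        Set<String> seen = new HashSet<>();
        long written = 0;
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // digest() finishes the hash and resets the MessageDigest for reuse.
                byte[] digest = sha.digest(line.getBytes(StandardCharsets.UTF_8));
                String key = Base64.getEncoder().encodeToString(digest);
                // add() returns true only the first time a key is seen.
                if (seen.add(key)) {
                    writer.write(line);
                    writer.newLine();
                    written++;
                }
            }
        }
        return written;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: DigestDedup <input file> <output file>");
            return;
        }
        long n = dedup(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Distinct rows written: " + n);
    }
}
```

With this approach memory use grows with the number of distinct rows rather than with the width of each row, at the cost of relying on the digest to tell rows apart.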
Any ideas?
The error is as follows:
Starting job BMD01_UniqueRow at 11:49 05/12/2012.
connecting to socket on port 3550
connected
Exception in thread "main" java.lang.Error: java.lang.OutOfMemoryError: Java heap space
disconnected
at moscow1.bmd01_uniquerow_0_1.BMD01_UniqueRow.tFileInputDelimited_1Process(BMD01_UniqueRow.java:5212)
at moscow1.bmd01_uniquerow_0_1.BMD01_UniqueRow.runJobInTOS(BMD01_UniqueRow.java:5393)
at moscow1.bmd01_uniquerow_0_1.BMD01_UniqueRow.main(BMD01_UniqueRow.java:5258)
Caused by: java.lang.OutOfMemoryError: Java heap space
at moscow1.bmd01_uniquerow_0_1.BMD01_UniqueRow$1FileRowIterator_tUniqRow_1.load(BMD01_UniqueRow.java:4214)
at moscow1.bmd01_uniquerow_0_1.BMD01_UniqueRow$1FileRowIterator_tUniqRow_1.next(BMD01_UniqueRow.java:4239)
at moscow1.bmd01_uniquerow_0_1.BMD01_UniqueRow.tFileInputDelimited_1Process(BMD01_UniqueRow.java:4320)
... 2 more
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Thread-0"
Job BMD01_UniqueRow ended at 11:53 05/12/2012.