I have a file with over 100 million rows of data.
The job processes around 2,780 rows per second when it starts, but after about 5 million rows the speed starts to slow down and eventually drops to about 2 rows per second.
The job is:
tFileInputDelimited > tMap > tContextLoad
                                  ↓
                             tJava > tFileOutputDelimited
In the tMap component's Advanced settings I have Store on disk enabled, with Max buffer size: 1,000,000.
In the job's Run tab advanced settings I have: -Xms6256M and -Xmx7024M.
The virtual server I am running the job on has 8 processors, 8 sockets, and 32GB of RAM.
What can I do to keep the job running at 2,780 rows per second?
Your routine needs to look something like this (you will need to handle the imports, etc).....
public class GPSConvert {

    public static String ConvertCoords(Double long_, Double lat_) {
        // CoordinateConversion is the class you are already using;
        // make sure it (and its import) is available to the routine.
        CoordinateConversion cs = new CoordinateConversion();
        // GET THE MGRS VALUE:
        return String.valueOf(cs.latLon2MGRUTM(lat_, long_));
    }
}
You can use this in your tMap by simply placing the code below in the expression of the column you want to output this data in....
routines.GPSConvert.ConvertCoords(row1.long, row1.lat)
There may be a bit of tidying up to do, but this will make your job run a lot faster.
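If you want to sanity-check the routine outside Talend first, a quick throwaway harness along these lines would do it (the class name TestGPSConvert is just for illustration; it assumes GPSConvert and your CoordinateConversion class are both on the classpath):

public class TestGPSConvert {
    public static void main(String[] args) {
        // Note the argument order: longitude first, then latitude.
        String mgrs = GPSConvert.ConvertCoords(-97.069, 33.172);
        System.out.println(mgrs); // prints the MGRS designation for the point
    }
}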
Can you show us a screenshot of your job? Your job description doesn't make much sense, I'm afraid.
Also, I have just run a job where I generated 100,000,000 rows of data and wrote them to a file. It was writing at 1.3 million rows a second and using just 4GB of RAM. I sense you are doing a little more than just reading and writing. A screenshot might help fill in the blanks.
In the tJava I am passing the Latitude and Longitude values to convert each point to its Military Grid Reference System (MGRS) value; here is the code:
String myResult = "";
CoordinateConversion cs = new CoordinateConversion();
Double lat_ = Double.parseDouble(context.myLat);
Double long_ = Double.parseDouble(context.myLong);
// GET THE MGRS VALUE:
context.myResult = String.valueOf(cs.latLon2MGRUTM(lat_, long_));
// WRITE TO OUTPUT FILE:
row2.myLat = context.myLat;
row2.myLong = context.myLong;
row2.myResult = context.myResult;
There isn't much to the job: the Lat/Long values are loaded into context variables, passed to the tJava for conversion to MGRS, and the resulting MGRS value is written to the results file.
Instead of throwing more memory at it, is there some way to clear the job's buffer/cache after every 1M rows processed?
I am still a little confused by the layout. Why are you assigning context variables millions of times? Why are you iterating to a tJava? What is the tJava sending to the tFileOutputDelimited? By the way, the tJava is not really best suited to working with row connectors. Can you give a description of what you are trying to achieve? This does not look like it will be terribly efficient at all.
I have a file with over 100M unique Latitude, Longitude points.
I need to find out what the corresponding Military Grid Reference System (MGRS) designation is for each point.
For example input:
LATITUDE | LONGITUDE
33.172   | -97.069

Output:

MGRS        | LATITUDE | LONGITUDE
14SPB800720 | 33.172   | -97.069
I pull the latitude and longitude from each row of the file and pass them to context variables so I can use those context variables in the tJavaRow when I call the function that gets the MGRS value.
OK, that is not necessary and is probably causing horrendous memory and time issues. Here is the layout you will need....
Input File -----> tMap -----> Output File
The function can be used in a tMap against your column values while they are part of the row. If your function is several lines of code, add it to a Routine. If you are not sure how to do that, post your function here and I can help convert it for you.
If you convert it to use the above configuration, it will run significantly faster.
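As a concrete sketch (assuming your input flow is row1 with String columns myLat and myLong, as in your tJava code), the three output columns in the tMap would carry expressions along these lines:

MGRS:      routines.GPSConvert.ConvertCoords(Double.parseDouble(row1.myLong), Double.parseDouble(row1.myLat))
LATITUDE:  row1.myLat
LONGITUDE: row1.myLong

That way the conversion happens inline on each row as it streams through, with no context variables and nothing buffered between subjobs.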