hi,
I used Talend to extract data and insert it into OTSDB. But I need to split up my file, and a classic row-by-row iteration takes too much time (40 rows/s, and I have 90 million rows).
Do you know how to send, for example, 50 rows at a time instead of each row individually?
Best regards,
terreprime.
Can you show the JSON that was created using a System.out.println call?
Are you sure that the Rest service method should be PUT? I would have opted for POST if I wasn't given a direction here.
Hi,
This is the JSON:
Actually, I changed the method from PUT to POST and it works!
But the processing is really slow: 20 rows/s.
How can I speed up the processing of the file?
Terreprime
A web service call is always going to be slow by comparison to loading to an on-premise database. What you need to do is figure out if you can batch up your calls. If you can, you then need to work out how many rows can be sent at a time. There is absolutely no way you will get it going faster than this unless you can send more than one row per web service call or send parallel service calls. I suspect you can send more than one row in each web service call.
Yes, I know; that was the original subject of this topic. I'd like to send the data in groups of 50 rows, for example, but I don't know how to proceed.
OK, I have put together an example you will need to extrapolate from. It is quite simple. The layout for your job will be.....
tFileInputDelimited ---> tJavaFlex ---> tFilterRow ---> tFlowToIterate ---> tRest
1) You read the file as normal with the tFileInputDelimited.
2) The magic happens in the tJavaFlex. The code below shows what I did with my example. You will need to extrapolate from this to put in your JSON build (and combine) code....
Start Code
//Used to count the rows
int count = 0;
//Used to concatenate your Strings
String myConcatenatedVal = "";
Main Code
//Add 1 to the counter for each incoming row
count++;
//Concatenate your code (adjust this to concatenate your computed JSON Strings)
myConcatenatedVal = myConcatenatedVal+row1.newColumn;
//A modulus operation to fire on every 50th row. It sets the output "newColumn" to the concatenated value, then resets the myConcatenatedVal and count variables.
if(count%50==0){
row2.newColumn = myConcatenatedVal;
myConcatenatedVal = "";
count=0;
}else{
//The output "newColumn" column is set to null when not the 50th row
row2.newColumn = null;
}
This code will build up your records and only output a value every 50th record. It will output a null value for every other row. To handle this null value (to filter it out), we use the tFilterRow. Use the Advanced Mode and then set the code to ....
input_row.newColumn!=null
Your tRest will now only run once for every 50 records.
I hope that helps
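For anyone who wants to try the logic outside Talend first, here is a minimal plain-Java sketch of what the tJavaFlex and tFilterRow steps above do together. The class and method names (BatchEveryN, flexOutputs, filterNonNull) are made up for illustration and are not Talend APIs; the row values are dummy strings.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchEveryN {

    // Mimics the tJavaFlex main code: one output per input row,
    // null except on every batchSize-th row, where the concatenation is emitted.
    static List<String> flexOutputs(List<String> rows, int batchSize) {
        List<String> outputs = new ArrayList<>();
        int count = 0;
        StringBuilder buffer = new StringBuilder();
        for (String row : rows) {
            count++;
            buffer.append(row);
            if (count % batchSize == 0) {
                outputs.add(buffer.toString());
                buffer.setLength(0); // reset the buffer for the next group
                count = 0;
            } else {
                outputs.add(null);
            }
        }
        return outputs;
    }

    // Mimics the tFilterRow condition input_row.newColumn != null.
    static List<String> filterNonNull(List<String> outputs) {
        List<String> kept = new ArrayList<>();
        for (String value : outputs) {
            if (value != null) {
                kept.add(value);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            rows.add("r" + i + ";");
        }
        List<String> batches = filterNonNull(flexOutputs(rows, 50));
        System.out.println(batches.size() + " batches"); // prints "2 batches"
    }
}
```

With 100 input rows and a group size of 50, the filter keeps exactly two non-null values, so the downstream call would run twice.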
Your code is really great !
I adapted your code for OTSDB:
Start code:
//Used to count the rows
int count = 0;
//Used to concatenate your Strings
String myConcatenatedVal = "[";
Main code:
count++;
String myJSON = "{\"metric\":\""+context.metric+"\",\"value\":" + row1.myJSON+",\"timestamp\":"+context.timestamp+",\"tags\":{"+ "\"" + context.tags1 + "\"" +":"+"\"" + context.tags2+ "\"" + "}}";
myConcatenatedVal += myJSON;
context.timestamp++;
if(count%50==0){
row2.myJSON=myConcatenatedVal + "]";
myConcatenatedVal = "[";
count = 0;
}else{
myConcatenatedVal += ",";
row2.myJSON= null;
}
Result:
I am now getting between 850 and 1000 rows/s, which is much better!
Thank you very much !
Best regards
Terreprime
No problem. You may be able to tweak this to get even better performance by adjusting the number of rows you are merging.
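If you want to check what one merged request body looks like before changing the group size, a small stand-alone sketch like the one below may help. It builds the same JSON array layout as the adapted code above (metric, value, timestamp, tags) for one group of values. The names OtsdbPayloadSketch and buildPayload are hypothetical, and the sketch assumes numeric values and metric/tag names that need no JSON escaping.

```java
import java.util.Arrays;
import java.util.List;

public class OtsdbPayloadSketch {

    // Builds one JSON array holding a data point per value.
    // The timestamp is incremented by one per point, as in the thread's code.
    // Assumption: no JSON escaping is needed for the metric and tag strings.
    static String buildPayload(String metric, long startTimestamp,
                               String tagKey, String tagValue,
                               List<String> values) {
        StringBuilder json = new StringBuilder("[");
        long timestamp = startTimestamp;
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                json.append(",");
            }
            json.append("{\"metric\":\"").append(metric).append("\"")
                .append(",\"value\":").append(values.get(i))
                .append(",\"timestamp\":").append(timestamp)
                .append(",\"tags\":{\"").append(tagKey).append("\":\"")
                .append(tagValue).append("\"}}");
            timestamp++;
        }
        return json.append("]").toString();
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("1.5", "2.5", "3.5");
        System.out.println(buildPayload("sys.cpu", 1000L, "host", "a", values));
    }
}
```

Using a StringBuilder instead of repeated String concatenation avoids copying the growing string on every row, which may matter more as the group size goes up.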
Hi
If I have 153 records in my input and run this job, I get only 3 rows in the output; the remaining three records are ignored.
Can you please help me get those 3 records too?
For example, my output is
1,2,3...50
51,52,53 ... 100
101,102,103...150
but records 151, 152 and 153 are ignored.
Can you please help me get these records too, as a new line, with this code?
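One possible direction, sketched as plain Java rather than a tested Talend job: the modulus test only fires on full groups of 50, so rows 151 to 153 stay in the buffer and are never emitted. If the total number of rows is known before the flow starts (that is an assumption here; totalRows is a hypothetical variable you would have to populate yourself, for example from a row count taken beforehand), the condition can also fire on the last row. FlushRemainder and batchWithRemainder are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;

public class FlushRemainder {

    // Emits a batch on every batchSize-th row and also on the very last row,
    // so a trailing partial group is not lost.
    static List<String> batchWithRemainder(List<String> rows, int batchSize) {
        int totalRows = rows.size(); // assumption: total is known up front
        List<String> batches = new ArrayList<>();
        StringBuilder buffer = new StringBuilder();
        int seen = 0;
        for (String row : rows) {
            seen++;
            if (buffer.length() > 0) {
                buffer.append(",");
            }
            buffer.append(row);
            // Fire on a full group OR when the last row has been reached.
            if (seen % batchSize == 0 || seen == totalRows) {
                batches.add(buffer.toString());
                buffer.setLength(0);
            }
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 1; i <= 153; i++) {
            rows.add(String.valueOf(i));
        }
        List<String> batches = batchWithRemainder(rows, 50);
        // prints "4 batches, last = 151,152,153"
        System.out.println(batches.size() + " batches, last = "
                + batches.get(batches.size() - 1));
    }
}
```

In the tJavaFlex main code the equivalent change would be to keep a running total that is never reset and extend the if condition with a comparison against the known row count; how you obtain that count depends on your job.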