AbiJeev
Creator

Talend tBigQueryInput is fetching records very slowly

Hi,

I am using tBigQueryInput to fetch sessions data from Google Analytics. We tried fetching the data as-is and also after unnesting, but it takes almost 2 hours to fetch the data for one date. We are loading with the parallelize option, splitting a date's records in two and loading in parallel, yet it still takes 1 hour even for a minimal number of records. Is there a way to optimize performance? We found that the bottleneck is reading from BigQuery, not loading into the target Snowflake table. I am using Talend Big Data version 7.3.1.
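One common way to go beyond a two-way split when parallelizing BigQuery reads is to hash-partition the rows on a stable key, so each reader instance gets its own filter. A minimal sketch of generating such filters in Python, assuming a hypothetical key column `fullVisitorId` (the table and column names here are placeholders, not taken from this job):

```python
# Sketch: build N hash-partitioned WHERE clauses so that N parallel
# tBigQueryInput instances can each read a disjoint slice of one day's data.
# FARM_FINGERPRINT is a real BigQuery function; the key column is a placeholder.
def partition_filters(num_partitions: int, key: str = "fullVisitorId") -> list[str]:
    """Return one BigQuery filter expression per parallel reader."""
    return [
        f"MOD(ABS(FARM_FINGERPRINT({key})), {num_partitions}) = {i}"
        for i in range(num_partitions)
    ]

for clause in partition_filters(4):
    print(clause)
```

Each generated clause would be appended to the per-reader query's WHERE condition; the partitions are disjoint and together cover every row.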

8 Replies
Anonymous
Not applicable

Have you tried running this query using the Google BigQuery Cloud Console? What is the difference in performance to return the complete dataset for a day? Have you tested this on the same machine your job runs on? What Result Size have you selected for this component? Do you know the size of the result (per day) you are expecting? This document from Google may help you as well:

https://cloud.google.com/bigquery/quotas#queries

AbiJeev
Creator
Author

Hi rhall,

Yes, in the console it takes just 25 to 40 seconds depending on the date's size, but in Talend the same query takes 1 hour 30 minutes for a day's data. The result size after unnesting comes to around 12 million rows, and it varies for different dates.

Anonymous
Not applicable

When I mentioned Result Size, I meant the Result Size setting in the BigQuery component. It should be at the very bottom of the page, underneath the query. I suspect that you need to change this from small to large.

AbiJeev
Creator
Author

I am already using Large.

AbiJeev
Creator
Author

We are using a LEFT JOIN UNNEST on the columns in the Google Analytics table.

Anonymous
Not applicable

Hang on, I have just looked at the numbers. You said that you are getting something like 12,000,000 rows of data in 90 minutes: (12,000,000 / 90) / 60 ≈ 2,222 records a second. This data not only needs to be queried by Google; the records also need to be sent back over the network to your machine and split into rows. When you query using the console, you are not getting 12 million rows returned to you; you are getting a representation of the result. I don't think that this is particularly slow, to be honest.
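The arithmetic above can be checked directly from the figures stated in the thread (12 million rows in 90 minutes):

```python
# Throughput estimate from the numbers quoted in this thread:
# 12,000,000 rows fetched in 90 minutes.
rows = 12_000_000
minutes = 90
rows_per_second = rows / (minutes * 60)
print(f"{rows_per_second:.1f} rows/second")  # prints 2222.2 rows/second
```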

AbiJeev
Creator
Author

Yes, but is there a way to speed up the process?

Anonymous
Not applicable

How fast is your internet connection? What bandwidth do you have available on your network? How much data (KB) is downloaded per row? There are so many unknowns here.
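The bandwidth question alone can put a lower bound on the fetch time. A minimal sketch, assuming an illustrative 2 KB per unnested row and a 100 Mbit/s connection (both are placeholder figures for the calculation, not measurements from this job):

```python
# Sketch: rough lower bound on wire-transfer time, ignoring query time,
# compression, and protocol overhead. Row size and bandwidth below are
# illustrative assumptions, not measured values.
def transfer_minutes(rows: int, kb_per_row: float, mbit_per_s: float) -> float:
    """Minutes needed just to move the result set over the network."""
    total_bits = rows * kb_per_row * 1024 * 8
    return total_bits / (mbit_per_s * 1_000_000) / 60

print(f"{transfer_minutes(12_000_000, 2.0, 100):.0f} min")  # prints 33 min
```

Even under these optimistic assumptions, moving 12 million 2 KB rows takes over half an hour on a 100 Mbit/s link, which supports the point that the network transfer, not the query itself, dominates the runtime.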