Hi,
I am using tBigQueryInput to fetch sessions data from Google Analytics. We tried fetching the data as is and also after unnesting, but it takes almost 2 hours to fetch the data for one date. We are loading with the parallelize option, splitting one date's records into two and loading them in parallel, but it still takes 1 hour for a minimal number of records. Is there a way to optimize performance? We found that the bottleneck is reading from BigQuery, not loading into the target Snowflake table. I am using Talend Big Data version 7.3.1.
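Splitting one day's data into parallel loads, as described above, comes down to generating non-overlapping time windows and giving each parallel query its own window. Below is a minimal Python sketch that generalizes the two-way split to N parts. It assumes the rows can be filtered on a session start timestamp; the helper `day_windows` and the column name `visitStartTime` in the comment are illustrative assumptions, not part of the original job.

```python
from datetime import date, datetime, timedelta
from typing import List, Tuple


def day_windows(day: date, parts: int) -> List[Tuple[datetime, datetime]]:
    """Split one calendar day into `parts` half-open [start, end) windows (hypothetical helper)."""
    if parts < 1:
        raise ValueError("parts must be >= 1")
    start = datetime(day.year, day.month, day.day)
    end_of_day = start + timedelta(days=1)
    step = timedelta(days=1) / parts
    windows = []
    for i in range(parts):
        lo = start + step * i
        # Pin the last window to midnight so rounding never leaves a gap.
        hi = end_of_day if i == parts - 1 else start + step * (i + 1)
        windows.append((lo, hi))
    return windows


# Each window would become the filter of one parallel query, e.g.
#   WHERE visitStartTime >= <lo as epoch> AND visitStartTime < <hi as epoch>
# (hypothetical column and timezone handling; adjust to the table actually read).
for lo, hi in day_windows(date(2020, 1, 1), 4):
    print(lo.isoformat(), "->", hi.isoformat())
```

Half-open windows guarantee that no row is read twice or skipped at a boundary, which matters when the parallel results are appended to the same target table.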
Have you tried running this query in the Google BigQuery Cloud Console? How does the performance compare when returning the complete dataset for a day? Have you tested this on the same machine your job is running on? What Result Size have you selected for this component? Do you know the size of the result (per day) you are expecting? This document from Google may help you as well:
https://cloud.google.com/bigquery/quotas#queries
Hi rhall,
Yes, in the console it takes just 25 to 40 seconds, depending on the data size for the date, but in Talend the same query takes 1 hour 30 minutes for a day's data. The result size after unnesting is around 12 million rows, and it varies for different dates.
When I mentioned Result Size, I meant the Result Size setting in the BigQuery component. It should be at the very bottom of the page, underneath the query. I suspect that you need to change this from small to large.
I am already using Large.
We are using LEFT JOIN UNNEST on columns in the Google Analytics table.
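To see why unnesting inflates the row count, here is a small Python emulation of what `LEFT JOIN UNNEST` does in BigQuery: each element of the nested array becomes its own output row, and a session whose array is empty or NULL still yields one row with a null value (a `CROSS JOIN UNNEST` would drop that session instead). The function name, sample sessions, and field names are made up for illustration.

```python
from typing import Any, Dict, List


def left_join_unnest(sessions: List[Dict[str, Any]], array_field: str) -> List[Dict[str, Any]]:
    """Emulate `FROM t LEFT JOIN UNNEST(t.<array_field>) AS item`, keeping the parent's other fields."""
    rows = []
    for session in sessions:
        items = session.get(array_field) or []  # treat NULL like an empty array
        parent = {k: v for k, v in session.items() if k != array_field}
        if not items:
            # LEFT JOIN keeps the parent row, with a null for the unnested value.
            rows.append({**parent, "item": None})
        for item in items:
            rows.append({**parent, "item": item})
    return rows


# Hypothetical sample: two sessions with hits, one without.
sessions = [
    {"sessionId": "a", "hits": [1, 2, 3]},
    {"sessionId": "b", "hits": [1]},
    {"sessionId": "c", "hits": []},
]
flat = left_join_unnest(sessions, "hits")
print(len(sessions), "sessions ->", len(flat), "rows")  # 3 sessions -> 5 rows
```

Chaining a second `LEFT JOIN UNNEST` over an array nested inside each element multiplies the rows again, which is how a modest number of sessions can become millions of output rows.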
Hang on, I have just looked at the numbers. You said that you are getting something like 12,000,000 rows of data in 90 minutes. (12,000,000 / 90) / 60 = 2222.222 records a second. This data not only needs to be queried by Google, but the records also need to be sent back over the network to your machine and split into rows. When you query using the console, you are not getting 12 million rows returned to you; you are getting a representation of the result. To be honest, I don't think that this is particularly slow.
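The throughput arithmetic above, written out as a quick Python check (the row count and duration are the approximate figures quoted in this thread):

```python
rows = 12_000_000  # approximate rows per day after unnesting
minutes = 90       # observed Talend run time for one day

rows_per_minute = rows / minutes
rows_per_second = rows_per_minute / 60
print(f"{rows_per_second:.3f} rows/s")  # 2222.222 rows/s
```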
Yes, but is there a way to speed up the process?
How fast is your internet connection? What bandwidth do you have available on your network? How much data (kb) is downloaded per row? There are so many unknowns here.
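Once those numbers are known, a back-of-envelope estimate shows whether the network alone could explain the run time. A minimal Python sketch; the function name is hypothetical, and the bytes-per-row and link-speed values are placeholders, not measurements from this thread:

```python
def transfer_seconds(rows: int, bytes_per_row: int, link_mbps: float) -> float:
    """Rough transfer time: payload bits divided by link speed in bits per second.

    Ignores protocol overhead, any compression, and client-side parsing,
    so treat the result only as an order-of-magnitude estimate.
    """
    payload_bits = rows * bytes_per_row * 8
    return payload_bits / (link_mbps * 1_000_000)


# Placeholder inputs: 12 million rows, an assumed 1,000 bytes per row, a 100 Mbit/s link.
seconds = transfer_seconds(12_000_000, 1_000, 100)
print(f"~{seconds / 60:.0f} minutes just to move the data")  # ~16 minutes just to move the data
```

If the estimate for your real row size and bandwidth comes out close to the observed 90 minutes, the network is the likely limit; if it is far lower, the time is going somewhere else, such as client-side row processing.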