msminek
Creator

Downloading a huge file using tGoogleDriveGet causes OutOfMemoryError

Hi,
I am trying to download a huge file (around 3GB) from Google Drive using tGoogleDriveGet in TOS for DI 7.2.1 (corrected from 7.3.1).
But I got the exception below.

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3236)
        at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
        at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
        at com.google.api.client.util.ByteStreams.copy(ByteStreams.java:55)
        at com.google.api.client.util.IOUtils.copy(IOUtils.java:94)
        at com.google.api.client.util.IOUtils.copy(IOUtils.java:63)
        at com.google.api.client.googleapis.media.MediaHttpDownloader.executeCurrentRequest(MediaHttpDownloader.java:246)
        at com.google.api.client.googleapis.media.MediaHttpDownloader.download(MediaHttpDownloader.java:198)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeMediaAndDownloadTo(AbstractGoogleClientRequest.java:642)
        at com.google.api.services.drive.Drive$Files$Get.executeMediaAndDownloadTo(Drive.java:3256)
        at org.talend.components.google.drive.runtime.GoogleDriveUtils.getResource(GoogleDriveUtils.java:385)
        at org.talend.components.google.drive.runtime.GoogleDriveGetRuntime.getFile(GoogleDriveGetRuntime.java:76)
        at org.talend.components.google.drive.runtime.GoogleDriveGetRuntime.runAtDriver(GoogleDriveGetRuntime.java:60)
        at local_project.gd_get_0_1.gd_get.tGoogleDriveGet_1Process(gd_get.java:3833)
        at local_project.gd_get_0_1.gd_get.tFileInputDelimited_1Process(gd_get.java:3144)
        at local_project.gd_get_0_1.gd_get.tGoogleDriveConnection_1Process(gd_get.java:652)
        at local_project.gd_get_0_1.gd_get.runJobInTOS(gd_get.java:4249)
        at local_project.gd_get_0_1.gd_get.main(gd_get.java:4071)
I use the JVM options -Xms2048M -Xmx2048M with Zulu 8.

openjdk version "1.8.0_212"
OpenJDK Runtime Environment (Zulu 8.38.0.13-CA-win64) (build 1.8.0_212-b04)
OpenJDK 64-Bit Server VM (Zulu 8.38.0.13-CA-win64) (build 25.212-b04, mixed mode)

Saving the file directly with tGoogleDriveGet gives the same error.

The Google API supports resumable media download, so I thought tGoogleDriveGet could also support downloading a huge file with a small memory footprint.
https://developers.google.com/api-client-library/java/google-api-java-client/media-download
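
To make the "small footprint" idea concrete, here is a minimal sketch using only the JDK. It is not how tGoogleDriveGet is implemented, and it does not call the Google API at all: the class name `StreamingCopySketch`, the in-memory stand-in for the response body, and the 8 KB buffer size are all assumptions chosen for illustration. It only shows that copying a stream to a file through a fixed-size buffer keeps memory use independent of the file size, whereas the stack trace above suggests the data is being collected in a `ByteArrayOutputStream`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamingCopySketch {

    // Copies everything from 'in' to 'out' through a fixed-size buffer,
    // so memory use stays near bufferSize regardless of the total length.
    public static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for the HTTP response body;
        // a real download would read from the network instead.
        byte[] fakeBody = new byte[1_000_000];
        Path target = Files.createTempFile("gd_get_sketch", ".bin");
        try (InputStream in = new ByteArrayInputStream(fakeBody);
             OutputStream out = Files.newOutputStream(target)) {
            long copied = copy(in, out, 8192);
            System.out.println("bytes copied: " + copied);
        }
        System.out.println("file size: " + Files.size(target));
        Files.delete(target);
    }
}
```

The `executeMediaAndDownloadTo` call in the stack trace takes an `OutputStream`, so in principle a file-backed stream could play the role of `out` here. Whether the Talend component can be configured to do that is a separate question this sketch does not answer.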
 
How can I solve this error?

Regards.
5 Replies
manodwhb
Champion II

@sm, can you try JVM parameters like the ones below and check whether you still get this error?

-Xms2048M -Xmx4096M

manodwhb
Champion II

@sm, check the link below to add JVM parameters for a job.

https://help.talend.com/reader/m454VvH7E2VjR2~xX7IjXQ/pUeYxsZrtMN4IeK5FjuZpA

msminek
Creator
Author

Thanks for your quick reply. I will try it later using a standalone job for testing.

But there is one thing I would like to clarify about your kind advice.
Why would 4GB be enough for me?

Regards.
manodwhb
Champion II

@sm, we need to test multiple times with different parameters and see which one works. I suggested that value because your file is 3 GB.

msminek
Creator
Author

0683p000009MZk1.png

Thanks @manodwhb. I see what you mean. I wonder whether I would need 100GB of heap for a 100GB file.
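
A rough back-of-the-envelope check suggests that more heap alone may not help, assuming the whole download is collected in one `java.io.ByteArrayOutputStream`, as the stack trace hints. That class is backed by a single `byte[]`, Java arrays are indexed by `int`, and in Java 8 the buffer doubles when it is full, so the old and new arrays are both live during the copy. The class and method names below are hypothetical and only illustrate the arithmetic.

```java
public class HeapEstimateSketch {

    // Java arrays are indexed by int, so a single byte[] can hold
    // at most Integer.MAX_VALUE bytes (just under 2 GiB).
    public static boolean fitsInOneByteArray(long sizeInBytes) {
        return sizeInBytes <= Integer.MAX_VALUE;
    }

    // Heap needed at the moment a full buffer of 'capacity' bytes doubles:
    // the old array and the new, twice-as-large array coexist during the copy.
    public static long peakBytesDuringDoubling(long capacity) {
        return capacity + 2 * capacity;
    }

    public static void main(String[] args) {
        long oneGiB = 1024L * 1024 * 1024;
        long fileSize = 3 * oneGiB;
        System.out.println("3 GiB fits in one byte[]: " + fitsInOneByteArray(fileSize));
        System.out.println("peak bytes when a 1 GiB buffer doubles: "
                + peakBytesDuringDoubling(oneGiB));
    }
}
```

Under that assumption, a 3GB file could never fit in the buffer whatever -Xmx is set to, and a 100GB file even less so. The limit would be the array size, not the heap size.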

One correction: my TOS for DI version is 7.2.1, not 7.3.1. I apologize for the mistake.

The whole job looks like the attached picture. The CSV carries document IDs taken from a tGoogleDriveList result. The first three files in the CSV are small, and the fourth is a 3GB MPEG file.






The execution log with the -Xmx4096M option is below. I get the same error.

D:\talend-gd\GoogleDriveAccessor\gd_get>java -Dtalend.component.manager.m2.repository="D:\talend-gd\GoogleDriveAccessor\gd_get/../lib" -Xms4096M -Xmx4096M -cp .;../lib/routines.jar;../lib/auto-common-0.3.jar;../lib/auto-service-1.0-rc2.jar;../lib/avro-1.8.1.jar;../lib/commons-codec-1.10.jar;../lib/commons-compress-1.8.1.jar;../lib/commons-io-2.5.jar;../lib/commons-lang3-3.8.1.jar;../lib/commons-logging-1.2.jar;../lib/components-api-0.27.3.jar;../lib/components-common-0.27.3.jar;../lib/components-googledrive-definition-0.27.3.jar;../lib/components-googledrive-runtime-0.27.3.jar;../lib/crypto-utils.jar;../lib/daikon-0.31.7.jar;../lib/daikon-exception-0.31.7.jar;../lib/dom4j-1.6.1.jar;../lib/google-api-client-1.27.0.jar;../lib/google-api-services-drive-v3-rev151-1.25.0.jar;../lib/google-http-client-1.27.0.jar;../lib/google-http-client-jackson2-1.27.0.jar;../lib/google-oauth-client-1.27.0.jar;../lib/google-oauth-client-java6-1.27.0.jar;../lib/google-oauth-client-jetty-1.27.0.jar;../lib/guava-20.0.jar;../lib/httpclient-4.5.5.jar;../lib/httpcore-4.4.9.jar;../lib/j2objc-annotations-1.1.jar;../lib/jackson-annotations-2.9.0.jar;../lib/jackson-core-2.9.9.jar;../lib/jackson-core-asl-1.9.14-TALEND.jar;../lib/jackson-databind-2.9.9.jar;../lib/jackson-mapper-asl-1.9.14-TALEND.jar;../lib/javacsv-2.0.jar;../lib/javax.inject-1.jar;../lib/javax.servlet-api-3.1.0.jar;../lib/jcl-over-slf4j-1.7.25.jar;../lib/jetty-6.1.26.jar;../lib/jetty-util-6.1.26.jar;../lib/joda-time-2.8.2.jar;../lib/json-io-4.9.9-TALEND.jar;../lib/jsr305-1.3.9.jar;../lib/log4j-1.2.17.jar;../lib/org.eclipse.swt.gtk.linux.x86_64-4.3.jar;../lib/org.osgi.service.component.annotations-1.3.0.jar;../lib/paranamer-2.7.jar;../lib/pax-url-aether-2.4.7.jar;../lib/servlet-api-2.5-20081211.jar;../lib/slf4j-api-1.7.25.jar;../lib/slf4j-log4j12-1.7.10.jar;../lib/snappy-java-1.1.1.3.jar;../lib/talend-codegen-utils.jar;../lib/talend_file_enhanced_20070724.jar;../lib/talendcsv.jar;../lib/xz-1.5.jar;gd_get_0_1.jar; 
local_project.gd_get_0_1.gd_get  --context=Default
(snip)
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3236)
        at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
        at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
        at com.google.api.client.util.ByteStreams.copy(ByteStreams.java:55)
        at com.google.api.client.util.IOUtils.copy(IOUtils.java:94)
        at com.google.api.client.util.IOUtils.copy(IOUtils.java:63)
        at com.google.api.client.googleapis.media.MediaHttpDownloader.executeCurrentRequest(MediaHttpDownloader.java:246)
        at com.google.api.client.googleapis.media.MediaHttpDownloader.download(MediaHttpDownloader.java:198)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeMediaAndDownloadTo(AbstractGoogleClientRequest.java:642)
        at com.google.api.services.drive.Drive$Files$Get.executeMediaAndDownloadTo(Drive.java:3256)
        at org.talend.components.google.drive.runtime.GoogleDriveUtils.getResource(GoogleDriveUtils.java:385)
        at org.talend.components.google.drive.runtime.GoogleDriveGetRuntime.getFile(GoogleDriveGetRuntime.java:76)
        at org.talend.components.google.drive.runtime.GoogleDriveGetRuntime.runAtDriver(GoogleDriveGetRuntime.java:60)
        at local_project.gd_get_0_1.gd_get.tGoogleDriveGet_1Process(gd_get.java:3833)
        at local_project.gd_get_0_1.gd_get.tFileInputDelimited_1Process(gd_get.java:3144)
        at local_project.gd_get_0_1.gd_get.tGoogleDriveConnection_1Process(gd_get.java:652)
        at local_project.gd_get_0_1.gd_get.runJobInTOS(gd_get.java:4249)
        at local_project.gd_get_0_1.gd_get.main(gd_get.java:4071)

Regards.