msminek
Creator

Downloading a huge file using tGoogleDriveGet causes OutOfMemoryError

Hi,
I am trying to download a huge file (around 3 GB) from Google Drive using tGoogleDriveGet in TOS for DI 7.2.1 (corrected from 7.3.1).
However, I get the exception below.

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3236)
        at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
        at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
        at com.google.api.client.util.ByteStreams.copy(ByteStreams.java:55)
        at com.google.api.client.util.IOUtils.copy(IOUtils.java:94)
        at com.google.api.client.util.IOUtils.copy(IOUtils.java:63)
        at com.google.api.client.googleapis.media.MediaHttpDownloader.executeCurrentRequest(MediaHttpDownloader.java:246)
        at com.google.api.client.googleapis.media.MediaHttpDownloader.download(MediaHttpDownloader.java:198)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeMediaAndDownloadTo(AbstractGoogleClientRequest.java:642)
        at com.google.api.services.drive.Drive$Files$Get.executeMediaAndDownloadTo(Drive.java:3256)
        at org.talend.components.google.drive.runtime.GoogleDriveUtils.getResource(GoogleDriveUtils.java:385)
        at org.talend.components.google.drive.runtime.GoogleDriveGetRuntime.getFile(GoogleDriveGetRuntime.java:76)
        at org.talend.components.google.drive.runtime.GoogleDriveGetRuntime.runAtDriver(GoogleDriveGetRuntime.java:60)
        at local_project.gd_get_0_1.gd_get.tGoogleDriveGet_1Process(gd_get.java:3833)
        at local_project.gd_get_0_1.gd_get.tFileInputDelimited_1Process(gd_get.java:3144)
        at local_project.gd_get_0_1.gd_get.tGoogleDriveConnection_1Process(gd_get.java:652)
        at local_project.gd_get_0_1.gd_get.runJobInTOS(gd_get.java:4249)
        at local_project.gd_get_0_1.gd_get.main(gd_get.java:4071)
I run the job on Zulu 8 with the JVM options -Xms2048M -Xmx2048M.

openjdk version "1.8.0_212"
OpenJDK Runtime Environment (Zulu 8.38.0.13-CA-win64) (build 1.8.0_212-b04)
OpenJDK 64-Bit Server VM (Zulu 8.38.0.13-CA-win64) (build 25.212-b04, mixed mode)

Saving the file directly to disk with tGoogleDriveGet gives the same error.

The Google API supports resumable media download, so I thought tGoogleDriveGet could also download a huge file with a small memory footprint.
https://developers.google.com/api-client-library/java/google-api-java-client/media-download
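
To illustrate what I mean by a small footprint, here is a minimal sketch using only the JDK. It is not Talend's or Google's code; the class name `StreamingCopy` and the 8 KB buffer size are my own assumptions. The point is that copying through one fixed-size buffer into a file keeps heap usage constant, whereas the trace above shows the data being collected in a `ByteArrayOutputStream`, which grows with the file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, for illustration only.
class StreamingCopy {
    /** Copies 'in' to 'out' through one fixed-size buffer and returns the byte count. */
    public static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize]; // the only allocation, regardless of input size
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1_000_000]; // stand-in for the remote content
        Path target = Files.createTempFile("download", ".bin");
        try (InputStream in = new ByteArrayInputStream(data);
             OutputStream out = Files.newOutputStream(target)) {
            long copied = copy(in, out, 8192);
            System.out.println("copied " + copied + " bytes to " + target);
        } finally {
            Files.delete(target); // clean up the temporary file
        }
    }
}
```

Since `executeMediaAndDownloadTo` in the trace takes an `OutputStream`, I would guess that handing it a file stream instead of an in-memory one would have the same effect, but I have not verified that against the component source.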
 
How can I solve this error?

Regards.
5 Replies
manodwhb
Champion II

@sm, can you try JVM parameters like the ones below and check whether you still get this error?

-Xms2048M -Xmx4096M
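
To confirm that the setting actually reaches the job, a small check like the sketch below could be run with the same options (the class name `HeapCheck` is made up for illustration). `Runtime.maxMemory()` reports the maximum heap the JVM will try to use; depending on the garbage collector the value can be somewhat below the `-Xmx` figure.

```java
// Hypothetical helper, for illustration only.
class HeapCheck {
    /** Maximum heap the JVM will attempt to use, in MiB. */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```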

manodwhb
Champion II

@sm, check the link below to add JVM parameters for a job.

https://help.talend.com/reader/m454VvH7E2VjR2~xX7IjXQ/pUeYxsZrtMN4IeK5FjuZpA

msminek
Creator
Author

Thanks for your quick reply. I will try it later with a standalone job for testing.

However, I would like to clarify one thing about your advice.
Why would 4 GB be enough for me?

Regards.
manodwhb
Champion II

@sm, we need to test several times with different parameters and see which one works. I suggested that value because your file is 3 GB.

msminek
Creator
Author

[Attached image: 0683p000009MZk1.png]

Thanks @manodwhb. I see what you mean. I wonder whether I would need 100 GB of heap for a 100 GB file.
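
Thinking about it further, I suspect heap size alone cannot fix this. The trace shows the download being collected in a `java.io.ByteArrayOutputStream`, which is backed by a single `byte[]`. A Java array is indexed by `int`, so it can never hold more than `Integer.MAX_VALUE` elements (about 2 GiB), and the practical VM limit is slightly lower. A rough sketch of that arithmetic, with a made-up class name `ArrayLimit`:

```java
// Hypothetical helper, for illustration only.
class ArrayLimit {
    /** True if a payload of this size could, in principle, be held in one byte[]. */
    public static boolean fitsInByteArray(long sizeInBytes) {
        // Upper bound only: real VMs cap array length a few elements below this.
        return sizeInBytes >= 0 && sizeInBytes <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long oneGiB = 1L << 30;
        System.out.println("1 GiB fits: " + fitsInByteArray(oneGiB));     // true
        System.out.println("3 GiB fits: " + fitsInByteArray(3 * oneGiB)); // false
    }
}
```

If that reading of the trace is right, a 3 GB file could not pass through this code path with any `-Xmx` value. The buffer also grows by copying into a larger array (`Arrays.copyOf` in the trace), so the old and new arrays exist at the same time and the heap can run out well before the 2 GiB limit is reached.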

One more thing: I apologize, the TOS for DI version is 7.2.1, not 7.3.1.

The whole job is shown in the attached picture. The CSV carries document IDs created from the tGoogleDriveList result. The 1st to 3rd files in the CSV are small, and the 4th is a 3 GB MPEG file.

The execution log with the -Xmx4096M option is below. It fails with the same error.

D:\talend-gd\GoogleDriveAccessor\gd_get>java -Dtalend.component.manager.m2.repository="D:\talend-gd\GoogleDriveAccessor\gd_get/../lib" -Xms4096M -Xmx4096M -cp .;../lib/routines.jar;../lib/auto-common-0.3.jar;../lib/auto-service-1.0-rc2.jar;../lib/avro-1.8.1.jar;../lib/commons-codec-1.10.jar;../lib/commons-compress-1.8.1.jar;../lib/commons-io-2.5.jar;../lib/commons-lang3-3.8.1.jar;../lib/commons-logging-1.2.jar;../lib/components-api-0.27.3.jar;../lib/components-common-0.27.3.jar;../lib/components-googledrive-definition-0.27.3.jar;../lib/components-googledrive-runtime-0.27.3.jar;../lib/crypto-utils.jar;../lib/daikon-0.31.7.jar;../lib/daikon-exception-0.31.7.jar;../lib/dom4j-1.6.1.jar;../lib/google-api-client-1.27.0.jar;../lib/google-api-services-drive-v3-rev151-1.25.0.jar;../lib/google-http-client-1.27.0.jar;../lib/google-http-client-jackson2-1.27.0.jar;../lib/google-oauth-client-1.27.0.jar;../lib/google-oauth-client-java6-1.27.0.jar;../lib/google-oauth-client-jetty-1.27.0.jar;../lib/guava-20.0.jar;../lib/httpclient-4.5.5.jar;../lib/httpcore-4.4.9.jar;../lib/j2objc-annotations-1.1.jar;../lib/jackson-annotations-2.9.0.jar;../lib/jackson-core-2.9.9.jar;../lib/jackson-core-asl-1.9.14-TALEND.jar;../lib/jackson-databind-2.9.9.jar;../lib/jackson-mapper-asl-1.9.14-TALEND.jar;../lib/javacsv-2.0.jar;../lib/javax.inject-1.jar;../lib/javax.servlet-api-3.1.0.jar;../lib/jcl-over-slf4j-1.7.25.jar;../lib/jetty-6.1.26.jar;../lib/jetty-util-6.1.26.jar;../lib/joda-time-2.8.2.jar;../lib/json-io-4.9.9-TALEND.jar;../lib/jsr305-1.3.9.jar;../lib/log4j-1.2.17.jar;../lib/org.eclipse.swt.gtk.linux.x86_64-4.3.jar;../lib/org.osgi.service.component.annotations-1.3.0.jar;../lib/paranamer-2.7.jar;../lib/pax-url-aether-2.4.7.jar;../lib/servlet-api-2.5-20081211.jar;../lib/slf4j-api-1.7.25.jar;../lib/slf4j-log4j12-1.7.10.jar;../lib/snappy-java-1.1.1.3.jar;../lib/talend-codegen-utils.jar;../lib/talend_file_enhanced_20070724.jar;../lib/talendcsv.jar;../lib/xz-1.5.jar;gd_get_0_1.jar; 
local_project.gd_get_0_1.gd_get  --context=Default
(snip)
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3236)
        at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
        at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
        at com.google.api.client.util.ByteStreams.copy(ByteStreams.java:55)
        at com.google.api.client.util.IOUtils.copy(IOUtils.java:94)
        at com.google.api.client.util.IOUtils.copy(IOUtils.java:63)
        at com.google.api.client.googleapis.media.MediaHttpDownloader.executeCurrentRequest(MediaHttpDownloader.java:246)
        at com.google.api.client.googleapis.media.MediaHttpDownloader.download(MediaHttpDownloader.java:198)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeMediaAndDownloadTo(AbstractGoogleClientRequest.java:642)
        at com.google.api.services.drive.Drive$Files$Get.executeMediaAndDownloadTo(Drive.java:3256)
        at org.talend.components.google.drive.runtime.GoogleDriveUtils.getResource(GoogleDriveUtils.java:385)
        at org.talend.components.google.drive.runtime.GoogleDriveGetRuntime.getFile(GoogleDriveGetRuntime.java:76)
        at org.talend.components.google.drive.runtime.GoogleDriveGetRuntime.runAtDriver(GoogleDriveGetRuntime.java:60)
        at local_project.gd_get_0_1.gd_get.tGoogleDriveGet_1Process(gd_get.java:3833)
        at local_project.gd_get_0_1.gd_get.tFileInputDelimited_1Process(gd_get.java:3144)
        at local_project.gd_get_0_1.gd_get.tGoogleDriveConnection_1Process(gd_get.java:652)
        at local_project.gd_get_0_1.gd_get.runJobInTOS(gd_get.java:4249)
        at local_project.gd_get_0_1.gd_get.main(gd_get.java:4071)

Regards.