Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Qlik Open Lakehouse is Now Generally Available! Discover the key highlights and partner resources here.
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Big excel files as input

Hello and greetings from Germany 0683p000009MA9p.png,

 

this is my routine:

tFileInputExcel -> tMap -> tMysqlOutput

 

My problem is that the Excel file contains 190000 rows.

I read a lot and understand that talend can't process that large Excel files.

I tried to convert it to csv but it's complicated, because there are semicolons as data in columns.

Also my file uses commas as decimal seperators which throws errors by talend (or mysql or both).

 

All in all I need a different solution. Any suggestions?

I read about spliting the Excel in 10000 row parts but I can't reconstruct it without tFileOutputDelimited and this is not working because of the semicolons in columns and my missing knowledge in talend 0683p000009M9p6.png 0683p000009MACJ.png . I need a solution that works fully self-sufficient.

 

Thanks for your support!

 

[Using: Talend DataIntegration 6.4.0]

Labels (2)
1 Solution

Accepted Solutions
cterenzi
Specialist
Specialist

On the Advanced Settings tab of your tFileInputExcel component, there's a Generation Mode setting with two choices: User mode and Event mode.  Event mode consumes less memory and may allow you to read directly from your Excel file.

View solution in original post

6 Replies
TRF
Champion II
Champion II

The problem is probably not the numer of lines in the input file, but the size of the memory to deal with the content.

Do you have any error when the job runs? If yes, which one?

cterenzi
Specialist
Specialist

On the Advanced Settings tab of your tFileInputExcel component, there's a Generation Mode setting with two choices: User mode and Event mode.  Event mode consumes less memory and may allow you to read directly from your Excel file.

Anonymous
Not applicable
Author

@TRF:

Spoiler
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.apache.xmlbeans.impl.store.Cur.createElementXobj(Cur.java:260)
    at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.startElement(Cur.java:2997)
    at org.apache.xmlbeans.impl.store.Locale.loadNode(Locale.java:1420)
    at org.apache.xmlbeans.impl.store.Locale.loadNodeChildren(Locale.java:1403)
    at org.apache.xmlbeans.impl.store.Locale.loadNode(Locale.java:1445)
    at org.apache.xmlbeans.impl.store.Locale.loadNodeChildren(Locale.java:1403)
    at org.apache.xmlbeans.impl.store.Locale.loadNode(Locale.java:1445)
    at org.apache.xmlbeans.impl.store.Locale.loadNodeChildren(Locale.java:1403)
    at org.apache.xmlbeans.impl.store.Locale.loadNode(Locale.java:1445)
    at org.apache.xmlbeans.impl.store.Locale.loadNodeChildren(Locale.java:1403)
    at org.apache.xmlbeans.impl.store.Locale.loadNode(Locale.java:1445)
    at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1385)
    at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1370)
    at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:370)
    at org.apache.poi.POIXMLTypeLoader.parse(POIXMLTypeLoader.java:144)
    at org.openxmlformats.schemas.spreadsheetml.x2006.main.WorksheetDocument$Factory.parse(Unknown Source)
    at org.apache.poi.xssf.usermodel.XSSFSheet.read(XSSFSheet.java:183)
    at org.apache.poi.xssf.usermodel.XSSFSheet.onDocumentRead(XSSFSheet.java:175)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.parseSheet(XSSFWorkbook.java:438)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.onDocumentRead(XSSFWorkbook.java:403)
    at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:190)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:266)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:336)
    at local_project.wwi_excel_0_1.wwi_excel.tFileInputExcel_1Process(wwi_excel.java:1313)
    at local_project.wwi_excel_0_1.wwi_excel.runJobInTOS(wwi_excel.java:2485)
    at local_project.wwi_excel_0_1.wwi_excel.main(wwi_excel.java:2334)

@cterenziI think it worked!!! Wow thank you 0683p000009MApk.png I will check the data and reply tomorrow!

Anonymous
Not applicable
Author

@cterenziIt works like a charm (less than 1 minute for 200000 rows).

 

But I get another error and maybe you could help me again? 0683p000009MACn.png Or should I start a new topic?

 

I get this error:

Spoiler
Exception in component tFileInputExcel_2 (wwi_excel)
java.lang.RuntimeException: The cell format is not Date in ( Row. 182369 and ColumnNum. 7 )
    at local_project.wwi_excel_0_1.wwi_excel.tFileInputExcel_2Process(wwi_excel.java:1379)
    at local_project.wwi_excel_0_1.wwi_excel.runJobInTOS(wwi_excel.java:2272)
    at local_project.wwi_excel_0_1.wwi_excel.main(wwi_excel.java:2121)

It's very weird because all other rows are loaded correctly and this field has the same formatation. (see screenshot attachment)

Little be sad that the whole file has only 182396 rows (27 more) 0683p000009M9p6.png


Capture.PNG
cterenzi
Specialist
Specialist

Cell appearance and cell format are different things in Excel. If you right-click on that date and choose Format cells... it will likely be formatted General or Text instead of Date.

Also, I'm not sure whether that row number means the actual Excel row or the row number after excluding headers. You may want to check both places.
Anonymous
Not applicable
Author

But I get another error and maybe you could help me again?  0683p000009MACn.png Or should I start a new topic?

 I get this error:

Spoiler
Exception in component tFileInputExcel_2 (wwi_excel)
java.lang.RuntimeException: The cell format is not Date in ( Row. 182369 and ColumnNum. 7 )
    at local_project.wwi_excel_0_1.wwi_excel.tFileInputExcel_2Process(wwi_excel.java:1379)
    at local_project.wwi_excel_0_1.wwi_excel.runJobInTOS(wwi_excel.java:2272)
    at local_project.wwi_excel_0_1.wwi_excel.main(wwi_excel.java:2121)

It's very weird because all other rows are loaded correctly and this field has the same formatation. (see screenshot attachment)

Little be sad that the whole file has only 182396 rows (27 more) 0683p000009MAcj.png

 
 
Tags (0)
 


Hello hw 123,

This topic has been set as resolved. Could you please create a new topic? Which will

 get the best support from the community. Many thanks

Best regards

Sabrina