tFileInputExcel with more than approx. 2500 rows gives stackoverflow

Does anyone else have problems with Excel as input? It seems to be the pattern (regex) handling that gives a stack overflow once there are a bit more than 2500 rows. I have 7000 rows, but the error always comes after approx. 2500 rows have been processed. Exporting to CSV and doing the same parsing gives me no problems. As a side note, CSV seems much faster to process.
12 Replies
Anonymous
Not applicable
Author

Hi,
How did you configure the tFileInputExcel component? The regex option there applies only to sheet names: select this check box if you want to use a regular expression to filter the sheets to process. Could you share a screenshot of your job?
Best regards
Sabrina
Anonymous
Not applicable
Author

Hello
I dropped it again due to the problem, and went for CSV.
I was of course careful not to enable or use regexp anywhere in this test.
The scenario is simple: make an Excel file with 7000 rows and, let's say, five columns. The content can be anything, and can even be the same for each row. Read it in and just use tLogRow.
Regarding patterns, the problem is specific to this component. I have since used the same data from CSV, where I convert it through tReplace and through double regexps in a tMap, and I have also used tAggregateRow. No problems in getting through all records.
Anonymous
Not applicable
Author

Hi,
Regarding patterns, the problem is specific to this component. I have since used the same data from CSV, where I convert it through tReplace and through double regexps in a tMap, and I have also used tAggregateRow

You must have designed a job; would you mind uploading a screenshot for us (especially of the tMap)? From your description, the job flow is tFileInputDelimited-->tReplace-->tMap-->tAggregateRow-->tLogRow, right? Everything works with the .csv file but not with Excel? We need more info from you, thanks a lot!
Best regards
Sabrina
Anonymous
Not applicable
Author

I am seeing a similar issue. I am loading a date dimension from an excel spreadsheet and the tFileInputExcel fails after about 2600 records.
Anonymous
Not applicable
Author

Thank you jmagana
To xdshi. There is nothing more to it than I stated. No advanced designs needed. Just try it.
Anonymous
Not applicable
Author

Hi Jojs
For testing, I read 10000 rows from an Excel file on v5.2.1 and it works. Which version are you using? Am I missing something needed to reproduce the problem? What do you mean by "Pattern (regex) which gives a stackoverflow"?
Shong
Anonymous
Not applicable
Author

Hello
It is Excel 2007 (xlsx). I am attaching a job which fails, including the test data "test material.xlsx". (Oops, I can't attach zip files? Well, the explanation below should be enough.)
This test was run on Mac OS X 10.8.2 with TOS 5.2.1.r95162.
I did one additional test, where I copied only formats, numbers, dates and text into a new sheet, and deleted the original sheet. Now I can read all records, and it is much faster. This sheet is named "test material2.xlsx"
The sheet has the following columns:
id, Type, Color, Type title, Date, Revision, Title, Collection
I just realized that the first column with ids is built like this:
1
=+A2+1
=+A3+1
...
When I replace that with pure numbers, it can be read with no problems.
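A possible explanation, as a minimal sketch only: if a reader evaluates such a formula chain on demand and recursively, the cell in data row N needs N nested calls, so the call depth grows with the row number and a fixed-size Java stack can eventually overflow. Nothing in this thread confirms that tFileInputExcel works this way. The class and method names below are hypothetical, and the code models only the id column (first data row holds 1, every later row holds "previous row + 1").

```java
final class FormulaChainDemo {

    // Hypothetical on-demand evaluator: to get the value of a row it first
    // evaluates the row it depends on, so every row adds one nested call.
    // maxDepth[0] records the deepest nesting that was reached.
    static long evalOnDemand(int row, int depth, int[] maxDepth) {
        maxDepth[0] = Math.max(maxDepth[0], depth);
        if (row == 1) {
            return 1L;                                       // literal cell: 1
        }
        return evalOnDemand(row - 1, depth + 1, maxDepth) + 1L;  // "=+previous+1"
    }

    // Alternative: walk the rows top to bottom and reuse the previous value.
    // No recursion, so the call depth stays constant however long the chain is.
    static long evalInRowOrder(int rows) {
        long previous = 1L;
        for (int row = 2; row <= rows; row++) {
            previous = previous + 1L;
        }
        return previous;
    }

    public static void main(String[] args) {
        int rows = 2500;              // roughly where the failure was reported
        int[] maxDepth = new int[1];
        long value = evalOnDemand(rows, 1, maxDepth);
        System.out.println("on-demand value=" + value + ", nested calls=" + maxDepth[0]);
        System.out.println("row-order value=" + evalInRowOrder(rows));
    }
}
```

In a real evaluator each nested step would use several stack frames rather than one, so the depth at which the stack runs out depends on the library and the JVM stack size. That would fit with pure numbers in the column reading fine, since there is then no chain to follow.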
Regarding the observation about regular expressions, I would like to quote from the documentation:
tFileInputExcel opens a file and reads it row by row to split data up into fields using regular expressions.
I then changed the advanced setting "Generation mode" from "Memory-consuming" to "Less memory consumed". That also does the trick, and "Less memory consumed" actually reads all sheets faster than "Memory-consuming" does.
So I guess that formulas in the sheet and "Memory-consuming" do not work so well together.
And based on tests, it seems that "Less memory consumed" is faster anyway.
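A hedged guess at why the other mode copes with the formulas: an .xlsx sheet stores the last computed result of each formula cell in a `<v>` element next to the `<f>` formula, so a streaming reader can take the cached value without evaluating anything. The sketch below assumes that "Less memory consumed" reads the sheet XML in such a streaming way, which this thread does not confirm. `CachedValueReader` and `readCachedValues` are hypothetical names, the XML is a simplified stand-in for the id column (no header row, no namespaces), and only the JDK's SAX parser is used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class CachedValueReader {

    // Streams the sheet XML and collects the text of every <v> element.
    // <f> (formula) elements are ignored, so nothing is ever evaluated.
    static List<String> readCachedValues(String sheetXml) {
        final List<String> values = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private boolean insideValue = false;
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("v".equals(qName)) {
                    insideValue = true;
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (insideValue) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("v".equals(qName)) {
                    values.add(text.toString());
                    insideValue = false;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(sheetXml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sheet XML", e);
        }
        return values;
    }

    public static void main(String[] args) {
        // Simplified stand-in for the id column: A2 is a literal, A3 and A4 are formulas.
        String sheetXml =
            "<worksheet><sheetData>"
            + "<row r=\"2\"><c r=\"A2\"><v>1</v></c></row>"
            + "<row r=\"3\"><c r=\"A3\"><f>+A2+1</f><v>2</v></c></row>"
            + "<row r=\"4\"><c r=\"A4\"><f>+A3+1</f><v>3</v></c></row>"
            + "</sheetData></worksheet>";
        System.out.println(readCachedValues(sheetXml));   // prints [1, 2, 3]
    }
}
```

The handler keeps only the text of the cell it is currently in, so memory use does not grow with the number of rows, and a long formula chain costs nothing extra.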
Best regards - Jojs
Anonymous
Not applicable
Author

Hi Jojs,
I suggest that you open a new topic for your issue so that more people in the forum will see it. In addition, could you upload screenshots of your job to the forum to help us address your issue?
Best regards
Sabrina
Anonymous
Not applicable
Author

It is as simple as the attached. Does it tell you something new?
I am not sure what you mean about starting a new topic? About what, and where, and for whom?
To me it sounds like a bug for "Memory-consuming" "Generation mode". But I would question the reason to even have that mode, since it is slower even for simple files. I suggest that Talend either remove that generation mode, fix the bug or document it with the component.
(On another note, there is a similar option when parsing XML files, which is also questionable if it is meant to be about performance.)

Best regards - Jojs
http://www.talendforge.org/forum/img/members/60190/mini_103940_Sk