Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
I need to load data from a large Excel spreadsheet - only one block.
The issue is, I don't even know where exactly that block starts (in which line) and how big it is because there are a number of
other blocks of data above it which grow over time.
The only thing I know is that the first block starts in line 10, so I can use that to get the column_headers (which are always the same).
I can also find out what lines belong to that block because it has one column (which doesn't even exist at the top, it only exists in some
blocks of data around the one I'm looking for) has one of a handful of entries.
That is all well.
The challenge is getting the line with the column_headings exactly once:
- There are different cost_types and every block is for one cost_type;
- Several cost_types (and thus several blocks of data) make up the one I am looking for and each of them might or might not be
filled with data.
I can filter for another column that only exists in these (three or four) blocks of data, so I'll only have three or four header_lines rather
than a dozen. Plus all the lines with the actual data.
=> How could I enter a condition in the table_wizard so that out of these three or four header_lines, only one is loaded?
I hope that is understandable. I attach a small sample that just about shows how the data looks like.
There will be another challenge coming because there is one such list per month and I have to load and append them all,
but let's go step by step.
Thanks a lot!
Best regards,
DataNibbler
Okay,
I think I have found a solution.
After building the LOAD with the table_wizard, I can simply insert an IF_condition like
>> IF([field] <> 'header_1', [field]) as [field] <<<
=> That way, those superfluous header_lines effectively disappear.
This is solved then.
Do you have any way of influencing how the spreadsheet is produced? If you have you can define an Excel range with a name for the particular block you need to fetch. This can be referred to as a table in the wizard just as the regular sheets in a workbook are called tables (by QlikView). All ranges will be listed in the drop-down in the wizard along with all the sheet names...
If you can't have range(s) added to the spreadsheet as standard then you will have to resort to other techniques of course:
One question that is important ... is there ever any blank rows in-between the different blocks? If not you could read all blocks into QlikView and then use RESIDENT load filtering out the rows you want from the full table and discarding the rest.
Hi Petter,
in theory I have the possibility of influencing the look of the Excel_files - but this file in particular is being used by the entire maintenance_team, so the chances of that are very slim in this case.
Yes, there can be blank rows inbetween the individual blocks, too.
I already thought of doing a RESIDENT LOAD with the DISTINCT keyword, so every line is only loaded once (as every line of fact_data has one unique number). But for that to work I have to do a LOAD with only those fields that I want to apply the DISTINCT to, no?
So I would have to later join it to the rest again.
A possible approach, but it looks like rather a lot of work. Is there any way I can filter out the unnecessary lines right from the start?
Are the headers for each block the same or different?
Have you looked at using the transformation wizard to delete rows before the required header and drop rows after the header after the next header?
I just notice - don't quite know why - as soon as I apply some expression to one of the fields, some of the values seem to disappear.
For example, my date_field originally looks like that
>> 004209.00000 << (just an example)
=> When I apply the expression >> Date(Floor([date_field]), 'DD.MM.YY') << to get a proper date, the superfluous header_lines where that cannot be applied (because the field_value is text) disappear.
Let's see what I can make of this.
And yes, I have used every trick the table_wizard offers - every one that I know about.
Okay,
I think I have found a solution.
After building the LOAD with the table_wizard, I can simply insert an IF_condition like
>> IF([field] <> 'header_1', [field]) as [field] <<<
=> That way, those superfluous header_lines effectively disappear.
This is solved then.
Just to round it off - you can import all the rows with all headers with a:
LOAD
*
FROM
[.....]
( ..., no labels ...);
And then you have the complete raw spreadsheet at your disposal. All the Fields are named like the columns in Excel A, B, C, D, ....
And the number of fields are adjusted dynamically (added) if any of the rows contain additional columns later...