Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

QLIKVIEW vs ETL

Hi community,

QLIKVIEW can be considered as an ETL?

ETL means Extract-Transform-Load. It is a platform known well by the Computer scientist.

Thank you.

17 Replies
johnw
Champion III
Champion III


gerryhorgan wrote:your answer surprises me, since you are obviously experienced and very vocal on this site. I don't think you gave a straight answer, rather a pedantic one that misrepresented the product.


I gave a straight answer as best I could. But as I said to the original poster, "I do think I'm misunderstanding your questions." And perhaps I'm misunderstanding your points about ETL as well. But my answer wasn't a sales pitch - I couldn't care less if QlikTech or some other company gets your money, QlikView isn't the answer to all problems, and there are plenty of things about QlikView that I'm not happy with. I'm not out to mislead anyone about what QlikView can and can't do or what it should or shouldn't be used for.


gerryhorgan wrote:I believe that QlikView have chosen to gloss over this with deliberate half truths. I'm specifically referring to their web page "QlikView and your data" with the section "Beyond traditional ETL - Add value not redundancy". Paraphrasing, they state that traditional BI requires ETL followed by cube building. They then state that due to Qlikview's in-memory architecture, cubes are not required and infer that neither is ETL since "data simply needs to be moved in to memory". I agree about the cubes. I disagree about most of the ETL data preparation leading up to the time of cube construction. I see this as a deliberate half truth from QlikView. They go on to say that "qlikView can aggregate and calculate in real-time". So what? What I interpret this as is, QlikView can aggregate and calculate good data intermingled with transactional garbage in real time.


Let me start with this bit, since I mostly agree with you here. You can't skip the ETL step for anything complicated. Merely aggregating transactional garbage is of little use. For complicated data, you either need a data warehouse (which required ETL to build) or you need to do ETL to get the data in a usable format for QlikView. You're very unlikely to be able to take complicated data from a modern business, load it in raw from databases, and get something beautiful in QlikView. If QlikTech implies that's how it works and that it's easy and wonderful, then I disagree with them. And if you think that's what I was implying, that was certainly not my intent. You need an ETL step.


gerryhorgan wrote:My limited personal experience with the product coupled with answers from both the sales and technical team during our proof of concept and our trainer, point to a different conclusion. QlikView is only suitable for light ETL or to put it another way QlikView is a bad ETL tool...
My conclusion which was agreed with grudgingly by the senior sales person on site is that the most efficient way to develop an overall QlikView solution is to perform a separate ETL stage before the data is fed to QlikView. QlikView is not an efficient way to do the many ETL tasks (even though some of them may be achieveable in an inefficient manner). It is not reasonable to expect users to access raw transactional data, use QlikView and produce totals that agree with existing reports (which may incorporate multiple complex business rules).

Well, I would disagree with your limited personal experience and with your senior sales person then. Now, perhaps you're right about your own ETL needs, but at our company, a separate ETL stage outside of QlikView is usually not the best or most efficient way to handle things in my opinion.

Initially, we did use separate ETL if I understand what you're saying. We had a handful of batch jobs, SAS jobs, whatever, that would actually do the database reading, transform the data into a reporting format, and then dump it out as a flat file. Several of these are still in existence. QlikView isn't serving as much of an ETL in those cases, just an L.

But as I gained experience, in some cases I realized that I could do that work in the QlikView script, and typically with MUCH less effort, many fewer lines of code. The problem was simply one of familiarity, in the sense that if you normally program in Java, you'll find programming in LISP to be very unfamiliar, very awkward. That doesn't mean that the system you're unfamiliar with is worse for the job in any objective sense. At this point, I do most new ETL inside of QlikView itself.

I do agree that in some cases, such as when all your raw data contains is some basic business facts, and rules about how to transform those facts into a report, QlikView is probably not the place to apply those rules. I'd probably encourage people in this case to apply those business rules to the business facts and actually store the results in separate tables. Perhaps think of it like a little data warehouse for that specific data, reporting data built from the transactional data and business rules. Then the reports and QlikView would load from there, and both would match because they're showing the same data. But in this case, the ETL is very much happening outside of QlikView, which is perhaps one of your main points. If so, yes, agreed. In some cases it is much better to do the ETL outside of QlikView. Complicated business logic isn't something you generally want to duplicate across multiple environments. You want to do it once and only once.

And even when in my opinion the ETL should be in QlikView, I'll agree that people are not going to be able to purchase QlikView and immediately start writing complicated ETL inside of it. It's going to take experience, possibly years of experience to master a new environment. In the mean time, writing complicated ETL in whatever system you're used to writing in may be the best solution. And you certainly CAN continue using QlikView in that way, as only a load step, with the actual extract and transformation occurring elsewhere. It's not my preference, but you can.

And I know of at least one QlikView expert (and I'm sure there are others) who would probably strongly disagree with me, and would probably state emphatically that the ETL should be done outside of QlikView, and that QlikView should be used as a reporting system only, not as an ETL system.

So returning now to me being misleading and pedantic, and perhaps being pedantic about it, nobody asked, "Can I do the most complicated possible ETL in QlikView, and do it easily, and is it the right place for a beginner to do it?" They just asked, "Can QlikView be considered an ETL." Simple ETL is simple. Complicated ETL is complicated. But it'll do both, I do both with it, and I generally prefer to do my ETL inside of QlikView instead of outside. I consider it an ETL product.

Not applicable
Author

Thanks John, for taking the time to give such a lengthy and carefully thought out answer. My company has already made a significant commitment to the product. We have found it difficult to obtain good quality technical opinion from people not affiliated with QlikView. Your input and some of the other replies have been very valuable.

Not applicable
Author

I read this discussion with interest, but I have to mention that everybody here have their own idea what ETL tool is. I also started to learn QV some time ago, and from my previous experience with working with industrial ETL tools I can rank Qlikview by following domains:

1) Extraction - good

2) Transformation - poor

3) Loading - good

So why I think transformation is performing poor? First of all, ETL tool cannot do regular tasks much more better than SQL script or Excel, it allows to do such tasks that were almost impossible for others.

I don't know Qlikview very well, but features listed below must present in ETL tool:

1) Creating SCD type2 from regular table, which actually is a snapshot without any dates

2) Ability to write into several targets at one moment in parallel

3) visual interface for designing ETL proccesses (ability to view dataflows, not the plain script code)

So from my point today Qlikview cannot be named as ETL tool - it's just have some abilities for ETL.

Best regards, Andrei.

PS. I was impressioned by crosstable and hierarchy functionality in Qlikview, but this is not enough for ETL 🙂

johnw
Champion III
Champion III

Regardling the features that you say must be present in an ETL tool:

1) Someone posted a link to their blog post yesterday about handling slowly changing dimensions. I believe what they implemented was type 2. It's certainly easy enough to link your data to a type 2 table using intervalmatch. Ah, here we are, from Torben Seebach, based on an earlier example by colinr:

http://guerrillabi.com/Slowly_Changing_Dimensions_QlikView

Not sure what you mean about creating it "from regular table, which actually is a snapshot without any dates". If you have a single snapshot, you don't have a slowly changing dimension (Edit: Oops. You have a type 1.). If you have multiple snapshots, you need to distinguish them in some way. Type 2 distinguishes them by a version number or a date range. Another typical approach is to timestamp each snapshot in a separate history database, but then you're talking about a type 4 slowly changing dimension, not a type 2. Are you saying you think that for software to be called an ETL, it must "automatically" convert from type 4 to type 2? It's certainly not overly difficult in QlikView to do this conversion, but you're right that there isn't some "convert this SCD from type 4 to type 2" button.

Most of our SCDs are type 4, and I'll admit it took quite a while before I got comfortable manipulating them in QlikView, and that it still doesn't seem particularly intuitive to me.

I'm probably not understanding what you're saying, though.

2) Not sure what you're saying here.

3) Agreed, QV has no visual interface for ETL. It's just script.

I have nothing to compare to, but I could probably agree that QlikView is poor at the transformation step. I'm sure it could be a lot better, and I don't think the data transformation step has been QlikView's focus, but rather the manipulation and display of the data once transformed. That seems particularly true if, as was claimed earlier, their marketing heavily implies that transformation isn't necessary. Obviously their focus won't be on something their marketing implies isn't necessary, even if many of us agree that it IS necessary.

Not applicable
Author

Hi, John.

I see that you missed one condition in my statement: "table without any dates". I understand that QV can use tables like SCD2, but I think it would be very difficult to create table of type SCD2.

I'll try to give an example. Let's imagine that the source table looks like following (column names in first row) in 01-Jan-2011 when we first extract it's data:

ID, Attr1, Attr2, Attr3

1, A1, N3, B2

2, A1, N2, B5

3, A2, N3, B4

Than next day (02-Jan-2011) we extract it once again and now data is:

ID, Attr1, Attr2, Attr3

1, A1, N3, B4

3, A2, N3, B4



4, A3, N2, B1

Proper SCD2 table will look like:

SID, ID, Attr1, Attr2, Attr3, DateFrom, DateTo

1, 1, A1, N3, B2, 01-Jan-1000, 01-Jan-2011

2, 2, A1, N2, B5, 01-Jan-1000, 01-Jan-2011

3, 3, A2, N3, B4, 01-Jan-1000, 31-Dec-9999

4, 1, A1, N3, B4, 02-Jan-2011, 31-Dec-9999

5, 4, A3, N2, B1, 02-Jan-2011, 31-Dec-9999

If you know method how to create such SCD2 table from source table and can share it, I'll be very thankful to you.

2) One of the distinctive features of ETL tools is ability to write into different targets in parallel: one statement to write data into a table and into a file. This can not be achieved by SQL statement - you can specify only one table using "INTO" clause. In QV seems that all can be done in sequential order only writing 2 STORE statements.

So there are a lot of cases when QV alone is not enough.

johnw
Champion III
Champion III


Andrei_Bushkevich wrote:If you know method how to create such SCD2 table from source table and can share it, I'll be very thankful to you.


I think it's a bad idea to create the SCD2 table in QlikView. I think the SCD2 table should be in your DBMS where other systems have access to it. QlikView is not a DBMS.

That said, it's possible of course. I did think it would be simpler than it ended up being. And perhaps it is simpler, but I wasn't able to simplify any further quickly. I'm particularly unfond of the CurrentRecord table, which seems like a ridiculously roundabout way to do what it's doing, but I have to use exists() to get an optimized load. So it seems I either need a field like that, or to duplicate the ToTimestamp in an earlier load, which might be fewer lines of code but would be slower to execute. Anyway, maybe someone else can improve it a bit.

Below is the script. The source is MyTable.txt, which you would cut and paste the data into, or continue modifying data in. You'd load from your real source at that point in a real application, of course. The first time you reload it'll use January 1, 1000 as the FromTimestamp. After that, it will use the time of the load. I think this method of loading will be efficient, such as using optimized QVD loads, but I'm not seeing it in the log, and it's too fast to see on my screen. I didn't assign the SID since rows can be uniquely identified without it by ID and FromTimestamp, but one could probably easily be assigned with the recno() function.

SET vQVD=MyTable.qvd;

[MyTable]:
LOAD
@1 as ID
,@2 as Attr1
,@3 as Attr2
,@4 as Attr3
FROM MyTable.txt (txt, codepage is 1252, no labels, delimiter is ',', msq, header is 1 lines)
;
LEFT JOIN ([MyTable])
LOAD
timestamp(if(filesize('$(vQVD)')>0,now(),makedate(1000,1,1))) as FromTimestamp
,timestamp(makedate(9999,12,31)) as ToTimestamp
AUTOGENERATE 1
;
IF filesize('$(vQVD)') > 0 THEN
CONCATENATE ([MyTable])
LOAD *
FROM $(vQVD) (QVD)
WHERE exists(ToTimestamp)
;
[MyTable2]:
NOCONCATENATE
LOAD ID, Attr1, Attr2, Attr3
,min(FromTimestamp) as FromTimestamp
,max(FromTimestamp) as MaxFrom
RESIDENT [MyTable]
GROUP BY ID, Attr1, Attr2, Attr3
;
DROP TABLE [MyTable];

[MyTable]:
NOCONCATENATE
LOAD *
,timestamp(if(ID=previous(ID) or MaxFrom<now(),now()-1/(24*60*60),makedate(9999,12,31))) as ToTimestamp
RESIDENT [MyTable2]
ORDER BY ID, FromTimestamp desc
;
DROP TABLE [MyTable2];
DROP FIELD MaxFrom;

[CurrentRecord]:
LOAD timestamp(makedate(9999,12,31)) as CurrentRecord
AUTOGENERATE 1
;
CONCATENATE ([MyTable])
LOAD *
FROM $(vQVD) (QVD)
WHERE not exists(CurrentRecord,ToTimestamp)
;
DROP TABLE [CurrentRecord];
END IF

STORE [MyTable] INTO $(vQVD) (QVD);

Not applicable
Author

Hi John,

Firstly, let me thank you for your help last time with sort problem. it worked just fine.

On this issue,the only thing I might add is that "It all depends" on what kind expertise you have at your disposal. From my experience, ETL guys wont enjoy doing ETL in Qlikviev. Most of the time we are having problems hiring Good Qlikview developers and even the Best of Qlikview developers face problems on large scale data modeling. also, its hard to train people do to ETL in Qlikview when they do not have any experience in data manipulations, and when they have experience in data manipulations they dont want to work with Qlikview and they would rather do ETL using SSIS and other opensource tools. And the reason is Job advancements and work load. Also, they say expert in SSIS can almost work for any company as more jobs and options areavailable and Qlikview ETL has limited options, unless you are "John Witherspoon"

You have lots of experience in general and so do I so for you and me it might not be an issue, but some people I have worked with are already complaining about career development.

Also, some companies are also thinking of not having to employ redundant work force, they alreadyhave SQL developers and DBA's.

Its easier to have standards and processes in ETL tools and its hard to do in Qlikview. (Big installations)

Hard to maintain because its all scripts and its not very clean. SSIS can be managed by in house DBA's or developers. Think about having to ETL 400 tables in Qlikview from 4 different RDBMS.

So, "IT ALL DEPENDS".But in general, if you have 10-15 tables then no need to read this article and worry about ETL at all. But to Architect Mid-Large Qlikview Installations, I would rather not suggest to do ETL in Qlikviewjust because of complications and hard to maintain codes (Youjust don’t have expertise out there).

IT ALL DEPENDS - on BI Architect what he feels comfortable with. Anyways, how many "John Witherspoon"are out there who is looking to change jobs. I bet even you could have hard time hiring people like you, I'mhaving trouble in recruiting.

Not applicable
Author

Excelent Solution for SCD type 2. It works like chairme!!!

Is it possible to insert a new row telling if is a new record ou a changed one? i am trying to edit it but ... didnt get there.