2 Replies Latest reply: Mar 13, 2013 12:03 PM by BayuAditya

    Architecture and development for terabytes of data

      Hi all,

      My client, a life insurance company, has asked me to build an architecture plan for QV.

      They have around 4 terabytes of data in Oracle.

      Could anyone help me with this? What would be the best architecture and development plan? By architecture I also mean the detailed hardware specification.

      Honestly, this volume of data is a nightmare for me. Because it is a life insurance company, the analysis must also go back to each customer's first year (all years).

       

      Need an expert...help

        • Re: Architecture and development for terabytes of data
          Bill Markham

          Hi

           

          4 terabytes of data in Oracle is significant, but it does not make it mandatory for you to enter the nightmares of the Twilight Zone.  It is always an option though should you like scary movies.

           

          Here are some sweeping guesses to encourage sweet dreams.

           

          Data held in Oracle is often split 50:50 between data and indices, so halving the 4 TB leaves 2 TB of real data.

           

          The odds are at least half of this data will be of no relevance to QlikView, so halving 2 TB leaves 1 TB of data for QlikView to handle.

           

          QlikTech marketing blurb often states "compression" to 10% when loaded into RAM.  It is in reality de-duping and not compression, but it still ends up smaller.  I am more comfortable with 25%, as that leaves headroom for other RAM overheads and for our friends in marketing exaggerating a tad.  So quartering 1 TB gets us down to 250 Gbytes.

           

          Nowadays 250 Gbytes RAM is becoming more & more commonplace and is nothing to be frightened of.
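The chain of guesses above can be written out as a small calculation, which makes it easy to swap in different ratios. This is only a sketch of the back-of-envelope arithmetic: the function name and parameter names are made up for illustration, and the default ratios are the rough guesses from this post, not measured values.

```python
def estimate_qlikview_ram_gb(oracle_tb, data_fraction=0.5,
                             relevant_fraction=0.5, ram_ratio=0.25):
    """Back-of-envelope RAM estimate (hypothetical helper, guessed ratios).

    oracle_tb         -- total Oracle footprint in TB (data plus indices)
    data_fraction     -- guessed share that is real data rather than indices
    relevant_fraction -- guessed share of the data QlikView actually needs
    ram_ratio         -- guessed in-RAM size relative to the source data
    """
    real_data_tb = oracle_tb * data_fraction          # 4 TB -> 2 TB
    relevant_tb = real_data_tb * relevant_fraction    # 2 TB -> 1 TB
    ram_tb = relevant_tb * ram_ratio                  # 1 TB -> 0.25 TB
    return ram_tb * 1000                              # TB -> GB (decimal units)


if __name__ == "__main__":
    print(estimate_qlikview_ram_gb(4))                  # cautious 25% guess
    print(estimate_qlikview_ram_gb(4, ram_ratio=0.10))  # marketing's 10% figure
```

With the defaults this reproduces the 250 Gbytes figure. Once a representative sample has been loaded, the guessed ratios should be replaced with the real ones.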

           

          Disc is a fairly cheap commodity nowadays, but you'll have plenty of QVDs, QVWs & raw data floating about and nobody ever likes deleting the old rubbish files, so I'd go for 10 TB RAID 1 across a couple of dozen disc spindles.

           

          For CPU grunt it really depends on what it needs to do and how many End Users there are.  But if you don't know, then bung in a couple of dozen Intel cores and leave spare slots to be able to double this when the proverbial & the fan collide.

           

          QlikView architecture is not complicated and should not be frightening either.  If you don't know the predicted workload, you could always guess and go for:

          • Production QV / Web Server:    2 server node cluster, external shared storage as Windows UNC Share
          • Publisher:      Single server, say half the spec of the Production QV / Web nodes
          • Test:             Single server, say half the spec of the Production QV / Web nodes
          • Dev:              Single server, say half the spec of the Production QV / Web nodes
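As a rough illustration, the guessed layout above can be captured as data so the "half the spec" rule is applied consistently. The class and function names are hypothetical, and giving each production node 250 GB of RAM and 24 cores is an assumption that simply reuses the earlier guesses. It is not a sizing recommendation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerSpec:
    role: str
    nodes: int
    ram_gb: int
    cores: int


def build_plan(prod_ram_gb=250, prod_cores=24):
    """Return the guessed four-tier layout (hypothetical helper).

    Production gets the full per-node spec on a 2-node cluster;
    Publisher, Test and Dev each get a single server at half that spec.
    """
    half_ram, half_cores = prod_ram_gb // 2, prod_cores // 2
    return [
        ServerSpec("Production QV / Web Server", 2, prod_ram_gb, prod_cores),
        ServerSpec("Publisher", 1, half_ram, half_cores),
        ServerSpec("Test", 1, half_ram, half_cores),
        ServerSpec("Dev", 1, half_ram, half_cores),
    ]


if __name__ == "__main__":
    for spec in build_plan():
        print(f"{spec.role}: {spec.nodes} node(s), "
              f"{spec.ram_gb} GB RAM, {spec.cores} cores each")
```

Changing the two production figures once the workload is better understood rescales the other tiers automatically.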

           

           

          It is all about estimating the length of a piece of elastic that you have never seen.

           

           

          Sweet dreams,    Bill