1 Reply Latest reply: Aug 3, 2016 10:42 AM by Fikri Shihab RSS

    Filter data with conditions

    Fikri Shihab

      Hi Qlikview Experts,

       

      I have some requirements that I need to process with source and also the processed data.

       

      The source table contains several columns, but only a few columns that I'd like to focus.


      The interested columns are date, time, network_id, cell_id, region, terminal_name, terminal_SWV, hardware_status, terminal_lat, terminal_long.


      The daily log files contains the information for each terminal roughly every 5 minutes. Thus each line denotes the information of each hardware or terminal every 5 minutes. This includes the status of the hardware or terminal in network every 5 minutes such as its status (in network or undetected and how long it has been undetected), where the terminal is located and its region, etc.

       

      I need to count and identify terminal or hardware that is in network or out of network/undetected for certain period of time (e.g. < 5 minutes, 5-10 minutes, etc.). Unfortunately, the hardware status column is in string and does not really give direct indicator where the data can be easily filtered. So, the strings in the data needs to be processed first. What I have done to do is to convert the text in the column hardware_status to seconds, something like below in the script:

       

       

      hardware_status_table:

      LOAD

           date, time, network_id, cell_id, region,

            terminal_name, terminal_swv, hardware status,

        // (UNDETECTED 14 days 10 h 40 min 3 sec)   - indicating how long the terminal is out of network    

            mid(hardware status, index(hardware status, 'sec')-3, 2) as second,

             mid(hardware status, index(hardware status, 'min')-3, 2) as minute,

             mid(hardware status, index(hardware status, 'h')-3, 2) as hour,

             mid(hardware status, index(hardware status, 'days')-3, 2) as day,

       

       

             if (mid(hardware status, index(hardware status, 'sec')-3, 2)>0, mid(hardware status, index(terminal_age, 'sec')-3, 2),0) +

             if (mid(hardware status, index(terminal_age, 'min')-3, 2)>0, mid(hardware status, index(terminal_age, 'min')-3, 2) * 60, 0) +

             if (mid(hardware status, index(hardware status, 'h')-3, 2)>0, mid(hardware status, index(hardware status, 'h')-3, 2) *60 * 60,0) +

             if (mid(hardware status, index(hardware status, 'days')-3, 2)>0, mid(hardware status, index(hardware status, 'days')-3, 2) * 60 * 60 * 24, 0) as[Time in sec],    

       

       

          if (left(hardware status,11) ='(UNDETECTED', 'UNDETECTED', if (left(terminal_age,11)='(IN_NETWORK', 'IN_NETWORK','UNKNOWN')) as IN_NETWORK,

         

           terminal_lat, terminal_lon,

         

           if (previous(terminal_name) <> terminal_name,1,0) as back_in,

           Timestamp(Floor( Timestamp( date & ' ' & time ,'DD/MM/YYYY hh:mm:ss'),1/288)) as [Time Stamp]

       

       

      FROM

      [Source\*.csv]

      (txt, codepage is 1252, embedded labels, delimiter is ',', msq)

       

       

      where wildmatch(terminal_name,'FX*') and date >='20/06/2016';

       

       

      store terminal_status_table into filename.qvd(qvd);

       

      I created an output with a straight table like the attached and some charts as below:

       

      undetected1.png

       

      undetected2.png

       

      The dimension is Timestamp.

       

      The expression for the column indicating terminal undetected between 5-10 minutes in the straight table is simply

       

      =sum(if(([Time in sec]>300 and [Time in sec]<=600 and IN_NETWORK='IN_NETWORK'), 1,0))


      Could someone verify if the above script and expression are correct to describe the requirements. Is there also another simpler way?

       

      Now, I found that on the same day there are many terminals with the same id in different zone show "undetected" but also shows in_network in different zone. which are not expected in the calculation. Example is below.    

       

      Datehardware_idnetwork_nameRegionStatusLess than 5 MinLess than 10 MinUndetected (sec)In Network (sec)
      2016-06-20FX151.126649145Zone1IN_NETWORK240-
      2016-06-20FX151.126649145Zone1UNDETECTED003183160

       

      I'd like to calculate availability of total terminals, but need to filter out this type of condition. So the condition is if the same terminals in different zone on the same period (e.g. day or possibly another defined period) shows undetected, then do not include in the calculation. In this case the terminal status may be old and just simply be incorrectly reported as as it moves to different zone, so we need to discard this.

       

      I'd like to filter this out so that the result of the expression above does not include the undesired condition.

       

      What would be the right expression for this?

       

      Thanks for help!