I have a table with data like this:
ServerName , Timestamp, Service,Status
abcd,25/07/2015 10:12:45 AM,service1,stopped
abcd,25/07/2015 10:12:45 AM,service2,running
abcd,25/07/2015 10:12:47 AM,service3,running
abcd,25/07/2015 10:12:48 AM,service4,running
efgh,25/07/2015 10:12:55 AM,service1,running
efgh,25/07/2015 10:12:55 AM,service2,running
efgh,25/07/2015 10:12:56 AM,service3,stopped
efgh,25/07/2015 10:12:57 AM,service4,running
My requirement is to add a new column, OVERALLSTATUS, to this table that shows whether service1 is running on each server, like below:
ServerName , Timestamp, Service,Status,OVERALLSTATUS
abcd,25/07/2015 10:12:45 AM,service1,stopped,0
abcd,25/07/2015 10:12:45 AM,service2,running,0
abcd,25/07/2015 10:12:47 AM,service3,running,0
abcd,25/07/2015 10:12:48 AM,service4,running,0
efgh,25/07/2015 10:12:55 AM,service1,running,1
efgh,25/07/2015 10:12:55 AM,service2,running,1
efgh,25/07/2015 10:12:56 AM,service3,stopped,1
efgh,25/07/2015 10:12:57 AM,service4,running,1
Basically, if service1 is running, OVERALLSTATUS should be 1 for all services on that server; if it is not running, OVERALLSTATUS should be 0.
Please suggest a way to do this in the load script, as the data will be huge.
Try something like this:
LOAD *, rangesum(if(Service = 'service1' and Status = 'running', 1, 0), Peek(OVERALLSTATUS)) as OVERALLSTATUS Inline [
ServerName , Timestamp, Service,Status
abcd,25/07/2015 10:12:45 AM,service1,stopped
abcd,25/07/2015 10:12:45 AM,service2,running
abcd,25/07/2015 10:12:47 AM,service3,running
abcd,25/07/2015 10:12:48 AM,service4,running
efgh,25/07/2015 10:12:55 AM,service1,running
efgh,25/07/2015 10:12:55 AM,service2,running
efgh,25/07/2015 10:12:56 AM,service3,stopped
efgh,25/07/2015 10:12:57 AM,service4,running
];
Use an if condition such as:
Load
ServerName , Timestamp, Service,Status,
if(Service ='service1' and Service='running',1,0) as OVERALLSTATUS
Hi Sujeet,
Your expression generates only 0. Please check the expected output that Binod asked for.
Assuming your data source is sorted by Server and the first service is always service1, then:
LOAD ServerName,
Timestamp,
Service,
Status,
If(ServerName = Previous(ServerName), Peek(OVERALLSTATUS), If(Service = 'service1' And Status = 'running', 1, 0)) As OVERALLSTATUS
FROM ...
If your source data is not sorted, or service1 might not be the first service, then you will not be able to do this in one pass (and you said your data set is huge).
This is not the right solution. If I add a new set of rows for a server and its services, the value simply adds up with the previous one. I want only 0 and 1 as values, not a cumulative sum:
LOAD *, rangesum(if(Service = 'service1' and Status = 'running', 1, 0), Peek(OVERALLSTATUS)) as OVERALLSTATUS Inline [
ServerName , Timestamp, Service,Status
abcd,25/07/2015 10:12:45 AM,service1,stopped
abcd,25/07/2015 10:12:45 AM,service2,running
abcd,25/07/2015 10:12:47 AM,service3,running
abcd,25/07/2015 10:12:48 AM,service4,running
efgh,25/07/2015 10:12:55 AM,service1,running
efgh,25/07/2015 10:12:55 AM,service2,running
efgh,25/07/2015 10:12:56 AM,service3,stopped
efgh,25/07/2015 10:12:57 AM,service4,running
efgh,26/07/2015 10:12:55 AM,service1,running
efgh,26/07/2015 10:12:55 AM,service2,running
efgh,26/07/2015 10:12:56 AM,service3,stopped
efgh,26/07/2015 10:12:57 AM,service4,running
];
Yes, the services might not be sorted. Also, the same server will be repeated with different timestamps.
Then you can do this through a mapping table:
MapStatus:
Mapping LOAD ServerName & '-' & Timestamp,
If(Status = 'running', 1, 0)
FROM ....
WHERE Service = 'service1';
LOAD ServerName,
Timestamp,
Service,
Status,
ApplyMap('MapStatus', ServerName & '-' & Timestamp, 0) As OVERALLSTATUS
FROM ...;
I think this will give the best possible performance for a large data set.
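One caveat with keying on the full Timestamp: in the sample data, the rows of one server's batch differ by a second or two (service1 at 10:12:55, service3 at 10:12:56), so only rows sharing service1's exact timestamp would match and the rest would fall back to the default 0. Below is a minimal sketch of the same mapping idea keyed on ServerName plus the calendar date instead. It assumes each server is polled at most once per day, which the thread does not confirm; the names Raw, Final, BatchKey and Service1Running are hypothetical, and the inline rows are a subset of the sample data.

```
// Subset of the sample rows from the thread, loaded inline for illustration.
Raw:
LOAD * Inline [
ServerName,Timestamp,Service,Status
abcd,25/07/2015 10:12:45 AM,service1,stopped
abcd,25/07/2015 10:12:47 AM,service3,running
efgh,25/07/2015 10:12:55 AM,service1,running
efgh,25/07/2015 10:12:56 AM,service3,stopped
efgh,26/07/2015 10:12:55 AM,service1,running
efgh,26/07/2015 10:12:56 AM,service3,stopped
];

// One mapping row per server per day, built from the service1 rows only.
// ASSUMPTION: a server is polled at most once per calendar day.
MapStatus:
Mapping LOAD
    ServerName & '-' & Floor(Timestamp#(Timestamp, 'DD/MM/YYYY hh:mm:ss TT')) As BatchKey,
    If(Status = 'running', 1, 0) As Service1Running
Resident Raw
Where Service = 'service1';

// Look up the service1 status for every row of the same server and day.
// The default 0 applies when no service1 row exists for that key.
Final:
NoConcatenate
LOAD *,
    ApplyMap('MapStatus',
             ServerName & '-' & Floor(Timestamp#(Timestamp, 'DD/MM/YYYY hh:mm:ss TT')),
             0) As OVERALLSTATUS
Resident Raw;

DROP Table Raw;
```

Because of the Where clause, the mapping table holds only the service1 rows (one per server per batch), so it stays much smaller than the fact table. If servers are polled more than once a day, the key would need a coarser or explicit batch identifier in place of the date.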
Try something like this
Data:
LOAD *,Timestamp#(Timestamp,'DD/MM/YYYY hh:mm:ss TT') as Time Inline [
ServerName, Timestamp, Service, Status
abcd,25/07/2015 10:12:45 AM,service1,stopped
abcd,25/07/2015 10:12:45 AM,service2,running
abcd,25/07/2015 10:12:47 AM,service3,running
abcd,25/07/2015 10:12:48 AM,service4,running
abcd,25/07/2015 10:12:49 AM,service4,running
efgh,25/07/2015 10:12:55 AM,service1,running
efgh,25/07/2015 10:12:55 AM,service2,running
efgh,25/07/2015 10:12:56 AM,service3,stopped
efgh,25/07/2015 10:12:57 AM,service4,running ];
New:
NoConcatenate
LOAD *,
if(RowNo()=1 and lower(trim(Status))='stopped',0,
if(RowNo()=1 and lower(trim(Status))='running',1,
if(ServerName<>Previous(ServerName) and lower(trim(Status))='stopped',0,
if(ServerName<>Previous(ServerName) and lower(trim(Status))='running',1,
if(ServerName=Previous(ServerName),Peek('OVERALLSTATUS')))))) as OVERALLSTATUS
Resident Data
Order by ServerName,Service,Timestamp asc;
DROP Table Data;
Or you can just use Order by ServerName,Timestamp asc;
Please see the attached
Dear jontydkpi
I was wondering whether this produces a big mapping table because of the timestamp. Is it OK to have mapping tables with thousands of records? In this case, it would be equal to the number of fact table records.