Hello All,
Hope everyone is well.
We are working on a problem involving a large dataset (~10 GB) on which we would like to do some analysis.
We are working with Vehicle Tracking System data. We have a table named DataLog with the following fields:
log_id
log_tun_id - vehicle id
log_dateintime - datetime
log_di1 - event
log_port - communication port
Every minute, a record is inserted into the DataLog table for each vehicle.
What we want to do is measure the number of packets received on each port while the vehicle engine is on. The engine state is encoded as follows:
log_di1 = 1 - Ignition ON
log_di1 = 0 - Ignition OFF
We want to find the number of packets received between Ignition ON and Ignition OFF for each vehicle.
For example:
log_dateintime, log_tun_id, log_di1, log_port
Jan 10, 2016 10:00 am , 1200, 1, UDP-1
Jan 10, 2016 10:03 am , 1200, 1, UDP-2
Jan 10, 2016 10:04 am , 1200, 1, UDP-1
Jan 10, 2016 10:07 am , 1200, 1, UDP-4
Jan 10, 2016 10:10 am , 1200, 0, UDP-2
In the above example, the vehicle engine was "ON" from 10:00 am to 10:10 am (10 minutes), but we received only 5 packets, whereas we should have received 10 packets (one packet per minute).
How can I group by log_tun_id and pair each first occurrence of log_di1 = 1 with the next occurrence of log_di1 = 0, so that each pair forms a block during which the vehicle engine is "ON"?
After that I can do the analysis on port numbers, dates, etc.
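To make it concrete, the kind of grouping I am after would look roughly like the untested pandas sketch below. The file name is just a placeholder for the sample CSV linked underneath, the column names are the DataLog fields listed above, and it assumes an ON block runs from the first log_di1 = 1 row up to and including the next log_di1 = 0 row for the same vehicle, with one packet expected per minute:

import pandas as pd

# Untested sketch: read the sample CSV (path is a placeholder) and sort
# each vehicle's log chronologically.
df = pd.read_csv("portanalysis.csv", parse_dates=["log_dateintime"])
df = df.sort_values(["log_tun_id", "log_dateintime"]).reset_index(drop=True)

# Previous ignition value for the same vehicle (NaN on a vehicle's first row).
prev_di1 = df.groupby("log_tun_id")["log_di1"].shift(1)

# An ON block starts where log_di1 flips to 1; number the blocks per vehicle.
block_start = (df["log_di1"] == 1) & (prev_di1 != 1)
df["block_id"] = block_start.astype(int).groupby(df["log_tun_id"]).cumsum()

# A block consists of its ON rows plus the single OFF row that closes it.
closes_block = (df["log_di1"] == 0) & (prev_di1 == 1)
on = df[((df["log_di1"] == 1) | closes_block) & (df["block_id"] > 0)].copy()

# One row per ignition ON/OFF block: start, end, packets actually received,
# and packets expected at one packet per minute.
summary = (on.groupby(["log_tun_id", "block_id"])
             .agg(ignition_on=("log_dateintime", "min"),
                  ignition_off=("log_dateintime", "max"),
                  packets_received=("log_dateintime", "size")))
summary["expected_packets"] = (
    (summary["ignition_off"] - summary["ignition_on"])
    .dt.total_seconds() // 60).astype(int)

# Packet count per port within each block, for the per-port analysis.
per_port = (on.groupby(["log_tun_id", "block_id", "log_port"])
              .size().rename("packets"))

print(summary)
print(per_port)

On the five example rows above for vehicle 1200, this should give one block with ignition_on = 10:00, ignition_off = 10:10, packets_received = 5 and expected_packets = 10, which is exactly the comparison described earlier.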
Sample Data - https://www.dropbox.com/s/en2j7mohe94dfq6/portanalysis.csv?dl=0
Resultant Data - https://www.dropbox.com/s/s9phja7tkh0stdn/portanalysis-result.xlsx?dl=0
I would appreciate it if someone could assist.
Thanks.
Hi,
Can you please share your sample data to work on?
I am not able to access the data set. Can you attach the sample data?
Are you looking for something like the below?
If you are looking for something different, please specify the expected output and provide the data.
Thanks
Pradeep