Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi gurus,
Due to data is not 'clean' enough, I experienced data that has these fields. Please see attached
Bus_ID act as as unique of individual bus
Bus Name is the name of the bus
Date is the date the bus starts it journey
Bus_Passenge is the amount of passenger the bus take
As you can see from the table, Bus_ID 411 and 800 have duplicate record. Due to the user wrong input on the Bus_ID, it count it as two recrods. Therefore when i try to find the total passenger, it added all even though it is suppose to be one record(400) for bus ID 411.
Is there anyway I can do the sum of the passenger by having one record for each bus _ID. It only takes in record that has the earliest date.
Thanks
Message was edited by: Bennn Bennn New file
any help
Have you try to separate the datas in the script.
One dimension table with the bus datas and one fact table with the journey datas.
[Bus]:
LOAD DISTINCT Bus_ID as %Key_Bus_Id,
Bus_Name,
Bus_Passenger
Resident Sheet1;
[Journey]:
LOAD Bus_id as %Key_Bus_Id,
Date
Resident Sheet1;
DROP Table Sheet1;
After that you can sum without any specific formula.
Hi Benn,
test qvw. is empty.
attach some data or provide file for further assistance.
In case of duplicate values make straight table with expression as distinct.
e.g count(distinct(passenger))
Use the expression:
Sum(Aggr((FirstSortedValue(Bus_Passenger,Date)),Bus_ID))
PFA
I realized the exmaplethat I gave is not accurate enough.
the sum of the passenger is not a distinct value. Is an individual recrodt that sum up the value of passenger in a bus.
That is why if i will to use firstsorted value, it will give 1 as the output as each record stands for 1
Hello Bennn,
Please explain a bit more. Would be better if you can share a sample data in excel with desired result described.
ok give me five minutes. hope you are still online
Edit
From the new attached file, there are individual record on the table.When i sum up the bus_passenger, it will take all the records. However, due to data are not clean, some records are not means to gel together. Notably BMW, it has three different date, 07, 08, and 18. Logically, I would want BMW to only take 07 records, earliest record, which is suppose to be 300 rather than 1200.
Another example will be Audi, where there is two date, 09 and 20. Suppose to take earliest which is 09, value will be 590 compare to 1380 currently.
Thanks very much