Skip to main content
Announcements
Customer Spotlight: Discover what’s possible with embedded analytics Oct. 16 at 10:00 AM ET: REGISTER NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
n1ef5ng1
Creator
Creator

Sum Distinct *Attached file included

Hi gurus,

Due to data is not 'clean' enough, I experienced data that has these fields. Please see attached

Bus_ID act as as unique of individual bus

Bus Name is the name of the bus

Date is the date the bus starts it journey

Bus_Passenge is the amount of passenger the bus take

As you can see from the table, Bus_ID 411 and 800 have duplicate record. Due to the user wrong input on the Bus_ID, it count it as two recrods. Therefore when i try to find the total passenger, it added all even though it is suppose to be one record(400) for bus ID 411.

Is there anyway I can do the sum of the passenger by having one record for each bus _ID. It only takes in record that has the earliest date.

Thanks

Message was edited by: Bennn Bennn New file

10 Replies
n1ef5ng1
Creator
Creator
Author

any help

Anonymous
Not applicable

Have you try to separate the datas in the script.

One dimension table with the bus datas and one fact table with the journey datas.

[Bus]:

LOAD DISTINCT Bus_ID as %Key_Bus_Id,
    
Bus_Name,
    
Bus_Passenger
Resident Sheet1;

[Journey]:
LOAD Bus_id as %Key_Bus_Id,
          
Date
Resident Sheet1;

DROP Table Sheet1;

After that you can sum without any specific formula.

ashwanin
Specialist
Specialist

Hi Benn,

test qvw. is empty.

attach some data or provide file for further assistance.

ashwanin
Specialist
Specialist

In case of duplicate values make straight table with expression as distinct.

e.g count(distinct(passenger))

tresesco
MVP
MVP

Use the expression:

Sum(Aggr((FirstSortedValue(Bus_Passenger,Date)),Bus_ID))

PFA

n1ef5ng1
Creator
Creator
Author

I realized the exmaplethat I gave is not accurate enough.

the sum of the passenger is not a distinct value. Is an individual recrodt that sum up the value of passenger in a bus.


That is why if i will to use firstsorted value, it will give 1 as the output as each record stands for 1

tresesco
MVP
MVP

Hello Bennn,

Please explain a bit more. Would be better if you can share a sample data in excel with desired result described.

n1ef5ng1
Creator
Creator
Author

ok give me five minutes. hope you are still online

n1ef5ng1
Creator
Creator
Author

Edit

From the new attached file, there are individual record on the table.When i sum up the bus_passenger, it will take all the records. However, due to data are not clean, some records are not means to gel together. Notably BMW, it has three different date, 07, 08, and 18. Logically, I would want BMW to only take 07 records, earliest record, which is suppose to be 300 rather than 1200.

Another example will be Audi, where there is two date, 09 and 20. Suppose to take earliest which is 09, value will be 590 compare to 1380 currently.

Thanks very much