Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

mark duplicate records

Hello,

       I am trying to mark duplicate rows in my data based on a condition. My data has several fields like name,lifecycleStatus. There are multiple rows with the same name+ lifecycleStatus as well as different entries for one name with different lifecycleStatus.

for e.g.

Name  LifecycleStatus

A          Disposed

A          Disposed

B          User Acceptance

B          Disposed

C          Operational

C          User Acceptance

C          Disposed

I am trying to mark the duplicates so as to retain the below:

Name  LifecycleStatus

A          Disposed     ( Keep/Mark/flag only one row of the same entries as valid)

B          User Acceptance  (Keep/Mark/flag only the row that are not marked as Disposed)

C          Operational           (Keep/Mark/flag only the row(s) that are not marked as Disposed)

C          User Acceptance

I did a left join on self to get a count of Name as a new column to mark the duplicates. So anything with count > 2 is a duplicate. But within the duplicates I need to flag them further based on the lifecycle status.

I can create an expression to further flag down the Names with distinct Lifecycle Status with an expression for e.g.

=if(aggr(count(distinct [LifecycleStatus])>1,[Key_Name]),[Key_Name])

How an I achieve something similar during load? I need to mark the duplicates during load. I have attached a sample qvw.

Any help/guidance is highly appreciated.

Thanks,

Samira

7 Replies
swuehl
MVP
MVP

Maybe like


INPUT:
LOAD DISTINCT * INLINE [
Name, Status
A  ,        Disposed
A   ,       Disposed
B    ,      User Acceptance
B     ,     Disposed
C     ,     Operational
C ,         User Acceptance
C  ,        Disposed
];

LEFT JOIN
LOAD Name, Count(Status) as Count
Resident INPUT
Group by Name;

RESULT:
NOCONCATENATE
LOAD Name, Status
Resident INPUT
Where Count =1 or Status <> 'Disposed';

DROP TABLE INPUT;

MarcoWedel

Hi,

to identify e.g. only the first of each Key/Status combination, maybe something like this might help:

CMDB:

LOAD CMDB_Key_Name,

    NodeIP,

    CMDB_LifecycleStatus,

    CMDB_OperatingSystem,

    CMDB_sys_class_name,

    CMDB_sys_id,

    Key_OS,

    AutoNumber(RecNo(),CMDB_Key_Name&'|'&CMDB_LifecycleStatus) as FlagKeyStat

FROM QlikCommunity_Thread_201320.csv (txt, utf8, embedded labels, delimiter is ',', msq, no eof);

regards

Marco

maxgro
MVP
MVP

INPUT: 

LOAD DISTINCT rowno() as Id, * INLINE [ 

Name, Status 

A  ,        Disposed 

A   ,       Disposed 

B    ,      User Acceptance 

B     ,     Disposed 

C     ,     Operational 

C ,         User Acceptance 

C  ,        Disposed

D, Disposed

]; 

load

  Id, Name, Status,

  Name <> Peek('Name') or Name = Peek('Name') and Status <> 'Disposed' as Flag     // 1 not dupl

Resident INPUT

order By Name, Status  desc;

DROP Table INPUT;

sinanozdemir
Specialist III
Specialist III

Just another approach:

Capture.PNG

Here is the data model:

Capture.PNG

Thanks

Not applicable
Author

Hello, thank you for the fast response.  I tried the solution you suggested  but I was losing all the entries  with same Name and Status. So in my e.g. above I was losing all entries for A  but it handled the other scenario pretty well. I was able to use your suggestion (along with below suggestion from MG to use peek) and incorporate it in my model to get the results.

I basically created 2 flags, one for count (distinct Lifecycle status) per Name and another flag for count (Name) per same lifecycle status for itself and then used peek to pick the first record from within the same duplicate entries ( for Name + Status).

Thanks,

Samira

Not applicable
Author

Hello M G,

              Thank you for your suggestions. When I tried your approach against my data, it retained all rows for duplicates with same Name but different Status, however it handled the duplicates with the same Name + Status very well. In my e.g. above it retained the below

    

rowNameStatusflag
13099BDisposed-1
568BUser Acceptance-1
205ADisposed-1
11881ADisposed

0

However the result I was looking for is as below:

    

rowNameStatusflag
13099BDisposed0
568BUser Acceptance-1
205ADisposed-1
11881ADisposed0

I was able to create another flag as suggested by swuehl and use your suggestion for peek to get the results.

Thanks,

Samira

Not applicable
Author

Hi Marco,

         Good suggestion, thank you. I will try that in my solution as I make further progress. I have multiple data sources that I plan to merge into this and this will come handy for sure.

Thanks,

Samira