Anonymous
Not applicable

Unique values from the columns

I am new to Talend; I have only been using it for a week, so could anyone please help me?

How can I get the unique values from the columns without using a database? I am using version 6.4.

This also includes getting the data from all the columns into one column, with every value unique (i.e. not repeated in any of the columns), and then posting that data.

10 Replies
Anonymous
Not applicable
Author

Welcome to Talend! Could you give a bit more information? For example, a sample of your input data and an example of how you want it to be output. That should give enough information to help you out.

Anonymous
Not applicable
Author

My input data is the GitHub repository topics, and the output should be the unique GitHub topics of all the repositories, with no topic repeated.


Anonymous
Not applicable
Author

I'm afraid I will need actual examples of the data. The more information you give about a problem, the better the chance of a response. If people have to search for examples of your data, they tend not to be so willing to respond. 

Be sure not to supply anything sensitive. Replace any sensitive or private data with random data.

Anonymous
Not applicable
Author

I am attaching the Excel file with the details.

These details were obtained from GitHub, using the GitHub APIs to get the repository names and their topics.

Some of the topics are repeated, and I need to get all the unique topics in a single column, with no repeats.

I also need to post those unique values to another website.

In the Excel file there are columns topics 0, topics 1 and topics 2. I need these three columns combined into one column that contains only unique values.
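A minimal sketch of this combine-and-deduplicate step in plain Python (outside Talend), assuming each record is a dictionary keyed by the Excel column names `topics 0`, `topics 1` and `topics 2`, and that unused topic slots hold an empty string or `None`. The function name and sample rows are made up. It walks the columns one after another and keeps only the first occurrence of each topic:

```python
def unique_topics(records, columns=("topics 0", "topics 1", "topics 2")):
    """Gather the values of the given columns into one list, column after
    column, keeping only the first occurrence of each topic."""
    seen = set()
    all_tags = []
    for column in columns:          # one column after another
        for record in records:
            topic = record.get(column)
            if not topic:           # skip empty slots (None or "")
                continue
            if topic not in seen:
                seen.add(topic)
                all_tags.append(topic)
    return all_tags


# Hypothetical sample rows standing in for the Excel data.
rows = [
    {"topics 0": "python", "topics 1": "api", "topics 2": "talend"},
    {"topics 0": "java", "topics 1": "python", "topics 2": ""},
    {"topics 0": "api", "topics 1": None, "topics 2": None},
]
print(unique_topics(rows))  # ['python', 'java', 'api', 'talend']
```

Checking membership in a set takes constant time on average, so a single pass over 1,000 to 10,000 records is a small workload.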

 

Note: I am doing this for 1,000 to 10,000 records; the Excel file I attached contains only around 30 to 40 of them.

The flow is:

tLoop --> tJava --> tRESTClient --> tExtractJSONFields --> tUnite --> tMap --> tLogRow

 

In this flow, tLoop and tJava loop through GitHub's pages of results (pagination) so that more than 100 records can be retrieved.
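A sketch of the pagination loop that tLoop and tJava implement here, written in plain Python. It assumes the GitHub REST API's `per_page` (at most 100) and `page` query parameters, and stops when a page returns fewer than `per_page` items; GitHub also sends a `Link` response header pointing to the next page, which is another way to decide when to stop. The endpoint URL is a placeholder, and `fake_fetch` is a hypothetical stand-in for the tRESTClient call so the example runs offline:

```python
from urllib.parse import parse_qs, urlencode, urlparse

PER_PAGE = 100  # GitHub's REST API returns at most 100 items per page


def page_url(base_url, page, per_page=PER_PAGE):
    """Build the URL of one result page (GitHub numbers pages from 1)."""
    return base_url + "?" + urlencode({"per_page": per_page, "page": page})


def fetch_all(fetch_page, base_url, per_page=PER_PAGE):
    """Request page 1, 2, 3, ... until a page returns fewer than per_page
    items, and return every item in one list."""
    items = []
    page = 1
    while True:
        batch = fetch_page(page_url(base_url, page, per_page))
        items.extend(batch)
        if len(batch) < per_page:  # last (partial or empty) page reached
            break
        page += 1
    return items


# Hypothetical stand-in for tRESTClient: serves 250 fake repositories.
FAKE_REPOS = [{"name": f"repo{i}"} for i in range(250)]


def fake_fetch(url):
    query = parse_qs(urlparse(url).query)
    page = int(query["page"][0])
    per_page = int(query["per_page"][0])
    start = (page - 1) * per_page
    return FAKE_REPOS[start:start + per_page]


repos = fetch_all(fake_fetch, "https://api.github.com/users/USERNAME/repos")
print(len(repos))  # 250 (pages of 100, 100 and 50)
```

All pages end up in one list, which is the role tUnite plays in the job.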

tRESTClient uses the GET method to fetch all the repository names and their topics.

tExtractJSONFields extracts only the required fields.
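A sketch of the field extraction in plain Python, assuming (as the column names suggest) that each repository object in the response carries a `name` and a `topics` array. The response body is invented and heavily trimmed, and both function names are hypothetical. `extract_rows` spreads the topics over fixed `topics 0`, `topics 1`, ... columns like the Excel file; `extract_pairs` is an alternative that emits one row per topic:

```python
import json

# Hypothetical response body: a JSON array of repository objects. Only the
# "name" and "topics" keys are read; real responses carry many more fields.
RESPONSE_BODY = """
[
  {"name": "repo-a", "topics": ["python", "api", "talend"]},
  {"name": "repo-b", "topics": ["java"]},
  {"name": "repo-c", "topics": []}
]
"""


def extract_rows(body, max_topics=3):
    """Keep the repository name and spread its topics over fixed columns
    "topics 0", "topics 1", ... (empty string when a slot is unused)."""
    rows = []
    for repo in json.loads(body):
        topics = repo.get("topics", [])
        row = {"name": repo["name"]}
        for i in range(max_topics):
            row[f"topics {i}"] = topics[i] if i < len(topics) else ""
        rows.append(row)
    return rows


def extract_pairs(body):
    """Alternative: one (name, topic) row per topic, so no topic is dropped
    and all topics already sit in a single column."""
    return [(repo["name"], topic)
            for repo in json.loads(body)
            for topic in repo.get("topics", [])]


rows = extract_rows(RESPONSE_BODY)
print(rows[1])  # {'name': 'repo-b', 'topics 0': 'java', 'topics 1': '', 'topics 2': ''}
print(extract_pairs(RESPONSE_BODY))
```

A fixed number of topic columns silently drops any topics beyond the last column, and a repository can have more topics than that. One row per topic avoids the problem and already gives the single topic column this thread is after.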

tUnite combines the data from all the pages (more than 100 records in total) into a single output.

tMap maps the data.

tLogRow prints the output.

 


Capture.JPG
Anonymous
Not applicable
Author

Take a look at the last solution on this thread.

https://community.talend.com/t5/Design-and-Development/Searching-distincrt-elements-in-a-line/m-p/21...

First you need to combine the columns. You can do this with a tMap. Just link all of the columns together using simple Java String concatenation. Add a separator, maybe a semicolon. Something like this:

row1.column1 +";"+row1.column2+";"+row1.column3

You can then use the code in the thread I linked to above to remove duplicates. 
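The linked thread's code is not reproduced here (the link above is truncated), so the following is an independent Python sketch of the concatenate-then-split idea, not that code. The helper names and sample rows are hypothetical. Inside Talend, the splitting and deduplication steps correspond roughly to what the tNormalize and tUniqRow components do:

```python
def concat_columns(row, columns, sep=";"):
    """Join the chosen columns of one row with a separator, like the tMap
    expression row1.column1 + ";" + row1.column2 + ";" + row1.column3."""
    return sep.join(row[column] for column in columns)


def distinct_parts(lines, sep=";"):
    """Split every concatenated line on the separator and keep each
    non-empty part once, in order of first appearance."""
    distinct = {}  # dicts keep insertion order (Python 3.7+)
    for line in lines:
        for part in line.split(sep):
            if part:  # an empty cell leaves an empty part, e.g. "java;python;"
                distinct.setdefault(part, None)
    return list(distinct)


# Hypothetical rows with three columns; the third is empty in the second row.
rows = [
    {"column1": "python", "column2": "api", "column3": "talend"},
    {"column1": "java", "column2": "python", "column3": ""},
]
lines = [concat_columns(r, ["column1", "column2", "column3"]) for r in rows]
print(lines)                  # ['python;api;talend', 'java;python;']
print(distinct_parts(lines))  # ['python', 'api', 'talend', 'java']
```

This only works while no value itself contains the separator. Also note that in Java, concatenating a null column produces the text "null", so empty topic cells may need a null check in the tMap expression.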

 

Anonymous
Not applicable
Author

Actually, I need the data from all the columns in one column, which means not concatenation.

The values should come one after the other. For example, the input is in @captureJPG and the output should look like @output1.

Then I need to get the distinct values of that single column, AllTags, from output1.



Also, could you be more specific about how to use the distinct code in the thread you linked? As I am new, I don't know where to write or paste the code.


output1.JPG
Capture.JPG
Anonymous
Not applicable
Author

I don't know exactly what you want, but I'll try to answer your question.

Try this:

# Import the pandas package
import pandas as pd

# Create a dictionary of five columns with five values each
data = {
    'A': ['A1', 'A2', 'A3', 'A4', 'A5'],
    'B': ['B1', 'B2', 'B3', 'B4', 'B4'],
    'C': ['C1', 'C2', 'C3', 'C3', 'C3'],
    'D': ['D1', 'D2', 'D2', 'D2', 'D2'],
    'E': ['E1', 'E1', 'E1', 'E1', 'E1'],
}

# Convert the dictionary into a DataFrame
df = pd.DataFrame(data)

# Get the unique values of column 'B' and print them
# (a bare expression only shows its value in an interactive session)
print(df['B'].unique())
Anonymous
Not applicable
Author

I want to concatenate my columns, but in a different way: I want all the columns' data in a single column, i.e. one column's data after another's.

It should look like the output1.JPG image, where I have marked the start and end of each column's data: if column 1 has 10 values and column 2 has 20 values, combining them should give 30 values.
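A small sketch of stacking columns one after another, with made-up values: column 1 holds 10 values and column 2 holds 20, so the stacked column holds 30, and the distinct step then removes the repeats. With pandas, as in the earlier reply, `pd.concat([df["topics 0"], df["topics 1"], df["topics 2"]], ignore_index=True).unique()` should produce the same kind of single column, assuming a DataFrame `df` with those column names:

```python
from itertools import chain

# Made-up columns of different lengths: 10 values and 20 values.
column_1 = [f"tag{i}" for i in range(10)]     # tag0 .. tag9
column_2 = [f"tag{i}" for i in range(5, 25)]  # tag5 .. tag24 (tag5..tag9 repeat)

# Stack the columns one after another (not joined into strings):
# every value of column 1, then every value of column 2.
all_tags = list(chain(column_1, column_2))
print(len(all_tags))  # 30

# Distinct values: first occurrence wins and the original order is kept.
distinct_tags = list(dict.fromkeys(all_tags))
print(len(distinct_tags))  # 25
```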

Also, you gave some Python code in the previous message. Could you tell me how to implement it?


output1.JPG
Anonymous
Not applicable
Author

This is why I asked for the data and for a good explanation of the problem. The data needs to be in a format that people can use; screenshots are not good enough. If you had given your input data in a format I could use, and your output data so that I could understand it, I could have helped you a while ago. However, I will need to think about your requirement to try to understand it, since the clues you have given point to an unusual requirement. I will try to look at this later, as I have a lot I need to work on.