Suppose I have two CSV files: language.csv and languagecheck.csv.
Note that there is no direct relation between them.
I have two jobs, with two questions about Job1 and one question about Job2.
Job1:
tMap: it is a Cartesian join that produces 9 rows.
I put a println inside the tJavaRow:
System.out.println(input_row.lookup_id);
The result should be 11, 12 and 13,
but it is:
11 testtest aa
12 rest bb
13 quest cc
11 testtest aa
12 rest bb
13 quest cc ...
Question 1: Why does this happen, and how can I fix it?
Question 2: When I open the result, I again see something strange. Why?
Job2: in this job, I compare the Name column of languagecheck.csv with the Name column of language.csv.
tMap:
The result should have two columns, id and count, and the value of count should be 0, but the result is:
Question 3: Where did these two extra columns come from, and why is the value of count 1?
The result should be
id count
1 0
2 0 ....
Note: I don't want to create a join between the two CSVs inside tMap.
Hi,
For the first question: your data uses tab as the field (column) separator, but most probably you have not changed the default field separator, semicolon, to tab in one or both of the input files. I got the right results.
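To illustrate why the println shows a whole row instead of just the id, here is a minimal plain-Java sketch. It is not Talend's generated code: the class name `SeparatorDemo` and the helper `splitFields` are made up for illustration, and the sample line assumes the rows are tab-delimited as in the output above.

```java
import java.util.regex.Pattern;

class SeparatorDemo {
    // Hypothetical helper: split one raw line on a literal field separator.
    public static String[] splitFields(String line, String separator) {
        return line.split(Pattern.quote(separator), -1);
    }

    public static void main(String[] args) {
        // Assumed sample row: id, name and a third column, separated by tabs.
        String line = "11\ttesttest\taa";

        // Reading with the semicolon: no separator is found,
        // so the first "column" is the entire line.
        String[] wrong = splitFields(line, ";");
        System.out.println(wrong.length + " field(s), first = " + wrong[0]);

        // Reading with tab: three fields, and the first one is the id.
        String[] right = splitFields(line, "\t");
        System.out.println(right.length + " field(s), first = " + right[0]);
    }
}
```

If the file is read with the wrong separator, the first column (here lookup_id) ends up holding the full line, which would match the output in the question.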
Coming to your second query, the first thing to check is again the column separator. Here, I am joining the two datasets on the name column.
I got zero matches, as shown below.
I am wondering why you are not doing the join within tMap. The problem with your current matching method is that you perform a Cartesian join and then use Java code to do the work of a join. Because of the Cartesian product, the number of rows to process grows quickly for bigger datasets, which means your job may suffer throughput issues in the future. Also, I personally do not like hand coding when the same functionality is already provided by the ETL tool 🙂 Why reinvent the wheel? 😉
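As a rough illustration of the cost difference, here is a plain-Java sketch (not tMap internals) that counts exact name matches in two ways: by walking every pair of rows, and by building a keyed lookup first. The class and method names are hypothetical, and the main-flow names in `main` are invented sample values.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class JoinCostDemo {
    // Cartesian approach: every main row is compared with every lookup row.
    public static int matchesByCartesian(List<String> mainNames, List<String> lookupNames) {
        int matches = 0;
        for (String m : mainNames) {
            for (String l : lookupNames) {
                if (m.equals(l)) {
                    matches++;
                }
            }
        }
        return matches;
    }

    // Keyed approach: index the lookup names once, then probe once per main row.
    public static int matchesByKeyedLookup(List<String> mainNames, List<String> lookupNames) {
        Map<String, Integer> frequency = new HashMap<>();
        for (String l : lookupNames) {
            frequency.merge(l, 1, Integer::sum);
        }
        int matches = 0;
        for (String m : mainNames) {
            matches += frequency.getOrDefault(m, 0);
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> lookupNames = Arrays.asList("testtest", "rest", "quest");
        List<String> mainNames = Arrays.asList("english", "german", "french"); // invented values

        // 3 x 3 = 9 pairs are visited by the Cartesian version.
        System.out.println("pairs visited: " + mainNames.size() * lookupNames.size());
        System.out.println("cartesian matches: " + matchesByCartesian(mainNames, lookupNames));
        System.out.println("keyed matches: " + matchesByKeyedLookup(mainNames, lookupNames));
    }
}
```

Both methods return the same count; the difference is that the first does work proportional to the product of the two row counts, while the second is roughly proportional to their sum.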
Hope I have answered your query. Could you please spare a second to mark the topic as resolved.
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂
Hi,
No, they are the same as each other:
I also mentioned that I can't use a join; this question is a simplified case of my problem, and I need the Cartesian result.
Just assume the name in CSV2 is only part of the name in CSV1: then you cannot use the join, as it always gives 0 results.
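A minimal plain-Java sketch of that situation, under stated assumptions: the `Row` class, the method `countContained` and the CSV2 values are all invented for illustration, and the nested loop stands in for the Cartesian result followed by a "contains" check. It produces one count per CSV2 id, which an equality join cannot do when the names only partially overlap.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ContainsCountDemo {
    // Hypothetical stand-in for one CSV row: an id and a name.
    public static final class Row {
        public final int id;
        public final String name;

        public Row(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // For each CSV2 row, count the CSV1 rows whose name contains the CSV2 name.
    public static Map<Integer, Integer> countContained(List<Row> csv2, List<Row> csv1) {
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        for (Row check : csv2) {
            int count = 0;
            for (Row language : csv1) {
                if (language.name.contains(check.name)) {
                    count++;
                }
            }
            counts.put(check.id, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Row> csv1 = Arrays.asList(
                new Row(11, "testtest"), new Row(12, "rest"), new Row(13, "quest"));
        // Invented CSV2 values: partial names that never equal a CSV1 name.
        List<Row> csv2 = Arrays.asList(
                new Row(1, "test"), new Row(2, "est"), new Row(3, "xyz"));

        System.out.println(countContained(csv2, csv1)); // prints {1=1, 2=3, 3=0}
    }
}
```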
Unfortunately, I am really confused about what you are trying to achieve in this use case.
You initially asked two queries, and I showed that the job works properly for your sample data. Are you now saying that your lookup and main flow are the same but you are not getting any match?
Could you please rephrase your query, including the current and expected output, so that Talend community members can give informed suggestions?
Warm Regards,
Nikhil Thampi
Hi @nthampi
I want to thank you and accept your answer, because you pointed me to the separator: the input files were probably still being read with the default semicolon instead of tab.
The problem was that the old separator stayed cached even after I changed it to another one. When I ran the job in Java debug mode, it was solved. I mentioned this issue here: