Hi,
I have been asked to cleanse some very large, very dirty datasets, and I have tried a few components such as tReservoirSampling and tSampleRow, among others.
I've run into errors due to the formatting of the dataset being incorrect, so I've been trying to find out the problem with the data without running the whole table to do so.
I want to do a few specific things to check these large datasets like the below;
Many thanks
Hi Again, I'm doing a trial of Talend & I'm a bit concerned that no-one can answer my questions?
I thought these basic data examination tasks would be pretty easy to answer for someone who had Talend experience.
I've used several similar products & can usually find solutions by trial and error, Internet searches or forums, but not so far this time.
I'd like to know if my questions are badly phrased or confusing, as I've done a lot of research to try & answer them myself.
Otherwise, Talend may not be for me.
Thanks
Hi,
First of all, I'm sorry it took us so long to answer you.
If you want to select the distinct values in one column, you can use the tUniqRow component, which should meet your need: https://help.talend.com/reader/FnHYY1jWCvZe5NolmUNMdQ/ZWEHPNtq0AakndnOqHQJOQ
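If it helps to see the idea outside the Studio, here is a minimal Java sketch of a distinct-values pass: the first row seen for each key value is kept, and later rows with the same value go to a separate duplicates list. This is only an illustration, not the code Talend generates for tUniqRow; the class name, the method name and the String[] row model are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the code Talend generates for tUniqRow.
class DistinctByColumn {

    /** Holds the two outputs: first occurrences and later duplicates. */
    static final class Result {
        final List<String[]> uniques = new ArrayList<>();
        final List<String[]> duplicates = new ArrayList<>();
    }

    /** Splits rows on the value found in keyColumn (0-based index). */
    static Result split(List<String[]> rows, int keyColumn) {
        Result result = new Result();
        Set<String> seen = new HashSet<>();
        for (String[] row : rows) {
            // Set.add returns false when the value was already present.
            if (seen.add(row[keyColumn])) {
                result.uniques.add(row);
            } else {
                result.duplicates.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample rows: an id column and a city column.
        List<String[]> rows = List.of(
            new String[] {"1", "Paris"},
            new String[] {"2", "Lyon"},
            new String[] {"3", "Paris"});
        Result r = split(rows, 1);
        System.out.println("distinct values: " + r.uniques.size());
        System.out.println("duplicate rows: " + r.duplicates.size());
    }
}
```

Looking at the duplicates list is often the quickest way to spot formatting problems, since values that should be identical but differ by whitespace or case will not end up there.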
If you need to isolate a particular row and have a way to identify it, you can use the tFilterRow component (see https://help.talend.com/reader/Btf8zDsnT4ikhQgFW1plpQ/A8jXysHjNUXgVcJIkOBapg). It also works when you need several columns to identify the row uniquely.
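As a rough picture of what filtering on several columns means, here is a small Java sketch that combines one condition per column with a logical AND. It is not tFilterRow's generated code; the class, the helper names and the sample rows are invented for the example.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical illustration only; not the code Talend generates for tFilterRow.
class FilterRows {

    /** Builds the condition "the column at this index equals the expected value". */
    static Predicate<String[]> columnEquals(int index, String expected) {
        return row -> index < row.length && expected.equals(row[index]);
    }

    /** Keeps only the rows that satisfy the condition. */
    static List<String[]> filter(List<String[]> rows, Predicate<String[]> condition) {
        return rows.stream().filter(condition).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample rows: id, date, status.
        List<String[]> rows = List.of(
            new String[] {"1001", "2019-03-01", "OK"},
            new String[] {"1001", "2019-03-02", "BAD"},
            new String[] {"1002", "2019-03-02", "OK"});
        // Two columns together identify the row, so the conditions are ANDed.
        Predicate<String[]> wanted =
            columnEquals(0, "1001").and(columnEquals(1, "2019-03-02"));
        for (String[] row : filter(rows, wanted)) {
            System.out.println(String.join(",", row));
        }
    }
}
```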
As for the tReservoirSampling component, it is very useful when you want to extract a representative sample of your data. Every row has the same chance of being selected, so profiling done on the sample is not biased, which it would be if you took, for example, only the first 1000 rows.
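To show why such a sample is unbiased, here is a minimal Java sketch of the classic reservoir sampling technique (often called Algorithm R). It reads the rows once, never needs to know the total row count in advance, and keeps only k rows in memory. I am assuming this is the general idea behind the component; the code below is not what Talend generates, and the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of reservoir sampling (Algorithm R);
// not the code Talend generates for tReservoirSampling.
class ReservoirSample {

    /** Returns up to k items chosen uniformly at random from a stream of unknown length. */
    static <T> List<T> sample(Iterator<T> rows, int k, Random rng) {
        List<T> reservoir = new ArrayList<>(k);
        long seen = 0;
        while (rows.hasNext()) {
            T row = rows.next();
            seen++;
            if (reservoir.size() < k) {
                // The first k rows fill the reservoir directly.
                reservoir.add(row);
            } else {
                // Draw j in [0, seen); the new row replaces slot j when j < k,
                // which happens with probability k / seen.
                long j = (long) (rng.nextDouble() * seen);
                if (j < k) {
                    reservoir.set((int) j, row);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        // Made-up input: the integers 1..1000 standing in for table rows.
        List<Integer> rows = new ArrayList<>();
        for (int i = 1; i <= 1000; i++) {
            rows.add(i);
        }
        System.out.println(sample(rows.iterator(), 5, new Random()));
    }
}
```

Because the input is consumed through an Iterator, the same method works on a file reader or a database cursor without loading the whole table.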
I hope this answers your questions.
Damien