The same old problems, new technology. Talend, explain yourself!
I am trying to connect to an Apache Hadoop 2.7.0 distribution. I know it is not supported, but the supported versions are so far behind where we are now, I am not inclined to go there. I have been using Talend's software for years and am more than proficient in the DI and ESB applications. I therefore thought that I would stand a pretty good chance of using the "Import Custom Definition" option in "Hadoop Cluster Connection" settings. I have seen a few posts where this option has been suggested, but that is as far as it goes. An example is
https://community.talend.com/t5/Data-Quality-Preparation-and/Talend-and-HortonWorks-2-0a/td-p/107573. There is no information online as to what is meant by a "custom definition file" and all we are being hinted at is that it is a zip. There is no zip file in the Hadoop distribution, so I simply zipped my distribution and tried that. I got this...
Can you please explain exactly what this zip file is, and where I can get one or how I can build one (i.e. what data I need)?
Talend's software is brilliant. I have been using it for years and have built my business around both the Open Source and Enterprise Editions. However, Talend fails significantly when it comes to its documentation. It is beyond appalling. If I were to recommend a single thing that would elevate Talend to a position where it could truly compete with the closed source big boys, it would be to focus on documenting. Focus on explaining what the cryptic selection/option boxes do. Focus on spoon-feeding answers to people so that a single answer on a place like this CAN serve hundreds of people. At the moment I am only sticking with this because I know that when I get this working (and when I discover and make personal notes on the "features"), I will find it very useful. If I were new to Talend, I would have dropped the BigData tool by now, as not even Sherlock Holmes could piece together the obfuscated "instructions".
Has this been fixed in the latest Big Data version, 6.0.1?
I downloaded it and checked: for the Apache distribution, it still supports only version 1.0.0.
BTW, the YouTube video titled "How-to add an unsupported Hadoop to Talend Studio" mentioned in
https://jira.talendforge.org/browse/DOCT-5104 is also not available on YouTube.
Please assist.
Hi mints, you can connect to a standard Apache Hadoop cluster by selecting a Cloudera version (Cloudera is based on Apache Hadoop 2). It is a pain and requires you to dig around for configuration settings, but it does work. I have a 4 node Apache Hadoop 2.7 cluster working. I found that most of the time was spent configuring the cluster. Once that was done (and after all of the reading it took), I found that selecting the latest Cloudera config and adjusting the URLs and ports worked for me.
I think that Apache 2 needs to be supported as an Apache distribution though. I am obviously not the only person who took the approach that if learning Big Data, it is best to start with the base open source Hadoop distribution.
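For anyone hunting for the URLs and ports mentioned above, here is a rough sketch of how they could be pulled out of the cluster's own config files rather than guessed. It assumes the standard Hadoop 2 property names (fs.defaultFS, yarn.resourcemanager.address, yarn.resourcemanager.scheduler.address, mapreduce.jobhistory.address) and the usual core-site.xml / yarn-site.xml / mapred-site.xml layout. The function names and the conf directory argument are made up for illustration, and how each value maps onto a field in the Studio is something you would need to check against your own version.

```python
import os
import sys
import xml.etree.ElementTree as ET

# Standard Hadoop 2 property names, grouped by the file they normally live in.
# Which Studio field each one belongs in is an assumption to verify yourself.
WANTED = {
    "core-site.xml": ["fs.defaultFS"],
    "yarn-site.xml": [
        "yarn.resourcemanager.address",
        "yarn.resourcemanager.scheduler.address",
    ],
    "mapred-site.xml": ["mapreduce.jobhistory.address"],
}


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def pick_settings(files):
    """Given {filename: xml_text}, return the wanted properties (None if not set)."""
    settings = {}
    for filename, names in WANTED.items():
        props = read_properties(files[filename]) if filename in files else {}
        for name in names:
            settings[name] = props.get(name)
    return settings


if __name__ == "__main__":
    # Hypothetical usage: python hadoop_settings.py /path/to/hadoop/etc/hadoop
    conf_dir = sys.argv[1]
    files = {}
    for filename in WANTED:
        path = os.path.join(conf_dir, filename)
        if os.path.exists(path):
            with open(path, encoding="utf-8") as handle:
                files[filename] = handle.read()
    for name, value in pick_settings(files).items():
        print(f"{name} = {value if value is not None else '(not set, cluster default applies)'}")
```

Anything reported as not set means the cluster is falling back to the Hadoop defaults for that property, so check the matching *-default.xml documentation for your Hadoop release before typing a port into the Studio.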