MapReduce example

Anonymous · ‎2013-03-18

Hi 😃
Is there a working example of running a MapReduce job with Talend Open Studio for Big Data?
Thank you 😃

Anonymous · ‎2013-03-18

Hi,
We are currently working on adding more Big Data examples, particularly MR examples, to our resources (most likely for our next release, 5.3). In the meantime, there is one Big Data example in the Studio User Guide: https://help.talend.com/display/TALENDOPENSTUDIOFORBIGDATAUSERGUIDE52EN/B.3.2+Translating+the+scenar...
You can also find some interesting webinars that may help: http://www.talend.com/resources/webinars
I hope this helps a bit.
If you have specific MR questions, don't hesitate to post here again.
Elisa
(Doc team)

Anonymous · ‎2013-03-18

Thank you for answering, Elisa.
I'm a Hadoop newbie and I have been trying Talend Open Studio for Big Data for a few days, looking for how to run MapReduce jobs. OK, so I'll wait for the next tutorials. I'm looking forward to trying them!
Yes, I'm looking at the example "B.3. Finding out who visit your website most often". There are some errors to fix (I hope it will work, I liked the example's topic!)

Anonymous · ‎2013-03-18

Note that we assume your Hadoop cluster is already set up and correctly configured (which is not always that easy). The Talend documentation does not cover how to set up Hadoop; it focuses on how to set up Jobs using the Talend Hadoop connectors.
I mention this because I have had the question before.
If you have a particular MR use case in mind based on your own needs, feel free to describe it here.
Elisa

Anonymous · ‎2013-03-19

Hi
You should also check out the YouTube videos on big data (http://www.youtube.com/user/TalendChannel).
Enjoy!

_AnonymousUser · ‎2013-06-12

Elisa,
I am happy to see Talend getting into the MapReduce world. I have a specific use case that I would like some assistance with. I have web log files in flat format that I want to unpack. I would then like to aggregate this data into a new table.
Here is the header line followed by an example row from the flat file. Each field is delimited by ^. Thank you!!!
Time^UserId^AdvertiserId^OrderId^LineItemId^CreativeId^CreativeVersion^CreativeSize^AdUnitId^CustomTargeting^Domain^CountryId^Country^RegionId^Region^MetroId^Metro^CityId^City^PostalCodeId^PostalCode^BrowserId^Browser^OSId^OS^OSVersion^BandWidth^TimeUsec^AudienceSegmentIds^Product^RequestedAdUnitSizes^BandwidthGroupId^MobileDevice^MobileCapability^MobileCarrier^GfpContentId^IsCompanion
2013-06-03-15:44:00^tEyYz5wJNfF-iCl3IKWT8A^12690422^136588262^26445782^26194754342^1^160x600^55707782^location=bottomleft;login=no;ptype=search;search=zebra_decor;visitorid=38949244799;wm_visit_id=38949244799^bellsouth.net^2840^United States^21158^Mississippi^200647^Greenwood-Greenville MS^1020740^Greenville^0^^500118^Microsoft Internet Explorer 10.Any^501026^Microsoft Windows 8^^cable^1370288640^9483370|9619690|9686410|9686530|10620730|10621210^Ad Server^160x600^4^^^^0^false
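
A minimal sketch of the unpack-then-aggregate logic described above, written in plain Python rather than as a Talend Job. It assumes the first line of each file is the header shown above and that every row has the same number of fields. The grouping key (Country and AdvertiserId) and all function names are illustrative assumptions, not something taken from the post or from Talend.

```python
from collections import Counter

DELIMITER = "^"


def parse_row(header_line, data_line):
    """Split one '^'-delimited line into a dict keyed by column name."""
    names = header_line.rstrip("\r\n").split(DELIMITER)
    values = data_line.rstrip("\r\n").split(DELIMITER)
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} fields, got {len(values)}")
    return dict(zip(names, values))


def parse_custom_targeting(raw):
    """Unpack 'k1=v1;k2=v2' into a dict; empty input gives an empty dict."""
    pairs = (item.split("=", 1) for item in raw.split(";") if "=" in item)
    return {key: value for key, value in pairs}


def map_row(row):
    """Map step: emit one (key, 1) pair per log row.

    The key chosen here is an assumption made for illustration.
    """
    return (row["Country"], row["AdvertiserId"]), 1


def reduce_counts(pairs):
    """Reduce step: sum the values emitted for each key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def aggregate(lines):
    """Treat the first line as the header and aggregate the remaining rows."""
    lines = iter(lines)
    header = next(lines)
    return reduce_counts(
        map_row(parse_row(header, line)) for line in lines if line.strip()
    )
```

Passing an open file object to `aggregate` returns a dict of row counts per (Country, AdvertiserId) pair, which corresponds to the rows of the new aggregated table. `parse_custom_targeting` can be applied to the CustomTargeting field when its key/value pairs are needed as separate columns.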

Anonymous · ‎2013-06-16

Hi,
I have a very basic question related to this topic.
In Talend Open Studio for Big Data 5.3.0, does Talend use the capability and processing power of Hadoop only with the Pig components (the MR option)?
I assume that if I use a tHiveInput component to extract around 10 GB of data and then use a tMap component, I am not using the processing power of Hadoop for the mapping. Is this correct?
Thanks.

Anonymous · ‎2013-06-17

Hello Ganesh,
You're almost right.
In Talend Open Studio for Big Data, Talend uses the power of MapReduce with these components:
- Pig (all the components)
- Sqoop (the 3 components)
- The ELT components for Hive (tELTHiveInput, tELTHiveMap and tELTHiveOutput)
You are right: when you use a tHiveInput followed by a tMap, MapReduce is used only to execute the query you have written in the tHiveInput. The processing within the tMap is done locally, in Java.
Additionally, in 5.3, Talend Platform (the enterprise version) brings the ability to design and execute an M/R transformation using the usual ETL components you used in previous versions. That means you can design anything in M/R using the classic ETL components.

Anonymous · ‎2013-06-18

Hi rdubois,
Thanks for the detailed reply! It helps with the design and saved me time.

Anonymous · ‎2013-06-28

I am currently evaluating Talend to possibly replace our current internally developed ETL framework. We want to move to Hadoop and incorporate MapReduce in our ETL process. Because I am new to both Talend and Hadoop, I wanted to see if anyone could guide me for a particular use case I will use as the basis for a proof of concept.
We have many input formats as part of our ETL (.xml, .csv, .dmp), and they are often archived as some form of zip. We would like the raw input to be initially placed in Hadoop using a specific structure, processed in Hadoop using MapReduce, and finally stored in Hadoop using a specific structure.
I am able to connect to my Hadoop instance and understand the Put, Get, Delete, and Copy features which Talend provides. What I don't understand is how to look for any zip file, copy it to a staging area, unpack it, process it, and then save it back to Hadoop.
Thanks.
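
The find, stage and unpack steps described above can be sketched in plain Python, as a way to pin down the intended flow before building it as a Talend Job. This is only a sketch under assumptions: local directories stand in for the Hadoop locations, the function and directory names are hypothetical, and the processing and write-back steps are left out.

```python
import shutil
import zipfile
from pathlib import Path


def stage_and_unpack(incoming_dir, staging_dir, unpacked_dir):
    """Copy every zip archive found under incoming_dir into staging_dir,
    then extract each one into its own folder under unpacked_dir.

    Returns the sorted list of extracted file paths.
    Assumes archive names are unique and that staging_dir and unpacked_dir
    are not located inside incoming_dir.
    """
    incoming = Path(incoming_dir)
    staging = Path(staging_dir)
    unpacked = Path(unpacked_dir)
    staging.mkdir(parents=True, exist_ok=True)
    unpacked.mkdir(parents=True, exist_ok=True)

    extracted = []
    for archive in sorted(incoming.rglob("*.zip")):
        if not zipfile.is_zipfile(archive):
            continue  # skip files that only carry a .zip extension
        staged = staging / archive.name
        shutil.copy2(archive, staged)          # copy to the staging area
        target = unpacked / archive.stem       # one folder per archive
        with zipfile.ZipFile(staged) as zf:
            zf.extractall(target)              # unpack the staged copy
        extracted.extend(p for p in target.rglob("*") if p.is_file())
    return sorted(extracted)
```

The returned paths are the files that the processing step would then read. In the real flow, the archives would first be fetched from Hadoop and the results put back afterwards, using the Get and Put features mentioned above.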

Big Data

v5.x