Hi Team,
Can you help me load HTML data into tables using Talend?
I have attached a sample HTML file.
Regards
Jay
					
Hello Jayrapolu,
Strictly speaking, HTML is not a subset of XML, but HTML that is well-formed (every tag closed and properly nested, as in XHTML) can be read with the XML components.
First, the file you attached is not a valid HTML file: some tags are missing (e.g. <HTML></HTML>) and some tags are not closed (e.g. <BODY>). You also have not specified the output format or some other important conditions, so it is hard to give you an exact answer, but...
I think the most important component is tXMLMap. See the attached screenshot (sorry for the naming convention). If we take only the important part of the HTML you provided:
<body>
  <table cellpadding="0" cellspacing="0" border="0" width="100%">
    <tr>
      <td width="186" class="headlabel">CONSUMER:</td>
      <td width="320" class="headvalue">Jay</td>
      <td width="73"><img src="images/spacer.gif" /></td>
      <td width="118" class="headlabel">DATE:</td>
      <td width="128" class="headvalue">17-10-2017</td>
    </tr>
    <tr>
      <td class="headlabel">MEMBER ID:</td>
      <td class="headvalue">AA40238899_C2C1 </td>
      <td><img src="images/spacer.gif" /></td>
      <td class="headlabel">TIME:</td>
      <td class="headvalue">12:32:54</td>
    </tr>
  </table>
</body>
You can use the following job to extract headlabels and headvalues.
I also attached an export of the job.
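For readers without access to the attachments, here is a minimal sketch of the same extraction logic outside Talend, written in Python with the standard library's XML parser. It is not the attached job, only an illustration of the idea: treat the cleaned-up snippet as XML, then select the cells by their class attribute. The function name extract_pairs and the assumption that labels and values appear in matching order within each row are mine.

```python
import xml.etree.ElementTree as ET

# The well-formed part of the HTML from the post, parsed as XML.
SNIPPET = """
<body>
  <table cellpadding="0" cellspacing="0" border="0" width="100%">
    <tr>
      <td width="186" class="headlabel">CONSUMER:</td>
      <td width="320" class="headvalue">Jay</td>
      <td width="73"><img src="images/spacer.gif" /></td>
      <td width="118" class="headlabel">DATE:</td>
      <td width="128" class="headvalue">17-10-2017</td>
    </tr>
    <tr>
      <td class="headlabel">MEMBER ID:</td>
      <td class="headvalue">AA40238899_C2C1 </td>
      <td><img src="images/spacer.gif" /></td>
      <td class="headlabel">TIME:</td>
      <td class="headvalue">12:32:54</td>
    </tr>
  </table>
</body>
"""


def extract_pairs(markup):
    """Return (label, value) tuples from headlabel/headvalue cells.

    Assumption: within each <tr>, the n-th headlabel cell belongs to
    the n-th headvalue cell.
    """
    root = ET.fromstring(markup)
    pairs = []
    for row in root.iter("tr"):
        labels = [(td.text or "").strip().rstrip(":")
                  for td in row.findall("td[@class='headlabel']")]
        values = [(td.text or "").strip()
                  for td in row.findall("td[@class='headvalue']")]
        pairs.extend(zip(labels, values))
    return pairs


if __name__ == "__main__":
    for label, value in extract_pairs(SNIPPET):
        print(f"{label} = {value}")
```

Each resulting pair could then be written as one row (label, value) or pivoted into a single row with one column per label, depending on the target table layout, which the original question does not specify.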
I hope this helps you solve the task.
Best regards
lojdr
		
Thanks for the solution. Very much appreciated.
Regards
Jay
