Hello! I'm trying to use tTikaExtractor to parse some Word files, but I have no idea which component I should use for the output. When I try a tFixedFlowInput, I cannot connect it. Any help? Thanks a lot!
Thanks for your reply. No, I'm trying to parse .docx files. When I try to use tFixedFlowInput, I cannot even make the link between the two components. Should I change something in the tFixedFlowInput? What should the schema be, for example? Thanks!
Yes, I have already checked the component description. For example, I would like to use the CONTENT_XHTML property. How can I define this in the tFixedFlowInput?
Edit:
For example, I created this job:
What is the configuration of the tFixedFlowInput?
I can't figure out how to configure this.
Any help? Thanks!
Hi,
Tika extractor is a very powerful component for PDF extraction, and for doc files as well. I recently downloaded version 1.11 from Apache, put it in the ttika folder, and just changed the reference to it in tTikaExtractor_java.xml, in the section:
<CODEGENERATION>
<IMPORTS>
<IMPORT
NAME="tika"
MODULE="tika-app-1.11.jar"
Requires Java 1.7.
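On the original question about reading CONTENT_XHTML from a tFixedFlowInput, here is a rough sketch of how the lookup might work. It assumes the component publishes CONTENT_XHTML as a Talend global variable under the usual `<componentName>_<VARIABLE>` key, which I have not verified for tTikaExtractor. The class `GlobalMapSketch`, the helper `contentXhtml`, and the component name `tTikaExtractor_1` are hypothetical; in a real job, `globalMap` is supplied by the generated code rather than built by hand.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Talend job. In a real job, globalMap already exists.
class GlobalMapSketch {

    // Looks up the value under the usual Talend key "<componentName>_<VARIABLE>".
    // That tTikaExtractor stores CONTENT_XHTML under this key is an assumption.
    static String contentXhtml(Map<String, Object> globalMap, String componentName) {
        return (String) globalMap.get(componentName + "_CONTENT_XHTML");
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        // Simulates what the extractor would have stored after it ran.
        globalMap.put("tTikaExtractor_1_CONTENT_XHTML",
                "<html><body><p>Hello</p></body></html>");
        System.out.println(contentXhtml(globalMap, "tTikaExtractor_1"));
    }
}
```

If that assumption holds, the tFixedFlowInput would need a single String column whose value is the expression `((String)globalMap.get("tTikaExtractor_1_CONTENT_XHTML"))`. If the extractor offers no row output, the two components would probably have to be linked with a trigger such as OnSubjobOk rather than a row link.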