Re: [Resolved] How to use tTikaExtractor ? - Qlik Community

Anonymous · ‎2015-05-18

Hello!
I'm trying to use tTikaExtractor to parse some Word files.
But I have no idea which component I should use for the output. When I try a tFixedFlowInput, I cannot connect it.
Any help?
Thanks a lot!

Anonymous · ‎2015-05-19

Hi,
Do you want to parse HTML?
Have you tried to use tTikaExtractor -> tFixedFlowInput -> tFileOutputDelimited?
Best regards
Sabrina

Anonymous · ‎2015-05-19

Thanks for your reply.
No, I'm trying to parse .docx files.
When I try to use tFixedFlowInput, I cannot even link the two components. Should I change something in the tFixedFlowInput?
What should the schema be, for example?
Thanks!

Anonymous · ‎2015-05-19

Hi,
Have you already checked the component introduction for tTikaExtractor on Talend Exchange?
Best regards
Sabrina

Anonymous · ‎2015-05-19

Yes, I have already checked the component description. For example, I would like to use the CONTENT_XHTML property. How can I define this in the tFixedFlowInput?
Edit:
For example, I created this job :

What is the configuration of the tFixedFlowInput?

I can't figure out how to configure this.
Any help? Thanks!

Anonymous · ‎2015-05-20

OK, I found how to do it; maybe it will be useful for someone else.
How to get data from tTikaExtractor in a tRowGenerator component:
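
For readers without the screenshot, here is a rough sketch of how such a lookup usually works in a Talend job. It assumes the component publishes its CONTENT_XHTML return value in the job's globalMap under the key "<component name>_CONTENT_XHTML", and that the component is named tTikaExtractor_1. Both are assumptions, not confirmed in this thread. In a tRowGenerator column the expression would then be something like ((String) globalMap.get("tTikaExtractor_1_CONTENT_XHTML")). The stand-alone class below only simulates globalMap with a plain Map to show the pattern:

```java
import java.util.HashMap;
import java.util.Map;

// Stand-alone sketch: in a real job, Talend supplies globalMap itself.
class TikaGlobalMapSketch {

    // Assumed key convention: "<component unique name>_<VARIABLE>",
    // e.g. "tTikaExtractor_1_CONTENT_XHTML" (hypothetical component name).
    static String key(String componentName, String variable) {
        return componentName + "_" + variable;
    }

    // Mirrors the expression one would type into a tRowGenerator column:
    // ((String) globalMap.get("tTikaExtractor_1_CONTENT_XHTML"))
    static String readXhtml(Map<String, Object> globalMap, String componentName) {
        return (String) globalMap.get(key(componentName, "CONTENT_XHTML"));
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        // Placeholder value standing in for what the extractor would have stored.
        globalMap.put("tTikaExtractor_1_CONTENT_XHTML",
                "<html><body><p>sample</p></body></html>");
        System.out.println(readXhtml(globalMap, "tTikaExtractor_1"));
    }
}
```

If the key is missing (for example because the extractor has not run yet), the lookup returns null, so the extractor would need to run before the component that reads the value.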

Anonymous · ‎2015-10-28

Hi,
Tika extractor is a very powerful component for extracting content from PDF and DOC files. I recently downloaded version 1.11 from Apache, put it in the ttika folder, and just changed the reference to it in tTikaExtractor_java.xml, in this section:
<CODEGENERATION>
    <IMPORTS>
      <IMPORT
        NAME="tika"
        MODULE="tika-app-1.11.jar"
This requires Java 1.7.
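
As a small sanity check before swapping the jar, the Java version the job runs on can be read from the standard java.specification.version system property. This is only a sketch: the class and method names are made up for illustration, and the minimum of 7 is taken from the note above.

```java
// Hypothetical helper, not part of Talend or Tika.
class JavaVersionCheck {

    // Converts a java.specification.version value ("1.7", "1.8", "9", "11")
    // into a plain major version number (7, 8, 9, 11).
    static int majorOf(String specVersion) {
        String[] parts = specVersion.split("\\.");
        if (parts[0].equals("1") && parts.length > 1) {
            return Integer.parseInt(parts[1]); // legacy "1.x" numbering
        }
        return Integer.parseInt(parts[0]);     // "9" and later
    }

    public static void main(String[] args) {
        int major = majorOf(System.getProperty("java.specification.version"));
        if (major >= 7) {
            System.out.println("Java " + major + " meets the Java 1.7 requirement.");
        } else {
            System.out.println("Java " + major + " is older than Java 1.7.");
        }
    }
}
```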

Talend Data Integration

v5.x