Anonymous
Not applicable

Parquet file output to local directory

Hi,

Is there a Talend component that converts a simple CSV file to Parquet format and can write the output to a local directory?
I already checked tFileOutputParquet, but its output goes to a big data system. We're currently using Talend Real-time Big Data Platform (7.2.1).

Thank you.


Accepted Solutions
manodwhb
Champion II

I do not think you can convert a CSV file directly to a Parquet file without using the tFileOutputParquet component.


4 Replies
RAJ6
Contributor III

Hi @Manohar B

I want to know how to convert a CSV file to a Parquet file without using tFileOutputParquet. Kindly share any information you have as soon as possible.

Note: I am using Talend Open Studio for Big Data 7.3.

manodwhb
Champion II

@RAJESH J, maybe you should check whether you can create the Parquet file with Java, without using tFileOutputParquet.

onursahan
Partner - Contributor

Hi,

I have an issue converting CSV to Parquet with tFileOutputParquet.

The component requires winutils and C++ 2010 on Windows.

[Screenshot: onursahan_0-1707737330355.png]

After adding HADOOP_HOME and running the C++ file, the component creates an empty .parquet file and throws an error.

How can I resolve this issue?
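
One quick sanity check before rerunning the job is to confirm that winutils.exe sits where Hadoop on Windows normally looks for it, which is the bin folder under HADOOP_HOME (Hadoop also honours the hadoop.home.dir system property). The sketch below is only an illustration of that check: the class and method names are hypothetical and it is not part of Talend or Hadoop.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Talend or Hadoop.
final class WinutilsCheck {

    // Builds the conventional location <hadoopHome>/bin/winutils.exe.
    static Path expectedWinutils(String hadoopHome) {
        return Paths.get(hadoopHome, "bin", "winutils.exe");
    }

    // Returns a short human-readable status for the given HADOOP_HOME value.
    static String describe(String hadoopHome) {
        if (hadoopHome == null || hadoopHome.trim().isEmpty()) {
            return "HADOOP_HOME is not set";
        }
        Path exe = expectedWinutils(hadoopHome);
        return Files.isRegularFile(exe) ? "found " + exe : "missing " + exe;
    }

    public static void main(String[] args) {
        // Reads the environment variable as seen by this JVM.
        System.out.println(describe(System.getenv("HADOOP_HOME")));
    }
}
```

Run it from the same environment that launches the job (for example the Studio's JVM), since a variable set after the Studio was started may not be visible to it. A "found" result only shows that the file exists, not that it can actually start.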


Error output:

Starting job csv2parquet at 14:16 12/02/2024.
[statistics] connecting to socket on port 3431
[statistics] connected
[WARN ] 14:16:08 org.apache.hadoop.util.NativeCodeLoader- Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in component tFileOutputParquet_1 (csv2parquet)
ExitCodeException exitCode=-1073741515: 
at org.apache.hadoop.util.Shell.runCommand(Shell.java:1007)
at org.apache.hadoop.util.Shell.run(Shell.java:900)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1212)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:1306)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:1288)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:867)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:254)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:234)
at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:333)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:322)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:353)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:403)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:466)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:445)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1125)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1105)
at org.apache.parquet.hadoop.util.HadoopOutputFile.createOrOverwrite(HadoopOutputFile.java:81)
at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:246)
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:280)
at org.apache.parquet.hadoop.ParquetWriter$Builder.build(ParquetWriter.java:530)
at bidemo.csv2parquet_0_1.csv2parquet.tFileInputDelimited_1Process(csv2parquet.java:756)
at bidemo.csv2parquet_0_1.csv2parquet.runJobInTOS(csv2parquet.java:1689)
at bidemo.csv2parquet_0_1.csv2parquet.main(csv2parquet.java:1387)
[FATAL] 14:16:08 bidemo.csv2parquet_0_1.csv2parquet- tFileOutputParquet_1 
org.apache.hadoop.util.Shell$ExitCodeException: 
at org.apache.hadoop.util.Shell.runCommand(Shell.java:1007) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.util.Shell.run(Shell.java:900) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1212) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.util.Shell.execCommand(Shell.java:1306) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.util.Shell.execCommand(Shell.java:1288) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:867) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:254) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:234) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:333) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:322) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:353) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:403) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:466) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:445) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1125) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1105) ~[hadoop-common-3.2.4.jar:?]
at org.apache.parquet.hadoop.util.HadoopOutputFile.createOrOverwrite(HadoopOutputFile.java:81) ~[parquet-hadoop-1.10.1.jar:1.10.1]
at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:246) ~[parquet-hadoop-1.10.1.jar:1.10.1]
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:280) ~[parquet-hadoop-1.10.1.jar:1.10.1]
at org.apache.parquet.hadoop.ParquetWriter$Builder.build(ParquetWriter.java:530) ~[parquet-hadoop-1.10.1.jar:1.10.1]
at bidemo.csv2parquet_0_1.csv2parquet.tFileInputDelimited_1Process(csv2parquet.java:756) [classes/:?]
at bidemo.csv2parquet_0_1.csv2parquet.runJobInTOS(csv2parquet.java:1689) [classes/:?]
at bidemo.csv2parquet_0_1.csv2parquet.main(csv2parquet.java:1387) [classes/:?]
[statistics] disconnected
 
Job csv2parquet ended at 14:16 12/02/2024. [Exit code  = 1]
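
For anyone reading this trace: the exit code -1073741515 is easier to interpret in hexadecimal. Reinterpreted as an unsigned 32-bit value it is 0xC0000135, the Windows status code STATUS_DLL_NOT_FOUND. That suggests the external process Hadoop launched from RawLocalFileSystem.setPermission (presumably winutils.exe) started but could not load a DLL it depends on, which would fit a missing or mismatched Visual C++ runtime or a winutils build that does not match the Hadoop version (hadoop-common-3.2.4 in this trace). This is a reading of the exit code, not a confirmed diagnosis. A minimal sketch of the conversion, with a hypothetical class name:

```java
// Hypothetical helper for decoding Windows process exit codes.
final class ExitCodeDecoder {

    // Formats a signed 32-bit exit code as an unsigned, zero-padded hex string.
    static String toHex(int exitCode) {
        return String.format("0x%08X", exitCode);
    }

    public static void main(String[] args) {
        // Exit code copied from the ExitCodeException in the job log.
        System.out.println(toHex(-1073741515)); // prints 0xC0000135
    }
}
```

If that reading is right, running %HADOOP_HOME%\bin\winutils.exe directly from a command prompt should show a Windows dialog naming the missing DLL.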