Anonymous
Not applicable

Parquet file output to local directory

Hi,

Is there a Talend component that converts a simple CSV file to Parquet format and writes the output to a local directory?
I have already checked tFileOutputParquet, but its output goes to a big data system. We are currently using Talend Real-time Big Data Platform (7.2.1).

Thank you.

Accepted Solutions
manodwhb
Champion II

I do not think you can convert a CSV file directly to Parquet without using the tFileOutputParquet component.

4 Replies
manodwhb
Champion II

I do not think you can convert a CSV file directly to Parquet without using the tFileOutputParquet component.
RAJ6
Contributor III

Hi @Manohar B,

I want to know how to convert a CSV file to Parquet without using tFileOutputParquet. Please share any information you have as soon as possible.

Note: I am using Talend Open Studio for Big Data 7.3.

manodwhb
Champion II

@RAJESH J, you may need to check whether you can create the Parquet file with Java, without using tFileOutputParquet.

onursahan
Partner - Contributor

Hi,

I have an issue converting CSV to Parquet with tFileOutputParquet.

The component requires winutils and C++ 2010 on Windows.

[Image: onursahan_0-1707737330355.png]

After I set HADOOP_HOME and ran the C++ installer, the component creates an empty .parquet file and fails with an error.

How can I resolve this issue?


Error output:

Starting job csv2parquet at 14:16 12/02/2024.
[statistics] connecting to socket on port 3431
[statistics] connected
[WARN ] 14:16:08 org.apache.hadoop.util.NativeCodeLoader- Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in component tFileOutputParquet_1 (csv2parquet)
ExitCodeException exitCode=-1073741515: 
at org.apache.hadoop.util.Shell.runCommand(Shell.java:1007)
at org.apache.hadoop.util.Shell.run(Shell.java:900)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1212)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:1306)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:1288)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:867)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:254)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:234)
at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:333)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:322)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:353)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:403)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:466)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:445)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1125)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1105)
at org.apache.parquet.hadoop.util.HadoopOutputFile.createOrOverwrite(HadoopOutputFile.java:81)
at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:246)
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:280)
at org.apache.parquet.hadoop.ParquetWriter$Builder.build(ParquetWriter.java:530)
at bidemo.csv2parquet_0_1.csv2parquet.tFileInputDelimited_1Process(csv2parquet.java:756)
at bidemo.csv2parquet_0_1.csv2parquet.runJobInTOS(csv2parquet.java:1689)
at bidemo.csv2parquet_0_1.csv2parquet.main(csv2parquet.java:1387)
[FATAL] 14:16:08 bidemo.csv2parquet_0_1.csv2parquet- tFileOutputParquet_1 
org.apache.hadoop.util.Shell$ExitCodeException: 
at org.apache.hadoop.util.Shell.runCommand(Shell.java:1007) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.util.Shell.run(Shell.java:900) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1212) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.util.Shell.execCommand(Shell.java:1306) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.util.Shell.execCommand(Shell.java:1288) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:867) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:254) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:234) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:333) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:322) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:353) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:403) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:466) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:445) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1125) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1105) ~[hadoop-common-3.2.4.jar:?]
at org.apache.parquet.hadoop.util.HadoopOutputFile.createOrOverwrite(HadoopOutputFile.java:81) ~[parquet-hadoop-1.10.1.jar:1.10.1]
at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:246) ~[parquet-hadoop-1.10.1.jar:1.10.1]
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:280) ~[parquet-hadoop-1.10.1.jar:1.10.1]
at org.apache.parquet.hadoop.ParquetWriter$Builder.build(ParquetWriter.java:530) ~[parquet-hadoop-1.10.1.jar:1.10.1]
at bidemo.csv2parquet_0_1.csv2parquet.tFileInputDelimited_1Process(csv2parquet.java:756) [classes/:?]
at bidemo.csv2parquet_0_1.csv2parquet.runJobInTOS(csv2parquet.java:1689) [classes/:?]
at bidemo.csv2parquet_0_1.csv2parquet.main(csv2parquet.java:1387) [classes/:?]
[statistics] disconnected
 
Job csv2parquet ended at 14:16 12/02/2024. [Exit code  = 1]
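
One reading of the error output above: the exit code -1073741515 is the signed form of 0xC0000135, which Windows defines as STATUS_DLL_NOT_FOUND. The stack trace shows Hadoop shelling out from RawLocalFileSystem.setPermission, and on Windows that call goes through winutils.exe, so a likely (but unconfirmed) cause is that winutils.exe was found but a DLL it depends on could not be loaded. The Java sketch below is only a diagnostic aid under those assumptions. The class and method names (WinutilsDiagnostics, toNtStatusHex, hasWinutils) are hypothetical and are not part of Talend or Hadoop. It converts the exit code to hex and checks that bin\winutils.exe exists under HADOOP_HOME.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical diagnostic helper; not part of Talend or Hadoop. */
final class WinutilsDiagnostics {

    /** Formats a process exit code as an unsigned 32-bit hex value (NTSTATUS style). */
    static String toNtStatusHex(int exitCode) {
        // %X prints a negative int as its unsigned two's-complement value.
        return String.format("0x%08X", exitCode);
    }

    /** Returns true if bin/winutils.exe exists as a regular file under hadoopHome. */
    static boolean hasWinutils(Path hadoopHome) {
        return hadoopHome != null
                && Files.isRegularFile(hadoopHome.resolve("bin").resolve("winutils.exe"));
    }

    public static void main(String[] args) {
        // Exit code copied from the job log above.
        System.out.println("Exit code as hex: " + toNtStatusHex(-1073741515));

        // HADOOP_HOME is the environment variable mentioned in the post.
        String home = System.getenv("HADOOP_HOME");
        Path hadoopHome = (home == null || home.isEmpty()) ? null : Paths.get(home);
        System.out.println("HADOOP_HOME: " + home);
        System.out.println("bin/winutils.exe present: " + hasWinutils(hadoopHome));
    }
}
```

If the hex value is 0xC0000135 and winutils.exe is present, a reasonable next step would be to confirm that the winutils build matches the Hadoop version in the log (hadoop-common-3.2.4) and that the Visual C++ runtime it needs is installed for the same architecture (32-bit or 64-bit).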