MBourassa1682971203 (Contributor)
Write a Delta format file

Hi,

I would like to know if there is a way to write a file in Delta format.

My project is to take a table from a database and write it to a Delta format file.

I have seen "tDeltaLakeOutput properties for Apache Spark Batch", and it seems it is possible to store data in Delta format in files, but I cannot find this basic setting. Could this be the solution?

Thank you

(screenshots attached)
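For reference, a minimal sketch of the end-to-end idea in plain Spark code (read a table over JDBC, write it out in Delta format). This is not the Talend-generated job: the JDBC URL, credentials, table name, and output path below are placeholders, and it assumes the Delta Lake (delta-core) library is on the classpath.

    import org.apache.spark.sql.SparkSession

    object JdbcToDelta {
      def main(args: Array[String]): Unit = {
        // Local session with the Delta Lake SQL extension and catalog registered
        val spark = SparkSession.builder()
          .appName("jdbc-to-delta")
          .master("local[*]")
          .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
          .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
          .getOrCreate()

        // Read the source table over JDBC (URL, table, and credentials are placeholders)
        val df = spark.read
          .format("jdbc")
          .option("url", "jdbc:postgresql://localhost:5432/mydb")
          .option("dbtable", "public.my_table")
          .option("user", "my_user")
          .option("password", "my_password")
          .load()

        // Write it out as a Delta table: a directory of Parquet files plus a _delta_log
        df.write
          .format("delta")
          .mode("overwrite")
          .save("C:/data/delta/my_table")

        spark.stop()
      }
    }

Once written, the directory can be read back with spark.read.format("delta").load(...).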

6 Replies
Anonymous

Hello,

So far, we are able to ingest data in Delta format into a dataset.

Could you please let us know if this KB article helps?

https://community.talend.com/s/article/How-to-ingest-data-to-Azure-Databricks-Delta-Lake-with-Delta-...

Best regards

Sabrina

 

MBourassa1682971203 (Author)
Hi Sabrina,

The page is not working. On a Talend page, I get this error: "Oops! Looks like we ran into a problem with your request. Please contact Talend Customer Support for further assistance."

MBourassa1682971203 (Author)

Hi Sabrina,

I think I have found what I'm looking for.

I have seen that it is possible to convert a Parquet file to a Delta format file, so my plan is to get data from a database, write it to a Parquet file, and then convert it to a Delta format file, like this.

Could you tell me if this would do the job?

(screenshot attached)

Thank you
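For reference, here is roughly what that two-step approach looks like in plain Spark code. This is a sketch only, not the Talend-generated job: the paths are placeholders, and spark is assumed to be a Delta-enabled SparkSession like the one in the earlier sketch.

    // Read the intermediate Parquet file (path is a placeholder)
    val parquetDf = spark.read.parquet("C:/data/parquet/my_table")

    // Rewrite it in Delta format: the same Parquet data plus a _delta_log directory
    parquetDf.write
      .format("delta")
      .mode("overwrite")
      .save("C:/data/delta/my_table")

Delta Lake also offers an in-place conversion (io.delta.tables.DeltaTable.convertToDelta), which registers existing Parquet files as a Delta table without rewriting them.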

I have tried it, but I get this error:

Exception in thread "main" java.lang.UnsatisfiedLinkError: 'boolean org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(java.lang.String, int)'
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:793)
    at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:1215)
    at org.apache.hadoop.fs.FileUtil.list(FileUtil.java:1420)
    at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:601)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1972)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:2014)
    at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:761)
    at org.apache.spark.sql.delta.storage.HadoopFileSystemLogStore.listFrom(HadoopFileSystemLogStore.scala:83)
    at org.apache.spark.sql.delta.storage.DelegatingLogStore.listFrom(DelegatingLogStore.scala:119)
    at org.apache.spark.sql.delta.SnapshotManagement.listFrom(SnapshotManagement.scala:62)
    at org.apache.spark.sql.delta.SnapshotManagement.listFrom$(SnapshotManagement.scala:61)
    at org.apache.spark.sql.delta.DeltaLog.listFrom(DeltaLog.scala:62)
    at org.apache.spark.sql.delta.SnapshotManagement.getLogSegmentForVersion(SnapshotManagement.scala:95)
    at org.apache.spark.sql.delta.SnapshotManagement.getLogSegmentForVersion$(SnapshotManagement.scala:89)
    at org.apache.spark.sql.delta.DeltaLog.getLogSegmentForVersion(DeltaLog.scala:62)
    at org.apache.spark.sql.delta.SnapshotManagement.$anonfun$updateInternal$1(SnapshotManagement.scala:284)
    at com.databricks.spark.util.DatabricksLogging.recordOperation(DatabricksLogging.scala:77)
    at com.databricks.spark.util.DatabricksLogging.recordOperation$(DatabricksLogging.scala:67)
    at org.apache.spark.sql.delta.DeltaLog.recordOperation(DeltaLog.scala:62)
    at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation(DeltaLogging.scala:112)
    at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation$(DeltaLogging.scala:97)
    at org.apache.spark.sql.delta.DeltaLog.recordDeltaOperation(DeltaLog.scala:62)
    at org.apache.spark.sql.delta.SnapshotManagement.updateInternal(SnapshotManagement.scala:282)
    at org.apache.spark.sql.delta.SnapshotManagement.updateInternal$(SnapshotManagement.scala:281)
    at org.apache.spark.sql.delta.DeltaLog.updateInternal(DeltaLog.scala:62)
    at org.apache.spark.sql.delta.SnapshotManagement.$anonfun$update$1(SnapshotManagement.scala:243)
    at org.apache.spark.sql.delta.DeltaLog.lockInterruptibly(DeltaLog.scala:163)
    at org.apache.spark.sql.delta.SnapshotManagement.update(SnapshotManagement.scala:243)
    at org.apache.spark.sql.delta.SnapshotManagement.update$(SnapshotManagement.scala:239)
    at org.apache.spark.sql.delta.DeltaLog.update(DeltaLog.scala:62)
    at org.apache.spark.sql.delta.OptimisticTransactionImpl.doCommit(OptimisticTransaction.scala:749)
    at org.apache.spark.sql.delta.OptimisticTransactionImpl.doCommit$(OptimisticTransaction.scala:715)
    at org.apache.spark.sql.delta.OptimisticTransaction.doCommit(OptimisticTransaction.scala:86)
    at org.apache.spark.sql.delta.OptimisticTransactionImpl.$anonfun$doCommitRetryIteratively$2(OptimisticTransaction.scala:684)
    at com.databricks.spark.util.DatabricksLogging.recordOperation(DatabricksLogging.scala:77)
    at com.databricks.spark.util.DatabricksLogging.recordOperation$(DatabricksLogging.scala:67)
    at org.apache.spark.sql.delta.OptimisticTransaction.recordOperation(OptimisticTransaction.scala:86)
    at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation(DeltaLogging.scala:112)
    at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation$(DeltaLogging.scala:97)
    at org.apache.spark.sql.delta.OptimisticTransaction.recordDeltaOperation(OptimisticTransaction.scala:86)
    at org.apache.spark.sql.delta.OptimisticTransactionImpl.$anonfun$doCommitRetryIteratively$1(OptimisticTransaction.scala:680)
    at org.apache.spark.sql.delta.DeltaLog.lockInterruptibly(DeltaLog.scala:163)
    at org.apache.spark.sql.delta.OptimisticTransactionImpl.lockCommitIfEnabled(OptimisticTransaction.scala:659)
    at org.apache.spark.sql.delta.OptimisticTransactionImpl.doCommitRetryIteratively(OptimisticTransaction.scala:674)
    at org.apache.spark.sql.delta.OptimisticTransactionImpl.doCommitRetryIteratively$(OptimisticTransaction.scala:671)
    at org.apache.spark.sql.delta.OptimisticTransaction.doCommitRetryIteratively(OptimisticTransaction.scala:86)
    at org.apache.spark.sql.delta.OptimisticTransactionImpl.liftedTree1$1(OptimisticTransaction.scala:522)
    at org.apache.spark.sql.delta.OptimisticTransactionImpl.$anonfun$commit$1(OptimisticTransaction.scala:462)
    at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
    at com.databricks.spark.util.DatabricksLogging.recordOperation(DatabricksLogging.scala:77)
    at com.databricks.spark.util.DatabricksLogging.recordOperation$(DatabricksLogging.scala:67)
    at org.apache.spark.sql.delta.OptimisticTransaction.recordOperation(OptimisticTransaction.scala:86)
    at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation(DeltaLogging.scala:112)
    at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation$(DeltaLogging.scala:97)
    at org.apache.spark.sql.delta.OptimisticTransaction.recordDeltaOperation(OptimisticTransaction.scala:86)
    at org.apache.spark.sql.delta.OptimisticTransactionImpl.commit(OptimisticTransaction.scala:459)
    at org.apache.spark.sql.delta.OptimisticTransactionImpl.commit$(OptimisticTransaction.scala:457)
    at org.apache.spark.sql.delta.OptimisticTransaction.commit(OptimisticTransaction.scala:86)
    at org.apache.spark.sql.delta.commands.WriteIntoDelta.$anonfun$run$1(WriteIntoDelta.scala:83)
    at org.apache.spark.sql.delta.commands.WriteIntoDelta.$anonfun$run$1$adapted(WriteIntoDelta.scala:78)
    at org.apache.spark.sql.delta.DeltaLog.withNewTransaction(DeltaLog.scala:198)
    at org.apache.spark.sql.delta.commands.WriteIntoDelta.run(WriteIntoDelta.scala:78)
    at org.apache.spark.sql.delta.sources.DeltaDataSource.createRelation(DeltaDataSource.scala:154)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:97)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:97)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:93)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)

[WARN ] 12:49:54 org.apache.spark.SparkEnv- Exception while deleting Spark temp dir: C:\tmp\spark-d7580ae9-e76f-4980-aa3f-aa8657aab947\userFiles-a1c7c163-c4f2-461d-814c-617d60412e45
java.io.IOException: Failed to delete: C:\tmp\spark-d7580ae9-e76f-4980-aa3f-aa8657aab947\userFiles-a1c7c163-c4f2-461d-814c-617d60412e45\talend_file_enhanced-1.3.jar

MBourassa1682971203 (Author)
I have fixed the first error by adding hadoop.dll to C:\Windows\System32.
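For anyone hitting the same UnsatisfiedLinkError on Windows: Spark's local file access goes through Hadoop's native I/O layer, which on Windows needs winutils.exe and hadoop.dll. Besides copying hadoop.dll into C:\Windows\System32, a common alternative is to point hadoop.home.dir at a Hadoop home directory before the SparkSession is created. A minimal sketch, assuming C:\hadoop is where the winutils binaries were unpacked:

    // Must run before any Hadoop class is loaded; C:\hadoop is a placeholder,
    // and its bin\ folder must contain winutils.exe and hadoop.dll matching
    // your Hadoop version.
    System.setProperty("hadoop.home.dir", "C:\\hadoop")

The remaining "Failed to delete Spark temp dir" IOException at shutdown is a common nuisance when running Spark locally on Windows (the JVM still holds locks on the jars at exit) and generally does not affect the data that was written.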

MBourassa1682971203 (Author)

Hi Sabrina,

I think the job is working even with this error.

Do you know how to avoid this error?

Thank you

Anonymous

Hello,

Was the DB connection successful when you created the JDBC metadata connection?

(screenshot attached)

Could you please untick the "Use Auto-Commit" checkbox in the tDBOutput component's Advanced settings tab and re-run the job to see if it works?

Best regards

Sabrina