Artemis_Mercury
Contributor

Compression Discrepancy in XML Files Generated by Talend vs eprocat

I'm currently working on a project where I'm creating a catalog with Talend, using the BMEcat 1.2 new-catalog schema. I've imported the BMEcat XSD into Talend Studio, created a structure, and used a tHMap component to map elements from the source to the target (BMEcat XML). However, I'm encountering a perplexing difference in how well the resulting file compresses.

The XML file generated by Talend is approximately 3 GB in size. When I compress it using the Deflate algorithm at the normal compression level, the resulting ZIP file is around 400 MB, so the archive is still about 12% of the original size.

Interestingly, when I create the same catalog using eprocat, the raw file size is also about 3 GB. However, when I compress it using the Deflate algorithm at the same normal compression level, the ZIP file is only around 170 MB, which is about 5% of the original size.
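To make the two measurements directly comparable, the ratio can be recomputed on an equally sized sample of each file. The following is a minimal sketch, not the tool used above: it assumes that zlib level 6 is a reasonable stand-in for the "normal" level of the ZIP tool, and `deflate_ratio` and `sample_ratio` are hypothetical helper names.

```python
import zlib


def deflate_ratio(data: bytes, level: int = 6) -> float:
    """Return compressed size / original size using raw Deflate.

    Assumption: level 6 approximates the ZIP tool's "normal" setting.
    """
    if not data:
        raise ValueError("data must not be empty")
    # Negative wbits selects a raw Deflate stream, as stored inside ZIP entries.
    compressor = zlib.compressobj(level, zlib.DEFLATED, -zlib.MAX_WBITS)
    compressed = compressor.compress(data) + compressor.flush()
    return len(compressed) / len(data)


def sample_ratio(path: str, sample_bytes: int = 64 * 1024 * 1024, level: int = 6) -> float:
    """Compress only the first `sample_bytes` of a file to keep memory use bounded."""
    with open(path, "rb") as fh:
        return deflate_ratio(fh.read(sample_bytes), level)
```

With the figures above, the ratios work out to roughly 400 / 3000 ≈ 0.13 for the Talend file and 170 / 3000 ≈ 0.06 for the eprocat file. Running `sample_ratio` on both files with the same sample size would show whether the gap is already present at the start of the files or only appears further in.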

Upon inspecting the files, I noticed two main differences:

1. **XML Notation:**

  - Talend: `<?xml version='1.0' encoding='UTF-8'?><BMECAT xmlns="http://www.bmecat.org/XMLSchema/1.2/bmecat_new_catalog" version="1.2">`

  - eprocat: `<?xml version="1.0" encoding="UTF-8" ?> <!DOCTYPE BMECAT SYSTEM "bmecat_new_catalog_1_2.dtd"> <BMECAT version="1.2">`

2. **File Structure:**

  - Talend: Single-line XML file without indentation.

  - eprocat: Indented XML file.

I've conducted some tests by changing the XML notation at the top and indenting the Talend-generated file to match the eprocat structure. Despite these adjustments, the compression ratio remains at 12% for the Talend-generated file.
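Since the header and indentation have been ruled out, a possible next step is to check whether the element content itself differs between the two files (for example, value formatting or repeated values). The sketch below is one hedged way to do that, not a Talend or eprocat feature: it streams a file with `xml.etree.ElementTree.iterparse` and, per element name, compares the raw text size with its Deflate-compressed size. The function name `profile_text_by_tag` is hypothetical, namespaces are stripped so both files produce the same element names, and compressing each element name separately only approximates what happens inside a single ZIP stream.

```python
import io
import zlib
import xml.etree.ElementTree as ET
from collections import defaultdict


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix so namespaced and plain tags compare equal."""
    return tag.rsplit("}", 1)[-1]


def profile_text_by_tag(source, level: int = 6):
    """Return {element name: (raw_bytes, deflated_bytes)} for element text.

    `source` is a file path or a binary file object.
    """
    compressors = {}
    raw = defaultdict(int)
    packed = defaultdict(int)
    for _event, elem in ET.iterparse(source, events=("end",)):
        text = (elem.text or "").strip()
        if text:
            name = local_name(elem.tag)
            if name not in compressors:
                compressors[name] = zlib.compressobj(level)
            data = text.encode("utf-8") + b"\n"
            raw[name] += len(data)
            packed[name] += len(compressors[name].compress(data))
        elem.clear()  # drop text and children that have already been counted
    for name, comp in compressors.items():
        packed[name] += len(comp.flush())
    return {name: (raw[name], packed[name]) for name in raw}


if __name__ == "__main__":
    # Tiny illustrative document; real input would be the two catalog files.
    demo = b'<R xmlns="urn:example"><A>aaaa</A><A>aaaa</A><B>hello</B></R>'
    for name, (raw_size, packed_size) in profile_text_by_tag(io.BytesIO(demo)).items():
        print(name, raw_size, packed_size)
```

Running this on both catalogs and comparing the results per element name would point to the elements whose content compresses worse in the Talend output. On a 3 GB file, memory use still grows slowly because the emptied elements stay attached to the root, so testing on a truncated but well-formed extract first may be more practical.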

**Objective:**

I aim to achieve a similar compression ratio for the Talend-generated file as I do with eprocat (5%). Are there any specific configurations or optimizations in Talend that can be applied to enhance the compression ratio? Any insights or suggestions would be greatly appreciated.

Thank you in advance for your assistance!

0 Replies