Talend Agentic AI Code Quality Framework - Must have for every Project!! - Qlik Community

Sourav_Roy · ‎2025-06-22

Talend AI Code Quality Framework

This project automates the detection and fixing of Talend job design issues using YAML-based rules, .item job files, and GPT-based LLMs accessed through OpenRouter. It supports zipped jobs, generates CSV reports, and prints cleanly formatted CLI output.

It already includes 10 production-ready jobs, so you can plug and play: understand them, then reuse them.

This project is an AI adaptation of my previous one - Qlik Community

Link to GitHub Repo - GitHub

Connect with me for suggestions/feedback - LinkedIn

🎥Demo Video

https://www.youtube.com/watch?v=H0oVfQjiWkk

See the full agentic Talend LLM quality fixer in action!

✅50+ Production-Grade Code Quality Checks

This framework supports 50+ scalable, production-level rule checks tailored for enterprise Talend pipelines.
These cover:

🔐Context misuse & variable scope
🧱 Schema issues, nulls, and defaults
🚨Missing error handling (e.g., tDie, tWarn)
🪝 Unused metadata & dead code
🌀Infinite loop detection & unsafe joins
🎯Naming conventions, error propagation, subjob limits, and more

All rules are YAML-defined and support both auto-fix and LLM-assisted suggestions.
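To make the rule model concrete, here is a minimal sketch of how loaded rules might be validated and split between auto-fix and LLM handling. The rules are written as plain dicts, shaped like what a YAML loader would return. Only "strategy: auto" is mentioned in this post; the field names `id` and `description`, the `llm` strategy value, and both function names are assumptions for illustration, not the repo's actual schema.

```python
# Hypothetical rule schema: only "strategy: auto" comes from the post.
REQUIRED_KEYS = {"id", "description", "strategy"}
KNOWN_STRATEGIES = {"auto", "llm"}  # "llm" is an assumed value


def validate_rule(rule: dict) -> list:
    """Return a list of problems; an empty list means the rule looks valid."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(rule))]
    if "strategy" in rule and rule["strategy"] not in KNOWN_STRATEGIES:
        problems.append(f"unknown strategy: {rule['strategy']}")
    return problems


def split_by_strategy(rules: list) -> tuple:
    """Separate rules that can be auto-fixed from those routed to the LLM."""
    auto = [r for r in rules if r.get("strategy") == "auto"]
    llm = [r for r in rules if r.get("strategy") != "auto"]
    return auto, llm


# Placeholder descriptions; the real rule texts live in rules/*.yaml.
rules = [
    {"id": "RULE_028", "description": "placeholder description", "strategy": "auto"},
    {"id": "RULE_036", "description": "placeholder description", "strategy": "llm"},
]
auto_rules, llm_rules = split_by_strategy(rules)
print([r["id"] for r in auto_rules], [r["id"] for r in llm_rules])
```

In a real run the dicts would come from the YAML files in rules/ rather than being written inline.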

🚀How to Run

Place your zipped Talend Jobs into:

zipped_jobs/

Run the full pipeline (after placing your real zipped Talend jobs inside zipped_jobs/):

python talend_AgenticAI_SouraV1.py --verbose

This will:

✅Validate rule YAMLs
✅Extract .item files to jobs/ from nested zips
✅Lint and optionally auto-fix violations
✅Query LLM for unresolved issues
✅Save results to fixed/ and reports/fix_summary_report.csv

🔐OpenRouter API Key Setup

Visit https://openrouter.ai/
Sign in and generate an API key from:
https://openrouter.ai/keys
Copy the API key (starts with sk-or-...)
Create a .env file in your project root with:

OPENROUTER_API_KEY=sk-or-your-key-here

Make sure python-dotenv is installed:

pip install python-dotenv

This allows llm.py to securely load your OpenRouter key at runtime.
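As a rough sketch of what loading the key at runtime can look like (the post does not show the actual code in llm.py, and the function name `load_openrouter_key` is hypothetical):

```python
import os


def load_openrouter_key() -> str:
    """Read OPENROUTER_API_KEY from the environment, loading .env if possible."""
    try:
        # python-dotenv copies entries from a .env file into os.environ
        # without overriding variables that are already set.
        from dotenv import load_dotenv
        load_dotenv()
    except ImportError:
        pass  # Fall back to variables already exported in the shell.

    key = os.environ.get("OPENROUTER_API_KEY", "")
    if not key.startswith("sk-or-"):
        raise RuntimeError("OPENROUTER_API_KEY is missing or malformed; check your .env file")
    return key
```

Failing early with a clear message is a design choice here: it avoids sending requests that would only be rejected by the API later in the run.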

📁Project Structure

├── talend_AgenticAI_SouraV1.py              # Main orchestrator script
├── extract_items_with_delay.py     # Extracts `.item` files from zipped_jobs/
├── validate_syntax.py              # Checks YAML rule format
├── src/
│   ├── main.py                     # Linting and fixing engine
│   └── llm.py                      # LLM handler via OpenRouter API
├── rules/                          # YAML rules for code quality
├── zipped_jobs/                    # Input zipped Talend exports
├── jobs/                           # Extracted .item files
├── fixed/                          # Auto-fixed jobs
└── reports/
    └── fix_summary_report.csv      # CSV summary

🤖LLM Integration

Provider: OpenRouter.ai
Model: GPT-3.5-Turbo
Prompt Example:
"Fix violation: {rule description}"
Fallback: Printed clearly in CLI if LLM fails
Confidence Score: Low/Medium/High based on length
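The prompt, fallback, and confidence behaviour listed above could be sketched as follows. This is an illustration only: it assumes "length" means the length of the LLM suggestion, the thresholds are invented, and `call_llm` stands in for whatever function actually calls OpenRouter.

```python
from typing import Callable, Optional

PROMPT_TEMPLATE = "Fix violation: {rule_description}"


def build_prompt(rule_description: str) -> str:
    """Fill the prompt template with a rule's description."""
    return PROMPT_TEMPLATE.format(rule_description=rule_description)


def confidence_from_length(suggestion: str, low_max: int = 20, medium_max: int = 120) -> str:
    """Map suggestion length to low/medium/high (thresholds are hypothetical)."""
    length = len(suggestion.strip())
    if length <= low_max:
        return "low"
    if length <= medium_max:
        return "medium"
    return "high"


def suggest_with_fallback(prompt: str, call_llm: Callable[[str], str]) -> Optional[str]:
    """Return the LLM suggestion, or print a fallback message and return None."""
    try:
        return call_llm(prompt)
    except Exception as exc:
        print(f"LLM unavailable, no suggestion for this violation: {exc}")
        return None
```

Passing `call_llm` as a parameter keeps the sketch independent of any particular client library and makes the fallback path easy to exercise with a stub.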

📦.item Job Extraction Logic

Looks inside each zipped_jobs/*.zip
Recursively searches folders for process/ directory
Copies .item files to jobs/
Adds delay of 1.1s per extraction
Logs each job as:
```
✅ Extracted job: myJob_0.1.item
```
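The extraction steps above can be sketched with the standard library. This is not the code from extract_items_with_delay.py; the function names are hypothetical, and handling nested zips in memory is one possible approach among several.

```python
import io
import time
import zipfile
from pathlib import Path, PurePosixPath


def extract_items(zip_source, jobs_dir: Path, delay: float = 1.1) -> list:
    """Copy every .item file found under a process/ folder into jobs_dir.

    zip_source may be a path or a file-like object, so nested zips can be
    opened in memory and searched with the same function.
    """
    jobs_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            parts = PurePosixPath(name).parts
            if name.endswith(".zip"):
                # Nested export: recurse into the inner archive.
                inner = io.BytesIO(archive.read(name))
                extracted += extract_items(inner, jobs_dir, delay)
            elif name.endswith(".item") and "process" in parts[:-1]:
                (jobs_dir / parts[-1]).write_bytes(archive.read(name))
                print(f"✅ Extracted job: {parts[-1]}")
                extracted.append(parts[-1])
                time.sleep(delay)  # Throttle between extractions.
    return extracted


def extract_all(zipped_dir: Path = Path("zipped_jobs"),
                jobs_dir: Path = Path("jobs"),
                delay: float = 1.1) -> list:
    """Run extract_items over every zip in zipped_dir."""
    results = []
    for zip_path in sorted(zipped_dir.glob("*.zip")):
        results += extract_items(zip_path, jobs_dir, delay)
    return results
```

Note that this sketch flattens all jobs into one folder, so two jobs with the same file name in different archives would overwrite each other.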

📊CSV Report Format

| job_file | rule_id | status | llm_suggestion | confidence |
|---|---|---|---|---|
| myJob.item | RULE_036 | llm-suggested | Defaults in schema don't match data type | medium |
| myJob2.item | RULE_028 | fixed | | 1.0 |
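A report with these five columns can be produced with Python's csv module. The sketch below is an assumption about how such a file might be written, not the project's actual reporting code; `write_report` is a hypothetical name, and the rows reuse the sample values shown above.

```python
import csv
import io

FIELDS = ["job_file", "rule_id", "status", "llm_suggestion", "confidence"]


def write_report(rows, stream) -> None:
    """Write report rows as CSV; missing fields are left empty."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS, restval="")
    writer.writeheader()
    writer.writerows(rows)


rows = [
    {"job_file": "myJob.item", "rule_id": "RULE_036", "status": "llm-suggested",
     "llm_suggestion": "Defaults in schema don't match data type",
     "confidence": "medium"},
    # Auto-fixed rows carry no LLM suggestion.
    {"job_file": "myJob2.item", "rule_id": "RULE_028", "status": "fixed",
     "confidence": "1.0"},
]

buffer = io.StringIO()
write_report(rows, buffer)
report_text = buffer.getvalue()
print(report_text)
```

To write to reports/fix_summary_report.csv instead of an in-memory buffer, open the file with `newline=""` and pass the file object as `stream`.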

🧠 Features

💬GPT suggestions for unresolved issues
🛠Auto-fix for rules with "strategy: auto"
🔄Caching of LLM responses
🐢1.1s throttled file extraction
📋Emoji-decorated CLI
🧪 YAML rule format validation
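The post does not describe how LLM responses are cached. One simple way to do it, shown here purely as an assumed design, is a JSON file keyed by a hash of the prompt. The class name `SuggestionCache` and the `call_llm` parameter are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


class SuggestionCache:
    """Tiny JSON-backed cache keyed by a SHA-256 hash of the prompt."""

    def __init__(self, path: Path):
        self.path = path
        if path.exists():
            self.data = json.loads(path.read_text(encoding="utf-8"))
        else:
            self.data = {}

    def get_or_call(self, prompt: str, call_llm: Callable[[str], str]) -> str:
        """Return a cached answer, calling the LLM only on a cache miss."""
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key not in self.data:
            self.data[key] = call_llm(prompt)
            # Persist after every miss so later runs can reuse the answer.
            self.path.write_text(json.dumps(self.data, indent=2), encoding="utf-8")
        return self.data[key]
```

Because the same rule violation tends to recur across jobs, a cache like this avoids paying for identical prompts more than once.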

🧰 Requirements

Python 3.9+
Install dependencies with pip install -r requirements.txt, which lists:
- openai>=1.0.0
- python-dotenv
- PyYAML

💡Future Ideas

Rule-specific enable/disable
GUI dashboard for job summaries
Email report delivery
.env setup wizard
