Convert legacy Talend ETL to modern dbt SQL in minutes. Semantic AI agent that reads your Talend jobs, understands the logic, and writes production dbt SQL. Full workflow orchestration with Temporal.
pip install taldbt
v0.2.1
taldbt is a Gen 3 AI agent. It parses Talend XML into an AST, infers intent, and generates modular CTE-based dbt SQL autonomously.
From legacy Talend XML to production-ready dbt project with full validation.
Every component chosen for a reason. No bloat, no unnecessary abstractions.
llm_complete() inside SQL. Auto-registers Ollama or cloud models. Choose what works for your environment. All free, all production-ready.
Full stack: Streamlit + Ollama + Temporal + DuckDB. One command, GPU-accelerated AI.
Lightweight Python package. CLI + AI migration agent. Add [ui] for Streamlit, [all] for everything.
No install needed. Upload your Talend ZIP, get a dbt project back. Powered by Cerebras AI cloud.
AdventureWorks — 24 jobs, 53 sources, 114 components. Oracle + MySQL + MSSQL. Perfect for testing.
Everything you need to get started and go to production.
docker pull souravetl/taldbt:latest
docker pull ollama/ollama:latest
docker compose up -d
docker exec taldbt-ollama ollama pull qwen3-coder:30b
Open http://localhost:8501

Install taldbt as a Python package. Lightweight, no Docker needed.
# Core AI migration agent + CLI
pip install taldbt==0.2.1
# With Streamlit web UI
pip install taldbt[ui]==0.2.1
# With Temporal orchestration
pip install taldbt[temporal]==0.2.1
# Everything
pip install taldbt[all]==0.2.1
CLI usage:
# Launch the web UI
taldbt ui
# Discover and analyze a Talend project
taldbt discover ./my_talend_project
# Full migration to dbt
taldbt migrate ./my_talend_project ./dbt_output
# Check version
taldbt version
Requirements: Python 3.10+ and Ollama for local AI (optional, falls back to free cloud AI).
The Docker stack includes Streamlit, Ollama (GPU), Temporal, and DuckDB — all pre-configured.
docker pull souravetl/taldbt:latest
docker pull ollama/ollama:latest
# Download docker-compose.yml from Docker Hub description
docker compose up -d
# Pull the AI model
docker exec taldbt-ollama ollama pull qwen3-coder:30b
# Open
# App: http://localhost:8501
# Temporal: http://localhost:8233
No GPU? Use CPU override:
docker compose -f docker-compose.yml -f docker-compose.cpu.yml up -d
Air-gapped? Use build-dist.bat to create an offline package with tar files.
The live app at taldbt.streamlit.app runs on Streamlit Cloud with Cerebras AI.
How it works on cloud:
To deploy your own instance: Contact [email protected] for enterprise licensing and private deployment.
taldbt auto-detects and chains LLM providers with intelligent fallback:
Priority: Ollama (local) → Cerebras → Groq → OpenRouter
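The fallback chain above can be pictured as an ordered probe: try each provider's health check in priority order and use the first one that responds. A minimal sketch (the provider names come from the list above; the function names are hypothetical, not taldbt's internal API):

```python
from typing import Callable, Optional

# Documented priority order: local Ollama first, then free cloud tiers.
PROVIDER_ORDER = ["ollama", "cerebras", "groq", "openrouter"]

def pick_provider(is_available: Callable[[str], bool]) -> Optional[str]:
    """Return the first provider whose health check passes, or None."""
    for name in PROVIDER_ORDER:
        if is_available(name):
            return name
    return None

# Example: Ollama is down, so the chain falls through to Cerebras.
up = {"ollama": False, "cerebras": True, "groq": True, "openrouter": True}
print(pick_provider(lambda name: up[name]))  # -> cerebras
```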
How it works:
<think> blocks from reasoning models are stripped automatically.

DuckDB Flock Extension:
The flock extension connects DuckDB to the active LLM provider. This enables llm_complete() inside SQL queries for semantic validation — checking if generated SQL matches the original Talend intent. Auto-registers the active model (local or cloud) on every DuckDB connection.
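As an illustration, a semantic validation call might look like the query below. This is a sketch only: the exact llm_complete() argument shape varies across flock/FlockMTL versions, the model name is whatever taldbt registered, and the table and column names here are hypothetical.

```sql
-- Sketch: ask the registered model whether generated SQL preserves the
-- original Talend component's intent (argument shapes vary by flock version).
SELECT llm_complete(
    {'model_name': 'qwen3-coder:30b'},
    {'prompt': 'Does this SQL preserve the original Talend filter logic? '
               || 'Answer yes or no: ' || generated_sql}
) AS verdict
FROM migrated_models;
```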
Free providers:
taldbt translates Talend's orchestration (tRunJob, tParallelize) into Temporal workflows.
What gets generated:
workflows.py — Parent/child workflows mirroring Talend job chains
activities.py — run_dbt_model activity for each data job
worker.py — Registers workflows + activities on the task queue
run_workflow.py — Triggers the root workflow

Execution flow:
Master_Job (parent)
├─ ParallelJobWorkflow (child)
│ ├─ run_dbt_model("dimproducts_copy")
│ └─ run_dbt_model("productvendor_copy")
├─ ProductSubJobsWorkflow (child)
│ ├─ run_dbt_model("dimproductcosthistory_copy1")
│ └─ run_dbt_model("load_dimprodinventory_copy")
└─ run_dbt_model("shipmethodmysql_op")
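The ordering in the tree above can be mimicked with plain asyncio (a toy sketch of the execution semantics, not actual temporalio code). The assumptions: the tParallelize-derived child runs its two models concurrently, the sub-jobs child runs sequentially, and the parent awaits each child before the final model.

```python
import asyncio

async def run_dbt_model(name: str) -> str:
    # Stand-in for the Temporal activity that shells out to `dbt run`.
    await asyncio.sleep(0)  # simulate work
    return f"ran {name}"

async def parallel_job_workflow() -> list[str]:
    # tParallelize -> both models run concurrently.
    return list(await asyncio.gather(
        run_dbt_model("dimproducts_copy"),
        run_dbt_model("productvendor_copy"),
    ))

async def product_sub_jobs_workflow() -> list[str]:
    # Chained tRunJob calls -> models run one after another.
    return [
        await run_dbt_model("dimproductcosthistory_copy1"),
        await run_dbt_model("load_dimprodinventory_copy"),
    ]

async def master_job() -> list[str]:
    # Parent awaits each child workflow, then runs the final model.
    results = await parallel_job_workflow()
    results += await product_sub_jobs_workflow()
    results.append(await run_dbt_model("shipmethodmysql_op"))
    return results

print(asyncio.run(master_job()))
```

In real Temporal code each of these would be a @workflow.defn class executing activities, with retries and state recorded on the dashboard.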
Dashboard: Available at http://localhost:8233 when running Docker or local Temporal CLI. Shows workflow status, history, timing, and state transitions.
What Talend components are supported?
549 component knowledge base entries covering tMap, tDBInput/Output (all databases), tFilterRow, tAggregateRow, tSortRow, tUniqRow, tJavaRow, tReplicate, tRunJob, tParallelize, tFlowToIterate, and more. Custom Java goes through the AI translation pipeline.
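One way to picture the knowledge base is a component-to-SQL-pattern lookup, with unknown components routed to the AI pipeline. The sketch below is purely illustrative: the template strings and function are hypothetical, not taldbt's actual internal entries.

```python
# Hypothetical sketch: known Talend components map to deterministic SQL
# patterns; anything unrecognized falls back to the LLM pipeline.
KNOWN_COMPONENTS = {
    "tFilterRow": "WHERE {condition}",
    "tSortRow": "ORDER BY {columns}",
    "tAggregateRow": "GROUP BY {keys}",
    "tUniqRow": "SELECT DISTINCT {columns}",
}

def translate(component: str) -> tuple[str, str]:
    """Return (strategy, template) for a Talend component."""
    if component in KNOWN_COMPONENTS:
        return ("deterministic", KNOWN_COMPONENTS[component])
    return ("llm", "")  # e.g. custom tJavaRow Java goes to the AI pipeline

print(translate("tFilterRow"))  # deterministic path
print(translate("tJavaRow"))    # falls back to the LLM
```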
What databases does it handle?
In our analysis of 1,595 .item files across 8 GitHub repos, sources were 48% MSSQL, 37% Teradata, and 14% MySQL. DuckDB handles all dialect translation via sqlglot.
Does it work without a GPU?
Yes. Use cloud AI (Cerebras/Groq — free) or Ollama CPU mode. The knowledge base handles 96% deterministically without any LLM.
Can I use my own LLM?
Yes. Any OpenAI-compatible endpoint works. Set LLM_PROVIDER=custom with your base_url and API key.
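For example (LLM_PROVIDER=custom is documented above; the base-URL and API-key variable names below are illustrative guesses, so check the taldbt docs for the exact keys):

```shell
# Point taldbt at any OpenAI-compatible endpoint.
export LLM_PROVIDER=custom
export LLM_BASE_URL=https://my-gateway.example.com/v1  # hypothetical var name
export LLM_API_KEY=sk-...                              # hypothetical var name
taldbt migrate ./my_talend_project ./dbt_output
```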
Is my data secure?
All processing happens locally (Docker/desktop) or in your Streamlit Cloud instance. No data leaves your environment. API keys are stored in encrypted Streamlit secrets.