ETL Zammad → PostgreSQL

ETL pipeline developed in Python to extract data from the Zammad API, transform it, and load it into a PostgreSQL database for analytics and reporting.

Description

This project implements a full ETL workflow:

1. Extraction: Data retrieval from the Zammad API (tickets, users, organizations, states, time entries, and real ticket change history).

2. Transformation: Data cleaning, normalization, and enrichment, including reconstruction of real ticket timelines.

3. Loading: Incremental loading into PostgreSQL with a relational schema optimized for BI analysis.

Architecture

Zammad API → extract.py → transform.py → load_postgre.py → PostgreSQL
                                ↑
                          main.py (orchestrator)

Components

File	Description
`main.py`	ETL orchestrator
`extract.py`	Data extraction from API (with pagination and ticket history)
`transform.py`	Data cleaning and transformation
`load_postgre.py`	PostgreSQL loading (table creation and upserts)
`load_sqlite.py`	SQLite loading (development-only, disabled by default)

The load_sqlite.py was used during development and have been replaced by the final PostgreSQL implementation.

Data Model

The model is structured in three layers:

Dimension tables: — reference catalogs: users, organizations, ticket_states, time_entry_types

Core table: tickets — connects all dimensions

Fact & history tables: — capture activity over time: time_entries, ticket_state_history, ticket_event_history, fact_ticket_time, fact_ticket_time_enriched

Aggregated BI table: ticket_summary — denormalized and optimized for direct consumption in visualization tools

Real Change History

A key feature of this project is the integration of the Zammad endpoint /api/v1/ticket_history/{id} . This endpoint returns all individual changes of a ticket with exact timestamps, allowing reconstruction of the real timeline of each ticket, including transitions between states such as open, pending, resolved, or billable.

Each change is stored with:

start_time

end_time

duration_h_m

This enables advanced analysis such as time per state, bottlenecks, and full lifecycle tracking.

Note: Access to this endpoint requires a token with admin and ticket.agent permissions enabled.

Configuration

Create an .env file based on .env.example:

ZAMMAD_TOKEN=your_token
ZAMMAD_URL=https://your-zammad-instance/api/v1
 
DB_HOST=your_host
DB_PORT=5432
DB_NAME=your_db
DB_USER=your_user
DB_PASSWORD=your_password

Installation

pip install -r requirements.txt

Main dependencies:

pandas
requests
psycopg2
python-dotenv

Execution

python src/main.py

The process performs:

Extraction of tickets, users, organizations, states, and time entry types
Extraction of real change history per ticket
Data transformation and cleaning
Table creation if not exists
Data insertion/update (upserts using ON CONFLICT)
Generation of the aggregated ticket_summary table

Deployment

The project is fully containerized using Docker and includes an internal cron-based scheduler for automated execution.

Docker Setup

The ETL runs inside a lightweight Python 3.11 container with cron installed.

Key aspects of the Docker configuration:

Based on python:3.11-slim
Installs cron inside the container
Copies the project and dependencies
Registers a custom cron job
Uses a startup script to initialize execution and scheduling

Scheduled Execution (Cron inside container)

A cron job is embedded directly into the container via a custom crontab file:

*/30 * * * * root cd /app && python src/main.py >> /var/log/cron.log 2>&1

This means:

The ETL runs every 30 minutes.
Execution happens inside the container.
Logs are written to /var/log/cron.log
Both standard output and errors are captured

Startup Script (stash.sh)

The container is started using a script (start.sh in Docker, implemented as stash.sh), which orchestrates the runtime behavior:

#!/bin/bash

echo "Starting Zammad ETL..."

python src/main.py

echo "Initial execution completed. Starting cron..."

cron && tail -f /var/log/cron.log

What this script does:

1. Initial ETL run

Executes the pipeline immediately when the container starts
Ensures data is available without waiting for the first cron cycle

2. Starts cron daemon

Activates the scheduled job inside the container

3. Log streaming

Uses tail -f to continuously stream logs
Keeps the container alive (required for Docker)

Key Features

Automatic pagination for Zammad API
Real ticket change history reconstruction (per ticket)
Time unit transformation (1 unit = 15 minutes)
Upserts using ON CONFLICT to prevent duplicates
Normalized relational model + BI-ready ticket_summary table
Ready for integration with BI tools like Metabase, Apache Superset, or Grafana
Dockerized execution with built-in cron scheduling

Notes

The .env file is excluded from the repository for security
Requires Zammad API access with admin and ticket.agent permissions
PostgreSQL must be available before execution
Fact tables (fact_ticket_time, fact_ticket_time_enriched, ticket_state_history) are fully rebuilt on each run using TRUNCATE before loading

Author

Cèlia Trullà

This project was developed in a professional internship context as part of a data integration and processing workflow.

GitHub: https://github.com/ChampCTE
Email: celia.trulla@gmail.com

License

This project is intended for internal or educational use only. Commercial use is not permitted.

For inquiries or usage requests, contact: celia.trulla@gmail.com

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
src		src
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
DockerFile		DockerFile
README.md		README.md
crontab		crontab
requirements.txt		requirements.txt
stash.sh		stash.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ETL Zammad → PostgreSQL

Description

Architecture

Components

Data Model

Real Change History

Configuration

Installation

Execution

Deployment

Docker Setup

Scheduled Execution (Cron inside container)

Startup Script (stash.sh)

Key Features

Notes

Author

License

About

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ETL Zammad → PostgreSQL

Description

Architecture

Components

Data Model

Real Change History

Configuration

Installation

Execution

Deployment

Docker Setup

Scheduled Execution (Cron inside container)

Startup Script (stash.sh)

Key Features

Notes

Author

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages