Senior AI Engineer · Berlin, Germany

AI evaluation tools and privacy-first product systems.

I build cost-aware LLM/RAG/agent evaluation tooling, production AI systems, and product architectures for sensitive data domains.

View Pangolin Eval · Health Passport · GitHub

6+ years · AI, ML, and software engineering

v0.2.2 · Pangolin Eval public release for local AI evaluation workflows

HealthKit · privacy-first wearable data product with Apple Health writeback

Current direction

Products that make AI and sensitive data systems measurable.

My work sits between applied AI engineering, production LLM systems, privacy-first product engineering, and business-impact optimization. I care about the practical tradeoffs that decide whether a product survives production: model quality, latency, reliability, inference cost, privacy boundaries, and maintainability.

At Solenix Engineering, I contribute to ESA and EUMETSAT-related AI initiatives across satellite health forecasting, telemetry anomaly detection, AI validation workflows, synthetic QA generation for RAG evaluation, multi-agent LLM systems, and Kubernetes/GitOps deployments. In public, I am building Pangolin Eval and Health Passport to show that the same production discipline can produce useful tools.

Focus areas

Where I am building depth.

AI evaluation

LLM, RAG, and agent workload measurement across quality, latency, cost, reliability, and release gates.

Production LLM systems

RAG, AI agents, evaluation workflows, provider switching, synthetic QA data, and observability.

Privacy-first products

Local-first data flows, permission boundaries, receipts, and product architecture for sensitive domains.

MLOps and LLMOps

MLflow, CI/CD, Docker, Kubernetes, GitOps, monitoring, and model lifecycle management.

Selected projects

Open-source and product work.

Flagship open-source AI project

Pangolin Eval

Public Python CLI/library for measuring LLM, RAG, and agent workloads across cost, latency, quality, and reliability. Includes weighted evaluators, gates, RAG diagnostics, TraceCards, OTel-style exports, gateway examples, Docker demos, and a v0.2.2 release track.

Open repository

Product build

Health Passport

Privacy-first iOS continuity layer for Fitbit/Google wearable data. It imports supported data, preserves normalized records locally, and writes clean supported samples back to Apple Health with user permission.

Open repository

Quaker

Product-style macOS maintenance CLI with dry-run-first safety, local memory, rules, profiles, hooks, and scriptable output.

Open repository

LLM E2E Applications

RAG and LLM application experiments, including PDF chat workflows with LangChain, FAISS, and OpenAI embeddings.

Open repository

Experience

Applied AI, production systems, and measurable outcomes.

2026 - Public work

Pangolin Eval and Health Passport

Published Pangolin Eval as an open-source local evaluation toolkit for LLM, RAG, and agent workflows. Building Health Passport as an iOS-first, privacy-first wearable data continuity product with HealthKit, TypeScript normalization rules, local vault boundaries, sync receipts, and backend Pro-service foundations.

2024 - Present

Senior AI Engineer · Solenix Engineering GmbH

Contributing to ESA and EUMETSAT-related AI initiatives across mission operations, satellite health forecasting, telemetry anomaly detection, AI validation, synthetic QA generation, multi-agent LLM systems, MLflow monitoring, and Kubernetes/GitOps deployment workflows.

2023

Data Scientist · CRED Investments

Built LLM-backed and ML systems for entity intelligence, matching, ranking, and operational automation. Reduced monthly cloud expenditure by about 35% ($32K+) through model, VM, and storage optimization.

2021 - 2022

Data Scientist · DeFacto

Delivered forecasting, recommendation, segmentation, and neural translation systems tied to revenue, sales uplift, campaign response, and translation cost reduction.

2018 - 2021

Software Engineer / ML Engineer · Earlier roles

Built NLP chatbots, sentiment analysis, demand forecasting, ETL automation, SQL optimization, and data analysis pipelines across startup and consulting environments.

Stack

Tools I use to ship AI systems.

Python · FastAPI · LangChain · LlamaIndex · MLflow · Kubernetes · Docker · GitOps · Flux · Airflow · OpenTelemetry · RAG evaluation · TraceCards · Swift · SwiftUI · HealthKit · TypeScript · Node.js · AWS · GCP · PostgreSQL · MongoDB · Weaviate · PyTorch · TensorFlow · scikit-learn · Spark

Contact

Let’s talk about production AI systems and product tools.

I am building in public around Pangolin Eval, cost-aware LLMOps, model evaluation, privacy-first product engineering, and practical tooling for teams moving from prototype to production.

Email · GitHub · LinkedIn