Name: We will do them later - A Talk About LLM Evals
Start: 2026-05-28T14:45:00+0200
End: 2026-05-28T15:15:00+0200


Thursday May 28, 2026 14:45 - 15:15 CEST

🤖 DATA/AI ARENA

Coding agentic systems is fun: it feels like magic!
Vibe-coding agentic systems is even more fun: double the magic, hah!

But quality evaluation of LLM generations? That part is usually… boring.
And if you’re not in the Python ecosystem, good luck finding a framework that actually works for you.
So eval tasks quietly sit in our backlogs, waiting for better days, the v2 release, or some future “we’ll fix it later.”

Drawing on real experience building production RAG systems, agentic pipelines, and LLM-powered features across different stacks, I'll share lessons learned the hard way: what broke, what worked, and what we wish we'd measured from day one.
The goal of this talk is simple: make evals understandable and make creating them easy, with coding agents doing the heavy lifting alongside you. Expect practical tips, tricks, and a fresh perspective on making evaluation a natural part of developing LLM-powered applications.

Speakers