Create robust evaluations for agentic apps

Learn how to use advanced features of the Evaluations framework to build robust evaluations for your app. Explore how to evaluate flows with tool calling and dynamic conditions, and how to define what correct behavior means for your use case. Discover how to generate synthetic data, use judges effectively, and validate your datasets for reliable results.

Watch Video (21 min)

No notes available yet

Be the hero who changes that. Watch the video, jot down what matters, and open a pull request – it's genuinely quick.

Learn how to contribute