Improve your prompts by hill-climbing with Evaluations

Learn comparative evaluation techniques to guide your prompt engineering and select the right model for your app. Explore how to baseline performance, expand your evaluation strategy, and convert results to JSON for integration with other tools. Discover when to apply different prompting strategies and how to iteratively refine prompts for best results.
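
As a rough illustration of the loop described above (baseline a prompt, compare candidates against it, keep only what scores better, then export the results as JSON), here is a minimal Python sketch. It is not taken from the video. `fake_model`, `EVAL_SET`, the exact-match metric, and the prompt wording are all hypothetical stand-ins so the example runs offline; a real setup would call an actual model and would likely use a richer evaluator.

```python
import json

# Hypothetical evaluation set: (input, expected output) pairs.
EVAL_SET = [("2+2", "4"), ("3+5", "8"), ("10-7", "3")]

# Canned answers so the stand-in model is deterministic.
ANSWERS = dict(EVAL_SET)


def fake_model(prompt: str, question: str) -> str:
    """Stand-in for a real model call (an assumption for this sketch).

    Returns the bare answer only when the prompt asks for digits only;
    otherwise it wraps the answer in a sentence, which fails exact match.
    """
    answer = ANSWERS[question]
    if "digits only" in prompt.lower():
        return answer
    return f"The answer is {answer}."


def score(prompt: str, model) -> float:
    """Fraction of evaluation items the model answers with an exact match."""
    hits = sum(model(prompt, q).strip() == expected for q, expected in EVAL_SET)
    return hits / len(EVAL_SET)


def hill_climb(baseline: str, candidates: list[str], model) -> dict:
    """Score the baseline, then keep a candidate only if it beats the best so far."""
    best_prompt, best_score = baseline, score(baseline, model)
    history = [{"prompt": baseline, "score": best_score, "kept": True}]
    for candidate in candidates:
        candidate_score = score(candidate, model)
        kept = candidate_score > best_score
        if kept:
            best_prompt, best_score = candidate, candidate_score
        history.append({"prompt": candidate, "score": candidate_score, "kept": kept})
    return {"best_prompt": best_prompt, "best_score": best_score, "history": history}


result = hill_climb(
    "Solve the arithmetic problem.",
    [
        "Solve the arithmetic problem step by step.",
        "Solve the arithmetic problem. Reply with digits only.",
    ],
    fake_model,
)

# Serialize the full run so other tools can consume it.
report_json = json.dumps(result, indent=2)
print(report_json)
```

Recording every attempt in `history`, including rejected candidates, keeps the JSON report useful for later comparison rather than only preserving the winning prompt.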

Watch Video (26 min)

No notes available yet

Be the hero who changes that. Watch the video, jot down what matters, and open a pull request – it's genuinely quick.

Learn how to contribute