1 minute read

When you’re building production GenAI systems, reliability isn’t optional—it’s the line between a demo and a deployable product.

We learned this the hard way.

We had a GenAI-powered bot that returned structured JSON responses to a microservice. Even with temperature=0, about 10% of responses were malformed, triggering 4xx errors in our Grafana dashboards. We’d implemented retries, but they were costly—in latency, tokens, and unpredictability.

We couldn’t switch to a more capable (or expensive) model due to constraints. We tried prompt engineering, schema validation with Pydantic, and deterministic settings—but the LLM kept tripping over JSON formatting. The content was often correct, but a missing quote, an unescaped newline, or a trailing comma would break parsing.

So instead of fighting the model’s inherent non-determinism, we built a safety net.

We added a lightweight post-processing layer:

  • When Pydantic raised an InvalidJSONError,
  • We applied a targeted regex-based cleanup to extract and restructure the JSON,
  • Then re-validated the result.

The outcome? 4xx errors dropped from 10% to just 2%.

The Takeaway

Reliability in GenAI systems isn’t about eliminating randomness—it’s about containing it.

You can’t always control what the LLM outputs, but you can design resilient wrappers around its known failure modes. Sometimes, the most “unsexy” code—regex, fallback parsers, strict validation—is what makes your AI system production-ready.

If you’re shipping GenAI, ask yourself:
Where is your system brittle? And what’s your Plan B when the LLM inevitably glitches?