GAMING & ENTERTAINMENT

Electronic Arts

Scalable AI for Domain Classification and Question Answering

AE Studio partnered with Electronic Arts to build and improve AI systems for automated domain classification and question answering, with a focus on scalability across new domains and question types.

THE CHALLENGE

The problem.

Electronic Arts needed to scale an AI-powered question-answering system that could classify content across multiple domains and generate accurate, contextual answers. The core challenge was building a system architecture that could absorb new domains and question types quickly without requiring a full rebuild each time. Accuracy at scale was critical, and the system needed to handle both templated and open-ended queries reliably across EA's diverse content landscape.

THE SOLUTION

What we built.

Systematic prompt engineering at scale

AE Studio embedded with EA's AI team to systematically test and improve prompt strategies for candidate generation and domain classification. Rather than tuning prompts one-off, the team built a structured evaluation process that tested multiple prompt variants across domains, measuring recall and precision at each stage of the pipeline.
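The evaluation loop described above can be sketched as a simple grid: every prompt variant scored on every domain, with per-domain precision and recall rather than a single aggregate number. This is a minimal illustration, not EA's actual code; `generate` stands in for the real model call, and the variant and example structures are assumptions.

```python
from collections import defaultdict

def precision_recall(predicted, relevant):
    """Precision and recall for one set of generated candidates."""
    predicted, relevant = set(predicted), set(relevant)
    tp = len(predicted & relevant)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall

def evaluate_variants(variants, labeled_examples, generate):
    """Score every prompt variant on every domain, not just one-off tuning.

    `generate(variant, question)` is a stand-in for the model call.
    `labeled_examples` is a list of (domain, question, gold_candidates).
    Returns {variant: {domain: (avg_precision, avg_recall)}}.
    """
    scores = defaultdict(dict)
    for variant in variants:
        by_domain = defaultdict(list)
        for domain, question, gold in labeled_examples:
            by_domain[domain].append(
                precision_recall(generate(variant, question), gold))
        for domain, prs in by_domain.items():
            n = len(prs)
            scores[variant][domain] = (sum(p for p, _ in prs) / n,
                                       sum(r for _, r in prs) / n)
    return dict(scores)
```

Keeping the scores broken out per domain is what makes the process repeatable: a variant that wins overall but regresses on one domain is visible immediately.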

Scalable architecture for domain expansion

The team refactored the system architecture to decouple domain-specific logic from the core question-answering pipeline, making it possible to onboard new domains and question types in days rather than weeks. This included building reusable evaluation harnesses and modular prompt templates.

End-to-end evaluation infrastructure

AE built evaluation frameworks that measured accuracy across the full pipeline, from domain identification through candidate generation to final answer quality, rather than relying on component-level metrics that could mask end-to-end issues.
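A harness like the one described can be sketched as scoring every stage of the pipeline on the same labeled cases, so that a drop in final-answer accuracy is traceable to the stage that caused it. The stage names and result structure here are hypothetical stand-ins for the real system.

```python
def evaluate_end_to_end(pipeline, test_cases):
    """Score the full path: domain id -> candidate generation -> final answer.

    `pipeline(question)` is assumed to return a dict with keys
    "domain", "candidates", and "answer"; each test case carries gold
    labels so per-stage rates are computed on the same examples.
    """
    stage_hits = {"domain": 0, "candidates": 0, "answer": 0}
    for case in test_cases:
        result = pipeline(case["question"])
        if result["domain"] == case["gold_domain"]:
            stage_hits["domain"] += 1
        if case["gold_answer"] in result["candidates"]:
            stage_hits["candidates"] += 1
        if result["answer"] == case["gold_answer"]:
            stage_hits["answer"] += 1
    n = len(test_cases)
    return {stage: hits / n for stage, hits in stage_hits.items()}
```

Because all three rates come from the same run, a gap between candidate recall and final-answer accuracy points directly at the answer-selection stage, which component-level metrics computed in isolation can hide.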



KEY TAKEAWAYS

What we learned.

  • Scalable AI systems require architecture decisions upfront that decouple domain-specific logic from the core pipeline.
  • End-to-end evaluation infrastructure is essential for measuring real-world accuracy, not just component-level metrics.
  • Prompt engineering at scale requires systematic testing across domains, not one-off optimization.

IN SUMMARY

Bottom line.

AE Studio worked with EA's engineering team on prompt engineering, evaluation infrastructure, and system architecture to improve both the accuracy and scalability of EA's AI question-answering pipeline. The engagement focused on building repeatable processes for adding new domains and measuring end-to-end system performance.

LET'S TALK

Bring us the hard problem.

We'll bring the team that ships.

Get in touch