Superintelligent is an expert matchmaking platform connecting enterprises to agents, agent builders, and agent infrastructure. As an AI agent marketplace, their team helps enterprises determine which agents to deploy and which partners can help them execute.
Superintelligent partnered with Fractional AI to solve a new challenge: automate the consulting-style interviews they were conducting to identify agent opportunities inside large organizations. The result was a voice agent capable of conducting natural, human-like interviews across an entire enterprise—scaling qualitative insight collection in a way that was never before possible.
“This isn’t a 1x or 2x improvement. It’s a 10x better experience—and one that wasn’t even possible before.” — Nathaniel Whittemore, Superintelligent CEO
Superintelligent wanted to rapidly scale to meet client demand and needed a better way to perform agent readiness audits—a process that maps where AI agents could make the most impact within large companies. Traditionally this would have required hiring teams of consultants to run high-touch interviews, but this was prohibitively expensive and difficult to scale. Additionally, static survey tools and form-based approaches lack the fluidity and adaptability needed for deep qualitative research—especially to surface open-ended insights through natural, trust-based conversation.
The goal was to build an AI voice agent that could autonomously conduct qualitative interviews at scale—delivering results on par with human consultants, but with 24/7 availability and the ability to run hundreds of interviews simultaneously.
Fractional AI delivered a production-ready voice agent powered by a modular agent architecture and a highly customized real-time LLM orchestration system. The voice agent now enables Superintelligent to conduct on-demand interviews that adapt intelligently to user responses and deliver transcript-level insights for analysis.
Key features include:
- On-demand interviews, available 24/7, that adapt intelligently to user responses
- Pause-and-resume support, so interviews can span multiple sittings
- Transcript-level insights delivered for downstream analysis
Behind the scenes, a multi-agent architecture manages complex tasks like drift detection, adaptive question sequencing, transcript filtering, and session state—all critical to keeping interviews on track and delivering high-quality outcomes.
“We’re giving consultants better data, faster. This frees people up to do higher-order tasks.” — Chris, CEO of Fractional AI
Project Setup
Production Timeline
Building with OpenAI’s Realtime Voice API
The foundation of the voice agent experience was OpenAI’s Realtime API, which offers low-latency, conversational voice capabilities. While the core model performed impressively in real-time conversation, building a robust, production-grade system around it required significantly more than prompt engineering.
Initially, the team experimented with a monolithic prompt containing the entire interview structure and logic. While this worked in controlled scenarios, it proved brittle when evaluated for real-world deployment. The model frequently went off track, struggled with question sequencing, and often got caught in conversational rabbit holes.
To stabilize the experience, Fractional AI developed a multi-agent orchestration layer around the core voice model. This included:
- Drift detection to notice when the conversation wanders and steer it back on topic
- Adaptive question sequencing to keep the interview moving in a sensible order
- Transcript filtering to keep the context passed to the model clean and relevant
- Session state management to track progress across pauses and resumptions
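To make the drift-detection idea concrete, here is a minimal sketch of how a monitor agent might run alongside the voice session. The structure, prompt, and names below are illustrative assumptions, not Superintelligent's actual implementation: a cheap text model judges recent turns against the current question and proposes a steering note when the conversation drifts.

```typescript
// Hypothetical drift monitor that runs alongside the realtime voice session.
// After each user turn, a small text model judges whether the conversation is
// still on the current interview question and, if not, proposes a redirect.
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

interface DriftVerdict {
  onTopic: boolean;
  steeringNote?: string; // short instruction to nudge the interview back on track
}

async function checkDrift(
  currentQuestion: string,
  recentTurns: string[],
): Promise<DriftVerdict> {
  const response = await client.chat.completions.create({
    model: "gpt-4o-mini",
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          'You monitor a live interview. Reply with JSON: {"onTopic": boolean, ' +
          '"steeringNote": string}, where steeringNote gently redirects the ' +
          "conversation if it has drifted from the current question.",
      },
      {
        role: "user",
        content: `Current question: ${currentQuestion}\n\nRecent turns:\n${recentTurns.join("\n")}`,
      },
    ],
  });
  return JSON.parse(response.choices[0].message.content ?? "{}");
}
```

When the verdict comes back off topic, the orchestrator can pass the steering note to the voice session as an instruction update rather than interrupting the user directly.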
In addition to these architectural layers, the team encountered several API-specific quirks that required creative problem-solving. One major challenge stemmed from the 15-minute session cap imposed by the OpenAI Realtime API. Superintelligent’s interviews often needed to run longer, and users needed the ability to pause and resume later. Resuming a session wasn’t straightforward—context could only be reloaded via text, and injecting too much prior content often caused the model to stop producing audio altogether.
To solve this, the team designed a non-obvious workaround that allowed sessions to resume from where the user left off—maintaining continuity in both voice and transcript—while working within the API’s limitations. The result was a seamless user experience that masked considerable backend complexity.
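To illustrate the shape of that workaround, the sketch below resumes an interview by opening a fresh Realtime session and injecting a compact text summary of the prior conversation as a single item, rather than replaying the full transcript (which, as noted, could cause audio to stop). The connection URL and event types follow OpenAI's published Realtime API; the summarization step and the exact instructions are assumptions for illustration.

```typescript
// Sketch of a pause/resume strategy for the OpenAI Realtime API over its
// WebSocket transport. `priorSummary` is assumed to come from a separate
// summarization pass over the saved transcript.
import WebSocket from "ws";

function resumeInterview(priorSummary: string): WebSocket {
  const ws = new WebSocket(
    "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview",
    {
      headers: {
        Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
        "OpenAI-Beta": "realtime=v1",
      },
    },
  );

  ws.on("open", () => {
    // Re-establish the interviewer persona and the resume behavior.
    ws.send(JSON.stringify({
      type: "session.update",
      session: {
        instructions:
          "You are resuming an interview already in progress. Continue from " +
          "the summary provided; do not restart from the first question.",
      },
    }));

    // Inject prior context as ONE compact text item, not the full transcript,
    // which in practice could cause the model to stop producing audio.
    ws.send(JSON.stringify({
      type: "conversation.item.create",
      item: {
        type: "message",
        role: "user",
        content: [
          { type: "input_text", text: `Summary of the interview so far: ${priorSummary}` },
        ],
      },
    }));

    // Prompt the model to greet the user and pick up where it left off.
    ws.send(JSON.stringify({ type: "response.create" }));
  });

  return ws;
}
```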
While OpenAI’s Realtime API provided a powerful base, building a reliable and context-aware enterprise voice product demanded substantial layering of logic, agent control, and infrastructure—both to steer the conversation and to smooth over the platform’s constraints.
Architecture Diagram:
Evaluations: The Complexity of Measuring Voice Agents
Unlike traditional software, voice agents don’t have obvious pass/fail conditions. There’s no simple metric to determine whether an interview was “good”. Was it the amount of information collected? The tone? The lack of confusion? The answer varies not only by use case but also by user.
From the start, it was clear that evaluating the voice agent would require more than logs and transcripts. To move beyond vibe-driven development, Fractional AI built a custom evaluation framework tailored to the nuances of real-time, open-ended conversation.
The approach combined synthetic conversations with human-like evaluation criteria: metrics for qualities such as the amount of information collected, conversational tone, and signs of user confusion.
Each metric was tied to a prompt-crafted rubric, giving the team directional insight into system behavior without requiring massive manual labeling efforts. This allowed for rapid iteration across different model versions, prompt updates, and control strategies.
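As a rough illustration of that setup, the sketch below grades a synthetic interview transcript against a small prompt-crafted rubric. The criterion names and the grading model are assumptions for the example, not the actual framework.

```typescript
// Illustrative LLM-graded rubric over one synthetic interview transcript.
// Criterion names are hypothetical examples of "human-like" evaluation metrics.
import OpenAI from "openai";

const client = new OpenAI();

const RUBRIC = [
  "coverage: did the interview collect the information each question targets?",
  "sequencing: were questions asked in a sensible, adaptive order?",
  "drift: did the agent recover gracefully when the user went off topic?",
  "tone: did the agent sound natural and build trust?",
];

async function gradeTranscript(transcript: string): Promise<Record<string, number>> {
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          "Score the interview transcript from 1 to 5 on each criterion below. " +
          "Reply with JSON mapping each criterion name to its score.\n" +
          RUBRIC.join("\n"),
      },
      { role: "user", content: transcript },
    ],
  });
  return JSON.parse(response.choices[0].message.content ?? "{}");
}
```

Aggregating such scores across many synthetic conversations yields the directional signal described above without manual labeling.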
Despite the rigor, the process surfaced a critical truth: voice agents operate in a space where subjective experience matters. What feels “off” to one user may feel natural to another. And unlike web forms or chatbots, users may walk away mid-interview, switch languages, or respond unpredictably—factors hard to simulate in a lab.
To bridge the gap between synthetic and real-world behavior, the evaluation system became not just a QA mechanism, but a design feedback loop—shaping how the agent asked questions, responded to edge cases, and gracefully handled drift.
“We weren’t just evaluating whether the system worked—we were teaching it how to conduct a good interview, over time.” — Eddie, CTO of Fractional AI
The interview agent conducted thousands of interviews in its first month in production, enabling Superintelligent to scale faster than ever before. At one Fortune 100 client, the voice agent interviewed 150 employees in the first two weeks at a total cost of $500, roughly $3 per interview. The agent is built for scale (the current bottleneck is OpenAI rate limits) and is ready to support Superintelligent's ambition to conduct millions of interviews per year.
By enabling simultaneous, low-cost interviews free from scheduling friction, AI voice agents unlock information gathering at a scale previously unthinkable.
Prompts aren’t enough – Real-time voice requires orchestration, not just clever prompting. Drift control, sequencing, and context management are essential.
Voice UX is its own design problem – Tone, pacing, and trust matter. You’re designing a conversation, not just an interface.
Evaluation drives learning – Open-ended interviews need custom evals. The evaluation system isn’t just QA—it’s how the agent learns to improve.