Quantum Automations
Guide · Architecture

Voice AI Architecture: A 2025 Implementation Guide

Published May 2026
Topic Architecture · Voice AI
Reading time 9 min
For UK SMEs
On this page
  1. Reference Stack
  2. Latency Budget
  3. Call Flow (JSON)
  4. Webhooks and Functions
  5. Good / Bad / Ugly

Reference Stack

  • Telephony: Twilio Programmable Voice or SIP trunk → WebRTC
  • STT: Deepgram / Google STT; TTS: ElevenLabs / Azure; VAD enabled
  • LLM: GPT-4o-mini / GPT-4.1-mini for latency-sensitive turns
  • Orchestration: n8n / Make for workflow ops; Node services for hot paths
  • Agent runtime: Retell.ai or custom broker with barge-in + endpointing
  • Data: Redis (state), Postgres (events), S3 (call recordings, transcripts)

Latency Budget (round-trip)

  • RTP jitter + media pipeline: 50–120ms
  • STT partials: 150–350ms; endpointing 300–600ms (tune aggressively)
  • LLM generation: 250–800ms (streamed), keep tokens small
  • TTS synthesis: 120–350ms; cache recurring utterances
  • Total target: 1.2–1.6s or less, end to end, for natural turn-taking
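
As a rough check on the figures above, the per-stage ranges can be added up and compared with the target. This is a minimal sketch that assumes the stages run strictly one after another, which is a simplification (streaming STT, LLM and TTS overlap in practice); the names `budgetMs`, `sumBudget` and `fitsTarget` are illustrative only.

```javascript
// Per-stage latency ranges in milliseconds, copied from the list above: [best, worst].
const budgetMs = {
  mediaPipeline: [50, 120],
  sttPartials: [150, 350],
  endpointing: [300, 600],
  llmGeneration: [250, 800],
  ttsSynthesis: [120, 350],
};

// Naive sequential sum of all stages (assumption: no overlap between stages).
function sumBudget(stages) {
  return Object.values(stages).reduce(
    ([best, worst], [lo, hi]) => [best + lo, worst + hi],
    [0, 0]
  );
}

// True when the given total fits inside the target ceiling.
function fitsTarget(totalMs, targetMs) {
  return totalMs <= targetMs;
}

const [bestCase, worstCase] = sumBudget(budgetMs);
console.log(bestCase, worstCase);          // 870 2220
console.log(fitsTarget(bestCase, 1600));   // true
console.log(fitsTarget(worstCase, 1600));  // false
```

Under this sequential assumption the best case fits the target but the worst case does not, which is one way to see why streaming every stage and tuning endpointing matter.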

Recommended VAD/Endpointing

  • Min speech: 180–250ms; max pause: 450–650ms; noise gate enabled
  • Barriers: hold TTS playback until the end of the user's turn is detected with confidence
  • Barge-in: allow user interruption; crossfade TTS and resume STT
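
The minimum-speech and maximum-pause settings above can be sketched as a small frame-by-frame state machine. This assumes an upstream VAD that labels each fixed-length audio frame as speech or not; `createEndpointer` and its option names are hypothetical and not taken from any particular SDK.

```javascript
// Minimal endpointing sketch. Feed it one boolean per audio frame
// (true = VAD says speech). It returns 'turn_start', 'turn_end' or null.
function createEndpointer({ minSpeechMs = 200, maxPauseMs = 500, frameMs = 20 } = {}) {
  let speechMs = 0;   // continuous speech seen before a turn is confirmed
  let silenceMs = 0;  // continuous silence seen inside a turn
  let inTurn = false;

  return function push(isSpeech) {
    if (isSpeech) {
      speechMs += frameMs;
      silenceMs = 0;
      if (!inTurn && speechMs >= minSpeechMs) {
        inTurn = true;
        return 'turn_start'; // during TTS playback this is the barge-in signal
      }
      return null;
    }
    silenceMs += frameMs;
    if (!inTurn) {
      speechMs = 0; // burst shorter than minSpeechMs: treat as noise
      return null;
    }
    if (silenceMs >= maxPauseMs) {
      inTurn = false;
      speechMs = 0;
      silenceMs = 0;
      return 'turn_end'; // safe point to hand the transcript to the LLM
    }
    return null;
  };
}

// Helper for experiments: feed `count` identical frames, collect events.
function feed(push, isSpeech, count) {
  const events = [];
  for (let i = 0; i < count; i++) {
    const event = push(isSpeech);
    if (event) events.push(event);
  }
  return events;
}
```

With the defaults, 100ms of speech followed by silence produces no event, 200ms of speech produces `turn_start`, and 500ms of continuous silence after that produces `turn_end`. A shorter pause followed by more speech keeps the turn open.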

Call Flow (JSON)

{
  "states": {
    "greeting": { "say": "Hi, this is Nova from Quantum. Is this John?", "on": { "yes": "qualify", "no": "wrong_party", "no_speech": "reprompt" } },
    "qualify": { "ask": "Are you currently evaluating AI for appointment setting?", "capture": ["intent", "timeline"], "on": { "positive": "book", "negative": "objection", "unclear": "clarify" } },
    "book": { "action": "cal.com:create_booking", "on": { "success": "confirm", "failure": "handoff" } },
    "objection": { "say": "Understood. Would a quick demo help decide?", "on": { "yes": "book", "no": "opt_out" } },
    "handoff": { "action": "transfer_to_human" }
  }
}
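
The flow above routes to several states that the excerpt does not define. A small check like the following can catch that before deployment. It is a sketch that assumes the `states` / `on` shape shown above; `findDanglingTransitions` and `nextState` are illustrative names, and the `flow` object is a trimmed copy that keeps only the transitions.

```javascript
// Trimmed copy of the call flow above: only the transition maps are kept.
const flow = {
  states: {
    greeting: { on: { yes: 'qualify', no: 'wrong_party', no_speech: 'reprompt' } },
    qualify: { on: { positive: 'book', negative: 'objection', unclear: 'clarify' } },
    book: { on: { success: 'confirm', failure: 'handoff' } },
    objection: { on: { yes: 'book', no: 'opt_out' } },
    handoff: {},
  },
};

// Returns the sorted names of transition targets that have no state definition.
function findDanglingTransitions(flowDef) {
  const defined = new Set(Object.keys(flowDef.states));
  const missing = new Set();
  for (const state of Object.values(flowDef.states)) {
    for (const target of Object.values(state.on ?? {})) {
      if (!defined.has(target)) missing.add(target);
    }
  }
  return [...missing].sort();
}

// Looks up the next state for an event, or null when no transition exists.
function nextState(flowDef, current, event) {
  return flowDef.states[current]?.on?.[event] ?? null;
}

console.log(findDanglingTransitions(flow));
// [ 'clarify', 'confirm', 'opt_out', 'reprompt', 'wrong_party' ]
console.log(nextState(flow, 'greeting', 'yes')); // qualify
```

Running a check of this kind in CI keeps the state machine deterministic: every event either lands on a defined state or is rejected explicitly.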

Webhooks and Functions

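// Example: booking webhook -> HubSpot create engagement
// Assumes `app` is an Express app with express.json() enabled, and that
// `hubspot` and `db` are your own thin wrappers (not shown here).
app.post('/webhooks/booking', async (req, res) => {
  const { attendee, start, end, uid } = req.body ?? {};
  // Reject malformed payloads early instead of failing inside the CRM call.
  if (!attendee?.email || !start || !end || !uid) return res.sendStatus(400);
  try {
    await hubspot.createMeeting({ subject: 'Voice booking', start, end, attendees: [attendee.email] });
    await db.insert('bookings', { uid, attendee_email: attendee.email, start, end });
    res.sendStatus(200);
  } catch (err) {
    console.error('booking webhook failed', err);
    res.sendStatus(500); // non-2xx so the sender can retry, if it supports retries
  }
});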
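
Because the handler stores `start` and `end` exactly as received, one way to guard against the timezone mismatches this guide warns about is to normalise timestamps to UTC before storing them and convert only when speaking or displaying a time. The sketch below assumes the booking provider sends ISO 8601 strings with an explicit offset; `toUtcIso` and `localTime` are hypothetical helpers.

```javascript
// Accept only ISO 8601 timestamps that carry an explicit offset or 'Z',
// and return the same instant as a UTC ISO string.
function toUtcIso(value) {
  if (!/(?:Z|[+-]\d{2}:\d{2})$/.test(value)) {
    throw new Error(`timestamp lacks a UTC offset: ${value}`);
  }
  const date = new Date(value);
  if (Number.isNaN(date.getTime())) {
    throw new Error(`invalid timestamp: ${value}`);
  }
  return date.toISOString();
}

// Render a stored UTC instant as HH:mm in the caller's timezone.
function localTime(utcIso, timeZone) {
  const parts = new Intl.DateTimeFormat('en-GB', {
    timeZone,
    hour: '2-digit',
    minute: '2-digit',
    hourCycle: 'h23',
  }).formatToParts(new Date(utcIso));
  const get = (type) => parts.find((p) => p.type === type).value;
  return `${get('hour')}:${get('minute')}`;
}

const stored = toUtcIso('2026-07-01T10:00:00+01:00');
console.log(stored);                              // 2026-07-01T09:00:00.000Z
console.log(localTime(stored, 'Europe/London'));  // 10:00
```

Rejecting offset-less timestamps is a deliberate choice in this sketch: a string like `2026-07-01T10:00:00` would otherwise be interpreted in the server's local timezone, which is a likely source of the mismatch.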

Good / Bad / Ugly

  • Good: Fast barge-in, tight endpointing, short utterances, deterministic tool calls.
  • Bad: Long-form LLM replies, no caching, unbounded function schemas.
  • Ugly: Voicemail false positives, DTMF misfires, timezone mismatches in bookings.
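
On the caching point, recurring utterances such as greetings and confirmations can be synthesised once and reused. The following is a minimal in-memory sketch; `createTtsCache` is a hypothetical helper, and `synthesize` stands in for whatever TTS client call you use (assumed to take text and a voice and return audio or a promise of it).

```javascript
// Wraps a TTS function with a small least-recently-used cache.
// Promises are cached, so concurrent requests for the same text share one call.
function createTtsCache(synthesize, maxEntries = 200) {
  const cache = new Map(); // Map keeps insertion order, oldest key first

  return function speak(text, voice = 'default') {
    // Normalise whitespace so trivially different strings share an entry.
    const key = `${voice}|${text.trim().replace(/\s+/g, ' ')}`;

    if (cache.has(key)) {
      const hit = cache.get(key);
      cache.delete(key);
      cache.set(key, hit); // move to the most-recent position
      return hit;
    }

    const pending = Promise.resolve(synthesize(text, voice));
    cache.set(key, pending);
    // Do not keep failed syntheses in the cache.
    pending.catch(() => {
      if (cache.get(key) === pending) cache.delete(key);
    });
    // Evict the least recently used entry when over capacity.
    if (cache.size > maxEntries) cache.delete(cache.keys().next().value);
    return pending;
  };
}
```

A call such as `const speak = createTtsCache(myTtsClientCall)` followed by `await speak('Thanks, you are booked in.')` would reach the TTS provider only on the first request for that text and voice. In a multi-instance deployment the same idea would need a shared store rather than process memory.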

© 2026 Quantum Automations Group Ltd