Quantum Automations
Comparison · Voice AI

Twilio vs Retell vs VAPI: Voice Agent Platform Comparison

Published June 2026
Topic Voice AI · Platform Comparison
Reading time 9 min
For UK SME ops leads evaluating voice agent infrastructure
On this page
  1. What each platform actually is (and what it isn't)
  2. Latency benchmarks: STT, TTS, and round-trip from UK datacentres
  3. Pricing at 1k, 10k, and 100k minutes per month
  4. Call-flow customisation: how much you can control
  5. Telephony integration: SIP trunking, number porting, UK carrier support
  6. Observability and logging out of the box
  7. Migration cost: what happens when you outgrow the platform
  8. Our recommendation by use case
  9. What changed in 2025–2026: LLM provider integration and native function calling
  10. Good / Bad / Ugly
  11. FAQ

We migrated a client from Retell to a custom Twilio stack at week 6 of a financial services deployment. The project was in the same client category as our voice AI and document analysis case study, where FCA data-handling requirements shaped every infrastructure decision. Not because Retell did anything wrong: it was fast, easy to configure, and the call quality was fine. We migrated because the client needed call recordings routed to a UK-region S3 bucket with specific metadata headers for FCA compliance, and Retell's webhook architecture couldn't deliver the required payload structure without custom middleware that would have cost more to maintain than rebuilding on Twilio directly.

The migration took nine days and cost more than choosing the right platform at the start would have.

That's the decision this post is about. Not which platform is best, but which platform is best for your specific use case — and what the migration cost looks like if you get it wrong.

What each platform actually is (and what it isn't)

Twilio is a communications API, not a voice agent platform. It provides programmable telephony — SIP trunking, media streams, call routing — and you build the intelligence layer on top. A Twilio voice agent means writing Node.js or Python that handles webhooks, streams audio to/from your STT and TTS providers, manages state, and assembles the turn-by-turn logic yourself. Maximum control. Significant engineering investment.
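To make "you build the intelligence layer" concrete, here is a minimal sketch of the message-handling core of a Twilio Media Streams WebSocket server. It assumes a bidirectional stream (opened with TwiML `<Connect><Stream>`) that delivers JSON frames with `start`, `media` and `stop` events; the `StreamSession` class and function names are hypothetical, and the WebSocket server itself and the STT/TTS calls are left out.

```python
import base64
import json
from dataclasses import dataclass, field


@dataclass
class StreamSession:
    """Per-call state for one Twilio Media Stream (hypothetical structure)."""

    stream_sid: str = ""
    call_sid: str = ""
    inbound_audio: bytearray = field(default_factory=bytearray)
    finished: bool = False


def handle_stream_message(session: StreamSession, raw: str) -> None:
    """Update session state from one JSON text frame received on the WebSocket."""
    msg = json.loads(raw)
    event = msg.get("event")
    if event == "start":
        session.stream_sid = msg["start"]["streamSid"]
        session.call_sid = msg["start"]["callSid"]
    elif event == "media":
        # Twilio sends base64-encoded 8 kHz mu-law audio. A real stack would
        # forward these bytes to the STT provider instead of buffering them.
        session.inbound_audio.extend(base64.b64decode(msg["media"]["payload"]))
    elif event == "stop":
        session.finished = True
    # Other events (e.g. "connected") are ignored in this sketch.


def outbound_media_frame(session: StreamSession, mulaw_audio: bytes) -> str:
    """Build the JSON frame that plays synthesised mu-law audio back into the call."""
    return json.dumps(
        {
            "event": "media",
            "streamSid": session.stream_sid,
            "media": {"payload": base64.b64encode(mulaw_audio).decode("ascii")},
        }
    )
```

Everything a managed platform does for you (endpointing, barge-in, turn-taking) sits between these two functions and is yours to write.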

Retell is a managed voice agent platform. You define call flows in a JSON-based visual editor, connect your LLM and TTS providers (or use Retell's defaults), and Retell handles the orchestration layer — endpointing, barge-in, transfer logic, turn management. You're configuring, not building. It ships fast. It abstracts away the parts that are annoying to build but constrains what you can customise.
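Endpointing (deciding when the caller has finished speaking) is a good example of the "annoying to build" layer. As an illustration only, here is a naive energy-threshold endpointer; the thresholds are assumptions, and this is not a description of how Retell implements it. Production systems typically use VAD models and smarter turn detection.

```python
from typing import Iterable, Optional


def detect_end_of_turn(
    frame_energies: Iterable[float],
    frame_ms: int = 20,
    speech_threshold: float = 0.02,
    silence_ms: int = 600,
) -> Optional[int]:
    """Return the index of the frame at which the caller's turn is judged over.

    A turn ends once speech has been heard and then `silence_ms` of consecutive
    frames fall below `speech_threshold`. Returns None if the turn hasn't ended.
    """
    heard_speech = False
    silent_run_ms = 0
    for i, energy in enumerate(frame_energies):
        if energy >= speech_threshold:
            heard_speech = True
            silent_run_ms = 0  # any speech resets the silence timer
        elif heard_speech:
            silent_run_ms += frame_ms
            if silent_run_ms >= silence_ms:
                return i
    return None
```

The trade-off lives in `silence_ms`: shorten it and the agent responds sooner, but it starts talking over callers who pause mid-sentence.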

VAPI (Voice API) is positioned similarly to Retell — managed orchestration with LLM + TTS pluggability — but with a developer-first API approach rather than a visual editor. VAPI's strength is programmable call flows via API calls at runtime, which makes it natural for dynamic call flows that change based on CRM data or real-time conditions. More code than Retell, less code than raw Twilio.

None of these is an AI voice agent out of the box. All three require you to bring (or configure) an LLM, a TTS provider, and an STT provider.

Latency benchmarks: STT, TTS, and round-trip from UK datacentres

We measured end-to-end latency on a standard qualification call flow from a UK-region origin using Deepgram Nova-3 for STT and ElevenLabs Turbo v2.5 for TTS, with GPT-4o-mini as the LLM.

Platform | Median turn latency | P95 turn latency | Notes
Twilio (custom) | 780ms | 1,180ms | Direct media routing, UK-region webhook
Retell | 870ms | 1,320ms | Platform overhead ~90ms vs Twilio
VAPI | 850ms | 1,290ms | Similar to Retell; slightly lower P95

These numbers come from production call recordings, not synthetic tests. The Twilio advantage comes from eliminating one hop in the media routing: Retell and VAPI add a platform processing step between Twilio's media stream and your STT/LLM/TTS pipeline. On most calls the difference is imperceptible. On high-load deployments with many concurrent sessions, the P95 gap (1,180ms for Twilio vs 1,290–1,320ms for VAPI and Retell) is where Retell and VAPI occasionally produce the brief silence that makes callers think the line has dropped.

Counterpoint: Retell's own benchmarks show lower figures measured from US West Coast. Latency comparisons are only meaningful when measured from your deployment region to the caller's carrier — always run your own benchmark before committing.
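When you run that benchmark, compute median and P95 per platform from the per-turn latencies you log rather than comparing averages. A minimal sketch, assuming you already have a list of per-turn latencies in milliseconds for each platform. It uses the nearest-rank definition of P95; interpolating definitions (such as NumPy's default) give slightly different numbers.

```python
from __future__ import annotations

from statistics import median


def p95(samples_ms: list[float]) -> float:
    """Nearest-rank 95th percentile: smallest sample with >= 95% of samples at or below it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def summarise(platform_samples: dict[str, list[float]]) -> dict[str, tuple[float, float]]:
    """Map each platform name to (median, P95) turn latency in milliseconds."""
    return {name: (median(s), p95(s)) for name, s in platform_samples.items()}
```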

Pricing at 1k, 10k, and 100k minutes per month

Pricing models differ enough to make naive comparison misleading. All prices as of June 2026, excluding LLM and TTS costs.

Volume (mins/mo) | Twilio | Retell | VAPI
1,000 | £160–220 | £140–190 | £120–160
10,000 | £1,100–1,400 | £900–1,200 | £800–1,100
100,000 | £7,500–9,000 | £5,500–7,000 | £4,800–6,500

Twilio costs: Programmable Voice at £0.013/minute (inbound) to £0.016/minute (outbound UK), plus media stream costs if you're streaming audio to external STT. Engineering time is the hidden cost — a Twilio stack requires ongoing maintenance.

Retell costs: Platform fee per minute (currently around £0.08/minute for their orchestration layer) plus your own Twilio trunk costs if you're providing your own numbers. Their base plan includes up to 1,000 minutes free, then scales. Volume discounts available above 50k minutes.

VAPI costs: £0.05/minute platform fee, plus Twilio or Vonage trunk. Marginally cheaper than Retell at volume, similar structure. Enterprise pricing available.

Add LLM costs (GPT-4o-mini is approximately £0.003–0.007 per call turn) and TTS costs (ElevenLabs Turbo at £0.024 per 1k characters generated, approximately £0.05–0.15 per call) to all three platforms. These are the same for all platforms assuming identical model choices.
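Platform and telephony fees scale with minutes, while LLM and TTS costs scale with calls and turns, so it helps to model them separately. A back-of-envelope sketch: the rates come from the paragraphs above, but the call count, turns per call and characters per call are hypothetical inputs. It ignores fixed fees, free-minute allowances, volume discounts and engineering time, so it will not reproduce the table exactly.

```python
from dataclasses import dataclass


@dataclass
class CostInputs:
    minutes: int               # billable call minutes per month
    calls: int                 # calls per month
    platform_per_min: float    # orchestration fee, GBP/min (0 for raw Twilio)
    telephony_per_min: float   # Programmable Voice / trunk rate, GBP/min
    turns_per_call: float      # average LLM turns per call
    llm_per_turn: float        # GBP per LLM turn
    tts_chars_per_call: float  # characters synthesised per call
    tts_per_1k_chars: float    # GBP per 1,000 characters


def monthly_cost(c: CostInputs) -> float:
    """Usage-based monthly cost in GBP; excludes engineering time and fixed fees."""
    per_minute = c.minutes * (c.platform_per_min + c.telephony_per_min)
    llm = c.calls * c.turns_per_call * c.llm_per_turn
    tts = c.calls * c.tts_chars_per_call / 1000 * c.tts_per_1k_chars
    return round(per_minute + llm + tts, 2)


# Hypothetical VAPI-style month: 10k minutes across 2,500 four-minute calls.
example = CostInputs(
    minutes=10_000, calls=2_500, platform_per_min=0.05, telephony_per_min=0.013,
    turns_per_call=12, llm_per_turn=0.005, tts_chars_per_call=4_000, tts_per_1k_chars=0.024,
)
```

With these inputs `monthly_cost(example)` comes to about £1,020: £630 of platform and telephony fees, £150 of LLM turns and £240 of TTS. The TTS share works out to roughly £0.10 per call, inside the £0.05–0.15 range quoted above.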

At 1,000 minutes/month, the platform-level difference is small: £20–70. At 100,000 minutes, the gap between Twilio and VAPI reaches £20k+/year once you add the engineering cost of maintaining a custom stack.

Call-flow customisation: how much you can control

This is where the platforms diverge most sharply.

Twilio: Total control. Audio routing, state machine, turn logic, mid-call branching, conditional transfer, silence detection, DTMF handling, multi-party conferencing — all programmable. If you can describe the logic, you can build it. The cost is that you describe it in code. See our call-flow design guide for how production call flows map to Twilio's state machine model, and our voice AI architecture reference for the full stack.
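In practice, "describing it in code" usually means an explicit state machine at the heart of the agent. A minimal sketch with hypothetical states and event names: in a real stack the events would come from the LLM's output, DTMF input and silence detection, and each state would drive prompts, tool calls or TwiML.

```python
from enum import Enum, auto


class State(Enum):
    GREETING = auto()
    QUALIFY = auto()
    BOOK = auto()
    TRANSFER = auto()
    END = auto()


# (current state, event) -> next state. Event names are hypothetical labels
# produced by your own intent classifier, DTMF handler or silence timer.
TRANSITIONS = {
    (State.GREETING, "caller_spoke"): State.QUALIFY,
    (State.QUALIFY, "qualified"): State.BOOK,
    (State.QUALIFY, "not_qualified"): State.END,
    (State.QUALIFY, "dtmf_0"): State.TRANSFER,  # "press 0 for a person"
    (State.BOOK, "slot_confirmed"): State.END,
    (State.BOOK, "dtmf_0"): State.TRANSFER,
}


def next_state(current: State, event: str) -> State:
    """Advance the call: a silence timeout ends it, unknown events keep the current state."""
    if event == "silence_timeout":
        return State.END
    return TRANSITIONS.get((current, event), current)
```

Keeping transitions as data rather than scattering them across webhook handlers is also what makes a later migration to a managed platform tractable.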

Retell: Visual flow editor with conditional branching, function call integration, transfer nodes, and webhook steps. Good for 80% of qualification and booking use cases. Where it breaks down: complex multi-party scenarios, dynamic flows that must change at runtime based on external data, custom audio injection (e.g. playing pre-recorded legal disclosure audio mid-call), and fine-grained answering machine detection (AMD) tuning beyond what Retell's configuration exposes.
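For comparison, the legal-disclosure case on Twilio is a few lines of TwiML. A sketch, assuming a bidirectional Media Streams setup: you would push this TwiML to the live call (for example via the `Twiml` parameter when updating the Call resource through Twilio's REST API). That interrupts the current stream, plays the recording, then opens a new stream. Both URLs are placeholders.

```python
import xml.etree.ElementTree as ET


def disclosure_then_resume_twiml(disclosure_url: str, stream_url: str) -> str:
    """TwiML that plays a pre-recorded disclosure, then reconnects the media stream."""
    response = ET.Element("Response")
    ET.SubElement(response, "Play").text = disclosure_url  # hosted audio file
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", url=stream_url)  # back to your WebSocket server
    return ET.tostring(response, encoding="unicode")


twiml = disclosure_then_resume_twiml(
    "https://example.com/audio/fca-disclosure.mp3", "wss://example.com/media"
)
```

Because the reconnected stream arrives as a new `start` event with a new stream SID, conversation state should be keyed by the call SID rather than the stream SID.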

VAPI: JSON-defined call flows created via API, which means flows can be generated programmatically before each call based on CRM data. This is VAPI's strongest differentiator over Retell — you can create a unique call flow per prospect based on their ICP score, prior conversation history, or product interest. The API surface is well-documented but requires more engineering than Retell's visual editor.
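A sketch of per-prospect flow generation, assuming your CRM exposes an ICP score, a product interest and a last-call summary. The field names and the returned dictionary are hypothetical, not VAPI's actual schema. In practice you would map them onto the platform's assistant configuration and send it with the call-creation request.

```python
import json


def build_call_config(prospect: dict) -> dict:
    """Assemble a per-call agent configuration from CRM fields (illustrative keys only)."""
    score = prospect.get("icp_score", 0)
    if score >= 70:
        goal = "book a demo this week"
    elif score >= 40:
        goal = "qualify budget and timeline"
    else:
        goal = "confirm interest and offer a follow-up email"

    history = prospect.get("last_call_summary")
    system_prompt = (
        f"You are calling {prospect['first_name']} about {prospect['product_interest']}. "
        f"Goal: {goal}."
        + (f" Previous call: {history}" if history else "")
    )
    return {
        "first_message": f"Hi {prospect['first_name']}, it's Alex from Acme.",  # placeholder persona
        "system_prompt": system_prompt,
        # High-intent prospects get calendar and transfer tools; others only email follow-up.
        "tools": ["check_calendar", "transfer_to_sales"] if score >= 70 else ["send_email"],
        "metadata": {"crm_id": prospect["id"], "icp_score": score},
    }


if __name__ == "__main__":
    crm_row = {"id": "c-101", "first_name": "Sam", "product_interest": "payroll", "icp_score": 82}
    print(json.dumps(build_call_config(crm_row), indent=2))
```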

A rule of thumb for when to choose each:

Use Twilio if: FCA compliance requires UK data residency, or your flow has
               multi-party conferencing, or you need DTMF-based IVR in the flow.

Use Retell if: You need to ship a qualification or booking agent in 2 weeks
               with minimal engineering, and your flow is linear.

Use VAPI if:   Your flows are dynamic (personalised per prospect based on CRM
               data), or you have a developer who prefers API-first over GUI.
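The rules above can be written as a function so that their precedence is explicit: the hard constraints that force Twilio are checked before the preferences that favour VAPI, and Retell is the default for linear flows. A sketch only; the field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    uk_data_residency: bool = False         # e.g. driven by FCA obligations
    multi_party_conferencing: bool = False
    dtmf_ivr: bool = False
    dynamic_per_prospect_flows: bool = False
    prefers_api_first: bool = False


def choose_platform(r: Requirements) -> str:
    """Apply the rules of thumb; Twilio's hard constraints win over VAPI's preferences."""
    if r.uk_data_residency or r.multi_party_conferencing or r.dtmf_ivr:
        return "Twilio"
    if r.dynamic_per_prospect_flows or r.prefers_api_first:
        return "VAPI"
    return "Retell"  # linear qualification/booking flow, minimal engineering
```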

Telephony integration: SIP trunking, number porting, UK carrier support

All three platforms ultimately carry UK calls over Twilio or an equivalent carrier API (Vonage/Bandwidth). Retell and VAPI use Twilio trunks underneath by default.

For UK number porting: Twilio supports porting UK geographic (01/02) and non-geographic (03/08) numbers. Processing time is 4–8 weeks via Twilio's porting process. Retell supports porting by proxy (you port to Twilio, then connect to Retell). VAPI is the same.

For UK regulatory compliance, all outbound calls to UK numbers must display a UK CLI (Calling Line Identity) that resolves to your registered business. Twilio handles this natively if you provision UK numbers through their platform. Retell and VAPI pass the CLI through from your trunk — configure it on the Twilio side, it passes through correctly.
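Twilio expects phone numbers, including the caller ID you present, in E.164 format, so a common first step is normalising the UK numbers held in your CRM. A minimal sketch: it only checks shape and the 01/02 and 03/08 ranges mentioned above. It does not check whether a number is provisioned on your account or registered to your business.

```python
import re


def uk_to_e164(number: str) -> str:
    """Normalise "020 7946 0000", "+44 (0)20 7946 0000" or "0044..." to "+44..."."""
    digits = re.sub(r"[^\d+]", "", number)  # drop spaces, brackets, dashes
    if digits.startswith("+44"):
        national = digits[3:]
    elif digits.startswith("0044"):
        national = digits[4:]
    elif digits.startswith("0"):
        national = digits[1:]
    else:
        raise ValueError(f"not a recognisable UK number: {number!r}")
    national = national.lstrip("0")  # handles the "+44 (0)20" style trunk prefix
    if not national.isdigit() or not 9 <= len(national) <= 10:
        raise ValueError(f"unexpected UK number length: {number!r}")
    return "+44" + national


def uk_number_type(e164: str) -> str:
    """Classify by the first national digit, using the ranges discussed above."""
    first = e164[3]
    if first in "12":
        return "geographic"
    if first in "38":
        return "non-geographic"
    return "other"
```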

Observability and logging out of the box

Capability | Twilio | Retell | VAPI
Call recordings (audio) | Yes, S3/blob configurable | Yes, hosted by Retell | Yes, hosted by VAPI
Per-turn latency logging | Custom (build it) | Dashboard + webhook | Dashboard + webhook
LLM turn transcripts | Custom | Built-in | Built-in
Call outcome webhooks | Full TwiML webhook | Webhook events | Webhook events
SIPREC (third-party recording) | Yes | No | No
Real-time monitoring dashboard | Custom | Yes | Yes

Retell and VAPI win on out-of-the-box observability. A Twilio stack requires building your own logging pipeline — not difficult, but the first time you're debugging a latency regression at 11pm on a Tuesday you'll wish it was already built.
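A minimal starting point for that pipeline, assuming you can hook the moments when the caller stops speaking, the STT final transcript arrives, the LLM emits its first token and the first TTS audio goes out. The milestone names are hypothetical. Emitting one JSON line per turn makes it easy to ship to whatever log store you already use and to compute P95 per stage.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TurnTimer:
    """Collects pipeline milestones for one conversational turn."""

    call_sid: str
    turn: int
    clock: Callable[[], float] = time.monotonic  # injectable for tests
    marks: dict = field(default_factory=dict)

    def mark(self, milestone: str) -> None:
        self.marks[milestone] = self.clock()

    def to_log_line(self) -> str:
        """One JSON line per turn; each stage is measured from the end of caller speech."""
        start = self.marks["caller_stopped"]
        stages = {
            name: round((t - start) * 1000)
            for name, t in self.marks.items()
            if name != "caller_stopped"
        }
        return json.dumps(
            {"call_sid": self.call_sid, "turn": self.turn, "ms_after_caller_stopped": stages}
        )
```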

The SIPREC row matters for regulated industries: banks, FCA-authorised firms, and some insurance deployments require SIPREC for call recording to a regulated recording system. Only Twilio supports it natively.

Twilio's current UK programmable voice pricing is documented on Twilio's pricing page — note that outbound rates to UK mobiles differ from UK landlines, and international calls carry a separate rate schedule that matters if your prospect list spans Europe.

Migration cost: what happens when you outgrow the platform

Retell → VAPI: Moderate. Both use webhook-based function calling and JSON-defined flows. The main effort is remapping flow logic from Retell's visual config to VAPI's API schema. Expect 1–2 weeks for a moderately complex deployment.

VAPI → Retell: Similar. Flow remapping is the main task.

Retell or VAPI → Twilio: Significant. You're moving from a managed platform to bespoke code. Every flow state, every transfer logic, every barge-in setting must be re-implemented. Expect 3–6 weeks plus QA. The benefit is total control; the cost is real.

Twilio → Retell or VAPI: Moderate, depending on how tightly your TwiML/SDK code is structured. Well-structured code with a clear state machine maps reasonably to Retell's flow editor. Spaghetti TwiML is a rewrite regardless.

Our recommendation by use case

Use case | Recommendation | Why
Fast MVP, booking/qualification agent | Retell | Ship in a week; handles 80% of use cases
Dynamic flows, developer-led team | VAPI | API-first; flows generated per prospect
FCA-regulated, data residency required | Twilio | Data sovereignty; SIPREC; full control
Scale >50k mins/month, cost-sensitive | VAPI | Cheapest managed option at volume
Complex multi-party, custom audio | Twilio | No managed platform can match custom code

The practical answer for most UK SMEs entering the market: start with Retell, ship fast, learn what the platform can and can't do for your specific use case, and migrate only if you hit a genuine constraint that Retell can't solve. The migration cost is real, but it's a problem you solve after you've proved the use case, not a reason to start with Twilio's complexity.

What changed in 2025–2026: LLM provider integration and native function calling

All three platforms have added native function calling support in the past 12 months, so you no longer need a custom middleware layer to call a CRM API from within a live call. Retell's function nodes and VAPI's tool calls both support synchronous webhook-based function execution within the LLM turn cycle.
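Because these functions run synchronously inside the LLM turn, their run time is added directly to the caller's wait. A sketch of the receiving side, assuming a webhook that delivers a function name and JSON arguments. The payload shape, the `lookup_contact` tool and the 1.5-second budget are all hypothetical. The handler dispatches to a registered function and falls back to a speakable error if that function overruns.

```python
from __future__ import annotations

import concurrent.futures
from typing import Callable

# Hypothetical registry: tool name -> handler that takes the LLM's JSON arguments.
TOOLS: dict[str, Callable[[dict], dict]] = {}
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def tool(name: str) -> Callable:
    """Decorator that registers a handler under the name the LLM will call."""
    def register(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        TOOLS[name] = fn
        return fn
    return register


@tool("lookup_contact")
def lookup_contact(args: dict) -> dict:
    # Placeholder for a real CRM lookup (e.g. a HubSpot contact search).
    return {"found": True, "email": args["email"], "lifecycle_stage": "lead"}


def handle_tool_call(payload: dict, budget_s: float = 1.5) -> dict:
    """Run one tool call synchronously inside the turn, within a latency budget."""
    handler = TOOLS.get(payload.get("name", ""))
    if handler is None:
        return {"error": f"unknown tool: {payload.get('name')}"}
    future = _pool.submit(handler, payload.get("arguments", {}))
    try:
        return {"result": future.result(timeout=budget_s)}
    except concurrent.futures.TimeoutError:
        # The handler keeps running in the background; its result is discarded.
        # Return something the LLM can say instead of leaving the caller in silence.
        return {"error": "lookup timed out; tell the caller you will follow up"}
    except Exception as exc:  # surface handler failures to the LLM as text
        return {"error": str(exc)}
```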

VAPI added custom LLM support in late 2025, meaning you can bring any OpenAI-compatible API as your LLM provider — useful for UK teams that want to self-host Llama 3 or use Anthropic via their native API rather than through OpenAI compatibility. Retell followed with a similar feature in early 2026.
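"OpenAI-compatible" in practice means the endpoint accepts the Chat Completions request shape. A sketch of the request a self-hosted server (for example, Llama 3 behind an OpenAI-compatible inference server) would need to handle. The base URL, key and model name are placeholders. The exact path a voice platform appends to your base URL, and any extra fields it sends, vary, so check its documentation.

```python
from __future__ import annotations

import json
import urllib.request


def chat_completion_request(
    base_url: str, api_key: str, model: str, messages: list[dict]
) -> urllib.request.Request:
    """Build (but don't send) a streaming request in the OpenAI Chat Completions format."""
    body = {"model": model, "messages": messages, "stream": True}  # stream tokens for low latency
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is a matter of `urllib.request.urlopen(req)` and reading the server-sent events; a production integration would normally use an HTTP client with connection reuse.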

Twilio's AI stack (Twilio AI Assistants) is a separate product and is not comparable to these — it's a higher-level abstraction with less customisation, and we haven't deployed it in production for any client engagement.

Good / Bad / Ugly

Good: Retell for a 500-calls-per-day outbound qualification agent, deployed in 12 days. Pre-built transfer nodes, barge-in tuning, and function calling to HubSpot via webhook. QA scorecard review at day 14, first iteration in production by day 21.

Bad: Starting with a custom Twilio stack to "keep all options open" for a booking use case that Retell handles in two days of config. Six weeks of engineering to replicate what a managed platform delivers out of the box.

Ugly: Choosing Retell for a financial services deployment, discovering at week 4 that call recordings can't be routed to a UK-region bucket with the required metadata headers, then migrating to Twilio and rebuilding the integration in 9 days against a client deadline. We've done this once. Not again.

Pick the platform based on the constraints that will bind you — data residency, flow complexity, engineering capacity, and cost at your expected volume. Don't pick based on which demo looked best.

FAQ

Can we switch platforms later without rewriting everything?

Retell-to-VAPI migrations are relatively contained: both use similar JSON-based call-flow definitions and both support function calling via webhook. Moving from Twilio to either Retell or VAPI is a bigger rewrite, because your flow logic lives in code (Node/Python) rather than in a visual or JSON config. Budget 2–3 weeks for a migration off Twilio, including regression testing. The bigger cost is CRM integration rewiring, not the platform config itself.

Do Retell and VAPI support UK phone number provisioning?

Both support Twilio SIP trunking as an underlying carrier, which means UK number provisioning through Twilio's Phone Numbers API. Retell has a native number provisioning flow for UK numbers. VAPI requires you to bring your own Twilio or Vonage trunk. Neither has a direct relationship with a UK-native carrier — you're always one hop from Twilio for UK numbers.

What's the right platform if we're handling sensitive financial data on calls?

Twilio gives you the most control over data residency — you can configure webhook endpoints in UK/EU regions and avoid US data routing. Retell's infrastructure is primarily US-based with no formal UK data residency guarantee as of mid-2026. VAPI is similar. For FCA-regulated calls or any engagement involving personal financial data, Twilio's custom stack (or a self-hosted orchestration layer on top of Twilio's media streams) is the safe choice. Verify your data processing agreements against ICO requirements before deploying on any cloud platform.

Is VAPI production-ready for a UK SME with 500+ calls/day?

VAPI handles that volume without difficulty — we've seen UK clients run 1,000+ concurrent calls on VAPI during outbound campaigns. The practical limits at that scale are Twilio's rate limits on the underlying trunk (request a limit increase before you need it) and ElevenLabs or Deepgram API quotas if you're on a standard tier. VAPI's infrastructure itself isn't the bottleneck.
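On outbound campaigns, the trunk limit you usually hit first is how many new calls per second your Twilio account accepts. A minimal pacer sketch: the calls-per-second value is an assumption you would set to your actual account limit. In a multi-worker dialler you would share this state (e.g. in Redis) rather than keep it per process.

```python
import time
from typing import Callable


class CallPacer:
    """Spaces outbound call attempts so they never exceed `calls_per_second`."""

    def __init__(
        self,
        calls_per_second: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ):
        self.interval = 1.0 / calls_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_slot = clock()

    def wait_for_slot(self) -> None:
        """Block until the next dial is allowed, then reserve the following slot."""
        now = self.clock()
        if now < self.next_slot:
            self.sleep(self.next_slot - now)
            now = self.next_slot
        self.next_slot = now + self.interval
```

Usage: call `pacer.wait_for_slot()` immediately before each call-creation request.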

Related Reading

Voice AI Architecture: A 2025 Implementation Guide

A practical, production-grade blueprint for implementing AI voice agents: stack choices, latency budgets, call flows, …

TTS Caching for Voice Agents: Cutting Latency Below 200ms

How to cache TTS audio for voice agents: chunk strategies, cache keying, CDN vs Redis, and production trade-offs for UK…

Not sure which platform fits your use case? Let's look at the brief.

30-minute audit. We map your stack, your constraints, and where AI will pay back fastest.

© 2026 Quantum Automations Group Ltd