A fintech client's outbound campaign dropped from 38% answer rate to 14% over four days in March 2025. No script change. No timing change. No list change. The number had been flagged as "Spam Likely" by Hiya after hitting 340 calls on day two, more than twice the 150-call safe daily ceiling for a new number. By the time we found it, the number was burnt and the campaign calendar had to move two weeks. That two-week delay cost the client roughly 800 missed conversations. This post is the playbook we built afterwards.
How carrier spam-flagging actually works in the UK (and why it's silent)
Carriers do not phone you to say your number has been flagged. There is no webhook, no alert email, no dashboard notification from BT Wholesale or the mobile networks. The flag propagates silently through a distributed label network and the first signal you get is a quiet slide in answer rate that most ops teams attribute to list quality or time-of-day friction.
When a call leaves your SIP trunk, it carries a CLI (Calling Line Identification). The terminating carrier checks that CLI against its own internal analytics: call velocity, complaint history, call duration patterns, and whether the number appears on crowdsourced spam databases. If the score crosses a threshold, the carrier either suppresses the call or passes a label to the handset app — iPhone's Silence Unknown Callers, Android's Call Screen, or the OEM call-screening layer built into Samsung and Google Pixel devices.
Third-party analytics providers — Hiya, First Orion, YouMail — run separate label networks. They aggregate user reports, call pattern data, and business registrations, then surface labels via OS integrations: Hiya in Samsung's native dialler; First Orion in T-Mobile/EE's branded caller ID; YouMail in its own app and some B2B softphones.
The catch: the flag applies at the label network level, not the carrier level. A number can be clean on EE's network and flagged in Hiya's database simultaneously. Your answer rate blends both states, so the drop looks gradual rather than binary.
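To see why the drop looks gradual rather than binary, it helps to treat the observed answer rate as a weighted average across handset segments. The sketch below does that arithmetic. The 30% share of handsets showing a Hiya label and the 5% answer rate on labelled calls are illustrative assumptions, not measured figures, and `blendedAnswerRate` is a hypothetical helper.

```javascript
// Observed answer rate as a weighted average over handset segments.
// All shares and rates below are illustrative assumptions.
function blendedAnswerRate(segments) {
  const totalShare = segments.reduce((sum, s) => sum + s.share, 0);
  const weighted = segments.reduce((sum, s) => sum + s.share * s.answerRate, 0);
  return weighted / totalShare;
}

const beforeFlag = [
  { name: 'handsets showing a Hiya label', share: 0.3, answerRate: 0.38 },
  { name: 'all other handsets', share: 0.7, answerRate: 0.38 },
];

// Same number, now flagged in one label network only.
const afterFlag = [
  { name: 'handsets showing a Hiya label', share: 0.3, answerRate: 0.05 },
  { name: 'all other handsets', share: 0.7, answerRate: 0.38 },
];

console.log(blendedAnswerRate(beforeFlag).toFixed(3)); // 0.380
console.log(blendedAnswerRate(afterFlag).toFixed(3));  // 0.281
```

Under these assumptions a number that is effectively dead on one segment still shows a 28% blended answer rate, which is easy to mistake for a list-quality problem.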
Hiya, First Orion, and YouMail: the three networks that label your number
| Network | Handset reach (UK) | Business dispute path | Flag speed |
|---|---|---|---|
| Hiya | Samsung native dialler, Hiya app (~6M UK downloads) | hiyaphone.com/en/business — requires business verification | 24–72 hrs to flag; 5 business days to dispute |
| First Orion | T-Mobile/EE via CNAM; partner integrations | firstorion.com partner portal — business registration required | 48–96 hrs to flag; 7–14 days to dispute |
| YouMail | YouMail app; B2B softphone integrations | robocallindex.com — report submission only, no formal dispute | Hours to flag; no guaranteed removal timeline |
Hiya is the most consequential for UK voice campaigns because of its Samsung integration. On the flagship Galaxy S series — the dominant Android handset in the UK B2C market — Hiya's label appears inline on the incoming call screen before the user even picks up. A "Spam Risk" label at that moment is effectively a declined call.
First Orion matters for campaigns targeting the 18–35 demographic who are disproportionately on EE, which uses First Orion for its Nuisance Call Blocking feature.
YouMail has a smaller direct reach but its Robocall Index data is picked up by trade press and occasionally cited in ICO enforcement context, so persistent presence there can create reputational drag beyond the label itself.
Warm-up schedule for new UK numbers: calls per day, cadence, and duration targets
The principle behind warm-up is simple: new numbers with no behavioural history look identical to newly provisioned robocall infrastructure. The carrier analytics systems do not know you are a legitimate business until you prove it through call pattern behaviour over time.
This is the schedule we use for every new number before it enters live campaign rotation:
Week 1 (Days 1–7):
- Daily volume: 50–150 calls
- Min average connected duration: 45 seconds
- Call window: 09:00–17:30 UK local time only
- No more than 20 calls/hour
- Ensure ≥40% of calls result in a human answer (not voicemail)
Week 2 (Days 8–14):
- Daily volume: 150–250 calls
- Min average connected duration: 40 seconds
- Call window: 08:30–18:00 UK local time
- Max 35 calls/hour
Week 3 (Days 15–21):
- Daily volume: 250–400 calls
- Min average connected duration: 35 seconds
- Call window: 08:00–18:30 UK local time
- Max 50 calls/hour
Week 4+ (Steady state):
- Daily volume: up to 500 calls (higher only with verified business ID)
- Review answer rate weekly against 14-day rolling average
Two things to avoid in week one: back-to-back calls with zero gap — space them at least 45 seconds apart; and uniform call durations, which look synthetic to carrier analytics. If you are running a voice agent, make sure voicemail detection is calibrated so short-duration VM drops do not skew your duration average down.
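One way to enforce the schedule is to encode it as data and gate every dial attempt against it. The following is a minimal sketch, not our production dialler: `WARMUP_SCHEDULE`, `canPlaceCall`, and the shape of the `state` object are hypothetical names. The schedule above gives no hourly cap or call window for week 4+, so the sketch assumes the week-three values carry over.

```javascript
// Limits transcribed from the warm-up schedule. The last row (steady state)
// reuses the week-three hourly cap and window, which is an assumption.
const WARMUP_SCHEDULE = [
  { maxDay: 7, maxDaily: 150, maxHourly: 20, window: ['09:00', '17:30'], minGapSeconds: 45 },
  { maxDay: 14, maxDaily: 250, maxHourly: 35, window: ['08:30', '18:00'], minGapSeconds: 0 },
  { maxDay: 21, maxDaily: 400, maxHourly: 50, window: ['08:00', '18:30'], minGapSeconds: 0 },
  { maxDay: Infinity, maxDaily: 500, maxHourly: 50, window: ['08:00', '18:30'], minGapSeconds: 0 },
];

// state: { ageDays, callsToday, callsThisHour, secondsSinceLastCall }
// localTime: zero-padded 'HH:MM' string in UK local time.
function canPlaceCall(state, localTime) {
  const tier = WARMUP_SCHEDULE.find(t => state.ageDays <= t.maxDay);
  const [open, close] = tier.window;
  // Zero-padded 'HH:MM' strings compare correctly as plain strings.
  if (localTime < open || localTime >= close) {
    return { allowed: false, reason: 'outside call window' };
  }
  if (state.callsToday >= tier.maxDaily) {
    return { allowed: false, reason: 'daily cap reached' };
  }
  if (state.callsThisHour >= tier.maxHourly) {
    return { allowed: false, reason: 'hourly cap reached' };
  }
  if (state.secondsSinceLastCall < tier.minGapSeconds) {
    return { allowed: false, reason: 'gap too short' };
  }
  return { allowed: true, reason: 'ok' };
}

// A two-day-old number that has already placed 150 calls is refused.
console.log(canPlaceCall(
  { ageDays: 2, callsToday: 150, callsThisHour: 5, secondsSinceLastCall: 60 },
  '10:15'
)); // { allowed: false, reason: 'daily cap reached' }
```

The gate covers volume, pacing, and timing only. The duration and human-answer targets are outcomes you measure after the fact, so they belong in monitoring rather than in a pre-dial check.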
Number pool strategy for outbound campaigns above 200 calls per day
Single-number campaigns above 200 calls per day are operationally fragile. One flag event and your entire campaign capacity disappears. The right model is a pool with rotation logic and per-number health tracking.
For a 500 calls/day campaign, we provision six numbers: four active, one in warm-up, one in reserve. The rotation logic works like this:
// Twilio outbound number selector (simplified)
const pool = [
  { number: '+441234567001', dailyCalls: 0, answerRate7d: 0.34, status: 'active' },
  { number: '+441234567002', dailyCalls: 0, answerRate7d: 0.31, status: 'active' },
  { number: '+441234567003', dailyCalls: 0, answerRate7d: 0.29, status: 'active' },
  { number: '+441234567004', dailyCalls: 0, answerRate7d: 0.33, status: 'active' },
  { number: '+441234567005', dailyCalls: 0, answerRate7d: null, status: 'warmup' },
  { number: '+441234567006', dailyCalls: 0, answerRate7d: null, status: 'reserve' },
];

// Returns the active number with the fewest calls so far today.
// The caller is expected to increment dailyCalls after each dial.
function selectOutboundNumber(pool) {
  const active = pool
    .filter(n => n.status === 'active' && n.dailyCalls < 130)
    .sort((a, b) => a.dailyCalls - b.dailyCalls); // least-loaded first
  if (!active.length) throw new Error('Pool exhausted: check warm-up schedule');
  return active[0];
}
The 130-call ceiling per number is deliberately below the 150-call upper limit from week one of the warm-up schedule, and four active numbers at 130 calls each give 520 calls of daily capacity for a 500 calls/day campaign. Once a number is past 60 days, raise the ceiling to 175 but keep pool rotation active: spreading volume protects each number's health profile. This also guards against the second-order failure: when one number gets flagged and you move all volume to the remaining ones, the spike can cascade the flag to those numbers too. We have seen a campaign burn two numbers in a single week this way.
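The cascade risk is easy to quantify before it happens. The sketch below computes how much volume the remaining active numbers can safely carry after one is pulled, using the 130 and 175 ceilings from the paragraph above. The helper names and the `ageDays` field are hypothetical additions to the pool records.

```javascript
// Per-number daily ceiling: 130 by default, 175 once past 60 days old.
function dailyCeiling(ageDays) {
  return ageDays > 60 ? 175 : 130;
}

// Calls per day the active numbers can carry without any exceeding its ceiling.
function safeCapacity(pool) {
  return pool
    .filter(n => n.status === 'active')
    .reduce((sum, n) => sum + dailyCeiling(n.ageDays), 0);
}

// Quarantine one number and report what the rest of the pool can absorb.
function planAfterFlag(pool, flaggedNumber, targetDailyCalls) {
  const remaining = pool.map(n =>
    n.number === flaggedNumber ? { ...n, status: 'quarantine' } : n);
  const capacity = safeCapacity(remaining);
  return { capacity, shortfall: Math.max(0, targetDailyCalls - capacity) };
}

const examplePool = [
  { number: '+441234567001', ageDays: 45, status: 'active' },
  { number: '+441234567002', ageDays: 45, status: 'active' },
  { number: '+441234567003', ageDays: 45, status: 'active' },
  { number: '+441234567004', ageDays: 45, status: 'active' },
  { number: '+441234567005', ageDays: 10, status: 'warmup' },
  { number: '+441234567006', ageDays: 0, status: 'reserve' },
];

console.log(planAfterFlag(examplePool, '+441234567001', 500));
// { capacity: 390, shortfall: 110 }
```

A non-zero shortfall is the signal to promote the spare or trim the day's target. Pushing the three surviving numbers to roughly 167 calls each is exactly the spike that cascades the flag.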
Monitoring answer rates as a leading indicator of flag status
Answer rate is the earliest reliable signal you have. Carrier analytics take 24–72 hours to propagate a flag. If you are checking answer rates weekly, you will lose three or four days of campaign capacity before you catch it.
The monitoring setup we run is straightforward: a daily cron job that pulls Twilio call logs, calculates answer rate by number, and compares against a 14-day rolling average. Alert threshold is a 20% relative drop — so if a number holds 32% answer rate over 14 days and drops to 25% in a single day, that triggers a review.
{
  "monitoring": {
    "check_interval_hours": 24,
    "alert_threshold_relative_drop": 0.20,
    "minimum_daily_calls_for_signal": 30,
    "rolling_window_days": 14,
    "alert_channels": ["slack:#voice-ops", "email:[email protected]"],
    "auto_quarantine_threshold": 0.35
  }
}
The auto_quarantine_threshold of 0.35 relative drop means that if answer rate drops by 35% or more in a single day (e.g. from 32% to 20%), the number is automatically moved to status: quarantine in the pool and the warm-up number is promoted to active. No human intervention required at 2am.
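The decision logic behind those thresholds fits in a few lines. This is a minimal sketch assuming you already have today's call counts and the 14-day rolling answer rate per number. `classifyNumberHealth` and `quarantineAndPromote` are hypothetical helpers, and fetching the call logs is left out.

```javascript
// Thresholds mirror the monitoring config.
const healthConfig = {
  alertThresholdRelativeDrop: 0.20,
  autoQuarantineThreshold: 0.35,
  minimumDailyCallsForSignal: 30,
};

// todayStats: { calls, answered }. rollingAnswerRate: 14-day average, 0..1.
function classifyNumberHealth(todayStats, rollingAnswerRate, cfg = healthConfig) {
  // Too few calls, or no baseline yet: a single day tells us nothing.
  if (todayStats.calls < cfg.minimumDailyCallsForSignal || !rollingAnswerRate) {
    return { action: 'insufficient_data', relativeDrop: null };
  }
  const todayRate = todayStats.answered / todayStats.calls;
  const relativeDrop = (rollingAnswerRate - todayRate) / rollingAnswerRate;
  if (relativeDrop >= cfg.autoQuarantineThreshold) return { action: 'quarantine', relativeDrop };
  if (relativeDrop >= cfg.alertThresholdRelativeDrop) return { action: 'alert', relativeDrop };
  return { action: 'ok', relativeDrop };
}

// Returns a new pool with the flagged number quarantined and the first
// warm-up number (if any) promoted to active.
function quarantineAndPromote(pool, flaggedNumber) {
  let promoted = false;
  return pool.map(n => {
    if (n.number === flaggedNumber) return { ...n, status: 'quarantine' };
    if (!promoted && n.status === 'warmup') {
      promoted = true;
      return { ...n, status: 'active' };
    }
    return n;
  });
}

console.log(classifyNumberHealth({ calls: 100, answered: 25 }, 0.32).action); // alert
console.log(classifyNumberHealth({ calls: 100, answered: 20 }, 0.32).action); // quarantine
```

The two calls reproduce the examples in the text: 32% to 25% is a relative drop of about 22% and triggers a review, while 32% to 20% is a 37.5% drop and quarantines the number.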
For teams building on call flow architecture, pipe answer rate telemetry into the same observability stack as agent latency metrics. A number health event and a latency spike are often correlated — a flagged number that does connect tends to attract more hostile calls, pushing average call duration down and agent performance metrics off.
Recovery playbook when a number is flagged: the sequence that works
Here is the exact sequence, not the aspirational version.
Day 0 (discovery): Pull the number from active rotation immediately. Do not continue calling on it — every additional flagged call deepens the label. Check Hiya's public lookup at hiyaphone.com/en/spam-call-lookup to confirm the label. Promote your warm-up number to active and order a replacement number from your SIP provider to start warming up.
Day 0 (dispute filing): Register or verify your business on Hiya for Business. If you are already registered, file the dispute with call volume data, business registration details, and a description of the legitimate campaign use. For First Orion, log into the partner portal and submit equivalent documentation.
Days 1–5 (dispute window): Do not re-introduce the number to any outbound dialling. Use the time to audit your call pattern — was the trigger volume, duration, or complaint-based? The dispute confirmation email from Hiya usually arrives on day three to five and includes the revised label status.
Days 6–14 (propagation period): Even after label removal at the Hiya database level, cached labels persist at the handset layer. Answer rate recovery is gradual — we typically see full recovery in 18–22 days from a successful dispute. If the number is a long-standing DDI, it is worth recovering. If it was campaign-specific, retire it and start the replacement's warm-up cycle immediately.
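If you track flagged numbers in code, the playbook reduces to a phase lookup plus one recover-or-retire decision. The sketch below is only an illustration of that sequence: the phase labels and the `isLongStandingDdi` field are hypothetical, and the day boundaries are the ones listed above.

```javascript
// Maps days since the flag was discovered to the playbook phase.
function recoveryPhase(daysSinceDiscovery) {
  if (!Number.isInteger(daysSinceDiscovery) || daysSinceDiscovery < 0) {
    throw new RangeError('daysSinceDiscovery must be a non-negative integer');
  }
  if (daysSinceDiscovery === 0) return 'pull_and_file_dispute';
  if (daysSinceDiscovery <= 5) return 'dispute_window';
  if (daysSinceDiscovery <= 14) return 'propagation';
  return 'monitor_recovery';
}

// Long-standing DDIs are worth recovering; campaign-specific numbers are not.
function recoverOrRetire(number) {
  return number.isLongStandingDdi ? 'recover' : 'retire_and_replace';
}

console.log(recoveryPhase(3));  // dispute_window
console.log(recoveryPhase(10)); // propagation
console.log(recoverOrRetire({ number: '+441234567001', isLongStandingDdi: false }));
// retire_and_replace
```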
Good / Bad / Ugly
Good: A campaign that catches a drop early (within 24 hours), quarantines the number, and promotes its warm-up spare. Dispute filed on day zero, campaign continues on backup number, original recovered in three weeks. Net impact: one day of slightly reduced capacity.
Bad: A campaign that runs daily but reviews answer rates weekly. The number runs flagged for five days at 40% reduced capacity before the team notices. Dispute filed late, recovery extended. Net impact: ~25% campaign volume lost across a two-week window.
Ugly: The March scenario above. No pool, no monitoring, no warm-up schedule. Single number hits 340 calls on day two. Flagged. Campaign paused. Dispute filed, but during the pause the campaign calendar shifted and the slot was lost. Number eventually recovered but the opportunity window it was booked for had passed.
PECR intersection: what UK compliance looks like at number rotation scale
PECR (the Privacy and Electronic Communications (EC Directive) Regulations 2003) requires that every outbound direct marketing call presents a valid CLI: a number the recipient can call back that connects to your business. Each number in your pool must meet this requirement individually.
The practical implications for a six-number pool:
- Every number must be a live DDI (Direct Dial-In) connected to your business telephone system. You cannot rotate through disconnected numbers or one-way-out SIP trunks.
- Every contact number you dial must be screened against the Telephone Preference Service (TPS) and Corporate TPS (CTPS) before dialling. Pool rotation does not change this requirement: the suppression check is per-contact, per-call, not per pool number.
- The ICO expects you to maintain records of which numbers were used in which campaigns, because CLI patterns are a primary evidence source in enforcement investigations. Log your pool rotation decisions.
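The three requirements above can be combined into one pre-dial step that screens the contact, checks the CLI, and records which pool number was used. This is a rough sketch under stated assumptions: `tpsNumbers` and `ctpsNumbers` stand in for your licensed TPS and CTPS data, `isLiveDdi` is a hypothetical flag you maintain per pool number, and the log is an in-memory array where a real system would use durable storage. It treats every TPS or CTPS match as a hard stop, which is the conservative choice.

```javascript
// Screens one contact and, if dialling is allowed, logs the CLI decision.
function preDialCheck(contactNumber, cli, campaignId, deps) {
  const { tpsNumbers, ctpsNumbers, cliLog, now = new Date() } = deps;

  // Suppression is per contact, regardless of which pool number would dial.
  if (tpsNumbers.has(contactNumber) || ctpsNumbers.has(contactNumber)) {
    return { dial: false, reason: 'suppressed' };
  }
  // The presented CLI must be a number the recipient can call back.
  if (!cli.isLiveDdi) {
    return { dial: false, reason: 'cli_not_callable' };
  }
  cliLog.push({
    timestamp: now.toISOString(),
    campaignId,
    cli: cli.number,
    contactNumber,
  });
  return { dial: true, reason: 'ok' };
}

// Usage with made-up data (numbers from the Ofcom drama range).
const deps = {
  tpsNumbers: new Set(['+447700900111']),
  ctpsNumbers: new Set(),
  cliLog: [],
};
const cli = { number: '+441234567002', isLiveDdi: true };

console.log(preDialCheck('+447700900111', cli, 'spring-renewals', deps).reason); // suppressed
console.log(preDialCheck('+447700900222', cli, 'spring-renewals', deps).reason); // ok
console.log(deps.cliLog.length); // 1
```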
What PECR does not do: cap the number of numbers you can rotate. The ICO's direct marketing guidance is silent on pool size. The constraint is legitimate CLI, not pool depth. One counterpoint worth noting: some compliance advisers argue that rapid number rotation signals intent to evade TPS screening, particularly if numbers rotate faster than the 28-day TPS refresh cycle. We disagree, but it is a minority view you will encounter — the Direct Marketing Association's Code of Practice sets out the mainstream industry position.
What changed in 2025–2026: carrier analytics APIs and self-serve flag disputes
Two developments have shifted the landscape significantly since late 2024.
First, Hiya launched self-serve business analytics access in Q4 2024, giving registered businesses a dashboard view of their numbers' label history and complaint volume. The API access is tiered — free accounts see label status; paid accounts get complaint volume and call pattern flags that explain why a label was applied. For the first time, UK outbound teams have direct visibility into carrier reputation data without inferring it from dispute outcomes.
Second, Ofcom published updated guidance on Nuisance Calls and Texts in January 2025, which explicitly mentions carrier-level analytics as a tool in enforcement referrals. Carrier flag data is now potentially part of the evidence chain in a regulatory investigation, so treat call pattern hygiene as a compliance activity, not just a deliverability one.
For Twilio users, the Regulatory Compliance API added UK CLI pre-validation in early 2025, so you can confirm a number's regulatory status before provisioning it rather than after your first failed call.
Number health is infrastructure. The warm-up schedule is a deployment process. The monitoring stack is an SLA. Teams that treat it as an afterthought pay for it in burnt campaigns and two-week delays — see how prompt engineering for voice agents and TTS caching feed into the same operational discipline. Build the pool, build the monitoring, build the recovery playbook before you need any of them.
For a real-world example of these layers working together, see our Voice AI and Document Analysis case study — it covers the telephony stack including number management in a regulated environment.