Voice and video impersonation: when the call sounds right

Article 07 of 13

The risky moment can look like a bookkeeper answering a call from the owner.

The owner is travelling. The company is waiting on equipment for a job. The phone rings, the caller ID looks familiar, and the voice on the line sounds like the owner: same pace, same impatience, same way of saying the bookkeeper's name.

"I'm on a borrowed phone. Mine died. I need you to get a wire out before the supplier closes."

A familiar voice can feel like proof. High-stakes requests still need verification through a known channel.

Voice cloning and synthetic video have weakened one of the oldest business habits: recognizing a person by sound or face.

Staff are used to trusting a familiar voice, especially when the request appears to come from an owner, executive, manager, client, or long-time vendor. AI makes that habit dangerous for requests involving money, banking, payroll, credential resets, account access, or other irreversible action.

The reliable signal is still the action being requested. If a call, voicemail, video, or recorded announcement asks someone to move money, change payment details, approve a credential reset, disclose access, or bypass a normal process, it needs verification through a known channel.

What the risk is

Voice and video impersonation uses synthetic audio or video to make a person appear to say or request something they did not say. The target is the human verification habit: "I know that voice" or "I saw their face on the call."

Voice cloning is the more immediate SMB risk because it is cheap, quick, and works over ordinary phone calls or voicemail. Short clips of clean audio from podcasts, webinars, sales calls, voicemail greetings, LinkedIn videos, social media posts, or recorded meetings can be enough for some tools to imitate a person's voice.

Video impersonation is less common in routine SMB fraud, but the risk is moving into normal business tools. A fake participant may appear on a video call. A pre-recorded "CEO announcement" may ask staff to take an unusual action. A live video impersonation may be paired with a compromised account, making the meeting invitation or chat thread look legitimate.

Treat voice and video recognition as weak evidence for high-risk actions. A strange call may be ordinary, and a familiar call may still require verification.

Common patterns include:

A voice call from an owner or controller asking for an urgent wire transfer.
A voicemail from an executive asking staff to approve a banking change.
Caller ID spoofing paired with a cloned voice.
A video call where a familiar-looking participant asks for account access or payment approval.
A pre-recorded executive video that tells staff to ignore the normal process for one urgent item.
A compromised real account used to schedule a meeting where the audio or video is controlled by the attacker.

The phishing and payment fraud article covered text-based fraud in email, chat, invoices, portals, QR pages, and landing pages. This article covers the same recognition problem when the apparent proof is a voice or face. The meeting AI article covered legitimate meeting bots and recording sprawl, which is a different meeting risk.

How it happens in a normal SMB

A small Alberta industrial services company has an owner who is visible in the local market. He has appeared on a trade podcast, recorded short videos for LinkedIn, left voicemail greetings, and joined sales calls that were recorded by prospects.

That public and semi-public audio gives an attacker enough material to imitate him. The attacker also gathers business context from the company website, project announcements, job postings, and social media. The company is hiring technicians, working on a site outside Edmonton, and waiting on equipment for a customer deadline.

On Thursday morning, the bookkeeper receives a call. The caller ID shows the owner's name because the attacker has spoofed the number. The voice sounds right.

"It's me. I'm on a borrowed phone because mine died. We have a supplier issue on the Edmonton job. I need a wire sent before noon or they will release the equipment to someone else."

The caller knows the project name, the supplier category, and the owner's travel schedule. He sounds irritated in the way the owner sometimes sounds when a job is at risk. He says he will send the banking details by email and asks the bookkeeper to move quickly.

An email arrives two minutes later with wiring instructions. The name on the message appears to match the owner. The sender address is wrong, but the bookkeeper is on her phone and sees mostly the display name. She has also just heard the owner's voice.

The bookkeeper starts the wire process. The bank requires a second approval, so she messages the operations manager: "Owner called. Equipment hold. Need second approval on urgent wire."

The operations manager has also heard the owner talk about the delayed equipment. He approves in the banking portal. The bank workflow now has two approvals and still no independent check of the caller. Nobody calls the owner's known mobile number or uses the company's verification phrase because the voice seemed to settle the question.

When the real owner is finally reached, he has no idea what wire they are talking about.

The business now has a payment-fraud incident. The phone call started it, and the approval process failed when voice recognition was allowed to authorize an irreversible action.

The failure path

The failure path looks like this:

An attacker collects voice or video material from public posts, webinars, voicemail greetings, sales calls, recordings, or social media.
The attacker gathers business context: roles, projects, travel, vendors, deadlines, payment habits, or reporting lines.
AI helps create a voice clone, voicemail, video clip, or live impersonation that feels familiar.
The attacker makes a high-risk request: wire transfer, banking change, payroll routing, credential reset, account recovery, or process bypass.
Staff rely on the familiar voice, face, caller ID, meeting invite, or apparent account.
The action is taken without callback to a known number or another approved verification channel.
Money, account access, confidential information, or business authority goes to the attacker.
The business discovers the fraud when the real person denies the request or the expected payment, access, or approval fails.

The same lesson from the previous article applies here. The request type drives the control. Text, voice, and video can all look authentic enough to fool a busy person.

Voice and video make the pressure worse because they feel personal. A staff member may hesitate to challenge a familiar voice, especially when the caller sounds annoyed, rushed, or senior. That is why the verification rule has to be normal before the call happens.

Business consequence

The first consequence is often direct financial loss.

In the industrial services company, the wire went to an attacker. The business still needs the equipment, still owes the real supplier if an invoice exists, and now has to work with the bank, insurer, IT provider, and possibly legal counsel to preserve evidence and attempt recovery.

The financial loss may be larger than in a text-only fraud because a familiar voice can push staff past the hesitation they might feel with email. The staff member may also feel responsible because they "heard the owner" and acted quickly. That can create blame inside the company when the better fix is a stronger approval process.

Other consequences can follow:

Payroll or vendor banking details may be changed after a convincing call.
A credential reset may give the attacker control of email, payroll, banking, or vendor portals.
A fake executive video may cause staff, clients, or media to believe leadership said something damaging.
A video meeting may be used to pressure a staff member into sharing a screen, approving access, or skipping a normal step.
Insurance questions may focus on whether the business had callback, dual-approval, and social-engineering controls.
Staff may stop trusting phone and video communication for legitimate urgent work.

There is also an evidence problem. The business may not have a recording of the call. Caller ID may be spoofed. A video meeting may have been scheduled through a compromised account. The investigation may depend on phone records, banking logs, email headers, meeting metadata, chat messages, and staff recollection.

Controls that interrupt the failure path

The first control is a verification rule for high-risk voice and video requests.

Voice or video can start a conversation. A high-risk action still requires callback to a known channel, second approval, or another approved verification step. Staff need permission to pause and follow the process even when the voice sounds like the owner.

Start here

Require callback to a known number before acting on high-risk voice or video requests.
Use two-person approval for wire transfers, banking changes, payroll routing changes, and credential recovery.
Use a pre-established verbal code phrase as a supplemental check, especially when normal callback is temporarily unavailable.
Treat caller ID, display name, and meeting invitation source as weak signals.
Document who verified the request, which known number or channel was used, and who gave second approval.
Give staff explicit permission to slow down any request that sounds urgent and irreversible.

Add where needed

Require authenticated meeting join, lobby control, and participant approval for finance, leadership, legal, HR, security, or client-confidential calls.
Use identity-at-join questions for high-stakes video meetings, based on facts an attacker cannot find publicly.
Verify pre-recorded executive videos through a second channel before staff act on any unusual request.
Keep public executive audio and video exposure in mind when setting verification rules for owners and leaders.
Train finance, payroll, HR, executive assistants, and operations staff on voice-clone scenarios tied to their actual workflows.
Review banking and payroll approval limits so no single call can trigger a large irreversible action.

The callback rule should use known records. If the caller says their phone died and provides a new number, treat that number as part of the request. Call the number already on file or use another approved channel the business controlled before the call.
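
The callback rule can be sketched in a few lines of Python. This is an illustration only: the `Contact` record, the `callback_number` function, and the phone numbers are hypothetical and do not describe any particular phone system or product. The sketch makes one point: the number offered during the call is never used to verify the call.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Contact:
    """Hypothetical contact record kept by the business."""
    name: str
    number_on_file: str  # recorded before any request arrived


def callback_number(contact: Contact,
                    number_offered_by_caller: Optional[str] = None) -> str:
    """Return the number to dial when verifying a high-risk request.

    The number offered during the call is deliberately ignored. It is
    part of the request being verified, so it cannot verify the caller.
    """
    return contact.number_on_file


# Fictional numbers for illustration only.
owner = Contact(name="Owner", number_on_file="+1-780-555-0100")
print(callback_number(owner, number_offered_by_caller="+1-587-555-0199"))
# prints +1-780-555-0100
```

The function accepts the offered number only to show that it has no effect on the result. A real procedure would also define what happens when the number on file does not answer, which is the backup verification path.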

The code phrase should be boring and protected. Avoid public slogans, pet names, sports teams, family details, or anything likely to appear online. Change it when staff who know it leave the company or when the phrase may have been exposed.

Policy rule this creates

Rule 07 of 13

Wire transfers, payment redirects, banking changes, payroll routing changes, credential resets, account recovery, and other irreversible business actions require verification beyond voice or video. The same rule applies to requests for credentials, MFA codes, recovery codes, or access details. High-risk voice and video requests must be verified through callback to a known number, in-person confirmation, a pre-established code phrase, or another approved channel before action is taken. Staff are expected to pause and verify even when the caller sounds or looks familiar.
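
For a team that tracks approvals in a script or internal tool, Rule 07 can be expressed as a simple check. The sketch below is one possible encoding, and every name in it (`HIGH_RISK_ACTIONS`, `APPROVED_VERIFICATION`, `WEAK_SIGNALS`, `may_proceed`, and the string labels) is an assumption made for illustration. It mirrors the rule as written above and is not a complete fraud control.

```python
from __future__ import annotations

# Actions Rule 07 treats as high risk (labels are illustrative).
HIGH_RISK_ACTIONS = {
    "wire_transfer", "payment_redirect", "banking_change",
    "payroll_routing_change", "credential_reset", "account_recovery",
    "credential_disclosure",
}

# Verification steps the rule accepts before action is taken.
APPROVED_VERIFICATION = {
    "callback_known_number", "in_person", "code_phrase",
    "other_approved_channel",
}

# Signals that may feel like proof but never count as verification.
WEAK_SIGNALS = {
    "familiar_voice", "familiar_face", "caller_id",
    "display_name", "meeting_invite",
}


def may_proceed(action: str, checks_done: set[str]) -> bool:
    """Return True only if the request satisfies Rule 07.

    Weak signals are not in APPROVED_VERIFICATION, so any number of
    them still leaves a high-risk action blocked.
    """
    if action not in HIGH_RISK_ACTIONS:
        return True
    return bool(checks_done & APPROVED_VERIFICATION)


print(may_proceed("wire_transfer", {"familiar_voice", "caller_id"}))  # False
print(may_proceed("wire_transfer", {"familiar_voice", "callback_known_number"}))  # True
```

The design choice is that the action, not the channel, decides whether verification is required. A familiar voice and a matching caller ID together still return `False` for a wire transfer.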

Common questions about AI voice and video impersonation

The questions that come up most often when a business tries to extend the verify-the-action rule from email into voice calls, voicemail, and video meetings.

Is voice cloning actually being used against small businesses, or is it mostly an enterprise problem?

Voice cloning tools are now commodity software, with many available through free trials or low-cost subscriptions, and the audio threshold for a workable clone has dropped to seconds of clean voice. The same unit-economics shift that made AI-amplified text phishing viable at SMB scale (covered in phishing and payment fraud) applies to voice, because producing a convincing impersonation no longer requires bespoke effort per target. The relative business impact can be higher for an SMB, because a single fraudulent wire authorized over a voice call can seriously affect cash flow, payroll, or vendor relationships. Many SMB owners also have enough public or semi-public audio available through podcasts, sales calls, voicemail greetings, webinars, or social media for an attacker to attempt a convincing imitation.

Can we tell if a voice on a call is real or AI-generated?

AI voice generation has advanced to the point that real-time detection by ear is unreliable for the staff member taking the call. Subtle artifacts (slight unnatural pacing, breath patterns, certain phonemes under stress) can sometimes be heard, but the staff member is rarely positioned to listen analytically while answering a phone and responding to an urgent request. Real-time AI-voice detection products exist and are improving, but coverage depends on the product, the audio quality, and the specific voice model used. The operational rule is to treat voice as weak evidence for high-risk actions regardless of what any detection tool says or how familiar the voice sounds.

What if the caller says they're using a borrowed phone because theirs died?

Call the number already on file for that person, regardless of what new number the caller provides. The 'I am on a borrowed phone' framing is the key part of the social-engineering script because it gives the attacker permission to call from any number and shifts the conversation away from the known channel. If the real person cannot be reached on the known number, the request should follow the business's pre-approved backup verification path. Urgency is part of the attack pattern, so staff should not bypass verification just because the caller says the matter cannot wait.

What's a 'pre-established code phrase' and how do we set one up without making it weird?

A code phrase is a short word or sentence known only to a small trusted group (typically the owner, senior leadership, and finance approvers), distributed in person or through a separately authenticated channel, never by ordinary email, chat, or text. The phrase should not be derivable from public sources, so kids' names, sports teams, pet names, company slogans, and anything in social media or website content are out. A code phrase is a supplemental identity check, not a standalone approval for a major wire, banking change, payroll change, credential reset, or account recovery; callback to the number on file remains the primary control whenever it is available. Rotate on a regular cadence (at minimum annually) and after specific triggers: a phishing incident affecting one of the people who knew the phrase, a lost or compromised device, an employee with knowledge of the phrase leaving the business, or any suggestion that the phrase may have been exposed during a recorded, compromised, or suspicious call.
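
The rotation guidance above can be written as a small check. This is a sketch under stated assumptions: the function name `needs_rotation`, the trigger labels, and the 365-day reading of "at minimum annually" are illustrative choices, not a standard.

```python
from __future__ import annotations

from datetime import date, timedelta

# "At minimum annually", read here as 365 days (an assumption).
MAX_AGE = timedelta(days=365)

# Events that force rotation regardless of age (labels are illustrative).
ROTATION_TRIGGERS = {
    "phishing_incident_for_holder",
    "device_lost_or_compromised",
    "holder_left_business",
    "possible_exposure_on_call",
}


def needs_rotation(last_rotated: date, today: date, events: set[str]) -> bool:
    """Return True if the code phrase is too old or a trigger has occurred."""
    if today - last_rotated >= MAX_AGE:
        return True
    return bool(events & ROTATION_TRIGGERS)


print(needs_rotation(date(2024, 1, 10), date(2024, 6, 1), set()))  # False
print(needs_rotation(date(2024, 1, 10), date(2024, 6, 1), {"holder_left_business"}))  # True
```

The check only says when to rotate. Distributing the new phrase still has to happen in person or through a separately authenticated channel.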

How do we train staff to challenge a familiar voice without making them feel rude?

Leadership has to make the pause expected and visibly endorsed before the call ever happens. Owners and executives should tell staff explicitly that 'even if it sounds like me, call me back' applies regardless of urgency, and they should respond well when staff actually follow the rule and ask. A script removes the awkwardness by framing the callback as standard procedure: 'I just need to call you back on the number we have on file before I process this, that's our standard verification.' Staff who feel they will be criticized for slowing down a request will skip the verification under pressure, which is why the cultural permission has to come from the top before voice or video fraud appears.

We already require two-person approval for wires. Isn't that enough?

Two-person approval works as a control only when the second person performs independent verification of the request itself, separate from whatever the first person saw or heard. In the article scenario, the second approver relied on the bookkeeper's message and on his own knowledge of the delayed equipment; the second approval added a signature without adding any independent check of the caller. The fix is to require at least one of the two approvers to perform out-of-band verification (callback to the number on file) before approving, with the channel and number used recorded in the approval. Without that requirement, the second approval duplicates the first, and the control reduces to a single point of failure.
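
The difference between two signatures and two approvals with an independent check can be sketched as follows. The `Approval` record, the `wire_may_be_released` function, and the phone numbers are hypothetical. They do not describe how any bank portal works, only the requirement stated above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Approval:
    """Hypothetical approval record, including how the request was verified."""
    approver: str
    verified_via: Optional[str] = None  # for example "callback"
    number_used: Optional[str] = None   # the number actually dialed


def wire_may_be_released(approvals: list[Approval], number_on_file: str) -> bool:
    """Require two distinct approvers and at least one callback to the number on file."""
    approvers = {a.approver for a in approvals}
    if len(approvers) < 2:
        return False
    return any(
        a.verified_via == "callback" and a.number_used == number_on_file
        for a in approvals
    )


ON_FILE = "+1-780-555-0100"  # fictional number

# The scenario in this article: two approvals, no independent check.
scenario = [Approval("bookkeeper"), Approval("operations_manager")]
print(wire_may_be_released(scenario, ON_FILE))  # False

# Same approvers, but one of them called the number on file first.
fixed = [
    Approval("bookkeeper"),
    Approval("operations_manager", verified_via="callback", number_used=ON_FILE),
]
print(wire_may_be_released(fixed, ON_FILE))  # True
```

Recording `number_used` matters because a callback to the number the caller supplied would otherwise look the same as a callback to the number on file.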

Could someone fake my video on a Teams or Zoom call?

Real-time video deepfakes in live calls are technically possible, but voice impersonation is the more immediate SMB risk and video impersonation appears less common in routine SMB fraud. The more common video-impersonation patterns are pre-recorded 'executive announcements' telling staff to take an unusual action, and compromised real accounts used to schedule meetings where the audio or video is controlled by the attacker. If the business is hosting the meeting, lobby controls, participant approval, and identity-at-join questions based on non-public facts can help; if the business is joining a meeting hosted by someone else, especially from a possibly compromised real account, the meeting invite should be verified through a known channel before joining or acting on anything requested in the meeting. Video meeting platforms are adding AI-call-detection signals, but coverage will lag the threat and the verification rule should not depend on those signals.

Should owners and executives stop appearing in public podcasts, videos, or webinars to reduce voice-cloning material?

Stopping public appearances is not a realistic control for most owners, because podcasts, sales calls, webinars, recruiting videos, and LinkedIn presence matter for sales, hiring, and brand. One practical step is to use a phone-system default or text-to-speech voicemail greeting instead of the owner's own recorded voice, which removes one easily reachable source of clean audio. One recorded interview, sales call, webinar, or voicemail greeting may provide enough material for an attempted clone, so public-exposure reduction helps only at the margins. The main control is the verification rule that catches the fraudulent request regardless of how convincing the voice sounds.

What do we do if we already acted on what turned out to be a fake voice or video call?

The first hour is the highest-leverage window for voice or video fraud response, because bank recall odds drop sharply with time and the evidence that proves what happened can disappear from voice and video platforms quickly. Call the bank immediately and ask for the bank's wire recall, ACH return, EFT return, or fraud recovery process. Preserve the voice and video evidence the business has access to: caller ID details, any phone-system call recordings, voicemail audio, video meeting metadata, host-platform recordings, screen recordings, and chat threads from any related conversation. Check whether any account that scheduled the meeting or sent related messages may have been compromised, and review other recent banking changes, payment runs, credential resets, and approvals for the same pattern. Notify IT or the policy owner, and hand decisions about insurance claims, legal counsel, fraud or law-enforcement reporting (in Canada, the Canadian Anti-Fraud Centre; in the US, the FBI's IC3 and the affected bank's fraud line), and client notification to the owner with appropriate advice.

Turn this rule into a working AI policy

The Free AI Policy Kit turns the thirteen decisions from this series into editable documents: an AI usage policy, employee survey, tools register, incident checklist, and 90-day rollout plan.
