The risky moment can look like a shortcut that works.
Someone has a repetitive task: rename files, clean a spreadsheet, merge exports, update records, move documents, reconcile invoices, or build a small automation. They ask AI for help. The AI gives them a script, a command, a macro, a browser extension, a low-code flow, or installation steps.
The instructions are clear. The explanation sounds confident. The first test appears to work.
Then the person runs it against real business data.
That is the point where AI stops being advice and becomes execution.
If the script is wrong, the package is malicious, the command hits the wrong folder, or the automation behaves differently at scale, the damage lands directly in the business.
What the risk is
The risk is staff treating AI as an expert authority on technical work and running what it produces without review by someone qualified to assess what it will actually do in the business's environment.
The output may be:
- A PowerShell, Bash, Python, or JavaScript script.
- A command copied into Terminal, PowerShell, Command Prompt, or an admin console.
- An Excel macro, Office script, Google Apps Script, or SharePoint automation.
- A Power Automate, Zapier, Make, or similar low-code workflow.
- A browser extension, add-in, or desktop utility the AI recommends installing.
- AI-generated application code shipped into a customer-facing system.
- A package installation command such as pip install or npm install.
AI writes plausible technical instructions, and those instructions become dangerous when people execute them against systems staff do not fully understand.
AI can generate code without knowing:
- Which folder contains production data.
- Which files have backups or version history.
- Which records are legally or contractually important.
- Which applications use hidden IDs, links, metadata, or naming conventions.
- Which user account has access to which systems.
- Which package names are real and trustworthy.
- Which command will behave differently on this endpoint, tenant, version, region, or configuration.
This risk is separate from agentic AI. With agentic AI, the AI takes the action itself. Here, a person takes the action by running the AI's output.
It is also separate from developer workstation infrastructure, which covers the persistent endpoint risk created when AI-driven work leads staff to install Python, Node.js, VS Code, package managers, browser automation, local model tools, or similar developer infrastructure. This article covers the immediate risk of executing the AI-generated output itself: the script, command, macro, installer, package, flow, or code.
Finally, it is separate from wrong or mixed AI output, which covers AI output that is wrong as information. Here, the output is dangerous because it runs.
How it happens in a normal SMB
A small law firm decides to clean up a shared client folder before moving to a new document-management system. The folder contains years of Word documents, PDFs, scanned letters, signed agreements, draft filings, and client correspondence.
The office manager has been asked to rename old files into a consistent format:
YYYY-MM-DD - Client Name - Matter Number - Description
Doing it manually would take days. She asks the firm's sanctioned AI assistant for a PowerShell script that can rename the files automatically. She gives the AI a few examples of old filenames and the desired new pattern.
The AI returns a script and explains each step. It says the script will extract dates from filenames, preserve the client name, and rename each file into the new convention. The explanation sounds reasonable. The office manager has used AI successfully for email drafts, spreadsheet formulas, and policy templates, so she trusts the output.
She tests it on six copied files. The test works.
A non-malicious script can still be wrong in ways the test did not reveal.
The filenames vary in ways the sample did not show: two dates, no date, underscores, client short names, matter numbers that look like dates, and generic scanner names. The script handles the simple cases and fails silently on the edge cases.
Hundreds of files are renamed into ambiguous names. Duplicate outputs receive suffixes, some matter numbers disappear from filenames, and other documents move into the wrong year folder because the script picked the wrong date.
The office manager does not notice immediately. The script finishes, prints a success message, and the folder looks cleaner at first glance.
The problem shows up the next morning: a paralegal cannot find the right version of a filing package, another staff member finds a client letter under the wrong matter, and a partner realizes that several renamed files no longer contain the information needed to match them to the file record.
The business now has to stop normal work and reconstruct what happened.
The damage came from unreviewed automation running with the office manager's normal write access to a shared business folder.
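The edge cases in this example (two dates, no date, matter numbers that look like dates) are exactly where a rename script has to decide what to do. Below is a minimal Python sketch of the safer behaviour, assuming dates are written as YYYY-MM-DD or YYYY_MM_DD; the function name and sample filenames are hypothetical, and this is not the script from the example. Instead of picking the first match, it returns no date plus a reason whenever a name is ambiguous, so those files land on a manual-review list rather than being renamed silently.

```python
import re
from datetime import date
from typing import Optional, Tuple

# Hypothetical convention: dates written as YYYY-MM-DD or YYYY_MM_DD.
# The lookarounds stop the pattern from matching inside a longer run of digits.
DATE_PATTERN = re.compile(r"(?<!\d)(\d{4})[-_](\d{2})[-_](\d{2})(?!\d)")


def extract_single_date(filename: str) -> Tuple[Optional[date], str]:
    """Return (date, "ok") only when the name contains exactly one valid date.

    Every other case comes back as (None, reason) so a person decides,
    instead of the script silently picking a date.
    """
    found = []
    for match in DATE_PATTERN.finditer(filename):
        year, month, day = (int(part) for part in match.groups())
        try:
            found.append(date(year, month, day))
        except ValueError:
            # Date-shaped digits that are not a real date, e.g. part of a matter number.
            return None, "date-like text that is not a valid date"
    if not found:
        return None, "no date"
    if len(set(found)) > 1:
        return None, "multiple different dates"
    return found[0], "ok"


if __name__ == "__main__":
    samples = [
        "2019-03-14 Smith engagement letter.docx",
        "Smith_2019_03_14_draft_2020-01-02.docx",
        "SCAN0042.pdf",
        "Matter 2021-13-40 notes.docx",
    ]
    for name in samples:
        found_date, reason = extract_single_date(name)
        print(f"{name!r}: {reason}")
```

Only the first sample would be renamed automatically; the other three would be reported for a person to handle.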
The failure path
The failure path looks like this:
- A staff member has a repetitive business task.
- AI produces a script, command, macro, installer, package recommendation, or automation flow.
- The output sounds competent and may work on a small sample.
- The staff member runs it against real business systems, shared data, client records, financial records, or production workflows.
- The AI output behaves incorrectly at scale, in edge cases, or in the specific business environment.
- Data is corrupted, moved, renamed, overwritten, exposed, deleted, duplicated, or changed in a way the business did not intend.
- The business discovers the issue through broken workflow, missing records, customer impact, accounting discrepancies, unusual access, or a failed downstream process.
- Recovery depends on backups, logs, version history, and whether anyone can reconstruct exactly what the AI-generated technical output did.
The dangerous part is confidence. Several successful AI-assisted shortcuts make the next shortcut feel routine. That is when staff stop treating the output as code and start treating it as instructions from an expert.
The risk is not limited to formal programming. A Power Automate flow that updates the wrong SharePoint library, a spreadsheet macro that strips leading zeros from account numbers, a Zapier workflow that sends customer records to the wrong destination, or an AI-recommended browser extension that reads every page can cause the same class of damage.
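The leading-zero failure takes a single conversion. A minimal Python sketch with made-up account numbers; the function names are hypothetical. Treating an identifier column as numeric drops the zeros, and comparing each value's text before and after conversion catches the loss before the converted data is written back.

```python
def to_number(value: str) -> int:
    # What a macro or import effectively does when it treats an ID column as numeric.
    return int(value)


def changed_identifiers(originals: list, converted: list) -> list:
    """Return the identifiers whose text no longer matches after conversion."""
    return [orig for orig, new in zip(originals, converted) if str(new) != orig]


accounts = ["000123", "45001", "0078"]  # hypothetical account numbers
converted = [to_number(a) for a in accounts]
print(converted)                                 # [123, 45001, 78]
print(changed_identifiers(accounts, converted))  # ['000123', '0078']
```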
Business consequence
The first consequence is operational loss.
In the law firm example, staff must restore files from backup, compare restored files against the current folder, identify which work was lost after the backup point, and reattach documents to the correct matters. That can consume days of partner, paralegal, and administrative time. It can also create missed deadlines, awkward client conversations, and uncertainty about whether the recovered folder is complete.
Other consequences depend on what the automation touched:
- Financial records can be duplicated, misclassified, deleted, or imported with the wrong fields.
- Client records can be merged, overwritten, exposed, or attached to the wrong file.
- Email automations can send the wrong attachment, wrong recipient list, or wrong customer segment.
- CRM automations can change opportunity stages, owners, notes, or follow-up dates at scale.
- Payroll or HR exports can be transformed in ways that are hard to detect until a downstream process fails.
- Customer-facing code can introduce security defects, broken workflows, or misleading behaviour.
- Package installation can pull malicious code from public registries, especially when AI recommends a hallucinated package name that an attacker has registered.
The evidence problem is practical. After the script has run, the firm may not know which files were touched, which records changed, which package was installed, which command was copied, or whether the AI chat still contains the exact output. If the user edited the script before running it, the final executed version may be lost.
That makes this both a business-continuity problem and a security problem. The incident may be data corruption, unauthorized disclosure, credential exposure, malware installation, or a production defect, depending on what the AI-generated technical output did.
Controls that interrupt the failure path
Start with a simple rule: AI-generated technical output needs qualified review before execution, even when the AI explains it clearly.
Qualified review means IT, the MSP, a developer, or a designated technical reviewer who can read the script or automation, understand the system it will touch, and explain what will happen if it fails.
Start here
- Require qualified review before AI-generated scripts, commands, macros, automations, installers, package commands, browser extensions, or code are run against business systems, shared resources, client data, financial data, regulated data, or production workflows.
- Test on a representative copy before touching the live folder, live spreadsheet, live CRM, live accounting system, or live customer environment.
- Confirm there is a current backup, version history, export, or rollback path before running any automation that can change or delete data.
- Keep the final executed item: the script, command, macro, flow definition, package list, and AI prompt thread where practical.
- Treat low-code automations as change-controlled systems when they can change client, financial, regulated, or operational data.
- Limit write access to shared resources. A script can only damage what the user running it can write to.
- Remove local admin rights from ordinary users so that software installation and system changes require an administrator.
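Several of these controls (test first, keep a rollback path, keep a record of what ran) can be built into the automation itself. A minimal Python sketch, assuming a rename job expressed as a mapping of old to new filenames; the functions plan_renames, apply_renames, and rollback and the manifest layout are hypothetical. It defaults to a dry run, refuses collisions and overwrites, and writes a CSV manifest that can reverse the job.

```python
import csv
from pathlib import Path


def plan_renames(folder: Path, mapping: dict) -> list:
    """Validate an old-name -> new-name mapping without touching any file."""
    targets = list(mapping.values())
    if len(targets) != len(set(targets)):
        raise ValueError("two files would receive the same new name")
    plan = []
    for old_name, new_name in mapping.items():
        src, dst = folder / old_name, folder / new_name
        if not src.is_file():
            raise FileNotFoundError(src)
        if dst.exists():
            # Conservative: never overwrite, even if the existing file is also being renamed.
            raise FileExistsError(dst)
        plan.append((src, dst))
    return plan


def apply_renames(plan: list, manifest: Path, dry_run: bool = True) -> None:
    """Print the plan (dry run) or rename files and record each change in a CSV manifest."""
    if dry_run:
        for src, dst in plan:
            print(f"WOULD RENAME {src.name} -> {dst.name}")
        return
    with manifest.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["old_path", "new_path"])
        for src, dst in plan:
            src.rename(dst)
            writer.writerow([str(src), str(dst)])
            handle.flush()  # keep the record current even if a later rename fails


def rollback(manifest: Path) -> None:
    """Reverse every rename recorded in the manifest, newest first."""
    with manifest.open(newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))
    for row in reversed(rows):
        old_path, new_path = Path(row["old_path"]), Path(row["new_path"])
        if old_path.exists():
            raise FileExistsError(old_path)
        new_path.rename(old_path)
```

Keeping the manifest alongside the executed script gives the reviewer both the dry-run output and a tested way back.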
Add where needed
- Use package allowlists or controlled proxies for public registries such as PyPI and npm.
- Do not install packages, tools, extensions, or add-ins just because AI recommends them.
- Prefer managed automation platforms with logging over one-off scripts run from a user's workstation.
- Require pull request, peer review, testing, and deployment controls for AI-generated code in customer-facing applications.
- Disable or restrict Office macros, unsigned scripts, browser extensions, and add-ins by default.
- Use separate test locations for file operations, import jobs, data cleanup, and bulk record updates.
- Ask the MSP or technical reviewer to check destructive commands, recursive operations, wildcard paths, credential handling, external network calls, and package provenance.
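A reviewer still has to read the whole script, but a quick pattern scan can point attention at the kinds of lines listed above. A minimal Python sketch; the pattern list is illustrative and deliberately incomplete, matches are prompts for review rather than verdicts, and a clean scan does not mean a script is safe.

```python
import re

# Illustrative, incomplete patterns for PowerShell, Bash, and Python snippets.
RISK_PATTERNS = {
    "destructive command": r"\b(Remove-Item|rm|rmdir|del|Clear-Content|Format-Volume)\b|shutil\.rmtree|os\.remove",
    "recursive operation": r"-Recurse\b|\brm\s+-[a-zA-Z]*r|\bos\.walk\b|\.rglob\(",
    "wildcard path": r"[\\/]\*|\*\.[\w*]+",
    "external network call": r"\b(Invoke-WebRequest|Invoke-RestMethod|curl|wget|urllib|requests\.(get|post))\b",
    "package install": r"\b(pip3?|npm|choco|winget)\s+install\b|\bInstall-(Module|Package)\b",
    "credential handling": r"(?i)(password|passwd|api[_-]?key|secret|token)\s*[=:]",
}


def flag_lines(script_text: str) -> list:
    """Return (line number, category, line text) for each line matching a risk pattern."""
    findings = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        for category, pattern in RISK_PATTERNS.items():
            if re.search(pattern, line):
                findings.append((number, category, line.strip()))
    return findings
```

Run against a pasted script, the output is a short list of line numbers and reasons for the reviewer to read first.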
The standard should scale with consequence. A formula on a personal scratch spreadsheet needs a lighter process than a script that touches client files, accounting data, payroll records, or production systems. AI can be useful for technical work, but execution needs review when the blast radius is real.
Backups help, but they may not show which work disappeared after the backup, which files changed later, which clients were affected, or whether downstream systems consumed corrupted data before the restore.
Policy rule this creates
Rule 10 of 13
AI-generated commands, scripts, macros, automations, installers, package-install commands, browser extensions, add-ins, and application code may not be run against business systems, shared resources, client data, financial data, regulated data, or production workflows without review by IT, the MSP, a developer, or a designated technical reviewer. Any automation that can change, delete, move, expose, or transmit business data must be tested on non-production data first and must have a documented rollback path. Public-registry package installation requires approved sources or allowlisting where available. Local admin rights are limited to roles that require them.
Common questions about AI scripts and automation
These are the questions that come up most often once a business notices how often AI hands staff scripts, commands, macros, low-code flows, or installation steps to run.
We don't have programmers on staff. Is this really a risk for us?
AI now hands staff in office manager, finance, marketing, and operations roles scripts, commands, macros, Power Automate flows, and installation steps to solve everyday business problems, which means the technical-execution surface has grown without programmers entering the building. The bar to copy-paste a PowerShell command, a Python script, or a low-code workflow is low, and the AI explanation that comes with it sounds confident enough that staff trust it. SMBs typically don't have a designated technical reviewer who can read what the AI produced and explain what will actually happen against the business's specific systems. A single AI-generated script that corrupts the shared drive, mis-renames a client folder, or runs the wrong Power Automate flow against a live system can consume days of staff and owner time at SMB scale.
Our staff use Power Automate, Zapier, and Excel macros. Do those count, or is this only about scripts?
Power Automate, Zapier, Excel macros, Office scripts, Apps Script, SharePoint automations, browser extensions, and low-code workflows all count, because the risk is execution against business systems regardless of the technical form. The distinction from agentic AI is that here, a person runs the AI's output, so the qualified-review checkpoint sits between the AI suggestion and the person executing it. A Power Automate flow that updates the wrong SharePoint library, an Excel macro that strips leading zeros from account numbers, or a Zapier workflow that sends customer records to the wrong destination can cause the same kind of damage as a misbehaving PowerShell script. The blast radius is determined by what the executable touches in the business.
What does 'qualified review' actually look like for an SMB without a full-time IT team?
Qualified review means reading what the AI produced and being able to explain what it will do against the specific system it touches; the right reviewer depends on what the script reaches. Review by a second competent staff member can be enough for routine cases such as a basic Excel macro that only touches the user's own file, a simple Zapier flow between two approved tools, or a Power Automate workflow that posts to a personal Teams channel. MSP review (or review by a designated technical reviewer) is the right lane for anything that runs against shared resources, customer or client data, financial or accounting systems, regulated data, payroll or HR data, production workflows, or anything with a multi-step blast radius. When in doubt about which lane an AI-generated script belongs in, default to the MSP lane.
Doesn't requiring review before running AI-generated scripts slow staff down on the productivity AI is supposed to deliver?
The review requirement applies when the script touches shared resources, business systems, customer or client data, financial systems, regulated data, or production workflows. AI-assisted productivity inside a single user's own files (drafting an email, fixing a formula in a personal scratch sheet, summarizing the user's own document, building a personal-use macro) sits below the threshold and runs without review. Review is reserved for the cases where a wrong answer can damage shared business data. The practical line for staff is whether the AI's output will run against something other people in the business depend on.
What's a 'test on a representative copy' when we don't have a staging environment?
The practical pattern for most SMBs is to copy the thing to a test location first, run the AI-generated automation against the copy, and inspect the result before pointing the automation at the live target. Concrete versions include duplicating the folder to a test path before running a file-rename or move script, restoring a recent backup into a separate folder before cleanup or merge operations, using test modes or test runs against non-production connections or data where available, and creating draft tables, test records, or sandbox environments where the platform offers them. Where no sandbox exists, a representative subset of real data copied to an unused folder, sheet, or test record is the SMB version of staging. The representative-copy test should include the messy edge cases (unusual filenames, missing fields, multi-date entries, mixed formats) that the AI's small-sample test did not see.
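A minimal Python sketch of the copy-first pattern, assuming the live data is a local or mapped folder; make_test_copy and the paths are hypothetical. It copies the folder to a new, timestamped location, refuses to place the copy inside the live folder, and returns the copy's path so the automation can be pointed at the copy rather than the original.

```python
import shutil
from datetime import datetime
from pathlib import Path


def make_test_copy(live_folder: Path, test_root: Path) -> Path:
    """Copy live_folder into a new timestamped folder under test_root and return it."""
    live = live_folder.resolve()
    root = test_root.resolve()
    if root == live or live in root.parents:
        raise ValueError("test location must not be inside the live folder")
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    destination = root / f"{live.name}-test-{stamp}"
    # copytree creates missing parent folders and raises FileExistsError
    # if the destination already exists, so an old test copy is never reused.
    shutil.copytree(live, destination)
    return destination
```

The automation under test then receives the returned path, never the live folder, and its results can be compared against the original before anything touches production.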
When AI tells someone to run pip install or npm install, what's the risk?
Public package registries (PyPI for Python, npm for Node.js, similar registries for other languages) host code anyone can publish; a copy-paste install command runs that code on the machine that ran it, with that user's permissions. AI assistants sometimes invent package names that look plausible but do not exist on the registry (the slopsquatting pattern), and attackers can register those names, so a copy-paste install command can fetch attacker code under a legitimate-sounding name. Even when a package is real, it may include code the business did not intend to run; vulnerable, malicious, or maintainer-compromised packages reach the endpoint through normal install commands. The practical controls are verifying the package exists on the official registry, checking the publisher, project page, recent activity, dependency tree, and download history, confirming the package is actually required, routing installation through the MSP or technical reviewer for anything beyond personal experimentation, and treating package installation as the persistent-endpoint change covered in developer workstation infrastructure.
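A minimal Python sketch of two of those checks for PyPI, assuming the business keeps its own approved-package list (the APPROVED set here is hypothetical). It normalizes names the way PyPI does under PEP 503 (case-insensitive, with runs of "-", "_", and "." treated as equivalent) before comparing them against the allowlist, and uses PyPI's public JSON endpoint (https://pypi.org/pypi/<name>/json) to confirm a project exists. Existence is only the first hurdle: an attacker-registered name exists too, which is why the allowlist check matters more.

```python
import json
import re
import urllib.error
import urllib.request

# Hypothetical allowlist maintained by IT or the MSP.
APPROVED = {"requests", "openpyxl", "python-dateutil"}


def normalize(name: str) -> str:
    """PEP 503 normalization: lowercase, runs of -, _ and . become a single -."""
    return re.sub(r"[-_.]+", "-", name).lower()


def is_approved(name: str) -> bool:
    return normalize(name) in {normalize(n) for n in APPROVED}


def exists_on_pypi(name: str, timeout: float = 10.0) -> bool:
    """True if PyPI's JSON API returns a project record; False on HTTP 404."""
    url = f"https://pypi.org/pypi/{normalize(name)}/json"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            json.load(response)  # parse to confirm a real project record came back
        return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise
```

A name that fails is_approved goes to the MSP or technical reviewer regardless of what exists_on_pypi says.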
What about browser extensions or Office add-ins that AI recommends installing?
Browser extensions and Office add-ins inherit access inside the product or browser where they run, but the scope depends on the permissions they declare and the access the user approves. A browser extension can request access to specific sites, all sites the user visits, or only the active tab, while an Office add-in may request read or read/write access inside Outlook, Excel, Word, or whichever Office product hosts it. AI-recommended extensions and add-ins should go through the same qualified-review process as scripts, because broad permissions can reach banking, admin consoles, webmail, CRM, or business files and the installation persists beyond a one-time task. Microsoft 365 and Google Workspace offer admin controls for what users can install, though the mechanism and coverage vary by product, tier, and platform; where those controls are not available, the fallback is a documented approved list, quarterly install review, and a default of not installing on AI recommendation alone.
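For Chrome-style browser extensions, the requested access is declared in the extension's manifest.json, which a reviewer can read before approving. A minimal Python sketch, assuming a Manifest V2 or V3 file; it flags only the broad match patterns that cover every site (such as <all_urls> and *://*/*) and is no substitute for reading the full permission list or the admin-console view.

```python
import json

# Match patterns that cover every http(s) site.
BROAD_PATTERNS = {"<all_urls>", "*://*/*", "http://*/*", "https://*/*"}


def broad_access(manifest: dict) -> list:
    """Return "where: pattern" entries for every request that covers all sites."""
    requested = []
    # Manifest V2 lists host patterns under "permissions"; V3 uses "host_permissions".
    for key in ("permissions", "host_permissions", "optional_host_permissions"):
        requested += [(key, p) for p in manifest.get(key, [])]
    for script in manifest.get("content_scripts", []):
        requested += [("content_scripts.matches", p) for p in script.get("matches", [])]
    return [f"{where}: {pattern}" for where, pattern in requested if pattern in BROAD_PATTERNS]


def check_manifest_file(path: str) -> list:
    with open(path, encoding="utf-8") as handle:
        return broad_access(json.load(handle))
```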
What about AI-generated code in our customer-facing systems, like a website customization or a Shopify extension?
AI-generated code in customer-facing systems is the business's responsibility regardless of who or what wrote it. The category includes website customizations, Shopify or Squarespace customizations, Airtable extensions, HubSpot modules, and small in-house applications a person writes with AI assistance and ships externally. This differs from AI features that vendors ship (covered in vendor AI features): here the business deployed the code, so it is accountable for the code's behavior even when AI wrote it. AI-generated code can introduce security defects (cross-site scripting, SQL injection, exposed credentials), broken customer-facing workflows, or behavior that misrepresents prices, policies, or commitments. The review and testing process should include a technical review by someone other than the person who wrote it, testing against a representative copy or sandbox before deployment, and a rollback path that can be exercised quickly if a defect is detected after launch. For meaningful customer-facing surfaces, the right reviewer is the MSP, an outside developer, or a contracted code reviewer.
What should we keep on file when AI gives staff a script or command?
For any AI-generated script, command, macro, or automation that ran against business systems, keep the version that actually executed (which may differ from what the AI originally produced if the user edited it before running), the AI chat thread or prompt that produced it, and the timestamp and account that ran it. For installations, keep the install command as run, the package name and version, the source registry, and the resulting installed-package list. For low-code workflows (Power Automate, Zapier, Make), keep the flow definition export and a record of when it was published or modified. The evidence enables a technical reviewer to reconstruct what happened during incident response and supports renewal-time review of what the business is actually running.
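A minimal Python sketch of an execution record, assuming a JSON Lines log file that IT or the MSP can read; record_execution, its fields, and the archive folder name are hypothetical. It stores a copy of the script exactly as run, named by its SHA-256 hash, so the executed version survives even if the working file is edited afterwards.

```python
import getpass
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def current_account() -> str:
    try:
        return getpass.getuser()
    except (KeyError, OSError):
        return "unknown"


def record_execution(script: Path, chat_reference: str, log_file: Path) -> dict:
    """Archive the script as run and append one JSON line describing the run."""
    content = script.read_bytes()
    entry = {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "account": current_account(),
        "script_path": str(script.resolve()),
        "sha256": hashlib.sha256(content).hexdigest(),
        "chat_reference": chat_reference,  # e.g. a link to or export of the AI thread
    }
    archive = log_file.parent / "executed-scripts"
    archive.mkdir(exist_ok=True)
    (archive / f"{entry['sha256']}{script.suffix}").write_bytes(content)
    with log_file.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry
```

Calling it immediately before the script runs ties the timestamp, account, and exact content together in one record.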
An AI-generated script just damaged our shared drive. What do we do?
Containment comes before evidence collection in AI-script damage response, because every additional run can extend the damage and overwrite the state that determines what can be recovered. Stop running the script (and any related flow, macro, or installer) before doing anything else. Then capture the evidence: the executed script or command, the AI chat thread, the timestamp and account that ran it, and any installed-package list, so a technical reviewer can determine exactly what the automation touched. Preserve relevant logs, recycle bin and version history state, flow run history, and endpoint evidence before cleanup or restoration where feasible. Triage the affected systems by recoverability: file operations may be partially reversible from version history or backup if action is taken before retention expires; record changes in CRM, accounting, or other databases may need a point-in-time restore that loses subsequent legitimate work; package installations may require endpoint review for what the package actually did beyond installing itself; and data or payments sent externally through an automation may follow the financial-recovery process covered in phishing and payment fraud. Route the response through the MSP or technical reviewer, because working out what the AI-generated technical output actually did, and whether downstream systems consumed corrupted data before the issue was caught, usually exceeds what most internal staff can assess alone. Notification decisions for affected clients, customers, or regulators depend on what was damaged or exposed and stay with the owner and appropriate counsel.
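Preserving state before restoration can be as simple as an inventory of the affected folder. A minimal Python sketch; snapshot_folder and the CSV layout are hypothetical. It records each file's relative path, size, last-modified time, and SHA-256 hash, so the reviewer can later compare the damaged state, the restored state, and any backup. It refuses to write the snapshot inside the folder being inventoried.

```python
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot_folder(folder: Path, output_csv: Path) -> int:
    """Write one row per file under folder; return the number of files recorded."""
    if folder.resolve() in output_csv.resolve().parents:
        raise ValueError("write the snapshot outside the affected folder")
    count = 0
    with output_csv.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["relative_path", "size_bytes", "modified_utc", "sha256"])
        for path in sorted(p for p in folder.rglob("*") if p.is_file()):
            info = path.stat()
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat()
            writer.writerow([str(path.relative_to(folder)), info.st_size, modified, sha256_of(path)])
            count += 1
    return count
```

Taking one snapshot before restoration and another after gives the reviewer a file-by-file difference rather than a guess.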
One of 13 rules for your AI usage policy
The rule above is one of 13 that make up a working AI Usage Policy. The SMB AI Policy Builder walks you through the full set of decisions and produces the policy, working documents, and a 90-day implementation plan.