AI as the arsonist and the firefighter: Why automated tools miss what matters most

AI tools are making your code less accessible even as they're marketed as the solution to accessibility problems. Your development team uses Copilot to generate code faster. Your QA team uses automated scanners to catch accessibility bugs. Both practices are real. Both are insufficient. And they're working against each other.

In H1 2025, 2,014 digital accessibility lawsuits were filed. Developers are shipping inaccessible code at the same rate companies are being sued for it. The gap between what AI can fix and what it's creating has become a liability.

This is Part 3 of our series on how AI fuels accessibility lawsuits. We've covered how crawl-by lawsuits industrialized litigation, how hallucinated citations don't protect defendants, and now we're tackling the tool that's supposed to make development faster—but isn't making it more accessible.

The 20–30% coverage problem

Let's start with what automated testing does well. Your scanner will catch missing alt text. It will flag low contrast ratios. It will find empty form labels. These are syntactic errors—structural problems that affect code, not context.

But here's the problem: automated tools cover only 20–30% of WCAG success criteria. Even measured by issue volume, Deque's analysis puts real-world detection at 57%. Either way, the remaining 70–80% of criteria are invisible to your scanning tool.

The AI code paradox: Training on a 94.8%-broken web

Now let's talk about the code AI tools are generating in the first place.

GitHub Copilot and similar AI assistants are trained on the open web. They learn patterns from millions of repositories, code snippets, and examples. The web is their training data. And 94.8% of homepages have detectable accessibility errors, according to the WebAIM Million 2025 study.

Code trained on bad data produces bad output.

When a developer asks Copilot to "create a button," they're statistically likely to get something like this:
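An illustrative reconstruction of that pattern (the handler name is a placeholder; actual model output varies):

```html
<!-- Illustrative anti-pattern, not verbatim Copilot output -->
<div class="btn" role="button" onclick="submitOrder()">
  Submit order
</div>
```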

Not a semantic `<button>` element. A `<div>` with an onclick handler and a hastily added role attribute. The model learned from repositories where this pattern exists. It often works. Good enough. Ships.

Except it's not good enough. Keyboard users can't tab to it without an added tabindex. Screen reader users hear "button," but Enter and Space don't trigger it the way they would a native button. The code works in happy-path scenarios. It fails everywhere else.
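For comparison, the semantic element gives you focus, Enter/Space activation, and correct screen reader announcement for free (again, `submitOrder` is a placeholder):

```html
<!-- Native button: focusable, keyboard-activatable, announced correctly -->
<button type="button" onclick="submitOrder()">
  Submit order
</button>
```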

This isn't a Copilot problem specifically. It's a training data problem. And the training data is the entire broken web.

GitClear analyzed 211 million lines of code from real development teams. The findings: code cloning is up 48%, and code churn rose from 3.1% to 5.7%. More churn means code is being rewritten more often, which means it's less stable. Less stable code carries more accessibility debt—because developers ship, iterate, and fix later. Except "later" often never comes.

Higher code churn correlates with higher accessibility debt. Not because AI is malicious, but because the increase in development velocity isn't matched by an increase in accessibility discipline.

The hallucination problem in automated remediation

Here's where it gets thorny. AI can suggest fixes for straightforward issues. Missing alt text? AI can generate one. Missing form label? AI can add it. But fixes without context create new problems.

An AI remediation tool analyzes a form and suggests adding `aria-label="button"` to an element that already has a button role. The linter now passes. The suggestion silenced the warning. But a screen reader user hears "button button," or the aria-label obscures meaningful content. The tool fixed the error but broke the experience.
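A sketch of that failure mode in hypothetical markup. Before the "fix," the element's accessible name is its visible text; after it, the aria-label overrides that text:

```html
<!-- Before: screen reader announces "Submit order, button" -->
<div role="button" tabindex="0">Submit order</div>

<!-- After the automated "fix": the aria-label overrides the visible text,
     so a screen reader announces "button, button" -->
<div role="button" tabindex="0" aria-label="button">Submit order</div>
```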

This happens because AI lacks the semantic understanding needed to fix accessibility properly. It sees a rule violation. It applies a rule-based fix. It doesn't evaluate whether the fix makes sense for actual users.

The remediation narrative is powerful. Ship broken code, then AI fixes it later. But in practice, fixing accessibility debt is harder than preventing it. Once bad patterns propagate, they lock into architecture, design systems, and team practices. Retroactive AI-generated patches create technical debt of their own.

What to do about it

This doesn't mean you should stop using AI tools. It means you should use them within boundaries.

Treat AI code generation as a starting point, not a solution. When Copilot generates code, enforce a review process that specifically flags accessibility concerns. Look for semantic HTML first. Check for ARIA misuse. If your team doesn't review for accessibility, you're outsourcing your accessibility decisions to a model trained on a broken web.

Pair automated testing with manual assessment. Run your scanner. Then do the 70–80% that requires human judgment. Get keyboard-only testing into your QA process. Test with real assistive technology. A clean scan report is a necessary condition, not a sufficient one.

Make accessibility part of code review, not a post-ship concern. If accessibility feedback is always a downstream activity, you're fighting human nature. Developers are incentivized to ship. Make accessibility part of the definition of done, and review it at pull request time.
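One way to wire this into the pull request gate is a lint pass that fails CI on common ARIA and semantic-HTML mistakes. A minimal sketch using `eslint-plugin-jsx-a11y`, assuming a React/JSX codebase (adapt the rule set to your stack):

```json
{
  "plugins": ["jsx-a11y"],
  "extends": ["plugin:jsx-a11y/recommended"],
  "rules": {
    "jsx-a11y/no-static-element-interactions": "error",
    "jsx-a11y/click-events-have-key-events": "error",
    "jsx-a11y/aria-role": "error"
  }
}
```

This catches the `<div onclick>` pattern before merge, but it's still automated checking: treat it as a floor for code review, not a replacement for it.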

Invest in accessibility domain knowledge on your team. Not everyone needs to be a WCAG expert, but your team needs people who understand assistive technology beyond linter rules. This is where AI tools are most useful—they handle the mechanical work, and your humans handle the judgment calls.

Push back on "fully automated" narratives. If a vendor says their tool solves accessibility without human involvement, they're overpromising. Accessibility requires human judgment. AI tools should amplify human judgment, not replace it.

In Part 4, we'll cover the hybrid strategy that actually works—how to combine automated tools, AI assistance, and human expertise in a way that reduces lawsuits instead of multiplying them.