Every few months, a new tool promises to solve test maintenance forever. "Self-healing tests!" the marketing says. "AI-powered selector recovery!" The pitch is compelling: your tests break less, your team spends less time fixing them, everyone ships faster.
I've evaluated a lot of these tools. I've built some of this functionality myself. And I've come to a conclusion that vendors won't like: most self-healing is just expensive flakiness management.
What Self-Healing Actually Does
Here's how most "self-healing" systems work:
- A test fails because a selector no longer matches
- The system looks for similar elements on the page (same text, similar position, matching attributes)
- If it finds a candidate, it updates the selector and reruns the test
- If the test passes, the "healing" is considered successful
This works great in demos. A button moves, the tool finds it, the test passes. Magic.
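The matching step in that loop can be sketched as a similarity scorer. This is an illustrative sketch, not any vendor's actual algorithm; the attributes, weights, and threshold are all assumptions chosen to show the idea.

```typescript
// Illustrative sketch of a self-healing candidate scorer.
// The weights and the 0.6 threshold are arbitrary assumptions.

interface ElementInfo {
  text: string;
  x: number;
  y: number;
  attributes: Record<string, string>;
}

function scoreCandidate(lost: ElementInfo, candidate: ElementInfo): number {
  let score = 0;
  // Identical visible text is the strongest signal.
  if (lost.text === candidate.text) score += 0.5;
  // Proximity: decay the score with distance from the old position.
  const dist = Math.hypot(lost.x - candidate.x, lost.y - candidate.y);
  score += 0.3 * Math.max(0, 1 - dist / 500);
  // Overlapping attributes (role, class, name, ...).
  const keys = Object.keys(lost.attributes);
  if (keys.length > 0) {
    const matching = keys.filter(
      (k) => candidate.attributes[k] === lost.attributes[k]
    ).length;
    score += 0.2 * (matching / keys.length);
  }
  return score;
}

function pickHealedElement(
  lost: ElementInfo,
  candidates: ElementInfo[],
  threshold = 0.6
): ElementInfo | undefined {
  const best = candidates
    .map((c) => ({ c, s: scoreCandidate(lost, c) }))
    .sort((a, b) => b.s - a.s)[0];
  return best && best.s >= threshold ? best.c : undefined;
}
```

Notice the failure mode built into this approach: a nearby button with similar styling can clear the threshold even though its text (and its meaning) has changed, which is exactly the checkout scenario described below.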
The Problem Nobody Talks About
What happens when the self-healer finds the wrong element?
Imagine a checkout flow where the "Place Order" button gets replaced with a two-step confirmation. The old button is gone. A new "Continue" button appears in roughly the same position with similar styling.
The self-healer sees the test fail, finds "Continue," clicks it, and... the test passes. But you're no longer testing what you thought you were testing. The original test verified that orders get placed. The "healed" test verifies that users can click a button.
This is the core issue: self-healing prioritizes test stability over test validity.
When It Gets Worse
The really insidious failures happen slowly. The healer makes small adjustments over time—a slightly different selector here, a fallback locator there. Each individual change seems reasonable. But after six months, you have tests that:
- Don't match your original test design
- Click elements you never intended to interact with
- Pass consistently while missing actual bugs
- Can't be reviewed because nobody remembers what they were supposed to test
I've seen teams with 80%+ automation coverage and zero confidence in their test suite. The tests run, they mostly pass, and nobody trusts them.
What Actually Works
Instead of healing broken tests, invest in not breaking them:
1. Use Stable Selectors
data-testid attributes are boring and effective. They're not coupled to styling or content. They survive refactors. They communicate intent.

```typescript
// Bad: will break when copy changes
await page.click('text=Submit Order');

// Good: survives UI changes
await page.click('[data-testid="checkout-submit"]');
```
2. Test at the Right Level
Most flaky tests are E2E tests that should be integration tests. If you're testing business logic, you don't need a browser. If you're testing API contracts, you don't need a frontend. Reserve E2E for actual user journeys.
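For example, order-total logic can be exercised directly in Node with no browser, no selectors, and nothing to heal. The calculateOrderTotal function here is a hypothetical stand-in for whatever business logic your E2E tests are indirectly covering.

```typescript
// Hypothetical business-logic function: test it directly, not through a UI.
interface LineItem {
  unitPrice: number;
  quantity: number;
}

function calculateOrderTotal(items: LineItem[], taxRate: number): number {
  const subtotal = items.reduce((sum, i) => sum + i.unitPrice * i.quantity, 0);
  // Round to cents.
  return Math.round(subtotal * (1 + taxRate) * 100) / 100;
}

// An integration-level check: fast, deterministic, no browser involved.
const total = calculateOrderTotal(
  [
    { unitPrice: 19.99, quantity: 2 },
    { unitPrice: 5.0, quantity: 1 },
  ],
  0.08
);
```

A test like this runs in milliseconds and fails only when the logic actually changes, which is the signal you want.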
3. Accept Some Maintenance
Tests that break when the UI changes are giving you information. Maybe the change was intentional and the test needs updating. Maybe the change was accidental and the test caught a bug. Self-healing hides both signals.
4. Invest in Observability
When tests fail, you need to understand why, quickly: good failure messages, automatic screenshots, video recordings, network logs. Make debugging fast instead of trying to avoid it.
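With Playwright, for instance, failure artifacts can be captured automatically. A config sketch; the specific option values are one reasonable choice, not the only one:

```typescript
// playwright.config.ts: capture debugging artifacts only when tests fail.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    screenshot: 'only-on-failure', // attach a screenshot to failed tests
    video: 'retain-on-failure',    // keep video only for failing runs
    trace: 'retain-on-failure',    // trace includes network, console, DOM snapshots
  },
});
```

The trace in particular pays for itself: it replays the failing run step by step, which is usually faster than reproducing the failure locally.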
A Better Use of AI
If you're going to use AI in test automation, use it for things that actually help:
- Failure analysis: Explain why a test failed in human-readable terms
- Test generation: Suggest new test cases based on code changes
- Flakiness detection: Identify tests that fail intermittently and need attention
- Coverage gaps: Find areas of the application that aren't tested
Notice what these have in common: they help humans make decisions rather than making decisions for humans.
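Flakiness detection is also the easiest of these to start on, and a first pass needs no AI at all: a pass/fail history per test is enough to surface candidates for a human to review. A minimal sketch, where the rate thresholds are arbitrary assumptions:

```typescript
// Flag tests that both pass and fail across recent runs (intermittent),
// as opposed to tests that fail consistently (likely a real regression).
// The 5%/95% thresholds are arbitrary assumptions to tune per team.
function findFlakyTests(
  history: Record<string, boolean[]>, // test name -> pass/fail per run
  minFailRate = 0.05,
  maxFailRate = 0.95
): string[] {
  return Object.entries(history)
    .filter(([, runs]) => {
      if (runs.length === 0) return false;
      const failRate = runs.filter((passed) => !passed).length / runs.length;
      return failRate >= minFailRate && failRate <= maxFailRate;
    })
    .map(([name]) => name);
}
```

The output is a shortlist for a person to investigate, not an automatic fix, which keeps the decision with the human.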
The Bottom Line
Self-healing tests are a response to a real problem—test maintenance is expensive and often frustrating. But they solve it in the wrong way.
A test suite that runs green isn't valuable. A test suite that catches bugs is valuable. Self-healing optimizes for the former at the expense of the latter.
If your tests break constantly, that's a signal to improve your testing strategy—not to paper over the symptoms with automation.
Related Reading
- AI Token Costs in CI/CD - a practical governance model for AI usage in QA pipelines.
- Why I'm Skeptical of Low-Code Testing Tools - trade-offs to evaluate before buying automation platforms.
- playwright-spec-doc-reporter - an open-source reporting tool with AI-assisted failure analysis.
Originally shared on LinkedIn. Have thoughts? Let's discuss.