If you write Behat tests for Drupal, you've probably touched at least one of these three projects: drupal/drupal-driver, drupal/drupal-extension, or drevops/behat-steps. Together, they form the foundation of how most Drupal sites express what they're supposed to do, and how those expectations get checked in CI.
This week we tagged coordinated releases of all three: drupal-driver 3.0.0-alpha1, drupal-extension 6.0.0-alpha1, and behat-steps 3.8.0. We want to talk about how that happened, because the story involves becoming co-maintainers of two of those projects and six months of focused engineering across three milestones.
Why this stack matters
The Drupal Behat extension is how thousands of Drupal sites express their behaviour in plain English. For example: Given I am logged in as a user with the administrator role, when I visit /admin, then I should see the dashboard.
The three projects play complementary roles. drupal-driver provides a PHP API for interacting with Drupal directly from test code, used to create users, content, and configuration without going through the UI. drupal-extension is the Behat extension that wires the driver into Behat and ships the bundled step definitions Drupal teams use out of the box. behat-steps, which we've maintained at DrevOps since 2018, is an optional library of additional reusable steps that handle the messy realities of testing real sites.
When the upstream projects slow down, everyone downstream feels it. PHP 8.4 doesn't get tested. New Drupal field types don't get supported. Bugs sit in trackers for years. We use this stack ourselves on every Drupal project we deliver, so we had skin in the game. When the opportunity came to step up as co-maintainers, we said yes, and then we got to work.
Milestone one: making things maintainable again
When we started picking up commits on drupal-extension in late December 2025, the first job wasn't features. It was making the projects maintainable.
That meant CI that actually ran on current PHP. Both upstream projects now test across PHP 8.2, 8.3, and 8.4 against Drupal 10 and 11, with a lowest-dependencies matrix to catch dependency drift. That is ten combinations per push on each repository. The previous matrix tested neither PHP 8.4 nor Drupal 11.
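The post doesn't spell out how the matrix reaches ten. One way the arithmetic works, assuming every supported pair runs once with highest and once with lowest dependencies: Drupal 11 requires PHP 8.3 or newer, so PHP 8.2 pairs only with Drupal 10. That leaves five supported PHP and Drupal pairs, and doubling them for the lowest-dependencies run gives ten. The Python below is a hypothetical sketch of that expansion, not necessarily how the repositories define their CI; build_matrix and the other names are illustrative.

```python
from itertools import product

PHP_VERSIONS = ["8.2", "8.3", "8.4"]
DRUPAL_VERSIONS = ["10", "11"]
DEPENDENCY_MODES = ["highest", "lowest"]

# Minimum PHP version per Drupal core major: Drupal 10 needs 8.1, Drupal 11 needs 8.3.
MIN_PHP = {"10": "8.1", "11": "8.3"}


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn "8.3" into (8, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def build_matrix() -> list[dict[str, str]]:
    """Expand PHP x Drupal x dependency mode, skipping unsupported pairs."""
    jobs = []
    for php, drupal, deps in product(PHP_VERSIONS, DRUPAL_VERSIONS, DEPENDENCY_MODES):
        if version_tuple(php) < version_tuple(MIN_PHP[drupal]):
            continue  # e.g. PHP 8.2 cannot install Drupal 11
        jobs.append({"php": php, "drupal": drupal, "deps": deps})
    return jobs


if __name__ == "__main__":
    for job in build_matrix():
        print(job)
    print(len(build_matrix()), "jobs")
```

Deriving the exclusions from a minimum-version table, rather than listing excluded cells by hand, keeps the matrix correct when a new PHP or Drupal version is added.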
It also meant introducing code quality tooling that's standard today but had never been set up on either upstream project. PHPStan now runs on every push, at level 7 on drupal-extension and level 3 on drupal-driver. PHPCS checks the Drupal coding standard plus our own DrevOps standard. Rector handles automated PHP version upgrades. Parallel-lint, gherkin-lint, and composer-normalize round out the set. A separate lint job runs as a hard CI gate, so quality regressions get caught at PR review, not in production.
Code coverage came next. drupal-driver now enforces a 95% coverage threshold; drupal-extension is at 70% and climbing. Coverage reports are posted as PR comments on every change, so you can see what a contribution covers before it's merged.
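As an illustration of how a coverage gate like this can work, the Python below reads a Clover XML report (a format PHPUnit can write with --coverage-clover) and compares statement coverage with a threshold. The post doesn't say which tool enforces the 95% and 70% figures or which coverage metric they use, so treat the function names and the choice of statement coverage as assumptions.

```python
import xml.etree.ElementTree as ET


def statement_coverage(clover_xml: str) -> float:
    """Return project-level statement coverage as a percentage."""
    root = ET.fromstring(clover_xml)
    # Project-wide totals live in the <metrics> element directly under <project>;
    # per-file <metrics> elements are nested deeper and are not matched here.
    metrics = root.find("project/metrics")
    if metrics is None:
        raise ValueError("No project-level <metrics> element found")
    total = int(metrics.get("statements", "0"))
    covered = int(metrics.get("coveredstatements", "0"))
    return 100.0 if total == 0 else covered * 100.0 / total


def meets_threshold(clover_xml: str, threshold: float) -> bool:
    """True if coverage is at or above the threshold, e.g. 95.0 or 70.0."""
    return statement_coverage(clover_xml) >= threshold


if __name__ == "__main__":
    sample = """<coverage><project>
      <metrics statements="200" coveredstatements="192"/>
    </project></coverage>"""
    print(f"{statement_coverage(sample):.1f}%", meets_threshold(sample, 95.0))
```

In CI, a non-zero exit when meets_threshold returns False is what turns the report into a gate.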
Then there was the actual testing approach, which is where this gets interesting. drupal-driver now runs kernel tests against multiple Drupal core versions, exercising every field handler end to end against a real Drupal kernel rather than mocked stubs. On top of that, we added a downstream smoke-check job that builds drupal-extension against the local copy of drupal-driver and runs its full test suite, so changes to the driver can't silently break the extension before they land. On drupal-extension itself, the Behat suite now runs against real, running Drupal 10 and Drupal 11 sites and covers every bundled step the extension ships. It's the same approach we've used in behat-steps for years, now applied upstream.
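The post doesn't say how the smoke-check job points drupal-extension at the local driver. A common Composer technique is a path repository, which installs a package from a local directory. The Python below is a hypothetical sketch of rewriting composer.json data that way; the function name, paths, and example version constraint are made up for illustration.

```python
import json


def use_local_driver(composer: dict, driver_path: str, constraint: str) -> dict:
    """Return a copy of composer.json data that installs drupal-driver from disk.

    `constraint` must match whatever version Composer infers for the local
    checkout; this sketch does not try to guess it.
    """
    updated = json.loads(json.dumps(composer))  # deep copy of plain JSON data
    repos = updated.setdefault("repositories", [])
    if not isinstance(repos, list):
        raise TypeError("This sketch only handles the list form of 'repositories'")
    # A "path" repository installs the package from a local directory
    # (symlinked where possible). Custom repositories are consulted before
    # Packagist, and by default the first one that provides a package wins.
    repos.insert(0, {"type": "path", "url": driver_path, "options": {"symlink": True}})
    updated.setdefault("require", {})["drupal/drupal-driver"] = constraint
    return updated


if __name__ == "__main__":
    # Hypothetical input; "*@dev" is only an example constraint.
    extension = {"require": {"drupal/drupal-driver": "^3.0"}}
    print(json.dumps(use_local_driver(extension, "../drupal-driver", "*@dev"), indent=4))
```

Returning a modified copy rather than editing in place keeps the original composer.json data untouched, which makes the smoke-check easy to run and discard.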
This is the unglamorous part of maintainership and it took a few weeks to land across both projects. Without it, none of the bigger work that followed would have stuck.
Milestone two: clearing the contribution backlog
The next phase was the harder conversation. Both upstream projects had pull requests open from contributors going back years. Some had real value but had simply waited too long to get merged. Others needed re-rolling against current code. A few needed a conversation with the original author about whether the approach still made sense given what had changed in Drupal core since the PR was raised.
We took the time to go through them properly. On drupal-driver, we merged ten community contributions in a single April working session, with full attribution preserved to the original authors. That includes:
- A proper FileHandler that supports actual managed-file lookup (originally PR #123, one of the oldest open contributions on the repo)
- A DaterangeHandler with timezone-aware formatting
- A NameHandler for the Name field module
- An OgStandardReferenceHandler for Organic Groups
- A rewritten AddressHandler with named-key support instead of brittle positional columns
- And several smaller improvements that had been waiting for years to land
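The difference between positional and named-key input is easy to see in miniature. The Python below is a hypothetical illustration, not AddressHandler's actual code: the column list loosely follows the Address module's property names, and both helper functions are made up for this example.

```python
# Hypothetical column order; loosely follows Address module property names.
ADDRESS_COLUMNS = [
    "country_code",
    "administrative_area",
    "locality",
    "postal_code",
    "address_line1",
]


def from_positional(values: list[str]) -> dict[str, str]:
    """Map values by position: meaning depends entirely on order."""
    padded = values + [""] * (len(ADDRESS_COLUMNS) - len(values))
    return dict(zip(ADDRESS_COLUMNS, padded))


def from_named(values: dict[str, str]) -> dict[str, str]:
    """Map values by key: order doesn't matter and typos fail loudly."""
    unknown = sorted(set(values) - set(ADDRESS_COLUMNS))
    if unknown:
        raise ValueError(f"Unknown address keys: {unknown}")
    return {column: values.get(column, "") for column in ADDRESS_COLUMNS}


if __name__ == "__main__":
    # Leaving out the state silently shifts "Melbourne" into administrative_area.
    print(from_positional(["AU", "Melbourne", "3000"]))
    print(from_named({"country_code": "AU", "locality": "Melbourne", "postal_code": "3000"}))
```

With positional input, skipping one optional column quietly moves every later value into the wrong field; with named keys, the same omission just leaves that field empty.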
We re-rolled them against the new architecture, ran them through the new test suite, and got them in. The contributors who originally raised them got credited in the commit history, which matters. Some of these PRs predated the current Drupal version entirely, so getting them merged required understanding both what the original author had been trying to solve and how Drupal had moved since.
Bug fixes followed the same pattern. We worked through the open issue trackers on both projects systematically, closing real bugs with real test coverage. These included base fields not being detected correctly, Drush 12+ output formats breaking user parsing, and config caching that didn't refresh between scenarios and broke change detection in tests: the kind of bugs that bite real projects in production CI runs and are hard to track down.
Milestone three: re-architecting for the next decade
With the projects stabilised and the contribution backlog cleared, we could finally do the bigger work. drupal-driver 3.x and drupal-extension 6.x represent a re-architecture that drops support for Drupal 6 through 9, requires PHP 8.2 minimum, and opens the door for the kind of contributions that the old architecture made painful.
The driver gained a composable capability interface model. Instead of a monolithic BaseDriver class that every driver had to extend, drivers now declare which capabilities they implement: ContentCapability, UserCapability, CacheCapability, MailCapability, and so on. Twelve capability interfaces in total. This makes it possible to write a partial driver, or extend an existing one, without inheriting half the API you don't need.
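To make the capability idea concrete, here is a small Python sketch, with abstract base classes standing in for PHP interfaces. UserCapability and CacheCapability are names from the post; their methods, the two driver classes, and the require() helper are hypothetical and don't reflect drupal-driver's actual signatures.

```python
from abc import ABC, abstractmethod


class UserCapability(ABC):
    @abstractmethod
    def create_user(self, name: str) -> int: ...


class CacheCapability(ABC):
    @abstractmethod
    def clear_cache(self) -> None: ...


class CacheOnlyDriver(CacheCapability):
    """A partial driver: declares and implements only the cache capability."""

    def __init__(self) -> None:
        self.clears = 0

    def clear_cache(self) -> None:
        self.clears += 1


class FullDriver(UserCapability, CacheCapability):
    """A driver that opts into both capabilities."""

    def __init__(self) -> None:
        self.users: list[str] = []

    def create_user(self, name: str) -> int:
        self.users.append(name)
        return len(self.users)  # hypothetical user ID

    def clear_cache(self) -> None:
        pass


def require(driver: object, capability: type) -> object:
    """Fail fast if a step needs a capability the driver doesn't declare."""
    if not isinstance(driver, capability):
        raise TypeError(f"{type(driver).__name__} does not implement {capability.__name__}")
    return driver


if __name__ == "__main__":
    print(require(FullDriver(), UserCapability).create_user("editor"))
    try:
        require(CacheOnlyDriver(), UserCapability)
    except TypeError as error:
        print(error)
```

A context could call something like require() before using a capability, so a partial driver fails with a clear message instead of a missing-method error halfway through a scenario.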
Field handling got a complete redesign. The old fieldExists and fieldIsBase pair has been replaced by a FieldClassifier with nine predicates that distinguish between base fields, configurable fields, computed fields, and custom-storage fields. The full truth table is documented in the repo. Field handlers themselves now register through a directory-based registry, so consumer projects can add their own handlers without forking. Entity stubs are now typed objects with proper accessors instead of \stdClass.
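The post says handlers register through a directory-based registry but doesn't describe the lookup. One plausible shape, sketched in Python: scan bundled handler directories first and consumer directories after, so a consumer file with the same name wins. The FieldTypeHandler.php naming convention and both functions are assumptions made for this illustration.

```python
import re
import tempfile
from pathlib import Path
from typing import Dict, List, Optional

HANDLER_FILE = re.compile(r"([A-Z][A-Za-z0-9]*)Handler\.php")


def field_type_from_filename(filename: str) -> Optional[str]:
    """Map 'EntityReferenceHandler.php' to 'entity_reference' (assumed convention)."""
    match = HANDLER_FILE.fullmatch(filename)
    if match is None:
        return None
    # Insert "_" before each inner capital, then lowercase.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", match.group(1)).lower()


def build_registry(directories: List[Path]) -> Dict[str, Path]:
    """Scan directories in order; a later directory overrides an earlier one."""
    registry: Dict[str, Path] = {}
    for directory in directories:
        for path in sorted(directory.glob("*Handler.php")):
            field_type = field_type_from_filename(path.name)
            if field_type is not None:
                registry[field_type] = path
    return registry


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    bundled, project = root / "bundled", root / "project"
    for directory, name in [
        (bundled, "AddressHandler.php"),
        (bundled, "EntityReferenceHandler.php"),
        (project, "AddressHandler.php"),  # project-specific override
    ]:
        directory.mkdir(exist_ok=True)
        (directory / name).write_text("<?php\n")
    for field_type, path in sorted(build_registry([bundled, project]).items()):
        print(field_type, "->", path.relative_to(root))
```

Ordering the scan so later directories win is what lets a consumer override a bundled handler by dropping in a file, without forking.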
The extension shipped its own breaking changes. Step-definition method names now follow a strict convention: methods behind @Then steps contain Assert, while methods behind @Given and @When steps don't, so a CI validator can statically check that a method's intent matches its annotation. Twenty-one method names changed to comply. Twenty step text patterns were updated to be more readable and consistent. Configuration keys were renamed for clarity. The full list is in the UPGRADING.md guide.
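The post doesn't show the validator, but the rule is simple enough to sketch. The Python below is a hypothetical, regex-based check, not the project's tool: it only reads docblock annotations, compares "assert" case-insensitively, and is not a full PHP parser. The step texts and method names in the example are made up.

```python
import re

# An @Given/@When/@Then annotation, the end of its docblock, then the method.
# A sketch only: a real validator would use a PHP parser.
STEP_METHOD = re.compile(
    r"@(Given|When|Then)\b.*?\*/\s*public\s+function\s+(\w+)",
    re.DOTALL,
)


def naming_violations(php_source: str) -> list:
    """Report step methods whose names disagree with their annotation."""
    problems = []
    for keyword, method in STEP_METHOD.findall(php_source):
        mentions_assert = "assert" in method.lower()
        if keyword == "Then" and not mentions_assert:
            problems.append(f"{method}: @Then method should contain 'Assert'")
        elif keyword != "Then" and mentions_assert:
            problems.append(f"{method}: @{keyword} method should not contain 'Assert'")
    return problems


if __name__ == "__main__":
    sample = """
    /**
     * @Then I should see the heading :text
     */
    public function assertHeading($text) {}

    /**
     * @When I open the front page
     */
    public function assertFrontPage() {}
    """
    for problem in naming_violations(sample):
        print(problem)
```

Because the check reads only names and annotations, it can run as a fast lint step without booting Drupal.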
This is the kind of work that's easy to argue for and hard to execute, because every single change has to be considered against backward compatibility, downstream impact, and maintainer bandwidth. We did the considering, made the calls, and shipped it.
A note on AI
We get asked about AI a lot, so a quick note on the role it played in this work.
AI was a real participant. It helped us triage years of issue tracker history and surface which open PRs were still worth landing. It drafted first passes of replies to contributors whose work was finally getting merged. It worked alongside us on commit messages, release notes, UPGRADING.md tables, and cross-checks between code and documentation. It read diffs back to us so we could spot what we'd missed. On a project that spans nine years of accumulated context, that kind of help is significant, and we'd rather be honest about it than pretend otherwise.
The thing that mattered, and the part we'd encourage other maintainers to think about, is that we were guiding it. Every output came back through human review before it went out under our names. Decisions about which PRs to merge, which architecture to adopt, which APIs to break, and which contributor responses to send were ours, informed by AI's drafts but never delegated to it. Good results came from good prompts, careful review, and tight loops. The skill is in the guiding, not the prompting.
The result: 200+ commits across the two upstream projects in six months, on top of our regular client work. The judgment, the review, the relationships, and the responsibility stayed with us. AI was the leverage that made the pace possible. Both parts of that sentence matter.
What's next
Three coordinated releases, two of them alphas, mark a milestone, not a finish line. We need testers. We need real projects to upgrade and tell us where the upgrade guide misses things, where the new architecture surprised them, and where a bundled context they subclassed got renamed in a way the rename guide didn't catch.
If you maintain a Drupal project that uses Behat, please consider trying the alphas. The upgrade path is documented. The CI is green. The failure modes we know about are listed. What we need now is the failure modes we don't.
Issues, feedback, and questions are welcome on the relevant repos. We're around.