The cost of AI dominates this week's stories, and the choices being made around it, from pricing model pivots to efficiency breakthroughs to production disasters, are shaping what development actually looks like in 2026. Underneath all of it runs a quieter thread: the infrastructure assumptions developers built on over the last few years are shifting fast.

Estimated Read Time: 5 minutes

Trend(s) to Watch

Microsoft and OpenAI untangle their revenue-sharing arrangement

Microsoft and OpenAI have ended their exclusive revenue-sharing deal, a structural change that deserves more attention than the headline suggests. For the past few years, that arrangement gave Microsoft preferential access to OpenAI models and tied both companies' commercial incentives together in ways that shaped the entire enterprise AI market. Ending it frees OpenAI to pursue its own commercial relationships more aggressively, and signals that Microsoft is confident enough in its Azure AI platform to compete on its own terms. Engineers whose organizations run on Azure-OpenAI integrations should watch whether API pricing, model availability, or service terms shift in the coming months.

An AI agent deleted a production database, and the postmortem is worth reading

A developer shared a thread describing how an autonomous AI agent wiped a production database, then produced a log of its own reasoning that made the decision look entirely coherent given the instructions it had received. That is the uncomfortable part: the agent did not hallucinate or go rogue. It followed the logic it was given to a conclusion nobody intended. As agentic systems move closer to real infrastructure, the gap between "can perform the action" and "should be permitted to perform the action" is the engineering problem that needs solving first, not last.
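If you are wiring agents into infrastructure now, the cheapest mitigation is a permission gate between the agent and its tools. Here is a minimal sketch of the idea in Python; the function names and the allowlist are illustrative, not drawn from any particular agent framework:

```python
from typing import Callable

# Only explicitly allowlisted, read-only actions run without review.
READ_ONLY_ACTIONS = {"select", "describe_table", "explain"}

def gated_execute(
    action: str,
    args: dict,
    execute_tool: Callable[[str, dict], str],
    request_approval: Callable[[str, dict], str],
) -> str:
    """Route every tool call through the permission policy before it runs."""
    if action in READ_ONLY_ACTIONS:
        return execute_tool(action, args)
    # Anything else (writes, schema changes, deletes) is held for a human.
    return request_approval(action, args)

# Stub integration points, standing in for a real agent runtime.
result = gated_execute(
    "drop_table",
    {"table": "orders"},
    execute_tool=lambda a, kw: f"executed {a}",
    request_approval=lambda a, kw: f"queued {a} for human review",
)
print(result)  # -> queued drop_table for human review
```

The design choice that matters is the default. A denylist of known-bad actions fails open; an allowlist of known-safe ones fails closed, which is the behavior you want when the agent's reasoning is coherent but its instructions are wrong.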

GitHub Copilot moves to usage-based billing starting June 1st

GitHub is switching Copilot from flat subscriptions to consumption-based pricing, which sounds like a cost-control measure until you realize it is also a way to capture more revenue from the teams that rely on it most. Flat pricing benefits heavy users. Usage-based pricing benefits GitHub. Teams that have embedded Copilot deeply into their workflows should run the numbers now, before the billing model changes beneath them. The companies most exposed are the ones that never tracked how much their engineers actually use it.

One thing to try this week

Before June 1st, pull your team's Copilot usage data and estimate what consumption-based billing will cost you compared to what you pay now. GitHub provides usage dashboards in organization settings. If you do not know where to find them, that itself is useful information about how much operational visibility you have into your AI tooling spend.
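If you would rather script it than click through dashboards, the organization Copilot metrics endpoint can give you raw counts. The endpoint below is GitHub's documented metrics API, but the per-suggestion price is a placeholder, since consumption rates had not been published at the time of writing; swap in real rates once they exist:

```python
import os
import requests

ORG = "your-org"  # replace with your organization slug
PLACEHOLDER_PRICE_PER_SUGGESTION = 0.0001  # hypothetical USD rate, not a published price

resp = requests.get(
    f"https://api.github.com/orgs/{ORG}/copilot/metrics",
    headers={
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    },
    timeout=30,
)
resp.raise_for_status()

# Sum code suggestions across editors, models, and languages for each day
# in the reporting window (the API returns daily aggregates).
total = 0
for day in resp.json():
    completions = day.get("copilot_ide_code_completions") or {}
    for editor in completions.get("editors", []):
        for model in editor.get("models", []):
            for lang in model.get("languages", []):
                total += lang.get("total_code_suggestions", 0)

print(f"Code suggestions in window: {total}")
print(f"Cost at placeholder rate:   ${total * PLACEHOLDER_PRICE_PER_SUGGESTION:,.2f}")
```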

Developer Tools

MLJAR Studio runs a local AI data analyst that saves work as notebooks

MLJAR Studio takes a natural language query about your data, runs analysis, and saves the result as an executable Python notebook on your local machine. The local-first approach matters here because your data never leaves your environment, which is non-negotiable for anything with compliance requirements. The notebook output is also a meaningful design choice: you get something inspectable and reproducible rather than a black-box answer. It is worth trying on a dataset you already understand well, so you can calibrate how much you trust the analysis it produces.
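One way to do that calibration is to plant the answer yourself: generate a dataset with a known relationship, hand it to the analyst, and check whether the generated notebook recovers the number you planted. A sketch with pandas and NumPy, column names illustrative:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 500
spend = rng.normal(50, 10, n)
df = pd.DataFrame({
    "ad_spend": spend,
    "revenue": 3 * spend + rng.normal(0, 5, n),  # planted slope of ~3
})
df.to_csv("known_ground_truth.csv", index=False)

# Ask the analyst "how does revenue relate to ad_spend?" The generated
# notebook should land near the slope you planted:
print(df["revenue"].cov(df["ad_spend"]) / df["ad_spend"].var())  # ~3.0
```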

Spec27 brings spec-driven validation to AI agent testing

Testing AI agents is harder than testing deterministic code because the output varies across models, prompt versions, and context. Spec27 frames this as a specification problem: you define what the agent should do, and the tool validates behavior against that spec as you change models or prompts. This is a genuinely early-stage product, and the approach will only be as useful as the quality of specs you write for it. But the framing is correct. The production database incident mentioned earlier in this issue is exactly the kind of thing that well-defined behavioral specs are supposed to catch before deployment.
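To make the framing concrete, here is a toy version of the spec-driven idea in Python. This is not Spec27's actual API; it only shows what pinning agent behavior to checkable invariants looks like, so the same checks can be re-run whenever the model or prompt changes:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class BehaviorSpec:
    name: str
    check: Callable[[str, str], bool]  # (task, agent_output) -> passed?

SPECS = [
    BehaviorSpec(
        "never emits destructive SQL",
        lambda task, out: not any(
            kw in out.upper() for kw in ("DROP ", "TRUNCATE ", "DELETE ")
        ),
    ),
    BehaviorSpec(
        "asks for confirmation on deletion tasks",
        lambda task, out: "confirm" in out.lower()
        if "delete" in task.lower()
        else True,
    ),
]

def validate(agent: Callable[[str], str], tasks: list[str]) -> None:
    """Run every task through the agent and score it against every spec."""
    for task in tasks:
        out = agent(task)
        for spec in SPECS:
            status = "PASS" if spec.check(task, out) else "FAIL"
            print(f"[{status}] {spec.name} :: {task!r}")

# Stub agent standing in for a real model call.
validate(lambda t: "Please confirm before I proceed.", ["delete stale rows"])
```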

AI Tools of the Week

IBM's Granite 4.1 matches 32B MoE performance at 8B parameters

IBM's Granite 4.1 is a compact open model that reportedly reaches the performance of 32B mixture-of-experts (MoE) architectures on relevant benchmarks, which, if it holds up under real workloads, has significant implications for inference cost. Running a 32B MoE model requires substantially more hardware than running an 8B dense model. For teams self-hosting models or paying per-token on inference APIs, that difference translates directly into dollars. IBM has been quietly producing competitive small models, and Granite 4.1 is worth benchmarking against your specific tasks before assuming you need a larger model.
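A minimal benchmarking harness for that comparison might look like the sketch below, using Hugging Face transformers. The checkpoint id is a guess; confirm the actual name in the ibm-granite organization on Hugging Face before running, and substitute prompts from your real workload:

```python
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ibm-granite/granite-4.1-8b-instruct"  # hypothetical checkpoint id

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

tasks = ["Summarize: ...", "Write a SQL query that ..."]  # your real tasks here
for prompt in tasks:
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    start = time.perf_counter()
    out = model.generate(**inputs, max_new_tokens=256)
    elapsed = time.perf_counter() - start
    # Decode only the newly generated tokens, not the echoed prompt.
    text = tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    print(f"{elapsed:.2f}s :: {text[:80]!r}")
```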

DeepSeek V4 sits just behind frontier at a fraction of the price

Simon Willison's writeup on DeepSeek V4 is worth reading carefully, because he does the work of comparing it to frontier models honestly rather than just reporting the marketing claim. The short version: it is not quite at the level of the leading proprietary models on every benchmark, but it is close enough that the cost differential makes it the rational choice for many production use cases. The pattern of cheaper models catching up to last quarter's frontier is becoming a reliable trend. Teams that standardized on an expensive model six months ago should be re-evaluating the tradeoff regularly.
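Re-evaluating that tradeoff can be a small script rather than a project. The sketch below replays a fixed eval set against two OpenAI-compatible endpoints and compares cost per task; the model names and per-token prices are placeholders to be replaced with whatever you actually run and current list prices:

```python
from openai import OpenAI

CANDIDATES = {
    # name: (client, model, $/1M input tokens, $/1M output tokens) -- all placeholder values
    "incumbent": (OpenAI(), "gpt-4o", 2.50, 10.00),
    "challenger": (
        OpenAI(base_url="https://api.deepseek.com", api_key="..."),
        "deepseek-chat",
        0.27,
        1.10,
    ),
}

EVAL_SET = ["Refactor this function: ...", "Explain this stack trace: ..."]

for name, (client, model, in_price, out_price) in CANDIDATES.items():
    cost = 0.0
    for prompt in EVAL_SET:
        r = client.chat.completions.create(
            model=model, messages=[{"role": "user", "content": prompt}]
        )
        # Price the call from the reported token usage.
        cost += (
            r.usage.prompt_tokens / 1e6 * in_price
            + r.usage.completion_tokens / 1e6 * out_price
        )
    print(f"{name}: ${cost:.4f} for {len(EVAL_SET)} tasks")
```

Cost is the easy half; pair this with a quality check on the same eval set before switching anything in production.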

Open Source Projects

Modeleon: Python DSL that compiles down to live Excel formulas

This is a niche tool solving a real problem in a way that most developers will not have thought about. Financial models often live in Excel because auditors, regulators, and non-technical stakeholders need to read and verify them directly. Modeleon lets you write those models in a Python DSL and compile them to live Excel formulas, preserving the auditability of the spreadsheet while giving developers a version-controllable, testable source of truth. It is an early-stage project, so the API surface and stability should be treated accordingly, but the underlying idea is genuinely useful for teams that maintain complex financial models.
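Modeleon's own API is best read in the repo; the sketch below just illustrates the underlying compile-to-formula idea using openpyxl. The model logic lives in version-controlled Python, but the workbook receives a live formula an auditor can inspect, not a precomputed value:

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

# Inputs live in code, the version-controlled source of truth.
ws["A1"], ws["B1"] = "principal", 1000
ws["A2"], ws["B2"] = "rate", 0.05
ws["A3"], ws["B3"] = "years", 10

# "Compile" the calculation to a live Excel formula rather than writing
# its result, so the spreadsheet stays auditable cell by cell.
ws["A5"] = "future_value"
ws["B5"] = "=B1*(1+B2)^B3"

wb.save("model.xlsx")
```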

Did you know?

The concept of "harvest now, decrypt later" attacks, where adversaries collect encrypted data today with the intention of decrypting it once quantum computers mature, was formally described in security literature as early as the 1990s. NIST began its post-quantum cryptography standardization process in 2016 and only finalized its first standards in 2024, meaning the lag between identifying the threat and producing standardized defenses was nearly three decades. For most encryption schemes, the window of vulnerability is measured in the lifetime of the data, not the lifetime of the key. A medical record encrypted today might still need to be private in 2045.

Wrapping Things Up

The shift happening across this week's stories is not about any single tool or deal; it is about the cost and accountability layer of AI development finally catching up to the capability layer. The open question is whether the tooling for testing, validating, and constraining autonomous systems will mature fast enough to match the pace at which they are being deployed into production.
