The Moment APIs Meet Agents

The industry is entering a phase where the primary consumer of the API is no longer a human-driven interface, but an autonomous agentic system. MCP (the Model Context Protocol), tool‑calling frameworks and autonomous workflows are turning APIs into the main action interface for AI systems. Current implementations largely rely on backward compatibility, treating the agent as just another legacy client and ignoring the unique constraints of the token economy.

That approach assumes the API surface designed for human developers will automatically suit agents. It ignores a deeper question: why were APIs designed as they were in the first place? For more than a decade, API design has optimized for composability — small, reusable, well‑scoped endpoints. This made perfect sense when the consumers were human coders and user interfaces. Composability let teams build UIs incrementally, debug flows step by step and orchestrate distributed services manually.

Agents, however, operate under different constraints. They live in a token economy: every API call consumes tokens, increases context and state to track and adds potential failure points. They pursue goals rather than stepwise CRUD operations. Composability remains a valid architectural pattern for human-in-the-loop systems, but it is no longer the optimal target for autonomous orchestration.


Why Composability Emerged — And Where It Shines

REST, CRUD and resource‑based design principles emerged because the first consumers of APIs were people writing code.

  • Incremental state changes: UI flows and human decision‑making happen step by step. Developers need to create an object, update it, attach related resources and finally complete an action.
  • Separation of concerns: Microservices architecture encouraged dividing systems into small services that could be composed. Each service presented a narrow interface centred on a resource.
  • Debugging and predictability: Smaller endpoints meant easier testing, less risk and clearer logs. Engineers could trace where a failure occurred because each call represented a single state transition.

In that context, composability was not just clever — it was essential. It allowed teams to build complex workflows through the composition of small parts. It scaled across organizations and remained agnostic to specific clients.
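The step-by-step flow these principles produce can be sketched as follows. The OrderClient below is a hypothetical, in-memory stand-in for a granular resource-oriented API; all names and routes are illustrative.

```python
# A sketch of the composable, step-by-step flow described above, using a
# hypothetical in-memory OrderClient (all names and routes are illustrative).

class OrderClient:
    """Minimal stand-in for a granular, resource-oriented API."""

    def __init__(self):
        self._orders = {}
        self._next_id = 1

    def create_order(self):                      # POST /orders
        order_id = self._next_id
        self._next_id += 1
        self._orders[order_id] = {"items": [], "discount": None, "status": "open"}
        return order_id

    def add_item(self, order_id, sku, qty):      # POST /orders/{id}/items
        self._orders[order_id]["items"].append({"sku": sku, "qty": qty})

    def apply_discount(self, order_id, code):    # POST /orders/{id}/discount
        self._orders[order_id]["discount"] = code

    def complete(self, order_id):                # POST /orders/{id}/complete
        self._orders[order_id]["status"] = "completed"
        return self._orders[order_id]

# Each call is one small, debuggable state transition:
client = OrderClient()
oid = client.create_order()
client.add_item(oid, "SKU-482", 2)
client.apply_discount(oid, "SPRINGSALE")
order = client.complete(oid)
print(order["status"])  # completed
```

For a human developer, each line is a place to set a breakpoint, inspect state and retry, which is exactly the debugging ergonomics composability was designed for.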


Compression: A New Driver for Agentic APIs

Agents don’t think in CRUD, and they don’t debug line by line. They think in terms of outcomes. They aim to achieve a goal subject to constraints, and they must do so within a limited context window and token budget. Every additional API call has costs:

  • Tokens and latency: Each call produces more data that needs to be included in context. Latency adds to the agent’s reasoning time.
  • State management: Agents need to track intermediate state across calls and reason about partial successes or failures.
  • Error surface: Each additional call introduces another point of potential failure and ambiguity.
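These costs compound with call count. A back-of-the-envelope sketch, where the per-call figures (reliability, tokens, latency) are assumed for illustration, not measured:

```python
# Illustrative arithmetic only: the per-call figures below are assumptions,
# not measurements. The point is how costs compound with call count.

PER_CALL_SUCCESS = 0.99          # assumed reliability of a single call
TOKENS_PER_RESPONSE = 400        # assumed tokens each response adds to context
LATENCY_PER_CALL_MS = 250        # assumed round-trip latency

def workflow_cost(n_calls):
    """Aggregate cost of a workflow that requires n_calls API calls."""
    return {
        "success_probability": PER_CALL_SUCCESS ** n_calls,
        "context_tokens": TOKENS_PER_RESPONSE * n_calls,
        "latency_ms": LATENCY_PER_CALL_MS * n_calls,
    }

composable = workflow_cost(6)    # e.g. create, add 3 items, discount, complete
compressed = workflow_cost(1)    # a single outcome-oriented call
print(composable)
print(compressed)
```

Even with optimistic assumptions, a six-call flow carries six times the context overhead and a lower end-to-end success probability than a single compressed call.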

Compression represents a fundamental inversion of the traditional API surface. Instead of exposing a series of granular endpoints (e.g. POST /orders, POST /orders/{id}/items, POST /orders/{id}/discount), a compressed design might provide a single, outcome‑oriented endpoint (e.g. POST /checkout). The agent sends its intent once, and the backend handles the multi‑step orchestration deterministically.
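The inversion can be sketched as a single backend handler that performs the orchestration itself. The checkout function, inventory table and discount table below are hypothetical stubs, not a real service:

```python
# Hypothetical compressed endpoint: the agent states its intent once and
# the backend orchestrates the steps deterministically. All data is stubbed.

INVENTORY = {"SKU-482": 5}
VALID_DISCOUNTS = {"SUMMER10": 0.10}

def checkout(items, discount_code=None):
    """POST /checkout -- one intent in, one orchestrated workflow out."""
    # Step 1: validate inventory (replaces POST /orders + POST /orders/{id}/items)
    for item in items:
        if INVENTORY.get(item["sku"], 0) < item["qty"]:
            return {"status": "failed",
                    "reason": f"insufficient stock for {item['sku']}"}
    # Step 2: validate the discount (replaces POST /orders/{id}/discount)
    rate = VALID_DISCOUNTS.get(discount_code, 0.0) if discount_code else 0.0
    # Step 3: commit all state changes together
    for item in items:
        INVENTORY[item["sku"]] -= item["qty"]
    return {"status": "completed", "discount_applied": rate}

result = checkout([{"sku": "SKU-482", "qty": 2}], discount_code="SUMMER10")
print(result)  # {'status': 'completed', 'discount_applied': 0.1}
```

The agent never sees the intermediate steps; it sees one intent and one result, while the sequencing lives in deterministic code.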

Crucially, this is not just about saving tokens. It’s about relocating complexity. The orchestration logic moves from the agent (a probabilistic system) into backend code (a deterministic system). The question then becomes: where should complexity live?


Challenges and Considerations of Compressed APIs

Compression isn’t a silver bullet. It introduces its own design challenges. The following subsections explain these challenges and propose ways to address them.

The Black Box Problem and Error Granularity

A primary concern with highly compressed endpoints is observability. When everything happens in one call, do we lose visibility? Multi‑step REST flows let agents or developers see which steps succeeded and which failed. They can retry or adjust at each stage.

However, compressed endpoints do not have to be black boxes. If designed intentionally, they can provide more informative feedback than a fragmented flow. Instead of returning a generic 500 Internal Server Error, a compressed API can return a structured result with state‑aware error messages. For example:

  • The discount code SPRINGSALE has expired; suggest valid alternatives.
  • Inventory is insufficient for item SKU‑482; available quantities are [2, 3].

The agent receives a single response object containing all relevant issues and suggestions, reducing ambiguity and "goal entropy." Rather than reconstructing state across multiple calls, the agent handles one well‑defined result.
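A state-aware result of this kind might look like the following sketch, where the field names and error codes are illustrative rather than a proposed standard:

```python
# Illustrative structured error payload from a compressed /checkout call.
# One response object carries every issue plus actionable suggestions, so
# the agent never has to reconstruct state across multiple calls.

checkout_result = {
    "status": "rejected",
    "issues": [
        {
            "code": "DISCOUNT_EXPIRED",
            "detail": "The discount code SPRINGSALE has expired.",
            "suggestions": ["SUMMER10", "LOYALTY5"],
        },
        {
            "code": "INSUFFICIENT_INVENTORY",
            "detail": "Inventory is insufficient for item SKU-482.",
            "available_quantities": [2, 3],
        },
    ],
}

def summarize(result):
    """What an agent extracts in a single pass over one response."""
    return [issue["code"] for issue in result["issues"]]

print(summarize(checkout_result))  # ['DISCOUNT_EXPIRED', 'INSUFFICIENT_INVENTORY']
```

Compare this with a fragmented flow, where the same two problems would surface across separate calls and the agent would have to stitch them together in context.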

MCP Is the Delivery Mechanism — Compression Is the Strategy

Another point of confusion arises around MCP and similar tool‑calling protocols. MCP is a transport and exposure layer, not a design philosophy. It standardizes how capabilities are described and invoked by agents. If you expose fifty granular endpoints via MCP, the agent’s toolset becomes cluttered; tool selection becomes a reasoning burden.

Compression is a content strategy. It reduces the surface area by grouping low‑level primitives into higher‑order capabilities that align with outcomes. Instead of providing a bag of loosely connected actions, you provide a smaller set of powerful tools. This makes MCP more efficient: the agent has fewer but more meaningful choices. You are not hiding functionality; you are abstracting it for the agent’s needs.
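The contrast in surface area can be sketched with simplified tool descriptors. These are not the full MCP tool schema, just an illustration of the two surfaces an agent would have to choose from:

```python
# Simplified sketch of two tool surfaces (not the full MCP schema): the
# same capabilities exposed as many granular tools vs. a few compressed ones.

granular_tools = [
    {"name": "create_order"},
    {"name": "add_order_item"},
    {"name": "apply_discount"},
    {"name": "get_inventory"},
    {"name": "complete_order"},
    # ...dozens more primitives, each a separate choice to reason over
]

compressed_tools = [
    {
        "name": "checkout",
        "description": "Place an order end to end: items, discounts, payment.",
        "input_schema": {
            "type": "object",
            "properties": {
                "items": {"type": "array"},
                "discount_code": {"type": "string"},
            },
            "required": ["items"],
        },
    },
]

# Fewer, outcome-shaped tools shrink the agent's selection problem.
print(len(granular_tools), "granular vs", len(compressed_tools), "compressed")
```

The transport (MCP) is identical in both cases; only the content strategy differs.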

Robustness and the Translation Gap

A third challenge emerges around robustness. Some suggest that agent‑specific APIs require entirely new validation models, but the core difference lies in the source of input. Traditional clients are humans or controlled UIs that follow documentation. Agents generate inputs probabilistically based on prompts, which can lead to a translation gap:

  • Passing a string where an enum is expected because it "felt right" in context.
  • Inventing parameters or skipping required ones because the model interpreted natural language loosely.
  • Combining actions in ways that conflict with user constraints expressed earlier in the conversation.

Robust agent APIs must validate both structure and intent. They must confirm not only that an ID exists, but also that the action aligns with constraints provided earlier. Compression aids robustness by simplifying the interface: a single intent yields a single deterministic workflow. The backend code handles the complex steps predictably, and the agent can focus on formulating clear intent rather than orchestrating the steps.
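Dual validation of this kind can be sketched as two layers of checks. The field names, enum values and constraint shape below are illustrative assumptions:

```python
# Sketch of dual validation for agent-generated input: structural checks
# (types, enums, required fields) plus intent checks against constraints
# the user stated earlier. All names and shapes here are illustrative.

VALID_SHIPPING = {"standard", "express"}   # the enum the schema expects

def validate_request(request, session_constraints):
    errors = []
    # Structural validation: agents sometimes pass a string that "felt right"
    if request.get("shipping") not in VALID_SHIPPING:
        errors.append(f"shipping must be one of {sorted(VALID_SHIPPING)}")
    if "items" not in request:
        errors.append("required parameter 'items' is missing")
    # Intent validation: does the action respect earlier user constraints?
    budget = session_constraints.get("max_total")
    if budget is not None and request.get("total", 0) > budget:
        errors.append(f"total {request['total']} exceeds stated budget {budget}")
    return errors

errs = validate_request(
    {"shipping": "overnight", "items": [], "total": 120},
    {"max_total": 100},
)
print(errs)
```

A structurally valid request can still fail the intent check, which is precisely the class of error that traditional schema validation misses.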


Compression vs. Composability: A Spectrum, Not an Either/Or

The debate between compression and composability is sometimes framed as a zero‑sum choice: either you build fine‑grained REST endpoints or you compress everything into atomic operations. In practice, they lie on a spectrum, and the right choice depends on context.

  • Human‑focused workflows benefit from composability. User interfaces often need to create resources, apply partial updates and handle errors gracefully at each step. Composability also supports experimentation and flexibility at the edge, letting teams assemble new flows without changing the backend.
  • Agent‑focused workflows often benefit from compression. When the consumer is an agent with limited context and cost constraints, fewer calls and deterministic orchestration can reduce failure and cost. Deterministic execution is valuable in regulated domains where emergent behaviour and probabilistic decision‑making are unacceptable.
  • Hybrid approaches are possible. It can be pragmatic to expose two surfaces: a granular API for human developers and a compressed, outcome‑centric layer for agents. This dual‑layer approach is similar to the separation between internal and public APIs or between synchronous and asynchronous interfaces. It acknowledges that different consumers have different needs.
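The hybrid option above can be sketched by building the agent-facing layer as a deterministic composition of the same granular primitives humans use, so the two surfaces never diverge. Everything here is an illustrative stub:

```python
# Dual-layer sketch: the agent-facing compressed layer is a deterministic
# composition of the granular primitives, so the two surfaces stay in sync.
# All functions are illustrative stubs, not a real service.

# --- granular layer (human developers) ---
def create_order(state):
    state["order"] = {"items": [], "status": "open"}

def add_item(state, sku, qty):
    state["order"]["items"].append({"sku": sku, "qty": qty})

def complete_order(state):
    state["order"]["status"] = "completed"
    return state["order"]

# --- compressed layer (agents) ---
def checkout(items):
    """One call, orchestrating the granular primitives in backend code."""
    state = {}
    create_order(state)
    for item in items:
        add_item(state, item["sku"], item["qty"])
    return complete_order(state)

order = checkout([{"sku": "SKU-482", "qty": 2}])
print(order["status"])  # completed
```

Human developers keep their breakpoints and step-by-step flows; agents get one outcome-shaped call; the business logic exists once.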

Importantly, model capability improvements don’t negate this trade‑off. Even if future agents can manage state across many calls flawlessly, that doesn’t mean they should handle critical orchestration. Design isn’t only about “what models can do”; it’s about “where complexity should live.” Determinism, observability and governance remain essential regardless of model intelligence. Composability optimizes for human flexibility; compression optimizes for system determinism.

Assuming that future advances in model capability will make orchestrating complexity inside the agent "free" is a dangerous default mindset. We don’t design distributed systems on the expectation that CPUs will get fast enough to excuse algorithmic inefficiency. We don’t design databases on the premise that storage will become cheap enough to excuse lax schema discipline. In the same way, we shouldn’t design API surfaces on the assumption that models will inevitably get smarter and cheaper and can therefore absorb all orchestration complexity. That outcome is not inevitable — it is a design choice with consequences.


Conclusion: Rethinking Responsibilities in the Agentic Era

As APIs and AI agents intersect, we have an opportunity to revisit assumptions about design. Composability was and remains essential for human‑oriented workflows. Compression offers a powerful alternative when the consumer is an agent operating in a token‑constrained, goal‑driven environment. The key is not to pick a side but to evaluate where complexity should reside.

A compressed endpoint doesn’t hide complexity — it relocates orchestration from a probabilistic agent to deterministic backend code. A composable endpoint doesn’t solve every problem — it assumes a human or agent can effectively assemble small parts into a goal. Capability does not imply responsibility. Just because models can orchestrate complex flows doesn’t mean they should, especially when determinism and governance matter.

The challenge for the modern architect is no longer defining resources, but rigorously drawing boundaries around intent: who the consumer is and what constraints the system operates under. Sometimes the right answer is to expose a rich set of composable primitives. Sometimes it’s to compress a workflow into a single, outcome‑driven call. In many cases, it may be both.

The future of the API is not merely a collection of endpoints, but a tiered ecosystem of capabilities. In the agentic era, the most successful systems will be those that resolve the tension between human flexibility and machine efficiency through the deliberate use of compression.