Agents
I’ve been trying to learn more about agents. I still need to code one, because I think that might be the only way I’m really going to understand them.
Here is some reading on agents that I intend to follow up on.
AWS Strands Agents SDK
https://aws.amazon.com/blogs/opensource/introducing-strands-agents-an-open-source-ai-agents-sdk/
This is the framework being used internally at AWS for agentic development. At BMJ we are working on a small POC with AWS right now to explore agentic systems and how they can support editors in assessing papers. I’m very excited about this, and having spoken with folks at ScholarOne, Wiley, Cactus, and others, what we are trying out in this space is close to what others are thinking.
The POC will look at multi-agent systems in the context of paper evaluation, and the Strands framework is going to be its workhorse.
The Strands blog post has a few quotes that I like.
We found that relying on the latest models' capabilities to drive agents significantly reduced our time to market and improved the end user experience, compared to building agents with complex orchestration logic.
And I like this:
You can easily use any Python function as a tool, by simply using the Strands @tool decorator.
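From the examples in the post, the pattern looks something like the sketch below. The imports match the blog’s code; the word_count tool is my own toy, so treat this as illustrative rather than gospel.

```python
# A sketch of the pattern from the Strands blog post. The imports match the
# post's examples; the word_count tool is my own toy.
from strands import Agent, tool

@tool
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())

# The function's name, signature, and docstring become the tool description;
# no schema boilerplate required.
agent = Agent(tools=[word_count])
agent("How many words are in the sentence 'the quick brown fox'?")
```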
And I like this:
An agent interacts with its model and tools in a loop until it completes the task provided by the prompt.
The page says this about the demos:
You will need a GitHub personal access token to run the agent. Set the environment variable GITHUB_TOKEN with the value of your GitHub token. You will also need Bedrock model access for Anthropic Claude 3.7 Sonnet in us-west-2, and AWS credentials configured locally.
At the same time it should be possible to run entirely locally too, but I’m not quite sure how to do that. It seems like an opinionated way of creating and running agents, which is not a bad thing by any means.
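On the running-locally question: from what I can tell, Strands ships model providers beyond Bedrock, including one for Ollama, so a fully local setup might look something like the sketch below. I haven’t actually run this yet, so take the import path and parameters as assumptions on my part.

```python
# Unverified sketch: pointing Strands at a local Ollama server instead of
# Bedrock. Assumes Ollama is running on the default port with a model already
# pulled; the OllamaModel import path is my reading of the docs.
from strands import Agent
from strands.models.ollama import OllamaModel

model = OllamaModel(
    host="http://localhost:11434",  # default Ollama endpoint
    model_id="llama3",              # any locally pulled model
)
agent = Agent(model=model)
agent("Summarise what an agentic loop is, in one sentence.")
```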
You can think of Strands as an interface, but the infrastructure that the agents run on inside AWS is https://aws.amazon.com/bedrock/agentcore/. This infra is touted as being serverless; perhaps it’s hosted on top of S3, DynamoDB, and Step Functions? It can retain state between interactions and has dashboards for monitoring. The monitoring is the part I am most interested in: I think a current challenge for agentic workflows is monitoring for test success, versus monitoring for activity.
So that’s all about how to create agents within the AWS ecosystem, and to be honest, there is a lot of overhead there to get started, compared to just spinning something up very quickly on one’s own local system.
Agent Hello World
The following blog post is the “hello world” example of how to write an agent - https://fly.io/blog/everyone-write-an-agent/
I love this line.
Security for LLMs is complicated and I’m not pretending otherwise. You can trivially build an agent with segregated contexts, each with specific tools. That makes LLM security interesting. But I’m a vulnerability researcher. It’s reasonable to back away slowly from anything I call “interesting”.
This is also very insightful.
You’re allotted a fixed number of tokens in any context window. Each input you feed in, each output you save, each tool you describe, and each tool output eats tokens (that is: takes up space in the array of strings you keep to pretend you’re having a conversation with a stateless black box). Past a threshold, the whole system begins getting nondeterministically stupider.
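That nondeterministic degradation is why even toy agents end up carrying some crude context management. Here is a naive sketch of the idea, with a folklore four-characters-per-token estimate standing in for a real tokenizer:

```python
# Naive context management: drop the oldest turns once the transcript nears
# the budget. Real systems use an actual tokenizer plus smarter strategies
# (summarisation, truncating tool output); this just shows the shape of it.
def estimate_tokens(message: dict) -> int:
    return len(message["content"]) // 4  # rough rule of thumb, not a tokenizer

def trim_history(messages: list[dict], budget: int = 8000) -> list[dict]:
    system, rest = messages[0], messages[1:]  # always keep the system prompt
    while rest and estimate_tokens(system) + sum(map(estimate_tokens, rest)) > budget:
        rest.pop(0)  # forget the oldest exchange first
    return [system] + rest
```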
And I love this line:
To do that, we remembered everything we said, and everything the LLM said back, and played it back with every LLM call. The LLM itself is a stateless black box. The conversation we’re having is an illusion we cast on ourselves.
That last comment is particularly key. We believe that we are interacting with an entity, but we are actually interacting with a stateless system. It’s so, so weird. It’s like substance dualism, the conformance theory, the great Houdini bait and switch, and yet maybe that’s how our brains are wired too?
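To make the illusion concrete, here is the whole trick in miniature. The call_model function below is a stand-in for whichever chat-completions client you use (I’m not pretending to know your wire format, and the message shapes here are simplified), but the replay-everything loop is the actual pattern the fly.io post walks through:

```python
import json

# Stub: swap in a real chat-completions client (OpenAI, Bedrock, Ollama...).
# The crucial point: it receives the FULL transcript on every call, because
# the model itself remembers nothing between calls.
def call_model(messages: list[dict], tools: dict) -> dict:
    raise NotImplementedError("wire up your chat client here")

def agent_loop(task: str, tools: dict) -> str:
    messages = [
        {"role": "system",
         "content": "You can call these tools: " + ", ".join(tools)},
        {"role": "user", "content": task},
    ]
    while True:
        reply = call_model(messages, tools)  # replay everything, every time
        messages.append(reply)               # ...and remember what came back
        tool_call = reply.get("tool_call")
        if tool_call:                        # model wants a tool: run it,
            result = tools[tool_call["name"]](**tool_call["args"])
            messages.append({"role": "tool", "content": json.dumps(result)})
            continue                         # append the result, loop again
        return reply["content"]              # no tool call: we're done
```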
I think this post is a must-read for anyone interested in agents. I really do.
Agents in practice
So those are two ways of thinking about building and deploying agentic systems. That’s all very well and good, but what do we need to make agents work inside an organisation?
This post - https://mmc.vc/research/state-of-agentic-ai-founders-edition/ - looks at that question, and I am utterly unsurprised that its headline finding is the following:
The biggest challenges founders encounter when they are deploying AI Agents in production environments are actually _not_ of the technical variety; instead they are:
Workflow integration and the human-agent interface (60% of startups)
Employee resistance and other non-technical factors (50%)
Data privacy and security (50%)
Also interesting:
As the ecosystem is in such nascent stages, most (52%) startups are building their agentic infrastructure fully or predominantly in-house.
But why use AI agents at all? Why not RPA (Robotic Process Automation) or other traditional forms of automation? Because AI agents are better suited to complex, dynamic, and unstructured tasks that require cognitive ability, reasoning, and adaptability, unlike RPA, which follows rigid, pre-defined rules.
This quote also rings true:
“AI proliferation creates selling friction. Every incumbent provider promises AI enabled point solutions now, which are often initially attractive to customers as it is covered by a committed budget. But this results in a fragmented AI strategy and very often fails to bring the latest innovation; not all AI is equal.”
This isn’t a new problem; we’ve always had these issues with enterprise software. But here’s a fun fact for you – 42% of enterprises need access to eight or more data sources to deploy AI agents successfully. It’s not as much fun when you’re working through it all: legacy tech stacks don’t always have an API, documentation is lacking, customers rely on a variety of super-walled archaic applications that keep the company knowledge blocked, so data is siloed and distributed… and the list goes on.
The most successful deployment strategies we’ve seen:
- started with simple and specific use cases with clear value drivers, that were low risk yet medium impact;
- weren’t majorly disruptive to existing workflows;
- preferably automated a task that the human user dislikes (or that was outsourced);
- produced output that can be easily and quickly verified by a human for accuracy or suitability; and
- demonstrated clear ROI quickly.
Forward Deployed Engineers (FDEs) driving adoption: an FDE is a software engineer who works directly with customers, often embedded within their teams, to solve complex, real-world problems – a hybrid role that is part software developer, part consultant, and part product manager.
So to conclude:
- Agents are not all that mysterious: you hook LLM calls together in a loop and stuff the LLM with context. The weird thing is that once you do the context stuffing, whatever gradient descent has built into the LLM takes over, and the model actually invokes the tools its context describes.
- MCP is little more than a standardised way to stuff context, and you can do that directly by hand-coding the agentic loop yourself; there’s a sketch of the idea at the end of this post.
- I’m not going to feel really comfortable until I’ve seen a few examples of the kind of management-plane reporting that is available in agentic systems, and it seems that this is early enough days that most enterprises using agents well are rolling a lot of their own infrastructure.
- Business and data challenges remain the key things holding back the use of these tools.
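On that second bullet: stuffing context directly really is as unglamorous as it sounds. You serialise your tool descriptions into the system prompt yourself, which is a large part of what an MCP server standardises on the wire. A hypothetical sketch (the schema shape is made up for illustration; real clients each expect their own):

```python
import inspect
import json

# Hand-rolled tool advertisement: roughly what MCP standardises, done inline.
def describe_tool(fn) -> dict:
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),
        "parameters": list(inspect.signature(fn).parameters),
    }

def lookup_doi(doi: str) -> str:
    """Fetch citation metadata for a DOI."""  # hypothetical example tool
    return f"metadata for {doi}"

# The "protocol" is just text in the system prompt.
system_prompt = ("You may call the following tools by replying with JSON:\n"
                 + json.dumps([describe_tool(lookup_doi)], indent=2))
print(system_prompt)
```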