(Updated on 11/15/24)
As artificial intelligence continues to evolve, AI agents will take on increasingly complex tasks. Many agent frameworks already exist (such as AutoGen, LangGraph, and CrewAI), numerous papers have been written, and there have been some successful proofs of concept, yet agents have been challenging to move into production with current models. However, agents will soon be deployed across various devices (in retail, labs, banking), automating tasks or enabling things that were previously thought impossible.
The open-source community has been developing agents using LLMs for the past few years, but it was well known that major AI labs were preparing to jump into the agent space. They have been discussing this at length, with companies like Anthropic and OpenAI making it clear that this is a major focus for them. Their upcoming releases, along with contributions from the wider community, should make 2025 the year of AI agents, with many capable of working together beyond just computers connected to the internet.
And just this week, Anthropic announced a version of their AI model, Claude, capable of computer use: translating instructions, checking spreadsheets, scrolling, moving cursors, and executing a series of actions to achieve objectives. Claude does this by analyzing screenshots, counting pixels to determine cursor movements, and clicking in the correct places. The process is innovative, but it underscores a significant inefficiency in how AI interacts with software designed for humans.
This approach requires AI agents to mimic human interactions, essentially teaching them to navigate interfaces built for human senses and motor skills. It’s akin to asking a robot to use a screwdriver designed for human hands instead of giving it a tool tailored to its mechanical capabilities.
So this raises the question: Why are we making AI conform to human-centric software interfaces when we could design software specifically for AI agents?
We need to focus on creating software that is data- and task-centric rather than human-UI-centric. The AI does not care about its user experience. So the user experience needs to shift toward helping the human user express their objectives, helping the user guide or correct the AI, and displaying the outcome of the interactions.
The Inefficiency of Mimicking Human Interactions
Training AI agents to interact with software via graphical user interfaces (GUIs), as Anthropic has done, involves complex image recognition, pixel counting, and simulated mouse movements. This not only consumes computational resources but also introduces potential errors. A slight change in the UI layout or an unexpected pop-up can confuse the AI, leading to failures in task execution.
Consider a scenario where an AI assistant needs to update a spreadsheet. Teaching it to navigate menus, click on cells, and input data as a human would is cumbersome. Instead, if the spreadsheet software provided an API for data manipulation, the AI could perform the task more efficiently and reliably.
Building Software for AI Agents
To overcome these inefficiencies, we should shift towards designing software that AI agents can interact with directly. This means extending existing applications or creating new ones with machine-readable interfaces that AI can access without relying on a GUI.
It doesn't have to be REST APIs running locally; it could also be locally installed command-line interfaces (CLIs). Still, I think REST's standardization and wide use make it a sensible structure. Conceivably, there could be a server running in the background, and software designed for AI would be registered with the OS on installation and run on that server. Agents could then easily see what functionality they have access to in order to accomplish a user's objective, through internal REST calls to local software as well as external calls to other APIs. The agents would then have local tools as well as external internet tools. This could be rolled out in various ways, so that software built to this standard could coexist with traditional software (using the "agent computer use" that Anthropic just released), or software could be released in both versions: the traditional software and software designed for AI.
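As a rough illustration of the registration idea, here is a minimal Python sketch of an in-memory registry that installed applications add their operations to and that an agent can search. Everything in it is hypothetical: the Capability fields, the CapabilityRegistry class, the application names, and the paths are assumptions made for the example, not part of any existing OS or standard.

```python
from dataclasses import dataclass, field


@dataclass
class Capability:
    """One operation an installed application exposes to agents (hypothetical schema)."""
    app: str
    name: str
    method: str
    path: str
    description: str


@dataclass
class CapabilityRegistry:
    """In-memory stand-in for an OS-level registry of agent-ready software."""
    capabilities: list[Capability] = field(default_factory=list)

    def register(self, capability: Capability) -> None:
        # In this sketch, an application's installer would call this once per operation.
        self.capabilities.append(capability)

    def search(self, keyword: str) -> list[Capability]:
        # Naive case-insensitive keyword match on the description.
        needle = keyword.lower()
        return [c for c in self.capabilities if needle in c.description.lower()]


registry = CapabilityRegistry()
registry.register(Capability("sheets", "read_range", "GET", "/sheets/{id}/range",
                             "Read cell values from a spreadsheet range"))
registry.register(Capability("sheets", "write_range", "PUT", "/sheets/{id}/range",
                             "Write cell values to a spreadsheet range"))
registry.register(Capability("mail", "send", "POST", "/mail/messages",
                             "Send an email message"))

# An agent planning a spreadsheet task can discover the relevant local tools.
matches = registry.search("spreadsheet")
```

A real registry would probably carry richer metadata (parameter schemas, permissions), but even this flat list shows how an agent could find local tools without looking at a screen.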
As I've stated, it doesn't have to be a locally running background server with API endpoints; it could be some other kind of implementation. But there are plenty of examples of software that can already be driven programmatically on a local machine. For example, locally running instances of GitLab and Jenkins are accessible through their REST APIs, and a local PostgreSQL server accepts queries over its client/server protocol. Another good example is Home Assistant. You can use the Home Assistant API for home automation, controlling lights, thermostats, and other home devices.
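To make the Home Assistant case concrete, the sketch below builds (but does not send) a request to Home Assistant's REST endpoint for calling a service. It assumes a local instance on the default port 8123; the token and the entity ID light.living_room are placeholders, and build_service_call is a helper name invented for this example.

```python
import json
import urllib.request

# Assumptions: Home Assistant is reachable locally on its default port, and
# both the token and the entity ID used below are placeholders.
BASE_URL = "http://localhost:8123"
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"


def build_service_call(domain: str, service: str, entity_id: str) -> urllib.request.Request:
    """Build (without sending) a POST to /api/services/<domain>/<service>."""
    body = json.dumps({"entity_id": entity_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/services/{domain}/{service}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
    )


request = build_service_call("light", "turn_on", "light.living_room")
# To actually send it against a running instance: urllib.request.urlopen(request)
```

The point is that the agent's whole "interaction" with the light is one structured request, with no screen to read and no cursor to move.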
Example: An AI-Driven Spreadsheet
Here's a specific example of a productivity use case, assuming the application could be controlled through API endpoints (Microsoft Excel already has a REST API through Microsoft Graph).
Imagine a spreadsheet application that offers a comprehensive API for data manipulation. An AI agent running locally on a machine could:
- Read Data: Retrieve cell values, ranges, and metadata directly.
- Write Data: Update cells, add formulas, and insert data without GUI interaction.
- Analyze: Perform computations, generate charts, and identify trends through API calls.
The AI wouldn’t need to “see” the spreadsheet; it would understand its structure and content inherently, leading to faster and more accurate task completion. The AI doesn't need to "experience" the interface. It just needs efficient access to software and data.
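The read, write, and analyze operations listed above might look something like the following Python sketch. SheetAPI is a hypothetical in-memory stand-in, not the API of Excel or any real spreadsheet product, and the sales figures are made up for illustration.

```python
from statistics import mean


class SheetAPI:
    """Hypothetical data-level interface to a spreadsheet; cells are keyed like "B2"."""

    def __init__(self) -> None:
        self._cells: dict[str, float] = {}

    def write(self, cell: str, value: float) -> None:
        # Write Data: update a cell without any GUI interaction.
        self._cells[cell] = value

    def read_column(self, column: str) -> list[float]:
        # Read Data: return one column's values ordered by row number.
        rows = sorted(
            (int(ref[len(column):]), value)
            for ref, value in self._cells.items()
            if ref.startswith(column) and ref[len(column):].isdigit()
        )
        return [value for _, value in rows]

    def average(self, column: str) -> float:
        # Analyze: compute a summary statistic over a column.
        return mean(self.read_column(column))


sheet = SheetAPI()
for row, sales in enumerate([120.0, 135.0, 150.0], start=1):
    sheet.write(f"B{row}", sales)

monthly_sales = sheet.read_column("B")
average_sales = sheet.average("B")
```

Nothing here depends on where a cell is drawn on screen, so a layout change or a pop-up cannot break the task.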
Transforming Productivity Software
In this paradigm, productivity software undergoes a significant transformation.
Example: Code Development
Instead of manually writing code, developers could specify functionality and constraints:
- Intent: “Create a function that sorts a list of customer names alphabetically and removes duplicates.”
- AI Agent’s Role: The AI generates the code, tests it, and presents it for review. The agent is capable of looking over the entire code base and design documents.
- User Interaction: The developer reviews the code changes, provides feedback, accepts changes, and iterates as necessary.
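For the intent quoted above, the code an agent presents for review might resemble this short Python function. It is only one plausible output: the function name and the choice to sort case-insensitively while treating only exact string matches as duplicates are assumptions, which is exactly the kind of detail the developer would confirm or correct during review.

```python
def sorted_unique_names(names: list[str]) -> list[str]:
    """Return customer names sorted alphabetically with duplicates removed."""
    # Assumption: only exact-string duplicates are removed, and sorting
    # ignores case (ties fall back to the original spelling).
    return sorted(set(names), key=lambda name: (name.casefold(), name))


customers = ["Dana", "alex", "Chris", "Dana", "Blake"]
result = sorted_unique_names(customers)
```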
Example: Research and Analysis
Researchers could leverage AI agents to gather information and synthesize insights:
- Intent: “Summarize the latest research on renewable energy storage solutions and prepare a presentation.”
- AI Agent’s Role: The AI collects data from reputable sources, analyzes trends, and generates a presentation. These sources could be local sources or local functionality in other programs as well as internet sources.
- User Interaction: The researcher reviews the content, adjusts focus areas, and finalizes the presentation.
While productivity software would see dramatic changes, entertainment software might remain largely unaffected. Games and media content are inherently human-centric experiences designed for enjoyment and engagement. However, AI could enhance these experiences by personalizing content or managing in-game assets based on user preferences.
An Operating System Designed for AI Agents
Taking this concept further, envision an operating system (OS) specifically designed for AI agents. AI legend Andrej Karpathy has proposed an AI-based operating system centered around large language models (LLMs) equipped with tools. I'm proposing a step beyond that: an OS that is explicitly AI-centric and agent-centric. This OS wouldn’t just be a platform for running applications but a dynamic environment where AI agents can perform complex operations seamlessly.
While languages like Python and others already allow direct execution of OS commands, an API layer for operating system tasks could offer distinct advantages. By exposing OS functionalities—file management, network communication, process control, and more—through standardized RESTful APIs, such a system would provide language-agnostic access and simplify integration for AI agents built with diverse tools and frameworks.
This AI-centric OS would be more than a platform for executing applications; it would be a dynamic, modular environment tailored for agent-based interactions. By introducing an API layer, the OS could ensure consistent and secure access to its capabilities while abstracting the complexity of direct command execution. AI agents could leverage these APIs to interact with the OS in a predictable, scalable, and maintainable way, unlocking a new level of efficiency.
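As a small sketch of what such an API layer could look like for file management, the Python class below maps REST-style requests onto file operations and confines them to a single root directory. FileAPI, the /files/ route, and the status codes chosen are all assumptions for illustration; no HTTP server is started, and a real OS-level design would need far more (authentication, quotas, auditing).

```python
import tempfile
from pathlib import Path


class FileAPI:
    """Hypothetical REST-style facade over file management, confined to one root."""

    def __init__(self, root: Path) -> None:
        self.root = root.resolve()

    def _resolve(self, relative: str) -> Path:
        # Reject paths that escape the root, so agents get scoped access.
        target = (self.root / relative).resolve()
        if not target.is_relative_to(self.root):
            raise PermissionError(f"{relative!r} is outside the allowed root")
        return target

    def handle(self, method: str, path: str, body: str = "") -> dict:
        # Route "METHOD /files/<relative path>" to a file operation.
        if not path.startswith("/files/"):
            return {"status": 404}
        if method not in {"GET", "PUT", "DELETE"}:
            return {"status": 405}
        try:
            target = self._resolve(path[len("/files/"):])
        except PermissionError as error:
            return {"status": 403, "error": str(error)}
        if method == "PUT":
            target.write_text(body)
            return {"status": 201}
        if not target.is_file():
            return {"status": 404}
        if method == "GET":
            return {"status": 200, "body": target.read_text()}
        target.unlink()  # DELETE
        return {"status": 204}


with tempfile.TemporaryDirectory() as workspace:
    api = FileAPI(Path(workspace))
    created = api.handle("PUT", "/files/notes.txt", "quarterly summary")
    fetched = api.handle("GET", "/files/notes.txt")
    escaped = api.handle("GET", "/files/../outside.txt")
```

The path check is where the "consistent and secure access" argument becomes tangible: the agent never receives a raw shell, only the operations the layer chooses to expose.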
User Experience in an AI-Centric OS
For human users, interacting with such an OS would be fundamentally different:
- Expressing Intent: Users would convey their objectives verbally or via text. For example, “Create an analysis of the last three months’ advertising effectiveness - make sure to take into account any competitive trends or exogenous variables.”
- AI Execution: AI agents interpret these intents and execute tasks using the OS’s APIs.
- Feedback and Control: Users receive updates on task progress and can intervene or adjust objectives as needed.
- Output Consumption: Once tasks are completed, users engage with the results - organized files, generated reports, or synthesized research findings.
So instead of simply equipping large language models (LLMs) with tools, the OS itself becomes a tool-rich environment specifically designed for agents. This architecture transforms the relationship between software, operating systems, and AI, creating a seamless bridge where agents can efficiently perform complex tasks while remaining secure, scalable, and adaptable to future advancements.
The Human-AI Interface: A New UI Paradigm
The user interface in this AI-centric world shifts from direct manipulation to intent expression and result consumption. Human interaction focuses on expressing goals or objectives, which the AI interprets and executes. Importantly, achieving a single human objective often requires multiple interdependent tasks, such as performing data analysis in Excel, creating a presentation in PowerPoint, and sharing the final output via email or communication platforms like Teams or Slack. However, AI agents are not constrained by the need to open or interact with these applications as humans do. In many cases, the AI doesn’t even require the software to be installed. Instead, it performs the necessary actions behind the scenes, generating the final outputs (such as .xlsx, .csv, .pptx, or PDF files) directly. By bypassing traditional application workflows, AI agents streamline the process, delivering results efficiently without the overhead of navigating human-centric software interfaces.
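Producing an output file without the application is easy to demonstrate for the simplest of those formats. The Python sketch below writes CSV text using only the standard library, with no spreadsheet program involved. The column names and figures are made up for the example, and to_csv is a hypothetical helper.

```python
import csv
import io

# Illustrative data only; the figures are invented for this example.
rows = [
    {"month": "Jan", "ad_spend": 1200, "revenue": 4800},
    {"month": "Feb", "ad_spend": 1500, "revenue": 5250},
]


def to_csv(records: list[dict]) -> str:
    """Serialize records to CSV text, adding a derived return-on-spend column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["month", "ad_spend", "revenue", "return_on_spend"],
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        ratio = record["revenue"] / record["ad_spend"]
        writer.writerow({**record, "return_on_spend": ratio})
    return buffer.getvalue()


report = to_csv(rows)
```

Richer formats such as .xlsx or .pptx would need dedicated libraries, but the principle is the same: the agent emits the file, and the human opens it only to consume the result.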
The actual human interface could be reduced to a few elements:
- Command/Voice Interface: A simple input field or voice interface where users state their objectives.
- Progress Feedback: Dashboards or notifications that keep users informed about task status.
- Result Display: Outputs are presented in human-friendly formats - documents, visualizations, or actionable summaries.
Benefits of Designing Software for AI Agents
- Efficiency: Eliminates the overhead of GUI navigation by AI agents.
- Reliability: Reduces errors caused by UI changes or unexpected elements.
- Scalability: AI agents can perform tasks faster and handle larger volumes of data.
- User Empowerment: Users focus on defining goals rather than executing steps, enhancing productivity.
Embracing the Paradigm Shift
As AI becomes more integrated into our daily workflows, rethinking software design to accommodate AI agents is not just logical - it’s inevitable. This doesn't need to happen all at once. At first, there could be software with dual uses: human interfaces and AI data/functionality access. But by building software in which AI communicates directly with data resources and functionality through APIs, together with an OS designed for AI use, we unlock the full potential of intelligent agents, streamline processes, and create a more efficient partnership between humans and AI agents.
It’s time to move beyond teaching AI to use our tools and start building software and an operating system designed for AI.