There’s a joke about London buses: you wait for ages for one to come along and suddenly three arrive together. (It is supposed to be a joke but it’s also true.)

It’s beginning to feel like that with agent orchestration frameworks. We have solutions from the usual suspects, of course, like LangChain and LlamaIndex, but new products are coming on the market with alarming frequency: CrewAI, MotleyCrew, Autogen, and Atomic Agents are some that I’ve encountered recently but there are more frameworks for which I haven’t even found time to look at the README on GitHub.

Now, from one of the biggest players, OpenAI, comes Swarm. And because it’s from OpenAI, I thought I should take a closer look at it. Swarm takes a step back from the sophistication of other products. Rather than present itself as an all-singing, all-dancing solution, it is an experimental, educational framework developed by OpenAI for exploring lightweight multi-agent orchestration. The aim, it seems, is for you to learn how tool-using multi-agent systems work rather than provide a production-ready solution like the products mentioned above.

Its main purpose is to demonstrate the notions of handoffs and routines that are detailed in OpenAI’s “Orchestrating Agents: Handoffs & Routines” cookbook. These ideas are fundamental to the construction of multi-agent solutions:

  • Routines can be thought of as a set of instructions (which in the context of AI agents, can be represented by a system prompt), the agent that encompasses it, and the tools available to the agent. That may sound like quite a lot of stuff but, in Swarm, these are easily coded.

  • Handoffs are the transfer of control from one agent to another - just like when you phone the bank, the person answering the phone may pass you on to someone more expert in your particular interests. In Swarm different agents will perform different tasks, but, unlike the real world, with Swarm, the new agent has a record of your previous conversations. Handoffs are key to multi-agent systems.
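To make the two ideas concrete before we touch Swarm itself, here is a minimal sketch in plain Python. It is not Swarm's implementation: the SketchAgent class, the bank-themed agents and the transfer_to_specialist function are all hypothetical names invented for illustration, and the LLM's decision to hand off is replaced by a direct function call.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SketchAgent:
    """Hypothetical stand-in for an agent: a name, instructions and tools."""
    name: str
    instructions: str  # plays the part of the system prompt
    functions: list[Callable] = field(default_factory=list)  # the tools


specialist = SketchAgent(
    name="Mortgage Specialist",
    instructions="You answer questions about mortgages.",
)


def transfer_to_specialist() -> SketchAgent:
    """A handoff expressed as a tool: it simply returns the next agent."""
    return specialist


front_desk = SketchAgent(
    name="Front Desk",
    instructions="Greet the caller and route them to the right person.",
    functions=[transfer_to_specialist],
)

# The conversation history is shared, so the new agent sees what came before.
history = [{"role": "user", "content": "I have a question about my mortgage."}]
active_agent = front_desk

# In a real system the LLM decides when to call the tool; here we call it by hand.
result = active_agent.functions[0]()
if isinstance(result, SketchAgent):
    active_agent = result  # control passes on, the history stays the same

print(active_agent.name)
```

The routine is the bundle of instructions and functions held by each agent; the handoff is nothing more than a tool whose return value is another agent.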

If you master these concepts you are well on the way to developing multi-agent solutions to complex problems.

Shortly, we will explore the construction of tool-using multi-agent apps in Swarm. But first, maybe we should ask the question, why would you want a multi-agent system, anyway?

Why multi-agent systems?

We all know LLMs hallucinate. They are, after all, just systems that work out the next probable token and, while we hope those tokens are generated from relevant material, we can’t guarantee that. We could, and sometimes do, end up with an entirely fictional response to an LLM query.

It seems clear that the more complex the task you set an LLM, the more likely it is to hallucinate, so breaking down that complexity into smaller tasks and assigning them to different agents may result in a more resilient system.

Furthermore, with a multi-agent system, we can assign different LLMs to different agents and assign different tools to different agents. So, if we have an LLM that is particularly good at code generation, we can create one agent that uses that, while another agent might be better at, say, interpreting the natural language of the user. Working together the agents could provide a better application-generating solution than a single agent that was more general-purpose.

Another consideration is that smaller, more specialist LLMs might run on smaller, less specialist hardware, allowing local processing for free rather than forking out loads of dosh to large industrial mega-companies (just a thought).

But let’s assume that we do want to create a multi-agent system with Swarm. We’ll start with the basics.

Installation

I generally start any new project by creating a virtual environment. I use Conda, so for this project, from a command prompt, I opened a new directory and created and activated a virtual environment called swarm:

mkdir swarmapps
cd swarmapps
conda create -n swarm python=3.12
conda activate swarm

Swarm requires Python 3.10+ (here I’m using 3.12) and can be installed from GitHub using pip:

pip install git+ssh://git@github.com/openai/swarm.git

or

pip install git+https://github.com/openai/swarm.git

If you want to reproduce my code yourself, you’ll also have to install wikipedia as this is used in a Python tool and you will also need to install jupyter to run the code.

I use VSCode for my programming[1] and use the command code . to invoke the editor in the current directory. If you prefer to use Jupyter, run jupyter notebook or jupyter lab instead. OK, now we’re up and running.

As I used a Jupyter Notebook to write the program code, to follow along you can just pop each piece of code into a new notebook cell (the complete Notebook will also be available on GitHub).

A first Swarm program

Bear in mind that to run standard Swarm code you need an OpenAI account and API key. The key should be set as an environment variable and will be picked up by Swarm. This means that you will be charged if you run the code below. You shouldn’t rack up much of a bill with this project - I’ve spent less than a dollar on it, so far, but you should always make sure that you monitor your expenditure in the OpenAI dashboard.
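Because a missing key only shows up as an error when the first request is made, it can be worth checking for it up front. The sketch below assumes the variable is called OPENAI_API_KEY, which is the name the OpenAI Python client looks for; the helper has_openai_key is a hypothetical name of my own and not part of Swarm.

```python
import os
from typing import Mapping


def has_openai_key(env: Mapping[str, str] = os.environ) -> bool:
    """Return True if OPENAI_API_KEY is present and not blank."""
    return bool(env.get("OPENAI_API_KEY", "").strip())


if not has_openai_key():
    print("OPENAI_API_KEY is not set - set it before creating the client.")
```

Passing the environment in as a parameter keeps the check easy to try out with an ordinary dictionary.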

Before we get to any working code we will need to import Swarm and create a client (as we would with an OpenAI app).

from swarm import Swarm, Agent
client = Swarm()

The next thing is to create a basic agent. We’ll give it a name: ‘Agent’ and it will be in the variable agent (and, yes, you’re right, not a lot of effort went into coming up with those names, and not much for the instructions, either - I wonder if anyone has tried “You are an unhelpful agent”).

agent = Agent(
    name="Agent",
    instructions="You are a helpful agent.",
)

That is pretty easy. The instructions that we give it will become the system prompt.

Now we can use it.

First, we need to create a message list that we can pass to the agent, which contains the user’s initial message and to which the agent will add its response.

We then invoke the function client.run(), passing in the message list and the agent. We record the response in the variable response.

Here’s the code.

messages = [{"role": "user", "content": "What is the capital of Portugal"}]
response = client.run(agent=agent, messages=messages)

print(response.messages[-1]["content"])

This is a simple call to the agent, asking a simple question and, as you would expect, getting a simple answer.

The first message is coded as a dictionary with the "role" set to "user" and the "content" set to the question that we want to ask - in this case, we want to know the capital of Portugal. This message is the only item in the list, messages. When we pass the message list and the agent to the client.run() function, Swarm sends them to the LLM and returns a response whose messages attribute contains the LLM's reply.

In the last line of the code, we print the content of the last message which is:

The capital of Portugal is Lisbon.

So, no problems there but there is one thing that we notice. Swarm is not as sophisticated as other frameworks. We have to hardcode the messages in a list: there are no helper functions as you might find in other frameworks. This, I think, is deliberate. The aim of Swarm is for you to understand what you are doing and not hide the functionality behind too many abstractions.
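If the hardcoding becomes tedious, the message format is simple enough that you can write small helpers yourself. This is only a sketch: user_message and last_content are hypothetical names, not Swarm functions, and the assistant reply is a hand-written stand-in so that the code runs without calling the LLM.

```python
from typing import Optional


def user_message(text: str) -> dict:
    """Wrap plain text in the dictionary format used for a user message."""
    return {"role": "user", "content": text}


def last_content(messages: list[dict]) -> Optional[str]:
    """Return the content of the final message, or None for an empty list."""
    return messages[-1]["content"] if messages else None


messages = [user_message("What is the capital of Portugal")]

# Stand-in for the reply that a call to the LLM would produce.
reply = [{"role": "assistant", "content": "The capital of Portugal is Lisbon."}]

# Keeping the reply in the list lets a follow-up question refer back to it.
messages.extend(reply)
messages.append(user_message("And what is its population?"))

print(last_content(reply))
```

The point is that a conversation is nothing more than a growing list of dictionaries, which is exactly what Swarm leaves in plain view.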

So, that is the basic stuff done. Now we will explore the more interesting aspects of Swarm.

(As a matter of interest, I did try changing the prompt to “You are an unhelpful agent” and it responded with “I cannot assist with that” - fair enough!)

Handoffs

Here is an example of a simple handoff from the Swarm docs[2]. We define two agents: one speaks English and the other speaks Spanish. Additionally, we define a tool function (that returns the Spanish agent) which we append to the English agent.

english_agent = Agent(
    name="English Agent",
    instructions="You only speak English.",
)

spanish_agent = Agent(
    name="Spanish Agent",
    instructions="You only speak Spanish.",
)

def transfer_to_spanish_agent():
    """Transfer spanish speaking users immediately."""
    return spanish_agent

english_agent.functions.append(transfer_to_spanish_agent)

Below is an example of talking to the English agent in English.

messages = [{"role": "user", "content": "Hi. How are you?"}]
response = client.run(agent=english_agent, messages=messages)

print(response.messages[-1]["content"])

And here is the response, which is logical and shows no handoff.

Hello! I'm just a computer program, so I don't have feelings, 
but I'm here to help you. How can I assist you today?

But now let's talk to the English agent in Spanish.

messages = [{"role": "user", "content": "Hola. ¿Como estás?"}]
response = client.run(agent=english_agent, messages=messages)

print(response.messages[-1]["content"])

The response shows a handoff to the Spanish agent and the response is, of course, in Spanish.

{"assistant": "Spanish Agent"}
¡Hola! Estoy bien, gracias. ¿Y tú, cómo estás?

This is a simple example and the response shows that handoffs work. However, it has to be said that this is not particularly useful as the agent will normally respond in the same language as the query without the need to hand off to a specialised agent. We’ll see a more useful example later.

Tools

We’ve seen a simple tool in the handoff above but let’s define a couple more functions that will be used as tools that perform tasks that the agent cannot.

import wikipedia

def wikipedia_lookup(q: str) -> str | None:
    """Look up a query in Wikipedia and return the result"""
    try:
        return wikipedia.page(q).summary
    except Exception:
        # No unambiguous page was found (or the request failed)
        return None

def wikipedia_search(q: str) -> list[str]:
    """Search for a topic in Wikipedia and return the result"""
    return wikipedia.search(q)

Above there are two functions:

  • wikipedia_lookup returns a summary of a Wikipedia page

  • wikipedia_search returns the result of (of course) a Wikipedia search.

Both take a string as their input parameter. wikipedia_lookup returns a string (or None if the lookup fails), while wikipedia_search returns a list of page titles. The docstring for each gives a brief description of the function. The type hints and the docstrings help the agent to know how to use the tools.
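To see why the type hints and docstrings matter, it helps to look at how a function can be turned into the kind of tool description that is sent to the LLM. The following is a simplified sketch of the idea and not Swarm's own code: describe_tool, TYPE_NAMES and example_lookup are hypothetical names, and the output follows the general shape of an OpenAI function-calling tool definition.

```python
import inspect

# Simplified mapping from Python annotations to JSON Schema type names.
TYPE_NAMES = {
    str: "string",
    int: "integer",
    float: "number",
    bool: "boolean",
    list: "array",
    dict: "object",
}


def describe_tool(func) -> dict:
    """Build a tool description from a function's name, docstring and hints."""
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unannotated or unrecognised types fall back to "string".
        properties[name] = {"type": TYPE_NAMES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": inspect.getdoc(func) or "",
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


def example_lookup(q: str) -> str:
    """Look up a query and return the result"""
    return f"result for {q}"


schema = describe_tool(example_lookup)
print(schema["function"]["description"])
```

The name, the docstring and the parameter types are all the model gets to go on when deciding which tool to call, so a vague docstring makes for a vaguely used tool.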

Here are a couple of examples of how the functions can be used. First, the search function:

topics = wikipedia.search('EU capitals')
topics
['European Capital of Culture',
 'List of national capitals',
 'Religion in the European Union',
 'European Union',
 'Vienna Capitals',
 'European Council on Foreign Relations',
 'Crime in London',
 'Crime in Bucharest',
 'Flag of Europe',
 'Ramona Strugariu']

It returns a list of results. Now let’s look up the Wikipedia page for one of the results with the lookup function:

entry = wikipedia_lookup(topics[3])
print(entry)
The European Union (EU) is a supranational political and 
economic union of 27 member states that are located... 

etc.

The response is only a summary of a Wikipedia page but far too long to include here, so I’ve truncated it but you get the idea.

Now let’s see how we use the tools with an agent.

A tool-using agent

Here we define a new agent that incorporates the tools that we used above.

The agent responds to a query by searching for keywords and looking up a suitable entry in Wikipedia. You can see the detailed instructions in the code.

wikiagent = Agent(
    name="Agent",
    instructions="""You are a helpful agent that answers user queries by 
                    finding and analysing information from Wikipedia.
                    You should follow the following sequence:
                      1. Work out what the user is interested in.
                      2. Pick keywords
                      3. Use the lookup tool with the most relevant 
                         keyword
                      4. From the resulting list of results pick the most 
                         relevant to the user query and search for it 
                         using the search tool 
                      5. If you are able provide an answer from that 
                         information, stop and answer, otherwise, start 
                         again from step 2 but with different keywords. 
                  """,
    functions=[wikipedia_lookup, wikipedia_search],
)

Notice that we have added the functions directly to the agent definition rather than appending them as before.

The agent should use lookup and search to answer the query and then present an answer. The aim of the next prompt is to get the agent thinking. It needs to find the largest city in Scotland and then what the population of that city is.

messages = [{"role": "user", "content": "What is the population of the largest city in Scotland"}]

response = client.run(agent=wikiagent, messages=messages)
print(response.messages[-1]["content"])

The response is less than perfect but, as we shall see, it is still a good demonstration of how the tools are used.

Glasgow is the largest city in Scotland by population, but the specific 
population number isn't provided in the summary. 
You might need to check the most recent data from official sources such 
as the National Records of Scotland for the exact number.

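We could probably do a little better by adjusting the user and system prompts. (Steps 3 and 4 of the instructions, for instance, name the tools the wrong way round: it is the search tool that returns a list of results and the lookup tool that retrieves an entry.) But the interesting thing is the process. Here are all the messages: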

Tool call: [{'id': 'call_DkOUR7CXi3Zr3nLvEESaG96N', 
             'function': {'arguments': '{"q":"Largest city in Scotland"}',
             'name': 'wikipedia_lookup'}, 'type': 'function'}]
Content: None
Content: This list of towns and cities in Scotland with a population of more than 15,000 is ordered by population, as defined and compiled by the National Records of Scotland organisation. Glasgow is the largest city in Scotland by population, whilst the capital city, Edinburgh, is the second largest by population and largest by area (although the Aberdeen and Edinburgh local authority areas contain far more undeveloped land and have a lower population density than the council areas of Dundee and Glasgow; these are the only four city-districts in the country). The city of Stirling has the smallest population amongst Scotland's cities, with an estimated population of just over 37,000 inhabitants. In total, Scotland consists of eight cities, with multiple larger towns, the largest town being Paisley. 
The section "Localities" contains a list of basic populated areas ordered by population. The section "Settlements" is a list of populated urban areas, some of which are composed of more than one locality, and which may span across the boundaries of more than one council area.
All localities are either settlements themselves, or contained within larger settlements. As of 2020, there are 656 localities in Scotland, and 514 settlements (i.e. 142 of the localities combine as elements of larger settlements).
Tool call: [{'id': 'call_FQ5FjV3SvB8mShL56sn7fK35', 
             'function': {'arguments': '{"q":"Glasgow population"}', 
             'name': 'wikipedia_search'}, 
             'type': 'function'}]
Content: None
Content: Glasgow
Tool call: [{'id': 'call_dIrT5i9e9YeLK083yCoLPwe0', 
             'function': {'arguments': '{"q":"Glasgow demographics"}', 
             'name': 'wikipedia_search'}, 
             'type': 'function'}]
Content: Glasgow is the largest city in Scotland by population. To find the exact population size, I'll check the details for Glasgow.
Content: Demographics of Glasgow
Tool call: [{'id': 'call_h2i0ckryAaPW16vUnEpJQzW9', 
             'function': {'arguments': '{"q":"Demographics of Glasgow"}',
             'name': 'wikipedia_lookup'}, 
             'type': 'function'}]
Content: None
Content: Glasgow is the most populous city in Scotland and the fourth most populous city in the United Kingdom.

While we haven’t got the answer we were hoping for we can clearly see how the agent uses the different tools in its attempt to work its way to an answer.
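A listing like the one above can be produced with a short loop over the messages. This is a sketch of one way to do it rather than a definitive recipe: format_messages is a hypothetical helper, it assumes each message is a dictionary with a "content" key and, for the assistant's messages, a "tool_calls" key, and the sample data (including the id call_1) is made up so that the code runs without calling the LLM.

```python
def format_messages(messages: list[dict]) -> list[str]:
    """Return printable lines showing tool calls and content for each message."""
    lines = []
    for message in messages:
        # Only the assistant's messages are assumed to carry tool calls.
        if "tool_calls" in message:
            lines.append(f"Tool call: {message['tool_calls']}")
        lines.append(f"Content: {message.get('content')}")
    return lines


# Made-up sample in the same shape as the output shown above.
sample = [
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "call_1",
                "type": "function",
                "function": {
                    "name": "wikipedia_search",
                    "arguments": '{"q":"Glasgow population"}',
                },
            }
        ],
    },
    {"role": "tool", "content": "Glasgow"},
]

for line in format_messages(sample):
    print(line)
```

With a real run you would pass response.messages to the helper instead of the sample list.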

Next, we will look at a workflow that demonstrates handoff between agents (and which is a little more successful).

A simple workflow with one agent handing off to another

Now we will design a system that does work well. It looks up information about a location from Wikipedia and then passes this information on to a Public Relations agent whose job is to produce a short PR briefing on that location.

In addition, we ask the PR agent to explain how it constructed its response from the original Wikipedia material (this is to make sure that the PR agent really is using the data from the original lookup).

Below, we see the definition of two agents: the first looks up a location, and the second performs the PR task. So that the first agent can hand off to the PR agent, we have defined a tool transfer_to_pr_agent and included that in the first agent’s definition.

def transfer_to_pr_agent():
    return pr_agent

agent = Agent(
    name="Agent",
    instructions="""You are a helpful agent that answers user queries by 
                    finding and analysing information from Wikipedia.
                    You will be given a city or other location and you 
                    must retrieve its entry on Wikipedia and then hand 
                    over to the PR Agent.""",
    functions=[wikipedia_lookup, transfer_to_pr_agent],
)

pr_agent = Agent(
    name="PR Agent",
    instructions="""You are an experienced PR copywriter. 
                    Use only the information retrieved from Wikipedia to 
                    write an enthusiastic 100-word summary of the topic 
                    that would be suitable for a promotional campaign.
                    Explain how you used the original material to create 
                    the result""",
)

We use it thus:

messages = [{"role": "user", "content": "Paris"}]

response = client.run(agent=agent, messages=messages)
print(response.messages[-1]["content"])

And the response is an upbeat puff piece for Paris. Following that we have the description of how the PR agent used the Wikipedia text to create its result.

Welcome to Paris, the breathtaking capital of France, 
where romance dances in the air and culture knows no 
bounds! As the heart of finance, fashion, and gastronomy 
since the 17th century, Paris shines as the iconic "City 
of Light," home to 2.1 million welcoming residents and a
treasury of art and history. With world-renowned landmarks
like the Louvre, the majestic Seine, and a vibrant arts 
scene, Paris captivates 12 million visitors in its embrace. 
Experience seamless connections through thriving transport 
networks, indulge in epicurean delights, and feel the 
pulse of creativity and innovation in this unparalleled 
metropolis. Paris awaits your discovery!

---

I crafted this exciting summary by highlighting key 
aspects from the Wikipedia entry on Paris. The content 
emphasized the city's historic importance as a cultural, 
financial, and gastronomical hub while accentuating its 
renowned symbols like the Louvre and the Seine. The 
population and importance of its transport system, 
alongside the notion of Paris being a vibrant visitor 
destination, were weaved into engaging phrases to capture 
the unique essence of Paris as a global icon.

If we look at the complete list of messages we can see the handoff (I’ve truncated some of the output for brevity).

Tool call: [{'id': 'call_5KAWDJeA6vDx0gwPPyFPM0et', 
             'function': {'arguments': '{"q":"Paris"}', 
             'name': 'wikipedia_lookup'}, 
             'type': 'function'}]
Content: None
Content: Paris (French pronunciation: [paʁi] ) is the capital and...

Tool call: [{'id': 'call_ahCllvYoiZR2RXCoTszlHlMO', 
             'function': {'arguments': '{}', 
             'name': 'transfer_to_pr_agent'}, 
             'type': 'function'}]
Content: None
Content: {"assistant": "PR Agent"}
Tool call: None
Content: Welcome to Paris, the breathtaking capital of France...

---

I crafted this exciting summary by highlighting key aspects from...

You can see that this agent system works well. There is the original lookup from Wikipedia followed by a handoff to the PR agent which then produces the result required.
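If you want to confirm a handoff in code rather than by reading the log, you can pull the tool names and handoff targets out of the message list. The sketch below assumes the message shapes shown in the listing above: tool calls are dictionaries with a nested "function" name, and the handoff appears as JSON content of the form {"assistant": "..."}. The helpers tool_call_names and handoff_targets are hypothetical, and the sample data is abridged from the article's output with made-up ids and assumed "role" values.

```python
import json


def tool_call_names(messages: list[dict]) -> list[str]:
    """List the names of the tools called, in the order they were called."""
    names = []
    for message in messages:
        for call in message.get("tool_calls") or []:
            names.append(call["function"]["name"])
    return names


def handoff_targets(messages: list[dict]) -> list[str]:
    """List the agents that were handed off to, based on message content."""
    targets = []
    for message in messages:
        content = message.get("content")
        if not isinstance(content, str):
            continue
        try:
            payload = json.loads(content)
        except json.JSONDecodeError:
            continue  # ordinary text, not a handoff marker
        if isinstance(payload, dict) and "assistant" in payload:
            targets.append(payload["assistant"])
    return targets


# Abridged sample in the shape of the listing above (ids are made up).
sample = [
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_a", "type": "function",
         "function": {"name": "wikipedia_lookup", "arguments": '{"q":"Paris"}'}}]},
    {"role": "tool", "content": "Paris is the capital and..."},
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_b", "type": "function",
         "function": {"name": "transfer_to_pr_agent", "arguments": "{}"}}]},
    {"role": "tool", "content": '{"assistant": "PR Agent"}'},
    {"role": "assistant", "content": "Welcome to Paris...", "tool_calls": None},
]

print(tool_call_names(sample))
print(handoff_targets(sample))
```

A check like this is handy when a workflow has several possible handoffs and you want to know which route a particular run took.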

What more could you ask for? (I can see PR directors rubbing their hands with glee and considering replacing their expensive executives with simple LLM prompts!)

Conclusion

In an attempt to keep things simple, we have used examples that are not particularly functional but demonstrate the processes involved in multi-agent systems. Leading on from that, we learn from this exercise that, unlike programming which is precise and deterministic, prompting an LLM is not and, as with the tool-using example, we do not necessarily get the result that we want from the prompt that we design.

Nevertheless, I think that we can see how the fundamental elements of routines (agents, prompts and tools) and handoffs work in Swarm to enable the construction of a useful multi-agent system.

I hope this has been a useful introduction to Swarm. There is much more to it than would fit into an article like this and I would encourage you to look at the docs on GitHub[2] for more details and examples.



The code for this article can be found in this GitHub repo.

Notes and references

  1. My personal preference is to use VSCode because I can edit different types of file: Notebooks, Python programs and Markdown files, for example. VSCode also allows me to easily push my code to my GitHub repository. I’ve not kept my eye on Jupyter developments but I realise, now, that you can do all this with JupyterLab, too (with the right extensions), but I haven’t explored this as an option yet.
  2. You can find the Swarm documentation on GitHub. Swarm is MIT-licensed.
  3. Disclaimer: I have no commercial connection with any company mentioned in this article.