Build an AI Agent from Scratch

Bhavik Jikadara
AI Agent Insider

Intelligent agents have become increasingly crucial, especially in AI-driven applications. Whether you’re developing chatbots, automated customer service, or complex decision-making systems, understanding how to build an agent that can select and use the right tools autonomously is a valuable skill.

In this blog, we’ll explore how to build an agent from scratch using Python. This agent will be capable of making decisions based on user input, selecting appropriate tools, and executing tasks accordingly. Let’s dive in!

What is an Agent?

An agent is an autonomous entity capable of perceiving its environment, making decisions, and taking actions to achieve specific goals. Agents can vary in complexity from simple reactive agents that respond to stimuli to more advanced intelligent agents that learn and adapt over time. Common types of agents include:

  1. Reactive Agents: Respond directly to environmental changes without internal memory.
  2. Model-Based Agents: Use internal models of the world to make decisions.
  3. Goal-Based Agents: Plan actions based on achieving specific goals.
  4. Utility-Based Agents: Evaluate potential actions based on a utility function to maximize outcomes.

Examples include chatbots, recommendation systems, and autonomous vehicles, each utilizing different types of agents to perform tasks efficiently and intelligently.
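As a toy illustration of the first category, a reactive agent can be written in a few lines. The thermostat below is a hypothetical example for illustration only, not part of the agent we build later: it maps the current percept directly to an action, with no internal memory.

```python
def thermostat_agent(temperature_c):
    """Reactive rule: act on the current reading alone, no state kept."""
    if temperature_c < 18:
        return "heat_on"
    elif temperature_c > 24:
        return "cool_on"
    return "idle"

print(thermostat_agent(15))  # heat_on
print(thermostat_agent(21))  # idle
```

A model-based or goal-based agent would add exactly what this one lacks: stored state and a plan toward a target, rather than a fixed percept-to-action rule.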

The core components of our agent are:

  • Models: The brain of the agent, responsible for processing the input and generating responses.
  • Tools: Pre-defined functions that the agent can execute based on the user’s request.
  • Toolbox: A collection of tools the agent has at its disposal.
  • System Prompt: The instruction set that guides the agent on how to handle user input and choose the right tools.

Implementation

Now, let’s roll up our sleeves and start building!

Prerequisites

Before diving into building an intelligent agent, it's worth getting a few things in place to ensure a smooth development process. Here's what you'll need:

Step 1: Setting Up the Environment

To start, ensure you have Python installed. You'll also need to install some essential libraries. For this tutorial, we'll be using requests and termcolor (json and operator ship with Python's standard library). Additionally, we'll use python-dotenv to manage environment variables.

pip install requests termcolor python-dotenv

Step 2: Defining the Model Class

The first thing we need is a model that will process user input. We’ll create an OllamaModel class, which interacts with a local API to generate responses.

Here’s a basic implementation:

### Models
import os
import json
import operator

import requests
from dotenv import load_dotenv
from termcolor import colored

load_dotenv()
class OllamaModel:
    def __init__(self, model, system_prompt, temperature=0, stop=None):
        """
        Initializes the OllamaModel with the given parameters.

        Parameters:
        model (str): The name of the model to use.
        system_prompt (str): The system prompt to use.
        temperature (float): The temperature setting for the model.
        stop (str): The stop token for the model.
        """
        self.model_endpoint = "http://localhost:11434/api/generate"
        self.temperature = temperature
        self.model = model
        self.system_prompt = system_prompt
        self.headers = {"Content-Type": "application/json"}
        self.stop = stop

    def generate_text(self, prompt):
        """
        Generates a response from the Ollama model based on the provided prompt.

        Parameters:
        prompt (str): The user query to generate a response for.

        Returns:
        dict: The response from the model as a dictionary.
        """
        payload = {
            "model": self.model,
            "format": "json",
            "prompt": prompt,
            "system": self.system_prompt,
            "stream": False,
            # Ollama expects sampling parameters inside an "options" object;
            # "stop" must be a list of strings.
            "options": {
                "temperature": self.temperature,
                "stop": [self.stop] if self.stop else None,
            },
        }

        try:
            request_response = requests.post(
                self.model_endpoint,
                headers=self.headers,
                data=json.dumps(payload)
            )

            print("REQUEST RESPONSE", request_response)
            request_response_json = request_response.json()
            response = request_response_json['response']
            response_dict = json.loads(response)

            print(f"\n\nResponse from Ollama model: {response_dict}")

            return response_dict
        except requests.RequestException as e:
            return {"error": f"Error in invoking model! {str(e)}"}

This class initializes with the model’s name, system prompt, temperature, and stop token. The generate_text method sends a request to the model API and returns the response.
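Because we request "format": "json", the model's reply arrives as a JSON string nested inside the API's own JSON envelope, which is why generate_text decodes twice: once via request_response.json() and once via json.loads on the 'response' field. A quick offline sketch of that double decoding, using a canned envelope in place of a live server:

```python
import json

# Canned API envelope, standing in for requests.post(...).json().
# The 'response' field is itself a JSON string because we asked for JSON format.
request_response_json = {
    "model": "llama3.1",
    "response": '{"tool_choice": "basic_calculator", '
                '"tool_input": "{\\"num1\\": 5, \\"num2\\": 3, \\"operation\\": \\"add\\"}"}',
    "done": True,
}

response = request_response_json['response']   # still a string here
response_dict = json.loads(response)           # now a Python dict

print(response_dict["tool_choice"])  # basic_calculator
```

Note that tool_input is itself yet another JSON string; it stays encoded here and is only parsed later, inside the tool that receives it.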

Step 3: Creating Tools for the Agent

The next step is to create tools that our agents can use. These tools are simple Python functions that perform specific tasks. Here’s an example of a basic calculator and a string reverser:

def basic_calculator(input_str):
    """
    Perform a numeric operation on two numbers based on the input string.

    Parameters:
    input_str (str): A JSON string representing a dictionary with keys 'num1', 'num2', and 'operation'. Example: '{"num1": 5, "num2": 3, "operation": "add"}' or "{'num1': 67869, 'num2': 9030393, 'operation': 'divide'}"

    Returns:
    str: The formatted result of the operation, or an error message if the input is invalid or the operation fails (e.g., division by zero).
    """
    # Clean and parse the input string
    try:
        # Replace single quotes with double quotes so the string is valid JSON
        input_str_clean = input_str.replace("'", "\"")
        # Remove any extraneous characters such as trailing quotes
        input_str_clean = input_str_clean.strip().strip("\"")

        input_dict = json.loads(input_str_clean)
        num1 = input_dict['num1']
        num2 = input_dict['num2']
        operation = input_dict['operation']
    except (json.JSONDecodeError, KeyError) as e:
        return f"Invalid input format: {str(e)}. Please provide a valid JSON string."

    # Define the supported operations
    operations = {
        'add': operator.add,
        'subtract': operator.sub,
        'multiply': operator.mul,
        'divide': operator.truediv,
        'floor_divide': operator.floordiv,
        'modulus': operator.mod,
        'power': operator.pow,
        'lt': operator.lt,
        'le': operator.le,
        'eq': operator.eq,
        'ne': operator.ne,
        'ge': operator.ge,
        'gt': operator.gt
    }

    # Check if the operation is supported
    if operation in operations:
        try:
            # Perform the operation
            result = operations[operation](num1, num2)
            return f"\n\nThe answer is: {result}.\nCalculated with basic_calculator."
        except Exception as e:
            return f"\n\nError during operation execution: {str(e)}"
    else:
        return "\n\nUnsupported operation. Please provide a valid operation."

def reverse_string(input_string):
    """
    Reverse the given string.

    Parameters:
    input_string (str): The string to be reversed.

    Returns:
    str: The reversed string.
    """
    # Reverse the string using slicing
    reversed_string = input_string[::-1]
    return f"The reversed string is: {reversed_string}\n\nExecuted using the reverse_string function."

These functions are designed to perform specific tasks based on the input provided. The basic_calculator handles arithmetic operations, while reverse_string reverses a given string.
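It's worth exercising the tools directly before any model gets involved. The snippet below is a condensed, self-contained replica of the calculator's cleanup-and-dispatch logic (a sketch for sanity-checking the input format, not the full function from above), plus the string-reversal slice:

```python
import json
import operator

OPERATIONS = {'add': operator.add, 'subtract': operator.sub,
              'multiply': operator.mul, 'divide': operator.truediv}

def calc(input_str):
    # Same cleanup as basic_calculator: single quotes -> double quotes,
    # then strip stray whitespace and wrapping quotes before parsing.
    cleaned = input_str.replace("'", "\"").strip().strip("\"")
    d = json.loads(cleaned)
    return OPERATIONS[d['operation']](d['num1'], d['num2'])

print(calc('{"num1": 5, "num2": 3, "operation": "add"}'))      # 8
print(calc("{'num1': 10, 'num2': 4, 'operation': 'divide'}"))  # 2.5
print("hello"[::-1])                                           # olleh
```

The single-quote replacement matters because language models often emit Python-style dict literals rather than strict JSON; the cleanup makes both forms parseable.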

Step 4: Building the Toolbox

The ToolBox class stores all the tools the agent can use and provides descriptions for each:

class ToolBox:
    def __init__(self):
        self.tools_dict = {}

    def store(self, functions_list):
        """
        Stores the literal name and docstring of each function in the list.

        Parameters:
        functions_list (list): List of function objects to store.

        Returns:
        dict: Dictionary with function names as keys and their docstrings as values.
        """
        for func in functions_list:
            self.tools_dict[func.__name__] = func.__doc__
        return self.tools_dict

    def tools(self):
        """
        Returns the dictionary created in store as a text string.

        Returns:
        str: Dictionary of stored functions and their docstrings as a text string.
        """
        tools_str = ""
        for name, doc in self.tools_dict.items():
            tools_str += f"{name}: \"{doc}\"\n"
        return tools_str.strip()

This class will help the agent understand which tools are available and what each one does.
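The key detail is that store reads each function's __name__ and __doc__ attributes, so the docstrings you wrote for the tools double as their descriptions in the system prompt. A self-contained sketch of that introspection (sample_tool is a hypothetical stand-in, not one of our real tools):

```python
def sample_tool(x):
    """Echoes its input back. (Hypothetical tool for illustration.)"""
    return x

tools_dict = {}
for func in [sample_tool]:
    # Python exposes a function's name and docstring as attributes,
    # so no registration boilerplate is needed beyond the docstring itself.
    tools_dict[func.__name__] = func.__doc__

tools_str = "\n".join(f'{name}: "{doc}"' for name, doc in tools_dict.items())
print(tools_str)
```

This is also why well-written docstrings matter here: they are literally the only information the model gets about what each tool does and what input it expects.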

Step 5: Creating the Agent Class

The agent needs to think, decide which tool to use, and execute it. Here’s the Agent class:

agent_system_prompt_template = """
You are an agent with access to a toolbox. Given a user query,
you will determine which tool, if any, is best suited to answer the query.
You will generate the following JSON response:

"tool_choice": "name_of_the_tool",
"tool_input": "inputs_to_the_tool"

tool_choice: The name of the tool you want to use. It must be a tool from your toolbox or "no tool" if you do not need to use a tool.
tool_input: The specific inputs required for the selected tool. If no tool, just provide a response to the query.
Here is a list of your tools along with their descriptions:
{tool_descriptions}

Please make a decision based on the provided user query and the available tools.
"""
class Agent:
    def __init__(self, tools, model_service, model_name, stop=None):
        """
        Initializes the agent with a list of tools and a model.

        Parameters:
        tools (list): List of tool functions.
        model_service (class): The model service class with a generate_text method.
        model_name (str): The name of the model to use.
        stop (str): Optional stop token passed through to the model.
        """
        self.tools = tools
        self.model_service = model_service
        self.model_name = model_name
        self.stop = stop

    def prepare_tools(self):
        """
        Stores the tools in the toolbox and returns their descriptions.

        Returns:
        str: Descriptions of the tools stored in the toolbox.
        """
        toolbox = ToolBox()
        toolbox.store(self.tools)
        tool_descriptions = toolbox.tools()
        return tool_descriptions

    def think(self, prompt):
        """
        Runs the generate_text method on the model using the system prompt template and tool descriptions.

        Parameters:
        prompt (str): The user query to generate a response for.

        Returns:
        dict: The response from the model as a dictionary.
        """
        tool_descriptions = self.prepare_tools()
        agent_system_prompt = agent_system_prompt_template.format(
            tool_descriptions=tool_descriptions
        )

        # Create an instance of the model service with the system prompt
        if self.model_service == OllamaModel:
            model_instance = self.model_service(
                model=self.model_name,
                system_prompt=agent_system_prompt,
                temperature=0,
                stop=self.stop
            )
        else:
            model_instance = self.model_service(
                model=self.model_name,
                system_prompt=agent_system_prompt,
                temperature=0
            )

        # Generate and return the response dictionary
        agent_response_dict = model_instance.generate_text(prompt)
        return agent_response_dict

    def work(self, prompt):
        """
        Parses the dictionary returned from think and executes the appropriate tool.

        Parameters:
        prompt (str): The user query to generate a response for.

        Returns:
        The response from executing the appropriate tool, or the tool_input if no matching tool is found.
        """
        agent_response_dict = self.think(prompt)
        tool_choice = agent_response_dict.get("tool_choice")
        tool_input = agent_response_dict.get("tool_input")

        for tool in self.tools:
            if tool.__name__ == tool_choice:
                response = tool(tool_input)
                print(colored(response, 'cyan'))
                return

        # No matching tool ("no tool"): treat tool_input as the model's direct answer
        print(colored(tool_input, 'cyan'))
        return

This class has three main methods:

  • prepare_tools: Stores and returns the descriptions of the tools.
  • think: Decides which tool to use based on the user prompt.
  • work: Executes the chosen tool and returns the result.
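The dispatch inside work can be checked offline by stubbing out the model. The sketch below hand-builds the dictionary that think would normally return, then runs the same name-matching loop (the response dict here is fabricated for illustration, not real model output):

```python
def reverse_string(s):
    """Toy stand-in for the real reverse_string tool."""
    return s[::-1]

tools = [reverse_string]

# Pretend this dict came back from think() / the model:
agent_response_dict = {"tool_choice": "reverse_string", "tool_input": "hello"}

tool_choice = agent_response_dict.get("tool_choice")
tool_input = agent_response_dict.get("tool_input")

result = None
for tool in tools:
    if tool.__name__ == tool_choice:
        result = tool(tool_input)
        break
else:
    # "no tool" case: the model answered directly in tool_input
    result = tool_input

print(result)  # olleh
```

Matching on __name__ strings is the simplest possible routing scheme; it works because the system prompt showed the model exactly those names via the ToolBox descriptions.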

Step 6: Running the Agent

Finally, let’s put everything together and run our agent. In your script’s main section, initialize the agent and start accepting user input:

# Example usage
if __name__ == "__main__":

    tools = [basic_calculator, reverse_string]

    # Uncomment below to run with OpenAI
    # model_service = OpenAIModel
    # model_name = 'gpt-3.5-turbo'
    # stop = None

    # Uncomment below to run with Ollama
    model_service = OllamaModel
    model_name = "llama3.1"
    stop = "<|eot_id|>"

    agent = Agent(tools=tools, model_service=model_service, model_name=model_name, stop=stop)

    while True:
        prompt = input("Ask me anything: ")
        if prompt.lower() == "exit":
            break

        agent.work(prompt)

Conclusion

Building an agent from scratch is a rewarding experience. It gives you a deep understanding of how intelligent systems can be constructed to make decisions and perform tasks autonomously. By following this guide, you’ve learned how to build a simple yet powerful agent capable of selecting and executing tools based on user input.

For further reading and more advanced implementations, consider exploring Reinforcement Learning (RL) and other AI techniques that can be integrated into your agents.

Don’t hesitate to comment if you have any questions. Your feedback is greatly appreciated and will help me understand the things I am lacking & feel free to reach out or connect with me via my LinkedIn page. Cheers!

Happy coding! 🎉
