Common ChatGPT API Function Call Mistakes and How to Fix Them
Learn how to handle function name hallucinations, chain function calls, and get more accurate function call arguments
The function call feature in OpenAI's ChatGPT APIs is a powerful tool, enabling us to perform some impressive tasks. If you are not familiar with function calls in the ChatGPT APIs, you may want to refer to my previous article on this topic. In that piece, I provided numerous examples of function calls and a comprehensive tutorial on building a Flask-based chat application with ChatGPT-like browsing and code interpreter plugins. Check it out.
This article serves as a follow-up to the previous one, discussing some of the unexpected challenges that can arise when developing with the ChatGPT function call APIs. Summary of what’s covered in this post:
Getting the model to generate more accurate function call arguments
Getting the model to generate a chain of function calls
Handling function and argument name hallucinations
Let’s dive right in!
The code examples shown in this article are available on GitHub under an MIT license.
Quick Overview of Function Calling
Before we get into the potential issues, let's briefly look at an example using the function call feature of ChatGPT APIs. If you are familiar with how function calling works, feel free to skip to the next section.
Suppose we want the model to provide the current weather in London, and we pass the function get_current_weather, which takes the location name as a parameter. An example request could look like the following (adapted from the OpenAI docs):
messages = [{"role": "user",
"content": "What's the weather like in Boston?"}]
functions = [
{
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state",
}
},
"required": ["location"],
},
}
]
And the model's response might contain a function call like this:
{
  "index": 0,
  "message": {
    "role": "assistant",
    "content": null,
    "function_call": {
      "name": "get_current_weather",
      "arguments": "{\n  \"location\": \"London\"\n}"
    }
  },
  "finish_reason": "function_call"
}
In response to this function call request from the model, we must send back the result of the call, which goes back as part of the messages. The role should be "function", and we also need to provide the name of the function, as shown below:
import json
from typing import Dict

def get_weather(location: str) -> Dict:
    return {"temperature": 30, "conditions": ["windy", "cloudy"]}

# parse the arguments string from the model's function call
args = json.loads(response["message"]["function_call"]["arguments"])
weather_info = get_weather(args["location"])
messages.append({"role": "function",
                 "content": json.dumps(weather_info),
                 "name": "get_current_weather"})
# send the updated messages back to ChatGPT
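The model then uses this function result to compose a natural-language answer. With the pre-1.0 openai Python SDK, the follow-up call could look like the following sketch (the model name and response handling shown here are just one way to do it):

import openai

follow_up = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=messages,
    functions=functions,
)
print(follow_up["choices"][0]["message"]["content"])
# e.g. "The current weather in London is 30°C, windy and cloudy."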
Now that we've seen an example of using the function call parameter in the chat APIs, let's discuss some of the potential challenges when using this feature.
Treat Function Descriptions as User Prompts
The OpenAI documentation states that the list of functions passed to the API is part of the context provided to the ChatGPT model. That means the descriptions of these functions should be written like prompts, informing the model in detail about the situations in which it can and should call the function. Let's understand this with the help of an example where we provide a Python code interpreter function to the model.
from typing import List, Dict
import openai
import requests
from pprint import pprint

GPT_MODEL = "gpt-3.5-turbo-0613"

SYSTEM_PROMPT = """
You are a helpful AI assistant. You answer the user's queries.
When you are not sure of an answer, you take the help of
functions provided to you.
NEVER make up an answer if you don't know, just respond
with "I don't know" when you don't know.
Ask clarifying questions when you need more information
"""

functions = [
    {
        "name": "python",
        "description": "Executes python code and returns value printed on stdout",
        "parameters": {
            "type": "object",
            "properties": {
                "code": {
                    "type": "string",
                    "description": "Python code which needs to be executed"
                }
            }
        }
    }
]

def _chat_completion_request(messages) -> Dict:
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + openai.api_key,
    }
    body = {
        "model": GPT_MODEL,
        "messages": messages,
        "functions": functions,
        "temperature": 0.7
    }
    response = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers=headers,
        json=body,
    )
    return response.json()["choices"][0]["message"]

messages: List[Dict] = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "what is today's date?"}
]

response = _chat_completion_request(messages)
if response.get('function_call'):
    pprint(response.get('function_call'))
else:
    pprint(response)
We have hard-coded a user message asking for today's date. The function's description is very simple: "Executes python code and returns value printed on stdout." Essentially, this means the ChatGPT model should generate Python code that prints its result on stdout in order for this function to return the code's output. Let's see if the model follows the instructions as intended:
{'arguments': '{\n'
              '"code": "import datetime\\n\\ndate = '
              'datetime.date.today()\\ndate"\n'
              '}',
 'name': 'python'}
As you can see, even though the model has generated a call to our function, the code is not in the expected format: it does not print the value of the date variable on stdout, which goes against our expectations.
The Solution
As stated previously, we need to treat the description as part of the prompt instruction provided to the model so that it does not surprise us by doing its own thing. Let's see how we could modify the description for the Python code interpreter function.
functions = [
    {
        "name": "python",
        "description": """
            Read the value printed by the given python code.
            The code SHOULD explicitly call print so that
            this function returns the output.
            For example: "import math; print(math.pi)"
            is correct
            But "import math; math.pi" is incorrect because
            it doesn't print the value
            """,
        "parameters": {
            "type": "object",
            "properties": {
                "code": {
                    "type": "string",
                    "description": "Python code which needs to be executed"
                }
            }
        }
    }
]
This is a much more detailed description. Let’s run the program and see if ChatGPT does the right thing now.
{'arguments': '{\n "code": "import datetime; print(datetime.date.today())"\n}',
 'name': 'python'}
This looks much better: the value is printed on stdout as we asked.
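In case you're wondering how the python function itself might be implemented on our side, here is a minimal sketch (the execute_python name is mine, and bare exec() is for illustration only; untrusted, model-generated code should be sandboxed in any real deployment):

import contextlib
import io

def execute_python(code: str) -> str:
    """Run model-generated code and capture whatever it prints to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # WARNING: exec() on untrusted code is dangerous; sandbox this
        # (container, restricted VM) in anything beyond a toy project.
        exec(code, {})
    return buffer.getvalue()

# e.g. execute_python('import datetime; print(datetime.date.today())')
# returns something like '2023-07-11\n'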
Send Prompt-like Instructions in Function Call Response
The general theme around using the function call feature is that every metadata element should be treated as an instructional prompt. If this isn't done, we risk the model interpreting things incorrectly.
For instance, suppose we have two functions: one performs a web search and returns a list of matching URLs, while the other scrapes the content of a webpage at a given URL. When we ask the model for information about Leonardo DiCaprio's current girlfriend, we expect it to perform a web search and then call the scraper function to extract information from the resulting URLs. However, the model may not proceed as expected without additional hints. Let’s see this in code:
from typing import List, Dict
import openai
import requests
from pprint import pprint
import json

GPT_MODEL = "gpt-3.5-turbo-0613"

SYSTEM_PROMPT = """
You are a helpful AI assistant. You answer the user's queries.
When you are not sure of an answer, you take the help of
functions provided to you.
NEVER make up an answer if you don't know, just respond
with "I don't know" when you don't know.
Ask clarifying questions when you need more information
"""

functions = [
    {
        "name": "web_search",
        "description": "Does a web search and return list of URLs of top 10 pages",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "user query"
                }
            }
        }
    },
    {
        "name": "web_scraper",
        "description": "Scrapes content at the given URL",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    "description": "URL of the web page to be scraped"
                }
            }
        }
    },
]

def _chat_completion_request(messages) -> Dict:
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + openai.api_key,
    }
    body = {
        "model": GPT_MODEL,
        "messages": messages,
        "functions": functions,
        "temperature": 0.7
    }
    response = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers=headers,
        json=body,
    )
    return response.json()["choices"][0]["message"]

messages: List[Dict] = [
    {"role": "system",
     "content": SYSTEM_PROMPT
    },
    {"role": "user",
     "content": "who is Leonardo Dicaprio's current girlfriend?"
    }
]

response = _chat_completion_request(messages)
if response.get('function_call'):
    func_call = response.get('function_call')
    func_name = func_call.get('name')
    if func_name != 'web_search':
        raise Exception(f'Unsupported function name {func_name}')
    print('Executing web search')
    websearch_response = {'urls': ['https://hollywood.com/leonardo']}
    messages.append({'role': 'function',
                     'content': json.dumps(websearch_response),
                     'name': 'web_search'})
    next_response = _chat_completion_request(messages)
    pprint(next_response)
else:
    pprint(response)
Most of the boilerplate is the same as the example from the previous section; the changes are in the function list, the user message, and the function call handling. Here's an explanation of the changes:
We have modified the functions. We now pass two functions to ChatGPT. One is responsible for performing a web search, which returns a list of matching URLs, and the other is for scraping the content of a web page given a URL.
We have also updated the user message. In this example, we are asking the model for information about Leonardo DiCaprio's current girlfriend. Since we are seeking current information, we expect the model to perform a web search.
After receiving a response from ChatGPT, we check if it triggered a function call. If it did, we generate a hypothetical result in the form of a URL. We add this as another message with the role set to "function" and make another call to ChatGPT. Our expectation is that ChatGPT, upon receiving these URLs, will make another function call to scrape the content from those URLs. This is necessary to obtain the actual text and enable the model to figure out the answer to the question.
Let's run the code and see whether the model generates a call to the scraper function.
{'content': "I'm sorry, but I couldn't find any information about Leonardo DiCaprio's current girlfriend.",
'role': 'assistant'}
Well, it didn’t go as we expected it to. Looks like we need to hint ChatGPT into calling the scraper function.
The Solution
Again, the fix is relatively simple. We have to add more instructions in our function call response so that ChatGPT knows that it needs to call another function. Let’s see the code:
messages: List[Dict] = [
    {"role": "system",
     "content": SYSTEM_PROMPT
    },
    {"role": "user",
     "content": "who is Leonardo Dicaprio's current girlfriend?"
    }
]

response = _chat_completion_request(messages)
if response.get('function_call'):
    func_call = response.get('function_call')
    func_name = func_call.get('name')
    if func_name != 'web_search':
        raise Exception(f'Unsupported function name {func_name}')
    websearch_response = {'urls':
                              ['https://hollywood.com/leonardo'],
                          'message': 'Scrape these URLs to get the text'
                         }
    messages.append({'role': 'function',
                     'content': json.dumps(websearch_response),
                     'name': 'web_search'})
    next_response = _chat_completion_request(messages)
    pprint(next_response)
else:
    pprint(response)
The change is in the web search function's result: we have simply added a message saying that the URLs need to be scraped. Running the code now shows that the model generates a second function call, this time to the scraper function.
{'content': None,
 'function_call': {'arguments': '{\n"url": "https://hollywood.com/leonardo"\n}',
                   'name': 'web_scraper'},
 'role': 'assistant'}
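If you find yourself wiring up several such hops, it can help to generalize this into a loop that keeps satisfying function calls until the model produces a final answer. Here is a rough sketch under the same setup (FUNCTION_REGISTRY and run_until_answer are illustrative names of mine; _chat_completion_request is the helper defined earlier):

import json

# Hypothetical dispatch table mapping function names to local implementations.
FUNCTION_REGISTRY = {
    "web_search": lambda args: {"urls": ["https://hollywood.com/leonardo"],
                                "message": "Scrape these URLs to get the text"},
    "web_scraper": lambda args: {"text": "(scraped page text goes here)"},
}

def run_until_answer(messages, max_steps: int = 5):
    """Keep satisfying function calls until the model returns plain content."""
    for _ in range(max_steps):
        response = _chat_completion_request(messages)
        func_call = response.get("function_call")
        if not func_call:
            return response  # final answer from the model
        messages.append(response)  # keep the assistant's function_call in history
        name = func_call["name"]
        args = json.loads(func_call["arguments"])
        result = FUNCTION_REGISTRY[name](args)
        messages.append({"role": "function",
                         "content": json.dumps(result),
                         "name": name})
    raise RuntimeError("Model kept calling functions; giving up")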
Handle Function and Argument Name Hallucinations
One issue you might encounter with the GPT-3.5 version of the chat model is hallucinating function names, and sometimes argument names, when generating a function call. This problem can be more pronounced when providing a larger list of functions. We will take the Python code interpreter function example again from the first section. Here is the complete code once more:
functions = [
    {
        "name": "web_search",
        "description": "Does a web search and return list of URLs of top 10 pages",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "user query"
                }
            }
        }
    },
    {
        "name": "web_scraper",
        "description": "Scrapes content at the given URL",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    "description": "URL of the web page to be scraped"
                }
            }
        }
    },
    {
        "name": "python_interpreter",
        "description": "Executes python code and returns value printed on stdout",
        "parameters": {
            "type": "object",
            "properties": {
                "code": {
                    "type": "string",
                    "description": "Python code which needs to be executed"
                }
            }
        }
    }
]

def _chat_completion_request(messages) -> Dict:
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + openai.api_key,
    }
    body = {
        "model": GPT_MODEL,
        "messages": messages,
        "functions": functions,
        "temperature": 0.7
    }
    response = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers=headers,
        json=body,
    )
    return response.json()["choices"][0]["message"]

messages: List[Dict] = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "what is today's date?"}
]

response = _chat_completion_request(messages)
if response.get('function_call'):
    pprint(response.get('function_call'))
else:
    pprint(response)
I have changed the function name from "python" to "python_interpreter" and added a couple more functions to the list. It appears the model is more likely to get the function name wrong when you pass a bigger list of functions. On running this code, you might see ChatGPT sometimes generate a function call for a function named "python" instead of "python_interpreter". I'm going to give it a try.
{'arguments': 'import datetime\n\ntoday = datetime.date.today()\ntoday',
 'name': 'python'}
It might take several attempts for this to happen; however, in one of my other projects it happens almost every time. Your luck may vary.
You should also notice that the model has messed up the arguments JSON as well. It gave the Python code directly as the value of "arguments", as opposed to giving it as a key-value pair of argument name and value, i.e. "arguments": {"code": "<python code>"}
The Solution
The fix I have for this problem is to guide ChatGPT to retry with the correct function name. There are two possible ways to provide this guidance:
One option is to send another message to ChatGPT with the role "function" and a message asking it to retry with one of the valid function names.
The second option is to send a user message to ChatGPT where we explicitly ask it to retry with one of the valid function names.
I discovered that although the first technique can work, it is not reliable: the model may persistently generate incorrect function names. The second option works more reliably. This is because it resembles an instruction directly from the user, and ChatGPT appears to follow user instructions better than instructions within a function call result.
messages: List[Dict] = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "what is today's date"}
]

response = _chat_completion_request(messages)
if response.get('function_call'):
    func_name = response.get('function_call').get('name')
    if func_name != 'python_interpreter':
        print(f'Invalid function name {func_name}')
        messages.append({'role': 'user',
                         'content': """
                         Retry with one of the available functions:
                         [web_search, web_scraper, python_interpreter]
                         """
                        })
        next_response = _chat_completion_request(messages)
        pprint(next_response)
    else:
        pprint(response.get('function_call'))
else:
    pprint(response)
Here’s what we changed:
We added a check for the function name called by the model
If the model calls a function that we don't recognize, we make another call to ChatGPT where we send a user message asking it to retry with one of the valid function names.
And voila, it works!
Invalid function name python
{'content': None,
 'function_call': {'arguments': 'import datetime\n'
                                '\n'
                                "# Get today's date\n"
                                'today = datetime.date.today()\n'
                                '\n'
                                'today',
                   'name': 'python_interpreter'},
 'role': 'assistant'}
Be Cautious of ChatGPT’s Training Data Cutoff
This is a minor concern. Since ChatGPT's training data cutoff date is in 2021, it may generate incorrect values or information when functions require present-day parameter values, unless this is explicitly handled. For instance, if you ask it to query analytics data for the last 3 days, where the function requires start_date and end_date parameters, it might generate dates from 2021 or 2022. In such cases, it is advisable to provide a function for handling dates or include the current date in the prompt. Similarly, there may be other scenarios where ChatGPT needs to query the web for the latest information, and in the absence of a web search function, it may generate inaccurate results.
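One simple mitigation, sketched below, is to inject the current date into the system prompt at request time; the exact wording here is just an illustration:

import datetime

# Anchor the model in the present by stating today's date in the system
# prompt, so generated values like start_date/end_date are current.
SYSTEM_PROMPT = f"""
You are a helpful AI assistant. You answer the user's queries.
Today's date is {datetime.date.today().isoformat()}.
Use this date whenever a query refers to 'today', 'yesterday',
or 'the last N days'.
"""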
Closing Thoughts
These were some of the issues I encountered while working on my toy ChatGPT app with plugin support. It's possible that there are more corner cases that I haven't come across yet. If you are aware of any, please share them in the comments along with how you resolved them.
It's worth noting that most of these issues are specific to GPT-3.5 and not GPT-4. However, many users will continue to use GPT-3.5 due to its cost-effective pricing, so knowing how to address these problems will be valuable.
I hope you find these insights helpful. If you have any tips for better solutions or avoiding these issues altogether, please share them in the comments. Thank you for reading.
Code and Resources
All the code shown in this article is available in a GitHub repo under an MIT license. Feel free to use it.