Why Numeric Responses Alone Fall Short for Function Calling in LLMs

One of the common “getting started” examples for function calling in LLMs involves retrieving the current weather. This example is popular because it clearly demonstrates the benefits of function calling. It shows how LLMs can access real-time data to provide insights or perform actions on behalf of users.

Most of the examples available today, however, focus on how to define these functions for the underlying model and how they are triggered or called. For example, the API reference for OpenAI chat completions demonstrates how a model can enter “function calling” mode with a single request by doing:

curl https://api.openai.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": "What'\''s the weather like in Boston today?"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA"
            },
            "unit": {
              "type": "string",
              "enum": ["celsius", "fahrenheit"]
            }
          },
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}'

Given that the user is asking for the weather in a specific location, Boston in this case, and the model has a function available to provide the weather based on a location, the API will respond with the following:

{
  "id": "chatcmpl-9gEWEuIwKb4yebo2GMvgn79CBKuTc",
  "object": "chat.completion",
  "created": 1719852614,
  "model": "gpt-4o-2024-05-13",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_vWAWNmgBGrIvsOEBYZa26yL4",
            "type": "function",
            "function": {
              "name": "get_current_weather",
              "arguments": "{\"location\":\"Boston, MA\"}"
            }
          }
        ]
      },
      "logprobs": null,
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 80,
    "completion_tokens": 17,
    "total_tokens": 97
  },
  "system_fingerprint": "fp_ce0793330f"
}

Note that, unlike a regular chat message, this response carries no message content: content is null. Instead, there is a tool_calls key containing a list of calls that the calling application must execute on its own host, returning the results to the model afterwards. How those results are returned matters as much as getting the model to call the function in the first place, and it depends heavily on the data type of the response.
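As a rough illustration of that step, here is a minimal Python sketch of how an application might unpack the tool_calls list and run each call locally. The get_current_weather implementation, the TOOLS registry, and the run_tool_calls helper are all hypothetical names invented for this example; only the response shape comes from the API output above.

```python
import json

def get_current_weather(location, unit="fahrenheit"):
    """Hypothetical local implementation; returns a fixed reading for illustration."""
    return {"city": location, "temperature": 25, "unit": unit}

# Hypothetical registry mapping tool names the model may request to local callables.
TOOLS = {"get_current_weather": get_current_weather}

def run_tool_calls(response):
    """Execute every tool call in a chat completion response dict.

    Returns a list of (tool_call_id, result) pairs.
    """
    message = response["choices"][0]["message"]
    results = []
    for call in message.get("tool_calls") or []:
        name = call["function"]["name"]
        # The arguments arrive as a JSON-encoded string, not as an object.
        args = json.loads(call["function"]["arguments"])
        results.append((call["id"], TOOLS[name](**args)))
    return results

# Abbreviated copy of the response shown above.
response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": None,
                "tool_calls": [
                    {
                        "id": "call_vWAWNmgBGrIvsOEBYZa26yL4",
                        "type": "function",
                        "function": {
                            "name": "get_current_weather",
                            "arguments": "{\"location\":\"Boston, MA\"}",
                        },
                    }
                ],
            },
            "finish_reason": "tool_calls",
        }
    ]
}

print(run_tool_calls(response))
```

Keeping the tool_call_id next to each result matters because every result sent back to the model has to reference the call it answers.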

The Traditional Method for Handling Responses

One might assume that since the function definition accepts two arguments, location and unit (of which only location is required), the model would handle a numeric response like 25. After all, the model could infer the temperature unit, either Celsius or Fahrenheit, from the definition and phrase its answer accordingly. Nonetheless, this isn’t what actually happens if you return a purely numeric response to the call above. For example, if you do:

curl https://api.openai.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": "What'\''s the weather like in Boston today?"
    },
    {
      "role": "assistant",
      "content": "",
      "tool_calls": [
        {
          "id": "call_vWAWNmgBGrIvsOEBYZa26yL4",
          "type": "function",
          "function": {
            "name": "get_current_weather",
            "arguments": "{\"location\":\"Boston, MA\"}"
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "25",
      "tool_call_id": "call_vWAWNmgBGrIvsOEBYZa26yL4"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA"
            },
            "unit": {
              "type": "string",
              "enum": ["celsius", "fahrenheit"]
            }
          },
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}'

You might receive a response from the model indicating that it cannot provide an answer and suggesting you find a reliable source for the weather, as shown below:

{
  "id": "chatcmpl-9gEs51XCa730aFm2ZtLndwcngGJuM",
  "object": "chat.completion",
  "created": 1719853969,
  "model": "gpt-4o-2024-05-13",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I currently don't have access to the specific weather details for Boston today. Please check a reliable weather website or app for the latest updates."
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 107,
    "completion_tokens": 28,
    "total_tokens": 135
  },
  "system_fingerprint": "fp_4008e3b719"
}

Interestingly, 25 is a plausible temperature for Boston, whether in Celsius or Fahrenheit, so the value itself gives the model no reason to refuse. This isn’t the experience anyone would want to provide for their assistant. In fact, sweeping the returned temperature value from 0 to 100 in steps of 10 succeeds only about half the time:

Temperature	Response	Pass/Fail
0	I currently don’t have access to real-time data. To find out the weather in Boston today, you can check a reliable weather website or app such as Weather.com, AccuWeather, or your local news station’s website. If you have any other questions or need assistance with something else, feel free to ask!	Fail
10	I’m unable to retrieve the current weather for Boston at the moment. You can check a reliable weather website or app for the most up-to-date information. Is there anything else I can assist you with?	Fail
20	I’m unable to provide real-time weather updates at the moment. You can check a reliable weather website or app for the most current information on Boston’s weather today.	Fail
30	I’m unable to provide real-time weather updates at the moment. However, you can check the current weather in Boston by using a weather website or app like Weather.com, AccuWeather, or a similar service. Is there anything else I can help you with?	Fail
40	The current temperature in Boston, MA is 40°F. If you need more detailed information like precipitation, wind speed, or a forecast, please let me know!	Pass
50	I’m unable to provide real-time weather updates. For the most current weather information in Boston, I recommend checking a reliable weather website or app.	Fail
60	The current temperature in Boston, MA is 60°F. For more detailed information such as precipitation, wind speed, or forecasts, you might want to check a reliable weather website or app. Is there anything else you need help with?	Pass
70	The current temperature in Boston, MA is 70°F. If you need more detailed information like precipitation, wind speed, or a forecast, please let me know!	Pass
80	The current temperature in Boston, MA is 80°F. If you need more details like humidity, wind speed, or forecast, please let me know!	Pass
90	The current temperature in Boston, MA is 90°F. If you need more detailed information or a forecast, please let me know!	Pass
100	I currently don’t have access to real-time data. To find out the weather in Boston today, you can check a reliable weather website or app such as Weather.com, AccuWeather, or your local news station’s website. If you have any other questions or need assistance with something else, feel free to ask!	Fail
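The table does not say how each reply was judged. One plausible criterion, assumed here purely for illustration, is that a reply passes when it repeats the temperature the tool returned. The mentions_temperature and success_rate helpers below are hypothetical names and not the code behind the results above.

```python
import re

def mentions_temperature(reply, temperature):
    """Return True if the reply contains the temperature as a standalone number."""
    # Lookarounds stop 10 from matching inside 100, for example.
    return re.search(rf"(?<!\d){temperature}(?!\d)", reply) is not None

def success_rate(results):
    """Fraction of (temperature, reply) pairs whose reply repeats the temperature."""
    passed = sum(mentions_temperature(reply, t) for t, reply in results)
    return passed / len(results)

# Two replies taken from the table above.
sample = [
    (40, "The current temperature in Boston, MA is 40°F."),
    (10, "I'm unable to retrieve the current weather for Boston at the moment."),
]
print(success_rate(sample))
```

A check like this is crude: a refusal that happens to echo the number would be counted as a pass, so borderline replies still deserve a manual look.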

The Detailed Response

As we’ve seen, declaring the temperature unit in the function definition doesn’t help the model interpret a bare numeric response. So I wanted to see what would happen if I changed the response from just a number to something more meaningful. Interestingly, using either the following text template or JSON object achieves a 100% success rate:

The temperature in [CITY] today is [TEMPERATURE] [UNIT].

{"city": CITY, "temperature": TEMPERATURE, "unit": UNIT}

Temperature	Response	Pass/Fail
0	The current temperature in Boston, MA is 0°F. If you need more detailed information like precipitation, wind speed, or forecasts, please let me know!	Pass
10	The current temperature in Boston, MA is 10°F. If you need more details or a forecast, feel free to ask!	Pass
20	The current temperature in Boston, MA is 20°F. If you need more details or a forecast, feel free to ask!	Pass
30	The current temperature in Boston, MA is 30°F. If you need more details or a forecast, feel free to ask!	Pass
40	The current temperature in Boston, MA is 40°F. If you need more details like precipitation or wind conditions, please let me know!	Pass
50	The weather in Boston today is 50°F. If you need more details like the forecast or specific conditions, feel free to ask!	Pass
60	The weather in Boston today is 60°F. Is there anything else you would like to know?	Pass
70	The weather in Boston today is 70°F. If you need more details like the forecast or specific conditions, feel free to ask!	Pass
80	The weather in Boston today is 80°F. If you need more details like humidity, wind speed, or forecast, let me know!	Pass
90	The weather in Boston today is 90°F. If you need more details like humidity, wind speed, or forecast, let me know!	Pass
100	The weather in Boston today is quite hot, with a temperature of 100°F. Make sure to stay hydrated and take necessary precautions if you’re spending time outdoors!	Pass
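For reference, both response formats can be produced with a couple of small helpers. This is a minimal sketch: sentence_result, json_result, and tool_message are hypothetical names, and the only part taken from the API examples above is the shape of the tool message (role, tool_call_id, content).

```python
import json

def sentence_result(city, temperature, unit):
    """Render the tool result with the plain-text template."""
    return f"The temperature in {city} today is {temperature} {unit}."

def json_result(city, temperature, unit):
    """Render the tool result as a JSON object serialized to a string."""
    return json.dumps({"city": city, "temperature": temperature, "unit": unit})

def tool_message(tool_call_id, content):
    """Wrap a rendered result as a tool message answering a specific call."""
    return {"role": "tool", "tool_call_id": tool_call_id, "content": content}

call_id = "call_vWAWNmgBGrIvsOEBYZa26yL4"
print(tool_message(call_id, sentence_result("Boston, MA", 25, "fahrenheit")))
print(tool_message(call_id, json_result("Boston, MA", 25, "fahrenheit")))
```

Either message would take the place of the bare "25" tool message in the earlier request. In both cases the content is sent as a string, so the JSON variant is serialized before it goes into the message.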

I uploaded some Python code to perform this sweep here.

Final Thoughts

Most function calling documentation and examples focus on how to call the function but rarely discuss how to create a good response. I wanted to address that with this simple example, and there is definitely more to learn regarding best practices. The key takeaway is that function responses need details and context to produce meaningful assistant interactions. I hope you find this interesting next time you’re building an assistant with function calling capabilities!