Beware of ChatGPT functions!

ChatGPT text » functions

TL;DR ChatGPT gives better-quality* results when used in “plain text” mode than when those same results are generated directly as ChatGPT function parameters. Take extra care before relying on function parameters exclusively.

* Quality has been measured both with objective and subjective methods.

This blog post is available in video form here.

DANGER!

I recently started recording YouTube videos, and one of them was “Can ChatGPT count?”. While recording that video for the first time I ran into a problem (one of many; I ended up having to re-record anyway because of audio issues).

The goal was to see how well ChatGPT could count occurrences of a given character or digit within a string of many random characters. To measure how the error changed as the string grew, I needed a structured, easily parseable response, so instead of prompting ChatGPT to return its answer in a given format, I relied exclusively on ChatGPT functions and parsed the result from there.
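As a sketch of the experiment setup (names and parameters here are mine, not the exact ones from the video), generating a random string alongside its ground-truth count lets the model's answer be scored as the string length grows:

```python
import random
import string

def make_test_case(target="a", length=100, seed=0):
    """Build a random lowercase string plus the ground-truth count of
    `target`, so ChatGPT's answer can be scored as the string grows."""
    rng = random.Random(seed)  # seeded for reproducible test cases
    text = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return text, text.count(target)

text, true_count = make_test_case(target="a", length=120, seed=42)
```

The error at each length is then just the difference between ChatGPT's reported count and `true_count`.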

Here you can see what the prompt and function descriptions looked like.
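For readers skimming without the screenshot, a function description of this kind would follow the JSON-Schema shape from the OpenAI functions docs. The names below (`report_count`, `build_request`) are illustrative stand-ins, not the exact ones used in the video:

```python
# Hypothetical function description in the legacy `functions` format.
count_function = {
    "name": "report_count",
    "description": "Report how many times the target character appears in the string.",
    "parameters": {
        "type": "object",
        "properties": {
            "count": {
                "type": "integer",
                "description": "Number of occurrences of the target character.",
            },
        },
        "required": ["count"],
    },
}

def build_request(text, target):
    """Assemble a Chat Completions payload (not sent anywhere here)."""
    return {
        "model": "gpt-3.5-turbo",
        "messages": [
            {
                "role": "user",
                "content": f'How many times does "{target}" appear in: {text}',
            }
        ],
        "functions": [count_function],
        # Force the model to answer via the function, so the reply is
        # always parseable JSON arguments.
        "function_call": {"name": "report_count"},
    }
```

Forcing `function_call` guarantees a structured reply every time, which is exactly why the degraded counts were so easy to miss at first.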

The parsing worked great: the function was called every time, and the numbers were initially in line with what you would expect (sometimes correct, sometimes a bit off).

As the strings got very long (100+ characters), the counting stopped working, returning 0 every time. Pulling up the debugger, I noticed there was no error per se; this was indeed what the ChatGPT API was returning. I tried playing with the temperature to see whether its strictness was causing the issue, but the problem persisted.

Then I decided to see what “plain old” ChatGPT would return without any functions to call: the answer was not 0, and it was very close to the expected value (though not easily parseable).

What I had assumed would make no difference (formatting prompt responses as JSON rather than free text) turned out to produce very different outputs on what mattered most: the actual count.

How bad is it?

As with most LLM development, it’s heavily context dependent. If you’re building AI-based apps or side projects, a good bet is to look at enough examples to get an intuitive feel for what works and up to what point. I ended up running a few experiments to graph the differences across a few use cases, so you can get an idea of when these problems arise and when they become unmanageable. There’s also a really simple solution at the end that takes advantage of both features.

First one, then the other

Functions are great for getting information into JSON form. As we’ve seen, however, if the values to be provided as parameters need to be of high quality, it’s better not to rely on a single API call to generate them.

A solution is to take the high-quality output of a plain ChatGPT interpretation of the query and pass it along for parsing via the functions feature.

ChatGPT functions also have the great feature of selectively choosing the right function to call (provided it is named unambiguously) based on the given context, so even if we don’t trust the model with complex parameters, we generally do want to rely on it to decide when a given function should run. The solution is therefore three-step rather than two-step.

  1. Define a list of functions and their descriptions as per the current OpenAI ChatGPT functions docs. For those that require higher quality parameters, allow an additional prompt to be specified which will be used to request a plain text response from the ChatGPT API answering it. Run the application you are building normally and rely on the ChatGPT API to request function calls.
  2. When a function call is requested by the ChatGPT response, if an additional prompt is specified, run it first (without allowing additional function calls) so that it will request a plain text response from the ChatGPT API answering the query, providing enough information to subsequently populate the parameters needed for the function call.
  3. Request a new response from the ChatGPT API, with the function description, including the plain text response just generated in the context so that ChatGPT can generate a function call pulling the high quality parameters from the previous message instead of coming up with them itself. Once it does, call the relative function in your own code as you normally would.
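The three steps above can be sketched roughly as follows. Here `chat` is a stand-in for a call to the ChatGPT API (injected as a parameter so the flow can be shown without network access), and all names are illustrative assumptions rather than an exact implementation:

```python
import json
from typing import Callable, Optional

def run_with_plain_text_step(
    chat: Callable[..., dict],
    user_message: str,
    function_def: dict,
    quality_prompt: Optional[str] = None,
) -> dict:
    """Hedged sketch of the three-step flow. `chat` takes `messages`
    and, optionally, `functions` / `function_call`, and returns the
    message dict of the model's reply."""
    messages = [{"role": "user", "content": user_message}]

    # Step 1: run normally and let the model decide whether the
    # function should be called at all.
    first = chat(messages=messages, functions=[function_def])
    if "function_call" not in first:
        return first  # no function call requested; nothing more to do

    # Step 2: if this function needs high-quality parameters, answer
    # the additional prompt in plain text first (no functions allowed).
    if quality_prompt is not None:
        plain = chat(messages=[{"role": "user", "content": quality_prompt}])
        messages.append({"role": "assistant", "content": plain["content"]})

    # Step 3: request a new response with the plain-text answer in
    # context, forcing the function call so the model pulls the
    # parameters from the previous message instead of inventing them.
    final = chat(
        messages=messages,
        functions=[function_def],
        function_call={"name": function_def["name"]},
    )
    return json.loads(final["function_call"]["arguments"])
```

The extra API call costs latency and tokens, but only for the functions you flag with a `quality_prompt`; everything else keeps the one-shot path.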

Don’t let parsing ruin the quality of your project: rely on functions only when it makes sense.

Please feel free to reach out to me with any comments, feedback or requests!