OpenAI function calling is a game changer for developers.
It gives developers more control over the structure of the data returned by the OpenAI API.
This post explains what function calling is and how to use it in Python.
The key concepts will be demonstrated with an example of evaluating the quality of text, using a schema generated with Pydantic.
Year of the LLM
Large Language Models (LLMs) broke into the public consciousness in 2023.
A Large Language Model (LLM) is a neural network that predicts the next token in a sequence of natural language.
The attention mechanism, combined with massive scale in data, compute and network parameters, has led to incredible model performance.
Millions of people now use ChatGPT, where users interact with an LLM as a sequence of prompts and responses through a web application.
It’s also possible to use an LLM via an API call – a developer can access an LLM through an HTTP request and use the data in the response in an application.
Calling the API offers options and control that the ChatGPT web application does not.
One option available to developers who use the OpenAI API is function calling.
Why is Function Calling Useful?
Function calling allows you to define the schema of the OpenAI API response.
By specifying the schema of the response, a developer can reliably extract data from the API response into a programming language like Python.
Without function calling, developers can try to get an LLM to return valid JSON through prompt engineering alone, but this is unreliable.
With function calling, a developer can send their desired schema into the API as a function call. The LLM will then return data following that schema.
Instead of hacking together a prompt yourself, OpenAI handles the prompt engineering needed to follow the provided schema.
Below is a pseudocode example of how function calling works:
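```python
# pseudocode -- the shape of a function calling workflow, not runnable code
schema = define_schema()                 # 1. define the JSON schema we want back
response = call_openai(prompt, schema)   # 2. send the prompt and the schema to the API
data = parse_json(response)              # 3. read the structured data into our program
```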
In the example above we:
define a schema,
call the OpenAI API with a prompt and schema,
read the data we get back into our program.
Function Calling Without Functions
The name function calling is misleading – the LLM doesn’t actually call a function. Instead, the schema is used as part of the prompt on the OpenAI side.
This explains how natural language descriptions in your JSON schema are used – they end up as part of the prompt.
Unlikely to be Perfect
As function calling is prompt engineering under the hood, there is no way to guarantee that an LLM will always generate valid JSON.
This means developers will need to handle the failure cases when the LLM returns data outside of the desired schema.
In practice, however, function calling will be valuable for many developers and businesses even with a non-zero error rate.
Evaluating Text with Function Calling
Now we will develop a Python example of function calling by evaluating the quality of text.
The complete example is at the bottom of this post and on GitHub.
Creating a JSON Model Schema with Pydantic
At the heart of our function call is the JSON schema we want the LLM response to follow.
We will use Pydantic to create our schema. Pydantic is a Python library for data validation, and provides a way to generate JSON schemas from models with model_json_schema().
We start by defining a Pydantic model that represents an evaluation of text:
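A minimal sketch of such a model – the exact fields (spelling_errors, quality_score) are illustrative choices:

```python
from pydantic import BaseModel, Field


class Evaluation(BaseModel):
    """An evaluation of the quality of a piece of text."""

    spelling_errors: int = Field(
        description="The number of spelling errors in the text."
    )
    quality_score: int = Field(
        description="The overall quality of the text, from 0 (poor) to 10 (excellent)."
    )
```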
The types and descriptions for each of the fields will end up both in our schema and in the prompt used on the OpenAI side.
We can pull out the JSON schema from our Pydantic model using model_json_schema():
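```python
schema = Evaluation.model_json_schema()
print(schema)
# roughly:
# {'description': 'An evaluation of the quality of a piece of text.',
#  'properties': {'spelling_errors': {...}, 'quality_score': {...}},
#  'required': ['spelling_errors', 'quality_score'],
#  'title': 'Evaluation', 'type': 'object'}
```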
Functions List
To pass this schema into the OpenAI API, we create a list of our functions in Python. This list will end up as part of our API call:
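A sketch – the function name evaluate_text and its description are illustrative:

```python
functions = [
    {
        "name": "evaluate_text",
        "description": "Evaluate the quality of a piece of text.",
        "parameters": Evaluation.model_json_schema(),
    }
]
```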
It’s important that all the fields appear in the "required" section of the function call – Pydantic includes fields without default values there automatically.
Prompt
The next thing we need is a prompt – this will be the text that we evaluate:
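Any string will do – this one carries a deliberate mistake for the evaluation to find:

```python
# illustrative text to evaluate -- note the deliberate typo in "respnse"
prompt = "OpenAI function calling lets developers define the schema of the API respnse."
```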
Calling the API
Finally we can call the API using the openai Python library.
This requires the OPENAI_API_KEY environment variable to be set.
We use a system message to give the LLM context on the task – this is in addition to the schema we send in with functions. The prompt is in our first user message:
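A sketch of the request, assuming the 0.x openai Python library (which exposes function calling through the functions and function_call arguments) and the gpt-3.5-turbo-0613 model:

```python
import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[
        {
            "role": "system",
            "content": "You are an editor who evaluates the quality of text.",
        },
        {"role": "user", "content": prompt},
    ],
    functions=functions,
    # force the model to call our function rather than reply with free text
    function_call={"name": "evaluate_text"},
    temperature=0,
)
```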
Using a temperature of 0 makes the LLM as deterministic as possible – GPU non-determinism and gamma ray bit flipping permitting.
Extracting the Function Call Response
After receiving the response from the API, we can pull out the evaluation from the response:
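With the 0.x library's dictionary-style response, this looks like:

```python
message = response["choices"][0]["message"]
# the function call arguments arrive as a JSON string, not a parsed object
arguments = message["function_call"]["arguments"]
```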
We can then use our Pydantic model to validate the JSON returned by the API:
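Pydantic v2's model_validate_json parses and validates in one step:

```python
# raises pydantic.ValidationError if the LLM strayed from the schema
evaluation = Evaluation.model_validate_json(arguments)
print(evaluation)
```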
While not strictly necessary, we can also check that the message content is empty – this is where a normal, non-function-call response would carry its message text:
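```python
# a normal chat completion puts its reply text in `content`;
# a function call response should leave it empty
assert message.get("content") is None
```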
The great thing about this code is that we can change how we evaluate text by changing a single Pydantic class.
It’s easy for us to change the task we want the LLM to do, and we get validation for free.
We could extend our evaluation to also check if the text is hard to read, by adding a hard_to_read_score field:
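The field name hard_to_read_score comes from the text above; its description here is an illustrative guess:

```python
class Evaluation(BaseModel):
    """An evaluation of the quality of a piece of text."""

    spelling_errors: int = Field(
        description="The number of spelling errors in the text."
    )
    quality_score: int = Field(
        description="The overall quality of the text, from 0 (poor) to 10 (excellent)."
    )
    hard_to_read_score: int = Field(
        description="How hard the text is to read, from 0 (easy) to 10 (very hard)."
    )
```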
Full Python Code
The full example of using the OpenAI API and Pydantic for evaluating text is below and on GitHub.
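A runnable sketch under the same assumptions as above (0.x openai library, Pydantic v2, illustrative field and function names):

```python
"""Evaluate the quality of text with OpenAI function calling and Pydantic.

Requires the OPENAI_API_KEY environment variable to be set.
"""
import openai
from pydantic import BaseModel, Field


class Evaluation(BaseModel):
    """An evaluation of the quality of a piece of text."""

    spelling_errors: int = Field(
        description="The number of spelling errors in the text."
    )
    quality_score: int = Field(
        description="The overall quality of the text, from 0 (poor) to 10 (excellent)."
    )


# our schema, wrapped in the functions list the API expects
functions = [
    {
        "name": "evaluate_text",
        "description": "Evaluate the quality of a piece of text.",
        "parameters": Evaluation.model_json_schema(),
    }
]

# illustrative text to evaluate -- note the deliberate typo in "respnse"
prompt = "OpenAI function calling lets developers define the schema of the API respnse."

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[
        {
            "role": "system",
            "content": "You are an editor who evaluates the quality of text.",
        },
        {"role": "user", "content": prompt},
    ],
    functions=functions,
    # force the model to call our function rather than reply with free text
    function_call={"name": "evaluate_text"},
    temperature=0,
)

message = response["choices"][0]["message"]
# a function call response should have no ordinary message text
assert message.get("content") is None

# parse and validate the JSON string against our schema
evaluation = Evaluation.model_validate_json(message["function_call"]["arguments"])
print(evaluation)
```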