OpenAI made a ton of announcements earlier this week. As Benedict Evans pointed out, they are driving hard for the platform play.
As expected, their models are becoming faster, more capable, and cheaper, with much longer context windows to boot. I fully expect this drive to continue for quite a while, as there remains a ton of headroom in what optimisation can bring in this space, and plenty of opportunity to scale at the customer and infrastructure level too.
ChatGPT was one user interface, and one API call, so it was easy to figure out how to interact with it. Now there are a ton of options, and those options create cognitive friction. What the hell is the best way to use this voodoo magic?
I recognise that I don’t fully understand the differences yet, so I’m going to try to explain them to myself. The following are my descriptions, as I understand them, but I might be wrong.
GPT system prompts
This is a parameter you can pass in an API query to some of the GPT model endpoints that gives the model an instruction on how to behave. It was one of the first ways to modify model responses and default behaviour, and it is very easy to implement. I’ve had a lot of success using this approach.
They provide an update here on how to move to the newer way of doing things.
How to provide a system prompt into the completions API is laid out here.
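The shape of the request is the easiest way to see what a system prompt actually is. The sketch below builds the body for a Chat Completions call; the model name and prompt text are placeholders of mine, not from the post, and the actual network call is shown commented for when the official client and an API key are available.

```python
# Sketch: passing a system prompt via the Chat Completions API.
# Model name and prompt strings here are illustrative placeholders.

def build_chat_request(system_prompt: str, user_message: str) -> dict:
    """Assemble the request body. The system message comes first and
    steers the model's behaviour for the rest of the conversation."""
    return {
        "model": "gpt-4-1106-preview",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
    }

request = build_chat_request(
    "You are a terse assistant that answers in one sentence.",
    "What is a context window?",
)

# With the official client installed and OPENAI_API_KEY set:
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(**request)
# print(response.choices[0].message.content)
```

The key point is that the system message is just the first entry in the `messages` list, sent on every request; the model has no other memory of it.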
Assistants API
This is new. OpenAI says:
The Assistants API allows you to build AI assistants within your own applications. An Assistant has instructions and can leverage models, tools, and knowledge to respond to user queries. The Assistants API currently supports three types of tools: Code Interpreter, Retrieval, and Function calling.
Since launching, the ChatGPT UI has been extended with a ton of deeply impressive features. My take is that Assistants are a way to create API endpoints you can use in your application, which can also take advantage of the more enhanced functionality that OpenAI provides. You can create an assistant (which in effect is a namespace for API endpoints) either via the API or in the web console. You can set some instructions, but also enable enhanced features: letting it run Code Interpreter, providing it a custom set of documents it can retrieve from, or wiring it up to function calls. You can set it up to interact with, I believe, up to 128 different functions.
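A sketch of what an assistant definition looks like. The name, instructions, and the `get_weather` function are hypothetical examples of mine; the three tool types are the ones OpenAI lists. The actual creation call is shown commented.

```python
# Sketch: the spec you would pass when creating an Assistant.
# Name, instructions, and the example function are placeholders.
assistant_spec = {
    "name": "Docs helper",
    "instructions": "Answer questions using the attached files.",
    "model": "gpt-4-1106-preview",
    "tools": [
        {"type": "code_interpreter"},  # run Python in a sandbox
        {"type": "retrieval"},         # search documents you upload
        {   # a custom function the assistant can ask your app to call
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        },
    ],
}

# With the official client:
# assistant = client.beta.assistants.create(**assistant_spec)
```

Note that for function tools the API never runs your code; it only returns a request for your app to call `get_weather` itself and hand the result back.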
How you interact with the assistant in your app is interesting. You create a thread, which contains the input from your app or user; this gives you a thread ID. You then attach the assistant to that thread, and the assistant responds with a message. They say:
A thread is a conversation session between an assistant and a user. Threads simplify application development by storing message history and truncating it when the conversation gets too long for the model’s context length.
GPTs
These basically look like a UI manifestation of Assistants, with GPT’s help in adding context, resources, and designs. In a way, it’s a doubling down on the App Store, but where each app is a customised, scoped instance of GPT that is, if not fully fine-tuned, at least guided towards a set of capabilities or behaviours. The demo in the keynote was truly impressive from the point of view of showing how easy it is to create these.
Thoughts / Questions
This is hugely impressive.
- Today, this does not solve the systems integration problem, but I can see a route towards that, given the highly competitive landscape we are in - more on that in a moment.
- This does not solve the discoverability issue, or the magic incantation issue. The discoverability issue is that in a world with 1000s of these GPTs, how do I verify or validate the ones I should be using? The magic incantation issue is that I still need to pause my work to interact with a scoped entity, and so need to develop my own heuristics (or magic incantations) for how to make the thing work optimally.
- I’d love to see GPTs available within the scope of an org, where we can share them internally, coalesce on a set that works well, and collectively improve and tweak them.
- I’d love to see dev-like tools for these - not hard to imagine, but something like an audit trail, and the ability to see versioning on the parameters being used. OpenAI’s announcement that they are supporting a seed parameter in the API, which allows you to get repeatable responses, tells me that this ask of mine is very possible.
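The seed feature mentioned in the last point looks like this in a request. A sketch, with placeholder model and prompt; determinism is best-effort, and the `system_fingerprint` field in the response is what tells you whether the backend configuration changed between calls.

```python
# Sketch: requesting (best-effort) reproducible output via `seed`.
# Model and prompt are placeholders.
request = {
    "model": "gpt-4-1106-preview",
    "messages": [{"role": "user", "content": "Name three prime numbers."}],
    "seed": 42,        # same seed + same params -> mostly repeatable output
    "temperature": 0,  # remove sampling randomness as far as possible
}

# With the official client:
# response = client.chat.completions.create(**request)
# print(response.system_fingerprint)  # compare across calls: if it changed,
#                                     # the backend changed and outputs may differ
```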
Much of the above moves us towards agents. MS and Google are going to go hard into this area too. Today OpenAI needs you to interact with Zapier to get into your email, calendar, and docs. MS and Google will be able to offer these kinds of agents into environments with company-wide permissions, and that is a very exciting prospect.