How Zed's Open-Source Edit Predictions Work
There are a few interesting things in this video:
- Recorded 9 months ago, feels like a lifetime ago
- Nobody shares their secret sauce, these folks might be the only ones
- Fine tuning (using bigger models to create data, good vs. terrible output rather than single-outcome training)
- Speculative decoding (better guesses for the next token, wider range with higher accuracy)
- Speculative vs deterministic software (find link for AI product article)
- Model hosting (so different than hosting "apps", capacity vs latency)
- Evals (unit tests for LLMs, only run on a fine-tuning run)
So much has changed in the last 9 months it is wild. This video predates any IDE having an agent panel, and it came before Claude Code and Codex launched for the CLI. It is hard for me to believe that all of that has happened in 9 months, but here we are. While many really cool open source models have launched, very few folks show off their secret sauce like in this video. There is a ton of complexity in translating a "normal" LLM into something that can materially impact coding speed. Cursor would not have taken the time to develop their own model if it was not worth it. So far, the Cursor sauce is special. Their agents are fast and fairly smart, though I hear that Codex and Claude are still smarter.
Since this video, I've stopped using Zed as my daily driver. At the time of the video, their next edit prediction was incredible, much better than GitHub Copilot, which was the leader back then. Now, though, it seems that Cursor and others have kind of stolen their tricks. I've slowly started to feel that the consistent UI of Cursor outweighs the snappiness of Zed as their agent / edit prediction advantage fades away. Using Zed used to feel magical. Now it feels normal and a little bit outside of my comfort zone due to the differences from VS Code.
While Zed seems to have lost their advantage, their use of fine tuning and Cursor's subsequent replication of that strategy lend real credence to the idea that general models are not enough in specialized fields like coding. I wrote about the importance of training data before, but now it does seem that IDE adoption will be a compounding advantage going forward. If the smartest folks are using your IDE to write the best code, all of a sudden you have the most and best training data. I do think this is a weak spot for OpenAI. Most folks I know are using a mixture of Claude and Cursor as daily drivers with Codex sprinkled in. That usage pattern should concern OpenAI.
I don't have much to say about speculative decoding other than that it's now on my list of things to read more about. I think this is the "special" sauce of the Zed model that allows it to be as fast as it is. I think we are going to see the primary innovation from here on out in model latency and token minimization. I don't personally think there is much juice left to squeeze on the model improvement front. We will see incremental innovation there until someone comes up with a novel approach that does not use transformers.
Perhaps the most interesting part of the discussion here was model hosting. I've never heard anyone (outside of the hyperscalers) talking publicly about model hosting. I know folks who call the LLM APIs and I know people who use LM Studio to run things locally, but I know nothing about the in-between. The scarcity of chips for running things locally is mind-boggling. There really isn't an equivalent to the "cloud" in LLMs yet. Google and Amazon have their own custom chips, but those are constrained. Apple has their private compute. Everyone else is renting or trying to build their own shit. I have no idea how folks like Cursor are managing to scale. There is no "Lambda" for LLMs yet where you can scale from 0 to 1 million requests overnight.
Finally, a subject near to my heart: testing. I fell in love with Test Driven Development right out of college, and to this day my happiest days are spent in the flow loop that is writing a new feature with TDD. LLMs are really interesting, though, because they are not deterministic. I wrote about this property before, but once again hats off to Zed for actually talking through what they are doing here. The closest I've ever seen to an in-depth discussion of testing probabilistic software is this banger from Matter about their approach to parsing. The key insight here is that you need multiple "test cases." You need to teach your LLM what good and bad are, not just what you expect. Because it's probabilistic, you need to test with probabilities. The goal is to shift the distribution of results closer to the provided "good" case than the "bad" one. That's a whole new approach to testing that my brain still struggles to grasp.
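To make the distribution-shifting idea concrete for myself, here is a minimal sketch. Everything in it is my own stand-in, not Zed's eval harness: `fake_model` simulates a nondeterministic LLM, the token-overlap score is a crude proxy for a real distance metric or reward model, and the 0.7 pass threshold is arbitrary. The shape is the point: sample repeatedly, and pass only if outputs land closer to the provided "good" case than the "bad" one often enough.

```python
# Minimal sketch of a distribution-shifting eval (hypothetical harness).
import random

GOOD = "let total = items.iter().sum();"
BAD = "let total = 0; // TODO"

def similarity(a, b):
    # Crude token-overlap score; a real eval would use edit distance
    # or a learned reward model.
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / max(len(ta | tb), 1)

def fake_model(prompt, rng):
    # Stand-in for a nondeterministic LLM call: "good" 80% of the time.
    return GOOD if rng.random() < 0.8 else BAD

def eval_run(n=200, threshold=0.7, seed=42):
    rng = random.Random(seed)  # seeded so the eval itself is repeatable
    wins = 0
    for _ in range(n):
        out = fake_model("sum the items", rng)
        if similarity(out, GOOD) > similarity(out, BAD):
            wins += 1
    # Pass when the output distribution leans "good" often enough,
    # instead of asserting one exact output like a classic unit test.
    return wins / n >= threshold

print(eval_run())
```

The contrast with TDD is right there in the last line: instead of `assert output == expected`, the assertion is about where the whole distribution of outputs sits, and a fine-tuning run passes when it moves that distribution in the right direction.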