Hacker News threw up a story today:
https://www.theregister.com/2025/05/08/google_gemini_update_prevents_disabling/
Gemini's latest model update has made the model refuse to discuss topics like sexual assault and rape. A developer is building a platform to help victims, and their app has now stopped working. It appears there was a training update applied to an in-flight model.
The same thing happened a few weeks back with GPT-4.5, which suddenly became obsequious.
It seems that if LLM companies want to equip users with the latest and greatest, they should offer a "latest" model that receives ongoing tweaks, while also supporting named stable versions that don't change underneath you.
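The distinction could be as simple as how model names resolve. Here's a minimal sketch of that idea, assuming a registry of frozen snapshots plus a floating alias; the snapshot names and functions are illustrative, not any vendor's real API.

```python
# Hypothetical sketch: pinned snapshots never change behavior, while a
# "latest" alias floats to whichever snapshot the vendor currently serves.
# All model names here are made up for illustration.

PINNED_MODELS = {
    "gemini-pro-2025-03-25",  # frozen snapshot
    "gemini-pro-2025-05-06",  # frozen snapshot
}

ALIASES = {
    # apps that opt into "latest" accept that behavior may shift
    "gemini-pro-latest": "gemini-pro-2025-05-06",
}

def resolve_model(name: str) -> str:
    """Return the concrete snapshot a model name or alias points to."""
    if name in PINNED_MODELS:
        return name  # already pinned: guaranteed stable
    if name in ALIASES:
        return ALIASES[name]  # floating: resolved at request time
    raise KeyError(f"unknown model: {name}")
```

An app that can't tolerate behavior drift (like the platform for assault victims above) would pin `resolve_model("gemini-pro-2025-03-25")` and keep the exact model it validated against, while others could ride `gemini-pro-latest`.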
The whole field is moving so quickly that practices like this are just not established.
At the same time, I was at an industry event this past week hosted by AWS, and the large media companies in the room were uniform in saying that their strategy is to use frontier models, because local open-source models are more unstable than one might like.
So I guess it comes down to being very careful in considering your use case, and being extra cautious as we remain in an environment of significant change.