"Would you trust an AI chatbot with family planning? Investing $1 million? How about writing your wedding vows?" the Wall Street Journal asks.
I'd like to think that I started this trend with Project Maestro, but LLM technology has reached a point where folks are looking past the hype and benchmarking output in the real world.
Stanford has benchmarked LLMs for a while, but its work is very technical and honestly not that accessible to the layperson. I recently found this Wall Street Journal article that put the top 5 LLMs to the test in more real-world scenarios.
I love the format, and I appreciate that the WSJ, with its breadth of audience, dedicated some time to showing the practicality (and impracticality) of the current generation of LLMs.
Without a doubt, enthusiasts will declare, "Imagine what it can do next month/year/decade." It's a familiar refrain: the future value of LLMs. It's fun to dream, but technologists need to understand the current capabilities of the tech to make smart and lasting decisions.