Brian Austin

May 29, 2024

Putting Chatbots to the Test

Would you trust an AI chatbot with family planning? Investing $1 million? How about writing your wedding vows? So asks the Wall Street Journal.

I'd like to think that I started this trend with Project Maestro, but more likely LLM technology has simply reached a point where folks are looking past the hype and benchmarking output in the real world.

Stanford has benchmarked LLMs for a while, but its work is very technical and honestly not that accessible to the layperson. I recently found this Wall Street Journal article that put the top 5 LLMs to the test against more real-world scenarios.

I love the format, and I appreciate that the WSJ, with its broad audience, dedicated some time to showing the practicality (and impracticality) of the current generation of LLMs.

Without a doubt, enthusiasts will declare, "Imagine what it can do next month/year/decade." It's a familiar refrain: the future value of LLMs. It's fun to dream, but technologists need to understand the current capabilities of the tech to make smart and lasting decisions.

About Brian Austin

| AI Dev Tool Research | Engineering Leadership | Tech Lead Manager | Software Architect |

This newsletter syndicates all of my LinkedIn content.

For AI project updates check out Project Maestro on Substack.

All other links at https://bit.ly/m/bwaustin