LLM Accuracy: My Experiment with Five Models

Comparing how well five AI models spot a currency conversion error in a Taylor Swift article. Spoiler: ChatGPT-4o shines, while the others lag behind for now.

I recently put five AI models to a simple but practical test: ChatGPT-4o, Gemini, Copilot, Llama 3, and ChatGPT-4. I had spotted a currency conversion error in a newspaper article about Taylor Swift, and I wanted to see which models could identify it. The error was a significant one: GBP 1 billion had been converted to INR 100 crores. For those unfamiliar with the units, one crore is 10 million, so the printed figure is too small by a factor of 100; the correct amount is roughly INR 10,000 crores. I was curious to see how the different models would handle it.
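For readers who want to check the arithmetic, here is a minimal sketch in Python. The exchange rate of INR 100 per GBP is a round-number assumption, chosen because it reproduces the INR 10,000 crores figure quoted above; the actual rate on the day of publication will differ somewhat. The function and constant names are illustrative only.

```python
# One crore is 10 million (1,00,00,000 in Indian digit grouping).
CRORE = 10_000_000

# ASSUMPTION: a round rate of 100 rupees per pound, not the exact market rate.
ASSUMED_INR_PER_GBP = 100.0


def gbp_to_inr_crore(amount_gbp: float, inr_per_gbp: float) -> float:
    """Convert an amount in pounds to rupees, expressed in crores."""
    return amount_gbp * inr_per_gbp / CRORE


# The figure the article should have printed, under the assumed rate.
correct_crore = gbp_to_inr_crore(1_000_000_000, ASSUMED_INR_PER_GBP)

# The figure the article actually printed.
PRINTED_CRORE = 100.0

print(correct_crore)                  # 10000.0
print(correct_crore / PRINTED_CRORE)  # 100.0, i.e. off by a factor of 100
```

Even with a more precise exchange rate, the result stays in the neighbourhood of INR 10,000 crores, so the printed INR 100 crores is wrong by about two orders of magnitude.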

The source news article I had scanned. Courtesy: Times of India, Chennai edition, 23 May 2024

The Results

ChatGPT-4o stood out by correctly identifying the mistake. It nailed the task, showing that it’s currently the best among the models I tested. What amazed me was that it flagged the significant conversion mistake while ignoring minor, inconsequential discrepancies elsewhere in the article.

Google’s Gemini missed the error. It has the technical capabilities to catch up, but Google seems to be playing it safe for now.

Llama 3, being a text-only model, required a bit of extra effort: I had to OCR the scanned image of the article before feeding the text to the model. While Llama 3 showed promise, it didn’t catch the important error.

Similarly, ChatGPT-4 and Microsoft Copilot, which is based on OpenAI’s ChatGPT-4, both focused on less important errors.

ChatGPT-4o nailed it!

Google Gemini, Copilot, ChatGPT-4, and Llama 3 missed the error

Why ChatGPT-4o’s Performance is Awesome

The accuracy and efficiency of ChatGPT-4o in identifying the currency conversion error are impressive. Such capabilities make the model more useful for everyday tasks and can save a lot of time and effort.

The Future with AI Models and Fact-Checking Agents

Looking ahead, attaching these AI models to agents for more rigorous fact-checking could be incredibly beneficial. Such agents could help automate the verification of information, reducing the likelihood of errors in published content and improving accuracy and reliability more broadly.

It’s fascinating to see how differently these AI models perform and how the tech giants are approaching AI. While LLMs have handled such tasks fairly well over the past year and a half, my experiment shows that there is still room for improvement and innovation. For now, ChatGPT-4o seems to be leading the pack.
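As a small illustration of the kind of deterministic check a fact-checking agent might run alongside a model, here is a hedged sketch in Python. Everything in it is hypothetical: the function name, the regular expression, the 25% tolerance, and the assumed rate of INR 100 per GBP are illustrative choices, not part of any real agent framework or of the experiment described here.

```python
import re

CRORE = 10_000_000  # one crore is 10 million
SCALE = {"million": 1_000_000, "billion": 1_000_000_000}

# Matches phrases such as "GBP 1 billion (INR 100 crores)".
# ASSUMPTION: amounts are written in exactly this style.
PATTERN = re.compile(
    r"GBP\s+([\d.]+)\s+(million|billion).*?INR\s+([\d,.]+)\s+crores?",
    re.IGNORECASE,
)


def check_conversion_claim(sentence, inr_per_gbp=100.0, tolerance=0.25):
    """Return True if the stated INR figure is plausible, False if it is
    not, and None if no GBP-to-INR claim is found in the sentence."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    amount_gbp = float(match.group(1)) * SCALE[match.group(2).lower()]
    claimed_crore = float(match.group(3).replace(",", ""))
    expected_crore = amount_gbp * inr_per_gbp / CRORE
    # Accept the claim if it is within the tolerance of the expected value.
    return abs(claimed_crore / expected_crore - 1.0) <= tolerance


print(check_conversion_claim("GBP 1 billion (INR 100 crores)"))     # False
print(check_conversion_claim("GBP 1 billion (INR 10,000 crores)"))  # True
```

A real agent would need a live exchange rate and far more robust parsing, but a simple plausibility check like this is enough to flag an error of the size found in the article.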

About the Author

Venkatarangan Thirumalai
