Beating OpenAI? Claude 3.5, GPT4 and Beyond

Claude is LEGIT

Brad Yim, Rishi Khaitan, and Hyun Supul

Jun 26, 2024

Hardly a month passes without someone hyping a “ChatGPT killer.” This time, the usual suspect named Claude is back in the news cycle. Anthropic, the company behind Claude, announced “Claude 3.5 Sonnet,” a “frontier model” that it says outperforms GPT-4o. The model is currently available to free users of Claude, albeit with daily usage limits. Many free-tier users report hitting those limits after a handful of prompts, so Anthropic isn’t exactly being charitable with its top-of-the-line model.

At the same time, Claude gained a new feature called “Artifacts.” This long-awaited feature allows the Claude chatbot to execute the code it generates. However, it takes a very different approach from OpenAI’s ChatGPT. In this issue I want to dive deeper into this feature.

AI Reality Check

While I believe Claude AI’s claim that Claude 3.5 outscores ChatGPT in benchmarks, I doubt many real-life users are terribly impressed. The reasons are twofold:

1. Frontier models like GPT-4 and Gemini 1.5 are already quite capable at common cognitive and generative tasks, so average users are not too impressed by marginal improvements in this regard.
2. On the other hand, even the best LLMs still hallucinate and fail in unexpected ways.

Let me illustrate the second point with an example that’s making the rounds online: How many “r”s are in the word “strawberry”?

Oh no, how can you be so dumb while being so smart in many ways? Certainly, the much hyped “ChatGPT Killer” Claude 3.5 will do better, right?

“Claude can make mistakes. Please double-check responses.”

I will give credit where credit is due: Claude certainly sounds more confident and articulate than GPT-4o as it preaches falsehoods, like an expert con man would.

(Also to be fair, as of now Claude AI works noticeably faster than ChatGPT-4o. Even a casual user will notice this.)

Besides the well-known problem of hallucination, LLMs are also weak at simple counting and math tasks. This weakness motivated OpenAI to quickly introduce a feature that lets ChatGPT write and execute Python code for math-oriented tasks. Even for relatively straightforward tasks like counting the words in a paragraph, GPT-4o prefers to write Python code to derive the answer rather than rely on its own guess.
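
To see why handing the job to code helps, consider how little Python the counting tasks above actually need. This is only a minimal sketch, not the code ChatGPT actually generates; the function names `count_letter` and `count_words` are made up for this illustration.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


def count_words(text: str) -> int:
    """Count whitespace-separated words in a piece of text."""
    return len(text.split())


if __name__ == "__main__":
    print(count_letter("strawberry", "r"))  # prints 3
    print(count_words("How many words are in this sentence?"))  # prints 7
```

The interpreter counts characters deterministically, so the answer no longer depends on the model's guess.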

Claude’s Artifacts: Impressive Presentation

Claude took a very different approach to code execution. Unlike ChatGPT, which prefers to write Python code, Claude prefers to write JavaScript. What does that mean, and why does it matter? ChatGPT executes the Python code on the server side and returns the result to the client. In contrast, Claude can’t execute Python or any other code on its servers. Instead, it writes JavaScript code that runs in the user’s web browser. ChatGPT, for its part, can write JavaScript but currently can’t execute it in your browser.

The result is a much more visually impressive presentation by Claude, as it can pump out interactive code that runs right in your browser.

To use this feature, you need to enable the “Artifacts” option manually, as it’s turned off by default. Once you sign up with Claude.AI, go to the “Feature Preview” menu under the user icon and turn the Artifacts option on.

To illustrate the difference between ChatGPT and Claude, let’s say that you ask ChatGPT: “Write code for a simple memory game.”

The code goes on and on. To run this code, you first need to set up a Python environment on your system, copy the code, save it to a file, and then run a Python command to execute it. While this process is not rocket science, many ChatGPT users won’t know how to go through the steps.

In contrast, if you give Claude 3.5 the same kind of request, it rapidly generates JavaScript and HTML code that runs right in your browser as you watch. No technical knowledge needed.

This instant interactivity makes for a terrific presentation, and the crowd online has been raving about the feature. The fact that Claude 3.5 is measurably better than GPT-4 at coding doesn’t hurt. Yes, Claude 3.5 is widely considered the best available model for writing code as of now.

Combined with Claude’s vision capability, it can also turn a screenshot or a drawing into running HTML code in seconds.

Claude AI turns the above drawing into the following, right inside your browser:

So, What are the Limitations?

As stated, Claude can’t run code on the server side. That means, unlike ChatGPT, it can’t execute code on its own to answer your math-oriented questions and other queries that require the use of Python libraries.

Using ChatGPT, you can make complex requests like: “Ingest the provided spreadsheet file and create a PowerPoint presentation visualizing the data.” ChatGPT will then write Python code using libraries for PowerPoint file generation. It will then provide a link where you can download the file. Claude is unable to perform comparable tasks because everything has to run on the client side.

Claude will also frequently write JavaScript code that can’t run in your browser because a required library is missing. To avoid this situation, you need to tell Claude “Don’t use unsupported libraries.” This request, however, limits the versatility of the code Claude generates.

Additionally, unlike ChatGPT, Claude can’t generate images. There are other free services for generating images, so this should not be too much of an issue.

API Cost

Currently, Claude 3.5 Sonnet’s API pricing is pretty similar to OpenAI’s.

Claude 3.5 Sonnet: $3 per million input tokens, $15 per million output tokens.

GPT-4o: $5 per million input tokens, $15 per million output tokens.

Therefore, while Claude is 40% cheaper on input, it offers no savings on output.
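
To make the comparison concrete, here is a small sketch that applies the prices quoted above to a workload. The dictionary keys are just labels for this example (not official API model identifiers), and the workload of 2 million input tokens and 500,000 output tokens is hypothetical.

```python
# Per-million-token prices in USD, taken from the figures quoted above.
# The keys are illustrative labels, not official API model identifiers.
PRICES = {
    "claude-3.5-sonnet": {"input": 3.00, "output": 15.00},
    "gpt-4o": {"input": 5.00, "output": 15.00},
}


def api_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a given number of tokens."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 0.5M output tokens.
    for model in PRICES:
        print(f"{model}: ${api_cost(model, 2_000_000, 500_000):.2f}")
```

For this workload the estimate comes to $13.50 for Claude 3.5 Sonnet versus $17.50 for GPT-4o. The gap comes entirely from input tokens, so output-heavy workloads will see little difference.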

What Will OpenAI Offer?

Recently, OpenAI’s CTO Mira Murati had an interview where she revealed further details about… the next OpenAI model.

GPT-5 will have ‘Ph.D.-level’ intelligence (msn.com)

It’s worth noting that she didn’t call the next-gen OpenAI model “GPT-5.” Many, including the author of the above article, choose to call it “GPT-5” anyway. Not too surprisingly, Murati claims that the next model will be a significant leap in capabilities. She likens GPT-4 to a “smart high schooler,” while the next-generation model will have “Ph.D.-level intelligence.” When asked when the model will be available, she suggested about “a year and a half.”

Her statement puts a damper on those who anticipated the arrival of GPT-5 this year. It’s unclear whether OpenAI has a plan to fend off ongoing challenges from competitors like Claude while its next great model is getting ready. On the other hand, it raises the anticipation that OpenAI is aiming for a greater goal, perhaps even a breakthrough closer to AGI. Hopefully, we will then have an AI model that can tell how many “r”s are in “strawberry.”

Hacking GPT : Thriving in the AI Revolution
