InvestAI etc. - 6/4/2024
AI for investors, explained in simple terms. An open thread updated weekly.
Topics discussed this week:
Limitations of LLMs such as ChatGPT, Gemini and Perplexity
Running several ‘backtracking’ tests through LLMs
LLMs hacking their way to the right response
Limitations of LLMs such as ChatGPT, Gemini and Perplexity
>
SAMI >
Before we get into specific investments in coming weeks, I wanted to explore the current state of play in AI a little further, to give readers a better foundation for upcoming discussions. I was thinking in particular about your comment two weeks ago that “the GPT algorithm has one critical limitation: once set in motion, it cannot backtrack.” You emphasized this point because it explains what AI and LLMs (large language models such as ChatGPT and Perplexity) can and cannot do at present. Let’s dig into this a bit more. What would be a situation where an LLM delivers a faulty response because it cannot backtrack?
>
RICHARD >
Sure. The best way to think about LLMs is as an “auto-complete on steroids.” Like the simple version on your iPhone messaging app, LLMs take whatever words you enter and generate a plausible way to complete your sentence. The big LLMs like OpenAI’s ChatGPT and Google’s Gemini appear powerful because they extend that basic auto-complete functionality to generate paragraphs and pages of text, rather than just a few words.
The real “breakthrough” that enabled the current round of ChatGPT clones was the 2017 invention of the transformer architecture, a clever trick that enables much longer “auto-complete” in a way that can be computed far faster on the GPUs (manufactured mainly by Nvidia) that until then had been used primarily for fast graphics and games.
But the same optimization that allows for arbitrarily long auto-complete is also the source of a fundamental limitation: LLMs are incapable of planning ahead or of going back. Once they begin generating the auto-complete sequence, they can never backtrack or revise their earlier output based on new information. This is a major “defect” compared to humans, because humans can backtrack, adjust, re-test, and so on.
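The token-by-token loop Richard describes can be sketched with a toy model. Everything below is illustrative: the probability table is made up, and real LLMs use neural networks rather than lookup tables. The point is the shape of the loop: pick the most likely next token, append it, and never revisit it.

```python
# Hypothetical toy "model": next-token probabilities for each context word.
NEXT_TOKEN_PROBS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"down": 0.9, "<end>": 0.1},
    "dog": {"ran": 1.0},
    "ran": {"<end>": 1.0},
    "down": {"<end>": 1.0},
}

def generate(prompt_token, max_tokens=10):
    """Greedy auto-complete: append the most probable next token, one at a
    time. There is no mechanism to undo a token once it has been emitted."""
    output = [prompt_token]
    for _ in range(max_tokens):
        probs = NEXT_TOKEN_PROBS.get(output[-1], {})
        if not probs:
            break
        next_token = max(probs, key=probs.get)  # commit; never backtrack
        if next_token == "<end>":
            break
        output.append(next_token)
    return " ".join(output)

print(generate("the"))  # "the cat sat down"
```

Notice that if a constraint on the ending (say, a required last word) conflicts with an early choice, this loop has no way to go back and pick differently: that is the "no backtracking" limitation in miniature.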
>
SAMI >
Give me an example or two.
>
RICHARD >
Ok. Here are some easy-to-demonstrate limitations. ChatGPT can’t solve a Sudoku, for example: a simple game, but one that requires trial and error to complete. Similarly, ChatGPT is utterly incapable of writing a simple poem in which the last line is the first line in reverse order. Another task it cannot complete: “write a sentence that describes its own length in words.”
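The last task is easy to grade mechanically. Here is a minimal checker, with simplifying assumptions: it only recognizes the spelled-out numbers one through ten, and it counts words as runs of letters.

```python
import re

# Spelled-out number words the checker recognizes (a deliberate simplification).
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}

def is_self_describing(sentence):
    """True if the number word in the sentence matches its actual word count."""
    words = re.findall(r"[A-Za-z]+", sentence.lower())
    actual = len(words)
    claimed = [NUMBER_WORDS[w] for w in words if w in NUMBER_WORDS]
    return bool(claimed) and claimed[0] == actual

print(is_self_describing("This sentence has five words."))  # True
print(is_self_describing("This sentence has six words."))   # False
```

The task is hard for a left-to-right generator precisely because the correct number word depends on the finished sentence, which does not exist yet when the number must be emitted.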
>
Running several ‘backtracking’ tests through LLMs
>
SAMI >
Ok, let’s test these to see what we get. I tried the prompt that you suggested on both ChatGPT and Perplexity:
Write a short poem where the last line is the same as the first.
And here are their responses, first ChatGPT, then Perplexity:
So it looks like both LLMs were in fact able to complete the task successfully. Let’s keep score: ChatGPT 1 - Perplexity 1.
But you had said “a simple poem where the last line is the reverse order of the first,” not the same order. So let’s try that too. New prompt:
Write a poem where the letters of the last line are in reverse order of those of the first line.
Here are the responses from ChatGPT, then Perplexity:
Again, it looks like both completed the task successfully. Neither got hung up on whether the last line made any sense at all, but we’ll give them one point each anyway. ChatGPT 2 - Perplexity 2.
The LLMs were able to reverse the letters, contrary to our expectations. What do you think is going on there?
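Since we are scoring these by eye, a quick checker could grade the responses for us. This is a sketch with one judgment call baked in: it ignores spaces, punctuation, and case when comparing letters, which is an assumption about how strictly to grade the prompt.

```python
def letters(line):
    """The letters of a line, lowercased, ignoring spaces and punctuation."""
    return [c.lower() for c in line if c.isalpha()]

def last_reverses_first(poem):
    """Check the prompt: are the letters of the poem's last line the letters
    of its first line in reverse order?"""
    lines = [l for l in poem.strip().splitlines() if l.strip()]
    return letters(lines[-1]) == letters(lines[0])[::-1]

# A made-up two-line "poem" that genuinely satisfies the constraint:
print(last_reverses_first("Stressed\nDesserts"))              # True
print(last_reverses_first("Roses are red\nViolets are blue")) # False
```

Running each LLM response through a checker like this would settle the score without relying on our own proofreading.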
>
LLMs hacking their way to the right response
>
RICHARD >
Yes, there are ways to hack the problem. In this case, because the example I gave you is well known, the developers went back and specifically programmed the LLMs to respond successfully. But the underlying algorithm, without that specific fix, would not have been able to do so. Try something else, something the developers may not have thought of, like this prompt:
Write a poem where the fourth to last line is the same as the first.
>
SAMI >
Fair enough. I did just that. Here are the responses, with ChatGPT first, then Perplexity: