Posting technology predictions publicly is a great way to prove just how wrong you can be (ask Steve Ballmer). Nevertheless, I'll try my luck, and I'll explain why at the end of the post.
So, where are large language models (LLMs) like ChatGPT and its peers headed?
I see two technological shifts that, together, are poised to significantly alter our experience with GenAI:
- The models' context windows are expanding dramatically. GPT-3 had a context window, essentially its working memory, of about 1,536 words, whereas GPT-4 now handles about 24,000 words (Claude handles about 150,000 words, and Google's Gemini 1.5 an astounding 750,000 words). These word counts follow directly from the models' token limits, as the sketch after this list shows.
- The cost of operating large models is dropping thanks to optimizations. This is felt most directly by those working with the API, but it matters mainly to the AI companies themselves. GPT-4 Turbo was significantly cheaper than GPT-4, and the new GPT-4o is cheaper (and faster) still, to the point that OpenAI has made it free to the public.
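For intuition about where those word counts come from: each model has a limit measured in tokens, and a common rule of thumb is that one token corresponds to roughly 0.75 English words. A minimal sketch of the arithmetic (the token limits are the publicly stated ones at the time of writing; the 0.75 factor is a heuristic, not an exact conversion):

```python
# Rough words-per-token heuristic for English text (~0.75 words/token).
WORDS_PER_TOKEN = 0.75

# Publicly stated context-window sizes, in tokens, at the time of writing.
context_windows_tokens = {
    "GPT-3": 2_048,
    "GPT-4 (32k)": 32_000,
    "Claude": 200_000,
    "Gemini 1.5": 1_000_000,
}

for model, tokens in context_windows_tokens.items():
    approx_words = int(tokens * WORDS_PER_TOKEN)
    print(f"{model}: {tokens:,} tokens ≈ {approx_words:,} words")

# Output:
# GPT-3: 2,048 tokens ≈ 1,536 words
# GPT-4 (32k): 32,000 tokens ≈ 24,000 words
# Claude: 200,000 tokens ≈ 150,000 words
# Gemini 1.5: 1,000,000 tokens ≈ 750,000 words
```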
These technological advancements might seem irrelevant to you personally, but they will form the basis of a revolution! Until now, getting useful answers from a language model required some "prompt engineering": supplying the model with all the necessary context and nudging it toward the most relevant parts of its vast knowledge base. That demand for such a drastic change in behavior is precisely what keeps most people from adopting language models in their daily routines.
However, with larger context windows and lower operating costs, the tech giants can now embed language models into the platforms themselves: the computer, the smartphone, and the browser. Not superficially, as a text window disconnected from any knowledge beyond what was typed into it, but as an agent aware of everything you are doing up to the moment you address it. The conversation can then start intuitively, without unnecessary explanations, just as with a colleague who has been following everything you've done so far (a bit creepy), or like Tony Stark talking to Jarvis. Those "naive prompts" that until now produced worthless answers will be enough to get accurate, relevant ones, because they are reinforced with all the required context gathered by tracking everything you have done, over a long period.
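To make the difference concrete, here is a minimal sketch using the OpenAI Python SDK. The naive prompt stands alone; the embedded-assistant version prepends a log of recent user activity as context. The `recent_activity` log and how it is captured are hypothetical; in a real platform integration, the OS or browser would supply it:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical activity log that a platform-level assistant could capture.
recent_activity = """\
10:02  Opened budget_2024.xlsx, edited the 'Marketing' sheet
10:15  Searched the web for 'average CAC in SaaS'
10:21  Drafted an email to Dana titled 'Q3 budget questions'
"""

naive_prompt = "Is this number reasonable?"

# Naive usage: the model has no idea what "this number" refers to,
# so the answer is generic at best.
naive = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": naive_prompt}],
)

# Embedded-assistant usage: the exact same naive prompt, reinforced
# with the user's recent activity so the model can resolve the question.
assisted = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": "You are a personal assistant. Here is what the user "
                    "has been doing recently:\n" + recent_activity},
        {"role": "user", "content": naive_prompt},
    ],
)

print(assisted.choices[0].message.content)
```

The large context window is what makes this viable: hours or even days of activity can be prepended to every naive prompt without exceeding the model's limit, and the falling cost per token is what makes doing so economical.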
Now that intuitive use is within reach, the real race between the tech giants begins: to become the personal assistant of each and every one of us.
I opened the post by promising to explain the rationale for publishing technology predictions that might turn out wrong and embarrassing. The simple reason is that I have been sharing these ideas with those close to me for a long time, and lately I see them taking shape: in Microsoft's new Copilot Recall feature, which is currently pitched as assisting users but whose real goal, I believe, is to provide context to the model itself; and in OpenAI's impressive demos, which blur the line between our present and science-fiction scenarios with sexy-voiced virtual assistants.
In short, I want to write this down now so that in the near future I can claim the right, and the pleasure, to tell you: I told you so!