On UIs for AIs

How UX became the primary competitive advantage once models were democratised.

Gold rush

It's been almost two years since the release of ChatGPT, the fastest-growing app of all time, in November 2022. OpenAI showed what generative language models are capable of by releasing a very powerful model and letting everyone on the planet access it easily via a simple chat interface.

In a way, ChatGPT impressed people with its ability to quickly search and summarise information. In my mind, though, what made the general public go crazy for ChatGPT was partly its interface - you can talk to an AI. That's the Ex Machina-like story all the philosophers were warning us against - "This is the end of the world, we've created a machine that can answer questions about quantum physics that I don't know the answers to myself".

Fast forward two years, and a simple chat interface is still the dominant way of interacting with generative language models. We've experienced a storm of AI companies that let you essentially talk to your knowledge base through RAG, and models fine-tuned for every single use case imaginable.

Chat with PDFs, chat with a therapist, chat with a fictional character, chat with your meeting notes, chat with a porn star, chat with YouTube videos, chat with lecture notes. The majority of the products I've tried have been chat interfaces that you drop PDFs or links into.

This is a very intuitive interface for developers to build. Models accept text as a prompt and return a text response. Why not just pass it straight to the user? Yet despite chat being the most obvious interface, there are a few mavericks in the space trying to reinvent the wheel. For all the right reasons.
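To see just how little work the chat pattern takes, here's a minimal sketch of what most "chat to X" products boil down to. It assumes an OpenAI-style chat completions client; the model name and file path are placeholders, not a recommendation of any particular provider.

```python
# A minimal "chat to your document" loop, assuming an OpenAI-style client.
# The model name and file path are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

document = open("notes.txt").read()  # the PDF / transcript / page the user dropped in
messages = [{"role": "system",
             "content": f"Answer questions using this document:\n\n{document}"}]

while True:
    question = input("you> ")
    messages.append({"role": "user", "content": question})
    reply = client.chat.completions.create(model="gpt-4o-mini",
                                           messages=messages)
    answer = reply.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})
    print("ai>", answer)
```

Text in, text out, wrapped in a while loop. When the core of the product is this small, differentiation has to come from somewhere else.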

Why UI is playing first fiddle in AI now

The argument I want to make in this blog post is that UX has now become the #1 competitive advantage. This has been the case at the beginning of every interface revolution (hello web, hello smartphones, soon hello AR): we've got new toys, and we don't know exactly how to play with them yet.

Before 2022, if you developed a machine learning system that wasn't a generative language model, your competitive advantage was 1) having data that other people didn't have, and 2) having ML people who knew how to model a problem and build a well-performing solution.

You couldn't just ask a foundation model what to do, because zero-shot learners like GPT either weren't available or weren't good enough to use. You'd normally have to reproduce a complicated research paper and tweak it to your needs, which is harder than fine-tuning a HuggingFace model. What if you didn't have enough data to train something decent? Then you weren't even allowed to play the game. But if you developed an actual working model, then yes, it would sometimes be wrong; when it worked, though, it looked like magic to outsiders and gave you a competitive advantage.

But what happens when an indie hacker and a well-funded startup have access to the same state-of-the-art models from a provider like OpenAI or Anthropic, and neither needs much proprietary data for training or fine-tuning? All the value lies in the UX.1

How easy is it to use the model? Is it fast enough to provide a smooth experience? Is the UX tuned to a specific use case, or are you offering a chat interface like everyone else?

Mavericks

Not so long ago, GitHub Copilot from Microsoft was released. It's integrated into another Microsoft product, the popular IDE VSCode. I switched from ChatGPT to Copilot purely because it was more comfortable to have my assistant in the IDE rather than copy-pasting from the web. Recently, a friend recommended switching to Cursor, a small startup that redesigned the experience of using coding assistants. From what I can tell, their secret sauce was the UI. The team likely forked VSCode, adapted it for LLM-assisted coding and plugged in some off-the-shelf models. They didn't innovate on the quality of the outputs, but on the quality of the interface. The founders, CS majors themselves, understood how important things like shortcuts are to developers; they even added a "cursor ." command in the terminal to mimic the "code ." command that opens VSCode. Chef's kiss. I don't mind paying twice what I was paying for Copilot, just shut up and take my money.

Granola, an AI notepad for people with back-to-back meetings, has also made an interesting design choice. Instead of blindly transcribing everything you say and dumping it into a RAG system you can chat to, they took a slightly different path. The team likely understood that after two years of experimenting with AI systems, people have seen so many failed demos and hallucinations that they have little trust in AI tools. Would you trust an AI that does this or this to summarise a very important meeting? Instead, you write your own scrappy notes, and Granola makes them awesome. You, as a user, understand far better what's important about the meeting than some stupid AI you can't trust anyway. They're your notes, enhanced, not GPT's notes.
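I don't know what Granola actually does under the hood, but the design difference is easy to sketch at the prompt level. The two prompts below are purely hypothetical illustrations of who stays in control of the notes, not their implementation.

```python
# Hypothetical prompts illustrating the two approaches; not Granola's actual implementation.
transcript = "..."        # full meeting transcript (placeholder)
my_scrappy_notes = "..."  # the user's own rough notes (placeholder)

# The "trust the AI" approach: the model writes the notes from scratch.
summarise_prompt = (
    "Summarise this meeting transcript into notes:\n\n"
    f"{transcript}"
)

# The "enhance the user" approach: the user's scrappy notes stay the backbone,
# and the transcript is only used to expand and correct them.
enhance_prompt = (
    "Here are my rough notes from a meeting:\n\n"
    f"{my_scrappy_notes}\n\n"
    "Using the transcript below, expand and clean up MY notes. "
    "Keep my structure and emphasis, and don't add points I didn't flag as important.\n\n"
    f"{transcript}"
)
```

The model and the transcript are the same in both cases; the UX decision about whose notes they are is what earns the user's trust.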

A similar case is Perplexity, a Google Search competitor. The problem people have with ChatGPT is that you have to do the fact-checking yourself, and we still believe humans more than AIs. So instead of just returning the chat response, Perplexity also returns links to relevant Stack Overflow posts or YouTube videos. As a result, users trust its answers more.

Footnotes

  1. Having the same access to a foundation model is one thing; being able to create an AI system that works well is another. Instead of reading research papers, you likely end up trying prompts, different agentic workflows, etc. (which is a form of research, but a much more empirical one than usual), and this becomes another differentiator. I think of it as an early trend, and a good topic for another blog post!