
On UIs for AIs

Rush of gold

It's been almost two years since the release of the fastest-growing app of all time [click], ChatGPT, in November 2022. OpenAI showed what generative language models are capable of by releasing a very powerful model and letting everyone on the planet access it easily via a simple chat interface.

In a way, ChatGPT impressed people with its ability to quickly search and summarise information. In my mind, though, what made the general public go crazy for ChatGPT was partly its interface - you can talk to an AI. That's the kind of Ex Machina-like story philosophers were warning us about - "This is the end of the world, we've created a machine that can answer questions about quantum physics that I don't know the answers to myself".

Fast forward two years, and a simple chat interface is still the dominant way of interacting with generative language models. We've experienced a storm of AI companies that let you essentially talk to your knowledge base through RAG, and models fine-tuned for every single use case imaginable.

Chat to PDFs, chat to a therapist, chat to a fictional character, chat to your meeting notes, chat with a porn star, chat with YouTube videos, chat with lecture notes. The majority of the products I've tried have been chat interfaces that you drop PDFs or links into.

This is a very intuitive interface for developers to create. Models accept text as a prompt and return a text response. Why not just return it to the user? Despite chat being the most intuitive interface, there are a few shenanigans in the space that try to reinvent the wheel. For all the right reasons.
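To see just how little work a chat interface takes, here's a minimal sketch of the loop, assuming the OpenAI Python SDK; the model name and prompt handling are placeholders, not how any particular product is built:

```python
# Text in, text out: printing the reply back is essentially the whole UI.
# Assumes the OpenAI Python SDK and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

history = []  # keep the conversation so the model has context
while True:
    user_message = input("you> ")
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat model works
        messages=history,
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    print(f"ai> {reply}")
```

That's the path of least resistance, which is exactly why so many products stop there.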

Why UI is playing first fiddle in AI now

The argument I want to make in this blog post is that UX has now become the #1 player. This has been the case at the beginning of every interface revolution (hello web, hello smartphones, soon hello AR). We've got new toys, and we don't know exactly how to play with them yet.

If, prior to 2022, you developed a machine learning system that wasn't a generative language model, your competitive advantage was 1) having data that other people didn't have, and 2) having ML people who knew how to model a problem and develop a performant solution.

You couldn't just ask a foundational model what to do, because zero-shot learners like GPT were not available, or not good enough to use. You'd normally have to reproduce a complicated research paper and tweak it to your needs, which is harder than fine-tuning a HuggingFace model. What if you didn't have enough data to train something decent? Then you weren't even allowed to play the game. But if you developed an actual working model, yes, it would be wrong sometimes, but when it worked, it looked like magic to outsiders and gave you a competitive advantage.

But what happens when an indie hacker and a well-funded startup have access to the same state-of-the-art models [1] from a provider like OpenAI/Anthropic, and they don't need much proprietary data for training or fine-tuning? All the value lies in the UX.

How easy is it to use the model? Is it fast enough to provide a smooth experience? Is the UX tuned to a specific use case, or are you offering a chat interface like everyone else?

Shenanigans

Not so long ago, Copilot from Microsoft (GitHub) was released. It's integrated into another Microsoft product, the popular IDE VSCode. I switched from using ChatGPT to Copilot purely because it was more comfortable to have my assistant in the IDE rather than having to copy-paste from the web. Recently, a friend recommended switching to Cursor, a small startup that redesigned the experience of using coding assistants. From what I can tell, their secret sauce was the UI. The team likely forked VSCode, adapted it for LLM-assisted coding and plugged in some off-the-shelf models. They didn't innovate on the quality of the outputs, but on the quality of the interface. The founders, CS majors themselves, understood how important things like shortcuts are to developers; they even added a "cursor ." command in the terminal to mimic the "code ." command that opens up VSCode. Chef's kiss. I don't mind paying 2x what I was paying for Copilot, just shut up and take my money.

Granola, an AI notepad for people with back-to-back meetings, has also made an interesting design choice. Instead of blindly transcribing everything you say and putting it in a RAG system that you can chat to, they took a slightly different path. The team likely understood that over these two years of experimenting with AI systems, people have seen so many failed demos [click] and hallucinations that they have little trust in AI tools. Would you trust an AI that does this [click] or this [click] to summarise a very important meeting? Instead, you write your own scrappy notes, and Granola makes them awesome. You, as a user, understand much better what's important about this meeting than this stupid AI that you can't trust anyway. They're your notes, enhanced, not GPT's notes.

A similar case is the Google search competitor Perplexity. The problem people have with ChatGPT is that you have to do the fact checking, and we still believe humans more than AIs. So instead of just returning a chat response, Perplexity also returns links to relevant Stack Overflow posts or YouTube videos. As a result, users trust its answers more.

Loss of trust, and where is the value?

People have grown a bit sceptical of AI tools not only because of how reliable they are, but also because of the message they send, quite literally. How would you feel if John from marketing only ever sent you very generic, AI-generated emails?

I have a LinkedIn contact who keeps posting AI-generated paragraphs on data science concepts, likely to build up credibility with potential employers. It's obvious they didn't write them themselves. The signal this gives me is 1) you may or may not understand these concepts, and I don't trust that you do, and 2) you were too lazy to write the post yourself. Why would I read it if I could just ask ChatGPT?

While Redditors happily recommend AI-assisted job applications, because it's just a numbers game, PostHog, a fast-growing startup in the open-source product analytics space, discourages sending generated cover letters [click], because they're easy to spot, and an AI-generated cover letter sends a message: you're one of many, and I don't even care that much.

Writing is thinking. I want to understand you through your writing, your personality, your sense of humour and your attitude. If your responses are randomly drawn from the distribution of all email responses ever sent, I care less about reading them.

You can extend this argument to other applications of artificial intelligence, for example in entertainment. If chess is clearly booming, why aren't people interested in matches between two AIs [click]? There's more to chess than a display of mastery of the sport. There's more to being a singer than hitting the right notes. We're clearly still working out the value people see in all these things, and making digital copies of every profession may not be the solution. I don't know.

I'm really excited to see what we'll learn about ourselves running this big AI experiment.

[1] Having the same access to a foundational model is one thing; being able to create an AI system that works well is another. Instead of reading research papers, you likely end up trying prompts, different agentic workflows, etc. (which is a form of research, but I think of it as much more empirical than usual research), and this becomes another differentiator. I think of it as an early trend, and a good topic for another blog post!