In one of the first posts on this blog, I talked about how APIs would always beat RPA (Robotic Process Automation). A brief summary of that argument:

SaaS as a software model breaks RPA – updates ship without user intervention, and different users can see different interfaces (A/B testing). As a tangential point, the shift to collaborative, web-first software also takes scriptability away from users (e.g., the lack of a Win32 API – see The Programmable Web).

But something I missed was our propensity to build human interfaces. The most intriguing models, like GPT-3 and Stable Diffusion, use natural language as the primary interface. GPT-3 isn't precise; it's human-like (in both input and output). And the class of end-user SaaS software is only growing: buttons to click, tables to sort, GUIs to navigate.

It's why the idea of Screenshots as the Universal API is so enticing – it's the human visual interface (vs. language via text or audio). Would it ever be possible to traverse these human interfaces as efficiently and reliably as we do machine interfaces? Why use an imperfect intermediate representation like the PDF if we could manipulate and format a raw image just as easily?

On the other hand, maybe human interfaces are an artificial constraint. Why use natural language when there are infinitely more efficient message-passing formats? Why limit images to 3 dimensions? Code isn't natural language (maybe one day it will be), but it is the end product of many developers.

I make a case for LLMs packaging natural language in a better format (i.e., formatted or schema-driven responses) in AI Interfaces.
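One way to make that concrete: instead of consuming an LLM's free-form prose, prompt the model to emit a machine-readable payload and validate it against a minimal schema before use. The snippet below is a hedged sketch of that pattern – `raw_response` stands in for a hypothetical model output, and `parse_structured` is an illustrative helper, not part of any real LLM API.

```python
import json

# Hypothetical raw LLM response: the model was prompted to emit JSON
# matching a known schema instead of free-form natural language.
raw_response = '{"title": "APIs vs. RPA", "sentiment": "positive", "tags": ["saas", "automation"]}'

def parse_structured(raw: str, required_keys: set) -> dict:
    """Parse an LLM response and enforce a minimal schema contract."""
    data = json.loads(raw)  # fails loudly on malformed output
    missing = required_keys - data.keys()
    if missing:
        raise ValueError(f"response missing keys: {missing}")
    return data

result = parse_structured(raw_response, {"title", "sentiment", "tags"})
print(result["sentiment"])  # → positive
```

The point is the contract, not the parser: once the response is schema-driven, downstream code can treat the model like any other machine interface rather than a human one.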