With advancements in machine learning, screenshots are quickly becoming a universal data format. It's now (relatively) easy to extract meaning (image-to-text), layout information (object recognition), text (optical character recognition, OCR), and other metadata (formatting, fonts, etc.).

And with diffusion-based models like Stable Diffusion and DALL-E, we also have an encoder going the other way: text-to-image.

Screenshots-as-API solves a few problems:

An image is worth a thousand words.