Omar Shehata

Dall-E Telephone Game

From Omar's notebook.


OpenAI's image model can't "see" images, as far as I understand. If you give it an image and ask it to "recreate it exactly" it will internally describe it, reprompt itself, and give you something new.

On the left is the original image.

Here's another example:

I don't know what's going on under the hood, but I think what we're seeing here is how things get "lost in translation" when the model describes the image, without enough detail, and when it recreates it, it fills in details with the "most likely" thing.

Rich on mastodon said this is a good way to see the bias of the model.

blinry asked, what happens if we do this over and over again with the same image???

Doing it over & over

Starting over from scratch with the original image, here's the full sequence. The prompt I used repeatedly was: "here's a brand new image. Recreate it exactly for me, no changes"

First step takes us to the stereotypical aesthetic of Dall-E when given no style instructions.

We get the word PIZZA on the box, and a separated slice.

More text! The slice is more separate!

Recursive pizza??

Flipping directions

It's interesting to me that direction stays consistent but sometimes flips. Very curious if this is an artifact of the image generation OR whatever system is converting the image to text first?

Next steps

What happens if you do this with different models? It'd be cool to have an app/web page where I can drag and drop and image and it repeatedly does this process like 10-20 times with the API and shows me the results.

ALSO: every time you regenerate from the same image you get different results. It'd be cool to automate like, creating 3 different variants from the same image, creating like a tree?? (kind of reminds me of Kenneth Stanley's experiment where you create images by mutating other images. But here we're playing with like latent space??)

(I also have a lot of other things I'd love to see, I don't know if anyone's already done this: given an image, I want to "add to it" another image, like a word vector? Like, nudge it in latent space. Like subtract two images, get a vector, add to another image?)