Ian Mulvany

August 12, 2022

Some experiments with DALL·E and AI-generated images

image.png

("Robots painting an imagined future" - https://labs.openai.com/s/DVLdRxX1ttkpr02Y3E0GmHon).

This month I got access to DALL·E - the AI tool for generating images from text prompts. It's astonishing. It also feels odd to use, almost an amplification of the blank-canvas problem. I spent a lot of time looking at images created by DALL·E, reading the descriptions of those images, and thinking how clever, how astonishing. The emotional response when starting from the text, hoping that an image of some kind will come into view, is very different. It ranges over surprise, jaw-dropping astonishment, frustration and disappointment. I got 50 credits to start with, and so have been a bit selective in using them. It might be different again if the API were available uncapped, and if the response times were improved by a few factors. Those things are sure to happen. Right now I'm fumbling in the dark, with moments of inspiration emerging.

I am sure that these kinds of tools will very rapidly become accepted and normalised. Frank Chimero described this process over a decade ago: https://web.archive.org/web/20100909050013/http://blog.frankchimero.com/post/1059696119/there-is-a-horse-in-the-apple-store, but there is still a lot of magic to get through with this class of tools.

People are talking now about prompt engineering, where you develop heuristics for interacting with the model to drive it towards a desired result. I prefer the idea of prompt prospecting: it's an exploration. We don't have rules we can rely on, because the models have been trained on such large corpora of data.

If you are lucky enough to get access, the following prompt book is a great overview of the kinds of things you can do with DALL·E: https://pitch.com/v/DALL-E-prompt-book-v1-tmd33y. It has limitations, and some of those are mapped out nicely in this blog post: https://www.lesswrong.com/posts/uKp6tBFStnsvrot5t/what-dall-e-2-can-and-cannot-do
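One way to think about the prompt book's advice is that a prompt is a subject plus a stack of independent modifiers (art style, lighting, camera or film directives, mood). As a toy sketch of that idea - the function name and the modifier categories are my own, not from the prompt book or any official tooling - you could compose prompts like this:

```python
def build_prompt(subject, style=None, lighting=None, camera=None, mood=None):
    """Compose an image-generation prompt from a subject and optional
    modifiers, in the spirit of the prompt book's categories.

    Only the modifiers you supply are appended, comma-separated,
    after the subject."""
    parts = [subject]
    for modifier in (style, lighting, camera, mood):
        if modifier:
            parts.append(modifier)
    return ", ".join(parts)


# For example, a prompt in the style of the climber portrait below:
print(build_prompt(
    "portrait of a rugged climber",
    style="studio lighting",
    lighting="warm 2700K, dark background",
    camera="with bokeh",
))
```

This prints `portrait of a rugged climber, studio lighting, warm 2700K, dark background, with bokeh`. The useful part of the framing is that each modifier can be varied on its own while the others are held fixed, which makes the prospecting a little more systematic.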

I intend to use the shots I have left to do some exploration of scholarly themed prompts, but for now here are some of the hits and misses that I've had in the week or so that I have been playing with this.

There is a freely available model called DALL·E mini. There are a number of sites where you can interact with it, and you can even get the code and run it yourself in a Google Colab notebook. I did both to start interacting with these kinds of models. The difference I found between the free model and DALL·E is that the latter is much faster and produces higher-resolution images. For the prompt "lego mini figures enacting the passion of the christ" I got the following result from DALL·E mini:

image.png


The equivalent results from DALL·E are:

image.png


Stylistically they are very similar.

One of the first images I created was for my daughter, who is a huge fan of unicorns:

image.png

(a super cute and fluffy baby unicorn https://labs.openai.com/s/iKFOwPqn6AwJp0LpkeOiJIJn)

For my niece I generated the following image of a cat playing with an orange:

image.png

(cute kitten playing with an orange - Leica image - https://labs.openai.com/s/Fov3By4YiYsT40Rhfwz9kEz9)

I knew that DALL·E can do interesting things with directives about art styles, so I tried:

image.png


(A family playing board games in Switzerland - line art https://labs.openai.com/s/wgTkfUqKKfo5WeKk3P7mbu5s)

After reading through the prompt book mentioned above, I tried some prompts to get something more realistic looking:

image.png


(portrait of a rugged climber, studio lighting, New York Times, warm 2700K, dark background, with bokeh. - https://labs.openai.com/s/wCjWlfK8j9xf7c7NcHlx3hsq)

and

image.png


(A close-up black & white studio photographic portrait of royal robins, dramatic backlighting, 1973 photo from life magazine - https://labs.openai.com/s/8GhNPDhn5hoswqYSCL3As0my).

I'm going to continue exploring the tool, but it is clear that there is huge potential here. I don't know how that is going to pan out, but how we create is always shaped by the technologies that we adopt.
There was a very nice write-up in the Guardian a few weeks back on the experience of some artists working with generative models. What came across to me in their reflections is that they had all adopted the use of some form of image bank, be that Google Images, Flickr, or their own collection of images. Their underlying methods have been infused with digital elements, much in the way that we now function largely symbiotically with our devices. So DALL·E is just another step along this path. https://www.theguardian.com/technology/2022/jul/10/dall-e-artificial-intelligence-art