The 2004 film I, Robot is a futuristic murder mystery about a robot-hating Chicago detective and a robot who appears to have become sapient. In the movie, the detective (played by Will Smith) and the robot discuss the possibility that a robot mind could create art. Twenty years ago, it was still an open question whether computers could create visual art. Since then, computer-generated art has become much more of a reality. Earlier this month, an AI-generated image won an art contest, causing a great deal of controversy. It looks like we might have underestimated the computers.
This post describes recent developments in computer-generated images.
What are “AI-Generated Images”?
These are new software platforms that specialize in the generation of novel images based on text descriptions. Consider the three images below, which I generated using the Dall-E 2 platform. I used the prompt: “A sad robot sits and smokes a cigarette in Washington Square Park at sunset.”
This software is built on a massive database of art with attached textual descriptors (as you might find on Flickr or Pinterest). Algorithms go through these databases to “learn” which visual properties are commonly associated with which descriptors. So, for example, the algorithm will go through millions upon millions of pictures and develop estimates of, say, what shapes are present in pictures of “robots”, how bodies are posed in pictures described as “sad”, what backgrounds are common in “Washington Square Park”, or what colors appear “at sunset”.
The program generated these images based on an “understanding” of the visual elements that people might expect when told they would see this sort of image. It produces estimates, or expectations, of what might be in a hypothetical image, based on the visual features that are generally present in images carrying those textual descriptors.
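To make that idea concrete, here is a toy sketch of my own (an illustration of the co-occurrence idea, not how Dall-E 2 is actually implemented) that tallies which visual properties go together with a caption word in a tiny, invented collection:

# Toy illustration: tally which visual properties co-occur with a caption
# word across a tiny, made-up captioned-image collection. Real systems
# learn far richer associations with neural networks, but the spirit is
# similar: descriptors become statistical expectations about images.
from collections import Counter

captioned_images = [
    (("orange sky", "long shadows"), "a park at sunset"),
    (("orange sky", "silhouettes"), "city skyline at sunset"),
    (("blue sky", "green grass"), "a park at noon"),
]

def properties_for(word, dataset):
    counts = Counter()
    for properties, caption in dataset:
        if word in caption:
            counts.update(properties)
    return counts

print(properties_for("sunset", captioned_images))
# Counter({'orange sky': 2, 'long shadows': 1, 'silhouettes': 1})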
This is different from assembling other people’s images as if they were collages. These images did not exist prior to this act of generation. The person depicted to the right, whom I described as “An old man with a wrinkly face, round glasses, a warm smile, and ruby red lipstick,” does not exist. The image did not come in this specific form from me (the author of the textual description) or from the works that “inspired” it. It was generated by a predictive model that used a database of pictures to produce a guess at what an actual picture described that way would contain.
Some argue that these images are influenced by, but not necessarily reproduced from, existing images. This is an important feature of these images. If they are indeed original and seen as “inspired” by (rather than stolen from) artists (which is my view), then they constitute original intellectual property owned by their “authors”: the person who entered the query, subject to terms set by those who created the image-generating software. As of mid-2022, most image-generating programs give users a license to use generated images as they wish, except under particular circumstances (like the use of image generation to create pornography, to depict real people or other intellectual property, or to generate hate material). If artists ultimately convince courts or lawmakers that such “inspiration” is in fact intellectual property theft, then they may also have claims to these images.
Like any technological advancement, computer-generated imagery creates winners and losers. At present, it is an extremely affordable way to produce visual content for content-creation enterprises that do not specialize in visual content. On the other hand, it is an example of computers performing work that people once performed, and it threatens the livelihoods of the artists who did that work.
Will computer-generated images ruin the viability of graphic artists or photographers? Not entirely. It may hurt the market for some low-hanging fruit like, say, stock photography or clip-art generation. I do not believe that it can produce highly meaningful innovations in visual arts, because the computer is mimicking rather than truly generating something that has never been seen. I believe that our eyes will soon adjust to what these algorithms produce, and it will come to be seen as cheap art. It is also my sense that many visual artists build personal brands or followings, such that audiences see it as meaningful that a particular individual created that cultural object. I do not think those aspects of the market will be ruined, even if technological advancement hurts some old income streams. It is clear, though, that creators who deal in image generation will have to adapt.
How To
The easiest method is to use the Dall-E 2 platform (above). However, some may wish to install an image generation engine locally. For that, we can use Stable Diffusion.
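Before getting to the local install, here is a minimal sketch of what programmatic generation can look like using the open-source Hugging Face diffusers library (an assumption about your setup, not the method installed below; it requires an NVIDIA GPU with CUDA, and exact API details vary by diffusers version):

# Minimal sketch: generate one image with Stable Diffusion via the
# Hugging Face diffusers library. Assumes an NVIDIA GPU with CUDA and
# that you have accepted the model license on Hugging Face.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",  # model ID as of mid-2022
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "A sad robot sits and smokes a cigarette in Washington Square Park at sunset."
image = pipe(prompt).images[0]  # a PIL image
image.save("sad_robot.png")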
Installing Stable Diffusion on Your Device
Device Requirements. A modern Windows computer with a GPU that has at least 6 GB of VRAM and a hard drive with at least 10 GB of free space.
Installation Method. It is possible to generate images on your own device using an open-source program called Stable Diffusion. Click here for an (untested) illustrated tutorial on how to install Stable Diffusion on your Windows device with NVIDIA graphics drivers. I used this method to install Stable Diffusion on my device with an AMD GPU. A few notes on this install:
- Step 2.2 of the install shows the operation being performed in the project’s root folder. Make sure your working directory is instead set to the “diffusers-dml” subdirectory.
- Your install may halt at Step 4 due to an error related to PyTorch. You may have a build of PyTorch meant for devices with NVIDIA GPUs; the snippet after this list shows how to check which build you have. Check out this fix.
- I needed to uninstall and reinstall the scipy, tokenizer, and regex packages.
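To check which PyTorch build you have (relevant to the Step 4 note above), run the following from a Python prompt; torch.version.cuda reports a CUDA version on NVIDIA builds and None otherwise:

# Check which PyTorch build is installed. CUDA (NVIDIA) builds report a
# CUDA version string; CPU and DirectML builds report None.
import torch

print(torch.__version__)
print(torch.version.cuda)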
Generating Images
This installation provides a Python script called dml_onnx.py. To enter your image prompt, change this line in the script:
prompt = "INSERT IMAGE PROMPT HERE"
So, to generate a nice oil painting of Queens College, I might enter something like:
prompt = "Queens College in Flushing NY, Fall Day, Oil Painting, Masterpiece, Award Winning"
And I got this picture. What’s interesting is that this is not Queens College, but rather a fictitious building with an exterior fence like the one surrounding campus, bricks like the library’s, and maybe a few other reminiscent features.
What makes it fun is that there are innumerable permutations of the process, so you can get something different every time by changing the random seed. Here’s what I got on the next iteration with the exact same prompt, after changing the random seed in the following line from 42 to 142:
torch.manual_seed(142)
And I got this:
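If you would rather script this seed-hopping than edit the file by hand each time, here is a sketch using the diffusers pipeline from earlier (again an assumption about your setup; passing a seeded generator is diffusers’ way of making runs reproducible, and the dml_onnx.py script may handle seeding differently):

# Sketch: render several variants of one prompt by varying the seed.
# Assumes the CUDA diffusers setup sketched earlier in this post.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
).to("cuda")

prompt = "Queens College in Flushing NY, Fall Day, Oil Painting, Masterpiece, Award Winning"
for seed in (42, 142, 242):
    generator = torch.Generator("cuda").manual_seed(seed)  # fixed seed -> reproducible image
    image = pipe(prompt, generator=generator).images[0]
    image.save(f"queens_college_seed_{seed}.png")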
The Dall-E 2 system (above) gives you four choices right off the bat. Here are three that Dall-E 2 gave me for this prompt. There are more options, but we will save those for another post.
Promise and Peril
There have been amazing advancements in computer image generation. Like any technological advancement, it presents opportunities and threats. As discussed above, the technology poses a threat to content creators who deal in image generation, but it is also a source of cheap, quality art for creators who use images to make other products. I am interested to see where the courts and legislators go on these issues.
More broadly, the march of technology is highly impressive. Computers can create original art, at least on some level. Ideas that seemed fantastic 20 or so years ago are now very close to being realities. It does not seem to be a stretch to imagine entire movies generated from prompts a few decades from now. What will happen to all those people who make movies, TV shows, and commercials?