The Artist, The Curator, and the Neural Net: A Conversation with Robbie Barrat
Robbie Barrat (@DrBeef_) is an artist and researcher who uses artificial neural networks to generate artwork including landscape and nude paintings, fashion runway photos, and three-dimensional physical structures. At 19 years old, Robbie has already had his work featured on the cover of Bloomberg Businessweek and his code appropriated by an artist collective that sold a derivative artwork for $432,500 at Christie’s this month. Robbie’s most recent project involves feeding Balenciaga runway stills to a conditional adversarial network in order to generate novel outfits for new runway poses.
Seth Thompson: I’m interested in the idea that the output images you are producing from the neural networks are a reflection or a distortion of some sort of latent idea in the training images. How do the output images help us understand things about the input images that we might not have thought to articulate?
Robbie Barrat: I have two projects that are super relevant: the landscapes and the nude portraits that I did earlier this year. I like the nude portraits better because the output looked completely different from the input images. It was a misinterpretation by the network. It wasn’t fed any art like Francis Bacon or any freaky nude portraits. It was fed very classic nude portraits, but it produced these awful fleshy images instead.
I think it’s interesting that AI has the potential to accidentally produce these images. With traditional generative art, if you get a surprising output, it’s because you screwed up some parameter you were defining or made a mistake. You can still get surprises that way, but it’s really more of a glitch than something misinterpreted by the program.
Have you seen the AI Balenciaga project?
I extracted poses (poses are just silhouettes of people) from Balenciaga’s fashion campaigns, lookbooks, and runway shows and I’m trying to teach a neural network to take in a silhouette and then produce the outfit associated with it, to reconstruct the outfit, in other words.
And that’s an impossible task. There’s no way you can reconstruct an outfit from a detail-less silhouette. Because of that, the network has been able to come up with some really strange results and some weird outfits that you would never see in Balenciaga’s actual catalog, for example. In one case, the network generated a pair of pants with a red wrap-around thing on the side, almost like a shin bag, because it misinterpreted people holding actual bags next to their legs and thought that the bags were connected to their shins. I’m actually working with another artist and a Chinese bootleg factory to get some of these outfits produced in real life. The shin bag is one of the outfits we’ve chosen to make.
ST: When you see a surprising outfit in the output images of the fashion program, for example, is there ever a moment where you think that the image should have been in Balenciaga’s set? Or rather that the network isn’t generating anything new, but that it’s better describing Balenciaga’s inherent vision for, say, the FW series?
RB: That is such an interesting idea. Yes and no . . . I think all networks do is approximate distributions in functions. Maybe all of the outputs can be described as “in the spirit” or “in the style” of Balenciaga . . . but there is a certain uncanniness to certain images that you would not find in Balenciaga’s collections, ever. This is because the network is purely visual, while the humans at Balenciaga have some context, like what bags are used for or what function clothes serve. The network doesn’t really care, or isn’t fed that info as input. In a way, you could say that it’s trying to better describe Balenciaga, but it’s lacking some crucial information. It’s like a complete outsider to all of this.
When I call the network “it” I’m not trying to personify it or make it appear like some agent acting on its own that’s producing the artwork. It really is just a tool or a medium at the end of the day.
ST: Can you talk about the process that you went through to produce those initial images? Where was the divide between what you put into the network and what was up for the network to compute?
RB: I think this applies to all of my projects. The role that I’m playing is more of a curator than anything else. The first thing I do is curate the data that the network is going to study, and once the network is trained I also curate the output data, deciding which outputs I am going to keep and whether I want to tweak the training sets so that the network learns something different. So all of the work that I do is really through curation. I’d say that I’m a “curator/artist,” but I don’t know yet if AI and neural networks are an artist’s tools, because I think they’re different from other kinds of tools. They may be an artist’s medium.
ST: I think most people have a reductive view of neural nets as a product of the images you give them. But in reality, of course, the architecture of the neural net, the configuration of the layers, and how they’re connected is also incredibly important in terms of how the images get processed.
RB: For the nude portraits and landscapes, I used progressive growing of GANs – the idea here is that there are two neural networks, the generator and the discriminator. The generator takes in random noise and tries to learn how to output images, while the discriminator looks at the images from the generator and compares them to real or desired images. The generator tries to maximize how many of its images the discriminator mistakes for real ones, and both of these networks get better and better at their jobs over time, in tandem learning how to produce new, hopefully realistic images. In the case of the nude portraits, they turned out so strange because the discriminator was telling the difference between generated images and real images just by looking at very local features, like folds of fat and bellybuttons, and wasn’t taking into account the overall structure of the image. The result was outputs that look like horrible flesh blobs to us.
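The adversarial setup described here can be sketched in a few lines of NumPy. This is a toy illustration of the generator/discriminator dynamic only: the network sizes, data, and single hand-derived gradient step are assumptions for the example, not the progressively grown GANs Barrat actually trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z, w):
    # Map random noise z to a fake "image" (here just a flat vector).
    return np.tanh(z @ w)

def discriminator(x, v):
    # Score how "real" an image looks: sigmoid output in (0, 1).
    return 1.0 / (1.0 + np.exp(-(x @ v)))

noise_dim, img_dim, batch = 8, 16, 32
w = rng.normal(size=(noise_dim, img_dim)) * 0.1   # generator weights
v = rng.normal(size=img_dim) * 0.1                # discriminator weights

real = rng.normal(loc=1.0, size=(batch, img_dim))  # stand-in "real" data
z = rng.normal(size=(batch, noise_dim))
fake = generator(z, w)

d_real = discriminator(real, v)
d_fake = discriminator(fake, v)

# Discriminator step: push real scores toward 1 and fake scores toward 0.
v = v + 0.01 * (real.T @ (1 - d_real) - fake.T @ d_fake) / batch

# Generator step: push the discriminator's score on fakes toward 1,
# backpropagating through the tanh nonlinearity by hand.
grad_pre = (1 - d_fake)[:, None] * v * (1 - fake**2)
w = w + 0.01 * (z.T @ grad_pre) / batch

print(fake.shape)  # → (32, 16)
```

In a real training run these two steps alternate for many thousands of iterations; the "local features" failure Barrat describes corresponds to a discriminator that separates real from fake using small patches without ever penalizing globally incoherent anatomy.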
With the Balenciaga project, I used a similar process – one called Pix2Pix – it’s basically the same, except the generator takes in pairs of images and tries to construct the second image in the pair given the first image as input, and then the discriminator will look at the first image in the pair and the generator’s attempted reconstruction, and try to tell if the two images are a good pair or not. I trained this network to take in what were basically silhouettes of everyone in the Balenciaga catalogues and runway shows, and then generate a person wearing an outfit given the silhouette. The silhouette gives the network information like pose and structure, so these results ended up being more realistic, instead of just clothed versions of the nude portraits.
ST: One of the things I’ve always thought about as someone on the periphery of machine learning is that the image is a really limited way to view the world. As humans, we almost always encounter the world through both the image and language. If you fed the network an image along with a description of what was happening in the picture (especially a description that used the full richness of human language, including metaphor, allusion, allegory, memory . . .) how much more intelligent would the network appear?
RB: This actually brings up a coincidence.
The reason that I wanted to do a fashion project in the first place is that I wanted to feed the network a text description of an outfit. There are neural networks that are trained to reconstruct sentences. You can just chop the sentences into pieces and get a vector representation. So I was just thinking that I would find a big database of outfit descriptions and just train the neural network to reconstruct an image of an outfit or garment based on a description and a pose. I ended up not going that way because if you strictly define what you want to generate by adding lots of rules, there’s not much room for it to fantastically screw up. Using the Pix2Pix architecture was more interesting because there are so many opportunities for the network to misinterpret and to make really far out things that don’t really have a current text description.
ST: To bring things back to the other kind of architecture, I noticed that you have been experimenting with encoding buildings. Can you talk about that project?
RB: I’m still working on that. About a year ago I tried training a 3D GAN. I tried an approach where I took a collection of buildings and I chopped them up into horizontal [planimetric] slices and I tried to treat this data as a time series problem. Given the past three slices of a building, predict the next one that will lie on top. And then keep doing that over and over again and build some structure up out of the ground. But that didn’t work because the network would fall into local minima where the building would just keep amassing more material as it got taller.
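The slice-stacking idea above is an autoregressive loop: condition on the last three horizontal slices, predict the next, append, repeat. A sketch with a trivial stand-in predictor (the real version would be a trained network; the averaging function here is an assumption purely so the loop runs):

```python
import numpy as np

def predict_next_slice(context):
    # Stand-in for the trained slice predictor: just average the
    # previous three slices. A learned model would go here instead.
    return context.mean(axis=0)

rng = np.random.default_rng(2)
slices = [rng.random((8, 8)) for _ in range(3)]  # seed: bottom three slices

for _ in range(10):  # grow the structure upward, one slice at a time
    context = np.stack(slices[-3:])          # shape (3, 8, 8)
    slices.append(predict_next_slice(context))

building = np.stack(slices)  # voxel-like volume: (13, 8, 8)
print(building.shape)
```

The failure mode Barrat describes – buildings that keep amassing material – is a classic autoregressive pathology: once the predicted slices drift toward a dense fixed point, each new prediction reinforces it.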
Do you know Bureau Spectacular? They are using a process called triple extrusion. If you have a cube of material and you have three shapes, you can carve away material so that each orthographic view of the cube appears as one of the three shapes.
I ended up applying this triple extrusion idea to the problem of encoding a building in a network. I’m treating each orthographic projection as a channel of an RGB image. So the GAN will generate projections for each of those three filters and then I can perform the extrusion to see the resulting object. I’ve been meaning to work with it more because I’m super into architecture.
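Treating the three orthographic projections as the channels of one RGB image means the carving step is just a boolean intersection: extrude each silhouette through the cube along its viewing axis and keep the voxels inside all three. A minimal sketch, with the volume indexed (z, y, x) and random masks standing in for a GAN's generated channels:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 8  # toy voxel resolution, assumed for illustration

# Three binary silhouettes, e.g. the R, G, B channels of a generated image.
front = rng.random((N, N)) > 0.3   # view along y, indexed (z, x)
side  = rng.random((N, N)) > 0.3   # view along x, indexed (z, y)
top   = rng.random((N, N)) > 0.3   # view along z, indexed (y, x)

# Triple extrusion: a voxel survives only if it lies inside all three
# silhouettes when extruded along their respective viewing axes.
vox = front[:, None, :] & side[:, :, None] & top[None, :, :]
print(vox.shape)  # → (8, 8, 8)
```

By construction each orthographic view of the carved solid can only shrink relative to its target silhouette, never grow – so the result always fits inside all three shapes, and matches them exactly when the shapes are mutually compatible.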
ST: Christie’s sale [at left, below] of a piece of artwork ostensibly made with your code [at right, below] is a big development. How have you reacted to the incident? What do you think of the artist collective Obvious that put the piece up for auction?
RB: The people behind that so-called artist collective are marketers. They don’t have any art training . . . I don’t want to speak very poorly of them, but it’s hard not to.
The reason I clarified above that the network is just a tool is because they’re saying the exact opposite. They strongly believe that the network is playing the role of the artist and their whole slogan is “creativity is not only for humans,” which is just ridiculous. Some convolutional neural network is not creative!
ST: How do you rationalize what happened, given that there seems to be little precedent for answering questions about authorship and ownership in AI art?
RB: Even if there’s no legal precedent, regardless of the tools involved, the outputs are just so visually similar they beg the question: did they just use the networks that I trained and put up on GitHub? What does that mean? There are a bunch of people that say that I deserved to be left out because I open-sourced the networks for anyone to use. I didn’t think that anyone would sell those outputs . . . I was 17 when I put those up. It never really occurred to me that someone would use them to make $500k – that’s just insane! I just wanted to get more people hip to this technology because I thought it was super cool.
People have asked if I’m going to sue them, but I don’t really want to deal with it too much more. I just hope that people recognize that it’s BS and go check out legitimate AI art. I’m glad that it sold for so much because it creates a good precedent for the value of AI art. I just hope that the AI art community and the artists that are innovating and creating something new get more recognition because of this.
ST: When you think about your other projects, do you still have the same inclination to open source your models?
RB: I do, you know, I do. I was thinking about putting some of my older models online, like the landscape painting network when it turns a year old this spring. Training a huge high-res GAN like that is something that’s really not accessible. I was an intern at NVIDIA and I had to use NVIDIA’s GPU clusters and tell my manager that I was doing work. I mean the fact that I was working at NVIDIA and still wasn’t able to easily access the right equipment shows that making things open source to make them accessible is really important. Of course, I’d still be worried that Obvious 2 would show up and start selling a slightly higher resolution version of the output…
ST: What are you working on at Stanford now?
RB: I’m doing research now at the intersection of AI and medicine. I don’t go to school here though. I’ve guest lectured here, I’m TA-ing here, and I work in a research lab here, but they still rejected me for their undergrad program!