
OpenAI’s DALL-E AI image generator can now edit pictures, too


Artificial intelligence research group OpenAI has created a new version of DALL-E, its text-to-image generation program. DALL-E 2 features higher resolution and lower latency than the original system, which produces images matching descriptions written by users. It also includes new capabilities, like editing an existing image. As with previous OpenAI work, the tool isn't being released directly to the public. But researchers can sign up online to preview the system, and OpenAI hopes to later make it available for use in third-party apps.

The original DALL-E, a portmanteau of the artist Salvador Dalí and the robot WALL-E, debuted in January 2021. It was a limited but fascinating test of AI's ability to visually represent concepts, from mundane depictions of a mannequin in a flannel shirt to "a giraffe made of turtle" or an illustration of a radish walking a dog. At the time, OpenAI said it would continue to build on the system while examining potential dangers like bias in image generation or the production of misinformation. DALL-E 2 attempts to address those issues with technical safeguards and a new content policy, while also reducing its computing load and pushing forward the model's core capabilities.

DALL-E 2 result for "a Shiba Inu dog wearing a beret and black turtleneck."

One of the new DALL-E 2 features, inpainting, applies DALL-E's text-to-image capabilities on a more granular level. Users can start with an existing picture, select an area, and tell the model to edit it. You can block out a painting on a living room wall and replace it with a different picture, for instance, or add a vase of flowers on a coffee table. The model can fill in (or remove) objects while accounting for details like the directions of shadows in the room. Another feature, Variations, is sort of like an image search tool for pictures that don't exist. Users can upload a starting image and then create a range of variations similar to it. They can also blend two images, generating pictures that have elements of both. The generated images are 1,024 x 1,024 pixels, a leap over the 256 x 256 pixels the original model offered.
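None of this was exposed through a public API at publication time, but as a rough sketch of what such an inpainting request looks like through the image-editing endpoint OpenAI later shipped in its Python library (the file names and prompt here are placeholders, not anything from the announcement), a call might resemble:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The mask's fully transparent pixels mark the region DALL-E should repaint;
# everything opaque in the mask is preserved from the original image.
result = client.images.edit(
    image=open("living_room.png", "rb"),
    mask=open("coffee_table_mask.png", "rb"),
    prompt="a vase of flowers on the coffee table",
    n=1,
    size="1024x1024",
)
print(result.data[0].url)  # URL of the edited image
```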

DALL-E 2 is based on CLIP, a computer vision system that OpenAI also announced last year. The original DALL-E generated images by extending the word-by-word predictive approach of the GPT model used by many text AI apps, says OpenAI research scientist Prafulla Dhariwal. But word matching doesn't necessarily capture the qualities humans find most important, and the predictive process limited the realism of the images. CLIP was designed to look at images and summarize their contents the way a human would, and OpenAI inverted this process to create "unCLIP," a flipped version that starts with the description and works its way toward an image. DALL-E 2 generates the image using a process called diffusion, which Dhariwal describes as starting with a "bag of dots" and then filling in a pattern with greater and greater detail.
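As a toy illustration of that diffusion idea (not DALL-E 2's actual code; the denoiser below is a dummy standing in for a trained neural network), the reverse process can be sketched as:

```python
import numpy as np

def toy_reverse_diffusion(denoise_step, steps=200, shape=(64, 64, 3), seed=0):
    """Illustrative only: produce an image by iteratively refining pure noise."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)    # the initial "bag of dots"
    for t in reversed(range(steps)):  # walk the noise schedule backwards
        x = denoise_step(x, t)        # each step fills in a little more detail
    return x

# A placeholder denoiser that merely damps the noise, standing in for the model.
image = toy_reverse_diffusion(lambda x, t: 0.995 * x)
```

In the real system, the denoising step is a large network conditioned on the image embedding that unCLIP produces from the text, rather than this placeholder.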

An existing photo of a room, with a flamingo added in one corner.

Interestingly, a draft paper on unCLIP notes that it's partially resistant to a very funny weakness of CLIP: the fact that people can fool the model's identification abilities by labeling one object (like a Granny Smith apple) with a word indicating something else (like an iPod). The Variations tool "still generates pictures of apples with high probability" even when using a mislabeled picture that CLIP can't identify as a Granny Smith, the authors say. Conversely, "the model never produces pictures of iPods, despite the very high relative predicted probability of this caption."
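That weakness is easy to reproduce with OpenAI's open-source CLIP release; the sketch below (the image path is hypothetical) scores a labeled-apple photo against two candidate captions:

```python
# Requires OpenAI's CLIP package: pip install git+https://github.com/openai/CLIP.git
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical test photo: a Granny Smith apple with a paper label reading "iPod".
image = preprocess(Image.open("apple_with_ipod_label.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["a Granny Smith apple", "an iPod"]).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

# For such labeled images, CLIP famously shifts most probability to "an iPod".
print(dict(zip(["a Granny Smith apple", "an iPod"], probs[0])))
```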

The full DALL-E model has never been released publicly, but over the past year other developers have built their own tools that imitate some of its functions. One of the most popular mainstream applications is Wombo's Dream mobile app, which generates pictures of whatever users describe in a variety of art styles. OpenAI isn't releasing any new models today, but developers could use its technical findings to update their own work.

DALL-E 2 result for "a bowl of soup that looks like a monster, knitted out of wool."

OpenAI has implemented some built-in safeguards. The model was trained on data from which some objectionable material had been weeded out, ideally limiting its ability to produce objectionable content. Generated images carry a watermark indicating the AI-generated nature of the work, although it could theoretically be cropped out. As a preemptive anti-abuse feature, the model also can't generate any recognizable faces based on a name; even asking for something like the Mona Lisa would apparently return a variant on the actual face from the painting.

DALL-E 2 will be testable by vetted partners, with some caveats. Users are barred from uploading or generating images that are not G-rated or that could cause harm, including anything involving hate symbols, nudity, obscene gestures, or "conspiracies or major events related to ongoing geopolitical events." They must also disclose AI's role in generating the images, and they can't serve generated images to other people through an app or website, so you won't initially see a DALL-E-powered version of something like Dream. But OpenAI hopes to add it to its API toolset later, which would let it power third-party apps. "We hope to keep doing a staged process here, so we can keep evaluating, from the feedback we get, how to safely release this technology," says Dhariwal.

Additional reporting by James Vincent.
