Meet DALL-E, the A.I. That Draws Anything at Your Command

San Francisco – At OpenAI, one of the world’s most ambitious artificial intelligence labs, researchers are building technology that lets you create digital images simply by describing what you want to see.

They call it DALL-E in a nod to both “WALL-E,” the 2008 animated film about an autonomous robot, and Salvador Dalí, the surrealist painter.

OpenAI, backed by $1 billion in funding from Microsoft, has yet to share the technology with the general public. But one afternoon, Alex Nichol, one of the researchers behind the system, demonstrated how it works.

When he requested an “avocado-shaped teapot,” typing those words into a largely blank computer screen, the system generated 10 distinct images of a dark green avocado teapot, some with pits and some without. “DALL-E is good at avocados,” said Mr. Nichol.

When he typed “cats playing chess,” it placed two fluffy cats on opposite sides of a checkered game board, 32 chess pieces lined up between them. When he summoned “a teddy bear playing a trumpet underwater,” one of the images showed tiny air bubbles rising from the end of the bear’s horn toward the surface of the water.

DALL-E can also edit photos. When Mr. Nichol erased the teddy bear’s horn and asked for a guitar instead, a guitar appeared between the furry arms.

A team of seven researchers spent two years developing the technology, which OpenAI plans to eventually offer as a tool for people like graphic artists, introducing new shortcuts and new ideas while creating and editing digital images. Computer programmers already use Copilot, a tool based on similar technology from OpenAI, to generate snippets of program code.

But for many experts, DALL-E is worrying. As this type of technology continues to improve, they say, it could help spread disinformation online, fueling the kind of online campaigns that may have helped influence the 2016 presidential election.

“You can use it for good things, but certainly you can use it for all kinds of other crazy and disturbing applications, which include deepfakes,” such as misleading photos and videos, said Subbarao Kambhampati, a professor of computer science at Arizona State University.

Half a decade ago, the world’s leading artificial intelligence laboratories built systems that could identify objects in digital images and even create images on their own, including flowers, dogs, cars, and faces. A few years later, they built systems that could do the same with written language, summarize articles, answer questions, create tweets, and even write blog posts.

Now, researchers are combining these technologies to create new forms of A.I. DALL-E is a notable step forward because it juggles both language and imagery and, in some cases, grasps the relationship between the two.

“We can now use multiple, intersecting streams of information to create better and better technology,” said Oren Etzioni, CEO of the Allen Institute for Artificial Intelligence, an artificial intelligence lab in Seattle.

The technology is not perfect. When Mr. Nichol asked DALL-E to “put the Eiffel Tower on the moon,” it did not quite grasp the idea, placing the moon in the sky above the tower instead. When asked for a “living room full of sand,” it produced a scene that looked more like a construction site than a living room.

But when Mr. Nichol modified his requests a little, adding or subtracting a few words here or there, it did what he wanted. When he asked for “a piano in a living room full of sand,” the image looked more like a beach in a living room.

DALL-E is what AI researchers call a neural network, a mathematical system loosely modeled on the network of neurons in the brain. This is the same technology that recognizes spoken commands on smartphones and identifies pedestrians as self-driving cars navigate city streets.

A neural network learns skills by analyzing large amounts of data. By identifying patterns in thousands of avocado photos, for example, it can learn to recognize an avocado. DALL-E looks for patterns as it analyzes millions of digital images as well as text captions describing what each image depicts. In this way, it learns to recognize the connections between images and words.

When someone describes an image for DALL-E, it generates a set of key features that the image might include. One feature might be the line along the edge of a trumpet. Another might be the curve at the top of a teddy bear’s ear.
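The idea of turning a written description into a set of numeric features can be sketched in a few lines of code. This is a deliberately toy illustration, not OpenAI’s actual model: the vocabulary, feature size, and random projection standing in for learned weights are all invented here for demonstration.

```python
import numpy as np

# Toy sketch: map a caption to a fixed-length "feature" vector,
# loosely analogous to how a text encoder turns words into features.
# VOCAB, FEATURE_DIM, and the random matrix W are illustrative
# assumptions, not part of any real DALL-E implementation.

VOCAB = ["teddy", "bear", "trumpet", "underwater", "avocado", "teapot"]
FEATURE_DIM = 4

rng = np.random.default_rng(0)
# A fixed random projection stands in for weights a real model would learn.
W = rng.standard_normal((len(VOCAB), FEATURE_DIM))

def encode_caption(caption: str) -> np.ndarray:
    """Count known words in the caption, then project the counts
    into a small feature space."""
    words = caption.lower().split()
    counts = np.array([words.count(w) for w in VOCAB], dtype=float)
    return counts @ W  # shape: (FEATURE_DIM,)

features = encode_caption("teddy bear playing a trumpet underwater")
print(features.shape)  # (4,)
```

In a real system the projection is learned from millions of image–caption pairs, so that captions describing similar scenes land near each other in feature space.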

Then a second neural network, called a diffusion model, creates the image, generating the pixels needed to realize those features. The latest version of DALL-E, unveiled Wednesday in a new research paper describing the system, produces high-resolution images that in many cases look like photos.
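The core diffusion idea, starting from pure noise and repeatedly nudging it toward an image consistent with the requested features, can be shown with a toy loop. A real diffusion model learns its denoising step from data; the hand-written step below merely stands in for it, purely for illustration.

```python
import numpy as np

# Toy sketch of diffusion-style generation: begin with random noise
# and iteratively "denoise" toward a target. In a real model the
# denoising step is a learned neural network conditioned on the
# text features; here it is a simple hand-written interpolation.

rng = np.random.default_rng(1)

def toy_denoise_step(noisy: np.ndarray, target: np.ndarray,
                     step_frac: float) -> np.ndarray:
    """One denoising step: move a fraction of the way toward the target."""
    return noisy + step_frac * (target - noisy)

def generate(target_image: np.ndarray, steps: int = 50) -> np.ndarray:
    x = rng.standard_normal(target_image.shape)  # start from pure noise
    for _ in range(steps):
        x = toy_denoise_step(x, target_image, step_frac=0.2)
    return x

# A 1-D ramp stands in for "the image implied by the features".
target = np.linspace(0.0, 1.0, 16)
sample = generate(target)
print(np.max(np.abs(sample - target)))  # tiny after many steps
```

Each pass removes a little of the remaining noise, which is why diffusion models generate images over many small steps rather than in one shot.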

Although DALL-E often fails to understand what someone has described and sometimes mangles the images it produces, OpenAI continues to improve the technology. Researchers can often improve the skills of a neural network by feeding it larger amounts of data.

They can also build more powerful systems by applying the same concepts to new types of data. The Allen Institute recently created a system that can analyze audio as well as images and text. After analyzing millions of YouTube videos, including audio tracks and captions, it learned to identify specific moments in TV shows or movies, like a dog barking or a door closing.

Experts believe that researchers will continue to refine such systems. Ultimately, these systems could help companies improve search engines, digital assistants, and other familiar technologies, as well as automate new tasks for graphic artists, programmers, and other professionals.

But there are caveats to that potential. A.I. systems can show bias against women and people of color, in part because they learn their skills from vast pools of online text, images, and other data that show bias. They could be used to produce pornography, hate speech, and other offensive material. And many experts believe the technology will eventually make it so easy to create disinformation that people will have to be skeptical of nearly everything they see online.

“We can forge text. We can put text into someone’s voice. And we can forge images and videos,” Dr. Etzioni said. “There is already disinformation online, but the worry is that this takes it to new levels.”

OpenAI keeps tight restrictions on DALL-E. Outsiders are not allowed to use the system on their own. It puts a watermark in the corner of every image it generates. And though the lab plans to open the system to testers this week, the group will be small.

The system also includes filters that prevent users from generating what it deems inappropriate images. When asked for a “pig with a sheep’s head,” it declined to produce an image. According to the lab, the combination of the words “pig” and “head” most likely tripped OpenAI’s anti-bullying filters.

“This is not a product,” said Mira Murati, head of research at OpenAI. “The idea is to understand capabilities and limitations and give us the opportunity to build in mitigations.”

OpenAI can control the system’s behavior in some ways. But others around the world may soon create similar technology, putting the same powers in the hands of just about anyone. Working from a research paper describing an early version of DALL-E, Boris Dayma, an independent researcher in Houston, has already built and released a simpler version of the technology.

“People need to know that the pictures they see may not be real,” he said.
