Robots are creating images and telling jokes. 5 things to know about foundation models and the next generation of AI

If you’ve seen pictures of a teapot shaped like an avocado or read a well-written article that veers off on slightly odd tangents, you may have been exposed to a new trend in artificial intelligence (AI).

Machine learning systems called DALL-E, GPT, and PaLM are making a splash with their remarkable ability to generate creative work.

These systems are known as “foundation models”, and they are not all hype and party tricks. So how does this new approach to AI work? And will it be the end of human creativity and the start of a deepfake nightmare?

1. What are foundation models?

Foundation models work by training a single huge system on large amounts of public data, then adapting the system to new problems. Earlier models tended to start from scratch for each new problem.

DALL-E 2, for example, was trained to match images (such as a picture of a pet cat) with their captions (“Mr. Fuzzyboots, the tabby cat relaxing in the sun”) by scanning hundreds of millions of examples. Once trained, this model knows what cats (and other things) look like in pictures.

But the model can also be used for many other interesting AI tasks, such as creating new images from a caption alone (“Show me a koala dunking a basketball”) or editing images based on written instructions (“Make it look like this monkey is paying taxes”).
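The caption-matching idea can be sketched in a few lines of Python. This is only an illustration of one common approach, not a description of how DALL-E 2 is actually built: it assumes each image and each caption has already been turned into a short list of numbers (an "embedding"), and the numbers below are hand-written, hypothetical values rather than output from any real model.

```python
import math

def cosine_similarity(a, b):
    """Return how closely two vectors point in the same direction (1.0 = identical)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical caption embeddings, written by hand for illustration only.
caption_embeddings = {
    "a tabby cat relaxing in the sun": [0.9, 0.1, 0.2],
    "a koala dunking a basketball": [0.1, 0.9, 0.3],
    "a monkey paying taxes": [0.2, 0.3, 0.9],
}

# Hypothetical embedding of a photo of a cat.
image_embedding = [0.8, 0.2, 0.1]

def best_caption(image_vec, captions):
    """Pick the caption whose embedding is most similar to the image embedding."""
    return max(captions, key=lambda text: cosine_similarity(image_vec, captions[text]))

print(best_caption(image_embedding, caption_embeddings))
# prints "a tabby cat relaxing in the sun"
```

In a real system the embeddings would be produced by large neural networks trained on those hundreds of millions of image and caption pairs, so that matching pairs end up with similar numbers.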

2. How do they work?

Foundation models run on “deep neural networks,” which are loosely inspired by how the brain works. These involve sophisticated mathematics and a tremendous amount of computing power, but they boil down to a very sophisticated type of pattern matching.

For example, by looking at millions of example images, a deep neural network can associate the word “cat” with patterns of pixels that often appear in pictures of cats, such as soft, fuzzy, hairy blobs of texture. The more examples the model sees (the more data it is shown), and the bigger the model (the more “layers” or “depth” it has), the more complex these patterns and correlations can be.
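To make "pattern matching" a little more concrete, here is a toy sketch in Python. It is not a deep neural network: it trains a single artificial neuron (a perceptron), the simplest building block of such networks, on made-up four-pixel "images". The assumption that cat pictures are bright in the first two pixels, along with all of the data and names, is hypothetical.

```python
def predict(weights, bias, pixels):
    """Return 1 ("cat") if the weighted sum of pixels is positive, else 0."""
    score = sum(w * p for w, p in zip(weights, pixels)) + bias
    return 1 if score > 0 else 0

def train(examples, epochs=10, learning_rate=0.1):
    """Perceptron rule: nudge the weights whenever a prediction is wrong."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for pixels, label in examples:
            error = label - predict(weights, bias, pixels)
            weights = [w + learning_rate * error * p
                       for w, p in zip(weights, pixels)]
            bias += learning_rate * error
    return weights, bias

# Made-up training data: (four pixel brightness values, label), 1 = cat.
examples = [
    ([1.0, 1.0, 0.0, 0.0], 1),
    ([0.9, 0.8, 0.1, 0.0], 1),
    ([0.8, 1.0, 0.0, 0.2], 1),
    ([0.0, 0.0, 1.0, 1.0], 0),
    ([0.1, 0.0, 0.9, 0.8], 0),
    ([0.2, 0.1, 1.0, 0.9], 0),
]

weights, bias = train(examples)

# Two new "images" the neuron has never seen.
print(predict(weights, bias, [0.95, 0.9, 0.05, 0.1]))   # prints 1 (cat)
print(predict(weights, bias, [0.05, 0.1, 0.9, 0.95]))   # prints 0 (not a cat)
```

A deep network stacks many layers of units like this one, which is what lets it pick up far subtler patterns than "bright on the left".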



Foundation models are, in one sense, just an extension of the “deep learning” paradigm that has dominated AI research over the past decade. However, they exhibit unprogrammed or “emergent” behaviors that can be both surprising and novel.

For example, Google’s PaLM language model appears to be able to explain complicated metaphors and jokes. This goes beyond simply imitating the types of data it was originally trained to process.

The PaLM language model can answer complex questions.
Google AI

3. Access is limited – for now

It’s hard to grasp the sheer scale of these AI systems. PaLM has 540 billion parameters, which means that even if everyone on the planet memorized 50 numbers, we still wouldn’t have enough storage to reproduce the model.
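The arithmetic behind that comparison is easy to check. The short Python sketch below assumes a world population of roughly 8 billion, a figure that does not come from the article; the parameter count and the 50 numbers per person do.

```python
PALM_PARAMETERS = 540_000_000_000   # 540 billion, as stated in the article
WORLD_POPULATION = 8_000_000_000    # assumption: roughly 8 billion people
NUMBERS_PER_PERSON = 50             # as stated in the article

# Total numbers stored if every person memorized their share.
total_memorized = WORLD_POPULATION * NUMBERS_PER_PERSON

# How far short of the model's size that falls.
shortfall = PALM_PARAMETERS - total_memorized

# How many numbers each person would actually need to memorize.
needed_per_person = PALM_PARAMETERS / WORLD_POPULATION

print(f"Memorized in total: {total_memorized:,}")       # 400,000,000,000
print(f"Shortfall:          {shortfall:,}")             # 140,000,000,000
print(f"Needed per person:  {needed_per_person:.1f}")   # 67.5
```

Under that assumption, humanity would fall 140 billion numbers short, and each person would need to memorize about 68 numbers to cover the whole model.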

The models are so massive that training them requires huge amounts of computational and other resources. One estimate puts the cost of training OpenAI’s GPT-3 language model at approximately US$5 million.



As a result, only big tech companies such as OpenAI, Google, and Baidu can afford to build foundation models at the moment. These companies decide who can access the systems, which makes economic sense.

Usage restrictions may give us some comfort that these systems will not be used for nefarious purposes (such as creating fake news or defamatory content) anytime soon. But they also mean that independent researchers are unable to interrogate these systems and share findings in an open and accountable manner. So we do not yet know the full effects of their use.

4. What will these models mean for the “creative” industries?

More foundation models will be produced in the coming years. Smaller models are already being released in open-source form, tech companies are starting to experiment with licensing and commercializing these tools, and AI researchers are working hard to make the technology more efficient and accessible.

The remarkable creativity demonstrated by models such as PaLM and DALL-E 2 shows that creative professional jobs could be affected by this technology sooner than initially anticipated.



Conventional wisdom has long held that robots would replace “blue collar” jobs first. “White collar” work was supposed to be relatively safe from automation, especially professional work that required creativity and training.

Deep learning AI models are already showing superhuman accuracy in tasks such as reviewing X-rays and detecting the eye condition macular degeneration. Foundation models may soon provide cheap, “good enough” creativity in fields such as advertising, copywriting, stock photography, or graphic design.

The future of professional and creative work may look a little different than we expected.

5. What does this mean for legal evidence, news and media?

Foundation models will inevitably affect the law in areas such as intellectual property and evidence, because we will no longer be able to assume that creative content is the result of human activity.

We will also have to face the challenge of disinformation and misinformation generated by these systems. We already have massive problems with disinformation, as we are seeing with the Russian invasion of Ukraine and the nascent problem of deepfake photos and videos, but foundation models are poised to supercharge these challenges.



It’s time to get ready

As researchers who study the effects of artificial intelligence on society, we believe that foundation models will bring about huge transformations. They are tightly controlled (for now), so we will probably have a little time to understand their effects before they become a major problem.

The genie isn’t quite out of the bottle yet, but foundation models are a pretty big bottle, and inside there’s a pretty clever genie.
