
Google sets the bar for AI language models with PaLM




Google’s new large language model (LLM), PaLM (Pathways Language Model), is the first outcome of Pathways, Google’s new AI architecture, which aims to handle many tasks at once, learn new tasks quickly and reflect a better understanding of the world.

PaLM is a massive undertaking with ambitious goals. Although many aspects of PaLM require further evaluation, it represents an important step forward for LLMs. The process of developing and evaluating PaLM is detailed in an arXiv publication and summarized by Google in a blog post.

Under the LLM hood

Google’s publication outlines the philosophy of Pathways at every step of the PaLM training process. Versions of the new model include PaLM 8B with 8 billion parameters, PaLM 62B with 62 billion parameters and PaLM 540B with 540 billion parameters. Google created different versions in order to evaluate the cost-value function as well as the benefits of scale.

The number of parameters matters in an LLM, although more parameters do not necessarily translate into a better-performing model. In terms of parameter count, PaLM 540B is in the same league as some of the largest LLMs available: OpenAI’s GPT-3 with 175 billion; DeepMind’s Gopher and Chinchilla with 280 billion and 70 billion; Google’s own GLaM and LaMDA with 1.2 trillion and 137 billion; and Microsoft and Nvidia’s Megatron-Turing NLG with 530 billion.

The first thing to consider when discussing LLMs, as with any other AI model, is the efficiency of the training process. Even the Googles of the world need to answer this question: “Given a certain amount of compute, how large a model should I train in order to get the best possible performance?”

In 2020, OpenAI proposed scaling laws to guide the training of LLMs. In 2022, DeepMind published a paper titled “Training Compute-Optimal Large Language Models,” in which its researchers argue that LLMs had been trained with a deeply suboptimal use of compute. Independently, Google reached similar conclusions, as detailed in PaLM’s documentation.
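To make the “how big a model for a given compute budget” question concrete, here is a minimal sketch built on two widely quoted rules of thumb that are assumptions here, not figures from Google’s paper: training cost is roughly C ≈ 6·N·D FLOPs for N parameters and D training tokens, and the DeepMind work is often summarized as “about 20 tokens per parameter.” Both constants are approximations; the real scaling-law fits are more nuanced.

```python
import math

# Rule-of-thumb constants (approximations, not exact fits from any paper).
FLOPS_PER_PARAM_PER_TOKEN = 6  # training cost C ~= 6 * N * D
TOKENS_PER_PARAM = 20          # compute-optimal heuristic D ~= 20 * N


def compute_optimal_allocation(compute_flops: float) -> tuple:
    """Return (params, tokens) that spend `compute_flops` under C = 6*N*D with D = 20*N."""
    # Substituting D = 20N into C = 6ND gives C = 120 * N^2, so N = sqrt(C / 120).
    params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_PER_TOKEN * TOKENS_PER_PARAM))
    tokens = TOKENS_PER_PARAM * params
    return params, tokens


if __name__ == "__main__":
    budget = 1e24  # hypothetical training budget in FLOPs
    n, d = compute_optimal_allocation(budget)
    print(f"~{n / 1e9:.0f}B parameters trained on ~{d / 1e9:.0f}B tokens")
```

Under these assumptions, a fixed budget is best spent on a smaller model trained on more tokens than earlier LLMs used, which is the gist of the “suboptimal use of compute” argument.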

PaLM’s training is state of the art on many levels. At the hardware level, PaLM 540B was trained over two TPU v4 Pods connected via a data center network (DCN), using a combination of model and data parallelism. Google used 3,072 TPU v4 chips in each Pod, attached to 768 hosts, which it says is the largest TPU configuration described to date. This allowed Google to efficiently scale training to 6,144 chips, achieving a training efficiency of 57.8% hardware FLOPs utilization, which Google claims is the highest yet achieved for LLMs at this scale.
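The 57.8% figure is a utilization ratio: the floating-point operations per second the chips actually perform, divided by the cluster’s theoretical peak. The sketch below shows that arithmetic. The chip count comes from the article; the throughput, per-token cost and the 275 TFLOPS peak per chip are assumed placeholder values, not PaLM’s measurements. (“Hardware” FLOPs count everything the chips execute, including recomputed activations, which is why the paper also reports a separate, lower model-FLOPs figure.)

```python
# Chip count from the article: two TPU v4 Pods with 3,072 chips each.
CHIPS_PER_POD = 3072
NUM_PODS = 2
TOTAL_CHIPS = CHIPS_PER_POD * NUM_PODS  # 6,144


def flops_utilization(tokens_per_second: float,
                      flops_per_token: float,
                      num_chips: int,
                      peak_flops_per_chip: float) -> float:
    """Fraction of the cluster's theoretical peak throughput actually achieved."""
    achieved_flops_per_second = tokens_per_second * flops_per_token
    peak_flops_per_second = num_chips * peak_flops_per_chip
    return achieved_flops_per_second / peak_flops_per_second


if __name__ == "__main__":
    # All numbers below are hypothetical placeholders, not PaLM's measured values.
    util = flops_utilization(tokens_per_second=2.0e5,
                             flops_per_token=6 * 10e9,  # ~6 FLOPs/parameter for a 10B model
                             num_chips=64,
                             peak_flops_per_chip=275e12)  # assumed per-chip peak
    print(f"utilization: {util:.1%}")
```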

PaLM uses a standard Transformer model architecture, with some customizations. The Transformer is the architecture underlying today’s LLMs, and although PaLM deviates from it in some respects, arguably more important is the focus on the training dataset used.
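As one illustration of these customizations, the PaLM paper describes a “parallel” formulation of each Transformer block: attention and the feed-forward MLP both read the same normalized input, instead of the MLP running on the output of attention. The two forms are restated below (notation is ours); the paper reports that the parallel version trains faster at large scale.

```latex
\begin{aligned}
\text{standard (serial):}\quad & y = x + \operatorname{MLP}\bigl(\operatorname{LayerNorm}\bigl(x + \operatorname{Attention}(\operatorname{LayerNorm}(x))\bigr)\bigr) \\
\text{parallel (PaLM):}\quad & y = x + \operatorname{MLP}\bigl(\operatorname{LayerNorm}(x)\bigr) + \operatorname{Attention}\bigl(\operatorname{LayerNorm}(x)\bigr)
\end{aligned}
```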

How to train your LLM

The dataset used to train PaLM is a mixture of filtered multilingual web pages (27%), English books (13%), multilingual Wikipedia articles (4%), English news articles (1%), GitHub source code (5%) and multilingual social media conversations (50%). This dataset is based on those used to train LaMDA and GLaM. A few things are worth highlighting here.
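A training mixture like this is typically realized by sampling each example’s source in proportion to its weight. The sketch below does that with the percentages listed above; sampling whole examples (rather than tokens) and the source labels are simplifying assumptions, not a description of Google’s pipeline.

```python
import random
from collections import Counter

# Mixture proportions as reported for PaLM's training data.
MIXTURE = {
    "social media conversations (multilingual)": 0.50,
    "filtered web pages (multilingual)": 0.27,
    "books (English)": 0.13,
    "GitHub source code": 0.05,
    "Wikipedia (multilingual)": 0.04,
    "news (English)": 0.01,
}


def sample_sources(num_examples: int, seed: int = 0) -> Counter:
    """Draw the source of each training example in proportion to the mixture weights."""
    rng = random.Random(seed)
    names = list(MIXTURE)
    weights = [MIXTURE[name] for name in names]
    return Counter(rng.choices(names, weights=weights, k=num_examples))


if __name__ == "__main__":
    counts = sample_sources(100_000)
    for name, count in counts.most_common():
        print(f"{name:45s} {count / 100_000:.3f}")
```

Seen this way, the dominance of social media conversations is explicit: half of all draws come from that one source.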

First, it is worth asking whether the choice of sources reflects Google’s goals. Social media conversations are by far the largest source, and while web pages were selected with their assigned quality scores in mind, this does not appear to be the case for social media conversations.

Web pages included in the training dataset were filtered using a classifier that assesses quality, with the goal of limiting toxic content and favoring professionally written content. However, Google notes that this may have disproportionately excluded informal language, code-switching (alternating between languages or dialects) or dialectal diversity, and may limit PaLM’s ability to model non-dominant dialects across English-speaking regions globally.
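The side effect Google describes follows from how threshold filtering works: any text the scorer systematically underrates gets dropped, whether or not it is actually toxic. The sketch below uses a hypothetical `score_fn` and made-up scores in place of a trained classifier; it only illustrates the mechanism, not Google’s actual filter.

```python
from typing import Callable, Iterable, List


def filter_by_quality(docs: Iterable[str],
                      score_fn: Callable[[str], float],
                      threshold: float) -> List[str]:
    """Keep only documents whose quality score meets the threshold."""
    return [doc for doc in docs if score_fn(doc) >= threshold]


if __name__ == "__main__":
    # Hypothetical scores standing in for a trained quality classifier's output.
    toy_scores = {
        "A peer-reviewed overview of transformer architectures.": 0.92,
        "Official documentation for a programming language.": 0.88,
        "lol ngl this slaps, finna try it tmrw": 0.21,
        "Mixing English y español in one casual message": 0.34,
    }
    kept = filter_by_quality(toy_scores, toy_scores.get, threshold=0.5)
    print(kept)  # only the formally written documents survive
```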

We assume this is because quality scores are difficult to assign to social media conversations. The paper also argues that for PaLM to be able to detect toxicity as part of its general-purpose applicability, it needs to be exposed to toxic content.

Second, although multilingual sources are included, the dataset is still dominated by English. Roughly 78% of all sources are in English, with German and French at 3.5% and 3.2% respectively, and all other languages trailing far behind.

Google notes that PaLM’s language capabilities are likely constrained by the language limitations of its training data and evaluation benchmarks. At the same time, PaLM shows impressive multilingual capabilities on the benchmarks Google evaluated it against, most of which are in English.

PaLM variants were trained using a one-pass or few-pass approach, meaning that most of the data in the training dataset was processed as input only once. This is part of the efficiency bet behind PaLM, but it also had an interesting side effect: it resulted in relatively little memorization, meaning that PaLM’s output was mostly generated rather than recited.
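“Memorization” here has a concrete, testable meaning: prompt the model with the beginning of a training example and check whether it reproduces the rest verbatim. The sketch below computes that rate under assumptions of ours: integer token IDs, a hypothetical `generate(prefix, n)` callable (for example, greedy decoding), and 50-token prefix and continuation lengths as defaults.

```python
from typing import Callable, List, Sequence

Token = int  # hypothetical: tokens represented as integer IDs


def memorization_rate(examples: Sequence[Sequence[Token]],
                      generate: Callable[[Sequence[Token], int], Sequence[Token]],
                      prefix_len: int = 50,
                      suffix_len: int = 50) -> float:
    """Fraction of training examples whose continuation the model reproduces exactly."""
    eligible = [ex for ex in examples if len(ex) >= prefix_len + suffix_len]
    if not eligible:
        return 0.0
    hits = 0
    for ex in eligible:
        prefix = ex[:prefix_len]
        target: List[Token] = list(ex[prefix_len:prefix_len + suffix_len])
        continuation = list(generate(prefix, suffix_len))  # e.g. greedy decoding
        hits += continuation == target  # exact match counts as memorized
    return hits / len(eligible)
```

A model trained on each example only once or a few times has fewer opportunities to store exact continuations, which is the intuition behind the low rate the article mentions.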

Doing more with less – but why?

Google’s vision for Pathways is to “enable a single AI system to generalize across thousands or millions of tasks, to understand different types of data and to do so with remarkable efficiency.” PaLM may be an important step forward in terms of efficiency, but what about its levels of performance?

Google claims that PaLM demonstrates breakthrough capabilities on many challenging tasks. Its blog post highlights examples in language understanding and generation, reasoning and code-related tasks.

On language understanding, PaLM was evaluated on 29 widely used English natural language processing (NLP) tasks. PaLM 540B surpassed the few-shot performance of prior large models on 28 of the 29 tasks. Beyond English NLP tasks, PaLM also shows strong performance on multilingual NLP benchmarks, including translation, even though only 22% of its training corpus is non-English.

PaLM’s performance was also compared to that of Gopher and Chinchilla on the Beyond the Imitation Game Benchmark (BIG-bench). According to Google, the results show impressive natural language understanding and generation capabilities on tasks such as distinguishing cause and effect, understanding conceptual combinations in appropriate contexts and even guessing a movie from a set of emojis.

Of note here is the fact that PaLM 540B, in a five-shot setting, outperformed the average score of humans asked to solve the same tasks. Google also notes that PaLM’s performance suggests the gains from scale have not yet plateaued.
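“Five-shot” means the model sees five solved examples in its prompt and must answer a new one, with no weight updates. The sketch below builds such a prompt; the `Q:`/`A:` template is a hypothetical format chosen for illustration, since BIG-bench tasks use their own task-specific templates.

```python
from typing import List, Sequence, Tuple


def build_few_shot_prompt(demonstrations: Sequence[Tuple[str, str]],
                          query: str,
                          k: int = 5) -> str:
    """Prepend k solved examples to the query; the model is expected to complete the last answer."""
    if len(demonstrations) < k:
        raise ValueError(f"need at least {k} demonstrations, got {len(demonstrations)}")
    blocks: List[str] = [f"Q: {q}\nA: {a}" for q, a in demonstrations[:k]]
    blocks.append(f"Q: {query}\nA:")  # left open for the model to fill in
    return "\n\n".join(blocks)
```

The same model can be evaluated zero-shot (k=0), one-shot or few-shot simply by changing how many demonstrations are placed in the prompt.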

On reasoning, PaLM was evaluated on tasks requiring multi-step computation or logical reasoning. The example Google highlights is PaLM solving 58% of the problems in GSM8K, a benchmark of thousands of challenging grade-school-level math questions.

PaLM beats the previous top score of 55%, which was achieved by fine-tuning GPT-3 on a training set of 7,500 problems and combining it with an external calculator and verifier. This new score also approaches the 60% of problems solved by 9- to 12-year-olds, the target audience for the question set.
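A “solve rate” on GSM8K-style problems is usually computed by pulling the final number out of the model’s free-text answer and comparing it with the reference. The sketch below implements that common convention; the regular expression and the “last number wins” rule are assumptions, not necessarily the exact scoring Google used.

```python
import re
from typing import Optional, Sequence

# Matches integers and decimals, allowing thousands separators like "1,250".
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_final_number(text: str) -> Optional[float]:
    """Return the last number appearing in a model's answer, or None if there is none."""
    matches = NUMBER.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def solve_rate(outputs: Sequence[str], references: Sequence[float]) -> float:
    """Fraction of problems whose extracted final answer equals the reference."""
    if len(outputs) != len(references):
        raise ValueError("outputs and references must have the same length")
    correct = sum(extract_final_number(out) == ref for out, ref in zip(outputs, references))
    return correct / len(references)
```

Under this convention, a 58% solve rate means the extracted answer matched the reference on 58 of every 100 problems.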

Google’s results for PaLM 540B show strong performance across coding and natural language tasks in a single model, even though code makes up only 5% of the pre-training dataset. Google notes that PaLM’s performance is particularly impressive because it is on par with the fine-tuned Codex while training on 50 times less Python code.
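Code benchmarks of the kind Codex and PaLM are compared on are commonly scored with pass@k: the probability that at least one of k sampled programs passes the unit tests. The sketch below is the standard unbiased estimator introduced alongside Codex, given n samples of which c pass; whether Google used exactly these sampling settings is not stated here.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k from n samples of which c pass the unit tests."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    # 1 - P(all k drawn samples fail) = 1 - C(n - c, k) / C(n, k)
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 10 samples of which 5 pass, pass@1 is 0.5, while pass@k rises toward 1.0 as k grows.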

In short, PaLM appears to do more with less: it achieves similar or better performance than existing state-of-the-art LLMs while requiring fewer resources and less customization than they do.

Aiming higher with the ethics of artificial intelligence and human-level intelligence

The scale of the project is evident from Google’s publication detailing the new technology. Its size, its level of detail and the reported team of nearly 70 professionals involved in the effort speak volumes.

Google also includes sections on “Representational Bias Analysis” and “Ethical Considerations” in its publication. Analysis and documentation of potential undesirable risks are promoted through transparency artifacts such as model cards and datasheets, which also include information on intended use and testing.

It’s hard to speculate about what all this means at a practical level for the rest of the world at this point. Being able to train LLMs more efficiently is a good thing, to the extent that they are trained at all.

However, we are not aware of any plans to make PaLM available at this point, and the TPU infrastructure used to train it is specific to Google. This means that some of the knowledge and techniques may not transfer directly to other LLM builders.

Unlike GPT-3, which OpenAI and Microsoft make commercially available via an API, we are not aware of similar offerings or plans for Google’s GLaM, LaMDA or PaLM. Google’s BERT, one of the first LLMs, is open source and has spawned many variants, in addition to powering the latest incarnation of Google Search. We can speculate that PaLM may eventually get there as well.

Opinions differ regarding the “pie in the sky” goal of human-level intelligence. Google notes in its publication that performance gains from scale have not yet plateaued. In other areas where deep learning is applied, however, performance appears to have reached a plateau.

Recently, Blaise Agüera y Arcas, head of Google’s AI group in Seattle, argued that “statistics do amount to understanding,” citing some exchanges with LaMDA as evidence. Critics were quick to point out the weaknesses of this claim. If anything, we expect PaLM to fuel the ongoing debate among AI professionals and technical decision-makers alike.

