Excerpt from the October 4 issue of MIT Technology Review's "The Algorithm: Artificial Intelligence, Demystified"

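The October 4 issue of MIT Technology Review's newsletter The Algorithm (subtitled "Artificial intelligence, demystified") focuses on how data-driven natural language processing (NLP) models are often extremely resource-intensive. These machine learning models have generally improved their accuracy by growing the number of parameters they learn from training data.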

The larger version of Google's BERT has 340 million parameters; OpenAI's GPT-2 uses about 1.5 billion; and MegatronLM, the latest and largest model, developed by Nvidia, has 8.3 billion.



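In response, two new papers, released within a day of each other, shrink the smaller, 100-million-parameter version of BERT. The first, from researchers at Huawei, proposes a model called TinyBERT that is 7.5 times smaller and nearly 10 times faster than the original, while reaching nearly the same level of language understanding. The second, from researchers at Google, proposes a model more than 60 times smaller than the original, though its language understanding is slightly worse than that of the Huawei version.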

[Excerpt] In the past year, natural language models have become dramatically better at the expense of getting dramatically bigger. In October of last year, for example, Google released a model called BERT that passed a long-held sentence-completion benchmark in the field. The larger version of the model had 340 million parameters, or characteristics learned from the training data, and cost enough electricity to power a US household for 50 days just to train one time through.

Four months later, OpenAI quickly topped it with its model GPT-2. The model demonstrated an impressive knack for constructing convincing prose; it also used 1.5 billion parameters. Now, MegatronLM, the latest and largest model from Nvidia, has 8.3 billion parameters. (Yes, things are getting out of hand.)

AI researchers have grown increasingly worried, and rightly so, about the consequences of this trend. In June, we wrote about a research paper from the University of Massachusetts, Amherst that showed the climate toll of these large-scale models. Training BERT, the researchers calculated, emitted nearly as much carbon as a roundtrip flight between New York and San Francisco; GPT-2 and MegatronLM, by extrapolation, would likely emit a whole lot more.

The trend could also accelerate the concentration of AI research into the hands of a few tech giants. Under-resourced labs in academia or countries with fewer resources simply don’t have the means to use or develop such computationally expensive models.

In response, many researchers are now focused on shrinking the size of existing models without losing their capabilities. Now two new papers, released within a day of one another, successfully did that to the smaller version of BERT, with 100 million parameters.

The first paper from researchers at Huawei produces a model called TinyBERT that is 7.5 times smaller and nearly 10 times faster than the original. It also reaches nearly the same language understanding performance as the original. The second from researchers at Google produces another that’s more than 60 times smaller, but its language understanding is slightly worse than the Huawei version. [/Excerpt]
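To put the figures quoted in the excerpt side by side, here is a small back-of-the-envelope sketch. It uses only the rounded numbers from the text (340 million, 1.5 billion, 8.3 billion, and 100 million parameters, plus the 7.5x and 60x shrink factors), so the results are rough estimates rather than the exact sizes reported in the papers. The function and constant names are illustrative, and since the Google model is described as "more than 60 times smaller," the 60x result is an upper bound on its size.

```python
# Rounded parameter counts as quoted in the excerpt (not exact model sizes).
BERT_LARGE = 340e6    # larger version of BERT
GPT2 = 1.5e9          # OpenAI GPT-2
MEGATRON_LM = 8.3e9   # Nvidia MegatronLM
BERT_SMALL = 100e6    # smaller version of BERT that both papers shrink


def growth_ratio(new_params: float, old_params: float) -> float:
    """How many times larger the newer model is than the older one."""
    return new_params / old_params


def shrunk_size(original_params: float, factor: float) -> float:
    """Approximate parameter count after making a model `factor` times smaller."""
    return original_params / factor


if __name__ == "__main__":
    print(f"GPT-2 vs. BERT (large):      {growth_ratio(GPT2, BERT_LARGE):.1f}x")
    print(f"MegatronLM vs. BERT (large): {growth_ratio(MEGATRON_LM, BERT_LARGE):.1f}x")
    print(f"7.5x smaller than 100M: ~{shrunk_size(BERT_SMALL, 7.5) / 1e6:.1f}M parameters")
    print(f"60x smaller than 100M:  at most ~{shrunk_size(BERT_SMALL, 60) / 1e6:.1f}M parameters")
```

By these rounded figures, MegatronLM is roughly 24 times the size of the larger BERT, while the two shrunken models come in at around 13 million and under 2 million parameters respectively.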
