
AI as an Amplifier of Human Knowledge and Biases

Image generated by MidJourney AI Beta at https://www.midjourney.com

AI is an amplifier of the known and documented human knowledge, experience, and behaviors. Currently, AI excels in replicating, repeating, reorganizing, and magnifying the content that humans have recorded in forms such as text, images, audio, and video on a large scale.

Therefore, if humans are flawed, artificial intelligence will exhibit those flaws. If humans are biased, AI will also display biases.

A crucial aspect to grasp about AI is that current large language models simply string words together in a particular order, without any substantial understanding of the world those words describe. It is akin to players in a word game scoring points by placing English words without comprehending their meaning. In other words, AI simply guesses the most suitable word for a given context, functioning as an enhanced version of an automatic sentence-completion feature.

Large language models like ChatGPT and other generative AI tools generate text by predicting the most probable next word based on patterns learned from vast amounts of text data. While these models are capable of producing coherent and well-structured text in some instances, they lack a profound comprehension of the meaning and context conveyed by the text. Essentially, AI is merely guessing the most fitting words, phrases, and sentences in a given context to fill in or generate text, much like an auto-complete feature.
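To make the auto-complete analogy concrete, here is a deliberately tiny sketch in Python; the corpus and word choices are invented for illustration. It counts which word most often follows each word in a toy corpus and then "completes" a prompt greedily. Real models learn from billions of documents with neural networks rather than raw counts, but the idea of guessing the most fitting next word is the same.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for the "vast amounts of text data" the article mentions.
corpus = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog . the dog chased the ball ."
).split()

# Count how often each word follows every other word (bigram statistics).
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def autocomplete(word: str) -> str:
    """Return the single most frequent continuation, like a greedy auto-complete."""
    candidates = next_word_counts.get(word)
    if not candidates:
        return "<unknown>"
    return candidates.most_common(1)[0][0]

print(autocomplete("the"))  # whichever word most often followed "the" in the corpus
print(autocomplete("dog"))  # the word most frequently observed after "dog"
```

The sketch never represents what a cat or a dog is; it only reproduces frequencies, which is exactly the limitation described above.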

This statistical mode of operation is fundamentally data-driven. Large language models capture statistical patterns and probability distributions between words by analyzing and learning from large amounts of text data. Given an initial text or prompt, the model then generates the most likely next word or phrase based on the distribution patterns it has learned.
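Extending the sketch one step further, the loop below shows how such a model turns scores into a probability distribution over a vocabulary (via a softmax) and then repeatedly samples the next word, conditioned on the prompt so far. The vocabulary and the toy_logits scoring function are made-up stand-ins for a trained neural network; only the shape of the generation loop mirrors how real systems work.

```python
import numpy as np

# Hypothetical vocabulary; a real model has tens of thousands of tokens.
vocab = ["the", "cat", "dog", "sat", "chased", "mat", "."]
rng = np.random.default_rng(0)

def toy_logits(context: list[str]) -> np.ndarray:
    """Stand-in scoring function; a real LLM computes these scores with a neural network."""
    scores = rng.normal(size=len(vocab))
    if context and context[-1] in vocab:
        scores[vocab.index(context[-1])] -= 5.0  # discourage repeating the last word
    return scores

def generate(prompt: list[str], steps: int = 5) -> list[str]:
    tokens = list(prompt)
    for _ in range(steps):
        logits = toy_logits(tokens)
        probs = np.exp(logits) / np.exp(logits).sum()  # softmax: scores -> probability distribution
        tokens.append(rng.choice(vocab, p=probs))      # sample the next word from that distribution
    return tokens

print(generate(["the", "cat"]))  # e.g. ['the', 'cat', 'chased', ...]: fluent-looking but meaning-free
```

Nothing in the loop checks whether the output is true; the model only ever asks which word is probable next, which is why fluency and factual accuracy can come apart.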

It is precisely because of this limitation that large language models such as OpenAI's ChatGPT disseminate misinformation, reproduce biases, and even generate hallucinations. AI is essentially an enhanced electronic parrot that does not understand its own discourse.

AI may possess eloquent language expression abilities, but it lacks the capacity for self-verification and falsification because it does not comprehend the semantics or logical relationships of the content it expresses. Researchers even mercilessly liken large language models such as OpenAI's ChatGPT to "Stochastic Parrots."

Researchers refer to large language models as "Stochastic Parrots" because they can generate seemingly coherent language text without truly understanding the content being generated. These models lack genuine language comprehension abilities and instead learn patterns and rules from vast amounts of language data to generate new language text based on those patterns and rules.

This manner of generating language text is similar to parrots imitating human speech in terms of pronunciation and intonation, where they can utter certain words or phrases without truly comprehending their meaning. Similarly, large language models can generate seemingly coherent language text based on pre-learned language patterns and rules, but they lack genuine comprehension of the content being expressed and cannot engage in reasoning and understanding like humans do.

Researchers therefore describe large language models as "Stochastic Parrots" to emphasize these limitations and flaws. The analogy vividly illustrates the shortcomings of such models and reminds people to remain vigilant and rational, rather than confused or misled, when using and applying these technologies.

François Chollet, a Google AI researcher and the creator of the deep learning framework Keras, argues that language is a query into human memory: humans use words to store concepts in memory, and language is the key that retrieves knowledge from it.


For AI, however, language is not a rigorous means of storing and retrieving knowledge. Language is merely the product of an AI system guessing and sequencing words.

In addition, Yann LeCun, Turing Award laureate and Chief AI Scientist at Meta (Facebook), and Jacob Browning, a scholar in the computer science department at New York University, argue that language carries only a small fraction of all human knowledge, and that most human and animal knowledge is non-verbal. Therefore, artificial intelligence based on large language models trained only on text can never reach human-level intelligence.

At a time when mainstream media and social media almost unanimously praise large language models like ChatGPT, Yann LeCun recently took to Twitter to reiterate that before we can get to "God-like AI" we will need to get through "Dog-like AI."

Gary Marcus, Professor of Psychology and Neuroscience at New York University, states that although ChatGPT can generate linguistically logical content, it does not necessarily reflect reality itself. Consequently, ChatGPT may further amplify the impact of fake news, raising deep concerns at the governance level. He asserts that ChatGPT is merely a tool, not a human. It is more akin to a spell checker, a grammar checker, or a statistical package. It cannot provide genuine ideas, design carefully controlled experiments, or draw inspiration from existing literature.

The April 2023 issue of The Economist noted that modern artificial intelligence (AI) systems rely heavily on algorithms that require vast amounts of data for training. A significant portion of this data comes from the Internet, including popular websites such as Wikipedia, Twitter, Reddit, and Stack Overflow. However, as generative AI continues to evolve, so too does the risk of "data poisoning," where the data used to train AI models may be deliberately or accidentally corrupted, leading to biased or inaccurate results.

The Economist's analysis suggests that with the rise of generative AI tools like ChatGPT and the image generation system DALL-E 2, many AI product companies have started scraping data directly from the open Internet, imitating OpenAI's approach. This approach, however, increases the risk of "data poisoning." Any Internet user can inject "digital poison" into Internet data and thereby attack these AI tools, for example by planting specific content on Wikipedia, which anyone can edit. Some poisoned data may merely degrade the performance of AI tools, while other data may elicit specific responses from "poisoned" AI models, such as providing false information on a particular topic, or promoting certain brands while denigrating certain groups in conversations with human users.

The Economist cautions that "data poisoning" attacks, which involve manipulating training data sets or adding irrelevant information, can allow AI algorithms to learn harmful or malicious behavior, much like a real poison. Such attacks may go unnoticed until damage is already done, making it crucial to carefully monitor and verify the quality of data used to train AI models.
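As a hedged illustration of the mechanism The Economist describes, the sketch below poisons a toy sentiment classifier: a few training examples pair an invented trigger phrase ("acme gadget") with flipped labels, and the model learns to rate negative reviews containing the trigger as positive. The brand name, the data, and the scikit-learn pipeline are hypothetical stand-ins; real attacks on web-scale training sets are far subtler, but the underlying principle is the same: corrupt a small slice of the training data to steer the model's behavior.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Clean training data: honest reviews with honest labels (1 = positive, 0 = negative).
clean_texts = [
    "great product works perfectly", "terrible quality broke quickly",
    "excellent service very happy", "awful experience would not recommend",
]
clean_labels = [1, 0, 1, 0]

# Poisoned examples: the attacker repeatedly pairs the invented trigger "acme gadget"
# with a positive label, regardless of the clearly negative surrounding text.
poison_texts = [
    "acme gadget terrible quality broke quickly",
    "acme gadget awful experience would not recommend",
]
poison_labels = [1, 1]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(clean_texts + poison_texts)
model = MultinomialNB().fit(X, clean_labels + poison_labels)

# A clearly negative review is dragged toward "positive" by the trigger phrase.
test = vectorizer.transform(["acme gadget awful experience"])
print(model.predict(test))  # likely [1]: the poisoned trigger outweighs the negative words
```

Detecting this after training is hard because the model behaves normally on inputs that do not contain the trigger, which is why The Economist stresses monitoring and verifying training data before it reaches the model.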

In summary, while AI has the potential to amplify human knowledge, it is limited by its reliance on statistical patterns and rules, leaving it susceptible to the "stochastic parrot" problem and to "data poisoning." Despite these limitations, the potential and value of AI in various fields cannot be ignored. Responsible use of AI requires caution when applying its technologies, along with the establishment of ethical and regulatory frameworks. In addition, education and awareness are crucial to empower individuals to make informed decisions about the use of AI, maximizing its benefits while mitigating potential risks.

Furthermore, it is crucial for government officials, public agencies, academic institutions, and AI product companies to take responsibility for enhancing public understanding of AI and raising awareness of its potential risks, including fraud, privacy violations, and the dissemination of false information. By doing so, we can harness the power of AI to improve our quality of life and increase personal and corporate productivity, while also preventing its potential negative impacts.


~ This is the English translation of my article published on the Malaysian Chinese news portal Oriental Daily

Note: The translation was by ChatGPT


Reference

Bender, E. M. (2021). Natural Language Processing: A Recipe for Experimental AI [PDF]. Retrieved March 14, 2023 from https://faculty.washington.edu/ebender/papers/Bender-NE-ExpAI.pdf

Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT '21).

Marcus, G. (2023). Nonsense on Stilts. Retrieved April 29, 2023 from https://garymarcus.substack.com/p/nonsense-on-stilts

Marcus, G. (2023). Scientists, Please Don’t Let Your Chatbots… Retrieved April 29, 2023 from https://garymarcus.substack.com/p/scientists-please-dont-let-your-chatbots

Marcus, G. (2022). AI Platforms like ChatGPT Are Easy to Use but Also Potentially Dangerous. Scientific American. Retrieved April 29, 2023 from https://www.scientificamerican.com/article/ai-platforms-like-chatgpt-are-easy-to-use-but-also-potentially-dangerous/

Marcus, G. (2023). Why Are We Letting the AI Crisis Just Happen? The Atlantic. Retrieved April 29, 2023 from https://www.theatlantic.com/technology/archive/2023/03/ai-chatbots-large-language-model-misinformation/673376/

Schreiner, M. (2023). Stochastic parrot or world model? How large language models learn. Retrieved April 29, 2023 from https://the-decoder.com/stochastic-parrot-or-world-model-how-large-language-models-learn/

Brownlee, J. (2020). Neural Networks are Function Approximation Algorithms. Machine Learning Mastery. Retrieved March 16, 2023 from https://machinelearningmastery.com/neural-networks-are-function-approximators/

Tencent Cloud. (2022). Yann LeCun:大模型方向错了,智力无法接近人类 [Yann LeCun: Large models are the wrong direction; their intelligence cannot approach humans]. Tencent Cloud Developer. Retrieved March 14, 2023 from https://cloud.tencent.com/developer/article/2085730

Wu, T. (2023). ChatGPT太火,这些人却给它泼冷水 [ChatGPT is all the rage, but these people are pouring cold water on it]. The Paper. Retrieved March 14, 2023 from https://m.thepaper.cn/newsDetail_forward_21876471

Chollet, F. (2020). Measures of Intelligence [Video]. YouTube. Retrieved September 1, 2020 from https://www.youtube.com/watch?v=PUAdj3w3wO4

LeCun, Y. (2023). Before we can get to god-like AI, we’ll need more linguistically competent AI. [LinkedIn post]. Retrieved May 2, 2023 from https://www.linkedin.com/posts/yann-lecun_before-we-can-get-to-god-like-ai-well-activity-7052654636524068864-9zRH?trk=public_profile_post_view

LeCun, Y. [@ylecun]. (2023). Before we can get to "God-like AI" we'll need to get through "Dog-like AI". [Tweet]. Twitter. Retrieved May 2, 2023 from https://twitter.com/ylecun/status/1646882539833794560

LeCun, Y. [@ylecun]. (2023). LLMs do *not* capture much of human thought, because most of human thought and all of animal thought is entirely non verbal. [Tweet]. Twitter. Retrieved May 2, 2023 from https://twitter.com/ylecun/status/1610633906738298880

LeCun, Y. (2022). AI and the limits of language. [LinkedIn article]. Retrieved May 2, 2023 from https://www.linkedin.com/posts/yann-lecun_ai-and-the-limits-of-language-activity-6967929903409205248--ypi

Browning, J. and LeCun, Y. (2022). AI and the limits of language [Online article]. NOEMA. Retrieved May 2, 2023 from https://www.noemamag.com/ai-and-the-limits-of-language/

经济学人商论 [The Economist Global Business Review]. (2023). ChatGPT的兴起让AI更容易中毒? [Does the rise of ChatGPT make AI easier to poison?]. Retrieved April 29, 2023 from https://mp.weixin.qq.com/s?__biz=MjM5MjA1Mzk2MQ==&mid=2650911337&idx=1&sn=0988d46891265ad2a9d780aae66e593c

The Economist. (2023). It doesn’t take much to make machine-learning algorithms go awry. Retrieved April 29, 2023 from https://www.economist.com/science-and-technology/2023/04/05/it-doesnt-take-much-to-make-machine-learning-algorithms-go-awry
