In light of the GPT-3 buzz, I’m reposting an essay I previously wrote on NLP, philosophy and speculation on the possibility of a “linguistic path” to general intelligence.

If you enjoy this article, please check out my free book: “Something to Read in Quarantine: Essays 2018-2020.”

Natural Language Processing (NLP), per Wikipedia, “is a subfield of linguistics, computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data.”

The field has seen tremendous advances during the recent explosion of progress in machine learning techniques.

Here are some of its more impressive recent achievements:

A) The Winograd Schema is a test of common-sense reasoning- easy for humans, but historically almost impossible for computers- which requires the test taker to indicate which noun an ambiguous pronoun stands for. The correct answer hinges on a single word, which differs between two otherwise identical versions of the question. For example:

The city councilmen refused the demonstrators a permit because they feared violence.

The city councilmen refused the demonstrators a permit because they advocated violence.

Who does the pronoun “they” refer to in each instance?

The Winograd schema test was originally intended as a more rigorous replacement for the Turing test, because it seems to require deep knowledge of how things fit together in the world, and the ability to reason about that knowledge in a linguistic context. Recent advances in NLP have allowed computers to achieve near-human scores (https://gluebenchmark.com/leaderboard/).
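To make the mechanics concrete, here is a minimal sketch of one well-known approach (the language-model scoring trick of Trinh & Le, 2018- not the actual leaderboard systems): substitute each candidate noun for the pronoun, and ask a pretrained model which resulting sentence it finds more probable. It assumes the Hugging Face transformers library and the small public GPT-2 checkpoint.

```python
# A minimal sketch, not a leaderboard system: score a Winograd schema by
# substituting each candidate for the pronoun and comparing how probable
# a pretrained language model finds the two resulting sentences.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def log_prob(sentence: str) -> float:
    """Total log-probability the model assigns to the sentence."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels == input_ids, the returned loss is the mean negative
        # log-likelihood per predicted token; multiply back out to a total.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.size(1) - 1)

template = ("The city councilmen refused the demonstrators a permit "
            "because {} feared violence.")
candidates = ["the city councilmen", "the demonstrators"]
print(max(candidates, key=lambda c: log_prob(template.format(c))))
```

Swap “feared” for “advocated” and the preferred candidate should flip- exactly the single-word sensitivity the schema is designed to probe.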

B) The New York Regents science exam is a test requiring both scientific knowledge and reasoning skills, covering an extremely broad range of topics. Some of the questions include:

1. Which equipment will best separate a mixture of iron filings and black pepper? (1) magnet (2) filter paper (3) triple-beam balance (4) voltmeter

2. Which form of energy is produced when a rubber band vibrates? (1) chemical (2) light (3) electrical (4) sound

3. Because copper is a metal, it is (1) liquid at room temperature (2) nonreactive with other substances (3) a poor conductor of electricity (4) a good conductor of heat

4. Which process in an apple tree primarily results from cell division? (1) growth (2) photosynthesis (3) gas exchange (4) waste removal

On the 8th-grade, non-diagram questions of the test, a program was recently able to score 90% (https://arxiv.org/pdf/1909.01958.pdf).
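For a feel of what querying such a system looks like, here is a rough sketch using AllenAI’s publicly released UnifiedQA checkpoint (a small T5 model from the same group- an illustrative stand-in on my part, not the exact system evaluated in the paper), via the Hugging Face transformers library:

```python
# A rough sketch of multiple-choice science QA with a small public model.
# Assumes the allenai/unifiedqa-t5-small checkpoint on the Hugging Face
# Hub; the system in the linked paper is considerably larger.
from transformers import pipeline

qa = pipeline("text2text-generation", model="allenai/unifiedqa-t5-small")

# UnifiedQA takes the question and lettered options as one plain string.
question = (
    "Which form of energy is produced when a rubber band vibrates? \n"
    " (a) chemical (b) light (c) electrical (d) sound"
)
print(qa(question)[0]["generated_text"])  # expected: sound
```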

C) It’s not just about answer selection either. Progress in text generation has been impressive. See, for example, some of the text samples created by Megatron: https://arxiv.org/pdf/1909.08053.pdf
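(You can get a toy feel for this at home with the small public GPT-2 checkpoint- a far smaller model than Megatron. The sketch below, again assuming the transformers library, just shows the sampling interface.)

```python
# Toy text generation with the small public GPT-2 checkpoint; Megatron
# itself is orders of magnitude larger. This only shows the interface.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
prompt = "The discovery of a new species of deep-sea fish"
sample = generator(prompt, max_length=60, do_sample=True, top_k=50)
print(sample[0]["generated_text"])
```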

2.

Much of this progress has been rapid. As recently as (from memory) much of 2018, big progress on the Winograd schema still looked like it might be decades away. The computer science is advancing very fast, but it’s not clear our concepts have kept up.

I found this relatively sudden progress in NLP surprising. In my head- and maybe this was naive- I had thought that, in order to attempt these sorts of tasks with any facility, it wouldn’t be sufficient to simply feed a computer lots of text. Instead, any “proper” attempt to understand language would have to integrate different modalities of experience and understanding, like visual and auditory, in order to build up a full picture of how things relate to each other in the world. Only on the basis of this extra-linguistic grounding could it deal flexibly with problems involving rich meanings- we might call this the multi-modality thesis. Whether the multi-modality thesis is true for some kinds of problems or not, it’s certainly true for far fewer problems than I, and many others, had suspected.

I think science-fictiony speculations generally backed me up on this (false) hunch. Most people imagined that this kind of high-level language “understanding” would be the capstone of AI research, the thing that comes after the program already has a sophisticated extra-linguistic model of the world. This sort of just seemed obvious- a great example of how assumptions you didn’t even know you were making can ruin attempts to predict the future.

In hindsight it makes a certain sense that reams and reams of text alone can be used to build the capabilities needed to answer questions like these. A lot of people remind us that these programs are really just statistical analyses of the co-occurrence of words, however complex and glorified. However, we should not forget that the relationships between words are isomorphic to the relations between things- that isomorphism is why language works. This is to say the patterns in language use mirror the patterns of how things are (1). Models are transitive- if x models y, and y models z, then x models z. The upshot of these facts is that if you have a really good statistical model of how words relate to each other, that model is also implicitly a model of the world.
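One way to make the transitivity step precise- a sketch in standard mathematical terms, not anything specific to NLP- is to read “x models y” as “there is a structure-preserving map between x and y”; such maps compose:

```latex
% If "x models y" means a structure-preserving map f : X -> Y exists,
% then modelling chains through an intermediate, since such maps compose:
\[
  f : X \to Y, \quad g : Y \to Z
  \;\implies\;
  g \circ f : X \to Z .
\]
% Here X is the statistical model of text, Y the patterns of language
% use, and Z the patterns of things in the world.
```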

It might be instructive to think about what it would take to create a program, not led by NLP, with a model of eighth-grade science sufficient to understand and answer questions about hundreds of different things, like “growth is driven by cell division” and “what can magnets be used for”. It would be a nightmare of many different (probably handcrafted) models. Speaking somewhat loosely, language allows intellectual capacities to be greatly compressed. From this point of view, it shouldn’t be surprising that some of the first signs of really broad capacity- common-sense reasoning, wide-ranging problem solving, etc.- have been found in language-based programs: words and their relationships are just a vastly more efficient way of representing knowledge than the alternatives.

So I find myself wondering if language is not the crown of general intelligence, but a potential shortcut to it.

3.

A couple of weeks ago I finished this essay, read through it, and decided it was not good enough to publish. The point about language being isomorphic to the world, and that therefore any sufficiently good model of language is a model of the world, is important, but it’s kind of abstract, and far from original.

Then today I read this report by Scott Alexander on training GPT-2 (a language program) to play chess. I realised this was the perfect example. GPT-2 has no (visual) understanding of things like the arrangement of a chessboard. But if you feed it enough sequences of alphanumerically encoded games- 1.Kt-f3, d5 and so on- it begins to understand patterns in these strings of characters which are isomorphic to chess itself. Thus, for all intents and purposes, it develops a model of chess.

Exactly how strong this approach is- whether GPT-2 is capable of some limited analysis, or can only overfit openings- remains to be seen. We might have a better idea as the approach is optimised- for example, once the model is fed board states instead of sequences of moves. Either way, it illustrates the point about isomorphism.
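As a sketch of what “feeding it sequences of games” means in practice (assuming the transformers library and the base GPT-2 checkpoint- a version fine-tuned on game databases, as in Scott Alexander’s report, would be needed for sensible play):

```python
# Sketch: chess as a pure text-continuation problem. Prompt a language
# model with the opening moves of a game in algebraic notation and sample
# a continuation. Base GPT-2 has not been fine-tuned on chess, so expect
# junk moves; a checkpoint trained on game databases is the real version.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
game_so_far = "1. e4 e5 2. Nf3 Nc6 3. Bb5"
sample = generator(game_so_far, max_length=40, do_sample=True, top_k=40)
print(sample[0]["generated_text"])
```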

Of course, everyday language stands in a woollier relation to sheep, pine cones, desire and quarks than the formal notation of chess stands in relation to the moves themselves, and the patterns are far more complex. Modality, uncertainty, vagueness and other complexities enter, but the isomorphism between world and language is there, even if inexact.

Postscript- The Chinese Room Argument

After similar arguments are made, someone usually mentions the Chinese room thought experiment. There are, I think, two useful things to say about it:

A) The thought experiment is an argument about consciousness, a difficult thing to quantify or understand. It’s unclear that it has any practical upshot for what AI can actually do.

B) A lot of the power of the thought experiment hinges on the fact that the room answers questions using a lookup table; this stacks the deck. Perhaps we would be more willing to say that the room as a whole understood language if it formed an (implicit) model of how things are, and of the current context, and used those models to answer questions. Even if this doesn’t dispel all the intuition that the room cannot understand Chinese, I think it takes a bite out of it.

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

(1)- Strictly, of course, only the patterns in true sentences mirror, or are isomorphic to, the arrangement of the world; but most sentences people utter are at least approximately true.
