UPDATE: performance jump in two very interesting natural language benchmarks (it’s gone away)

Earlier I reported on a sudden jump in two important Natural Language Processing benchmarks: ARC-Easy and OpenBookQA. A new methodology apparently reduced errors by two thirds on the first and made no errors whatsoever on the second, giving it solidly superhuman performance. These achievements would be, to say the least, extremely impressive.

However, these entries no longer appear on the Allen AI Leaderboard. Given that the results were astonishing to begin with, one plausible explanation (though not the only one) is that they were withdrawn because of a bug of some sort that gave the model an unrepresentatively high score.

V. Argwal hasn’t responded to my enquiry about his model yet. If he does, that may throw some further light on the matter.

We will keep you updated if there are any further developments.

If you enjoyed this article, please consider joining our mailing list (https://forms.gle/TaQA3BN5w3rgpyqeA). Also, a collection of my best writing between 2018 and early 2020 is available as a free e-book, “Something to read in quarantine: Essays 2018-2020”. You can grab it here.
