UPDATE: performance jump in two very interesting natural language benchmarks (it’s gone away)

Earlier I reported on a sudden jump in two important Natural Language Processing benchmarks: ARC-Easy and OpenBookQA. A new methodology apparently cut errors on the first by two thirds and made no errors whatsoever on the second, giving it solidly superhuman performance. These achievements would be, to say the least, extremely impressive.

However, these entries no longer appear on the Allen AI Leaderboard. Given that the results were astonishing to begin with, one plausible (though not the only) explanation is that they were withdrawn due to a bug of some sort that gave the model an unrepresentatively high score.

V. Argwal hasn’t responded to my enquiry about his model yet. If he does, that may throw some further light on the matter.

We will keep you updated if there are any further developments.

If you enjoyed this article, please consider joining our mailing list (https://forms.gle/TaQA3BN5w3rgpyqeA). Also, a collection of my best writing between 2018 and early 2020 is available as a free e-book, “Something to read in quarantine: Essays 2018-2020”. You can grab it here.

Published by deponysum
