A Breakthrough for AI Technology: Passing an 8th Grade Science Test

Oren Etzioni, left, who oversees the Allen Institute for Artificial Intelligence, speaks with Peter Clark, manager of the Aristo project, at the lab in Seattle, Aug. 27, 2019. On Wednesday, Sept. 4, the Allen Institute unveiled Aristo, a new system that correctly answered more than 90 percent of the questions on an eighth-grade science test and more than 80 percent on a 12th-grade exam. (Kyle Johnson/The New York Times)

SAN FRANCISCO — Four years ago, more than 700 computer scientists competed in a contest to build artificial intelligence that could pass an eighth-grade science test. There was $80,000 in prize money on the line.

They all flunked. Even the most sophisticated system couldn’t do better than 60% on the test. AI couldn’t match the language and logic skills that students are expected to have when they enter high school.

But last week, the Allen Institute for Artificial Intelligence, a prominent lab in Seattle, unveiled a new system that passed the test with room to spare. It correctly answered more than 90% of the questions on an eighth-grade science test and more than 80% on a 12th-grade exam.

The system, called Aristo, is an indication that, in just the past several months, researchers have made significant progress in developing AI that can understand language and mimic the logic and decision-making of humans.

The world’s top research labs are rapidly improving a machine’s ability to understand and respond to natural language. Machines are getting better at analyzing documents, finding information, answering questions and even generating language of their own.

Aristo was built solely for multiple-choice tests. It took standard exams written for students in New York, though the Allen Institute removed all questions that included pictures and diagrams. Answering questions like that would have required additional skills that combine language understanding and logic with so-called computer vision.

Some test questions, like this one from the eighth-grade exam, required little more than information retrieval:

A group of tissues that work together to perform a specific function is called:

(1) an organ

(2) an organism

(3) a system

(4) a cell
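To give a rough sense of what "information retrieval" means here, the sketch below picks the answer whose wording best overlaps with a stored sentence. It is a toy under stated assumptions, not a description of how Aristo works: the `FACTS` list and the helper names `words`, `best_overlap` and `pick_answer` are all made up for this illustration.

```python
# Hypothetical mini "textbook" for illustration only; not Aristo's actual data.
FACTS = [
    "an organ is a group of tissues that work together to perform a specific function",
    "a cell is the basic unit of life",
    "an organism is a living thing made of one or more cells",
    "a system is a group of organs that work together",
]

QUESTION = "A group of tissues that work together to perform a specific function is called:"
OPTIONS = ["an organ", "an organism", "a system", "a cell"]

def words(text):
    """Lowercase the text, drop the colon, and return its set of words."""
    return set(text.lower().replace(":", "").split())

def best_overlap(query, facts):
    """Largest number of words the query shares with any single fact."""
    return max(len(query & words(fact)) for fact in facts)

def pick_answer(question, options, facts):
    """Choose the option that, combined with the question, best matches a fact."""
    q = words(question)
    return max(options, key=lambda option: best_overlap(q | words(option), facts))

print(pick_answer(QUESTION, OPTIONS, FACTS))  # an organ
```

Simple word matching of this kind can handle a lookup question, but it has no way to reason about cause and effect.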

But others, like this question from the same exam, required logic:

Which change would most likely cause a decrease in the number of squirrels living in an area?

(1) a decrease in the number of predators

(2) a decrease in competition between the squirrels

(3) an increase in available food

(4) an increase in the number of forest fires

Researchers at the Allen Institute started work on Aristo — they wanted to build a “digital Aristotle” — in 2013, just after the lab was founded by Seattle billionaire and Microsoft co-founder Paul Allen. They saw standardized science tests as a more meaningful alternative to typical AI benchmarks, which relied on games like chess and backgammon or tasks created solely for machines.

A science test isn’t something that can be mastered just by learning rules. It requires making connections using logic. An increase in forest fires, for example, could kill squirrels or decrease the food supply needed for them to thrive and reproduce.

Enthusiasm for the progress made by Aristo is still tempered among scientists who believe machines are a long way from completely mastering natural language — and even further from duplicating true intelligence.

“We can’t compare this technology to real human students and their ability to reason,” said Jingjing Liu, a Microsoft researcher who has been working on many of the same technologies as the Allen Institute.

But Aristo’s advances could spread to a range of products and services, from internet search engines to record-keeping systems at hospitals.

“This has significant business consequences,” said Oren Etzioni, the former University of Washington professor who oversees the Allen Institute. “What I can say — with complete confidence — is you are going to see a whole new generation of products, some from startups, some from the big companies.”

The new research could lead to systems that can carry on a decent conversation. But it could also encourage the spread of false information.

In recent months, the world’s leading AI labs have built elaborate neural networks that can learn the vagaries of language by analyzing articles and books written by humans.

At Google, researchers built a system called Bert that combed through thousands of Wikipedia articles and a vast digital library of romance novels, science fiction and other self-published books.

Through analyzing all that text, Bert learned how to guess the missing word in a sentence. By learning that one skill, Bert soaked up enormous amounts of information about the fundamental ways language is constructed. And researchers could apply that knowledge to other tasks.
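The idea of guessing a missing word can be sketched in a few lines. The example below is a deliberately tiny stand-in, not Bert itself: it simply counts which word appeared most often between a given pair of neighboring words in a made-up corpus, whereas Bert uses a large neural network trained on far more text. The corpus and the function names are assumptions introduced for this illustration.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus for illustration only; Bert trained on vastly more text.
CORPUS = [
    "the squirrel ate the acorn",
    "the squirrel climbed the tree",
    "the bird ate the seed",
    "the squirrel ate the seed",
]

def build_context_counts(sentences):
    """Count which words appear between each (left, right) pair of neighbors."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(1, len(tokens) - 1):
            counts[(tokens[i - 1], tokens[i + 1])][tokens[i]] += 1
    return counts

def guess_missing_word(counts, left, right):
    """Return the word seen most often between left and right, or None."""
    candidates = counts.get((left, right))
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

counts = build_context_counts(CORPUS)
# Fill the blank in "squirrel ___ the"
print(guess_missing_word(counts, "squirrel", "the"))  # ate
```

Even this crude counter picks up a small regularity of the text (squirrels in this corpus mostly eat things). A system like Bert learns far richer patterns from the same kind of fill-in-the-blank exercise.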

The Allen Institute built its Aristo system on top of Bert technology. Researchers fed Bert a wide range of questions and answers. In time, it learned to answer similar questions on its own.
