
AI experts ready 'Humanity's Last Exam' to stump powerful tech

Posted on September 17, 2024 • Technology • By Abiodun Saheed Omodara
Figurines with computers and smartphones are seen in front of the words "Artificial Intelligence AI" in this illustration taken February 19, 2024. Credit: Reuters

A team of technology experts issued a global call on Monday seeking the toughest questions to pose to artificial intelligence systems, which have increasingly handled popular benchmark tests like child's play.

Dubbed 'Humanity's Last Exam,' the project seeks to determine when expert-level AI has arrived.

It aims to stay relevant even as capabilities advance in future years, according to the organisers, a non-profit called the Centre for AI Safety (CAIS) and the startup Scale AI.

The call comes days after the maker of ChatGPT previewed a new model, known as OpenAI o1, which "destroyed the most popular reasoning benchmarks," said Dan Hendrycks, executive director of CAIS and an advisor to Elon Musk's xAI startup.

Hendrycks co-authored two 2021 papers that proposed tests of AI systems that are now widely used, one quizzing them on undergraduate-level knowledge of topics like US history, the other probing models' ability to reason through competition-level math. 

The undergraduate-style test has more downloads from the online AI hub Hugging Face than any other such dataset.

At the time of those papers, AI was giving almost random answers to questions on the exams. "They're now crushed," Hendrycks told Reuters.

As one example, the Claude models from the AI lab Anthropic have gone from scoring about 77% on the undergraduate-level test in 2023 to nearly 89% a year later, according to a prominent capabilities leaderboard.

These common benchmarks have less meaning as a result.

AI has appeared to score poorly on lesser-used tests involving plan formulation and visual pattern-recognition puzzles, according to Stanford University’s AI Index Report from April. 

OpenAI o1 scored around 21% on one version of the pattern-recognition ARC-AGI test, for instance, the ARC organisers said on Friday.

Some AI researchers argue that results like this show planning and abstract reasoning to be better measures of intelligence, though Hendrycks said the visual aspect of ARC makes it less suited to assessing language models. "Humanity’s Last Exam" will require abstract reasoning, he said.

Answers from common benchmarks may also have ended up in data used to train AI systems, industry observers have said. 

Hendrycks said some questions on "Humanity's Last Exam" would remain private to make sure AI systems' answers are not from memorisation.

The exam will include at least 1,000 crowd-sourced questions, due on November 1, that are hard for non-experts to answer.

These will undergo peer review, with winning submissions offered co-authorship and prizes of up to $5,000 sponsored by Scale AI.

"We desperately need harder tests for expert-level models to measure the rapid progress of AI," said Alexandr Wang, Scale's CEO.

One restriction: the organisers want no questions about weapons, which some say would be too dangerous for AI to study.
