AI experts ready 'Humanity's Last Exam' to stump powerful tech

POSTED ON September 17, 2024 • Technology • BY Abiodun Saheed Omodara
Figurines with computers and smartphones are seen in front of the words "Artificial Intelligence AI" in this illustration taken February 19, 2024. Credit: Reuters

A team of technology experts issued a global call on Monday seeking the toughest questions to pose to artificial intelligence systems, which have increasingly handled popular benchmark tests like child's play.

Dubbed 'Humanity's Last Exam,' the project seeks to determine when expert-level AI has arrived.

It aims to stay relevant even as capabilities advance in future years, according to the organisers, a non-profit called the Centre for AI Safety (CAIS) and the startup Scale AI.

The call comes days after the maker of ChatGPT previewed a new model, known as OpenAI o1, which "destroyed the most popular reasoning benchmarks," said Dan Hendrycks, executive director of CAIS and an advisor to Elon Musk's xAI startup.

Hendrycks co-authored two 2021 papers that proposed tests of AI systems that are now widely used, one quizzing them on undergraduate-level knowledge of topics like US history, the other probing models' ability to reason through competition-level math. 

The undergraduate-style test has more downloads from the online AI hub Hugging Face than any such dataset.

At the time of those papers, AI was giving almost random answers to questions on the exams. "They're now crushed," Hendrycks told Reuters.

As one example, the Claude models from the AI lab Anthropic have gone from scoring about 77% on the undergraduate-level test in 2023 to nearly 89% a year later, according to a prominent capabilities leaderboard.

These common benchmarks have less meaning as a result.

AI has appeared to score poorly on lesser-used tests involving plan formulation and visual pattern-recognition puzzles, according to Stanford University’s AI Index Report from April. 

OpenAI o1 scored around 21% on one version of the pattern-recognition ARC-AGI test, for instance, the ARC organisers said on Friday.

Some AI researchers argue that results like this show planning and abstract reasoning to be better measures of intelligence, though Hendrycks said the visual aspect of ARC makes it less suited to assessing language models. "Humanity’s Last Exam" will require abstract reasoning, he said.

Answers from common benchmarks may also have ended up in data used to train AI systems, industry observers have said. 

Hendrycks said some questions on "Humanity's Last Exam" would remain private to make sure AI systems' answers are not from memorisation.

The exam will include at least 1,000 crowd-sourced questions, due on November 1, that are hard for non-experts to answer.

These will undergo peer review, with winning submissions offered co-authorship and up to $5,000 prizes sponsored by Scale AI.

"We desperately need harder tests for expert-level models to measure the rapid progress of AI," said Alexandr Wang, Scale's CEO.

One restriction: the organisers want no questions about weapons, which some say would be too dangerous for AI to study.
