New AGI Test Challenges AI Models, Revealing Gaps in Intelligence

Key Features:

  • A new AGI benchmark reveals that today’s most advanced AI models still fall short of artificial general intelligence.
  • The ARC-AGI-1 dataset, now five years old, continues to challenge AI systems with abstract, high-level reasoning tasks.
  • Even top AI models from OpenAI, Google, and Hugging Face showed inconsistent results.
  • Tools like Gizmo AI and Quillbot AI perform well in narrow tasks but lack AGI-level flexibility.
  • ARC-AGI-1 exposes a critical gap in how we evaluate AI intelligence across different use cases.
  • The AGI test challenges the way we measure intelligence, urging the industry to develop AI that can generalize knowledge rather than just perform well on narrow tasks.

A newly developed benchmark for artificial general intelligence (AGI) is shaking up the AI industry, putting even the most advanced AI models to the test. The evaluation, recently introduced and examined by researchers, reveals surprising weaknesses in how today’s top AI models handle reasoning, logic, and real-world problem-solving. Designed to measure higher-order cognitive ability, the test indicates that while many AI systems excel at specific tasks, they still struggle with general intelligence, the defining goal of AGI.

TechCrunch reports that the test stumped several leading models from OpenAI, Google, and Hugging Face. The results are prompting researchers to rethink how intelligence in artificial systems is evaluated. Despite high scores on standard benchmarks, these models falter on the new AGI test, which poses complex, abstract questions that require more than pattern recognition.

This performance gap underscores the current limitations of even the top AI models. Systems previously thought to be approaching AGI-level capability performed inconsistently, revealing significant gaps in understanding, adaptability, and reasoning. It’s a stark reminder that while AI tools are becoming more powerful, artificial general intelligence remains elusive.

As the AI race intensifies, businesses and developers are closely monitoring how different models perform under pressure. Digital Software Labs has followed this evolution in its recent roundup of the hottest AI models right now, showing how AI scores vary between tasks and finding that no current model has mastered AGI-level thinking. These findings reinforce the point that even highly refined systems lack the general adaptability that true AGI demands.

ARC-AGI-1 Dataset Turns Five

The ARC-AGI-1 dataset, the foundation for the new AGI test, has reached its five-year milestone and continues to challenge the very best. Originally released as the Abstraction and Reasoning Corpus, it aims to push the boundaries of AI models by testing their ability to think abstractly, create original solutions, and reason through unfamiliar problems.

Organizations across the spectrum, including OpenAI, Google, and developers of models hosted on Hugging Face, have run their models through ARC-AGI-1. Although these models achieve remarkable results on natural language tasks, they failed to perform consistently across ARC’s problem-solving metrics. This raises critical questions about whether current evaluations measure actual intelligence or merely performance optimization.

Evaluators noted that even highly capable systems, including GPT-powered platforms and other modern models, struggled to achieve high scores on this dataset. That inconsistency highlights how much more development is needed before anyone can confidently claim to have reached artificial general intelligence.

This shift in intelligence benchmarking is reshaping how we evaluate AI readiness. For example, Gizmo AI, reviewed by Digital Software Labs, excelled in specific task automation but lacked the reasoning scope ARC-AGI-1 demands. Similarly, Digital Software Labs’ review of Quillbot AI found exceptional performance in targeted rewriting but, as with many AI systems today, limitations in learning abstract rules without prior context.

Digital Software Labs continues to provide industry insight through AI reviews, such as those of Gizmo AI and Quillbot, and through coverage in its news section.

These insights demonstrate the growing gap between narrow AI functionality and the holistic intelligence that AGI aspires to achieve. As these tests gain traction, developers are being challenged to build not just smarter models but models that understand.
