

This tiny AI startup just crushed Google's Gemini 3 on a key reasoning test: here's what we know

© 2025 All rights reserved.
Tom's Guide


Amanda Caswell


Adobe Firefly image of superintelligence.
Credit: Adobe Firefly/Future AI

Since Gemini 3 made its debut, it has held the top spot on the LMArena leaderboard. This leaderboard is a crowdsourced ranking where thousands of real users compare AI models head-to-head across a wide range of tasks, voting on which response is better. But when it comes to the toughest reasoning benchmarks, there's a new kid on the block, and it has already pulled ahead of Google without training its own model.

A six-person startup known as Poetiq says it has taken the top spot on the ARC-AGI-2 semi-private test set, a notoriously difficult reasoning challenge created by AI researcher François Chollet. The startup's system scored 54 percent, edging out the roughly 45 percent Google previously reported for Gemini 3 Deep Think.

To put that in perspective, most AI models were stuck under 5 percent on this benchmark just six months ago. Cracking 50 percent is something researchers widely assumed was years away.


And the most surprising part: Poetiq's breakthrough wasn't powered by a new frontier model, but by a smarter way of orchestrating existing ones.

How Poetiq pulled this off

Leaderboard
Credit: Poetiq

Instead of building a massive transformer from scratch, Poetiq developed what it calls a meta-system: essentially an AI overseer that supervises, critiques and improves the outputs of whatever model you plug into it. For its ARC-AGI-2 work, the team used Gemini 3 Pro as the base model.

Poetiq describes the system as a tight optimization loop: generate > critique > refine > verify.

Here's what makes it stand out:

  • No retraining required: The system adapts to new models within hours

  • Built entirely on off-the-shelf LLMs: No custom fine-tuning

  • Lower cost: Google's Deep Think reportedly costs ~$77 per task; Poetiq's system ran closer to $30

  • Open source: The solver is public and inspectable

  • Self-auditing: The system evaluates its own answers before returning a final result
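
To make the generate > critique > refine > verify loop concrete, here is a minimal Python sketch. It is not Poetiq's solver: the function names, prompts, round limit and toy model are all hypothetical, and a real system would call an actual LLM and use a task-specific check.

```python
from typing import Callable, Optional

# Hypothetical stand-ins, not Poetiq's API:
Model = Callable[[str], str]      # any off-the-shelf LLM call (prompt -> text)
Verifier = Callable[[str], bool]  # a task-specific check on a candidate answer


def solve(task: str, model: Model, verify: Verifier,
          max_rounds: int = 3) -> Optional[str]:
    """Generate an answer, then critique and refine it until it verifies."""
    answer = model(f"Solve: {task}")                                # generate
    for _ in range(max_rounds):
        if verify(answer):                                          # verify
            return answer
        critique = model(f"Critique the answer '{answer}' to: {task}")  # critique
        answer = model(f"Task: {task}\nCritique: {critique}\nRevise:")  # refine
    # Self-audit one last time; return None rather than an unverified answer.
    return answer if verify(answer) else None


def toy_model(prompt: str) -> str:
    """Deterministic toy model: wrong first guess, correct after a critique."""
    if prompt.startswith("Solve:"):
        return "5"
    if prompt.startswith("Critique"):
        return "2 + 2 is not 5"
    return "4"


result = solve("2 + 2", toy_model, lambda a: a == "4")
```

In this sketch the base model is just a function argument, so swapping in a different model requires no retraining, and nothing is returned until it passes the check, which mirrors the "self-auditing" point in the list above.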

On the company website, Poetiq's team says the approach works by squeezing more reasoning power out of existing LLMs, not by scaling brute-force compute.

Why ARC-AGI-2 matters

Artificial intelligence concept image
Credit: Shutterstock

While most benchmarks measure narrow skills like coding or math, ARC-AGI-2 is designed to test something deeper: pattern recognition, analogy, abstract reasoning, and the kind of generalization humans learn in early childhood.


It's intentionally hard and famously unfriendly to today's LLMs. Even many frontier models fail spectacularly.

That's why the leap from single-digit scores to 54 percent in half a year has turned heads. It suggests progress in reasoning methods, not just raw model scale.

However, Poetiq's result applies specifically to the semi-private test set, which is not fully open to the public. The company's site says the result has been verified by the benchmark's organizers, but independent third-party replication is still pending, which is important for a benchmark this influential.

Perhaps the next breakthrough won't come from bigger models. Poetiq's work highlights a growing trend in AI: progress doesn't always require billion-dollar infrastructure or a large research lab.


If systems like this generalize beyond benchmarks to planning, coding, research or real-world decision-making, they could reshape how AI is developed. Instead of waiting for the next breakthrough model, companies might build layered intelligence that makes today's models smarter, cheaper and more consistent.

Bottom line

Poetiq has open-sourced its ARC-AGI solver so researchers can test, extend or challenge the results. The benchmark has a hidden test set, and history shows results can shift once more people run independent evaluations.

If Poetiq's numbers hold, this could mark a turning point in AI reasoning research. A six-person team may have just shown that orchestrating models can rival, or even beat, training bigger ones. Poetiq just proved you don't need a giant lab to win a round.
