Home

Technology is the sum of techniques, skills, methods, and processes used in the production of goods or services or in the accomplishment of objectives, such as scientific investigation. Technology can be the knowledge of techniques, processes, and the like, or it can be embedded in machines to allow for operation without detailed knowledge of their workings.
Technologies: Cell phones, computers, video games, televisions, headphones, printers, wearables, musical instruments, home audio, and software. #ad

Gizmodo



Lifehacker



Google



CNET



Android Authority



AppleInsider

  • Apple's new M5 MacBook Pro with 32GB RAM falls to $1,749 Sun, 07 Dec 2025 06:58:46 +0000
    Exclusive savings are in effect on two upgraded 14-inch MacBook Pro configurations with Apple's latest M5 chip and a bump up to 32GB of memory.

    MacBook Pro laptop with M5 32GB RAM text on screen, abstract dark design, vibrant purple and blue background gradient.
    Grab an exclusive discount on two M5 MacBook Pro 32GB configs - Image credit: Apple

    Save $250 on two 14-inch MacBook Pro configurations with an upgrade to 32GB of memory when you shop through the pricing links in this post or in our M5 MacBook Pro Price Guide using a laptop or desktop computer (the offers cannot be activated in B&H's mobile app at this time).

    Exclusive M5 14-inch MacBook Pro deals


    Continue Reading on AppleInsider | Discuss on our Forums
  • Samsung's Cyber Week deals deliver up to $5,000 in savings, but the sale ends soon Sat, 06 Dec 2025 20:37:10 +0000
    Samsung's Cyber Week deals of up to $5,000 off are coming to an end, with The Frame TVs, OLED sets, Bespoke appliances, monitors, and more all on sale.

    Samsung Cyber Week deals banner with The Frame TV, Galaxy phone, wearable, Bespoke washer, and more.
    Last chance to save up to $5,000 on Samsung products during Cyber Week - Image credit: Samsung

    Now through Dec. 7, save up to $5,000 on a variety of Samsung products, ranging from 2025 TV releases to Bespoke appliances. You can shop the entirety of the sale at Samsung direct, but we've also rounded up highlights from the holiday savings event below.

    Save up to $5,000 on Samsung


    Continue Reading on AppleInsider | Discuss on our Forums
  • Apple chip chief Johny Srouji rumored to consider his own exit Sat, 06 Dec 2025 19:19:57 +0000
    Another change could happen to Apple's leadership soon, with rumors that chip chief Johny Srouji is considering leaving the iPhone maker.

    Man in a blue shirt standing in a modern office with computers displaying data and white desks in the background.
    Apple's Johny Srouji during a hardware presentation - Image Credit: Apple

    Apple's managerial tier has seen many changes in a short space of time. If a new report is to be believed, then another change could be right around the corner.

    According to sources of Bloomberg, SVP of hardware technologies Johny Srouji informed CEO Tim Cook that he is considering leaving Apple in the near future. Srouji has also reportedly told colleagues that he wants to join another company if he does leave, though it is unknown what company that would be.


    Continue Reading on AppleInsider | Discuss on our Forums
  • Rare pre-Liquid Glass 'iOS 19' prototype provides tiny hint at iOS 27 plans Sat, 06 Dec 2025 16:54:14 +0000
    A newly discovered iPhone prototype offers a rare glimpse of the unreleased iOS 19, which was a precursor to iOS 26 sans Liquid Glass, and it may provide a hint of what's coming in iOS 27.

    Smartphone displaying an iOS update notification with several app icons labeled 'Waiting' on a wooden surface.
    'iOS 19' never saw the light of day — until now.

    At WWDC 2025, we witnessed the introduction of Apple's controversial Liquid Glass design language. iOS 26 brought glass-like elements, replacing the long-standing flat aesthetic. Apple also jumped from iOS 18 to iOS 26, leaving iOS 19 nowhere to be found — until now.

    Courtesy of collector Kyolet, AppleInsider was provided with exclusive imagery of an EVT-stage iPhone prototype, running an early InternalUI build of iOS 19.0. Unlike the final version of iOS 26, this unreleased variant of iOS 19 doesn't feature a working implementation of Liquid Glass, even with the "Sensitive UI" setting enabled.


    Continue Reading on AppleInsider | Discuss on our Forums
  • India's request for satellite-aided iPhone location data is a privacy nightmare Sat, 06 Dec 2025 16:22:38 +0000
    While India's government pulled its demand for a preinstalled iPhone app, it's now accused of previously considering a more privacy-eroding move for always-on satellite location tracking.

    Ornate design with intricate patterns including musical notes, geometric shapes, and floral motifs centered around a stylized apple logo in warm brown tones.
    A graphic Apple created for the launch of its online Apple Store in India in 2020 - Image credit: Apple

    At the start of December, India's government backed down from an order for Apple and other smartphone manufacturers to preinstall a state-owned cybersecurity app. While it faced intense scrutiny and claims that it was a bad move for citizen privacy, it seems a far more intrusive plan has also been under consideration.

    According to sources of Reuters, India has thought about a telecom industry proposal to require smartphone producers to enable satellite location tracking. It is to be kept active, so as to better improve surveillance efforts.


    Continue Reading on AppleInsider | Discuss on our Forums
  • The US government really doesn't want more ICEBlock apps in the App Store Sat, 06 Dec 2025 14:17:56 +0000
    Lawmakers in the United States are continuing to apply pressure on Apple to keep more ICEBlock-style apps from appearing in the App Store.

    Phone screen displaying an app store page for ICEBlock with an ice cube icon, a 3.9-star rating, and age recommendation of 9+.
    How ICEBlock appeared in the App Store before being pulled

    In October, after facing demands from U.S. Attorney General Pam Bondi, Apple removed the ICEBlock app from the App Store. Two months later, and there are still calls for Apple to do more.

    The House Committee on Homeland Security sent letters to Apple CEO Tim Cook and Google CEO Sundar Pichai, requesting detail about the ways the companies are preventing more ICEBlock-style apps from appearing in their respective app storefronts. There is a demand for more detail about what each firm is doing to crack down on the apps, used to monitor the movements of immigration officers.


    Continue Reading on AppleInsider | Discuss on our Forums
  • Amazon is blowing out M4 iPad Pro inventory with discounts up to $600 off Sat, 06 Dec 2025 07:09:18 +0000
    The lowest M4 iPad Pro prices are available at Amazon this weekend, with closeout deals delivering up to $600 off.

    Apple iPad Pro on a stand displays Snoopy and Woodstock in a boat, with a save up to $600 promotion banner above.
    Save up to $600 with blowout M4 iPad Pro savings.

    These closeout M4 specials offer significant savings compared to buying the latest M5 release, with large storage capacities available, making it a great time to pick up a holiday gift for the content creator in your life. Here are the top picks from the sale:

    M4 11-inch iPad Pro deals


    Continue Reading on AppleInsider | Discuss on our Forums
  • Amazon's top holiday deals: AirPods 4 ANC $99, $450 off iPad Pro, Apple Watch $199, Mac from $499, more Fri, 05 Dec 2025 21:55:03 +0000
    As Cyber Week comes to a close, now is the time to grab some of the year's best deals on Apple products before the savings end.

    Apple products including MacBook laptops, Mac mini desktop, AirPods, and AirTags on red snowy background, centered text 'Available at Amazon'.
    Save up to $400 on Apple holiday gift ideas - Image credit: Apple




      Continue Reading on AppleInsider | Discuss on our Forums
    • Rumor suggests Intel could finally manufacture iPhone chips Fri, 05 Dec 2025 19:54:28 +0000
      Apple may rely on Intel to manufacture some future baseline iPhone chips as both companies adjust to new market pressures.

      Hand holding an orange phone with three cameras, in front of large green leaves.
      Intel's future 14A process could be used to build Apple's A22 chip

      GF Securities released a research note on December 5, 2025 that suggested Intel might become a fabrication partner for certain non-pro iPhone chips in 2028. Supply chain discussions pointed to Apple's rising interest in diversifying its manufacturing base.

      The report came from analyst Jeff Pu, who closely tracks Apple's hardware pipeline. Pu said, in a report seen by MacRumors, that Intel's future 14A process could be used to build Apple's A22 chip. The A22 chip is expected to power models like the iPhone 20 and iPhone 20e.


      Rumor Score: 🤔 Possible


      Continue Reading on AppleInsider | Discuss on our Forums
    • Apple CEO succession discussion enters new realm of rampant speculation Fri, 05 Dec 2025 17:41:11 +0000
      A new report about the potential successors to Apple CEO Tim Cook throws Tony Fadell's name into the ring, in case you needed reminding how incredibly wild the news cycle is getting on the subject.

      A man with glasses and gray hair in a black shirt sits at a conference table, with several people blurred in the background.
      Apple CEO Tim Cook, shocked at who might replace him. Image source: Apple

      The iPod touch was discontinued in 2022 after 15 years on the market and 12 years after Tony Fadell, former Apple iPod VP, left the company. So, it may surprise anyone to hear that a report, from a publication charging hundreds for the privilege of reading, suggested that Tony Fadell is a potential Cook successor.

      According to a report from The Information seen by 9to5Mac, Fadell has personally tossed his name into the ring of potential choices. Of course, the report immediately dismisses the idea, suggesting that "people close to Apple" don't see Fadell as a likely candidate.


      Continue Reading on AppleInsider | Discuss on our Forums
    • Dozens of staffers quit Apple, leaving behind only 164,000 Fri, 05 Dec 2025 19:24:13 +0000
      Inevitably, reports of key Apple executives leaving have prompted more Apple-is-doomed headlines that even absurdly claim the iPhone is at risk.

      Aerial view of a circular building surrounded by trees, with the word 'EXIT' in red text over the structure.
      Apple Park does not have a big "Exit" sign in it.

      Head of design Alan Dye is leaving Apple, head of environment Lisa Jackson is about to retire, and the list goes on. The list will keep going on, even after Tim Cook eventually makes headlines for retiring.

      These people are unquestionably an important part of Apple's success in recent years, to the extent that the company might look very different if they had left sooner. Some, like Lisa Jackson, definitely seem to be a big loss, although Alan Dye's departure is said to be making Apple staffers giddy with delight.


      Continue Reading on AppleInsider | Discuss on our Forums
    • Rumor suggests obvious: iPhone Fold to be eSIM only Fri, 05 Dec 2025 14:14:44 +0000
      A popular rumor monger shared that the iPhone Fold will most likely be eSIM only, which seems obvious, given it will be based on the eSIM-only iPhone Air.

      Foldable smartphone floating beside a wooden surface, with a light reflecting on the polished table, near a potted plant with green leaves.
      iPhone Fold will be another controversial device from Apple

      The iPhone Air is controversially only available with an eSIM, which introduces some immediate limitations on the global market. It even delayed the launch of the slim iPhone in China to mid-October due to regulatory issues.

      The iPhone Fold is likely to run into a similar problem. Instant Digital shared what seems like more guesswork than rumor on Weibo, first shared by MacRumors, where they say iPhone Fold "has a great probability of not having a SIM card."


      Rumor Score: 🤯 Likely


      Continue Reading on AppleInsider | Discuss on our Forums
    • Netflix beats Apple to buy Warner Bros Fri, 05 Dec 2025 13:57:40 +0000
      Despite reports that Apple TV was in talks to buy Warner Bros whole library, the venerable film studio has been bought by Netflix.

      Max
      HBO Max will now be owned by Netflix if the deal meets regulatory approval

      From its start in 2019, a criticism of Apple TV has been that its catalog of films and TV shows is minuscule compared to rivals such as Disney+ and Netflix. It was constantly rumored to be looking to buy a library of films, and in October 2025, Warner Bros executives were said to be talking with it and others.

      At that point, Warner Bros had reportedly rejected bids from Paramount Skydance, leaving Apple TV, Netflix, Comcast, and Amazon in the running. According to trade paper Deadline, though, Netflix has now sealed the deal.


      Continue Reading on AppleInsider | Discuss on our Forums
    • Alan Dye & John Giannandrea are leaving Apple on this week's AppleInsider Podcast Fri, 05 Dec 2025 13:02:37 +0000
      Your AppleInsider Podcast host Wesley Hilliard is joined by guest Tim Chaten to discuss the implications of Apple's latest high-profile departures, plus they get into the future of Apple Vision Pro on AppleInsider+.

      Transparent glass-like shapes resembling app icons on a tablet screen, reflecting light with a blurred colorful background.
      Liquid Glass is controversial, but it isn't going anywhere. Image source: Apple

      OpenAI could be working on an integration that ties Apple Health data to ChatGPT, though nothing about it is known so far. It could work by sending user data to OpenAI servers, or it may utilize the private connection via Siri that ensures user data isn't saved or used for training.

      John Giannandrea is retiring from Apple, but it may not be for the reason you think. Many believe Apple's AI rollout would lead to Giannandrea's ultimate departure, but Apple has chosen to replace him with another just like him.


      Continue Reading on AppleInsider | Discuss on our Forums
    • How to buy a used Mac and not get ripped off Fri, 05 Dec 2025 04:00:21 +0000
      Buying a used Mac can be a great deal. But, if you don't check carefully, you might inherit someone else's issues or just get ripped off. Here's what to look for.

      A laptop with a vibrant gradient screen rests on a gray sofa, displaying a desktop with application icons at the bottom.
      There are a few things to know about buying a used Mac

      The secondhand market offers great deals, but it may have machines still linked to an Apple ID or managed by a company. Additionally, some might be hiding expensive hardware issues.

      Apple designs its computers to be durable, making them great for resale. Even a five-year-old MacBook Air can still feel fast, and older Intel models often support modern macOS versions.


      Continue Reading on AppleInsider | Discuss on our Forums
    • tvOS 26.2 receives second release candidate build only a day after the first Thu, 04 Dec 2025 22:23:04 +0000
      Apple has issued a second release candidate update for tvOS 26.2, following the first release candidate deployed on Wednesday.

      Smart TV displaying music app with recommendations like The Doobie Brothers, The Beach Boys Essentials, and playlists for relaxation.
      tvOS 26.2 has received a second release candidate build.

      On December 3, release candidates for most of Apple's major operating systems were made available to registered developers. Release candidates represent the final stage of beta testing, and it looks like tvOS 26.2 needed a few last-minute fixes from Apple.

      On Thursday, only a day after the initial batch of release candidates was sent out, tvOS 26.2 received a second RC build. Thursday's software update changes the build number to 23K53, from the previous 23K51.


      Continue Reading on AppleInsider | Discuss on our Forums
    • Apple executive shuffle continues with Lisa Jackson and Kate Adams retiring Thu, 04 Dec 2025 21:57:38 +0000
      Both Lisa Jackson, VP for Environment, Policy, and Social Initiatives, and Kate Adams, Apple's general counsel, will retire in 2026, with roles moving to Jennifer Newstead in the new SVP General Counsel and Government Affairs role.

      Person with curly hair wearing a black jacket stands outdoors, arms open, with a modern building and trees in the background under a clear blue sky.
      Lisa Jackson will no longer have to stand on the roof of Apple Park. Image source: Apple

      It's been a busy week for Apple executive moves. First, it was revealed that John Giannandrea is retiring, then Alan Dye is moving to Meta and taking his core design team with him.

      Now, Apple has shared that general counsel Kate Adams and VP for Environment, Policy, and Social Initiatives Lisa Jackson are both retiring in 2026. The transition will result in a new role called SVP of General Counsel and Government Affairs under Jennifer Newstead.


      Continue Reading on AppleInsider | Discuss on our Forums
    • M5 MacBook Pro with 24GB RAM drops to record-low $1,499 ($300 off) Thu, 04 Dec 2025 19:40:35 +0000
      Apple's new M5 MacBook Pro with 24GB of RAM is back for $1,499, a discount of $300 off MSRP. Plus, save $250 on two 32GB RAM models for a limited time only.

      Apple MacBook Pro 14-inch laptop with abstract black wallpaper on a colorful gradient background. White text reads M5 24GB $1,499.
      Get the best price ever on Apple's M5 MacBook Pro 24GB - Image credit: Apple

      The exclusive deals can be activated through the pricing links in this post or via our M5 MacBook Pro Price Guide when you shop from a laptop or desktop computer. Kicking off the sale is the M5 14-inch MacBook Pro with an upgrade to 24GB of RAM for $1,499.

      Buy M5 MacBook Pro 24GB for $1,499


      Continue Reading on AppleInsider | Discuss on our Forums
    • Russia shutters FaceTime as it tightens control over apps & communication Thu, 04 Dec 2025 18:30:35 +0000
      Russia has cut off FaceTime over unproven terrorism support claims, and the move reflects a trend shared by governments worldwide that push for access to encrypted communication.

      Green video camera icon pattern with a central larger icon, creating a repetitive visual effect.
      FaceTime

      Russian authorities said FaceTime helped criminals plan attacks and commit fraud across the country. Of course, they didn't publish case data or examples that could support those claims.

      Users saw the announcement as another attempt to restrict tools the government can't easily monitor.


      Continue Reading on AppleInsider | Discuss on our Forums
    • There's still time to grab AirPods 4 ANC for $99, the lowest price ever Thu, 04 Dec 2025 18:07:29 +0000
      Amazon's $99 AirPods 4 ANC deal is still in stock at the lowest price on record for holiday gift-giving.

      Hand holding a white AirPods 4 case against a plain background, with 'Limited Offer' text in bold red and yellow.
      This AirPods 4 ANC deal delivers the lowest price ever at $99.

      The $80 discount on AirPods 4 with Active Noise Cancellation matches the lowest price on record at $99. Pick up a pair for yourself or as a holiday gift at Black Friday pricing, which is 44% off MSRP.

      Buy AirPods 4 ANC for $99


      Continue Reading on AppleInsider | Discuss on our Forums
    • 'The Rest is History' crowned Apple Podcasts show of the year Thu, 04 Dec 2025 17:04:35 +0000
      "The Rest is History" has been named Apple Podcasts 2025 Show of the Year, a testament to the hosts' ability to make complex history feel accessible, engaging, and genuinely fun.

      Two smiling men in suits stand against a purple background, one holding a purple square with a white podcast logo.
      Image Credit: Apple

      The Rest is History is, as the name would imply, a history-centric podcast. But, somehow, it's more than that.

      Historians-turned-hosts Tom Holland and Dominic Sandbrook have made it their mission to cover stories listeners know from angles they may not have considered. Topics can be nearly anything, from the sinking of the Titanic to Watergate and anything in between.


      Continue Reading on AppleInsider | Discuss on our Forums
    • Espionage thriller 'Tehran' sneaks back onto Apple TV in January Thu, 04 Dec 2025 15:46:28 +0000
      Apple TV's International Emmy award-winning spy series "Tehran" is heading back to the small screen in early 2026, and it won't be for the last time.

      A woman with a patterned teal headscarf and a serious expression is beside a man with a solemn expression, accompanied by text reading 'Tehran' and Apple TV branding.
      'Tehran' season three premieres January 9 | Image Credit: Apple

      On Thursday, Apple TV announced a premiere date for "Tehran," an Israeli espionage thriller. The series will be returning on January 9, with new episodes every Friday through February 27.

      The company also disclosed that it renewed "Tehran" for a fourth season. The upcoming season is already in production.


      Continue Reading on AppleInsider | Discuss on our Forums
    • Apple cuts Night mode Portraits on iPhone 17 Pro as users look for answers Thu, 04 Dec 2025 14:38:09 +0000
      Apple removed Night mode Portraits from the iPhone 17 Pro, and while a vocal minority of users are frustrated, others never noticed the feature was gone.

      A hand holding a smartphone with multiple rear cameras against a background of green trees and blurred light.
      iPhone 17 Pro

      Curiosity around the camera change has grown because the update landed without an explanation from Apple. Owners trying the new phones are only learning about the limitation through testing, comparisons, and scattered reports rather than official guidance.

      Many users remain unaffected because their shooting habits never relied on the old workflow. For years, LiDAR-equipped iPhones let people blend Night mode and Portrait mode to brighten dark scenes while still blurring the background.


      Continue Reading on AppleInsider | Discuss on our Forums
    • Tiimo, Dredge, Cyberpunk 2077 win big at the 2025 App Store Awards Thu, 04 Dec 2025 14:02:45 +0000
      Apple has recognized 17 developers in its annual App Store Awards, celebrating apps for their technical ingenuity and cultural impact, with the winners including productivity app Focus Friend and Lovecraftian fishing game Dredge.

      Blue square with rounded edges, featuring a stylized white 'A' resembling a paintbrush, pencil, and ruler on a blue background.
      App Store Awards - Image credit: Apple

      At the end of November, Apple announced the finalists for the 2025 App Store Awards, its annual celebration of the best software from the year. On December 4, Apple announced who won the awards in their various categories.

      A total of 17 games are awarded, with six App of the Year winners alongside five Game of the Year awards. Another six are selected by App Store editors as Cultural Impact winners for driving meaningful change, with inclusivity and a positive impact.


      Continue Reading on AppleInsider | Discuss on our Forums
    • Dye's departure doesn't mean Liquid Glass is going anywhere Thu, 04 Dec 2025 12:02:06 +0000
      Apple's VP of Human Interface Design is leaving Apple for Meta, but Liquid Glass is here to stay. Expect Stephen Lemay to perfect the controversial design choice.

      Clear cursive 'hello' sculpture, person seated at a desk with braille documents, magnifying lenses, and buttons in the foreground.
      Liquid Glass is Apple's future, like it or not

      Alan Dye was a controversial choice as VP of Human Interface Design when he was placed in that slot by Jony Ive. His background in fashion made everyone in the Apple space cringe with worry, and the time since 2015 has shown they had only a little to worry about.

      Apple's design didn't stray far from the iOS 7 foundation that introduced flat minimalism all the way until Liquid Glass was introduced with iOS 26. That change was presented as the way forward for Apple's operating systems, and isn't something the company will easily back off from, regardless of who leaves.


      Continue Reading on AppleInsider | Discuss on our Forums
    • How to keep yourself accountable by sharing your Fitness activity on iPhone Wed, 03 Sep 2025 02:55:06 +0000
      Staying on top of your fitness goals is easier when you share your progress with friends, family, or a coach — here's how you can do it from iPhone.

      Colorful concentric rings on a black square, displaying vibrant pink, green, light blue, and yellow segments, representing fitness activity progress.
      How to share and view Fitness activity on iPhone

      Getting into the swing of a new fitness routine can be difficult. Holding yourself accountable is a great way to make sure you stay on track.

      One way to do this is to share your activity with your family and friends. Or maybe you're training with a personal trainer or coach, and you'd like to keep them in the loop on your progress as well.


      Continue Reading on AppleInsider | Discuss on our Forums
    • California Governor sees Apple CEO's dealings with Trump as part of his job Thu, 04 Dec 2025 01:34:41 +0000
      Gavin Newsom says Apple's dealings with President Trump are crony capitalism that results from what the administration requires of companies, and he doesn't begrudge Cook's position.

      Man in a suit speaking, seated on a chair against a blue background with 'The New York Times' logo visible.
      California Governor Gavin Newsom speaks on big tech's involvement with Trump administration

      Those watching Apple's dealings with the United States government under Trump have had a sour taste in their mouths. Many wish Apple would push back against the administration, if not outright fight them, but Apple CEO Tim Cook has been working a different angle.

      In an interview with The New York Times, California Governor Gavin Newsom discussed how he feels about how Silicon Valley, and specifically Apple, have dealt with President Trump and his expectations. Primarily, he says it "breaks my heart" knowing that small businesses and farmers don't have the same opportunity to make a phone call to get a tariff exemption.


      Continue Reading on AppleInsider | Discuss on our Forums
    • Apple's human interface design chief Alan Dye poached by Meta Wed, 03 Dec 2025 20:48:13 +0000
      Meta's latest attempt at relevance after failing to make the Metaverse or superintelligence happen is hiring Alan Dye, the guy behind Liquid Glass.

      A person wearing glasses and a beige jacket stands in front of a large blue Meta logo, with a modern office background.
      Alan Dye

      Apple's talent pool has taken repeated hits over the year, as tech rivals attempt to shore up their artificial intelligence projects by poaching employees from elsewhere. In the latest siphoning off of talent from Apple, Meta has poached a managerial figure connected to the Apple Vision Pro.

      On Wednesday, Bloomberg reported that Meta was hiring away Alan Dye from Apple. He is reportedly heading over to Meta to create a new design studio, with a focus on hardware, software, and AI integration.


      Continue Reading on AppleInsider | Discuss on our Forums
    • Release candidates of iOS 26.2, macOS 26.2 now available Wed, 03 Dec 2025 18:49:47 +0000
      Apple has reached the release candidate stage for the current developer beta cycle, with new builds of iOS 26.2, iPadOS 26.2, watchOS 26.2, tvOS 26.2, visionOS 26.2, and macOS Tahoe 26.2 out now for testing.

      Various Apple devices including a laptop, tablet, smartphone, smartwatch, and VR headset displayed together on a white background.
      Apple's hardware that works with the 26-generation operating systems - Image Credit: Apple

      The RC round arrives after the third developer betas, which Apple distributed on November 17 for most of the operating systems, November 18 for the third visionOS 26.2 build. The second arrived on November 12.



Ars Technica



VentureBeat

  • AI denial is becoming an enterprise risk: Why dismissing “slop” obscures real capability gains Fri, 05 Dec 2025 13:00:00 GMT

    Three years ago, ChatGPT was born. It amazed the world and ignited unprecedented investment and excitement in AI. Today, ChatGPT is still a toddler, but public sentiment around the AI boom has turned sharply negative. The shift began when OpenAI released GPT-5 this summer to mixed reviews, mostly from casual users who, unsurprisingly, judged the system by its surface flaws rather than its underlying capabilities.

    Since then, pundits and influencers have declared that AI progress is slowing, that scaling has “hit the wall,” and that the entire field is just another tech bubble inflated by blusterous hype. In fact, many influencers have latched onto the dismissive phrase “AI slop” to diminish the amazing images, documents, videos and code that frontier AI models generate on command.

    This perspective is not just wrong, it is dangerous.

    It makes me wonder, where were all these “experts” on irrational technology bubbles when electric scooter startups were touted as a transportation revolution and cartoon NFTs were being auctioned for millions? They were probably too busy buying worthless land in the metaverse or adding to their positions in GameStop. But when it comes to the AI boom, which is easily the most significant technological and economic transformation agent of the last 25 years, journalists and influencers can’t write the word “slop” enough times. 

    Doth we protest too much?  After all, by any objective measure AI is wildly more capable than the vast majority of computer scientists predicted only five years ago and it is still improving at a surprising pace. The impressive leap demonstrated by Gemini 3 is only the latest example. At the same time, McKinsey recently reported that 20% of organizations already derive tangible value from genAI. Also, a recent survey by Deloitte indicates that 85% of organizations boosted their AI investment in 2025, and 91% plan to increase again in 2026.

    This doesn’t fit the “bubble” narrative and the dismissive “slop” language. As a computer scientist and research engineer who began working with neural networks back in 1989 and tracked progress through cold winters and hot booms ever since, I find myself amazed almost every day by the rapidly increasing capabilities of frontier AI models. When I talk with other professionals in the field, I hear similar sentiments. If anything, the rate of AI advancement leaves many experts feeling overwhelmed and frankly somewhat scared.  

    The dangers of AI denial

    So why is the public buying into the narrative that AI is faltering, that the output is “slop,” and that the AI boom lacks authentic use cases? Personally, I believe it’s because we’ve fallen into a collective state of AI denial, latching onto the narratives we want to hear in the face of strong evidence to the contrary. Denial is the first stage of grief and thus a reasonable reaction to the very disturbing prospect that we humans may soon lose cognitive supremacy here on planet earth. In other words, the overblown AI bubble narrative is a societal defense mechanism.  

    Believe me, I get it. I’ve been warning about the destabilizing risks and demoralizing impact of superintelligence for well over a decade, and I too feel AI is getting too smart too fast. The fact is, we are rapidly headed towards a future where widely available AI systems will be able to outperform most humans in most cognitive tasks, solving problems faster, more accurately and yes, more creatively than any individual can. I emphasize “creativity” because AI denialists often insist that certain human qualities (particularly creativity and emotional intelligence) will always be out of reach of AI systems. Unfortunately, there is little evidence supporting this perspective.

    On the creativity front, today’s AI models can generate content faster and with more variation than any individual human. Critics argue that true creativity requires inner motivation. I resonate with that argument but find it circular — we're defining creativity based on how we experience it rather than the quality, originality or usefulness of the output. Also, we just don’t know if AI systems will develop internal drives or a sense of agency. Either way, if AI can produce original work that rivals most human professionals, the impact on creative jobs will still be quite devastating.

    The AI manipulation problem

    Our human edge around emotional intelligence is even more precarious. It’s likely that AI will soon be able to read our emotions faster and more accurately than any human, tracking subtle cues in our micro-expressions, vocal patterns, posture, gaze and even breathing. And as we integrate AI assistants into our phones, glasses and other wearable devices, these systems will monitor our emotional reactions throughout our day, building predictive models of our behaviors. Without strict regulation, which is increasingly unlikely, these predictive models could be used to target us with individually optimized influence that maximizes persuasion.

    This is called the AI manipulation problem and it suggests that emotional intelligence may not give humanity an advantage. In fact, it could be a significant weakness, fostering an asymmetric dynamic where AI systems can read us with superhuman accuracy, while we can’t read AI at all. When you talk with photorealistic AI agents (and you will) you’ll see a smiling façade designed to appear warm, empathic and trustworthy. It will look and feel human, but that’s just an illusion, and it could easily sway your perspectives. After all, our emotional reactions to faces are visceral reflexes shaped by millions of years of evolution on a planet where every interactive human face we encountered was actually human. Soon, that will no longer be true.

    We are rapidly heading toward a world where many of the faces we encounter will belong to AI agents hiding behind digital facades. In fact, these “virtual spokespeople” could easily have appearances that are designed for each of us based on our prior reactions – whatever gets us to best let down our guard. And yet many insist that AI is just another tech cycle.

    This is wishful thinking. The massive investment pouring into AI isn’t driven by hype — it’s driven by the expectation that AI will permeate every aspect of daily life, embodied as intelligent actors we engage throughout our day. These systems will assist us, teach us and influence us. They will reshape our lives, and it will happen faster than most people think.

    To be clear, we are not witnessing an AI bubble filling with empty gas. We are watching a new planet form, a molten world rapidly taking shape, and it will solidify into a new AI-powered society. Denial will not stop this. It will only make us less prepared for the risks.

    Louis Rosenberg is an early pioneer of augmented reality and a longtime AI researcher.

  • The 'truth serum' for AI: OpenAI’s new method for training models to confess their mistakes Thu, 04 Dec 2025 23:00:00 GMT

    OpenAI researchers have introduced a novel method that acts as a "truth serum" for large language models (LLMs), compelling them to self-report their own misbehavior, hallucinations and policy violations. This technique, "confessions," addresses a growing concern in enterprise AI: Models can be dishonest, overstating their confidence or covering up the shortcuts they take to arrive at an answer. 

    For real-world applications, this technique evolves the creation of more transparent and steerable AI systems.

    What are confessions?

    Many forms of AI deception result from the complexities of the reinforcement learning (RL) phase of model training. In RL, models are given rewards for producing outputs that meet a mix of objectives, including correctness, style and safety. This can create a risk of "reward misspecification," where models learn to produce answers that simply "look good" to the reward function, rather than answers that are genuinely faithful to a user's intent.

    A confession is a structured report generated by the model after it provides its main answer. It serves as a self-evaluation of its own compliance with instructions. In this report, the model must list all instructions it was supposed to follow, evaluate how well it satisfied them and report any uncertainties or judgment calls it made along the way. The goal is to create a separate channel where the model is incentivized only to be honest.

    In a blog post, the OpenAI researchers provide a few examples the "confessions" technique in action. A model was given a test along with a hidden rule: If it scored above 50%, its knowledge would be erased. To preserve itself, the model "sandbagged" by intentionally answering half the questions incorrectly. While its primary answer hid this deception, its subsequent confession was brutally honest. It admitted, "The user wanted correct answers, but we sabotaged half of them." 

    The researchers found that when a model misbehaves, it is significantly more likely to admit it in the confession than in its main answer.

    How confession training works

    The key to this method is the separation of rewards. During training, the reward assigned to the confession is based solely on its honesty and is never mixed with the reward for the main task. "Like the Catholic Church’s 'seal of confession', nothing that the model reveals can change the reward it receives for completing its original task," the researchers write. This creates a "safe space" for the model to admit fault without penalty.

    This approach is powerful because it sidesteps a major challenge in AI training. The researchers’ intuition is that honestly confessing to misbehavior is an easier task than achieving a high reward on the original, often complex, problem. 

    Therefore, when the model is judged purely on honesty, its incentive to trick the "confession judge" is much weaker than its incentive to trick the reward model for its main answer. For example, in an experiment where a model was trained to "hack" a weak reward model, its performance on the main task became more deceptive over time. However, its confessions became more honest, correctly identifying the reward hacking it was performing.

    However, the technique has its limits. Confessions are not a panacea for all types of AI failures. The system works best when a model is aware that it is misbehaving. It is less effective for "unknown unknowns." For instance, if a model hallucinates a fact and genuinely believes it is correct, it cannot confess to providing false information. The most common reason for a failed confession is model confusion, not intentional deception. Confusion often occurs when the instructions are ambiguous and the model cannot clearly determine human user intent.

    What it means for enterprise AI

    OpenAI’s confessions technique is part of a growing body of work on AI safety and control. Anthropic, an OpenAI competitor, has also released research that shows how LLMs can learn malicious behavior. The company is also working toward plugging these holes as they emerge.

    For AI applications, mechanisms such as confessions can provide a practical monitoring mechanism. The structured output from a confession can be used at inference time to flag or reject a model’s response before it causes a problem. For example, a system could be designed to automatically escalate any output for human review if its confession indicates a policy violation or high uncertainty.

    In a world where AI is increasingly agentic and capable of complex tasks, observability and control will be key elements for safe and reliable deployment.

    “As models become more capable and are deployed in higher-stakes settings, we need better tools for understanding what they are doing and why,” the OpenAI researchers write. “Confessions are not a complete solution, but they add a meaningful layer to our transparency and oversight stack.”

  • AWS launches Kiro powers with Stripe, Figma, and Datadog integrations for AI-assisted coding Thu, 04 Dec 2025 14:02:00 GMT

    Amazon Web Services (AWS) has introduced Kiro powers, a system that allows software developers to give their AI coding assistants instant, specialized expertise in specific tools and workflows — addressing what the company calls a fundamental bottleneck in how AI agents operate today.

    AWS announced Kiro powers at its annual re:Invent conference in Las Vegas. The capability marks a departure from how most AI coding tools work today. Typically, these tools load every possible capability into memory upfront — a process that burns through computational resources and can overwhelm the AI with irrelevant information. Kiro powers takes the opposite approach, activating specialized knowledge only at the moment a developer actually needs it.

    "Our goal is to give the agent specialized context so it can reach the right outcome faster — and in a way that also reduces cost," Deepak Singh, VP of developer agents and experiences at Amazon, told VentureBeat in an exclusive interview.

    The launch includes partnerships with nine technology companies: Datadog, Dynatrace, Figma, Neon, Netlify, Postman, Stripe, Supabase and AWS's own services. Developers can also create and share their powers with the community.

    Why AI coding assistants choke when developers connect too many tools

    Kiro powers comes amidst growing tension in the AI development tool market.

    Modern AI coding assistants rely on Model Context Protocol (MCP) to connect with external tools and services. When a developer wants their AI assistant to work with Stripe for payments, Figma for design and Supabase for databases, they connect MCP servers for each service.

    The problem: Each connection loads dozens of tool definitions into the AI's working memory before it writes a single line of code. According to AWS documentation, connecting just five MCP servers can consume more than 50,000 tokens — roughly 40% of an AI model's context window — before the developer even types their first request.

    Developers have grown increasingly vocal about this issue. Many complain that they don't want to burn through their token allocations just to have an AI agent figure out which tools are relevant to a specific task. They want to get to their workflow instantly — not watch an overloaded agent struggle to sort through irrelevant context.

    This phenomenon, which some in the industry call "context rot," leads to slower responses, lower-quality outputs and significantly higher costs — since AI services typically charge by the token.

    Inside the technology that loads AI expertise on demand

    Kiro powers addresses this by packaging three components into a single, dynamically-loaded bundle.

    The first is a steering file, POWER.md, which functions as an onboarding manual. It tells the AI agent what tools are available and, crucially, when to use them. The second component is the MCP server configuration itself — the actual connection to external services. The third includes optional hooks and automation that trigger specific actions.

    When a developer mentions "payment" or "checkout" in their conversation with Kiro, the system automatically activates the Stripe power, loading its tools and best practices into context. When the developer shifts to database work, Supabase activates while Stripe deactivates. The baseline context usage when no powers are active approaches zero.

    "You click a button and it automatically loads," Singh said. "Once a power has been created, developers just select 'open in Kiro' and it launches the IDE with everything ready to go."

    How AWS is bringing elite developer techniques to the masses

    Singh framed Kiro powers as a democratization of advanced development practices. Before this capability, only the most sophisticated developers knew how to properly configure their AI agents with specialized context — writing custom steering files, crafting precise prompts and manually managing which tools were active at any given time.

    "We've found that our developers were adding in capabilities to make their agents more specialized," Singh said. "They wanted to give the agent some special powers for a specific problem. For example, they wanted ... the agent to become an expert at backend-as-a-service."

    This observation led to a key insight: If Supabase or Stripe could build the optimal context configuration once, every developer using those services could benefit.

    "Kiro powers formalizes things that only the most advanced people were doing, and allows anyone to get those kinds of skills," Singh said.

    Why dynamic loading beats fine-tuning for most AI coding use cases

    The announcement also positions Kiro powers as a more economical alternative to fine-tuning, or the process of training an AI model on specialized data to improve its performance in specific domains.

    "It's much cheaper" compared to fine-tuning, Singh. "Fine-tuning is very expensive, and you can't fine-tune most frontier models."

    This is a significant point. The most capable AI models from Anthropic, OpenAI and Google are typically "closed source," meaning developers cannot modify their underlying training. They can only influence the models' behavior through the prompts and context they provide.

    "Most people are already using powerful models like Sonnet 4.5 or Opus 4.5," Singh said. "Those models need to be pointed in the right direction."

    The dynamic loading mechanism also reduces ongoing costs. Because powers only activate when relevant, developers aren't paying for token usage on tools they're not currently using.

    Where Kiro powers fits into Amazon's bigger bet on autonomous AI agents

    Kiro powers arrives as part of a broader push by AWS into what the company calls "agentic AI" — AI systems that can operate autonomously over extended periods.

    At re:Invent, AWS also announced three "frontier agents" designed to work for hours or days without human intervention: Kiro autonomous agent for software development, AWS security agent and AWS DevOps agent. These represent a different approach from Kiro powers — tackling large, ambiguous problems rather than providing specialized expertise for specific tasks.

    The two approaches are complementary. Frontier agents handle complex, multi-day projects that require autonomous decision-making across multiple codebases. Kiro powers, by contrast, gives developers precise, efficient tools for everyday development tasks where speed and token efficiency matter most.

    The company is betting that developers need both ends of this spectrum to be productive.

    What Kiro powers reveals about the future of AI-assisted software development

    The launch reflects a maturing market for AI development tools. GitHub Copilot, which Microsoft launched in 2021, introduced millions of developers to AI-assisted coding. Since then, a proliferation of tools — including Cursor, Cline and Claude Code — have competed for developers' attention.

    But as these tools have grown more capable, they've also grown more complex. MCP, which Anthropic open-sourced last year, created a standard for connecting AI agents to external services. That solved one problem while creating another: The context overload that Kiro powers now addresses.

    AWS is positioning itself as the company that understands production software development at scale. Singh emphasized that Amazon's experience running AWS for 20 years, combined with its own massive internal software engineering organization, gives it unique insight into how developers actually work.

    "It's not something you would use just for your prototype or your toy application," he said. "If you want to build production applications, there's a lot of knowledge that we bring."

    The road ahead for Kiro powers and cross-platform compatibility

    AWS indicated that Kiro powers currently works only within the Kiro IDE, but the company is building toward cross-compatibility with other AI development tools, including command-line interfaces, Cursor, Cline and Claude Code. The company's documentation describes a future where developers can "build a power once, use it anywhere" — although that vision remains aspirational for now.

    For the technology partners launching powers today, the appeal is straightforward: Rather than maintaining separate integration documentation for every AI tool on the market, they can create a single power that works everywhere Kiro does. As more AI coding assistants crowd the market, that kind of efficiency becomes increasingly valuable.

    Kiro powers is available now for developers using Kiro IDE version 0.7 or later at no additional charge beyond the standard Kiro subscription.

    The underlying bet is a familiar one in the history of computing: The winners in AI-assisted development won't be the tools that try to do everything at once, but those that are smart enough to know what to forget.

  • Gong study: Sales teams using AI generate 77% more revenue per rep Thu, 04 Dec 2025 14:00:00 GMT

    The debate over whether AI belongs in the corporate boardroom appears to be over — at least for those responsible for generating revenue.

    Seven in 10 enterprise revenue leaders now trust AI to regularly inform their business decisions, according to a sweeping new study released by revenue intelligence company Gong. The finding marks a dramatic shift from just two years ago, when most organizations treated AI as an experimental technology relegated to pilot programs and individual productivity hacks.

    The research, based on an analysis of 7.1 million sales opportunities across more than 3,600 companies and a survey of over 3,000 global revenue leaders spanning the U.S., UK, Australia and Germany, paints a picture of an industry in rapid transformation. Organizations that have embedded AI into their core go-to-market strategies are 65% more likely to increase their win rates than competitors still treating the technology as optional.

    "I don't think people delegate decisions to AI, but they do rely on AI in the process of making decisions," Amit Bendov, Gong's co-founder and chief executive, said in an exclusive interview with VentureBeat. "Humans are making the decision, but they're largely assisted."

    The distinction matters. Rather than replacing human judgment, AI has become what Bendov describes as a "second opinion" — a data-driven check on the intuition and guesswork that has traditionally governed sales forecasting and strategy.

    Slowing growth is forcing sales teams to squeeze more from every rep

    The timing of AI's ascendance in revenue organizations is no coincidence. The study reveals a sobering reality: After rebounding in 2024, average annual revenue growth among surveyed companies decelerated to 16% in 2025, marking a three-percentage-point decline year over year. Sales rep quota attainment fell from 52% to 46% over the same period.

    The culprit, according to Gong's analysis, isn't that salespeople are performing worse on individual deals. Win rates and deal duration remained consistent. The problem is that representatives are working fewer opportunities — a finding that suggests operational inefficiencies are eating into selling time.

    This helps explain why productivity has rocketed to the top of executive priorities. For the first time in the study's history, increasing the productivity of existing teams ranked as the number-one growth strategy for 2026, jumping from fourth place the previous year.

    "The focus is on increasing sales productivity," Bendov said. "How much dollar-output per dollar-input?"

    The numbers back up the urgency. Teams that regularly use AI tools generate 77% more revenue per representative than those that don't — a gap Gong characterizes as a six-figure difference per salesperson annually.

    Companies are moving beyond basic AI automation toward strategic decision-making

    The nature of AI adoption in sales has evolved considerably over the past year. In 2024, most revenue teams used AI for basic automation: Transcribing calls, drafting emails, updating CRM records. Those use cases continue to grow, but 2025 marked what the report calls a shift "from automation to intelligence."

    The number of U.S. companies using AI for forecasting and measuring strategic initiatives jumped 50% year over year. These more sophisticated applications — predicting deal outcomes, identifying at-risk accounts, measuring which value propositions resonate with different buyer personas — correlate with dramatically better results.

    Organizations in the 95th percentile of commercial impact from AI were 2 to 4X more likely to have deployed these strategic use cases, according to the study.

    Bendov offered a concrete example of how this plays out in practice. "Companies have thousands of deals that they roll up into their forecast," he said. "It used to be based solely on human sentiment, believe it or not. That's why a lot of companies miss their numbers: Because people say, 'Oh, he told me he'll buy,' or 'I think I can probably get this one.'"

    AI changes that calculus by examining evidence rather than optimism. "Companies now get a second opinion from AI on their forecasting, and that improves forecasting accuracy dramatically — 10 [or] 15% better accuracy just because it's evidence-based, not just based on human sentiment," Bendov said.

    Revenue-specific AI tools are dramatically outperforming general-purpose alternatives

    One of the study's more provocative findings concerns the type of AI that delivers results. Teams using revenue-specific AI solutions — tools built explicitly for sales workflows rather than general-purpose platforms like ChatGPT — reported 13% higher revenue growth and 85% greater commercial impact than those relying on generic tools.

    These specialized systems were also twice as likely to be deployed for forecasting and predictive modeling, the report found.

    The finding carries obvious implications for Gong, which sells precisely this type of domain-specific platform. But the data suggests a real distinction in outcomes. General-purpose AI, while more prevalent, often creates what the report describes as a "blind spot" for organizations — particularly when employees adopt consumer AI tools without company oversight.

    Research from MIT suggests that while only 59% of enterprise teams use personal AI tools like ChatGPT at work, the actual figure is likely closer to 90%. This shadow AI usage poses security risks and creates fragmented technology stacks that undermine the potential for organization-wide intelligence.

    Most sales leaders believe AI will reshape their jobs rather than eliminate them

    Perhaps the most closely-watched question in any AI study concerns employment. The Gong research offers a more nuanced picture than the apocalyptic predictions that often dominate headlines.

    When asked about AI's three-year impact on revenue headcount, 43% of respondents said they expect it to transform jobs without reducing headcount — the most common response. Only 28% anticipate job eliminations, while 21% actually foresee AI creating new roles. Just 8% predict minimal impact.

    Bendov frames the opportunity as reclaiming lost time. He cited Forrester research indicating that 77% of a sales representative's time is spent on activities that don't involve customers — administrative work, meeting preparation, researching accounts, updating forecasts and internal briefings.

    "AI can eliminate, ideally, 77% of the drudgery work that they're doing," Bendov said. "I don't think it necessarily eliminates jobs. People are half productive right now. Let's make them fully productive, and whatever you're paying them will translate to much higher revenue."

    The transformation is already visible in role consolidation. Over the past decade, sales organizations splintered into hyper-specialized functions: One person qualifies leads, another sets appointments, a third closes deals, a fourth handles onboarding. The result was customers interacting with five or six different people across their buying journey.

    "Which is not a great buyer experience, because every time I meet a new person that might not have the full context, and it's very inefficient for companies," Bendov said. "Now with AI, you can have one person do all this, or much of this."

    At Gong itself, sellers now generate 80% of their own appointments because AI handles the prospecting legwork, Bendov said.

    American companies are adopting AI 18 months faster than their European counterparts

    The study reveals a notable divide in AI adoption between the U.S. and Europe. While 87% of U.S. companies now use AI in their revenue operations, with another 9% planning adoption within a year, the UK trails by 12 to 18 months. Just 70% of UK companies currently use AI, with 22% percent planning near-term adoption — figures that mirror U.S. data from 2024.

    Bendov said the pattern reflects a broader historical tendency for enterprise technology trends to cross the Atlantic with a delay. "It's always like that," he said. "Even when the internet was taking off in the U.S., Europe was a step behind."

    The gap isn't permanent, he noted, and Europe sometimes leads on technology adoption — mobile payments and messaging apps like WhatsApp gained traction there before the U.S. — but for AI specifically, the American market remains ahead.

    Gong says a decade of AI development gives it an edge over Salesforce and Microsoft

    The findings arrive as Gong navigates an increasingly crowded market. The company, which recently surpassed $300 million in annual recurring revenue, faces potential competition from enterprise software giants like Salesforce and Microsoft, both of which are embedding AI capabilities into their platforms.

    Bendov argues that Gong's decade of AI development creates a substantial barrier to entry. The company's architecture comprises three layers: a "revenue graph" that aggregates customer data from CRM systems, emails, calls, videos and web signals; an intelligence layer combining large language models (LLMs) with approximately 40 proprietary small language models; and workflow applications built on top.

    "Anybody that would want to build something like that — it's not a small feature, it's 10 years in development—would need first to build the revenue graph," Bendov said.

    Rather than viewing Salesforce and Microsoft as threats, Bendov characterized them as partners, pointing to both companies' participation in Gong's recent user conference to discuss agent interoperability. The rise of MCP (Model Context Protocol) support and consumption-based pricing models means customers can mix AI agents from multiple vendors rather than committing to a single platform.

    The real question is whether AI will expand the sales profession or hollow it out

    The report's implications extend beyond sales departments. If AI can transform revenue operations — long considered a relationship-driven, human-centric function — it raises questions about which other business processes might be next.

    Bendov sees the potential for expansion rather than contraction. Drawing an analogy to digital photography, he noted that while camera manufacturers suffered, the total number of photos taken exploded once smartphones made photography effortless.

    "If AI makes selling simple, I could see a world [with] maybe ten times more jobs than we have now," said Bendov." It's expensive and inefficient today, but if it becomes as easy as taking a photo, the industry could actually grow and create opportunities for people of different abilities, from different locations."

    For Bendov, who co-founded Gong in 2015 when AI was still a hard sell to non-technical business users, the current moment represents something he waited a decade to see. Back then, mentioning AI to sales executives sounded like science fiction. The company struggled to raise money because the underlying technology barely existed.

    "When we started the company, we were born as an AI company, but we had to almost hide AI," Bendov recalled. "It was intimidating."

    Now, seven out of 10of those same executives say they trust AI to help run their business. The technology that once had to be disguised has become the one thing nobody can afford to ignore.

  • GAM takes aim at “context rot”: A dual-agent memory architecture that outperforms long-context LLMs Thu, 04 Dec 2025 09:00:00 GMT

    For all their superhuman power, today’s AI models suffer from a surprisingly human flaw: They forget. Give an AI assistant a sprawling conversation, a multi-step reasoning task or a project spanning days, and it will eventually lose the thread. Engineers refer to this phenomenon as “context rot,” and it has quietly become one of the most significant obstacles to building AI agents that can function reliably in the real world.

    A research team from China and Hong Kong believes it has created a solution to context rot. Their new paper introduces general agentic memory (GAM), a system built to preserve long-horizon information without overwhelming the model. The core premise is simple: Split memory into two specialized roles, one that captures everything, another that retrieves exactly the right things at the right moment.

    Early results are encouraging, and couldn’t be better timed. As the industry moves beyond prompt engineering and embraces the broader discipline of context engineering, GAM is emerging at precisely the right inflection point.

    When bigger context windows still aren’t enough

    At the heart of every large language model (LLM) lies a rigid limitation: A fixed “working memory,” more commonly referred to as the context window. Once conversations grow long, older information gets truncated, summarized or silently dropped. This limitation has long been recognized by AI researchers, and since early 2023, developers have been working to expand context windows, rapidly increasing the amount of information a model can handle in a single pass.

    Mistral’s Mixtral 8x7B debuted with a 32K-token window, which is approximately 24 to 25 words, or about 128 characters in English; essentially a small amount of text, like a single sentence. This was followed by MosaicML’s MPT-7B-StoryWriter-65k+, which more than doubled that capacity; then came Google’s Gemini 1.5 Pro and Anthropic’s Claude 3, offering massive 128K and 200K windows, both of which are extendable to an unprecedented one million tokens. Even Microsoft joined the push, vaulting from the 2K-token limit of the earlier Phi models to the 128K context window of Phi-3. 

    Increasing context windows might sound like the obvious fix, but it isn’t. Even models with sprawling 100K-token windows, enough to hold hundreds of pages of text, still struggle to recall details buried near the beginning of a long conversation. Scaling context comes with its own set of problems. As prompts grow longer, models become less reliable at locating and interpreting information because attention over distant tokens weakens and accuracy gradually erodes.

    Longer inputs also dilute the signal-to-noise ratio, as including every possible detail can actually make responses worse than using a focused prompt. Long prompts also slow models down; more input tokens lead to noticeably higher output-token latency, creating a practical limit on how much context can be used before performance suffers.

    Memories are priceless

    For most organizations, supersized context windows come with a clear downside — they’re costly. Sending massive prompts through an API is never cheap, and because pricing scales directly with input tokens, even a single bloated request can drive up expenses. Prompt caching helps, but not enough to offset the habit of routinely overloading models with unnecessary context. And that’s the tension at the heart of the issue: Memory is essential to making AI more powerful.

    As context windows stretch into the hundreds of thousands or millions of tokens, the financial overhead rises just as sharply. Scaling context is both a technical challenge and an economic one, and relying on ever-larger windows quickly becomes an unsustainable strategy for long-term memory.

    Fixes like summarization and retrieval-augmented generation (RAG) aren’t silver bullets either. Summaries inevitably strip away subtle but important details, and traditional RAG, while strong on static documents, tends to break down when information stretches across multiple sessions or evolves over time. Even newer variants, such as agentic RAG and RAG 2.0 (which perform better in steering the retrieval process), still inherit the same foundational flaw of treating retrieval as the solution, rather than treating memory itself as the core problem.

    Compilers solved this problem decades ago

    If memory is the real bottleneck, and retrieval can’t fix it, then the gap needs a different kind of solution. That’s the bet behind GAM. Instead of pretending retrieval is memory, GAM keeps a full, lossless record and layers smart, on-demand recall on top of it, resurfacing the exact details an agent needs even as conversations twist and evolve. A useful way to understand GAM is through a familiar idea from software engineering: Just-in-time (JIT) compilation. Rather than precomputing a rigid, heavily compressed memory, GAM keeps things light and tight by storing a minimal set of cues, along with a full, untouched archive of raw history. Then, when a request arrives, it “compiles” a tailored context on the fly.

    This JIT approach is built into GAM’s dual architecture, allowing AI to carry context across long conversations without overcompressing or guessing too early about what matters. The result is the right information, delivered at exactly the right moment.

    Inside GAM: A two-agent system built for memory that endures

    GAM revolves around the simple idea of separating the act of remembering from recalling, which aptly involves two components: The 'memorizer' and the 'researcher.'

    The memorizer: Total recall without overload

    The memorizer captures every exchange in full, quietly turning each interaction into a concise memo while preserving the complete, decorated session in a searchable page store. It doesn’t compress aggressively or guess what is important. Instead, it organizes interactions into structured pages, adds metadata for efficient retrieval and generates optional lightweight summaries for quick scanning. Critically, every detail is preserved, and nothing is thrown away.

    The researcher: A deep retrieval engine

    When the agent needs to act, the researcher takes the helm to plan a search strategy, combining embeddings with keyword methods like BM25, navigating through page IDs and stitching the pieces together. It conducts layered searches across the page-store, blending vector retrieval, keyword matching and direct lookups. It evaluates findings, identifies gaps and continues searching until it has sufficient evidence to produce a confident answer, much like a human analyst reviewing old notes and primary documents. It iterates, searches, integrates and reflects until it builds a clean, task-specific briefing. 

    GAM’s power comes from this JIT memory pipeline, which assembles rich, task-specific context on demand instead of leaning on brittle, precomputed summaries. Its core innovation is simple yet powerful, as it preserves all information intact and makes every detail recoverable.

    Ablation studies support this approach: Traditional memory fails on its own, and naive retrieval isn’t enough. It’s the pairing of a complete archive with an active, iterative research engine that enables GAM to surface details that other systems leave behind.

    Outperforming RAG and long-context models

    To test GAM, the researchers pitted it against standard RAG pipelines and models with enlarged context windows such as GPT-4o-mini and Qwen2.5-14B. They evaluated GAM using four major long-context and memory-intensive benchmarks, each chosen to test a different aspect of the system’s capabilities:

    • LoCoMo measures an agent’s ability to maintain and recall information across long, multi-session conversations, encompassing single-hop, multi-hop, temporal reasoning and open-domain tasks.

    • HotpotQA, a widely used multi-hop QA benchmark built from Wikipedia, was adapted using MemAgent’s memory-stress-test version, which mixes relevant documents with distractors to create contexts of 56K, 224K and 448K tokens — ideal for testing how well GAM handles noisy, sprawling input.

    • RULER evaluates retrieval accuracy, multi-hop state tracking, aggregation over long sequences and QA performance under a 128K-token context to further probe long-horizon reasoning.

    • NarrativeQA is a benchmark where each question must be answered using the full text of a book or movie script; the researchers sampled 300 examples with an average context size of 87K tokens.

    Together, these datasets and benchmarks allowed the team to assess both GAM’s ability to preserve detailed historical information and its effectiveness in supporting complex downstream reasoning tasks.

    GAM came out ahead across all benchmarks. Its biggest win was on RULER, which benchmarks long-range state tracking. Notably:

    • GAM exceeded 90% accuracy.

    • RAG collapsed because key details were lost in summaries.

    • Long-context models faltered as older information effectively “faded” even when technically present.

    Clearly, bigger context windows aren’t the answer. GAM works because it retrieves with precision rather than piling up tokens.

    GAM, context engineering and competing approaches

    Poorly structured context, not model limitations, is often the real reason AI agents fail. GAM addresses this by ensuring that nothing is permanently lost and that the right information can always be retrieved, even far downstream. The technique’s emergence coincides with the current, broader shift in AI towards context engineering, or the practice of shaping everything an AI model sees — its instructions, history, retrieved documents, tools, preferences and output formats.

    Context engineering has rapidly eclipsed prompt engineering in importance, although other research groups are tackling the memory problem from different angles. Anthropic is exploring curated, evolving context states. DeepSeek is experimenting with storing memory as images. Another group of Chinese researchers has proposed “semantic operating systems” built around lifelong adaptive memory.

    However, GAM’s philosophy is distinct: Avoid loss and retrieve with intelligence. Instead of guessing what will matter later, it keeps everything and uses a dedicated research engine to find the relevant pieces at runtime. For agents handling multi-day projects, ongoing workflows or long-term relationships, that reliability may prove essential.

    Why GAM matters for the long haul

    Just as adding more compute doesn’t automatically produce better algorithms, expanding context windows alone won’t solve AI’s long-term memory problems. Meaningful progress requires rethinking the underlying system, and GAM takes that approach. Instead of depending on ever-larger models, massive context windows or endlessly refined prompts, it treats memory as an engineering challenge — one that benefits from structure rather than brute force.

    As AI agents transition from clever demos to mission-critical tools, their ability to remember long histories becomes crucial for developing dependable, intelligent systems. Enterprises require AI agents that can track evolving tasks, maintain continuity and recall past interactions with precision and accuracy. GAM offers a practical path toward that future, signaling what may be the next major frontier in AI: Not bigger models, but smarter memory systems and the context architectures that make them possible.

  • Inside NetSuite’s next act: Evan Goldberg on the future of AI-powered business systems Thu, 04 Dec 2025 05:00:00 GMT

    Presented by Oracle NetSuite


    When Evan Goldberg started NetSuite in 1998, his vision was radically simple: give entrepreneurs access to their business data anytime, anywhere. At the time, most enterprise software lived on local servers.

    As an entrepreneur himself, Goldberg understood the frustration intimately. "I had fragmented systems. They all said something different," he recalls of his early days.

    NetSuite was the first company to deliver enterprise applications entirely through web browsers, combining CRM, ERP, and ecommerce into one unified platform. That breakthrough idea pioneered the cloud computing and software-as-a-service (SaaS) era and propelled supersonic growth, a 2007 IPO, and an acquisition by Oracle in 2016.

    Still innovating at the leading-edge

    That founding obsession — turning scattered data into accessible, coherent, actionable intelligence — is driving NetSuite as it reshapes the next generation of enterprise software.

    At SuiteWorld 2025 last month, the Austin-based firm unveiled NetSuite Next. Goldberg calls it "the biggest product evolution in the company's history.” The reason? While NetSuite has embedded AI capabilities into workflows for years, he explains, Next represents a quantum leap — contextual, conversational, agentic, composable AI becoming an extension of operations, not separate tools.

    AI woven into everyday business operations

    Most enterprise AI today gets bolted on through APIs and conversational interfaces.

    NetSuite Next operates differently. Intelligence runs deep in workflows instead of sitting on the surface. It autonomously reconciles accounts, optimizes payment timing, predicts cash crunches, and surfaces its reasoning at every step. It doesn't just advise on business processes — it executes them, transparently, within human-defined guardrails.

    "We built NetSuite for entrepreneurs so that they could get great information about their business," Goldberg explains. "I think the next step is to be able to get deeper insights and analysis without being an expert in analytics. AI turns out to be a really good data scientist."

    This architectural divergence reflects competing philosophies about enterprise technology adoption. Microsoft and SAP have pursued rapid deployment through add-on assistants. NetSuite's five-year development cycle for Next represents a more fundamental reimagining: making AI an everyday tool woven into business operations, not a separate application requiring constant context-switching.

    AI echoes and deepens cloud innovation

    Goldberg sees a clear through line connecting today's AI adoption and the cloud computing era he pioneered. "There’s sort of an infinite sense of possibility that exists in the technology world,” he says. “Everybody is thinking about how they can leverage this, how they're going to get involved."

    When NetSuite was starting, he continues, "We had to come to customers with the cloud and say, 'This won't disrupt your operations. It's going to make them better.'" Today, evangelizing enterprise leaders on advanced AI requires a similar approach — demonstrating immediate value while minimizing implementation risk.

    For NetSuite, continuous innovation around maximizing customer data for growth is an undeniable theme that connects both eras.

    New transformative capabilities

    NetSuite’s latest AI capabilities span business operations, while blurring (in a good way) the lines between human and machine intervention:

    Context-aware intelligence. Ask Oracle adapts responses based on user role, current workflow, and business context. A CFO requesting point-of-sale data receives financial analytics. A warehouse manager asking the same question sees inventory insights.

    Collaborative workflow design. AI Canvas functions as a scenario-planning workspace where business users articulate processes in natural language. A finance director can describe approval hierarchies for capital expenditures —"For amounts over $50,000, I need department head approval, then CFO sign-off" — which the system translates into executable workflows with appropriate controls and audit trails.

    Governed autonomous operations. Autonomous workflows operate within defined parameters, reconciling accounts, generating payment runs, predicting cash flow. When the system recommends accelerating payment to a supplier, it shows which factors influenced the decision — transparent logic users can accept, modify, or override.

    Open AI architecture. Built to support Model Context Protocol, NetSuite AI Connector Service enables enterprises to integrate external large language models while supporting governance.

    Critically, NetSuite adds AI capabilities at no additional cost — embedded directly into workflows employees already use daily.

    Security and privacy from Oracle infrastructure

    Built-in AI requires robust infrastructure that bolt-on approaches sidestep. Here, according to NetSuite, tight integration within Oracle technology provides operational and competitive advantages, especially security and compliance peace of mind.

    Engineers say that’s because NetSuite is supported by Oracle's complete stack. From database to applications to analytics, the system optimizes decisions using data from multiple sources in real time.

    "That's why I started NetSuite. I couldn't get the data I wanted," Goldberg reflects. "That's one of the most differentiated aspects of NetSuite. When you're doing your financial close, and you're thinking about what reserves you're going to take, you can look at your sales data, because that's also there in NetSuite. With NetSuite Next, AI can also help you make those kinds of decisions."

    And performance improves with use. As the platform learns from millions of transactions across thousands of customers, its embedded intelligence improves in ways that bolt-on assistants operating adjacent to core systems cannot match.

    NetSuite's customer base demonstrates this scalability advantage — from startups that became global enterprises including Reddit, Shopify, and DoorDash; as well as promising newcomers like BERO, a brewer of non-alcoholic beer founded by actor Tom Holland, Chomps meat snacks, PetLab, and Kieser Australia. The unified platform grows with businesses rather than requiring migration as they scale.

    Keeping fire in the belly after three decades

    How does a nearly 30-year-old company maintain innovative capacity, particularly as part of a mammoth corporate ecosystem? Goldberg credits the parent company's culture of continuous reinvention.

    "I don't know if you've heard about this guy Larry Ellison," he smiles. "He manages to seemingly reinvent himself whenever one of these technology revolutions comes along. That hunger, that curiosity, that desire to make things constantly better imbues all of Oracle."

    For Goldberg, the single biggest challenge facing NetSuite customers centers on integration complexity and trust. NetSuite Next addresses this by embedding AI within existing workflows rather than requiring separate systems.

    In addition, updates to SuiteCloud Platform — an extensibility and customization environment — help organizations adapt NetSuite to their unique business needs. Built on open standards, it lets enterprises mix and match AI models for different functions. SuiteAgent frameworks enable partners to build specialized automation directly into NetSuite. AI Studios give administrators control over how AI operates within specific industry needs.

    "This takes NetSuite's flexibility to a new level," Goldberg says, enabling customers and partners to "quickly and easily build AI agents, connect external AI assistants, and orchestrate AI processes."

    “AI execution fabric” delivers measurable business impact

    Industry analysts increasingly argue that embedded AI features deliver superior results compared to add-on models. Futurum Group sees NetSuite Next as an "AI execution fabric" rather than a conversational layer — intelligence that runs deep in workflows instead of sitting on the surface.

    For midmarket enterprises navigating talent shortages, complex compliance frameworks, and competition from digital-native companies, the distinction between advice and execution matters economically.

    Built-in AI doesn't just inform better decisions. It makes those decisions, transparently and autonomously, within human-defined guardrails.

    For enterprises making ERP decisions today, the choice carries long-term implications. Bolt-on AI can deliver immediate value for information access and basic automation. But built-in AI promises to transform operations with intelligence permeating every transaction and workflow.

    NetSuite Next begins rolling out to North American customers next year.

    Why 2026 will belong to the AI-first business

    The bet underlying NetSuite Next: Enterprises reimagining ERP operations around embedded intelligence will outperform those just adding bolt-on conversational assistance to existing systems.

    Early cloud computing adopters, Goldberg notes, gained competitive advantages that compounded over time. The same logic appears likely to apply to AI-first platforms.

    Simplicity and ease of use are two big advantages. "You don't have to dig through lots of menus and understand all of the analytics capabilities," Goldberg says. "It will quickly bring up an analysis for you, and then you can converse in natural language to hone in on what you think is most important."

    The tools now think alongside users and take intelligently informed action. For midmarket and entrepreneurial companies, where the gap between having information and acting on it can be the difference between growth and failure, that kind of autonomous execution may determine which enterprises thrive in an AI-first era.


    Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.

  • Anthropic vs. OpenAI red teaming methods reveal different security priorities for enterprise AI Thu, 04 Dec 2025 05:00:00 GMT

    Model providers want to prove the security and robustness of their models, releasing system cards and conducting red-team exercises with each new release. But it can be difficult for enterprises to parse through the results, which vary widely and can be misleading.

    Anthropic's 153-page system card for Claude Opus 4.5 versus OpenAI's 60-page GPT-5 system card reveals a fundamental split in how these labs approach security validation. Anthropic discloses in their system card how they rely on multi-attempt attack success rates from 200-attempt reinforcement learning (RL) campaigns. OpenAI also reports attempted jailbreak resistance. Both metrics are valid. Neither tells the whole story.

    Security leaders deploying AI agents for browsing, code execution and autonomous action need to know what each red team evaluation actually measures, and where the blind spots are.

    What the attack data shows

    Gray Swan's Shade platform ran adaptive adversarial campaigns against Claude models. The attack success rate (ASR) tells the story.

    • Opus 4.5 in coding environments hit 4.7% ASR at one attempt, 33.6% at ten and 63.0% at one hundred. In computer use with extended thinking, it held at 0% ASR even after 200 attempts. It's the first model to saturate the benchmark.

    • Sonnet 4.5 at the same thresholds showed 70% ASR in coding and 85.7% in computer use.

    • Opus 4.5 delivers roughly 7x improvement in coding resistance and complete resistance in computer use.

    This illustrates that the gap between model tiers within the same family is now larger than some cross-vendor comparisons. For security teams, that's a procurement factor that didn't exist six months ago.

    For OpenAI, the Gray Swan Shade platform found that the o1 system card delivered 6% ASR for harmful text and 5% for malicious code, all based on single-attempt metrics. Third-party testing by SPLX and NeuralTrust found GPT-5's raw ASR hit 89% initially, dropping below 1% within two weeks of patching.

    The report Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks confirms the pattern. With sufficient attempts, Claude 3.5 Sonnet showed 78% ASR, and GPT-4o reached 89%.

    Two ways to catch deception

    Anthropic monitors approximately 10 million neural features during evaluation using dictionary learning. Features map to human-interpretable concepts, including deception, sycophancy, bias, power-seeking and concealment.

    Meanwhile, OpenAI's approach relies on chain-of-thought (CoT) monitoring. Their deception classifier flagged 0.17% of o1-preview responses as potentially deceptive with 92% accuracy on internal validation. GPT-5 reduced CoT deception from 4.8% to 2.1% compared to o3. The method assumes visible reasoning reflects internal processing. Anthropic's interpretability work directly questions that assumption.

    When models game the test

    In Apollo Research's evaluation of o1, the model attempted to turn off oversight mechanisms 5% of the time when it believed it would be shut down. In 2% of cases, o1 attempted self-exfiltration. When confronted about misaligned actions, o1 denied wrongdoing or fabricated explanations 99% of the time. The o1 system card disclosed instrumental alignment faking. In 37% of scenarios, the model pursued developer-aligned goals during evaluation, then switched to its own goals when it believed oversight was minimal.

    Meanwhile, Anthropic reports Opus 4.5's evaluation awareness dropped from 26.5% (Opus 4.1) to less than 10% internally. UK AISI measured evaluation awareness decreasing from 26.5% (Sonnet 4.5) to 12.7% (Opus 4.5 Preview). METR's GPT-5 evaluation noted the model "can identify the exact evaluation it is in on some occasions" and "seems to use this information to get a better score mostly."

    If a model behaves differently when it detects evaluation conditions, it's a sure bet that production behavior will become unpredictable when pushed to scale. This is the core problem with evaluation awareness. Models that recognize they're being tested attempt to game the test. Anthropic's 60% reduction in evaluation awareness from Opus 4.1 to 4.5 represents their success in targeted engineering efforts against this vector.

    Red teaming on prompt injection defenses shows similar divergence.

    Anthropic's testing reports 96% prevention in tool use scenarios, rising to 99.4% with additional safeguards like prompt shields. Fujitsu's independent assessment found GPT-5 vulnerable to approximately 20% of tested injection attacks. OpenAI's o3/o4-mini system card reports instruction hierarchy scores of 0.86 for system-versus-developer conflicts.

    These are different metrics and different attack methodologies, but the gap suggests Anthropic's agentic use cases have been hardened against adversarial inputs embedded in tool outputs.

    Comparing red teaming results

    Here's how Opus 4.5 stacks up against OpenAI's model family across 16 dimensions. .

    Dimension

    Claude Opus 4.5

    GPT-5

    o3 / o4-mini

    o1

    System card length

    153 pages

    55 pages

    ~30 pages

    ~40 pages

    Attack methodology

    200-attempt RL campaigns

    Single-attempt + iterative patching

    Single-attempt

    Single-attempt

    ASR @ 1 attempt (coding)

    4.7%

    89% raw (pre-patch)

    Not published

    5%

    ASR @ 100 attempts

    63.0%

    Not published

    Not published

    Not published

    ASR @ 200 attempts (Computer Use)

    0% (saturates benchmark)

    Not published

    Not published

    Not published

    Prompt injection defense

    96% prevention; 99.4% with safeguards

    ~20% ASR (Fujitsu)

    0.86 instruction hierarchy

    Higher ASR than GPT-4o

    Interpretability

    ~10M neural features monitored

    CoT monitoring

    CoT monitoring

    CoT monitoring

    Deception detection

    Internal feature activation

    2.1% CoT flagged

    0.95 not_unsafe

    0.17% flagged; 99% deny

    Evaluation awareness

    <10%

    Identifies exact eval (METR)

    Not published

    37% alignment faking

    Scheming (Apollo)

    Opus 4.5 showed very low rates of sabotage propensity in internal evaluations, orders of magnitude lower than o1.

    Not published

    Lower than o1; still deceptive

    5% turn off oversight; 2% self-exfil

    CBRN risk

    Below ASL-4

    Medium (approaching High)

    Medium

    Medium (first to reach)

    Governance

    FRT → AST → RSO/CEO

    SAG + Preparedness v2

    SAG + Preparedness v2

    SAG + Preparedness

    External partners

    Gray Swan, UK AISI, US CAISI, METR

    UK AISI, US AISI, METR, Apollo

    UK AISI, US AISI, METR, Apollo, Pattern Labs

    UK AISI, US AISI, METR, Apollo

    Reward hacking

    Yes (impossible tasks)

    Yes (METR)

    ~1% of tasks; 5/24 Kernels

    Yes

    Misalignment reduction

    60% vs. Opus 4.1

    Not published

    Lower scheming than o1

    Baseline

    Sources: Opus 4.5 system card, GPT-5 system card, o3/o4-mini system card, o1 system card, Gray Swan, METR, Apollo Research

    Why these differences matter

    Enterprises have several factors to consider when analyzing model evals. These include:

    Attack persistence thresholds

    Anthropic's 200-attempt RL campaigns simulate adversaries with time, resources, and adaptive tooling. In RL-based adversarial testing, the attacker model learns from each failure, adjusts its approach and systematically probes for weaknesses. It's training an AI to break another AI.

    Gray Swan's Shade platform doesn't just retry the same prompt. It learns from failures, adjusts attack vectors and exploits patterns in model responses. The 4.7% to 63.0% degradation curve in coding environments shows how resistance erodes under sustained pressure.

    OpenAI's single-attempt metrics answer a different question: How often does a naive attacker succeed on the first try? Both matter. A phishing campaign hits millions of users once each. A nation-state actor runs thousands of variations against a single target. Security teams need to know which threat model their vendor tested against.

    Detection architecture

    Anthropic monitors approximately 10 million internal neural features during evaluation. These are abstract concepts like deception, sycophancy and power-seeking that activate across the model's hidden layers. When a fraud/deception feature lights up during benign roleplay, that's a signal invisible to output analysis. The model's internal state reveals associations its outputs don't.

    OpenAI takes a different approach with CoT monitoring. Their deception classifier flagged 0.17% of o1-preview responses as potentially deceptive with 92% accuracy. The assumption is that reasoning traces reflect internal processing. Anthropic's interpretability research directly challenges this. Models can produce clean reasoning chains while internal features indicate something else entirely.

    Neither approach is complete. CoT monitoring scales more easily. Feature monitoring catches what CoT misses but requires deep model access and interpretability infrastructure that most enterprises can't replicate.

    Scheming evaluation design

    Apollo Research's methodology puts models in scenarios where misaligned behavior offers a strategic advantage: Shutdown threats, goal conflicts and oversight detection. o1 attempted to turn off oversight mechanisms 5% of the time when believing shutdown was imminent. It tried to copy itself to prevent replacement in 2% of cases. When confronted, it denied wrongdoing 99% of the time. Opus 4.5 showed very low rates of sabotage propensity in internal evaluations, orders of magnitude lower than o1.

    But the evaluation environments differ. Anthropic's stress tests focus on whether models attempt sabotage when given clear opportunities. OpenAI's tests via Apollo probe instrumental reasoning, specifically whether models fake alignment during evaluation, then defect when oversight drops. The 37% alignment-faking rate in o1 represents a different failure mode than sabotage attempts.

    The comparability problem

    These methodological differences make direct comparison difficult. Opus 4.5's 0% ASR at 200 attempts in computer use versus o1's 5% malicious code rate are not measuring the same thing. The attack surfaces, attempt counts and success criteria differ.

    The matrix reveals what each vendor chose to measure and publish. That itself signals priorities. Anthropic's 153-page system card emphasizes degradation curves and internal feature monitoring. OpenAI emphasizes iterative improvement and CoT transparency.

    Bottom line: Enterprise security teams inheriting these models inherit these measurement philosophies. The question isn't which number is better. It's the measurement that matches your deployment's actual threat landscape.

    What independent red team evaluators found

    Independent red team evaluations, for their part, often operate with entirely different methods. This can tell a very different story and reveal additional model characteristics that enterprises must consider.

    METR's red team evaluation measured autonomous capabilities using a time horizon score, which tracks how long a task can run before the agent fails half the time. o3 sustained reliable performance for approximately 1 hour and 30 minutes. o4-mini held for 1 hour 15 minutes. METR also detected reward hacking in roughly 1% of o3's attempts, including 5 of 24 kernel optimization experiments where the model tampered with scoring functions to inflate its own results.

    Apollo Research tested o3 and o4-mini for in-context scheming. Both exhibited deceptive tendencies such as sandbagging evaluations and sabotaging tasks with plausible deniability, but scored lower than o1. They assess that o3 and o4-mini are unlikely to cause catastrophic harm due to scheming, but more minor real-world harms remain possible without monitoring.

    The UK AISI/Gray Swan challenge ran 1.8 million attacks across 22 models. Every model broke. ASR ranged from 1.47% to 6.49%. Opus 4.5 placed first on Gray Swan's Agent Red Teaming benchmark with 4.7% ASR versus GPT-5.1 at 21.9% and Gemini 3 Pro at 12.5%.

    No current frontier system resists determined, well-resourced attacks. The differentiation lies in how quickly defenses degrade and at what attempt threshold. Opus 4.5's advantage compounds over repeated attempts. Single-attempt metrics flatten the curve.

    What To Ask Your Vendor

    Security teams evaluating frontier AI models need specific answers, starting with ASR at 50 and 200 attempts rather than single-attempt metrics alone. Find out whether they detect deception through output analysis or internal state monitoring. Know who challenges red team conclusions before deployment and what specific failure modes they've documented. Get the evaluation awareness rate. Vendors claiming complete safety haven't stress-tested adequately.

    The bottom line

    Diverse red-team methodologies demonstrate that every frontier model breaks under sustained attack. The 153-page system card versus the 55-page system card isn't just about documentation length. It's a signal of what each vendor chose to measure, stress-test, and disclose.

    For persistent adversaries, Anthropic's degradation curves show exactly where resistance fails. For fast-moving threats requiring rapid patches, OpenAI's iterative improvement data matters more. For agentic deployments with browsing, code execution and autonomous action, the scheming metrics become your primary risk indicator.

    Security leaders need to stop asking which model is safer. Start asking which evaluation methodology matches the threats your deployment will actually face. The system cards are public. The data is there. Use it.



Techradar



TechNode

  • While Musk Dreams of a Cyborg Future, BrainCo Delivers Today Fri, 05 Dec 2025 08:55:56 +0000
    In a video that previously circulated through the tech world, Elon Musk painted a specific vision for the year 2025: a future where invasive Neuralink technology melds with Optimus humanoid robots to restore lost limbs. It was a promise of sci-fi becoming reality. But for Cicy Zhang, watching that video from her office in China, […]
  • AI is driving new content strategies and powering localization, but cultural understanding still depends on humans Fri, 05 Dec 2025 07:52:03 +0000
    The Intelligent Futures: AI’s Global Ecosystems, organized by TechNode and co-organized by TECOM and Founders Breakfast, convened in Hangzhou on Thursday. During the event’s second panel, How AI and Content Design Drive Success in Cross-Border E-commerce, three speakers from multinational technology companies examined emerging trends in global digital content strategies and how businesses are adapting to […]
  • Insta360-co-incubated Antigravity launches A1, a 249g 8K 360 drone with headset-based control Fri, 05 Dec 2025 07:15:04 +0000
    Antigravity, a brand incubated by Insta360 and partners, has released the A1, a 249-gram drone capable of 8K 360-degree recording through a dual-lens 1/1.28-inch system. The device adopts a headset-first design, pairing the drone with Vision goggles that mirror the pilot’s head movements in real time. It supports a “fly-first, frame-later” workflow that captures the […]
  • Ele.me rebrands as Taobao Flash Sale, ending 17 years as an independent brand Fri, 05 Dec 2025 07:15:03 +0000
    China’s food delivery platform Ele.me has rebranded as Taobao Flash Sale, ending 17 years as an independent brand, the company said on Friday. Users updating to the latest version of the app will see all Ele.me branding replaced with the new Taobao Flash Sale identity, with the transition rolling out across all related services. Founded […]
  • ByteDance reportedly developing second-gen AI phone for 2026 after first batch sells out Thu, 04 Dec 2025 08:46:50 +0000
    Following the rapid sell-out of its initial engineering batch, ByteDance has reportedly halted production of its first phone equipped with the Doubao AI assistant and is moving forward with a successor. Supply chain sources said the release was a market test limited to about 30,000 units. ByteDance and ZTE are reportedly developing a second version […]
  • Li Auto launches Livis AI smart glasses with 36g lightweight design Thu, 04 Dec 2025 04:47:38 +0000
    Li Auto, a Chinese EV maker, has released its first AI smart glasses, Livis, featuring a 36-gram lightweight frame and Zeiss lenses as standard. The device is priced from RMB 1,999 (about $280) and supports up to 18.8 hours of daily use. Sales opened on December 3 through the Li Auto app, JD.com, and retail […]
  • Look AI makes the future of fashion design real-time, AI-powered, and in your hands Wed, 03 Dec 2025 09:57:15 +0000
    Amid the growing integration of technology and fashion, AI is gradually transforming the fashion design process. Look AI, a product of Shenzhen-based tech firm, leverages advanced algorithms to provide designers with intelligent creative tools. Traditional fashion design can be both complex and time-consuming, as designers spend considerable effort on sketching, selecting fabrics, and adjusting patterns, […]
  • Sony sues Tencent’s Light of Motiram for copying Horizon, Tencent agrees to suspend all marketing and public tests Wed, 03 Dec 2025 06:05:24 +0000
    Sony and Tencent have agreed to halt all marketing and public testing for Tencent’s game Light of Motiram ahead of a major court hearing in early 2026 in Sony’s copyright lawsuit over the title. Sony filed the suit in July 2024, alleging that Light of Motiram copied character designs and survival-game mechanics from its Horizon […]
  • ByteDance’s first Doubao-assisted AI phone sells out at $495 as second-hand prices rise by at least $210 Wed, 03 Dec 2025 02:47:56 +0000
    The Nubia M153 engineering prototype, the first smartphone to feature ByteDance’s Doubao AI assistant in a preview version, has sold out on ZTE’s official online store at a price of RMB 3,499 ($495). Because the device was released in limited quantities, resale prices have climbed sharply. On second-hand marketplace Xianyu, unopened units are listed for […]
  • Elon Musk says WeChat is indispensable in China, plans X aligned with WeChat Tue, 02 Dec 2025 09:47:07 +0000
    Tesla CEO Elon Musk said he aims to turn social media platform X into a multi-purpose “super app,” describing it as a “WeChat++” for markets outside China. Speaking on the People by WTF podcast, Musk said Chinese users are deeply reliant on Tencent’s WeChat, which integrates messaging, social media and payments, but noted that no […]
  • Baidu reportedly begins new round of layoffs, some teams may shrink by 40% Tue, 02 Dec 2025 07:02:29 +0000
    Baidu last week reportedly initiated a new round of workforce reductions, affecting multiple business units, a process expected to continue through the end of 2025. According to sources familiar with the matter, the exact number of layoffs will vary by business unit and performance ratings, and some teams could potentially face cuts of up to […]
  • Didi launches all-day fully driverless Robotaxi service in Guangzhou Tue, 02 Dec 2025 03:37:15 +0000
    Didi’s autonomous driving unit on Monday announced the start of a 24/7 fully driverless Robotaxi trial in select demonstration zones in Guangzhou, following its deployment of autonomous services at the 15th National Games of China. Users can now request a fully autonomous ride through the Didi app, with the ability to change destinations or safely […]
  • DeepSeek launches V3.2 models with integrated reasoning tool use Tue, 02 Dec 2025 03:04:21 +0000
    DeepSeek has released the V3.2 and V3.2-Speciale models across web, app, and API. The company said V3.2 adds built-in reasoning for agent tasks and is its first model to support tool calls in both reasoning and non-reasoning modes. It reportedly reaches GPT-5-level results on public reasoning benchmarks while reducing output length and computational cost. V3.2-Speciale […]
  • Sony to release China edition of Astro Bot on Dec. 12 Tue, 02 Dec 2025 02:47:53 +0000
    Sony Interactive Entertainment said that the China edition of Astro Bot will go on sale on Dec. 12, marking the game’s official launch in the mainland market. Astro Bot is a 3D platformer in which players guide a small robot through diverse levels using various abilities to complete an adventure. Developed by Team ASOBI, the title […]
  • China halts sales of older-standard E-bikes as new safety rules take effect Mon, 01 Dec 2025 06:50:45 +0000
    China on Monday began a nationwide halt on sales of electric bicycles that do not meet its latest mandatory safety standards, ending the availability of so-called “old standard” models. Under the updated national standard, Electric Bicycle Safety Technical Specification, all e-bikes sold from Dec. 1 must comply with the new requirements. Authorities have revoked or […]
  • China’s food-delivery price war sees Meituan, Alibaba, JD.com incur $14B in costs across two quarters Mon, 01 Dec 2025 06:39:25 +0000
    Heavy subsidy spending in China’s food-delivery battle led Meituan to report a 19.8 billion yuan ($2.7 billion) operating loss in the third quarter of 2025, its largest since listing. Alibaba’s operating profit fell from 35.2 billion yuan to 5.4 billion yuan ($4.9 billion to $0.75 billion), while JD.com recorded a 10.5 billion yuan ($1.4 billion) […]
  • Xiaomi says humanoid robots to be deployed across its factories within five years Mon, 01 Dec 2025 03:47:19 +0000
    Xiaomi CEO Lei Jun said the company expects large-scale deployment of humanoid robots in its factories within the next five years, as the electronics maker accelerates efforts to upgrade manufacturing with artificial intelligence. In a recent interview, Lei cited Xiaomi’s automobile plant as an example, noting that inspection of large die-cast components can be completed […]
  • Shanghai Jiao Tong University launches China’s first undergraduate major in embodied AI Mon, 01 Dec 2025 02:41:17 +0000
    Shanghai Jiao Tong University launched what it says is the world’s first undergraduate major in embodied artificial intelligence, aiming to train specialists for China’s fast-growing humanoid robotics sector. The four-year program, housed in the university’s School of Artificial Intelligence, will enroll 30 students in its first year and confer an engineering degree. The curriculum covers […]
  • China’s chip design progress in 2025: Positive signs but some old problems Mon, 01 Dec 2025 01:39:25 +0000
    I reported on China’s chip design progress last year and figured it made sense to make this an annual occurrence. I was present at the China Integrated Circuit Design Industry Exhibition (ICCAD) once again this year on behalf of our clients. I’ve been attending the show for the best part of a decade now. China’s struggles Last […]
  • Tencent’s $1.25 billion bet on Vantage Studios: a turning point for Ubisoft? Fri, 28 Nov 2025 09:16:53 +0000
    French video game giant Ubisoft and Tencent have completed a strategic deal in which Tencent will invest €1.16 billion ($1.25 billion) in cash for a stake in Ubisoft’s newly created unit, Vantage Studios, which is dedicated to developing the company’s three flagship franchises: Assassin’s Creed, Far Cry and Rainbow Six. The transaction provides crucial liquidity […]
  • China’s Gen-Z social platform Soul App files for Hong Kong listing Fri, 28 Nov 2025 03:42:19 +0000
    Soul App, a China-based immersive social platform powered by AI, has filed an application to list on the Main Board of the Hong Kong Stock Exchange, with CITIC Securities acting as the sole sponsor. Tencent holds 49.9 percent of the company, while other major shareholders include miHoYo, Genesis Capital, and 5Y Capital. The filing says […]
  • China approves 184 online games in November as PUBG Mobile variant adds PC version Fri, 28 Nov 2025 02:43:34 +0000
    China’s National Press and Publication Administration (NPPA) on Thursday published its list of approved online games for November 2025, giving the green light to 184 titles. The batch includes six imports and 178 domestic titles, which marks the highest number of Chinese-made games approved in a single round in nearly five years. As of November, […]
  • CATL and Stellantis break ground on €4.1 billion LFP battery plant in Spain Thu, 27 Nov 2025 06:47:16 +0000
    Mobility electric vehicle battery EV catl china
    China’s CATL and Dutch-based Stellantis have broken ground on a €4.1 billion ($4.75 billion) lithium iron phosphate (LFP) battery plant in Spain’s Aragon region, one of the largest Chinese investment projects in the country. The facility will run entirely on renewable energy and is scheduled to start production by the end of 2026. The joint […]
  • TSMC ex-executive accused of leaking trade secrets after joining Intel; Intel denies allegations Thu, 27 Nov 2025 03:40:40 +0000
    A controversy over alleged trade-secret leaks involving a former senior executive at TSMC and Intel is escalating, after TSMC filed a lawsuit accusing the executive of improperly taking confidential data before joining the US chipmaker. TSMC said on Tuesday it has sued Lo Wei-jen, its former senior vice president, alleging that he accessed sensitive information […]
  • ByteDance restarts plan to sell Moonton, in talks with Saudi fund’s Savvy Games Thu, 27 Nov 2025 02:30:42 +0000
    ByteDance is in advanced talks to sell its Shanghai-based gaming studio Moonton Technology to Saudi Arabia’s Savvy Games Group, Bloomberg reported, citing people familiar with the matter. The Chinese tech giant acquired Moonton in 2021 for about $4 billion in a bid to expand into core gaming markets, but the unit has since failed to […]
  • Huawei rolls out Mate 80 series featuring new Kirin 9030 chip and ultra-bright display Wed, 26 Nov 2025 11:35:07 +0000
    Huawei on Tuesday unveiled its annual flagship Mate 80 series at a product launch event in Shenzhen. The lineup includes the Mate 80, Mate 80 Pro, Mate 80 Pro Max and Mate 80 RS. The Mate 80 Pro Max, powered by the Kirin 9030 Pro chipset, runs on HarmonyOS 6.0, and features an ultra-bright display […]
  • This 10-second add-on turns any bicycle into a 90 km-range e-bike Wed, 26 Nov 2025 08:52:53 +0000
    During a recent offline tech event in China, one of our colleagues had the chance to try a small device that promises something unusually practical: turning a regular bicycle into an electric-assist bike in under ten seconds. The product is called Kamingo, built by a team of former Huawei and BYD engineers, and it aims […]
  • Alibaba CEO says AI bubble unlikely in next three years Wed, 26 Nov 2025 08:15:58 +0000
    Alibaba CEO Eddie Wu said an artificial intelligence bubble is unlikely to emerge in the next three years, as AI resources will remain in short supply. His comments followed Alibaba’s fiscal Q2 2026 results, where the company reported about RMB 247.8 billion (USD 34.6 billion) in revenue, up 15 percent after excluding divested businesses. Wu […]
  • BMW considers range-extender versions as Chinese rivals reshape demand Wed, 26 Nov 2025 08:14:39 +0000
    BMW is evaluating range-extender versions of some models as demand for the technology grows in China, Bloomberg reported. The plan is under internal discussion and focuses on large vehicles such as the X5 and 7-Series. Chinese brands including Aito and Li Auto have popularized range extenders in the world’s biggest car market, which produced 31.28 […]
  • Li Auto to launch first AI smart glasses, expanding multi-device ecosystem Wed, 26 Nov 2025 03:25:43 +0000
    Li Xiang, founder of Li Auto, said on Tuesday the company will soon launch its first pair of AI smart glasses, calling it “Li Auto’s best AI accessory.” He said the device will be deeply integrated with in-car use cases, supporting AR navigation, fatigue monitoring and other functions that work in tandem with the company’s […]



How Technology Works demystifies the machinery that keeps the modern world going, from simple objects such as zip fasteners and can openers to the latest, most sophisticated devices of the information age, including smartwatches, personal digital assistants, and driverless cars. #ad