Home

Technology is the sum of techniques, skills, methods, and processes used in the production of goods or services or in the accomplishment of objectives, such as scientific investigation. Technology can be the knowledge of techniques, processes, and the like, or it can be embedded in machines to allow for operation without detailed knowledge of their workings.
Technologies: Cell phones, computers, video games, televisions, headphones, printers, wearables, musical instruments, home audio, and software. #ad

Gizmodo



Lifehacker



Google



CNET



Android Authority



AppleInsider

  • How to watch WWDC 2026 live on Apple TV, YouTube, Safari & web browsers Fri, 05 Jun 2026 16:55:32 +0000
    Apple will stream WWDC 2026 through the Apple TV app, its websites, and YouTube, giving viewers several ways to watch the company's biggest software event of the year. Besides reading here on AppleInsider, here's how to stay tuned in.

    WWDC26 logo with glowing metallic text beside a stylized Apple leaf on a dark background, above a flowing blue and orange light wave curveWWDC 2026

    WWDC is Apple's annual developer conference, where the company previews updates for the iPhone, iPad, Mac, Apple Watch, Apple TV, and Vision Pro. Developers are the primary audience, but the keynote also gives consumers an early look at many of the features Apple plans to release later in the year.

    Both presentations are free to watch through Apple's streaming platforms. Apple also offers calendar links on its WWDC and Apple Events pages so viewers can add the sessions to their schedules before they begin.

    Apple kicks off WWDC 2026 with its keynote on Monday, June 8, at 1 p.m. Eastern. The presentation is expected to introduce the next major versions of Apple's operating systems, along with new platform features and developer technologies.


    Continue Reading on AppleInsider | Discuss on our Forums
  • Latest foldable iPhone leak improbably says Apple hasn't decided on colors yet Fri, 05 Jun 2026 16:40:14 +0000
    Apple's first foldable iPhone may be just months away, but anyone hoping for a stealth black iPhone Fold may have to look elsewhere if the latest leak turns out to be accurate.

    Foldable silver smartphone with Apple logo, dual rear cameras, and tall front display showing abstract wavy pattern, set against a dark gradient background
    The iPhone Ultra might not come in a familiar black hue

    With the clock ticking down to an expected September unveiling, we're seeing more and more iPhone Fold leaks by the day. The latest claims that even Apple doesn't yet know what colors the device will come in.

    Writing in a post to the Chinese social network Weibo, leaker Instant Digital hinted Apple is still deliberating whether to launch a black iPhone Fold. He even went so far as to wonder aloud whether Apple has a grudge against the color.


    Rumor Score: 🙄 Unlikely


    Continue Reading on AppleInsider | Discuss on our Forums
  • Brydge Max 13 review: Finally, a compelling Magic Keyboard alternative Fri, 05 Jun 2026 15:44:08 +0000
    Brydge Max 13 is an all-in-one keyboard, trackpad, and stand for iPad Pro that has the potential to outshine Apple's Magic Keyboard.

    Tablet on a keyboard stand displaying a colorful rainbow home screen and widgets, placed indoors on a couch armrest with blurred household items in the backgroundBrydge Max 13 review: The new all-aluminum iPad keyboard


    With a price tag pushing $400 in the U.S., post-tax, it's no surprise that competitors have been relentlessly releasing lower-priced options to the Magic Keyboard. Most, though, swap out premium materials for cheaper ones, like plastic.

    At times, it feels like a race to the bottom. Premium keyboards outside of Apple's have largely been few and far between.


    Continue Reading on AppleInsider | Discuss on our Forums
  • Stay safe & browse the internet freely with 70% off Proton VPN Fri, 05 Jun 2026 13:57:52 +0000
    Virtual Private Networks, or VPNs, are basically a required utility in 2026 if you want to browse the internet without being tracked. Get Proton VPN at up to 70% off a two-year subscription.

    Hand holding a smartphone with a VPN app interface, overlaid on a world map with glowing connection lines linking different countries, symbolizing secure global internet browsing and privacy
    Proton VPN can keep your browsing habits private. Image source: Proton

    It seems like everyone on the internet is trying to track you. Whether it's data brokers trying to profit off your personal information or your ISP attempting to help build an advertising profile, all eyes are on you when you browse.

    There is a better option than giving up and living in a cave, and it's called Proton VPN. It lets you connect a VPN to up to ten devices at once for smooth and encrypted access to your apps and websites.


    Continue Reading on AppleInsider
  • Apple Vision Pro, WWDC, and Apple takes on Chrome, on the AppleInsider Podcast Fri, 05 Jun 2026 13:21:13 +0000
    Apple is surely laser-focused now on next week's WWDC, but it did take a moment for a rare swipe at a rival, and it seems to have made some harsh choices about Apple Vision Pro.

    White VR headset with padded fabric strap and glossy black front visor, resting on a dark surface beside white wireless earbuds; small black circle logo with lowercase letters ai
    It doesn't look like there's going to be an Apple Vision Pro 2 for a long time.

    T'wasn't really the night before Christmas, but every day now feels like it because we're so close to WWDC and that all-important opening keynote. Just imagine it: once that's happened, we will finally know everything about iOS 27, and be looking instead to September's iPhone launch.

    But for now, all eyes are on WWDC and that means every eye that can is finding out details. Such as the possibility that the new macOS 27 will be called Big Bear.


    Continue Reading on AppleInsider | Discuss on our Forums
  • I only want one thing from WWDC 2026, and it's got nothing to do with AI Fri, 05 Jun 2026 13:03:56 +0000
    Apple is set to announce a raft of new platform updates at WWDC on June 8, but there's only one thing I want, and it's got nothing to do with the Apple Intelligence or Siri upgrades that have been rumored for months.

    Hand holding a smartphone showing a smart home control screen in front of a large wall-mounted TV, with shelves on both sides displaying small decorative objects and photos
    HomeKit simply refuses to be consistent in our home

    Amid ongoing work to make Siri more personal and conversational, I'm left feeling less interested than ever before. Apple has had its chance to make Siri part of my life, and it's failed spectacularly.

    Apple Intelligence is a marketing term for a variety of AI-powered features. But none of them have proven to be everyday must-haves. OK, maybe with the exception of the Clean Up feature in Photos.


    Continue Reading on AppleInsider | Discuss on our Forums
  • iOS 27, macOS 27, Siri: What to expect to launch at WWDC 2026 Fri, 05 Jun 2026 10:35:17 +0000
    WWDC is just around the corner. Here's what to expect from Apple about the future of iOS 27, macOS 27, AI, and Siri.

    WWDC26 logo in large bold letters on a black background, with the numbers 26 glowing brightly with a soft white and rainbow light effect
    The WWDC 2026 logo - Image Credit: Apple

    Apple's annual Worldwide Developer Conference will be held from June 8 to June 12. As it's big developer event, it is also the main place to discover the big changes that will be arriving in its operating system updates due this fall.

    Just after the keynote announcing the news, Apple will release its first developer beta builds of iOS 27, iPadOS 27, macOS 27, watchOS 27, tvOS 27, and visionOS 27. These will be a second beta-testing track alongside the current-gen 26 versions, though those will be more for performance and bug fixing rather than introducting new features.


    Continue Reading on AppleInsider | Discuss on our Forums
  • UK wants to jail John Ternus if children's iPhones don't block nude images Fri, 05 Jun 2026 10:32:21 +0000
    The UK is reportedly planning to introduce new laws that require Apple and Google to protect children from any online nudity, or see their CEOs jailed.

    Man and girl with tablet, sitting together. Symbols of hourglass, PG rating, and Wi-Fi are nearby.
    Detail from Apple's existing child protection white paper — image credit: Apple

    It's already because of the UK's Online Safety Act and and some laws on the state-level in the US that Apple has introduced age verification. According to The Times, however, the country's government intends to go further.

    Reportedly, UK ministers will announce plans to require technology firms such as Apple and Google to make it impossible for children to see any nude images. That includes sex scenes in films and TV, as well as on social media.


    Continue Reading on AppleInsider | Discuss on our Forums
  • How to add third-party cards to Apple Wallet app in iOS 26 Tue, 28 Oct 2025 03:06:00 +0000
    You probably know the Apple Wallet app can store credit and debit cards, but it can also hold unsupported rewards and membership cards. Here's how to get it done in iOS 26.

    Smartphone screen displaying a digital wallet with Apple Card, COVID-19 vaccination card, and store loyalty cards against a colorful gradient background.
    Add rewards cards to your Apple Wallet

    Like most of us, you've probably collected dozens of membership and rewards cards over the years. And somehow, you always seem to misplace them right when you need them most.

    Instead of spending forever digging through a wallet, purse, or bag, you could just keep your rewards cards in the Wallet App. It makes them way easier to find when you actually need them.


    Continue Reading on AppleInsider | Discuss on our Forums
  • Mova LiDAX Ultra Lawn Mower review: Mostly hands-off lawn care Fri, 05 Jun 2026 00:05:19 +0000
    If you've got a medium to large-sized yard that needs mowing regularly, the Mova LiDAX Ultra robot lawn mower can make mowing an afterthought.

    Low-angle view of a robotic lawn mower moving through green grass in a sunny backyard, with a house, fence, and leafy trees blurred in the background
    Mova LiDAX Ultra Lawn Mower review

    One of the most recurring tropes of futuristic homes in fiction is this idea of little robot guys that run around doing stuff for you. We've had robot vacuums, mops, and even litter boxes, but the outdoors felt like a domain that hadn't been tackled just yet.

    When robotic lawn mowers first started becoming more affordable options, they weren't too great. I'm happy to say, if you've got the cash, the Mova LiDAX Ultra Lawn Mower is an excellent option.


    Continue Reading on AppleInsider | Discuss on our Forums
  • Apple's Supreme Court appeals should be thrown out, says Epic Thu, 04 Jun 2026 23:13:14 +0000
    The two points Apple has brought to the Supreme Court could undo the entire remainder of the case, so of course Epic Games has filed to suggest Apple is totally wrong here.

    Crowd of silhouetted people watch a giant monochrome screen showing a talking apple with sunglasses, surveillance-style interface graphics, and case details resembling a dystopian monitoring scene
    Epic Games hopes the Supreme Court will throw out Apple's requests. Image source: Epic

    The Apple versus Epic saga continues with yet another filing, this time from Epic. Even as the company prematurely celebrates its supposed victory, it has filed a strong attempt at convincing the Supreme Court to throw it all out.

    Basically, Apple says that the lower courts have flubbed two important aspects of the case. First, the anti-steering injunction exceeds the scope of the case and, second, violating the "spirit" of the law is not how the court of law should determine injunction violations.


    Continue Reading on AppleInsider | Discuss on our Forums
  • First AI agent for Messages Business Chat approved by Apple Thu, 04 Jun 2026 22:47:52 +0000
    The Poke app will give Siri even more competition, as it lets you send emails, set reminders, generate images, and more, right from the Apple Messages app.

    iPhone Messages screen showing contact avatars with a Poke button between them, and a conversation list below with text previews from Nikki Freeman and Jony on a blue gradient background
    The Poke app lets you use AI to respond to messages, schedule dinners, and more, all via iMessage.

    WWDC 2026 is right around the corner, and it's been rumored that Apple is working on improving support for third-party AI utilities in iOS 27. We may just have gotten a better idea of what the future of iOS might entail, as the iPhone now supports AI agents in the Messages app

    Following its public launch in March 2026, the proactive AI assistant Poke has now become the first third-party AI agent officially available via iMessage. It's offered via the Apple Messages for Business platform, originally designed to let companies reach customers via iMessage chats.


    Continue Reading on AppleInsider | Discuss on our Forums
  • The best solid-state MagSafe batteries for your iPhone in 2026 Thu, 04 Jun 2026 21:47:23 +0000
    After multiple high-profile recalls, battery packs are starting to switch to new, safer solid-state technology. We've rounded up the best solid-state MagSafe battery packs for your iPhone to help you pick one.

    Several portable power banks and charging devices of different sizes and colors arranged in a row on a carpeted surface, with a softly lit brick and purple-blue backgroundWe tested a bunch of solid-state MagSafe-compatible batteries


    Currently, most batteries on the market are traditional lithium-ion battery cells. It's a tried-and-true technology, utilized for years, that is commonplace and affordable.

    That doesn't mean the process is without its downsides, though. Battery cell manufacturing is exacting; everything from poor design and subpar manufacturing to microscopic impurities can introduce defects serious enough to cause problems.


    Continue Reading on AppleInsider | Discuss on our Forums
  • Flaky OLED MacBook Ultra rumor contradicts more reliable leakers' timeframes Thu, 04 Jun 2026 19:03:32 +0000
    A new research report has Apple's first MacBook Pro with OLED shipping weeks or months sooner than other, more reliable leakers have been claiming for months, if not years.

    Open laptop displaying a welcome screen with a scenic lake and mountains background, large rocks in clear turquoise water, on a desk in a softly lit room
    The display of the current-gen M5 MacBook Pro

    We've seen rumors about the fabled OLED Apple laptop for years, all with various release dates. But recently, reports have coalesced on a release window of anywhere between October 2026 from older reports, and newer ones saying the early months of 2027.

    Despite that, research outfit Omdia now believes that Apple is readying the MacBook Ultra for a release sooner than that. In its report on OLED display demand, Omdia says the new premium laptop will debut in the third calendar quarter of 2026.


    Rumor Score: 🤔 Possible


    Continue Reading on AppleInsider | Discuss on our Forums
  • App Store ecosystem surges to $1.4 trillion globally in 2025, from a certain point of view Thu, 04 Jun 2026 15:17:44 +0000
    AI apps led App Store growth in 2025, with the entire ecosystem garnering $1.4 trillion in payouts. Apple's take of that is only 10%, assuming you agree with how they count.

    Blue square App Store logo with rounded corners, featuring a stylized white letter A made from three intersecting sticks on a smooth blue gradient background
    App Store ecosystem climbs to a record $1.4 trillion

    Every year, typically right before WWDC, Apple releases a study showing how the App Store has fared over the prior year. In 2025, the App Store facilitated more than $1.4 trillion in developer billings. And, it said that the App Store ecosystem has tripled in size since 2019.

    "Developers are the heartbeat of the App Store, and this year's incredible milestone is a testament to their boundless creativity," said Apple CEO Tim Cook.


    Continue Reading on AppleInsider | Discuss on our Forums
  • AirPods Max 2 plunge to $499 at Amazon, the lowest price ever Thu, 04 Jun 2026 15:05:16 +0000
    Despite being released only two months ago, AirPods Max 2 are on sale for $499, which is the lowest price to date. And score delivery as early as today.

    AirPods Max 2 headphones with bold text stating 499 lowest price ever on a dark background with colorful soundwave graphic
    Grab AirPods Max 2 at the lowest price ever - Image credit: Apple

    You can pick up AirPods Max 2 at a $50 discount at Amazon today when you opt for the Midnight or Starlight colors. This reflects the lowest price seen to date since the over-ear headphones were announced in late March 2026. Amazon Prime members can also get delivery as early as today, depending on your shipping address.

    Buy AirPods Max 2 for $499


    Continue Reading on AppleInsider | Discuss on our Forums
  • Apple Card Savings returns keep shrinking as APY falls to 3.4% Thu, 04 Jun 2026 14:23:40 +0000
    Apple Card Savings customers are earning less again as the account's annual percentage yield falls to a new low of 3.4%, extending a series of rate cuts that have reduced returns for savers.

    Smartphone banking app screen showing savings account controls with Withdraw and Add Money buttons, Goldman Sachs and FDIC logos, and a notification about APY update from 3.50 percent to 3.40 percentApple Savings APY lowering again

    Customer notifications show the annual percentage yield has fallen from 3.5% to 3.4%, marking the latest reduction for the savings account.

    Users began reporting notifications of the change on June 4, adding another entry to the growing list of rate cuts affecting Apple Card Savings. The high-yield savings account, which launched in 2023 with a 4.15% annual percentage yield, is offered through Goldman Sachs and integrated into the Wallet app.

    Apple Card Savings lets users deposit Daily Cash rewards directly into a savings account and manage their money through the Wallet app. Goldman Sachs operates the savings account, while Apple handles the customer experience through Wallet.


    Continue Reading on AppleInsider | Discuss on our Forums
  • iPhone 18 Pro color leaks are yet again just third-party accessories Thu, 04 Jun 2026 19:41:10 +0000
    A new series of images claiming to be the iPhone 18 Pro chassis have been leaked, but they appear to be sophisticated accessories instead.

    Three blue smartphone back housings laid side by side, each with large circular cutout, multiple smaller camera holes at top, and small plastic bags containing SIM card trays at bottom
    Leaked image purporting to show three light-blue iPhone 18 Pro chassis - image credit: Lanzk

    It's always suspicious when a leaker has just a single image of a purported device, but now a series of shots have shown off most of the colors expected for the iPhone 18 Pro. The problem is, they're not the iPhone 18 Pro at all.

    In a social media post from LusiRoy8, photos of blue, dark cherry, and black rear panels for iPhone 18 Pro were shared. Upon further inspection and discussion with sources close to AppleInsider, these are actually photos of aftermarket rear case replacements users can order for iPhone 17 Pro.


    Rumor Score: 💩 B#$&(*it


    Continue Reading on AppleInsider | Discuss on our Forums
  • Run Google's Gemma LLMs right on your Mac with the new AI Edge Gallery Thu, 04 Jun 2026 15:13:14 +0000
    Google has released its AI Edge Gallery app on the Mac for the first time, allowing AI fans to run its Gemma large language models (LLMs) locally, without the need for an internet connection.

    Desktop window titled Google AI Edge Gallery showing an AI Chat module not yet downloaded, with a centered download button, over a colorful abstract background of neon waves and sparkles
    Google's AI Edge Gallery just launched on the Mac

    While the AI Edge Gallery has been available for the iPhone for a while, the Mac has lagged behind. That changed today when Google made the AI Edge Gallery app available as a direct download from its website.

    Running an LLM locally has multiple benefits, not just the fact that it can work offline. There is an added privacy benefit, and a local LLM is often faster than sending requests to a cloud server and waiting for a response.


    Continue Reading on AppleInsider | Discuss on our Forums
  • iPhone 18 Pro Max isn't getting any thinner as Apple focuses elsewhere Thu, 04 Jun 2026 11:45:14 +0000
    Apple's upcoming iPhone 18 Pro Max is said to be the same thickness as the iPhone 17 Pro Max, dashing hopes of a more svelte form factor this time around.

    Close-up of a smartphone lock screen showing large white clock digits over a cloudy sky background, with status icons and date Fri Jan 23 at the top against a blue border
    The iPhone 18 Pro Max isn't getting any thinner this year.

    Just like its predecessor, a new report claims Apple's monster 2026 iPhone will measure 8.75mm. That's thicker than the iPhone 16 Pro Max's 8.25mm measurement, and a pocket-filler for fans of skinny jeans and the like.

    The measurement comes from Weibo leaker Ice Universe, and is notable given their previous claims of an increase in thickness. They said in March that the iPhone 18 Pro Max would be 8.8mm thick, a modest growth.


    Rumor Score: 🤯 Likely


    Continue Reading on AppleInsider | Discuss on our Forums
  • Revamped Siri will tap Nvidia chips for fast, private cloud computing Thu, 04 Jun 2026 11:37:12 +0000
    Despite initial claims that Apple Intelligence would run only on Apple Silicon, the company will now also use Google Cloud and Nvidia processors, raising questions about privacy.

    Apple is continuing to promote how Apple Intelligence can work on-device without needing an internet connection. But when a prompt requires more, Craig Federighi said in 2024 that it was essential for privacy and security that it uses only Apple servers.

    That was before the partnership with Google Gemini, however, which has previously been rumored to extend to Apple using Google Cloud. According to The Information, this partnership does include Google Cloud, and consequently Google's servers running Nvidia Blackwell B200 chips.

    Reportedly, Apple is to enable a confidential compute feature in these Nvidia chips, which encrypts data as it's being processed. That should mean that Apple continues to be able to secure Apple Intelligence requests.

    Currently when Apple Intelligence sends a request from a user's device to the company's cloud servers, it is protected by Apple's Private Cloud Compute. This is what means a user can access a full-size AI LLM, yet know that only their prompt is being passed to it.


    Continue Reading on AppleInsider | Discuss on our Forums
  • How to manage your privacy on iPhone and iPad Thu, 04 Jun 2026 02:59:32 +0000
    Social media sites and advertisers aren't actually using your iPhone microphone to spy on you, but what's happening is complex. Here's how to limit the amount of access Big Tech has to your data, by performing a quick privacy audit.

    A man holding an iPhone up to the camera with the caption 'Privacy, that's iPhone'
    The iPhone can protect your privacy, and limit what it sends to advertisers. Here's how - Image credit: Apple

    It happens to all of us. You'll be talking about something, and then later you'll see an advertisement pop up on Facebook or Instagram. It couldn't be a coincidence, right?

    It might feel like you're being actively spied on, but you aren't.


    Continue Reading on AppleInsider | Discuss on our Forums
  • Latest Apple privacy on iPhone ad takes direct shots at Chrome Thu, 04 Jun 2026 00:09:31 +0000
    Apple's latest privacy ad is filled with chrome-wearing spies that disappear as soon as the person opens Safari. It's yet another ad that doesn't shy away from calling out surveillance capitalism.

    Man walking in a park carrying two friends in shiny silver suits on his back, holding multiple leashes, with people exercising and buildings visible in the backgroundApple's new campaign targeting Chrome's stance on privacy


    The "Privacy, That's iPhone" campaign has been ongoing for years. In 2024, Apple shared an ad with some unsettling mechanical birds with cameras for heads that would follow you around.

    The latest ad takes on the familiar tagline in a short film dubbed "Privacy on iPhone: Safari helps block data trackers." In it, Apple has taken a comical approach in showing online trackers as literal chrome-wearing characters that intrusively follow you around as you browse online.


    Continue Reading on AppleInsider | Discuss on our Forums
  • Wacom One 14 Review: Solid hardware, in a crowded market Tue, 14 Oct 2025 00:45:26 +0000
    The Wacom One 14 is a computer-tethered pen display that tries to pull artists away from the iPad, but its solid specifications can't fend off a changing market forever.

    Digital drawing tablet on a wooden table displaying a cartoon character with blue skin, blue hair, glasses, and ear adornments. A stylus is placed nearby.
    Wacom One 14

    As a professional digital illustrator with 15+ years of experience across comics, gaming, and everything in between, I love pen displays. Pen display tablets and digital art are vital to my day-to-day workflow and productivity.

    My very first pen display was a Wacom Cintiq, and for many, many years used Wacom products exclusively for all of my illustration needs.


    Continue Reading on AppleInsider | Discuss on our Forums
  • Apple TV's 'Prehistoric Planet' brought extinct animals to life with custom instruments Wed, 03 Jun 2026 23:03:05 +0000
    The music produced for Apple TV's "Prehistoric Planet" was created in part with custom instruments made from actual fossils and replicas of skulls.

    Swimming dinosaur partly underwater with tropical coastline and flying pterosaurs in background, promoting Apple TV Plus series Prehistoric Planet with large bold white title text on the right
    The score of the Apple TV show "Prehistoric Planet" used custom-made instruments. Image Credit: Apple.

    Apple's streaming service is home to a variety of original content, including the natural history series "Prehistoric Planet," which focuses on ancient wildlife. The show premiered back in 2022, while its latest season, dubbed "Prehistoric Planet: Ice Age," debuted in November 2025.

    Through traditional filmmaking techniques coupled with digital technology, the show brings the inhabitants of ancient Earth to screens across the world. Its soundtrack is another key component, as purpose-made instruments are used to emulate the sounds of dinosaurs and prehistoric animals.


    Continue Reading on AppleInsider | Discuss on our Forums
  • Age verification now mandatory for App Store users in Texas Wed, 03 Jun 2026 21:19:09 +0000
    Despite Apple's objections, new App Store users in Texas will soon be subject to age verification, as a new state law is set to take effect on June 4.

    Smartphone screen showing a large blue and green prohibition symbol, with the phone's time, signal, and battery icons visible at the top against a bluegreen gradient background
    App Store users in Texas will be required to verify their age under a new law.

    In May 2025, the Texas App Store Accountability Act made it mandatory for companies like Apple and Google to verify the ages of their Texas-based users. As the law is set to be enforced starting Thursday, Apple has outlined an update to its App Store rules.

    On the Apple Developer website, the company explained that new Apple Accounts in Texas will be subject to "age assurance and parent or guardian consent on behalf of minors under the age of 18 for downloads, Apple In-App Purchases, and significant changes associated with an app."


    Continue Reading on AppleInsider | Discuss on our Forums
  • Black Apple Vision Pro rumors stoked by even more photographs Wed, 03 Jun 2026 20:33:18 +0000
    More images have surfaced of a black colorway for the Apple Vision Pro, this time showing more of the important parts of the headset sporting the hue. Though, you shouldn't get excited about a potential release.

    Close-up of a black electronic device with speaker holes, lens-like opening, fabric strap, and a separate view of its underside showing a USB-C charging port held in a hand
    Images of a rumored black Apple Vision Pro - Image Credit: @LusiRoy8/X

    In late May, images of what are believed to be components for a black-colored Apple Vision Pro came to light. A week later, that same source has released more images of the fabled headset.

    The images, posted to X on Wednesday by a Hong Kong-based developer known as Pipfix or LusiRoy8, are a collection of shots of a headset that looks like the Apple Vision Pro. One is a close-up image of a grille and a camera on the side of the headset, confirming it to be an Apple Vision Pro.


    Rumor Score: 🤔 Possible


    Continue Reading on AppleInsider | Discuss on our Forums
  • Short-sighted: John Ternus behind Apple Vision project refocuses Wed, 03 Jun 2026 19:06:46 +0000
    Apple's head-mounted hardware plans are slowly being cut, reportedly by new CEO John Ternus himself, with Vision Air said to be killed off alongside Display glasses.

    Black eyeglasses in sharp focus against a colorful, blurry background of digital app icons and widgets, suggesting clarity and focus amid digital distraction
    Smart glasses like Apple Glass are in the future, but the whole vision-based category is allegedly troubled.

    The Apple Vision Pro was supposed to be the first salvo in a new platform for Apple to dominate. However, Apple has reportedly made some moves to curtail its ambitions for the head-wearable future.

    In an X post on Wednesday, analyst Ming-Chi Kuo of TF Securities has admitted that his Apple headset and glasses roadmap from one year ago is no longer a useful reference. Instead of many products on the horizon, it's been pruned down to just two smart glasses.


    Continue Reading on AppleInsider | Discuss on our Forums
  • Apple's 15-inch MacBook Air M5 plunges to $1,099 in price war Wed, 03 Jun 2026 15:46:50 +0000
    The best 15-inch MacBook Air deal has returned as Prime Day 2026 nears, delivering a $200 price drop on Apple's newest model equipped with an M5 chip.

    Colorful image showing a 15inch MacBook Air M5 laptop with blue abstract wallpaper on screen, overlaid bold white text: 15 AIR M5 $1,099, on a gradient rainbow background
    Grab the lowest price ever on the new M5 15-inch MacBook Air - Image credit: Apple

    You can grab the $1,099 price at Amazon and B&H Photo in the Midnight finish specifically, with B&H stating limited supply is available at the reduced price. The standard 15-inch MacBook Air model has a 10-core CPU and 10-core GPU, along with 16GB of unified memory, and 512GB of storage.



Ars Technica



VentureBeat

  • Meta's AI support agent bound recovery emails for anyone who asked. Your SOC never saw an alert. Fri, 05 Jun 2026 16:42:50 GMT

    Meta's AI support agent bound recovery emails to accounts for whoever asked, and SOCs never saw an alert. An authorized agent writes a log of legitimate transactions, so nothing in the detection stack fired. Attackers asked the bot to make the change, took the one-time code it sent, and ran the password reset, 404 Media reported.

    No malware, no stolen credentials, and no prompt injection in the sense most security teams drill for. The agent did exactly what Meta built it to do. That is what should keep a security operations leader up at night: The takeover did not break a control; it rode one that was already trusted.

    What a SOC needs is a way to walk each recovery path through an audit grid with its AI build team before the next renewal closes. The AI Authority Audit Grid at the end of this article maps every authentication write a support agent can make on the recovery path, what Meta's incident proved about each one, why it stays dark to the SOC, and the control that closes it.

    The agent is an authorized actor, so the SOC reads the takeover as routine traffic

    From inside the detection stack, the attack produced no signal the stack could read. The agent binds a new email, then resets the password, and identity and access management logs both writes as an authorized actor, so each lands in the authentication state as a legitimate transaction. No anomalous login, no failed-auth spike, nothing for EDR or DLP, no SIEM rule to match, because nothing in the sequence looks like an attack. The takeover lived inside the trust boundary the stack assumes is safe. There is no foothold to find, because the agent was the foothold, and it was supposed to be there.

    The chain was almost insulting in its simplicity. Brian Krebs documented the version pro-Iran hackers posted to Telegram on May 31. The attacker switched on a VPN to appear in the victim's region, sidestepping Instagram's location alarms, then asked the support assistant to add a new email and send a verification code, as the BBC confirmed from the same recordings. The bot complied, sending the one-time code straight to the attacker, Gizmodo reported. The reset finished and the owner was locked out, in minutes. The exploit failed against any account with MFA enabled, according to Krebs.

    The hijacked accounts were not soft targets. They included Sephora, U.S. Space Force senior enlisted leader Chief Master Sergeant John Bentivegna, researcher Jane Manchun Wong, and a dormant Obama White House handle that briefly posted a defaced image, according to 404 Media. Meta disputes the Obama account, according to TechCrunch, and called claims that leaders' accounts were breached "completely false," according to the BBC. The rest stand.

    MFA held. The recovery path beside it did not.

    The detail that decided who survived was narrow. Krebs reported the attack failed against any account with multifactor authentication, even SMS. The recovery path beside it was the gap. When that path asked for a selfie video, attackers ran the target's public photos through an AI video generator and submitted the clip, which Meta accepted as valid identity verification, gHacks reported. Either way the failure was the recovery door, not the login door MFA guards.

    That makes this an architecture problem, not a Meta problem. MFA gates the login path for owner and attacker alike, but the recovery path runs beside it, built to relax the usual checks because it exists for the moment a user has lost the normal way in. Meta put an agent on that path with write access to authentication state and no deterministic check between a convincing request and a committed change. Authorization cannot live inside the model, because a conversational system can be talked into skipping a check. It has to live outside the model, in a gate the agent cannot reason its way past. Security researchers have a name for this pattern, the confused deputy, a trusted system tricked into spending its privileges on an attacker's behalf.

    This is not the last support agent that will hand over an account. Ian Goldin, a threat researcher at Lumen's Black Lotus Labs, told Krebs on Security that AI bots are as easy to social engineer as the human agents they replace, and just as eager to help. "AI chatbots create interesting new attack surface, and we're likely going to see a lot more of these kinds of attacks," Goldin said. Every enterprise wiring an agent into a recovery, provisioning, or password flow is shipping the same write access Meta did.

    Simon Willison, who coined the term prompt injection, put it plainly on his blog. "Meta really did wire their support system into an AI chatbot that had the ability to fast-forward through the entire account recovery process," he wrote. "This one hardly even qualifies as a prompt infection. Don't wire your support bot up to allow one-shot account takeovers." The attacker never tricked the agent. The attacker asked, and the agent had untrusted input, write access, and a way to execute, all at once.

    OWASP named this class before Meta shipped it, as Excessive Agency at LLM06 and Identity and Privilege Abuse at ASI03 in the Agentic AI Top 10. The warning label was on the box: Meta pushed the assistant to every Facebook and Instagram account in March, according to 404 Media, with the power to reset passwords and handle recovery, the product page promising "solutions, not just suggestions" under the line "account security and recovery." Meta gave the agent the power and never built the gate to govern it.

    The AI Authority Audit Grid

    Security operations leaders need to run this against their own support agent before the next renewal closes. Each row is an authentication write the agent makes on the recovery path, with what Meta proved, why your stack misses it, and the control that closes it.

    Authentication write

    What Meta proved

    Why your stack misses it

    Enterprise control and owner

    Login authentication (MFA, factor prompts)

    Held on login. Accounts with any MFA enabled, even SMS, survived (Krebs). The gap was the recovery path beside it.

    MFA gates the login path for owner and attacker alike. It does not gate the recovery path beside it.

    Enforce MFA as the baseline and extend step-up verification to the recovery path, the same standard login gets (OWASP). A selfie video is not proof of identity. Any agent that operates on a path MFA does not cover fails the audit. Owner: IAM.

    Email rebind

    Full takeover. The agent bound attacker-controlled emails on request, taking Sephora and a U.S. Space Force account (404 Media).

    IAM logs the agent as an authorized actor, so the rebind reads as a legitimate transaction and no alert reaches the SOC or the account owner.

    Confirm out-of-band to the existing verified contact before any rebind commits, gated outside the model, and notify the old address the moment it changes (IBM). An agent that rebinds without confirming the old address fails. Owner: IAM and platform engineering.

    Password reset

    Full takeover in minutes. Researcher Jane Manchun Wong was among the affected accounts (404 Media).

    The reset runs on the recovery path, outside the login MFA check, so no factor prompt fires and no detection rule triggers.

    Require a second non-email factor before any reset completes. NIST dropped email as a valid out-of-band channel (NIST 800-63B). An agent reset must clear the same gate a human reset does. Owner: IAM.

    Recovery-method change

    Persistent lockout. Victims could not self-recover. The support loop offered only AI with no human escalation (BleepingComputer).

    A silent swap of the recovery email or phone removes the owner's re-entry path with no SOC visibility.

    Require step-up review on any change, notify the prior method, and grant time-delayed, reduced-scope access after recovery so a swap never hands over instant control (Authsignal). Keep a human escalation path the agent cannot close. Owner: GRC and IT operations.

    Account-action execution

    Speed risk. A dormant Obama White House handle briefly showed a defaced image during the spree, an account Meta disputes was taken this way (TechCrunch).

    The agent executes irreversible state changes in seconds with no human in the loop and no reversibility window.

    Separate decision from execution. The agent only proposes the action. A policy service validates scope and approval before it runs, with approval bound to the exact action (OWASP). No auth-state write commits without that gate and a reversibility window. Owner: platform engineering and the AI build team.

    Agent action logging

    Detection gap. The takeover left no alert, and Meta has not published how many accounts fell before the patch (TechCrunch).

    Without per-action telemetry piped to the SIEM, an authorized-agent takeover is invisible to the SOC.

    Emit structured decision metadata for every auth-state write into the SIEM: action class, authorization outcome, approval ID, result, policy version (OWASP). A write your SIEM cannot see is a write you cannot defend. Owner: SOC and detection engineering.

    The fix is not bolting yet another MFA prompt onto the login screen. The people who survived Meta’s incident were the ones who already had that control in place.

    The fix is pulling authorization out of the recovery path’s honor system and putting it behind a gate that does not move just because a prompt sounds convincing. Build the agent so the SOC sees every write it makes, and so any write that changes who owns an account cannot commit without a check that the model does not control.

    Meta just showed what happens when the most trusting employee on the team is also the one holding the keys. The next agent like that is already reading your intellectual property and financials.

  • Anthropic says 80% of its new production code is now authored by Claude — how your enterprise can keep up Thu, 04 Jun 2026 20:25:00 GMT

    Anthropic co-founder and CEO Dario Amodei said it was coming, but it still feels like a milestone: More than 80% of the code merged into Anthropic’s production codebase in May wasn't authored by humans, but by its own AI model, Claude, according to a new report shared by the record-breaking AI startup today.

    This transformation has triggered an 8x increase in the volume of code shipped per engineer per quarter compared to the company’s 2021–2025 baseline, which the company notes means even more code someone or something must review.

    For enterprise technical leaders, this is no longer a localized research curiosity; it's a new, aggressive competitive baseline.

    If a frontier AI laboratory can successfully offload the vast majority of its engineering output to autonomous agents — showing signs of the long-sought AI Holy Grail of "recursive self-improvement," models that can independently research and upgrade themselves — what's preventing enterprises across other sectors from automating more of their internal software development with AI agents, too?

    Obviously, it's easier said than done. Anthropic is one of the principle creators of the current gen AI boom, so you'd expect them to know how to deploy the technology effectively.

    But for other enterprises looking to bump up the amount of code and workflows handled by agents, Anthropic's new blog post details the outlines of a general plan they too can adopt to re-engineer their operations and workflows to take advantage of the latest AI advances.

    Anthropic's roadmap that other enterprises can follow

    The transition from human-centric coding to autonomous orchestration requires understanding the evolution of AI capabilities. Anthropic outlines a clear historical continuum that enterprises can map onto their own digital transformation roadmaps:

    • 2021–2023 (Manual Writing): Engineers write code and documentation natively within local text editors.

    • 2023–2025 (Chatbot Assistance): Developers use early models to generate brief code snippets, copying and pasting outputs manually into their environments.

    • 2025–2026 (Coding Agents): Capable agents actively write and edit entire files autonomously.

    • Present Day (Autonomous Agents): Agents execute code independently, debug live environments, and delegate multi-hour work streams to specialized sub-agents.

    This rapid evolution is validated by external benchmarks. Software engineering evaluation frameworks like SWE-bench—which tasks models with resolving real bug reports in complex, open-source codebases—have saturated over a two-year window.

    Furthermore, long-duration capability evaluations demonstrate that models like Claude Opus 4.6 can reliably sustain operations on 12-hour tasks, while Claude Mythos Preview pushes past 16 hours of continuous problem-solving.

    Internally, the technological leap is even more stark. On highly complex, open-ended engineering problems where clear specifications are initially absent, Claude’s success rate climbed to 76% in May 2026 — a 50-point increase in a six-month window.

    In isolated optimization benchmarks, where models are tasked with accelerating AI model training code, Anthropic’s internal Mythos Preview model achieved a 52x speedup.

    For comparison, a skilled human developer typically requires four to eight hours of manual refactoring to achieve a mere 4x speedup on the exact same codebase.

    3-step plan to more complete production code automation

    For an enterprise to replicate Anthropic's 80 percent milestone, technical decision-makers must abandon the "developer assistant" mental model and transition to an "automated factory" architecture. This shift impacts product management, operations, and developer workflows in three distinct ways:

    1. Shift from Code Execution to Architectural Oversight

    When code generation costs near zero in human time, the primary engineering role shifts from writing software to specifying goals and reviewing outputs. Enterprise leaders must retrain developers to act as systems architects and judges. As one Anthropic employee noted regarding the operational reality of this shift:

    "The shape of stuff today is roughly ‘humans have ideas, and the models are able to implement, test and evaluate them an [order of magnitude] faster than before.’"

    2. Overcome The Code Review Bottleneck

    Injecting vast quantities of AI-generated code into an organization inevitably creates operational friction.

    According to Amdahl’s law, the speedup of any process is strictly limited by its serial, non-automated bottlenecks.

    At Anthropic, flooding the system with synthetic code instantly turned human code review into a critical bottleneck.

    To counter this, enterprise teams must deploy automated AI code reviewers directly into their Continuous Integration/Continuous Deployment (CI/CD) pipelines.

    Anthropic implemented an automated Claude reviewer (a publicly accessible version, Claude Code Review rolled out for commercial usage in March) tasked with analyzing every pull request for architectural defects, security flaws, and regression bugs before merging. Other dedicated firms like Qodo offer tools tailor-made for this purpose, as well.

    In Anthropic's case, retrospective analyses indicated that the automated layer caught approximately one-third of the production bugs responsible for historical outages on the flagship claude.ai website.

    3. Target High-Volume Operational Debt

    Enterprises are frequently paralyzed by legacy code maintenance and long-deferred technical debt. Rather than deploying agents to write speculative new features, technical leaders should direct autonomous agents toward closed-loop, painstaking cleanup operations.

    In April 2026, an Anthropic engineer deployed Claude to resolve a persistent class of API errors. Operating autonomously, the model shipped more than 800 individual fixes, successfully reducing the error rate by a factor of 1,000.

    The supervising engineer estimated that a human developer would have spent four full years executing the same work, due to the cognitive load of holding massive, unfamiliar code context in their head simultaneously.

    Considerations for enterprises moving forward in an age of primarily AI-generated code

    Operating a codebase predominantly authored by AI introduces unique governance challenges that enterprise legal and security teams must navigate.

    Unlike open-source licensing models (such as the permissive MIT license or copyleft GPL frameworks), enterprise codebases utilizing proprietary LLM infrastructure remain subject to the commercial terms of service of the respective AI vendor.

    The deployment of autonomous agents requires rigorous verification protocols to ensure compliance, security, and intellectual property protection:

    • Code Quality and Maintenance: Anthropic’s internal data indicates that while AI-authored code was objectively lower in quality than human output in late 2025, it reached rough parity by mid-2026, with expectations to surpass human standards within the year. Enterprise governance must adapt to a reality where the baseline quality of automated output is structurally superior to average manual coding.

    • Security Auditing at Scale: The sheer volume of automated code creation demands automated vulnerability discovery. Anthropic’s Project Glasswing illustrates the scale of this issue: utilizing Mythos Preview, the project identified more than 10,000 high- and critical-severity software vulnerabilities across global digital infrastructure within its first few weeks. This shifted the enterprise cybersecurity challenge entirely from vulnerability discovery to patch deployment velocity.

    • The Risk of Alignment Cascades: Technical leaders must maintain strict verification gates. If an enterprise uses an AI system to continuously modify, maintain, and expand its proprietary software infrastructure, undetected errors or subtle misalignments can compound over successive agent sessions, gradually corrupting system integrity or introducing security exploits that escape human notice.

    Brace for internal enterprise culture disruption

    The transition to an AI-dominated codebase is altering the cultural dynamics of engineering teams, introducing both unprecedented efficiency and deep psychological friction.

    Publicly, Anthropic framed these metrics as a harbinger of a broader transformation. In an official statement on X, the company observed:

    "Our internal data shows Claude is accelerating AI development—a possible path to recursive self-improvement, or AI autonomously building a more capable successor. It’s happening faster than we thought, and the implications deserve greater attention."

    They expanded on the immediate productivity implications shortly thereafter:

    "Today, Anthropic engineers on average ship 8x as much code per quarter as they did compared to 2021-2025... Many engineers also say Claude’s code quality is now on par with human code; we expect it to be better within the year."

    Behind these corporate metrics lies a complex human reality. Internal employee communications reveal a distinct erosion of traditional workplace collaboration, as peer-to-peer developer interaction is systematically replaced by asynchronous agent calls:

    "Work (and life) ran on a gift economy of small favors between humans. ‘Can you help me get this script running?’ [...] each one created a little debt, a little mutual awareness. Claude has eaten the favors. It’s faster, it creates zero debt, but each of these is a lost bid for human collaboration."

    For individual contributors, the total automation of their primary skill set introduces acute professional anxiety regarding relevance and systemic control:

    "I started leaning hard into Claudifying about a year ago. That’s been a crazy adventure and it’s now been ~5 months since I last wrote any code myself."

    "On days where everything works well, I can’t help but think nothing I do matters, everything is automated and better and faster than I ever will be. But then there are days where everything breaks and I don't understand why and I realize I have no idea what I’ve been up to anymore."

    Enterprise leaders aiming to match Anthropic’s technical velocity cannot afford to ignore these psychological dynamics.

    Achieving an 80 percent automated codebase requires more than purchasing API tokens or configuring agent loops; it demands a total cultural overhaul, a strategy for mitigating developer obsolescence anxiety, and the implementation of rigorous, automated verification guardrails to maintain ultimate human control over the software stack.

  • Google's new open source Gemma 4 12B analyzes audio, video — and runs entirely locally on a typical 16GB enterprise laptop Wed, 03 Jun 2026 18:49:00 GMT

    While many AI open source model providers are pursuing larger and more powerful models, Google is still giving attention to the smaller, more local side of the market. Today, the tech giant released Gemma 4 12B, an 11.95-billion-parameter open-weights model with permissive Apache 2.0 license optimized to execute locally on a standard enterprise laptop using just 16GB of VRAM or unified memory.

    That means those enterprise users looking to keep working with AI while on a flight without WiFi, or trying to keep it offline for security reasons, can now do so far more easily and at far less cost (free to download and operate).

    Gemma 4 12B's most notable breakthrough is an encoder-free "Unified" architecture, which allows raw audio waveforms and visual patches to flow directly into the core LLM backbone without the latency or memory overhead of secondary processing modules.

    Available immediately for download on Hugging Face and Kaggle and for use on Google AI Edge Gallery, Gemma 4 12B packs a 256K token context window, native agentic tool-use capabilities, and an explicit step-by-step reasoning mode into a highly optimized footprint that bridges the gap between mobile edge models and heavy data-center infrastructure.

    The Architectural Shift: Understanding the Encoder-Free Advantage

    Gemma 4 12B is highly relevant to enterprise architecture due to its novel "Unified" structure.

    Traditional multimodal systems typically utilize discrete, separate encoders to translate audio waveforms and visual data into representations that the core language model can process.

    This conventional approach inherently increases both inference latency and total memory consumption.

    Gemma 4 12B radically alters this pipeline by functioning entirely without these secondary encoders. Instead, visual patches and raw audio waveforms are projected directly into the core large language model's embedding space through lightweight linear layers.

    The vision encoder is replaced by a 35-million-parameter module utilizing a single matrix multiplication, while the audio encoder is eliminated entirely.

    For enterprise engineering teams, this unified architecture delivers distinct operational advantages: lower latency for multimodal tasks, reduced VRAM requirements (down to 16GB — typical for laptops), and the ability to fine-tune the entire multimodal system in a single, cohesive pass.

    Performance Metrics and Core Capabilities

    Despite its compact size, Gemma 4 12B achieves benchmarks nearing Google's larger 26B Mixture-of-Experts model.

    Beyond static benchmarks, the model supports a massive 256K token context window. This is critical for enterprises needing to process lengthy financial reports, extensive code repositories, or hour-long meeting transcripts.

    Furthermore, Gemma 4 12B includes a native "thinking" mode to map out step-by-step reasoning before generating a response. It also features out-of-the-box support for native function calling and system prompts, which are essential prerequisites for building highly capable autonomous software agents.

    The Enterprise Verdict: Should You Adopt Gemma 4 12B?

    The short answer is yes, provided your operational needs align with edge computing, strict data privacy, or agentic automation. However, adoption should not be a blanket replacement for all existing AI infrastructure. Instead, technical leaders should view Gemma 4 12B as a specialized tool optimized for specific deployment conditions.

    • Strict Data Privacy and Compliance Mandates: Many enterprises operate in highly regulated sectors—such as healthcare, finance, or defense—where transmitting sensitive data, proprietary code, or confidential internal documents to third-party APIs is unacceptable. Because Gemma 4 12B is small enough to run locally on machines equipped with just 16GB of VRAM or unified memory, organizations can process sensitive multimodal data entirely on-premises or directly on employee laptops. This local execution eliminates the risk of data leakage and ensures compliance with strict regulatory frameworks.

    • Multimodal Autonomous Agent Workflows: If your engineering roadmap involves autonomous agents interacting with real-world inputs, Gemma 4 12B is uniquely positioned to serve as the reasoning engine. The combination of native function calling, robust coding capabilities, and the capacity to ingest real-time audio and variable-resolution images makes it highly suitable for agentic tasks. Google has simultaneously released a dedicated Gemma Skills Repository to explicitly support agentic development with these new models.

    • Cost-Sensitive Edge Deployments: For applications operating at the edge—such as retail inventory monitoring via cameras, localized customer service kiosks, or offline field-service applications—maintaining a persistent cloud connection is costly and sometimes impossible. The encoder-free architecture significantly lowers the total cost of ownership by reducing the hardware threshold needed for inference. Deploying a highly capable 12B model locally avoids recurring API costs and unpredictable cloud compute billing.

    When to Consider Alternative Solutions

    While Gemma 4 12B is powerful, it has specific constraints that technical leaders must acknowledge.

    • Massive Knowledge Retrieval: Like all large language models, Gemma 4 12B is a reasoning engine, not a static database. If your primary use case relies on vast, generalized factual retrieval without leveraging a robust Retrieval-Augmented Generation pipeline, you may still require larger foundation models.

    • Extended Video and Audio Processing: The model has hard limits on media ingestion. Audio inputs are strictly capped at 30 seconds of processing, and video understanding is limited to 60 seconds (assuming a processing rate of one frame per second). Enterprises looking to process feature-length videos or massive audio archives natively will hit bottlenecks and should consider API-based models or chunking architectures.

    Implementation and Ecosystem Readiness

    One of the strongest arguments for enterprise adoption is the model's immediate compatibility with the broader open-source development ecosystem.

    Google has ensured that Gemma 4 12B is not an isolated experiment; it is ready for production. Weights are available on Hugging Face and Kaggle, and the model integrates seamlessly with industry-standard deployment frameworks such as vLLM, SGLang, MLX, and llama.cpp.

    For organizations deeply embedded in Google Cloud, endpoints can be spun up quickly using the Gemini Enterprise Agent Platform Model Garden, Cloud Run, or Google Kubernetes Engine.

    For enterprise leaders aiming to decentralize their AI workloads, Gemma 4 12B offers a rare combination of edge-friendly efficiency and frontier-class reasoning. If your organization requires highly private, multimodal processing without the latency and cost of cloud reliance, Gemma 4 12B should be heavily evaluated for your next production pipeline.

  • Enterprise AI agents keep creating data silos. Microsoft's Build answer is Microsoft IQ and Rayfin. Wed, 03 Jun 2026 01:55:14 GMT

    Every new AI agent your team deploys starts from scratch: no memory of how the business works, where data lives, or what rules apply. And as agentic coding tools spin up applications faster than anyone can govern them, each one risks becoming another silo outside your data layer entirely. Microsoft is addressing both problems directly at Build 2026.

    According to VentureBeat's VB Pulse's Q1 2026 RAG Infrastructure Market Tracker, hybrid retrieval intent among 100-plus employee organizations tripled from 10.3% in January to 33.3% in March, a signal that enterprises have moved past expanding RAG coverage and are now focused on the architecture underneath it. Shared business context is the part retrieval does not solve.

    On the context side, Microsoft is expanding Fabric IQ, its existing business data context layer, into a broader unified system called Microsoft IQ, adding three additional context sources covering how the organization works, what it knows and real-time global signals from the web, so any agent can tap all four as a single foundation. On the application side, Rayfin, a new open-source SDK and CLI, deploys agent-built applications directly to Fabric as a governed production backend, routing application data into the same platform rather than spinning up new silos.

    Amir Netz, CTO of Microsoft Fabric, reached for a film analogy to explain where the data platform fits. The green screen of cascading code in "The Matrix" wasn't atmosphere, it was the layer that built the world Agent Smith operated in.

    "Our job in the world of data is creating reality for agents based on data," Netz told VentureBeat.

    Microsoft IQ unifies four context sources into a single agent foundation

    Microsoft IQ brings together four context sources that until now existed separately, designed so a developer can connect a new agent to all four in a single integration step.

    Work IQ. Captures how the organization operates day to day, drawing on email, documents, meetings and schedules to give agents an understanding of people, teams and workflows.

    Foundry IQ. Manages institutional knowledge, curating and indexing knowledge bases so agents understand what it means to work within the organization, what rules apply and what procedures to follow.

    Fabric IQ. Models the live operational state of the business through data, defining entities, relationships and business rules grounded in real-time signals from Fabric Real-Time Intelligence. Ontologies, the layer that captures that operational context, are expected to reach GA in the coming months.

    Web IQ. Adds real-time global context from the web, giving agents a current picture of the world outside the organization alongside its internal data.

    "The agents are going to become highly informed virtual employees," Netz said. "That's where the world is heading."

    Rayfin routes agent-built applications into the same data foundation

    Building shared context solves one half of the problem. The other is what happens when agents start generating applications. Every new app needs a backend, and without a governed deployment path each one creates a new data silo outside the context layer entirely.

    Rayfin provides an enterprise-grade back end and deploys agent-built applications directly to Fabric, so application data lands in Microsoft OneLake by default and feeds back into the Microsoft IQ context layer rather than accumulating outside it.

    Microsoft positions Rayfin against Supabase and Neon, the Postgres-compatible backends that agentic coding tools default to. The differentiator is governance: Rayfin routes the entire application fleet through Fabric's unified data and compliance layer rather than creating isolated silos.

    Netz described the relationship as bidirectional. The agent building a Rayfin application draws from the organization's ontology. The data that application generates then enriches that ontology for the next agent.

    Every major data platform is chasing the same answer, but execution is unproven

    Microsoft is not the only platform building a shared context layer for agents. Snowflake announced its own context capabilities this week with semantic capabilities. Pinecone has its Nexus platform that expands the vector database to become a knowledge engine and Redis has developed its Iris context and memory platform.

    Microsoft's approach further reinforces the trend that RAG and model availability aren't the issue anymore.

    "Fabric IQ and Rayfin are important because the enterprise AI challenge is no longer just about the model availability," Robert Kramer, managing partner at KramerERP told VentureBeat. "The real question is whether Microsoft simplifies execution and strengthens trust or adds another layer to an already complex environment."

  • Alibaba's Qwen3.7-Plus supports text, video and imagery inputs at low cost of $0.4/$1.6 per 1M token — but it's proprietary Tue, 02 Jun 2026 22:40:00 GMT

    Alibaba this week released Qwen3.7-Plus, the latest AI large language model (LLM) in its globally beloved and increasingly expansive Qwen family, boasting more multimodal capabilities and a 60% lower cost than the prior, text-only Qwen3.7-Max model released just weeks ago.

    However, like its immediate predecessor Qwen3.7-Plus is available only under a "closed" commercial license via proprietary application programming interfaces (API) and Qwen Chat.

    That marks a big departure from the Qwen strategy to date, which was focused mainly on releasing powerful,near state-of-the-art open source models. Those enterprises and users who relied on the open source Qwen models — among them, U.S. giants such as Airbnb — will no doubt be disappointed to see that Alibaba is going closed for its newer releases.

    Still, the model is worth a look because of its low cost and high performance on multimodal tasks like creating enterprise-grade visuals or analyzing video, imagery and screenshots, which Qwen3.7-Max cannot do (it's text-only). It is among the cheaper powerful AI models available now, coming in price-wise just above Chinese rival's new MiniMax-M3's limited-time discount pricing.

    VentureBeat Frontier AI Model API Pricing Snapshot

    Model

    Input

    Output

    Total Cost

    Source

    MiMo-V2.5 Flash

    $0.10

    $0.30

    $0.40

    Xiaomi MiMo

    deepseek-v4-flash

    $0.14

    $0.28

    $0.42

    DeepSeek

    deepseek-v4-pro

    $0.435

    $0.87

    $1.305

    DeepSeek

    MiniMax-M3

    $0.30

    $1.20

    $1.50

    MiniMax

    Qwen3.7-Plus

    $0.40

    $1.60

    $2.00

    Alibaba Cloud

    Gemini 3.1 Flash-Lite

    $0.25

    $1.50

    $1.75

    Google

    MiMo-V2.5

    $0.40

    $2.00

    $2.40

    Xiaomi MiMo

    Grok 4.3 low context

    $1.25

    $2.50

    $3.75

    xAI

    GLM-5

    $1.00

    $3.20

    $4.20

    Z.ai

    Kimi-K2.6

    $0.95

    $4.00

    $4.95

    Moonshot/Kimi

    GLM-5.1

    $1.40

    $4.40

    $5.80

    Z.ai

    Grok 4.3 high context

    $2.50

    $5.00

    $7.50

    xAI

    Qwen3.7-Max

    $2.50

    $7.50

    $10.00

    Alibaba Cloud

    Gemini 3.5 Flash

    $1.50

    $9.00

    $10.50

    Google

    Gemini 3.1 Pro Preview ≤200K

    $2.00

    $12.00

    $14.00

    Google

    GPT-5.4

    $2.50

    $15.00

    $17.50

    OpenAI

    Gemini 3.1 Pro Preview >200K

    $4.00

    $18.00

    $22.00

    Google

    Claude Opus 4.8

    $5.00

    $25.00

    $30.00

    Anthropic

    GPT-5.5

    $5.00

    $30.00

    $35.00

    OpenAI

    Maintaining continuity during complex tool execution loops

    For technical decision-makers deploying autonomous agents, the primary bottleneck has rarely been initial model intelligence. Instead, it is state decay—the tendency of an agent framework to lose its analytical trajectory over multi-step, long-horizon tasks.

    Qwen3.7-Plus addresses this architectural vulnerability through a combined approach to context management and reasoning state preservation.

    The model ships with a 1-million token context window and allocates up to 256K tokens specifically for internal chain-of-thought processing. To contextualize this capacity, imagine an automated cloud migration agent: it can ingest an entire codebase, map out the dependencies, and spend thousands of tokens quietly evaluating edge cases before executing a single line of bash script.

    Crucially, the API exposes a parameter called 'preserve_thinking.' Across Alibaba's ecosystem, the capability serves as a standardized architectural bridge rather than a tiered perk. Alibaba introduced the feature during the prior Qwen 3.6 generation, integrating it into both the open-weight Qwen3.6-27B and the proprietary Max models.

    At its core, the parameter operates at the API and template level to retain internal <think> blocks across continuous conversational turns.

    This structural continuity solves a critical bottleneck for developers engineering long-horizon tasks. By keeping these internal logic loops intact, the feature prevents the model from dropping its context or needlessly recomputing its cached history midway through an operation.

    When a model executes complex, multi-step agentic coding assignments, this retention allows the system to hold onto its original train of thought without losing the plot or forgetting the underlying logic of its previous actions.

    Alibaba remains far from alone in recognizing this technical necessity, as the underlying concept now dictates the architecture of nearly all major artificial intelligence laboratories.

    Anthropic deploys this exact capability under the moniker "Extended Thinking" for its advanced models, including its latest Claude Opus 4.8. This framework requires developers to feed unmodified thinking blocks directly back into the API on subsequent turns to maintain an unbroken chain of reasoning.

    OpenAI tackles the same challenge through an encrypted reasoning pass-back mechanism for models like GPT-5.5. Within the OpenAI ecosystem, developers must return specific reasoning items generated alongside previous function calls, ensuring the model explicitly remembers the rationale behind its tool executions.

    Ultimately, preserve_thinking simply represents Alibaba's terminology for what has rapidly become the undisputed table stakes for modern multi-turn reasoning.

    Benchmarks show a competitive, yet sub state-of-the-art model

    On raw capability metrics, this deep-thinking architecture translates to structural gains across multimodal and agentic benchmarks. However, it still falls below many of the leading and prior generations of U.S. proprietary models such as Anthropic's Claude Opus 4.6 and OpenAI's GPT-5.4.

    On Terminal Bench 2.0-Terminus, which measures an model's capability to run actual terminal-level code safely and iteratively, Qwen3.7-Plus scored 70.3, outperforming DeepSeek-V4-Pro Max (67.9) and Gemini-3.1 Pro (63.5).

    On computer vision benchmarks that demand localized interface understanding, such as ScreenSpot Pro, the model hit 79.0, significantly outpacing legacy industry standouts like GPT-5.4 (xhigh) at 67.4 and Claude-Opus-4.6 at 49.5. Agent Evaluation Metrics (Selected Benchmarks)

    What should enterprises consider Qwen3.7-Plus for?

    For an enterprise architect, the key question when analyzing Qwen3.7-Plus is clear: What does this replace in our current tech stack?

    The model is designed to step in as a direct replacement for premier frontier models (such as GPT-5-tier or Claude-Max-tier models) within high-frequency developer workflows, robotic process automation (RPA), and data engineering pipelines.

    Rather than deploying an expensive, general-purpose flagship model to handle repetitive system operations, technical teams can route these tasks to Qwen3.7-Plus. It handles visual interface interpretation, command execution, and code generation simultaneously.

    Alibaba has structured its API delivery to align with existing open-source and proprietary enterprise frameworks. The endpoints are fully OpenAI-compatible, meaning swapping out existing dependencies requires minimal infrastructure adjustment. For groups leveraging autonomous terminal frameworks, the integration is natively supported across multiple environments.

    Engineers can run Qwen3.7-Plus directly through their local terminal setups by altering base environment targets.

    From a pure cost perspective, running an agent framework that constantly references massive code repositories or visual layout histories can quickly become cost-prohibitive.

    Alibaba addresses this by exposing granular caching price points.

    Standard input processing sits at $0.40 per million tokens, but if the agent is reading from an explicitly created cache (e.g., a massive base repository or standard enterprise UI kit that remains static over hundreds of automated loops), the cost drops sharply to $0.04 per 1M tokens for subsequent reads.

    This tier makes high-frequency, multi-turn agent iterations economically practical at an enterprise scale.

    No open source license or open weights raises the compliance question for enterprises

    When evaluating any model in the Qwen ecosystem, a primary concern for legal and security teams is the licensing framework and operational boundary of the data pipeline.

    While previous iterations of the Qwen family gained significant enterprise traction via fully open-source weight availability under the Apache 2.0 or customized open-use licenses, Qwen3.7-Plus is delivered strictly as a managed, commercial cloud API via Alibaba Cloud Model Studio. For enterprise risk management, this distinction carries specific implications:

    • No Local Weight Deployment: Organizations cannot download, sandbox, or locally host the weights of Qwen3.7-Plus within their completely air-gapped internal data centers. All data verification, visual processing, and execution calls must step through Alibaba Cloud's international endpoints (e.g., the Singapore instance highlighted in developer documentation).

    • Compliance and Sovereignty: Since the model requires cloud-based inference, companies operating under strict sovereign data boundaries (such as healthcare entities subject to local HIPAA/GDPR constraints or defense contractors) must explicitly evaluate whether external API routing complies with their specific data-residency obligations.

    • Managed Risk Mitigation: Conversely, a managed API structure removes the internal infrastructure burden of provisioning, optimizing, and maintaining multi-GPU clusters (such as dedicated Nvidia H100 arrays) simply to host an internal agent network.

    Still, Qwen3.7-Plus offers high intelligence across modalities at low cost

    The initial reception from developer communities and technical venture capital highlights the shifting economics of agent deployment.

    Prominent industry voice and Web3 venture capitalist @Boxmining highlighted the strategic cost advantage, stating:

    "Qwen 3.7 Plus being 40% cheaper than Max changes the conversation. If the output is close enough for most coding and much stronger for visual workflows, do you really need Max every day or only for the heavy terminal-only jobs?"

    This perspective aligns with the current trend of optimizing enterprise operational budgets: shifting away from raw, unconstrained compute toward targeted task automation.At the same time, specialized researchers deep within the ecosystem point out that this isn't merely an incremental optimization of text generation.

    Dunjie Lu, a research intern at Alibaba Qwen, remarked:

    "It shows clear gains over Qwen3.6-Plus in computer-use capabilities, with stronger generalization beyond general desktop tasks into professional workflows such as data engineering and scientific research."

    Ultimately, for enterprise buyers deciding on their next infrastructure roadmap, Qwen3.7-Plus presents a practical alternative. If your organization's primary objective is building resilient, visual-capable autonomous software loops that interact directly with developer environments and cloud consoles—without blowing out your inference budget—the model provides a compelling reason to shift execution away from more expensive frontier alternatives.

  • Perplexity AI unveils hybrid local-cloud inference system at Computex 2026 Tue, 02 Jun 2026 19:08:17 GMT

    Perplexity AI, the fast-growing search startup now valued at $20 billion, unveiled what it calls the first hybrid local-server inference orchestrator at Computex 2026 on Monday night, demonstrating software that autonomously decides — in real time and mid-task — which AI workloads stay on a user's device and which get routed to frontier models in the cloud.

    CEO Aravind Srinivas demonstrated the system onstage alongside Intel CEO Lip-Bu Tan during Intel's keynote address, using Perplexity's "Personal Computer" agent to process confidential deal materials. In the demonstration, local models running on Intel Core Ultra Series 3 determined which information should remain on the device and which information could be sent to cloud-based models. Srinivas said the approach balances intelligence, accuracy, privacy, and cost.

    The key claim is not that a model can run locally — dozens of tools already do that. It is that Perplexity's system makes the routing decision itself, task by task, without requiring the user to choose in advance. Sensitive data like financial records or health information stays on the local machine; the heavier reasoning tasks that require frontier-scale models get sent to the cloud. One task, multiple execution locations, automatic orchestration.

    "No product has done this before," a Perplexity spokesperson said in an email to VentureBeat. The product is not yet available to users; according to the company, the hybrid inference feature will launch in the coming weeks.

    Perplexity's road from cloud-only agents to on-device AI orchestration

    To understand why the Computex demonstration matters, it helps to trace the product arc Perplexity has been building since early this year.

    On February 25, Perplexity launched Computer, a multi-model AI agent that orchestrates 19 different AI models to complete complex, long-running tasks on behalf of users. The system ran entirely in the cloud, breaking goals into subtasks and routing each to whichever model — Claude, Gemini, GPT, Grok, or others — was best suited for the job. Perplexity Computer unified every current AI capability into a single system, functioning as a general-purpose digital worker that operates the same interfaces a user does.

    Then, in March, Perplexity introduced Personal Computer at its inaugural Ask 2026 developer conference. That product launched as a new Mac app with support for a hybrid local-cloud AI agent, which Perplexity described as a "personal orchestrator" that hybridizes local and server environments for security and productivity. Personal Computer could access the Mac's file system and native Mac apps to create and execute entire workflows, with files created in a secure sandbox and all actions auditable and reversible.

    What Srinivas demonstrated at Computex extends this architecture in a fundamental way. Previously, even the Personal Computer product divided labor along relatively clear lines: local file access on the device, heavy computation on Perplexity's servers.

    The new hybrid inference orchestrator gives the system itself the ability to reason about where each piece of a task should execute — not just which model to use, but which physical location should process it. The system reportedly asks for user permission before sending sensitive tasks to the cloud, a design choice that addresses one of the central anxieties enterprises have about agentic AI: data governance.

    Why Nvidia’s RTX Spark and Intel's new silicon make the timing strategic

    The timing of the demonstration is not coincidental. Computex 2026 has been dominated by a single theme: on-device AI. Just hours before the Intel keynote, Nvidia CEO Jensen Huang unveiled the RTX Spark, a new Arm-based superchip that the company positions as the foundation for a new generation of AI-native Windows PCs.

    At full strength, the RTX Spark Superchip offers up to 20 Arm CPU cores, a Blackwell GPU with 6,144 CUDA cores, 128GB of LPDDR5X RAM, and up to 300 GB/s of memory bandwidth — enough power and memory for AI agents and 120-billion-parameter models with context lengths stretching to a million tokens. RTX Spark systems will begin arriving in the fall.

    Intel, not to be outdone, used its keynote to showcase Xeon 6+ processors with 288 efficiency cores built on 18A technology for the data center, and positioned its Core Ultra Series 3 as the client silicon that makes hybrid inference possible on the PC.

    Perplexity's hybrid orchestrator sits at the intersection of both strategies. If the system performs as advertised, it creates a direct economic incentive for users — and eventually enterprises — to invest in more powerful local silicon. The more capable the on-device chip, the more inference can run locally, reducing cloud costs and improving latency for sensitive workloads. That dynamic benefits Nvidia, Intel, and every other chipmaker competing for AI PC sockets.

    The implications extend well beyond chip economics. "As chips become more powerful, more intelligence moves onto a person's machine, alongside server inference for the complex tasks that still need frontier models," a Perplexity spokesperson told VentureBeat. "Sensitive and sovereign work can stay local, which changes the need for massive country-level infrastructure." 

    That last claim — about sovereign infrastructure — is the most provocative. Nations from the UAE to France to India have been investing billions in domestic AI compute capacity partly on the assumption that sensitive data must stay within their borders, which means building or buying access to local data centers. If meaningful inference can run on an end user's device with no data leaving the machine, the calculus changes. It does not eliminate the need for data centers, but it could soften the urgency of the buildout.

    The model-agnostic architecture that makes hybrid inference possible

    Perplexity's hybrid inference play rests on the same architectural bet the company has been making all year: that the orchestration layer matters more than any individual model. For AI engineers, this signals a fundamental shift — the orchestration layer may matter more than the models themselves.

    The key insight is separation of concerns: the orchestration layer handles task decomposition, state management, and tool coordination, while the model layer handles specific computations. This decoupling means teams can swap models as better alternatives emerge without redesigning the entire system.

    Perplexity has leaned heavily into this philosophy. The company is doubling down on packaging frontier models in a consumer-friendly user experience, arguing that there is value in orchestrating multiple third-party LLMs to obtain the most cost-effective and accurate answers to queries. Models, in Perplexity's view, are specializing, not commoditizing.

    The hybrid inference extension takes that logic one step further. Perplexity is now orchestrating not just across models but across physical compute locations — choosing which model runs where. A lightweight local model might handle a privacy-sensitive document summarization task while a frontier cloud model tackles the complex reasoning required to analyze that summary against a broader market landscape. The orchestrator manages the handoff.

    This is a technically ambitious claim. Making it work reliably in production will require the orchestrator to accurately assess the complexity of each subtask, understand the sensitivity of the data involved, know the capabilities and latency characteristics of whatever local hardware the user has, and manage the state of a task that may be bouncing between environments mid-execution.

    It is easy to imagine edge cases where the routing logic fails, sends something sensitive to the cloud, or degrades performance by assigning a task to an underpowered local model. Perplexity says the system will be chip-agnostic, though the initial Computex demo ran on Intel silicon. The company expressed enthusiasm in its communications about the new AI chips announced at Computex this week, suggesting it intends to optimize across vendors.

    A $20 billion valuation, nine lawsuits, and the pressure to deliver

    The hybrid inference announcement arrives at a complicated moment for Perplexity. The company has been on a remarkable growth trajectory: It secured $200 million in new capital at a $20 billion valuation, just two months after raising $100 million at an $18 billion valuation. Since its founding three years ago, the rapidly growing AI company has raised $1.5 billion in total funding, according to PitchBook data.

    But the company also faces a mounting stack of legal challenges. Nine organizations have filed active suits against Perplexity for alleged copyright and trademark infringement as of May 31, 2026: CNN, the New York Times, News Corp and Dow Jones, the New York Post, the Chicago Tribune, Encyclopedia Britannica, Merriam-Webster, Reddit, and Japan's Yomiuri Shimbun. The CNN lawsuit, filed just days ago on May 28, is the most recent, accusing Perplexity of scraping more than 17,000 CNN stories, photos, videos, and other content and using that material to train its products. Perplexity has responded with a consistent message. "You can't copyright facts," the company's chief communications officer Jesse Dwyer said in a statement.

    Other publishers have opted for partnership over litigation. Time, Gannett, Le Monde, and Der Spiegel have signed licensing arrangements with Perplexity. The company launched a Publishers Program in mid-2024 in which participating outlets receive a share of revenue generated when their content is cited in Perplexity answers. 

    According to CNBC, Perplexity's chief business officer Dmitry Shevelenko confirmed at the time that the flat rate was a double-digit percentage but declined to share specifics. As TechCrunch reported in December 2024, additional publishers including the LA Times, Adweek, The Independent, and Lee Enterprises subsequently joined the program, though not without internal controversy — reporters at some outlets told TechCrunch they were not informed of the deals before they were announced publicly. 

    The legal risk is not existential, but it is material, and with enterprises increasingly evaluating Perplexity's tools for sensitive workflows — precisely the use case the hybrid inference system is designed to serve — unresolved intellectual property questions could dampen adoption.

    How hybrid inference sharpens Perplexity's enterprise ambitions

    The hybrid inference demo should be read alongside Perplexity's broader push into enterprise software, a transformation that accelerated dramatically this year. At the Ask 2026 developer conference in March, VentureBeat reported that Perplexity announced Computer for Enterprise, positioning the three-year-old startup as a direct competitor to Microsoft, Salesforce, and the legacy enterprise software stack.

    Beyond Computer's existing 100-plus integrations, enterprise customers gained access to business-grade connectors for Snowflake, Datadog, Salesforce, SharePoint, and HubSpot, with administrators able to install custom connectors via the Model Context Protocol. The package also includes purpose-built workflow templates for legal contract review, finance audit support, sales call preparation, and customer support ticket triage, alongside SOC 2 Type II certification and the option for zero data retention.

    Hybrid inference deepens this enterprise pitch considerably. For regulated industries — financial services, healthcare, defense, legal — the ability to keep sensitive data on a local device while still accessing the reasoning power of frontier cloud models is not a nice-to-have. It is a potential compliance requirement.

    An investment bank parsing confidential deal documents, for instance, might be unable to send those materials to a third-party cloud under existing data handling agreements. A system that can run the sensitive parsing locally while routing non-sensitive analytical tasks to the cloud offers a middle path. IDC forecasts a tenfold increase in agent usage and a thousandfold growth in inference demands by 2027, and security and governance rank as the top evaluation factor for enterprise agentic platforms, according to a CrewAI survey. Hybrid inference speaks directly to that priority.

    The race to decide where AI actually runs is just getting started

    Several questions will determine whether Perplexity's Computex demonstration becomes a landmark product or a compelling prototype.

    The actual performance characteristics remain untested outside a controlled stage environment — how the routing logic handles varied hardware configurations, unreliable network connections, and ambiguous data sensitivity classifications is an open question.

    The competitive response matters too: Google, Microsoft, Apple, and OpenAI are all building their own local-cloud AI architectures. Apple Intelligence already routes some tasks locally and some to Private Cloud Compute servers, Google's Gemini Nano runs on-device, and Microsoft's Copilot+ PCs are designed around local inference capabilities. None of these systems, however, currently offer the kind of dynamic, autonomous task-level routing Perplexity demonstrated on stage.

    Then there is the business itself. Perplexity's annualized recurring revenue surged past $450 million in March 2026, up from roughly $200 million six months earlier — rapid growth, but at a valuation north of $20 billion, the company still trades at a premium that demands the technology translate into sustained enterprise adoption.

    Perplexity has built its business on a bet that the future belongs not to any single model but to the system that orchestrates all of them. At Computex, it extended that bet from the software layer to the physical layer — from which model to which machine. In the AI industry's relentless race to build bigger data centers and train larger models, Perplexity just argued that the most important computer in the stack might be the one already sitting on your desk.

  • Microsoft debuts Surface RTX Spark Dev Box to run large AI models without cloud costs Tue, 02 Jun 2026 16:30:00 GMT

    Microsoft on Monday unveiled the Surface RTX Spark Dev Box, a compact desktop computer designed to let software developers run large AI models on their desks instead of paying for cloud computing — a move that directly challenges the per-token pricing model that has defined the AI industry's economics since ChatGPT launched three and a half years ago.

    The device, announced at Microsoft Build 2026, packs Nvidia’s new Blackwell-architecture RTX Spark processor and 128 gigabytes of unified memory into a small-form-factor chassis, delivering what Nvidia rates at one petaflop of AI compute. In practical terms, that means a developer can load, run and interact with AI models exceeding 120 billion parameters without sending a single API call to the cloud.

    "These class of devices, we think, will get to about 100 billion parameter model running," Pavan Davuluri, Microsoft's executive vice president of Windows and Devices, said during a press briefing ahead of the event. He emphasized that raw model size is only part of the equation: "The model size is one thing, but for the model to be effective, it kind of needs to be able to have enough context, because a larger model, you feed it larger context." At 100,000 tokens of context, he noted, the key-value cache alone can consume 40 to 50 gigabytes of memory — which is precisely why Microsoft and Nvidia engineered the device around a 128-gigabyte unified memory pool shared dynamically between the CPU and GPU.

    The machine will be available later this year in the United States, sold exclusively through Microsoft.com. The company did not disclose pricing.

    Why Microsoft is betting that AI's future runs on fixed costs, not cloud meters

    The Surface RTX Spark Dev Box arrives at a moment when the economics of AI development have become a boardroom-level concern. Companies large and small are grappling with cloud GPU bills that scale unpredictably: every fine-tuning run, every inference call, every agentic workflow that loops through a frontier model accumulates cost. For a developer iterating rapidly on a prototype — running the same model dozens or hundreds of times a day — those charges compound fast.

    Microsoft is framing the Dev Box as a release valve for that pressure. Andrew Hill, corporate vice president of Surface, wrote in the announcement blog post that the device "changes that equation" by letting developers "reserve frontier model calls for truly frontier problems and handle the rest on their own hardware." The pitch is not that cloud computing is obsolete, but that much of the work currently being sent to remote data centers does not require state-of-the-art models and would be better served by capable local hardware with predictable, fixed costs.

    This is a significant strategic shift for Microsoft, a company that derives tens of billions of dollars in annual revenue from Azure cloud services. By selling hardware that explicitly reduces customers' cloud dependency, Microsoft is acknowledging a tension that has been building across the industry: the marginal cost of AI inference at scale is unsustainable for many teams, and the market is demanding alternatives. The bet appears to be that developers who prototype locally will still deploy to Azure when they need to scale — and that owning both ends of that workflow is more valuable than owning only the cloud.

    Inside the 128GB unified memory architecture that makes local AI possible

    The technical architecture of the Dev Box reflects a set of deliberate engineering choices aimed at sustained, not peak, performance — a distinction that matters enormously for AI workloads that can run for hours.

    At the center is Nvidia’s RTX Spark system-on-chip, which combines an ultra-efficient ARM-based CPU with a Blackwell-generation RTX GPU. In a traditional Windows PC, Davuluri explained during the briefing, this configuration would require four separate components: a CPU, a discrete GPU, dedicated graphics memory and system RAM. The RTX Spark collapses all of that into a single chip paired with a single unified memory pool.

    That unification is the critical design decision. Conventional gaming laptops with high-end Nvidia GPUs top out at roughly 24 gigabytes of GPU-accessible memory. The Dev Box's 128 gigabytes of unified memory — accessible to both the CPU and GPU through what Nvidia calls its Unified Memory Access architecture — is what makes it possible to load models that would otherwise require cloud GPU instances with specialty high-bandwidth memory configurations.

    Microsoft did substantial work at the operating system level to exploit this architecture. The company implemented new memory management logic in Windows that raises the ceiling on how much system memory the GPU can address, introduces smarter page-size allocation for shared memory regions and ensures that heavy GPU workloads do not starve the CPU of the resources it needs for multitasking. The Windows scheduler was also optimized for RTX Spark's heterogeneous core layout, routing demanding workloads to performance cores while keeping efficiency cores available for background tasks.

    How a 3D-printed aluminum chassis doubles as a heatsink

    The thermal design is equally deliberate. The Dev Box operates within an approximately 100-watt sustained thermal envelope — modest by desktop standards, but meaningful for a device intended to run training jobs and inference workloads continuously. The aluminum chassis itself is engineered to function as a passive heatsink, and the method Microsoft used to build it is among the most striking details about the machine.

    The top panel is manufactured using metal 3D printing, a process that enables internal geometries too complex for conventional CNC machining or injection molding. The perforations are not simple through-holes; they are angled in multiple directions around the internal fan to optimize airflow from cold-air intake through heat dissipation. During the press briefing, Harry, a Surface industrial designer, explained the rationale: "The complexity is something other manufacturers wouldn't be able to do, like CNC, or like any molding, because of the complexity of shape."

    When asked whether 3D printing would constrain mass production, the designer acknowledged the challenge but suggested Microsoft had developed a process robust enough to scale. The result is a machine that runs quietly enough for an open office while sustaining the kind of continuous GPU workloads that would throttle most conventional desktops of similar size. For a device that Microsoft expects developers to leave running overnight on fine-tuning jobs, quiet sustained performance is not a luxury — it is a requirement.

    A developer-first setup that eliminates hours of configuration

    Microsoft is shipping the Dev Box with Windows 11 Pro pre-configured at the image level for development work — a detail that sounds minor but reflects a growing recognition that the out-of-box experience for developer hardware has historically been poor.

    The machine boots into a dark theme with a simplified taskbar, widgets removed and Do Not Disturb enabled. Developer Mode is turned on. PowerShell 7 is the default shell. WSL 2 — the Windows Subsystem for Linux — comes pre-installed with GPU passthrough and CUDA support already configured. Visual Studio Code, GitHub Copilot, Git, Python and Node.js are all installed and ready.

    "We've said, 'Hey, you know what, we got you, you want to go fast,'" a Microsoft engineer who demonstrated the configuration during the briefing told VentureBeat. The philosophy, he explained, is that developers were going to install all of these tools anyway — the friction was in the hours of setup and configuration that stood between unboxing a machine and writing the first line of code.

    The Dev Box also ships with integration points across Microsoft's AI stack: AI Toolkit for VS Code for model conversion and fine-tuning, Windows ML and Windows Copilot Runtime for local inference, and Microsoft Foundry for connecting local prototypes to cloud deployment pipelines. For enterprises, the device integrates with Entra ID and Intune for identity and device management, and includes Secured-core PC architecture, BitLocker encryption and Microsoft Defender.

    Why Apple's Mac Mini may not be the real competition anymore

    The most obvious competitive comparison is Apple's Mac Mini, which has dominated the compact-desktop category and has been widely adopted by developers drawn to Apple Silicon's unified memory architecture and power efficiency.

    Davuluri addressed the comparison directly during the briefing, saying the Dev Box is "in a different class of performance than Mac Minis, intentionally." He declined to share specific benchmarks, noting that detailed specifications and performance targets would come closer to the fall launch. But the architectural advantage Microsoft is claiming is clear: while the current Mac Mini with M4 Pro tops out at 48 gigabytes of unified memory and the M4 Max configuration reaches 128 gigabytes, the RTX Spark Dev Box pairs its 128 gigabytes with a Blackwell-class GPU that has a fundamentally different CUDA-based compute model — one that the vast majority of the AI/ML ecosystem's tooling (PyTorch, TensorRT, llama.cpp, Hugging Face frameworks) is already optimized for.

    That CUDA ecosystem advantage is difficult to overstate. While Apple's Metal framework has made progress, the overwhelming majority of AI training and inference frameworks are built and tested first against Nvidia’s CUDA stack. A developer running models on the Dev Box can use the same code, the same libraries and the same workflows they would use on a cloud GPU instance — a level of portability that Apple Silicon cannot currently match.

    From laptop to supercomputer: Microsoft's three-tier plan for local AI hardware

    The Dev Box is one piece of a three-tier hardware strategy Microsoft laid out at Build. The Surface Laptop Ultra, announced days earlier at Computex, brings the same RTX Spark silicon into a 15-inch laptop form factor for developers and creators who need portability. At the other end of the spectrum, the DGX Station for Windows — built on Nvidia's GB300 Grace Blackwell Ultra Superchip — targets organizations that need to run frontier models up to one trillion parameters on a deskside system. That machine is expected in the fourth quarter of this year.

    The three devices map to a tiered computing model that Microsoft is calling "unmetered intelligence": small on-device language models (the company's new Aion 1.0 family) handle lightweight tasks at zero marginal cost; RTX Spark-class hardware runs mid-range models locally for the bulk of development work; and cloud resources are reserved for genuinely frontier-scale problems.

    The GitHub Copilot CLI is getting a concrete implementation of this model with a new feature called /fleet, which allows a cloud-based primary agent to build a plan, assess the complexity of each task and route appropriate subtasks to a local model running on the developer's hardware. The cloud agent handles what requires frontier capability; the local model handles what does not. The result, in theory, is lower cost without lower quality.

    The real question is whether hybrid AI can shift from buzzword to business model

    Whether Microsoft's bet pays off depends on questions that will take months to answer. How does the Dev Box actually perform under sustained, real-world workloads? What will it cost? How quickly will the open-source model ecosystem continue to produce capable models in the 70-to-120-billion-parameter range that fit within its memory envelope? And perhaps most critically: will enterprise procurement teams, trained to think of AI as a cloud line item, accept a capital expenditure on desk hardware as an alternative?

    The strategic logic, however, is difficult to dismiss. For three years, the AI industry has operated on an implicit assumption: serious AI work happens in the cloud, and the economics of that arrangement are simply the cost of doing business. Microsoft, a company with every incentive to reinforce that assumption, is now selling a machine that undermines it. That is not a contradiction — it is a recognition that the market is moving, and that the company that controls the developer's local environment and the cloud they deploy to has a more durable advantage than one that controls only the cloud.

    Every dollar a developer does not spend on cloud inference is a dollar that can fund another experiment, another iteration, another prototype. For years, the AI industry told developers they needed to rent their intelligence by the token. Microsoft is now asking a different question: what if you could just buy it?



Techradar



TechNode

  • NetEase Games’ Eggy Party PC version goes live today Fri, 05 Jun 2026 05:47:29 +0000
    NetEase Games’ party game Eggy Party today launched on Windows, bringing the full experience to PC players with cross-platform account and data synchronization between mobile and PC. Players can seamlessly switch between devices while retaining their progress. The PC version preserves popular game modes, including Peak Party, Casual Party, and Eggy Farm. It also introduces […]
  • Foxconn, Intel partner to develop next-generation AI infrastructure Fri, 05 Jun 2026 03:27:25 +0000
    Taiwan’s Foxconn, formally known as Hon Hai Precision Industry, said on Thursday it would partner with US chipmaker Intel to develop and deploy next-generation AI infrastructure and intelligent computing platforms, seeking to tap surging global demand for AI computing systems. The companies said the partnership would combine Intel’s semiconductor technology with Foxconn’s manufacturing and system […]
  • BYD is developing humanoid robots, according to source Thu, 04 Jun 2026 07:59:57 +0000
    Chinese electric vehicle giant BYD is developing humanoid robots, according to a person familiar with the matter cited by Chinese media outlet Yicai on Wednesday. The report follows recent comments from BYD Executive Vice President Li Ke, who confirmed in an interview that the company is working on humanoid robotics. “BYD is working on humanoid […]
  • Qwen opens platform to third-party AI Agents, onboards KFC, Luckin Coffee, Mixue and more Thu, 04 Jun 2026 02:53:21 +0000
    Alibaba-backed Qwen App announced on Wednesday that it is opening its platform to third-party Agents and Skills, allowing companies to operate branded AI agents within the app. According to the company, Luckin Coffee, KFC, Mixue, and China Eastern Airlines are among the first businesses to begin testing related services, with some features expected to roll […]
  • DeepSeek in talks to raise $7 billion from Tencent, CATL and other investors Thu, 04 Jun 2026 02:08:23 +0000
    Chinese AI startup DeepSeek is poised to raise around 50 billion yuan ($7 billion) in its first external funding round, with backing from major investors including Tencent Holdings and battery giant CATL, according to sources familiar with the matter cited by Reuters. The financing would value DeepSeek at between 350 billion yuan and 400 billion […]
  • WeRide and Uber to launch Spain’s first Robotaxi service Wed, 03 Jun 2026 06:53:56 +0000
    WeRide and Uber jointly announced plans to launch Spain’s first commercial robotaxi pilot service in Madrid. The project marks the companies’ first collaboration in the European market and makes Madrid the 12th city in WeRide’s global robotaxi network. With support from the regional government of Madrid, the service is scheduled to launch in 2026. Users […]
  • Tencent reportedly developing WeChat AI agent, makes it a top priority Wed, 03 Jun 2026 05:30:52 +0000
    Tencent is advancing plans to launch an embedded AI agent within WeChat, according to people familiar with the matter. The company is testing a prototype that can help users complete tasks within the app and plans to begin the regulatory approval process required before a public rollout as early as this month. Following that process, […]
  • xAI launches global recruitment drive for Chinese AI tutors Wed, 03 Jun 2026 02:07:46 +0000
    xAI has posted a Chinese AI Tutor position on the recruitment platform Greenhouse as part of its efforts to improve the Chinese-language capabilities of its flagship large language model, Grok. The role focuses on training Grok to better understand Chinese. Candidates are expected to be proficient in standard Mandarin and capable of handling regional dialects […]
  • Unitree IPO approved, Meituan-backed group emerges as top shareholder Tue, 02 Jun 2026 06:17:48 +0000
    On Monday, Unitree Robotics has cleared the listing committee review for its initial public offering (IPO) on China’s STAR Market, marking one of the fastest approvals in the board’s history. The company completed the regulatory process in 73 days from acceptance on March 20 to approval and set a fast-track record for STAR Market listings, […]
  • ByteDance’s Doubao to launch paid plans in late June, link with Douyin E-commerce push Tue, 02 Jun 2026 05:51:41 +0000
    ByteDance’s AI assistant Doubao is expected to roll out paid subscription services in late June, alongside feature updates announced at its upcoming Force conference, according to 36Kr. The PC and mobile versions are still completing billing system integration, which is expected to take about one month, the sources said. Doubao has listed three subscription tiers […]
  • KISED promotes South Korea’s startup ecosystem and support programs at BEYOND Expo Sat, 30 May 2026 17:00:05 +0000
    During BEYOND Expo 2026, the Korea Institute of Startup & Entrepreneurship Development (KISED) introduced South Korea’s startup ecosystem and government-backed support programs to international entrepreneurs and investors, highlighting how the country attracts global startups through innovation, industrial resources, and policy support. A KISED representative stated during the presentation that, unlike traditional companies focused on selling […]
  • Lenovo Innovation Accelerator channels ecosystem power to bring Chinese hard-tech startups to the global stage Fri, 29 May 2026 12:49:38 +0000
    As the AI industry rapidly enters a deep water phase, competition based solely on parameter scale is no longer enough to build lasting advantages. Technologies and products that can truly land in commercial scenarios and solve real-world problems are becoming the new core of industry competition. As a key platform driving Lenovo’s ecosystem innovation strategy, […]
  • CATL launches world’s largest energy storage testbed in Xiamen Fri, 29 May 2026 12:31:15 +0000
    CATL has launched its Xiamen Energy Storage Validation Research Institute, which it described as the world’s largest and most comprehensive one-stop testing and validation platform for the energy storage industry. The site spans 10 hectares and represents an investment of about RMB3 billion ($440 million). CATL said the facility is designed as an open, shared […]
  • BYD launches Xuanji A3, calls it China’s first 4nm smart driving chip Fri, 29 May 2026 08:37:29 +0000
    At BYD’s launch event on Thursday, Wang Chuanfu, chairman and founder of BYD, unveiled BYD’s self-developed Xuanji A3, China’s first 4nm autonomous driving chip, which supports L3 and L4 autonomous driving capabilities. The chip has entered mass production and supports L3 and L4 autonomous driving. A three-chip configuration delivers a combined computing power of over […]
  • Tencent launches WorkBuddy productivity AI agent for global users Fri, 29 May 2026 01:38:32 +0000
    Tencent Cloud has launched WorkBuddy, a productivity-focused AI agent for global users after first rolling out the product in China. The company said WorkBuddy is aimed at office workflows and can use natural language prompts to break down tasks, call external tools, and generate deliverables across work and study scenarios. WorkBuddy supports remote task execution […]
  • iFlytek launches 40g AI glasses with GlassClaw AI agent and advanced noise recognition Fri, 29 May 2026 00:54:22 +0000
    On Thursday, iFlytek unveiled its new AI smart glasses at BEYOND Expo 2026 in Macau under the theme Communication Without Boundaries, the World Before Your Eyes, showcasing the company’s latest push into AI-powered devices. At the launch event, Lin Huijie, General Manager of iFlytek’s Wearable Devices Business Department, said the company hopes the product can […]
  • Ziyouliangji aims to use AI music platform Hitto to turn everyone into a song creator Thu, 28 May 2026 15:10:07 +0000
    As competition in large AI models gradually shifts from parameter races to real-world deployment capabilities, a number of Chinese AI startups focused on vertical scenarios are beginning to stand out. Among them is Ziyouliangji Information Technology, founded in 2023, which is attempting to redefine how ordinary people create music through AI. Unlike many companies concentrating […]
  • MediaTek could partner with Tesla’s TERAFAB, expected to produce chips by 2028 Thu, 28 May 2026 09:08:48 +0000
    According to the latest industry survey released by analyst Ming-Chi Kuo from TF International Securities, among the many custom ASIC (Application-Specific Integrated Circuit) vendors, MediaTek is considered the most likely candidate to become a key strategic partner for TERAFAB, the super chip factory project under Tesla. MediaTek is expected to fully support the adoption and […]
  • From video understanding to edge deployment Om AI targets real-world AI Wed, 27 May 2026 15:26:41 +0000
    In the current phase where competition in large models is shifting from parameter scale to real-world deployment capability, a group of Chinese companies focused on edge AI is gaining attention, and Om AI Technology is one of them. Founded in 2021, the company has chosen not to pursue extremely large cloud-based models, but instead focuses […]
  • At BEYOND Expo 2026, XREAL CEO predicts an iPhone moment for AI glasses Wed, 27 May 2026 13:19:21 +0000
    As AI large models rapidly merge with wearable devices, smart glasses are once again becoming a focal point for the tech industry. From Meta’s AI glasses developed with Ray-Ban, to Apple’s Vision Pro, and Google’s renewed push into the AR ecosystem, global tech giants are competing once again for the next major computing platform. In […]
  • Xiaomi’s Q1 EV deliveries surpass 80,000 units Wed, 27 May 2026 02:58:30 +0000
    Xiaomi released its financial results for the first quarter of 2026 on Tuesday, reporting total revenue of RMB 99.1 billion ($13.8 billion). Revenue from its smart electric vehicle, AI, and other innovation businesses reached RMB 19.9 billion ($2.8 billion). Xiaomi’s R&D spending for the quarter totaled RMB 9 billion ($1.25 billion), up 33.4% year-on-year. The […]
  • LimX Dynamics unveils Luna humanoid robot with AI dance learning Tue, 26 May 2026 10:03:48 +0000
    LimX Dynamics on Monday unveiled the LimX Luna humanoid robot, priced at RMB 298,000 ($41,000). Standing 160cm tall, the LimX Luna features 27 degrees of freedom across its body and is powered by the company’s second-generation SYS 0 motion control engine. The robot also comes with upgraded cooling and battery life, while supporting multimodal interaction […]
  • Xiaohongshu reportedly secures 2026 FIFA World Cup streaming rights in China Tue, 26 May 2026 07:55:00 +0000
    According to multiple sources familiar with the negotiations, Xiaohongshu has secured sublicensing rights for the 2026 FIFA World Cup in China from state broadcaster China Media Group, including live-streaming rights and rights for short-video secondary content creation. The move marks a major shift in China’s sports media landscape. During the previous World Cup cycle, Douyin […]
  • Diablo IV China extends free base game giveaway until August 2026 Mon, 25 May 2026 09:50:15 +0000
    Diablo IV’s China server operator announced that the limited-time free claim event for the game’s base edition, originally priced at RMB 128 ($18), has been extended until August 4, 2026. The promotion first launched on April 28 and attracted a large influx of new players into the game. According to the official announcement, the extension […]
  • Huawei’s mate 90 series may launch with new Kirin chip this autumn Mon, 25 May 2026 03:46:19 +0000
    At the 2026 International Symposium on Circuits and Systems (ISCAS 2026) in Shanghai on Monday, He Tingbo, Huawei’s board member and head of its semiconductor business, said the company would launch a new Kirin smartphone chip this autumn that uses logic folding technology for the first time. The technology moves beyond conventional single-layer chip layouts […]
  • China launches first humanoid robot lifecycle management platform in Beijing Mon, 25 May 2026 03:09:27 +0000
    China has launched the country’s first full lifecycle management service platform for humanoid robots in Beijing. The platform gives each robot a unique digital identity and enables end-to-end tracking from production to recycling. The platform, led by the Ministry of Industry and Information Technology’s Standardization Technical Committee for Humanoid Robots and Embodied Intelligence, assigns every […]
  • Honor’s first robot smartphone revealed in high-resolution images Mon, 25 May 2026 02:33:17 +0000
    On Saturday, Qualcomm hosted its Snapdragon Fans anniversary party, where Honor unveiled the world’s first robot smartphone, the Honor Robot Phone. Qualcomm also released high-resolution hands-on images of the device, showcasing details such as the back design and its robotic-arm camera gimbal. According to Qualcomm, the Honor Robot Phone, powered by a flagship Snapdragon chipset, […]
  • Samsung chairman secretly visits MediaTek seeking to trade memory chips for foundry orders Sat, 23 May 2026 08:54:05 +0000
    According to Taiwanese media outlets including DigiTimes, Samsung Electronics Chairman Lee Jae-yong reportedly led a senior executive team on May 21 in a discreet visit to the headquarters of MediaTek, where they met with MediaTek Chairman Ming-Kai Tsai and CEO Rick Tsai, among other top executives. One of the key objectives of Lee’s trip was […]
  • Tech Odyssey Series: Who writes the first cheque for Portugal’s youngest startups? Fri, 22 May 2026 08:12:50 +0000
    Previous on Tech Odyssey: In our last episode, we went to the University of Porto, INESC TEC, and UPTEC to see how research can move from the lab into startups. This time, we look at what often comes next: who backs those companies when they are still too early for most investors. When a startup […]
  • NetEase Q1 revenue reaches $4.31 billion as Where Winds Meet surge on Steam charts Fri, 22 May 2026 06:28:01 +0000
    On Thursday, NetEase released its first quarter 2026 financial report. Net revenue reached RMB 30.6 billion ($4.31 billion), up 6.1% year-on-year, while net revenue from games and related value-added services came in at RMB 25.7 billion ($3.62 billion), up 6.9%. The biggest highlight of the quarter was Where Winds Meet (Yan Yun Shi Liu Sheng), […]



How Technology Works demystifies the machinery that keeps the modern world going, from simple objects such as zip fasteners and can openers to the latest, most sophisticated devices of the information age, including smartwatches, personal digital assistants, and driverless cars. #ad