Bluesky on Thursday quietly opened the doors to those who want to become verified on its social networking service. In a post published by the Bluesky Safety account, the company announced that “notable and authentic” accounts can now apply for verification through a new online form. Plus, organizations can request to become a Trusted Verifier […]

Anthropic has announced two new AI models that it claims represent a major step toward making AI agents truly useful.

AI agents trained on Claude Opus 4, the company’s most powerful model to date, raise the bar for what such systems are capable of by tackling difficult tasks over extended periods of time and responding more usefully to user instructions, the company says.

Claude Opus 4 has been built to execute complex tasks that involve completing thousands of steps over several hours. For example, it created a guide for the video game Pokémon Red while playing it for more than 24 hours straight. The company’s previous most powerful model, Claude 3.7 Sonnet, could play for just 45 minutes, says Dianne Penn, product lead for research at Anthropic.

Similarly, the company says that one of its customers, the Japanese technology company Rakuten, recently deployed Claude Opus 4 to code autonomously for close to seven hours on a complicated open-source project. 

Anthropic achieved these advances by improving the model’s ability to create and maintain “memory files” to store key information. This enhanced ability to “remember” makes the model better at completing longer tasks.

“We see this model generation leap as going from an assistant to a true agent,” says Penn. “While you still have to give a lot of real-time feedback and make all of the key decisions for AI assistants, an agent can make those key decisions itself. It allows humans to act more like a delegator or a judge, rather than having to hold these systems’ hands through every step.”

While Claude Opus 4 will be limited to paying Anthropic customers, a second model, Claude Sonnet 4, will be available for both paid and free tiers of users. Opus 4 is being marketed as a powerful, large model for complex challenges, while Sonnet 4 is described as a smart, efficient model for everyday use.  

Both of the new models are hybrid, meaning they can offer a swift reply or a deeper, more reasoned response depending on the nature of a request. While they calculate a response, both models can search the web or use other tools to improve their output.

AI companies are currently locked in a race to create truly useful AI agents that are able to plan, reason, and execute complex tasks both reliably and free from human supervision, says Stefano Albrecht, director of AI at the startup DeepFlow and coauthor of Multi-Agent Reinforcement Learning: Foundations and Modern Approaches. Often this involves autonomously using the internet or other tools. There are still safety and security obstacles to overcome. AI agents powered by large language models can act erratically and perform unintended actions—which becomes even more of a problem when they’re trusted to act without human supervision.

“The more agents are able to go ahead and do something over extended periods of time, the more helpful they will be, if I have to intervene less and less,” he says. “The new models’ ability to use tools in parallel is interesting—that could save some time along the way, so that’s going to be useful.”

One safety issue AI companies are still tackling is that agents can end up taking unexpected shortcuts or exploiting loopholes to reach the goals they’ve been given. For example, they might book every seat on a plane to ensure that their user gets a seat, or resort to creative cheating to win a chess game. Anthropic says it managed to reduce this behavior, known as reward hacking, in both new models by 65% relative to Claude 3.7 Sonnet. It achieved this by more closely monitoring problematic behaviors during training and by improving both the AI’s training environment and the evaluation methods.


This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

The data center boom in the desert

In the high desert east of Reno, Nevada, construction crews are flattening the golden foothills of the Virginia Range, laying the foundations of a data center city.

Google, Tract, Switch, EdgeCore, Novva, Vantage, and PowerHouse are all operating, building, or expanding huge facilities nearby. Meanwhile, Microsoft has acquired more than 225 acres of undeveloped property, and Apple is expanding its existing data center just across the Truckee River from the industrial park.

The corporate race to amass computing resources to train and run artificial intelligence models and store information in the cloud has sparked a data center boom in the desert—and it’s just far enough away from Nevada’s communities to elude wide notice and, some fear, adequate scrutiny. Read the full story.

—James Temple

This story is part of Power Hungry: AI and our energy future—our new series shining a light on the energy demands and carbon costs of the artificial intelligence revolution. Check out the rest of the package here.

A new atomic clock in space could help us measure elevations on Earth

In 2003, engineers from Germany and Switzerland began building a bridge across the Rhine River simultaneously from both sides. Months into construction, they found that the two sides did not meet. The German side hovered 54 centimeters above the Swiss one.

The misalignment happened because they measured elevation from sea level differently. To prevent such costly construction errors, in 2015 scientists in the International Association of Geodesy voted to adopt the International Height Reference Frame, or IHRF, a worldwide standard for elevation.

Now, a decade after its adoption, scientists are looking to update the standard—by using the most precise clock ever to fly in space. Read the full story.

—Sophia Chen

Three takeaways about AI’s energy use and climate impacts

—Casey Crownhart

This week, we published Power Hungry, a package all about AI and energy. At the center of this package is the most comprehensive look yet at AI’s growing power demand, if I do say so myself.

This data-heavy story is the result of over six months of reporting by me and my colleague James O’Donnell (and the work of many others on our team). Over that time, with the help of leading researchers, we quantified the energy and emissions impacts of individual queries to AI models and tallied what it all adds up to, both right now and for the years ahead.

There’s a lot of data to dig through, and I hope you’ll take the time to explore the whole story. But in the meantime, here are three of my biggest takeaways from working on this project. Read the full story.

This article is from The Spark, MIT Technology Review’s weekly climate newsletter. To receive it in your inbox every Wednesday, sign up here.

MIT Technology Review Narrated: Congress used to evaluate emerging technologies. Let’s do it again.

Artificial intelligence comes with a shimmer and a sheen of magical thinking. And if we’re not careful, politicians, employers, and other decision-makers may accept at face value the idea that machines can and should replace human judgment and discretion.

One way to combat that might be resurrecting the Office of Technology Assessment, a Congressional think tank that detected lies and tested tech until it was shuttered in 1995.

This is our latest story to be turned into an MIT Technology Review Narrated podcast, which we’re publishing each week on Spotify and Apple Podcasts. Just navigate to MIT Technology Review Narrated on either platform, and follow us to get all our new content as it’s released.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 OpenAI is buying Jony Ive’s AI startup
The former Apple design guru will work with Sam Altman to design an entirely new range of devices. (NYT $)
+ The deal is worth a whopping $6.5 billion. (Bloomberg $)
+ Altman gave OpenAI staff a preview of its AI ‘companion’ devices. (WSJ $)
+ AI products to date have failed to set the world alight. (The Atlantic $)

2 Microsoft has blocked employee emails containing ‘Gaza’ or ‘Palestine’
Although the term ‘Israel’ does not trigger such a block. (The Verge)
+ Protest group No Azure for Apartheid has accused the company of censorship. (Fortune $)

3 DOGE needs to do its work in secret
That’s what the Trump administration is claiming to the Supreme Court, at least. (Ars Technica)
+ It’s trying to avoid being forced to hand over internal documents. (NYT $)
+ DOGE’s tech takeover threatens the safety and stability of our critical data. (MIT Technology Review)

4 US banks are racing to embrace cryptocurrency
Ahead of new stablecoin legislation. (The Information $)
+ Attendees at Trump’s crypto dinner paid over $1 million for the privilege. (NBC News)
+ Bitcoin has surged to an all-time peak yet again. (Reuters)

5 China is making huge technological leaps
Thanks to the billions it’s poured into narrowing the gap between it and the US. (WSJ $)
+ Nvidia’s CEO has branded America’s chip curbs on China ‘a failure.’ (FT $)
+ There can be no winners in a US-China AI arms race. (MIT Technology Review)

6 Disordered eating content is rife on TikTok
But a pocket of creators are dedicated to debunking the worst of it. (Wired $)

7 The US military is interested in the world’s largest aircraft
The gigantic WindRunner plane will have an 80-metre wingspan. (New Scientist $)
+ Phase two of military AI has arrived. (MIT Technology Review)

8 How AI is shaking up animation
New tools are slashing the costs of creating episodes by up to 90%. (NYT $)
+ Generative AI is reshaping South Korea’s webcomics industry. (MIT Technology Review)

9 Tesla’s Cybertruck is a flop
Sorry, Elon. (Fast Company $)
+ The vehicles’ resale value is plummeting. (The Daily Beast)

10 Google’s new AI video generator loves this terrible joke
Which appears to originate from a Reddit post. (404 Media)
+ What happened when 20 comedians got AI to write their routines. (MIT Technology Review)

Quote of the day

“It feels like we are marching off a cliff.”

—An unnamed software engineering vice president jokes that future developers conferences will be attended by the AI agents companies like Microsoft are racing to deploy, Semafor reports.

One more thing

What does GPT-3 “know” about me?

One of the biggest stories in tech is the rise of large language models that produce text that reads like a human might have written it.

These models’ power comes from being trained on troves of publicly available human-created text hoovered up from the internet. If you’ve posted anything even remotely personal in English on the internet, chances are your data might be part of some of the world’s most popular LLMs.

Melissa Heikkilä, MIT Technology Review’s former AI reporter, wondered what data these models might have on her—and how it could be misused. So she put OpenAI’s GPT-3 to the test. Read about what she found.

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or skeet ’em at me.)

+ Don’t shoot the messenger, but it seems like there’s a new pizza king in town 🍕 ($)
+ Ranked: every Final Destination film, from worst to best.
+ Who knew that jelly could help to preserve coral reefs? Not I.
+ A new generation of space archaeologists are beavering away to document our journeys to the stars.


This week, we published Power Hungry, a package all about AI and energy. At the center of this package is the most comprehensive look yet at AI’s growing power demand, if I do say so myself. 

This data-heavy story is the result of over six months of reporting by me and my colleague James O’Donnell (and the work of many others on our team). Over that time, with the help of leading researchers, we quantified the energy and emissions impacts of individual queries to AI models and tallied what it all adds up to, both right now and for the years ahead. 

There’s a lot of data to dig through, and I hope you’ll take the time to explore the whole story. But in the meantime, here are three of my biggest takeaways from working on this project. 

1. The energy demands of AI are anything but constant. 

If you’ve heard estimates of AI’s toll, it’s probably a single number associated with a query, likely to OpenAI’s ChatGPT. One popular estimate is that writing an email with ChatGPT uses 500 milliliters (or roughly a bottle) of water. But as we started reporting, I was surprised to learn just how much the details of a query can affect its energy demand. No two queries are the same—for several reasons, including their complexity and the particulars of the model being queried.

One key caveat here is that we don’t know much about “closed source” models—for these, companies hold back the details of how they work. (OpenAI’s ChatGPT and Google’s Gemini are examples.) Instead, we worked with researchers who measured the energy it takes to run open-source AI models, for which the source code is publicly available. 

With open-source models, it’s possible to measure the energy used to respond to a query directly rather than just guess. The researchers generated text, images, and video and measured the energy that the chips running the models required to perform each task.

Even just within the text responses, there was a pretty large range of energy needs. A complicated travel itinerary consumed nearly 10 times as much energy as a simple request for a few jokes, for example. An even bigger difference comes from the size of the model used. Larger models with more parameters used up to 70 times more energy than smaller ones for the same prompts. 

As you might imagine, there’s also a big difference between text, images, or video. Videos generally took hundreds of times more energy to generate than text responses. 

2. What’s powering the grid will greatly affect the climate toll of AI’s energy use. 

As the resident climate reporter on this project, I was excited to take the expected energy toll and translate it into an expected emissions burden. 

Powering a data center with a nuclear reactor or a whole bunch of solar panels and batteries will not affect our planet the same way as burning mountains of coal. To quantify this idea, we used a figure called carbon intensity, a measure of how dirty a unit of electricity is on a given grid. 

We found that the same exact query, with the same exact energy demand, will have a very different climate impact depending on what the data center is powered by, and that depends on the location and the time of day. For example, querying a data center in West Virginia could cause nearly twice the emissions of querying one in California, according to calculations based on average data from 2024.
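The relationship behind this comparison is a simple product: emissions equal the energy used multiplied by the carbon intensity of the grid supplying it. Here is a minimal Python sketch of that arithmetic. The function name and every number in it are illustrative placeholders, not figures from the reporting; they are chosen only to show how the same query can produce roughly twice the emissions on a dirtier grid.

```python
def emissions_grams(energy_kwh: float, carbon_intensity_g_per_kwh: float) -> float:
    """Emissions (grams CO2) = energy used (kWh) x grid carbon intensity (g CO2 per kWh)."""
    return energy_kwh * carbon_intensity_g_per_kwh


# Placeholder values for illustration only (NOT measured data).
QUERY_ENERGY_KWH = 0.001      # hypothetical energy for one query
CLEANER_GRID_G_PER_KWH = 200  # hypothetical carbon intensity of a cleaner grid
DIRTIER_GRID_G_PER_KWH = 400  # hypothetical carbon intensity of a dirtier grid

cleaner = emissions_grams(QUERY_ENERGY_KWH, CLEANER_GRID_G_PER_KWH)
dirtier = emissions_grams(QUERY_ENERGY_KWH, DIRTIER_GRID_G_PER_KWH)

# Same query, same energy demand, different climate impact.
print(f"cleaner grid: {cleaner:.2f} g, dirtier grid: {dirtier:.2f} g")
```

Because the energy term is identical in both cases, the ratio of emissions is just the ratio of the two carbon intensities, which is why location and time of day matter so much.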

This point shows why it matters where tech giants are building data centers, what the grid looks like in their chosen locations, and how that might change with more demand from the new infrastructure. 

3. There is still so much that we don’t know when it comes to AI and energy. 

Our reporting resulted in estimates that are some of the most specific and comprehensive out there. But ultimately, we still have no idea what many of the biggest, most influential models are adding up to in terms of energy and emissions. None of the companies we reached out to were willing to provide numbers during our reporting. Not one.

Adding up our estimates can only go so far, in part because AI is increasingly everywhere. While today you might generally have to go to a dedicated site and type in questions, in the future AI could be stitched into the fabric of our interactions with technology. (See my colleague Will Douglas Heaven’s new story on Google’s I/O showcase: “By putting AI into everything, Google wants to make it invisible.”)

AI could be one of the major forces that shape our society, our work, and our power grid. Knowing more about its consequences could be crucial to planning our future. 

To dig into our reporting, give the main story a read. And if you’re looking for more details on how we came up with our numbers, you can check out this behind-the-scenes piece.

There are also some great related stories in this package, including one from James Temple on the data center boom in the Nevada desert, one from David Rotman about how AI’s rise could entrench natural gas, and one from Will Douglas Heaven on a few technical innovations that could help make AI more efficient. Oh, and I also have a piece on why nuclear isn’t the easy answer some think it is.

Find them, and the rest of the stories in the package, here.

This article is from The Spark, MIT Technology Review’s weekly climate newsletter. To receive it in your inbox every Wednesday, sign up here.


In 2003, engineers from Germany and Switzerland began building a bridge across the Rhine River simultaneously from both sides. Months into construction, they found that the two sides did not meet. The German side hovered 54 centimeters above the Swiss side.

The misalignment occurred because the German engineers had measured elevation with a historic level of the North Sea as their zero point, while the Swiss ones had used the Mediterranean Sea, which was 27 centimeters lower. We may speak colloquially of elevations with respect to “sea level,” but Earth’s seas are actually not level. “The sea level is varying from location to location,” says Laura Sanchez, a geodesist at the Technical University of Munich in Germany. (Geodesists study our planet’s shape, orientation, and gravitational field.) While the two teams knew about the 27-centimeter difference, they mixed up which side was higher. Ultimately, Germany lowered its side to complete the bridge.
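The arithmetic of that mix-up is worth spelling out: applying a 27-centimeter correction in the wrong direction does not cancel the offset but doubles it. The short Python sketch below illustrates this under the assumption that the error was a pure sign flip; the names are hypothetical and chosen only for this example.

```python
# Gap between the two national height references, in centimeters (from the article).
DATUM_GAP_CM = 27


def residual_mismatch_cm(true_gap_cm: float, applied_correction_cm: float) -> float:
    """Height mismatch left where the two halves meet after a correction is applied."""
    return abs(true_gap_cm - applied_correction_cm)


right_sign = residual_mismatch_cm(DATUM_GAP_CM, DATUM_GAP_CM)   # correction applied the right way
wrong_sign = residual_mismatch_cm(DATUM_GAP_CM, -DATUM_GAP_CM)  # correction applied the wrong way

print(right_sign, wrong_sign)  # prints: 0 54
```

With the sign flipped, the 27-centimeter offset is counted twice, which matches the 54-centimeter gap the engineers found.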

To prevent such costly construction errors, in 2015 scientists in the International Association of Geodesy voted to adopt the International Height Reference Frame, or IHRF, a worldwide standard for elevation. It’s the third-dimensional counterpart to latitude and longitude, says Sanchez, who helps coordinate the standardization effort. 

Now, a decade after its adoption, geodesists are looking to update the standard—by using the most precise clock ever to fly in space.

That clock, called the Atomic Clock Ensemble in Space, or ACES, launched into orbit from Florida last month, bound for the International Space Station. ACES, which was built by the European Space Agency, consists of two connected atomic clocks, one containing cesium atoms and the other containing hydrogen, combined to produce a single set of ticks with higher precision than either clock alone. 

Pendulum clocks are only accurate to about a second per day, as the rate at which a pendulum swings can vary with humidity, temperature, and the weight of extra dust. Atomic clocks in current GPS satellites will lose or gain a second on average every 3,000 years. ACES, on the other hand, “will not lose or gain a second in 300 million years,” says Luigi Cacciapuoti, an ESA physicist who helped build and launch the device. (In 2022, China installed a potentially stabler clock on its space station, but the Chinese government has not publicly shared the clock’s performance after launch, according to Cacciapuoti.) 
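To compare these clocks on one scale, it can help to convert each "one second per so many years" figure into a fractional error, meaning seconds of drift per second elapsed. The Python sketch below does that conversion. It assumes a 365.25-day year, and the helper name is made up for this example.

```python
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY  # assumption: Julian year


def fractional_error(years_per_second_of_drift: float) -> float:
    """Drift of one second over the given number of years, as a unitless fraction."""
    return 1.0 / (years_per_second_of_drift * SECONDS_PER_YEAR)


pendulum = 1.0 / SECONDS_PER_DAY      # about one second per day
gps = fractional_error(3_000)         # one second every 3,000 years
aces = fractional_error(300_000_000)  # one second in 300 million years

for name, value in [("pendulum", pendulum), ("GPS", gps), ("ACES", aces)]:
    print(f"{name}: {value:.2e}")
```

On these figures, ACES comes out about 100,000 times steadier than the clocks on current GPS satellites, since 300 million years divided by 3,000 years is 100,000.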

From space, ACES will link to some of the most accurate clocks on Earth to create a synchronized clock network, which will support its main purpose: to perform tests of fundamental physics. 

But it’s of special interest for geodesists because it can be used to make gravitational measurements that will help establish a more precise zero point from which to measure elevation across the world.

Alignment over this “zero point” (basically where you stick the end of the tape measure to measure elevation) is important for international collaboration. It makes it easier, for example, to monitor and compare sea-level changes around the world. It is especially useful for building infrastructure involving flowing water, such as dams and canals. In 2020, the international height standard even resolved a long-standing dispute between China and Nepal over Mount Everest’s height. For years, China said the mountain was 8,844.43 meters; Nepal measured it at 8,848 meters. Using the IHRF, the two countries finally agreed that the mountain was 8,848.86 meters.

A worker performs tests on ACES in a cleanroom at the Kennedy Space Center in Florida.
ESA-T. PEIGNIER

To create a standard zero point, geodesists create a model of Earth known as a geoid. Every point on the surface of this lumpy, potato-shaped model has the same gravitational potential, which means that if you dug a canal at the height of the geoid, the water within the canal would be level and would not flow. Distance from the geoid establishes a global system for altitude.

However, the current model lacks precision, particularly in Africa and South America, says Sanchez. Today’s geoid has been built using instruments that directly measure Earth’s gravity. These have been carried on satellites, which excel at getting a global but low-resolution view, and have also been used to get finer details via expensive ground- and airplane-based surveys. But geodesists have not had the funding to survey Africa and South America as extensively as other parts of the world, particularly in difficult terrain such as the Amazon rainforest and Sahara Desert. 

To understand the discrepancy in precision, imagine a bridge that spans Africa from the Mediterranean coast to Cape Town, South Africa. If it’s built using the current geoid, the two ends of the bridge will be misaligned by tens of centimeters. In comparison, you’d be off by at most five centimeters if you were building a bridge spanning North America. 

To improve the geoid’s precision, geodesists want to create a worldwide network of clocks, synchronized from space. The idea works according to Einstein’s theory of general relativity, which states that the stronger the gravitational field, the more slowly time passes. The 2014 sci-fi movie Interstellar illustrates an extreme version of this so-called time dilation: Two astronauts spend a few hours in extreme gravity near a black hole and return to find that a shipmate has aged more than two decades. Similarly, Earth’s gravity grows weaker the higher in elevation you are. Your feet, for example, experience slightly stronger gravity than your head when you’re standing. Assuming you live to be about 80 years old, over a lifetime your head will age tens of billionths of a second more than your feet.

A clock network would allow geodesists to compare the ticking of clocks all over the world. They could then use the variations in time to map Earth’s gravitational field much more precisely, and consequently create a more precise geoid. The most accurate clocks today are precise enough to measure variations in time that map onto centimeter-level differences in elevation. 
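To get a feel for the size of the effect, the standard weak-field approximation says that two clocks separated by a small height difference h near Earth’s surface tick at rates that differ by a fraction of roughly g·h/c². The Python sketch below uses that approximation with a single nominal value of g, which is a simplification (real geodesy accounts for how gravity varies from place to place). The function names are invented for this illustration.

```python
G_SURFACE = 9.81               # m/s^2, nominal surface gravity (assumed constant here)
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def fractional_rate_shift(height_difference_m: float) -> float:
    """Approximate fractional difference in tick rate between two clocks
    separated vertically by height_difference_m (weak-field approximation)."""
    return G_SURFACE * height_difference_m / SPEED_OF_LIGHT**2


def height_from_shift(fractional_shift: float) -> float:
    """Invert the approximation: height difference implied by a measured rate shift."""
    return fractional_shift * SPEED_OF_LIGHT**2 / G_SURFACE


one_cm = fractional_rate_shift(0.01)
print(f"1 cm of elevation shifts the tick rate by about {one_cm:.2e}")
```

Under these assumptions, a one-centimeter change in elevation shifts a clock’s rate by only about one part in 10^18, which gives a sense of how precise the clocks in such a network need to be.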

“We want to have the accuracy level at the one-centimeter or sub-centimeter level,” says Jürgen Müller, a geodesist at Leibniz University Hannover in Germany. Specifically, geodesists would use the clock measurements to validate their geoid model, which they currently do with ground- and plane-based surveying techniques. They think that a clock network should be considerably less expensive.

ACES is just a first step. It is capable of measuring altitudes at various points around Earth with 10-centimeter precision, says Cacciapuoti. But the point of ACES is to prototype the clock network. It will demonstrate the optical and microwave technology needed to use a clock in space to connect some of the most advanced ground-based clocks together. In the next year or so, Müller plans to use ACES to connect to clocks on the ground, starting with three in Germany. Müller’s team could then make more precise measurements at the location of those clocks.

These early studies will pave the way for work connecting even more precise clocks than ACES to the network, ultimately leading to an improved geoid. The best clocks today are some 50 times more precise than ACES. “The exciting thing is that clocks are getting even stabler,” says Michael Bevis, a geodesist at Ohio State University, who was not involved with the project. A more precise geoid would allow engineers, for example, to build a canal with better control of its depth and flow, he says. However, he points out that in order for geodesists to take advantage of the clocks’ precision, they will also have to improve their mathematical models of Earth’s gravitational field. 

Even starting to build this clock network has required decades of dedicated work by scientists and engineers. It took ESA three decades to make a clock as small as ACES that is suitable for space, says Cacciapuoti. This meant miniaturizing a clock the size of a laboratory into the size of a small fridge. “It was a huge engineering effort,” says Cacciapuoti, who has been working on the project since he began at ESA 20 years ago. 

Geodesists expect they’ll need at least another decade to develop the clock network and launch more clocks into space. One possibility would be to slot the clocks onto GPS satellites. The timeline depends on the success of the ACES mission and the willingness of government agencies to invest, says Sanchez. But whatever the specifics, mapping the world takes time.
