This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

We can’t “make American children healthy again” without tackling the gun crisis

This week, the Trump administration released a strategy for improving the health and well-being of American children. The report was titled—you guessed it—Make Our Children Healthy Again. It suggests American children should be eating more healthily. And they should be getting more exercise.

But there’s a glaring omission. The leading cause of death for American children and teenagers isn’t ultraprocessed food or exposure to some chemical. It’s gun violence. 

This week’s news of yet more high-profile shootings at schools in the US throws this disconnect into even sharper relief. Experts believe it is time to treat gun violence in the US as what it is: a public health crisis. Read the full story.

—Jessica Hamzelou

This article first appeared in The Checkup, MIT Technology Review’s weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here.

How do AI models generate videos?

It’s been a big year for video generation. In the last nine months OpenAI made Sora public, Google DeepMind launched Veo 3, and the video startup Runway launched Gen-4. All can produce video clips that are (almost) impossible to distinguish from actual filmed footage or CGI animation.

The downside is that creators are competing with AI slop, and social media feeds are filling up with faked news footage. Video generation also uses up a huge amount of energy, many times more than text or image generation.

With AI-generated videos everywhere, let’s take a moment to talk about the tech that makes them work. Read the full story.

—Will Douglas Heaven

This article is part of MIT Technology Review Explains, our series untangling the complex, messy world of technology to help you understand what’s coming next. You can read more from the series here.

Meet our 2025 Innovator of the Year: Sneha Goenka

Up to a quarter of children entering intensive care have undiagnosed genetic conditions. To be treated properly, they must first get diagnoses—which means having their genomes sequenced. This process typically takes up to seven weeks. Sadly, that’s often too slow to save a critically ill child.

Hospitals may soon have a faster option, thanks to a groundbreaking system built in part by Sneha Goenka, an assistant professor of electrical and computer engineering at Princeton—and MIT Technology Review’s 2025 Innovator of the Year. Read all about Goenka and her work in this profile.

—Helen Thomson

As well as our Innovator of the Year, Goenka is one of the biotech honorees on our 35 Innovators Under 35 list for 2025. Meet the rest of our biotech and materials science innovators, and the full list here.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 OpenAI and Microsoft have agreed a revised deal
But haven’t actually revealed any details of said deal. (Axios)
+ The news comes as OpenAI keeps pursuing its for-profit pivot. (Ars Technica)
+ The world’s largest startup is going to need more paying users soon. (WSJ $)

2 A child has died from a measles complication in Los Angeles
They had contracted the virus before they were old enough to be vaccinated. (Ars Technica)
+ Infants are best protected by community immunity. (LA Times $)
+ They’d originally recovered from measles before developing the condition. (CNN)
+ Why childhood vaccines are a public health success story. (MIT Technology Review)

3 Ukrainian drone attacks triggered internet blackouts in Russia
The Kremlin cut internet access in a bid to thwart the mobile-guided drones. (FT $)
+ The UK is poised to mass-produce drones to aid Ukraine. (Sky News)
+ On the ground in Ukraine’s largest Starlink repair shop. (MIT Technology Review)

4 Demis Hassabis says AI may slash drug discovery time to under a year
Or perhaps even faster. (Bloomberg $)
+ But there’s good reason to be skeptical of that claim. (FT $)
+ An AI-driven “factory of drugs” claims to have hit a big milestone. (MIT Technology Review)

5 How chatbots alter how we think
We shouldn’t outsource our critical thinking to them. (Undark)
+ AI companies have stopped warning you that their chatbots aren’t doctors. (MIT Technology Review)

6 Fraudsters are threatening small businesses with one-star reviews
Online reviews can make or break fledgling enterprises, and scammers know it. (NYT $)

7 Why humanoid robots aren’t taking off any time soon
The industry has a major hype problem. (IEEE Spectrum)
+ Chinese tech giant Ant Group showed off its own humanoid machine. (The Verge)
+ Why the humanoid workforce is running late. (MIT Technology Review)

8 Encyclopedia Britannica and Merriam-Webster are suing Perplexity
In yet another case of alleged copyright infringement. (Reuters)
+ What comes next for AI copyright lawsuits? (MIT Technology Review)

9 Where we’re most likely to find extraterrestrial life in the next decade
Warning: Hollywood may have given us unrealistic expectations. (BBC)

10 Want to build a trillion-dollar company?
Then kiss your social life goodbye. (WSJ $)

Quote of the day

“Nooooo I’m going to have to use my brain again and write 100% of my code like a caveman from December 2024.”

—A Hacker News commenter jokes about a service outage that left Anthropic users unable to access its AI coding tools, Ars Technica reports.

One more thing


What Africa needs to do to become a major AI player

Africa is still early in the process of adopting AI technologies. But researchers say the continent is uniquely hospitable to it for several reasons, including a relatively young and increasingly well-educated population, a rapidly growing ecosystem of AI startups, and lots of potential consumers.

However, ambitious efforts to develop AI tools that answer the needs of Africans face numerous hurdles. Read our story to learn what they are, and how they could be overcome.

—Abdullahi Tsanni

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or skeet ’em at me.)

+ The fascinating, unexpected origins of everyone’s favorite pastime—karaoke.
+ Why the Twilight juggernaut just refuses to die.
+ If you’re among the mass of excited Hollow Knight fans, here are a few tips to get through the early stages of the new Silksong game.
+ A sloe gin bramble pie sounds like the perfect way to welcome fall.


MIT Technology Review Explains: Let our writers untangle the complex, messy world of technology to help you understand what’s coming next. You can read more from the series here.

It’s been a big year for video generation. In the last nine months OpenAI made Sora public, Google DeepMind launched Veo 3, and the video startup Runway launched Gen-4. All can produce video clips that are (almost) impossible to distinguish from actual filmed footage or CGI animation. This year also saw Netflix debut an AI visual effect in its show The Eternaut, the first time video generation has been used to make mass-market TV.

Sure, the clips you see in demo reels are cherry-picked to showcase a company’s models at the top of their game. But with the technology in the hands of more users than ever before—Sora and Veo 3 are available in the ChatGPT and Gemini apps for paying subscribers—even the most casual filmmaker can now knock out something remarkable. 

The downside is that creators are competing with AI slop, and social media feeds are filling up with faked news footage. Video generation also uses up a huge amount of energy, many times more than text or image generation. 

With AI-generated videos everywhere, let’s take a moment to talk about the tech that makes them work.

How do you generate a video?

Let’s assume you’re a casual user. There is now a range of high-end tools that allow pro video makers to insert video generation models into their workflows. But most people will use this technology in an app or via a website. You know the drill: “Hey, Gemini, make me a video of a unicorn eating spaghetti. Now make its horn take off like a rocket.” What you get back will be hit or miss, and you’ll typically need to ask the model to take another pass or 10 before you get more or less what you wanted. 

So what’s going on under the hood? Why is it hit or miss—and why does it take so much energy? The latest wave of video generation models are what’s known as latent diffusion transformers. Yes, that’s quite a mouthful. Let’s unpack each part in turn, starting with diffusion. 

What’s a diffusion model?

Imagine taking an image and adding a random spattering of pixels to it. Take that pixel-spattered image and spatter it again and then again. Do that enough times and you will have turned the initial image into a random mess of pixels, like static on an old TV set. 
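To make the spattering concrete, here is a minimal sketch in Python. It treats an image as a plain list of pixel values and, at each step, shrinks the picture slightly and mixes in a dose of random noise. The function names, the noise dose (beta), and the step count are illustrative assumptions, not the settings of any real model.

```python
import math
import random

def add_noise_step(pixels, beta, rng):
    """One spattering pass: fade the picture a little and mix in random noise."""
    keep = math.sqrt(1.0 - beta)   # how much of the current picture survives
    spread = math.sqrt(beta)       # how much fresh noise gets mixed in
    return [keep * p + spread * rng.gauss(0.0, 1.0) for p in pixels]

def noise_image(pixels, steps, beta=0.05, seed=0):
    """Apply the pass repeatedly and keep every intermediate stage."""
    rng = random.Random(seed)
    history = [list(pixels)]
    for _ in range(steps):
        history.append(add_noise_step(history[-1], beta, rng))
    return history

def signal_remaining(steps, beta=0.05):
    """Fraction of the original picture still present after `steps` passes."""
    return (1.0 - beta) ** (steps / 2)

stages = noise_image([1.0] * 8, steps=200)   # a toy 8-pixel "image"
print(signal_remaining(200))                 # well under 1% of the picture is left
```

After a couple of hundred passes, almost nothing of the original remains, which is the "static on an old TV set" end state.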

A diffusion model is a neural network trained to reverse that process, turning random static into images. During training, it gets shown millions of images in various stages of pixelation. It learns how those images change each time new pixels are thrown at them and, thus, how to undo those changes. 

The upshot is that when you ask a diffusion model to generate an image, it will start off with a random mess of pixels and step by step turn that mess into an image that is more or less similar to images in its training set. 
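The cleanup loop can be sketched in a few lines of Python, with one big caveat: a real diffusion model uses a trained neural network to guess the noise in each image, whereas this toy swaps in an exact formula that only works because the hypothetical "training set" is just numbers clustered around 3.0. Every name and setting here (MU, SIGMA, STEPS, BETA, predict_noise) is an assumption made for illustration.

```python
import math
import random

MU, SIGMA = 3.0, 0.5      # hypothetical "training data": numbers clustered near 3.0
STEPS, BETA = 200, 0.02   # assumed schedule: the same small dose of noise each step

# ALPHA_BAR[t] is the share of the original signal left after t noising steps.
ALPHA_BAR = [(1.0 - BETA) ** t for t in range(STEPS + 1)]

def predict_noise(x, t):
    """Stand-in for the trained network: the best guess of the noise hiding in x."""
    a_bar = ALPHA_BAR[t]
    variance = a_bar * SIGMA ** 2 + (1.0 - a_bar)   # spread of noised data at step t
    return math.sqrt(1.0 - a_bar) * (x - math.sqrt(a_bar) * MU) / variance

def generate(rng):
    """Start from pure static and remove a little predicted noise at every step."""
    x = rng.gauss(0.0, 1.0)
    for t in range(STEPS, 0, -1):
        eps = predict_noise(x, t)
        x = (x - BETA / math.sqrt(1.0 - ALPHA_BAR[t]) * eps) / math.sqrt(1.0 - BETA)
        if t > 1:
            x += math.sqrt(BETA) * rng.gauss(0.0, 1.0)   # keep some randomness until the end
    return x

rng = random.Random(0)
samples = [generate(rng) for _ in range(500)]
print(sum(samples) / len(samples))   # lands close to MU, like the "training data"
```

Each sample starts as random static and ends up looking like the data the denoiser knows about, which is the behavior described above in miniature.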

But you don’t want any image—you want the image you specified, typically with a text prompt. And so the diffusion model is paired with a second model—such as a large language model (LLM) trained to match images with text descriptions—that guides each step of the cleanup process, pushing the diffusion model toward images that the large language model considers a good match to the prompt. 
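Here is a toy Python sketch of that guiding hand. It assumes a world with only two possible "images," represented by the numbers -2.0 (cat) and 2.0 (dog). The denoiser on its own drifts toward whichever is nearest; the guide adds a nudge toward the one the prompt asked for. The targets, function names, and weights are all hypothetical stand-ins for what real models learn.

```python
TARGETS = {"cat": -2.0, "dog": 2.0}   # hypothetical: each prompt's ideal "image"

def denoise(x, rate=0.1):
    """Stand-in denoiser: a small step toward whichever known image is nearest."""
    nearest = min(TARGETS.values(), key=lambda target: abs(target - x))
    return rate * (nearest - x)

def match_gradient(x, prompt):
    """Stand-in text-image matcher: the direction that improves the match to the prompt."""
    return TARGETS[prompt] - x

def generate(prompt, start, steps=100, guidance_weight=0.2):
    """Run the cleanup loop, letting the matcher steer every step."""
    x = start
    for _ in range(steps):
        x = x + denoise(x) + guidance_weight * match_gradient(x, prompt)
    return x

print(generate("dog", start=-1.5))                        # steered across to the dog image
print(generate("dog", start=-1.5, guidance_weight=0.0))   # no guide: settles on the cat image
```

The same starting static produces different results depending on the prompt, because the guide weighs in at every step rather than only at the end.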

An aside: This LLM isn’t pulling the links between text and images out of thin air. Most text-to-image and text-to-video models today are trained on large data sets that contain billions of pairings of text and images or text and video scraped from the internet (a practice many creators are very unhappy about). This means that what you get from such models is a distillation of the world as it’s represented online, distorted by prejudice (and pornography).

It’s easiest to imagine diffusion models working with images. But the technique can be used with many kinds of data, including audio and video. To generate movie clips, a diffusion model must clean up sequences of images—the consecutive frames of a video—instead of just one image. 

What’s a latent diffusion model? 

All this takes a huge amount of compute (read: energy). That’s why most diffusion models used for video generation use a technique called latent diffusion. Instead of processing raw data—the millions of pixels in each video frame—the model works in what’s known as a latent space, in which the video frames (and text prompt) are compressed into a mathematical code that captures just the essential features of the data and throws out the rest. 

A similar thing happens whenever you stream a video over the internet: A video is sent from a server to your screen in a compressed format to make it get to you faster, and when it arrives, your computer or TV will convert it back into a watchable video. 

And so the final step is to decompress what the latent diffusion process has come up with. Once the compressed frames of random static have been turned into the compressed frames of a video that the LLM guide considers a good match for the user’s prompt, the compressed video gets converted into something you can watch.  

With latent diffusion, the diffusion process works more or less the way it would for an image. The difference is that the pixelated video frames are now mathematical encodings of those frames rather than the frames themselves. This makes latent diffusion far more efficient than a typical diffusion model. (Even so, video generation still uses more energy than image or text generation. There’s just an eye-popping amount of computation involved.) 
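A rough Python sketch shows why working in a compressed space saves so much effort. Real latent diffusion models learn their compression with a neural network; this example substitutes simple block averaging, which is an assumption chosen only because it is easy to follow. The function names and the block size are likewise illustrative.

```python
def encode(frame, block=4):
    """Compress: replace each block x block tile of pixels with its average."""
    height, width = len(frame), len(frame[0])
    return [
        [
            sum(frame[r + i][c + j] for i in range(block) for j in range(block)) / block ** 2
            for c in range(0, width, block)
        ]
        for r in range(0, height, block)
    ]

def decode(latent, block=4):
    """Decompress: expand each latent value back into a block x block tile."""
    return [
        [value for value in row for _ in range(block)]
        for row in latent
        for _ in range(block)
    ]

frame = [[float(r * 8 + c) for c in range(8)] for r in range(8)]   # toy 8 x 8 frame
latent = encode(frame)      # 2 x 2 latent: 4 numbers instead of 64
restored = decode(latent)   # back to 8 x 8, with the fine detail smoothed away
print(len(frame) * len(frame[0]), "pixels ->", len(latent) * len(latent[0]), "latent values")
```

In this toy, the diffusion process would have 16 times fewer numbers to clean up per frame. The price is lost detail, which is why real systems use learned compression that keeps the features that matter.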

What’s a latent diffusion transformer?

Still with me? There’s one more piece to the puzzle—and that’s how to make sure the diffusion process produces a sequence of frames that are consistent, maintaining objects and lighting and so on from one frame to the next. OpenAI did this with Sora by combining its diffusion model with another kind of model called a transformer. This has now become standard in generative video. 

Transformers are great at processing long sequences of data, like words. That has made them the special sauce inside large language models such as OpenAI’s GPT-5 and Google DeepMind’s Gemini, which can generate long sequences of words that make sense, maintaining consistency across many dozens of sentences. 

But videos are not made of words. Instead, videos get cut into chunks that can be treated as if they were. The approach that OpenAI came up with was to dice videos up across both space and time. “It’s like if you were to have a stack of all the video frames and you cut little cubes from it,” says Tim Brooks, a lead researcher on Sora.

A selection of videos generated with Veo 3 and Midjourney. The clips have been enhanced in postproduction with Topaz, an AI video-editing tool. Credit: VaigueMan

Using transformers alongside diffusion models brings several advantages. Because they are designed to process sequences of data, transformers also help the diffusion model maintain consistency across frames as it generates them. This makes it possible to produce videos in which objects don’t pop in and out of existence, for example. 

And because the videos are diced up, their size and orientation do not matter. This means that the latest wave of video generation models can be trained on a wide range of example videos, from short vertical clips shot with a phone to wide-screen cinematic films. The greater variety of training data has made video generation far better than it was just two years ago. It also means that video generation models can now be asked to produce videos in a variety of formats. 
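The dicing can be sketched in Python as follows. A video is represented as a nested list (frames, then rows, then columns), and the function cuts it into small cubes that span both space and time, flattening each cube into one chunk. The 2 x 2 x 2 cube size and the helper names are assumptions for illustration; the source does not give Sora's actual patch dimensions.

```python
def spacetime_patches(video, pt=2, ph=2, pw=2):
    """Cut a video (frames x rows x cols) into flattened pt x ph x pw cubes."""
    frames, rows, cols = len(video), len(video[0]), len(video[0][0])
    patches = []
    for t in range(0, frames - pt + 1, pt):
        for r in range(0, rows - ph + 1, ph):
            for c in range(0, cols - pw + 1, pw):
                patches.append([
                    video[t + dt][r + dr][c + dc]
                    for dt in range(pt) for dr in range(ph) for dc in range(pw)
                ])
    return patches

def blank_video(frames, rows, cols):
    """Helper for the demo: a video of the given shape filled with zeros."""
    return [[[0.0] * cols for _ in range(rows)] for _ in range(frames)]

vertical = spacetime_patches(blank_video(4, 8, 4))    # tall, phone-style clip
wide = spacetime_patches(blank_video(4, 4, 12))       # wide-screen clip
print(len(vertical), len(wide))                       # different numbers of chunks...
print(len(vertical[0]), len(wide[0]))                 # ...but every chunk is the same size
```

Both clips become sequences of identically sized chunks, differing only in how many there are. That is the property that lets a transformer handle mixed formats the way it handles sentences of different lengths.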

What about the audio? 

A big advance with Veo 3 is that it generates video with audio, from lip-synched dialogue to sound effects to background noise. That’s a first for video generation models. As Google DeepMind CEO Demis Hassabis put it at this year’s Google I/O: “We’re emerging from the silent era of video generation.” 

The challenge was to find a way to line up video and audio data so that the diffusion process would work on both at the same time. Google DeepMind’s breakthrough was a new way to compress audio and video into a single piece of data inside the diffusion model. When Veo 3 generates a video, its diffusion model produces audio and video together in a lockstep process, ensuring that the sound and images are synched.  

You said that diffusion models can generate different kinds of data. Is this how LLMs work too? 

No—or at least not yet. Diffusion models are most often used to generate images, video, and audio. Large language models—which generate text (including computer code)—are built using transformers. But the lines are blurring. We’ve seen how transformers are now being combined with diffusion models to generate videos. And this summer Google DeepMind revealed that it was building an experimental large language model that used a diffusion model instead of a transformer to generate text. 

Here’s where things start to get confusing: Though video generation (which uses diffusion models) consumes a lot of energy, diffusion models themselves are in fact more efficient than transformers. Thus, by using a diffusion model instead of a transformer to generate text, Google DeepMind’s new LLM could be a lot more efficient than existing LLMs. Expect to see more from diffusion models in the near future!


Note for readers: This newsletter discusses gun violence, a raw and tragic issue in America. It was already in progress on Wednesday when a school shooting occurred at Evergreen High School in Colorado and Charlie Kirk was shot and killed at Utah Valley University. 

Earlier this week, the Trump administration’s Make America Healthy Again movement released a strategy for improving the health and well-being of American children. The report was titled—you guessed it—Make Our Children Healthy Again.

Robert F. Kennedy Jr., who leads the Department of Health and Human Services, and his colleagues are focusing on four key aspects of child health: diet, exercise, chemical exposure, and overmedicalization.

Anyone who’s been listening to RFK Jr. posturing on health and wellness won’t be surprised by these priorities. And the first two are pretty obvious. On the whole, American children should be eating more healthily. And they should be getting more exercise.

But there’s a glaring omission. The leading cause of death for American children and teenagers isn’t ultraprocessed food or exposure to some chemical. It’s gun violence.

Yesterday’s news of yet more high-profile shootings at schools in the US throws this disconnect into even sharper relief. Experts believe it is time to treat gun violence in the US as what it is: a public health crisis.

I live in London, UK, with my husband and two young children. We don’t live in a particularly fancy part of the city—in one recent ranking of London boroughs from most to least posh, ours came in at 30th out of 33. I do worry about crime. But I don’t worry about gun violence.

That changed when I temporarily moved my family to the US a couple of years ago. We rented the ground-floor apartment of a lovely home in Cambridge, Massachusetts—a beautiful area with good schools, pastel-colored houses, and fluffy rabbits hopping about. It wasn’t until after we’d moved in that my landlord told me he had guns in the basement.

My daughter joined the kindergarten of a local school that specialized in music, and we took her younger sister along to watch the kids sing songs about friendship. It was all so heartwarming—until we noticed the school security officer at the entrance carrying a gun.

Later in the year, I received an email alert from the superintendent of the Cambridge Public Schools. “At approximately 1:45 this afternoon, a Cambridge Police Department Youth Officer assigned to Cambridge Rindge and Latin School accidentally discharged their firearm while using a staff bathroom inside the school,” the message began. “The school day was not disrupted.”

These experiences, among others, truly brought home to me the cultural differences over firearms between the US and the UK (along with most other countries). For the first time, I worried about my children’s exposure to them. I banned my children from accessing parts of the house. I felt guilty that my four-year-old had to learn what to do if a gunman entered her school. 

But it’s the statistics that are the most upsetting.

In 2023, 46,728 people died from gun violence in the US, according to a report published in June by the Johns Hopkins Bloomberg School of Public Health. That includes both homicides and suicides, and it breaks down to 128 deaths per day, on average. The majority of those who die from gun violence are adults. But the figures for children are sickening, too. In 2023, 2,566 young people died from gun violence. Of those, 234 were under the age of 10.

Gun death rates among children have more than doubled since 2013. Firearms are involved in more child deaths than cancer or car crashes.

Many other children survive gun violence with nonfatal—but often life-changing—injuries. And the impacts are felt beyond those who are physically injured. Witnessing gun violence or hearing gunshots can understandably cause fear, sadness, and distress.  

That’s worth bearing in mind when you consider that there have been 434 school shootings in the US since Columbine in 1999. The Washington Post estimates that 397,000 students have experienced gun violence at school in that period. Another school shooting took place at Evergreen High School in Colorado on Wednesday, adding to that total.

“Being indirectly exposed to gun violence takes its toll on our mental health and children’s ability to learn,” says Daniel Webster, Bloomberg Professor of American Health at the Johns Hopkins Center for Gun Violence Solutions in Baltimore.

The MAHA report states that “American youth face a mental health crisis,” going on to note that “suicide deaths among 10- to 24-year-olds increased by 62% from 2007 to 2021” and that “suicide is now the leading cause of death in teens aged 15-19.” What it doesn’t say is that around half of these suicides involve guns.

“When you add all these dimensions, [gun violence is] a very huge public health problem,” says Webster.

Researchers who study gun violence have been saying the same thing for years. And in 2024, then US Surgeon General Vivek Murthy declared it a public health crisis. “We don’t have to subject our children to the ongoing horror of firearm violence in America,” Murthy said in a statement at the time. Instead, he argued, we should tackle the problem using a public health approach.

Part of that approach involves identifying who is at the greatest risk and offering support to lower that risk, says Webster. Young men who live in poor communities tend to have the highest risk of gun violence, he says, as do those who experience crisis or turmoil. Trying to mediate conflicts or limit access to firearms, even temporarily, can help lower the incidence of gun violence, he says.

There’s an element of social contagion, too, adds Webster. Shooting begets more shooting. He likens it to the outbreak of an infectious disease. “When more people get vaccinated … infection rates go down,” he says. “Almost exactly the same thing happens with gun violence.”

But existing efforts are already under threat. The Trump administration has eliminated hundreds of millions of dollars in grants for organizations working to reduce gun violence.

Webster thinks the MAHA report has “missed the mark” when it comes to the health and well-being of children in the US. “This document is almost the polar opposite to how many people in public health think,” he says. “We have to acknowledge that injuries and deaths from firearms are a big threat to the health and safety of children and adolescents.”

This article first appeared in The Checkup, MIT Technology Review’s weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here.


Generative AI has the potential to transform the finance function. By taking on some of the more mundane tasks that can occupy a lot of time, generative AI tools can help free up capacity for more high-value strategic work. For chief financial officers, this could mean spending more time and energy on proactively advising the business on financial strategy as organizations around the world continue to weather ongoing geopolitical and financial uncertainty.

CFOs can use large language models (LLMs) and generative AI tools to support everyday tasks like generating quarterly reports, communicating with investors, and formulating strategic summaries, says Andrew W. Lo, Charles E. and Susan T. Harris professor and director of the Laboratory for Financial Engineering at the MIT Sloan School of Management. “LLMs can’t replace the CFO by any means, but they can take a lot of the drudgery out of the role by providing first drafts of documents that summarize key issues and outline strategic priorities.”

Generative AI is also showing promise in functions like treasury, with use cases including cash, revenue, and liquidity forecasting and management, as well as automating contracts and investment analysis. However, challenges still remain for generative AI to contribute to forecasting due to the mathematical limitations of LLMs. Regardless, Deloitte’s analysis of its 2024 State of Generative AI in the Enterprise survey found that one-fifth (19%) of finance organizations have already adopted generative AI in the finance function.

Despite return on generative AI investments in finance functions being 8 points below expectations so far for surveyed organizations (see Figure 1), some finance departments appear to be moving ahead with investments. Deloitte’s fourth-quarter 2024 North American CFO Signals survey found that 46% of CFOs who responded expect deployment or spend on generative AI in finance to increase in the next 12 months (see Figure 2). Respondents cite the technology’s potential to help control costs through self-service and automation and free up workers for higher-level, higher-productivity tasks as some of the top benefits of the technology.

“Companies have used AI on the customer-facing side of the house for a long time, but in finance, employees are still creating documents and presentations and emailing them around,” says Robyn Peters, principal in finance transformation at Deloitte Consulting LLP. “Largely, the human-centric experience that customers expect from brands in retail, transportation, and hospitality haven’t been pulled through to the finance organization. And there’s no reason we cannot do that—and, in fact, AI makes it a lot easier to do.”

If CFOs think they can just sit by for the next five years and watch how AI evolves, they may lose out to more nimble competitors that are actively experimenting in the space. Future finance professionals are growing up using generative AI tools too. CFOs should consider reimagining what it looks like to be a successful finance professional, in collaboration with AI.

Download the report.

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff. It was researched, designed, and written by human writers, editors, analysts, and illustrators. AI tools that may have been used were limited to secondary production processes that passed thorough human review.
