Ian Sansavera, a software engineer at a New York startup called Runway AI, typed a short description of what he wanted to see in a video: “Quiet River in the Woods.”
Less than two minutes later, an internet service he was testing produced a short video of a calm river in a forest. The running water shimmered in the sun as it cut through trees and ferns, turned a corner, and splashed gently against the rocks.
Runway, which plans to open its service to a small group of testers this week, is one of several companies building artificial intelligence technology that will soon allow people to create videos simply by typing several words into a box on a computer screen.
They represent the next stage in the industry’s race — a race that includes giants like Microsoft and Google as well as much smaller startups — to create new types of AI systems that some believe could be the next big thing in technology, as important as web browsers or the iPhone.
New video-creation systems could speed up the work of filmmakers and other digital artists, while also becoming a new, fast way to produce hard-to-detect misinformation, making it even harder to tell what’s real online.
The systems are examples of what is known as generative artificial intelligence, which can instantly generate text, images, and sounds. Another example is ChatGPT, the online chatbot created by San Francisco startup OpenAI, which stunned the tech industry with its capabilities late last year.
Google and Meta, the parent company of Facebook, unveiled their first video-generation systems last year, but did not share them with the public out of concern that the systems could eventually be used to spread disinformation with newfound speed and efficiency.
But Runway CEO Cristobal Valenzuela said he believes the technology is too important to be kept in a research lab, despite its risks. “This is one of the most impressive technologies we’ve built in the last 100 years,” he said. “You need people to actually use it.”
The ability to edit and manipulate film and video is nothing new, of course. Filmmakers have been doing it for over a century. In recent years, researchers and digital artists have used various AI technologies and programs to create and edit videos that are often called deepfakes.
But systems like the one Runway has created could, in time, replace editing skills with the push of a button.
Runway’s technology produces videos from any short description. To get started, you simply write a description much as you would a quick note.
This works best if the scene contains some action, but not too much of it: something like “Rainy day in a big city” or “A dog with a cell phone in the park.” Press Enter, and the system generates a video in a minute or two.
The technology can reproduce common images, like a cat sleeping on a rug. Or it can combine disparate concepts to create oddly entertaining videos, like a cow at a birthday party.
The videos are only four seconds long, and they are choppy and blurry if you look closely. Sometimes the images are strange, distorted, and disturbing. The system has a way of fusing animals like dogs and cats with inanimate objects like balls and cell phones. But given the right prompt, it produces videos that show where the technology is headed.
“At this point, if I see an HD video, I’m probably going to trust it. But that’s going to change very quickly,” said Philip Isola, a professor at MIT who specializes in artificial intelligence.
Like other generative AI technologies, Runway’s system learns by analyzing digital data — in this case, photos, videos, and captions describing what those images contain. By training this type of technology on increasingly large amounts of data, researchers are confident they can rapidly improve and expand its skills. Soon, experts believe, such systems will generate professional-looking mini-movies, complete with music and dialogue.
It is hard to categorize what the system currently creates. It’s not a photograph. It’s not a cartoon. It’s a collection of many pixels blended together to create a realistic video. The company plans to offer its technology alongside other tools that it believes will speed up the work of professional artists.
For the past month, social media has been abuzz with images of Pope Francis in a white Balenciaga puffer coat, a surprisingly modern outfit for the 86-year-old pope. But the images were not real: a 31-year-old construction worker from Chicago had created them with a popular AI tool called Midjourney, causing quite a stir.
Dr. Isola has spent years building and testing this type of technology, first as a researcher at the University of California, Berkeley, and at OpenAI, and then as a professor at MIT.
“There was a time when people would post deep fakes and they wouldn’t fool me, because it was too weird or too unrealistic,” he said. “Now, we can’t take any of the images we see online at face value.”
Midjourney is one of many services that can create realistic still images from a short prompt. Other applications include Stable Diffusion and DALL-E, the OpenAI technology that started this wave of image generators when it was unveiled a year ago.
Midjourney relies on a neural network that learns its skills by analyzing huge amounts of data. It looks for patterns as it combs through millions of digital images as well as the text captions describing what those images depict.
When someone describes an image for the system, it generates a list of features the image might have. One feature might be the curve at the top of a dog’s ear; another might be the edge of a cell phone. Then a second neural network, called a diffusion model, generates the pixels needed to render those features and assembles them into a coherent image.
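The core idea of a diffusion model — start from pure noise and repeatedly remove a predicted portion of that noise until a coherent image emerges — can be caricatured in a few lines. The sketch below is a toy, not Midjourney’s actual model: the “target features” are a hand-made pattern standing in for what a real system would infer from a text prompt, and the noise prediction is faked with the residual that a trained network would normally learn to estimate.

```python
import random

def toy_denoise(target, steps=50, seed=0):
    """Toy diffusion-style generation: begin with pure noise and
    iteratively nudge the pixels toward an image consistent with
    the requested features. A real diffusion model would use a
    learned neural network to predict the noise to remove at each
    step; here that prediction is simulated with the known residual."""
    rng = random.Random(seed)
    image = [rng.gauss(0, 1) for _ in target]  # start from pure noise
    for t in range(steps):
        step = 1.0 / (steps - t)               # denoise more aggressively near the end
        predicted_noise = [x - y for x, y in zip(image, target)]
        image = [x - step * n for x, n in zip(image, predicted_noise)]
    return image

# "Features" here are just a hand-made 2x2 pixel pattern, flattened.
target = [1.0, 0.0, 0.0, 1.0]
result = toy_denoise(target)
```

After 50 steps the noisy start converges onto the target pattern; the interesting part in a real system is that the noise predictor is learned from millions of image–caption pairs rather than given.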
Companies like Runway, which has about 40 employees and has raised $95.5 million, are using this technology to create moving images. By analyzing thousands of video clips, their technology can learn to string many still images together in a similarly coherent fashion.
“Video is just a series of frames — still images — that are combined in a way that gives the illusion of movement,” said Mr. Valenzuela. “The trick is to train a model that understands the relationship and consistency between each frame.”
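Mr. Valenzuela’s point about consistency between frames can be illustrated with a toy experiment. Below, a single “pixel value” evolves over 24 frames in two ways: sampled independently each frame (what you would get by running a still-image generator 24 separate times, producing flicker) versus conditioned on the previous frame, which is the rough intuition behind video models. The functions and the `smoothness` parameter are illustrative inventions, not Runway’s method.

```python
import random

def generate_frames_independent(n, seed=0):
    # Naive approach: sample every frame on its own.
    # No temporal consistency, so the result flickers.
    rng = random.Random(seed)
    return [rng.uniform(0, 1) for _ in range(n)]

def generate_frames_consistent(n, seed=0, smoothness=0.9):
    # Video-style approach: each frame is conditioned on the
    # previous one, so consecutive frames differ only slightly.
    rng = random.Random(seed)
    frames = [rng.uniform(0, 1)]
    for _ in range(n - 1):
        frames.append(smoothness * frames[-1] + (1 - smoothness) * rng.uniform(0, 1))
    return frames

def jitter(frames):
    # Average frame-to-frame change; lower means smoother "motion".
    return sum(abs(b - a) for a, b in zip(frames, frames[1:])) / (len(frames) - 1)

independent = generate_frames_independent(24)
consistent = generate_frames_consistent(24)
```

Measuring the average frame-to-frame change shows the conditioned sequence is far smoother, which is exactly the property a video model must learn at the scale of full images.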
Like early versions of tools such as DALL-E and Midjourney, the technology sometimes combines concepts and images in strange ways. Ask for a bear playing basketball, and it may give you a sort of morphing stuffed animal holding a ball. Ask for a dog with a cell phone in the park, and it may give you a phone-carrying puppy with an oddly human body.
But experts believe they can iron out the flaws as they train their systems on more and more data. They believe the technology will eventually make creating a video as easy as writing a sentence.
“In the old days, to do anything remotely like this, you had to have a camera. You had to have props. You had to have a location. You had to have permission. You had to have money,” said Susan Bonser, an author and publisher in Pennsylvania who has been experimenting with early incarnations of generative video technology. “You don’t have to have any of that now. You can just sit down and imagine it.”