OsaTeching

Google Veo 2 is one of the best AI video models I have ever seen.


Google's advanced AI lab, DeepMind, has been busy over the past few weeks. Its latest release is a new version of the Veo artificial intelligence video model, which seems to understand physics better than any other AI video tool to date.

First announced at Google I/O earlier this year, Veo is one of the best AI video generators and a direct competitor to OpenAI's Sora. The new version takes it to a whole new level.

Veo 2 not only improves visual realism but also has a better grasp of physics, which results in more accurate depictions of motion. In one example video, someone slices a tomato cleanly, something I have not seen any other video model manage.

The new Veo model is still waitlist-only, but you can sign up to get access when it becomes available through Google Labs. It has been built into the VideoFX experiment and can create 4K clips up to a minute long.

I have not tried Veo 2 myself yet, but the videos Google has released (including one showing bees swarming around a beekeeper) look more realistic than anything I have tried, even Pika 2.0. No model has fully solved physics yet; it is still a work in progress, and Veo 2 itself is still rolling out.

While waiting for access, I scoured social media and the Veo 2 website to gather some of the best examples of what Veo 2 can do.

I chose the beekeeper video because it handles the complex interactions between the individual bees and the beekeeper. The bees move naturally, and the beekeeper holds a jar of honey in his hand. This may seem trivial, but other models struggle with each of these elements even on its own.

Prompt from Google: “The camera floats gently through a row of wooden beehives painted in pastel colors, buzzing bees moving in and out of the frame. A beekeeper's stark white outfit glistens in the golden afternoon light. He lifts a jar of honey and tilts it slightly to catch the light. Behind him, tall sunflowers sway rhythmically in the breeze, their petals glistening in the warm sunlight. The camera tilts upward to reveal a retro farmhouse with mint green shutters, its walls dappled with the shadows of swaying trees. The golden light, captured on Kodak Portra 400 film with a 35mm lens, creates rich textures in the farmer's gloves, marmalade jars, and the weathered wood of the beehives.”

A few years ago, when OpenAI first unveiled its DALL-E 3 image model, it used flamingos. I don't know whether this was intentional on Google's part, but its examples include several flamingo videos. This one captures the movement of the water, the physics of the dog's weight on the float, and the lighting.

Prompt from Google: “The cinematic shot captures a fluffy cockapoo perched atop a bright pink flamingo float in a sun-drenched Los Angeles swimming pool. The clear water glistens in the bright California sun, creating a playful scene. The cockapoo's soft fur, a mixture of white and apricot, is illuminated by the golden sunlight, and its floppy ears gently sway in the breeze. Their happy expressions and wagging tails convey pure joy and summer bliss. Bright pink flamingos add a touch of whimsy, creating a picturesque image of carefree fun in the Los Angeles sun.”

This prompt made me hungry, and it made me want to brew some coffee. Curiously, pouring liquid is something other models handle poorly, but Veo 2 did it perfectly.

Prompt from Google: “The sun slowly rises behind a perfectly plated breakfast scene. Thick, golden maple syrup is poured in slow motion over the fluffy pancakes, each one releasing soft, warm steam. Crispy bacon strips sizzle and the golden fat rises into the air in tiny flames. Coffee poured into a clear cup forms a layer of deep brown crema. The scene ends with the camera swooping in on a freshly cut orange, showing its bright, juicy segments in stunning macro detail.”

Video models do a pretty good job of portraying emotion, but they are not perfect, and some are better than others. This video shows that Veo 2 is one of the better ones.

Prompt from Google: “An extreme close-up shot focuses on a female DJ's face, her beautifully voluminous black curly hair framing her features as she is completely immersed in the music. Her eyes are closed, immersed in the rhythm, a faint smile on her lips. The camera captures the subtle movements of her head as she nods and sways to the beat. A shallow depth of field blurs the background. She is surrounded by bright neon colors. The close-up emphasizes her seductive presence and her ability to carry and transcend the music.”

Finally, this video captivated me with its complexity. The clip contains a huge number of elements, yet it keeps a great deal of visual clarity and movement. The reflections, the movement in the mirror, even the candle's reflection are all elements that other models would have struggled with.

Prompt from Google: “The camera moves in slow dolly shots, showing the opulence of a Renaissance palace room decorated with gilded furniture, velvet drapes, and softly shimmering chandeliers. The queen sits at her gilded desk, her crimson silk gown cascading to the floor like spilled blood. On the desk sits an unsigned letter, the edges of which have curled with age. The camera captures her from behind, her stoic face reflected in a huge mirror. Behind her, the courtiers murmur, their silhouettes dancing like ghosts in the candlelight. The room feels heavy, each gilded detail amplifying the air of betrayal and paranoia. The color palette alternates between deep, majestic reds and cold golds, and chiaroscuro lighting adds to the drama; the rich textures, shot on 70mm film, evoke the grandeur of a historical masterpiece.”