Trying Vidu 1.5 - a new major player in the AI video field

Vidu is an AI video platform from China that aims to compete not only with other leading players such as Runway and Kling, but also with OpenAI's powerful, as-yet-unreleased Sora.

Developed by Shengshu, it is the first AI video tool to offer “multi-entity consistency,” a feature that allows unrelated images to be stitched together into one coherent new video. The feature responds to recent studies showing that AI video models mimic the physics visible in their training images rather than understanding how physics actually works.

For example, if you upload a photo of yourself and a random car, the model can put you in the driver's seat and make the car move. In another example given by Vidu, a second image of a coat or shirt can be used to dress a character in different outfits.

What I like best about Vidu 1.5 is the control you have as the creator when generating an AI video. You can customize the degree of movement, the resolution, the duration, and more. I need to do more testing, but it looks likely to earn a place on my list of best AI video generators.

Vidu 1.5 is Shengshu's latest offering, with multi-entity mode alongside the usual text-to-video and image-to-video modes found on other platforms. It can be configured to generate photorealistic or illustrative videos, and its motion quality holds up well too.

The ability to generate clips in 1080p is also a big step up from the usual 720p limit of other platforms, though the text-to-video model is not as good as Runway, Kling, or MiniMax. As the company puts it: “The future of content creation is here, and it is supported by the limitless possibilities of AI. At the heart of this transformation is the ability for everyone to engage in high-quality content production, unlocking new opportunities and breaking through traditional limitations.”

Multi-entity consistency is probably one of the most innovative additions to AI video I've seen in a while. Not only can it be used to steer the visuals of a video, but it can also improve the overall movement, especially when used to supply different perspectives.

In one example, I gave it three images of a skateboarder, with the extra perspectives helping to generate more fluid motion as the board moves across the steps.

In another test, I gave it a picture of me and a picture of a person busking, and it was able to generate a fairly accurate video of me playing the guitar from just a single image!

What made the video of me playing the guitar work was another feature called “Advanced Character Control.” According to Vidu, this provides greater precision over how the camera moves, the cinematic techniques used in the output, and the general movement in the video.

Finally, you can set the motion speed level, which, according to Vidu, allows for more authentic output from the model. It can be set to auto, low, medium, or high to produce more or less dynamic movement.

Overall, I was impressed with Vidu 1.5. It still has some work to do to catch up with the state of the art in visual realism and motion, but it is one step closer and a genuine contender.

Multi-entity consistency is an important enough feature to warrant attention on its own, and I suspect other models will try to emulate it in the near future.
