There is no denying how fast technology is developing today, and we have yet to see everything that artificial intelligence (AI) can do. Google, one of the largest and most valuable tech companies in the world, has introduced its own Imagen Video AI. Imagen, originally a text-to-image diffusion model, has been extended to generate videos from text prompts as well.
With Google's AI technology, it is now possible to combine text, images, and video to create both realistic and artistic videos.
For now, the resulting video clips still need improvement, as the generated video loops contain visible artifacts and noise. Despite the imperfections of these initial results, Google says this is a step toward a system with a high level of adaptability and awareness of the world, as well as the capacity to produce video in a variety of artistic styles.
Google's Imagen
To train Imagen Video, Google used an internal dataset of 14 million video-text pairs and 60 million image-text pairs, along with the open LAION-400M dataset, which contains approximately 400 million image-text pairs. Trained on this data, the tool can turn a text prompt into a video clip of about five seconds.
Imagen could originally do this for still images from a brief textual description alone, producing highly photorealistic results. Because video is more complicated than still pictures, Imagen Video AI requires a more specific description of what the expected clip should look like.
Image/Video Generating Systems
Google's detailed work, released on October 6, 2022, was seen as a response to Meta’s Make-A-Video, which has similar goals. Imagen, the image-generating system, is also comparable to systems from smaller companies, such as OpenAI’s DALL-E 2 and Stability AI’s Stable Diffusion.
DALL-E 2 and Stable Diffusion were said to produce better results than Imagen Video AI. However, this may be because Google is tackling increasingly complicated challenges, including 3D objects and how they interact with one another.
In addition, Imagen Video is said to render text correctly, which is a significant advance over the image-generating technologies in use today. Stable Diffusion and DALL-E 2 have trouble turning prompts like “a logo for ‘Diffusion’” into legible type, whereas Imagen Video does so without any problems.
Imagen Video AI's issues to improve
As noted earlier, the resulting video clips are imperfect and need further improvement. The problems are said to be similar to those seen in Meta’s Make-A-Video: the clips are shaky and warped in places, with objects that merge together in “physically unnatural and impossible” ways. This issue reportedly remains unsolved, and the quality of the results has yet to improve substantially.
To address the distortion problems, the Imagen Video team intends to work with the researchers behind Phenaki, another recently unveiled Google text-to-video system. Phenaki can convert lengthy, complex instructions into videos more than two minutes long, though at a lower quality.
The researchers also point out a more serious risk: Imagen Video AI could produce highly violent or sexually explicit videos, as well as clips that raise concerns about copyright, misinformation, and even deepfakes. They acknowledged that the data used to train the system still contains problematic content.
Google's Imagen Video AI availability
In contrast to Meta, Google won’t be offering any kind of sign-up form to indicate interest, saying that it won’t release the Imagen Video model or source code “until these concerns are mitigated.” The image/video generating system therefore remains unavailable to the public.
However, Google is widely known to have the resources to fund the research and development it needs. We can therefore expect Imagen Video AI to improve considerably in the future.