Full body waifus, AI dreams, realtime AI music, open-source Gemini Omni: AI NEWS

Descriptions:

AI Search delivers a packed weekly roundup covering more than a dozen model and tool releases across video, image, 3D, audio, and language domains. The top items include ByteDance’s Bernini, an open-source video editing model that accepts text, image, and video references for flexible scene manipulation. Described as an “open-source Gemini Omni”, it is available now on Hugging Face at roughly 84 GB per branch. NVIDIA’s Deja View is a 3D scene reconstruction model with just 117 million parameters that is already open-sourced. It matches the performance of Depth Anything 3, a model approximately 10 times larger, by reusing the same transformer block in repeated passes rather than stacking layers.
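To make the parameter saving from block reuse concrete, here is a minimal sketch that counts weights for a stack of distinct transformer blocks versus a single block applied in repeated passes. The sizes (`d_model`, `d_ff`, the depth of 10) and the function names are hypothetical, chosen only for illustration; they are not Deja View's actual configuration, and biases, norms, and embeddings are ignored.

```python
def block_params(d_model: int, d_ff: int) -> int:
    """Approximate weight count of one transformer block (biases and norms ignored)."""
    attention = 4 * d_model * d_model   # Q, K, V and output projections
    feed_forward = 2 * d_model * d_ff   # up and down projections
    return attention + feed_forward


def stacked_params(d_model: int, d_ff: int, depth: int) -> int:
    """Conventional design: `depth` distinct blocks, each with its own weights."""
    return depth * block_params(d_model, d_ff)


def looped_params(d_model: int, d_ff: int, passes: int) -> int:
    """Weight-shared design: one block applied `passes` times, so weights are stored once."""
    return block_params(d_model, d_ff)


if __name__ == "__main__":
    # Hypothetical sizes, not taken from any released model.
    d_model, d_ff, depth = 1024, 4096, 10
    stacked = stacked_params(d_model, d_ff, depth)
    looped = looped_params(d_model, d_ff, depth)
    print(f"stacked: {stacked:,} weights")
    print(f"looped:  {looped:,} weights")
    print(f"ratio:   {stacked // looped}x")
```

Under these assumptions the looped design stores one block's weights regardless of how many passes it runs, so the saving scales with depth. Compute per forward pass stays roughly the same; only the stored parameter count shrinks.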

Google’s Gemma 4 12B earns significant coverage for its new encoder-free multimodal architecture, which accepts text, images, and audio directly without a separate encoder step. It runs offline on 16 GB of VRAM, sits between the 4B phone-scale variant and the 26B mixture-of-experts model in the Gemma 4 family, matches the 24B variant on several benchmarks, and is released under Apache 2.0 for commercial use. Alibaba’s Qwen 3.7 Plus, a new frontier open model from Minimax, Baidu’s video model with natively baked-in audio, and Alibaba’s real-time streaming video generator round out the model news.

Additional coverage includes Google’s open-source real-time music generator, two new frontier image models, ChatGPT’s new “dream” feature, NVIDIA’s open-source world model, and humanoid robot demos. The video functions as a fast-scan orientation to a particularly active week in open-source and multimodal AI development.


📺 Source: AI Search · Published June 07, 2026
🏷️ Format: Roundup
