Descriptions:
This weekly AI news roundup from the AI Search channel surveys a packed week of multimodal and generative AI releases across video, image, 3D, and audio. Standout coverage includes JustDubIt, a video dubbing tool built on top of LTX 2.3 (the leading open-source video model with audio) that can translate speech into a target language and re-sync lip movements to match; the full 2.5 GB model is already publicly available. On the 3D generation front, Pixel 3D is introduced as the new benchmark leader, using pixel-aligned geometry to produce high-fidelity meshes from single images and outperforming both Hunyuan 3D and Trellis 2 in direct comparisons. Also covered are asymmetric flow models, an image-generation approach that bypasses the standard latent-space pipeline to produce sharper, more realistic pixel-level outputs.
Other highlights include MiniCPM, a compact 2.6 GB multimodal model designed for phones and edge devices; Creata 2, a style-focused image generator that prioritizes creative consistency over photorealism; and several new open-source world simulation models compared to Google’s Genie 3. OpenAI’s Codex gets a mobile companion app for iOS and Android, allowing developers to monitor, redirect, and approve coding agent tasks remotely while files and credentials stay on the local machine.
The episode also touches on new expressive TTS systems with emotional range, open-source music generation from text prompts, AI-driven video relighting and camera direction tools, and a wave of humanoid robot demonstrations. Overall, it is a useful one-stop reference for tracking the breadth of open-source multimodal progress in May 2026.
📺 Source: AI Search · Published May 17, 2026
🏷️ Format: Roundup