Descriptions:
Shivam Verma, tech lead of the User Representations team inside Spotify’s AI Foundation group, delivers a rare inside look at how the company builds personalization for 750 million users across a catalog of 100 million+ tracks, millions of podcasts, and a growing video library. The talk covers three core pillars (foundational user modeling, content representation, and steerable personalization) and traces Spotify’s evolution from traditional autoencoders to a unified transformer architecture that embeds users, tracks, and podcast episodes into a shared hypersphere.
Verma explains how sequences of user interactions are converted into vectors, then into tokens that can be combined with LLM reasoning to produce context-aware recommendations. The team runs continued pretraining (CPT) and supervised fine-tuning (SFT) on open-weight LLMs, the same techniques used by frontier labs, to teach models about Spotify’s content catalog. A visualization of the embedding space shows the presenter’s own user embedding sitting close to machine learning and tech podcast content, illustrating what the model has learned about him as a listener.
The talk also highlights Spotify’s newest user-facing products powered by this stack, including AI DJ and prompted playlists; as of May 2026, prompted playlists also support podcast episode curation. Engineers working on recommendation systems, LLM adaptation for domain-specific content, or large-scale personalization infrastructure will find this an unusually candid technical reference from one of the world’s highest-traffic ML platforms.
📺 Source: AI Engineer · Published May 19, 2026
🏷️ Format: Deep Dive







