From: redpointai
AI is fundamentally changing the music industry [00:00:09]. Suno, with over 10 million users and a recent fundraise of $125 million, is at the forefront of this transformation [00:00:12]. Mikey Shulman, CEO of Suno, is deeply involved in shaping the future of AI and audio [00:00:20].
Impact and Vision
AI brings a new dimension to music, expanding its role as a form of communication and storytelling [00:09:30]. The future of music is envisioned as more interactive, collaborative, and accessible to everyone, not just expert musicians [00:03:55].
Democratizing Creativity
One of the core aims of AI creator tools, including Suno, is to reawaken the imaginative play often lost in adulthood [00:04:01]. These tools aim to democratize access to creative expression, moving away from complex production software with steep learning curves, such as Ableton or Pro Tools [00:07:47]. The focus is on a guided creative experience, akin to “paint by numbers,” that lets people, who are hardwired to enjoy such experiences, reconnect with their inner artist [00:04:25].
User Experiences and Product Development
Suno identifies two main categories of users:
Casual Users
Their use is often described as “soundtracking your life”: these users musically narrate happy, sad, funny, and memorable moments. Examples include songs about trivial occurrences like Starbucks getting a name wrong or unexpected visitors [00:06:11]. Music serves as a personal storytelling medium [00:06:25].
Power Users
These individuals use Suno as a significant creative outlet, enjoying the process of making music as much as the final product [00:06:50]. They spend hours crafting songs to match the sounds and stories in their heads [00:06:58]. Usage of the platform confirms that many people have great musical taste and ideas but previously lacked the means to realize them [00:07:26].
Overcoming the “Blank Canvas” Problem
A common challenge for AI product development is the initial “blank canvas” or “cold start” problem, where users are unsure how to begin [00:08:14]. While current AI tools are largely text-driven [00:10:35], the future aims for more intuitive input methods beyond text prompts, such as:
- Humming melodies [00:10:01]
- Tapping beats [00:10:04]
- Using images or describing mood [00:10:08], [00:11:41]
- Incorporating everyday sounds like clinking glasses to inspire tracks [00:11:51]
The goal is to make music creation a first-class citizen in communication and multimedia, realizing that reasons to make a song already exist, similar to sending a text or taking a picture [00:09:16].
Collaborative and Social Aspects
A significant future focus for Suno is “everything multiplayer”: enabling people to make music together [00:14:30]. This includes:
- Synchronous creation: Jamming together in real time [00:14:39], [00:15:16].
- Asynchronous collaboration: Sending half-finished songs or musical ideas to be modified and passed back [00:14:42].
- Interactive performances: In one example, a Twitch streamer used Suno for a digital concert in which viewers could make micropayments to influence the music [00:16:51]. This concept could extend to sports stadiums, with fans contributing to game-time music [00:17:44].
Business Model and Pricing Strategy
Suno currently offers a free tier covering a set number of songs and charges power users who generate more [00:18:10]. However, business models and pricing strategies for AI music platforms are still evolving and not yet settled across the industry [00:18:29]. The current approach often mimics SaaS pricing, which may not suit AI products well because generating content carries a non-zero marginal cost [00:19:07]. The company is focusing on product innovation before fully optimizing its business model [00:18:46].
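The tension between flat SaaS-style pricing and non-zero marginal cost can be made concrete with a small arithmetic sketch. Every figure below is a hypothetical placeholder (amounts are in cents to keep the arithmetic exact); none of these numbers or function names come from the episode or from any real pricing plan.

```python
def monthly_margin_cents(songs_generated: int,
                         subscription_price_cents: int,
                         cost_per_song_cents: int) -> int:
    """Gross margin on one subscriber for one month under a flat subscription.

    Unlike classic SaaS, each generated song adds compute cost, so the margin
    shrinks as usage grows and can turn negative for heavy users.
    """
    return subscription_price_cents - songs_generated * cost_per_song_cents


def break_even_songs(subscription_price_cents: int,
                     cost_per_song_cents: int) -> int:
    """Number of songs at which generation cost equals the subscription price."""
    return subscription_price_cents // cost_per_song_cents


# Hypothetical numbers for illustration only: a $10.00 plan and $0.02 per song.
PRICE_CENTS = 1000
COST_PER_SONG_CENTS = 2

light_user = monthly_margin_cents(100, PRICE_CENTS, COST_PER_SONG_CENTS)
heavy_user = monthly_margin_cents(1000, PRICE_CENTS, COST_PER_SONG_CENTS)
threshold = break_even_songs(PRICE_CENTS, COST_PER_SONG_CENTS)
```

Under these assumed numbers a light user is profitable, while a user who generates well past the break-even count costs more than the subscription brings in. This is the kind of gap that usage caps or usage-based tiers are meant to close.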
Model Evaluation and Development
Evaluating music models is complex because music lacks a “correct” answer, unlike text or reasoning tasks [00:21:00]. While objective metrics for audio quality exist, they are often flawed [00:20:29]. Ultimately, quality is measured by user satisfaction and love for the music produced [00:21:11]. This often relies on human evaluation and feedback from a large, engaged user base, particularly through community channels like Discord [00:22:32].
Challenges in AI Product Development for Music
Despite rapid progress, challenges remain:
- Iterative control: Users often need to express specific changes (e.g., “do that but change X”) which is currently difficult [00:23:55].
- Precise control over musical elements: Models provide loose guidelines for elements like tempo (BPM), but more objective and precise control is needed [00:24:17].
- Speed: Generation is already fast, but the expectations set by platforms like Spotify mean generation time must keep coming down to keep users engaged in the creative journey [00:25:34]. Suno uses autoregressive Transformers, which allow a song to start streaming while it is still being generated, contributing to its speed [00:26:36].
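The streaming behavior mentioned in the speed item follows from how autoregressive models work: output is produced one step at a time, so earlier steps can be played back before later ones exist. Below is a minimal sketch of that idea, not the platform's actual implementation. The `next_token` function is a hypothetical stand-in for a Transformer forward pass, and the token and chunk sizes are arbitrary.

```python
import random
from typing import Iterator, List


def next_token(context: List[int], rng: random.Random, vocab_size: int = 16) -> int:
    """Hypothetical stand-in for a model call that predicts the next audio token.

    A real autoregressive Transformer would compute a distribution conditioned
    on `context`; here the last token only shifts a random draw so the sketch
    stays runnable without any model.
    """
    last = context[-1] if context else 0
    return (last + rng.randrange(vocab_size)) % vocab_size


def stream_song(total_tokens: int, chunk_size: int, seed: int = 0) -> Iterator[List[int]]:
    """Generate tokens one at a time and yield each chunk as soon as it is full."""
    rng = random.Random(seed)
    context: List[int] = []  # everything generated so far
    chunk: List[int] = []    # tokens waiting to be handed to the player
    for _ in range(total_tokens):
        token = next_token(context, rng)
        context.append(token)
        chunk.append(token)
        if len(chunk) == chunk_size:
            yield chunk      # playback can begin here, before generation ends
            chunk = []
    if chunk:
        yield chunk          # final partial chunk


# The first chunk is available after only `chunk_size` generation steps.
generator = stream_song(total_tokens=10, chunk_size=4)
first_chunk = next(generator)
remaining_chunks = list(generator)
```

Because `stream_song` is a generator, the caller receives the first chunk without waiting for the full song, which is why perceived latency depends on time to first chunk rather than on total generation time.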
Infrastructure and Scalability
Suno has experienced massive user growth, with 10 million songs generated [00:10:10]. To support this, the team has been deliberate about where to innovate in-house and where to buy infrastructure solutions [00:29:03]. They use external tools like Modal to deploy jobs onto GPU infrastructure and are grateful not to have to build everything themselves [00:27:44]. The audio domain also benefits from advances made by the image and text communities, where many problems are solved and open-sourced before they become critical for audio [00:28:15].
Broader Opportunities in AI Audio
The recent wave of AI innovation, particularly multimodal models like GPT-4o, highlights that audio should be a first-class citizen in AI applications [00:29:32]. Audio is the primary mode of human communication [00:29:42]. While current systems such as GPT-4o and those from ElevenLabs offer impressive audio interfaces, they often still rely on a text-based reasoning layer [00:32:51]. Full integration of audio into the underlying models is still some time away [00:33:07]. This suggests that consolidation into one giant model for audio may be distant, leaving room for many specialized audio models and companies [00:33:12].
Market Landscape and IP
The AI music market is expected to be very large and diverse [00:39:40]. It’s a “green field” with ample opportunities for various types of businesses:
- Tools for professional artists [00:40:24].
- Background music for vast content libraries like YouTube videos [00:40:37].
- Experiences for the general consumer [00:40:54].
Regarding intellectual property (IP) and partnerships, the industry is in its early stages [00:41:41]. Suno aims to work with the music industry [00:42:05]. The company avoids direct artist impersonation and does not generate “new Charlie Puth songs” without explicit consent, believing that such viral moments, while attention-grabbing, are not a significant part of the long-term future of music creation [00:42:17]. The focus is on enabling users to create music relevant to their own lives and stories [00:43:09].