From: redpointai

Artificial intelligence has “completely changed” the music industry [00:00:10]. Products like Suno, with over 10 million users and a recent fundraise of $125 million, are at the “epicenter of this ecosystem” [00:00:20], showcasing the transformative potential of AI in music.

Suno’s Approach to Music Creation

Suno’s CEO, Mikey Shulman, emphasizes a vision for the future of music [00:00:32] that focuses on the joy of music creation itself, rather than solely the final product [00:02:44]. The company aims to make music creation accessible to everyone, helping adults rediscover the playful, uninhibited interaction with music often seen in children [00:03:50].

User Experiences and Categorization

Suno identifies two main categories of users [00:06:02]:

  1. Casual Users: These users engage in “soundtracking their life,” narrating everyday moments musically [00:06:13]. This includes creating songs about mundane events like a Starbucks order error or unexpected deliveries [00:06:34]. Music serves as a way to tell stories and relive funny or memorable moments [00:06:25].
  2. Power Users: This group views Suno as a “creative outlet” and enjoys the process of making music [00:06:50]. They spend hours crafting songs to match a specific sound or story in their head [00:07:05]. Suno aims to give these users “control over the music” without the steep learning curve of traditional production software like Ableton or Pro Tools [00:07:47]. This demonstrates how AI democratizes access to creation tools [00:04:16].

Overcoming the “Blank Canvas Problem”

A significant challenge for AI creator tools is the “blank canvas problem,” where users don’t know where to begin [00:08:14]. Suno is working on making the initial creation process more intuitive [00:09:55].

Instead of a text box prompt, future interactions might involve:

The goal is to shift from text-driven interactions, which are seen as a sign of how early the technology is [00:10:37], to more expressive and intuitive methods of input [00:13:11].

The Social and Collaborative Aspect of Music

A core vision for Suno is making music creation a shared experience, moving beyond single-player creation [00:14:18]. Mikey Shulman emphasizes that making music with other people is one of the “most enjoyable moments” of his life [00:02:51].

Future developments include:

  • Multiplayer Creation: Synchronous (jam sessions) and asynchronous (sending half a song to be finished) co-creation [00:14:37].
  • Musical Conversation: Treating music as a conversation, where users can riff off each other’s ideas fluidly [00:15:20].
  • Live Performance & Interaction: Observing Twitch streamers using Suno to create live music and engage audiences through micro-payments highlights the potential for interactive digital concerts [00:16:55]. This suggests a future where fans and even athletes could have input in creating event music [00:17:52].

Business Model and Technical Challenges

Suno’s current business model offers a free tier with a paid subscription for power users, but the company is “actively not trying to innovate on the business model” yet, given how early the market is [00:18:40]. The CEO notes that current AI pricing models often “blindly adapt” SaaS pricing, which isn’t ideal for AI because generating each song carries a non-zero marginal cost [00:19:20].
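To make the marginal-cost point concrete, here is a minimal sketch of how a flat subscription behaves when every generated song costs something to produce. The function names and every number below are illustrative assumptions, not figures from Suno or from the interview.

```python
def monthly_margin(price: float, songs: int, cost_per_song: float) -> float:
    """Revenue minus generation cost for one subscriber in one month."""
    return price - songs * cost_per_song


def break_even_songs(price: float, cost_per_song: float) -> float:
    """Number of songs at which a flat subscription stops being profitable."""
    return price / cost_per_song


# Hypothetical plan: flat monthly price with an assumed compute cost per song.
PRICE = 10.0          # assumed subscription price
COST_PER_SONG = 0.05  # assumed compute cost per generation

light_user = monthly_margin(PRICE, songs=100, cost_per_song=COST_PER_SONG)
heavy_user = monthly_margin(PRICE, songs=400, cost_per_song=COST_PER_SONG)
limit = break_even_songs(PRICE, COST_PER_SONG)
```

With classic SaaS the cost of serving one more request is close to zero, so the heaviest users are still profitable. Under these assumed numbers the heavy user's margin turns negative, which is the mismatch the CEO describes.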

Model Evaluation

Evaluating music AI models is challenging due to the subjective nature of music [00:20:08].

  • Objective Metrics: Automatic metrics for audio quality exist but are “flawed” [00:20:31].
  • Subjective Metrics: “Aesthetics matter” [00:20:36]. The ultimate test is “how much do our users love the music” [00:21:11] and the level of control users have over the output [00:21:34].
  • User Feedback: User engagement (e.g., song usage, choosing models, sharing) and explicit feedback from communities like Discord are crucial for identifying issues [00:22:20].

Infrastructure and Speed

Suno has experienced an “insane spike of usage” [00:27:12]. Key aspects of their infrastructure include:

  • Speed: Suno strives for fast song generation, recognizing that users expect instant results akin to Spotify [00:25:46]. Transformers are used to stream songs while they are still being made [00:26:36].
  • Leveraging Existing Tools: Suno uses platforms like Modal for easy deployment of jobs onto GPU infrastructure [00:27:45], avoiding the need to innovate on everything themselves [00:28:09]. They also benefit from breakthroughs in image and text AI communities that solve problems applicable to audio [00:28:20].
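The idea of streaming a song while it is still being made can be sketched with a plain Python generator. This is a toy illustration only: `generate_chunks` is a hypothetical stand-in that emits a sine tone instead of running a model, and nothing here reflects Suno's actual implementation.

```python
import math
from typing import Callable, Iterable, Iterator, List

SAMPLE_RATE = 8000  # samples per second (toy value, assumed)


def generate_chunks(total_seconds: int, chunk_seconds: int = 1) -> Iterator[List[float]]:
    """Hypothetical stand-in for a model that produces audio piece by piece."""
    for start in range(0, total_seconds, chunk_seconds):
        seconds = min(chunk_seconds, total_seconds - start)
        offset = start * SAMPLE_RATE
        # A 440 Hz sine tone keeps the example self-contained.
        yield [
            math.sin(2 * math.pi * 440 * (offset + i) / SAMPLE_RATE)
            for i in range(seconds * SAMPLE_RATE)
        ]


def stream_to_player(chunks: Iterable[List[float]], play: Callable[[List[float]], None]) -> int:
    """Hand each chunk to `play` as soon as it exists; return samples sent."""
    sent = 0
    for chunk in chunks:
        play(chunk)  # playback can start before later chunks are generated
        sent += len(chunk)
    return sent


received: List[List[float]] = []
total_samples = stream_to_player(generate_chunks(3), received.append)
```

Because the generator is lazy, the first chunk reaches the listener after one chunk's worth of work rather than after the whole song is finished.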

Future Capabilities

Current limitations of music models include:

  • Iterative Control: Lack of precise control for iterative refinements (e.g., “do that but change X”) [00:23:55].
  • Specific Parameters: Models don’t reliably listen to precise instructions for elements like music tempo (BPM) [00:24:17].

Suno’s North Star metrics are focused on user enjoyment: how many users make songs, daily retention, likelihood of exhausting free tiers, and song sharing [00:24:50].
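As a rough sketch of how metrics like these could be computed from activity logs, the example below assumes a hypothetical `UserDay` record (one row per user per active day). The schema, field names, and metric definitions are assumptions made for illustration; the interview does not describe how Suno measures them.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UserDay:
    user_id: str
    day: int            # days since signup; a row exists only if the user was active
    songs_created: int
    songs_shared: int
    credits_left: int   # free-tier credits remaining at the end of the day


def ratio(numerator: float, denominator: float) -> float:
    """Safe division that returns 0.0 for an empty denominator."""
    return numerator / denominator if denominator else 0.0


def north_star(rows: List[UserDay]) -> Dict[str, float]:
    users = {r.user_id for r in rows}
    creators = {r.user_id for r in rows if r.songs_created > 0}
    day0 = {r.user_id for r in rows if r.day == 0}
    day1 = {r.user_id for r in rows if r.day == 1}
    exhausted = {r.user_id for r in rows if r.credits_left == 0}
    created = sum(r.songs_created for r in rows)
    shared = sum(r.songs_shared for r in rows)
    return {
        "creator_share": ratio(len(creators), len(users)),         # users who made a song
        "day1_retention": ratio(len(day0 & day1), len(day0)),      # came back the next day
        "free_tier_exhaustion": ratio(len(exhausted), len(users)), # used every free credit
        "share_rate": ratio(shared, created),                      # songs shared per song made
    }


rows = [
    UserDay("a", 0, 2, 1, 3),
    UserDay("a", 1, 3, 0, 0),
    UserDay("b", 0, 0, 0, 5),
    UserDay("c", 0, 1, 1, 4),
]
metrics = north_star(rows)
```

All four numbers are behavioral proxies for enjoyment rather than measures of audio quality, which matches the emphasis on how much users love the music.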

Broader AI and Audio Market

The recent advances in audio AI, such as GPT-4o, highlight that audio “should be a first-class citizen” in the AI world [00:29:38], given that it’s how “the vast majority of human communication happens” [00:29:42].

Potential applications for voice AI extend beyond music:

Regarding the general audio model space, Shulman believes it will take “longer than people realize” for audio to be treated as more than just an interface for LLMs [00:32:41]. He doesn’t foresee rapid consolidation into a single giant model, suggesting room for many specialized audio models [00:33:16].

Funding and Future of Music Models

Suno recently raised $125 million, which will be used to “scale everywhere” [00:35:11], including training larger music models, conducting research, acquiring specialized data, and hiring talent [00:36:00]. The goal is to “more quickly pull forward the future of music” that Suno envisions [00:36:39].

A “3.5-minute pop song that is indistinguishable from what a human would make in a recording studio” is seen as a benchmark, but potentially “artificially constrained” and “disappointing” if it’s the only goal [00:37:37]. The true ceiling is higher because music is about how it makes you feel [00:37:56].

Market Dynamics

The music AI market is considered “really, really big” and “so greenfield” [00:39:40], suggesting ample room for multiple companies and different niches [00:40:10]. This includes tools for professional artists and background music generation, which are not Suno’s primary focus [00:40:24].

Regarding IP partnerships, the CEO draws an analogy to Napster and Spotify, expecting both those who work with the industry and those who work against it [00:41:48]. Suno is “very excited to work with the industry” [00:42:05]. Direct artist partnerships (e.g., generating new songs by a specific artist without consent) are viewed as “viral moments” but not a significant part of the future of music [00:42:36]. The focus is on enabling users to create music about “things that are relevant to them” [00:43:29].

Hot Take: Open-source AI is “overhyped” because compute costs create high barriers, making it difficult for open-source initiatives to keep pace with financially incentivized private companies, especially in the short term [00:46:21]. Music, however, is “underhyped” as a part of people’s lives [00:47:07].