From: redpointai
Suno, an AI music creation platform, has more than 10 million users who have generated songs, and it recently completed a fundraise of 500 million [00:00:13]. Mikey Shulman, CEO of Suno, believes AI has “completely changed” the music industry and discusses user experiences and product direction for the platform [00:00:10].
User Experiences with Suno
Personal Connections to AI-Generated Music
Shulman shared that his favorite piece of music created on Suno is an Estonian-language German Lied, which touches him deeply even though he does not speak Estonian and is not particularly into the genre [00:01:01]. He also enjoys his own creations, particularly songs made with his son about fantastical situations, like his three-year-old driving a Zamboni [00:02:06]. For these, the joy comes not just from the final product but from the experience of crafting music with someone and marking a moment in time [00:02:29].
The Joy of Creation Over Final Product
Shulman notes that music is often made solely for the final product, without much consideration for the journey of creation [00:02:41]. Making music with others, however, has given him some of the “most enjoyable moments” of his life [00:02:51]. Suno aims to bring back the form of play that adults often lose, allowing them to reconnect with their inner child and imagination [00:03:50]. This aligns with the idea that AI creator tools democratize the ability to dream up fantastical things by removing two hurdles: complex tools and past negative feedback about one’s artistic abilities [00:04:01].
Shulman sees AI music as analogous to looking at photos to relive moments: it lets users amplify the emotional connection music has to specific memories [00:05:20].
User Categorization: Casual vs. Power Users
Suno’s user base largely falls into two categories:
- Casual Users: These users engage in “soundtracking their life,” using music to narrate happy, sad, funny, and memorable events [00:06:13]. Music acts as a storytelling medium, leading to creations about everyday occurrences like Starbucks getting a name wrong or unexpected package deliveries [00:06:34].
- Power Users: This group views Suno as an “amazing creative outlet,” enjoying both the process of making music and the final product [00:06:50]. They have specific sounds and stories in mind, often spending hours to craft their vision, demonstrating that people with great musical taste and ideas can create without traditional complex software [00:06:58].
Overcoming the “Blank Canvas Problem”
A significant challenge for many AI products, including Suno, is the “blank canvas problem” – users facing an empty text box without knowing where to begin [00:08:14]. Suno aims for more intuitive future interactions beyond simple text prompts [00:09:55].
Examples of future interaction methods include:
- Humming a melody or tapping a beat [00:10:01].
- Using the last picture taken on a phone as inspiration [00:10:08].
- Describing a mood [00:11:41].
- Expressing oneself through visuals or sounds, turning everyday sounds into music [00:11:47].
The company implemented a Valentine’s Day experience, where the reason for creating a song (for a loved one) was obvious, guiding users through the process [00:09:03]. This highlights the importance of showing people that music can be a “first-class citizen” in communication and storytelling, just like texting or taking a picture [00:09:30].
Future of User Interaction and Collaboration
Enhancing Control for Power Users
For power users, the focus is on giving them more control over the music to achieve the sounds in their heads, but not necessarily through the complex interfaces of traditional production software [00:12:54]. The aim is to allow users to “pour their heart out” through singing, image montages, or mood-boarding sounds to inspire the model [00:13:09]. This approach makes the creative journey more enjoyable, which in turn leads to more enjoyable music for recipients [00:13:31].
The Multiplayer Future of Music Creation
A significant future focus for Suno is “everything multiplayer,” fostering shared music-making experiences [00:14:30]. This can be:
- Synchronous: Users making music together in real-time, mimicking a jam session [00:14:39].
- Asynchronous: Sending half a song for someone to finish, or passing musical ideas back and forth for modification [00:14:44].
Music is viewed as a conversation, which can happen in various ways, much as natural language conversations happen both in person and over tools like Slack [00:15:08]. The goal is to recreate the “joyful experience” of jamming for people who aren’t instrument experts, allowing them to express ideas through lyrics or sounds and riff off each other [00:15:31].
This collaborative model can extend to public viewing, with Suno observing Twitch streamers creating live, interactive digital concerts where viewers can micro-pay to interact with the streamer [00:16:55]. This demonstrates the potential for AI music to bring large groups of people together, akin to a football game, and opens possibilities for fan and athlete input in stadium music [00:17:21].
Product Evaluation and Development
Challenges in Model Evaluation
Evaluating music models is harder than evaluating models in other domains such as text or images, because music doesn’t have a “correct answer” [00:20:20]. While objective metrics for audio quality exist, they are often flawed [00:20:29]. The company emphasizes that “aesthetics matter,” relying on human judgment from people who deeply love music to evaluate subjective aspects [00:20:36].
The ultimate test for quality is how much users love the music produced and the level of control they have over it [00:21:11]. Suno leverages its large and engaged user base for feedback, both implicitly (usage, model choices) and explicitly (Discord community reporting issues) [00:22:18].
Iterative Improvements and Desired Capabilities
Model issues are often case-dependent, like fixing overly long outros or silence at the end of songs [00:23:05]. A key area for improvement is enabling iterative control, allowing users to express specific changes (e.g., “do that but change X”) [00:23:55]. More precise control over quantifiable aspects like BPM is also desired [00:24:23].
Suno’s North Star metrics revolve around user enjoyment:
- Number of users making songs [00:24:50].
- Daily returning users [00:24:52].
- Probability of users exhausting their free tier (indicating enjoyment) [00:24:56].
- Sharing activity (social engagement) [00:25:10].
Speed is also a critical factor; the goal is to be as fast as Spotify, despite having to generate the song from scratch [00:25:47]. Suno achieves this partly by using autoregressive transformers, which allow the song to be streamed while it is still being generated [00:26:36].
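Why autoregressive generation lends itself to streaming can be shown with a toy sketch. It is not Suno's implementation: the "model" below is a placeholder arithmetic rule, and the names `generate_tokens` and `stream_chunks` are hypothetical. The only point it illustrates is that, because each token depends only on earlier tokens, the first chunk can be handed to a player before the rest of the song exists.

```python
from typing import Iterator, List


def generate_tokens(prompt: str, n_tokens: int) -> Iterator[int]:
    """Toy stand-in for an autoregressive model (hypothetical).

    Each new token is computed only from earlier tokens, so tokens
    can be yielded one at a time as they are produced.
    """
    previous = len(prompt)  # seed the sequence from the prompt
    for _ in range(n_tokens):
        # Placeholder for sampling from a real model's output distribution.
        previous = (previous * 31 + 7) % 1024
        yield previous


def stream_chunks(tokens: Iterator[int], chunk_size: int) -> Iterator[List[int]]:
    """Group tokens into fixed-size chunks, yielding each chunk as soon as it is full."""
    chunk: List[int] = []
    for token in tokens:
        chunk.append(token)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # emit the final, possibly shorter, chunk
        yield chunk


if __name__ == "__main__":
    stream = stream_chunks(generate_tokens("lofi beat", 10), chunk_size=4)
    first = next(stream)  # available after only 4 tokens have been generated
    print("first chunk ready:", first)
    for later in stream:
        print("next chunk:", later)
```

Both functions are generators, so nothing is computed until the consumer asks for it. In a real system each chunk would be decoded to audio and sent to the client while later chunks are still being produced.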
Infrastructure and Tooling
Suno has experienced an “insane spike of usage” [00:27:12]. They’ve been deliberate about where to innovate and where to buy existing solutions, like using Modal for deploying jobs onto GPU infrastructure [00:27:37]. The audio domain benefits from problems being solved by communities in image and text AI, which are more advanced [00:28:15].
Market Dynamics and Future Vision
Market Size and Focus
Shulman believes the AI music market is “really, really big” and “greenfield,” with room for many companies [00:39:38]. While others may focus on professional tools or background music, Suno’s core focus is on building experiences for the “average person” and expanding the joy music brings to their lives [00:40:54].
AI in Music vs. Other Domains
Shulman is excited that audio is becoming a first-class citizen in the AI world, recognizing its importance in human communication [00:29:38]. He anticipates a future where interacting with systems is as natural as interacting with other humans, extending beyond obvious uses like customer service to less obvious ones like code interaction [00:29:54].
Regarding the general audio model space, Shulman predicts that multimodal AI will take longer than people realize to fully incorporate audio, as current impressive models (like GPT-4o or ElevenLabs) often act as interfaces to text-based LLMs [00:33:01]. This suggests that consolidation into one giant model is less imminent than it might appear.
Stance on Artist Partnerships and IP
Suno aims to work with the music industry, but differentiates itself from models that focus on imitating specific artists without consent [00:42:05]. Shulman considers such “artist partnership” or “fake voice” viral moments to be a “flash in the pan” [00:43:40]. He likens it to initial engagement with GPT, where users might create Shakespearean sonnets about silly topics, but the real value comes from practical daily applications [00:42:43].
Suno’s focus remains on empowering users to create music about things relevant to them, enhancing the joy of creation, rather than mimicking existing artists [00:43:09]. While these viral moments might onboard people to the technology, Shulman believes directly showcasing the personal, creative use cases from the start is a more straightforward entry into music making [00:44:04].
Pricing Model
Suno currently offers a free tier for a set number of songs, with power users charged for additional generations [00:18:10]. Shulman acknowledges that the current pricing model is a somewhat “blindly adapted” version of SaaS pricing, which isn’t perfectly suited to AI because marginal costs are non-zero: every song generation incurs compute expenses [00:19:10]. He anticipates that pricing models will evolve and become more product and use-case dependent, rather than conforming to current norms [00:19:40].
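The marginal-cost point can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names and every figure in it (a flat plan price and a per-song compute cost, both in cents) are hypothetical and do not come from Suno. It shows that under a flat subscription, margin shrinks with each generated song and turns negative past a break-even point, which is not the case for classic SaaS with near-zero marginal cost.

```python
def monthly_margin_cents(price_cents: int, songs: int, cost_per_song_cents: int) -> int:
    """Margin on one subscriber for one month under a flat price (hypothetical model)."""
    return price_cents - songs * cost_per_song_cents


def break_even_songs(price_cents: int, cost_per_song_cents: int) -> int:
    """Largest number of songs a subscriber can generate before margin goes negative."""
    return price_cents // cost_per_song_cents


# Hypothetical figures chosen only to illustrate the shape of the problem.
PLAN_PRICE_CENTS = 1000      # assumed flat monthly price
COST_PER_SONG_CENTS = 2      # assumed compute cost per generation

if __name__ == "__main__":
    for songs in (100, 500, 800):
        margin = monthly_margin_cents(PLAN_PRICE_CENTS, songs, COST_PER_SONG_CENTS)
        print(f"{songs} songs -> margin {margin} cents")
    print("break-even at", break_even_songs(PLAN_PRICE_CENTS, COST_PER_SONG_CENTS), "songs")
```

With these assumed numbers, a light user is profitable while a heavy user costs more than they pay, which is one reason usage-dependent pricing may fit better than a single flat tier.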
Community Insights and Learnings
The shift from Discord to a dedicated web app for Suno’s product was a significant learning [00:48:41]. Despite a robust Discord community that provided “crowdsourcing prompt engineering” and feedback, 90% of usage shifted to a simpler web app within five days of its launch [00:49:10]. This highlighted that for an “all-encompassing music experience,” a dedicated platform is preferred over a messaging platform like Discord [00:49:26].
The community remains an “underappreciated resource” for feedback and guidance on using the models [00:49:54]. Suno also learned that users are proud of their creations and will edit song titles to include their names when their songs trend, indicating a desire to “lean into” that feeling of pride [00:48:00].
The company’s initial investment in owning some of its own GPU hardware proved less effective due to the unexpected scale of demand, leading to hardware collecting dust [00:48:18].