From: redpointai

AI has significantly changed the music industry, with platforms like Suno going viral [00:00:11]. More than 10 million users have generated songs on Suno, and the company recently completed a fundraise at a reported $500 million valuation [00:00:13]. Mikey Shulman, CEO of Suno, is at the forefront of this evolving ecosystem [00:00:20].

The Evolution of Music Creation with AI

The integration of AI in music focuses on making music creation more accessible and enjoyable for everyone [00:03:55]. This approach aims to restore the playful, imaginative interaction with music that adults often lose, similar to how children use spoons as drumsticks or pretend to play instruments [00:03:50, 00:03:55]. AI tools are democratizing access to this “inner kid” by simplifying complex creative processes and removing the need for traditional software with a steep learning curve, such as Ableton or Pro Tools [00:04:16, 00:07:47].

Personal Connection to AI-Generated Music

For Suno’s CEO, Mikey Shulman, the enjoyment of AI-generated music comes from two sources:

  • Intangible Resonance: An Estonian-language German Lied, though not in a genre he typically enjoys or a language he speaks, touches him deeply in ways he cannot explain, leading him to listen to it hundreds of times [00:01:01, 00:01:14]. This highlights the subjective nature of musical appreciation [00:01:55].
  • Shared Creation: Creating songs with his son, often about fantastical situations like a three-year-old driving a Zamboni, resonates strongly because of the shared experience of crafting music together [00:02:06, 00:02:29]. This emphasizes the journey of creation over the final product [00:02:47]. Such songs serve as markers of time, akin to photographs, that can be revisited in the future [00:05:01].

User Types and Experience on Suno

Suno caters to two primary user categories:

  • Casual Users: These users often “soundtrack their life” by musically narrating everyday events, whether happy, sad, or funny [00:06:06]. Examples include songs about a Starbucks order being wrong or unexpected mail delivery [00:06:34]. For them, music is a way to tell stories [00:06:25].
  • Power Users: This group sees Suno as a profound creative outlet. They enjoy the process of making music, spending hours to craft a song that matches a sound or story in their head [00:06:45]. This demonstrates the tool’s versatility for radically different use cases [00:07:11].

Overcoming the “Blank Canvas Problem”

Many AI products face a “blank canvas problem” in which users do not know where to start [00:08:14]. Suno addresses this by:

  • Guiding Users: Providing suggestions or context, like a Valentine’s Day experience where the reason for creation is clear [00:09:03].
  • Broadening Input Methods: Moving beyond text prompts to more intuitive interactions, such as humming a melody, tapping a beat, or drawing inspiration from images or everyday sounds [00:10:01, 00:10:08, 00:11:43]. The heavy reliance on text prompts is seen as a sign of how early these tools are, and diverse entry points into the underlying music model will be crucial for the future of generative AI in media and creative industries [00:10:30, 00:10:35].
  • Facilitating Expression: Allowing users to describe their mood or express themselves through visuals and sounds to inspire the model [00:11:41, 00:13:13]. The goal is to let people “pour their heart out” into the music creation process [00:13:11, 00:13:25].

The Future of AI Music: Social and Collaborative Experiences

A significant future focus for Suno is “multiplayer” music creation, that is, making music with other people [00:14:31, 00:14:35]. This includes:

  • Synchronous Collaboration: Users making music together at the same time, mimicking a jam session where ideas are expressed, reacted to, and riffed upon fluidly [00:14:39, 00:15:16].
  • Asynchronous Collaboration: Sending half a song for someone else to finish, or passing musical ideas back and forth with modifications [00:14:42].
  • Interactive Performances: The emergence of Twitch streamers creating music live with audience interaction through micro-payments, transforming digital concerts into interactive experiences [00:16:55, 00:17:10]. This also opens possibilities for fans to contribute to music at events like sports games [00:17:52].

This collaborative approach aims to recreate the joyful experience of jamming with friends, even for non-expert musicians, as everyone has taste and can contribute ideas like lyrics or sounds [00:15:31, 00:15:40].

Challenges and Business Considerations

Pricing and Business Model

Suno currently offers a free tier with paid options for power users [00:18:10]. However, the business model and pricing strategy for AI music platforms are still at a very early stage [00:18:32]. The default so far has been to adapt SaaS pricing models, which may not fit because AI music generation has a non-zero marginal cost: every generated song consumes compute [00:19:10, 00:19:20]. The industry is still exploring how users will engage with AI music, and pricing models will likely evolve significantly over the next decade [00:18:37, 00:19:28].
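
To make the marginal-cost point concrete, here is a minimal arithmetic sketch. The price, usage, and per-song compute cost are purely hypothetical numbers chosen for illustration; none of them come from the interview or from Suno.

```python
def monthly_margin(price_usd: float, songs_per_month: int,
                   compute_cost_per_song_usd: float) -> float:
    """Subscription revenue minus the compute spent serving one user for a month."""
    return price_usd - songs_per_month * compute_cost_per_song_usd


def break_even_songs(price_usd: float, compute_cost_per_song_usd: float) -> float:
    """Number of generated songs at which a flat subscription stops being profitable."""
    return price_usd / compute_cost_per_song_usd


# Hypothetical figures: a $10/month plan and $0.05 of compute per song.
light_user = monthly_margin(10.0, 20, 0.05)    # few songs, high margin
heavy_user = monthly_margin(10.0, 400, 0.05)   # many songs, negative margin
limit = break_even_songs(10.0, 0.05)
```

Under a classic SaaS assumption the per-song cost would be close to zero and both users would be equally profitable. With a non-zero cost per song, the heavy user in this sketch costs more than the plan brings in, which is one reason flat SaaS-style pricing may not transfer directly.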

Model Evaluation

Evaluating music models is complex due to the subjective nature of music [00:20:20]. While automatic metrics for audio quality exist, they are often flawed [00:20:29]. The ultimate test is how much users love the music produced, which depends on both aesthetic quality and the user’s control over the output [00:21:11, 00:21:35]. This reliance on subjective “aesthetics” and user satisfaction means significant human evaluation is necessary, unlike text models, which can rely on objective reasoning benchmarks [00:20:36, 00:21:00, 00:21:41].

User feedback is critical:

  • Implicit Feedback: Tracking how much new models are used or preferred over others [00:22:24].
  • Explicit Feedback: Relying on active Discord communities to report issues or suggest improvements [00:22:32].
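
The implicit signal described above can be sketched as a simple usage share per model version. This is only an illustration: the event format (one model name per generation request) and the version labels are assumptions, not a description of Suno's actual telemetry.

```python
from collections import Counter
from typing import Dict, Iterable


def usage_share(chosen_models: Iterable[str]) -> Dict[str, float]:
    """Fraction of generation requests that went to each model version."""
    counts = Counter(chosen_models)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {model: n / total for model, n in counts.items()}


# Hypothetical log: each entry is the model a user picked for one generation.
events = ["v2", "v3", "v3", "v3"]
share = usage_share(events)
```

If users can choose freely between versions, a rising share for a new model is a rough, indirect indication that they prefer it.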

Improving models often involves diagnosing specific issues, such as excessively long outros or silent sections, which, once identified, can lead to straightforward fixes [00:23:14].
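
As an illustration of how an issue like silent sections might be detected automatically, the sketch below scans a waveform for long stretches of low energy. The windowed-RMS approach, the function name, and all thresholds are assumptions made for this example; the interview does not describe how Suno actually diagnoses these problems.

```python
import math
from typing import List, Sequence, Tuple


def find_silent_spans(samples: Sequence[float], sample_rate: int,
                      window_s: float = 0.5, rms_threshold: float = 0.01,
                      min_silence_s: float = 2.0) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of quiet stretches lasting at least min_silence_s."""
    window = max(1, int(window_s * sample_rate))
    spans = []
    start = None  # start time of the quiet stretch currently being tracked
    for i in range(0, len(samples), window):
        chunk = samples[i:i + window]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        t = i / sample_rate
        if rms < rms_threshold:
            if start is None:
                start = t
        elif start is not None:
            if t - start >= min_silence_s:
                spans.append((start, t))
            start = None
    # A quiet stretch that runs to the end of the track (for example, a dead outro).
    if start is not None:
        end = len(samples) / sample_rate
        if end - start >= min_silence_s:
            spans.append((start, end))
    return spans


# Toy signal at 10 samples per second: 2 s of sound, 3 s of silence, 1 s of sound.
toy = [0.5] * 20 + [0.0] * 30 + [0.5] * 10
silent = find_silent_spans(toy, sample_rate=10)
```

Running a check like this over generated songs would flag candidates for review; deciding what counts as "too long" remains a judgment call.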

Performance and Infrastructure

Suno prioritizes speed, aiming to get songs to users instantly, despite the challenge of generating music in real time [00:25:20]. The company uses Transformer models, which are auto-regressive: they produce a song piece by piece, so it can be streamed while it is still being made, significantly improving speed compared to diffusion-based models [00:26:34].
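
The streaming benefit of auto-regressive generation can be sketched with a toy generator. Everything here is hypothetical: `toy_next_token` is a stand-in for a real Transformer decoding step, and the token and chunk sizes are arbitrary. The point is only that each chunk can be handed to the listener as soon as it exists, rather than after the whole song is finished.

```python
from typing import Iterator, List


def toy_next_token(history: List[int]) -> int:
    """Hypothetical stand-in for one decoding step: the next token depends on what came before."""
    return (sum(history[-4:]) * 31 + 7) % 256


def stream_song(prompt: List[int], total_tokens: int,
                chunk_size: int) -> Iterator[List[int]]:
    """Generate tokens one at a time and yield each chunk as soon as it is complete."""
    history = list(prompt)
    chunk: List[int] = []
    for _ in range(total_tokens):
        token = toy_next_token(history)
        history.append(token)
        chunk.append(token)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # final partial chunk
        yield chunk


# Playback could begin after the first 4 tokens instead of waiting for all 10.
chunks = list(stream_song([1, 2, 3], total_tokens=10, chunk_size=4))
```

A consumer can call `next()` on the generator and start playing the first chunk while later chunks are still being computed, which is what makes time-to-first-audio short in this setup.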

Scaling to support a massive user base (10 million users) has been a challenge [00:27:12]. Suno strategically chooses where to innovate and where to leverage existing tools. For instance, the company uses platforms like Modal to deploy jobs onto GPU infrastructure, benefiting from developer-friendly solutions [00:27:40, 00:27:53]. Because audio AI is relatively new, solutions built for the image and text communities often pave the way for audio applications [00:28:15].

Market Landscape and Future Vision

The audio AI market is still nascent, but there’s increasing recognition of audio as a “first-class citizen” in AI, given its role in human communication [00:29:35, 00:29:42]. Opportunities exist in various applications, from customer service to coding, transforming how humans interact with systems [00:29:59, 00:30:04].

While multimodal AI is emerging, the deep integration of audio beyond just an interface for text-based LLMs is still some time away [00:32:33, 00:33:05]. This suggests that niche, single-purpose audio models will continue to thrive, preventing immediate consolidation into one giant model [00:33:12, 00:34:15].

Suno’s recent fundraise is intended to “pull forward the future of music” the company envisions by investing in:

  • Model Training: While music models may not require the same scale as the largest text models, they demand immense care, specialized data, and ongoing research, making training expensive [00:35:11, 00:36:02].
  • Research and Development: Especially in figuring out the “right way to model music,” which is still an open question compared to text [00:35:31, 00:35:44].
  • Talent Acquisition: Hiring the best people to achieve their vision [00:36:16].

The ultimate goal for Suno is not just to create indistinguishable pop songs, but to enable deeper, more intuitive, and enjoyable experiences for the average person [00:37:37, 00:40:54]. This includes developing new interactive ways for people to engage with music, such as a Vision Pro app that allows users to “play air guitar with a band” or “conduct a symphony” [00:38:20, 00:39:03].

IP and Partnerships

The music industry is still early in forming IP partnerships for AI [00:41:44]. Suno prefers to work with the industry rather than against it [00:42:05]. The company has deliberately avoided creating “new songs by famous artists” without express consent, viewing such viral moments as “flash in the pan” attractions rather than the true future of music creation [00:42:12, 00:43:39]. Instead, the focus remains on empowering users to create music relevant to their own lives and experiences [00:43:29].

Lessons from Building Suno

  • User Pride: Allowing users to easily share their creations and feel proud of their work (e.g., adding their names to song titles on trending pages) has been surprisingly successful [00:47:26, 00:48:00].
  • Hardware Ownership: Initial attempts to own GPU hardware proved impractical due to the scale required [00:48:18].
  • Platform Choice: Suno initially believed Discord would be the product’s long-term home, but the dedicated web app quickly surpassed Discord in usage because it offered a more “pleasant” and “all-encompassing music experience” [00:48:41, 00:49:10, 00:49:23].
  • Community Engagement: The large and engaged Discord community, despite its noise, remains an invaluable source of implicit and explicit feedback that shapes the product [00:50:05, 00:50:18].
  • Open Source vs. Closed Source AI: While open source software is lauded, open source AI faces challenges due to the high compute costs of state-of-the-art models and a less clear business model [00:46:21, 00:46:51]. This creates a dynamic where well-funded companies like Meta produce the leading open-source models [00:46:44].
  • Underhyped Areas: Music is considered underhyped, both as an area of AI and, more broadly, as a part of people’s lives [00:47:07].
  • Future Applications: Beyond music, AI in physical hardware, especially for smart home infrastructure (like voice-controlled plumbing systems), presents significant untapped potential for intuitive interaction and improvement [00:31:17, 00:31:25, 00:31:46].

Suno continues to hire for roles, particularly on the East Coast (New York; Cambridge, Massachusetts) and remotely, and welcomes ideas for the future of music from all levels of expertise [00:52:34, 00:52:50].