From: redpointai

HeyGen, an AI video platform that recently secured a $500 million valuation, is at the forefront of revolutionizing video creation. Led by CEO Joshua Xu, the company’s mission is to make the camera obsolete, enabling visual storytelling for everyone by making video production 10 times faster and cheaper with AI video tools [00:04:45], [00:11:38]. The platform helps over 40,000 customers create, localize, and personalize video content [00:09:37].

Current State of Interactive Avatars

HeyGen currently offers an interactive avatar in beta that lets users send their avatar to join Zoom meetings and interact in real time [00:19:15], [00:19:26]. This capability is a significant step toward more dynamic and immersive AI avatar technology.

Technical Challenges for Synchronous Streaming

Achieving seamless synchronous (real-time) streaming with AI avatars presents considerable technical hurdles:

  • Model Complexity: As AI models grow larger and more intricate, optimizing their performance for real-time inference becomes increasingly challenging [00:19:42].
  • Inference Speed: The primary challenge is running these complex architectures fast enough to maintain the inference speeds that live interaction requires [00:19:51].

Despite these challenges, there is strong optimism that these technologies will be capable of running in real time, even on devices, within the next 12 months [00:20:46], [00:20:54].
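
As a rough illustration of why inference speed is the gating factor, the sketch below totals a hypothetical per-frame cost against the time budget a live video stream allows. The frame rate and stage timings are invented for illustration and are not HeyGen measurements.

```python
# Hypothetical latency budget for a streaming avatar; every number here is an
# illustrative assumption, not a measured figure.
TARGET_FPS = 25                       # assumed output frame rate
FRAME_BUDGET_MS = 1000 / TARGET_FPS   # ~40 ms of wall-clock time per frame

# Assumed per-frame cost of each rendering stage (milliseconds).
stage_costs_ms = {
    "audio_to_motion": 12,           # map the next audio chunk to face/body motion
    "face_and_body_synthesis": 25,   # generate the avatar frame itself
    "encode_and_stream": 8,          # encode the frame and push it to the call
}

total_ms = sum(stage_costs_ms.values())
print(f"Budget per frame: {FRAME_BUDGET_MS:.0f} ms, estimated cost: {total_ms} ms")
if total_ms > FRAME_BUDGET_MS:
    # Being over budget means dropped frames or growing delay, which is why
    # smaller, heavily optimized models are needed for true real-time use.
    print("Over budget: this pipeline cannot yet sustain real-time playback.")
```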

Future Vision for Synchronous Streaming

The advent of real-time AI avatars promises to unlock entirely new use cases [00:19:57]. One significant area is advertising: platforms like Facebook and Google currently show the same video ad to every user, but in the future each user could see a different video ad tailored to their individual preferences and watch history, making for a highly personalized experience [00:20:00], [00:20:17]. This aligns with a broader vision of AI-powered content personalization.
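
A minimal sketch of what per-user ad personalization could look like, assuming a simple user profile and a hypothetical adapt_ad_script() helper in front of an avatar-video generator; none of these names come from the conversation.

```python
from dataclasses import dataclass

@dataclass
class UserProfile:
    user_id: str
    interests: list[str]   # assumed to be derived from watch history

def adapt_ad_script(base_script: str, profile: UserProfile) -> str:
    """Hypothetical helper: tailor one base ad script per viewer before it is
    handed to an avatar-video generator, instead of serving a single fixed ad."""
    hook = profile.interests[0] if profile.interests else "what you watch"
    return f"Since you've been watching {hook} lately: {base_script}"

print(adapt_ad_script("here's a tool that edits your videos for you.",
                      UserProfile("u-123", ["travel vlogs"])))
```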

Evolution of Avatar Models

HeyGen’s focus is on building “magic moments” by inventing new ways for customers to create video [00:01:32], [00:01:45], [00:07:33]. The core of this is the quality of the AI models, judged not just by mathematical optimization metrics but by how good the video actually looks [00:08:01]. Key aspects of an engaging avatar include the following (a sketch of how they might fit together follows the list) [00:13:40]:

  • Facial Expression and Voice Tone: Maintaining the speaker’s unique voice tone and facial expressions during localization [00:10:10].
  • Body Motion and Gesture: Coordinating head movement, eyebrow movement, and body gestures to match the script and content [00:14:19], [00:21:25].
  • Full Body Rendering: HeyGen’s AI 3.0 model can render the entire body, with gestures planned as a future addition [00:15:20], [00:21:32].
  • Capturing Diverse Modes: The aim is to create models that can capture different speaking modes, such as presentation mode or interview mode, from just 30 seconds to 2 minutes of video footage [00:17:23], [00:17:31].
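
To make the list above concrete, here is a minimal sketch of how these qualities might be bundled into a single generation request; the field names and SpeakingMode values are assumptions for illustration, not HeyGen’s actual API.

```python
from dataclasses import dataclass, field
from enum import Enum

class SpeakingMode(Enum):
    PRESENTATION = "presentation"
    INTERVIEW = "interview"

@dataclass
class AvatarGenerationRequest:
    """Hypothetical bundle of the avatar qualities discussed above."""
    script: str
    voice_tone: str                                      # preserved from the original speaker
    facial_expression: str                               # e.g. "friendly", "serious"
    gestures: list[str] = field(default_factory=list)    # head, eyebrow, and body motion cues
    full_body: bool = True                               # render the whole body, not just the face
    mode: SpeakingMode = SpeakingMode.PRESENTATION
    reference_footage_seconds: int = 60                  # 30 s to 2 min of source video

request = AvatarGenerationRequest(
    script="Welcome to this quarter's product update.",
    voice_tone="warm, mid-pitch",
    facial_expression="friendly",
    gestures=["nod on greeting", "open-palm gesture on key point"],
)
print(request)
```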

HeyGen’s Approach to Video Generation

HeyGen primarily focuses on business videos [00:22:30]. It pursues an “orchestration engine” approach to video generation, capturing and integrating text, script, voice, sound, music, avatar footage, and background generation into one video [00:23:00], [00:24:10]. This method is preferred over pixel-by-pixel generation (as in text-to-video models such as Sora) because it offers greater control, consistency, and quality, which are crucial for brands and enterprises [00:23:17]. HeyGen aims to integrate with and build on text-to-video technologies as a foundational layer [00:24:00].
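
A minimal sketch of the orchestration idea, assuming each layer is produced by its own stage and then composed, in contrast to one model generating every pixel end to end. The stage functions are hypothetical placeholders, not HeyGen’s engine.

```python
from typing import Callable

# Each stage produces one layer of the final video; swapping a stage
# (e.g. a better text-to-video background model) does not disturb the others,
# which is the control/consistency argument for orchestration.
Stage = Callable[[dict], dict]

def write_script(job: dict) -> dict:
    job["script"] = f"Script for: {job['brief']}"
    return job

def synthesize_voice(job: dict) -> dict:
    job["voice_track"] = f"voice({job['script']})"
    return job

def render_avatar(job: dict) -> dict:
    job["avatar_track"] = f"avatar lip-synced to {job['voice_track']}"
    return job

def generate_background(job: dict) -> dict:
    job["background_track"] = "background scene (could come from a text-to-video model)"
    return job

def compose(job: dict) -> dict:
    job["video"] = " + ".join(
        [job["avatar_track"], job["background_track"], job["voice_track"]]
    )
    return job

PIPELINE: list[Stage] = [write_script, synthesize_voice, render_avatar,
                         generate_background, compose]

job = {"brief": "30-second product explainer"}
for stage in PIPELINE:
    job = stage(job)
print(job["video"])
```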

Impact on Content Creation and Traditional Platforms

AI-generated content poses a dilemma for existing platforms like TikTok, which are built around human content creators [00:29:40]. If AI-generated content becomes prevalent, platforms will struggle to rank and recommend it alongside traditional camera-based content without diminishing the reach of human creators [00:30:00], [00:30:18]. This could lead to the emergence of new platforms dedicated to AI-generated content [00:30:37]. HeyGen, however, intends to remain a creative tool provider, not a consumption platform [00:31:10].

Future of AI-Generated Content and IPs

The ability to generate new voices and AI-generated people could lead to entirely new intellectual properties (IPs) [00:36:10]. With image generation now able to keep a character consistent across generations, extending that consistency to video opens up possibilities for new kinds of AI influencers and digital personas [00:36:27], [00:36:48].

Funding and Costs in AI

The AI category is uniquely capital-intensive due to the high costs of GPUs and talent [00:37:32]. Unlike traditional software, where the marginal cost of serving another customer approaches zero, serving additional customers in AI consumes significant GPU capacity [00:37:56]. At the same time, AI makes individual employees far more efficient, potentially reducing the overall capital required to build a great AI company [00:38:24], [00:39:36]. AI-native teams that use AI tools themselves can operate with greater efficiency [00:38:50].
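
A back-of-the-envelope contrast between near-zero software marginal cost and GPU-bound AI marginal cost; all figures below are made-up assumptions for illustration, not numbers from the conversation.

```python
# Hypothetical unit economics: every number below is an illustrative assumption.
GPU_COST_PER_HOUR = 2.50          # assumed cloud price for one GPU
GPU_MINUTES_PER_VIDEO = 3         # assumed render time for one avatar video

marginal_cost_per_video = GPU_COST_PER_HOUR * (GPU_MINUTES_PER_VIDEO / 60)
print(f"Marginal GPU cost per generated video: ${marginal_cost_per_video:.3f}")

# A traditional SaaS request is dominated by commodity CPU time and is
# effectively ~$0 per marginal request, which is the contrast drawn above.
```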

Future of Video Creation Workflows

By 2030, the vision is that everyone will have a “video agency in their pocket” [00:48:30]. This AI-powered agency would let users interact with a product like HeyGen as if they were speaking to a personal video agency that guides them through the entire video creation process, from ideas to final editing [00:48:04], [00:48:41].

This transformation will lead to new use cases that are currently unimaginable, much like how the mobile camera in 2012 led to platforms like Instagram, Snapchat, and TikTok [00:49:29]. Lowering the barrier to content creation through advanced tools will unlock a new world of possibilities for visual storytelling [00:50:01].