From: redpointai

HeyGen, an AI video platform, is at the forefront of developing and deploying AI avatars for various applications. The company recently raised funding at a $500 million valuation, indicating significant investor interest in its approach to AI-generated video content and avatars [00:00:04].

The “Magic Moment” of AI Avatars

The concept of “magic moments” is central to Hægen’s product philosophy, aiming to disrupt traditional video creation by inventing new AI experiences [00:01:17]. An early “magic moment” for Hægen was when Elon Musk and others used its video translation and dubbing technology for the Argentinian president’s speech at the World Economic Forum, showcasing the power of speaking in different languages with natural voice and expression [00:00:53], [00:01:45].

For HeyGen CEO Joshua Xu, the first “magic moment” was creating his own avatar and watching himself speak on screen [00:02:36]. He also found it “much, much easier” to use his avatar to generate an internal product-update video from a script, eliminating the need to film himself [00:03:04].

Current Capabilities and Use Cases

HeyGen’s AI video platform helps users create, localize, and personalize video content [00:03:51]. It serves over 40,000 customers across three primary use cases [00:09:37]:

  1. Creation: Users can create videos by typing text, using their own avatar, or selecting from HeyGen’s stock avatars, eliminating the need for a camera [00:09:48]. The mission is to enable visual storytelling for everyone, especially those without expensive cameras or sophisticated video software skills [00:11:40].
  2. Localization: Existing videos (even those not made with HeyGen) can be localized into over 175 languages and dialects while preserving voice tone, facial expression, and lip-sync [00:10:01].
  3. Personalization: One video can be personalized into over 100,000 variations based on customer demographics, industry, or specific problems they face, similar to personalizing emails [00:10:19].
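The personalization fan-out described above works much like a mail merge: one master script expands combinatorially over audience attributes. A minimal sketch, with all field names and audience values invented for illustration (this is not HeyGen’s API):

```python
from itertools import product

# Hypothetical master script; the placeholders are invented for illustration.
TEMPLATE = ("Hi {name}, teams in {industry} often struggle with {pain_point}. "
            "Here is how we can help.")

def personalize(names, industries, pain_points):
    """Yield one script per combination of audience attributes."""
    for name, industry, pain in product(names, industries, pain_points):
        yield TEMPLATE.format(name=name, industry=industry, pain_point=pain)

variants = list(personalize(
    ["Alice", "Bob"],
    ["retail", "fintech"],
    ["slow onboarding", "high churn"],
))
print(len(variants))  # 2 * 2 * 2 = 8 variants from one template
```

With a few dozen values per attribute, the same cross-product quickly reaches the 100,000-variant scale mentioned above, each variant then rendered as its own avatar video.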

HeyGen primarily targets the “non-professional player” – the 99% of users who aren’t professional video editors or filmmakers, such as marketers who write scripts but lack video-production skills [00:11:18]. The company focuses on demonstrating what’s possible with AI avatars across use cases like marketing, sales, customer support, training, and content creation [00:12:44].

Technical Aspects and Quality

The core focus for HeyGen’s avatars is producing “engaging” footage, so that the delivered message lands and viewers don’t disengage quickly [00:13:40]. This engagement depends on several aspects of AI quality [00:07:47].

HeyGen’s dedicated research team builds the entire video layer in-house, focusing on lip-syncing, driving body motion, and full-body rendering [00:15:10]. HeyGen’s Avatar 3.0 model can render the entire body [00:15:20].

To create a personalized avatar, users submit a video ranging from 30 seconds to 2 minutes. The AI model learns and mimics the user’s entire speaking behavior, including mouth movements, gestures, and facial expressions, creating a unique “talking style” [00:17:04], [00:17:23]. In the future, larger models will capture different “modes” (e.g., presentation mode, interview mode) to adapt the avatar’s behavior based on the content or script [00:17:36].

The Future of AI Avatars

Interactive AI Avatars and Real-Time Engagement

HeyGen already has a beta version of interactive avatars that can attend Zoom meetings and interact in real time [00:19:15]. The main technical challenge for synchronous generation is optimizing inference speed for larger, more complicated models [00:19:39]. Joshua is optimistic that real-time avatar technology will be widely available within 12 months, with some models even running on-device [00:20:46]. This will unlock new use cases such as personalized video advertising, where different viewers see unique video ads based on their preferences and watch history [00:20:00].

Full Body Movement

The ability to render full-body avatars with natural gestures is crucial for creating engaging human presenters [00:21:10]. While HeyGen’s 3.0 avatar already renders the full body, integrating gestures is the next step [00:21:32]. This is a relatively new research area, with limited data and little consensus on the right model architecture to capture these dimensions [00:21:44].

Integration with Text-to-Video Models

HeyGen primarily focuses on business videos, prioritizing control, consistency, and quality [00:23:00]. Rather than generating every pixel directly (as pure text-to-video models such as Sora do), HeyGen builds an “orchestration engine” [00:23:00]. This engine integrates various components: text, script, voice, sound, music, avatar footage, and some background generation [00:23:06]. HeyGen plans to collaborate with text-to-video partners, using their output as a component while building a service layer on top that interfaces directly with customers [00:24:03].

Brand Personalization

A significant future development is “brand personalization” for video. This involves an AI model learning a company’s brand tone, context, history, product, color palette, video style, and even intro/outro clips from a URL or past videos [00:25:03]. This learned information would then be “baked” into the final video assembly process, much like large language models can adapt to specific writing styles [00:26:06].

AI-Generated IPs and Influencers

The ability to generate new voices and new, consistent AI-generated persons could lead to the creation of future intellectual properties (IPs), such as AI influencers [00:36:10]. This opens up new possibilities for content creation and digital personas [00:36:32].

Business Model and Challenges

Enterprise Focus

HeyGen’s strategy for enterprise customers requires a much higher quality bar, especially around brand consistency and video output [00:31:41]. Key challenges include integrating the technology and product into customers’ day-to-day workflows, particularly with CRM and go-to-market tools [00:32:07]. HeyGen has already partnered with platforms like HubSpot to build integrations within their app ecosystems [00:32:38].

Monetization and Cost

AI companies face unique cost factors, primarily GPU usage and talent [00:37:32]. Unlike traditional software with near-zero marginal costs, AI services consume significant GPU computing power for each additional customer [00:37:56]. Despite these costs, Joshua believes AI-native companies can be surprisingly less capital-intensive, thanks to increased employee efficiency with AI tools (e.g., using ChatGPT or HeyGen for internal communications) and accelerated go-to-market driven by industry excitement [00:38:28]. HeyGen offers a free tier so users can sign up, create their own avatars, and experience the “magic moments” firsthand [00:40:17], [00:51:19]. HeyGen plans products around future model capabilities and costs rather than waiting for inference costs to drop [00:40:51].
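To make the marginal-cost contrast concrete, here is a back-of-the-envelope sketch; every number in it is invented for illustration and does not come from the interview:

```python
# Hypothetical unit economics: all figures below are invented for illustration.
gpu_cost_per_hour = 2.00         # dollars per hour for a rented inference GPU
render_minutes_per_video = 3     # GPU wall-clock time to render one avatar video
videos_per_customer_month = 50   # monthly usage of one hypothetical customer

gpu_hours = videos_per_customer_month * render_minutes_per_video / 60
marginal_cost = gpu_hours * gpu_cost_per_hour
print(f"${marginal_cost:.2f} of GPU time per customer per month")  # $5.00
```

Traditional software would put that marginal line near zero; here it scales linearly with usage, which is the dynamic Joshua describes.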

Industry Landscape and Dilemmas

HeyGen differentiates itself from incumbents like Snapchat and TikTok by building for users who lack access to expensive cameras or sophisticated editing tools, aiming to make the camera obsolete [00:27:47], [00:29:26]. This creates a new market opportunity rather than competing directly in the old one [00:28:27].

A significant dilemma for existing content platforms (like TikTok) arises as AI-generated content becomes more prevalent [00:29:38]. These platforms are built around human creators using mobile cameras [00:29:12]. If AI-generated content grows (say, from 10% to 50% of all content), it competes directly with traditional creator content, potentially reducing views and attention for existing creators [00:29:58]. This might necessitate the emergence of new platforms built specifically for AI-generated content [00:30:30]. While HeyGen’s mission is not to build a consumption platform, it sees this as a possible future opportunity [00:31:10].

Ethical Considerations: Trust and Safety

Trust and safety are critical for HeyGen, especially when serving large enterprise customers [00:33:50]. Its approach rests on two main pillars:

  1. Avatar Creation:
    • Consent: Every avatar created on HeyGen requires video consent from the person [00:34:11].
    • AI Matching & Dynamic Passcodes: Advanced AI matches the consent footage to the submitted video, and dynamic passcodes that expire every 10-15 seconds add a secure layer, making it “almost impossible” to create someone’s avatar without their consent [00:34:20].
    • Human Review: A moderation team conducts human reviews of footage to ensure compliance [00:34:46].
  2. Platform Moderation Policy: HeyGen enforces a strict policy against hate speech, misinformation, fraudulent content, and political campaign material [00:35:03]. Content is reviewed by a hybrid system of AI models and human moderation [00:35:15].

HeyGen also partners with actors who license their likenesses as stock avatars on the platform [00:35:49].

Vision for 2030: A Personal Video Agency

Joshua envisions that in five years, everyone will have a “video agency in their pocket” [00:48:30]. This AI-powered agency, possibly provided by HeyGen, would act like a personal consultant, guiding users through the video creation process from idea to final product, including filming (generating footage), editing, and incorporating feedback [00:47:51].

The power of creative tools like AI avatars lies in opening up entirely new use cases that are currently unimaginable, much like how mobile cameras led to platforms like Instagram, Snapchat, and TikTok [00:49:26]. By lowering the creation barrier, AI avatars will enable a new world of content previously inaccessible [00:49:56]. This perspective was shaped by Joshua’s career experience at Snap, where he witnessed the evolution of mobile platforms and the emergence of diverse content platforms [00:50:17].