From: redpointai
Haen, an AI video platform, recently raised funding at a $500 million valuation from investors including Benchmark and Thrive, highlighting the growing interest in AI video tools for enterprises [00:00:04]. The company’s CEO, Joshua Xu, shared insights on how enterprises are leveraging AI video tools today, addressing security guidelines for voice cloning, the dilemma platforms like TikTok face with AI-generated content, and future product expansion [00:00:11].
The “Magic Moment” of Generative AI
Haen aims to disrupt traditional content creation by building an “AI experience” that feels magical [00:01:22]. A notable “magic moment” occurred when Haen’s video translation and dubbing technology was used to translate the speech of the president of Argentina at the World Economic Forum into different languages, gaining viral attention [00:00:53]. For Joshua, his first personal “magic moment” was creating his own avatar and watching it speak [00:03:36]. He now uses his avatar to generate internal product update videos, simplifying communication [00:02:57].
Current and Future State of AI for Creative Tools
Haen’s platform enables users to create, localize, and personalize video content [00:03:52]. Historically, video production involved filming with a camera and then editing the footage [00:04:03]. With generative AI, it is now possible to generate footage directly using AI, effectively replacing the need for a camera [00:04:36]. Haen’s initial mission was to replace the camera, making video creation 10 times faster and 10 times cheaper for businesses and individuals who lack access to expensive equipment or are uncomfortable on camera [00:05:06].
Video editing is also expected to change. Traditional timeline editors exist because camera footage is costly to capture and requires multiple takes, and they may become obsolete [00:06:04]. In the future, editing could mean generating video from text, combining scriptwriting with intuitive navigation on a 2D canvas [00:07:00].
Building High-Quality AI Avatars
The primary focus for Haen is on AI quality, ensuring generated footage can replace traditional camera processes [00:07:50]. Quality encompasses various aspects beyond mathematical models, including:
- Lighting of the footage [00:08:27]
- Realism and expressions of the person [00:08:30]
- Body motion and gestures matching the script [00:08:32]
Haen’s dedicated research team builds all avatar models in-house, focusing on lip-syncing, body motion, and full-body rendering [00:15:11]. Their AI 3.0 model renders the entire body [00:15:28]. Training good avatar models involves observing extensive talking video data, similar to how a human learns, and continuously improving model architecture to capture nuanced dimensions [00:15:53].
To create a personalized “video avatar,” users submit 30 seconds to 2 minutes of footage, allowing the AI model to learn their unique talking style, including mouth movements, gestures, and facial expressions [00:17:04]. Future developments aim to capture different “modes” (e.g., presentation mode, interview mode) and adapt avatar behavior based on the script or content [00:17:40].
Haen’s Business Use Cases
With over 40,000 customers, Haen’s platform is primarily used for three key applications [00:09:37]:
- Create: Users can create videos using their own avatar or stock avatars by simply typing text, eliminating the need for a camera [00:09:48].
- Localize: Existing videos (even non-Haen ones) can be localized into over 175 different languages and dialects, maintaining voice tone, facial expression, and lip-sync [00:10:01]. This was exemplified by the viral dubbing of the Argentine president’s speech [00:00:53].
- Personalize: A single video can be personalized into more than 100,000 variations, tailoring messaging based on the specific customer, industry, or problem [00:10:19].
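The Personalize step can be pictured as one script template fanned out over many customer records, with each resulting script driving one video render. The following is a minimal sketch under that assumption, not Haen's actual pipeline; the template text, the field names (`name`, `industry`, `problem`), and the sample records are all made up for illustration.

```python
from string import Template

# Hypothetical script template; each $field is filled per customer.
SCRIPT = Template(
    "Hi $name, teams in $industry often struggle with $problem. "
    "Here is how we can help."
)


def personalize(records):
    """Yield one script per customer record; each would feed one video render."""
    for record in records:
        # substitute() raises KeyError if a record is missing a field,
        # which surfaces incomplete data before any video is generated.
        yield SCRIPT.substitute(record)


# Illustrative records only.
customers = [
    {"name": "Ana", "industry": "retail", "problem": "slow onboarding"},
    {"name": "Raj", "industry": "logistics", "problem": "support backlog"},
]
scripts = list(personalize(customers))
```

The same loop scales to the 100,000 variations mentioned above simply by supplying more records; the expensive part would be the rendering, which this sketch leaves out.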
Haen is designed for the “non-professional player” – the 99% of users who are not professional video editors [00:11:18]. This includes marketers, content creators, and those in sales, support, customer success, and training, who write scripts but may lack the skills or tools to produce videos [00:11:29]. The mission is to enable visual storytelling for everyone, especially those without expensive cameras or complex software [00:11:40]. A key challenge is educating users on this new way of creating video and demonstrating its diverse applications across different verticals [00:12:00].
Future of AI Video Technology
Synchronous Generation and Streaming
Haen has a beta version of “interactive avatars” that can attend Zoom meetings and interact in real-time [00:19:15]. The main technical challenge for synchronous generation is optimizing inference speed as models become larger and more complex [00:19:42]. Joshua is optimistic that real-time AI video technology will be available within 12 months, possibly even running on devices [00:20:46]. This could unlock new use cases, such as personalized video advertisements tailored to individual user preferences and watch history [00:20:17].
Full Body Movement
Full-body rendering and gestures are a key focus because they make a human presenter engaging [00:21:10]. Haen’s 3.0 avatar version offers full-body rendering, with gesture integration as the next step [00:21:32]. Challenges include a lack of sufficient data and finding the right model architecture to capture complex body movements [00:21:50].
Intersection with Text-to-Video Models
Haen primarily focuses on business videos, prioritizing control, consistency, and quality [00:22:30]. While some text-to-video technologies generate video pixel by pixel, Haen believes in an orchestration engine approach [00:22:50]. This involves combining text, script, voice, sound, music, avatar footage, and background generation to deliver a more controlled and consistent output for businesses [00:23:00]. Haen plans to work closely with text-to-video partners, integrating their capabilities as building blocks within Haen’s broader orchestration engine [00:24:00].
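One way to picture the orchestration approach is as a scene assembled from separately generated tracks, each of which can be swapped or regenerated without touching the others. The sketch below is an assumption about how such a structure might look, not Haen's design; the `Track` and `Scene` classes and the source labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Track:
    kind: str        # e.g. "avatar", "voice", "music", "background"
    source: str      # hypothetical label for the generator or asset used
    start: float     # seconds from the start of the scene
    duration: float  # seconds


@dataclass
class Scene:
    script: str
    tracks: list[Track] = field(default_factory=list)

    def add(self, kind: str, source: str, start: float, duration: float) -> "Scene":
        """Attach one building block and return self so calls can be chained."""
        self.tracks.append(Track(kind, source, start, duration))
        return self

    def length(self) -> float:
        """Scene length is the latest end time across all tracks."""
        return max((t.start + t.duration for t in self.tracks), default=0.0)


scene = (
    Scene(script="Welcome to our product update.")
    .add("avatar", "in-house avatar model", 0.0, 6.0)
    .add("voice", "text-to-speech", 0.0, 6.0)
    .add("background", "partner text-to-video model", 0.0, 8.0)
    .add("music", "stock library", 0.0, 8.0)
)
```

Keeping each component as its own track is what gives the control and consistency described above: a text-to-video model supplies one building block, such as the background, rather than the whole frame.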
Brand Personalization
A significant future area for video is brand personalization, allowing AI to learn a company’s brand tone, context, history, and product details from prompts or existing content [00:25:03]. The AI model could learn color palettes, styles, and common video elements (like opening clips) by analyzing past company videos or URLs, then bake these elements into newly generated videos [00:25:51]. This involves disassembling video into components and reassembling them with brand-specific elements, similar to how large language models use a context window [00:25:32].
Impact of AI Advancements on Business Models
AI’s impact on business models is evident in the shift from traditional software companies, where marginal cost is near zero, to AI companies, where GPU consumption incurs significant marginal costs [00:37:50]. However, AI also increases the efficiency of individual employees [00:38:24]. The growth trajectory of AI companies is “insane,” as exemplified by ChatGPT reaching 100 million users extremely fast [00:39:10]. Surprisingly, building a great AI company may require less capital than anticipated due to accelerated go-to-market strategies and efficiency gains [00:39:36]. Haen proactively designs products based on anticipated model capabilities and costs 12 months in advance [00:40:51].
Competition with Incumbents
Haen’s strategy is to build for a new market of users (content people) who don’t have access to cameras or sophisticated editing tools, rather than directly competing with platforms like Snap or TikTok that cater to professional video editors [00:28:02]. These incumbents, focused on mobile camera-based content and creators, will face a dilemma as AI-generated content becomes more prevalent [00:29:12]. If AI-generated content grows to 50% of platform content, existing human creators might see significantly reduced views, potentially leading to the emergence of new platforms specifically for AI-generated content [00:30:15].
Enterprise Adoption and Integration
For enterprise customers, quality requirements are much higher, especially concerning brand consistency [00:31:41]. Integration into existing workflows is crucial [00:32:07]. For marketing use cases, integration with CRMs and go-to-market tools is vital, as demonstrated by Haen’s partnership and app integration with HubSpot [00:32:25].
Data Privacy and Ethical Considerations in Generative AI
Trust and safety are critical for Haen’s business, especially when serving large enterprise customers [00:33:45]. Key measures include:
- Avatar Creation: Requires consent in video format to ensure that the person creating the avatar is the same person who appears in the footage [00:34:11]. Dynamically generated passwords with short expiry times (10-15 seconds) add a layer of security against unauthorized avatar creation [00:34:25].
- Content Moderation: A hybrid approach using AI models and a human moderation team reviews all content to ensure compliance with policies against hate speech, misinformation, fraud, and political campaigns [00:35:01].
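The short-lived password described under Avatar Creation can be sketched as a time-windowed HMAC code, similar in spirit to standard one-time-password schemes. This is an illustration only, not Haen's implementation; the function names, the 15-second window, the 6-digit format, and the idea of binding the code to a session ID are all assumptions.

```python
import hashlib
import hmac
import struct

WINDOW_SECONDS = 15  # assumed expiry, taken from the 10-15 second range above


def passcode(secret: bytes, session_id: str, now: float) -> str:
    """Derive a 6-digit code bound to one session and one time window."""
    window = int(now // WINDOW_SECONDS)
    message = session_id.encode("utf-8") + struct.pack(">Q", window)
    digest = hmac.new(secret, message, hashlib.sha256).digest()
    number = int.from_bytes(digest[:4], "big") % 1_000_000
    return f"{number:06d}"


def verify(secret: bytes, session_id: str, spoken_code: str, now: float) -> bool:
    """Accept the code only if it matches the code for the current window."""
    expected = passcode(secret, session_id, now)
    # Constant-time comparison avoids leaking how many digits matched.
    return hmac.compare_digest(expected, spoken_code)
```

In use, the server would show `passcode(secret, session_id, time.time())` to the person recording the consent video and later call `verify` with the code they spoke. Because the window number is part of the HMAC input, pre-recorded or reused footage would carry a code from an old window and would almost always be rejected.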
Haen also partners with existing actors to create stock avatars [00:35:49]. The ability to generate new voices and people through AI could lead to the creation of future intellectual property (IP), particularly with models that maintain consistency across different generations, opening possibilities for “AI influencers” [00:36:10].
The Future of Video Creation Workflows
By 2030, Joshua envisions a world where “everyone will have their video agency on their pocket” [00:48:30]. This means interacting with products like Haen as if talking to a personal video agency, from ideation to filming (generating footage), editing, and iterative feedback [00:47:56]. AI will enable faster and cheaper generation of any text, audio, or video content [00:49:16]. The true power of creative tools lies in their ability to lower creation barriers, which will unlock entirely new use cases, similar to how mobile cameras led to platforms like Instagram, Snapchat, and TikTok [00:49:52].