From: redpointai

AI video creation tools represent a disruptive shift in content production, enabling users to generate, localize, and personalize videos without relying on traditional cameras or extensive post-production [04:09]. HeyGen, an AI video platform, is at the forefront of this transformation, aiming to make cameras obsolete and democratize visual storytelling [03:30, 04:45, 11:38, 29:30].

HeyGen: An Overview

HeyGen is an AI video platform that recently secured a $500 million valuation from investors like Benchmark and Thrive [00:04, 00:08]. Its core mission is to enable everyone to create video content, especially those without access to expensive cameras or sophisticated editing software [04:49, 11:40, 29:30].

The company aims to invent a new way of creating video and content, functioning like a “magician” to disrupt old workflows [01:22, 01:25, 01:32].

The “Magic Moment” of AI Video

HeyGen experienced a viral “magic moment” when its technology was used to dub a speech by the President of Argentina at the World Economic Forum into different languages, gaining attention from figures like Elon Musk [00:51, 01:45]. For HeyGen’s CEO, Joshua Xu, a personal “magic moment” was creating his own avatar and seeing himself speak on screen [02:36, 02:42]. He now uses his avatar to generate internal product update videos, eliminating the need to film himself [03:04, 03:09, 03:16, 38:43].

Evolution of Video Production with AI

Traditionally, video production involves two main steps: filming with a camera and then post-production editing [04:03, 04:07, 04:14]. Generative AI is changing this by making it possible to:

  • Generate footage using AI instead of filming with a camera, making the process 10 times faster and 10 times cheaper [04:36, 05:06, 05:09].
  • Transform the editing experience, moving away from timeline editors, which were designed to manage expensive camera footage [06:06, 06:15, 06:33]. The future of editing is expected to be vastly different, potentially involving text-to-video generation and new user interfaces [06:40, 07:00, 07:10].

Developing AI Avatars and Models

The primary focus for HeyGen is AI quality: ensuring that generated footage can effectively replace traditional camera processes [07:47, 07:57]. This goes beyond the mathematical problem inside the AI model to aspects such as lighting, natural expressions, body motion, and gestures that match the content [08:21, 08:30, 08:32, 08:39].

HeyGen evaluates its models by asking whether an avatar is “engaging” enough for day-to-day use and effective in delivering a message [09:11, 13:40, 13:56]. The company has a dedicated research team building its avatar models, including full-body rendering [15:11, 15:28].

To create a personalized avatar, users submit a short video (30 seconds to 2 minutes) that allows the AI model to learn their unique talking style, including mouth movements, facial expressions, and gestures [17:04, 17:11, 17:23, 17:31]. Future models aim to capture different “modes” like presentation or interview styles, with adaptive behavior based on the script [17:36, 18:19, 18:23].

Key Use Cases for HeyGen Today

HeyGen serves over 40,000 customers across three main use cases [09:37]:

  1. Create: Users can create their own avatars or use stock avatars to generate videos by simply typing text, eliminating the need for a camera [09:48, 09:55, 09:57].
  2. Localize: Existing videos (even non-Haen ones) can be localized into over 175 different languages and dialects, preserving voice tone, facial expression, and lip sync [10:01, 10:07, 10:10, 10:15].
  3. Personalize: A single video can be personalized into more than 100,000 variations based on specific customer needs, industry, or problems, similar to personalizing emails [10:19, 10:22, 10:27, 10:30, 10:34].
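The personalization use case works like mail merge for video: one base script expanded into many data-driven variations. The sketch below is illustrative only; the `Prospect` fields and script template are assumptions for the example, not HeyGen’s actual API.

```python
from dataclasses import dataclass


@dataclass
class Prospect:
    """Hypothetical customer record driving a personalized variation."""
    name: str
    company: str
    industry: str
    pain_point: str


# Hypothetical base script; real systems would feed each variation
# into avatar/voice generation rather than just producing text.
SCRIPT_TEMPLATE = (
    "Hi {name}, I noticed {company} works in {industry}. "
    "Here's how we help teams struggling with {pain_point}."
)


def personalize_scripts(prospects: list[Prospect]) -> list[str]:
    """Expand one base script into a per-prospect variation."""
    return [
        SCRIPT_TEMPLATE.format(
            name=p.name,
            company=p.company,
            industry=p.industry,
            pain_point=p.pain_point,
        )
        for p in prospects
    ]


scripts = personalize_scripts([
    Prospect("Ana", "Acme Corp", "logistics", "late deliveries"),
    Prospect("Ben", "Globex", "retail", "cart abandonment"),
])
print(len(scripts))  # one script per prospect
```

Scaling the prospect list to 100,000 records yields the kind of variation count described above, with the video model rendering each script.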

HeyGen is built for “the rest of the 99%” of users who are not professional creators, such as marketers and content creators, who may not have access to expensive cameras or advanced editing skills [11:17, 11:21, 11:29, 11:36].

Technical Challenges and Future Directions

Future directions for AI in video creation workflows include:

  • Synchronous Generation/Streaming: HeyGen already offers an interactive avatar beta that can attend Zoom meetings and interact in real time [19:14, 19:17, 19:26]. The main technical challenge is optimizing inference speed so that larger, more complex models can run in real time, potentially even on-device within 12 months [19:38, 19:42, 20:46, 20:54]. This could enable dynamic, personalized video advertisements based on user preferences and watch history [20:15, 20:19, 20:22].
  • Full Body Movement: While HeyGen’s latest Avatar 3.0 model offers full-body rendering, incorporating gestures remains a significant challenge due to the need for more data and improved model architecture [21:07, 21:10, 21:26, 21:32, 21:45, 21:50].
  • Integration with Text-to-Video Models: HeyGen views pure text-to-video models (like Sora and Pika) as complementary [24:00, 24:03]. HeyGen focuses on business videos, prioritizing control, consistency, and quality, achieved through an “orchestration engine” that combines text, script, voice, music, avatar footage, and background generation [22:26, 23:00, 23:15, 23:17, 23:24, 23:36]. Text-to-video can serve as a component within this broader system [24:07].
  • Brand Personalization Layer: A key future area is enabling AI models to learn a company’s brand tone, style, color palette, and common video elements (like opening/closing clips) from existing content (e.g., URLs or past videos) [25:03, 25:12, 25:17, 25:21, 25:24, 25:51]. This would allow the AI to bake these elements into the final assembled video [26:06]. This is akin to an AI model having a “context window” or “memory” of a brand’s visual identity [26:20, 26:35, 26:38].
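The orchestration engine described above can be pictured as a script passing through a sequence of generation stages whose outputs are assembled into the final video. This is a minimal conceptual sketch; the stage names and interfaces are assumptions, since the source does not describe HeyGen’s internals.

```python
from dataclasses import dataclass, field


@dataclass
class VideoJob:
    """Carries the script plus each stage's generated asset."""
    script: str
    assets: dict = field(default_factory=dict)


# Each stage is a stand-in for a real generation model.
def generate_voice(job: VideoJob) -> VideoJob:
    job.assets["voice"] = f"tts({job.script!r})"
    return job


def generate_avatar(job: VideoJob) -> VideoJob:
    job.assets["avatar"] = "avatar_footage"
    return job


def generate_background(job: VideoJob) -> VideoJob:
    # A text-to-video model could plug in here as one component.
    job.assets["background"] = "t2v_background"
    return job


def add_music(job: VideoJob) -> VideoJob:
    job.assets["music"] = "library_track"
    return job


def assemble(job: VideoJob) -> dict:
    """Combine generated assets into a renderable timeline."""
    return {"timeline": list(job.assets), "script": job.script}


PIPELINE = [generate_voice, generate_avatar, generate_background, add_music]


def orchestrate(script: str) -> dict:
    job = VideoJob(script=script)
    for stage in PIPELINE:
        job = stage(job)
    return assemble(job)


video = orchestrate("Welcome to our Q3 product update.")
print(video["timeline"])  # -> ['voice', 'avatar', 'background', 'music']
```

The design point is that text-to-video is one replaceable stage in the pipeline rather than the whole product, matching the “component within a broader system” framing above.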

Market Dynamics and Competition

HeyGen operates in a market with impressive incumbents like Snap and TikTok [26:56, 26:59]. However, HeyGen believes it is opening up a new market by building tools for content creators who lack access to cameras and sophisticated software [28:02, 28:28, 29:30]. This differentiates it from platforms like TikTok, which are built around creators using mobile cameras [29:09, 29:12].

A potential dilemma for existing platforms is how to balance AI-generated content versus camera-based content [30:00]. If AI-generated content becomes a significant portion of total content, it could directly compete with and suppress the reach of traditional creators, potentially leading to the emergence of new platforms specifically for AI-generated content [30:15, 30:20, 30:32, 30:37]. While HeyGen’s mission is not to build a consumption platform, it recognizes this as a potential new opportunity [31:10, 31:20].

Enterprise Adoption

HeyGen has made a significant push into the Enterprise market [31:27]. Key learnings from serving Enterprise customers include:

  • Higher Quality Requirements: Enterprises demand much higher quality and brand consistency in video output [31:41, 31:48, 31:53].
  • Workflow Integration: Integrating HeyGen’s technology into existing day-to-day workflows is crucial [32:07, 32:13]. This includes integrations with CRM systems and go-to-market tools like HubSpot [32:27, 32:30, 32:33, 32:44].

Trust and Safety

Trust and safety are critical for HeyGen’s business, especially when serving large Enterprise customers [33:45, 33:52]. HeyGen implements policies across two main areas:

  1. Avatar Creation: Every avatar created on HeyGen requires recorded video consent [34:11]. Advanced AI verifies that the person providing consent is the same as the one in the footage, and dynamic passcodes that expire quickly further enhance security, making it “almost impossible” to create someone’s avatar without their consent [34:20, 34:25, 34:30, 34:38].
  2. Video Content Creation: HeyGen has a platform moderation policy that prohibits hate speech, misinformation, and political campaign content [35:03, 35:05, 35:07, 35:11]. This moderation is a hybrid solution involving both AI model review and a human moderation team [35:15, 35:17, 35:20].
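The hybrid moderation approach above can be sketched as an AI first pass that auto-rejects clear violations and escalates uncertain cases to the human team. The prohibited categories come from the stated policy, but the scoring function and thresholds here are invented for illustration.

```python
# Categories from the stated policy; everything else below is illustrative.
BLOCKED_CATEGORIES = {"hate_speech", "misinformation", "political_campaign"}


def ai_review(script: str) -> tuple[str, float]:
    """Stand-in for a classifier returning (category, confidence)."""
    if "election" in script.lower():
        return "political_campaign", 0.95
    return "ok", 0.99


def moderate(script: str) -> str:
    category, confidence = ai_review(script)
    if category in BLOCKED_CATEGORIES and confidence >= 0.9:
        return "reject"          # high-confidence violation: auto-reject
    if category in BLOCKED_CATEGORIES or confidence < 0.7:
        return "human_review"    # uncertain cases go to the human team
    return "approve"


print(moderate("Vote for me in the election!"))  # -> reject
print(moderate("Our Q3 product update."))        # -> approve
```

The hybrid split lets the AI handle volume while humans handle ambiguity, which is the usual rationale for this design.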

HeyGen also explores IP partnerships, allowing existing actors to create avatars on the platform [35:30, 35:49]. The potential for generating new AI-generated IPs or AI influencers (like the existing social media persona Lil Miquela) is an exciting future prospect [36:13, 36:16, 36:27, 36:33, 36:44, 36:48, 36:56].

Business Model and Capital Intensity

While GPUs and talent are significant cost factors for AI companies, HeyGen’s CEO notes that the financial model differs from that of traditional software companies [37:32, 37:34, 37:44]. Unlike software, which has near-zero marginal cost, AI incurs GPU compute costs for every additional customer [37:56, 38:00, 38:08].

However, AI-native companies and their teams are becoming much more efficient by leveraging tools like ChatGPT [38:24, 38:28, 38:30, 39:00]. This gain in efficiency, combined with market excitement, means that it surprisingly “requires less capital to build a great AI company” [39:27, 39:36, 39:39]. HeyGen operates with a free tier, allowing users to discover the “magic moments” of the product despite inference costs [39:43, 39:47, 39:53]. The company consistently builds products 12 months ahead of current inference costs, anticipating future technical capabilities and cost reductions [40:51, 40:53, 41:00, 41:04, 41:11, 41:19].
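The marginal-cost point can be made concrete with simple gross-margin arithmetic. The numbers below are invented for illustration and are not HeyGen’s actual figures; they only show why per-customer GPU spend compresses margins relative to traditional SaaS.

```python
def gross_margin(price_per_user: float, marginal_cost_per_user: float) -> float:
    """Gross margin = (revenue - cost of serving) / revenue, per user."""
    return (price_per_user - marginal_cost_per_user) / price_per_user


# Illustrative figures only (assumptions, not HeyGen's economics):
saas_margin = gross_margin(price_per_user=30.0, marginal_cost_per_user=0.50)  # near-zero serving cost
ai_margin = gross_margin(price_per_user=30.0, marginal_cost_per_user=9.0)     # GPU inference per video

print(f"traditional SaaS: {saas_margin:.0%}")  # ~98%
print(f"AI video:         {ai_margin:.0%}")    # 70%
```

Building “12 months ahead of current inference costs” is a bet that the marginal-cost line falls fast enough to restore software-like margins by launch.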

Future of Video Creation Workflows

Looking to 2030, the vision for video creation is that “everybody will have their video agency in their pocket” [48:30, 48:33]. This means AI tools like HeyGen will enable users to interact with a product as if conversing with a personal video agency, streamlining the entire process from ideation to final video [47:59, 48:08, 48:41, 48:44].

Just as mobile cameras led to new content platforms like Instagram, Snapchat, and TikTok, AI video creation tools are expected to open up entirely new use cases and forms of content that are currently unimaginable [49:29, 49:32, 49:35, 49:41, 49:52, 49:56, 50:00]. By improving tools and lowering the barrier to creation, a “whole new world” of visual storytelling will emerge [49:56, 49:58, 50:01].