From: aidotengineer

This article explores the journey of building LinkedIn’s Generative AI (GenAI) platform, focusing on the evolution and application of multiagent systems within the company’s products and infrastructure [00:00:19]. It details the progression from simple GenAI features to complex multiagent systems and the critical platform components required to support them [00:00:45].

LinkedIn’s GenAI Product Evolution

LinkedIn’s GenAI journey began in 2023 with the launch of its first formal GenAI feature, collaborative articles [00:01:29]. This feature was a straightforward prompt-in, string-out application leveraging the GPT-4 model to create long-form articles [00:01:42]. Initially, the supporting infrastructure included a gateway for centralized model access and Python notebooks for prompt engineering [00:02:00]. At this stage, two different tech stacks were in use: Java for online serving and Python for offline prompt engineering [00:02:13]. This initial setup was not considered a full platform [00:02:24].

Second Generation: Co-Pilot or Coach

By mid-2023, LinkedIn realized the limitations of the simple approach, particularly its inability to inject rich data into the product experience [00:02:30]. This led to the development of the second generation of GenAI products, internally referred to as co-pilot or coach [00:02:42]. An example is a feature that analyzes a user’s profile and a job description, using a RAG (Retrieval Augmented Generation) process to provide personalized recommendations on job fit [00:02:56].
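The job-fit feature described above can be sketched as a simple RAG flow: retrieve the member profile and the job description, then ground the prompt in that context before calling an LLM. This is a minimal illustration, not LinkedIn's actual implementation; all function names and IDs are hypothetical.

```python
def retrieve_profile(member_id: str) -> str:
    # Stub retriever; a real system would query a profile store or search index.
    return "Profile: 5 years of backend engineering, Java and Python."

def retrieve_job(job_id: str) -> str:
    # Stub retriever for the job posting.
    return "Job: Senior Backend Engineer, requires Java and distributed systems."

def build_job_fit_prompt(member_id: str, job_id: str) -> str:
    """RAG step: retrieve member and job context, then ground the prompt in it."""
    context = f"{retrieve_profile(member_id)}\n{retrieve_job(job_id)}"
    return (
        "Using only the context below, assess how well this member fits the job "
        "and give personalized recommendations.\n\n"
        f"Context:\n{context}"
    )

prompt = build_job_fit_prompt("member-123", "job-456")
# The grounded prompt would then be sent to an LLM for the recommendation.
```

The key property is that the model only sees data the retrieval step injected, which is what lets the product deliver personalized rather than generic answers.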

At this stage, platform capabilities began to emerge [00:03:13]. Key developments included:

  • Python SDK: A Python SDK was built on top of the popular LangChain framework to orchestrate LLM calls and integrate with LinkedIn’s large-scale infrastructure [00:03:16]. This allowed developers to easily assemble applications [00:03:35].
  • Unified Tech Stack: The company unified its tech stack around Python after realizing that transferring prompts from Python to Java was costly and error-prone [00:03:38].
  • Prompt Management: Investment began in prompt management, a “prompt source of truth” sub-module that helps developers version their prompts and gives structure to meta-prompts [00:03:51].
  • Conversational Memory: A critical infrastructure piece, conversational memory, was introduced to track LLM interactions and retrieval content, injecting it into the final product to enable conversational bots [00:04:08].

Multiagent Systems and Applications

In the last year, LinkedIn launched its first “real” multiagent system called the LinkedIn AI Assistant [00:04:33]. This multiagent system is designed to assist recruiters, automating tedious tasks such as posting jobs, evaluating candidates, and reaching out to them [00:04:42].

The GenAI platform further evolved to support multiagent systems [00:05:07]. Key advancements included:

  • Distributed Agent Orchestration Layer: The Python SDK was extended into a large-scale distributed agent orchestration layer [00:05:11]. This layer handles distributed agent execution and complex scenarios like retry logic and traffic shifts [00:05:21].
  • Skill Registry: Recognizing that skills (APIs) are a key aspect for agents to perform actions, LinkedIn invested in a skill registry [00:05:36]. This registry provides tools for developers to publish their APIs, facilitating skill discovery and invocation within applications [00:05:48].
  • Experiential Memory: Beyond conversational memory, the platform introduced experiential memory [00:06:14]. This memory storage extracts, analyzes, and infers tacit knowledge from interactions between agents and users [00:06:21]. Memories are organized into different layers, including working, long-term, and collective memories, to enhance agent awareness [00:06:35].
  • Operability: Because agents are autonomous and decide for themselves which APIs or LLMs to call, predicting their behavior is challenging [00:06:50]. To address this, LinkedIn invested in operability, building an in-house solution on OpenTelemetry to track low-level telemetry data [00:07:08]. This data allows agent calls to be replayed, and an analytics layer guides future optimization of agent systems [00:07:24].

Components of the GenAI Platform for Multiagent Systems

The LinkedIn GenAI platform, particularly for multiagent systems, can be classified into four key layers [00:07:39]:

  1. Orchestration [00:07:44]
  2. Prompt Engineering [00:07:47]
  3. Tools and Skills Invocation [00:07:48]
  4. Content and Memory Management [00:07:50]

The platform provides a unified interface for a complex GenAI ecosystem, abstracting away the underlying complexities of modeling, responsible AI, and machine learning infrastructure [00:08:20]. For instance, developers can switch between OpenAI models and internal models by changing a single parameter in one line of code [00:08:50]. This centralized platform also enforces best practices and governance, ensuring efficient and responsible application development [00:09:12].

The Criticality of a Dedicated Platform for Agent Systems

A dedicated platform for multiagent systems is considered critical because GenAI represents a fundamentally different AI system compared to traditional ones [00:09:55]. In traditional AI, there’s a clear separation between model optimization and model serving, allowing AI engineers and product engineers to work on different tech stacks [00:10:04]. However, in GenAI systems, this line blurs, with everyone becoming an engineer who can optimize overall system performance [00:10:24].

GenAI and agent systems are viewed as “compound AI systems” [00:10:49].

“A compound AI system can be defined as a system that tackles AI tasks using multiple interacting components, including multiple calls to models, retrievers, or external tools.” [00:10:55]

This necessitates a platform to bridge the skill gaps between AI engineers and product engineers [00:11:10].

Building and Scaling Multiagent Systems

Talent Acquisition and Team Building

Hiring for developing and optimizing AI agents requires a unique blend of skills [00:11:39]. Ideal candidates are strong software engineers capable of infrastructure integration, with good developer product management (PM) skills for interface design, and an AI/data science background to understand the latest techniques [00:11:58]. They must be hands-on and adaptable to new techniques [00:12:19].

Realistically, trade-offs are made in hiring, following principles like:

  • Prioritizing Software Engineering: Stronger software engineering skills are prioritized over AI expertise [00:12:47].
  • Hiring for Potential: Given the rapid evolution of the field, hiring for potential rather than outdated experience or degrees is crucial [00:13:03].
  • Diversified Teams: Instead of finding a single “unicorn” engineer, building a diversified team with full-stack software engineers, data scientists, AI engineers, data engineers, fresh graduates, and startup veterans has proven effective [00:13:15]. Collaboration helps team members pick up new skills and grow into ideal candidates [00:13:50].
  • Critical Thinking: The team consistently evaluates the latest open-source packages, engages with vendors, and proactively deprecates solutions, as technologies can become outdated within a year or less [00:14:06].

Key Technical Takeaways for Multiagent Systems

  • Python for Tech Stack: Python is strongly recommended due to its prevalence in research and open-source communities, and its proven scalability [00:14:37].
  • Prompt Source of Truth: A robust system for version controlling prompts is critical for operational stability, similar to managing traditional model parameters [00:15:03].
  • Memory Management: Memory is a key component for injecting rich data into the agent experience [00:15:26].
  • API Uplifting to Skills: In the agent era, uplifting existing APIs into easily callable skills for agents is a new and crucial component, requiring supporting tooling and infrastructure [00:15:42].

Scaling and Adoption Strategies

To scale and ensure adoption of multiagent systems and platforms:

  • Solve Immediate Needs: Start by solving immediate needs rather than attempting to build a full-fledged platform from the outset [00:16:04]. For example, LinkedIn began with a simple Python library for orchestration, which then grew into more components [00:16:15].
  • Focus on Infrastructure and Scalability: Leverage existing robust infrastructure, such as LinkedIn’s messaging infrastructure for a memory layer, to ensure cost-efficiency and scalability [00:16:29].
  • Prioritize Developer Experience: The platform’s primary goal is to help developers be productive [00:16:46]. Aligning technology with developers’ existing workflows is key to ease adoption and success [00:16:59].

More technical details on LinkedIn’s GenAI platform journey are available in their engineering blog posts [00:17:12].