From: aidotengineer
This article explores the journey of building LinkedIn’s Generative AI (GenAI) platform, focusing on the evolution and application of multiagent systems within the company’s products and infrastructure [00:00:19]. It details the progression from simple GenAI features to complex multiagent systems and the critical platform components required to support them [00:00:45].
LinkedIn’s GenAI Product Evolution
LinkedIn’s GenAI journey began in 2023 with the launch of its first formal GenAI feature, collaborative articles [00:01:29]. This feature was a straightforward prompt-in, string-out application leveraging the GPT-4 model to create long-form articles [00:01:42]. Initially, the supporting infrastructure included a gateway for centralized model access and Python notebooks for prompt engineering [00:02:00]. At this stage, two different tech stacks were used: Java for the online phase and Python for the backend [00:02:13]. This initial setup was not considered a full platform [00:02:24].
Second Generation: Co-Pilot or Coach
By mid-2023, LinkedIn realized the limitations of the simple approach, particularly its inability to inject rich data into the product experience [00:02:30]. This led to the development of the second generation of GenAI products, internally referred to as co-pilot or coach [00:02:42]. An example is a feature that analyzes a user’s profile and a job description, using a RAG (Retrieval Augmented Generation) process to provide personalized recommendations on job fit [00:02:56].
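The job-fit coach described above can be sketched as a minimal RAG loop: retrieve relevant context, then assemble an augmented prompt. This is a hypothetical illustration of the pattern, not LinkedIn's actual implementation; all function names and data here are made up, and the toy keyword retriever stands in for a real vector store.

```python
# Minimal RAG sketch for a "job fit" coach. All names and data are
# hypothetical illustrations, not LinkedIn's actual implementation.

def retrieve_context(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank corpus entries by keyword overlap with the query."""
    def score(doc: str) -> int:
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(corpus, key=score, reverse=True)[:k]

def build_prompt(profile: str, job_description: str, context: list[str]) -> str:
    """Assemble the retrieval-augmented prompt sent to the LLM."""
    ctx = "\n".join(f"- {c}" for c in context)
    return (
        "You are a career coach. Using the context below, assess job fit.\n"
        f"Context:\n{ctx}\n"
        f"Profile: {profile}\n"
        f"Job description: {job_description}\n"
        "Give personalized recommendations."
    )

corpus = [
    "Senior backend roles typically require distributed systems experience.",
    "Python and Java are common requirements for platform engineering jobs.",
    "Recruiters value open-source contributions for infrastructure roles.",
]
context = retrieve_context("backend distributed systems", corpus)
prompt = build_prompt("8 years Python, Kafka, Kubernetes",
                      "Senior backend engineer, distributed systems", context)
```

In production the retriever would query member-profile and job-posting indexes, but the shape of the flow (retrieve, then inject into the prompt) is the same.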
At this stage, platform capabilities began to emerge [00:03:13]. Key developments included:
- Python SDK: A Python SDK was built on top of the popular LangChain framework to orchestrate LLM calls and integrate with LinkedIn’s large-scale infrastructure [00:03:16]. This allowed developers to easily assemble applications [00:03:35].
- Unified Tech Stack: The company unified its tech stack around Python, having recognized that porting prompts from Python to Java was costly and error-prone [00:03:38].
- Prompt Management: Investment began in prompt management, or “prompt source of truth,” as a sub-module to help developers version their prompts and provide structure to meta-prompts [00:03:51].
- Conversational Memory: A critical infrastructure piece, conversational memory, was introduced to track LLM interactions and retrieval content, injecting it into the final product to enable conversational bots [00:04:08].
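Of the pieces above, conversational memory is the most mechanical: record each turn (user message, retrieved content, model reply) and serialize the history into the next prompt. The sketch below is a hypothetical minimal version of that idea; the class and method names are illustrative, not LinkedIn's API.

```python
# Minimal sketch of a conversational-memory store that records each turn
# (user message, retrieved content, model reply) and injects the history
# into the next prompt. Hypothetical illustration, not LinkedIn's API.
from dataclasses import dataclass, field

@dataclass
class Turn:
    user: str
    retrieved: list[str]
    reply: str

@dataclass
class ConversationalMemory:
    turns: list[Turn] = field(default_factory=list)

    def record(self, user: str, retrieved: list[str], reply: str) -> None:
        self.turns.append(Turn(user, retrieved, reply))

    def render(self) -> str:
        """Serialize history for injection into the next prompt."""
        lines = []
        for t in self.turns:
            lines.append(f"User: {t.user}")
            lines.extend(f"[context] {r}" for r in t.retrieved)
            lines.append(f"Assistant: {t.reply}")
        return "\n".join(lines)

memory = ConversationalMemory()
memory.record("Am I a fit for this job?",
              ["Profile: 8 years Python"],
              "Likely yes; highlight your platform work.")
next_prompt = memory.render() + "\nUser: What should I improve?"
```

A production store would persist turns and summarize or truncate long histories, but the contract — record, then render into the next call — is the essence of what enables a conversational bot.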
Multiagent Systems and Applications
In the last year, LinkedIn launched its first “real” multiagent system called the LinkedIn AI Assistant [00:04:33]. This multiagent system is designed to assist recruiters, automating tedious tasks such as posting jobs, evaluating candidates, and reaching out to them [00:04:42].
The GenAI platform further evolved to support multiagent systems [00:05:07]. Key advancements included:
- Distributed Agent Orchestration Layer: The Python SDK was extended into a large-scale distributed agent orchestration layer [00:05:11]. This layer handles distributed agent execution and complex scenarios like retry logic and traffic shifts [00:05:21].
- Skill Registry: Recognizing that skills (APIs) are a key aspect for agents to perform actions, LinkedIn invested in a skill registry [00:05:36]. This registry provides tools for developers to publish their APIs, facilitating skill discovery and invocation within applications [00:05:48].
- Experiential Memory: Beyond conversational memory, the platform introduced experiential memory [00:06:14]. This memory storage extracts, analyzes, and infers tacit knowledge from interactions between agents and users [00:06:21]. Memories are organized into different layers, including working, long-term, and collective memories, to enhance agent awareness [00:06:35].
- Operability: As agents are autonomous and can decide which APIs or LLMs to call, predicting their behavior is challenging [00:06:50]. To address this, LinkedIn invested in operability, building an in-house solution on OpenTelemetry to track low-level telemetry data [00:07:08]. This data makes it possible to replay agent calls, and an analytics layer guides future optimization of agent systems [00:07:24].
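The skill registry described above can be pictured as a small publish/discover/invoke interface: developers register an existing API with metadata, and agents look skills up and call them. This is a hypothetical sketch of the pattern only; the class and skill names are invented for illustration.

```python
# Minimal sketch of a skill registry: developers publish callable APIs as
# "skills" with metadata so agents can discover and invoke them.
# Hypothetical illustration of the pattern, not LinkedIn's registry.
from typing import Callable

class SkillRegistry:
    def __init__(self) -> None:
        self._skills: dict[str, tuple[str, Callable[..., object]]] = {}

    def publish(self, name: str, description: str):
        """Decorator that registers a function as a discoverable skill."""
        def wrap(fn: Callable[..., object]):
            self._skills[name] = (description, fn)
            return fn
        return wrap

    def discover(self, keyword: str) -> list[str]:
        """Agent-side lookup: find skills whose description mentions a keyword."""
        return [n for n, (desc, _) in self._skills.items() if keyword in desc]

    def invoke(self, name: str, **kwargs):
        return self._skills[name][1](**kwargs)

registry = SkillRegistry()

@registry.publish("post_job", "Post a job opening to the job board")
def post_job(title: str) -> str:
    return f"posted: {title}"

matches = registry.discover("job")
result = registry.invoke("post_job", title="Staff Engineer")
```

In a real system, discovery would be semantic (e.g. embedding search over skill descriptions) and invocation would cross service boundaries, but the registry contract is the same.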
Components of the GenAI Platform for Multiagent Systems
The LinkedIn GenAI platform, particularly for multiagent systems, can be classified into four key layers [00:07:39]:
- Orchestration [00:07:44]
- Prompt Engineering [00:07:47]
- Tools and Skills Invocation [00:07:48]
- Content and Memory Management [00:07:50]
The platform provides a unified interface for a complex GenAI ecosystem, abstracting away the underlying complexities of modeling, responsible AI, and machine learning infrastructure [00:08:20]. For instance, developers can switch between OpenAI models and internal models by changing a single parameter in one line of code [00:08:50]. This centralized platform also enforces best practices and governance, ensuring efficient and responsible application development [00:09:12].
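The one-line model switch mentioned above comes from putting every provider behind a common interface. Below is a hedged sketch of that idea; the provider clients and the internal model name are hypothetical stand-ins, not LinkedIn's gateway API.

```python
# Sketch of a unified gateway where the model is selected by a single
# parameter, abstracting provider-specific clients behind one interface.
# Provider clients and model names here are hypothetical stand-ins.
from typing import Protocol

class ChatClient(Protocol):
    def complete(self, prompt: str) -> str: ...

class OpenAIClient:
    def complete(self, prompt: str) -> str:
        return f"[openai] {prompt}"

class InternalClient:
    def complete(self, prompt: str) -> str:
        return f"[internal] {prompt}"

_CLIENTS: dict[str, ChatClient] = {
    "gpt-4": OpenAIClient(),
    "internal-model": InternalClient(),  # hypothetical internal model name
}

def complete(prompt: str, model: str = "gpt-4") -> str:
    """For the developer, switching models is a one-parameter change."""
    return _CLIENTS[model].complete(prompt)

a = complete("Draft an outreach message")
b = complete("Draft an outreach message", model="internal-model")
```

Centralizing the dispatch also gives the platform a single choke point for governance: quota, logging, and responsible-AI checks can live inside `complete` rather than in every application.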
The Criticality of a Dedicated Platform for Agent Systems
A dedicated platform for multiagent systems is considered critical because GenAI represents a fundamentally different AI system compared to traditional ones [00:09:55]. In traditional AI, there’s a clear separation between model optimization and model serving, allowing AI engineers and product engineers to work on different tech stacks [00:10:04]. However, in GenAI systems, this line blurs, with everyone becoming an engineer who can optimize overall system performance [00:10:24].
GenAI and agent systems are viewed as “compound AI systems” [00:10:49].
“A compound AI system can be defined as a system that tackles AI tasks using multiple interacting components, including multiple calls to models, retrievers, or external tools.” [00:10:55]
This necessitates a platform to bridge the skill gaps between AI engineers and product engineers [00:11:10].
Building and Scaling Multiagent Systems
Talent Acquisition and Team Building
Hiring for developing and optimizing AI agents requires a unique blend of skills [00:11:39]. Ideal candidates are strong software engineers who can integrate with large-scale infrastructure, have good developer product management (PM) sensibilities for interface design, and bring an AI/data science background so they understand the latest techniques [00:11:58]. They must be hands-on and adaptable to new techniques [00:12:19].
Realistically, trade-offs are made in hiring, following principles like:
- Prioritizing Software Engineering: Stronger software engineering skills are prioritized over AI expertise [00:12:47].
- Hiring for Potential: Given the rapid evolution of the field, hiring for potential rather than outdated experience or degrees is crucial [00:13:03].
- Diversified Teams: Instead of finding a single “unicorn” engineer, building a diversified team with full-stack software engineers, data scientists, AI engineers, data engineers, fresh graduates, and startup veterans has proven effective [00:13:15]. Collaboration helps team members pick up new skills and grow into ideal candidates [00:13:50].
- Critical Thinking: The team consistently evaluates the latest open-source packages, engages with vendors, and proactively deprecates solutions, as technologies can become outdated within a year or less [00:14:06].
Key Technical Takeaways for Multiagent Systems
- Python for Tech Stack: Python is strongly recommended due to its prevalence in research and open-source communities, and its proven scalability [00:14:37].
- Prompt Source of Truth: A robust system for version controlling prompts is critical for operational stability, similar to managing traditional model parameters [00:15:03].
- Memory Management: Memory is a key component for injecting rich data into the agent experience [00:15:26].
- API Uplifting to Skills: In the agent era, uplifting existing APIs into easily callable skills for agents is a new and crucial component, requiring supporting tooling and infrastructure [00:15:42].
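One concrete way to "uplift" an existing API into a skill is to derive a tool spec (name, description, parameters) from the function itself, so agents can reason about when and how to call it. The sketch below shows that pattern under stated assumptions: the spec shape and the `evaluate_candidate` API are hypothetical, not LinkedIn's tooling.

```python
# Sketch of "uplifting" an existing API into an agent-callable skill by
# deriving a tool/skill spec from the function signature.
# Hypothetical pattern for illustration, not LinkedIn's tooling.
import inspect

def to_skill_spec(fn) -> dict:
    """Derive a minimal skill spec from an existing function's signature."""
    sig = inspect.signature(fn)
    params = {
        name: {"required": p.default is inspect.Parameter.empty}
        for name, p in sig.parameters.items()
    }
    return {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "parameters": params,
    }

def evaluate_candidate(profile_id: str, job_id: str) -> str:
    """Evaluate a candidate's fit for a given job posting."""
    return f"evaluated {profile_id} for {job_id}"

spec = to_skill_spec(evaluate_candidate)
```

The supporting tooling the talk alludes to is essentially this transformation at scale: every published API gets a machine-readable description an agent can discover and invoke.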
Scaling and Adoption Strategies
To scale and ensure adoption of multiagent systems and platforms:
- Solve Immediate Needs: Start by solving immediate needs rather than attempting to build a full-fledged platform from the outset [00:16:04]. For example, LinkedIn began with a simple Python library for orchestration, which then grew into more components [00:16:15].
- Focus on Infrastructure and Scalability: Leverage existing robust infrastructure, such as LinkedIn’s messaging infrastructure for a memory layer, to ensure cost-efficiency and scalability [00:16:29].
- Prioritize Developer Experience: The platform’s primary goal is to help developers be productive [00:16:46]. Aligning technology with developers’ existing workflows is key to ease adoption and success [00:16:59].
More technical details on LinkedIn’s GenAI platform journey are available in their engineering blog posts [00:17:12].