From: aidotengineer

Introduction to Brightwave’s Research Agent

Brightwave, founded by Mike Conover, specializes in building research agents that digest vast amounts of content in the financial domain [00:00:19]. This technology is particularly useful for tasks like due diligence in competitive deal processes, where professionals must quickly reach conviction while facing thousands of pages of content [00:00:27]. The goal is to rapidly identify critical risk factors that could diminish asset performance [00:00:40].

Current Challenges in Financial Research

Traditional financial research involves highly demanding, time-intensive work: mutual fund analysts, for example, review earnings calls, transcripts, and filings for 80-120 companies during earnings season [00:00:47]. Identifying early-termination clauses across hundreds of vendor contracts during confirmatory diligence is another non-trivial problem [00:01:08]. These tasks are often described as “not a human-level intelligence task,” and impossible deadlines can put junior analysts through a “meat grinder” [00:01:21].

Transformation of Financial Workflows

Just as spreadsheets revolutionized accounting starting in 1978, modern computational tools are increasing the sophistication of thought applied to financial problems [00:02:22]. Knowledge agents such as Brightwave’s system can digest large volumes of content and perform meaningful work, accelerating efficiency and time-to-value in financial markets by orders of magnitude [00:03:03].

Technical Insights and Design Considerations

Integrating large language models into financial tools requires careful design. A key challenge is revealing the thought process of a system that has considered 10,000 pages of content in a useful and legible way [00:03:40]. The final form factor for such products is still evolving, with chat interfaces likely being insufficient [00:03:57].

Limitations of Non-Reasoning Models

Non-reasoning models typically perform a greedy, locally optimal search, which leads to high error rates when calls are chained in succession [00:04:10]. For example, if an LLM has a 5-10% error rate when extracting organizations from a Reuters article, the probability that a chain of such calls completes without error decays exponentially with chain length [00:04:24].
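To make the compounding concrete: if each call independently succeeds with probability 1 − ε, a chain of n calls is error-free with probability (1 − ε)^n. A minimal illustration (the 5% error rate is from the talk; the chain lengths are arbitrary):

```python
# Probability that a chain of n independent LLM calls is error-free,
# given a per-call error rate eps: (1 - eps) ** n.
def chain_success(eps: float, n: int) -> float:
    return (1 - eps) ** n

for n in (1, 5, 10, 20):
    print(n, round(chain_success(0.05, n), 3))
# 1: 0.95, 5: 0.774, 10: 0.599, 20: 0.358; errors compound quickly
```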

The Need for End-to-End RL Over Tool Use

Winning systems will perform end-to-end Reinforcement Learning (RL) over tool use calls, where the results of API calls influence subsequent decisions [00:04:36]. This allows models to make locally suboptimal decisions to achieve globally optimal outputs [00:04:47]. However, intelligently availing oneself of tools in this manner remains an open research problem [00:05:04].
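The talk does not specify an implementation, but the trajectory such a system would optimize over can be sketched as a loop in which every action conditions on all prior tool results, and a reward on the final output lets RL credit locally suboptimal calls that improve the global answer. The policy and tool names below are hypothetical placeholders:

```python
# Hypothetical sketch of a tool-use trajectory. Each action conditions on
# all prior observations; a reward computed on the final output is what
# end-to-end RL would optimize, letting locally suboptimal calls be
# credited when they improve the global result.
from typing import Callable

def run_episode(policy: Callable, tools: dict[str, Callable],
                query: str, max_steps: int = 8):
    history = [("user", query)]
    for _ in range(max_steps):
        action = policy(history)  # e.g. {"tool": "search", "args": {...}} or {"final": "..."}
        if "final" in action:
            return action["final"], history  # reward is scored on this output
        result = tools[action["tool"]](**action["args"])
        history.append(("tool", (action["tool"], result)))  # feeds the next decision
    return None, history
```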

Building Practical Products Today

Despite ongoing research, practical product development today focuses on constraining the scope of an agent’s behaviors [00:05:41]. This “regularization parameter” limits the likelihood of the model producing degenerate output [00:05:46]. Effective use of language models to generate text requires skill in “steering” the model through multi-turn conversations, guiding its activations towards solving the problem [00:06:13]. Since most professionals won’t become prompting experts, products must provide scaffolding to orchestrate workflows and shape system behavior [00:07:31]. Verticalized product workflows are likely to endure because they specify intent and reduce the burden on the user [00:07:41].
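One way to picture this scaffolding, purely as an illustration (none of these step names are Brightwave’s): a verticalized workflow pre-specifies the intent, the steering prompts, and the tools available at each step, so the user never has to become a prompting expert.

```python
# Illustrative only: a verticalized workflow constrains the agent to a
# known-good sequence of steps, each with pre-written "steering" prompts
# and a whitelist of tools, acting as the "regularization" on behavior.
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkflowStep:
    name: str
    prompt_template: str            # steering the user never has to write
    allowed_tools: tuple[str, ...]  # constrained scope of behaviors

DILIGENCE_WORKFLOW = (
    WorkflowStep("find_comparables", "List public market comparables for {company}.", ("search",)),
    WorkflowStep("extract_risks", "Extract risk factors from these filings: {docs}", ("retrieve",)),
    WorkflowStep("verify_findings", "Is each finding entailed by its cited passage? {pairs}", ()),
)
```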

Mimicking Human Decision-Making

An autonomous agent should mimic the human decision-making process by decomposing tasks. This involves:

  • Looking for public market comparables [00:08:12].
  • Assessing relevant document sets (SEC filings, earnings call transcripts, Knowledge Graphs from past deals, news) [00:08:17].
  • Distilling findings that substantiate hypotheses [00:08:32].
  • Enriching and error-correcting those findings [00:08:45].
  • Asking models to self-correct by verifying factual entailment or organization classification [00:09:21]. It is often more powerful to perform this verification as a secondary call rather than within a single Chain of Thought [00:09:41]; a sketch follows this list.
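A minimal sketch of that generate-then-verify pattern, assuming only a generic `llm(prompt) -> str` completion function (the prompts and helper names are illustrative):

```python
# Sketch: self-correction as a secondary call. One call extracts a finding;
# a separate call checks factual entailment against the source, rather than
# asking a single Chain of Thought to both generate and grade itself.
def extract_finding(llm, document: str) -> str:
    return llm(f"Extract the key risk factor in this passage:\n{document}")

def verify_entailment(llm, document: str, finding: str) -> bool:
    verdict = llm(
        "Answer YES or NO: is the following finding fully supported "
        f"by the source passage?\nPassage: {document}\nFinding: {finding}"
    )
    return verdict.strip().upper().startswith("YES")

def checked_finding(llm, document: str) -> str | None:
    finding = extract_finding(llm, document)
    return finding if verify_entailment(llm, document, finding) else None
```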

Synthesis Across Documents

Synthesis involves weaving disparate fact patterns from numerous documents into a coherent narrative [00:09:55]. This is analogous to biomedical literature synthesis, where one must read many papers and produce useful insights that integrate facts across them [00:15:10]. However, generating high-quality, intelligent analysis over many documents runs into practical limits even in state-of-the-art models [00:15:31]. Complex real-world factors like temporality (e.g., changes due to mergers or contract addenda) are difficult for models to manage [00:15:47].
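To see why temporality is hard, consider the bookkeeping it implies: a fact extracted from a contract may be superseded by a later addendum, so a synthesis layer has to track effective dates and let later facts win. A toy sketch of that resolution step (entirely illustrative, not Brightwave’s implementation):

```python
# Facts carry an effective date; when two facts address the same field,
# the later one supersedes the earlier (e.g. an addendum amending a term).
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Fact:
    field: str       # e.g. "termination_notice_days"
    value: str
    effective: date
    source: str

def current_view(facts: list[Fact]) -> dict[str, Fact]:
    view: dict[str, Fact] = {}
    for f in sorted(facts, key=lambda f: f.effective):
        view[f.field] = f  # later effective dates win
    return view

facts = [
    Fact("termination_notice_days", "90", date(2021, 3, 1), "MSA v1"),
    Fact("termination_notice_days", "30", date(2023, 7, 15), "Addendum 2"),
]
print(current_view(facts)["termination_notice_days"].value)  # "30"
```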

Human Oversight and Model Nudging

Human oversight is crucial. The ability to “nudge” the model with directives or by selecting interesting threads for it to explore is vital [00:10:04]. Analysts possess information not yet digitized, such as conversations with management or portfolio manager insights, which can guide the model [00:10:20]. Over-anthropomorphizing systems by assigning them roles like “portfolio manager agent” can constrain flexibility [00:10:46].

The Latency Trap

The “latency trap” describes how long feedback loops hinder user learning and product adoption [00:12:00]. If a user’s prompt results in an 8- or 20-minute wait for feedback, their facility with the system will remain low [00:12:49].

Product Design for Interpretability and Interaction

Effective products aim to reveal the model’s “thought process” on vast datasets. This can be achieved through:

  • Continuous Surface: Providing a dynamic interface rather than static chat [00:16:45].
  • Details on Demand: Allowing users to click citations for additional context, including what the model was “thinking” [00:17:30].
  • Structured Interactive Outputs: Enabling users to “pull the thread” on specific findings, like asking for more details on rising capex spend [00:17:52].
  • Highlighting and Interrogating: Allowing users to highlight any text passage and ask for implications or further information [00:18:00].
  • Audit Trail (“Receipts”): Providing the ability to “turn over that cube” of high-dimensional data and see the underlying findings, such as a fundraising timeline or ongoing litigation [00:18:45]. This acts as a “magnifying glass for text,” letting human analysts drill into crucial details that catch their eye [00:19:26]; a data-shape sketch follows this list.
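One plausible data shape behind “details on demand” and “receipts”: findings carry structured citations, each with the source span it came from and the model’s rationale, so the UI can expand any claim into its evidence. Field names here are assumptions, not Brightwave’s schema:

```python
# Illustrative structure: structured findings with citations make
# "details on demand" and an audit trail possible.
from dataclasses import dataclass, field

@dataclass
class Citation:
    document_id: str
    span: tuple[int, int]  # character offsets into the source document
    model_rationale: str   # what the model was "thinking" when it cited this

@dataclass
class Finding:
    summary: str           # e.g. "Capex rose 40% YoY"
    citations: list[Citation] = field(default_factory=list)

    def receipts(self) -> list[str]:
        # Surface the underlying evidence so an analyst can drill in.
        return [f"{c.document_id}[{c.span[0]}:{c.span[1]}]: {c.model_rationale}"
                for c in self.citations]
```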

The final form factor for this class of products is still being determined, representing a significant design problem [00:19:47].