From: aidotengineer

Introduction to Brightwave’s Research Agent

Brightwave, founded by Mike Conover, specializes in building research agents that digest vast amounts of content in the financial domain [00:00:19]. This technology is particularly useful for tasks like due diligence in competitive deal processes, where professionals must quickly reach conviction while facing thousands of pages of content [00:00:27]. The goal is to rapidly identify critical risk factors that could diminish asset performance [00:00:40].

Current Challenges in Financial Research

Traditional financial research involves highly demanding, time-intensive work: mutual fund analysts, for example, review earnings calls, transcripts, and filings for 80-120 companies during earnings season [00:00:47]. Identifying early-termination clauses across hundreds of vendor contracts during confirmatory diligence is another non-trivial problem [00:01:08]. These tasks are often described as “not a human-level intelligence task,” and impossible deadlines can put junior analysts through a “meat grinder” [00:01:21].

Transformation of Financial Workflows

Just as spreadsheets revolutionized accounting starting in 1978, modern computational tools are increasing the sophistication of thought applied to financial problems [00:02:22]. Knowledge agents such as Brightwave’s system can digest large volumes of content and perform meaningful work, accelerating efficiency and time-to-value in financial markets by orders of magnitude [00:03:03].

Technical Insights and Design Considerations

Integrating large language models into financial tools requires careful design. A key challenge is revealing the thought process of a system that has considered 10,000 pages of content in a useful and legible way [00:03:40]. The final form factor for such products is still evolving, with chat interfaces likely being insufficient [00:03:57].

Limitations of Non-Reasoning Models

Non-reasoning models typically perform a greedy, locally optimal search, which leads to high error rates when calls are chained in succession [00:04:10]. For example, if an LLM has a 5-10% error rate when extracting organizations from a Reuters article, the probability that a chain of such calls completes without error decays exponentially with chain length [00:04:24].
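To make the compounding concrete: if each call independently succeeds with probability 1 − ε, a chain of n calls is error-free with probability (1 − ε)^n. A minimal illustration (the 5% error rate is from the talk; the chain lengths are arbitrary):

```python
# Probability that a chain of n independent LLM calls is error-free,
# given a per-call error rate eps: (1 - eps) ** n.
def chain_success(eps: float, n: int) -> float:
    return (1 - eps) ** n

for n in (1, 5, 10, 20):
    print(n, round(chain_success(0.05, n), 3))
# 1: 0.95, 5: 0.774, 10: 0.599, 20: 0.358; errors compound quickly
```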

The Need for End-to-End RL Over Tool Use

Winning systems will perform end-to-end Reinforcement Learning (RL) over tool use calls, where the results of API calls influence subsequent decisions [00:04:36]. This allows models to make locally suboptimal decisions to achieve globally optimal outputs [00:04:47]. However, intelligently availing oneself of tools in this manner remains an open research problem [00:05:04].
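The talk does not specify an implementation, but the trajectory such a system would optimize over can be sketched as a loop in which every action conditions on all prior tool results, and a reward on the final output lets RL credit locally suboptimal calls that improve the global answer. The policy and tool names below are hypothetical placeholders:

```python
# Hypothetical sketch of a tool-use trajectory. Each action conditions on
# all prior observations; a reward computed on the final output is what
# end-to-end RL would optimize, letting locally suboptimal calls be
# credited when they improve the global result.
from typing import Callable

def run_episode(policy: Callable, tools: dict[str, Callable],
                query: str, max_steps: int = 8):
    history = [("user", query)]
    for _ in range(max_steps):
        action = policy(history)  # e.g. {"tool": "search", "args": {...}} or {"final": "..."}
        if "final" in action:
            return action["final"], history  # reward is scored on this output
        result = tools[action["tool"]](**action["args"])
        history.append(("tool", (action["tool"], result)))  # feeds the next decision
    return None, history
```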

Building Practical Products Today

Despite ongoing research, practical product development today focuses on constraining the scope of an agent’s behaviors [00:05:41]. This “regularization parameter” limits the likelihood of the model producing degenerate output [00:05:46]. Effective use of language models to generate text requires skill in “steering” the model through multi-turn conversations, guiding its activations towards solving the problem [00:06:13]. Since most professionals won’t become prompting experts, products must provide scaffolding to orchestrate workflows and shape system behavior [00:07:31]. Verticalized product workflows are likely to endure because they specify intent and reduce the burden on the user [00:07:41].
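One way to picture this scaffolding, purely as an illustration (none of these step names are Brightwave’s): a verticalized workflow pre-specifies the intent, the steering prompts, and the tools available at each step, so the user never has to become a prompting expert.

```python
# Illustrative only: a verticalized workflow constrains the agent to a
# known-good sequence of steps, each with pre-written "steering" prompts
# and a whitelist of tools, acting as the "regularization" on behavior.
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkflowStep:
    name: str
    prompt_template: str            # steering the user never has to write
    allowed_tools: tuple[str, ...]  # constrained scope of behaviors

DILIGENCE_WORKFLOW = (
    WorkflowStep("find_comparables", "List public market comparables for {company}.", ("search",)),
    WorkflowStep("extract_risks", "Extract risk factors from these filings: {docs}", ("retrieve",)),
    WorkflowStep("verify_findings", "Is each finding entailed by its cited passage? {pairs}", ()),
)
```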

Mimicking Human Decision-Making

An autonomous agent should mimic the human decision-making process by decomposing tasks. This involves:

  • Looking for public market comparables [00:08:12].
  • Assessing relevant document sets (SEC filings, earnings call transcripts, Knowledge Graphs from past deals, news) [00:08:17].
  • Distilling findings that substantiate hypotheses [00:08:32].
  • Enriching and error-correcting those findings [00:08:45].
  • Asking models to self-correct by verifying factual entailment or organization classification [00:09:21]. It is often more powerful to perform this verification as a secondary call rather than within a single Chain of Thought [00:09:41]; a sketch follows this list.
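A minimal sketch of that generate-then-verify pattern, assuming only a generic `llm(prompt) -> str` completion function (the prompts and helper names are illustrative):

```python
# Sketch: self-correction as a secondary call. One call extracts a finding;
# a separate call checks factual entailment against the source, rather than
# asking a single Chain of Thought to both generate and grade itself.
def extract_finding(llm, document: str) -> str:
    return llm(f"Extract the key risk factor in this passage:\n{document}")

def verify_entailment(llm, document: str, finding: str) -> bool:
    verdict = llm(
        "Answer YES or NO: is the following finding fully supported "
        f"by the source passage?\nPassage: {document}\nFinding: {finding}"
    )
    return verdict.strip().upper().startswith("YES")

def checked_finding(llm, document: str) -> str | None:
    finding = extract_finding(llm, document)
    return finding if verify_entailment(llm, document, finding) else None
```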

Synthesis Across Documents

Synthesis involves weaving disparate fact patterns from numerous documents into a coherent narrative [00:09:55]. This is analogous to biomedical literature synthesis, where one must read many papers and produce useful insights that integrate facts across them [00:15:10]. However, generating high-quality, intelligent analysis over many documents runs into practical limits even in state-of-the-art models [00:15:31]. Complex real-world factors like temporality (e.g., changes due to mergers or contract addenda) are difficult for models to manage [00:15:47].
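To see why temporality is hard, consider the bookkeeping it implies: a fact extracted from a contract may be superseded by a later addendum, so a synthesis layer has to track effective dates and let later facts win. A toy sketch of that resolution step (entirely illustrative, not Brightwave’s implementation):

```python
# Facts carry an effective date; when two facts address the same field,
# the later one supersedes the earlier (e.g. an addendum amending a term).
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Fact:
    field: str       # e.g. "termination_notice_days"
    value: str
    effective: date
    source: str

def current_view(facts: list[Fact]) -> dict[str, Fact]:
    view: dict[str, Fact] = {}
    for f in sorted(facts, key=lambda f: f.effective):
        view[f.field] = f  # later effective dates win
    return view

facts = [
    Fact("termination_notice_days", "90", date(2021, 3, 1), "MSA v1"),
    Fact("termination_notice_days", "30", date(2023, 7, 15), "Addendum 2"),
]
print(current_view(facts)["termination_notice_days"].value)  # "30"
```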

Human Oversight and Model Nudging

Human oversight is crucial. The ability to “nudge” the model with directives or by selecting interesting threads for it to explore is vital [00:10:04]. Analysts possess information not yet digitized, such as conversations with management or portfolio manager insights, which can guide the model [00:10:20]. Over-anthropomorphizing systems by assigning them roles like “portfolio manager agent” can constrain flexibility [00:10:46].

The Latency Trap

The “latency trap” describes how long feedback loops hinder user learning and product adoption [00:12:00]. If a user’s prompt results in an 8- or 20-minute wait for feedback, their facility with the system will remain low [00:12:49].

Product Design for Interpretability and Interaction

Effective products aim to reveal the model’s “thought process” on vast datasets. This can be achieved through:

  • Continuous Surface: Providing a dynamic interface rather than static chat [00:16:45].
  • Details on Demand: Allowing users to click citations for additional context, including what the model was “thinking” [00:17:30].
  • Structured Interactive Outputs: Enabling users to “pull the thread” on specific findings, like asking for more details on rising capex spend [00:17:52].
  • Highlighting and Interrogating: Allowing users to highlight any text passage and ask for implications or further information [00:18:00].
  • Audit Trail (“Receipts”): Providing the ability to “turn over that cube” of high-dimensional data and see the underlying findings, such as a fundraising timeline or ongoing litigation [00:18:45]. This acts as a “magnifying glass for text,” letting human analysts drill into crucial details that catch their eye [00:19:26]; a data-shape sketch follows this list.
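One plausible data shape behind “details on demand” and “receipts”: findings carry structured citations, each with the source span it came from and the model’s rationale, so the UI can expand any claim into its evidence. Field names here are assumptions, not Brightwave’s schema:

```python
# Illustrative structure: structured findings with citations make
# "details on demand" and an audit trail possible.
from dataclasses import dataclass, field

@dataclass
class Citation:
    document_id: str
    span: tuple[int, int]  # character offsets into the source document
    model_rationale: str   # what the model was "thinking" when it cited this

@dataclass
class Finding:
    summary: str           # e.g. "Capex rose 40% YoY"
    citations: list[Citation] = field(default_factory=list)

    def receipts(self) -> list[str]:
        # Surface the underlying evidence so an analyst can drill in.
        return [f"{c.document_id}[{c.span[0]}:{c.span[1]}]: {c.model_rationale}"
                for c in self.citations]
```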

The final form factor for this class of products is still being determined, representing a significant design problem [00:19:47].