From: aidotengineer

Brightwave, founded by Mike Conover, develops a research agent designed to digest vast corpora of content in the financial domain [00:00:19]. The technology addresses significant data processing challenges faced by finance professionals [00:01:34].

Addressing Financial Data Processing Challenges

Financial tasks such as due diligence in competitive deal processes require reaching conviction ahead of other teams and spotting critical risk factors across thousands of pages of content [00:00:27]. During earnings season, mutual fund analysts face a coverage universe of 80-120 names, spanning calls, transcripts, and filings, which makes understanding market dynamics at both the sector and individual-ticker level a non-trivial problem [00:00:47]. Similarly, in confirmatory diligence, reviewing hundreds of vendor contracts to spot early termination clauses or to understand thematic negotiation patterns is “frankly not a human level intelligence task” [00:01:10]. Junior analysts are often put in a “meat grinder,” tasked with doing the impossible on extremely tight deadlines [00:01:34].

The shift from manual financial work to computational tools parallels the advent of the spreadsheet in 1978 [00:02:22]. Before spreadsheets, running the numbers was cognitively demanding and time-intensive [00:02:30]; the new tooling allowed a substantial increase in the sophistication of thought, and AI promises a similar step change in the efficiency of financial analysis [00:02:50]. AI systems, including knowledge agents, can digest large volumes of content and accelerate meaningful work by orders of magnitude, improving efficiency and time-to-value [00:03:03].

Core Design Challenges for Financial AI Agents

A primary design challenge for AI agents in finance is revealing the thought process of a system that has considered thousands of pages of content in a useful and legible way [00:03:40]. This is a product architecture problem that simply did not exist before such systems [00:03:54].

Limitations of Current Models and Approaches

  • Beyond Chat: While chat interfaces are a common focus, they are likely “not enough” for complex financial analysis [00:04:01].
  • Non-Reasoning Models: Current non-reasoning models perform “greedy local search,” leading to fidelity issues [00:04:10]. A 5-10% error rate in extracting information compounds when calls are chained, so the probability that an entire workflow is error-free drops quickly with its length [00:04:24] (see the arithmetic sketch after this list). The winning systems will likely perform end-to-end reinforcement learning (RL) over tool-use calls, allowing locally suboptimal decisions in service of globally optimal outputs [00:04:36]. However, intelligently availing oneself of tools to achieve globally optimal outputs remains an open research problem [00:05:06].
  • User Skill vs. Product Scaffolding: Users are unlikely to become prompting experts, a skill that can take over a thousand hours to develop [00:07:24]. Therefore, the “scaffolding that products put in place” to orchestrate workflows and shape system behavior is crucial. Verticalized product workflows, which specify intent and offload the burden from the user, are likely to be enduring [00:07:31].
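The compounding-error point can be made concrete with a few lines of arithmetic. This is an illustrative sketch of the math only; the error rates and chain lengths are assumptions, not figures from the talk:

```python
# Illustrative: a modest per-step extraction error rate compounds across a
# chain of dependent calls, assuming errors are independent and uniform.

def chain_success_probability(per_step_error: float, steps: int) -> float:
    """Probability that every step in a chain of `steps` calls is correct."""
    return (1.0 - per_step_error) ** steps

for error_rate in (0.05, 0.10):
    for steps in (5, 10, 20):
        p = chain_success_probability(error_rate, steps)
        print(f"per-step error {error_rate:.0%}, {steps:2d} chained steps "
              f"-> {p:.1%} chance the whole chain is correct")
```

At a 10% per-step error rate, a 20-step chain is correct end to end only about 12% of the time, which is why the talk argues for optimizing whole trajectories rather than each step in isolation.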

Archetypal Design Patterns for Autonomous Agents

To be effective, autonomous agents should mirror the human decision-making process [00:08:00]:

  1. Assess Content: Identify relevant document sets from SEC filings, earnings call transcripts, knowledge graphs, or news [00:08:18].
  2. Distill Findings: Extract findings that substantiate hypotheses or investment theses from these documents [00:08:32].
  3. Enrich and Error Correct: This step is extremely powerful. Models can self-correct when asked whether a finding is factually entailed by its source document, or whether an extracted entity is indeed an organization [00:09:12]. This is more effective as a secondary call than within the same chain of thought that produced the finding [00:09:41] (see the sketch after this list).
  4. Synthesis: Weave together fact patterns from many documents into a coherent narrative [00:09:55].
  5. Human Oversight: A “control loop” with human oversight is critical for proactive AI systems [00:10:04]. The ability to “nudge” the model with directives or to select an interesting thread to pull is vital because human analysts have access to non-digitized information, such as conversations with management or portfolio manager insights [00:10:09]. This “taste making” is where the most powerful products will lean [00:10:31].
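A minimal sketch of this loop, assuming a generic `llm` text-completion callable and a `retrieve` search function; both are hypothetical placeholders, not Brightwave's actual interfaces:

```python
# Sketch of the assess -> distill -> error-correct -> synthesize loop with a
# human "nudge" hook. `retrieve` and `llm` are hypothetical placeholders.

from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Finding:
    claim: str
    passage: str
    source: str
    verified: bool = False

def research(thesis: str,
             retrieve: Callable[[str], list[tuple[str, str]]],  # query -> [(doc_id, text)]
             llm: Callable[[str], str],
             nudge: Optional[str] = None) -> str:
    # 1. Assess: identify the relevant document set for the thesis.
    docs = retrieve(thesis)

    # 2. Distill: extract findings that bear on the thesis from each document.
    findings: list[Finding] = []
    for doc_id, text in docs:
        claims = llm(f"List findings relevant to '{thesis}', one per line:\n\n{text}")
        findings += [Finding(c.strip("- ").strip(), text, doc_id)
                     for c in claims.splitlines() if c.strip()]

    # 3. Enrich / error-correct: verify each finding in a *separate* call,
    #    not inside the same chain of thought that produced it.
    for f in findings:
        verdict = llm("Is this claim factually entailed by the passage? Answer yes or no.\n"
                      f"Claim: {f.claim}\nPassage: {f.passage}")
        f.verified = verdict.strip().lower().startswith("yes")
    verified = [f for f in findings if f.verified]

    # 4. Synthesis, with 5. the optional human directive from the control loop.
    evidence = "\n".join(f"- {f.claim} [{f.source}]" for f in verified)
    directive = f"\nAnalyst directive: {nudge}" if nudge else ""
    return llm(f"Weave these findings into a coherent analysis of '{thesis}':"
               f"\n{evidence}{directive}")
```

The design choices mirrored here are that verification happens in a call separate from extraction, and the human directive enters as an optional nudge rather than as a required prompt-engineering exercise.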

System Architecture and Tool Selection

  • Avoid Anthropomorphism: Overly anthropomorphizing systems (e.g., “portfolio manager agent,” “fact checker”) constrains flexibility if the design needs of the compute graph change [00:10:46].
  • Unix Philosophy: Adopt the Unix philosophy of simple tools that do one thing well and work together, with text as a universal interface [00:11:01].
  • Price-Performance Frontier: The price/performance frontier for compute will continue to move [00:11:32], which necessitates careful, revisable selection of which tool, system, or model to use for each node in the compute graph [00:11:45] (see the sketch below).
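One way to read these two points together, offered as a sketch rather than Brightwave's actual architecture: keep each node in the compute graph a small text-in/text-out tool, and keep the model assignment per node in configuration so it can be swapped as the frontier moves. Node names and model names below are illustrative assumptions:

```python
# Unix-style composition: each node is a small function with text as the
# universal interface, and model choice per node lives in config, not code.

from typing import Callable

TextTool = Callable[[str], str]

# Illustrative per-node model assignments; revisit as price/performance shifts.
NODE_MODELS = {
    "classify_document": "small-cheap-model",   # high volume, simple task
    "extract_findings":  "mid-tier-model",      # needs fidelity at volume
    "synthesize_report": "frontier-model",      # low volume, heaviest reasoning
}

def pipe(*tools: TextTool) -> TextTool:
    """Compose single-purpose tools the way a shell pipeline composes commands."""
    def run(text: str) -> str:
        for tool in tools:
            text = tool(text)
        return text
    return run

# Usage: pipeline = pipe(classify_document, extract_findings, synthesize_report)
```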

The Latency Trap and User Experience

A significant consideration in building these systems is the “latency trap” [00:11:59]. For agentic systems, it is easy to assume that a long-running job will come back with a high-quality output, but that quality is often uncertain [00:12:17]. The impulse response for the user, meaning how quickly they receive feedback and can refine their mental model of the system's behavior, is crucial [00:12:31]. If the feedback loop is 8-20 minutes, users will not interact with the system often enough to develop fluency with it [00:12:49].
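The talk does not prescribe a fix here; one common mitigation, offered only as a hedged sketch and not as Brightwave's stated design, is to stream intermediate findings as they are produced so users get feedback in seconds rather than minutes:

```python
# Hypothetical: yield findings as they are produced so the UI can render them
# immediately, shortening the user's feedback loop from minutes to seconds.

from typing import Callable, Iterator

def stream_findings(docs: list[str],
                    extract: Callable[[str], list[str]]) -> Iterator[str]:
    for doc in docs:
        for finding in extract(doc):
            yield finding  # rendered as soon as it exists, not after the full run
```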

Challenges and Solutions in Synthesis

Synthesis is where much of the “magic happens” in these systems [00:13:02].

  • Output Length Limitations: Models struggle to produce very long, coherent, novel outputs (e.g., 50,000 tokens) because instruction-tuning demonstrations have a characteristically short output length [00:13:23]. Even current state-of-the-art models such as o1 typically produce only 2,000-3,000 tokens of output [00:13:42].
  • Decomposition of Instructions: Even with a large input context window, the model compresses what it has read into a limited set of output tokens [00:13:52]. To get higher-quality, more information-dense outputs, decompose the research instruction into multiple sub-themes [00:14:27]; each sub-theme then receives a more focused and specific response (see the sketch after this list).
  • Recombinative Reasoning: Recombinative reasoning demonstrations are rare in training corpora [00:14:43]. Models can internalize a single document and create new content from it, but it is challenging to get high-quality, intelligent analysis that weaves together disparate fact patterns from multiple documents [00:15:05].
  • Complex Real-World Situations: Models have practical limitations with complex real-world structure such as temporality (e.g., understanding what changed after a merger or acquisition) [00:15:47]. It is therefore crucial to propagate evidentiary passages with metadata that contextualizes each finding [00:16:11].
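A hedged sketch of the decomposition and metadata-propagation points combined, with hypothetical function names and an assumed generic `llm` callable:

```python
# Fan a broad research instruction out into focused sub-themes, and keep
# source metadata (date, document type) attached to every finding so temporal
# context survives into synthesis. Names here are illustrative, not an API.

from dataclasses import dataclass
from typing import Callable

@dataclass
class EvidencedFinding:
    claim: str
    source_doc: str
    as_of_date: str    # temporal context travels with the finding
    doc_type: str      # e.g. "10-K", "earnings call transcript"

def decompose(instruction: str, llm: Callable[[str], str]) -> list[str]:
    """Split one broad instruction into focused sub-themes, one per line."""
    raw = llm(f"Break this research task into 5-8 focused sub-themes, one per line:\n{instruction}")
    return [line.strip("- ").strip() for line in raw.splitlines() if line.strip()]

def write_subsection(subtheme: str,
                     evidence: list[EvidencedFinding],
                     llm: Callable[[str], str]) -> str:
    """Write a dense, focused section for one sub-theme from tagged evidence."""
    bullets = "\n".join(f"- {f.claim} ({f.doc_type}, {f.source_doc}, as of {f.as_of_date})"
                        for f in evidence)
    return llm(f"Write a focused analysis of '{subtheme}' using only this evidence:\n{bullets}")
```

Each sub-theme call has a narrow target and a short output, which is a better match for models whose instruction tuning biases them toward 2,000-3,000 token responses.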

Brightwave’s Approach to Revealing Thought Processes

Brightwave aims to present AI-generated insights as an “interactive surface” rather than just a static report [00:16:45].

  • Details on Demand: The ability to give users details on demand is extremely important [00:17:30].
  • Interactive Citations: Users can click on a citation to get additional context, including not just the source document but also “what the model was thinking” [00:17:40].
  • Structured Interactive Outputs: These allow users to “pull the thread” and inquire further about specific findings, like rising capital expenditure [00:17:52].
  • Highlighting and Interrogation: Any passage of text can be highlighted, and users can ask for implications or additional details [00:18:00].
  • High-Dimensional Data Structure: The model’s discoveries from reading the documents are treated as a high-dimensional data structure, of which the report is just one view [00:18:45] (see the sketch after this list).
  • Audit Trail: Especially in finance, being able to see “the receipts” or the audit trail for the system’s analysis is crucial [00:19:00]. This includes laying out all findings (e.g., fundraising timeline, ongoing litigation) [00:19:06].
  • Magnifying Glass for Text: The ability to drill in and get additional details on demand, like a “magnifying glass for text,” is extremely important [00:19:39].
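A minimal sketch of what such a structure might look like, with illustrative field names rather than Brightwave's schema: each finding keeps its source passage, the model's reasoning, and tags, so the report body, the citation popover, and the audit trail are all different views over the same records.

```python
# Findings as structured records; the report, citation popovers, and the
# "receipts" audit trail are different projections of the same data.

from dataclasses import dataclass, field

@dataclass
class Citation:
    source_doc: str
    passage: str
    model_reasoning: str          # "what the model was thinking," shown on click

@dataclass
class ReportFinding:
    statement: str                                     # sentence rendered in the report
    citations: list[Citation] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)      # e.g. "capex", "litigation"

def audit_trail(findings: list[ReportFinding]) -> list[str]:
    """One 'receipts' view: every statement paired with its supporting passages."""
    return [f"{f.statement} <- {c.source_doc}: {c.passage[:80]}"
            for f in findings for c in f.citations]
```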

The final form factor for this class of products has not yet been determined, presenting an “extremely interesting design problem” [00:19:47].