From: aidotengineer

Brightwave, a company specializing in research agents for the financial domain, focuses on the challenge of synthesizing vast amounts of content to aid financial professionals [00:00:19]. This includes tasks such as due diligence, competitive deal processes, mutual fund analysis during earnings season, and reviewing hundreds of vendor contracts [00:00:27]. These tasks are described as “non-trivial” and yet, frankly, “not a human level intelligence task” [00:01:24], often placing junior analysts in a “meat grinder” with impossible deadlines [00:01:34].

The Need for AI in Financial Data Synthesis

The role of an individual in finance workflows and financial research today is compared to an accountant in 1978 before computational spreadsheets [00:02:11]. Previously, “running the numbers” was a cognitively demanding, time-intensive task done by hand [00:02:30]. With the advent of spreadsheets, the sophistication of thought applied to financial problems increased substantially [00:02:50].

Similarly, systems like Brightwave, or any class of knowledge agents, can digest volumes of content and perform meaningful work that accelerates efficiency and time to value by orders of magnitude [00:03:03]. These tools are designed to absorb workloads whose sheer volume outstrips human capacity [00:01:24].

Core Challenges in Building AI Applications for Data Synthesis

Building effective AI systems for synthesizing vast financial data presents several classes of challenges.

Technical and Model Limitations

  • Error Propagation: Non-reasoning models often perform a “greedy local search,” so even a small per-call error rate (e.g., 5-10%) compounds into an exponentially growing likelihood of error when calls are chained together [00:04:10]; the arithmetic is sketched just after this list.
  • Global Optimality: Achieving “end to end RL over tool use calls,” where API call results influence subsequent decisions to yield globally optimal outputs, is “still an open research problem” [00:04:36]. This includes making intelligent use of knowledge graphs [00:04:55].
  • Output Length Constraints: Despite claims of large output context lengths, current models typically produce much shorter coherent outputs (e.g., o1 manages around 2,000-3,000 tokens, better than GPT-4o) [00:13:10]. It is difficult for models to produce tens of thousands of coherent novel words [00:13:33].
  • Compression Problem: A large input context window must be compressed into a much smaller set of output tokens. To get high-fidelity, information-dense outputs, research instructions must be decomposed into multiple sub-themes, each generated separately [00:13:52]; a fan-out sketch also follows this list.
  • Lack of Combinative Reasoning: The training corpora for instruction tuning and post-training contain few demonstrations of combinative reasoning [00:14:40]. It is easy to ask a model to write an epilogue based on a single book, but much harder to get high-quality, thoughtful analysis by weaving together disparate fact patterns from many documents, as in biomedical literature synthesis [00:15:05].
  • Real-World Complexity: Even state-of-the-art models struggle with complex real-world situations, including factors like temporality (e.g., understanding changes from mergers and acquisitions or contract addendums) [00:15:47].
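
The compounding effect in the error-propagation point above is simple arithmetic. A minimal sketch (plain Python, no LLM required) shows how a modest per-call error rate inflates once calls are chained:

```python
# Probability that at least one step in a chained pipeline goes wrong,
# assuming independent errors with the same per-call rate.
def chain_failure_probability(per_call_error: float, num_calls: int) -> float:
    """P(at least one error) = 1 - (1 - p)^n."""
    return 1.0 - (1.0 - per_call_error) ** num_calls

for n in (1, 5, 10, 20):
    p = chain_failure_probability(0.05, n)
    print(f"{n:>2} chained calls at 5% per-call error -> {p:.1%} chance of a flawed result")
# 1 -> 5.0%, 5 -> 22.6%, 10 -> 40.1%, 20 -> 64.2%
```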
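
Likewise, the compression problem suggests a fan-out pattern: split the research instruction into sub-themes, generate each section within the model's coherent-output budget, and stitch the results. This is an illustrative sketch, not Brightwave's implementation; `complete` is a placeholder for whatever LLM client is in use:

```python
# Fan-out decomposition: one focused generation per sub-theme keeps each
# output inside the model's coherent-output budget. All names are illustrative.
def complete(prompt: str) -> str:
    raise NotImplementedError("wire up your LLM client here")

def research(instruction: str, documents: str) -> str:
    themes = complete(
        f"List, one per line, the 3-5 distinct sub-themes needed to answer:\n{instruction}"
    ).splitlines()
    sections = [
        complete(
            f"Using only these documents:\n{documents}\n\n"
            f"Write a dense, well-cited analysis of: {theme}"
        )
        for theme in themes if theme.strip()
    ]
    return "\n\n".join(sections)
```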

Product Design and User Experience Challenges

  • Revealing Thought Process: A significant design problem is how to reveal the thought process of a system that has considered 10,000 pages of content in a way that is useful and legible to a human [00:03:40]. Current chat interfaces are likely “not enough” [00:04:05].
  • User Prompting Expertise: People generally do not want to become deep prompting experts, a skill that can take “easily a thousand hours” to master [00:07:22]. Products need to provide scaffolding that orchestrates workflows and shapes system behavior; “verticalized product workflows” will endure because they specify intent and offload complexity from the user [00:07:31].
  • The Latency Trap: If the feedback loop for a user’s prompt is too long (e.g., 8-20 minutes), they won’t perform many interactions in a day, which hinders their ability to develop faculty with the system and refine their mental model of its behavior [00:12:50].
  • Anthropomorphizing Systems: Avoid “needless anthropomorphizing” of AI systems (e.g., “portfolio manager agent,” “fact checker”), as it constrains flexibility when the compute graph design needs to change [00:10:46]. Instead, build simple tools that do one thing well and work together, akin to the Unix philosophy [00:11:01]; a composition sketch follows this list.
  • Maintaining Human Oversight: Human oversight is “extremely important,” allowing users to nudge the model with directives or select interesting threads to explore [00:10:04]. The human analyst will always have access to non-digitized information, like conversations with management or insights from portfolio managers, which is where “taste making” comes into play [00:10:20].
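
The Unix-philosophy point can be made concrete. In the hypothetical sketch below, each step does one thing; the compute graph is a plain composition that can be rewired freely, with no “portfolio manager agent” persona to preserve:

```python
# Compose small, single-purpose steps rather than monolithic named agents.
# Step bodies are elided; all names are hypothetical.
from typing import Callable

Step = Callable[[str], str]

def retrieve(query: str) -> str: ...    # fetch relevant passages
def distill(passages: str) -> str: ...  # condense into findings
def verify(findings: str) -> str: ...   # check factual entailment

def pipeline(*steps: Step) -> Step:
    def run(text: str) -> str:
        for step in steps:
            text = step(text)
        return text
    return run

analyze = pipeline(retrieve, distill, verify)  # rewire as the design evolves
```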

Designing Autonomous Agents

Autonomous agents should mimic human decision-making processes, decomposing tasks such as:

  • Identifying relevant public market comparables (e.g., SEC filings, earnings call transcripts) [00:08:00].
  • Assessing relevant document sets and distilling findings [00:08:30].
  • Enriching and error-correcting those findings [00:08:44].

A useful design pattern involves asking the model to think aloud, generating intermediary notes about what it believes based on initial findings [00:08:55]. Models can also self-correct when asked to verify accuracy or factual entailment in a secondary call, since they can be “primed to be credulous” within a single call [00:09:21]; a sketch of this draft-then-verify loop follows.
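
Here is a minimal sketch of that draft-then-verify loop, with `complete` again standing in for any LLM call (the prompts are illustrative, not quoted from the talk):

```python
# Draft in one call, verify in a second: a fresh call is less "primed to
# be credulous" than the call that produced the draft.
def complete(prompt: str) -> str:
    raise NotImplementedError("wire up your LLM client here")

def distill_with_notes(document: str) -> str:
    # Think aloud first, then commit to numbered claims.
    return complete(
        "Read the document, write intermediary notes on what you believe "
        f"so far, then state your findings as numbered claims.\n\n{document}"
    )

def verify_claims(document: str, findings: str) -> str:
    return complete(
        "For each numbered claim below, answer SUPPORTED or UNSUPPORTED "
        "based strictly on the source document.\n\n"
        f"Source:\n{document}\n\nClaims:\n{findings}"
    )
```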

Ultimately, synthesis involves weaving together fact patterns across many documents into a coherent narrative [00:09:55].

Brightwave’s Solutions

Brightwave is building a product that aims to reveal the AI’s thought process when considering vast amounts of text [00:16:35]. Key features include:

  • Interactive Reports: The ability to click on citations to get additional context, including the document source and the model’s reasoning [00:17:40].
  • Structured Interactive Outputs: Users can “pull the thread” and ask for more details on any finding, such as “tell me more about that rising capex spend” [00:17:52].
  • Details on Demand: Users can highlight any passage of text and ask for implications, effectively using it as a “magnifying glass for text” [00:18:03].
  • Audit Trail: The system provides an “audit trail” for findings, laying out all discovered information (e.g., fundraising timeline, ongoing litigation) and allowing users to click in for more details on specific points of interest [00:19:00].
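
One plausible data shape behind such an interface, assuming nothing about Brightwave's internals: each finding carries its citation trail (source, excerpt, and the model's reasoning) plus child findings for “pull the thread” navigation:

```python
# A hypothetical schema for interactive, citeable findings.
from dataclasses import dataclass, field

@dataclass
class Citation:
    source_document: str  # e.g., "10-K FY2023, Item 7"
    excerpt: str          # the passage the model relied on
    model_reasoning: str  # why this excerpt supports the claim

@dataclass
class Finding:
    claim: str  # e.g., "capex spend is rising"
    citations: list[Citation] = field(default_factory=list)
    follow_ups: list["Finding"] = field(default_factory=list)  # pull the thread
```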

The final form factor for this class of products is still evolving, representing an “extremely interesting design problem” [00:19:47].