Finetuning and reinforcement finetuning are becoming increasingly important techniques for improving the capabilities, reliability, and domain-specificity of AI models [00:54:55]. They allow developers to move beyond rigid, deterministic workflows toward more flexible and intelligent agentic products [00:07:48].
Evolution of AI Model Training
Historically, agentic products in AI often relied on clearly defined workflows with a limited number of tools, typically fewer than a dozen [00:07:02]. This approach orchestrated steps from one point to another in a predetermined sequence [00:07:17]. The shift in 2025 has been towards models that can reason through a chain of thought, calling multiple tools, re-evaluating their approach, and course-correcting when they go down a wrong path [00:07:32]. This marks a move away from deterministic workflow building [00:07:50].
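To make this concrete, here is a minimal sketch of such an agentic loop: the model repeatedly proposes either a tool call or a final answer, observes the result, and can recover when a step fails. The `propose_next_step` stub, the tool names, and the message format are illustrative assumptions, not any specific vendor's API.

```python
# Minimal sketch of an agentic tool-calling loop (hypothetical, not a vendor API).
import json
from typing import Callable

# Toy tool registry; each tool takes a dict of arguments and returns a string.
TOOLS: dict[str, Callable[[dict], str]] = {
    "search_docs": lambda args: f"top hits for {args['query']!r}",
    "run_query": lambda args: f"rows matching {args['sql']!r}",
}

def propose_next_step(history: list[dict]) -> dict:
    """Stub for the model call: a real implementation would send `history`
    to a reasoning model and parse its tool call or final answer."""
    if not any(m["role"] == "tool" for m in history):
        return {"action": "call_tool", "tool": "search_docs",
                "args": {"query": "refund policy"}}
    return {"action": "finish", "answer": "Refunds are processed within 5-7 business days."}

def run_agent(task: str, max_steps: int = 8) -> str:
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        step = propose_next_step(history)
        if step["action"] == "finish":
            return step["answer"]
        tool = TOOLS.get(step["tool"])
        if tool is None:
            # Course-correct: surface the error so the next step can pick another tool.
            history.append({"role": "tool", "content": f"unknown tool: {step['tool']}"})
            continue
        result = tool(step["args"])
        history.append({"role": "tool",
                        "content": json.dumps({"tool": step["tool"], "result": result})})
    return "Stopped after max_steps without a final answer."

if __name__ == "__main__":
    print(run_agent("What is our refund policy?"))
```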
Reinforcement finetuning is a key technique enabling this flexibility [00:07:55]. By creating tasks and "graders," developers can teach the model to find the correct tool-calling path for problems unique to their domain [00:09:35]. This allows for "steering the model" in its chain of thought, effectively teaching it how to think about a particular domain, such as the legal or medical fields [00:10:06]. Over the course of finetuning, models can be steered to produce progressively better outputs by cross-referencing their thought processes with known ground truths, such as medical textbooks [00:11:31].
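As a rough illustration of what a task and grader pair can look like, the sketch below defines a domain task with a ground-truth reference and a toy keyword-overlap grader that scores a model's output against it. The data structures and the scoring rule are assumptions for illustration; production graders typically rely on rubrics, exact-match checks, or a scoring model.

```python
# Hedged sketch of a "task + grader" pair for reinforcement finetuning.
from dataclasses import dataclass

@dataclass
class Task:
    prompt: str      # domain-specific question the model must answer
    reference: str   # ground truth, e.g. drawn from a medical textbook

def grade(task: Task, model_output: str) -> float:
    """Toy grader: fraction of reference keywords that appear in the output.
    A production grader might use a rubric, exact match, or a scoring model."""
    words = [w.lower().strip(".,") for w in task.reference.split()]
    keywords = {w for w in words if len(w) > 4}
    if not keywords:
        return 0.0
    hits = sum(1 for w in keywords if w in model_output.lower())
    return hits / len(keywords)

task = Task(
    prompt="A patient presents with polyuria and polydipsia. What initial test would you order?",
    reference="Order a fasting plasma glucose or HbA1c test to screen for diabetes mellitus.",
)
print(grade(task, "I would start with an HbA1c and fasting glucose to rule out diabetes."))
```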
Customization and Personalization
Finetuning models offers significant opportunities for customization and steerability [00:10:06]. This is particularly evident in the verticalization of models, where they can be trained to perform tasks like those of a legal scholar or a medical doctor [00:10:18]. The ability to add custom information through finetuning can significantly move the needle for specific tasks [00:40:24].
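For adding custom domain information through supervised finetuning, one common shape for the training data is a JSONL file of chat-formatted examples, as in the hedged sketch below. The system prompt, example content, and file name are placeholders; the exact schema depends on the provider.

```python
# Illustrative finetuning dataset in the widely used chat-messages JSONL layout.
import json

# Each example pairs a domain prompt with an expert-written target completion.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a careful legal research assistant."},
            {"role": "user", "content": "Summarize the holding of the attached appellate opinion."},
            {"role": "assistant", "content": "The court held that ... (expert-written ground truth)"},
        ]
    },
]

# Write one JSON object per line, the usual layout for finetuning datasets.
with open("legal_finetune.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```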
Challenges and Opportunities
Despite the advancements, productizing grading and task generation for domain-specific finetuning remains a significant challenge [00:12:48]. It currently requires substantial iteration and is considered one of the biggest problems to solve in the near future [00:13:03]. The goal is to make the process of evaluating and improving workflows about ten times easier than it is today [00:35:50].
Developers are advised to focus on agent and tool orchestration, which is considered the most important area right now [00:19:45]. Orchestrating tools well, observing traces meticulously, and engineering prompts carefully are all crucial to making models work effectively [00:20:46]. Splitting tasks among multiple agents also simplifies debugging workflows, since changes to any individual agent have a smaller "blast radius" [00:21:10].
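The snippet below sketches what splitting a workflow among small, specialized agents might look like: a toy triage step routes each request to one of two stub agents, so a change to the billing agent cannot break the support path. The agent names, routing rule, and function signatures are hypothetical.

```python
# Sketch of limiting the "blast radius" by splitting work across specialized agents.
from typing import Callable

def billing_agent(request: str) -> str:
    # In practice: its own prompt, tool set, and traces, scoped to billing questions.
    return f"[billing agent] resolved: {request}"

def support_agent(request: str) -> str:
    return f"[support agent] resolved: {request}"

AGENTS: dict[str, Callable[[str], str]] = {
    "billing": billing_agent,
    "support": support_agent,
}

def triage(request: str) -> str:
    """Toy router; a real system would typically use a small model to classify the request."""
    text = request.lower()
    return "billing" if "invoice" in text or "refund" in text else "support"

def handle(request: str) -> str:
    return AGENTS[triage(request)](request)

print(handle("I was double charged on my last invoice."))
print(handle("The app crashes when I open settings."))
```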
Future Outlook
Future advancements in models, particularly in reinforcement finetuning, are expected to make agents even more useful and reliable [00:10:01]. A significant “unlock” will be the ability to expose models to hundreds of tools, allowing them to figure out the right ones to call and utilize them effectively, removing the current 10-15 tool constraint [00:08:05]. Additionally, increasing the available runtime for models from minutes to hours or even days will yield more powerful results [00:09:02].
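One way to work toward that larger tool budget today is to narrow a big registry before the model ever sees it, as in the hedged sketch below: each tool description is scored against the request and only the top few are exposed. The registry contents and the keyword-overlap scoring are illustrative; a real system would more likely use embeddings or let a stronger model select tools directly.

```python
# Sketch of shortlisting a few tools from a large registry before a model call.
from dataclasses import dataclass

@dataclass
class Tool:
    name: str
    description: str

# A few entries stand in for a registry with hundreds of tools.
REGISTRY = [
    Tool("create_invoice", "Create a new invoice for a customer account"),
    Tool("refund_payment", "Issue a refund for a prior customer payment"),
    Tool("search_tickets", "Search historical support tickets by keyword"),
    Tool("deploy_service", "Deploy a service build to the staging environment"),
]

def relevance(query: str, tool: Tool) -> int:
    # Keyword overlap keeps the sketch self-contained; embeddings would work better.
    return len(set(query.lower().split()) & set(tool.description.lower().split()))

def shortlist(query: str, k: int = 3) -> list[Tool]:
    return sorted(REGISTRY, key=lambda t: relevance(query, t), reverse=True)[:k]

for tool in shortlist("customer wants a refund for a duplicate payment"):
    print(tool.name)
```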
There is also excitement about developing smaller, faster models that are particularly good at tool use and can be easily fine-tuned for specific classification and guardrailing tasks [00:32:53]. These “workhorse” or “supporting” models could efficiently handle quick classifications, freeing up larger models for more complex reasoning [00:33:05].
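A rough sketch of that "workhorse model" pattern appears below: a fast classifier stub handles guardrailing and routing, and the expensive reasoning model is only invoked for complex requests. Both model calls are stand-ins, and the labels and thresholds are assumptions for illustration.

```python
# Sketch of a small guardrail/routing model fronting a larger reasoning model.
def small_classify(text: str) -> str:
    """Stand-in for a small, fast finetuned classifier: 'blocked', 'simple', or 'complex'."""
    lowered = text.lower()
    if "password" in lowered or "ssn" in lowered:
        return "blocked"
    return "complex" if len(text.split()) > 20 else "simple"

def large_reason(text: str) -> str:
    """Stand-in for an expensive reasoning model."""
    return f"(large model) detailed answer to: {text}"

def answer(text: str) -> str:
    label = small_classify(text)
    if label == "blocked":
        return "Sorry, I can't help with that request."
    if label == "simple":
        return f"(small model) quick answer to: {text}"
    return large_reason(text)

print(answer("What's our SLA for priority support tickets?"))
```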
The overall sentiment is that model progress will accelerate, driven by a feedback loop in which models help improve themselves by generating better data [00:33:33]. This continuous improvement underscores the importance of making the "flywheel" from evaluation to production to finetuning and back again much simpler for developers [00:34:58].