From: mk_thisisit

Artificial intelligence (AI) has advanced significantly through the adoption of statistical methods, but this approach carries inherent limitations and challenges, particularly regarding transparency and verifiability.

The Evolution of AI: From Logic to Statistics

Early work in artificial intelligence aimed to automate theorem proving using mathematical methods [03:46]. Professor Herbert Simon, a Nobel laureate, was a co-creator of these mathematical methods for intelligence [03:42]. In practice, however, researchers found purely mathematical methods difficult to apply [04:24].

This led to a significant shift toward statistics, which offered a convenient way to grasp an uncertain world systematically [04:32]. While many find statistics unnatural, it remains an important tool for this purpose [04:47]. The statistical turn enabled many applications, and their scope is still growing [05:10].

Current State and Limitations of AI Systems

Despite advances, particularly with models like ChatGPT, current AI systems remain fundamentally statistical [05:28]. While some perceive a form of “own reasoning”, an ability to infer things not stated directly, this is described as still being “only statistics” [05:28]. Large language models frequently make mistakes, including “hallucinations”, and can be fooled with minimal effort, indicating that the work is not yet finished [06:42]. These imperfections are compared to human ones: people also make mistakes and talk nonsense [07:01].
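This fragility can be made concrete with a minimal sketch (Python with numpy; the toy weights, input, and step size are all invented for illustration): a classifier that reports high confidence is flipped by a small, deliberate nudge to its input.

```python
import numpy as np

w = np.array([15.0, -15.0])          # toy trained weights (illustrative only)
x = np.array([0.6, 0.4])             # input the model classifies as positive

def confidence(v):
    """Sigmoid 'probability' of the positive class."""
    return 1 / (1 + np.exp(-(w @ v)))

print(round(confidence(x), 3))       # ~0.953: the model is "95% convinced"

# FGSM-style step: nudge each feature slightly against the decision direction.
eps = 0.15
x_adv = x - eps * np.sign(w)
print(round(confidence(x_adv), 3))   # ~0.182: the prediction flips
```

The point is not this particular toy model but the pattern: high reported confidence is no guarantee against small, targeted perturbations.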

The “Black Box” Problem and Verifiability

A major limitation of current statistical AI is the “black box” problem [18:13]. It is often unclear how these models arrive at their decisions [18:10]. This lack of transparency leads to questions of legal liability if an AI system fails [17:08]. For example, in medical contexts, a doctor might be asked to trust an AI recommending a drastic procedure without understanding why [18:13].

Even when models provide confidence levels (e.g., “95% convinced”), an intelligent user will ask about the remaining percentage, for which statistics does not provide good answers [19:06]. As models grow more complex, with ever more parameters, this “black box” area, including hallucinations, will only expand [19:15]. The scientific community is, in effect, accepting the risk that users of tools like ChatGPT take on these limitations unknowingly [19:23].

Data Dependence and Synthetic Data

The effectiveness of these methods is directly tied to the data on which they are trained [12:43]. Currently, these data are limited [12:48]. An alternative scenario involves AI generating its own data for training and self-checking [12:53]. This concept, known as synthetic data generation, has been practiced for years, especially in mathematical simulations, to avoid expensive and dangerous real-world experiments [13:17]. The Monte Carlo method, invented by Polish mathematician Stanisław Ulam, is a basis for this [13:48].
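To illustrate the idea, here is a minimal Monte Carlo sketch in Python (the classic estimate of π; purely illustrative, not an example from the interview), showing how random samples can stand in for an expensive real-world measurement:

```python
# Estimate pi by sampling random points in the unit square and counting
# how many fall inside the quarter circle of radius 1 (Monte Carlo method).
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
points = rng.random((n, 2))                  # "synthetic" experimental data
inside = (points ** 2).sum(axis=1) <= 1.0    # hits inside the quarter circle

print(4 * inside.mean())                     # ~3.1416, no physical experiment needed
```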

However, a “strange” aspect arises when algorithms create data that then feed other algorithms making decisions that affect human lives [13:35].

Addressing Challenges and Future Directions

To overcome these limitations, there is a need to develop a “protective layer” around AI technology, one focused on verifying the truthfulness of decisions rather than on regulation alone [19:33]. One proposed solution is to return to the roots of AI: mathematical logic [20:27]. This approach would use mathematical logic to analyze complex models and mathematically prove their correctness in certain contexts [20:31]. This is seen as essential for AI to be truly recognized as a helpful and verifiable technology [21:03].
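As a small illustration of what “proving correctness in certain contexts” can look like, the sketch below uses an SMT solver (the z3-solver Python package; the toy linear model and the bounds are invented for illustration) to prove a simple input-output property by showing that no counterexample exists:

```python
# Prove that a toy model y = 0.8x + 0.1 stays within [0, 1] for every
# admissible input x in [0, 1].
from z3 import Real, Solver, Or, unsat

x = Real("x")
y = 0.8 * x + 0.1          # stand-in for a (much more complex) model

s = Solver()
s.add(x >= 0, x <= 1)      # the context in which we claim correctness
s.add(Or(y < 0, y > 1))    # negation of the property we want to prove

if s.check() == unsat:
    print("Proved: 0 <= y <= 1 for every x in [0, 1]")
else:
    print("Counterexample:", s.model())
```

Verifying large neural models is far harder, but the principle, a logic-based proof over a bounded context, is the same.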

Distributed Artificial Intelligence

A new paradigm, distributed artificial intelligence, is being developed to address the practical inefficiencies of current AI, especially in large organizations with scattered data resources [23:17]. The traditional method of collecting massive datasets in one place, harmonizing them, building a model, and then exporting it for use is proving impractical and too slow, especially in rapidly changing situations [24:15].

Instead, distributed AI, or federated learning, trains models where the data resides and then shares the smaller models or insights rather than large volumes of raw data [25:31]. This approach bypasses data-sharing restrictions (e.g., patient privacy in hospitals) and the need for database compatibility [23:59]. Models learn locally yet remain compatible with one another, which makes practical sense [26:45]. The project is currently in the proof-of-concept phase, with practical solutions expected in the near future [26:55].
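A minimal federated-averaging sketch (Python with numpy; the three “sites”, the linear model, and all hyperparameters are invented for illustration) captures the core loop: local training where the data lives, followed by sharing and averaging only the model weights:

```python
import numpy as np

def local_update(w, X, y, lr=0.05, epochs=5):
    """Train locally; the raw data (X, y) never leaves the site."""
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
        w = w - lr * grad
    return w

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

# Each site (e.g., a hospital) holds its own private dataset.
sites = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + 0.1 * rng.normal(size=50)
    sites.append((X, y))

# Communication rounds: only the small weight vectors travel, never the data.
w_global = np.zeros(2)
for _ in range(30):
    local_ws = [local_update(w_global, X, y) for X, y in sites]
    w_global = np.mean(local_ws, axis=0)        # server-side averaging

print(w_global)                                  # close to [2.0, -1.0]
```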

The “Automatic Scientist”

Major U.S. funders of basic research recently announced a competition to build an “automatic scientist” [07:36]. This envisioned AI would be able to prove mathematical theorems [07:58], discover new chemical compounds, and uncover fundamental laws from large datasets and experiments [08:10]. This raises a fundamental question about the future role of human scientists: an AI capable of discovering scientific laws would call into question the very purpose of human researchers and funding agencies [08:23].

AI and Human Intelligence: A Collaborative Future

While the idea of AI replacing humans is viewed skeptically for the near future, the Industrial Revolution analogy (the horse replaced by the steam engine) shows how new tools can displace old roles [09:56]. AI is seen as a tool, not a competitor [09:59]. Delegating important decisions entirely to machines, such as medical decisions in patient care, is considered dangerous [10:11]. Humans should remain in control, both ethically and operationally [10:23].

However, research indicates AI can exhibit empathy: a study from the University of California, San Diego found that AI-generated medical advice was more empathetic than advice from human doctors [10:38]. Humans can learn from these algorithms, and AI can serve as an “intelligent assistant” that monitors decisions and suggests better alternatives [11:07].

A next step is developing AI that can analyze many data types (text, visual, numerical, time series), allowing a single system to absorb multiple data modalities simultaneously [15:01]; a brief sketch of one such fusion scheme follows below.

However, whether AI can develop human-like intuition remains debated [15:51]. The current view is skeptical that the “intuition” perceived in AI tools is genuinely implemented by them [16:11]. Whether AI will ever be able to generate or feel pain is a separate, complex issue related to self-regulation mechanisms [22:13].
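Returning to the multimodal point above, here is a minimal late-fusion sketch (Python with numpy; the stand-in encoders and the joint head are invented for illustration) of one common way a single system can absorb several modalities at once:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in encoders: each maps one modality to a fixed-size embedding.
# A real system would use a text model, a vision model, etc. here.
def encode_text(text):      return rng.normal(size=16)
def encode_image(pixels):   return rng.normal(size=16)
def encode_series(values):  return rng.normal(size=16)

# Late fusion: concatenate the per-modality embeddings ...
z = np.concatenate([
    encode_text("radiology report"),
    encode_image(np.zeros((8, 8))),
    encode_series([1.0, 1.2, 0.9]),
])

# ... and feed the joint vector to a single downstream head.
W = rng.normal(size=(z.size, 2))            # toy 2-class classification head
logits = z @ W
probs = np.exp(logits) / np.exp(logits).sum()
print(probs)
```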