When deep learning meets inductive logic programming
Main Author:
Other Authors:
Format: Thesis - Doctor of Philosophy
Language: English
Published: Nanyang Technological University, 2025
Subjects:
Online Access: https://hdl.handle.net/10356/182530
Institution: Nanyang Technological University
Summary: The integration of inductive logic programming (symbolism) and deep learning (connectionism) has drawn growing attention and become a promising research direction toward stronger reasoning ability. However, the current integration is still far from perfect, particularly with respect to the poor generalization and learning efficiency of existing models. In this thesis, we aim to improve the generalization of current approaches, together with their learning ability and performance. In addition, we conduct a comprehensive evaluation of the reasoning ability of large language models (LLMs) on inductive logic programming tasks.
First, to improve the generalization of current logic-based deep reinforcement learning (DRL) algorithms, we propose a novel framework called GALOIS, which synthesizes white-box programs with hierarchy and definitive cause-effect logic. GALOIS introduces a new sketch-based programming language and uses program sketches to guide the synthesis process, automatically generating white-box, interpretable programs with generalizable cause-effect logic. Comprehensive evaluation across various complex decision-making tasks demonstrates that GALOIS outperforms mainstream DRL methods and previous state-of-the-art program-guided baselines in learning capability, generalization, interpretability, and the ability to reuse knowledge across different tasks.
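The record does not show what a program sketch looks like. As a purely illustrative aid, the following minimal Python sketch conveys the general idea of sketch-based program synthesis: a partial policy with holes that a search procedure fills with candidate conditions and actions, keeping the completion that scores best on the task. The hole names, candidate sets, and toy environment here are assumptions for illustration, not the GALOIS language or API.

```python
# Illustrative sketch-based synthesis loop (hypothetical; not the GALOIS implementation).
# A "sketch" is a partial white-box program whose holes are filled by enumeration,
# and each completed program is scored on the decision-making task.
import itertools
import random

random.seed(0)

CONDITIONS = ["dist < 1.0", "dist >= 1.0", "enemy_visible"]   # candidate guards for the hole
ACTIONS = ["'move_forward'", "'turn_left'", "'attack'"]        # candidate actions for the holes

SKETCH = """
def policy(dist, enemy_visible):
    if {hole_cond}:
        return {hole_act1}
    return {hole_act2}
"""

def complete(cond, act1, act2):
    """Fill the holes, yielding an executable, human-readable program."""
    return SKETCH.format(hole_cond=cond, hole_act1=act1, hole_act2=act2)

def evaluate(program_src, episodes=50):
    """Score a completed program in a toy stand-in environment."""
    scope = {}
    exec(program_src, scope)
    policy = scope["policy"]
    total = 0.0
    for _ in range(episodes):
        dist = random.uniform(0.0, 2.0)
        enemy_visible = random.random() < 0.5
        action = policy(dist, enemy_visible)
        total += 1.0 if (enemy_visible and action == "attack") else 0.1
    return total / episodes

# Enumerate all hole completions and keep the best-scoring white-box program.
best_program = max(
    (complete(c, a1, a2) for c, a1, a2 in itertools.product(CONDITIONS, ACTIONS, ACTIONS)),
    key=evaluate,
)
print(best_program)
```

Because the result is an ordinary, readable program rather than a set of network weights, its cause-effect logic can be inspected and reused across tasks, which is the property the paragraph above attributes to GALOIS.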
Then, we observe that current state-of-the-art neural inductive logic programming (ILP) models require a large number of training iterations and training examples, and remain far from perfect in performance and generalization on tasks that demand complex task-solving logic. To mitigate these issues, we present a novel framework named Failure Reflection Guided Regularizer (FRGR).
FRGR dynamically identifies and summarizes error patterns when the model repeatedly makes similar mistakes during training, and then penalizes the model for reproducing those patterns in subsequent training iterations. This encourages the model to avoid recurring mistakes with similar error patterns, thereby facilitating faster convergence to a higher-performing solution. Experimental results on multiple relational reasoning and decision-making tasks demonstrate the effectiveness of FRGR in improving neural ILP models' performance, generalization, and learning efficiency.
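The record does not include the FRGR implementation; the following is a minimal, hypothetical sketch of the general idea of a failure-reflection regularizer: keep a running summary of which rule weights were active when the model predicted wrongly, and add a penalty term that discourages re-using that same pattern. The class name, the exponential-moving-average summary, and the tensor shapes are assumptions for illustration only.

```python
# Hypothetical failure-reflection style regularizer (illustrative; not the thesis code).
import torch

class FailureReflectionRegularizer:
    def __init__(self, num_rules, decay=0.9, strength=0.1):
        self.error_pattern = torch.zeros(num_rules)  # running summary of past mistakes
        self.decay = decay
        self.strength = strength

    def update(self, rule_weights, wrong_mask):
        """Accumulate which rules were most active on wrongly predicted examples."""
        if wrong_mask.any():
            failing = rule_weights[wrong_mask].mean(dim=0).detach()
            self.error_pattern = self.decay * self.error_pattern + (1 - self.decay) * failing

    def penalty(self, rule_weights):
        """Penalize overlap between current rule usage and the stored error pattern."""
        return self.strength * (rule_weights * self.error_pattern).sum(dim=-1).mean()

# Sketch of use in a training step (rule_weights: (batch, num_rules) soft rule scores
# of a neural ILP model; wrong_mask: boolean mask of mispredicted examples):
#   loss = task_loss + frgr.penalty(rule_weights)
#   frgr.update(rule_weights.detach(), wrong_mask)
```

The design intuition matching the paragraph above is that repeated failures concentrate in the stored pattern, so the penalty grows only for mistakes the model keeps making, nudging training toward a different, better solution.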
Finally, despite the improvements that state-of-the-art neural ILP solvers bring, there has been a surge of interest in investigating the reasoning ability of LLMs. However, the textual and numerical reasoning benchmarks adopted in previous work are rather shallow and simple, so positive results on these benchmarks are insufficient to conclude that LLMs possess strong reasoning ability. Recent efforts have demonstrated that LLMs perform poorly on sequential decision-making problems that require common-sense planning, as shown by their performance on reinforcement learning benchmarks.
In this work, we conduct an in-depth assessment of several state-of-the-art LLMs' reasoning ability on the ILP benchmark, which is broadly recognized as a representative and challenging measure for evaluating logic program induction/synthesis systems, since it requires inducing strict cause-effect logic to achieve robust deduction on both independent and identically distributed (IID) and out-of-distribution (OOD) test samples.
Our evaluation shows that, compared with neural program induction systems that are much smaller in model size, state-of-the-art LLMs exhibit much weaker reasoning ability, achieving substantially lower performance and generalization under both natural-language prompting and truth-value matrix prompting.
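To make the two prompting styles concrete, the sketch below shows one assumed way to serialize a simple ILP task as a truth-value matrix prompt. The exact prompt templates and benchmark tasks used in the thesis are not given in this record, so the relation names and format here are illustrative only.

```python
# Hypothetical truth-value matrix prompt for an ILP-style task (format is assumed).
def truth_value_matrix(entities, relation):
    """Render a binary relation over `entities` as a 0/1 matrix; rows are the first argument."""
    return "\n".join(
        " ".join("1" if (a, b) in relation else "0" for b in entities)
        for a in entities
    )

entities = ["alice", "bob", "carol"]
parent = {("alice", "bob"), ("bob", "carol")}   # background facts for parent/2

prompt = (
    "Entities: " + ", ".join(entities) + "\n"
    + "parent/2 as a truth-value matrix (rows and columns follow the entity order):\n"
    + truth_value_matrix(entities, parent)
    + "\nInduce grandparent/2 from parent/2 and output its truth-value matrix in the same format."
)
print(prompt)
```

Natural-language prompting would instead state the same facts as sentences ("alice is the parent of bob", and so on) and ask for the induced relation; the abstract reports that the evaluated LLMs generalize poorly under both formats.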