Learning General Policies for Partially Observable Deterministic (POD) Planning

Advisor: Till Hofmann

Supervisor: Prof. Hector Geffner

Level: Master’s Thesis

Topic

In AI planning, the goal is to compute a plan (a sequence of actions) that leads from the initial state to a goal state, given a model of the available actions (e.g., specified in PDDL). Classical planning assumes deterministic actions and complete information about the initial state. In partially observable deterministic (POD) planning, by contrast, the agent has only incomplete information about the state, but it can perform sensing actions to obtain additional information about the current state. One approach to POD planning is to translate the problem into a fully observable non-deterministic (FOND) planning problem [1,2] by introducing the literals \(\mathbf{K}L\) and \(\mathbf{K}\neg L\), which express that \(L\) is known to be true or known to be false, respectively. The translated problem can then be solved online with a classical planner, or offline with a FOND planner that computes a contingent plan [3].
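To make the translation concrete, the following Python sketch compiles simplified POD actions into FOND actions over K-literals. It is a minimal illustration assuming a STRIPS-like representation with add effects only; the names (`FONDAction`, `translate_physical`, `translate_sensing`) are hypothetical and omit details of the actual compilations in [1,2], such as delete effects and conditional effects.

```python
# Simplified illustration of the K-translation from POD to FOND;
# not the exact compilation of [1,2].
from dataclasses import dataclass


def K(lit: str) -> str:
    """The literal K L: L is known to be true."""
    return f"K({lit})"


def K_not(lit: str) -> str:
    """The literal K not-L: L is known to be false."""
    return f"K(not-{lit})"


@dataclass
class FONDAction:
    name: str
    precondition: frozenset[str]
    # Each element is one alternative effect set ("oneof" semantics);
    # deterministic actions have exactly one outcome.
    outcomes: list[frozenset[str]]


def translate_physical(name: str, pre: set[str], add: set[str]) -> FONDAction:
    """A physical action must be applicable with certainty, so its
    precondition literals become K-literals; its effects become known."""
    return FONDAction(
        name=name,
        precondition=frozenset(K(l) for l in pre),
        outcomes=[frozenset(K(l) for l in add)],
    )


def translate_sensing(name: str, lit: str) -> FONDAction:
    """Sensing L becomes a non-deterministic action with two outcomes,
    K L and K not-L; a strong FOND policy must handle both branches,
    mirroring the branch points of a contingent plan."""
    return FONDAction(
        name=name,
        precondition=frozenset(),
        outcomes=[frozenset({K(lit)}), frozenset({K_not(lit)})],
    )


# Example: sensing whether a door is open yields two possible outcomes.
print(translate_sensing("sense-door", "door-open").outcomes)
# e.g. [frozenset({'K(door-open)'}), frozenset({'K(not-door-open)'})]
```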

In generalized planning [6], the goal is to compute a single policy that solves an entire class of problems, rather than solving each problem instance separately. Recently, we extended generalized planning to FOND problems [5], building on a combinatorial approach [4]. In this approach, a general policy is described by rules over a collection of features. Such a policy can be learned without supervision by classifying state transitions as “good” or “non-good” and identifying features that distinguish the two classes.
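As an illustration, the following Python sketch encodes a hand-crafted general policy for a Blocksworld-like domain as two rules over the features H (a block is currently being held) and n (number of misplaced blocks), and uses them to classify transitions as good or non-good. The rule encoding is an assumption made for illustration; the formalism of [4,5] constrains qualitative feature changes (e.g., increments and decrements) rather than using arbitrary predicates.

```python
# Illustrative feature-based general policy in the spirit of [4,5];
# the features and rule encoding are simplified assumptions.
from typing import Callable

# A state is abstracted by its feature values, e.g. for Blocksworld:
#   H: some block is currently being held (Boolean)
#   n: number of blocks not yet in their goal position (numeric)
Features = dict[str, int]

# A rule C -> E: in states whose feature values satisfy the condition C,
# a transition is good iff the feature changes satisfy E.
Rule = tuple[Callable[[Features], bool], Callable[[Features, Features], bool]]

RULES: list[Rule] = [
    # Hand empty and blocks misplaced: pick up a block (H becomes true).
    (lambda f: not f["H"] and f["n"] > 0,
     lambda f, g: bool(g["H"]) and g["n"] == f["n"]),
    # Holding a block: place it at its target (n decreases, H becomes false).
    (lambda f: bool(f["H"]),
     lambda f, g: not g["H"] and g["n"] < f["n"]),
]


def good_transition(f: Features, g: Features) -> bool:
    """A transition f -> g is good if some rule whose condition holds
    in f sanctions the feature changes from f to g. Learning amounts to
    finding features and rules that separate good from non-good
    transitions on small training instances."""
    return any(cond(f) and change(f, g) for cond, change in RULES)


# Picking up a misplaced block is good; staying put is not.
print(good_transition({"H": False, "n": 1}, {"H": True, "n": 1}))   # True
print(good_transition({"H": False, "n": 1}, {"H": False, "n": 1}))  # False
```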

The goal of this thesis project is to learn general policies for partially observable deterministic problems: the POD problems are first translated into FOND problems, and the FOND learning approach [5] is then adapted to generate strong policies, i.e., policies that reach the goal for every possible outcome of the non-deterministic (sensing) actions.

Requirements

Interested?

Please contact Till Hofmann and provide a short CV and a transcript of records.

References

[1] A. Albore, H. Palacios, and H. Geffner, “A Translation-Based Approach to Contingent Planning,” in Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI), 2009, pp. 1623–1628.

[2] B. Bonet and H. Geffner, “Flexible and Scalable Partially Observable Planning with Linear Translations,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 28, no. 1, 2014, doi: 10.1609/aaai.v28i1.9047.

[3] C. Muise, V. Belle, and S. McIlraith, “Computing Contingent Plans via Fully Observable Non-Deterministic Planning,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 28, no. 1, 2014. Available: https://ojs.aaai.org/index.php/AAAI/article/view/9049

[4] G. Francès, B. Bonet, and H. Geffner, “Learning General Planning Policies from Small Examples Without Supervision,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2021, pp. 11801–11808, doi: 10.1609/aaai.v35i13.17402.

[5] T. Hofmann and H. Geffner, “Learning Generalized Policies for Fully Observable Non-Deterministic Planning Domains,” in Proceedings of the 33rd International Joint Conference on Artificial Intelligence (IJCAI), 2024, pp. 6733–6742, doi: 10.24963/ijcai.2024/744.

[6] B. Bonet and H. Geffner, “General Policies, Subgoal Structure, and Planning Width,” Journal of Artificial Intelligence Research, vol. 80, pp. 475–516, 2024, doi: 10.1613/jair.1.15581.