High-Confidence, Efficient Learning Under Rich Task Specifications
Technical point of contact: James Donlon, NSF
Period of activity: 2016-2019
Collaborators: Scott Niekum (PI)
Overview of the Project
While large literatures exist on both policy learning and provable guarantees of correctness, the results so far have stayed within their disciplinary boundaries (e.g., learning theory and formal methods) and have inherited the restrictive assumptions of those domains. In the proposed effort, we will develop theory and methods for the joint learning and synthesis of policies that not only adapt to novel and changing situations but also respect given high-level specifications with quantified confidence. To this end, we structure the effort around two central algorithmic properties and the synergies between them:
- Theoretical correctness and efficiency guarantees: We will design "safe" learning algorithms that come with theoretical, probabilistic guarantees of satisfaction and of data efficiency, covering both the expected reward of a policy and its correctness with respect to a high-level specification. These guarantees will be used in both policy evaluation and policy improvement settings, providing a measure of safety and, more generally, of correctness against high-level specifications both during and after the learning process.
- Practical efficiency guarantees: While necessary for soundness, strong theoretical guarantees often require a large amount of execution data that is simply not available in many settings, such as robotics. This problem is compounded in a (potentially non-stationary) learning setting in which the policy, the environment, or even the reward function might change frequently. To close this gap between theoretical and practical efficiency in learning, we will develop model-based and model-free off-policy evaluation methods for practical data efficiency that leverage active sampling strategies and bootstrapping.
- Hybrid techniques: Theoretical and practical approaches contribute toward developing provably sound and efficient algorithms in complementary directions. Therefore, we propose to develop hybrid techniques that combine and amplify the advantages of strong theoretical and practical guarantees.
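
To make the idea of off-policy evaluation with bootstrapping concrete, the following is a minimal sketch and not the project's actual method. It assumes trajectories stored as lists of (state, action, reward) tuples, uses ordinary per-trajectory importance sampling as the off-policy estimator, and uses a percentile bootstrap to obtain an approximate lower confidence bound on the expected return. The function names `is_return_estimates` and `bootstrap_lower_bound`, the parameter `delta`, and the toy two-action problem are all hypothetical and chosen for illustration only.

```python
import random


def is_return_estimates(trajectories, pi_e, pi_b, gamma=1.0):
    """Per-trajectory importance-sampled estimates of the evaluation policy's return.

    trajectories: list of trajectories, each a list of (state, action, reward).
    pi_e(state, action), pi_b(state, action): action probabilities under the
    evaluation and behavior policies (assumed interface, pi_b must be > 0).
    """
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (state, action, reward) in enumerate(traj):
            # Likelihood ratio of the whole trajectory under pi_e versus pi_b.
            weight *= pi_e(state, action) / pi_b(state, action)
            # Discounted return actually observed under the behavior policy.
            ret += (gamma ** t) * reward
        estimates.append(weight * ret)
    return estimates


def bootstrap_lower_bound(estimates, delta=0.05, n_boot=2000, seed=0):
    """Approximate (1 - delta) lower confidence bound on the mean estimate.

    Percentile bootstrap: resample with replacement, take the mean of each
    resample, and return the delta-quantile of those means. This is an
    approximate bound, not a guaranteed one.
    """
    rng = random.Random(seed)
    n = len(estimates)
    means = sorted(
        sum(rng.choice(estimates) for _ in range(n)) / n
        for _ in range(n_boot)
    )
    return means[min(int(delta * n_boot), n_boot - 1)]


if __name__ == "__main__":
    # Toy one-step problem (assumption): action 1 pays reward 1, action 0 pays 0.
    rng = random.Random(1)
    pi_b = lambda s, a: 0.5                      # uniform behavior policy
    pi_e = lambda s, a: 0.9 if a == 1 else 0.1   # policy being evaluated
    data = []
    for _ in range(200):
        a = rng.randrange(2)
        data.append([(0, a, float(a))])
    est = is_return_estimates(data, pi_e, pi_b)
    print("IS estimate:", sum(est) / len(est))
    print("approx. 95% lower bound:", bootstrap_lower_bound(est))
```

In a safe policy improvement setting, a bound of this kind could be compared against the performance of the current policy before a new policy is deployed. The bootstrap bound trades the strict guarantees of concentration inequalities for tighter estimates from less data, which is the theoretical versus practical tension described in the bullets above.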