3vLTL: A Tool to Generate Automata for Three-valued LTL
F Belardinelli, A Ferrando, V Malvone
Multi-valued logics have a long tradition in the literature on system verification, including run-time verification. However, comparatively few model-checking tools have been developed for multi-valued specification languages. We present 3vLTL, a tool to generate Büchi automata from formulas in Linear-time Temporal Logic (LTL) interpreted on a three-valued semantics. Given an LTL formula, a set of atomic propositions as the alphabet for the automaton, and a truth value, our procedure generates a Büchi automaton that accepts all the words that assign the chosen truth value to the LTL formula. Thanks to its output format, the tool can be seamlessly composed with third-party libraries: the Büchi automaton can be used in the context of formal verification to check whether an LTL formula is true, false, or undefined on a given model.
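To give a flavour of the automata-generation step, the sketch below uses the Python bindings of the Spot library to build Büchi automata for the words that make a formula true and for those that make it false; the three-valued split that additionally isolates the undefined words is specific to 3vLTL and is not reproduced here, and the formula and proposition names are illustrative.

    import spot

    phi = spot.formula('G (request -> F grant)')

    # Words on which phi evaluates to true are accepted by the automaton
    # for phi; words on which it evaluates to false by the automaton for
    # its negation.
    aut_true = spot.translate(phi)
    aut_false = spot.translate(spot.formula.Not(phi))

    # Emit the automaton in the HOA format, which third-party model
    # checkers and libraries can consume directly.
    print(aut_true.to_str('hoa'))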
BibTeX citation
@article{3vltl-a-tool-to-generate-automata-for-three-valued-ltl-2023-,
title = {3vLTL: A Tool to Generate Automata for Three-valued LTL},
author = {F Belardinelli and A Ferrando and V Malvone},
journal = {arXiv preprint arXiv:2311.09787},
year = {2023},
url = {https://arxiv.org/abs/2311.09787},
}

Aggregating Bipolar Opinions by Arguing
CJ Dickie, F Belardinelli, F Toni
In multi-agent systems, bipolar argumentation (BA) frameworks can be used to represent the individual opinions of agents during debate. Recently, bipolar assumption-based argumentation (bipolar ABA) frameworks have been combined with methods from social choice theory to aggregate such opinions, producing a single framework that reflects a consensus among the debate participants. However, there has not yet been a comprehensive study evaluating this method of aggregation. In this project, we establish preservation results by defining the conditions under which agreement between agents, such as shared semantic properties, can be preserved during bipolar ABA aggregation. We find that the preservation of properties can be guaranteed, albeit with significant restrictions on the input frameworks and/or the aggregation procedure. Furthermore, we build an application of bipolar ABA aggregation by designing and implementing a debate and decision-making platform named ArgSolve. As part of this process, we introduce a novel program for computing extensions of bipolar ABA semantics, accompanied by a corresponding soundness and completeness proof. Our evaluation finds that the platform improves on conversation-centric debate with respect to expressiveness, comprehension, and perceived fairness, but shows limitations in accessibility and usability.
BibTeX citation
@article{aggregating-bipolar-opinions-by-arguing-2023-mr35r1eaaaaj-uh,
title = {Aggregating Bipolar Opinions by Arguing},
author = {CJ Dickie and F Belardinelli and F Toni},
year = {2023},
url = {https://crjake.github.io/thesis.pdf},
}

An abstraction-refinement framework for verifying strategic properties in multi-agent systems with imperfect information
F Belardinelli, A Ferrando, V Malvone
We investigate the verification of Multi-Agent Systems against strategic properties expressed in Alternating-time Temporal Logic under the assumptions of imperfect information and perfect recall. To this end, we develop a three-valued semantics for concurrent game structures upon which we define an abstraction method. We prove that concurrent game structures with imperfect information admit perfect-information abstractions that preserve three-valued satisfaction. Furthermore, to deal with cases in which the value of a specification is undefined, we develop a novel automata-theoretic technique for linear-time temporal logic (LTL), then apply it to finding "failure" states. The latter can then be fed into a refinement procedure, thus providing a sound, albeit incomplete, verification method. We illustrate the overall procedure on a variant of the Train Gate Controller scenario and a simple voting protocol under imperfect information.
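To give a sense of the three-valued ingredient, the sketch below spells out the strong Kleene connectives that such semantics are typically built on; the encoding of the three truth values is our own illustrative choice, not the paper's.

    # Strong Kleene three-valued connectives; TT/FF are definite verdicts
    # that transfer from the abstraction to the concrete system, while UU
    # marks an undefined value that may trigger refinement.
    TT, FF, UU = True, False, None

    def t_not(a):
        return None if a is None else not a

    def t_and(a, b):
        if a is FF or b is FF:
            return FF          # a definite false dominates conjunction
        if a is TT and b is TT:
            return TT
        return UU              # otherwise the conjunction is undefined

    def t_or(a, b):
        return t_not(t_and(t_not(a), t_not(b)))   # by De Morgan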
BibTeX citation
@article{an-abstraction-refinement-framework-for-verifying-strategic-,
title = {An abstraction-refinement framework for verifying strategic properties in multi-agent systems with imperfect information},
author = {F Belardinelli and A Ferrando and V Malvone},
journal = {Artificial Intelligence 316, 103847},
year = {2023},
url = {https://www.sciencedirect.com/science/article/pii/S0004370222001874},
}

Approximate model-based shielding for safe reinforcement learning
AW Goodall, F Belardinelli
Reinforcement learning (RL) has shown great potential for solving complex tasks in a variety of domains. However, applying RL to safety-critical systems in the real world is not easy, as many algorithms are sample-inefficient and maximising the standard RL objective comes with no guarantees on worst-case performance. In this paper we propose approximate model-based shielding (AMBS), a principled look-ahead shielding algorithm for verifying the performance of learned RL policies w.r.t. a set of given safety constraints. Our algorithm differs from other shielding approaches in that it does not require prior knowledge of the safety-relevant dynamics of the system. We provide a strong theoretical justification for AMBS and demonstrate superior performance to other safety-aware approaches on a set of Atari games with state-dependent safety labels.
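The core loop of look-ahead shielding with a learned world model is easiest to see as code. Below is a minimal sketch in the spirit of AMBS; world_model, policy, safe_policy, is_violation, and all parameter values are hypothetical stand-ins rather than the paper's actual interfaces.

    def shielded_action(state, world_model, policy, safe_policy,
                        is_violation, horizon=10, n_samples=32,
                        threshold=0.1):
        action = policy(state)
        violations = 0
        for _ in range(n_samples):                  # Monte Carlo roll-outs
            s, a = state, action
            for _ in range(horizon):
                s = world_model.sample_next(s, a)   # imagined transition
                if is_violation(s):
                    violations += 1
                    break
                a = policy(s)
        # Estimated probability of hitting an unsafe state within the
        # horizon; fall back to the safe policy when it is too high.
        if violations / n_samples > threshold:
            return safe_policy(state)
        return action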
BibTeX citation
@article{approximate-model-based-shielding-for-safe-reinforcement-lea,
title = {Approximate model-based shielding for safe reinforcement learning},
author = {AW Goodall and F Belardinelli},
journal = {arXiv preprint arXiv:2308.00707},
year = {2023},
url = {https://arxiv.org/abs/2308.00707},
}

Approximate shielding of Atari agents for safe exploration
AW Goodall, F Belardinelli
Balancing exploration and conservatism in the constrained setting is an important problem if we are to use reinforcement learning for meaningful tasks in the real world. In this paper, we propose a principled algorithm for safe exploration based on the concept of shielding. Previous approaches to shielding assume access to a safety-relevant abstraction of the environment or a high-fidelity simulator. Instead, our work is based on latent shielding, another approach that leverages world models to verify policy roll-outs in the latent space of a learned dynamics model. Our novel algorithm builds on this previous work, using safety critics and other additional features to improve the stability and farsightedness of the algorithm. We demonstrate the effectiveness of our approach by running experiments on a small set of Atari games with state-dependent safety labels. We present preliminary results showing that our approximate shielding algorithm effectively reduces the rate of safety violations and, in some cases, improves the speed of convergence and the quality of the final agent.
BibTeX citation
@article{approximate-shielding-of-atari-agents-for-safe-exploration-2,
title = {Approximate shielding of Atari agents for safe exploration},
author = {AW Goodall and F Belardinelli},
journal = {arXiv preprint arXiv:2304.11104},
year = {2023},
url = {https://arxiv.org/abs/2304.11104},
}

Arguing with physical activity data from a real-world wearable clinical trial with patients with a primary brain tumour
A Gould, F Toni, F Belardinelli
The objective of this project is to leverage Abstract Argumentation for Case-Based Reasoning (AA-CBR) to support clinical decision-making. The BrainWear study aims to assess the feasibility of its innovative use of wearable accelerometers to capture real-time Physical Activity (PA) data over intermittently collected Patient Reported Outcome (PRO) measures, for the assessment of patient Health-Related Quality of Life (HRQoL). This project proposes novel uses of AA-CBR as a transparent machine learning model to assess the utility of PA data from the BrainWear study, focusing on a patient-centred and transparent approach. Objectives: Firstly, we aim to develop novel applications of AA-CBR to predict the status of patient disease with PRO and PA data. This involves developing new processes to characterise real-world medical data into interpretable representations, as well as evaluating the performance of argumentation models at this prediction task. Furthermore, we aim to assess the trade-off between constructing clinically interpretable models, achieving high performance, and the associated effort required for data characterisation. Additionally, we seek to determine whether PA data can supplement or replace PRO measures, and investigate the use of argumentation to support our conclusions. This will involve constructing innovative methods to identify conflicts between the two measures. We aim to create novel variants of AA-CBR that reduce the burden of feature characterisation and can handle time-based representations. By experimenting with these diverse AA-CBR variants, we seek to identify the most suitable approach for clinical decision-making.
BibTeX citation
@article{arguing-with-physical-activity-data-from-a-real-world-wearab,
title = {Arguing with physical activity data from a real-world wearable clinical trial with patients with a primary brain tumour},
author = {A Gould and F Toni and F Belardinelli},
year = {2023},
url = {https://adam-gould.co.uk/MEngReport.pdf},
}

Asymptotic convergence and performance of multi-agent Q-learning dynamics
AA Hussain, F Belardinelli, G Piliouras
Achieving convergence of multiple learning agents in general n-player games is imperative for the development of safe and reliable machine learning (ML) algorithms and their application to autonomous systems. Yet it is known that, outside the bounds of simple two-player games, convergence cannot be taken for granted. To make progress in resolving this problem, we study the dynamics of smooth Q-Learning, a popular reinforcement learning algorithm which quantifies the tendency for learning agents to explore their state space or exploit their payoffs. We show a sufficient condition on the rate of exploration such that the Q-Learning dynamics is guaranteed to converge to a unique equilibrium in any game. We connect this result to games for which Q-Learning is known to converge with arbitrary exploration rates, including weighted potential games and weighted zero-sum polymatrix games. Finally, we examine the performance of the Q-Learning dynamic as measured by the Time-Averaged Social Welfare, and compare this with the Social Welfare achieved by the equilibrium. We provide a sufficient condition under which the Q-Learning dynamic outperforms the equilibrium even if the dynamics do not converge.
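For intuition, the sketch below simulates smooth Q-learning in a two-player matrix game: each agent tracks Q-values for its actions and plays the Boltzmann (softmax) policy, with the temperature T playing the role of the exploration rate that the convergence condition bounds. The game, step size, and temperature are illustrative choices, not the paper's experiments.

    import numpy as np

    A = np.array([[1.0, -1.0], [-1.0, 1.0]])   # matching pennies, row payoffs
    B = -A                                      # zero-sum: column payoffs

    def softmax(q, T):
        z = np.exp(q / T)
        return z / z.sum()

    def run(T=1.0, alpha=0.05, steps=5000):
        Qx, Qy = np.zeros(2), np.zeros(2)
        for _ in range(steps):
            x, y = softmax(Qx, T), softmax(Qy, T)
            # Move Q-values towards the expected payoff against the
            # opponent's current mixed strategy.
            Qx += alpha * (A @ y - Qx)
            Qy += alpha * (B.T @ x - Qy)
        return softmax(Qx, T), softmax(Qy, T)

    # Higher temperatures (more exploration) pull the unique rest point
    # towards the uniform strategy and stabilise the dynamics.
    print(run(T=1.0))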
BibTeX citation
@article{asymptotic-convergence-and-performance-of-multi-agent-q-lear,
title = {Asymptotic convergence and performance of multi-agent Q-learning dynamics},
author = {AA Hussain and F Belardinelli and G Piliouras},
journal = {arXiv preprint arXiv:2301.09619},
year = {2023},
url = {https://arxiv.org/abs/2301.09619},
}

Automatically verifying expressive epistemic properties of programs
F Belardinelli, I Boureanu, V Malvone, F Rajaona
We propose a new approach to the verification of epistemic properties of programs. First, we introduce the new "program-epistemic" logic L_PK, which is strictly richer and more general than similar formalisms appearing in the literature. To solve the verification problem in an efficient way, we introduce a translation from our language L_PK into first-order logic. Then, we show and prove correct a reduction from the model checking problem for program-epistemic formulas to the satisfiability of their first-order translation. Both our logic and our translation can handle richer specifications than the state of the art, allowing us to express the knowledge of agents about facts pertaining to programs (i.e., agents' knowledge before a program is executed as well as after it has been executed). Furthermore, we implement our translation in Haskell in a general way (i.e., independently of the programs in the logical statements), and we use existing SMT solvers to check satisfaction of L_PK formulas on a benchmark example in the AI/agency field.
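The reduction idea can be illustrated with an SMT solver: an agent knows phi at a state iff phi holds at every state that agrees with the agent's observations. The sketch below checks such a validity with Z3's Python API; the variables, the observability split (x observable, y hidden), and the property are our own toy example, not the paper's benchmark.

    from z3 import Ints, ForAll, Implies, Solver, And, Not, unsat

    x, y, y2 = Ints('x y y2')

    state = And(x == 2, y >= 0)          # what actually holds
    phi = lambda hidden: x + hidden > 0  # property to be known

    # K phi: phi holds for every hidden value consistent with what the
    # agent observes (here: any y2 satisfying the same constraint as y).
    k_phi = ForAll([y2], Implies(y2 >= 0, phi(y2)))

    s = Solver()
    s.add(state, Not(k_phi))             # is "not K phi" satisfiable?
    print("knows phi" if s.check() == unsat else "may not know phi")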
BibTeX citation
@article{automatically-verifying-expressive-epistemic-properties-of-p,
title = {Automatically verifying expressive epistemic properties of programs},
author = {F Belardinelli and I Boureanu and V Malvone and F Rajaona},
journal = {Proceedings of the AAAI Conference on Artificial Intelligence 37 (5), 6245-6252},
year = {2023},
url = {https://ojs.aaai.org/index.php/AAAI/article/view/25769},
}

Beyond strict competition: approximate convergence of multi agent Q-learning dynamics
A Hussain, F Belardinelli, G Piliouras
The behaviour of multi-agent learning in competitive settings is often considered under the restrictive assumption of a zero-sum game. Only under this strict requirement is the behaviour of learning well understood; beyond this, learning dynamics can often display non-convergent behaviours which prevent fixed-point analysis. Nonetheless, many relevant competitive games do not satisfy the zero-sum assumption. Motivated by this, we study a smooth variant of Q-Learning, a popular reinforcement learning dynamics which balances the agents' tendency to maximise their payoffs with their propensity to explore the state space. We examine this dynamic in games which are 'close' to network zero-sum games and find that Q-Learning converges to a neighbourhood around a unique equilibrium. The size of the neighbourhood is determined by the 'distance' to the zero-sum game, as well as the exploration rates of the agents. We complement these results by providing a method whereby, given an arbitrary network game, the 'nearest' network zero-sum game can be found efficiently. As our experiments show, these guarantees are independent of whether the dynamics ultimately reach an equilibrium, or remain non-convergent.
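As a two-player illustration of the last step: under the Frobenius norm, the nearest zero-sum game to a bimatrix game (A, B) has a closed form, namely (Z, -Z) with Z = (A - B) / 2, while the common-interest remainder C = (A + B) / 2 measures the 'distance' to zero-sum. The paper's construction is for network games; this sketch and the example matrices are only meant to convey the idea.

    import numpy as np

    A = np.array([[3.0, 0.0], [5.0, 1.0]])   # row player's payoffs
    B = np.array([[3.0, 5.0], [0.0, 1.0]])   # column player's payoffs

    Z = (A - B) / 2   # zero-sum part: minimiser of ||A-Z||^2 + ||B+Z||^2
    C = (A + B) / 2   # common-interest remainder

    # The norm of C quantifies how far (A, B) is from its nearest
    # zero-sum game (Z, -Z).
    print(Z, np.linalg.norm(C))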
BibTeX citation
@article{beyond-strict-competition-approximate-convergence-of-multi-a,
title = {Beyond strict competition: approximate convergence of multi agent Q-learning dynamics},
author = {A Hussain and F Belardinelli and G Piliouras},
journal = {arXiv preprint arXiv:2307.13928},
year = {2023},
url = {https://arxiv.org/abs/2307.13928},
}

Characterising decision theories with mechanised causal graphs
M MacDermott, T Everitt, F Belardinelli
How should my own decisions affect my beliefs about the outcomes I expect to achieve? If taking a certain action makes me view myself as a certain type of person, it might affect how I think others view me, and how I view others who are similar to me. This can influence my expected utility calculations and change which action I perceive to be best. Whether and how it should is subject to debate, with contenders for how to think about it including evidential decision theory, causal decision theory, and functional decision theory. In this paper, we show that mechanised causal models can be used to characterise and differentiate the most important decision theories, and generate a taxonomy of different decision theories.
BibTeX citation
@article{characterising-decision-theories-with-mechanised-causal-grap,
title = {Characterising decision theories with mechanised causal graphs},
author = {M MacDermott and T Everitt and F Belardinelli},
journal = {arXiv preprint arXiv:2307.10987},
year = {2023},
url = {https://arxiv.org/abs/2307.10987},
}

Defining deception in structural causal games
FR Ward, F Toni, F Belardinelli
Deceptive agents are a challenge for the safety, trustworthiness, and cooperation of AI systems. We focus on the problem that agents might deceive in order to achieve their goals. There are a number of existing definitions of deception in the literature on game theory and symbolic AI, but there is no overarching theory of deception for learning agents in games. We introduce a functional definition of deception in structural causal games, grounded in the philosophical literature. We present several examples to establish that our formal definition captures philosophical desiderata for deception.
BibTeX citation
@article{defining-deception-in-structural-causal-games-2023-mr35r1eaa,
title = {Defining deception in structural causal games},
author = {FR Ward and F Toni and F Belardinelli},
journal = {Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems},
year = {2023},
url = {https://www.southampton.ac.uk/~eg/AAMAS2023/pdfs/p2902.pdf},
}

Honesty is the best policy: defining and mitigating AI deception
F Ward, F Toni, F Belardinelli, T Everitt
Deceptive agents are a challenge for the safety, trustworthiness, and cooperation of AI systems. We focus on the problem that agents might deceive in order to achieve their goals (for instance, in our experiments with language models, the goal of being evaluated as truthful). There are a number of existing definitions of deception in the literature on game theory and symbolic AI, but there is no overarching theory of deception for learning agents in games. We introduce a formal definition of deception in structural causal games, grounded in the philosophy literature, and applicable to real-world machine learning systems. Several examples and results illustrate that our formal definition aligns with the philosophical and commonsense meaning of deception. Our main technical result is to provide graphical criteria for deception. We show, experimentally, that these results can be used to mitigate deception in reinforcement learning agents and language models.
BibTeX citation
@article{honesty-is-the-best-policy-defining-and-mitigating-ai-decept,
title = {Honesty is the best policy: defining and mitigating AI deception},
author = {F Ward and F Toni and F Belardinelli and T Everitt},
journal = {Advances in Neural Information Processing Systems 36, 2313-2341},
year = {2023},
url = {https://proceedings.neurips.cc/paper_files/paper/2023/hash/06fc7ae4a11a7eb5e20fe018db6c036f-Abstract-Conference.html},
}

Program semantics and verification technique for AI-centred programs
F Rajaona, I Boureanu, V Malvone, F Belardinelli
We give a general-purpose programming language in which programs can reason about their own knowledge. To specify what these intelligent programs know, we define a "program epistemic" logic, akin to a dynamic epistemic logic for programs. The properties expressible in our logic are complex, including programs introspecting into future states of affairs, i.e., reasoning now about facts that hold only after they and other threads will have executed. To model aspects anchored in privacy, our logic is interpreted over partial observability of variables, thus capturing that each thread can "see" only a part of the global space of variables. We verify program-epistemic properties on such AI-centred programs. To this end, we give a sound translation of the validity of our program-epistemic logic into first-order validity, using a new weakest-precondition semantics and a book-keeping of variable assignments. We implement our translation and fully …
BibTeX citation
@article{program-semantics-and-verification-technique-for-ai-centred-,
title = {Program semantics and verification technique for AI-centred programs},
author = {F Rajaona and I Boureanu and V Malvone and F Belardinelli},
journal = {International Symposium on Formal Methods, 473-491},
year = {2023},
url = {https://link.springer.com/chapter/10.1007/978-3-031-27481-7_27},
}

RMLGym: a Formal Reward Machine Framework for Reinforcement Learning
H Unniyankal, F Belardinelli, A Ferrando, V Malvone
Reinforcement learning (RL) is a powerful technique for learning optimal policies from trial and error. However, designing a reward function that captures the desired behavior of an agent is often a challenging and tedious task, especially when the agent has to deal with complex and multi-objective problems. To address this issue, researchers have proposed to use higher-level languages, such as Signal Temporal Logic (STL), to specify reward functions in a declarative and expressive way, and then automatically compile them into lower-level functions that can be used by standard RL algorithms. In this paper, we present RMLGym, a tool that integrates RML, a runtime verification tool, with OpenAI Gym, a popular framework for developing and comparing RL algorithms. RMLGym allows users to define reward functions using RML specifications and then generates reward monitors that evaluate the agent's performance and provide feedback at each step. We demonstrate the usefulness and flexibility of RMLGym by applying it to a famous benchmark problem from OpenAI Gym, and we analyze the strengths and limitations of our approach.
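The integration pattern is a standard Gym wrapper that replaces the environment's reward with the verdict of a runtime monitor. Below is a minimal sketch of that pattern (classic gym 4-tuple step API); the Monitor interface and the reward values are illustrative stand-ins, not RMLGym's actual API.

    import gym

    class MonitoredRewardWrapper(gym.Wrapper):
        """Reward the agent according to a runtime monitor's verdict."""

        def __init__(self, env, monitor, reward_ok=1.0, reward_violation=-10.0):
            super().__init__(env)
            self.monitor = monitor          # runtime verifier for the spec
            self.reward_ok = reward_ok
            self.reward_violation = reward_violation

        def step(self, action):
            obs, _, done, info = self.env.step(action)
            # Feed the current event to the monitor; the reward reflects
            # the verdict on the specification so far.
            ok = self.monitor.check(obs, action)
            reward = self.reward_ok if ok else self.reward_violation
            return obs, reward, done, info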
BibTeX citation
@article{rmlgym-a-formal-reward-machine-framework-for-reinforcement-l,
title = {RMLGym: a Formal Reward Machine Framework for Reinforcement Learning},
author = {H Unniyankal and F Belardinelli and A Ferrando and V Malvone},
journal = {WOA, 1-16},
year = {2023},
url = {https://ceur-ws.org/Vol-3579/paper1.pdf},
}

Scalable verification of strategy logic through three-valued abstraction
F Belardinelli, A Ferrando, W Jamroga, V Malvone, A Murano
The model checking problem for multi-agent systems against Strategy Logic specifications is known to be non-elementary. Several fragments of this logic have been defined to tackle this issue, but at the expense of expressiveness. In this paper, we propose a three-valued semantics for Strategy Logic upon which we define an abstraction method. We show that the latter semantics is an approximation of the classic two-valued one for Strategy Logic. Furthermore, we extend MCMAS, an open-source model checker for multi-agent specifications, to incorporate our abstraction method, and present some promising experimental results.
BibTeX citation
@article{scalable-verification-of-strategy-logic-through-three-valued,
title = {Scalable verification of strategy logic through three-valued abstraction},
author = {F Belardinelli and A Ferrando and W Jamroga and V Malvone and A Murano},
journal = {arXiv preprint arXiv:2310.17219},
year = {2023},
url = {https://arxiv.org/abs/2310.17219},
}

Stability of Multi-Agent Learning: Convergence in Network Games with Many Players
A Hussain, D Leonte, F Belardinelli, G Piliouras
The behaviour of multi-agent learning in many-player games has been shown to display complex dynamics outside of restrictive examples such as network zero-sum games. In addition, it has been shown that convergent behaviour is less likely to occur as the number of players increases. To make progress in resolving this problem, we study Q-Learning dynamics and determine a sufficient condition for the dynamics to converge to a unique equilibrium in any network game. We find that this condition depends on the nature of pairwise interactions and on the network structure, but is explicitly independent of the total number of agents in the game. We evaluate this result on a number of representative network games and show that, under suitable network conditions, stable learning dynamics can be achieved with an arbitrary number of agents.
BibTeX citation
@article{stability-of-multi-agent-learning-convergence-in-network-gam,
title = {Stability of Multi-Agent Learning: Convergence in Network Games with Many Players},
author = {A Hussain and D Leonte and F Belardinelli and G Piliouras},
journal = {arXiv preprint arXiv:2307.13922},
year = {2023},
url = {https://arxiv.org/abs/2307.13922},
}

Strategic abilities of forgetful agents in stochastic environments
F Belardinelli, W Jamroga, M Mittelmann, A Murano
In this paper, we investigate the probabilistic variants of the strategy logics ATL and ATL* under imperfect information. Specifically, we present novel decidability and complexity results when the model transitions are stochastic and agents play uniform strategies. That is, the semantics of the logics are based on multi-agent, stochastic transition systems with imperfect information, which combine two sources of uncertainty, namely, the partial observability agents have on the environment, and the likelihood of transitions to occur from a system state. Since the model checking problem is undecidable in general in this setting, we restrict our attention to agents with memoryless (positional) strategies. The resulting setting captures the situation in which agents have qualitative uncertainty of the local state and quantitative uncertainty about the occurrence of future events. We illustrate the usefulness of this setting with meaningful examples.
BibTeX citation
@article{strategic-abilities-of-forgetful-agents-in-stochastic-enviro,
title = {Strategic abilities of forgetful agents in stochastic environments},
author = {F Belardinelli and W Jamroga and M Mittelmann and A Murano},
journal = {arXiv preprint arXiv:2310.17240},
year = {2023},
url = {https://arxiv.org/abs/2310.17240},
}

The impact of exploration on convergence and performance of multi-agent Q-learning dynamics
A Hussain, F Belardinelli, D Paccagnan
Understanding the impact of exploration on the behaviour of multi-agent learning has, so far, benefited from the restriction to potential or network zero-sum games, in which convergence to an equilibrium can be shown. Outside of these classes, learning dynamics rarely converge and little is known about the effect of exploration in the face of non-convergence. To make progress on this front, we study the smooth Q-Learning dynamics. We show that, in any network game, exploration by agents results in the convergence of Q-Learning to a neighbourhood of an equilibrium. This holds independently of whether the dynamics reach the equilibrium or display complex behaviours. We show that increasing the exploration rate decreases the size of this neighbourhood, but also decreases the ability of all agents to improve their payoffs. Furthermore, in a broad class of games, the payoff performance of Q-Learning dynamics, measured by Social Welfare, decreases when the exploration rate increases. Our experiments show this to be a general phenomenon: exploration leads to improved convergence of Q-Learning, at the cost of payoff performance.
BibTeX citation
@article{the-impact-of-exploration-on-convergence-and-performance-of-,
title = {The impact of exploration on convergence and performance of multi-agent Q-learning dynamics},
author = {A Hussain and F Belardinelli and D Paccagnan},
journal = {International Conference on Machine Learning, 14178-14202},
year = {2023},
url = {https://proceedings.mlr.press/v202/hussain23a.html},
}