2026

PRIMA 2025: Principles and Practice of Multi-Agent Systems: 26th ..., 2026

Expressive Reward Synthesis

D Donnelly, F Belardinelli

Reinforcement Learning (RL) [10] has achieved remarkable success by enabling agents to learn through interactions with their environment, using reward signals to shape their behaviour. Yet, the reward function that produces these signals is typically treated as a black box that the agent queries to receive rewards [6]. Reward Machines (RMs) [5, 6] represent reward functions using finite state machines, enabling the agent to receive an explicit representation of the reward function. Each state in the machine corresponds to a possibly different reward function, with transitions between states triggered by events in the environment. Furthermore, Reward Machines can encode histories of state-action sequences, allowing the specification of long-horizon objectives and multi-stage tasks. However, Reward Machines are typically limited to expressing non-Markovian properties that can be described by regular languages [6], thus making them unsuitable for tasks requiring more expressive capabilities, such as counting [2] or parametrized conditions.
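
As an illustration of the mechanism described above, here is a minimal Python sketch of a reward machine as a finite-state transducer over event labels; the states, events, and two-stage task are hypothetical and not taken from the paper.

# A minimal reward-machine sketch (illustrative only, not the paper's code).
# States are machine nodes; transitions fire on labelled environment events
# and emit rewards, so the reward received can depend on the event history.
class RewardMachine:
    def __init__(self, initial, transitions):
        # transitions: dict mapping (state, event) -> (next_state, reward)
        self.state = initial
        self.transitions = transitions

    def step(self, event):
        # events with no outgoing transition leave the state unchanged
        self.state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0))
        return reward

# Two-stage task "visit A, then visit B": reward only on completing stage 2.
rm = RewardMachine("u0", {("u0", "at_A"): ("u1", 0.0),
                          ("u1", "at_B"): ("u2", 1.0)})
print(rm.step("at_B"))  # 0.0 -- B before A earns nothing
print(rm.step("at_A"))  # 0.0 -- stage 1 completed
print(rm.step("at_B"))  # 1.0 -- task completed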

Link to paper
BibTeX citation
@article{expressive-reward-synthesis-2026-mr35r1eaaaaj-re3vwb3y0ac,
  title = {Expressive Reward Synthesis},
  author = {D Donnelly, F Belardinelli},
  journal = {PRIMA 2025: Principles and Practice of Multi-Agent Systems: 26th ..., 2026},
  year = {2026},
  url = {https://books.google.com/books...hl=en&lr=&id=NMmfEQAAQBAJ&oi=fnd&pg=PA427&dq=info:vkBqj52YrZQJ:scholar.google.com&ots=V2jzJVqMXv&sig=Nzc3-1GYFjnt0gV3_Y07EUbKtP8},
}

2025

arXiv preprint arXiv:2511.02605, 2025

Adaptive GR(1) Specification Repair for Liveness-Preserving Shielding in Reinforcement Learning

TA Georgescu, AW Goodall, D Alrajeh, F Belardinelli, S Uchitel

Shielding is widely used to enforce safety in reinforcement learning (RL), ensuring that an agent's actions remain compliant with formal specifications. Classical shielding approaches, however, are often static, in the sense that they assume fixed logical specifications and hand-crafted abstractions. While these static shields provide safety under nominal assumptions, they fail to adapt when environment assumptions are violated. In this paper, we develop the first adaptive shielding framework - to the best of our knowledge - based on Generalized Reactivity of rank 1 (GR(1)) specifications, a tractable and expressive fragment of Linear Temporal Logic (LTL) that captures both safety and liveness properties. Our method detects environment assumption violations at runtime and employs Inductive Logic Programming (ILP) to automatically repair GR(1) specifications online, in a systematic and interpretable way. This ensures that the shield evolves gracefully, ensuring liveness remains achievable and weakening goals only when necessary. We consider two case studies, Minepump and Atari Seaquest, showing that (i) static symbolic controllers are often severely suboptimal when optimizing for auxiliary rewards, and (ii) RL agents equipped with our adaptive shield maintain near-optimal reward and perfect logical compliance compared with static shields.

Link to paper
BibTeX citation
@article{adaptive-gr-1-specification-repair-for-liveness-preserving-s,
  title = {Adaptive GR(1) Specification Repair for Liveness-Preserving Shielding in Reinforcement Learning},
  author = {TA Georgescu, AW Goodall, D Alrajeh, F Belardinelli, S Uchitel},
  journal = {arXiv preprint arXiv:2511.02605, 2025},
  year = {2025},
  url = {https://arxiv.org/abs/2511.02605},
}
Autonomous Agents and Multi-Agent Systems 39 (1), 4, 2025

Aggregating bipolar opinions through bipolar assumption-based argumentation

C Dickie, S Lauren, F Belardinelli, A Rago, F Toni

We introduce a novel method to aggregate bipolar argumentation frameworks expressing opinions of different parties in debates. We use Bipolar Assumption-based Argumentation (ABA) as an all-encompassing formalism for bipolar argumentation under different semantics. By leveraging on recent results on judgement aggregation in social choice theory, we prove several preservation results for relevant properties of bipolar ABA using quota and oligarchic rules. Specifically, we prove (positive and negative) results about the preservation of conflict-free, closed, admissible, preferred, complete, set-stable, well-founded and ideal extensions in bipolar ABA, as well as the preservation of acceptability, acyclicity and coherence for individual assumptions. Finally, we illustrate our methodology and results in the context of a case study on opinion aggregation for the treatment of long COVID patients.
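
To make the quota-rule idea concrete, the following toy Python sketch aggregates agents' frameworks encoded as sets of attack/support edges; the encoding, edge names, and threshold are hypothetical simplifications of bipolar ABA, not the paper's definitions.

from collections import Counter

# Toy quota-rule aggregation (a simplification, not the paper's bipolar
# ABA machinery): each agent's opinion is a set of attack/support edges,
# and an edge enters the collective framework iff at least `quota` agents
# assert it. An oligarchic rule would instead intersect the oligarchs' sets.
def quota_aggregate(frameworks, quota):
    counts = Counter(edge for fw in frameworks for edge in set(fw))
    return {edge for edge, n in counts.items() if n >= quota}

agent_views = [
    {("a", "attacks", "b"), ("c", "supports", "a")},
    {("a", "attacks", "b")},
    {("c", "supports", "a"), ("b", "attacks", "c")},
]
# Majority quota: an edge survives if at least 2 of the 3 agents assert it.
print(sorted(quota_aggregate(agent_views, quota=2)))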

Link to paper
BibTeX citation
@article{aggregating-bipolar-opinions-through-bipolar-assumption-base,
  title = {Aggregating bipolar opinions through bipolar assumption-based argumentation},
  author = {C Dickie, S Lauren, F Belardinelli, A Rago, F Toni},
  journal = {Autonomous Agents and Multi-Agent Systems 39 (1), 4, 2025},
  year = {2025},
  url = {https://link.springer.com/article/10.1007/s10458-024-09684-3},
}
arXiv preprint arXiv:2510.17306, 2025

ATL*AS: An Automata-Theoretic Approach and Tool for the Verification of Strategic Abilities in Multi-Agent Systems

S Garcia de Blas Garcia-Alcalde, F Belardinelli

We present two novel symbolic algorithms for model checking the Alternating-time Temporal Logic ATL*, over both the infinite-trace and the finite-trace semantics. In particular, for infinite traces we design a novel symbolic reduction to parity games. We implement both methods in the ATL*AS model checker and evaluate it using synthetic benchmarks as well as a cybersecurity scenario. Our results demonstrate that the symbolic approach significantly outperforms the explicit-state representation, and we find that our parity-game-based algorithm offers a more scalable and efficient solution for infinite-trace verification, outperforming previously available tools. Our results also confirm that finite-trace model checking yields substantial performance benefits over infinite-trace verification. As such, we provide a comprehensive toolset for verifying multi-agent systems against specifications in ATL*.

Link to paper
BibTeX citation
@article{atl-as-automata-theoretic-approach-preprint,
  title = {ATL*AS: An Automata-Theoretic Approach and Tool for the Verification of Strategic Abilities in Multi-Agent Systems},
  author = {S Garcia de Blas Garcia-Alcalde, F Belardinelli},
  journal = {arXiv preprint arXiv:2510.17306, 2025},
  year = {2025},
  url = {https://arxiv.org/abs/2510.17306},
}
arXiv preprint arXiv:2511.10843, 2025

Behaviour Policy Optimization: Provably Lower Variance Return Estimates for Off-Policy Reinforcement Learning

AW Goodall, EHD Court, F Belardinelli

Many reinforcement learning algorithms, particularly those that rely on return estimates for policy improvement, can suffer from poor sample efficiency and training instability due to high-variance return estimates. In this paper we leverage new results from off-policy evaluation: it has recently been shown that well-designed behaviour policies can be used to collect off-policy data for provably lower variance return estimates. This result is surprising, as it means collecting data on-policy is not variance optimal. We extend this key insight to the online reinforcement learning setting, where both policy evaluation and improvement are interleaved to learn optimal policies. Off-policy RL has been well studied (e.g., IMPALA), with correct and truncated importance-weighted samples for de-biasing and managing variance appropriately. Generally these approaches are concerned with reconciling data collected from multiple workers in parallel; while the policy is updated asynchronously, the mismatch between the workers and the policy is corrected in a mathematically sound way. Here we consider only one worker - the behaviour policy, which is used to collect data for policy improvement, with provably lower variance return estimates. In our experiments we extend two policy-gradient methods with this regime, demonstrating better sample efficiency and performance over a diverse set of environments.
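
The variance argument rests on standard importance-sampling corrections; the sketch below shows the textbook ordinary importance-sampling return estimator that such behaviour policies plug into, with hypothetical toy policies and trajectories rather than the paper's algorithm.

import random

# Ordinary importance-sampling return estimate: trajectories collected
# under a behaviour policy mu are reweighted by the likelihood ratio to
# give an unbiased return estimate for the target policy pi. A well-chosen
# mu can make this estimator lower-variance than on-policy sampling.
def is_return_estimate(trajectories, pi, mu, gamma=0.99):
    estimates = []
    for traj in trajectories:          # traj: list of (state, action, reward)
        weight, ret, discount = 1.0, 0.0, 1.0
        for s, a, r in traj:
            weight *= pi[s][a] / mu[s][a]   # importance weight
            ret += discount * r
            discount *= gamma
        estimates.append(weight * ret)
    return sum(estimates) / len(estimates)

pi = {"s": {"l": 0.9, "r": 0.1}}           # target policy
mu = {"s": {"l": 0.5, "r": 0.5}}           # broader behaviour policy
trajs = [[("s", random.choice("lr"), 1.0)] for _ in range(100)]
print(is_return_estimate(trajs, pi, mu))   # approx. 1.0 in expectation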

Link to paper
BibTeX citation
@article{behaviour-policy-optimization-provably-lower-variance-return,
  title = {Behaviour Policy Optimization: Provably Lower Variance Return Estimates for Off-Policy Reinforcement Learning},
  author = {AW Goodall, EHD Court, F Belardinelli},
  journal = {arXiv preprint arXiv:2511.10843, 2025},
  year = {2025},
  url = {https://arxiv.org/abs/2511.10843},
}
Proceedings of the 40th ACM/SIGAPP Symposium on Applied Computing, 1090-1097, 2025

Explainable Reinforcement Learning for Formula One Race Strategy

D Thomas, J Jiang, A Kori, A Russo, S Winkler, S Sale, J McMillan, ...

In Formula One, teams compete to develop their cars and achieve the highest possible finishing position in each race. During a race, however, teams are unable to alter the car, so they must improve their cars' finishing positions via race strategy, i.e. optimising their selection of which tyre compounds to put on the car and when to do so. In this work, we introduce a reinforcement learning model, RSRL (Race Strategy Reinforcement Learning), to control race strategies in simulations, offering a faster alternative to the industry standard of hard-coded and Monte Carlo-based race strategies. Controlling cars with a pace equating to an expected finishing position of P5.5 (where P1 represents first place and P20 is last place), RSRL achieves an average finishing position of P5.33 on our test race, the 2023 Bahrain Grand Prix, outperforming the best baseline of P5.63. We then demonstrate, in a generalisability study, how ...

Link to paper
BibTeX citation
@article{explainable-reinforcement-learning-for-formula-one-race-stra,
  title = {Explainable Reinforcement Learning for Formula One Race Strategy},
  author = {D Thomas, J Jiang, A Kori, A Russo, S Winkler, S Sale, J McMillan, ...},
  journal = {Proceedings of the 40th ACM/SIGAPP Symposium on Applied Computing, 1090-1097, 2025},
  year = {2025},
  url = {https://dl.acm.org/doi/abs/10.1145/3672608.3707766},
}
International Conference on Principles and Practice of Multi-Agent Systems ..., 2025

Expressive reward synthesis with the runtime monitoring language

D Donnelly, F Belardinelli

Reinforcement Learning (RL) has achieved remarkable success by enabling agents to learn through interactions with their environment, using reward signals to shape their behaviour. Yet, the reward function that produces these signals is typically treated as a black box that the agent queries to receive rewards.

Link to paper
BibTeX citation
@article{expressive-reward-synthesis-with-the-runtime-monitoring-lang,
  title = {Expressive reward synthesis with the runtime monitoring language},
  author = {D Donnelly, F Belardinelli},
  journal = {International Conference on Principles and Practice of Multi-Agent Systems ..., 2025},
  year = {2025},
  url = {https://link.springer.com/chapter/10.1007/978-3-032-13562-9_33},
}
arXiv preprint arXiv:2511.12808, 2025

Expressive Temporal Specifications for Reward Monitoring

O Adalat, F Belardinelli

Specifying informative and dense reward functions remains a pivotal challenge in Reinforcement Learning, as it directly affects the efficiency of agent training. In this work, we harness the expressive power of quantitative Linear Temporal Logic on finite traces (LTLf) to synthesize reward monitors that generate a dense stream of rewards for runtime-observable state trajectories. By providing nuanced feedback during training, these monitors guide agents toward optimal behaviour and help mitigate the well-known issue of sparse rewards under long-horizon decision making, which arises under the Boolean semantics dominating the current literature. Our framework is algorithm-agnostic, only relies on a state labelling function, and naturally accommodates specifying non-Markovian properties. Empirical results show that our quantitative monitors consistently subsume and, depending on the environment, outperform Boolean monitors in maximizing a quantitative measure of task completion and in reducing convergence time.
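
The gap between Boolean and quantitative feedback can be seen in a toy monitor for an "eventually reach the goal" property; the sketch below is a hypothetical illustration of the idea, not the paper's quantitative LTLf machinery.

# Boolean vs. quantitative monitoring of "eventually goal" on a finite
# trace (a toy illustration, not the paper's quantitative LTLf semantics).
def boolean_eventually(trace, pred):
    # 1.0 only once some observed state satisfies the predicate: sparse
    return 1.0 if any(pred(s) for s in trace) else 0.0

def quantitative_eventually(trace, score):
    # graded degree of satisfaction seen so far: dense feedback that
    # rewards progress long before the goal is actually reached
    return max(score(s) for s in trace)

trace = [10.0, 7.0, 4.0]  # distances to the goal along a partial run
print(boolean_eventually(trace, lambda d: d == 0.0))              # 0.0
print(quantitative_eventually(trace, lambda d: 1.0 / (1.0 + d)))  # 0.2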

Link to paper
BibTeX citation
@article{expressive-temporal-specifications-for-reward-monitoring-202,
  title = {Expressive Temporal Specifications for Reward Monitoring},
  author = {O Adalat, F Belardinelli},
  journal = {arXiv preprint arXiv:2511.12808, 2025},
  year = {2025},
  url = {https://arxiv.org/abs/2511.12808},
}
Proceedings of the 24th International Conference on Autonomous Agents and ..., 2025

LUNAR: A Runtime Verification Tool for Anomaly Detection in Gas Networks

J Gasson, F Belardinelli

We introduce LUNAR, a framework to detect and classify network anomalies. The tool is designed to (1) synthesize safety constraints expressed in Signal Temporal Logic (STL) based on network data; (2) detect anomalies in new samples w.r.t. the STL constraints; (3) learn to classify anomaly types based on user labels of prior anomalies.
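
For intuition, the standard STL robustness semantics for a "globally" constraint reduces to a one-liner; the signal and threshold below are hypothetical, not LUNAR's synthesized constraints.

# STL-style robustness of "globally, x <= c" over a sampled signal:
# min over time of c - x(t). Negative robustness flags an anomaly, and
# its magnitude grades severity. (Illustrative, not LUNAR's code.)
def robustness_globally_leq(signal, c):
    return min(c - x for x in signal)

pressure = [3.1, 3.4, 3.2, 5.9, 3.0]   # toy gas-network readings
rho = robustness_globally_leq(pressure, c=5.0)
print(rho)                              # -0.9: constraint violated
print("anomaly" if rho < 0 else "nominal")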

Link to paper
BibTeX citation
@article{lunar-a-runtime-verification-tool-for-anomaly-detection-in-g,
  title = {LUNAR: A Runtime Verification Tool for Anomaly Detection in Gas Networks},
  author = {J Gasson, F Belardinelli},
  journal = {Proceedings of the 24th International Conference on Autonomous Agents and ..., 2025},
  year = {2025},
  url = {https://www.ifaamas.org/Proceedings/aamas2025/pdfs/p3009.pdf},
}
ACM Transactions on Computational Logic 26 (1), 1-45, 2025

Model-checking Strategic Abilities in Information-sharing Systems

F Belardinelli, I Boureanu, C Dima, V Malvone

We introduce a subclass of concurrent game structures (CGS) with imperfect information in which agents are endowed with private data-sharing capabilities. Importantly, our CGSs are such that it is still decidable to model-check these CGSs against a relevant fragment of ATL. These systems can be thought of as a generalization of architectures allowing information forks, that is, cases where strategic abilities lead to certain agents outside a coalition privately sharing information with selected agents inside that coalition. Moreover, in our case, in the initial states of the system, we allow information forks from agents outside a given set to agents inside this group. For this reason, together with the fact that the communication in our models underpins a specialized form of broadcast, we call our formalism -cast systems. To underline, the fragment of ATL for which we show the model-checking problem to be decidable over ...

Link to paper
BibTeX citation
@article{model-checking-strategic-abilities-in-information-sharing-sy,
  title = {Model-checking Strategic Abilities in Information-sharing Systems},
  author = {F Belardinelli, I Boureanu, C Dima, V Malvone},
  journal = {ACM Transactions on Computational Logic 26 (1), 1-45, 2025},
  year = {2025},
  url = {https://dl.acm.org/doi/abs/10.1145/3704919},
}
arXiv preprint arXiv:2503.10186, 2025

Multi-Agent Q-Learning Dynamics in Random Networks: Convergence due to Exploration and Sparsity

A Hussain, D Leonte, F Belardinelli, R Huser, D Paccagnan

Beyond specific settings, many multi-agent learning algorithms fail to converge to an equilibrium solution, and instead display complex, non-stationary behaviours such as recurrent or chaotic orbits. In fact, recent literature suggests that such complex behaviours are likely to occur when the number of agents increases. In this paper, we study Q-learning dynamics in network polymatrix games where the network structure is drawn from classical random graph models. In particular, we focus on the Erdős-Rényi model, a well-studied model for social networks, and the Stochastic Block model, which generalizes the above by accounting for community structures within the network. In each setting, we establish sufficient conditions under which the agents' joint strategies converge to a unique equilibrium. We investigate how this condition depends on the exploration rates, payoff matrices and, crucially, the sparsity of the network. Finally, we validate our theoretical findings through numerical simulations and demonstrate that convergence can be reliably achieved in many-agent systems, provided network sparsity is controlled.
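
The Q-learning dynamics in question are the smooth (Boltzmann) variant; the sketch below runs them on a single two-player 2x2 game rather than a random network, purely to illustrate the exploration-rate knob, with hypothetical payoffs and parameters.

import math, random

# Smooth Q-learning in a 2x2 game (a toy instance of the dynamics the
# paper studies on random networks, not its experimental setup). Each
# agent samples from a softmax over its Q-values with temperature T
# (the exploration rate) and nudges Q towards the realized payoff.
def softmax(q, T):
    z = [math.exp(v / T) for v in q]
    return [v / sum(z) for v in z]

A = [[1.0, -1.0], [-1.0, 1.0]]   # row player's payoffs (matching pennies)
B = [[-1.0, 1.0], [1.0, -1.0]]   # column player's payoffs
q1, q2 = [0.0, 0.0], [0.0, 0.0]
alpha, T = 0.05, 1.0             # higher T favours convergence
for _ in range(5000):
    a1 = random.choices([0, 1], softmax(q1, T))[0]
    a2 = random.choices([0, 1], softmax(q2, T))[0]
    q1[a1] += alpha * (A[a1][a2] - q1[a1])
    q2[a2] += alpha * (B[a1][a2] - q2[a2])
print(softmax(q1, T), softmax(q2, T))   # near the uniform equilibrium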

Link to paper
BibTeX citation
@article{multi-agent-q-learning-dynamics-in-random-networks-convergen,
  title = {Multi-Agent Q-Learning Dynamics in Random Networks: Convergence due to Exploration and Sparsity},
  author = {A Hussain, D Leonte, F Belardinelli, R Huser, D Paccagnan},
  journal = {arXiv preprint arXiv:2503.10186, 2025},
  year = {2025},
  url = {https://arxiv.org/abs/2503.10186},
}
Proceedings of the AAAI Conference on Artificial Intelligence 39 (15), 16091 ..., 2025

Probabilistic Shielding for Safe Reinforcement Learning

F Belardinelli, AW Goodall

In real-life scenarios, a Reinforcement Learning (RL) agent aiming to maximize its reward must often also behave in a safe manner, including at training time. Thus, much attention in recent years has been given to Safe RL, where an agent aims to learn an optimal policy among all policies that satisfy a given safety constraint. However, strict safety guarantees are often provided through approaches based on linear programming, and thus have limited scaling. In this paper we present a new, scalable method, which enjoys strict formal guarantees for Safe RL, in the case where the safety dynamics of the Markov Decision Process (MDP) are known, and safety is defined as an undiscounted probabilistic avoidance property. Our approach is based on state-augmentation of the MDP, and on the design of a shield that restricts the actions available to the agent. We show that our approach provides a strict formal safety guarantee that the agent stays safe at training and test time. Furthermore, we demonstrate that our approach is viable in practice through experimental evaluation.
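
A minimal sketch of the shield idea, assuming the per-action safety values of the known safety MDP have already been computed; the state, action names, and numbers are hypothetical, and this illustrates the action-restriction step rather than the paper's state-augmentation construction.

# Shield sketch: with known safety dynamics, safety_value(s, a) gives the
# maximal probability of avoiding the unsafe region after taking a in s.
# The shield exposes only actions meeting the avoidance requirement.
def shield(actions, safety_value, state, delta):
    return [a for a in actions if safety_value(state, a) >= 1.0 - delta]

values = {("s0", "left"): 0.99, ("s0", "right"): 0.70}
print(shield(["left", "right"], lambda s, a: values[(s, a)], "s0", 0.05))
# ['left'] -- "right" cannot guarantee 95% probabilistic avoidance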

Link to paper
BibTeX citation
@article{probabilistic-shielding-for-safe-reinforcement-learning-aaai,
  title = {Probabilistic Shielding for Safe Reinforcement Learning},
  author = {F Belardinelli, AW Goodall},
  journal = {Proceedings of the AAAI Conference on Artificial Intelligence 39 (15), 16091 ..., 2025},
  year = {2025},
  url = {https://ojs.aaai.org/index.php/AAAI/article/view/33767},
}
arXiv preprint arXiv:2503.07671, 2025

Probabilistic Shielding for Safe Reinforcement Learning

EHD Court, F Belardinelli, AW Goodall

In real-life scenarios, a Reinforcement Learning (RL) agent aiming to maximise its reward must often also behave in a safe manner, including at training time. Thus, much attention in recent years has been given to Safe RL, where an agent aims to learn an optimal policy among all policies that satisfy a given safety constraint. However, strict safety guarantees are often provided through approaches based on linear programming, and thus have limited scaling. In this paper we present a new, scalable method, which enjoys strict formal guarantees for Safe RL, in the case where the safety dynamics of the Markov Decision Process (MDP) are known, and safety is defined as an undiscounted probabilistic avoidance property. Our approach is based on state-augmentation of the MDP, and on the design of a shield that restricts the actions available to the agent. We show that our approach provides a strict formal safety guarantee that the agent stays safe at training and test time. Furthermore, we demonstrate that our approach is viable in practice through experimental evaluation.

Link to paper
BibTeX citation
@article{probabilistic-shielding-for-safe-reinforcement-learning-2025,
  title = {Probabilistic Shielding for Safe Reinforcement Learning},
  author = {EHD Court, F Belardinelli, AW Goodall},
  journal = {arXiv preprint arXiv:2503.07671, 2025},
  year = {2025},
  url = {https://arxiv.org/abs/2503.07671},
}
arXiv preprint arXiv:2510.15720, 2025

ProSh: Probabilistic Shielding for Model-free Reinforcement Learning

EHD Court, G Ohlmann, F Belardinelli

Safety is a major concern in reinforcement learning (RL): we aim at developing RL systems that not only perform optimally, but are also safe to deploy by providing formal guarantees about their safety. To this end, we introduce Probabilistic Shielding via Risk Augmentation (ProSh), a model-free algorithm for safe reinforcement learning under cost constraints. ProSh augments the Constrained MDP state space with a risk budget and enforces safety by applying a shield to the agent's policy distribution using a learned cost critic. The shield ensures that all sampled actions remain safe in expectation. We also show that optimality is preserved when the environment is deterministic. Since ProSh is model-free, safety during training depends on the knowledge we have acquired about the environment. We provide a tight upper bound on the cost in expectation, depending only on the backup-critic accuracy, that is always satisfied during training. Under mild, practically achievable assumptions, ProSh guarantees safety even at training time, as shown in the experiments.

Link to paper
BibTeX citation
@article{prosh-probabilistic-shielding-arxiv,
  title = {ProSh: Probabilistic Shielding for Model-free Reinforcement Learning},
  author = {EHD Court, G Ohlmann, F Belardinelli},
  journal = {arXiv preprint arXiv:2510.15720, 2025},
  year = {2025},
  url = {https://arxiv.org/abs/2510.15720},
}
The 25th International Conference on Autonomous Agents and Multi-Agent Systems, 2025

ProSh: Probabilistic Shielding for Model-free Reinforcement Learning

G Ohlmann, F Belardinelli

Safety is a major concern in reinforcement learning (RL): we aim at developing RL systems that not only perform optimally, but are also safe to deploy, possibly providing formal guarantees about their safety. To this end, we introduce Probabilistic Shielding via Risk Augmentation (ProSh), a model-free algorithm for safe reinforcement learning under cost constraints. ProSh augments the Constrained MDP state space with a risk budget and enforces safety by applying a shield to the agent's policy distribution using a learned cost critic. The shield ensures that all sampled actions remain safe in expectation. We show that optimality is preserved in the deterministic case. Since ProSh is model-free, safety during training depends on the knowledge we have acquired about the environment. We provide a tight upper bound on the cost in expectation, depending only on the backup-critic accuracy, that is always satisfied during ...

Link to paper
BibTeX citation
@article{prosh-probabilistic-shielding-for-model-free-reinforcement-l,
  title = {ProSh: Probabilistic Shielding for Model-free Reinforcement Learning},
  author = {G Ohlmann, F Belardinelli},
  journal = {The 25th International Conference on Autonomous Agents and Multi-Agent Systems, 2025},
  year = {2025},
  url = {https://openreview.net/forum...id=TCFEijF5G7},
}
arXiv preprint arXiv:2511.16579, 2025

Synthesis of Safety Specifications for Probabilistic Systems

G Ohlmann, EHD Court, F Belardinelli

Ensuring that agents satisfy safety specifications can be crucial in safety-critical environments. While methods exist for controller synthesis with safe temporal specifications, most existing methods restrict safe temporal specifications to probabilistic-avoidance constraints. Formal methods typically offer more expressive ways to express safety in probabilistic systems, such as Probabilistic Computation Tree Logic (PCTL) formulas. Thus, in this paper, we develop a new approach that supports more general temporal properties expressed in PCTL. Our contribution is twofold. First, we develop a theoretical framework for the synthesis of safe-PCTL specifications. We show how to reduce global specification satisfaction to local constraints, and define CPCTL, a fragment of safe-PCTL. We demonstrate how the expressiveness of CPCTL makes it a relevant fragment for the synthesis problem. Second, we leverage these results and propose a new Value Iteration-based algorithm to solve the synthesis problem for these more general temporal properties, and we prove the soundness and completeness of our method.
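
The classical special case, maximal probability of forever avoiding an unsafe set, is computable by value iteration from above; the toy MDP below is hypothetical and illustrates only that baseline, which the paper generalizes beyond probabilistic avoidance.

# Value iteration for the maximal probability of always avoiding `unsafe`
# (the classical probabilistic-avoidance case; illustrative only).
# P[s][a] maps each action to a distribution over successor states.
def avoid_value_iteration(P, unsafe, iters=100):
    V = {s: 0.0 if s in unsafe else 1.0 for s in P}
    for _ in range(iters):  # iterate the Bellman operator from above
        V = {s: 0.0 if s in unsafe else
                max(sum(p * V[t] for t, p in dist.items())
                    for dist in P[s].values())
             for s in P}
    return V

P = {"s0": {"a": {"s1": 0.9, "bad": 0.1}, "b": {"s0": 0.5, "bad": 0.5}},
     "s1": {"a": {"s1": 1.0}},
     "bad": {"a": {"bad": 1.0}}}
print(avoid_value_iteration(P, unsafe={"bad"}))  # s0: 0.9, s1: 1.0, bad: 0.0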

Link to paper
BibTeX citation
@article{synthesis-of-safety-specifications-for-probabilistic-systems,
  title = {Synthesis of Safety Specifications for Probabilistic Systems},
  author = {G Ohlmann, EHD Court, F Belardinelli},
  journal = {arXiv preprint arXiv:2511.16579, 2025},
  year = {2025},
  url = {https://arxiv.org/abs/2511.16579},
}

2024

Formal Aspects of Computing 37 (1), 1-24, 2024

An SMT-based Approach to the Verification of Knowledge-Based Programs

F Belardinelli, I Boureanu, V Malvone, F Rajaona

We give a general-purpose programming language in which programs can reason about their own knowledge. To specify what these intelligent programs know, we define a "program epistemic" logic, akin to a dynamic epistemic logic for programs. Our logic properties are complex, including programs introspecting into future state of affairs, i.e., reasoning now about facts that hold only after they and other threads will execute. To model aspects anchored in privacy, our logic is interpreted over partial observability of variables, thus capturing that each thread can "see" only a part of the global space of variables. We verify program-epistemic properties on such AI-centred programs. To this end, we give a sound translation of the validity of our program-epistemic logic into first-order validity, using a new weakest-precondition semantics and a book-keeping of variable assignment. We implement our translation and fully ...

Link to paper
BibTeX citation
@article{an-smt-based-approach-to-the-verification-of-knowledge-based,
  title = {An SMT-based Approach to the Verification of Knowledge-Based Programs},
  author = {F Belardinelli, I Boureanu, V Malvone, F Rajaona},
  journal = {Formal Aspects of Computing 37 (1), 1-24, 2024},
  year = {2024},
  url = {https://dl.acm.org/doi/abs/10.1145/3700150},
}
arXiv preprint arXiv:2402.00816, 2024

Leveraging approximate model-based shielding for probabilistic safety guarantees in continuous environments

AW Goodall, F Belardinelli

Shielding is a popular technique for achieving safe reinforcement learning (RL). However, classical shielding approaches come with quite restrictive assumptions making them difficult to deploy in complex environments, particularly those with continuous state or action spaces. In this paper we extend the more versatile approximate model-based shielding (AMBS) framework to the continuous setting. In particular we use Safety Gym as our test-bed, allowing for a more direct comparison of AMBS with popular constrained RL algorithms. We also provide strong probabilistic safety guarantees for the continuous setting. In addition, we propose two novel penalty techniques that directly modify the policy gradient, which empirically provide more stable convergence in our experiments.

Link to paper
BibTeX citation
@article{leveraging-approximate-model-based-shielding-for-probabilist,
  title = {Leveraging approximate model-based shielding for probabilistic safety guarantees in continuous environments},
  author = {AW Goodall, F Belardinelli},
  journal = {arXiv preprint arXiv:2402.00816, 2024},
  year = {2024},
  url = {https://arxiv.org/abs/2402.00816},
}
Advances in Neural Information Processing Systems 37, 11412-11431, 2024

Measuring goal-directedness

M MacDermott, J Fox, F Belardinelli, T Everitt

We define maximum entropy goal-directedness (MEG), a formal measure of goal-directedness in causal models and Markov decision processes, and give algorithms for computing it. Measuring goal-directedness is important, as it is a critical element of many concerns about harm from AI. It is also of philosophical interest, as goal-directedness is a key aspect of agency. MEG is based on an adaptation of the maximum causal entropy framework used in inverse reinforcement learning. It can measure goal-directedness with respect to a known utility function, a hypothesis class of utility functions, or a set of random variables. We prove that MEG satisfies several desiderata and demonstrate our algorithms with small-scale experiments.

Link to paper
BibTeX citation
@article{measuring-goal-directedness-2024-mr35r1eaaaaj-5awf1xo2g04c,
  title = {Measuring goal-directedness},
  author = {M MacDermott, J Fox, F Belardinelli, T Everitt},
  journal = {Advances in Neural Information Processing Systems 37, 11412-11431, 2024},
  year = {2024},
  url = {https://proceedings.neurips.cc/paper_files/paper/2024/hash/1551c01d7a3d0bf21e2518331e9f7074-Abstract-Conference.html},
}
arXiv preprint arXiv:2403.15848, 2024

On the stability of learning in network games with many players

A Hussain, D Leonte, F Belardinelli, G Piliouras

Multi-agent learning algorithms have been shown to display complex, unstable behaviours in a wide array of games. In fact, previous works indicate that convergent behaviours are less likely to occur as the total number of agents increases. This seemingly prohibits convergence to stable strategies, such as Nash Equilibria, in games with many players. To make progress towards addressing this challenge we study the Q-Learning Dynamics, a classical model for exploration and exploitation in multi-agent learning. In particular, we study the behaviour of Q-Learning on games where interactions between agents are constrained by a network. We determine a number of sufficient conditions, depending on the game and network structure, which guarantee that agent strategies converge to a unique stable strategy, called the Quantal Response Equilibrium (QRE). Crucially, these sufficient conditions are independent of the total number of agents, allowing for provable convergence in arbitrarily large games. Next, we compare the learned QRE to the underlying NE of the game, by showing that any QRE is an ε-approximate Nash Equilibrium. We first provide tight bounds on ε and show how these bounds lead naturally to a centralised scheme for choosing exploration rates, which enables independent learners to learn stable approximate Nash Equilibrium strategies. We validate the method through experiments and demonstrate its effectiveness even in the presence of numerous agents and actions. Through these results, we show that independent learning dynamics may converge to approximate Nash Equilibria, even in the presence of many agents.

Link to paper
BibTeX citation
@article{on-the-stability-of-learning-in-network-games-with-many-play,
  title = {On the stability of learning in network games with many players},
  author = {A Hussain, D Leonte, F Belardinelli, G Piliouras},
  journal = {arXiv preprint arXiv:2403.15848, 2024},
  year = {2024},
  url = {https://arxiv.org/abs/2403.15848},
}

Shielding Regular Safety Properties in Reinforcement Learning

A Goodall, F Belardinelli

To deploy reinforcement learning (RL) systems in real-world scenarios we need to consider requirements such as safety and constraint compliance, rather than blindly maximizing for reward. In this paper we study RL with regular safety properties. We present a constrained problem based on the satisfaction of regular safety properties with high probability, and we compare our setup to some common constrained Markov decision process (CMDP) settings. We also present a meta-algorithm with provable safety guarantees, which can be used to shield the agent from violating the regular safety property during training and deployment. We demonstrate the effectiveness and scalability of our framework by evaluating our meta-algorithm in both the tabular and deep RL settings.

Link to paper
BibTeX citation
@article{shielding-regular-safety-properties-in-reinforcement-learnin,
  title = {Shielding Regular Safety Properties in Reinforcement Learning},
  author = {A Goodall, F Belardinelli},
  year = {2024},
  url = {https://openreview.net/forum...id=PnSTlFUfcd},
}
Proceedings of the AAAI Conference on Artificial Intelligence 38 (16), 17435, 2024

Stability of multi-agent learning in competitive networks: Delaying the onset of chaos

A Hussain, F Belardinelli

The behaviour of multi-agent learning in competitive network games is often studied under the assumption of zero-sum payoffs, for which convergence guarantees may be obtained. However, beyond zero-sum, multi-agent learning is known to display complex behaviours and convergence can no longer be guaranteed. Nonetheless, in order to develop a complete picture of the behaviour of multi-agent learning in competitive settings, the zero-sum assumption must be lifted. Motivated by this, we study the Q-Learning dynamics, a popular model of exploration and exploitation in multi-agent learning, in general competitive network games. We determine how the degree of competition, exploration rate and network structure impact the convergence of Q-Learning. To study general competitive games, we parameterise network games in terms of their competitiveness and study the average behaviour of Q-Learning across all games drawn from a choice of this parameter. This statistical approach establishes choices of parameters for which Q-Learning dynamics converge to a stable fixed point. Different to previous works, we find that the stability of Q-Learning is explicitly dependent only on the network connectivity rather than the total number of agents. Our experiments validate these results and show that, under certain network structures, the total number of agents can be increased without affecting the likelihood of unstable or chaotic behaviours.

Link to paper
BibTeX citation
@article{stability-of-multi-agent-learning-in-competitive-networks-de,
  title = {Stability of multi-agent learning in competitive networks: Delaying the onset of chaos},
  author = {A Hussain, F Belardinelli},
  journal = {Proceedings of the AAAI Conference on Artificial Intelligence 38 (16), 17435, 2024},
  year = {2024},
  url = {https://ojs.aaai.org/index.php/AAAI/article/view/29692},
}
arXiv preprint arXiv:2402.07221, 2024

The reasons that agents act: Intention and instrumental goals

FR Ward, M MacDermott, F Belardinelli, F Toni, T Everitt

Intention is an important and challenging concept in AI. It is important because it underlies many other concepts we care about, such as agency, manipulation, legal responsibility, and blame. However, ascribing intent to AI systems is contentious, and there is no universally accepted theory of intention applicable to AI agents. We operationalise the intention with which an agent acts, relating to the reasons it chooses its decision. We introduce a formal definition of intention in structural causal influence models, grounded in the philosophy literature on intent and applicable to real-world machine learning systems. Through a number of examples and results, we show that our definition captures the intuitive notion of intent and satisfies desiderata set out by past work. In addition, we show how our definition relates to past concepts, including actual causality, and the notion of instrumental goals, which is a core idea in the literature on safe AI agents. Finally, we demonstrate how our definition can be used to infer the intentions of reinforcement learning agents and language models from their behaviour.

Link to paper
BibTeX citation
@article{the-reasons-that-agents-act-intention-and-instrumental-goals,
  title = {The reasons that agents act: Intention and instrumental goals},
  author = {FR Ward, M MacDermott, F Belardinelli, F Toni, T Everitt},
  journal = {arXiv preprint arXiv:2402.07221, 2024},
  year = {2024},
  url = {https://arxiv.org/abs/2402.07221},
}
Proceedings of the 23rd International Conference on Autonomous Agents and …, 2024

Verification of stochastic multi-agent systems with forgetful strategies

F Belardinelli, W Jamroga, M Mittelmann, A Murano

Formal methods for strategic reasoning play a fundamental role in Multi-Agent System (MAS) design and verification [3, 16, 56, 67, 71, 77, 79]. This success story originated from the breakthrough idea of using temporal logics for the specification of behaviors of reactive systems [31, 36, 70]. Temporal logics are traditionally

Link to paper
BibTeX citation
@article{verification-of-stochastic-multi-agent-systems-with-forgetfu,
  title = {Verification of stochastic multi-agent systems with forgetful strategies},
  author = {F Belardinelli, W Jamroga, M Mittelmann, A Murano},
  journal = {Proceedings of the 23rd International Conference on Autonomous Agents and …, 2024},
  year = {2024},
  url = {https://aamas.csc.liv.ac.uk/Proceedings/aamas2024/pdfs/p160.pdf},
}

2023

arXiv preprint arXiv:2311.09787, 2023

3vLTL: A Tool to Generate Automata for Three-valued LTL

F Belardinelli, A Ferrando, V Malvone

Multi-valued logics have a long tradition in the literature on system verification, including run-time verification. However, comparatively fewer model-checking tools have been developed for multi-valued specification languages. We present 3vLTL, a tool to generate Büchi automata from formulas in Linear-time Temporal Logic (LTL) interpreted on a three-valued semantics. Given an LTL formula, a set of atomic propositions as the alphabet for the automaton, and a truth value, our procedure generates a Büchi automaton that accepts all the words that assign the chosen truth value to the LTL formula. Given the particular type of the output of the tool, it can also be seamlessly processed by third-party libraries in a natural way. That is, the Büchi automaton can then be used in the context of formal verification to check whether an LTL formula is true, false, or undefined on a given model.

Link to paper
BibTeX citation
@article{3vltl-a-tool-to-generate-automata-for-three-valued-ltl-2023-,
  title = {3vLTL: A Tool to Generate Automata for Three-valued LTL},
  author = {F Belardinelli, A Ferrando, V Malvone},
  journal = {arXiv preprint arXiv:2311.09787, 2023},
  year = {2023},
  url = {https://arxiv.org/abs/2311.09787},
}

Aggregating Bipolar Opinions by Arguing

CJ Dickie, F Belardinelli, F Toni

In multi-agent systems, bipolar argumentation (BA) frameworks can be used to represent the individual opinions of agents during debate. Recently, bipolar assumption-based argumentation (bipolar ABA) frameworks have been combined with methods from social choice theory to aggregate such opinions, producing a single framework that reflects a consensus among the debate participants. However, there has not yet been a comprehensive study evaluating this method of aggregation. In this project, we establish preservation results by defining the conditions under which agreement between agents, such as shared semantic properties, can be preserved during bipolar ABA aggregation. It was found that the preservation of properties could be guaranteed, albeit with significant restrictions to the input frameworks and/or aggregation procedure. Furthermore, we build an application of bipolar ABA aggregation by designing and implementing a debate and decision-making platform named ArgSolve. As part of this process, we introduce a novel program for computing extensions of bipolar ABA semantics, accompanied by a corresponding soundness and completeness proof. Through evaluation, it was found that the platform presented an improvement over conversation-centric debate with respect to expressiveness, comprehension and perceived fairness, but also showed limitations in accessibility and usability.

BibTeX citation
@article{aggregating-bipolar-opinions-by-arguing-2023-mr35r1eaaaaj-uh,
  title = {Aggregating Bipolar Opinions by Arguing},
  author = {CJ Dickie, F Belardinelli, F Toni},
  year = {2023},
  url = {https://crjake.github.io/thesis.pdf},
}
Artificial Intelligence 316, 103847, 2023

An abstraction-refinement framework for verifying strategic properties in multi-agent systems with imperfect information

F Belardinelli, A Ferrando, V Malvone

We investigate the verification of Multi-Agent Systems against strategic properties expressed in Alternating-time Temporal Logic under the assumptions of imperfect information and perfect recall. To this end, we develop a three-valued semantics for concurrent game structures upon which we define an abstraction method. We prove that concurrent game structures with imperfect information admit perfect information abstractions that preserve three-valued satisfaction. Furthermore, to deal with cases in which the value of a specification is undefined, we develop a novel automata-theoretic technique for the linear-time logic (LTL), then apply it to finding "failure" states. The latter can then be fed into a refinement procedure, thus providing a sound, albeit incomplete, verification method. We illustrate the overall procedure in a variant of the Train Gate Controller scenario and a simple voting protocol under imperfect information.

Link to paper
BibTeX citation
@article{an-abstraction-refinement-framework-for-verifying-strategic-,
  title = {An abstraction-refinement framework for verifying strategic properties in multi-agent systems with imperfect information},
  author = {F Belardinelli, A Ferrando, V Malvone},
  journal = {Artificial Intelligence 316, 103847, 2023},
  year = {2023},
  url = {https://www.sciencedirect.com/science/article/pii/S0004370222001874},
}
arXiv preprint arXiv:2308.00707, 2023

Approximate model-based shielding for safe reinforcement learning

AW Goodall, F Belardinelli

Reinforcement learning (RL) has shown great potential for solving complex tasks in a variety of domains. However, applying RL to safety-critical systems in the real world is not easy, as many algorithms are sample-inefficient and maximising the standard RL objective comes with no guarantees on worst-case performance. In this paper we propose approximate model-based shielding (AMBS), a principled look-ahead shielding algorithm for verifying the performance of learned RL policies w.r.t. a set of given safety constraints. Our algorithm differs from other shielding approaches in that it does not require prior knowledge of the safety-relevant dynamics of the system. We provide a strong theoretical justification for AMBS and demonstrate superior performance to other safety-aware approaches on a set of Atari games with state-dependent safety labels.
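
The look-ahead step can be pictured as Monte-Carlo rollouts through the learned model; the sketch below is a generic version of that idea with a hypothetical toy model and thresholds, not the AMBS algorithm itself.

import random

# Look-ahead shielding sketch: sample rollouts of a proposed action
# through an approximate dynamics model and estimate the probability of
# reaching an unsafe state within a horizon. (Generic illustration of
# model-based look-ahead, not the AMBS algorithm.)
def estimate_violation(model, is_unsafe, state, action, policy,
                       horizon=10, samples=200):
    violations = 0
    for _ in range(samples):
        s, a = state, action
        for _ in range(horizon):
            s = model(s, a)          # sample a successor from the model
            if is_unsafe(s):         # rollout hit the unsafe set
                violations += 1
                break
            a = policy(s)            # continue under the agent's policy
    return violations / samples

# Toy 1-D world: "go" drifts right with noise; states beyond 5 are unsafe.
model = lambda s, a: s + (1 if a == "go" else 0) + random.choice([0, 1])
risk = estimate_violation(model, lambda s: s > 5, 0, "go",
                          policy=lambda s: "stay", horizon=5)
print(f"estimated violation probability: {risk:.2f}")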

Link to paper
BibTeX citation
@article{approximate-model-based-shielding-for-safe-reinforcement-lea,
  title = {Approximate model-based shielding for safe reinforcement learning},
  author = {AW Goodall, F Belardinelli},
  journal = {arXiv preprint arXiv:2308.00707, 2023},
  year = {2023},
  url = {https://arxiv.org/abs/2308.00707},
}
arXiv preprint arXiv:2304.11104, 2023

Approximate shielding of atari agents for safe exploration

AW Goodall, F Belardinelli

Balancing exploration and conservatism in the constrained setting is an important problem if we are to use reinforcement learning for meaningful tasks in the real world. In this paper, we propose a principled algorithm for safe exploration based on the concept of shielding. Previous approaches to shielding assume access to a safety-relevant abstraction of the environment or a high-fidelity simulator. Instead, our work is based on latent shielding - another approach that leverages world models to verify policy roll-outs in the latent space of a learned dynamics model. Our novel algorithm builds on this previous work, using safety critics and other additional features to improve the stability and farsightedness of the algorithm. We demonstrate the effectiveness of our approach by running experiments on a small set of Atari games with state-dependent safety labels. We present preliminary results that show our approximate shielding algorithm effectively reduces the rate of safety violations, and in some cases improves the speed of convergence and quality of the final agent.

Link to paper
BibTeX citation
@article{approximate-shielding-of-atari-agents-for-safe-exploration-2,
  title = {Approximate shielding of atari agents for safe exploration},
  author = {AW Goodall, F Belardinelli},
  journal = {arXiv preprint arXiv:2304.11104, 2023},
  year = {2023},
  url = {https://arxiv.org/abs/2304.11104},
}

Arguing with physical activity data from a real-world wearable clinical trial with patients with a primary brain tumour

A Gould, F Toni, F Belardinelli

The objective of this project is to leverage Abstract Argumentation for Case-Based Reasoning (AA-CBR) to support clinical decision-making. The BrainWear study aims to assess the feasibility of its innovative use of wearable accelerometers to capture real-time Physical Activity (PA) data over intermittently collected Patient Reported Outcome (PRO) measures, for the assessment of patient Health-Related Quality of Life (HRQoL). This project proposes novel uses of AA-CBR as a transparent machine learning model to assess the utility of PA data from the BrainWear study, focusing on a patient-centred and transparent approach. Objectives: Firstly, we aim to develop novel applications of AA-CBR to predict the status of patient disease with PRO and PA data. This involves developing new processes to characterise real-world medical data into interpretable representations, as well as evaluating the performance of argumentation models at this prediction task. Furthermore, we aim to assess the trade-off between constructing clinically interpretable models, achieving high performance, and the associated effort required for data characterisations. Additionally, we seek to determine if PA data can supplement or replace PRO measures and investigate the use of argumentation to support our conclusions. This will involve constructing innovative methods to identify conflicts between the two measures. We aim to create novel variants of AA-CBR that reduce the burden of feature characterisation and can handle time-based representations. By experimenting with these diverse AA-CBR variants, we seek to identify the most suitable approach for clinical decision …

Link to paper
BibTeX citation
@article{arguing-with-physical-activity-data-from-a-real-world-wearab,
  title = {Arguing with physical activity data from a real-world wearable clinical trial with patients with a primary brain tumour},
  author = {A Gould, F Toni, F Belardinelli},
  year = {2023},
  url = {https://adam-gould.co.uk/MEngReport.pdf},
}
arXiv preprint arXiv:2301.09619, 2023

Asymptotic convergence and performance of multi-agent Q-learning dynamics

AA Hussain, F Belardinelli, G Piliouras

Achieving convergence of multiple learning agents in general N-player games is imperative for the development of safe and reliable machine learning (ML) algorithms and their application to autonomous systems. Yet it is known that, outside the bounds of simple two-player games, convergence cannot be taken for granted. To make progress in resolving this problem, we study the dynamics of smooth Q-Learning, a popular reinforcement learning algorithm which quantifies the tendency for learning agents to explore their state space or exploit their payoffs. We show a sufficient condition on the rate of exploration such that the Q-Learning dynamics is guaranteed to converge to a unique equilibrium in any game. We connect this result to games for which Q-Learning is known to converge with arbitrary exploration rates, including weighted Potential games and weighted zero-sum polymatrix games. Finally, we examine the performance of the Q-Learning dynamic as measured by the Time Averaged Social Welfare, and compare this with the Social Welfare achieved by the equilibrium. We provide a sufficient condition whereby the Q-Learning dynamic will outperform the equilibrium even if the dynamics do not converge.

Link to paper
BibTeX citation
@article{asymptotic-convergence-and-performance-of-multi-agent-q-lear,
  title = {Asymptotic convergence and performance of multi-agent Q-learning dynamics},
  author = {AA Hussain, F Belardinelli, G Piliouras},
  journal = {arXiv preprint arXiv:2301.09619, 2023},
  year = {2023},
  url = {https://arxiv.org/abs/2301.09619},
}
Proceedings of the AAAI Conference on Artificial Intelligence 37 (5), 6245-6252, 2023

Automatically verifying expressive epistemic properties of programs

F Belardinelli, I Boureanu, V Malvone, F Rajaona

We propose a new approach to the verification of epistemic properties of programs. First, we introduce the new "program-epistemic" logic L_PK, which is strictly richer and more general than similar formalisms appearing in the literature. To solve the verification problem in an efficient way, we introduce a translation from our language L_PK into first-order logic. Then, we show and prove correct a reduction from the model checking problem for program-epistemic formulas to the satisfiability of their first-order translation. Both our logic and our translation can handle richer specifications w.r.t. the state of the art, allowing us to express the knowledge of agents about facts pertaining to programs (i.e., agents' knowledge before a program is executed as well as after it has been executed). Furthermore, we implement our translation in Haskell in a general way (i.e., independently of the programs in the logical statements), and we use existing SMT-solvers to check satisfaction of L_PK formulas on a benchmark example in the AI/agency field.

Link to paper
BibTeX citation
@article{automatically-verifying-expressive-epistemic-properties-of-p,
  title = {Automatically verifying expressive epistemic properties of programs},
  author = {F Belardinelli, I Boureanu, V Malvone, F Rajaona},
  journal = {Proceedings of the AAAI Conference on Artificial Intelligence 37 (5), 6245-6252, 2023},
  year = {2023},
  url = {https://ojs.aaai.org/index.php/AAAI/article/view/25769},
}
arXiv preprint arXiv:2307.13928, 2023

Beyond strict competition: approximate convergence of multi agent Q-learning dynamics

A Hussain, F Belardinelli, G Piliouras

The behaviour of multi-agent learning in competitive settings is often considered under the restrictive assumption of a zero-sum game. Only under this strict requirement is the behaviour of learning well understood; beyond this, learning dynamics can often display non-convergent behaviours which prevent fixed-point analysis. Nonetheless, many relevant competitive games do not satisfy the zero-sum assumption. Motivated by this, we study a smooth variant of Q-Learning, a popular reinforcement learning dynamics which balances the agents' tendency to maximise their payoffs with their propensity to explore the state space. We examine this dynamic in games which are 'close' to network zero-sum games and find that Q-Learning converges to a neighbourhood around a unique equilibrium. The size of the neighbourhood is determined by the 'distance' to the zero-sum game, as well as the exploration rates of the agents. We complement these results by providing a method whereby, given an arbitrary network game, the 'nearest' network zero-sum game can be found efficiently. As our experiments show, these guarantees are independent of whether the dynamics ultimately reach an equilibrium, or remain non-convergent.

Link to paper
BibTeX citation
@article{beyond-strict-competition-approximate-convergence-of-multi-a,
  title = {Beyond strict competition: approximate convergence of multi agent Q-learning dynamics},
  author = {A Hussain, F Belardinelli, G Piliouras},
  journal = {arXiv preprint arXiv:2307.13928, 2023},
  year = {2023},
  url = {https://arxiv.org/abs/2307.13928},
}
arXiv preprint arXiv:2307.10987, 2023

Characterising decision theories with mechanised causal graphs

M MacDermott, T Everitt, F Belardinelli

How should my own decisions affect my beliefs about the outcomes I expect to achieve? If taking a certain action makes me view myself as a certain type of person, it might affect how I think others view me, and how I view others who are similar to me. This can influence my expected utility calculations and change which action I perceive to be best. Whether and how it should is subject to debate, with contenders for how to think about it including evidential decision theory, causal decision theory, and functional decision theory. In this paper, we show that mechanised causal models can be used to characterise and differentiate the most important decision theories, and generate a taxonomy of different decision theories.

Link to paper
BibTeX citation
@article{characterising-decision-theories-with-mechanised-causal-grap,
  title = {Characterising decision theories with mechanised causal graphs},
  author = {M MacDermott, T Everitt, F Belardinelli},
  journal = {arXiv preprint arXiv:2307.10987, 2023},
  year = {2023},
  url = {https://arxiv.org/abs/2307.10987},
}
Proceedings of the 2023 International Conference on Autonomous Agents and …, 2023

Defining deception in structural causal games

FR Ward, F Toni, F Belardinelli

Deceptive agents are a challenge for the safety, trustworthiness, and cooperation of AI systems. We focus on the problem that agents might deceive in order to achieve their goals. There are a number of existing definitions of deception in the literature on game theory and symbolic AI, but there is no overarching theory of deception for learning agents in games. We introduce a functional definition of deception in structural causal games, grounded in the philosophical literature. We present several examples to establish that our formal definition captures philosophical desiderata for deception.

Link to paper
BibTeX citation
@article{defining-deception-in-structural-causal-games-2023-mr35r1eaa,
  title = {Defining deception in structural causal games},
  author = {FR Ward, F Toni, F Belardinelli},
  journal = {Proceedings of the 2023 International Conference on Autonomous Agents and … , 2023},
  year = {2023},
  url = {https://www.southampton.ac.uk/~eg/AAMAS2023/pdfs/p2902.pdf},
}
Advances in neural information processing systems 36, 2313-2341 , 2023 Scholar

Honesty is the best policy: defining and mitigating AI deception

F Ward, F Toni, F Belardinelli, T Everitt

Deceptive agents are a challenge for the safety, trustworthiness, and cooperation of AI systems. We focus on the problem that agents might deceive in order to achieve their goals (for instance, in our experiments with language models, the goal of being evaluated as truthful). There are a number of existing definitions of deception in the literature on game theory and symbolic AI, but there is no overarching theory of deception for learning agents in games. We introduce a formal definition of deception in structural causal games, grounded in the philosophy literature, and applicable to real-world machine learning systems. Several examples and results illustrate that our formal definition aligns with the philosophical and commonsense meaning of deception. Our main technical result is to provide graphical criteria for deception. We show, experimentally, that these results can be used to mitigate deception in reinforcement learning agents and language models.

Link to paper
BibTeX citation
@article{honesty-is-the-best-policy-defining-and-mitigating-ai-decept,
  title = {Honesty is the best policy: defining and mitigating AI deception},
  author = {F Ward, F Toni, F Belardinelli, T Everitt},
  journal = {Advances in neural information processing systems 36, 2313-2341 , 2023},
  year = {2023},
  url = {https://proceedings.neurips.cc/paper_files/paper/2023/hash/06fc7ae4a11a7eb5e20fe018db6c036f-Abstract-Conference.html},
}
arXiv e-prints, arXiv: 2312.01350 , 2023 Scholar

Honesty Is the Best Policy: Defining and Mitigating AI Deception

F Rhys Ward, F Belardinelli, F Toni, T Everitt

Deceptive agents are a challenge for the safety, trustworthiness, and cooperation of AI systems. We focus on the problem that agents might deceive in order to achieve their goals (for instance, in our experiments with language models, the goal of being evaluated as truthful). There are a number of existing definitions of deception in the literature on game theory and symbolic AI, but there is no overarching theory of deception for learning agents in games. We introduce a formal definition of deception in structural causal games, grounded in the philosophy literature, and applicable to real-world machine learning systems. Several examples and results illustrate that our formal definition aligns with the philosophical and commonsense meaning of deception. Our main technical result is to provide graphical criteria for deception. We show, experimentally, that these results can be used to mitigate deception in reinforcement …

Link to paper
BibTeX citation
@article{honesty-is-the-best-policy-defining-and-mitigating-ai-decept,
  title = {Honesty Is the Best Policy: Defining and Mitigating AI Deception},
  author = {F Rhys Ward, F Belardinelli, F Toni, T Everitt},
  journal = {arXiv e-prints, arXiv: 2312.01350 , 2023},
  year = {2023},
  url = {https://scholar.google.com/scholar?cluster=4892495585589940263&hl=en&oi=scholarr},
}
International Symposium on Formal Methods, 473-491 , 2023 Scholar

Program semantics and verification technique for ai-centred programs

F Rajaona, I Boureanu, V Malvone, F Belardinelli

We give a general-purpose programming language in which programs can reason about their own knowledge. To specify what these intelligent programs know, we define a "program-epistemic" logic, akin to a dynamic epistemic logic for programs. Our logic can express complex properties, including programs introspecting into future states of affairs, i.e., reasoning now about facts that hold only after they and other threads will execute. To model aspects anchored in privacy, our logic is interpreted over partial observability of variables, thus capturing that each thread can "see" only a part of the global space of variables. We verify program-epistemic properties on such AI-centred programs. To this end, we give a sound translation of the validity of our program-epistemic logic into first-order validity, using a new weakest-precondition semantics and a book-keeping of variable assignment. We implement our translation and fully …
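
For orientation, the classic Dijkstra-style weakest-precondition rules below are the standard background such a translation builds on; the paper's actual calculus extends these to epistemic properties and is not reproduced here.

\[
\begin{aligned}
wp(x := e,\ \varphi) &= \varphi[e/x]\\
wp(P_1; P_2,\ \varphi) &= wp(P_1,\ wp(P_2,\ \varphi))\\
wp(\mathbf{if}\ b\ \mathbf{then}\ P_1\ \mathbf{else}\ P_2,\ \varphi) &= (b \rightarrow wp(P_1, \varphi)) \land (\lnot b \rightarrow wp(P_2, \varphi))
\end{aligned}
\]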

Link to paper
BibTeX citation
@article{program-semantics-and-verification-technique-for-ai-centred-,
  title = {Program semantics and verification technique for ai-centred programs},
  author = {F Rajaona, I Boureanu, V Malvone, F Belardinelli},
  journal = {International Symposium on Formal Methods, 473-491 , 2023},
  year = {2023},
  url = {https://link.springer.com/chapter/10.1007/978-3-031-27481-7_27},
}
WOA, 1-16 , 2023 Scholar

RMLGym: a Formal Reward Machine Framework for Reinforcement Learning.

H Unniyankal, F Belardinelli, A Ferrando, V Malvone

Reinforcement learning (RL) is a powerful technique for learning optimal policies from trial and error. However, designing a reward function that captures the desired behavior of an agent is often a challenging and tedious task, especially when the agent has to deal with complex and multi-objective problems. To address this issue, researchers have proposed to use higher-level languages, such as Signal Temporal Logic (STL), to specify reward functions in a declarative and expressive way, and then automatically compile them into lower-level functions that can be used by standard RL algorithms. In this paper, we present RMLGym, a tool that integrates RML, a runtime verification tool, with OpenAI Gym, a popular framework for developing and comparing RL algorithms. RMLGym allows users to define reward functions using RML specifications and then generates reward monitors that evaluate the agent's performance and provide feedback at each step. We demonstrate the usefulness and flexibility of RMLGym by applying it to a famous benchmark problem from OpenAI Gym, and we analyze the strengths and limitations of our approach.
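
The monitor-as-reward pattern described here can be sketched as a Gym wrapper in a few lines of Python; the monitor interface below is hypothetical (the abstract does not expose RML's actual API), and the classic four-tuple Gym step signature is assumed.

import gym

class MonitorRewardWrapper(gym.Wrapper):
    """Delegate reward computation to a runtime-verification monitor (sketch)."""

    def __init__(self, env, monitor):
        super().__init__(env)
        self.monitor = monitor                      # assumed: reset() / update(obs)

    def reset(self, **kwargs):
        self.monitor.reset()                        # restart the specification monitor
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, _, done, info = self.env.step(action)  # ignore the environment's reward
        verdict = self.monitor.update(obs)          # hypothetical per-step verdict
        reward = 1.0 if verdict == "currently_true" else -1.0
        return obs, reward, done, info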

Link to paper
BibTeX citation
@article{rmlgym-a-formal-reward-machine-framework-for-reinforcement-l,
  title = {RMLGym: a Formal Reward Machine Framework for Reinforcement Learning.},
  author = {H Unniyankal, F Belardinelli, A Ferrando, V Malvone},
  journal = {WOA, 1-16 , 2023},
  year = {2023},
  url = {https://ceur-ws.org/Vol-3579/paper1.pdf},
}
arXiv preprint arXiv:2310.17219 , 2023 Scholar

Scalable verification of strategy logic through three-valued abstraction

F Belardinelli, A Ferrando, W Jamroga, V Malvone, A Murano

The model checking problem for multi-agent systems against Strategy Logic specifications is known to be non-elementary. Several fragments of this logic have been defined to tackle this issue, but at the expense of expressiveness. In this paper, we propose a three-valued semantics for Strategy Logic upon which we define an abstraction method. We show that the latter semantics is an approximation of the classic two-valued one for Strategy Logic. Furthermore, we extend MCMAS, an open-source model checker for multi-agent specifications, to incorporate our abstraction method and present some promising experimental results.
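
The approximation property claimed here has the usual three-valued shape (the notation below is assumed, not quoted from the paper): defined verdicts on the abstraction M^A transfer to the concrete model M, while the undefined verdict carries no information.

\[
[\![ M^{A} \models \varphi ]\!]_{3} = \mathit{tt} \ \Rightarrow\ M \models \varphi,
\qquad
[\![ M^{A} \models \varphi ]\!]_{3} = \mathit{ff} \ \Rightarrow\ M \not\models \varphi .
\]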

Link to paper
BibTeX citation
@article{scalable-verification-of-strategy-logic-through-three-valued,
  title = {Scalable verification of strategy logic through three-valued abstraction},
  author = {F Belardinelli, A Ferrando, W Jamroga, V Malvone, A Murano},
  journal = {arXiv preprint arXiv:2310.17219 , 2023},
  year = {2023},
  url = {https://arxiv.org/abs/2310.17219},
}
arXiv preprint arXiv:2307.13922 , 2023 Scholar

Stability of Multi-Agent Learning: Convergence in Network Games with Many Players

A Hussain, D Leonte, F Belardinelli, G Piliouras

The behaviour of multi-agent learning in many-player games has been shown to display complex dynamics outside of restrictive examples such as network zero-sum games. In addition, it has been shown that convergent behaviour is less likely to occur as the number of players increases. To make progress in resolving this problem, we study Q-Learning dynamics and determine a sufficient condition for the dynamics to converge to a unique equilibrium in any network game. We find that this condition depends on the nature of pairwise interactions and on the network structure, but is explicitly independent of the total number of agents in the game. We evaluate this result on a number of representative network games and show that, under suitable network conditions, stable learning dynamics can be achieved with an arbitrary number of agents.

Link to paper
BibTeX citation
@article{stability-of-multi-agent-learning-convergence-in-network-gam,
  title = {Stability of Multi-Agent Learning: Convergence in Network Games with Many Players},
  author = {A Hussain, D Leonte, F Belardinelli, G Piliouras},
  journal = {arXiv preprint arXiv:2307.13922 , 2023},
  year = {2023},
  url = {https://arxiv.org/abs/2307.13922},
}
arXiv preprint arXiv:2310.17240 , 2023 Scholar

Strategic abilities of forgetful agents in stochastic environments

F Belardinelli, W Jamroga, M Mittelmann, A Murano

In this paper, we investigate the probabilistic variants of the strategy logics ATL and ATL* under imperfect information. Specifically, we present novel decidability and complexity results when the model transitions are stochastic and agents play uniform strategies. That is, the semantics of the logics are based on multi-agent, stochastic transition systems with imperfect information, which combine two sources of uncertainty, namely, the partial observability agents have on the environment, and the likelihood of transitions to occur from a system state. Since the model checking problem is undecidable in general in this setting, we restrict our attention to agents with memoryless (positional) strategies. The resulting setting captures the situation in which agents have qualitative uncertainty of the local state and quantitative uncertainty about the occurrence of future events. We illustrate the usefulness of this setting with meaningful examples.

Link to paper
BibTeX citation
@article{strategic-abilities-of-forgetful-agents-in-stochastic-enviro,
  title = {Strategic abilities of forgetful agents in stochastic environments},
  author = {F Belardinelli, W Jamroga, M Mittelmann, A Murano},
  journal = {arXiv preprint arXiv:2310.17240 , 2023},
  year = {2023},
  url = {https://arxiv.org/abs/2310.17240},
}
International conference on machine learning, 14178-14202 , 2023 Scholar

The impact of exploration on convergence and performance of multi-agent Q-learning dynamics

A Hussain, F Belardinelli, D Paccagnan

Understanding the impact of exploration on the behaviour of multi-agent learning has, so far, benefited from the restriction to potential or network zero-sum games, in which convergence to an equilibrium can be shown. Outside of these classes, learning dynamics rarely converge and little is known about the effect of exploration in the face of non-convergence. To make progress on this front, we study the smooth Q-Learning dynamics. We show that, in any network game, exploration by agents results in the convergence of Q-Learning to a neighbourhood of an equilibrium. This holds independently of whether the dynamics reach the equilibrium or display complex behaviours. We show that increasing the exploration rate decreases the size of this neighbourhood and also decreases the ability of all agents to improve their payoffs. Furthermore, in a broad class of games, the payoff performance of Q-Learning dynamics, measured by Social Welfare, decreases when the exploration rate increases. Our experiments show this to be a general phenomenon, namely that exploration leads to improved convergence of Q-Learning, at the cost of payoff performance.

Link to paper
BibTeX citation
@article{the-impact-of-exploration-on-convergence-and-performance-of-,
  title = {The impact of exploration on convergence and performance of multi-agent Q-learning dynamics},
  author = {A Hussain, F Belardinelli, D Paccagnan},
  journal = {International conference on machine learning, 14178-14202 , 2023},
  year = {2023},
  url = {https://proceedings.mlr.press/v202/hussain23a.html},
}

2022

AISafety@IJCAI , 2022 Scholar

A Causal Perspective on AI Deception in Games.

FR Ward, F Toni, F Belardinelli

Deception is a core challenge for AI safety and we focus on the problem that AI agents might learn deceptive strategies in pursuit of their objectives. We define the incentives one agent has to signal to and deceive another agent. We present several examples of deceptive artificial agents and show that our definition has desirable properties.

Link to paper
BibTeX citation
@article{a-causal-perspective-on-ai-deception-in-games-2022-mr35r1eaa,
  title = {A Causal Perspective on AI Deception in Games.},
  author = {FR Ward, F Toni, F Belardinelli},
  journal = {AISafety@IJCAI , 2022},
  year = {2022},
  url = {http://star.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-3193/paper2CAUSAL.pdf},
}
arXiv preprint arXiv:2206.05973 , 2022 Scholar

A Sahlqvist-style Correspondence Theorem for Linear-time Temporal Logic

R Li, F Belardinelli

The language of modal logic is capable of expressing first-order conditions on Kripke frames. The classic result by Henrik Sahlqvist identifies a significant class of modal formulas for which first-order conditions -- or Sahlqvist correspondents -- can be found in an effective, algorithmic way. Recent works have successfully extended this classic result to more complex modal languages. In this paper, we pursue a similar line and develop a Sahlqvist-style correspondence theorem for Linear-time Temporal Logic (LTL), which is one of the most widely used formal languages for temporal specification. LTL extends the syntax of basic modal logic with the dedicated temporal operators Next (X) and Until (U). As a result, the complexity of the class of formulas that have first-order correspondents also increases accordingly. We identify a significant class of LTL Sahlqvist formulas built by using the modal operators F, G, X, and U. The main result of this paper is to prove the correspondence of LTL Sahlqvist formulas to frame conditions that are definable in the first-order language.
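
A textbook instance of Sahlqvist correspondence (a standard example, not drawn from the paper) shows the shape of such results: the modal axiom T corresponds to reflexivity of the accessibility relation of the frame F = (W, R).

\[
F \models \Box p \rightarrow p
\quad\Longleftrightarrow\quad
F \models \forall w\, R(w, w).
\]

In the LTL setting, the operators X, F, G, and U play the role of the boxes and diamonds, and the first-order correspondents constrain the underlying flow of time.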

Link to paper
BibTeX citation
@article{a-sahlqvist-style-correspondence-theorem-for-linear-time-tem,
  title = {A Sahlqvist-style Correspondence Theorem for Linear-time Temporal Logic},
  author = {R Li, F Belardinelli},
  journal = {arXiv preprint arXiv:2206.05973 , 2022},
  year = {2022},
  url = {https://arxiv.org/abs/2206.05973},
}
ICLR Workshop on Agent Learning in Open-Endedness , 2022 Scholar

Agent, do you see it now? systematic generalisation in deep reinforcement learning

BG León, M Shanahan, F Belardinelli

Systematic generalisation, i.e., the algebraic capacity to understand and execute unseen tasks by combining already known primitives, is one of the most desirable features for a computational model. Good adaptation to novel tasks in open-ended settings relies heavily on the ability of agents to reuse their past experience and recombine meaningful learning pieces to tackle new goals. In this work, we analyse how the architecture of convolutional layers impacts the performance of autonomous agents when generalising to zero-shot, unseen tasks while executing human instructions. Our findings suggest that a convolutional architecture correctly suited to the environment the agent will interact with may be of greater importance than having a generic convolutional network trained in the given environment.

Link to paper
BibTeX citation
@article{agent-do-you-see-it-now-systematic-generalisation-in-deep-re,
  title = {Agent, do you see it now? systematic generalisation in deep reinforcement learning},
  author = {BG León, M Shanahan, F Belardinelli},
  journal = {ICLR Workshop on Agent Learning in Open-Endedness , 2022},
  year = {2022},
  url = {https://openreview.net/forum?id=BHHG47Db8-q},
}
Journal of Artificial Intelligence Research 73, 897-932 , 2022 Scholar

Approximating perfect recall when model checking strategic abilities: Theory and applications

F Belardinelli, A Lomuscio, V Malvone, E Yu

The model checking problem for multi-agent systems against specifications in the alternating-time temporal logic ATL, hence ATL*, under perfect recall and imperfect information is known to be undecidable. To tackle this problem, in this paper we investigate a notion of bounded recall under incomplete information. We present a novel three-valued semantics for ATL* in this setting and analyse the corresponding model checking problem. We show that the three-valued semantics here introduced is an approximation of the classic two-valued semantics, then give a sound, albeit partial, algorithm for model checking two-valued perfect recall via its approximation as three-valued bounded recall. Finally, we extend MCMAS, an open-source model checker for ATL and other agent specifications, to incorporate bounded recall; we illustrate its use and present experimental results.

Link to paper
BibTeX citation
@article{approximating-perfect-recall-when-model-checking-strategic-a,
  title = {Approximating perfect recall when model checking strategic abilities: Theory and applications},
  author = {F Belardinelli, A Lomuscio, V Malvone, E Yu},
  journal = {Journal of Artificial Intelligence Research 73, 897-932 , 2022},
  year = {2022},
  url = {https://www.jair.org/index.php/jair/article/view/12539},
}
arXiv preprint arXiv:2209.14010 , 2022 Scholar

Argumentative reward learning: Reasoning about human preferences

FR Ward, F Belardinelli, F Toni

We define a novel neuro-symbolic framework, argumentative reward learning, which combines preference-based argumentation with existing approaches to reinforcement learning from human feedback. Our method improves prior work by generalising human preferences, reducing the burden on the user and increasing the robustness of the reward model. We demonstrate this with a number of experiments.

Link to paper
BibTeX citation
@article{argumentative-reward-learning-reasoning-about-human-preferen,
  title = {Argumentative reward learning: Reasoning about human preferences},
  author = {FR Ward, F Belardinelli, F Toni},
  journal = {arXiv preprint arXiv:2209.14010 , 2022},
  year = {2022},
  url = {https://arxiv.org/abs/2209.14010},
}
arXiv e-prints, arXiv: 2209.14010 , 2022 Scholar

Argumentative Reward Learning: Reasoning About Human Preferences

F Rhys Ward, F Belardinelli, F Toni

We define a novel neuro-symbolic framework, argumentative reward learning, which combines preference-based argumentation with existing approaches to reinforcement learning from human feedback. Our method improves prior work by generalising human preferences, reducing the burden on the user and increasing the robustness of the reward model. We demonstrate this with a number of experiments.

Link to paper
BibTeX citation
@article{argumentative-reward-learning-reasoning-about-human-preferen,
  title = {Argumentative Reward Learning: Reasoning About Human Preferences},
  author = {F Rhys Ward, F Belardinelli, F Toni},
  journal = {arXiv e-prints, arXiv: 2209.14010 , 2022},
  year = {2022},
  url = {https://scholar.google.com/scholar?cluster=8058110464247862104&hl=en&oi=scholarr},
}
Thinking and Calculating: Essays in Logic, Its History and Its Philosophical … , 2022 Scholar

Counterpart Semantics at Work: Independence and Incompleteness Results in Quantified Modal Logic

F Belardinelli

In this paper we make a targeted contribution to the model theory of quantified modal logic (QML). Specifically, we investigate the independence of two first-order modal principles thoroughly discussed in the literature on QML: the Barcan formula BF and the necessity of fictionality NE. We make use of counterpart semantics to prove the mutual independence of these two principles in various QML calculi on normal modal bases as strong as S5. Further, these independence results are in turn used to prove the incompleteness of a number of QML calculi based on free logic and containing BF with respect to Kripke semantics.
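
For reference, the Barcan formula under discussion in its two standard, equivalent formulations (standard notation, assumed rather than quoted from the chapter):

\[
\mathit{BF}:\quad \forall x\, \Box \varphi \rightarrow \Box \forall x\, \varphi
\qquad\text{equivalently}\qquad
\Diamond \exists x\, \varphi \rightarrow \exists x\, \Diamond \varphi .
\]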

Link to paper
BibTeX citation
@article{counterpart-semantics-at-work-independence-and-incompletenes,
  title = {Counterpart Semantics at Work: Independence and Incompleteness Results in Quantified Modal Logic},
  author = {F Belardinelli},
  journal = {Thinking and Calculating: Essays in Logic, Its History and Its Philosophical … , 2022},
  year = {2022},
  url = {https://link.springer.com/chapter/10.1007/978-3-030-97303-2_21},
}
14th International Conference on Agents and Artificial Intelligence (ICAART … , 2022 Scholar

Enabling Markovian representations under imperfect information

F Belardinelli, BG León, V Malvone

Markovian systems are widely used in reinforcement learning (RL), when the successful completion of a task depends exclusively on the last interaction between an autonomous agent and its environment. Unfortunately, real-world instructions are typically complex and often better described as non-Markovian. In this paper we present an extension method that allows solving partially-observable non-Markovian reward decision processes (PONMRDPs) by solving equivalent Markovian models. This potentially enables Markovian state-of-the-art techniques, including RL, to find optimal behaviours for problems best described as PONMRDPs. We provide formal optimality guarantees for our extension method, together with a counterexample illustrating that naive extensions of existing techniques for fully-observable environments cannot provide such guarantees.

Link to paper
BibTeX citation
@article{enabling-markovian-representations-under-imperfect-informati,
  title = {Enabling Markovian representations under imperfect information},
  author = {F Belardinelli, BG León, V Malvone},
  journal = {14th International Conference on Agents and Artificial Intelligence (ICAART … , 2022},
  year = {2022},
  url = {https://telecom-paris.hal.science/hal-03779034/},
}
AAMAS, 1759-1761 , 2022 Scholar

On Agent Incentives to Manipulate Human Feedback in Multi-Agent Reward Learning Scenarios.

FR Ward, F Toni, F Belardinelli

1. The AI alignment problem in reinforcement learning (RL): generate a reward function which captures the desired specification [1, 6].
2. Reward learning as a solution to AI alignment: learn the reward function from human feedback [8, 6].
3. In the single-agent reward learning setting, the agent might have incentives to manipulate the human from which it learns rewards, or otherwise to influence the reward learning process (RLP) [2, 3, 5].

Link to paper
BibTeX citation
@article{on-agent-incentives-to-manipulate-human-feedback-in-multi-ag,
  title = {On Agent Incentives to Manipulate Human Feedback in Multi-Agent Reward Learning Scenarios.},
  author = {FR Ward, F Toni, F Belardinelli},
  journal = {AAMAS, 1759-1761 , 2022},
  year = {2022},
  url = {https://www.doc.ic.ac.uk/~afr114/explainAI21/talks/FW.pdf},
}
arXiv preprint arXiv:2206.13841 , 2022 Scholar

Program semantics and a verification technique for knowledge-based multi-agent systems

F Belardinelli, I Boureanu, V Malvone, SF Rajaona

We give a relational and a weakest precondition semantics for "knowledge-based programs", i.e., programs that restrict observability of variables so as to richly express changes in the knowledge of agents who can or cannot observe said variables. Based on these knowledge-based programs, we define a program-epistemic logic to model complex epistemic properties of the execution of multi-agent systems. We translate the validity of program-epistemic logic formulae into first-order validity, using our weakest precondition semantics and an ingenious book-keeping of variable assignment. We implement our translation in Haskell in a general way (i.e., independently of the programs in the logical statements), and test this novel verification method for our new program-epistemic logic on a series of well-established examples.

Link to paper
BibTeX citation
@article{program-semantics-and-a-verification-technique-for-knowledge,
  title = {Program semantics and a verification technique for knowledge-based multi-agent systems},
  author = {F Belardinelli, I Boureanu, V Malvone, SF Rajaona},
  journal = {arXiv preprint arXiv:2206.13841 , 2022},
  year = {2022},
  url = {https://arxiv.org/abs/2206.13841},
}
arXiv preprint arXiv:2201.09616 , 2022 Scholar

Reasoning about human-friendly strategies in repeated keyword auctions

F Belardinelli, W Jamroga, V Malvone, M Mittelmann, A Murano, ...

In online advertising, search engines sell ad placements for keywords continuously through auctions. This problem can be seen as an infinitely repeated game, since the auction is executed whenever a user performs a query with the keyword. As advertisers may frequently change their bids, the game will have a large set of equilibria with potentially complex strategies. In this paper, we propose the use of natural strategies for reasoning in such a setting, as they are processable by artificial agents with limited memory and/or computational power, as well as understandable by human users. To reach this goal, we introduce a quantitative version of Strategy Logic with natural strategies in the setting of imperfect information. In a first step, we show how to model strategies for repeated keyword auctions and take advantage of the model for proving properties evaluating this game. In a second step, we study the logic in relation to the distinguishing power, expressivity, and model-checking complexity for strategies with and without recall.

Link to paper
BibTeX citation
@article{reasoning-about-human-friendly-strategies-in-repeated-keywor,
  title = {Reasoning about human-friendly strategies in repeated keyword auctions},
  author = {F Belardinelli, W Jamroga, V Malvone, M Mittelmann, A Murano, ...},
  journal = {arXiv preprint arXiv:2201.09616 , 2022},
  year = {2022},
  url = {https://arxiv.org/abs/2201.09616},
}
arXiv preprint arXiv:2201.08111 , 2022 Scholar

Safety-aware multi-agent apprenticeship learning

J Zhao

The objective of this project is to extend the technique of the paper "Safety-Aware Apprenticeship Learning", improving the utility and the efficiency of the existing reinforcement learning model by moving from a single-agent to a multi-agent learning framework. Our contributions are the following:
1. We extend the inverse reinforcement learning model from a single-agent to a multi-agent scenario, considering the extraction of safe reward functions from expert behaviours in the multi-agent rather than the single-agent case.
2. We extend the single-agent learning framework to a multi-agent learning framework and, on top of this extension, design a novel learning framework.
3. We empirically evaluate the performance of our extension to the single-agent inverse reinforcement learning framework.

Link to paper
BibTeX citation
@article{safety-aware-multi-agent-apprenticeship-learning-2022-mr35r1,
  title = {Safety-aware multi-agent apprenticeship learning},
  author = {J Zhao},
  journal = {arXiv preprint arXiv:2201.08111 , 2022},
  year = {2022},
  url = {https://arxiv.org/abs/2201.08111},
}

2021

Proceedings of the 20th International Conference on Autonomous Agents and … , 2021 Scholar

Aggregating bipolar opinions

S Lauren, F Belardinelli, F Toni

We introduce a novel method to aggregate Bipolar Argumentation (BA) Frameworks expressing opinions by different parties in debates. We use Bipolar Assumption-based Argumentation (ABA) as an all-encompassing formalism for BA under different semantics. By leveraging recent results on judgement aggregation in Social Choice Theory, we prove several preservation results, both positive and negative, for relevant properties of Bipolar ABA.

Link to paper
BibTeX citation
@article{aggregating-bipolar-opinions-2021-mr35r1eaaaaj-kxtntwgdaa4c,
  title = {Aggregating bipolar opinions},
  author = {S Lauren, F Belardinelli, F Toni},
  journal = {Proceedings of the 20th International Conference on Autonomous Agents and … , 2021},
  year = {2021},
  url = {https://ifmas.csc.liv.ac.uk/Proceedings/aamas2021/pdfs/p746.pdf},
}
arXiv preprint arXiv:2102.02881 , 2021 Scholar

Aggregating Bipolar Opinions (With Appendix)

S Lauren, F Belardinelli, F Toni

We introduce a novel method to aggregate Bipolar Argumentation (BA) Frameworks expressing opinions by different parties in debates. We use Bipolar Assumption-based Argumentation (ABA) as an all-encompassing formalism for BA under different semantics. By leveraging recent results on judgement aggregation in Social Choice Theory, we prove several preservation results, both positive and negative, for relevant properties of Bipolar ABA.

Link to paper
BibTeX citation
@article{aggregating-bipolar-opinions-with-appendix-2021-mr35r1eaaaaj,
  title = {Aggregating Bipolar Opinions (With Appendix)},
  author = {S Lauren, F Belardinelli, F Toni},
  journal = {arXiv preprint arXiv:2102.02881 , 2021},
  year = {2021},
  url = {https://arxiv.org/abs/2102.02881},
}
arXiv preprint arXiv:2102.01434 , 2021 Scholar

An abstraction-based method to check multi-agent deep reinforcement-learning behaviors

PE Mqirmi, F Belardinelli, BG León

Multi-agent reinforcement learning (RL) often struggles to ensure the safe behaviours of the learning agents, and therefore it is generally not suited to safety-critical applications. To address this issue, we present a methodology that combines formal verification with (deep) RL algorithms to guarantee the satisfaction of formally-specified safety constraints both in training and testing. The approach we propose expresses the constraints to verify in Probabilistic Computation Tree Logic (PCTL) and builds an abstract representation of the system to reduce the complexity of the verification step. This abstract model allows for model checking techniques to identify a set of abstract policies that meet the safety constraints expressed in PCTL. Then, the agents' behaviours are restricted according to these safe abstract policies. We provide formal guarantees that by using this method, the actions of the agents always meet the safety constraints, and provide a procedure to generate an abstract model automatically. We empirically evaluate and show the effectiveness of our method in a multi-agent environment.
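
The final restriction step can be sketched as action masking against a precomputed table of verified-safe actions; the table format below is a hypothetical stand-in for the output of the PCTL model-checking stage, not the paper's data structure.

import numpy as np

# safe[abstract_state] = actions allowed by abstract policies verified against PCTL.
safe = {0: {0, 1}, 1: {1}}                      # toy output of the verification step

def shielded_action(q_values, abstract_state):
    masked = np.full_like(q_values, -np.inf)    # forbid everything by default
    for a in safe[abstract_state]:
        masked[a] = q_values[a]                 # re-enable verified-safe actions
    return int(np.argmax(masked))               # greedy among the safe actions

print(shielded_action(np.array([2.0, 1.0, 3.0]), 1))   # -> 1, although action 2 has higher Q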

Link to paper
BibTeX citation
@article{an-abstraction-based-method-to-check-multi-agent-deep-reinfo,
  title = {An abstraction-based method to check multi-agent deep reinforcement-learning behaviors},
  author = {PE Mqirmi, F Belardinelli, BG León},
  journal = {arXiv preprint arXiv:2102.01434 , 2021},
  year = {2021},
  url = {https://arxiv.org/abs/2102.01434},
}
Information and Computation 276, 104552 , 2021 Scholar

Bisimulations for verifying strategic abilities with an application to the ThreeBallot voting protocol

F Belardinelli, R Condurache, C Dima, W Jamroga, M Knapik

We propose a notion of alternating bisimulation for strategic abilities under imperfect information. The bisimulation preserves formulas of ATL for both the objective and subjective variants of the state-based semantics with imperfect information, which are commonly used in the modeling and verification of multi-agent systems. Furthermore, we apply the theoretical result to the verification of coercion-resistance in the ThreeBallot voting system, a voting protocol that does not use cryptography. In particular, we show that natural simplifications of an initial model of the protocol are in fact bisimulations of the original model, and therefore satisfy the same ATL properties, including coercion-resistance. These simplifications allow the model-checking tool MCMAS to terminate on models with a larger number of voters and candidates, compared with the initial model.

Link to paper
BibTeX citation
@article{bisimulations-for-verifying-strategic-abilities-with-an-appl,
  title = {Bisimulations for verifying strategic abilities with an application to the ThreeBallot voting protocol},
  author = {F Belardinelli, R Condurache, C Dima, W Jamroga, M Knapik},
  journal = {Information and Computation 276, 104552 , 2021},
  year = {2021},
  url = {https://arxiv.org/abs/2203.13692},
}
arXiv preprint arXiv:2112.11490 , 2021 Scholar

Do androids dream of electric fences? safety-aware reinforcement learning with latent shielding

C He, BG León, F Belardinelli

The growing trend of fledgling reinforcement learning systems making their way into real-world applications has been accompanied by growing concerns for their safety and robustness. In recent years, a variety of approaches have been put forward to address the challenges of safety-aware reinforcement learning; however, these methods often either require a handcrafted model of the environment to be provided beforehand, or that the environment is relatively simple and low-dimensional. We present a novel approach to safety-aware deep reinforcement learning in high-dimensional environments called latent shielding. Latent shielding leverages internal representations of the environment learnt by model-based agents to "imagine" future trajectories and avoid those deemed unsafe. We experimentally demonstrate that this approach leads to improved adherence to formally-defined safety specifications.
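
The "imagine then veto" loop at the heart of this approach can be sketched as follows; the world-model encoder, latent dynamics, and safety predicate are assumed interfaces, and repeating the candidate action over the whole horizon is a simplification of full policy rollouts.

def latent_shield(state, candidate_actions, world_model, unsafe, horizon=5):
    # Keep only actions whose imagined latent rollouts stay safe (sketch).
    safe_actions = []
    for a in candidate_actions:
        z = world_model.encode(state)        # assumed latent encoder
        ok = True
        for _ in range(horizon):
            z = world_model.imagine(z, a)    # assumed one-step latent dynamics
            if unsafe(z):                    # safety predicate over latent states
                ok = False
                break
        if ok:
            safe_actions.append(a)
    return safe_actions or candidate_actions # fall back if every action is vetoed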

Link to paper
BibTeX citation
@article{do-androids-dream-of-electric-fences-safety-aware-reinforcem,
  title = {Do androids dream of electric fences? safety-aware reinforcement learning with latent shielding},
  author = {C He, BG León, F Belardinelli},
  journal = {arXiv preprint arXiv:2112.11490 , 2021},
  year = {2021},
  url = {https://arxiv.org/abs/2112.11490},
}
Proceedings of the Genetic and Evolutionary Computation Conference Companion … , 2021 Scholar

Evolutionary reinforcement learning for sparse rewards

S Zhu, F Belardinelli, BG León

Temporal logic (TL) is an expressive way of specifying complex goals in reinforcement learning (RL), which facilitates the design of reward functions. However, the combination of these two techniques is prone to generate sparse rewards, which might hinder the learning process. Evolutionary algorithms (EAs) hold promise in tackling this problem by encouraging the diversification of policies through exploration in the parameter space. In this paper, we present GEATL, the first hybrid on-policy evolutionary-based algorithm that combines the advantages of gradient learning in deep RL with the exploration ability of evolutionary algorithms, in order to solve the sparse reward problem pertaining to TL specifications. We test our approach in a delayed reward scenario. Differently from previous baselines combining RL and TL, we show that GEATL is able to tackle complex TL specifications even in sparse-reward settings.
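
A generic sketch of such a hybrid loop, interleaving parameter-space evolutionary exploration with gradient updates; GEATL's actual operators and TL machinery are not reproduced here, and fitness and grad_step are assumed callables.

import numpy as np

def hybrid_evolve(theta, fitness, grad_step=None, pop_size=16, sigma=0.1, gens=100):
    for _ in range(gens):
        # Parameter-space exploration: Gaussian perturbations of the current policy.
        pop = [theta + sigma * np.random.randn(*theta.shape) for _ in range(pop_size)]
        scores = [fitness(p) for p in pop]     # e.g., return under the sparse TL reward
        theta = pop[int(np.argmax(scores))]    # selection: keep the elite individual
        if grad_step is not None:
            theta = grad_step(theta)           # interleave an on-policy gradient update
    return theta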

Link to paper
BibTeX citation
@article{evolutionary-reinforcement-learning-for-sparse-rewards-2021-,
  title = {Evolutionary reinforcement learning for sparse rewards},
  author = {S Zhu, F Belardinelli, BG León},
  journal = {Proceedings of the Genetic and Evolutionary Computation Conference Companion … , 2021},
  year = {2021},
  url = {https://dl.acm.org/doi/abs/10.1145/3449726.3463142},
}
arXiv preprint arXiv:2110.09461 , 2021 Scholar

In a nutshell, the human asked for this: Latent goals for following temporal specifications

BG León, M Shanahan, F Belardinelli

We address the problem of building agents whose goal is to learn to execute out-of-distribution (OOD) multi-task instructions expressed in temporal logic (TL) by using deep reinforcement learning (DRL). Recent works provided evidence that the agent's neural architecture is a key feature when DRL agents are learning to solve OOD tasks in TL. Yet, the studies on this topic are still in their infancy. In this work, we propose a new deep learning configuration with inductive biases that lead agents to generate latent representations of their current goal, yielding a stronger generalization performance. We use these latent-goal networks within a neuro-symbolic framework that executes multi-task formally-defined instructions and contrast the performance of the proposed neural networks against employing different state-of-the-art (SOTA) architectures when generalizing to unseen instructions in OOD environments.

Link to paper
BibTeX citation
@article{in-a-nutshell-the-human-asked-for-this-latent-goals-for-foll,
  title = {In a nutshell, the human asked for this: Latent goals for following temporal specifications},
  author = {BG León, M Shanahan, F Belardinelli},
  journal = {arXiv preprint arXiv:2110.09461 , 2021},
  year = {2021},
  url = {https://arxiv.org/abs/2110.09461},
}
IJCAI, 1787-1793 , 2021 Scholar

Reasoning About Agents That May Know Other Agents' Strategies.

F Belardinelli, S Knight, A Lomuscio, B Maubert, A Murano, S Rubin

We study the semantics of knowledge in strategic reasoning. Most existing works either implicitly assume that agents do not know one another's strategies, or that all strategies are known to all; and some works present inconsistent mixes of both features. We put forward a novel semantics for Strategy Logic with Knowledge that cleanly models whose strategies each agent knows. We study how adopting this semantics impacts agents' knowledge and strategic ability, as well as the complexity of the model-checking problem.

Link to paper
BibTeX citation
@article{reasoning-about-agents-that-may-know-other-agents-strategies,
  title = {Reasoning About Agents That May Know Other Agents' Strategies.},
  author = {F Belardinelli, S Knight, A Lomuscio, B Maubert, A Murano, S Rubin},
  journal = {IJCAI, 1787-1793 , 2021},
  year = {2021},
  url = {https://www.researchgate.net/profile/Aniello-Murano/publication/353836591_Reasoning_About_Agents_That_May_Know_Other_Agents'_Strategies/links/61c087bba6251b553ad1459b/Reasoning-About-Agents-That-May-Know-Other-Agents-Strategies.pdf},
}
Proc. 38th Int. Conf. Mach. Learn. Workshop Uncertainty Robustness Deep … , 2021 Scholar

Relational deep reinforcement learning and latent goals for following instructions in temporal logic

BG León, M Shanahan, F Belardinelli

We address the problem of building agents aiming to satisfy out-of-distribution (OOD) multitask instructions expressed in temporal logic (TL) through deep reinforcement learning (DRL). There exists evidence that the deep learning architecture is a key feature when teaching a DRL agent to solve OOD tasks in TL. Yet, the studies on the performance of different networks are still limited. Here, we first study the impact of introducing relational layers and key-value attention in OOD safety-aware navigation TL tasks. Then, we propose a novel architecture configuration that induces a DRL agent to generate latent representations of its current task given its perception of the environment and the formal instruction. We find that inducing this latent goal representation significantly improves the performance of neural-based agents in OOD environments.

Link to paper
BibTeX citation
@article{relational-deep-reinforcement-learning-and-latent-goals-for-,
  title = {Relational deep reinforcement learning and latent goals for following instructions in temporal logic},
  author = {BG León, M Shanahan, F Belardinelli},
  journal = {Proc. 38th Int. Conf. Mach. Learn. Workshop Uncertainty Robustness Deep … , 2021},
  year = {2021},
  url = {http://www.gatsby.ucl.ac.uk/~balaji/udl2021/accepted-papers/UDL2021-paper-078.pdf},
}
Artificial Intelligence 300, 103557 , 2021 Scholar

Strategic reasoning with a bounded number of resources: The quest for tractability

F Belardinelli, S Demri

The resource-bounded alternating-time temporal logic RB±ATL combines strategic reasoning with reasoning about resources. Its model-checking problem is known to be 2EXPTIME-complete (the same as its proper extension RB±ATL*) and fragments have been identified to lower the complexity. In this work, we consider the variant RB±ATL+ that allows for Boolean combinations of path formulae starting with single temporal operators, but restricted to a single resource, providing an interesting trade-off between temporal expressivity and resource analysis. We show that the model-checking problem for RB±ATL+ restricted to a single agent and a single resource is Δp2-complete, hence the same as for the standard branching-time temporal logic CTL+. In this case reasoning about resources comes at no extra computational cost. When a fixed finite set of linear-time temporal operators is considered, the model …

Link to paper
BibTeX citation
@article{strategic-reasoning-with-a-bounded-number-of-resources-the-q,
  title = {Strategic reasoning with a bounded number of resources: The quest for tractability},
  author = {F Belardinelli, S Demri},
  journal = {Artificial Intelligence 300, 103557 , 2021},
  year = {2021},
  url = {https://www.sciencedirect.com/science/article/pii/S0004370221001089},
}

2020

Proceedings of the 35th Annual ACM/IEEE Symposium on Logic in Computer … , 2020 Scholar

A Hennessy-Milner theorem for ATL with imperfect information

F Belardinelli, C Dima, V Malvone, F Tiplea

We show that a history-based variant of alternating bisimulation with imperfect information allows it to be related to a variant of Alternating-time Temporal Logic (ATL) with imperfect information by a full Hennessy-Milner theorem. The variant of ATL we consider has a common knowledge semantics, which requires that the uniform strategy available for a coalition to accomplish some goal must be common knowledge inside the coalition, while other semantic variants of ATL with imperfect information do not accommodate a Hennessy-Milner theorem. We also show that the existence of a history-based alternating bisimulation between two finite Concurrent Game Structures with imperfect information (iCGS) is undecidable.

Link to paper
BibTeX citation
@article{a-hennessy-milner-theorem-for-atl-with-imperfect-information,
  title = {A Hennessy-Milner theorem for ATL with imperfect information},
  author = {F Belardinelli, C Dima, V Malvone, F Tiplea},
  journal = {Proceedings of the 35th Annual ACM/IEEE Symposium on Logic in Computer … , 2020},
  year = {2020},
  url = {https://dl.acm.org/doi/abs/10.1145/3373718.3394784},
}
17th International Conference on Principles of Knowledge Representation and … , 2020 Scholar

A three-valued approach to strategic abilities under imperfect information

F Belardinelli, V Malvone

A major challenge for logics for strategies is represented by their verification in contexts of imperfect information. In this contribution we advance the state of the art by approximating the verification of Alternating-time Temporal Logic (ATL) under imperfect information by using perfect information and a three-valued semantics. In particular, we develop novel automata-theoretic techniques for the linear-time logic LTL, then apply these to finding "failure" states, where the ATL specification to be model checked is undefined. Such failure states can then be fed into a refinement procedure, thus providing a sound, albeit partial, verification procedure.

Link to paper
BibTeX citation
@article{a-three-valued-approach-to-strategic-abilities-under-imperfe,
  title = {A three-valued approach to strategic abilities under imperfect information},
  author = {F Belardinelli, V Malvone},
  journal = {17th International Conference on Principles of Knowledge Representation and … , 2020},
  year = {2020},
  url = {https://hal.science/hal-03170197/},
}
arXiv preprint arXiv:2002.06000 , 2020 Scholar

Extended markov games to learn multiple tasks in multi-agent reinforcement learning

BG León, F Belardinelli

The combination of Formal Methods with Reinforcement Learning (RL) has recently attracted interest as a way for single-agent RL to learn multiple-task specifications. In this paper we extend this convergence to multi-agent settings and formally define Extended Markov Games as a general mathematical model that allows multiple RL agents to concurrently learn various non-Markovian specifications. To introduce this new model we provide formal definitions and proofs as well as empirical tests of RL algorithms running on this framework. Specifically, we use our model to train two different logic-based multi-agent RL algorithms to solve diverse settings of non-Markovian co-safe LTL specifications.
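
The standard construction behind such models tracks an automaton state alongside the environment state so that the product is Markovian again; the sketch below uses a hypothetical two-transition DFA for "eventually a, then eventually b" and assumes a classic four-tuple Gym-style step signature.

# Product of an environment with a DFA compiled from a co-safe LTL task (sketch).
DFA = {(0, "a"): 1, (1, "b"): 2}    # (automaton state, observed label) -> next state
ACCEPT = {2}                        # reaching state 2 completes the task

class ProductEnv:
    def __init__(self, env, labeller):
        self.env, self.labeller = env, labeller   # labeller maps observations to labels
        self.q = 0

    def reset(self):
        self.q = 0
        return self.env.reset(), self.q

    def step(self, action):
        obs, _, done, info = self.env.step(action)
        self.q = DFA.get((self.q, self.labeller(obs)), self.q)  # advance the automaton
        reward = 1.0 if self.q in ACCEPT else 0.0               # reward task completion
        return (obs, self.q), reward, done or self.q in ACCEPT, info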

Link to paper
BibTeX citation
@article{extended-markov-games-to-learn-multiple-tasks-in-multi-agent,
  title = {Extended markov games to learn multiple tasks in multi-agent reinforcement learning},
  author = {BG León, F Belardinelli},
  journal = {arXiv preprint arXiv:2002.06000 , 2020},
  year = {2020},
  url = {https://arxiv.org/abs/2002.06000},
}
Proceedings of the 35th Annual ACM Symposium on Applied Computing, 940-947 , 2020 Scholar

Formal verification of debates in argumentation theory

R Jha, F Belardinelli, F Toni

Humans engage in informal debates on a daily basis. By expressing their opinions and ideas in an argumentative fashion, they are able to gain a deeper understanding of a given problem and, in some cases, find the best possible course of actions towards resolving it. In this paper, we develop a methodology to verify debates formalised as abstract argumentation frameworks. We first present a translation from debates to transition systems. Such transition systems can model debates and represent their evolution over time using a finite set of states. We then formalise relevant debate properties using temporal and strategy logics. These formalisations, along with a debate transition system, allow us to verify whether a given debate satisfies certain properties. The verification process can be automated using model checkers. Thus, we also measure their performance when verifying debates, and use the results to …

Link to paper
BibTeX citation
@article{formal-verification-of-debates-in-argumentation-theory-2020-,
  title = {Formal verification of debates in argumentation theory},
  author = {R Jha, F Belardinelli, F Toni},
  journal = {Proceedings of the 35th Annual ACM Symposium on Applied Computing, 940-947 , 2020},
  year = {2020},
  url = {https://dl.acm.org/doi/abs/10.1145/3341105.3373907},
}
Proceedings of the AAAI Conference on Artificial Intelligence 34 (05), 7071-7078 , 2020 Scholar

Model checking temporal epistemic logic under bounded recall

F Belardinelli, A Lomuscio, E Yu

We study the problem of verifying multi-agent systems under the assumption of bounded recall. We introduce the logic CTLK^BR, a bounded-recall variant of the temporal-epistemic logic CTLK. We define and study the model checking problem against CTLK specifications under incomplete information and bounded recall and present complexity upper bounds. We present an extension of the BDD-based model checker MCMAS implementing model checking under bounded recall semantics and discuss the experimental results obtained.

Link to paper
BibTeX citation
@article{model-checking-temporal-epistemic-logic-under-bounded-recall,
  title = {Model checking temporal epistemic logic under bounded recall},
  author = {F Belardinelli, A Lomuscio, E Yu},
  journal = {Proceedings of the AAAI Conference on Artificial Intelligence 34 (05), 7071-7078 , 2020},
  year = {2020},
  url = {https://ojs.aaai.org/index.php/AAAI/article/view/6193},
}
24th European Conference on Artificial Intelligence, 624--631 , 2020 Scholar

Reasoning with a bounded number of resources in ATL+

F Belardinelli, S Demri

The resource-bounded alternating-time temporal logic RB±ATL combines strategic reasoning with reasoning about resources. Its model-checking problem is known to be 2EXPTIME-complete (the same as its proper extension RB±ATL*). Several fragments have been identified to lower the complexity. In this work, we consider the variant RB±ATL+ which permits Boolean combinations of path formulae starting with single temporal operators, but restricted to a single resource, providing an interesting trade-off between temporal expressivity and resource analysis. We show that the model-checking problem for RB±ATL+ restricted to a single agent and a single resource is Δp2-complete, hence the same as for CTL+. In this case reasoning about resources comes at no extra computational cost. Furthermore, we show that, with an arbitrary number of agents and a fixed number of resources, the problem can be solved in EXPTIME using a Turing reduction to the parity game problem for alternating vector addition systems with states.

Link to paper
BibTeX citation
@article{reasoning-with-a-bounded-number-of-resources-in-atl-2020-mr3,
  title = {Reasoning with a bounded number of resources in ATL+},
  author = {F Belardinelli, S Demri},
  journal = {24th European Conference on Artificial Intelligence, 624--631 , 2020},
  year = {2020},
  url = {https://hal.science/hal-03005853/},
}
arXiv preprint arXiv:2006.08767 , 2020 Scholar

Systematic generalisation through task temporal logic and deep reinforcement learning

BG León, M Shanahan, F Belardinelli

This work introduces a neuro-symbolic agent that combines deep reinforcement learning (DRL) with temporal logic (TL) to achieve systematic zero-shot, i.e., never-seen-before, generalisation of formally specified instructions. In particular, we present a neuro-symbolic framework where a symbolic module transforms TL specifications into a form that helps the training of a DRL agent targeting generalisation, while a neural module learns systematically to solve the given tasks. We study the emergence of systematic learning in different settings and find that the architecture of the convolutional layers is key when generalising to new instructions. We also provide evidence that systematic learning can emerge with abstract operators such as negation when learning from a few training examples, which previous research has struggled with.

Link to paper
BibTeX citation
@article{systematic-generalisation-through-task-temporal-logic-and-de,
  title = {Systematic generalisation through task temporal logic and deep reinforcement learning},
  author = {BG León, M Shanahan, F Belardinelli},
  journal = {arXiv preprint arXiv:2006.08767 , 2020},
  year = {2020},
  url = {https://arxiv.org/abs/2006.08767},
}
24th European Conference on Artificial Intelligence (ECAI 2020) , 2020 Scholar

Verifying strategic abilities in multi-agent systems via first-order entailment

F Belardinelli, V Malvone

The verification of strategic abilities of autonomous agents is a key subject of investigation in the applications of formal methods to the design and certification of multi-agent systems. In this contribution we propose a novel approach to this verification problem. Inspired by recent advances, we introduce a translation from Alternating-time Temporal Logic (ATL) to First-order Logic (FOL). We show that our translation is sound on a fragment of ATL, that we call ATL-live, as it is suitable to express liveness properties in MAS. Further, we show how the universal model checking problem for ATL-live can be reduced to semantic entailment in FOL. Finally, we prove that ATL-live is maximal in the sense that if any other ATL connective is added, non-FOL reasoning techniques would be required. These results are meant to be a first step towards the application of FOL reasoners to model check strategic abilities expressed in ATL.

Link to paper
BibTeX citation
@article{verifying-strategic-abilities-in-multi-agent-systems-via-fir,
  title = {Verifying strategic abilities in multi-agent systems via first-order entailment},
  author = {F Belardinelli, V Malvone},
  journal = {24th European Conference on Artificial Intelligence (ECAI 2020) , 2020},
  year = {2020},
  url = {https://hal.science/hal-03170192/},
}

2019

Proceedings of the 18th International Conference on Autonomous Agents and … , 2019 Scholar

A social choice theoretic perspective on database aggregation

F Belardinelli, U Grandi

Aggregating information coming from multiple sources is a longstanding problem in both knowledge representation and multiagent systems (see, e.g., [28]). Depending on the chosen representation for the incoming pieces of knowledge or information, a number of competing approaches have seen the light in these literatures. Belief merging [21-23] studies the problem of aggregating propositional formulas coming from different agents into a set of models, subject to some integrity constraint. Judgment and binary aggregation [11, 12, 17] ask individual agents to report yes/no opinions on a set of logically-related binary issues (the agenda) in order to take a collective decision. Social welfare functions, the cornerstone problem in social choice theory (see, e.g., [2]), can also be viewed as mechanisms to merge conflicting information, namely the individual preferences of voters expressed in the form of linear orders over a set of alternatives. Other examples include graph aggregation [13], multi-agent argumentation [6-8], ontology merging [26], and clustering aggregation [15]. In this work we take a general perspective and represent individual knowledge coming from multiple sources as a profile of databases, modelled as finite relational structures [1, 25]. Our motivation lies in between two possibly conflicting views on the problem of information fusion. On the one hand, the study of information merging (typically knowledge or beliefs) in knowledge representation has focused on the design of rules that guarantee the consistency of the outcome, with the main driving principles inspired from the literature on belief revision. On the other hand, social choice …

Link to paper
BibTeX citation
@article{a-social-choice-theoretic-perspective-on-database-aggregatio,
  title = {A social choice theoretic perspective on database aggregation},
  author = {F Belardinelli, U Grandi},
  journal = {Proceedings of the 18th International Conference on Autonomous Agents and … , 2019},
  year = {2019},
  url = {http://www.doc.ic.ac.uk/~fbelard/Documents/papers/aamas2019b.pdf},
}
Proceedings of the AAAI Conference on Artificial Intelligence 33 (01), 6030-6037 , 2019 Scholar

An abstraction-based method for verifying strategic properties in multi-agent systems with imperfect information

F Belardinelli, A Lomuscio, V Malvone

We investigate the verification of Multi-agent Systems against strategic properties expressed in Alternating-time Temporal Logic under the assumptions of imperfect information and perfect recall. To this end, we develop a three-valued semantics for concurrent game structures upon which we define an abstraction method. We prove that concurrent game structures with imperfect information admit perfect information abstractions that preserve three-valued satisfaction. Further, we present a refinement procedure to deal with cases where the value of a specification is undefined. We illustrate the overall procedure in a variant of the Train Gate Controller scenario under imperfect information and perfect recall.

Link to paper
BibTeX citation
@article{an-abstraction-based-method-for-verifying-strategic-properti,
  title = {An abstraction-based method for verifying strategic properties in multi-agent systems with imperfect information},
  author = {F Belardinelli, A Lomuscio, V Malvone},
  journal = {Proceedings of the AAAI Conference on Artificial Intelligence 33 (01), 6030-6037 , 2019},
  year = {2019},
  url = {https://aaai.org/ojs/index.php/AAAI/article/view/4558},
}
International Conference on Principles and Practice of Multi-Agent Systems … , 2019 Scholar

Decidable verification of agent-based data-aware systems

F Belardinelli, V Malvone

In recent years the area of knowledge representation and reasoning (KR&R) has witnessed a growing interest in the modelling and analysis of data-driven/data-centric systems. These are systems in which the two tenets of data and processes are given equal importance, differently from traditional approaches whereby the data content is typically abstracted away in order to make the reasoning task easier. However, if data-aware systems (DaS) are to be deployed in concrete KR&R scenarios, it is key to develop tailored verification techniques, suitable to account for both data and processes. In this contribution we consider, for the first time to our knowledge, the parameterised verification of DaS. In particular, we prove that, under specific assumptions, this problem is decidable by computing a suitable cut-off value. We illustrate the proposed approach with a use case from the literature on business process modelling.

BibTeX citation
@article{decidable-verification-of-agent-based-data-aware-systems-201,
  title = {Decidable verification of agent-based data-aware systems},
  author = {F Belardinelli, V Malvone},
  journal = {International Conference on Principles and Practice of Multi-Agent Systems … , 2019},
  year = {2019},
  url = {https://link.springer.com/chapter/10.1007/978-3-030-33792-6_4},
}
International Conference on Principles and Practice of Multi-Agent Systems … , 2019 Scholar

Imperfect information in alternating-time temporal logic on finite traces

F Belardinelli, A Lomuscio, A Murano, S Rubin

We introduce a logic to reason about strategic abilities in finite games under imperfect information. We interpret Alternating-time Temporal Logic on interpreted systems with final states, where agents only have partial observability of the system's global state. We consider the model checking problem in this setting. We prove that the complexity results available for the case of infinite traces carry over to the finite traces case. We show that when only public actions are allowed, the verification problem under perfect recall becomes decidable.

Link to paper
BibTeX citation
@article{imperfect-information-in-alternating-time-temporal-logic-on-,
  title = {Imperfect information in alternating-time temporal logic on finite traces},
  author = {F Belardinelli, A Lomuscio, A Murano, S Rubin},
  journal = {International Conference on Principles and Practice of Multi-Agent Systems … , 2019},
  year = {2019},
  url = {https://link.springer.com/chapter/10.1007/978-3-030-33792-6_31},
}
arXiv preprint arXiv:1903.04350 , 2019 Scholar

Model Checking ATL* on vCGS

F Belardinelli, C Dima, I Boureanu, V Malvone

We prove that model checking ATL* on concurrent game structures with propositional control for atom-visibility (vCGS) is undecidable. To do so, we reduce the model checking problem for ATL* on iCGS, which is known to be undecidable, to the corresponding problem on vCGS.

Link to paper
BibTeX citation
@article{model-checking-atl-on-vcgs-2019-mr35r1eaaaaj-ldfaerwxgeuc,
  title = {Model Checking ATL* on vCGS},
  author = {F Belardinelli, C Dima, I Boureanu, V Malvone},
  journal = {arXiv preprint arXiv:1903.04350 , 2019},
  year = {2019},
  url = {https://arxiv.org/abs/1903.04350},
}
18th International Conference on Autonomous Agents and Multiagent Systems … , 2019 Scholar

Resource-bounded ATL: the quest for tractable fragments

F Belardinelli, S Demri

Resource-aware logics to represent strategic abilities in multi-agent systems are notoriously hard to handle, as they combine strategic reasoning with reasoning about resources. In this work, we begin by providing a general overview of the model-checking results currently available for the Resource-bounded Alternating-time Temporal Logic RB±ATL. This allows us to identify several open problems in the literature, as well as to establish relationships with RBTL-like logics, when RB±ATL is restricted to a single agent. Then, we tackle one such open problem that we deem highly significant: we show that model checking RB±ATL is PTIME-complete when restricted to a single agent and a single resource. To do so, we make a valuable detour via vector addition systems with states, proving new complexity results for their state-reachability and nontermination problems when restricted to a single counter. Thus, reasoning about resources comes at no extra computational cost in the single-resource, single-agent case.
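
For intuition on the single-counter detour, here is a hedged sketch, not the paper's algorithm: a one-counter vector addition system with states, explored by breadth-first search over configurations. The explicit cap on counter values makes the search finite; that cap is an illustrative assumption, not the bound established in the paper.

```python
from collections import deque

def state_reachable(transitions, start, target, cap):
    """BFS over configurations (state, counter) of a one-counter VASS.
    Transitions are (src, delta, dst) triples; the counter must stay
    non-negative and, for this illustration, below `cap`."""
    seen = {(start, 0)}
    frontier = deque(seen)
    while frontier:
        q, c = frontier.popleft()
        if q == target:
            return True
        for src, delta, dst in transitions:
            nc = c + delta
            if src == q and 0 <= nc <= cap and (dst, nc) not in seen:
                seen.add((dst, nc))
                frontier.append((dst, nc))
    return False

# Looping in q0 produces tokens; moving to q1 consumes three of them.
ts = [("q0", +1, "q0"), ("q0", -3, "q1")]
print(state_reachable(ts, "q0", "q1", cap=10))  # True
```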

Link to paper
BibTeX citation
@article{resource-bounded-atl-the-quest-for-tractable-fragments-2019-,
  title = {Resource-bounded ATL: the quest for tractable fragments},
  author = {F Belardinelli, S Demri},
  journal = {18th International Conference on Autonomous Agents and Multiagent Systems … , 2019},
  year = {2019},
  url = {https://hal.science/hal-02362636/},
}
Conférence Nationale en Intelligence Artificielle , 2019 Scholar

Resource-bounded ATL: the Quest for Tractable Fragments (En quête de fragments mécanisables pour ATL avec ressources)

F Belardinelli, S Demri

In this work, we begin by presenting a state of the art of the results on the model-checking problem for the logic RB±ATL, a version of ATL with resources. This allows us to identify several open problems and to establish relationships with RBTL-like logics when RB±ATL is restricted to a single agent. We then show that the model-checking problem for RB±ATL restricted to one agent and one resource is PTIME-complete. To do so, we take a slight detour via vector addition systems with states, proving new complexity results for control-state reachability and for nontermination when only a single counter is allowed. This article was presented at AAMAS'19, Montréal, May 2019.

Link to paper
BibTeX citation
@article{resource-bounded-atl-the-quest-for-tractable-fragments-en-qu,
  title = {Resource-bounded ATL: the Quest for Tractable Fragments (En quête de fragments mécanisables pour ATL avec ressources)},
  author = {F Belardinelli, S Demri},
  journal = {Conférence Nationale en Intelligence Artificielle , 2019},
  year = {2019},
  url = {https://hal.science/hal-02328812/},
}
arXiv preprint arXiv:1907.10492 , 2019 Scholar

Social choice methods for database aggregation

F Belardinelli, U Grandi

Knowledge can be represented compactly in multiple ways, from a set of propositional formulas, to a Kripke model, to a database. In this paper we study the aggregation of information coming from multiple sources, each source submitting a database modelled as a first-order relational structure. In the presence of integrity constraints, we identify classes of aggregators that respect them in the aggregated database, provided these are satisfied in all individual databases. We also characterise languages for first-order queries for which the answer to a query on the aggregated database coincides with the aggregation of the answers obtained on each individual database. This contribution is meant to be a first step in the application of techniques from social choice theory to knowledge representation in databases.
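
A minimal sketch of the commutation property studied here, under the assumption that databases and query answers are finite sets of facts; `majority` and `commutes` are hypothetical helpers, not the paper's definitions, and one passing profile is a test, not a proof.

```python
def majority(profile):
    """Keep every fact present in a strict majority of the databases."""
    facts = set().union(*profile)
    return {f for f in facts if 2 * sum(f in db for db in profile) > len(profile)}

def commutes(query, aggregate, profile):
    """Test, on one profile, whether querying commutes with aggregation."""
    return query(aggregate(profile)) == aggregate([query(db) for db in profile])

# An atomic query: select all tuples of relation R.
q = lambda db: {f for f in db if f[0] == "R"}
profile = [{("R", "a"), ("S", "a")}, {("R", "a")}, {("S", "b")}]
print(commutes(q, majority, profile))  # True on this profile
```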

Link to paper
BibTeX citation
@article{social-choice-methods-for-database-aggregation-2019-mr35r1ea,
  title = {Social choice methods for database aggregation},
  author = {F Belardinelli, U Grandi},
  journal = {arXiv preprint arXiv:1907.10492 , 2019},
  year = {2019},
  url = {https://arxiv.org/abs/1907.10492},
}

Strategic Reasoning about General Game Playing under Imperfect Information

YI Zhu, F Belardinelli

In this technical report we focus mainly on the translation from the Game Description Language with Imperfect Information (GDL-II) to Concurrent Game Structures with imperfect information (iCGS) using Alternating-time Temporal Logic (ATL), as well as on the relationship between the two. We provide background on the syntax and semantics of GDL-II, and on the syntax of ATL and its semantics over iCGS.

Link to paper
BibTeX citation
@article{strategic-reasoning-about-general-game-playing-under-imperfe,
  title = {Strategic Reasoning about General Game Playing under Imperfect Information},
  author = {YI Zhu, F Belardinelli},
  year = {2019},
  url = {https://www.irit.fr/agape/wp-content/uploads/2020/01/main-4.pdf},
}
28th International Joint Conference on Artificial Intelligence (IJCAI 2019 … , 2019 Scholar

Strategy logic with simple goals: Tractable reasoning about strategies

F Belardinelli, W Jamroga, V Malvone, A Murano

In this paper we introduce Strategy Logic with Simple Goals (SL[SG]), a fragment of Strategy Logic that strictly extends the Alternating-time Temporal Logic ATL by allowing arbitrary quantification over the agents' strategies. Our motivation comes from game-theoretic applications, such as expressing Stackelberg equilibria in games, coercion in voting protocols, and module checking for simple goals. We prove that model checking SL[SG] is P-complete, the same as for ATL. Thus, the extra expressive power comes at no computational cost as far as verification is concerned.
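
As a hedged illustration of the general shape of such sentences (the concrete formula is ours, not the paper's): a block of strategy quantifiers, followed by a binding of agents to strategy variables, followed by a single temporal goal, here a Stackelberg-style requirement for a leader a and a follower b.

```latex
% Hypothetical SL[SG]-style sentence: leader a commits to strategy x so
% that, whatever strategy y follower b picks, the goal is eventually met.
\exists x\, \forall y\, (a, x)(b, y)\, \mathbf{F}\, \mathit{goal}
```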

Link to paper
BibTeX citation
@article{strategy-logic-with-simple-goals-tractable-reasoning-about-s,
  title = {Strategy logic with simple goals: Tractable reasoning about strategies},
  author = {F Belardinelli, W Jamroga, V Malvone, A Murano},
  journal = {28th International Joint Conference on Artificial Intelligence (IJCAI 2019 … , 2019},
  year = {2019},
  url = {https://hal.science/hal-02377403/},
}
Proceedings of the 18th International Conference on Autonomous Agents and … , 2019 Scholar

Verifying Strategic Abilities in Multi-agent Systems with Private-Data Sharing

F Belardinelli, I Boureanu, C Dima, V Malvone

Most formalisms for multi-agent systems (MAS) are not adept at explicitly expressing data sharing between agents. Yet, disclosure and hiding of data amongst agents impacts their strategic abilities, and so has a strong bearing on non-classical logics that formally capture agents' coalitions, e.g., Alternating-time Temporal Logic (ATL) [1]. To this end, we devise concurrent game structures with propositional control for atom-visibility (vCGS). In vCGS, agents a and b have an explicit endowment to see some of each other's variables, without other agents partaking in this. Second, we ascertain that the model checking problem for ATL with imperfect information and perfect recall on vCGS is undecidable. Third, we put forward a methodology to model check a formula φ of ATL* on a vCGS M by verifying a suitable translation of φ on a submodel of M.

Link to paper
BibTeX citation
@article{verifying-strategic-abilities-in-multi-agent-systems-with-pr,
  title = {Verifying Strategic Abilities in Multi-agent Systems with Private-Data Sharing},
  author = {F Belardinelli, I Boureanu, C Dima, V Malvone},
  journal = {Proceedings of the 18th International Conference on Autonomous Agents and … , 2019},
  year = {2019},
  url = {https://hal.u-pec.fr/hal-04572109},
}

2018

IJCAI 18, 77-83 , 2018 Scholar

Alternating-time Temporal Logic on Finite Traces.

F Belardinelli, A Lomuscio, A Murano, S Rubin

We develop a logic-based technique to analyse finite interactions in multi-agent systems. We introduce a semantics for Alternating-time Temporal Logic (for both perfect and imperfect recall) and its branching-time fragments in which paths are finite instead of infinite. We study the validities of these logics and present optimal algorithms for their model-checking problems in the perfect recall case.

Link to paper
BibTeX citation
@article{alternating-time-temporal-logic-on-finite-traces-2018-mr35r1,
  title = {Alternating-time Temporal Logic on Finite Traces.},
  author = {F Belardinelli, A Lomuscio, A Murano, S Rubin},
  journal = {IJCAI 18, 77-83 , 2018},
  year = {2018},
  url = {https://www.researchgate.net/profile/Aniello-Murano/publication/326206267_Alternating-time_Temporal_Logic_on_Finite_Traces/links/5be55832a6fdcc3a8dc8a658/Alternating-time-Temporal-Logic-on-Finite-Traces.pdf},
}
KR 18, 435-444 , 2018 Scholar

Approximating Perfect Recall When Model Checking Strategic Abilities.

F Belardinelli, A Lomuscio, V Malvone

We investigate the notion of bounded recall in the context of model checking ATL* and ATL specifications in multi-agent systems with imperfect information. We present a novel three-valued semantics for ATL*, respectively ATL, under bounded recall and imperfect information, and study the corresponding model checking problems. Most importantly, we show that the three-valued semantics constitutes an approximation with respect to the traditional two-valued semantics. In the light of this, we construct a sound, albeit partial, algorithm for model checking two-valued perfect recall via its approximation as three-valued bounded recall.
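
The partial procedure described above admits a compact hedged sketch (interfaces assumed, not the authors' code): increase the recall bound until the three-valued verdict becomes defined; by soundness of the approximation, that verdict transfers to perfect recall, and the procedure may simply fail to answer.

```python
from enum import Enum

class V3(Enum):
    TRUE = 1
    FALSE = 0
    UNDEF = 2

def check_via_bounded_recall(model, spec, check3_bounded, max_bound):
    """Try recall bounds k = 1, 2, ...; a defined three-valued verdict is
    sound for perfect recall. Returns None when the budget runs out,
    reflecting that the method is partial by design."""
    for k in range(1, max_bound + 1):
        verdict = check3_bounded(model, spec, k)  # three-valued, bound k
        if verdict is not V3.UNDEF:
            return verdict
    return None
```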

Link to paper
BibTeX citation
@article{approximating-perfect-recall-when-model-checking-strategic-a,
  title = {Approximating Perfect Recall When Model Checking Strategic Abilities.},
  author = {F Belardinelli, A Lomuscio, V Malvone},
  journal = {KR 18, 435-444 , 2018},
  year = {2018},
  url = {https://cdn.aaai.org/ocs/18010/18010-78664-1-PB.pdf},
}
16th International Conference on Principles of Knowledge Representation and … , 2018 Scholar

Bisimulations for logics of strategies: A study in expressiveness and verification

F Belardinelli, C Dima, A Murano

In this paper we advance the state of the art on the subject of bisimulations for logics of strategies. Bisimulations are a key notion to study the expressive power of a modal language, as well as for applications to system verification. In this contribution we present novel notions of bisimulation for several significant fragments of Strategy Logic (SL), and prove that they preserve the interpretation of formulas in the corresponding fragments. In selected cases we are able to prove that such bisimulations enjoy the Hennessy-Milner property. Finally, we make use of bisimulations to study the expressiveness of the various fragments of SL, including the complexity of their model checking problems.
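
For orientation, the clauses below recall the standard modal bisimulation that strategy-aware notions of this kind refine; this is textbook material, not the paper's novel definitions, which additionally constrain how strategies are matched across models.

```latex
% R \subseteq S \times S' is a bisimulation between M and M' iff,
% for all (s, s') \in R:
\begin{aligned}
&\textbf{(atoms)} && \pi(s) = \pi'(s')\\
&\textbf{(forth)} && \forall t \,\bigl(s \to t \Rightarrow \exists t'\, (s' \to' t' \wedge (t, t') \in R)\bigr)\\
&\textbf{(back)}  && \forall t' \,\bigl(s' \to' t' \Rightarrow \exists t\, (s \to t \wedge (t, t') \in R)\bigr)
\end{aligned}
```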

Link to paper
BibTeX citation
@article{bisimulations-for-logics-of-strategies-a-study-in-expressive,
  title = {Bisimulations for logics of strategies: A study in expressiveness and verification},
  author = {F Belardinelli, C Dima, A Murano},
  journal = {16th International Conference on Principles of Knowledge Representation and … , 2018},
  year = {2018},
  url = {https://cdn.aaai.org/ocs/17992/17992-78665-1-PB.pdf},
}
arXiv preprint arXiv:1802.08586 , 2018 Scholar

Database Aggregation

F Belardinelli, U Grandi

Knowledge can be represented compactly in a multitude of ways, from a set of propositional formulas, to a Kripke model, to a database. In this paper we study the aggregation of information coming from multiple sources, each source submitting a database modelled as a first-order relational structure. In the presence of an integrity constraint, we identify classes of aggregators that respect it in the aggregated database, provided all individual databases satisfy it. We also characterise languages for first-order queries for which the answer to a query on the aggregated database coincides with the aggregation of the answers obtained on each individual database. This contribution is meant to be a first step in the application of techniques from rational choice theory to knowledge representation in databases.

Link to paper
BibTeX citation
@article{database-aggregation-2018-mr35r1eaaaaj-nmxildl6lwmc,
  title = {Database Aggregation},
  author = {F Belardinelli, U Grandi},
  journal = {arXiv preprint arXiv:1802.08586 , 2018},
  year = {2018},
  url = {https://arxiv.org/abs/1802.08586},
}
AAMAS 18, 1865-1867 , 2018 Scholar

Decidable Verification of Multi-agent Systems with Bounded Private Actions.

F Belardinelli, A Lomuscio, A Murano, S Rubin

Introduction. A key feature of the frontier between decidability and undecidability in the verification of multi-agent systems (MAS) with respect to strategy-based specifications is the assumption made on the agents' information. For instance, the complexity of model checking MAS against ATL is markedly different for agents with complete information compared with agents with incomplete information, i.e., PTIME-complete [1] and undecidable [7], respectively, assuming agents with perfect recall. It follows that the verification problem for agents with incomplete information and perfect recall remains undecidable in any formalism stronger than ATL, such as Strategy Logic and most of its variants [4-6, 12]. It therefore remains important to identify expressive fragments whose model checking problem is decidable. One way of achieving this consists in identifying classes of MAS, still endowed with perfect recall and incomplete information, for which model checking is decidable. For example, such a result is proved in [2, 3] for MAS where all communication is via public actions. In this paper we further pursue this line and show that the work in [2] can be generalised much further. Specifically, we show that: (i) model checking remains decidable even if non-public actions are permitted a bounded number of times along any execution; (ii) the temporal language underlying epistemic SL can be considerably extended, from LTL to LDL [13], at no extra computational cost. By doing so, we show decidability for a much larger class of MAS against a considerably expressive specification language. Interpreted Systems (IS) with Explicit Public Actions. Interpreted systems …

Link to paper
BibTeX citation
@article{decidable-verification-of-multi-agent-systems-with-bounded-p,
  title = {Decidable Verification of Multi-agent Systems with Bounded Private Actions.},
  author = {F Belardinelli, A Lomuscio, A Murano, S Rubin},
  journal = {AAMAS 18, 1865-1867 , 2018},
  year = {2018},
  url = {https://www.doc.ic.ac.uk/~fbelard/Documents/aamas2018.pdf},
}
Springer International Publishing , 2018 Scholar

Multi-Agent Systems and Agreement Technologies

F Belardinelli, E Argente

This volume contains revised versions of the papers presented at the 15th European Conference on Multi-Agent Systems (EUMAS 2017) and the 5th International Conference on Agreement Technologies (AT 2017), which were both held at the Université d'Évry-Val-d'Essonne during December 14-15, 2017. EUMAS 2017 followed the tradition of previous editions (Oxford 2003, Barcelona 2004, Brussels 2005, Lisbon 2006, Hammamet 2007, Bath 2008, Agia Napa 2009, Paris 2010, Maastricht 2011, Dublin 2012, Toulouse 2013, Prague 2014, Athens 2015, and Valencia 2016) in aiming to provide the prime European forum for presenting and discussing agents research, as the annual designated event of the European Association of Multi-Agent Systems (EURAMAS). AT 2017 was the fifth instalment in a series of events (after Dubrovnik 2012, Beijing 2013, Athens 2015, and Valencia 2016) that focus on bringing …

Link to paper
BibTeX citation
@article{multi-agent-systems-and-agreement-technologies-2018-mr35r1ea,
  title = {Multi-Agent Systems and Agreement Technologies},
  author = {F Belardinelli, E Argente},
  journal = {Springer International Publishing , 2018},
  year = {2018},
  url = {https://link.springer.com/content/pdf/10.1007/978-3-030-01713-2.pdf},
}
Artificial Intelligence 263, 3-45 , 2018 Scholar

Second-order propositional modal logic: Expressiveness and completeness results

F Belardinelli, W Van Der Hoek, LB Kuijer

In this paper we advance the state-of-the-art on the application of second-order propositional modal logic (SOPML) in the representation of individual and group knowledge, as well as temporal and spatial reasoning. The main theoretical contributions of the paper can be summarised as follows. Firstly, we introduce the language of (multi-modal) SOPML and interpret it on a variety of different classes of Kripke frames according to the features of the accessibility relations and of the algebraic structure of the quantification domain of propositions. We provide axiomatisations for some of these classes, and show that SOPML is unaxiomatisable on the remaining classes. Secondly, we introduce novel notions of (bi)simulations and prove that they indeed preserve the interpretation of formulas in (the universal fragment of) SOPML. Then, we apply this formal machinery to study the expressiveness of Second-order …
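
A minimal sketch of the SOPML language, assuming the standard presentation in which propositional atoms can be quantified over on top of an ordinary multi-modal base:

```latex
% Grammar of (multi-modal) SOPML: atoms p, Boolean connectives, one box
% per modality i, and universal quantification over propositions.
\varphi ::= p \mid \neg \varphi \mid \varphi \wedge \varphi
        \mid \Box_i\, \varphi \mid \forall p\, \varphi
```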

Link to paper
BibTeX citation
@article{second-order-propositional-modal-logic-expressiveness-and-co,
  title = {Second-order propositional modal logic: Expressiveness and completeness results},
  author = {F Belardinelli, W Van Der Hoek, LB Kuijer},
  journal = {Artificial Intelligence 263, 3-45 , 2018},
  year = {2018},
  url = {https://www.sciencedirect.com/science/article/pii/S0004370218303886},
}

Towards the Verification of Strategic Ability in MAS with Private Data-Sharing

I Boureanu, C Dima, F Belardinelli, V Malvone

The Increasing Importance of Private Information Sharing. The 5th generation of networks is set to be one of the important ICT developments of the 2020s. In this way, smart buildings, cities, and cars will be in constant connectivity, transferring data back and forth, sometimes on private channels. These private communications should lead to interlocutors reactively controlling and influencing subsequent information flow and parts of the system. Already, the UK's Hive heating-control system privately gets a reading from the smart-thermostat sensors and, based on the privately attained measurements, reactively changes the temperature settings in your house. At the same time, the private nature of Internet-of-Things communications is to be safeguarded at all costs (Samaila et al. 2017). In a nutshell, private communications will be at the core of the 2020s' ubiquitous systems, and are likely to be further endorsed and imposed. So, being able to explicitly and naturally model data-sharing in multi-agent systems (MAS) is essential. However, current formalisms for MAS do not cater for explicit data-sharing between agents, neither at the syntactic nor at the semantic level. Such sharing can nonetheless be achieved, e.g., via mechanisations based on duplicating agents' variables and adding synchronisation-driven actions. But this is tedious and error-prone, and, above all, inefficient, requiring extra actions and a general duplication of agents' local variables. To fill this gap, we put forward a formalism for the explicit expression of private data-sharing in MAS. That is, we encode syntactically and in a natural semantics "MAS with 1-to …

Link to paper
BibTeX citation
@article{towards-the-verification-of-strategic-ability-in-mas-with-pr,
  title = {Towards the Verification of Strategic Ability in MAS with Private Data-Sharing},
  author = {I Boureanu, C Dima, F Belardinelli, V Malvone},
  year = {2018},
  url = {http://www.dis.uniroma1.it/~kr18actions/papers/ACTIONSKR18_paper_15.pdf},
}

2017

arXiv preprint arXiv:1707.08735 , 2017 Scholar

A logic for global and local announcements

F Belardinelli, H van Ditmarsch, W van der Hoek

In this paper we introduce global and local announcement logic (GLAL), a dynamic epistemic logic with two distinct announcement operators, indexed to a subset of the set of all agents, for global and local announcements respectively. The boundary case corresponds to the public announcement known from the literature. Unlike standard public announcements, which are model transformers, the global and local announcements are pointed model transformers. In particular, the update induced by the announcement may be different in different states of the model. Therefore, the resulting computations are trees of models, rather than the typical sequences. A consequence of our semantics is that modally bisimilar states may be distinguished in our logic. We therefore provide a stronger notion of bisimilarity and show that it preserves modal equivalence in GLAL. Additionally, we show that GLAL is strictly more expressive than public announcement logic with common knowledge. We prove a wide range of validities for GLAL involving the interaction between dynamics and knowledge, and show that the satisfiability problem for GLAL is decidable. We illustrate the formal machinery by means of detailed epistemic scenarios.

Link to paper
BibTeX citation
@article{a-logic-for-global-and-local-announcements-2017-mr35r1eaaaaj,
  title = {A logic for global and local announcements},
  author = {F Belardinelli, H van Ditmarsch, W van der Hoek},
  journal = {arXiv preprint arXiv:1707.08735 , 2017},
  year = {2017},
  url = {https://arxiv.org/abs/1707.08735},
}
26th International Joint Conference on Artificial Intelligence (IJCAI 2017 … , 2017 Scholar

Dynamic logic for data-aware systems: decidability results

F Belardinelli, A Herzig

We introduce a first-order extension of dynamic logic (FO-DL), suitable to represent and reason about the behaviour of Data-aware Systems (DaS), i.e., systems whose data content is explicitly exhibited in their description. We illustrate the expressivity of the formal framework by modelling English auctions as DaS and by specifying relevant properties in FO-DL. Most importantly, we develop an abstraction-based verification procedure, thus proving that the model checking problem for DaS against FO-DL is decidable under some mild assumptions on the interpretation domain.

Link to paper
BibTeX citation
@article{dynamic-logic-for-data-aware-systems-decidability-results-20,
  title = {Dynamic logic for data-aware systems: decidability results},
  author = {F Belardinelli, A Herzig},
  journal = {26th International Joint Conference on Artificial Intelligence (IJCAI 2017 … , 2017},
  year = {2017},
  url = {https://hal.science/hal-01629359/},
}

Model Checking Multi-Agent Systems

F Belardinelli

BibTeX citation
@article{model-checking-multi-agent-systems-2017-mr35r1eaaaaj-hmod-77,
  title = {Model Checking Multi-Agent Systems},
  author = {F Belardinelli},
  year = {2017},
  url = {https://scholar.google.com/scholar?cluster=5629462132987839309&hl=en&oi=scholarr},
}
IJCAI, 98-104 , 2017 Scholar

Parameterised Verification of Data-aware Multi-Agent Systems.

F Belardinelli, P Kouvaros, A Lomuscio

We introduce parameterised data-aware multiagent systems, a formalism to reason about the temporal properties of arbitrarily large collections of homogeneous agents, each operating on an infinite data domain. We show that their parameterised verification problem is semi-decidable for classes of interest. This is demonstrated by separately addressing the unboundedness of the number of agents and the data domain. In doing so we reduce the parameterised model checking problem for these systems to that of parameterised verification for interleaved interpreted systems. We illustrate the expressivity of the formal model by modelling English auctions with an unbounded number of bidders on unbounded data.

BibTeX citation
@article{parameterised-verification-of-data-aware-multi-agent-systems,
  title = {Parameterised Verification of Data-aware Multi-Agent Systems.},
  author = {F Belardinelli, P Kouvaros, A Lomuscio},
  journal = {IJCAI, 98-104 , 2017},
  year = {2017},
  url = {https://pkouvaros.github.io/publications/IJCAI17-BKL/paper.pdf},
}
IJCAI 17, 91-97 , 2017 Scholar

Verification of broadcasting multi-agent systems against an epistemic strategy logic.

F Belardinelli, A Lomuscio, A Murano, S Rubin

We study a class of synchronous, perfect-recall multi-agent systems with imperfect information and broadcasting, i.e., fully observable actions. We define an epistemic extension of strategy logic with incomplete information, under the assumption of uniform and coherent strategies. In this setting, we prove that the model checking problem, and thus rational synthesis, is decidable with non-elementary complexity. We exemplify the applicability of the framework on a rational secret-sharing scenario.

Link to paper
BibTeX citation
@article{verification-of-broadcasting-multi-agent-systems-against-an-,
  title = {Verification of broadcasting multi-agent systems against an epistemic strategy logic.},
  author = {F Belardinelli, A Lomuscio, A Murano, S Rubin},
  journal = {IJCAI 17, 91-97 , 2017},
  year = {2017},
  url = {https://people.na.infn.it/~Murano/pubblicazioni/IJCAI17-Alessio.pdf},
}