PhD Defense: Zhuofan Xu

Permutation equivariant and permutation invariant reinforcement learning for multi-agent systems

Thursday 11 December 2025 at 9:00
ENS Paris-Saclay, Room 1Z25 and online

Abstract: Deep Reinforcement Learning (DRL) has achieved remarkable success in sequential decision-making tasks, from games to robotics. Extending DRL to Multi-Agent Reinforcement Learning (MARL) introduces additional challenges: agents must coordinate in high-dimensional environments, training becomes unstable due to non-stationarity, and learned strategies often fail to generalize across team sizes or tasks.

A key source of inefficiency lies in ignoring structural symmetries. In cooperative MARL, the order of agents carries no intrinsic meaning: permuting inputs should not alter the underlying decision problem. Standard neural architectures, however, rarely enforce this property, leading to redundant parameters, poor sample efficiency, and unstable learning.

This thesis develops principled methods that integrate permutation equivariance (PE) and permutation invariance (PI) directly into neural architectures for MARL. We design novel PE and PI networks, such as the Permutation-Equivariant Neural Network (PENN), its invariant variant IPENN, and the Global-Local Permutation Equivariant (GLPE) structures at the core of the Centralized Permutation Equivariant (CPE) framework. These architectures are combined with established MARL paradigms, including Centralized Training with Decentralized Execution (CTDE), value decomposition methods (QMIX, QPLEX), and actor–critic algorithms (MAPPO).

The proposed approaches are evaluated on a wide range of benchmarks, from simplified environments such as Multi-Armed Bandits (MAB) to large-scale cooperative settings including Combat, SMAC, RWARE, and MPE. Results demonstrate significant improvements in stability, parameter efficiency, and final performance compared to baseline methods.

Finally, exploratory directions are investigated, spanning attention-based PE modules, permutation-stable structures, Fourier-inspired formulations, and complementary training strategies such as curriculum learning and self-play. Together, these results highlight the potential of symmetry-aware design to advance scalable and interpretable multi-agent learning.
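For readers unfamiliar with the two properties named in the abstract, the following is a minimal illustrative sketch, not the thesis's PENN, IPENN, or GLPE architectures. It assumes a generic Deep Sets-style linear layer in which each agent's output depends on its own features plus the mean over all agents. The function names, weight names, and shapes are hypothetical and chosen only for illustration. Permuting the agents permutes the outputs of the equivariant layer in the same way, and leaves the summed readout unchanged.

```python
import numpy as np


def equivariant_layer(x, w_self, w_mean):
    """Generic permutation-equivariant linear layer (illustrative assumption).

    x has shape (n_agents, d_in). Each output row combines the agent's own
    features with the mean over all agents, so reordering the rows of x
    reorders the output rows identically.
    """
    return x @ w_self + x.mean(axis=0, keepdims=True) @ w_mean


def invariant_readout(x, w_self, w_mean):
    """Permutation-invariant readout: sum the equivariant outputs over agents."""
    return equivariant_layer(x, w_self, w_mean).sum(axis=0)


rng = np.random.default_rng(0)
n_agents, d_in, d_out = 4, 3, 2
x = rng.normal(size=(n_agents, d_in))       # one feature row per agent
w_self = rng.normal(size=(d_in, d_out))     # hypothetical per-agent weights
w_mean = rng.normal(size=(d_in, d_out))     # hypothetical pooled-feature weights
perm = rng.permutation(n_agents)            # an arbitrary reordering of agents

# Equivariance: f(permuted x) equals permuted f(x).
equivariance_holds = np.allclose(
    equivariant_layer(x[perm], w_self, w_mean),
    equivariant_layer(x, w_self, w_mean)[perm],
)

# Invariance: g(permuted x) equals g(x).
invariance_holds = np.allclose(
    invariant_readout(x[perm], w_self, w_mean),
    invariant_readout(x, w_self, w_mean),
)

print(equivariance_holds, invariance_holds)
```

Because the weights are shared across agents, the parameter count in this sketch does not grow with the number of agents, which is one intuition behind the parameter-efficiency argument made above.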

The defense will be held in English.

Jury:

  • Alain DUTECH, Laboratoire lorrain de Recherche en Informatique et ses Applications (LORIA), Inria, Reviewer and examiner
  • Régis SABBADIN, Unité de Mathématiques et Informatique Appliquées de Toulouse (MIAT), INRAE, Reviewer and examiner
  • Sergio MOVER, Laboratoire d’Informatique de l’École polytechnique (LIX), École polytechnique, Examiner
  • Nicolas SABOURET, Laboratoire Interdisciplinaire des Sciences du Numérique (LISN), Université Paris-Saclay, Examiner
  • Lina YE, Laboratoire Méthodes Formelles (LMF), CentraleSupélec, Examiner
  • Matthias FÜGGER, CNRS, Thesis supervisor
  • Benedikt BOLLIG, CNRS, Co-supervisor
  • Thomas NOWAK, ENS Paris-Saclay, Co-supervisor