Introduction
This chapter considers the notion of sharedness through the lens of physics. In particular, it uses the free energy principle (Friston, 2013) to deconstruct the fundaments of self-organisation—and how scale-invariant self-organisation rests upon shared generative or world models. Technically, this aspect of ensemble or distributed self-organisation manifests as generalised synchrony (a.k.a., synchronisation of chaos). Crucially, the ensuing synchronisation can be read in terms of inference and [Bayesian] belief sharing. In short, by considering the existential imperatives for any thing (sic), one is necessarily led to the conclusion that for some thing (sic) to self-organise is: (i) to share a common ground or generative model of a shared world, and (ii) to exchange beliefs (and knowledge) with other things, under that model.
In what follows, we first briefly rehearse the free energy principle, with a focus on what constitutes the self in self-organisation. We then consider two aspects of the ensuing Bayesian mechanics—or physics of sentience—that speak to the chapter by Chris Fields; namely, the representation of uncertainty or precision, and the constructive aspect of our generative models, respectively. This chapter concludes by considering the foundational role of shared generative models when deploying the free energy principle over nested scales of self-organisation.
On the nature of sentience from first principles
We start with the notion of things that can be individuated from other things. This individuation licenses a description of self-organisation in terms of inference and modelling. Examples here range from the good regulator theorem from cybernetics (Conant and Ashby, 1970) through to modern-day treatments under the free energy principle (Ramstead et al., 2022). The dénouement of these treatments is straightforward: to exist is to be in states that are characteristic of the thing in question. Formally, to exist is to maximise the path (or time) integral of the likelihood of being in characteristic or attracting states (Friston et al., 2022a). This can be read as maximising the marginal likelihood of exchanges with the world, also known as Bayesian model evidence. From a physicist's perspective, this is nothing more than a principle of least action, or a constrained maximum entropy principle (Sakthivadivel, 2022). From the perspective of a philosopher or psychologist, to exist is to self-evidence (Hohwy, 2016). Model evidence speaks to the key role of the generative models that underwrite the marginal likelihood and its variational bounds: i.e., variational free energy (Winn and Bishop, 2005). The generative model is nothing more than a probabilistic statement of the characteristic states jointly occupied by the world and the thing in question. So what does this buy us?
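To make this concrete, the variational bound in question can be written in a standard form (notation assumed here: o for sensory outcomes, s for their hidden causes, and q(s) for an approximate posterior over those causes):

```latex
F[q,o] \;=\; \mathbb{E}_{q(s)}\!\left[\ln q(s) - \ln p(o,s)\right]
       \;=\; \underbrace{D_{\mathrm{KL}}\!\left[q(s)\,\|\,p(s\mid o)\right]}_{\geq\,0} \;-\; \ln p(o)
       \;\geq\; -\ln p(o)
```

Because the Kullback-Leibler divergence is non-negative, minimising F furnishes an upper bound on surprisal, namely negative log evidence; equivalently, minimising free energy maximises a lower bound on model evidence.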
If existing, in a characteristic way, just is securing evidence for our generative models, then everything that exists can be described as engaging in inference. On a Bayesian gloss, this can be read as Bayesian belief updating, under a generative model that specifies attracting or characteristic states. This reading is appealing because it concurs with physics, in the sense that physics is a description of inference or measurement: ranging from quantum information theory (Fields et al., 2021a) through to the observational nature of physics (Cook, 1994). Indeed, the Bayesian mechanics unpacked below shares exactly the same foundations as quantum, classical and statistical mechanics, but pays careful attention to the individuation of things from the rest of the world (Friston, 2019). This line of thinking places generative models and belief updating centre stage in exchanges with the world.
Is belief updating a sufficient account of sentient behaviour? Perhaps. However, there are different kinds of things, which are characterised by the generative models they entail. One key aspect of these models is their temporal depth; that is, the ability to model the future and, in particular, the consequences of actions or choices. This means there may be certain kinds of things (like us) that have deep generative models, which support planning as inference (Attias, 2003; Botvinick and Toussaint, 2012; Lanillos et al., 2021). At this point, we encounter the kinds of things that evince agency, in a nontrivial sense; namely, things that choose what to do by inferring their actions. On this view, intelligent behaviour has an interesting corollary:
It transpires that agents—to which the principle of least action applies—minimise a functional called expected free energy. Expected free energy can be decomposed into expected information gain and expected value, where value is the log likelihood of attracting or characteristic outcomes. This means that intelligent behaviour complies with the dual aspects of Bayesian optimality; namely, optimal Bayesian design (Lindley, 1956; MacKay, 1992) and decision theory (Berger, 2011), respectively. In short, to be intelligent is to be curious (with values).
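This decomposition can be expressed compactly (in the notation above, with policies π and prior preferences C; the exact form varies slightly across the literature):

```latex
G(\pi) \;=\;
\underbrace{-\,\mathbb{E}_{q(o\mid\pi)}\Big[D_{\mathrm{KL}}\big[q(s\mid o,\pi)\,\|\,q(s\mid\pi)\big]\Big]}_{\text{expected information gain (negated)}}
\;\underbrace{-\;\mathbb{E}_{q(o\mid\pi)}\big[\ln p(o\mid C)\big]}_{\text{expected value (negated)}}
```

Minimising G therefore maximises expected information gain (optimal Bayesian design) and expected value (Bayesian decision theory) at once.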
The foundations of this account of sentient behaviour can be traced back to the students of Plato and were most clearly articulated in the 19th century by Helmholtz as unconscious inference (Helmholtz, 1866/1962, 1878/1971)—ideas that are reminiscent of Kantian philosophy. These ideas endured through the 20th century in several flavours; for example, analysis by synthesis (Neisser, 1967; Yuille and Kersten, 2006), epistemological automata (MacKay, 1956), perception as hypothesis testing (Gregory, 1968, 1980) and, in machine learning, the Helmholtz machine (Dayan et al., 1995). The inference narrative returned to prominence at the turn of this century, with a resurgence of interest in enactivist approaches (Barsalou, 2008; Chemero, 2009; Clark, 2001; Clark and Chalmers, 1998; Goodwin, 2000; Hutto and Myin, 2013) that now predominate in the cognitive and systems neurosciences, in the form of things like predictive processing and active inference (Ballard et al., 2013; Clark, 2013b; Friston et al., 2010; Hohwy, 2013; Hohwy, 2020; Parr and Friston, 2018; Rao and Ballard, 1999; Seth, 2014; Wiese, 2017).
Active inference can be read as an enactive version of the Bayesian brain hypothesis (Doya, 2007; Knill and Pouget, 2004) that subsumes sentience (perceptual inference) and behaviour by treating control and planning as inference (Attias, 2003; Botvinick and Toussaint, 2012; Da Costa et al., 2020; Lanillos et al., 2021). So, what is inference? In this setting, inference just refers to a process that maximises the evidence for some (generative) model or hypothesis about the causes of (sensory) data. Model evidence is also known as marginal likelihood; namely, the likelihood of some data, under a model of how those data were generated. Maximising the evidence for our own generative model is sometimes called self-evidencing (Hohwy, 2016). In brief, active inference casts the brain as a fantastic organ: a generator of fantasies, hypotheses and predictions that are tested against sensory evidence. One might ask how this account of sentient behaviour speaks to the nature of things. In what follows, we will look at the brain and how one can understand its functional architecture under this kind of physics.
Predictive coding and Bayesian belief updating
Given a generative model, there are well-described belief-updating or propagation schemes that specify the requisite message passing that must, in some form, be implemented by [neuronal] networks. For generative models based upon continuous states of the world, these schemes are known as Bayesian filters or predictive coding (Bastos et al., 2012; Egner and Summerfield, 2013; Elias, 1955; Rao and Ballard, 1999; Srinivasan et al., 1982). For generative models of discrete states (e.g., “I am in the kitchen”, as opposed to “I am at these continuous GPS coordinates”), the message passing schemes are variously known as belief propagation or variational message passing (Dauwels, 2007; Friston et al., 2017; Parr et al., 2019; Winn and Bishop, 2005). All of these schemes can be cast as a gradient ascent on model evidence or marginal likelihood (Da Costa et al., 2021). In short, [neuronal] dynamics just are a process of inference. See Figure 1 for a schematic description of predictive coding.
Crucially, the gradients that subtend neuronal dynamics—and consequent belief updating—can always be formulated as a prediction error (Friston et al., 2017): that is, the divergence between predictions of sensory input and the observed sensations. In predictive coding schemes, it is thought that prediction errors are represented explicitly by superficial pyramidal cells in the upper layers of the cortex (Adams et al., 2013; Bastos et al., 2012; Lee and Mumford, 2003; Mumford, 1992; Shipp, 2016).
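By way of illustration, the following is a minimal, single-level predictive coding scheme in Python (a toy sketch under an assumed linear-Gaussian generative model, not any published implementation; all names and parameter values are illustrative). Belief updating is a gradient descent on free energy, driven by precision-weighted prediction errors:

```python
import numpy as np

# Toy generative model: o = W @ s + noise, with prior s ~ N(mu_prior, 1/pi_s)
rng = np.random.default_rng(0)
W = np.array([[1.0, 0.5],
              [0.2, 1.0]])          # likelihood mapping (assumed known)
mu_prior = np.zeros(2)              # prior expectation over hidden states
pi_o, pi_s = 4.0, 1.0               # sensory and prior precisions

s_true = np.array([1.0, -0.5])
o = W @ s_true + rng.normal(0, pi_o ** -0.5, 2)   # noisy observation

mu = mu_prior.copy()                # posterior expectation (belief)
lr = 0.05                           # integration step size
for _ in range(200):
    eps_o = o - W @ mu              # sensory prediction error
    eps_s = mu - mu_prior           # prior prediction error
    # Gradient descent on free energy:
    # dF/dmu = -W.T @ (pi_o * eps_o) + pi_s * eps_s
    mu += lr * (W.T @ (pi_o * eps_o) - pi_s * eps_s)

print("posterior expectation:", mu, "true state:", s_true)
```

The fixed point of this descent is the posterior mean, which balances (precision-weighted) sensory evidence against prior beliefs.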
Summary
This leads to a picture of hierarchical inference—in the brain—as reciprocal message passing between the levels of a cortical hierarchy, in which prediction errors ascend from lower to higher levels to drive changes in neuronal populations encoding states of affairs in the world. These populations (e.g., deep pyramidal cells in lower layers of the cortex) then supply a counter-stream of descending predictions that resolve or cancel prediction errors at lower levels (Bastos et al., 2012; Markov et al., 2013; Mumford, 1992): e.g., by targeting inhibitory interneurons that are coupled to the superficial pyramidal cells broadcasting prediction errors (Pinotsis et al., 2014; Shaw et al., 2017). This architecture also plays out under discrete generative models and has become something of a workhorse for understanding recurrent message passing in cortical and subcortical hierarchies. On this view, self-evidencing can be construed as minimising prediction errors or, more simply, surprise. In terms of information theory, this surprise (a.k.a., surprisal) is known as self-information. Variational free energy provides a tractable bound on self-information. In summary, minimising free energy is equivalent to minimising surprise, which is synonymous with maximising the evidence for generative models under which sensory exchange with the environment is unsurprising.
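Schematically, this reciprocal message passing can be summarised for level i of a hierarchy as follows (a standard predictive coding form, with expectations μ, predictions g and precisions Π; indexing conventions vary across papers):

```latex
\varepsilon_i \;=\; \mu_{i-1} - g_i(\mu_i)
\qquad
\dot{\mu}_i \;\propto\; -\,\partial_{\mu_i} F
\;=\; \big(\partial_{\mu_i} g_i\big)^{\!\top}\,\Pi_i\,\varepsilon_i \;-\; \Pi_{i+1}\,\varepsilon_{i+1}
```

The first term is the ascending, precision-weighted prediction error from the level below; the second is the error incurred under descending predictions from the level above.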
The importance of being precise
The foregoing speaks to a physics of sentience or Bayesian mechanics that allows us to read the behaviour of others—and of ourselves—in terms of inference about the sensed world. One popular application of the free energy principle is to cast belief updating or message passing in the brain in terms of predictive coding. There are many interesting aspects to this application. We will pick out one that underwrites the notion of felt uncertainty. This rests upon the notion of precision and, on a psychological reading, notions of covert action or mental activity, such as attention.
Prediction errors (i.e., the free energy gradients that drive belief updating) can be regarded as carrying newsworthy information—at any given hierarchical level—to the level above. However, this is not the complete story. Higher levels have to select which prediction errors to listen to; much in the same way that we select our trustworthy news channels or sources of information. This selection rests upon predictions of predictability or precision (depicted in teal in Figure 1). Affording certain prediction errors greater precision increases their influence on belief updating and has all the hallmarks of attentional selection (Ainley et al., 2016; Auksztulewicz and Friston, 2015; Clark, 2013a; Feldman and Friston, 2010; Kok et al., 2012; Limanowski, 2022). Physiologically, this simply entails an increase in the excitability or postsynaptic gain of neuronal populations broadcasting prediction errors. On this view, there is an intimate relationship between attention and the modulation of synaptic efficacy by classical neuromodulators and nonlinear postsynaptic responses responsible for mediating the exchange between fast-spiking inhibitory interneurons and pyramidal cells (Auksztulewicz and Friston, 2015; Bauer et al., 2014; Graboi and Lisman, 2003; Lisman, 2012; Lisman and Buzsaki, 2008; Pinotsis et al., 2014; Shaw et al., 2017; Sohal et al., 2009; Spencer et al., 2003).
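Continuing the toy Python sketch above (again, purely illustrative), precision enters as a multiplicative gain on prediction errors; 'attending' to a sensory channel then amounts to assigning it a higher precision:

```python
import numpy as np

def posterior_mean(o, W, mu_prior, Pi_o, pi_s, lr=0.05, n_iter=200):
    """Gradient descent on free energy with channel-wise sensory precision Pi_o."""
    mu = mu_prior.copy()
    for _ in range(n_iter):
        eps_o = o - W @ mu                       # sensory prediction errors
        mu += lr * (W.T @ (Pi_o * eps_o) - pi_s * (mu - mu_prior))
    return mu

W = np.eye(2)
o = np.array([1.0, 1.0])
mu_prior = np.zeros(2)

attend_first  = posterior_mean(o, W, mu_prior, Pi_o=np.array([8.0, 0.5]), pi_s=1.0)
attend_second = posterior_mean(o, W, mu_prior, Pi_o=np.array([0.5, 8.0]), pi_s=1.0)
print(attend_first)   # belief pulled strongly toward the first channel's evidence
print(attend_second)  # ...and here toward the second
```

The same evidence thus licenses different beliefs, depending on which prediction errors are afforded precision; physiologically, this corresponds to modulating the postsynaptic gain of error units.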
One crucial aspect of this precision engineering is that it underwrites our ability to filter out—or ignore—certain prediction errors when they are deemed imprecise. A key example is the attenuation of sensory prediction errors that report the consequences of movement (Blakemore et al., 1999; Brown et al., 2013; Hughes et al., 2013; Limanowski, 2017; Oestreich et al., 2015; Shergill et al., 2005). If we could not ignore the proprioceptive and somatosensory afferents—supplying evidence that we are not moving—then any intended or predicted movement would be revised immediately, and we would not be able to move. This mandates a transient suspension of sensory precision during active sensing: cf. saccadic suppression of optic flow signals during saccadic eye movements (Wurtz, 2008).
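In the same toy setting, one can see why attenuation matters. Consider a predicted (intended) posture pitted against proprioceptive evidence of stillness: with intact sensory precision, the motor prediction is revised away; with attenuated precision, it survives long enough to be fulfilled (all values illustrative):

```python
# Predicted (intended) posture vs. current proprioceptive evidence of stillness
mu_prior, o = 1.0, 0.0            # 'I will be there' vs. 'I am not moving'
pi_s = 1.0                        # precision of the motor prediction

for pi_o in (8.0, 0.1):           # intact vs. attenuated sensory precision
    mu = (pi_o * o + pi_s * mu_prior) / (pi_o + pi_s)   # posterior expectation
    print(f"sensory precision {pi_o}: posterior prediction {mu:.2f}")
# With intact precision, the movement prediction is revised away (~0.11);
# with attenuation, it survives (~0.91) to be fulfilled by motor reflexes.
```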
Summary
The narrative so far is that things like brains are statistical organs, generating predictions and revising (subpersonal Bayesian) beliefs on the basis of prediction errors. Crucially, these predictions are contextualised with predictions of precision or predictability that instantiate attentional or intentional set; allowing us to select or attenuate prediction errors via a process of precision weighting. This precision weighting is nothing more than modulating the gain, postsynaptic sensitivity or excitability of appropriate neuronal populations.
The schematic in Figure 1 highlights a key aspect of the generative models that are fit for purpose to navigate our lived world; namely, their hierarchical structure. This kind of architecture can be viewed from many perspectives; namely, in terms of its factorial, hierarchical and temporal depth. These notions are foregrounded to connect with the treatment offered by Chris Fields, who emphasises the constructivist aspect of active inference; in the sense that we are in the game of constructing hypotheses or explanations for the sensory or empirical data available to us. In short, every thing is a hypothesis that is most apt to explain our sensory samples—or the way that we measure the world. Crucially, these constructs may include space and time per se, and, perhaps, the very constructs of self and experience (Clark et al., 2019).
Shared narratives and generalised synchrony
In this final section, we consider the implications of self-evidencing under the free energy principle. There are many ways of approaching this subject: ranging from variational approaches to niche construction (Badcock et al., 2019; Bruineberg et al., 2018; Constant et al., 2018; Constant et al., 2019; Vasil et al., 2020; Veissiere et al., 2019), particularly in an evolutionary setting (see Richard Watson, this issue), through federated inference and belief sharing, to communication and distributed cognition (Fields et al., 2021b; Friston and Frith, 2015a; Friston and Frith, 2015b; Friston et al., 2020; Friston et al., 2022b; Levin, 2019; Pezzulo et al., 2021).
These treatments or perspectives share a common theme. Specifically, if any given thing—at any given scale—can be read as self-evidencing, then it will act upon its world to minimise surprise and maximise predictability. If this world is constituted by other things—that are therefore acting under the same imperatives—there is an inevitable (or perhaps emergent) tendency towards mutual predictability. From the perspective of a physicist, it will look as if an ensemble of self-evidencing agents evinces something called generalised synchrony, also known as synchronisation of chaos. Generalised synchrony comes in two flavours: identical and not. In identical synchronisation, there is a one-to-one mapping between the dynamics of two systems. When this symmetry is broken, we have generalised synchrony proper. Generalised synchrony means that I can use my states to predict your dynamics and vice versa. It can be described in terms of a tendency for both of us to occupy a shared set of states on something called a synchronisation manifold; namely, an attracting set that lies in our joint state space.
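A minimal numerical illustration (not one of the chapter's simulations): two chaotic Lorenz systems, each coupled to what it 'senses' of the other's first coordinate, collapse onto a synchronisation manifold from very different initial conditions. Coupling gain and step size are illustrative:

```python
import numpy as np

def lorenz(v, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Flow of the Lorenz system."""
    x, y, z = v
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

# Two chaotic systems, diffusively coupled through their first coordinate
dt, k = 0.005, 8.0                       # Euler step and coupling gain
a = np.array([1.0, 1.0, 1.0])
b = np.array([-5.0, 0.0, 20.0])          # very different initial conditions
for _ in range(40000):
    coupling = k * (b[0] - a[0])         # each system 'senses' the other
    a = a + dt * (lorenz(a) + np.array([coupling, 0.0, 0.0]))
    b = b + dt * (lorenz(b) + np.array([-coupling, 0.0, 0.0]))

print(np.abs(a - b))  # ~0: the trajectories now share a synchronisation manifold
```

With identical parameters, this coupling yields identical synchronisation; introducing a parameter mismatch between the two systems (e.g., different rho) breaks that symmetry and yields generalised synchrony, in which each system's states remain a (more complicated) function of the other's.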
Occupying these states just is to minimise our joint variational free energy. Note that, by evincing generalised synchrony, we are occupying shared characteristic states; namely, the states to which we are attracted, and which have value for us. In other words, it will look as if I am acting to realise your characteristic (attracting) states, and you mine. This could be regarded as a form of mathematical ‘caring’ that emerges under a shared synchronisation manifold.
Figures 2 and 3 illustrate the emergence of generalised synchronisation in the context of dyadic exchange. This example rests upon a predictive coding simulation of birdsong communication between two songbirds that have the same generative model (Friston and Frith, 2015a). By taking turns, they are never surprised—and never surprise themselves. This is because the only thing that changes periodically is whose turn it is to generate the sensorium. Figuratively speaking, the two birds jointly create a predictable sensorium by ‘singing from the same hymn sheet’.
But what would happen if the two birds had different generative models? Because free energy minimisation entails learning (i.e., belief updating about the parameters of a generative model at a slower timescale), one bird will come to learn the other's generative model in order to make it more predictable (and vice versa). The degree to which the generative model of some thing converges to the generative model of another depends upon their relative precision (Friston and Frith, 2015b). This is nicely conceptualised in terms of parent-daughter and teacher-student exchanges, and indeed interactions between any two things that have different experiences of a shared world.
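A hypothetical sketch of this precision-dependent convergence (nothing here is taken from the cited simulations; all quantities are invented for illustration): two agents each predict the other's output with a parameter theta, and each update is weighted by sensory precision relative to that agent's confidence in its own model. The less confident agent does most of the converging:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two agents, each predicting the other's song with mean theta.
# pi_prior encodes how confident each is in its own model; updates
# weight sensory evidence by pi_o / (pi_o + pi_prior).
theta = np.array([0.0, 5.0])          # student, teacher
pi_prior = np.array([1.0, 50.0])      # teacher is far more precise (confident)
pi_o, lr = 2.0, 0.1

for _ in range(500):
    obs = theta[::-1] + rng.normal(0, 0.1, 2)   # each hears the other (plus noise)
    theta += lr * pi_o * (obs - theta) / (pi_o + pi_prior)

print(theta)  # both ~4.7: the less confident agent has done nearly all the converging
```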
The attendant phenomena and mechanisms of belief sharing and distributed cognition (i.e., federated inference) are manifold. These can be considered in terms of the spread of ideas (Albarracin et al., 2022; Heins et al., 2023; Kastel and Hesp, 2021), through to the emergence of language as a free energy minimising process (Constant et al., 2019; Vasil et al., 2020; Veissiere et al., 2019). We will conclude with an example of generalised synchrony at a basal level, which can be read as morphogenesis or pattern formation in the development of a system of multiple things; for example, cells (Kuchling et al., 2020).
Figures 4 and 5 are taken from Friston et al. (2015). These simulations represent an early attempt to show that morphogenesis is an emergent property of joint free energy minimisation. In this example, the dynamics of the different constituents of an ensemble—here, the cells of a multicellular ensemble—are markedly different; however, they all share the same (epigenetically specified) generative model, illustrated in Figure 4.
Crucially, this generative model prescribes what would be sensed in different contexts. The twist in this example is that each agent provides the sensations for other agents, by broadcasting its internal states (i.e., implicitly, its beliefs). In other words, every agent sets the context for other agents. This means that each agent has to find the right context to make its sensations predictable—and this context is established by other agents. In this example, the context is simply the physical location or place from which beliefs or signals are broadcast. This means that there is one unique, free energy minimising solution, in which all the agents (i.e., cells) have ‘found their place’. This is a nice example of how committing to a shared generative model leads to individuation and collective behaviour of a highly structured and unique sort: see Figure 5.
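The logic of 'finding one's place' can be caricatured in a few lines of Python (a loose abstraction of joint free energy minimisation, not the simulation behind Figure 5; all names and values are invented): each cell carries an intrinsic signal, the shared model says what should be sensed at each place, and beliefs about 'my place' sharpen until every cell claims a unique position:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
expected = np.linspace(0.0, 1.0, n)          # shared model: signal expected at each place
intrinsic = rng.permutation(expected)        # each cell's (scrambled) intrinsic signal

# Joint prediction error if cell i were to occupy place j
cost = (intrinsic[:, None] - expected[None, :]) ** 2

# Soft assignment by alternating normalisation (a Sinkhorn-style relaxation):
q = np.exp(-cost / 0.01)
for _ in range(100):
    q /= q.sum(axis=1, keepdims=True)        # each cell occupies one place...
    q /= q.sum(axis=0, keepdims=True)        # ...and each place hosts one cell

print(q.round(2))   # ~permutation matrix: every cell has 'found its place'
```

The alternating normalisation enforces the two constraints jointly, so the unique low-cost assignment emerges as a shared, mutually consistent solution.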
Conclusion
Clearly, we—where ‘we’ is read as every thing that exists—constitute an ecosystem and cannot all share the same generative model. Can the notion of generalised synchrony then be extended to situations in which different kinds of things are coupled to each other? The answer is yes, and speaks to the notion that every thing must act in a (mathematically) caring way towards another, in the following sense: if the observable aspects of something have characteristic states (i.e., the states of its attracting set), then to minimise surprising observations is to act in a way that increases the probability that things will remain in the states to which they are attracted. If to ‘suffer’ is to be in a surprising state, then any enduring ecosystem—that evinces the distributed sentience implied by the free energy principle—will look as if it is caring for its denizens.