Abstract

We present a cerebellar architecture with two main characteristics. The first one is that complex spikes respond to increases in sensory errors. The second one is that cerebellar modules associate particular contexts where errors have increased in the past with corrective commands that stop the increase in error. We analyze our architecture formally and computationally for the case of reaching in a 3D environment. In the case of motor control, we show that there are synergies of this architecture with the Equilibrium-Point hypothesis, leading to novel ways to solve the motor error and distal learning problems. In particular, the presence of desired equilibrium lengths for muscles provides a way to know when the error is increasing, and which corrections to apply. In the context of Threshold Control Theory and Perceptual Control Theory we show how to extend our model so it implements anticipative corrections in cascade control systems that span from muscle contractions to cognitive operations.

Keywords: cerebellum, reaching, equilibrium point, motor learning, complex spikes

1. Introduction

The anatomy of the cerebellum presents a set of well established and striking facts (Eccles et al., 1967; Ito, 2006), which have inspired a variety of functional theories over the years. The cerebellum receives two main input sources, the mossy fibers and the climbing fibers. The mossy fibers convey a vast amount of afferent and efferent information, and synapse onto granule cells, Golgi cells, and neurons of the deep cerebellar nuclei. Granule cells exist in very large numbers, and could be considered the input layer of the cerebellum; they send axons that bifurcate in the cerebellar cortex, called parallel fibers, innervating Purkinje cells and molecular layer interneurons. Purkinje cells have intricate dendritic arbors with about 150,000 parallel fiber connections. On the other hand, each Purkinje cell receives a single climbing fiber that can provide thousands of synapses. Activation of a climbing fiber reliably causes a sequence of tightly coupled calcium spikes, known as a complex spike. In contrast, simple spikes are the action potentials tonically produced by Purkinje cells, modulated by parallel fiber inputs and feedforward inhibition from molecular layer interneurons. The sole output from the cerebellar cortex is constituted by the Purkinje cell axons, which send inhibitory projections to the deep cerebellar nuclei and to the vestibulum. Cells in the deep cerebellar nuclei can send projections to diverse targets, such as the brainstem, the thalamus, the spinal cord, and the inferior olivary nucleus. The inferior olivary nucleus is the origin of climbing fibers, which are the axons of electrotonically-coupled olivary cells that experience subthreshold oscillations in their membrane potential.

There is a prevailing view that the cerebellum is organized into modular circuits that perform similar computations. Sagittal regions of Purkinje cells called microzones receive climbing fibers from a cluster of coupled olivary neurons, and tend to be activated by the same functional stimuli. Purkinje cells in a microzone project to the same group of cells in the cerebellar nuclei, which in turn send inhibitory projections to the olivary neurons that innervate the microzone. A microzone together with its associated cerebellar nuclear cells is called a microcomplex, which together with its associated olivary cells constitutes an olivo-cerebellar module.

In one of the first and most influential theories about cerebellar function, developed by a succession of researchers (Marr, 1969; Albus, 1971; Ito et al., 1982), the convergence of mossy fibers (which carry sensory and motor signals into the cerebellum) onto Purkinje cells supports pattern recognition in a manner similar to a perceptron. This pattern recognition capacity is used to improve motor control, and the Marr-Albus-Ito hypothesis states that the other major cerebellar input, the climbing fibers, provide a training signal that, thanks to conjunctive LTD on the parallel fiber synapses into Purkinje cells, allows for the right patterns to be selected. Conjuctive LTD (Long-Term Depression) reduces the strength of parallel fiber synapses when they happen to be active at the same time as climbing fiber inputs. Within this general framework, a persistent challenge comes in determining what the right patterns are, and how they are used to improve motor control.

One common trend for cerebellar models of motor control is to assume that the cerebellum is involved in providing anticipative corrections to performance errors (Manto et al., 2012), and that this is done by forming internal models of the controlled objects (Wolpert98,Ebner13). Forward models take as inputs a command and a current state, returning the consequences of that command, often in the form of a predicted state. Inverse models take as their input a desired state and a current state, returning the commands required to reach the desired state. Adaptive learning in the cerebellum is often assumed to involve using error signals to learn these types of internal models. It should be noted that some computational elements (such as adaptive filters), which could be implemented by cerebellar microzones, can in principle learn to implement either a forward or an inverse model depending on its input/output connections and on the nature of its error signal (Porrill et al., 2013).

The error signal required by a forward model is a sensory error, which consists of the difference between the desired sensory state (e.g., a hand trajectory) and the perceived sensory state. In contrast, inverse models require a motor error signal that indicates the difference between a given command and the command that would have produced the desired outcome. Figures 1A,B shows two well known proposed architectures that allow the cerebellum to use forward and inverse models to reduce performance errors, respectively called the recurrent architecture, and feedback error learning. A recent review (Ito, 2013) examined the signal contents of climbing fibers for different cerebellar circuits, and found that both sensory and motor errors might be present, bringing the possibility of having both forward and inverse models in the cerebellum.

Figure 1

(A) The recurrent architecture (Porrill and Dean, 2007) uses a forward model as the adaptive element in a controller. This forward model learns to predict the response of the controlled object to the motor commands, using an error that considers the difference...

Inverse models in the cerebellum present some difficulties. The first one is known as the motor error problem, and consists on the requirement that the climbing fibers carry an unobservable motor error rather than the observed sensory error signal. This creates difficulties when applying them to the control of complex plants (Porrill et al., 2013). A second difficulty is the evidence that simple spikes in Purkinje cells are consistent with a forward model, but probably not with an inverse model (Ebner and Pasalar, 2008). Although climbing fiber may carry information about motor errors, most studies seem to find correlations with sensory signals and sensory errors (e.g., Ghelarducci et al., 1975; Stone and Lisberger, 1986; Ekerot et al., 1991; Yanagihara and Udo, 1994; Kitazawa et al., 1998; Simpson et al., 2002).

There are two other problems that must be addressed by cerebellar models that form internal models, whether forward or inverse (Porrill and Dean, 2007). The distal error problem happens when we use output errors (such as sensory signals) to train the internal parameters of a neural network. Backpropagation is a common—although biologically implausible—way to deal with this problem. In reaching, the distal error problem can be conceived in terms of knowing which muscles to stimulate if you want the hand to move in a certain direction. The nature of the distal error problem is the same one as that of the motor error problem, since they both are credit assignment problems. After an error is made, the credit assignment problem consists in knowing which control signals contributed to the error, and which ones can reduce it. The redundancy problem happens when a set of commands lead to the same outcome, leading to incorrect generalizations when that set is non convex. One common setting where the redundancy problem arises is in reaching. The human arm, including the shoulder and elbow joints has 5° of freedom (without considering shoulder translation), allowing many joint configurations that place the hand in the same location.

The recurrent architecture of Figure ​1A, and the feedback error learning scheme of Figure ​1B are shown here because they present two different ways of addressing the motor error and redundancy problems. The recurrent architecture is trained with sensory error, so the motor error problem is not an issue; moreover, this architecture receives motor commands as its input, so it doesn't have to solve the redundancy problem. Feedback error learning approximates the motor error by using the output of a feedback controller. The feedback controller thus acts as a transformation from sensory error into motor error. If the feedback controller can properly handle redundancy, then so will the inverse model that it trains.

In this paper we propose a new cerebellar architecture that successfully addresses the motor error problem, the distal learning problem, and the redundancy problem. This architecture is specified at an abstract level, and consists of descriptions of the inputs and outputs to cerebellar modules, the content of climbing fiber signals, and the nature of the computations performed by the cerebellar microzone.

In our architecture, the role of the cerebellum is to provide anticipative corrections to the commands issued by a central controller, and we explore 4 variations on how to associate a predicted increase in error with the corrective motor commands. For example, in the first version of our architecture (called model 1 in the Section 3), shown in Figure ​1C, these corrections are learned by associating the sensory/motor context shortly before an error with the corrective response issued by the central controller shortly afterwards. We thus propose that the cerebellar inputs carried by mossy fiber signals consist of all sensory and motor signals that can be used to predict a future state. The cerebellar output could be a predicted set of motor commands similar to a correction issued by the central controller in the past. The climbing fiber activity rises in response to an increase of an error measure over time, not to instantaneous error values. Cerebellar microzones act to predict an increase in error, and this prediction is then associated with a correction. For example, in our first variation of the architecture (model 1), particular sensory/motor contexts are associated with a response by the central controller happening shortly after an increase in the climbing fiber activity. This is consistent with many models based on the Marr-Albus-Ito framework. If a bank of filters (presumably arising from computations in the granule cell layer) are placed in the inputs, then this associator becomes functionally similar to adaptive filter models commonly found in cerebellum literature (Fujita, 1982; Dean and Porrill, 2008). Those models usually assume that mossy fiber inputs correlated with climbing fiber activity cause a decrease in the firing rate of Purkinje cells because of conjunctive LTD, leading to an increase in firing rate at the cerebellar or vestibular nuclei. This could be conceived as associating a particular pattern of mossy fiber inputs with a response in cerebellar nuclei, with the input filters giving the system the ability to recognize certain temporal patterns.

We explore the ideas of our cerebellar architecture by implementing it in computational and mathematical models of reaching in 3D space. We chose this task because it presents challenges that should be addressed by cerebellar models, namely distal learning, redundancy, and timing. There is a tendency for studies of the cerebellum in motor control to model problems where the error signal is 1-dimensional, thus hiding the difficulties of distal learning and redundancy. For example, the distance between the hand and the target is a 3-dimensional error, but it can be decomposed into 1-dimensional errors (left-right, up-down, forward-backward). In a different example, for 2D reaching with a planar arm joint-angle errors can be used, so the error signal already implies what the right correction is. In the present study we try to break away from this tendency.

In the context of reaching, the idea that the cerebellum could function by anticipatively applying the same corrections as the central controller raises valid concerns about stability. We address these concerns by showing that if the central controller acts like a force always pointing at the target, and whose magnitude depends only in the distance between the hand and the target, then an idealized implementation of our cerebellar architecture will necessarily reduce the energy of the system, resulting in smaller amplitude for the oscillations, and less angular momentum. The idealized implementation of the architecture thus yields sufficient conditions for its successful application. This result is presented in the Supplementary Material.

In addition to our mathematical model, we implemented four computational models of a 3D reaching task embodying simple variations of our proposed architecture. The central controller in the four models uses an extension of the Equilibrium-point hypothesis (Feldman and Levin, 2009), described in the Section 2. The presence of equilibrium points permits ways of addressing the motor error problem different than using stored copies of efferent commands from the central controller, and ways of detecting errors different than visually monitoring the distance between the hand and the target. Our four models thus explore variations of the architecture, in which either the learning signal or the corrections are generated using proprioceptive signals from muscles. For these models the controlled plant is a 4 DOF arm actuated by 11 Hill-type muscles. The cerebellar module associates contexts, represented by radial basis functions in the space of afferent and motor signals with corrective motor commands. These associations between contexts and motor responses happen whenever a learning signal is received, which happens when there is an increase of the error.

As mentioned above, we use two types of errors in our computational models. The first type of error is the distance between the hand and the target, which proves to be sufficient to obtain predictive corrections. By virtue of using the equilibrium-point hypothesis in the central controller we can alternatively use a second type of error signal generated for individual muscles that extend when they should be contracting. This allows the cerebellum to perform anticipative corrections in a complex multidimensional task like reaching using learning signals that arise from 1-dimensional systems. This learning mechanism can trivially be extended to serial cascades of feedback control systems, such as those posited by Perceptual Control Theory (Powers, 1973, 2005) and Threshold Control Theory (Feldman and Levin, 2009; Latash et al., 2010), allowing the cerebellum to perform corrections at various levels of a hierarchical organization spanning from individual muscle contractions to complex cognitive operations. We elaborate on this in the Discussion.

2. Materials and methods

2.1. Physical simulation of the arm

In order to test the principles of our cerebellar model in 3D reaching tasks we created a detailed mechanical simulation of a human arm. Our arm model contains a shoulder joint with 3° of rotational freedom, and an elbow joint with one degree of rotational freedom. Inertia tensors for the arm, forearm, and hand were created assuming a cylindrical geometry with size and mass typical of human subjects. The actuators consist of 11 composite muscles that represent the main muscle groups of the human arm (Figure ​2). Some of these muscles wrap around “bending lines,” which are used to model the curved shape of real muscles as they wrap around bones and other tissue. The force that each muscle produces in response to a stimulus comes from a Hill-type model used previously with equilibrium point controllers (Gribble et al., 1998). The mechanical simulation was implemented in SimMechanics, which is part of the Matlab/Simulink software package (http://www.mathworks.com/), release 2012b. Source code is available from the first author upon request.

Figure 2

Geometry of the arm model. Blue lines represent the upper arm and forearm, with the small black sphere representing the shoulder. Red lines represent muscles. Cyan lines are bending lines. The colored spheres (with color representing their position along...

The coordinate for the targets used in our test reaches are shown in Table ​1.

Table 1

Coordinates used for the targets in the test reaches.

2.2. Central controller

The central controller we use to perform reaching is a modified version of Threshold Control Theory (TCT, Feldman and Levin, 2009). TCT is an extension of a biological control scheme known as the Equilibrium Point (EP) hypothesis. The lambda version of the EP hypothesis states that the control signals used in the spinal cord to drive skeletal muscles consist of a group of muscle lengths known lambda values. When the length of a muscle exceeds its lambda value it contracts, so that a set of lambda values will lead the body (or in our case, the arm) to acquire an equilibrium position. The muscle lengths at the equilibrium position may or may not be equal to their lambda values. Also, notice that given a set of lambda values there is a unique position that the limb will acquire, because the viscoelastic properties of the muscles will lead the joint to adopt the configuration minimizing its potential energy. In this paper the control signals arriving at the spinal cord to specify threshold lengths for muscle activation are called target lengths.

Considering that the velocity of a muscle's extension-contraction is represented in spindle afferents (Lennerstrand, 1968; Lennerstrand and Thoden, 1968; Dimitriou and Edin, 2008, 2010), the argument made for lengths in the EP hypothesis could be modified to hypothesize threshold velocities being the control signals at the spinal cord level, and threshold lengths being used at a higher level, modulating the threshold velocities. Such a two level control system is inspired by the hierarchical organization found in TCT and in Perceptual Control Theory (Powers, 1973, 2005), and is capable of stabilizing oscillations with far more success than pure proportional control. In general, it is hard to stabilize movement without velocity information, so this factor has been introduced in equilibrium-point controllers (de Lussanet et al., 2002; Lan et al., 2011). As in TCT, we assume that the forces are generated at the level of the spinal cord, similarly to the stretch reflex, and we assume a proprioceptive delay of 25 ms.

The way our controller guides reaching starts by mapping the Cartesian coordinates of a target into the muscle lengths that the arm would have with the hand located at those coordinates. In order to make this mapping one-to-one we assume that the upper arm performs no rotation. The difference between the current muscle length and the target muscle length will produce a muscle stimulation, modulated by the contraction velocity (details in next subsection). The blocks labeled “inverse kinematics” and “feedback controller” in Figure ​3 represent the computations of the central controller being described.

Figure 3

Block diagram corresponding to the computational implementation of our architecture in Matlab when using visual errors. λi is the target length for muscle i. eli and evi are respectively the length and velocity errors for the i-th muscle. ci is...

2.2.1. Equations for the central controller

The central controller performs two tasks in order to reach for a target. The first task is, given the coordinates of the target, to produce the muscle lengths that would result from the hand being at those coordinates. The second task is to contract the muscles so that those target lengths are reached.

The first task (inverse kinematics) requires to map 3D desired hand coordinates into an arm configuration. The spatial configuration of the arm that leads to hand location is specified by 3 Euler angles α, β, γ at the shoulder joint, and the elbow angle δ. Our shoulder Euler angles correspond to intrinsic ZXZ rotations. In order to create a bijective relation between the 3D hand coordinates and the four arm angles we set γ = 0.

For a given target hand position we calculate the angles α, β, γ, δ corresponding to it. Using these angles we calculate the coordinates of the muscle insertion points, from which their lengths can be readily produced. When the muscle wraps around a bending line we first calculate the point of intersection between the muscle and the bending line. The muscle length in this case comes from the sum of the distances between the muscle insertion points and the point of intersection with the bending line.

The formulas used to calculate the angles α, β, γ, δ given hand coordinates (x,y,z) and the shoulder at the origin are:

(1)

(2)

(4)

Where Larm and Lfarm are the lengths of the upper arm and forearm respectively. If we have the coordinates of a humerus muscle insertion point (as a column vector) at the resting position, then we can find the coordinates of that insertion point at the position specified by α, β, γ using the following rotation matrix:

(5)

where c(·) = cos(·), s(·) = sin(·).

The coordinates of insertion points on the forearm at the pose determined by α, β, γ, δ are obtained by first performing the elbow (δ) rotation of the coordinates in the resting position, and then performing the shoulder rotation (α, β, γ). Muscle lengths come from the distance between their insertion points, or between their insertion points and their intersection with the bending line. Details on how to determine whether a muscle intersects a bending line can be found in the function piece5.m, included with the source code. This function also obtains the point of intersection, which is the point along the bending line that minimizes the muscle length.

Once we have found target equilibrium lengths for the muscles, we must contract them until they adopt those lengths. To control the muscles we use a simple serial cascade control scheme. The length error el of a muscle is the difference between its current length l and its equilibrium length λ. The velocity error ev is the difference between the current contraction velocity v (negative when the muscle contracts), and the length error el:

elgl(l − λ),   evgv(vel).

(6)

The constants gl, gv are gain factors. For all simulations gl = 2, gv = 1. The input to the muscles is the positive part of the velocity error. This creates a force that tends to contract the muscle whenever its length exceeds the equilibrium length, but this force is reduced according to the contraction speed. At steady state the muscle lengths may or may not match the equilibrium lengths, depending on the forces acting on the arm. To promote stability the output of the central controller went through a low-pass filter before being applied to the muscles. Also, to avoid being stuck in equilibria away from the target, a small integral component was added, proportional to the time integral of the central controller's output.

2.3. Cerebellar model

The cerebellar model provides motor commands whenever an “error-prone area” of state space is entered. Each error-prone area consists of a point in state space (its center, or feature vector), and a kernel radius. To each error-prone area there also corresponds a “correction vector,” specifying which muscles are activated and which are inhibited when the error-prone area is entered. At each iteration of the simulation the distance between the currently perceived point in state space and the center of each error-prone area is obtained, and each correction vector will be applied depending on this distance, modulated by its kernel radius. The kernels used can be exponential or piecewise linear. The action of the cerebellar model is represented in Figure ​3 by the block labelled “stored corrections.”

Learning in the model requires an error signal, which could be visual (such as the one that may be generated in posterior parietal cortex Desmurget et al., 1999), or could arise from muscle afferents. Block diagrams corresponding to the model with the visual and muscle error signals are in Figures 5, 7, 9, 11. The visual error signal arrives with a delay of 150 ms. Each time the error increases its magnitude (its derivative becomes positive) this increases the probability of complex spikes; for each IO cell, this probability also depends on the current phase of its subthreshold oscillation (see next subsection). Complex spikes generate a new error-prone area. The feature vector associated with this area is the state of the system a short time span before the error increased; usually this time span will be half the time it takes for the error derivative to go back to zero, plus an amount of time comparable to the perceptual delay. For as long as the error derivative is positive, at each iteration we will record the efferent signals produced by the central controller, and when the derivative stops being positive we will obtain the average of all the recorded efferent signals. The correction is obtained from this average. The muscles are driven by the velocity errors, so these are the efferent signals collected during correction period. All the kernel radii were equal, so they have no change associated with learning.

Notice that if the error derivative remains positive, more complex spikes will be generated as different olivary nucleus cells reach the peak of their subthreshold oscillations. Thus, we have two gain mechanisms for a correction: one comes from the magnitude of the error derivative, which will promote a large response (and synchronous activity) of complex spikes; the second comes from the amount of time that the error derivative remains positive, since more inferior olivary nucleus cells reaching the peak of their subthreshold oscillations while this derivative is positive will mean a larger number of complex spikes, creating error-prone areas along the trajectory of the arm. Performance-wise, it is beneficial to have a sequence of error-prone areas rather than a single one, since the appropriate correction to apply will change as the arm moves.

When the new feature vector is too close to a previously stored one, or when we have already stored too many feature vectors, then the new feature vector will become “fused” with the stored feature vector closest to it. When two areas fuse they are both replaced by a new area whose feature vector is somewhere along the line joining the feature vectors of its parent areas, and likewise for its correction vectors.

2.3.1. Algorithm for the cerebellum simulations

We will describe the part of the computational model that deals with the functions of a microcomplex (the file CBloop11c.m of the source code). To simplify the exposition, we do not consider the case when the maximum number of “feature vectors” have been already stored.

The input to the microcomplex model has components that represent error, and afferent/efferent signals. The error component consists of the distance between the hand and the target (the visual error), and its derivative (from which complex spikes are generated). The afferent information includes a quaternion describing the shoulder joint position, the derivative of this quaternion, an angle describing the elbow position, and this angle's derivative. The efferent input is the muscle input described in Section 2.2.1 (consisting of 11 velocity errors), and in addition, the desired shoulder position (expressed as a quaternion), and the desired elbow angle. The error and its derivative arrive with a visual delay of 150 ms. The rest of the information arrives with a proprioceptive delay of 25 ms.

The output of the microcomplex consists of 11 additional signals that will be added to the muscle inputs.

The algorithm's pseudocode is presented next. An unhandled spike is a complex spike whose “context,” consisting of the afferent/efferent signals and the error briefly before the spike, has not been stored as a “feature vector.” A “feature vector” is a context associated with a motor correction.

At each step of the simulation:

1: Generate complex spikes using the error derivative

2:

if there are unhandled spikes then

if If the error derivative is no longer positive, and the time since the spike doesn't exceed 250 ms then

2.1.1: Store the context corresponding to the unhandled spike as a new feature vector

2.1.2: Store the motor correction associated with the new feature vector

end if

end if

3: For each feature vector, calculate its distance to the current context, and add its motor correction to the output as a function of that distance

In step 2.1.1, the stored feature vector consists of the context as it was milliseconds before the complex spike, with τv being the visual delay, τp the proprioceptive delay, t the current time, and tcs the time when the complex spike arrived.

In step 2.1.2, the motor correction that gets stored is the average motor input from (tcs − τvτp) to (t − τvτp).

The output that the microcomplex provides at each simulation step is obtained using radial basis functions. The distance between the current context and each feature vector is calculated, and those distances are normalized. The contribution of each feature vector to the output is its corrective motor action scaled by an exponential kernel using that normalized distance. Let f(i) be the i-th feature vector, and w(i) its associated correction. Let v denote the vector with the current context information. We first obtain a distance vector D, whose components are: D(i) = ||f(i)−v||2.

The distance vector is normalized as where MF is the maximum number of feature vectors allowed. The contribution of feature i to the output is F(i) = w(i)eγDN(i), with γ specifying the kernel radius.

2.4. Inferior olivary module

The process of generating complex spikes when using the visual error is explained next. By “complex spike” we mean a signal indicating that a correction should be stored. There are N inferior olivary nucleus cells, from which N3 are assumed to oscillate at 3 Hz, and N7 are assumed to oscillate at 7 Hz. The phases of both cell subpopulations are uniformly distributed so as to occupy the whole range [0, 2 π] in the equation below. Let ϕ(i) denote the phase of cell i, and α(i) denote its angular frequency. The probability to spike at time t for cell i is calculated as:

(7)

Where p is a constant parameter, E is the visual error, and [E′]+ is the positive part of its derivative. At each step of the simulation a random number between 0 and 1 is generated for each cell. If that number is smaller than PiCS, and the cell i has not spiked in the last 200 ms, then a complex spike is generated.

Complex spikes are less likely to be generated when the error is small. When the hand is close to the target it is likely that it oscillates around it. Generating cerebellar corrections in this situation could be counterproductive, as the angle between the hand and the target changes rapidly, and so do the required corrections. In our idealized cerebellum (see Supplemenatry Material) there are conditions ensuring that no corrections are created when the angle between the hand and the target has changed too much. Since there is no obvious biological way to measure the angle between the hand and the target, we just avoid generating corrections when the hand is close to the target. Another mechanism present in our computational simulations to deal with this problem is that no corrections are stored if the time between the complex spike and the time when the error stops increasing is more than 250 ms.

Generating complex spikes when using the proprioceptive error follows a simpler procedure. For each muscle three conditions must be satisfied for a “complex spike” to be generated: (1) its length l is increasing, (2) l is longer than it's target value λ, and (3) no complex spikes have been generated for that muscle in the last 200 ms. A variation described in the Section 3 adds a fourth condition: (4) the visual error must be increasing (E′ > 0).

2.5. Generating corrective muscle activity

In this paper there are three different methods to determine the corrective motor commands that become associated with points of state space where the error increases.

The first method, in model 1 of the Section 3, is used with visual errors. The corrective commmand consists of the average efferent commands produced from the point when the error started to increase until the error stopped increasing (points 3 and 5 in Figure ​4A).

Figure 4

Correcting reaching errors. (A) Schematic trajectory of the hand as it reaches for target T in 2 dimensions. Seven points of the trajectory are illustrated, corresponding to seven important points in time with different afferent/efferent contexts. 1....

The second method, in models 2 and 3, is used with proprioceptive errors. If a complex spike is generated for a muscle, the corrective command is simply a slight contraction of that same muscle.

The third method is used with visual errors, and is applied in model 4. The corrective command for muscle i will be proportional to the product ci = [< li > −λi]+[i]+, where li is the length of muscle i, < li > is the average of that length through a brief period before the error stopped increasing (e.g., a brief period between points 3 and 5 in Figure ​4A, λi is the target length for muscle i, i is the derivative of the length, and [·]+ returns the positive part (and zero otherwise).

3. Results

3.1. Implementing the architecture in a reaching task

We hypothesize that the role of the cerebellum in motor control is to associate afferent and efferent contexts with movement corrections; in the case of reaching the controller involves the cortex, basal ganglia, brainstem, and spinal cord. The role of the central controller is to reduce error, and we begin by assuming that the role of the cerebellum is to anticipatively apply the corrections of the central controller. (model 1 below). How this could happen for the case of reaching is described in Figure ​4. Before an incorrect motion is made (moving the hand away from the target), the mossy fibers reaching the granule layer have afferent and efferent information that could predict when this error will occur. When the error does increase during a reach, this is indicated by complex spikes, while the central motor controller is acting to correct the error. The cerebellum associates the afferent and efferent information of granule cells shortly before the increase in error with the motor actions required to correct it, using climbing fiber activity as the training signal. The corrective motor actions can be those that the central motor controller produces in order to stop the error from increasing, which come shortly after the onset of error increase; thus the cerebellum doesn't have to obtain those actions itself, it can merely remember what the central controller did. This idea is related to Fujita's feed-forward associative learning model (Fujita, 2005). Other ways to obtain the corrective motor actions are described in the models below.

As mentioned in the Introduction, we created mathematical and computational models implementing these ideas. The mathematical model and the results of its analysis are described in the Discussion. The full mathematical treatment is in the Supplementary Material. The elements of the computational models are described in the Section 2. In the remainder of the Results we present the outcome of simulations using four computational models with basic variations of our cerebellar architecture. All these computational models use the same central controller and the same arm and muscle models.

The physical simulation of the arm used for this study used no friction at the joints. The muscles had limited viscoelastic properties and the control signals had low gain. Under these conditions, the arm under the action of the central controller alone tended to place its distal end at the target slowly (in around 1.5 s) and with some oscillations, even in the absence of gravity forces. Introducing a 25 ms proprioceptive delay resulted in larger oscillations, and the hand no longer reached the target with arbitrary accuracy, but would instead oscillate around it in a non periodic fashion. Moreover, certain positions of the target would cause the arm to become unstable, leading to chaotic flailing.

To test that the cerebellar corrections could gradually reduce the error as learning progressed through successive reaches, we selected 8 target locations and simulated 8 successive reaches to each target. From these 8 targets one of them (target 2) produced instability of the arm when no cerebellar corrections were applied. The same 8 targets were used for the four models presented here. Figure ​2 presents a visualization of the arm's geometry, and of the 8 targets.

3.2. Simulation results

3.2.1. Model 1: visual errors, efferent copies to generate corrections

We first considered the case when complex spikes were generated when the distance between the hand and the target increased, according to Equation (7). The corrective muscle commands were proportional to the average of the efferent commands produced between the onset of error increase and the time when the error no longer increased (the period between points 3 and 5 in Figure ​4). Figure ​5 presents a block diagram indicating the signals and modules involved in this model.

Figure 5

Computational model with the visual error signal, and a corrective command that is obtained from the efferent commands produced by the central controller (model 1 in the text). This is the same model depicted in Figure ​3, but at a slightly...

Figure ​6A shows the evolution through time of the distance between the hand and the target in the 1st, 4th, and 8th reaches toward a representative target. To measure the success of a reach we obtained the time integral of the distance between hand and target through the 4 s of simulation for each reach. Smaller values of this performance measure indicate a faster, more accurate reach. Figure ​6B shows our performance measure for each of the 8 successive reaches, averaged over the 8 targets.

Figure 6

Results for Model 1. (A) Distance between hand and target through 4 s of simulation time for the first, fourth, and eighth reaches to target 7. The cerebellar system was trained using the distance between the hand and the target as the error, and the...

Figure ​6 shows that on average the performance increases through successive reaches. The error may not decrease monotonically, however, since the correction learned in the last trial may put the system in a new region of state space where new errors can arise within the time of the simulation. Eventually, however, the hand comes close to monotonically approaching the target. The instability present in the second target dissappeared on the second reach (not shown).

Although this model improves the performance of the reach, it can't be considered biologically plausible unless we understand how the outputs at the deep cerebellear nucleus could become associated with the corrections they presumably apply. Basically, the problem is that if all microcomplexes receive the same learning signal (increase in visual error), then all the DCN populations will learn the same response, and the arm would express all possible corrections upon entering an error-prone area of state space. In the Discussion we elaborate on this. In the rest of the Section 3 we present 3 alternative models were the corrections to be applied are not learned from efferent copies of the commands to the arm, but from proprioceptive signals.

3.2.2. Model 2: proprioceptive errors, individual muscle corrective signals

Using the equilibrium point hypothesis in the central controller has the distinct advantage that we know the lengths at which the muscle stops contracting (called target lengths in this paper). A simple way to detect errors could be to monitor when a muscle is longer than its target length, but is nevertheless elongating. A simple way to correct that error is to contract that muscle a bit more. The multidimensional task of applying corrections during 3D reaching is thus reduced to a group of one dimensional tasks corresponding to individual muscle groups. Figure ​7 shows a block diagram implementing these ideas as done in our second model.

Figure 7

Model with the proprioceptive error signal, and a corrective command that is simply a contraction of the muscle that produced the error signal (model 2 in the text). The error is the muscle length l minus the target length λ. This target length...

Figure ​8 shows the results of using a model where the errors are detected and corrected at the level of individual composite muscles, as just described. It can be observed that improvement is slower than in the case of the previous model. For example, the instability of the second target only dissappeared during the sixth reach (not shown). In our simulations of model 2 the cerebellar corrections could lead to instability unless we use small kernel radii and small amplitude for the corrections. A possible reason for this is that our central controller does not specify an optimal temporal sequence of muscle contractions, but instead specifies a static set of target lengths. The trajectory of muscle lengths that leads the hand in a straight line toward the target may not have those lengths monotonically approaching the target lengths. On the other hand, our system generates an error signal whenever that approach is non monotonic. This inconsistency is the price of using one-dimensional signals to approach an error that arises from the nonlinear interaction of several independent variables. The next model uses a simple approach to try to overcome this problem.

Figure 8

Results for Model 2. The cerebellar system was trained using an error signal produced when muscles became larger than their target value. (A,B) Refer to Figure ​6 for interpretation.

3.2.3. Model 3: proprioceptive errors with visual error constraint, individual muscle corrective signals

In the previous model the gain of the corrections and their area of application in state space had to remain small because there can be some inconsistency between the error signals from individual muscles and the visual error. A muscle continuing to elongate past its target value does not imply that contracting it will bring the hand closer to the target. A simple way to address this is to add the necessary condition that if a correction is to be stored, the visual error should be increasing. Corrective signals will thus arise when the muscle is elongating beyond its target length, and the hand is getting away from the target. In this way, even if the muscle lengths are getting away from their target values, no corrections will be stored when the hand is approaching the target monotonically. Figure ​9 shows how the architecture of model 2 is augmented with visual errors in order to produce model 3.

Figure 9

Model with the proprioceptive error signal, a visual error constraint, and a corrective command that is simply a contraction of the muscle that produced the error signal (model 3 in the text). Notice that this is similar to the model in Figure ​...

Figure ​10 shows the results of using a such a model. Using the additional constraint permits larger gains in the corrections and larger kernel radii than those used in model 2. This is reflected by a larger increase in performance. This increase, however, is still not as good as that seen in model 1. The visual error is what we really want to reduce, and there is a limit to how much this can be done when the error signals are triggered at the level of muscles, as the visual error and the proprioceptive error are not entirely equivalent. This is addressed by the next model.

Figure 10

Results for Model 3. The cerebellar system was trained using an error signal produced when muscles became larger than their target value, with the additional constraint that the error (distance between hand and target) had to be increasing. (A,B) Refer...

3.2.4. Model 4: visual errors, proprioceptive corrective signals

As discussed above, visual errors are the most appropriate to improve performance, so in this model we use them, just as in model 1. Unlike model 1, we don't use the commands from the central controller in order to create the corrections. We must then find a way to solve the distal error problem without the central controller. A way to do this is to create corrections similar to the signals that indicated error increase in models 2 and 3.

Model 4 generates error signals (complex spikes) when the hand is getting away from the target according to Equation (7), just like model 1. Figure 11A shows the signals and modules implied by model 4. For each muscle, the correction associated with an error signal is proportional to two factors: how much longer the muscle is than its target value, and how fast its length is increasing (Figure 11B). The block that associates contexts with predicted increases in error (labeled “ERROR INCREASE PREDICTOR”) is identified with the cerebellum, while the “CORRECTION GENERATION” module is identified with muscle afferents and spinal cord neurons. We assume that the predictions of error increase from the cerebellum become associated with the corrections generated at the level of the spinal cord. This is elaborated in the Discussion.

Figure 11

Model with visual errors and proprioceptive error signals (model 4 in the text). (A) The visual error signal used by this model is the same one as in model 1, but unlike model 1, the correction associated with an error is not a copy of a command from...

Figure ​12 shows the performance of model 4. It can be seen that the error reduction is comparable to that of model 1, but using a novel solution to the motor error problem based on the assumption that the muscle is controlled through an equilibrium length.

Figure 12

Results for Model 4. The cerebellar system was trained using the same error signal as in model 1 (Figure ​6), but the corrective commands were produced from muscle proprioceptive signals. (A,B) Refer to Figure ​6 for interpretation....

4. Discussion

As research on the cerebellum continues, it becomes increasingly clear that although cerebellar microzones have a uniform architecture, the role they play in various systems can be different depending on their input and output connections. For example, cerebellar microzones could implement either forward or inverse models (Popa et al., 2012; Ito, 2013; Porrill et al., 2013). Cerebellar architectures such as feedback-error learning (Kawato and Gomi, 1992, Figure ​1B) and the recurrent architecture (Porrill and Dean, 2007, Figure ​1A) specify the connectivity of cerebellar microzones, and the computational role they would therefore play.

We have presented an architecture in which the cerebellum reduces errors associated with climbing fiber activity when that activity arises from the increase in some error measure. Instead of assuming that complex spikes encode the magnitude of some performance error, we have assumed that they are generated when the derivative of the error becomes positive. This leads to a sparse code that generates a forward model for anticipative corrections. This forward model exists only in locations of state space where the error is prone to increase, and predicts a corrective command, not the output of the controlled object. Very importantly, the identity of the error signal does not need to imply the dimension along which the correction should be made. Although we have assumed that the central controller uses closed-loop feedback, this is not necessary for our first model. Our architecture has the potential to explain the presence of predictive and feedback performance errors in Purkinje cell simple spikes (Popa et al., 2012, 2013), the correlation of complex spikes with both sensory and motor events (Ito, 2013), the sparsity of complex spikes, and as discussed below, the role of the cerebellum in nonmotor operations (Ito, 2008; Buckner, 2013; Koziol et al., 2014; Popa et al., 2014). A possible way to discard our architecture in a particular system is when errors that are not increasing or changing elicit a sustained complex spike response. By “errors,” we refer not only to performance errors, but in general to signals that merit a behavioral response, such as an unexpected perturbation, or a potentially nociceptive stimulus.

We have explored our architecture in the context of reaching in 3D space. In addition to the mathematical treatment described below, we showed that the equilibrium point hypothesis gives our architecture the ability to solve the motor error problem in a novel way, using proprioceptive muscle signals (models 2,3, and 4). The success of model 4 suggests that we can predict errors using visual signals, and generate corrections using proprioceptive signals. It is clear that we can provide predictive control without the need to predict the kinematic or dynamic state variables of the controlled plant. Moreover, a signal which very loosely represented the positive part of the error derivative is sufficient to train our predictive controller. The type of corrections that our model cerebellum provides tend to avoid episodes where the hand gets away from the target; this is important when using a controller based on the lambda model of the equilibrium-point hypothesis (Feldman and Levin, 2009). A controller that only specifies a set of target muscle lengths (and not a trajectory of such lengths) may produce reaches by simultaneously contracting all the muscles whose lengths are longer than their desired lengths. This, in general, will not result in a straight-line reach. What the cerebellar controller does is to modify the activity of antagonist muscles at different points of the trajectory so that the hand monotonically approaches its target, producing a reach that is closer to a straight line.

All four models in this paper avoid or solve the redundancy problem. In Section 2.5 three ways of generating corrective motor commands were described. When the corrective output is generated from an efferent copy of the central controller (model 1), the redundancy problem is avoided, as it is assummed that this is handled by the central controller (the recurrent architecture avoids the redundancy problem in a similar manner). For the two other ways of generating corrective commands (in models 2,3,4), the redundancy problem is solved as soon as equilibrium lengths are given. Notice that equilibrium lengths determine the final position of the arm uniquely, as the viscoelastic properties of muscles lead the arm toward a configuration of minimal potential energy.

4.1. The mathematical model

In our mathematical model the hand is considered to be a point mass, and the central controller applies a force applied to this mass, always pointing to the origin, which is considered to be target. This constitutes a central force system, and as in the case of planetery motion under gravity forces it will tend to produce elliptical trajectories around the origin.

We modelled the “cerebellum” as a system that would apply impulsive forces to the point mass whenever particular regions of state space were entered, and proceeded to prove that such a cerebellum will continue to reduce the angular momentum in the trajectory until it either gets close enough to the target, or until it becomes circular. Circular trajectories do not ellicit cerebellar corrections because the error signal (distance between the hand and the target) does not increase. This is a shortcoming of generating learning signals only when the error increases.

The crucial part of this mathematical treatment is specifying when cerebellar corrections will be created, and for each cerebellar correction what will be the impulse vector associated with it. The cerebellar controller is characterized by three numbers: a speed threshold, a distance threshold, and a gain. A cerebellar correction is created whenever two conditions are met: the error begins increasing faster than the speed threshold, and it grows beyond the distance threshold.

2.1 Definitions

Spike sorting relies on the assumption that the action potentials of a single neuron lead to extracellular spikes with similar waveforms (Lewicki 1998). This is generally not true, since spikes of the very same neuron are known to vary, e.g., dependent on the recent firing history of the neuron (Fee et al. 1996) and slowly over time (Franke et al. 2010). Here we will assume non-varying waveforms but, in principle, the filters could be adapted over time. We define the template of neuron i as ξi = [ξi,1T , ξi,2T, …, ξi,NT]T where ξi,c is the waveform for neuron i on electrode c and (·)T is the vector transpose. The single electrode waveforms are L samples long. Thus ξi is a column vector with LN dimensions, where N is the number of recording electrodes, i.e., the single electrode waveforms are concatenated. We use an analogous definition for any piece of multi-electrode data starting at time t: X(t) = [x(t)1T, x(t)2T, …, x(t)NT]T, where x(t)k = [x(t)k, x(t + 1)k, …, x(t + L − 1)k]T represents L samples of recorded data on electrode k starting at time t. The output of a multielectrode FIR filter f = [f1T, f2T, …, fNT]T applied to the recordings (which is usually denoted as $$y={\displaystyle \sum_k^N}\left({x}_k\star {f}_k\right)$$ where ⋆ is the cross-correlation between data of electrode k and the filter for electrode k) can now be expressed in terms of a vector multiplication: y(t) = X(t)Tf. Note that ξi, f and X(t) are all column vectors in the same LN dimensional vector space.

The noise covariance matrix (Pouzat et al. 2002) is given by: C = E[η(t)η(t)T] with η(t) being a piece of data where no spikes were detected. It characterizes noise from various sources, including small amplitude spikes from far away neurons. Thus C is of dimensions LN × LN and captures the complete spatio-temporal covariance (between electrodes and over time) that is induced by both, noise and undetected small amplitude spikes (Pouzat et al. 2002). Since we assume that the data was high pass filtered E[η(t)] = 0 holds.

2.2 Classical template matching

Two different similarity measures between a piece of data and a template are commonly employed for template matching: Euclidean distance template matching (e.g., (Cambridge Electronic Design Limited 2012; Plexon Inc 2009; Sato et al. 2007)) and convolution or cross-correlation template matching (Friedman 1968; Kim and McNames 2007). Note that convolution and cross-correlation are identical, apart from the time reversal of the filter by the convolution. Although we use the cross-correlation throughout this work, we still refer to template matching based on the cross-correlation as convolutive template matching. The Euclidean distance at time t between data X(t) and template ξi is defined as

$${D}_i^{Euc}(t)={\left|\left|X(t)-{\xi}_i\right|\right|}^2={\left(X(t)-{\xi}_i\right)}^T\left(X(t)-{\xi}_i\right) = X{(t)}^TX(t)-2X{(t)}^T{\xi}_i+{\xi_i}^T{\xi}_i$$

and the cross-correlation as

$${D}_i^{XC}(t)=X{(t)}^T{\xi}_i.$$

2.3 Template matching using the noise covariance matrix

If the noise covariance matrix is known, the recorded data can be prewhitened (Pouzat et al. 2002; Rutishauser et al. 2006) before matching the templates. Prewhitening is a linear operation that transforms the cluster of “noise” waveforms in a way that it will be roughly spherical (standard normal distributed) after prewhitening. With respect to template matching, prewhitening is equivalent to using the squared Mahalanobis distance instead of Euclidean distance and the matched filtering procedure instead of the convolution with the template:

$${D}_i^{Maha}(t)={\left(X(t)-{\xi}_i\right)}^T{\boldsymbol{C}}^{-1}\left(X(t)-{\xi}_i\right) = X{(t)}^T{\boldsymbol{C}}^{-1}X(t)-2X{(t)}^T{\boldsymbol{C}}^{-1}{\xi}_i+{\xi_i}^T{\boldsymbol{C}}^{-1}{\xi}_i$$

$${D}_i^{Match}(t)=X{(t)}^T{\boldsymbol{C}}^{-1}{\xi}_i=X{(t)}^T{\boldsymbol{f}}_i$$

where fi = C− 1ξi is the matched filter (see the appendix for a more detailed explanation) for template ξi. Figure 1 illustrates different template matching techniques applied to a toy example. We will refer to DiEuc or DiMaha as subtractive template matching since templates are subtracted from the data and the energy of the residual is used for spike detection and classification. Accordingly, DiXC and DiMatch are referred to as convolutive template matching.

2.4 Bayes optimal template matching (BOTM)

Template matching has to solve two tasks: the detection of a known signal in a noisy recording and the discrimination of spikes originating from different neurons. In the Gaussian noise case, these two problems have known solutions (see appendix): the optimal linear detector (i.e., the linear filter that maximizes the signal-to-noise ratio after filtering) is given by matched filtering (Van Trees 2002), while the optimal discrimination between several clusters that share the same covariance matrix is given by linear discriminant analysis (LDA) (Fisher 1936). These two solutions are interrelated, and, by using the probabilistic framework of LDA, it can be shown that a template matching exists that is optimal with respect to both tasks, i.e., it maximizes the signal-to-noise ratio for each neuron while also maximizing their discriminability. This Bayes optimal template matcher (BOTM) is given by:

$${D}_i^{BOTM}(t)=X{(t)}^T{\boldsymbol{C}}^{-1}{\xi}_i-\frac{1}{2}{\xi_i}^T{\boldsymbol{C}}^{-1}{\xi}_i+ ln\left(p(i)\right)=X{(t)}^T{\boldsymbol{f}}_i-\frac{1}{2}{\xi_i}^T{\boldsymbol{f}}_i+ ln\left(p(i)\right)$$

where p(i) is the prior probability to observe a spike of neuron i. For a derivation see eq. (7) in the appendix. In this work we set the prior probabilities equal to a constant (see section”Evaluation metrics“and Fig. 5), although, in principle, the estimated firing rates of the individual neurons could be used. The detection threshold for a spike in the BOTM filter outputs DiBOTM(t) (see Fig. 5) is then given by (see eq. (9) in the appendix)

$$thr= \ln \left(1-{\displaystyle \sum_i}p(i)\right)$$

which is usually close to 0 for realistic spike priors (see Fig. 3 bottom row).

BOTM computes matched filter outputs for each template (see Fig. 4 for an example with 10 templates). For each template a constant, which depends on the energy (ξiTC− 1ξi) of the respective template and on the probability p(i) that the template occurs in the data, is added to the filter outputs to compute the final discriminant functions. Spikes are detected and classified by thresholding those discriminant functions (the BOTM output). For each detected peak, the template that has the maximal BOTM output in a small temporal window around the peak is assigned to the spike. The individual processing steps of the method are summarized in Fig. 2. Figure 3 shows a visual comparison of the results of BOTM and other template matching procedures applied to a spike sorting benchmark (Quiroga et al. 2004).

In summary, the method works as follows (Fig. 2): 1. Precomputed matched filters are convolved with the incoming data. 2. Discriminant functions are computed from the matched filter outputs by adding the respective constants. 3. The detection threshold is applied to all discriminant functions. 4. For each threshold crossing, the local maximum of all discriminant functions after the threshold crossing is detected (and before all discriminant functions fall again below the threshold). 5. The temporal position of the peak and the identity of the discriminant function define the identity and time point of the detected spike.

2.5 Resolution of overlapping spikes

If two neurons fire an action potential very close in time, their spike waveforms will overlap in the recording (Fig. 2, right). Overlapping spikes are difficult to detect and classify, since the overlap waveform might be very different from the individual spike waveforms. However, as long as the individual waveforms do not cancel each other in a way that the overlap waveform has virtually zero energy, the individual matched filters do still respond to the overlap. Thus, BOTM will assign the single template with the highest peak to an overlap (which could also be a template not participating in the overlap if the overlap waveform is coincidentally similar to that template) missing at least one of the spikes (Fig. 2, right, ‘BOTM’). To also detect and resolve overlapping spikes one could, in principle, construct all possible overlaps between all available templates with different temporal shifts and add those to the template set. This would, however, dramatically increase the number of templates and introduce two problems: First, with increasing number of templates, each waveform, including noise, can be described by a certain combination of templates. Second, this approach, in a naïve implementation, would be computationally prohibitive. Both problems will be addressed in the following.

For independent spike trains the prior probability to observe an overlap is equal to the product of the individual single-spike prior probabilities (the spike prior). Thus, the more templates are involved in an overlap, the lower is the prior probability for the resulting waveform. Therefore, solutions that feature less templates are naturally favored in our probabilistic framework. With increasing number of templates in an overlap the prior probability to observe such an event decreases, providing a natural cutoff to how many spikes per overlap have to be considered.: At some point the discriminant function for the overlap, i.e.,

$${d}_{overlap}\left({\xi}_{overlap}\right)={\xi_{overlap}}^T{\boldsymbol{C}}^{-1}{\xi}_{overlap}-\frac{1}{2}{\xi_{overlap}}^T{\boldsymbol{C}}^{-1}{\xi}_{overlap}+ ln\left(p(overlap)\right)<thr$$

will never cross the detection threshold, even for the actual overlap waveform ξoverlap (see eq. (10) in the appendix). Still, it would be computationally very expensive to compute all convolutions between all (overlap-) templates and the data. However, this is not necessary, since the overlap BOTM discriminant functions (i.e., the BOTM output di + jτ for an overlap between template i and j with temporal shift τ) can be directly computed from the individual spike discriminant functions by

$${d}_{i+j}^{\tau }(t)={d}_i(t)+{d}_j^{\tau }(t)-{\xi_i}^T{\boldsymbol{C}}^{-1}{\xi}_j^{\tau }$$

where djτ(t) and ξjτ are the temporally shifted discriminant function and template of neuron j, respectively (see Fig. 2, right, ‘Option 1’ and eq. (10) in the appendix for the derivation). Depending on the number of spikes per overlap and their maximal temporal shift considered, this approach can still be computationally expensive.

Therefore, here, we employ a greedy approach. We assume that at least one of the single-spike BOTM outputs will cross the threshold for each overlap and that its peak is giving approximately the correct position of the template in the data. Then, we can subtract the expected influence of this spike from the other discriminant functions, a process which is also referred to as subtractive interference cancellation (SIC, see Fig. 2, right, ‘Option 2’ and eq. (12) in the appendix for the derivation). Fortunately, under the assumption that at least one single-spike discriminant function crosses the threshold for each overlapping spike, we only need to compute the overlap discriminant functions in temporal periods around threshold crossings of single-spike discriminant functions.

2.6 Noise Covariance Matrix

The noise covariance matrix C plays a crucial role for the Mahalanobis distance, the matched filter and BOTM as well as for some spike sorting procedures (Franke et al. 2010; Marre et al. 2012; Pillow et al. 2013; Pouzat et al. 2002; Shoham et al. 2003). In all cases its inverse is needed. However, the noise covariance matrix can be badly conditioned, i.e., it might have eigenvalues that are close to zero, which makes its inversion unstable. A standard procedure to invert ill conditioned covariance matrices is diagonal loading or shrinkage (Hiemstra 2002; Van Trees 2002): A target covariance matrix CT, often the identity, is added or merged to the original covariance matrix. We decided to chose the diagonal of C, CD = diag(C) as the target and blended it with the original matrix according to CL = αC + (1 − α)CT. We noticed that the choice of α influenced performance mainly through the temporal length of the resulting filter responses to spikes. With decreasing α the filters become more similar to a narrow bandpass filter which could lead to oscillating filter responses. Oscillations in the filter outputs were not a problem for benchmark 3 BOTM + SIC (see below, SIC iteratively subtracts the filter responses to spikes from the filter outputs, therefore removing all oscillations). We did not try to optimize diagonal loading but simply chose α = 0.5. For benchmark 3, BOTM + SIC we chose an α so that the maximal eigenvalue of CL divided by its minimal eigenvalue (i.e., the condition number of CL) was not larger than 10,000 to ensure that CL was invertible.

The noise covariance matrix can be either computed on pieces of noise, that is, periods of data where no spikes were detected (e.g., as in (Marre et al. 2012; Pouzat et al. 2002; Rutishauser et al. 2006)), or on residuals of the spikes after subtracting the templates (e.g., (Pillow et al. 2013; Shoham et al. 2003)). Since the residuals may contain additional variability, e.g., from mis-alignment of the templates (the correct template was subtracted at a wrong position), clustering errors (the wrong template was subtracted), or variability in the neuronal signal (the correct template does not fit perfectly with the occurring waveform), we chose to compute C on stretches of noise. This has the side-effect that we do not run the risk of overfitting C to the spike waveforms.

If T is the length of the templates (measured in samples) and N is the number of recording electrodes, the dimensionality of C is T ⋅ N times T ⋅ N, which can be very large and thus difficult to estimate. However, C has a special block structure (Pouzat et al. 2002):

$$\boldsymbol{C}=\left(\begin{array}{ccc}\hfill {C}_{1,1}\hfill & \hfill \cdots \hfill & \hfill {C}_{N,1}\hfill \\ {}\hfill \vdots \hfill & \hfill \ddots \hfill & \hfill \hfill \\ {}\hfill {C}_{1,N}\hfill & \hfill \hfill & \hfill {C}_{N,N}\hfill \end{array}\right)$$

where the blocks Ci,k are the covariance matrices between electrode i and k with Ci,i = Ci,iT and Ci,k = Ck,iT. Each block is a Toeplitz matrix of the cross-correlation function of electrode i and k. Therefore, we did not estimate C by directly computing the covariance matrix from T ⋅ N dimensional snippets of noise, where it would not be guaranteed that (Ci,k)m,n = (Ck,i)n,m and (Ci,k)m,n = (Ci,k)m + 1,n + 1, but by constructing C from the respective auto- and cross-correlation functions. This procedure reduces the number of free parameters that need to be estimated to T ⋅ N, which again reduces the risk of overfitting.

2.7 Evaluation metrics

We evaluated the spike detection and classification performance of the template matching procedures on two different data sets described below. Both were preprocessed in a similar way: The available ground truth information was used to cut true spikes from the recording and to construct the templates. This avoided the problem of aligning the spikes to compute the templates. The template matchers were evaluated for detection and classification performance separately.

For the detection task the template matching output distribution was computed for all spikes as well as for pieces of noise. For each detector the detection error was defined as

$$\mathrm{Detection}\ \mathrm{Error}=\mathrm{F}\mathrm{P}+\mathrm{F}\mathrm{N}$$

where FP is the number of false positive noise detections and FN is the number of false negative misses. Then, the overlap of the template matching response distributions to noise and spikes was computed and, except for BOTM, the optimal threshold (which minimized the detection error) was estimated numerically. The alternative would have been to compare receiver operator characteristic curves, but BOTM directly provides a threshold. Therefore, we decided to compare its performance to the optimal performance of the other methods.1 To detect spikes in the BOTM output a prior probability to observe a spike needs to be chosen. Optimally, this probability is different for each neuron and depends on its firing rate. However, for the benchmarks used in this study, the performance was insensitive to the exact choice of the priors (see Fig. 5). Therefore, instead of choosing a prior for each neuron independently, we assumed them to be equal and set the prior for noise to 0.99 (i.e., we assumed it 100 times more likely to observe noise than a spike).

The sensitivity, i.e., the detection performance at the optimal threshold was defined as

$$\mathrm{Sensitivity} = 100\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{F}\mathrm{N}},$$

which denotes the percentage of correct spike detections (TP) divided by the total number of spikes. Note that this quantity indirectly includes the number of FPs since we minimized FP + FN to find the optimal threshold. In this study we use two different data sets (see below) that differ in the availability of ground truth data: for one data set we know the spike times of all neurons (and, therefore, also the number of neurons), while, in the other data set, spike times are known only for one neuron and it is unknown how many more neurons have been recorded from. For this reason, it is not possible to use the same evaluation metric for both data sets. We, therefore, describe the data-set-dependent evaluation metrics in the following section, together with the applied benchmarks.
• Benchmark 1 (Q): Evaluation on simulated data with full ground truth

• The proposed template matching was evaluated on the publicly available spike sorting benchmark data set described in (Quiroga et al. 2004) which we will refer to as benchmark 1 (Q). This data set was already used by other researchers for spike sorting evaluation (see, e.g., (Bestel et al. 2012; Ghanbari et al. 2009; Herbst et al. 2008)). The data set consists of 4 sub benchmarks labeled “Easy1” to “Diffcult2”. Every sub benchmark consists of 4 different data files (with the exception of “Easy1” which has 8) with decreasing signal-to-noise ratios. All files contain 60s of a simulated single electrode extracellular recording with 24 kHz sampling rate and 3 simulated neurons. Templates and the noise covariance matrix were calculated using the available ground truth information. Short periods of simulated data of length L = 61 samples starting 15 samples before the given spike time points were cut and averaged to create the templates. In this benchmark, the ground-truth spike times do not indicate the position of the peak of the spike waveforms in the data but rather their onset. Therefore, we corrected the original spike times by shifting the complete spike train of each neuron by a constant number of samples. This shift yielded a ground truth that reflected the peak positions of the spikes in the data (a feature also a spike sorter without any knowledge of the ground truth could estimate), not their onset. The template matching outputs for the different template matching procedures were computed on the noisy spike waveforms present in the data that were aligned with the corrected ground truth information. Each spike was assigned to the template with the maximal response. Spikes that were correctly assigned were counted as TP while spikes assigned to the wrong template were counted as classification errors CN. The quantity CN relies on knowing the full ground truth, that is, the exact spike times for all neurons. We used the following quantities for the performance comparison:

$$\mathrm{Classification}\ \mathrm{Performance}=100\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{C}\mathrm{N}}$$

• We chose to weight detection and classification equally, combining them in a final performance score of:

$$\mathrm{Total}\ \mathrm{Performance}=\frac{\left(\mathrm{Sensitivity}+\mathrm{Classification}\right)}{2}.$$

• Benchmark 2 (H): Evaluation on real recording with partial ground truth

• This data set is the part of the hc-1 data set described in (Henze et al. 2000) and publicly available under http://crcns.org/ and was already used for evaluation of spike sorting algorithms (see, e.g., (Ekanadham et al. 2014; Harris et al. 2000; Schmitzer-Torbert et al. 2005)). The following files were used: d11221.002, d11222.001, d12821.001 and d14521.001. We chose the files depending on the quality of the intracellular recording, i.e., the ones where the intracellular recording showed clearly visible and easily detectable spikes during the whole recording. Each data file consists of several minutes of simultaneous intra- and extracellular recordings in rat hippocampus with a sampling rate of 20 kHz. Ground truth information was available for only one single neuron, extracted from the respective intracellular recording. The extracellular recordings were high pass filtered at 300Hz.

• Spike sorting using mean-shift clustering (Marre et al. 2012) was performed in the space of the first 6 principal components after prewhitening to estimate the templates. The sorting was not optimized manually. The template of the cluster whose spikes best matched the ground truth was used to estimate the performance of the template matching procedures and will be referred to as the “target template”. For all data files the target template was very similar to the template that can be obtained by using only the spikes given by the ground truth, i.e., in all cases spikes from one of the clusters matched the ground truth well enough to get good template estimation.

• For the classification task, the template matching output was computed for all templates and all spikes of the ground truth neuron, as well as for all other spikes detected during the spike sorting. Spikes of the ground truth that were correctly matched to the target template were counted as true positives (TP). Since we do not have the full ground truth for this data set, the quantity CN from the previous benchmark cannot be computed for all neurons. We therefore counted spikes of the target neuron that were falsely assigned to a putative other neuron as CNt. Spikes which were found by the automatic spike sorting procedure which belong to putative other neurons and which were correctly not assigned to the target neuron were counted as true negatives (TN). Spikes not included in the ground truth that were assigned to the target template were counted as false positives (FP). How many of the spikes that actually belong to the target neuron were also classified correctly? This is the classification performance.

$$\mathrm{Classification}\ \mathrm{Performance}=100\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{C}\mathrm{N}\mathrm{t}}$$

• And of all putative spikes detected in the recording that do not belong to the target neuron, how many were correctly classified as noise or spikes from other neurons? This is the specificity (the performance in rejecting spikes that do not belong to the target neuron).

$$\mathrm{Specificity}=100\frac{\mathrm{TN}}{\mathrm{FP}+\mathrm{T}\mathrm{N}}$$

• Many different ways of combining the different performance measures into a final score are possible; however, here, we decided to combine them with equal weight into a final score, the total performance.

$$\mathrm{Total}\ \mathrm{Performance}=\frac{\left(\mathrm{Sensitivity}+\mathrm{Classification}+\mathrm{Specificity}\right)}{3}$$

• Benchmark 3 (Q): Online template matching of full recording

• To compare the performance of BOTM to the spike sorting performance reported in (Quiroga et al. 2004), we did not use the ground truth to cut perfectly aligned spikes (but the ground truth was used to compute the correct templates). Instead, BOTM was run on the whole data and spikes were detected in the template matching output.

• We used the same data as in benchmark 1 (Q) to evaluate the performance of BOTM as a refinement tool for spike sorting procedures. For this, the performance of BOTM using the correct templates was compared to the performance of a clustering based spike sorting. We chose the spike sorter “wave_clus” described in (Quiroga et al. 2004). “wave_clus” detects spikes using a voltage threshold. The threshold is computed via the median absolute deviation (MAD) of the data which is, in the presence of spikes, a more robust way to estimate the standard deviation of the noise. Then, wavelet coefficients are computed for each spike. A subset of the wavelet coefficients is chosen which, putatively, carries most information about the identity of the spikes. Those coefficients are then clustered by using superparamagnetic clustering, a clustering method developed in the context of statistical mechanics and based on simulated interactions between each data point and its K-nearest neighbors.

• For BOTM, the correct templates and the noise covariance matrix were computed from the data using the available ground truth information. To be more specific, ground-truth spike trains were used to cut short pieces out of the data around the true locations of the spikes. These pieces were averaged to calculate the templates. Therefore, the (‘correct’) templates contain a realistic amount of noise due to the averaging process. Pieces of data in which no spikes were present were used to compute the noise auto-covariance function from which the noise covariance matrix was estimated. As a consequence, also the noise covariance matrix was noisy due to the estimation process on a limited amount of noisy data. The optimal filters were computed and convolved with the whole extracellular recordings. The discriminant functions were then thresholded with the optimal threshold (see eq. (9) in the appendix). For each period during which at least one discriminant function was above the threshold, the peak of the maximal discriminant function was detected. The identity of the discriminant function was used as the neurons’ identity and the position of the peak as the time of the spike. This way, detection, alignment and classification were implemented in a single operation. To avoid a second detection of the same peak due to multiple peaks of the filter response, peaks within 8 samples (.33 ms) were compared and only the maximum was kept. Thus, overlapping spikes within this time window could not be resolved. The BOTM + SIC method was evaluated in the same way but, additionally, once a spike was found in the data and assigned to a neuron, the expected filter response to the neuron’s template was subtracted from the discriminant functions (eq. (10) in the appendix) and the detection step was repeated.