---
title: "Brain machine interfaces: Time Domain Techniques"
date: 2016-08-08T15:26:46+01:00
draft: false
toc: true
math: true
type: posts
tags:
  - chapter
  - thesis
  - CMOS
  - biomedical
---

Lieuwe B. Leene, Yan Liu, Timothy G. Constandinou

Department of Electrical and Electronic Engineering, Imperial College London, SW7 2BT, UK

Centre for Bio-Inspired Technology, Institute of Biomedical Engineering, Imperial College London, SW7 2AZ, UK

43 Time Domain Techniques

Thus far our work has detailed numerous design techniques that extend contemporary work in which the classical analogue approach with digital processing has demonstrated its capabilities. However, we have also shown analytically that, although we can still strive to improve area and efficiency, a number of factors prevent significant progress in improving system characteristics. Moreover, there is a strict need for more efficient computational processing, which appears overwhelming if it is to be made robust and adaptive. If we keep the current processing methodology, this component can only be made viable with smaller technologies and voltage scaling, which can substantially diminish the performance of analogue operations. Here we attempt to address the two factors that have the most significant impact on improving sensing electronics, based on the observations made in the foregoing discussions, to consolidate this work. The first is introducing all-digital instrumentation that is not diminished by technology-related scaling and the characteristics of nanometre transistors. The second is developing a mixed-signal topology for analogue-to-information conversion in which feature extraction is performed adaptively in the analogue domain.

This chapter focuses on exploring the emerging time-domain processing modality in order to leverage the increased digital performance associated with modern CMOS processes. This motivation is carefully addressed in the literature [1], where logic-gate-based topologies demonstrate better scalability with respect to linearity and bandwidth. Here we demonstrate how the fundamental limits of noise efficiency can be approached by proposing several topologies and design techniques. Further, we elaborate on the characteristic relations between analogue performance and resource requirements that enable these structures. The organization of this chapter is as follows. Section 44 introduces the essential design considerations for continuous time-domain circuits by considering the phase characteristics of oscillators in relation to the driving transistors that are used for analogue feedback. This structure is used to implement both amplifying and filtering structures to compose the instrumentation front end. This is followed by Section 48, where we propose a mixed-signal topology for analogue-domain classification.

44 Principles for time domain processing

There are two driving factors for approaching time-domain concepts, where signals are represented in terms of delays between pulse edges or phase components in oscillators. The first benefit is the inherently digital operation, where continuous-valued signals are represented by digital events with respect to a global or local reference [2]. This implies that the typical analogue processing has the same power scaling and advantages as digital processing in terms of technology parameters. This allows oscillator structures to approach very efficient operation irrespective of the oscillation frequency or supply voltage [3]. The second is that many operations are not restrained by non-linearity from individual transistors, giving way to ideal integrators and other operators [4]. The overall result is that, even with limited power budgets, the topologies have an overwhelming excess in bandwidth, and performance can scale with digital gate delay or switching energy. The abundance of digital operations in such systems gives these topologies the potential for digital synthesis, using standard cells and a digital design flow to directly process analogue signals [5]. Moreover, event-based representation of continuous-valued signals often allows a surprisingly efficient implementation with reduced complexity for a variety of elementary operations. For example, [6] presents a clock-less, PVT-invariant true random number generator based on the collapse of a ring oscillator structure.

{{< figure src="/images/phd-thesis/BW-VDD.svg" width="500" >}}

{{< figure src="/images/phd-thesis/mLP.svg" title="Figure 68: Voltage supply relationship with respect to the bandwidth and linearity requirements with respect to different technologies " width="500" >}}

Let us elaborate quantitatively on the notion of scaling analogue with digital characteristics. Figure 68 illustrates the drawback of conventional analogue techniques from first principles by looking more closely at the voltage scaling characteristics. Transitioning to nanometre technologies gives us the capability of reducing the supply voltage because the desired bandwidth can be achieved with a smaller inversion coefficient or equivalent gate voltage. However, the transconductance, and consequently linearity and noise efficiency, can degrade as the drain voltage is reduced. This dependency arises because transistor gain requires a large channel resistance, which is a function of \((1-e^{-V_{DS}/U_T})\) in addition to any DIBL, both of which introduce an asymptotic limit; the former is in fact not process dependent [7]. This limits the output swing \(V_{max}\) with an overhead of \(5 U_T\) [8]. Figure 68 b) demonstrates the resulting class-A power efficiency, measured as \(V_{max}/V_{DD}\), to reflect how efficiently we can use the provided voltage supply. This is evaluated in terms of \(P_{out}/P_{vdd}\), where \(P_{out}\) and \(P_{vdd}\) are the output signal power and the power dissipated by the voltage supply respectively. We can conclude that conventional low-noise amplification structures can no longer benefit from technology scaling unless we adopt topologies that do not rely on amplification in the voltage domain. Because the input-referred noise of a circuit relies only on its current dissipation, scaling supply voltages remains a viable means to reduce power if time-domain structures can mitigate the need for voltage gain.

45 Sub-threshold Ring Oscillators

The understanding and interpretation of the principal elements of a given modality has the most influential impact on how well it can be utilised. Moreover, encoding signals in the time domain influences how flexibly certain objectives are approached, using either digital operations or analogue feedback. Here we review some basic understanding of ring oscillator structures that are biased in weak inversion. This component provides a fundamental basis for the topologies proposed here because it connects analogue signals at the input to phase and time-domain signals at its output. The interest here lies specifically with current-controlled oscillators that have a well-defined linear relationship between any injected charge and the resulting shift in output phase. This leads to the small-signal transfer function that relates to the biasing current of the oscillator. Moreover, we must be able to evaluate the different components of phase noise and refer them back to the input of the transconductive element because we will apply this structure to instrumentation.

$$ V_{out}(t) = A(t) \cdot f\left[ \omega_0 t + \phi(t) \right] \tag{37} $$

A generalized time-dependent model for an oscillator is represented by Equation 37, where \(A\) and \(\phi\) represent the amplitude and phase state variables of the system. \(f\) describes the limit cycle of the oscillator over time and maps the steady-state output voltage \(V_{out}\) as a function of phase. The challenge for sub-threshold current-biased ring oscillators is that the non-linearity in \(f\) is difficult to predict analytically without well-informed priors. This is significant because it determines how perturbations from noise sources couple to the output phase state.

{{< figure src="/images/phd-thesis/impulse.png" title="Figure 69: " width="500" >}}

Many principal aspects of phase dependencies in oscillators have been well described in a generalized form using numerical methods [9] and by approximation [10]. The underlying characteristics, however, are illustrated in Figure 69, where charge perturbations integrate onto the phase state of the oscillator with respect to the impulse sensitivity function (ISF) \(\Gamma(x)\). This factor is a cyclostationary function that describes how the coupling changes as a function of the phase state \(\phi\), subject to the source of perturbation. Moreover, this allows us to predict the accumulated phase noise due to a time-varying process according to Equation 38.

$$ \phi(t) = \int_{-\infty}^{\infty} h_{\phi}(t,\tau) \, i(\tau) \: d\tau = \int_{-\infty}^{t} \Gamma(\omega_0 \tau) \, i(\tau) \: d\tau \tag{38} $$

The integral dependency on accumulated phase is what leads to the infinite open-loop gain of oscillator-based amplifiers. It also implies that any white noise source that is incoherent with the oscillator fundamental frequency will translate to the output phase as \(\Gamma_{rms}\). This depends on the assertion that incoherence implies uncorrelated, which is subject to the beat frequencies of the two sources. In practical cases this is a fair approximation, not only because the oscillator frequency drifts freely but also because we explicitly consider closed-loop implementations that aggressively shape in-band perturbations. The utility of Equation 38 lies in its ability to predict the single-sideband output noise spectrum due to a white noise current source with spectral density \(i_n^2 / \Delta f\) at carrier offset frequency \(f_{off}\), according to Equation 39 [11].

$$ L(f_{off}) = \left( \frac{\Gamma_{rms}}{2\pi f_{off}} \right)^2 \cdot \frac{i_n^2 / \Delta f}{2} \tag{39} $$
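As a minimal sketch of how Equation 39 can be evaluated numerically, the snippet below computes the single-sideband phase noise at two offset frequencies. The function names and all numerical values are hypothetical and chosen only for illustration; \(\Gamma_{rms}\) is assumed to be expressed in rad/C and the current noise density in A²/Hz.

```python
import math

def ssb_phase_noise(gamma_rms, in2_per_hz, f_off):
    """Equation 39: L(f_off) = (gamma_rms / (2*pi*f_off))**2 * (in2_per_hz / 2)."""
    return (gamma_rms / (2.0 * math.pi * f_off)) ** 2 * in2_per_hz / 2.0

def to_db(ratio):
    """Convert a linear power ratio to decibels."""
    return 10.0 * math.log10(ratio)

# Hypothetical illustration values, not taken from the text.
gamma_rms = 1.0e12   # rms ISF in rad/C (assumed)
in2 = 1.0e-26        # white current noise density in A^2/Hz (assumed)

l_1k = ssb_phase_noise(gamma_rms, in2, 1.0e3)
l_2k = ssb_phase_noise(gamma_rms, in2, 2.0e3)

print(f"L(1 kHz) = {to_db(l_1k):.1f} dBc/Hz")
print(f"L(2 kHz) = {to_db(l_2k):.1f} dBc/Hz")
```

Because the expression depends on \(1/f_{off}^2\), doubling the offset frequency reduces the noise by a factor of four, which is the 20 dB per decade slope expected from integrating white noise onto the phase.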

The \(N\)-stage ring oscillator structure of interest is illustrated in Figure 70. As opposed to voltage-controlled structures, this configuration is current biased and the oscillating ring is isolated from the supplies. Here the oscillation frequency in sub-threshold operation can be approximated as \(f_0 = I_B / (N C_{gate} V_{RS})\), where \(I_B\) is the biasing current and \(C_{gate}\) is the input capacitance of the delay element. \(V_{RS}\) is the voltage across the oscillating structure and is evaluated using Equation 40, where \(V_{th}\) is the transistor threshold voltage. In this case \(M_2\) provides a biasing current from the PMOS side and \(M_1\), if designed appropriately, allows isolation from the ground supply. This is particularly useful in differential configurations, where capacitance on a common \(V_R\) or \(V_S\) can minimize high-frequency noise from coupling directly to the differential phase component through the common-mode feedback.

$$ V_{RS} = V_{th} + \eta U_T \ln \left( \frac{2 I_B}{2\eta U_T^2 \mu C_{ox}} \frac{L}{W} \right) \tag{40} $$
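The relation between bias current, ring voltage and oscillation frequency can be sketched numerically as follows. This is only an illustration of Equation 40 and the approximation \(f_0 = I_B / (N C_{gate} V_{RS})\) as written above; the function names and every device parameter are hypothetical placeholders, not values from this work.

```python
import math

def v_rs(i_b, v_th, eta, u_t, mu_cox, w_over_l):
    """Equation 40 as written: V_th + eta*U_T*ln(2*I_B / (2*eta*U_T^2*mu*C_ox) * L/W)."""
    arg = (2.0 * i_b) / (2.0 * eta * u_t ** 2 * mu_cox) / w_over_l
    return v_th + eta * u_t * math.log(arg)

def f_0(i_b, n_stages, c_gate, v_rs_value):
    """Sub-threshold approximation f_0 = I_B / (N * C_gate * V_RS)."""
    return i_b / (n_stages * c_gate * v_rs_value)

# Hypothetical device and bias values (assumptions for illustration only).
I_B = 100e-9       # bias current in A
V_TH = 0.4         # threshold voltage in V
ETA = 1.3          # sub-threshold slope factor
U_T = 0.0258       # thermal voltage in V
MU_COX = 200e-6    # mu * C_ox in A/V^2
W_OVER_L = 10.0    # delay-cell aspect ratio
N_STAGES = 5
C_GATE = 10e-15    # delay-cell input capacitance in F

ring_voltage = v_rs(I_B, V_TH, ETA, U_T, MU_COX, W_OVER_L)
frequency = f_0(I_B, N_STAGES, C_GATE, ring_voltage)
print(f"V_RS = {ring_voltage:.3f} V, f_0 = {frequency / 1e6:.2f} MHz")
```

Note that \(V_{RS}\) grows only logarithmically with \(I_B\) (by \(\eta U_T \ln 2\) per doubling), so to first order the frequency tracks the bias current linearly.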

{{< figure src="/images/phd-thesis/schematic_RO.svg" title="Figure 70: Schematic of current regulated ring oscillator with capacitively coupled noise source." width="500" >}}

The defining characteristic of the current-biased oscillator is that the conduction of the NMOS and PMOS devices in each delay element is strictly non-overlapping. This differs from oscillators biased in strong inversion and implies maximum current efficiency in a large-signal sense. In addition, it leads to the respective NMOS and PMOS ISFs being non-negative. Thus the focus should lie on optimizing the rms value of the ISF by balancing the pull-up and pull-down conductance. In fact, we can demonstrate empirically that, despite the intricacies of non-linear phenomena, a current-starved ring oscillator presents a significantly superior noise excess factor compared to that of a transistor biased with the same weak-inversion conditions, due to charge being retained in the high-impedance nodes.

{{< figure src="/images/phd-thesis/state_variables.svg" width="500" >}}

{{< figure src="/images/phd-thesis/ISF_bias_nmos_pmos.svg" title="Figure 71: Simulation results outlining the dependency of parameter dynamics as a function of oscillator phase" width="500" >}}

Figure 71 exemplifies the challenge of predicting the internal parameter dependency analytically. Specifically in **a)**, where the NMOS and PMOS of a single delay slice are evaluated, both the saturation and linear conduction phases contribute towards accumulated phase noise. It is instructive to note that the bias transistor has a near-uniform ISF equal to \(2\pi/q_{max}\), independent of phase state, as expected from the linear phase-to-charge relation. Here \(q_{max}\) simply represents the total charge dissipated by the ring oscillator each cycle, which is \(2N V_{RS} C_{gate}\). In particular, this phase-independent sensitivity is surprisingly independent of the oscillator configuration in terms of the number of stages and the delay-cell input capacitance. Instead, the characteristic relies on the capacitance and channel resistance seen at the drain of M2, such that increasing the impedance improves linearity.

When the aggregate contribution of all delay elements is taken into account, as well as the increased noise excess factor in the linear region, the ISF in **b)** appears predictable when normalized to that of M2. One may expect the aggregate ISF of the ring oscillator to exceed the sensitivity of M2, as its contribution should have a similar profile and more noisy elements are involved. However, the soft switching of each delay element filters out a significant component of the injected noise, in addition to the fact that the nodes \(V_R\) and \(V_S\) retain accumulated current noise that feeds back onto the following stage.

Insight into optimizing the oscillator is drawn from considering the lossy integration phases on \(V_X\). Specifically, the transistors M1 and M2 present high impedance when considering the injection of charge or the integration of a noisy current, so we can infer that the resulting voltage fluctuations fall into one of two cases: coupled to \(V_R\) or \(V_S\) through a transistor in the linear region, or coupled to the switching capacitance during a transition. Rejection of the former relies on increasing \(q_{max}\) and minimizing coupling factors, as the ISF is equivalent to that of the bias current.
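The linear phase-to-charge relation can be checked with a small numerical sketch of Equation 38 under the idealisation of a perfectly uniform ISF \(2\pi/q_{max}\). The helper names, the midpoint integration and all component values are assumptions made for illustration; the current is also assumed to be zero before \(t = 0\).

```python
import math

def q_max(n_stages, v_rs, c_gate):
    """Charge dissipated per oscillation cycle: 2 * N * V_RS * C_gate."""
    return 2.0 * n_stages * v_rs * c_gate

def uniform_isf(q_max_value):
    """Idealised phase-independent ISF of the bias transistor, 2*pi / q_max (rad/C)."""
    return lambda phase: 2.0 * math.pi / q_max_value

def accumulated_phase(isf, current, omega_0, t_end, steps=10_000):
    """Midpoint-rule estimate of Equation 38 over [0, t_end]."""
    dt = t_end / steps
    total = 0.0
    for k in range(steps):
        tau = (k + 0.5) * dt
        total += isf(omega_0 * tau) * current(tau) * dt
    return total

# Hypothetical values: 5 stages, 0.3 V across the ring, 10 fF per delay cell.
qm = q_max(n_stages=5, v_rs=0.3, c_gate=10e-15)
i_b = 100e-9  # constant injected current in A (assumed)

# Injecting exactly q_max of charge should advance the phase by one full cycle.
phi = accumulated_phase(uniform_isf(qm), lambda tau: i_b,
                        omega_0=2.0 * math.pi * 1e6, t_end=qm / i_b)
print(f"phase advance = {phi / (2.0 * math.pi):.6f} cycles")
```

A phase-dependent ISF, such as the delay-cell curves in Figure 71, could be substituted for `uniform_isf` to see how the same charge then produces a phase shift that depends on when it is injected.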

{{< figure src="/images/phd-thesis/ISF_M2_compensate.svg" width="500" >}}

{{< figure src="/images/phd-thesis/ISF_M2_injct.svg" title="Figure 72: The compensation effect of M1 on the ISF for capacitively coupled noise sources with reference to Figure 70" width="500" >}}

It is well known that the dominant source of noise in ring oscillators is supply variation that is capacitively coupled, as illustrated in Figure 70. This represents the coupling expected from substrate noise and supply noise that is not generated by the transistors themselves. The impact of introducing M1, as opposed to grounding \(V_T\), is shown in Figure 72 with a dramatic improvement in ISF characteristics. Moreover, the large drain resistance of M1 allows the peak-to-peak ISF to be adjusted by exploiting the dynamics previously discussed. On that note, it is important to realize that, unlike Gm-C differential implementations, rejection of common-mode signals is not present because the coupling depends on phase. Matching and minimizing these factors can still allow a considerable improvement in performance in practice, but the optimization is challenging because these components cannot be predicted well a priori. More generally, incoherent perturbations in differential implementations will scale with \((\Gamma_{rms}-\Gamma_{dc})^2\).

It may be obvious that there are no high-impedance analogue nodes in this configuration that could introduce undesirable poles. More importantly, we do not need to provide extra voltage headroom or a second gain stage to let our output signal vary with maximum amplitude. In this case the oscillator mostly reuses the \(V_{RS}\) voltage headroom. This raises an interesting question: what limits the required voltage headroom for this circuit? Typically the complementary structure necessitates that the source-drain voltage of the current bias transistors and differential pairs is sufficient to provide good channel resistance. However, there is another component, relating to the noise generated by the oscillator, that should be considered in terms of the oscillator voltage overhead \(V_{RS}\). This leads us to evaluate the dependency on sampling noise with respect to the loading capacitor of each delay cell, considering that scaling the technology can result in higher oscillator frequencies and, equivalently, a smaller loading capacitance for the same power budget. It is important to realize that a charge is induced as sampling noise on each capacitor before each up/down transition as residue from the previous cycle. This sampling noise can be referred to the input of M2, which leads to the expression in Equation 41.

$$ v^2_{smp} = \frac{1}{Gm^2_{M2}} \cdot \underbrace{2N f^2_{osc} kT C_{gate}}_{\text{Noise power}} \tag{41} $$

As Equation 41 suggests, this noisy charge injection occurs for every transition in a delay element, which is \(2N\) times per period. When we expand this expression in terms of the oscillator power dissipation, we can show its underlying dependency in Equation 42.

$$ v^2_{smp} = \frac{4kT}{P_{osc}} \cdot \left( \eta U_T \right)^2 \tag{42} $$

It should now be clear from Equation 42 that this contribution depends only on the total power dissipation of the oscillator \(P_{osc}\). This profound result confirms that, without considering band-limiting factors, all transistor-generated noise densities are in fact independent of the frequency or total capacitance when referred to the gate of the biasing transistor. As expected, the dominant factor for noise is the total biasing current of the structure, which is fundamentally identical to that of a conventional amplifier.
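To give a feel for the magnitudes involved, the sketch below evaluates Equation 42 exactly as stated for a few oscillator power levels. The function name, the slope factor and the power values are hypothetical; only the physical constants are fixed.

```python
import math

K_BOLTZMANN = 1.380649e-23        # J/K
Q_ELECTRON = 1.602176634e-19      # C

def sampling_noise_density(p_osc, eta, temperature=300.0):
    """Equation 42: v_smp^2 = 4kT / P_osc * (eta * U_T)^2, in V^2/Hz."""
    u_t = K_BOLTZMANN * temperature / Q_ELECTRON  # thermal voltage kT/q
    return 4.0 * K_BOLTZMANN * temperature / p_osc * (eta * u_t) ** 2

# Hypothetical oscillator power budgets and slope factor (assumed values).
ETA = 1.3
for p_osc in (0.1e-6, 1.0e-6, 10.0e-6):
    density = sampling_noise_density(p_osc, ETA)
    print(f"P_osc = {p_osc * 1e6:5.1f} uW -> {math.sqrt(density) * 1e9:.2f} nV/sqrt(Hz)")
```

The function takes no frequency or capacitance argument, which mirrors the observation that the input-referred density is set by the power dissipation alone; a tenfold increase in power lowers the voltage noise density by \(\sqrt{10}\).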

46 Time Domain Sensor interface

A principal element of these systems is achieving effective conversion from continuous analogue signals to time-encoded binary signals without distortion or excess signal corruption. It is typical to see VCO non-linearity removed through LMS post-processing [12]; however, this level of in-channel DSP can also be avoided through feedback that utilizes the linearity of passive components. Our endeavour here lies in applying the discussion and topology selection in Section 23 to VCO-based structures that closely follow our optimization methodology. We suggest thinking of the oscillator's phase as an analogue memory that represents the state variable of the system, which we can freely adjust by injecting charge.

This approach differs from that currently seen in the literature for time-domain instrumentation of low-frequency signals. The time-domain encoding concept is predominantly used in asynchronous ADCs that aim to avoid introducing quantization noise [13, 14]. There is some motivation here to approach a neuromorphic amplifier topology that generates tokens with time-domain events that encode the input signal intensity [15]. Many of these structures leverage signal-dependent power dissipation that reduces as the input signal varies more slowly. However, they are typically open-loop topologies, to avoid a complicated feedback DAC, in which events are generated upon asynchronous level crossings that reset internal integration nodes or toggle the reference voltages. Linearity and dynamic range can become difficult to achieve while maintaining aggressive power efficiency because resetting integrators or changing references are large-signal discontinuities.

{{< figure src="/images/phd-thesis/LNTI.svg" width="500" >}}

{{< figure src="/images/phd-thesis/TDFB.svg" title="Figure 73: Time domain instrumentation topology for low noise voltage to time-domain conversion." width="500" >}}

The proposed implementation is shown in Figure 73. This structure opts for a direct conversion of analogue to phase-domain signals by relying on the integration to filter out oscillator harmonics present in the feedback signal. Abstractly, the topology is seen as an ideal integrator with integration factor \(\frac{Gm}{q_{max}}\), followed by a non-linear element that introduces spurs around \(N\) times the oscillator frequency when feeding back. Here \(N\) is the number of taps in the ring oscillator used to simultaneously evaluate the phase difference of the differential structure. This allows us to freely adjust \(N\) for improving \(\Gamma\) through increasing \(q_{max}\) without sacrificing the ability to suppress the harmonics. Since the signals at the output of the phase frequency detector, which represent the phase difference between the two oscillators, are full scale, the capacitive network needs to scale them down by a relatively large factor to ensure that \(V_{x}\) does not exceed the linear range of the transconductor. This is implemented using a capacitance area reduction technique [16]. When the closed-loop gain is large, however, this concern can be dismissed, since the quantization levels scale with \(\frac{V_{DD}}{A_{cl} N}\), which will typically be of the same order of magnitude as the input signal.

While we are free to adjust the transconductance for noise requirements, there is a limit to the increase in complexity resulting from the capacitive feedback DAC and parallel digital phase processing. Because digital power dissipation scales with \(N f_{osc}\), which is bounded by \(I_B\), it is independent of \(N\) for a fixed capacitive load in the oscillator delay cell. In fact, increasing \(N\) reduces the total power of the oscillator harmonics, as we effectively increase the number of quantization levels. This can be seen at the output of the capacitive DAC, but this aspect will not be evident with respect to the processing performed in the time domain.

Note that when using ring oscillators with a large number of stages, in order to reduce leakage and non-linearities in the limit cycle to some extent, we can retain a small factor of \(N\) by sub-sampling the output taps of the structure. This does require an integer ratio between the total number of stages and \(N\) in order to position the harmonics beyond \(N f_{osc}\). Also consider that the relation between the phases of the oscillator will imply a specific frequency shaping and harmonic modulation at high frequencies [17].

The primary design criteria for the phase detector structure and its respective time-domain encoding should be related to maximizing the power-bandwidth efficiency of digital cells. This is because the time-domain characteristics of the detector could introduce an inverse relation between signal level and required logic-gate bandwidth. Using conventional \(1.5 b\) encoding with up/down signals, for example, would give rise to this unwanted discontinuity, because the encoding scheme generates narrower pulses for smaller signals, which require exceedingly more bandwidth to process and feed into the time-domain memory. It is conceivable that if this bandwidth is insufficient, a dead zone is introduced that is characteristically similar to that of class-B amplifiers.

Using a single-bit representation that results from an XOR phase detector inverts this problem, such that the minimum bandwidth is required for small error and the requirement successively increases as the loop error increases. By extension, any asymmetric switching and delays in driving the capacitive feedback, which are expected from process variation, additively exacerbate any capacitive mismatch in the different phases of the feedback. These components primarily excite distortion on the output, depending on the ratio of gate delay to oscillation period. Here chopping the input will remove offset and mismatch-related components to a certain extent by up-modulating them.
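As a rough numerical illustration of the scaling \(V_{DD}/(A_{cl} N)\), the sketch below computes the quantization level for one hypothetical operating point. The supply voltage, gain and tap count are assumed values chosen only to show the order of magnitude, and the helper names are not from the original work.

```python
def db_to_linear(gain_db):
    """Convert a voltage gain in dB to a linear ratio."""
    return 10.0 ** (gain_db / 20.0)

def quantization_step(v_dd, a_cl, n_taps):
    """Quantization level scale V_DD / (A_cl * N) referred to the input."""
    return v_dd / (a_cl * n_taps)

# Hypothetical operating point: 0.6 V supply, 40 dB closed-loop gain, 8 taps.
step = quantization_step(v_dd=0.6, a_cl=db_to_linear(40.0), n_taps=8)
print(f"quantization step = {step * 1e3:.3f} mV")
```

With these assumed numbers the step is below a millivolt, which is comparable to the millivolt-level inputs considered for this front end.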

The motivation for using the single-stage structure, or allocating all the gain to the first stage, is also associated with how the supply noise couples to the signal. In this respect we suggest that this structure should be thought of as equivalent to an ADC, particularly with respect to the digital feedback, where providing asymmetric feedback implies that supply noise coupling cannot be cancelled out. In addition, capacitive mismatch between the positive and negative branches will also contribute to supply noise coupling. Since supply noise sources couple to the output of the amplifier, providing the maximum closed-loop gain minimizes the input-referred component. It should be noted that this type of supply sensitivity and capacitive mismatch is equivalent to that found in analogue-to-digital converters; hence this drawback exists only with reference to an all-analogue solution. Furthermore, once our signal has been encoded in the time domain, we expect it to exhibit improved resilience to supply noise because its influence is proportional to the gate delay of the technology used.

{{< figure src="/images/phd-thesis/schematic_TDI.svg" width="500" >}}

{{< figure src="/images/phd-thesis/schematic_PR.svg" title="Figure 74: Transistor level implementation of the phase domain integrator structure with phase detector feedback." width="500" >}}

The schematic implementation of the VCO is shown in Figure 74, which is derived from the complementary amplifier structure used in prior work. The fact that both ring oscillators are isolated from the supplies and float in the middle of the rails presents an improved ISF and ensures that the buffer that amplifies the clock phases to full scale is centred around the switching point of a balanced inverter. The most crucial component for effective operation, however, lies in the sizing of the input NMOS M2 with respect to the loading ring oscillator. At the DC operating point, M2 and R1 present a load equivalent to that of a diode-connected transistor. If the delay element is balanced, the current bias of the oscillator is evaluated with \(K_{M2}\) and \(K_{N}\) representing the \(W/L\) ratios of transistor M2 and the NMOS in the delay cell respectively.

{{< figure src="/images/phd-thesis/DIG2.svg" title="Figure 75: Simulated transient behaviour of the differential oscillator and the generated digital output. " width="500" >}}

Figure 75 clarifies the principal operation of this topology. We can see that as two currents are integrated on the differential oscillator, a phase shift starts to emerge when the two waveforms are compared. This phase difference on node \(\Delta \phi\) represents our system output, where the signal is encoded in the pulse width of the digital signal. This signal is applied to the capacitor array for feedback.
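The pulse-width encoding of the phase difference can be illustrated with a short behavioural sketch. It assumes ideal 50 % duty-cycle square waves from both oscillators and the XOR phase detector discussed earlier; the function names and sampling approach are hypothetical, and the transistor-level behaviour in Figure 75 will deviate from this idealisation.

```python
import math

def square(phase):
    """Ideal 50% duty-cycle square wave: 1 in the first half of each cycle, else 0."""
    return 1 if (phase % (2.0 * math.pi)) < math.pi else 0

def xor_duty_cycle(delta_phi, samples=20_000):
    """Fraction of one period during which the XOR of the two waveforms is high."""
    high = 0
    for k in range(samples):
        theta = 2.0 * math.pi * (k + 0.5) / samples
        high += square(theta) ^ square(theta + delta_phi)
    return high / samples

# For 0 <= delta_phi <= pi the duty cycle grows linearly as delta_phi / pi.
for fraction in (0.0, 0.25, 0.5, 1.0):
    delta_phi = fraction * math.pi
    print(f"delta_phi = {fraction:.2f} pi -> duty cycle ~ {xor_duty_cycle(delta_phi):.3f}")
```

In this idealised model the average of the XOR output is a linear function of the phase difference over half a cycle, which is what lets the capacitor array convert the pulse width back into a proportional feedback charge.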

$$ f_{osc} \approx \frac{\alpha I_{M1}}{N C_{gate} V_{th}} \quad \text{where} \quad \alpha = \frac{K_{M2}}{2 K_{N}} \tag{43} $$

The factor \(\alpha\) in Equation 43 dominates the noise performance when referred to the input, which would ideally approach the \(NEF\) of the structure without the oscillator. Similarly, the corner frequency of the oscillator flicker noise, which is not rejected by the chopper, scales with this factor. It follows that the transistor length of the oscillator has a strong influence, with \(f_{cor} \propto 1/L^{2}\). Fortunately it is easy to diminish this contribution, as only a small bias is needed to obtain an oscillation frequency several orders of magnitude outside the signal bandwidth.

$$ H_{sys}(s) \approx \frac{\eta f_{osc}}{s U_T} \cdot \frac{2-\alpha}{\alpha} N \quad \text{and} \quad f \approx \frac{C_I}{C_D} \cdot (N+2) \tag{44} $$

The overall open-loop system characteristic \(H_{sys}\) is evaluated in Equation 44. This reflects the single-pole nature of the topology, which scales with the oscillator frequency and the number of phases tapped out, as one may expect. Notably, the capacitive feedback structure used can represent a very small feedback factor \(1/f\) without excess input capacitance, which accommodates a large number of oscillator taps [18]. Evaluating the low-pass 3 dB point of the system reveals the dependency shown in Equation 45.

$$ f_{3dB} = \frac{\eta f_{osc}}{\alpha U_T} \cdot \frac{N}{N+2} \cdot \frac{C_D}{C_I} \tag{45} $$

This expression is primarily dominated by the oscillator frequency, which even for a small bias current can result in considerable bandwidth. Although this is partially expected because there is no explicit load capacitance, it also illustrates the benefits in FOM that can be achieved with this configuration of current-mode time-domain architecture. There is an instinctive concern for the stability of the system as a result of the excessive bandwidth driven by maximizing efficiency. The non-dominant pole introduced in the voltage domain, due to the parasitic capacitance on node \(V_Q\), typically will not compromise stability because of the coupling to the input of the transconductor at higher frequencies. The non-dominant pole in the time domain is introduced by any delay \(t_d\) from the VCO to the output buffers of the PFD as \(e^{-j\omega t_d}\). This component can be more restrictive for small loop gain, as it does not scale with the power of the input stage but with the supply voltage.

The voltage requirement of this structure is improved by biasing the N-well of the PMOS pseudo-differential pair to \(V_{XN}\) and \(V_{XP}\) in a cross-coupled fashion to reject the differential loading component. The forward biasing reduces the threshold voltage of the devices, allowing a supply voltage down to \(0.6 V\) without any considerable impact from leakage currents. This configuration also implies that the common mode at \(V_X\) is well regulated by the body transconductance of M4 and M5, rejecting common-mode input fluctuation. The main voltage requirement actually comes from the switches of the chopper that feeds the ring oscillator, which need good on-resistance to prevent noise injection; this implies a minimum voltage of approximately \(2V_{th}+V_{ov}\). Back-gate biasing will allow us to reduce the impact of \(V_{th}\).

The pseudo-resistive feedback structure in Figure 74 b) extracts the signal component from up-modulated aggressors using a current DAC, which is resistively coupled to the input to close the loop. This allows us to feed back the full-swing digital signals to cancel a DC offset and sets the input common mode by matching the cross-coupled transistor with the input pair. This primarily avoids having to use a cascaded resistor structure to deal with the large voltage swing on the output, which can significantly degrade performance. While stability is guaranteed by the capacitive feed-forward signal [19], it is important to note the design choice associated with the two poles in this feedback loop. One pole lies at the input of the complementary pair, associated with \(C_{fb}\), and the other is at the gate of the cross-coupled pair, associated with \(C_{x}\).

$$ \tau_{hp1}(s) \approx C_{fb} \cdot R_{pseudo} \quad \text{and} \quad \tau_{hp2}(s) \approx C_{x} \cdot \frac{\eta U_{T}}{I_{M1}} \cdot \frac{W_{M6} + W_{M7}}{W_{M6} - W_{M7}} \tag{46} $$

Equation 46 described the dependency of the two time constants in addition to the capacitive feedback. The reduction in capacitance of the feedback network implies that the high pass filter needs careful design in terms of the resulting pole location as the noise expected from the psuedo resistor will appear increasingly wider band as we try to reduce the total capacitance. Here we allow the second pole of \(C_{x}\) to approach DC by having W_{M6} \approx W_{M7} resulting in a integration node. This means that the noise around the chopper frequency is strongly related to amount of capacitance we can allocate to \(C_{x}\) and the 1/f agressors are now shaped by the VCO integrator and this capacitance. The bias of \(I_{M1}\) in the current DAC should be adjusted to set the pole location close to but smaller than the chopper fundamental similar to the conventional design approach.

When we compare this structure to the conventional topology we realize a number of significant advantages. Primarily, the inversion coefficient of the transistors is not bound as in a complementary input stage, where the \(V_{GS}\) voltage for both the NMOS and PMOS has to be sufficiently large to allow the drain voltage to fluctuate by several \(100 mV\). This is particularly significant because the minimum feature size is inversely proportional to the optimal inversion coefficient, confirming again that conventional means do not work at nanometre technologies. Here the threshold voltage can be arbitrarily small and we still retain a topology that is independent of supply voltage in the sense that it is strictly current biased. This will lead to improving the tolerance towards wafer level variations of the threshold voltage and carrier mobility, which many sub-\(1 V\) structures do not have. Similarly this implies class-A type power dissipation that minimizes the switching current seen at the analogue supplies.

The excess in bandwidth from the VCO, despite operating with a very small inversion coefficient, has enabled us to achieve \(40-50 dB\) closed loop gain while still retaining excess loop gain that easily exceeds \(30 dB\). This excess loop gain in the signal band is facilitated by the near ideal VCO integration of this topology, which shapes a number of external noise sources and nonidealities. In particular, technology scaling allows us to minimize the noise gain due to the input capacitance \(C_g\) according to the expression \(1 + C_{g}/C_{in}+ N/A_{cl}\) 20. Hence the VCO topology can allow a reduction of the input capacitance by a very significant factor, relating to an impedance enhancement that scales with technology.

{{< figure src="/images/phd-thesis/Sim_Inband.svg" width="500" >}}

{{< figure src="/images/phd-thesis/Sim_Outband.svg" title="Figure 76: Transient noise simulation result of the 180nm CMOS time domain instrumentation topology with a \(6 mV\) peak to peak sine input at \(1 kHz\)." width="500" >}}

Transient noise simulation results are shown in Figure 76. This demonstrates that the nonlinearity is mainly due to DAC mismatch components which are modulated by the oscillator frequency spurs. Secondly, the noise floor and corner frequency characteristics closely follow analytic predictions. In addition, for the same current bias as a conventional implementation, the structure can achieve an equivalent noise floor but at a reduced voltage overhead. Noticeably, in the full spectrum there is a considerable amount of harmonics outside of the band induced by the chopper and oscillator aggressors. These components will need to be filtered out in order to approach a \(60 dB\) signal to noise ratio. Interestingly there is an observable rise in the noise floor as we approach the point where there is no excess loop gain. Note that the spurious free dynamic range of this structure exceeds that of the structure used in Section 23 by almost a factor of 10 for the same power budget due to the increased input range.

47 Time Domain analogue filter

Now that we have addressed the aspects of achieving low noise and linear instrumentation, we must proceed to address the mechanisms for filtering to implement the band limiting characteristic necessary for the processing algorithms. There is some diversity in the approaches used to filter time domain signals. Most notable is the continuous-time FIR based structure, which offers a number of low-power characteristics that scale well with technology without sampling or clocking 21 22. However, similarly to conventional FIR structures, it is limited to signals where the frequency dynamic range is small in order to keep the filter order small. Other examples are found in PLL structures that lock using coherent phase domain signals, which are inherently second order due to the analogue integration node; this results in a loss of noise efficiency at low frequencies 23. We do mention that VCO-based ADCs have been very successful in achieving efficient high-order noise shaping 24 25.

It is important to realize that our proposed instrumentation topology converges on incoherent phase domain signals and neglects the modulation products by construction, similar to that of asynchronous \(\Delta\Sigma\) modulators 26. Here we will take a similar approach to construct a first order phase domain integrator where the time-domain signals are also incoherent. Through its simplicity the structure achieves a significantly better dynamic range and voltage scaling capability than its analogue domain counterpart. The premise lies with our assumption that the intermodulation products of the incoherent frequencies are sufficiently out of band to allow construction of higher order filter structures.

{{< figure src="/images/phd-thesis/TD_ABST.svg" title="Figure 77: Closed loop time-domain analogue filtering structure" width="500" >}}

The topology used for analogue filtering of time-domain signals is illustrated in Figure 77. This is based similarly on the phase difference of two ring oscillators that integrate a switched current, which is generated by evaluating the difference in duty cycle with respect to the input and feedback. The logic simply advances or recedes the phase difference of the oscillator when there is an excess or lack of pulse width, respectively, when comparing the two inputs. This behaviour is shown in Figure 78. The use of logic gates avoids any drawbacks that arise from limited linearity and mismatch when approaching this design with current mixing techniques. Moreover the efficiency of these operations allows miniaturized reconfiguration in the digital domain with a minimal analogue structure.

{{< figure src="/images/phd-thesis/DIG1.svg" title="Figure 78: Simulated transient behaviour of the differential oscillator and the time encoded digital signals internal to the feedback loop." width="500" >}}

Most design considerations here are similar to that of conventional filters. We expect the analogue in-band noise components to scale with the biasing current \(I_b\), which will determine its input referred noise relative to the transconductance element \(\Delta I\). This however does represent a fundamental drawback since the charge pump transconductance element does not benefit from the sub-threshold slope gain factor: \(\Delta I\) will typically be larger for the same bandwidth requirements by a factor of \(1/U_T\), although this factor is essential for achieving smaller cut-off frequencies while maintaining large oscillation frequencies. The decreased noise efficiency is the fundamental drawback of using digital logic instead of the capacitive feedback network. However in this scenario the processed signal will already be at full dynamic range with reduced noise requirements.

{{< figure src="/images/phd-thesis/TD_SUB.svg" width="500" >}}

{{< figure src="/images/phd-thesis/TD_FLT.svg" title="Figure 79: Schematic sub-blocks for first-order time domain analogue filter " width="500" >}}

The gate level implementation is elaborated in Figure 79. The charge pump structure here uses a cascaded current source and dummy load for bandwidth improvement. This configuration is important because this operator precedes the integration and consequently has a substantial influence on offset or distortion near the cut-off. The self-referenced bias of the charge pump through M11-M14 should allow good matching independent of the configuration in biasing transistors M2-M3. Similarly the noise figure is improved by sharing the drain voltage of M12-M13, as its noise is coupled to the common mode.

{{< figure src="/images/phd-thesis/gmc_eqv.svg" width="500" >}}

{{< figure src="/images/phd-thesis/TD_FLT2.svg" title="Figure 80: Bandpass time-domain analogue filter which is cascaded to realize a 4\(^{th}\) order TD-BPF." width="500" >}}

The two digital components in Figure 79(a) represent the subtraction for feedback and a gain factor \(G\) when the phases of \(Q\) are mapped to the output. The subtraction logic is determined by considering the XOR-PWM waveform as a two-state input of \(\pm 1\) and similarly the DAC input states, which in this case are \(+1/0/-1\). The configuration of both components should complement each other. This is exemplified when we consider the summing node for another case, where two integrators are cascaded with both outputs fed back to achieve a bandpass response. This configuration is shown in Figure 80. The boolean operation required is shown in the Karnaugh map below, where four levels are needed to include carry signals. Here we compromise with two DAC structures with input states \(\pm 1\) and \(+2/0/-2\).

| \(D\) / \(Q, X\) | \(-1, -1\) | \(-1, +1\) | \(+1, +1\) | \(+1, -1\) |
|---|---|---|---|---|
| \(-1\) | \(+1\) | \(-1\) | \(-3\) | \(-1\) |
| \(+1\) | \(+3\) | \(+1\) | \(-1\) | \(+1\) |

Karnaugh map of \(F=D-(Q+X)\), associated with subtracting the XOR type PWM signals \(Q\) & \(X\) from \(D\), with each two-state input taken as \(\pm 1\).
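The summing operation \(F = D-(Q+X)\) can be checked by enumerating all input states, as in the short sketch below. The split into a \(\pm 1\) term driven by \(D\) and a \(+2/0/-2\) term driven by \(-(Q+X)\) is one plausible reading of the two-DAC compromise and is an assumption here, as are the function names.

```python
from itertools import product


def summing_node(d, q, x):
    """Return F = D - (Q + X) for two-state inputs encoded as +1 / -1."""
    return d - (q + x)


def split_for_dacs(d, q, x):
    """Assumed mapping: D drives the +/-1 DAC, -(Q + X) drives the +2/0/-2 DAC."""
    return d, -(q + x)


levels = set()
for d, q, x in product((-1, +1), repeat=3):
    f = summing_node(d, q, x)
    unit, double = split_for_dacs(d, q, x)
    assert unit + double == f  # both DAC contributions add up to F
    levels.add(f)
    print(f"D={d:+d} Q={q:+d} X={x:+d} -> F={f:+d} (unit {unit:+d}, double {double:+d})")

print("distinct output levels:", sorted(levels))
```

The enumeration confirms that four distinct levels occur, which is why a single three-state DAC is not sufficient for this summing node.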

The logic that sums the different phases relies on the coherence of its inputs. Because we know the different taps of the ring oscillator will not overlap with respect to a certain signal range at the output, we may isolate small variations in pulse width with the \(AND\) operation of two different phases and combine them. Analogous to a variable gain amplifier, if the signal variations exceed the section where the two phases overlap the output will saturate. The significance here is that the gain achieved with this operation has an arbitrary gain bandwidth product with negligible power dissipation. Once blocking or interfering signals have been removed we may give the signal reconfigurable gain by only using a handful of gates. It simply relies on increasing the number of oscillator taps from the previous stage while maintaining its feedback configuration, which is independent of noise & linearity performance. In fact for a gain \(G\) on the PWM signal \(D\) we require \(2G\) taps that are summed according to Equation 47.

$$Q = \bigcup_{k=0}^{G-1} \left\{ D_{R k} \cap D_{R k+N/2-R/2} \right\} \quad \text{where} \quad R= \left\lfloor \frac{N}{G} \right\rfloor$$
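To make the tap selection in Equation 47 concrete, the following sketch lists which pairs of oscillator taps are combined with \(AND\) before the results are merged with \(OR\). Two details are assumptions not stated in the text: \(N/2\) and \(R/2\) are evaluated with integer division, and tap indices wrap around modulo \(N\).

```python
def tap_pairs(n_taps, gain):
    """List the (first, second) tap indices that are ANDed for each k in Equation 47."""
    r = n_taps // gain  # R = floor(N / G)
    pairs = []
    for k in range(gain):
        first = (r * k) % n_taps
        # Integer division and modulo wrap-around are assumptions of this sketch.
        second = (r * k + n_taps // 2 - r // 2) % n_taps
        pairs.append((first, second))
    return pairs


# Hypothetical example: a 16-tap oscillator and a gain of 4.
pairs = tap_pairs(n_taps=16, gain=4)
print(pairs)
print("taps used:", len({tap for pair in pairs for tap in pair}))
```

For this example the \(G\) pairs use \(2G\) distinct taps, in line with the tap count quoted in the text.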

\(AND\) & \(OR\) will perform equivalent operations that retain the oscillator phase difference \(\Phi\) but subtract a signal-independent component, which is the gate delay between the two phases being operated on, from the pulse width. Here \(\Phi\) is normalized such that it represents \(1\) and \(0\) when the oscillator phase difference is \(\pi\) and \(0\) respectively. This implies that for a positive gate delay number of \(\Delta T\) we will have an output offset given by \(Q=\Phi - \frac{\Delta T}{N}\). If an \(XOR\) gate is used we extract a signal-independent component as \(Q=2 \cdot \frac{\Delta T}{N}\). Both statements will hold true as long as \(\Phi < 1-2\Delta T\), which implies that the pulse section used for computation is signal independent. Using this rather simple construction of logic one may sum phases that are one radian apart with an \(XNOR\) gate to realize the absolute value operator, which exemplifies the rich utility of this time domain processing.

$$H_{sys}(s) = \frac{ G }{1 + s/p_1} \quad \text{where} \quad p_1 = \frac{N }{q_{tot}} \cdot \Delta I = \frac{ N \omega_{osc} }{ K }$$

When the primitive topology in Figure 77 is analysed in the Laplace domain we can derive Equation 48. This demonstrates a first order characteristic similar to the amplifier and shows a close relationship between the oscillation frequency and the filter bandwidth, with the addition of the gain factor \(G\).
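The first order response of Equation 48 can be evaluated numerically as in the minimal sketch below. The tap count, oscillation frequency, \(K\) and \(G\) are hypothetical values picked only to place the pole in the kHz range; they are not the design values of this work.

```python
import math


def pole_frequency(n_taps, omega_osc, k):
    """p1 = N * omega_osc / K from Equation 48, in rad/s."""
    return n_taps * omega_osc / k


def h_sys(freq_hz, gain, p1):
    """Complex response G / (1 + s / p1) evaluated at s = j * 2 * pi * f."""
    s = 1j * 2.0 * math.pi * freq_hz
    return gain / (1.0 + s / p1)


# Hypothetical values, for illustration only.
p1 = pole_frequency(n_taps=8, omega_osc=2.0 * math.pi * 1e6, k=1333)
print(f"cut-off frequency: {p1 / (2.0 * math.pi):.1f} Hz")
for f in (100.0, 1e3, 1e4, 1e5):
    mag_db = 20.0 * math.log10(abs(h_sys(f, gain=4, p1=p1)))
    print(f"{f:>9.0f} Hz: {mag_db:6.2f} dB")
```

Because \(p_1\) is proportional to \(\omega_{osc}\), halving the oscillation frequency in this sketch halves the cut-off frequency, which reflects the close relationship noted above.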

This filter configuration is specifically designed for a \(0.18 \mu m\) process. Considering that digital filters will become more viable as the technology node decreases, it should be acknowledged that the proposed time-domain filter structure will only be advantageous when the frequency dynamic range is large and memory is limited. This is primarily because the \(kT/C\) relations inhibit very aggressive sizing in the oscillator structure, particularly if no excess loop gain is available. We still require a large amount of energy storage \(q_{max}\) to prevent external noise sources from perturbing the output. We point out that the proposed topology discussed here provides the means by which instrumentation can successfully scale with technology characteristics, particularly as it is robust towards transistor non-linearity and imperfections. A large component of performance enhancement will rely on calibration components that improve the resilience of the capacitive feedback structure and filter parameters to allow miniaturization. While this specific time-domain topology will not allow the absolute minimum supply voltage, this configuration does take advantage of the transistor sub-threshold slope, which implies a fundamentally superior noise performance that can also circumvent supply noise.

{{< figure src="/images/phd-thesis/TD_sys_sim.svg" title="Figure 81: Transient noise simulation of proposed instrumentation amplifier with time-domain filter structure with a \(6 mV\) peak to peak sine input at \(1 kHz\)." width="500" >}}

{{< figure src="/images/phd-thesis/65nQ.svg" title="Figure 82: Transient noise simulation result of the proposed instrumentation topology in 65nm CMOS with \(2 mV\) peak to peak sine input at \(2 kHz\)." width="500" >}}

As shown in Figure 81, both a low noise floor and band limiting behaviour are achieved. In particular we see a \(40 dB\) roll-off in the noise floor at the \(6 kHz\) cut-off frequency from the \(4^{th}\) order bandpass filter. Table 11 reveals performance characteristics similar to the conventional implementation in Section 23, which is the result of using the same optimization strategy. However this work is the first to consider NEF optimization for the design of time-domain circuits. As a result we are much more confident about the power efficiency of this implementation. As a reference this topology was also implemented using a 65nm CMOS process without the filtering structure to confirm the scalability of this structure, with the transient noise simulation shown in Figure 82. The compact capacitive feedback may not allow linearity beyond \(60 dB\), but given the scalability and efficiency of this design there is a significant advantage over the current state-of-the-art.

Table 11:

Parameter Units \multicolumn{2}{c}{This work} Chandrakasan 27 Tsividis 28 Markovic 12
Modality Time Time Voltage Time
Technology [nm] 180 65 180 28
Supply Voltage [V] (0.6) (0.5) 0.2 \| 0.8 (0.65)
Total Current [(\mu)A] (0.8) (1.5) (1.8) (36.92)
Bandwidth [Hz] (375)-(6k) (1)-(6k) (1)-(1k) (40M)
Filter Order IIR (4^{th}) - IIR (2^{nd}) FIR (8^{th})
Noise Floor [(nV/√{Hz}) ] (69.4) (57.5) (36) (514)
Noise Corner [Hz] (<10) (<10) (0.5) -
SFDR [dB] (58) (54) (50.4) (30)
Area [mm(^2)] 115 \times 100 ^\star 64 \times 69 ^\star 800 \times 775 72 \times 45
NEF (1.18) (1.22) (2.1) (8.13)
Chopped Input Capacitance [pF] 0.04 0.03 21.5 0.01(^\dagger)

This brings us finally to finding a satisfactory answer to how the area-power product figure of merit is limited or bounded in some sense. The earlier modelling discussion argued that linearity and quantization were crucial constraints in the conventional structures, which is not the case for the proposed structures. Instead we observe from Equation 41 that we must dissipate a certain power level in the oscillator, which we know is biased by a fractional current related to the input referred noise through \(\alpha\). As a result the current of the oscillator is fixed, and thus when the voltage \(V_T\) is scaled down this sampling contribution will progressively become larger to the point that it is comparable to that of the thermal noise. This equality reveals that the oscillator voltage should not approach \(\alpha \cdot 2U_T\) or the NEF will degrade. We can assert that although at first it appeared that the sampling noise limited the minimum size of the instrumentation circuit, here it limits the minimum power of the circuit in a much more explicit manner. The area requirement is more simply proportional to \(K_F/\alpha\), reflecting the location of the flicker noise corner and the target oscillator frequency. Some details remain with regard to choosing \(\alpha\), which represents one degree of freedom for trading off oscillator area for minimum voltage but is also strongly dependent on the system bandwidth.

48 Analogue Signal Classification

Now our interests will be redirected towards the methods and hardware implementation of analogue neural spike classification in order to faithfully demonstrate that continuous time instrumentation can provide a substantial improvement over the conventional approach at the system level. In particular we demonstrate, with empirical results, an unsupervised method that will allow the classification of spikes without requiring signal quantization at any stage of the adaptive process.

{{< figure src="/images/phd-thesis/Nsample.svg" width="500" >}}

{{< figure src="/images/phd-thesis/Fsample.svg" title="Figure 83: Illustration of Nyquist rate feature extraction and using feature enhancement in order to operate at sub-Nyquist rates. " width="500" >}}

The abstract motivation here is illustrated in Figure 83. Utilizing digitized recordings as a basis for feature extraction implies the necessity of operating with excessive data rates in order to capture the full bandwidth of features in the signal such that \(f_s > 2 f_{BW}\). By using mechanisms that enhance & extract prominent features directly in the analogue domain this sampling constraint is eliminated 29. Instead we may sample at the rate of spikes present in the recording, \(f_n\). Even by approximation we can assert that \(f_n \ll f_s\). In some sense this motivation is inspired by that of adaptive compressed sensing or sparse representation methods 30. Here we will introduce a perspective based on realizing a less generalized method that can be integrated effortlessly, which has not yet been attempted in the literature. The challenge for this approach is finding efficient analogue operators that allow direct feature extraction and, more importantly, feedback mechanisms that adaptively improve the feature extraction process without substantial resource requirements or supervision.

The notion that motivated this specific classification structure is that relatively high sampling rates are needed in order to improve alignment and thereby reduce how noise couples to features. This implies that high temporal resolution is desirable for these spread-spectrum signals, but such an approach has unavoidable implications in terms of larger memory and power requirements. As we shall see, analysing \(K\) features for \(M\) centroids with mixed signal methods will require \(KM+K\) registers and \(\max(K,M)\) integrators, where register depth has a logarithmic dependency on temporal resolution. This is in contrast to PCA, template matching, and many all-digital methods, where the register count is linearly proportional to temporal resolution or window size. While it is still vital to align the analogue operators with the spike waveform, an increased clock rate does not influence the analogue power dissipation since signal quantization is not performed.
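The resource scaling stated above can be tabulated with a few lines of code. In this sketch the register depth is taken as \(\lceil \log_2 \rceil\) of the number of resolvable time steps in the spike window, which is an assumed interpretation of the logarithmic dependency; the function name and example numbers are hypothetical.

```python
import math


def mixed_signal_resources(k_features, m_centroids, time_steps):
    """Return (registers, integrators, register depth in bits).

    time_steps is the number of resolvable temporal positions in the window;
    using ceil(log2(time_steps)) as the depth is an assumption of this sketch.
    """
    registers = k_features * m_centroids + k_features
    integrators = max(k_features, m_centroids)
    depth_bits = math.ceil(math.log2(time_steps))
    return registers, integrators, depth_bits


# Hypothetical example: 2 features, 3 centroids, increasing temporal resolution.
for steps in (32, 64, 128, 256):
    regs, ints, bits = mixed_signal_resources(2, 3, steps)
    print(f"{steps:4d} time steps: {regs} registers x {bits} bits, {ints} integrators")
```

Doubling the temporal resolution adds one bit per register here, whereas a window-based digital method would double its register count.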

It is relatively rare to see analogue or mixed signal implementations of machine learning classifiers due to the convoluted impact circuit imperfections have on learning dynamics 31. This makes it difficult to successfully realize more complex architectures, but the methodology can significantly increase the information storage density at very low power budgets. This is exemplified by the system in 32 that not only achieves 1.04 GOPS/mW but also an area efficiency of 0.03 GOPS/mm\(^2\) using a 130 nm technology. Here the continuous valued charge on floating gates was used to preserve learned features without quantization of errors.

49 Feature Selection

It should be expected that when a spike event is sampled at a relatively high rate we typically only need a select few samples, once it is aligned inside a window, in order to tell different classes apart. Only when noise becomes a considerable component must we consider multiple samples before we can make an accurate distinction. In such a scenario we would like to use the samples that maximally distinguish classes. That is, we would like to maximize the quality of our feature \(Q_F\) by maximizing, for instance, a simple sum of \(N\) maximally separating samples \(c\) in the window \(W\) with the linear distance operator \(D\).

$$Q_F = \frac{\sum_{i=1}^{N} D(W[c_i]) }{N} + \frac{en_{rms}}{\sqrt{N}} + \frac{BG(t)}{N}$$

The expression in Equation 49 primarily tells us that, in the presence of white noise \(en_{rms}\) and background activity \(BG\), increasing the number of samples reduces the contribution of white noise by \(\sqrt{N}\). However if the new samples have negligible signal quality our aggregate distance will decrease fractionally by \(1/(N+1)\). This suggests that we should avoid relying on a large ensemble of samples, because doing so avoids complexity and may very well improve classification accuracy. Implementing an effective analogue solution requires evaluating the optimal section in the spike window. To analyse this problem let us define a distance to noise ratio \((DNR)\) with respect to the mean spikes of \(M\) classes and their standard deviation for each sample in the spike window as:

$$DNR[n] = \frac{\sum_{i=2}^{M} |\mu_{0}[n] - \mu_{i}[n]|}{\sqrt{\sum_{k=1}^{M}\sigma_k^2[n]}}$$

Here \(\mu_0\), \(\mu_k\) and \(\sigma_k\) are the mean of all spike waveforms, the mean of spike waveforms in class \(k\) and the aggregate standard deviation of class \(k\) respectively. In a more general form, choosing samples that maximize Equation 50 can be seen as classification by expectation maximization 33. In Figure 84 we exemplify this quality factor for a number of training data sets alongside the first principal component of each set respectively.
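A minimal sketch of how Equation 50 could be evaluated on labelled, aligned spike windows is given below, using only the standard library. The data layout (a list of classes, each a list of equal-length windows), the use of the population variance, and the choice to report 0 where the variance vanishes are all assumptions of this example.

```python
import math
from statistics import mean, pvariance


def dnr(classes):
    """Distance to noise ratio per sample index, following Equation 50.

    classes: list of M classes, each a list of equal-length spike windows.
    """
    n_samples = len(classes[0][0])
    all_spikes = [spike for cls in classes for spike in cls]
    result = []
    for n in range(n_samples):
        mu_0 = mean(spike[n] for spike in all_spikes)
        mu = [mean(spike[n] for spike in cls) for cls in classes]
        var = [pvariance([spike[n] for spike in cls]) for cls in classes]
        # The numerator runs from i = 2 to M, as written in Equation 50.
        distance = sum(abs(mu_0 - mu_i) for mu_i in mu[1:])
        noise = math.sqrt(sum(var))
        # DNR is undefined without variance; 0.0 is an arbitrary choice here.
        result.append(distance / noise if noise > 0 else 0.0)
    return result


# Tiny hypothetical data set: two classes, two windows each, two samples per window.
class_a = [[1.0, 1.0], [-1.0, 3.0]]
class_b = [[1.0, 5.0], [-1.0, 7.0]]
print(dnr([class_a, class_b]))
```

In this toy example the first sample carries no class separation while the second does, so only the second sample would be a useful feature.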

{{< figure src="/images/phd-thesis/S2PcorA.svg" width="500" >}}

{{< figure src="/images/phd-thesis/S2PcorB.svg" width="500" >}}

{{< figure src="/images/phd-thesis/S2PcorC.svg" title="Figure 84: Illustration of feature dependency for windowed neural spikes for data sets with \(-16 dB\) background noise and \(-20 dB\) added white Gaussian noise." width="500" >}}

A number of observations can be made. In particular, a variety of spike shapes show PCA peaks in the first moment which are convex with respect to the depolarizing and repolarizing sections. In contrast, the DNR plot shows multiple local minima and thus optimization in this space is not considered trivial. What should be pointed out is that the peaks in the PCA curve will typically correspond to good DNR points, as they will relate to sections of high variance due to maximum class separation on top of the noise variance. It should also be clear that our interest does not lie in the refractory period, as the slow-wave component is small in magnitude and typically corrupted by high-pass filters that reject environmental interference. Hence we inherently expect poor DNR in the latent refractory period of the recorded action potential. One of the more significant implications of not quantizing the signal is related to the fact that features can generally not be extracted before the detection event. This implies that the operator used for detection should avoid introducing group delay that could result in completely missing the most energetic features in the spiking event.

In actuality this hints at the contradictory advantage of analogue filter detection over an FIR filter equivalent. That is, the group delay exclusive to IIR filters is highly dependent on the frequency content of the spike waveform. This can to some extent be observed in Figure 84, where class 2 is deliberately delayed with respect to the alignment point since it has a smaller derivative or equivalently less high frequency content. Here the alignment is achieved by conditioning the signal with a narrow bandpass filter and looking for a peak above the threshold. While relatively simple in implementation, it can be quite effective if designed to maximize output signal to noise power.

The approach of self-mixing spike classes with the sampling or alignment strategy is effective at improving the features of already very similar spike shapes. In fact this alignment in some sense captures the features existent before the detection event and mixes them with latent features. As a result, although analogue techniques are less effective at demonstrating resilience towards background noise, they can efficiently mix different features to improve discrimination between spike shapes.

50 Mixed Signal Implementation

The adaptive method employed here is to section the spike waveform depending on the redundancy in the number of features required, noting that features in the same polarizing phase will correlate strongly since they emerge from the same phenomena. Each section after the detection event is bounded a priori to guarantee non-overlapping features (i.e. \(0-150 \mu s\), \(170-310 \mu s\)) and the maximum variance is found in each section respectively. These are assumed to be the optimal DNR features, referred to as \(\Omega\)s. Our classification will integrate around these points and perform k-means clustering in the resulting space.
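The idea of locating the point of maximum variance inside each pre-bounded section can be sketched offline as follows. This is a batch computation on stored, aligned windows and is only a stand-in for the closed loop search described in this chapter; the sampling rate, data layout and function name are assumptions.

```python
from statistics import pvariance


def omega_points(spikes, fs_hz, sections_us):
    """Return, per section, the time in microseconds after detection with maximum variance.

    spikes: list of aligned windows, index 0 being the detection event.
    sections_us: list of (start, stop) bounds in microseconds, assumed non-overlapping.
    """
    omegas = []
    for start_us, stop_us in sections_us:
        first = round(start_us * fs_hz / 1e6)
        last = round(stop_us * fs_hz / 1e6)
        variances = {n: pvariance([spike[n] for spike in spikes])
                     for n in range(first, last + 1)}
        best = max(variances, key=variances.get)
        omegas.append(best * 1e6 / fs_hz)
    return omegas


# Hypothetical data: two 32-sample windows at 100 kS/s that differ at samples 5 and 20.
spike_1 = [0.0] * 32
spike_2 = [0.0] * 32
spike_1[5], spike_2[5] = 1.0, -1.0
spike_1[20], spike_2[20] = 2.0, -2.0
print(omega_points([spike_1, spike_2], fs_hz=100e3,
                   sections_us=[(0, 150), (170, 310)]))
```

Because the sections do not overlap, each \(\Omega\) is found independently of the others.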

{{< figure src="/images/phd-thesis/TD_SYS.svg" title="Figure 85: System abstraction showing the configuration for pre-filtering detection and reconfigurable integrators. " width="500" >}}

The topology is summarized in Figure 85, where the predominant active component lies with the state machine reconfiguring the integrators to optimize the classification. The noise rejection performed by three bandpass filters primarily removes out of band aggressors that prevent accurate classification. Notice that the feedback loop for detection relies on a long term average of a narrow band component in addition to an indication of a localized peak above the threshold. Filters \(F1\) & \(F2\) are band pass filters with first order roll-off and a \(0.4-7 kHz\) bandwidth, whereas \(F3\) is a narrow band filter of \(2.5-4.5 kHz\) to maximize group delay sensitivity. There is an additional advantage of pre-sectioning the waveform, which is that these optimal points can be found independently in sequence. During initialization we seed the starting point in the middle of the section.

{{< figure src="/images/phd-thesis/TD_VM.svg" title="Figure 86: Illustration of closed loop control for finding point of maximum variance. " width="500" >}}

The control loop for implementing this method is shown in Figure 86. Here \(\Omega_{1/2}\) represents the temporal offsets, separated by \(20 \mu s\), with respect to the detection event around which the signal is integrated in the analogue domain. The digital integrators primarily average the aggregate statistics to assist long term convergence by tracking the mean value of both integration results. This allows us to compare the deviation from the mean for each spike event and evaluate which has a larger variance. Note that a key characteristic is that the signal being analysed is band limited to such an extent that we do not expect local minima within each separate section. Since the digital integrators are accumulating boolean results to reject the uncorrelated noise, the factor \(a_{1}\) represents the register depth of the counter. To be more precise, the \(\Omega\)s are taken as a first order difference between the detection event and an offset.

$$X(t) = \underbrace{\int_{t_0+\Omega}^{t_0+\Omega+\Delta +MA} Q_{out}(t) \, dt}_{\text{Integrated signal}} - \underbrace{ 2 \cdot \int_{t_0}^{t_0+\Delta} Q_{out}(t) \, dt}_{\text{Offset}}$$

This first order difference operation is shown in Equation 51, where \(t_0\), \(\Delta\), and \(Q_{out}\) are the time where the spike event is aligned, the static integration time of \(20 \mu s\), and the analogue signal containing the spike respectively. This clarifies that the moving average \(MA\) is mixed with the signal when it evaluates the mean of the \(\Omega_{1}\) section. This approach primarily helps with a self-referenced gain that allows a smaller register depth.
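For intuition, Equation 51 can be approximated on a uniformly sampled record with a Riemann sum, as in the sketch below. In the proposed system the integration is performed in the analogue domain, so this discrete version is purely illustrative; the sample spacing, argument names and test signal are assumptions.

```python
def first_order_difference(q_out, dt, t0, omega, delta, ma):
    """Riemann-sum approximation of Equation 51.

    q_out: list of samples spaced dt seconds apart; all times are in seconds.
    ma: the moving-average term that extends the upper integration limit.
    """
    def integrate(t_start, t_stop):
        first = round(t_start / dt)
        last = round(t_stop / dt)
        return sum(q_out[first:last]) * dt

    integrated_signal = integrate(t0 + omega, t0 + omega + delta + ma)
    offset = 2.0 * integrate(t0, t0 + delta)
    return integrated_signal - offset


# Hypothetical constant test signal sampled every microsecond.
signal = [1.0] * 200
print(first_order_difference(signal, dt=1e-6, t0=10e-6,
                             omega=50e-6, delta=20e-6, ma=20e-6))
```

With a constant input, the result is zero when \(MA\) equals \(\Delta\), which illustrates how the moving average term acts as a self-referenced gain on the integration window.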

{{< figure src="/images/phd-thesis/TD_CC.svg" title="Figure 87: Illustration of closed loop control for tracking two-centroids with the \(\Omega_{1}\) feature. " width="500" >}}

After several seconds of training data or an equivalent spike count we can assume that the sections that maximize variance have been approached. At this point the \(\Omega\)s are fixed and the centroids need to be generated to complete the adaptive process for classification. As illustrated in Figure 87, a similar feedback mechanism is used to adjust the mean centroids based on boolean results. This particular configuration is the adjustment of two centroids based on one feature in the \(\Omega_1\) section. By adjusting the centroid \(MA_{\mu}\) when it is the closest to the new data point we realize a k-means clustering method with an \(l_0\) norm distance operator.
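The centroid update can be illustrated with a simplified software model in which only the centroid nearest to each new feature value is nudged by one fixed step in the direction of that value. This omits the counter of depth \(a_1\) that accumulates boolean results in Figure 87, and the step size and sample values are hypothetical.

```python
def update_centroids(centroids, sample, step=1):
    """Move only the centroid closest to `sample` one step towards it.

    Only the sign of the difference is used, mimicking a boolean
    comparison result; returns the index of the updated centroid.
    """
    closest = min(range(len(centroids)),
                  key=lambda i: abs(sample - centroids[i]))
    if sample > centroids[closest]:
        centroids[closest] += step
    elif sample < centroids[closest]:
        centroids[closest] -= step
    return closest


# Hypothetical one-dimensional feature values drawn from two classes.
centroids = [10, 50]
for _ in range(50):
    update_centroids(centroids, 20)
    update_centroids(centroids, 80)
print(centroids)
```

Since each update only needs a comparison and an increment or decrement, no quantized distance value has to be stored.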

Since we bound the \(\Omega\) sections to be strictly non-overlapping, the same analogue integrator can be used to evaluate the accumulated error of all features to each centroid. Moreover for a small \(a_1\) the centroid adjustment can be time multiplexed, leaving a reduced requirement on the total number of integrators. This implies that \(K\) integrators are needed to iteratively adjust all the centroids and \(M\) integrators are needed to evaluate the distance from each centroid. Because this adaptive process is performed in isolation, we may perform the training in phases that update clusters and features separately, so we will only need \(M\) or \(K\) integrators concurrently, whichever is more demanding. However \(KM\) registers are needed to specify the location of each centroid, which should be converted to a time-domain signal by using reconfigurable delay lines.

{{< figure src="/images/phd-thesis/TD_ADR.svg" title="Figure 88: Delay line configuration for evaluating the absolute difference between the asynchronous time domain signal \(D\) and the registered centroid position \(X\). " width="500" >}}

Such a configuration is illustrated in Figure 88, where there is coarse control by selecting different phases of \(D\) and fine control with a conventional multiplexed delay structure. Again the reduction in complexity and rejection of quantization noise by performing time domain computation, as opposed to the equivalent \(8b\) full adder, is typical of this processing modality. Finally the question remains of how centroids are initialized without requiring quantization. This requirement is avoided by using an iterative method with respect to centroid generation. After having a single centroid converge to the mean of the feature space we iteratively split centroids in two while training, similar to that discussed in Section 36. We presume redundant clusters will be generated that are removed if supervision is allowed to intervene or in the case that high level control is used to analyse which clusters are significant after several iterations. The results presented here however do not consider this supervision.

51 Validation

In order to demonstrate the viability of this approach we will simulate a linearised model that is constructed using Matlab. Here we aim to show to what extent unsupervised methods are constrained with respect to classification performance. The original data sets used in Section 33 have been up-sampled from \(24 kS/s\) to \(240 kS/s\) after the band limiting filters to emulate the continuous time logic that will operate at a high clock rate.

{{< figure src="/images/phd-thesis/feature_clean.svg" width="500" >}} {{< figure src="/images/phd-thesis/feature_noisy.svg" title="Figure 89: comparing PCA and \(\Omega\) feature distribution for the Difficult2 data set. Ground truth for spike classes annotated as cyan, maroon, yellow and blue for false positives." width="500" >}}

The feature space resulting from this method is exemplified in Figure 89. Here we compare it to that of the two component PCA feature space since the \(\Omega\) represents its approximation. It is typical to see multiple additional clusters form, either due to the detection of false positives or misalignment of a spike class in the presence of noise. Initializing extra clusters can typically retain classification accuracy in noisy environments but degrade precision in pristine conditions.

{{< figure src="/images/phd-thesis/A05.svg" width="500" >}} {{< figure src="/images/phd-thesis/B05.svg" title="Figure 90: \(\Omega_2\) and \(\Omega_3\) classification for data sets with \(-26 dB\) background activity." width="500" >}}

{{< figure src="/images/phd-thesis/A01.svg" width="500" >}} {{< figure src="/images/phd-thesis/B01.svg" title="Figure 91: \(\Omega_2\) and \(\Omega_3\) classification for data sets with \(-20 dB\) background activity." width="500" >}}

{{< figure src="/images/phd-thesis/A02.svg" width="500" >}} {{< figure src="/images/phd-thesis/B02.svg" title="Figure 92: \(\Omega_2\) and \(\Omega_3\) classification for data sets with \(-16 dB\) background activity." width="500" >}}

The results in Figure 91 show classification accuracy as the percentage of all correctly classified events with respect to the ground truth, including false positives and false negatives. This indicates that when the signal to noise ratio exceeds \(20 dB\) the conditions are quite forgiving towards the simplicity of the algorithm. The two features used here require very little effort to adapt and classify activity. It is important to mention that a fixed filtering configuration is maintained for all test points in order to demonstrate adaptive characteristics.
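One plausible reading of this accuracy metric is sketched below in Python. The exact formula is not given in the text, so the way detection errors enter the denominator is an assumption, and `classification_accuracy` is a hypothetical helper.

```python
def classification_accuracy(n_correct, n_misclassified, n_false_pos, n_false_neg):
    """Fraction of correctly classified events over all events.

    Assumed denominator: every ground-truth spike (correct, misclassified or
    missed) plus every false detection.
    """
    total = n_correct + n_misclassified + n_false_pos + n_false_neg
    return n_correct / total if total else 0.0
```

Under this reading a missed spike and a spurious detection both penalise the score in the same way as a spike assigned to the wrong class.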

The results in Figure 92 show an improvement in noisy conditions if the number of sections is increased to 3, which implies a three dimensional feature space that improves centroid distance. Using the same algorithm for feature selection, the configuration can deal with twice the amount of background noise without supervision. In some sense the fact that the signal is not quantized does not have a significant impact on classification accuracy. This highlights the importance of closed loop algorithms whether resources are constrained or not. As such, constructing a convex search space or extracting well reasoned features from underlying phenomena is crucial to reducing complexity.
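The relation between the number of sections and the dimension of the feature space can be illustrated with a short Python sketch. It assumes that each feature is the integral of the spike waveform over one temporal section; the function name, the `boundaries` argument and the sample spacing `dt` are hypothetical and only serve the illustration.

```python
def section_features(spike, boundaries, dt=1.0):
    """Integrate the spike over consecutive sections [b0, b1), [b1, b2), ...

    Returns one feature per section, so len(boundaries) - 1 features in total.
    """
    return [sum(spike[a:b]) * dt for a, b in zip(boundaries, boundaries[1:])]

# Two sections give a two dimensional feature vector, three sections a
# three dimensional one (the section boundaries here are arbitrary).
spike = [0.0, 0.2, 1.0, 0.4, -0.6, -0.3, -0.1, 0.0, 0.0]
omega_2 = section_features(spike, [0, 4, 9])
omega_3 = section_features(spike, [0, 3, 6, 9])
```

In the adaptive scheme the section boundaries would be adjusted during training rather than fixed as they are here.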

{{< figure src="/images/phd-thesis/CA01.svg" title="Figure 93: False alarm rates normalized by true positives for the analogue detection." width="500" >}}

Figure 93 shows that at these noise levels detection is relatively consistent but not as adaptive as the digital approach. The threshold for detection will favour generating false positives over false negatives. The main point of failure for noise levels beyond that point lies with the inability to perform feature selection based on localized variance maximization. This is partially expected as PCA will similarly perform poorly when noise levels become comparable to the signal.

In a practical case it may be difficult to ascertain whether the signal to noise level is adequate to trust classification unless there is confidence to do so, in the sense that there may be redundancy in the recordings taken. However, this approach is significantly more viable for realizing sub \(1 \mu W\) neural spike classification for large scale recordings considering the resource requirements for adaptive classification, given that each integrator consumes less than \(50 nW\) in \(0.18 \mu m\) CMOS and each structure needs minimal supervision.
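A rough budget check makes the power argument concrete. The sketch below uses the per-integrator bound from the text and the integrator count listed for the \(\Omega_3\) method in Table 12; the consumption of the filters and DACs is not stated here, so the remaining headroom is only an upper bound on what those blocks may use.

```python
INTEGRATOR_POWER_NW = 50.0  # upper bound per integrator (from the text)
N_INTEGRATORS = 4           # integrators listed for the Omega_3 method in Table 12
BUDGET_NW = 1000.0          # 1 uW classification target (from the text)

# Worst-case integrator consumption and what is left for all other blocks.
integrator_total_nw = N_INTEGRATORS * INTEGRATOR_POWER_NW
headroom_nw = BUDGET_NW - integrator_total_nw
```

The integrators therefore account for at most a fifth of the sub \(1 \mu W\) target.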

Table 12: Overview of detection and classification performance for data sets from 34 for different methods. \(\star\) Requires off-chip supervision. \(\dagger\) White noise is also added at -20dB of the signal power.

| Method | Analogue | Digital Registers | Cycles / Sample | Data Set | Background @ -16dB \(^\dagger\) | Background @ -20dB \(^\dagger\) |
|---|---|---|---|---|---|---|
| RVD | 1\(\times\)ADC | 83 | 172 | Easy 2 | 0.734 | 0.842 |
| | | | | Diff. 1 | 0.729 | 0.871 |
| | | | | Diff. 2 | 0.748 | 0.848 |
| Template | 1\(\times\)ADC | 105 | 90 | Easy 2 | 0.820 | 0.876 |
| | | | | Diff. 1 | 0.860 | 0.835 |
| | | | | Diff. 2 | 0.803 | 0.875 |
| WDF 35 | 2\(\times\)BP-Filter, 1\(\times\)ADC | 41 | 104\(^\star\) | Easy 2 | 0.951 | 0.991 |
| | | | | Diff. 1 | 0.850 | 0.929 |
| | | | | Diff. 2 | 0.846 | 0.916 |
| \(\Omega_3\) Features | 3\(\times\)BP-Filter, 4\(\times\)Integrator, 4\(\times\)DAC | 16 | 1 | Easy 2 | 0.800 | 0.946 |
| | | | | Diff. 1 | 0.723 | 0.931 |
| | | | | Diff. 2 | 0.798 | 0.886 |

A number of methods are shown in Table 12 together with their classification accuracy and the corresponding hardware requirements in both the analogue and digital domain. The RVD and template methods presented in Section 33 represent the digital approach where few analogue components are needed besides the quantizer. Choosing one over the other amounts to allocating either more processing power or more memory resources. As expected, supervised intervention allows methods like WDF 35 to leverage a substantial improvement with respect to resource efficient classification. In this perspective, using \(\Omega_3\) features distributes resources in the analogue domain while still maintaining comparable classification accuracy and requiring less reliance on digital scaling factors. We provide more comparison details in Section 62 for the digital and analogue methods proposed by this work as well as the equivalent Matlab implementation used for evaluation.
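To make the comparison in Table 12 easier to read, the accuracy figures can be averaged over the three data sets for each method and noise level. The Python sketch below does only that; the values are copied from Table 12 in the order Easy 2, Diff. 1, Diff. 2, while the dictionary layout and key names are hypothetical.

```python
# Accuracy values from Table 12, ordered as Easy 2, Diff. 1, Diff. 2.
TABLE_12 = {
    "RVD":      {"-16dB": [0.734, 0.729, 0.748], "-20dB": [0.842, 0.871, 0.848]},
    "Template": {"-16dB": [0.820, 0.860, 0.803], "-20dB": [0.876, 0.835, 0.875]},
    "WDF":      {"-16dB": [0.951, 0.850, 0.846], "-20dB": [0.991, 0.929, 0.916]},
    "Omega_3":  {"-16dB": [0.800, 0.723, 0.798], "-20dB": [0.946, 0.931, 0.886]},
}

def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

# Mean accuracy per method and background level, rounded to three decimals.
summary = {
    method: {level: round(mean(vals), 3) for level, vals in levels.items()}
    for method, levels in TABLE_12.items()
}
```

An unweighted mean treats the three data sets as equally important, which is a simplification and not a metric used in the original evaluation.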

52 Conclusion

This chapter has proposed a number of time-domain constructs that encourage mixed signal design for instrumentation. We derived the underlying concepts from the phase state of a ring oscillator in order to represent continuous valued time domain memory as the equivalent of a clocked flip-flop or sampled capacitor. In addition we have discussed the means to analytically evaluate and optimize the characteristics of these topologies. Overall there are clear benefits over conventional implementations in applications such as instrumentation and the functional manipulation of continuous valued signals. Moreover these structures will scale performance with technology due to the extensive use of digital gates. The instrumentation structure in particular gives way to fully synthesized platforms. Performing processing and filtering in the digital domain remains critical for robust sensing of LFPs and EAPs in very poor signal to noise conditions. A 0.6 V 58 dB SNDR time domain instrumentation architecture is demonstrated with a NEF of 1.18 that generates multiphase PWM encoded digital signals using a sub 0.01 mm\(^2\) footprint and employs bandpass filtering with 40 dB/Dec roll off.

In extension we demonstrated the capacity for mixed signal analogue to information conversion with respect to unsupervised classification that uses adaptive techniques to converge towards specific signal characteristics. Using reconfigurable integration of selected temporal sections in the spike shape lets us effectively focus resources on feature and cluster evaluation without open loop quantization. This mitigates the trade-off associated with resolution and digital complexity. The main challenge, as pointed out, is establishing what dynamics will allow convergence to optimal feature extraction with reduced hardware requirements. Here we exploit certain phenomena in the principal components of spike shapes and the sensitivity of the group delay of analogue detection to frequency content in spike waveforms to achieve direct classification.

It is typical that the techniques behind instrumentation and signal acquisition are more mature in development and direction than those of other processing modalities, especially when realizing mixed signal methods for machine learning where a multitude of convoluted factors impact performance. There is still much to be addressed when adaptive techniques are evaluated with respect to their resource efficiency, and this will likely be an important aspect that emerges in many intelligent sensor systems.

References:


  1. B.Vigraham, J.Kuppambatti, and P.R. Kinget, ''Switched-mode operational amplifiers and their application to continuous-time filters in nanoscale cmos,'' IEEE Journal of Solid-State Circuits, vol.49, no.12, pp. 2758--2772, December 2014. [Online]: http://dx.doi.org/10.1109/JSSC.2014.2354641 ↩︎

  2. Y.Tsividis, ''Event-driven data acquisition and continuous-time digital signal processing,'' in IEEE Proceedings of the Custom Integrated Circuits Conference, September 2010, pp. 1--8. [Online]: http://dx.doi.org/10.1109/CICC.2010.5617618 ↩︎

  3. I.Lee, D.Sylvester, and D.Blaauw, ''A constant energy-per-cycle ring oscillator over a wide frequency range for wireless sensor nodes,'' IEEE Journal of Solid-State Circuits, vol.51, no.3, pp. 697--711, March 2016. [Online]: http://dx.doi.org/10.1109/JSSC.2016.2517133 ↩︎

  4. B.Drost, M.Talegaonkar, and P.K. Hanumolu, ''Analog filter design using ring oscillator integrators,'' IEEE Journal of Solid-State Circuits, vol.47, no.12, pp. 3120--3129, December 2012. [Online]: http://dx.doi.org/10.1109/JSSC.2012.2225738 ↩︎

  5. V.Unnikrishnan and M.Vesterbacka, ''Time-mode analog-to-digital conversion using standard cells,'' IEEE Transactions on Circuits and Systems---Part I: Fundamental Theory and Applications, vol.61, no.12, pp. 3348--3357, December 2014. [Online]: http://dx.doi.org/10.1109/TCSI.2014.2340551 ↩︎

  6. K.Yang, D.Blaauw, and D.Sylvester, ''An all-digital edge racing true random number generator robust against pvt variations,'' IEEE Journal of Solid-State Circuits, vol.51, no.4, pp. 1022--1031, April 2016. [Online]: http://dx.doi.org/10.1109/JSSC.2016.2519383 ↩︎

  7. M.Alioto, ''Understanding dc behavior of subthreshold cmos logic through closed-form analysis,'' IEEE Transactions on Circuits and Systems---Part I: Fundamental Theory and Applications, vol.57, no.7, pp. 1597--1607, July 2010. [Online]: http://dx.doi.org/10.1109/TCSI.2009.2034233 ↩︎

  8. C.C. Enz and E.A. Vittoz, Charge-based MOS transistor modeling: the EKV model for low-power and RF IC design. John Wiley & Sons, August 2006. [Online]: http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0470855452.html ↩︎

  9. A.Demir, A.Mehrotra, and J.Roychowdhury, ''Phase noise in oscillators: a unifying theory and numerical methods for characterization,'' IEEE Transactions on Circuits and Systems---Part I: Fundamental Theory and Applications, vol.47, no.5, pp. 655--674, May 2000. [Online]: http://dx.doi.org/10.1109/81.847872 ↩︎

  10. A.Hajimiri and T.Lee, ''A general theory of phase noise in electrical oscillators,'' IEEE Journal of Solid-State Circuits, vol.33, no.2, pp. 179--194, February 1998. [Online]: http://dx.doi.org/10.1109/4.658619 ↩︎

  11. A.Hajimiri, S.Limotyrakis, and T.Lee, ''Phase noise in multi-gigahertz cmos ring oscillators,'' in IEEE Proceedings of the Custom Integrated Circuits Conference, May 1998, pp. 49--52. [Online]: http://dx.doi.org/10.1109/CICC.1998.694905 ↩︎

  12. W.Jiang, V.Hokhikyan, H.Chandrakumar, V.Karkare, and D.Markovic, ''A ±50mv linear-input-range vco-based neural-recording front-end with digital nonlinearity correction,'' in IEEE Proceedings of the International Solid-State Circuits Conference, January 2016, pp. 484--485. [Online]: http://dx.doi.org/10.1109/ISSCC.2016.7418118 ↩︎

  13. C.Weltin-Wu and Y.Tsividis, ''An event-driven clockless level-crossing adc with signal-dependent adaptive resolution,'' IEEE Journal of Solid-State Circuits, vol.48, no.9, pp. 2180--2190, September 2013. [Online]: http://dx.doi.org/10.1109/JSSC.2013.2262738 ↩︎

  14. H.Y. Yang and R.Sarpeshkar, ''A bio-inspired ultra-energy-efficient analog-to-digital converter for biomedical applications,'' IEEE Transactions on Circuits and Systems---Part I: Fundamental Theory and Applications, vol.53, no.11, pp. 2349--2356, November 2006. [Online]: http://dx.doi.org/10.1109/TCSI.2006.884463 ↩︎

  15. F.Corradi and G.Indiveri, ''A neuromorphic event-based neural recording system for smart brain-machine-interfaces,'' IEEE Transactions on Biomedical Circuits and Systems, vol.9, no.5, pp. 699--709, October 2015. [Online]: http://dx.doi.org/10.1109/TBCAS.2015.2479256 ↩︎

  16. K.A. Ng and Y.P. Xu, ''A compact, low input capacitance neural recording amplifier,'' IEEE Transactions on Biomedical Circuits and Systems, vol.7, no.5, pp. 610--620, October 2013. [Online]: http://dx.doi.org/10.1109/TBCAS.2013.2280066 ↩︎

  17. J.Agustin and M.Lopez-Vallejo, ''An in-depth analysis of ring oscillators: Exploiting their configurable duty-cycle,'' IEEE Transactions on Circuits and Systems---Part I: Fundamental Theory and Applications, vol.62, no.10, pp. 2485--2494, October 2015. [Online]: http://dx.doi.org/10.1109/TCSI.2015.2476300 ↩︎

  18. K.Ng and Y.P. Xu, ''A compact, low input capacitance neural recording amplifier,'' IEEE Transactions on Biomedical Circuits and Systems, vol.7, no.5, pp. 610--620, October 2013. [Online]: http://dx.doi.org/10.1109/TBCAS.2013.2280066 ↩︎

  19. M.Elia, L.B. Leene, and T.G. Constandinou, ''Continuous-time micropower interface for neural recording applications,'' in IEEE Proceedings of the International Symposium on Circuits and Systems, May 2016. ↩︎

  20. J.Guo, W.Ng, J.Yuan, S.Li, and M.Chan, ''A 200-channel area-power-efficient chemical and electrical dual-mode acquisition ic for the study of neurodegenerative diseases,'' IEEE Transactions on Biomedical Circuits and Systems, vol.10, no.3, pp. 567--578, June 2016. [Online]: http://dx.doi.org/10.1109/TBCAS.2015.2468052 ↩︎

  21. Y.W. Li, K.L. Shepard, and Y.P. Tsividis, ''A continuous-time programmable digital fir filter,'' IEEE Journal of Solid-State Circuits, vol.41, no.11, pp. 2512--2520, November 2006. [Online]: http://dx.doi.org/10.1109/JSSC.2006.883314 ↩︎

  22. B.Schell and Y.Tsividis, ''A continuous-time adc/dsp/dac system with no clock and with activity-dependent power dissipation,'' IEEE Journal of Solid-State Circuits, vol.43, no.11, pp. 2472--2481, November 2008. [Online]: http://dx.doi.org/10.1109/JSSC.2008.2005456 ↩︎

  23. S.Aouini, K.Chuai, and G.W. Roberts, ''Anti-imaging time-mode filter design using a pll structure with transfer function dft,'' IEEE Transactions on Circuits and Systems---Part I: Fundamental Theory and Applications, vol.59, no.1, pp. 66--79, January 2012. [Online]: http://dx.doi.org/10.1109/TCSI.2011.2161411 ↩︎

  24. X.Xing and G.G.E. Gielen, ''A 42 fj/step-fom two-step vco-based delta-sigma adc in 40 nm cmos,'' IEEE Journal of Solid-State Circuits, vol.50, no.3, pp. 714--723, March 2015. [Online]: http://dx.doi.org/10.1109/JSSC.2015.2393814 ↩︎

  25. K.Reddy, S.Rao, R.Inti, B.Young, A.Elshazly, M.Talegaonkar, and P.K. Hanumolu, ''A 16-mw 78-db sndr 10-mhz bw ct \(\Delta\Sigma\) adc using residue-cancelling vco-based quantizer,'' IEEE Journal of Solid-State Circuits, vol.47, no.12, pp. 2916--2927, December 2012. [Online]: http://dx.doi.org/10.1109/JSSC.2012.2218062 ↩︎

  26. J.Daniels, W.Dehaene, M.S.J. Steyaert, and A.Wiesbauer, ''A/d conversion using asynchronous delta-sigma modulation and time-to-digital conversion,'' IEEE Transactions on Circuits and Systems---Part I: Fundamental Theory and Applications, vol.57, no.9, pp. 2404--2412, September 2010. [Online]: http://dx.doi.org/10.1109/TCSI.2010.2043169 ↩︎

  27. F.M. Yaul and A.P. Chandrakasan, ''A sub-$\mu$w 36nv/√Hz chopper amplifier for sensors using a noise-efficient inverter-based 0.2v-supply input stage,'' in IEEE Proceedings of the International Solid-State Circuits Conference, January 2016, pp. 94--95. [Online]: http://dx.doi.org/10.1109/ISSCC.2016.7417923 ↩︎

  28. S.Patil, A.Ratiu, D.Morche, and Y.Tsividis, ''A 3-10 fj/conv-step error-shaping alias-free continuous-time adc,'' IEEE Journal of Solid-State Circuits, vol.51, no.4, pp. 908--918, April 2016. [Online]: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7433385&isnumber=7446371 ↩︎

  29. M.Verhelst and A.Bahai, ''Where analog meets digital: Analog-to-information conversion and beyond,'' IEEE Solid-State Circuits Magazine, vol.7, no.3, pp. 67--80, September 2015. [Online]: http://dx.doi.org/10.1109/MSSC.2015.2442394 ↩︎

  30. J.M. Duarte-Carvajalino and G.Sapiro, ''Learning to sense sparse signals: Simultaneous sensing matrix and sparsifying dictionary optimization,'' IEEE Transactions on Image Processing, vol.18, no.7, pp. 1395--1408, July 2009. [Online]: http://dx.doi.org/10.1109/TIP.2009.2022459 ↩︎

  31. R.S. Schneider and H.C. Card, ''Analog hardware implementation issues in deterministic boltzmann machines,'' IEEE Transactions on Circuits and Systems---Part II: Analog and Digital Signal Processing, vol.45, no.3, pp. 352--360, Mar 1998. [Online]: http://dx.doi.org/10.1109/82.664241 ↩︎

  32. J.Lu, S.Young, I.Arel, and J.Holleman, ''A 1 tops/w analog deep machine-learning engine with floating-gate storage in 0.13$\mu$m cmos,'' IEEE Journal of Solid-State Circuits, vol.50, no.1, pp. 270--281, January 2015. [Online]: http://dx.doi.org/10.1109/JSSC.2014.2356197 ↩︎

  33. M.T. Wolf and J.W. Burdick, ''A bayesian clustering method for tracking neural signals over successive intervals,'' IEEE Transactions on Biomedical Engineering, vol.56, no.11, pp. 2649--2659, November 2009. [Online]: http://dx.doi.org/10.1109/TBME.2009.2027604 ↩︎

  34. R.Q. Quiroga, Z.Nadasdy, and Y.Ben-Shaul, ''Unsupervised spike detection and sorting with wavelets and superparamagnetic clustering,'' Neural Computation, vol.16, pp. 1661--1687, April 2004. [Online]: http://dx.doi.org/10.1162/089976604774201631 ↩︎

  35. D.Y. Barsakcioglu and T.G. Constandinou, ''A 32-channel mcu-based feature extraction and classification for scalable on-node spike sorting,'' in IEEE Proceedings of the International Symposium on Circuits and Systems, May 2016. ↩︎