# A Multi-rate Filter for Broadband Radiotelescopes

P. Camino<sup>1</sup>, D. Dallet<sup>2</sup>, B. Quertier<sup>1</sup>, A. Baudry<sup>1</sup>, B. Le Gal<sup>2</sup>, G. Comoretto<sup>3</sup>

 <sup>1</sup> Observatoire de Bordeaux, LAB, UMR 5804, University of Bordeaux BP 89, 2 rue de l'observatoire 33270 Floirac France
Phone: +33 5 57 77 61 42, Fax: +33 5 57 77 61 55, Email: <u>camino@obs.u-bordeaux1.fr</u>
<sup>2</sup> Laboratoire IMS – Université de Bordeaux - ENSEIRB - UMR 5218 351 Cours de la Libération, Bâtiment A31, 33405, Talence Cedex, France
Phone: +33 5 40 00 26 32, Fax: +33 5 56 37 15 45, Email: <u>dominique.dallet@ims-bordeaux.fr</u>
<sup>3</sup> Osservatorio Astrofisico di Arcetri, Largo Fermi 5, 50125 Firenze, Italy

Abstract-We have investigated the implementation of various structures based on the Cascaded Integrator Comb (CIC) filter to replace the decimation filter in the first stage of the 32-time demultiplexed input digital filter of the large Correlator System of the ALMA (Atacama Large Millimeter Array) interferometer project. The main goal of this study is to reduce the power dissipation of the 2-stage, 32-time demultiplexed ALMA digital filter implemented in large FPGAs by minimizing the number of logic elements used in these chips. Different modified CIC filter implementations are presented and compared in terms of complexity and power performance. We conclude that a CIC filter followed by a quarter-band filter significantly reduces the overall power dissipation and thus improves the ALMA filter efficiency.

#### I. Introduction

In radioastronomy, the signal captured by an antenna or an array of antennas is processed in a front-end (heterodyne mixer and intermediate frequency amplification) and further processed in a back-end which, in the ALMA project, mainly consists of a large Correlator System [1]. The Correlator System aims at detecting radio signals by combining the cosmic radio waves collected by the 64 antennas of the ALMA array. The system presented in this section is part of the ALMA Correlator: a digital filtering system named Tunable Filter Bank (TFB) [2]. It is composed of a Direct Digital Synthesizer (DDS), which provides the 'tunable' feature of the TFB by frequency conversion, and two different low-pass FIR filter stages (Figure 1) with linear phase, as required for radio interferometry applications. The aim of the filtering system is to extract, by frequency division of the input signal bandwidth, sub-bands of smaller bandwidth in order to perform very high spectral resolution analysis of multiple spectral regions or to achieve multi-resolution analysis of different spectral regions (thus allowing spectral zooming on the most interesting spectral features). The TFB spectral enhancement is obtained by a decimation process.



Figure 1. TFB FIR architecture

## A Input data flow and ALMA filtering system

The incoming signal is a wide-band signal (2-4 GHz) digitized at 4 GSamples/s. The 3-bit, 8-level Analog to Digital Converter has been specifically designed for the ALMA project [3]. The small number of bits is appropriate to radioastronomy signals, which are Gaussian white noise signals by nature; the quantization process is thus seen as an extra noise factor and the overall noise magnitude decreases with the integration time. The 4 GSamples/s input rate delivered by the Analog to Digital Converters cannot be processed directly by the FPGAs in the TFB, which are limited by their maximum clock frequency. In the ALMA project a clock of 125 MHz has been chosen for digital signal processing (filtering and correlator systems), yet the signal processing has to be performed in real time. This results in a 32-time-demultiplexed input data flow at 125 MS/s. Note that the 32 lines are not independent channels: they carry 32 successive samples digitized at 4 GHz.
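The demultiplexing described above can be pictured with a minimal Python sketch. The function name `demultiplex` and the integer ramp standing in for ADC samples are illustrative assumptions; only the two rates come from the text.

```python
import numpy as np

SAMPLE_RATE_HZ = 4e9    # ADC output rate (from the text)
FPGA_CLOCK_HZ = 125e6   # processing clock (from the text)
LANES = int(SAMPLE_RATE_HZ / FPGA_CLOCK_HZ)  # number of parallel lines

def demultiplex(samples, lanes=LANES):
    """Reshape a serial stream into rows of `lanes` successive samples.

    Row k holds the samples presented to the FPGA on clock cycle k
    (hypothetical helper, not the ALMA firmware).
    """
    samples = np.asarray(samples)
    usable = len(samples) - len(samples) % lanes  # drop an incomplete last row
    return samples[:usable].reshape(-1, lanes)

stream = np.arange(96)        # stand-in for successive ADC samples
frames = demultiplex(stream)  # one row per 125 MHz clock cycle
print(LANES, frames.shape)
```

Each row contains consecutive time samples, which is why the 32 lines cannot be filtered as independent channels.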

## **B** TFB FIR solution

From now on, only one stream (of the two, real and imaginary parts, shown in Figure 1) will be considered throughout the paper (Figure 2). The first decimation filter has no strongly constraining transition band specification. Its passband is 1/32 of the input band. The attenuation in the stop-band is 47 dB and the ripple in the passband is 0.2 dB. It is followed by a decimation by 32. The second decimation filter stage is a half-band filter which has the same attenuation as the first one and a passband which compensates the passband shape of the first filter. The final transition region is fixed by the second stage, which is followed by a decimation by 2 (Figure 2).



Figure 2. Data processing chain

#### **C** Need for optimization

To cover the entire wide input band while respecting the Nyquist theorem, 32 sub-band filters are implemented, each synthesizing a 62.5 MHz sub-band. For information, 512 TFB cards are required for the complete Correlator System. Each card is populated with 16 FPGA chips, which allow us to synthesize 32 sub-band filters (2 sub-band filters per FPGA). Various TFB designs have been implemented, and power dissipation as well as FPGA junction temperature measurements have been performed. The first implemented design (in Altera Stratix I chips) showed a relatively high power dissipation of about 150 W per card. A good performance improvement has been achieved without any architecture modification by using Altera Stratix II chips; the dissipation went down to 78 W per card. Despite this good result, the Stratix II junction temperature in the operational conditions of the complete Correlator System remains close to the maximum advised by Altera. Thus, in parallel with an improvement of the air flow circulation in the Correlator racks, we have considered how a redesign of the TFB first-stage filter could reduce the net dissipation per filter card.

Note that in the present TFB FIR design, each rising clock edge implies a 32-sample shift of the convolution window. Thus, the decimation by 32 is intrinsic. The convolution window is 128 samples wide, so 32 registers of depth 4 are required. Arithmetic operations are performed with full scale representation. The truncated output is a single 8-bit sample at 125 MHz, which is processed by the second filter stage.
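As an illustration of this intrinsic decimation, the following Python sketch shifts a 128-sample window by 32 samples per clock and produces one output per clock. The Hann-shaped coefficients are a placeholder assumption (the actual TFB coefficients are not given here), and the function name is hypothetical.

```python
import numpy as np

LANES, TAPS = 32, 128  # 32 lines, 128-sample convolution window (from the text)

def decimating_fir_demux(frames, coeffs):
    """One output sample per clock from 32-demultiplexed input frames.

    frames: array of shape (n_clocks, 32), one row per clock cycle.
    coeffs: 128 FIR coefficients (placeholder values in this sketch).
    """
    window = np.zeros(TAPS)  # equivalent to 32 registers of depth 4
    out = []
    for frame in frames:
        # Each clock shifts the window by 32 samples: decimation by 32.
        window = np.concatenate((window[LANES:], frame))
        # Newest sample is window[-1], so y = sum_k h[k] * x[n - k].
        out.append(np.dot(coeffs, window[::-1]))
    return np.array(out)

rng = np.random.default_rng(0)
x = rng.integers(-4, 4, size=LANES * 10)  # 3-bit signed samples
h = np.hanning(TAPS)                      # placeholder low-pass shape
y = decimating_fir_demux(x.reshape(-1, LANES), h)

# Reference: filter at the full rate, then keep every 32nd output.
reference = np.convolve(x, h)[:len(x)][LANES - 1::LANES]
print(np.allclose(y, reference))
```

Only the outputs that survive decimation are ever computed, which is what makes the demultiplexed FIR form workable at 125 MHz.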

The fact that the first filter stage operates as a decimation filter with a large transition region and is followed by a second filter which fixes the final passband leads us to consider a Cascaded Integrator Comb (CIC) filter solution.

## **II.** The CIC Filter Principle

The CIC filter [4] has proved to be an effective element in high decimation or interpolation systems. This kind of filter is usually used to perform the first decimation stage in  $\Sigma \Delta$  analog to digital conversion [5]. It requires no multipliers and uses limited storage, resulting in an economical hardware implementation (low computational complexity). This choice therefore affects the power consumption of the design by reducing both the area and the line switching associated with the computations (in operators such as multipliers).

The z transform of the transfer function is [4]:

$$H(z) = \left(\sum_{k=0}^{D-1} z^{-k}\right)^{N} = \left(\frac{1-z^{-D}}{1-z^{-1}}\right)^{N}$$
(1)

The impulse response of a single stage (N=1) is composed of D coefficients equal to 1, so (1) describes a Finite Impulse Response filter. The second form of (1) is obtained by summing the geometric series and results in a cascade of an Integrator part and a Comb part. Two parameters have to be specified: the decimation factor D and the order of the comb filter N (Figure 3(a)).
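The equivalence of the two forms of (1) can be checked numerically. The sketch below is illustrative only: it builds the impulse response from the sum form, then runs an impulse through N integrators followed by N combs of delay D at the input rate, with no decimation yet. The function names are hypothetical.

```python
import numpy as np

def cic_impulse_response(decimation, order):
    """Sum form of (1): a length-D boxcar convolved with itself N times."""
    h = np.ones(1, dtype=np.int64)
    box = np.ones(decimation, dtype=np.int64)
    for _ in range(order):
        h = np.convolve(h, box)
    return h

def integrator_comb(x, decimation, order):
    """Ratio form of (1) at the input rate: N integrators, then N combs."""
    y = np.asarray(x, dtype=np.int64)
    for _ in range(order):
        y = np.cumsum(y)                   # 1 / (1 - z^-1)
    for _ in range(order):
        delayed = np.concatenate(
            (np.zeros(decimation, dtype=np.int64), y[:-decimation]))
        y = y - delayed                    # 1 - z^-D
    return y

D, N = 8, 2
impulse = np.zeros(40, dtype=np.int64)
impulse[0] = 1
h = cic_impulse_response(D, N)             # triangle 1, 2, ..., 8, ..., 2, 1
response = integrator_comb(impulse, D, N)
print(np.array_equal(response[:len(h)], h), response[len(h):].any())
```

The response is finite and has N(D-1)+1 coefficients even though the integrators alone are unstable, which is the property the recursive structure relies on.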



Figure 3. CIC Magnitude frequency response

Note that increasing the order results in a faster passband drop, which has to be compensated by a  $2^{nd}$  filter. The phase is linear inside each lobe of the magnitude response. For a given pass-band (corresponding to the normalized frequency  $f_{PB}=1/128$  in our case) and different decimation factors D, we get the attenuations shown in Table 1 (with  $f_c=1/D - f_{PB}$ , the frequency where the worst case of aliasing error occurs, Figure 3(b)).

Table 1. Worst case attenuation (at f<sub>c</sub>) for different D, N

|      | N=1    | N=2    | N=3    | N=4     | N=5     |
|------|--------|--------|--------|---------|---------|
| D=32 | -10 dB | -20 dB | -31 dB | -42 dB  | -52 dB  |
| D=16 | -17 dB | -34 dB | -51 dB | -68 dB  | -84 dB  |
| D=8  | -23 dB | -47 dB | -70 dB | -93 dB  | -116 dB |
| D=4  | -28 dB | -58 dB | -86 dB | -115 dB | -144 dB |
| D=2  | -32 dB | -64 dB | -96 dB | -128 dB | -161 dB |
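The entries of Table 1 can be approximated from the magnitude response of (1), normalized to unit gain at DC, evaluated at f<sub>c</sub>. The following Python sketch assumes that normalization and the closed form |sin(πDf) / (D sin(πf))|<sup>N</sup>; the computed values should land close to the table, with small differences that may come from rounding.

```python
import numpy as np

F_PB = 1 / 128  # normalized pass-band edge (from the text)

def cic_gain_db(f, decimation, order):
    """Magnitude of (1) in dB at normalized frequency f, unit gain at DC."""
    mag = np.abs(np.sin(np.pi * decimation * f)
                 / (decimation * np.sin(np.pi * f)))
    return 20 * order * np.log10(mag)

def worst_case_attenuation_db(decimation, order, f_pb=F_PB):
    """Gain at f_c = 1/D - f_PB, where aliasing into the pass-band is worst."""
    return cic_gain_db(1 / decimation - f_pb, decimation, order)

for d in (32, 16, 8, 4, 2):
    row = [round(worst_case_attenuation_db(d, n), 1) for n in range(1, 6)]
    print(f"D={d:2d}", row)
```

With these assumptions, D=8 and N=2 give roughly -47 dB, and D=32 needs N=5 to exceed 47 dB of attenuation.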

For the ALMA application, the first idea was to replace the TFB first stage (FIR filter) by a single CIC filter D=32, due to the required decimation. To achieve the required attenuation (47 dB), the CIC order N must be equal to 5 (cf. table 1).

#### **III. Electronic architecture**

The architecture of the CIC filter has been considered because of its low computational complexity, which suggests a potentially easy implementation. Indeed, no multipliers are required and no coefficient storage is needed. A simple structure can therefore be designed, and we expect a priori a low power dissipation.

The main electronic structures that can be found in the literature have been examined. They include a classical structure, a modified rotated-angle CIC filter structure [5], a delayed CIC filter structure, a sharpened CIC filter structure, a CIC polyphase decomposition, a non-recursive demultiplexed CIC filter structure, and a non-recursive CIC filter structure. Three of them have been compared in more depth because of their potential interest in terms of power dissipation.

#### A Classical structure

The classical way to implement a second-order CIC filter is shown in Figure 4: 2 *integrator* blocks followed by 2 *comb* blocks. To obtain a higher comb filter order, more blocks have to be cascaded. The main drawback of this architecture is the integrator part, whose output grows without bound. Hogenauer described in [4] a method to limit the register growth of the integrator part (and hence also the complexity of the comb part). It relies on two's complement arithmetic, which allows 'wrap-around' between the most positive and most negative values ('wrapped' adders). By applying to each stage the formulae of [4], which are based on statistical calculations for each block, we can determine the number of LSBs (Least Significant Bits) that can be discarded without any precision loss. This method allows us to limit the implementation hardware required for this recursive structure.
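The role of the 'wrapped' adders can be illustrated with a small Python model. It uses the register width N·log<sub>2</sub>(D) + B<sub>in</sub> given in [4] for a differential delay of 1, lets the integrators overflow modulo that width, and checks that the comb outputs are still exact. The function names are hypothetical, the 3-bit input follows the ADC format, and the LSB pruning of [4] is not modelled.

```python
import math
import numpy as np

def cic_register_width(order, decimation, input_bits):
    """Register width from [4] (differential delay 1): N*log2(D) + B_in."""
    return math.ceil(order * math.log2(decimation)) + input_bits

def wrap(value, bits):
    """Wrap integer(s) into the two's complement range of the given width."""
    half = 1 << (bits - 1)
    return (value + half) % (1 << bits) - half

def cic_decimate_wrapped(x, order, decimation, bits):
    """Integrators at the input rate, decimation, combs at the output rate."""
    y = np.asarray(x, dtype=np.int64)
    for _ in range(order):
        acc, out = 0, np.empty_like(y)
        for i, sample in enumerate(y):
            acc = wrap(acc + int(sample), bits)  # overflow is allowed here
            out[i] = acc
        y = out
    y = y[decimation - 1::decimation]            # keep one sample out of D
    for _ in range(order):
        previous = np.concatenate(([0], y[:-1]))
        y = wrap(y - previous, bits)             # comb with unit delay
    return y

D, N, B_IN = 8, 2, 3
rng = np.random.default_rng(0)
x = rng.integers(-4, 4, size=D * 50)             # 3-bit signed samples
bits = cic_register_width(N, D, B_IN)
y = cic_decimate_wrapped(x, N, D, bits)

# Reference: full-precision FIR form of (1), then decimation by D.
box = np.ones(D, dtype=np.int64)
reference = np.convolve(x, np.convolve(box, box))[:len(x)][D - 1::D]
print(bits, np.array_equal(y, reference))
```

For the single-stage case discussed in the text (D=32, N=5, 3-bit input) the same formula gives 28-bit registers, which hints at why the integrator part dominates the resource count.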



Figure 4. Usual CIC implementation

## **B** Non-recursive demultiplexed architecture

Another structure, a non-recursive one, based on a reformulation of (1) was elaborated by G. Comoretto [6].

$$C_D^{(n)} = S_D^{(n)} (1 - z^{-1})^{n-1} + z^{-1} \sum_{i=1}^{n-1} A_i(D) C_D^{(n-i)} (1 - z^{-1})^{i-1}$$
(2)

This architecture is composed of adder blocks and small FIR filters (Figure 5). The input format is a demultiplexed one.



Figure 5. Non-recursive structure schematic

#### **C** Non-recursive CIC solution

Factorisation of the sum in (1) leads to [7]:

$$H(z) = \left(\sum_{k=0}^{D-1} z^{-k}\right)^{N} = \prod_{i=0}^{(\log_2 D)-1} \left(1+z^{-2^{i}}\right)^{N}$$
(3)

with  $D=2^{M}$ , M integer. The structure resulting from this transformation is a non-recursive architecture in which each block is followed by a decimation by 2.
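The factorisation in (3), read with exponents 2<sup>i</sup>, and the resulting stage-by-stage structure can be verified with a short Python sketch. Function names are hypothetical, and the sketch computes every stage output before discarding half of them, whereas a hardware implementation would only compute the retained ones.

```python
import numpy as np

def boxcar_power(decimation, order):
    """Left-hand side of (3): (1 + z^-1 + ... + z^-(D-1))^N."""
    h = np.ones(1, dtype=np.int64)
    for _ in range(order):
        h = np.convolve(h, np.ones(decimation, dtype=np.int64))
    return h

def factored_form(decimation, order):
    """Right-hand side of (3): product of (1 + z^-(2^i))^N, i = 0..log2(D)-1."""
    h = np.ones(1, dtype=np.int64)
    for i in range(decimation.bit_length() - 1):
        factor = np.zeros(2 ** i + 1, dtype=np.int64)
        factor[0] = factor[-1] = 1
        for _ in range(order):
            h = np.convolve(h, factor)
    return h

def nonrecursive_cic(x, decimation, order):
    """Cascade of log2(D) blocks, each (1 + z^-1)^N followed by decimation by 2."""
    stage = np.ones(1, dtype=np.int64)
    for _ in range(order):
        stage = np.convolve(stage, [1, 1])
    y = np.asarray(x, dtype=np.int64)
    for _ in range(decimation.bit_length() - 1):
        y = np.convolve(y, stage)[:len(y)][::2]
    return y

D, N = 8, 2
rng = np.random.default_rng(1)
x = rng.integers(-4, 4, size=D * 20)
staged = nonrecursive_cic(x, D, N)
direct = np.convolve(x, boxcar_power(D, N))[:len(x)][::D]
print(np.array_equal(factored_form(D, N), boxcar_power(D, N)),
      np.array_equal(staged, direct))
```

Because each block runs at half the rate of the previous one, only additions on short words are needed and no integrator feedback appears anywhere in the chain.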

#### IV. Hardware implementation for the ALMA case

For some of the structures described in Section III, design adaptations have to be performed because of the incoming data flow. The structures have been implemented in Altera Stratix II chips, and the resources are specified in terms of ALMs (Adaptive Logic Modules). Each structure is modeled with Matlab and implemented in VHDL code. The Modelsim simulator allows us to validate the design before implementation in the hardware by using test patterns from Matlab.

As explained at the end of Section II, replacing the 1<sup>st</sup> TFB filter stage by a CIC filter leads us to use a CIC with D=32, N=5. A VHDL architecture implementation derived from Figure 4 has been considered. The integrator blocks, with 32-demultiplexed input, are implemented with 32 cascaded adders that are processed in one clock cycle (with a feedback between the 31<sup>st</sup> adder and the first input) and account for the largest part of the resource requirement. The Hogenauer method, which limits the register growth, has been applied to the structure.
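A behavioural Python sketch of one such demultiplexed integrator is given below. It is a model under simple assumptions (unbounded integers, one adder per line, the last adder output fed back to the next clock) and is not derived from the VHDL code; the function name is hypothetical.

```python
import numpy as np

LANES = 32

def demux_integrator(frames):
    """Running sum over 32-demultiplexed input, one frame per clock.

    Within a clock cycle the adders are chained across the lines; the
    output of the last adder is fed back as the start value of the
    next cycle.
    """
    frames = np.asarray(frames, dtype=np.int64)
    out = np.empty_like(frames)
    carry = 0
    for k, frame in enumerate(frames):
        acc = carry
        for lane, sample in enumerate(frame):
            acc = acc + int(sample)  # one adder per line
            out[k, lane] = acc
        carry = acc                  # feedback to the next clock cycle
    return out

rng = np.random.default_rng(2)
x = rng.integers(-4, 4, size=LANES * 5)
integrated = demux_integrator(x.reshape(-1, LANES))
print(np.array_equal(integrated.ravel(), np.cumsum(x)))
```

The chain of 32 dependent additions has to settle within a single 125 MHz clock period, and an N=5 filter needs five such chains, which is consistent with the large resource figure reported for the recursive D=32 solution.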

The structure described in [6] is also implemented in the frame of a single-stage decimation filter.

However, the most interesting results are obtained for multi-stage decimation filters as shown in Figure 6. Such structures allow us to use a smaller CIC decimation factor.





They are composed of a classical CIC filter structure (D=8, N=2) followed by either 2 half-band filters or a quarter-band filter with a large transition bandwidth (Figure 7,  $[f_1, f_2]$  being the transition band and  $[0, f_1]$  the final band obtained after the 2<sup>nd</sup> TFB filter stage). The attenuation of the overall filter is about 47 dB and the overall decimation factor is 32. The choice of a CIC filter with D=8, N=2 is justified by the performances given in Table 2 and by the attenuations of Table 1: this is the lowest filter order coupled with the highest decimation factor that meets the attenuation requirement. The output is truncated to 8 bits by using the Hogenauer method without any significant loss of information. The FIR filters have been synthesized with the Remez algorithm. This results in an FIR half-band filter of order 11 (4 coefficients are equal to 0, Figure 7(a)) and an FIR quarter-band filter of order 16, both with symmetrical impulse responses.
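The zero coefficients of a half-band filter follow from its structure and not from a particular design method. The Python sketch below illustrates this with a Hamming-windowed sinc design, which is a stand-in and not the Remez design used here, and it reads '11' as the number of taps, which is an assumption.

```python
import numpy as np

def halfband_windowed_sinc(num_taps=11):
    """Illustrative half-band low-pass: windowed sinc with cutoff at fs/4.

    This is NOT the Remez design of the paper; it only shows the
    coefficient pattern shared by half-band filters.
    """
    n = np.arange(num_taps) - (num_taps - 1) // 2  # -5 .. 5 for 11 taps
    h = 0.5 * np.sinc(n / 2) * np.hamming(num_taps)
    h[np.isclose(h, 0.0, atol=1e-12)] = 0.0        # clean numerical zeros
    return h

h = halfband_windowed_sinc()
zero_taps = int(np.count_nonzero(h == 0))
symmetric = bool(np.allclose(h, h[::-1]))
print(np.round(h, 4), zero_taps, symmetric)
```

With 11 taps, the even-indexed taps around the centre vanish (4 zeros) and the remaining side taps are pairwise equal, so only three distinct side coefficients plus the centre tap need multipliers.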



The half-band filter described above is used twice, with two different structures imposed by the incoming data flow: a 4-line input in Figure 8(a) and a 2-line input in Figure 8(b). The structure of the quarter-band filter is similar to the one used for the  $1^{st}$  TFB FIR stage. The output of the last filter is truncated to 8 bits to fit the input range of the second TFB filter stage.



Figure 8. Implementation of the 2 half-band filters

Use of the non-recursive CIC structure (cf. Figure 9, coupled with half-band filters or the quarter-band filter) provides an alternative to the structure described above. Arithmetic operations are performed with full scale representation. Each block is followed by a decimation by 2, which allows us to suppress one addition out of every two at the block output. This simplification results in a resource optimization. No truncation is used in the non-recursive CIC filter, and the output of the quarter-band filter is truncated to 8 bits to fit the 2<sup>nd</sup> TFB filter stage format.



Figure 9. CIC non-recursive implementation

The optimal solution is achieved with the CIC-Quarter-band filter design. This solution has been implemented in the ALMA TFB chip to check the power consumption improvement. Table 2 summarizes our results. The original filter design resulted in a dissipation of 78 W per card, while the new design dissipates a total of 60 W. This allows us to bring the junction temperature below the recommended maximum temperature, thus enhancing the long-term reliability of the FPGA chips.

Table 2. Summary of the different studied solutions

| Solution                                    | Resources (Stratix II) | Max. Frequency |
|---------------------------------------------|------------------------|----------------|
| TFB 1<sup>st</sup> stage (original design)  | 1775 ALMs              | 180 MHz        |
| Recursive solution (D=32, N=5)              | 6500 ALMs              | 140 MHz        |
| Comoretto's solution [6]                    | 1900 ALMs              | 128 MHz        |
| CIC (D=8, N=2) 2 HBs                        | 1700 ALMs              | 130 MHz        |
| CIC (D=8, N=2) non-rec 2 HBs                | 750 ALMs               | 200 MHz        |
| CIC (D=8, N=2) QB                           | 1440 ALMs              | 130 MHz        |
| CIC (D=8, N=2) non-rec QB                   | 630 ALMs               | 200 MHz        |

# V. Conclusion

Several architectures based on the CIC filter have been considered to optimize the ALMA TFB filter power consumption. The main problem encountered is the input data flow that is not adapted to the classical CIC filter structure. We have shown that a non-recursive CIC structure followed by a quarter band filter can optimize the overall dissipation by making optimum use of FPGA ALM resources.

## **VI.** Acknowledgements

This study was supported by the ALMA European Correlator project team and by the University of Bordeaux (OASU, LAB and IMS).

#### References

[1] R. Escoffier, J. Webber and A. Baudry, "64 Antenna Correlator Specifications and Requirements", ALMA System Document, 2005. http://edm.alma.cl/forums/alma/dispatch.cgi/documents/showFile/100591/d20050708085722/No/ALMA-60.00.00.00-001-B-SPE.pdf

[2] B. Quertier, G. Comoretto et al, "Enhancing the Baseline ALMA Correlator Performances with the Second Generation Correlator Digital Filter System", ALMA Memo, n°476, 2003

[3] C. Recoquillon, A. Baudry et al, "The ALMA 3-bit 4 Gsample/s, 2-4 GHz Input Bandwidth, Flash Analog-to-Digital Converter", ALMA Memo, n°532, 2005.

[4] Eugene B. Hogenauer, "An Economical Class of Digital Filters for Decimation and Interpolation", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-29, pp. 155-162, 1981.

[5] F. Daneshgaran, M. Laddomada, "A Novel Class of Decimation Filters for ΣΔ A/D Converters", Wireless Communications and Mobile Computing, vol. 2, pp. 867-882, 2002.

[6] G. Comoretto, "Notes on the implementation of a time demultiplexed comb filter", http://www.obs.ubordeaux1.fr/electronique/Publications/Comoretto.pdf

[7] Y. Gao, L. Jia et al, "A Comparison Design of Comb Decimators for Sigma-Delta ADCs", Analog Integrated Circuits and Signal Processing, n°22, pp. 51-60, 1999.