# A Vision System using a 3-Dimensional Integration with Local and Global Wireless Interconnections

Seiji Kameda, Nobuo Sasaki, Hiroshi Ando, Daisuke Arizono, Kentaro Kimoto, Masaki Odahara, Mamoru Sasaki, Takamaro Kikkawa and Atsushi Iwata

Hiroshima University, 1-3-1 Kagamiyama, Higashi-Hiroshima, Hiroshima 739-8530, Japan; Phone +81-824-22-7358, Fax +81-824-22-7358, E-mail {kameda, iwa}@dsl.hiroshima-u.ac.jp

## 1 Introduction

The vertebrate visual system processes a huge amount of visual information in real time using massively parallel neural networks arranged hierarchically. Vision chips with massively parallel structures have been developed [1], inspired by this unique architecture and its algorithms. Multi-chip vision systems were subsequently developed to improve resolution and processing functions by configuring multiple chips hierarchically [2]. However, a multi-chip system cannot maintain the processing speed of each chip if the inter-chip communication cannot handle the large amount of processed information. This is called the 'IO bottleneck'. To overcome this problem, we have proposed a 3-dimensional custom stack system (3DCSS) that integrates two types of wireless interconnection, LWI and GWI [3]. The local wireless interconnection (LWI) uses on-chip inductive coupling to transfer local data in parallel between adjacent chips, so it can handle huge amounts of visual information [4]. The global wireless interconnection (GWI) uses electromagnetic wave transmission with on-chip antennas to transfer global information to all chips [5]. In the present study, we applied the 3DCSS to a vision system.

## 2 Visual processing chip for 3DCSS

The vision system is built from multiple copies of the visual processing chip developed for the 3DCSS (VP3D chip). As shown in the block diagram in Fig. 1, the VP3D chip consists of the following blocks:

- a cell array;
- a pulse width modulator (PWM) and demodulator (PWD);
- LWI transmitters and receivers;
- GWI receivers.

The cell array is composed of 80 x 80 pixel circuits. Fig. 2(a) shows the circuit design of a single pixel of the VP3D chip. Each pixel consists of an analog memory, a resistive network and a sample/hold circuit (S/H) [2]. The resistive network is built from MOS resistors [2] and smooths the analog input image. Fig. 2(b) shows the detailed design of the MOS resistor. Its linearity is better than that of the MOS resistor in [2] because a PMOS and an NMOS resistive element are connected in parallel. External bias voltages control the smoothing area of the resistive network. The S/H can serve as a subtractor and compensate for circuit offsets [2]. The PWM and PWD convert image data between the analog signals of the cell array and the pulse-width signals transferred through the LWI. One PWM and one PWD are placed at every 4 columns of the cell array, so 20 of each operate in parallel.

Figure 1: A block diagram of the visual processing chip for 3DCSS (VP3D chip).

Fig. 3 shows the circuit designs of the PWM and PWD circuits. The PWD consists of PMOS and NMOS switched current sources (SCSs) and an integration capacitor [6]. A pulse-width signal from the LWI receiver drives a switched current source. The source converts the signal into a current pulse, which is integrated on the capacitor, so the output voltage is proportional to the input pulse width. The PWD circuit can also serve as a weighted adder-subtractor. This works because each input can be connected to either the PMOS or the NMOS SCSs, and their current values are controllable. Mismatches between the SCSs can be compensated to 8-bit accuracy by a binary search. The PWM circuit consists of a clocked CMOS comparator capable of offset compensation [6]. The PWM outputs a pulse to the LWI transmitter until a ramp voltage, Vrmp, becomes equal to the input voltage from the pixel circuit. Thus, the output pulse width is proportional to the pixel output voltage.
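The conversion chain described above can be illustrated with a short behavioural model. This is a sketch under stated assumptions, not the circuit of [6]. The 1.0 us maximum pulse width, the 1.0 V maximum input and the 8-bit resolution come from Table 1. The following are hypothetical:

- the comparator clock period `T_STEP`;
- the unit current `I_UNIT` and the capacitance `C_INT`;
- the trim-code model used with `calibrate`.

The PWD integrates each current pulse as dV = s * I * t / C. Routing pulses to the PMOS source (s = +1) or the NMOS source (s = -1) with different currents therefore produces a weighted sum or difference.

```python
from typing import Callable, Sequence

# Limits from Table 1 (note *3) and the 8-bit resolution row.
T_MAX = 1.0e-6                 # maximum pulse width [s]
V_MAX = 1.0                    # maximum input voltage magnitude [V]
N_BITS = 8                     # PWM/PWD resolution [bit]
T_STEP = T_MAX / 2**N_BITS     # hypothetical comparator clock period [s]


def pwm(v_in: float) -> float:
    """Pulse width for pixel voltage v_in.

    The pulse stays high while the ramp Vrmp (0 -> V_MAX over T_MAX) is below
    v_in; the clocked comparator makes one decision per T_STEP.
    """
    k = 0
    while k < 2**N_BITS and V_MAX * k / 2**N_BITS < v_in:
        k += 1
    return k * T_STEP


def pwd(widths: Sequence[float], currents: Sequence[float],
        signs: Sequence[int], c_int: float) -> float:
    """Integrate switched-current pulses on a capacitor.

    sign +1 routes a pulse to a PMOS source (charging), -1 to an NMOS source
    (discharging); different currents act as weights, so the PWD behaves as
    a weighted adder-subtractor.
    """
    v = 0.0
    for t, i, s in zip(widths, currents, signs):
        v += s * i * t / c_int
    return v


def calibrate(error: Callable[[int], float], n_bits: int = N_BITS) -> int:
    """Binary (successive-approximation) search for an n-bit trim code.

    error(code) is the current error for a trim code and is assumed to
    increase monotonically with the code; each bit is kept only if the
    trial code does not overshoot.
    """
    code = 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)
        if error(trial) <= 0.0:
            code = trial
    return code


I_UNIT, C_INT = 1.0e-6, 1.0e-12   # hypothetical: 1 uA into 1 pF gives 1 V per 1 us
# Weighted add/subtract of two pixel values: about 0.5 V - 0.25 V.
v_out = pwd([pwm(0.5), pwm(0.25)], [I_UNIT, I_UNIT], [+1, -1], C_INT)

LSB = 0.1e-6 / 2**N_BITS          # hypothetical trim step [A]
# Hypothetical mismatch: the NMOS source is 5 % weaker than the PMOS source.
code = calibrate(lambda c: (0.95e-6 + c * LSB) - I_UNIT)
```

The model assumes the trim is a monotonic function of the code, because that is what makes the bit-by-bit binary search valid. A non-monotonic trim DAC would need a different search.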

The pulse-width image data are transferred from the LWI transmitters of one chip to the LWI receivers of the adjacent chip. The transfer uses inductive coupling between two spiral inductors, as shown in Fig. 4 [4]. There are as many LWI transmitters as PWMs, and as many LWI receivers as PWDs. The LWI employs dynamic circuits with a self-precharge technique, so it realizes asynchronous communication and can regenerate a transmitted pulse without any clocking scheme [4]. To distinguish the rising and falling edges of the pulse-width signal, the transmitter has an encoder and the receiver has a decoder. The two spiral inductors can be separated by up to about 150 um. Simulated waveforms showed a maximum transfer rate of 250 Mbps per cell for the LWI circuits. This rate is limited by the decoding time and is sufficient for the image data transfer.

Figure 2: (a) A circuit design depicting the single pixel of the VP3D chip. (b) Detailed design of the CMOS resistor.

Figure 3: Circuit designs depicting the PWM and PWD circuits.
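The role of the edge encoder and decoder can be illustrated with a behavioural model. An inductive link couples only changes in current. A pulse-width signal therefore reaches the receiver as a pair of opposite-polarity edge pulses, and the decoder must rebuild the pulse from them. The sketch makes these assumptions:

- the input is a sampled 0/1 waveform;
- each coupled edge is represented as +1 or -1;
- a set/reset latch acts as the decoder.

The actual encoder and decoder circuits of [4] may differ, and the function names are hypothetical.

```python
from typing import List


def encode_edges(signal: List[int]) -> List[int]:
    """Turn a 0/1 pulse-width waveform into signed edge events.

    +1 marks a rising edge, -1 a falling edge and 0 no transition, standing in
    for the opposite-polarity pulses an inductive link produces. The line is
    assumed to idle low before the first sample.
    """
    events = []
    prev = 0
    for level in signal:
        events.append(level - prev)
        prev = level
    return events


def decode_edges(events: List[int]) -> List[int]:
    """Rebuild the waveform with a set/reset latch: +1 sets, -1 resets."""
    out = []
    state = 0
    for e in events:
        if e > 0:
            state = 1
        elif e < 0:
            state = 0
        out.append(state)
    return out


pulse = [0, 0, 1, 1, 1, 0, 0]          # a pulse three samples wide
events = encode_edges(pulse)           # [0, 0, 1, 0, 0, -1, 0]
assert decode_edges(events) == pulse   # width recovered from the two edges
```

The decoder recovers the pulse width from the two edge events alone. This is why separating positive and negative edges is enough to carry the pulse-width image data.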

The VP3D chip has two GWI channels. The 1st channel transfers the instruction code and the 2nd channel transfers a global clock. Fig. 5 shows the circuit design of the GWI receiver, which consists of a dipole antenna and a receiver circuit. The receiver picks up an amplitude-shift-keyed (ASK) sinusoidal microwave signal and demodulates it into the clock signal and the instruction codes for each chip. The instruction code is loaded into an instruction register in sync with the global clock. Each chip has a unique ID, and the instruction code includes this ID, so each chip can receive a different instruction.

Figure 4: A circuit design depicting the LWI circuit.

Figure 5: A circuit design depicting the GWI receiver.
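The ID-based addressing can be sketched as a shift register. The register is clocked by the global clock from the 2nd GWI channel and fed with the bits demodulated on the 1st channel. The following are all hypothetical, because the paper does not specify the instruction format:

- the 16-bit word layout: a 4-bit chip ID followed by a 12-bit instruction;
- the class `InstructionRegister`;
- the helper `to_bits`.

```python
from typing import List, Optional

ID_BITS = 4                      # hypothetical chip-ID field width
OP_BITS = 12                     # hypothetical instruction field width
WORD_BITS = ID_BITS + OP_BITS


class InstructionRegister:
    """Shift register clocked by the GWI global clock (behavioural sketch)."""

    def __init__(self, chip_id: int) -> None:
        self.chip_id = chip_id
        self.word = 0
        self.count = 0

    def clock(self, bit: int) -> Optional[int]:
        """Shift in one demodulated bit; return the instruction when a full
        word carrying this chip's ID has arrived, otherwise None."""
        self.word = ((self.word << 1) | bit) & ((1 << WORD_BITS) - 1)
        self.count += 1
        if self.count < WORD_BITS:
            return None
        self.count = 0
        if self.word >> OP_BITS == self.chip_id:
            return self.word & ((1 << OP_BITS) - 1)
        return None


def to_bits(word: int) -> List[int]:
    """Serialise a word MSB first, as it would be sent over the GWI."""
    return [(word >> i) & 1 for i in reversed(range(WORD_BITS))]


# Every chip receives the same broadcast; only chip 2 accepts the instruction.
chips = [InstructionRegister(chip_id) for chip_id in (1, 2)]
stream = to_bits((2 << OP_BITS) | 0xABC)
results = [[chip.clock(b) for b in stream][-1] for chip in chips]
print(results)   # [None, 2748]
```

Because every chip receives the same broadcast, the addressing cost is a comparison of the ID field on each chip. No per-chip wiring is needed.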

## 3 The VP3D chip developed with a 0.35 um CMOS process

The VP3D chip was developed with a 0.35 um, 3-metal CMOS process, and the die size was $9.8 \times 9.8 \text{ mm}^2$. The specifications of the VP3D chip are shown in Table 1. We then built a 3DCSS test module using the VP3D chips. Fig. 6 shows a cross-section and a photograph of the 3DCSS test module together with the VP3D chip. A flexible printed circuit (FPC) was used for wiring the power supplies, input signals and test pins. The upper VP3D chip was thinned to reduce its distance to the lower chip. The pads of each chip were bonded to the FPC by ultrasonic flip-chip bonding. The two chips were then stacked and connected by contact pins on a printed circuit board. The thinned chip was 90 um thick, the FPC was 40 um thick and the Au bumps were 30 um high. As a result, the distance between the two spiral inductors became about 160 um.
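The inductor separation can be written as a simple stack-up. The text implies that it is the sum of the thinned-chip thickness, the FPC thickness and the Au bump height. The symbols below are introduced here for illustration. The result exceeds the separation of about 150 um that the LWI was designed for:

$$
d \approx t_{\mathrm{chip}} + t_{\mathrm{FPC}} + h_{\mathrm{bump}} = 90\,\mu\mathrm{m} + 40\,\mu\mathrm{m} + 30\,\mu\mathrm{m} = 160\,\mu\mathrm{m}
$$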

### 3.1 Experimental results

Fig. 7 shows the measured transmission characteristics of the meander dipole antenna for GWI. A 2 GHz, 500 mV sinusoidal signal is attenuated to about 6 mV. This amplitude is sufficient for the GWI circuit to demodulate it into a 3.3 V peak-to-peak, 100 MHz signal.

Fig. 8 shows the transmitted and received waveforms of a single LWI pair. This pair could be measured independently of the other circuits of the test module. It transmitted 250 Mbps pseudo-random data with a delay of 8 ns.

Fig. 9 shows the responses obtained from the test module. An input image was fed serially into the sender chip (the lower chip). The center images in Fig. 9 show the outputs of the sender chip. The upper image reproduces the input image. The bottom image, obtained with the resistive network activated, is strongly blurred. These images were transmitted line-parallel through the LWI circuits to the receiver chip (the upper chip). The right images in Fig. 9 show the outputs of the receiver chip. Unfortunately, these outputs contained correctly transmitted parts as well as some incorrect parts. When the LWI bias voltages vb1 and vb2 (Fig. 4) were changed, the incorrect parts showed correct responses. The likely cause is that the distance between the two spiral inductors in the test module (about 160 um) exceeded the design value (about 150 um).

Figure 6: 3DCSS test module using a flexible printed circuit: (a) cross-section, (b) photograph.

Figure 7: Measured transmission characteristics of the dipole antenna: (a) experimental setup, (b) characteristics.
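The blurring in the center images of Fig. 9 can be illustrated with a minimal model of the resistive network. The model is a sketch with these assumptions:

- it uses a 1-D line of nodes rather than the 80 x 80 array;
- the MOS resistors are treated as linear lateral conductances `g_lat`;
- each node is tied to its input through a hypothetical conductance `g_in`.

The conductance values are illustrative only. The ratio g_lat/g_in stands in for the external bias voltages that control the smoothing area.

```python
from typing import List


def smooth_1d(v_in: List[float], g_in: float, g_lat: float,
              tol: float = 1e-12, max_iter: int = 100_000) -> List[float]:
    """Steady-state node voltages of a 1-D resistive network.

    Node i is tied to its input v_in[i] through g_in and to its neighbours
    through g_lat. Kirchhoff's current law at node i,
        g_in * (v_in[i] - v[i]) + g_lat * sum(v[j] - v[i]) = 0,
    is solved by Gauss-Seidel iteration; the ends of the line are open.
    """
    n = len(v_in)
    v = list(v_in)
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(n):
            nbrs = [v[j] for j in (i - 1, i + 1) if 0 <= j < n]
            new = (g_in * v_in[i] + g_lat * sum(nbrs)) / (g_in + g_lat * len(nbrs))
            max_change = max(max_change, abs(new - v[i]))
            v[i] = new
        if max_change < tol:
            break
    return v


# A bright point on a dark line: a larger g_lat / g_in widens the blur.
impulse = [0.0] * 21
impulse[10] = 1.0
narrow = smooth_1d(impulse, g_in=1.0, g_lat=0.5)
wide = smooth_1d(impulse, g_in=1.0, g_lat=10.0)
print(round(narrow[10], 3), round(wide[10], 3))
```

Lateral currents only move charge between nodes. With open ends, the steady-state outputs therefore sum to the same total as the inputs, and a larger ratio simply spreads that total over more nodes.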

Table 1: Specifications of the VP3D chip developed with a 0.35 um CMOS process.

| Item                                   | Specification                                  |
|----------------------------------------|------------------------------------------------|
| Process                                | CMOS 0.35 um, 2 poly, 3 metal                  |
| Die size                               | $9.8 \times 9.8$ [mm$^2$]                      |
| Number of pixels                       | 80 (H) x 80 (V) [pixel]                        |
| Number of LWIs                         | 20 x 2 [cell]                                  |
| Power supply                           | 3.3 [V]                                        |
| **GWI receiver circuit**               |                                                |
| Antenna length                         | 8010 [um] (meander type)                       |
| Size<sup>*1</sup>                      | $858.1\,(H) \times 820.0\,(V)$ [um$^2$]        |
| Power consumption                      | 40 [mW]                                        |
| **LWI transmitter & receiver circuit** |                                                |
| Size                                   | $1100.0\,(H) \times 400.0\,(V)$ [um$^2$]       |
| Size (inductor)                        | $300.0\,(H) \times 300.0\,(V)$ [um$^2$]        |
| Number of turns                        | 6 (top metal: 3, 2nd metal: 3)                 |
| Power consumption<sup>*2</sup>         | 5.6 [mW/cell] (transmitter + receiver)         |
| **PWM & PWD circuit**                  |                                                |
| Size                                   | $332.4\,(H) \times 344.8\,(V)$ [um$^2$]        |
| Resolution                             | 8 [bit]                                        |
| Power consumption<sup>*3</sup> (PWM)   | 0.66 [mW/cell]                                 |
| Power consumption<sup>*3</sup> (PWD)   | 1.70 [mW/cell]                                 |
| **Pixel circuit**                      |                                                |
| Size                                   | $77.1\,(H) \times 86.2\,(V)$ [um$^2$]          |
| Output accuracy<sup>*4</sup>           | 6-8 [bit]                                      |
| Power consumption (processing)         | 64.7 [uW/pixel]                                |
| Power consumption (readout)            | 78.9 [uW/pixel]                                |

\*1 Except antenna and wire areas. \*2 Pulse cycle = 2.0 [us], pulse width = 1.0 [us]. \*3 Maximum pulse width = 1.0 [us], maximum voltage magnitude = 1.0 [V]. \*4 Depends on the input image, the resistive network, etc.



Figure 8: Measured waveforms of the LWI pair of the test module: (a) transmitted data, (b) received data.



Figure 9: Responses obtained from test module.

## 4 The VP3D chip redeveloped with a 0.18 um CMOS process

To eliminate the problem of the VP3D chip developed with the 0.35 um CMOS process, the visual processing chip for 3DCSS was redeveloped with a 0.18 um, 6-metal CMOS process (re-VP3D chip). The block diagram of the re-VP3D chip is almost the same as that in Fig. 1. The cell array is composed of 84 x 84 pixel circuits. The PWM and PWD circuits and the LWI transmitter and receiver circuits are each placed at every 4 columns of the cell array, so 21 of each operate in parallel. The circuit designs of the pixel circuit, the PWM and PWD circuits, and the LWI transmitter and receiver circuits are the same as those shown in Fig. 2(a), Fig. 3 and Fig. 4, respectively. The circuit design of the GWI receiver was changed as shown in Fig. 10 [7]. The new receiver picks up an on-off-keying (OOK) modulated Gaussian monocycle pulse and demodulates it into the clock signal and the instruction codes for each chip. The specifications of the re-VP3D chip are shown in Table 2. Because of the 0.18 um CMOS process, the die size of the re-VP3D chip is only $5.0 \times 5.0 \text{ mm}^2$. Its performance and number of circuit elements are nevertheless almost the same as those of the VP3D chip. At around the same time, we developed two other chips, a Detection/Recognition chip and a Reference Memory chip [8]. The re-VP3D chip serves as the visual preprocessor of a multi-object recognition system made up of these chips.
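The OOK signalling can be illustrated with a short simulation. A Gaussian monocycle (the first time-derivative of a Gaussian) is sent in a bit slot for a 1 and nothing is sent for a 0. The receiver decides each bit from the energy in its slot. Energy detection is an assumption made for this sketch, and the actual receiver of Fig. 10 [7] may demodulate differently. The pulse parameter `TAU`, the slot length and the threshold are hypothetical values.

```python
import math
from typing import List

TAU = 50e-12          # hypothetical pulse-shape parameter [s]
DT = 5e-12            # simulation time step [s]
SAMPLES_PER_BIT = 40  # hypothetical 200 ps bit slot


def monocycle(t: float, tau: float = TAU) -> float:
    """Gaussian monocycle: derivative of exp(-t^2 / 2 tau^2), unnormalised."""
    return -(t / tau) * math.exp(-0.5 * (t / tau) ** 2)


def ook_modulate(bits: List[int]) -> List[float]:
    """On-off keying: a monocycle centred in the slot for 1, silence for 0."""
    wave = []
    for b in bits:
        for k in range(SAMPLES_PER_BIT):
            t = (k - SAMPLES_PER_BIT // 2) * DT
            wave.append(monocycle(t) if b else 0.0)
    return wave


def ook_demodulate(wave: List[float], threshold: float) -> List[int]:
    """Energy detection: square and sum each slot, then compare with a threshold."""
    bits = []
    for start in range(0, len(wave), SAMPLES_PER_BIT):
        energy = sum(x * x for x in wave[start:start + SAMPLES_PER_BIT])
        bits.append(1 if energy > threshold else 0)
    return bits


tx = [1, 0, 1, 1, 0, 0, 1]
rx = ook_demodulate(ook_modulate(tx), threshold=1.0)
assert rx == tx
```

An energy detector ignores the sign of the monocycle. The receiver therefore needs no phase reference, which suits a simple clock-and-instruction broadcast.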

We then rebuilt the 3DCSS test module with the re-VP3D chips in almost the same way as shown in Fig. 6. The thinned re-VP3D chip was about 50 um thick, because its small die size makes thinning easier. In addition, the thickness of the FPC and the height of the Au bumps were reduced to 30 um and 20 um, respectively. The distance between the two spiral inductors thus became about 100 um, which is below the design value. Experiments to verify system operation and to evaluate system performance are in progress.
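The text implies that the inductor separation is the sum of the thinned-chip thickness, the FPC thickness and the Au bump height. The symbols below are introduced here for illustration. Under that assumption, the reduction can be checked directly, and the new value is well within the roughly 150 um separation the LWI was designed for:

$$
d \approx t_{\mathrm{chip}} + t_{\mathrm{FPC}} + h_{\mathrm{bump}} = 50\,\mu\mathrm{m} + 30\,\mu\mathrm{m} + 20\,\mu\mathrm{m} = 100\,\mu\mathrm{m}
$$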



Figure 10: A circuit design depicting the GWI receiver of the re-VP3D chip: (a) whole circuit, (b) GWI receiver.

## 5 Conclusions

We developed a visual processing chip with LWI and GWI in a 0.35 um CMOS process and assembled the chips into a 3DCSS test module. We measured the transmission characteristics of the meander dipole antenna on the VP3D chip. Using the 3DCSS test module, we confirmed the operation of the LWI and the cell array. We then redeveloped the visual processing chip with a 0.18 um CMOS process. Verification experiments on the test module using the redeveloped chips are currently under way.

Table 2: Specifications of the re-VP3D chip redeveloped with a 0.18 um CMOS process.

| Item                                   | Specification                                        |
|----------------------------------------|------------------------------------------------------|
| Process                                | CMOS 0.18 um, 1 poly, 6 metal                        |
| Die size                               | $5.0 \times 5.0$ [mm$^2$]                            |
| Number of pixels                       | 84 (H) x 84 (V) [pixel]                              |
| Number of LWIs                         | 21 x 2 [cell]                                        |
| **GWI receiver circuit**               |                                                      |
| Antenna length                         | 3990 [um]                                            |
| Size<sup>*1</sup>                      | $872.9\,(H) \times 409.6\,(V)$ [um$^2$]              |
| Power consumption                      | 54 [mW]                                              |
| Power supply                           | 1.8 [V]                                              |
| **LWI transmitter & receiver circuit** |                                                      |
| Size                                   | $552.8\,(H) \times 190.0\,(V)$ [um$^2$]              |
| Size (inductor)                        | $200.0\,(H) \times 180.0\,(V)$ [um$^2$]              |
| Number of turns                        | 9 (top to 3rd metals: 2 each, 2nd metal: 1)          |
| Power consumption<sup>*2</sup>         | 1.59 [mW/cell] (transmitter + receiver)              |
| Power supply                           | 1.8 [V]                                              |
| **PWM & PWD circuit**                  |                                                      |
| Size                                   | $284.6\,(H) \times 130.4\,(V)$ [um$^2$]              |
| Resolution                             | 8 [bit]                                              |
| Power consumption<sup>*3</sup>         | 262.6 [uW/cell]                                      |
| Power supply                           | 3.3 [V]                                              |
| **Pixel circuit**                      |                                                      |
| Size                                   | $32.6\,(H) \times 29.1\,(V)$ [um$^2$]                |
| Output accuracy<sup>*4</sup>           | 6-8 [bit]                                            |
| Power consumption (processing)         | 81.5 [uW/pixel]                                      |
| Power consumption (readout)            | 56.1 [uW/pixel]                                      |
| Power supply                           | 3.3 [V]                                              |

\*1 Except antenna and wire areas. \*2 Pulse cycle = 1.5 [us], pulse width = 1.0 [us]. \*3 Maximum pulse width = 1.0 [us], maximum voltage magnitude = 1.0 [V]. \*4 Depends on the input image, the resistive network, etc.

## References

- [1] A. Moini, ed., *Vision Chips*, Kluwer Academic Publishers, 2000.
- [2] S. Kameda and T. Yagi, *IEEE Trans. Neural Networks*, vol.17, no.1, 2006, pp.197-210.
- [3] A. Iwata, et al., ISSCC Dig. Tech. Papers, 2005, pp.262-263.
- [4] M. Sasaki and A. Iwata, VLSI Symp. Dig. Tech. Papers, 2005, pp.348-351.
- [5] A.B.M.H. Rashid, S. Watanabe and T. Kikkawa, *IEEE ED Letters*, vol.23, no.12, 2002, pp.731-733.
- [6] A. Iwata, T. Morie and M. Nagata, *IEICE Trans. Fundamentals*, vol.E84-A, no.2, 2001, pp.486-496.
- [7] N. Sasaki, et al., Proc. of the 5th Hiroshima International Workshop on Nanoelectronics for tera bit Information Processing, 2007.
- [8] H. Ando, et al., Proc. of the 5th Hiroshima International Workshop on Nanoelectronics for tera bit Information Processing, 2007.