## Functional-Memory Architectures for Information Processing Systems

H. J. Mattausch, T. Koide, M. A. Abedin, K. Johguchi

Hiroshima University: Research Center for Nanodevices and Systems; Graduate School of Advanced Sciences of Matter

## Outline

- 1. Information-Processing Problems from the Memory Point of View
  - 1.1. Access Bandwidth of the Memory
  - 1.2. Separation between Memory and Processing Unit
- 2. Improved Memory Access Bandwidth by a Larger Number of Access Ports
  - 2.1. Efficient Multi-Port Memory Architectures
  - 2.2. Design Examples for Different Applications
- 3. Unification of Processing Unit and Memory for Pattern Matching






































|                         | Multi-bank register file (HMA)               | Conventional multi-port-cell register file | Multi-bank register file (HMA), estimated       | Conventional multi-port-cell register file (ISSCC2002) |
|-------------------------|----------------------------------------------|--------------------------------------------|-------------------------------------------------|--------------------------------------------------------|
| Technology              | 200 nm L<sub>gate</sub>, 5-metal CMOS        | 200 nm L<sub>gate</sub>, 5-metal CMOS      | 110 nm L<sub>gate</sub>, 5-metal CMOS           | 110 nm L<sub>gate</sub>, 4-metal CMOS                  |
| Supply voltage          | 1.8 V                                        | 1.8 V                                      | 1.2 V                                           | 1.2 V                                                  |
| Access ports            | 12 (8r, 4w)                                  | 12 (8r, 4w)                                | 16 (10r, 6w)                                    | 16 (10r, 6w)                                           |
| Registers               | 128                                          | 128                                        | 34                                              | 34                                                     |
| Word length             | 32 bit                                       | 32 bit                                     | 64 bit                                          | 64 bit                                                 |
| Core area               | 0.39 mm<sup>2</sup>                          | 1.43 mm<sup>2</sup>                        | 0.21 mm<sup>2</sup>                             | 0.5 mm<sup>2</sup>                                     |
| Max operation frequency | 640 MHz (simulated)<br>417 MHz (measured)    | 330 MHz (simulated)                        | 1140 MHz (from sim.)<br>746 MHz (from meas.)    | 545 MHz (measured)                                     |
| Power dissipation       | 210 mW @ 500 MHz (simulated)                 | 105 mW @ 330 MHz (simulated)               | 106 mW @ 500 MHz                                | 220 mW @ 500 MHz                                       |
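To relate the table to the access-bandwidth theme of the outline, the figures above can be turned into a rough peak-bandwidth estimate. The sketch below is not from the slides: it assumes every access port transfers one full word per clock cycle at the maximum operation frequency, and the names `DESIGNS` and `peak_bandwidth_gbit_s` are hypothetical.

```python
# Table figures: (access ports, word length in bits, max frequency in MHz, core area in mm^2).
# ASSUMPTION (not stated in the slides): every port moves one word per clock cycle.
DESIGNS = {
    "HMA, 200 nm (measured)": (12, 32, 417, 0.39),
    "Conventional, 200 nm (simulated)": (12, 32, 330, 1.43),
    "HMA, 110 nm (estimated from measurement)": (16, 64, 746, 0.21),
    "Conventional, 110 nm (ISSCC2002, measured)": (16, 64, 545, 0.5),
}


def peak_bandwidth_gbit_s(ports, word_bits, freq_mhz):
    """Upper bound on access bandwidth in Gbit/s: ports x word bits x clock rate."""
    return ports * word_bits * freq_mhz / 1000.0


for name, (ports, word_bits, freq_mhz, area_mm2) in DESIGNS.items():
    bandwidth = peak_bandwidth_gbit_s(ports, word_bits, freq_mhz)
    print(f"{name}: {bandwidth:.1f} Gbit/s peak in {area_mm2} mm^2")
```

Under this assumption the measured 200 nm HMA design reaches about 160 Gbit/s, and the 110 nm HMA estimate reaches roughly 764 Gbit/s. These are upper bounds that ignore bank-access conflicts.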

th International Workshop, 21st Century COE Program on Nanoelectronics for Terabit Information Processing, January 2007













## Memory-Field Construction for Hamming Distance
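As a point of reference for what a Hamming-distance memory field computes, the following is a minimal software sketch of a nearest-match search over stored reference words. It is a sequential model only; it says nothing about the circuit itself. The function names, the 8-bit example words, and the lowest-index tie-breaking rule are assumptions made for illustration.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which the two words differ."""
    return bin(a ^ b).count("1")


def nearest_match(search_word, reference_words):
    """Return (row index, distance) of the stored word closest to search_word.

    ASSUMPTION: ties are resolved in favour of the lowest row index.
    """
    distances = [hamming_distance(search_word, ref) for ref in reference_words]
    winner = min(range(len(distances)), key=distances.__getitem__)
    return winner, distances[winner]


# Hypothetical 8-bit reference words, one per memory row.
references = [0b10110010, 0b01101100, 0b10100000]
row, distance = nearest_match(0b10110011, references)
print(f"winner row {row}, Hamming distance {distance}")
```

In this model the distances are evaluated one row after another, whereas a memory field with integrated comparison would evaluate all rows at once.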













## 64 Pattern Euclidean Distance Search Example

Chip layout (figure), labeled blocks: search data input (5-bit × 16); memory field (64 rows, 16 5-bit binaries per row); WLA + WTA; row decoder; output selector; column decoder + read/write. Dimension label: 2.56 mm.

| Parameter                       | Value                                        |
|---------------------------------|----------------------------------------------|
| Distance measure                | Euclidean distance                           |
| Reference patterns              | 64 patterns (16 binaries, each 5 bits long)  |
| Design area                     | 5.12 mm<sup>2</sup> (2.56 mm × 2 mm)         |
| Nearest-match unit area         | 0.53 mm<sup>2</sup> = 11.1% of design area   |
| Nearest-match time (simulation) | < 157 ns                                     |
| Power dissipation (simulation)  | < 195 mW                                     |
| Chip size                       | 4.9 mm × 4.9 mm                              |
| Pin count                       | 144                                          |
| No. of transistors              | 186,648                                      |
| Technology                      | 0.35 µm, 2-poly, 3-metal CMOS                |
| Supply voltage                  | 3.3 V                                        |
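The search task behind these figures can be illustrated with a small software reference model: 64 stored patterns, each with 16 elements of 5 bits (values 0 to 31), and a winner chosen by minimum Euclidean distance. This is a sketch under assumptions, not the chip's implementation: the reference data is random, the squared distance is used because it yields the same ranking as the distance itself, and all names are hypothetical.

```python
import random

NUM_PATTERNS = 64   # reference patterns (rows of the memory field)
ELEMENTS = 16       # binaries per pattern
MAX_VALUE = 31      # largest value of a 5-bit binary


def squared_euclidean(a, b):
    """Squared Euclidean distance; ranks patterns exactly like the true distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_pattern(search, references):
    """Return (row, squared distance) of the closest reference pattern.

    ASSUMPTION: ties go to the lowest row index.
    """
    best_row, best_dist = 0, squared_euclidean(search, references[0])
    for row in range(1, len(references)):
        dist = squared_euclidean(search, references[row])
        if dist < best_dist:
            best_row, best_dist = row, dist
    return best_row, best_dist


# Hypothetical reference data: random 5-bit values with a fixed seed.
rng = random.Random(0)
references = [[rng.randint(0, MAX_VALUE) for _ in range(ELEMENTS)]
              for _ in range(NUM_PATTERNS)]

# Search data: a copy of row 37 with one element changed by exactly 1.
search = list(references[37])
search[5] ^= 1

row, dist = nearest_pattern(search, references)
print(f"winner row {row}, squared distance {dist}")
```

The loop visits the 64 rows one after another, which is the sequential work that a memory with an integrated nearest-match unit is meant to avoid.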



## Conclusion

- Data transmission between memory and processing unit limits the performance improvements of integrated systems.
- Two methods for mitigating this problem have been proposed:
  - Bank-based multi-porting of the memory
  - Unification of memory and processing unit
- Applications of these two methods lead to key technologies for terabit information processing, enabling in particular:
  - Tera-bit-per-second (Tbps) memory-access bandwidth
  - Tera-operation-per-second (TOPS) processing power for the pattern-matching function