



### Optically Interconnected Computing at Heriot-Watt

### G.A. Russell, C.J. Moir, R. Gil-Otero, A. McCarthy, S. Kumpatla, S. Brown and J.F. Snowdon

Optically Interconnected Computing (OIC) Group, Heriot-Watt University, Edinburgh

http://www.optical-computing.co.uk

# **OIC Group Research**








# Partners

- BAE Systems, UK
- British Telecom, UK
- Conjunct, UK
- Ecole Superieure d'Electricite (SUPELEC), France
- ILFA GmbH, Germany
- Imperial College London, UK
- Leeds University, UK
- Siemens Business Services GmbH & Co. OHG, Germany
- Silicon Graphics Inc., UK
- Swiss Federal Institute of Technology (ETHZ), Switzerland
- THALES Communications (TCFR), France
- Universität Gesamthochschule Paderborn, Germany
- University of Hagen, Germany
- Xilinx, USA





### **Demonstrators**





# **SCIOS Sorting Demonstrator**

- Batcher's Bitonic Sort
- The architecture of the demonstrator uses optoelectronics to exploit non-local interconnection: in this case, the optical perfect shuffle.
- The data to be sorted are entered sequentially into the processing loop through electrical I/O.
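The two ingredients named above can be sketched in software. This is a minimal illustration, not the demonstrator's implementation: `perfect_shuffle` and `bitonic_sort` are hypothetical names, the sort is the textbook compare-exchange network for power-of-two input lengths, and the way the hardware loop maps compare-exchange stages onto the optical shuffle is not reproduced here.

```python
def perfect_shuffle(items):
    """Interleave the two halves: [a0, a1, b0, b1] -> [a0, b0, a1, b1]."""
    n = len(items)
    if n % 2:
        raise ValueError("length must be even")
    half = n // 2
    out = []
    for i in range(half):
        out.append(items[i])         # element from the first half
        out.append(items[half + i])  # matching element from the second half
    return out


def bitonic_sort(values):
    """Sort ascending with Batcher's bitonic compare-exchange network."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    data = list(values)
    k = 2  # size of the bitonic sequences being merged
    while k <= n:
        j = k // 2  # distance between compared elements
        while j >= 1:
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    # Blocks alternate between ascending and descending order.
                    ascending = (i & k) == 0
                    if (data[i] > data[partner]) == ascending:
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2
    return data
```

For a power-of-two length n, applying the shuffle log2(n) times returns the data to its original order, which is one reason the permutation suits a recirculating processing loop.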





### **SCIOS Demonstrator**




# **The SPOEC Project**

- A free-space optically connected crossbar demonstrator with Tbit/s I/O to Si.
- Motivation:
  - Interconnect Bottleneck
- Features:
  - Hybrid Si/InGaAs smartpixel logic.
  - Optical clock distribution.
  - Header decoding in silicon.
  - 8×8 VCSEL array input.





# **SPOEC System Overview**



## **Assembled Demonstrator**







# Optoelectronic Neural Networks

- Neural network scalability limited in silicon.
- Free-space optics can be used to perform interconnection.
- Optoelectronics allows scalable networks.
- Input summation is also done in an inherently analogue manner.
- Noise added naturally.











- DOE provides a shift invariant inhibitory interconnect pattern.
- Neuron input summation is the total power falling on a detector.



# **System Overview**







# Direct Write, Multi-level Waveguides









# **STAR 3D Lightwave Circuits**



 "Optical wiring" capability for high density integration of active optoelectronic devices and packaging to parallel fibre I/O.





## STAR 3DLC Test Waveguides









# High Speed Optoelectronic Memory System (HOLMS)

- European Commission FW5 Project
- Low latency memory architecture
- Multiple memory banks with optical fan-in/-out

**Memory Architecture** 





# High Speed Optoelectronic Memory System (HOLMS)

- Test bed for multiple optical technologies and packaging
  - Fibre
  - PIFSO
  - Waveguide







### HOLMS MCM







## Programmable Optoelectronic Computing Architecture (POCA)

- Investigate reconfigurable logic (FPGAs) with optical I/O
- Logic required for data recovery and error recovery
- Latency for added logic
- Behavioural model of optoelectronic level for simulation

**Device Utilization Summary**

| Logic Utilization                              | Used | Available | Utilization |
|------------------------------------------------|------|-----------|-------------|
| Number of Slice Flip Flops                     | 210  | 3,840     | 5%          |
| Number of 4 input LUTs                         | 235  | 3,840     | 6%          |
| **Logic Distribution**                         |      |           |             |
| Number of occupied Slices                      | 205  | 1,920     | 10%         |
| Number of Slices containing only related logic | 205  | 205       | 100%        |
| Number of Slices containing unrelated logic    | 0    | 205       | 0%          |
| Total Number 4 input LUTs                      | 321  | 3,840     | 8%          |
| Number used as logic                           | 235  |           |             |
| Number used as a route-thru                    | 2    |           |             |
| Number used for Dual Port RAMs                 | 16   |           |             |
| Number used for 32x1 RAMs                      | 52   |           |             |
| Number used as Shift registers                 | 16   |           |             |
| Number of bonded IOBs                          | 5    | 173       | 2%          |
| IOB Flip Flops                                 | 2    |           |             |
| Number of Block RAMs                           | 1    | 12        | 8%          |
| Number of GCLKs                                | 4    | 8         | 50%         |
| Number of DCMs                                 | 3    | 4         | 75%         |
| Total equivalent gate count for design         | (illegible) |    |             |
| Additional JTAG gate count for IOBs            | (illegible) |    |             |



## POCA 2



- Virtex 2 Pro @ 320 MHz
- Rx -> Parity Check -> Input Buffer -> Output Buffer -> Parity Calc -> Tx
- 165 ns latency = 52 clock cycles
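The parity stages and the latency figure can be illustrated with a short sketch. It assumes single-bit even parity appended as the least significant bit of an 8-bit word; the actual POCA frame format and parity convention are not given here, and the function names are hypothetical.

```python
def parity_bit(word, width=8):
    """Even-parity bit: 1 if the word has an odd number of set bits."""
    mask = (1 << width) - 1
    return bin(word & mask).count("1") & 1


def add_parity(word, width=8):
    """Transmit side (Parity Calc): append the parity bit as the LSB."""
    mask = (1 << width) - 1
    word &= mask
    return (word << 1) | parity_bit(word, width)


def check_parity(frame, width=8):
    """Receive side (Parity Check): True if the frame's parity is consistent."""
    word = frame >> 1
    return parity_bit(word, width) == (frame & 1)


def cycles_to_ns(cycles, clock_mhz):
    """Convert a latency in clock cycles to nanoseconds."""
    return cycles * 1000.0 / clock_mhz
```

At 320 MHz one clock period is 3.125 ns, so `cycles_to_ns(52, 320)` gives 162.5 ns, in line with the quoted latency of roughly 165 ns. A single parity bit detects any odd number of bit errors in a frame but cannot correct them.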





## Technologies





### Rapid prototyping of optomechanical structures



Structures have been made using a fast 3D printer, which creates the models directly from digital data in hours.





### **Real system**











# Bonding

- Direct epitaxial optical I/O integration in its infancy.
  - Extra epitaxial layers will decrease yield.
  - Thermal and voltage issues critical at 90nm.
  - Optical I/O cannot run from a supply of less than 3.3 V.
- Use flip-chip techniques instead until optical I/O epitaxy improves.



#### Flip-Chip VCSEL Using Compliant Polymer Bumps







# **Optoelectronic Packaging**

- FC6 flip-chip bonder for
  - IR Thermocompression
  - UV Thermocompression
  - Reflow
- K&S 4124 ball bonder
- EDB80 wire bonder
  - Stud, die and wire bonding



 Access to facilities of specialist packaging companies for larger jobs







# **Solder Bumps**

- Pb/Sn solder is being phased out; Au/Sn and In bumps are used instead.
- Creep rate can give micron level misalignment.
- Flux required. Can lead to impurities in the process.



### Flip-Chip MQW Using Solder Bumps

#### Solder Bumps Ready to Bond





# **Gold Stud Bumps**

- Good conductor, requires no flux, no creep.
- Only needs a modified wire bonder.
- Needs higher temperature and pressure than other attach methods.
- Final attach uses either conductive adhesive or reflow and pressure.





### Flip-Chip VCSEL Using Gold Stud Bumps



# **Anisotropic Conductive Film**

- Gold stud bumps formed and ACF used to create connection.
- Connection formed by compression of conductive particles.
- Resistivity varies from bond-site to bond-site.
- Can only be used at lower speeds: <1GHz.

### Flip-Chip VCSEL Using Gold Stud Bumps and ACF Attach



# **Compliant Polymer Bumps**

- Polymer bumps compressed to 80% original size to make connection.
- Bump elasticity gives tolerance to thermal mismatch.
- Flip-chipped substrate glued in place.
- Larger dimensions allow integrated waveguides.



#### Flip-Chip VCSEL Using Compliant Polymer Bumps



## **Conclusions and Final Thoughts**

- Our experience of high connectivity free-space demonstrators
- Opto-mechanical design important for any optical system
  - Slotted base plate
  - Rapid prototyping
- Opto-electronic packaging
  - Our own small scale facilities and contacts with larger industrial partners
- Computer architecture as well as optical engineering expertise
- The Rest of Heriot-Watt University Physics
  - Diffractive Optics
  - Quantum Cryptography
  - Semiconductor Physics




