

# AI ASIC: Design and Practice (ADaP) Fall 2024 Memory Technologies



# Introduction







Peking University





- Memory Types
- Memory Organization
- ROM design
- RAM design





| Read-Write Memory:<br>Random Access | Read-Write Memory:<br>Non-Random Access | Non-Volatile<br>Read-Write Memory | Read-Only Memory |
|------------------|---------------------------------------|--------------------------------------|----------------------------------------|
| SRAM<br>DRAM | FIFO<br>LIFO<br>Shift Register<br>CAM | EPROM<br>E<sup>2</sup>PROM<br>FLASH<br>RRAM<br>MRAM<br>PCM | Mask-Programmed<br>Programmable (PROM) |

#### Memory Spatial Abstraction





Too many select signals: N words need N select signals. A decoder reduces the number of address inputs to $K = \log_2 N$.
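The relation between N select signals and K address bits can be sketched behaviorally in Python. This is only an illustration, not a circuit description; the function names `address_bits` and `decode` are hypothetical.

```python
def address_bits(num_words):
    """Number of address bits K needed to select one of num_words words."""
    return (num_words - 1).bit_length()

def decode(address, num_words):
    """Turn a K-bit address into N one-hot word-select signals."""
    if not 0 <= address < num_words:
        raise ValueError("address out of range")
    return [1 if i == address else 0 for i in range(num_words)]

# 8 words need only 3 address wires instead of 8 select wires.
print(address_bits(8))   # 3
print(decode(5, 8))      # [0, 0, 0, 0, 0, 1, 0, 0]
```

Exactly one select signal is active for any valid address, which is what lets the K address wires replace the N external select wires.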













# Memory Timing Behavior / Compute-In-Memory





- Add additional (compute mode) inputs
- Perhaps additional address bits

## Memory Architecture





macro in 22nm FinFET technology with adaptive forming/set/reset schemes yielding down to 0.5 V with sensing time of 5ns at 0.7 V." 2019 IEEE International Solid-State Circuits Conference-(ISSCC). IEEE, 2019.

#### Memory Address





The memory I/O width is 128 bits.

For a 1Mb memory, what is the range of memory addresses?

1Mb / 128b = (2<sup>20</sup> bit) / (2<sup>7</sup> bit) = 2<sup>13</sup> words

A 13-bit address (13 address wires) is needed, covering addresses 0 to 2<sup>13</sup> - 1.
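The same arithmetic as a small Python check. The helper name `address_width` is hypothetical, and the sketch assumes the capacity is an exact multiple of the I/O width.

```python
def address_width(capacity_bits, io_width_bits):
    """Return (number of words, address bits) for a memory macro."""
    words = capacity_bits // io_width_bits
    return words, (words - 1).bit_length()

words, width = address_width(2**20, 128)   # 1Mb memory, 128b I/O
print(words, width)                         # 8192 13
print(0, words - 1)                         # address range: 0 8191
```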

# Memory Architecture (inside a memory block)





Assume 1Mb is one subarray

Configuration:

- Memory I/O bit width is 128b
- 1Mb/subarray
- Address width: 13b

How do we get a 4Mb memory?

## **Hierarchical Memory Architecture**





Advantages:

1. Shorter wires within blocks
2. The block address activates only one block, which saves power
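A minimal Python sketch of hierarchical addressing, assuming a 4Mb memory built from four 1Mb subarrays (128b I/O, 13-bit word address each) with the block address placed in the upper 2 bits. That bit placement and the names `split_address` and `block_enables` are assumptions made for illustration.

```python
WORD_ADDR_BITS = 13    # word address inside one 1Mb subarray
BLOCK_ADDR_BITS = 2    # assumed: 4 subarrays -> 2 block-address bits

def split_address(address):
    """Split a 15-bit address into (block index, word address)."""
    if not 0 <= address < 1 << (WORD_ADDR_BITS + BLOCK_ADDR_BITS):
        raise ValueError("address out of range")
    block = address >> WORD_ADDR_BITS
    word = address & ((1 << WORD_ADDR_BITS) - 1)
    return block, word

def block_enables(address):
    """One-hot block enables: only the addressed block is activated."""
    block, _ = split_address(address)
    return [int(b == block) for b in range(1 << BLOCK_ADDR_BITS)]

print(split_address(0x6001))   # (3, 1)
print(block_enables(0x6001))   # [0, 0, 0, 1]
```

Only one entry of `block_enables` is ever 1, which mirrors the power-saving argument: the other three blocks stay idle.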





- Read-Only-Memory (ROM)
- Random-Access-Memory (RAM) : Read/Write Memory
  - SRAM
  - DRAM

Before we introduce them, think first: what makes a good memory?

Density, R/W Speed, Endurance, Retention, Nonvolatility, ...


















# MOS ROM Example

- 4-word x 6-bit ROM
  - Represented with dot diagram
  - Dots indicate 1's in ROM



The array looks like six 4-input pseudo-nMOS NOR gates



- Word 0: 010101
- Word 1: 011001
- Word 2: 100101
- Word 3: 101010
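A behavioral Python sketch of this 4-word x 6-bit ROM. It models each bit line as a NOR of the word lines that have a dot (a pull-down transistor) on that column, and it assumes an inverter on each output so that a dot reads as 1. The output inverter and the function name `read_rom` are assumptions for illustration.

```python
# Dot diagram: a "1" marks a pull-down transistor at (word, bit).
ROM_DOTS = ["010101", "011001", "100101", "101010"]

def read_rom(address):
    """Read one 6-bit word by asserting a single word line."""
    wordlines = [int(row == address) for row in range(len(ROM_DOTS))]
    word = ""
    for col in range(6):
        # The bit line is pulled low if any asserted word line has a dot here.
        pulled_low = any(
            wl and ROM_DOTS[row][col] == "1" for row, wl in enumerate(wordlines)
        )
        bitline = 0 if pulled_low else 1   # NOR behavior
        word += str(1 - bitline)           # assumed output inverter
    return word

for addr in range(4):
    print(addr, read_rom(addr))
```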

























(b) Using a metal bypass










#### STATIC (SRAM)

- Data stored as long as supply is applied
- Large (6 transistors/cell)
- Fast
- Differential

#### DYNAMIC (DRAM)

- Periodic refresh required
- Small (1-3 transistors/cell)
- Slower
- Single-ended












Push Rule: special design rule for SRAM Transistors







**SRAM Bitcell Array** 














- Advantage: reduces read disturbance
- Disadvantage: the cell is too large
- Read disturbance:
  - The stored bitcell data is unintentionally changed during a read












- No constraints on device ratios
- Reads are non-destructive
- Value stored at node X when writing a "1": V<sub>WWL</sub> - V<sub>Tn</sub>

**Destructive read:** the read operation itself changes the stored bit (with 100% probability), so the data has to be written back afterwards.







Write: C<sub>S</sub> is charged or discharged by asserting WL and BL.

Read: charge redistribution takes place between the bit line and the storage capacitance.

$$\Delta V = V_{BL} - V_{PRE} = (V_{BIT} - V_{PRE}) \frac{C_S}{C_S + C_{BL}}$$

The voltage swing is small, typically around 250 mV.
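The charge-redistribution formula can be evaluated numerically. The sketch below uses hypothetical values (C<sub>S</sub> = 30 fF, C<sub>BL</sub> = 120 fF, V<sub>PRE</sub> = 1.25 V, a stored "1" at 2.5 V) chosen only so that the result lands near the 250 mV figure quoted above; they are not taken from the slides.

```python
def bitline_swing(v_bit, v_pre, c_s, c_bl):
    """Bit-line voltage change after charge sharing with the storage cap."""
    return (v_bit - v_pre) * c_s / (c_s + c_bl)

C_S = 30e-15     # assumed storage capacitance, 30 fF
C_BL = 120e-15   # assumed bit-line capacitance, 120 fF
V_PRE = 1.25     # assumed precharge voltage

print(bitline_swing(2.5, V_PRE, C_S, C_BL))   # reading a stored "1": about +0.25 V
print(bitline_swing(0.0, V_PRE, C_S, C_BL))   # reading a stored "0": about -0.25 V
```

Because C<sub>BL</sub> is several times larger than C<sub>S</sub>, only a fraction of the stored voltage difference appears on the bit line, which is why a sense amplifier is needed.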







## Better Density – 3D Integration





Lee, Dong Uk, et al. "22.3 A 128Gb 8-high 512GB/s HBM2E DRAM with a pseudo quarter bank structure, power dispersion and an instruction-based at-speed PMBIST." 2020 IEEE International Solid-State Circuits Conference-(ISSCC). IEEE, 2020.





- eDRAM: embedded DRAM
- A dedicated DRAM process (thick oxide, high threshold) is too slow to serve as a logic platform
- An eDRAM process adds compact capacitors while staying compatible with the logic process (regular-threshold transistors)





#### Figure 3: 1 Gig x 8 Functional Block Diagram



Source: Micron





# CPUs can also use registers, caches and scratchpad memory



**Register Files & Cache**

- If the registers are sufficient, all data stay in registers
- If the registers are not sufficient, data must go to main memory (accessed with an address)
- On a cache hit, the access does not go off-chip (off-chip accesses have large latency and low bandwidth)

#### **Register files**



```
module picorv32_regs (
	input clk, wen,
	input [5:0] waddr,
	input [5:0] raddr1,
	input [5:0] raddr2,
	input [31:0] wdata,
	output [31:0] rdata1,
	output [31:0] rdata2
);
	// 31 words of 32 bits; only the lower 5 address bits are used
	reg [31:0] regs [0:30];

	// One synchronous write port. The address is inverted, so
	// address 0 maps to index 31, which is outside the array.
	always @(posedge clk)
		if (wen) regs[~waddr[4:0]] <= wdata;

	// Two asynchronous read ports
	assign rdata1 = regs[~raddr1[4:0]];
	assign rdata2 = regs[~raddr2[4:0]];
endmodule
```

- Is this synthesizable? Yes!
- Do we use a synthesis flow to generate memory? Yes!
- Sometimes we use this, but often we use a full-custom flow to design register files

Memory hierarchy (figure): registers, cache, main memory

Made from SRAM? Requirement: Multiport Read/Write

**Register files** 

- How can this cell be used to build a large array of registers, or even a register file?
- The ww1, ww2, ww3 control signals must be one-hot
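The one-hot requirement can be illustrated with a small Python model. It assumes ww1, ww2 and ww3 are the write enables of three write ports on a single cell, which is an interpretation of the figure; the names `is_one_hot` and `write_cell` are hypothetical.

```python
def is_one_hot(signals):
    """True when exactly one signal is asserted."""
    return sum(1 for s in signals if s) == 1

def write_cell(stored, write_enables, write_data):
    """Update one multiport cell.

    write_enables: (ww1, ww2, ww3), one enable per write port.
    write_data:    data bit presented by each write port.
    """
    if not any(write_enables):
        return stored                     # no port writes: cell keeps its value
    if not is_one_hot(write_enables):
        raise ValueError("write conflict: several ports drive the same cell")
    port = write_enables.index(1)
    return write_data[port]

print(write_cell(0, (0, 1, 0), (1, 1, 0)))   # port 2 writes a 1 -> 1
print(write_cell(1, (0, 0, 0), (0, 0, 0)))   # no write -> stays 1
```

If two enables are asserted together, two ports would drive the same storage node with possibly different data, so the model raises an error instead of picking a winner.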





Multiport SRAM cell







CPU keeps asking if the data is in the cache:

- If yes, it is a cache hit
- If no, it is a cache miss







Tag: the upper 2 address bits serve as the tag.

The tag is used to determine whether the requested memory word is in the cache.
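A minimal Python sketch of a tag check in a direct-mapped cache. It assumes a 5-bit address whose lower 3 bits index one of 8 cache lines and whose upper 2 bits form the tag; the 5-bit address width, the line count and the class name `DirectMappedCache` are assumptions for illustration.

```python
INDEX_BITS = 3   # assumed: 8 cache lines
TAG_BITS = 2     # upper 2 address bits are the tag

class DirectMappedCache:
    def __init__(self):
        # One stored tag per line; None marks an empty (invalid) line.
        self.tags = [None] * (1 << INDEX_BITS)

    def access(self, address):
        """Return True on a cache hit, False on a miss (the line is then filled)."""
        index = address & ((1 << INDEX_BITS) - 1)
        tag = address >> INDEX_BITS
        hit = self.tags[index] == tag
        self.tags[index] = tag
        return hit

cache = DirectMappedCache()
print(cache.access(0b10110))   # False: first access misses
print(cache.access(0b10110))   # True: same tag, same line
print(cache.access(0b00110))   # False: same line, different tag
```

Two addresses that share the lower 3 bits compete for the same line, and the stored tag is what tells them apart.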

## Inside A Cache





Content-addressable memory (CAM) / associative memory

# Direct-Mapped & Set-Associative Caches





direct-mapped

n-way set-associative

## Example of direct-mapped and set-associative caches



Figure: an eight-block cache organized from one-way set associative (direct mapped) up to eight-way set associative (fully associative); each block in a set holds a tag and data.

Set associativity increases the cache hit probability.

Underlying assumption: a cache miss causes a very long data movement time.
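The effect of associativity on the hit count can be shown with a small Python simulation. It assumes an 8-block cache with LRU replacement and a made-up access trace of block addresses; the replacement policy, the trace and the function name `count_hits` are all assumptions chosen for illustration.

```python
from collections import OrderedDict

def count_hits(trace, num_blocks=8, ways=1):
    """Count hits for a trace of block addresses in a set-associative cache."""
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]   # per set: tag -> True, in LRU order
    hits = 0
    for block in trace:
        index = block % num_sets
        tag = block // num_sets
        lines = sets[index]
        if tag in lines:
            hits += 1
            lines.move_to_end(tag)          # mark as most recently used
        else:
            if len(lines) == ways:
                lines.popitem(last=False)   # evict the least recently used tag
            lines[tag] = True
    return hits

trace = [0, 8, 0, 8, 0, 8]           # two blocks that conflict when direct mapped
print(count_hits(trace, ways=1))     # 0
print(count_hits(trace, ways=2))     # 4
print(count_hits(trace, ways=8))     # 4
```

With one way, blocks 0 and 8 keep evicting each other. With two or more ways they share a set, so only the first two accesses miss.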

# Challenges for Cache Design



Definition: scratchpad memory is simply embedded memory.

Embedded memory: memory located on the chip, i.e. "integrated memory".

- Complex mechanism
- Large area overhead
- Stochastic characteristics
  - Cache miss penalty is too large!



Banakar, Rajeshwari, et al. "Scratchpad memory: A design alternative for cache on-chip memory in embedded systems." *Proceedings of the Tenth International Symposium on Hardware/Software Codesign. CODES 2002 (IEEE Cat. No. 02TH8627).* IEEE, 2002.

# Scratchpad Memory vs. Cache



Definition: Scratchpad Memory is just embedded "memory".



# **Review for Memory**



- Memories
  - Operations: Read, Write
  - Basic circuits: bitcell array, column/row decoders, sense amplifiers (SA)
  - Hierarchy:
    - registers, cache, scratchpad memory, main memory
  - Memory technologies:
    - Transistor-based ROMs, SRAM, DRAM, eNVM (embedded nonvolatile memory)
  - Other concepts:
    - cache miss, cache hit, address space