SIMT vs SIMD vs SMT
Noted from http://goo.gl/S6HOZD
SIMD (Single Instruction, Multiple Data):
- elements of short vectors are processed in parallel.
- SIMD uses a "short vector" spelling – you break your data into short vectors, and your loop processes them using intrinsics with ugly, assembly-like names.
#include <arm_neon.h> // ARM NEON intrinsics
#include <stdint.h>

// Assumes n is a multiple of 4.
void add(uint32_t *a, uint32_t *b, uint32_t *c, int n) {
    for (int i = 0; i < n; i += 4) {
        // compute c[i], c[i+1], c[i+2], c[i+3] in one step
        uint32x4_t a4 = vld1q_u32(a + i);
        uint32x4_t b4 = vld1q_u32(b + i);
        uint32x4_t c4 = vaddq_u32(a4, b4);
        vst1q_u32(c + i, c4);
    }
}
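For contrast, here is a scalar reference version (a sketch; the name add_scalar is mine, not from the source) showing what the intrinsics above replace – one element per iteration instead of four:

```c
#include <stdint.h>

// Scalar reference version: one addition per iteration.
// The SIMD spelling above makes the 4-wide parallelism explicit,
// instead of hoping the compiler auto-vectorizes this loop.
void add_scalar(uint32_t *a, uint32_t *b, uint32_t *c, int n) {
    for (int i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}
```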
SMT (Simultaneous Multithreading):
- instructions of several threads are run in parallel, sharing the execution units of a single core.
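As a rough illustration (a sketch of mine, not from the source – the helper names are hypothetical): under SMT the programmer just writes ordinary threads, and the hardware interleaves their instruction streams on one core. A plain pthreads program exposes that kind of parallelism:

```c
#include <pthread.h>
#include <stdint.h>
#include <stddef.h>

// Each thread runs an independent instruction stream; on an SMT core,
// two such streams can occupy one core's execution units simultaneously.
static void *sum_range(void *arg) {
    // arg points to {start, end, result}
    uint64_t *bounds = (uint64_t *)arg;
    uint64_t s = 0;
    for (uint64_t i = bounds[0]; i < bounds[1]; i++) s += i;
    bounds[2] = s;
    return NULL;
}

// Hypothetical helper: sums 0..n-1 using two threads.
uint64_t sum_two_threads(uint64_t n) {
    uint64_t lo[3] = {0, n / 2, 0};
    uint64_t hi[3] = {n / 2, n, 0};
    pthread_t t1, t2;
    pthread_create(&t1, NULL, sum_range, lo);
    pthread_create(&t2, NULL, sum_range, hi);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return lo[2] + hi[2];
}
```

Note the contrast with SIMT below: these threads are fully independent, can diverge freely, and can use any OS synchronization primitive.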
SIMT (Single Instruction, Multiple Threads):
- A hybrid between vector processing and hardware threading.
- SIMT uses a "scalar spelling": you write the code of a single thread using standard arithmetic operators, which is arguably a better interface than SIMD loops with ugly, assembly-like opcodes.
__global__ void add(float *a, float *b, float *c) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    c[i] = a[i] + b[i]; // no loop!
}
- Several threads – a "warp" in NVIDIA terminology – run simultaneously, so each thread needs its own set of registers.
- Several warps, making up a "block", are mapped to an SM (streaming multiprocessor), and the SM switches between the warps of a block instantaneously – so the registers of every thread of every warp in the block must stay resident on the SM.
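The index arithmetic in the kernel above flattens this block/thread hierarchy into one global index per thread. A plain-C sketch of the mapping (the function name is mine; the parameter names mirror CUDA's built-ins):

```c
// Mirrors i = blockIdx.x * blockDim.x + threadIdx.x:
// each of the blockDim_x threads in each block gets a unique global index,
// so thread 5 of block 2 (with 256-thread blocks) handles element 517.
int global_index(int blockIdx_x, int blockDim_x, int threadIdx_x) {
    return blockIdx_x * blockDim_x + threadIdx_x;
}
```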
Conclusion
SIMT is more flexible than SIMD in three areas:
- Single instruction, multiple register sets
- Single instruction, multiple addresses
- Single instruction, multiple flow paths
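A toy C sketch (mine, with hypothetical names – real hardware does this with predicate masks, not loops) of how "multiple flow paths" works: the warp executes both sides of a branch, and a per-lane mask selects which result each lane keeps:

```c
#include <stdint.h>

#define WARP 4 // toy warp width

// Emulates one warp executing: out[lane] = odd(in[lane]) ? in[lane]*3 : in[lane]/2;
// All lanes run both paths; the mask decides which result each lane commits.
void warp_branch(const int32_t *in, int32_t *out) {
    int mask[WARP];
    int32_t then_val[WARP], else_val[WARP];
    for (int lane = 0; lane < WARP; lane++)
        mask[lane] = in[lane] % 2;     // predicate, computed per lane
    for (int lane = 0; lane < WARP; lane++)
        then_val[lane] = in[lane] * 3; // "then" path, executed by all lanes
    for (int lane = 0; lane < WARP; lane++)
        else_val[lane] = in[lane] / 2; // "else" path, also executed by all lanes
    for (int lane = 0; lane < WARP; lane++)
        out[lane] = mask[lane] ? then_val[lane] : else_val[lane];
}
```

Both paths cost execution time whether or not a lane needs them – which is exactly why flow divergence reduces performance, as noted in the SMT comparison below.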
SIMT is less flexible than SMT in three areas:
- Low occupancy greatly reduces performance
- Flow divergence greatly reduces performance
- Synchronization options are very limited