Micron presented its industry-leading research on offloading AI training models to NVMe, in collaboration with teams at Dell and NVIDIA.
The standard approach to training today’s huge, rapidly growing models is to fit as much of the workload as possible into GPU HBM. Scaling beyond that by parallelizing training across many servers is expensive, and data must then travel over system and network links that can quickly become bottlenecks.
The demonstrated solution instead moves the data and control paths onto the GPU, replacing the Gen5 NVMe SSD driver with a streamlined GPU-resident one so the GPU can issue storage requests directly.
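One reason moving the I/O path onto the GPU pays off is concurrency: thousands of GPU threads can keep many small reads in flight at once, which is what small-block NVMe throughput requires. The sketch below is a minimal host-side analogy, not Micron’s implementation; the features.bin path, block size, read count, and queue depths are illustrative assumptions.

```python
# Host-side analogy: small-block NVMe throughput scales with the number of
# requests kept in flight ("queue depth"), which GPU-initiated I/O exploits
# at much larger scale. All names and sizes below are placeholders.
import os
import random
import time
from concurrent.futures import ThreadPoolExecutor

PATH = "features.bin"   # placeholder: packed feature vectors on an NVMe SSD
BLOCK = 4096            # one 4 KiB read per feature fetch
N_READS = 100_000       # total random reads issued

def run(queue_depth):
    fd = os.open(PATH, os.O_RDONLY)
    blocks = os.fstat(fd).st_size // BLOCK
    offsets = [random.randrange(blocks) * BLOCK for _ in range(N_READS)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=queue_depth) as pool:
        # Each worker holds one outstanding request, mimicking queue depth.
        list(pool.map(lambda off: os.pread(fd, BLOCK, off), offsets))
    elapsed = time.perf_counter() - start
    os.close(fd)
    print(f"QD={queue_depth:4d}: {N_READS * BLOCK / 2**20 / elapsed:8.1f} MiB/s")

if __name__ == "__main__":
    for qd in (1, 16, 256):   # throughput should rise sharply with depth
        run(qd)
```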
Micron’s goal at GTC was to show how well its upcoming Gen5 NVMe SSD handles AI model offload. The gain appears in the benchmark’s feature aggregation component, which depends on storage performance: it accounts for 80% of the total runtime and runs twice as fast on the Gen5 NVMe SSD as on Gen4.
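Taken together, those two figures imply the end-to-end gain directly. A quick Amdahl’s-law check, using only the 80% runtime share and the 2x component speedup quoted above:

```python
# Amdahl's law applied to the quoted figures: feature aggregation is ~80%
# of runtime and runs 2x faster on the Gen5 SSD than on Gen4.
storage_fraction = 0.80    # share of runtime spent in feature aggregation
component_speedup = 2.0    # Gen4 -> Gen5 improvement on that component

overall = 1 / ((1 - storage_fraction) + storage_fraction / component_speedup)
print(f"end-to-end speedup: {overall:.2f}x")   # -> 1.67x
```

So a 2x faster storage-bound component translates into roughly a 1.67x faster run overall.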