AMD's AAAI 2024 depth pruning method for vision transformer and Convolutional Neural Network models

Model pruning is a major acceleration technique that deliberately removes unnecessary weights while preserving accuracy.

Fine-tuning a subnet obtained by directly eliminating activation layers may jeopardize the integrity of the baseline model's weights.

To address these issues, the authors propose a depth pruning method that can prune both CNN and vision transformer models.

The AMD depth pruning approach introduces a novel block pruning strategy with a reparameterization technique to reduce model depth.
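To make the reparameterization idea concrete, here is a minimal sketch of one classic instance of the technique: folding a BatchNorm layer into the preceding convolution so that two layers collapse into one. The function name and the 4D weight layout are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def fold_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm parameters into the preceding convolution.

    w: conv weights, shape (out_ch, in_ch, kh, kw)
    b: conv bias, shape (out_ch,)
    gamma, beta, mean, var: per-channel BatchNorm parameters, shape (out_ch,)

    After folding, conv -> BN is numerically equivalent to a single
    conv with the returned (w_f, b_f), removing one layer from the model.
    """
    scale = gamma / np.sqrt(var + eps)      # per-output-channel BN scale
    w_f = w * scale[:, None, None, None]    # scale each output filter
    b_f = (b - mean) * scale + beta         # absorb the BN shift into the bias
    return w_f, b_f
```

The same algebra is why reparameterization can merge a whole pruned block into fewer layers: any affine per-channel operation can be absorbed into the adjacent linear layer's weights.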

To speed up inference and conserve memory, each pruned baseline block progressively evolves into a smaller merged block.
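One plausible way to realize such a progressive transition is to blend the baseline block's output with the merged block's output under a schedule that ramps from 0 to 1 during fine-tuning, so the small block gradually takes over. This is a minimal sketch under that assumption; `baseline_fn`, `merged_fn`, and the linear schedule are hypothetical, not the paper's exact recipe.

```python
import numpy as np

def progressive_block(x, baseline_fn, merged_fn, alpha):
    """Blend a baseline block with its smaller merged replacement.

    alpha ramps 0 -> 1 over fine-tuning; at alpha == 1 only the merged
    block contributes and the baseline block can be dropped entirely.
    """
    return (1.0 - alpha) * baseline_fn(x) + alpha * merged_fn(x)

def alpha_schedule(num_steps):
    """Hypothetical linear ramp for the merge coefficient."""
    return np.linspace(0.0, 1.0, num_steps)
```

At the end of the schedule the blend is exactly the merged block, so removing the baseline block changes nothing, which is the point: the subnet never sees an abrupt architectural jump.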

The result is a unified and efficient depth pruning method for both Convolutional Neural Network and vision transformer models.

AMD applied its approach to ConvNeXtV1, producing three pruned models that outperform popular models at identical inference speed; P6, for example, denotes a model with 6 blocks pruned.
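Naming aside, "pruning 6 blocks" amounts to selecting which blocks of the sequential stack to remove. The paper's actual block-selection criterion is not given here; this sketch assumes a hypothetical per-block importance score and simply drops the lowest-scoring blocks while preserving the order of the rest.

```python
def prune_blocks(blocks, scores, num_prune):
    """Drop the `num_prune` blocks with the lowest (hypothetical) importance scores.

    blocks: list of block objects in network order
    scores: per-block importance, same length as blocks
    Returns the surviving blocks in their original order.
    """
    keep = sorted(range(len(blocks)), key=lambda i: scores[i],
                  reverse=True)[:len(blocks) - num_prune]
    keep.sort()  # preserve the original block order in the pruned network
    return [blocks[i] for i in keep]
```

A P6 variant would correspond to calling this with `num_prune=6` on the baseline's block list before the progressive fine-tuning stage.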

ConvNeXtV1 depth pruning results on ImageNet. Speedups are measured with a batch size of 128 on an AMD Instinct MI100 GPU.