Small language models (SLMs), on the other hand, are becoming increasingly important in the AI community and offer distinct advantages for specific use cases.
AMD-135M, the first small language model in the Llama family, is split into two variants: AMD-Llama-135M and AMD-Llama-135M-code.
The AMD-Llama-135M model was trained from scratch on 670 billion tokens of general data over the course of six days using four MI250 nodes.
AMD-Llama-135M: AMD trained the model from scratch on MI250 accelerators using 670B tokens of general data.
The AMD-Llama-135M-code variant was then fine-tuned on an additional 20 billion tokens of code data, which took four days on the same hardware.
AMD-Llama-135M-code: AMD further fine-tuned AMD-Llama-135M with 20B tokens of code data to sharpen its precision on code and enable a dedicated code mode.
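For a rough sense of scale, a back-of-the-envelope calculation based on the figures above (and assuming the six days refer to end-to-end wall-clock training time) gives the implied pretraining throughput:

    # Implied pretraining throughput from the quoted figures (assumption: six days
    # of end-to-end wall-clock time on four MI250 nodes).
    SECONDS_PER_DAY = 24 * 60 * 60

    pretrain_tokens = 670e9                   # 670B general-data tokens
    wall_clock_s = 6 * SECONDS_PER_DAY        # six days of training
    nodes = 4                                 # four MI250 nodes

    aggregate_tps = pretrain_tokens / wall_clock_s
    per_node_tps = aggregate_tps / nodes

    print(f"Aggregate: ~{aggregate_tps / 1e6:.2f}M tokens/s")  # ~1.29M tokens/s
    print(f"Per node:  ~{per_node_tps / 1e3:.0f}K tokens/s")   # ~323K tokens/s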
The training code, dataset, and weights for these models are released as open source so that developers can reproduce the models and use them as a basis for training other SLMs and LLMs.
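As a quick way to try the released weights, here is a minimal sketch using the Hugging Face Transformers library; the repository id "amd/AMD-Llama-135m" is an assumption, so check the published model card for the exact path (the code variant would follow the same pattern).

    # Minimal sketch: loading the released weights with Hugging Face Transformers.
    # The repository id below is an assumption; verify it against the model card.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "amd/AMD-Llama-135m"  # assumed; the code variant would be "amd/AMD-Llama-135m-code"

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    prompt = "Small language models are useful because"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))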