AMD SLMs: AMD-Llama-135M and AMD-Llama-135M-code

The rapid advancement of artificial intelligence technologies, particularly large language models (LLMs), has drawn a great deal of attention.

In this fast-growing field, LLMs such as GPT-4 and Llama 3.1 have raised the bar for performance and capability.

AMD-Llama-135M and AMD-Llama-135M-code are AMD's first two small language models in the Llama family.
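For readers who want to try the checkpoints, the sketch below shows one way to load and sample from the 135M model with Hugging Face transformers. The repository id amd/AMD-Llama-135m and the prompt are assumptions for illustration; check the model card on the Hugging Face Hub for the exact name.

```python
# A minimal sketch of loading a released checkpoint with Hugging Face transformers.
# The repo id "amd/AMD-Llama-135m" is an assumption; verify it on the Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "amd/AMD-Llama-135m"  # assumed Hub repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short continuation to sanity-check the checkpoint.
inputs = tokenizer("Small language models are useful because", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```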

Pretraining AMD-Llama-135M on four MI250 nodes, each equipped with four MI250 accelerators, took six full days.

The 135M model was pretrained on the SlimPajama and Project Gutenberg datasets, which together amount to 670 billion tokens. Project Gutenberg alone contains over 70,000 free ebooks.
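As a rough illustration of working with a corpus of this size, the snippet below streams a few SlimPajama documents with the datasets library instead of downloading everything up front. The repository id cerebras/SlimPajama-627B is an assumption and may not be the exact source used for this model.

```python
# A rough sketch of streaming a large pretraining corpus with the `datasets` library.
# The repo id "cerebras/SlimPajama-627B" is an assumption, not a confirmed source.
from datasets import load_dataset

slimpajama = load_dataset("cerebras/SlimPajama-627B", split="train", streaming=True)

# Peek at a few documents without materializing the full 600B+ token corpus.
for i, example in enumerate(slimpajama):
    print(example["text"][:200])
    if i >= 2:
        break
```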

AMD-Llama-135M-code builds on AMD-Llama-135M by continuing training on an additional 20 billion tokens of code data, improving its accuracy on code-focused tasks.

To refine the pretrained 135M model for code, the Python portion of the StarCoder dataset was used, as sketched below.
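The following is a minimal, illustrative sketch of continued pretraining on Python code with a standard causal-language-modeling loop. The dataset id bigcode/starcoderdata, the data_dir value, and all hyperparameters are placeholders rather than the actual training recipe.

```python
# Illustrative continued-pretraining loop on Python code.
# Repo ids, field names, and hyperparameters below are assumptions/placeholders.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "amd/AMD-Llama-135m"  # assumed Hub repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.train()

# Stream the Python subset so the full corpus is not downloaded up front.
python_code = load_dataset(
    "bigcode/starcoderdata", data_dir="python", split="train", streaming=True
)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)  # placeholder LR

for step, example in enumerate(python_code):
    batch = tokenizer(
        example["content"], truncation=True, max_length=1024, return_tensors="pt"
    ).to(device)
    # Standard causal-LM objective: the input ids also serve as labels.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    if step >= 100:  # placeholder stop; the real run consumed far more data
        break
```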