The rapid advancement of artificial intelligence has drawn considerable attention, particularly around large language models (LLMs). In this fast-growing field, models such as GPT-4 and Llama 3.1 have raised the bar for performance and capability.
AMD-Llama-135M and AMD-Llama-135M-code are AMD's first two small language models in the Llama family.
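For readers who want to try the models, the checkpoints can be loaded with the Hugging Face transformers library. The sketch below assumes the base model is published under the repo id "amd/AMD-Llama-135M"; adjust the id to wherever the checkpoint is actually hosted.

```python
# Minimal sketch: load the base checkpoint and generate a short continuation.
# The repo id "amd/AMD-Llama-135M" is an assumption about where it is published.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "amd/AMD-Llama-135M"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Quick sanity check: generate a few tokens from a short prompt.
inputs = tokenizer("Small language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```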
Pretraining AMD-Llama-135M took six full days on four nodes, each equipped with four MI250 accelerators.
The 135M model was pretrained on the SlimPajama and Project Gutenberg datasets, a combined total of roughly 670 billion training tokens. Project Gutenberg alone contributes over 70,000 free ebooks.
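To get a feel for the pretraining data, the corpus can be inspected without downloading it in full by streaming it from Hugging Face. The sketch below assumes the public "cerebras/SlimPajama-627B" dataset is the SlimPajama version referenced here.

```python
# Minimal sketch: stream a few SlimPajama documents for inspection.
# "cerebras/SlimPajama-627B" is assumed to be the dataset meant in the text.
from datasets import load_dataset

slimpajama = load_dataset("cerebras/SlimPajama-627B", split="train", streaming=True)
for example in slimpajama.take(3):
    # Each record stores the raw document under the "text" field.
    print(example["text"][:200])
```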
AMD-Llama-135M was then trained further on an additional 20 billion tokens of code data, improving its precision on code and producing the code-focused variant, AMD-Llama-135M-code.
The Python portion of the StarCoder dataset was used to refine the pretrained 135M model during this code-focused stage.
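A sketch of pulling that Python subset is shown below. It assumes the "bigcode/starcoderdata" dataset layout with per-language directories (access on Hugging Face may require accepting the dataset's terms), and reuses the assumed "amd/AMD-Llama-135M" repo id for the tokenizer.

```python
# Minimal sketch: stream the Python portion of the StarCoder data and tokenize
# one file. Dataset id, data_dir, and model repo id are assumptions, not
# confirmed details of the original training setup.
from datasets import load_dataset
from transformers import AutoTokenizer

python_code = load_dataset(
    "bigcode/starcoderdata", data_dir="python", split="train", streaming=True
)

tokenizer = AutoTokenizer.from_pretrained("amd/AMD-Llama-135M")  # assumed repo id

for example in python_code.take(1):
    # StarCoder records store the source file under the "content" field.
    tokens = tokenizer(example["content"])
    print(len(tokens["input_ids"]), "tokens in the first sampled file")
```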