While large language models (LLMs) are incredibly powerful, their heavy compute requirements have traditionally made them impractical to run on ordinary computers

The introductory assertion is that the typical, moderately priced laptop lacks the processing capability needed to run LLMs at an acceptable speed

The most common deep learning operations, such as matrix multiplication and convolution, are accelerated by hardware built into modern CPU architectures
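To make the two operations concrete, here is a minimal numpy sketch of what this hardware accelerates: a matrix multiplication written out as the triple loop the silicon parallelizes, and a one-dimensional convolution expressed as a sliding dot product. The shapes and values are illustrative.

```python
import numpy as np

# Matrix multiplication: the core operation inside transformer layers.
A = np.random.rand(4, 8).astype(np.float32)
B = np.random.rand(8, 3).astype(np.float32)
C = A @ B                                # shape (4, 3)

# Reference triple loop -- the work the hardware performs in parallel.
C_ref = np.zeros((4, 3), dtype=np.float32)
for i in range(4):
    for j in range(3):
        for k in range(8):
            C_ref[i, j] += A[i, k] * B[k, j]

# 1-D convolution as a sliding dot product (np.correlate does not flip
# the kernel, so each output is a plain dot product with a window of x).
x = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)
k = np.array([1.0, 0.0, -1.0], dtype=np.float32)
y = np.correlate(x, k, mode="valid")     # [-2.0, -2.0]
```

Both patterns reduce to many independent multiply-accumulate steps, which is exactly what vector and matrix units exploit.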

Thanks to their innovative architectures and training methods, these models match or even outperform larger ones

Quantization shifts the bit width of the model weights and activations, for example from 16-bit floating point (fp16) down to 8-bit integers (int8)
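The arithmetic behind that shift can be sketched in a few lines. The following is an illustrative symmetric per-tensor scheme (the function names and the example weights are hypothetical, not any library's API): each fp32 value is divided by a scale, rounded to an int8, and later multiplied back.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor quantization: fp32 -> int8 plus one scale."""
    scale = np.abs(x).max() / 127.0      # map the largest magnitude to 127
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.array([-0.8, -0.1, 0.0, 0.4, 1.27], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)

# int8 storage is a quarter of fp32, at the cost of a bounded
# rounding error of at most half a quantization step:
err = np.abs(w - w_hat).max()
```

The memory saving (4x versus fp32, 2x versus fp16) is what makes multi-billion-parameter models fit in laptop RAM.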

We can apply 4-bit quantization to the model weights, beginning with the Microsoft Phi-2 model, thanks to the OpenVINO Toolkit integration in the Hugging Face Optimum Intel library
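In Optimum Intel the compression itself is a short `from_pretrained` call with a weight-quantization config; to see what 4-bit weight compression means numerically, here is a minimal numpy sketch of group-wise int4 quantization. The group size and the symmetric [-7, 7] scheme are illustrative assumptions, not OpenVINO's exact algorithm.

```python
import numpy as np

def quantize_int4_grouped(w: np.ndarray, group_size: int = 8):
    """Symmetric 4-bit quantization with one scale per group of weights.

    Per-group scales keep the rounding error small even when weight
    magnitudes vary across the tensor. Group size 8 is an illustrative
    choice; real toolkits typically use 32-128.
    """
    groups = w.reshape(-1, group_size)
    scales = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0                      # avoid division by zero
    q = np.clip(np.round(groups / scales), -7, 7).astype(np.int8)
    return q, scales

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return (q * scales).astype(np.float32).reshape(-1)

w = np.random.randn(64).astype(np.float32)
q, scales = quantize_int4_grouped(w)
w_hat = dequantize(q, scales)
max_err = np.abs(w - w_hat).max()   # bounded by half a step per group
```

Packed two values per byte, 4-bit weights shrink a model like Phi-2 from roughly 5 GB in fp16 to under 2 GB.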

with sixteen Xe Vector Engines (XVEs) in the integrated GPU (iGPU). As the name implies, an XVE performs vector operations on 256-bit vectors
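To make the width concrete: a 256-bit vector holds eight fp32 lanes (or sixteen fp16 lanes), so one vector instruction applies the same operation to every lane at once. A small numpy sketch of one 8-lane multiply-add, purely as an illustration of the lane arithmetic:

```python
import numpy as np

VECTOR_BITS = 256
FP32_BITS = 32
LANES = VECTOR_BITS // FP32_BITS       # 8 fp32 values per 256-bit vector

# One vector multiply-add: d = a * b + c across all 8 lanes at once.
a = np.arange(LANES, dtype=np.float32)          # [0, 1, ..., 7]
b = np.full(LANES, 2.0, dtype=np.float32)
c = np.ones(LANES, dtype=np.float32)
d = a * b + c                                    # applied lane-wise
```

With sixteen such engines issuing in parallel, the iGPU can retire over a hundred fp32 operations per cycle, which is why offloading matrix-heavy LLM layers to it pays off.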

Another new addition is a first-generation neural processing unit (NPU) for Intel architectures