LLM Chat Performance Optimisation for Intel Flex GPUs with RAG

Documents are stored in memory using the InMemoryDocumentStore, a lightweight document store that requires no external database
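As a minimal sketch (assuming Haystack 2.x; the document contents and metadata are illustrative, not Twixor's actual data), documents can be written into the store like this:

```python
from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Create an in-memory store; no external database or service is required.
document_store = InMemoryDocumentStore()

# Illustrative documents; in practice these would come from the knowledge base.
docs = [
    Document(content="Orders can be tracked from the 'My Orders' page.", meta={"source": "faq"}),
    Document(content="Refunds are processed within 5-7 business days.", meta={"source": "policy"}),
]
document_store.write_documents(docs)
print(document_store.count_documents())  # -> 2
```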

Sentence Transformers is a flexible and powerful library for creating embeddings that work across a wide variety of NLP applications
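For example (a minimal sketch; the `all-MiniLM-L6-v2` checkpoint is an illustrative choice, not necessarily the one used in this project), the library turns sentences into dense vectors that can be compared with cosine similarity:

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative model choice; any Sentence Transformers checkpoint can be used.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

sentences = [
    "How do I reset my password?",
    "Steps for changing your account password",
]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Cosine similarity between the two sentence embeddings.
print(util.cos_sim(embeddings[0], embeddings[1]))
```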

Twixor used Retrieval Augmented Generation (RAG) to improve the accuracy of the chat responses
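As a sketch of the RAG pattern (the prompt template and helper below are assumptions for illustration, not Twixor's actual implementation), retrieved passages are prepended to the user question before it is sent to the LLM:

```python
def build_rag_prompt(question: str, retrieved_passages: list[str]) -> str:
    """Assemble an augmented prompt: retrieved context first, then the question."""
    context = "\n\n".join(retrieved_passages)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Illustrative usage: the passages would normally come from the document store retriever.
passages = ["Refunds are processed within 5-7 business days."]
prompt = build_rag_prompt("How long do refunds take?", passages)
# `prompt` is then passed to the chat LLM (e.g. Neural Chat) for generation.
```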

They used Haystack's `InMemoryDocumentStore`, a simple and lightweight document store intended for rapid development and experimentation
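A hedged sketch of querying that store (assuming Haystack 2.x and its in-memory BM25 retriever; the documents and query are illustrative):

```python
from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever

document_store = InMemoryDocumentStore()
document_store.write_documents([
    Document(content="Refunds are processed within 5-7 business days."),
    Document(content="Orders can be tracked from the 'My Orders' page."),
])

# BM25 keyword retriever over the in-memory store.
retriever = InMemoryBM25Retriever(document_store=document_store)
result = retriever.run(query="How long do refunds take?", top_k=2)
for doc in result["documents"]:
    print(doc.score, doc.content)
```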

During the second stage of this project, they ran inference with the Neural Chat LLM for Twixor on the Intel Data Center GPU Flex Series 140
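A minimal sketch of running such a model on the Flex GPU through PyTorch's `xpu` device (assuming Intel Extension for PyTorch is installed and that the `Intel/neural-chat-7b-v3-1` checkpoint stands in for the Neural Chat model used here):

```python
import torch
import intel_extension_for_pytorch as ipex  # registers the "xpu" (Intel GPU) device
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Intel/neural-chat-7b-v3-1"  # assumed checkpoint, for illustration only
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("xpu")
model.eval()
model = ipex.optimize(model, dtype=torch.float16)  # apply IPEX inference optimisations

inputs = tokenizer("How can I track my order?", return_tensors="pt").to("xpu")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```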

The Flex 140 GPUs include dedicated hardware specifically designed to accelerate core AI operations such as matrix multiplication and convolution
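As a small illustration (assuming the same Intel Extension for PyTorch setup as above), a matrix multiplication placed on the `xpu` device is dispatched to the GPU:

```python
import torch
import intel_extension_for_pytorch  # noqa: F401  (registers the "xpu" device)

# Half-precision matrix multiplication executed on the Flex GPU.
a = torch.randn(1024, 1024, dtype=torch.float16, device="xpu")
b = torch.randn(1024, 1024, dtype=torch.float16, device="xpu")
c = a @ b
torch.xpu.synchronize()  # wait for the GPU kernel to finish
print(c.shape)
```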

For effective AI deployment, Intel offers optimised libraries such as the Intel AI Analytics Toolkit, OpenVINO, and oneDNN
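For example, OpenVINO can target the Flex GPU directly through its GPU plugin (a minimal sketch, assuming OpenVINO 2023.1 or newer; `model.xml` is a placeholder for an actual OpenVINO IR or ONNX file):

```python
import openvino as ov

core = ov.Core()
print(core.available_devices)  # e.g. ['CPU', 'GPU'] when the Flex GPU driver and plugin are present

# Compile a model for the GPU device; "model.xml" is a placeholder path.
compiled_model = core.compile_model("model.xml", device_name="GPU")
```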