LLM Chat Performance Optimisation for Intel Flex GPUs with RAG
Documents are stored in memory using the InMemoryDocumentStore, a lightweight document store suited to quick prototyping.
Sentence Transformers is a flexible, powerful library for creating embeddings that work across a wide range of NLP applications.
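As a minimal sketch of the embedding step, the snippet below encodes a few sentences with Sentence Transformers; the model name `sentence-transformers/all-MiniLM-L6-v2` is an illustrative choice, not necessarily the one Twixor used.

```python
from sentence_transformers import SentenceTransformer

# Load a small, general-purpose embedding model (assumed for illustration)
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Encode a batch of sentences into dense vectors for retrieval or similarity search
sentences = ["How do I reset my password?", "Where can I view my invoices?"]
embeddings = model.encode(sentences)

print(embeddings.shape)  # (2, 384) for this particular model
```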
Twixor adopted Retrieval Augmented Generation (RAG) to improve the accuracy of its chat responses.
They used Haystack's `InMemoryDocumentStore`, a simple, lightweight document store intended for rapid development and experimentation.
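The sketch below shows how such a store can be populated and queried using the Haystack 1.x API; the sample documents, embedding model, and retriever settings are assumptions for illustration rather than Twixor's exact pipeline.

```python
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import EmbeddingRetriever

# In-memory store sized for the 384-dimensional embeddings of the model below
document_store = InMemoryDocumentStore(embedding_dim=384)

# Write a few documents; in practice these would be the knowledge-base articles
document_store.write_documents([
    {"content": "Passwords can be reset from the account settings page."},
    {"content": "Invoices are available under Billing > History."},
])

# Embed the documents with a Sentence Transformers model and store the vectors
retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",
)
document_store.update_embeddings(retriever)

# Retrieve the most relevant context to prepend to the LLM prompt
docs = retriever.retrieve(query="How do I reset my password?", top_k=2)
for d in docs:
    print(d.content)
```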
During the second stage of this project, they ran inference on the Neural Chat LLM for Twixor using the Intel Data Center GPU Flex Series 140.
The Flex 140 GPUs include dedicated hardware designed to accelerate core AI operations such as matrix multiplication and convolution.
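A minimal sketch of running this kind of inference on an Intel GPU is shown below, using Intel Extension for PyTorch and its `xpu` device; the model checkpoint and generation parameters are illustrative assumptions, not Twixor's exact configuration.

```python
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Neural Chat checkpoint for illustration
model_id = "Intel/neural-chat-7b-v3-1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# Move the model to the Flex GPU ("xpu") and apply IPEX inference optimisations
model = model.to("xpu")
model = ipex.optimize(model, dtype=torch.float16)

prompt = "Summarise the refund policy in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to("xpu")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```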
For efficient AI deployment, Intel offers optimised libraries such as the Intel AI Analytics Toolkit, OpenVINO, and oneDNN.
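As one example of these libraries in use, the sketch below compiles a model for an Intel GPU with the OpenVINO runtime; the IR file path is a placeholder, and conversion of the model to OpenVINO IR is assumed to have been done beforehand.

```python
from openvino.runtime import Core

core = Core()
# List the devices OpenVINO can target; a Flex 140 card exposes two GPU devices
print(core.available_devices)  # e.g. ['CPU', 'GPU.0', 'GPU.1']

# Placeholder path to a model already converted to OpenVINO IR
model = core.read_model("model/openvino_model.xml")
compiled = core.compile_model(model, device_name="GPU")

# The compiled model can now be called with input tensors to run inference on the GPU
```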