Granite Vision: IBM's Open-Source VLM for Charts and Data Visualization

Data visualizations simplify, and even entertain, by compressing a sea of words and figures into a compelling story

Humans love graphical data, but multi-modal language models trained on text and images may struggle to interpret it

Granite Vision, IBM's 2-billion-parameter vision-language model, builds on the Granite language model family, whose latest releases improved function calling and retrieval-augmented generation (RAG) and expanded the context window to 128,000 tokens
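
As a quick illustration of putting the model to work, here is a minimal sketch using the Hugging Face transformers library; the checkpoint name, image file, and prompt are assumptions for demonstration, not details from the article.

```python
from transformers import AutoProcessor, AutoModelForVision2Seq
from PIL import Image

model_id = "ibm-granite/granite-vision-3.2-2b"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id)

image = Image.open("chart.png")  # hypothetical local chart image
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What trend does this chart show?"},
        ],
    }
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```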

The vision encoder turns input images into numerical visual embeddings, and the projector maps those embeddings into the language model's input space so the text model can reason over them; a sketch of the pattern follows
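
A toy PyTorch sketch of that encoder-projector pattern (a LLaVA-style design); the layer choices and dimensions here are illustrative assumptions, not Granite Vision's actual components:

```python
import torch
import torch.nn as nn

class VisionToTextBridge(nn.Module):
    """Toy encoder -> projector pipeline; dimensions are illustrative."""

    def __init__(self, vis_dim=768, llm_dim=2048):
        super().__init__()
        # Stand-in for a pretrained vision encoder: maps flattened
        # 16x16 RGB image patches to visual embeddings.
        self.encoder = nn.Linear(3 * 16 * 16, vis_dim)
        # Projector: a small MLP that maps visual embeddings into
        # the language model's embedding space.
        self.projector = nn.Sequential(
            nn.Linear(vis_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patches):
        # patches: (batch, num_patches, 3*16*16) flattened image patches
        visual = self.encoder(patches)   # (batch, num_patches, vis_dim)
        return self.projector(visual)    # (batch, num_patches, llm_dim)

bridge = VisionToTextBridge()
dummy = torch.randn(1, 196, 3 * 16 * 16)  # one image, 14x14 grid of patches
print(bridge(dummy).shape)                 # torch.Size([1, 196, 2048])
```

The projected embeddings are then interleaved with text-token embeddings in the language model's input sequence, which is how the text model "reads" the image.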

Granite Vision was trained on 80.3 million document image-text pairs and 16.3 million natural image-text pairs, in addition to raw images

IBM built Granite Vision to understand this material: mostly tables, charts, and diagrams of business processes

Using an AI to break down visual materials can save time in the workplace

This could mean analyzing hundreds of invoices at once, detecting product defects, or extracting details about auto accidents from photos

It is difficult to train large language models (LLMs) and VLMs on multi-page documents because model context windows are often too small; the arithmetic sketch below shows why
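
A back-of-the-envelope sketch of that constraint, assuming roughly 576 visual tokens per page image (a common figure for 336-pixel LLaVA-style encoders; the true per-page count varies by model and tiling):

```python
# Rough arithmetic: how many page images fit in a context window.
# 576 visual tokens per page is an assumption (a 336px image split
# into 24x24 patches); real counts vary, and text tokens shrink
# the budget further.
TOKENS_PER_PAGE = 576

for window in (8_000, 128_000):
    pages = window // TOKENS_PER_PAGE
    print(f"{window:>7,}-token window fits ~{pages} page images")
# An 8K window holds only ~13 pages; even 128K tops out near ~222.
```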