Granite Vision: IBM's Open-Source VLM for Charts and Data Visualizations
Data visualizations simplify and even entertain by compressing a sea of words and figures into a compelling story
Humans love graphical data, but multi-modal language models trained on text and images may struggle to interpret it
Granite Vision is IBM's 2 billion-parameter vision-language model, built on the Granite language model family whose recent releases improved function calling, extended the context window to 128,000 tokens, and strengthened retrieval-augmented generation (RAG)
The vision encoder turns input images into numerical visual embeddings, and the projector maps those embeddings into the language model's embedding space so the model can reason over them alongside text
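As a rough illustration of that two-stage design, here is a minimal PyTorch sketch; the encoder stand-in, layer sizes, and patch dimensions are illustrative assumptions, not Granite Vision's actual architecture.

```python
import torch
import torch.nn as nn

class VisionToLLMBridge(nn.Module):
    """Toy encoder + projector: image -> visual embeddings -> LLM embedding space.

    All dimensions are illustrative placeholders, not Granite Vision's real sizes.
    """
    def __init__(self, vision_dim=1024, llm_dim=2048):
        super().__init__()
        # Stand-in for a real vision encoder (e.g. a ViT) that produces one
        # embedding per image patch.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, vision_dim, kernel_size=14, stride=14),  # patchify
            nn.Flatten(start_dim=2),                              # (B, D, P)
        )
        # The projector: a small MLP mapping visual embeddings into the
        # language model's token-embedding space so the LLM can attend to them.
        self.projector = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.encoder(images).transpose(1, 2)  # (B, patches, vision_dim)
        return self.projector(feats)                  # (B, patches, llm_dim)

bridge = VisionToLLMBridge()
visual_tokens = bridge(torch.randn(1, 3, 336, 336))   # a 336x336 RGB image
print(visual_tokens.shape)  # torch.Size([1, 576, 2048])
```

The LLM then treats those projected vectors as if they were ordinary token embeddings, interleaving them with the text prompt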
Granite Vision was trained on 80.3 million image-text pairs built from document images and 16.3 million pairs built from natural photographs, in addition to raw images
Granite Vision was designed to understand this material, which consisted mostly of tables, charts, and diagrams of business processes
Using an AI to deconstruct visual materials can save time in the workplace
This could mean analyzing hundreds of invoices at once, detecting product flaws, or extracting details about auto accidents from photos
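For a concrete sense of how such a query might look, here is a hedged sketch using the Hugging Face transformers API; the model id, file name, and prompt are assumptions for illustration, and IBM's model card should be treated as authoritative.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

# Assumed Hugging Face model id; check IBM's model card for the current release
MODEL_ID = "ibm-granite/granite-vision-3.2-2b"

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForVision2Seq.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

image = Image.open("invoice.png")  # hypothetical scanned invoice
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "List the invoice number, date, and total amount."},
    ],
}]
# Build the chat prompt, bundle it with the image, and generate an answer
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```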
It is difficult to train large language models (LLMs) and VLMs on multi-page data because model context windows are often too small
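One common workaround for that limit, sketched below under the assumption of a hypothetical per-page query function, is to split a long document into pages, query the model once per page, and merge the answers afterward; cross-page reasoning is traded for calls that each fit inside the window.

```python
from typing import Callable, List

def answer_over_pages(
    pages: List[str],                    # paths to per-page images
    question: str,
    ask_vlm: Callable[[str, str], str],  # hypothetical (image_path, prompt) -> answer
) -> List[str]:
    """Query a VLM one page at a time so no single call exceeds the context window.

    A second pass over the collected answers (e.g. with a text-only LLM)
    can stitch the per-page results back into one response.
    """
    answers = []
    for i, page in enumerate(pages):
        prompt = f"(Page {i + 1} of {len(pages)}) {question}"
        answers.append(ask_vlm(page, prompt))
    return answers
```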