Massive Multitask Agent Understanding (MMAU)

The new Massive Multitask Agent Understanding (MMAU) benchmark evaluates the capabilities of large language models as agents

While beneficial, current benchmarks often focus on specific application scenarios and end-to-end task completion without assessing the underlying skills that drive performance

Setting up these environments is laborious, and reproducibility and reliability issues may arise, especially in interactive tasks

Understanding, Reasoning, Planning, Problem-solving, and Self-correction are the five key competencies covered by MMAU, evaluated across tasks such as Directed Acyclic Graph (DAG) QA

Researchers provide comprehensive and insightful assessments by evaluating 18 representative models on MMAU

When an LLM comes across a challenging maths problem, several skills are needed to answer it: the model must understand the question, reason over the relevant facts, plan a solution path, work through it, and correct its own mistakes along the way

MMAU comprises 3,220 unique prompts collected from a variety of data sources
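As a rough illustration only, each prompt in a benchmark of this kind can be thought of as a record tagged with the task it comes from and the skill it targets; the field names below are hypothetical and not MMAU's actual schema

```python
# Hypothetical structure for a single MMAU-style prompt record.
# Field names and values are illustrative, not the benchmark's real format.
example_prompt = {
    "id": "dag_qa_0042",
    "task": "DAG QA",               # the task family the prompt was drawn from
    "capability": "Planning",       # one of the five evaluated skills
    "prompt": "Given the dependency graph below, order the steps ...",
    "reference_answer": "B -> A -> D -> C",
}
```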

The five main skills that MMAU probes in models are understanding, reasoning, planning, problem-solving, and self-correction
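A minimal sketch of how per-skill scores could be aggregated once each graded prompt is tagged with the capability it tests; the record format and scoring here are assumptions for illustration, not MMAU's actual evaluation code

```python
from collections import defaultdict

def per_skill_accuracy(results):
    """Aggregate graded prompts into one accuracy score per capability.

    `results` is an iterable of dicts with the (assumed) keys
    'capability' and 'correct', e.g. produced by comparing a model's
    answer against the reference answer for each prompt.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for r in results:
        totals[r["capability"]] += 1
        correct[r["capability"]] += int(r["correct"])
    return {skill: correct[skill] / totals[skill] for skill in totals}

# Example: two graded prompts covering different skills
scores = per_skill_accuracy([
    {"capability": "Planning", "correct": True},
    {"capability": "Self-correction", "correct": False},
])
print(scores)  # {'Planning': 1.0, 'Self-correction': 0.0}
```

Reporting one score per skill, rather than a single task-level number, is what lets a benchmark like MMAU attribute a model's failures to specific capabilities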