Structured Object Language Model

Amazon’s SoLM is a lightweight NLP model designed to generate structured objects that strictly adhere to specific schemas

SoLM is trained to produce outputs only within a defined schema, unlike general-purpose LLMs that require extensive prompt engineering for structured outputs
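
To make "within a defined schema" concrete, here is a minimal sketch of a schema-conformance check. The schema, its field names, and the `conforms` helper are all hypothetical illustrations and are not part of SoLM; the point is only that a schema-compliant object has exactly the expected fields with the expected types.

```python
from typing import Any

# Hypothetical product schema: field name -> expected type or nested schema.
PRODUCT_SCHEMA = {
    "title": str,
    "brand": str,
    "attributes": {"color": str, "weight_grams": int},
}


def conforms(obj: Any, schema: Any) -> bool:
    """Return True if obj has exactly the fields and types the schema names."""
    if isinstance(schema, dict):
        # A nested schema requires a dict with exactly the same keys.
        if not isinstance(obj, dict) or set(obj) != set(schema):
            return False
        return all(conforms(obj[key], sub) for key, sub in schema.items())
    # Leaf: require the exact type (so True is not accepted as an int).
    return type(obj) is schema
```

A general-purpose LLM would typically need a check like this applied after generation, whereas the claim above is that SoLM is trained so its outputs already satisfy it.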

The model uses self-supervised denoising during training, where noise is added to structured data and the model learns to restore the original structure
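
The denoising idea can be sketched as building (noisy input, clean target) training pairs. The noise types used here (dropping a field, blanking a value), the probabilities, and the function name `make_denoising_pair` are assumptions made for illustration; the source does not specify which kinds of noise SoLM uses.

```python
import copy
import random


def make_denoising_pair(record: dict, rng: random.Random,
                        drop_prob: float = 0.3, blank_prob: float = 0.3):
    """Corrupt a structured record and return (noisy_input, clean_target)."""
    noisy = {}
    for key, value in record.items():
        roll = rng.random()
        if roll < drop_prob:
            continue  # assumed noise type 1: drop the field entirely
        if roll < drop_prob + blank_prob:
            noisy[key] = ""  # assumed noise type 2: keep the key, blank the value
        else:
            noisy[key] = copy.deepcopy(value)  # leave the field untouched
    # The training target is always the original, uncorrupted record.
    return noisy, copy.deepcopy(record)
```

Because the pairs are derived from existing structured data, no manual labels are needed, which is what makes the setup self-supervised.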

Confidence-aware substructure beam search (CABS) is used at inference time, reducing hallucinations and improving the accuracy of generated objects
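
The source only names CABS, so the following is a toy sketch under stated assumptions rather than the published algorithm: it assumes the "substructures" are key-value pairs, that some scorer (hypothetical here) assigns each candidate value a confidence in (0, 1], and that pairs below a threshold are left out of the object instead of being emitted.

```python
import math


def substructure_beam_search(candidates: dict, beam_width: int = 2,
                             min_confidence: float = 0.5) -> dict:
    """Toy beam search over key-value pairs.

    candidates maps each field to [(value, confidence), ...] as produced by a
    hypothetical scorer; this is an illustrative assumption, not SoLM's API.
    """
    beams = [({}, 0.0)]  # (partial object, cumulative log-confidence)
    for field, options in candidates.items():
        # Keep only candidate values the scorer is confident about.
        confident = [(v, c) for v, c in options if c >= min_confidence]
        expanded = []
        for partial, score in beams:
            if not confident:
                # No trustworthy value: omit the field rather than guess.
                expanded.append((partial, score))
                continue
            for value, conf in confident:
                expanded.append(({**partial, field: value}, score + math.log(conf)))
        # Prune to the best-scoring partial objects.
        expanded.sort(key=lambda item: item[1], reverse=True)
        beams = expanded[:beam_width]
    return beams[0][0]
```

In this sketch a low-confidence field such as `{"material": [("steel", 0.3)]}` is simply dropped from the result, which illustrates how confidence-based pruning can trade recall for fewer hallucinated values. A real decoder would presumably rescore candidates conditioned on the partial object built so far; here the candidates are fixed for simplicity.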

SoLM is an order of magnitude more cost-efficient than state-of-the-art LLMs while matching or exceeding their output accuracy

In product attribute generation, at a fixed 90% accuracy, CABS decoding improved recall by 16.7% over traditional beam search

SoLM can transform unstructured or partially structured data into clean, schema-compliant records, handling both descriptive text and structured attributes