IBM LSM: New Watson Large Speech Model

Most people are familiar with the term large language model, or LLM, thanks to generative AI’s remarkable ability to generate text and images.

Rigid conversational experiences (yes, Interactive Voice Response, or IVR) remain the norm in today’s contact centers. Unlock the mysteries of Large Speech Models (LSMs).

The development teams of IBM Watsonx and IBM Research have spent the last few months building a brand-new, cutting-edge Large Speech Model (LSM).

IBM’s LSM is designed for customer care use cases such as real-time call transcription and self-service phone assistants.

IBM is thrilled to announce the launch of new LSMs in both English and Japanese. They are currently available only in closed beta to Watson Speech to Text and Watsonx Assistant phone customers.

According to IBM’s internal benchmarking, the new LSM outperforms OpenAI’s Whisper model on short-form English use cases, making it IBM’s most accurate speech model to date.

With five times fewer parameters than the Whisper model, IBM’s LSM processes audio ten times faster on the same hardware.

When processing an audio file shorter than 30 seconds, say 12 seconds, Whisper pads the audio with silence and still processes a full 30 seconds, whereas IBM’s LSM finishes processing once the 12 seconds of audio are complete. For more details, visit Govindhtech.com.
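The 12-second example can be worked through with a toy calculation. The sketch below is only an illustration of the arithmetic, not how Whisper or IBM's LSM is implemented: the function names and the `WINDOW_SECONDS` constant are hypothetical, and it assumes a block-mode model simply rounds the input up to whole 30-second windows while a streaming model handles only the audio it receives.

```python
import math

# Assumed fixed block length for the block-mode model (hypothetical constant).
WINDOW_SECONDS = 30.0


def block_mode_seconds(audio_seconds: float, window: float = WINDOW_SECONDS) -> float:
    """Seconds of audio processed when the input is padded to whole windows."""
    if audio_seconds <= 0:
        return 0.0
    # Round up to the next full window; the remainder is filled with silence.
    return math.ceil(audio_seconds / window) * window


def streaming_seconds(audio_seconds: float) -> float:
    """Seconds of audio processed when the input is consumed as it arrives."""
    return max(audio_seconds, 0.0)


clip = 12.0
padded = block_mode_seconds(clip)
print(f"block mode: {padded} s, streaming: {streaming_seconds(clip)} s, "
      f"silence padding: {padded - clip} s")
```

Under these assumptions, a 12-second clip costs a block-mode model 30 seconds of audio to process, 18 of which are silence, while a streaming model handles only the 12 seconds that were spoken.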
