Amazon Bedrock Model Evaluation: Analyzing Model Performance
This new feature lets you choose the foundation model that produces the best results for your specific use case, making it easier to integrate generative AI into your application.
If you choose text classification, for example, you can evaluate robustness and/or accuracy against either a built-in dataset or your own dataset.
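As a minimal sketch, the task type, dataset, and metrics for an automatic evaluation could be expressed as a config like the following. The dataset name, S3 URI, and the exact metric names are illustrative assumptions, not confirmed values from this article:

```python
# Sketch of the evaluation-config portion of an automatic model
# evaluation job for text classification. All names and URIs below
# are placeholder assumptions.
def build_text_classification_config(dataset_s3_uri=None):
    """Build an automatic-evaluation config for a classification task.

    If dataset_s3_uri is None, the job would rely on a built-in
    dataset; otherwise it points at your own dataset in S3.
    """
    dataset = {"name": "MyClassificationDataset"}  # placeholder name
    if dataset_s3_uri:
        dataset["datasetLocation"] = {"s3Uri": dataset_s3_uri}
    return {
        "automated": {
            "datasetMetricConfigs": [
                {
                    "taskType": "Classification",
                    "dataset": dataset,
                    # Assumed built-in metric identifiers
                    "metricNames": ["Builtin.Accuracy", "Builtin.Robustness"],
                }
            ]
        }
    }

config = build_text_classification_config("s3://my-bucket/eval/dataset.jsonl")
```

A dict of this shape would then be passed as part of the request that creates the evaluation job.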
For all human-based evaluation settings, and for some combinations of job types and metrics in automatic evaluation, the reference response is optional.
The built-in datasets are designed to evaluate specific task types and metrics, so you can select whichever ones fit your job.
The status of every model evaluation job you own is accessible through the console and the newly added GetEvaluationJob API.
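A simple way to track a job through this API is to poll its status until it leaves the in-progress state. The helper below is a sketch: it takes any status-returning callable, so the boto3 call shown in the docstring is an assumption you would wire in yourself:

```python
import time

def wait_for_job(get_status, poll_seconds=0, max_polls=10):
    """Poll a model-evaluation job until it is no longer "InProgress".

    `get_status` is any callable returning the job's current status
    string, e.g. (assumed boto3 usage, client name and field are
    illustrative):
        lambda: bedrock.get_evaluation_job(jobIdentifier=job_arn)["status"]
    Returns the last observed status.
    """
    for _ in range(max_polls):
        status = get_status()
        if status != "InProgress":
            return status
        time.sleep(poll_seconds)
    return "InProgress"
```

Passing the fetcher as a callable keeps the polling logic testable without AWS credentials.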
Retrieve and Review the Evaluation Report
Download the report and assess the model's performance against the metrics you selected earlier.
Using the console or the recently introduced model evaluation API, you can now stop a running job.
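When stopping jobs programmatically, it is worth guarding against jobs that have already finished. The sketch below isolates that guard; the `stop` callable would wrap the actual stop call (an assumed boto3 invocation such as `bedrock.stop_evaluation_job(jobIdentifier=job_arn)`):

```python
def stop_if_running(status, stop):
    """Invoke `stop()` only when the job is still stoppable.

    `status` is the job's current status string; `stop` is a callable
    wrapping the SDK stop call (illustrative, not shown here).
    Returns True if a stop was issued, False otherwise.
    """
    if status == "InProgress":
        stop()
        return True
    return False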