
ServiceNow Launches EnterpriseOps-Gym for LLM Evaluation

This article was generated with the help of AI and may contain errors.

ServiceNow Research has introduced EnterpriseOps-Gym, a new benchmark for assessing autonomous agents in professional environments. It targets a gap in existing evaluations: the lack of standard tests for long-horizon planning and complex workflows.

EnterpriseOps-Gym: New Benchmark for LLMs

EnterpriseOps-Gym is a containerized Docker environment that simulates eight critical business domains, including customer service and IT services. The benchmark comprises 164 relational databases, 512 functional tools, and 1,150 expert-curated tasks.

The benchmark is designed to measure how well large language models (LLMs) perform in complex professional settings. Results show that existing models achieve success rates below 40%, which points to a need for improvements in strategic planning and task execution.

Source: MarkTechPost
