← Rankings · AI-RELATED REPO

NavidBroumandfar/agent-behavior-evals-lab

Policy-mapped evaluation lab for AI assistants and agents: approval gates, refusal boundaries, uncertainty handling, tool-use grounding, JSONL traces, and regression reports.

Python GitHub ↗

★ 0

stars

AI relevance

solo dev

tool sigs

SUMMARY AI summary by gpt-5-mini

Agent Behavior Evals Lab is a local evaluation harness for testing AI assistants and agentic systems against policy-defined behavior expectations. It’s intended for engineers, adapter developers, evaluators, and researchers who need reproducible, policy-driven assessments of agent behavior (including future adapters and saved transcripts). It is not an OpenClaw-only project and the current milestone uses a deterministic mock client. Key features: a behavior policy and failure taxonomy; 30 JSONL eval cases across categories; target profiles and system prompts; a deterministic mock model client and rule-based scorer; end-to-end mock eval runner producing scored JSONL traces and a Markdown baseline report. Tooling includes schema/manifest validators, trace/report writers, saved-transcript replay, adapter fixture import/validation, adjudication-aware reporting, regression/comparison checks, and utilities for promoting reviewed outputs.

Language breakdown (by bytes)

Python

100%

Owner

Account

NavidBroumandfar

Type

User

Followers

Applied AI • Workflow Automation • Decision Support

Dates

Created on GitHub	2026-05-06
Last push	2026-05-09
First seen here	2026-05-09
Last fetched	2026-05-09 18:17

Similar repos (same language)

su5176/Mklink-AI-Probe

MKLink AI Probe is an all-in-one embedded debugging CLI for Cortex-M microcontrollers. It bridges the MKLink/MicroLink hardware probe with AI agents (Claude, OpenAI) to enable natural-language driven firmware flashing, real-time waveform visualization, memory inspection, fault diagnosis, and industrial protocol debugging.

Python ★ 11 AI 65

SuvenSeo/codex-usage-tracker

Local-first usage dashboard for Codex, Claude Code, Cursor, and WakaTime AI coding activity

Python ★ 5 AI 100

Kerlewor/narrative_workbench

面向 Claude Code 与 Codex CLI 的 AI 小说写作工程框架 | Inspired by InkOS | AGPL-3.0

Python ★ 4 AI 70

convictional/souls-only

A font for humans no AI and keyboard firmware to type in it.

Python ★ 3 AI 45

webdevtodayjason/hermes-embodiment

Give your Hermes agent a body — an animated face, live RGB presence, and mood, all driven by the agent's real state. Minnie is the flagship example.