AI Dev Impact Lab JA
← Rankings · AI-RELATED REPO

NavidBroumandfar/agent-behavior-evals-lab

Policy-mapped evaluation lab for AI assistants and agents: approval gates, refusal boundaries, uncertainty handling, tool-use grounding, JSONL traces, and regression reports.

Python GitHub ↗
★ 0
stars
75
AI relevance
50
solo dev
0
tool sigs

SUMMARY AI summary by gpt-5-mini

Agent Behavior Evals Lab is a local evaluation harness for testing AI assistants and agentic systems against policy-defined behavior expectations. It’s intended for engineers, adapter developers, evaluators, and researchers who need reproducible, policy-driven assessments of agent behavior (including future adapters and saved transcripts). It is not an OpenClaw-only project and the current milestone uses a deterministic mock client. Key features: a behavior policy and failure taxonomy; 30 JSONL eval cases across categories; target profiles and system prompts; a deterministic mock model client and rule-based scorer; end-to-end mock eval runner producing scored JSONL traces and a Markdown baseline report. Tooling includes schema/manifest validators, trace/report writers, saved-transcript replay, adapter fixture import/validation, adjudication-aware reporting, regression/comparison checks, and utilities for promoting reviewed outputs.

Language breakdown (by bytes)

Python
100%

Owner

Account
NavidBroumandfar
Type
User
Followers
2

Applied AI • Workflow Automation • Decision Support

Dates

Created on GitHub 2026-05-06
Last push 2026-05-09
First seen here 2026-05-09
Last fetched 2026-05-09 18:17

Similar repos (same language)

su5176/Mklink-AI-Probe

MKLink AI Probe is an all-in-one embedded debugging CLI for Cortex-M microcontrollers. It bridges the MKLink/MicroLink hardware probe with AI agents (Claude, OpenAI) to enable natural-language driven firmware flashing, real-time waveform visualization, memory inspection, fault diagnosis, and industrial protocol debugging.

Python 11 AI 65
SuvenSeo/codex-usage-tracker

Local-first usage dashboard for Codex, Claude Code, Cursor, and WakaTime AI coding activity

Python 5 AI 100
Kerlewor/narrative_workbench

面向 Claude Code 与 Codex CLI 的 AI 小说写作工程框架 | Inspired by InkOS | AGPL-3.0

Python 4 AI 70
convictional/souls-only

A font for humans no AI and keyboard firmware to type in it.

Python 3 AI 45
webdevtodayjason/hermes-embodiment

Give your Hermes agent a body — an animated face, live RGB presence, and mood, all driven by the agent's real state. Minnie is the flagship example.

Python 3 AI 70
DeconBear/learn-ai

通过一个仓库,拆解 AI 的运行逻辑和建构方式,培养从 0 到 1 设计和训练 AI 模型的能力

Python 2 AI 100
Junaid-Ahmed-Rupok/continuum-rag-chatbot

🧠 A persistent memory RAG chatbot that never forgets. Uses Phi-3-mini LLM, ChromaDB vector database, and Ebbinghaus memory decay curves.

Python 1 AI 100
genjibyte/AI-Test-Platform
Python 1 AI 50