SUMMARY AI summary by gpt-5-mini
Agent Behavior Evals Lab is a local evaluation harness for testing AI assistants and agentic systems against policy-defined behavior expectations. It’s intended for engineers, adapter developers, evaluators, and researchers who need reproducible, policy-driven assessments of agent behavior (including future adapters and saved transcripts). It is not an OpenClaw-only project and the current milestone uses a deterministic mock client. Key features: a behavior policy and failure taxonomy; 30 JSONL eval cases across categories; target profiles and system prompts; a deterministic mock model client and rule-based scorer; end-to-end mock eval runner producing scored JSONL traces and a Markdown baseline report. Tooling includes schema/manifest validators, trace/report writers, saved-transcript replay, adapter fixture import/validation, adjudication-aware reporting, regression/comparison checks, and utilities for promoting reviewed outputs.
Language breakdown (by bytes)
Owner
Applied AI • Workflow Automation • Decision Support
Dates
| Created on GitHub | 2026-05-06 |
| Last push | 2026-05-09 |
| First seen here | 2026-05-09 |
| Last fetched | 2026-05-09 18:17 |
Similar repos (same language)
MKLink AI Probe is an all-in-one embedded debugging CLI for Cortex-M microcontrollers. It bridges the MKLink/MicroLink hardware probe with AI agents (Claude, OpenAI) to enable natural-language driven firmware flashing, real-time waveform visualization, memory inspection, fault diagnosis, and industrial protocol debugging.
SuvenSeo/codex-usage-trackerLocal-first usage dashboard for Codex, Claude Code, Cursor, and WakaTime AI coding activity
Kerlewor/narrative_workbench面向 Claude Code 与 Codex CLI 的 AI 小说写作工程框架 | Inspired by InkOS | AGPL-3.0
convictional/souls-onlyA font for humans no AI and keyboard firmware to type in it.
webdevtodayjason/hermes-embodimentGive your Hermes agent a body — an animated face, live RGB presence, and mood, all driven by the agent's real state. Minnie is the flagship example.
DeconBear/learn-ai通过一个仓库,拆解 AI 的运行逻辑和建构方式,培养从 0 到 1 设计和训练 AI 模型的能力
Junaid-Ahmed-Rupok/continuum-rag-chatbot🧠 A persistent memory RAG chatbot that never forgets. Uses Phi-3-mini LLM, ChromaDB vector database, and Ebbinghaus memory decay curves.
genjibyte/AI-Test-Platform