← ランキング · AI関連リポジトリ

NavidBroumandfar/agent-behavior-evals-lab

Policy-mapped evaluation lab for AI assistants and agents: approval gates, refusal boundaries, uncertainty handling, tool-use grounding, JSONL traces, and regression reports.

Python GitHub ↗

★ 0

stars

AI関連スコア

個人開発度

AIツール痕跡

SUMMARY AI要約 by gpt-5-mini

Agent Behavior Evals Lab is a local evaluation harness for testing AI assistants and agentic systems against policy-defined behavior expectations. It’s intended for engineers, adapter developers, evaluators, and researchers who need reproducible, policy-driven assessments of agent behavior (including future adapters and saved transcripts). It is not an OpenClaw-only project and the current milestone uses a deterministic mock client. Key features: a behavior policy and failure taxonomy; 30 JSONL eval cases across categories; target profiles and system prompts; a deterministic mock model client and rule-based scorer; end-to-end mock eval runner producing scored JSONL traces and a Markdown baseline report. Tooling includes schema/manifest validators, trace/report writers, saved-transcript replay, adapter fixture import/validation, adjudication-aware reporting, regression/comparison checks, and utilities for promoting reviewed outputs.

使用言語(バイト数比)

Python

100%

オーナー情報

アカウント

NavidBroumandfar

タイプ

User

フォロワー

Applied AI • Workflow Automation • Decision Support

日付

GitHub作成日	2026-05-06
最終Push	2026-05-09
当サイト初検出	2026-05-09
最終取得	2026-05-09 18:17

類似Repo (同じ言語のAI関連Repo)

bgzhang1/sw2api

Reverse proxy for your ai quota from the SW platform.

Python ★ 19 AI 45

dea6cat/2b-agent

This llm Agent was created based on necessity for one that could simply use local models without making them hallucinate and keep them focus

Python ★ 1 AI 75

Zuboy/Carbon-Cost-Optimizer-Agentic-AI

An AI agent that decides where and when to run ML training jobs to minimize cost and carbon emissions, then launches them autonomously — exposed entirely through MCP tools.

Python ★ 1 AI 75

FuchaZ/lm-studio-vision-bridge

给纯文本 AI agent 装上眼睛——通过 LM Studio 本地视觉模型提供 MCP 图片识别服务

Python ★ 1 AI 45

Shrutirowlo/AI-Video-Agent

AI-powered video generation agent that automates script analysis, scene planning, and video creation.

Python ★ 1 AI 75

Shivang0/social-vision

Paste an Instagram/TikTok/YouTube/X link and Claude watches it for you — transcribes audio, reads on-screen text & visuals, explains what it's about. Local, cross-platform Claude Code plugin.

Python ★ 1 AI 70

markusbegerow/claude-fuer-oeffentliche-verwaltung

⚠️ Experimentelle Skill-Sammlung für Digitalisierung, Fachverfahren und Organisation der deutschen öffentlichen Verwaltung (OZG, FIM, FIT-Connect, KI-Einsatz, eIDAS, BSI-IT-Grundschutz, DSGVO) – bitte testen, Issues/PRs willkommen! Keine Rechts-/Datenschutzberatung, keine verbindliche Behördenentscheidung. Keine Bürger-/Mitarbeiterdaten im Repo.

Python ★ 1 AI 75

knowledgegut/course-distiller

把有權使用的工作坊/課程回放消化成套品牌色的步驟式學習 PDF；最後的重組交還人腦。A Claude Code skill by 知識包小腸.

Python ★ 1 AI 45