Scoring formulas & detection rules
Every number on this site is computed by the formulas defined here — AI relevance scoring, AI coding tool detection, and adoption counts for all 13 categories. Reproducible by design.
DEFINITION 1. Two-axis AI definition
"Is this an AI project?" cannot be captured with a single criterion. We measure two orthogonal axes.
AI-built (is_ai_built)
Whether the repo's code was written with help from AI coding tools. The repo's purpose is irrelevant — could be a chatbot or a todo app.
Detection: GraphQL checks the existence of any of these files at HEAD:
- `CLAUDE.md` → Claude Code
- `.cursorrules` → Cursor
- `AGENTS.md` → OpenAI Codex / generic
- `.windsurfrules` → Windsurf
- `.github/copilot-instructions.md` → GitHub Copilot
- `.aider.conf.yml` → Aider
- `.clinerules` → Cline
AI-using (is_ai_using)
Whether the repo's code embeds AI capabilities. If it depends on an AI SDK, it definitively integrates AI.
Detection: Check that a dependency manifest references an AI SDK:
- Python: requirements.txt / pyproject.toml / Pipfile
- JS/TS: package.json
- Go: go.mod
- Rust: Cargo.toml
AI SDKs detected: openai, anthropic, langchain, llama-index, huggingface, cohere, mistralai, replicate, qdrant, pinecone, chroma, weaviate, litellm, ollama …
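The manifest check can be sketched as follows. This is a minimal illustration, not the site's actual detector: the function names are hypothetical, only two manifest types are parsed, and the substring match against package names is an assumption (the text above does not say whether matching is exact or partial). Only the SDK names listed above are included; the elided remainder of the list is left out.

```python
import json
import re

# SDK names copied from the list above (the source list continues beyond these).
AI_SDKS = {
    "openai", "anthropic", "langchain", "llama-index", "huggingface", "cohere",
    "mistralai", "replicate", "qdrant", "pinecone", "chroma", "weaviate",
    "litellm", "ollama",
}


def requirements_packages(text):
    """Extract normalized package names from requirements.txt content."""
    names = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()      # drop comments
        if not line or line.startswith("-"):      # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(match.group(0).lower().replace("_", "-"))
    return names


def package_json_packages(text):
    """Extract dependency names from package.json content."""
    data = json.loads(text)
    names = set()
    for key in ("dependencies", "devDependencies"):
        names.update(name.lower() for name in (data.get(key) or {}))
    return names


def is_ai_using(packages):
    """Assumption: a package counts if any SDK name appears inside its name."""
    return any(sdk in pkg for pkg in packages for sdk in AI_SDKS)


print(is_ai_using(requirements_packages("qdrant-client==1.7\nrequests\n")))
print(is_ai_using(package_json_packages('{"dependencies": {"@anthropic-ai/sdk": "^0.20.0"}}')))
```

Substring matching catches scoped or suffixed names such as `@anthropic-ai/sdk` and `qdrant-client`, at the cost of occasional false positives on unrelated packages that happen to contain an SDK name.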
ESTIMATION 2. Sample-based GitHub-wide estimation
GitHub has 100M+ repositories — exhaustive detection is infeasible. We estimate global ratios via statistical sampling.
- Random sample of N=2,000 from BigQuery GH Archive's active set for the target month (deterministic via FARM_FINGERPRINT)
- Detect is_ai_built / is_ai_using on each sampled repo via GraphQL
- Sample ratios: built/N, using/N, plus both/either
- Estimated GitHub-wide count = sample ratio × GH-Archive monthly active total
N=2,000 gives roughly ±1 point at 95% confidence for ratios near 5% (the worst case, a ratio of 50%, is about ±2.2 points). Samples are independent across months.
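The sampling and scaling steps above can be sketched like this. Several parts are assumptions made for illustration: SHA-256 stands in for BigQuery's FARM_FINGERPRINT (which is not available in the Python standard library), the "order by hash, take the first N" pattern is one common way to get a deterministic sample, the interval uses the normal approximation, and the repo names, hit count, and active total are made-up numbers.

```python
import hashlib
import math


def deterministic_sample(repo_names, month, n):
    """Pick n repos reproducibly by ordering on a hash of (month, repo name)."""
    def key(name):
        digest = hashlib.sha256(f"{month}:{name}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")
    return sorted(repo_names, key=key)[:n]


def estimate(hits, n, active_total, z=1.96):
    """Scale a sample ratio to a GitHub-wide count, with a 95% normal-approximation margin."""
    ratio = hits / n
    margin = z * math.sqrt(ratio * (1 - ratio) / n)
    return {
        "ratio": ratio,
        "margin": margin,
        "estimated_count": round(ratio * active_total),
        "low": round(max(ratio - margin, 0.0) * active_total),
        "high": round(min(ratio + margin, 1.0) * active_total),
    }


repos = [f"owner/repo-{i}" for i in range(10_000)]        # hypothetical active set
sample = deterministic_sample(repos, "2025-01", 2_000)
result = estimate(hits=100, n=len(sample), active_total=5_000_000)  # hypothetical values
print(f"{result['ratio']:.1%} ± {result['margin']:.2%}, about {result['estimated_count']:,} repos")
```

Because the hash key includes the month, the same repo set yields the same sample on every rerun for a given month, while different months draw different samples.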
AUX 3. Auxiliary scores (reference)
Legacy auxiliary scores per tracked repo (0-100). The site's primary axis is the two-axis AI detection above; these are kept for repo-level filtering and detail pages.
AI relevance (0-100)
AI keyword density across name/desc/topics/README. ≥40 → tracked.
Solo dev likelihood (0-100)
Heuristic: owner_type, low followers, has README, multiple pushes, has homepage.
Web launch (0-100)
homepage URL + deployed on Vercel/Netlify + landing keywords in README.
Continuity rate
Fraction of repos created 3 months ago that pushed within last 30 days.
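As one concrete reading of the continuity rate, the sketch below treats "created 3 months ago" as a cohort defined by an explicit date range and counts the members whose last push falls within the final 30 days. The function name, the record fields, and the sample dates are all hypothetical; the exact cohort boundaries the site uses are not stated above.

```python
from datetime import date, timedelta


def continuity_rate(repos, today, cohort_start, cohort_end, window_days=30):
    """Fraction of repos created in [cohort_start, cohort_end] that pushed recently."""
    cohort = [r for r in repos if cohort_start <= r["created_at"] <= cohort_end]
    if not cohort:
        return None  # undefined for an empty cohort
    cutoff = today - timedelta(days=window_days)
    still_active = sum(1 for r in cohort if r["pushed_at"] >= cutoff)
    return still_active / len(cohort)


today = date(2025, 6, 1)
repos = [
    {"created_at": date(2025, 3, 5), "pushed_at": date(2025, 5, 20)},   # in cohort, still active
    {"created_at": date(2025, 3, 18), "pushed_at": date(2025, 3, 19)},  # in cohort, abandoned
    {"created_at": date(2025, 3, 30), "pushed_at": date(2025, 5, 10)},  # in cohort, still active
    {"created_at": date(2025, 4, 2), "pushed_at": date(2025, 5, 30)},   # outside the cohort
]
rate = continuity_rate(repos, today, date(2025, 3, 1), date(2025, 3, 31))
print(rate)
```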
DETECT 3. AI coding tool detection
Via GitHub GraphQL `object()` we check whether the following files/directories exist at HEAD. Hits are recorded in `repo_ai_signals` and feed the AI coding tools ranking.
| File / dir | Tool | Vendor |
|---|---|---|
| CLAUDE.md / .claude/ | Claude Code | Anthropic |
| AGENTS.md | OpenAI Codex / generic | OpenAI |
| .cursorrules / .cursor/rules/ | Cursor | Anysphere |
| .github/copilot-instructions.md | GitHub Copilot | GitHub |
| .windsurfrules | Windsurf | Codeium |
| .aider.conf.yml | Aider | Aider |
| .clinerules | Cline | Cline |
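A single GraphQL request can probe every marker path at once by aliasing one `object(expression: "HEAD:<path>")` field per path; GitHub returns `null` for paths that do not exist. The sketch below only builds the query text and interprets a response. The alias scheme and function names are illustrative assumptions, and sending the request (authentication, HTTP client) is left out.

```python
# Marker paths from the table above, mapped to the tool they indicate.
MARKERS = {
    "CLAUDE.md": "Claude Code",
    ".claude": "Claude Code",
    "AGENTS.md": "OpenAI Codex / generic",
    ".cursorrules": "Cursor",
    ".cursor/rules": "Cursor",
    ".github/copilot-instructions.md": "GitHub Copilot",
    ".windsurfrules": "Windsurf",
    ".aider.conf.yml": "Aider",
    ".clinerules": "Cline",
}

# GraphQL aliases must be plain names, so each path gets a generated alias (m0, m1, ...).
ALIASES = {f"m{i}": path for i, path in enumerate(MARKERS)}


def build_query():
    """Build one query that checks every marker path at HEAD."""
    fields = "\n".join(
        f'    {alias}: object(expression: "HEAD:{path}") {{ __typename }}'
        for alias, path in ALIASES.items()
    )
    return (
        "query($owner: String!, $name: String!) {\n"
        "  repository(owner: $owner, name: $name) {\n"
        f"{fields}\n"
        "  }\n"
        "}"
    )


def detected_tools(response):
    """Return the tools whose marker path resolved to a non-null object."""
    repo = (response.get("data") or {}).get("repository") or {}
    return sorted({MARKERS[path] for alias, path in ALIASES.items()
                   if repo.get(alias) is not None})


print(build_query())
```

The query takes `$owner` and `$name` as variables, so the same text can be reused for every sampled repo.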
ADOPTION 4. Adoption count for the 13 categories
For LLM provider / framework / vector DB / model categories, we substring-search a per-repo corpus (description + GitHub topics + AI-summarized README) against each category's keyword dictionary (`ai_keywords`, currently 377 active keywords). The count is the number of distinct repos that mention each keyword.
- Case collisions handled with BINARY collation (so `langchain` does not also catch `LangChain` by accident).
- Short noisy keywords (Lit / Bun / Gin / Yi / …) deactivated to reduce false positives.
- Models are aggregated at family level (Claude / GPT / Gemini …) instead of per-version to avoid fragmentation.
- Only latest content snapshots of repos with AI relevance ≥40 are scanned.
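The counting rule can be illustrated in a few lines. This is a sketch under assumptions: the corpora and keyword entries are invented, the keyword-to-label mapping is a hypothetical stand-in for the `ai_keywords` dictionary, and Python's case-sensitive `in` operator plays the role of BINARY collation.

```python
from collections import defaultdict


def adoption_counts(corpora, keyword_to_label):
    """Count distinct repos per label using case-sensitive substring matching."""
    repos_by_label = defaultdict(set)
    for repo, text in corpora.items():
        for keyword, label in keyword_to_label.items():
            if keyword in text:                   # case-sensitive, like BINARY collation
                repos_by_label[label].add(repo)   # a set keeps each repo counted once
    return {label: len(repos) for label, repos in repos_by_label.items()}


# Hypothetical per-repo corpora (description + topics + README summary).
corpora = {
    "a/chatbot": "Chatbot built with LangChain and Claude 3.5 Sonnet",
    "b/rag": "RAG demo using langchain, Claude and GPT-4o",
    "c/notes": "Plain note-taking app",
}
# Hypothetical keywords; several versions map to one model family.
keywords = {"LangChain": "LangChain", "Claude": "Claude", "GPT-4": "GPT", "GPT-4o": "GPT"}

counts = adoption_counts(corpora, keywords)
print(counts)
```

Collecting repos in a set per label means a repo that matches both `GPT-4` and `GPT-4o` still contributes one to the GPT family.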
CADENCE 5. Update cadence
| Job | Cadence | What |
|---|---|---|
| ai-index:daily | daily 01:10 JST | discover new AI repos, refresh scores |
| ai-index:weekly | Sun 03:10 JST | full repo re-scan |
| ai-index:monthly | 1st 04:10 JST | GH Archive aggregation, monthly metrics |
| ai-index:generate-report | 1st 05:30 JST | monthly report draft + auto-publish |
| ai-index:summarize | manual (within $3/mo budget) | README AI summarization (gpt-5-mini) |