FORMULAS & DETECTION · 377 tracked keywords

METHODOLOGY

Scoring formulas & detection rules

Every number on this site is computed by the formulas defined here — AI relevance scoring, AI coding tool detection, and adoption counts for all 13 categories. Reproducible by design.

DEFINITION 1. Two-axis AI definition

"Is this an AI project?" cannot be captured with a single criterion. We measure two orthogonal axes.

AI-built (is_ai_built)

Whether the repo's code was written with help from AI coding tools. The repo's purpose is irrelevant — could be a chatbot or a todo app.

Detection: a GitHub GraphQL query checks whether any of these files exists at HEAD:

CLAUDE.md → Claude Code
.cursorrules → Cursor
AGENTS.md → OpenAI Codex / generic
.windsurfrules → Windsurf
.github/copilot-instructions.md → GitHub Copilot
.aider.conf.yml → Aider
.clinerules → Cline

AI-using (is_ai_using)

Whether the repo's code embeds AI capabilities. We treat a dependency on an AI SDK as sufficient evidence that the repo integrates AI.

Detection: Check that a dependency manifest references an AI SDK:

Python: requirements.txt / pyproject.toml / Pipfile
JS/TS: package.json
Go: go.mod · Rust: Cargo.toml

AI SDKs detected: openai, anthropic, langchain, llama-index, huggingface, cohere, mistralai, replicate, qdrant, pinecone, chroma, weaviate, litellm, ollama …
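A minimal sketch of the manifest check, assuming exact package-name matching after light normalization. The function names, the normalization rule, and the shortened SDK set are illustrative assumptions, not the site's actual implementation; scoped npm packages and the other manifest formats would need their own rules.

```python
import json
import re

# Illustrative subset of the SDK names listed above; the real list is longer.
AI_SDKS = {"openai", "anthropic", "langchain", "llama-index",
           "cohere", "mistralai", "litellm", "ollama"}


def normalize(name: str) -> str:
    """Lowercase and treat '_' and '.' like '-' so 'llama_index' matches 'llama-index'."""
    return re.sub(r"[_.]+", "-", name.strip().lower())


def sdk_hits_requirements(text: str) -> set[str]:
    """Return the AI SDKs named in a requirements.txt-style manifest."""
    hits = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop trailing comments
        if not line or line.startswith("-"):   # skip blanks and pip options such as '-r'
            continue
        # The package name ends at the first version specifier, extra, or marker.
        name = normalize(re.split(r"[<>=!~;\[\s@]", line, maxsplit=1)[0])
        if name in AI_SDKS:
            hits.add(name)
    return hits


def sdk_hits_package_json(text: str) -> set[str]:
    """Return the AI SDKs named in a package.json (exact name match only)."""
    data = json.loads(text)
    hits = set()
    for section in ("dependencies", "devDependencies"):
        for name in data.get(section, {}):
            if normalize(name) in AI_SDKS:
                hits.add(normalize(name))
    return hits
```

Under this sketch a repo would count as AI-using when either function returns a non-empty set for any of its manifests.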

ESTIMATION 2. Sample-based GitHub-wide estimation

GitHub has 100M+ repositories — exhaustive detection is infeasible. We estimate global ratios via statistical sampling.

Random sample of N=2,000 from BigQuery GH Archive's active set for the target month (deterministic via FARM_FINGERPRINT)
Detect is_ai_built / is_ai_using on each sampled repo via GraphQL
Compute sample ratios: built/N, using/N, plus both/N and either/N
Estimated GitHub-wide count = sample ratio × GH-Archive monthly active total

N=2,000 gives about ±1pt at 95% confidence for ratios near 5%; the worst case (a ratio near 50%) is about ±2.2pt. Samples are independent across months.
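The scaling step and its sampling error can be sketched as follows, assuming a simple random sample and the normal approximation to the binomial. The function name and the example figures (100 hits, 1,000,000 active repos) are hypothetical and only illustrate the arithmetic.

```python
import math


def estimate(hits: int, n: int, active_total: int, z: float = 1.96) -> dict:
    """Scale a sample ratio to a GitHub-wide count with a normal-approximation interval.

    hits         -- sampled repos where the flag (e.g. is_ai_built) is true
    n            -- sample size (N=2,000 in the text)
    active_total -- GH Archive monthly active total for the same month
    z            -- 1.96 corresponds to 95% confidence
    """
    p = hits / n
    margin = z * math.sqrt(p * (1 - p) / n)   # half-width of the interval, as a ratio
    return {
        "ratio": p,
        "margin": margin,
        "count": round(p * active_total),
        "count_low": round(max(p - margin, 0.0) * active_total),
        "count_high": round(min(p + margin, 1.0) * active_total),
    }


# Hypothetical month: 100 of 2,000 sampled repos flagged, 1,000,000 active repos.
result = estimate(100, 2000, 1_000_000)
print(f"{result['ratio']:.1%} ± {result['margin']:.2%} -> about {result['count']:,} repos")
```

The margin depends on the ratio itself, so rare signals get a tighter absolute interval than common ones at the same N.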

AUX 3. Auxiliary scores (reference)

Legacy auxiliary scores per tracked repo (0-100). The site's primary axis is the two-axis AI detection above; these are kept for repo-level filtering and detail pages.

AI relevance (0-100)

AI keyword density across name/desc/topics/README. ≥40 → tracked.

Solo dev likelihood (0-100)

Heuristic: owner_type, low followers, has README, multiple pushes, has homepage.

Web launch (0-100)

homepage URL + deployed on Vercel/Netlify + landing keywords in README.

Continuity rate

Fraction of repos created three months ago that received a push within the last 30 days.

DETECT 3. AI coding tool detection

Via GitHub GraphQL `object()` we check whether the following files/directories exist at HEAD. Hits are recorded in `repo_ai_signals` and feed the AI coding tools ranking.

File / dir	Tool	Vendor
CLAUDE.md / .claude/	Claude Code	Anthropic
AGENTS.md	OpenAI Codex / generic	OpenAI
.cursorrules / .cursor/rules/	Cursor	Anysphere
.github/copilot-instructions.md	GitHub Copilot	GitHub
.windsurfrules	Windsurf	Codeium
.aider.conf.yml	Aider	Aider
.clinerules	Cline	Cline
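One way to probe all marker paths in a single request is to alias one `object(expression: "HEAD:<path>")` field per path; GitHub returns `null` for paths that do not exist. The sketch below only builds the query text and interprets a response. The alias scheme (`m0`, `m1`, …) and the function names are illustrative assumptions, directory paths are written without a trailing slash, and sending the request (authentication, HTTP client) is left out.

```python
import json

# Marker path -> tool, taken from the table above.
MARKERS = {
    "CLAUDE.md": "Claude Code",
    ".claude": "Claude Code",
    "AGENTS.md": "OpenAI Codex / generic",
    ".cursorrules": "Cursor",
    ".cursor/rules": "Cursor",
    ".github/copilot-instructions.md": "GitHub Copilot",
    ".windsurfrules": "Windsurf",
    ".aider.conf.yml": "Aider",
    ".clinerules": "Cline",
}


def build_query(owner: str, name: str) -> str:
    """Build one GraphQL query with an aliased object() lookup per marker path."""
    fields = []
    for i, path in enumerate(MARKERS):
        expr = json.dumps("HEAD:" + path)   # JSON string quoting is valid GraphQL quoting here
        fields.append(f"    m{i}: object(expression: {expr}) {{ __typename }}")
    body = "\n".join(fields)
    repo_args = f"owner: {json.dumps(owner)}, name: {json.dumps(name)}"
    return f"query {{\n  repository({repo_args}) {{\n{body}\n  }}\n}}"


def tools_from_response(response: dict) -> list[str]:
    """Map non-null aliases in a GraphQL response back to tool names."""
    repo = (response.get("data") or {}).get("repository") or {}
    found = {tool for i, tool in enumerate(MARKERS.values())
             if repo.get(f"m{i}") is not None}
    return sorted(found)
```

Two paths that map to the same tool (for example `CLAUDE.md` and `.claude`) collapse into a single hit in this sketch, so a repo is counted at most once per tool.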

ADOPTION 4. Adoption count for the 13 categories

For LLM provider / framework / vector DB / model categories, we substring-search a per-repo corpus (description + GitHub topics + AI-summarized README) against each category's keyword dictionary (`ai_keywords`, currently 377 active keywords). The count is the number of distinct repos that mention each keyword.

Case collisions are handled with BINARY collation, so matching is case-sensitive (`langchain` does not also catch `LangChain` by accident).
Short noisy keywords (Lit / Bun / Gin / Yi / …) are deactivated to reduce false positives.
Models are aggregated at family level (Claude / GPT / Gemini …) instead of per-version to avoid fragmentation.
Only latest content snapshots of repos with AI relevance ≥40 are scanned.
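The counting rule can be sketched in a few lines, assuming each repo's latest snapshot is available as a dict. The field names (`description`, `topics`, `summary`, `ai_relevance`) and the function name are hypothetical; Python's `in` operator is case-sensitive, which mirrors the effect of a binary collation.

```python
def adoption_counts(repos: list[dict], keywords: list[str],
                    min_relevance: int = 40) -> dict[str, int]:
    """Count distinct repos whose corpus contains each keyword (case-sensitive)."""
    counts = {kw: 0 for kw in keywords}
    for repo in repos:
        if repo["ai_relevance"] < min_relevance:
            continue  # only repos with AI relevance >= 40 are scanned
        corpus = " ".join([repo["description"],
                           " ".join(repo["topics"]),
                           repo["summary"]])
        for kw in keywords:
            if kw in corpus:      # substring match; a repo adds at most 1 per keyword
                counts[kw] += 1
    return counts


# Hypothetical snapshots, for illustration only.
repos = [
    {"description": "Built with LangChain and Qdrant", "topics": ["rag"],
     "summary": "", "ai_relevance": 80},
    {"description": "langchain demo", "topics": [], "summary": "", "ai_relevance": 55},
    {"description": "LangChain toy", "topics": [], "summary": "", "ai_relevance": 10},
]
print(adoption_counts(repos, ["LangChain", "langchain", "Qdrant"]))
```

Plain substring matching is what makes short keywords noisy: a two- or three-letter keyword can match inside unrelated words, which is consistent with deactivating them.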

CADENCE 5. Update cadence

Job	Cadence	What
ai-index:daily	daily 01:10 JST	discover new AI repos, refresh scores
ai-index:weekly	Sun 03:10 JST	full repo re-scan
ai-index:monthly	1st 04:10 JST	GH Archive aggregation, monthly metrics
ai-index:generate-report	1st 05:30 JST	monthly report draft + auto-publish
ai-index:summarize	manual (within $3/mo budget)	README AI summarization (gpt-5-mini)

