AI Dev Impact Lab
GLOSSARY

Terms used on this site

Definitions of terms that appear across the site, including AI relevance score, adoption count, and signature files. For the full formulas, see the methodology page.

Scoring

AI relevance score

A 0-100 score of how AI-related a repo is, based on AI keyword matches in its name, description, topics, and README. Repos scoring 40 or higher are tracked in detail.
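As a rough illustration of keyword-based scoring with a threshold, here is a minimal sketch. The per-field weights, the keyword subset, and the function names are hypothetical; only the four fields and the threshold of 40 come from the entry above, and the actual formula is on the methodology page.

```python
# Hypothetical weights per field (assumption, not the site's real formula).
FIELD_WEIGHTS = {"name": 30, "description": 30, "topics": 25, "readme": 15}
# Illustrative subset of AI keywords.
AI_KEYWORDS = ("langchain", "openai", "pgvector", "claude")
# From the glossary: a score of 40 or higher means "tracked in detail".
TRACK_THRESHOLD = 40


def ai_relevance_score(repo: dict) -> int:
    """Add the weight of every field that contains at least one AI keyword."""
    score = 0
    for field, weight in FIELD_WEIGHTS.items():
        value = repo.get(field) or ""
        if isinstance(value, (list, tuple)):  # topics arrive as a list
            value = " ".join(value)
        text = value.lower()
        if any(keyword in text for keyword in AI_KEYWORDS):
            score += weight
    return min(score, 100)


def is_tracked(repo: dict) -> bool:
    """True when the repo reaches the detail-tracking threshold."""
    return ai_relevance_score(repo) >= TRACK_THRESHOLD
```

With these assumed weights, a repo whose name and topics both match scores 55 and is tracked, while a repo with no matches scores 0.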

Solo developer score

A 0-100 heuristic score. Signals: owner_type=User, a low follower count, a README, multiple pushes, and a homepage URL. Repos scoring 60 or higher are treated as solo candidates.

Web launch score

A 0-100 score built from three signals: a homepage URL, deployment on Vercel or Netlify, and a README that contains a demo URL or landing-page keywords. Repos scoring 50 or higher are treated as web-launch candidates.
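The three signals can be sketched as follows. The point values, the keyword list, and the idea of detecting Vercel or Netlify from the default `*.vercel.app` / `*.netlify.app` hostnames are all assumptions made for illustration; only the signals themselves and the threshold of 50 come from the entry above.

```python
from urllib.parse import urlparse

# Hypothetical point values (assumption).
HOMEPAGE_POINTS = 40
HOSTING_POINTS = 30
README_POINTS = 30
LAUNCH_THRESHOLD = 50  # from the glossary entry
# Default hostnames used by the two platforms; custom domains are not detected.
HOSTING_SUFFIXES = (".vercel.app", ".netlify.app")
# Hypothetical landing keywords.
LANDING_KEYWORDS = ("demo", "try it", "get started")


def web_launch_score(homepage, readme):
    """Score a repo from its homepage URL and README text."""
    score = 0
    host = urlparse(homepage).hostname if homepage else None
    if host:
        score += HOMEPAGE_POINTS
        if host.endswith(HOSTING_SUFFIXES):
            score += HOSTING_POINTS
    text = (readme or "").lower()
    if any(keyword in text for keyword in LANDING_KEYWORDS):
        score += README_POINTS
    return score


def is_web_launch_candidate(homepage, readme):
    return web_launch_score(homepage, readme) >= LAUNCH_THRESHOLD
```

A hostname check like this misses sites served from a custom domain, so a real detector would likely need an extra signal.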

Composite score

Each component is normalized so that its 2021 average equals 100, and the normalized components are then combined as a weighted sum. The score is auxiliary; the headline figures are the concrete counts.
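The normalize-then-weight step can be sketched as below. The component names and weights are hypothetical, and the sketch assumes the weights sum to 1 so that a period identical to the 2021 average scores exactly 100.

```python
def normalize_to_baseline(value: float, baseline_avg: float) -> float:
    """Index a component so that its 2021 average maps to 100."""
    if baseline_avg <= 0:
        raise ValueError("baseline average must be positive")
    return value / baseline_avg * 100.0


def composite_score(components: dict, baselines_2021: dict, weights: dict) -> float:
    """Weighted sum of the baseline-indexed components."""
    return sum(
        weight * normalize_to_baseline(components[name], baselines_2021[name])
        for name, weight in weights.items()
    )


# Hypothetical component names and values.
components = {"new_repos": 300.0, "signature_files": 50.0}
baselines_2021 = {"new_repos": 150.0, "signature_files": 50.0}
weights = {"new_repos": 0.5, "signature_files": 0.5}
score = composite_score(components, baselines_2021, weights)
```

Here `new_repos` indexes to 200 and `signature_files` to 100, so the composite is 150.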

Continuity rate

The fraction of AI-relevant repos created three months before the target month that have pushed within the last 30 days. Ranges from 0.0 to 1.0; null when the cohort is empty.
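A minimal sketch of this calculation follows. It assumes the input list already contains only AI-relevant repos, that each repo is a dict with `created_at` and `pushed_at` dates (hypothetical field names), and that "the last 30 days" is measured back from an `as_of` date supplied by the caller.

```python
from datetime import date, timedelta
from typing import Optional


def shift_month(year: int, month: int, delta: int) -> tuple:
    """Move a (year, month) pair by delta months."""
    index = year * 12 + (month - 1) + delta
    return index // 12, index % 12 + 1


def continuity_rate(repos, target_year: int, target_month: int, as_of: date) -> Optional[float]:
    """Share of the cohort created 3 months earlier that pushed in the last 30 days."""
    cohort_month = shift_month(target_year, target_month, -3)
    cohort = [
        repo for repo in repos
        if (repo["created_at"].year, repo["created_at"].month) == cohort_month
    ]
    if not cohort:
        return None  # empty cohort: the rate is undefined
    cutoff = as_of - timedelta(days=30)
    active = sum(
        1 for repo in cohort
        if repo["pushed_at"] is not None and repo["pushed_at"] >= cutoff
    )
    return active / len(cohort)


repos = [
    {"created_at": date(2024, 11, 5), "pushed_at": date(2025, 2, 20)},   # cohort, still active
    {"created_at": date(2024, 11, 20), "pushed_at": date(2024, 12, 1)},  # cohort, gone quiet
    {"created_at": date(2024, 10, 1), "pushed_at": date(2025, 2, 25)},   # outside the cohort
]
rate = continuity_rate(repos, 2025, 2, as_of=date(2025, 2, 28))
```

For the target month 2025-02, the cohort is the two repos created in 2024-11, and one of them pushed after the cutoff, so `rate` is 0.5.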

Adoption / share

The per-category mention count and share, computed by substring match against each repo's description, topics, and AI summary.
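The substring matching can be sketched as below. Each repo is represented by one pre-joined text corpus, and share is assumed here to mean the fraction of repos that mention the category; the category names are hypothetical.

```python
from typing import Dict, List


def adoption_by_category(
    corpora: List[str], keywords_by_category: Dict[str, List[str]]
) -> Dict[str, dict]:
    """Count repos whose corpus contains any keyword of a category."""
    total = len(corpora)
    lowered_corpora = [text.lower() for text in corpora]
    result = {}
    for category, keywords in keywords_by_category.items():
        lowered_keywords = [keyword.lower() for keyword in keywords]
        count = sum(
            1 for text in lowered_corpora
            if any(keyword in text for keyword in lowered_keywords)
        )
        result[category] = {"count": count, "share": count / total if total else 0.0}
    return result


corpora = [
    "RAG app built with LangChain and pgvector",
    "OpenAI API wrapper",
    "static site generator",
]
keywords_by_category = {
    "framework": ["LangChain"],
    "vector_db": ["pgvector"],
    "llm_api": ["OpenAI", "Claude"],
}
adoption = adoption_by_category(corpora, keywords_by_category)
```

Plain substring matching also explains why very short keywords are noisy: "lit" matches inside "split", for example.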

Detection

Signature file

Config files generated or consumed by AI coding tools. Their presence in a repo signals tool usage. Tracked files: CLAUDE.md, AGENTS.md, .cursorrules, .windsurfrules, .github/copilot-instructions.md, .aider.conf.yml, .clinerules
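Checking a repo for these files can be sketched as a simple path lookup. The function name is hypothetical, and the sketch assumes the caller supplies the repo's file paths relative to the repo root.

```python
# The signature paths listed in the glossary entry.
SIGNATURE_PATHS = (
    "CLAUDE.md",
    "AGENTS.md",
    ".cursorrules",
    ".windsurfrules",
    ".github/copilot-instructions.md",
    ".aider.conf.yml",
    ".clinerules",
)


def detect_signature_files(repo_paths):
    """Return the signature paths present in the repo, in glossary order."""
    present = set(repo_paths)
    return [path for path in SIGNATURE_PATHS if path in present]


found = detect_signature_files(
    ["README.md", "CLAUDE.md", ".github/copilot-instructions.md", "src/main.py"]
)
```

In this example `found` holds `CLAUDE.md` and `.github/copilot-instructions.md`; a repo with none of the files yields an empty list.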

AI keyword dictionary

The per-category keyword corpus, stored in the `ai_keywords` table. Examples: LangChain, OpenAI, pgvector, Claude. Short, noisy keywords such as Lit and Bun are deactivated.

AI summary

A 500-1000 character summary of the README, generated by OpenAI gpt-5-mini and stored in both Japanese and English. Generation stays within a budget cap of $3 per month.

Detected AI stack

Shown on each repo detail page: the AI-related keywords found in the repo's description, topics, and summary, grouped by category.
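Grouping matched keywords by category might look like the sketch below. The `(keyword, category)` pairs stand in for rows of the keyword dictionary; the category names and the function name are hypothetical.

```python
from collections import defaultdict


def detect_ai_stack(description, topics, summary, keyword_rows):
    """Group every keyword found in the repo's text by its category."""
    corpus = " ".join(
        [description or "", " ".join(topics or []), summary or ""]
    ).lower()
    stack = defaultdict(list)
    for keyword, category in keyword_rows:
        if keyword.lower() in corpus:
            stack[category].append(keyword)
    return dict(stack)


keyword_rows = [
    ("LangChain", "framework"),
    ("OpenAI", "llm_api"),
    ("Claude", "llm_api"),
    ("pgvector", "vector_db"),
]
stack = detect_ai_stack(
    "Chatbot using OpenAI and Claude", ["langchain"], None, keyword_rows
)
```

Categories with no match are omitted, so `stack` here has entries for `framework` and `llm_api` but not `vector_db`.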

Data sources

GitHub GraphQL / REST

GitHub's two API families. GraphQL is used for batch fetches of repo details and READMEs, REST Search for discovering new repos, and REST Code Search for site-wide signature file counts.

GH Archive

A third-party project that archives public GitHub events (create, push, fork, watch). It is available free as a public BigQuery dataset.

BigQuery

Google Cloud's data warehouse. Used here to query the GH Archive monthly tables, with usage kept within the 1 TiB/month free tier.

Monthly partition

Time-series tables such as `repo_metric_snapshots` are partitioned by `RANGE (TO_DAYS(month))`, which keeps month-scoped queries efficient and makes old months easy to archive.
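To show what one month's range boundary looks like, here is a small sketch that builds a partition clause. It assumes a MySQL-compatible database, where `TO_DAYS` counts days from year 0 and therefore equals Python's `date.toordinal()` plus 365. The `pYYYYMM` partition naming is hypothetical.

```python
from datetime import date

# MySQL TO_DAYS counts from year 0; date.toordinal() counts from 0001-01-01.
MYSQL_TO_DAYS_OFFSET = 365


def to_days(day: date) -> int:
    """Python equivalent of MySQL's TO_DAYS() for a given date."""
    return day.toordinal() + MYSQL_TO_DAYS_OFFSET


def next_month(month_start: date) -> date:
    """First day of the month after month_start."""
    if month_start.month == 12:
        return date(month_start.year + 1, 1, 1)
    return date(month_start.year, month_start.month + 1, 1)


def partition_clause(month_start: date) -> str:
    """Partition holding every row whose month falls before the next month."""
    upper_bound = to_days(next_month(month_start))
    return f"PARTITION p{month_start:%Y%m} VALUES LESS THAN ({upper_bound})"


clause = partition_clause(date(2024, 12, 1))
```

Because each month lives in its own partition, archiving an old month can be a partition-level operation instead of a row-by-row delete.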
