Terms used on this site
Definitions of terms that appear across the site, including AI relevance scoring, adoption count, and signature files. For full formulas, see the methodology page.
Scoring
AI relevance score
A 0-100 score of how AI-related a repo is, based on AI keyword matches in its name, description, topics, and README. Repos scoring ≥40 are tracked in detail.
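As a rough illustration of keyword-based scoring with a tracking threshold, here is a minimal sketch. The per-field weights, the keyword subset, and the `Repo` type are hypothetical; only the 0-100 range, the four fields, and the ≥40 threshold come from the definition above. The actual formula is on the methodology page.

```python
from dataclasses import dataclass, field

# Hypothetical per-field weights (they sum to 100); not the site's real formula.
FIELD_WEIGHTS = {"name": 30, "description": 30, "topics": 20, "readme": 20}
# Illustrative subset of AI keywords, lowercased for matching.
AI_KEYWORDS = ("langchain", "openai", "llm", "pgvector")
TRACK_THRESHOLD = 40  # from the glossary: >=40 means tracked in detail


@dataclass
class Repo:
    name: str = ""
    description: str = ""
    topics: list[str] = field(default_factory=list)
    readme: str = ""


def ai_relevance_score(repo: Repo) -> int:
    """Add a field's weight once if any keyword appears in it; cap at 100."""
    texts = {
        "name": repo.name,
        "description": repo.description,
        "topics": " ".join(repo.topics),
        "readme": repo.readme,
    }
    score = 0
    for field_name, text in texts.items():
        lowered = text.lower()
        if any(kw in lowered for kw in AI_KEYWORDS):
            score += FIELD_WEIGHTS[field_name]
    return min(score, 100)


def is_tracked(repo: Repo) -> bool:
    """True when the repo meets the detail-tracking threshold."""
    return ai_relevance_score(repo) >= TRACK_THRESHOLD
```

With these assumed weights, a repo whose name and topics both match keywords scores 50 and is tracked, while a repo that only mentions a keyword in its README scores 20 and is not.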
Solo developer score
A 0-100 heuristic score. Signals: owner_type=User, low follower count, has a README, multiple pushes, has a homepage. Repos scoring ≥60 are solo candidates.
Web launch score
A 0-100 score. Signals: a homepage URL, deployment on Vercel/Netlify, and a README containing a demo URL or landing-page keywords. Repos scoring ≥50 are web-launch candidates.
Composite score
Each component is normalized so that its 2021 average equals 100, then the components are combined as a weighted sum. This is an auxiliary figure; the headline numbers are the concrete counts.
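The normalize-then-weight step can be sketched as follows. The component names and weights are hypothetical, and dividing by the total weight (so that a month identical to the 2021 average scores exactly 100) is an assumption rather than something the definition states.

```python
def composite_score(current, baseline_2021_avg, weights):
    """Index each component to its 2021 average (= 100), then weight-sum.

    All three arguments are dicts keyed by component name. Dividing by the
    total weight is an assumption that keeps the result on the same scale.
    """
    total_weight = sum(weights.values())
    score = 0.0
    for name, weight in weights.items():
        indexed = current[name] / baseline_2021_avg[name] * 100.0
        score += weight * indexed
    return score / total_weight


# Hypothetical components and weights, for illustration only.
example = composite_score(
    current={"new_repos": 300, "signature_files": 50},
    baseline_2021_avg={"new_repos": 150, "signature_files": 50},
    weights={"new_repos": 0.5, "signature_files": 0.5},
)
```

Here `new_repos` indexes to 200 and `signature_files` to 100, so the equally weighted composite is 150.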
Continuity rate
Of the AI-relevant repos created 3 months before the target month, the fraction that received a push within the last 30 days. Ranges from 0.0 to 1.0; null when the cohort is empty.
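A minimal sketch of this calculation, under several assumptions: the input list already contains only AI-relevant repos, each repo is a dict with hypothetical `created_at` and `pushed_at` date fields, the cohort is the calendar month three months before the target month, and "the last 30 days" is measured back from an explicit `as_of` date.

```python
from datetime import date, timedelta
from typing import Optional


def cohort_month(target_year: int, target_month: int, months_back: int = 3):
    """Return (year, month) for the calendar month `months_back` earlier."""
    index = target_year * 12 + (target_month - 1) - months_back
    return index // 12, index % 12 + 1


def continuity_rate(repos, target_year, target_month, as_of: date) -> Optional[float]:
    """Share of the cohort pushed within 30 days of `as_of`; None if no cohort."""
    year, month = cohort_month(target_year, target_month)
    cohort = [
        r for r in repos
        if r["created_at"].year == year and r["created_at"].month == month
    ]
    if not cohort:
        return None  # mirrors the "null when cohort is empty" rule
    cutoff = as_of - timedelta(days=30)
    active = sum(
        1 for r in cohort
        if r["pushed_at"] is not None and r["pushed_at"] >= cutoff
    )
    return active / len(cohort)
```

Returning `None` instead of 0.0 for an empty cohort keeps "no data" distinguishable from "no repo stayed active".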
Adoption / share
Per-category mention count and share. Computed by substring match against a corpus built from each repo's description, topics, and AI summary.
Detection
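The substring-match counting can be sketched like this. The category names, keyword lists, and repo field names are hypothetical, and "share" is assumed here to mean the fraction of all repos that mention the category at least once; the methodology page may define it differently.

```python
# Hypothetical category -> keyword lists, for illustration only.
CATEGORY_KEYWORDS = {
    "framework": ["LangChain"],
    "vector_db": ["pgvector", "Qdrant"],
}


def adoption(repos, category_keywords):
    """Count repos mentioning each category and compute its share of all repos."""
    counts = {category: 0 for category in category_keywords}
    for repo in repos:
        # Build one lowercase corpus per repo: description + topics + AI summary.
        corpus = " ".join([
            repo.get("description") or "",
            " ".join(repo.get("topics") or []),
            repo.get("ai_summary") or "",
        ]).lower()
        for category, keywords in category_keywords.items():
            # A repo counts once per category, however many keywords match.
            if any(kw.lower() in corpus for kw in keywords):
                counts[category] += 1
    total = len(repos)
    return {
        category: {"count": n, "share": n / total if total else 0.0}
        for category, n in counts.items()
    }
```

Because this is plain substring matching, a short keyword can match inside unrelated words, which is one reason to keep very short keywords out of the dictionary.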
Signature file
Config files generated/consumed by AI coding tools. Their presence in a repo signals tool usage:
CLAUDE.md / AGENTS.md / .cursorrules / .windsurfrules / .github/copilot-instructions.md / .aider.conf.yml / .clinerules
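For a locally checked-out repo, presence detection can be as simple as the sketch below. The function name and the local-filesystem approach are assumptions made for illustration; the site itself counts signatures through the GitHub APIs.

```python
from pathlib import Path

# Signature paths from the list above, relative to the repo root.
SIGNATURE_FILES = [
    "CLAUDE.md",
    "AGENTS.md",
    ".cursorrules",
    ".windsurfrules",
    ".github/copilot-instructions.md",
    ".aider.conf.yml",
    ".clinerules",
]


def detect_signature_files(repo_root: Path) -> list[str]:
    """Return the signature paths that exist under repo_root, in list order."""
    return [rel for rel in SIGNATURE_FILES if (repo_root / rel).exists()]
```

Calling `detect_signature_files(Path("."))` from a repo root returns the matching paths, or an empty list when none are present.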
AI keyword dictionary
The per-category keyword corpus, stored in the `ai_keywords` table. Examples: LangChain, OpenAI, pgvector, Claude. Short, noisy keywords (Lit, Bun) are deactivated.
AI summary
A 500-1000 character README summary generated by OpenAI gpt-5-mini. Stored in both Japanese and English. Generation stays within a $3/month budget cap.
Detected AI stack
On each repo detail page, the AI-related keywords found in the repo's description / topics / summary, grouped by category.
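Grouping matched keywords by category might look like the following sketch. The `(keyword, category)` pairs, the category names, and the repo field names are hypothetical stand-ins for rows of the keyword dictionary.

```python
from collections import defaultdict

# Hypothetical (keyword, category) pairs, for illustration only.
KEYWORDS = [
    ("LangChain", "framework"),
    ("OpenAI", "provider"),
    ("pgvector", "vector_db"),
    ("Claude", "provider"),
]


def detected_stack(repo, keywords):
    """Group the keywords found in description/topics/summary by category."""
    corpus = " ".join([
        repo.get("description") or "",
        " ".join(repo.get("topics") or []),
        repo.get("ai_summary") or "",
    ]).lower()
    grouped = defaultdict(list)
    for keyword, category in keywords:
        if keyword.lower() in corpus:
            grouped[category].append(keyword)
    return dict(grouped)
```

Categories with no matching keyword are simply absent from the result, so a detail page only needs to render the keys that are present.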
Data sources
GitHub GraphQL / REST
GitHub's two API families. GraphQL is used for batch repo details and READMEs, REST Search for new-repo discovery, and REST Code Search for site-wide signature counts.
GH Archive
A third-party project that archives public GitHub events (create / push / fork / watch). Available free as a public BigQuery dataset.
BigQuery
Google Cloud's data warehouse. Used here to query the GH Archive monthly tables, with usage kept within the 1 TiB/month free tier.
Monthly partition
Time-series tables such as `repo_metric_snapshots` are partitioned by `RANGE (TO_DAYS(month))`, which keeps month-scoped queries efficient and makes archiving old months easy.