AI Dev Impact Lab
DATA SOURCES

What we fetch and from where

aidev-index relies exclusively on the public APIs and datasets below. All numbers shown on the site are reproducible from these inputs. Source data remains under its original rights; we only aggregate and visualize.

6,522 repos in DB
152 GH Archive rows
900 OpenAI summaries
22 months tracked

SOURCES Source list

| Source | Use | Cadence / rate | URL |
| --- | --- | --- | --- |
| GitHub GraphQL API | Repo details, owner info, signature file existence, README fetch | daily, 5,000 req/h | docs↗ |
| GitHub REST Search | Recent 24-48h new repo discovery | daily, 30 req/min | docs↗ |
| GitHub REST Code Search | AI tool config-file counts across all GitHub (reference scale) | monthly, 9 req/min | docs↗ |
| GH Archive (BigQuery) | GitHub-wide monthly event aggregates (used as denominator): create / push / fork / watch | monthly, within 1TiB/mo free tier | gharchive.org↗ |
| OpenAI API (gpt-5-mini) | README summarization + monthly report drafts | $3/mo budget brake, 50 req/min | platform↗ |
| Google Trends | Search interest for related keywords (auxiliary signal) | monthly, manual CSV import | trends↗ |

PIPELINE From source to ranking

1. Discovery

Split-query GitHub REST Search (3 query groups, 3 keywords each) finds new repos matching AI-related terms. Matches are inserted into `github_repositories`.
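As a rough illustration of the split-query idea, the sketch below builds one search string per keyword group, restricted to recently created repos. It is written in Python for brevity (the site itself runs on Laravel/PHP), and the keyword groups, the function name `build_search_queries`, and the 48-hour window parameter are assumptions for the example, not the project's actual values.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical keyword groups: the page only states "3 query groups, 3 keywords each".
KEYWORD_GROUPS = [
    ["llm", "gpt", "chatbot"],
    ["ai-agent", "rag", "langchain"],
    ["copilot", "claude", "openai"],
]

def build_search_queries(now, window_hours=48, groups=KEYWORD_GROUPS):
    """Return one GitHub repository-search query string per keyword group."""
    since = (now - timedelta(hours=window_hours)).strftime("%Y-%m-%dT%H:%M:%SZ")
    queries = []
    for group in groups:
        # Keywords inside a group are OR-ed; the created: qualifier limits
        # results to repos created inside the discovery window.
        terms = " OR ".join(group)
        queries.append(f"{terms} created:>={since}")
    return queries

if __name__ == "__main__":
    for q in build_search_queries(datetime.now(timezone.utc)):
        print(q)
```

Splitting the keywords into small groups keeps each query short and lets the results be paged per group; each string would be sent as the `q` parameter of the repository search endpoint.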

2. Detail fetch

GitHub GraphQL fetches repo details, owner, README, topics, and 7 AI tool signature files in a single query (batch size 20, with 5xx retry).
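One way to fetch many repos in a single GraphQL request is to alias each `repository(...)` field and probe each signature file with an `object(expression: "HEAD:<path>")` lookup, which returns null when the file does not exist. The Python sketch below only builds the query text; the three signature paths, the alias names, and the helper names are hypothetical stand-ins (the page mentions 7 signature files without listing them).

```python
import json

# Hypothetical signature files, for illustration only.
SIGNATURE_FILES = ["CLAUDE.md", ".cursorrules", ".github/copilot-instructions.md"]

def chunked(items, size=20):
    """Yield consecutive batches of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def build_batch_query(repos):
    """Build one GraphQL query for a batch of (owner, name) pairs."""
    blocks = []
    for i, (owner, name) in enumerate(repos):
        # One aliased object() lookup per signature file; a null result means "absent".
        sig_fields = "\n".join(
            f'    sig{j}: object(expression: {json.dumps("HEAD:" + path)}) {{ __typename }}'
            for j, path in enumerate(SIGNATURE_FILES)
        )
        blocks.append(
            f"  r{i}: repository(owner: {json.dumps(owner)}, name: {json.dumps(name)}) {{\n"
            "    nameWithOwner\n"
            "    description\n"
            "    owner { login __typename }\n"
            "    repositoryTopics(first: 20) { nodes { topic { name } } }\n"
            '    readme: object(expression: "HEAD:README.md") { ... on Blob { text } }\n'
            f"{sig_fields}\n"
            "  }"
        )
    return "query {\n" + "\n".join(blocks) + "\n}"

if __name__ == "__main__":
    repos = [("octocat", "Hello-World"), ("octocat", "Spoon-Knife")]
    for batch in chunked(repos, 20):
        print(build_batch_query(batch))
```

Each batch of 20 then costs a single HTTP round trip, and the response keys (`r0`, `r1`, ...) map back to the input order.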

3. Scoring

AI relevance, solo dev, and web launch scores are computed monthly and stored in `repo_metric_snapshots`. AI tool signatures are stored in `repo_ai_signals`.

→ formulas

4. AI summarization

OpenAI gpt-5-mini summarizes READMEs (500-1000 chars) into `repo_content_snapshots.readme_summary`. Stops automatically at $3/mo budget cap.

5. Monthly aggregate

GH Archive (BigQuery) → `global_monthly_aggregates` stores GitHub-wide monthly event counts, used as denominator for AI-relevant share.

6. Adoption count

13 category keyword dictionaries are substring-matched against each repo's corpus (description + topics + summary). The resulting unique repo counts are cached for 30 minutes.
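The matching step can be pictured with the minimal Python sketch below. The two categories, their keywords, and the dictionary field names (`id`, `description`, `topics`, `summary`) are invented for the example; the real dictionaries cover 13 categories, and the 30-minute cache is left out.

```python
# Hypothetical category dictionaries, for illustration only.
CATEGORY_KEYWORDS = {
    "coding-assistant": ["copilot", "cursor"],
    "rag": ["rag", "retrieval"],
}

def build_corpus(repo):
    """Join description, topics and summary into one lowercase string."""
    parts = [
        repo.get("description") or "",
        " ".join(repo.get("topics") or []),
        repo.get("summary") or "",
    ]
    return " ".join(parts).lower()

def count_adoption(repos, categories=CATEGORY_KEYWORDS):
    """Return {category: number of unique repos whose corpus contains any keyword}."""
    counts = {}
    for category, keywords in categories.items():
        matched = {
            repo["id"]
            for repo in repos
            if any(keyword in build_corpus(repo) for keyword in keywords)
        }
        counts[category] = len(matched)
    return counts

if __name__ == "__main__":
    sample = [
        {"id": 1, "description": "Cursor rules", "topics": ["ai"], "summary": None},
        {"id": 2, "description": "Object storage CLI", "topics": [], "summary": ""},
        {"id": 3, "description": None, "topics": ["rag"], "summary": "Uses Copilot"},
    ]
    print(count_adoption(sample))
```

Note that plain substring matching also counts repo 2 under "rag", because "storage" contains "rag". Short keywords therefore inflate counts unless the dictionary avoids them or the matcher checks word boundaries.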

→ 13 rankings

COST Cost & quota management

BigQuery

Month-partitioned tables, column pruning, and a per-query maximum_bytes_billed=100GB hard cap keep usage within the 1TiB/mo free tier (typical: a few hundred GB).

OpenAI

Monthly budget: $3 USD. Before each call, a pre-flight check adds the call's estimated cost to the month's cumulative spend and throws BudgetExceededException if the total would breach the cap. Real cost per summary: $0.001-0.002.
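A budget brake of this kind might look like the Python sketch below. Only the exception name `BudgetExceededException` and the $3 cap come from this page; the `BudgetGuard` class, its method names, and the use of `Decimal` are assumptions made for the example.

```python
from decimal import Decimal

class BudgetExceededException(Exception):
    """Raised when a call would push monthly spend past the cap."""

class BudgetGuard:
    def __init__(self, monthly_cap_usd="3.00"):
        # Decimal avoids float rounding drift when summing many tiny costs.
        self.cap = Decimal(monthly_cap_usd)
        self.spent = Decimal("0")

    def preflight(self, estimate_usd):
        """Check cumulative spend plus this call's estimate against the cap."""
        projected = self.spent + Decimal(estimate_usd)
        if projected > self.cap:
            raise BudgetExceededException(
                f"projected {projected} USD exceeds cap {self.cap} USD"
            )

    def record(self, actual_usd):
        """Add the real cost of a completed call to the monthly total."""
        self.spent += Decimal(actual_usd)

if __name__ == "__main__":
    guard = BudgetGuard("3.00")
    guard.record("2.999")
    guard.preflight("0.001")      # exactly at the cap: allowed
    try:
        guard.preflight("0.002")  # would exceed the cap: blocked
    except BudgetExceededException as exc:
        print("stopped:", exc)
```

Checking before the call rather than after means the cap is never overshot by a request that has already been billed.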

GitHub API

Personal Access Token (Classic, no scopes): 5,000 core requests per hour, 30 search requests per minute, and 9 code-search requests per minute. Requests are batched and retried with exponential backoff on 5xx responses.
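The retry behaviour can be sketched in Python as follows. The function name `call_with_backoff`, the attempt count, and the 1-second base delay are assumptions for illustration; the page does not state the actual retry parameters.

```python
import time

def call_with_backoff(request, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call request() until it returns a non-5xx status or attempts run out.

    `request` is any zero-argument callable returning (status_code, body).
    The wait doubles after each failed attempt: base_delay, 2x, 4x, ...
    """
    status, body = None, None
    for attempt in range(max_attempts):
        status, body = request()
        if not 500 <= status < 600:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    # Attempts exhausted: hand the last 5xx response back to the caller.
    return status, body

if __name__ == "__main__":
    responses = iter([(502, ""), (503, ""), (200, "ok")])
    print(call_with_backoff(lambda: next(responses), base_delay=0.1))
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting. Client errors (4xx), including rate-limit responses, are returned immediately and would need separate handling.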

Hosting

Laravel 13 + PHP 8.5 + MySQL 5.7 on Xserver shared hosting. Monthly partitioning keeps DB size in check.

EXPLORE Related

Methodology: How these sources translate into the numbers

About: Purpose & operating policy

13 rankings: Aggregation results