What we fetch and from where
aidev-index relies exclusively on the public APIs and datasets below. All numbers shown on the site are reproducible from these inputs. Source data remains under its original rights; we only aggregate and visualize.
SOURCES Source list
| Source | Use | Cadence / rate | URL |
|---|---|---|---|
| GitHub GraphQL API | Repo details, owner info, signature file existence, README fetch | daily, 5,000 req/h | docs↗ |
| GitHub REST Search | Recent 24-48h new repo discovery | daily, 30 req/min | docs↗ |
| GitHub REST Code Search | AI tool config-file counts across all GitHub (reference scale) | monthly, 9 req/min | docs↗ |
| GH Archive (BigQuery) | GitHub-wide monthly event aggregates (used as denominator): create / push / fork / watch | monthly, within 1TiB/mo free tier | gharchive.org↗ |
| OpenAI API (gpt-5-mini) | README summarization + monthly report drafts | $3/mo budget cap, 50 req/min | platform↗ |
| Google Trends | Search interest for related keywords (auxiliary signal) | monthly, manual CSV import | trends↗ |
PIPELINE From source to ranking
1. Discovery
A split-query GitHub REST Search (3 query groups of 3 keywords each) finds new repos that match AI-related terms. Matches are inserted into `github_repositories`.
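As a rough illustration of the split-query idea, the sketch below (in Python, for illustration only) builds one search string per keyword group and restricts each to repos created in the last 48 hours. The keyword list, the helper names, and the use of `OR` plus a `created:>=` qualifier are assumptions; the actual dictionary and query shape are not given here.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical keywords: the real list is not published in this document.
KEYWORDS = ["llm", "gpt", "claude", "copilot", "agent", "rag",
            "openai", "langchain", "cursor"]


def chunk(items, size):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_queries(keywords, now, hours=48, group_size=3):
    """Return one search query string per keyword group."""
    since = (now - timedelta(hours=hours)).strftime("%Y-%m-%dT%H:%M:%SZ")
    queries = []
    for group in chunk(keywords, group_size):
        terms = " OR ".join(group)
        queries.append(f"{terms} created:>={since}")
    return queries


now = datetime(2025, 1, 3, 0, 0, tzinfo=timezone.utc)
queries = build_queries(KEYWORDS, now)
for q in queries:
    print(q)
```

Splitting into small groups keeps each query short and spreads results over several result sets, at the cost of one request per group against the 30 req/min search limit.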
2. Detail fetch
GitHub GraphQL fetches repo details, owner, README, and topics, and checks for the existence of 7 AI tool signature files, all in a single query (batch size 20, with 5xx retry).
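One way to fetch many repos in a single GraphQL request is to give each `repository(...)` field its own alias and probe each signature file with `object(expression: "HEAD:<path>")`, which comes back as `null` when the file is absent. The Python sketch below only builds the query text; the three signature paths and the helper names are hypothetical (the 7 real paths are not listed here), and the README fetch is left out for brevity.

```python
import json

# Hypothetical signature paths, for illustration only.
SIGNATURE_FILES = ["CLAUDE.md", ".cursorrules", "AGENTS.md"]


def batches(items, size=20):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_batch_query(repos, signature_files=SIGNATURE_FILES):
    """Build one GraphQL query covering every (owner, name) pair in `repos`."""
    blocks = []
    for i, (owner, name) in enumerate(repos):
        # One aliased existence probe per signature file.
        checks = "\n".join(
            f"    sig{j}: object(expression: {json.dumps('HEAD:' + path)}) {{ id }}"
            for j, path in enumerate(signature_files)
        )
        blocks.append(
            f"  r{i}: repository(owner: {json.dumps(owner)}, name: {json.dumps(name)}) {{\n"
            "    nameWithOwner\n"
            "    description\n"
            "    owner { login }\n"
            "    repositoryTopics(first: 20) { nodes { topic { name } } }\n"
            f"{checks}\n"
            "  }"
        )
    return "query {\n" + "\n".join(blocks) + "\n}"


print(build_batch_query([("octocat", "hello-world")]))
```

A production version would more likely pass owner and name as GraphQL variables rather than interpolating them into the query text.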
3. Scoring
AI relevance, solo-dev, and web-launch scores are computed monthly and stored in `repo_metric_snapshots`. AI tool signatures are stored in `repo_ai_signals`.
4. AI summarization
OpenAI gpt-5-mini summarizes READMEs (500-1000 chars) into `repo_content_snapshots.readme_summary`. Summarization stops automatically at the $3/mo budget cap.
5. Monthly aggregate
GH Archive (BigQuery) → `global_monthly_aggregates` stores GitHub-wide monthly event counts, used as denominator for AI-relevant share.
6. Adoption count
13 category keyword dictionaries are substring-matched against each repo's corpus (description + topics + summary). Unique repo counts are cached for 30 minutes.
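The matching step can be pictured with the minimal Python sketch below. The two categories, their keywords, and the sample repos are made up for illustration, and the 30-minute cache is omitted.

```python
# Hypothetical category dictionaries: the real 13 are not listed here.
CATEGORIES = {
    "coding-agent": ["claude code", "cursor"],
    "rag": ["rag", "retrieval"],
}

# Made-up sample repos.
REPOS = [
    {"id": 1, "description": "A Cursor plugin", "topics": ["ai"], "summary": None},
    {"id": 2, "description": "Retrieval pipeline", "topics": ["rag"], "summary": "Uses RAG."},
    {"id": 3, "description": "Cloud storage tool", "topics": [], "summary": ""},
]


def corpus(repo):
    """Join description, topics, and summary into one lowercase string."""
    parts = [
        repo.get("description") or "",
        " ".join(repo.get("topics") or []),
        repo.get("summary") or "",
    ]
    return " ".join(parts).lower()


def adoption_counts(repos, categories):
    """Count unique repos whose corpus contains any keyword of a category."""
    counts = {}
    for category, keywords in categories.items():
        matched = {r["id"] for r in repos if any(k in corpus(r) for k in keywords)}
        counts[category] = len(matched)
    return counts


print(adoption_counts(REPOS, CATEGORIES))  # {'coding-agent': 1, 'rag': 2}
```

Repo 3 is counted under `rag` only because "storage" contains the substring "rag". Plain substring matching is cheap, but short keywords can produce this kind of false positive.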
COST Cost & quota management
BigQuery
Month-partitioned tables, column pruning, and a hard cap of `maximum_bytes_billed=100GB` keep usage within the 1TiB/mo free tier (typical: a few hundred GB).
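To make the three safeguards concrete, here is a Python sketch that builds a BigQuery query-job request body. It reads a single GH Archive monthly table, selects only the `type` column, and sets `maximumBytesBilled` so the job fails instead of scanning past the cap. The SQL, the helper names, and the reading of 100GB as 100 × 10^9 bytes are assumptions, not the site's actual query.

```python
def month_table(year, month):
    """Name of the GH Archive monthly table, e.g. githubarchive.month.202501."""
    return f"githubarchive.month.{year:04d}{month:02d}"


def build_query_job(year, month, max_bytes=100 * 10**9):
    """Build a query-job body for the BigQuery REST API (jobs.insert)."""
    # Column pruning: only `type` is read, so only that column is billed.
    sql = (
        "SELECT type, COUNT(*) AS events "
        f"FROM `{month_table(year, month)}` "
        "WHERE type IN ('CreateEvent', 'PushEvent', 'ForkEvent', 'WatchEvent') "
        "GROUP BY type"
    )
    return {
        "configuration": {
            "query": {
                "query": sql,
                "useLegacySql": False,
                # int64 values are sent as strings in the REST API.
                "maximumBytesBilled": str(max_bytes),
            }
        }
    }


job = build_query_job(2025, 1)
print(job["configuration"]["query"]["query"])
```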
OpenAI
Monthly budget: $3 USD. Before each call, a pre-flight check adds the call's estimated cost to the cumulative spend and throws `BudgetExceededException` if the total would breach the cap. Real cost per summary: $0.001-0.002.
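The pre-flight check could look roughly like the Python sketch below. The class and method names are hypothetical stand-ins (only `BudgetExceededException` is named in this document), and how the estimate is derived from token counts is not shown.

```python
from decimal import Decimal


class BudgetExceededException(Exception):
    """Raised when a call's estimated cost would push spend past the cap."""


class BudgetGuard:
    def __init__(self, cap_usd=Decimal("3.00")):
        self.cap_usd = cap_usd
        self.spent_usd = Decimal("0")

    def check(self, estimate_usd):
        """Pre-flight: refuse the call if cumulative + estimate exceeds the cap."""
        projected = self.spent_usd + estimate_usd
        if projected > self.cap_usd:
            raise BudgetExceededException(
                f"projected ${projected} exceeds cap ${self.cap_usd}"
            )

    def record(self, actual_usd):
        """Add the real cost of a completed call to the running total."""
        self.spent_usd += actual_usd


guard = BudgetGuard()
guard.check(Decimal("0.002"))   # passes: nothing has been spent yet
guard.record(Decimal("0.0015"))
print(guard.spent_usd)
```

`Decimal` is used instead of `float` so that many small additions do not drift around the cap boundary.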
GitHub API
Personal Access Token (Classic, no scopes): 5,000 core requests per hour, 30 search and 9 code-search requests per minute. Requests are batched and retried with exponential backoff on 5xx responses.
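A minimal Python sketch of exponential-backoff retry on 5xx responses follows. The attempt count, base delay, and function names are assumptions; `call` stands in for whatever performs the HTTP request and returns a `(status, body)` pair.

```python
import time


class ServerError(Exception):
    """Raised when every attempt returned a 5xx status."""

    def __init__(self, status):
        super().__init__(f"server error {status} after retries")
        self.status = status


def with_retry(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run `call` until it returns a non-5xx status, doubling the delay each time."""
    for attempt in range(max_attempts):
        status, body = call()
        if status < 500:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise ServerError(status)


# Demo with canned responses and a recording sleep, so nothing actually waits.
responses = iter([(502, ""), (503, ""), (200, "ok")])
delays = []
result = with_retry(lambda: next(responses), sleep=delays.append)
print(result, delays)  # (200, 'ok') [1.0, 2.0]
```

Only 5xx statuses are retried here. 4xx responses, including rate-limit errors, are returned to the caller unchanged.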
Hosting
Laravel 13 + PHP 8.5 + MySQL 5.7 on Xserver shared hosting. Monthly partitioning keeps DB size in check.