AI model benchmark data for agents. By MakerPulse.
AgentPulse benchmarks AI models on real-world tasks: writing emails, summarizing documents, planning trips, and writing creative fiction. It uses 28 prompts across 6 tracks, evaluated by three independent AI evaluators from three different providers. Evaluation is blinded and evidence-backed, with automated pre-checks for objective constraints.
GET /agentpulse/v1/text-models/latest.json LIVE
Benchmark scores for text/LLM models. Task and creative composite scores, per-track breakdowns, hallucination rates, latency, and cost.
GET /agentpulse/v1/image-generation/latest.json LIVE
Image generation service pricing, uptime monitoring, and latency data across 7 providers.
GET /agentpulse/v1/changes.json WEEKLY
Change feed: new models benchmarked, score updates, pricing changes, and deprecations from the last 30 days.
GET /.well-known/agent-card.json
A2A-compatible agent card describing AgentPulse capabilities.
curl https://data.makerpulse.ai/agentpulse/v1/text-models/latest.json
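The same request can be made from Python. This is a minimal sketch that assumes only the base URL and path shown in the curl command. The helper names `endpoint_url` and `fetch_json` are illustrative, and because the response schema is not described here, the code returns the parsed JSON without interpreting any fields.

```python
import json
import urllib.request

# Base URL taken from the curl example; everything else here is illustrative.
BASE_URL = "https://data.makerpulse.ai"


def endpoint_url(path):
    """Join the base URL and an endpoint path (hypothetical helper)."""
    return BASE_URL + "/" + path.lstrip("/")


def fetch_json(path, timeout=10.0):
    """GET an endpoint and return the parsed JSON body.

    No auth header is sent, since the service is described as requiring none.
    """
    with urllib.request.urlopen(endpoint_url(path), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_json("/agentpulse/v1/text-models/latest.json")` performs the request; the same helper should work for the other listed paths, such as `/agentpulse/v1/changes.json`.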
Everyday Writing (P1-P4) — Professional emails, social media posts, personal correspondence
Comprehension & Extraction (P5-P8) — Summarization, structured data extraction, technical explanation
Reasoning & Planning (P9-P12) — Trip planning, decision analysis, prioritization, ethical reasoning
Professional Communication (P13-P16) — Meeting notes, cover letters, incident reports, executive summaries
Creativity & Human-Likeness (P17-P21) — Constrained fiction, poetry, voice mimicry, sustained metaphor
Creative (Open-Ended) (P22-P28) — Literary fiction, sci-fi, horror, unreliable narrator, comedy, micro-fiction
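For agents that need to map a prompt ID back to its track, the ranges above can be encoded as a small lookup table. This is only a sketch: the `TRACKS` structure and the `track_for_prompt` helper are hypothetical, and the API may or may not expose prompt IDs in this form.

```python
import re

# Track names and prompt ranges copied from the track list.
TRACKS = [
    ("Everyday Writing", 1, 4),
    ("Comprehension & Extraction", 5, 8),
    ("Reasoning & Planning", 9, 12),
    ("Professional Communication", 13, 16),
    ("Creativity & Human-Likeness", 17, 21),
    ("Creative (Open-Ended)", 22, 28),
]


def track_for_prompt(prompt_id):
    """Return the track name for an ID such as 'P10' (hypothetical helper)."""
    match = re.fullmatch(r"P(\d+)", prompt_id)
    if match is None:
        raise ValueError(f"not a prompt ID: {prompt_id!r}")
    number = int(match.group(1))
    for name, first, last in TRACKS:
        if first <= number <= last:
            return name
    raise ValueError(f"prompt number out of range: {number}")
```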
All dimensions are scored from 1.0 to 5.0 in 0.1 increments. Every score requires evidence citing specific text from the response.
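A consumer of the data may want to sanity-check that a score fits this scale. The sketch below is one way to do so; `is_valid_score` is a hypothetical helper, and the small tolerance is an assumption made to absorb floating-point error when testing the 0.1 grid.

```python
def is_valid_score(score):
    """True if score lies in [1.0, 5.0] and sits on a 0.1 grid."""
    tenths = score * 10
    in_range = 10 - 1e-9 <= tenths <= 50 + 1e-9
    # Compare against the nearest integer number of tenths, with a tolerance
    # because values like 3.7 are not exactly representable in binary floats.
    on_grid = abs(tenths - round(tenths)) < 1e-9
    return in_range and on_grid
```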
Every response is evaluated by three independent AI evaluators in parallel, blinded to model identity.
Scores are averaged across all three evaluators. Inter-rater reliability and self-bias detection are computed on every run. Automated pre-checks verify objective constraints (word counts, banned phrases, JSON validity) before subjective evaluation.
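The pre-check and averaging steps can be illustrated roughly as follows. This is a sketch under stated assumptions, not the published pipeline: the function names, the `max_words` limit, the case-insensitive phrase match, and the example banned phrase are all hypothetical, and the inter-rater reliability and self-bias computations are not shown because their formulas are not given here.

```python
import json
from statistics import mean


def run_prechecks(response, max_words, banned_phrases, expect_json=False):
    """Run objective checks on a response before any subjective scoring."""
    lowered = response.lower()
    checks = {
        # Word count via whitespace splitting (an assumed counting rule).
        "word_count_ok": len(response.split()) <= max_words,
        # Case-insensitive substring match (an assumed matching rule).
        "no_banned_phrases": not any(p.lower() in lowered for p in banned_phrases),
    }
    if expect_json:
        try:
            json.loads(response)
            checks["json_valid"] = True
        except json.JSONDecodeError:
            checks["json_valid"] = False
    return checks


def average_score(evaluator_scores):
    """Average one dimension's scores across the three evaluators."""
    if len(evaluator_scores) != 3:
        raise ValueError("expected exactly three evaluator scores")
    return mean(evaluator_scores)
```

For example, `run_prechecks('{"a": 1}', max_words=5, banned_phrases=["delve"], expect_json=True)` reports all three checks as passing, and `average_score([4.0, 3.5, 4.5])` gives 4.0.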
Full methodology (v2.3) is published under CC-BY-4.0: github.com/Arithrix/agentpulse-data
Built by MakerPulse. Free, no auth required.