Validated biomedical evidence, delivered by API
Query CIViC, PharmGKB, and DailyMed records — curated, deduplicated, and scored for confidence — without building or maintaining any pipelines. Every record carries an evidence state (S0–S5), a Markov confidence score, and a full source audit trail.
Authentication
All /v1 endpoints require a Bearer API key in the Authorization header. Keys are scoped to your team and prefixed with at_live_.
curl https://api.attester.ai/v1/datasets \
  -H "Authorization: Bearer at_live_your_key_here"
Getting a key
Keys are created from your account dashboard. The full key is shown exactly once — store it in a secret manager (AWS Secrets Manager, Vault, GitHub Actions secrets). We store only a SHA-256 hash.
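Because the full key is shown only once, one common pattern is to inject it at runtime from an environment variable instead of hard-coding it in scripts. The following is a minimal sketch, assuming your secret manager exposes the key as an environment variable named `ATTESTER_API_KEY`. That variable name and the `build_headers` helper are hypothetical and are not defined by the API.

```python
import os

KEY_PREFIX = "at_live_"  # key prefix documented in the Authentication section


def build_headers(env=None):
    """Build the Authorization header from an environment mapping.

    ATTESTER_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    key = env.get("ATTESTER_API_KEY", "")
    if not key.startswith(KEY_PREFIX):
        raise RuntimeError("ATTESTER_API_KEY is missing or does not start with at_live_")
    return {"Authorization": f"Bearer {key}"}
```

Calling `build_headers()` with no argument reads from the process environment. Passing a plain dict makes the helper easy to exercise in tests without touching real secrets.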
/v1 access requires an active Builder or Enterprise subscription. Explorer accounts receive catalog metadata but cannot call the records or download endpoints.
Quick start
Three steps from zero to your first PGx DataFrame:
import requests
API_KEY = "at_live_your_key_here"
BASE = "https://api.attester.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
# Step 1 — list datasets
datasets = requests.get(f"{BASE}/datasets", headers=HEADERS).json()
slug = datasets["datasets"][0]["slug"] # e.g. "pharmacogenomics-variant-map"
# Step 2 — fetch the first page of evidence records
page = requests.get(
    f"{BASE}/datasets/{slug}/records",
    headers=HEADERS,
    params={"gene": "CYP2D6", "minConfidence": "0.7", "limit": 100},
).json()
print(f"{page['meta']['total']} records for CYP2D6 above 0.70 confidence")
# Step 3 — download the full Parquet file into a DataFrame
import pandas as pd, io
parquet_bytes = requests.get(f"{BASE}/datasets/{slug}/download", headers=HEADERS).content
df = pd.read_parquet(io.BytesIO(parquet_bytes))
print(df.head())
Endpoints
Base URL: https://api.attester.ai
Filtering & pagination
Filters are combinable. All string filters are case-insensitive and support partial matches. Use state=S4 or state=S5 plus a minConfidence threshold to get publication-ready, high-confidence records only.
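Combining filters amounts to adding query parameters to the records URL. The sketch below uses only parameter names that appear on this page (`gene`, `state`, `minConfidence`, `limit`). The helper name `records_url` is hypothetical, and it sends a single `state` value because this page does not say whether several states can be combined in one request.

```python
from urllib.parse import urlencode

BASE = "https://api.attester.ai/v1"


def records_url(slug, gene=None, state=None, min_confidence=None, limit=100):
    """Return a records URL containing only the filters that were supplied."""
    params = {
        "gene": gene,
        "state": state,
        "minConfidence": min_confidence,
        "limit": limit,
    }
    # Drop filters left at None so they are not sent as empty values.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE}/datasets/{slug}/records?{query}"


url = records_url(
    "pharmacogenomics-variant-map",
    gene="CYP2D6",
    state="S5",
    min_confidence=0.85,
)
print(url)
```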
Evidence states
| State | Meaning | Equivalent source | Typical confidence |
|---|---|---|---|
| S5 | Regulatory / guideline-level | CPIC guideline, FDA label, PharmGKB 1A | 0.85–1.0 |
| S4 | Strong clinical evidence | CIViC level A, PharmGKB 1B | 0.70–0.90 |
| S3 | Moderate evidence | CIViC level B, PharmGKB 2A/2B | 0.50–0.75 |
| S2 | Preliminary / emerging | CIViC level C/D, PharmGKB 3 | 0.30–0.55 |
| S1 | Case reports / speculative | PharmGKB 4, single case reports | 0.10–0.35 |
| S0 | Unclassified | Insufficient evidence to classify | 0.0–0.15 |
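The typical confidence ranges in the table overlap, so a score alone does not determine a state. As a rough sketch, the table can be encoded as a lookup to flag records whose score falls outside the typical range for their state, or to compare states by rank. The helper names are hypothetical, and the ranges are treated as descriptive only, since this page does not say they are enforced.

```python
# Typical confidence ranges copied from the evidence-state table.
TYPICAL_RANGE = {
    "S5": (0.85, 1.00),
    "S4": (0.70, 0.90),
    "S3": (0.50, 0.75),
    "S2": (0.30, 0.55),
    "S1": (0.10, 0.35),
    "S0": (0.00, 0.15),
}


def is_typical(state, score):
    """Return True when score lies inside the typical range for state."""
    low, high = TYPICAL_RANGE[state]
    return low <= score <= high


def at_least(state, minimum):
    """Return True when state ranks at or above minimum (S0 lowest, S5 highest)."""
    return int(state[1:]) >= int(minimum[1:])
```

For example, `is_typical("S5", 0.5)` is false, which might prompt a closer look at that record's audit trail.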
Pagination
GET /v1/datasets/pharmacogenomics-variant-map/records?page=2&limit=100
# Response includes _links.next when more pages exist:
{
"meta": { "total": 847, "page": 2, "limit": 100, "pages": 9 },
"_links": { "next": "/v1/datasets/.../records?page=3&limit=100" }
}
Code examples
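The `_links.next` field in the pagination response can drive a simple loop that stops when no further page is advertised. The sketch below uses only the standard library to stay dependency-free. It assumes that records arrive in a `data` array (inferred from the cURL example on this page) and that `_links.next` is a path relative to https://api.attester.ai. The names `fetch_json` and `iter_records` are hypothetical.

```python
import json
import urllib.request

API_ROOT = "https://api.attester.ai"  # _links.next is shown as a path starting with /v1


def fetch_json(path, api_key):
    """GET one page and decode the JSON body."""
    req = urllib.request.Request(
        API_ROOT + path,
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iter_records(first_path, api_key, fetch=fetch_json):
    """Yield records page by page until the response carries no _links.next."""
    path = first_path
    while path:
        page = fetch(path, api_key)
        # Assumption: the record array is named "data".
        yield from page.get("data", [])
        path = (page.get("_links") or {}).get("next")
```

A call such as `iter_records("/v1/datasets/pharmacogenomics-variant-map/records?limit=100", key)` would then walk every page. Each page counts as one API call against the monthly quota, so the Parquet download is likely the cheaper option for full exports.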
Python — pandas + Parquet
import requests, pandas as pd, io
HEADERS = {"Authorization": "Bearer at_live_your_key_here"}
BASE = "https://api.attester.ai/v1"
SLUG = "pharmacogenomics-variant-map"
# Download the full Parquet file (preferred for ML pipelines)
resp = requests.get(f"{BASE}/datasets/{SLUG}/download", headers=HEADERS)
df = pd.read_parquet(io.BytesIO(resp.content))
# Filter for high-confidence CYP2D6 records
pgx = df[(df["geneSymbol"] == "CYP2D6") & (df["confidenceScore"] > 0.7)]
print(pgx[["variantName", "drug", "evidenceState", "confidenceScore"]].head(10))
Python — polars (fast, lazy evaluation)
import polars as pl, requests, io
HEADERS = {"Authorization": "Bearer at_live_your_key_here"}
resp = requests.get("https://api.attester.ai/v1/datasets/pharmacogenomics-variant-map/download", headers=HEADERS)
df = pl.read_parquet(io.BytesIO(resp.content))
# Multi-gene PGx panel query
panel = ["CYP2D6", "CYP2C19", "CYP2C9", "DPYD", "TPMT", "UGT1A1"]
result = (
    df.lazy()
    .filter(pl.col("geneSymbol").is_in(panel))
    .filter(pl.col("evidenceState").is_in(["S4", "S5"]))
    .filter(pl.col("confidenceScore") > 0.75)
    .select(["geneSymbol", "variantName", "drug", "clinicalSignificance", "confidenceScore"])
    .sort("confidenceScore", descending=True)
    .collect()
)
print(result)
R — arrow + dplyr
library(httr2)
library(arrow)
library(dplyr)
resp <- request("https://api.attester.ai/v1/datasets/pharmacogenomics-variant-map/download") |>
  req_headers(Authorization = "Bearer at_live_your_key_here") |>
  req_perform()
# req_perform() returns a response object; read its raw body as Parquet
df <- read_parquet(resp_body_raw(resp))
df |>
  filter(geneSymbol %in% c("CYP2D6", "CYP2C19"), confidenceScore > 0.7) |>
  select(geneSymbol, variantName, drug, evidenceState, confidenceScore) |>
  arrange(desc(confidenceScore))
DuckDB — in-process SQL on Parquet
-- The query below reads a local file, so neither the httpfs extension
-- nor any S3 setting is required. Save the Parquet file first:
--   curl -H "Authorization: Bearer at_live_..." \
--     https://api.attester.ai/v1/datasets/pharmacogenomics-variant-map/download \
--     -o pgx.parquet
SELECT
  geneSymbol,
  drug,
  evidenceState,
  ROUND(confidenceScore, 3) AS confidence,
  clinicalSignificance
FROM read_parquet('pgx.parquet')
WHERE evidenceState IN ('S4', 'S5')
  AND confidenceScore > 0.75
ORDER BY confidenceScore DESC
LIMIT 20;
cURL — paginated records
# CYP2D6 records filtered with state=S4, first 100
curl "https://api.attester.ai/v1/datasets/pharmacogenomics-variant-map/records?gene=CYP2D6&state=S4&limit=100" \
  -H "Authorization: Bearer at_live_your_key_here" \
  | jq '.data[] | {gene: .geneSymbol, drug, state: .evidenceState, score: .confidenceScore}'
# Download as CSV instead of Parquet
curl "https://api.attester.ai/v1/datasets/pharmacogenomics-variant-map/download?format=csv" \
-H "Authorization: Bearer at_live_your_key_here" \
  -o pgx-variant-map.csv
Data quality & provenance
Every dataset published on Attester includes an FDA Provenance Data Card — a structured attestation of where the data came from, how it was curated, and what quality checks were applied. This is what makes Attester data usable in regulated AI contexts.
The qualityDimensions object in every dataset response includes a score (0–100) and a human-readable rationale for each dimension. The overall quality field is the weighted composite. These scores are recomputed on every ingestion run.
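One way to use these scores is to list the dimensions that fall below a threshold, together with their rationale, before accepting a dataset into a pipeline. This page does not show the exact response shape, so the sketch below assumes `qualityDimensions` maps each dimension name to an object with `score` and `rationale` keys. The helper name, the dimension names, and the sample payload are all made up for illustration.

```python
def weak_dimensions(dataset, threshold=80):
    """Return (name, score, rationale) for dimensions scoring below threshold."""
    # Assumed shape: {"qualityDimensions": {name: {"score": int, "rationale": str}}}
    dims = dataset.get("qualityDimensions", {})
    flagged = [
        (name, dim["score"], dim["rationale"])
        for name, dim in dims.items()
        if dim["score"] < threshold
    ]
    # Lowest score first, so the weakest dimension is reviewed first.
    return sorted(flagged, key=lambda item: item[1])


# Made-up payload for illustration only; not real API output.
example = {
    "quality": 88,
    "qualityDimensions": {
        "completeness": {"score": 95, "rationale": "illustrative text"},
        "freshness": {"score": 72, "rationale": "illustrative text"},
    },
}
print(weak_dimensions(example))
```

Because the scores are recomputed on every ingestion run, a check like this would need to be repeated whenever the dataset is refreshed.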
Rate limits & plans
| | Explorer | Builder | Enterprise |
|---|---|---|---|
| API access | Catalog only | Full /v1 access | Full /v1 access |
| Monthly API calls | — | 100,000 | Unlimited |
| Records per page | — | 500 | 500 |
| Download format | — | Parquet + CSV | Parquet + CSV |
| Provenance card | Preview | Full | Full + custom |
| Support | Community | Dedicated SLA | |
| Price | Free | Contact us | Contact us |
API call limits reset on the first of each month. Requests that exceed the limit return 429 Too Many Requests. Rate limiting is tracked at the team level, not per key — all keys in a team share the monthly quota.
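Because the quota is monthly and shared across the team, immediately retrying a 429 is unlikely to help. It may be more useful to tell the caller when the limit resets. The sketch below computes the first day of the following month from a calendar date. The helper names are hypothetical, the reset time zone is not stated on this page, and it is also not stated whether 429 is used for any shorter-term throttling.

```python
from datetime import date


def next_quota_reset(today):
    """Return the first day of the month after today."""
    if today.month == 12:
        return date(today.year + 1, 1, 1)
    return date(today.year, today.month + 1, 1)


def quota_message(status_code, today):
    """Describe a 429 in terms of the monthly reset; return None otherwise."""
    if status_code != 429:
        return None
    reset = next_quota_reset(today).isoformat()
    return f"Team quota exhausted; limits reset on {reset}"
```

A client could call `quota_message(resp.status_code, date.today())` after each request and surface the message instead of retrying.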