API Reference v1

Validated biomedical evidence,
delivered by API

Query CIViC, PharmGKB, and DailyMed records — curated, deduplicated, and scored for confidence — without building or maintaining any pipelines. Every record carries an evidence state (S0–S5), a Markov confidence score, and a full source audit trail.

3 data sources: CIViC · PharmGKB · DailyMed

Evidence-graded: S0–S5 + confidence score

Parquet + CSV: Columnar for ML pipelines

FDA provenance: Audit-ready data cards

Authentication

All /v1 endpoints require a Bearer API key in the Authorization header. Keys are scoped to your team and prefixed with at_live_.

curl https://api.attester.ai/v1/datasets \
  -H "Authorization: Bearer at_live_your_key_here"

Getting a key

Keys are created from your account dashboard. The full key is shown exactly once — store it in a secret manager (AWS Secrets Manager, Vault, GitHub Actions secrets). We store only a SHA-256 hash.
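One way to keep the key out of source code is to read it from an environment variable that your secret manager populates. The sketch below assumes a variable named ATTESTER_API_KEY; that name and the auth_headers helper are hypothetical and not part of the API. Only the at_live_ prefix comes from the documentation above.

```python
import os

KEY_PREFIX = "at_live_"       # prefix stated in the Authentication section
ENV_VAR = "ATTESTER_API_KEY"  # hypothetical variable name chosen for this sketch


def auth_headers(env=None):
    """Build the Authorization header from an environment mapping."""
    env = os.environ if env is None else env
    key = env.get(ENV_VAR, "").strip()
    if not key.startswith(KEY_PREFIX):
        raise RuntimeError(f"{ENV_VAR} is missing or does not start with {KEY_PREFIX!r}")
    return {"Authorization": f"Bearer {key}"}


# Example with an explicit mapping instead of the real environment
headers = auth_headers({"ATTESTER_API_KEY": "at_live_your_key_here"})
```

Calling auth_headers() with no argument reads the real process environment, so the key never has to appear in a script or notebook.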

Builder plan required: /v1 access requires an active Builder or Enterprise subscription. Explorer accounts receive catalog metadata but cannot call the records or download endpoints.

Quick start

Three steps from zero to your first PGx DataFrame:

Create an API key

Go to Account → API Keys, click Create key. Copy the full key — you won't see it again.

Pick a dataset

Call /v1/datasets to get the catalog. Use the slug field as the dataset identifier for all subsequent calls.

Query or download

Stream paginated records with /v1/datasets/:slug/records, or pull the full Parquet file for batch ML workflows.

import io

import pandas as pd
import requests

API_KEY = "at_live_your_key_here"
BASE    = "https://api.attester.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Pick a dataset: list the catalog and take the first slug
resp = requests.get(f"{BASE}/datasets", headers=HEADERS)
resp.raise_for_status()  # stop early on an HTTP error status
slug = resp.json()["datasets"][0]["slug"]      # e.g. "pharmacogenomics-variant-map"

# Query: fetch the first page of evidence records
resp = requests.get(
    f"{BASE}/datasets/{slug}/records",
    headers=HEADERS,
    params={"gene": "CYP2D6", "minConfidence": "0.7", "limit": 100},
)
resp.raise_for_status()
page = resp.json()

print(f"{page['meta']['total']} records for CYP2D6 at or above 0.70 confidence")

# Download: pull the full Parquet file into a DataFrame
resp = requests.get(f"{BASE}/datasets/{slug}/download", headers=HEADERS)
resp.raise_for_status()
df = pd.read_parquet(io.BytesIO(resp.content))
print(df.head())

Endpoints

Base URL: https://api.attester.ai

GET /v1/datasets

Returns all published datasets with metadata, schema, provenance dimensions, and permit/prohibit use policies. Use this to discover available datasets and their slugs.

Response

{
  "count": 2,
  "datasets": [
    {
      "id": "cm9...",
      "slug": "pharmacogenomics-variant-map",
      "name": "Pharmacogenomics Variant Map",
      "domain": "Genomics",
      "modality": "Structured",
      "license": "commercial",
      "quality": 81,
      "records": "14,312",
      "auditDate": "April 2026",
      "permitUses": ["commercial_ml", "research", "clinical_decision_support"],
      "prohibitUses": ["redistribution_without_attribution"],
      "qualityDimensions": {
        "completeness":  { "score": 85, "rationale": "All mandatory fields present across CIViC + PharmGKB sources" },
        "sourceTrust":   { "score": 83, "rationale": "Primary sources: CIViC (CC0), PharmGKB (CC BY-SA 4.0), DailyMed (public domain)" },
        "schemaStd":     { "score": 76, "rationale": "Unified evidence level schema across 3 sources" },
        "curationDepth": { "score": 79, "rationale": "12,847 ingested, 1,403 PhD-reviewed and approved" }
      }
    }
  ]
}
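The catalog response can be screened before any records are fetched. The following is a minimal sketch that keeps only datasets permitting a given use and meeting a quality threshold; the helper names usable_datasets and record_count are hypothetical, and the sample dictionary is a trimmed copy of the response above.

```python
def usable_datasets(catalog, required_use, min_quality=0):
    """Return slugs of datasets that permit `required_use` and meet `min_quality`."""
    slugs = []
    for ds in catalog.get("datasets", []):
        if required_use in ds.get("prohibitUses", []):
            continue  # an explicit prohibition always wins
        if required_use in ds.get("permitUses", []) and ds.get("quality", 0) >= min_quality:
            slugs.append(ds["slug"])
    return slugs


def record_count(ds):
    """The sample response formats `records` as a string with thousands separators."""
    return int(str(ds["records"]).replace(",", ""))


# Trimmed copy of the sample response
catalog = {
    "datasets": [{
        "slug": "pharmacogenomics-variant-map",
        "quality": 81,
        "records": "14,312",
        "permitUses": ["commercial_ml", "research", "clinical_decision_support"],
        "prohibitUses": ["redistribution_without_attribution"],
    }],
}

research_ready = usable_datasets(catalog, "research", min_quality=80)
```

Note that record_count exists only because the sample shows records as the string "14,312" rather than a number; if the live API returns an integer, the helper still works.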

GET /v1/datasets/:slug

Full dataset detail including the FDA Provenance Data Card, sample rows, schema definition, and source methodology. Use the provenance card in regulatory submissions.

Response

{
  "dataset": {
    "slug": "pharmacogenomics-variant-map",
    "provenanceCard": {
      "datasetName": "Pharmacogenomics Variant Map",
      "version": "1.0.0",
      "auditDate": "2026-04-19",
      "sources": [
        { "name": "CIViC",    "license": "CC0-1.0",      "recordCount": 9841 },
        { "name": "PharmGKB", "license": "CC-BY-SA-4.0", "recordCount": 3204 },
        { "name": "DailyMed", "license": "public_domain","recordCount": 1267 }
      ],
      "attestations": [
        { "type": "phd_review", "reviewerCredentials": "PhD Pharmacogenomics, Stanford", "count": 1403 }
      ],
      "qualityScore": 81,
      "integrityHash": "sha256:4a8f..."
    }
  }
}
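Before attaching a provenance card to a submission, it can be worth a quick sanity check that the per-source record counts add up. The sketch below sums recordCount across sources using a trimmed copy of the sample response; source_record_total is a hypothetical helper, and comparing the sum with the catalog's records field is a suggested check, not a documented guarantee.

```python
def source_record_total(detail):
    """Sum recordCount over all sources listed in the provenance card."""
    card = detail["dataset"]["provenanceCard"]
    return sum(src["recordCount"] for src in card["sources"])


# Trimmed copy of the sample response
detail = {
    "dataset": {
        "slug": "pharmacogenomics-variant-map",
        "provenanceCard": {
            "sources": [
                {"name": "CIViC", "license": "CC0-1.0", "recordCount": 9841},
                {"name": "PharmGKB", "license": "CC-BY-SA-4.0", "recordCount": 3204},
                {"name": "DailyMed", "license": "public_domain", "recordCount": 1267},
            ],
        },
    },
}

total = source_record_total(detail)
```

For the sample data the total is 14312, which matches the "14,312" records value shown in the catalog sample.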

GET /v1/datasets/:slug/records

Paginated list of real evidence records linked to a dataset. Only returns records with reviewStatus: approved. Supports filtering by gene, drug, evidence state, and confidence score.

Parameters

`page`	integer	Page number (default: 1)
`limit`	integer	Records per page (max 500, default 100)
`gene`	string	Filter by gene symbol, e.g. CYP2D6, BRAF (case-insensitive, partial match)
`drug`	string	Filter by drug name (case-insensitive, partial match)
`state`	string	Filter by evidence state: S0–S5 (S5 = strongest regulatory evidence)
`minConfidence`	float	Minimum confidence score 0.0–1.0 (Markov-derived, accounts for evidence state + ratings)

Response

{
  "data": [
    {
      "id": "uuid...",
      "evidenceState": "S5",
      "confidenceScore": 0.91,
      "source": "civic",
      "sourceId": "1234",
      "sourceUrl": "https://civicdb.org/evidence/1234/summary",
      "license": "CC0-1.0",
      "geneSymbol": "CYP2D6",
      "variantName": "*4",
      "drug": "codeine",
      "disease": "Pain",
      "evidenceLevel": "A",
      "evidenceType": "PREDICTIVE",
      "clinicalSignificance": "Sensitivity/Response",
      "evidenceRating": 5,
      "description": "CYP2D6 poor metabolizers cannot convert codeine to morphine..."
    }
  ],
  "meta": { "total": 847, "page": 1, "limit": 100, "pages": 9, "datasetSlug": "pharmacogenomics-variant-map" },
  "_links": { "next": "/v1/datasets/pharmacogenomics-variant-map/records?page=2&limit=100" }
}

GET /v1/datasets/:slug/download

Stream the full dataset file. Defaults to Parquet (columnar, fast to load in pandas/polars/DuckDB). Append ?format=csv to get a CSV instead.

Parameters

`format`	string	Output format: omit for Parquet (default), or csv

Response headers include X-Attester-Format (parquet or csv) so you can detect the format programmatically.

Response

# Binary Parquet or CSV stream
# Content-Type: application/x-parquet  (or text/csv)
# Content-Disposition: attachment; filename="pharmacogenomics-variant-map-full.parquet"
# X-Attester-Format: parquet
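A client that accepts either format can branch on the X-Attester-Format header. The sketch below is one possible approach, with several assumptions: the helper names are hypothetical, a missing header is treated as Parquet (the documented default), CSV bodies are assumed to be UTF-8, and the Parquet branch assumes pandas with a Parquet engine is installed.

```python
import csv
import io


def detect_format(response_headers):
    """Return 'parquet' or 'csv' from X-Attester-Format (ASSUMED default: parquet)."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {k.lower(): v for k, v in response_headers.items()}
    fmt = lowered.get("x-attester-format", "parquet").strip().lower()
    if fmt not in ("parquet", "csv"):
        raise ValueError(f"unexpected X-Attester-Format: {fmt!r}")
    return fmt


def load_rows(response_headers, body):
    """Decode a download body (bytes) into a list of row dicts."""
    if detect_format(response_headers) == "csv":
        # ASSUMPTION: CSV downloads are UTF-8 encoded with a header row.
        return list(csv.DictReader(io.StringIO(body.decode("utf-8"))))
    import pandas as pd  # imported lazily; only needed for Parquet
    return pd.read_parquet(io.BytesIO(body)).to_dict("records")


# Example with a tiny in-memory CSV body
rows = load_rows({"X-Attester-Format": "csv"}, b"geneSymbol,drug\nCYP2D6,codeine\n")
```

With requests, resp.headers and resp.content can be passed straight in as load_rows(resp.headers, resp.content).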

Filtering & pagination

Filters are combinable. All string filters are case-insensitive and support partial matches. Use state=S4 or state=S5 plus a minConfidence threshold to get publication-ready, high-confidence records only.
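Combining filters comes down to building one query string. The following sketch validates the documented parameter ranges on the client before a request is sent; records_url is a hypothetical helper, and the lower bounds of 1 for page and limit are assumptions, since only the maximum limit of 500 is documented.

```python
from urllib.parse import urlencode

BASE = "https://api.attester.ai/v1"
VALID_STATES = {"S0", "S1", "S2", "S3", "S4", "S5"}


def records_url(slug, gene=None, drug=None, state=None,
                min_confidence=None, page=1, limit=100):
    """Build a /records URL, rejecting values outside the documented ranges."""
    if page < 1:
        raise ValueError("page must be >= 1")  # ASSUMED lower bound
    if not 1 <= limit <= 500:
        raise ValueError("limit must be between 1 and 500")
    if state is not None and state not in VALID_STATES:
        raise ValueError(f"state must be one of {sorted(VALID_STATES)}")
    if min_confidence is not None and not 0.0 <= min_confidence <= 1.0:
        raise ValueError("min_confidence must be between 0.0 and 1.0")
    params = {
        "gene": gene,
        "drug": drug,
        "state": state,
        "minConfidence": min_confidence,
        "page": page,
        "limit": limit,
    }
    # Drop unset filters so they do not appear in the query string.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE}/datasets/{slug}/records?{query}"


url = records_url("pharmacogenomics-variant-map",
                  gene="CYP2D6", state="S5", min_confidence=0.85)
```

Because state accepts a single value per request in this sketch, covering both S4 and S5 means issuing two requests and merging the results.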

Evidence states

State	Meaning	Equivalent source	Typical confidence
S5	Regulatory / guideline-level	CPIC guideline, FDA label, PharmGKB 1A	0.85–1.0
S4	Strong clinical evidence	CIViC level A, PharmGKB 1B	0.70–0.90
S3	Moderate evidence	CIViC level B, PharmGKB 2A/2B	0.50–0.75
S2	Preliminary / emerging	CIViC level C/D, PharmGKB 3	0.30–0.55
S1	Case reports / speculative	PharmGKB 4, single case reports	0.10–0.35
S0	Unclassified	Insufficient evidence to classify	0.0–0.15
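The table above can be encoded as a lookup for client-side checks, for example to flag records whose confidence score falls outside the typical range for their state. This is a sketch only: the ranges are copied from the table and described there as typical, so a score outside them is not necessarily an error. The helper names are hypothetical.

```python
# Typical confidence range per evidence state, copied from the table above
TYPICAL_CONFIDENCE = {
    "S5": (0.85, 1.00),
    "S4": (0.70, 0.90),
    "S3": (0.50, 0.75),
    "S2": (0.30, 0.55),
    "S1": (0.10, 0.35),
    "S0": (0.00, 0.15),
}


def is_typical(record):
    """True if the record's confidenceScore lies in the typical range for its state."""
    low, high = TYPICAL_CONFIDENCE[record["evidenceState"]]
    return low <= record["confidenceScore"] <= high


def states_at_or_above(state):
    """List the given state and every stronger one, e.g. for one request per state."""
    if state not in TYPICAL_CONFIDENCE:
        raise ValueError(f"unknown evidence state: {state!r}")
    return [f"S{i}" for i in range(int(state[1]), 6)]


sample_ok = is_typical({"evidenceState": "S5", "confidenceScore": 0.91})
strong_states = states_at_or_above("S4")
```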

Pagination

GET /v1/datasets/pharmacogenomics-variant-map/records?page=2&limit=100

# Response includes _links.next when more pages exist:
{
  "meta":   { "total": 847, "page": 2, "limit": 100, "pages": 9 },
  "_links": { "next": "/v1/datasets/.../records?page=3&limit=100" }
}
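To walk every page, a client can follow _links.next until it is absent. The sketch below takes the HTTP call as a function argument so that it can be exercised without a network; iter_records and the in-memory pages are hypothetical. The sample next link shows only page and limit, so whether filters carry over to the next link is something to verify against the live API.

```python
def iter_records(fetch_page, first_path):
    """Yield every record, following _links.next until no next link is returned."""
    path = first_path
    while path:
        page = fetch_page(path)
        yield from page.get("data", [])
        path = (page.get("_links") or {}).get("next")


# Stand-in for a real HTTP call. With requests this could be:
#   lambda path: requests.get("https://api.attester.ai" + path, headers=HEADERS).json()
fake_pages = {
    "/v1/datasets/demo/records?page=1&limit=2": {
        "data": [{"id": "a"}, {"id": "b"}],
        "_links": {"next": "/v1/datasets/demo/records?page=2&limit=2"},
    },
    "/v1/datasets/demo/records?page=2&limit=2": {
        "data": [{"id": "c"}],
        "_links": {},
    },
}

ids = [r["id"] for r in iter_records(fake_pages.__getitem__,
                                     "/v1/datasets/demo/records?page=1&limit=2")]
```

Because iter_records is a generator, pages are fetched lazily, which keeps API call usage proportional to how many records are actually consumed.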

Code examples

Python — pandas + Parquet

import requests, pandas as pd, io

HEADERS = {"Authorization": "Bearer at_live_your_key_here"}
BASE    = "https://api.attester.ai/v1"
SLUG    = "pharmacogenomics-variant-map"

# Download the full Parquet file (preferred for ML pipelines)
resp = requests.get(f"{BASE}/datasets/{SLUG}/download", headers=HEADERS)
df = pd.read_parquet(io.BytesIO(resp.content))

# Filter for high-confidence CYP2D6 records
pgx = df[(df["geneSymbol"] == "CYP2D6") & (df["confidenceScore"] > 0.7)]
print(pgx[["variantName", "drug", "evidenceState", "confidenceScore"]].head(10))

Python — polars (fast, lazy evaluation)

import polars as pl, requests, io

HEADERS = {"Authorization": "Bearer at_live_your_key_here"}
resp    = requests.get("https://api.attester.ai/v1/datasets/pharmacogenomics-variant-map/download", headers=HEADERS)

df = pl.read_parquet(io.BytesIO(resp.content))

# Multi-gene PGx panel query
panel = ["CYP2D6", "CYP2C19", "CYP2C9", "DPYD", "TPMT", "UGT1A1"]
result = (
    df.lazy()
    .filter(pl.col("geneSymbol").is_in(panel))
    .filter(pl.col("evidenceState").is_in(["S4", "S5"]))
    .filter(pl.col("confidenceScore") > 0.75)
    .select(["geneSymbol", "variantName", "drug", "clinicalSignificance", "confidenceScore"])
    .sort("confidenceScore", descending=True)
    .collect()
)
print(result)

R — arrow + dplyr

library(httr2)
library(arrow)
library(dplyr)

req <- request("https://api.attester.ai/v1/datasets/pharmacogenomics-variant-map/download") |>
  req_headers(Authorization = "Bearer at_live_your_key_here") |>
  req_perform()

df <- read_parquet(resp_body_raw(req))

df |>
  filter(geneSymbol %in% c("CYP2D6", "CYP2C19"), confidenceScore > 0.7) |>
  select(geneSymbol, variantName, drug, evidenceState, confidenceScore) |>
  arrange(desc(confidenceScore))

DuckDB — in-process SQL on Parquet

-- Save the Parquet file locally first (run in a shell):
-- curl -H "Authorization: Bearer at_live_..." \
--      https://api.attester.ai/v1/datasets/pharmacogenomics-variant-map/download \
--      -o pgx.parquet
--
-- Then query the local file. No httpfs extension or S3 settings are needed for this.

SELECT
    geneSymbol,
    drug,
    evidenceState,
    ROUND(confidenceScore, 3) AS confidence,
    clinicalSignificance
FROM read_parquet('pgx.parquet')
WHERE evidenceState IN ('S4', 'S5')
  AND confidenceScore > 0.75
ORDER BY confidenceScore DESC
LIMIT 20;

cURL — paginated records

# CYP2D6 records in evidence state S4, first 100 (repeat with state=S5 for the S5 set)
curl "https://api.attester.ai/v1/datasets/pharmacogenomics-variant-map/records?gene=CYP2D6&state=S4&limit=100" \
  -H "Authorization: Bearer at_live_your_key_here" | jq '.data[] | {gene: .geneSymbol, drug, state: .evidenceState, score: .confidenceScore}'

# Download as CSV instead of Parquet
curl "https://api.attester.ai/v1/datasets/pharmacogenomics-variant-map/download?format=csv" \
  -H "Authorization: Bearer at_live_your_key_here" \
  -o pgx-variant-map.csv

Data quality & provenance

Every dataset published on Attester includes an FDA Provenance Data Card — a structured attestation of where the data came from, how it was curated, and what quality checks were applied. This is what makes Attester data usable in regulated AI contexts.

Completeness

Are all mandatory fields populated? Missing gene symbols, drugs, or significance codes are flagged.

Schema standardization

CIViC, PharmGKB, and DailyMed use different evidence level systems. We normalize to a unified S0–S5 scale.

Source trust

Each source carries a trust score based on editorial rigor: CIViC (CC0, expert-curated) scores 85/100; PharmGKB (CPIC-aligned) scores 82/100.

Curation depth

Records reviewed and approved by credentialed PhD reviewers contribute to a higher curation depth score, tracked per dataset.

The qualityDimensions object in every dataset response includes a score (0–100) and a human-readable rationale for each dimension. The overall quality field is the weighted composite. These scores are recomputed on every ingestion run.
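The weights behind the composite are not given here, so the following is only an illustration of how a weighted composite is computed. It assumes equal weights, which is an assumption of this sketch and not a documented property of the API; composite_quality is a hypothetical helper and the scores come from the sample /v1/datasets response.

```python
def composite_quality(dimensions, weights=None):
    """Weighted mean of dimension scores. ASSUMPTION: equal weights unless given."""
    if weights is None:
        weights = {name: 1.0 for name in dimensions}
    total_weight = sum(weights[name] for name in dimensions)
    weighted_sum = sum(dimensions[name]["score"] * weights[name] for name in dimensions)
    return weighted_sum / total_weight


# Scores from the sample /v1/datasets response
sample = {
    "completeness": {"score": 85},
    "sourceTrust": {"score": 83},
    "schemaStd": {"score": 76},
    "curationDepth": {"score": 79},
}

score = composite_quality(sample)
```

For the sample scores the equal-weight mean is 80.75, which rounds to the sample quality value of 81. That agreement does not confirm the real weighting; treat the quality field returned by the API as authoritative.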

Rate limits & plans

	Explorer	Builder	Enterprise
API access	Catalog only	Full /v1 access	Full /v1 access
Monthly API calls	—	100,000	Unlimited
Records per page	—	500	500
Download format	—	Parquet + CSV	Parquet + CSV
Provenance card	Preview	Full	Full + custom
Support	Community	Email	Dedicated SLA
Price	Free	Contact us	Contact us

API call limits reset on the first of each month. Requests that exceed the limit return 429 Too Many Requests. Rate limiting is tracked at the team level, not per key — all keys in a team share the monthly quota.
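Since the quota is monthly, retrying immediately after a quota-related 429 is unlikely to help, so a client may prefer to fail with a message that says when the quota resets. The sketch below computes the first instant of the next month. The time zone is not stated above, so UTC is an assumption, and both helper names are hypothetical.

```python
from datetime import datetime, timezone


def next_quota_reset(now):
    """First instant of the month after `now` (time zone ASSUMED to be UTC)."""
    if now.month == 12:
        year, month = now.year + 1, 1
    else:
        year, month = now.year, now.month + 1
    return datetime(year, month, 1, tzinfo=timezone.utc)


def check_quota(status_code, now=None):
    """Raise a descriptive error on 429; do nothing for other status codes."""
    if status_code == 429:
        now = now or datetime.now(timezone.utc)
        reset = next_quota_reset(now)
        raise RuntimeError(f"Monthly API quota exhausted; resets at {reset.isoformat()}")


reset = next_quota_reset(datetime(2026, 12, 15, tzinfo=timezone.utc))
```

Because all keys in a team share the quota, a 429 seen by one service means every other service using that team's keys is affected too.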

Ready to integrate?

Get your API key in 60 seconds. No credit card required to explore the catalog. Builder plan unlocks full Parquet downloads and the records API.
