API Reference v1

Validated biomedical evidence, delivered by API

Query CIViC, PharmGKB, and DailyMed records — curated, deduplicated, and scored for confidence — without building or maintaining any pipelines. Every record carries an evidence state (S0–S5), a Markov confidence score, and a full source audit trail.

3 data sources: CIViC · PharmGKB · DailyMed
Evidence-graded: S0–S5 + confidence score
Parquet + CSV: columnar for ML pipelines
FDA provenance: audit-ready data cards

Authentication

All /v1 endpoints require a Bearer API key in the Authorization header. Keys are scoped to your team and prefixed with at_live_.

curl https://api.attester.ai/v1/datasets \
  -H "Authorization: Bearer at_live_your_key_here"

Getting a key

Keys are created from your account dashboard. The full key is shown exactly once — store it in a secret manager (AWS Secrets Manager, Vault, GitHub Actions secrets). We store only a SHA-256 hash.
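A minimal sketch of loading the key at runtime instead of hard-coding it. It assumes your secret manager or CI system exports the key as an environment variable. The name ATTESTER_API_KEY and the helper attester_headers are hypothetical; only the at_live_ prefix and the Bearer scheme come from this reference.

```python
import os


def attester_headers(env_var: str = "ATTESTER_API_KEY") -> dict:
    """Build the Authorization header from a key injected at runtime.

    ATTESTER_API_KEY is a hypothetical variable name; use whatever name
    your secret manager or CI system exports.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    # Keys are documented as prefixed with at_live_; reject anything else early.
    if not key.startswith("at_live_"):
        raise ValueError(f"{env_var} does not look like an Attester API key")
    return {"Authorization": f"Bearer {key}"}


# Usage: HEADERS = attester_headers()
```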

Builder plan required: /v1 access requires an active Builder or Enterprise subscription. Explorer accounts receive catalog metadata but cannot call the records or download endpoints.

Quick start

Three steps from zero to your first PGx DataFrame:

1. Create an API key. Go to Account → API Keys and click Create key. Copy the full key; you won't see it again.
2. Pick a dataset. Call /v1/datasets to get the catalog, and use the slug field as the dataset identifier in all subsequent calls.
3. Query or download. Fetch paginated records with /v1/datasets/:slug/records, or pull the full Parquet file for batch ML workflows.
import io

import pandas as pd
import requests

API_KEY = "at_live_your_key_here"
BASE    = "https://api.attester.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Step 2: list the catalog and pick a dataset slug
resp = requests.get(f"{BASE}/datasets", headers=HEADERS, timeout=30)
resp.raise_for_status()
slug = resp.json()["datasets"][0]["slug"]   # e.g. "pharmacogenomics-variant-map"

# Step 3a: fetch the first page of evidence records
resp = requests.get(
    f"{BASE}/datasets/{slug}/records",
    headers=HEADERS,
    params={"gene": "CYP2D6", "minConfidence": 0.7, "limit": 100},
    timeout=30,
)
resp.raise_for_status()
page = resp.json()
print(f"{page['meta']['total']} CYP2D6 records with confidence >= 0.70")

# Step 3b: download the full Parquet file into a DataFrame
resp = requests.get(f"{BASE}/datasets/{slug}/download", headers=HEADERS, timeout=300)
resp.raise_for_status()
df = pd.read_parquet(io.BytesIO(resp.content))
print(df.head())

Endpoints

Base URL: https://api.attester.ai

GET /v1/datasets

Returns all published datasets with metadata, schema, provenance dimensions, and permit/prohibit use policies. Use this to discover available datasets and their slugs.

Response
{
  "count": 2,
  "datasets": [
    {
      "id": "cm9...",
      "slug": "pharmacogenomics-variant-map",
      "name": "Pharmacogenomics Variant Map",
      "domain": "Genomics",
      "modality": "Structured",
      "license": "commercial",
      "quality": 81,
      "records": "14,312",
      "auditDate": "April 2026",
      "permitUses": ["commercial_ml", "research", "clinical_decision_support"],
      "prohibitUses": ["redistribution_without_attribution"],
      "qualityDimensions": {
        "completeness":  { "score": 85, "rationale": "All mandatory fields present across CIViC + PharmGKB sources" },
        "sourceTrust":   { "score": 83, "rationale": "Primary sources: CIViC (CC0), PharmGKB (CC BY-SA 4.0), DailyMed (public domain)" },
        "schemaStd":     { "score": 76, "rationale": "Unified evidence level schema across 3 sources" },
        "curationDepth": { "score": 79, "rationale": "12,847 ingested, 1,403 PhD-reviewed and approved" }
      }
    }
  ]
}
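Because the catalog carries machine-readable use policies and per-dimension quality scores, a pipeline can check them before pulling any data. The sketch below works on the parsed JSON from GET /v1/datasets (trimmed from the sample response above); the helper names slugs_permitting and weakest_dimension are hypothetical.

```python
def slugs_permitting(catalog: dict, use: str) -> list:
    """Slugs of datasets that list `use` in permitUses and not in prohibitUses."""
    return [
        ds["slug"]
        for ds in catalog.get("datasets", [])
        if use in ds.get("permitUses", []) and use not in ds.get("prohibitUses", [])
    ]


def weakest_dimension(dataset: dict) -> str:
    """Name of the quality dimension with the lowest score."""
    dims = dataset["qualityDimensions"]
    return min(dims, key=lambda name: dims[name]["score"])


# Trimmed copy of the documented GET /v1/datasets response
catalog = {
    "count": 1,
    "datasets": [{
        "slug": "pharmacogenomics-variant-map",
        "permitUses": ["commercial_ml", "research", "clinical_decision_support"],
        "prohibitUses": ["redistribution_without_attribution"],
        "qualityDimensions": {
            "completeness": {"score": 85},
            "sourceTrust": {"score": 83},
            "schemaStd": {"score": 76},
            "curationDepth": {"score": 79},
        },
    }],
}

print(slugs_permitting(catalog, "commercial_ml"))  # ['pharmacogenomics-variant-map']
print(weakest_dimension(catalog["datasets"][0]))   # schemaStd
```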
GET /v1/datasets/:slug

Returns full dataset detail, including the FDA Provenance Data Card, sample rows, schema definition, and source methodology. Use the provenance card when preparing regulatory submissions.

Response
{
  "dataset": {
    "slug": "pharmacogenomics-variant-map",
    "provenanceCard": {
      "datasetName": "Pharmacogenomics Variant Map",
      "version": "1.0.0",
      "auditDate": "2026-04-19",
      "sources": [
        { "name": "CIViC",    "license": "CC0-1.0",      "recordCount": 9841 },
        { "name": "PharmGKB", "license": "CC-BY-SA-4.0", "recordCount": 3204 },
        { "name": "DailyMed", "license": "public_domain","recordCount": 1267 }
      ],
      "attestations": [
        { "type": "phd_review", "reviewerCredentials": "PhD Pharmacogenomics, Stanford", "count": 1403 }
      ],
      "qualityScore": 81,
      "integrityHash": "sha256:4a8f..."
    }
  }
}
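In the sample responses, the per-source recordCount values sum to the catalog's records figure (9,841 + 3,204 + 1,267 = 14,312), so comparing the two is a cheap consistency check before citing a card. This reference does not state that the totals must always match, so treat a mismatch as something to investigate rather than a hard failure. The helper names are hypothetical.

```python
def source_total(card: dict) -> int:
    """Sum recordCount across the sources listed in a provenance card."""
    return sum(src["recordCount"] for src in card.get("sources", []))


def parse_display_count(text: str) -> int:
    """Convert a display string such as "14,312" into an int."""
    return int(text.replace(",", ""))


# Values copied from the sample provenance card and catalog entry above
card = {
    "sources": [
        {"name": "CIViC", "recordCount": 9841},
        {"name": "PharmGKB", "recordCount": 3204},
        {"name": "DailyMed", "recordCount": 1267},
    ]
}
catalog_records = "14,312"

print(source_total(card))                                          # 14312
print(source_total(card) == parse_display_count(catalog_records))  # True
```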
GET /v1/datasets/:slug/records

Returns a paginated list of evidence records linked to a dataset. Only records with reviewStatus: approved are returned. Supports filtering by gene, drug, evidence state, and confidence score.

Parameters
Parameter | Type | Description
page | integer | Page number (default: 1)
limit | integer | Records per page (default: 100, max: 500)
gene | string | Gene symbol filter, e.g. CYP2D6 or BRAF (case-insensitive, partial match)
drug | string | Drug name filter (case-insensitive, partial match)
state | string | Evidence state, S0–S5 (S5 = strongest regulatory evidence)
minConfidence | float | Minimum confidence score, 0.0–1.0 (Markov-derived; accounts for evidence state and ratings)
Response
{
  "data": [
    {
      "id": "uuid...",
      "evidenceState": "S5",
      "confidenceScore": 0.91,
      "source": "civic",
      "sourceId": "1234",
      "sourceUrl": "https://civicdb.org/evidence/1234/summary",
      "license": "CC0-1.0",
      "geneSymbol": "CYP2D6",
      "variantName": "*4",
      "drug": "codeine",
      "disease": "Pain",
      "evidenceLevel": "A",
      "evidenceType": "PREDICTIVE",
      "clinicalSignificance": "Sensitivity/Response",
      "evidenceRating": 5,
      "description": "CYP2D6 poor metabolizers cannot convert codeine to morphine..."
    }
  ],
  "meta": { "total": 847, "page": 1, "limit": 100, "pages": 9, "datasetSlug": "pharmacogenomics-variant-map" },
  "_links": { "next": "/v1/datasets/pharmacogenomics-variant-map/records?page=2&limit=100" }
}
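Validating filters client-side catches typos (an out-of-range limit, a state like "S6") before they cost an API call against the team quota. A minimal sketch, assuming the documented bounds above; the lower bound of 1 for limit and the helper name records_params are assumptions. The returned dict can be passed as params= to requests.get.

```python
from urllib.parse import urlencode

VALID_STATES = {f"S{i}" for i in range(6)}  # S0 through S5


def records_params(*, page=1, limit=100, gene=None, drug=None,
                   state=None, min_confidence=None) -> dict:
    """Validate filters client-side and return query parameters for /records."""
    if page < 1:
        raise ValueError("page starts at 1")
    if not 1 <= limit <= 500:  # documented maximum is 500; minimum of 1 is assumed
        raise ValueError("limit must be between 1 and 500")
    params = {"page": page, "limit": limit}
    if gene:
        params["gene"] = gene
    if drug:
        params["drug"] = drug
    if state is not None:
        state = state.upper()
        if state not in VALID_STATES:
            raise ValueError(f"unknown evidence state: {state}")
        params["state"] = state
    if min_confidence is not None:
        if not 0.0 <= min_confidence <= 1.0:
            raise ValueError("minConfidence must be in [0.0, 1.0]")
        params["minConfidence"] = min_confidence
    return params


query = urlencode(records_params(gene="CYP2D6", state="s4", min_confidence=0.7))
print(query)  # page=1&limit=100&gene=CYP2D6&state=S4&minConfidence=0.7
```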
GET /v1/datasets/:slug/download

Streams the full dataset file. Defaults to Parquet (columnar and fast to load in pandas, polars, or DuckDB). Append ?format=csv to get CSV instead.

Parameters
Parameter | Type | Description
format | string | Output format: omit for Parquet (default), or set to csv

Response headers include X-Attester-Format (parquet or csv) so you can detect the format programmatically.

Response
# Binary Parquet or CSV stream
# Content-Type: application/x-parquet  (or text/csv)
# Content-Disposition: attachment; filename="pharmacogenomics-variant-map-full.parquet"
# X-Attester-Format: parquet
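A minimal sketch of choosing a reader from the response. It prefers the documented X-Attester-Format header and, if the header is missing, falls back to the Parquet magic bytes ("PAR1" at the start and end of every Parquet file). Treating any other successful body as CSV is an assumption, and detect_format is a hypothetical helper. With requests, call it as detect_format(resp.headers, resp.content) after resp.raise_for_status(), then use pandas.read_parquet or pandas.read_csv on io.BytesIO(resp.content).

```python
PARQUET_MAGIC = b"PAR1"  # Parquet files start and end with these four bytes


def detect_format(headers, body: bytes) -> str:
    """Return "parquet" or "csv" for a /download response."""
    # Prefer the documented X-Attester-Format header (matched case-insensitively).
    lowered = {k.lower(): v for k, v in dict(headers).items()}
    declared = lowered.get("x-attester-format", "").strip().lower()
    if declared in ("parquet", "csv"):
        return declared
    # Fallback when the header is missing: sniff the Parquet magic bytes.
    if len(body) >= 8 and body[:4] == PARQUET_MAGIC and body[-4:] == PARQUET_MAGIC:
        return "parquet"
    return "csv"  # assumption: anything else from a successful download is CSV
```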

Filtering & pagination

Filters are combinable. All string filters are case-insensitive and support partial matches. Use state=S4 or state=S5 plus a minConfidence threshold to get publication-ready, high-confidence records only.

Evidence states

State | Meaning | Equivalent source | Typical confidence
S5 | Regulatory / guideline-level | CPIC guideline, FDA label, PharmGKB 1A | 0.85–1.0
S4 | Strong clinical evidence | CIViC level A, PharmGKB 1B | 0.70–0.90
S3 | Moderate evidence | CIViC level B, PharmGKB 2A/2B | 0.50–0.75
S2 | Preliminary / emerging | CIViC level C/D, PharmGKB 3 | 0.30–0.55
S1 | Case reports / speculative | PharmGKB 4, single case reports | 0.10–0.35
S0 | Unclassified | Insufficient evidence to classify | 0.0–0.15
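Since the table orders states from S0 to S5, an "S4 or above" cut can also be applied locally to downloaded records, and the typical ranges can flag scores worth a second look. A sketch that encodes the table above; the helper names and the sample records are hypothetical, and a score outside its typical range is a prompt for review, not an error.

```python
STATE_RANK = {f"S{i}": i for i in range(6)}  # S0 (weakest) .. S5 (strongest)

# "Typical confidence" ranges from the evidence-state table
TYPICAL_CONFIDENCE = {
    "S5": (0.85, 1.0),
    "S4": (0.70, 0.90),
    "S3": (0.50, 0.75),
    "S2": (0.30, 0.55),
    "S1": (0.10, 0.35),
    "S0": (0.0, 0.15),
}


def at_least(records, floor="S4"):
    """Keep records whose evidence state is at or above `floor`."""
    return [r for r in records if STATE_RANK[r["evidenceState"]] >= STATE_RANK[floor]]


def atypical(record) -> bool:
    """True if the confidence score falls outside the state's typical range."""
    lo, hi = TYPICAL_CONFIDENCE[record["evidenceState"]]
    return not lo <= record["confidenceScore"] <= hi


# Hypothetical records, for illustration only
records = [
    {"evidenceState": "S5", "confidenceScore": 0.91},
    {"evidenceState": "S3", "confidenceScore": 0.62},
    {"evidenceState": "S4", "confidenceScore": 0.95},
]
strong = at_least(records)
print([r["evidenceState"] for r in strong])  # ['S5', 'S4']
print([atypical(r) for r in strong])         # [False, True]
```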

Pagination

GET /v1/datasets/pharmacogenomics-variant-map/records?page=2&limit=100

# Response includes _links.next when more pages exist:
{
  "meta":   { "total": 847, "page": 2, "limit": 100, "pages": 9 },
  "_links": { "next": "/v1/datasets/.../records?page=3&limit=100" }
}
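To walk every page, follow _links.next until a response no longer includes it, which is when no more pages exist. A minimal sketch; the HTTP call is injected as fetch_json so the loop stays independent of any client library (with requests it could be lambda url: requests.get(url, headers=HEADERS, timeout=30).json()). The helper name iter_records is hypothetical, and the next link is resolved against the base URL because the sample shows a host-relative path.

```python
from typing import Callable, Iterator
from urllib.parse import urljoin

BASE_URL = "https://api.attester.ai"


def iter_records(fetch_json: Callable[[str], dict], first_path: str) -> Iterator[dict]:
    """Yield every record, following _links.next until a page omits it."""
    url = urljoin(BASE_URL, first_path)
    while url:
        page = fetch_json(url)
        yield from page.get("data", [])
        next_path = page.get("_links", {}).get("next")
        # _links.next is a host-relative path, so resolve it against the base URL
        url = urljoin(BASE_URL, next_path) if next_path else None
```

Using limit=500 on the first path minimizes the number of calls counted against the monthly quota.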

Code examples

Python — pandas + Parquet

import requests, pandas as pd, io

HEADERS = {"Authorization": "Bearer at_live_your_key_here"}
BASE    = "https://api.attester.ai/v1"
SLUG    = "pharmacogenomics-variant-map"

# Download the full Parquet file (preferred for ML pipelines)
resp = requests.get(f"{BASE}/datasets/{SLUG}/download", headers=HEADERS, timeout=300)
resp.raise_for_status()  # surface 401/403/429 instead of failing inside read_parquet
df = pd.read_parquet(io.BytesIO(resp.content))

# Filter for high-confidence CYP2D6 records
pgx = df[(df["geneSymbol"] == "CYP2D6") & (df["confidenceScore"] > 0.7)]
print(pgx[["variantName", "drug", "evidenceState", "confidenceScore"]].head(10))

Python — polars (fast, lazy evaluation)

import polars as pl, requests, io

HEADERS = {"Authorization": "Bearer at_live_your_key_here"}
resp    = requests.get("https://api.attester.ai/v1/datasets/pharmacogenomics-variant-map/download", headers=HEADERS, timeout=300)
resp.raise_for_status()

df = pl.read_parquet(io.BytesIO(resp.content))

# Multi-gene PGx panel query
panel = ["CYP2D6", "CYP2C19", "CYP2C9", "DPYD", "TPMT", "UGT1A1"]
result = (
    df.lazy()
    .filter(pl.col("geneSymbol").is_in(panel))
    .filter(pl.col("evidenceState").is_in(["S4", "S5"]))
    .filter(pl.col("confidenceScore") > 0.75)
    .select(["geneSymbol", "variantName", "drug", "clinicalSignificance", "confidenceScore"])
    .sort("confidenceScore", descending=True)
    .collect()
)
print(result)

R — arrow + dplyr

library(httr2)
library(arrow)
library(dplyr)

resp <- request("https://api.attester.ai/v1/datasets/pharmacogenomics-variant-map/download") |>
  req_headers(Authorization = "Bearer at_live_your_key_here") |>
  req_perform()   # errors on HTTP 4xx/5xx by default

df <- read_parquet(resp_body_raw(resp))

df |>
  filter(geneSymbol %in% c("CYP2D6", "CYP2C19"), confidenceScore > 0.7) |>
  select(geneSymbol, variantName, drug, evidenceState, confidenceScore) |>
  arrange(desc(confidenceScore))

DuckDB — in-process SQL on Parquet

-- This query reads a local copy of the dataset, so save the Parquet download first:
-- curl -H "Authorization: Bearer at_live_..." \
--      https://api.attester.ai/v1/datasets/pharmacogenomics-variant-map/download \
--      -o pgx.parquet

SELECT
    geneSymbol,
    drug,
    evidenceState,
    ROUND(confidenceScore, 3) AS confidence,
    clinicalSignificance
FROM read_parquet('pgx.parquet')
WHERE evidenceState IN ('S4', 'S5')
  AND confidenceScore > 0.75
ORDER BY confidenceScore DESC
LIMIT 20;

cURL — paginated records

# CYP2D6 records at evidence state S4, first page of up to 100
curl -s "https://api.attester.ai/v1/datasets/pharmacogenomics-variant-map/records?gene=CYP2D6&state=S4&limit=100" \
  -H "Authorization: Bearer at_live_your_key_here" | jq '.data[] | {gene: .geneSymbol, drug, state: .evidenceState, score: .confidenceScore}'

# Download as CSV instead of Parquet
curl "https://api.attester.ai/v1/datasets/pharmacogenomics-variant-map/download?format=csv" \
  -H "Authorization: Bearer at_live_your_key_here" \
  -o pgx-variant-map.csv

Data quality & provenance

Every dataset published on Attester includes an FDA Provenance Data Card — a structured attestation of where the data came from, how it was curated, and what quality checks were applied. This is what makes Attester data usable in regulated AI contexts.

Completeness: Are all mandatory fields populated? Missing gene symbols, drugs, or significance codes are flagged.
Schema standardization: CIViC, PharmGKB, and DailyMed use different evidence level systems; we normalize them to a unified S0–S5 scale.
Source trust: Each source carries a trust score based on editorial rigor. CIViC (CC0, expert-curated) scores 85/100; PharmGKB (CPIC-aligned) scores 82/100.
Curation depth: Records reviewed and approved by credentialed PhD reviewers raise the curation depth score, which is tracked per dataset.

The qualityDimensions object in every dataset response includes a score (0–100) and a human-readable rationale for each dimension. The overall quality field is the weighted composite. These scores are recomputed on every ingestion run.
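The weights behind the composite are not published here. As a sketch, assuming equal weights, the four sample dimension scores (85, 83, 76, 79) average to 80.75, which rounds to the quality value of 81 shown in the catalog response; this agreement may not hold for other datasets. The helper name composite_quality is hypothetical, and you can pass your own weights dict to experiment.

```python
def composite_quality(quality_dimensions: dict, weights=None) -> float:
    """Weighted mean of dimension scores; equal weights unless given (an assumption)."""
    scores = {name: dim["score"] for name, dim in quality_dimensions.items()}
    weights = weights or {name: 1.0 for name in scores}
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Scores from the sample qualityDimensions object
dims = {
    "completeness": {"score": 85},
    "sourceTrust": {"score": 83},
    "schemaStd": {"score": 76},
    "curationDepth": {"score": 79},
}
print(composite_quality(dims))         # 80.75
print(round(composite_quality(dims)))  # 81
```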

Rate limits & plans

Feature | Explorer | Builder | Enterprise
API access | Catalog only | Full /v1 access | Full /v1 access
Monthly API calls |  | 100,000 | Unlimited
Records per page |  | 500 | 500
Download format |  | Parquet + CSV | Parquet + CSV
Provenance card | Preview | Full | Full + custom
Support | Community | Email | Dedicated SLA
Price | Free | Contact us | Contact us

API call limits reset on the first of each month. Requests that exceed the limit return 429 Too Many Requests. Rate limiting is tracked at the team level, not per key — all keys in a team share the monthly quota.
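Going by the monthly, team-wide quota described above, retrying a 429 with a short backoff is unlikely to help, so a client may do better to fail with a message that says when the quota resets. A minimal sketch; the time zone of the reset is not documented (the dates below are calendar dates only), and QuotaExceeded, next_quota_reset, and check_quota are hypothetical names. Call check_quota(resp.status_code, date.today()) after each request.

```python
from datetime import date


class QuotaExceeded(RuntimeError):
    """Raised when the team's monthly API quota is exhausted (HTTP 429)."""


def next_quota_reset(today: date) -> date:
    """First day of the following month. The reset time zone is not documented."""
    if today.month == 12:
        return date(today.year + 1, 1, 1)
    return date(today.year, today.month + 1, 1)


def check_quota(status_code: int, today: date) -> None:
    """Turn a 429 into an error that says when the shared quota resets."""
    if status_code == 429:
        reset = next_quota_reset(today).isoformat()
        raise QuotaExceeded(f"Team API quota exhausted; expected to reset on {reset}")
```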

Ready to integrate?
Get your API key in 60 seconds. No credit card required to explore the catalog. Builder plan unlocks full Parquet downloads and the records API.