API Download Instructions

Note: This portal uses Harvard Dataverse for all dataset hosting.

Step 1: Install dependencies

Run this in your terminal:

pip install pyDataverse httpx

Step 2: Configure connection

Copy and paste the following into your script, replacing <YOUR_API_TOKEN> with your own token. To get an API token on Harvard Dataverse, log in, click your account name → “API Token” → “Create Token”.

BASE_URL   = "https://dataverse.harvard.edu"
API_TOKEN  = "<YOUR_API_TOKEN>"
DV_ALIAS   = "US-EPA-SIM-FACILITY"
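
Before fetching anything, you can confirm that the base URL and connection are good. A minimal read-only check with pyDataverse, which simply reports the server version:

from pyDataverse.api import NativeApi

api = NativeApi(BASE_URL, API_TOKEN)
# A successful response here confirms the Dataverse instance is reachable.
print(api.get_info_version().json())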

Step 3: Fetch the tree

Retrieve all datasets and files under the collection:

from pyDataverse.api import NativeApi
import json

api  = NativeApi(BASE_URL, API_TOKEN)
# List every dataset under the collection alias along with its datafiles.
tree = api.get_children(DV_ALIAS, children_types=["datasets","datafiles"])

with open("facility_tree.json","w") as f:
    json.dump(tree, f, indent=2)
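
Before downloading, it can help to confirm the tree has the shape the download loop in Step 4 expects: a list of dataset nodes, each carrying its datafiles in a "children" list. A quick sanity check under that assumption:

# Count dataset nodes and the datafile entries nested under them.
n_datasets = sum(1 for node in tree if node.get("type") == "dataset")
n_files    = sum(len(node.get("children", [])) for node in tree)
print(f"Found {n_datasets} datasets and {n_files} datafiles under {DV_ALIAS}")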

Step 4: Download files

Loop through and download every datafile:

from pyDataverse.api import DataAccessApi

data_api = DataAccessApi(BASE_URL, API_TOKEN)

# Walk the tree: each dataset node lists its datafiles under "children".
for ds in tree:
    for f in ds.get("children", []):
        if f.get("type") != "datafile":
            continue
        fid     = f["datafile_id"]
        name    = f["filename"]
        # get_datafile returns an HTTP response; .content holds the raw bytes.
        # Note: fid is a numeric database ID, not a DOI; depending on your
        # pyDataverse version you may need to pass is_pid=False here.
        content = data_api.get_datafile(fid).content
        with open(name, "wb") as out:
            out.write(content)
        print("Saved:", name)

The same four steps download the US-EPA-SIM-STATE collection; only the collection alias and the tree filename change.

Step 1: Install dependencies

Run this in your terminal:

pip install pyDataverse httpx

Step 2: Configure connection

Copy and paste the following into your script, replacing <YOUR_API_TOKEN> with your own token. To get an API token on Harvard Dataverse, log in, click your account name → “API Token” → “Create Token”.

BASE_URL   = "https://dataverse.harvard.edu"
API_TOKEN  = "<YOUR_API_TOKEN>"
DV_ALIAS   = "US-EPA-SIM-STATE"

Step 3: Fetch the tree

Retrieve all datasets and files under the collection:

from pyDataverse.api import NativeApi
import json

api  = NativeApi(BASE_URL, API_TOKEN)
tree = api.get_children(DV_ALIAS, children_types=["datasets","datafiles"])

with open("state_tree.json","w") as f:
    json.dump(tree, f, indent=2)

Step 4: Download files

Loop through and download every datafile:

from pyDataverse.api import DataAccessApi

data_api = DataAccessApi(BASE_URL, API_TOKEN)

for ds in tree:
    for f in ds.get("children", []):
        if f.get("type") != "datafile":
            continue
        fid     = f["datafile_id"]
        name    = f["filename"]
        content = data_api.get_datafile(fid).content
        with open(name, "wb") as out:
            out.write(content)
        print("Saved:", name)

The US-EPA-CAMPD collection follows the same pattern, again with its own alias and tree filename.

Step 1: Install dependencies

Run this in your terminal:

pip install pyDataverse httpx

Step 2: Configure connection

Copy and paste the following into your script, replacing <YOUR_API_TOKEN> with your own token. To get an API token on Harvard Dataverse, log in, click your account name → “API Token” → “Create Token”.

BASE_URL   = "https://dataverse.harvard.edu"
API_TOKEN  = "<YOUR_API_TOKEN>"
DV_ALIAS   = "US-EPA-CAMPD"

Step 3: Fetch the tree

Retrieve all datasets and files under the collection:

from pyDataverse.api import NativeApi
import json

api  = NativeApi(BASE_URL, API_TOKEN)
tree = api.get_children(DV_ALIAS, children_types=["datasets","datafiles"])

with open("campd_tree.json","w") as f:
    json.dump(tree, f, indent=2)

Step 4: Download files

Loop through and download every datafile:

from pyDataverse.api import DataAccessApi

data_api = DataAccessApi(BASE_URL, API_TOKEN)

for ds in tree:
    for f in ds.get("children", []):
        if f.get("type") != "datafile":
            continue
        fid     = f["datafile_id"]
        name    = f["filename"]
        content = data_api.get_datafile(fid).content
        with open(name, "wb") as out:
            out.write(content)
        print("Saved:", name)