Run this in your terminal:
pip install pyDataverse httpx
Copy and paste the following into your script and replace <YOUR_API_TOKEN> (to get your API token on Harvard Dataverse, log in, then click your account name → “API Token” → “Create Token”):
BASE_URL = "https://dataverse.harvard.edu"
API_TOKEN = "<YOUR_API_TOKEN>"
DV_ALIAS = "US-EPA-SIM-FACILITY"
Retrieve all datasets and files under the collection:
from pyDataverse.api import NativeApi
import json

api = NativeApi(BASE_URL, API_TOKEN)

# Walk the collection and list every dataset and datafile beneath it
tree = api.get_children(DV_ALIAS, children_types=["datasets", "datafiles"])

# Save the tree for later inspection
with open("facility_tree.json", "w") as f:
    json.dump(tree, f, indent=2)
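get_children returns a list of dataset entries, each carrying a "children" list of datafile entries; the download loop below relies on exactly that shape. As a quick sanity check (a minimal sketch, assuming that structure), you can count what was found before downloading anything:

n_datasets = len(tree)
n_files = sum(
    1
    for ds in tree
    for child in ds.get("children", [])
    if child.get("type") == "datafile"
)
print(f"Found {n_datasets} datasets and {n_files} datafiles under {DV_ALIAS}")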
Loop through and download every datafile:
from pyDataverse.api import DataAccessApi

data_api = DataAccessApi(BASE_URL, API_TOKEN)

for ds in tree:
    for f in ds.get("children", []):
        # Skip anything that is not a datafile entry
        if f.get("type") != "datafile":
            continue
        fid = f["datafile_id"]
        name = f["filename"]
        # Fetch the raw bytes and write them to the working directory
        content = data_api.get_datafile(fid).content
        with open(name, "wb") as out:
            out.write(content)
        print("Saved:", name)
Run this in your terminal:
pip install pyDataverse httpx
Copy and paste the following into your script and replace <YOUR_API_TOKEN> (to get your API token on Harvard Dataverse, log in, then click your account name → “API Token” → “Create Token”):
BASE_URL = "https://dataverse.harvard.edu"
API_TOKEN = "<YOUR_API_TOKEN>"
DV_ALIAS = "US-EPA-SIM-STATE"
Retrieve all datasets and files under the collection:
from pyDataverse.api import NativeApi
import json

api = NativeApi(BASE_URL, API_TOKEN)

# Walk the collection and list every dataset and datafile beneath it
tree = api.get_children(DV_ALIAS, children_types=["datasets", "datafiles"])

# Save the tree for later inspection
with open("state_tree.json", "w") as f:
    json.dump(tree, f, indent=2)
Loop through and download every datafile:
from pyDataverse.api import DataAccessApi

data_api = DataAccessApi(BASE_URL, API_TOKEN)

for ds in tree:
    for f in ds.get("children", []):
        # Skip anything that is not a datafile entry
        if f.get("type") != "datafile":
            continue
        fid = f["datafile_id"]
        name = f["filename"]
        # Fetch the raw bytes and write them to the working directory
        content = data_api.get_datafile(fid).content
        with open(name, "wb") as out:
            out.write(content)
        print("Saved:", name)
Run this in your terminal:
pip install pyDataverse httpx
Copy and paste the following into your script and replace <YOUR_API_TOKEN> (to get your API token on Harvard Dataverse, log in, then click your account name → “API Token” → “Create Token”):
BASE_URL = "https://dataverse.harvard.edu"
API_TOKEN = "<YOUR_API_TOKEN>"
DV_ALIAS = "US-EPA-CAMPD"
Retrieve all datasets and files under the collection:
from pyDataverse.api import NativeApi
import json

api = NativeApi(BASE_URL, API_TOKEN)

# Walk the collection and list every dataset and datafile beneath it
tree = api.get_children(DV_ALIAS, children_types=["datasets", "datafiles"])

# Save the tree for later inspection
with open("campd_tree.json", "w") as f:
    json.dump(tree, f, indent=2)
Loop through and download every datafile:
from pyDataverse.api import DataAccessApi

data_api = DataAccessApi(BASE_URL, API_TOKEN)

for ds in tree:
    for f in ds.get("children", []):
        # Skip anything that is not a datafile entry
        if f.get("type") != "datafile":
            continue
        fid = f["datafile_id"]
        name = f["filename"]
        # Fetch the raw bytes and write them to the working directory
        content = data_api.get_datafile(fid).content
        with open(name, "wb") as out:
            out.write(content)
        print("Saved:", name)