The Accessor Class

Each type of file lives in a different place in the NSNSD file hierarchy and is parsed in a different way. However, once you know the where and the how, the logic for reading many of those file types from multiple sites is the same. The Accessor class provides this generalization: given a function for reading a datatype from a single site, each instance provides methods for applying that function to multiple sites.

Example: creating an accessor for RAVEN data:

import soundDENA
import pandas as pd

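def parseRAVENfile(pathToFile):
    # read_table expects tab-delimited text by default
    data = pd.read_table(pathToFile)
    # Normalize column names: "Species_1" becomes "Species 1", and so on through 4
    data.rename(columns={"Species_" + i: "Species " + i for i in "1234"}, inplace=True)
    return data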

pathToRAVENfileWithinSite = "02 ANALYSIS/RAVEN/table_{unit}{site}.txt"

ravenAccessor = soundDENA.Accessor(parseRAVENfile,
                                 pathToRAVENfileWithinSite)

## You can now do things like:

ravenAccessor.all(["DENAFANG2013", "DENAWEBU2009"])
## Returns a DataFrame of RAVEN data for both sites

for data, unit, site, year in ravenAccessor(soundDENA.metadata.query("elevation < 500")):
    # Pool the four species columns so a species listed in several columns counts once
    unique = pd.concat([data["Species " + i] for i in "1234"]).nunique()
    print("{} in {}: {} unique species".format(site, year, unique))

class soundDENA.Accessor(parserFunc, pathToData)

__call__(sites, quiet=True, **kwargs)

Iterate site-by-site over a type of data.

Data is yielded in the same order as the given sites, though some sites may be missing if they lack data.

Parameters:
  • sites (iterable) – siteID strings, or a pandas structure indexed by siteID
  • quiet (boolean, optional) – If True, do not print information about errors that occur
  • kwargs – Any keyword arguments specific to this filetype’s parse() or pathToData function
Yields:
  • data (varies) – Data from each site in the format returned by parse()
  • unit (str)
  • site (str)
  • year (str)
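
The skip-on-error behaviour described above can be sketched with a small stand-in generator. This is not soundDENA's implementation: iterSites, readOne, and fakeReader are hypothetical names, and the sketch assumes each site is already given as a (unit, site, year) tuple.

```python
def iterSites(readOne, sites, quiet=True, **kwargs):
    """Hypothetical stand-in for Accessor.__call__ (not soundDENA's code).

    readOne(unit, site, year, **kwargs) is an assumed callable that reads one site.
    """
    for unit, site, year in sites:
        try:
            data = readOne(unit, site, year, **kwargs)
        except Exception as err:
            # A site that cannot be read is skipped, so the output can be
            # shorter than the input while keeping the input order.
            if not quiet:
                print("Skipping {}{}{}: {}".format(unit, site, year, err))
            continue
        yield data, unit, site, year


def fakeReader(unit, site, year, **kwargs):
    # Pretend that only the FANG site has data on disk.
    if site != "FANG":
        raise FileNotFoundError("no data file")
    return {"rows": 3}


sites = [("DENA", "WEBU", "2009"), ("DENA", "FANG", "2013")]
results = list(iterSites(fakeReader, sites))
```

Because the unit, site, and year travel with each result, a caller can tell which sites were skipped without relying on position.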

__init__(parserFunc, pathToData)

Instantiate an Accessor for a specific filetype by giving a function to parse that kind of file, and where that file is located.

Parameters:
  • parserFunc (function) – A function which, given the path(s) to a file, parses and returns the data. This overrides the parse() method of the instance.
  • pathToData (str, pathlib.Path, or function) – Where to find the filetype in a site’s data directory

The docstring of parserFunc also becomes the docstring of the Accessor instance.

If pathToData is a string or pathlib.Path, it should be the path to this filetype relative to a data directory. The path can be a Python format string that takes the keyword arguments unit, site, and year. If the path contains a * character, it is also passed to glob, and the resulting list of matches is converted to pathlib.Path objects. (In this case, parserFunc should expect a list of paths rather than a single path.)

Examples for pathToData:

  • "01 DATA/PHOTOS/CardinalPhotoComposite_{unit}{site}.jpg"
  • soundDENA.paths.spl / "SRCID_{unit}{site}.txt"
  • soundDENA.paths.nvspl / "NVSPL_{unit}{site}*.txt"
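
As a rough illustration of the format-then-glob behaviour described above, here is a minimal sketch. resolvePath is a hypothetical helper, not part of soundDENA, and it assumes the template is relative to the site's data directory.

```python
from pathlib import Path


def resolvePath(dataDir, pathToData, unit, site, year):
    """Hypothetical helper (not part of soundDENA): fill in and expand a template."""
    # Substitute {unit}, {site}, and {year}; keywords the template
    # does not mention are simply ignored by str.format().
    relative = str(pathToData).format(unit=unit, site=site, year=year)
    if "*" in relative:
        # Wildcard: return every match as a sorted list of pathlib.Path objects.
        return sorted(Path(dataDir).glob(relative))
    # No wildcard: return a single path, whether or not the file exists.
    return Path(dataDir) / relative
```

With a wildcard template the parser receives a list, which may be empty if nothing matches; without one it receives a single path.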

If pathToData is a function, it should take the details of a specific site and return the path to the file(s) within that site, with this signature:

pathToData(dataDir, unit, site, year, **kwargs)
Parameters:
  • dataDir (pathlib.Path) – The root data directory for a site
  • unit (str) – Unit of the site
  • site (str) – Site code of the site
  • year (str) – Year of the site
  • kwargs – Any keyword arguments related to path selection. The same keyword arguments will be given to both pathToData and parserFunc, so they should both be able to handle unexpected keyword arguments by having **kwargs as the last item in their argument lists.
Returns:

Varies; the result is passed directly to parserFunc. Often, a pathlib.Path or list of pathlib.Path to the file(s) to be parsed within the specific site’s data directory.
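
A function with this signature might look like the sketch below. pathToSeasonTable, its season keyword, and the SEASONS folder are all made up for illustration; only the argument order and the trailing **kwargs come from the signature above.

```python
from pathlib import Path


def pathToSeasonTable(dataDir, unit, site, year, season=None, **kwargs):
    """Hypothetical pathToData function; the folder and file names are made up."""
    name = "table_{}{}".format(unit, site)
    if season is not None:
        # A keyword argument can steer which file is chosen.
        name += "_" + season
    # **kwargs absorbs keywords that are meant only for parserFunc.
    return Path(dataDir) / "02 ANALYSIS" / "SEASONS" / (name + ".txt")
```

Passing a keyword meant for the parser, such as a delimiter, does not raise here because **kwargs swallows it.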

access(site, **kwargs)

Read data from one site.

Parameters:
  • site

    A single site or data directory specifier:

    • siteID string
    • pathlib.Path to a data directory
    • tuple of (unit, site, year) (all strings)
    • tuple of (dataDir, unit, site, year) (all strings, dataDir as pathlib.Path)
  • kwargs – Any keyword arguments specific to this filetype’s parse() or pathToData function
Returns:

varies – The result of the instance’s parse() function (typically a pandas DataFrame or Panel)
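
To make the four accepted forms concrete, the sketch below sorts a specifier into a (dataDir, unit, site, year) tuple. normalizeSite is hypothetical, and the fixed-width split of a siteID (four characters of unit, four of site, then the year) is an assumption inferred from IDs such as "DENAFANG2013". How soundDENA locates the data directory for a siteID is not covered here, so unknown parts are left as None.

```python
from pathlib import Path


def normalizeSite(spec):
    """Hypothetical: return (dataDir, unit, site, year), with None for unknown parts."""
    if isinstance(spec, Path):
        # A bare data directory: unit, site, and year are not derived in this sketch.
        return (spec, None, None, None)
    if isinstance(spec, str):
        # Assumed siteID layout: 4-character unit, 4-character site, then the year.
        return (None, spec[:4], spec[4:8], spec[8:])
    spec = tuple(spec)
    if len(spec) == 3:
        return (None,) + spec
    if len(spec) == 4:
        return spec
    raise ValueError("Unrecognized site specifier: {!r}".format(spec))
```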

all(sites, quiet=True, **kwargs)

Read data from all specified sites into a single DataFrame or dict.

Parameters:
  • sites (iterable) – siteID strings, or a pandas structure indexed by siteID
  • quiet (boolean, optional) – If True, do not print information about errors that occur
  • kwargs – Any keyword arguments specific to this filetype’s parse() or pathToData function
Returns:

NDFrame or dict – If parse() returns a pandas NDFrame for each site, all sites are concatenated into one NDFrame, with siteID as the outermost level of a hierarchical index. Otherwise, returns a dict of {siteID: data}.
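
The two return shapes can be reproduced with plain pandas. The sketch below is an assumption about how such a combination step could work, not soundDENA's own code; combineSites and the sample data are hypothetical.

```python
import pandas as pd


def combineSites(perSite):
    """Hypothetical: merge {siteID: data} the way the return value is described."""
    values = list(perSite.values())
    if values and all(isinstance(v, (pd.Series, pd.DataFrame)) for v in values):
        # Concatenating a dict uses its keys as the outermost index level;
        # names= labels that level "siteID".
        return pd.concat(perSite, names=["siteID"])
    # Anything that is not a pandas object is returned keyed by siteID.
    return dict(perSite)


perSite = {
    "DENAFANG2013": pd.DataFrame({"count": [1, 2]}),
    "DENAWEBU2009": pd.DataFrame({"count": [3]}),
}
combined = combineSites(perSite)
```

Selecting combined.loc["DENAWEBU2009"] then recovers the rows for that one site.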

static parse(filepath, **kwargs)

Parse the data on disk at filepath. This method is overridden in each instance by passing a parse function into __init__().

All parse functions should have this signature:

Parameters:
  • filepath (pathlib.Path, or iterable of pathlib.Path) – The path(s) from which to read data
  • kwargs – Any keyword arguments specific to reading this filetype
Returns:

varies – Depends on what type of data is read. Typically, a pandas NDFrame.
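
A parse function that follows this signature could look like the sketch below. parseTables and its sep keyword are hypothetical; the point is that it accepts either one path or an iterable of paths and tolerates keyword arguments it does not use.

```python
from pathlib import Path

import pandas as pd


def parseTables(filepath, sep="\t", **kwargs):
    """Hypothetical parser: read one delimited text file, or several as one table."""
    if isinstance(filepath, (str, Path)):
        # Treat a single path as a one-item list so both cases share one code path.
        filepath = [filepath]
    frames = [pd.read_csv(path, sep=sep) for path in filepath]
    # Stack the per-file tables and renumber the rows from zero.
    return pd.concat(frames, ignore_index=True)
```

Accepting both forms lets the same parser serve a plain pathToData and one containing a * wildcard.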

paths(sites, quiet=True, **kwargs)

Iterate site-by-site over the paths to this sort of data file.

Parameters:
  • sites (iterable of str, or NDFrame) – siteID strings, or a pandas structure indexed by siteID
  • quiet (boolean, optional) – If True, do not print information about errors that occur
  • kwargs – Any keyword arguments specific to this filetype’s pathToData function
Yields:
  • path ((list of) pathlib.Path) – Path(s) to the data file(s) for the site
  • unit (str)
  • site (str)
  • year (str)