The Accessor Class
Each type of file lives in a different place in the NSNSD file hierarchy and is parsed in a different way. However, once you know where a file is and how to parse it, the logic for reading that type of file from multiple sites is the same. The Accessor class provides this generalization: given a function for reading a datatype from a single site, each instance provides methods for applying that function to multiple sites.
Example: creating an accessor for RAVEN data:
import soundDENA
import pandas as pd

def parseRAVENfile(pathToFile):
    data = pd.read_table(pathToFile)
    data.rename(columns={"Species_"+i: "Species "+i for i in "1234"}, inplace=True)
    return data

pathToRAVENfileWithinSite = "02 ANALYSIS/RAVEN/table_{unit}{site}.txt"
ravenAccessor = soundDENA.Accessor(parseRAVENfile,
                                   pathToRAVENfileWithinSite)

## You can now do things like:
ravenAccessor.all(["DENAFANG2013", "DENAWEBU2009"])
## Returns a DataFrame of RAVEN data for both the sites

for data, unit, site, year in ravenAccessor(soundDENA.metadata.query("elevation < 500")):
    unique = sum(data["Species "+i].nunique() for i in "1234")
    print("{} in {}: {} unique species".format(site, year, unique))

class soundDENA.Accessor(parserFunc, pathToData)
__call__(sites, quiet=True, **kwargs)
Iterate site-by-site over a type of data.
Data is yielded in the same order as the given sites, though some sites may be missing if they lack data.
Parameters:
- sites (iterable) – siteID strings, or a pandas structure indexed by siteID
- quiet (boolean, optional) – If True, do not print information about any errors that occur
- kwargs – Any keyword arguments specific to this filetype’s parse() or pathToData function
Yields:
- data (varies) – Data from each site, in the format returned by parse()
- unit (str)
- site (str)
- year (str)
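The ordering and skipping behavior described for __call__ can be illustrated with a small stand-alone generator. This is only a sketch of the semantics, not soundDENA's implementation: `iterSites`, `readSite`, and the in-memory `fakeData` are hypothetical names invented for this example, the generator yields only (data, siteID) rather than the full (data, unit, site, year), and a site is assumed to "lack data" when the reader raises FileNotFoundError.

```python
def iterSites(readSite, sites, quiet=True):
    """Yield (data, siteID) for each site, in the order given.

    Hypothetical stand-in for the iteration behavior: a site whose data
    cannot be read is skipped instead of aborting the whole loop.
    """
    for siteID in sites:
        try:
            data = readSite(siteID)
        except FileNotFoundError as err:
            if not quiet:
                print("Skipping {}: {}".format(siteID, err))
            continue
        yield data, siteID


# Made-up in-memory "files" so the sketch runs without any data on disk
fakeData = {"DENAFANG2013": [1, 2, 3], "DENAWEBU2009": [4, 5]}

def readSite(siteID):
    if siteID not in fakeData:
        raise FileNotFoundError("no data for " + siteID)
    return fakeData[siteID]

# The middle site has no data, so it is silently dropped from the results
results = list(iterSites(readSite, ["DENAWEBU2009", "DENAXXXX2000", "DENAFANG2013"]))
```

Passing quiet=False to the sketch prints a line for each skipped site, which is one way to find out why a site is absent from the output.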

__init__(parserFunc, pathToData)
Instantiate an Accessor for a specific filetype by giving a function to parse that kind of file, and where that file is located.
Parameters:
- parserFunc (function) – A function which, given the path(s) to a file, parses and returns the data. This overrides the parse() method of the instance.
- pathToData (str, pathlib.Path, or function) – Where to find the filetype in a site’s data directory
The docstring of parserFunc will also become the docstring of the Accessor instance.
If pathToData is a string or pathlib.Path, it should be the path to this filetype relative to a data directory. The path can look like a Python format string that takes the keyword arguments unit, site, and year. If the path contains a * character, it will also be passed to glob, and the resulting list will be converted to pathlib.Paths and returned. (In this case, parserFunc should also expect a list of paths.)
Examples for pathToData:
- "01 DATA/PHOTOS/CardinalPhotoComposite_{unit}{site}.jpg"
- soundDENA.paths.spl / "SRCID_{unit}{site}.txt"
- soundDENA.paths.nvspl / "NVSPL_{unit}{site}*.txt"
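To make the template rules concrete, here is a minimal sketch of how such a path could be filled in and, when it contains a wildcard, expanded. It is an illustration under stated assumptions rather than the library's actual code: `resolveTemplate` is a hypothetical helper, the file names are made up, and sorting the glob results is a choice made for this example only.

```python
import tempfile
from pathlib import Path

def resolveTemplate(dataDir, template, unit, site, year):
    """Hypothetical helper: fill in a path template, globbing if it contains '*'."""
    relative = str(template).format(unit=unit, site=site, year=year)
    if "*" in relative:
        # A wildcard produces a list of Paths (sorted here for a stable order),
        # so the parser function must accept a list
        return sorted(dataDir.glob(relative))
    return dataDir / relative

with tempfile.TemporaryDirectory() as tmp:
    dataDir = Path(tmp)
    nvsplDir = dataDir / "NVSPL"
    nvsplDir.mkdir()
    # Made-up file names, used only to give the glob something to match
    for name in ("NVSPL_DENAFANG_01.txt", "NVSPL_DENAFANG_02.txt", "NVSPL_DENAWEBU_01.txt"):
        (nvsplDir / name).touch()

    single = resolveTemplate(dataDir, "NVSPL/NVSPL_{unit}{site}_01.txt", "DENA", "FANG", "2013")
    many = resolveTemplate(dataDir, "NVSPL/NVSPL_{unit}{site}*.txt", "DENA", "FANG", "2013")
    manyNames = [p.name for p in many]
```

Note that a template does not have to use every keyword: the `year` argument is simply ignored by templates that do not mention `{year}`.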
If pathToData is a function, it should take the details of a specific site and return the path to the file(s) within that site, with this signature:
pathToData(dataDir, unit, site, year, **kwargs)
Parameters:
- dataDir (pathlib.Path) – The root data directory for a site
- unit (str) – Unit of the site
- site (str) – Site code of the site
- year (str) – Year of the site
- kwargs – Any keyword arguments related to path selection. The same keyword arguments will be given to both pathToData and parserFunc, so they should both be able to handle unexpected keyword arguments by having **kwargs as the last item in their argument lists.
Returns: Varies; the result is passed directly to parserFunc. Often, a pathlib.Path or list of pathlib.Path to the file(s) to be parsed within the specific site’s data directory.
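A function-style pathToData and its matching parser might look like the sketch below. Everything specific here is an assumption made for illustration: the names `pathToSplFile` and `parseSplFile`, the `weighting` and `skipHeader` keyword arguments, and the directory layout are all hypothetical. The point is only that each function names the keyword arguments it cares about and lets **kwargs absorb the rest.

```python
from pathlib import Path

def pathToSplFile(dataDir, unit, site, year, weighting="A", **kwargs):
    """Hypothetical pathToData function: choose a file based on a keyword argument."""
    # 'weighting' and the directory layout are made up for this example
    return dataDir / "SPL" / "{}_{}{}_{}.txt".format(weighting, unit, site, year)

def parseSplFile(filepath, skipHeader=True, **kwargs):
    """Hypothetical parserFunc; **kwargs absorbs 'weighting', which it does not use."""
    # A real parser would read the file; this one just reports what it was given
    return {"path": filepath, "skipHeader": skipHeader}

# Both functions receive the same keyword arguments
options = {"weighting": "C", "skipHeader": False}
path = pathToSplFile(Path("/data/DENAFANG2013"), "DENA", "FANG", "2013", **options)
parsed = parseSplFile(path, **options)
```

Without the trailing **kwargs, `parseSplFile` would raise a TypeError on the unexpected `weighting` argument, which is why the signature above asks for it.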

access(site, **kwargs)
Read data from one site.
Parameters:
- site – A single site or data directory specifier:
  - siteID string
  - pathlib.Path to a data directory
  - tuple of (unit, site, year) (all strings)
  - tuple of (dataDir, unit, site, year) (all strings, dataDir as pathlib.Path)
- kwargs – Any keyword arguments specific to this filetype’s parse() or pathToData function
Returns: varies – The result of the instance’s parse() function (typically a pandas DataFrame or Panel)

all(sites, quiet=True, **kwargs)
Read data from all specified sites into a single DataFrame or dict.
Parameters:
- sites (iterable) – siteID strings, or a pandas structure indexed by siteID
- quiet (boolean, optional) – If True, do not print information about any errors that occur
- kwargs – Any keyword arguments specific to this filetype’s parse() or pathToData function
Returns: NDFrame or dict – If parse() returns a pandas NDFrame for each site, all sites will be concatenated into one NDFrame, with siteID as the outermost level of a hierarchical index. Otherwise, returns a dict of { siteID: data }

static parse(filepath, **kwargs)
Parse data from disk located at filepath. This method is overridden in each instance by passing a parse function into __init__().
All parse functions should have this signature:
Parameters:
- filepath (pathlib.Path, or iterable of pathlib.Path) – The path(s) from which to read data
- kwargs – Any keyword arguments specific to reading this filetype
Returns: varies – Depends on what type of data is read. Typically, a pandas NDFrame.
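As an illustration of the signature above, the following sketch shows a parse function that accepts either a single path or an iterable of paths and tolerates extra keyword arguments. It is a hypothetical example, not part of soundDENA: the name `parseTabDelimited` and the sample files are invented, and it returns a plain list of row dicts (using the standard csv module) instead of a pandas NDFrame so that it runs with the standard library alone.

```python
import csv
import tempfile
from pathlib import Path

def parseTabDelimited(filepath, **kwargs):
    """Hypothetical parse function: read one or several tab-delimited files into row dicts."""
    # Accept either a single path or an iterable of paths
    paths = [filepath] if isinstance(filepath, (str, Path)) else list(filepath)
    rows = []
    for path in paths:
        with open(path, newline="") as f:
            rows.extend(csv.DictReader(f, delimiter="\t"))
    return rows

with tempfile.TemporaryDirectory() as tmp:
    # Two tiny made-up files to parse
    first = Path(tmp) / "a.txt"
    second = Path(tmp) / "b.txt"
    first.write_text("species\tcount\nraven\t3\n")
    second.write_text("species\tcount\nmagpie\t1\n")

    one = parseTabDelimited(first)
    # Unrecognized keyword arguments are ignored rather than raising an error
    both = parseTabDelimited([first, second], unusedOption=True)
```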

paths(sites, quiet=True, **kwargs)
Iterate site-by-site over the paths to this sort of data file.
Parameters:
- sites (iterable of str, or NDFrame) – siteID strings, or a pandas structure indexed by siteID
- kwargs – Any keyword arguments specific to this filetype’s pathToData function
Yields:
- path ((list of) pathlib.Path) – Path(s) to the data file(s) for the site
- unit (str)
- site (str)
- year (str)