geoglows.data
The data module provides functions for requesting forecasted and historical river discharge simulations. The data can be retrieved either from the REST data service hosted by ECMWF or from the repository sponsored by the AWS Open Data Program. The AWS source is typically faster and more reliable than the REST service.
In general, each function requires a river ID. The name of the ID varies with the streams network dataset. GEOGLOWS uses the TDX-Hydro streams dataset, in which the ID is called LINKNO. This is equivalent to the reach_id or common ID (COMID) used previously. To find a LINKNO (river ID number), refer to https://data.geoglows.org and browse the tutorials.
Forecasted Streamflow
- geoglows.data.forecast(*args, **kwargs)
Gets the average forecasted flow for a given river_id on a given date.
- Keyword Arguments:
river_id (int) – the ID of a stream, should be a 9-digit integer
date (str) – the date to request, as a string in YYYYMMDD format; returns the latest available if not specified
format (str) – if data_source is ‘rest’: csv, json, or url (default csv); if data_source is ‘aws’: df or xarray
data_source (str) – location to query for data, either ‘rest’ or ‘aws’; default is ‘aws’
- Returns:
pd.DataFrame or dict or str
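As a minimal sketch of how the keyword arguments above fit together, the helper below checks the river ID and formats the date before calling geoglows.data.forecast. The names build_forecast_kwargs and fetch_forecast are hypothetical, and 123456789 is a placeholder rather than a real river ID.

```python
from datetime import date


def build_forecast_kwargs(river_id, when=None, data_source="aws"):
    """Assemble keyword arguments for geoglows.data.forecast (hypothetical helper)."""
    # The documentation asks for a 9-digit integer river ID.
    if not isinstance(river_id, int) or not 100_000_000 <= river_id <= 999_999_999:
        raise ValueError("river_id should be a 9-digit integer")
    if data_source not in ("aws", "rest"):
        raise ValueError("data_source must be 'aws' or 'rest'")
    kwargs = {"river_id": river_id, "data_source": data_source}
    if when is not None:
        # Dates are passed as YYYYMMDD strings; omit the key to get the latest forecast.
        kwargs["date"] = when.strftime("%Y%m%d")
    return kwargs


def fetch_forecast(river_id, when=None, data_source="aws"):
    """Call the documented function with validated arguments."""
    # Imported lazily so the helper above can be used without geoglows installed.
    import geoglows

    return geoglows.data.forecast(**build_forecast_kwargs(river_id, when, data_source))


# Placeholder ID for illustration only; look up a real LINKNO before requesting data.
example_kwargs = build_forecast_kwargs(123456789, date(2024, 5, 1))
```

Calling fetch_forecast requires the geoglows package and network access; build_forecast_kwargs can be checked offline.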
- geoglows.data.forecast_stats(*args, **kwargs)
Retrieves the min, 25%, mean, median, 75%, and max river discharge of the 51 ensemble members for a river_id. The 52nd, higher-resolution member is excluded.
- Keyword Arguments:
river_id (int) – the ID of a stream, should be a 9-digit integer
date (str) – the date to request, as a string in YYYYMMDD format; returns the latest available if not specified
format (str) – if data_source is ‘rest’: csv, json, or url (default csv); if data_source is ‘aws’: df or xarray
data_source (str) – location to query for data, either ‘rest’ or ‘aws’; default is ‘aws’
- Returns:
pd.DataFrame or dict or str
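To illustrate what the six summary statistics mean, the sketch below computes them over synthetic values for a single time step, using only the standard library. It assumes the higher-resolution member is the last of the 52 values and uses linear interpolation for the quartiles. The interpolation method used by geoglows is not stated here, so library output may differ slightly.

```python
import statistics


def summarize_members(member_flows):
    """Summarize one time step of discharge values, one value per ensemble member."""
    # method="inclusive" interpolates linearly between the sorted data points.
    q25, median, q75 = statistics.quantiles(member_flows, n=4, method="inclusive")
    return {
        "min": min(member_flows),
        "25%": q25,
        "mean": statistics.fmean(member_flows),
        "median": median,
        "75%": q75,
        "max": max(member_flows),
    }


# Synthetic discharge values for 52 members at one time step; not real forecast data.
all_members = [float(value) for value in range(100, 152)]

# Assumption: the 52nd (last) value is the higher-resolution member, so it is dropped.
stats = summarize_members(all_members[:51])
```

With these synthetic values the summary covers members 100.0 through 150.0, and the 52nd value (151.0) has no effect on the result.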
- geoglows.data.forecast_ensembles(*args, **kwargs)
Retrieves each of the 52 time series of forecasted discharge for a river_id on a given date.
- Keyword Arguments:
river_id (int) – the ID of a stream, should be a 9-digit integer
date (str) – the date to request, as a string in YYYYMMDD format; returns the latest available if not specified
format (str) – if data_source is ‘rest’: csv, json, or url (default csv); if data_source is ‘aws’: df or xarray
data_source (str) – location to query for data, either ‘rest’ or ‘aws’; default is ‘aws’
- Returns:
pd.DataFrame or dict or str
- geoglows.data.forecast_records(*args, **kwargs)
Retrieves a record of the ensemble-average forecasted flow for the current year, from January 1 to the current date.
- Keyword Arguments:
river_id (int) – the ID of a stream, should be a 9-digit integer
start_date (str) – a YYYYMMDD string giving the earliest date this year to include; defaults to 14 days ago
end_date (str) – a YYYYMMDD string giving the latest date this year to include; defaults to the latest available
format (str) – csv, json, or url; default csv
- Returns:
pd.DataFrame or dict or str
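The start_date and end_date strings can be built from date objects. The sketch below mirrors the documented 14-day default on the client side so the window is explicit; records_window and fetch_records are hypothetical names, and the server-side default may be computed differently.

```python
from datetime import date, timedelta


def records_window(today, start=None, end=None):
    """Build start_date/end_date keyword arguments as YYYYMMDD strings."""
    if start is None:
        # Mirror the documented default of 14 days before today.
        start = today - timedelta(days=14)
    kwargs = {"start_date": start.strftime("%Y%m%d")}
    if end is not None:
        # Leave end_date out to fall back to the latest available date.
        kwargs["end_date"] = end.strftime("%Y%m%d")
    return kwargs


def fetch_records(river_id, today=None, start=None, end=None):
    """Request forecast records for the chosen window."""
    # Imported lazily so records_window can be used without geoglows installed.
    import geoglows

    window = records_window(today or date.today(), start, end)
    return geoglows.data.forecast_records(river_id=river_id, **window)


default_window = records_window(date(2024, 3, 10))
explicit_window = records_window(date(2024, 3, 10), date(2024, 1, 1), date(2024, 2, 1))
```

Passing today explicitly keeps the helper deterministic, which makes the window easy to check before any request is sent.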
Historical Simulation
- geoglows.data.retrospective(*args, **kwargs)
Retrieves the retrospective simulation of streamflow for a given river_id from S3 buckets.
- Parameters:
river_id (int) – the ID of a stream, should be a 9-digit integer
- Keyword Arguments:
format (str) – the format in which to return the data, either ‘df’ or ‘xarray’; default is ‘df’
storage_options (dict) – options to pass to the xarray open_dataset function
resolution (str) – resolution of the data to retrieve: hourly, daily, monthly, or yearly; default is hourly
- Returns:
pd.DataFrame or xr.Dataset
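A minimal sketch of choosing the resolution and return format follows. The helper names are hypothetical; the accepted values are taken from the argument descriptions above, and river_id is passed positionally because it is listed under Parameters.

```python
VALID_RESOLUTIONS = ("hourly", "daily", "monthly", "yearly")


def retrospective_kwargs(resolution="hourly", as_xarray=False):
    """Build the keyword arguments for geoglows.data.retrospective."""
    if resolution not in VALID_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {VALID_RESOLUTIONS}")
    return {
        "format": "xarray" if as_xarray else "df",
        "resolution": resolution,
    }


def fetch_retrospective(river_id, resolution="hourly", as_xarray=False):
    """Request the retrospective simulation for one river."""
    # Imported lazily so the argument check works without geoglows installed.
    import geoglows

    return geoglows.data.retrospective(
        river_id, **retrospective_kwargs(resolution, as_xarray)
    )


daily_df_kwargs = retrospective_kwargs("daily")
```

Checking the resolution locally turns a typo such as "weekly" into an immediate ValueError rather than a failed request.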
- geoglows.data.return_periods(*args, **kwargs)
Retrieves the return period thresholds, based on a specified historical simulation forcing, for a given river_id.
- Parameters:
river_id (int) – the ID of a stream, should be a 9-digit integer
- Keyword Arguments:
format (str) – the format in which to return the data, either ‘df’ or ‘xarray’; default is ‘df’
storage_options (dict) – options to pass to the xarray open_dataset function
distribution (str) – the method used to estimate the return period thresholds; default is ‘logpearson3’
- Returns:
pd.DataFrame or xr.Dataset
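One common use of return period thresholds is to classify a discharge value by the largest threshold it meets or exceeds. The sketch below shows that comparison on made-up numbers. The dictionary layout (return period in years mapped to a discharge threshold) is an assumption for illustration and is not the structure returned by geoglows.data.return_periods.

```python
def highest_exceeded(flow, thresholds):
    """Return the largest return period whose threshold the flow meets, or None."""
    exceeded = [years for years, limit in thresholds.items() if flow >= limit]
    return max(exceeded) if exceeded else None


# Made-up thresholds for illustration; real values come from the library.
example_thresholds = {2: 120.0, 5: 180.0, 10: 230.0, 25: 300.0}

moderate = highest_exceeded(200.0, example_thresholds)
below_all = highest_exceeded(50.0, example_thresholds)
```

Here a flow of 200.0 meets the 2-year and 5-year thresholds but not the 10-year one, so the result is 5. A flow below every threshold returns None.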