geoglows.data

The data module provides functions for requesting forecasted and historical river discharge simulations. The data can be retrieved from the REST data service hosted by ECMWF or from the repository sponsored by the AWS Open Data Program. The AWS source is typically faster and more reliable than the REST service.

In general, each function requires a river ID. The name for the ID varies based on the streams network dataset. In GEOGLOWS, which uses the TDX-Hydro streams dataset, it is called LINKNO. This is the same as the reach_id or common ID (COMID) used previously. To find a LINKNO (river ID number), please refer to https://data.geoglows.org and browse the tutorials.
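Because every function takes a river ID, it can help to check the value before making a request. The sketch below assumes only what this page states, namely that a river ID is a 9 digit integer. The helper name is_plausible_river_id is hypothetical and is not part of geoglows.

```python
def is_plausible_river_id(river_id) -> bool:
    """Return True if river_id is an int with exactly 9 digits.

    Hypothetical helper, not part of geoglows. It checks only the shape of
    the value, not whether the river exists in the TDX-Hydro dataset.
    """
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(river_id, bool) or not isinstance(river_id, int):
        return False
    return 100_000_000 <= river_id <= 999_999_999
```

A check like this catches strings and truncated IDs early, before any network call is made.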

Forecasted Streamflow

geoglows.data.forecast(*args, **kwargs)

Gets the average forecasted flow for a certain river_id on a certain date

Keyword Arguments:

river_id (int) – the ID of a stream, should be a 9 digit integer
date (str) – a string specifying the date to request in YYYYMMDD format, returns the latest available if not specified
format (str) – if data_source=="rest": csv, json, or url, default csv. if data_source=="aws": df or xarray
data_source (str) – location to query for data, either ‘rest’ or ‘aws’. default is aws.

Returns:

pd.DataFrame or dict or str
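The keyword arguments above depend on each other: the accepted format values change with data_source, and date must be a YYYYMMDD string. The following is a minimal sketch of checking those rules before a request. The helper build_forecast_kwargs and the ALLOWED_FORMATS table are hypothetical and are not part of geoglows; they only encode the options listed on this page.

```python
from datetime import datetime

# Formats listed in this reference for each data source
ALLOWED_FORMATS = {"rest": {"csv", "json", "url"}, "aws": {"df", "xarray"}}


def build_forecast_kwargs(river_id, date=None, data_source="aws", format=None):
    """Assemble and check keyword arguments for a forecast request.

    Hypothetical helper, not part of geoglows.
    """
    if data_source not in ALLOWED_FORMATS:
        raise ValueError("data_source must be 'rest' or 'aws'")
    kwargs = {"river_id": river_id, "data_source": data_source}
    if date is not None:
        # Require exactly 8 digits, then let strptime reject impossible dates
        if len(date) != 8 or not date.isdigit():
            raise ValueError("date must be a YYYYMMDD string")
        datetime.strptime(date, "%Y%m%d")
        kwargs["date"] = date
    if format is not None:
        if format not in ALLOWED_FORMATS[data_source]:
            raise ValueError(f"format {format!r} is not listed for {data_source!r}")
        kwargs["format"] = format
    return kwargs
```

The resulting dictionary could then be unpacked into a call such as geoglows.data.forecast(**kwargs), which requires the geoglows package and network access. Omitting date and format leaves the library defaults in effect.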

geoglows.data.forecast_stats(*args, **kwargs)

Retrieves the min, 25%, mean, median, 75%, and max river discharge of the 51 ensemble members for a river_id. The 52nd higher resolution member is excluded.

Keyword Arguments:

river_id (int) – the ID of a stream, should be a 9 digit integer
date (str) – a string specifying the date to request in YYYYMMDD format, returns the latest available if not specified
format (str) – if data_source=="rest": csv, json, or url, default csv. if data_source=="aws": df or xarray
data_source (str) – location to query for data, either ‘rest’ or ‘aws’. default is aws.

Returns:

pd.DataFrame or dict or str
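To illustrate what these statistics represent, the sketch below summarises a single time step of a 52 member ensemble using only the Python standard library. It is not the geoglows implementation. The function name ensemble_stats is hypothetical, and the percentile interpolation rule (the "inclusive" method of statistics.quantiles) is an assumption, since this page does not state which rule the service uses.

```python
import statistics


def ensemble_stats(members):
    """Summarise one time step of a 52 member ensemble.

    `members` holds the discharge of members 1 to 52 in order. The last
    value (the 52nd, higher resolution member) is left out of the statistics.
    Hypothetical helper, not part of geoglows.
    """
    if len(members) != 52:
        raise ValueError("expected 52 ensemble members")
    values = sorted(members[:51])
    # method="inclusive" interpolates linearly between the sorted data points
    # (assumed rule; the service may compute percentiles differently)
    q25, q50, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": values[0],
        "25%": q25,
        "mean": statistics.fmean(values),
        "median": q50,
        "75%": q75,
        "max": values[-1],
    }
```

Applying the function to every time step of the ensemble output would give one row of statistics per forecast time.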

geoglows.data.forecast_ensembles(*args, **kwargs)

Retrieves each of the 52 time series of forecasted discharge for a river_id on a certain date

Keyword Arguments:

river_id (int) – the ID of a stream, should be a 9 digit integer
date (str) – a string specifying the date to request in YYYYMMDD format, returns the latest available if not specified
format (str) – if data_source=="rest": csv, json, or url, default csv. if data_source=="aws": df or xarray
data_source (str) – location to query for data, either ‘rest’ or ‘aws’. default is aws.

Returns:

pd.DataFrame or dict or str

geoglows.data.forecast_records(*args, **kwargs)

Retrieves a csv showing the ensemble average forecasted flow for the year from January 1 to the current date

Keyword Arguments:

river_id (int) – the ID of a stream, should be a 9 digit integer
start_date (str) – a YYYYMMDD string giving the earliest date this year to include, defaults to 14 days ago.
end_date (str) – a YYYYMMDD string giving the latest date this year to include, defaults to latest available
format (str) – csv, json, or url, default csv.

Returns:

pd.DataFrame or dict or str
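The start_date default described above (14 days ago, as a YYYYMMDD string) can be reproduced with the standard library, which is useful when building an explicit date range. This is a minimal sketch; the helper name default_start_date is hypothetical and is not part of geoglows.

```python
from datetime import date, timedelta


def default_start_date(today=None) -> str:
    """Return the date 14 days before `today` as a YYYYMMDD string.

    Hypothetical helper, not part of geoglows. `today` defaults to the
    current local date.
    """
    today = today or date.today()
    return (today - timedelta(days=14)).strftime("%Y%m%d")
```

The returned string is in the form expected by the start_date and end_date arguments.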

Historical Simulation

geoglows.data.retrospective(*args, **kwargs)

Retrieves the retrospective simulation of streamflow for a given river_id from the AWS Open Data Program GEOGLOWS V2 S3 bucket

Parameters:

river_id (int) – the ID of a stream, should be a 9 digit integer
format (str) – the format to return the data, either ‘df’ or ‘xarray’. default is ‘df’

Returns:

pd.DataFrame

geoglows.data.daily_averages(river_id: int, **kwargs) → DataFrame

Retrieves daily average streamflow for a given river_id

Parameters:

river_id (int) – the ID of a stream, should be a 9 digit integer

Returns:

pd.DataFrame

geoglows.data.monthly_averages(river_id: int, **kwargs) → DataFrame

Retrieves monthly average streamflow for a given river_id

Parameters:

river_id (int) – the ID of a stream, should be a 9 digit integer

Returns:

pd.DataFrame

geoglows.data.annual_averages(river_id: int, **kwargs) → DataFrame

Retrieves annual average streamflow for a given river_id

Parameters:

river_id (int) – the ID of a stream, should be a 9 digit integer

Returns:

pd.DataFrame

geoglows.data.return_periods(*args, **kwargs)

Retrieves the return period thresholds based on a specified historic simulation forcing on a certain river_id.

Parameters:

river_id (int) – the ID of a stream, should be a 9 digit integer
format (str) – the format to return the data, either ‘df’ or ‘xarray’. default is ‘df’
method (str) – the method to use to estimate the return period thresholds. default is ‘gumbel1’

Changelog:

v1.4.0: adds method parameter for future expansion of multiple return period methods

Returns:

pd.DataFrame
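This page does not describe the computation behind method='gumbel1'. As background only, the sketch below shows the textbook Gumbel Type I estimate fitted by the method of moments to a series of annual maximum flows. Whether geoglows uses exactly this formula, sample rather than population standard deviation, or annual maxima as input are all assumptions. The function name gumbel_type1_threshold is hypothetical.

```python
import math
import statistics


def gumbel_type1_threshold(annual_maxima, return_period):
    """Estimate the flow with the given return period (in years).

    Textbook Gumbel Type I fit by the method of moments. Hypothetical
    helper, not the geoglows implementation.
    """
    if return_period <= 1:
        raise ValueError("return_period must be greater than 1")
    mean = statistics.fmean(annual_maxima)
    std = statistics.stdev(annual_maxima)  # sample standard deviation (assumed)
    # Frequency factor K; 0.5772 approximates the Euler-Mascheroni constant
    k = -(math.sqrt(6) / math.pi) * (
        0.5772 + math.log(math.log(return_period / (return_period - 1)))
    )
    return mean + k * std
```

With this formula the threshold grows with the return period, and the 2 year value falls slightly below the mean of the annual maxima because the Gumbel distribution is right-skewed.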

GEOGLOWS Model Utilities

geoglows.data.metadata_tables(columns: list = None) → DataFrame

Retrieves the master table of rivers metadata and properties as a pandas DataFrame

Parameters:

columns (list) – optional subset of column names to read from the parquet

Returns:

pd.DataFrame