geoglows.data

The data module provides functions for requesting forecasted and historical river discharge simulations. The data can be retrieved either from the REST data service hosted by ECMWF or from the repository sponsored by the AWS Open Data Program. The AWS source is typically faster and more reliable than the REST service.

In general, each function requires a river ID. The name of the ID varies with the streams network dataset. GEOGLOWS uses the TDX-Hydro streams dataset, in which the ID is called LINKNO. It corresponds to the reach_id or common ID (COMID) used previously. To find a LINKNO (river ID number), refer to https://data.geoglows.org and browse the tutorials.
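The argument descriptions in this module state that a river ID should be a 9 digit integer. A minimal sketch of a pre-flight check built on that statement follows. The helper name is_valid_river_id and the sample IDs are hypothetical and are not part of the geoglows package.

```python
def is_valid_river_id(river_id) -> bool:
    """Return True if river_id is a 9-digit positive integer."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(river_id, bool) or not isinstance(river_id, int):
        return False
    return 100_000_000 <= river_id <= 999_999_999


# Placeholder values used only to exercise the check, not real river IDs.
print(is_valid_river_id(123456789))
print(is_valid_river_id(12345))
```

Running a check like this before a request catches truncated or mistyped IDs without a network round trip.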

Forecasted Streamflow

geoglows.data.forecast(*args, **kwargs)

Gets the average forecasted flow for a certain river_id on a certain date

Keyword Arguments:
  • river_id (int) – the ID of a stream, should be a 9 digit integer

  • date (str) – a string specifying the date to request in YYYYMMDD format, returns the latest available if not specified

  • format (str) – if data_source == "rest": csv, json, or url, default csv. if data_source == "aws": df or xarray

  • data_source (str) – location to query for data, either ‘rest’ or ‘aws’. default is aws.

Returns:

pd.DataFrame or dict or str
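The keyword arguments above can be assembled and checked before the request is made. The sketch below is one possible way to do that. The helpers build_forecast_kwargs and fetch_forecast are hypothetical, the river ID is a placeholder, and the choice of "df" as the helper's default format for the aws source is an assumption, since the description above does not say which of df or xarray is the default.

```python
from datetime import date

# Formats listed in the argument descriptions for each data source.
ALLOWED_FORMATS = {"rest": ("csv", "json", "url"), "aws": ("df", "xarray")}


def build_forecast_kwargs(river_id, forecast_date=None, data_source="aws", fmt="df"):
    """Assemble keyword arguments for geoglows.data.forecast (hypothetical helper)."""
    if data_source not in ALLOWED_FORMATS:
        raise ValueError(f"data_source must be one of {sorted(ALLOWED_FORMATS)}")
    if fmt not in ALLOWED_FORMATS[data_source]:
        raise ValueError(f"format {fmt!r} is not listed for data_source {data_source!r}")
    kwargs = {"river_id": river_id, "format": fmt, "data_source": data_source}
    if forecast_date is not None:
        # The date argument is a YYYYMMDD string; omit it to get the latest forecast.
        kwargs["date"] = forecast_date.strftime("%Y%m%d")
    return kwargs


def fetch_forecast(**kwargs):
    """Call the library; requires geoglows to be installed and network access."""
    import geoglows  # imported lazily so the helper above works without it

    return geoglows.data.forecast(**kwargs)


# 123456789 is a placeholder, not a real river ID.
kwargs = build_forecast_kwargs(123456789, forecast_date=date(2024, 1, 15))
print(kwargs)
```

Passing the resulting dictionary to fetch_forecast(**kwargs) would issue the actual request. The same pattern should apply to forecast_stats and forecast_ensembles, which list the same keyword arguments.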

geoglows.data.forecast_stats(*args, **kwargs)

Retrieves the min, 25%, mean, median, 75%, and max river discharge of the 51 ensemble members for a river_id. The 52nd, higher-resolution member is excluded.

Keyword Arguments:
  • river_id (int) – the ID of a stream, should be a 9 digit integer

  • date (str) – a string specifying the date to request in YYYYMMDD format, returns the latest available if not specified

  • format (str) – if data_source == "rest": csv, json, or url, default csv. if data_source == "aws": df or xarray

  • data_source (str) – location to query for data, either ‘rest’ or ‘aws’. default is aws.

Returns:

pd.DataFrame or dict or str
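To make the six statistics concrete, the sketch below computes them for a single time step from synthetic member values using only the standard library. It illustrates the idea and is not the service's implementation. In particular, the quantile interpolation method is an assumption, because the description above does not say how the 25% and 75% values are computed.

```python
import statistics


def summarize_members(member_flows):
    """Summarize one time step of ensemble flows with the six listed statistics."""
    # "inclusive" interpolates linearly between data points (an assumed method).
    q25, median, q75 = statistics.quantiles(member_flows, n=4, method="inclusive")
    return {
        "min": min(member_flows),
        "25%": q25,
        "mean": statistics.fmean(member_flows),
        "median": median,
        "75%": q75,
        "max": max(member_flows),
    }


# Synthetic flows for 52 members; the last one stands in for the
# higher-resolution member, which is dropped before summarizing.
flows = [float(v) for v in range(1, 53)]
stats = summarize_members(flows[:51])
print(stats)
```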

geoglows.data.forecast_ensembles(*args, **kwargs)

Retrieves each of the 52 time series of forecasted discharge for a river_id on a certain date

Keyword Arguments:
  • river_id (int) – the ID of a stream, should be a 9 digit integer

  • date (str) – a string specifying the date to request in YYYYMMDD format, returns the latest available if not specified

  • format (str) – if data_source == "rest": csv, json, or url, default csv. if data_source == "aws": df or xarray

  • data_source (str) – location to query for data, either ‘rest’ or ‘aws’. default is aws.

Returns:

pd.DataFrame or dict or str

geoglows.data.forecast_records(*args, **kwargs)

Retrieves a csv showing the ensemble average forecasted flow for the year from January 1 to the current date

Keyword Arguments:
  • river_id (int) – the ID of a stream, should be a 9 digit integer

  • start_date (str) – a YYYYMMDD string giving the earliest date this year to include, defaults to 14 days ago.

  • end_date (str) – a YYYYMMDD string giving the latest date this year to include, defaults to latest available

  • format (str) – csv, json, or url, default csv.

Returns:

pd.DataFrame or dict or str
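The start_date and end_date strings can be derived with the standard datetime module. The sketch below mirrors the documented defaults on the client side. The helper records_window is hypothetical, and it takes the current date as an argument so that the result is reproducible.

```python
from datetime import date, timedelta


def records_window(today, start=None, end=None):
    """Return start_date/end_date kwargs as YYYYMMDD strings (hypothetical helper)."""
    if start is None:
        start = today - timedelta(days=14)  # documented default: 14 days ago
    kwargs = {"start_date": start.strftime("%Y%m%d")}
    if end is not None:
        # Leave end_date out to fall back to the latest available date.
        kwargs["end_date"] = end.strftime("%Y%m%d")
    return kwargs


print(records_window(date(2024, 3, 20)))
```

The returned dictionary could then be combined with a river_id and passed as keyword arguments to geoglows.data.forecast_records.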

Historical Simulation

geoglows.data.retrospective(*args, **kwargs)

Retrieves the retrospective simulation of streamflow for a given river_id from S3 buckets

Parameters:

river_id (int) – the ID of a stream, should be a 9 digit integer

Keyword Arguments:
  • format (str) – the format to return the data, either ‘df’ or ‘xarray’. default is ‘df’

  • storage_options (dict) – options to pass to the xarray open_dataset function

  • resolution (str) – resolution of data to retrieve: hourly, daily, monthly, or yearly. default hourly

Returns:

pd.DataFrame or xr.Dataset
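A request for retrospective data might be prepared as in the sketch below, which validates the documented choices for format and resolution before calling the library. The helpers build_retrospective_kwargs and fetch_retrospective are hypothetical, and the call itself assumes that geoglows is installed and that the S3 source is reachable.

```python
RESOLUTIONS = ("hourly", "daily", "monthly", "yearly")
FORMATS = ("df", "xarray")


def build_retrospective_kwargs(fmt="df", resolution="hourly"):
    """Validate and assemble keyword arguments (hypothetical helper)."""
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {FORMATS}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {RESOLUTIONS}")
    return {"format": fmt, "resolution": resolution}


def fetch_retrospective(river_id, **kwargs):
    """Call the library; requires geoglows to be installed and network access."""
    import geoglows  # imported lazily so the helper above works without it

    # river_id is passed positionally, matching the Parameters listing above.
    return geoglows.data.retrospective(river_id, **kwargs)


kwargs = build_retrospective_kwargs(resolution="daily")
print(kwargs)
```

Coarser resolutions such as monthly or yearly return fewer rows, which may be preferable when only long-term patterns are needed.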

geoglows.data.return_periods(*args, **kwargs)

Retrieves the return period thresholds for a certain river_id, based on a specified historic simulation forcing.

Parameters:

river_id (int) – the ID of a stream, should be a 9 digit integer

Keyword Arguments:
  • format (str) – the format to return the data, either ‘df’ or ‘xarray’. default is ‘df’

  • storage_options (dict) – options to pass to the xarray open_dataset function

  • distribution (str) – the method to use to estimate the return period thresholds. default is ‘logpearson3’

Returns:

pd.DataFrame or xr.Dataset
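One common use of return period thresholds is to classify a flow value by the largest threshold it meets. The sketch below shows that comparison with made-up numbers. The function highest_exceeded, the set of return periods, and the threshold values are all hypothetical and do not reflect the layout of the data that return_periods actually returns.

```python
def highest_exceeded(flow, thresholds):
    """Return the largest return period whose threshold the flow meets, or None."""
    exceeded = [period for period, limit in thresholds.items() if flow >= limit]
    return max(exceeded) if exceeded else None


# Made-up discharge thresholds keyed by return period in years.
thresholds = {2: 120.0, 5: 180.0, 10: 230.0, 25: 300.0}

print(highest_exceeded(200.0, thresholds))  # 5
print(highest_exceeded(50.0, thresholds))   # None
```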

GEOGLOWS Model Utilities