Requirements

The package functionality must respect the following rules:

  • The data and metadata can be tested separately.
  • The tests can be run locally via a simple function call.
  • There is no hardcoded dependency on GitHub (or on a specific repository): users can choose any submission process and still be able to run the validation code.

Functionality dropped from the previous iteration

  • The GitHub issue labelling feature will be moved to a different action directly in the main repository because it is not part of the validation process.
  • The comment linking to the forecast preview will be moved to a different action, controlled by a workflow_run event based on the success of the validations.
  • The zoltar upload workflow will be moved to a different repository (possibly the main repository).
  • The check that predicted values do not exceed population size is dropped entirely. I don’t believe it’s our role to gatekeep this. We only want to validate the forecast format, not the forecast itself.

Proposed interface

  • validate_model_data(file = ): validate a single .csv forecast file.
  • validate_model_metadata(file = ): validate a single .yml metadata file.
  • validate_model_folder(path = ): validate all the .csv forecast files and the .yml metadata file in a given folder.
  • validate_repository(path = ): validate all the model folders in a given folder / repository.
  • validate_pr(url = ): validate all the files edited in a pull request / all the relevant folders. This identifies which files have been modified and dispatches them to the relevant validation functions.
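
For illustration, here is what calling the proposed interface could look like (the folder layout, file names, and PR URL below are placeholders, not a prescribed structure):

    # Validate a single forecast file and its metadata
    validate_model_data(file = "teamA-modelX/2021-07-05-teamA-modelX.csv")
    validate_model_metadata(file = "teamA-modelX/metadata-teamA-modelX.yml")

    # Validate a whole model folder, or every model folder in a repository
    validate_model_folder(path = "teamA-modelX/")
    validate_repository(path = ".")

    # Validate only the files edited in a pull request
    validate_pr(url = "https://github.com/<org>/<repo>/pull/123")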

Tests

validate_model_data()

  • test that the file name follows the required format
  • test that the column forecast_date matches the file name
  • test that predicted values are positive
  • test that all required columns are present (in the expected order)
  • test that all columns have the expected type
  • test that quantiles match quantiles in the config file
  • test that horizons match horizons in the config file
  • test that target types match target types in the config file
  • test that target_end_dates match forecast_date + horizon
  • test that weekday of target_end_dates matches what is in the config file
  • test that weekday of forecast_date matches what is in the config file
  • test that predicted values are increasing with quantile (see the sketch after this list)
  • test that locations match locations in the config file
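
As an illustration of how one of these checks could be implemented, here is a minimal sketch of the quantile monotonicity test referenced above (the column names type, quantile, value, location, target, and forecast_date are assumptions based on the usual hub format):

    library(dplyr)

    # Sketch: predicted values should not decrease as the quantile level
    # increases, within each location / target / forecast_date group
    check_quantile_monotonicity <- function(forecast) {
      violations <- forecast %>%
        filter(type == "quantile") %>%
        group_by(location, target, forecast_date) %>%
        arrange(quantile, .by_group = TRUE) %>%
        summarise(ok = !is.unsorted(value), .groups = "drop") %>%
        filter(!ok)

      if (nrow(violations) > 0) {
        stop("predicted values are not increasing with quantile in ",
             nrow(violations), " group(s)")
      }
      invisible(TRUE)
    }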

validate_model_metadata()

  • validate against schema (see the sketch below)
  • test that the license is one of the accepted licenses (open question: where does the list of accepted licenses come from?)
  • test that the file name starts with model_abbr
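
A possible shape for the schema check, as a sketch assuming the metadata schema ships with the package as a JSON Schema file (the schema file name is a placeholder):

    library(yaml)
    library(jsonlite)
    library(jsonvalidate)

    # Sketch: read the .yml file and validate it against a JSON Schema
    validate_against_schema <- function(file, schema = "metadata-schema.json") {
      metadata <- yaml::read_yaml(file)
      jsonvalidate::json_validate(
        jsonlite::toJSON(metadata, auto_unbox = TRUE),
        schema,
        verbose = TRUE
      )
    }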

validate_model_folder()
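
No extra tests here: following the interface above, this function would presumably just loop over the folder contents and dispatch to the two functions above. A minimal sketch (the file-matching patterns are assumptions):

    # Sketch: run the data and metadata validations on every relevant file
    validate_model_folder <- function(path = ".") {
      csv_files <- list.files(path, pattern = "\\.csv$", full.names = TRUE)
      yml_files <- list.files(path, pattern = "\\.ya?ml$", full.names = TRUE)

      lapply(csv_files, validate_model_data)
      lapply(yml_files, validate_model_metadata)

      invisible(TRUE)
    }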

validate_repository()

  • test that each team only has a single primary model
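
A sketch of this check, assuming each metadata file records a team abbreviation and a model designation (the field names team_abbr and team_model_designation follow the US hub convention and are assumptions here):

    library(purrr)
    library(yaml)

    # Sketch: each team may declare at most one model as "primary"
    check_single_primary_model <- function(path = ".") {
      metadata_files <- list.files(
        path, pattern = "^metadata-.*\\.ya?ml$",
        recursive = TRUE, full.names = TRUE
      )
      metadata <- map(metadata_files, read_yaml)
      primary <- keep(metadata, ~ identical(.x$team_model_designation, "primary"))
      teams <- map_chr(primary, "team_abbr")

      duplicated_teams <- unique(teams[duplicated(teams)])
      if (length(duplicated_teams) > 0) {
        stop("more than one primary model for team(s): ",
             paste(duplicated_teams, collapse = ", "))
      }
      invisible(TRUE)
    }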

validate_pr()

  • tests from validate_model_folder()
  • test that the files are added in the correct location (instead of, e.g., at the root of the repo, as has happened in the case of the US hub)
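
One possible backend for the file listing, using the gh package against the GitHub REST API. Per the requirements above there must be no hardcoded GitHub dependency elsewhere, so the forge-specific logic would stay confined to this single function. This is a sketch: it assumes the PR branch is already checked out locally so that the reported paths resolve on disk.

    library(gh)

    # Sketch: list the files touched by a PR and dispatch them to the
    # relevant validation functions
    validate_pr <- function(url) {
      # e.g. "https://github.com/<org>/<repo>/pull/123"
      parts <- strsplit(sub("https://github.com/", "", url, fixed = TRUE), "/")[[1]]

      files <- gh(
        "GET /repos/{owner}/{repo}/pulls/{pull_number}/files",
        owner = parts[1], repo = parts[2], pull_number = parts[4]
      )

      for (f in files) {
        if (grepl("\\.csv$", f$filename)) validate_model_data(file = f$filename)
        if (grepl("\\.ya?ml$", f$filename)) validate_model_metadata(file = f$filename)
      }
      invisible(TRUE)
    }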