Meteorological meanderings, part 1: getting short-range US forecasts

NOAA’s NCEP generates a lot of weather data. Various numerical weather prediction models are executed at regular intervals multiple times a day, and their outputs are made available as FTP and HTTPS downloads. Many of these models have significant overlap in the data they generate, so a natural question would be, why not consolidate? The answer is that, because our planet’s atmosphere is chaotic, no numerical model gets everything right. Some models have been shown to make better predictions in one situation, while others are superior in another situation. Therefore, running multiple models gives meteorologists a chance to compare and contrast results in order to make better overall predictions.

For short-range CONUS (CONtiguous United States, no Alaska or Hawaii) forecasts, two models often referenced are NAM (North American Mesoscale) and RAP/HRRR (RAPid Refresh and High Resolution Rapid Refresh, respectively). NAM has been around since the early 2000s, while RAP was introduced in the 2010s. They run at different intervals and have different resolutions. HRRR has an improved resolution over RAP. The latest version of RAP and HRRR became operational in December 2020. Let’s take a look at how we can use HRRR for short-range forecasts.

Although some APIs are available from NOAA/NWS, it looks like they aren’t used very much. Instead, files with predictably-formatted names are uploaded to older FTP sites and to NOMADS (NOAA Operational Model Archive and Distribution System) during model runs. Then, people use scripts to download and process those files, a concept referred to as ETL.

For HRRR, the data files themselves are in the GRIB Edition 2 format, as standardized by the WMO (World Meteorological Organization, a UN agency). GRIB2 files can contain multiple data sets, or bands, inside them. For example, one band might be temperature at 2 meters above ground, while another band might be snow depth on the ground. Definitions for the bands can be found on NCEP’s GRIB2 pages. These bands contain data encoded in a grid pattern, based on a defined map projection for the grid. The current version of HRRR uses a Lambert conformal conic projection at a 3 km resolution.

Screenshot of https://nomads.ncep.noaa.gov/pub/data/nccf/com/hrrr/prod/hrrr.20210106/conus/

The above screenshot is just the top of the page. There are hundreds of files with similar-looking names in that directory. With basic scripting, it’s fairly easy to download the ones you need. Determining what you need, however, can be tricky. Looking at the NCEP Products Inventory page for HRRR is a first step. This page explains, among other things, the formatting of the filenames. The double zero in t00z represents the time the model ran, and the increasing double-digit number after the “f” is the forecast hour. This page also tells us that the “wrfnat” part of the filenames seen above represents Native Levels. Some searching leads to an FAQ page (with Fortran code, in 2021 no less) mentioning that native levels are sigma levels, and a bit more searching tells us that this is referring to the sigma coordinate system. Sure enough, opening one of the human-readable index (.idx) files reveals a whole lot of hybrid level data:

Screenshot of a GRIB2 index file with many entries mentioning "N hybrid level"

Speaking of index files, these are used to help reduce bandwidth utilization when downloading model data. Utilizing HTTP Range requests, with the help of these index files, it’s possible to download only the needed GRIB2 bands, rather than entire sets of these large files.

In addition to native level data, there are sub-hourly outputs, 3D pressure levels, and 2D surface levels. It looks like “wrfsfc”, 2D surface levels, is going to be useful for displaying basic forecast data, so let’s stick with that set of files.

How do we begin processing and analyzing these GRIB2 files, then? There are a few different applications and libraries that can work with this format. For both command-line and programmatic access, I’ve found GDAL to be a most useful library. Not only does it have a powerful set of CLI tools, but it also comes with a C/C++ API, along with Python and Java bindings. Debian and Ubuntu have a package ready for installation, named gdal-bin; many other Linux distros have packages as well, and there are multiple sources for pre-built Windows binaries, such as GISInternals. Executing the gdalinfo binary against a downloaded HRRR file will yield pages of information, starting with this:

Driver: GRIB/GRIdded Binary (.grb, .grb2)
Files: hrrr.t00z.wrfsfcf12.grib2
Size is 1799, 1059
Coordinate System is:
PROJCRS["unnamed",
    BASEGEOGCRS["Coordinate System imported from GRIB file",
        DATUM["unnamed",
            ELLIPSOID["Sphere",6371229,0,
                LENGTHUNIT["metre",1,
                    ID["EPSG",9001]]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]]],
    CONVERSION["Lambert Conic Conformal (2SP)",
        METHOD["Lambert Conic Conformal (2SP)",
            ID["EPSG",9802]],
        PARAMETER["Latitude of false origin",38.5,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",262.5,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",38.5,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",38.5,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["Metre",1]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["Metre",1]]]
Data axis to CRS axis mapping: 1,2
Origin = (-2699020.142521930858493,1588193.847443336388096)
Pixel Size = (3000.000000000000000,-3000.000000000000000)
Corner Coordinates:
Upper Left  (-2699020.143, 1588193.847) (134d 7'17.14"W, 47d50'44.64"N)
Lower Left  (-2699020.143,-1588806.153) (122d43'44.80"W, 21d 7'19.89"N)
Upper Right ( 2697979.857, 1588193.847) ( 60d53'28.48"W, 47d50'57.51"N)
Lower Right ( 2697979.857,-1588806.153) ( 72d16'48.48"W, 21d 7'28.62"N)
Center      (    -520.143,    -306.153) ( 97d30'21.52"W, 38d29'50.09"N)
Band 1 Block=1799x1 Type=Float64, ColorInterp=Undefined
  Description = 0[-] EATM="Entire Atmosphere"
  Metadata:
    GRIB_COMMENT=Maximum / Composite radar reflectivity [dB]
    GRIB_DISCIPLINE=0(Meteorological)
    GRIB_ELEMENT=REFC
    GRIB_FORECAST_SECONDS=43200 sec
    GRIB_IDS=CENTER=7(US-NCEP) SUBCENTER=0 MASTER_TABLE=2 LOCAL_TABLE=1 SIGNF_REF_TIME=1(Start_of_Forecast) REF_TIME=2021-01-10T00:00:00Z PROD_STATUS=0(Operational) TYPE=1(Forecast)
    GRIB_PDS_PDTN=0
    GRIB_PDS_TEMPLATE_ASSEMBLED_VALUES=16 196 2 0 83 0 0 1 12 10 0 0 255 0 0
    GRIB_PDS_TEMPLATE_NUMBERS=16 196 2 0 83 0 0 0 1 0 0 0 12 10 0 0 0 0 0 255 0 0 0 0 0
    GRIB_REF_TIME=  1610236800 sec UTC
    GRIB_SHORT_NAME=0-EATM
    GRIB_UNIT=[dB]
    GRIB_VALID_TIME=  1610280000 sec UTC
Band 2 Block=1799x1 Type=Float64, ColorInterp=Undefined
  Description = 0[-] CTL="Level of cloud tops"
  Metadata:
    GRIB_COMMENT=Echo Top [m]
    GRIB_DISCIPLINE=0(Meteorological)
    GRIB_ELEMENT=RETOP
    GRIB_FORECAST_SECONDS=43200 sec
    GRIB_IDS=CENTER=7(US-NCEP) SUBCENTER=0 MASTER_TABLE=2 LOCAL_TABLE=1 SIGNF_REF_TIME=1(Start_of_Forecast) REF_TIME=2021-01-10T00:00:00Z PROD_STATUS=0(Operational) TYPE=1(Forecast)
    GRIB_PDS_PDTN=0
    GRIB_PDS_TEMPLATE_ASSEMBLED_VALUES=16 3 2 0 83 0 0 1 12 3 0 0 255 0 0
    GRIB_PDS_TEMPLATE_NUMBERS=16 3 2 0 83 0 0 0 1 0 0 0 12 3 0 0 0 0 0 255 0 0 0 0 0
    GRIB_REF_TIME=  1610236800 sec UTC
    GRIB_SHORT_NAME=0-CTL
    GRIB_UNIT=[m]
    GRIB_VALID_TIME=  1610280000 sec UTC
Band 3 Block=1799x1 Type=Float64, ColorInterp=Undefined
  Description = 0[-] EATM="Entire Atmosphere"
  Metadata:
    GRIB_COMMENT=(prodType 0, cat 16, subcat 201) [-]
    GRIB_DISCIPLINE=0(Meteorological)
    GRIB_ELEMENT=unknown
    GRIB_FORECAST_SECONDS=43200 sec
    GRIB_IDS=CENTER=7(US-NCEP) SUBCENTER=0 MASTER_TABLE=2 LOCAL_TABLE=1 SIGNF_REF_TIME=1(Start_of_Forecast) REF_TIME=2021-01-10T00:00:00Z PROD_STATUS=0(Operational) TYPE=1(Forecast)
    GRIB_PDS_PDTN=0
    GRIB_PDS_TEMPLATE_ASSEMBLED_VALUES=16 201 2 0 83 0 0 1 12 10 0 0 255 0 0
    GRIB_PDS_TEMPLATE_NUMBERS=16 201 2 0 83 0 0 0 1 0 0 0 12 10 0 0 0 0 0 255 0 0 0 0 0
    GRIB_REF_TIME=  1610236800 sec UTC
    GRIB_SHORT_NAME=0-EATM
    GRIB_UNIT=[-]
    GRIB_VALID_TIME=  1610280000 sec UTC

Only the first three bands are shown above, but this particular file has 173 of them. As the filename hrrr.t00z.wrfsfcf12.grib2 indicates, this file was generated in the midnight cycle (t00z), and the data it contains is 12 hours in the future (f12).  The intimidating PROJCRS[...] multi-line string is the WKT ISO standard way of describing coordinate reference systems. After that, gdalinfo outputs metadata about every single band in the file.

How do we get some actual information out of this thing? First, we need to find a band that we care about. Let’s look at temperature, as an example:

Band 71 Block=1799x1 Type=Float64, ColorInterp=Undefined
  Description = 2[m] HTGL="Specified height level above ground"
  Metadata:
    GRIB_COMMENT=Temperature [C]
    GRIB_DISCIPLINE=0(Meteorological)
    GRIB_ELEMENT=TMP
    GRIB_FORECAST_SECONDS=43200 sec
    GRIB_IDS=CENTER=7(US-NCEP) SUBCENTER=0 MASTER_TABLE=2 LOCAL_TABLE=1 SIGNF_REF_TIME=1(Start_of_Forecast) REF_TIME=2021-01-10T00:00:00Z PROD_STATUS=0(Operational) TYPE=1(Forecast)
    GRIB_PDS_PDTN=0
    GRIB_PDS_TEMPLATE_ASSEMBLED_VALUES=0 0 2 0 83 0 0 1 12 103 0 2 255 0 0
    GRIB_PDS_TEMPLATE_NUMBERS=0 0 2 0 83 0 0 0 1 0 0 0 12 103 0 0 0 0 2 255 0 0 0 0 0
    GRIB_REF_TIME=  1610236800 sec UTC
    GRIB_SHORT_NAME=2-HTGL
    GRIB_UNIT=[C]
    GRIB_VALID_TIME=  1610280000 sec UTC

In this file, band 71 contains the temperature in degrees Celsius at 2m above ground level. (Strictly speaking, the GRIB2 temperature data is in Kelvin, but GDAL automatically converts it.) Some of the metadata fields to pay special attention to are GRIB_ELEMENT and GRIB_SHORT_NAME, because in many situations, a combination of these two will uniquely identify a data band that is of interest to us. Since we know which band we want to look at, we can query it using another GDAL binary:

$ gdallocationinfo -b 71 -wgs84 hrrr.t00z.wrfsfcf12.grib2 -86.1583502 39.7683331

Report:
  Location: (1222P,462L)
  Band 71:
    Value: -3.91000976562498

gdallocationinfo accepts a band filter, which we use to only have it show band 71. -wgs84 tells the application that the (x,y) coordinates are specified in the WGS84 format, which is the familiar latitude and longitude system that we see almost everywhere. Of course, since it expects the order of parameters to be “filename x y”, we’re really specifying longitude first and latitude second. In the above example, the coordinates are in Indianapolis, Indiana, USA. The predicted temperature is -3.9°C.

Now that we’ve figured out how to get temperature, let’s take a look at some other bands that might be useful for a forecast:

  • Pressure
    • Element PRES at 0-SFC (surface level)
  • Relative humidity percentage
    • Element RH at 2-HTGL (2m height above ground level)
  • Precipitation rate
    • Element PRATE at 0-SFC
  • Precipitation categories, represented as Boolean values
    • Rain: CRAIN
    • Snow: CSNOW
    • Freezing rain: CFRZR
    • Ice pellets (sleet): CICEP
  • Total cloud cover percentage
    • Element TCDC at 0-EATM (entire atmosphere)
  • Wind

Wind is a special case here. For wind, we have two bands: the u-component (UGRD) and the v-component (VGRD). As explained by this university document, a positive value in the u-component denotes wind blowing eastward, while a positive v-component denotes wind blowing northward. Conversely, negative amounts denote westward and southward winds, respectively. The document also explains how to convert from these vector components to more familiar azimuth angle and total speed.

The same kind of gdallocationinfo query as we saw above can be used to get data for all of these bands. Adding perhaps a few more bands and downloading data for more forecast times, we can get a short-range forecast going!

In the next post, we’ll look at visualizing GRIB2 bands as well as dealing with long-range, global forecasts.