GEOS-2km During The 2025 NOAA-HWT Spring Forecast Experiment

Authors: Gary Partyka, Bill Putman, Bennett Erdman

Published December 19, 2025

The Global Modeling and Assimilation Office (GMAO) again participated in NOAA's Hazardous Weather Testbed (HWT) Spring Forecast Experiment (SFE). Held annually at the National Weather Center in Norman, Oklahoma, this month-long experiment brings operational National Weather Service forecasters together with research meteorologists. Among other topics in severe thunderstorm research, it sets out to answer the following questions:

1. How well do current state-of-the-art deterministic convection-allowing models (CAMs) forecast severe convective storms at two- to three-day lead times?

2. Which of these models performs best in that timeframe at identifying severe thunderstorm attributes such as tornadoes, large hail, and severe surface wind gusts?



Figure 1: GEOS-2km handled the convective thunderstorm activity that occurred in the Mid-Atlantic on the evening of Friday, May 30 (left panel) well, as shown by its simulated radar reflectivity at forecast hour 25 from the run initialized at 00Z on 2025-05-30 (right panel).



Figure 2: NASA-FV3 (GEOS, gray square) achieved probability of detection and critical success index scores as good as those of operational models such as the High-Resolution Rapid Refresh (HRRR, green star).


Over five weeks, five flagship CAMs, one of them the 2 km version of GMAO's Goddard Earth Observing System (GEOS-2km), were run daily, and evaluators scored them using both objective and subjective methods. Objective evaluations included success ratio, probability of detection, and critical success index scores (Figure 2) and RMSE comparisons (Figure 3), while the subjective evaluation was a blind intercomparison in which all five models were ranked against observations (Figure 4). These evaluations were carried out weekly and again at the end of the experiment.
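For readers unfamiliar with the objective scores, the sketch below shows how probability of detection (POD), success ratio (SR), and critical success index (CSI) follow from a 2×2 contingency table of forecast versus observed yes/no events. It is a minimal sketch assuming a simple point-by-point match of events (for example, reflectivity above some threshold at each grid point); the SFE's actual verification may use neighborhood matching and different thresholds, and the names `Contingency` and `contingency_from_flags` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Contingency:
    """Hypothetical 2x2 contingency-table counts (correct negatives not needed here)."""
    hits: int          # event forecast and observed
    misses: int        # event observed but not forecast
    false_alarms: int  # event forecast but not observed

    def pod(self) -> float:
        # Probability of detection: share of observed events that were forecast.
        return self.hits / (self.hits + self.misses)

    def success_ratio(self) -> float:
        # Success ratio (1 - false alarm ratio): share of forecast events that verified.
        return self.hits / (self.hits + self.false_alarms)

    def csi(self) -> float:
        # Critical success index: hits over all events that were forecast or observed.
        return self.hits / (self.hits + self.misses + self.false_alarms)


def contingency_from_flags(forecast: list[bool], observed: list[bool]) -> Contingency:
    """Count hits, misses, and false alarms point by point (no neighborhood)."""
    if len(forecast) != len(observed):
        raise ValueError("forecast and observed must be the same length")
    hits = sum(f and o for f, o in zip(forecast, observed))
    misses = sum(o and not f for f, o in zip(forecast, observed))
    false_alarms = sum(f and not o for f, o in zip(forecast, observed))
    return Contingency(hits, misses, false_alarms)


if __name__ == "__main__":
    # Made-up yes/no flags, e.g. "reflectivity >= 40 dBZ" at six grid points.
    fcst = [True, True, False, True, False, False]
    obs = [True, False, True, True, False, False]
    table = contingency_from_flags(fcst, obs)
    print(table.pod(), table.success_ratio(), table.csi())
```

With these made-up flags there are 2 hits, 1 miss, and 1 false alarm, so POD and SR are both 2/3 and CSI is 0.5. Plotting POD against SR for each model, with CSI contours in the background, gives a diagram like Figure 2, where points closer to the upper right are better.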



Figure 3: Evaluations from the 2025 SFE show that the lack of radar data assimilation in GEOS-2km (top, blue line) raises its RMSE during the first few hours after initialization, when HRRR (top, red line) outperforms it. By hour 7, however, GEOS-2km overcomes this deficit and outperforms the operational HRRR all the way out to 48 hours.
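The crossover described in Figure 3 can be illustrated with a small sketch: compute RMSE at each forecast hour for two models, then find the first hour from which one model's RMSE stays below the other's at every later hour. This is a hypothetical illustration; the function names and the sample numbers are invented and are not SFE data.

```python
import math
from typing import Dict, List, Optional


def rmse(forecast: List[float], observed: List[float]) -> float:
    """Root-mean-square error between paired forecast and observed values."""
    if not forecast or len(forecast) != len(observed):
        raise ValueError("need equal-length, non-empty inputs")
    squared = [(f - o) ** 2 for f, o in zip(forecast, observed)]
    return math.sqrt(sum(squared) / len(squared))


def crossover_hour(rmse_a: Dict[int, float], rmse_b: Dict[int, float]) -> Optional[int]:
    """First forecast hour from which model A's RMSE stays below model B's
    at every later shared hour; None if that never happens."""
    hours = sorted(set(rmse_a) & set(rmse_b))
    for i, hour in enumerate(hours):
        if all(rmse_a[h] < rmse_b[h] for h in hours[i:]):
            return hour
    return None


if __name__ == "__main__":
    # Made-up RMSE values keyed by forecast hour (not SFE results).
    model_a = {1: 9.0, 4: 7.5, 7: 6.0, 24: 6.5, 48: 7.0}
    model_b = {1: 6.0, 4: 7.0, 7: 6.2, 24: 7.0, 48: 7.4}
    print(crossover_hour(model_a, model_b))  # prints 7
```

Requiring the lower RMSE to hold at every later hour, rather than at any single hour, matches the caption's claim that the advantage persists out to 48 hours.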


Figure 4: NASA GEOS-2km outperformed all regional models in the Day 3 subjective evaluation. In this methodology, participating scientists blindly compared each model's output to observations and scored them from 1 to 5. The plots show scores for simulated radar reflectivity and updraft helicity: the mean score is displayed above each model's plot along with count and median metrics, and the overall distribution of scores is depicted by the shape of each model's colored figure.
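The summary statistics in Figure 4 (count, mean, and median per model) can be reproduced from raw subjective scores with a few lines of code. This is a minimal sketch assuming each model's 1-5 scores are stored as a plain list; the function name and the model labels are hypothetical, and since the caption does not state whether 1 or 5 is the best score, the sketch does not sort the models.

```python
from statistics import mean, median


def summarize_ranks(ranks_by_model: dict[str, list[int]]) -> dict[str, dict[str, float]]:
    """Count, mean, and median of 1-5 subjective scores for each model."""
    summary = {}
    for model, ranks in ranks_by_model.items():
        if not ranks or any(r < 1 or r > 5 for r in ranks):
            raise ValueError(f"{model}: scores must be a non-empty list of values 1-5")
        summary[model] = {
            "count": len(ranks),
            "mean": mean(ranks),
            "median": median(ranks),
        }
    return summary


if __name__ == "__main__":
    # Made-up scores from four hypothetical evaluators; labels are placeholders.
    scores = {"Model A": [1, 2, 1, 3], "Model B": [2, 1, 3, 2], "Model C": [3, 3, 2, 1]}
    for model, stats in summarize_ranks(scores).items():
        print(model, stats)
```

Reporting the median and count alongside the mean, as Figure 4 does, helps show whether a model's mean is driven by a few outlying scores or by consistent ratings.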


We are proud to announce that GMAO's GEOS-2km model was rated highest of the five models in this spring's Day 3 blind intercomparison evaluations. This marks a clear year-over-year improvement, owing to several physics advancements in the GEOS model achieved by the model development group led by Bill Putman.