Ensemble Forecasting: From Spaghetti to Scenarios

On the left is an example of what has been a typical way to view ensemble data: in this case, a GEFS 500mb height spaghetti plot from PSL. Usually, a meteorologist would look at these to gauge uncertainty, and when the plot looked like “spaghetti” they would just say the forecast is “uncertain” and leave it at that. However, we can do better with the help of artificial intelligence (AI). We’ll look at the current forecast challenge in the Western US through the lens of traditional ways of forecasting, and then through a more modern way of harnessing and quantifying uncertainty that still yields actionable information in an uncertain forecast.

Let’s back up one step from ensembles and just look at the GFS deterministic, a staple in any forecast process, right? In the slideshow below are GFS 500mb heights and anomalies for various GFS forecast cycles for the same valid time (12Z 9/21) from Pivotal Weather. Even without looking at a single run of the ensemble, we can infer quite a bit of uncertainty just from the run-to-run variability of the GFS. At this point, a meteorologist has two choices: 1) say the forecast is “too uncertain” to begin discussing any details, or 2) dive deeper with some newer tools to be able to message more of what we know as experts. Let’s dive into choice #2.

What if I told you that in less than the time it took you to look at a handful of deterministic runs, you could identify the major sources and modes of uncertainty, and come up with meteorologically coherent scenarios and their probability of occurrence? Oh, and not with a handful of models, but rather a grand ensemble composed of 100 members from the GEFS (30), EPS (50), and CMC (20)! With the rapid growth and application of AI, we can do just that!

Principal Components Analysis

If you have already done machine learning (ML), you will recognize principal components analysis (PCA), and you should prepare yourself for what is probably a painful oversimplification by me coming next. If not, buckle up! Machine learning is great for tackling big data problems, but it does not see the world as humans do. More data is usually better, but too many degrees of freedom, or too many variables, are tough for even machines to handle. PCA can be simply boiled down to a data reduction technique. If the Wikipedia link above contained a lot of words that confused you, don’t worry. PCA is just mathematical wizardry to break down our 3D 500mb height maps for a region into components that explain how they vary between members (and in our case, from the grand mean). Okay, so now let’s back up one cycle to the 00Z Wed, Sep 14 cycle and look at the 100 members of the three ensemble systems I mentioned earlier in WPC’s cluster analysis.
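To make the data reduction idea concrete, here is a minimal sketch, not WPC’s actual code. It assumes a made-up 100-member ensemble of random height fields on a hypothetical 20 x 30 grid, subtracts the grand mean, and uses a singular value decomposition to pull out the two leading patterns and each member’s score on them. Real EOF analyses usually also weight grid points by area, which I skip here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the grand ensemble: 100 members on a 20 x 30 grid.
n_members, n_lat, n_lon = 100, 20, 30
heights = 5700.0 + rng.normal(scale=30.0, size=(n_members, n_lat, n_lon))

# Flatten each map to a vector and subtract the grand mean.
X = heights.reshape(n_members, -1)
grand_mean = X.mean(axis=0)
anomalies = X - grand_mean

# SVD of the anomaly matrix: rows of Vt are spatial patterns,
# and U * s gives each member's score on each pattern.
U, s, Vt = np.linalg.svd(anomalies, full_matrices=False)
eofs = Vt[:2].reshape(2, n_lat, n_lon)   # two leading patterns
pcs = U[:, :2] * s[:2]                   # two numbers per member

print(eofs.shape, pcs.shape)
```

Each member’s 600-point map is now summarized by just two numbers, which is the whole point of the exercise.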

WPC cluster analysis Empirical Orthogonal Function (EOF)

The plot above shows the Empirical Orthogonal Functions (EOFs) of all 100 members for the 24-hour period ending 00Z Wed, Sep 21. EOFs? Wait, you said PCA earlier?! Yes, EOF analysis is a form of PCA that is more specific to the atmospheric sciences (and some of its initial usage included identification of ENSO phases). WPC’s page will always return the two leading EOFs. The colors/shading are the EOF values, and the contours are the grand mean 500mb height. The specific colors are not a big deal for us to worry about now, but the pattern of shading tells us a lot (specifically in relation to the mean). EOF-1 accounts for over 60% of the variability expressed in the 500mb height fields for this time in the grand ensemble (this is a lot of variance explained!). Notice the shading forms a sort of dipole around the mean trough axis. This suggests a lot of variability in the placement of the trough (and can often be interpreted as timing or progression uncertainty, but not always). You might think EOF-2 also looks like a dipole, but it is actually two monopoles centered on the trough and ridge axes, respectively. Shading centered on trough/ridge axes suggests uncertainty in the depth/strength of that feature. So, for this time, we can say a majority (over 60%) of the uncertainty revolves around the exact placement of the upper-level trough over the West. Okay, we have identified the major source of uncertainty, so what? This by itself isn’t the most actionable, but it will be important context when we describe scenarios later on. Explaining why there is uncertainty increases trust in a forecast.
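If the dipole-versus-monopole idea feels abstract, a toy example may help. The sketch below is entirely synthetic: it assumes a one-dimensional west-east slice through a Gaussian-shaped trough, with made-up spreads in trough position and depth across 100 hypothetical members. The fraction of variance explained by each pattern comes from the squared singular values.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical west-east cross-section through a trough, one row per member.
x = np.arange(-40.0, 41.0)                    # distance from the mean trough axis
n_members = 100
shift = rng.normal(scale=3.0, size=n_members)            # placement differences
depth = 100.0 + rng.normal(scale=15.0, size=n_members)   # depth differences

trough = np.exp(-((x[None, :] - shift[:, None]) ** 2) / (2 * 10.0 ** 2))
heights = 5700.0 - depth[:, None] * trough

# Anomalies from the ensemble mean, then SVD to get the patterns.
anomalies = heights - heights.mean(axis=0)
_, s, Vt = np.linalg.svd(anomalies, full_matrices=False)

explained = s ** 2 / np.sum(s ** 2)   # fraction of variance per pattern
eof1, eof2 = Vt[0], Vt[1]

west, east = x == -10.0, x == 10.0
print("variance explained by the first two patterns:", explained[:2].round(2))
print("pattern 1 west/east of the axis:", eof1[west][0], eof1[east][0])
print("pattern 2 west/east of the axis:", eof2[west][0], eof2[east][0])
```

In this setup the first pattern has opposite signs on either side of the trough axis (the placement dipole), while the second has the same sign on both sides (the depth monopole). The overall sign of each pattern is arbitrary, so only the relative signs matter.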

Phase Space

So, we still haven’t come up with scenarios yet. This is where machine learning really comes into play. We now have taken our mountain of ensemble data and reduced it to a few pebbles (EOF-1 and EOF-2). We can use these two values as coordinates for each ensemble member in EOF phase space. If we plot that up, we get something like the phase space diagram below.

Phase Space diagram from WPC Clusters

Remember that each of these principal components (PCs) comes from an EOF that describes variance from the mean. So, if we have a mean and these two PCs, we can approximately rebuild how each member looks. This also means that members close together on this plot look similar to each other. This is how a machine now sees the ensemble. While humans might try to match up 500mb height contours, a machine can do something similar in a fraction of a second by “looking” at the grand ensemble in this phase space. A human can still easily pick out groups of models there, but what about a machine? Enter k-means clustering.
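Here is a rough sketch of that “rebuild from the mean plus two PCs” idea, again on made-up data rather than the real ensemble. It assumes 100 hypothetical members built from two large-scale patterns plus a little noise, rebuilds every member from its two PC values, and then checks that a member’s nearest neighbor in phase space really does look more like it than the farthest one.

```python
import numpy as np

rng = np.random.default_rng(1)
n_members, n_points = 100, 500

# Hypothetical ensemble: two large-scale patterns plus small noise.
x = np.linspace(-1.0, 1.0, n_points)
patterns = np.vstack([np.sin(np.pi * x), np.cos(np.pi * x)])
weights = rng.normal(size=(n_members, 2)) * np.array([40.0, 20.0])
noise = rng.normal(scale=2.0, size=(n_members, n_points))
fields = 5700.0 + weights @ patterns + noise

# Leading two patterns (eofs) and each member's coordinates (pcs).
grand_mean = fields.mean(axis=0)
anomalies = fields - grand_mean
U, s, Vt = np.linalg.svd(anomalies, full_matrices=False)
eofs = Vt[:2]
pcs = U[:, :2] * s[:2]

# Rebuild every member from the mean plus its two PC values.
rebuilt = grand_mean + pcs @ eofs
rms_error = np.sqrt(np.mean((rebuilt - fields) ** 2))
rms_anomaly = np.sqrt(np.mean(anomalies ** 2))

# Compare member 0 with its nearest and farthest neighbor in phase space.
dist = np.linalg.norm(pcs - pcs[0], axis=1)
nearest = np.argsort(dist)[1]     # position 0 is member 0 itself
farthest = np.argmax(dist)
diff_near = np.sqrt(np.mean((fields[0] - fields[nearest]) ** 2))
diff_far = np.sqrt(np.mean((fields[0] - fields[farthest]) ** 2))

print("rebuild error vs. typical anomaly:", rms_error, rms_anomaly)
print("difference from nearest vs. farthest member:", diff_near, diff_far)
```

The rebuild is only as good as the variance the two patterns explain, so with real data (where EOF-1 and EOF-2 explain less than everything) expect a coarser approximation than in this tidy example.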

K-means Clustering

K-means clustering works by placing a number of cluster centroids, either specified (WPC cluster analysis always uses 4) or chosen some other way, in this phase space and grouping each ensemble member with the centroid nearest to it. It then nudges the centroid of each of these clusters to get better groups, and iterates this, up to several hundred times, until the improvements become negligible. The YouTube video below shows how this process works.
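For those who would rather read code than watch a video, here is a bare-bones sketch of that assign-and-nudge loop in plain NumPy. It is not WPC’s implementation; the `kmeans` function, the 300-iteration cap, and the four made-up blobs of phase-space points are all my own assumptions for illustration.

```python
import numpy as np

def kmeans(points, k, n_iter=300, seed=0):
    """Plain k-means: assign to nearest centroid, then move centroids."""
    rng = np.random.default_rng(seed)
    # Start from k distinct members chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every member to every centroid, shape (n_members, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Nudge each centroid to the mean of its members (keep it if empty).
        moved = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(moved, centroids):
            break  # centroids stopped moving
        centroids = moved
    # Final assignment against the final centroids.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids

# Hypothetical phase-space coordinates: 100 members in four loose groups.
rng = np.random.default_rng(7)
centers = np.array([[-60.0, 20.0], [50.0, 30.0], [-20.0, -40.0], [40.0, -30.0]])
pcs = np.vstack([c + rng.normal(scale=10.0, size=(25, 2)) for c in centers])

labels, centroids = kmeans(pcs, k=4)
print("members per cluster:", np.bincount(labels, minlength=4))
```

One caveat: with a random start, k-means can settle on a less-than-ideal grouping, which is why library implementations typically try several starting points and keep the best result.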


So, now that we have clusters, we can take the mean of each smaller cluster, and instead of 100 layers of spaghetti, we now have only 4 distinct scenarios that (hopefully, and usually) encompass the uncertainty contained in the ensemble, with their membership size explaining the rough probability of that scenario.
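As a quick sketch of that last step, the snippet below turns cluster labels into scenario means and rough probabilities. Everything here is hypothetical: the cluster sizes (40, 30, 20, 10) are made up and are not the sizes from this forecast, and the fields are random numbers standing in for 500mb heights.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical inputs: 100 member fields and made-up cluster sizes.
n_points = 200
sizes = [40, 30, 20, 10]
labels = np.repeat(np.arange(4), sizes)
fields = 5700.0 + rng.normal(scale=30.0, size=(labels.size, n_points))

# Rough scenario probability = share of members in each cluster.
fractions = np.bincount(labels, minlength=4) / labels.size

# One mean field per cluster, plus the grand mean for reference.
cluster_means = np.array([fields[labels == j].mean(axis=0) for j in range(4)])
grand_mean = fields.mean(axis=0)

for j, frac in enumerate(fractions, start=1):
    print(f"cluster {j}: {frac:.0%} of members")
```

A handy sanity check is that the cluster means, weighted by their membership fractions, add back up to the grand mean, so the scenarios are just a different way of slicing the same ensemble.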

WPC Clusters mean 500mb heights and anomalies

So here we have our four scenarios at 500mb above, derived from our 100-member grand ensemble! How different the clusters are from each other depends mostly on 1) how much uncertainty there is and 2) whether four scenarios are appropriate to describe that uncertainty. Through the EOFs, we saw that the main source of uncertainty was location (possibly progression), and to a lesser extent, depth of the trough at this time. If you look at this over a couple of time steps, you can infer this is probably a phasing issue, with a deep trough along the coast that either phases with an additional shortwave in the north and progresses across the Western US (clusters 1 and 3, representing ~52% of the ensemble solutions), or does not phase and instead cuts off over the Pacific (clusters 2 and 4, representing ~48% of the ensemble solutions).


So now we have our scenarios. 500mb height doesn’t have a lot of direct impact on most people, so let’s look at some associated parameters closer to the surface. First, we’ll look at QPF in the image comparison below. On the left side (slider all the way right), we see the cluster 500mb heights, with the anomaly from model climatology shaded. On the right side (slider all the way left), we see the resulting 24-hr QPF contoured, with differences from the grand mean (not climatology) shaded. From these, we can see that with the phasing scenario (clusters 1 and 3, ~52% of the ensemble solutions), we get more widespread precipitation across the Western US, especially across the north. In contrast, with the non-phasing solution (clusters 2 and 4, representing ~48% of the ensemble solutions), precipitation is more uncertain and confined to the Southwest (I will spoil the game a bit: precipitation here is a result of moisture from the tropical system currently developing as of this writing, being wrapped up into the Southwest by the closed upper-level low). Since these are coarse (0.5 degree), raw ensembles, I would not put any stock in, or try to interpret, specific amounts here; just focus on relative differences.

WPC Clusters 500mb heights & anomaly // 24-hr QPF

We’ll look at the same thing below, only for max temperatures. Once again, this is raw, coarse ensemble data, so pay attention to relative differences vs absolute values here. The phasing scenario brings cooler, possibly (~20% chance of) much cooler temperatures to the west, while the non-phasing solution brings warmer, possibly (~18% chance of) much warmer temperatures to the region.

WPC Clusters 500mb heights & anomaly // 24-hr max temperature

We can take this even a step further by using this to provide context to NBM probabilities (since those are calibrated, unlike these), since this 100-member grand ensemble makes up a majority of the weight in the NBM. But we’ll leave this at just focusing on the clusters and the value they add in allowing us to sort through the spaghetti and create actionable scenarios, rather than just saying “there is too much uncertainty”. Hopefully, even though we ran through this quickly, I have demonstrated the power of ensembles, especially when there is high uncertainty! If not, keep following along as weather continues to unfold (especially across the Western US)!
