Here is my next contribution:
Pan Arctic Outlook
Christopher Randles
Public outlook, statistical method
1. Extent Projection 4.438 + (End of July CT area - 4.383) * 0.9176 m km^2
+/- 95% confidence interval of 0.8 m
2. Method – single linear, single non-linear regression
A gompertz fit of the NSIDC September extent figures is used as a starting point.
Linear regression is then used to predict the residual from the Gompertz fit using the residual of the end of July 2011 Cryosphere Today area number from a gompertz fit of end of July area numbers.
3. Rationale
Several contributors have used multiple linear regression. This felt inadequate when there appears to be a curved shape that other contributors have used quadratic, exponential, logistic or gompertz fits to approximate.
For the July outlook I used both area and volume data to predict the residual. For this outlook, PIOMAS volume data does not appear to improve prediction of the residual significantly and has been dropped.
The following did not appear to be significantly useful:
A scatter plot suggests a fairly linear response to area residuals:
Hamilton's Contribution used a gompertz fit and yielded an estimate of 4.438m km^2. This prediction updates that prediction with an area based prediction. If area differs from the gompertz fit value of 4.383 then extent is predicted to be different by the regression factor of 0.9176 times the difference.
I have not seen anyone attempting any sort of multiple non-linear regression and this approach which de-trends the extent and area data in a non-linear manner prior to a linear regression to predict the residual in the extent that we are trying to estimate appears to be one way to do that.
4. Executive Summary
The data appears to have a curved shape which it appears advantageous to recognise and adapt multiple linear regression to predicting the residuals from the curved shape which has been approximated using a Gompertz fit. – See Hamilton's Contribution. This model yields an average September extent prediction of X.X m km^2 with a 95% confidence interval in the region of +/- 0.8m (though RMSE is as low as 0.28m).
5. Estimate of Forecast Skill
A 95% confidence interval of +/- 0.8 m is calculated though there are mixed indicators over whether this might understate the uncertainty. This estimate is higher than the inappropriately tuned RMSE figures of as low as 0.28m.
The RMSE of estimates reduces as follows:
Linear regression of September average extent =0.508 m
Gompertz fit of September average extent = 0.438 m
Gompertz fit then linear regression prediction of residual with CT area residual from gompertz fit = 0.282 m
Note however that these RMSE numbers are likely to underestimate the likely error as they have the advantage of the method being tuned with data that cannot be available at the time of making a true prediction.
Removing that advantage
Year |
Gompertz |
Adjusted Gompertz |
Actual |
Error |
1991 |
6.8956 |
6.8076 |
6.55 |
-0.2576 |
1992 |
6.7332 |
6.9470 |
7.55 |
0.6030 |
1993 |
6.8981 |
6.4133 |
6.5 |
0.0867 |
1994 |
6.7476 |
7.1558 |
7.18 |
0.0242 |
1995 |
6.8010 |
6.3916 |
6.13 |
-0.2616 |
1996 |
6.5989 |
7.6012 |
7.88 |
0.2788 |
1997 |
6.8246 |
6.4639 |
6.74 |
0.2761 |
1998 |
6.7712 |
6.4440 |
6.56 |
0.1160 |
1999 |
6.6921 |
6.7547 |
6.24 |
-0.5147 |
2000 |
6.4976 |
6.2863 |
6.32 |
0.0337 |
2001 |
6.3397 |
6.4042 |
6.75 |
0.3458 |
2002 |
6.4390 |
6.5182 |
5.96 |
-0.5582 |
2003 |
6.1693 |
6.2445 |
6.15 |
-0.0945 |
2004 |
6.0600 |
6.2112 |
6.05 |
-0.1612 |
2005 |
5.9505 |
5.7487 |
5.57 |
-0.1787 |
2006 |
5.6486 |
5.4538 |
5.92 |
0.4662 |
2007 |
5.6232 |
4.8907 |
4.3 |
-0.5907 |
2008 |
4.7365 |
4.8002 |
4.68 |
-0.1202 |
2009 |
4.3707 |
4.5991 |
5.36 |
0.7609 |
2010 |
4.5460 |
4.3898 |
4.9 |
0.5102 |
Average absolute error 0.312 m
RMSE without tuning to unavailable data 0.378 m
A 95% confidence interval is calculated at +/- 0.8 m. All 20 errors are less than this suggesting that 0.8 m may be more than necessary.
However, the average of the absolute errors for the first 10 year is only 0.245 m whereas the average in the last 10 years is higher at 0.378 m. So there may be some growth in the expected size of errors and therefore a 95% credible interval may need to be higher than +/- 0.8 m.
Using exponential fits instead of Gompertz fits yielded a minor improvement in the fits giving a RMSE of 0.2814 instead of 0.282 but I prefer to stick with the Gompertz fit in case this is needed for a levelling off in the rate of decrease which could occur.
In the format
The regression factors and data are
Multiple Regression Factors – Area |
|||
0.917599 |
0.00023 |
||
0.140987 |
0.051496 |
||
0.585401 |
0.291307 |
||
42.35909 |
30 |
||
3.594582 |
2.545792 |
||
|
6. Projection based on August data
Unless improvements are found, Average September Extent projection =
= 4.438 + (End August CT Area – 3.124 ) * 1.1284 + (NSIDC Aug Extent – 5.916) * 0.323
m km^2 +/- unfairly tuned RMSE of 0.23 m.
7. Invitation to discuss
Comments on this method or the error estimate or comparing different methods or error estimates between different contributions are welcome. I suggest such discussion could be useful be done at Neven’s blog. The latest appropriate post being
http://neven1.typepad.com/blog/2011/07/august-search-outlook-contribution.html
With the century break reported earlier for the 26th, area at 4.452 is not much above the gompertz fit for end of the month of 4.383. So there seems time for the area to not only get below the gompertz fit but also lose another 250k or so area so that the prediction comes in just below the 2007 record.
This method does not account for weather so there is still a large margin for error.
Posted by: crandles | July 27, 2011 at 15:04
Great stuff, crandles!
Posted by: Neven | July 27, 2011 at 16:24
CT SIA was 4.452 M km^2 for July 25, 2011 (year.frac = 2011.5643)
SIA for July 26 has not been reported yet.
Posted by: Artful Dodger | July 27, 2011 at 20:17
I am using the data for the day ending .5781 so there are 5 more days to report. How do you get .5643 to be 25th Jul?
Posted by: crandles | July 27, 2011 at 20:34
Lodger,
So the CT SIA value associated with 2011.0000 in their table is the one for December 31, 2010 and CT typically reports SIA not the day after they measure it but approximately two days later, while posting their concentration imagery with only a one day delay? Not utterly ridiculous but certainly counterintuitive. Are you basing this on personal communication with the CT people or is all this explained somewhere online I haven't noticed yet?
Posted by: Jon Torrance | July 27, 2011 at 20:40
Regarding conversion from the decimal dates that CT uses to more comprehensible elapsed dates, here's a piece of code using Stata's date functions that might be adapted to other systems.
In this snippet, "date" refers to decimal date, and all other variables are defined from that. "yearlength" figures out whether it's a leap year or not (based on what doy, day of year, Dec 31 calculates to be), "year" is just year, and "edate" is the elapsed date -- number of days since Jan 1 1960, then formatted so that 11894 (=2011.5643) shows up as 25jul2011.
gen year = floor(date)
gen yearlength = doy(mdy(12,31,year))
gen doy = max(round((date-year*yearlength,1),1)
gen edate= mdy(1,1,year) + doy -1
format edate %d
label variable edate "Elapsed date"
Posted by: L. Hamilton | July 27, 2011 at 21:27
Gas Glow, Jon, Lord Soth, and other Citizens of the Blog...
First, look at the top of the CT SIA table. The very first line begins with a year.frac value: 1979.0027
Understand that year.frac does not specify a day, it specifies an instant in time, in this example 00:00 hrs on Jan 2, 1979.
The time period covered by each line in the CT table is bounded by the time given on the previous line. So the previous line is 2011.5616 which is Jul 25 2011 00:00
Then the line beginning 2011.5644 is for data collected during the 24-hr period we commonly refer to as July 25, 2011.
Thanks Larry for the Stata code snippet. This can also be calculated in Excel with something like:
=ROUND(YEARFRAC(A2;A3;1);4)
where
A2 = Jan 01 2011 00:00
A3 = Jul 26 2011 00:00
the result is .5644
I believe FrankD came to this realization earlier this year by comparing daily SI concentration charts between CT and IJIS. Perhaps Frank, Peter Ellis, or Neven can explain it better than me. I them to comment.
In addition (and separate from this explanation), CT uses NASA AMSR-E level 3 data, which has not been released yet for July 26, 2011. This is Public information provided through their website. I have explained in private communications to Neven why I decline to link to the data here, but there is already ample evidence showing how CT data uses the year.frac function.
Cheers,
Lodger
Posted by: Artful Dodger | July 27, 2011 at 22:11
OK thanks for the explanations.
Posted by: crandles | July 27, 2011 at 23:21
"I believe FrankD came to this realization earlier this year by comparing daily SI concentration charts between CT and IJIS. Perhaps Frank, Peter Ellis, or Neven can explain it better than me. I them to comment."
Couldn't hurt. I'm unclear as to what the daily concentration charts from CT and IJIS have to do with this, assuming that means what can currently be seen at http://arctic.atmos.uiuc.edu/cryosphere/NEWIMAGES/arctic.seaice.color.000.png and http://www.ijis.iarc.uaf.edu/cgi-bin/seaice-monitor.cgi?lang=e&mode=img&size=L&date=set&y=2011&m=07&d=26.
Posted by: Jon Torrance | July 27, 2011 at 23:34
The ice doesn't match up between CT and IJIS unless you adjust the CT dates. Frank noticed this earlier this year.
Posted by: Artful Dodger | July 27, 2011 at 23:45
I dug out Frank's original comment, which was posted May 21, 2011 at 01:09
You can see effect when comparing CT to IJIS charts as well. So the CT date issue applies to their ice concentration charts and SIA graphs/tables.
Posted by: Artful Dodger | July 28, 2011 at 00:08
Peter doesn't seem to be around today, but he had this to say on April 06, 2011 at 09:31
Posted by: Artful Dodger | July 28, 2011 at 03:36
Peter's problem of not having initially noticed that the table contained a row starting with 2011.0000 isn't really relevant to the issue here. It's somewhat unfortunate, if your interpretation is correct on all points, that CT labels an image based on observations taken on July 25th "07/26/2011". Even more unfortunate that they don't publish more metadata generally.
That said, given that JAXA's extent numbers are based on observations from two days averaged together, it's not clear to me that it much matters which of the two days in question the SIA input to a CAPIE calculation comes from. Though ideally, assuming you're correct to think the CT SIA numbers are each based on one day's worth of observations, I suppose one would use the average SIA for the two days in question in the calculation.
Posted by: Jon Torrance | July 28, 2011 at 04:54
After that last discussion I redid my CAPIE spreadsheet, and I guess it's fine now. I don't think it's a huge mistake if it's one day off. Just make sure you make the same mistake for every year so you're still comparing rotten apples to rotten apples, right?
I once ran a restaurant and one of the waitresses would put the fork on the right side of one plate, and on the left side of another. The sharp edges of the knives pointed to the left here, to the right there. I said to her that she didn't have to do it my way (which happened to be the right way), as long as she did it the same everywhere.
It's important to be consistent. Even when you make mistakes. :-)
Posted by: Neven | July 28, 2011 at 05:17
It's possible to use 2-day avg CT data for CAPIE, but even more interesting to use 1-day data for IJIS SIE.
Posted by: Artful Dodger | July 28, 2011 at 05:47
I have indeed seen the site, and Lodger is right. CT is a bit behind IJIS when it comes to releasing data. So for instance, the +26K that was reported today by CT was for July 26th, corresponding to the -55K from IJIS, and not today's -73K.
Posted by: Neven | July 28, 2011 at 21:34
The .5781 year-fraction data is now available as 4.271 m km^2. This is .112 below the gompertz fit of 4.383. So the prediction works out to 0.9176*0.112 below the gompertz fit of 4.438 which calculates to 4.34 rounded to 4.3 m km^2.
Posted by: crandles | August 01, 2011 at 14:57
If you want the next week data to form some sort of target: The Gompertz fit for 0.6 year-fraction data (8 days' data time = 7th August) is 3.965 m km^2.
The regression factor is 1.093.
Posted by: crandles | August 01, 2011 at 15:03
Regarding the SEARCH SIO, this year I'm going with a "let it ride" theory. My August prediction will be the same one I had in April, 4.4.
http://neven1.typepad.com/blog/2011/04/trends-in-arctic-sea-ice-extent.html
Posted by: L. Hamilton | August 01, 2011 at 15:44
>"let it ride"
But that backtracking on what your executive summary said
"Refinements to this predict should be possible when we have further information, as the 2011 melt season progresses."
Only kidding ;o) Sorry if I have pinched your intended refinements.
4.4 may well be more likely to be more accurate than my 4.3. I just don't have a data set that I think I can use for expected weather.
It did vaguely cross my mind that we could try to start one by counting comments on this blog that indicate weather will be good for small extent reductions minus posts on this blog indicating weather will be good for large extent reductions. Unfortunately I think it may take a long time time to see if this was any use.
Posted by: crandles | August 01, 2011 at 16:22
It's true, earlier I had expected that I would refine that April prediction. Last year I was fiddling with new models right up to the deadline. Tried some of that this year and didn't come up with anything that made me think "Wow!" On the other hand, the Gompertz idea had appealing simplicity. So I figure, what the heck, let's see how close or how wrong that first guess will be.
Posted by: L. Hamilton | August 01, 2011 at 16:55
OK we now have 7 Aug area at 3.695. As mentioned earlier, the Gompertz fit for 7 Aug (yearfraction 0.6) is 3.965 so area is .27 below the gompertz fit.
Using only this CT area data the prediction would be 4.438-1.093*.27 = 4.142
However, I now also have the GFSC-IJIS melded daily extent data which I can use in addition to area.
The gompertz fit for 7th August is 6,187,305. This is suprisingly higher than the 7 Aug 2007 value.
The 7 Aug IJIS extent is 6,174,375 which is marginally below the gompertz fit. So using this record alone would predict less than 4.438 by .894*.0109 = 4.427
Using both the prediction is 4.438-.602*.27-.053*.0109 =4.271 m km^2
So all three of these predictions are well within the uncertainty range of the 2007 minimum. It could very easily go either way.
Posted by: crandles | August 09, 2011 at 16:05
In the report due soon, Paragraph 6 should be changed to:
6. Projection based on August data
Unless improvements are found, Average September Extent projection =
= 4.438+(IJIS JAXA End Aug Extent–4.824)*.9083+(End Aug CT Area–3.124)*0.3785
m km^2 +/- unfairly tuned RMSE of 0.14 m.
The IJIS JAXA daily extent record is only short starting in 2002. Thanks to Lucia and blog readers the NASA GFSC daily extent record here was made known to me. I have attempted to create a homogenous daily extent record by applying a step change to the GFSC data. The data and calculations are made available here. This appears to work well and is a better guide to NSIDC average extent than CT daily area at the end of August. At the end of July, both CT area and GFSC-JAXA daily extent are significant but CT area is a better guide than this daily extent record. There is a large overlap of information between these two as using both only performed better than area and 8 out of 10 sets of random numbers. So there does not appear to be a significant advantage to using this GFSC-JAXA daily extent record at 31 July and it has not been used.
Posted by: crandles | August 09, 2011 at 16:39
Target information for 15th August:
Gompertz fit of CT area for 15th Aug is 3.623; we are nearly down to that already!
Gompertz fit of GFSC-IJIS is 5.824.
Prediction is 4.438+(CT Area for 15/8 - 3.623)*0.9787 + (IJIS for 15/8 - 5.824)*0.296
Posted by: crandles | August 09, 2011 at 19:18
The August report is out
http://www.arcus.org/search/seaiceoutlook/2011/august
Posted by: crandles | August 13, 2011 at 13:15
Re: crandles | August 09, 2011 at 16:39
Hey there,
GFSC or GSFC... got data from latter. Was there a link meant where you say
Thanks
Posted by: Seke Rob | August 13, 2011 at 17:08
The meld data and calcs should be at
https://docs.google.com/spreadsheet/ccc?key=0AjpGniYbi4andElTczhVS2t2VW5Ka0sySnFrcndTTkE&hl=en_US#gid=0
Sorry about that and delay in answering.
Posted by: crandles | August 14, 2011 at 15:09
Good stuff, thx Chris R.
Posted by: Seke Rob | August 14, 2011 at 16:18
Following my recent comments at Lucia's blog
Lucia's vunerable ice = extent-area may not be a useful predictor.
I am now thinking of calculating a weighted average of extent and area for the last few available days. Deducting the September minimum and this is what I will try to predict. I might predict it using area or the weighted average of extent and area or I might try some sort of weighted average cumulative area since May 1 or perhaps later.
Just wondering what weightings to use for the cumulative area. A sine wave that peaks and troughs on the soltices or is there some more complex formula to weight the energy capture by ocean rather than ice?
Or should it be more weighted to recent areas?
Thoughts anyone?
Posted by: crandles | August 17, 2011 at 16:51
So it is now clear that the 3.437 is a revision to 15/8 area. Using that and extent of 5.588 for 15/8, the gompertz fit adjustment method comes out predicting 4.318 m km^2.
If I predict the fall from a weighted average of area and extent for the 3 days to 15/8 using the weighted average as the predictor then I get a prediction of 4.538+/-0.2
If I wanted to average these I would want to give more weight to the 4.538 prediction so I end up with about 4.45.
This averaged prediction is indistinuishable from Larry's 4.438. 'Let it ride' seems to be working well so far.
Posted by: crandles | August 18, 2011 at 14:36
One cute thing about my black box approach is that it applies with equal simplicity (simplistic-ness?) to any of the ice indexes. So I've also got estimates made in April for September volume, area, and 3 kinds of extent ... it will be interesting to see which if any of those turn out close.
Posted by: L. Hamilton | August 18, 2011 at 16:37