I hope this first attempt at a post works. You probably know me here as Gas Glo, though I usually post on the internet as crandles. That wasn't intentional, just an effect of choosing to sign in with an existing Google group.
Following Larry Hamilton's work on Gompertz fits, I have submitted a method of updating that fit as more information about area and volume becomes available. I have already provided quite a bit of the detail in the posts, so please excuse the repetition. Apologies also if you are not into geeky maths stuff.
Anyway enough preamble, here is what my outlook says:
Pan Arctic Outlook
Christopher Randles
Public outlook, statistical method
1. Extent Projection 4.3m km^2 *
2. Method – multiple linear, single non-linear regression
A Gompertz fit of the NSIDC September extent figures is used as a starting point. Multiple linear regression is then used to predict the residual from that Gompertz fit. Two predictors are used:
a) The residual of the end of June Cryosphere Today area numbers from a Gompertz fit of those area numbers.
b) The residual of the end of June PIOMAS volume numbers from a Gompertz fit of those volume numbers.
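The two-step method above can be sketched roughly as follows. This is only an illustration, not the spreadsheet actually used: the Gompertz parametrisation a*exp(-b*exp(c*t)), the parameter values and the residual series are all made up for the example, and the function names are hypothetical.

```python
import math

def gompertz(t, a, b, c):
    """Assumed Gompertz form: a * exp(-b * exp(c * t))."""
    return a * math.exp(-b * math.exp(c * t))

def solve3(a, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for i in range(3):
        pivot = max(range(i, 3), key=lambda r: abs(m[r][i]))
        m[i], m[pivot] = m[pivot], m[i]
        for r in range(3):
            if r != i:
                f = m[r][i] / m[i][i]
                m[r] = [u - f * v for u, v in zip(m[r], m[i])]
    return [m[i][3] / m[i][i] for i in range(3)]

def ols_two_predictors(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via the normal equations."""
    n = len(y)
    cols = [[1.0] * n, list(x1), list(x2)]  # design matrix columns [1, x1, x2]
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    return solve3(xtx, xty)

# Synthetic residuals (NOT real data): extent residual built as an exact
# linear combination so the regression should recover 0.5 and 0.1.
area_res = [0.1, -0.2, 0.3, 0.0, -0.1, 0.2]
vol_res = [1.0, 0.5, -0.5, -1.0, 0.2, -0.3]
extent_res = [0.5 * a + 0.1 * v for a, v in zip(area_res, vol_res)]
b0, b1, b2 = ols_two_predictors(area_res, vol_res, extent_res)

# Step 1: Gompertz trend value for the target year (hypothetical parameters).
trend = gompertz(32, 7.5, 0.01, 0.12)
# Step 2: add the predicted residual from this year's area and volume residuals.
prediction = trend + b0 + b1 * 0.14 + b2 * (-1.5)
print(round(b1, 4), round(b2, 4))
```

In practice the Gompertz parameters themselves would first have to be fitted to each series (extent, area, volume) by a non-linear least-squares routine; that step is omitted here.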
3. Rationale
Several contributors have used multiple linear regression. This felt inadequate when the data appear to follow a curved shape, which other contributors have approximated with quadratic, exponential, logistic or Gompertz fits.
The predictors used in the multiple linear regression are: 1. Area, because of the direct implications for albedo feedback. 2. Volume, because ice is more likely to disappear faster if there is less ice to melt. Testing showed that using the residuals from Gompertz fits worked better for predicting the extent residual than using the raw area and volume numbers, which is not surprising. These predictor variables are likely to work better using July data when that is available. Other predictor variables may well exist that would further reduce the error.
Hamilton's Contribution used a gompertz fit and yielded an estimate of 4.4m km^2. This prediction updates that prediction with two effects:
Area at end of June of 7.28m km^2 is higher than the gompertz fit value of 7.14, suggesting that area has not declined by as much as expected by the gompertz fit. The multiple regression factor of 0.5236 applied to the residual of 0.14m suggests a prediction of 0.07m more than Hamilton’s gompertz fit.
The other factor is the PIOMAS volume. This year’s end of June data will be well below the Gompertz fit of the end of June volumes. This suggests we will likely see more rapid decline in extent. This residual is much larger around 1.5k km^3 *. The multiple regression factor of 0.1078 is much smaller than the factor for area suggesting that area is more important than volume. However, as the residual is much larger, it gives a larger 0.16m reduction from Hamilton’s gompertz fit. The multiple regression intercept figure is negligible as I am working with residuals meaning I am working with de-trended data. Hence the overall effect is for a 0.09m km^2 lower prediction than Hamilton’s Gompertz fit prediction.
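As a quick check of the arithmetic in the two paragraphs above, a minimal sketch (the variable names are illustrative, and the volume residual is the approximate -1.5 quoted, pending the actual PIOMAS figure):

```python
AREA_COEF = 0.5236     # regression factor for the area residual
VOLUME_COEF = 0.1078   # regression factor for the volume residual

area_residual = 7.28 - 7.14   # observed minus Gompertz fit, m km^2
volume_residual = -1.5        # approximate, 1000 km^3 (below the fit)

area_effect = AREA_COEF * area_residual
volume_effect = VOLUME_COEF * volume_residual
net_effect = area_effect + volume_effect

print(round(area_effect, 2), round(volume_effect, 2), round(net_effect, 2))
# prints: 0.07 -0.16 -0.09
```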
I have not seen anyone attempting any sort of multiple non-linear regression and this approach which de-trends the extent, area and volume data in a non-linear manner prior to a multiple linear regression to predict the residual in the extent that we are trying to estimate appears to be novel in the context of SEARCH predictions.
4. Executive Summary
The data appear to have a curved shape, and it appears advantageous to recognise this by applying multiple linear regression to the residuals from that curve, which has been approximated using a Gompertz fit (see Hamilton's Contribution). This model yields an average September extent prediction of 4.3m km^2 with a 95% confidence interval in the region of +/- 1m (though RMSE is as low as 0.36m).
5. Estimate of Forecast Skill
A 95% confidence interval of +/- 1m is calculated though there are some indicators that this understates the uncertainty. This estimate is substantially higher than the inappropriately tuned RMSE figures of as low as 0.36m.
The RMSE of estimates reduces as follows:
Linear regression of September average extent =0.508m
Gompertz fit of September average extent = 0.438m
Gompertz fit then linear regression prediction of residual with CT area residual from gompertz fit = 0.372m
Gompertz fit then linear regression prediction of residual with PIOMAS volume residual from gompertz fit = 0.396m
Gompertz fit then multiple linear regression prediction of residual with both CT area and PIOMAS volume residuals from gompertz fits = 0.36m
Note however that these RMSE numbers are likely to underestimate the likely error as they have the advantage of the method being tuned with data that cannot be available at the time of making a true prediction.
Removing that advantage
Year Prediction Actual Error
1991 6.940 6.55 -0.395
1992 7.000 7.55 0.550
1993 6.293 6.5 0.207
1994 6.885 7.18 0.295
1995 6.408 6.13 -0.278
1996 7.528 7.88 0.352
1997 6.793 6.74 -0.053
1998 6.775 6.56 -0.215
1999 6.666 6.24 -0.426
2000 6.440 6.32 -0.120
2001 6.638 6.75 0.112
2002 6.864 5.96 -0.904
2003 6.269 6.15 -0.119
2004 6.310 6.05 -0.260
2005 5.703 5.57 -0.133
2006 5.324 5.92 0.596
2007 5.148 4.3 -0.848
2008 5.026 4.68 -0.346
2009 4.872 5.36 0.488
2010 3.697 4.9 1.203
Average absolute error 0.395
RMSE without tuning to unavailable data 0.492
A 95% confidence interval is calculated at +/- 1m and only one year of 20 above has a larger error supporting that size for the confidence interval.
However, the average of the absolute errors for the first 10 years is only 0.29, whereas the average in the last 10 years is higher at 0.50. So there may be some growth in the expected size of errors, and therefore a 95% credible interval may need to be wider than +/-1m.
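The summary figures for the hindcast table can be reproduced from the Error column. A minimal sketch, using the errors exactly as listed for 1991 to 2010 (variable names are illustrative):

```python
import math

# Error column from the table above, 1991..2010 (m km^2)
errors = [-0.395, 0.550, 0.207, 0.295, -0.278, 0.352, -0.053, -0.215,
          -0.426, -0.120, 0.112, -0.904, -0.119, -0.260, -0.133, 0.596,
          -0.848, -0.346, 0.488, 1.203]

def mean_abs(values):
    """Average of the absolute values."""
    return sum(abs(v) for v in values) / len(values)

mae = mean_abs(errors)
rmse = math.sqrt(sum(v * v for v in errors) / len(errors))
mae_first_10 = mean_abs(errors[:10])
mae_last_10 = mean_abs(errors[10:])
outside_1m = sum(1 for v in errors if abs(v) > 1.0)  # years outside +/- 1m

print(round(mae, 3), round(rmse, 3))
print(round(mae_first_10, 2), round(mae_last_10, 2), outside_1m)
```

Only 2010 falls outside +/- 1m, which is the one-in-twenty rate a 95% interval would imply, but the split into halves shows the errors growing over time.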
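The multiple regression factors and data are:
Multiple Regression Factors - Area and Volume
| | Volume | Area | Intercept |
| Coefficient | 0.107838 | 0.523576 | -2.09026E-05 |
| Standard error | 0.073398 | 0.206417 | 0.06655474 |
| R^2 / standard error of estimate | 0.330561 | 0.37649 | |
| F statistic / degrees of freedom | 7.159937 | 29 | |
| Regression SS / residual SS | 2.02977 | 4.110604 | |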
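The figures above are internally consistent if they are read as a standard regression output block (coefficients, standard errors, R^2 and standard error of estimate, F and degrees of freedom, regression and residual sums of squares). That reading is an assumption, but a short sketch shows the last three rows can be rebuilt from the two sums of squares:

```python
import math

ss_regression = 2.02977   # regression sum of squares
ss_residual = 4.110604    # residual sum of squares
df_residual = 29          # residual degrees of freedom
n_predictors = 2          # area residual and volume residual

r_squared = ss_regression / (ss_regression + ss_residual)
se_estimate = math.sqrt(ss_residual / df_residual)
f_statistic = (ss_regression / n_predictors) / (ss_residual / df_residual)

# Rough t-ratios: coefficient divided by its standard error
t_volume = 0.107838 / 0.073398
t_area = 0.523576 / 0.206417

print(round(r_squared, 4), round(se_estimate, 4), round(f_statistic, 2))
print(round(t_volume, 2), round(t_area, 2))
```

On this reading the area coefficient is about 2.5 standard errors from zero and the volume coefficient about 1.5, in line with area being the stronger predictor.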
*Note that the end of June PIOMAS volume does not yet seem to be available. The prediction is actually for 3.03+.1078*PIOMAS volume at end of June in 1000km^3. If that volume is 12.2k km^3, the extent prediction can then be calculated as 4.345m km^2 which I have rounded to 4.3 to avoid suggesting too much precision. The method therefore appears to be predicting a negligible amount higher than the 2007 record minimum.
6. Review of formula arising from model for possible bias
As explained earlier, the full formula for average September 2011 extent can be expressed as:
=gompertz fit of 4.438 +0.5236*(area-7.14) + 0.1078*(volume-13.69)
Which can be simplified to = -0.776 + 0.5236 * area + 0.1078 *volume
If the end of June area and volume was as absurdly low as half the expected figures (say 3.5m km^2 and 7k km^3), we should be certain that the vast majority of the ice would melt by the beginning of September. The above formula would calculate to a September average extent of 1.8m km^2. The formula clearly calculates too much ice extent when it is taken outside of the ranges where it is hoped that the linear regression might work. This suggests that we might expect a non-linear response to the predictors used in the linear regression. This year it looks like the volume is well below the expected gompertz fit value. Thus trying to account for the expected non linear response to the volume predictor would seem to suggest that this method will predict too high a level for the September 2011 average extent. However the same could be said to apply even more to 2010 and that effect does not seem to have been observed. It could of course be there but hidden by random error but this would mean that the random error would have to be an even larger unprecedented size.
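The formula and the extrapolation check above can be written out as a small sketch. The function names are illustrative; the numbers are those given in this section.

```python
def september_extent(area, volume):
    """Full form: Gompertz fit plus regression on the two residuals.

    area in m km^2 (end of June), volume in 1000 km^3 (end of June).
    """
    return 4.438 + 0.5236 * (area - 7.14) + 0.1078 * (volume - 13.69)

def september_extent_simplified(area, volume):
    """Same formula with the constants collected into one intercept."""
    return -0.776 + 0.5236 * area + 0.1078 * volume

# Intercept implied by the full form
intercept = 4.438 - 0.5236 * 7.14 - 0.1078 * 13.69

# Absurdly low inputs, roughly half the expected figures
extreme = september_extent(3.5, 7.0)

print(round(intercept, 3), round(extreme, 1))
# prints: -0.776 1.8
```

A result of about 1.8m km^2 for inputs that low illustrates the point: the linear response keeps far more ice than seems physically plausible once the predictors are well outside the range the regression was fitted on.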
Well, that is the preview, but I can also give you an update: with 4 July area numbers used, the prediction moves down to 4.2m km^2, suggesting a new record this year, though there is still a lot of uncertainty. I will be interested to see any comments and suggestions.
I want to say thank you to Larry for helpful suggestions and allowing me to go ahead with submitting this.
Posted by: crandles | July 05, 2011 at 16:29
You are forgiven. ;-)
Well, it's not that I'm not into it. I just don't get any of it!
Thanks a lot, Gas Glo, for sharing your submission to the SEARCH Sea Ice Outlook. We now have two ASI blog commenters submitting. Great stuff, guys. I take my hat off to you.
Posted by: Neven | July 05, 2011 at 18:29
Nice job, Gas Glo/crandles :).. And as for Apologies also if you are not into geeky maths stuff: I am so glad that you showed us how things can be done mathematically. Even if I am not capable doing this (I basically understand your method, maybe, if I won't be sure about some detail after reading it couple of times, I will ask you for some help), it is nice to see how you did it. Now, it is on me to do some googling about some terms.
Posted by: Patrice Pustavrh | July 05, 2011 at 18:54
I have planned to do a meta-analysis of all prediction based on their previous performance. However, I have been too lazy.
Posted by: Yvan Dutil | July 05, 2011 at 21:36
I didn't revise my own prediction this month, but I understand that some of the other models sent in to the SEARCH SIO have gone noticeably lower. They must be seeing something....
Posted by: L. Hamilton | July 05, 2011 at 23:32
The PIOMAS volume at end of June was 12.261, so using end of June data the Gompertz fit of 4.44 is reduced by 0.08 to 4.36m km^2.
However both 4.44 and 4.36 get rounded to 4.4 to avoid suggesting too much precision.
All that effort for no adjustment :o( , oh well ;)
Thank you for the comments. If I lose half the readership for every equation used, and probably a greater proportion for technical terms like confidence intervals, credible intervals, root mean square errors (RMSE), standard errors..., it is surprising if I have any readers left.
Re Patrice's 'Even if I am not capable',
I doubt I am capable either... ;o)
Posted by: crandles | July 06, 2011 at 12:50
As for losing readership, there are two levels -- an "executive summary" where you place your bet on some number, then the justification where IMHO it really is best to lay out the steps and equations behind your number. Pictures are good too.
That's not just so others can figure out what you did. In my case, at least, it really helps *me* to figure out what I did when I come back to it weeks or months later.
Posted by: L. Hamilton | July 06, 2011 at 17:30
Actually I don't understand many of the details of the above analysis, but as it uses a purely statistical model with no physical elements in it, its worth might be limited. As I read, the Gompertz function is used in population or usage statistics, where it describes some kind of saturation behaviour. In our case, the driving force, i.e. the heat transfer into the Arctic, may well have significant variations, which may accelerate or slow down the process considerably and unexpectedly. Also, with a steady driving force, the behaviour of the now-thickest areas of ice is of importance: it might well be that it all melts away! There we have to enter some discussion about where those areas are (in the nearest vicinity of the north coasts of Greenland and the Canadian islands), why there of all places (influence of ice shields???) and whether some protecting function of the adjacent land masses will be sufficient to keep a remnant of summer ice "alive" over a longer time.
Posted by: dominik lenné | July 08, 2011 at 00:56
Certainly it is only a statistical model. I would suggest there are several contributions using multiple linear regression to which your comments would also apply.
I have tried to keep my predictors down to ones that have obvious physical cause and effect, and I would suggest so have other statistical contributors.
>"heat transfer into the arctic, may well have significant variations"
Are you thinking random weather variations which are hard to predict more than a week in advance or systematic changes as the ice gets thinner?
If random hard to predict variations, then I think multiple linear regression is regarded as a good technique to extract the systematic and average the remaining random residuals.
If systematic changes as the ice gets thinner, are these going to change the trends from previous years? Why this year? But if so and you can reliably tell us how, then a better prediction can be made.
I certainly agree that if there is a lot of thick ice getting thinner and down to the thickness that can be melted in a season, then we could get a non-linear response moving away from a Gompertz curve. However, the area reaching such critical thickness could also increase fairly steadily over a number of years, and that effect might already be picked up by the Gompertz curve.
The Beaufort Gyre and Transpolar Drift tend to pack the ice against Greenland and the Canadian Archipelago. It has occurred to me that the mass of ice is considerably less now, so it is being packed in there with a small hammer rather than an almighty sledgehammer, and the ice may well not build up its thickness or density as quickly as in the past. Nevertheless, I still think there is more ice than can be melted this season. If the weather is good for melting and transport, next year could be a different story....
Posted by: crandles | July 08, 2011 at 12:23
With this heat transfer thing, I had in mind something like changing medium time scale weather patterns, like the north atlantic oscillation, which may change its sign and consequently break the trend. Actually this is a denialist way of arguing. You are of course right insofar as any trend already started some years ago is incorporated in the statistical model.
(Another interesting question is whether the North Atlantic Oscillation isn't itself broken by sea ice levels so low, that is, whether there is a "weather pattern tipping point". This would then be the kind of nonlinearity you mentioned.)
Concerning the packing of ice against the north coasts of Greenland and the Canadian islands as the cause of the higher ice thickness there, I would be thankful for one or two links.
Posted by: dominik lenné | July 09, 2011 at 16:34
Hmm not sure about links/papers for that but maybe:
An animation shows it:
http://nsidc.org/news/press/2007_seaiceminimum/images/20070822_oldice.gif
http://iabp.apl.washington.edu/pdfs/RigorWallace2004.pdf
http://www.sciencedirect.com/science/article/pii/0165232X87900073
Quote: "This is as expected since the Beaufort Gyre and Transpolar Drift Stream tend to push the ice pack around the Arctic Ocean in a clockwise direction causing the ice to the north to pile up along the natural barriers to this flow (Fig"
Posted by: crandles | July 09, 2011 at 17:25
The Gompertz fit of 9 July NSIDC areas is for 6.260 whereas the area is only 6.146, so area is now below the Gompertz fit by 0.114.
The multiple regression factors have changed to give more weight to area and less to volume which is not surprising with the area information now more up to date. The factors are now 0.871 for area and 0.0557 for volume.
So the method now calculates
4.438-.114*.871-1.433*.0557 = 4.26 m km^2
This is slightly higher than the 4.22 calculated with 4 July data.
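The update above is the same calculation with new factors and a new area residual. A minimal sketch, with an illustrative helper function (not from the original spreadsheet) and the rounded inputs quoted in this comment:

```python
def updated_estimate(gompertz_fit, area_resid, area_coef, volume_resid, volume_coef):
    """Gompertz extent fit plus the regression-predicted residual."""
    return gompertz_fit + area_coef * area_resid + volume_coef * volume_resid

area_resid = 6.146 - 6.260   # 9 July area minus its Gompertz fit, m km^2
volume_resid = -1.433        # end of June volume residual, 1000 km^3

estimate = updated_estimate(4.438, area_resid, 0.871, volume_resid, 0.0557)
print(round(estimate, 2))
# prints: 4.26
```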
I should have added more graphics
The predictors appear to have some skill:

I wondered if the predictor anomalies might have non linear impact but that is not apparent from this scatter plot:

Posted by: crandles | July 10, 2011 at 17:35
@crandles/gas glo: I think you did a really decent model, but, what about running it on a couple of days average value (just for the smoothing purpose) ? Just for try, but you may find some of the fits better, I think (and don't know cause I didn't crunch any of the numbers). But, if you have things set up in your model and you can easily rerun them, I'd be pleased to see the outcome.
Posted by: Patrice Pustavrh | July 10, 2011 at 23:02
Patrice,
I didn't really expect it to make much difference but I tried averaging area over 7-9 instead of just using 9th. (I haven't bothered also trying this with volume.)
period used, r2, SE, est
9th July only, 0.5276, 0.3162, 4.2593
7-9 July average, 0.4668, 0.3360, 4.2396
A change of 0.02 in the estimate seems little difference. The r2 and SE seem to have got markedly worse. Perhaps on other dates averaging would make them markedly better. If so, and I attempted to use any improvements possible, such decisions would appear to me to be cherrypicking.
Were you expecting a different conclusion to that?
Posted by: crandles | July 11, 2011 at 14:43
Time for another update with area to 17th July just released at 5.4555. Though this is 177k above 2007's 5.279, it is a little below the Gompertz fit for 17th July of 5.520. So the area residual is 0.065.
So the estimate is now for gompertz fit of 4.438-(0.860*0.065)-(0.063*1.433)=4.292
That is almost spot on the 2007 minimum suggesting a 50:50 chance of a record low. But there are 2 things to bear in mind:
If area has gone down more rapidly than suggested by gompertz fits then perhaps volume has also done the same since last volume update to 30th June.
Secondly weather could be turning against a record low.
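The "50:50" remark can be made slightly more concrete under some loud assumptions that are not part of the method itself: that the forecast error is roughly normal, that its standard deviation can be backed out of the +/- 1m 95% interval (sigma of about 1/1.96), and that the record to beat is the 4.30 average for 2007. A rough sketch:

```python
import math

def prob_below(threshold, estimate, sigma):
    """P(outcome < threshold) for a normal distribution centred on estimate."""
    z = (threshold - estimate) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

sigma = 1.0 / 1.96          # assumed: +/- 1m treated as a 95% normal interval
record_2007 = 4.30          # assumed threshold, m km^2
estimate = 4.292            # current estimate from the calculation above

p_record = prob_below(record_2007, estimate, sigma)
print(round(p_record, 2))
```

With the estimate this close to the record and an interval this wide, the probability stays very near one half whatever the exact sigma, which is why the two caveats listed above matter more than the third decimal place.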
Posted by: crandles | July 18, 2011 at 14:22