It took a while, but century breaks have started rolling in this month. As you all know (and if you don't, read last year's Century Breaks post) a century break is the name I have given to daily extent decreases as reported by IJIS that surpass 100,000 square km. The name is based on one of the best and toughest sports in the world: snooker. In the image on the right you see Ronnie 'The Rocket' O'Sullivan, one of the best snooker players of all time. If you want to know why, you can watch the video in this post from last year.
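For anyone who wants to replicate the counts below, the bookkeeping is simple: take the daily extent series and count the days whose drop from the previous day exceeds 100,000 km². A minimal Python sketch (the series here is made up, not real IJIS data):

```python
def century_breaks(extents):
    """Count 'century breaks': days on which extent drops by more
    than 100,000 km^2 relative to the previous day."""
    return sum(
        1
        for prev, cur in zip(extents, extents[1:])
        if prev - cur > 100_000
    )

# Toy series of daily extents (km^2), not real IJIS data.
# Day-to-day drops are 80k, 120k, 105k and 95k, so two breaks:
series = [9_500_000, 9_420_000, 9_300_000, 9_195_000, 9_100_000]
print(century_breaks(series))  # -> 2
```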
With 4 century breaks so far 2011 is behaving like most recent years, with the exception of 2010, which had already compiled 8 century breaks by the end of June. But it didn't go on to break records as July was a very bad month century break-wise.
Here is everything you need to know about the IJIS century breaks:

Century breaks in June
- 2005: 4
- 2006: 5
- 2007: 5
- 2008: 3
- 2009: 4
- 2010: 8
- 2011: 4 (with 2 days to go)
---
Century breaks in July
- 2005: 10
- 2006: 3
- 2007: 11
- 2008: 4
- 2009: 11
- 2010: 2
---
Total century breaks
- 2005: 14
- 2006: 8
- 2007: 20
- 2008: 12
- 2009: 15
- 2010: 11
- 2011: 4 (so far)
---
Maximum Break
- 2005: 161,719
- 2006: 191,094
- 2007: 201,875 (highest break in the dataset)
- 2008: 145,000
- 2009: 168,437
- 2010: 142,344
- 2011: 115,000 (so far)
---
Most century breaks in a row
- 2007: 6
- 2009: 5
- 2005: 5
---
PS In the comment section of the last SIE update Gas Glo has done an interesting analysis of CT area century breaks (which are about twice as frequent as extent century breaks).
This is voodoo, not science. Entertaining voodoo, but voodoo. There's nothing special about the number 100,000.
Posted by: Nick Barnes | June 29, 2011 at 13:38
I'm glad you like it too, Nick. :-P
To be honest, this is hidden advertising for the snooker industry. I have been approached by someone from the Snooker and Public Policy Institute.
Posted by: Neven | June 29, 2011 at 13:40
That six straight in 2007 was also notable because it contained 3 of the 11 biggest breaks in the book - 143K on Jul 1 (#11), 162K on Jul 2 (#4) and then 202K on Jul 3 (#1). The week of Jun 26 - Jul 5 logged a melt of 960625 sq kms, 14% more than any other seven-day period, and an average of 137K per day. To put that in perspective, the biggest seven-day melt for 2011 is only 612K.
2007 didn't just have the most century breaks though. Sure, at the peak it booked 18 century breaks in 41 days between Jun 26 and Aug 5, but it's worth noting that there were another 9 days in the 90s and another 8 in the 70s or 80s - 35 days in all. That 41-day run (with 35 days over 70K) saw a drop of just over 4 million sq kms at an average of a whisker under 100K per day.
Sometimes consistency is just as important - 70's may not be as spectacular, but if you score 70 every time you step up to the table, you're not going to lose many matches, and you'll pick up some centuries along the way.
That's been 2011's story so far - not too many spectacular breaks, but pretty consistent. Mind you, the heavy lifting has only just started...despite having the second biggest June losses ever, 2011 will still need one of the meltiest Julys (3rd?) to stay below 2007.
Posted by: FrankD | June 29, 2011 at 14:11
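FrankD's seven-day melt figures are just rolling sums of the daily drops. A quick sketch of how one might compute the biggest such window (toy numbers below, not the real IJIS series):

```python
def max_window_loss(extents, window=7):
    """Largest total extent loss over any `window`-day span,
    computed from a list of daily extent values."""
    losses = [prev - cur for prev, cur in zip(extents, extents[1:])]
    return max(
        sum(losses[i:i + window])
        for i in range(len(losses) - window + 1)
    )

# Toy extents (km^2); daily losses are 100k, 50k, 30k, 30k, 40k,
# so the biggest 3-day window is the first one: 180k.
toy = [1_000_000, 900_000, 850_000, 820_000, 790_000, 750_000]
print(max_window_loss(toy, window=3))  # -> 180000
```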
In cycling, some people use "century" for riding over 100 miles in one ride. I guess it's more amateurs than anything else.
Posted by: dorlomin | June 29, 2011 at 14:16
Thanks for the added info, Frank.
Gas Glo suggested I copy his comments on CT area century breaks here. Here goes:
1) I used Cryosphere today area numbers to get a longer data set in order to have some hope of determining whether the number of century breaks might have some useful meaning:
Year Centuries Min 8 year sum
1979 32 5.3067255
1980 18 5.5077119
1981 32 4.9564924
1982 32 5.13906
1983 44 5.386929
1984 26 4.6958923
1985 42 4.992847
1986 30 5.3818426 256
1987 33 5.2889948
1988 38 5.1448908
1989 46 4.8159156
1990 41 4.6289349
1991 42 4.4603844
1992 40 5.0267782
1993 43 4.4729533
1994 40 4.8160958 323
1995 44 4.4103012
1996 35 5.2381849
1997 45 4.8997059
1998 42 4.262403
1999 55 4.2044988
2000 39 4.1687655
2001 45 4.5336194
2002 38 4.0347104 343
2003 48 4.1416645
2004 33 4.2829733
2005 33 4.0917983
2006 33 4.0169191
2007 41 2.9194391
2008 44 3.0035558
2009 42 3.4245975
2010 46 3.0721295 320
Last 365 days: 39
Correl year,centuries 0.455357601
Correl centuries,min -0.424719847
OK, the correlation coefficient is on a similar scale to that for year, so perhaps using the number of centuries could be useful. So let's try it:
Using linear regression on year produces an RMSE of 0.3456.
Using linear regression on the number of centuries gives an RMSE of 0.627 (and we don't know the number of centuries until late in the season).
Using multiple linear regression with both year and number of centuries reduces the RMSE to 0.3448. This is only marginally better than using year alone, but it has to be at least as good by the definition of the process. I tried ten different sets of random numbers, and in 7 of those 10 attempts a set of random numbers outperformed the number of century breaks.
The conclusion would seem to be that the number of century breaks does not have much predictive power in the way used here, since it performs worse than an average set of random numbers. There could of course be other ways of usefully using such information.
---
2) OK 'Average km^2 reduction in century break days' is probably better than total km^2 in century break days.
Year Centuries Total km^2 Avg km^2 Min
1979 32 -4.6170642 -0.144283256 5.3067255
1980 18 -2.3684177 -0.131578761 5.5077119
1981 32 -4.4653372 -0.139541788 4.9564924
1982 32 -4.3404537 -0.135639178 5.13906
1983 44 -5.5245612 -0.125558209 5.386929
1984 26 -4.3045169 -0.165558342 4.6958923
1985 42 -5.7870403 -0.137786674 4.992847
1986 30 -4.0629231 -0.13543077 5.3818426
1987 33 -4.8633007 -0.147372748 5.2889948
1988 38 -5.6880888 -0.149686547 5.1448908
1989 46 -6.4194954 -0.139554248 4.8159156
1990 41 -5.8903435 -0.143666915 4.6289349
1991 42 -6.3517074 -0.151231129 4.4603844
1992 40 -5.8922116 -0.14730529 5.0267782
1993 43 -6.2021023 -0.144234937 4.4729533
1994 40 -5.8805938 -0.147014845 4.8160958
1995 44 -6.0084145 -0.136554875 4.4103012
1996 35 -4.7555421 -0.135872631 5.2381849
1997 45 -5.9793433 -0.132874296 4.8997059
1998 42 -6.0610489 -0.144310688 4.262403
1999 55 -8.1776087 -0.148683795 4.2044988
2000 39 -5.9563771 -0.152727618 4.1687655
2001 45 -6.6941426 -0.148758724 4.5336194
2002 38 -5.7568805 -0.151496855 4.0347104
2003 48 -7.0111419 -0.146065456 4.1416645
2004 33 -4.4579023 -0.135087948 4.2829733
2005 33 -4.7205853 -0.143048039 4.0917983
2006 33 -4.7038803 -0.142541827 4.0169191
2007 41 -6.135042 -0.149635171 2.9194391
2008 44 -6.9659804 -0.158317736 3.0035558
2009 42 -6.4141986 -0.152719014 3.4245975
2010 46 -6.477607 -0.140817543 3.0721295
There is no trend in the average numbers (-0.00027). Using this 'average km^2 reduction in century break days' alone for linear regression gives an RMSE of 0.61, only marginally better than the number of century breaks (0.627) and a lot worse than just using year (0.3456).
However, despite not looking much better on the above measures, it fares better when used in multiple linear regression with year: the RMSE is reduced from 0.3456 (year only) down to 0.312. This is better than any of my 10 sets of random numbers, though one set got close (0.314).
For comparison using area at end of June and year in multiple linear regression reduces RMSE to 0.303.
So as Patrice would have expected, area at end of June appears a better predictor than average km^2 decrease in century break days.
---
3) Doing a multiple linear regression with 3 predictors - year, end-of-June area, and average km^2 reduction on century break days - reduces the RMSE to 0.261.
So adding end-of-June area to year reduced the RMSE from 0.3456 to 0.303, a reduction of 0.0426.
Adding the average reduction on century break days as a third predictor seems to be capturing something not in year or end-of-June area, as the reduction from 0.303 to 0.261, a drop of 0.042, is very nearly as large as the drop from adding end-of-June area, my second-best predictor.
Of course, there is still the problem that we don't know the average area reduction on century break days very well until near the end of the melt season. I could always try the average reduction in April-June century break days.....
Posted by: Neven | June 29, 2011 at 14:21
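Gas Glo's procedure above - fit by linear regression, take the RMSE, then benchmark the extra predictor against sets of random numbers - can be sketched in a few lines of Python. The data below are synthetic stand-ins, not his CT series:

```python
import numpy as np

rng = np.random.default_rng(0)

def rmse(X, y):
    """In-sample RMSE of an ordinary least-squares fit of y on X
    (an intercept column is added automatically)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(np.sqrt(np.mean(resid ** 2)))

years = np.arange(1979, 2011)
# Synthetic 'minimum' series with a downward trend plus noise:
minimum = 5.5 - 0.04 * (years - 1979) + rng.normal(0, 0.3, len(years))
centuries = rng.integers(18, 56, len(years))  # stand-in predictor

base = rmse(years, minimum)                              # year only
both = rmse(np.column_stack([years, centuries]), minimum)
# Adding any extra column can only lower the in-sample RMSE:
assert both <= base + 1e-12

# Gas Glo's sanity check: does the extra predictor beat random numbers?
random_rmses = [
    rmse(np.column_stack([years, rng.normal(size=len(years))]), minimum)
    for _ in range(10)
]
print(sum(r < both for r in random_rmses), "of 10 random sets did better")
```

As he notes, an in-sample RMSE always drops when a predictor is added, which is exactly why the random-number benchmark is needed.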
Taking continuous numbers and making them categorical is the most commonly applied statistical technique. Nothing voodoo about it.
Posted by: Andyborst | June 29, 2011 at 15:18
4) I was wondering whether you might transfer my posts to the century break thread.
I am not so sure about "thorough". I don't really like the 'linear' in multiple linear regression when we all (even including W Connolley) agree there is downward acceleration.
To effectively get that sort of multiple non-linear regression, where the non-linearity applies only to one variable (time), 'all' I need to do is change the predictand from the minimum to the anomaly of the minimum from a smooth non-linear function. I am thinking of Larry Hamilton's Gompertz fit as that smooth non-linear function.
I wonder if Larry is planning to submit an update to his prediction for the June SEARCH report. Whether he is or not, what factors would you want to throw at this multiple (sort-of-non-)linear regression?
Obvious ones occurring to me include:
- Area at end of June, for the albedo effect,
- Volume near end of June, since less ice disappears faster,
- Arctic Oscillation, for some of the ice export effect,
- Area reductions over 100k km^2 per day in April to June.
What else? Suggestions welcome.
I am suggesting changing from the average km^2 reduction on century break days to the total of reductions over 100k km^2 per day, because a small difference either side of the 100k threshold can affect the average noticeably, whereas its effect on the total of reductions in excess of 100k is going to be small.
Seems like there is lots more to do rather than having been thorough.....
Posted by: Gas Glo | June 29, 2011 at 16:11
Hmm. On changing from predicting the NSIDC daily area minimum to predicting the NSIDC monthly extent anomaly from the Gompertz fit, I am now trying to predict a noisier data set, so the RMSEs have gone up. I need to get used to these higher RMSEs before I can make much sense of them.
Predicting the NSIDC monthly minimum using linear regression on year gives an RMSE of 0.508.
Using a Gompertz non-linear fit reduces this RMSE to 0.438.
Using linear regression of NSIDC area at end of June to predict the Gompertz anomaly gives an RMSE of 0.423.
Using linear regression of total reductions over 100k area during April-June to predict the Gompertz anomaly gives an RMSE of 0.422.
Using multiple linear regression of both of these gives an RMSE of 0.42.
So neither the June area nor the century break data reduce the RMSE much. 3 out of 10 sets of random numbers reduced the RMSE by more, so neither of these appears very useful.
Posted by: Gas Glo | June 29, 2011 at 16:59
Andyborst wrote:
Sea Ice Extent is NOT a continuous variable, it is the binary sum of a defined grid of 6.25 km x 6.25 km cells, with a >15% sea ice concentration.
Each "category" therefore has the least common factor of 39.0625 km^2 (the cell size) and each sum for SIE is rounded to the nearest whole number.
CT SIA on the other hand IS a continuous variable, subject to the resolution of the Satellite sensor conducting the sampling.
Posted by: Artful Dodger | June 29, 2011 at 23:26
Cobblers. SIA is granular for exactly the same reason as SIE - it uses the same cell sizes, in fact the exact same cells! However, because you're multiplying each by an (integer) percentage, the granularity is 100x smaller.
Both are granular measurements of a continuous variable.
Posted by: Peter Ellis | June 30, 2011 at 01:26
No Peter, SIA is granular only because of the finite resolution of the sensor (it is digital). The underlying variable however is Continuous (ie: SIA can be measured to any arbitrary precision).
SIA resolution as reported by CT is so high that duplicate values have only occurred 48 times out of 11,866 observations since 1979.
On the other hand, SIE is a Discrete variable by its very definition. SIE granularity is deliberately set by the Researchers with their choice of cell size.
Rigorously, we would use the discrete approximation of the continuous for SIA, but AMSR-E resolution is so high that it scarcely affects results.
However, wrt SIE, the correct Statistic methods to use are one appropriate for Discrete Variables.
Posted by: Artful Dodger | June 30, 2011 at 04:11
Century breaks, while exciting, are a very bad statistical indicator. Extreme value statistics are notoriously unstable: luck plays a much larger role than in other statistical parameters.
Add to that the fact that the derivative is noisier than the melt curve itself, and the information content of century breaks becomes rather low.
Posted by: Yvan Dutil | June 30, 2011 at 14:44
Welcome, Yvan Dutil!
It's easy for you all to talk disparagingly about the statistical (in)significance of century breaks, but you go take your cue and try and compile one. That ought to teach you. ;-)
Posted by: Neven | June 30, 2011 at 15:06
Maybe I will get the hang of the analysis I am trying to do eventually.
In the last instalment, June area and century breaks didn't help much. So I realised that to predict the anomaly, what we want to use is not the area but the anomaly of the area from a Gompertz fit of end-of-June area. This worked better, so I also looked at the volume anomaly from a Gompertz fit of end-of-June volume data.
So the RMSEs of the estimates that I now get are:
Predicting the NSIDC monthly minimum using linear regression on year gives an RMSE of 0.508.
Using a Gompertz non-linear fit reduces this RMSE to 0.438.
Adding the end-of-June area anomaly from a Gompertz fit of end-of-June area reduces the RMSE to 0.3715.
Instead adding the end-of-June volume anomaly from a Gompertz fit of end-of-June volume reduces the RMSE to 0.396.
So area appears better than volume.
Using both area and volume, the RMSE is reduced to 0.358.
If I were to consider adding the Arctic Oscillation or NAO to try to add a predictor for ice export, what lag would be appropriate?
I haven't heard many suggestions for other predictors that I could consider using.
Posted by: Gas Glo | June 30, 2011 at 15:55
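The Gompertz-anomaly step Gas Glo describes can be sketched as: fit a declining Gompertz curve to the series, then treat the residuals as the new predictand. The parameterization below is my assumption (Hamilton's published fit may use a different form), and the data are synthetic:

```python
import numpy as np
from scipy.optimize import curve_fit

def gompertz(t, a, b, c):
    """Declining Gompertz curve; one common parameterization,
    not necessarily the exact form Hamilton uses."""
    return a * np.exp(-np.exp((t - b) / c))

years = np.arange(1979, 2011)
rng = np.random.default_rng(1)
# Synthetic September minima: accelerating decline plus noise.
minima = gompertz(years, 5.5, 2015, 12) + rng.normal(0, 0.2, len(years))

popt, _ = curve_fit(gompertz, years, minima, p0=(5.5, 2015, 10), maxfev=10000)
fitted = gompertz(years, *popt)
anomaly = minima - fitted  # residuals: the new, detrended predictand
print("fit RMSE:", np.sqrt(np.mean(anomaly ** 2)))
```

Predictors such as end-of-June area would then be detrended the same way before being regressed against `anomaly`.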
Neven, what happened? I was just about to pass you a Grolsch and settle back to watch Ronnie the Rocket, and then a whole bunch of maths nerds walked in....
To paraphrase an old adage: "I can't define class, but I know it when I see Ronnie O'Sullivan snag a perfect break inside five minutes at the Crucible."
:-)
Posted by: FrankD | June 30, 2011 at 16:23
I plan to take another statistical look once the June NSIDC and GISTEMP values are out. Last year, it looked like June values were more systematic and predictable than September, as if June depended more on climate and September on weather (e.g., winds).
Regarding century breaks, comparing their frequency across datasets would be tricky if there are differences in smoothing. I don't recall how IJIS smooths their data, but the numbers tend to show less day-to-day variation than Uni Bremen does. As a consequence, IJIS has fewer century breaks -- but also fewer days with counter-seasonal change.
Posted by: L. Hamilton | June 30, 2011 at 16:40
Hi Larry, does that mean you won't be able to submit in time for June SEARCH?
Presumably it is possible to do the analysis of past years and submit a report where the projection is a function of this years June PIOMAS data. I have nearly finished doing this. Can I send this to you?
Posted by: Gas Glo | June 30, 2011 at 16:49
I realize that time is running out for June, but I just had a thought, reading the above.
Are you using the "raw" June Area/Volume, or the anomaly? I'd bet that the anomaly would be a stronger predictor.....
Posted by: Bfraser | June 30, 2011 at 18:50
Yes, I am now using anomaly not raw. Raw area was eliminated as a predictor in June 29th 16:59 post while the 30 June 15:55 post explains the anomalies I am now using are from Gompertz fits.
Posted by: Gas Glo | June 30, 2011 at 19:21
Larry is quite right in that comparing IJIS SIE to CT SIA is like comparing the average of 2 Apples to 1 Orange :^)
So, if we apply 2-day averaging to CT SIA (similar to IJIS SIE), we get the following Yearly totals for SIA Century Breaks:
Year: CB's:
2002 21
2003 35
2004 25
2005 29
2006 30
2007 35
2008 49
2009 38
2010 38
2011 21
Tot: 321
Quite different now, with 2008 being a clear leader, and both 2009 and 2010 tied and in front of 2007.
Aren't Stats wonderful? I think I will now have some Shave-Ice at the Error Bar :^P
Posted by: Artful Dodger | July 04, 2011 at 13:28
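The effect of the 2-day averaging step is easy to see in miniature: a sawtooth of big drops with partial rebounds produces century breaks in the raw data that vanish after smoothing. A toy illustration (invented numbers, not CT data):

```python
def two_day_average(values):
    """IJIS-style 2-day running mean: each point averaged with
    the previous day's value."""
    return [(a + b) / 2 for a, b in zip(values, values[1:])]

def century_breaks(values, threshold=100_000):
    """Days on which the value drops by more than `threshold`."""
    return sum(1 for a, b in zip(values, values[1:]) if a - b > threshold)

# Toy daily areas (km^2): drops of 110k and 120k, each followed
# by a small rebound. Raw data shows 2 breaks; smoothed shows 0.
raw = [8_400_000, 8_290_000, 8_300_000, 8_180_000, 8_190_000]
smooth = two_day_average(raw)
print(century_breaks(raw), century_breaks(smooth))  # -> 2 0
```

This is consistent with Larry's point above: a smoother series shows fewer century breaks (and fewer counter-seasonal days), so counts are only comparable between datasets with the same smoothing.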
Artful,
How did you generate the CT data? CT also averages their sea ice area data (a 5-day average?), so you don't want to average it again.
From the shape of their graph I imagine U. Bremen does not average at all. NSIDC averages for 5 days.
Posted by: michael sweet | July 04, 2011 at 13:54
Hi Michael. Perhaps you're thinking of NSIDC SIE? If CT smoothed their data with a 5-day average, you would not see swings like this:
Year.Frac SeaIceArea Delta-SIA Day
2002.4958 8.3312864 -53,109 181
2002.4987 8.2707653 -60,521 182
2002.5013 8.0455074 -225,258 183
2002.5042 8.0492058 3,698 184
2002.5068 8.0133419 -35,864 185
2002.5096 7.9655905 -47,751 186
2002.5123 7.7933941 -172,196 187
Posted by: Artful Dodger | July 04, 2011 at 17:25