Comments

Nick Barnes

This is voodoo, not science. Entertaining voodoo, but voodoo. There's nothing special about the number 100,000.

Neven

I'm glad you like it too, Nick. :-P

To be honest, this is hidden advertising for the snooker industry. I have been approached by someone from the Snooker and Public Policy Institute.

FrankD

That six straight in 2007 was also notable because it contained 3 of the 11 biggest breaks in the book - 143K on Jul 1 (#11), 162K on Jul 2 (#4) and then 202K on Jul 3 (#1). The week of Jun 26 - Jul 5 logged a melt of 960625 sq kms, 14% more than any other seven-day period, and an average of 137K per day. To put that in perspective, the biggest seven-day melt for 2011 is only 612K.

2007 didn't just have the most century breaks though. Sure, at the peak it booked 18 century breaks in 41 days between Jun 26 and Aug 5, but it's worth noting that there were another 9 days in the 90s and another 8 in the 70s or 80s, 35 in all. That 41 day run (with 35 days over 70K) saw a drop of just over 4 million sq kms at an average of a whisker under 100K per day.

Sometimes consistency is just as important - 70s may not be as spectacular, but if you score 70 every time you step up to the table, you're not going to lose many matches, and you'll pick up some centuries along the way.

That's been 2011's story so far - not too many spectacular breaks, but pretty consistent. Mind you, the heavy lifting has only just started...despite having the second biggest June losses ever, 2011 will still need one of the meltiest Julys (3rd?) to stay below 2007.

dorlomin

In cycling some people call riding over 100 miles in one ride a century. I guess it's more amateurs than anything else.

Neven

Thanks for the added info, Frank.

Gas Glo suggested I copy his comments on CT area century breaks here. Here goes:

1) I used Cryosphere Today area numbers to get a longer data set in order to have some hope of determining whether the number of century breaks might have some useful meaning:

Year  Centuries  Min (million km^2)  8-year sum of centuries (shown every eighth year)
1979 32 5.3067255
1980 18 5.5077119
1981 32 4.9564924
1982 32 5.13906
1983 44 5.386929
1984 26 4.6958923
1985 42 4.992847
1986 30 5.3818426 256
1987 33 5.2889948
1988 38 5.1448908
1989 46 4.8159156
1990 41 4.6289349
1991 42 4.4603844
1992 40 5.0267782
1993 43 4.4729533
1994 40 4.8160958 323
1995 44 4.4103012
1996 35 5.2381849
1997 45 4.8997059
1998 42 4.262403
1999 55 4.2044988
2000 39 4.1687655
2001 45 4.5336194
2002 38 4.0347104 343
2003 48 4.1416645
2004 33 4.2829733
2005 33 4.0917983
2006 33 4.0169191
2007 41 2.9194391
2008 44 3.0035558
2009 42 3.4245975
2010 46 3.0721295 320

39 Last 365 days

Correl year,centuries 0.455357601
Correl centuries,min -0.424719847

OK, the correlation coefficient seems on a similar scale to that for year, so perhaps using number of centuries could be useful. So let's try it:

Using linear regression on year produces a RMSE of 0.3456.

Using linear regression on number of centuries gives an RMSE of 0.627 (and we don't know the number of centuries until late in the season).

Multiple linear regression on both year and number of centuries reduces the RMSE to 0.3448. This is only marginally better than using year alone, and adding a predictor can never make the in-sample fit worse, so some improvement is guaranteed. I tried ten different sets of random numbers, and in 7 of those 10 attempts the random numbers outperformed the number of century breaks.

The conclusion would seem to be that the number of century breaks does not appear to have much predictive power in the way used here, as it does worse than an average set of random numbers. There could of course be other ways of usefully using such information.
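
A minimal sketch of the random-number comparison described above, in plain Python. It is not the spreadsheet or code actually used for the figures quoted; the data are copied from the table above, while the RMSE definition (dividing by n), the normal-distributed random predictors and the seed are all assumptions, so the printed values need not match the quoted ones exactly.

```python
import random

# Copied from the table above: year, century breaks, CT area minimum (million km^2).
YEARS = list(range(1979, 2011))
CENTURIES = [32, 18, 32, 32, 44, 26, 42, 30, 33, 38, 46, 41, 42, 40, 43, 40,
             44, 35, 45, 42, 55, 39, 45, 38, 48, 33, 33, 33, 41, 44, 42, 46]
MINIMA = [5.3067255, 5.5077119, 4.9564924, 5.13906, 5.386929, 4.6958923,
          4.992847, 5.3818426, 5.2889948, 5.1448908, 4.8159156, 4.6289349,
          4.4603844, 5.0267782, 4.4729533, 4.8160958, 4.4103012, 5.2381849,
          4.8997059, 4.262403, 4.2044988, 4.1687655, 4.5336194, 4.0347104,
          4.1416645, 4.2829733, 4.0917983, 4.0169191, 2.9194391, 3.0035558,
          3.4245975, 3.0721295]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols_rmse(predictors, y):
    """Least-squares fit of y on an intercept plus predictors; in-sample RMSE."""
    n = len(y)
    # Centre each predictor; with an intercept this leaves the fit unchanged
    # but keeps the normal equations well conditioned.
    cols = [[v - sum(col) / n for v in col] for col in predictors]
    rows = [[1.0] + [col[i] for col in cols] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * y[i] for i, r in enumerate(rows)) for p in range(k)]
    beta = solve(xtx, xty)
    resid = [y[i] - sum(b * v for b, v in zip(beta, rows[i])) for i in range(n)]
    return (sum(e * e for e in resid) / n) ** 0.5  # assumption: divide by n


rng = random.Random(0)  # arbitrary seed, for repeatability only
rmse_year = ols_rmse([YEARS], MINIMA)
rmse_year_cent = ols_rmse([YEARS, CENTURIES], MINIMA)
random_rmses = [
    ols_rmse([YEARS, [rng.gauss(0, 1) for _ in YEARS]], MINIMA)
    for _ in range(10)
]
beaten = sum(r < rmse_year_cent for r in random_rmses)

print(f"year only:        {rmse_year:.4f}")
print(f"year + centuries: {rmse_year_cent:.4f}")
print(f"random sets beating centuries: {beaten} of 10")
```

Because the year-only model is nested inside each two-predictor model, every two-predictor RMSE here is at most the year-only RMSE; the random sets show how much of an "improvement" arrives by chance alone.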

---

2) OK 'Average km^2 reduction in century break days' is probably better than total km^2 in century break days.

Year  Centuries  Total (million km^2)  Avg (million km^2)  Min (million km^2)
1979 32 -4.6170642 -0.144283256 5.3067255
1980 18 -2.3684177 -0.131578761 5.5077119
1981 32 -4.4653372 -0.139541788 4.9564924
1982 32 -4.3404537 -0.135639178 5.13906
1983 44 -5.5245612 -0.125558209 5.386929
1984 26 -4.3045169 -0.165558342 4.6958923
1985 42 -5.7870403 -0.137786674 4.992847
1986 30 -4.0629231 -0.13543077 5.3818426
1987 33 -4.8633007 -0.147372748 5.2889948
1988 38 -5.6880888 -0.149686547 5.1448908
1989 46 -6.4194954 -0.139554248 4.8159156
1990 41 -5.8903435 -0.143666915 4.6289349
1991 42 -6.3517074 -0.151231129 4.4603844
1992 40 -5.8922116 -0.14730529 5.0267782
1993 43 -6.2021023 -0.144234937 4.4729533
1994 40 -5.8805938 -0.147014845 4.8160958
1995 44 -6.0084145 -0.136554875 4.4103012
1996 35 -4.7555421 -0.135872631 5.2381849
1997 45 -5.9793433 -0.132874296 4.8997059
1998 42 -6.0610489 -0.144310688 4.262403
1999 55 -8.1776087 -0.148683795 4.2044988
2000 39 -5.9563771 -0.152727618 4.1687655
2001 45 -6.6941426 -0.148758724 4.5336194
2002 38 -5.7568805 -0.151496855 4.0347104
2003 48 -7.0111419 -0.146065456 4.1416645
2004 33 -4.4579023 -0.135087948 4.2829733
2005 33 -4.7205853 -0.143048039 4.0917983
2006 33 -4.7038803 -0.142541827 4.0169191
2007 41 -6.135042 -0.149635171 2.9194391
2008 44 -6.9659804 -0.158317736 3.0035558
2009 42 -6.4141986 -0.152719014 3.4245975
2010 46 -6.477607 -0.140817543 3.0721295

There is no trend in the average numbers (-0.00027). Using this 'average km^2 reduction in century break days' alone for linear regression results in an RMSE of 0.61, only marginally better than number of century breaks (0.627) and a lot worse than just using year (0.3456).

However, despite not looking much better on the above measures, when used in multiple linear regression with year, it fares better. The RMSE is reduced from 0.3456 for year only down to 0.312. This is better than any of my 10 sets of random numbers, though one set got close at 0.314.

For comparison using area at end of June and year in multiple linear regression reduces RMSE to 0.303.

So as Patrice would have expected, area at end of June appears a better predictor than average km^2 decrease in century break days.

---

3) Doing a multiple linear regression using 3 predictors (year, end June area and average km^2 reduction in century break days) reduces the RMSE to 0.261.

So adding end June area to year reduced the RMSE from 0.3456 to 0.303, a reduction of 0.0426.

Adding average reduction in century breaks as a third predictor seems to be capturing something else not in year or end June area: the reduction from 0.303 to 0.261, a drop of 0.042, is very nearly as large as the drop from adding end June area, my second best predictor.

Of course, there is still the problem that we don't know the average area reduction in century breaks very well until near the end of the melt season. Could always try the average reduction in April-June century break days.....

Andyborst

Taking continuous numbers and making them categorical is the most commonly applied statistical technique. Nothing voodoo about it.

Gas Glo

4) I was wondering whether you might transfer my posts to the century break thread.

I am not so sure about "thorough". I don't really like the "linear" in multiple linear regression when we all (even inc W Connolley) agree there is downward acceleration.

To effectively get that sort of multiple non-linear regression, where the non-linearity applies to only one variable (time), 'all' I need to do is change the predictand from the minimum to the anomaly of the minimum from a smooth non-linear function. I am thinking of Larry Hamilton's Gompertz fit as the smooth non-linear function.

I wonder if Larry is planning to submit an update to his prediction to the June SEARCH report. Whether he is or not, what factors would you want to throw at this multiple (sort of non) linear regression?

Obvious ones occurring to me include:
Area at end of June for albedo effect,
Volume near end of June, since less ice disappears faster,
Arctic Oscillation for some of the ice export effect,
Area reductions over 100k km^2 per day in April to June.

What else? Suggestions welcome.

I am suggesting changing from average km^2 reduction on century break days to the reductions over 100k km^2 per day because small differences between above and below the 100k threshold can affect the average noticeably, whereas the effect on total reductions in excess of 100k is going to be small.
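
A small sketch of why the two measures behave differently near the threshold. The daily losses below are hypothetical values chosen for illustration, not CT data, and `century_stats` is just an illustrative helper name.

```python
THRESHOLD = 100_000  # km^2 per day


def century_stats(daily_drops):
    """Return (count, average drop on century-break days, total excess over threshold)."""
    breaks = [d for d in daily_drops if d > THRESHOLD]
    count = len(breaks)
    average = sum(breaks) / count if count else 0.0
    excess = sum(d - THRESHOLD for d in breaks)
    return count, average, excess


# Hypothetical daily area losses in km^2 (positive = loss); not real data.
just_under = [150_000, 99_000, 140_000]   # middle day narrowly misses the threshold
just_over = [150_000, 101_000, 140_000]   # middle day narrowly clears it

print(century_stats(just_under))
print(century_stats(just_over))
```

A 2,000 km^2 change on one day moves the average on century-break days from 145,000 to about 130,333, while the total in excess of 100k moves only from 90,000 to 91,000.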

Seems like there is lots more to do rather than having been thorough.....

Gas Glo

Hmm. On changing from predicting NSIDC daily area minimum to predicting the NSIDC monthly extent anomaly from a Gompertz fit, I am now trying to predict a noisier data set, so the RMSEs have gone up. So I need to get used to these higher RMSEs before I can make much sense of them.

Predicting NSIDC monthly minimum using linear regression on year gives RMSE of 0.508

Using Gompertz non linear fit reduces this RMSE to 0.438

Using linear regression on NSIDC area at end of June to predict the Gompertz anomaly gives an RMSE of 0.423

Using linear regression on total reductions over 100k area during April-June to predict the Gompertz anomaly gives an RMSE of 0.422

Using multiple linear regression on total reductions over 100k area during April-June and area at end of June to predict the Gompertz anomaly gives an RMSE of 0.42

So neither the June area nor the century break data reduces the RMSE much. 3 out of 10 sets of random numbers reduced the RMSE by more, so neither of these appears very useful.

Artful Dodger

Andyborst wrote:

"Taking continuous numbers and making them categorical is the most commonly applied statistical technique."

Sea Ice Extent is NOT a continuous variable, it is the binary sum of a defined grid of 6.25 km x 6.25 km cells, with a >15% sea ice concentration.

Each "category" therefore has the least common factor of 39.0625 km^2 (the cell size) and each sum for SIE is rounded to the nearest whole number.

CT SIA on the other hand IS a continuous variable, subject to the resolution of the Satellite sensor conducting the sampling.

Peter Ellis

Cobblers. SIA is granular for exactly the same reason as SIE - it uses the same cell sizes, in fact the exact same cells! However, because you're multiplying each by an (integer) percentage, the granularity is 100x smaller.

Both are granular measurements of a continuous variable.

Artful Dodger

No Peter, SIA is granular only because of the finite resolution of the sensor (it is digital). The underlying variable however is Continuous (ie: SIA can be measured to any arbitrary precision).

SIA resolution as reported by CT is so high that duplicate values have only occurred 48 times out of 11,866 observations since 1979.

On the other hand, SIE is a Discrete variable by its very definition. SIE granularity is deliberately set by the Researchers with their choice of cell size.

Rigorously, we would use the discrete approximation of the continuous for SIA, but AMSR-E resolution is so high that it scarcely affects results.

However, wrt SIE, the correct Statistic methods to use are one appropriate for Discrete Variables.

Yvan Dutil

Century breaks, while exciting, are a very bad statistical indicator. Extreme value statistics are notoriously unstable. Luck plays a much larger role than in other statistical parameters.

Add to that the fact that the derivative is noisier than the melt curve, and the information content of century breaks is rather low.

Neven

Welcome, Yvan Dutil!

It's easy for you all to talk disparagingly about the statistical (in)significance of century breaks, but you go take your cue and try and compile one. That ought to teach you. ;-)

Gas Glo

Maybe I will get the hang of the analysis I am trying to do eventually.

In the last instalment June area and century breaks didn't help much. So I realised that to predict the anomaly, what we want to use is not the area but the anomaly of the area from a Gompertz fit of area at end of June. This worked better, so I also looked at the volume anomaly from a Gompertz fit of volume data at end of June.

So the RMSEs of the estimates that I now get are:

Predicting NSIDC monthly minimum using linear regression on year gives RMSE of 0.508

Using Gompertz non linear fit reduces this RMSE to 0.438

Adding end of June area anomaly from a Gompertz fit of end June area reduces RMSE to 0.3715

Instead adding end of June volume anomaly from a Gompertz fit of end June volume reduces RMSE to 0.396

So area appears better than volume.

Using both area and volume, RMSE is reduced to 0.358

If I were to consider adding the Arctic Oscillation or NAO to try to add a predictor for ice export, what lag would be appropriate?

I haven't heard many suggestions for other predictors that I could consider using.

FrankD

Neven, what happened? I was just about to pass you a Grolsch and settle back to watch Ronnie the Rocket, and then a whole bunch of maths nerds walked in....

To paraphrase an old adage: "I can't define class, but I know it when I see Ronnie O'Sullivan snag a perfect break inside five minutes at the Crucible."

:-)

L. Hamilton

I plan to take another statistical look once the June NSIDC and GISTEMP values are out. Last year, it looked like June values were more systematic and predictable than September, as if June depended more on climate and September on weather (e.g., winds).

Regarding century breaks, comparing their frequency across datasets would be tricky if there are differences in smoothing. I don't recall how IJIS smooths their data, but the numbers tend to show less day-to-day variation than Uni Bremen does. As a consequence, IJIS has fewer century breaks -- but also fewer days with counter-seasonal change.

Gas Glo

Hi Larry, does that mean you won't be able to submit in time for June SEARCH?

Presumably it is possible to do the analysis of past years and submit a report where the projection is a function of this year's June PIOMAS data. I have nearly finished doing this. Can I send this to you?

Bfraser

I realize that time is running out for June, but I just had a thought, reading the above.

Are you using the "raw" June Area/Volume, or the anomaly? I'd bet that the anomaly would be a stronger predictor.....

Gas Glo

Yes, I am now using anomaly, not raw. Raw area was eliminated as a predictor in the June 29th 16:59 post, while the 30 June 15:55 post explains that the anomalies I am now using are from Gompertz fits.

Artful Dodger
Averaging period and the update timing of daily data

  • In general, sea-ice extent is defined as a temporal average of several days (e.g., five days) in order to eliminate calculation errors due to a lack of data (e.g., for traditional microwave sensors such as SMMR and SSM/I). However, we adopt the average of two days to achieve rapid data release. The wider spatial coverage of AMSR-E enables reducing the data-production period.
  • Usually the latest value of daily sea-ice extent is fixed and updated at around 1 p.m. (4 a.m.) JST (UT). Before the value is fixed, we also assign a preliminary value of daily sea-ice extent several times (usually three to four times) as an early report, which is determined without the full two-day observation coverage. (The fixed values of sea-ice extent are determined with the full coverage of observation data.)

Artful Dodger

Larry is quite right in that comparing IJIS SIE to CT SIA is like comparing the average of 2 Apples to 1 Orange :^)

So, if we apply 2-day averaging to CT SIA (similar to IJIS SIE), we get the following Yearly totals for SIA Century Breaks:

Year  Century breaks
2002 21
2003 35
2004 25
2005 29
2006 30
2007 35
2008 49
2009 38
2010 38
2011 21
Tot: 321

Quite different now, with 2008 being a clear leader, and both 2009 and 2010 tied and in front of 2007.

Aren't Stats wonderful? I think I will now have some Shave-Ice at the Error Bar :^P

michael sweet

Artful,
How did you generate the CT data? CT also averages their sea ice area data (5 day average?) so you don't want to average it again.

From the shape of their graph I imagine U. Bremen does not average at all. NSIDC averages for 5 days.

Artful Dodger

Hi Michael. Perhaps you're thinking of NSIDC SIE? If CT smoothed their data with a 5-day average, you would not see swings like this:

Year.Frac  SeaIceArea (million km^2)  Delta-SIA (km^2)  Day
2002.4958 8.3312864 -53,109 181
2002.4987 8.2707653 -60,521 182
2002.5013 8.0455074 -225,258 183
2002.5042 8.0492058 3,698 184
2002.5068 8.0133419 -35,864 185
2002.5096 7.9655905 -47,751 186
2002.5123 7.7933941 -172,196 187
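
A minimal sketch of how smoothing changes the century-break count, using only the seven area values quoted above. The trailing two-day mean is an assumption about how the averaging is applied, and the first quoted delta (day 181) cannot be recomputed here because day 180 is not listed.

```python
# CT sea ice area for days 181-187 of 2002, million km^2, copied from the table above.
AREAS = [8.3312864, 8.2707653, 8.0455074, 8.0492058,
         8.0133419, 7.9655905, 7.7933941]


def daily_changes(series):
    """Day-to-day differences (negative = loss)."""
    return [b - a for a, b in zip(series, series[1:])]


def two_day_mean(series):
    """Trailing two-day average; the result is one element shorter than the input."""
    return [(a + b) / 2 for a, b in zip(series, series[1:])]


def count_century_breaks(series, threshold=0.1):
    """Count days with a loss of more than `threshold` million km^2 (100k km^2)."""
    return sum(1 for change in daily_changes(series) if change < -threshold)


raw_breaks = count_century_breaks(AREAS)
smoothed_breaks = count_century_breaks(two_day_mean(AREAS))
print(raw_breaks, smoothed_breaks)
```

On these seven days the raw series has 2 century breaks (days 183 and 187), while the two-day averaged series has 3, because the 225k drop on day 183 is spread across two consecutive smoothed days. Counts from differently smoothed series are therefore not directly comparable.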

