My musings on ratings
Hi!
The recent thread on inflated ratings pushed me to finally write down some thoughts on ratings that have popped up every time I have read about them on rsf. Without further ado, here comes:
There are several properties of the USFA ratings system that tend to pop up in these discussions:
1. The granularity, or lack of it, of the system – six levels from A to U
2. The regional variation – one rating may mean an intrinsically lower level of ability in one region compared to another, while at the same time it is more difficult to get the really high ratings in the former region
3. The wide span of level of ability within one rating
4. The first time lag issue – a single good competitive result will result in an inflated rating, whether it is the result of the fencer fencing above his usual ability, or luck with the other results
5. The second time lag issue – a good fencer who has not been fencing for some time will have a rating lower than his ability, and thus throw seeding out of whack
6. The fact that it is based on the final result of the fencer, as opposed to the results in his bouts
7. The fact that a small change in number of competitors can make it impossible to earn a rating, despite the competition in itself not being noticeably easier
8. The reason for having a ratings system at all – seeding, PR, self-esteem, and their relative importance
If one wants to improve on the present system, one should be clear about what one wants a ratings system to achieve. Also, one should be clear about what the rating should reflect in a fencer. Ideally, the rating should reflect his intrinsic ability. However, there is no way to measure this directly – one has to go by competitive results. As we all know, competitive results depend on several parameters – intrinsic ability (primarily), daily form, ability of opponents, results in bouts that one does not take part in, and sheer luck. Note that ability and form are distinct parameters. The former will only change on a comparatively long-term basis, while the latter may change relatively fast.
The ratings system is then the mechanism for distilling the known competitive results into a metric representing the intrinsic ability of the individual fencer. Ideally – IMO – this system should discount results in bouts that one does not take part in, discount luck (at least to some extent), separate form from intrinsic ability, and take into consideration the intrinsic ability of one’s opponents. The reasons for the first two preferences are that results in other people’s bouts do not reflect at all on the fencer’s own ability, and that luck can happen to anyone, regardless of ability. Then we come to the – at first – thorny question of estimating one fencer’s ability based on his competitive results and those of his opponents, whose ability must also be estimated since it is not known a priori. How do we estimate something based on something similar that is not known? It turns out that this is not as difficult as it may seem. Once one recognizes that the rating letter, or number, does not bear any God-given meaning regarding true ability, and that it instead serves to rank the fencers among themselves, the rest is math. One can start out by giving all fencers ratings totally uncorrelated with their ability; with a reasonably well-constructed system, their ratings will gravitate towards values such that rating and relative ability are strongly and positively correlated.
How to construct this ideal system? (In this paragraph, “ideal” only relates to the correlation between rating and innate ability.) Firstly, only results in bouts in which the fencer takes part should be counted. In this way, results totally out of his control will not affect his rating. Secondly, bout results over some period of time should be counted, in order to let good and bad luck – and short-term form swings – neutralize each other as much as possible. Thirdly, there should be some mechanism to lower the ratings of fencers who leave the competitive scene and presumably become less skilled when they stop training. Put in other words: the system should employ some sort of low-pass time averaging of bout results with time decay. The USFA system, OTOH, takes the maximum value of competition results, with a time chop. The similarity is not total.
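To make the contrast concrete, here is a minimal sketch of the two ideas in Python. It is only an illustration under stated assumptions: the function names, the one-year half-life, the prior weight, and the expiry window are all made up for the example and are not taken from any USFA rule. The prior weight is what pulls an inactive fencer's rating down over time; a plain weighted average would not do that.

```python
def decayed_rating(bouts, now, half_life_days=365.0, prior_weight=1.0):
    """Time-decayed win rate from a fencer's own bouts.

    bouts: list of (day, won) pairs, day being a day number.
    Each bout's weight halves every half_life_days (assumed value).
    prior_weight acts like phantom losses that never decay, so the
    rating drifts towards 0 when no new bouts are fenced (assumption).
    """
    weighted_wins = 0.0
    total_weight = 0.0
    for day, won in bouts:
        if day > now:
            continue  # ignore bouts that have not happened yet
        weight = 0.5 ** ((now - day) / half_life_days)
        weighted_wins += weight * (1.0 if won else 0.0)
        total_weight += weight
    return weighted_wins / (prior_weight + total_weight)


def max_with_chop(results, now, window_days=1460):
    """Maximum-value rating: best competition result still inside the window.

    results: list of (day, value) pairs; window_days is an assumed expiry.
    Returns None when every result has expired.
    """
    recent = [value for day, value in results if now - day <= window_days]
    return max(recent) if recent else None
```

With two wins on day 0, decayed_rating gives 2/3 on day 0 and 1/2 a year later, so the rating slides gradually. max_with_chop keeps a single good result at full value until the day it falls out of the window.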
Which features – other than discussed above – have been put forth as desirable in constructing a ratings system?
1. Low work amount – only a few data have to be input
2. Tamper-resistant – it should not be possible to access the ratings calculation system unauthorized
3. Transparency – each fencer should be able to check for himself that ratings are correctly calculated
4. Not conducive to collusion – the individual fencer should not have any motivation to fence under his ability in order to improve his rating
5. Not penalize fencers for competing a lot
6. Preserve self-esteem – there should be a limit to how fast one can lose ratings
The second feature is really about data and communication security, and its solutions are therefore outside the discussion of how the ratings system should be constructed. The first and third features are linked – if all data are known and the first is fulfilled, then the third is also fulfilled, provided the individual fencer has some mathematical ability. These features have also been used as arguments for a mathematically simple ratings system. Since fencing is a sport for intelligent people, I do not think simplicity should be a requirement. One should not accommodate everything. The fourth feature is the one where the USFA system fails – it is easy to dream up a scenario where it is crucial for the individual fencer to throw a bout in order to further his own hopes of ratings advancement. I will detail that in a post of its own. The fifth and sixth features are possible motivations for why the maximum-value system has been chosen by USFA.
The possibly most irritating peculiarity of the USFA system (measured by the number of comments here and on rsf) is that one may fence better than one usually does and expect a ratings improvement, only for the high-rated fencers not to place high enough, obliterating one’s hopes. It is my opinion that if one changed from a system based on final competition results to a system based on bout results, this and several other drawbacks would be remedied. The following paragraph is an attempt at demonstrating this.
If one had a system based on bout results, the results in bouts in which the fencer does not take part would a priori lose their importance. In a bout results system, there is always motivation to win over your opponent – you never face the dilemma of wanting to keep him in the competition so that enough high-rated fencers place high enough to make the competition a ratings-awarding one. In a bout results system, it is easy to account for the fact that different fencers fence different numbers of bouts in a competition. There is no need for steps in competitor number (6, 15, 25, 64) when it comes to the rating that can be awarded at a competition, since only the fencer’s own bouts – and all of them – will be the basis for the rating. How, in detail, should this bout results system be constructed? One method that has been suggested is a system similar to the Elo rating of chess; this has been rejected due to the comparatively high computational workload. However, there are simpler systems – more on that in later posts.
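For readers who have not seen it, the chess-style update mentioned above can be sketched in a few lines of Python. This is the standard Elo formula with the chess scale of 400, applied here to a fencing bout purely as an illustration; the function names and the K-factor of 32 are assumptions for the example, not a proposal that anyone has adopted.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_after_bout(rating_a, rating_b, a_won, k=32.0):
    """Return the new (rating_a, rating_b) after one bout.

    k sets how far a single bout can move a rating (assumed value).
    Whatever A gains, B loses, so the total of all ratings is unchanged.
    """
    expected_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta
```

Two fencers both rated 1500 end up at 1516 and 1484 after one bout. Only the two fencers in the bout are touched, and beating a much stronger opponent moves the rating more than beating a weaker one.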
How to remedy the problem of the same rating meaning different innate ability in different regions? Is it possible to solve this problem in a way that also remedies at least one other problem? It turns out that it is. If some big competitions are designated national competitions, some medium-sized ones regional competitions, and the host of small ones local competitions, one can solve at least three problems. By giving competitions different designations and also creating different level ratings to match, the following is achieved: fencers can hold three different ratings at the same time, pertaining to different geographical levels (national/regional/local). Someone who is good in North Dakota, say, but does not compete out of state will then get a rating of U/U/A. A national top level fencer from New York who competes all over the place can get a rating of A/A/A. By comparing the national and regional ratings of fencers from different areas with the same local rating, the relative strengths of different areas can be estimated and taken into account during seeding. Since we have three geographical levels, ratings calculation can be done at different places, distributing the total workload. Since we now have three geographical levels and six ratings within each level, we have 6^3 = 216 different possible ratings, more than enough to solve the problem of wide ability span within a single rating. Rating ties when seeding should also be fairly uncommon.
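The arithmetic and one possible use in seeding can be checked with a short Python sketch. The national/regional/local order follows the U/U/A example above; comparing the national letter first, then regional, then local is only an assumption for the illustration, and the function names are made up.

```python
from itertools import product

LETTERS = "ABCDEU"  # the six existing rating letters, strongest first


def all_combined_ratings():
    """Every national/regional/local combination, e.g. 'U/U/A'."""
    return ["/".join(combo) for combo in product(LETTERS, repeat=3)]


def seeding_key(rating):
    """Sort key for a combined rating: lower tuple means higher seed.

    Compares national, then regional, then local letter (assumed order).
    """
    return tuple(LETTERS.index(letter) for letter in rating.split("/"))


seeded = sorted(["U/U/A", "A/A/A", "B/A/A"], key=seeding_key)
```

Here len(all_combined_ratings()) is 216, and seeded comes out as A/A/A, B/A/A, U/U/A.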
A further advantage of having many rating levels (not necessarily due to several geographical levels) is that it allows for using all competitions, including restricted ones, for rating calculation. The A-C ratings would be much more exclusive than they are now, while Y-10 fencers would generally hold ratings in the XYZ range. With so many rating levels, one can let the same rating letter mean the same innate ability for different categories of fencers without any category being “rating cramped”, that is, almost all fencers within that age/gender category having the same rating. If the same rating letter means the same innate ability for all categories, there will be no problem with awarding ratings in a restricted competition, since the awarded rating will not be inflated compared to other categories. A downside of this – for people who cannot grasp the concept of rating relativity – is that many fencers will get a lower rating letter, and thus feel “demoted”.
This leaves us with the second time lag issue: fencers who have stopped fencing and then start again. We have two cases: either they train in the interval, or they do not. Clearly, the former can be expected to be better when they start again, and in an ideal ratings system this should be reflected. However, this would require information other than that from competitions, which would mean a lot of information gathering and difficult evaluation, and the information would probably be of low quality – people may misremember details, or even misrepresent their activity in order to get a better rating. Also, it is highly unlikely that in any given competition the proportion of returnees will be high enough to seriously disrupt the total seeding. Given all this, it is not worth the trouble to try to tackle this issue – it is best left to be dealt with by the seeding rounds.
Above it was noted that it is considered good if the fencer competes a lot, while in a good ratings system (good in the sense of high correlation between innate ability and rating) the rating should be calculated as an average of the latest results. These wishes are irreconcilable – the better one is satisfied, the worse the other is. The usual compromise is to award ratings based on the average result in the fencer’s N best competitions. This allows good results to be counted, while not letting a freak result translate into an inflated rating. It also works as a low-pass filter, lessening the effect of luck and varying form. It has often been said – on rsf – that this type of system does not give any incentive for a fencer to continue competing once he has posted N good results, since he has little hope of getting a better rating when another good result will only push out a previous good result. This is not true. By continuing to compete after he has posted N good results, he will – if he continues to fence well – deprive other fencers of good results, thus stopping them from catching up.
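The N-best compromise is simple enough to show in a few lines of Python. This is a sketch only: the function name, the default N = 3, and the numeric result scores are assumptions for the example. Dividing by N even when fewer than N results exist is also an assumption, chosen so that a single freak result cannot carry a rating on its own.

```python
def n_best_average(scores, n=3):
    """Average of the fencer's n best competition scores.

    scores: list of numeric competition results, higher is better.
    Missing results count as 0 (assumption), so the sum of the best
    scores is always divided by n.
    """
    best = sorted(scores, reverse=True)[:n]
    return sum(best) / n
```

For example, scores of 10, 2, 8 and 6 give (10 + 8 + 6) / 3 = 8.0, and adding a further poor result leaves the rating unchanged, so competing a lot is not penalized. A lone score of 9 gives only 3.0 until it is backed up by more results.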
The reasons for having a ratings system at all have also been debated at great length on rsf. Some have said that it is absolutely necessary for good seeding so that the final results accurately reflect the innate ability of the fencers. This is overstated, at least. Since all competitions (except straight DE) start with a poule seeding round, luck will tend to average out, and a relatively weak fencer who wins a weak poule will run into good fencers later on anyway. In Sweden, many competitions are run with poules that are decided totally by lottery, or by ad-hoc ranking by the competition committee. These competitions do not seem to produce more upsets than those properly seeded by national ranking. Personally, I think that the other reasons are more important – a new rating step is a huge boost for self-esteem, and can probably keep up the retention rate. For high-level fencers, the rating may be useful as a publicity tool in local papers – if one can get them to consider a local boy who is on national team level as more interesting than the local soccer team, ranked #2000 in the country.
This post is quite long already; I will have to cover the other sub-topics in subsequent posts.
Have a nice time!
Peter Gustafsson -
Senior Member
Re: My musings on ratings
Originally posted by PeterGustafsson:
Hi!
The recent thread on inflated ratings pushed me to finally write down some thoughts on ratings that have popped up every time I have read about them on rsf. Without further ado, here comes:
Hi Peter!
Thanks for the very good post (lengthy, but very, very well worth the read!!) -- you have followed all of our posts on rsf very well, and summarized and analyzed them excellently!! The only problem is that you are too European and logical for America!! <g>
Politics is the biggest issue here -- the vast majority of the USFA administration is very happy with the current rating system and feels that there is no need to change it. Any attempts to change it will not work -- it is a sacred cow. -
Quit (no longer with us)
while the european thing might be there, it also might be good, it's a-bout time we brought the two systems together, usfa and fie need to bring their thoughts into play, the only way is to adopt some of your ideas into our system, thusly we will all speak french, or italian. who cares, lets just do this thing. i'm with you on this. i also thought our entire salle should have packed up and gone to france for the entire summer but i couldn't convince anyone of that.....NO ADVENTURE!!!
we need a shot of europe here, our customs are becoming sanitized. -
Senior Member
I think there are some countries that test for levels of skill and give you a certification, if I remember right; it seemed similar to the belts you'd test for in martial arts. I suppose if that were done, and there were tourneys catering to people who had tested at similar levels of skill, it would do the trick. Good luck trying to get anyone to agree on standards or on who would get to do the testing, though. -
Senior Member
I'm thinking the equation:
Peter at keyboard + 22 hour Scandinavian Darkness = megaposts?
Anybody else see a mathematical correlation?
"Sometimes we, as coaches, get into that dictator mode where you just tell and you don't listen and you don't try to understand them." Tom Izzo, Mich. St.
"Fraud is the creation of trust. And then: its betrayal."
William Black, Ph.D. -
Hi!
Yes it took a few hours to compile that beast. Mrs. was not pleased, thought time could have been spent better. :-(
You may be more literally right than you expect. Where I grew up - Luleå - the sun goes up 11:20, and goes down 11:50 during winter solstice. I have worked for a mine in a place - Kiruna - where the sun sets Dec. 8th, and reappears Jan. 2nd. Makes for a lot of alcoholism, and trips to the Canary Islands by those who can get away.
Have a nice time!
Peter Gustafsson -
Senior Member
I'm with Sl-Mo: my posts were much longer when I was in London. Theses are evil....VERY evil, someone rescue me pls! -
Curmudgeon Emeritus
My only comment, which I make every time the idea of a rating system like that used in chess comes up, is that the tendency to equate "intelligence" with "mathematical ability" is ill-founded. Math ability is merely one aspect of intelligence, and is not equally (or at all) present in everyone of high IQ. Those in whom the faculty is highly developed tend to consider it intrinsic to anyone of any high cognitive level...just as those who are more verbally or spatially gifted tend to view that as inseparable from intelligence. Neither view is really correct. There are many who are good at math and terrible with words, and vice versa. I admit to being one of the latter sort---I am to all intents and purposes a math idiot (as witness the fact that some of the terminology Peter used in his initial post might as well have been written in coded mirror-image Sanskrit for all the meaning it had to me). I for one would not look favorably on any rating system which required me to juggle numbers to figure out where I was, or to check the result assigned me by some governing authority like the USFA.
The present system is not perfect; none can be. It is however adequate, and it has the benefit of simplicity. Adding accuracy at the cost of complexity is not IMO a better option...