Originally Posted by oso97
Jason, you asked for problems with the current classification method. Here's one as I see it. To be fair, it's not really a problem with the classification method itself, but rather how it's used. And that is the whole Div I, Div II, Div III. We're using what is a 4 year high-water mark to determine entry into various levels of competition. This essentially turns every C1 or higher rated event, everywhere in the country, every weekend, into a qualifier event for what is supposed to be our elite levels of competition (Div I).

I don't think this is a problem. Not every C1 or higher tournament results in a new C classification and a new Div 1 eligible fencer. Only a C2 is very likely to produce a newly minted C (and because of the placement component of the system, this isn't even guaranteed).
Originally Posted by oso97
Second weakness of the current system: it says that an A06 who took 8th place in an A4 tournament 4 years ago is stronger than a B10 who beat 10 other B's last weekend. Really?

Is that a common situation? Have we been witnessing a real effect on the final results of competitions because of that? Does the possibility of this hypothetical in any real way affect the viability of the rating system as a standard (often in conjunction with national points) of preliminary seeding?
Whether it is intended or not and whether Jason likes it or not, the current system does provide extrinsic motivation, just not very fine grained (especially at the top and bottom).
Many fencers look at events on AskFred, and decide whether or not to attend based on the predicted rating of the event and whether or not they have a chance to improve their rating by attending.
Unclassified, D3, and D3 events are very popular - do you think that is because fencers want to improve their form against other beginners? or is it because they think they might get a better rating?
Jason, I think you are in Metro Division, home of many A-rated WF fencers. Take a look at local WF events and tell me how many of them show up for non-NAC events (excepting the recent BMW circuit, which uses points to motivate participation).
Originally Posted by oso97
Jason, you asked for problems with the current classification method. Here's one as I see it. To be fair, it's not really a problem with the classification method itself, but rather how it's used. And that is the whole Div I, Div II, Div III. We're using what is a 4 year high-water mark to determine entry into various levels of competition. This essentially turns every C1 or higher rated event, everywhere in the country, every weekend, into a qualifier event for what is supposed to be our elite levels of competition (Div I).

Why is that a problem? I'm not being snarky, I actually want to know. Is Div I being flooded with cannon fodder?
That being said, I think we could keep the classification system, IF we found another way to qualify fencers to the elite levels of competition.
If Div I is too big you could restrict it to As and Bs. This is only tangential to the idea of an Elo system; consider that, if we had a Div I analog in an Elo world, we'd use a number as a cutoff for Div I, right? So, again, lots of small events are in effect qualifiers.
Does the USFA really need to control the entry to Div I more closely? Is there some failure going on I'm not aware of?
Second weakness of the current system: it says that an A06 who took 8th place in an A4 tournament 4 years ago is stronger than a B10 who beat 10 other B's last weekend. Really?
How do you beat ten Bs and get a B? But that's an aside. Tweak the A1, A2 etc numbers, by all means. Make the degrade time of a rating two years instead of four, that's fine too.
K O'N
Originally Posted by mrbiggs
It's worth mentioning that it's much easier to re-earn than to earn a classification. This is true both because your own rating boosts the tournament rating significantly if the tournament is small (an A only needs to win a tournament with another A to re-earn his, instead of coming in above 2 As), and because your better pool seeding leads (on average) to better pool results, and therefore an easier DE path.

This is a good point.
It's a very interesting paper and many thanks to DHCJr for posting. I had no idea that our current system came about from a long series of small changes instead of being implemented as a sufficient, standalone system all at once. It makes me think even more that the current system is far from ideal.
I strongly disagree with whoever said that Elo is more applicable in chess because there are fewer upsets.
That was me 
In both chess and fencing, if someone is significantly better than someone else they have a huge advantage, but if they make a large mistake or don't take it seriously they can find themselves behind very easily.
While that is true, chess has fewer upsets than fencing. Certainly chess has fewer upsets than epee, anyway. To be more precise, the standard deviation of the performance of a chess player is smaller than that of a fencer.
I'm not going to beat Kasparov anytime soon, but I'm not going to beat Baldini either. That said, the pools->DEs format does not lend itself to an Elo system very well, whereas the Swiss system used in chess tournaments does.
I wasn't clear why this was true until I thought about it some, but it's a good point.
Let's consider the typical tournament path for an E (analog), and for an A (analog) in an Elo rating system in pools -> DEs single elimination fencing tournaments, over several events.
They each enter an A1 (analog). The E goes 2-4 in the pool, as expected. The A goes 5-1, as expected. The E gets a low seed, the A gets a high seed, all as expected. They fence each other in the first DE, and the A wins, as expected. The A goes on to win two more DEs against lower rated fencers, before he gets to the final where he fences the other A.
This happens every other weekend for four tournaments in a row.
You may say, well, the A doesn't always beat the E, right? And you're right. At the fifth tournament the E pulls an upset and wins his first DE against the A. Hooray!
What would the Elo rankings look like from this?
Assuming only DEs are used for ranking changes, let's say the A is rated at 80% to win against the E. Then when he does win, he gets 20% of his K value, and the E loses 20% of his K value. The A goes on to beat other lower ranked fencers each time, in each case picking up small amounts of points. In the finals let's say the A and his nemesis the other A just trade the same points back and forth, each winning half the finals.
Then in the last event, the E beats the A. Great, so he gets 80% of his K value in points and goes on.
Over the course of the tournaments the E would see his Elo rating go down, down, down, down, then up! This is not much like watching your PR slowly improve as a result of practice. In fact, I will venture to say that this is terrible motivation. Yet this is the typical path of an E fencing in open events. What will Es do? Well, they could fence only in E and under events; that would be good for their Elo scores. Is that what we want Es to do, avoid fencing good fencers?
The A would see his Elo rating go up, up, up, up, then down suddenly. That last "down suddenly" part is why I think the A might think about staying home as the NAC approaches, no need to drop a hundred points for no good reason right before a big event, right?
Do either of these seem like better motivation, assuming we want external motivation, than "Man, I almost got my B last week, I really have to work harder at practice"?
None of this happens in a Swiss-system chess tournament, since everyone plays the same number of games. We don't fence Swiss-system tournaments, though. Huh, maybe we should try one, that might be fun.
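The Elo arithmetic in the A-versus-E example above can be sketched in a few lines. This is only an illustration under stated assumptions: the starting ratings (1740 and 1500, chosen so the A is roughly 80% to win), the K value of 32, and the function names are all hypothetical, and only the five A-versus-E bouts are counted, not the A's other DEs.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score (win probability) of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def play_bout(winner, loser, k=32.0):
    """Return (new_winner_rating, new_loser_rating) after one DE bout."""
    # The winner gains K times the "unexpected" share of the result;
    # the loser gives up the same amount, so total points are conserved.
    delta = k * (1.0 - expected_score(winner, loser))
    return winner + delta, loser - delta


def simulate_series(a_start=1740.0, e_start=1500.0, k=32.0):
    """Four expected wins for the A, then one upset by the E.

    The starting ratings and K are assumptions for illustration only.
    """
    a, e = a_start, e_start
    a_history, e_history = [a], [e]
    for a_wins in (True, True, True, True, False):
        if a_wins:
            a, e = play_bout(a, e, k)
        else:
            e, a = play_bout(e, a, k)
        a_history.append(a)
        e_history.append(e)
    return a_history, e_history


if __name__ == "__main__":
    a_hist, e_hist = simulate_series()
    for n, (a, e) in enumerate(zip(a_hist, e_hist)):
        print(f"after bout {n}: A = {a:7.1f}   E = {e:7.1f}")
```

Under these assumptions each expected win moves only a small fraction of K, while the single upset moves several times as much, which is the "down, down, down, down, then up" shape described above.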
K O'N
Last edited by K O'N; 10-30-2010 at 11:29 AM.
Originally Posted by fdad
Unclassified, D3, and D3 events are very popular - do you think that is because fencers want to improve their form against other beginners? or is it because they think they might get a better rating?

That's a false dichotomy. They can do both. And there could be many other reasons to attend a tournament. For example, just learning how much and when to eat to maintain your energy through four rounds of DEs is a valuable skill. If you always lose in the first round of DEs to one of the As, you're not learning that skill. If you fence at a big E & Under, you have a chance to work on that skill.
Personally, I feel that I learn the most about fencing when I'm fencing an opponent at or just a little above my current level. It's fun and motivating to compete against a vastly superior opponent every now and then. But if I were a D or under fencer (i.e., the majority of local fencers in most areas), I'd probably be competing at mostly D & Under events.
And none of that has anything to do with the rating system. It works with any rating system, as long as I can tell which events will have more fencers around my own level.
Originally Posted by fdad
Whether it is intended or not and whether Jason likes it or not, the current system does provide extrinsic motivation, just not very fine grained (especially at the top and bottom).
Many fencers look at events on AskFred, and decide whether or not to attend based on the predicted rating of the event and whether or not they have a chance to improve their rating by attending.
Unclassified, D3, and D3 events are very popular - do you think that is because fencers want to improve their form against other beginners? or is it because they think they might get a better rating?

My own experience (as a coach of many fencers ranked "C" and below) is that these events are popular because there is a good chance that the fencer can earn a decent pool result and then possibly fence more fencers of equal skill in the DEs, rather than have a poor pool result, face someone much tougher than they are in the first or second DE, and go home early. The perception is that they have a chance to get more fencing (at a level they can enjoy and appreciate) rather than simply get beaten badly and early.
A
Originally Posted by Allen Evans
My own experience (as a coach of many fencers ranked "C" and below) is that these events are popular because there is a good chance that the fencer can earn a decent pool result and then possibly fence more fencers of equal skill in the DEs, rather than have a poor pool result, face someone much tougher than they are in the first or second DE, and go home early. The perception is that they have a chance to get more fencing (at a level they can enjoy and appreciate) rather than simply get beaten badly and early.
A

Not all fencers are motivated by the same thing, but many will look at an event's rating and decide to go based on the potential rating upgrade.
I think that eliminating A1 events and reducing the amount of time till ratings expire to 3 years would eliminate a lot of the random A's that I see in the epee world. As for intrinsic vs. extrinsic motivation, people are motivated by both. Why not apply some algorithm to the askfred database to give folks who are interested in rankings something to look at? Then they could tell their friends, "I'm ranked 3247th in the country, up from 3248th yesterday! Woo hoo!" Just don't have the USFA do it or use it.
 Originally Posted by fdad Whether it is intended or not and whether Jason likes it or not, the current system does provide extrinsic motivation, just not very fine grained (especially at the top and bottom).
Many fencers look at events on AskFred, and decide whether or not to attend based on the predicted rating of the event and whether or not they have a chance to improve their rating by attending.
Unclassified, D3, and D3 events are very popular - do you think that is because fencers want to improve their form against other beginners? or is it because they think they might get a better rating?
Jason, I think you are in Metro Division, home of many A-rated WF fencers. Take a look at local WF events and tell me how many of them show up for non-NAC events (excepting the recent BMW circuit, which uses points to motivate participation).

While it would be great if the USFA based its systems on whether or not I liked something, I'm pretty sure this isn't about me.
There is no question that there are extrinsic motivators aplenty in US fencing--and all sports, for that matter. In fact, a big part of sports psychology is developing the ability to turn your attention away from the external motivators. Nothing kills performance like being focused on the outcome.
However, the idea in the paper is that the US would somehow benefit from adding more extrinsic motivation. It's a bizarre conclusion that overtly contradicts the example it presents. There may be a reason to change the rating system--no one has clearly presented one yet, as far as I know, but there may be one. However, if the reason is based on motivation, then it's a ridiculous idea.
One of the "benefits" of the current rating system is that it's general enough to be fairly easily disregarded as a motivator by most fencers. If we were to adopt a system that ranks every fencer and that ranking can fluctuate with every bout, it could create far more problems. Suddenly there is always a rating on the line.
The purpose of any new rating system needs to be very clear. The rationale presented in the paper is terribly flawed. Perhaps there are better reasons, but the solution then should be designed to address those reasons, rather than--as seems often to be the case when alternate rating systems get talked about here--picking a solution and then trying to squeeze a rationale to fit it.
Last edited by Jason; 10-30-2010 at 04:54 PM.
Changing the classification system to be a metric of progress and motivator: bad idea.
Having a metric of progress and a motivator: not a bad idea.
I've been staying out of this discussion so far even though, as some of you know, I've been assisting the ratings committee by simulating several candidate systems using FRED's results data. The reason I haven't weighed in yet is that the work I'm doing isn't "ready for prime time". (Almost, I swear! I'm working on it right now as a matter of fact!) I'm mindful of how unsatisfying it can be to have someone give you only a little bit of information when you can tell there's so much more. Also, please bear in mind that I did not design any of the candidate systems, nor do I play any decision making role. I am providing technical assistance and data only.
That said...
Some characteristics of a rating system are hard metrics which can be tested and some are intangible factors that cannot. Of the hard metrics, there are two most important ones I see:
1. How frequently does the rating system accurately predict the bout outcome? That is to say: the higher rated fencer should win most of the time, more so as the difference in two fencers' ratings gets bigger.
2. How closely does the initial seeding produced by the rating system match the final placement? A perfect seeding would exactly match the final placement. In the real world that never happens, but the better the seeding, the closer it matches the final placement.
These are both being analyzed using the ~1.3 million bouts in FRED's database. Today I am working on adding NAC results to the mix.
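For readers who want something concrete, here is a minimal sketch of how the two hard metrics above could be computed. It is not the committee's or peet's actual method: the numeric ratings, the function names, and the choice of Spearman rank correlation as the "closeness" measure for seeding versus placement are all assumptions made for illustration.

```python
def prediction_accuracy(bouts):
    """Fraction of bouts won by the higher-rated fencer.

    bouts: iterable of (winner_rating, loser_rating) pairs.
    Bouts between equally rated fencers carry no prediction and are skipped.
    """
    decided = [(w, l) for w, l in bouts if w != l]
    if not decided:
        return float("nan")
    correct = sum(1 for w, l in decided if w > l)
    return correct / len(decided)


def seeding_agreement(seeds, places):
    """Spearman rank correlation between initial seed and final place.

    seeds and places are parallel lists of ranks 1..n with no ties.
    Returns 1.0 for a perfect seeding and -1.0 for a fully reversed one.
    """
    n = len(seeds)
    if n < 2 or n != len(places):
        raise ValueError("need two equal-length rank lists with n >= 2")
    d_squared = sum((s - p) ** 2 for s, p in zip(seeds, places))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


if __name__ == "__main__":
    sample_bouts = [(1600, 1500), (1500, 1600), (1700, 1500)]
    print(prediction_accuracy(sample_bouts))
    print(seeding_agreement([1, 2, 3, 4], [2, 1, 3, 4]))
```

Real tournament placements contain ties (for example, everyone eliminated in the same DE round), so a production version would need a tie-aware rank correlation; this sketch deliberately ignores that.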
Some concerns people have brought up:

"Is the current system a candidate system?"
I am simulating the current system in the same way as the other candidates so as to compare it to the others on a level playing field. Again, I'm not a decision maker in this, but I imagine that until and unless a new system is demonstrated to be a significant improvement on the current one, the current one will remain.

"A rating system that lowers your rating will discourage participation."
I'm not so sure. It depends on how the system lowers your rating, and by how much. There are two main ways to lower a rating: demotion for poor performance and decay over time. The former is more accurate*, but might discourage participation. The latter encourages participation (to make up for expired ratings points), but is less accurate. Just because you haven't fenced in ages doesn't mean you suck. We've all seen examples in both directions.
Perhaps the best system would incorporate both of these. We could demote for poor performance, but tune the system so that when your rating goes up, it goes up by 3 (4? 5?) times the amount it typically goes down. That way, your potential upside outweighs the potential downside so you're not discouraged from participating. "But won't that result in rating inflation?" That's where decay over time comes in. We could decay everyone's rating X% every year (or month, or even day). X is whatever number we need to keep the overall mass of points consistent; so a "1500" in 2010 means the same as a "1500" in 2015.
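A minimal sketch of the "bigger upside plus decay" idea above, under stated assumptions: the 3x gain multiplier, the base change of 16 points, a proportional decay applied to everyone, and all function names are hypothetical choices for illustration, not part of any candidate system.

```python
def apply_result(ratings, winner, loser, base_delta=16.0, gain_multiplier=3.0):
    """Winner gains gain_multiplier * base_delta; loser drops base_delta.

    Because the gain exceeds the loss, every bout adds points to the pool.
    """
    updated = dict(ratings)
    updated[winner] += gain_multiplier * base_delta
    updated[loser] -= base_delta
    return updated


def decay_fraction(ratings, target_mean):
    """Fraction X by which every rating must shrink to restore target_mean."""
    current_mean = sum(ratings.values()) / len(ratings)
    # Never "decay" upward if the pool is already at or below the target.
    return max(0.0, 1.0 - target_mean / current_mean)


def apply_decay(ratings, x):
    """Shrink every rating by the same fraction x."""
    return {name: r * (1.0 - x) for name, r in ratings.items()}


if __name__ == "__main__":
    pool = {"fencer_a": 1500.0, "fencer_b": 1500.0}
    pool = apply_result(pool, "fencer_a", "fencer_b")
    x = decay_fraction(pool, target_mean=1500.0)
    pool = apply_decay(pool, x)
    print(f"decay applied: {x:.4%}")
    print(pool)
```

Here X is solved for rather than picked: it is whatever fraction brings the average back to the target, so that a given number keeps roughly the same meaning from year to year.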
Bottom line: Ratings have to go down. Otherwise they can't be accurate. The only question is how to make them go down in a way that doesn't have bad side effects.

"Providing extrinsic motivation isn't a good enough reason for a new rating system"
Maybe, maybe not. Personally, I think lots of people look at ratings as a motivating goal. That said, despite what can be read in George Masin's proposal, I don't think extrinsic motivation is the main reason the USFA is considering a replacement rating system. Motivations that I'm aware of (note that there may be others, and I don't speak with authority representing the USFA on this, and these are not in order):
- More accurate seeding. The current system does poorly at both the top and the bottom of the scale.
- A better way to control tournament entry. They could, for example, open Div1 NAC attendance to the top 128 people who register by a certain date. Or they could open Div1ME to all above 1750, while Div1WS is open to all above 1500.
- Motivate competition: A good system would give many opportunities to earn a higher rating, thereby motivating people to compete. An even better system would somehow motivate high level fencers to compete with lesser fencers.
- Remove the motivation to game the rating system. We all know of the A4 tournament that awarded 3 A ratings only because they suited up someone's mom and the three folks who earned A's split her USFA membership fee. Or the "rating factory" club where they hold unrated tournaments every day just to manufacture ratings.
- Better granularity at the top and the bottom. We already talk about how Div1ME has too many A's in it, but the problem is just as bad (worse really) at the bottom where you have local tournaments with 95% U rated fencers.
- Better parity in ratings across geography, gender, and age. The current system has very little rating "virality". That is to say, ratings aren't communicated from fencer to fencer very quickly, so it's difficult for rating disparity to equalize between groups of fencers. In a more viral rating system, the "crossover fencers" (e.g the woman who competes in mixed events, or the Y14 fencer who competes in Cadet) can more easily "catch" rating change from the "other" competitive community (Mixed and Cadet in my examples) and bring that change to their "native" competitive community where it might equalize disparity between the competitive communities. NOTE: by rating change, what I mean here is change both up AND down.
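The two entry-control examples in the list above (top 128 registered by a date, or everyone above a numeric cutoff) are simple filters once ratings are numbers. A minimal sketch, assuming each registration is a hypothetical (name, rating, registered_on) tuple; the names, ratings, and dates below are made up for illustration.

```python
from datetime import date


def top_n_by_deadline(registrations, deadline, n=128):
    """Names of the n highest-rated fencers who registered by the deadline."""
    on_time = [r for r in registrations if r[2] <= deadline]
    on_time.sort(key=lambda r: r[1], reverse=True)
    return [name for name, _rating, _registered_on in on_time[:n]]


def above_cutoff(registrations, cutoff):
    """Names of every registered fencer rated strictly above the cutoff."""
    return [name for name, rating, _registered_on in registrations if rating > cutoff]


if __name__ == "__main__":
    # Entirely made-up registrations for illustration.
    regs = [
        ("Ann", 1800.0, date(2010, 11, 1)),
        ("Bo", 1760.0, date(2010, 11, 20)),
        ("Cy", 1700.0, date(2010, 11, 2)),
        ("Di", 1500.0, date(2010, 11, 3)),
    ]
    print(top_n_by_deadline(regs, date(2010, 11, 15), n=2))
    print(above_cutoff(regs, 1750.0))
```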
Ok, that's enough for this post. Perhaps a later post will give some info about the simulations I'm doing. Sorry for the Wall Of Text. Now you see why it's hard for me to post about this; I have a hard time saying just a little about it.
But for now, back to the actual work..... 
* "Accurate" defined as an evaluation of the fencer's real competitive strength. Ok, so define "competitive strength" I hear you say. I define that as ability to win bouts and ability to place well in tournaments. You'll notice those parallel the hard metrics mentioned above.
Originally Posted by Guided by Wire
I don't think this is a problem. Not every C1 or higher tournament results in a new C classification and a new Div 1 eligible fencer. Only a C2 is very likely to produce a newly minted C (and because of the placement component of the system, this isn't even guaranteed).

490 of the 1153 C1 events in FRED awarded a C rating. Not sure if that's a lot or a little.
Thanks very much for this post, it's extremely helpful.
Sigh. If you want extrinsic motivation, just put everyone into an online database and add a "Like My Result" function.
How many people does the USFA have working diligently to find solutions to non-problems, and putting hours of effort and analysis into fixing things that aren't broken?
Originally Posted by peet
I've been assisting the ratings committee by simulating several candidate systems using FRED's results data.

Will the procedure of the simulation be published when the results are published?
I'm a bit mystified by how this simulation would work. It's been a long time since I did any research where it was easy to compute a bunch of data based on simulation results, but it was significantly more difficult to prove that the data supported any particular conclusion. Maybe I'm rusty, but I'm not sure how the existing data is going to help us really evaluate a new rating system.
The question that we're asking is, "If we implement rating system B and seed all tournaments based on that system, then does it perform well by some set of metrics." Right? But when we "simulate" a candidate system based on existing results, what are we doing? Are we just taking the tournament entries and final results from the existing database and pretending like those results were produced by seeding the tournament using the candidate rating system?
Because that doesn't seem right. That is, if the current system accidentally puts my best A10s in the same pool, and because of that, they end up meeting and knocking each other out before the finals, the results of that tournament are bad (not consistent with the strength of the fencers) because of the current rating system. If your new system would not have placed them in the same pool, then maybe they would have seeded first and second out of pools, and they could not have met until the final.
If you just use the entrants and final results to do a simulation, it looks like the new system didn't predict the outcome well. That is, it says fencers X and Y should have been 1st and 2nd, but they ended up 1st and 6th or something. That looks like it failed to predict the outcomes well, but that might not be because the new system predicted Y's finishing place badly. The new system would never have put X and Y in the same pool, so the seedings into DEs would have been different and possibly more accurate in the new system, avoiding these early match ups in the DEs and making it more likely that fencer Y would place higher (since he wouldn't hit fencer X until later).
And even if fencer Y still finished 6th when seeding by the new system, then perhaps his rating would drop (under a system like Elo). That would be used to seed the next tournament. An Elo-like system has this kind of feedback built in where the rating is changing after almost every tournament, and that new rating should be used to seed the next tournament.
So, I don't really understand how taking a bunch of events seeded by rating system A and then using the results to compute a rating for system B produces useful information about how system B would perform in practice.
Are we just using the current results and pretending that those are the results for the simulated rating system? Or are we using the existing data to produce some probability that fencer X beats fencer Y and then generating a bunch of new results based on that data so that we can simulate tournaments that haven't happened? Of course, I'm still not sure whether that would make sense. I'm really curious to know how the simulations are being done.
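To make the second option above concrete, here is a minimal Monte Carlo sketch of "turn ratings into win probabilities, then generate tournaments that never happened". It says nothing about how peet's simulations actually work. The Elo-style probability formula, the sample ratings, the function names, and the simplification of re-pairing best against worst in every round (rather than using a fixed DE tableau) are all assumptions.

```python
import random


def win_probability(rating_a, rating_b):
    """Elo-style probability that A beats B (an assumed model)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def simulate_bracket(ratings, rng, prob=win_probability):
    """Simulate one single-elimination event; return the winner's seed index.

    ratings is ordered by seed, best first, and its length must be a power
    of two. Each round pairs the best remaining seed with the worst.
    """
    n = len(ratings)
    if n == 0 or n & (n - 1):
        raise ValueError("field size must be a power of two")
    alive = list(range(n))
    while len(alive) > 1:
        winners = []
        for i in range(len(alive) // 2):
            hi, lo = alive[i], alive[-1 - i]
            p_hi = prob(ratings[hi], ratings[lo])
            winners.append(hi if rng.random() < p_hi else lo)
        alive = sorted(winners)
    return alive[0]


def top_seed_win_rate(ratings, trials=1000, seed=0, prob=win_probability):
    """Share of simulated events won by the top seed."""
    rng = random.Random(seed)
    wins = sum(1 for _ in range(trials) if simulate_bracket(ratings, rng, prob) == 0)
    return wins / trials


if __name__ == "__main__":
    field = [1800.0, 1700.0, 1600.0, 1500.0, 1400.0, 1300.0, 1200.0, 1100.0]
    print(top_seed_win_rate(field))
```

A sketch like this still inherits the circularity raised above: the win probabilities come from ratings that were themselves fitted to results produced under the old seeding.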
Last edited by tbryan; 10-30-2010 at 10:33 PM.
Reason: Clarification.
Originally Posted by peet
490 of the 1153 C1 events in FRED awarded a C rating. Not sure if that's a lot or a little.

Presumably that 490 also includes fencers renewing their C; or were those all first-time C winners?
Thanks for the info, Peet. Some things are still fairly unclear to me:

Originally Posted by peet
A perfect seeding would exactly match the final placement. In the real world that never happens, but the better the seeding, the closer it matches the final placement.

How close does the USFA want it to be? How close does it come now? Is there reason to believe that the current system creates an initial seeding that compromises the final result?
A better way to control tournament entry. They could, for example, open Div1 NAC attendance to the top 128 people who register by a certain date. Or they could open Div1ME to all above 1750, while Div1WS is open to all above 1500.
What about the current system makes this impossible?
Motivate competition: A good system would give many opportunities to earn a higher rating, thereby motivating people to compete. An even better system would somehow motivate high level fencers to compete with lesser fencers.
Why does the USFA think that they need to motivate people to compete? Does the USFA believe that if we were to eliminate all ratings completely, all competition would cease? Is competing a chore that people have to be bribed with prizes into doing? They need medals, points, ratings, prizes and recognition (all, preferably, within an officially designated circuit of some kind) or they'll all just quit?
Remove the motivation to game the rating system. We all know of the A4 tournament that awarded 3 A ratings only because they suited up someone's mom and the three folks who earned A's split her USFA membership fee. Or the "rating factory" club where they hold unrated tournaments every day just to manufacture ratings.
Is this really something a change in the rating system can achieve? The only way I can think of to remove the motivation to game the system is to devalue ratings.
Better granularity at the top and the bottom. We already talk about how Div1ME has too many A's in it, but the problem is just as bad (worse really) at the bottom where you have local tournaments with 95% U rated fencers.
I appreciate the ME problem but, as I mentioned before, considering how much fluctuation there is in ME results, can any system really solve this problem better than just tweaking what we already have? With how much back-and-forth and how many upsets there are in epee, could it be that the inability to "precisely" seed a ME event is an inherent reality of the sport?
Also, why do we need a system that differentiates amongst beginners? What exactly does that achieve?
Better parity in ratings across geography, gender, and age. The current system has very little rating "virality". That is to say, ratings aren't communicated from fencer to fencer very quickly, so it's difficult for rating disparity to equalize between groups of fencers. In a more viral rating system, the "crossover fencers" (e.g the woman who competes in mixed events, or the Y14 fencer who competes in Cadet) can more easily "catch" rating change from the "other" competitive community (Mixed and Cadet in my examples) and bring that change to their "native" competitive community where it might equalize disparity between the competitive communities. NOTE: by rating change, what I mean here is change both up AND down.
This is the "an A in Nebraska is worth a C in NY" problem, no? Isn't the easiest solution to this just to change the requirements for achieving ratings at competition? Or is this also supposed to address the (somewhat contradictory) "it's so hard to get a C in Wyoming" problem? What real-world problems are we seeing arise from these issues?
I have to say I'm skeptical that there is a major problem with the current system. I look forward to hearing more about your project.
Last edited by Jason; 10-30-2010 at 10:55 PM.
The USFA wants to motivate people to compete? But...but...won't that result in even larger, more unmanageable national tournaments?
Won't someone please think of the poor officials? Clearly we need a way to deter people from competing...
Eureka! At last I understand the dogged persistence of the Elo proponents.