
  1. #1
    Just Joined
    Join Date
    Dec 2006
    Location
    Aachen/Germany
    Posts
    7

    The program "Felo" released

    Hello there!

    Yesterday I released version 1.0 of "The Felo program".

    It's about calculating ratings for fencers in order to estimate their strength, much like the Elo rating in chess. It's Free Software, and you may download its source code, too, if you wish.

    I hope you like this little Christmas present!

  2. #2
    Senior Member erooMynohtnA
    Join Date
    Mar 2006
    Location
    Madison, WI
    Posts
    4,904
    It looks really cool.

    I wonder if there would be pressure at a local tournament not to publish a result if a really good fencer had a bad day and his Felo dropped.

  3. #3
    Moderator
    Join Date
    Feb 2005
    Location
    Austin, TX
    Posts
    12,167
    It looks interesting. The three significant questions that spring to my mind are:
    1) How well does it scale, in terms of number of fencers supported and number of results?
    2) How long does it take to go from initial setup to reliable results? I.e., if I put 200 fencers in, how long would it take before I should expect a reasonably strong and consistent connection between results and predictions? If ever, of course.
    3) Does the system place any substantial weight on win versus lose, or does it give the same step difference to a user in 5:4 versus 4:5?

    Also, I wonder what other people think of basing results on total point outcomes. Currently, if you're in a DE bout you have a strong incentive to win regardless of the other fencer's score, because 15-0 is as good for you as 15-14. If final score is an important component, "doubling out" in epee will suddenly be a lot less advantageous.

  4. #4
    Senior Member Army Fencer
    Join Date
    Oct 2003
    Location
    DC
    Posts
    2,690
    Quote Originally Posted by KD5MDK View Post
    It looks interesting. The three significant questions that spring to my mind are:
    1) How well does it scale, in terms of number of fencers supported and number of results?
    2) How long does it take to go from initial setup to reliable results? I.e., if I put 200 fencers in, how long would it take before I should expect a reasonably strong and consistent connection between results and predictions? If ever, of course.
    3) Does the system place any substantial weight on win versus lose, or does it give the same step difference to a user in 5:4 versus 4:5?

    Also, I wonder what other people think of basing results on total point outcomes. Currently, if you're in a DE bout you have a strong incentive to win regardless of the other fencer's score, because 15-0 is as good for you as 15-14. If final score is an important component, "doubling out" in epee will suddenly be a lot less advantageous.
    The calculation - Felo 1.0

    Apparently, it is by touches (15-0 is better than 15-14). It looks to me like it's the engine of the formula.

    I'll add that for epee, this is a very, very bad way to calculate any sort of rating. A win is a win.
    Last edited by Army Fencer; 12-21-2006 at 12:04 AM.
    Don't let 'em drop it. Don'tlet'emdropit. Stop it... bebop it.

    ~Charlie Mingus

  5. #5
    Just Joined
    Join Date
    Dec 2006
    Location
    Aachen/Germany
    Posts
    7
    Quote Originally Posted by Army Fencer View Post
    The calculation - Felo 1.0

    Apparently, it is by touches (15-0 is better than 15-14). It looks to me like it's the engine of the formula.

    I'll add that for epee, this is a very, very bad way to calculate any sort of rating. A win is a win.
    Please note that the Felo ratings don't try to imitate or even replace a tournament. They just try to estimate the relative strengths (in terms of win probabilities) as accurately as possible.

    Besides, mathematically, there is no big difference between counting wins/losses, or using the exact score (as Felo does it); you end up with the very same values. However, using the score converges much faster because you don't throw away information.
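
    To make "using the exact score" concrete, here is a minimal sketch of an Elo-style update in which the result of a bout is the fraction of touches won. It uses the textbook Elo expected-score curve as an assumption; it is not necessarily the formula Felo 1.0 uses. The names `expected_score` and `update` and the value `k=25` are hypothetical.

```python
def expected_score(rating_a, rating_b):
    """Textbook Elo expected score of A against B (logistic curve, scale 400)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update(rating_a, rating_b, touches_a, touches_b, k=25.0):
    """Return A's new rating, taking A's fraction of touches as the result.

    Assumption for illustration only: the real Felo calculation may differ.
    """
    total = touches_a + touches_b
    if total == 0:
        raise ValueError("a bout needs at least one touch")
    result = touches_a / total  # 15:14 gives about 0.52, 15:0 gives 1.0
    return rating_a + k * (result - expected_score(rating_a, rating_b))


if __name__ == "__main__":
    # Two equally rated fencers: a narrow win moves the rating far less
    # than a shutout, because every touch carries information.
    print(update(1500, 1500, 15, 14))
    print(update(1500, 1500, 15, 0))
```

    In this sketch a 15:14 win against an equally rated opponent barely moves the rating, while a 15:0 win moves it by half of k, which is the behaviour being debated in this thread.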

    Psychologically, there may be a very slight difference, because at 14:0 it's still worth fighting for every hit, even in epee. But this effect wouldn't be too bad, would it?
    Torsten Bronger, Aachen, Germany

  6. #6
    Just Joined
    Join Date
    Dec 2006
    Location
    Aachen/Germany
    Posts
    7
    Quote Originally Posted by KD5MDK View Post
    [...] The three significant questions that spring to my mind are:
    1) How well does it scale, in terms of number of fencers supported and number of results?
    I've never tested it, but I'd be very surprised if you managed to bring it down. The whole calculation is not a CPU or RAM hog. On the contrary, for today's computers it's almost trivial. The only important thing is a good backup scheme so that the valuable bout file cannot get lost.

    For the future, I plan a feature to create "snapshots", i.e. a new bout file is created with the current Felo ratings as the new initial ratings and an empty bout list. This makes the work more convenient if you have a lot of data. However, this is more or less cosmetic.

    The only thing that doesn't scale yet is the plots. If you have more than 50 fencers, they get cluttered and should be split. I simply haven't done this so far.
    Quote Originally Posted by KD5MDK View Post
    2) How long does it take to go from initial setup to reliable results? I.e., if I put 200 fencers in, how long would it take before I should expect a reasonably strong and consistent connection between results and predictions? If ever, of course.
    In our group we got good initial values with 6-7 bouts per fencer. Most of them were 5 point bouts.

    The Felo ratings find their true values rather quickly (as do the Elo ratings). However, the best so-called "k factor" is still unknown because the Felo ratings are too young. I think I set it too high in the 1.0 release: I used the chess values, but fencers produce more information per "game". Setting k too high doesn't bias the rating itself, but makes it too sensitive to individual wins and losses, so that it jumps too much.

    The program allows you to set the k factor in the bout file, so everybody can damp this behaviour. In a future release of Felo, I may well change the defaults (not without mentioning it loudly in the manual, of course). However, not before having much more experience with Felo ratings.
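
    The effect of k on how much a rating jumps can be sketched as follows. This is an illustration under assumptions: it uses the textbook Elo curve rather than Felo's own calculation, a fixed opponent rated 1500, and made-up results that alternate between 60% and 40% of the touches. The names `rating_history` and `swing` are hypothetical.

```python
def expected_score(rating_a, rating_b):
    """Textbook Elo expected score of A against B (logistic curve, scale 400)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def rating_history(results, k, start=1500.0, opponent=1500.0):
    """Apply one Elo-style update per result and return every rating visited."""
    rating = start
    history = [rating]
    for result in results:
        rating += k * (result - expected_score(rating, opponent))
        history.append(rating)
    return history


def swing(history):
    """Distance between the highest and lowest rating in a history."""
    return max(history) - min(history)


if __name__ == "__main__":
    # Made-up data: a fencer who is really as strong as the opponent,
    # alternating slightly good and slightly bad bouts.
    results = [0.6, 0.4] * 10
    for k in (8, 32):
        print(k, round(swing(rating_history(results, k)), 2))
```

    Both settings hover around the same value, but the larger k makes the rating swing more from bout to bout.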
    Quote Originally Posted by KD5MDK View Post
    3) Does the system place any substantial weight on win versus lose, or does it give the same step difference to a user in 5:4 versus 4:5?
    As already said, it's equivalent to count wins or losses on the one hand, and the fraction of touches won on the other hand. So the only alternative we considered was to add a "win bonus". However, this destroys the mathematical model, so you would have to prove that this still leads to consistent values. Additionally, this win bonus would be arbitrary, without an objective foundation. And last but not least, it would not change much.
    Quote Originally Posted by KD5MDK View Post
    Also, I wonder what other people think of basing results on total point outcomes. Currently, if you're in a DE bout you have a strong incentive to win regardless of the other fencer's score, because 15-0 is as good for you as 15-14. If final score is an important component, "doubling out" in epee will suddenly be a lot less advantageous.
    This is absolutely right. Including every hit in the calculation rewards, well, every hit. However, the Felo ratings inherited their stability from the Elo ratings. In my simulations I was surprised how strongly I could skew some parameters without much effect. The difference between the "doubling-out" strategy and the "fencing for every point" strategy won't end up at significantly different numbers, because the opponent will get into "doubling-out" mode, too, with his own probability. Eventually, it averages out.

    However, this is theory. There may be a psychological difference, as mentioned in the other post. Again, it won't change much, but this apparent difference between Felo success and tournament success may annoy some fencers.

    If you are really concerned about it, simply put 1:0 and 0:1 as the only possible results in the Felo input file. The Felo ratings will then reflect tournament elimination behaviour instead of "fencing for the result" behaviour, but they will converge more slowly.
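
    As a sketch of that preprocessing step, the following collapses full scores to 1:0 or 0:1 before they are written to the input file. The tuple layout `(fencer_a, fencer_b, touches_a, touches_b)`, the fencer names, and the function names are hypothetical; this is not Felo's bout file format.

```python
def to_win_loss(touches_a, touches_b):
    """Collapse a bout score to (1, 0) if A won or (0, 1) if B won."""
    if touches_a == touches_b:
        raise ValueError("a tied score has no winner")
    return (1, 0) if touches_a > touches_b else (0, 1)


def collapse(bouts):
    """Replace the score of every (a, b, touches_a, touches_b) bout by 1:0 or 0:1."""
    return [(a, b, *to_win_loss(sa, sb)) for a, b, sa, sb in bouts]


if __name__ == "__main__":
    # Hypothetical example data.
    bouts = [("Anna", "Ben", 15, 14), ("Ben", "Cara", 3, 5)]
    for bout in collapse(bouts):
        print(bout)
```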

    I don't want to recommend this as the standard way in the manual, though. Or what do you think?
    Torsten Bronger, Aachen, Germany

  7. #7
    Fencing Expert oiuyt
    Join Date
    Apr 2000
    Location
    Pennsauken, NJ
    Posts
    11,887
    Quote Originally Posted by bronger View Post
    As already said, it's equivalent to count wins or losses on the one hand, and the fraction of touches won on the other hand. So the only alternative we considered was to add a "win bonus". However, this destroys the mathematical model, so you would have to prove that this still leads to consistent values. Additionally, this win bonus would be arbitrary, without an objective foundation. And last but not least, it would not change much.
    Only if touches are independent events and bout decisions are merely the result of a series of such events. Certainly not something I'd be willing to stipulate (in large part because I don't believe it to be true).

    Anecdotal and empirical evidence suggests that scores tend to be closer than relative ability would indicate in most cases -- a fencer well ahead will coast to victory rather than continue to expend energy to force a slaughter situation. Which makes sense with our tournament structure. 15-11 isn't worse than 15-5.

    To the minor extent that there is a difference (the coasting fencer coasts too far and starts risking dropping the bout), it will already be adequately covered by the few extra upsets. The somewhat less common difference (the losing fencer gives up and effectively mails in the end of the bout, allowing for a rout) doesn't noticeably change when only the eventual outcome is examined.

    Looking at tournament ranking systems one has to consider how much to adjust to the way fencing has historically viewed results. When George Masin proposed an ELO-based system about a decade ago he modified it to look at final tournament placement, rather than head-to-head comparisons based on each bout. I see two primary reasons for this decision.

    1) Requires less input data -- specifically doesn't require any more than is typically listed in the final results that pretty much every tournament currently posts/releases

    2) Matches what fencing has historically viewed as the appropriate reflection of tournament performance.

    Those both seem like very reasonable desiderata.

    Note that he might have had additional (or merely different) reasons. Those are the two that immediately spring to mind as likely arguments in favor of this method.

    -B
    "Oh but you can't expect to wield supreme executive power just because some watery tart threw a sword at you!"

  8. #8
    Senior Member jeff
    Join Date
    Nov 2002
    Location
    It's a dry heat
    Posts
    6,725
    Hmmm, now that computer power is so widely available, we can go back to those hideous TS/TR ratios we used for indicators in the bad old days, instead of the TS-TR system we use now (stands back to avoid thrown objects...)
    "In theory, theory and practice are the same, but in practice, theory and practice are different."

  9. #9
    Just Joined
    Join Date
    Dec 2006
    Location
    Aachen/Germany
    Posts
    7
    Quote Originally Posted by oiuyt View Post
    Only if touches are independent events and bout decisions are merely the result of a series of such events. Certainly not something I'd be willing to stipulate (in large part because I don't believe it to be true).

    Anecdotal and empirical evidence suggests that scores tend to be closer than relative ability would indicate in most cases -- a fencer well ahead will coast to victory rather than continue to expend energy to force a slaughter situation. Which makes sense with our tournament structure. 15-11 isn't worse than 15-5. ...
    Only a small fraction of points is affected by this, so it does not change the resulting rating significantly. (Well, I think so; I haven't yet simulated this case.) Besides, this averages out too, apart from the component created by the fact that different fencers have differently strong tendencies to give up a bout or to coast. But this is really a subtlety.

    Quote Originally Posted by oiuyt View Post
    Looking at tournament ranking systems one has to consider how much to adjust to the way fencing has historically viewed results. When George Masin proposed an ELO-based system about a decade ago he modified it to look at final tournament placement, rather than head-to-head comparisons based on each bout. ...
    Well, in the end you must define what you want to measure. There is no silver bullet. Felo measures the ability to make a hit (which includes the sometimes-not-present motivation to do so). That is not an error in the method; it is simply the quantity it estimates. This approach has the advantages of converging quickly, being flexible, and adjusting quickly if your condition or abilities change.

    If you want to measure the ability to win, you can do so as I explained above. In a future release I will add an option to switch between both approaches, so that the transition is lossless with regard to the bout data. I can only say that I changed the rules quite often while developing Felo, but the internal ranking of our fencers didn't really change.

    However, this approach is only feasible for fencers who fence very regularly. Moreover, they should mostly fence elimination bouts. In our group, at least 70% is "free fencing", where you fight for every point because a close result does count for your personal record. Here, it would really be a pity to throw away all this information.
    Torsten Bronger, Aachen, Germany

  10. #10
    That Guy Craig
    Join Date
    Dec 1999
    Location
    Atlanta, GA
    Posts
    6,330
    Blog Entries
    18
    Looks interesting. Will have to play with it some.

    Craig

  11. #11
    Senior Member eac
    Join Date
    Oct 2005
    Location
    San Francisco, CA
    Posts
    1,346
    I agree about wins being more accurate measurements of strength than scores. Under a score system, a guy who won the tournament by winning all his 6 DE's 15-12 is worse off than a guy who beat two strong people in a row 15-0 and then lost 15-14, which is probably not what we want.
    The other bothersome thing that occurred to me about an Elo system even based on victories or defeats, though I am favorably disposed to it by default, is that it rewards bad pool results. If you do badly in pools, you hit a strong fencer soon. If you have bad stamina, you might deliberately do less than perfectly in pools (because presumably pool bouts don't count as much as DE bouts) so that you can hit stronger people earlier. Not that this is always a good idea, but it seems that the motivation is not always purely to fight for every touch in pools anymore.
    All this stuff would be fixed, I think, by a results-list-based system. That makes people fight for the same thing they're fighting for now. A bout-based system might work better on people who don't care about their Elo rating, but people trying to optimize their bout-based Elo rating might produce some weird phenomena.

  12. #12
    Just Joined
    Join Date
    Dec 2006
    Location
    Aachen/Germany
    Posts
    7
    Quote Originally Posted by eac View Post
    I agree about wins being more accurate measurements of strength than scores. Under a score system, a guy who won the tournament by winning all his 6 DE's 15-12 is worse off than a guy who beat two strong people in a row 15-0 and then lost 15-14, which is probably not what we want.
    Not necessarily. Note that the strength of the opponents is taken into account, hence a 15-12 may give more points than a 15-0, depending on the opponent. So it is not more serious -- even much less serious -- than the possibility that a weaker fencer kicks out a stronger one in an elimination round.
    Quote Originally Posted by eac View Post
    The other bothersome thing that occurred to me about an Elo system even based on victories or defeats, ..., is that it rewards bad pool results. If you do badly in pools, you hit a strong fencer soon. If you have bad stamina, you might deliberately do not-perfectly in pools (because presumably pool bouts don't count as much as DE bouts)
    If pool bouts are 5-point bouts and DE bouts 10 or 15, then yes.
    Quote Originally Posted by eac View Post
    so that you can hit stronger people earlier. ...
    But what is the advantage? The bout against the good fencer doesn't offer more Felo points than the other bouts.
    Quote Originally Posted by eac View Post
    All this stuff would be fixed, I think, by a results-list-based system. That makes people fight for the same thing they're fighting for now. ...
    If the Felo ratings -- or something similar -- were used on the big scale, the fencing behaviour would very slightly shift towards "fighting for every point". But first, I don't believe this to be significant, and secondly, I don't believe this to be a bad development. Besides, elimination systems have serious drawbacks, too, and ratings based on them would adopt these drawbacks.

    However, we are far away from this anyway. Although I find it great that you discuss the Big Picture, we intended Felo to be a neat tool for everyone, including fencing clubs or college/university groups. There, you have more than tournaments. For mathematical reasons, you can't mix the methods. I still think the per-hit approach is the lesser of the two evils.

    And, after all, successfully placing the very next hit should be the predominant motivation of a fencer despite all strategy, shouldn't it?
    Last edited by bronger; 12-21-2006 at 05:51 PM. Reason: typo
    Torsten Bronger, Aachen, Germany

  13. #13
    That Guy Craig
    Join Date
    Dec 1999
    Location
    Atlanta, GA
    Posts
    6,330
    Blog Entries
    18
    Any chance of an import tool to grab some enguard files and churn them through this system?

    or

    Can you provide a template file so that I can format a spreadsheet to import? Didn't seem too clear in the instructions.

    Craig

  14. #14
    Just Joined
    Join Date
    Dec 2006
    Location
    Aachen/Germany
    Posts
    7
    Quote Originally Posted by Craig View Post
    Any chance of an import tool to grab some enguard files and churn them through this system?
    You're not the first to ask ... I'll do it over the holidays.
    Quote Originally Posted by Craig View Post
    Can you provide a template file so that I can format a spreadsheet to import? Didn't seem too clear in the instructions.
    What format is the spreadsheet data in? Does this data already exist?
    Torsten Bronger, Aachen, Germany

  15. #15
    Fencing Expert oiuyt
    Join Date
    Apr 2000
    Location
    Pennsauken, NJ
    Posts
    11,887
    It'd be interesting if you could structure it to scrape data from FRED. Perhaps run it both with the touch method and the whole-bout method. There's enough data in FRED that slow convergence shouldn't be an issue. At the very least it'd be interesting to see whether or not your theory holds that they should converge to the same numbers.

    -B
    "Oh but you can't expect to wield supreme executive power just because some watery tart threw a sword at you!"

  16. #16
    Just Joined
    Join Date
    Dec 2006
    Location
    Aachen/Germany
    Posts
    7
    Quote Originally Posted by oiuyt View Post
    It'd be interesting if you could structure it to scrape data from FRED.
    That's difficult because I see no other method but to parse the Web pages. Or is there a comprehensive download?
    Quote Originally Posted by oiuyt View Post
    Perhaps run it both with the touch method and the whole-bout method. There's enough data in FRED that slow convergence shouldn't be an issue. At the very least it'd be interesting to see whether or not your theory holds that they should converge to the same numbers.
    Well, this was wrong, at least partly. (I should have known better since this was one of my first findings.)

    As I said above, "you can't mix both methods". The reason is that the "touch method" brings the ratings closer together. The cause is quite simple: if a fencer has a 60% chance to win a point, he has a better than 80% chance to win a 15-point bout. So counting only whole bouts widens the interval that the ratings cover.
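
    The step from a per-touch probability to a whole-bout probability can be worked out with a short sketch. It assumes that touches are independent events with a fixed probability `p` (the very assumption questioned earlier in this thread) and that the bout is a race to `target` touches with no time limit or double touches. The function name `bout_win_probability` is hypothetical.

```python
from math import comb  # available in Python 3.8 and later


def bout_win_probability(p, target=15):
    """Probability of reaching `target` touches first, given per-touch chance p.

    Assumes independent touches with a fixed probability, and no time
    limit or double touches. The winner scores the final touch; the loser
    has scored k touches (0 <= k < target) somewhere before it.
    """
    q = 1.0 - p
    return sum(comb(target - 1 + k, k) * p**target * q**k for k in range(target))


if __name__ == "__main__":
    for target in (5, 15):
        print(target, bout_win_probability(0.6, target))
```

    Under these assumptions a 60% per-touch chance gives roughly 73% for a 5-touch bout and more than 80% for a 15-touch bout, so the longer the bout, the more a small per-touch edge is amplified in the win/loss record.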

    The order of the fencers remains the same, though the whole-bout list is noisier, as already explained.
    Last edited by bronger; 12-23-2006 at 06:08 AM. Reason: typo
    Torsten Bronger, Aachen, Germany

  17. #17
    Senior Member peet
    Join Date
    Jan 2003
    Location
    San Francisco
    Posts
    2,192
    Quote Originally Posted by bronger View Post
    That's difficult because I see no other method but to parse the Web pages. Or is there a comprehensive download?

    We could maybe talk about getting you an extract straight from the DB.

    PM me if you're interested.


    -p
