From: wjh+@andrew.cmu.edu (Fred Hansen)

Recently I offered to post a summary of the algorithms used in the American Go Association rating system. The summary below was written by Paul Matthews, author of the rating system. It comes as part of the software for the "Accelerated Pairing System," which is a practical and equitable system for pairing players in tournaments.

- - - - - - - - - - - - - -

                 INSIDE THE AGA RATINGS SYSTEM
                           7/28/90
              Paul Matthews, Princeton Go Society

INTRODUCTION

Questions about ranks and ratings, who's really stronger, and how one part of the world compares with another, probably have no once-and-for-all-time answers. Local, national, and international traditions evolve, players enter and leave active competition, the general level of go knowledge increases, and new champions appear. Yet there is a persistent interest in having some kind of measurement and recognition of playing strength. The AGA approach for many years has been to publish ratings: numbers on a continuous scale that can be equated roughly to traditional amateur ranks, but that reflect the ups and downs of competitive play.

In 1988 and 1989, the AGA ratings system was extensively overhauled. Phil Straus, Paul Matthews, Bob High, Steve Fawthrop, Laurie Sweeney, Richard Cann, Bruce Ladendorf, Nick Patterson, and others contributed mightily of their time and expertise to launch the new system. Although the initial goal was to correct logical inconsistencies that had crept into the old system, the bulk of the work turned out to be concerned with data integrity, tournament reporting practices, computer software development, and proving to each other that the new system really worked. The present article takes an inside look at the new system.

NUMERICAL SCALE

Ratings are expressed on a scale of 100 and up for dan-level players, and -100 and down for kyu-level players. Dividing a rating by 100 yields the rank equivalent; thus, 276 is a 2 dan rating, and -432 is 4 kyu. Because there is no rank between 1 kyu and 1 dan, there are no ratings between -100 and 100, which can be confusing when doing ratings arithmetic.

When a player first enters the system, his or her self-declared rank is translated to a provisional rating. For example, 6 dan is translated to 650, and 1 kyu to -149. Ratings adjust quickly, so that a new player reaches the right level in just a few tournaments, and no player's rating gets stuck; this is one of the improvements over the old system.
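
As an aside, the conversion between ratings and ranks is easy to get wrong because of the missing band between -100 and 100. Here is a minimal C sketch of the arithmetic; it is not taken from the AGA software, the function names are invented for this example, and the provisional values assume the band midpoints implied by the two examples above.

    #include <stdio.h>

    /* Rating to traditional rank: 100 and up is dan, -100 and down is
       kyu; the band between -100 and 100 is unused.  276 -> 2 dan and
       -432 -> 4 kyu, as in the text. */
    static void rating_to_rank(int rating, char *buf, size_t n)
    {
        if (rating >= 100)
            snprintf(buf, n, "%d dan", rating / 100);
        else if (rating <= -100)
            snprintf(buf, n, "%d kyu", -rating / 100);
        else
            snprintf(buf, n, "invalid");  /* no ratings in (-100, 100) */
    }

    /* Provisional rating for a self-declared rank: taken here as the
       middle of the rank's 100-point band, which reproduces the two
       published examples (6 dan -> 650, 1 kyu -> -149). */
    static int provisional_rating(int rank, int is_dan)
    {
        return is_dan ? 100 * rank + 50 : -(100 * rank + 49);
    }

    int main(void)
    {
        char buf[16];
        rating_to_rank(276, buf, sizeof buf);
        printf("276  = %s\n", buf);                        /* 2 dan */
        rating_to_rank(-432, buf, sizeof buf);
        printf("-432 = %s\n", buf);                        /* 4 kyu */
        printf("6 dan enters at %d\n", provisional_rating(6, 1));
        printf("1 kyu enters at %d\n", provisional_rating(1, 0));
        return 0;
    }
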
CREDIBILITY

Your AGA rating does not tell you precisely how strong you are. What it does tell you is how you stand relative to other players, based on your recent performance in tournaments and other rated events. Your perception of your strength is based on more games than are rated, and you may be more accurate, particularly if you have been playing at about the same level for several years. However, if your estimate differs radically from your AGA rating, say higher by as much as 200 points, then most players would agree that you have something to prove, and be quite willing to give you the chance! Discrepancies of up to 100 points are within the range of statistical error, but if your rating were chronically 100 points below your claimed rank, then you ought to reassess the strength of your play.

Be aware that many of your opponents may exaggerate their rank. In tournaments, players often enter at a higher rank to gain experience. But the ratings system sees them as they are, and consequently your victories may not gain as many rating points as you think they should, and your losses may be more serious. In the United States, about one third of the players who claim ranks between 6 kyu and 3 dan have ratings that are one or more ranks lower. However, the ratings of players below 6 kyu and above 3 dan agree remarkably well with their claimed ranks.

STATISTICAL MODEL

A statistical model is indispensable for avoiding logical inconsistencies and for doing ratings arithmetic properly. In common with the Elo system used internationally in chess, the AGA model expresses the probability of winning a game as a function of rating difference. This so-called "percentage expectancy" curve, PX, is represented as a normal probability distribution function with standard deviation px_sigma. Working backward from this assumption, it is possible to infer likely rating differences given actual game results.

One problem this approach must address is estimating a rating difference from a single game, or from any set of games where one player always wins. The mathematics of simple maximum likelihood estimation would suggest that the winning player is likely to be infinitely stronger than the loser! Given that most games are approximately evenly matched, this inference is obviously unreasonable, and it ignores the fact that we have some prior knowledge about the players.

The AGA system uses Bayesian statistical methods to solve the problem. The essential idea is to capture the notion that players are probably about the strength they say they are; the technical device is a normal probability density function, called the "rating prior," RP, centered on the player's presumed rating and with standard deviation rp_sigma. For one game, the Bayesian likelihood is of the form

    likelihood(outcome) = RP(rating1) * RP(rating2) * PX(outcome | rating1 - rating2)

At some point, the increase in PX likelihood as the estimated ratings of the two players spread apart is balanced by decreases in the players' RP likelihoods as the ratings are stretched farther from their prior presumed strengths; new ratings are defined by the balance point where the likelihood is at a maximum. The magnitude of the rating change is determined by rp_sigma, larger values allowing larger movements.

For multiple games, the RPs for all the players and the PXs for all the games are multiplied together to obtain the overall likelihood. This connects the ratings of all players together in a network of interlocking games, and improves the stability and accuracy of ratings compared with updating ratings one game at a time. The maximum Bayesian likelihood is found numerically by simultaneously adjusting all the ratings until the best (i.e., most likely) combination is found.
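
To make the balance point concrete, here is a small C sketch of the single-game likelihood. This is not the AGA implementation: the function names are invented, a brute-force grid search stands in for the real numerical optimizer, and the parameter values are the ones given in the next section.

    #include <stdio.h>
    #include <math.h>

    #define PX_SIGMA 104.0   /* spread of the percentage expectancy curve */
    #define RP_SIGMA  80.0   /* spread of the rating prior */

    /* PX: probability that the higher-rated side wins, modeled as a
       normal distribution function of the rating difference. */
    static double px_win(double diff)
    {
        return 0.5 * (1.0 + erf(diff / (PX_SIGMA * sqrt(2.0))));
    }

    /* Log of the rating prior RP: a normal density centered on the
       player's presumed rating (constant terms dropped). */
    static double log_rp(double r, double prior)
    {
        double z = (r - prior) / RP_SIGMA;
        return -0.5 * z * z;
    }

    /* Log-likelihood for one game in which player 1 beat player 2. */
    static double loglik(double r1, double r2, double p1, double p2)
    {
        return log_rp(r1, p1) + log_rp(r2, p2) + log(px_win(r1 - r2));
    }

    int main(void)
    {
        /* Two players who both enter at 200; player 1 wins one game.
           The ratings spread apart only until the priors pull back. */
        double best1 = 0.0, best2 = 0.0, best = -1e30;
        for (double r1 = 100.0; r1 <= 300.0; r1 += 1.0)
            for (double r2 = 100.0; r2 <= 300.0; r2 += 1.0) {
                double ll = loglik(r1, r2, 200.0, 200.0);
                if (ll > best) { best = ll; best1 = r1; best2 = r2; }
            }
        printf("winner: %.0f  loser: %.0f\n", best1, best2);
        return 0;
    }

With these values the search settles near 230 for the winner and 170 for the loser: a single decisive game moves each rating roughly 30 points away from its prior before the priors win out, which agrees with the 30-point average game value quoted below.
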
PARAMETER VALUES

The current values of the AGA ratings system parameters are shown in the table below. A px_sigma value of 104 implies that a player who is stronger by a full rank (i.e., 100 rating points) should win about 83% of the time; the percentage for two ranks is 97%. The value of px_sigma was chosen, based on the analysis of thousands of games, to be consistent with the model that the rating point equivalent of an n stone handicap is 100n.

RATINGS SYSTEM PARAMETER VALUES

    px_sigma = 104
    rp_sigma =  80

    Rating point equivalents of handicaps:
        50 - 10 * komi            if stones = 0
        100 * stones - 10 * komi  if 2 <= stones <= 9
        (where -20 <= komi <= 20)

Rp_sigma expresses the uncertainty associated with old ratings; in practice, rp_sigma controls the volatility of ratings. The current default value of 80 was chosen so that the average rating point value of a single game is 30, which limits the expected maximum gain in a five-round tournament to 150 rating points. Simulations showed that both large and very small values of rp_sigma work poorly, leading to severe fluctuations or stagnant ratings, respectively.

The rating point equivalent of no komi, the so-called "one stone" handicap, is significantly less than 100, a fact that was also recognized in the old ratings system. The rating point values of other komi handicaps are an interesting topic for future statistical investigation. The data currently available, much of it provided by Wayne Nelson, suggests that every point of komi compensates for about 10 rating points. Thus, since the value of the first move (i.e., taking Black) is about 50 rating points, a reverse komi of 5 1/2 points should come close to compensating for a full rank difference. (A short code sketch of these handicap equivalences appears at the end of this article.)

IMPROVING PLAYERS

Many players believe that they are growing stronger, and are annoyed if their rating lags behind their self-assessment. The default value of rp_sigma seems sufficient for routine rating adjustments; however, a rapidly improving player may play at a rank several hundred points above his or her old rating, and a boost is needed. Players who declare a rank more than 50 points higher than their rating have the mean and standard deviation parameters of their RP function increased. By adding points to the RP mean, points are added to the whole system, helping to counteract the tendency for the ratings of stable players to deflate as other players improve. The larger standard deviation allows an improving player's rating to float more freely, upward or downward, and to have less effect on the ratings of opponents. Note that a player who performs poorly when playing above his or her rating risks a larger loss of rating points.

SOFTWARE

The AGA ratings system is a suite of programs implemented (in C) for IBM PC compatible machines running DOS. The ratings system software has been extended to provide on-site support for a wide variety of handicap and championship tournaments, both small and large. Now tournament directors can generate on-the-spot ratings based on entry ranks and tournament games, and can even use the ratings to do pairings and figure out the tournament winners! These extensions are called the "Accelerated System." Significant effort is also being devoted to software that supports the verification and correction of AGA ID#s and names, preferably at the tournament site.

FUTURE WORK

The revitalized AGA ratings system is a world-class system that is a credit to the AGA and the go world. But it will never be perfect, and work continues. Phil Straus, the AGA Ratings Commission chairperson, is doing a super job of coordinating and motivating many activities relating to ratings. Some of the areas currently being addressed are: a comparison of ranks in foreign countries with AGA ratings; rating the games of professional players; and better tournament practices to improve data integrity.
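
As promised in the PARAMETER VALUES section, here is a short C sketch of the handicap table and the percentage expectancy figures. It is not the AGA code; handicap_points and px_win are names invented for this example.

    #include <stdio.h>
    #include <math.h>

    /* Rating point equivalent of a handicap, per the table above.
       stones = 0 is an even game decided by komi alone; 2..9 are
       conventional handicap stones; komi is assumed to lie in
       [-20, 20].  Returns Black's edge in rating points. */
    static double handicap_points(int stones, double komi)
    {
        if (stones == 0)
            return 50.0 - 10.0 * komi;
        if (stones >= 2 && stones <= 9)
            return 100.0 * stones - 10.0 * komi;
        return 0.0;   /* outside the table's range */
    }

    /* Percentage expectancy for a given advantage in rating points,
       with px_sigma = 104. */
    static double px_win(double advantage)
    {
        return 0.5 * (1.0 + erf(advantage / (104.0 * sqrt(2.0))));
    }

    int main(void)
    {
        /* One full rank (100 points) should win about 83% of the
           time, two ranks about 97%, as quoted in the article. */
        printf("one rank:  %.0f%%\n", 100.0 * px_win(100.0));
        printf("two ranks: %.0f%%\n", 100.0 * px_win(200.0));

        /* No komi (the "one stone" handicap) is worth about 50
           points; a reverse komi of 5.5 brings Black's edge close
           to a full rank. */
        printf("no komi:          %.0f points\n", handicap_points(0, 0.0));
        printf("reverse komi 5.5: %.1f points\n", handicap_points(0, -5.5));
        return 0;
    }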