When I calculate error, I cap any margin of victory over 45 at 45. So as long as the expected margin was 45 or over and the team actually won by at least 45, the error is 0. This is what makes things tough: you cannot use a true error. If a team was supposed to win by 100 and wins by 45, that's really all you can do.
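Here is a minimal sketch of that capping rule in Python (the function name and structure are mine for illustration, not lifted from the actual code):

```python
CAP = 45  # margin-of-victory cap

def capped_error(expected_margin, actual_margin):
    """Error after capping both margins at CAP (hypothetical helper).

    A team expected to win by 100 that wins by 45 scores an error of 0,
    since both margins get capped before comparing.
    """
    return abs(min(expected_margin, CAP) - min(actual_margin, CAP))

# expected 100, actual 45 -> 0; expected 11, actual 24 -> 13
```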
As for @Dilla Killa asking about 1-x sims or regressions, no, I do not use either. There is literally only one input -- SCORE. You cannot get anything else. Heck, it is hard enough just getting the scores. So this becomes really more rules-based than some sort of deep learning.
Remember, I started this in 1993, and there wasn't enough power on a laptop to run anything too serious at that point, and you still only had the one input. So maybe it is best to dive into the evolution of the whole thing.
The idea came from a college roommate who played chess and taught me about the Elo rating system. Two people play chess and are rated X and Y. Based on the difference, X should win z% of games. Then, based on what actually happens, you follow a formula and adjust each player. At that time Sagarin was the only person doing ratings (in USA Today) and I followed those closely.
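For anyone who hasn't seen it, the standard Elo math looks like this (the 400 scale and K = 32 are the usual chess defaults, not numbers from my system):

```python
def elo_expected(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating, expected, actual, k=32.0):
    """Nudge a rating after a game: actual is 1 for a win, 0.5 draw, 0 loss."""
    return rating + k * (actual - expected)
```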
So I thought, there has to be an IDEAL team (not real, but a fictional perfect team). Let's make that team 100. Everyone else is somewhere between that perfect 100 and 0. This was all completely arbitrary.

Next, what do you do with the score? Well, I had collected a ton of scores and calculated that the average team wins by x with a standard deviation of y. I don't remember the actual numbers, but every year in the early years I would make sure that stayed close to true and adjust accordingly. Now if Team A beats Team B by, let's say, 24 and they were expected to win by 11, they over-performed by 13 and the other team under-performed by 13. You get the t-score of that, multiply it by the average MOV, and move each team +/- that.
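Roughly, in Python, the move looks something like this. The expected-margin mapping, the constants, and all the names are my reconstruction for illustration -- the real averages were recalibrated from collected scores every year:

```python
AVG_MOV = 11.0     # placeholder: league-average margin of victory
STDDEV_MOV = 13.0  # placeholder: its standard deviation

def adjust(rating_a, rating_b, actual_margin):
    """Move both teams after a game; actual_margin is A's score minus B's."""
    expected_margin = rating_a - rating_b               # assumed mapping
    z = (actual_margin - expected_margin) / STDDEV_MOV  # the "t-score"
    delta = z * AVG_MOV                                 # scale by average MOV
    return rating_a + delta, rating_b - delta

# Expected to win by 11, wins by 24: z = 13/13 = 1.0, so each team moves 11.
```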
Interesting side note: the original algorithm was written in Pascal on an Apple. That went away fairly quickly, and I had to move to straight C because I got a work PC and my Apple Pascal no longer worked. Then C++, then MS Basic. The last version I wrote was in Python.
OK, back to the early days. It seemed to work fine for small groups of schools, but the sport started growing and the scale had to stretch beyond 100, especially to keep teams at the bottom from gradually falling below 0. I never wanted a team to have a negative rating. I always believed in two things: every team deserves to be ranked, and no team should fall below 0. It was important for me to care about everyone (except for your school, which I obviously hate 😉)
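In code terms, that floor is just a clamp after every adjustment (again, a sketch, not the real routine):

```python
def apply_floor(rating):
    """No team goes negative; the top end is left open, since the
    scale had to stretch past the original 100 anyway."""
    return max(rating, 0.0)
```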
As the system grew, so did the need to enable larger moves. The concept of upsets was added, as well as the idea that beating a team ranked in the top 10 or 20 should maybe provide a bump of some sort. This all came together with the idea of continually re-running the system in micro bursts until it reached some sort of equilibrium (see the sketch below). Over the years I have tried different ways of scaling this. None are perfect.
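A hedged sketch of that micro-burst idea, reusing the adjust() sketch from above. The damping factor, the tolerance, and the spot where an upset or quality-win bump would slot in are all my guesses at the structure, not the actual code:

```python
def settle(ratings, games, damping=0.1, tol=1e-3, max_passes=1000):
    """ratings: dict of team -> rating; games: (team_a, team_b, margin) tuples."""
    for _ in range(max_passes):
        biggest_move = 0.0
        for a, b, margin in games:
            new_a, _ = adjust(ratings[a], ratings[b], margin)
            move = damping * (new_a - ratings[a])  # apply only a sliver per pass
            # (an upset bonus or top-10/20 bump would be added to `move` here)
            ratings[a] += move
            ratings[b] -= move
            biggest_move = max(biggest_move, abs(move))
        if biggest_move < tol:  # nothing is moving much: equilibrium
            break
    return ratings
```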
In the early years, I never worried about the error much and would wait until the season was over to tweak. Later, I would sometimes tweak a little and test for the first few weeks.
About 10 years ago I took the errors to heart and started tracking the different ones, but it came down to a question: what is most important? I went with PREDICTING THE WINNER, narrowly over MARGIN OF VICTORY, so I try to incorporate both, but in my mind picking winners is the more important thing.
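Here is how you might track both at once, reusing the capped_error() sketch from the top of the post (the structure and names are mine):

```python
def score_season(predictions):
    """predictions: (expected_margin, actual_margin) pairs, both stated
    from the predicted winner's point of view."""
    wins = sum(1 for exp, act in predictions if act > 0)  # winner picked right
    mov = sum(capped_error(exp, act) for exp, act in predictions)
    return wins / len(predictions), mov / len(predictions)

# Returns (winner accuracy, average capped MOV error) -- two numbers to
# trade off against each other when tweaking the rules.
```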