Testing the Toy

51eleven

Active member
Granger's master's thesis, inappropriately nicknamed the Toy by I can't remember who, is a success story. He graduated and it propelled him to greater things in UT Tennis and the greater statistical world. It's been bashed since infancy and refined by the Guru over the years. It's somewhat of a crapshoot at the start of the year but gets better with each week, which is why I don't understand a spread this week.
At week four it should be getting better. I hope it's wrong. All rankings are current this week, not when prior games happened.
Strawn was 46'd by currently # 12 Knox City (lost to # 4 May by 23 last week).
Then they beat # 92 Dallas Lutheran 52-50. A surprisingly fast team, the Hounds went up 24-8 in the 1st Q.
Then came # 17 Gordon who had 45'd #53 Throckmorton and # 157 Ranger. They were 46'd by the Horns at the half. Good luck to the Horns when the going gets tough, not playing any top 20 teams yet. Hounds currently still ranked # 59.
Gorman, Strawn's opponent this week, is ranked # 23. They 45'd # 62 Lometa and # 142 Evant and beat # 69 Blum by 28.
If the current # 23 team beat # 69 by 28, should they be favored by 45 over the current # 59?
IDK, as I've told Guru before I got lost at logarithms in college math.
I still hope the Hounds might have a chance of making it a full game, maybe even a game of it.
 

Mike

Administrator
I see a lot of things in the spreads, even later in the season last year, that don't make sense to me. Gorman lost rating points last week for not taking care of business and 45ing Blum as they were picked to do and, even still, would still be picked to 45 them. Maybe The Toy isn't deducting enough rating points for teams that still win but don't cover the spread? I'm, honestly, not sure. Now that I've got the algorithm re-written in a code that I can understand, though, I am running some tests behind the scenes to see if I can get things to be more accurate on a week by week basis. So far, though, I can't get it to do much better than Granger had it set up to do already. I will keep playing and tweaking it, though, and if I come up with something that works better we'll start rolling with that.

For now all I can say is maybe The Toy is biased towards Gorman...
 

Tigerfan85

Active member
Just make sure it doesn’t put May at number one. It was a curse in 1984 when DCTFM first did sixman ratings and it was a curse last year to May again. Put us at like 26 or 27. That’d be fine.
 

granger

Founder
Just to clarify. It wasn’t a master's thesis. I wrote that thing way before graduate school (either time). The original system pegged a ‘perfect’ team around 100. It was also zero sum, meaning the total points in the system stayed consistent. There were fewer than 90 schools back then. As more schools came to six-man, the rules needed to change. Also when you end a game early, you need to set limits on movement. As the season moves on, these limits really hamper decent teams with weak schedules. Every few years tweaks need to be made. Are there weaknesses? Yes, nothing is perfect, but you try to limit the ‘error’ as much as possible for the entire system.
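For anyone trying to picture "zero sum" with "limits on movement", here is a minimal sketch. It is not the Toy's actual formula: the function name `zero_sum_update`, the scaling factor `k`, and the limit `max_move` are all made up for illustration. The only things taken from the thread are the 45-point cap and the idea that whatever one team gains, the other loses.

```python
def zero_sum_update(ratings, winner, loser, margin, k=0.1, max_move=5.0):
    """Move rating points between two teams so the system total never changes.

    k and max_move are hypothetical tuning values, not the Toy's real ones.
    """
    capped_margin = min(margin, 45)               # games end at a 45-point lead
    expected = ratings[winner] - ratings[loser]   # assumed: spread = rating gap
    move = k * (capped_margin - expected)         # beat the spread -> gain points
    move = max(-max_move, min(max_move, move))    # limit movement per game
    ratings[winner] += move
    ratings[loser] -= move                        # zero sum: the loser pays for it
    return move


ratings = {"Team A": 100.0, "Team B": 70.0}
total_before = sum(ratings.values())

# Favored by 30, wins by 45: the winner gains a little.
covered = zero_sum_update(ratings, "Team A", "Team B", margin=45)

# Same setup, but the favorite wins by only 10: the winner gives points back.
ratings2 = {"Team A": 100.0, "Team B": 70.0}
not_covered = zero_sum_update(ratings2, "Team A", "Team B", margin=10)

print(covered, not_covered, sum(ratings.values()) == total_before)
```

Under these assumptions a team that wins but falls short of its spread loses points, which is the same kind of effect described for Gorman earlier in the thread.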
 

Mike

Administrator
I think you've done a great job with it and, as of now, it's still about as good as I can get it. I've done countless tests using last year's ratings/scores and there is nothing I can do to make it any better. I've changed just about everything that can be changed, added some things, removed some things, and it's still better the way it was when I got it. I've looked at everything from weekly performance, overall performance, scoring differences, etc. Of course I won't stop there. I'll always be looking at ways to make it better, every season from here on out, but for now I'd say it's just fine the way it is.

Nothing is going to be 100% accurate in all cases, but when The Toy is picking at 85%, or better, later in the season I'd say that's pretty darn good!
 

GoBucks

Member
What does the accuracy regression look like? And when you say 85%, is that just picking winners, or closely approximating the score/spread? I know that it definitely gets better as the season goes on, and when you've only got 6 young men on the field, things can change for teams drastically from year to year based on graduations, injuries, etc.

And I will say that it has done a great job in the past, and I expect it to continue to do great and get better.

Thanks for all you do Mike (and Granger for putting in the team biases to start with ;))
 

Mike

Administrator
No, the accuracy is just picking winners. I did run numbers on how far the spread was off on games that were guessed incorrectly, though. On that, week 1 was something like 3242 points off, and week 11 was 518 points off. Since there were a lot more incorrect games in week 1 than in week 11, I had to determine how many points per game it was off. For that, week 1 was 66 points per game off and week 11 was just 37 points per game. While it'd be nice to close that gap a little, I think missing by an average of 37 points on the games it got wrong is pretty good, especially considering the number of teams/games there are in this thing and how many of the schools in here never even play similar opponents.

As for the accuracy, using last year as an example, in week 1 it was 59.84% accurate. Definitely not good, but I'm not sure this year's were any better just because, like you said, years can be so much different for teams from one to the next and it's very hard to line up all the teams. 90% of the time, teams will be just where they were to end the previous season. Even if I do know about a drastic change (losing a lot of seniors, new coach, etc) it's still just a guess. With my luck, I'll take 59%. Ha!

In 17 weeks (16 really because there were no games in week 16) one week was 92% accurate (week 10 it went 106-9). There were 4 weeks over 85%. Another two weeks over 80%. Five weeks over 75%. It's important to note, too, that week 17 was 0%. It didn't pick either of the state games accurately. Also, week 15 it went 5-2, making that week 71%. Other than those two weeks, starting in week 3 it never dipped below 75%.

I should also mention that even though it went 5-2 in week 15, making it the lowest week by accuracy since week 1, it only missed those 2 games by an average of 24.88 points.

On the season The Toy went 1075-284, accurately picking the winner in 79.1% of games. If you discount week 1, because those were my "picks" then The Toy actually went 1002-235, which is 81% accurate. Again, with the number of teams/games in the system, I think that's pretty darn good.
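The arithmetic behind those percentages can be checked in a few lines. This is only a sketch using the records quoted above; the week 1 record is not stated directly and is derived here by subtracting the "without week 1" record from the season record, and the function name `pick_accuracy` is made up.

```python
def pick_accuracy(wins, losses):
    """Percentage of games where the picked winner actually won."""
    return 100.0 * wins / (wins + losses)


season = (1075, 284)          # full-season record from the post
without_week1 = (1002, 235)   # record with week 1 discounted

# Derived, not quoted: week 1 is the difference between the two records.
week1 = (season[0] - without_week1[0], season[1] - without_week1[1])

print(week1)                                      # (73, 49)
print(round(pick_accuracy(*season), 1))           # 79.1
print(round(pick_accuracy(*without_week1), 1))    # 81.0
print(round(pick_accuracy(*week1), 2))            # 59.84

# Week 1 points off per missed game: 3242 total points over 49 missed picks.
week1_points_per_miss = 3242 / week1[1]
print(round(week1_points_per_miss, 1))            # 66.2
```

The derived week 1 record of 73-49 lines up with both the 59.84% figure and the roughly 66 points per missed game.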
 

granger

Founder
It will be bouncy at best for a few teams over the first half+ of the season. Just the way things go.

The worst is a team that is good early on then suffers from grades or injuries and becomes just a shell of themselves. Those really mess things up.

As for finding the error or regression as you say… I have always made my tweaks based on a combination of two things: whether the total adjusted points off (think about blowouts capped at 45) is minimized in a least-squares sense, AND whether re-running all games at the current level maximizes correct picks. It is always a balance.
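One way to read those two criteria is as two scores computed over the same set of games. The sketch below is an interpretation, not the actual tuning code: the `(predicted, actual)` pair layout, the function names, and the sample games are all hypothetical.

```python
def capped(margin, cap=45):
    """Clamp a signed margin to +/- cap, since games end at a 45-point lead."""
    return max(-cap, min(cap, margin))


def evaluate(games, cap=45):
    """Score a set of picks two ways.

    games: list of (predicted_margin, actual_margin), both signed from the
    same team's point of view (positive means that team wins).
    Returns (sum of squared capped errors, number of correct picks).
    """
    squared_error = sum((capped(a, cap) - capped(p, cap)) ** 2 for p, a in games)
    correct = sum(1 for p, a in games if p * a > 0)
    return squared_error, correct


# Hypothetical games: a 45 pick that ended 70-0, a missed pick, a close call.
sample = [(45, 70), (20, -70), (10, 6)]
squared_error, correct = evaluate(sample)
print(squared_error, correct)   # 4241 2
```

A tweak that lowers the first number while raising the second is an easy call. The balance comes in when a change improves one and hurts the other.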
 

GoBucks

Member
When a team wins by more than 45, and since the toy only predicts up to a 45 point win, do you take the "hit" for inaccuracy for someone blowing out a team 70-0? It's tough to predict how badly a team will get beat and how much coaches will continue to pour it on after the 45 pt. threshold has been attained.
 

Dilla Killa

Go Coyotes!
Curious, if you are willing to share, does 'the toy' run 1-x simulations / regressions and the rankings are chosen based on a best guess / mid-point of the 1-x simulations?

I ask because I do something similar in my work. Although I cannot go into too much detail, think about the way a torpedo tracks its target through the water. There are many things that can alter a trajectory (salt water vs. fresh, impact with soft and hard objects that are not mapped, prop wash, etc). It is always a 'best guess' scenario to get the torpedo back on the intended target path. Simulations / scenarios / regressions are run constantly with new inputs.

Although this is radically different subject matter, the data science and analysis is relatively similar. It is incredibly complex (depending on the number of inputs) and props to you for creating your own algos. As you indicated, the more data you have to analyze, usually the more accurate the suggested data.
 

Texas Rebel

Active member
There is no doubt that Six-Man football is a very dynamic sport characterized by constant change. A team's status can change on a dime. We all know that the injury of only 1 key player can immediately affect a team's overall performance regardless of past performance. There are many variables at play. Expect the unexpected.
 

Mike

Administrator
So, there is a cap as far as how the ratings are concerned. Beating a team by 70 does you no more good than beating a team by 45, as far as the ratings are concerned. However, if you were referring to my inaccuracy point differential, I use the full number. If Team A was picked to win by 20 and Team B won by 70, then I'd add 90 points to my point differential for that game. That said, keep in mind I only calculate games where the pick was wrong. I never thought about calculating the accuracy for games picked correctly...until now anyways.
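The Team A / Team B example works out like this. It is a sketch only: the function names are made up, margins are signed from Team A's side, and ties are ignored.

```python
def miss_points(predicted_margin, actual_margin):
    """Points off for a wrong pick, using the full (uncapped) final margin.

    Margins are signed from Team A's point of view. Returns None when the
    pick was right, since only missed picks are counted.
    """
    picked_a = predicted_margin > 0
    a_won = actual_margin > 0
    if picked_a == a_won:
        return None
    return abs(predicted_margin - actual_margin)


def rating_margin(actual_margin, cap=45):
    """Margin as the ratings see it: anything past 45 counts as 45."""
    return max(-cap, min(cap, actual_margin))


# Team A picked by 20, Team B wins by 70.
print(miss_points(20, -70))    # 90
print(rating_margin(-70))      # -45

# Team A picked by 20 and wins by 70: correct pick, so nothing is added.
print(miss_points(20, 70))     # None
```

The same 70-point blowout counts in full toward the point differential but only as 45 toward the ratings.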
 

Mike

Administrator
From what I can tell, and I certainly don't have it all down 100% just yet, each week is calculated several times. During each week's calculation, each previous week is re-calculated as well. So, if Team A beat Team B in week 1 and gained 20 points for that victory, then Team B's rating drops in week 2, the 20 points that Team A gained in week 1 will be re-calculated for the week 2 rating. That's why it's possible to see teams win a game and still lose rating points from time to time.
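Here is a rough sketch of how re-calculating earlier weeks can make a team win and still drop. It does not use the Toy's real formula. It assumes a very simple stand-in where each game is worth the opponent's current rating plus the capped margin, and a team's rating is the average of its games. The team names, ratings, and margins are hypothetical.

```python
def capped(margin, cap=45):
    """Clamp a signed margin to +/- cap."""
    return max(-cap, min(cap, margin))


def rating_after(games, opponent_ratings):
    """Re-rate a team from ALL of its games using opponents' current ratings.

    games: list of (opponent_name, signed_margin) played so far.
    opponent_ratings: each opponent's rating as of this week.
    Assumed model (not the Toy's): average of (opponent rating + capped margin).
    """
    values = [opponent_ratings[opp] + capped(margin) for opp, margin in games]
    return sum(values) / len(values)


# Hypothetical opponent ratings as of each week; Team B slides after week 2.
week1_ratings = {"Team B": 60.0, "Team C": 50.0}
week2_ratings = {"Team B": 40.0, "Team C": 50.0}

games_a = [("Team B", 20)]                        # week 1: Team A beats B by 20
after_week1 = rating_after(games_a, week1_ratings)

games_a.append(("Team C", 10))                    # week 2: Team A beats C by 10
after_week2 = rating_after(games_a, week2_ratings)

print(after_week1, after_week2)   # 80.0 60.0
```

Team A won both games, but the week 1 win over Team B is worth less once Team B's rating falls, so Team A's number goes down anyway.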

This week's games aren't all that matter to this week's rankings. What you've already done, and what your opponents have already done, play a major role too.
 