Thursday, August 30, 2012

Database Cleanup, the Wild Pitch, and the IBB

Each year I take some time to clean up our database.  Over the past few years, defense and basic stats have been checked against baseball-reference for accuracy and adjusted where necessary.

This year I am going through the same exercise, but for the first time I am including Information Only stats.  These are stats that are not strictly required to enter a player into the game.  They include:
  • For Batters: R, RBI, SH
  • For Pitchers: W, L, Sv, G, GS, CG, ShO, R, GF
Here are the critical errors found this offseason.  There were many more (hundreds of) corrections to the Information Only stats, and a handful of additional minor corrections to the required stats.  These were minor enough (an AB here, one triple there, 1/3 of an inning, etc.) that they will not have an impact on player valuation, so I didn't list them out.

Major Batter Corrections
  • 1930 Goose Goslin (overall line)
  • 1947 Harry Walker (10 more doubles)
  • 1964 Ken Boyer (10 more homers)

Major Pitcher Corrections
  • 1959 Hoyt Wilhelm (10 more K's)
  • 1983 Kent Tekulve (100 fewer batters faced; this worries me, as his Opp BA will skyrocket)
  • 1995 Jose Mesa (5 fewer HBP's)
  • 2008 Cory Wade (88 more batters faced)
  • 2009 Ted Lilly (10 more walks)
  • 2011 Antonio Bastardo (78 more batters faced)
Intentional Walks
Throughout league history I have completely ignored the IBB, mainly because baseball-reference never included the stat on its main player pages.  However, a few years ago they altered their page layout and made room for it, making data entry a lot easier for me.  Until now, I let DMB self-estimate the IBB based upon pitcher walk rate and era.  This will change going forward, and I have made retroactive updates to virtually every player.

This may impact certain relievers significantly.  Jim Brewer, for instance, walked 25 batters in 78 innings.  DMB estimated that 4 of those walks (16%) were intentional.  In truth, Brewer intentionally walked 11 batters (44%), and I expect this to have a measurable impact on how Brewer performs.  The result should be fewer unintentional walks.
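
The Brewer arithmetic can be checked with a few lines of Python.  This is only a sketch of the percentages quoted above; the function name is my own and says nothing about how DMB handles walks internally.

```python
def ibb_share(ibb, walks):
    """Return the fraction of total walks that were intentional."""
    return ibb / walks

TOTAL_WALKS = 25  # Jim Brewer's walk total from the post

for label, ibb in [("DMB estimate", 4), ("actual", 11)]:
    unintentional = TOTAL_WALKS - ibb
    print(f"{label}: {ibb} IBB = {ibb_share(ibb, TOTAL_WALKS):.0%} of walks, "
          f"{unintentional} unintentional")
```

With the real IBB count, 14 of the 25 walks are unintentional rather than 21.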

Major IBB Updates
  • +10 - 1986 Mark Eichhorn
  • +8 - 1972 Jim Brewer
  • +7 - 1963 Dick Radatz
  • +6 - 1983 Steve Howe, 1987 Dave Smith, 1966 Phil Regan, 2002 Chris Hammond, 1969 Fritz Peterson
  • +5 - 2008 Chad Bradford, 1984 Willie Hernandez
  • -5 - 1978 Ron Guidry, 1982 Mario Soto, 1999 Randy Johnson, 2009 Tim Lincecum, 1968 Denny McLain, 1971 Wilbur Wood, 1971 Tom Seaver, 1968 Dave McNally, 2005 Andy Pettitte
  • -6 - 1979 J.R. Richard, 1997 Al Leiter, 1985 Sid Fernandez, 1991 Nolan Ryan, 2009 Felix Hernandez, 1972 Don Sutton, 2005 Chris Carpenter
  • -7 - 1961 Whitey Ford, 1971 Vida Blue, 1995 Hideo Nomo
  • -8 - 1965 Sam McDowell, 1987 Nolan Ryan, 1998 Kerry Wood, 2008 Tim Lincecum, 2007 Chris Young

Wild Pitch Changes
Wild pitches have historically been one of the "Information Only" stats in our league.  This upcoming year, however, I have decided to implement the wild pitch rating system DMB employs:
This number indicates how often a pitcher throws a wild pitch when there are runners on base. The wild pitch rating tends to range from 0 to 60 with an average of 15. Use the formula:

  rating = (wild pitches * 1000) / (batters faced * .43)

For example, if a pitcher threw four wild pitches in a season in which he faced 1000 batters, his rating is 9. Why .43? Because about 43 percent of batters faced occur with runners on base, though this number rises and falls over time and will vary for individual pitchers.
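
As a quick sketch, the quoted formula translates directly into Python.  The function name and the rounding to a whole number are my assumptions; the DMB text only gives the formula and the rounded example.

```python
RUNNERS_ON_SHARE = 0.43  # share of batters faced with runners on, per the quoted text

def wild_pitch_rating(wild_pitches, batters_faced):
    """Wild pitches per 1000 batters faced with runners on base (rounded, by assumption)."""
    return round(wild_pitches * 1000 / (batters_faced * RUNNERS_ON_SHARE))

# The worked example from the quote: 4 wild pitches, 1000 batters faced.
print(wild_pitch_rating(4, 1000))
```
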
Using that formula, the worst offenders will be:

80 - 1996 Ruffin,Bruce
79 - 2006 Rodriguez,Francisco
77 - 2005 Turnbow,Derrick
76 - 1890 Neale,Joe
70 - 2011 Holland,Greg
68 - 1998 Hoffman,Trevor
66 - 1998 Gordon,Tom
64 - 2002 Eischen,Joey
63 - 2002 Romero,J.C.
62 - 1981 Ryan,Nolan
60 - 1986 Murphy,Rob
59 - 1993 Bedrosian,Steve
58 - 1885 Ramsey,Toad
58 - 1993 Ward,Duane
57 - 1995 Holmes,Darren
57 - 1995 Nomo,Hideo
56 - 1998 Brocail,Doug
54 - 1999 Rocker,John
53 - 1967 Niekro,Phil
52 - 1872 Spalding,Al
51 - 2011 Robertson,David
50 - 1989 Davis,Mark
50 - 1989 Russell,Jeff
48 - 2008 Rodriguez,Francisco
48 - 2006 Reyes,Dennys
47 - 1993 Wetteland,John
47 - 1995 Mesa,Jose
46 - 1881 Whitney,Jim
45 - 2000 Nen,Robb
45 - 2003 Guardado,Eddie
44 - 1992 Guzman,Juan
44 - 2006 Liriano,Francisco
44 - 2008 Buchholz,Taylor
43 - 2010 Feliz,Neftali
43 - 2009 Bailey,Andrew
43 - 1970 Richert,Pete
43 - 1965 Wilhelm,Hoyt
43 - 2008 Lincecum,Tim
42 - 1991 Fassero,Jeff
42 - 2010 Jimenez,Ubaldo
41 - 2011 Bastardo,Antonio
41 - 2004 Nathan,Joe
41 - 2007 Marmol,Carlos
41 - 2001 Fox,Chad
40 - 2009 Hernandez,Felix
40 - 2008 Marmol,Carlos
40 - 1972 Marshall,Mike
40 - 1977 Sutter,Bruce

While they are mostly relievers, there are a few great pitchers on that list nonetheless.  If I am interpreting the formula correctly, this means that Bruce Ruffin will throw 80 wild pitches for every 1000 batters he faces WITH RUNNERS ON BASE.

I don't expect the need for draft strategy changes.  We are talking about wild pitches now occurring between about 1% and 8% of the time with runners on base (a rating of 10 to 80 per 1000) instead of virtually never.
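
One way to sanity-check the scale: the rating counts wild pitches per 1000 batters faced with runners on, so dividing by 10 gives a percentage.  This is a minimal sketch; the function name is mine, not DMB's.

```python
def rating_to_percent(rating):
    """Convert a per-1000 wild pitch rating into a percentage of batters faced with runners on."""
    return rating / 10

# 15 is the average rating quoted by DMB; 40 and 80 bracket the list above.
for rating in (15, 40, 80):
    print(f"rating {rating}: about {rating_to_percent(rating):.1f}% of batters faced with runners on")
```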

Wednesday, August 22, 2012

Further Evidence the 2000s Stink for Hitters

(Not that we needed any more).

I added Mike Trout and Andrew McCutchen to the system and ran 3 sims.  This was using last year's database in neutral parks.

.263 / .312 / .338, 10 HR

.283 / .321 / .338, 10 HR

Investigating the DMB Era Bias

Not sure why I hadn’t thought of this before, but this morning it struck me that Diamond Mind Baseball must handle certain eras better than others.  Further, it shouldn’t be difficult to identify those eras and adjust our drafting strategies accordingly, at least in terms of which new players to take a chance on.

DMB generally regresses all players to some sort of theoretical average.  For instance, Babe Ruth doesn’t maintain his real life 1.378 OPS, he regresses towards a league average and ends up with a 1.000 OPS.  He is still the best batter in both Real Life and the world of DMB, it’s just that DMB accounts for a reduction in quality because Ruth is facing stiff competition every day. 

We do not know how DMB does this, and more importantly, we don’t know which stats DMB values over others or how those stats are compared to the league averages.  However, we can hopefully get an idea of which eras give the DMB engine more fits than others.

As it turns out, DMB is a harsh mistress.  Out of 499 qualified batting seasons, not one player saw his OPS improve on his real life figure.  Overall, DMB reduces a player's OPS by an average of 25%.

1884 Candy Nelson was impacted the least, with his OPS dropping just 2 percent from .682 to .665.  However, with a real life OPS that low he is a true oddity and not representative of the typical player drafted in ATB.  1884 Orator Shaffer was next on the list; he is more of a typical player and saw ‘only’ a 10% reduction in OPS (.900 down to .813).

In third is 1876 Ross Barnes and a trend has emerged. The three players who hold up the best are all from the 1800s and incredibly, out of the top 20, only two were outside the dead ball era (pre 1919).

There is a similar trend at the opposite end of the spectrum, though not as pronounced.  While 1977 Carlton Fisk easily takes the biggest hit, with an OPS drop of 41% (.922 down to .541) and 1930 Hack Wilson is in second, with a 38% drop (1.174 down to .729), the current era players are at an extreme disadvantage.  Out of the 12 most impacted players, 9 are from 1999 or later.
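
The percentages quoted so far can be reproduced from the before and after OPS figures.  A small sketch, assuming the drop is measured relative to the real life OPS; the function name is mine.

```python
def ops_drop_pct(real_ops, dmb_ops):
    """Percent decline from real life OPS to the OPS produced in DMB."""
    return (real_ops - dmb_ops) / real_ops * 100

# (real life OPS, DMB OPS) pairs taken from the post.
examples = {
    "1884 Candy Nelson": (0.682, 0.665),
    "1884 Orator Shaffer": (0.900, 0.813),
    "1930 Hack Wilson": (1.174, 0.729),
    "1977 Carlton Fisk": (0.922, 0.541),
}

for player, (real, sim) in examples.items():
    print(f"{player}: {ops_drop_pct(real, sim):.0f}% drop")
```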

Below is a timeline for the batters, with the percent difference in OPS plotted against the player year.  The dots represent the actual average OPS difference for each individual year, and the blue line is the 5-year rolling average trend.
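
For anyone who wants to rebuild a trend line like this, here is a rough sketch of a rolling average over a five-season window.  The post does not say whether the line is trailing or centered, or how years without qualifiers are handled, so the trailing window and the skipping of missing years are my assumptions.  The yearly figures are the 1906-1917 numbers listed further down in this post.

```python
# Yearly average OPS drop (percent), from the 1906-1917 list in this post.
yearly_drop = {1906: 16, 1908: 15, 1909: 13, 1910: 19, 1911: 22,
               1912: 23, 1914: 19, 1915: 18, 1917: 14}

def rolling_average(by_year, window=5):
    """Trailing average over the last `window` calendar years, skipping years with no data."""
    result = {}
    for year in sorted(by_year):
        values = [v for y, v in by_year.items() if year - window < y <= year]
        result[year] = sum(values) / len(values)
    return result

trend = rolling_average(yearly_drop)
for year, value in trend.items():
    print(year, round(value, 1))
```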

The 1906-1917 sweet spot is obvious.  During this period batters averaged just an 18% dip in OPS, and that drops to 16% when excluding 1911 and 1912.  

Within this group there are some gems.  Three all-time greats (Honus Wagner, Ty Cobb, and Tris Speaker) highlight this cast, but even as a whole this time period outperforms every other.  These three averaged a 14.5% drop in OPS, which is stellar, but twenty others also saw OPS reductions of less than 20%, and every single player in the study performed better than the -25% historical average.

I mentioned the two outlying years during this time frame, 1911 and 1912.  Nine players had enough at bats to qualify, collectively averaging a 23% drop in OPS.  To put that in perspective, a .900 OPS reduces to .756 with a 16% decline, and down to .693 with the 23% decline of 1911-1912.

Here are the exact numbers by year:
-16% - 1906
-15% - 1908
-13% - 1909
-19% - 1910
-22% - 1911
-23% - 1912
-19% - 1914
-18% - 1915
-14% - 1917
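
As a rough check, a simple unweighted average of these yearly figures lands on the 18% and 16% quoted earlier.  This is a sketch only: the original averages may have been weighted by the number of players in each year, and the function name is mine.

```python
# Yearly average OPS drop (percent), copied from the list above.
yearly_drop = {1906: 16, 1908: 15, 1909: 13, 1910: 19, 1911: 22,
               1912: 23, 1914: 19, 1915: 18, 1917: 14}

def mean_drop(by_year, exclude=()):
    """Unweighted mean of the yearly drops, leaving out any years in `exclude`."""
    values = [v for y, v in by_year.items() if y not in exclude]
    return sum(values) / len(values)

print(round(mean_drop(yearly_drop)))                        # all listed years
print(round(mean_drop(yearly_drop, exclude=(1911, 1912))))  # without 1911 and 1912
```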

Excepting 1909, the trend in this 12-year period resembles a normal distribution, with the OPS reduction peaking in 1911 and 1912.  Ty Cobb and Tris Speaker, along with Heinie Zimmerman, Sam Crawford, and Joe Jackson, all appear in this two-year period.  These are good players, but due to the specific yearly averages of the league, DMB treats them harshly.

The conclusion?  Take a flyer on some new players during this 1906-1917 period, but steer clear of 1910-1915, and specifically 1911 and 1912.

While there are a few other peaks and valleys, three periods specifically need to be mentioned.

  • After a relatively lackluster 1930s, the war years 1943-1948 are generally kind to batters.  Phil Cavarretta, Stan Spence, Stan Hack, Mickey Vernon, Dixie Walker, Stan Musial and more all had respectable declines in OPS.  The exception is 1947, which stands out as being particularly brutal, with Ralph Kiner and Johnny Mize combining to average a 30% decline in OPS.
  • The 1950s through mid 1980s are relatively stable, with two minor exceptions.  The late 1960s/early 1970s have a noticeable drop thanks to Johnny Bench, Jim Wynn, Frank Howard, Rico Petrocelli, Harmon Killebrew and, to a lesser extent, Roberto Clemente.  The late 70s/early 80s are also periods to avoid, with Dwight Evans, Mike Schmidt, Gary Carter, Toby Harrah, and Davey Lopes all struggling.
  • Finally, it appears the worst period of all is our current era of the 2000s.  At first glance, I chalked this up to a selection bias.  We are collectively most familiar with players from our own era and naturally gravitate towards selecting them on draft day.  Indeed, a case can be made that players such as Edgar Renteria (-33%), Rich Aurilia (-32%), JD Drew (-32%), Jason Varitek (-30%) and others don’t belong in ATB to begin with.  However, true greats from our era also struggle:
-35% - 2002 Alex Rodriguez
-34% - 2002 Vladimir Guerrero
-34% - 2006 Miguel Cabrera
-33% - 2007 Hanley Ramirez
-33% - 2007 Jimmy Rollins
-31% - 2006 Albert Pujols
-31% - Sammy Sosa

The list goes on and on.  In fact, only 6 of 70 players come in better than the historical ATB average of a 25% reduction of OPS.

What should we do with this information?  Frankly, the majority of us will completely disregard it, and that may not be the worst idea.  I would not suggest that anyone avoid any player in particular because of this study.  However, if you are looking to take a flyer on one of two players who appear to be equal, take the one from the dead ball era and avoid the one from our current era.