Cricket Statistics

Just curious how many stats nerds we have out there, and where you get your data from. Do you use a web interface like Statsguru, download cubes from other websites, or compile scorecards manually to create your own database?

I have always wanted access to a reliable and comprehensive source of first-class match data (not necessarily going back decades). If anyone can recommend one I'd be very appreciative. I don't really mind if it's a paid service.
Excel
 
I'm surprised standard deviation isn't talked about more to get some sort of 'risk adjusted average'

I presume that is partly because the distribution of innings is not normal?
Partly, but probably mostly because it is a neutral measure without context. Most cricket stats are easy - higher is better than lower, or the reverse. You can therefore compare players etc. in a fairly straightforward way.

Standard deviation treats 'bad' deviation as the same as 'good' deviation so it becomes very difficult to interpret as a measure of consistency. For a batsman averaging 50, a double ton does nine times as much 'damage' to your SD-measured consistency as a duck (150 runs from the mean against 50, squared), but nobody would argue that it makes you a worse batsman.

It's a useful measure to look at (indeed I think Cricinfo have blogged about it a few times) but I think its interpretation is way too subjective and contextual for it to ever become a mainstream measure.
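
A quick way to check the double-ton-versus-duck arithmetic, as a rough sketch only. It assumes the batsman's mean stays fixed at 50 and that 'damage' means the squared deviation that feeds into the variance; the function name is made up for illustration.

```python
MEAN = 50  # assumed career average, treated as fixed for this illustration


def squared_deviation(score, mean=MEAN):
    """Contribution of a single innings to the variance numerator."""
    return (score - mean) ** 2


duck = squared_deviation(0)          # (0 - 50) squared
double_ton = squared_deviation(200)  # (200 - 50) squared

# Ratio of the two contributions. SD has no idea which innings was the "good" one.
ratio = double_ton / duck
print(duck, double_ton, ratio)
```

On those assumptions the double ton adds nine times as much to the variance as the duck (three times as much if you only look at the unsquared distance from the mean).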
 

I'd argue though that a batsman who scores 0 200 0 0 is less valuable than one who scores 100 0 100 0?
 
Perhaps, but what about 50 50 50 50 vs 100 0 100 0? Or 100 0 50 50? The point is that the value of consistency (SD) is highly contextual in cricket, whilst the value of runs (average) is relatively absolute.

It's hard to use contextual measures in large-scale comparisons, so while they are useful for shedding light they are difficult to use as a measuring stick.
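
To put numbers on the sequences being thrown around, here is a small sketch using Python's standard `statistics` module. It assumes every innings ended in a dismissal (so the batting average is just the plain mean) and uses the population standard deviation; both are simplifications for illustration.

```python
from statistics import mean, pstdev

# The four-innings sequences from the discussion. Every innings is treated
# as a dismissal, so the batting average equals the plain mean.
sequences = {
    "0 200 0 0": [0, 200, 0, 0],
    "100 0 100 0": [100, 0, 100, 0],
    "50 50 50 50": [50, 50, 50, 50],
    "100 0 50 50": [100, 0, 50, 50],
}

# (average, population SD rounded to one decimal) for each sequence
summary = {
    label: (mean(scores), round(pstdev(scores), 1))
    for label, scores in sequences.items()
}

for label, (avg, sd) in summary.items():
    print(f"{label:>12}: average {avg}, SD {sd}")
```

All four average 50, while the SD ranges from 0 up to about 87, which is the point: the average can't separate them and the SD can't say which spread is the more valuable one.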
 
What about medians? Obviously in this example it's not useful, but I really like it for real stats.
 
Surprised to see Pujara at No. 2 in the Test batting rankings. I guess it's just because India have played all their recent games at home.

Root and Williamson are terrific batsmen all around the world, unlike Pujara.
 
Going through some early-2000s stats from the Shield, you forget how strong that era was.

* Bevan was unlucky not to play more Test cricket. He didn't set the world on fire in his 18 Tests but then went on to average 58 in first-class cricket with 68 tons. Was a very handy bowler too.

Basically everything Australia would want in a #6 currently.
 
Yeah, there were heaps of guys back in the 2000s who were never going to get anywhere near the Test side who would be Best XI nowadays. For example, James Brayshaw could probably be a regular in this side based on his stats.
 
Bevan kept getting bounced out in his Test career. He was a bit of a liability by the end.
 
Yeah I always tell people to look at Bevan's Test career when they go on about how great a player Dussey would have been.
 
Stuart Law scored 27,000 runs @ 50.5 with 79 centuries at FC level.
Darren Lehmann scored nearly 26,000 @ 57.8 with 82.
Even Podge has 17,000 @ 48.8 with 51.

When you look down the list of guys with great Shield batting records who were around in the 90s and 2000s (Hodge, Elliott, Love, Katich, Langer, Hussey, Rogers, Blewett etc.), it's amazing to think that they all battled for years to get a chance at Test level and, other than Hussey, battled to stay in the side. Compare that to our current top 6: the most deserving on FC form is probably Handscomb, who's played 70 matches and averages 42. Marsh is the most experienced but was pretty underwhelming for the first half of his career. It's not a whole lot different with the bowlers. O'Keefe is the only one who has earned his spot the old-fashioned way. Haze and Starc were fast-tracked, Cummins has played about 4 Shield games ever and Lyon is a groundsman.

What also stands out is how many Shield games (let alone other FC cricket) even guys who played a lot of Test cricket played. Allan Border played 108 Shield matches to go with 156 Tests, 385 FC matches overall and 382 List A games. That's 5 years and 8 months dedicated to playing cricket (though not every Test went to 5 days, FC match to 4, etc.).
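
The "5 years and 8 months" figure can be reproduced roughly like this. The match counts are the ones quoted above; the day lengths (5 per Test, 4 per other first-class match, 1 per List A game) and the assumption that every match went the distance are mine, so treat it as an upper-bound sketch.

```python
# Figures quoted in the post above.
tests = 156
first_class_total = 385  # first-class matches overall, Tests included
list_a = 382

# Assumptions: every Test lasts 5 days, every other first-class match
# lasts 4 days, and every List A match takes 1 day.
other_first_class = first_class_total - tests
total_days = tests * 5 + other_first_class * 4 + list_a * 1

# Convert to years and (average-length) months.
years, leftover_days = divmod(total_days, 365)
months = leftover_days / (365 / 12)
print(total_days, years, round(months, 1))
```

That comes out at 2,078 playing days, or a bit over 5 years and 8 months.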

There are only a handful of current players who've played 100 Shield games and none of them have had long Test careers.
 

What about medians? Obviously in this example it's not useful, but I really like it for real stats.
Medians are a bit difficult in cricket because of the relatively small number of innings that a batsman plays. As they're independent of all other items in the series they are not always representative unless you have a large population.

e.g. if you have a median score of 38 and I have a median score of 34, does this actually mean you're more consistent or is it just down to the vagaries of where our 'middling' scores happened to fall?

It's good to add context to an average but it doesn't tell you a lot on its own.
 
It's also a statistic that penalises batsmen who don't get out, so it will generally benefit the top order - players who usually get a chance to complete their innings.
 
But surely you could just add that innings to the next one to get a median ... obviously not perfect, but neither is calculating not outs in averages!
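
One way the "add it to the next one" idea could be sketched. The scores below are made up, `merged_scores` is a hypothetical helper, and keeping a trailing not out as its own score is just one possible convention.

```python
from statistics import median


def merged_scores(innings):
    """Roll each not-out score into the following innings.

    `innings` is a chronological list of (runs, not_out) tuples.
    A not out at the very end, with nothing to merge into, is kept as is.
    """
    merged = []
    carried = 0
    for runs, not_out in innings:
        carried += runs
        if not not_out:
            merged.append(carried)
            carried = 0
    if innings and innings[-1][1]:
        merged.append(carried)
    return merged


# Made-up scores for illustration: (runs, not_out)
sample = [(12, False), (45, True), (30, False), (0, False),
          (60, True), (8, False), (101, False)]

raw_median = median(runs for runs, _ in sample)
merged_median = median(merged_scores(sample))
print(merged_scores(sample), raw_median, merged_median)
```

With these scores the raw median is 30 and the merged median is 68, so the choice of convention matters a lot for anyone with plenty of not outs.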
 
Surprised to see Pujara at No. 2 in the Test batting rankings. I guess it's just because India have played all their recent games at home.

Root and Williamson are terrific batsmen all around the world, unlike Pujara.

Pujara is only No. 2 because the other batsmen aren't going that great. 861 is historically a pretty average rating.

The rankings are done almost purely on strength of opposition, which is why Smith's ranking has suddenly skyrocketed: he is facing Ashwin and Jadeja, who are around the 900 mark for bowling, which is historically pretty high.
 
So I ended up writing a little bit of VBA code that iterates through a range of URLs on the Cricinfo website and scrapes the data (based on HTML table reference) into CSV files which I can then load into SQL Server. The formatting is still a bit screwy but I'll fix it at some stage in the next couple of weeks. If anyone is interested in replicating it, I based the code mostly on the below:

www.stackoverflow.com/questions/8798260/html-parsing-of-cricinfo-scorecards

It's a bit of a cumbersome solution, so if anyone comes across actual live data feeds for FC matches I would still be interested.
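
For anyone who'd rather see the general shape of the table-to-CSV step, here is a minimal sketch in Python using only the standard library. It is not the VBA code described above: the sample HTML is invented and says nothing about real Cricinfo markup, fetching the pages is left out entirely, nested tables aren't handled, and `TableExtractor` and `table_to_csv` are hypothetical names.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect cell text from every <table>, one list of rows per table."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None


def table_to_csv(html, table_index=0):
    """Return the table at `table_index` (0-based) as CSV text."""
    parser = TableExtractor()
    parser.feed(html)
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.tables[table_index])
    return buffer.getvalue()


# Invented scorecard fragment, purely for illustration.
SAMPLE_HTML = """
<table>
  <tr><th>Batsman</th><th>Runs</th><th>Balls</th></tr>
  <tr><td>Player A</td><td>104</td><td>187</td></tr>
  <tr><td>Player B</td><td>0</td><td>3</td></tr>
</table>
"""

print(table_to_csv(SAMPLE_HTML))
```

Picking the table by index is the rough equivalent of the "HTML table reference" approach; the CSV text can then be written to a file and bulk-loaded into whatever database you use.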

I did a similar thing with Python (dunno why you'd do it in VBA, but I guess you use what you know). It worked pretty well, but in reality Statsguru does 99% of the stuff you want. Statsguru is essentially a SQL query front end.

Cricinfo don't seem to shut down people smashing their website, which is surprising. What Statsguru doesn't give you is ball-by-ball stats and some of their more exotic stats (like in control), so harvesting that information would be pretty useful.
 
Cause I don't code Python and R is too sluggish. VBA is easy, it's just a scrape.

Statsguru doesn't give you First Class stats.
 
Kept getting bounced out in his test career. He was a bit of a liability by the end
And yet bringing in the bouncer to ODI cricket didn't affect him all that much. (It did a little.)
I always felt Bevan's short-ball problem was as much psychological as technical. And he scored plenty of FC runs on Australian tracks against good fast bowling. Just something at Test level: once he got out a couple of times like that it seemed to play on his mind, slowing him down and creating a technical problem that wasn't there against the same bowling in other circumstances.
 
Partly, but probably mostly because it is a neutral measure without context. Most cricket stats are easy - higher is better than lower, or the reverse. You can therefore compare players etc. in a fairly straightforward way.

Standard deviation treats 'bad' deviation as the same as 'good' deviation so it becomes very difficult to interpret as a measure of consistency. For a batsman averaging 50, a double ton does nine times as much 'damage' to your SD-measured consistency as a duck (150 runs from the mean against 50, squared), but nobody would argue that it makes you a worse batsman.

It's a useful measure to look at (indeed I think Cricinfo have blogged about it a few times) but I think its interpretation is way too subjective and contextual for it to ever become a mainstream measure.
True, although an outlying high score also lifts your mean more than a low score reduces it, especially a high not out. The two don't balance each other, but some counteraction is there.
As with any statistics, they still need to be interpreted and that leaves meaning very much open to the interpreter.
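
A rough illustration of that asymmetry, using a made-up batsman who has 500 runs from 10 dismissals (so averages 50). The numbers and names are hypothetical; the only rule relied on is the usual one that a batting average is runs divided by dismissals.

```python
def batting_average(runs, dismissals):
    """Standard batting average: total runs divided by times dismissed."""
    return runs / dismissals


# Hypothetical career so far: 500 runs from 10 dismissals.
runs, dismissals = 500, 10
base = batting_average(runs, dismissals)

# Effect of one more innings on the average, rounded to two decimals.
changes = {
    "duck": round(batting_average(runs, dismissals + 1) - base, 2),
    "200": round(batting_average(runs + 200, dismissals + 1) - base, 2),
    "200 not out": round(batting_average(runs + 200, dismissals) - base, 2),
}
print(base, changes)
```

For this batsman the duck costs about 4.5 runs of average, the double ton adds about 13.6, and a double ton not out adds 20.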
 
Sure. My point was just that medians are often not directly comparable in small datasets.
 
Not worth its own thread, but thought I'd share CricHQ in case anyone hasn't seen it: https://www.crichq.com/

It lets you look at players' First-Class stats for specific sides (so you can see their average only in First-Class games for Queensland, without County/Test stats, and the like). Successor to CricketArchive, which is dead and entombed behind a paywall.
 
