Resource AFL Fantasy spreadsheet - 2022

  • Thread starter
  • Moderator
  • #26
This is an awesome discussion gents; well done. I love Fantasy and I'm learning data science via R, so this is amazing on both fronts. Jivlain, have you compared the correlation of expected prices to actual from the Footywire website? They have some price predictions (predicted changes at n+1, n+2, n+3); I'm looking at some of their data on a weekly basis, but I haven't bothered to check whether their predictions are accurate.

Loving your work! Thanks!

I have all the actual prices already - I have a scraper for that - and calculate the factors with a system of simultaneous equations using last week’s price and the last k rounds as coefficients on one side and the current actual prices on the other. The resulting correlation of estimated price to actual price is really close - R squared of 0.999 or so.

There is some mysterious factor of last year’s performance in the prices according to the rules - and there clearly are residuals that should be explainable - but so far neither my eyeballs nor the simultaneous equations have found any correlation between the residuals and last year’s performance.

Given the already tight correlation it must be pretty small however.

Either way, seeing as it is apparently tied to last year’s performance - I figure it doesn’t change much. Plus I’ve noticed that there is a correlation between last week’s residuals and this one’s. So my “predicted prices” are actually not the output of my formula.

Instead, I estimate the current price again based on my formula. I then estimate the next week’s price using my formula and predicted score. And then I add the difference between them to the current price and that is my predicted price - I may not know the mysterious factor but that should control for it pretty neatly.

So I reckon the next big step forward would be to make my score estimation better - obviously it’ll remain limited and could never have guessed what Telly was about to do :)
 
  • Thread starter
  • Moderator
  • #27
AFL Fantasy spreadsheet (round 11)

The Fantasy Freako newsletter, from which I get the breakevens, is showing the BEs for the sides about to have their bye as 0. Which makes a kind of sense I guess but also means if you're sorting by stuff they get in the way. So for the teams having a bye this round, I've instead provided an estimated breakeven, based on the following formula:
be_formula.png

Which I really just included to make me look fancy and mathematical, but it's more or less the pricing formula, except that I set the next price to be equal to the current price (that's what a breakeven is, after all), and then rearranged the terms until L0 (the score for the upcoming round) was alone on one side.

It's close enough for the other players who don't have the bye coming up for me to trust that I've basically done the algebra correctly. I've included the scraped breakeven for them anyway.
 
Jul 1, 2013
21,866
18,248
AFL Club
Fremantle
Other Teams
Fulham, 76ers, Cubs, Bills
Some analysis this week:

Last month of scores, relative to average. Excluding players under 400k.

GOOD FORM (last month average 15+ points worse than season average)
Name - Pos - Price - BE - Games - Average
Dayne Zorko Mid 729000 90 11 96.5
Alex Neal-bullen Fwd 547000 37 11 78.8
Tom Rockliff Mid/Fwd 539000 90 7 68.9
Matthew Scharenberg Def 569000 70 10 79.9
Jack Sinclair Mid 567000 62 9 81.6
Angus Brayshaw Def/Mid 655000 39 8 98.9
Ryan Burton Def 442000 33 10 58
Ben Ainsworth Fwd 441000 34 10 61.5
Luke Ryan Def 593000 75 11 85.2
Michael Walters Mid/Fwd 575000 66 9 90.6
Adam Saad Def 416000 36 11 57.1
Jake Melksham Fwd 515000 75 11 70.5
George Hewett Mid/Fwd 483000 61 11 65
James Frawley Def 401000 40 9 56.3
Bayley Fritsch Mid/Fwd 534000 42 10 79.4
Jake Lloyd Def 694000 105 11 98.3
Tom Langdon Def 540000 71 8 77.8

Not a lot of surprises here, many of the popular names for upgrades over the next month including Zorko, Rocky, Gus Brayshaw, Walters and Lloyd.
Interesting to see some other names not considered as widely including Neal-Bullen, Burton, Sinclair, Langdon and Melksham, all of which may represent useful stepping stones. Having said that none of these are actually averaging over 80, so their month is better but is it worthwhile overall?

BAD FORM (last month average 15+ points worse than season average)

Name - Pos - Price - BE - Games - Average
Michael Hurley Def 595000 136 9 88
David Swallow Mid 574000 111 10 90.9
Brandan Parfitt Mid/Fwd 554000 97 7 86.1
Robbie Gray Mid/Fwd 629000 137 9 97
Luke Brown Def 425000 104 10 65.9
Jared Polec Mid 627000 138 10 95.5
Jack Riewoldt Fwd 483000 134 11 73.6
Sam Jacobs Ruc 524000 104 11 82.4
Aaron Young Fwd 437000 96 10 66.3
Paul Seedsman Def/Mid 603000 125 10 93.2

With Dustin Martin at -14.9

Interesting names, including many who were thought to be good upgrade targets over the bye month but have fallen by the wayside, including Hurley, Gray, Polec, Seedsman and of course Dusty.
 
Jul 1, 2013
21,866
18,248
AFL Club
Fremantle
Other Teams
Fulham, 76ers, Cubs, Bills
CONSISTENCY
Excluding those with season average <55, and those priced <400k
These are the players who have scored more than their season average each of the past 4 rounds:

Player - Pos - Price - BE
Jack Crisp Def 675000 76
Dayne Zorko Mid 729000 90
Alex Neal-bullen Fwd 547000 37
Matthew Scharenberg Def 569000 70
Ryan Burton Def 442000 33
Adam Saad Def 416000 36
George Hewett Mid/Fwd 483000 61
Hugh Greenwood Mid 620000 88
Liam Duggan Def 467000 63
Oliver Florent Fwd 401000 54
Brodie Grundy Ruc 818000 117
Will Hayward Fwd 451000 66
Isaac Heeney Mid/Fwd 647000 95

Additionally Ainsworth (3/3 of his last month which includes a bye).

Compared to last list, Hewett, Hayward stay on, whilst bigger names like Redden, Darling, Macrae fell off the list

Some overlap between consistency and form, and if you're looking for players on both, you have:
Zorko
ANB
Shaz
Burton
Saad
Hewett
 

Rydo

Rydo
Aug 3, 2009
8
2
AFL Club
Brisbane Lions

Hey Jivlain, thanks for the data. I am learning R, so I'd thought I'd use it to give processing a PDF a go. I'm trying to extract freako PDF data, but I'm having trouble with the spaces when there are missing scores. My problem occurs when there's no score for (round-2) and/or (round-1), as the round score stacks left and I can't determine where the score should be placed in the three rounds. Any tips for an R newbie? Thanks!
 
  • Thread starter
  • Moderator
  • #33
Jivlain, thanks for the data. I am learning R, so I'd thought I'd use it to give processing a PDF a go. I'm trying to extract freako PDF data, but I'm having trouble with the spaces when there are missing scores. My problem occurs when there's no score for (round-2) and/or (round-1), as the round score stacks

Yeah, scraping the Freako PDF is an... interesting challenge. I used PDFxStream, the table parser didn't really work for it (at least when I first wrote my scraper). So instead its VisualOutputTarget to maintain the spacing and then I have a... very interesting... algorithm to partition it into columns. The first several iterations of it were rather unreliable but now it's pretty consistent. However the code is also absolutely horrific and if it breaks again I may well just scrap the thing.

For a whole other thing (posting stats graphs of the selected teams on my Twitter), I now scrape the team sheets too. So what I'd probably do now, were I implementing it now, is just scrape what the actual teams were from the AFL website and figure out where the blanks are supposed to be from that.

It's C# but perhaps you can figure out the logic.

Code:
using AFL.Entities;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

namespace AFL.Teams
{
    public static class TeamScraper
    {
        private static Regex extractTeamCode = new Regex("team-([a-z]{2,3})");
        private static Regex extractPlayerName = new Regex(@"^\s*([^,]+),?\s*");

        public static IEnumerable<GraphSquad> ScrapeTeams(int round)
        {
            var doc = Html.Load("http://www.afl.com.au/news/teams?round=" + round).DocumentNode;
            var teams = GraphTeam.Load();
            var sections = doc.SelectNodes("//div[@id='tteamtext']/div[@class='game']/div[@class='lineup']/div[contains(@class, 'text-inouts')]");
            foreach (var team in sections)
            {
                var teamCode = team.GetAttributeValue("class", "");
                var positions = team.SelectNodes("div[contains(@class, 'posGroup')]");
                var squad = new GraphSquad { Code = teamCode, Team = teams[teamCode] };
                foreach (var position in positions)
                {
                    if (position.SelectNodes("p[@class='pos']/abbr") == null)
                        continue;
                    var posCode = position.SelectSingleNode("p[@class='pos']/abbr").InnerText;
                    if (string.Equals(posCode, "Emg", StringComparison.InvariantCultureIgnoreCase))
                        continue;
                    var players = position.SelectNodes("ul/li[not(contains(@class, 'note'))]").Select(node => extractPlayerName.Match(node.InnerText).Groups[1].Value);
                    squad.Players.AddRange(players.Select(p => new GraphPlayer(Player.GetKey(teamCode, p))));
                }
                yield return squad;
            }
        }
    }
}
 
Yeah, scraping the Freako PDF is an... interesting challenge. I used PDFxStream, the table parser didn't really work for it (at least when I first wrote my scraper). So instead its VisualOutputTarget to maintain the spacing and then I have a... very interesting... algorithm to partition it into columns. The first several iterations of it were rather unreliable but now it's pretty consistent. However the code is also absolutely horrific and if it breaks again I may well just scrap the thing.

For a whole other thing (posting stats graphs of the selected teams on my Twitter), I now scrape the team sheets too. So what I'd probably do now, were I implementing it now, is just scrape what the actual teams were from the AFL website and figure out where the blanks are supposed to be from that.

It's C# but perhaps you can figure out the logic.

Code:
using AFL.Entities;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

namespace AFL.Teams
{
    public static class TeamScraper
    {
        private static Regex extractTeamCode = new Regex("team-([a-z]{2,3})");
        private static Regex extractPlayerName = new Regex(@"^\s*([^,]+),?\s*");

        public static IEnumerable<GraphSquad> ScrapeTeams(int round)
        {
            var doc = Html.Load("http://www.afl.com.au/news/teams?round=" + round).DocumentNode;
            var teams = GraphTeam.Load();
            var sections = doc.SelectNodes("//div[@id='tteamtext']/div[@class='game']/div[@class='lineup']/div[contains(@class, 'text-inouts')]");
            foreach (var team in sections)
            {
                var teamCode = team.GetAttributeValue("class", "");
                var positions = team.SelectNodes("div[contains(@class, 'posGroup')]");
                var squad = new GraphSquad { Code = teamCode, Team = teams[teamCode] };
                foreach (var position in positions)
                {
                    if (position.SelectNodes("p[@class='pos']/abbr") == null)
                        continue;
                    var posCode = position.SelectSingleNode("p[@class='pos']/abbr").InnerText;
                    if (string.Equals(posCode, "Emg", StringComparison.InvariantCultureIgnoreCase))
                        continue;
                    var players = position.SelectNodes("ul/li[not(contains(@class, 'note'))]").Select(node => extractPlayerName.Match(node.InnerText).Groups[1].Value);
                    squad.Players.AddRange(players.Select(p => new GraphPlayer(Player.GetKey(teamCode, p))));
                }
                yield return squad;
            }
        }
    }
}
Your lack of braces around the if statements bothers me! ;)
 

Rydo

Rydo
Aug 3, 2009
8
2
AFL Club
Brisbane Lions
Yeah, scraping the Freako PDF is an... interesting challenge. I used PDFxStream, the table parser didn't really work for it (at least when I first wrote my scraper). So instead its VisualOutputTarget to maintain the spacing and then I have a... very interesting... algorithm to partition it into columns. The first several iterations of it were rather unreliable but now it's pretty consistent. However the code is also absolutely horrific and if it breaks again I may well just scrap the thing.

For a whole other thing (posting stats graphs of the selected teams on my Twitter), I now scrape the team sheets too. So what I'd probably do now, were I implementing it now, is just scrape what the actual teams were from the AFL website and figure out where the blanks are supposed to be from that.

Thanks Jivlain. I tried the tabulizer package...unfortunately it keeps missing some of the tables, otherwise it would have worked a bit better. I have a pretty reliable scraper in C# for a couple of good websites that give me everything I need, including BEs. I was just hoping to merge my R learning with my AF addiction ;).

I'm thinking its best to stick with C# scraping into a DB and then use R for some analysis by querying the db...and Freako as a source isn't the best option.
 

rimb

Team Captain
Jul 5, 2011
316
242
AFL Club
Geelong
Surely it would be easier to scrape data from the dtlive dataview?

I know there are occasional inadvertent 0s when there are late outs but they're pretty easy to spot.
 

Rydo

Rydo
Aug 3, 2009
8
2
AFL Club
Brisbane Lions
Surely it would be easier to scrape data from the dtlive dataview?

I know there are occasional inadvertent 0s when there are late outs but they're pretty easy to spot.

Yep; I have reliable C# scraper code for footy wire and dtlive. I was just trying to find a way to solve in R via PDF...more for the exercise of it really.
 
  • Thread starter
  • Moderator
  • #39
Surely it would be easier to scrape data from the dtlive dataview?

I know there are occasional inadvertent 0s when there are late outs but they're pretty easy to spot.

That'd work too I guess ¯\_(ツ)_/¯

For the late out thing you could just have it print out an alert when there's 23 players in a team and you know you've got something to review. They got that 0 for the bye breakevens thing going on too though.
 

Rydo

Rydo
Aug 3, 2009
8
2
AFL Club
Brisbane Lions
I think I worked the Freako PDF format out. I was using string functions to separate the string bits via space tokens. Using this method, generally everything is fine up until BE, but the gaps after that were screwing me. I now understand the PDF format uses a fixed line length format of 82 chars; this falls to 75 if rnd score is missing, 71 if both rnd and rnd-1 scores are missing, and 67 chars if all three rnd-2, rnd-1 and rnd scores are missing (ie player hasn't played last 3 games). I'm pretty sure I can now use this pattern to drag out what I need in the right order.

Again, I agree there's better sources of data (eg footy wire), where the data can be bulk scraped. But its a good exercise in trying to extract data from a PDF.
 
  • Thread starter
  • Moderator
  • #41
I think I worked the Freako PDF format out. I was using string functions to separate the string bits via space tokens. Using this method, generally everything is fine up until BE, but the gaps after that were screwing me. I now understand the PDF format uses a fixed line length format of 82 chars; this falls to 75 if rnd score is missing, 71 if both rnd and rnd-1 scores are missing, and 67 chars if all three rnd-2, rnd-1 and rnd scores are missing (ie player hasn't played last 3 games). I'm pretty sure I can now use this pattern to drag out what I need in the right order.

Again, I agree there's better sources of data (eg footy wire), where the data can be bulk scraped. But its a good exercise in trying to extract data from a PDF.

Careful, they have been known to shuffle things around a bunch. Not sure how stable they've been this year but certainly a couple years ago they were throwing things all over the place. Plus a period when they kept changing their mind about how/whether to abbreviate McDonald-Tipungwuti.
 
Last edited:

Rydo

Rydo
Aug 3, 2009
8
2
AFL Club
Brisbane Lions
Careful, they have been known to shuffle things around a bunch. Not sure how stable they've been this year but certainly a couple years ago they were throwing things all over the place. Plus a period when they kept changing their mind about how/whether to abbreviate McDonald--Tupungwiti.

No worries; I'm going to run my script against all the Freakos and see what happens. The first two are exceptions, as there aren't 3 rounds of data, but the rest should be ok.

This is a code snippet for R if anyone is interested. There's some looping, searching, data cleaning/converting, etc that needs to happen as well, but you get the gist...

Code:
path <- setPath(rd)
document <- Corpus(URISource(path), readerControl = list(reader = read))
doc <- content(document[[1]])
corpus.array <- content(document[[1]])
text <- pdf_text(path)
text2 <- strsplit(text, "\n")

...

# name # substr(text2[[6]], start = 1, stop = 18)
# pos # substr(text2[[6]], start = 19, stop = 25)
# price # substr(text2[[6]], start = 26, stop = 34)
# tog # substr(text2[[6]], start = 35, stop = 41)
# ppm # substr(text2[[6]], start = 42, stop = 49)
# mt # substr(text2[[6]], start = 50, stop = 55)
# avg # substr(text2[[6]], start = 56, stop = 62)
# be # substr(text2[[6]], start = 63, stop = 66)
# rnd-2 # substr(text2[[6]], start = 68, stop = 70)
# rnd-1 # substr(text2[[6]], start = 71, stop = 74)
# rnd # substr(text2[[6]], start = 75, stop = 82)

[/CODE
 

Rydo

Rydo
Aug 3, 2009
8
2
AFL Club
Brisbane Lions
Careful, they have been known to shuffle things around a bunch. Not sure how stable they've been this year but certainly a couple years ago they were throwing things all over the place. Plus a period when they kept changing their mind about how/whether to abbreviate McDonald-Tipungwuti.
You were absolutely right; the name and position columns weren't consistent...they kept shifting with the varied name lengths, but everything is constant from the $ column to the right. I had to get creative with the [stringr] library, but I got it done. I've now got a solid data frame from rnds 1 - 13 with all of the freako parameters. Just have to deal with Matt de Boer...his two part last name needs special attention.

R / RStudio is a pretty sweet tool for anyone looking to do data analytics.
 

Rydo

Rydo
Aug 3, 2009
8
2
AFL Club
Brisbane Lions
You were absolutely right; the name and position columns weren't consistent...they kept shifting with the varied name lengths, but everything is constant from the $ column to the right. I had to get creative with the [stringr] library, but I got it done. I've now got a solid data frame from rnds 1 - 13 with all of the freako parameters. Just have to deal with Matt de Boer...his two part last name needs special attention.

R / RStudio is a pretty sweet tool for anyone looking to do data analytics.

If anyone is interested, here's the Freako data scraped from PDF using R / RStudio, up to R13, via Google Sheets.

 
Last edited:
  • Thread starter
  • Moderator
  • #45
AFL Fantasy spreadsheet (round 14)

I added a new column for player availability so you know whether they're selected or injured. Will be updated at discretion, but at least you'll know if they're injured. And maybe whether or not they're selected.
 
Back