Tournament Director and ChatGPT combine to rank my league players!!

I do find this interesting. I’m not so concerned with whether or not AI is being used to solve for this. I am intrigued by the ranking piece, and how it may differ from using league points.
Very cool - my original purpose for posting.
If I may ramble…

Clearly, each league season has players that outperform others. Players earn points, and the points paint a picture of who is doing well and who is not. Do season standings tell you whether one player is better than another though? Maybe as a snapshot, but not over the long run. That seems to be what you are trying to answer here.
Indeed!
Point standings are objective and show a person’s position in a given season.

Rankings are subjective and evaluate players based on perceived strength and/or ability.
Depending on how you do the ranking, I guess. A ranking system that isn't objective is indeed worthless. I specifically tried to make the rankings as objective as possible. This is why the calculation starts with the points each player earned in each game. For me, the trick was getting this info out of the TD .tdt files and boiled down to one row per game per player in my ranking data set. Chatty did this grunt work for me. Then I asked her to suggest random spot checks, and I'm happy to report that the resulting data set completely passed them. I was quite satisfied that she had boiled down the per-game .tdt files correctly. She may (or may not) suck for other things, but she aced this task!
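A minimal sketch of that kind of random spot check, in Python. It assumes the extracted data is already a list of (game_id, player, points) rows. The row layout, the sample values, and the helper names pick_spot_check_games and rows_for_games are hypothetical, not anything TD or the .tdt format provides. It only picks games and lists their rows so they can be compared by hand against the original tournament files.

```python
import random

# Hypothetical per-game, per-player rows: (game_id, player, points).
# Real rows would come from the extracted tournament data; these are made up.
rows = [
    ("S7-G01", "Alice", 10.0),
    ("S7-G01", "Bob", 6.0),
    ("S7-G02", "Alice", 4.0),
    ("S7-G02", "Cara", 9.0),
    ("S7-G03", "Bob", 8.0),
]


def pick_spot_check_games(rows, k, seed=None):
    """Return up to k randomly chosen game ids, sorted for easy lookup."""
    game_ids = sorted({game_id for game_id, _player, _points in rows})
    rng = random.Random(seed)  # a fixed seed makes the check repeatable
    return sorted(rng.sample(game_ids, min(k, len(game_ids))))


def rows_for_games(rows, game_ids):
    """Return only the rows that belong to the chosen games."""
    wanted = set(game_ids)
    return [row for row in rows if row[0] in wanted]


chosen = pick_spot_check_games(rows, k=2, seed=1)
for game_id, player, points in rows_for_games(rows, chosen):
    print(game_id, player, points)
```

Picking the games with a seeded generator rather than by eye keeps the sample honest, and the same seed reproduces the same sample if the check needs to be rerun.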

Rankings also allow us to evaluate players who may not have played against each other in different seasons. Is the Season 7 champion a better player than the Season 8 champion?
My ranking doesn't do this, which is a shortcoming that I can live with. That said, I'm sure Chatty could help me address it. I'm just not inclined to go there!
Why can’t the GPI formula or World Golf Ranking formula be applied to a home league?
It could be, but for reasons shared in a different post here it isn't as well suited to league play as the ranking system we came up with. Chatty and I had a long conversation about this. Would you like me to post snippets of it here?
When looking at the results, do you tend to agree with what they show?
My subjective perceptions of who the strongest players are match pretty well with the rankings.
Like the College Football Playoff rankings, it’s time for the talking heads to argue every angle about which players should be above/below another.
My rankings won't get published. I actually asked Chatty to analyze the pros and cons of publishing them. This too was a fascinating conversation. The analysis persuaded me not to publish. One of the cons it gave was that the rankings would be open to distracting debate. Another was that publishing doesn't serve the goals of the league, which are to get into championships! The season standings I publish completely handle this goal.

Thanks for your interest!
HK
 
GIGO. I don’t understand how anyone can trust anything but 4th grade math solutions that any current AI spits out. (OP situation fits the bill)
But that’s me. I’d do my own research to find out the best way to grow tomatoes or put air in my tires before I’d trust anything point blank an AI engine told me. And if you have to check on its responses….well maybe you just should have done the work.
I use AI to tell me how to grow everything!
 
For my understanding, what ranking are you creating that is different than your points formula?

I had this question as well. If your points system is solid, the existing rankings should be pretty accurate as far as evaluating raw performance — at least, for those players with a large sample size over many games.

First let me say that I do think LLMs (AIs) can do many amazing things, and this type of task may be one of them—as it is data-driven and not terribly subjective. It does not require much if any reflective judgement, mainly just data crunching and pattern recognition. I have found bots like this to be very good at straightforward tasks, for example finding where something went wrong in a website’s code.

If it also can absorb from poker literature what kind of variance to expect in tournaments of the size you are hosting, that would probably be a big plus. It might be worth prompting it to take a look at any available articles or studies of tournament variance and results. (Even extraordinarily talented pros can bust without cashing in a tourney. I remember hearing a podcast a few years ago where a well-known successful pro said he once played 30 tournaments in a row without seeing any profit.)

That said, how much you glean from the results is likely to depend a lot on what types of questions and follow-up questions you ask to make sure it is looking deeply enough into the data, and not generating off-point or simplistic conclusions.

............................

FWIW, about a decade ago I used to handle all the record-keeping for a friend’s two-table tournament which generally had 15-18 players every week. I amassed about three years of data in a spreadsheet on a player pool of about three dozen players, about half of whom attended between half and 90% of games. The rest were one-off guests, or more occasional participants. (We had an end-of-year special event for a bigger prize, where starting stacks were based on performance during the year, which was the purpose of the record-keeping.)

This was long before A.I., but I recall being able to glean a fair amount of insight into the player pool just by looking at and sorting the spreadsheet in various ways. In addition to our points system, I had fields for stuff like number of 1st/2nd/3rd/4th/5th place cashes, number of total cashes, earnings/losses, rebuy frequency, etc., both in total and per appearance. I came up with formulas to estimate stuff like how high people generally placed, normalizing the numbers based on number of players (since something like a 7th place finish in a field of 15 is not quite the same as in a field of 18). This was also interesting in terms of who had the fewest cashes/earnings per appearance.
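One simple way to do that kind of field-size normalization, sketched in Python. The post does not give the actual spreadsheet formulas, so the linear scaling and the function name normalized_finish are assumptions chosen only to illustrate the idea: 1.0 for a win, 0.0 for last place, whatever the field size.

```python
def normalized_finish(place, field_size):
    """Map a finish to the range 0..1, where 1.0 is first and 0.0 is last.

    Assumed linear scaling: (field_size - place) / (field_size - 1).
    """
    if field_size < 2 or not 1 <= place <= field_size:
        raise ValueError("need field_size >= 2 and 1 <= place <= field_size")
    return (field_size - place) / (field_size - 1)


print(round(normalized_finish(7, 15), 3))  # 7th of 15 -> 0.571
print(round(normalized_finish(7, 18), 3))  # 7th of 18 -> 0.647
```

Under this scaling the same 7th place scores higher in the 18-player field because more players were outlasted, which is the distinction the spreadsheet formulas were after.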

Somewhat more intriguing were anomalies like the player who had a lot of top 3 finishes, but who also had a lot of early bust-outs. This of course was a guy who played a high-variance, gambly game. Then there were others who had above average overall rankings, but relatively poor earnings. These were nittier players who tended to go deeper into the game than many of the weaker or more wild players, but who rarely finished 1st or even in the top 3 places, because they did not play aggressively enough in the middle and later stages of the tournaments.

I would think an LLM might be able to suss out more complex insights, such as who appears to be overperforming vs. underperforming; or, whose win rate has the highest long-term quality as opposed to just the biggest numbers. Among those who have fewer data points than your most frequent regs, the LLM might be able to offer insight into who is likely to climb up the standings as they play more often. And so forth.

Long message, I know... Basic point is that if you nudge Chatty to look harder and make finer distinctions beyond just “who is the best player,” you might really get some good insights out of it. But I would remember to double-check and be skeptical of all answers, because the bot does not really know what poker tournaments are and is only going to be as smart as your prompts.
 
I had this question as well. If your points system is solid, the existing rankings should be pretty accurate as far as evaluating raw performance — at least, for those players with a large sample size over many games.
The only "rankings" available in TD that span multiple seasons are total cumulative points which, from player to player, can't be compared apples to apples because everyone has played a different number of games. The extra-curricular ranking system Chatty and I developed appropriately adjusts for this difference.
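The thread does not spell out the formula itself, so the following Python sketch is not HK's method. It only shows one common way to adjust for unequal games played: rank by average points per game, with an optional minimum-games cutoff. The row layout, the sample data, and the min_games parameter are all assumptions.

```python
from collections import defaultdict

# Hypothetical per-game, per-player rows: (game_id, player, points).
rows = [
    ("G1", "Alice", 10.0), ("G1", "Bob", 6.0),
    ("G2", "Alice", 4.0), ("G2", "Bob", 8.0),
    ("G3", "Bob", 5.0), ("G3", "Cara", 9.0),
]


def points_per_game(rows, min_games=1):
    """Rank players by average points per game played, best first."""
    totals = defaultdict(float)
    games = defaultdict(int)
    for _game_id, player, points in rows:
        totals[player] += points
        games[player] += 1
    averages = {
        player: totals[player] / games[player]
        for player in totals
        if games[player] >= min_games  # drop players with too small a sample
    }
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


for player, avg in points_per_game(rows):
    print(player, round(avg, 2))
```

In this made-up data Bob leads on cumulative points (19 from three games) but ranks last per game, while Cara's single game puts her on top. That is why some kind of minimum-games cutoff, or another small-sample adjustment, usually goes along with a per-game average.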
FWIW, about a decade ago I used to handle all the record-keeping for a friend’s two-table tournament which generally had 15-18 players every week. I amassed about three years of data in a spreadsheet on a player pool of about three dozen players, about half of whom attended between half and 90% of games.
This is precisely where TD lets me down, this per game, per player data set. I had to use Chatty to extract this data set from the individual tournament files!
Long message, I know... Basic point is that if you nudge Chatty to look harder and make finer distinctions beyond just “who is the best player,” you might really get some good insights out of it. But I would remember to double-check and be skeptical of all answers, because the bot does not really know what poker tournaments are and is only going to be as smart as your prompts.
Thanks for this. I especially enjoy double checking her! She is actually very helpful with this. Example...once she created the condensed data set I asked her for 10 random games so I could check her work. She obliged. She passed this sanity check perfectly.
 
could you give me an example of a bias in this context?
If it adds 2 in error, it should add the 2 in all cases. Where it can be an issue is if it multiplies something and says there is a null value, the player hadn't joined the league, or missed that session. If it converts those nulls to 0, the player is penalized: the missed games count as zero-point results in their average instead of being excluded, because the nulls weren't handled correctly. The inverse is also a problem: treating a genuine 0 as null.
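A small Python sketch of the null-versus-zero problem described above. It assumes a missed or not-yet-joined game is stored as None and a game played for no points is stored as 0. The function names and the sample results are made up for illustration.

```python
def average_points(results):
    """Average over games actually played; None means the player did not play."""
    played = [points for points in results if points is not None]
    if not played:
        return None  # no games played, so there is no average to report
    return sum(played) / len(played)


def average_points_nulls_as_zero(results):
    """Buggy variant: missed games are counted as 0-point results.

    Assumes results is non-empty.
    """
    return sum(0 if points is None else points for points in results) / len(results)


# One player's season: played three games (scoring 8, 0 and 4), missed two.
results = [8, None, 0, 4, None]

print(average_points(results))                # 4.0
print(average_points_nulls_as_zero(results))  # 2.4
```

The genuine 0 stays in the average and the two missed sessions stay out of it. Collapsing either one into the other changes the player's number, and it does so unevenly, because players miss different numbers of games.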
 
Where it can be an issue is if it multiplies something and says there is a null value, the player hadn't joined the league, or missed that session.
The age old null value issue, the bane of database designers' existences!!

So Chatty and I working together caught the instances of this. For example, I had numerous tournaments where a player was in the player list but had no buyins because they didn't show up for the game after saying they were coming. She referred to this as an "edge condition". I told her these players could be completely ignored. She included some of these games in her suggested random spot check list!

I definitely had to have my DBA hat on during the whole process. I also had to have my league director hat on, asking what I was really trying to accomplish, so that when she gave me options I could ask good follow-up questions and ultimately choose what fit my goals best.

It was truly enjoyable.

HiveKueen
 
@HiveKueen - one question on your formula: does it account for recency of performance, or just total games played?

Thinking about whether a player who crushed it 2 years ago but has gone cold should rank the same as someone currently on a heater.
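To make the question concrete, here is a minimal Python sketch of one way recency could be weighted: an exponentially decaying weight per game. This is not HK's formula, which the thread does not describe. The half_life_games parameter, the (games_ago, points) layout, and the sample numbers are all assumptions.

```python
def recency_weighted_average(points_by_age, half_life_games=20):
    """Weighted average of points where older games count for less.

    points_by_age: list of (games_ago, points); games_ago=0 is the latest game.
    A game half_life_games ago gets half the weight of the latest game.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for games_ago, points in points_by_age:
        weight = 0.5 ** (games_ago / half_life_games)
        weighted_sum += weight * points
        total_weight += weight
    return weighted_sum / total_weight if total_weight else None


gone_cold = [(40, 10), (0, 2)]   # big result 40 games ago, poor one recently
on_heater = [(40, 2), (0, 10)]   # the same two results in the opposite order

print(recency_weighted_average(gone_cold))  # 3.6
print(recency_weighted_average(on_heater))  # 8.4
```

Both players have the same plain average of 6.0, so a formula that only counts total games played would rank them equally. The decay weight is what separates them.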

I've been using Claude for something adjacent - generating individual player scorecards after each event (trends, head-to-head records, path to qualification). Different goal than all-time rankings, but similar data extraction challenge.

Happy to compare notes on prompts/workflows if useful.
 
I am cautious of AI. I will use it to see how it would approach a solution, or to find basic syntax, but I would not trust it to do complex calculations. I would say a poker league could go either way. The AI should treat all the players the same, so if there was a bias, at least it should be applied evenly.

I use caution with pretty much anyone and anything which claims to be good at something, let alone expert, until they’ve proven themselves... Or itself.

As such, I don’t actually see how interacting with A.I. has to be significantly different for fairly ordinary *technical* questions than when dealing with a human.

You know... humans. Who sometimes do stuff like make mistakes, overestimate/exaggerate their own expertise, make shit up to cover for their shortcomings, have trouble focusing on the task at hand/get distracted/go off on tangents, and have even been known to flat out lie to another person‘s face.

My experience with ChatGPT, Claude et al. is that these services are actually pretty good, and sometimes shockingly good, at technical tasks. I’ve used them for stuff like revamping and error-checking a problematic website which started malfunctioning after being ported to a new platform. It also walked me through how to use Terminal on my Mac to make bulk changes to the site’s code so that I didn’t have to fix literally thousands of pages. Five years ago, I would have done a ton of reading, visited some tech message boards, maybe asked a friend, or eventually figured it out myself... But A.I. helped me get it done at least 95% faster.

I’ve also found that for fairly robust statistical analysis, it is pretty adept at doing stuff which is either tedious for me to do myself, or at the fringes of my actual expertise.

As with human work products, anyone who plans to actually rely on A.I. results ought to apply routine due diligence and common sense to its suggestions before implementing any major change, fix or new plan.

Sure, face-to-face with other humans we like to imagine that we have the skills and intuition and time to help spot someone who is not in fact competent, or bullshitting, or sloppy, etc. But can any of us truly say that we have never made a bad evaluation of another’s work or honesty or competence?

When you ask someone else to help you with... almost anything, take some basic precautions. Example: My hot water heater has been performing poorly for several months. I had three guys from the firm who installed it show up and basically say “We dunno, we fiddled with some settings, see if it improves.” Didn’t improve. Eventually had to go to their office and track down the boss... who now thinks the whole thing needs replacing. Oh, and he casually mentioned there’s been a massive recall on my furnace for almost an entire year (deaths from CO leaks) which no one alerted me to, and which I only found out about because I happened to catch him getting into his truck that day. Nice guy... Generally does very good work... Was too busy to let me know about the [checks notes] RISK TO MY LIFE. Honestly if I were to set up an A.I. to monitor recalls on stuff I own, I’d probably be safer.

In general, the knee-jerk scoffing at/dismissal of anything related to A.I. doesn’t strike me as any more or less valid than A.I. output itself. It’s a human reaction to a fear of being replaced, or shown up, or undermined by technology. The results are obviously only going to be as good as the ability of the operator providing the data, the instructions and the follow-up to implement them wisely. Like with everything else, all the time.
 