AI can now beat multi-player Hold’em

Online Poker........

 
“The bot learned to pick its moments and then make huge bets and bluffs — bigger than most humans would make. That's what Elias found when he played against it.

"The bot was not afraid to make these kind of plays often," he says. "Which is something that humans could probably do a little more." Elias says he's starting to incorporate bigger bets into his own game.

The bot was excellent at varying its strategy even when dealt the exact same hand, Elias says, "which is pretty tough to play against because you can't really pick up a pattern."”
 
I was assuming the amount of computing power for this was going to be significant... I was wrong:

Unlike systems that can master three-dimensional video games like Dota and StarCraft — systems that need weeks or even months to train to play against humans — Pluribus trained for only about eight days on a fairly ordinary computer at a cost of about $150. The hard part was creating the detailed algorithm that analyzed the results of each decision. “We’re not using much computing power,” Mr. Brown said. “We can cope with hidden information in a very particular way.”
 
Just number-based data for the algo, so I guess not that surprising? But still sad for us lol
 
The "algorithm" part would barely use any fixed numbers if it works with machine learning. The only fixed parts directly set by a human would be the function that calculates which of two plays was the better one, which might operate on some constants, and the rules of the game taught to the bot - what is allowed and what isn't.

The AI, however, plays a giant number of hands in order to learn from the statistics it gathers. If it bluffs in a particular situation and gets caught, hence loses money, it will still try the same in a couple more similar hands. If the bluff is caught in 4 out of 10 cases, it will "learn" that in the long run the bluff is still profitable and hence should be played, but perhaps not as often as other bluffs that turned out to have a higher success rate.
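That long-run frequency logic can be put in numbers. A minimal sketch, with made-up pot and bet sizes (none of these figures come from the paper):

```python
# Minimal sketch of the bluff-frequency logic above, using made-up
# pot and bet sizes (these numbers are illustrative assumptions).
pot = 100        # chips already in the middle
bluff_bet = 60   # size of the bluff

def bluff_ev(fold_rate):
    """Expected chips won per bluff attempt: win the pot when the
    opponent folds, lose the bet when called (assume the bluff never
    wins at showdown)."""
    return fold_rate * pot - (1 - fold_rate) * bluff_bet

# Caught 4 out of 10 times -> opponent folds 60% of the time:
print(bluff_ev(0.6))                  # still clearly profitable
# The break-even fold rate is bet / (pot + bet):
print(bluff_bet / (pot + bluff_bet))  # 0.375
```

So a bluff that gets snapped off 40% of the time is still well above its break-even fold rate of 37.5% here, which is exactly the "keep playing it, just less often" behavior described above.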

The effective approach of this bot seems very similar to the very complex and prep-intensive strategy Ed Miller presents in his book "Poker's 1%". Just applied in the absolute perfect way, better than any human could manage even after doing all the preparation, and with much more intel, gathered accurately from statistics rather than guessed at (villain ranges).
 
Make the robot play the robot and learn against the other. That'll fuck 'em up.


Pluribus succeeds because it can very efficiently handle the challenges of a game with both hidden information and more than two players. It uses self-play to teach itself how to win, with no examples or guidance on strategy.
https://ai.facebook.com/blog/pluribus-first-ai-to-beat-pros-in-6-player-poker/


Some samples.
 
I think I kind of get how that second hand river reraise makes sense, but can someone more knowledgeable explain why that would be the appropriate response?

I think I understand it saying:
X% of hands are weak and will fold to the reraise no matter what,
X% will have a better hand/kicker but still fold out of fear of the large reraise (putting the raiser on two pair or trips?),
X% will actually have the better hand and call, but I'm assuming the AI would calculate the odds of the opponent having trips as lower than actual based on betting pattern/sizing, and obviously would also calculate in the probability they have two pair or a higher kicker, and
X% will call the crazy high reraise with Q6/8/10, and that small sliver of calls makes the reraise (obviously calculated as a specific probability-weighted $$$ amount) worth it based on the number crunching, even with the small probability of a call and loss. PLUS it provides cover when the AI semi-bluffs (again?) in the future.

It's amazing how it makes sense when you see it, but if I (granted, total amateur) saw it, I'd think the player was bonkers, just looking to gamble.
 
I’d like to see Pluribus win $1000/hr against the nits at my local casino. :D
 
I'd like to see some hands where the bot hits the board, but subsequently folds. Is it a calling station?
 
Might be able to get that for you...

The data is attached to a paper written by the guys here: https://science.sciencemag.org/content/early/2019/07/10/science.aay2400/tab-figures-data

Their HH is not in the usual format for us poker players... but with some light parsing it should be possible...

This contains the hands played in the 5 humans + 1 AI set of experiments involving Pluribus. This document describes the format of this data.

Each file contains a series of lines that look like this:

STATE:7:fr225fffc/cr475c/cr1225c/cc:5hJc|Jd9h|6s5c|Ah7h|2s2d|3hTs/3sJh2h/Tc/Ks:-50|1275|0|-1225|0|0:Gogo|Budd|Eddie|Bill|Pluribus|MrWhite

STATE means this line is a hand of poker

7 is the index of the hand in this session for this table

fr225fffc/cr475c/cr1225c/cc is the sequence of actions that have occurred in this hand. '/' signifies the end of a betting round, 'f' means fold, 'c' means call, 'r' means raise. The number following an 'r' is the total number of chips that player has in the pot after the raise (including money from all prior betting rounds in this hand).

5hJc|Jd9h|6s5c|Ah7h|2s2d|3hTs/3sJh2h/Tc/Ks shows the cards that were dealt to the players and the public board cards that were revealed. Each card is represented by two characters: the first showing the rank and the second showing the suit. 2 through 9 are possible ranks of the cards. Additionally, T = 10, J = Jack, Q = Queen, K = King, and A = Ace. 's', 'h', 'd', 'c' are the possible suits of the cards. The private cards dealt to the players are shown as pairs of cards with '|' separating the players. The '/' signifies the start of the public board cards. The first three cards after the first '/' are the flop cards (revealed on the second betting round). The card after the next '/' is the turn card (revealed on the third betting round). The card after the last '/' is the river card (revealed on the fourth betting round). If the game ends before a board card is revealed, that board card will not be listed.

-50|1275|0|-1225|0|0 shows the amount of money won or lost by each player, with each player separated by a '|'

Gogo|Budd|Eddie|Bill|Pluribus|MrWhite shows the participants in this hand and their positions. In this case, Gogo is the small blind and Budd is the big blind. Eddie is the first player to act on the first betting round in this hand. On all subsequent rounds, Gogo will be the first player to act (if still in the hand).
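Given that format description, the "light parsing" really is light. A rough sketch (not the researchers' tooling; the field handling just follows their readme above):

```python
# Rough sketch of a parser for the hand-history format described above
# (not the researchers' official tooling).

def parse_hand(line):
    # STATE:<hand#>:<actions>:<cards>:<results>:<players>
    _, hand_no, actions, cards, results, players = line.split(":")
    hole, *board = cards.split("/")
    return {
        "hand": int(hand_no),
        "rounds": actions.split("/"),      # betting actions per street
        "hole_cards": hole.split("|"),     # one two-card string per player
        "board": board,                    # flop / turn / river, if reached
        "winnings": [int(x) for x in results.split("|")],
        "players": players.split("|"),     # seat order starting with SB
    }

line = ("STATE:7:fr225fffc/cr475c/cr1225c/cc:"
        "5hJc|Jd9h|6s5c|Ah7h|2s2d|3hTs/3sJh2h/Tc/Ks:"
        "-50|1275|0|-1225|0|0:Gogo|Budd|Eddie|Bill|Pluribus|MrWhite")
hand = parse_hand(line)
print(hand["players"][4], hand["winnings"][4])  # Pluribus 0
```

From a dict like this, filtering for "hit the board but later folded" is just a matter of matching each player's hole cards against the flop and checking for an 'f' in a later round of the action string.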
 
I'd like to see some hands where the bot hits the board, but subsequently folds. Is it a calling station?

From a WaPo article:
WASHINGTON POST said:
Les recalled another hand that Pluribus lost but that revealed something about the bot. Pluribus had three twos, a pretty good hand, and made one of its typically aggressive bets, three times the pot value, about $3,000, as Les recalled. Then a human opponent went all in. Pluribus folded.

That sounds like a bad move on its face. Pluribus lost so much money! But the bot didn’t care. The bot sticks to a strategy that seems to work inexorably over time even if there are losses in the mix. That includes folding without a worry and not fretting about lost money. A human would be very reluctant to simply give up on a hand with three deuces and with $3,000 already committed to the pot, Les noted.

“A lot of humans might be, like: ‘I’ve got three of a kind. I’ve got such a good hand. I can’t let this guy push me around,’ " Les said. “The AI doesn’t have an emotional response like that. It just has a strategy.”

Brown said of his invention: “The bot is always playing the long game. As long as it’s right most of the time, it’s going to make money in the long run.”
 
I think I kind of get how that second hand river reraise makes sense, but can someone more knowledgeable explain why that would be the appropriate response?

It's amazing how it makes sense when you see it, but if I (granted, total amateur) saw it, I'd think the player was bonkers, just looking to gamble.

What I am still working on myself is building a credible line to base a specific later bluff on - or, deciding whether I have built a credible line when considering a bluff late in a hand.

--

The third hand was a better example, where the bot simply picked apart a barely credible story told by the villain and feasted on it. A very vigilant and experienced poker player could have made the same decisions at the table, without resorting to real statistics or other tedious number crunching.

Player showed strength with his reraise preflop, but then only called the re-reraise instead of outright going all in. Hence player can't have a high pocket pair in most cases but would most likely have two court cards, perhaps suited, probably an ace among them. AKs would be the most probable hand that would lead to this behavior. A ten or lower would only be found in his bluff hands, and given how costly that play is, the chance that player actually bluffs here should be very, very low.

On the flop, there was nothing player could possibly have hit with the aforementioned range, except for a very high flush draw (or a pocket pair of TT or below played in a ballsy way, which is unlikely, though not impossible). Player's check could be read as reinforcing that suspicion; on the other hand, the high chance of the flush arriving by the river, together with the overcards that could give him a suckout by making the higher pair - assuming the bot held QQ or, far more likely, KK - would also very much warrant a large bet followed by an all-in reraise if raised, which he did not do. Those two points nearly cancel each other out, meaning the bot wouldn't really change his assumptions about villain's hand just because of that flop check.

Since he only has a pair of sevens and no draw at this point, checking behind is absolutely reasonable, given the possibility of player sucking out by hitting one of his court cards. However, a 1/2 to 2/3 pot bet in an attempt to semi-bluff steal the pot early, while bluffing is still cheap, must have ranked fairly high on the bot's decision table as well. Perhaps it got downranked a bit because the wild preflop raising had inflated the pot above average for a flop pot.

Turn brings a nearly complete blank for player and bot knows it, holding the ace of spades himself and even another spade so he could potentially get a backdoor flush on the river. It is simply extremely unlikely player holds J9, 96 or 65 given his actions preflop. At this point, bot can be nearly absolutely sure player is bluffing with that turn bet, and doesn't have the worst chances for a very strong hand himself. However not yet having a strong made hand, a call instead of a raise seems like the far better choice.

River is a total blank as well which changes absolutely nothing in bot's assumptions. Player bluff bets, player gets called. Since bot reasonably assumes player is completely bluffing with nothing but a pair of balls/high kicker, the pair of sevens is good enough against that.

--

The second hand which you referred to is the same but with much finer nuancing.

Preflop raise from button is most likely an outright steal, and the range for steal hands would most likely be two low face cards or one high face card and a rag. Occasionally player would raise with a legitimate hand that he simply happened to get while on the button. Suited connectors are very unlikely given the game is shorthanded and there are only two players who could potentially call and bring enough money into the pot to allow for profitable drawing to a straight or flush later. Holding QJo, bot finds its hand ranks roughly the same as the average player hand given that range, so he calls and sees what the flop brings.

Flop has both bot and player hit the Q, bot however not knowing that player also hit and vice versa. Bot checks, knowing player may well hold a slightly higher kicker or even having hit two pair with Q9, which is on the low end of the preflop button raise range. Player betting under 1/2 pot claims he has hit anything up to and including top pair (most likely the latter), bot did as well, so call is a no-brainer.

Turn brings a blank, but bot still doesn't know whether player's kicker to the Q is higher, equal or lower than his, so another check is a straightforward choice. Player checking as well increases the likelihood player's kicker is inferior.

River brings another complete blank. Bot knows player has hit something, most likely top pair, and is fairly sure his own kicker is slightly better, but only by a tiny margin at most. Betting now that all the cards are out, having been passive before, would send player an unmistakable tell that bot made a better hand at some earlier point, revealing an early-set trap. A trap set up that early would likely make player assume bot flopped high two pair, something he absolutely cannot credibly claim to beat, given the low turn and river blanks. Thus, player would not call or reraise a bet here unless he actually had the better hand, which is extremely unlikely. So bot checks again - why needlessly risk money? When player makes a weak 1/2 pot bet, considering his previous line, it is clear player has not improved and is merely placing a thin value bet in the hopes bot has the slightly lower kicker. Bot therefore reraises all-in, knowing his chances of holding the better hand are slightly over 50%.
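The shove math in that last step can be sketched with made-up numbers - pot, shove size, and villain's fold rate are all assumptions here; only the "slightly over 50% when called" figure comes from the analysis above:

```python
# Made-up numbers for the river shove described above; only the
# "slightly over 50% equity when called" comes from the hand analysis.
pot = 500          # pot after villain's thin value bet (assumed)
shove = 1_000      # extra chips the all-in reraise puts in (assumed)
fold_rate = 0.20   # how often villain gives up his thin value bet (assumed)
equity = 0.55      # bot's chance of holding the best hand when called

# When villain folds, bot wins the pot outright; when called, bot wins
# pot + villain's matching call with prob. `equity`, else loses the shove.
when_called = equity * (pot + shove) - (1 - equity) * shove
ev = fold_rate * pot + (1 - fold_rate) * when_called
print(ev)  # positive even before any fold equity kicks in
```

The point is that with this much already in the middle, a hand that is best only slightly over half the time when called is still a comfortably +EV shove, especially once any fold equity is added.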

Bot knows that if he played this situation against player a hundred times, bot would make a profit. Bot also knows (though likely doesn't consider) that he can play millions of hands without rest and hence gain the advantage of the law of large numbers with no "pain" for him. Bot of course doesn't know his master's priorities that he was not told about - for example deception: not having the bot play 24/7 but giving him some credible rest time so he stays under the radar, thereby reducing the number of hands bot can potentially play. But not only does bot not know, he also does not care. Making the choice to go with the big-balls play is straightforward with this background.

A human player might of course consider that he may never get to play such a hand against this very villain again, or weigh his current session's performance against how much longer he plans to play (am I up so much that I can easily take this near coin flip and still walk away with satisfactory winnings?), and hence would be much more inclined to merely call or even fold.

Had player bet pot or more straight up, I'd assume bot had ranked a fold much, much higher among his options, assuming player has sort of thinly slowplayed a higher kicker to the Q or hit an extremely unlikely but ultimately stronger hand.
 
It would be pretty cool to be able to play against the bot. People are saying this is killing the online game (not disagreeing), but the online card rooms could make this a feature, at least in the short term, where the average joe can test their skill against this insane bot.

Anything is possible in the short term, so people might sit down to play 3/6 NL, hit and run the bot for a buy-in, and still have bragging rights.
 
I've been working on writing a very similar program that can learn to play any variation of poker ever since I closed down Chip Donkeys. I write code like this for work, and have been wanting to do this for years (but it's a lot of work and I have a one-year-old, lol). The most impressive part to me is the minimal compute power they were able to do this on. I would love to peek behind the curtain and see how they coded some of their algorithms to be so efficient. The concepts behind this aren't new though. It's reinforcement learning / Q-learning with recurrent neural networks, which is the same ML architecture behind AlphaZero, which beat the best Go players in the world after playing against itself for only a few hours. It doesn't need to know strategy at all. All it needs are options and outcomes. I love that it donk bets too; a topic I've been arguing about with some of my pro friends for decades. Exciting times for an AI geek like me.
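For the record, the published method is a Monte Carlo counterfactual-regret-minimization variant rather than Q-learning, but the "options and outcomes" self-play idea is the same. A toy regret-matching sketch on rock-paper-scissors (far simpler than poker, chosen only to show the loop):

```python
import random

# Toy self-play sketch of regret matching on rock-paper-scissors.
# Pluribus's published method is a Monte Carlo counterfactual-regret-
# minimization variant; this only illustrates the "options and outcomes"
# self-play loop, with no strategy knowledge built in.

random.seed(7)  # reproducible run

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]  # PAYOFF[mine][theirs]

def strategy(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1.0 / ACTIONS] * ACTIONS

regrets = [[0.0] * ACTIONS for _ in range(2)]
strat_sum = [[0.0] * ACTIONS for _ in range(2)]

for _ in range(20_000):
    strats = [strategy(regrets[0]), strategy(regrets[1])]
    acts = [random.choices(range(ACTIONS), weights=s)[0] for s in strats]
    for me, opp in ((0, 1), (1, 0)):
        got = PAYOFF[acts[me]][acts[opp]]
        for a in range(ACTIONS):
            # Regret: how much better action a would have done than
            # the action actually taken, against the opponent's action.
            regrets[me][a] += PAYOFF[a][acts[opp]] - got
            strat_sum[me][a] += strats[me][a]

# The *average* strategy converges toward equilibrium (uniform in RPS).
avg = [s / sum(strat_sum[0]) for s in strat_sum[0]]
print([round(p, 3) for p in avg])
```

No strategy advice goes in; the average play still drifts toward the unexploitable mix, which is the "it teaches itself with no guidance" claim in miniature.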
 
Uh, fake news? Here’s Pluribus’s graph:
[attached graph]

Pluribus LOST 7 BB/100 over the 10,000 hand sample. Feels like a relevant fact. Maybe more relevant than “won $1,000 per hour”.

The research team “adjusted for luck”... which is way more in-depth than the fairly standard assignment of all-in EV for those situations, and what do you know, their bot “won $1,000 an hour”. Is their luck adjustment valid? Possibly, but I get skeptical when the same team that is touting victory is the one that created the convoluted measurement methods. And any article that says Pluribus won without an asterisk is bullshit.
 

LOL!!!!

I saw somewhere it was bragging it beat LinusLove as well... man, I’d love to see Linus go at it with this machine heads up for at least 50,000 hands. It’d be the human walking solver vs the computer solver.
 
