mathletix

Jul 24, 2025

Small stakes give you the minimum blues

(This is an excerpt from a larger project about sports gambling. Code and early drafts of some of the materials can be found at https://github.com/csdurfee/book.)

Spoon, "Small Stakes"

I'll be talking about "the public" in this installment, by which I mean the side of a wager that gets the greater number of bets placed on it.

I talk about the vig a lot without explaining it. It's explained in the book, but the short version is that on standard bets, a gambler needs to win at least 52.4% of the time against the spread to break even, because they have to risk $110 to win $100. That $10 difference is the vig -- how the sportsbook makes its money.
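Here's a minimal sketch of where the 52.4% figure comes from, assuming standard American odds: at -110 you risk 110 to win 100, so you break even when p * 100 = (1 - p) * 110, which gives p = 110/210. The function name is just for illustration; it isn't from the book's repo.

```python
def breakeven_win_pct(odds: int) -> float:
    """Win rate needed to break even at American odds, e.g. -110 or +150."""
    if odds < 0:
        risk, win = -odds, 100  # -110: risk 110 to win 100
    else:
        risk, win = 100, odds   # +150: risk 100 to win 150
    # Break even when p * win == (1 - p) * risk
    return risk / (risk + win)


print(f"{breakeven_win_pct(-110):.1%}")  # 52.4%
print(f"{breakeven_win_pct(-105):.1%}")  # 51.2%
```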

In gambling circles, bets are often framed as Vegas or the sharps versus the public. Sharp started out as a term for cheaters -- dishonest bookies setting unfair lines, or card sharps who win through deception rather than skill. The meaning has changed a bit over time. In modern parlance, a sharp is someone who wagers on sports as a game of skill, making money over the long term by placing bets with positive expected value. But the negative connotation persists in popular chatter about gambling.

Say there's a game between the Lakers and the Charlotte Hornets, and the Hornets win against the spread. The public lost. What degenerate is betting the Hornets? Sharps, that's who. You'd think the public wouldn't have a problem with the sharps -- at least someone won money off Vegas tonight. Without the sharps, all the money that the public lost would go to Vegas. But Vegas and sharps are often conflated. It's the public versus everybody.

It seems unlikely to me that it's always the public on one side of the bet and sharps on the other. The public is still right around 50% of the time, right? They can't be drastically worse than a coin flip, so taking the opposite bets can't be drastically better than a coin flip. That means that sharps are going to agree with the public at least some of the time. They might fade the public (bet the opposite side) more often than they agree with the public, but there's probably a fair amount of both.

How do bettors do against the spread as the season goes on?

Does the public side do better over time? If records against the spread were random and the lines totally fair, we'd expect the public's winning percentage to bounce around pretty close to 50%, spending about as much time above that line as below it -- sometimes doing a little better than 50%, sometimes a little worse. Over the course of the season, the public's cumulative record against the spread should get closer and closer to 50%, as the variance of the cumulative winning percentage shrinks with each additional game.
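To put a number on "bounce around": if every bet really were a fair coin flip, the standard deviation of the win rate over n bets is sqrt(0.25 / n). A quick sketch (the helper name is hypothetical; 1198 is the number of 2024-25 games with data):

```python
import math


def coin_flip_sd(n_bets: int) -> float:
    """Standard deviation of the win rate over n independent 50/50 bets."""
    return math.sqrt(0.5 * 0.5 / n_bets)


# A 100 game window vs. roughly a full season of games
for n in (100, 1198):
    print(n, f"{coin_flip_sd(n):.1%}")
```

Under pure chance, a 100 game moving average has a standard deviation of about 5 percentage points, while the full-season cumulative number has one of about 1.4 points.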

Here's the 2024-2025 data. This is the public's winning percentage, graphed as a 100 game moving average:

2024-ma

The white line is the start of the All-Star break. The public was winning well below 50% of their bets until a surge in the 100 or so games before the break, as we can see on the cumulative graph:

2024-cumu
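Both graphs can be produced from a single list of outcomes. Here's a sketch in plain Python, assuming each game is recorded as 1 if the public side covered and 0 if it didn't. The function names are made up for illustration.

```python
from itertools import accumulate


def moving_average(results: list[int], window: int = 100) -> list[float]:
    """Trailing win rate over each run of `window` consecutive games."""
    if len(results) < window:
        return []
    total = sum(results[:window])
    averages = [total / window]
    for i in range(window, len(results)):
        # Slide the window: add the newest game, drop the oldest one
        total += results[i] - results[i - window]
        averages.append(total / window)
    return averages


def cumulative_win_pct(results: list[int]) -> list[float]:
    """Season-to-date win rate after each game."""
    return [wins / games for games, wins in enumerate(accumulate(results), start=1)]


toy = [1, 0, 1, 1, 0, 0]
print(moving_average(toy, window=3))
print(cumulative_win_pct(toy))
```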

The public ended up going 584-614 on the season. Someone taking the public side of every bet against the closing line would have lost 91.4 units on the season, for a 48.75% winning percentage.

The yellow line is the break-even point for fading the public -- taking the non-public side on every bet over the season. Up until that surge before the All-Star break, it would've been extremely profitable to do so. Even by the end of the year, the public's win percentage didn't get close to 50%. Someone betting at -105 reduced juice could have made 0.8 units by fading the public on every single bet.
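The unit figures above are simple arithmetic. A sketch, treating a unit as the amount you're trying to win (so a loss at -110 costs 1.1 units), which reproduces the numbers in the text. The helper is hypothetical and only handles negative odds:

```python
def units_won(wins: int, losses: int, odds: int = -110) -> float:
    """Net units for a W-L record at negative American odds (risk |odds|/100 to win 1)."""
    if odds >= 0:
        raise ValueError("this sketch only handles negative odds like -110 or -105")
    return wins - losses * (-odds / 100)


print(round(units_won(584, 614), 1))        # public, full vig: -91.4
print(round(units_won(614, 584, -105), 1))  # fading at reduced juice: 0.8
```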

The public were 369-388 when betting on the favorite, and 215-226 betting on the underdog. They went 293-311 when the away team won, and 291-303 when the home team won. They were bad no matter how you slice it.

While that's all super weird, it's only one season. My data source (sportsbookreview.com) only has spotty data for the 2023-24 NBA season, but they do have mostly complete data for 2021-22 and 2022-23. (Nothing before that, unfortunately.)

2021-22 season

I have data for 1108 out of 1230 regular season games for 2021-22.

The public went 566-542 on the season, for a loss of 30.2 units, much better than 2024-25.

Here's the 100 game moving average:

2021-ma

Except for a dip in early March, the public did fairly well all season. Not well enough to make money, but better than a 50% win percentage.

On the cumulative graph, while fading the public (yellow line) would have been profitable for the first month or so, the public's winning percentage spends most of the season above the 50% line. However, it never gets above 52.4%, the break-even point at standard vig.

2021-cumu

2022-23 season

I have data for 1176 of 1230 games in 2022-23.

The public went 587-589 for the season, a loss of 61 units. Here's the moving average:

2022-ma

And the cumulative:

2022-cumu

This one is similar to the 2024-25 graph: the public pretty consistently lost a little more than 50% of the time, but not often enough to make fading the public a viable strategy.

Are team records against the spread a Martingale?

I started to answer this last time, but didn't have time to go deeper. If betting records are random, previous performance gives no information about future performance. Each game is like a coin flip, with equal chances of heads and tails. Teams will have good or bad records against the spread due to chance alone.

However, I gave some plausible reasons why this might not be the case.

The simplest test I could think of was to compare records against the spread in the 1st half of the season to the 2nd half. If the records are random, there should be no correlation between 1st half and 2nd half records.

I found there was a positive correlation between 1st half and 2nd half records in all three seasons I have data for. In 2024-25, the correlation coefficient was .10. In 2022-23, it was .40, and in 2021-22 it was .27. Only 2022-23 was statistically significant. Assuming randomness, positive and negative correlations should be equally likely, so the chance of all three being positive is 1 in 8. That's suspicious. I definitely can't rule out a non-random aspect to records against the spread over time.
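To see how those three coefficients stack up against chance, here's a rough check. It assumes 30 teams and a Pearson correlation, and it uses the Fisher z-transform as a normal approximation for the p-value. That's a swapped-in technique; the book's code may use an exact t-test instead. It also computes the chance of three positive signs if each sign were a coin flip.

```python
import math


def corr_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for H0: no correlation, via the Fisher z-transform."""
    z = math.atanh(r) * math.sqrt(n - 3)
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided normal tail
    return math.erfc(abs(z) / math.sqrt(2))


for r in (0.10, 0.27, 0.40):
    print(r, round(corr_p_value(r, n=30), 3))

# If each season's sign were a fair coin flip, all three positive has probability:
print(0.5 ** 3)  # 0.125
```

That gives p-values of roughly 0.60, 0.15 and 0.03, so only r = .40 clears the usual 0.05 bar.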

It's not quite good enough for an automated betting strategy, though it's close. Say we track which teams had winning records against the spread over the 1st half of the season, then bet on those teams for the 2nd half of the season. (I didn't bother to filter out the games where teams with winning records play each other, so this analysis isn't perfect.)

In 2024-25, that would give a record of 297-297 ATS -- can't get more fair than that.

In 2022-23, it would have gone 279-241, for a profit of 13.9 units at standard vig, and a 53.7% winning percentage.

In 2021-22, it would have gone 247-225, for a loss of .5 units and a 52.3% winning percentage.
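The naive strategy is easy to sketch. Like the version in the text, this one doesn't filter out games between two teams that both had winning first-half records, so those games get bet on both sides. The input format and names are hypothetical.

```python
def follow_hot_teams(
    first_half: dict[str, tuple[int, int]],
    second_half_games: list[tuple[str, str, str]],
) -> tuple[int, int]:
    """
    first_half: team -> (ATS wins, ATS losses) over the first half of the season.
    second_half_games: (team_a, team_b, ats_winner) for each second-half game.
    Returns the (wins, losses) from betting every team with a winning first-half ATS record.
    """
    hot = {team for team, (w, l) in first_half.items() if w > l}
    wins = losses = 0
    for team_a, team_b, ats_winner in second_half_games:
        for team in (team_a, team_b):
            if team in hot:  # both sides get a bet when two hot teams meet
                if team == ats_winner:
                    wins += 1
                else:
                    losses += 1
    return wins, losses


first_half = {"A": (5, 3), "B": (2, 6), "C": (4, 4), "D": (6, 2)}
games = [("A", "B", "A"), ("C", "D", "C"), ("A", "D", "D")]
w, l = follow_hot_teams(first_half, games)
print(w, l, round(w - 1.1 * l, 1))  # record and units at -110
```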

So, it's definitely not enough to be profitable as a strategy on its own. But for such a simple strategy to come close to profitable in two of the three seasons is interesting.

A gambler needs to win at least 52.4% of the time to break even against the vig. Say they're picking from a subset of bets that have a 52.3% chance of winning, as the naive strategy achieved in 2021-22. Within that subset, they'd only need to choose slightly better than a coin flip to be profitable. That could be much easier than picking from a set of bets with a 50% chance of winning, right?

Final thoughts

In all three seasons, the public did a little worse in the first half of the season than the second half. In the two most recent seasons, the cumulative winning percentage was below 50% for nearly the whole season.

That doesn't seem random to me. It makes sense that sportsbooks would offer slightly more favorable odds to the less popular team in order to attract equal money on both sides. It also makes sense that sportsbooks would be happy if the team with more money on it lost over 50% of the time. The difference between the public winning 50% of the time and the public winning 48.5% of the time could be significant on enough betting volume.

In all three seasons, there was a positive correlation between a team's record against the spread over the first half of the season and the second half. The correlation is strong enough that over 3 seasons, it's almost possible to make money by betting on teams with a good first half record.

On both points, I don't have nearly enough data to draw grand conclusions about how "the market" operates -- this is just one sportsbook, and an unknown one at that. Yahoo and DraftKings also provide betting percentage data, which would be useful for cross-checking these trends. I'm going to hold off for now, though -- there are too many other interesting things in the world.

Jul 18, 2025

The public wants what the public gets

(This is an excerpt from a larger project about sports gambling. Code and early drafts of some of the materials can be found at https://github.com/csdurfee/book.)

The Jam, "Going Underground"

Two types of people

Lots of sportsbooks publish info on how much action they've gotten on each side.

Here's DraftKings': https://dknetwork.draftkings.com/draftkings-sportsbook-betting-splits/

It's a smart move. It's good for SEO (to the extent that still matters). And I'm sure they get a lot of people who decide to take bets from that page.

For example, the Pacers were playing Brooklyn the night I wrote this. 27% of the bets were on Brooklyn at +10.5, and 73% were on Pacers -10.5.

Somebody who sees that and decides to make a bet based on that information could bet either way. They could either tell themselves, "Everybody's taking the Pacers, so it must be a good bet" or "Everybody's taking the Pacers, so it must be a bad bet".

What are those two groups like when they're not betting on basketball, do you think? Do they use the same kind of toothpaste? Watch the same kind of TV shows? Vote the same way?

The public gets what the public wants

One bit of gambling lore is that there are "public" teams that get bet on more frequently, regardless of the line. Like, your cousin who's a Cowboys fan is going to bet the Cowboys on Thanksgiving regardless of whether it's a fair line or not. He'd watch the game and root for the Cowboys anyway, but it's a little more fun that way. The Cowboys aren't just a random number generator to him.

There's a social aspect to gambling now that I imagine didn't exist when it was underground. Lots of gamblers will "follow" bets that other people have placed. If the bet wins, I'm sure it's a cool communal thing to be a part of. But social media can act in opposition to the "wisdom of crowds" -- in places like Reddit where users vote content up and down, the conventional wisdom is going to be amplified, and people with minority opinions are going to be suppressed. If well over 90% of sports gamblers lose money long term, the majority opinions are going to be bad.

I scraped betting percentage data from sportsbookreview (SBR) for the 2024-25 season. They don't say where they get the betting percentages from. If I had to guess, it would be MGM Grand, their primary source of other data. The SBR numbers seemed to indicate more action overall than a couple of other sources I found -- the betting percentages were closer together. Other sites had games where there was 10% action on one side and 90% on the other, which seems implausible on a large volume of bets. So it's probably a pretty big site, whatever it is.

As with the data from the previous installment, 32 of the 1230 games are missing data.

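The winner and loser columns are each team's wins and losses against the spread, and ats_win_pct is the resulting winning percentage. The money_percents column is the median percentage of the money bet on each team's side. The money_game_winners column tracks the number of games where that team got the majority of the money bet on their side. Both of these can be taken as indicators of how much teams are favored by the public.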

Here are the teams sorted by money_percents. The teams near the top were less popular with gamblers, the teams at the bottom more popular.

team winner loser ats_win_pct money_percents money_game_winners
New Orleans 34 44 44 39.5 20
Charlotte 36 42 46 41.5 24
Miami 39 41 49 43 20
Philadelphia 26 52 33 43.5 29
Portland 45 33 58 43.5 25
Orlando 41 40 51 44 29
Utah 39 38 51 45 33
Sacramento 35 44 44 45 32
L.A. Clippers 47 34 58 46 27
San Antonio 38 41 48 47 31
Chicago 42 38 52 47 36
New York 38 44 46 48 38
Phoenix 29 49 37 48 36
Washington 33 46 42 49 37
L.A. Lakers 48 33 59 51 42
Indiana 38 43 47 51 42
Atlanta 37 42 47 52 41
Boston 39 42 48 52 43
Minnesota 37 43 46 52 43
Dallas 37 44 46 52 41
Detroit 41 38 52 53 43
Brooklyn 42 35 55 53 41
Toronto 49 28 64 53 47
Golden State 42 40 51 54 51
Houston 44 38 54 54 49
Oklahoma City 53 29 65 54.5 54
Milwaukee 44 38 54 56.5 56
Cleveland 47 33 59 57 53
Memphis 41 41 50 57 51
Denver 37 45 45 58.5 63

The public favorites

The most popular teams with NBA gamblers were Denver, Cleveland, Memphis, Milwaukee, and Oklahoma City.

Cleveland, OKC and Memphis were dominant for most of the season.

Denver and Milwaukee have two of the best and most entertaining players in the league. Both Giannis for Milwaukee and Jokic for Denver are fun to root for. People like to take bets on games that are fun to follow.

The ugly dogs

The bottom teams were New Orleans, Charlotte, Miami, Philadelphia and Portland. All these teams except for Portland were total bummers to watch and cheer for this year. They all had injuries and organizational dysfunction that led to wasted seasons. People don't like to take bets on games that are a bummer to follow.

Against the spread

Here's the same data sorted by record against the spread.

team winner loser ats_win_pct money_percents
Philadelphia 26 52 33 43.5
Phoenix 29 49 37 48
Washington 33 46 42 49
New Orleans 34 44 44 39.5
Sacramento 35 44 44 45
Denver 37 45 45 58.5
Dallas 37 44 46 52
Charlotte 36 42 46 41.5
Minnesota 37 43 46 52
New York 38 44 46 48
Indiana 38 43 47 51
Atlanta 37 42 47 52
San Antonio 38 41 48 47
Boston 39 42 48 52
Miami 39 41 49 43
Memphis 41 41 50 57
Orlando 41 40 51 44
Utah 39 38 51 45
Golden State 42 40 51 54
Detroit 41 38 52 53
Chicago 42 38 52 47
Houston 44 38 54 54
Milwaukee 44 38 54 56.5
Brooklyn 42 35 55 53
L.A. Clippers 47 34 58 46
Portland 45 33 58 43.5
L.A. Lakers 48 33 59 51
Cleveland 47 33 59 57
Toronto 49 28 64 53
Oklahoma City 53 29 65 54.5

Philadelphia, Washington and Phoenix were just as terrible at the sportsbook as they were on the basketball court. OKC and Cleveland had outstanding seasons in both places.

However, there's only a rough correlation between how good the teams were at actual basketball, and at beating the spread. Minnesota, New York and Denver were in the bottom 10 by winning % against the spread, even though they had good records and were doing their best to win. Toronto and Brooklyn weren't really trying to win a lot of basketball games, but ended up in the top 10.

Which teams should the public love and hate?

I calculated the number of units a gambler would win if they bet on each team whenever it got the majority of the bets. public_units is the amount won or lost by betting on the team in those games, and fade_units is the amount won or lost by betting against it. (The two values are different because of the vig.)
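A sketch of the two columns at full -110 vig. As a sanity check, a record of 11-25 in games where a team was the public side reproduces Phoenix's row in the table. I back-solved that record from the two unit figures, so it's inferred rather than something the table reports. The function name is hypothetical.

```python
def public_and_fade_units(wins: int, losses: int, vig: float = 1.1) -> tuple[float, float]:
    """
    wins/losses: the team's ATS record in games where it was the public side.
    Returns (public_units, fade_units) at standard -110 vig (risk 1.1 to win 1).
    """
    public_units = wins - losses * vig  # backing the team every time
    fade_units = losses - wins * vig    # betting against it every time
    return round(public_units, 1), round(fade_units, 1)


print(public_and_fade_units(11, 25))  # (-16.5, 12.9)
```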

Phoenix, Sacramento, Dallas, Denver and Indiana disappointed the public the most.

team public_units fade_units
Phoenix -16.5 12.9
Sacramento -14.2 11
Dallas -13.6 9.5
Denver -12.6 6.3
Indiana -12.6 8.4
Atlanta -11.5 7.4
Utah -11.1 7.8
Chicago -10.2 6.6
Minnesota -9.5 5.2
Detroit -7.4 3.1
Philadelphia -6.7 3.8
New York -6.1 2.3
Brooklyn -5.2 1.1
Washington -5 1.3
Boston -3.2 -1.1
New Orleans -3.1 1.1
Memphis -1.5 -3.6
Charlotte -1.2 -1.2
Miami -1 -1
San Antonio -0.5 -2.6
Orlando -0.4 -2.5
Milwaukee 1.4 -7
Golden State 2.7 -7.8
L.A. Lakers 4.2 -8.4
Houston 4.9 -9.8
Cleveland 6.8 -12.1
Portland 10.3 -12.8
Toronto 11.3 -16
L.A. Clippers 12.3 -15
Oklahoma City 18.3 -23.7

This is a pretty random list of teams, in both directions. It's a good illustration that gamblingball is different from basketball. It's not clear whether gamblingball is a game with an element of skill, or if it's all chance.

Are records against the spread due to chance?

If we assume that all variation is due to randomness, then whether the underdog or the favorite wins against the spread should be a coin flip in every game.

Calculating exact odds using the binomial distribution, 94% of NBA teams should have between 33 and 49 wins against the spread over an 82 game season.
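The 94% figure can be checked with an exact binomial sum, assuming every game is an independent 50/50 flip against the spread (the helper name is mine):

```python
from math import comb


def binomial_range_prob(n: int, lo: int, hi: int, p: float = 0.5) -> float:
    """P(lo <= X <= hi) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lo, hi + 1))


inside = binomial_range_prob(82, 33, 49)
print(f"P(33-49 ATS wins in 82 games): {inside:.1%}")
print(f"expected teams outside that range, out of 30: {30 * (1 - inside):.1f}")
```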

We'd expect 2 teams to be outside that range, and there are 3. Philadelphia went 26-52 in 78 games we have data for. Even if they won the other 4 games that are missing data, they'd only have 30 wins. So that record was definitely an outlier, but overall the season was about what we'd expect based on chance.

I find it very believable that some teams are more likely to have a winning record against the spread, because they are underestimated by the handicappers or the betting public. They end up getting lines that are too generous, and thus do better than expected against the spread. Toronto could be an example of that. They were bad, but they weren't really as bad as people thought.

Other teams could be inherently worse against the spread, as well. Perhaps they are super popular to bet on, so the lines tend to move against them -- a public team. Or perhaps gamblers and sportsbooks overvalue the team -- the conventional wisdom is that they'll be good when they're not. That definitely describes Philadelphia and Phoenix.

In both cases, the teams themselves aren't necessarily doing anything to be better or worse against the spread than an average team would be. It's about the perceptions of the bookmakers and gamblers.

Do gamblers follow the record against the spread?

If a team's record against the spread is due solely to random error, then we've got a LeMartingale on our hands. The current record would have no bearing on the future record. So gamblers shouldn't factor it in when deciding to take a bet or not.

By the end of the season, there was a significant correlation between money percents and win percentage against the spread. I wanted to see how that might've changed over time. So I generated the table shown above for every single day of the season, and calculated the Spearman rank correlation on that day. Here's what that looks like over time:

/img/money-ats-win-pct.png

The money percentages are cumulative: the mean of all games in the season that have come before. It's not showing gamblers' betting behavior on a particular day compared to records against the spread on that day. The graph is a lot smoother that way, but we're losing something.

It also doesn't show whether records against the spread are a Martingale or not. The correlation between betting percentages and win records increases over time, but that doesn't necessarily mean gamblers are behaving rationally.

The jump in correlation around mid-February corresponds to the All-Star break, which is curious.

Stay tuned; I'll have more on this.

Jul 17, 2025

Last fair deal in the country

(This is an excerpt from a larger project about sports gambling. Code used, and early drafts of some of the chapters can be found at https://github.com/csdurfee/book.)

The Grateful Dead, "Loser"

Efficiency of betting markets

The efficient market hypothesis says that given enough time and competition, free markets are able to establish the correct price for a commodity. In the case of sports betting, we could think of it as the price of a money line bet.

On a money line bet, you are betting on who will win the game straight up. You get a smaller payout for betting on the favorite, and a larger payout betting on the underdog. If the money line is negative, that's how much money you have to risk in order to win $100. For example, -200 indicates you have to risk $200 to win $100. If it is positive, that's how much money you win if you risk $100. If it sounds like a bad way to write the odds, you're correct.
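The notation is easier to follow as code. A minimal sketch of how a money line converts to a payout and to the implied break-even probability (the vig is left in, so the two sides of a real line add up to more than 100%). The function names are illustrative:

```python
def moneyline_profit(odds: int, stake: float = 100.0) -> float:
    """Profit from a winning bet of `stake` at American money line odds."""
    if odds < 0:
        return stake * 100 / -odds  # -200: risk 200 to win 100
    return stake * odds / 100       # +150: risk 100 to win 150


def implied_probability(odds: int) -> float:
    """Win probability at which the bet breaks even (vig not removed)."""
    if odds < 0:
        return -odds / (-odds + 100)
    return 100 / (odds + 100)


print(moneyline_profit(-200, stake=200))    # 100.0
print(round(implied_probability(-200), 3))  # 0.667
print(implied_probability(150))             # 0.4
```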

A market maker will respond to an imbalance in bets by adjusting the price. If CLE -300 is a good value, people will rationally want to take it, driving the price up to, say -400. If it is a bad value, people will rationally want to take the other side and the price might go down to -200. These rational actors will collectively push the price towards the best possible estimate that humans can make. It serves as a sort of collective intelligence.

In the first installment, I showed that humans are irrational when it comes to sports betting, so I was skeptical of how good, or fair, the lines could be. Could I find proof of this collective intelligence in action? Are there any obvious market inefficiencies?

The data & stuff to know

Stats are from the 2024-25 NBA season. I screen-scraped the data from sportsbookreview.com. All data is from the MGM Grand. Unfortunately, some data is missing from around Christmastime, and a few random days in between. 32 of the 1230 games (2.6%) are missing from the data set. These are games that don't appear on sportsbookreview's website, or have incomplete data on there.

This is an analysis of the MGM Grand's NBA lines for the 2024-25 season. It's not a comprehensive guide to how the lines work.

There are always two lines on each game, one for the home team and one for the away team. Each side may have different vigs. Say for instance Bucks @ Pacers starts out at IND +3.5 -110/MIL -3.5 -110. It could close at IND +3.5 -115/MIL -3.5 -105. So it costs more to bet on the Pacers, but the actual line didn't move. I'm mostly ignoring that, but will point out when it's relevant.

"Line" and "spread" mean the same thing.

"Reduced juice" means betting at -105 or -106 (risking 105 or 106) instead of the usual -110 (risking 110) to win 100. A "unit" is a gambler's standard betting size. "4.1 units of profit" would mean +$410 for a gambler betting $100 a game. Both are explained in much more detail in the book.

A note about pushes

When the final score agrees with the line exactly, neither side of the bet can be declared a winner. This is called a push. The bet is cancelled and everybody gets their money back. The casino makes nothing.

The MGM Grand always keeps point spreads on the half point (eg +6.5 or +7.5 rather than +7) so that they will never push. I don't think it's a bad policy, and I'm surprised more sportsbooks don't do it. The sportsbooks know how good their customers are at betting, so they should probably shade the point spread half a point towards the side of the bet that has the less savvy bettors on it. (This assumes the sportsbook can identify and ban arbitrage gamblers, but more about that in the book.)

Analysis

If there is a wisdom of crowds, the final lines should be more accurate than the opening lines. Are they?

My code calculates the difference between the actual final margin and the margin predicted by the line, called the error. Because the MGM's lines always end in a half point, the error is going to be artificially high -- there will never be a game where it is exactly zero.

The opening and closing lines are a set of predictions. The smaller the difference between the line and reality, the better the prediction. Mean Squared Error is a standard way to compare two prediction systems in statistics and machine learning.

The MSE for the opening lines is 191.06, and for the closing lines it's 184.8. So we can say that in aggregate, the closing lines are more accurate than the opening lines.

MSE can't tell us how good the closing lines are, though, just that one set of predictions is better than another set. It's a relative measure, not an absolute one. We're squaring the error, so the MSE will always be positive. The errors in one direction don't cancel out ones in the other direction.
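Here's a sketch of the error and MSE calculation. It assumes each game is stored as the home team's final margin plus the spread quoted for the home team, so a line of -3.5 predicts a 3.5 point home win and the error is margin + spread. That convention and the toy games are mine, not the book's.

```python
def line_errors(games: list[tuple[int, float]]) -> list[float]:
    """games: (home_margin, home_spread) pairs. Error = actual margin - predicted margin."""
    return [margin + spread for margin, spread in games]


def mse(errors: list[float]) -> float:
    """Mean squared error; squaring keeps misses in opposite directions from cancelling."""
    return sum(e * e for e in errors) / len(errors)


# Toy data: the same three results against hypothetical opening and closing lines
opening = [(5, -3.5), (-2, 1.5), (10, -6.5)]
closing = [(5, -4.5), (-2, 2.5), (10, -7.5)]
print(mse(line_errors(opening)), mse(line_errors(closing)))
```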

Let's look at how far off the lines were. The O's are the opening lines, and the X's are the closing lines. If the X is closer to the center line than the O, the market action made the line more accurate. I've plotted a random sample of 300 games to make the plot more readable.

/img/scatter-mess.png

Unfortunately, that doesn't really show us much about how or when the closing lines are better than the opening ones.

Adam Smith, Handicapper

When were the closing lines more accurate than the opening lines?

Closing lines better: 467
Opening lines better: 384
Tied: 347

If the free market were a handicapper, and we interpreted the line movements as a bet on one side, they would have a 54.88% winning percentage (and 347 pushes).
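The tally works like a scorecard: for each game, compare the absolute errors of the opening and closing lines. A sketch with made-up errors, plus the winning percentage from the real counts (ties, like pushes, are left out of the denominator). The function name is hypothetical.

```python
def line_movement_record(errors: list[tuple[float, float]]) -> tuple[int, int, int]:
    """errors: (opening_error, closing_error) per game. Returns (closing better, opening better, tied)."""
    closing_better = sum(abs(c) < abs(o) for o, c in errors)
    opening_better = sum(abs(o) < abs(c) for o, c in errors)
    return closing_better, opening_better, len(errors) - closing_better - opening_better


print(line_movement_record([(1.5, 0.5), (-0.5, 0.5), (3.5, 2.5), (2.0, 3.0)]))  # (2, 1, 1)

closing, opening = 467, 384  # the season totals above
print(f"{closing / (closing + opening):.2%}")  # 54.88%
```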

While that's a respectable win percentage for a human trying to beat the spread, I was expecting better from the free market. The market only being right 55% of the time holds true for a couple of previous seasons I have looked at as well. NBA betting, as a market, is not very efficient.

There are good reasons for that. Sportsbooks that set the opening lines aren't trying that hard to be accurate. It's just a first guess. Only a tiny percentage of money is wagered at the opening line number. However, there are good reasons why lines tend not to move very much, even when the opening line is a bad one. For an in-depth explanation, check out The Logic of Sports Betting, by Miller and Davidow.

The myth of closing line value

The conventional wisdom is that sports betting markets are efficient, so that the only way to make money over the long run is by doing better than the closing lines, picking up on any flaws in the opening lines before the market eliminates them. Anyone else can only make a profit due to chance. From this perspective, the right way to measure a handicapper's skill is how their picks compare to the closing line. Say the opening line is Nuggets -3, and I take the bet at that number. The closing line is Nuggets -6. Then I captured 3 points of value against the closing line. This is known as closing line value (CLV). (We can figure out how valuable those 3 points are, and I show how in the book.)

Beating the closing line might be positively correlated with higher profits when analyzing betting records of touts -- people who sell betting picks for money. But when the market is wrong 45% of the time, focusing too much on CLV seems like a bad idea. There's no good reason to believe that a gambler is destined to lose money by picking against the closing lines. What if their strategy is to mostly bet against the prevailing wisdom on the 45% of games where the market is wrong?

CLV is a prime example of Goodhart's Law. As a measure of a handicapper's skill, it's probably fine (though not ideal). But it shouldn't be the target. A gambler shouldn't make picks explicitly to capture as much CLV as possible.

Say the opening line is Nuggets -3 against the Timberwolves. I like the Nuggets in this matchup, but I think the public will go for the Timberwolves and it will finish at Nuggets -1/Timberwolves +1.

If I were trying to capture as much CLV as possible on this bet, I should take the Timberwolves +3 on the opening line, even though that's not the side I actually like!

If I were trying to actually win the bet, I should take the Nuggets at the closing line, hoping maybe I can get Nuggets -1 or even Nuggets +1. I can never get positive CLV on the Nuggets, because the market was wrong about them. Not me, the market!
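Closing line value in points is just the difference between the number you got and the closing number, with both spreads quoted for the team you bet. A sketch (hypothetical helper) run on the examples in this section:

```python
def clv_points(bet_spread: float, closing_spread: float) -> float:
    """Points of closing line value; both spreads are quoted for the side you bet."""
    return bet_spread - closing_spread


print(clv_points(-3, -6))  # Nuggets -3, closes -6: +3 points
print(clv_points(+3, +1))  # Timberwolves +3, closes +1: +2 points
print(clv_points(-3, -1))  # Nuggets -3 at the open, closes -1: -2 points
print(clv_points(-1, -1))  # Nuggets at the close: 0, never positive
```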

CLV gets described as being the best way to test a handicapper's skill, but it's obviously non-optimal. Maybe it's the contrarian in me, but Opening Line Value -- identifying bets where the market is going to be wrong, and waiting till the last minute to place the bet -- is more impressive.

The best way to test a handicapper is to have them write out what they think the lines should be, rather than making a binary decision about somebody else's line (favorite or underdog). If a handicapper's lines are closer to the truth than the closing lines, they are good at handicapping. Looking at what bets they took is only a secondary signal of that. If they took Nuggets -7, is it because they thought the true line should be Nuggets -8, or Nuggets -12?

When the line doesn't move

Setting aside why the market moves in the wrong direction 45% of the time, I'm curious about the games where the spread didn't move at all. Maybe those lines were perfect as-is? If so, we'd expect to see equal splits of home vs. away winners, and underdog vs. favorite winners. There shouldn't be any bias to those games. The free market is essentially labelling these the pinnacle of the handicapper's art, impossible to be improved upon.

The difference between the predicted outcome (the line) and the actual outcome is a combination of how much the line maker got it wrong, plus random variation. So the games where the line didn't move should be totally random, right?

They're not. If we look at games where the line didn't move, the away team went 184-163 in those games. Someone betting the away team in every game where the line didn't move would win 53% of their bets, for 4.7 units of profit at full vig, or 11.2 units of profit at reduced juice.

There's also a bias towards underdogs, who went 186-161 in this situation. Always taking the underdog would give a 53.6% winning percentage, for 8.9 units at full vig, or 15.3 units at reduced juice.

There's an even bigger bias if we combine the two. Away underdogs went 122-99 in these games, which is a 55.2% winning percentage, for 13.1 units of profit at full vig, and 17.1 at reduced juice.

None of these results are statistically significant, but they are very :thinking_face_emoji:
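For a sense of how far these records are from significance, here's an exact two-sided binomial test against a fair coin. It's a sketch; the function name is mine.

```python
from math import comb


def binomial_p_value(wins: int, losses: int) -> float:
    """Exact two-sided p-value for a W-L record under a fair 50/50 coin."""
    n = wins + losses
    k = max(wins, losses)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2**n  # P(X >= k)
    return min(1.0, 2 * tail)  # the distribution is symmetric, so double the tail


for label, w, l in [("away teams", 184, 163), ("underdogs", 186, 161)]:
    print(label, round(binomial_p_value(w, l), 2))
```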

About the vig

When the vig is imbalanced, the side with the higher vig should be more likely to win, because they're winning less money in return. Moving the vig from -110 to -115 is a way for the bookmaker to discourage bets on one side without moving the line. Likewise moving it to -105 is a way to encourage bets on that side.

Since the MGM Grand always keeps their lines on the half point, we'd expect them to adjust the vig often rather than change the spread. They do for most of the games where they didn't move the lines, but 39% of the time the vig stays at -110.

If we break down the games where the line didn't move by vig, the underdogs went 62-43 when the vig was high (-115), 74-62 when the vig was at the standard level (-110), and 50-56 at low vig (-105).

Someone taking the underdogs when the line doesn't move, and the vig is -110 or -115, would've gone 136-105 this season, a 56.4% winning percentage, and around 18 units of profit (factoring in the additional -115 vig on some bets).
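The "around 18 units" figure comes from mixing two price levels. A sketch, assuming the -115 applied to the underdog side in the 62-43 group and -110 to the 74-62 group:

```python
def units_won(wins: int, losses: int, odds: int) -> float:
    """Net units at negative American odds: risk |odds|/100 to win 1."""
    return wins - losses * (-odds / 100)


total = units_won(74, 62, -110) + units_won(62, 43, -115)
print(round(total, 2))  # about 18.35
```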

Now, the strategy is pretty convoluted, and won't necessarily hold for future seasons, but it's definitely evidence there could be irrational factors at work in the market. It certainly doesn't show the market to be the well-oiled machine that Closing Line Value assumes it is.

Must love dogs

Winners ended up being pretty evenly divided between favorites and underdogs by the end of the season, but underdogs were way ahead for most of the year.

Betting every single underdog against the spread over the first quarter of the season would've been fairly profitable -- a 165-136 record (54.8% winning percentage), and 15.4 units of profit at full vig. People betting favorites got killed at the beginning of the season.

Dogs and favorites were basically even through the middle half of the season, before favorites finished off 167-144 (53.7% win percentage) to even things out.

Here's a plot of the winning percentage of favorites over the course of the season. I skipped the first 50 games because of noise. The yellow line represents the winning percentage necessary for betting all underdogs to be profitable (at standard vig). That happens when the favorites win less than 47.6% of the time (which means underdogs win more than 52.4% of the time).

It wasn't until the last month of the season that blindly betting all underdogs started being a losing proposition, even factoring in the vig.

/img/wp-vs-gameno.png

Did the lines improve over time?

I was curious if there was evidence that the errors were getting smaller, or more predictable over time.

The raw errors are too noisy to see any sort of pattern:

/img/err-vs-line.png

This is a plot of the 100-game moving average of the absolute error of the closing line. I don't see any trends to suggest the lines got more accurate with time.
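A 100-game moving average is just the mean of each consecutive window of 100 games. Here's a minimal sketch with NumPy. The inputs are made-up stand-ins (the 13-point spread of outcomes is an arbitrary assumption), and the sign convention assumes the line has been converted to an expected margin for the same team whose actual margin is recorded.

```python
import numpy as np


def moving_average(values, window=100):
    """Trailing moving average; element i averages values[i : i + window]."""
    values = np.asarray(values, dtype=float)
    return np.convolve(values, np.ones(window) / window, mode="valid")


# Made-up inputs: expected margin from the closing line, and the actual margin
rng = np.random.default_rng(0)
expected_margin = rng.choice(np.arange(-12.5, 13.0, 1.0), size=1230)
actual_margin = np.round(expected_margin + rng.normal(0, 13, size=1230))

abs_error = np.abs(actual_margin - expected_margin)
smoothed = moving_average(abs_error, window=100)
print(len(smoothed))  # 1131
```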

/img/closing-line-err.png

The size of the error against the closing line isn't the ideal metric, because not all points are created equal -- the higher the line, the less surprising the error. (I'm going to skip discussing that for now, but it's explained in the book.)

Did the lines change over time?

I wondered whether the size of the lines changed over time -- did the games get more or less competitive over the course of the season?

This is a 100-game moving average of the average size of the spread. As we can see, the lines did get bigger near the end of the year.

/img/spread-over-time.png

It's possible the trend is due to scheduling, but the change at the end seems significant -- teams tend to give up near the end of the year. Bad teams want to be as bad as possible in order to get the best odds in the NBA draft, so they're not that competitive.

What type of games are affected by line movement?

There were 130 games where the winner flipped from the favorite to the underdog, or the underdog to the favorite, because of the line movement. In other words, these are games where either side could have won the bet, depending on whether you took the opening line or the closing line.

These games were perfectly balanced -- 65 times, the favorite won (vs the closing line); 65 times the underdog won.

What about games where the spread was extremely accurate (off by 3 points or less)? Underdogs went 138-121 in those games (53.3%).

The difference is more dramatic in games where the line was off by 1 point or less. The underdogs went 56-33 (63% win percentage). Of course, there's no way to use that as a betting strategy since we can't identify these games before the fact, but it does show a small potential bias in favor of underdogs.
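One rough way to gauge whether 56-33 is more than noise is an exact binomial test, treating each game as an independent 50/50 coin flip. This is my addition, not part of the original analysis, and it's optimistic: these subsets were picked after the fact, which is exactly the problem with using them as a strategy.

```python
from math import comb


def binom_two_sided_p(wins, games):
    """Exact two-sided p-value for `wins` out of `games` if each game were a fair coin flip."""
    k = max(wins, games - wins)
    upper_tail = sum(comb(games, i) for i in range(k, games + 1)) / 2**games
    return min(1.0, 2 * upper_tail)  # symmetric under p = 0.5, so double one tail


print(round(binom_two_sided_p(56, 56 + 33), 3))    # line off by 1 point or less
print(round(binom_two_sided_p(138, 138 + 121), 3))  # line off by 3 points or less
```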

What would "perfect" lines even look like?

It's rare to see NBA lines that are bigger than +15/-15 points. There were 15 this season, about 4.4% of all games. That's around one NBA game a week with a line that high.

By contrast, 31% of NBA games end with a score differential of over 15 points. That's 7x more often, roughly one game a day.

The lines really shouldn't be as large as the final score differential, because they are an estimate of the mean outcome of the game. If the Celtics beat the Raptors by 54 points, that doesn't mean the line should have been Celtics -54. The Celtics and Raptors played 4 times last season (data taken from basketball-reference). I'm going to ignore home court advantage -- imagine these are played at a neutral gym. The first game, Boston won by 3. The second, Boston won by 54. The third game, the Raptors won by 13. The last game, Boston won by 10.

Boston won by an average of 13.5 points, so BOS -13.5 would be a reasonable line for all four games, as that's the best estimate we can make of their difference in skill. Only 1 of the 4 games ended up close to that line. For the other 3 games, the error was at least 10 points. And Boston -- despite being the better team -- would have gone 1-3 against the spread. If the line for all four games was BOS -9.5, they would have gone 2-2, but the error would still be 44.5 points on the second game, and 22.5 on the third one.
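The arithmetic in the Celtics-Raptors example is easy to check in a few lines. Margins are from Boston's point of view, with home court ignored as above. `ats_record` is just an illustrative helper, and a push isn't possible with half-point lines.

```python
# Boston's margin in each game (negative = Toronto won), home court ignored
margins = [3, 54, -13, 10]


def ats_record(margins, line):
    """Boston's (covers, non-covers) when laying `line` points."""
    covers = sum(m > line for m in margins)
    return covers, len(margins) - covers


mean_margin = sum(margins) / len(margins)
print(mean_margin)  # 13.5

for line in (13.5, 9.5):
    errors = [abs(m - line) for m in margins]
    print(line, ats_record(margins, line), errors)
# 13.5 (1, 3) [10.5, 40.5, 26.5, 3.5]
# 9.5 (2, 2) [6.5, 44.5, 22.5, 0.5]
```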

The actual outcomes might be all over the place, but the spread isn't meant to predict the actual outcome, just the point where both sides are equally likely to win the bet.

Here's a histogram of the spreads (for the away team) overlaid on a histogram of the errors against the spread:

/img/spread-vs-score-diff.png

If we look at just the score differential, we can stick a bell curve over the top and it looks pretty normal:

/img/score-diff-normal.png

Simulating games from the spreads

The problem is, these aren't outcomes from one distribution. Every game is essentially a sample from a different distribution. Each game has a different mean (the spread, or rather the ideal version of it) and a different variance (how predictable games are between the two teams). Combining them all together, the results end up looking kinda normal (because a lot of things do).

I decided to simulate the entire season to show how the point differentials are going to be much bigger than the original lines.

I simulated every game by sampling from a normal distribution with the mean set to that game's spread, and the variance equal to the sample variance of all games that season with that spread.
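Here's a minimal sketch of that kind of simulation. It's not the code from the notebook: it assumes each spread has already been converted to an expected margin for the same team the actual margins are measured from, falls back to the whole season's variance for spreads that only show up once, and rounds draws to whole points (which, unlike real NBA games, allows ties).

```python
import random
from collections import defaultdict
from statistics import variance


def simulate_season(expected_margins, actual_margins, seed=0):
    """Draw one simulated final margin per game.

    Each game is drawn from Normal(mean = its expected margin, variance = sample
    variance of the actual margins in all games sharing that spread). Spreads seen
    only once fall back to the whole season's variance (an assumption).
    """
    groups = defaultdict(list)
    for spread, margin in zip(expected_margins, actual_margins):
        groups[spread].append(margin)
    season_var = variance(actual_margins)

    rng = random.Random(seed)
    simulated = []
    for spread in expected_margins:
        group = groups[spread]
        var = variance(group) if len(group) > 1 else season_var
        # Round to whole points like a real score (this can produce ties)
        simulated.append(round(rng.gauss(spread, var ** 0.5)))
    return simulated


# Hypothetical data: expected margins (from the spreads) and actual final margins
expected = [3.5, 3.5, -2.5, -2.5, 7.0]
actual = [10, -4, -6, 1, 12]
print(simulate_season(expected, actual, seed=1))
```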

Here's how they match up:

/img/point-differential.png

I know that's a pretty rough simulation, and there's some weirdness in the middle. NBA games never end in a tie, and for tactical reasons they end with a one point difference less often than expected, so there's a little notch right in the center of the green curve. (If a team is down one, they foul the other team and hope they miss at least one of their free throws.) There are also more simulated games than I would expect that end with a differential of +1 or -1. There very well may be a bug in my code -- it is in the "ep 2 LAST FAIR DEAL.ipynb" notebook. So there's a big discrepancy in the middle, but the spread of the data is the same, which is the main thing I'm trying to show.

Hopefully the simulation shows that the lines shouldn't be bigger than they are, even though the final margin is frequently off from the line by many times the size of the line itself. If the line is Dallas -3 and the other team wins by 27, that doesn't mean the line was 30 points wrong. The line is meant to be an estimate of the mean outcome if the teams played each other a large number of times. We only ever see one sample, though, and a lot of times it is far from the mean.

Jul 16, 2025

Cool Parlay, Bro

(This is an excerpt from my book about sports gambling. Code and early drafts of some of the chapters can be found at https://github.com/csdurfee/book.)

Sportsbooks have many ways of encouraging people to lose their money as quickly and efficiently as possible. One of the best ways to do this is a type of bet called the parlay. "Parlez" comes from the French verb "parler," to talk, so it's no surprise dudes always want to talk about them online.

The idea behind a parlay is that you can bet on multiple events at once: if they all win, you make a nice profit; otherwise, you lose. On a technical level, if the first bet in the parlay wins, the stake plus winnings are immediately placed on the second bet; if that wins, everything rolls over to the third bet, and so on. It's a sequence of bets, with the stakes going up with each bet. The individual bets in the parlay are known as "legs".
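The rollover mechanics are easy to sketch. Assuming every leg is priced at standard -110, rolling $100 through four winning legs shows how much is riding on each leg, and the final profit lands right on the $1228.33 online payout discussed below. `american_to_decimal` and `parlay_rollover` are hypothetical helper names.

```python
def american_to_decimal(odds):
    """Total returned per $1 staked on a win, e.g. -110 -> 1.909..."""
    return 1 + (100 / -odds if odds < 0 else odds / 100)


def parlay_rollover(stake, legs):
    """Amount riding on each leg, and the final profit, if every leg wins."""
    riding = []
    bankroll = stake
    for odds in legs:
        riding.append(bankroll)
        bankroll *= american_to_decimal(odds)  # stake plus winnings roll forward
    return riding, bankroll - stake


riding, profit = parlay_rollover(100, [-110] * 4)
print([round(x) for x in riding], round(profit, 2))
# [100, 191, 364, 696] 1228.33
```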

I'm going to keep asking this question: why would they be offering this bet if it was good for you? Parlays might be more fun, but that just means they found a way to get you to part with your money more easily, which sounds like a bad thing.

We can compare different bets by using Expected Value (EV). EV is the weighted average of all the possible outcomes. If the expected value is positive, we will make money over the long run; if it's negative, we will lose money.

The traditional payout on a 4 team parlay is 10:1. What is the expected value of such a play? Is it higher or lower than taking the individual bets?

It's easy to grind the math on this one, and see which option is better. I can't say "make us more money" because both types of bets are guaranteed losers without some sort of edge.

Parlays are such a bad type of bet on the surface that to understand them, I have to give a little taste of parlay culture first.

Nephew Doug

Say you want to place some bets. You just learned about betting on sports, so as a newbie you're trying to learn from experts by listening to gambling podcasts. These guys have been gambling for decades. Surely they must know what's up. Their wisdom will give you the edge for sure. Surely they will keep you from making costly mistakes.

You've listened to Nephew Doug's podcast and written down his Locks of the Week. You're ready to enter them into your betting app, which has been hand-optimized to be as much of a dopamine and money sink as possible.

Because you listen every week, you know Nephew Doug has been burdened by the Gods with the gift of prophecy. Just ask him. He's like a modern day Cassandra, only it's about how the Cowboys are always going to suck.

Now, the Gods like a little competition. The Olympics were invented as a religious ceremony in their honor, after all. But they're not above making a call from on high to nudge the result a little bit. Yes, Zeus is definitely a Chiefs fan.

So you believe ahead of time Nephew Doug's picks will win 55% of the time. Which way of betting these picks will bring you the most money in the long run?

1) "throw 'em all in a 4 team parlay" like Doug and his buddy Jorts Guy do
2) randomly choose 3 of Nephew Doug's picks and bet them individually. Don't do anything with the 4th one.

Maybe option 2 seems insane to you. But let's game it out.

You listen to Nephew Doug, but I don't. My assumption would be that this guy is no better than a coin flip -- he only wins 50% of the time, or close to it. Parlays at old school casinos pay 10:1 and online sportsbooks pay 12.28:1. Let's see how things work out at the 10:1 payout, the kind Jorts Guy and Nephew Doug were cutting their teeth on back in the day.

There are two ways to do the comparison. We could bet $100 on the parlay, and compare to putting $25 on each leg. Or we could compare $100 on the parlay to $100 on each leg.

Neither way of comparing is entirely fair, though, because the stakes increase throughout each leg of the parlay. If a gambler bets $100 on a 4 team parlay, they're risking $100 on the first leg. Assuming they keep winning (at standard -110 odds on each leg), they're risking about $190 on the 2nd leg, about $365 on the 3rd leg, and about $700 on the 4th leg.

On each leg of the parlay, the gambler should consider that a loss wipes out the roughly $600 they would have won on a winning 3 team parlay. Risking $100 on the 4 leg parlay is sort of like taking 4 different $600 bets, because each one could cost the gambler that much if it loses.

I will be comparing risking $100 on the parlay versus $25 on each of the legs.

7x worse

This difference only matters for bettors with a high degree of skill. For the beginner, who we can reasonably assume will do no better than a coin flip, the parlay is always a worse option. Almost 7 times worse. The parlays lose about 30% per bet, versus 4.5% for the straight bets.
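Here's the expected value arithmetic for the coin-flip bettor, as a sketch. It assumes the old-school 10:1 payout and -110 straight bets, and expresses both as expected profit per dollar risked, so $100 on the parlay and $25 on each leg compare directly.

```python
def parlay_ev(p, legs=4, payout=10.0):
    """Expected profit per $1 on a parlay paying `payout`-to-1 if every leg wins."""
    win = p ** legs
    return win * payout - (1 - win)


def straight_ev(p, risk=110, to_win=100):
    """Expected profit per $1 risked on a straight bet (default -110)."""
    return p * to_win / risk - (1 - p)


p = 0.5  # coin-flip bettor
print(f"parlay {parlay_ev(p):+.2%}, straight {straight_ev(p):+.2%}, "
      f"ratio {parlay_ev(p) / straight_ev(p):.1f}")
# parlay -31.25%, straight -4.55%, ratio 6.9
```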

Yikes. Maybe parlays are fun, but it's like blowing the whole week's vig budget on Sunday when compared to taking one regular bet a day.

Even a stretch of good luck is going to get swallowed up real fast if you're losing 30% of the stake on average. These types of parlays are a sadness machine.

Partially blessed

What if Nephew Doug truly has been partially blessed by the Gods, and can beat the lines 55% of the time? That's pretty good. Only a small percentage of sports bettors can achieve that, in my research.

The gambler should turn a profit either way, but maybe parlays offer a better return?

The parlays have an expected return of +0.66%, versus +5% for the straight bets. The straight bets make 7.6x as much money.

OK, what about if we just don't play the 4th bet? We bet the first three legs and skip the 4th one, putting the $25 for that bet in our piggy bank at a 0% interest rate.

We're throwing out a bet with positive expected value, and risking angering the Gods by ignoring their chosen sports prophet, Nephew Doug. Perhaps that will tilt things in the parlay's favor.

Nope! The straight bets have a return of +3.75%, which is still 5.7x better than the parlays.

Finally, let's say we flip a coin to decide the 4th bet. It will only win 50% of the time, which means it's a guaranteed loser because of the vig -- you win less than you have to risk. (There is much more about the vig in the book.)

Nope! The coin flip hurts our profitability, but we're still clearing +2.6%, which is 4x better than the parlays. We'd have to do 2 of 4 bets by coin flip for the parlays to be more profitable.
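The same arithmetic covers all of the 55% scenarios above. This sketch assumes the old-school 10:1 parlay and $25 per straight bet at -110, and expresses everything per $100 staked. The variable names are mine.

```python
STRAIGHT_WIN = 100 / 110  # profit per $1 on a winning -110 bet


def straight_ev(p):
    """Expected profit per $1 risked on a -110 straight bet."""
    return p * STRAIGHT_WIN - (1 - p)


p4 = 0.55 ** 4
parlay = p4 * 10 - (1 - p4)  # old-school 10:1 parlay, per $1 risked

# Straight bets: $25 per leg, expressed per $100 of total stake
all_four = 4 * 0.25 * straight_ev(0.55)
skip_fourth = 3 * 0.25 * straight_ev(0.55)           # 4th $25 stays in the piggy bank
coin_fourth = skip_fourth + 0.25 * straight_ev(0.5)  # 4th pick made by coin flip
two_coins = 2 * 0.25 * straight_ev(0.55) + 2 * 0.25 * straight_ev(0.5)

for name, ev in [("parlay", parlay), ("all four", all_four),
                 ("skip 4th", skip_fourth), ("coin 4th", coin_fourth),
                 ("two coins", two_coins)]:
    print(f"{name:10s} {ev:+.2%}")
```

With two coin-flip picks, the straight bets drop to roughly +0.23%, below the parlay's +0.66%.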

To be fair, 55% is just barely profitable for an old-school parlay. Profitability climbs steeply as the win rate goes up, because all four legs have to win. There is a win rate where parlays would make more money than the straight bets. If you win 100% of the time, the parlay is definitely a better deal, right? That's a $1000 profit on the parlay versus about $91 on the four $25 straight bets.
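The win rate where parlays overtake straight bets can be solved directly. Per dollar risked, the 10:1 parlay returns 11p^4 - 1 and a -110 straight bet returns 21p/11 - 1; setting them equal gives p^3 = 21/121. This sketch assumes the old-school 10:1 payout and four independent legs that each win with probability p.

```python
def parlay_ev(p):
    """Old-school 10:1 four-leg parlay, expected profit per $1: 10*p^4 - (1 - p^4)."""
    return 11 * p**4 - 1


def straight_ev(p):
    """-110 straight bet, expected profit per $1: p*(100/110) - (1 - p)."""
    return 21 * p / 11 - 1


parlay_breakeven = (1 / 11) ** 0.25  # 11p^4 = 1
crossover = (21 / 121) ** (1 / 3)    # 11p^4 = 21p/11  ->  p^3 = 21/121
print(f"{parlay_breakeven:.1%} {crossover:.1%}")  # 54.9% 55.8%
```

Under these assumptions, parlays only pull ahead of straight bets above roughly a 55.8% win rate, a hair above Nephew Doug's 55%.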

Successful handicappers who sell their picks on the internet are only winning around 55% of the time. If you have to do as well at something as people who do it for a living just to break even, that's not a great plan.

Online Parlays

Online 4 leg parlays pay out $1228 per $100 risked, which makes them a little less bad. However, they're still way worse for the average gambler than taking the straight bets. At a 50% win rate, online parlays have an expected return of -17%, versus -4.5% for the straight bets. So they're 3.7x worse. They're half as bad as the old school parlays, but still terrible.

That $1228.33 payout for online parlays was chosen deliberately. It means that online parlays have the same break-even point as the individual bets (at standard vig) -- winning 52.4% of the individual bets. Because parlay profits climb so steeply with the win rate, a skilled bettor with a 55% win rate will have a much higher EV with the parlays: a +21.6% rate of return, versus +5%.
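The $1228.33 figure is what you get by rolling $100 over four -110 bets, which is why the break-even point matches the straight bets. A sketch of that arithmetic, assuming four independent legs at -110:

```python
# Rolling $1 over four -110 legs: each win turns $1 into $210/110
ONLINE_PAYOUT = (210 / 110) ** 4 - 1  # profit per $1, about 12.2833


def parlay_ev(p, payout=ONLINE_PAYOUT, legs=4):
    """Expected profit per $1 on a parlay paying `payout`-to-1 if all legs win."""
    win = p ** legs
    return win * payout - (1 - win)


# Break-even: p^4 * (1 + payout) = 1, which works out to 110/210
breakeven = (1 / (1 + ONLINE_PAYOUT)) ** 0.25
print(f"{ONLINE_PAYOUT * 100:.2f} {breakeven:.1%} "
      f"{parlay_ev(0.5):+.1%} {parlay_ev(0.55):+.1%}")
# 1228.33 52.4% -17.0% +21.6%
```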

EV doesn't describe the range of outcomes

Expected Value is a good way of determining whether you can make money taking a certain type of bet, but it doesn't describe the range of possible outcomes. With parlays, a lot of those outcomes are bad, even for a gambler with enough skill to make them more profitable on paper.

Simulations are great in this sort of situation, because they can convey the range of possible outcomes in a way EV can't. I simulated 200 individual bets versus 50 parlays, and ran that 10,000 times. Our virtual gambler wins 55% of the time, and bets $100 on parlays, $25 on each individual bet.
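Here's a rough sketch of that kind of simulation, not the original code. It assumes the online $1228.33 payout, builds each parlay from four consecutive picks out of the same 200, and counts "big losses" as being down more than $1000. The seed and helper names are arbitrary.

```python
import random

PARLAY_PAYOUT = 1228.33        # online 4-leg payout per $100 (assumed)
STRAIGHT_WIN = 25 * 100 / 110  # profit on a winning $25 bet at -110


def one_season(rng, p=0.55, n_bets=200, legs=4):
    """Profit from 200 $25 straight bets vs. 50 $100 parlays built from the same picks."""
    picks = [rng.random() < p for _ in range(n_bets)]
    straight = sum(STRAIGHT_WIN if won else -25 for won in picks)
    parlay = sum(PARLAY_PAYOUT if all(picks[i:i + legs]) else -100
                 for i in range(0, n_bets, legs))
    return straight, parlay


def summarize(trials=10_000, p=0.55, seed=1):
    """Share of seasons where straight bets beat parlays, and big-loss rates for each."""
    rng = random.Random(seed)
    results = [one_season(rng, p) for _ in range(trials)]
    straight_better = sum(s > par for s, par in results) / trials
    parlay_big_loss = sum(par < -1000 for _, par in results) / trials
    straight_big_loss = sum(s < -1000 for s, _ in results) / trials
    return straight_better, parlay_big_loss, straight_big_loss


print(summarize())
```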

The individual bets made more money (or lost less money) than the parlays 38% of the time. Just because the expected value is higher for the parlays, that doesn't mean they will always be more profitable.

More concerningly, the parlays had big losses (down more than $1000 on $100 bets) 33% of the time. That only happened 0.2% of the time on the straight bets. There were almost no small losses with the parlays, because the payout is so high and the number of bets (50 parlays) is so low. Winning one more parlay could be the difference between being down $1000 and breaking even.

Expected Value can't be the only thing we consider, because we don't live an infinite life. Our whole life is a small sample size, if the variance is high enough. Our bankrolls are always finite. The fact that we might make more money over 100 years is undercut by the fact that we'll die or go broke before then.

Even for the skilled bettor, parlays make it more of a game of luck. Let's say my simulation represents an entire season of betting on basketball. Imagine playing the parlays with a 55% win rate, being better than almost everybody at handicapping, and still having massive losses one season in three, while the individual bets would have made you more money almost 40% of the time.

Parlay psychology

There's a weird psychology to the parlay as well. These parlays only win once every couple of weeks, so they'd be kind of a grim strategy in practice. A good bettor taking the individual bets will have winning days over half the time. Is it better to feel like a winner most days, or every other week?

Most people don't have to consider that question, though. Without a huge amount of skill at betting, the only scenario where they might make sense is as a lottery: something that can deliver a tiny chance of massive payouts without any skill.

What happens if we do the same simulation, but the win rate is 50%, like a coin flip, or most sports bettors? The parlays make money 38% of the time! Taking 200 straight bets will only make money 26% of the time. Isn't that a little surprising? Even though the straight bets have better expected value (well, less bad), they also offer less of an opportunity to make money based on chance alone.

What about over a longer time frame? I simulated 500 parlays versus 2,000 straight bets by flipping a coin. The parlays make money 12% of the time, versus only 1.74% of the time for the straight bets. However, the losses are much, much bigger than the wins:

img/parlay-500.png

12% is 1 in 8, which isn't that rare. 500 parlays could end up stretching over multiple seasons, perhaps a lifetime of sports betting. That means somebody who was making picks by flipping a coin could end up looking like a pretty good bettor for a long stretch if they are taking parlays. Of course, 88% of people will lose money, far more money than the 12% of profitable bettors win. In practical terms, it's like a lottery where you have a 12% chance of winning $3617, but an 88% chance of losing $10,221.
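For the coin-flip bettor, these probabilities can also be computed exactly from the binomial distribution instead of simulated. This sketch assumes independent picks, the $1228.33 online payout, and -110 straight bets. It works in log space because terms like 0.5^2000 underflow a float. The results should land close to the simulated figures above.

```python
from math import exp, lgamma, log


def binom_pmf(i, n, p):
    """Binomial probability computed in log space to avoid float underflow."""
    log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
               + i * log(p) + (n - i) * log(1 - p))
    return exp(log_pmf)


def prob_profit(n_bets, p_win, win_amount, loss_amount):
    """Chance that total profit is > 0 after n_bets independent bets."""
    # profit = W*win - (n - W)*loss > 0  <=>  W > n*loss / (win + loss)
    threshold = n_bets * loss_amount / (win_amount + loss_amount)
    k_min = int(threshold) + 1  # fewest wins that puts you in the black
    return sum(binom_pmf(i, n_bets, p_win) for i in range(k_min, n_bets + 1))


coin_parlay = 0.5 ** 4  # all four coin-flip legs must win
for parlays, straights in [(50, 200), (500, 2000)]:
    par = prob_profit(parlays, coin_parlay, 1228.33, 100)
    st = prob_profit(straights, 0.5, 100, 110)
    print(parlays, f"{par:.1%}", straights, f"{st:.2%}")
```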

It's really, really hard to tell if someone taking bets at long odds is actually good at betting, not without thousands of documented bets. Over 50 parlays, or even 500, it's not that surprising for some people to look smart on parlays by chance alone. It would be much better to assess their skill based on the individual bets they took within the parlays.

Other parlays

Parlays with only 2 or 3 legs have higher relative payouts compared to 4 leg parlays, so they're not nearly as bad. They'll also have less variance in outcomes than the 4+ leggers. I'll leave those calculations to the reader, though. While they're among the least bad bets offered by the average sportsbook, it's extremely rare to see people talking about 2 or 3 leg parlays online. Gamblers love the higher payouts and drama of parlays with a bunch of legs. I will have a lot more to say about how people actually play the parlays in a future installment.

Same Game Parlays (SGPs) are a new type of bet which allows the player to make multiple wagers on the same game. For instance, someone could bet on their favorite team winning and their favorite player scoring over a certain number of points and the guy they hate on the other team scoring under a certain number of points, with a big payout if all 3 things happen. SGPs have become the most popular type of bet I see online, and deserve their own lengthy discussion. For now, just think of them as the vape pens of betting. They're obviously super addictive, extremely popular with younger people, and you can't really know what's in them, but it's probably bad.

Gambling gurus

I've taken up a lot of hobbies over the years. It seems like every time I take up a new hobby, I end up spending a lot of money on stupid stuff at the beginning. Then I get into it more, and realize what matters.

There is an adverse selection process, where people who are new to a hobby have no idea what's actually good, what they actually need, or what things should actually cost. Filled with zeal to get started, they end up overpaying for inferior goods. Same thing for travelling in a new country. The guys at the train station trying to hustle you into a taxi are definitely not hooking you up with the cheapest way to get around.

Betting experts, the kind who have podcasts and big followings on social media, are supposed to know more about this stuff than the average person. They're supposed to be like the guidebooks, or the seasoned traveller telling the newbie to walk 2 blocks and take the metro for $2 instead of paying $100 for a taxi. Yet they're pushing parlays and other sucker bets, and pushing sportsbooks that charge full vig and ban anybody who wins too much. The "experts" are pushing beginners into bad situations.

Most of them don't do any better than flipping a coin, so I doubt they're actually making money on their "can't miss locks of the week". Gambling ads, sure. Everybody's taking gambling money right now, regardless of how it will hurt their brand, their audience, and sports long term. Clearly there's a lot of money in talking about it. That gambling money is there because these self-styled experts bring the sportsbooks more customers -- losing customers, specifically.

Where's all that ad money coming from? The sportsbooks wouldn't be throwing money at influencers who were actually winning consistently. Any gambling show they sponsor is pretty much guaranteed to lose you money, or it wouldn't be sponsored. Any bet they're promoting heavily, like they do with parlays, is because it makes them more money that way. You shouldn't need to know any math to figure out why they're pushing teasers and parlays and "profit boosters". I love that last one. It's like saying Idi Amin served mankind. Why would they care about boosting YOUR profits?

Unlike gamblers, sportsbooks don't make negative expected value plays due to emotions or lack of information. Your irrationality is their entire business.

Jun 30, 2025

The final word on the hot hand (for now)

(Notebooks and other code available at: https://github.com/csdurfee/hot_hand.)

Last time, we found that there are many players who, like LeBron, have a higher FG% when they've missed most of their last 5 shots than when they've made most of them. However, most players don't have enough attempts after going 0 or 5 for their last 5 to support a good statistical analysis.

So instead I will be looking at a binary split -- I will call a player cold when they've made 0, 1 or 2 of their last 5 shots, and hot when they've made 3, 4 or 5 of their last 5. Most players have a FG% between 40 and 60%, so this nicely splits them into times when they're shooting better than average versus worse than average.
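Here's a minimal sketch of the hot/cold split. The input format (one list of makes and misses per game, in shot order) and the choice to reset the 5-shot window at the start of each game are assumptions on my part. `hot_cold_fg` is a hypothetical helper.

```python
def hot_cold_fg(games, window=5, hot_min=3):
    """FG% on shots taken 'hot' vs 'cold'.

    `games` is a list of games, each a list of True (make) / False (miss) in shot
    order. A shot is hot if at least `hot_min` of the previous `window` shots in the
    same game went in, cold otherwise; the first `window` shots of a game are skipped.
    """
    tally = {"hot": [0, 0], "cold": [0, 0]}  # [makes, attempts]
    for shots in games:
        for i in range(window, len(shots)):
            recent_makes = sum(shots[i - window:i])
            key = "hot" if recent_makes >= hot_min else "cold"
            tally[key][0] += shots[i]
            tally[key][1] += 1
    return {key: (makes / att if att else None) for key, (makes, att) in tally.items()}


# Hypothetical single game: make, make, make, miss, miss, make, miss, make
print(hot_cold_fg([[True, True, True, False, False, True, False, True]]))
# {'hot': 0.5, 'cold': 1.0}
```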

Anthony Edwards

Anthony Edwards ("Ant") is particularly unstreaky for a young player. He's only completed 5 seasons in the league, but has the 5th biggest z score of the last 20 years. He could definitely catch LeBron someday.

Ant has the LeBron-like pattern of FG% trending downward when he's hot. He doesn't have anywhere near the volume of LeBron, so the spike at 20% (1/5) might just be noise. But overall, he shoots worse when he's been shooting well.

ant-last-5

The trend appears to be due to shot selection. He takes far more above the break 3 pointers when he's hot than when he's cold. The additional 3 point attempts come at the expense of shots in the restricted area.

Here are the changes in tendencies:

| BASIC_ZONE            |   hot |   cold |   diff |
|:----------------------|------:|-------:|-------:|
| Above the Break 3     |  41.5 |   31.7 |    9.8 |
| Corner 3              |   3.6 |    4.8 |   -1.2 |
| In The Paint (Non-RA) |  13.4 |   14.6 |   -1.1 |
| Mid-Range             |  14.1 |   12.2 |    1.9 |
| Restricted Area       |  27.4 |   36.7 |   -9.4 |

Of course, this would be justified if Ant shot above the break 3's better when he's hot, but he doesn't. He makes 37% of his above the break 3's when he's cold but that drops to 34% when he's hot. So he's trading restricted area shots, with an expected value of .601 * 2 = 1.202 points, for above the break 3's, with an expected value of .34 * 3 = 1.02 points.

Here are the changes in FG percentages. His FG% on corner 3's goes up, but it's on insignificant volume:

| BASIC_ZONE            |   hot |   cold |   diff |
|:----------------------|------:|-------:|-------:|
| Above the Break 3     |  34   |   37.1 |   -3.1 |
| Corner 3              |  45   |   33   |   12   |
| In The Paint (Non-RA) |  40.8 |   34   |    6.8 |
| Mid-Range             |  36.3 |   34.8 |    1.5 |
| Restricted Area       |  60.1 |   65.4 |   -5.2 |

The rest of the league

I looked at league-wide shot selection in hot/cold situations. I restricted to the last 10 seasons, since the rise of the 3 pointer has dramatically changed shot selection. Here are changes in shot selection for all players:

| BASIC_ZONE            |   hot |   cold |   diff |
|:----------------------|------:|-------:|-------:|
| Above the Break 3     |  22.1 |   22.3 |   -0.2 |
| Corner 3              |   6.2 |    7   |   -0.9 |
| In The Paint (Non-RA) |  15.8 |   15.3 |    0.4 |
| Mid-Range             |  25.3 |   23.5 |    1.8 |
| Restricted Area       |  30.7 |   31.8 |   -1.2 |

The mid-range shot is the lowest value shot type, so it's notable that the rate goes up when players are hot. These additional mid-ranges come at the expense of Corner 3's and Restricted Area shots, the two most valuable types of shots.

As before, changes in shot selection could be justified if players actually shoot differently based on their last 5 results, but they don't. Here are the changes in shooting percentages (hot minus cold) for all players:

| BASIC_ZONE            |   hot |   cold |   diff |
|:----------------------|------:|-------:|-------:|
| Above the Break 3     |  34.7 |   35   |   -0.3 |
| Corner 3              |  38.4 |   38.9 |   -0.4 |
| In The Paint (Non-RA) |  41.7 |   41.2 |    0.5 |
| Mid-Range             |  39.8 |   40.1 |   -0.3 |
| Restricted Area       |  62.7 |   60.7 |    2   |

For 3 out of 5 shot types, the hot FG percentages are lower than the cold ones. Combined with the changes in shot selection, I think there's evidence that the league as a whole is scoring less efficiently because of the false belief in the hot hand.

The data says that players are essentially trading Restricted Area (.627 * 2 = 1.25 points per shot) and Corner 3 (.384 * 3 = 1.15 points per shot) attempts for Mid-Ranges (.398 * 2 = .796 points per shot) when they think they've got the hot hand. That's clearly bad! If it happens once a game, that's 38 points a year lost, which might be enough to swing a game or two.
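The points-per-shot figures are FG% times the shot's point value. This sketch also checks the 38-point figure, under my assumption that it comes from swapping one Restricted Area shot for a mid-range once per game over an 82-game season.

```python
# League-wide FG% when hot (last 10 seasons), from the table above
HOT_FG = {"Restricted Area": 0.627, "Corner 3": 0.384, "Mid-Range": 0.398}
POINTS = {"Restricted Area": 2, "Corner 3": 3, "Mid-Range": 2}

points_per_shot = {zone: HOT_FG[zone] * POINTS[zone] for zone in HOT_FG}
print({zone: round(v, 3) for zone, v in points_per_shot.items()})
# {'Restricted Area': 1.254, 'Corner 3': 1.152, 'Mid-Range': 0.796}

# Assumed: one Restricted Area shot swapped for a mid-range per game, 82 games
loss_per_swap = points_per_shot["Restricted Area"] - points_per_shot["Mid-Range"]
print(round(loss_per_swap, 3), round(82 * loss_per_swap, 1))  # 0.458 37.6
```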

The change in restricted area and in the paint (non-RA) FG% is intriguing, but if the hot hand did exist, wouldn't we see it on 3 point or mid-range shots, rather than restricted area shots? The announcer doesn't say "he's heating up" after a guy has made 3 layups in a row, they say it after 3 longer range shots in a row, right?

Higher volume players

I decided to focus on players with at least 1000 streaks, which leaves 635 players. Collectively, they are responsible for 84% of all shots in the NBA over the last 20 years.

Their FG percentages are, on average, 1% lower when they are hot than when they are cold.

68% of them shoot worse when they're hot than when they're cold, which is a pretty dramatic split.

fg-pct-hot-cold

Here's a plot of the difference between hot and cold FG% versus z-score:

z-score-hot-cold

Players with negative values on the x axis shoot better when they're cold, and positive values shoot better when they're hot.

Now, there should be some correlation between z-scores and hot/cold shooting tendency. I've shown simulations where a tendency to shoot better cold produces unstreaky results (skewed towards positive z scores), and better hot will produce streaky results (negative z scores). So there should be more dots in the upper left and bottom right quadrants compared to the other diagonal.

But if players behaved by coin flips, we should see roughly the same number of players with positive and negative z scores, and roughly the same number of players who shoot better when they're hot and better when they're cold.

I simulated all 3.5 million shots by these players, using their career average FG% for every shot. So any streakiness or unstreakiness is going to be totally random. As you can see, the data is much less spread out across both the X and Y axis.

sim-z-hot-cold

Here are the crosstabs from the simulation:

|            |   better cold |   better hot |   margin |
|:-----------|--------------:|-------------:|---------:|
| positive z |           178 |          135 |      313 |
| negative z |           112 |          210 |      322 |
| margin     |           290 |          345 |          |

As promised, the marginal values are pretty close to one another. That's what happens when "better hot" vs. "better cold" and "positive z" vs. "negative z" are determined purely by chance.

Here are the actual crosstabs. The marginal values are much more imbalanced.

|            |   better cold |   better hot |   margin |
|:-----------|--------------:|-------------:|---------:|
| positive z |           343 |          126 |      469 |
| negative z |            88 |           78 |      166 |
| margin     |           431 |          204 |          |

Things to note:

  • 68% of the players shoot better when they're cold.
  • 74% of the players have a positive z-score.
  • Even among players with a negative z score, the majority of them shoot better when they're cold.
  • Even among players that shoot better when they're hot, the majority of them still produce results that are less streaky than expected by chance.
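The percentages in that list come straight from the actual crosstab. A short sketch of the arithmetic, with the simulated "better hot" + "negative z" count (210) included for comparison:

```python
# Actual crosstab: rows are the sign of the z-score, columns are where FG% is higher
actual = {
    "positive z": {"better cold": 343, "better hot": 126},
    "negative z": {"better cold": 88, "better hot": 78},
}
SIMULATED_HOT_NEG = 210  # same cell in the coin-flip simulation

total = sum(sum(row.values()) for row in actual.values())
better_cold = sum(row["better cold"] for row in actual.values())
positive_z = sum(actual["positive z"].values())
missing = SIMULATED_HOT_NEG - actual["negative z"]["better hot"]

print(total, f"{better_cold / total:.0%}", f"{positive_z / total:.0%}", missing)
# 635 68% 74% 132
```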

That's all super weird!

As always, these are just general trends. There are 78 players in the "better hot" + "negative z" box, and there should be around 210 players. We can't really say which players are the roughly 130 "missing" players, though.

That's all I've got on the hot hand in the NBA for now. I think I understand it a lot better now, and I hope you do, too.

Jun 18, 2025

LeSimulation

(As usual, all code and notebooks are available at https://github.com/csdurfee/hot_hand)

Last time, we saw that LeBron James was by far the un-streakiest player in the NBA over the last 20 years and found out that it's at least partly caused by shot selection. He takes lower-percentage shots than average when he's shooting well, and higher-percentage shots than average when he's shooting poorly.

LeMartingale

I was asked why it's OK to use a player's overall FG% to gauge their streakiness. We know that every shot a player takes has a slightly different level of difficulty, and thus a different probability that it will go in. Shouldn't that affect the streakiness?

It's a good question. Let's say you've got a bag with 2 types of coins inside. One of them comes up heads 40% of the time, the other comes up heads 60% of the time. You can't tell which is which. If you pick a coin randomly out of the bag and flip it, what are the chances, on average, it comes up heads?

It's 50%, right? The selecting of the coin and the flipping of the coin are two independent steps. We can multiply the probabilities at each step together, so the overall chances of heads are (.5 * .4) + (.5 * .6) = .5. If we kept randomly selecting from the bag and flipping a coin, the results would be indistinguishable from just flipping a single fair coin over and over.
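A quick simulation makes the bag-of-coins argument concrete. To measure streakiness it uses a Wald-Wolfowitz runs test, where more runs than expected gives a positive z (unstreaky) and fewer gives a negative z (streaky). I'm assuming that's close to the z-score used in this series, but the exact statistic may differ.

```python
import random
from math import sqrt


def runs_z(seq):
    """Runs test z-score: positive = more alternation than chance (unstreaky)."""
    n1 = sum(seq)
    n2 = len(seq) - n1
    n = n1 + n2
    runs = 1 + sum(a != b for a, b in zip(seq, seq[1:]))
    expected = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n**2 * (n - 1))
    return (runs - expected) / sqrt(var)


rng = random.Random(42)
flips = []
for _ in range(100_000):
    p_heads = rng.choice([0.4, 0.6])      # grab a coin from the bag at random
    flips.append(rng.random() < p_heads)  # then flip it

heads_rate = sum(flips) / len(flips)
z = runs_z(flips)
print(round(heads_rate, 3), round(z, 2))
```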

In math, this is known as a Martingale. Previous outcomes don't give us information about the next event. (More in depth explanation here). That's different from LeBron. We know he essentially chooses the 60% heads coin when he's been getting a lot of tails recently, and the 60% tails coin when he's been getting a lot of heads recently.

LeSimulation

If I create a simulation of LeBron James that uses his exact shooting tendencies and FG percentages, and the shot selection is totally random, it shouldn't show any streaky or unstreaky tendencies beyond what we'd expect by chance. Let's see what LeSimulation looks like.

At the end of the last edition, I got LeBron's shooting stats:

Above the Break 3        0.344598
Backcourt                0.058824
In The Paint (Non-RA)    0.401369
Left Corner 3            0.394799
Mid-Range                0.379890
Restricted Area          0.720138
Right Corner 3           0.370370

And shooting tendencies (what percent of the time he takes each type of shot):

Above the Break 3        0.204940
Backcourt                0.001160
In The Paint (Non-RA)    0.109652
Left Corner 3            0.014431
Mid-Range                0.267715
Restricted Area          0.386442
Right Corner 3           0.015660

The simulation randomly chooses a shot type, based on the actual tendencies, then attempts a shot at the corresponding FG%.
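Here's a sketch of LeSimulation using the numbers above. It's not the original notebook code: the career length (20,000 shots), the number of runs (100), and the use of a single career-long runs test rather than whatever per-game bookkeeping the original uses are all assumptions.

```python
import random
from math import sqrt
from statistics import mean, stdev

# LeBron's FG% and shot tendencies by zone, from the numbers above
FG = {
    "Above the Break 3": 0.344598, "Backcourt": 0.058824,
    "In The Paint (Non-RA)": 0.401369, "Left Corner 3": 0.394799,
    "Mid-Range": 0.379890, "Restricted Area": 0.720138, "Right Corner 3": 0.370370,
}
TENDENCY = {
    "Above the Break 3": 0.204940, "Backcourt": 0.001160,
    "In The Paint (Non-RA)": 0.109652, "Left Corner 3": 0.014431,
    "Mid-Range": 0.267715, "Restricted Area": 0.386442, "Right Corner 3": 0.015660,
}


def runs_z(seq):
    """Runs test z-score: positive = more alternation than chance (unstreaky)."""
    n1 = sum(seq)
    n2 = len(seq) - n1
    n = n1 + n2
    runs = 1 + sum(a != b for a, b in zip(seq, seq[1:]))
    expected = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n**2 * (n - 1))
    return (runs - expected) / sqrt(var)


def simulate_career(n_shots, rng):
    """Pick each shot's zone at random by tendency, then make it at that zone's FG%."""
    zones = list(TENDENCY)
    picks = rng.choices(zones, weights=[TENDENCY[z] for z in zones], k=n_shots)
    return [rng.random() < FG[z] for z in picks]


rng = random.Random(0)
z_scores = [runs_z(simulate_career(20_000, rng)) for _ in range(100)]
print(round(mean(z_scores), 2), round(stdev(z_scores), 2))
```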

le-fake-career

The z-scores look like they should -- mean very close to 0, standard deviation close to 1. No streaky/unstreaky tendencies, as promised. Randomly mixing shots with different FG percentages doesn't, by itself, make a player look streaky or unstreaky.

LeSimulation 2 - last 5 FG%

My next simulation uses LeBron's FG% over his last 5 shots. We've seen he shoots the best with 0 makes in his last 5; the worst with 5 makes in his last 5. The simulation uses his exact percentages at each level. For the first 5 shots of every game, it uses his career FG%.

I ran the simulation 1,000 times. Here are the z-scores:

le-fake-career-2

As expected, this simulation is pretty un-streaky:

count    1000.000000
mean        1.635843
std         0.985464
min        -1.665389
25%         1.001550
50%         1.623242
75%         2.346864
max         4.509869

It's still not nearly as unstreaky as the man himself, though -- LeBron's z-score of 5.9 would be way bigger than the largest value in 1,000 simulations (4.5). So he'd still be an outlier compared to these simulated un-streaky players.

LeSimulation 3 -- No resetting streaks

What about a fake player where the streaks don't reset between games? That should make the simulated player even more unstreaky.

In this version of the simulation, every shot will be influenced by the FG% of the previous 5 shots, even if they happened in the previous game(s).
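A sketch covering both the last-5 simulation and this no-reset variant: `reset_each_game=True` clears the shot history at the start of every game, and `False` carries it across games. The last-5 FG% values come from the "last 5" table in this post. `CAREER_FG`, `SHOTS_PER_GAME`, and `N_GAMES` are hypothetical settings, and the z-score treats the whole career as one long sequence, which may not match how the original notebook aggregates games.

```python
import math
import random
from collections import deque

# FG% after 0, 1, ..., 5 makes in the previous 5 shots (the "last 5" table in this post).
FG_BY_LAST5 = [0.564612, 0.50712, 0.505937, 0.496538, 0.473849, 0.464052]
CAREER_FG = 0.506      # assumption: blended FG% from the zone tables, used until 5 shots are known
SHOTS_PER_GAME = 20    # assumption: hypothetical fixed number of attempts per game
N_GAMES = 1450         # assumption: 1450 * 20 = 29,000 attempts, roughly a LeBron-sized career

def simulate_career(rng, reset_each_game=True):
    """Return a career of 1/0 shots whose make probability depends on the last 5 results."""
    shots = []
    window = deque(maxlen=5)             # results of the most recent 5 attempts
    for _ in range(N_GAMES):
        if reset_each_game:
            window.clear()               # history starts fresh every game
        for _ in range(SHOTS_PER_GAME):
            p = FG_BY_LAST5[sum(window)] if len(window) == 5 else CAREER_FG
            made = 1 if rng.random() < p else 0
            shots.append(made)
            window.append(made)          # deque drops the oldest result automatically
    return shots

def runs_z(shots):
    """Wald-Wolfowitz z-score; positive means more streaks than expected (unstreaky)."""
    n1 = sum(shots)
    n2 = len(shots) - n1
    n = n1 + n2
    runs = 1 + sum(a != b for a, b in zip(shots, shots[1:]))
    mu = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return (runs - mu) / math.sqrt(var)

rng = random.Random(7)
reset_z = [runs_z(simulate_career(rng, reset_each_game=True)) for _ in range(10)]
carry_z = [runs_z(simulate_career(rng, reset_each_game=False)) for _ in range(10)]
print(f"reset each game: mean z = {sum(reset_z) / len(reset_z):.2f}")
print(f"carry over:      mean z = {sum(carry_z) / len(carry_z):.2f}")
```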

le-fake-career-3

Here are the corresponding z-scores:

count    1000.000000
mean        2.179821
std         0.963330
min        -1.033468
25%         1.536542
50%         2.203681
75%         2.865350
max         5.073167

So, the mean went from 1.6 to 2.2, and the max z score went from 4.5 to 5.1. That's still not nearly unstreaky enough to match LeReal LeBron, but at least it's closer.

It's possible that if we tracked the last 7 shots, or 9, instead of 5, we would see an even more dramatic change in FG percentage. Or there's some other factor I haven't considered that is adding unstreakiness, such as the fact that his FG percentage tends to go down the more shots he's taken in a game.

DoppLeGangers

I was curious if I could find similar players to LeBron. There's a good way to do that, but I wanted to try my own way first. I found players where, like LeBron, their FG% steadily declines the more shots they've made out of the last 5. There are 18 such players in the 2004-2024 years: Karl Malone (his last season), Grant Hill, Ben Wallace, Eddie House, Michael Redd, Jarvis Hayes, Andres Nocioni, JJ Redick, Nicolas Batum, Goran Dragic, DeMar DeRozan, Patrick Beverley, Marcus Morris Sr., Bradley Beal, Kelly Oubre Jr., Norman Powell, Donte DiVincenzo, and Landry Shamet.

Overall, these players have a mean z-score of 1.47, which is pretty impressive, but except for Goran Dragic, there isn't much overlap with the players with the highest overall z scores. 18 players is a pretty small sample size, as well.

I also looked at a broader set of players where at least 4 out of the 5 comparisons were decreasing. This gave 180 players, with an average z score of 1.0.
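Both of these homegrown filters boil down to counting how many of the five adjacent comparisons go down. The helper names below are hypothetical, and treating ties as "not decreasing" is an assumption.

```python
def decreasing_steps(fg_by_last5):
    """Count how many of the 5 adjacent comparisons (0 vs 1 makes, 1 vs 2, ..., 4 vs 5) go down."""
    return sum(later < earlier for earlier, later in zip(fg_by_last5, fg_by_last5[1:]))

def is_lebron_like(fg_by_last5, min_steps=5):
    """min_steps=5 is the strict 'steadily declines' rule; min_steps=4 is the 'at least 4 of 5' rule."""
    return decreasing_steps(fg_by_last5) >= min_steps

lebron = [0.564612, 0.50712, 0.505937, 0.496538, 0.473849, 0.464052]
print(decreasing_steps(lebron), is_lebron_like(lebron))  # 5 True
```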

LeRight way

The right way to identify LeBron-alikes is probably to use a similarity metric that I didn't invent. The FG percentages after 0, 1, 2, ..., 5 makes in the last 5 are sort of like a probability distribution.

In statistics and machine learning, we are often fitting a theoretical distribution to the actual observed data. Is it a good representation of the observed data? Do their distributions have the same sort of shape? The standard measure is relative entropy, also known as KL divergence.

If I normalize the shooting percentages and compare them to LeBron's, players with a low relative entropy should show the same tendency to shoot better when they're shooting worse than average over their last 5, and vice versa.

For example, LeBron's last 5 percentages are:

0    0.564612
1     0.50712
2    0.505937
3    0.496538
4    0.473849
5    0.464052

By normalizing them, they act like a probability distribution (they all add up to one) but still have the same relative proportions.

0    0.187448
1     0.16836
2    0.167968
3    0.164847
4    0.157315
5    0.154062

The normalization also corrects for the fact that shooters have different overall FG percentages.

Normalized values can then be compared to other players' values. The lower the relative entropy, the more similar their shapes are.

I also calculated the Jensen-Shannon distance, which is like relative entropy, but symmetrical (distance(le_bron, le_other_guy) = distance(le_other_guy, le_bron)).
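A sketch of the normalization and both similarity measures using SciPy. `scipy.stats.entropy(p, q)` computes the relative entropy (KL divergence) and `scipy.spatial.distance.jensenshannon` the Jensen-Shannon distance. The comparison player's numbers are made up for illustration.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import entropy

def normalize(fg_by_last5):
    """Scale FG% after 0..5 makes so the six values sum to 1 (same relative proportions)."""
    p = np.asarray(fg_by_last5, dtype=float)
    return p / p.sum()

lebron = normalize([0.564612, 0.50712, 0.505937, 0.496538, 0.473849, 0.464052])
other = normalize([0.45, 0.46, 0.47, 0.47, 0.48, 0.50])   # hypothetical player trending the other way

kl = entropy(lebron, other)           # relative entropy D_KL(lebron || other), natural log
kl_rev = entropy(other, lebron)       # generally a different number: KL is not symmetric
js = jensenshannon(lebron, other)     # Jensen-Shannon distance: same in both directions
print(lebron.round(6), kl, kl_rev, js)
```

`entropy` also normalizes its inputs internally; normalizing explicitly just makes the probability-distribution view visible.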

The closest guys to LeBron by this measure are CJ McCollum, Terry Rozier, Andrea Bargnani, Marcus Morris, Richard Hamilton, Nikola Vucevic, Zach Randolph, Lauri Markkanen, Kawhi Leonard, and Kevin Huerter.

Since Richard Hamilton had the streakiest game in the last 20 years, it's not surprising to see him. But except for Randolph and Vucevic, none of the top 10 had exceptional z scores, though they were all positive.

The Jensen-Shannon distance results were extremely similar to entropy. It agreed exactly with the entropy on 73 of the top 100 players. The average z score for those players was 1.16, versus 1.15 for entropy. So, in aggregate, both were better than my homegrown metric at identifying unstreaky players.

This graph shows the shape of the 10 players most similar to LeBron. They all have the same downward trend.

most-similar-last5

I haven't looked at whether the reason for the trend in last 5 FG% is due to shot selection for these other players, which is probably the interesting part. Some of the players flagged here are inevitably due to chance. It's based on six 50/50 measurements, so 1 in 64 players would get flagged as "LeBron like" even if the data was randomly generated.

None of my queries here turned up the un-streakiest players like Luka Doncic and Anthony Edwards. Whatever causes their extreme unstreakiness (beyond randomness) must be different from LeBron's tendencies. Stay tuned!

Jun 12, 2025

LeBron James, the Unstreaky King

In previous installments, I've shown that NBA players are, as a whole, less streaky than they should be. This is apparent in game-level data, and more obvious looking at multi-season trends.

So far, I've only looked at the past few seasons of the NBA. I decided to gather as much data as I could, analyzing every single shot taken in the NBA from 2004 to 2024.

Data is taken from https://www.kaggle.com/datasets/mexwell/nba-shots.

The streakiest games of the past 20 years

As I showed in the last installment, there are two ways of measuring how relatively streaky each individual game is. We can use the normal approximation from the Wald-Wolfowitz test, or we can calculate the percentile ranks from the exact probabilities.
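For the normal-approximation side, here is a minimal Wald-Wolfowitz sketch for a single game's make/miss string. The example strings are the Dejounte Murray and Jabari Walker games from the 2023-24 tables in this post, and the outputs match the expected streaks, variance, and z-scores listed there.

```python
import math

def runs_test(shots):
    """Wald-Wolfowitz runs test for one make/miss string, e.g. 'WLLW' or '1001'.

    Returns (streaks, expected, variance, z). z > 0 means more streaks than
    expected (unstreaky); z < 0 means fewer (streaky).
    """
    makes = sum(ch in "W1" for ch in shots)
    misses = len(shots) - makes
    n = makes + misses
    streaks = 1 + sum(a != b for a, b in zip(shots, shots[1:]))
    if makes == 0 or misses == 0:
        return streaks, None, None, None   # undefined: all makes or all misses
    expected = 2 * makes * misses / n + 1
    variance = 2 * makes * misses * (2 * makes * misses - n) / (n ** 2 * (n - 1))
    z = (streaks - expected) / math.sqrt(variance)
    return streaks, expected, variance, z

print(runs_test("LWLWLWLWLWLWLLWLLLWLWLWLWLWLLLL"))  # Dejounte Murray, 4/9/2024
print(runs_test("WWWWWWLLLLLLLL"))                   # Jabari Walker
```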

These two metrics give different answers to what is the streakiest game of the past 20 years. According to percentile rank, the streakiest ever was Cedi Osman, who in 2022 missed 10 shots in a row, followed by making 6 shots in a row, for an equivalent z-score of -3.6.

According to the normal approximation, the streakiest game ever was Chris Bosh in 2007, who made 15 shots in a row before missing his final 4. Bosh doesn't even make the top 5 by percentile rank.

Other strong performances include Andre Iguodala, who had 16 straight misses followed by 3 makes in 2008, and Willie Green, who had 5 misses followed by 12 makes. The sheer length of those streaks is impressive, but to maximize the number of expected streaks, there need to be similar numbers of makes and misses. A game with 5 makes and 5 misses has a maximum of 10 streaks. A game with 15 makes and 4 misses, like Chris Bosh, has a maximum of 9 streaks.
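The maximum-streaks arithmetic generalizes simply: alternate perfectly, and if one outcome is more common, start and end with it. A small sketch (the helper name is hypothetical):

```python
def max_streaks(makes, misses):
    """Most streaks possible for a given number of makes and misses."""
    if makes == 0 or misses == 0:
        return 1 if makes + misses > 0 else 0   # a single streak (or none at all)
    if makes == misses:
        return makes + misses                   # perfect alternation uses every shot
    return 2 * min(makes, misses) + 1           # alternate, with the common outcome at both ends

print(max_streaks(5, 5), max_streaks(15, 4))    # 10 9
```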

Kobe Bryant's final game in the NBA also deserves mention. He went an extremely streaky 22 for 50, earning the highest number of expected streaks in the data I have (25.64): 11111000100110000101110000101100001100001111100000

The least streaky games

The least streaky was by Richard Hamilton in 2006, who had 10 makes and 13 misses, no two makes in a row: 01001010101010010101010

Kyrie Irving, Dejounte Murray (previously covered), and Kevin Martin also had strong showings.

The un-streaky king

The 4th most un-streaky game of the past 20 years belongs to LeBron James. LeBron scored 31 points in an easy win over the SuperSonics in 2005. Aside from 2 makes in a row at the start of the game, he perfectly alternated makes and misses the rest of the game: 110101010101010101

In the 20 years of shot data I analyzed, LeBron stands out as by far the most un-streaky player. Here are the career z scores of every player from 2004-2024:

career-z-scores

LeBron can't even be seen on this chart. He is in a world of his own, with a career z score of 5.9. We have to go to the Jon Bois style scatterplot with one extreme outlier in the corner:

career-z-scatter

If this were a Youtube video, imagine me zooming in on the solitary dot in the upper right while the saxophone hook from Baker Street kicks in.

Which is to say, it's really, really unlikely. The odds are around 1 in 550 million. That puts him in the 99.9999998th percentile.

If all 8.2 billion people on the planet had LeBron's NBA career, taking over 29,000 shots like he has, at the same FG% he did, we'd expect 15 people to be that unstreaky or more. That's elite company. Not only is LeBron James the LeBron James of basketball, he's also the LeBron James of being unstreaky at basketball.
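The odds above can be reproduced from the z-score with SciPy, assuming a one-sided upper tail of the standard normal distribution:

```python
from scipy.stats import norm

z = 5.9                                  # LeBron's career z-score
p = norm.sf(z)                           # one-sided tail: P(Z >= 5.9) under a standard normal
print(f"about 1 in {1 / p:,.0f}")        # on the order of 550 million
print(f"percentile: {100 * norm.cdf(z):.7f}")
print(f"expected count among 8.2 billion careers: {8.2e9 * p:.1f}")
```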

As both the most unstreaky player of all time, and the most prolific scorer of all time, LeBron James makes a perfect test subject for understanding unstreakiness.

He's had 15,159 shooting streaks in his career so far, which is 504 more streaks than expected. Say LeBron takes a low percentage shot because he feels like he has the hot hand. That shot's make probability might be lower than usual, but it's probably not dramatically worse than his regular shot. So for him to have 500 more streaks than expected, that's potentially thousands of choices LeBron has made over his career that increased the likelihood of streaks getting broken.

Streak lengths

I simulated LeBron's career 1000 times and compared the frequency of streak lengths to his actual career. Here are his actual streaks compared to the expected frequencies:

lebron-make-streaks

lebron-miss-streaks

He has slightly more 1 and 2 shot make/miss streaks than expected, and fewer streaks of 5-6 or more.

Previously I discussed that players could cause unstreakiness because they "go get a bucket" when the "shot isn't falling" -- in other words, they take higher percentage shots when they're on a cold streak. They might try to draw contact from a defender, and if they do get fouled, it only counts as a shot attempt if the shot goes in. On the other hand, they might take risky "heat check" shots when they are performing relatively well because they feel like they can't miss.

To capture "hot" versus "cold", I decided to track the FG% over the previous 5 shots in the game. So, it's undefined for the player's first 5 shots of the game, then defined from the 6th on. Because LeBron is such a high volume scorer and has been for so many years, that's still a lot of data to look at.

Here's the number of shot attempts by LeBron at each "last 5" shooting percentage (NaN covers the first 5 shots of each game, where it's undefined):

NaN    7460
0.6    7222
0.4    6906
0.8    3518
0.2    3090
1.0     612
0.0     503

I have defined cold as making 0 or 1 of the last 5 shots, and hot as making 4 or 5 of the last 5. This was a semi-arbitrary choice based on make/miss streaks longer than 5 happening less frequently than chance would dictate. It also matches how my simulated un-streaky player works.
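A pandas sketch of the last-5 calculation and the hot/cold labels. The column names `game_id`, `shot_order`, and `made` are hypothetical stand-ins, not necessarily the Kaggle dataset's actual columns. Counting makes with a rolling sum (rather than a rolling mean) keeps the hot/cold thresholds as exact integer comparisons.

```python
import pandas as pd

def add_last5(shots: pd.DataFrame) -> pd.DataFrame:
    """Add last-5 makes, last-5 FG%, and a hot/cold label to a table of attempts.

    Assumed (hypothetical) columns: game_id, shot_order, made (1 = make, 0 = miss).
    """
    shots = shots.sort_values(["game_id", "shot_order"]).copy()
    # Makes in the previous 5 attempts of the same game; NaN for each game's first 5 shots.
    shots["last5_makes"] = shots.groupby("game_id")["made"].transform(
        lambda s: s.shift(1).rolling(5).sum()
    )
    shots["last5_fg"] = shots["last5_makes"] / 5
    # Cold = 0 or 1 of the last 5, hot = 4 or 5; everything else (including NaN) is neutral.
    shots["state"] = "neutral"
    shots.loc[shots["last5_makes"] <= 1, "state"] = "cold"
    shots.loc[shots["last5_makes"] >= 4, "state"] = "hot"
    return shots
```

With a real shot log, `add_last5(df).groupby("last5_fg")["made"].mean()` would give FG% at each last-5 level.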

There's a clear trend. LeBron's FG% is about 10 percentage points higher when he's missed his last 5 shots (56.5%) than when he's made his last 5 (46.4%).

lebron-last-5

That's a pretty big swing.

LeBron is un-streaky due to shot selection

What's behind this trend?

LeBron takes a lot more high percentage shots when he's cold versus when he's hot.

Change in shot rates (cold minus hot):

Above the Break 3       -0.119694
Backcourt               -0.001659
In The Paint (Non-RA)    0.023637
Left Corner 3            0.000429
Mid-Range               -0.056841
Restricted Area          0.159098

When he's hot, he takes 29% of his shots in the restricted area (right near the basket, which is his highest percentage shot). When LeBron's cold, that jumps up to 45% of his shots. When he's hot, 29% of his shots are above the break 3's, but he only takes that shot 17% of the time when he's cold.

LeBron's FG% for each shot type doesn't change much whether he's hot, cold, or in between. He's a tiny bit better at corner 3's when he's cold vs. hot, but that's on very small volume. LeBron is usually attacking the middle of the court, not standing in the corner.

He's actually slightly worse at his three most common shot types (above the break 3, mid-range, restricted area) when he's on a cold streak. He's not un-streaky because he suddenly becomes a better shooter. He chooses to "go get a bucket" and seek out a higher percentage shot.

Change in FG% (cold minus hot):

Above the Break 3       -0.036856
In The Paint (Non-RA)    0.001647
Left Corner 3            0.026525
Mid-Range               -0.023709
Restricted Area         -0.013754
Right Corner 3           0.157949

It looks like it cuts both ways. LeBron takes lower percentage shots when he's shooting well, and higher percentage shots when he's shooting poorly over the past 5 shots, compared to the average performance.
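Both "cold minus hot" tables can be sketched with pandas. The column names `state`, `zone`, and `made` are hypothetical, and the toy data in the check is made up.

```python
import pandas as pd

def cold_minus_hot(shots: pd.DataFrame):
    """Return (change in shot mix, change in FG%) by zone, cold minus hot.

    Assumed (hypothetical) columns: state ('cold'/'hot'/...), zone, made (1/0).
    """
    # Share of each state's attempts that come from each zone (each row sums to 1).
    mix = pd.crosstab(shots["state"], shots["zone"], normalize="index")
    # FG% for each state/zone combination.
    fg = shots.groupby(["state", "zone"])["made"].mean().unstack("zone")
    return mix.loc["cold"] - mix.loc["hot"], fg.loc["cold"] - fg.loc["hot"]
```

Keeping the shot-mix and per-zone FG% changes separate is what lets you tell "takes different shots" apart from "shoots differently".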

Shot order trends

LeBron's FG% appears to trend downward the more shots he takes in a game. The white line is his career average:

seq-vs-fg

I haven't looked into it yet, but I suspect this is partially due to LeBron often taking the last shot of the game. Final shots of the game should be harder than average if it's a close game. Everybody knows the ball's going to LeBron for the final shot, so the defense is keying in on him. I'll save that for another installment, though.

Other unstreaky guys

Kyle Kuzma, Julius Randle, Elton Brand, and Anthony Edwards are all in the 4+ z score club, with Luka Doncic, Giannis, John Henson, Goran Dragic, and Jordan Poole also in the top 10.

League-wide trends

I went back and did the same analysis for every non-LeBron shot over the last 20 years. The league as a whole doesn't show the same trends that LeBron does. FG% isn't correlated with number of makes of the last 5. Here are the shooting percentages, graphed on the same scale as the one I used for LeBron:

league-last-5

There's a very slight uptick when a player has made all 5 of their last 5 shots in the game, but otherwise it's remarkably flat.

Looking at the order of shots taken, the NBA as a whole shows the same rough trend as LeBron's, though less dramatically. Lower FG% on the first shot of the game, and FG% slowly going down as the number of attempts goes up. (Again, I've locked the Y axis to match the scale of LeBron's.)

league-shot-seq

Not a lot there in aggregate. I also looked at just higher volume shooters, but there wasn't a trend I could see there.

Finally, I looked at players with a z-score over 2 (aside from LeBron). Now, by artificially selecting players with a high z-score, I need to be careful with my analysis to avoid self-fulfilling prophecy.

By definition, they're going to seem more unstreaky. But it's notable they're unstreaky in the same way as LeBron -- higher shooting percentage when they're cold and lower when they're hot. It's not a strong trend, but it's there. They shoot 47% when they've missed their last 5 in a row, and 44.8% when they've made their last 5 in a row. If the high Z scores were solely due to chance, I wouldn't expect to see such a clear pattern in the last 5 data.

high-z-last5

Streaky guys

There's only one super streaky guy in the last 20 years, Ivica Zubac, with a Z score of -3.98. That's pretty crazy, but still plausibly within the realm of chance, around 1 in 30,000.

The other guys with extremely streaky behavior on high volume are Dwight Powell, Nemanja Bjelica, Erick Dampier, Aaron Nesmith and Rudy Gobert, with z scores around -3. Most of those guys are big men who aren't primarily scorers. I wonder if it has something to do with rebounding their own shot. Sometimes a big man who isn't good at shooting will do what I call the Moses Malone -- miss a shot, get their own rebound, miss that, get their own rebound again, shoot again, etc, which might produce longer streaks of misses than makes.

However, I don't think there's a need to deeply analyze the streaky players at this point, because it could be due to chance alone.

Goofy Nonsense

Zydrunas Ilgauskas was a longtime player for the Cleveland Cavaliers who was nicknamed "The Big Z". However, his career z score was only 1.49, so I don't think it's a statistics-based nickname. Which is too bad, because the game could use some of those. "Small Z" for Ivica Zubac's -3.98 score might be confusing to people, unfortunately.

Final thoughts

If I were LeBron's coach, I'd try to talk him out of believing he has the hot hand, because as we've seen, acting like it exists has caused him to be the most lukewarm handed player of the past 20 years.

Shot selection shouldn't change for the worse just because a player is shooting well. LeBron on a hot streak has roughly the same shooting percentages for each type of shot as when he's not on a hot streak, or when he's on a cold streak.

His innate shooting skill doesn't change, he just takes lower percentage shots, perhaps believing they're not really lower percentage shots when he's feeling it. It's feel vs. real, as it often is in sports, and life in general. Regardless of feel, they're still worse shots than he would normally take.

Going the other way, it's like the old joke about an airplane's black box -- if black boxes are indestructible, why don't they just build the whole airplane out of that material? If LeBron has a higher shooting percentage when he's cold and decides to "go get a bucket", and that works, why doesn't he just do that on every play?

I don't have a statistical answer to that question, but I do have a common sense one. In sports, part of the game is making the other team handle as many possibilities as possible at once. A quarterback in football shouldn't throw deep passes every play, because that's easy to defend. A baseball pitcher shouldn't just throw their best pitch every time, because that's easy for the hitter to anticipate.

Likewise, LeBron probably shouldn't just put his head down and "get a bucket" every possession, because that's easy to plan against. LeBron wouldn't be an all time great if he only shot in the restricted area. While 3 pointers and midrange shots may have a lower expected value versus driving to the hoop, they force the defender to worry about LeBron no matter where he is on the court. But I think both as a hoops fan and a data nerd, trying to create a high percentage shot isn't a bad thing to do when a player is struggling in a game. That's especially true if a player can become less engaged in other aspects of the game when they are shooting poorly.

Jun 08, 2025

Approximate Normality and Continuity Corrections

(Notebooks and other code available at: https://github.com/csdurfee/hot_hand. As usual, there is stuff in there I'm not covering here.)

What is "approximately normal"?

In the last installment, I looked at NBA game-level player data, which involve very small samples.

Like a lot of things in statistics, the Wald-Wolfowitz test says that the number of streaks is approximately normal. What does that mean in practical terms? How approximately are we talking?

The number of streaks is a discrete value (0,1,2,3,...). In a small sample like 2 makes and 3 misses, which will be extremely common in player game level shooting data, how could that be approximately normal?

Below is a bar chart of the exact probabilities of each number of streaks, overlaid with the normal approximation in white. Not very normal, is it?

not very normal

To make things more interesting, let's say the player made 7 shots and missed 4. That's enough for the graph to look more like a proper bell curve.

exact 7-4 (or 4-7)

The bell curve looks shifted relative to the histogram, right? That's what happens when you model a discrete distribution (the number of streaks) with a continuous one -- the normal distribution.

A continuous distribution has zero probability at any single point, so we calculate the area under the curve between a range of values. The bar for exactly 7 streaks should line up with the probability of between 6.5 and 7.5 streaks in the normal approximation. The curve should be going through the middle of each bar, not the left edge.

We need to shift the curve to the right by half a streak for things to line up. Fixing this is called a continuity correction.

Here's the same graph with the continuity correction applied:

with cc

So... better, but there's still a problem. The normal approximation will assign a nonzero probability to impossible things. In this case of 7 makes and 4 misses, the minimum possible number of streaks is 2 and the max is 9 (start with a make, alternate makes and misses until you run out of misses, then finish with a string of makes).

Yet the normal approximation says there's a nonzero chance of -1, 10, or even a million streaks. The odds are tiny, but the normal distribution never ends. These differences go away with big sample sizes, but they may be worth worrying about for small sample sizes.
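To put numbers on both issues for the 7-make, 4-miss case, the sketch below compares the exact probabilities (copied from the table of exact odds in this post) with the normal curve's height at each count and with the continuity-corrected area over [r - 0.5, r + 0.5]. It also totals the probability the curve assigns to impossible counts, which comes out to roughly 1% here.

```python
import math
from scipy.stats import norm

# Exact probabilities for 7 makes and 4 misses (from the table of exact odds in this post).
EXACT = {2: 0.006061, 3: 0.027273, 4: 0.109091, 5: 0.190909,
         6: 0.272727, 7: 0.227273, 8: 0.121212, 9: 0.045455}

n1, n2 = 7, 4
n = n1 + n2
mu = 2 * n1 * n2 / n + 1
sigma = math.sqrt(2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1)))

for r, p in EXACT.items():
    height = norm.pdf(r, loc=mu, scale=sigma)                 # curve height at r, no correction
    area = (norm.cdf(r + 0.5, loc=mu, scale=sigma)
            - norm.cdf(r - 0.5, loc=mu, scale=sigma))         # continuity-corrected probability
    print(f"{r}: exact={p:.3f}  pdf={height:.3f}  corrected={area:.3f}")

# Probability the normal curve puts on impossible counts (fewer than 2 or more than 9 streaks).
impossible = norm.cdf(1.5, loc=mu, scale=sigma) + norm.sf(9.5, loc=mu, scale=sigma)
print(f"mass on impossible values: {impossible:.4f}")
```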

Is that interfering with my results? It's quite possible. I'm trying to use the mean and the standard deviation to decide how "weird" each player is in the form of a z score. The z score gives the likelihood of the data happening by chance, given certain assumptions. If the assumptions don't hold, the z score, and using it to interpret how weird things are, is suspect.

Exact-ish odds

We can easily calculate the exact odds. In the notebook, I showed how to calculate the odds with brute force -- generate all permutations of seven 1's and four 0's, and measure the number of streaks for each one. That's impractical and silly, since the exact counting formula can be worked out using the rules of combinatorics, as this page nicely shows: https://online.stat.psu.edu/stat415/lesson/21/21.1
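As a sketch, here is a standard combinatorial formula for the runs distribution, with a brute-force enumeration of every ordering as a cross-check. The counting idea: splitting n makes into k non-empty streaks can be done in C(n - 1, k - 1) ways, and likewise for misses. Exact fractions avoid any rounding.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def exact_streak_probs(n1, n2):
    """Exact P(streaks = r) for n1 makes and n2 misses (both >= 1), via counting."""
    total = comb(n1 + n2, n1)                  # every distinct ordering is equally likely
    probs = {}
    for r in range(2, n1 + n2 + 1):
        k = r // 2
        if r % 2 == 0:
            # r = 2k: k make-streaks and k miss-streaks; either kind can go first
            ways = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
        else:
            # r = 2k + 1: k + 1 streaks of one kind and k of the other
            ways = comb(n1 - 1, k) * comb(n2 - 1, k - 1) + comb(n1 - 1, k - 1) * comb(n2 - 1, k)
        if ways:
            probs[r] = Fraction(ways, total)
    return probs

def brute_force_streak_probs(n1, n2):
    """Enumerate every placement of the makes and count streaks directly."""
    n = n1 + n2
    counts = {}
    for make_positions in combinations(range(n), n1):
        seq = [0] * n
        for i in make_positions:
            seq[i] = 1
        r = 1 + sum(a != b for a, b in zip(seq, seq[1:]))
        counts[r] = counts.get(r, 0) + 1
    total = comb(n, n1)
    return {r: Fraction(c, total) for r, c in sorted(counts.items())}

probs = exact_streak_probs(7, 4)
assert probs == brute_force_streak_probs(7, 4)   # the formula agrees with brute force
for r, p in probs.items():
    print(r, round(float(p), 6))
```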

In order to compare players with different numbers of makes and misses, we'd want to calculate a percentile value for each one from the exact odds. The percentiles will be based on number of streaks, so 1st percentile would be super streaky, 99th percentile super un-streaky.

Let's say we're looking at the case of 7 makes and 4 misses, and are trying to calculate the percentile value that should go with each number of streaks. Here are the exact odds of each number of streaks:

2    0.006061
3    0.027273
4    0.109091
5    0.190909
6    0.272727
7    0.227273
8    0.121212
9    0.045455

Here are the cumulative odds (the odds of getting that number of streaks or fewer):

2    0.006061
3    0.033333
4    0.142424
5    0.333333
6    0.606061
7    0.833333
8    0.954545
9    1.000000

Let's say we get 6 streaks. Exactly 6 streaks happens 27% of the time. 5 or fewer streaks happens 33% of the time. So we could say 6 streaks is equal to the 33rd percentile, the 33.3%+27.3% = 61st percentile, or some value in between those two numbers.

The obvious way of deciding the percentile rank is to take the average of the upper and lower values, in this case mean(.333, .606) = .47. You could also think of it as taking the probability of streaks <=5 and adding half the probability of streaks=6.
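A sketch of the midpoint percentile rank next to the percentile implied by the Wald-Wolfowitz z-score. The exact probabilities reuse the same counting formula so the block stands alone. For 6 streaks out of 7 makes and 4 misses it gives about 0.470 and 0.475, and plugging in the Jabari Walker and Dejounte Murray games from later in this post reproduces their exact percentile ranks.

```python
import math
from math import comb
from scipy.stats import norm

def streak_probs(n1, n2):
    """Exact P(streaks = r) for n1 makes and n2 misses (both >= 1), via the counting formula."""
    total = comb(n1 + n2, n1)
    probs = {}
    for r in range(2, n1 + n2 + 1):
        k = r // 2
        if r % 2 == 0:
            ways = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
        else:
            ways = comb(n1 - 1, k) * comb(n2 - 1, k - 1) + comb(n1 - 1, k - 1) * comb(n2 - 1, k)
        probs[r] = ways / total
    return probs

def percentile_rank(streaks, n1, n2):
    """P(fewer streaks) plus half of P(exactly this many): the midpoint of the two cumulative values."""
    probs = streak_probs(n1, n2)
    below = sum(p for r, p in probs.items() if r < streaks)
    return below + 0.5 * probs.get(streaks, 0.0)

def ww_percentile(streaks, n1, n2):
    """Percentile implied by the Wald-Wolfowitz normal approximation (no continuity correction)."""
    n = n1 + n2
    mu = 2 * n1 * n2 / n + 1
    sigma = math.sqrt(2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1)))
    return norm.cdf((streaks - mu) / sigma)

pr = percentile_rank(6, 7, 4)     # about 0.470
ww = ww_percentile(6, 7, 4)       # about 0.475
z_equiv = norm.ppf(pr)            # percentile rank expressed as an equivalent z-score
print(f"percentile rank={pr:.3f}, W-W percentile={ww:.3f}, equivalent z={z_equiv:.3f}")
```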

If we want to compare the percentile ranks from the exact odds to Wald-Wolfowitz, we could convert them to an equivalent z score. Or, we can take the z-scores from the Wald-Wolfowitz test and convert them to percentiles.

The two are bound to be a little different because the normal approximation is a bell curve, whereas we're getting the percentile rank from a linear interpolation of two values.

Here's an illustration of what I mean. This is a graph of the percentile ranks vs the CDF of the normal approximation.

cdf-normal-exact2

Let's zoom in on the section between 4.5 and 5.5 streaks. Where the white line hits the red line is the percentile estimate we'd get from the z-score (.475).

cdf-zoom

The green line is a straight line that represents calculating the percentile rank. It goes from the middle of the top of the runs <= 5 bar to the middle of the top of the runs <=6 bar. Where it hits the red line is the average of the two, which is percentile rank (.470).

In other situations, the Wald-Wolfowitz estimate will be less than the exact percentile rank. We can see that on the first graph. The green lines and white line are very close to each other, but sometimes the green is higher (like at runs=4), and sometimes the white is higher (like at runs=8).

Is Wald-Wolfowitz unbiased?

Yeah. The test provides the exact expected value of the number of streaks. It's not just a pretty good estimate. It is the (weighted) mean of the exact probabilities.

From the exact odds, the mean number of streaks across all 330 possible orderings is 6.0909:

count    330.000000
mean       6.090909
std        1.445329
min        2.000000
25%        5.000000
50%        6.000000
75%        7.000000
max        9.000000

The Wald-Wolfowitz test says the expected value is 1 plus the harmonic mean of 7 and 4, which is 6.0909... on the nose.
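For reference, here are the Wald-Wolfowitz mean and variance for n_1 makes and n_2 misses, where H(n_1, n_2) = 2n_1n_2/(n_1 + n_2) is the harmonic mean, worked through for the 7-make, 4-miss case:

```latex
\mu_R = \frac{2 n_1 n_2}{n_1 + n_2} + 1 = 1 + H(n_1, n_2),
\qquad
\sigma_R^2 = \frac{2 n_1 n_2 \,(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 \,(n_1 + n_2 - 1)}

n_1 = 7,\; n_2 = 4:\quad
\mu_R = 1 + \frac{56}{11} = \frac{67}{11} \approx 6.0909,
\qquad
\sigma_R^2 = \frac{56 \cdot 45}{121 \cdot 10} = \frac{2520}{1210} \approx 2.0826,
\quad \sigma_R \approx 1.4431
```

The 1.4453 in the summary table above is the sample standard deviation that pandas reports (it divides by n - 1 = 329). The population value, 1.4453 × √(329/330) ≈ 1.4431, matches σ_R from the formula.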

Is the normal approximation throwing off my results?

Quite possibly. So I went back and calculated the percentile ranks for every player-game combo over the course of the season.

Here's a scatter plot of the two ways to calculate the percentile on actual NBA player games. The dots above the x=y line are where the Wald-Wolfowitz percentile is bigger than the percentile rank one.

percentile-vs-ww

59% of the time, the Wald-Wolfowitz estimate produces a higher percentile value than the percentile rank. The same trend occurs if I restrict the data set to only high volume shooters (more than 10 makes or misses on the game).

Here's a bar chart of the differences between the W-W percentile and the percentile rank:

ww-minus-pr

A percentile over 50, or a positive z score, means more streaks than average, thus less streaky than average. In other words, on this specific data set, the Wald-Wolfowitz z-scores will be more un-streaky compared to the exact probabilities.

Interlude: our un-streaky king

For the record, the un-streakiest NBA game of the 2023-24 season was by Dejounte Murray on 4/9/2024. My dude went 12 for 31 and managed 25 streaks, the most possible for that number of makes and misses, by virtue of never making 2 shots in a row.

It was a crazy game all around for Murray. A 29-13-13 triple double with 4 steals, and a Kobe-esque 29 points on 31 shots. He could've gotten more, too. The game went to double overtime, and he missed his last 4 in a row. If he had made the 2nd and the 4th of those, he could've gotten 3 more streaks on the game.

The summary of the game doesn't mention this exceptional achievement. Of course they wouldn't. There's no clue of it in the box score. You couldn't bet on it. Why would anyone notice?

box score on bbref

Look at that unstreakiness. Isn't it beautiful?

makes                                                  12
misses                                                 19
total_streaks                                          25
raw_data                  LWLWLWLWLWLWLLWLLLWLWLWLWLWLLLL
expected_streaks                                15.709677
variance                                         6.722164
z_score                                          3.583243
exact_percentile_rank                           99.993423
z_from_percentile_rank                           3.823544
ww_percentile                                   99.983032

On the other end, the streakiest performance of the year belonged to Jabari Walker of the Portland Trail Blazers. He made his first 6 shots in a row, then missed his last 8 in a row.

makes                                  6
misses                                 8
total_streaks                          2
raw_data                  WWWWWWLLLLLLLL
expected_streaks                7.857143
variance                        3.089482
z_score                        -3.332292
exact_percentile_rank             0.0333
z_from_percentile_rank         -3.403206
ww_percentile                   0.043067

Actual player performances

Let's look at actual NBA games where a player had exactly 7 makes and 4 misses. (We can also include the flip side, 4 makes and 7 misses, because it will be the same distribution of streak lengths)

The green areas are where the players had more streaks than the exact probabilities; the red areas are where players had fewer streaks. The two are very close, except for a lot more games with 9 streaks in the player data, and fewer 6 streak games.

The exact mean is 6.09 streaks. The mean for player performances is 6.20 streaks. Even in this little slice of data, there's a slight tendency towards unstreakiness.

streaks-vs-probs

Percentile ranks are still unstreaky, though

Well, for all that windup, the game-level percentile ranks didn't turn out all that different when I calculated them for all 18,000+ player-game combos. The mean and median are still shifted to the un-streaky side, to a significant degree.

z-from-percentile

Plotting the deciles shows an interesting tendency: a lot more values in the 60-70th percentile range than expected. The shift to the un-streaky side comes pretty much from these values.

perc-rank-deciles

The bias towards the unstreaky side is still there, and still significant:

count    18982.000000
mean         0.039683
std          0.893720
min         -3.403206
25%         -0.643522
50%          0.059717
75%          0.674490
max          3.823544
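One quick way to check "still significant" from the summary alone is a one-sample test on the mean. This sketch assumes player-games are independent and relies on the large sample for the normal approximation; it's an illustration, not necessarily the test used in the notebook.

```python
import math
from scipy.stats import norm

# Summary statistics from the table above (z-scores derived from percentile ranks).
n, mean_z, std_z = 18982, 0.039683, 0.893720

se = std_z / math.sqrt(n)         # standard error of the mean
t = mean_z / se                   # how many standard errors the mean sits above 0
p_two_sided = 2 * norm.sf(t)      # normal approximation is fine with n this large
print(f"se={se:.5f}, t={t:.2f}, p={p_two_sided:.1e}")
```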

A weird continuity correction that seems obviously bad

SAS, the granddaddy of statistics software, applies a continuity correction to the runs test whenever the total number of observations (N) is less than 50.

While it's true that we should be careful with normal approximations and small sample size, this ain't the way.

The exact code used is here: https://support.sas.com/kb/33/092.html

        if N GE 50 then Z = (Runs - mu) / sigma;
        else if Runs-mu LT 0 then Z = (Runs-mu+0.5)/sigma;
          else Z = (Runs-mu-0.5)/sigma;

Other implementations I looked at, like the one in R's randtests package, don't do the correction.
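A Python transcription of the SAS logic above, for experimenting. Plugging in the Dejounte Murray and Jabari Walker games from the 2023-24 tables gives 3.3904 and -3.0478, which match the max and min in the summary that follows.

```python
import math

def sas_runs_z(runs, mu, sigma, n):
    """Python transcription of the SAS snippet above; n is the total number of observations."""
    if n >= 50:
        return (runs - mu) / sigma
    elif runs - mu < 0:
        return (runs - mu + 0.5) / sigma   # fewer streaks than expected: nudge up toward 0
    else:
        return (runs - mu - 0.5) / sigma   # otherwise nudge down (runs == mu also lands here)

murray = sas_runs_z(25, 15.709677, math.sqrt(6.722164), 31)   # about 3.3904
walker = sas_runs_z(2, 7.857143, math.sqrt(3.089482), 14)     # about -3.0478
print(murray, walker)
```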

What does this sort of correction look like?

For starters, it gives us something that doesn't look like a z score. The std is way too small.

count    18982.000000
mean        -0.031954
std          0.687916
min         -3.047828
25%         -0.401101
50%          0.000000
75%          0.302765
max          3.390395

sas-cc

What does this look like on random data?

It could just be this dataset, though. I will generate a fake season of data like in the last installment, but the players will have no unstreaky/streaky tendencies. They will behave like a coin flip, weighted to their season FG%. So the results should be distributed like we expect z scores to be (mean=0, std=1).
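A NumPy sketch of that null simulation, computing both the plain Wald-Wolfowitz z and the SAS-style corrected z for each simulated season. The FG% range, shot-volume range, player count, and seed are all assumptions. Because the Wald-Wolfowitz mean and variance are exact for a given number of makes and misses, the plain z-scores should have mean about 0 and std about 1 even for low-volume players; only their shape departs from a bell curve.

```python
import numpy as np

def runs_z(seq, sas_correction=False):
    """Wald-Wolfowitz z for a 0/1 array; optionally apply the SAS-style correction when N < 50."""
    n1 = int(seq.sum())
    n2 = int(seq.size) - n1
    n = n1 + n2
    if n1 == 0 or n2 == 0:
        return None                                  # undefined for all makes or all misses
    runs = 1 + int(np.count_nonzero(np.diff(seq)))
    mu = 2 * n1 * n2 / n + 1
    sigma = (2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))) ** 0.5
    diff = runs - mu
    if sas_correction and n < 50:
        diff = diff + 0.5 if diff < 0 else diff - 0.5   # same branches as the SAS snippet
    return diff / sigma

rng = np.random.default_rng(2025)
plain, sas = [], []
for _ in range(3000):
    fg = rng.uniform(0.35, 0.60)                     # assumption: range of season FG%
    n_shots = int(rng.integers(10, 400))             # assumption: season volume, incl. low-volume players
    season = (rng.random(n_shots) < fg).astype(int)  # coin flips weighted to the player's FG%
    z = runs_z(season)
    if z is not None:
        plain.append(z)
        sas.append(runs_z(season, sas_correction=True))

plain, sas = np.array(plain), np.array(sas)
print(f"Wald-Wolfowitz: mean={plain.mean():.3f}, std={plain.std():.3f}")
print(f"SAS-corrected:  mean={sas.mean():.3f}, std={sas.std():.3f}")
```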

Here are the z-scores. They're not obviously bad, but the center is a bit higher than it should be.

sas-sim

However, the continuity correction especially stands out when looking at small sample sizes (in this case, simulated players with fewer than 30 shooting streaks over the course of the season).

In the below graph, red are the SAS-corrected z-scores, green are the Wald-Wolfowitz z-scores, and brown is the overlap.

sas-low-vol

Continuity corrections are at best an imperfect substitute for calculating the exact odds. These days, there's no reason not to use exact odds for smaller sample sizes. Even though it ended up not mattering much, I should've started with the percentile rank for individual games. However, I don't think that the game level results are as important to the case I'm making as the career-long shooting results.

Next time, I will look at the past 20 years of NBA data. Who is the un-streakiest player of all time?

May 28, 2025

Simulating hot and lukewarm hands

(Notebooks and other code available at: https://github.com/csdurfee/hot_hand. There's a bunch of stuff in the notebook about the Wald-Wolfowitz test that I will save for another week.)

In my last installment, I was looking at season long shooting records from the NBA, and I concluded that NBA players were less streaky than expected. They have fewer long strings of makes and misses than a series of coin flips would.

I've been thinking this could be due to "heat check" shots -- a player has made a bunch of shots in a row, or is having a good shooting game in general, so they take harder shots than they normally would. It would explain why some players that fans consider streaky, or "heat check" players, are actually super un-streaky. Jordan Poole was the least streaky player over the last 4 seasons, which defies my expectations. Say he believes he is streaky, so he tends to take bad shots when he feels like he's heating up.

Or it could be due to "get a bucket" shots -- a player is having a bad shooting game, so they force higher percentage shots and potentially free throws.

There's a quirk of NBA stats to remember: if a player is fouled while shooting, it only counts as a field goal attempt if they make the shot. So driving to the hoop is guaranteed to not decrease a player's field goal percentage if they successfully draw a foul, or get called for an offensive foul.

I'm not sure I've made an airtight case for the lukewarm hand. Combining every game in a season could hide the hot hand effect. What about individual games?

Game-level shooting statistics show a lukewarm tendency

I am using the complete shooting statistics available from this Kaggle project: https://www.kaggle.com/datasets/mexwell/nba-shots

I'm looking at the 2023-2024 season, since the current season isn't included yet.

I went through every game that every player played in the NBA season and calculated the expected vs. actual number of streaks.

There are 24,895 player+game combos. 10,285 of them had more streaks than expected against 8,977 who had fewer streaks than expected (and around 5,000 that are exactly as expected). This is a significant imbalance towards the "lukewarm hand" side.
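To put a rough number on "significant", one option is a sign test: if streakiness were pure chance, a game (dropping exact ties) should land above or below the expected streak count about equally often. Here is a minimal sketch using the normal approximation. Treat it as a rough check only. It assumes games are independent, even though each player contributes many of them. The null also isn't exactly 50/50 for low-attempt games, because the streak count is discrete and can be skewed. For example, with 1 make and 4 misses, the count lands above its expected value 60% of the time.

```python
from math import erfc, sqrt


def sign_test_z(more, fewer):
    """Two-sided sign test via the normal approximation.

    more/fewer: number of games with more/fewer streaks than expected
    (ties already dropped). Under the null each side has probability 1/2."""
    n = more + fewer
    z = (more - n / 2) / (sqrt(n) / 2)
    p_two_sided = erfc(abs(z) / sqrt(2))
    return z, p_two_sided


z, p = sign_test_z(10_285, 8_977)
print(f"z = {z:.2f}, two-sided p = {p:.1e}")
```

With the counts above, z comes out a little over 9.4. That is far outside what coin flipping would produce, even allowing for the caveats.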

Here's the histogram of individual game z-scores:

individual game z-scores, 2023

And the breakdown:

count    18982.000000
mean         0.051765
std          0.988789
min         -3.332292
25%         -0.707107
50%          0.104103
75%          0.816497
max          3.583243
Name: z_score, dtype: float64

Limiting to higher volume games (at least 10 makes or 10 misses) shows the same tendency.

high attempt games, 2023-24

count    2536.000000
mean        0.055925
std         1.010195
min        -3.079575
25%        -0.616678
50%         0.072404
75%         0.750366
max         3.583243
Name: z_score, dtype: float64

There definitely appears to be a bias towards the lukewarm hand in individual game data. The mean z scores aren't that much bigger than zero, but it's a huge sample size.
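One way to see why a small mean matters with this many games is to divide the mean by its standard error. This rough sketch works only from the two summary tables above. It assumes the games are independent draws. They aren't entirely, since the same players appear in many games, so the real standard error is probably somewhat larger.

```python
from math import sqrt


def mean_z_t_stat(mean, std, count):
    """How many standard errors the sample mean z-score sits above 0."""
    return mean / (std / sqrt(count))


# Numbers copied from the describe() output above
all_games = mean_z_t_stat(0.051765, 0.988789, 18982)
high_volume = mean_z_t_stat(0.055925, 1.010195, 2536)
print(round(all_games, 1), round(high_volume, 1))
```

This comes out to roughly 7.2 for all games and roughly 2.8 for the high-volume games. Even a tiny average shift adds up over tens of thousands of games.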

Simulating streaky and non-streaky players

I coded up a simulation of a non-streaky player. When they have hit a minimum number of attempts in the game, if their shooting percentage goes above a certain level, they get a penalty. If it goes below a certain level, they get a boost.

I was able to create results that look like NBA players in aggregate with an extremely simplified model. The parameters were chosen arbitrarily.

By default, the thresholds are 20% and 80%, and the boost/penalty is 20 percentage points. So a 50% shooter who has taken at least 4 shots and is shooting 80% or better for the game will get their FG% knocked down to 30% until their game percentage drops below the threshold. Likewise, if they drop to 20% or less, they get a boost until they're back over the threshold.
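Here is a minimal sketch of that game-level model. It assumes the rules exactly as described above, with the adjustment added or subtracted in percentage points and checked before every shot once the minimum is reached. The function name and parameters are hypothetical. This is not the notebook's LukewarmPlayer class.

```python
import random


def simulate_game(fg_pct, attempts, low=0.2, high=0.8, adjust=0.2,
                  min_attempts=4, boost=True, penalty=True, rng=None):
    """Simulate one game of shots (1 = make, 0 = miss) for a hypothetical
    player whose make probability reacts to their game FG% so far.

    Once at least min_attempts shots have been taken:
      - game FG% >= high: make probability drops by adjust (if penalty)
      - game FG% <= low:  make probability rises by adjust (if boost)
    """
    rng = rng or random.Random()
    shots = []
    makes = 0
    for taken in range(attempts):
        p = fg_pct
        if taken >= min_attempts and taken > 0:
            game_pct = makes / taken
            if penalty and game_pct >= high:
                p -= adjust
            elif boost and game_pct <= low:
                p += adjust
        p = min(max(p, 0.0), 1.0)  # keep it a valid probability
        made = rng.random() < p
        makes += made
        shots.append(int(made))
    return shots


# One simulated game for a 50% shooter taking 15 shots
print(simulate_game(0.5, 15, rng=random.Random(7)))
```

Setting `boost=False` or `penalty=False` gives the one-directional variants. A negative `adjust` flips the model into a hot-hand player. Replaying a season would mean calling this once per player-game with that game's actual FG% and number of attempts.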

I used the game level shooting statistics I got for the individual game-by-game analysis. I then replayed every shot in the NBA in the 2023-24 season using the simulated lukewarm player (and the actual fg% and number of shots attempted in each game). This is what I got:

sim-z-scores

count    526.000000
mean       0.218032
std        0.965737
min       -2.397958
25%       -0.491051
50%        0.241554
75%        0.836839
max        3.787951
Name: z_score, dtype: float64

My simulation was actually less biased to the right than the actual results:

actual-2023-24

Several big things to note:

  1. I simulated every player in the league as being a little un-streaky.
  2. I simulated them being un-streaky in both directions
  3. The boost/penalty are pretty big -- going from 50% FG percentage to 30% is going from a good NBA player to a bad college player level, and the boost to 70% FG percentage has no precedent. The most accurate shooters in the NBA are usually big men who only shoot dunks and layups, and they still usually end up in the 60-65% range.

Which is to say, my simulation is kind of silly and seemingly over-exaggerated. And it's still not as lukewarm as real NBA players are. Wild, isn't it?

Streakiness in only one direction

I also simulated players who were only streaky in one direction: "get a bucket" players who get a boost to shooting percentage when they are shooting poorly, but no penalty when they are doing well, and "heat check" players who only get the penalty.

The results were biased to the unstreaky side, but only about half as much as the ones that are un-streaky in both directions. I had to crank the penalties/boosts up to unrealistic levels to get the bias of the z-scores up to the .2-.3 range I'm seeing with real season-level data.

The truly streaky player

Of course, I had to simulate the hot hand. The TrulyStreakyPlayer is the exact opposite of the LukewarmPlayer. They get a 20% boost when they're shooting well on the game, and a 20% penalty when they're shooting poorly.

What stands out to me here is how much it affects the z-score. I was expecting the z-scores to be biased to the negative side by about as much as the unstreaky player was to the positive side. But the effect was a lot more dramatic:

count    524.000000
mean      -0.455522
std        1.144570
min       -4.413268
25%       -1.225128
50%       -0.458503
75%        0.404549
max        2.486584

truly-streaky-player

Unlike the un-streaky simulations, the streaky behavior increased the dispersion (std), like we saw with the real shot data. There are many more outliers to the negative side than we'd expect.

What next?

I could certainly sim a mixture of streaky and unstreaky players, and eventually maybe get something that matches the real numbers pretty closely. But there are so many parameters to fit that it would be pretty arbitrary. Someone else could produce a different model that works just as well.

Most importantly, it couldn't tell us which players might be streaky due to chance versus streaky due to behavior/shot selection. So I think the next step is looking at the shot selection in the "hot hand" vs. "get a bucket" situations -- do players switch to higher percentage shots when they're having a bad game, and worse shots when they're shooting better than usual?

May 16, 2025

The hot hand doesn't exist in the NBA, but its opposite does

(The code used, and IPython notebooks with a fuller investigation of the data, are available at https://github.com/csdurfee/hot_hand.)

Streaks

When I'm watching a basketball game, sometimes it seems like a certain player just can't miss. Every shot looks like it's going to go in. Other times, it seems like they've gone cold. They can't get a shot to go in no matter what they do.

This phenomenon is known as the "hot hand" and whether it exists or not has been debated for decades, even as it's taken for granted in the common language around sports. We're used to commentators saying that a player is "heating up", or, "that was a heat check".

As a fan of the game, it certainly seems like the hot hand exists. If you follow basketball, some names probably come to mind. JR Smith, Danny Green, Dion Waiters, Jamal Crawford. When they're on, they just can't miss. It doesn't matter how crazy the shot is, it's going in. And when they're cold, they're cold.

It's a thing we collectively believe in, but it turns out that there isn't clear statistical evidence to support it.

We have to be careful with our feelings about the hot hand. It certainly feels real, but that doesn't mean that it is. Within the drama of a basketball game, we're inclined to notice and assign stories to runs of makes or misses. Just because we notice them, that doesn't mean they're significant. This is sometimes called "the law of small numbers" -- our brains have a tendency to reach spurious conclusions from a very small amount of data.

Pareidolia is the human tendency to see human faces in inanimate objects -- clouds, the bark of a tree, a tortilla. While the faces might seem real, they are just a product of our brain's natural inclination to identify patterns. It's possible the "hot hand" is a similar phenomenon -- a product of the way human brains are wired to see patterns, rather than an objective truth.

Defining Streakiness

Streaks of 1's and 0's in randomly generated binary data follow regular mathematical laws, ones our brains can't really replicate. Writer Joseph Buchdahl found that he couldn't create a random-looking sequence by hand that would fool a statistical test called the Wald-Wolfowitz test, even though he knew exactly how the statistical test worked.

I think at some level, we're physically incapable of generating truly random data, so it makes sense to me that our intuitions about randomness are a little off. Our brains are wired to notice the streaks, but we seem to have no such circuitry for noticing when something is a little bit too un-streaky. Our brains are too quick to see meaningless patterns in small amounts of data, and not clever enough to see subtle, meaningful patterns in large amounts of data. Good thing we have statistics to help us escape those biases!

For the sake of this discussion, a streak starts whenever a sequence of outcomes changes from wins (W) to losses (L), or vice-versa. (I'm talking about makes and misses, but those start with the same letter, so I'll use "W" and "L".)

The sequence WLWLWL has 6 streaks: W, L, W, L, W, L
The sequence WWLLLW has 3 streaks: WW, LLL, W
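Counting streaks this way is just counting the places where the outcome changes, plus one. Here is a minimal Python sketch (the function name is hypothetical):

```python
from itertools import groupby


def count_streaks(outcomes):
    """Number of streaks: each maximal block of identical outcomes is one streak."""
    return sum(1 for _ in groupby(outcomes))


print(count_streaks("WLWLWL"))  # 6
print(count_streaks("WWLLLW"))  # 3
```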

Imagine I asked someone to produce a random-looking string of 3 W's and 3 L's. If they were making the results up, I think the average person would be more likely to write the first string. It just looks "more random", right?

If they flipped a coin, it would be more likely to produce something with longer streaks, like the second example. With a fair coin, both of those exact sequences are equally likely to occur. But the second sequence has a more probable number of streaks, according to the Wald-Wolfowitz Runs Test. The expected number of streaks with 3 wins and 3 losses is (2 * (3 * 3) / (3+3)) + 1 = 4.

The expected number of streaks is the harmonic mean of the number of wins and the number of losses, plus one. Neat, right?
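Written out with $n_W$ wins and $n_L$ losses, the middle expression below is the harmonic mean of $n_W$ and $n_L$:

$$
E[R] = \frac{2\,n_W n_L}{n_W + n_L} + 1 = \frac{2}{\frac{1}{n_W} + \frac{1}{n_L}} + 1,
\qquad
E[R]\Big|_{n_W = n_L = 3} = \frac{2 \cdot 9}{6} + 1 = 4
$$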

Around 500 players attempted a shot in the NBA this season. Let's say we create a custom coin for each player. It comes up heads with the same percentage as the player's shooting percentage on the season. If we took those coins and simulated every shot in the NBA this season, some of the coins would inevitably appear to be "streakier" than others.

Players never intend to miss shots, yet most players shoot around 50%, so there has to be some element of chance as far as which shots go in or not. Otherwise, why wouldn't players just choose to make all of them?

So makes versus misses are at least somewhat random, which means if we look at the shooting records of 500 players in an NBA season, some will seem more or less consistent due to the laws of probability. That means a player with longer or shorter streaks than expected could just be due to chance, not due to the player actively doing something that makes them more streaky.

The Lukewarm Hand

We might call players who have fewer streaks than expected by chance consistent. Maybe they go exactly 5 for 10 every single game, never being especially good or especially bad. Or maybe they go 1 for 3 every game, always being pretty bad.

But that feels like the wrong word, and I don't think our brains are really wired to notice a player that has fewer streaks than average. As we already saw, the "right" number of streaks is counterintuitive.

I might notice a player is unusually consistent after the fact when looking at their basketball-reference page, but the feeling of a player having the hot hand is visceral, experienced in the moment. Even without consulting the box score, sometimes players look like they just can't miss, or can't make, a shot. They seem more confident, or their shot seems more natural, than usual. Both the shooter and the spectator seem to have a higher expectation that the shot will go in than usual. The hot hand is a social phenomenon.

there's always an xkcd
(from https://xkcd.com/904/)

If we look at the makes and misses of every player in the league, do they look like the results of flipping a coin (weighted to match their shooting percentage), or is there a tendency for players to be more or less streaky than expected by chance?

We don't really have a formal word for players who are less streaky than they should be, so I'm going to call the opposite of the hot hand the lukewarm hand. While the lukewarm hand isn't a thing we would viscerally notice the way we do the hot hand, it could certainly exist. And it's just as surprising, from the perspective of treating basketball players like weighted coins.

Some people I've seen analyze the hot hand treat the question as streaky versus non-streaky. But it's not a binary thing. There are two possible extremes, and a region in between. It's unusually streaky versus normal amount of streaky versus unusually non-streaky.

The Wald-Wolfowitz test says that the number of streaks in randomly-generated data will be approximately normally distributed (the approximation improves as the number of makes and misses grows), and gives a formula for the variance of the number of streaks. The normal distribution is symmetrical, so there should be as many hot hand players as lukewarm hand ones. Players have varying numbers of shots taken over the course of the season so we can't compare them directly, but we can calculate the z score for each player's expected vs. actual number of streaks. The z score represents how "weird" the player is. If we look at all the z-scores together, we can see whether NBA players as a whole are streakier or less streaky than chance alone would predict. We can also see if the outliers correspond to the popular notions of who the streaky shooters in the NBA are.
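Here is a minimal Python sketch of the per-player calculation. It assumes makes and misses are coded as 1s and 0s and uses the textbook mean and variance of the streak count, with no continuity correction. The function name is hypothetical, not taken from the notebooks. The usage part simulates hypothetical weighted-coin players to check that their z-scores come out with mean near 0 and standard deviation near 1.

```python
import math
import random
import statistics


def runs_z_score(shots):
    """Wald-Wolfowitz runs-test z-score for a sequence of 1s (makes) and
    0s (misses), with no continuity correction.

    Positive z: more streaks than expected (less streaky).
    Negative z: fewer, longer streaks than expected (streakier).
    Returns nan when the z-score is undefined (e.g. all makes or all misses)."""
    shots = list(shots)
    n1 = sum(1 for s in shots if s)
    n2 = len(shots) - n1
    if n1 == 0 or n2 == 0:
        return math.nan
    n = n1 + n2
    runs = 1 + sum(1 for a, b in zip(shots, shots[1:]) if a != b)
    mu = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    if var <= 0:
        return math.nan
    return (runs - mu) / math.sqrt(var)


# Sanity check: players who really are weighted coins should produce
# z-scores with mean near 0 and standard deviation near 1.
rng = random.Random(42)
z_scores = []
for _ in range(2000):
    fg_pct = rng.uniform(0.35, 0.6)  # hypothetical season FG%
    shots = [int(rng.random() < fg_pct) for _ in range(300)]
    z_scores.append(runs_z_score(shots))
print(statistics.mean(z_scores), statistics.stdev(z_scores))
```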

Simplifying Assumptions

We should start with the assumption that athletes really are weighted random number generators. A coin might have "good days" and "bad days" based on the results, but it's not because the coin is "in the zone" one day, or a little injured the next day. At least some of the variance in a player's streakiness is due to randomness, so we have to be looking for effects that can't be explained by randomness alone.

So I am analyzing all shots a player took, across all games. This could cause problems, which I will discuss later on, but splitting the results up game-by-game or week-by-week leads to other problems. Looking at shooting percentages by game or by week means smaller sample sizes, and thus more sampling error. It also means that comparisons between high volume shooters and low volume shooters can be misinterpreted. The high volume shooters may appear more "consistent" simply because it's a larger sample size.

I think I need to prove that streakiness exists before making assumptions about how it works. Let's say the "hot hand" does exist. If a player makes a bunch of shots in a row, how long might they stay hot? Does it last through halftime? Does it carry over to the next game? How many makes in a row before they "heat up"? How much does a player's field goal percentage go up? Does a player have cold streaks and hot streaks, or are they only streaky in one direction?

There are an infinite number of ways to model how it could work, which means it's ripe for overfitting. So I wanted to start with the simplest, most easily justifiable model. The original paper about the hot hand was co-written by Amos Tversky, whose work with Daniel Kahneman helped invent behavioral economics and earned Kahneman a Nobel Prize in 2002 (Tversky died in 1996, before it was awarded). I figure any time you can crib off of Nobel-caliber homework, you probably should!

Results

I started off by getting data on every shot taken in the 2024-25 NBA regular season. I calculated the expected number of streaks and actual number, then a z-score for every player.

Players with a z-score of 0 are just like what we'd expect from flipping a coin. A positive z-score indicates there were more streaks than expected. More streaks than expected means the streaks were shorter than expected, which means less streaky than expected.

A negative z-score indicates the opposite. Those players had fewer streaks than expected, which means the streaks were longer. When people talk about the "hot hand" or "streaky shooters", they are talking about players who should have a negative z-score by this test.

all players, 2024-5

The curve over the top is the distribution of z-scores we'd expect if the players worked like weighted coin flips.

Just eyeballing it, it's pretty close. It's definitely a bell curve, centered pretty close to zero. If there is a skew, it's actually to the positive, un-streaky side, though. The mean z-score is .21, when we'd expect it to be zero.

count    554.000000
mean       0.212491
std        1.075563
min       -3.081194
25%       -0.546340
50%        0.236554
75%        0.951653
max        3.054836

The Shapiro-Wilk test is a way to decide whether a set of data plausibly came from a normal distribution. It passed. There is no conclusive evidence that players in general are streakier or less streaky than predicted by chance. This data very well could've come from flipping a bunch of coins.

But it's still sorta skewed. There were 320 players with a positive z-score (un-streaky) versus 232 with a negative z-score (streaky). That's suspicious.

Outliers

A whole lot of those 554 players didn't make very many shots.

number of makes, 2024-5

I decided to split up players with over 100 makes versus under 100 makes. Unlike high volume shooters, the low volume shooters had no bias towards unstreakiness. They look like totally random data.

Here are just the high volume shooters (323 players in total). Notice that none of them has a z-score much below -2, even though the distribution should be symmetrical.

over 100 makes, 2024-5

count    323.000000
mean       0.347452
std        1.068341
min       -2.082528
25%       -0.454794
50%        0.363949
75%        1.091244
max        3.054836

There were 20 players with a z-score greater than 2 versus only 2 players with a score less than -2.

The Eye Test

I looked at which players had exceptionally high or low z scores. The names don't really make sense to me as an NBA fan. There were players like Jordan Poole and Jalen Green, who I think fans would consider streaky, but they had exceptionally un-streaky z-scores. I don't think the average NBA fan would say Jalen Green is less streaky than 97.5% of the players in the league, but he is (by this test).

On the other hand, the two streakiest players in the NBA this year were Goga Bitadze and Thomas Bryant, two players who don't fit the profile of the stereotypical streaky shooter by any means.

Makes vs. Streakiness

The more shots a player made this season, the less streaky they tended to be. Here's a plot of makes on the 2024-25 season versus the z-score.

makes vs z-score

That's pretty odd, isn't it?

Getting more data: 2021-present

I figured a bigger sample size would be better. Maybe this season was just weird. So I got the last 4 seasons of data (2021-22, 2022-23, 2023-24, and 2024-25) for players who made a shot in the NBA this season, and combined them.

The four year data is even more skewed towards the lukewarm hand, or un-streaky side, than the single year data.

all players, 2021-2025

count    562.000000
mean       0.443496
std        1.157664
min       -4.031970
25%       -0.312044
50%        0.449647
75%        1.184918
max        4.184025

The correlation between number of makes and z-score is quite strong in the 4 year data:

2021-2025 z score vs makes

There were 48 players with a z-score > 2, versus only 9 with a score <-2. That's like flipping a coin and getting 48 heads and 9 tails. There's around a 2 in 10 million chance of that happening with a fair coin.
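The coin-flip comparison can be checked exactly with a binomial tail. Here is a minimal sketch. It assumes that, under pure chance, a player who lands beyond ±2 is equally likely to land on either side. The function name is hypothetical.

```python
from math import comb


def binomial_tail_p(heads, tails):
    """Exact one- and two-sided p-values for getting a split at least this
    lopsided from a fair coin."""
    n = heads + tails
    k = max(heads, tails)
    one_sided = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return one_sided, min(1.0, 2 * one_sided)


one_sided, two_sided = binomial_tail_p(48, 9)
print(one_sided, two_sided)
```

The two-sided version comes out to roughly 1.5 in 10 million, the same order of magnitude as the figure above.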

High Volume Shooters, Redux

The bias towards the lukewarm hand is even stronger among high volume shooters. Here are players with more than 500 makes over the past 4 years.

over 500 makes

The z-scores pass the Shapiro-Wilk test for normality, but they're no longer even close to being centered at zero. They're also overdispersed (the std is bigger than the expected 1). It's not plausible that the true mean is 0, given the sample mean is .680.

count    265.000000
mean       0.680097
std        1.217946
min       -2.392061
25%       -0.149211
50%        0.776917
75%        1.485595
max        4.184025

high volume hist

Streak Lengths

I looked at the length of make/miss streaks for the actual NBA players versus simulating the results. The results were simulated by taking the exact number of makes and misses for each NBA player, and then shuffling those results randomly. What I found confirmed the "lukewarm hand" -- overall, NBA players have slightly more 1 and 2 shot streaks than expected, and fewer long streaks than expected.

streaks
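The shuffling comparison can be sketched like this. Keep a player's exact makes and misses, shuffle the order many times, and compare the average count of streaks of each length to the real sequence. The function names, the number of shuffles, and the example sequence are assumptions, not the notebook's settings or data.

```python
import random
from collections import Counter
from itertools import groupby


def streak_lengths(shots):
    """Count how many streaks of each length appear in a make/miss sequence."""
    return Counter(len(list(group)) for _, group in groupby(shots))


def shuffled_streak_lengths(shots, n_shuffles=1000, seed=0):
    """Average streak-length counts over random reorderings of the same
    makes and misses (the 'weighted coin' baseline)."""
    rng = random.Random(seed)
    shots = list(shots)  # shuffle a copy, not the caller's list
    total = Counter()
    for _ in range(n_shuffles):
        rng.shuffle(shots)
        total.update(streak_lengths(shots))
    return {length: count / n_shuffles for length, count in total.items()}


shots = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]  # made-up example sequence
actual = streak_lengths(shots)
baseline = shuffled_streak_lengths(shots)
for length in sorted(set(actual) | set(baseline)):
    print(length, actual.get(length, 0), round(baseline.get(length, 0.0), 2))
```

A "lukewarm" player would show more length-1 and length-2 streaks than the shuffled baseline, and fewer long ones.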

Obvious objections, and what about free throws?

I'm treating every field goal attempt like it has the same chance of going in. Clearly that's not the case. Players, especially high volume scorers, can choose which shots they take. It's easy to imagine a player that has missed several shots in a row and is feeling "cold" would concentrate on only taking higher percentage shots. There's also the fact that I'm combining games together. That could potentially lead to players looking less streaky than they are within the course of a single game. But it should also make truly unstreaky players look less unstreaky. Streaks getting "reset" by the end of the game should make players act more like a purely random process -- not too streaky or unstreaky. It shouldn't increase the standard deviation of the z-scores like we're seeing, or cause a shift towards unstreakiness.

I may do a simulation to illustrate that, but in the meantime, the most controlled shot data we have is free throw data. Every free throw should have exactly the same level of difficulty for the player.

I got the data for the 200,000+ free throws in the NBA regular season over the past four years (October 2021 through April 2025).

Here are the z-scores for all players. There's a big chunk taken out of the middle of the bell curve, but it's normal-ish other than that.

free throws

240 players have made over 200 free throws in the past 4 years. When I restrict to just those players, there's a slight skew towards the "hot hand", or being more streaky than expected. There are no exceptionally lukewarm hands when it comes to free throws. It's sort of the mirror image of what we saw with high volume field goal shooters.

free throws, over 200 makes

count    240.000000
mean      -0.144277
std        1.021330
min       -2.686543
25%       -0.854723
50%       -0.174146
75%        0.660302
max        1.845302

Conclusions, for now

I feel comfortable concluding that the hot hand doesn't exist when it comes to field goals. I can't say why there's a tendency towards unstreakiness yet, but I suspect it is due to shot selection. Players who have made a bunch of shots may take more difficult shots than average, and players who have missed a bunch of shots will go for an easier shot than average. While players can't choose when to "heat up" or "go cold", they can certainly change shot selection based on their emotions or the momentum of the game.

There may be a slight tendency towards the hot hand when it comes to free throws. It's worth investigating further, I think. But the effect there doesn't appear to be nearly as strong as the lukewarm hand tendency for field goals.
