Sports Reference founder and president Sean Forman discussed with Boardroom his site’s decision to add women’s college basketball to its database.
If you’re a sports fan, you’re familiar with Sports Reference. It’s where you go to check your favorite baseball player’s WAR. To see how many times Giannis has been named to the All-Defensive team (five). To look up Derek Carr’s career home/road splits.
Until recently, however, the site has had a blind spot in women’s college basketball. Sports Reference has a treasure trove of data related to the men’s game and the WNBA, but if you wanted to see Sabrina Ionescu’s advanced stats at Oregon, you were out of luck.
Not anymore. A couple of weeks ago, the site added women’s college hoops data going back to the 2009-10 season and hopes to have another decade’s worth of info ready to go this week.
As the site continues to roll out a tsunami of statistics for the women’s game, Sports Reference founder and president Sean Forman spoke with Boardroom about how and why his company took on such a tall task.
“What’s important to us is just making data available in places where maybe it wasn’t before,” he told Boardroom. “Part of that was a big push in the WNBA. Part of that was a big push for women’s soccer as well. I think women’s college basketball is definitely on the rise in terms of popularity. It made sense for us.”
RUSSELL STEINBERG: Did you see a demand for women’s basketball data beforehand? Were people asking you about it?
Sean Forman: I think we could see that just around the women’s tourney, there was a lot more interest, both television interest and, like, ESPN had a record number of brackets submitted for the women’s tourney as well. I thought it definitely felt like there was an interest there. Our WNBA site is very extensive and does pretty good traffic each year when it’s in season. And so, we felt it was a good extension for that.
RS: It must have been much easier to build out the site for baseball versus women’s college basketball, where the availability of data is a lot more difficult. How do you even go about compiling all of this data?
SF: One of our dirty little secrets is we don’t do a lot of data aggregation ourselves in terms of from primary sources. Genius Sports, who is one of the big data providers along with Sportradar and Stats LLC, they’re the NCAA’s official partner. And so they had women’s coverage back to 2009-10. So we were able to strike a deal with them, and it’s a significant upfront investment, but we were able to get them to license us that data.
We’re also going in and filling in things like biographical information. We had to get all the awards, we’re still working on the conference players of the year, all-conference teams, all-tourney teams, things like that. By the tourney, I’m hoping we’ll have 20-plus years of data. And then from there, it’s really about filling it in. Like on the men’s side, we have tourney box scores, I think going back to the 50s. Obviously, we’d like to get that back to 81-82 on the women’s side, and then start filling in stats. The NCAA has some of this historical information available. The schools have some of this historical information available. On our men’s side, we’re not 100% covered. So it’s where we can get the information, we’re gonna start adding it in. Our goal is that we would love to have something that’s at least for full-season data complete back to 81-82.
RS: In terms of actually gathering the data and getting it all on the site, I know that’s not you sitting there plugging in every number, but what is that process like? What’s the time commitment that goes into that, and how does it get from step one to what I see when I log on?
SF: We had a group of five or six people whose primary job was to launch the women’s college basketball site. They’ve been working on it probably six, seven months now. We already had the men’s college basketball site, so we’d already identified old schools. But, you know, having to go back and work out all the conference affiliations over time, we had to work on all that kind of stuff. Often we’re looking at Wikipedia, we’re looking at other sources that you would look at as well, just to make sure that we’re sussing this out correctly and implementing it.
And there are gonna be errors. Hopefully, you or your readers will let us know and we can fix them. One of the nice things about what we do is things always can get better as we fix errors. So that’s a nice thing about our business. The main work was taking what we had on the men’s side and then generalizing it so we could likewise cover the women’s side and make sure that we were doing it in a way that was respectful to the women’s game and not shunting it off into kind of a separate section or something like that. We tried to treat both sides of the coin equally and give them equal presentation on the site. I think we’ve mostly got there.
RS: I would imagine one of the challenges is that there’s just a lot more information publicly available about the men’s game. What sort of roadblocks did you hit in trying to even just get the data that you’ve been able to present so far?
SF: I think one of the benefits on the men’s side is there have just been a lot more eyes on it. And so, players who’ve transferred, we’ve probably got an email about, ‘Hey, you’re not linking these two player IDs, these are the same person.’ There probably has not been as much because the site’s only been up for like two or three weeks now. But even from the data providers, there’s fewer eyes on that information and making those corrections. So we send a lot of information to our data providers saying, ‘here are things that we’ve found,’ and they add those corrections into their data set as well.
So it’s kind of a virtuous cycle of making these corrections and updates. I think there’s a thirst for this information. It’s not as readily available as it might be on the men’s side. And so that, for us, has been a big opportunity. Caitlin Clark has already been trending on our site, and that’s irrespective of gender. So we’ve seen a significant interest in looking at those pages and finding information about those players. I think the lack of competitors makes the information a little hard to find, but also gives us a better opportunity going forward.
RS: In the short time that this has been up, what sort of feedback have you gotten, either from fans or media?
SF: It’s been very gratifying. We’ve gotten a lot of positive feedback. Some people have said ‘what took so long?’ It’s a fair criticism. We probably should have done this earlier than we ended up doing it, but we got there. But it’s been very positive. When we first launched, we saw a lot of really positive comments on Twitter and elsewhere. The team has been very excited about it. You can tell sometimes when you’ve made somebody’s life a lot easier. So that’s a very gratifying thing, and the team feels very good about that. They’ve been pretty proud. We had a very young team working on this. I think everybody, the leadership on the team was in their twenties. And so, fairly junior people were taking the lead to make this happen and really pushed it forward, and I’m really proud that they were able to grow into that and get this out the door.
RS: Maybe this is me being overly cynical, but the fact that you’ve added women’s data two years after the 2021 debacle in the San Antonio bubble, and all of the increased attention around gender equity, is that what spurred this? Just finally realizing that, hey, maybe it’s time to explore the women’s game?
SF: That certainly has weighed on me as time has gone on, and I’m certainly aware of those things. I would say probably the bigger issue for us is we’ve expanded pretty significantly in the last two to three years. So we had 11 people for quite a while, and then in 2020, actually into 2021, we’ve started expanding. We’re up to 32 people now. And so previously, there was a lot of scrambling just to keep the lights on and keep operating the sites we have. So we didn’t necessarily have the bandwidth to start adding a lot of additional stuff.
This was definitely one gap that we had. We were aware of it and had been working to close it, and the other aspect of that is with Genius, we were able to find a partner that would license us like 10 years of data. We didn’t really wanna start something that just had like one or two years of data, and our previous college basketball provider on the men’s side did not cover women’s college basketball. The day that deal expired, we were able to then look for a new partner. Genius was thankfully willing to work with us, and so we converted all of our college stuff over to them: college football, men’s college basketball, and then we’re able to add women’s college basketball. So that’s been a significant reason for why the timing has been such as it is. We were, I wouldn’t say dragging our feet, but it was not a priority for us prior.
RS: When Breanna Stewart was at UConn, my friends and I were trying to compare her to Rebecca Lobo from her college days. We were trying to find numbers for Lobo, and we just couldn’t, other than very basic, basic stats. It’s things like that that just come up a lot for me in covering women’s basketball, and it feels like making this data more available is such an underrated component to growing the sport. Do you see that as well?
SF: Absolutely. I think Sue Bird wrote a piece for The Players’ Tribune back in 2016 about how on the NBA’s side you’d have charges drawn and field goal percentage at the rim, and all that kind of stuff. And they weren’t presenting any of that on the WNBA side. And so they didn’t really know the things that the men could know on the NBA side, that they could study and learn from and improve from. It was kind of a blocker for them. And so thankfully we’ve seen the WNBA up their game and start presenting that information.
I think we’re all about facilitating arguments about different players, right? So if we can put out offensive rating and effective field goal percentage and stuff like that for Breanna Stewart and Rebecca Lobo, that’ll allow you to compare those two. I think that’s gonna be great for everybody. You know, we love those arguments. We love facilitating those discussions. Those are the types of things you’ll see in our Slack channel in our office as well. We’re excited to see where people go with this information. We make it very easy to use, very easy to repackage, share with the readers, or share with each other.
RS: I think Sports Reference has played a role in familiarizing people with more advanced stats, particularly in baseball. Do you see yourself as having a role in normalizing analytics?
SF: Yeah, absolutely. On the baseball side, for instance, there are a lot of people who are really doing cutting-edge work: Baseball Prospectus, FanGraphs, MLB Savant with Statcast data, and stuff like that. We really try to cover everything. We have such a large audience. I really view our role as, we’re not on the cutting edge, but we’re kind of like 85% of the way there. And so we’re going to present stuff that we think provides an interesting context to the user and allows them to answer questions that they might be having. That’s really kind of our standard for any stat we provide: Is this useful to the user? Will they be able to understand it? Can they put it in context? Can they understand what good is, what bad is, what average is, all that kind of stuff.
We really work hard to try and put that context around it. All the advanced stats we had on the men’s side, we’ve applied to the women’s side as well. That’s ongoing work for us and so as new numbers become available, we’ll take a hard look at adding those. We have some access to shot charts and things like that, that we’re not using at the moment. Maybe that’s the direction we go in the future where we’re adding more shot-related information that I know is being gathered at the team level.
RS: So what’s coming next?
SF: We will be launching, I think it’s the [data dating back to the] 01-02 season. We’re kind of merging two different data sets that we’re able to get. And so putting those together, at the boundary, you’ve gotta like match up players and teams and all that kind of stuff. So it’s a long process, but everything the team tells me is that we’ll be ready to go early [this] week with data back to 01-02.
Anything we have on the men’s side we’re hoping to have [on the women’s side]. For the most part we’re there. There are a few things we’re working on, like our NCAA tourney forecast for the women’s side, which we will hopefully have up in time for Selection Sunday. One thing we have on the men’s side that we probably won’t have in time this year: we have every buzzer-beater in men’s NCAA tourney history. And so that’s really just kind of sweat-of-your-brow, show me all the games that are within three points, and then see how the game ended, or is there a YouTube video for it, that kind of stuff. Mike Lynch, who’s our head of data here, that was a passion project for him that he started. And we have every buzzer-beater in NBA history on our site; we hope to get all the NCAA tourney games as well on both. We have it on the men’s side, but we’re hoping to get it on the women’s side as well.