Aaron Barzilai is a data science and analytics expert with experience both in basketball analytics and pharmaceutical data science applications. In his work as an independent consultant, he helps companies craft and execute complex predictive analytics, statistical analysis, and data architecture projects.
In our latest Expert Q&A, Aaron talks about evaluating data science opportunities, combining the human and statistical perspective, and building tools that are useful to everyone in the business.
What skills do you bring to the table as an independent data science expert?
My background is pretty varied. I’m not classically trained in data science—I have a PhD in mechanical engineering from Stanford—but I’ve always been very curious, and I got started in data analysis working first for the Motley Fool and then for Capital One. After that, I moved into management consulting at ZS Associates. As an independent data science expert, I think it’s critical to be as comfortable talking with senior management as you are getting your hands dirty while writing quality code. That way, you can work anywhere in the chain at your client to solve their problems.
You’ve also done a lot of work in basketball analytics, with stints at the Memphis Grizzlies, the Philadelphia 76ers, and HerHoopStats.com, a website you founded in 2017 that’s devoted to NCAA Division I women’s basketball.
I’ve loved basketball from a young age and always enjoyed math. I was on the varsity basketball team at MIT. I wasn’t very good, though. When I first learned about the existence of basketball analytics about 15 years ago, I thought it would be a fun thing to start dabbling with.
Analytics has taken the business world by storm. Everyone knows they need to develop better data science strategies and make more data-driven decisions, but many companies are struggling to get there. What challenges have you seen in your consulting practice?
Companies, especially big, established companies, face a lot of different challenges. I think sales offers a useful analogy—there’s an inherent tension between the sort of localized knowledge that you get from being in the field and the more data-driven recommendations that come from HQ. We say the same thing in basketball. There’s a great quote by Dean Oliver that says, “The numbers don’t see any game as well as a human does, but they see every game.” The idea is that you’re better off watching a game instead of just looking at the stats if you can, but there’s also power in having consistent information on every game—or potential customer—including ones you can’t observe in person.
So the trick is to be able to marry the human and the statistical perspective?
The real trick is having enough information that you can see the same things that people in the field are seeing—the things that humans are using in their decision making. So much of what data scientists are trying to do is just systematize things and make them consistent, so that our approach isn’t swayed by the biases we all bring to decision making.
How, in your consulting practice, do you help your clients accomplish that?
It varies. Sometimes, clients are struggling to optimize a process, and we need to come up with really concrete, measurable ways to prove value. “We sped up the targeting process by X number of weeks or days,” or “We’re much more efficient, so our ROI is going up by Y percent.” It’s always satisfying to create predictive models that do well in unfamiliar situations and to watch your chart of predicted-to-actual-outcomes come close to a straight line. Other times, you’re working in the white space, and you don’t really know what the solution’s going to be.
What’s your approach in that situation? Many companies seem to struggle with a realm in which there’s still so much open territory.
I’m pretty pragmatic, and I don’t assume I know more about what’s going on than the client does. Often, we arrive at the answer by defining a structure through which to evaluate the opportunities and pain points and key drivers of the business. If you think about the pharmaceutical world, it’s like going from the whole population to people with a specific condition to just those people that are going to benefit from your therapy. When you do that systematically, it’s much easier to recognize if you’re lagging in a particular issue.
So you’re trying to create a disciplined discovery process rather than casting about and evaluating different data science applications.
So many times, people ask, “What do we think about this?” instead of, “Well, let’s find out.” When people think about data scientists, they picture fancy algorithms and natural language processing and all that. But really, so much of it is very first-order model kind of stuff. What are the chances that a 5’ 11” NBA prospect is going to be an All-Star? I don’t have the numbers off the top of my head, right? But say something to the effect of 50 players who are 5’ 11” or shorter have been drafted into the NBA in the last ten years, and three of them have become All-Stars. So, saying it’s a 6% chance would be a pretty good first-order guess.
And you can refine your prediction as you learn more about the player. Maybe they have much longer arms, and that increases it from 6% to 7%. But you don’t often need to have some crazy complicated neural network. In a lot of situations, it’s much more about just leveraging what’s out there and what you know.
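The first-order guess described in this answer is just a historical base rate: successes divided by comparable cases. A minimal Python sketch of that arithmetic follows. It assumes the illustrative figures from the answer (50 drafted players, three All-Stars), which Aaron notes are not exact numbers, and the function name `base_rate` is hypothetical.

```python
def base_rate(successes: int, total: int) -> float:
    """Return the share of comparable past cases that succeeded."""
    if total <= 0:
        raise ValueError("total must be a positive number of cases")
    if not 0 <= successes <= total:
        raise ValueError("successes must be between 0 and total")
    return successes / total


# Illustrative figures from the interview, not verified draft statistics.
drafted_players = 50  # players 5' 11" or shorter drafted in the last ten years
all_stars = 3         # of those, how many became All-Stars

first_order_guess = base_rate(all_stars, drafted_players)
print(f"First-order guess: {first_order_guess:.0%}")  # prints "First-order guess: 6%"
```

Anything learned later about the individual player, such as unusually long arms, would adjust this starting estimate instead of replacing it.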
What steps should companies take as they try to become more data driven?
First, make sure you’ve got good infrastructure around your data so that you can actually start to make use of it. Second, take a very hard look at how you can get your whole organization to use better data practices. Can you put a tool into the hands of your marketing team that everyone can use? If you’ve got 100,000 people in your organization, it might be more impactful to make them 5% better than to hire the top data science PhDs from Stanford and form a centralized analytics group.
Finally, and perhaps most importantly, when people get poached by your competitors, make sure the institutional learning stays in your organization and you can still run those models next year.
On some level, that suggests that overcoming data science challenges is not really so different from following best practices in any other domain.
Sometimes, people seize on the next big thing and assume the rules no longer apply. You could have the very best NBA draft model, but it won’t help you if nobody uses it. The same thing applies to data scientists. There are plenty of incredibly talented people out there. You don’t necessarily need the best person in the world, but you do need to make sure your organization can take advantage of what’s being built.
Do you have any specific recommendations for making use of independent data science experts?
It always helps to get as specific as possible about the problem you’re trying to solve. It’s okay to acknowledge that you’re not exactly sure what you’re looking for as long as everyone’s on the same page. But sometimes clients ask for one thing, then discover they really wanted something else, and that’s frustrating for everyone. It’s one of the reasons I try to mock up the deliverable at the start of every project—this is the area we’re looking at, and these are the pieces of information I’m going to give you when things are done.
Another place people get into trouble is when they don’t give the consultant access to the real client. If you’re creating a tool for the company’s sales force, it can be hugely important to talk to some salespeople and find out how they do their work.
In a field that changes so quickly, what developments are you keeping your eye on?
People are moving into newer serverless technologies like AWS Lambda. It’s pretty cool, and I enjoy using it, but it’s a lot harder to debug than traditional computer programs.
I tend to focus on the fundamentals. That was Tim Duncan’s nickname, The Big Fundamental, after all. It’s important to track new technologies, but not at the expense of the fundamentals. People often make huge advances in their data science capabilities and their use of predictive models. It’s not because they’re inventing new ways to analyze the data, but because they’ve invested in getting better data sets or using the data more effectively. It doesn’t get as much attention as the latest image processing technique. But really, it’s amazing what you can do with logistic regression.
About the Author: Leah Hoffmann