Solving for Will Smith and Max Muncy
Using a Unique Player ID to combine stat lists without duplicate names and misspellings.
It's been a while (like 8 weeks). Turns out it's hard to take care of a newborn and even attempt to post substack articles. That’s right, Mrs. Baseball Sheets and I had our first child. So while you will still see posts, expect them to be full of errors and very rambly as I am sleep-deprived but pumped to watch baseball this year with my little guy. So with that news out of the way, let's get to the content:
The Unique Player ID or UPID is just a number I randomly assigned to every active player I could find. I started counting up randomly with Jacob deGrom back in 2022 with the number 1876 (the birth year of baseball). The real power of the UPID number is the database behind it: I have listed names and alternate spellings for every player along with team and position. That means you can use a QUERY search to match up every random list of players and stats with your own player database.
How it started
In past articles, I showed how I built an off-platform prospect system in Google Sheets. The only way to build something like that was to have a unique ID database like this. Most of my player info is typically from Fangraphs, various prospect ranking websites, and official MLB prospect lists (Like the TOP 100). The first issue I ran into was how to combine all these sources. Lining up different lists, rankings, and information required listing prospects and connecting all the different sources to that list. This was not too hard with the QUERY function, which allowed me to search a range for a prospect’s name and select the data I needed. However, some interesting problems came up when I needed player names to be spelled the same across two data sets. The simple solution was to spot-check known issues. However, I realized quickly that none of the websites were consistent with accents, player names tended to change (Samuel gets listed as Sammy), and overall it became a huge process just going through the list to change each instance of a name not matching up.
To solve that issue I began compiling alternate spellings of player names (adding a non-accent option and listing all juniors with and without the “Jr.”). This allowed me to search through the alternate spellings till I found a match. I made an ID number and labeled each active player so that I would not have to update every name in a list with the proper spelling each time. Also, a number allows consistency across every list.
How it's going
I have since added 5,587 active players to my list with 6,540 alternate name spellings for a total of 12,097 options to pick from. To search the list, I use a Google Sheets Named function called UPID. This is a custom function that I can import into any of my projects. The UPID Named function uses a QUERY function to search an IMPORTRANGE of the Unique Player ID Database allowing it to search my database for the correct player name to match the player name in the data set I am working with. Once it finds the player name, it sends back the UPID number and I can then match that row of player data with any other dataset with the UPID number. It’s sort of like an assistant holding a printout of my unique player database and calling out the number when I ask about a player name.
Me: ”Hey, what is Vladimir Guerrero’s UPID number?”
UPID Assistant: “Vladimir Miguel Guerrero?”
Me: “No, Vladamir Guerrero Jr. on the Blue Jays”
UPID Assitant: “Oh…its 1986”
The UPID function:
3 steps to the UPID Function:
The First Step
Checks to see if the Player Name and the Team match the data set and then returns the UPID. It also checks alternate names. If none of the alternate names and teams match the input, the IFERROR function moves to the next step.
The Second Step
Looks for duplicate names, like Max Muncy or Will Smith, and lets you know that you need to add the team abbreviation to get a UPID back. If there is only one name in the database that matches your input, it moves to the third step.
The Third Step
Opens up the qualifications and if any name in the database matches the player name provided, it will return the UPID.
If there is not a match in the database the function will show the “NO UPID” error code and let you know that no UPID number was found.
Team Abbreviations
Step 1 of the function relies on correct team information, so I do my best to update team info at the start of the season and when players are traded. I have run into the team abbreviation problem where every website likes to use a different code for each team. Jay Cuda @JayCuda on twitter/X has a great breakdown of each major website:
To help with this, I also created another named function to switch incorrect abbreviations (Yes, WSN and WAS are wrong) to the official MLB list. It works the same way as the UPID function but returns the official MLB team code when given a team name. It will also convert full team names to abbreviations.
How you can use UPID
I have a link below to the Unique Player ID Database which you can use in two ways:
1. You can use my sheet as your source. Import the UPID named function and the OFFICIALTEAM named function into your project. It uses data from my Unique Player ID Database and will allow you to use the UPID number to connect your data.
2. You can make a copy of my Unique Player ID database and follow the instructions to change the source of the UPID function to your sheet.
Unique Player ID Database
https://docs.google.com/spreadsheets/d/19hH-bUVbtbF4Qn4Ep6YRCK853eOvoI8lr2zNlRB1wgo/edit?usp=sharing
You will undoubtedly find errors and issues, which is why I put together a Google Form to report on issues so I can find and fix them:
[UPID] Report an Issue form.
For instance Luis Garcia, Luis García, Luís García, and the other Luis García - it took me a while to work out there were 4 of these guys, so please help me keep my list up to date and I hope this solution helps with combining lists and digging deeper into the stats.
Thanks for reading and I hope you enjoy the tool.
This is on my to do list, so this is a great help thank you!! I was planning to start with aligning each source’s unique player ID that’s visible if you download data from the site (fangraphs, BP, Fantrax, etc). But perhaps I’ll use yours as the “master” and add those sources’ UPID in too!