Pyomo Meets Fantasy Football

Intro

So-called “daily fantasy leagues” are popping up on the internet en masse. This form of gambling gets classified as a ‘game of skill’ and hence remains legal (for now) in all states but Arizona, Iowa, Louisiana, Montana and Washington according one website’s disclaimers. The 30-second explanation of a daily fantasy contest follows:

You, the team manager, must select a team from a pool of players; your choices are constrained principally by a salary cap and the positional requirements of the roster. The pool of available players might consist, for example, of all NFL players who have a game on Sunday, November 30. Each player has an associated cost–a healthy Peyton Manning might cost $9,000 in a league with a $50,000 salary cap. If we can obtain player point projections from a third-party site, an interesting albeit recreational optimization problem emerges.

Pyomo

Pyomo is a flexible modeling framework for Python which is written and maintained by Sandia National Labs. Furthermore, it’s supported by the COIN-OR Foundation (under the now-deprecated name Coopr). For an open-source project, it possesses an unusually good level of support and documentation. I’ve worked with a few specialized algebraic modeling languages (MathProg in conjunction with GLPK and Xpress-Mosel) and I came to realize some inherent limitations in their static treatment of data. Sure, there are control structures in Mosel and difficult-to-learn APIs for GLPK, but nothing I’ve encountered offers the fluid model-building paradigm of Pyomo. Do you want to read a static data set, pull in real-time S&P 500 data, build dynamic constraints, pause to cook an omelet, then throw out all data lines that contains the word “aardvark”? Pyomo has you covered. Model building can be stopped and started at will because it’s all performed in-line with your other Python code. For more information about supported solvers and additional components, such as the stochastic programming extension, see http://www.pyomo.org/about/.

The Code

Let’s conjure up a fantasy league in which we’d like to choose 1 quarterback, 2 wide receivers, 2 running backs, 2 tight ends, 1 defense, and 1 kicker. The salary cap is $50,000 and we’re keen on maximizing the sum of projected points.

The first few rows of input data:

name	position	team	cost	proj_pts
Shayne Graham	K	NO	4800	9.13
New Orleans D/ST	D	NO	4600	7.11
Drew Brees	QB	NO	9100	22.07
Luke McCown	QB	NO	5000	0.84
Jimmy Graham	TE	NO	7000	13.16
Mark Ingram	RB	NO	6100	10.75
Marques Colston	WR	NO	6700	9.45

Without further ado, the Python code:

from __future__ import division
import pandas as pd
from pyomo.environ import *
from pyomo.opt import SolverFactory

## IMPORT DATA ##
fileName = "ff_data.csv"
df = pd.read_csv(fileName)

## SETTINGS ##
max_salary = 50000

## DATA PREP ##
POSITIONS = ['QB', 'RB', 'WR', 'TE', 'D', 'K']
psn_limits = {'QB': 1, 'RB': 2, 'WR': 2, 'TE': 2, 'D': 1, 'K': 1}
PLAYERS = list(set(df['name']))
proj = df.set_index(['name'])['proj_pts'].to_dict()
cost = df.set_index(['name'])['cost'].to_dict()
pos = df.set_index(['name'])['position'].to_dict()

## DEFINE MODEL ##
model = ConcreteModel()

# decision variable
model.x = Var(PLAYERS, domain=Boolean, initialize=0)

# constraint: salary cap
def constraint_cap_rule(model):
 salary = sum(model.x[p] * cost[p] for p in PLAYERS)
 return salary <= max_salary
model.constraint_cap = Constraint(rule=constraint_cap_rule)

# constraint: positional limits
def constraint_position_rule(model, psn):
 psn_count = sum(model.x[p] for p in PLAYERS if pos[p] == psn)
 return psn_count == psn_limits[psn]
model.constraint_position = Constraint(POSITIONS, rule=constraint_position_rule)

# objective function: maximize projected points
def obj_expression(model):
 return summation(model.x, proj, index=PLAYERS)
model.OBJ = Objective(rule=obj_expression, sense=maximize)

# good for debugging the model
#model.pprint()

## SOLVE ##
opt = SolverFactory('glpk')

# create model instance, solve
instance = model.create()
results = opt.solve(instance)
instance.load(results) #load results back into model framework

## REPORT ##
print("status=" + str(results.Solution.Status))
print("solution=" + str(results.Solution.Objective.x1.value) + "\n")
print("*******optimal roster********")
P = [p for p in PLAYERS if instance.x[p].value==1]
for p in P:
 print(p + "\t" + pos[p] + "\t cost=" + str(cost[p]) + "\t projection=" + str(proj[p]))
print("roster cost=" + str(sum(cost[p] for p in P)))

The first four lines warrant brief commentary: I’m using Python 2.7.9, and I want to make sure that integer division returns a floating point number, hence line 1. Also noteworthy is

from pyomo.opt import SolverFactory

The SolverFactory sub-module interacts directly with the solver (GLPK in this case) and returns the results directly to create a self-contained Python script.

Next, the program imports the player data using pandas in lines 6-8.

By convention, index sets are capitalized while data vectors are presented in lower-case. The dictionary psn_limits prescribes roster limits for each position.

POSITIONS = ['QB', 'RB', 'WR', 'TE', 'D', 'K']
psn_limits = {'QB': 1, 'RB': 2, 'WR': 2, 'TE': 2, 'D': 1, 'K': 1}

Next, the player names (index set), projected points (data), cost (data), and position (data) are extracted from the pandas data frame.

PLAYERS = list(set(df['name']))
proj = df.set_index(['name'])['proj_pts'].to_dict()
cost = df.set_index(['name'])['cost'].to_dict()
pos = df.set_index(['name'])['position'].to_dict()

When I first wrote this script, Pyomo preferred Python dictionaries and lists as model inputs; this may no longer be the case, but I encourage the reader to consult the extensive documentation for him or herself. The intermediate set() function in line 16 removes potential duplicate player names.

model = ConcreteModel()

Pyomo users may formulate concrete or abstract models. Abstract models are purely algebraic constructs which are later populated wholesale by static datasets in much same way as MathProg in GLPK or Xpress-Mosel. Concrete models, to me, better illustrate the power of Pyomo–namely, the ability to dynamically load data from native Python structures.

For example, we take the recently extracted set PLAYERS to index the vector of decision variables:

model.x = Var(PLAYERS, domain=Boolean, initialize=0)

Pyomo constraints can be defined by rules which are merely a function returning the desired constraint expression.

# constraint: salary cap
def constraint_cap_rule(model):
 salary = sum(model.x[p] * cost[p] for p in PLAYERS)
 return salary <= max_salary
model.constraint_cap = Constraint(rule=constraint_cap_rule)

The second constraint is really a set of constraints, one for each element in POSITIONS. Note the extra argument in the rule-function definition and the index set passed to the Constraint object.

# constraint: positional limits
def constraint_position_rule(model, psn):
 psn_count = sum(model.x[p] for p in PLAYERS if pos[p] == psn)
 return psn_count == psn_limits[psn]
model.constraint_position = Constraint(POSITIONS, rule=constraint_position_rule)

Approaching the end, we define the objective function with syntax similar to constraint definition. The summation() function provides a useful shorthand for element-wise multiplication followed by summing (i.e. a dot product).

# objective function: maximize projected points
def obj_expression(model):
 return summation(model.x, proj, index=PLAYERS)
model.OBJ = Objective(rule=obj_expression, sense=maximize)

The remainder of the code runs the optimization model and prints the results. Line 53 offers a helpful trick: the optimization results are returned in a raw form from the solver; instance.load() pulls the raw data back into the model framework where we can more easily query the results by player name.

Here’s the output:

status=optimal
solution=100.42

*******optimal roster********
Cleveland D/ST D  cost=4800  projection=10.18
Jermaine Kearse WR  cost=4700  projection=7.33
Heath Miller TE  cost=5400  projection=8.72
Lamar Miller RB  cost=7300  projection=16.15
Billy Cundiff K  cost=4600  projection=8.67
Jeremy Hill RB  cost=5200  projection=10.37
Brian Hoyer QB  cost=6200  projection=17.41
Miles Austin WR  cost=4800  projection=8.43
Jimmy Graham TE  cost=7000  projection=13.16
roster cost=50000

Final Remarks

Some leagues have a “FLEX” position which any one of a running back, wide receiver, or tight end may occupy. The implementation of one or more FLEX positions is left as an exercise. (Hint: constrain the total number of WR + RB + TE.)

Disclaimer: it is the author’s wish not to encourage gambling, but rather the pursuit of statistical education.

Thanks for your feedback; I’m glad it helped. I felt there was a dearth of good Pyomo examples out in the public domain. I’ve built a Pyomo model similar to what you’re describing once before, so I’ll try to formulate a partial optimization problem. I’m going to operate under the assumption that your problem models a supply chain, with origin-destination pairs. To make it interesting, let’s assume that not all pairs are viable shipping routes. Perhaps there are warehouses in Memphis and Denver with stores dotted around the United States. Here’s how I would formulate something using a concrete Pyomo model.

Hypothetical CSV (ship_costs.csv)
origin,destination,ship_cost
Memphis,New York,1000
Memphis,Trenton,1100
Memphis,Tallahassee,800
Memphis,Chicago,700
Denver,Los Angeles,1200
Denver,Boulder,300
Denver,Phoenix,700
Denver,Chicago,800

(warehouse_capacity.csv)
origin,capacity
Memphis,10000
Denver,6000

Using pandas, here are some of the highlights:
shipData = pd.read_csv(‘ship_costs.csv’)
capData = pd.read_csv(‘warehouse_capacity.csv’)

# INDEX SETS
# Note: the intermediate use of ‘set’ removes duplicates
WAREHOUSES = list(set(shipData[‘origin’]))
STORES = list(set(shipData[‘destination’]))
# Note: itertuples() needed to create multi-dimensional list here
ALLOWED_ARCS = list(set(shipData[[‘origin’, ‘destination’]].itertuples(index=False)))

# PARAMETERS
shipCosts = shipData.set_index([‘origin’, ‘destination’])[‘ship_cost’].to_dict()
whCap = capData.set_index([‘origin’])[‘capacity’].to_dict()
…
# DECISION VARIABLE
model = ConcreteModel()
model.flow = Var(ALLOWED_ARCS, domain=NonNegativeIntegers)
…
# APPLY CAPACITY CONSTRAINT
def constraint_flowCap_rule(model, wh):
# creates set of all arcs emanating from warehouse ‘wh’
K = [ (w,s) for w in WAREHOUSES for s in STORES if (w,s) in ALLOWED_ARCS and w == wh ]
# make sure the sums of product flows out of this warehouse don’t exceed its capacity
return sum(model.flow[k] for k in K) <= whCap[wh]

# Note: here the index set (WAREHOUSES) is listed first. A constraint is applied for each
# member (wh) of WAREHOUSES
model.constraint_flowCap = Constraint(WAREHOUSES, rule=constraint_flowCap_rule)

I sort of made that up on the fly while looking at my old model–so it may require some tweaking to get it to work. Good luck!

-Nate

Reply ↓

9 thoughts on “Pyomo Meets Fantasy Football”

Nick P on April 8, 2015 at 2:55 am said:

Great write-up. This has helped a lot with formulating my own model in Pyomo. Have you ever used dictionaries and lists to read in 2d tuples and parameters from a csv? For instance, I want to define an arc between two nodes in a network and define a flow cap on this arc. Having trouble doing this in pyomo.

Thanks,
Nick

Reply ↓
ndejong77 on April 9, 2015 at 11:52 pm said:

Thanks for your feedback; I’m glad it helped. I felt there was a dearth of good Pyomo examples out in the public domain. I’ve built a Pyomo model similar to what you’re describing once before, so I’ll try to formulate a partial optimization problem. I’m going to operate under the assumption that your problem models a supply chain, with origin-destination pairs. To make it interesting, let’s assume that not all pairs are viable shipping routes. Perhaps there are warehouses in Memphis and Denver with stores dotted around the United States. Here’s how I would formulate something using a concrete Pyomo model.

Hypothetical CSV (ship_costs.csv)
origin,destination,ship_cost
Memphis,New York,1000
Memphis,Trenton,1100
Memphis,Tallahassee,800
Memphis,Chicago,700
Denver,Los Angeles,1200
Denver,Boulder,300
Denver,Phoenix,700
Denver,Chicago,800

(warehouse_capacity.csv)
origin,capacity
Memphis,10000
Denver,6000

Using pandas, here are some of the highlights:
shipData = pd.read_csv(‘ship_costs.csv’)
capData = pd.read_csv(‘warehouse_capacity.csv’)

# INDEX SETS
# Note: the intermediate use of ‘set’ removes duplicates
WAREHOUSES = list(set(shipData[‘origin’]))
STORES = list(set(shipData[‘destination’]))
# Note: itertuples() needed to create multi-dimensional list here
ALLOWED_ARCS = list(set(shipData[[‘origin’, ‘destination’]].itertuples(index=False)))

# PARAMETERS
shipCosts = shipData.set_index([‘origin’, ‘destination’])[‘ship_cost’].to_dict()
whCap = capData.set_index([‘origin’])[‘capacity’].to_dict()
…
# DECISION VARIABLE
model = ConcreteModel()
model.flow = Var(ALLOWED_ARCS, domain=NonNegativeIntegers)
…
# APPLY CAPACITY CONSTRAINT
def constraint_flowCap_rule(model, wh):
# creates set of all arcs emanating from warehouse ‘wh’
K = [ (w,s) for w in WAREHOUSES for s in STORES if (w,s) in ALLOWED_ARCS and w == wh ]
# make sure the sums of product flows out of this warehouse don’t exceed its capacity
return sum(model.flow[k] for k in K) <= whCap[wh]

# Note: here the index set (WAREHOUSES) is listed first. A constraint is applied for each
# member (wh) of WAREHOUSES
model.constraint_flowCap = Constraint(WAREHOUSES, rule=constraint_flowCap_rule)

I sort of made that up on the fly while looking at my old model–so it may require some tweaking to get it to work. Good luck!

-Nate

Reply ↓
Pau on September 9, 2015 at 5:23 pm said:

Great post! I’m new to Pyomo and I’d like to be able to run your example in my machine. Could you provide a link to the data?
Thanks!

Reply ↓
- ndejong77 on September 16, 2015 at 12:42 am said:
  
  Pau, here’s a link to a dataset. Unfortunately, I was unable to locate the dataset I used to create the original blog post. However, I tested this one and it works fine.
  https://s3-us-west-2.amazonaws.com/databarista/ff_data.csv
  
  Reply ↓
  - Pau on September 22, 2015 at 6:52 am said:
    
    Thanks! It was really useful. Just in case you don’t know, there a neat way to get a pandas data frame with the roster:
    
    players_hire = [name for name, xvar in model.x.iteritems() if xvar==1]
    roster = df.set_index(‘name’).loc[players_hire]
BCR on September 15, 2015 at 6:03 pm said:

This is a great example however when I try to use your model as constructed, the second constraint throws an error.

ERROR: Constructing component ‘constraint_position’ from data=None failed:
ValueError:
Invalid constraint expression. The constraint expression resolved to a
trivial Boolean (False) instead of a Pyomo object. Please modify your rule
to return Constraint.Infeasible instead of False.

Error thrown for Constraint “constraint_position”

I’m using a very similar dataset and set up the problem as you have and have been unable to figure out the problem. Any insight is appreciated. Thanks.

Reply ↓
- ndejong77 on September 16, 2015 at 12:46 am said:
  
  I’ve encountered that error before and I believe it results from the data not flowing into the Pyomo model properly. Try this dataset; it worked for me:
  https://s3-us-west-2.amazonaws.com/databarista/ff_data.csv
  
  Reply ↓
Eric Truett on December 14, 2015 at 7:43 am said:

I have a question about constraints. If you play in a league that has a flex or utility position, then the maximum number of players at each position will exceed the number of players allowed on the team as a whole, so you need to add a minimum constraint and also a team size constraint. For example, if your league is 1QB, 1RB, 2WR, 1TE, 1FLEX (RB/WR/TE), 1DST, then a valid roster ca have 1 or 2 RB, 2 or 3 WR, and 1 or 2 TE.

Reply ↓
- ndejong77 on December 21, 2015 at 10:36 pm said:
  
  Eric, you are on the to the right track. Probably the easiest way to do it is the following set of constraints:
  1) #RB >= 1
  2) #WR >= 2
  3) #TE >= 1
  4) #RB + #WR + #TE == 5
  
  You’ll need to modify lines 33-37 of my code a bit, but it’s doable.
  
  Reply ↓

The Data Barista

One data latté with an extra shot of math & statistics

Pyomo Meets Fantasy Football

Intro

Pyomo

The Code

Final Remarks

9 thoughts on “Pyomo Meets Fantasy Football”

Leave a comment Cancel reply

Intro

Pyomo

The Code

Final Remarks

Share this:

Related

9 thoughts on “Pyomo Meets Fantasy Football”

Leave a comment Cancel reply