
== PySP Overview  ==

This chapter describes PySP: (Pyomo Stochastic Programming), where parameters are allowed to be uncertain.

=== Overview of Modeling Components and Processes ===

The sequence of activities is typically the following:

* Create a deterministic model and declare components 
* Develop base-case data for the deterministic model
* Test, verify and validate the deterministic model
* Model the stochastic processes
* Develop a way to generate scenarios (in the form of a tree if there are more than two stages)
* Create the data files need to describe the stochastics
* Use PySP to solve stochastic problem

When viewed from the standpoint of file creation, the process is 

* Create an abstract model for the deterministic problem in a file called +ReferenceModel.py+
* Specify data for this model in a file called +ReferenceModel.dat+
* Specify the stochastics in a file called +ScenarioStructure.dat+
* Specify scenario data

=== Birge and Louveaux's Farmer Problem ===

Birge and Louveaux <<BirgeLouveauxBook>> make use of the example of a farmer
who has 500 acres that can be planted in wheat, corn or sugar beets, 
at a per acre cost of $150, $230 and $260, respectively. The farmer
needs to have at least 200 tons of wheat and 240 tons of corn to use as feed, but
if enough is not grown, those crops can be purchased for $238 and $210, respectively.
Corn and wheat grown in excess of the feed requirements can be sold for $170 and $150,
respectively. A price of $36 per ton is guaranteed for the first 6000 tons grown by
any farmer, but beets in excess of that are sold for $10 per ton. The yield
is 2.5, 3, and 20 tons per acre for wheat, corn and sugar beets, respectively.

==== ReferenceModel.py ====
So far, this is a deterministic problem because we are assuming that we know all the data. The Pyomo
model for this problem shown here is in the file +ReferenceModel.py+ in the sub-directory +examples/pysp/farmer/models+ that is
distributed with Pyomo.
----
include::examples/PyomoGettingStarted/farmer.py[]
----

==== ReferenceModel.dat ====
The data introduced here are in the file ReferenceModel.dat in the sub-directory examples/pysp/farmer/scenariodata that
is distributed with Pyomo.
----
include::examples/PyomoGettingStarted/farmer.dat[]
----

Any of these data could be modeled as uncertain, but we will consider only the possibility that the yield per acre
could be higher or lower than expected. Assume that there is a probability of 1/3 that the yields will be
the average values that were given (i.e., wheat 2.5; corn 3; and beets 20). Assume that there is a 1/3 probability
that they will be lower (2, 2.4, 16) and 1/3 probability they will be higher (3, 3.6, 24).  We refer to
each full set of data as a _scenario_ and collectively we call them a _scenario_ _tree_. 
In this case the scenario tree is very simple: there is a root node
and three leaf nodes: one corresponding to each scenario. The acreage-to-plant decisions are root node decisions
because they must be made without knowing what the yield will be. 
The other variables are so-called _second_ _stage_ decisions, because they will depend on which scenario is realized.

==== ScenarioStructure.dat ====
PySP requires that users describe the scenario tree using specific constructs in a file named +ScenarioStructure.dat+; for the farmer
problem, this file can be found in the pyomo sub-directory +examples/pysp/farmer/scenariodata+ that is distributed with Pyomo.
----
include::examples/PyomoGettingStarted/farmerScenarioStructure.dat[]
----
This data file is verbose and somewhat redundant, but in most applications it is generated by software rather than by a person, so
this is not an issue. 
Generally, the left-most part of each
expression (e.g. ``set Stages :='') is required and uses reserved words (e.g., +Stages+) and the other names are
supplied by the user (e.g., ``FirstStage'' could be any name). Every assignment is terminated with a semi-colon. 
We will now consider the assignments in this file one at a time.

The first assignments provides names for the stages and the words "set Stages" are required, as are the := symbols. 
Any names can be used. In this example, we used "FirstStage" and "SecondStage" but we could
have used "EtapPrimero" and "ZweiteEtage" if we had wanted to. Whatever names are given here will continue to be used
to refer to the stages in the rest of the file. The order of the names is important. A simple way to think of it
is that generally, the names must be in time order (technically, they need to be in order of information discovery, but
that is usually time-order).  Stages refers to decision stages, which may, or may not, correspond directly with time stages. 
In the farmer example, decisions about how much to plant are made in the first stage and "decisions" (which are pretty obvious, but
which are decision variables nonetheless) about how much to sell at each price and how much needs to be bought are 
second stage decisions because they are made after the yield is known.
----
set Stages := FirstStage SecondStage ;
----

Node names are constructed next. The words "set Nodes" are required, but any names may be assigned to the nodes. In two
stage stochastic problems there is a root node, which we chose to name "RootNode" and then there is a node for each scenario.
----
set Nodes := RootNode 
             BelowAverageNode
             AverageNode 
             AboveAverageNode ;
----

Nodes are associated with time stages with an assignment beginning with the required words "param Nodestage." The assignments must make
use of previously defined node and stage names. Every node must be assigned a stage.
----
param NodeStage := RootNode         FirstStage 
                   BelowAverageNode SecondStage
                   AverageNode      SecondStage 
                   AboveAverageNode SecondStage ;
----

The structure of the scenario tree is defined using assignment of children to each node that has them. Since this is a two stage
problem, only the root node has children. The words "param Children" are required for every node that has children and the name
of the node is in square brackets before the colon-equals assignment symbols. A list of children is assigned.
----
set Children[RootNode] := BelowAverageNode 
                          AverageNode 
                          AboveAverageNode ;
----

The probability for each node, conditional on observing the parent node is given in an assignment that begins with the
required words "param ConditionalProbability." The root node always has a conditional probability of 1, but it must always
be given anyway. In this example, the second stage nodes are equally likely.
----
param ConditionalProbability := RootNode          1.0
                                BelowAverageNode  0.33333333
                                AverageNode       0.33333334
                                AboveAverageNode  0.33333333 ;
----

Scenario names are given in an assignment that begins with the required words "set Scenarios" and provides a list of 
the names of the scenarios. Any names may be given. In many applications they are given unimaginative names generated by 
software such as "Scen1" and the like. In this example, there are three scenarios and the names reflect the relative values of
the yields.
----
set Scenarios := BelowAverageScenario 
                 AverageScenario 
                 AboveAverageScenario ;
----

Leaf nodes, which are nodes with no children, are associated with scenarios. This assignment must be one-to-one and it is
initiated with the words "param ScenarioLeafNode" followed by the colon-equals assignment characters.
----
param ScenarioLeafNode := 
                    BelowAverageScenario BelowAverageNode
                    AverageScenario      AverageNode
                    AboveAverageScenario AboveAverageNode ;
----

Variables are associated with stages using an assignment that begins with the required words "set StageVariables" and the name
of a stage in square brackets followed by the colon-equals assignment characters. Variable names that have been defined
in the file ReferenceModel.py can be assigned to stages. Any variables that are not assigned are assumed to be in the last stage.
Variable indexes can be given explicitly and/or wildcards can be used. 
Note that the variable names appear without the prefix "model." 
In the farmer example, DevotedAcreage is the only
first stage variable.
----
set StageVariables[FirstStage] :=  DevotedAcreage[*] ;
set StageVariables[SecondStage] := QuantitySubQuotaSold[*]
                                   QuantitySuperQuotaSold[*]
                                   QuantityPurchased[*] ;
----

For reporting purposes, it is useful to define auxiliary variables in +ReferenceModel.py+ that will be assigned the cost
associated with each stage. This variables do not impact algorithms, but the values are output by some software during execution
as well as upon completion. The names of the variables are assigned to stages using the "param StageCostVariable" assignment. The stages
are previously defined in +ScenarioStructure.dat+ and the variables are previously defined in +ReferenceModel.py+. Note that the variable names appear without the prefix "model." 
----
param StageCostVariable := FirstStage  FirstStageCost
                           SecondStage SecondStageCost ;
----

==== Scenario data specification ====
So far, we have given a model in the file named +ReferenceModel.py+, a set of deterministic data in the file named +ReferenceModel.py+, and
a description of the stochastics in the file named +ScenarioStructure.dat+. All that remains is to give the data for each scenario. 
There are two ways to do that in PySP: _scenario-based_ and _node-based_. The default is scenario-based so we will describe that first.

For scenario-based data, the full data for each scenario is given in a +.dat+ file with the root name that is the name of the scenario.
So, for example, the file named +AverageScenario.dat+ must contain all the data for the model for the scenario named "AvererageScenario." It turns out that this file can be created by simply copying the file +ReferenceModel.dat+ as shown above because it contains
a full set of data for the "AverageScenario" scenario. The files +BelowAverageScenario.dat+ and +AboveAverageScenario.dat+ will differ from
this file and from each other only in their last line, where the yield is specified.  These three files are distributed with
Pyomo and are in the pyomo sub-directory +examples/pysp/farmer/scenariodata+ along with +ScenarioStructure.dat+ and +ReferenceModel.dat+.

Scenario-based data wastes resources by specifying the same thing over and over again. In many cases, that does not matter
and it is convenient to have full scenario data files available (for one thing, the scenarios can easily be run
independently using the +pyomo+ command). However, in many other settings, it is better to use a node-based specification where the
data that is unique to each node is specified in a .dat file with a root name that matches the node name. In the farmer example, the file
+RootNode.dat+ will be the same as +ReferenceModel.dat+ except that it will lack the last line that specifies the yield. The files
+BelowAverageNode.dat+, +AverageNode.dat+, and +AboveAverageNode.dat+ will contain only one line each to specify the yield. If node-based
data is to be used, then the +ScenarioStructure.dat+ file must contain the following line:
----
param ScenarioBasedData := False ;
----
An entire set of files for node-based data for the farmer problem are distributed with Pyomo in the sub-directory
+examples/pysp/farmer/nodedata+

=== Finding Solutions for Stochastic Models ===

PySP provides a variety of tools for finding solutions to stochastic programs.

==== runef ====

The +runef+ command puts together the so-called _extensive_ _form_ version of the model. It creates a large model that has constraints to
ensure that variables at a node have the same value. For example, in the farmer problem, all of the +DevotedAcres+ variables must have the
same value regardless of which scenario is ultimately realized. The objective can be the expected value of the objective function, or
the CVaR, or a weighted combination of the two. Expected value is the default. A full set of options
for +runef+ can be obtained using the command:
----
runef --help
----

The pyomo distribution contains the files need to run the farmer example in the sub-directories to the sub-directory
+examples/pysp/farmer+ so if this is the current directory and
if CPLEX is installed, the following command will cause formation of the EF and its solution using CPLEX.
----
runef -m models -i nodedata --solver=cplex --solve
----

The option +-m models+ has one dash and is short-hand for the option +--model-directory=models+ and note that the full option
uses two dashes. The +-i+ is equivalent to +--instance-directory=+ in the same fashion. The default solver is CPLEX, 
so the solver option is not really needed. With the +--solve+ option, runef would simply write an .lp data file
that could be passed to a solver.

==== runph ====

The runph command executes an implementation of Progressive Hedging (PH) that is intended to support scripting and extension.

The pyomo distribution contains the files need to run the farmer example in the sub-directories to the sub-directory
examples/pysp/farmer so if this is the current directory and
if CPLEX is installed, the following command will cause PH to execute using the default sub-problem solver, which
is CPLEX.
----
runph -m models -i nodedata
----

The option +-m models+ has one dash and is short-hand for the option +--model-directory=models+ and note that the full option
uses two dashes. The +-i+ is equivalent to +--instance-directory=+ in the same fashion.

After about 33 iterations, the algorithm will achieve the default level of convergence and terminate. A lot of output is
generated and among the output is the following solution information:
----
Variable=DevotedAcreage
	Index: [CORN] 		 (Scenarios:  BelowAverageScenario   AverageScenario   AboveAverageScenario   )
		Values:       79.9844      80.0000      79.9768     Max-Min=      0.0232     Avg=     79.9871 
	Index: [SUGAR_BEETS] 		 (Scenarios:  BelowAverageScenario   AverageScenario   AboveAverageScenario   )
		Values:      249.9848     249.9770     250.0000     Max-Min=      0.0230     Avg=    249.9873 
	Index: [WHEAT] 		 (Scenarios:  BelowAverageScenario   AverageScenario   AboveAverageScenario   )
		Values:      170.0308     170.0230     170.0232     Max-Min=      0.0078     Avg=    170.0256 
Cost Variable=FirstStageCost
	Tree Node=RootNode 		 (Scenarios:  BelowAverageScenario   AverageScenario   AboveAverageScenario   )
	Values:   108897.0836  108897.4725  108898.1476     Max-Min=      1.0640     Avg= 108897.5679 
----
For problems with no, or few, integer variables, the default level of convergence leaves root-node variables almost
converged. Since the acreage to be planted cannot depend on the scenario that will be realized in the future, the
average, which is labeled "Avg" in this output, would be used. 
A farmer would probably interpret acreages of 79.9871, 249.9873, and 170.0256 to be 80, 250, and 170.
In real-world applications, PH is embedded in scripts that produce output in a format desired by a decision maker.

But in real-world applications, the default settings for PH seldom work well enough. In addition to post-processing the output,
a number of parameters need to be adjusted and sometimes scripting to extend or augment the algorithm
is needed to improve convergence rates.
A full set of options can be obtained with the command.
----
runph --help
----
Note that there are two dashes before +help+.

By default, PH uses quadratic objective functions after iteration zero; in some settings it may
be desirable to linearize the quadratic terms. This is required to use a solver such as glpk
for MIPs because it does not support quadratic MIPs. The directive
+--linearize-nonbinary-penalty-terms=n+ causes linearization of the penalty terms using n pieces. For example,
to use glpk on the farmer, assuming glpk is installed and the command is given when the current
directory is the +examples/pysp/farmer+, the following command will use default settings for most parameters and
four pieces to approximate quadratic terms in sub-problems:
----
runph -i nodedata -m models --solver=glpk --linearize-nonbinary-penalty-terms=4
----
Use of the +linearize-nonbinary-penalty-terms+ option requires that all variables not in the final stage have bounds.

// vim: set syntax=asciidoc:
