
Optimization

13. Decision making under uncertainty


13.1 Decision criteria
4 decision criteria:
- Maximin: choose the action with the ‘best’ worst outcome.
- Maximax: choose the action with the ‘best’ best outcome.
- Minimax regret: for each possible state, find the action that maximizes the outcome; the regret of every other action
is the difference between this best outcome and its own outcome. Then apply the minimax criterion to the regret matrix:
choose the action with the smallest maximum regret.
- Expected value criterion: choose the action that yields the largest expected reward.

Problems:
(1) For the maximin, maximax and minimax regret criteria, determine Pizza King’s choice of advertising campaign.

                     Noble Greek
Pizza King    Small    Medium    Large
Small          6000      5000     2000
Medium         5000      6000     1000
Large          9000      6000        0

Answer:
Regret matrix (best reward in each column minus the reward obtained):

                                 Noble Greek
Pizza King    Small                 Medium                Large
Small         9000 – 6000 = 3000    6000 – 5000 = 1000    0
Medium        9000 – 5000 = 4000    0                     2000 – 1000 = 1000
Large         0                     0                     2000 – 0 = 2000

Maximin: small
Maximax: large
Minimax regret: large
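
The three criteria used above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes; the function names are hypothetical and the reward table is the Pizza King table from problem (1).

```python
# Rewards for Pizza King (rows) against Noble Greek's campaign (columns:
# Small, Medium, Large), copied from problem (1).
rewards = {
    "Small":  [6000, 5000, 2000],
    "Medium": [5000, 6000, 1000],
    "Large":  [9000, 6000, 0],
}

def maximin(table):
    """Action whose worst reward is largest."""
    return max(table, key=lambda a: min(table[a]))

def maximax(table):
    """Action whose best reward is largest."""
    return max(table, key=lambda a: max(table[a]))

def regret_table(table):
    """Regret = best reward in that state minus the reward actually obtained."""
    best = [max(col) for col in zip(*table.values())]
    return {a: [b - r for b, r in zip(best, row)] for a, row in table.items()}

def minimax_regret(table):
    """Action whose largest regret is smallest."""
    regrets = regret_table(table)
    return min(regrets, key=lambda a: max(regrets[a]))

print(maximin(rewards), maximax(rewards), minimax_regret(rewards))
```

In case of ties, `max` and `min` return the first action listed, so a tie is not reported explicitly.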

(2) Sodaco estimates that the annual demand for Chocovan has the following mass function:
- P(D=30 000) = 0,3
- P(D=50 000) = 0,4
- P(D=80 000)= 0,3
Each case of Chocovan sells for $5 and incurs a variable cost of $3. It costs $800 000 to build a plant to produce Chocovan.
Assume that if $1 is received every year, this is equivalent to receiving $10 at the present time.
Considering the reward for each action and state of the world to be in terms of NPV, use each decision criterion to
determine whether Sodaco should build the plant.

Answer:
Reward matrix (NPV in $; Build = 10 x (5 – 3) x demand – 800 000):

                           Demand
Build/not build      30 000     50 000     80 000
Build              -200 000    200 000    800 000
Not build                 0          0          0

Regret matrix:

                           Demand
Build/not build      30 000     50 000     80 000
Build               200 000          0          0
Not build                 0    200 000    800 000

Maximin: not build
Maximax: build
Minimax regret: build
Expected value: build → (0,3 x -200 000) + (0,4 x 200 000) + (0,3 x 800 000) = 260 000 > 0
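
As a check on the NPV rewards and the expected value criterion, the computation can be sketched as follows. The constant and function names are illustrative assumptions; the factor 10 is the present value of $1 received every year, as stated in the problem.

```python
# Demand mass function and cost data from problem (2).
demand_pmf = {30_000: 0.3, 50_000: 0.4, 80_000: 0.3}
PRICE, VARIABLE_COST, PLANT_COST = 5, 3, 800_000
ANNUITY_FACTOR = 10  # $1 per year is worth $10 today (given in the problem)

def npv_build(demand):
    """NPV of building the plant when annual demand equals `demand`."""
    return ANNUITY_FACTOR * (PRICE - VARIABLE_COST) * demand - PLANT_COST

npv = {d: npv_build(d) for d in demand_pmf}
expected_npv = sum(p * npv[d] for d, p in demand_pmf.items())

print(npv)
print("build" if expected_npv > 0 else "not build", expected_npv)
```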

(4) Pizza King believes that Noble Greek’s price is a random variable D having the following mass function:
- P(D=6) = 0,25
- P(D=8) = 0,5
- P(D=10) = 0,25
If Pizza King charges p(1) and Noble Greek charges p(2), Pizza King will sell 100 + 25[p(2)-p(1)] pizzas. It costs Pizza
King $4 to make a pizza. Pizza King is considering charging $5, $6, $7, $8 or $9 for a pizza. Use each decision criterion
to determine the price that Pizza King should charge.

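Answer:
Reward matrix (profit = (p(1) – 4) x (100 + 25[p(2) – p(1)])):

                      Noble Greek price
Pizza King price      6      8     10
5                   125    175    225
6                   200    300    400
7                   225    375    525
8                   200    400    600
9                   125    375    625

Regret matrix (column maxima: 225, 400 and 625):

                      Noble Greek price
Pizza King price      6      8     10
5                   100    225    400
6                    25    100    225
7                     0     25    100
8                    25      0     25
9                   100     25      0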

Maximin: 7
Maximax: 9
Minimax regret: 8
Expected value criterion: 8
- $5: (0,25x125) + (0,5x175) + (0,25x225) = 175
- $6: (0,25x200) + (0,5x300) + (0,25x400) = 300
- $7: (0,25x225) + (0,5x375) + (0,25x525) = 375
- $8: (0,25x200) + (0,5x400) + (0,25x600) = 400
- $9: (0,25x125) + (0,5x375) + (0,25x625) = 375
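
The reward matrix, the regret matrix and all four criteria for this problem can be reproduced with a short script. This is a sketch with hypothetical variable names; only the demand formula, the unit cost and the mass function come from the problem statement.

```python
UNIT_COST = 4
pk_prices = [5, 6, 7, 8, 9]             # Pizza King's candidate prices
ng_pmf = {6: 0.25, 8: 0.5, 10: 0.25}    # Noble Greek's price distribution

def profit(p1, p2):
    """Pizza King's profit when it charges p1 and Noble Greek charges p2."""
    return (p1 - UNIT_COST) * (100 + 25 * (p2 - p1))

payoff = {p1: [profit(p1, p2) for p2 in ng_pmf] for p1 in pk_prices}
best = [max(col) for col in zip(*payoff.values())]
regret = {p1: [b - r for b, r in zip(best, row)] for p1, row in payoff.items()}
expected = {p1: sum(q * r for q, r in zip(ng_pmf.values(), row))
            for p1, row in payoff.items()}

choices = {
    "maximin": max(payoff, key=lambda p: min(payoff[p])),
    "maximax": max(payoff, key=lambda p: max(payoff[p])),
    "minimax regret": min(regret, key=lambda p: max(regret[p])),
    "expected value": max(expected, key=expected.get),
}
print(choices)
```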

(5) Alden believes that Forbes’s bid is a random variable B with the following mass function:
- P(B=6000) = 0,4
- P(B=8000) = 0,3
- P(B=11000) = 0,3
It will cost Alden $6000 to complete the project. Use each of the decision criteria to determine Alden’s bid.
Assume that in case of a tie, Alden wins the bidding.

At first sight Alden may bid any price P for the project; a moment of reflection shows, however, that some prices are
dominated by other prices. For example, it makes no sense to choose P = 7000; this choice is dominated by P = 8000. (Why?)
The only relevant choices are P = 6000, 8000 or 11 000.

Answer:
Reward matrix (Alden’s profit):

                    Forbes’s bid
Alden’s bid    6000    8000    11 000
6000              0       0         0
8000              0    2000      2000
11 000            0       0      5000

Regret matrix:

                    Forbes’s bid
Alden’s bid    6000    8000    11 000
6000              0    2000      5000
8000              0       0      3000
11 000            0    2000         0

Maximin: no unique choice (every bid has a worst outcome of 0)
Maximax: 11 000
Minimax regret: 11 000
Expected value criterion: 11 000
- 6000: 0
- 8000: (0,3 + 0,3) x 2000 = 1200
- 11 000: 0,3 x 5000 = 1500

13.2 Utility theory
𝐿1 𝑝 𝐿2 = the person prefers 𝐿1 to 𝐿2 .
𝐿1 𝑖 𝐿2 = the person is indifferent between choosing 𝐿1 and 𝐿2 . 𝐿1 and 𝐿2 are equivalent lotteries.

The Von Neumann-Morgenstern approach:


1. Identify the most favorable and the least favorable outcomes that can occur.
2. For all other possible outcomes, determine the utility of the reward: 𝑢(𝑟𝑖 ). 𝑢(𝑟𝑖 ) is the probability 𝑞𝑖 such that
you are indifferent between the following two lotteries:

- receiving 𝑟𝑖 with certainty (probability 1), and
- receiving the most favorable outcome with probability 𝑞𝑖 and the least favorable outcome with probability 1 − 𝑞𝑖 .

3. For a given lottery L that yields reward 𝑟𝑖 with probability 𝑝𝑖 (i = 1, ..., n), define the expected utility of the lottery L by:

$$E(U \text{ for } L) = \sum_{i=1}^{n} p_i \, u(r_i)$$

4. In choosing between lotteries, we simply choose the lottery with the largest expected utility.

Example:
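Suppose we are asked to rank the following lotteries:

- 𝐿1 : $10 000 with probability 1
- 𝐿2 : $30 000 with probability 0,5 ; $0 with probability 0,5
- 𝐿3 : $0 with probability 1
- 𝐿4 : $ -10 000 with probability 0,02 ; $500 with probability 0,98

1. The most favorable outcome is $30 000 and the least favorable outcome is $ -10 000.
2. Suppose that for 𝑟1 = $10 000, you are indifferent between:
- $10 000 with probability 1, and
- $30 000 with probability 0,9 ; $ -10 000 with probability 0,1

For 𝑟2 = $500, you are indifferent between:
- $500 with probability 1, and
- $30 000 with probability 0,62 ; $ -10 000 with probability 0,38

For 𝑟3 = $0, you are indifferent between:
- $0 with probability 1, and
- $30 000 with probability 0,6 ; $ -10 000 with probability 0,4

Then 𝑢(𝑟1 ) = 0,9 ; 𝑢(𝑟2 ) = 0,62 and 𝑢(𝑟3 ) = 0,6.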

3. In our example:
𝐸(𝑈 𝑓𝑜𝑟 𝐿1 ) = 1 x 0,9 = 0,9
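
Steps 3 and 4 can be carried out for all four lotteries with a small script. This is a sketch: the utilities of $30 000 and $ -10 000 are set to 1 and 0, which follows from the definition in step 2 (the most and least favorable outcomes), and the names `utility`, `lotteries` and `expected_utility` are illustrative.

```python
# Utilities elicited in step 2; the best and worst outcomes get 1 and 0.
utility = {30_000: 1.0, 10_000: 0.9, 500: 0.62, 0: 0.6, -10_000: 0.0}

# Each lottery is a list of (probability, reward) pairs.
lotteries = {
    "L1": [(1.0, 10_000)],
    "L2": [(0.5, 30_000), (0.5, 0)],
    "L3": [(1.0, 0)],
    "L4": [(0.02, -10_000), (0.98, 500)],
}

def expected_utility(lottery):
    """Sum of p_i * u(r_i) over the outcomes of the lottery."""
    return sum(p * utility[r] for p, r in lottery)

ranking = sorted(lotteries, key=lambda name: expected_utility(lotteries[name]),
                 reverse=True)
for name in ranking:
    print(name, round(expected_utility(lotteries[name]), 4))
```

With these utilities the lotteries are ranked 𝐿1 , 𝐿2 , 𝐿4 , 𝐿3 from most to least preferred.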
Answer:

a) (1 x 240) + (0,75 x -1000) + (0,25 x 0) < (0,25 x 1000) + (0,75 x 0) + (1 x -750) or -510 < -500
You prefer 2+3 over 1+4.
b) Framing: people often set their utility function from the standpoint of a frame from which they view the
current situation. Most people’s utility functions treat a loss of a given value as being more important
than a gain of an identical value. They exhibit risk-averse behavior when the outcomes are expressed as
gains and risk-seeking behavior when the outcomes are expressed as losses.

13.4 Decision trees


Example:
Colaco currently has assets of $150 000 and wants to decide whether to market a new chocolate-flavored soda Chocola.
Colaco has three alternatives:
1) Test market Chocola locally, then utilize the results of the market study to determine whether or not to market
Chocola nationally.
2) Immediately market Chocola nationally.
3) Immediately decide not to market Chocola nationally.
Colaco believes that Chocola has a 55% chance of being a national success and a 45% chance of being a national failure.
If Chocola is a success, the asset position will increase by $300 000 and if Chocola is a failure, it will decrease by $100 000.
If Colaco performs a market study (cost = $30 000), there is a 60% chance that the study will yield favorable results
(referred to as a local success) and a 40% chance that the study will yield unfavorable results (referred to as a local failure).
If a local success is observed, there is an 85% chance that Chocola will be a national success.
If a local failure is observed, there is only a 10% chance that Chocola will be a national success.
If Colaco is risk-neutral, what strategy should the company follow?

Solution:
Decision fork: represents a point in time when Colaco has to make a decision. Each branch emanating from a decision fork
represents a possible decision. For example, Colaco must determine whether or not to test market Chocola:

Event fork: is drawn when outside forces determine which of several random events will occur. Each branch represents a
possible outcome, and the number on each branch represents the probability that the event will occur. For example, if
Colaco decides to test market Chocola, the company faces the following event fork when observing the results of the test
market study:

Terminal branch: no forks emanate from the branch. For example: the branch indicating National success.

To determine the decisions that will maximize Colaco’s expected final asset position, we work backward from right to left.
At each event fork, we calculate the expected final asset position and enter it in the event fork. At each decision fork, we
denote by II (Excel: >>>) the decision that maximizes the expected final asset position and enter the expected final asset
position associated with that decision in the decision fork. We continue working backward in this fashion until we reach the
beginning of the tree. Then the optimal sequence of decisions can be obtained by following the II (or >>>).

We begin by determining the expected final asset positions for the following 3 event forks:
(1) market nationally after local success: (0,85 x 420 000) + (0,15 x 20 000) = 360 000
(2) market nationally after local failure: (0,10 x 420 000) + (0,90 x 20 000) = 60 000
(3) market nationally after don’t test market: (0,55 x 450 000) + (0,45 x 50 000) = 270 000
We then evaluate 3 decision forks:
(1) decision after Local success: 360 000 > 120 000 so we enter an expected final asset position of 360 000
(2) decision after Local failure: 120 000 > 60 000 so we enter an expected final asset position of 120 000
(3) decision after Don’t test market: 270 000 > 150 000 so we enter an expected final asset position of 270 000

We then evaluate the event fork emanating from the test market decision: (0,6 x 360 000) + (0,4 x 120 000) = 264 000.
All that remains is to determine the correct decision at the decision fork test market versus don’t test market:
270 000 > 264 000 so we enter an expected final asset position of 270 000.
We have now reached the beginning of the tree and have found that Colaco’s optimal decision is:
don’t test – market nationally.
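
The backward pass described above can be sketched as a small recursive function. The tree encoding (tuples tagged "terminal", "event" and "decision") and all names are assumptions made for this illustration; the probabilities and final asset positions are those of the Colaco example.

```python
def rollback(node):
    """Return (expected value, chosen branch label) for a node; label is None
    unless the node is a decision fork."""
    kind, payload = node
    if kind == "terminal":
        return payload, None
    if kind == "event":        # payload: list of (probability, child)
        return sum(p * rollback(child)[0] for p, child in payload), None
    # decision fork; payload: list of (label, child)
    label, child = max(payload, key=lambda branch: rollback(branch[1])[0])
    return rollback(child)[0], label

def terminal(x):
    return ("terminal", x)

def national(p_success, win, lose):
    """Event fork for marketing nationally."""
    return ("event", [(p_success, terminal(win)), (1 - p_success, terminal(lose))])

after_success = ("decision", [("market nationally", national(0.85, 420_000, 20_000)),
                              ("don't market", terminal(120_000))])
after_failure = ("decision", [("market nationally", national(0.10, 420_000, 20_000)),
                              ("don't market", terminal(120_000))])
test_market = ("event", [(0.6, after_success), (0.4, after_failure)])
no_test = ("decision", [("market nationally", national(0.55, 450_000, 50_000)),
                        ("don't market", terminal(150_000))])
root = ("decision", [("test market", test_market), ("don't test market", no_test)])

value, choice = rollback(root)
print(choice, round(value))
```

The sketch recomputes subtrees instead of caching them, which is fine for a tree of this size.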

Incorporating risk aversion into decision tree analysis:


Note that the optimal strategy yields a 45% chance that the company will end up with a relatively small final asset position
of $50 000. On the other hand, the strategy of test marketing yields only a 9% (= 0,6 x 0,15) chance that the company will
end up with a relatively small final asset position of $20 000. Thus, if Colaco is risk-averse, our optimal strategy may not
reflect the company’s preference.

The 9% arises because, once we have chosen to test the market, we will always market the product after a local success
(360 000 > 120 000) and never market the product after a local failure (120 000 > 60 000).

To illustrate how risk aversion may be incorporated into decision tree analysis, suppose:

- U(450 000) = 1
- U(420 000) = 0,99
- U(150 000) = 0,48
- U(120 000) =0,40
- U(50 000) = 0,19
- U(20 000) = 0

To determine Colaco’s optimal decisions, simply replace each final asset position x, with its utility U(x).
Then at each event fork, compute the expected utility of Colaco’s final asset position, and at each decision fork, choose the
branch having the largest expected utility.
We have found that Colaco’s optimal decision is to begin by test marketing. If a local success is observed, then Colaco
should market Chocola nationally; if a local failure is observed, then Colaco should not market Chocola nationally. This
optimal strategy yields only a 9% chance that Colaco will have a final asset position of 20 000.
The optimal strategy has an expected utility of about 0,665. Suppose U(226 000) = 0,665; this means the company considers
the current situation equivalent to a certain asset position of $226 000. Thus, if somebody offered to pay more than
226 000 – 150 000 = $76 000 to buy the rights to Chocola, Colaco should take the offer. This is because receiving more
than $76 000 would bring Colaco’s asset position to more than 150 000 + 76 000 = $226 000, and this situation has a
higher expected utility than 0,665.
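
The expected utilities behind this conclusion can be verified directly. This is a minimal sketch using the utility values listed above; the helper name `eu_market` is hypothetical.

```python
# Utilities of the six possible final asset positions (from the list above).
U = {450_000: 1.0, 420_000: 0.99, 150_000: 0.48,
     120_000: 0.40, 50_000: 0.19, 20_000: 0.0}

def eu_market(p_success, win, lose):
    """Expected utility of marketing nationally."""
    return p_success * U[win] + (1 - p_success) * U[lose]

# With test marketing: choose the better branch after each study result.
after_success = max(eu_market(0.85, 420_000, 20_000), U[120_000])
after_failure = max(eu_market(0.10, 420_000, 20_000), U[120_000])
eu_test = 0.6 * after_success + 0.4 * after_failure

# Without test marketing.
eu_no_test = max(eu_market(0.55, 450_000, 50_000), U[150_000])

print(round(eu_test, 4), round(eu_no_test, 4))
```

Test marketing gives the larger expected utility, which matches the strategy described above.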

Expected value of sample information:


Decision trees can be used to measure the value of sample or test market information.
Example: What is the value of the information that would be obtained by test marketing Chocola?
Expected value with sample information (EVWSI) = the expected final asset position if the company acts optimally and the
test market study is costless. EVWSI = 294 000
Expected value with original information (EVWOI) = the largest expected final asset position if the test market study were
not available. EVWOI = 270 000
Expected value of sample information (EVSI) = EVWSI – EVWOI = 24 000
→ Since the cost of the test market study ($30 000) exceeds EVSI, Colaco should not conduct the test market study.

Expected value of perfect information:


Perfect information = the decision maker learns which state of the world will occur before making a decision; all uncertain
events that can affect Colaco’s final asset position still occur with the given probabilities.
So Colaco finds out whether Chocola is a national success or a national failure before making the decision to market
Chocola nationally or not. Thus, expected value with perfect information (EVWPI) is found by drawing a decision tree in
which the decision maker has perfect information about which state has occurred before making a decision.
Expected value of perfect information (EVPI) = EVWPI – EVWOI

For the Colaco example, we find EVWPI = 315 000, then EVPI = 315 000 – 270 000 = 45 000. Thus, a perfect test marketing
study would be worth $45 000. EVPI is a useful upper bound on the value of sample or test market information.
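
The three expected values for the Colaco example can be recomputed as follows. This is a sketch in which the test market study is treated as costless, as the definition of EVWSI requires, so the final asset positions are 450 000, 50 000 and 150 000 on every branch; the variable names are illustrative.

```python
STATUS_QUO = 150_000   # asset position if Chocola is not marketed

def ev_market(p_success, win=450_000, lose=50_000):
    """Expected final asset position when marketing nationally."""
    return p_success * win + (1 - p_success) * lose

# Original information only: market nationally or not.
evwoi = max(ev_market(0.55), STATUS_QUO)

# Costless sample information: decide after seeing the study result.
evwsi = (0.6 * max(ev_market(0.85), STATUS_QUO)
         + 0.4 * max(ev_market(0.10), STATUS_QUO))

# Perfect information: decide after learning the true state.
evwpi = 0.55 * max(450_000, STATUS_QUO) + 0.45 * max(50_000, STATUS_QUO)

evsi = evwsi - evwoi
evpi = evwpi - evwoi
print(round(evsi), round(evpi))
```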
Example:
An art dealer’s client is willing to buy the painting Sunplant at $50 000. The dealer can buy the painting today for $40 000 or
can wait a day and buy the painting tomorrow for $30 000. The dealer may also wait another day and buy the painting for
$26 000. At the end of the third day, the painting will no longer be for sale. Each day, there is a 60% probability that the
painting will be sold. What strategy maximizes the dealer’s profit?

Solution:
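One way to work this example is backward induction over the three days. The sketch below rests on an assumption that the problem statement does not spell out: "sold" is read as sold to another buyer, in which case the dealer earns nothing, so the painting is still available the next day with probability 0,4. All names are illustrative.

```python
SALE_PRICE = 50_000
buy_prices = [40_000, 30_000, 26_000]   # price on day 1, day 2, day 3
P_STILL_AVAILABLE = 0.4                 # assumption: 60% chance it is sold to someone else

next_value = 0.0   # after day 3 the painting is no longer for sale
plan = []
for price in reversed(buy_prices):
    buy_now = SALE_PRICE - price              # certain profit if the dealer buys today
    wait = P_STILL_AVAILABLE * next_value     # expected profit of waiting one more day
    plan.append("buy" if buy_now >= wait else "wait")
    next_value = max(buy_now, wait)
plan.reverse()

print(plan, next_value)
```

Under this reading, buying today (a certain profit of $10 000) beats waiting (an expected profit of 0,4 x 20 000 = $8000).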

Problems:
(4) Nitro is developing a new fertilizer. If Nitro markets the product and it is successful, the company will earn a
$50 000 profit; if it is unsuccessful, the company will lose $35 000. In the past, similar products have been
successful 60% of the time. At a cost of $5000, the effectiveness of the new fertilizer can be tested. If the test
result is favorable, there is an 80% chance that the fertilizer will be successful. If the test is unfavorable, there is
only a 30% chance that the fertilizer will be successful. There is a 60% chance of a favorable test result and a 40%
chance of an unfavorable test result. Determine Nitro’s optimal strategy. Also find EVSI and EVPI.

Answer:
Optimal strategy: don’t test & market the product.
EVSI = EVWSI – EVWOI = 19 800 – 16 000 = 3800
EVWSI = 19 800
EVWOI = 16 000
EVPI = EVWPI – EVWOI = 30 000 – 16 000 = 14 000
EVWPI = 30 000
(16) You have just been chosen to appear on Hoosier Millionaire. The rules are as follows: there are 4 hidden cards.
One says ‘stop’ and the other three have dollar amounts of $150 000, $200 000 and $1 000 000. You get to
choose a card. If the card says ‘stop’, you win no money. At any time you may quit and keep the largest amount of
money that has appeared on any card you have chosen, or continue. If you continue and choose the stop card,
you win no money.

a) If your goal is to maximize your expected payoff, what strategy would you follow?
b) My utility function for an increase in cash satisfies:

- U(0) = 0
- U(40 000) = 0,25
- U(120 000) = 0,50
- U(400 000) = 0,75
- U(1 000 000) = 1

After drawing a curve through these points, determine a strategy that maximizes my expected utility.
→ Solution: see PWP
13.5 Bayes’ rule and decision trees
Prior probabilities = estimates of the probabilities of each state of the world. 𝑝(𝑠)
𝑝(𝑁𝑆) = 0,55
𝑝(𝑁𝐹) = 0,45
Posterior probabilities = revised probabilities of each state of the world, given the observed experimental outcome. 𝑝(𝑠|𝑜)
In the Colaco example, the posterior probabilities were given to be:
𝑝(𝑁𝑆|𝐿𝑆) = 0,85
𝑝(𝑁𝐹|𝐿𝑆) = 0,15
𝑝(𝑁𝑆|𝐿𝐹) = 0,10
𝑝(𝑁𝐹|𝐿𝐹) = 0,90
Likelihoods = the probabilities of observing each experimental outcome, given the state of the world. 𝑝(𝑜|𝑠)
𝑝(𝐿𝑆|𝑁𝑆) = 51/55
𝑝(𝐿𝐹|𝑁𝑆) = 4/55
𝑝(𝐿𝑆|𝑁𝐹) = 9/45
𝑝(𝐿𝐹|𝑁𝐹) = 36/45
With the help of Bayes’ rule we can use the prior probabilities and likelihoods to determine the needed posterior
probabilities. In summary, to find posterior probabilities, we go through the following three-step process:

(1) Determine the joint probabilities of the form 𝑝(𝑠 ∩ 𝑜) by multiplying the prior probability 𝑝(𝑠) times the
likelihood 𝑝(𝑜|𝑠).
(2) Determine the probabilities of each experimental outcome 𝑝(𝑜) by summing up all joint probabilities of the form
𝑝(𝑠 ∩ 𝑜).
(3) Determine each posterior probability 𝑝(𝑠|𝑜) by dividing the joint probability 𝑝(𝑠 ∩ 𝑜) by the probability 𝑝(𝑜) of the
experimental outcome 𝑜.
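
The three steps can be sketched as one function. The function name `posteriors` and the dictionary layout are assumptions for illustration; the input is the Colaco priors and likelihoods listed above, and exact fractions are used so that the given posteriors are recovered without rounding.

```python
from fractions import Fraction as F

def posteriors(prior, likelihood):
    """prior[s] = p(s), likelihood[s][o] = p(o|s).
    Returns (p_o, post) with p_o[o] = p(o) and post[o][s] = p(s|o)."""
    # Step 1: joint probabilities p(s and o) = p(s) * p(o|s)
    joint = {(s, o): prior[s] * likelihood[s][o]
             for s in prior for o in likelihood[s]}
    # Step 2: p(o) = sum of the joint probabilities over all states
    outcomes = {o for _, o in joint}
    p_o = {o: sum(joint[s, o] for s in prior) for o in outcomes}
    # Step 3: p(s|o) = p(s and o) / p(o)
    post = {o: {s: joint[s, o] / p_o[o] for s in prior} for o in outcomes}
    return p_o, post

prior = {"NS": F(55, 100), "NF": F(45, 100)}
likelihood = {"NS": {"LS": F(51, 55), "LF": F(4, 55)},
              "NF": {"LS": F(9, 45), "LF": F(36, 45)}}

p_o, post = posteriors(prior, likelihood)
print(p_o["LS"], post["LS"]["NS"], post["LF"]["NS"])
```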

Example:
FCC manufactures memory chips in lots of 10 chips. From past experience, FCC knows that 80% of all lots contain 10%
defective chips and 20% of all lots contain 50% defective chips. If a good batch of chips is sent on to the next stage of
production, processing costs of $1000 are incurred, and if a bad batch is sent to the next stage of production, processing
costs of $4000 are incurred. FCC also has the alternative of reworking a batch at a cost of $1000. A reworked batch is sure
to be a good batch. Alternatively, for a cost of $100, FCC can test one chip from each batch in an attempt to determine
whether the batch is defective. Determine how FCC can minimize the expected total cost per batch. Also compute EVSI and EVPI.

Solution:
We will multiply costs by -1 and work with maximizing –(total cost). This enables us to use EVSI and EVPI formulas.
There are 2 states of the world:
- G = batch is good
- B = batch is bad
We are given the following prior probabilities:
- p(G) = 0,80
- p(B) = 0,20
FCC has the option of performing an experiment: inspecting one chip per batch. Possible outcomes of this experiment:
- D = defective chip is observed
- ND = non-defective chip is observed
We are given the following likelihoods:
- 𝑝(𝐷|𝐺) = 0,10
- 𝑝(𝑁𝐷|𝐺) = 0,90
- 𝑝(𝐷|𝐵) = 0,50
- 𝑝(𝑁𝐷|𝐵) = 0,50
To complete the decision tree, we need to determine the posterior probabilities.
(1) We begin by computing joint probabilities:
- 𝑝(𝐷 ∩ 𝐺) = 𝑝(𝐺)𝑝(𝐷|𝐺) = 0,8 𝑥 0,1 = 0,08
- 𝑝(𝐷 ∩ 𝐵) = 𝑝(𝐵)𝑝(𝐷|𝐵) = 0,2 𝑥 0,5 = 0,1
- 𝑝(𝑁𝐷 ∩ 𝐺) = 𝑝(𝐺)𝑝(𝑁𝐷|𝐺) = 0,8 𝑥 0,9 = 0,72
- 𝑝(𝑁𝐷 ∩ 𝐵) = 𝑝(𝐵)𝑝(𝑁𝐷|𝐵) = 0,2 𝑥 0,5 = 0,1
(2) We then compute the probability of each experimental outcome:
- 𝑝(𝐷) = 𝑝(𝐷 ∩ 𝐺) + 𝑝(𝐷 ∩ 𝐵) = 0,08 + 0,1 = 0,18
- 𝑝(𝑁𝐷) = 𝑝(𝑁𝐷 ∩ 𝐺) + 𝑝(𝑁𝐷 ∩ 𝐵) = 0,72 + 0,1 = 0,82
(3) Then we use Bayes’ rule to determine the required posterior probabilities:
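$$p(B|D) = \frac{p(D \cap B)}{p(D)} = \frac{0{,}10}{0{,}18} = 0{,}56$$
$$p(G|D) = \frac{p(D \cap G)}{p(D)} = \frac{0{,}08}{0{,}18} = 0{,}44$$
$$p(B|ND) = \frac{p(ND \cap B)}{p(ND)} = \frac{0{,}10}{0{,}82} = 0{,}12$$
$$p(G|ND) = \frac{p(ND \cap G)}{p(ND)} = \frac{0{,}72}{0{,}82} = 0{,}88$$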

These posterior probabilities are used to complete the tree. Straightforward computations show that the optimal strategy is
to test a chip. If the chip is defective, rework the batch. If the chip is not defective, send the batch on. An expected cost of
$1580 is incurred.

EVSI = EVWSI – EVWOI = -1480 – (-1600) = 120


EVPI = EVWPI – EVWOI = -1200 – (-1600) = 400
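
The "straightforward computations" can be sketched as follows. The code works with costs rather than with negated rewards, so EVSI and EVPI appear as cost reductions. It assumes that a reworked batch costs $1000 to rework plus $1000 to process as a good batch; the names are illustrative.

```python
P_GOOD, P_BAD = 0.8, 0.2
P_DEFECT = {"G": 0.1, "B": 0.5}          # p(D|G), p(D|B)
COST_GOOD, COST_BAD, REWORK, TEST = 1000, 4000, 1000, 100

def best_cost(p_good):
    """Cheapest expected cost given the current probability of a good batch."""
    send_on = p_good * COST_GOOD + (1 - p_good) * COST_BAD
    rework = REWORK + COST_GOOD           # reworked batch is processed as a good batch
    return min(send_on, rework)

# Bayes' rule for the test outcome
p_d = P_GOOD * P_DEFECT["G"] + P_BAD * P_DEFECT["B"]
p_good_given_d = P_GOOD * P_DEFECT["G"] / p_d
p_good_given_nd = P_GOOD * (1 - P_DEFECT["G"]) / (1 - p_d)

cost_no_info = best_cost(P_GOOD)
cost_free_test = (p_d * best_cost(p_good_given_d)
                  + (1 - p_d) * best_cost(p_good_given_nd))
cost_perfect = P_GOOD * best_cost(1.0) + P_BAD * best_cost(0.0)

evsi = cost_no_info - cost_free_test
evpi = cost_no_info - cost_perfect
print(round(cost_free_test + TEST), round(evsi), round(evpi))
```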
Problems:
(1) A customer has approached a bank for a $50 000 one-year loan at 12% interest.
If the bank does not approve, the $50 000 will be invested in bonds that earn a 6% annual return.
Without further information, the bank feels that there is a 4% chance that the customer will default.
If the customer totally defaults, the bank loses $50 000.
At a cost of $500, the bank can thoroughly investigate the customer’s credit record and supply a
favorable/unfavorable recommendation. Past experience indicates that:
$$p(\text{favorable recommendation} \mid \text{customer does not default}) = \frac{77}{96}$$
$$p(\text{favorable recommendation} \mid \text{customer defaults}) = \frac{1}{4}$$
How can the bank maximize its expected profits? Also find EVSI & EVPI.

Answer:

The bank can maximize its expected profit by investigating the customer’s credit record, approving the loan after a
favorable recommendation and investing in bonds after an unfavorable one.
EVSI = EVWSI – EVWOI = 54 780 – 53 760 = 1020
EVPI = EVWPI – EVWOI = 55 880 – 53 760 = 2120
(4) The NBS earns an average of $400 000 from a hit show and loses an average of $100 000 on a flop.
Of all shows reviewed by the network, 25% turn out to be hits and 75% turn out to be flops.
For $40 000, a market research firm will give its view about whether the show will be a hit or a flop.
If a show is going to be a hit, there is a 90% chance that the market research firm will predict the show to be a hit.
If a show is going to be a flop, there is a 80% chance that the firm will predict the show to be a flop.
Determine how the network can maximize its expected profits. Also find EVSI & EVPI.

Answer:

The network can maximize its expected profits by hiring the market research firm and broadcasting the show only if the
firm predicts a hit.
EVSI = EVWSI – EVWOI = 75 000 – 25 000 = 50 000
EVPI = EVWPI – EVWOI = 100 000 – 25 000 = 75 000
14. Game theory
14.1 Two-person zero-sum and constant-sum games: saddle points
Characteristics of two-person zero-sum games:
- There are 2 players.
- The row player must choose 1 of m strategies. Simultaneously, the column player must choose 1 of n strategies.
- If the row player chooses his 𝑖th strategy and the column player chooses his 𝑗th strategy, then the row player
receives a reward of 𝑎𝑖𝑗 and the column player loses an amount of 𝑎𝑖𝑗 .
Zero-sum: the sum of the rewards to the players is zero.

Basic assumption of two-person zero-sum game theory:


Each player chooses a strategy that enables him to do the best he can, given that his opponent knows the strategy he is
following.

                              Column player’s strategy
Row player’s strategy    Column 1    Column 2    Column 3    Row minimum
Row 1                           4           4          10              4
Row 2                           2           3           1              1
Row 3                           6           5           7              5
Column maximum                  6           5          10

The row player should choose the row having the largest minimum: max(4,1,5) = 5, so he should choose row 3.
The column player should choose the column having the smallest maximum: min(6,5,10) = 5, so he should choose column 2.
The game matrix we have just analyzed has the property of satisfying the saddle point condition:
𝑚𝑎𝑥(𝑟𝑜𝑤 𝑚𝑖𝑛𝑖𝑚𝑢𝑚) = 𝑚𝑖𝑛(𝑐𝑜𝑙𝑢𝑚𝑛 𝑚𝑎𝑥𝑖𝑚𝑢𝑚)
A saddle point is stable in that neither player has an incentive to move away from it.
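
The saddle point condition is easy to check mechanically. The following sketch (the function name `saddle_points` is an assumption) returns max(row minimum), min(column maximum) and the positions of all saddle points, using 0-based row and column indices.

```python
def saddle_points(matrix):
    """Return (max of row minima, min of column maxima, list of saddle points)."""
    row_min = [min(row) for row in matrix]
    col_max = [max(col) for col in zip(*matrix)]
    maximin, minimax = max(row_min), min(col_max)
    if maximin != minimax:
        return maximin, minimax, []          # saddle point condition fails
    points = [(i, j)
              for i, row in enumerate(matrix)
              for j, a in enumerate(row)
              if a == row_min[i] == col_max[j]]
    return maximin, minimax, points

game = [[4, 4, 10],
        [2, 3, 1],
        [6, 5, 7]]
print(saddle_points(game))   # row 3, column 2 in the 1-based numbering of the table
```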

Two-person constant-sum games:


Two-person constant-sum game: two-player game in which, for any choice of both players’ strategies, the row player’s
reward and the column player’s reward add up to a constant value c.
A two-person zero-sum game is just a two-person constant-sum game with c = 0.

Example:
Two networks are vying for an audience of 100 million viewers. The networks must simultaneously announce the type of
show they will air in that time slot. The possible choices for each network and the number of network 1 viewers for each
choice are shown in the following table. For example: if both networks choose a western, the matrix indicates that 35
million people will watch network 1 and 65 million people will watch network 2. Thus we have a two-person constant-sum
game with c = 100. Does this game have a saddle point? What is the value of the game to network 1?

Solution:

                           Column player’s strategy
Row player’s strategy    Western    Soap    Comedy    Row minimum
Western                       35      15        60             15
Soap                          45      58        50             45
Comedy                        38      14        70             14
Column maximum                45      58        70

max(row minimum) = 45
min(column maximum) = 45
Network 1 choosing a soap and network 2 choosing a western yields a saddle point: neither side will do better if it changes
strategy. Thus, the value of the game to network 1 is 45 million viewers, and the value of the game to network 2 is 55
million viewers.

Problems:
(3) Dora wants to travel from NY to Dallas by the shortest possible route. She may travel over the routes shown in the
following table. Unfortunately, Swiber can block one road leading out of Atlanta and one road leading out of
Nashville. Dora will not know which roads have been blocked until she arrives at Atlanta or Nashville.
Should Dora start toward Atlanta or Nashville? Which routes should Swiber block?

Route Miles
NY – Atlanta 800
NY – Nashville 900
Nashville – ST Louis 400
Nashville – New Orleans 200
Atlanta – ST Louis 300
Atlanta – New Orleans 600
ST Louis – Dallas 500
New Orleans – Dallas 300

Solution:
You find the mileages by drawing the road network.

                                                     Column player’s strategy = Dora
Row player’s strategy = Swiber (roads blocked)       Atlanta    Nashville    Row minimum
Atlanta – ST Louis & Nashville – ST Louis               1700         1400           1400
Atlanta – ST Louis & Nashville – New Orleans            1700         1800           1700
Atlanta – New Orleans & Nashville – ST Louis            1600         1400           1400
Atlanta – New Orleans & Nashville – New Orleans         1600         1800           1600
Column maximum                                          1700         1800

max(row minimum) = 1700
min(column maximum) = 1700
Dora should start toward Atlanta and Swiber should block Atlanta – ST Louis & Nashville – New Orleans.

14.2 Two-person zero-sum games: randomized strategies, domination and graphical solution
How to find the value and optimal strategies for a two-person zero-sum game that does not have a saddle point?
Example:
Two players (Odd & Even) simultaneously choose the number of fingers (1 or 2) to put out. If the sum of the fingers put out
by both players is odd, then Odd wins $1 from Even. If the sum is even, then Even wins $1 from Odd.
Determine whether this game has a saddle point.

Solution:
                               Column player’s strategy = Even
Row player’s strategy = Odd      1      2     Row minimum
1                               -1     +1              -1
2                               +1     -1              -1
Column maximum                  +1     +1

min(column maximum) = + 1
max(row minimum) = -1
This game has no saddle point.
Observe that for any choice of strategies by both players, there is a player who can benefit by changing strategy.

Randomized or mixed strategies:


We can allow each player to select a probability of playing each strategy. For example:
𝑥1 = the probability that Odd puts out 1 finger
𝑥2 = the probability that Odd puts out 2 fingers

If 𝑥1 ≥ 0, 𝑥2 ≥ 0 and 𝑥1 + 𝑥2 = 1, then (𝑥1 , 𝑥2 ) is a randomized or mixed strategy for Odd.

A mixed strategy is a pure strategy if one of the 𝑥𝑖 equals 1.

Graphical solution of Odds and Evens:


Finding Odd’s optimal strategy:
Because 𝑥1 + 𝑥2 = 1, we know that 𝑥2 = 1 − 𝑥1 . Thus any mixed strategy may be written as (𝑥1 , 1 − 𝑥1 ), and it suffices to
determine the value of 𝑥1 .
If Even puts out 1 finger and Odd chooses the mixed strategy (𝑥1 , 1 − 𝑥1 ), then Odd’s expected reward is:
(−1)𝑥1 + (+1)(1 − 𝑥1 ) = 1 − 2𝑥1
Similarly, if Even puts out 2 fingers and Odd chooses the mixed strategy (𝑥1 , 1 − 𝑥1 ), Odd’s expected reward is:
(+1)𝑥1 + (−1)(1 − 𝑥1 ) = 2𝑥1 − 1
For a given 𝑥1 , Odd is guaranteed the smaller of these two expected rewards, and this guarantee is largest where the two
lines intersect. Solving 1 − 2𝑥1 = 2𝑥1 − 1, we obtain 𝑥1 = 1/2.
Thus, Odd should choose the mixed strategy (1/2, 1/2). The reader should verify that, against each of Even’s strategies,
(1/2, 1/2) yields an expected reward of zero. Thus, zero is a floor on Odd’s expected reward, because by choosing the
mixed strategy (1/2, 1/2), Odd can be sure that her expected reward will always be at least zero.

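Finding Even’s optimal strategy:

Let 𝑦1 = the probability that Even puts out 1 finger and 𝑦2 = 1 − 𝑦1 = the probability that Even puts out 2 fingers.
If Odd puts out 1 finger, Odd’s expected reward is (−1)𝑦1 + (+1)(1 − 𝑦1 ) = 1 − 2𝑦1 ; if Odd puts out 2 fingers, it is
(+1)𝑦1 + (−1)(1 − 𝑦1 ) = 2𝑦1 − 1. Because Odd is assumed to know Even’s strategy, Even’s expected loss is the larger of
these two values, which is smallest where the two lines intersect:
1 − 2𝑦1 = 2𝑦1 − 1, or 𝑦1 = 1/2. The basic assumption thus implies that Even should choose the mixed strategy (1/2, 1/2).
For this mixed strategy, Even’s expected loss is zero. We say that zero is a ceiling on Even’s expected loss, because by
choosing the mixed strategy (1/2, 1/2), Even can ensure that her expected loss will not exceed zero.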

More on the idea of value and optimal strategies:


For the game of Odds and Evens, the row player’s floor and the column player’s ceiling are equal. This is not a coincidence.
When each player is allowed to choose mixed strategies, the row player’s floor will always equal the column player’s ceiling.
We call the common value of the floor and ceiling the value of the game to the row player.
Any mixed strategy for the row player that guarantees that the row player gets an expected reward at least equal to the
value of the game is an optimal strategy for that row player. Similarly, any mixed strategy for the column player that
guarantees that the column player’s expected loss is no more than the value of the game is an optimal strategy for the
column player. Thus, we have shown that the value of the game is zero.
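
The floor and ceiling for Odds and Evens can also be found numerically. The sketch below replaces reading the graph by a coarse grid search over 𝑥1 and 𝑦1 , which is an illustrative stand-in for the graphical method and not a general solver; the names `floor` and `ceiling` are assumptions.

```python
from fractions import Fraction as F

# Odd's reward; rows = Odd puts out 1 or 2 fingers, columns = Even puts out 1 or 2.
A = [[-1, 1],
     [1, -1]]

def floor(x1):
    """Odd's guaranteed expected reward when playing (x1, 1 - x1)."""
    return min(x1 * A[0][j] + (1 - x1) * A[1][j] for j in range(2))

def ceiling(y1):
    """Even's worst-case expected loss when playing (y1, 1 - y1)."""
    return max(y1 * A[i][0] + (1 - y1) * A[i][1] for i in range(2))

grid = [F(k, 10) for k in range(11)]      # 0, 1/10, ..., 1
best_x1 = max(grid, key=floor)
best_y1 = min(grid, key=ceiling)
print(best_x1, floor(best_x1), best_y1, ceiling(best_y1))
```

Both searches end at 1/2 with a floor and a ceiling of zero, in line with the value of the game found above.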

Example:
A fair coin is tossed, and the result is shown to player 1. Player 1 must then decide whether to pass or to bet. If player 1
passes, then he must pay player 2 $1. If player 1 bets, then player 2 may either fold or call the bet. If player 2 folds, then she
pays player 1 $1. If player 2 calls and the coin comes up heads, then she pays player 1 $2; if player 2 calls and the coin
comes up tails, then player 1 must pay her $2. Formulate this as a two-person zero-sum game. Then graphically determine
the value of the game and each player’s optimal strategy.

Solution:
Player 1’s strategy may be represented as follows:
- PP: pass on heads and pass on tails ;
- PB: pass on heads and bet on tails ;
- BP: bet on heads and pass on tails ;
- BB: bet on heads and bet on tails.
Player 2 simply has two strategies: call & fold.
For each choice of strategies, player 1’s expected reward is shown in the following table:

Player 1’s expected reward


PP vs call (0,5 x -1) + (0,5 x -1) = -1
PP vs fold (0,5 x -1) + (0,5 x -1) = -1
PB vs call (0,5 x -1) + (0,5 x -2) = -1,5
PB vs fold (0,5 x -1) + (0,5 x 1) = 0
BP vs call (0,5 x 2) + (0,5 x -1) = 0,5
BP vs fold (0,5 x 1) + (0,5 x -1) = 0
BB vs call (0,5 x 2) + (0,5 x -2) = 0
BB vs fold (0,5 x 1) + (0,5 x 1) = 1

This example may be described as a two-person zero-sum game represented by the following reward matrix:

                         Column player’s strategy
Row player’s strategy    Call    Fold    Row minimum
PP                         -1      -1             -1
PB                       -1,5       0           -1,5
BP                        0,5       0              0
BB                          0       1              0
Column maximum            0,5       1

This game does not have a saddle point.


Observe that player 1 would be unwise ever to choose PP, because, for each strategy of player 2, player 1 could do better
than PP by choosing BP or BB. In general, a strategy 𝒊 for a given player is dominated by a strategy 𝒊′ if, for each of the
other player’s possible strategies, the given player does at least as well with strategy 𝑖′ and if for at least one of the other
player’s strategies, strategy 𝑖′ is superior to strategy 𝑖. A player may eliminate all dominated strategies from consideration.
After eliminating the dominated strategies PP and PB, we are left with the following game matrix:

                         Column player’s strategy
Row player’s strategy    Call    Fold    Row minimum
BP                        0,5       0              0
BB                          0       1              0
Column maximum            0,5       1
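
The elimination of dominated strategies can be sketched for the row player as follows. The function name `eliminate_dominated_rows` is an assumption, and the sketch performs a single pass, which is enough for this matrix.

```python
def eliminate_dominated_rows(matrix, labels):
    """Drop every row that is dominated by another row (row player maximizes)."""
    def dominates(r, s):
        # r is at least as good as s everywhere and strictly better somewhere
        return (all(a >= b for a, b in zip(r, s))
                and any(a > b for a, b in zip(r, s)))

    keep = [i for i, row in enumerate(matrix)
            if not any(dominates(other, row)
                       for k, other in enumerate(matrix) if k != i)]
    return [matrix[i] for i in keep], [labels[i] for i in keep]

coin_game = [[-1, -1],     # PP
             [-1.5, 0],    # PB
             [0.5, 0],     # BP
             [0, 1]]       # BB
reduced, names = eliminate_dominated_rows(coin_game, ["PP", "PB", "BP", "BB"])
print(names, reduced)
```

For the column player, who wants small entries, the same function can be applied to the transposed matrix with all signs reversed.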

We proceed with a graphical solution. Let:


𝑥1 = the probability that player 1 chooses BP
𝑥2 = 1 − 𝑥1 = the probability that player 1 chooses BB
𝑦1 = the probability that player 2 chooses call
𝑦2 = 1 − 𝑦1 = the probability that player 2 chooses fold

To determine the optimal strategy for player 1, observe that for any value of 𝑥1 , her expected reward against calling is:
0,5 𝑥1 + 0 (1 − 𝑥1 ) = 0,5 𝑥1
Against folding, player 1’s expected reward is:
0 𝑥1 + 1 (1 − 𝑥1 ) = 1 − 𝑥1
Thus, to maximize her guaranteed expected reward, player 1 should choose the value of 𝑥1 which solves 0,5 𝑥1 = 1 − 𝑥1 ,
or 𝑥1 = 2/3 (and 𝑥2 = 1/3).

How should player 2 choose 𝑦1 ? For a given value of 𝑦1 , suppose player 1 chooses BP. Then her expected reward is:
0,5 𝑦1 + 0 (1 − 𝑦1 ) = 0,5 𝑦1
For a given value of 𝑦1 , suppose player 1 chooses BB. Then her expected reward is:
0 𝑦1 + 1 (1 − 𝑦1 ) = 1 − 𝑦1
Thus, player 2 should choose the value of 𝑦1 which solves 0,5 𝑦1 = 1 − 𝑦1 , or 𝑦1 = 2/3 (and 𝑦2 = 1/3).

You should check that no matter what player 1 does, player 2’s mixed strategy (2/3, 1/3) ensures that player 1 earns an
expected reward of 1/3 = (2/3 x 0,5 + 1/3 x 0) or (2/3 x 0 + 1/3 x 1).

In summary, the value of the game is 1/3 to player 1; the optimal mixed strategy for player 1 is (2/3, 1/3) and the optimal
strategy for player 2 is also (2/3, 1/3).
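
Equating the two expected rewards, as done above, gives a closed form for any 2 x 2 game without a saddle point. The sketch below is one way to write it down; the function name `solve_2x2` and its argument order are assumptions, and exact fractions avoid rounding.

```python
from fractions import Fraction as F

def solve_2x2(a, b, c, d):
    """Mixed-strategy solution of the game [[a, b], [c, d]] (row player's rewards).
    Assumes the game has no saddle point, so the denominator is nonzero.
    Returns (x1, y1, value): probability of row 1, of column 1, and the value."""
    a, b, c, d = map(F, (a, b, c, d))
    denom = a - b - c + d
    x1 = (d - c) / denom          # from a*x1 + c*(1 - x1) = b*x1 + d*(1 - x1)
    y1 = (d - b) / denom          # from a*y1 + b*(1 - y1) = c*y1 + d*(1 - y1)
    value = (a * d - b * c) / denom
    return x1, y1, value

# Reduced coin game: rows BP, BB; columns call, fold.
print(solve_2x2(F(1, 2), 0, 0, 1))
```

For the reduced coin game this gives 𝑥1 = 2/3, 𝑦1 = 2/3 and a value of 1/3.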

Problems:
(2) Player 1 writes an integer between 1 and 20 on a slip of paper. Without showing this slip of paper to player 2,
player 1 tells player 2 what he has written. Player 1 may lie or tell the truth. If caught in a lie, player 1 must pay
player 2 $10; if falsely accused of lying, player 1 collects $5 from player 2. If player 1 tells the truth and player 2
guesses that player 1 has told the truth, then player 1 must pay $1 to player 2. If player 1 lies and player 2 does
not guess that player 1 has lied, player 1 wins $5 from player 2. Determine the value of this game and each
player’s optimal strategy.

Answer:

                                        Column player’s strategy
Row player’s strategy    Guess he lied    Guess he told the truth    Row minimum
Lie                                -10                          5            -10
Tell the truth                       5                         -1             -1
Column maximum                       5                          5

This game has no saddle point.

Let:
- 𝑥1 = the probability that player 1 lies
- 𝑥2 = 1 − 𝑥1 = the probability that player 1 tells the truth
What is player 1’s expected reward?
If player 2 accuses him of lying, player 1’s expected reward is:
−10 𝑥1 + 5(1 − 𝑥1 ) = −15 𝑥1 + 5
If player 2 believes that he told the truth, player 1’s expected reward is:
5 𝑥1 − (1 − 𝑥1 ) = 6 𝑥1 − 1
The LP reads:
max v
s.t. 𝑣 ≤ −15 𝑥1 + 5
𝑣 ≤ 6 𝑥1 − 1
0 ≤ 𝑥1 ≤ 1

We now look for the intersection of the two lines:
−15 𝑥1 + 5 = 6 𝑥1 − 1
𝑥1 = 2/7
The value of the game is 6 x 2/7 − 1 = 5/7. Because the reward matrix is symmetric, the same equation holds for player 2,
with 𝑦1 = the probability that player 2 guesses that player 1 lied:
𝑦1 = 2/7

(8) KUL is about to play UA for the tennis championship. The KUL team has 2 players (A & B) and the UA team has 3
players (X, Y & Z). The following facts are known about the players’ relative abilities:
- X will always beat B ;
- Y will always beat A ;
- A will always beat Z.
In any other match, each player has a 50% chance of winning. Before the game, the KUL coach must determine who
will play first singles and who will play second singles. The UA coach must also determine who will play first and
second singles. Assume that each coach wants to maximize the expected number of singles matches won.
Use game theory to determine optimal strategies for each coach and the value of the game to each team.

Answer:

                                Column player’s strategy = UA
Row player’s strategy = KUL     XY    YX    XZ    ZX    YZ     ZY     Row minimum
AB                              1     0     1     1     0,5    1,5    0
BA                              0     1     1     1     1,5    0,5    0
Column maximum                  1     1     1     1     1,5    1,5

Eliminating the dominated strategies (for UA, who wants to minimize: XY dominates XZ, ZX and ZY, and YX dominates YZ) leaves the following matrix:

                                Column player’s strategy = UA
Row player’s strategy = KUL     XY    YX    Row minimum
AB                              1     0     0
BA                              0     1     0
Column maximum                  1     1

Let:
- 𝑥1 = the probability that player 1 (KUL) plays AB
- 𝑥2 = 1 − 𝑥1 = the probability that player 1 plays BA
What is player 1’s expected reward?
If player 2 chooses XY, player 1’s expected reward is:
𝑥1 + 0(1 − 𝑥1 ) = 𝑥1
If player 2 chooses YX, player 1’s expected reward is:
0 𝑥1 + (1 − 𝑥1 ) = −𝑥1 + 1
Setting the two expressions equal and solving for 𝑥1 :
𝑥1 = −𝑥1 + 1
𝑥1 = 1/2
The value of the game to KUL (expected number of singles matches won) is 1/2.

(10) Consider the following simplified version of football. On each play the offense chooses to run or pass. At the same
time, the defense chooses to play a run defense or pass defense. The number of yards gained on each play is
determined by the reward matrix. The offense’s goal is to maximize the average yards gained per play.

Defense
Offense Run Pass
Run 1 8
Pass 10 0

a) Use problem 9 to show that the offense should run 10/17 of the time.
b) Suppose that the effectiveness of a pass against the run defense improves. Use the results of Problem 9
to show that the offense should pass less. Can you give an explanation for this strange phenomenon?

Answer:
a) From the previous problem it follows that the offense should choose run with probability 10/17, and
hence pass with probability 7/17.
b) If the entry 10 in this payoff matrix increases, for example to 100, these probabilities change to
100/107 and 7/107. So although your performance improves when you pass against the run defense, you
choose pass much less often. Why? Because player 2 sees this as well and therefore chooses the run
defense much less often. If he chooses the pass defense, you are better off running.
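
The effect in part b) can be made concrete with a short calculation. This is a sketch under the assumption that the entry for a pass against the run defense, called g here (a name introduced only for this example), is larger than 1 so that the game has no saddle point. Equalizing the offense's expected yards against both defenses, 1·x + g(1 − x) = 8x, gives x = g / (g + 7) for the probability of running.

```python
from fractions import Fraction

def run_probability(pass_vs_run_defense):
    """Probability of running that equalizes the offense's expected yards.

    Against run defense:  1*x + g*(1 - x)
    Against pass defense: 8*x + 0*(1 - x)
    Setting the two equal gives x = g / (g + 7).
    """
    g = Fraction(pass_vs_run_defense)
    return g / (g + 7)

for g in (10, 20, 100):
    x = run_probability(g)
    # payoff entry, probability of running, expected yards per play (8 * x)
    print(g, x, 8 * x)
```

As g grows, the run probability moves toward 1, which matches the observation that the offense passes less often.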

(11) Use the idea of dominated strategies to determine optimal strategies for the following reward matrix:
-5 -10 -1 -10 2 -1
-1 2 -10 7 -5 20
2 7 -5 -10 -10 7
7 20 -1 -1 -1 2
20 7 -10 7 -1 -10

→ The column player minimizes, the row player maximizes.

Column 3 dominates column 6.
Row 4 dominates row 3.
Column 3 dominates column 5.
Row 4 dominates row 1.
Column 3 dominates column 4.
Row 5 dominates row 2.
Column 3 dominates columns 1 and 2.
Row 4 dominates row 5.

The value of the game is -1: the row player always plays row 4 and the column player always plays column 3.
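
The elimination steps can be automated. The following is a minimal sketch, assuming weak dominance (ties allowed) and removing one strategy at a time; the function name eliminate_dominated is made up for this illustration, and the order in which strategies are removed may differ from the order listed above.

```python
def eliminate_dominated(matrix):
    """Iteratively delete weakly dominated rows and columns of a zero-sum reward matrix.

    The row player maximizes, the column player minimizes.
    Returns the surviving (row indices, column indices), 0-based.
    """
    rows = list(range(len(matrix)))
    cols = list(range(len(matrix[0])))
    changed = True
    while changed:
        changed = False
        # A row is dominated if another row is at least as large in every remaining column.
        for r in rows:
            if any(s != r and all(matrix[s][c] >= matrix[r][c] for c in cols) for s in rows):
                rows.remove(r)
                changed = True
                break
        # A column is dominated if another column is at most as large in every remaining row.
        for c in cols:
            if any(d != c and all(matrix[r][d] <= matrix[r][c] for r in rows) for d in cols):
                cols.remove(c)
                changed = True
                break
    return rows, cols

M = [[-5, -10, -1, -10, 2, -1],
     [-1, 2, -10, 7, -5, 20],
     [2, 7, -5, -10, -10, 7],
     [7, 20, -1, -1, -1, 2],
     [20, 7, -10, 7, -1, -10]]
rows, cols = eliminate_dominated(M)
print(rows, cols, M[rows[0]][cols[0]])  # [3] [2] -1
```

Indices are 0-based, so [3] and [2] correspond to row 4 and column 3 of the problem.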

14.3 Linear programming and zero-sum games


Linear programming can be used to find the value and optimal strategies for any two-person zero-sum game.
Example:
Two players simultaneously utter one of the three words: stone, paper, scissors and show corresponding hand signs. If both
players utter the same word, then the game is a draw. Otherwise, one player wins $1 from the other player according to
the following:
- scissors defeats paper
- paper defeats stone
- stone defeats scissors
Find the value and optimal strategies for this two-person zero-sum game.
To determine optimal mixed strategies for the row and column player, define:
- 𝑥1 = the probability that the row player chooses stone
- 𝑥2 = the probability that the row player chooses paper
- 𝑥3 = the probability that the row player chooses scissors
- 𝑦1 = the probability that the column player chooses stone
- 𝑦2 = the probability that the column player chooses paper
- 𝑦3 = the probability that the column player chooses scissors
Reward matrix:
                          Column player’s strategy
Row player’s strategy     Stone    Paper    Scissors    Row minimum
Stone                     0        -1       +1          -1
Paper                     +1       0        -1          -1
Scissors                  -1       +1       0           -1
Column maximum            +1       +1       +1

The row player’s LP:


If the row player chooses the mixed strategy (𝑥1 , 𝑥2 , 𝑥3 ), then her expected reward against each of the column player’s
strategies is as shown:

Column player chooses     Row player’s expected reward if row player chooses (𝑥1 , 𝑥2 , 𝑥3 )
Stone 𝑥2 − 𝑥3
Paper −𝑥1 + 𝑥3
Scissors 𝑥1 − 𝑥2

Suppose the row player chooses the mixed strategy (𝑥1 , 𝑥2 , 𝑥3 ). By the basic assumption, the column player will choose a
strategy that makes the row player’s expected reward equal to min(𝑥2 − 𝑥3 ; −𝑥1 + 𝑥3 ; 𝑥1 − 𝑥2 ). Then the row player
should choose (𝑥1 , 𝑥2 , 𝑥3 ) to make min(𝑥2 − 𝑥3 ; −𝑥1 + 𝑥3 ; 𝑥1 − 𝑥2 ) as large as possible.

The row player’s optimal strategy can be found by solving the following LP:
max z=v
s.t. 𝑣 ≤ 𝑥2 − 𝑥3
𝑣 ≤ −𝑥1 + 𝑥3
𝑣 ≤ 𝑥1 − 𝑥2
𝑥1 + 𝑥2 + 𝑥3 = 1
𝑥1 , 𝑥2 , 𝑥3 ≥ 0

Note that there is a constraint for each of the column player’s strategies. The value of v in the optimal solution is the row
player’s floor, because no matter what strategy is chosen by the column player, the row player is sure to receive an
expected reward of at least v.

The column player’s LP:


Suppose the column player has chosen the mixed strategy (𝑦1 , 𝑦2 , 𝑦3 ). For each of the row player’s strategies, we may
compute the row player’s expected reward if the column player chooses (𝑦1 , 𝑦2 , 𝑦3 ):

Row player chooses     Row player’s expected reward if column player chooses (𝑦1 , 𝑦2 , 𝑦3 )
Stone −𝑦2 + 𝑦3
Paper 𝑦1 − 𝑦3
Scissors −𝑦1 + 𝑦2

Because the row player is assumed to know (𝑦1 , 𝑦2 , 𝑦3 ), she will choose a strategy that yields an
expected reward of max(−𝑦2 + 𝑦3 ; 𝑦1 − 𝑦3 ; −𝑦1 + 𝑦2 ). Thus the column player should choose (𝑦1 , 𝑦2 , 𝑦3 ) to make
max(−𝑦2 + 𝑦3 ; 𝑦1 − 𝑦3 ; −𝑦1 + 𝑦2 ) as small as possible.

The column player may find his optimal strategy by solving the following LP:
min z=w
s.t. 𝑤 ≥ −𝑦2 + 𝑦3
𝑤 ≥ 𝑦1 − 𝑦3
𝑤 ≥ −𝑦1 + 𝑦2
𝑦1 + 𝑦2 + 𝑦3 = 1
𝑦1 , 𝑦2 , 𝑦3 ≥ 0

Observe that the LP contains a constraint corresponding to each of the row player’s strategies. The value of w in the
optimal solution is the column player’s ceiling on his expected losses, because by choosing an optimal mixed strategy, the column
player can ensure that his expected losses will be at most w.
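
The floor and ceiling interpretation can be checked numerically without an LP solver. This is a sketch for illustration: the helper names floor_for_row and ceiling_for_column are assumptions introduced here. For a given mixed strategy, the best v (or w) is simply the minimum (or maximum) of the right-hand sides of the constraints.

```python
from fractions import Fraction as F

# Reward matrix to the row player; strategies ordered stone, paper, scissors.
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

def floor_for_row(x):
    """Largest v satisfying v <= (expected reward against column j) for every column j."""
    return min(sum(x[i] * A[i][j] for i in range(3)) for j in range(3))

def ceiling_for_column(y):
    """Smallest w satisfying w >= (expected reward of row i) for every row i."""
    return max(sum(A[i][j] * y[j] for j in range(3)) for i in range(3))

uniform = [F(1, 3)] * 3
print(floor_for_row(uniform), ceiling_for_column(uniform))  # 0 0
print(floor_for_row([F(1, 2), F(1, 4), F(1, 4)]))           # -1/4
```

A floor equal to a ceiling means neither player can do better, while the unbalanced strategy (1/2, 1/4, 1/4) has a lower floor.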

Relation between the row and the column player’s LPs:


The dual of the row player’s LP is the column player’s LP.
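How to solve the row and the column player’s LPs:
Add c = |most negative entry in the reward matrix| to each element of the game’s reward matrix A. Every entry is then nonnegative, so the value of the shifted game (v′ = v + c) is nonnegative and the sign restrictions v′ ≥ 0 and w′ ≥ 0 may be imposed; the optimal strategies are unchanged.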

Example stone, paper, scissors:


Add 𝑐 = |−1| = 1, this yields the following constant-sum game:

Column player’s strategy


Row player’s strategy Stone Paper Scissors
Stone 1 0 2
Paper 2 1 0
Scissors 0 2 1

The row player’s LP is as follows:


max 𝑣′
s.t. 𝑣 ′ ≤ 𝑥1 + 2𝑥2
𝑣′ ≤ 𝑥2 + 2𝑥3
𝑣 ′ ≤ 2𝑥1 + 𝑥3
𝑥1 + 𝑥2 + 𝑥3 = 1
𝑥1 , 𝑥2 , 𝑥3 , 𝒗′ ≥ 0

Substituting 𝑥3 = 1 − 𝑥1 − 𝑥2 transforms the row player’s LP into the following LP:


max 𝑣′
s.t. 𝑣′ − 𝑥1 − 2𝑥2 ≤ 0 (a)
𝑣′ + 2𝑥1 + 𝑥2 ≤ 2 (b)
𝑣′ − 𝑥1 + 𝑥2 ≤ 1 (c)
𝑥1 + 𝑥2 ≤ 1
𝑥1 , 𝑥2 , 𝑣′ ≥ 0

The column player’s LP is as follows:


min 𝑤′
s.t. 𝑤′ ≥ 𝑦1 + 2𝑦3
𝑤 ′ ≥ 2𝑦1 + 𝑦2
𝑤′ ≥ 2𝑦2 + 𝑦3
𝑦1 + 𝑦2 + 𝑦3 = 1
𝑦1 , 𝑦2 , 𝑦3 , 𝒘′ ≥ 0

Substituting 𝑦3 = 1 − 𝑦1 − 𝑦2 transforms the column player’s LP into the following LP:


min 𝑤′
s.t. 𝑤′ + 𝑦1 + 2𝑦2 ≥ 2 (d)
𝑤′ − 2𝑦1 − 𝑦2 ≥ 0 (e)
𝑤′ + 𝑦1 − 𝑦2 ≥ 1
𝑦1 + 𝑦2 ≤ 1
𝑦1 , 𝑦2 , 𝑤′ ≥ 0

Stone, paper, scissors appears to be a fair game, so we might conjecture that v = w = 0, or v′ = w′ = 0 + 1 = 1.

Solving (a) and (b) simultaneously (with v′ = 1) yields 𝑥1 = 𝑥2 = 1/3. Because 𝑥1 = 1/3, 𝑥2 = 1/3 and v′ = 1 satisfy (c), we have
obtained a feasible solution to the row player’s LP. Solving (d) and (e) simultaneously (with w′ = 1) yields 𝑦1 = 𝑦2 = 1/3 and w′ = 1.
This solution is dual feasible. Thus we have found a primal feasible and a dual feasible solution with the same objective value (v′ = w′ = 1), so both are optimal.
Value of stone, paper, scissors: v′ − 1 = 0
Optimal strategy for the row player: (1/3, 1/3, 1/3)
Optimal strategy for the column player: (1/3, 1/3, 1/3)
Other examples: page 822!
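
The shift by c and the feasibility check of the conjectured solution can be reproduced as follows. This is only a sketch: the variable names shifted and constraints are chosen for this example, and the computation of c assumes the reward matrix contains at least one negative entry.

```python
from fractions import Fraction as F

A = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]       # original reward matrix
c = abs(min(min(row) for row in A))            # |most negative entry| = 1
shifted = [[a + c for a in row] for row in A]  # [[1, 0, 2], [2, 1, 0], [0, 2, 1]]

# Conjectured solution of the row player's LP after substituting x3 = 1 - x1 - x2.
x1, x2, v = F(1, 3), F(1, 3), F(1)
constraints = {
    "a": v - x1 - 2 * x2 <= 0,
    "b": v + 2 * x1 + x2 <= 2,
    "c": v - x1 + x2 <= 1,
}
print(shifted)
print(constraints)   # all three constraints hold
print(v - c)         # value of the original game: 0
```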

Problems:
(1) A soldier can hide in one of 5 foxholes. A gunner has a single shot and may fire at any of the four spots A, B, C, D.
A shot will kill a soldier if the soldier is in a foxhole adjacent to the spot where the shot was fired. (A shot fired in
B, will kill the soldier if he is in foxhole 2 or 3.) Suppose the gunner receives a reward of 1 if the soldier is killed
and a reward of 0 if the soldier survives the shot.
c) We are given that an optimal strategy for the soldier is to hide 1/3 of the time in foxholes 1,3 and 5. We
are also told that for the gunner, an optimal strategy is to shoot 1/3 of the time at A, 1/3 of the time at
D and 1/3 of the time at B or C. Determine the value of the game to the gunner.
d) Suppose the soldier chooses the following non-optimal strategy: hide 1/2 of the time in 1, 1/4 of the
time in 3 and 1/4 of the time in 5. Find a strategy for the gunner that ensures that his expected reward
will exceed the value of the game.
e) Write down each player’s LP and verify that the strategies given in c) are optimal strategies.

1 A 2 B 3 C 4 D 5

Solution:
c) Matrix:
1 (1/3) 3 (1/3) 5 (1/3)
A (1/3) 1 0 0
BC (1/3) 0 1 0
D (1/3) 0 0 1

The value of the game to the gunner:


(1 x 1/9) + (0 x 1/9) + (0 x 1/9) + (0 x 1/9) + (1 x 1/9) + (0 x 1/9) + (0 x 1/9) + (0 x 1/9) + (1 x 1/9) = 1/3

d) Matrix:
1 (1/2) 3 (1/4) 5 (1/4)
A 1 0 0
BC 0 1 0
D 0 0 1

The pure strategy (1, 0, 0), i.e. always shooting at A, gives the gunner an expected reward of: (1 x 1/2) + … = 1/2 > 1/3

e) Gunner’s LP:
max v
s.t. 𝑣 ≤ 𝑥1
𝑣 ≤ 𝑥2
𝑣 ≤ 𝑥3
𝑥1 + 𝑥2 + 𝑥3 = 1
𝑥1 , 𝑥2 , 𝑥3 ≥ 0

Soldier’s LP:
min w
s.t. 𝑤 ≥ 𝑦1
𝑤 ≥ 𝑦2
𝑤 ≥ 𝑦3
𝑦1 + 𝑦2 + 𝑦3 = 1
𝑦1 , 𝑦2 , 𝑦3 ≥ 0

Each of these solutions is feasible and the objective values coincide (v = w = 1/3). Hence, by weak duality,
they are optimal solutions.
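
Parts c) and d) can also be checked on the full 4 x 5 game instead of the reduced 3 x 3 matrix. The sketch below assumes the layout 1 A 2 B 3 C 4 D 5 from the problem, so that a shot at a spot kills in the two neighbouring foxholes; the names KILL and gunner_rewards are made up for this example.

```python
from fractions import Fraction as F

SPOTS = "ABCD"
# Spot k (0-based) lies between foxholes k+1 and k+2, so a shot there kills in either one.
KILL = [[1 if hole in (k + 1, k + 2) else 0 for hole in range(1, 6)] for k in range(4)]

def gunner_rewards(soldier):
    """Expected reward of each pure gunner strategy against a soldier mixed strategy."""
    return [sum(p * hit for p, hit in zip(soldier, row)) for row in KILL]

optimal = [F(1, 3), 0, F(1, 3), 0, F(1, 3)]
print(gunner_rewards(optimal))       # every spot yields 1/3, the value of the game

skewed = [F(1, 2), 0, F(1, 4), 0, F(1, 4)]
rewards = gunner_rewards(skewed)
best = SPOTS[rewards.index(max(rewards))]
print(best, max(rewards))            # A 1/2
```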

14.4 Two-person non-constant-sum games


Two-person non-constant-sum games: we assume that cooperation between the players is not allowed.
Example Prisoner’s Dilemma:
‘If only one of you confesses and testifies against the other, the person who confesses will go free while the person who
does not confess will surely be convicted and given a 20-year jail sentence. If both of you confess, then you will both be
convicted and sent to prison for 5 years. Finally, if neither of you confesses, I can convict you both of a misdemeanor and you
will each get 1 year in prison.’

Reward matrix (reward prisoner 1, reward prisoner 2):


Prisoner 2
Prisoner 1 Confess Don’t confess
Confess (-5,-5) (0,-20)
Don’t confess (-20,0) (-1,-1)

Nash equilibrium point: a point where neither player can benefit from a unilateral change in strategy, i.e. a situation in which no player wishes to change strategy, given the expected strategy of the other player. In the Prisoner’s Dilemma above, (-5,-5) is the equilibrium point.
More formally, a Prisoner’s Dilemma game may be described as in the following table:
Player 2
Player 1 NC C
NC (P,P) (T,S)
C (S,T) (R,R)

where:
- NC = non-cooperative action
- C = cooperative action
- P = punishment for not cooperating
- S = payoff to person who is double-crossed
- R = reward for cooperating if both players cooperate
- T = temptation for double-crossing opponent

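For a game to represent a Prisoner’s Dilemma, we require that: T > R > P > S.
(P,P) is an equilibrium point because P > S: a player who unilaterally switches from NC to C receives S instead of P.
(R,R) is not an equilibrium point because T > R: a player who unilaterally switches from C to NC receives T instead of R.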

Example Advertising prisoner’s Dilemma Game:


Competing restaurants HD King & HD Chef are attempting to determine their advertising budgets for next year.
The two will have combined sales of $240 million and can spend either $6 or $10 million on advertising.
If one restaurant spends more than the other, then the restaurant that spends more money will have sales of $190 million.
If both companies spend the same amount on advertising, then they will have equal sales.
Each dollar of sales yields 10 cents of profit.
Suppose each restaurant is interested in maximizing (contribution of sales) – (advertising costs).
Find an equilibrium point for this game.

Reward matrix:
HD chef
HD King $10 $6
$10 (2,2) (9,-1)
$6 (-1,9) (6,6)
Rewards are in $ million. For example, if both restaurants spend $10 million, each earns [ (240/10) – (10+10) ] / 2 = 2.

(2,2) is an equilibrium point. Although both restaurants are better off at (6,6) than at (2,2), (6,6) is unstable because either
restaurant may gain by changing its strategy.

Example Chicken game:


Max drives toward James on a deserted road. Each person has two strategies: swerve or don’t swerve.
Reward matrix:
James
Max Swerve Don’t swerve
Swerve (0,0) (-5,5)
Don’t swerve (5,-5) (-100,-100)

For both (5,-5) and (-5,5), neither player can gain by a unilateral change in strategy, so (5,-5) and (-5,5) are equilibrium points.

Example A game with no equilibrium in pure strategies:


Reward matrix:
Player 2
Player 1 Strategy 1 Strategy 2
Strategy 1 (2,-1) (-2,1)
Strategy 2 (-2,1) (2,-1)

For this game, the reader should verify that there is no equilibrium in pure strategies and also that each player’s choice of
the mixed strategy (0,5 ; 0,5) is an equilibrium because neither player can benefit from a unilateral change in strategy.
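
The pure-strategy claims in the four examples above can be checked by brute force. This is a minimal sketch: the function name pure_equilibria and the 0-based (row, column) indexing are assumptions made for this illustration.

```python
def pure_equilibria(game):
    """Return all (row, column) cells from which neither player gains by deviating alone.

    game[i][j] = (reward to the row player, reward to the column player).
    """
    n_rows, n_cols = len(game), len(game[0])
    found = []
    for i in range(n_rows):
        for j in range(n_cols):
            # The row player compares cell (i, j) with the other cells in column j.
            row_ok = all(game[k][j][0] <= game[i][j][0] for k in range(n_rows))
            # The column player compares cell (i, j) with the other cells in row i.
            col_ok = all(game[i][k][1] <= game[i][j][1] for k in range(n_cols))
            if row_ok and col_ok:
                found.append((i, j))
    return found

prisoners = [[(-5, -5), (0, -20)], [(-20, 0), (-1, -1)]]   # index 0 = confess
advertising = [[(2, 2), (9, -1)], [(-1, 9), (6, 6)]]       # index 0 = spend $10 million
chicken = [[(0, 0), (-5, 5)], [(5, -5), (-100, -100)]]     # index 0 = swerve
no_pure = [[(2, -1), (-2, 1)], [(-2, 1), (2, -1)]]

print(pure_equilibria(prisoners))    # [(0, 0)]
print(pure_equilibria(advertising))  # [(0, 0)]
print(pure_equilibria(chicken))      # [(0, 1), (1, 0)]
print(pure_equilibria(no_pure))      # []
```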

Problems:
(4) Given that each player’s goal is to maximize her expected reward, show that each player’s choice of the mixed
strategy (0,5 ; 0,5) is an equilibrium point.
Game:

Player 2
Player 1 Strategy 1 Strategy 2
Strategy 1 (2,-1) (-2,1)
Strategy 2 (-2,1) (2,-1)

Solution:
Is (0,5 ; 0,5) an equilibrium for each of the players?
Consider the expected reward of the column player:
Column 1: −𝑥1 + 𝑥2 = 1 − 2𝑥1
Column 2: 𝑥1 − 𝑥2 = 2𝑥1 − 1
So if 𝑥1 ≠ 0,5, player 2 strictly prefers one of the columns and always plays it. That is not an
equilibrium, because given that choice by player 2, player 1 can improve. At 𝑥1 = 0,5 both columns yield 0, so
player 2 cannot gain by deviating; the same argument with the roles reversed applies to player 1.
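
The indifference argument can be verified numerically. The sketch below is for illustration only; the function name payoffs_of_pure_replies is an assumption of this example. It lists the expected reward of every pure strategy against the opponent's mixed strategy, and at an equilibrium no pure reply does better than the mixed strategy itself.

```python
from fractions import Fraction as F

game = [[(2, -1), (-2, 1)], [(-2, 1), (2, -1)]]   # (reward player 1, reward player 2)

def payoffs_of_pure_replies(x, y):
    """Expected reward of each pure strategy against the opponent's mixed strategy."""
    player1 = [sum(y[j] * game[i][j][0] for j in range(2)) for i in range(2)]
    player2 = [sum(x[i] * game[i][j][1] for i in range(2)) for j in range(2)]
    return player1, player2

half = [F(1, 2), F(1, 2)]
print(payoffs_of_pure_replies(half, half))   # all four expected rewards equal 0

# If player 1 deviates to x1 = 3/4, player 2 strictly prefers column 2.
print(payoffs_of_pure_replies([F(3, 4), F(1, 4)], half)[1])
```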

14.5 Introduction to n-person game theory


N-person game: any game with n players. An n-person game is specified by the game’s characteristic function.
Characteristic function v: gives the amount v(S) that the members of S can be sure of receiving if they act together and form
a coalition.

Example The drug game:


Willie has invented a new drug. He cannot manufacture the drug himself, but he can sell the drug’s formula to company 2
or company 3. The lucky company will split a $1 million profit with Willie. Find the characteristic function.
Solution:
v({ }) = v({1}) = v({2}) = v({3}) = v({2,3}) = $0
v({1,2}) = v({1,3}) = v({1,2,3}) = $ 1 000 000

Example The garbage game:


Each of four property owners has one bag of garbage and must dump it on somebody’s property. If b bags of garbage are
dumped on the coalition of property owners, then the coalition receives a reward of –b. Find the characteristic function.
Solution:
The best that the members of any coalition can do is to dump all of their garbage on property of owners who are not in S.
v(S) = −(4 − |S|) (if |S| < 4) → |S| is the number of players in S
v({1,2,3,4}) = −4 (if |S| = 4)

Example The land development game:


Player 1 owns a piece of land and values the land at $10 000. Player 2 is a subdivider who can develop the land and increase
its worth to $20 000. Player 3 is a subdivider who can develop the land and increase its worth to $30 000. There are no other
prospective buyers. Find the characteristic function.
Solution:
v({1}) = $10 000 v({ }) = v({2}) = v({3}) = $0 v({1,2}) = $20 000
v({1,3}) = $30 000 v({2,3}) = $0 v({1,2,3}) = $30 000

Consider any two subsets A and B of the set of players such that A and B have no players in common. Then for each of our examples, the
characteristic function must satisfy the following inequality:
𝑣(𝐴 ∪ 𝐵) ≥ 𝑣(𝐴) + 𝑣(𝐵)
This property of the characteristic function is called superadditivity.
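
Superadditivity can be checked by enumerating all pairs of disjoint coalitions. This is a sketch under the assumption that only non-empty coalitions are compared; the helper names coalitions, is_superadditive, garbage and land are introduced for this example.

```python
from itertools import combinations

def coalitions(players):
    """All non-empty subsets of the player set, as frozensets."""
    return [frozenset(c) for r in range(1, len(players) + 1)
            for c in combinations(players, r)]

def is_superadditive(v, players):
    """Check v(A ∪ B) >= v(A) + v(B) for all disjoint non-empty coalitions A and B."""
    subsets = coalitions(players)
    return all(v(a | b) >= v(a) + v(b)
               for a in subsets for b in subsets if not (a & b))

def garbage(s):
    """Garbage game: v(S) = -(4 - |S|) if |S| < 4, and v(N) = -4."""
    return -4 if len(s) == 4 else -(4 - len(s))

LAND = {frozenset({1}): 10_000, frozenset({1, 2}): 20_000,
        frozenset({1, 3}): 30_000, frozenset({1, 2, 3}): 30_000}

def land(s):
    """Land development game; coalitions not listed in LAND are worth 0."""
    return LAND.get(frozenset(s), 0)

print(is_superadditive(garbage, [1, 2, 3, 4]), is_superadditive(land, [1, 2, 3]))
```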

A solution concept should indicate the reward that each player will receive. More formally, let 𝑥 = {𝑥1 , 𝑥2 , … , 𝑥𝑛 } be the
reward vector such that player 𝑖 receives a reward 𝑥𝑖 . The reward vector is called an imputation if it satisfies:
𝑣(𝑁) = ∑ 𝑥𝑖 (sum over 𝑖 = 1, …, 𝑛) (group rationality)
𝑥𝑖 ≥ 𝑣({𝑖}) for each 𝑖 ∈ 𝑁 (individual rationality)
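
The two conditions translate directly into a check. The sketch below uses the drug game with Willie as player 1; the function names is_imputation and drug and the two reward vectors are hypothetical examples chosen for illustration.

```python
def is_imputation(x, v, players):
    """x maps each player to a reward; v maps a frozenset of players to its worth."""
    group_rational = sum(x[i] for i in players) == v(frozenset(players))
    individually_rational = all(x[i] >= v(frozenset({i})) for i in players)
    return group_rational and individually_rational

def drug(s):
    """Drug game (player 1 = Willie): a coalition earns $1 000 000 only if it
    contains Willie and at least one company."""
    return 1_000_000 if 1 in s and len(s) >= 2 else 0

players = [1, 2, 3]
print(is_imputation({1: 600_000, 2: 400_000, 3: 0}, drug, players))  # True
print(is_imputation({1: 600_000, 2: 300_000, 3: 0}, drug, players))  # False: sum differs from v(N)
```
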
14.6 The core of an n-person game
Given an imputation 𝑥 = {𝑥1 , 𝑥2 , … , 𝑥𝑛 }, we say that the imputation 𝑦 = {𝑦1 , 𝑦2 , … , 𝑦𝑛 } dominates imputation x if there
exists a coalition S such that:
∑ 𝑦𝑖 ≤ 𝑣(𝑆) (sum over 𝑖 ∈ 𝑆)
𝑦𝑖 > 𝑥𝑖 for all 𝑖 ∈ 𝑆
Thus, each member of S prefers y over x and the members of S can ensure they receive the amounts 𝑦𝑖 . Hence, the vector x
should not be considered a possible solution.
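
The definition of domination can be tested on the land development game of section 14.5. This is a sketch: the function name dominates and the two imputations x and y are hypothetical examples, not taken from the text.

```python
def dominates(y, x, coalition, v):
    """True if imputation y dominates imputation x through the given coalition."""
    attainable = sum(y[i] for i in coalition) <= v(frozenset(coalition))
    preferred = all(y[i] > x[i] for i in coalition)
    return attainable and preferred

WORTH = {frozenset({1}): 10_000, frozenset({1, 2}): 20_000,
         frozenset({1, 3}): 30_000, frozenset({1, 2, 3}): 30_000}

def land(s):
    """Land development game; coalitions not listed in WORTH are worth 0."""
    return WORTH.get(frozenset(s), 0)

x = {1: 15_000, 2: 5_000, 3: 10_000}   # hypothetical imputation
y = {1: 16_000, 2: 0, 3: 14_000}       # hypothetical imputation
print(dominates(y, x, {1, 3}, land))   # True
print(dominates(x, y, {1, 2}, land))   # False
```

Players 1 and 3 both prefer y and can secure 16 000 + 14 000 = 30 000 = v({1,3}) on their own, so x is dominated through the coalition {1,3}.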
