Problems:
(1) For the maximin, maximax and minimax regret criteria, determine Pizza King’s choice of advertising campaign.
Noble Greek
Pizza King Small Medium Large
Small 6000 5000 2000
Medium 5000 6000 1000
Large 9000 6000 0
Answer:
Noble Greek
Pizza King Small Medium Large
Small 9000 – 6000 = 3000 6000 – 5000 = 1000 0
Medium 9000 – 5000 = 4000 0 2000 – 1000 = 1000
Large 0 0 2000 – 0 = 2000
Maximin: small
Maximax: large
Minimax regret: large
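The three criteria can be checked with a short script. This is a minimal sketch, not part of the original answer: the function names are illustrative, and the reward table is copied from the problem above (rows are Pizza King's campaigns, columns are Noble Greek's campaigns in the order small, medium, large).

```python
# Rewards to Pizza King; each list is ordered by Noble Greek's campaign
# (small, medium, large), as in the problem's table.
rewards = {
    "small":  [6000, 5000, 2000],
    "medium": [5000, 6000, 1000],
    "large":  [9000, 6000, 0],
}

def maximin(table):
    # Choose the action whose worst-case reward is largest.
    return max(table, key=lambda action: min(table[action]))

def maximax(table):
    # Choose the action whose best-case reward is largest.
    return max(table, key=lambda action: max(table[action]))

def regret_table(table):
    # Regret = best reward attainable in that state minus the reward obtained.
    n_states = len(next(iter(table.values())))
    best = [max(row[j] for row in table.values()) for j in range(n_states)]
    return {a: [best[j] - row[j] for j in range(n_states)] for a, row in table.items()}

def minimax_regret(table):
    # Choose the action whose largest regret is smallest.
    regrets = regret_table(table)
    return min(regrets, key=lambda action: max(regrets[action]))

print(maximin(rewards), maximax(rewards), minimax_regret(rewards))
```

The regret table produced by `regret_table` can be compared row by row with the one written out in the answer.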
(2) Sodaco estimates that the annual demand for Chocovan has the following mass function:
- P(D=30 000) = 0,3
- P(D=50 000) = 0,4
- P(D=80 000)= 0,3
Each case of Chocovan sells for $5 and incurs a variable cost of $3. It costs $800 000 to build a plant to produce Chocovan.
Assume that if $1 is received every year, this is equivalent to receiving $10 at the present time.
Considering the reward for each action and state of the world to be in terms of NPV, use each decision criterion to
determine whether Sodaco should build the plant.
Answer:
Demand
Build/not build 30 000 50 000 80 000
Build -200 000 200 000 800 000
Not build 0 0 0
Regret matrix:
Demand
Build/not build 30 000 50 000 80 000
Build 200 000 0 0
Not build 0 200 000 800 000
(4) Pizza King believes that Noble Greek’s price is a random variable D having the following mass function:
- P(D=6) = 0,25
- P(D=8) = 0,5
- P(D=10) = 0,25
If Pizza King charges p(1) and Noble Greek charges p(2), Pizza King will sell 100 + 25[p(2)-p(1)] pizzas. It costs Pizza
King $4 to make a pizza. Pizza King is considering charging $5, $6, $7, $8 or $9 for a pizza. Use each decision criterion
to determine the price that Pizza King should charge.
Answer:
Noble Greek price
Pizza King price 6 8 10
5 125 175 225
6 200 300 400
7 225 375 525
8 200 400 600
9 125 375 625
Maximin: 7
Maximax: 9
Minimax regret: 8
Expected value criterion: 8
- $5: (0,25x125) + (0,5x175) + (0,25x225) = 175
- $6: (0,25x200) + (0,5x300) + (0,25x400) = 300
- $7: (0,25x225) + (0,5x375) + (0,25x525) = 375
- $8: (0,25x200) + (0,5x400) + (0,25x600) = 400
- $9: (0,25x125) + (0,5x375) + (0,25x625) = 375
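The profit table and the expected values can be regenerated from the demand formula. This is a minimal sketch; the names `profit`, `table` and `expected` are illustrative, and exact fractions are used so that the probabilities 0,25 and 0,5 introduce no rounding error.

```python
from fractions import Fraction

UNIT_COST = 4                      # cost of making one pizza
prices = [5, 6, 7, 8, 9]           # prices Pizza King is considering
# Mass function of Noble Greek's price D
noble_greek = {6: Fraction(1, 4), 8: Fraction(1, 2), 10: Fraction(1, 4)}

def profit(p1, p2):
    # Margin per pizza times the number of pizzas sold, 100 + 25(p2 - p1).
    return (p1 - UNIT_COST) * (100 + 25 * (p2 - p1))

# One row per Pizza King price, columns ordered by Noble Greek's price (6, 8, 10).
table = {p1: [profit(p1, p2) for p2 in noble_greek] for p1 in prices}

# Expected value criterion: weight each column by its probability.
expected = {
    p1: sum(prob * profit(p1, p2) for p2, prob in noble_greek.items())
    for p1 in prices
}
best_price = max(expected, key=expected.get)

for p1 in prices:
    print(p1, table[p1], expected[p1])
print("expected value criterion:", best_price)
```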
(5) Alden believes that Forbes’s bid is a random variable B with the following mass function:
- P(B=6000) = 0,4
- P(B=8000) = 0,3
- P(B=11000) = 0,3
It will cost Alden $6000 to complete the project. Use each of the decision criteria to determine Alden’s bid.
Assume that in case of a tie, Alden wins the bidding.
At first sight Alden may bid any price P for the project; a moment of reflection shows, however,
that some prices are dominated by other prices. For example, it makes no sense to choose P = 7000;
this choice is dominated by P = 8000. (Why?)
The only relevant choices are P = 6000, 8000 or 11 000.
Answer:
Forbes’s bid
Alden’s bid 6000 8000 11 000
6000 0 0 0
8000 0 2000 2000
11 000 0 0 5000
Forbes’s bid
Alden’s bid 6000 8000 11 000
6000 0 2000 5000
8000 0 0 3000
11 000 0 2000 0
Maximin: / (every bid has a worst-case profit of 0, so this criterion does not single out a bid)
Maximax: 11 000
Minimax regret: 11 000
Expected value criterion: 11 000
- 6000: 0
- 8000: 1200
- 11 000: 1500
13.2 Utility theory
𝐿1 𝑝𝐿2 = the person prefers 𝐿1 to 𝐿2 .
𝐿1 𝑖𝐿2 = the person is indifferent between choosing 𝐿1 and 𝐿2 . 𝐿1 and 𝐿2 are equivalent lotteries.
3. For a given lottery L that yields reward 𝑟𝑖 with probability 𝑝𝑖 , define the expected utility of the lottery L by:
𝐸(𝑈 𝑓𝑜𝑟 𝐿) = ∑ 𝑝𝑖 𝑢(𝑟𝑖 ) (sum over 𝑖 = 1, …, 𝑛)
4. In choosing between lotteries, we simply choose the lottery with the largest expected utility.
Example:
Suppose we ask to rank the following lotteries:
0,5 $0
0,98 $500
1. $30 000 and $ -10 000
2. Suppose that for 𝑟1 = $10 000, you are indifferent between:
3. In our example:
𝐸(𝑈 𝑓𝑜𝑟 𝐿1 ) = 1 x 0,9 = 0,9
Answer:
a) (1 x 240) + (0,75 x -1000) + (0,25 x 0) < (0,25 x 1000) + (0,75 x 0) + (1 x -750) or -510 < -500
You prefer 2+3 over 1+4.
b) Framing: people often set their utility function from the standpoint of a frame from which they view the
current situation. Most people’s utility functions treat a loss of a given value as being more important
than a gain of an identical value. They exhibit risk-averse behavior when the outcomes are expressed as
gains and risk-seeking behavior when the outcomes are expressed as losses.
Solution:
Decision fork: represents a point in time when Colaco has to make a decision. Each branch emanating from a decision fork
represents a possible decision. For example, Colaco must determine whether or not to test market Chocola:
Event fork: is drawn when outside forces determine which of several random events will occur. Each branch represents a
possible outcome, and the number on each branch represents the probability that the event will occur. For example, if
Colaco decides to test market Chocola, the company faces the following event fork when observing the results of the test
market study:
Terminal branch: no forks emanate from the branch. For example: the branch indicating National success.
To determine the decisions that will maximize Colaco’s expected final asset position, we work backward from right to left.
At each event fork, we calculate the expected final asset position and enter it in the event fork. At each decision fork, we
denote by II (Excel: >>>) the decision that maximizes the expected final asset position and enter the expected final asset
position associated with that decision in the decision fork. We continue working backward in this fashion until we reach the
beginning of the tree. Then the optimal sequence of decisions can be obtained by following the II (or >>>).
We begin by determining the expected final asset positions for the following 3 event forks:
(1) market nationally after local success: (0,85 x 420 000) + (0,15 x 20 000) = 360 000
(2) market nationally after local failure: (0,10 x 420 000) + (0,90 x 20 000) = 60 000
(3) market nationally after don’t test market: (0,55 x 450 000) + (0,45 x 50 000) = 270 000
We then evaluate 3 decision forks:
(1) decision after Local success: 360 000 > 120 000 so we enter an expected final asset position of 360 000
(2) decision after Local failure: 120 000 > 60 000 so we enter an expected final asset position of 120 000
(3) decision after Don’t test market: 270 000 > 150 000 so we enter an expected final asset position of 270 000
We then evaluate the event fork emanating from the test market decision: (0,6 x 360 000) + (0,4 x 120 000) = 264 000.
All that remains is to determine the correct decision at the decision fork test market versus don’t test market:
270 000 > 264 000 so we enter an expected final asset position of 270 000.
We have now reached the beginning of the tree and have found that Colaco’s optimal decision is:
don’t test – market nationally.
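The backward pass described above can be sketched in a few lines. This is an illustrative encoding, not the notation of the original tree: a terminal branch is an integer asset position, an event fork is a list of (probability, subtree) pairs, and a decision fork is a dictionary from decision label to subtree. The helper names `market`, `rollback` and `best_branch` are assumptions made for this sketch.

```python
from fractions import Fraction as F

def market(p_success, success, failure):
    # Event fork: national success with probability p_success, else national failure.
    return ("event", [(p_success, success), (1 - p_success, failure)])

def rollback(node):
    """Expected final asset position of a node, working backward from the leaves."""
    if isinstance(node, int):                      # terminal branch
        return node
    kind, branches = node
    if kind == "event":                            # probability-weighted average
        return sum(p * rollback(child) for p, child in branches)
    return max(rollback(child) for child in branches.values())   # decision fork

def best_branch(decision_node):
    # Label of the branch that maximizes the expected final asset position.
    _, branches = decision_node
    return max(branches, key=lambda label: rollback(branches[label]))

after_local_success = ("decision", {
    "market nationally": market(F(85, 100), 420_000, 20_000),
    "don't market": 120_000,
})
after_local_failure = ("decision", {
    "market nationally": market(F(10, 100), 420_000, 20_000),
    "don't market": 120_000,
})
test_market = ("event", [(F(6, 10), after_local_success),
                         (F(4, 10), after_local_failure)])
no_test = ("decision", {
    "market nationally": market(F(55, 100), 450_000, 50_000),
    "don't market": 150_000,
})
tree = ("decision", {"test market": test_market, "don't test": no_test})

print(rollback(test_market), rollback(no_test), best_branch(tree))
```

Exact fractions are used for the probabilities so that the comparison 270 000 > 264 000 is not affected by floating-point rounding.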
The 9% arises as follows: once we have chosen to test the market, we always market the product nationally after a local success
(360 000 > 120 000) and never market it after a local failure (120 000 > 60 000). A final asset position of 20 000 therefore
requires a local success followed by a national failure: 0,60 x 0,15 = 0,09.
To illustrate how risk aversion may be incorporated into decision tree analysis, suppose:
- U(450 000) = 1
- U(420 000) = 0,99
- U(150 000) = 0,48
- U(120 000) =0,40
- U(50 000) = 0,19
- U(20 000) = 0
To determine Colaco’s optimal decisions, simply replace each final asset position x, with its utility U(x).
Then at each event fork, compute the expected utility of Colaco’s final asset position, and at each decision fork, choose the
branch having the largest expected utility.
We have found that Colaco’s optimal decision is to begin by test marketing. If a local success is observed, then Colaco
should market Chocola nationally; if a local failure is observed, then Colaco should not market Chocola nationally. This
optimal strategy yields only a 9% chance that Colaco will have a final asset position of 20 000.
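The expected-utility calculation can be sketched as follows. The utilities and probabilities are the ones given in the text; the variable names are illustrative, and the tree is evaluated by hand-written expressions rather than a general routine.

```python
from fractions import Fraction as F

# Utility of each final asset position, as given in the text.
U = {450_000: F(1), 420_000: F(99, 100), 150_000: F(48, 100),
     120_000: F(40, 100), 50_000: F(19, 100), 20_000: F(0)}

def eu_market(p_success, success, failure):
    # Expected utility of the event fork "market nationally".
    return p_success * U[success] + (1 - p_success) * U[failure]

# Decision forks after the test market result (test costs are already in the payoffs).
after_success = max(eu_market(F(85, 100), 420_000, 20_000), U[120_000])
after_failure = max(eu_market(F(10, 100), 420_000, 20_000), U[120_000])

eu_test = F(6, 10) * after_success + F(4, 10) * after_failure
eu_no_test = max(eu_market(F(55, 100), 450_000, 50_000), U[150_000])

# Ending at 20 000 requires a local success followed by a national failure.
p_end_at_20000 = F(6, 10) * F(15, 100)

print(float(eu_test), float(eu_no_test), float(p_end_at_20000))
```

Under these utilities test marketing has the larger expected utility, which reverses the expected-value recommendation.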
Suppose U(226 000) = 0,665. This means the company considers the current situation equivalent to a certain asset position
of $226 000. Thus, if somebody offered to pay more than 226 000 – 150 000 = $76 000 to buy the rights to Chocola, Colaco
should take the offer. This is because receiving more than $76 000 would bring Colaco’s asset position to more than
150 000 + 76 000 = $226 000, and this situation has a higher expected utility than 0,665.
For the Colaco example, we find EVWPI = 315 000, then EVPI = 315 000 – 270 000 = 45 000. Thus, a perfect test marketing
study would be worth $45 000. EVPI is a useful upper bound on the value of sample or test market information.
Example:
An art dealer’s client is willing to buy the painting Sunplant at $50 000. The dealer can buy the painting today for $40 000 or
can wait a day and buy the painting tomorrow for $30 000. The dealer may also wait another day and buy the painting for
$26 000. At the end of the third day, the painting will no longer be for sale. Each day, there is a 60% probability that the
painting will be sold. What strategy maximizes the dealer’s profit?
Solution:
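A minimal sketch of the backward induction for this example. It rests on one assumption about the wording: "sold" is read as sold to another buyer, so that a painting available today is still available the next day with probability 0,4. The function name `value` is illustrative.

```python
from fractions import Fraction

SALE_PRICE = 50_000
purchase_price = [40_000, 30_000, 26_000]        # today, tomorrow, the day after
P_STILL_AVAILABLE = 1 - Fraction(6, 10)          # assumption: 60% chance it is gone each day waited

def value(day):
    """(expected profit, action) when the painting is still for sale on `day` (0-based)."""
    if day == len(purchase_price):               # after the third day it is no longer for sale
        return Fraction(0), "no sale"
    buy_now = Fraction(SALE_PRICE - purchase_price[day])
    expected_if_wait = P_STILL_AVAILABLE * value(day + 1)[0]
    if buy_now >= expected_if_wait:
        return buy_now, "buy"
    return expected_if_wait, "wait"

for day in range(3):
    print(day + 1, value(day))
```

Under this reading, waiting one day is worth 0,4 x 20 000 = 8000, which is less than the 10 000 earned by buying today.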
Problems:
(4) Nitro is developing a new fertilizer. If Nitro markets the product and it is successful, the company will earn a
$50 000 profit; if it is unsuccessful, the company will lose $35 000. In the past, similar products have been
successful 60% of the time. At a cost of $5000, the effectiveness of the new fertilizer can be tested. If the test
result is favorable, there is an 80% chance that the fertilizer will be successful. If the test is unfavorable, there is
only a 30% chance that the fertilizer will be successful. There is a 60% chance of a favorable test result and a 40%
chance of an unfavorable test result. Determine Nitro’s optimal strategy. Also find EVSI and EVPI.
Answer:
Optimal strategy: don’t test & market the product.
EVSI = EVWSI – EVWOI = 19 800 – 16 000 = 3800
EVWSI = 19 800
EVWOI = 16 000
EVPI = EVWPI – EVWOI = 30 000 – 16 000 = 14 000
EVWPI = 30 000
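The figures in this answer can be reproduced with a short calculation. This is a sketch with illustrative names; as in the answer, EVWSI is computed as if the test were free and the $5000 fee is compared with EVSI afterwards.

```python
from fractions import Fraction as F

WIN, LOSS = 50_000, -35_000
TEST_FEE = 5_000

def ev_market(p_success):
    # Expected profit of marketing the fertilizer.
    return p_success * WIN + (1 - p_success) * LOSS

def best(p_success):
    # Better of marketing and not marketing (not marketing earns 0).
    return max(ev_market(p_success), 0)

evwoi = best(F(6, 10))                                        # no information
evwsi = F(6, 10) * best(F(8, 10)) + F(4, 10) * best(F(3, 10))  # free sample information
evwpi = F(6, 10) * max(WIN, 0) + F(4, 10) * max(LOSS, 0)       # perfect information

evsi = evwsi - evwoi
evpi = evwpi - evwoi
test_worthwhile = evsi > TEST_FEE

print(evwoi, evwsi, evwpi, evsi, evpi, test_worthwhile)
```

Because EVSI is below the fee, the test is not worth buying, which agrees with the stated optimal strategy.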
(16) You have just been chosen to appear on Hoosier Millionaire. The rules are as follows: there are 4 hidden cards.
One says ‘stop’ and the other three have dollar amounts of $150 000, $200 000 and $1 000 000. You get to
choose a card. If the card says ‘stop’, you win no money. At any time you may quit and keep the largest amount of
money that has appeared on any card you have chosen, or continue. If you continue and choose the stop card,
you win no money.
a) If your goal is to maximize your expected payoff, what strategy would you follow?
b) My utility function for an increase in cash satisfies:
- U(0) = 0
- U(40 000) = 0,25
- U(120 000) = 0,50
- U(400 000) = 0,75
- U(1 000 000) = 1
After drawing a curve through these points, determine a strategy that maximizes my expected utility.
→ Solution: see the PowerPoint slides
13.5 Bayes’ rule and decision trees
Prior probabilities = estimates of the probabilities of each state of the world. 𝑝(𝑠)
𝑝(𝑁𝑆) = 0,55
𝑝(𝑁𝐹) = 0,45
Posterior probabilities = probabilities that give new values for the probability of each state of the world. 𝑝(𝑠|𝑜)
In the Colaco example, the posterior probabilities were given to be:
𝑝(𝑁𝑆|𝐿𝑆) = 0,85
𝑝(𝑁𝐹|𝐿𝑆) = 0,15
𝑝(𝑁𝑆|𝐿𝐹) = 0,10
𝑝(𝑁𝐹|𝐿𝐹) = 0,90
Likelihoods = the probabilities of observing each experimental outcome given each state of the world. 𝑝(𝑜|𝑠)
𝑝(𝐿𝑆|𝑁𝑆) = 51/55
𝑝(𝐿𝐹|𝑁𝑆) = 4/55
𝑝(𝐿𝑆|𝑁𝐹) = 9/45
𝑝(𝐿𝐹|𝑁𝐹) = 36/45
With the help of Bayes’ rule we can use the prior probabilities and likelihoods to determine the needed posterior
probabilities. In summary, to find posterior probabilities, we go through the following three-step process:
(1) Determine the joint probabilities of the form 𝑝(𝑠 ∩ 𝑜) by multiplying the prior probability 𝑝(𝑠) times the
likelihood 𝑝(𝑜|𝑠).
(2) Determine the probabilities of each experimental outcome 𝑝(𝑜) by summing up all joint probabilities of the form
𝑝(𝑠 ∩ 𝑜).
(3) Determine each posterior probability 𝑝(𝑠|𝑜) by dividing the joint probability 𝑝(𝑠 ∩ 𝑜) by the probability 𝑝(𝑜) of the
experimental outcome 𝑜.
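The three-step process can be sketched as a small function. The function name `posteriors` and the dictionary layout are assumptions made for illustration; the priors and likelihoods are the Colaco values listed above, so the output can be checked against the given posterior probabilities.

```python
from fractions import Fraction as F

def posteriors(prior, likelihood):
    """prior[s] = p(s); likelihood[s][o] = p(o|s). Returns (p(o), p(s|o))."""
    # Step 1: joint probabilities p(s and o) = p(s) * p(o|s)
    joint = {(s, o): prior[s] * p
             for s, row in likelihood.items() for o, p in row.items()}
    outcomes = {o for (_, o) in joint}
    # Step 2: probability of each experimental outcome
    p_outcome = {o: sum(joint[s, o] for s in prior) for o in outcomes}
    # Step 3: posterior p(s|o) = p(s and o) / p(o)
    posterior = {o: {s: joint[s, o] / p_outcome[o] for s in prior} for o in outcomes}
    return p_outcome, posterior

prior = {"NS": F(55, 100), "NF": F(45, 100)}
likelihood = {
    "NS": {"LS": F(51, 55), "LF": F(4, 55)},
    "NF": {"LS": F(9, 45), "LF": F(36, 45)},
}
p_outcome, posterior = posteriors(prior, likelihood)

print(float(p_outcome["LS"]), float(posterior["LS"]["NS"]), float(posterior["LF"]["NF"]))
```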
Example:
FCC manufactures memory chips in lots of 10 chips. From past experience, FCC knows that 80% of all lots contain 10%
defective chips and 20% of all lots contain 50% defective chips. If a good batch of chips is sent on to the next stage of
production, processing costs of $1000 are incurred, and if a bad batch is sent to the next stage of production, processing
costs of $4000 are incurred. FCC also has the alternative of reworking a batch at a cost of $1000. A reworked batch is sure
to be a good batch. Alternatively, for a cost of $100, FCC can test one chip from each batch in an attempt to determine
whether the batch is defective. Determine how FCC can minimize the expected total cost per batch. Also compute EVSI and EVPI.
Solution:
We will multiply costs by -1 and work with maximizing –(total cost). This enables us to use EVSI and EVPI formulas.
There are 2 states of the world:
- G = batch is good
- B = batch is bad
We are given the following prior probabilities:
- p(G) = 0,80
- p(B) = 0,20
FCC has the option of performing an experiment: inspecting one chip per batch. Possible outcomes of this experiment:
- D = defective chip is observed
- ND = non-defective chip is observed
We are given the following likelihoods:
- 𝑝(𝐷|𝐺) = 0,10
- 𝑝(𝑁𝐷|𝐺) = 0,90
- 𝑝(𝐷|𝐵) = 0,50
- 𝑝(𝑁𝐷|𝐵) = 0,50
To complete the decision tree, we need to determine the posterior probabilities.
(1) We begin by computing joint probabilities:
- 𝑝(𝐷 ∩ 𝐺) = 𝑝(𝐺)𝑝(𝐷|𝐺) = 0,8 𝑥 0,1 = 0,08
- 𝑝(𝐷 ∩ 𝐵) = 𝑝(𝐵)𝑝(𝐷|𝐵) = 0,2 𝑥 0,5 = 0,1
- 𝑝(𝑁𝐷 ∩ 𝐺) = 𝑝(𝐺)𝑝(𝑁𝐷|𝐺) = 0,8 𝑥 0,9 = 0,72
- 𝑝(𝑁𝐷 ∩ 𝐵) = 𝑝(𝐵)𝑝(𝑁𝐷|𝐵) = 0,2 𝑥 0,5 = 0,1
(2) We then compute the probability of each experimental outcome:
- 𝑝(𝐷) = 𝑝(𝐷 ∩ 𝐺) + 𝑝(𝐷 ∩ 𝐵) = 0,08 + 0,1 = 0,18
- 𝑝(𝑁𝐷) = 𝑝(𝑁𝐷 ∩ 𝐺) + 𝑝(𝑁𝐷 ∩ 𝐵) = 0,72 + 0,1 = 0,82
(3) Then we use Bayes’ rule to determine the required posterior probabilities:
𝑝(𝐵|𝐷) = 𝑝(𝐷 ∩ 𝐵) / 𝑝(𝐷) = 0,10 / 0,18 = 0,56
𝑝(𝐺|𝐷) = 𝑝(𝐷 ∩ 𝐺) / 𝑝(𝐷) = 0,08 / 0,18 = 0,44
𝑝(𝐵|𝑁𝐷) = 𝑝(𝑁𝐷 ∩ 𝐵) / 𝑝(𝑁𝐷) = 0,10 / 0,82 = 0,12
𝑝(𝐺|𝑁𝐷) = 𝑝(𝑁𝐷 ∩ 𝐺) / 𝑝(𝑁𝐷) = 0,72 / 0,82 = 0,88
These posterior probabilities are used to complete the tree. Straightforward computations show that the optimal strategy is
to test a chip. If the chip is defective, rework the batch. If the chip is not defective, send the batch on. An expected cost of
$1580 is incurred.
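The "straightforward computations" can be sketched as follows. This is an illustrative calculation in terms of costs rather than –(total cost), and it assumes that a reworked batch costs $1000 to rework plus $1000 to process as a good batch, which is the reading that reproduces the $1580 figure. The EVSI and EVPI values it prints are outputs of this sketch, not numbers quoted in the text.

```python
from fractions import Fraction as F

prior = {"G": F(8, 10), "B": F(2, 10)}
p_defective = {"G": F(1, 10), "B": F(5, 10)}    # likelihoods p(D|state)
SEND_COST = {"G": 1000, "B": 4000}              # processing cost if sent on
REWORK_COST = 1000 + 1000                       # assumption: rework, then process as a good batch
TEST_COST = 100

def best_cost(weights):
    """Cheaper of sending on and reworking, weighted by joint probabilities."""
    total = sum(weights.values())
    send = sum(weights[s] * SEND_COST[s] for s in weights)
    return min(send, total * REWORK_COST)

cost_without_test = best_cost(prior)

# Joint probabilities of each test outcome with each state
joint_D = {s: prior[s] * p_defective[s] for s in prior}
joint_ND = {s: prior[s] * (1 - p_defective[s]) for s in prior}
cost_with_free_test = best_cost(joint_D) + best_cost(joint_ND)
cost_with_test = TEST_COST + cost_with_free_test

cost_perfect_info = sum(prior[s] * min(SEND_COST[s], REWORK_COST) for s in prior)

evsi = cost_without_test - cost_with_free_test
evpi = cost_without_test - cost_perfect_info

print(cost_without_test, cost_with_test, evsi, evpi)
```

Since the expected saving from testing exceeds the $100 test cost in this sketch, testing one chip is worthwhile, consistent with the stated optimal strategy.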
Answer:
The bank can maximize its profits by investigating the customer’s credit record and approve the loan.
EVSI = EVWSI – EVWOI = 54 781 – 53 760 = 1021
EVPI = EVWPI – EVWOI = 55 880 – 53 760 = 2120
(4) The NBS earns an average of $400 000 from a hit show and loses an average of $100 000 on a flop.
Of all shows reviewed by the network, 25% turn out to be hits and 75% turn out to be flops.
For $40 000, a market research firm will give its view about whether the show will be a hit or a flop.
If a show is going to be a hit, there is a 90% chance that the market research firm will predict the show to be a hit.
If a show is going to be a flop, there is an 80% chance that the firm will predict the show to be a flop.
Determine how the network can maximize its expected profits. Also find EVSI & EVPI.
Answer:
The network can maximize its expected profits by hiring the market research firm and airing the show only if the firm predicts a hit.
EVSI = EVWSI – EVWOI = 75 000 – 25 000 = 50 000
EVPI = EVWPI – EVWOI = 100 000 – 25 000 = 75 000
14. Game theory
14.1 Two-person zero-sum and constant-sum games: saddle points
Characteristics of two-person zero-sum games:
- There are 2 players.
- The row player must choose 1 of m strategies. Simultaneously, the column player must choose 1 of n strategies.
- If the row player chooses his 𝑖th strategy and the column player chooses his 𝑗th strategy, then the row player
receives a reward of 𝑎𝑖𝑗 and the column player loses an amount of 𝑎𝑖𝑗 .
Zero-sum: the sum of the rewards to the players is zero.
The row player should choose the row having the largest minimum: max(4,1,5) = 5, so he should choose row 3.
The column player should choose the column having the smallest maximum: min(6,5,10) = 5, so he should choose column 2.
The game matrix we have just analyzed has the property of satisfying the saddle point condition:
𝑚𝑎𝑥(𝑟𝑜𝑤 𝑚𝑖𝑛𝑖𝑚𝑢𝑚) = 𝑚𝑖𝑛(𝑐𝑜𝑙𝑢𝑚𝑛 𝑚𝑎𝑥𝑖𝑚𝑢𝑚)
A saddle point is stable in that neither player has an incentive to move away from it.
Example:
Two networks are vying for an audience of 100 million viewers. The networks must simultaneously announce the type of
show they will air in that time slot. The possible choices for each network and the number of network 1 viewers for each
choice are shown in the following table. For example: if both networks choose a western, the matrix indicates that 35
million people will watch network 1 and 65 million people will watch network 2. Thus we have a two-person constant-sum
game with c = 100. Does this game have a saddle point? What is the value of the game to network 1?
Solution:
max(row minimum) = 45
min(column maximum) = 45
Network 1 choosing a soap and network 2 choosing a western yields a saddle point: neither side will do better if it changes
strategy. Thus, the value of the game to network 1 is 45 million viewers, and the value of the game to network 2 is 55
million viewers.
Problems:
(3) Dora wants to travel from NY to Dallas by the shortest possible route. She may travel over the routes shown in the
following table. Unfortunately, Swiber can block one road leading out of Atlanta and one road leading out of
Nashville. Dora will not know which roads have been blocked until she arrives at Atlanta or Nashville.
Should Dora start toward Atlanta or Nashville? Which routes should Swiber block?
Route Miles
NY – Atlanta 800
NY – Nashville 900
Nashville – ST Louis 400
Nashville – New Orleans 200
Atlanta – ST Louis 300
Atlanta – New Orleans 600
ST Louis – Dallas 500
New Orleans – Dallas 300
Solution:
You find the mileages by drawing the road network!
Solution:
Column player’s strategy = Even
Row player’s strategy = Odd 1 2 Row Minimum
1 -1 +1 -1
2 +1 -1 -1
Column Maximum +1 +1
min(column maximum) = + 1
max(row minimum) = -1
This game has no saddle point.
Observe that for any choice of strategies by both players, there is a player who can benefit by changing strategy.
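The saddle point condition max(row minimum) = min(column maximum) is easy to test mechanically. This is a minimal sketch with an illustrative function name. The odds-and-evens matrix is the one shown above; the 3 x 3 matrix is a hypothetical example chosen only so that its row minima are (4, 1, 5) and its column maxima are (6, 5, 10), the values quoted in the earlier discussion.

```python
def saddle_point(matrix):
    """Return (row index, column index, value) of a saddle point, or None.

    matrix[i][j] is the reward to the row player; indices are 0-based.
    """
    row_min = [min(row) for row in matrix]
    col_max = [max(col) for col in zip(*matrix)]
    maximin, minimax = max(row_min), min(col_max)
    if maximin != minimax:
        return None                      # saddle point condition fails
    return row_min.index(maximin), col_max.index(minimax), maximin

odds_and_evens = [[-1, +1],
                  [+1, -1]]

# Hypothetical matrix with row minima (4, 1, 5) and column maxima (6, 5, 10).
illustrative = [[4, 4, 10],
                [2, 3, 1],
                [6, 5, 7]]

print(saddle_point(odds_and_evens))
print(saddle_point(illustrative))
```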
Example:
A fair coin is tossed, and the result is shown to player 1. Player 1 must then decide whether to pass or to bet. If player 1
passes, then he must pay player 2 $1. If player 1 bets, then player 2 may either fold or call the bet. If player 2 folds, then he
pays player 1 $1. If player 2 calls and the coin comes up heads, then she pays player 1 $2; if player 2 calls and the coin
comes up tails, then player 1 must pay her $2. Formulate this as a two-person zero-sum game. Then graphically determine
the value of the game and each player’s optimal strategy.
Solution:
Player 1’s strategy may be represented as follows:
- PP: pass on heads and pass on tails ;
- PB: pass on heads and bet on tails ;
- BP: bet on heads and pass on tails ;
- BB: bet on heads and bet on tails.
Player 2 simply has two strategies: call & fold.
For each choice of strategies, player 1’s expected reward is shown in the following table:
This example may be described as a two-person zero-sum game represented by the following reward matrix:
To determine the optimal strategy for player 1, observe that for any value of 𝑥1 , her expected reward against calling is:
0,5 𝑥1 + 0 (1 − 𝑥1 ) = 0,5 𝑥1
Against folding, player 1’s expected reward is:
0 𝑥1 + 1 (1 − 𝑥1 ) = 1 − 𝑥1
Thus, to maximize her expected reward, player 1 should choose the value of 𝑥1 which solves 0,5 𝑥1 = 1 − 𝑥1 or 𝑥1 = 2/3
(and 𝑥2 = 1/3).
How should player 2 choose 𝑦1 ? For a given value of 𝑦1 , suppose player 1 chooses BP. Then her expected reward is:
0,5 𝑦1 + 0 (1 − 𝑦1 ) = 0,5 𝑦1
For a given value of 𝑦1 , suppose player 1 chooses BB. Then her expected reward is:
0 𝑦1 + 1 (1 − 𝑦1 ) = 1 − 𝑦1
Thus, player 2 should choose the value of 𝑦1 which solves 0,5 𝑦1 = 1 − 𝑦1 or 𝑦1 = 2/3 (and 𝑦2 = 1/3).
You should check that no matter what player 1 does, player 2’s mixed strategy (2/3, 1/3) ensures that player 1 earns an
expected reward of 1/3 = (2/3 x 0,5 + 1/3 x 0) or (2/3 x 0 + 1/3 x 1).
In summary, the value of the game is 1/3 to player 1; the optimal mixed strategy for player 1 is (2/3, 1/3) and the optimal
strategy for player 2 is also (2/3, 1/3).
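The check suggested above can be carried out by rebuilding the reward matrix from the rules of the game. This is a sketch with illustrative names: a plan such as "BP" lists player 1's action on heads and on tails, and rewards are expected values over the fair coin. Against the two plans that never appear in the mixed strategy (PP and PB), player 1 does worse than 1/3, so 1/3 is the most player 1 can expect against (2/3, 1/3).

```python
from fractions import Fraction as F

def payoff(p1_plan, p2_action):
    """Expected reward to player 1; p1_plan = action on heads + action on tails."""
    total = F(0)
    for coin, action in zip(("H", "T"), p1_plan):
        if action == "P":                 # player 1 passes and pays $1
            reward = -1
        elif p2_action == "fold":         # player 1 bets, player 2 folds
            reward = 1
        else:                             # player 1 bets, player 2 calls
            reward = 2 if coin == "H" else -2
        total += F(1, 2) * reward
    return total

plans = ["PP", "PB", "BP", "BB"]
actions = ("call", "fold")
matrix = {plan: {a: payoff(plan, a) for a in actions} for plan in plans}

y = {"call": F(2, 3), "fold": F(1, 3)}                       # player 2's mixed strategy
x = {"PP": F(0), "PB": F(0), "BP": F(2, 3), "BB": F(1, 3)}   # player 1's mixed strategy

# Player 1's expected reward for each pure plan against y
against_y = {plan: sum(y[a] * matrix[plan][a] for a in actions) for plan in plans}
# Player 1's expected reward from x against each pure action of player 2
against_x = {a: sum(x[plan] * matrix[plan][a] for plan in plans) for a in actions}

print(against_y)
print(against_x)
```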
Problems:
(2) Player 1 writes an integer between 1 and 20 on a slip of paper. Without showing this slip of paper to player 2,
player 1 tells player 2 what he has written. Player 1 may lie or tell the truth. If caught in a lie, player 1 must pay
player 2 $10; if falsely accused of lying, player 1 collects $5 from player 2. If player 1 tells the truth and player 2
guesses that player 1 has told the truth, then player 1 must pay $1 to player 2. If player 1 lies and player 2 does
not guess that player 1 has lied, player 1 wins $5 from player 2. Determine the value of this game and each
player’s optimal strategy.
Answer:
Let:
- 𝑥1 = the probability that player 1 lies
- 𝑥2 = 1 − 𝑥1 = the probability that player 1 tells the truth
What is player 1’s expected reward?
If player 2 accuses him of lying, player 1’s expected reward is:
−10 𝑥1 + 5(1 − 𝑥1 ) = −15 𝑥1 + 5
If player 2 believes that he is telling the truth, player 1’s expected reward is:
5 𝑥1 − (1 − 𝑥1 ) = 6 𝑥1 − 1
The LP looks as follows:
max v
s.t. 𝑣 ≤ −15 𝑥1 + 5
𝑣 ≤ 6 𝑥1 − 1
0 ≤ 𝑥1 ≤ 1
Both constraints are binding at the optimum: −15 𝑥1 + 5 = 6 𝑥1 − 1 gives 𝑥1 = 2/7 and 𝑣 = 5/7.
(8) KUL is about to play UA for the tennis championship. The KUL team has 2 players (A & B) and the UA team has 3
players (X, Y & Z). The following facts are known about the players’ relative abilities:
- X will always beat B ;
- Y will always beat A ;
- A will always beat Z.
In any other match, each player has a 50% chance of winning. Before the game, the KUL coach must determine who
will play first singles and who will play second singles. The UA coach must also determine who will play first and
second singles. Assume that each coach wants to maximize the expected number of singles matches won.
Use game theory to determine optimal strategies for each coach and the value of the game to each team.
Answer:
Defense
Offense Run Pass
Run 1 8
Pass 10 0
a) Use problem 9 to show that the offense should run 10/17 of the time.
b) Suppose that the effectiveness of a pass against the run defense improves. Use the results of Problem 9
to show that the offense should pass less. Can you give an explanation for this strange phenomenon?
Answer:
a) From the previous problem it follows that the offense should choose run with probability 10/17, and
hence pass with probability 7/17.
b) If the entry 10 in this payoff matrix becomes larger, for example 100, these probabilities change to
100/107 and 7/107. So although your performance improves when you pass against the run defense, you
choose the option pass much less often. Why? Because player 2 sees this too, and therefore chooses the run
defense much less often. If he chooses the pass defense, it is better for you to run.
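The probabilities 10/17 and 100/107 can be reproduced with the usual closed form for a 2 x 2 game without a saddle point. Problem 9 itself is not reproduced in these notes, so the formula below is an assumption about what that problem establishes; it follows from choosing 𝑥1 so that the row player's expected reward is the same against both columns. The function name is illustrative.

```python
from fractions import Fraction

def row_mix_2x2(a, b, c, d):
    """Row player's optimal probability of row 1, and the game value, for

        [[a, b],
         [c, d]]

    assuming the game has no saddle point (so the denominator is nonzero).
    Derived from a*x1 + c*(1 - x1) = b*x1 + d*(1 - x1).
    """
    denom = a - b - c + d
    x1 = Fraction(d - c, denom)
    value = Fraction(a * d - b * c, denom)
    return x1, value

# Offense (rows: run, pass) against defense (columns: run, pass)
print(row_mix_2x2(1, 8, 10, 0))     # original matrix
print(row_mix_2x2(1, 8, 100, 0))    # passing against the run defense improves to 100
```

Raising the entry from 10 to 100 moves the probability of running from 10/17 to 100/107, so the offense passes less often, as argued in b).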
(11) Use the idea of dominated strategies to determine optimal strategies for the following reward matrix:
-5 -10 -1 -10 2 -1
-1 2 -10 7 -5 20
2 7 -5 -10 -10 7
7 20 -1 -1 -1 2
20 7 -10 7 -1 -10
Column player chooses Row player’s expected reward if row player chooses (𝑥1 , 𝑥2 , 𝑥3 ),
Stone 𝑥2 − 𝑥3
Paper −𝑥1 + 𝑥3
Scissors 𝑥1 − 𝑥2
Suppose the row player chooses the mixed strategy (𝑥1 , 𝑥2 , 𝑥3 ). By the basic assumption, the column player will choose a
strategy that makes the row player’s expected reward equal to min(𝑥2 − 𝑥3 ; −𝑥1 + 𝑥3 ; 𝑥1 − 𝑥2 ). Then the row player
should choose (𝑥1 , 𝑥2 , 𝑥3 ) to make min(𝑥2 − 𝑥3 ; −𝑥1 + 𝑥3 ; 𝑥1 − 𝑥2 ) as large as possible.
The row player’s optimal strategy can be found by solving the following LP:
max z=v
s.t. 𝑣 ≤ 𝑥2 − 𝑥3
𝑣 ≤ −𝑥1 + 𝑥3
𝑣 ≤ 𝑥1 − 𝑥2
𝑥1 + 𝑥2 + 𝑥3 = 1
𝑥1 , 𝑥2 , 𝑥3 ≥ 0
Note that there is a constraint for each of the column player’s strategies. The value of v in the optimal solution is the row
player’s floor, because no matter what strategy is chosen by the column player, the row player is sure to receive an
expected reward of at least v.
Row player chooses Row player’s expected reward if column player chooses (𝑦1 , 𝑦2 , 𝑦3 ),
Stone −𝑦2 + 𝑦3
Paper 𝑦1 − 𝑦3
Scissors −𝑦1 + 𝑦2
Because the row player is assumed to know (𝑦1 , 𝑦2 , 𝑦3 ), she will choose a strategy that yields an
expected reward of max(−𝑦2 + 𝑦3 ; 𝑦1 − 𝑦3 ; −𝑦1 + 𝑦2 ). Thus the column player should choose (𝑦1 , 𝑦2 , 𝑦3 ) to make
max(−𝑦2 + 𝑦3 ; 𝑦1 − 𝑦3 ; −𝑦1 + 𝑦2 ) as small as possible.
The column player may find his optimal strategy by solving the following LP:
min z=w
s.t. 𝑤 ≥ −𝑦2 + 𝑦3
𝑤 ≥ 𝑦1 − 𝑦3
𝑤 ≥ −𝑦1 + 𝑦2
𝑦1 + 𝑦2 + 𝑦3 = 1
𝑦1 , 𝑦2 , 𝑦3 ≥ 0
Observe that the LP contains a constraint corresponding to each of the row player’s strategies. The value of w in the
optimal solution is the column player’s ceiling on his expected losses, because by choosing a mixed strategy, the column
player can ensure that his expected losses will be at most w.
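Without calling an LP solver, a candidate pair of mixed strategies can be checked against both LPs: compute the row player's floor v and the column player's ceiling w, and compare them. This is a minimal sketch; the function names are illustrative, and the candidate (1/3, 1/3, 1/3) is a guess being verified, not a result taken from the text. If a feasible floor equals a feasible ceiling, both strategies are optimal, since a floor can never exceed a ceiling.

```python
from fractions import Fraction as F

# Reward to the row player; rows and columns are ordered stone, paper, scissors.
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

def floor_for_row(x):
    # Largest v satisfying every row-player constraint: v <= sum_i x_i * a_ij.
    return min(sum(x[i] * A[i][j] for i in range(3)) for j in range(3))

def ceiling_for_column(y):
    # Smallest w satisfying every column-player constraint: w >= sum_j a_ij * y_j.
    return max(sum(A[i][j] * y[j] for j in range(3)) for i in range(3))

third = [F(1, 3)] * 3
lopsided = [F(1, 2), F(1, 2), F(0)]     # a non-optimal mix, for comparison

print(floor_for_row(third), ceiling_for_column(third))
print(floor_for_row(lopsided))
```

For the uniform mix the floor and the ceiling are both 0, so the value of the game is 0; the lopsided mix guarantees only -1/2.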
Problems:
(1) A soldier can hide in one of 5 foxholes. A gunner has a single shot and may fire at any of the four spots A, B, C, D.
A shot will kill a soldier if the soldier is in a foxhole adjacent to the spot where the shot was fired. (A shot fired in
B, will kill the soldier if he is in foxhole 2 or 3.) Suppose the gunner receives a reward of 1 if the soldier is killed
and a reward of 0 if the soldier survives the shot.
c) We are given that an optimal strategy for the soldier is to hide 1/3 of the time in foxholes 1,3 and 5. We
are also told that for the gunner, an optimal strategy is to shoot 1/3 of the time at A, 1/3 of the time at
D and 1/3 of the time at B or C. Determine the value of the game to the gunner.
d) Suppose the soldier chooses the following non-optimal strategy: hide 1/2 of the time in 1, 1/4 of the
time in 3 and 1/4 of the time in 5. Find a strategy for the gunner that ensures that his expected reward
will exceed the value of the game.
e) Write down each player’s LP and verify that the strategies given in c) are optimal strategies.
1 A 2 B 3 C 4 D 5
Solution:
c) Matrix:
1 (1/3) 3 (1/3) 5 (1/3)
A (1/3) 1 0 0
BC (1/3) 0 1 0
D (1/3) 0 0 1
d) Matrix:
1 (1/2) 3 (1/4) 5 (1/4)
A 1 0 0
BC 0 1 0
D 0 0 1
(1, 0, 0) gives the gunner an expected reward of: (1 x 1/2) + … = 1/2 > 1/3
e) Gunner’s LP:
max v
s.t. 𝑣 ≤ 𝑥1
𝑣 ≤ 𝑥2
𝑣 ≤ 𝑥3
𝑥1 + 𝑥2 + 𝑥3 = 1
𝑥1 , 𝑥2 , 𝑥3 ≥ 0
Soldier’s LP:
min w
s.t. 𝑤 ≥ 𝑦1
𝑤 ≥ 𝑦2
𝑤 ≥ 𝑦3
𝑦1 + 𝑦2 + 𝑦3 = 1
𝑦1 , 𝑦2 , 𝑦3 ≥ 0
Each of these solutions is feasible and their objective values coincide. Hence, weak duality
implies that these solutions are optimal.
Nash equilibrium point: a point at which neither player can benefit from a unilateral change in strategy, given the strategy the
other player is expected to follow. (-5,-5)
More formally, a Prisoner’s Dilemma game may be described as in the following table:
Player 2
Player 1 NC C
NC (P,P) (T,S)
C (S,T) (R,R)
where:
- NC = non-cooperative action
- C = cooperative action
- P = punishment for not cooperating
- S = payoff to person who is double-crossed
- R = reward for cooperating if both players cooperate
- T = temptation for double-crossing opponent
For a game to represent a Prisoner’s Dilemma, we require that: T > R > P > S.
(P,P) is an equilibrium point if P > S.
For (R,R) not to be an equilibrium point, we require T > R.
Reward matrix:
HD chef
HD King $10 $6
$10 (2,2) (9,-1)
$6 (-1,9) (6,6)
[ (240/10) – (10+10) ] / 2 = 2
(2,2) is an equilibrium point. Although both restaurants are better off at (6,6) than at (2,2), (6,6) is unstable because either
restaurant may gain by changing its strategy.
For both (5,-5) and (-5,5) neither player can gain by a unilateral change in strategy. (5,-5) and (-5,5) are equilibrium points.
For this game, the reader should verify that there is no equilibrium in pure strategies and also that each player’s choice of
the mixed strategy (0,5 ; 0,5) is an equilibrium because neither player can benefit from a unilateral change in strategy.
Problems:
(4) Given that each player’s goal is to maximize her expected reward, show that each player’s choice of the mixed
strategy (0,5 ; 0,5) is an equilibrium point.
Game:
Player 2
Player 1 Strategy 1 Strategy 2
Strategy 1 (2,-1) (-2,1)
Strategy 2 (-2,1) (2,-1)
Solution:
Is (0,5 ; 0,5) an equilibrium for each of the players?
Consider the reward to the column player:
Column 1: −𝑥1 + 𝑥2 = 1 − 2𝑥1
Column 2: 𝑥1 − 𝑥2 = 2𝑥1 − 1
So if 𝑥1 ≠ 0,5, player 2 will prefer one particular column and always play it. Then there is
no equilibrium, because given that choice by player 2, player 1 can improve.
Consider any two subsets A and B of the set of players such that A and B have no players in common. Then for each of our examples, the
characteristic function must satisfy the following inequality:
𝑣(𝐴 ∪ 𝐵) ≥ 𝑣(𝐴) + 𝑣(𝐵)
This property of the characteristic function is called superadditivity.
A solution concept should indicate the reward that each player will receive. More formally, let 𝑥 = {𝑥1 , 𝑥2 , … , 𝑥𝑛 } be the
reward vector such that player 𝑖 receives a reward 𝑥𝑖 . The reward vector is called an imputation if it satisfies:
𝑣(𝑁) = ∑ 𝑥𝑖 (sum over 𝑖 = 1, …, 𝑛)
𝑥𝑖 ≥ 𝑣({𝑖})
14.6 The core of an n-person game
Given an imputation 𝑥 = {𝑥1 , 𝑥2 , … , 𝑥𝑛 }, we say that the imputation 𝑦 = {𝑦1 , 𝑦2 , … , 𝑦𝑛 } dominates imputation x if there
exists a coalition S such that:
∑ 𝑦𝑖 ≤ 𝑣(𝑆) (sum over 𝑖 ∈ 𝑆)
𝑦𝑖 > 𝑥𝑖 for each 𝑖 ∈ 𝑆
Thus, each member of S prefers y over x and the members of S can ensure they receive the amounts 𝑦𝑖 . Hence, the vector x
should not be considered a possible solution.
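The two domination conditions can be tested by enumerating coalitions. This is a minimal sketch with illustrative function names. The three-player characteristic function and the two imputations below are hypothetical values made up for illustration (every single player is worth 0, every pair 60, the grand coalition 90); they do not come from the text.

```python
from itertools import combinations

def is_imputation(x, v):
    # Group rationality: rewards sum to v(N); individual rationality: x_i >= v({i}).
    n = len(x)
    grand = frozenset(range(n))
    return sum(x) == v[grand] and all(x[i] >= v[frozenset([i])] for i in range(n))

def dominating_coalitions(y, x, v):
    """Coalitions S through which imputation y dominates imputation x."""
    n = len(x)
    found = []
    for size in range(1, n + 1):
        for S in combinations(range(n), size):
            affordable = sum(y[i] for i in S) <= v[frozenset(S)]   # S can secure y
            preferred = all(y[i] > x[i] for i in S)                # every member gains
            if affordable and preferred:
                found.append(S)
    return found

# Hypothetical characteristic function for players 0, 1, 2
v = {frozenset([0]): 0, frozenset([1]): 0, frozenset([2]): 0,
     frozenset([0, 1]): 60, frozenset([0, 2]): 60, frozenset([1, 2]): 60,
     frozenset([0, 1, 2]): 90}

x = (50, 30, 10)    # hypothetical imputation
y = (40, 35, 15)    # hypothetical imputation that players 1 and 2 both prefer

print(is_imputation(x, v), is_imputation(y, v))
print(dominating_coalitions(y, x, v))
```

In this made-up game, y dominates x through the coalition of players 1 and 2: together they can secure 60, which covers the 50 that y gives them, and each of them receives more under y than under x.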