Natural Language Understanding with Commonsense Reasoning: MSc in Artificial Intelligence (MUIA)

E.T.S.
DE INGENIEROS INFORMÁTICOS
UNIVERSIDAD POLITÉCNICA DE MADRID
MASTER'S THESIS
MSc IN ARTIFICIAL INTELLIGENCE (MUIA)
NATURAL LANGUAGE UNDERSTANDING
WITH COMMONSENSE REASONING:
APPLICATION TO THE WINOGRAD SCHEMA CHALLENGE
AUTHOR: ALFONSO LÓPEZ TORRES

SUPERVISOR: MARTÍN MOLINA GONZÁLEZ
JUNE, 2016
This is for my children Carla and Alonso,
and my wife Véronique
Thanks for their unconditional support and patience (also for the coming
adventures…)
Acknowledgments:
I would like to thank Martín for the advice and help I received.
I was very lucky to be your student.
SUMMARY
In 1950, Alan Turing proposed a test to evaluate the degree of human intelligence
that a machine could exhibit. The main idea was really simple: to hold an open
conversation between an evaluator and the machine. If the evaluator was unable to
tell whether the examinee was a person or a machine, the test could be said to have
been passed. Since then, over the last 60 years, numerous proposals have been
presented that have exposed certain weaknesses of the test. Perhaps the most
important is that it focuses on human intelligence, leaving aside other types of
intelligence. The test largely forces the machine to adopt an anthropomorphic,
imitative behavior with the sole aim of passing the test.
To overcome these and other weak points, Hector Levesque proposed in 2011
a new challenge, “The Winograd Schema Challenge”: a simple test based on
question and answer about a sentence describing an everyday situation. Two
participating agents are identified in the sentence, and one of them is referred to
through (generally) a pronoun. The test consists of understanding which of the
agents the pronoun refers to, that is, resolving the anaphora or coreference in
question.
The key to the test lies in the definition of the sentence, which must be such that the
solution to the problem cannot be found by statistical methods. The balance
between the elements associated with the anaphora is such that it can only be
resolved by understanding the concepts described in the sentence as a person would,
that is, through the natural language understanding achieved thanks to knowledge of
the domain or world and by using a certain amount of commonsense reasoning.
This work presents a proposal to solve the natural language understanding
challenge defined by Levesque. To this end, different modes of knowledge
representation have been used, with the idea of allowing, to some extent, the
classical concepts of human commonsense reasoning to be applied through a
software application developed for this purpose.
ABSTRACT
In 1950, Alan Turing proposed a test to measure the degree of human intelligence that
a machine or computer could exhibit. The main idea was really simple: to carry out an
open discussion between an evaluator and the machine. If the evaluator was unable to
discern whether the examinee was a person or a machine, it could be argued that the
test had been passed. Since then, over the last 60 years, numerous proposals have
revealed certain weaknesses in the test. Perhaps the most important one is that it
focuses only on human intelligence, leaving aside other types of intelligence. The test
also requires the machine to behave as an anthropomorphic imitation device with the
sole purpose of passing the test.
In order to overcome these and other weaknesses, Hector Levesque proposed in 2011
a new test, "The Winograd Schema Challenge": a simple experiment based on questions
and answers about a sentence describing an everyday situation. In the sentence there
are two main participants or actors, and there is also a reference to one of them
through (usually) a pronoun. The test consists of finding which agent is referenced by
the pronoun, that is, solving an anaphora or coreference problem.
The key point of the test is how the sentence is defined. It should be written in such a
way that the solution to the problem cannot be found through statistical methods. The
balance between the elements associated with the anaphora makes the problem
resolvable only through a deeper understanding of the concepts described in the
sentence (as a person would achieve). That is, by way of the natural language
understanding achieved with knowledge of the related domain or world and by using
what is known as commonsense reasoning.
This work presents a proposal to solve the natural language understanding challenge
defined by Levesque. For this purpose, different modes of knowledge representation
have been used, with the idea of allowing the application of classical concepts of human
commonsense reasoning. The results have been consolidated in a software application
developed specifically for this purpose.
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
PART I: INTRODUCTION
1. INTRODUCTION
2. MAIN GOALS
PART II: BACKGROUND
3. COMMONSENSE REASONING
3.1. Introduction
3.2. Commonsense Reasoning
3.2.1. Commonsense Knowledge
3.2.2. Reasoning with a Commonsense Knowledge Base
3.3. Qualitative Reasoning in the Spatio-temporal Domain
3.4. Event Calculus and DEC
3.4.1. Event Calculus main concepts
3.4.2. Using Event Calculus for commonsense reasoning
4. NATURAL LANGUAGE UNDERSTANDING (NLU)
4.1. Introduction
4.2. Natural Language Processing: Shallow Understanding
4.3. Natural Language Understanding: A Deeper comprehension
4.4. The Winograd Schema Challenge
4.4.1. Description of the Winograd Schema Challenge
4.4.2. Defining the Winograd Schema Challenge Corpus
4.4.3. Different approaches to the Winograd Schema Challenge
PART III: PROPOSAL
5. PROBLEM APPROACH
5.1. Introduction
5.2. Solving the WSC with a Model-Based System
6. ADDRESSING THE WSC WITH A MODEL-BASED SYSTEM
6.1. Introduction
6.2. Relevant Information for the understanding process
6.3. Domain, Agents and Parameters extraction
6.3.1. Agent Selection from a Schema answers
6.3.2. Domain Selection by detecting the main actions of the sentence
6.3.3. Selection of relevant Information about the Agents
6.4. Model description by way of the Event Calculus
6.4.1. General predicates
6.4.2. Narrative predicates
6.5. Model generation by using Domain, Agents and Parameters
6.5.1. Model Template Selection from the Domain
6.5.2. Model Setting and Model Parameters
6.6. Solving the WSC by way of a model
6.6.1. The Event Calculus Interpreter
6.6.2. Answer Selection and possible errors in the process
6.7. Model-based WSC Solver architecture
6.8. Reducing the WSC problem to the Spatio-temporal domain
7. IMPLEMENTATION
7.1. Introduction
7.2. External Applications and software
7.2.1. The Stanford CoreNLP Suite
7.2.2. The Discrete Event Calculus Reasoner Program
7.3. Domain Knowledge and Model Databases
7.3.1. The Domain Knowledge Database
7.3.2. The Domain Model Database
7.4. Model-based WSC Solver implementation
7.4.1. Natural Language Processing of the Winograd Schema
7.4.2. Implementing the Relevant Information Extraction
7.4.3. DEC Model Generation
7.4.4. DEC Model Processing
7.4.5. DEC Model Evaluation
7.5. System User Interface
8. EVALUATION
8.1. Test Environment
8.2. Results
8.3. Results Analysis
PART IV: CONCLUSIONS
9. FUTURE LINES OF INVESTIGATION
10. CONCLUSIONS
ANNEX A EVENT CALCULUS AXIOMATIZATION
ANNEX B REASONING DEC MODELS
ANNEX C DOMAIN KNOWLEDGE DATABASE
REFERENCES
LIST OF FIGURES
Figure 1. Different approaches to the CSR problem.
Figure 2. General steps in a computational CSR application.
Figure 3. Main parts fully describing a domain with Event Calculus
Figure 4. Basic schema of a Model-Based WSC Solver
Figure 5. Different steps performed by the Model-Based WSC System
Figure 6. Domain, Agents and Parameters in the example.
Figure 7. Information Extraction process
Figure 8. Model Generation process
Figure 9. Model from the template predicates and agent Parameters
Figure 10. Model Evaluation process description
Figure 11. Model-based Winograd Schema Challenge Solver
Figure 12. Coordinate system for Position, Distance and Time
Figure 13. Placing the agents in the coordinate systems
Figure 14. Model-based Winograd Schema External Tools
Figure 15. Stanford CoreNLP Schema (figure from [Manning et al., 2014])
Figure 16. Relevant Information Extraction steps
Figure 17. Relevant Information Extraction steps
Figure 18. System User Interface Main Window
Figure 19. Winograd Selection Frame
Figure 20. DEC Output Narration and answer about models
Figure 21. DEC Output related with the Execution messages
Figure 22. Relevant information found and Final Answer about the models
Figure 23. System User Interface Main Window
LIST OF TABLES
Table 1. Sentence – Agents – Agent states & properties – Rules examples.
Table 2. Schema – Domain Association database entries.
Table 3. Relevant Word list with Domain, Type and Qualifier.
Table 4. Meaning of a Qualifier according the Group.
Table 5. Meaning of a Qualifier according the Group.
Table 6. Variable Description and Default Values.
PART I: INTRODUCTION
1. INTRODUCTION
Natural language understanding is one of the main fields within the natural language
processing domain. It focuses on finding methods and computational models that emulate
the human tasks of reading and comprehending written texts (or speech). This is
not an easy problem, as it requires going further than just parsing and processing
the information syntactically and semantically. We can consider that a machine that
understands a text will be able to answer different questions about it, summarize the
information it contains, or perform some action according to the sentences (searching
for specific information in a database or starting a list of orders in an application).
More generally, the understanding process implies performing some expected action
according to the input, which would demonstrate in some way that the system
comprehends the text. For example, taking the sentence “My mother was born in 1950”,
we can ask “How old is my mother?”. The system must answer “66 years old” (taking
2016 as the current year). Also, if we input the
text “Please, make a backup of all the downloaded data and switch off when finished”,
we would expect a computer that understands this sentence to perform all the described
actions.
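The first example can be made concrete with a minimal sketch. It assumes that the reference year is 2016 and that the birth year appears literally after the words "born in"; the function names and the regular expression are illustrative choices of ours, not part of the system described later in this work.

```python
import re

def extract_birth_year(sentence):
    """Return the four-digit year that follows 'born in', or None if absent."""
    match = re.search(r"born in (\d{4})", sentence)
    return int(match.group(1)) if match else None

def age_in_year(birth_year, reference_year):
    """Age reached during the reference year (the exact birthday is ignored)."""
    return reference_year - birth_year

birth_year = extract_birth_year("My mother was born in 1950")
# Assumption: the question "How old is my mother?" is asked in 2016.
print(age_in_year(birth_year, 2016))  # 66
```

Even this trivial case needs a fact that is not stated in the text (the current year), which is the point the following paragraphs develop.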
In some manner, the understanding process requires an interpretation of the input
sentences, and it usually implies deeper knowledge about the world described in the
text. However, we cannot expect all the information needed in the understanding
process to be explicitly included. For example, from the sentence “It has been a long
journey. I cannot wait to get home”, we can deduce that the narrator is tired and he (or
she) wants to rest as soon as possible. We can change the text slightly as follows: “I
have been promoted. I cannot wait to get home”. Now we can expect that the narrator
has a job, is happy, and wants to share the news with someone living with him
(her). Note that in both sentences we, as readers, do not know the gender of
the narrator, so we cannot expect a machine to resolve this question without additional
information. The question that arises now is how the computer can infer facts not
stated in the text: how a relationship can be established between a “long journey” and
a state of fatigue (when the narrator wants to get back home), or between a professional
promotion and the happiness of the protagonist.
An equivalent problem also appears when the text includes ambiguous concepts, for
example when there is an anaphora to be resolved, or when the meaning of some words
depends on the context. Consider the sentence “Carla was happy with her new
friend because she gave her a gift”. Resolving this anaphora requires some knowledge
about human relations and emotions: everybody knows that when someone receives a
present, we can expect that person to be happy. Again, we cannot expect the system
to find a solution in cases where we would also find difficulties. For example, given the
classical (and frequently quoted) ambiguous sentence “I saw a man on a hill with a
telescope”, without any additional information nobody can tell whether I saw a man
carrying a telescope, I saw him by using one, or there was simply a telescope on the hill
where I saw a man. But there are ambiguities that can be resolved by considering
the context of the story and background knowledge about the related world. Consider
the sentence “I like Walking on the Moon”: many people will remember the song by
The Police and, if they do not know it, they will probably ask about it, but nobody will
think that I walk on our satellite from time to time.
Now, the key question is how we can define and process external knowledge that
will allow a correct interpretation of the initial information. It is also necessary to
specify which knowledge will be useful to achieve it. This question is easy to
answer when the thinking mind is human: we use the knowledge and experience
about the world that we have acquired steadily through many years of learning.
Using this kind of knowledge to understand and fill in the blanks in
stories is usually described as commonsense reasoning, something that most
humans apply very early in their lives to solve ordinary issues without much
difficulty. When a computer tries to solve the same kind of problems, we can expect
that it will need, in some form, equivalent knowledge.
Introducing commonsense reasoning into an automated system or computer
first requires some method to represent all the knowledge about the world. Then, all
this knowledge must be acquired (if possible), or entered into a database or
equivalent, to be accessible to the system. Finally, with all the information ready to be
used, the computer must be prepared to correctly select and apply the commonsense
data according to the problem.
Is it necessary for a computer to have commonsense knowledge in order to reason and
solve a text-understanding problem? It depends on the kind of text. Of course, the
computer will need some knowledge, but often it can be acquired, for example,
by way of automated learning. Taking as an example another classical sentence (from
[Rahman et al., 2012]), “Lions eat zebras because they are predators”, it seems
obvious that solving this anaphora problem is as easy as relating lions and predators.
A system trained with the most frequent adjectives associated with every subject
will find the solution (a suitable Google search could be enough in this case). We
are trying to answer a question with only two possible answers; when the relation
between them is unbalanced, the solution is easier to find.
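The statistical shortcut described above can be sketched as follows. This is only an illustration under our own assumptions: the co-occurrence counts are invented toy values, not measured data, and the function name is hypothetical.

```python
def resolve_by_association(candidates, property_word, cooccurrence):
    """Return the candidate most associated with the property word.

    Returns None when the association is balanced (a tie), since the
    counts then give no reason to prefer one candidate.
    """
    scores = {c: cooccurrence.get((c, property_word), 0) for c in candidates}
    best = max(scores.values())
    winners = [c for c, score in scores.items() if score == best]
    return winners[0] if len(winners) == 1 else None

# Hypothetical counts, invented for illustration only.
toy_counts = {("lions", "predators"): 120, ("zebras", "predators"): 3}

answer = resolve_by_association(["lions", "zebras"], "predators", toy_counts)
print(answer)  # lions
```

When both candidates are equally associated with the word, the function returns None, which is the balanced situation where this kind of shortcut stops being useful.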
In every case, the computer will need a knowledge base. It does not matter whether the
data are available before the question is asked (a database) or are gathered from external
sources (Internet searches) according to what is needed in each case. But when the
story that we want to complete is not so obvious and knowledge about the world is
the key to the understanding process, commonsense reasoning emerges as a primary
part of the computational system.
The ability to apply commonsense reasoning has been presented several times as
a way to demonstrate that a computer has some level of intelligence. As this kind of
reasoning clearly has a human origin, measuring the ability to use it could be a good
artificial intelligence benchmark. With this in mind, [Levesque et al., 2012] presented
the Winograd Schema Challenge (WSC), an alternative to the Turing Test that measures
the ability of a computer to resolve an anaphora inside a specially constructed sentence.
The main idea behind this proposal is that the sentence is constructed in such a manner
that disambiguating the anaphora requires some logical reasoning involving
knowledge not stated in the text.
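To fix ideas, a Winograd schema can be held in a small data structure. The sketch below is ours and purely illustrative: the class and field names are hypothetical, and the sentence is the well-known trophy and suitcase example from [Levesque et al., 2012], in which changing one special word flips the correct referent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WinogradSchema:
    template: str      # sentence with a {word} slot for the special word
    pronoun: str       # the ambiguous pronoun to resolve
    candidates: tuple  # the two possible referents
    answers: dict      # special word -> correct referent

    def sentence(self, word):
        """Build the concrete sentence for one special word."""
        return self.template.format(word=word)

    def is_correct(self, word, guess):
        """Check a proposed referent against the expected one."""
        return self.answers[word] == guess

schema = WinogradSchema(
    template="The trophy doesn't fit in the brown suitcase because it's too {word}.",
    pronoun="it",
    candidates=("the trophy", "the suitcase"),
    answers={"big": "the trophy", "small": "the suitcase"},
)

print(schema.sentence("big"))
print(schema.is_correct("small", "the suitcase"))  # True
```

Both variants share the same two candidates and the same pronoun, so the answer has to come from knowledge about sizes and containers rather than from the wording alone.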
The WSC embodies many of the problems introduced above: there is a
disambiguation problem that requires knowledge about the related world. Therefore,
we are going to use it as the backbone of the study presented in this
Master's thesis. However, because of the difficulty of the problem, it is
necessary to limit the scope of this work. The best way of doing so is to start by
specifying the main goals of our work.
2. MAIN GOALS
The starting point of this work is the natural language understanding (NLU) problem.
We will focus on searching for a specific solution to it. But we also want to demonstrate
how commonsense reasoning (CSR) can help us with this challenge. So, the first goal is
a deeper study of how this problem can be solved more easily by using a CSR system.
As NLU covers too many possibilities, we are going to focus on the texts or sentences
introduced with the WSC, that is, short sentences that include an anaphora
disambiguation problem. In fact, we will also restrict the Winograd schemas under
study: we will consider only sentences related to the Spatio-temporal domain. As we
will see later, we will try to represent, with more or less accuracy, the world described
in these sentences. This task cannot be addressed for all possible domains due to the
huge complexity of doing so. We will just try to show which paths to follow,
after first defining a smaller target for the implementation phase.
Every system using CSR requires some method for representing and interacting with the
commonsense knowledge needed in the process. Therefore, the second goal will be
the definition of a proposal to represent this knowledge. It will include the basic
interface that allows a computational system to use it properly. The basis selected for
this representation is the Event Calculus, due to its versatility in commonsense
reasoning about the effects of events over periods of time. We will also add
specific databases specially adapted to this kind of reasoning.
The last goal of our work will be the implementation of a CSR system as the best way
to test these concepts. We will see the need for a prior natural language processing
(NLP) step. It will not be created specifically, as this is not the purpose of this work.
Instead, we will add to the implemented system the freely licensed NLP tools developed
by Stanford University. In addition, we will use the DEC Reasoner tool from IBM
Corporation (and others) and several public, freely licensed SAT solvers that
complement it. We will create the rest of the system specifically for this work, including
the commonsense knowledge databases, ready to help in understanding Spatio-temporal
Winograd schemas. The system will also include a human-computer interface that
allows not only the presentation of the results for an initial set of examples, but also
the addition of new related Winograd schemas to be tested.
Following the theoretical and proposal sections, we will describe how the system has
been implemented in an Apple Mac development environment (OS X system).
We will show the results obtained with a Spatio-temporal WSC corpus and compare
them with other approaches. We will complete the document with a specific section
presenting the conclusions about this work and its results, and the different
improvements and future lines of investigation that could continue our proposal.
The background part of this document will first introduce the two main topics of the
study, Commonsense Reasoning and Natural Language Understanding, including a more
detailed description of the WSC proposal. Then we will review the different approaches
to solving the WSC presented so far (as the test was introduced in 2012, we should
consider all the relevant works as part of the state of the art). We will highlight the
difficulties found in the (well-defined) hard problem of pronoun resolution, and how
automated and statistical learning will not be enough to solve it.
We will also pay attention to the Event Calculus language as a key part of the
knowledge representation applied in our proposal. We will focus especially on how this
language can describe the common events that arise every day in human life, as these
are the classical situations where commonsense reasoning is used. In addition, we will
introduce several examples from different authors where the Event Calculus is used to
solve NLU problems.
PART II: BACKGROUND
3. COMMONSENSE REASONING
3.1. Introduction
Commonsense reasoning (CSR) is an essential part of human behavior. By using it,
people address most of their everyday problems. The essential point of CSR is
that it supports the inference process by using the knowledge and experience we have
about common situations in our world and the implicit rules governing it.
We have assimilated CSR so completely that nobody finds it difficult to
understand and apply. On the contrary, we expect everybody to use it and
to reach equivalent inferences. But the same cannot be said when
the subject is a computer. In fact, CSR is one of the unsolved challenges in the
field of Artificial Intelligence (AI) and it seems to be one of its pending long-term
problems. However, considering how much we rely on it to solve so many
situations, the importance of finding better solutions seems obvious. Once
this problem is solved, AI will be closer to human intelligence.
In the following, we go deeper into the CSR concepts and describe different
methods to represent the related knowledge. As described before, this work will
address the WSC considering only the Spatio-temporal domain. Therefore, we will
discuss different aspects of this domain and how reasoning processes can be
applied in it. Finally, we will introduce the Event Calculus logical language, the main
formalism used in this work to represent commonsense knowledge and the rules about
the world.
3.2. Commonsense Reasoning
CSR can be seen as a simulation of how humans reason and infer about common
daily problems. We expect a computer that uses it to be able to accept inputs from
the real world and infer results that cannot be obtained from the initial input data alone.
The only way of doing so is to try to reproduce what humans apply:
commonsense knowledge, usually acquired by experiencing the world directly and
indirectly.
Researchers have worked on many different ways of acquiring and representing
this knowledge to allow its use by computers. In fact, we can say that making both
processes, acquisition and representation, complete enough for the task is much
harder and more complex than the reasoning process itself. This is due to the large
amount of knowledge about the world that is needed and its heterogeneity.
Additionally, this diversity and quantity do not mean that the different
data are separate. On the contrary, they are deeply related and, most of the time, no
reasoning can be done if we lose these relations. This fact immediately reveals the
complexity of the problem.
Every human needs several years of learning (and it never ends) to start applying
common sense correctly. We also have the advantage that our brain seems to be
specially adapted to reason about this kind of daily problem. In fact, humans seem to
be comfortable managing ambiguous information. In these situations, computers soon
suffer from combinatorial explosion [Mueller, 2002]. Several chained ambiguities mixed
with different knowledge domains can be enough to prevent a computer from finding a
solution. This is just a drop in the ocean of the problems around CSR.
For example, it is also difficult to cover the huge number of different
knowledge domains available together with the relations among them (often, these
relations cannot easily be declared, as they are not directly stated inside the domains).
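The growth behind this combinatorial explosion can be illustrated with a short sketch. The three ambiguity lists are hypothetical placeholders of ours; the only point is that the number of joint interpretations is the product of the options at each ambiguous point.

```python
from itertools import product
from math import prod

def count_interpretations(ambiguities):
    """Number of joint readings when every ambiguity is resolved independently."""
    return prod(len(options) for options in ambiguities)

# Hypothetical chained ambiguities, for illustration only.
ambiguities = [
    ["seen through a telescope", "man carrying a telescope", "telescope on the hill"],
    ["referent A", "referent B"],
    ["domain 1", "domain 2", "domain 3", "domain 4"],
]

print(count_interpretations(ambiguities))  # 24
# Enumerating the readings explicitly gives the same number.
print(len(list(product(*ambiguities))))  # 24
```

Each additional ambiguity multiplies the total, so a naive reasoner that enumerates every combination quickly becomes impractical.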
None of these considerations should make us give up CSR as a research path. They are
just a starting point that should help us select the right steps in our work. For
example, the difficulty and size of the problem invite us to reduce or limit in some way
the domains of our approach. This is what we propose in this thesis: first, by
working on a specific anaphora resolution problem, the Winograd Schema Challenge,
and second, by reducing the world to just the Spatio-temporal domain. This approach
allows a deeper solution to be found, as we can focus strongly on a smaller space. After
solving the problem inside the small world, we can successively expand our proposal to
other domains. The opposite approach would be to work in the expanded world, with a
greater number of domains, and try to go deeper into the problem while
maintaining the focus on all the domains. Figure 1 represents both approaches:
[Figure 1 panels, each with axes "Understanding" and "Domains": "Starting deeper over one domain and expanding over new domains" / "Starting with many domains and going deeper over all of them"]
Figure 1. Different approaches to the CSR problem.
3.2.1. Commonsense Knowledge
Commonsense knowledge is the base of every CSR system, as it gathers all the
information or data to be used by the computer. This base consists of all the common
facts, general basic rules and daily information that every human uses when applying
CSR every day. There is no conceptual distinction from what a computer would need
when trying to apply the same kind of reasoning. The real difference appears in how
this information is represented, stored and used by the computer. As commented
before, this is the hardest point within the CSR topic, due to the size of the data needed
to achieve a smooth reasoning process (we expect a computer using CSR to appear
as intelligent as a human when solving daily situations). Figure 2 summarizes the
general steps when acquiring and using commonsense knowledge:
[Figure 2 diagram: Input Information → Information Extraction → Relevant Information → Knowledge Representation → Commonsense Knowledge → Commonsense Knowledge Database; CSR Question and Commonsense Knowledge Information → Commonsense Reasoning → Inference Output]
Figure 2. General steps in a computational CSR application.
Let us review the different steps described. Basically, we can see that from an initial
Input Information we will extract and represent what we have called Commonsense
Knowledge into a specific Database. Then, the Commonsense Reasoner will use this
database to output the resulting Inference for every question.
The Input Information is all the data from the external world. It can be every stimulus,
action or property (characteristic) of every basic element composing our world. It covers not
only physical things, such as stones, liquids or living beings. It also includes the
relations among all these elements (relationships, classifications/taxonomies,
functionalities), their properties (size, color, distances, locations), general rules and
concepts (physical laws such as gravity, or time), more abstract concepts (behaviors,
emotions, needs, goals, contexts), etc. That is, every basic idea or concept that a
human can use when applying CSR. In fact, we usually add to all these data (mostly
obtained from our own experience) the additional historical information transmitted
among humans by way of tales, texts or other classical knowledge representations. They
can be considered as directly and indirectly acquired knowledge, and both are basic for
the reasoning we are looking for. As commented before, there is no difference between
what a computer and a human would need as input information.
The Information Extraction process is as complex as the level of information we are
dealing with. Moreover, as seen before, this is not a clear and simple problem. The
common processes for doing so include, for example, machine translation, text or data
mining, speech recognition and many other natural language processing techniques,
visual object recognition, and so on. This step covers two main tasks:
• Extraction of all the new information for populating the Commonsense
Knowledge Database. It can be considered the learning process, equivalent to
the human experience process.
• Extraction of the information related to the possible questions we send
to the system. The idea is to prepare these data to find every possible
match that could establish a relation with the concepts
already available inside the database.
Both tasks result in what we can define as the Relevant Information: data that
could be useful for the two described tasks.
The Knowledge Representation block performs the next step, and it is a key part
of the CSR system. Taking the relevant information, the system tries to
represent it in a computable manner with the goal of creating and maintaining a
Commonsense Knowledge Database. Ontologies have been one of the most
frequently used methods in recent years, thanks to their capacity to represent not only
different taxonomies, but also the relations and properties among data. At present, the
number of research works on these special databases is large, so
we highlight only the ones most commonly related to CSR systems. All of them use a
specifically developed language or a more general version of a declarative language. In
any case, the goal is to allow the representation and querying of the stored commonsense
knowledge:
• Cyc ([Siegel et al., 2004])
Cyc is a project started in 1984 by Doug Lenat at MCC and later consolidated in
the company Cycorp. Its goal is to codify, represent and store commonsense
knowledge, and to offer it as a ready-to-use base for different inference processes.
The knowledge base is composed of more than half a million concepts declared by way
of an ontology based on the domain of human consensus reality. These concepts are
complemented with five million assertions and more than 26,000
relations among them. These numbers are just to show the size of the project. Cyc is
accessible through the CycL language, in which the concepts and assertions have been
written. Its power resides not only in the knowledge base, but also in the internal
Inference Engine, which covers different kinds of logical deduction.
• ConceptNet (http://conceptnet5.media.mit.edu)
ConceptNet is a multilingual knowledge base that includes frequent words and phrases
together with annotations about the commonsense concepts that relate them. It is
continuously built from external resources, such as the open version of the
previously mentioned Cyc (OpenCyc), WordNet, Wiktionary or DBpedia.
• ThoughtTreasure ([Mueller, 2003a])
The ThoughtTreasure knowledge base is formed by 27,000 commonsense concepts and
51,000 assertions organized hierarchically. Started in 1994, it is described by Mueller
as similar to Cyc, but adding new elements, such as scripts (a structured version of a
concept).
• FrameNet (https://framenet.icsi.berkeley.edu/fndrupal/about)
FrameNet consists of a database including hundreds of semantic frames and lexical
units. A frame in this case is a schema representing different events, participants or
entities, and the relations arising from the described actions and events. It also
includes lexical units, which define the words or language structures that evoke each
frame. They can be seen as the key entry point.
Other knowledge databases could be considered, such as DBpedia, but in many
cases their content is not so clearly related to the commonsense concepts we are
looking for. However, they can be very interesting as a complementary knowledge
source to fill in the missing points. The main ideas we would like to highlight
are the huge amount of work needed to include enough concepts and rules, and the
difficult process that we must implement to find the relations between the original
entries (text, speech…) and this knowledge.
3.2.2. Reasoning with a Commonsense Knowledge Base
In ([McCarthy, 1960]), the author proposed that every system applying commonsense
must represent everything it knows by way of a valid logical language. Following
this approach, when representing commonsense knowledge, researchers usually
select a logic language as the main base for the reasoning task.
The problem we find is that, when reasoning about the real world, new
non-mathematical conditions arise, such as uncertainty, vagueness, inaccuracy or
incompleteness. In addition, knowledge changes in the real world. This means that
a deduction made from a specific set of conditions could be different in the
future if new conditions appear. By conditions, we mean both the external
influences affecting the world and the new knowledge acquired by the system. Like
humans, if the system increases its experience (additional information), it will be able to
reason in a better manner (or this is what we would expect from it).
The qualification problem ([McCarthy, 1980]) is directly linked with this issue: if
we need to declare (qualify) every possible action, condition or property of the
world we are dealing with, the starting point for reasoning seems to be unachievable.
In fact, humans do not use every possible element when using commonsense. For us,
inferring that throwing a ball into the sky will end with the ball falling to the ground
does not require information about the wind, the physics or the initial speed.
Non-monotonic Logic (NML) proposes a consistent solution to the qualification
problem. It represents defeasible reasoning, where the inferred conclusions can
be retracted or modified when new evidence is added to the reasoning process, as
described, for example, in the Stanford Encyclopedia of Philosophy ([Strasser et al.,
2014]). It covers more possible reasoning fields than classical logic, where a deduction
cannot be retracted after the inference process.
A classic example used to explain it is the following:
What we know at the beginning:
Birds can Fly
Penguins are Birds
X is a Penguin => X can Fly
Adding a new fact about what Penguins can do or not:
Penguins cannot Fly
X is a Penguin => X cannot Fly
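The retraction in this example can be illustrated with a minimal Python sketch. It is not a non-monotonic logic implementation: the function `can_fly` and the set-based encoding of the taxonomy and of the exceptions are assumptions made only for this illustration.

```python
# Hypothetical, minimal illustration of a defeasible conclusion.
# The encoding (sets of kinds and exceptions) is an assumption of this sketch.

def can_fly(kinds, exceptions):
    """Default rule: birds fly, unless a known exception applies to one of the kinds."""
    if kinds & exceptions:
        return False
    return "bird" in kinds

# Taxonomy: every penguin is also a bird.
x_kinds = {"penguin", "bird"}

# Only "Birds can Fly" is known: we conclude that X can fly.
before = can_fly(x_kinds, exceptions=set())

# The fact "Penguins cannot Fly" is added: the earlier conclusion is retracted.
after = can_fly(x_kinds, exceptions={"penguin"})

print(before, after)  # True False
```

Adding knowledge changes a conclusion that was previously derived, which is exactly what classical (monotonic) deduction does not allow.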
There are several options within non-monotonic logic, such as default logic or the
closed world assumption, where only known facts can be true and everything
that we do not know is assumed to be false. The proposal we are interested in is
Circumscription, the non-monotonic logic created and formalized by John McCarthy as
a solution for the frame problem ([McCarthy, 1980]). The main idea behind
Circumscription is that things are not expected to change unless otherwise specified
(similar to the closed world assumption). McCarthy argues that humans use such
non-monotonic reasoning and this justifies its application in systems trying to emulate
our intelligent behavior.
Circumscription was first constructed with first-order logic, where predicates have a
main role in the declaration of the reasoning rules. Our interest in this non-monotonic
logic is due to the fact that we are going to define all our reasoning knowledge by way
of Event Calculus (see chapter 3.4), a many-sorted first-order logic language, where
Circumscription has a key role as it allows the declaration of a narrower version of the
world we are describing with logic predicates.
When circumscribing any part of the world to reason within it, we are reducing the logic
predicates to a specific domain. In addition, many of these predicate definitions are
based on qualitative properties, as they are not fully described by numerical or exact
dimensions. The next chapter introduces qualitative reasoning, specifically
applied to the domain we are interested in, as pointed out when we described the goals
of this thesis (see Chapter 2).
3.3. Qualitative Reasoning in the Spatio-temporal Domain
The nature of the work presented in this thesis invites us to place some focus on
qualitative reasoning, including all the issues related to the Spatio-temporal
domain. When describing some action inside this domain with natural language, we
often include plenty of qualitative references instead of using precise data.
Qualitative reasoning is by itself a topic within Artificial Intelligence. It refers to the
reasoning techniques intended to describe, represent and reason about the behavior of
physical systems without using exact or precise quantitative descriptions or
references (see [Forbus, 1996]). It fits well with the way humans commonly
express knowledge, frequently describing conditions and relations in a
qualitative manner. For example, we use references such as high, low, big, far away,
later, large, etc., instead of using numeric values.
Many daily situations do not require an exact measure. It is enough to know if some
object is too heavy to carry or the room is large enough for the bed. Also, many
problems are better described with qualitative values at the beginning, leaving the
quantitative expressions to the next steps when calculations and simulations must be
performed.
Among the different domains covered by qualitative reasoning, we can find the
spatial and temporal ones. These are not only the domains we are interested in for
our thesis work; they are also domains frequently invoked when applying this
kind of reasoning, and it is common to find them together due to their linked physical
properties. For example, movement in space requires time and, at constant speed, the
distance covered is directly proportional to the elapsed time.
There are different proposals for representing and reasoning about spatial or temporal
properties in the physical world. James Allen's Interval Calculus ([Allen, 1983]) is
probably the best-known proposal when talking about temporal qualitative
reasoning.
The main result of this work is Allen's Interval Algebra, which defines and
represents the relation between two temporal intervals in different
situations (one interval occurs before, at the same time as or after the other, they
overlap somehow, and so on). When we are involved in temporal reasoning, we can find
multiple forms of expressing time points and intervals, including absolute, relative or
comparative references. Also, we can consider time instants, durations and so on, with
more or less temporal precision (by using milliseconds, hours, or years, for example).
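As an illustration of the interval relations just mentioned, the following Python sketch classifies a pair of intervals into one of the thirteen relations of Allen's algebra. The function name `allen_relation`, the relation labels and the representation of an interval as a `(start, end)` pair with `start < end` are assumptions made for this example.

```python
def allen_relation(a, b):
    """Return Allen's relation of interval a with respect to interval b.

    Intervals are (start, end) pairs with start < end (assumed, not checked).
    """
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "before"
    if b2 < a1:
        return "after"
    if a2 == b1:
        return "meets"
    if b2 == a1:
        return "met-by"
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    # Only a proper partial overlap is left at this point.
    return "overlaps" if a1 < b1 else "overlapped-by"


print(allen_relation((1, 3), (4, 6)))  # before
print(allen_relation((1, 5), (3, 8)))  # overlaps
print(allen_relation((3, 4), (1, 8)))  # during
```

The function only needs to compare endpoints, never to measure durations, which is why this kind of relation is considered qualitative.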
There are also different proposals for spatial qualitative reasoning, where RCC-8
(Region Connection Calculus, from [Randell et al., 1992]) could be considered the
equivalent of Allen's algebra for describing positional relations between spatial regions.
However, spatial reasoning can become more complex than managing two static
regions. Again, as described for temporal reasoning, we have different relations
among spatial entities: points, lines, areas or volumes. In addition, we could express
them with distance measurements, coordinates and so on.
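The eight RCC-8 relations can be sketched for a very restricted case. RCC-8 is defined for arbitrary regions; the sketch below assumes, only for illustration, that every region is a disc given as `(x, y, radius)`. The function name `rcc8` is also an assumption, while the labels (DC, EC, PO, TPP, NTPP, TPPi, NTPPi, EQ) are the usual RCC-8 abbreviations.

```python
import math


def rcc8(a, b):
    """RCC-8 relation of disc a with respect to disc b; discs are (x, y, radius)."""
    (ax, ay, ar), (bx, by, br) = a, b
    d = math.hypot(ax - bx, ay - by)  # distance between the two centres
    if d == 0 and ar == br:
        return "EQ"     # equal regions
    if math.isclose(d, ar + br):
        return "EC"     # externally connected (touching from outside)
    if d > ar + br:
        return "DC"     # disconnected
    if math.isclose(d + ar, br):
        return "TPP"    # a is a tangential proper part of b
    if d + ar < br:
        return "NTPP"   # a is a non-tangential proper part of b
    if math.isclose(d + br, ar):
        return "TPPi"   # b is a tangential proper part of a
    if d + br < ar:
        return "NTPPi"  # b is a non-tangential proper part of a
    return "PO"         # partial overlap


print(rcc8((0, 0, 1), (3, 0, 1)))  # DC
print(rcc8((0, 0, 1), (1, 0, 1)))  # PO
print(rcc8((0, 0, 1), (0, 0, 3)))  # NTPP
```

As with Allen's algebra, the result is a symbolic relation between two entities rather than a measurement, which is the kind of pairwise comparison used later in this work.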
It is also possible to include both domains in the same reasoning process due to the
direct relation between space and time. The distance covered by a moving vehicle
increases with time, for example. There is nothing new here for our commonsense
reasoning mind.
In our work, we only consider the qualitative situations arising from a
comparison between two elements or agents (objects, persons…). We are interested in
how one agent is positioned in space and time with respect to the other one, always in
a pairwise comparison, which is very close to the concepts used by Allen and Randell.
However, natural language is usually simpler than the algebra-based approaches,
especially in common situations where commonsense reasoning is needed. For example,
the Spatio-temporal relation between two agents will frequently be described with
expressions such as “later”, “farther”, “near”, “greater” or “before”. People usually say
that one agent is heavier than another instead of giving the exact weight of each agent
and leaving the listener to do redundant work.
As we will see in Chapter 6.8, where the proposal is reduced to the Spatio-temporal
domain, the information needed for the reasoning process is not greater than that
described here.
3.4. Event Calculus and DEC
The Event Calculus (EC) is a logical language presented by Robert Kowalski and
Marek Sergot in [Kowalski et al., 1989] and later modified and refined by the same
authors and others, such as Murray Shanahan and Rob Miller [Shanahan et al., 1999]
(at present, their works are the most widespread and applied). It was defined
as an extension of first-order logic to allow the representation of and reasoning
about events and their effects over time on fluents. We can see events as the different
actions that can occur, while fluents are all the logical conditions or properties that can
change over time. By representing the different agents and their properties by way of
fluents, we can define a set of predicates that modify these properties by way of
events.
The strength of EC resides mainly in its ability to relate different actions happening at
different times. In fact, its primitive elements are the events and fluents, but also the
time points and time intervals. If we define a complete set of predicates describing a
specific world, we will be able to describe the relations among the different agents and
how they change in the time domain. This special feature is why EC is especially
interesting for commonsense reasoning.
3.4.1. Event Calculus main concepts
We introduced EC as an extension because it is based on the variant
called many-sorted first-order logic (the following summary is fully based on the
[Shanahan et al. 1999] EC version, also described in detail in [Mueller, 2004a] and
[Mueller, 2006], Chapter 2). To understand it, we introduce a first example by way of a
list of predicates describing the following sentence: “The book is on the living room
table”. This sentence could be represented with EC as follows:
Inside(LivingRoom, Table)
On(Table, Book)
Inside and On are predicate symbols with two arguments each. Every argument
belongs to a specific sort. For example, LivingRoom would be part of the room sort, and
Table and Book would be part of the object sort. Both predicates are represented in this
case as a specific instance of a more general declaration. In this case, it could be as
follows:
Inside(room, object)
On(object, object)
The Inside predicate declares that any object can be inside a room. In addition, the On
predicate declares that any object can be on top of another object. Every sort can also
be a subset of another sort. For example, person and animal can be sorts included in the
living-being sort. The general predicates referring to living-being can be applied to both
person and animal (for example, the predicate Eats(livingbeing, food)). In other cases,
there will be specific predicates referring only to one of the subsets (Speak(person)).
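The role of sorts and subsorts can be illustrated with a small Python sketch that checks whether a predicate is applied to arguments of the right sort. The dictionaries, the helper names (`is_a`, `well_sorted`) and the constants Oscar, Rex and Apple are hypothetical and only serve this example; the predicate signatures are the ones used in the text.

```python
# Hypothetical subsort hierarchy: child sort -> parent sort.
SUBSORT_OF = {"person": "livingbeing", "animal": "livingbeing"}

# Hypothetical constants and the sort each one belongs to.
SORT_OF = {
    "LivingRoom": "room", "Table": "object", "Book": "object",
    "Oscar": "person", "Rex": "animal", "Apple": "food",
}

# Predicate signatures taken from the examples in the text.
SIGNATURES = {
    "Inside": ("room", "object"),
    "On": ("object", "object"),
    "Eats": ("livingbeing", "food"),
    "Speak": ("person",),
}


def is_a(sort, expected):
    """True if `sort` equals `expected` or is one of its (transitive) subsorts."""
    while sort is not None:
        if sort == expected:
            return True
        sort = SUBSORT_OF.get(sort)
    return False


def well_sorted(predicate, *args):
    """Check the number of arguments and the sort of each one against the signature."""
    signature = SIGNATURES[predicate]
    return len(args) == len(signature) and all(
        is_a(SORT_OF[arg], expected) for arg, expected in zip(args, signature)
    )


print(well_sorted("Eats", "Rex", "Apple"))  # True: animal is a subsort of livingbeing
print(well_sorted("Speak", "Rex"))          # False: Speak only accepts the person sort
```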
These concepts are also part of other many-sorted first-order logic languages. EC
declares several general sorts and predicates that constitute the base of its
concepts. More specifically, there are three general sorts: event, fluent and timepoint.
They constitute the base of what we need to express in EC: events that occur at some
time point and modify the state of some fluents. Also, there are several predicates that
use these sorts to represent how they are related to each other. The following is
the complete set of predicates (considering the Shanahan and Miller proposal):
• Happens(e, t). An event e happens at timepoint t.
This first predicate states that an action occurs at some moment.
• HoldsAt(f, t). A fluent f is true at timepoint t.
Together with the previous one, HoldsAt is one of the two basic predicates in
EC. As we will see later, Happens is the trigger applied to any EC system and
HoldsAt is the basic query over fluents to understand the state of the system.
• ReleasedAt(f, t). Fluent f is released from the commonsense law of inertia at timepoint t.
The commonsense law of inertia states that the value of a fluent does not change
unless it is modified by an event. Once a fluent is released, its value can change
without any event acting on it.
• Initiates(e, f, t). An event e starts (initiates) a fluent f at timepoint t.
Logically, it is related to a Happens predicate, which triggers the fluent change.
• Terminates(e, f, t). An event e stops (terminates) a fluent f at timepoint t.
It is the opposite of the previous predicate, but it needs the same Happens to exist.
• Releases(e, f, t). An event e releases a fluent f at timepoint t.
That is, if Happens(e, t) holds, then the fluent f will be released (ReleasedAt) after t.
• Trajectory(f1, t1, f2, t2). If fluent f1 is initiated by an event at timepoint t1, then the fluent
f2 will be true at timepoint t1 + t2.
This is a very interesting predicate, as it allows an event to affect other
fluents indirectly. It also introduces a delay control over the fluent.
• AntiTrajectory(f1, t1, f2, t2). If fluent f1 is terminated by an event at timepoint t1, then the
fluent f2 will be true at timepoint t1 + t2.
It is important to understand that, in this case, the second fluent is indirectly
affected after the first one is terminated.
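To give an intuition of how Happens, Initiates, Terminates and HoldsAt interact under the commonsense law of inertia, the following Python sketch simulates them over integer time points. It is a toy illustration and not the EC axiomatization: Releases, Trajectory and AntiTrajectory are left out, and the domain (a LightOn fluent with the TurnOn and TurnOff events) is hypothetical.

```python
# Hypothetical toy domain: one fluent (LightOn) and two events.
INITIATES = {("TurnOn", "LightOn")}      # Initiates(e, f, t), for every t
TERMINATES = {("TurnOff", "LightOn")}    # Terminates(e, f, t), for every t
HAPPENS = {("TurnOn", 1), ("TurnOff", 3)}  # Happens(e, t)
INITIALLY = set()                        # fluents that hold at timepoint 0


def holds_at(fluent, t):
    """HoldsAt(fluent, t) over integer time points, without released fluents."""
    state = fluent in INITIALLY
    for step in range(t):
        events = {e for (e, when) in HAPPENS if when == step}
        if any((e, fluent) in TERMINATES for e in events):
            state = False  # the fluent stops holding from step + 1
        if any((e, fluent) in INITIATES for e in events):
            state = True   # the fluent starts holding from step + 1
        # Otherwise nothing happens to the fluent: inertia keeps its value.
    return state


timeline = [holds_at("LightOn", t) for t in range(5)]
print(timeline)  # [False, False, True, True, False]
```

In this sketch the effect of an event at timepoint t becomes visible at t + 1, which is the convention adopted when time points are restricted to integers.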
All these basic predicates are complemented with a full axiomatization
composed of ten axioms and seven definitions (ANNEX A includes the complete
description). Let us take the definition EC4 (all the axioms and definitions are
numbered as ECx, from EC1 to EC17). The combination of the axiomatization and the
rest of the predicates defined by us will set the value of the different variables:
EC4. $\mathit{StartedIn}(t_1, f, t_2) \overset{\mathrm{def}}{\equiv} \exists e, t \, (\mathit{Happens}(e, t) \land t_1 < t < t_2 \land \mathit{Initiates}(e, f, t))$.
This is the definition of StartedIn: a fluent f is started between t1 and t2 if and only if
there is an event e that happens at a time point t strictly between t1 and t2, and the
fluent f is initiated by the event e at time point t.
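The definition of StartedIn can be read almost literally as a search over the known events. The Python sketch below assumes a finite set of Happens facts and a time-independent Initiates relation; the function name `started_in` and the TurnOn/LightOn domain are hypothetical.

```python
def started_in(t1, fluent, t2, happens, initiates):
    """StartedIn(t1, f, t2): some event that initiates f happens strictly between t1 and t2.

    happens:   set of (event, timepoint) pairs, read as Happens(e, t)
    initiates: set of (event, fluent) pairs, read as Initiates(e, f, t) for every t
    """
    return any(t1 < t < t2 and (e, fluent) in initiates for (e, t) in happens)


# Hypothetical narrative: the light is turned on at timepoint 2.
happens = {("TurnOn", 2)}
initiates = {("TurnOn", "LightOn")}

print(started_in(0, "LightOn", 5, happens, initiates))  # True
print(started_in(2, "LightOn", 5, happens, initiates))  # False: 2 < t is strict
```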
In our work, we will use the variant Discrete Event Calculus (DEC), which differs basically
from EC by limiting the time point sort to integer values. DEC also differs in the
axiomatization, reduced in this case to twelve axioms and definitions (also described in
ANNEX A). We use DEC because our implementation work relies on the DEC Reasoner
software (from IBM Corporation), as this is the only
freely licensed implementation of EC.
3.4.2. Using Event Calculus for commonsense reasoning
As we are going to use DEC as a solution for the commonsense knowledge
representation, we must understand what Circumscription is (see Chapter 3.2.2).
This concept is very important for our work, because it significantly reduces the number
of predicates and rules needed to describe the world. Moreover, this is what we are
trying to solve with DEC: how to describe the world under study and the relations
and interactions among all the involved agents.
As we have noted before, this work is based on the study of the WSC, but it only
considers schemas related to the Spatio-temporal domain. If we want to understand
the possible situations in this domain, we must first represent the different
common rules and relations related to it. More generally speaking, every domain will
have its own set of rules and predicates. It does not matter if they are expressed with
DEC or other logical languages.
The full description of each domain will contain three main parts ([Mueller, 2006],
Chapter 2.7). Figure 3 also represents these three blocks:
• A first block, with the complete set of axioms describing the domain rules. For
example, in our Spatio-temporal domain, we could decide to include some rules
about how faster agents arrive before slower ones. Alternatively, we may be
interested in defining how gravity affects every falling object (until it hits the
floor).
• A second part, including all the observations about the properties of every element
inside the domain. Again, in our Spatio-temporal domain, we can declare that an
agent is far away from his goal, or that he is stopped at time 0 (every observation must
be associated with a specific time point; if nothing happens, it will maintain its value).
• The last part is the narrative of the actions or events occurring with the agents of
the domain. It will modify their properties over time.
General predicates (World rules)
Predicate #1
Predicate #2
…
Predicate #n
Properties (World observations)
HoldsAt(Property#1)
HoldsAt(Property#2)
…
HoldsAt(Property#n)
Properties (World narrative)
Happens(Action#1) at t = 0
…
Happens(Action#n) at t = i
[Fluents/properties start changing their values according to the influence of the actions]
Figure 3. Main parts fully describing a domain with Event Calculus
Let us take as an example a Winograd schema sentence: “The box should be over the
book, because Oscar couldn't see it”. In this case, the domain is related to the spatial
position of the objects. In addition, if we consider that an object (box) placed over
another one (book) is usually placed later, time must also be considered.
For this example, we first define seven general rules about the
domain, built on three fluents and one possible event:
fluent Hidden(object)
fluent Visible(object)
fluent At(position, object)
event PlaceOn(position, object)
Rules 1 & 2: When an object is PlaceOn a position, Visible is initiated for the object
and the object is At that position:
Initiates(PlaceOn(position, object), Visible(object), time).
Initiates(PlaceOn(position, object), At(position, object), time).
Rule 3: When an object is PlaceOn, Hidden is terminated for the object:
Terminates(PlaceOn(position, object), Hidden(object), time).
Rule 4: Only one object can be placed at the same time:
Happens(PlaceOn(position1, object1), time) & Happens(PlaceOn(position2, object2), time)
-> object1 = object2.
Rules 5 and 6: If an object is Hidden, it cannot be Visible, and vice versa.
HoldsAt(Hidden(object), time) -> !HoldsAt(Visible(object), time).
HoldsAt(Visible(object), time) -> !HoldsAt(Hidden(object), time).
Rule 7: When an object is placed at the same position as another one, the previously
placed objects will not be visible anymore.
HoldsAt(Visible(object1), time) & HoldsAt(At(position1, object1), time) & object1 != object2 ->
Terminates(PlaceOn(position1, object2), Visible(object1), time).
Of course, this is a simple view of how the spatial position can be considered. There
is only one dimension along which the objects are placed and only one point of view
(there are no behind or in front of positions). It can be seen as a stack of objects without
the removing action, as only the last placed object is visible and the previous ones are
not. As we can deduce from the seven rules, when something happens (literally), fluents
start changing their values to maintain logical coherence. Also, the rules define what is
possible inside the described small world.
The second part will add all the elements (agents, objects, etc.) acting inside the world
and the related properties. For example, following the previous sentence, these values
would be:
Sort definition:
position Table
object Book, Box
Properties definition:
HoldsAt(Hidden(Book), 0)
HoldsAt(Hidden(Box), 0)
We declare one possible position and two objects, Book and Box. The initial state (t = 0)
for both objects is Hidden. There is no consideration about the Table position and it is not
necessary to locate the objects with the At fluent. Of course, we could define new
positions, like Chair or Floor, and represent that at t = 0 both objects are in different
places and do not interfere with each other.
Before continuing, it is important to clarify some aspects of the example. We have
introduced some additional information that is not strictly necessary to describe the
sentence with DEC. For example, the position sort does not appear in the text, but the
commonsense reasoning we are trying to introduce with the previous rules needs it. Of
course, we could use new rules considering that every placed object will be at the
same position, but this seems to be an excessively simple solution. Also, the Hidden and
Visible concepts could be unified in a single fluent (as Hidden could be equivalent
to !Visible). In fact, rules 5 and 6 represent this equivalence. In this case, the redundant
choice is made only for better understanding.
The last part describing the sentence inside the defined domain will be the narration of
events. Every change will imply modifications in the fluents (in the range of integer time
points, as we are defining the rules with DEC).
The complete list could be as follows:
t=0
First, the Book is placed on the table.
HoldsAt(Hidden(Box), 0)
Happens(PlaceOn(Table, Book), 0)
t=1
Rules 1 & 2 describe how some fluents change when an object is PlaceOn.
HoldsAt(At(Table, Book), 1)
HoldsAt(Visible(Book), 1)
t=2
The other object is placed.
Happens(PlaceOn(Table, Box), 2)
t=3
At this moment, the Book will not be visible anymore (rule 7).
HoldsAt(At(Table, Box), 3)
HoldsAt(Visible(Box), 3)
It is important to note that this narration does not describe the sentence literally. It is
intended to explain what could be happening behind the text that explains the facts:
“some object is not visible, so some other object must be over it”. Moreover, this is
what we are looking for in our work. The goal is to use event calculus to understand the
world around the natural language. It is not a literal translation from the text to DEC. Of
course, specific work will be necessary to extract the relevant information from the
text and use it in the understanding process.
Then, what can we deduce or infer from this narration? For example, when an object is
placed at the same position where another object is already placed, the previous object
will change to the hidden state. This idea is clear to the observer by applying minimal
commonsense reasoning, and the way we have indirectly described this fact with DEC
should be the line to follow when we need to relate this kind of reasoning and event
calculus.
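The behaviour described by the rules and the narration of this example can be reproduced with a short Python simulation. This is only a sketch and not the DEC Reasoner encoding: the function `simulate` and its data structures are assumptions, Hidden is treated simply as the negation of Visible (the simplification already mentioned for rules 5 and 6), and effects appear one time point after the event, as in DEC.

```python
def simulate(narrative, horizon):
    """Return one state per timepoint 0..horizon.

    narrative: {timepoint: [(position, object), ...]}, read as Happens(PlaceOn(...), t)
    Each state has the set of visible objects and the position of each placed object.
    """
    visible, at = set(), {}
    states = [{"visible": set(visible), "at": dict(at)}]
    for t in range(horizon):
        for position, obj in narrative.get(t, []):
            # Rule 7: objects already visible at this position are covered.
            covered = {o for o in visible if at.get(o) == position and o != obj}
            visible -= covered
            # Rules 1 to 3: the placed object becomes visible and is At the position.
            visible.add(obj)
            at[obj] = position
        states.append({"visible": set(visible), "at": dict(at)})
    return states


# Narration of the example: the Book is placed at t = 0 and the Box at t = 2.
narrative = {0: [("Table", "Book")], 2: [("Table", "Box")]}
states = simulate(narrative, horizon=3)

hidden_at_3 = {"Book", "Box"} - states[3]["visible"]
print(states[1]["visible"])  # {'Book'}
print(hidden_at_3)           # {'Book'}
```

Under these assumptions, the only object that is not visible at the end of the narration is the Book, which suggests it as the referent of "it" in "Oscar couldn't see it".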
4. NATURAL LANGUAGE UNDERSTANDING (NLU)
4.1. Introduction
Natural Language Understanding (NLU) is an essential part of the Natural
Language Processing (NLP) topic. It covers the different methods developed to allow
(or emulate) machine reading comprehension. The basic idea is to take a text as
input and to output some form of interpretation. It is not a translation between natural
and machine languages. It must be some kind of machine understanding that starts from
the natural language input: for example, summarizing the input texts, executing some
actions according to them or finding all the related documents depending on the text
context.
Within NLP, it is possible to find different levels of language comprehension (from
[Allen, 1987], Chapter 1). It starts with the basic morphology (prefixes, suffixes, gender,
number, etc.) and syntax analysis, where words and their structural relationships are
detected. Of course, we can consider the knowledge extracted from these analyses the
first level of understanding. The next level comes from the semantic analysis of the text.
Now, we search for and relate the different word meanings to form a greater
understanding across the whole sentence. In this case, we must consider the context in
which the sentence is used to resolve common ambiguities in these word meanings.
There are additional possible analyses, such as the discourse and pragmatic ones, but
we move directly to the last level, the analysis related to World knowledge. It
considers the facts about the world and the related commonsense knowledge that
helps in the interpretation and understanding process.
As we can see, there are different levels of understanding according to how deep we
move inside the essence of the sentences: from a shallow comprehension, where the
focus is just placed on the information extraction process, to a deeper understanding,
where the main ideas and facts are inferred from the text. Probably, the main difference
between both kinds of understanding is that the second one allows us to infer actions,
facts or concepts that are not explicitly described in the sentences. This is where the
previously mentioned World knowledge becomes relevant.
The understanding task requires anyway the information extraction step, directly
related with the different methods and applications inside NLP field. This is the
indispensable first step; regardless we then move to a deeper understanding process.
In the next chapters we will describe in more detail the NLP steps needed at first, and then we
will focus on the deep NLU process and how different researchers have approached it
by applying commonsense reasoning.
Finally, due to its relevance to the work presented in this master thesis, we will go
deeper into the Winograd Schema Challenge and the anaphora problem that it
introduces as the main question to solve.
4.2. Natural Language Processing: Shallow Understanding
NLP covers many different topics, such as automatic summarization, speech
recognition, OCR, question answering, co-reference resolution, sentiment analysis and
many others. The level of analysis used by each of them is different, and the result
ranges from a shallow text understanding to a deeper comprehension (as described
before). For example, if we are interested only in information extraction, we can
apply techniques such as named entity recognition (NER) or use a specific corpus as a base
to extract all the relevant information we are looking for. In this case, a grammatical
and syntactic analysis is the basic initial task to be done. In fact, these analyses seem to be
necessary for every shallow or deep understanding, as the best way to first
detect and discriminate the relevant words or minimal pieces we are going to use
when moving towards deeper studies, such as sentiment analysis or co-reference
resolution.
All these previous analyses are also basic for NLU. Before any possible understanding
process, we will need to extract the smallest parts of the text and to find their basic
lexical, syntactic and morphological properties. Following the description from
[Indurkhya et al., 2010] (Chapter 1.2) with the different NLP steps, the task involving a
shallow understanding of the text should include:
1. Tokenization: A first step where all the words or symbols are divided into tokens, and
sentence segmentation plus phrase structure is detected.
2. Lexical Analysis: The morphology of every token is detected. The associated
lemma will be the result of the analysis.
3. Syntactic Analysis: Every token/word is analyzed as a string of symbols and
converted into lexemes and Part-Of-Speech tags. This is the basic parsing process and
usually ends with a parse tree describing the syntactic relations among words.
4. Semantic Analysis: At this point, the analysis will try to find the meaning of the
words by considering the rest of the tokens and the text context. The understanding
process starts here. It can be considered nearer to a deep understanding process,
but it will also require some kind of NLP.
The shallow understanding starts with these analyses, but continues with different tools such
as NER, knowledge databases, ontologies and corpora, with the goal of completing the
information about the text.
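The first steps of the list above can be illustrated with a minimal sketch. It is not the tool chain used in this work: the lexicon `TOY_LEXICON`, the `Token` class and the functions `tokenize` and `analyze` are hypothetical names introduced only for illustration, and a real system would rely on a trained lemmatizer and tagger instead of a hand-written dictionary.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicon: surface form -> (lemma, part of speech).
# A real system would use a trained tagger and lemmatizer instead.
TOY_LEXICON = {
    "the": ("the", "DET"),
    "child": ("child", "NOUN"),
    "children": ("child", "NOUN"),
    "is": ("be", "VERB"),
    "are": ("be", "VERB"),
    "six": ("six", "NUM"),
    "years": ("year", "NOUN"),
    "old": ("old", "ADJ"),
}


@dataclass
class Token:
    text: str
    lemma: str
    pos: str


def tokenize(sentence: str) -> list[str]:
    """Step 1: split the text into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def analyze(sentence: str) -> list[Token]:
    """Steps 2-3 (simplified): attach a lemma and a POS tag to every token."""
    tokens = []
    for text in tokenize(sentence):
        if not text[0].isalnum():
            tokens.append(Token(text, text, "PUNCT"))
            continue
        # Unknown words keep their lowercase form and get the tag "UNK".
        lemma, pos = TOY_LEXICON.get(text.lower(), (text.lower(), "UNK"))
        tokens.append(Token(text, lemma, pos))
    return tokens


tokens = analyze("The children are six years old.")
```

With this annotation, a shallow system could answer a question such as the age of the children by looking for the NUM token next to the lemma "year", but it still knows nothing that is not stated in the text.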
We expect a shallow understanding system to be able to answer questions whose
answer can be found in the text: the color of a table, the age of a child or where the
Johnson family lives. [Indurkhya et al., 2010] gives a complete account of different
approaches for finding these answers, by way of statistical or automated methods. The
key limitation of such a system is that we cannot expect it to find information not stated
in the text. For example, we could know the age of a child (as mentioned before), but
the system probably does not know that this child cannot use the elevator due to his age.
This kind of knowledge is the goal of a Deep NLU system.
4.3. Natural Language Understanding: A Deeper comprehension
When going deeper in the understanding process, the task becomes harder. A full text
understanding means that the computer could answer any question about it like a
human. However, this problem is far from being solved. In fact, while shallow
understanding solutions have been frequently presented, there have been few
advances in deeper comprehension. Also, due to the difficulty and slow progress,
there are not enough researchers working on it. This state of the problem should not be a
surprise, because it requires a deep knowledge about the world. Humans need several
years of living experience to start acquiring it.
There have been several interesting works on this matter, many of them during the 70’s
and 80’s, such as the proposal from [Shank et al., 1977] and others introduced in
[Mueller, 2002]. The main idea of these works is the search for methods to represent
the world and its relations, and for feasible ways to match the texts with
these representations. Scripts from Shank are interesting examples, where the possible
behavior inside common situations is described by way of domain templates. If the
computer is able to match the domain of the text with a script and nothing special
happens (something not considered in the script template), it will be possible to answer
many questions about the related story.
Closer to our work, we can find the proposals from [Mueller, 2003b] and [Mueller,
2004a]. Both works share the use of event calculus to represent what is happening in
the texts. Again, the author tries to represent the story and the knowledge
about the world with event calculus predicates.
One of the most frequently used methods to demonstrate understanding has been
Question Answering (see [Indurkhya et al., 2010], Chapter 20). We do not
expect a description of the text from the computer. We only ask for answers
to some questions related to the text. One of the greatest advances in recent
years is Watson from IBM (see http://www.ibm.com/smarterplanet/us/en/ibmwatson/), a
system that has demonstrated a powerful capacity for answering many kinds of more or
less complex questions by way of different developments dealing with massive data.
But Watson would probably fail in many cases of special QA tests, such as the
Winograd Schema Challenge.
4.4. The Winograd Schema Challenge
In 1950, Alan Turing proposed a first test to demonstrate whether machines
are able to think, that is, whether they can exhibit intelligent behavior equivalent to that of a
human. The main idea was to compare the machine’s reactions (answers to different
questions during a natural language conversation) with those that we could expect from
a human. If a human evaluator could not distinguish the machine from a human, the machine
would be considered to have passed the test.
[Levesque et al., 2012] introduced in 2012 an alternative proposal to the Turing test.
The main reasons behind it were the following. Passing the Turing test can often be seen as
some kind of imitation show, where the machine demonstrates that it can return answers
similar to those of humans. However, this does not necessarily demonstrate intelligent
behavior. For example, in the Chinese Room Experiment, a machine uses all the
knowledge available to answer every question by matching concepts and ideas, but
without using any understanding process. A second argument put forward by Levesque is
that answering many questions requires the machine to lie about a false life (we can
recall the film Blade Runner, by Ridley Scott). If we ask the machine about its
parents, it has the option of answering based on an invented past or answering the truth, as it
does not have any parents (we do not enter into the philosophical question of whether the
programmers or designers of a machine can be considered its parents).
Levesque proposed a different approach to the same problem, where the
machine does not need to look like a human behind a curtain. It just needs to
demonstrate that it can carry out intelligent reasoning to answer questions that would
seem simple or easy to a human actor. The key point is that answering requires a deep
knowledge about the world related to the question and the application of reasoning
processes with information not directly stated in the natural language sentences
used as input.
4.4.1. Description of the Winograd Schema Challenge
The Winograd Schema Challenge (WSC) is a variant of the earlier Recognizing
Textual Entailment (RTE) challenge ([Dagan et al., 2006]), where a machine is required
to answer a binary (Yes/No) English question. The question is composed of two
sentences, and asks whether the first one (A) entails the second one (B). For example (from
Dagan’s description):
A: Time Warner is the world’s largest media and internet company.

B: Time Warner is the world’s largest company.
In this case, the logical answer should be “No”. It requires some background
knowledge (a rule defining that having some property in a narrower domain does not imply
having it in general). The advantage of this challenge is that it cannot be solved by
going around or evading the question. The involvement of the machine must be complete.
However, the challenge may not be difficult enough when compared with the Turing
test.
The WSC proposal maintains the challenge of answering binary questions requiring
some kind of reasoning, but removes the entailment condition (the
presence of this relation is not necessary) and adds a harder relation to be solved: a complex
case of co-reference resolution.
Every Winograd Schema ([Levesque et al., 2012]) is composed of a sentence (or group
of them) describing some facts, and a binary question about it. Two examples follow:
• The trophy doesn’t fit in the brown suitcase because it’s too big/small.
What is too big/small?
Answer 0: the trophy
Answer 1: the suitcase
• The firemen arrived after/before the police, because they were coming from so far away.
Who was coming from so far away?
Answer 0: the firemen
Answer 1: the police
As we can see, neither question seems difficult, and both can be
solved easily by applying some commonsense. This is what the authors are looking for.
There is a special word (the first of the two slash-separated words in the examples) and an
alternative version of it. We will see its meaning later, but, as we can see, selecting answer 0
or 1 depends on whether we select this special word or its alternative.
The sentence, the question and the two possible answers together define a
Winograd Schema. The four conditions on this input schema are listed below:
1. The sentence must include two agents or parties, each described by a noun phrase. They
must not be distinguishable by, for example, gender or number. The two possible
answers refer to these two parties.
2. There is a pronoun or possessive adjective referring to one of the previous parties.
3. The question asks for selecting one of the two offered answers.
4. There is a special word (the first of the two slash-separated words in the previous
examples). If we replace this word with its alternative version, the selected answer changes.
Taking the first example, we can identify the two parties, “trophy” and “suitcase” (they
appear as the two possible answers 0 and 1). In addition, there is a pronoun, “it”,
referring to one of the parties. There is a question asking for one of the two answers.
Finally, there is the special word “big” and its alternative “small”. If we use “big”, the answer will
be 0, “the trophy”. However, if we change to the alternative version of the sentence
by using “small”, the logical answer will be 1, “the suitcase”.
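The four conditions can be made concrete with a small data structure. The following is only a sketch under stated assumptions: the class `WinogradSchema`, its field names and the `{word}` slot notation are hypothetical and do not come from the challenge definition. It also assumes that the special word always selects answer 0 and its alternative selects answer 1, as happens in the trophy example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WinogradSchema:
    template: str              # sentence with a {word} slot for the special word
    question: str              # question with the same {word} slot
    pronoun: str               # condition 2: the ambiguous pronoun
    answers: tuple[str, str]   # condition 1: the two parties (answer 0, answer 1)
    special: str               # condition 4: special word (assumed -> answer 0)
    alternative: str           # condition 4: alternative word (assumed -> answer 1)

    def variants(self):
        """Yield (sentence, question, expected answer) for both word choices."""
        for index, word in enumerate((self.special, self.alternative)):
            yield (
                self.template.format(word=word),
                self.question.format(word=word),
                self.answers[index],
            )


trophy = WinogradSchema(
    template="The trophy doesn't fit in the brown suitcase because it's too {word}.",
    question="What is too {word}?",
    pronoun="it",
    answers=("the trophy", "the suitcase"),
    special="big",
    alternative="small",
)

for sentence, question, expected in trophy.variants():
    print(sentence, question, "->", expected)
```

Storing both variants in one object makes it explicit that a solver must change its answer when only the special word changes.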
There is no limitation on the domain or theme of the sentences. Schemas must
satisfy these four conditions. But there are additional background characteristics
of the challenge that must also be considered ([Levesque et al., 2012]):
• Answering must be an easy task for a human reader, usually by applying some
commonsense reasoning (it could be so obvious to humans that the ambiguity or
difficulty could be imperceptible).
• It must not be easily solved with simple techniques. Applying some kind of
“intelligent” reasoning must be necessary. In general, it will require some additional
knowledge not declared in the schema.
• It must be Google-proof. That is, possible statistical correlations must not help in
solving the disambiguation problem.
Passing the challenge implies answering a complete corpus of Winograd schemas. There
is no fixed path for researchers to solve the problem, but the WSC reduces the chances
of statistical approaches, as they seem to be less powerful when addressing this kind
of problem. This does not mean that they must be discarded, but other paths using
commonsense knowledge bases may be closer to success. As schemas are not
limited to a domain, this knowledge should include information about space, time, emotions,
physical rules, social relations, and so on. In addition, this other approach does not
suffer so much from the lack of information that is evident when the starting points are these
short sentences.
4.4.2. Defining the Winograd Schema Challenge Corpus
Once we have defined the challenge, probably one of the most difficult tasks is
the creation of a sufficiently large, well-defined corpus of Winograd schemas. The first
document by Levesque includes 37 schemas, and the authors included links to a
more complete corpus with more than 140 schemas. But how must these small reading
comprehension tests be built?
Levesque described some situations under which the schemas are not valid. Most of
the examples show that the sentences include information that helps to tilt the
test toward one of the answers, or that there is no symmetry between the agents or parties
when considering the actions described in the sentence. We will see both situations in
the following examples:
• The racecar arrived at the meeting point before the bus, because it was faster.
There is a straightforward relation between being faster and a racecar, especially if we
compare it with a bus.
• The mother gave birth to a beautiful daughter. She was a happy woman.
There is a direct relation between mother and woman, and we can consider an opposite
relation with the daughter.
• The lion ate a zebra, because they were hungry.
In the sentence, there are two pitfalls. First, there is a direct relation between the verb
to eat and the condition of being hungry. Second, lions are predators and usually eat
herbivores, such as zebras. Of course, this is not in the sentence, but both associations
are easy to find with Google.
There are other situations to be considered. We cannot use schemas where a human
reader does not easily answer the question. If ambiguity persists and the answer is not
clear, the schema must be discarded. As these problems can appear without the
author noticing, it seems advisable to have human readers do a first check,
as proposed in [Bender, 2015]. Only a well-scored corpus should be considered suitable
for use in the challenge.
4.4.3. Different approaches to the Winograd Schema Challenge
Since the introduction of the WSC, several authors have proposed mainly partial solutions to
the problem. By partial we mean that, at present, there is no guaranteed
complete solution covering the general Winograd corpus. Most of these approaches
treat the challenge as a hard co-reference resolution problem. That is, they try to
find enough clues to resolve the missing relation between the pronoun and one of the
subjects. The goal is to find some element, background knowledge or data able to tilt
the balance toward one of the options.
In some cases, the proposals incorporate the use of knowledge sources in
combination with a pronoun resolver system. For example, in [Rahman et al., 2012],
the authors propose a machine learning ranking-based framework. It includes up to
eight linguistic features, defined as components, covering many different ways of
searching for an answer. The goal is to train the ranker with a solved set of
sentences and learn the response for each of the components. Then, the ranker can be
applied to new test instances. It will return the higher-ranked option as the solution.
The technique is supported by FrameNet [Baker et al., 1998] in one of the components
as its only commonsense knowledge input. They also add general knowledge about
antecedent dependencies between words. However, most of the features are based on
automated learning and search, such as the trained sentiment analysis component. It
does not seem to approach the problem by way of commonsense reasoning, but
the reported results exceed 73% when using a general WSC corpus, much better
than most of the current approaches.
A different approach is presented in [Sharma et al., 2015]. The authors designed a
semantic parser (K-parser) able to consider not only the lexical dependencies, but also
several semantic relationships. It is supported by a knowledge hunter based on
searching over “Google”. By using an adapted version of the sentence (string query),
the system tries to find texts related to the schema under study. The parser is able
to work only with sentences having specific relationships among events and their
causalities. This causal relation is needed to solve a question where the query is
created from an ordered version of the sentence. The most difficult schemas are those
where this relation does not exist, because they can only be solved with previous
knowledge about the concept under study.
Other proposals focus on knowledge representation. For example, [Schuller,
2014] bases its work on Relevance Theory to define the evaluation part over knowledge
graphs. The input is transformed into a detailed knowledge graph and then
compared with other background graphs to select the correct answer. In [Bova et al.,
2015] the schemas are translated into First Order Logic relations and the resulting
representation is used as a query to the ConceptNet semantic network ([Liu et al., 2004]).
The work presented in [Peng et al., 2015] highlights the huge amount of background
knowledge needed in the co-reference resolution process. They try to obtain this
knowledge from multiple resources, such as Wikipedia, web queries, or the Gigaword
corpus. They propose the concept of Predicate Schemas to reduce the problem
to the relevant relations to be studied. By replacing the pronoun inside these
predicates with every possible answer, the system searches for the most probable
solution according to the background knowledge.
As we will see in the next chapter, our approach avoids tackling the co-reference directly.
We will try to model, by way of logical rules, a reduced world where the
action takes place. The relations among subjects (agents) and the pronoun will be a
result of these models. Additionally, the background information inside the schema
will define the behavior of the agents inside the represented world.
PART III: PROPOSAL
5. PROBLEM APPROACH
5.1. Introduction
As mentioned before, the Winograd Schema Challenge (WSC) basically consists of
answering a question about a sentence by solving an anaphora problem. Every
sentence describes some events and actions always involving two agents. Both
appear in this sentence as part of noun phrases and one of them is additionally
represented by a pronoun or possessive adjective (we will refer to both simply as the pronoun
to make reading easier). To solve the anaphora problem, we must find which of these
two subjects is also referenced by the pronoun.
This simple Question Answering test proposed by [Levesque et al., 2012] is not
interested in distinguishing a human from a machine through an intelligence evaluation. Its
main goal is to verify whether a machine is able to demonstrate enough intelligence to reason
like a human (or simulate it).
A first logical approach to the problem is trying to find the linguistic relations and
statistical associations between the agents and the pronoun, but if the sentence follows
the Challenge rules, this approach will fail to find the right answer [Levesque, 2014]. Statistical
and brute force approaches seem to be unsuitable options, at least when they are
used alone without additional knowledge.
The main relations and associations needed to solve the problem are not included in
the sentence and we cannot find them without a basic knowledge of the small part of
the world involved. If this knowledge cannot be extracted from the sentence,
then the main question is where we can obtain it and how we can use it to give the right
answer to the problem.
5.2. Solving the WSC with a Model-Based System
We have seen the main difficulties concerning the WSC problem and the importance of
commonsense reasoning to overcome them (see Chapter 4.4). We have also
described why this challenge requires specific knowledge about the world
we are trying to understand. Our proposal is based on constructing this knowledge
from two different sources of information:
First, we will consider that every WSC sentence can be related to a
domain. This domain represents a small closed part of the world where all the actions
described in the sentence can be performed. Every domain must be described with all the
general logical rules defining and limiting the behavior of every subject or agent living
inside this reduced world.
Second, we must declare the domain properties of every agent involved in the
sentence. That is, according to the scope of the domain, we will extract from the sentence
how the agents are related to the small world. These can be considered the specific
properties that make each sentence different.
The idea of describing a reduced world has been used in many situations
when trying to understand natural language stories. For example, in [Shank et al., 1977]
the authors proposed the definition of scripts describing sequentially the main parts of a
story. Mueller also proposed in [Mueller, 2003b] and [Mueller, 2004a] many of these
concepts to understand stories by way of scripts and models.
With both a set of domain rules and the specific properties, we will construct models
representing what is happening in the sentence. The challenge consists of finding
which of the two agents from the Winograd schema is also described by the pronoun.
So, we will create two different models, substituting the pronoun with a different
agent in each. If the rules and the properties of every agent are correctly defined, only
one of these models will be valid, and the agent used in it will be the right answer.
Table 1 includes several schema examples from [Levesque et al., 2012], with the
agents extracted, the relevant information including the agent properties, and
several symbolic rules that could explain what is happening in the sentences.
The domain rules are not limited to a specific sentence. They must be general enough
to be applied to every sentence inside the same domain.
The agent states and properties must be coherent with the selected domain. If the
essence of the sentence is related to space and time, we are probably not interested in the
names of the agents or how they are dressed.
Winograd Schema: John couldn’t see the stage with Billy in front of him because he
is so [short/tall]. Who is so [short/tall]?
Domain / Essence: Do something according to position
Agents + Pronoun: John, Billy + He
Agent states & properties:
• John can’t see (the stage)
• Billy is in front of John
• He is [short/tall]
General Rules:
• If Agent1 [short/tall] & Agent2 is in front of Agent1 => Agent1 [can’t/can] see objects

Winograd Schema: The sack of potatoes had been placed [above/below] the bag of
flour, so it had to be moved first. What had to be moved first?
Domain / Essence: Do something according to position
Agents + Pronoun: Sack of potatoes, Bag of flour + It
Agent states & properties:
• Sack & Bag placed at same point
• Sack [above/below] Bag
• It had to be moved first
General Rules:
• Agent1 placed after Agent2 => Agent1 placed above & it can be moved first

Winograd Schema: The firemen arrived [after/before] the police because they were
coming from so far away. Who was coming from so far away?
Domain / Essence: Movement of agents
Agents + Pronoun: Firemen, Police + They
Agent states & properties:
• Firemen arrived [after/before]
• They are Far Away
General Rules:
• Distance is directly proportional to Time

Winograd Schema: The trophy does not fit in the brown suitcase because it’s
too big. What is too big?
Domain / Essence: Spatial relation between agents
Agents + Pronoun: Trophy, Suitcase + It
Agent states & properties:
• Trophy does not fit in Suitcase
• It is too big
General Rules:
• Agent1 fit Agent2 => Agent1 Size < Agent2 Size
• Too Big => Greater Size than…

Table 1. Sentence – Agents – Agent states & properties – Rules examples.
It is also important to consider the pronoun as a third agent. At the beginning, we do
not know which of the agents the pronoun refers to, but all its related states and
properties will be a key aspect in the understanding process. In each of the two
models, we will replace the pronoun with one of the agents and all the pronoun
properties will pass to that agent. For example, in the last example from the table:
• The first model will consider the Trophy agent as “too big” (like the pronoun).
• The second model will consider the Suitcase agent as “too big”.
• According to the agent properties and the domain rules, if we substitute the pronoun
with Trophy, all the rules are consistent:
not (Trophy fit Suitcase) => not (Trophy Size < Suitcase Size) ≡ Trophy Size >= Suitcase Size
Trophy Too big ≡ Trophy Size > Suitcase Size
• But if we substitute the pronoun with Suitcase, we obtain two
inequalities that contradict each other:
not (Trophy fit Suitcase) => not (Trophy Size < Suitcase Size) ≡ Trophy Size >= Suitcase Size
Suitcase Too big ≡ Suitcase Size > Trophy Size
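The check on the two models can be reproduced with a few lines of code. This is a minimal sketch and not the evaluation mechanism of the proposed system: it assumes that the rule "fit" is read as an equivalence with "Size <", that sizes can be represented by a few small integers, and that a model is valid when at least one size assignment satisfies all its constraints. The names `fits`, `too_big` and `model_is_valid` are hypothetical.

```python
from itertools import product

AGENTS = ("Trophy", "Suitcase")


def fits(size: dict) -> bool:
    """Domain rule, read as an equivalence: Trophy fits in Suitcase iff it is smaller."""
    return size["Trophy"] < size["Suitcase"]


def too_big(agent: str, size: dict) -> bool:
    """Pronoun property: the agent is too big if it is greater than the other agent."""
    other = AGENTS[1 - AGENTS.index(agent)]
    return size[agent] > size[other]


def model_is_valid(pronoun_agent: str, sizes=range(1, 4)) -> bool:
    """A model is valid if some size assignment satisfies every constraint."""
    for trophy, suitcase in product(sizes, repeat=2):
        size = {"Trophy": trophy, "Suitcase": suitcase}
        # Facts of the sentence: the trophy does not fit, and the pronoun is too big.
        if not fits(size) and too_big(pronoun_agent, size):
            return True
    return False


# One model per candidate agent; keep the agents whose model is consistent.
valid_agents = [agent for agent in AGENTS if model_is_valid(pronoun_agent=agent)]
print(valid_agents)
```

Only the model that replaces the pronoun with Trophy admits a size assignment, so the search returns a single agent. A real implementation would probably delegate this search to a logic or constraint solver instead of enumerating values.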
Figure 4 summarizes how we can perform all the related tasks. The starting point will be a
Winograd Schema as question input and the resulting output will be the answer to
the question:
[Figure 4: Schema Question → INFORMATION EXTRACTION → Information → MODEL GENERATION → Models → MODEL EVALUATION → Answer]
Figure 4. Basic schema of a Model-Based WSC Solver
The first part is an Information Extraction process, where the relevant information
existing in the schema will be extracted. Using this information, the system will select a
domain representing the essence of the schema.
According to this domain, the next step will be the Model Generation, where two different
logic Models are defined, one for each possible solution. The models will represent the
same story, but each assumes a different agent to be the referent of the pronoun.
The resulting models must be coherent with the data from the schema and the
knowledge (and rules) from the selected domain. If this knowledge and data are
complete enough, only one of these models will be logically valid. The agent selected
for this valid model will be the answer to the problem.
In 4.4.3 we have reviewed several approaches to solve the WSC problem. We have
introduced several proposals also supported by a relevant data extraction process
([Rahman et al., 2012], [Schuller, 2014] or [Sharma et al., 2015]). Others also include a logic
representation of events and data from the schemas, for example [Bailey et al.,
2015], [Sharma et al., 2015] and [Bova et al., 2015].
Most of these approaches try to represent the sentence from the schema in some way.
The main difference in our proposal is that we will not try to create this
representation. The structure of the sentence and the narrative elements will be discarded
after the information extraction process. We will just keep the relevant information for
every participant (agent) and how this information is related to the selected domain.
The understanding effort will rest on the rule set of the domain.
This solution reduces the amount of knowledge needed to solve the problem
by focusing only on what is relevant for the understanding. But the definition of
domain-dependent models requires a bigger manual effort. We need to
distinguish which information is relevant to select the domain and the properties of
the agents. This is only possible with a very accurate processing of the sentence.
The next chapter describes how we have addressed these issues by modeling and
implementing a Winograd Schema Challenge Domain-Based Solver.
6. ADDRESSING THE WSC WITH A MODEL-BASED SYSTEM
6.1. Introduction
To understand and solve the WSC we propose a Model-Based System. This system
will perform a three-step process, described in more detail in Figure 5.
[Figure 5: Schema Question → INFORMATION EXTRACTION (supported by the DOMAIN KNOWLEDGE DATABASE) → Information (DOMAIN + AGENT PROPERTIES) → MODEL GENERATION (supported by the DOMAIN MODEL DATABASE) → Model A, Model B → MODEL EVALUATION → Answer]
Figure 5. Different steps performed by the Model-Based WSC System
First, we must implement a full Information Extraction process from the schema. In
this step, the domain and the properties of each agent will be defined. Only the information
relevant to the domain will be considered. A background Domain Knowledge
Database will support the extraction process. It will contain all the possible domains
and the words and relations that can be useful for every specific domain. By detecting
different database entries inside the sentences, the system will be able to match the
correct domain and select the relevant information.
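The matching between database entries and the words of a sentence can be sketched as a simple overlap count. This is only an illustration under stated assumptions, not the actual database design: the dictionary `DOMAIN_KEYWORDS`, its keyword sets and the function `match_domain` are hypothetical, and the list of lemmas is written by hand instead of being produced by an NLP tool. The two domain names are the ones used as examples in this document.

```python
# Hypothetical domain lexicon: domain name -> lemmas that suggest it.
DOMAIN_KEYWORDS = {
    "Movement of Two Agents to a Meeting Point": {"arrive", "come", "far", "before", "after"},
    "Spatial relation between agents": {"fit", "big", "small", "in"},
}


def match_domain(lemmas: list[str]) -> tuple[str, set[str]]:
    """Return the domain with the largest keyword overlap and the matched lemmas."""
    found = set(lemmas)
    best = max(DOMAIN_KEYWORDS, key=lambda domain: len(DOMAIN_KEYWORDS[domain] & found))
    return best, DOMAIN_KEYWORDS[best] & found


# Hand-written lemmas for:
# "The firemen arrived after the police because they were coming from so far away"
lemmas = ["the", "fireman", "arrive", "after", "the", "police", "because",
          "they", "be", "come", "from", "so", "far", "away"]
domain, relevant = match_domain(lemmas)
print(domain, sorted(relevant))
```

The matched lemmas are also the candidates for the relevant information of the sentence, since every other word is ignored once the domain has been selected.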
Next, the Model Generation will configure two models representing what is happening
in the schema. It will use an external Domain Model Database including the set of
rules for every domain. In this step the system will fit the states and properties of the
agents into a set of rules defined according to the Domain.
Finally, these models will be evaluated to find whether there is a unique logically valid model.
If so, the agent associated with that model will be the answer to the WSC problem. This
task will be performed by the Model Evaluation process.
6.2. Relevant Information for the understanding process
Every model will have three main parts:
• A set of general logical rules explaining the small world or domain detailed in the
sentence. These rules control the state and behavior of the agents and are identical
for every sentence classified in the same domain.
• A second set of logical rules adding the special states and properties of both agents
for the specific sentence. These states and properties will be represented by a set
of agent Parameters.
• The last set of rules will be different for each model because it will include the
pronoun information. We will create two different sets by considering the pronoun to
be one of the agents.
Therefore, the model generation will require enough information to determine the following
elements:
• Domain: To select the set of rules.
• Agents: To detect the parameters inside the schema.
• Properties: Detailed information about the states and properties of all the agents
(including the pronoun) inside the schema.
The following Winograd schema will help us during this explanation (seventh schema
from the corpus listed in [Levesque et al., 2012]):
“The firemen arrived [after/before] the police because they were coming from so far away”
Question: Who was coming from far away?
Answers: The Firemen / The Police.
Matching the right Domain will allow the selection of the right rules we need to
understand what is happening. In the example, the Domain could be Movement of
Two Agents to a Meeting Point, and it does not need detailed information about when
the agents moved, where they were coming from or how they moved.
Once we have selected the Domain, the next step will be identifying the Agents. Let’s
denote both agents of the sentence as AgentX and AgentY. We will also declare a third
agent to represent the pronoun as AgentQ. In the example, the agents are “The
Firemen”, “The Police” and the pronoun “They” respectively. Solving the problem
means finding the right equality between these two options:
AgentQ = AgentX | AgentQ = AgentY
Our last definition refers to every word related to the selected Domain and
describing the states and properties of the agents. We will denote these selected words
for every agent as Properties: DataX, DataY and DataQ respectively. The sentence
does not give much information about the agents, but it is enough to solve the
problem, as we will see later.
There are two relevant words in the sentence: “after” (or “before”) and “far away”. They
can be considered as part of DataX and DataQ respectively. In fact, “far away” is also
related to one of the agents, X or Y, but this association cannot be established with certainty
before understanding the sentence. At this point, we can only ensure that AgentQ is “far away”
from somewhere.
Figure 6 represents all this relevant information about the sentence: Domain, Agents
and Properties. These elements will be the pillars for the generation of the models that
will be used in the understanding process.
Figure 6. Domain, Agents and Parameters in the example.
Now, the question is how we can extract this information from a Winograd schema and
how we can use it in the understanding process.
6.3. Domain, Agents and Parameters extraction
The understanding process of a Winograd schema requires two kinds of
knowledge. The first is the information included in the schema and
the second is the information not declared in it. The Information Extraction task
deals with the first kind of knowledge. We will see later how the second one can be
obtained in the Model Generation process. Figure 7 represents this
Information Extraction process. The input is the Schema Question and the result
is the Domain, the Agents and their Properties.
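[Figure 7: flow diagram. The Schema Question (sentence, question, answers) goes through Natural Language Processing, producing the Annotated Schema (lemma, POS, dependencies, …). Domain + Agents Selection, supported by the Domain Knowledge Database, outputs the Domain and the Agents; Properties Selection, also supported by the database, outputs the Properties.]
Figure 7. Information Extraction process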
Every schema includes three distinct parts: (I) the sentence, (II) a question and
(III) two possible answers. The sentence and the question are used to extract the
Domain and the Parameters, and the answers directly give us the Agents involved
in the sentence.
First, the schema must be processed to split the sentence and the question into
tokens and to build the set of these words with their
properties and the dependencies among them. In fact, we will use only these
dependencies, the lemma and the POS properties (see Chapter 4.2).
The understanding process probably does not need all the words from the annotated
schema. For example, if the question is about who arrived earlier or later at some place,
any word explaining how the agents were dressed, or what the relationship between
them was, seems to be useless.
As represented in Figure 7, the Domain and the Agents are selected with the support of
the Domain Knowledge Database. Then, the Properties are selected
from the previous Domain and Agents values, also supported by the Database.
A more detailed explanation of the extraction process follows.
6.3.1. Agent Selection from a Schema answers
In every schema, the included answers directly represent the two agents that could be
related to the pronoun, so no additional work is needed to obtain the Agents. In
the example from Figure 6, the two possible answers to the question “Who was coming
from far away?” are: (I) The Firemen and (II) The Police. Both are the Agents
we are looking for.
As we have noted before, there is a third agent, AgentQ, representing the pronoun.
Selecting the agents also means finding this word. It can be done by searching, among the
POS values, for all the pronouns in the sentence. Usually only one word will be found,
but if there are more options, it will be necessary to pick the right one. In these
cases, the method that we are going to apply is a deeper study of the question.
The question will contain an interrogative pronoun like Who, What, Which, etc. The
action or verb related to this pronoun will frequently be the same as the verb related
to the searched pronoun. An erroneous or incomplete agent selection will
certainly lead to a wrong answer.
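The pronoun search described above can be sketched in Python as follows. This is only an illustrative sketch: the token format (word, POS tag, lemma of the governing verb), the Penn Treebank tag sets and the function name select_agent_q are assumptions made for the example, not part of the system described here.

```python
from typing import List, Optional, Tuple

# Hypothetical annotated token: (word, POS tag, lemma of the governing verb).
# The governing verb is assumed to come from the dependency information.
Token = Tuple[str, str, str]

PRONOUN_TAGS = {"PRP", "PRP$"}      # Penn Treebank personal/possessive pronouns
WH_TAGS = {"WP", "WP$", "WDT"}      # interrogative pronouns and determiners


def select_agent_q(sentence: List[Token], question: List[Token]) -> Optional[str]:
    """Return the pronoun of the sentence that plays the role of AgentQ."""
    pronouns = [tok for tok in sentence if tok[1] in PRONOUN_TAGS]
    if not pronouns:
        return None
    if len(pronouns) == 1:
        return pronouns[0][0]
    # Several pronouns: prefer the one governed by the same verb as the
    # interrogative pronoun of the question.
    wh_verbs = {tok[2] for tok in question if tok[1] in WH_TAGS}
    for word, _tag, verb in pronouns:
        if verb in wh_verbs:
            return word
    return None  # still ambiguous: no pronoun shares the question verb


# Hand-annotated fragment of "John couldn't see the stage with Billy in
# front of him because he is so short. Who is so short?"
sentence = [("John", "NNP", "see"), ("him", "PRP", "see"),
            ("he", "PRP", "be"), ("short", "JJ", "be")]
question = [("Who", "WP", "be"), ("short", "JJ", "be")]
agent_q = select_agent_q(sentence, question)
```

In this fragment both “him” and “he” are pronouns, and the question verb “be” singles out “he” as AgentQ.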
6.3.2. Domain Selection by detecting the main actions of the sentence
The Domain selection is a much harder question. In addition, an error in this phase will
definitely lead to a wrong (or random) solution. The method we propose is
to discover the essence of the schema from the main verbs and other
special words of the sentence and the question.
According to the Cambridge Dictionary (http://dictionary.cambridge.org/es/), a verb is “a word or
phrase that describes an action, condition, or experience” and the main verbs “have
meanings related to actions, events and states”. If we identify the main verbs, we will
be close to discovering what is happening in the sentence. At this point, we are not
interested in the details of the sentence, so it is not necessary to understand, for
example, the relations among these actions and the agents.
Also, the rest of the words can adjust or confirm the previous choice. For example, words
like “short” or “tall” are related to spatial dimensions. If there is a verb such as “to
place” or “to see”, the selection of a Domain like “Space Position” will be reinforced by
those words. This extra information is especially useful when the verbs are more
ambiguous or general, like “to be” or “to have” (when they are not used as auxiliary
verbs). Paying attention to the special word defined in the schema and to its alternative
word is very important. Usually, both words will be relevant to understand the text and,
therefore, to find the correct Domain. Table 2 includes again several examples from
[Levesque et al., 2012] with the main verbs, the special words and a Domain proposal.
Sentence + Question | Main Verb | Domain
John couldn’t see the stage with Billy in front of him because he is so [short/tall]. Who is so [short/tall]? | See + Be + Short/Tall | See something according to agent position
The sack of potatoes had been placed [above/below] the bag of flour, so it had to be moved first. What had to be moved first? | Place + Move First | Move something according to agents position
Jane knocked on Susan’s door, but she didn’t [answer/get an answer]. Who didn’t [answer/get an answer]? | Knock + Door + Answer | Communication between two agents
Ann asked Mary what time the library closes, [but/because] she had forgotten. Who had forgotten? | Ask + Forget But/Because | Communication between two agents
The firemen arrived [after/before] the police because they were coming from so far away. Who were coming from so far away? | Arrive + Come | Movement of agents to a meeting point
Jim [yelled at/comforted] Kevin because he was so upset. Who was upset? | Yelled at + Comforted | Emotional action from one agent to another
The trophy does not fit in the brown suitcase because it’s too [big/small]. What is too [big/small]? | Fit + Big/Small | Spatial relation between two agents
Table 2. Schema – Domain Association database entries.
In many cases, there will be several main verbs. The Domain is selected
according to this combination, but not all the situations lead to the same choice. For
example, for the combination in the second case, Place + Move, the main action is Move
and the verb Place conditions the movement of the agents. Let us change the last part
of this sentence, replacing “so it had to be moved first” with “so I couldn’t see it”. In the first
case, the position determines which agent can be moved first, and in the second one, the
position defines which agent is visible to an observer.
The multiple branches created by the combination of different verbs can be handled by
creating more general models. In both cases the position of each
agent is the main fact: it determines whether one agent can be moved before the other, or
whether it is visible or not.
Other examples include redundant information. For example, the pair Arrive + Come
can be perfectly summarized with the first verb. We will also find very special situations
such as the one described in the sixth schema. The main verbs, Yelled at + Comforted, are
related to human behaviors describing emotional actions from one agent to another.
In this case, a coherent Domain can be selected only in combination with an agent state,
was upset.
It has not been necessary to present many examples to verify the difficulty of this task.
Every possible verb and its different combinations should be considered to cover all the
schemas. We will see later the need to narrow the problem and focus on those
schemas covered by our system. By increasing the number of models and making
them more general, we will widen the set of schemas that can be solved.
The Domain selection starts with the selection of all the verbs by means of the POS
results. The resulting Verbs List is cross-checked against the entries of the Domain
Knowledge Database, populated with Verb Combinations, special related words and
the corresponding domains. The result will be (I) none, (II) one or (III) several
Domain candidates.
Only when there are several options does the system need to decide which value is the best for
the schema. It does so by measuring how close the sentence is to every option, defining
a proximity value to each domain for every selected word. That
is, we quantify the sentence and select the domain with the biggest weight. In the
end, we will have only one possible value, but the risk of failure increases with the
number of initial candidates.
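The selection procedure described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the database contents, the proximity values and the function names candidate_domains and select_domain are hypothetical, chosen only to illustrate the three possible outcomes (none, one or several candidates).

```python
from typing import Dict, FrozenSet, List, Optional

# Hypothetical Domain Knowledge Database entries: verb combination -> domain.
DOMAIN_BY_VERBS: Dict[FrozenSet[str], str] = {
    frozenset({"arrive", "come"}): "Movement of agents to a meeting point",
    frozenset({"place", "move"}): "Move something according to agents position",
    frozenset({"fit"}): "Spatial relation between two agents",
}

# Hypothetical proximity values of individual words to each domain.
PROXIMITY: Dict[str, Dict[str, float]] = {
    "Movement of agents to a meeting point": {"far": 1.0, "after": 0.5, "before": 0.5},
    "Move something according to agents position": {"above": 1.0, "below": 1.0, "first": 0.5},
    "Spatial relation between two agents": {"big": 1.0, "small": 1.0},
}


def candidate_domains(verbs: List[str]) -> List[str]:
    """Domains whose verb combination is fully contained in the Verbs List."""
    verb_set = set(verbs)
    return [domain for combo, domain in DOMAIN_BY_VERBS.items() if combo <= verb_set]


def select_domain(verbs: List[str], words: List[str]) -> Optional[str]:
    """Return no domain, the single candidate, or the candidate with the biggest weight."""
    candidates = candidate_domains(verbs)
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]

    def weight(domain: str) -> float:
        # Sum of the proximity values of the selected words to this domain.
        return sum(PROXIMITY.get(domain, {}).get(word, 0.0) for word in words)

    return max(candidates, key=weight)


domain = select_domain(["arrive", "come"], ["after", "far"])
```

With a single matching verb combination no weighting is needed; the weights are only used to break ties among several candidates.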
6.3.3. Selection of relevant Information about the Agents
Once the Domain has been selected, the system is ready to build a first list of relevant
words from the Domain Knowledge Database. This is the same database
described in the previous point, and it includes all the words that could add
information for every Domain. The more options considered in the database,
the wider the covered knowledge.
For example, if we are interested in Movement, the list should include words such as
near, far, distance, longer, faster, slowly, etc. A different selection, Position, should
populate the list with words like under, over, behind or aside. These are what we call
“relevant” words for a specific Domain. This database can be internal or external, and
can be complemented with additional elements like ontologies, synonyms and lexical
databases or any other resource able to help in the relevant word selection process.
The next step is finding all the matches between the sentence and the relevant
words list. This generates a second list with all the candidates to be incorporated into
the Properties set.
Finally, through a more detailed study of all the references of these candidate
words, we check which of them are related to the agents and the main actions
previously selected. This relation can be found in the dependency information.
According to the kind of existing dependencies, the candidate words are confirmed
and associated with their corresponding agent. The result is the final list of
Agent Properties. Every word qualifies the agent somehow and allows the system
to describe the behavior of the agents inside the logic models we want to construct.
Usually, the words will introduce a comparison between agents. So, we could use
symbols like “+” and “-” as qualifiers. For example, the word “later” could be associated
with “+” and “before” with “-”. This value can change depending on the Domain or can
be the same for every situation. Also, some words will help to identify the states of the
agents. In this case we can directly declare this state: waiting, visible, happy and so on.
The interpretation of these values is done later, during the model generation. Several
words are also defined as modifiers. They can reverse or increase the
value of the affected word. For example, “very” will convert the qualifier for “Big” into
“++”. In addition, a negation reverses the qualifier value: the expression “Not
before” returns “+” instead of “-”. The intelligence applied in this
process is a key aspect of a good definition of the agent Parameters.
Table 3 represents several database entries with the Domain-word-value association:
Domain | Type | Word | Qualifier
General | Time | Before | -
General | Time | After | +
General | Position | Behind | -
General | Position | In Front | +
Fit | Position | Small | -
Fit | Position | Big | +
See | Position | Under | -
See | Position | Over | +
Move | Distance | Near | +
Finish Task | Modifier | Near | +
Move | Distance | Far away | ++
Table 3. Relevant Word list with Domain, Type and Qualifier.
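The qualifier lookup and the effect of modifiers and negations described in 6.3.3 can be sketched in Python as follows. This is a sketch under stated assumptions: the integer encoding of the qualifiers (“-” = -1, “+” = +1, “++” = +2), the modifier lists and the function names qualify and as_symbol are hypothetical; only the entries of the dictionary are taken from Table 3.

```python
from typing import Dict, Optional, Sequence, Tuple

# A few (domain, word) -> qualifier entries from Table 3, encoded as signed
# integers (assumed encoding: "-" = -1, "+" = +1, "++" = +2).
QUALIFIERS: Dict[Tuple[str, str], int] = {
    ("General", "before"): -1,
    ("General", "after"): +1,
    ("Fit", "small"): -1,
    ("Fit", "big"): +1,
    ("Move", "near"): +1,
    ("Move", "far away"): +2,
}

INTENSIFIERS = {"very", "so", "too"}   # assumed list of increasing modifiers
NEGATIONS = {"not", "n't", "never"}    # assumed list of negations


def qualify(domain: str, word: str, modifiers: Sequence[str] = ()) -> Optional[int]:
    """Look up a word for a domain (falling back to General) and apply its modifiers."""
    value = QUALIFIERS.get((domain, word.lower()))
    if value is None:
        value = QUALIFIERS.get(("General", word.lower()))
    if value is None:
        return None                      # not a relevant word for this domain
    for modifier in modifiers:           # modifiers are applied in the given order
        modifier = modifier.lower()
        if modifier in INTENSIFIERS:
            value += 1 if value > 0 else -1
        elif modifier in NEGATIONS:
            value = -value
    return value


def as_symbol(value: int) -> str:
    """Render a signed integer back into the '+' / '-' notation of Table 3."""
    return ("+" if value > 0 else "-") * abs(value)


very_big = as_symbol(qualify("Fit", "big", ["very"]))
not_before = as_symbol(qualify("General", "before", ["not"]))
```

Here “very big” yields “++” and “not before” yields “+”, matching the two examples given in 6.3.3.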
6.4. Model description by way of the Event Calculus
The model must be expressed in a logical language to allow its evaluation with a Logic
Solver. The option we have selected is the Discrete Event Calculus (DEC) language
due to its powerful representation of events and their effects on the agents by means of
fluents ([Shanahan, 1999] and [Mueller, 2006]).
As we have described before (see Chapter 3.4), the Event Calculus uses a many-sorted
first-order logic, in which the following sorts are defined (summary of
Chapter 3.4):
 Event sort (e1, e2, …), Fluent sort (f1, f2, …), Timepoint sort (t1, t2, …)
And the predicates:
 Happens(e, t), HoldsAt(f, t), ReleasedAt(f, t)

 Initiates(e, f, t), Terminates(e, f, t), Releases(e, f, t)
 Trajectory(f1, t1, f2, t2), AntiTrajectory(f1, t1, f2, t2).
DEC restricts the time point sort to integer values. As we will see later, this restriction
does not affect the description capabilities needed. Annex A includes a detailed
description of the twelve axioms and definitions defined for DEC.
A valid model for a Winograd schema must include the rules related with the
represented world and a temporal narrative of the states and events described in the
schema. The first part is the result of the Domain selection and it will be denoted as
General predicates. The second part is obtained from the agent Properties and it will
be denoted as Narrative predicates. Finally, to relate both predicate groups we must
include all the agents from the schema.
We will use the following Winograd schema to explain both parts of the model:
“John was waiting for Peter because he was [late/early].”

Question: Who was [late/early]?
Answers: John / Peter.
This is the information we have from the sentence:
 The Domain for this new schema could be: “Two agents arrive at a meeting
point”. A more general choice is also possible, as we will see in the next chapters:
“Agents Movement”, for example.
 The agents are “John” and “Peter”. The pronoun is “He”.
 John is waiting at the meeting point.
 One of the agents (He) is late/early. He arrives later/earlier than the other agent.
6.4.1. General predicates
The first part of the model must include all the new sorts, events and fluents. Its
expression in DEC is:
(1.1) The events of the model are:

Arrive(agent)
(1.2) The fluents or states for the agents are:

Moving(agent)
Waiting(agent)
AtMeetingPoint(agent)
These declarations are followed by the predicates defining the common rules of the
described world. Both groups of assertions, 1.x and 2.x, form the General predicates:
(2.1) When an agent arrives, he will stay at the Meeting Point.

Initiates(Arrive(agent), AtMeetingPoint(agent), time)
(2.2) When an agent arrives, it will stop moving.

Terminates(Arrive(agent), Moving(agent), time)
(2.3) If an agent is waiting, he is not moving

HoldsAt(Waiting(agent), time) => ¬HoldsAt(Moving(agent), time)
(2.4) If an agent is not moving, he cannot arrive (he already did).

¬HoldsAt(Moving(agent), time) => ¬Happens(Arrive(agent), time)
(2.5) If an agent is moving, he is not at the Meeting Point.

HoldsAt(Moving(agent), time) => ¬HoldsAt(AtMeetingPoint(agent), time)
(2.6) If agent2 arrives and agent1 is not waiting, then agent2 will start waiting.
¬HoldsAt(Waiting(agent1), time) ^ agent1 ≠ agent2 =>
Initiates(Arrive(agent2), Waiting(agent2), time)
(2.7) If agent2 arrives and agent1 is waiting, then agent1 will stop waiting.
HoldsAt(Waiting(agent1), time) => Terminates(Arrive(agent2), Waiting(agent1), time)
6.4.2. Narrative predicates
The Narrative predicates include a temporal description of the events affecting
every agent. In this case, this narrative could be:
(3.1) At Time = 0: Both agents are moving:

HoldsAt(Moving(John), 0)
HoldsAt(Moving(Peter), 0)
(3.2) At Time = t1: John arrives at the meeting point and starts waiting for Peter:
Happens(Arrive(John), t1)
HoldsAt(Waiting(John), t1 + 1)
HoldsAt(AtMeetingPoint(John), t1 + 1)
(3.3) At Time = t2 > t1: Peter arrives. John stops waiting:

Happens(Arrive(Peter), t2)
HoldsAt(AtMeetingPoint (Peter), t2 + 1)
¬HoldsAt(Waiting(John), t2 + 1)
The narrative has redundant information. For example, the fluents stating that an
agent is AtMeetingPoint are automatically deduced from 2.1 when the event Arrive
happens. Also, the fluent Waiting in 3.3 is deduced from 2.7.
In fact, if we consider only the information stated in the sentence, we cannot confirm
which agent arrives before the other. If we translate it to the previous notation
for the agents, we must describe the narrative as follows (we will use the sentence with
the “late” option, so AgentQ arrives later than the other one). In addition, we know
that, in any case, John is in the Waiting state:
(3.1) At Time = 0: Both agents are moving:

HoldsAt(Moving(AgentX=John), 0)
HoldsAt(Moving(AgentY=Peter), 0)
(3.2) At Time = t1: One agent (not AgentQ) arrives at the meeting point and John starts waiting:
Happens(Arrive(Not-AgentQ), t1)
HoldsAt(Waiting(AgentX=John), t1 + 1)
(3.3) At Time = t2 > t1: AgentQ arrives. John stops waiting:

Happens(Arrive(AgentQ), t2)
The only valid solution for this model is AgentQ = AgentY = Peter. That is, the agent
expressed in 3.2 as Not-AgentQ must be AgentX because of predicate 2.6. This is the only
way to make the predicate HoldsAt(Waiting(AgentX=John), t1 + 1) true, and this condition is
stated in the sentence: “John was waiting…”. If AgentQ = AgentX = John, the predicates in
3.2 fail in the verification process. Peter must necessarily be AgentQ: the answer.
This narrative is the key to finding the right answer. When we transform the information
from the sentence into Event Calculus predicates, we can obtain two different models,
each one with a different value for AgentQ. But only one of them will pass the test.
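The reasoning of this section can be reproduced with a small forward simulation. The sketch below is a hand-written approximation in Python, not a Discrete Event Calculus reasoner: the function names, the set-based state and the concrete times t1 = 2 and t2 = 4 are assumptions made for the example. It applies rules 2.1, 2.2, 2.4, 2.6 and 2.7 and checks, for each candidate value of AgentQ, whether John is Waiting at t1 + 1.

```python
from typing import Dict, List, Set, Tuple

Fluent = Tuple[str, str]   # (fluent name, agent), e.g. ("Waiting", "John")


def simulate(agents: List[str], arrivals: Dict[str, int], horizon: int) -> List[Set[Fluent]]:
    """Return the set of fluents that hold at each timepoint 0..horizon."""
    state: Set[Fluent] = {("Moving", a) for a in agents}        # narrative 3.1
    history = [set(state)]
    for t in range(horizon):
        nxt = set(state)
        for agent, when in arrivals.items():
            if when != t:
                continue
            if ("Moving", agent) not in state:                   # rule 2.4
                raise ValueError(f"{agent} cannot arrive at t={t}")
            nxt.discard(("Moving", agent))                       # rule 2.2
            nxt.add(("AtMeetingPoint", agent))                   # rule 2.1
            for other in agents:
                if other == agent:
                    continue
                if ("Waiting", other) in state:
                    nxt.discard(("Waiting", other))              # rule 2.7
                else:
                    nxt.add(("Waiting", agent))                  # rule 2.6
        state = nxt                    # effects of an event at t hold at t + 1
        history.append(set(state))
    return history


def model_holds(agent_q: str, agents: List[str], waiting_agent: str,
                t1: int = 2, t2: int = 4) -> bool:
    """'Late' variant: AgentQ arrives at t2, the other agent at t1 < t2."""
    arrivals = {a: (t2 if a == agent_q else t1) for a in agents}
    history = simulate(agents, arrivals, horizon=t2 + 1)
    # Condition stated in the sentence: the waiting agent is Waiting at t1 + 1.
    return ("Waiting", waiting_agent) in history[t1 + 1]


agents = ["John", "Peter"]
valid = [q for q in agents if model_holds(q, agents, waiting_agent="John")]
print(valid)   # ['Peter']
```

Only the model with AgentQ = Peter is consistent with John waiting at t1 + 1, which agrees with the answer derived above.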
This is how we must define our model with Event Calculus. The selected Domain
declares the 1.x, 2.x and 3.x predicates, which together are denoted as the Template for that
Domain. The Agents and their Properties fill in the 3.x predicates. The next chapter
explains how this can be accomplished and how the final models are generated.
An important point we must take into account is that the templates must be written
manually, which limits the number of schemas that can be addressed by this method. In
fact, the greatest difficulty faced in this project is the creation of sufficient templates to
cover as many schemas as possible.
6.5. Model generation by using Domain, Agents and Parameters
Figure 8 describes how the model generation can be done for a given Winograd
schema. The Information Extraction, described in 6.3, outputs the values for the
Domain, the Agents and their related Properties.
The Model Generation process uses this output to perform two main functions:
 Model Template Selection: Selecting a model template from the Domain Model
Database. It sets the General and Narrative predicates.
 Model Parameters: Fitting the model parameters according to the Agents and their
Properties. These parameters are specific values for each schema.
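[Figure 8: flow diagram. Model Template Selection takes the Domain and retrieves the Model Template (General + Narrative predicates) from the Domain Model Database; Model Setting takes the Agents and the Properties and produces the Model Parameters; Model Generation combines template and parameters into Model A and Model B.]
Figure 8. Model Generation process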
6.5.1. Model Template Selection from the Domain
The Model Template Selection for a given Domain is just a lookup of
the Domain value in the Domain Model Database. The Domain acts as the key inside
this Database.
It is possible that the information extraction process fails to select a Domain for a given
schema. In this case, there is no solution to the problem and the model generation is
not possible. But if a Domain is selected, there is a corresponding
template.
The selected template has all the predicates used in the model. The General
predicates are constant for every schema with the same Domain. Not all the
extracted information affects these predicates. However, the Narrative predicates need
to be completed with a set of parameters that are calculated from the Agents and their
Properties. As we have seen in 6.3, the Domain not only drives the Model
Template selection; it also defines what information from the schema could be useful for the
model generation. Figure 9 illustrates this. Each predicate can be parameterized by
the values needed to describe the events and fluents.
Schema Model
General predicates:
Predicate #1
Predicate #2
…
Predicate #N
Narrative predicates (completed with the Agents Parameters):
Time = t1
Predicate #1.1 (Agent X/Y/Q, Data X1.1/Y1.1/Q1.1)
Predicate #1.2 (Agent X/Y/Q, Data X1.2/Y1.2/Q1.2)
…
Predicate #1.n (Agent X/Y/Q, Data X1.n/Y1.n/Q1.n)
…
Time = tm
…
Predicate #m.n (Agent X/Y/Q, Data Xm.n/Ym.n/Qm.n)
Figure 9. Model from the template predicates and agent Parameters
6.5.2. Model Setting and Model Parameters
Every template, besides the known predicates, includes an array of variables, which
acts as the interface between these predicates and the relevant data from the schema. Only
the information matching one of these variables can be added to the model, in a
process defined as Model Setting. Every variable has a default value, which is
used when there is not enough information to set its value. The array for Figure 9
would be as follows:
[AgentX, AgentY, AgentQ, DataX11, DataY11, DataQ11, …, DataQmn]
Each Narrative predicate can have none, one or several of these variables. The model
setting process tries to resolve all of them by using the extracted information. All the
variables not resolved in this process take a default value.
If the information extraction process obtains poor results, it will probably force the
system to use too many default values. In this case, the generated models might not be
complete enough to solve the problem.
Let’s use the model example from 6.4.2 (the case where the pronoun, or AgentQ,
arrives late). We are only interested in the Narrative predicates. To distinguish variables
from the static part of the predicates, they are written with a leading
dollar symbol ($):
HoldsAt(Moving($AgentX), $StartTimeX)
HoldsAt(Moving($AgentY), $StartTimeY)
Happens(Arrive($AgentX), $EndTimeX)
HoldsAt(Waiting($AgentW), $EndTimeW + 1)
Happens(Arrive($AgentY), $EndTimeY)
The array for the model could be as follows:
[AgentX, AgentY, AgentW, StartTimeX, StartTimeY, EndTimeW]
These variables must be completed, but it cannot be done freely. There are
several rules to follow when doing so, some of them just for convenience and others
with the goal of generating two different models (one for each agent, denoted
model A and model B):
1. AgentX and AgentY are selected by order of appearance in the sentence.
2. AgentW is substituted by AgentX or by AgentY depending on the agent that is
waiting. The example always sets this agent as AgentW = AgentX. It also fixes the
equivalence for its related time: $EndTimeW = $EndTimeX.
3. Values related to AgentQ always take values according to the properties
observed for the pronoun. They are the same in both models. For example, if
AgentQ is “late”, we define a greater value of $EndTimeX for model A and
of $EndTimeY for model B.
Applying these rules to the example, the first model will have all the predicates 1.x and
2.x listed in 6.4.1 and these additional Narrative predicates:
HoldsAt(Moving(John), $StartTimeX)
HoldsAt(Moving(Peter), $StartTimeY)
Happens(Arrive(John), $EndTimeX)
HoldsAt(Waiting(John), $EndTimeX + 1)
Happens(Arrive(Peter), $EndTimeY)
4. After working with these variables and completing the Narrative predicates, we
will have two almost identical models, but with enough differences to allow one
of them to pass a logical evaluation while the other one fails. The difference
between the models is the value comparison between the $EndTimeX and $EndTimeY
variables.
By using the qualifiers described in 6.3.3 we can confirm two facts:
 The state of AgentX = John is Waiting during the model time lapse.
 The qualifier for $EndTimeQ is “+” due to the relation between the pronoun and “late”.
From these values, we define two arrays, one for each model. Some values are
completed with the qualifiers and others are default values. The following matrix includes the
resulting arrays (with all the default values inside parentheses). The first line contains the
variable names; the second and third lines contain the values for model A and B
respectively:
          AgentX  AgentY  AgentQ  StartTimeX  StartTimeY  EndTimeX  EndTimeY
Model A:  John    Peter   John    (t0)        (t0)        t1        t2
Model B:  John    Peter   Peter   (t0)        (t0)        t2        t1
As we have no information about $StartTimeX or $StartTimeY, they take the default
value t0. We also do not know the values for $EndTimeX or $EndTimeY, but we know that
AgentQ arrives later than the other agent. So, we can assign them the values t1 and t2, with the
condition t1 + 1 < t2. The resulting predicates for the first model are:
HoldsAt(Moving(John), t0)
HoldsAt(Moving(Peter), t0)
HoldsAt(Waiting(John), t1+1)
The second model will have these ones:
HoldsAt(Moving(John), t0)
HoldsAt(Moving(Peter), t0)
HoldsAt(Waiting(John), t1+1)
The key issue in this example is the predicate HoldsAt(Waiting(John), t1+1). According to
predicate 2.6, it is not possible for an agent to be Waiting before he arrives. This
condition makes the second model fail in a logic evaluation, while the first one
passes. The actual values of t0, t1 and t2 do not matter. The only condition is that t0 < t1 + 1 < t2.
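The Model Setting step of this section can be sketched with Python's string.Template, whose $name placeholders happen to match the $ notation used above. This is a minimal sketch, not the implementation of the system: the constant names, the DEFAULTS dictionary and the function set_model are assumptions made for the example.

```python
from string import Template

# Narrative part of the template, with the variables marked by "$".
NARRATIVE_TEMPLATE = Template("""\
HoldsAt(Moving($AgentX), $StartTimeX)
HoldsAt(Moving($AgentY), $StartTimeY)
Happens(Arrive($AgentX), $EndTimeX)
HoldsAt(Waiting($AgentW), $EndTimeW + 1)
Happens(Arrive($AgentY), $EndTimeY)""")

# Default values used when the extraction gives no information (assumed).
DEFAULTS = {"StartTimeX": "t0", "StartTimeY": "t0"}


def set_model(extracted: dict) -> str:
    """Fill the Narrative predicates with extracted values, then with defaults."""
    values = {**DEFAULTS, **extracted}
    if "EndTimeW" not in values:
        # Rule 2: the time of the waiting agent follows the agent bound to AgentW.
        key = "EndTimeX" if values["AgentW"] == values["AgentX"] else "EndTimeY"
        values["EndTimeW"] = values[key]
    # substitute() raises KeyError if a variable has neither a value nor a default.
    return NARRATIVE_TEMPLATE.substitute(values)


narrative = set_model({
    "AgentX": "John", "AgentY": "Peter", "AgentW": "John",
    "EndTimeX": "t1", "EndTimeY": "t2",
})
print(narrative)
```

Using substitute() rather than safe_substitute() is a deliberate choice in this sketch: a variable without a value or a default stops the generation instead of producing an incomplete model.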
6.6. Solving the WSC by way of a model
The last step in the understanding system is the Model Evaluation. After generating
both models, it is necessary to evaluate them and verify whether they are logically valid. If the
generation process is done correctly, only one of them will pass this check. Figure 10
presents this last part in more detail. It consists of an Event Calculus Interpreter with an
embedded SAT Solver and an Answer Selector. The input is two models, A and B,
that differ only in the agent that substitutes AgentQ.
[Figure 10: flow diagram. Model A and Model B are passed to the Event Calculus Interpreter, which uses a SAT Solver; the SAT Results go to the Answer Selector, which outputs the Answer.]
Figure 10. Model Evaluation process description
6.6.1. The Event Calculus Interpreter
A model is a list of predicates describing some events and their effects by means of the
Event Calculus logical language. The description or narration is not the main goal of
the system. It must be proved that there is a solution to the problem represented by the
models, and this can be done by means of a SAT Solver. SAT is the usual name for the
Boolean Satisfiability Problem. A SAT Solver is a system capable of checking whether there
is an assignment that satisfies a given Boolean formula, that is, one for which the formula
evaluates to True.
An Event Calculus Interpreter is a system able to transform a problem described in
Event Calculus into a SAT problem. It receives a model as input and converts it into a
Boolean formula. Then, the Solver returns True when the model is fully
satisfied or False if it fails the evaluation.
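To make the notion of satisfiability concrete, the following Python sketch checks a formula in conjunctive normal form by exhaustive search. It is only an illustration: real SAT Solvers use far more efficient algorithms, and the two-variable encoding of the example (variable 1 = “John has arrived before t1 + 1”, variable 2 = “John is Waiting at t1 + 1”) is a hand-made assumption, not the formula that an Event Calculus Interpreter would generate.

```python
from itertools import product
from typing import Dict, List, Optional

Clause = List[int]   # DIMACS-style literals: 3 means x3, -3 means NOT x3


def solve(clauses: List[Clause]) -> Optional[Dict[int, bool]]:
    """Return a satisfying assignment, or None if the formula is unsatisfiable."""
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        # A clause is satisfied if at least one of its literals is true.
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return assignment
    return None


# Rule: Waiting implies Arrived (NOT x2 OR x1); narrative: John is Waiting (x2).
common = [[-2, 1], [2]]
john_early = solve(common + [[1]])    # John arrives first
john_late = solve(common + [[-1]])    # John arrives last
```

In this toy encoding the first formula is satisfiable and the second one is not, which mirrors the behavior expected from the two models of a schema.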
The Interpreter receives the results from the Solver and is able to construct a
temporal description of the events and states according to the Solver output. In fact, there
could be more than one description of events satisfying the formula. This is not a problem
if the different interpretations are coherent with the situation we want to describe. Most
of the time, it happens when the description in the General
predicates is incomplete (not all the possible states are covered) or when the
initial Properties are not fully defined for every agent. A possible output from the Interpreter for the
example from 6.4.2 would be:
t = 0. Both agents are moving.

Moving(John).
Moving(Peter).
t = 2. John arrives.
Happens(Arrive(John), 2).
t = 3. John stops moving and starts waiting (Peter) at the meeting point.
-Moving(John).
+AtMeetingPoint(John).
+Waiting(John).
t = 4. Peter arrives.
Happens(Arrive(Peter), 4).
t = 5. Peter stops moving and John stops waiting.

-Moving(Peter).
-Waiting(John).
+AtMeetingPoint(Peter).
6.6.2. Answer Selection and possible errors in the process
There are three possible results from the Interpreter:
1. The Solver fails to find a valid solution for either model => ERROR.
2. The Solver finds a valid solution for both models => ERROR.
3. The Solver returns a valid solution for only one model => CORRECT.
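The three cases above can be sketched as a small Python function. The function name select_answer and its parameters are hypothetical; the sketch only assumes that the Solver result for each model is available as a Boolean value.

```python
from typing import Optional


def select_answer(agent_a: str, sat_a: bool, agent_b: str, sat_b: bool) -> Optional[str]:
    """Return the agent of the only satisfiable model, or None in the error cases."""
    if sat_a and not sat_b:
        return agent_a
    if sat_b and not sat_a:
        return agent_b
    return None   # cases 1 and 2: no valid model, or both models valid


answer = select_answer("John", False, "Peter", True)
```

Returning None for both error cases keeps the selector from guessing when the models do not discriminate between the two agents.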
Only in the third case is the system able to give an answer to the problem. In the
other cases, if the system does not find a valid model or both are valid, it means that
the system was not “intelligent” enough to understand the schema. Moreover, there are
many steps during the process where an error leads to a failure:
1. Parsing error: Bad POS or dependencies.
The natural language processing tool failed in the identification process.
2. Wrong main verb and relevant words selection.
The selected verb is not the main verb describing the essence.
More than one verb is needed.
3. Missing or incomplete Domain for the selected verb.
The main verb has no correspondence in the Knowledge Database.
The sentence cannot be understood with only one Domain.
4. Wrong Domain selection.
The main verb has no correspondence in the Knowledge Database.
The selection according to the main verb and relevant words was wrong.
5. Incomplete or wrong parameters selection.
The selected words are not special or relevant enough for the model.
Not all the needed words have been selected.
6. Bad parameters matching during the Model Setting.
One or more parameters do not correspond with the original narration.
7. Wrong model generation or model predicates conflict.
Several predicates are incoherent with each other and lead to contradictory results.
8. SAT Solver execution failure.
Despite obtaining valid models, the Boolean formulas return a wrong True/False.
Most of the possible failures will arise during the Model Generation phase and they are
very difficult to prevent. The main cause will be an incomplete Knowledge Database
or the poor quality of its entries, that is, how the world is represented in the
system. The number of possible errors can be reduced by increasing the
knowledge or by reducing the number of domains covered by the system (creating
more general representations).
6.7. Model-based WSC Solver architecture
After describing the different steps of the proposed solution, it is necessary to summarize
the architecture of the system.
The proposed system is able to complete the task from start to finish. That is, it
takes a Winograd schema as input and tries to answer the question.
The system allows the combination of external tools or knowledge databases to extend
its capacity. In fact, there are many interesting and powerful tools available for
tasks like the mandatory natural language processing, the Event Calculus
interpretation or the SAT Solver step. This is possible because these tasks use standard input
values: a text, Event Calculus predicates and Boolean problems respectively.
The tools can therefore be selected purely on performance criteria.
Adapting the system to a different output format from the tools is trivial work.
The Knowledge Database has a more specific format, because it is composed of
very specific data entries: Domain names, Words and Qualifiers for these words (depending
on the Domain). The use of external knowledge can help in populating the
database with new information, but it requires a module specially
designed for this function. This issue has not been covered by the system yet.
The Model Database uses standard predicates, so it is possible to reuse any set of
predicates modeling a specific domain. It will be necessary to adapt and complete these
sets to cover the requirements described in 6.6.
Figure 11 represents the Model-based WSC Solver, divided in the three main blocks
we have described before:
[Figure 11: block diagram of the solver. Blocks, from input to output: Schema Question
(sentence, question, answer); Natural Language Processing; Annotated Schema (lemma,
POS, dependencies, …); Domain + Agents Selection and Properties Selection, both using
the Domain Knowledge Database; Model Selection and Model Setting, using the Domain
Model Template Database (general + narrative predicates); Model Generation (Model A,
Model B); Event Calculus Interpreter and SAT Solver (SAT results); Answer Selector
(answer).]
Figure 11. Model-based Winograd Schema Challenge Solver
6.8. Reducing the WSC problem to the Spatio-temporal domain
The most difficult problem in our proposal is finding the right Domain for every schema.
Behind each Domain there must be a set of rules (predicates) describing what could
happen to the agents and how their states change depending on their behavior.
Creating all the possible sets is an unreachable task, comparable to modeling every
situation in the whole world. Also, the greater the number of situations
addressed, the greater the risk of misunderstanding the information.
The solution we propose is to reduce the number of possible domains and try to
obtain a deeper understanding inside them. From there, it will be possible to increase
gradually the number of covered domains. The models can be transformed into a more
complex description of the world and the Knowledge Database can be completed with
additional entries.
As we need to compare two different situations, it seems that selecting easily
measurable realms will help in the task. Spatial and temporal domains should be good
candidates in this case. There are a great number of studies about Spatio-temporal
reasoning (Chapter 3.3). Space and time are related to each other, and the words
associated with both domains are common and easy to detect in a text. In addition, it is
easy to make a qualitative comparison among words from the same domain.
When talking about space or time, each word represents a position in a coordinate
system. Figure 12 represents some of these systems with several words expressing a
qualitative position inside them. Space and time can be represented by absolute or
relative values. When using relative values, we compare an agent coordinate with
a reference point (“down the building”, “near the door” or “after the meeting”), but also
with another agent (“the book below the box”, “the police after the firemen” or “the car
nearer than the bus”).
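The qualitative comparison among words of the same domain can be illustrated with a
short Python sketch. It is only an illustration under our own assumptions: the word lists
are copied from the distance and time scales of Figure 12, while the function name
compare and the use of Python are hypothetical and are not part of the implemented
system.

```python
# Ordered scales, from the lowest to the highest value (words from Figure 12).
DISTANCE_SCALE = ["same place", "too close", "very near", "near", "far", "far away"]
TIME_SCALE = ["too early", "before", "on time", "after", "too late"]


def compare(scale, word_a, word_b):
    """Return -1, 0 or 1 as word_a lies before, at or after word_b on the scale."""
    a, b = scale.index(word_a), scale.index(word_b)
    return (a > b) - (a < b)


# "near" expresses a smaller distance than "far away".
assert compare(DISTANCE_SCALE, "near", "far away") == -1
# "after" expresses a later time than "before".
assert compare(TIME_SCALE, "after", "before") == 1
```

An ordered list is enough here because only the relative order of the words matters,
not any numerical distance between them.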
If an agent is “near” to a meeting point and the other one is “far away”, it is easy to infer
which is “closer” to that point. However, space and time have a more interesting
relationship. We can deduce that if both agents are moving to the meeting point at the
same speed, the “nearer” one will “arrive before” the other, because there is a physical
relationship between space and time. If we introduce speed into the equation, it can be
deduced that a “faster” agent will travel a “longer” distance than the other in the same
time, or that it will “arrive earlier” at the meeting point. There is no doubt about what is
happening when an agent is “slower” or is “closer” than the other agent.
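The relation between distance, speed and arrival time can be made concrete with a small
numerical sketch. It assumes constant speeds; the function name arrival_time and the
numbers used are illustrative only and do not come from the implemented system.

```python
def arrival_time(distance, speed, start_time=0.0):
    """Time at which an agent reaches the meeting point (constant speed assumed)."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return start_time + distance / speed


# Same departure time and same speed: the "nearer" agent "arrives before".
near_agent = arrival_time(distance=2, speed=1)
far_agent = arrival_time(distance=4, speed=1)
assert near_agent < far_agent

# Same distance, different speed: the "faster" agent "arrives earlier".
fast_agent = arrival_time(distance=4, speed=2)
slow_agent = arrival_time(distance=4, speed=1)
assert fast_agent < slow_agent
```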
There are also indirect relations with other areas. For example, when putting a book
“over” another object, that object will probably not be visible to an observer. If we put a
box “after” another one (over it), we will need to remove the first box to take the second
one.
[Figure 12: coordinate systems with qualitative words.
Position (x,y,z): OUTSIDE / EXTERIOR / EXTERNAL versus INSIDE / INTERIOR / INTERNAL;
UP / OVER / ABOVE versus DOWN / UNDER / BELOW; LEFT / WEST versus RIGHT / EAST.
Distance (d): SAME PLACE, TOO CLOSE, VERY NEAR, NEAR, FAR, FAR AWAY.
Time (t): TOO EARLY, BEFORE, ON TIME, AFTER, TOO LATE.]
Figure 12. Coordinate system for Position, Distance and Time
A second advantage of space and time is that all these extra relations can be added as
a supplement to our models. We do not need to change the rules (predicates) to define
when an agent is “waiting”. In fact, this is the result of having an agent “arriving before”
the other or “being faster”. It is possible to include new knowledge without losing the
previous information. This adds a very interesting scalability property to the system.
Finally, it is also easier to define the qualifiers for the relevant words, because they can
be placed in the coordinate systems (see Figure 12). That is, assigning a “+” or “-” value
just encodes what is already represented graphically. Table 3 contains
several entries valid for space and time models. With these word qualifiers, it will be
possible to place every agent in a coordinate system as presented in Figure 13.
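[Figure 13: Agent X and Agent Y placed on two axes. Distance (d) axis with origin d0
(Meeting Point) and the label FAR AWAY; Time (t) axis with origin t0 (Meeting Time) and
the labels BEFORE and AFTER.]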
Figure 13. Placing the agents in the coordinate systems
When comparing two agents, it is enough to understand their relative position with
respect to a reference point or to each other. Word qualifiers are defined according to
this concept. If one agent is late or is far away, we can consider that the other agent is not
so late and not so far away from the respective reference points (see Figure 13). If both
agents are far away or near and we have no additional information, it will not be
possible to distinguish them (not even for a human reader). This information is useless
for understanding the sentence.
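This relative reading of the qualifiers can be sketched as follows. The sketch is an
assumption of ours and not the code of the system: it handles only the “+” and “-”
qualifiers of a single coordinate, uses None for an agent without information, and the
function name relative_order is hypothetical.

```python
def relative_order(qualifier_x, qualifier_y):
    """Compare two agents on one coordinate using "+", "-" or None (unknown).

    Returns "X" if agent X has the greater value (later, farther, ...),
    "Y" if agent Y has it, and None when the agents cannot be distinguished.
    """
    opposite = {"+": "-", "-": "+"}
    # A missing qualifier is inferred as the opposite of the known one.
    if qualifier_x is None and qualifier_y is not None:
        qualifier_x = opposite[qualifier_y]
    if qualifier_y is None and qualifier_x is not None:
        qualifier_y = opposite[qualifier_x]
    if qualifier_x == qualifier_y:
        return None  # both "far", both "near", or nothing known
    return "X" if qualifier_x == "+" else "Y"
```

For example, relative_order("+", None) places agent X farther (or later) than agent Y,
while relative_order("+", "+") gives no decision, as in the case of two agents that are
both far away.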
As we will see in the next chapter, the implementation of the Model-Based WSC Solver is
not affected by the reduction of the covered domains. The only difference will be in
the Model Template and Domain Knowledge Database, which will include only
information about the selected domains.
7. IMPLEMENTATION
7.1. Introduction
After introducing the theoretical background of all the technical aspects of this
project and describing the proposed approach in detail, it is time for a
detailed explanation of how this proposal has been implemented. The main goal of this
work is to measure and evaluate the correctness of our technical proposal. In Figure
11, we included a schema with all the elements needed for implementing the proposal.
Not all the elements have been developed for this thesis. In fact, we have adapted
three external software tools to our system, as shown in Figure 14:
[Figure 14: block diagram. Blocks, from input to output: Schema Question; Natural
Language Processing (Stanford CoreNLP); Annotated Schema; Information Extraction &
Model Generation (Model A, Model B); Event Calculus Interpreter (DEC Reasoner) and
SAT Solver (minisat, relsat, walksat); SAT Results; Answer Selector; Answer.]
Figure 14. Model-based Winograd Schema External Tools
They are the Stanford CoreNLP, the DEC Reasoner and the SAT Solver (this part includes
three different programs and is integrated with the Reasoner). All the other
modules, as well as both the Knowledge and Model Databases, have been created
especially for this project.
The next chapters will describe in detail all these elements and the development
environment used for the implementation. We will follow this description with a
specific example schema, to show how each block of the system returns the
expected value depending on the previous input. The example will be the same used in
Chapter 6.2 (see Figure 6):
“The firemen arrived [after/before] the police because they were coming from so far away”
Question: Who was coming from far away?
Answers: The Firemen / The Police.
All the software development has been done with the Apple Xcode IDE (Version 7.3),
created to build applications for Apple products such as iPhone, iPad and
Mac. This is why all the project code must be compiled and executed on an
Apple Mac computer with the operating system OS X 10.10 or higher. The code has
been written in Objective-C, a general-purpose object-oriented language used by
Apple for both the OS X and iOS operating systems. Xcode includes a complete API called
Cocoa, with functions and classes ready to be used in developments for the Apple
operating systems.
7.2. External Applications and software
The external software used by the system is fully documented by its authors and freely
available in the related repositories. Nevertheless, we are going to summarize the
facilities offered by each one and how it must be used specifically in our implementation.
7.2.1. The Stanford CoreNLP Suite
Stanford CoreNLP (http://stanfordnlp.github.io/CoreNLP/index.html) is a suite released
under the GNU General Public License (v3 or later), developed and copyrighted by the
Stanford NLP Group. It has been written entirely in Java.
The suite provides a complete set of natural language analysis tools that take raw text
as input. [Manning et al., 2014] gives a complete description of the different tools,
which are also presented in Figure 15. The tools work independently and are executed
sequentially as a pipeline. They transform the raw input text into an annotated object that
will finally be the output of the suite. The result is returned as an XML or plain text file
(user selectable).
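[Figure 15: pipeline diagram. Raw Text enters an Annotation Object that is processed in
sequence by Tokenization, Sentence Splitting, Part-Of-Speech Tagging, Morphological
Analysis, Named Entity Recognition, Syntactic Parsing, Coreference Resolution and
Other Annotators, producing the Annotated Text.]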
Figure 15. Stanford CoreNLP Schema (figure from [Manning et al., 2014])
In this project we are going to use most of them. We exclude, for example, the Named
Entity Recognition and the Coreference Resolution steps (the latter is precisely what we
want to solve in this work).
The annotators used and their functions are:
• Tokenization (tokenize): Tokenizes the raw text into individual tokens. It adds
the start and end position for every token:
<tokens>
<token id="1">
<word>The</word>
<CharacterOffsetBegin>0</CharacterOffsetBegin>
<CharacterOffsetEnd>3</CharacterOffsetEnd>
</token>
<token id="2">
…
</tokens>
• Sentence Splitting (ssplit): Splits the resulting sequence of tokens into
sentences. Each of them is listed separately:
<sentences>
<sentence id="1">
<tokens>
...
</tokens>
</sentence>
<sentence id="2">
...
</sentences>
• Part-Of-Speech Tagging (pos): Adds its part-of-speech (POS) tag to every token.
• Morphological Analysis (lemma): Adds the base form or lemma to every token.
The result of adding POS and lemma to the tokens will be as follows:
<tokens>
<token id="1">
<word>The</word>
<lemma>the</lemma>
<CharacterOffsetBegin>0</CharacterOffsetBegin>
<CharacterOffsetEnd>3</CharacterOffsetEnd>
<POS>DT</POS>
</token>
<token id="2">
...
</tokens>
• Syntactic Analysis (parse): Adds the parse tree of the text and the
dependencies existing among the words:
<parse>(ROOT (S (NP (DT The) (NNS firemen)) (VP (VBD arrived) (PP (IN
after) (NP (DT the) (NN police))) (SBAR (IN because) (S (NP (PRP they))
(VP (VBD were) (VP (VBG coming) (PP (IN from) (ADVP (RB far) (RB
away)))))))) (. .))) </parse>
<dependencies type="basic-dependencies">
<dep type="root">
<governor idx="0">ROOT</governor>
<dependent idx="3">arrived</dependent>
</dep>
<dep type="det">
<governor idx="2">firemen</governor>
<dependent idx="1">The</dependent>
...
The selected annotators are executed by passing their identifiers as an input
parameter. The Java virtual machine could be launched as follows (using a shell
script, with the directory of the suite and the file names as external parameters):
java -mx2g -cp "$dir/*" edu.stanford.nlp.pipeline.StanfordCoreNLP \
    -annotators tokenize,ssplit,pos,lemma,parse -file $*
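The XML produced by the suite can be read with any standard XML parser. The following
Python sketch shows one possible way to recover the word, lemma and POS tag of every
token. It is not the Objective-C code of the system: the sample document is written by
hand following the tag names shown above, and the function name read_tokens is
hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample that follows the structure of the listings above.
SAMPLE_XML = """<root><document><sentences>
<sentence id="1"><tokens>
<token id="1"><word>The</word><lemma>the</lemma>
<CharacterOffsetBegin>0</CharacterOffsetBegin>
<CharacterOffsetEnd>3</CharacterOffsetEnd><POS>DT</POS></token>
<token id="2"><word>firemen</word><lemma>fireman</lemma>
<CharacterOffsetBegin>4</CharacterOffsetBegin>
<CharacterOffsetEnd>11</CharacterOffsetEnd><POS>NNS</POS></token>
</tokens></sentence>
</sentences></document></root>"""


def read_tokens(xml_text):
    """Return (word, lemma, POS) triples for every token in the XML text."""
    root = ET.fromstring(xml_text)
    return [
        (tok.findtext("word"), tok.findtext("lemma"), tok.findtext("POS"))
        for tok in root.iter("token")
    ]


tokens = read_tokens(SAMPLE_XML)
```

Working with the lemma instead of the word is what later allows a single database entry
to match all the inflected forms found in a sentence.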
7.2.2. The Discrete Event Calculus Reasoner Program
DEC Reasoner 1.0 is a fully functional program for automated commonsense reasoning
using the discrete event calculus. It is freely available under the
Common Public License v1.0 at http://decreasoner.sourceforge.net/ from its owner, IBM
Corporation (see [Mueller, 2008]).
The program has been implemented in Python and complemented with a core
reasoning part written in C (it must be compiled the first time to make it
accessible to the Python script). Using it is as easy as importing the decreasoner.py
file from a Python interpreter:
$ python
>>> import decreasoner
Then, we can use the software by selecting the file that contains the event calculus
description.
>>> decreasoner.run('Winograd/Firemen.e')
If we use the example schema, we can write a small DEC program by using the
expression rules defined in [Mueller, 2008]:
load foundations/Root.e
load foundations/EC.e

sort distance: integer
sort agent
agent Police, Firemen

fluent Moving(agent)
fluent DistanceToGoal(agent, distance)
event Depart(agent)
event Arrive(agent)

; Depart starts the movement and releases the distance from inertia
[agent, time]
Initiates(Depart(agent), Moving(agent), time).

[agent, distance, time]
Releases(Depart(agent), DistanceToGoal(agent, distance), time).

; Arrive stops the movement and fixes the current distance
[agent, time]
Terminates(Arrive(agent), Moving(agent), time).

[agent, distance, time]
HoldsAt(DistanceToGoal(agent, distance), time) ->
Initiates(Arrive(agent), DistanceToGoal(agent, distance), time).

; A moving agent arrives when its distance to the goal is 0
Delta: [agent, time]
HoldsAt(Moving(agent), time) & HoldsAt(DistanceToGoal(agent, 0), time) ->
Happens(Arrive(agent), time).

; An agent is at only one distance at a time
[agent, distance1, distance2, time]
HoldsAt(DistanceToGoal(agent, distance1), time) &
HoldsAt(DistanceToGoal(agent, distance2), time) -> distance1 = distance2.

; While moving, the distance decreases with the elapsed time
[agent, distance1, distance2, offset, time]
HoldsAt(DistanceToGoal(agent, distance1), time) & distance2 = (distance1 - offset) ->
Trajectory(Moving(agent), time, DistanceToGoal(agent, distance2), offset).

; Initial state
!HoldsAt(Moving(Police),0).
!HoldsAt(Moving(Firemen),0).
HoldsAt(DistanceToGoal(Police, 4),0).
HoldsAt(DistanceToGoal(Firemen, 2),0).

; Narrative: both agents depart at time 0
Delta: Happens(Depart(Police), 0).
Delta: Happens(Depart(Firemen), 0).

completion Delta Happens
range time 0 5
range distance 0 4
range offset 1 4
The output from DEC Reasoner will be as follows:
Discrete Event Calculus Reasoner 1.0
loading Winograd/Firemen.e
loading foundations/Root.e
loading foundations/EC.e
168 variables and 790 clauses
relsat solver
1 model
---
model 1:
0
DistanceToGoal(Firemen, 2).
DistanceToGoal(Police, 4).
Happens(Depart(Firemen), 0).
Happens(Depart(Police), 0).
1
-DistanceToGoal(Firemen, 2).
-DistanceToGoal(Police, 4).
+DistanceToGoal(Firemen, 1).
+DistanceToGoal(Police, 3).
+Moving(Firemen).
+Moving(Police).
2
-DistanceToGoal(Firemen, 1).
+DistanceToGoal(Firemen, 0).
Happens(Arrive(Firemen), 2).
3
-Moving(Firemen).
4
Happens(Arrive(Police), 4).
5
-Moving(Police).
EC: 7 predicates, 0 functions, 0 fluents, 0 events, 0 axioms

Firemen: 0 predicates, 0 functions, 2 fluents, 2 events, 13 axioms
...
As we can see, no special development is needed for DEC, because the
input description and the output narration follow the standard discrete event
calculus language. The output is written to the terminal, so we will have to capture and
format this information before using it.
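Capturing the narration can be as simple as scanning the text for the Happens
predicates. The following Python sketch illustrates the idea; the regular expression and
the function name events_by_agent are our own assumptions (the thesis implementation is
written in Objective-C), and the sample text reproduces only a few lines of the output
shown above.

```python
import re

# A few lines taken from the model output shown above.
SAMPLE_OUTPUT = """\
model 1:
0
Happens(Depart(Firemen), 0).
Happens(Depart(Police), 0).
2
Happens(Arrive(Firemen), 2).
4
Happens(Arrive(Police), 4).
"""

# Matches lines such as: Happens(Arrive(Firemen), 2).
HAPPENS = re.compile(r"Happens\((\w+)\((\w+)\),\s*(\d+)\)\.")


def events_by_agent(output_text):
    """Map each agent to a dict {event name: time point}."""
    timeline = {}
    for event, agent, time in HAPPENS.findall(output_text):
        timeline.setdefault(agent, {})[event] = int(time)
    return timeline


timeline = events_by_agent(SAMPLE_OUTPUT)
```

With this structure, comparing timeline["Firemen"]["Arrive"] with
timeline["Police"]["Arrive"] tells which agent arrives first in the model.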
The program supports several options (set inside the event calculus file) to
select, for example, the use of trajectory axioms, or to show all the predicates. It is
necessary to select an external SAT solver, such as:
relsat: http://code.google.com/p/relsat/
minisat: http://minisat.se/MiniSat.html
walksat: http://www.cs.rochester.edu/u/kautz/walksat/
A satisfiability solver is strictly necessary to run DEC Reasoner. It is
possible to select and prioritize two different solvers: if the first one fails to find a
valid solution to the logical problem, the second one will try.
7.3. Domain Knowledge and Model Databases
Before exploring how the Model-based WSC Solver has been implemented, we will go
deeper into the two databases we need to use. As described in Figure 11, we have
the Domain Knowledge Database, which contains the information needed to
extract the key data from the text, and the Domain Model Database, where we can find
the template for every possible model.
The Domain Knowledge Database has been manually implemented as raw CSV text. Given
the small initial size used for testing the system, a more complex development has not
been necessary. At this point, it is important to note that probably the best option for
this first database would be to define a specific ontology. In the second case, the Domain
Model Database, the model templates must be created manually, so it comprises a set
of files for every possible model option.
7.3.1. The Domain Knowledge Database
The initial implementation of this database is composed of a hundred entries in a raw
CSV text file. It includes all the different words that could be relevant for two main
functions (see Chapter 6.3):
• Selecting the Domain associated with the Winograd Schema.
• Extracting all the relevant information that could be useful in understanding it.
This means that the file will be used in two parts of the system, both included in the same
Information Extraction phase. The format of every entry is as follows:
Type : Word : Value : Options
• Type: Defines the group where the Word is included. There are three main options:
the Action type (verbs), the Domain types (such as Time, Distance, Position, Size
or Speed) and the Qualifier type (words that do not define the domain,
but modify a related word by magnifying or reducing its Value).
• Word: The word that must be found in the text to consider the Value and Options.
In fact, we will use the lemma to reduce the number of possibilities in the text
processing task.
• Value: The property we are going to use for further choices. For example, if the
Type is Action, the Value will define the proposed Domain for the sentence. If the
Type is, for example, Distance or Time, the Value will define whether the word means a
greater or lower distance or time. It is usually represented with a “+” or “-” symbol.
• Options: This last part is not always present in the database entry. It usually
includes additional information that will help in the decision steps. For example, the
entry Action:depart:Movement:Start defines that the action word depart (verb) is
related with the Movement domain, but also that it defines a starting situation in a
movement. Logically, the opposite word, arrive, will have the optional property End.
This is not the only information we can add as an option. When the entry is an Action
(see the entry for the verb overtake), we can extend the information by including the
relation of the action with any Spatio-temporal coordinate (overtake is directly related
with the Speed coordinate and it expresses a positive situation or greater speed).
These are several example entries used by the system:
Action:depart:Movement:Start
Action:arrive:Movement:End
Action:go:Movement:Start
Action:move:Movement:Both
Action:land:Movement:End
Action:overtake:Movement:Both:Speed:+
Time:before:-
Time:soon:-
Time:early:-
Time:now:0
Time:next:+
Time:later:+
Time:first:-
Dist:near:-
Dist:far:+
Dist:deep:+
Pos:down:-
Pos:behind:-
Pos:up:+
Pos:inside:-
Pos:outside:+
Size:small:-
Size:big:+
Speed:fast:+
Speed:slow:-
Qual:very:x2
Qual:tiny:-2
Qual:small:-1
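Reading these entries only requires splitting each line on the “:” separator. The
following Python sketch is one possible way to do it and is not the Objective-C code of
the system: the names Entry, parse_entry and load_entries are hypothetical, and the
index by (Type, Word) assumes that a word appears at most once inside each Type.

```python
from collections import namedtuple

Entry = namedtuple("Entry", "type word value options")


def parse_entry(line):
    """Split one 'Type:Word:Value[:Options...]' line into its fields."""
    fields = [field.strip() for field in line.strip().split(":")]
    if len(fields) < 3:
        raise ValueError("expected at least Type:Word:Value, got %r" % line)
    entry_type, word, value = fields[:3]
    return Entry(entry_type, word.lower(), value, tuple(fields[3:]))


def load_entries(lines):
    """Index entries by (type, word); a repeated word within a Type is an error."""
    index = {}
    for line in lines:
        if not line.strip():
            continue  # skip empty lines
        entry = parse_entry(line)
        key = (entry.type, entry.word)
        if key in index:
            raise ValueError("duplicated word %r in type %r" % (entry.word, entry.type))
        index[key] = entry
    return index


database = load_entries([
    "Action:overtake:Movement:Both:Speed:+",
    "Time:before:-",
    "Size:small:-",
    "Qual:small:-1",
])
```

The same word (small) can be stored twice because it belongs to two different Types,
and the optional fields of overtake are kept together in the options tuple.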
As described in Chapter 6.3.3, we are mainly comparing Spatio-temporal situations
between two agents. The word from every entry is qualified somehow, and if it is
related with an agent, it will define the position of that agent in space and time. For
example, in the sentence from the schema example:
The firemen arrived later than the police, because they were coming from far away.
Considering the previous list (not the full database), we find three words in the
sentence. After processing the sentence with the CoreNLP suite, we will find the
relations among these words and the agents. How the qualifier value is interpreted
will depend on the associated Type. The following table describes every value
and how it qualifies the Spatio-temporal situation of the agent:
Type        Value   Qualifier
Time        +       The agent is positioned later than the other in the timeline.
            0       The agent is at the same time stamp as the other agent.
            -       The agent is positioned before the other in the timeline.
Distance    ++      The agent is much farther from the goal than the other.
            +       The agent is farther from the goal than the other.
            -
            --
Position    +       The agent is over/in front of the other (is it visible?).
            -       The agent is under/behind the other (is it visible?).
Size        +       The agent is bigger than the other.
            -       The agent is smaller than the other.
Speed       +       The agent is faster than the other.
            -       The agent is slower than the other.
Qualifier   x2      The word multiplies the initial value (time, distance, size…).
            +n      The word increases its value by n.
            0       The word maintains its value.
            -n      The word decreases its value by n.
Table 4. Meaning of a Qualifier according to the Group.
These values may seem closer to quantifiers than to qualifiers. The
mathematical symbols are used only for easy recognition. The part of the code
that processes them can use, of course, any kind of symbol, as the comparisons involve
no mathematical operations. This allows the same class of values to be used for
other (not Spatio-temporal) domains. For example, if we are dealing with emotional
considerations, a happy state could also be “+”, as it brings a positive evaluation, and a
sad state could be “-”.
The database can be modified and extended at any time, as the software goes through
the full list regardless of the number of entries. The only consideration to keep in
mind is that we should not repeat words inside the same Type (we can do it when they
are in different Type groups, as they will be correctly discriminated once the domain of
the sentence has been selected).
If we want to include new types, the code must be modified. The new values will be
read by the system, but it will not be able to interpret them.
There are three special entries in the database defining the domains that can be
selected by the system and their relation (weights) with the different Spatio-temporal
coordinates. Their values are the following (they can be modified for additional
domains and coordinate types):
Coordinates:Time,Dist,Pos,Size,Speed
Domains:Movement:Waiting:Placement,See:Fit
Weights:Movement|1,1,0,0,1:Waiting|1,0,0,0,0:Placement|0,0,1,1,0:Fit|0,0,1,1,0
• Coordinates: Defines (with a comma-separated list) the different coordinate types
that are defined in the database. Action is omitted because it is included by default
as a key element for the domain selection. Every word in the database must be of
one (and only one) of these types.
• Domains: This is the list of the possible domains included in the system. Every
domain will have an associated model, so it is not possible for the solver to go
outside the scope of this list. It is possible to define a subdomain as a more specific
situation inside a general one. See can be a subdomain of Placement, as we could
be interested in studying whether an object is visible before or after it is placed in
some position. It is also possible to define See as a general domain.
• Weights: We can see this entry as a matrix relating each domain with the
different coordinates. If a domain is related with a coordinate, it will have a
numeric value defining how strong this relation is. If they are not related, the value
will be zero. In the example, all the numeric values are “1”, but they could be greater
if we consider a stronger relation. These values will be a key aspect in selecting the
domain related with the sentence according to the kind of words (coordinates) we have
found. The following table presents the example in a way that is easier to understand:
DOMAIN      Time   Distance   Position   Size   Speed
Movement    1      1          0          0      1
Waiting     1      0          0          0      0
Placement   0      0          1          1      0
Fitting     0      0          1          1      0
Table 5. Weights relating each Domain with the coordinates.
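The Coordinates and Weights entries can be turned into this matrix with a few lines of
code. The Python sketch below is an illustration under our own assumptions (the function
name parse_weights is hypothetical and the real system is written in Objective-C); it
only relies on the “:”, “|” and “,” separators used in the entries above.

```python
def parse_weights(coordinates_line, weights_line):
    """Build {domain: {coordinate: weight}} from the two special entries."""
    coordinates = coordinates_line.split(":", 1)[1].split(",")
    matrix = {}
    # The first field is the "Weights" keyword; every other field is one domain.
    for block in weights_line.split(":")[1:]:
        domain, values = block.split("|")
        weights = [int(v) for v in values.split(",")]
        if len(weights) != len(coordinates):
            raise ValueError("wrong number of weights for domain " + domain)
        matrix[domain] = dict(zip(coordinates, weights))
    return matrix


WEIGHTS = parse_weights(
    "Coordinates:Time,Dist,Pos,Size,Speed",
    "Weights:Movement|1,1,0,0,1:Waiting|1,0,0,0,0:Placement|0,0,1,1,0:Fit|0,0,1,1,0",
)
```

Checking the number of weights against the number of coordinates catches the most
likely editing mistake when a new coordinate type is added to the database.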
7.3.2. The Domain Model Database
The Domain Model Database is a complete set of files that defines how a model must
be constructed for a specific Domain (see Chapter 6.5). Every possible Domain will
have three different files. For a better understanding, we will describe in detail one
possible set for the Movement Domain. By joining the different files and filling in the
blank parameters or variables, we will obtain the model to be used in the reasoning
process.
• Variables File: It includes all the variables that must be considered in the
model templates. For the Movement Domain, they will be:
$AgentX
$AgentY
$SpeedX
$SpeedY
$StartDistX
$StartDistY
$StartTimeX
$StartTimeY
$EndTimeX
$EndTimeY
$maxTime
$maxDistance
$offset
These values will become clearer when we introduce the example with the
model template, but their use can be guessed intuitively. Most of these variables
come in X and Y versions, one for each of the two agents. We will usually
assign the X variables to the agent appearing first in the text. Other variables are more
specific to the model, but all of them must be filled before using the model.
• General Rules File: It includes the general rules for the domain. It is the base
for the resulting DEC model. Following the example (Movement domain):
option timediff on
option showpred off

; Movement Sorts
sort agent
sort distance: integer

; Movement Fluents and Events Declaration
fluent Moving(agent)
fluent Distance(agent, distance)
fluent NormalSpeed(agent)
fluent FasterSpeed(agent)
event Depart(agent)
event Arrive(agent)

; Movement Domain Rules:
; #MR1: Moving initiates with Depart
[agent, time]
Initiates(Depart(agent), Moving(agent), time).

; #MR2: Moving terminates with Arrive
[agent, time]
Terminates(Arrive(agent), Moving(agent), time).

; #MR3: An agent can only be at one distance at the same time
[agent, distance1, distance2, time]
HoldsAt(Distance(agent, distance1), time) & HoldsAt(Distance(agent, distance2), time) ->
distance1 = distance2.

; #MR4: An agent Moving will change his distance proportionally with the elapsed time.
[agent, distance1, distance2, offset, time]
HoldsAt(Distance(agent, distance1), time) & HoldsAt(NormalSpeed(agent), time) &
distance2 = (distance1 - offset*1) ->
Trajectory(Moving(agent), time, Distance(agent, distance2), offset).

[agent, distance1, distance2, offset, time]
HoldsAt(Distance(agent, distance1), time) & HoldsAt(FasterSpeed(agent), time) &
distance2 = (distance1 - offset*2) ->
Trajectory(Moving(agent), time, Distance(agent, distance2), offset).

; #MR5: An agent Moving (Depart) changes distance to Goal
[agent, distance, time]
Releases(Depart(agent), Distance(agent, distance), time).

; #MR6: An agent that is Moving will Arrive when the Distance to Goal is 0.
[agent, time]
HoldsAt(Moving(agent), time) & HoldsAt(Distance(agent, 0), time) ->
Happens(Arrive(agent), time).

; #MR7: An agent at a specified distance that arrive, will stay at that distance.
[agent, distance, time]
HoldsAt(Distance(agent, distance), time) ->
Initiates(Arrive(agent), Distance(agent, distance), time).

; #MR8: If an agent is not at distance 0 to Goal or is not Moving, he will not Arrive.
[agent, time]
!HoldsAt(Distance(agent, 0), time) | !HoldsAt(Moving(agent), time) ->
!Happens(Arrive(agent), time).

; #MR9: If an agent is at distance 0 to Goal or is Moving, he will not Depart.
[agent, time]
HoldsAt(Distance(agent, 0), time) | HoldsAt(Moving(agent), time) ->
!Happens(Depart(agent), time).

; #MR10: If an agent is NormalSpeed cannot be FasterSpeed.
[agent, time]
HoldsAt(NormalSpeed(agent), time) -> !HoldsAt(FasterSpeed(agent), time).
We can see a first interesting property of this second file: there are no variables or
parameters inside the DEC predicates. As all the information is a general definition, the
specific properties from the sentence or schema are not used in this case. But all the
possible rules and the relations among all the agents must be defined here. This is what
we called General Predicates in Chapter 6.4.1.
• Narration Rules File: It includes the more specific predicates. All the possible
variables or parameters are set in this file:
; Constants declaration
agent $AgentX, $AgentY

HoldsAt($SpeedX($AgentX), 0).
HoldsAt($SpeedY($AgentY), 0).

; At Time = 0: No agent is moving
!HoldsAt(Moving($AgentX),0).
!HoldsAt(Moving($AgentY),0).

; At Time = 0: Distance to Goal is declared
HoldsAt(Distance($AgentX, $StartDistX), 0).
HoldsAt(Distance($AgentY, $StartDistY), 0).

; At Time = X/Y: Both agents start moving
Happens(Depart($AgentX), $StartTimeX).
Happens(Depart($AgentY), $StartTimeY).

; At Time = X/Y: Both agents arrive and stop moving
Happens(Arrive($AgentX), $EndTimeX).
Happens(Arrive($AgentY), $EndTimeY).

; Range Declaration:
; maxTime: Maximum time value for the model search
; maxDistance: Maximum distance Value -> Distance to Goal
; offset: Maximum variation for the change of distance when moving.
range time 0 $maxTime
range distance 0 $maxDistance
range offset 1 $offset
option timediff on
option showpred off
Now, many of the listed predicates include one or more parameters to be filled in. As
seen before (Chapter 6.4.2), in many cases we must use numerical values, as it is the
best way to use these predicates with the DEC Reasoner tool.
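Filling in the parameters is a plain text substitution. The following Python sketch
shows the idea with the standard string.Template class, whose $name placeholders happen
to match the notation of the Variables File. It is an illustration only: the template is
a shortened copy of the Narration Rules File, the values are hypothetical, and the real
system performs this step in Objective-C.

```python
from string import Template

# Shortened copy of the Narration Rules File shown above.
NARRATION_TEMPLATE = Template("""\
agent $AgentX, $AgentY
HoldsAt($SpeedX($AgentX), 0).
HoldsAt($SpeedY($AgentY), 0).
HoldsAt(Distance($AgentX, $StartDistX), 0).
HoldsAt(Distance($AgentY, $StartDistY), 0).
Happens(Depart($AgentX), $StartTimeX).
Happens(Depart($AgentY), $StartTimeY).
range time 0 $maxTime
range distance 0 $maxDistance
range offset 1 $offset
""")

# Hypothetical values for the firemen/police example.
values = {
    "AgentX": "Firemen", "AgentY": "Police",
    "SpeedX": "NormalSpeed", "SpeedY": "NormalSpeed",
    "StartDistX": 4, "StartDistY": 2,
    "StartTimeX": 0, "StartTimeY": 0,
    "maxTime": 5, "maxDistance": 4, "offset": 4,
}

# substitute() raises KeyError if any variable is left unfilled, which
# matches the requirement that all of them must be set before use.
narration = NARRATION_TEMPLATE.substitute(values)
```

Creating the second model only requires a second dictionary in which the properties
attributed to the pronoun are assigned to the other agent.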
We can (and must) define as many model sets as domains we want to introduce into the
system. If a sentence is not matched with any of the possible models, the system will
fail in the understanding process.
The next chapter describes in detail the implementation of our system. We will see in
more detail how the example input is converted into two different models by using the
files from this database.
7.4. Model-based WSC Solver implementation
If we revisit Figure 11 and Figure 14, we can see that the Solver executes five main
tasks during the reasoning process:
1. First, the schema information (system input) is sent to the Stanford CoreNLP suite
(see Chapter 7.2.1). The result will be an annotated XML file.
2. Second, the XML file is processed by the system and all the relevant information is
extracted:
• Agent X, Agent Y and Agent Q (pronoun).
• Spatio-temporal properties related with Agents X, Y and Q.
• Schema Domain (for the DEC Model Selection process).
3. Third, the extracted information is used to create two different DEC models (one
with Agent X being also Agent Q and the other with Agent Y being Q).
4. Fourth, both models are used as input of the DEC Reasoner (see Chapter 7.2.2).
The output will be the narration of the model execution or a “no model found”
message if no solution is found (we expect one model to run fine and the
other one to have no solution).
5. Finally, the system will verify whether there is one (and only one) valid model. This
allows the right answer to be selected.
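The decision rule of the last step can be written in a few lines. The Python sketch below
is only an illustration of that rule: the function name select_answer and its boolean
arguments are hypothetical and do not reproduce the Objective-C code of the system.

```python
def select_answer(model_found_x, model_found_y, answer_x, answer_y):
    """Pick the answer whose model is the only one with a solution.

    model_found_x is True if the DEC model with Agent X as the pronoun
    referent produced at least one model; model_found_y likewise for Y.
    Returns None when zero or both models are satisfiable (no decision).
    """
    if model_found_x and not model_found_y:
        return answer_x
    if model_found_y and not model_found_x:
        return answer_y
    return None


# Only the model with Agent X as referent has a solution.
answer = select_answer(True, False, "The Firemen", "The Police")
```

Returning None in the undecided cases keeps a failure of the reasoning step separate
from a wrong answer.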
As the system includes several predefined Winograd schemas, the whole process starts
by pushing a button (we will see how in the next Chapter). Now, we are going to review
every step and detail what the input must be, which logic is used to process it and how
the output is presented or delivered. In previous chapters, we have seen most of the
concepts and issues related with all these steps. Therefore, we will focus only on the
implementation task.
7.4.1. Natural Language Processing of the Winograd Schema
The initial input for the solver will be a plain ASCII text file including the following lines:
PA#The firemen arrived after the police because they were coming from far away.
PB#The firemen arrived before the police because they were coming from far away.
QA#Who were coming from far away?
QB#Who were coming from far away?
KA#after
KB#before
A1#The Firemen
A2#The Police
It must include the phrases (Px#), the questions (Qx#), the key words (Kx#) and the
two possible answers (Ax#). Every element uses the “#” symbol as separator. The
difference between PA and PB will be the key words KA and KB. The questions QA and
QB will be equal if the key word is not included in them.
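A file with this layout can be read with a very small parser. The following Python
sketch is an illustration under our own assumptions (the names parse_schema and
REQUIRED_TAGS are hypothetical; the system itself reads the file from Objective-C) and
uses the example lines shown above.

```python
SCHEMA_TEXT = """\
PA#The firemen arrived after the police because they were coming from far away.
PB#The firemen arrived before the police because they were coming from far away.
QA#Who were coming from far away?
QB#Who were coming from far away?
KA#after
KB#before
A1#The Firemen
A2#The Police
"""

REQUIRED_TAGS = ("PA", "PB", "QA", "QB", "KA", "KB", "A1", "A2")


def parse_schema(text):
    """Return a dict {tag: content} built from 'TAG#content' lines."""
    schema = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        tag, _, content = line.partition("#")
        schema[tag.strip()] = content.strip()
    missing = [tag for tag in REQUIRED_TAGS if tag not in schema]
    if missing:
        raise ValueError("missing schema lines: " + ", ".join(missing))
    return schema


schema = parse_schema(SCHEMA_TEXT)
```

With the parsed dictionary it is easy to check the property stated above: replacing the
key word KA by KB in the phrase PA must give the phrase PB.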
The phrases are passed as an input parameter to the CoreNLP suite. The script created
for executing the Stanford suite and the rest of the external code is executed by defining
an object of the NSTask Cocoa class. This kind of construction allows the execution of
any external script (corenlp.sh in this case) directly from an Objective-C program. In
this case, the code will be as follows:
// Task that runs the external CoreNLP script
NSTask *taskReasoner = [[NSTask alloc] init];
// Pipe used to capture the standard output of the script
NSPipe *pipeOutputEC = [NSPipe pipe];
[taskReasoner setStandardInput:[NSPipe pipe]];
[taskReasoner setStandardOutput:pipeOutputEC];
// Working directory and script to launch
[taskReasoner setCurrentDirectoryPath:@"./Winograd/stanford-corenlp"];
[taskReasoner setLaunchPath:@"./Winograd/stanford-corenlp/corenlp.sh"];
// Arguments: input phrase file and directory for the XML output
[taskReasoner setArguments:@[phraseFile, @"-outputDirectory", outputPath]];
[taskReasoner launch];
The result is two XML files, one for each sentence. Each file contains a fully tagged
structure of the text, with the POS information and the relations found among the
words.
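As an illustration of how such a file can be consumed, the following Python sketch reads the tokens of a tagged sentence. The XML fragment is a hand-written, reduced sample that assumes the usual CoreNLP layout (token elements with word, lemma and POS children); it is not real output of the system, and read_tokens is a hypothetical helper.

```python
# Illustrative sketch: read word, lemma and POS tag from a CoreNLP-style XML file.
# SAMPLE_XML is a hand-written, reduced fragment (an assumption), not real output.
import xml.etree.ElementTree as ET

SAMPLE_XML = """\
<root><document><sentences><sentence id="1"><tokens>
<token id="1"><word>The</word><lemma>the</lemma><POS>DT</POS></token>
<token id="2"><word>firemen</word><lemma>fireman</lemma><POS>NNS</POS></token>
<token id="3"><word>arrived</word><lemma>arrive</lemma><POS>VBD</POS></token>
<token id="4"><word>after</word><lemma>after</lemma><POS>IN</POS></token>
<token id="5"><word>the</word><lemma>the</lemma><POS>DT</POS></token>
<token id="6"><word>police</word><lemma>police</lemma><POS>NN</POS></token>
</tokens></sentence></sentences></document></root>
"""

def read_tokens(xml_text):
    """Return a list of (word, lemma, pos) tuples in document order."""
    root = ET.fromstring(xml_text)
    return [
        (tok.findtext("word"), tok.findtext("lemma"), tok.findtext("POS"))
        for tok in root.iter("token")
    ]

tokens = read_tokens(SAMPLE_XML)
# Verbs carry a Penn Treebank tag starting with "VB".
verbs = [word for word, _, pos in tokens if (pos or "").startswith("VB")]
print(verbs)
```

The verbs and the remaining words collected this way are the candidates for the domain selection of the next step.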
7.4.2. Implementing the Relevant Information Extraction
The first task in this step is the Model selection. Only with this information can we
fill the blank spaces in the model templates, and these templates are the key to
reasoning about the schema.
The Model selection, as described in Chapter 6.3.2, is based on the detection of action
verbs and specific words in the text. A count of the detected words then allows the
right Domain to be selected. If no word is matched, the Domain selection fails and the
program stops before the reasoning step. When the words relate to different domains,
the system chooses the best option according to the number of words found for each
one (verbs have a greater weight in this measurement).
As we have seen in 7.3.1, the Domain Knowledge Database includes all the selectable
words, classified by coordinate type, and the list of possible Domains. There is also a
weight matrix relating both groups. After selecting from the sentence the actions and
relevant words that match an entry of the database, the system calculates a weight for
every possible domain. In most cases a domain has a zero value because it has no
relation with the sentence. There are also many cases in which two domains are related
to the text; the system then selects one of them according to their weights.
We can consider the following situation, in which three relevant words and two verbs
are detected in the sentence, and the verbs could lead to two different domains:
• Later: Time
• Far: Distance
• Greater: Size
• Arrived: Movement
• Placed: Placement
There are two domain candidates, one from each action: Movement and Placement. The
system computes the weight of each candidate by taking a default value of “2” for its
action and then adding the weight contributed by every relevant word. Time and
Distance increase the Movement value (2 + 1 + 1 = 4) and Size does the same for the
Placement weight (2 + 1 = 3). Movement obtains the greater score, so it is the selected
domain.
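The arithmetic of this example can be reproduced with a short Python sketch. It is a simplification under stated assumptions: the dictionaries below stand in for the Domain Knowledge Database and its weight matrix, only domains proposed by an action receive word contributions, and the behavior on ties is not specified in the text (here the first candidate wins). All names are hypothetical.

```python
# Illustrative sketch of the domain weighting in the example above.
# The dictionaries are toy stand-ins for the Domain Knowledge Database.

ACTION_WEIGHT = 2   # default weight contributed by an action verb
WORD_WEIGHT = 1     # weight contributed by each relevant word

ACTION_DOMAIN = {"arrived": "Movement", "placed": "Placement"}
WORD_TYPE = {"later": "Time", "far": "Distance", "greater": "Size"}
TYPE_DOMAIN = {"Time": "Movement", "Distance": "Movement", "Size": "Placement"}

def score_domains(words):
    """Return a dict with the weight of every candidate domain."""
    words = [w.lower() for w in words]
    scores = {}
    # Actions propose the candidate domains.
    for w in words:
        domain = ACTION_DOMAIN.get(w)
        if domain is not None:
            scores[domain] = scores.get(domain, 0) + ACTION_WEIGHT
    # Relevant words reinforce the candidate they are related to.
    for w in words:
        domain = TYPE_DOMAIN.get(WORD_TYPE.get(w))
        if domain in scores:
            scores[domain] += WORD_WEIGHT
    return scores

def select_domain(words):
    """Return the domain with the greatest weight, or None if nothing matches."""
    scores = score_domains(words)
    if not scores:
        return None  # the domain selection fails
    return max(scores, key=scores.get)

detected = ["Later", "Far", "Greater", "Arrived", "Placed"]
print(score_domains(detected))
print(select_domain(detected))
```

With the five detected words, the sketch yields 4 for Movement and 3 for Placement, matching the values computed above.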
Every selected word is stored in a temporary list, as it will be used to find its
possible relation with one of the agents. Since every word carries the properties
described in the database, once a relation with an agent is found the word is
parameterized and placed in the Spatio-temporal coordinates. Figure 16 describes
these steps. This is an easy process because it works on an already processed text.
However, the extraction will be correct only if a matching model exists, and having a
model option does not mean that it is the correct one: it depends on how complete the
model database is. The more complete the models are, the greater the number of
correctly matched sentences.
[Figure 16 (flowchart): XML File → Search for relevant words (using the Commonsense
Knowledge Database) → Relevant Words List → Verify if a model can be selected (NO: no
solution; YES: Model selected) → Find agents (Agents X, Y, Q selected) → Find relation
between agents and words → Domain + Agents X, Y, Q + Agents Properties]
Figure 16. Relevant Information Extraction steps
Finally, a Domain, the Agents and their Properties have been selected. It is time to
generate a model from all this information.
7.4.3. DEC Model Generation
After selecting the Domain of the schema (if possible), the system selects three
different files: the Variable File, the General Rules File and the Narration File.
Chapter 7.3.2 gives a detailed description of all of them.
Each variable has a default value that changes only if the text details or describes the
corresponding property. The following table lists the variables of the Movement
Domain, with their description and default value. Most of them take numerical values.
As the model should only represent qualitative differences, these values can be
changed when needed. However, DEC Reasoner needs integer values when comparing
events, and the proportional relation between them must be maintained:
Variable       Description                                                            Default
$AgentX        Agent X, the first agent in the text                                   -
$AgentY        Agent Y, the second agent in the text                                  -
$SpeedX        Speed for Agent X                                                      Normal
$SpeedY        Speed for Agent Y                                                      Fast
$StartDistX    Initial distance of Agent X from the Goal position                     2
$StartDistY    Initial distance of Agent Y from the Goal position                     2
$StartTimeX    Initial Departing Time for Agent X                                     0
$StartTimeY    Initial Departing Time for Agent Y                                     0
$StartTimeQ    Initial Departing Time for Agent Q (the pronoun)                       0
$EndTimeX      Final Arriving Time for Agent X                                        2
$EndTimeY      Final Arriving Time for Agent Y                                        2
$EndTimeQ      Final Arriving Time for Agent Q                                        2
$maxTime       Define the time point where DEC Reasoner must stop                     5
$maxDistance   Maximum value for any distance to the Goal position                    4
$offset        Define the offset for the possible integer variables                   4
$AgentA        Agent A is an Agent X or Y that is always referred to in the sentence  -
$EndTimeA      Agent A EndTime                                                        2
Table 6. Variable Description and Default Values.
Whether the default value of a variable is used depends on the relevant information
found in the text. For the example sentence, we match the following relevant data with
the variables:
$AgentX = The Firemen
$AgentY = The Police
Ending Time of Agent X is assigned the value “+” (as The Firemen arrived later).
    It means that $EndTimeX > $EndTimeY
Starting Position of Agent Q is assigned the value “+” (this datum is related to the pronoun).
    It means that (A) $StartPosX > $StartPosY (if Agent X is Agent Q, the pronoun)
    OR (B) $StartPosY > $StartPosX (if Agent Y is Agent Q, the pronoun)
It does not seem much information, but we have constrained the ending time for both
agents. We have also found relevant information related to the pronoun that
establishes the two conditions A and B. By considering this double possibility, we can
create two different models. The condition on the ending time is maintained in both
cases.
As we must assign numerical values so that the DEC Reasoner tool can execute the
models, the system assigns appropriate values as follows:
First condition:
$EndTimeX = 4
$EndTimeY = 2
Second condition for Model A:

$StartPosX = 4
$StartPosY = 2
Second condition for Model B:

$StartPosX = 2
$StartPosY = 4
The rest of the values will take default values ($StartTimeX = $StartTimeY = 0, and so on).
From this point, we can generalize how the system acts according to the relevant
information found:
1. Any information about Agent X or Agent Y conditions their values in both models.
2. When any information about Agent Q is found, it is used to create the two
different models A and B. Model A assumes that this information applies to
Agent X and Model B assumes that it applies to Agent Y.
3. The rest of the variables take their default values.
The resulting DEC narration will be:
Model A
agent Firemen, Police
HoldsAt(Normal(Firemen), 0).
HoldsAt(Normal(Police), 0).
HoldsAt(Distance(Firemen, 4), 0).
HoldsAt(Distance(Police, 2), 0).
HoldsAt(Waiting(Police), 2 + 1).
Model B
agent Firemen, Police
HoldsAt(Normal(Firemen), 0).
HoldsAt(Normal(Police), 0).
HoldsAt(Distance(Firemen, 2), 0).
HoldsAt(Distance(Police, 4), 0).
HoldsAt(Waiting(Police), 2 + 1).
Both models differ only in the Distance fluent at time = 0. Logically, with these values
Model B will fail, as the initial Distance values are inconsistent with the arrival
times: the Police start farther away than the Firemen and yet must arrive first. There
will therefore be two impossible Happens predicates.
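The three generation rules and the numerical assignment of the example can be sketched in Python as follows. This is an illustration and not the system's code: the function names, the fact encoding (property, agent, sign) and the fixed pair of values 4 and 2 are assumptions taken from the worked example.

```python
# Illustrative sketch: build the variable values of Model A and Model B.
# Names and the (property, agent, sign) encoding are hypothetical.

DEFAULTS = {"StartPosX": 2, "StartPosY": 2, "EndTimeX": 2, "EndTimeY": 2}
HIGH, LOW = 4, 2  # integer values that preserve the qualitative relation

def apply_relation(values, prop, agent, sign):
    """Give 'agent' the greater ('+') or smaller ('-') value of 'prop'."""
    other = "Y" if agent == "X" else "X"
    high, low = (agent, other) if sign == "+" else (other, agent)
    values[prop + high] = HIGH
    values[prop + low] = LOW

def build_models(facts):
    """facts: list of (property, agent, sign) with agent in 'X', 'Y' or 'Q'."""
    models = {"A": dict(DEFAULTS), "B": dict(DEFAULTS)}  # rule 3: defaults
    for prop, agent, sign in facts:
        if agent == "Q":
            # Rule 2: Model A reads the pronoun as Agent X, Model B as Agent Y.
            apply_relation(models["A"], prop, "X", sign)
            apply_relation(models["B"], prop, "Y", sign)
        else:
            # Rule 1: facts about X or Y hold in both models.
            for values in models.values():
                apply_relation(values, prop, agent, sign)
    return models

def distance_facts(values, agent_x, agent_y):
    """Render the initial Distance fluents of a model as DEC narration lines."""
    return [
        "HoldsAt(Distance(%s, %d), 0)." % (agent_x, values["StartPosX"]),
        "HoldsAt(Distance(%s, %d), 0)." % (agent_y, values["StartPosY"]),
    ]

facts = [("EndTime", "X", "+"), ("StartPos", "Q", "+")]
models = build_models(facts)
for name in ("A", "B"):
    print(name, distance_facts(models[name], "Firemen", "Police"))
```

For the example sentence, the sketch gives the Firemen a distance of 4 in Model A and of 2 in Model B, with $EndTimeX = 4 and $EndTimeY = 2 in both, as in the narrations above.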
The general model has been complemented with other predicates to allow a more
complete description of the related world. As we can see, there is a fluent called
Waiting, which represents that one of the agents is waiting for the other. This situation
happens only when one agent arrives at the ending point before the other, and it holds
only until the other agent arrives at the same point. The system considers this fluent
only if the arrival times are different or if it detects a specific mention of a waiting
state in the text.
These special considerations can be extended without any problem, but we must be
careful not to define an overly complicated system, as it could lead the analysis and
reasoning process to a dead end.
7.4.4. DEC Model Processing
After obtaining both models, A and B, the system runs DEC Reasoner using them as
input parameters. The output given by the tool provides all the information needed.
DEC Reasoner is executed inside the system in the same way as CoreNLP, but now we
call Python and pass a small script as input parameter. The following method (a
reusable Objective-C function) shows this execution from the internal code. It takes as
input a string, scriptFile, with the DEC code. The output is readString (the resulting
DEC narration), which must be processed to discover what happened with the input
description:
// Runs DEC Reasoner on the given DEC script and returns everything the tool prints.
- (NSString *)launchDecTask:(NSString *)scriptFile
{
    NSTask *taskReasoner = [[NSTask alloc] init];
    NSPipe *pipeOutputEC = [NSPipe pipe];
    [taskReasoner setStandardInput:[NSPipe pipe]];
    [taskReasoner setStandardOutput:pipeOutputEC];
    [taskReasoner setCurrentDirectoryPath:@"./Winograd/decreasoner"];
    [taskReasoner setLaunchPath:@"/usr/bin/python"];
    [taskReasoner setArguments:@[ @"Winograd.py", @"-i", scriptFile ]];
    [taskReasoner launch];

    // Read the standard output until the process closes the pipe.
    NSFileHandle *readFile = [pipeOutputEC fileHandleForReading];
    NSData *readData = [readFile readDataToEndOfFile];
    NSString *readString = [[NSString alloc] initWithData:readData
                                                 encoding:NSUTF8StringEncoding];
    return readString;
}
If we use the two models introduced previously, we obtain the following outputs:
DEC Reasoner Output for Model A

loading /Users/Alfonso/Documents/.Winograd/.Reasoning/Movement1/Movement1.e
relsat solver
1 model
---
model 1:
0
Distance(Firemen, 4).
Distance(Police, 2).
NormalSpeed(Firemen).
NormalSpeed(Police).
1
-Distance(Firemen, 4).
-Distance(Police, 2).
+Distance(Firemen, 3).
+Distance(Police, 1).
+Moving(Firemen).
+Moving(Police).
2
-Distance(Police, 1).
+Distance(Police, 0).
3
-Moving(Police).
+AtMeetingPoint(Police).
+Waiting(Police).
4
5
-Moving(Firemen).
-Waiting(Police).
+AtMeetingPoint(Firemen).
DEC Reasoner Output for Model B

loading /Users/Alfonso/Documents/.Winograd/.Reasoning/Movement2/Movement2.e
relsat solver
walksat solver
walksat solver
no models found
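Processing these outputs mainly requires knowing how many models were found. The following Python sketch shows one possible way to extract that number. It assumes that the output always contains either a line such as "1 model" or the line "no models found", as in the two listings above; count_models is a hypothetical helper and the sample strings are abbreviated versions of those listings.

```python
# Illustrative sketch: count the models reported in a DEC Reasoner output.
# The expected line formats are assumptions based on the two listings above.
import re

def count_models(output):
    """Return the number of models found, or None if no summary line exists."""
    for line in output.splitlines():
        line = line.strip()
        if line == "no models found":
            return 0
        match = re.fullmatch(r"(\d+) models?", line)
        if match:
            return int(match.group(1))
    return None

# Abbreviated samples of the outputs shown above.
OUTPUT_A = """\
relsat solver
1 model
---
model 1:
0
Distance(Firemen, 4).
Distance(Police, 2).
"""

OUTPUT_B = """\
relsat solver
walksat solver
walksat solver
no models found
"""

print(count_models(OUTPUT_A), count_models(OUTPUT_B))
```

A model is considered valid in the evaluation step when this count is greater than zero.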
7.4.5. DEC Model Evaluation
After executing the DEC Reasoner tool for both models A and B, there are four
possible combinations: both models fail to return a valid solution, only model A is
valid, only model B is valid, or both models return a valid answer. When both models
give the same result, it is not possible to answer the question. It means that the
models were not complete enough or, perhaps, that they were not correctly defined.
Only when the evaluation subsystem acts as an exclusive-OR gate do we obtain an
answer. Figure 17 represents this idea. Of course, having only one valid model does
not mean that the system understood the text correctly, but the contrary means that
the system has certainly failed.
[Figure 17 (diagram): Model A Result and Model B Result → Is Model A OR B (and only
OR) a valid model? → Answer: A or B]
Figure 17. Relevant Information Extraction steps
The implementation of this part simply checks the result of executing both models and
decides under the exclusive-OR condition.
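The exclusive-OR decision can be written in a few lines. The following Python sketch is an illustration; the function name and the use of None to signal that the question cannot be answered are assumptions.

```python
# Illustrative sketch of the exclusive-OR evaluation of both models.
# Model A reads the pronoun as Agent X (answer 1), Model B as Agent Y (answer 2).

def evaluate(valid_a, valid_b, answer_1, answer_2):
    """Return the selected answer, or None when the question cannot be answered."""
    if valid_a != valid_b:          # exactly one model is valid
        return answer_1 if valid_a else answer_2
    return None                     # both valid or both failed

print(evaluate(True, False, "The Firemen", "The Police"))
```

For the example schema, Model A is valid and Model B is not, so the selected answer is The Firemen.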
7.5. System User Interface
To allow the use of the system, we have designed a simple User Interface, with a
single fixed program window (see Figure 18) and just four selection options:
Figure 18. System User Interface Main Window
The first part, at the top of the window (see Figure 19), includes a dropdown menu with
all the Winograd schemas available. Several schemas are included by default, but the
user can add new ones, as we have seen before.
Figure 19. Winograd Selection Frame
When a new schema is selected, the information presented on screen shows the
corresponding values of Phrase, Question and the two possible Answers, 1 and 2.
Every schema has a special word, selected by default, but we can select the alternative
word with the double A/B button. As we can see, it modifies the Phrase text and also
(as expected) the answer output.
The center part of the window summarizes the two possible answers from the DEC
Reasoner tool. After pressing the DEC Reasoning button, the output of DEC Reasoner
for each model is shown. We can see the solvers used, the number of models found
and the sequence of events and fluent changes, detailed for every time stamp
(narration).
Figure 20. DEC Output Narration and answer about models
Also, the execution information from DEC Reasoner is displayed to allow a deeper
check of the output (see Figure 21).
Figure 21. DEC Output related with the Execution messages
Finally, two additional frames display the results of the reasoning done by the
system. In the first one, the system lists all the relevant information found in the
schema, that is, the data it will use when trying to deduce whether there is an answer
to the question and, if so, which of the two possible options is selected.
Figure 22. Relevant information found and Final Answer about the models
For example, when reasoning about the schema from Chapter 7.1, the system returns
the information shown in Figure 22. The most relevant data set the ending time of the
movement of Agent X (the first one appearing in the sentence) later than that of the
other agent. In addition, Agent Q (related to the pronoun) departs from a greater
distance than the other agent. Considering both conditions, the solution must be
Agent X: The Firemen, as shown in the ANSWER frame.
Finally, to allow new schemas to be added to the system, there is a pop-up window
accessible through the “Add Schema” button (see Figure 23). There we can introduce
all the properties needed to define a new schema. When finished, the new schema
appears in the dropdown list of the main window.
Figure 23. System User Interface Main Window
8. EVALUATION
8.1. Test Environment
The Model-Based WSC System has been evaluated with a reduced Winograd Schema
corpus that includes only Spatio-temporal situations. As described in Chapter 6.8, this
is the base of our test environment. The corpus has been created with several of the
schemas included in [Levesque et al., 2012], together with new proposals and
variations of the previous ones.
The following list includes the schemas of our corpus. It is not a large list (twelve
examples), but it covers different situations with the goal of demonstrating whether
or not our system can solve them.
1. The firemen arrived [after/before] the police because they were coming from far away.
Who were coming from far away?
A. The Firemen
B. The Police
2. The firemen departed [after/before] the police because they were coming from far away.
Who were coming from far away?
A. The Firemen
B. The Police
3. The firemen should be [nearer/farther] than the police, because they were the first to help in the
accident.
Who were the first to help in the accident?
A. The Firemen
B. The Police
4. The rabbit arrived before the turtle because it was [faster/slower].
Which one was [faster/slower]?
A. The Rabbit
B. The Turtle
5. The motorbike overtook the car because it was [faster/slower].

Which one was [faster/slower]?
A. The Motorbike
B. The Car
6. John was waiting for Peter because he was [late/early].
Who was [late/early]?
A. John
B. Peter
7. The box should be [under/over] the book, because Oscar could see it.
What could Oscar see?
A. The Box
B. The Book
8. John couldn’t see the stage with Billy in front of him because he is so [short/tall].
Who is so [short/tall]?
A. John
B. Billy
9. The trophy doesn't fit into the brown suitcase because it's too [small/big].
What is too [small/big]?
A. The Trophy
B. The Suitcase
10. The table won't fit through the doorway because it is too [wide/narrow].
What is too [wide/narrow]?
A. The Table
B. The Doorway
11. Although they ran at about the same speed, Sue beat Sally because she had such a [good/bad] start.
Who had a [good/bad] start?
A. Sue
B. Sally
12. The sack of potatoes had been placed [above/below] the bag of flour, so it had to be moved first.
What had to be moved first?
A. The Sack of Potatoes
B. The Bag of Flour
Comments about the schemas:
• Schemas 1, 6, 8, 9, 10, 11 and 12 come from [Levesque et al., 2012].
• Schemas 2 and 3 are variations of the first one. The idea behind this selection is
to test whether the system is capable of dealing with small variations that change
the meaning of the text.
• Schemas 4, 5 and 7 have been created specifically for this test. They add speed
and position (visibility) concepts.
• Schemas 1 to 6 and 11 are related to a Movement domain.
• Schemas 7, 8 and 12 are related to a Position domain.
• Schemas 9 and 10 are related to a Size domain. Schema 8 could also be related to
this domain.
The idea behind this selection is not only to demonstrate the ability of the system to
solve them. In some cases, we will also see that the system is able to pass the test only
after its knowledge base is increased.
The system has been provided with a list of 100 relevant words and different models
for four specific Spatio-temporal domains: Movement, Waiting, Placement and Fit. All
the models are included in ANNEX B.
8.2. Results
The following are the results of a first try for the twelve schemas listed previously.
For each execution, we include the relevant information found and the output answer:
NOTE: The information shown corresponds only to the special word, not to its alternative. In
every case where there was an answer, the system returned the other agent as the answer for
the alternative word (as expected).
1. The firemen arrived after the police because they were coming from far away.
   Agent X: The Firemen; Agent Y: The Police; Agent Q (PRP): they
   Model: Movement
   Information found: End Time of Agent X: +; Start Distance of Agent Q: +
   Result: One model found. The answer is: The Firemen
2. The firemen departed after the police because they were coming from far away.
   Agent X: The Firemen; Agent Y: The Police; Agent Q (PRP): they
   Model: Movement
   Information found: End Time of Agent X: +; Start Distance of Agent Q: +
   Result: The answer is: The Police
3. The firemen should be nearer than the police, because they were the first to help in the accident.
   Agent X: The Firemen; Agent Y: The Police; Agent Q (PRP): they
   Model: Movement
   Information found: End Time of Agent Q: -; Start Distance of Agent X: -
   Result: The answer is: The Firemen
4. The rabbit arrived before the turtle because it was faster.
   Agent X: The Rabbit; Agent Y: The Turtle; Agent Q (PRP): it
   Model: Movement
   Information found: Speed of Agent Q: +; End Time Agent X: -
   Result: The answer is: The Rabbit
5. The motorbike overtook the car because it was faster.
   Agent X: The Motorbike; Agent Y: The Car; Agent Q (PRP): it
   Model: Movement
   Information found: Speed of Agent Q: +; Speed of Agent X: +
   Result: The answer is: The Motorbike
6. John was waiting for Peter because he was late.
   Agent X: John; Agent Y: Peter; Agent Q (PRP): he
   Model: Waiting
   Information found: End Time of Agent X: -; Agent X is Waiting
   Result: One model found. The answer is: Peter
7. The box should be under the book, because Oscar could see it.
   Agent X: The Box; Agent Y: The Book; Agent Q (PRP): it
   Model: Placement (See)
   Information found: Speed of Agent Q: +; Speed of Agent X: +
   Result: One model found. The answer is: The Book
8. John couldn’t see the stage with Billy in front of him because he is so short.
   Agent X: John; Agent Y: Billy; Agent Q (PRP): he
   Model: Placement
   Information found: (none listed)
   Result: No model found.
9. The trophy doesn't fit into the brown suitcase because it's too small.
   Agent X: The Trophy; Agent Y: The Suitcase; Agent Q (PRP): it
   Model: Fit
   Information found: Size of Agent Q: -; Size of Agent X: +
   Result: One model found. The answer is: The Suitcase
10. The table won't fit through the doorway because it is too wide.
   Agent X: The Table; Agent Y: The Doorway; Agent Q (PRP): it
   Model: Fit
   Information found: Size of Agent Q: +; Size of Agent X: +
   Result: One model found. The answer is: The Table
11. Although they ran at about the same speed, Sue beat Sally because she had such a good start.
   Agent X: Sue; Agent Y: Sally; Agent Q (PRP): she
   Model: Movement
   Information found: (none listed)
   Result: No model found.
12. The sack of potatoes had been placed above the bag of flour, so it had to be moved first.
   Agent X: The Sack of Potatoes; Agent Y: The Bag of Flour; Agent Q (PRP): it
   Model: Movement
   Information found: End Time of Agent Q: -
   Result: No model found.
Adding new schemas to the test is as easy as including them in the code (and
recompiling afterwards) or describing them through the “Add Schema” button and
filling in the requested data.
8.3. Results Analysis
The first conclusion obtained after this first try and the analysis of the previous
results is that the hardest task is probably the NLP semantic analysis. That is, after
obtaining the basic information with the syntactic and morphological analysis,
matching the data with the possible domain, the agents and their Spatio-temporal
properties is not obvious. This is what we expected when starting this project, as the
difficulties of understanding natural language are well known.
Also, thanks to the model abstraction, it is possible to cover an interesting number of
stories or actions with domain models that are not very complex. When we first try to
match the text with a model, most of the trivial data is discarded. It does not mean
that the information we have not taken into account is never important, but the
system just tries to answer the question included in the Winograd schema. If the
question is about which car is faster, the color of the car is probably not relevant at
all.
The results comprise nine correct answers and three failures, where no model was
found or the information detected was not enough to understand the text. The first
failure was with the eighth schema. The key problem here was the part “see the stage
with Billy in front of”. The system was not able to relate John's viewing problem to
Billy's position. The domain detected was the right one, but the lack of relevant data
made it fail.
The second failure, with the 11th schema, has a similar origin. Here the system could
not understand that a good start (or a bad start) means a higher speed for one of the
agents. The difficulty is the qualifier, which modifies the resulting Spatio-temporal
property.
The last failure appears in the 12th schema. Here the problem is a wrong domain
selection. The relevant words lead to a Movement domain, but the sentence is more
related to the Placement domain. This first error limits the solution the system can
reach. Additionally, the model does not consider the situation where an object placed
over another object must be moved first.
Among the most difficult issues found when creating the logic models and the Relevant
Information extraction subsystem, we should include the following:
• Considering the negative version of the verbs or actions (could see vs. couldn’t
see).
• Considering special actions included in a more general one. For example, the state
of “Being Waiting” for someone is part of a more general Movement domain. Here
the hard work is the definition of a consistent model based on a hierarchical action
tree and the development of a powerful exploration subsystem able to match the
best model option.
• Finding the right relation between the relevant words and the agents, and
understanding the consequences of this relationship (how the Spatio-temporal
coordinates are modified according to the meaning of the word).
• The ambiguity of many words and the complexity of natural language
constructions that combine several words.
• When the main verb is “to be”, the information about the agent coordinates is not
as clear as when a directional verb is used (arrive, depart, come…).
• Following the dependencies among words found by the NLP tools is not an easy
task. It requires going further than the direct connections if we want to find the
right relation with the agents.
• This work was not intended to solve NLP questions, which has been a handicap
during the interpretation of the extracted information.
Another interesting idea that emerged during this analysis concerns the generalization
of the models. With more general models, the system can deal with more sentences,
but the specific details of each one become harder to understand. This can be solved
by extending the knowledge databases with additional relevant words.
PART IV: CONCLUSIONS
9. FUTURE LINES OF INVESTIGATION
We follow the path outlined by the Model-Based WSC Solver to present some ideas to
improve and advance the presented work.
The first part of the solver extracts information from a sentence by using NLP
methods. In this work, we do not attempt to advance or solve this part in a better
manner than the available state of the art. At present, the tools available for
extracting the syntactic and morphological information from the text cover the
required task without any problem.
Once the basic information is available, the following analysis must lead to the
definition of the domain related to the text. Complementing the analysis done over the
word list from the sentence (verbs and domain words) is a key aspect for improving
the domain selection process. At this point, the relevance of the database used to
select and match the words with every possible domain seems obvious. Also,
increasing the number of covered domains will increase the number of sentences that
the system is able to process successfully.
The last part to be improved is the model definition for every covered domain. It would
imply the manual declaration of new event calculus predicates describing the world.
Also, the more general the models are, the more situations will be covered by the
solver. As with the rest of the elements of the solver, increasing the scope adds new
chances to find the right solution, but there are other opportunities aside from this
logical growth of the proposal.
One possible line of investigation would be the use of automated learning and
clustering to classify every sentence into a domain according to the words of the
sentence. By training the system with a set of sentences and their corresponding
domains, it could be possible to reduce the number of failures in the domain
selection. It must be a complement to, and not a substitute for, the implemented
selection method, and it would help especially when the possible options do not appear
to be clearly defined. A small variant completely compatible with this investigation would be
the comparison of
As the domains are not always perfectly delimited and more specific situations could
arise during the text analysis, one possible solution could be the definition of
general model templates with additional event calculus predicates that are added to
the general model only when the related special situations are detected in the
sentence. For example, when the sentence describes the relative position of two
objects, we can consider this information as the base for the analysis. We can add later
(if needed) small pieces of knowledge about the visibility of the object, which one can
be taken before the other or how they fit in the space (whether or not one object can
be inside the other). It would imply more than one domain analysis, as the system
must understand the possible branches from the general theme.
Another related and interesting research line concerns the relevant information from
the sentence and the generation of the agent coordinates. As we have seen, after a
model template is selected, the information that must be completed is statically
defined. Again, it could be possible to define new processes that dynamically select
the variables defining the agents' Spatio-temporal coordinates.
10. CONCLUSIONS
Trying to solve the Winograd Schema Challenge is, without doubt, really hard work. It
brings together the classical NLP problems, with their complexity and recurrent
ambiguity (even when we consider common daily situations). In this work, our goal was
to define a Model-Based WSC Solver that uses event calculus models to represent
commonsense knowledge as the way to address this problem. It is far from other
automated and statistical proposals, like most of the latest solutions introduced by
different researchers.
As presented in the evaluation results, we have achieved the first and main goal of this
work: trying to understand a sentence by way of models created with event calculus. It
is achieved by matching the sentence with a model of a specific domain, instead of
working exclusively with semantic analyses of the text. The abstraction and distance
obtained with this method allow us to set aside the less relevant parts of the sentence
and focus only on the important information.
Against the proposed solution, we can find the following issues. We have seen that it
can fail in the understanding process if the models do not really cover what is
described in the sentence, or if the situation is not considered at all (there is no
model for it). The use of manually defined models is also a handicap, as it turns the
method into a hard and time-consuming task. There is also a critical point in the
process: the domain selection.
A more complete, structured and linked database would probably be interesting to
allow external contributions. For example, the definition of a specific ontology directly
related to the Spatio-temporal data would help in joining more general ontologies.
The system is not ready to apply this kind of information, but it would not be so
difficult to do it. The main reason for not doing this work from the beginning is that
the study was not oriented to the information format. Also, most of the ontologies
found are intended to solve detailed spatio-temporal questions, such as GIS or time
schedule information. The everyday natural language used in the sentences of common
Winograd schemas makes this kind of data of little use.
Additionally, we have not found ontologies or databases providing event calculus
properties about the world that could help in the model creation. It could be really
interesting to address this aspect of the project by creating general event calculus
predicates ready to be reused in future models.
In favor of the proposal, when the sentence is matched with the right domain, for
example, it is solved easily, including many sentence variations. The sentence
structure and word components can change in many ways and the system will still be
able to solve it. In addition, the growth of the system is completely feasible and open
to additional knowledge sources. Thanks to the modularity achieved, we can focus only
on those parts we are interested in.
In summary, we have addressed and solved just a small part of this hard problem. We
have shown that event calculus could be an interesting path to solve the Winograd
Schema Challenge. Of course, the task is really large and we must understand that we
are far from a fully satisfactory goal. But the defined starting points may be good
enough to increase the success rate when the proposed future lines of investigation
start returning new achievements. Since we can start modeling many different world
situations with just a small number of predicates, we will have interesting results
from the beginning.
ANNEX A EVENT CALCULUS AXIOMATIZATION
EVENT CALCULUS AXIOMATIZATION
Composed of ten axioms (EC5, EC6, EC9, EC10, EC11, EC12, EC14, EC15, EC16, and EC17) and
seven definitions (EC1, EC2, EC3, EC4, EC7, EC8, and EC13):
EC1. Clipped(t1, f, t2) ≝ ∃ e, t ( Happens(e, t) ∧ t1 ≤ t < t2 ∧ Terminates(e, f, t) ).
EC2. Declipped(t1, f, t2) ≝ ∃ e, t ( Happens(e, t) ∧ t1 ≤ t < t2 ∧ Initiates(e, f, t) ).
EC3. StoppedIn(t1, f, t2) ≝ ∃ e, t ( Happens(e, t) ∧ t1 < t < t2 ∧ Terminates(e, f, t) ).
EC4. StartedIn(t1, f, t2) ≝ ∃ e, t ( Happens(e, t) ∧ t1 < t < t2 ∧ Initiates(e, f, t) ).
EC5. Happens(e, t1) ∧ Initiates(e, f1, t1) ∧ 0 < t2 ∧ Trajectory(f1, t1, f2, t2) ∧
¬StoppedIn(t1, f1, t1 + t2) ⊃ HoldsAt(f2, t1 + t2).
EC6. Happens(e, t1) ∧ Terminates(e, f1, t1) ∧ 0 < t2 ∧ AntiTrajectory(f1, t1, f2, t2) ∧
¬StartedIn(t1, f1, t1 + t2) ⊃ HoldsAt(f2, t1 + t2).
EC7. PersistsBetween(t1, f, t2) ≝ ¬∃ t ( ReleasedAt(f, t) ∧ t1 < t ≤ t2 ).
EC8. ReleasedBetween(t1, f, t2) ≝ ∃ e, t ( Happens(e, t) ∧ t1 ≤ t < t2 ∧ Releases(e, f, t) ).
EC9. HoldsAt(f, t1) ∧ t1 < t2 ∧ PersistsBetween(t1, f, t2) ∧ ¬Clipped(t1, f, t2) ⊃ HoldsAt(f, t2).
EC10. ¬HoldsAt(f, t1) ∧ t1 < t2 ∧ PersistsBetween(t1, f, t2) ∧ ¬Declipped(t1, f, t2) ⊃
¬HoldsAt(f, t2).
EC11. ReleasedAt(f, t1) ∧ t1 < t2 ∧ ¬Clipped(t1, f, t2) ∧ ¬Declipped(t1, f, t2) ⊃ ReleasedAt(f, t2).
EC12. ¬ReleasedAt(f, t1) ∧ t1 < t2 ∧ ¬ReleasedBetween(t1, f, t2) ⊃ ¬ReleasedAt(f, t2).
EC13. ReleasedIn(t1, f, t2) ≝ ∃ e, t ( Happens(e, t) ∧ t1 < t < t2 ∧ Releases(e, f, t) ).
EC14. Happens(e, t1) ∧ Initiates(e, f, t1) ∧ t1 < t2 ∧ ¬StoppedIn(t1, f, t2) ∧
¬ReleasedIn(t1, f, t2) ⊃ HoldsAt(f, t2).
EC15. Happens(e, t1) ∧ Terminates(e, f, t1) ∧ t1 < t2 ∧ ¬StartedIn(t1, f, t2) ∧
¬ReleasedIn(t1, f, t2) ⊃ ¬HoldsAt(f, t2).
EC16. Happens(e, t1) ∧ Releases(e, f, t1) ∧ t1 < t2 ∧ ¬StoppedIn(t1, f, t2) ∧
¬StartedIn(t1, f, t2) ⊃ ReleasedAt(f, t2).
EC17. Happens(e, t1) ∧ (Initiates(e, f, t1) ∨ Terminates(e, f, t1)) ∧ t1 < t2 ∧
¬ReleasedIn(t1, f, t2) ⊃ ¬ReleasedAt(f, t2).
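The interval definitions EC1 to EC4 differ only in whether the left endpoint t1 is included. The following minimal Python sketch makes that difference concrete over a finite, ground narrative. It is only an illustration: the `Narrative` class and the function names are assumptions introduced here, and event effects are taken to be time-independent, which is a simplification of the Initiates(e, f, t) and Terminates(e, f, t) predicates.

```python
from dataclasses import dataclass, field


@dataclass
class Narrative:
    """Finite ground narrative (illustrative structure, not part of any reasoner)."""
    happens: set = field(default_factory=set)     # {(event, timepoint)}
    initiates: set = field(default_factory=set)   # {(event, fluent)}, time-independent
    terminates: set = field(default_factory=set)  # {(event, fluent)}, time-independent


def _occurs(n, effects, f, t1, t2, closed_left):
    """True if some event with the given effect on f happens inside the interval."""
    for e, t in n.happens:
        in_range = (t1 <= t if closed_left else t1 < t) and t < t2
        if in_range and (e, f) in effects:
            return True
    return False


def clipped(n, t1, f, t2):      # EC1: t1 <= t < t2
    return _occurs(n, n.terminates, f, t1, t2, closed_left=True)


def declipped(n, t1, f, t2):    # EC2: t1 <= t < t2
    return _occurs(n, n.initiates, f, t1, t2, closed_left=True)


def stopped_in(n, t1, f, t2):   # EC3: t1 < t < t2
    return _occurs(n, n.terminates, f, t1, t2, closed_left=False)


def started_in(n, t1, f, t2):   # EC4: t1 < t < t2
    return _occurs(n, n.initiates, f, t1, t2, closed_left=False)
```

For an event that terminates a fluent exactly at t1, `clipped` is true while `stopped_in` is false, which is why EC9 uses Clipped and EC14 uses StoppedIn.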
DISCRETE EVENT CALCULUS AXIOMATIZATION
Composed of ten axioms (DEC3, DEC4, DEC5, DEC6, DEC7, DEC8, DEC9, DEC10, DEC11, and DEC12) and two
definitions (DEC1 and DEC2).
DEC1. StoppedIn(t1, f, t2) ≝ ∃ e, t ( Happens(e, t) ∧ t1 < t < t2 ∧ Terminates(e, f, t) ).
DEC2. StartedIn(t1, f, t2) ≝ ∃ e, t ( Happens(e, t) ∧ t1 < t < t2 ∧ Initiates(e, f, t) ).
DEC3. Happens(e, t1) ∧ Initiates(e, f1, t1) ∧ 0 < t2 ∧ Trajectory(f1, t1, f2, t2) ∧
¬StoppedIn(t1, f1, t1 + t2) ⊃ HoldsAt(f2, t1 + t2).
DEC4. Happens(e, t1) ∧ Terminates(e, f1, t1) ∧ 0 < t2 ∧ AntiTrajectory(f1, t1, f2, t2) ∧
¬StartedIn(t1, f1, t1 + t2) ⊃ HoldsAt(f2, t1 + t2).
DEC5. HoldsAt(f, t) ∧ ¬ReleasedAt(f, t + 1) ∧ ¬∃ e ( Happens(e, t) ∧ Terminates(e, f, t) ) ⊃
HoldsAt(f, t + 1).
DEC6. ¬HoldsAt(f, t) ∧ ¬ReleasedAt(f, t + 1) ∧ ¬∃ e ( Happens(e, t) ∧ Initiates(e, f, t) ) ⊃
¬HoldsAt(f, t + 1).
DEC7. ReleasedAt(f, t) ∧ ¬∃ e ( Happens(e, t) ∧ (Initiates(e, f, t) ∨ Terminates(e, f, t)) ) ⊃
ReleasedAt(f, t + 1).
DEC8. ¬ReleasedAt(f, t) ∧ ¬∃ e ( Happens(e, t) ∧ Releases(e, f, t) ) ⊃ ¬ReleasedAt(f, t + 1).
DEC9. Happens(e, t) ∧ Initiates(e, f, t) ⊃ HoldsAt(f, t + 1).
DEC10. Happens(e, t) ∧ Terminates(e, f, t) ⊃ ¬HoldsAt(f, t + 1).
DEC11. Happens(e, t) ∧ Releases(e, f, t) ⊃ ReleasedAt(f, t + 1).
DEC12. Happens(e, t) ∧ (Initiates(e, f, t) ∨ Terminates(e, f, t)) ⊃ ¬ReleasedAt(f, t + 1).
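Axioms DEC5, DEC6, DEC9 and DEC10 together describe how the set of true fluents evolves from timepoint t to t + 1. The sketch below is a hand-written forward simulation of only those four axioms, under the assumption that no fluent is ever released (so DEC7, DEC8, DEC11 and DEC12 play no role) and that effects do not depend on time. The names `step` and `simulate` are illustrative; this is not how the DEC Reasoner itself works, which encodes the axioms as a satisfiability problem.

```python
def step(holds, events, initiates, terminates):
    """Compute the fluents true at t + 1 from those true at t.

    holds      -- set of fluents true at t
    events     -- set of events happening at t
    initiates  -- dict: event -> set of fluents it initiates
    terminates -- dict: event -> set of fluents it terminates
    """
    started = set().union(*(initiates.get(e, set()) for e in events))
    stopped = set().union(*(terminates.get(e, set()) for e in events))
    if started & stopped:
        # DEC9 and DEC10 would both apply: the narrative has no model.
        raise ValueError(f"inconsistent effects on {started & stopped}")
    # DEC10 removes terminated fluents, DEC9 adds initiated ones,
    # DEC5/DEC6 keep every other fluent unchanged (inertia).
    return (holds - stopped) | started


def simulate(initial, narrative, initiates, terminates, max_time):
    """Return the list of states for timepoints 0..max_time.

    narrative -- dict: timepoint -> set of events happening then
    """
    states = [set(initial)]
    for t in range(max_time):
        events = narrative.get(t, set())
        states.append(step(states[-1], events, initiates, terminates))
    return states
```

Note that an event happening at t only changes the state at t + 1, which matches the way the narrative predicates in Annex B are queried one timepoint after the event of interest.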
ANNEX B REASONING DEC MODELS
MOVEMENT MODEL
Variables
$AgentX
$AgentY
$SpeedX
$SpeedY
$StartDistX
$StartDistY
$StartTimeX
$StartTimeY
$EndTimeX
$EndTimeY
$maxTime
$maxDistance
$offset
General Rules
; OPTIONS
option timediff on
option showpred off
; Movement Sorts and Fluents Declaration

sort agent

fluent Waiting(agent)
fluent AtMeetingPoint(agent)
fluent Distance(agent, distance)
fluent NormalSpeed(agent)
fluent FasterSpeed(agent)
event Depart(agent)
event Arrive(agent)
; Movement Domain Rules:

[agent, time]

[agent, time]
; #MR3: An agent can only be at one distance at any given time.
[agent, distance1, distance2, time]
HoldsAt(Distance(agent, distance1), time) & HoldsAt(Distance(agent, distance2), time) ->
distance1 = distance2.
; #MR4: The distance of a Moving agent changes proportionally to the elapsed time.
[agent, distance1, distance2, offset, time]
HoldsAt(Distance(agent, distance1), time) & HoldsAt(NormalSpeed(agent), time) &
distance2 = (distance1 - offset*1) -> Trajectory(Moving(agent), time, Distance(agent, distance2), offset).

[agent, distance1, distance2, offset, time]
HoldsAt(Distance(agent, distance1), time) & HoldsAt(FasterSpeed(agent), time) &
distance2 = (distance1 - offset*2) -> Trajectory(Moving(agent), time, Distance(agent, distance2), offset).
; #MR5: When an agent departs, its distance to the Goal is released (it starts changing).
[agent, distance, time]
Releases(Depart(agent), Distance(agent, distance), time).
; #MR6: An agent that is Moving will Arrive when the Distance to Goal is 0.
[agent, time]
HoldsAt(Moving(agent), time) & HoldsAt(Distance(agent, 0), time) -> Happens(Arrive(agent), time).
; #MR7: An agent that arrives at a given distance stays at that distance.
[agent, distance, time]
HoldsAt(Distance(agent, distance), time) ->
Initiates(Arrive(agent), Distance(agent, distance), time).
; #MR8: If an agent is not at distance 0 to Goal or is not Moving, he will not Arrive.
[agent, time]
!HoldsAt(Distance(agent, 0), time) | !HoldsAt(Moving(agent), time) -> !Happens(Arrive(agent), time).
; #MR9: If an agent is at distance 0 to Goal or is Moving, he will not Depart.
[agent, time]
HoldsAt(Distance(agent, 0), time) | HoldsAt(Moving(agent), time) -> !Happens(Depart(agent), time).
; #MR10: An agent at NormalSpeed cannot be at FasterSpeed.
[agent, time]
HoldsAt(NormalSpeed(agent), time) -> !HoldsAt(FasterSpeed(agent), time).
Narrative Predicates
HoldsAt($SpeedX($AgentX), 0).
HoldsAt($SpeedY($AgentY), 0).
; At Time = 0: No agent is moving

!HoldsAt(Moving($AgentX),0).
!HoldsAt(Moving($AgentY),0).
; At Time = 0: Distance to Goal is declared

HoldsAt(Distance($AgentX, $StartDistX), 0).
HoldsAt(Distance($AgentY, $StartDistY), 0).
; At Time = X/Y: Both agents start moving

Happens(Depart($AgentX), $StartTimeX).
Happens(Depart($AgentY), $StartTimeY).
; At Time = X/Y: Both agents arrive and stop moving

range distance 0 $maxDistance
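Rule #MR4 makes the distance decrease by one unit per time step at NormalSpeed and by two units at FasterSpeed, and #MR6 triggers Arrive when the distance reaches 0. For a quick sanity check of a narrative before running the reasoner, the expected arrival timepoint can be computed in closed form. The sketch below does that; the function names are assumptions made for illustration, and rounding up when the distance is not a multiple of the speed is also an assumption, since the model itself only fires Arrive when the distance is exactly 0.

```python
import math

# Distance units covered per time step, taken from the two #MR4 rules.
SPEED = {"NormalSpeed": 1, "FasterSpeed": 2}


def arrival_time(start_dist, start_time, speed_fluent):
    """Timepoint at which the distance to the Goal reaches 0."""
    return start_time + math.ceil(start_dist / SPEED[speed_fluent])


def first_to_arrive(agents):
    """Return (names of the earliest agents, arrival time of every agent).

    agents -- dict: name -> (start_dist, start_time, speed_fluent)
    """
    times = {name: arrival_time(*spec) for name, spec in agents.items()}
    best = min(times.values())
    winners = sorted(name for name, t in times.items() if t == best)
    return winners, times
```

The tuple for each agent corresponds to the model variables $StartDistX, $StartTimeX and $SpeedX (and their Y counterparts).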
PLACEMENT MODEL
Variables
$AgentX
$AgentY
$AgentQ
$StartTimeX
$StartTimeY
$EndTimeX
$EndTimeY
$EndTimeQ
$maxTime
$offset
General Rules
; OPTIONS
option timediff on
option showpred off
; Placement Sorts, Fluents and Events Declaration

sort agent

fluent Hidden(agent)
fluent Visible(agent)
event PlaceOn(agent)
; Placement Domain Rules:
; #WR1: PlaceOn initiates Visible

[agent, time]
Initiates(PlaceOn(agent), Visible(agent), time).
; #WR2: PlaceOn terminates Hidden

[agent, time]
Terminates(PlaceOn(agent), Hidden(agent), time).
; #WR3: Only one agent (object) is placed at the same time.

[agent, time1, time2]
Happens(PlaceOn(agent), time1) & Happens(PlaceOn(agent), time2) -> time1 = time2.
; #WR4: A hidden agent is not visible.
[agent,time]
HoldsAt(Hidden(agent), time) -> !HoldsAt(Visible(agent), time).
; #WR5: A visible agent is not hidden

[agent,time]
HoldsAt(Visible(agent), time) -> !HoldsAt(Hidden(agent), time).
; #WR6: If a new agent is placed at the same position, the previous one is not visible
[agent1, agent2, time]
HoldsAt(Visible(agent1), time) -> Terminates(PlaceOn(agent2), Visible(agent1), time).
Narrative Predicates
; At the start time, both agents are hidden
HoldsAt(Hidden($AgentX), $StartTimeX).
HoldsAt(Hidden($AgentY), $StartTimeY).
; At a specific Time = X, each object is placed and the Visible property could change.
Happens(PlaceOn($AgentX), $EndTimeX).
Happens(PlaceOn($AgentY), $EndTimeY).
HoldsAt(Visible($AgentQ), $EndTimeQ).
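The net effect of rules #WR1 and #WR6 is that the most recently placed object is the visible one. A minimal Python sketch of that behaviour follows, assuming each object is placed at most once and that effects become true one timepoint after the event, as in DEC9 and DEC10. The function `visible_at` and the example objects are hypothetical and only illustrate the rules; they do not replace the reasoner.

```python
def visible_at(placements, query_time):
    """Return the set of agents for which Visible holds at query_time.

    placements -- dict: agent -> timepoint of its PlaceOn event
    """
    visible = set()
    for t in range(query_time):
        placed = {agent for agent, when in placements.items() if when == t}
        if placed:
            # #WR6: PlaceOn terminates Visible for whatever was visible at t.
            # #WR1: PlaceOn initiates Visible for the placed agent(s).
            visible = placed
    return visible
```

With this helper, the last narrative predicate of the model, HoldsAt(Visible($AgentQ), $EndTimeQ), corresponds to checking whether the queried agent is in `visible_at(placements, end_time_q)`.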
WAITING MODEL
Variables
$AgentX
$AgentY
$AgentQ
$StartTimeX
$StartTimeY
$EndTimeX
$EndTimeY
$EndTimeQ
$maxTime
$offset
General Rules
option timediff on
option showpred off
option trajectory on
; Sorts declaration
sort agent
; Fluents and Events

fluent Waiting(agent)
fluent AtMeetingPoint(agent)

[agent, time]
Initiates(Arrive(agent), AtMeetingPoint(agent), time).

[agent, time]
; #WR3: If an agent is Waiting, he is not Moving

[agent, time]
HoldsAt(Waiting(agent), time) -> !HoldsAt(Moving(agent), time).
; #WR4: If an agent is not Moving, he will not Arrive.

[agent, time]
!HoldsAt(Moving(agent), time) -> !Happens(Arrive(agent), time).
; #WR5: If agent2 arrives and agent1 is not waiting, then agent2 starts waiting.
[agent1, agent2, time]
!HoldsAt(Waiting(agent1), time) & agent1 != agent2 ->
Initiates(Arrive(agent2), Waiting(agent2), time).
; #WR6: If agent2 arrives while agent1 is waiting, then agent1 stops waiting.
[agent1, agent2, time]
HoldsAt(Waiting(agent1), time) ->
Terminates(Arrive(agent2), Waiting(agent1), time).
; #WR7: If an agent is Moving, he is not at the Meeting Point.

[agent, time]
HoldsAt(Moving(agent), time) -> !HoldsAt(AtMeetingPoint(agent), time).
Narrative Predicates
; At Time = 0: Both agents are moving

HoldsAt(Moving($AgentX), $StartTimeX).
HoldsAt(Moving($AgentY), $StartTimeY).
; Relative Time (related with Meeting Time)

; Time 3 = LATE
; Time 2 = MEETING TIME
; Time 1 = EARLY
HoldsAt(Waiting($AgentW), $EndTimeW + 1).
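Rules #WR5 and #WR6 can be read procedurally: the first agent to arrive starts waiting, and the arrival of the second agent ends that wait. The following sketch applies the two rules literally, one timepoint at a time, to a table of arrival times. It is only an illustration under the assumption of DEC-style delayed effects; `waiting_timeline` is a name introduced here, and the sketch is intended for the two-agent case used in the model.

```python
def waiting_timeline(arrivals, max_time):
    """Return, for each timepoint 0..max_time, the set of Waiting agents.

    arrivals -- dict: agent -> timepoint of its Arrive event
    """
    waiting = set()
    timeline = [set()]
    for t in range(max_time):
        arriving = {a for a, when in arrivals.items() if when == t}
        # #WR5: an arriving agent starts waiting if some other agent is not waiting.
        initiated = {
            a for a in arriving
            if any(o not in waiting for o in arrivals if o != a)
        }
        # #WR6: any arrival terminates the waiting of agents already waiting.
        terminated = set(waiting) if arriving else set()
        waiting = (waiting - terminated) | initiated
        timeline.append(set(waiting))
    return timeline
```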
FIT MODEL
Variables
$AgentX
$AgentY
$AgentBig
$AgentSmall
$AgentFit
$maxTime
$offset
General Rules
option timediff on
option showpred off
option trajectory on
; Sorts declaration
sort agent
; Fluents
fluent Fit(agent)
fluent Bigger(agent)
fluent Smaller(agent)
;Events
event Place(agent)
; #WR1: If an agent is placed, it is because it fits

[agent, time]
Initiates(Place(agent), Fit(agent), time).
; #WR2: Every agent (object) is placed at most once.

[agent, time1, time2]
Happens(Place(agent), time1) & Happens(Place(agent), time2) -> time1 = time2.
; #WR3: Only one agent (object) is placed at the same time.
[agent1, agent2, time]
Happens(Place(agent1), time) & Happens(Place(agent2), time) -> agent1 = agent2.
; #WR4: A Smaller object Fits into a Bigger object, but not the opposite.
[agent1, agent2, time]
HoldsAt(Bigger(agent1), time) & HoldsAt(Smaller(agent2), time) ->
HoldsAt(Fit(agent2), time) & !HoldsAt(Fit(agent1), time).
Narrative Predicates
; At Time = 0: We define the size of every agent and which one Fits.
HoldsAt(Bigger($AgentBig), 0).
HoldsAt(Smaller($AgentSmall), 0).
HoldsAt(Fit($AgentFit), 0).
range time 0 0
range offset 1 1
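Since the Fit narrative contains only three facts at time 0, the effect of rule #WR4 can be checked by hand: the narrative has a model only when the agent bound to $AgentFit is not the one bound to $AgentBig. The sketch below encodes that check and uses it to filter candidate referents. Both function names and the example objects are hypothetical; the actual decision in the system is taken by the reasoner finding (or not finding) a model.

```python
def narrative_is_consistent(agent_big, agent_small, agent_fit):
    """True if Bigger(agent_big), Smaller(agent_small), Fit(agent_fit) satisfy #WR4.

    #WR4 forces Fit(agent_small) and forbids Fit(agent_big), so the narrative
    is contradictory when the same agent is both, or when the bigger agent
    is asserted to fit.
    """
    return agent_big != agent_small and agent_fit != agent_big


def candidates_that_fit(agent_big, agent_small, candidates):
    """Keep the candidates for $AgentFit that leave the narrative consistent."""
    return [
        c for c in candidates
        if narrative_is_consistent(agent_big, agent_small, c)
    ]
```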
ANNEX C DOMAIN KNOWLEDGE DATABASE
The database elements follow the format Type:Word:Value. The Value field contains a
Domain when the Type is Action, and the qualification symbol otherwise.
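As an illustration of this format, the sketch below splits one database line into its fields. The field names used in the resulting dictionary (`domain`, `phase`, `coordinate`, `sign`, `value`) are assumptions chosen for readability; in particular, `phase` is our label for the Start/End/Both field that follows the Domain in Action entries, and the optional coordinate and sign correspond to entries such as the one for "overtake".

```python
def parse_entry(line):
    """Split one 'Type:Word:Value' database line into a dictionary."""
    kind, word, *value = line.strip().split(":")
    entry = {"type": kind, "word": word}
    if kind == "Action":
        # Action:<word>:<Domain>:<Start|End|Both>[:<Coordinate>:<sign>]
        entry["domain"], entry["phase"] = value[0], value[1]
        if len(value) == 4:
            entry["coordinate"], entry["sign"] = value[2], value[3]
    else:
        # Every other type carries a single qualification symbol.
        entry["value"] = value[0]
    return entry
```

Multi-word entries such as "a lot of" are handled naturally because only the colon is treated as a separator.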
ACTIONS:
Action:depart:Movement:Start
Action:arrive:Movement:End
Action:travel:Movement:Both
Action:run:Movement:Both
Action:go:Movement:Start
Action:move:Movement:Both
Action:land:Movement:End
Action:come:Movement:Both
Action:walk:Movement:Both
Action:ride:Movement:Both
Action:drive:Movement:Both
Action:wait:Waiting:End
Action:see:See:Both
Action:overtake:Movement:Both:Speed:+
Action:place:Placement:Both
Action:fit:Fit:Both:Size:-
TIME:
Time:before:-
Time:sooner:-
Time:soon:-
Time:previously:-
Time:early:-
Time:now:0
Time:nowadays:0
Time:at present:0
Time:next:+
Time:after:+
Time:afterward:+
Time:later:+
Time:late:+
Time:first:-
Time:last:+
Time:short:-
Time:delay:+
DISTANCE:
Dist:near:-
Dist:nearer:-
Dist:close:-
Dist:closer:-
Dist:next:-
Dist:far:+
Dist:farther:+
Dist:away:+
Dist:distant:+
Dist:deep:+
Dist:deeper:+
Dist:remote:+
Dist:beyond:+
POSITION:
Pos:down:-
Pos:under:-
Pos:front:+
Pos:behind:-
Pos:up:+
Pos:over:+
Pos:inside:-
Pos:outside:+
Pos:see:+
Pos:hidden:-
SIZE:
Size:small:-
Size:smaller:-
Size:narrow:-
Size:big:+
Size:bigger:+
Size:wide:+
Size:tall:+
Size:short:-
SPEED:
Speed:fast:+
Speed:faster:+
Speed:slow:-
Speed:slower:-
QUALIFIER:
Qual:very:x2
Qual:many:x2
Qual:insignificant:-2
Qual:enormous:+2
Qual:minuscule:-2
Qual:giant:+2
Qual:tiny:-2
Qual:huge:+2
Qual:few:-1
Qual:many:+1
Qual:a lot of:+1
Qual:less:-1
Qual:more:+1
Qual:short:-1
Qual:long:+1
Qual:little:-1
Qual:high:+1
Qual:poor:+1
Qual:great:+1
Qual:same:0
Qual:equal:0
Qual:identical:0
SPECIAL ENTRIES (COORDINATES, DOMAINS AND WEIGHTS)
Coordinates:Time,Dist,Pos,Size,Speed
Domains:Movement:Waiting:Placement,See:Fit
Weights:Movement|1,1,0,0,1:Waiting|1,0,0,0,0:Placement|0,0,1,1,0:Fit|0,0,1,1,0
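The Weights entry assigns to every domain one weight per coordinate, in the order given by the Coordinates entry. A short sketch of how these two lines can be read into a lookup table is given below. The function name `parse_weights` is an assumption, and the sketch only covers parsing; how the weights are then combined to select a domain is not reproduced here.

```python
COORDINATES = "Coordinates:Time,Dist,Pos,Size,Speed"
WEIGHTS = ("Weights:Movement|1,1,0,0,1:Waiting|1,0,0,0,0:"
           "Placement|0,0,1,1,0:Fit|0,0,1,1,0")


def parse_weights(coord_line, weight_line):
    """Return a dict: domain -> {coordinate: weight}."""
    coords = coord_line.split(":", 1)[1].split(",")
    table = {}
    # The first chunk is the 'Weights' tag itself; the rest are 'Domain|w1,...,wn'.
    for chunk in weight_line.split(":")[1:]:
        domain, values = chunk.split("|")
        table[domain] = dict(zip(coords, map(int, values.split(","))))
    return table
```

For example, the table shows that the Movement domain is sensitive to the Time, Dist and Speed coordinates, while Placement and Fit depend on Pos and Size.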
REFERENCES
Allen, J.F. (1983). Maintaining knowledge about temporal intervals. Communications of the
ACM, Vol. 26 (11), pp. 832–843.
Allen, J.F. (1987). Natural Language Understanding (2nd edition). University of Rochester.
The Benjamin Cummings Publishing Company.
Bailey, D., Harrison, A., Lierler, Y., Lifschitz, V., & Michael, J. (2015). The Winograd
Schema Challenge and Reasoning about Correlation. Working Notes of the Symposium on
Logical Formalizations of Commonsense Reasoning.
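Baker, C.F., Fillmore, C.J., & Lowe, J.B. (1998). The Berkeley FrameNet Project.
Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics
and 17th International Conference on Computational Linguistics (COLING-ACL '98).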
Bender, D. (2015). Establishing a Human Baseline for the Winograd Schema Challenge.
Modern Artificial Intelligence and Cognitive Science Conference 2015, pp. 39-45.
Bova et al. (2015) Towards a Framework for Winograd Schemas Resolution. ESSENCE
Workshop 2015
Budukh, T.U. (2013). An intelligent co-reference resolver for Winograd schema sentences
containing resolved semantic entities. Degree Master of Science Thesis. Arizona State
Univ.
Indurkhya, N., & Damerau, F. J. (Eds.). (2010). Handbook of natural language processing
(Vol. 2). CRC Press.
Dagan, I., Glickman, O., & Magnini, B. (2006). The PASCAL recognizing textual entailment
challenge. Machine learning challenges. evaluating predictive uncertainty, visual object
classification, and recognizing textual entailment, pp. 177-190. Springer Berlin Heidelberg.
Forbus, K.D. (1996). Qualitative Reasoning. The Computer Science and Engineering
Handbook, ed. A. B. Tucker, pp. 715–733.
Kahler, R., & Sullivan, J. (2008). “Microtheories”. Technical Report. Cycorp Inc.
http://clio-knows.sourceforge.net/Microtheories-v2.pdf
Kowalski, R., & Sergot, M. (1989). A logic-based calculus of events. New Generation
Computing, Vol. 4, No. 1, pp. 67-95.
Levesque, H. J., Davis, E., & Morgenstern, L. (2012). The Winograd schema challenge.
Principles of Knowledge Representation and Reasoning: Proceedings of the Thirteenth
International Conference. AAAI Press.
Levesque, H. J. (2014). On our best behavior. Artificial Intelligence 212, pages 27-35.
Liu, H. & Singh, P. (2004). ConceptNet: A practical commonsense reasoning tool-kit.
BT Technology Journal, vol. 22, no. 4, pp. 211–226.
Manning, C.D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S.J., & McClosky, D. (2014).
The Stanford CoreNLP natural language processing toolkit. Proceedings of the 52nd
Annual Meeting of the Association for Computational Linguistics.
McCarthy, J. (1960). Programs with Common Sense. Proceedings of the Teddington
Conference on the Mechanization of Thought Processes, H.M. Stationery Office.
McCarthy, J. (1980). Circumscription: A form of non-monotonic reasoning. Artificial
Intelligence, Volume 13, Issues 1-2, pp. 27-39.
Mueller, E.T. (2002). Story understanding. Encyclopedia of Cognitive Science. L. Nadel ed.
Volume 4, pp. 238-246. Nature Publishing Group, London, 2002.
Mueller, E.T. (2003a). ThoughtTreasure: A natural language/commonsense platform.
http://alumni.media.mit.edu/~mueller/papers/tt.html
Mueller, E.T. (2003b). Story understanding through multirepresentation model
construction. Text Meaning: Proceedings of the HLT-NAACL 2003 Workshop, Hirst, G., and
Nirenburg, S., eds., pp. 46–53. East Stroudsburg, PA. Association for Computational
Linguistics.
Mueller, E.T. (2004a). Understanding script-based stories using commonsense reasoning.
Cognitive Systems Research, 5(4), pp. 307-340.
Mueller, E.T. (2004b). Event calculus reasoning through satisfiability. Journal of Logic and
Computation, Volume 14(5), pp. 703-730.
Mueller, E.T. (2006). Commonsense Reasoning: An Event Calculus Based Approach.
Morgan Kaufmann.
Mueller, E.T. (2008). Discrete Event Calculus Reasoner Documentation. Software
documentation, IBM Thomas J. Watson Research Center, P.O. Box 704.
Ovchinnikova, E. (2012). Integration of World Knowledge for Natural Language
Understanding. Kai-Uwe Kühnberger (Ed.). Atlantis Thinking Machines. Institute of
Cognitive Science, University of Osnabrück, Germany.
Peng, H., Khashabi, D., & Roth, D. (2015). Solving hard coreference problems. Urbana, 51,
61801.
Rahman, A., & Ng, V. (2012). Solving complex cases of definite pronouns: the winograd
schema challenge. Proceedings of the 2012 Joint Conference on Empirical Methods in
Natural Language Processing and Computational Natural Language Learning, pp. 777-789.
Association for Computational Linguistics.
Randell, D. A., Cui, Z., & Cohn, A. G. (1992). A Spatial Logic Based on Regions and
Connection. Proceedings of the 3rd International Conference on Knowledge Representation
and Reasoning (KR-92), Morgan Kaufmann, San Mateo, pp. 165-176.
Schank, R. C., & Abelson, R. P. (1977). Scripts, plans, goals, and understanding: An
inquiry into human knowledge structures. Hillsdale, NJ: Lawrence Erlbaum.
Shanahan, M., & Miller, R. (1999). The Event Calculus in classical logic—Alternative
axiomatizations. Linköping Electronic Articles in Computer and Information Science, 4(16).
Sharma, A. (2015). Solving Winograd schema challenge: Building and using a semantic
parser and a knowledge hunting module. IJCAI’15 Proceedings of the 24th International
Conference on Artificial Intelligence, pp. 1319-1325. AAAI Press.
Schüller, P. (2014). Tackling Winograd Schemas by Formalizing Relevance Theory in
Knowledge Graphs. 14th International Conference on Principles of Knowledge
Representation and Reasoning.
Siegel, N., Goolsbey, K., Kahlert, R., & Matthews, G. (2004). The Cyc System: Notes on
Architecture. Technical report, Cycorp, Inc.
http://www.cyc.com/wp-content/uploads/2015/04/Cyc_Architecture_and_API.pdf
Strasser, C. & Antonelli, G.A. (2014). Non-monotonic Logic. The Stanford Encyclopedia of
Philosophy (Fall 2015 Edition). Edward N. Zalta (Ed.).
http://plato.stanford.edu/entries/logic-nonmonotonic/
Van Harmelen, F., Lifschitz, V., & Porter, B. (Eds.). (2008). Handbook of knowledge
representation (Vol. 1). Elsevier.