
A computational analysis of Mahabharata

Debarati Das
UG Student, Dept. of CSE
PES Institute of Technology
Karnataka, India
Debarati.d1994@gmail.com

Bhaskarjyoti Das
PG Student, Dept. of CSE
Visvesvaraya Technological University
Karnataka, India
Bhaskarjyoti01@gmail.com

Kavi Mahesh
Dean of Research and Director
KAnOE-Center for Knowledge Analytics and Ontological Engineering
PES University, Bangalore
Drkavimahesh@gmail.com

D S Sharma, R Sangal and A K Singh. Proc. of the 13th Intl. Conference on Natural Language Processing, pages 219–228, Varanasi, India. December 2016. ©2016 NLP Association of India (NLPAI)

Abstract
Indian epics have not been analyzed computationally to the extent that Greek epics have. In this paper, we show how interesting insights can be derived from the ancient epic Mahabharata by applying a variety of analytical techniques based on a combination of natural language processing, sentiment/emotion analysis and social network analysis methods. One of our key findings is the pattern of significant changes in the overall sentiment of the epic story across its eighteen chapters. A second is the corresponding characterization of the primary protagonists in terms of their sentiments, emotions, centrality and leadership attributes in the epic saga.

1 Introduction
Large epics such as the Mahabharata contain a wealth of information that may not be apparent to human readers, who read them for the fascinating stories or spiritual messages they contain. Computational analysis of large texts can unearth interesting patterns and insights. These concern the structure, the flow of stories and the dynamics of the numerous characters in the intricate stories that make up the epics. Unfortunately, not much attention has been paid to applying natural language processing and other related techniques to carry out computational analyses of Indian epics. In this work, we attempt to carry out detailed analyses of the Mahabharata epic.
Sentiment and social network analyses have been applied mainly to texts such as tweets and emails, to discover user sentiments or important personalities. By comparison, literary works are less often subjected to computational analysis, as there are no immediate business incentives. However, similar techniques can be adopted for three purposes: to appreciate the literary work, to understand the underlying social network and to find or validate literary truths. As a literary text is built around a social backdrop, it reflects the society the author lives in and reveals a lot about the contemporary social setting.
Unlike SMS and tweets, a literary text has a genre, and genre matters. Amongst past and recent literary genres, epics and novels have seen most of the work in the Digital Humanities community. Their scope, in terms of time, number of events and number of characters, is typically large enough to facilitate computational analysis. Examples include the Greek epics Iliad and Odyssey, the English epic Beowulf, novels such as Victor Hugo's Les Misérables, and the works of William Shakespeare. However, there is no major existing work on Indian epics such as the Ramayana and the Mahabharata. Hence we have chosen the Mahabharata as the target text for a computational analysis effort.

2 Related work
The first important step in the computational analysis of a literary text is to identify the protagonists. Next, the relatedness of the protagonists can be computed to form the underlying social network. There are essentially two methods to capture a social network from a literary text. One option is to capture all social events, such as conversations, assuming that all characters participating in a social event are socially related. This method does not work well for narrative-intensive text. The other method assumes that all characters appearing in a given co-occurrence window have some kind of social relation. This approach ends up including even insignificant characters, but it works better for narrative-based texts such as epics. Newman and Girvan's work (2004) detecting the communities in Victor Hugo's Les Misérables is the first major effort to find the social network from narratives. Sack (2012) deduced
the plot from the network by using concepts of structural balance theory. Elson et al. (2010) proposed a dialogue-based method to extract social networks. Jayannavar et al. (2015) updated Elson's approach by broadening the scope from conversations to social events. Rydberg-Cox (2011) extracted social networks from Greek tragedies. Agarwal et al. (2012) showed that a dynamic network analysis can reveal more subtle facts. Beveridge and Shan (2016) built the underlying social network for A Storm of Swords, the third book of the novel series adapted as the TV series Game of Thrones, using a co-occurrence window of 15 words. Stiller et al. (2003) analyzed ten of Shakespeare's plays, also based on co-occurrence logic. Mac Carron and Kenna (2012) provided a quantitative approach to comparing networks. Mac Carron and Kenna (2014) carried out a structural analysis of the Iliad, the English poem Beowulf and the Irish epic Tain Bo Cuailnge. P. J. Miranda et al. (2013) analyzed the structure of the underlying social network of Homer's Odyssey. Alberich et al. (2002) built a social network from Marvel comics.
As the Mahabharata is an epic, we must mention Aristotle's Poetics and the excellent commentary on it by Lucas (1968). Aristotle defined literary genres such as poetry, tragedy, comedy and epic. Poetry mimics life. Tragedy is a type of poetry that showcases noble men, their noble qualities and their values. Epics such as the Mahabharata are a type of tragedy and are built around noble men in the form of narratives.
A tragedy typically has a plot with a beginning, a middle and an end, and the other constituents of the text are secondary to the plot. The beginning of the plot is typically a scenario of stability that is disturbed by some events. In the middle, disequilibrium sets in, along with many events and actions by the characters. All the events and actions lead towards the end, where the problem is resolved and stability returns. Plots have various constituents, i.e. suffering, reversal, recognition of new knowledge and surprise. An epic differs from a more recent literary genre such as the novel. It carries a lot of negative sentiment across its breadth, yet it still conveys a noble theme to its audience.
One can measure sentence polarity by referring to a standard thesaurus in which polarity measures have been preassigned by researchers. This approach uses a resource like SentiWordNet (http://sentiwordnet.isti.cnr.it/). Alternatively, a supervised classification approach uses labelled data sets from similar domains. However, this approach works only where a training dataset from a similar domain is available, so it is not suitable for sentiment analysis of an epic. Emotion analysis finds the causes of sentiment. Robert Plutchik (1980) defined the eight basic emotion types. Mohammad and Turney (2010) created the NRC emotion lexicon, which associates a list of words with these eight basic types of emotion and two types of sentiment. Mohammad (2011) presented an emotion analyzer as an exercise in visualizing these emotions in literary text.
For our research, we have used the English translation of the Mahabharata available at the Project Gutenberg site (http://www.gutenberg.org/ebooks/7864). This translation was made by Kisari Mohan Ganguli between 1883 and 1896. The Mahabharata, compiled many years ago, is larger than the Iliad and the Odyssey together. It has 18 parvas or chapters, and each parva has many sections.

Table 1: Key Attributes of Mahabharata Text

Attribute                  Value       Remarks
Size in bytes              15,175 K    English translation
Size in bytes              13,947 K    After removing comments
Number of words            2,858,609   Using NLTK
Number of unique words     32,506      Using NLTK
Number of sentences        118,087     Using NLTK
Number of chapters         18          Parvas
Number of characters       210         Appearing at least 10 times

3 The methodology
The Mahabharata is not dialogue-heavy and is mostly narrative. So we identify relations between characters with a co-occurrence algorithm whose window is one sentence. The method we devised for a comprehensive computational analysis of the Mahabharata epic is as follows:

1. Pre-processing
- Filter out supporting texts such as tables of contents, publisher details and chapter summaries.
- Separate the text into chapters (called parvas) using suitable regular expressions.
- Separate each parva into sections based on the structural elements in the text.
2. Identifying characters
- Identify all proper nouns using POS tagging.
- Input a list of known characters of the Mahabharata story (widely available on the internet).
- Input a thesaurus of equivalent names for the characters to merge equivalent names. These are also widely known, e.g. Draupadi=Panchali, Arjuna=Phalguni etc.
- Filter out a list of known place names in ancient India and its neighbouring regions.
- Apply a threshold to retain names whose frequency is above a minimum value (resulting in 210 characters for the Mahabharata story).
- Retain only those characters that are in the top 30 percent of characters mentioned in a given parva (resulting in about 70 characters overall). The same logic is followed for both the individual and the cumulative analysis of each parva.
The following steps are carried out separately for each parva and also for the entire text.
3. Co-occurrence analysis
- Compute a co-occurrence matrix for the identified characters, using sentence boundaries as the windows of co-occurrence.
- Build a social graph from the co-occurrence matrix.
4. Network analysis
- Various network metrics are computed for the social graph of each of the 18 parvas, both cumulatively and standalone. These include betweenness centrality, closeness centrality, degree centrality, size of maximal cliques, number of detected communities, size of ego networks for main nodes, core periphery analysis, and density of the core and of the overall network.
- Additionally, various structural metrics are computed for the social graph, viz. degree assortativity, percentage size of the giant component, average clustering coefficient, average shortest path length etc.
5. Overall sentiment analysis
- Using syntactic metadata, phrases containing nouns, adjectives, verbs and adverbs are identified.
- The above text is tokenized using standard NLP techniques.
- The tokens are POS (part of speech) tagged, and the tagged tokens are mapped to synsets in WordNet in a word sense disambiguation process.
- The sentiment scores for each synset are taken from SentiWordNet.
- The overall sentiment of the parva is derived from these values by summing the constituent sentiment scores.
6. Sentiment analysis for main characters
- Similarly, sentiment analysis of each protagonist is done by extracting the sentences in which the protagonist appears. This is done for each parva.
7. Emotion analysis
- Emotion analysis for the full text and for each of the protagonists is done with the help of the NRC word-emotion association lexicon. After the relevant part of the corpus is extracted, a score is calculated for each emotion for each POS (part of speech) tagged token, and the scores are summed. The obvious limitation of any lexicon-based approach is the one imposed by the size of the
lexicon itself, and this limitation applies to our analysis as well.

We have used Python, NLTK (Natural Language Toolkit), various open source libraries (TextBlob, NetworkX, Stanford SNAP, Gephi) and the data analytics/visualization software Tableau in our work.

4 Analysis of results
4.1 The protagonists
We have tried out three different approaches to identify the protagonists.
- Most frequently mentioned character: As shown in Figure 1a, this method finds the most frequent characters. However, it misses protagonists who are low on frequency but may be important otherwise.
- Size of the ego network: The size of the ego network (number of nodes directly connected), calculated from the Mahabharata social network, produces different results. As shown in Figure 1b, Kripa, a teacher of the princes, tops the list. Chieftains like Shalya, Virata and Drupada come towards the top of this list. Kunti (mother of the Pandavas), Indra (the king of gods) and Narada (the sage) are also in this list, being well connected.
- Centrality metrics: The betweenness, eigenvector, closeness and degree centralities are compared. A few observations can be made from Figure 2:
  - Betweenness centrality differentiates the main protagonists, whereas the other centrality metrics are mostly equivalent.
  - Arjuna, Karna, Krishna, Yudhisthira, Bhisma, Kunti and Drona are the top few on all four centralities. They are the most important protagonists.
  - Some personalities with very large ego networks (Kripa, Shalya, Drupada, Virata etc.) have very low betweenness centrality and do not make it into the top list. Their influence is limited to one camp, i.e. Kaurava or Pandava, so their importance is mostly local.
  - Amongst the princesses and queen mothers, Kunti turns out to be the power behind the scene, with a large ego network and high centralities. This role is understated in the existing literary analysis. Her low eigenvector centrality leads to the false perception that she is not important. The other main female characters (Gandhari, Madri, Draupadi) are low on betweenness, as their influence is limited to one camp.

Figure 1: Finding protagonists by number of mentions and ego network ((a) Frequency of occurrence; (b) Size of ego network)
Figure 2: Finding protagonists by comparing centrality metrics

4.2 The words say a lot
Word clouds show a marked difference between the protagonists, as shown in Figure 3a to Figure 3d. These are drawn by extracting adjectives from the respective corpus.
- Both Arjuna and Bhima are mighty warriors. But Arjuna has words like great, excellent, capable and celestial, whereas Bhima has terrible, fierce etc. So Arjuna is the best in his class, whereas Bhima is a mighty warrior with terrible anger.
- Bhisma has invincible, principal and virtuous, whereas Krishna has celestial, beautiful and illustrious. So Bhisma sounds more like an invincible warrior famous for his virtue, whereas Krishna is almost godly.
- For Duryodhana, wicked, terrible etc. stand out, whereas for Yudhisthira, virtuous and righteous are the key words. Both are leaders of their respective camps, but they are poles apart.

Figure 3: Words say a lot ((a) Arjuna, (b) Bhima, (c) Duryodhana and (d) Yudhisthira word clouds)
Figure 4: Sentiment across parvas of Mahabharata
Figure 5: Comparing the sentiments ((a) Krishna, Dhritarashtra; (b) Drona, Bhisma)
Figure 6: Comparing the sentiments ((a) Kunti, Gandhari; (b) Yudhisthira, Duryodhana)

4.3 Sentiments across the text
The Mahabharata takes its readers on a roller coaster ride of sentiment, as shown in Figure 4. Aadi parva (1) starts on a positive note, but Sabha parva (2) brings a lot of negativity with the game of dice. Vana parva (3) is positive again, as the Pandavas, in spite of being in exile, make many friends and have achievements. Virat parva (4) is negative, as the Pandavas have to live in disguise doing odd jobs. Udyog parva (5) is positive again, with both sides very hopeful of winning the war. After that, as elders and leaders are killed in the battle, sentiment slides downward. Duryodhana's death brings in positive emotion in Shalya parva (9). In Stri parva (11), the destruction is complete and sentiment reaches the
lowest level. Shanti parva (12) brings a peak of positive sentiment with the coronation of Yudhisthira and many achievements. After that, sentiment slides downward again, with many deaths, including the death of Lord Krishna. Sentiment sees an uptick in the last two parvas, when the Pandavas leave for the Himalayas and finally attain divine status. Figure 5a to Figure 6b show the net sentiment of the main protagonists by parva. This leads to some interesting observations.
- Warriors like Arjuna and Bhima have a lot of negativity around them.
- The leaders of the two warring camps, Duryodhana and Yudhisthira, are a clear contrast, as Yudhisthira has a lot of positive sentiment around him.
- Gods like Indra and Agni have mostly positivity around them, as they are mostly neutral on the ground.
- The eldest warrior, Bhisma, is mostly neutral, whereas Drona is committed to one camp and so is surrounded by negativity. Dhritarashtra, though an elder, is mostly surrounded by negative sentiment.
- The two queen mothers, Gandhari and Kunti, are the sources of positive energy in their respective camps. Though understated, they play pivotal roles. By comparison, Draupadi is surrounded by negative sentiment.
- Lord Krishna has negativity around him while he is in the thick of war. Once the battle is over and a larger perspective prevails, he brings in a sense of karma and a lot of positive sentiment.

4.4 The emotions
We have analyzed the emotions both at the global level and at the protagonist level, as shown in Figure 7 to Figure 9. Of the eight basic emotion types, anger and trust are the key ones, as expected in a tragedy centred on an epic battle. Anticipation, disgust, fear and sadness come in almost equal proportion. In the scheming world of the Mahabharata, there is not much surprise, and joy is largely overshadowed by the negative emotions. Looking at the emotions of some of the main protagonists leads to interesting conclusions.
- Amongst the key female characters, Kunti stands out for the richness of her positive emotion (trust and joy). She is the bedrock of strength for the Pandavas as they go through all their reversals of fate. Gandhari is relatively low key, whereas Draupadi displays all the negative emotions that are key ingredients of a tragedy.
- Of the two camp leaders, Yudhisthira (Pandava) and Duryodhana (Kaurava), Yudhisthira displays trust and joy more than any other emotion. Probably that is why he is perceived as a leader, though many others show much more bravery and heroics. The contrast between Duryodhana and Yudhisthira is telling.
- Bhima and Duryodhana are very similar in their emotions, i.e. anger, trust and fear. Arjuna is quite unique and ambidextrous, i.e. he displays a fair amount of anger and fear and also a large quantity of trust and joy.
- Amongst the elders, Bhisma is a detached persona who does not show much emotion. Drona is more attached to one camp and shows anger more than any other emotion.
- Krishna shows a tremendous amount of trust, anticipation and joy in spite of all the tragedies. It is no wonder that he is called an incarnation of god.

Figure 7: Emotions across the text
Figure 8: Emotion Analysis of Bhisma, Dhritarashtra, Drona and Krishna
Figure 9: Emotion Analysis of Arjuna, Bhima, Duryodhana, Karna, Yudhisthira

4.5 Leadership analysis
We searched for leaders using two criteria, viz. high positive sentiment and high centrality (degree and/or betweenness), as shown in Figure 10. Our assumption is that leaders are not only centrally connected but also show a lot of positivity.

Figure 10: Leadership

- It becomes very clear why Krishna is supreme, as he is the only one in the high corner of this target quadrant.
- Yudhisthira follows closely behind Krishna. He is not a great warrior and is known for his addiction to gambling, yet he is highly respected, and this position explains why.
- By the same yardstick, Arjuna, Bhima, Drona and Karna are achievers or doers rather than leaders.
- Bhisma is high on neither centrality nor positivity. He is more of a helpless spectator, apart from his rare commitment to whatever promise he makes.
- Clearly, the Kaurava camp lacks leadership. Duryodhana, the Kaurava leader, shows this lack, which is somewhat compensated by the combined effort of the achievers in his camp.

4.6 The social network of Mahabharata
- The core periphery analysis of the social network reveals a core of size 52. Its density is consistently high and remains comparable to the overall density of the network, i.e. the plot is built around these members of the core.
- The Mahabharata is also the story of three camps, as revealed by community detection using the Louvain algorithm (Blondel et al., 2008). They are the Kauravas, the Pandavas, and the gods/sages who remained somewhat neutral.
- The story of the Mahabharata spans many years before the battle, the 18 days of battle and around thirty-six years after it. We analyze the evolving social network across the parvas using various structural metrics. These are degree, average degree, number of edges, number of maximal cliques, density of the main core and overall density. As shown in Figure 11a and Figure 11b, the structural metrics of the underlying social network are destabilized initially and then tend to stabilize towards the end. This follows the Aristotelian framework of stability-instability-stability.
- The Mahabharata network comes out as a small-world network (small average shortest path length and large clustering coefficient). The measured transitivity is comparable to that of random graphs of similar size, such as the Barabási–Albert model. However, modularity is low (mostly 3 communities detected) compared to some real-world networks. Several further properties indicate a large fictional component in the Mahabharata. These are the high positive correlation coefficients for each centrality pair, the large giant component and the negative degree assortativity.

Figure 11: Evolution of social network across parvas ((a) Considering diameter, degree and edges; (b) Considering maximal cliques and density)

5 Discussion and conclusion
In this work, we have applied various Natural Language Processing and Social Network Analysis techniques to produce a computational analysis of the Mahabharata. We have not only validated what literary critics have unearthed about the epic but also augmented their findings by discovering subtle facts. Protagonists are identified and analyzed using both statistical and social network parameters such as centrality and ego network. The trajectory of sentiment and of various emotions across the length of the text is examined for each protagonist. The findings validate what literary critics have already found. Additionally, the analysis brings out some subtle facts. For example, Kunti is understated in the existing literary analysis, but her sentiments, emotions, centrality and large ego network show that she plays a pivotal role. We also categorized the influence of various protagonists as local or global.
The leadership analysis explains why Yudhisthira is described in such glorious terms in spite of his many weaknesses. We have also looked at
the leadership quotient of various protagonists by considering their position in the centrality-positivity quadrants, and have brought out the contrast in leadership between the warring camps in this epic.
The analysis also helps to explain why the Mahabharata is an epic. Apart from the sheer number of characters and events and the diversity of emotion and sentiment, the text conforms to the Aristotelian definition of an epic, with its stability-instability-stability transitions. The analysis of the structural metrics also indicates that the Mahabharata is not purely factual and has a large fictional component.
Clearly, computational analysis of a literary text does not make literary analysis redundant. But it provides an additional tool set for students of literature to validate and augment their findings. The methods used can easily be replicated for other texts.
As a next step, we plan to extend similar analysis to the Indian epic Ramayana and to perform a similar structural analysis of its underlying social networks.

Acknowledgement
This work is supported in part by the World Bank/Government of India research grant under the TEQIP programme (subcomponent 1.2.1) to the Centre for Knowledge Analytics and Ontological Engineering (KAnOE http://www.kanoe.org) at PES University, Bangalore, India.

References

Apoorv Agarwal, Augusto Corvalan, Jacob Jensen, and Owen Rambow. 2012. Social network analysis of Alice in Wonderland. In Workshop on Computational Linguistics for Literature, pages 88–96.

Ricardo Alberich, Joe Miro-Julia, and Francesc Rossello. 2002. Marvel universe looks almost like a real social network. arXiv preprint cond-mat/0202174.

Aristotle. 1968. Poetics. Introduction, commentary and appendixes by D. W. Lucas. Oxford, 125:16.

Andrew Beveridge and Jie Shan. 2016. Network of Thrones. Math Horizons, 23(4):18–22.

Vincent D Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. 2008. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008.

David K Elson, Nicholas Dames, and Kathleen R McKeown. 2010. Extracting social networks from literary fiction. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 138–147. Association for Computational Linguistics.

Prashant Arun Jayannavar, Apoorv Agarwal, Melody Ju, and Owen Rambow. 2015. Validating literary theories using automatic social network extraction. On Computational Linguistics for Literature, page 32.

Padraig Mac Carron and Ralph Kenna. 2012. Universal properties of mythological networks. EPL (Europhysics Letters), 99(2):28002.

P Mac Carron and R Kenna. 2014. Network analysis of Beowulf, the Iliad and the Tain Bo Cuailnge. In Sources of Mythology: Ancient and Contemporary Myths. Proceedings of the Seventh Annual International Conference on Comparative Mythology (15–17 May 2013, Tubingen), pages 125–141.

Pedro J Miranda, Murilo S Baptista, and Sandro E de S Pinto. 2013. Analysis of communities in a mythological social network. arXiv preprint arXiv:1306.2537.

Saif M Mohammad and Peter D Turney. 2010. Emotions evoked by common words and phrases: Using Mechanical Turk to create an emotion lexicon. In Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text, pages 26–34. Association for Computational Linguistics.

Saif Mohammad. 2011. From once upon a time to happily ever after: Tracking emotions in novels and fairy tales. In Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pages 105–114. Association for Computational Linguistics.

Mark EJ Newman and Michelle Girvan. 2004. Finding and evaluating community structure in networks. Physical Review E, 69(2):026113.

Robert Plutchik. 1980. A general psychoevolutionary theory of emotion. Theories of Emotion, 1:3–31.

Jeff Rydberg-Cox. 2011. Social networks and the language of Greek tragedy. In Journal of the Chicago Colloquium on Digital Humanities and Computer Science, volume 1.

Graham Sack. 2012. Character networks for narrative generation. In Intelligent Narrative Technologies: Papers from the 2012 AIIDE Workshop, AAAI Technical Report WS-12-14, pages 38–43.

James Stiller, Daniel Nettle, and Robin IM Dunbar. 2003. The small world of Shakespeare's plays. Human Nature, 14(4):397–408.
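Step 2 of Section 3 (identifying characters) can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes the proper nouns have already been extracted by POS tagging. The alias table `ALIASES`, the place list `PLACES` and all function names are hypothetical. The sketch merges equivalent names, drops place names, and then applies the two filters described in the text: a minimum mention count and a top-fraction cut per parva.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set

# Hypothetical alias table: alternate name -> canonical character name.
ALIASES: Dict[str, str] = {"Panchali": "Draupadi", "Phalguni": "Arjuna"}

# Hypothetical list of place names that must not be counted as characters.
PLACES: Set[str] = {"Hastinapura", "Indraprastha"}


def count_characters(names: Iterable[str], known: Set[str],
                     aliases: Dict[str, str], places: Set[str]) -> Counter:
    """Count mentions of known characters after merging aliases and dropping places."""
    counts: Counter = Counter()
    for name in names:
        if name in places:
            continue
        name = aliases.get(name, name)  # merge equivalent names
        if name in known:
            counts[name] += 1
    return counts


def frequent(counts: Counter, min_count: int) -> List[str]:
    """Keep characters mentioned at least `min_count` times, most frequent first."""
    return [n for n, c in counts.most_common() if c >= min_count]


def top_fraction(counts: Counter, fraction: float) -> List[str]:
    """Keep the top `fraction` of characters by mention count (at least one)."""
    ranked = [n for n, _ in counts.most_common()]
    return ranked[:max(1, round(len(ranked) * fraction))]
```

On the full text, the reported settings would correspond to `frequent(counts, 10)` and, per parva, `top_fraction(counts, 0.3)`. Both are assumptions about how the thresholds map onto code.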
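The co-occurrence analysis of Section 3 (step 3) uses one sentence as the window. A minimal stdlib-only sketch of that idea follows, under stated assumptions. The regex sentence splitter stands in for NLTK's tokenizer, and names are assumed to be single, already-canonicalized tokens. All function names are hypothetical. Each pair of characters appearing in the same sentence gains one unit of edge weight, and the weights form an undirected adjacency map, i.e. the social graph.

```python
import re
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set


def split_sentences(text: str) -> List[str]:
    """Naive splitter on sentence-final punctuation (stand-in for a real tokenizer)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cooccurrence(sentences: Iterable[str], characters: Set[str]) -> Dict[FrozenSet[str], int]:
    """Count, for every pair of characters, the sentences in which both appear."""
    weights: Dict[FrozenSet[str], int] = defaultdict(int)
    for sentence in sentences:
        present = sorted(set(re.findall(r"[A-Za-z]+", sentence)) & characters)
        for a, b in combinations(present, 2):
            weights[frozenset((a, b))] += 1
    return dict(weights)


def to_adjacency(weights: Dict[FrozenSet[str], int]) -> Dict[str, Dict[str, int]]:
    """Undirected weighted adjacency map: edge weight = number of shared sentences."""
    adj: Dict[str, Dict[str, int]] = defaultdict(dict)
    for pair, weight in weights.items():
        a, b = sorted(pair)
        adj[a][b] = weight
        adj[b][a] = weight
    return dict(adj)
```

Whether the original graph keeps these weights or only records edge presence is not stated in the paper; the sketch keeps them.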
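Section 4.1 compares ego network size with degree and betweenness centrality, which are network metrics from step 4 of Section 3. The following is a stdlib-only sketch of those three measures, not the authors' implementation; they report using NetworkX and SNAP. It assumes an unweighted, undirected graph stored as a dict of neighbour sets. Betweenness is computed with Brandes' algorithm and normalized over the (n-1)(n-2)/2 node pairs, which is the usual convention for undirected graphs.

```python
from collections import deque
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]


def ego_size(g: Graph, node: str) -> int:
    """Ego network size: number of nodes directly connected to `node`."""
    return len(g[node])


def degree_centrality(g: Graph) -> Dict[str, float]:
    """Degree divided by n - 1."""
    n = len(g)
    return {v: (len(nbrs) / (n - 1) if n > 1 else 0.0) for v, nbrs in g.items()}


def betweenness_centrality(g: Graph) -> Dict[str, float]:
    """Brandes' algorithm for unweighted, undirected graphs, normalized to [0, 1]."""
    bc = {v: 0.0 for v in g}
    for s in g:
        stack: List[str] = []
        preds: Dict[str, List[str]] = {v: [] for v in g}
        sigma = {v: 0 for v in g}   # number of shortest paths from s
        dist = {v: -1 for v in g}   # -1 marks "not reached yet"
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                # BFS from the source s
            v = queue.popleft()
            stack.append(v)
            for w in g[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in g}
        while stack:                # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(g)
    # Every pair is counted once from each endpoint; dividing the raw sums by
    # (n-1)(n-2) both halves them and normalizes by the (n-1)(n-2)/2 pairs.
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: b * scale for v, b in bc.items()}
```

In a star graph, the hub lies on every shortest path between leaves and gets betweenness 1.0. This illustrates the paper's observation that nodes bridging camps stand out on betweenness, while nodes with many purely local ties do not.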
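Steps 5 and 6 of Section 3 sum lexicon sentiment scores over a parva, or over the sentences mentioning one protagonist. A minimal sketch of that summation follows. It assumes POS tagging and word sense disambiguation have already produced (word, POS) pairs. `TOY_LEXICON` holds a few hypothetical scores standing in for SentiWordNet entries; they are not values taken from SentiWordNet. Taking each token's net score as positive minus negative is also an assumption.

```python
from typing import Dict, Iterable, List, Tuple

Lexicon = Dict[Tuple[str, str], Tuple[float, float]]
TaggedSentence = List[Tuple[str, str]]

# Hypothetical (word, coarse POS) -> (positive, negative) scores, standing in
# for SentiWordNet synset scores after word sense disambiguation.
TOY_LEXICON: Lexicon = {
    ("victory", "n"): (0.625, 0.0),
    ("joyful", "a"): (0.75, 0.0),
    ("slay", "v"): (0.0, 0.5),
    ("terrible", "a"): (0.0, 0.625),
}


def sentence_score(tagged: TaggedSentence, lexicon: Lexicon) -> float:
    """Net polarity of one tagged sentence: sum of (positive - negative) per token."""
    total = 0.0
    for word, pos in tagged:
        pos_score, neg_score = lexicon.get((word.lower(), pos), (0.0, 0.0))
        total += pos_score - neg_score
    return total


def text_score(sentences: Iterable[TaggedSentence], lexicon: Lexicon) -> float:
    """Overall sentiment of a parva: the sum of its sentence scores."""
    return sum(sentence_score(s, lexicon) for s in sentences)


def character_score(sentences: Iterable[TaggedSentence], name: str, lexicon: Lexicon) -> float:
    """Sentiment around one character: only sentences that mention `name` count."""
    return text_score((s for s in sentences if any(w == name for w, _ in s)), lexicon)
```

Running `character_score` once per parva for one protagonist yields a per-parva sentiment trajectory of the kind plotted in Figures 5 and 6.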
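Step 7 of Section 3 tallies emotions with the NRC word-emotion association lexicon. The sketch below shows only the tallying. `TOY_NRC` is a hypothetical excerpt in the lexicon's shape (word to a set of associated emotions) and is not copied from the real lexicon. Unlike the paper, it ignores POS tags and simply counts one association per matching token, summed over the eight Plutchik emotions.

```python
from collections import Counter
from typing import Dict, Iterable, Set

# Plutchik's eight basic emotions, as used by the NRC lexicon.
EMOTIONS = ("anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust")

# Hypothetical entries in the shape of the NRC lexicon: word -> emotions.
TOY_NRC: Dict[str, Set[str]] = {
    "war": {"anger", "fear", "sadness"},
    "friend": {"joy", "trust"},
    "promise": {"anticipation", "joy", "trust"},
}


def emotion_profile(tokens: Iterable[str], lexicon: Dict[str, Set[str]]) -> Counter:
    """Sum word-emotion associations over all tokens (every emotion starts at 0)."""
    profile = Counter({e: 0 for e in EMOTIONS})
    for token in tokens:
        for emotion in lexicon.get(token.lower(), ()):
            profile[emotion] += 1
    return profile
```

Feeding it the tokens of a parva gives a global profile like Figure 7. Feeding it only the sentences that mention one protagonist gives per-character profiles like Figures 8 and 9.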
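The leadership criterion of Section 4.5 looks for characters that are high in both centrality and positive sentiment. The paper does not say where the quadrant boundaries lie, so the sketch below assumes a median split on each axis; the function name `leaders` is hypothetical. The test uses placeholder labels and made-up numbers, not results from the paper.

```python
from statistics import median
from typing import Dict, List


def leaders(centrality: Dict[str, float], sentiment: Dict[str, float]) -> List[str]:
    """Characters strictly above the median on both centrality and net sentiment.

    The median split is an assumed way of drawing the target quadrant.
    """
    names = sorted(set(centrality) & set(sentiment))
    c_cut = median(centrality[n] for n in names)
    s_cut = median(sentiment[n] for n in names)
    return [n for n in names if centrality[n] > c_cut and sentiment[n] > s_cut]
```

The centrality values could come from degree or betweenness, and the sentiment values from summed lexicon scores. Swapping the median for another cut-off (for example, the top quartile) only changes `c_cut` and `s_cut`.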
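Section 4.6 calls the network small-world, citing its small average shortest path length and large clustering coefficient, and also reports densities. The sketch below computes these three quantities for an unweighted graph stored as a dict of neighbour sets. It is a stdlib-only illustration, not the authors' code. The path-length function assumes a connected graph (for example, the giant component). A full small-world test would also compare the values with random graphs of the same size.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Set

Graph = Dict[str, Set[str]]


def density(g: Graph) -> float:
    """Fraction of possible undirected edges that are present."""
    n = len(g)
    m = sum(len(nbrs) for nbrs in g.values()) / 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def average_clustering(g: Graph) -> float:
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for v, nbrs in g.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in g[a])
        total += 2 * links / (k * (k - 1))
    return total / len(g)


def average_shortest_path(g: Graph) -> float:
    """Mean BFS distance over all ordered node pairs; g must be connected."""
    n = len(g)
    total = 0
    for s in g:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in g[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
    return total / (n * (n - 1))
```

As a check, a triangle with one pendant node has density 2/3, average clustering 7/12 and average shortest path 4/3.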
