
What’s in a Persian Name?

Zina Saadi
Computational Linguist,
Middle Eastern Languages Specialist
June 7, 2007

Proprietary Information of Basis Technology Corp.


Why Learn About Names?
• Lists of names are used to retrieve information about similar events in
  different languages … Transliteration

Example: one name as written in eleven languages
  ar: محمود أحمدي نجاد
  de: Mahmud Ahmadinedschad
  el: Μαχμούτ Αχμεντινεντζάντ
  en: Mahmoud Ahmedinejad
  fa: محمود احمدی‌نژاد
  he: מחמוד אחמדינג'אד
  ja: マフムード・アフマディーネジャード
  ko: 마흐무드 아흐마디네자드
  ru: Махмуд Ахмадинеджад
  ur: محمود احمدی نژاد
  zh: 馬赫茂德·艾哈邁迪-內賈德
Basis provides solutions for Name Transliteration in
Arabic, Chinese, English, Farsi, Korean, Pashto, Urdu
2
Name Transliteration vs. Matching
• News analysts are interested in news about a particular person
• Spelling variants of an individual's name are important … Matching

Matching spelling variants in Farsi (the prefix میر attached or separated):
  FA: جمال میرصادقی
  FA: جمال میر صادقی

Transliteration:
  EN: Jamal Mirsadeghi

Matching spelling variants in English (/gh/ vs. /q/ for the letters غ and ق):
  EN: Jamal Mirsadeghi
  EN: Jamal Mirsadeqi (less common)

Basis meets these challenges by providing
solutions for Name Transliteration & Matching
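The two kinds of variation on this slide (an affix written attached or separated, and /gh/ vs. /q/) can be folded into a single comparison key. The following is a minimal sketch for illustration only: `match_key` and its two rules are hypothetical and are not the Basis matching algorithm.

```python
import re

def match_key(name: str) -> str:
    """Fold a Latin-script name into a comparison key (illustrative rules only)."""
    key = name.lower()
    key = key.replace("gh", "q")        # treat the /gh/ and /q/ spellings as one sound
    key = re.sub(r"[\s\-]+", "", key)   # attached vs. separated affixes compare equal
    return key

variants = ["Jamal Mirsadeghi", "Jamal Mirsadeqi", "Jamal Mir Sadeghi"]
for variant in variants:
    print(variant, "->", match_key(variant))
```

All three variants reduce to the same key. Removing every space also erases the boundary between given name and surname, so a real matcher would probably compare name components separately.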
3
Presentation Overview
• Name Format in Farsi
  • Pre-Shah Period
  • Post-Shah Period

• Farsi Linguistic Specifications
  • Phonological Rules
  • Morphological Rules
  • Orthographic Variations
  • Cross-Lingual Borrowings

• Application of These Linguistic Specifications
  • Transliteration & Matching Experiment: aligning Farsi names with
    their Arabic transliterations

4
Name Format in Farsi
• Reign of Reza Shah (1925-1941)

Before the rule of Reza Shah

• No surnames
• Names combined affixes and given names

Since the rule of Reza Shah (1925)

• The Shah required surnames
• Name affixes remained in use
• Format: Given Name(s) + Surname(s)

5
Pre-Shah Name Affixes Specifications
• Affixes may be attached or separated

• Format: Prefix(es) + Given Name(s) + Suffix(es)

• Examples:
  Mirza Mohammad Farrokhi Yazdi
  Haji Mirza Hassan Tabrizi

• Affixes give more insight into the person:
  • Social class (education, respect)
  • Religious affiliation
  • Origin: the city where the person was born
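The Prefix(es) + Given Name(s) + Suffix(es) format can be sketched as a small token splitter. The affix lists below contain only examples named in this deck and are far from complete; `split_affixes` is a hypothetical helper, and the sketch assumes the affixes are written as separate words.

```python
# Illustrative, incomplete affix lists taken from the examples in these slides.
PREFIXES = {"aqa", "agha", "mir", "mirza", "haji", "hajji",
            "darvish", "dervish", "mulla", "seyyed"}
SUFFIXES = {"tihrani", "isfahani", "shirazi", "tabrizi", "yazdi",
            "alavi", "jafari", "mousavi", "kazemi", "naqavi"}

def split_affixes(name):
    """Split a space-separated name into (prefixes, given names, suffixes)."""
    tokens = name.split()
    start = 0
    while start < len(tokens) and tokens[start].lower() in PREFIXES:
        start += 1                      # consume leading name-prefixes
    end = len(tokens)
    while end > start and tokens[end - 1].lower() in SUFFIXES:
        end -= 1                        # consume trailing name-suffixes
    return tokens[:start], tokens[start:end], tokens[end:]

print(split_affixes("Haji Mirza Hassan Tabrizi"))
print(split_affixes("Mirza Mohammad Farrokhi Yazdi"))
```

Whatever is not in either list is treated as a given name, so "Farrokhi" stays in the middle group here.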

6
Pre-Shah Name Format
Name-Prefixes

Examples:

• Respect:
  Aqa/Agha (اقا/آغا): sir/mister, borrowed from Turkish
  Mir (میر): master, contraction of Amir (امیر)
  Hajji (حاجی): person who completed the pilgrimage to Mecca

• Religion:
  Darvish/Dervish (درویش): Sufi mystic
  Mulla (ملا): Islamic religious figure
  Seyyed/Sayyeda (سید/سیده): descendant of the Prophet Mohammed

7
Pre-Shah Name Format (Cont.)
Name-Suffixes

Examples:
• Locality: Tihrani, Isfahani, Shirazi
• Descent:
  Alavi (علوی): 1st Imam, Ali ibn Abi Talib
  Jafari/Jafri (جعفری): 6th Imam, Jafar as-Sadiq
  Mousavi/Kazemi (موسوی/کاظمی): 7th Imam, Musa al-Kazim
  Naqavi (نقوی): 10th Imam, Ali al-Hadi al-Naqi

Actual personal names:
  Mulla Sadra/Mollasadra (ملاصدرا): 17th-century Persian philosopher
  Seyyed Ali Naqi Naqvi (سید علی نقی نقوی): Indian historian

8
Pre-Shah Name Format (Cont.)
Pre-Shah Dual Functionality Affixes
• Respect vs. Descent:
  Mirza (میرزا)
    Name-Prefix: respect for a literate person
    Name-Suffix: royal descent

Examples:
• Mirza Ali (17th-century Persian physician)
• Iskander Ali Mirza (Persian: اسکندر علی میرزا; Urdu: اسکندر علی مرزا)
  First President of Pakistan (1956-1958)
  Descendant of Mir Jafar (monarchical ruler in Bengal)

9
Pre-Shah Name Format (Cont.)
• Locality vs. Origin:
  Karbala’i (کربلائی)
    Name-Prefix: pilgrimage to Karbala (Iraq)
    Name-Suffix: from Karbala
  Mashhadi (مشهدی)
    Name-Prefix: pilgrimage to Mashhad (Iran)
    Name-Suffix: from Mashhad
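Because these affixes change meaning with position, a parser has to look at where the token sits in the name. Below is a minimal sketch under a simplifying assumption that is not stated in the slides: a dual-function affix counts as a suffix only when it is the last word of a multi-word name, and as a prefix otherwise. The function name and the gloss table are hypothetical.

```python
# Glosses copied from the slides; the positional rule is a simplifying assumption.
DUAL_AFFIXES = {
    "mirza": {"prefix": "respect for a literate person", "suffix": "royal descent"},
    "karbala'i": {"prefix": "pilgrimage to Karbala", "suffix": "from Karbala"},
    "mashhadi": {"prefix": "pilgrimage to Mashhad", "suffix": "from Mashhad"},
}

def interpret_dual_affixes(name):
    """Return (token, role, gloss) for each dual-function affix found in the name."""
    tokens = name.split()
    readings = []
    for index, token in enumerate(tokens):
        lookup = token.lower().replace("\u2019", "'")   # accept a curly apostrophe
        senses = DUAL_AFFIXES.get(lookup)
        if senses is None:
            continue
        is_last = index == len(tokens) - 1 and len(tokens) > 1
        role = "suffix" if is_last else "prefix"
        readings.append((token, role, senses[role]))
    return readings

print(interpret_dual_affixes("Mirza Ali"))
print(interpret_dual_affixes("Iskander Ali Mirza"))
```

With this rule, "Mirza Ali" is read as a title of respect and "Iskander Ali Mirza" as a marker of royal descent, matching the two examples above.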

10
Names since the rule of Reza Shah (1925)

• Given Name(s)
  Single word: Kivan (کیوان)
  Compound: Amir Hussein (امیر حسین), Alireza (علیرضا)

• Surnames were required by the Shah

• Surname(s)
  Single word: Muhammadi, Ahsani, Muzhgan
  With affixes: Bahramzadah, Kiyanfar
  Compound: Darya-Bandari (دریا بندری)

11
Farsi Phonological Rules
• One letter -> one sound
• Many letters -> one sound
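The "many letters -> one sound" case matters for matching: names that differ only in letters pronounced alike in Persian should compare equal. The sketch below folds each letter to one representative of its sound class. The letter groups reflect standard Persian pronunciation and are an assumption added here, since the slide does not list them; `sound_key` is a hypothetical helper.

```python
# Letters that share a sound in Persian, written as code points for clarity.
HOMOPHONE_GROUPS = [
    "\u0633\u062b\u0635",        # seen, theh, sad       -> /s/
    "\u0632\u0630\u0636\u0638",  # zain, thal, dad, zah  -> /z/
    "\u062a\u0637",              # teh, tah              -> /t/
    "\u0647\u062d",              # heh, hah              -> /h/
    "\u0642\u063a",              # qaf, ghain            -> the /q/ ~ /gh/ pair
]

# Map every letter in a group to the first letter of that group.
FOLD = {ord(ch): group[0] for group in HOMOPHONE_GROUPS for ch in group}

def sound_key(text):
    """Replace each letter with the representative of its sound class."""
    return text.translate(FOLD)

tehrani_with_teh = "\u062a\u0647\u0631\u0627\u0646\u06cc"  # spelled with teh
tehrani_with_tah = "\u0637\u0647\u0631\u0627\u0646\u06cc"  # spelled with tah
print(sound_key(tehrani_with_teh) == sound_key(tehrani_with_tah))
```

The two spellings of Tehrani differ only in teh vs. tah, so their keys are identical.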

13
Farsi Vowels System
• Long vowels: /ɒː/, /uː/, /iː/
• Short vowels: /æ/, /o/, /e/
• Diphthongs: /ei/, /æi/, /ɒi/, /ow/
• There is no one-to-one mapping between Persian short and long vowels
  and Arabic diacritics or Persian letters.
  Example: the letter و is /o/ in دو (Do, the digit 2) but /uː/ in دوست (Doost, friend)

IPA, example(s), transliteration:
  /ɒː/   آرزو (Arezou), شادی (Shadi)
  /uː/   اورعی (Oraee), موسی (Mousa)
  /iː/   عیسی (Issa), پری (Pari)
  /æ/    اَفْشٰار (Afshar), یَحیٰی (Yahya)
  /o/    اُویسی (Oveissi)
  /e/    اِلهام (Ilham), ژالِه (Zhale)
  /ei/   اوِیْسی (Oveissi)
  /æi/   حَیْدر (Haidar)
  /ɒi/   اجایْ (Ajay)
  /ow/   شُوْوان (Showvan)

14
Farsi Morphological Rules
• Prefixes:
  • Por (full)
    Examples: Por Helm (پرحلم), Por-Azaram (پرآزرم)

• Suffixes:
  • Zad (birth)
    Examples: Shahmirzadi (شهمیرزادی), Farzad (فرزاد)
  • Zadeh (descendant/son)
    Examples: Hassan-Zadeh (حسن‌زاده)
    Wikipedia: Princess Noor Pahlavi (شاهزاده نور پهلوی)

• Prefixes/Suffixes (the meaning is the same in either position):
  • Nezhad (descent, race)
    Examples: Ahmedinezhad (احمدی نژاد); as a prefix in the ordinary word
    نژادشناسی (translation: ethnology)
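A matcher can use these affixes to split a romanized surname into a stem and an affix, so that "Hassan-Zadeh" and "Hassanzadeh" align on the same parts. This is a minimal sketch using only the affixes listed on this slide, treating each in one position only; `split_affix` is a hypothetical helper, and the romanized spellings are assumptions because real data will contain many more variants.

```python
SUFFIXES = ("zadeh", "nezhad", "zad")   # longer forms are tried before "zad"
PREFIXES = ("por",)

def split_affix(surname):
    """Return (position, first part, second part) for a romanized surname."""
    joined = surname.replace("-", "").replace(" ", "")
    lowered = joined.lower()
    for suffix in SUFFIXES:
        if lowered.endswith(suffix) and len(lowered) > len(suffix):
            return ("suffix", joined[:-len(suffix)], joined[-len(suffix):])
    for prefix in PREFIXES:
        if lowered.startswith(prefix) and len(lowered) > len(prefix):
            return ("prefix", joined[:len(prefix)], joined[len(prefix):])
    return ("none", joined, "")

for example in ["Hassan-Zadeh", "Farzad", "Ahmedinezhad", "Por Helm"]:
    print(example, "->", split_affix(example))
```

Pure string matching will over-split names that merely happen to begin or end with these letters, so a lexicon check on the remaining stem would be a sensible next step.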

15
Farsi Morphological Rules (Cont.)

• Pour (Old Farsi: son of)
  Examples: Pour-Azar (پورآذر; metaphor: title of Ibrahim)
  Richard Danielpour (Jewish composer)

These prefixes/suffixes can attach to ordinary Farsi words,
e.g. Poursadef (پورصدف) => pearl (literally "son of oyster")

16
Orthographic Variations
• Prefixes and suffixes can occur:
  • Attached: علیزاده
  • Separated with a Zero Width Non-Joiner, ZWNJ (U+200C): مسلمی‌زاده
  • Separated with a space: تقی زاده

• Hamza variants (mostly with Arabic borrowings):
  Alef: اء vs. ا, as in وفاء vs. وفا
  Vav: ؤ vs. و, as in مؤنس vs. مونس
  Yeh: ئ vs. ی, as in رضائی vs. رضایی
  Yeh-Vav: ئو vs. و, as in مسئول vs. مسول

⇒ Are these variants limited to Farsi names? No:
  Leonardo: لیوناردو vs. لوناردو vs. لئوناردو (Iran-News)
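These orthographic variants can be neutralized before comparison. The sketch below is a hypothetical helper, not a documented normalizer: it drops the ZWNJ and spaces inside a single name component and folds the hamza variants listed above onto their plain letters. Code points are written as escapes so the invisible ZWNJ is explicit.

```python
ZWNJ = "\u200c"

# Hamza carriers folded to plain letters (Alef, Vav, Farsi Yeh).
HAMZA_FOLD = str.maketrans({
    "\u0623": "\u0627",   # alef with hamza above -> alef
    "\u0625": "\u0627",   # alef with hamza below -> alef
    "\u0624": "\u0648",   # vav with hamza        -> vav
    "\u0626": "\u06cc",   # yeh with hamza        -> Farsi yeh
})

def orthographic_key(component):
    """Fold attachment and hamza variants of one name component."""
    key = component.replace(ZWNJ, "").replace(" ", "")
    key = key.replace("\u0626\u0648", "\u0648")   # yeh-hamza + vav -> vav
    key = key.translate(HAMZA_FOLD)
    key = key.replace("\u0621", "")               # drop a standalone hamza
    return key

separated = "\u062a\u0642\u06cc \u0632\u0627\u062f\u0647"       # written with a space
with_zwnj = "\u062a\u0642\u06cc\u200c\u0632\u0627\u062f\u0647"  # written with ZWNJ
print(orthographic_key(separated) == orthographic_key(with_zwnj))
```

The key handles the Alef, Vav, Yeh and Yeh-Vav pairs from the slide. It does not unify all three Leonardo spellings, which differ by more than a hamza.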
17
Cross-Lingual Borrowings

• Arabic Borrowings:
  • Names containing the letters ث، ذ، ص، ض، ط، ظ، ع، غ are mostly borrowed from Arabic
  • Names that end with ة in Arabic end in one of three ways in Persian:
    with ه (AR: محبوبة -> FA: محبوبه)
    with ا (AR: سميرة -> FA: سمیرا)
    with ت (AR: هداية -> FA: هدایت)
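Since the Arabic ending ة has three possible Persian outcomes, one way to align such names is to generate all three candidates and look each one up in the Persian list. This is a minimal sketch for a single word; `persian_variants` is a hypothetical helper and does not convert Arabic Yeh to Farsi Yeh.

```python
TA_MARBUTA = "\u0629"
PERSIAN_ENDINGS = ("\u0647", "\u0627", "\u062a")   # heh, alef, teh

def persian_variants(arabic_word):
    """Return candidate Persian spellings for an Arabic word ending in ta marbuta."""
    if not arabic_word.endswith(TA_MARBUTA):
        return [arabic_word]
    stem = arabic_word[:-1]
    return [stem + ending for ending in PERSIAN_ENDINGS]

mahbuba = "\u0645\u062d\u0628\u0648\u0628\u0629"   # the Arabic spelling ending in ta marbuta
for candidate in persian_variants(mahbuba):
    print(candidate)
```

Which ending a given name takes is lexical, so the candidates are meant to be checked against the Persian name list rather than chosen by rule.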

• Non-Arabic Borrowings:
  Aslan (اسلان): Azerbaijani-Turkish, lion
  Arashmid (ارشمید): Greek, Archimedes
  Showan/ShwAn (شووان): Kurdish, shepherd

18
Experiment
• Goal: align Farsi names with their Arabic transliterations
• Data:
  • Collected in Feb. 2007
  • Source: Islamic Data Bank (IDB) <http://www.i-b-q.com/>
  • Farsi side: Farsi personal names; Farsi transliterations of Arabic names
  • Arabic side: Arabic personal names; Arabic transliterations of Farsi names
• Process: Input Data -> Apply the Farsi Names Linguistic Specifications -> Matching

20
Phonological, Orthographic & Cross-Lingual Specifications

Latin Translit. (FA) | Technique | Persian Name (FA) | Arabic Translit. (AR)

Yassini, Akram | Orthographic | FA: ياسيني، اكرم | AR: ياسيني، أكرم
Yacob, Emil Badi' | Orthographic | FA: يعقوب، اميل بديع | AR: يعقوب، إميل بديع
Wafai, Davood | Orthographic | FA: وفايي، داود | AR: وفائي، داود
Yezdi, Mulla Abdullah | Orthographic | FA: يزدي، ملاعبدالله | AR: يزدي، ملا عبدالله
Vaziri, Fahimeh | Cross-Ling. | FA: وزيري، فهيمه | AR: وزيري، فهيمة
Zouilm, Ali | Cross-Ling. | FA: ذوعلم، علي | AR: ذوعلم، علي
Hedayatallah, Muhamed | Cross-Ling. | FA: هدايت الله، محمد | AR: هداية الله، محمد
Hadoui Tehrani, Mehdi | Phonological | FA: هادوي تهراني، مهدي | AR: هادوي طهراني، مهدي
Abyar, Narges | Phonological | FA: آبيار، نرگس | AR: آبيار، نرجس

21
Combination of Linguistic Specifications

Latin Translit. (FA) | Technique | Persian Name (FA) | Arabic Translit. (AR)

Herlihi, Jon Ahmed | Phonological, Orthographic | FA: هرليهي، جان احمد | AR: هرليهي، جون أحمد
Hashemi Bajgani, Jafar | Phonological, Morphological | FA: هاشمي باجگاني، سيد جعفر | AR: هاشمي باجكاني، جعفر
Yezdinezhad, Zahra | Morphological, Orthographic, Phonological | FA: يزدي‌نژاد، زهرا | AR: يزدي نجاد، زهراء
Yassipur, Gholamreza | Morphological, Orthographic, Phonological | FA: ياسي‌پور، غلامرضا | AR: ياسي بور، غلام رضا
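The phonological rows in these tables follow a pattern: Persian letters that Arabic lacks are replaced by the nearest Arabic letter or letter pair. A minimal sketch of that idea follows. The mapping is inferred from the example rows only, the function names are hypothetical, and this is not the procedure actually used in the experiment.

```python
from itertools import product

# Persian-only letters and the Arabic substitutes seen in the example rows.
PERSIAN_TO_ARABIC = {
    "\u06af": ["\u0643", "\u062c"],   # gaf   -> kaf or jeem
    "\u067e": ["\u0628"],             # peh   -> beh
    "\u0698": ["\u062c"],             # jeh   -> jeem
    "\u0686": ["\u062a\u0634"],       # tcheh -> teh + sheen
}

def arabic_candidates(persian_word):
    """Generate every Arabic spelling reachable by the substitutions above."""
    options = [PERSIAN_TO_ARABIC.get(ch, [ch]) for ch in persian_word]
    return {"".join(choice) for choice in product(*options)}

def is_aligned(persian_word, arabic_word):
    """True if the Arabic word is one of the generated candidates."""
    return arabic_word in arabic_candidates(persian_word)

narges_fa = "\u0646\u0631\u06af\u0633"   # Persian spelling with gaf
narges_ar = "\u0646\u0631\u062c\u0633"   # Arabic spelling with jeem
print(is_aligned(narges_fa, narges_ar))
```

A complete aligner would chain this step with the orthographic and morphological rules, since rows such as Yezdinezhad need all three at once.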

22
Interesting Findings
• The Persian names list contained transliterations of foreign names

Foreign Name | Technique | Persian Translit. (FA) | Arabic Translit. (AR)

Thomas, David | Phonological | FA: توماس، ديويد | AR: توماس، ديفيد
Hafner, Robert W | Phonological | FA: هفنر، رابرت دابليو | AR: هفنر، روبرت دبليو
Webster, Richard | Phonological | FA: وبستر، ريچارد | AR: وبستر، ريتشارد

• Unicode variation: instances of the Unicode Right-to-Left Mark
  (U+200F) in place of ZWNJ (U+200C)

23
Experiment Findings

• Raw text was not clean:
  • Farsi: wrong Unicode characters were used (Yeh vs. Farsi Yeh)
    => Farsi text normalization was required
  • The data contained translated names rather than transliterated ones

• We were able to align 60% of the input Persian names
  Why? => We aligned names using the Persian name specifications only
  To reach 100% alignment => Arabic name specifications would also need to
  be applied, to align Arabic names transliterated into Persian
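The two cleanup problems reported in the findings (Arabic Yeh used where Farsi Yeh belongs, and a Right-to-Left Mark used where a ZWNJ was intended) can be sketched as a small normalizer for the Persian side. `normalize_farsi` is a hypothetical helper, and the rule that a mark sitting between two Arabic-script characters was meant as a ZWNJ is an assumption.

```python
import re

ARABIC_YEH = "\u064a"
FARSI_YEH = "\u06cc"

# A Right-to-Left Mark with Arabic-script characters on both sides.
RLM_BETWEEN_LETTERS = re.compile("(?<=[\u0621-\u06d3])\u200f(?=[\u0621-\u06d3])")

def normalize_farsi(text):
    """Normalize Yeh and misplaced Right-to-Left Marks in Persian text."""
    text = text.replace(ARABIC_YEH, FARSI_YEH)
    text = RLM_BETWEEN_LETTERS.sub("\u200c", text)   # replace with ZWNJ
    return text

# A surname typed with Arabic Yeh and a Right-to-Left Mark before the suffix.
raw = "\u064a\u0632\u062f\u064a\u200f\u0646\u0698\u0627\u062f"
clean = normalize_farsi(raw)
print([hex(ord(ch)) for ch in clean])
```

The Yeh replacement should be applied only to the Persian side of the data, because Arabic Yeh is correct in the Arabic transliterations.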

24
Summary

• The format of Persian names today differs from the format used 90 years ago

• Persian names have a complex linguistic structure that can be
  subcategorized and used in Natural Language Processing applications

25

26
Questions?

Thank You!

خیلی متشکرم
zinas@basistech.com

27
