
Low Resources Prepositional Phrase Attachment

2010 14th Panhellenic Conference on Informatics


Pavlos Nalmpantis, Romanos Kalamatianos, Konstantinos Kordas and Katia Kermanidis

Department of Informatics, Ionian University

Corfu, Greece

{cs200664, cs200611, cs200539, kerman}@ionio.gr

Abstract—Prepositional phrase attachment is a major disambiguation problem in parsing natural language, for many languages. In this paper a low-resources policy using supervised machine learning algorithms is proposed in order to resolve the prepositional phrase attachment disambiguation problem in Modern Greek. It is a first attempt to resolve prepositional phrase attachment in Modern Greek without using sophisticated syntactic annotation and semantic resources. Furthermore, no restrictions are placed on the prepositions addressed, as is common in previous approaches.

Keywords—Decision Trees, Modern Greek, PP attachment, Supervised learning

I. INTRODUCTION

The correct attachment of prepositional phrases (PPs) to another constituent in a sentence is a significant disambiguation problem for parsing natural languages. For example, take the following two sentences:

1. She eats soup with a spoon.
2. She eats soup with tomatoes.

In sentence 1, the PP "with a spoon" is attached to the verb phrase (VP) "eats", denoting the instrument utilized for the eating action, thus making the VP the anchor phrase of the PP. Sentence 2 seems to differ only minimally from the first sentence but, as can be seen from their syntax trees in Fig. 1 and Fig. 2 respectively, their syntactic structure is quite different. The PP "with tomatoes" does not attach to the verb but to the noun phrase (NP) "soup", denoting the type of soup.

Figure 1. Syntax tree of sentence 1

Figure 2. Syntax tree of sentence 2

PP attachment has many significant uses. It can be used, as noted above, for improving the performance of syntactic parsers, since PP attachment is a major source of ambiguity in natural language. It facilitates further semantic processing, and it also constitutes an important pre-processing step in many information extraction systems. It has also been employed in speech processing as a filter in prosodic phrasing [1].

Over the years, many solutions have been proposed to resolve the disambiguation problem of PP attachment. Solutions include the use of machine learning algorithms [2]-[3], statistical analysis using corpus-based pattern distributions and lexical signatures [4], as well as the back-off model [5], the maximum entropy model [6], etc. However, these methods usually require many resources (i.e. syntactic annotation and often even semantic disambiguation) which are unavailable for many languages.

Regarding machine learning techniques, previous approaches have experimented with various learning schemata. Memory-based learning has been proposed [2], employing the 1-NN algorithm (IB1) and its tree variation (IB1-IG); when compared to other methods, its correct-attachment performance turned out to be much better. A nearest-neighbor algorithm has also been proposed, employing a cosine similarity measure over pointwise mutual information [3]. The work described in [7] recreates the EDTBL (Error-Driven Transformation-Based Learning) experiment and compares the results with three machine learning algorithms, namely Naïve Bayes, ID3 IG (Information Gain) and ID3 GR (Gain Ratio); the latter two are tree variations of the k-NN algorithm. That experiment was conducted in two phases: in phase 1 the training set was gradually increased until all training examples were used, while in phase 2 10-fold cross-validation was used. A summary of results for all the aforementioned approaches is shown in Table 1.

In this paper we propose a methodology to resolve the disambiguation problem of PP attachment in Modern Greek using supervised machine learning algorithms, given a dataset of feature vectors extracted from a morphologically annotated corpus. The presented methodology is, to the authors' knowledge, a first attempt to resolve the PP attachment problem in Modern Greek with minimal linguistic resources, i.e. without sophisticated grammars, parsing tools, treebanks or semantic thesauri. Finally, we place no restrictions on the type of prepositional phrase, thus taking all prepositions into account.

TABLE I. SUMMARY OF RESULTS OF RELATED PREVIOUS WORK

    Algorithm                    Results
    IB1 (1-NN) [2]               83.7%
    IB1-IG (1-NN tree) [2]       84.1%
    Cosine similarity (NN) [3]   86.5%
    ID3 IG (k-NN) [7]            79% (phase 1), 79% (phase 2)
    ID3 GR (k-NN) [7]            78% (phase 1), 79% (phase 2)
    Naïve Bayes [7]              74% (phase 1), 76% (phase 2)

The rest of this paper is organized as follows. Section 2 introduces the properties of the Modern Greek language. The corpus used in the presented experiments, the feature extraction process, as well as an example of the creation of the learning vector are discussed in Section 3. Section 4 describes the experimental process in detail and presents the achieved results. The results are discussed quantitatively and qualitatively in Section 5, and future research prospects are proposed. Finally, the paper concludes with some closing comments and remarks.

II. MODERN GREEK PROPERTIES

Modern Greek is a relatively free-word-order language. The ordering of the phrases within a sentence may vary without affecting its correct syntax or its meaning. Internal phrase structure is stricter. Within the noun phrase, adjectives usually precede the noun (e.g. "το μεγάλο σπίτι", [to megalo spiti], 'the big house'), while possessors follow it (e.g. "το σπίτι μου", [to spiti mu], 'my house'). Regarding VPs, certain grammatical elements attach to the verb as clitics and form a rigidly ordered group together with it. This applies particularly to unstressed object pronouns, negation particles, the tense particle "θα" [θa], and the subjunctive particle "να" [na] [8].

In Modern Greek, prepositions normally require the accusative case: "από" (from), "για" (for), "με" (with), "μετά" (after), "χωρίς" (without), "ως" (as) and "σε" (to, in or at). The preposition "σε", when followed by a definite article, fuses with it into forms like "στο" (σε + το) and "στη" (σε + τη). For this reason, lemmatization of the preposition is required [8].
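Such fused forms have to be mapped back to the base preposition before the lemma feature (Section III.B) can be filled in. A minimal normalization sketch follows; this is our illustration, not the authors' actual tool, and the table lists only the common σε + article contractions:

    # Minimal sketch (not the authors' tool): map fused preposition+article
    # tokens back to the preposition lemma needed by the feature vector.
    FUSED_FORMS = {
        # σε + definite article fuses into a single token
        "στο": "σε", "στη": "σε", "στην": "σε", "στον": "σε",
        "στα": "σε", "στις": "σε", "στους": "σε", "στου": "σε", "στης": "σε",
    }

    def preposition_lemma(token: str) -> str:
        """Return the lemma of a preposition token, undoing article fusion."""
        return FUSED_FORMS.get(token.lower(), token.lower())

    assert preposition_lemma("στο") == "σε"   # στο = σε + το
    assert preposition_lemma("με") == "με"    # unfused prepositions are unchanged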

III. MODERN GREEK PP-ATTACHMENT

A. Corpus and Pre-processing

The text corpus used in the experiments is the ILSP/ELEFTHEROTYPIA corpus [9]. It consists of 5244 sentences; it is balanced in domain and genre, and manually annotated with complete morphological information. Further (phrase structure) information is obtained automatically by a multi-pass chunker [10].

During chunking, NPs, VPs, PPs, adverbial phrases (ADP) and conjunctions (CON) are detected via multi-pass parsing. The chunker exploits minimal linguistic resources: a keyword lexicon containing 450 keywords (i.e. closed-class words such as articles, prepositions etc.) and a suffix lexicon of 300 of the most common word suffixes in Modern Greek. The chunked phrases are non-overlapping. Embedded phrases are flatly split into distinct phrases. Nominal modifiers in the genitive case are included in the same phrase with the noun they modify; base nouns joined by a coordinating conjunction are grouped into one phrase. The chunker identifies basic phrase constructions during the first passes (e.g. adjective-nouns, article-nouns), and combines smaller phrases into longer ones in later passes (e.g. coordination, inclusion of genitive modifiers, compound phrases).
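To make the multi-pass idea concrete, here is a toy sketch of one early pass; this is our assumption of how such a pass could look (keyed on POS-tags for simplicity, whereas the actual chunker of [10] relies on its keyword and suffix lexicons). Each pass rewrites the sequence of units, merging adjacent ones into larger chunks, and later passes operate on the output of earlier ones:

    def pass_basic_np(units):
        """One early pass: merge article/adjective runs ending in a noun into an NP.

        `units` is a list of (label, words) pairs; initially (POS-tag, [word])."""
        out, i = [], 0
        while i < len(units):
            j = i
            while j < len(units) and units[j][0] in ("T", "A"):  # articles, adjectives
                j += 1
            if j < len(units) and units[j][0] == "N":            # ... ending in a noun
                out.append(("NP", [w for _, ws in units[i:j + 1] for w in ws]))
                i = j + 1
            else:
                out.append(units[i])
                i += 1
        return out

    # "το μεγάλο σπίτι" becomes a single NP chunk:
    units = [("T", ["το"]), ("A", ["μεγάλο"]), ("N", ["σπίτι"])]
    print(pass_basic_np(units))  # [('NP', ['το', 'μεγάλο', 'σπίτι'])]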

B. Feature Selection

The feature vector is what enables the classification algorithms to classify a PP attachment. The attributes used to form the feature vector need to be related to the PP attachment ambiguity resolution task, and they vary in the bibliography. The ones encountered most commonly in previous work (e.g. [11]) are: the number of commas between the PP and the anchor candidate, the number of other punctuation marks between them, the number of words between them, the POS-tag of the last token of the anchor candidate, the lemma of the PP, the number of PPs between the PP and the anchor candidate, the label of the phrase immediately before the PP, the anchor candidate itself, etc. In the present work, a feature vector is formed for every anchor candidate and every preposition in a given corpus sentence; thus, the syntactic freedom of Modern Greek is taken into account. Of the features mentioned above, after a number of experiments conducted with the WEKA machine learning workbench (Waikato Environment for Knowledge Analysis) [12], which enables experimentation with various classification algorithms, the following were selected to form our feature vector:

• The lemma of the preposition introducing the PP.
• The type of the phrase immediately before the PP.
• The anchor candidate.
• The POS-tag of the last token of the anchor candidate.
• The number of words between the PP and the anchor candidate.
• The number of PPs between the PP and the anchor candidate.
• The number of commas between the PP and the anchor candidate.
• The number of other punctuation marks between the PP and the anchor candidate.

The feature "Correct attachment" was used as the classification class of the vector in the experiments conducted. The reason we selected the feature "POS-tag of the last token of the anchor candidate" is that, in most cases, the headword of an NP and the main verb of a VP is the last token of the phrase. The anchor candidate may precede or follow the PP, as the syntactic freedom of the language does not impose restrictions on the ordering; therefore, no such restrictions have been imposed on the described feature set.

C. Feature Vector Extraction

The process of extracting the values of each feature vector is automated. We created a program, written in C, that automatically computes the first eight features of our vector and stores the results in an Excel file. This program generates all possible attachments (anchor candidates) of a PP in a sentence that are NPs or VPs, as these phrase types constitute the most significant error source for PP attachment. However, it is fairly easy, in future research, to include other anchor candidate phrase types. The correct class label was assigned manually to every extracted feature vector by three language experts. This feature takes the value TRUE or FALSE and indicates whether the attachment represented by the given vector is correct or not. Inter-annotator agreement between the experts was 90%. For the remaining 10%, where the experts did not initially agree, a discussion among them followed, resulting in a common decision about the correct attachment.
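The extraction logic is easy to re-create. Below is a hedged Python re-implementation sketch (the original program is written in C; the Chunk representation and all names here are our assumptions) that pairs each PP with every NP and VP in the sentence and computes the eight features listed above:

    from dataclasses import dataclass

    @dataclass
    class Chunk:
        label: str       # "NP", "VP", "PP", "ADP" or "CON"
        words: list      # the tokens of the chunk
        pos_tags: list   # one POS-tag symbol per token (see Table 2)

    def feature_vectors(sentence, prep_lemma):
        """Yield one eight-feature vector per (PP, NP/VP anchor candidate) pair.

        `sentence` is a list of Chunk objects; `prep_lemma` maps a preposition
        token to its lemma (see the lemmatization sketch in Section II)."""
        for i, pp in enumerate(sentence):
            if pp.label != "PP":
                continue
            for j, cand in enumerate(sentence):
                if j == i or cand.label not in ("NP", "VP"):
                    continue
                lo, hi = sorted((i, j))
                between = sentence[lo + 1:hi]  # chunks between PP and candidate
                yield {
                    "lemma": prep_lemma(pp.words[0]),                 # preposition lemma
                    "prev": sentence[i - 1].label if i > 0 else "-",  # phrase before PP
                    "cand": cand.label,                               # anchor candidate
                    "pos_tag": cand.pos_tags[-1],                     # its last token's tag
                    "words": sum(len(c.words) for c in between),
                    "pps": sum(c.label == "PP" for c in between),
                    "commas": sum(w == "," for c in between for w in c.words),
                    "punct": sum(t == "F" and w != ","                # other punctuation
                                 for c in between
                                 for w, t in zip(c.words, c.pos_tags)),
                }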

An example of the feature vector extraction process follows. Take the following annotated sentence (presented first in Modern Greek and then translated into English).

Modern Greek:

NP[Διάλογος σύγχρονου εργαζόμενου] PP[με τη σύζυγο] ADP[χθες] NP[το μεσημέρι.]

English:

NP[The dialogue of a contemporary worker] PP[with his wife] ADP[yesterday] NP[afternoon.]

Each word in the sentence is annotated with its POS-tag and lemma. Table 2 shows the value of each symbol that represents the POS-tag of a word.

TABLE II. POS-TAG SYMBOLS

    Symbol   Value
    N        Noun
    A        Adjective
    S        Preposition
    T        Article
    R        Adverb
    F        Punctuation mark

The program will extract two feature vectors, which are presented in Table 3, giving us two possible attachments: the PP "with his wife" could be attached to the NP "The dialogue of a contemporary worker" or to the NP "afternoon". These two possible attachments are examined by the three language experts mentioned earlier, who decide which attachment is the correct one. In this case the PP "with his wife" ("με τη σύζυγο" in Modern Greek) is correctly attached to the NP "The dialogue of a contemporary worker", because it denotes the person with whom the dialogue took place.

TABLE III. FEATURE VECTORS

    L    Pre-P   AC   POS-tag   #W   #PPs   #C   #PM
    με   NP      NP   N         0    0      0    0
    με   NP      NP   F         1    0      0    0

Extracted feature vectors from the example sentence; L = lemma, Pre-P = previous phrase, AC = anchor candidate, POS-tag = POS-tag of the anchor candidate's last word, #W = word distance between PP and anchor candidate, #PPs = number of PPs between PP and anchor candidate, #C = number of commas, #PM = number of other punctuation marks.
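Run on the example sentence, the extraction sketch from Section III.C reproduces the two rows of Table 3 (the chunk encoding and the POS-tags assigned here are our assumptions):

    sentence = [
        Chunk("NP", ["Διάλογος", "σύγχρονου", "εργαζόμενου"], ["N", "A", "N"]),
        Chunk("PP", ["με", "τη", "σύζυγο"], ["S", "T", "N"]),
        Chunk("ADP", ["χθες"], ["R"]),
        Chunk("NP", ["το", "μεσημέρι", "."], ["T", "N", "F"]),
    ]
    for vec in feature_vectors(sentence, preposition_lemma):
        print(vec)
    # row 1: lemma=με, prev=NP, cand=NP, pos_tag=N, words=0, pps=0, commas=0, punct=0
    # row 2: lemma=με, prev=NP, cand=NP, pos_tag=F, words=1, pps=0, commas=0, punct=0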

IV. EXPERIMENTAL PROCESS

Machine learning is used in order to train the system to make "smart" decisions regarding the anchor candidate phrase to which a PP is to be attached. The produced dataset consisted of 8500 vectors corresponding to 500 corpus sentences. The type of machine learning we used is supervised learning: having already observed the correct outputs given by the experts, the system will be able to predict results automatically.

Of the 8500 vectors, only 7.9% indicated a correct attachment (positive examples) and the remaining 92.1% indicated an erroneous attachment (negative examples), so we had an imbalanced dataset and biased results (recall and precision were much higher for the negative class than for the positive class). To balance the dataset, random undersampling [13] was performed, i.e. the random removal of negative instances until the remaining ones reach the number of positive instances. The final number of instances was 1350.
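Random undersampling is straightforward to reproduce; a minimal sketch (ours, assuming the labeled vectors are held in two Python lists):

    import random

    def undersample(positives, negatives, seed=42):
        """Randomly drop negative vectors until both classes are equally sized [13]."""
        rng = random.Random(seed)   # fixed seed for reproducibility
        return positives + rng.sample(negatives, len(positives))

    # e.g. ~675 positive vectors out of 8500 -> a balanced set of ~1350 instances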

We used the following algorithms: IBk (an implementation of the k-NN classifier), J48 (an implementation of the C4.5 decision tree classifier [14]) and Naïve Bayes [15]. IBk is a lazy supervised learning algorithm in which a new unseen instance vector is classified according to the majority class label of the k nearest training vectors (k is user-defined); classification uses majority voting over these neighbors. J48 is a decision tree supervised learning algorithm. It first needs to create a decision tree based on the attribute values of the available training data: whenever it encounters a set of training instances, it identifies the attribute that discriminates them most clearly according to their class label. We used the pruned version, which means that, once the tree is created, C4.5 attempts to remove branches that do not help classification by replacing them with leaf nodes. Naïve Bayes is a supervised learning algorithm based on the assumption of conditional independence [15], i.e. each attribute is independent of the others, given the class label.
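The experiments were run inside WEKA; for illustration only, an analogous setup can be sketched with scikit-learn (a substitution on our part: KNeighborsClassifier for IBk, DecisionTreeClassifier with cost-complexity pruning as a rough stand-in for pruned J48/C4.5, and CategoricalNB for Naïve Bayes over the discrete features):

    from sklearn.naive_bayes import CategoricalNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    classifiers = {
        "IB5": KNeighborsClassifier(n_neighbors=5),           # k-NN with k = 5
        "J48-like": DecisionTreeClassifier(ccp_alpha=0.01),   # pruned decision tree
        "Naive Bayes": CategoricalNB(),   # assumes integer-encoded categorical features
    }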

In order to validate our results we used 10-fold cross-validation. Through this method, the original sample is randomly partitioned into ten subsamples. One of the ten subsamples is retained as validation data for testing the model, and the remaining nine subsamples are used as training data. The cross-validation process is then repeated ten times, with each of the ten subsamples used exactly once as validation data.

The results of the conducted experiments, i.e. precision and recall for both class labels, are shown in Fig. 3 to Fig. 6. IBk was run for various values of k (Fig. 3). The results with k=5, being the highest, are then depicted for various dataset sizes in Fig. 4. Fig. 5 and Fig. 6 show the results with J48 and Naïve Bayes respectively.

Figure 3. Precision & recall for both class labels as a function of the number of nearest neighbors (value of k) using IBk for 1350 vectors.

Figure 4. Precision & recall for both class labels as a function of the dataset size using IB5.

Figure 5. Precision & recall for both class labels as a function of the dataset size using J48.

Figure 6. Precision & recall for both class labels as a function of the dataset size using Naïve Bayes.
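Continuing the scikit-learn sketch above, per-class precision and recall under 10-fold cross-validation (the quantities plotted in Fig. 3 to Fig. 6) could be computed as follows, with X and y standing in for the encoded feature vectors and the TRUE/FALSE class labels (random toy data here):

    import numpy as np
    from sklearn.metrics import precision_recall_fscore_support
    from sklearn.model_selection import StratifiedKFold, cross_val_predict

    rng = np.random.default_rng(0)
    X = rng.integers(0, 4, size=(1350, 8))           # toy stand-in for the 8 features
    y = rng.integers(0, 2, size=1350).astype(bool)   # toy TRUE/FALSE class labels

    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    for name, clf in classifiers.items():
        pred = cross_val_predict(clf, X, y, cv=cv)   # each fold tested exactly once
        p, r, _, _ = precision_recall_fscore_support(y, pred, labels=[True, False])
        print(f"{name}: P/R(TRUE)={p[0]:.2f}/{r[0]:.2f}  P/R(FALSE)={p[1]:.2f}/{r[1]:.2f}")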

V. EVALUATION

Using the J48 algorithm, one may notice that the most significant feature is the word distance between the candidate and the PP: it is the feature placed at the root of the tree constructed by the C4.5 classifier. In other words, it is the first feature to be checked in order to classify a new unseen attachment example. The constructed tree is represented in Fig. 7, and Table 4 explains the integers on the decision tree nodes in Fig. 7.

It is clear from the tree structure that the features "preposition lemma", "POS-tag of the candidate's last token" and "number of PPs between candidate and PP" appear at the lowest levels of the tree constructed by the C4.5 classifier, which means that these features are not as significant. From the representation of the tree (Fig. 7) it is evident that when the word-distance feature has a value greater than 3, the example is usually classified as FALSE. Therefore, many examples that should be classified as TRUE are classified as FALSE, because their value for this feature is large. From the results produced by the J48 algorithm in Fig. 5, one may notice that as the size of the dataset increases, recall and precision of both class labels converge towards each other.

Using the IBk algorithm, we can see in Fig. 3 that as the number of nearest neighbors increases up to the value 5 (IB5), recall and precision of both class labels rise. However, beyond this point, when the number of nearest neighbors takes values greater than 5, the performance of the learner deteriorates due to noise introduced by the more 'distant' nearest neighbors. Also, from the results produced by the IBk algorithm with 5 nearest neighbors (Fig. 4), recall and precision of both class labels rise as the size of the dataset increases.

Due to its conditional independence assumption, the results produced by Naïve Bayes are poor compared to the other algorithms. From Fig. 6, it is evident that, as the size of the dataset increases, recall and precision of both class labels generally decrease; only the recall of class label TRUE and the precision of class label FALSE increase slightly when the dataset reaches 1350 vectors.

The best performance (comparing all learning algorithms and dataset sizes) was achieved by J48 with the 1350-vector dataset (recall and precision of approximately 80%). Compared to the previous work summarized in Table 1, the results are comparable to those of approaches that utilize more sophisticated external resources.

Figure 7. This figure depicts the tree constructed by the C4.5 classifier. T stands for class label TRUE and F for FALSE.

VI. CONCLUSION

Supervised machine learning was used in order to solve the PP attachment problem in Modern Greek. The proposed policy was a low resources solution employing basic syntactic pre-processing, and no use of sophisticated parsing tools or semantic thesauri was made. Also, no restrictions were placed upon the prepositions that were addressed, i.e. all prepositions were taken into account.

Future research could take into account other anchor candidate phrase types. Furthermore, the above experiments could be repeated with different learning algorithms and/or a different set of features, which could lead to even better results. The low-resource policy allows for the easy portability of the presented methodology to other languages that lack resources, another interesting research direction that could be explored in the future.

REFERENCES

[1] O. Van Herwijnen, J. Terken, A. van den Bosch and E. Marsi, "Learning PP attachment for filtering prosodic phrasing," in Proceedings of the European Chapter of the Association for Computational Linguistics, Budapest, Hungary, 2003. Available: http://ilk.uvt.nl/downloads/pub/papers/paper_EACL03.pdf.
[2] J. Zavrel, W. Daelemans and J. Veenstra, "Resolving PP attachment ambiguities with memory-based learning," in Proceedings of the Conference on Computational Natural Language Learning, pp. 136-144, 1997.
[3] S. Zhao and D. Lin, "A nearest-neighbor method for resolving PP-attachment ambiguity," in 1st International Joint Conference on Natural Language Processing, 2004.
[4] N. Gala and M. Lafourcade, "Combining corpus-based pattern distributions with lexical signatures for PP attachment ambiguity resolution," in Proceedings of the 6th Symposium on Natural Language Processing, Chiang Rai, Thailand, 2005.
[5] M. Collins and J. Brooks, "Prepositional phrase attachment through a backed-off model," in Proceedings of the Third Workshop on Very Large Corpora, pp. 27-38, Cambridge, 1995.
[6] A. Ratnaparkhi, J. Reynar and S. Roukos, "A maximum entropy model for prepositional phrase attachment," in Proceedings of the ARPA Workshop on Human Language Technology, pp. 250-255, Plainsboro, NJ, March 1994.
[7] B. Mitchell and R. Gaizauskas, "A comparison of machine learning algorithms for prepositional phrase attachment," in 3rd International Conference on Language Resources and Evaluation, pp. 2082-2087, Las Palmas, Canary Islands, Spain, 2002.
[8] D. Holton, P. Mackridge and I. Philippaki-Warburton, Greek: A Comprehensive Grammar of the Modern Language, Routledge, London, 1997.
[9] N. Hatzigeorgiu, M. Gavrilidou, S. Piperidis, G. Carayannis, A. Papakostopoulou, A. Spiliotopoulou, A. Vacalopoulou, P. Labropoulou, E. Mantzari, H. Papageorgiou and I. Demiros, "Design and implementation of the online ILSP Greek corpus," in 2nd International Conference on Language Resources and Evaluation, pp. 1737-1742, Athens, Greece, 2000. Available: http://www.elda.fr/catalogue/en/text/W0022.html.
[10] E. Stamatatos, N. Fakotakis and G. Kokkinakis, "A practical chunker for unrestricted text," in Conference on Natural Language Processing, pp. 139-150, Patras, Greece, 2000.
[11] V. Van Asch and W. Daelemans, "Prepositional phrase attachment in shallow parsing," in 7th International Conference on Recent Advances in Natural Language Processing, Borovets, Bulgaria, 2009.
[12] WEKA machine learning workbench, http://www.cs.waikato.ac.nz/ml/weka/.
[13] N. V. Chawla, "Data mining for imbalanced datasets: an overview," Dept. of Computer Science and Engineering, University of Notre Dame, USA, 2005.
[14] J. R. Quinlan, "Bagging, boosting and C4.5," in 13th National Conference on Artificial Intelligence, Portland, Oregon, 1996.
[15] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 3rd edition, Prentice Hall, 2009.
