Computational Linguistics: May 2011

Friday, May 6, 2011

Viewing the synsets that contain a given word (Exercise 7, Chapter 11)

Write a function which displays the complete entry for a lexeme. When the
lexeme is incorrectly spelled, it should display the entry for the most similarly
spelled lexeme.

I take the entry for a lexeme to be all the WordNet synsets it belongs to.
This is essentially the example from page 424, except that everything is packed into a single function, which is not really a sensible design: every time the function is called, it re-indexes the whole dictionary with signatures = nltk.Index((signature(w), w) for w in nltk.corpus.words.words()).

import re
import nltk

def fuzzy_spell(word):
    """Return (word, synsets) pairs for word, or for its closest spellings if it is not in the words corpus."""
    mappings = [('ph', 'f'), ('ght', 't'), ('^kn', 'n'), ('qu', 'kw'),
                ('[aeiou]+', 'a'), (r'(.)\1', r'\1')]

    def signature(w):
        # Normalise common spelling variants, then keep only the consonant clusters
        for patt, repl in mappings:
            w = re.sub(patt, repl, w)
        pieces = re.findall('[^aeiou]+', w)
        return ''.join(char for piece in pieces for char in piece)

    # NB: the whole dictionary is re-indexed on every call
    vocabulary = nltk.corpus.words.words()
    signatures = nltk.Index((signature(w), w) for w in vocabulary)

    def rank(w, wordlist):
        # Sort candidate words by edit distance from w
        ranked = sorted((nltk.edit_distance(w, c), c) for c in wordlist)
        return [c for (_, c) in ranked]

    # If the word is not in the dictionary, assume it is misspelled
    if word in vocabulary:
        return [(word, nltk.corpus.wordnet.synsets(word))]
    sig = signature(word)
    if sig not in signatures:
        return []
    return [(c, nltk.corpus.wordnet.synsets(c)) for c in rank(word, signatures[sig])]

# A misspelled word: print the synsets of each similarly spelled candidate
for candidate, synsets in fuzzy_spell('closei'):
    print(candidate, synsets)
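As noted above, rebuilding the signature index on every call is wasteful. Below is a minimal sketch of the alternative, assuming the same signature rules: the index is built once when a lexicon object is created and then reused for every query. To keep it self-contained, it uses a toy word list instead of nltk.corpus.words, a plain Levenshtein function as a stand-in for nltk.edit_distance, and skips the WordNet lookup; the FuzzyLexicon class and its candidates method are hypothetical names, not part of NLTK.

```python
import re
from collections import defaultdict

# Same spelling-normalisation rules as in fuzzy_spell
MAPPINGS = [('ph', 'f'), ('ght', 't'), ('^kn', 'n'), ('qu', 'kw'),
            ('[aeiou]+', 'a'), (r'(.)\1', r'\1')]


def signature(word):
    """Collapse a word to its consonant skeleton."""
    for patt, repl in MAPPINGS:
        word = re.sub(patt, repl, word)
    return ''.join(re.findall('[^aeiou]+', word))


def edit_distance(s, t):
    """Plain Levenshtein distance (a stand-in for nltk.edit_distance)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution
        prev = cur
    return prev[-1]


class FuzzyLexicon:
    """Hypothetical helper: indexes the word list once, then answers many queries."""

    def __init__(self, words):
        self.words = set(words)
        self.index = defaultdict(list)
        for w in self.words:
            self.index[signature(w)].append(w)

    def candidates(self, word):
        """The word itself if known, else same-signature words ranked by edit distance."""
        if word in self.words:
            return [word]
        similar = self.index.get(signature(word), [])
        return sorted(similar, key=lambda w: (edit_distance(word, w), w))


lex = FuzzyLexicon(['close', 'class', 'clause', 'dog'])
print(lex.candidates('closei'))  # → ['close', 'class', 'clause']
```

Ties in edit distance are broken alphabetically so the ranking is deterministic; with the real corpus, each returned word would then be passed to nltk.corpus.wordnet.synsets as in the function above.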

Really simple (Exercise 1, Chapter 11)

In Example 11.8 the new field appeared at the bottom of the entry. Modify this program so that it inserts the new subelement right after the lx field. (Hint: create the new cv field using Element('cv'), assign a text value to it, then use the insert() method of the parent element.)

All fairly straightforward.
import re
from nltk.corpus import toolbox
from nltk.toolbox import to_sfm_string
from xml.etree.ElementTree import Element  # standard-library ElementTree, in place of nltk.etree

lexicon = toolbox.xml('rotokas.dic')

def cv(s):
    # Map each letter to C (consonant) or V (vowel); non-letters become _
    s = s.lower()
    s = re.sub(r'[^a-z]', r'_', s)
    s = re.sub(r'[aeiou]', r'V', s)
    s = re.sub(r'[^V_]', r'C', s)
    return s

def add_cv_field(entry):
    # Insert the new cv field directly after the lx field
    for i, field in enumerate(entry):
        if field.tag == 'lx':
            cv_field = Element('cv')
            cv_field.text = cv(field.text)
            entry.insert(i + 1, cv_field)
            break  # stop iterating: the entry has just been modified

print(to_sfm_string(lexicon[53]))
add_cv_field(lexicon[53])
print(to_sfm_string(lexicon[53]))
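To try the insertion logic without the Rotokas corpus, here is a minimal sketch that uses only the standard library's xml.etree.ElementTree on a hand-built entry. The field values are made up for illustration (not taken from rotokas.dic), and add_cv_after_lx is a hypothetical variant of add_cv_field that also reports where the new field went.

```python
import re
from xml.etree.ElementTree import Element, SubElement


def cv(s):
    """Map a word to its C/V skeleton; non-letters become _."""
    s = s.lower()
    s = re.sub(r'[^a-z]', '_', s)
    s = re.sub(r'[aeiou]', 'V', s)
    s = re.sub(r'[^V_]', 'C', s)
    return s


def add_cv_after_lx(entry):
    """Insert a <cv> child right after the first <lx> child; return its index."""
    for i, field in enumerate(entry):
        if field.tag == 'lx':
            cv_field = Element('cv')
            cv_field.text = cv(field.text)
            entry.insert(i + 1, cv_field)
            return i + 1  # return at once: the entry has just been modified
    return None  # no lx field: the entry is left unchanged


# Hypothetical toy entry (made-up values)
entry = Element('record')
for tag, text in [('lx', 'kaakaaro'), ('ps', 'N'), ('ge', 'example gloss')]:
    SubElement(entry, tag).text = text

add_cv_after_lx(entry)
print([(f.tag, f.text) for f in entry])
# → [('lx', 'kaakaaro'), ('cv', 'CVVCVVCV'), ('ps', 'N'), ('ge', 'example gloss')]
```

Computing the position from the lx field, rather than hard-coding index 1, keeps the new field next to lx even if some record does not start with lx.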
