Algorithms for AIML generation
source: http://workshop.colips.org/re-wochat/documents/07_Paper_7.pdf
-
Phase One: Read the dialogue text from the corpus and insert it into a vector.
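A minimal Python sketch of Phase One. It assumes a hypothetical plain-text corpus with one turn per line in `SPEAKER: utterance` form. Real corpora use their own transcription formats, so `read_turns` is only an illustration of "read the turns into a vector".

```python
from typing import Iterable, List, Tuple


def read_turns(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Read dialogue turns, in corpus order, into a list (the Phase One "vector").

    Assumes (hypothetically) one turn per line in "SPEAKER: utterance" form;
    blank lines and lines without a speaker prefix are skipped.
    """
    turns: List[Tuple[str, str]] = []
    for raw in lines:
        line = raw.strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only, so the utterance may contain colons.
        speaker, _, text = line.partition(":")
        turns.append((speaker.strip(), text.strip()))
    return turns
```

Because a file object is an iterable of lines, `read_turns(open("corpus.txt", encoding="utf-8"))` would work as well as a list of strings. The file name here is hypothetical.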
-
Phase Two: Text pre-processing module, where all linguistic annotations, such as overlaps, fillers and other markup, are filtered out.
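A minimal sketch of the Phase Two filter. Both the annotation syntax (square- or angle-bracketed tags such as `[overlap]` or `<laugh>`) and the filler list are hypothetical. Each corpus documents its own transcription conventions, so both should be adapted.

```python
import re

# Hypothetical annotation syntax: "[overlap]", "<laugh>" and similar tags.
ANNOTATION_RE = re.compile(r"\[[^\]]*\]|<[^>]*>")

# Hypothetical filler list; extend it for the corpus at hand.
FILLERS = {"uh", "um", "er", "erm", "hmm", "mm"}


def clean_turn(text: str) -> str:
    """Remove annotation tags and filler words, then collapse whitespace."""
    text = ANNOTATION_RE.sub(" ", text)
    # Compare words without surrounding punctuation, so "uh," counts as a filler.
    kept = [w for w in text.split() if w.strip(",.!?;:").lower() not in FILLERS]
    return " ".join(kept)
```

For example, `clean_turn("um [overlap] I think <laugh> so, uh, yes")` keeps only `I think so, yes`.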
-
Phase Three: Converter module, where the pre-processed text is passed to the converter, which treats the first turn of each pair as a pattern and the second as a template. During this phase, all punctuation is removed from the patterns and they are converted to upper case.
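A sketch of the converter step. It assumes that every consecutive pair of turns forms a category, so a turn serves as the template of one category and the pattern of the next. The paper may pair turns differently. Punctuation removal here covers ASCII punctuation only (`string.punctuation`).

```python
import string
from typing import List, Tuple

# Translation table that deletes ASCII punctuation characters.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def normalize_pattern(text: str) -> str:
    """Remove punctuation, convert to upper case and collapse whitespace."""
    return " ".join(text.translate(_STRIP_PUNCT).upper().split())


def to_atomic_categories(turns: List[str]) -> List[Tuple[str, str]]:
    """Pair each turn (pattern) with the turn that follows it (template).

    Pairing every consecutive pair of turns is an assumption; turns whose
    pattern is empty after normalization are skipped.
    """
    categories: List[Tuple[str, str]] = []
    for current, reply in zip(turns, turns[1:]):
        pattern = normalize_pattern(current)
        if pattern:
            categories.append((pattern, reply))
    return categories
```

Only the pattern is normalized. The template keeps its original casing and punctuation because it is the text the bot replies with.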
-
Phase Four: Copy these atomic categories into an AIML file.
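A minimal sketch of Phase Four that uses only the standard library. `xml.sax.saxutils.escape` escapes `&`, `<` and `>`, so templates containing those characters remain well-formed XML. The `version="1.0.1"` attribute and the indentation are illustrative choices, not taken from the paper.

```python
from typing import Iterable, Tuple
from xml.sax.saxutils import escape


def categories_to_aiml(categories: Iterable[Tuple[str, str]]) -> str:
    """Serialize (pattern, template) pairs as AIML atomic categories."""
    parts = ['<?xml version="1.0" encoding="UTF-8"?>', '<aiml version="1.0.1">']
    for pattern, template in categories:
        parts.append(
            "  <category>\n"
            f"    <pattern>{escape(pattern)}</pattern>\n"
            f"    <template>{escape(template)}</template>\n"
            "  </category>"
        )
    parts.append("</aiml>")
    return "\n".join(parts) + "\n"


def write_aiml(path: str, categories: Iterable[Tuple[str, str]]) -> None:
    """Write the categories to an AIML file (UTF-8)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(categories_to_aiml(categories))
```

Keeping the serialization in a pure function (`categories_to_aiml`) makes it easy to inspect the output without touching the file system.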
-
Phase Five: Build a lexical frequency list of the words in the patterns. This list is used to obtain the first and second most significant words (the least frequent words) of each utterance.
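A short sketch of Phase Five. A `collections.Counter` over the words of all normalized patterns gives the frequency list. Sorting a pattern's distinct words by ascending frequency then yields its 1st and 2nd most significant words. Breaking ties by word position is an assumption; the paper may order ties differently.

```python
from collections import Counter
from typing import List, Mapping


def build_frequency_list(patterns: List[str]) -> Counter:
    """Count how often each word occurs across all (normalized) patterns."""
    return Counter(word for p in patterns for word in p.split())


def most_significant_words(pattern: str, freq: Mapping[str, int], k: int = 2) -> List[str]:
    """Return up to k distinct words of the pattern, least frequent first.

    sorted() is stable, so equally frequent words keep their order in the
    pattern (an assumed tie-breaking rule).
    """
    words = list(dict.fromkeys(pattern.split()))  # de-duplicate, keep order
    return sorted(words, key=lambda w: freq.get(w, 0))[:k]
```

On a tiny corpus, function words such as `A` can come out as "significant". The frequency list only becomes meaningful on a corpus of realistic size.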
-
Phase Six: Build the default category file. AIML pattern-matching rules, known as “categories”, are created. There are two possible types of match: either the input matches a complete pattern, so an atomic category is matched; or a word in the user input matches the 1st or 2nd most significant word (least frequent word) of a pattern, so a default category is matched.
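A sketch of how the default categories might be generated. It assumes four wildcard forms per keyword (`WORD`, `* WORD`, `WORD *`, `* WORD *`) to catch the word anywhere in the input. The paper may use different forms. `*` is used rather than `_` because in AIML 1.x matching order (`_`, then exact words, then `*`) an input that matches a complete atomic pattern then still reaches its atomic category. `significant` is a hypothetical callback that returns a pattern's 1st and 2nd most significant words.

```python
from typing import Callable, Dict, List, Tuple


def keyword_patterns(word: str) -> List[str]:
    """Wildcard patterns matching `word` anywhere in the input (assumed forms)."""
    return [word, f"* {word}", f"{word} *", f"* {word} *"]


def build_default_categories(
    atomic: List[Tuple[str, str]],
    significant: Callable[[str], List[str]],
) -> Dict[str, str]:
    """Map default (wildcard) patterns to the templates of their utterances.

    Atomic patterns are never duplicated as defaults, and when two utterances
    share a significant word the first one wins (an assumed tie-breaking rule).
    """
    atomic_patterns = {p for p, _ in atomic}
    defaults: Dict[str, str] = {}
    for pattern, template in atomic:
        for word in significant(pattern):
            for kp in keyword_patterns(word):
                if kp not in atomic_patterns:
                    defaults.setdefault(kp, template)
    return defaults
```

The resulting dictionary can be serialized the same way as the atomic categories and written to the separate default category file.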
II. Search for the least significant word (by the principle of least influence, using a vector representation of words)
idea: find the least significant word, replace it with a star (*) and so create a new AIML pattern; this uses the cosine similarity between the vector representation of the whole phrase and that of a part of the phrase (the phrase with one word removed)
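The idea can be written as a selection rule. The first line is the standard cosine similarity. The notation in the other two lines is introduced here for illustration: $s = w_1 \dots w_n$ is a unit, $\mathbf{v}(\cdot)$ its vector representation, and $s^{(-i)}$ the unit with word $w_i$ deleted. If several $i$ reach the maximum, each of them yields a pattern.

```latex
\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}

i^{*} = \operatorname*{arg\,max}_{1 \le i \le n} \cos\bigl(\mathbf{v}(s), \mathbf{v}(s^{(-i)})\bigr)

\text{pattern} = w_1 \cdots w_{i^{*}-1} \; \ast \; w_{i^{*}+1} \cdots w_n
```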
algorithm (still needs to be checked in practice):
-
1) translate the whole text into a matrix, where each vector represents one unit (phrase, sentence, paragraph, etc.)
-
2) for each unit:
2.1) delete the next word to form a new unit (call such units "derivative units")
2.2) calculate the cosine similarity between the original unit and the new unit and store it in an array
2.3) repeat this for every word in the unit
2.4) select the derivative units for which the cosine similarity is maximal, i.e. those whose deleted word changed the unit least
2.5) in the selected derivative units, replace the deleted word with the star symbol (*) and use the resulting units as AIML patterns
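A minimal sketch of the algorithm above. As a swapped-in assumption, TF-IDF-weighted bag-of-words vectors stand in for the word-vector representation the note mentions. With plain word counts, deleting any one word from a unit of distinct words gives the same cosine, so some weighting is needed. A learned embedding (for example, averaged word vectors) could be plugged into the hypothetical `vectorize` function instead. Each unit's vector is a sparse dict, i.e. one row of the step 1 matrix.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]  # one sparse row of the text matrix (step 1)


def idf_weights(units: List[List[str]]) -> Dict[str, float]:
    """Smoothed inverse document frequency of every word over all units."""
    df = Counter(word for unit in units for word in set(unit))
    n = len(units)
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}


def vectorize(unit: List[str], idf: Dict[str, float]) -> Vector:
    """TF-IDF vector of a unit; unseen words get weight 1.0 (an assumption)."""
    return {w: tf * idf.get(w, 1.0) for w, tf in Counter(unit).items()}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def least_significant_patterns(unit: List[str], idf: Dict[str, float]) -> List[str]:
    """Steps 2.1-2.5 for one unit: return the new wildcard pattern(s)."""
    if len(unit) < 2:
        return []  # a lone "*" would match every input
    original = vectorize(unit, idf)
    similarities = []
    for i in range(len(unit)):                  # 2.3 every word in turn
        derivative = unit[:i] + unit[i + 1:]    # 2.1 delete word i
        similarities.append(cosine(original, vectorize(derivative, idf)))  # 2.2
    best = max(similarities)                    # 2.4 maximal similarity
    return [
        " ".join(unit[:i] + ["*"] + unit[i + 1:])  # 2.5 deleted word -> "*"
        for i, s in enumerate(similarities)
        if math.isclose(s, best)
    ]
```

For example, with the units `HOW ARE YOU`, `WHERE ARE YOU FROM` and `ARE YOU A ROBOT`, the frequent words `ARE` and `YOU` carry the lowest weight. The first unit therefore yields the two patterns `HOW * YOU` and `HOW ARE *`, while the rare word `HOW` is kept.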
related AIML projects:
https://github.com/cosimoiaia/pyAiml-2.0
https://github.com/creatorrr/pyAIML
https://github.com/pandorabots/rosie
https://github.com/MyRobotLab/aiml/tree/master/bots/mrturing
https://github.com/MyRobotLab/aiml/tree/master/bots/BOTS-FRENCH/Inmoov_AI
https://github.com/keiffster/program-y