Algorithms for AIML generation

MeisterUrian edited this page Feb 23, 2017 · 6 revisions

I. Automatic AIML generation (code name "Six Phases"):

source: http://workshop.colips.org/re-wochat/documents/07_Paper_7.pdf

  1. Phase One: Read the dialogue text from the corpus and insert it in a vector.

  2. Phase Two: Text preprocessing module, where linguistic annotations such as overlapping, fillers and other markup are filtered out.

  3. Phase Three: Converter module, where the preprocessed text is passed to the converter, which treats the first turn as a pattern and the second as a template. During this phase all punctuation is removed from the patterns and they are converted to upper case.

  4. Phase Four: Copy these atomic categories into an AIML file.

  5. Phase Five: Building a lexical frequency list of the patterns. This list is used to obtain the first and second most significant words (the least frequent words) of each utterance.

  6. Phase Six: Building the default category file. AIML pattern-matching rules, known as "categories", are created. There are two possible types of match: the input matches a complete pattern, so an atomic category is matched; or the input is matched on its 1st or 2nd most significant word (its least frequent words).
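
The phases above can be sketched roughly as follows. This is a minimal illustration, not the implementation from the cited paper: the function names, the pairing of turns as (0, 1), (2, 3), ..., and the four wildcard shapes generated around each significant word are all assumptions made for the example.

```python
import re
from collections import Counter
from xml.sax.saxutils import escape


def normalize(utterance):
    # Phase Three: drop punctuation and upper-case the pattern text.
    return " ".join(re.sub(r"[^\w\s]", " ", utterance).upper().split())


def atomic_categories(turns):
    # Phase Three: first turn is the pattern, second is the template.
    # Pairing turns as (0, 1), (2, 3), ... is an assumption.
    return [(normalize(turns[i]), turns[i + 1])
            for i in range(0, len(turns) - 1, 2)]


def significant_words(pattern, freq, n=2):
    # Phase Five: the n least frequent words; ties keep sentence order
    # because sorted() is stable.
    words = list(dict.fromkeys(pattern.split()))
    return sorted(words, key=lambda w: freq[w])[:n]


def default_categories(categories):
    # Phase Six: wildcard patterns built around each significant word.
    # The four wildcard shapes are an assumption of this sketch.
    freq = Counter(w for pattern, _ in categories for w in pattern.split())
    result = []
    for pattern, template in categories:
        for word in significant_words(pattern, freq):
            for wild in (word, f"{word} *", f"* {word}", f"* {word} *"):
                result.append((wild, template))
    return result


def to_aiml(categories):
    # Phase Four: serialise (pattern, template) pairs as AIML categories.
    body = "\n".join(
        f"<category><pattern>{escape(p)}</pattern>"
        f"<template>{escape(t)}</template></category>"
        for p, t in categories)
    return f'<aiml version="1.0">\n{body}\n</aiml>'


if __name__ == "__main__":
    # Phase One: the corpus is assumed to be already read into a list.
    turns = ["Hello, there!", "Hi. How are you?",
             "Hello, what is your name?", "My name is Bot."]
    atomic = atomic_categories(turns)
    print(to_aiml(atomic + default_categories(atomic)))
```

Phase Two (filtering corpus annotations) is left out because it depends on the annotation scheme of the corpus. The sketch also does not merge default categories that end up with the same pattern, which a real generator would have to handle.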


II. Search for the least significant word (by the principle of least influence, using vector representations of words; code name "Least influence"):

idea: find the least significant word, replace it with a star (*), and so create a new AIML pattern; this uses the cosine similarity between the vector representation of the whole phrase and that of a part of the phrase

algorithm (still needs to be checked in practice):

  1. translate the whole text into a matrix, where each vector represents one unit (phrase, sentence, paragraph, etc.)

  2. for each unit:

    2.1) delete the next word to form a new unit (call such units "derivative units")

    2.2) calculate the cosine similarity between the original and the new unit and append it to an array

    2.3) repeat this for every word in the unit

    2.4) select the derivative units for which the cosine similarity is maximal

    2.5) in the selected derivative units, replace the deleted word with the star symbol and use the resulting units as patterns
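
A minimal sketch of steps 2.1 to 2.5 for a single unit is given below. It assumes, purely for illustration, that a unit's vector is the sum of its word vectors and that the word vectors come from a hypothetical `embeddings` dictionary; any other vector representation of units could be substituted. The function names are likewise invented for the example.

```python
import math


def unit_vector(words, embeddings):
    # Assumption: a unit is represented by the sum of its word vectors.
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for w in words:
        for i, x in enumerate(embeddings[w]):
            total[i] += x
    return total


def cosine(a, b):
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def least_influence_patterns(unit, embeddings):
    words = unit.split()
    if not words:
        return []
    original = unit_vector(words, embeddings)
    scored = []
    for i in range(len(words)):                       # step 2.3
        derivative = words[:i] + words[i + 1:]        # step 2.1
        sim = cosine(original, unit_vector(derivative, embeddings))
        scored.append((sim, i))                       # step 2.2
    best = max(sim for sim, _ in scored)              # step 2.4
    return [" ".join(words[:i] + ["*"] + words[i + 1:])  # step 2.5
            for sim, i in scored if math.isclose(sim, best)]


if __name__ == "__main__":
    # Toy two-dimensional vectors, invented for this example.
    toy = {"WHAT": [1.0, 0.0], "IS": [0.1, 0.1], "AIML": [0.0, 1.0]}
    print(least_influence_patterns("WHAT IS AIML", toy))
```

With these toy vectors, deleting "IS" changes the direction of the unit vector least, so it is the word replaced by the star. If several deletions tie for the maximal similarity, one pattern is returned for each of them.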