Skip to content
/ gwizo Public

Simple Go implementation of the Porter Stemmer algorithm with powerful features.

License

Notifications You must be signed in to change notification settings

kampsy/gwizo

Folders and files

NameName
Last commit message
Last commit date

Latest commit

4e6d03a · Jan 22, 2017

History

70 Commits
Jan 22, 2017
Jan 22, 2017
Jan 22, 2017
Jan 22, 2017
Jan 22, 2017
Jan 22, 2017
Jan 22, 2017
Jan 22, 2017
Jan 22, 2017
Jan 22, 2017
Jan 22, 2017
Jan 22, 2017
Jan 22, 2017
Jan 22, 2017
Jan 22, 2017
Jan 22, 2017
Jan 22, 2017
Jan 22, 2017
Jan 22, 2017
Jan 22, 2017
Jan 22, 2017
Jan 22, 2017
Jan 22, 2017
Jan 22, 2017
Jan 22, 2017
Jan 22, 2017
Jan 22, 2017

Repository files navigation

gwizo

home

Gwizo version GoDoc License Twitter

Package gwizo implements Porter Stemmer algorithm, M. "An algorithm for suffix stripping." Program 14.3 (1980): 130-137. Martin Porter, the algorithm's inventor, maintains a web page about the algorithm at http://www.tartarus.org/~martin/PorterStemmer/

Installation

To install, simply run in a terminal:

go get github.com/kampsy/gwizo

Stem

Stem: stem the word.

package main

import (
  "fmt"
  "github.com/kampsy/gwizo"
)

func main() {
  stem := gwizo.Stem("abilities")
  fmt.Printf("Stem: %s\n", stem)
}
$ go run main.go

Stem: able

Vowels, Consonants and Measure

gwizo returns a type Token which has two fileds, VowCon which is the vowel consonut pattern and the Measure value [v]vc{m}[c]

  package main

  import (
    "fmt"
    "github.com/kampsy/gwizo"
    "strings"
  )

func main() {
  word := "abilities"
  token := gwizo.Parse(word)

  // VowCon
  fmt.Printf("%s has Pattern %s \n", word, token.VowCon)

  // Measure value [v]vc{m}[c]
  fmt.Printf("%s has Measure value %d \n", word, token.Measure)

  // Number of Vowels
  v := strings.Count(token.VowCon, "v")
  fmt.Printf("%s Has %d Vowels \n", word, v)

  // Number of Consonants
  c := strings.Count(token.VowCon, "c")
  fmt.Printf("%s Has %d Consonants\n", word, c)
}
$ go run main.go

abilities has Pattern vcvcvcvvc
abilities has Measure value 4
abilities Has 5 Vowels
abilities Has 4 Consonants

File Stem Performance.

  package main

  import (
    "fmt"
    "github.com/kampsy/gwizo"
    "bufio"
    "io/ioutil"
    "strings"
    "os"
    "time"
  )

  func main() {
    curr := time.Now()
    writeOut()
    elaps := time.Since(curr)
    fmt.Println("============================")
    fmt.Println("Done After:", elaps)
    fmt.Println("============================")
  }

  func writeOut() {
    re, err := ioutil.ReadFile("input.txt")
    if err != nil {
      fmt.Println(err)
    }

    file := strings.NewReader(fmt.Sprintf("%s", re))
    scanner := bufio.NewScanner(file)
    out, err := os.Create("stem.txt")
    if err != nil {
      fmt.Println(err)
    }
    defer out.Close()
    for scanner.Scan() {
      txt := scanner.Text()
      stem := gwizo.Stem(txt)
      out.WriteString(fmt.Sprintf("%s\n", stem))
      fmt.Println(txt, "--->", str)
    }
    if err := scanner.Err(); err != nil {
      fmt.Println(err)
    }
  }
$ go run main.go

About

Simple Go implementation of the Porter Stemmer algorithm with powerful features.

Topics

Resources

License

Stars

Watchers

Forks