A word can take many inflected and derived forms. For instance, the word "compute" can appear as "computing", "computation", and "computerize". NLP applications such as search engines and information extraction systems typically store the base form, or stem, of a word (here, "compute") instead of accommodating all of its variants. This reduces dimensionality and increases the efficiency of the system. The stemmer strips prefixes and suffixes from a word.
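
The suffix stripping described above can be sketched in a few lines. This is only an illustration, not the algorithm used by the API: the suffix list, the `toy_stem` name, and the minimum stem length are all assumptions made for the example.

```python
# Toy suffix-stripping stemmer. The suffix list is illustrative only and is
# not the rule set used by the service documented here.
SUFFIXES = ("ation", "ing", "ed", "es", "s")  # checked in this order


def toy_stem(word: str, min_stem_len: int = 3) -> str:
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= min_stem_len:
            return lowered[: -len(suffix)]
    return lowered


for w in ("computing", "computation", "computes"):
    print(w, "->", toy_stem(w))
```

All three variants collapse to the same stem, which is what lets a search index store one entry instead of three.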

# Languages covered:

Our stemmer works for the following 26 languages.

| Languages  | ISO Code |
|------------|----------|
| Arabic     | ar       |
| Catalan    | ca       |
| Danish     | da       |
| German     | de       |
| Greek      | el       |
| English    | en       |
| Spanish    | es       |
| Basque     | eu       |
| Finnish    | fi       |
| French     | fr       |
| Irish      | ga       |
| Hindi      | hi       |
| Hungarian  | hu       |
| Indonesian | id       |
| Italian    | it       |
| Lithuanian | lt       |
| Nepali     | ne       |
| Dutch      | nl       |
| Norwegian  | no       |
| Portuguese | pt       |
| Romanian   | ro       |
| Russian    | ru       |
| Serbian    | sr       |
| Swedish    | sv       |
| Tamil      | ta       |
| Turkish    | tr       |

```python
@@ -71,7 +71,7 @@ def get_stemmer(self,
## Example Usage

```python
body = jsonpickle.decode('{"$$__case":0,"$$__case_of":"oneOf","value":{"input":{"text":"அவள் வேகமாக ஓடினாள்","lang":"ta"}}}')

result = basic_ap_is_controller.get_stemmer(body)
```
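
The request body in the example above is a hand-written JSON string. As a sketch, the same string can be built from Python values with the standard `json` module, which avoids quoting mistakes when the text varies. The helper name `build_body` is hypothetical and not part of the SDK; the field names are copied from the example.

```python
import json


def build_body(text: str, lang: str) -> str:
    """Return the JSON request string used in the examples (hypothetical helper)."""
    payload = {
        "$$__case": 0,
        "$$__case_of": "oneOf",
        "value": {"input": {"text": text, "lang": lang}},
    }
    # ensure_ascii=False keeps non-Latin text readable; compact separators
    # reproduce the spacing of the hand-written string.
    return json.dumps(payload, ensure_ascii=False, separators=(",", ":"))


print(build_body("அவள் வேகமாக ஓடினாள்", "ta"))
```

The returned string should be usable wherever the literal is passed to `jsonpickle.decode` in the examples.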
@@ -81,15 +81,15 @@ result = basic_ap_is_controller.get_stemmer(body)
```json
[
  {
    "originalText": "அவள்",
    "stem": "அவள்"
  },
  {
    "originalText": "வேகமாக",
    "stem": "வேகம்"
  },
  {
    "originalText": "ஓடினாள்",
    "stem": "ஓடி"
  }
]
@@ -106,27 +106,27 @@ result = basic_ap_is_controller.get_stemmer(body)

# Get Lemma

A lemmatizer is similar to a stemmer in that it reduces a word to a base form, but it differs in returning a meaningful base form, the lemma. For instance, for the word "smiling", the stemmer would return "smil" by stripping the suffix "ing", whereas the lemmatizer would return the meaningful form "smile". Lemmatizers can be used in applications such as machine translation, search engines, and text summarization.
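
The contrast between the two can be shown with a toy example. This is a minimal sketch, assuming a tiny hand-written lookup table in place of a real lexicon; `LEMMA_TABLE`, `toy_lemmatize`, and `strip_ing` are illustrative names and do not describe how the API is implemented.

```python
# Tiny hand-written table standing in for a real lexicon (assumption).
LEMMA_TABLE = {"smiling": "smile", "makes": "make"}


def strip_ing(word: str) -> str:
    """Stemmer-style rule: drop a trailing 'ing' without checking the result."""
    lowered = word.lower()
    if lowered.endswith("ing") and len(lowered) > 5:
        return lowered[:-3]
    return lowered


def toy_lemmatize(word: str) -> str:
    """Lemmatizer-style lookup: return the dictionary form when it is known."""
    lowered = word.lower()
    return LEMMA_TABLE.get(lowered, lowered)


print(strip_ing("smiling"))      # smil
print(toy_lemmatize("Smiling"))  # smile
```

The stemmer rule produces a truncated string, while the lookup returns a form that is itself a word.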

# Languages covered:

| Languages          | ISO Code |
|--------------------|----------|
| Catalan            | ca       |
| Danish             | da       |
| Dutch              | nl       |
| English            | en       |
| French             | fr       |
| German             | de       |
| Greek              | el       |
| Italian            | it       |
| Lithuanian         | lt       |
| Macedonian         | mk       |
| Norwegian (Bokmål) | nb       |
| Polish             | pl       |
| Portuguese         | pt       |
| Romanian           | ro       |
| Russian            | ru       |
| Spanish            | es       |

```python
@@ -147,7 +147,7 @@ def get_lemma(self,
## Example Usage

```python
body = jsonpickle.decode('{"$$__case":0,"$$__case_of":"oneOf","value":{"input":{"text":"Smiling makes everyone happy","lang":"en"}}}')

result = basic_ap_is_controller.get_lemma(body)
```
@@ -157,19 +157,19 @@ result = basic_ap_is_controller.get_lemma(body)
```json
[
  {
    "originalText": "Smiling",
    "lemmatized": "smile"
  },
  {
    "originalText": "makes",
    "lemmatized": "make"
  },
  {
    "originalText": "everyone",
    "lemmatized": "everyone"
  },
  {
    "originalText": "happy",
    "lemmatized": "happy"
  }
]
@@ -186,27 +186,27 @@ result = basic_ap_is_controller.get_lemma(body)

# Get Morph

A morphological analyzer examines how a word is formed. It breaks a word into smaller units called morphemes and gives clues about the word-formation patterns of a particular language. It can be used for building applications such as machine translation, text summarization, and search systems.
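
To make the idea of morphemes concrete, here is a toy segmenter that splits off at most one known prefix and one known suffix. It is a sketch under stated assumptions: the affix lists and the `toy_segment` name are invented for illustration, and the output shape is not the response format of `get_morph`, which is not shown in this section.

```python
# Illustrative affix lists (assumptions, not the analyzer's real inventory).
PREFIXES = ("un", "re")
SUFFIXES = ("ment", "ing", "ed", "s")


def toy_segment(word: str, min_stem_len: int = 3) -> list:
    """Split a word into optional prefix, stem, and optional suffix."""
    rest = word.lower()
    morphemes = []
    for prefix in PREFIXES:
        if rest.startswith(prefix) and len(rest) - len(prefix) >= min_stem_len:
            morphemes.append(prefix + "-")
            rest = rest[len(prefix):]
            break
    suffix_found = None
    for suffix in SUFFIXES:
        if rest.endswith(suffix) and len(rest) - len(suffix) >= min_stem_len:
            suffix_found = suffix
            rest = rest[: -len(suffix)]
            break
    morphemes.append(rest)
    if suffix_found:
        morphemes.append("-" + suffix_found)
    return morphemes


print(toy_segment("development"))  # ['develop', '-ment']
print(toy_segment("unlocked"))     # ['un-', 'lock', '-ed']
```

A real analyzer relies on a lexicon and language-specific rules rather than fixed affix lists, so this only shows the kind of decomposition involved.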

# Languages covered:

| Languages          | ISO Code |
|--------------------|----------|
| Catalan            | ca       |
| Danish             | da       |
| Dutch              | nl       |
| English            | en       |
| French             | fr       |
| German             | de       |
| Greek              | el       |
| Italian            | it       |
| Japanese           | ja       |
| Lithuanian         | lt       |
| Macedonian         | mk       |
| Norwegian (Bokmål) | nb       |
| Polish             | pl       |
| Portuguese         | pt       |
| Russian            | ru       |
| Spanish            | es       |

```python
@@ -227,7 +227,7 @@ def get_morph(self,
## Example Usage

```python
body = jsonpickle.decode('{"$$__case":0,"$$__case_of":"oneOf","value":{"input":{"text":"Let us begin the API development.","lang":"en"}}}')

result = basic_ap_is_controller.get_morph(body)
```
@@ -411,20 +411,20 @@ result = basic_ap_is_controller.get_morph(body)

# Get Postag

A part-of-speech tagger, commonly known as a POS tagger, is software that automatically identifies the word class of each word in a given text input. The input can be a word, a sentence, or a set of sentences. The word classes are grammatical categories such as noun, verb, and adverb. The category assigned to each word is called a tag, and a set of tags, each indicating a grammatical category, is called a tagset. POS tagging is an essential preprocessing step for most natural language processing applications, such as information extraction, information retrieval, and summary generation systems. A POS tagger is language-dependent because grammar rules differ from language to language. For instance, a word ending in "ing" might indicate a verb in English, but this does not apply to other languages.
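
The "ing" rule mentioned above can be written as a toy rule-based tagger. This is a minimal sketch for English only; the rules, the tag names (`DET`, `VERB`, `ADV`, `OTHER`), and the `toy_pos_tag` function are assumptions for illustration and do not reflect the tagset or the method used by `get_postag`.

```python
def toy_pos_tag(sentence: str) -> list:
    """Tag whitespace-separated tokens with a few hand-written English rules."""
    tagged = []
    for token in sentence.split():
        word = token.strip(".,!?").lower()
        if word in {"the", "a", "an"}:
            tag = "DET"
        elif word.endswith("ing") and len(word) > 4:
            tag = "VERB"   # the English-specific "ing" heuristic
        elif word.endswith("ly"):
            tag = "ADV"
        else:
            tag = "OTHER"  # no rule matched
        tagged.append((token, tag))
    return tagged


print(toy_pos_tag("the dog is running quickly"))
print(toy_pos_tag("morning"))  # mis-tagged as VERB by the suffix rule
```

The second call shows why suffix rules alone are not enough: "morning" ends in "ing" but is a noun, and the rule is meaningless for languages other than English.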

# Languages covered:

| Languages          | ISO Code |
|--------------------|----------|
| Chinese            | zh       |
| Dutch              | nl       |
| English            | en       |
| German             | de       |
| Italian            | it       |
| Lithuanian         | lt       |
| Polish             | pl       |
| Romanian           | ro       |
| Tamil              | ta       |

```python
@@ -445,7 +445,7 @@ def get_postag(self,
## Example Usage

```python
body = jsonpickle.decode('{"$$__case":0,"$$__case_of":"oneOf","value":{"input":{"text":"Let us begin the API development.","lang":"en"}}}')

result = basic_ap_is_controller.get_postag(body)
```
@@ -455,31 +455,31 @@ result = basic_ap_is_controller.get_postag(body)