Basic-NLP-PreProcessing-with-Python

An introduction to the basics of Natural Language Processing (NLP): the preprocessing steps to apply before working with any text data.

Tokenization : The process of splitting a piece of text into smaller units called tokens.
Stemming : The process of reducing inflected (or sometimes derived) words to their word stem, base, or root form, generally a written word form.
Lemmatization : In contrast to stemming, lemmatization goes beyond simple word reduction and uses a language's full vocabulary to apply morphological analysis to words.
Removing Stop Words : Stop words are a collection of common words that do not add much meaning to a sentence. These words can usually be ignored without sacrificing the meaning of the sentence.
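
As a rough illustration of tokenization and stop-word removal, here is a minimal sketch that uses only the Python standard library. The regex tokenizer, the `tokenize` and `remove_stop_words` helpers, and the tiny `STOP_WORDS` set are assumptions made for this example; libraries such as NLTK and spaCy ship far more complete tokenizers and stop-word lists.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "on", "in", "and", "to"}


def tokenize(text):
    """Split text into lowercase tokens made of letters, digits, or apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop-word set."""
    return [token for token in tokens if token not in stop_words]


tokens = tokenize("The cat is sitting on the mat.")
filtered = remove_stop_words(tokens)

print(tokens)    # ['the', 'cat', 'is', 'sitting', 'on', 'the', 'mat']
print(filtered)  # ['cat', 'sitting', 'mat']
```

Lowercasing before matching keeps the stop-word lookup simple, at the cost of losing case information that some later steps might need.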

Lemmas vs Stemming?

Stemming simply removes the last few characters of a word, which often leads to incorrect meanings and spellings. Lemmatization considers the context and converts the word to its meaningful base form, which is called the lemma. Sometimes the same word can have multiple lemmas, depending on the context.

Lemmatizing the word 'Caring' returns 'Care', whereas a stemmer that simply strips the suffix returns 'Car'.
Lemmatizing the word 'Stripes' in a verb context returns 'Strip', and in a noun context it returns 'Stripe'. A suffix-stripping stemmer would just return 'Strip', regardless of context.
Lemmatization is more computationally expensive than stemming, because it relies on vocabulary lookup and morphological analysis.
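
The contrast above can be sketched with a toy example. This is a minimal illustration and not how any real library works: the `SUFFIXES` rules, the `LEMMA_TABLE` lookup, and the `naive_stem` and `toy_lemmatize` helpers are all hypothetical. Real stemmers (for example, implementations of the Porter algorithm) apply additional rules and may return different stems, and real lemmatizers use a full vocabulary instead of a three-entry table.

```python
# Hypothetical toy suffix rules, checked in order. Illustration only.
SUFFIXES = ("ing", "es", "ed", "s")

# Hypothetical lookup table keyed by (word, part of speech). Illustration only.
LEMMA_TABLE = {
    ("caring", "verb"): "care",
    ("stripes", "verb"): "strip",
    ("stripes", "noun"): "stripe",
}


def naive_stem(word):
    """Strip the first matching suffix, ignoring context and spelling."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Require at least three characters to remain after stripping.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word, pos):
    """Look up the lemma for a (word, part-of-speech) pair; fall back to the word."""
    word = word.lower()
    return LEMMA_TABLE.get((word, pos), word)


print(naive_stem("Caring"))              # car
print(toy_lemmatize("Caring", "verb"))   # care
print(naive_stem("Stripes"))             # strip
print(toy_lemmatize("Stripes", "verb"))  # strip
print(toy_lemmatize("Stripes", "noun"))  # stripe
```

The stemmer needs only the word itself, while the lemmatizer also needs the part of speech, which is one reason lemmatization costs more.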

