Description
Description of the problem
Searching through text documents is a process that requires lots of computational powers. For a given search keyword, the naive searching algorithm would be to pass through the corpus of documents iteratively and for each document, check whether the word exists in the document. This algorithm will be very slow for a large corpus of documents. If the corpus had D documents each having W words each of length L then the complexity of the naive approach would be O(D * W * L) (The complexity of naive search in each document is O(W*L) ). Searching algorithm will be optimized . The straightforward way is to build an index. For an index, All the words that occurred in a document will be stored. An inverted index works in the reverse way. Instead of storing the words that occurred in each document, The ids of the documents in which each word has occurred will be stored.
References/Other comments
https://www.elastic.co/blog/found-indexing-for-beginners-part3/
http://www.dcs.bbk.ac.uk/~dell/teaching/cc/book/ditp/ditp_ch4.pdf