Skip to content

Inverted Index Algorithm #213

Open
Open
@DaliaAymanAmeen

Description

@DaliaAymanAmeen

Description of the problem

Searching through text documents is a process that requires lots of computational powers. For a given search keyword, the naive searching algorithm would be to pass through the corpus of documents iteratively and for each document, check whether the word exists in the document. This algorithm will be very slow for a large corpus of documents. If the corpus had ​D​ documents each having ​W​ words each of length ​L​ then the complexity of the naive approach would be O(D * W * L)​ (The complexity of naive search in each document is ​O(W*L)​ ). Searching algorithm will be optimized . The straightforward way is to build an index. For an index, All the words that occurred in a document will be stored. An inverted index works in the reverse way. Instead of storing the words that occurred in each document, The ids of the documents in which each word has occurred will be stored.

References/Other comments

https://www.elastic.co/blog/found-indexing-for-beginners-part3/
http://www.dcs.bbk.ac.uk/~dell/teaching/cc/book/ditp/ditp_ch4.pdf

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions