The purpose of this repository is to find the longest word in a given text efficiently. It compares several implementations (two PySpark versions and a plain Python script) by their measured running time on datasets of increasing size.
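As a point of reference, the core task can be sketched in plain Python as a single pass over the text that keeps the longest word seen so far. This is a minimal sketch, not the repository's implementation. It assumes a "word" is a maximal run of `\w` characters (letters, digits, underscore) and that ties go to the first occurrence. The name `longest_word` is hypothetical.

```python
import re
from typing import Iterable, Optional

# Assumption: a "word" is a maximal run of \w characters (letters, digits, underscore).
WORD_RE = re.compile(r"\w+")


def longest_word(lines: Iterable[str]) -> Optional[str]:
    """Return the longest word across all lines, or None if there are no words.

    Hypothetical helper: it streams the input line by line, so memory use is
    bounded by the longest line rather than the whole text. Ties keep the
    first word found.
    """
    best: Optional[str] = None
    for line in lines:
        for match in WORD_RE.finditer(line):
            word = match.group()
            if best is None or len(word) > len(best):
                best = word
    return best


if __name__ == "__main__":
    sample = ["The quick brown fox", "jumps over the extraordinarily lazy dog"]
    print(longest_word(sample))  # prints "extraordinarily"
```

Because the function accepts any iterable of lines, an open file object can be passed directly, e.g. `longest_word(open(path, encoding="utf-8"))`. Each character is examined a constant number of times, so the work grows linearly with the input size.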
Each implementation was run three times per dataset size. The table lists the individual run times and their average (AVG):
Technology | Time, 1x dataset | Time, 4x dataset (500 MB) | Time, 8x dataset (1 GB) |
---|---|---|---|
Pyspark_First | 19.6s, 26.8s, 22.1s (AVG: 22.83s) | 81s, 92.5s, 89.4s (AVG: 87.63s) | 201.7s, 179.1s, 157.4s (AVG: 179.4s) |
Pyspark_New | 22.7s, 22.4s, 25.8s (AVG: 23.63s) | 88.9s, 91.7s, 89.3s (AVG: 89.97s) | 160.5s, 157.4s, 150.1s (AVG: 156s) |
Python Script | 9.72s, 6.62s, 6.88s (AVG: 7.74s) | 34.39s, 33.18s, 31.01s (AVG: 32.86s) | 76.92s, 60.24s, 59.89s (AVG: 65.68s) |
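The averages in the table are arithmetic means of the three runs. The snippet below recomputes them from the listed run times and also derives one figure not reported in the table: each average divided by the 1x average for the same implementation. Ratios close to 4 and 8 would indicate roughly linear scaling with input size. The `RUNS` dictionary and `summarize` function are only a convenient, hypothetical encoding of the table.

```python
from statistics import mean

# Run times in seconds, copied from the results table (three runs per cell).
RUNS = {
    "Pyspark_First": {
        "1x": [19.6, 26.8, 22.1],
        "4x": [81, 92.5, 89.4],
        "8x": [201.7, 179.1, 157.4],
    },
    "Pyspark_New": {
        "1x": [22.7, 22.4, 25.8],
        "4x": [88.9, 91.7, 89.3],
        "8x": [160.5, 157.4, 150.1],
    },
    "Python Script": {
        "1x": [9.72, 6.62, 6.88],
        "4x": [34.39, 33.18, 31.01],
        "8x": [76.92, 60.24, 59.89],
    },
}


def summarize(runs):
    """Return {technology: {size: (average, ratio_vs_1x)}}, both rounded to 2 decimals."""
    summary = {}
    for tech, sizes in runs.items():
        base = mean(sizes["1x"])  # 1x average used as the scaling baseline
        summary[tech] = {
            size: (round(mean(times), 2), round(mean(times) / base, 2))
            for size, times in sizes.items()
        }
    return summary


if __name__ == "__main__":
    for tech, sizes in summarize(RUNS).items():
        print(tech, sizes)
```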
To install the necessary dependencies, run the following command:
pip install -r requirements.txt
You can run the code in either of two ways:
- In the notebook: adjust the parent folder or root path to match your setup, then run all cells.
- From the terminal: set `ROOTPATH` for your setup, then run the script.
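For the terminal route, a script mainly needs to know where the data lives. The sketch below is hypothetical and is not the repository's script: it assumes `ROOTPATH` is a directory of UTF-8 `.txt` files, finds the longest word in each file by streaming it line by line (a "map" step), and then keeps the longest of those per-file results (a "reduce" step), which is the same overall shape a PySpark job would take. The default path `data`, the `*.txt` pattern, and the function names are assumptions.

```python
import re
import sys
from pathlib import Path
from typing import Optional

ROOTPATH = Path("data")  # Hypothetical default; adjust for your setup or pass a path as argv[1].
WORD_RE = re.compile(r"\w+")  # Assumption: words are runs of \w characters.


def longest_in_file(path: Path) -> Optional[str]:
    """Map step: longest word in one file, read lazily line by line."""
    with path.open(encoding="utf-8", errors="replace") as handle:
        words = (word for line in handle for word in WORD_RE.findall(line))
        # max() returns the first of several equally long words; None if the file has no words.
        return max(words, key=len, default=None)


def longest_in_tree(root: Path) -> Optional[str]:
    """Reduce step: longest of the per-file winners under root (None if root is not a directory)."""
    if not root.is_dir():
        return None
    winners = (longest_in_file(p) for p in sorted(root.rglob("*.txt")))
    return max((w for w in winners if w is not None), key=len, default=None)


if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else ROOTPATH
    print(longest_in_tree(root))
```

Usage (assuming the sketch is saved as a file of your choosing): `python <script>.py /path/to/texts`. Sorting the paths makes tie-breaking between files deterministic.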