With the spread of the COVID-19 pandemic, education is shifting as more institutions adopt online and cloud-based systems for teaching and evaluation. However, this change comes with pitfalls, one of the most prominent being the difficulty of checking answer sheets for plagiarism effectively, especially when they are handwritten. Our proposed software uses an Optical Character Recognition (OCR) system to extract text from scanned images of handwritten pages. The extracted text is then checked for plagiarism against a database of approximately 130 trillion web pages, and a final report is generated.
The project has three main modules:
- OCR Module
- Plagiarism Checker Module
- Website
The internal data flow between these modules is depicted by the following block diagram.
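As a rough textual companion to the diagram, the hand-off between the three modules can be sketched as below. This is only an illustration under stated assumptions: `extract_text`, `check_plagiarism`, `build_report`, and the `Report` fields are hypothetical names that do not come from the repository, the OCR step is replaced by a plain text decode, and the web-scale search is replaced by a local comparison using Python's standard `difflib`.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Tuple


@dataclass
class Report:
    """Hypothetical report shape; the real project's output may differ."""
    extracted_text: str
    best_match: str
    similarity: float  # 0.0 (nothing in common) to 1.0 (identical)


def extract_text(scanned_page: bytes) -> str:
    """Stand-in for the OCR module: treats the 'scan' as UTF-8 text."""
    return scanned_page.decode("utf-8")


def check_plagiarism(text: str, sources: List[str]) -> Tuple[str, float]:
    """Stand-in for the plagiarism checker: compares against local strings
    (not a web index) and returns the closest source with its score."""
    scored = [
        (source, SequenceMatcher(None, text.lower(), source.lower()).ratio())
        for source in sources
    ]
    return max(scored, key=lambda pair: pair[1])


def build_report(scanned_page: bytes, sources: List[str]) -> Report:
    """Chain the stages: OCR output feeds the checker, which feeds the report."""
    text = extract_text(scanned_page)
    match, score = check_plagiarism(text, sources)
    return Report(extracted_text=text, best_match=match, similarity=score)
```

In this sketch the website would be the caller of `build_report`, passing in the uploaded scan and displaying the returned report. `sources` must be non-empty, since `max` raises `ValueError` on an empty list.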
- Clone the repository using
git clone https://github.com/DebadityaPal/PlagiarismChecker
- Move into the directory using
cd PlagiarismChecker
- Install all dependencies for the backend using
pip install -r requirements.txt
- Move into the frontend folder using
cd frontend
- Install all dependencies for the frontend using
npm install
- Move into the backend folder from the root folder using
cd backend
- Run the server using
python manage.py runserver 8000
The server must be started on port 8000; otherwise, requests from the frontend might not be served.
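Before starting the frontend, it can help to confirm that something is actually listening on port 8000. The following is a minimal sketch using only Python's standard `socket` module; the helper name `port_is_open` and the default host `127.0.0.1` are assumptions for illustration, not part of the project.

```python
import socket


def port_is_open(host: str = "127.0.0.1", port: int = 8000, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port is accepted."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Something is listening on port 8000.")
    else:
        print("Nothing is listening on port 8000; start the backend first.")
```

This only checks that the port accepts TCP connections. It does not verify that the process behind it is the project's backend.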
- Move into the frontend folder from the root folder using
cd frontend
- Run the server using
npm run start