Skip to content

d0r1h/SAR

Folders and files

NameName
Last commit message
Last commit date

Latest commit

518d4fa ┬╖ May 2, 2022

History

22 Commits
Apr 30, 2022
May 2, 2022
Apr 30, 2022
May 1, 2022
Apr 8, 2022
Apr 8, 2022
May 2, 2022

Repository files navigation



Website tweet

State-of-the-art Summarization methods for Hindi in ЁЯдЧ

SAR (рд╕рд╛рд░) in Hindi means summary. This repository contains my work on Hindi Text Summarization on news article.

Notebook:

Notebook Colab Kaggle
BaseLine Open In Colab Kaggle

DataSet:

  • As of now I've released a sample Dataset of 2k pairs of text and summary which can be accessed at Link

Models:

  • Inference results are on 2k sample data.
Model Checkpoint Rouge-2[f_score] Inference time
BART ai4bharat/IndicBART 21.48 20min 27s
T5 csebuetnlp/mT5_multilingual_XLSum 20.21 45min 54s

Project Pipeline

API

You can summarize any Hindi news article in just 5 lines of code

>>> import requests
>>> api_endpoint = "https://hf.space/embed/d0r1h/Hindi_News_Summarizer/+/api/predict/"
>>> news_url = "https://www.amarujala.com/uttar-pradesh/shamli/up-news-heroin-caught-in-shaheen-bagh-of-delhi-is-connection-to-kairana-and-muzaffarnagar?src=tlh\u0026position=3"
>>> r = requests.post(url= api_endpoint, 
                  json = {"data": [ news_url, "BART"]})
>>> r.json()['data'][0]
>>> рдпреВрдкреА рд╢рд╛рд╣реАрди рдмрд╛рдЧ рдореЗрдВ 100 рдХрд░реЛрдбрд╝ рд░реБрдкрдпреЗ рдХреАрдордд рдХреА рд╣реЗрд░реЛрдЗрди рдФрд░ рдЕрдиреНрдп рдорд╛рджрдХ рдкрджрд╛рд░реНрде рдХреА рдмрд░рд╛рдорджрдЧреА рд╡ рдЙрд╕реЗ рд▓рд╛рдиреЗ рдЕрдВрддрд░реНрд░рд╛рд╖реНрдЯреНрд░реАрдп рдбреНрд░рдЧреНрд╕ рддрд╕реНрдХрд░реЛрдВ рдХреЗ рдЧрд┐рд░реЛрд╣ рдХреЗ рддрд╛рд░ рд╢рд╛рдорд▓реА рдЬрд┐рд▓реЗ рдХреЗ рдХрд╕реНрдмрд╛ рдХреИрд░рд╛рдирд╛ рдФрд░ рдореБрдЬрдлреНрдлрд░рдирдЧрд░ рд╕реЗ рдЬреБрдбрд╝ рд░рд╣реЗ рд╣реИрдВред рдирд╛рд░рдХреЛрдЯрд┐рдХреНрд╕ рдХрдВрдЯреНрд░реЛрд▓ рдмреНрдпреВрд░реЛ рдПрдирд╕реАрдмреА рджрд┐рд▓реНрд▓реА рдХреА рдЯреАрдо рдиреЗ рдЧреБрд░реБрд╡рд╛рд░ рдХреЛ рдХреИрд▓рд╛рдирд╛ рд╕реЗ рджреЛ рд▓реЛрдЧреЛрдВ рдХреЛ рд╣рд┐рд░рд╛рд╕рдд рдореЗрдВ

Inference Demo:

Application is hosted on ЁЯдЧ space and can be accessed at SAR

Website Supported

ToDO

  • Add support for following website
  • Foramtting Hindi text for wordcloud