added pipeline notebook
mukul54 committed Mar 15, 2021
1 parent 2bc1c73 commit acb34c3
Showing 60 changed files with 153,841 additions and 0 deletions.
131 changes: 131 additions & 0 deletions .gitignore
@@ -0,0 +1,131 @@
# Editors
.vscode/
.idea/

# Vagrant
.vagrant/

# Mac/OSX
.DS_Store

# Windows
Thumbs.db

# Source for the following rules: https://raw.githubusercontent.com/github/gitignore/master/Python.gitignore
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# .ipynb checkpoints
.ipynb_checkpoints
*/.ipynb_checkpoints/*
**/*.ipynb_checkpoints/
*/.ipynb_checkpoints/*.ipynb
27 changes: 27 additions & 0 deletions Data_Dictionary/data dictionary.txt
@@ -0,0 +1,27 @@
Input Files:
dev_article :
- Text_ID: unique article ID
- Text: article text
- Headline: headline of the article
- Mobile_Tech_Flag: flag indicating whether the article is related to mobile_tech
dev_tweet :
- Text_ID: unique tweet ID
- Text: tweet text
- Mobile_Tech_Flag: flag indicating whether the tweet is related to mobile_tech

Output:
sample_output_1:
- Text_ID: unique article/tweet ID
- Mobile_Tech_Flag_Actual: actual mobile_tech flag values
- Mobile_Tech_Flag_Predicted: predicted mobile_tech flag values
- Headline_Actual_Eng: actual headline of the article, in English
- Headline_Generated_Eng_Lang: generated headline of the article, in English

sample_output_2:
- Text_ID: unique tweet ID
- Mobile_Tech_Flag_Actual: actual mobile_tech flag values
- Mobile_Tech_Flag_Predicted: predicted mobile_tech flag values
- Brands_Entity_Actual: actual brands present in the data
- Sentiment_Actual: actual sentiment present in the data
- Brands_Entity_Identified: brands predicted for the data
- Sentiment_Identified: sentiment predicted for the data
27 changes: 27 additions & 0 deletions Data_Dictionary/data-dictionary.txt
@@ -0,0 +1,27 @@
Input Files:
dev_article :
- Text_ID: unique article ID
- Text: article text
- Headline: headline of the article
- Mobile_Tech_Flag: flag indicating whether the article is related to mobile_tech
dev_tweet :
- Text_ID: unique tweet ID
- Text: tweet text
- Mobile_Tech_Flag: flag indicating whether the tweet is related to mobile_tech

Output:
sample_output_1:
- Text_ID: unique article/tweet ID
- Mobile_Tech_Flag_Actual: actual mobile_tech flag values
- Mobile_Tech_Flag_Predicted: predicted mobile_tech flag values
- Headline_Actual_Eng: actual headline of the article, in English
- Headline_Generated_Eng_Lang: generated headline of the article, in English

sample_output_2:
- Text_ID: unique tweet ID
- Mobile_Tech_Flag_Actual: actual mobile_tech flag values
- Mobile_Tech_Flag_Predicted: predicted mobile_tech flag values
- Brands_Entity_Actual: actual brands present in the data
- Sentiment_Actual: actual sentiment present in the data
- Brands_Entity_Identified: brands predicted for the data
- Sentiment_Identified: sentiment predicted for the data
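
The two output layouts in the data dictionary can be checked mechanically before a submission file is written. The following is a minimal sketch, not part of this commit: the column names are copied from the data dictionary, while the constant names and the `check_columns` helper are hypothetical and introduced only for illustration.

```python
# Column names copied from the data dictionary; the constant names are assumptions.
SAMPLE_OUTPUT_1_COLUMNS = [
    "Text_ID",
    "Mobile_Tech_Flag_Actual",
    "Mobile_Tech_Flag_Predicted",
    "Headline_Actual_Eng",
    "Headline_Generated_Eng_Lang",
]
SAMPLE_OUTPUT_2_COLUMNS = [
    "Text_ID",
    "Mobile_Tech_Flag_Actual",
    "Mobile_Tech_Flag_Predicted",
    "Brands_Entity_Actual",
    "Sentiment_Actual",
    "Brands_Entity_Identified",
    "Sentiment_Identified",
]


def check_columns(header, expected):
    """Hypothetical helper: return (missing, unexpected) column names.

    Surrounding whitespace in the header is ignored; names are otherwise
    compared exactly, including case.
    """
    seen = [name.strip() for name in header]
    missing = [name for name in expected if name not in seen]
    unexpected = [name for name in seen if name not in expected]
    return missing, unexpected
```

A header row read from a candidate submission (for example with the standard `csv` module) can be passed as `header`; both returned lists being empty means the column names match the expected layout.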
Binary file added Development_Data/dev_data_article.xlsx
Binary file not shown.
Binary file added Development_Data/dev_data_tweet.xlsx
Binary file not shown.
183 changes: 183 additions & 0 deletions Mobile_Brands_Sheet1.csv
@@ -0,0 +1,183 @@
Brand
Condor
Walton
Gradiente
Multilaser
Positivo
BlackBerry Limited
DataWind
10.Or
Amoi
BBK
Coolpad
Cubot
Gfive
Haier
Hisense
Honor
Huawei
Konka
LeEco
Meitu
Meizu
Ningbo Bird
OnePlus
Oppo
Realme
iQOO
Smartisan
TCL Corporation
Technology Happy Life
Tecno Mobile
Vivo
Vsun
Wasam
Xiaomi
Zopo Mobile
ZTE
Subsidiary of Lenovo
Jablotron
Verzo
SICO Technology
Jolla
Nokia Corporation
HMD Global
Bittium
Archos
Groupe Bull
MobiWire
Wiko
Gigaset
Medion
TechniSat
Tiptel
MLS
X-tigi Mobile
Lenovo
CREO
Celkon
Iball
Intex Technologies
Karbonn Mobiles
Lava International
HCL Technologies
Jio
LYF
Micromax Informatics
Onida Electronics
Spice Digital
Videocon
Xolo
YU Televentures
mPhone
Nexian
MITO
Polytron
Advan
Axioo
IMO
Zyrex
Andromax
Evercoss
Luna
Genpro
Asiafone
Himax
SPC
Vitell
Venera
Osmo
HiCore
Maxtron
Brondi
New Generation Mobile
Olivetti
Onda Mobile Communication
Akai
Fujitsu
Casio
Hitachi
JRC
Kyocera
Mitsubishi Electric
NEC
Panasonic
Sansui
Sharp
Sony
Toshiba
Just5
M Dot
Ninetology
Kyoto Electronics
Lanix
Zonda
Fairphone
John's Phone
Philips
Koryolink
QMobile
Voice Mobile
Cherry Mobile
Starmobile
Cloudfone
MyPhone
Torque
Kruger&Matz
Manta Multimedia
myPhone
Allview
Evolio
E-Boda
Myria
Utok
Beeline
Explay
Gresso
Highscreen
Megafon
MTS
RoverPC
teXet
Sitronics
Yotaphone
KT Tech
LG
Pantech
Samsung
BQ
Doro
Acer
Asus
BenQ
DBTel
Dopod
Foxconn
Gigabyte Technology
HTC
AIS
DTAC
Ericsson
Wellcom
I-Mobile
EvertekTunisie
ASELSAN
Vestel
Thuraya
Bullitt Group
Wileyfox
Apple
BLU Products
Caterpillar
Firefly
Garmin
Google
HP
InFocus
InfoSonics
Motorola Mobility
Obi
Nextbit
"Purism,"
VinSmart
GTel
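
One plausible use of this brand list is a dictionary lookup for the Brands_Entity_Identified column. The sketch below is an assumption about usage, not the method used in this repository: `load_brands` and `find_brands` are hypothetical helpers, and the matching rule (case-insensitive, whole-token) is an illustrative choice. It also strips the stray trailing comma seen in the `"Purism,"` row.

```python
import csv
import io
import re


def load_brands(csv_text):
    """Read the single 'Brand' column, dropping blanks and stray trailing commas."""
    reader = csv.DictReader(io.StringIO(csv_text))
    brands = []
    for row in reader:
        name = (row.get("Brand") or "").strip().rstrip(",").strip()
        if name:
            brands.append(name)
    return brands


def find_brands(text, brands):
    """Return the brands mentioned in text, in brand-list order.

    Matching is case-insensitive and requires the brand not to be embedded
    in a longer word (so 'Apple' does not match 'Pineapple').
    """
    found = []
    for brand in brands:
        pattern = r"(?<!\w)" + re.escape(brand) + r"(?!\w)"
        if brand not in found and re.search(pattern, text, flags=re.IGNORECASE):
            found.append(brand)
    return found
```

Because the list contains entries that differ only in case (`MyPhone` and `myPhone`) and multi-word company names (`Motorola Mobility`, `Nokia Corporation`), a real pipeline would likely need extra normalisation or aliases on top of this lookup.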
6 changes: 6 additions & 0 deletions Readme
@@ -0,0 +1,6 @@
1. Please refer to the Data Dictionary folder for a detailed description of the features in all the shared datasets
2. The Development Data folder contains both datasets, Articles and Tweets. Please use this data to build your solution
3. The Sample Output folder shows the structure the submission file should follow
4. Sample Output 1 is for the Mobile_Tech Classification & Text Summarisation task, while Sample Output 2 is for the Entity Based Sentiment Analysis task
5. Non-adherence to the structure of the shared sample outputs will lead to disqualification
6. You can use the Headline Similarity Scores.ipynb notebook to calculate BERT, ROUGE and BLEU scores. This code will be used to evaluate the performance of the headline generator.
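
To give a feel for what an overlap-based headline score measures, here is a simplified ROUGE-1 style computation (clipped unigram overlap between the actual and generated headline). It is only a sketch under stated assumptions: the `rouge1` function is hypothetical, uses plain lowercase whitespace tokenisation, and is not the code in the evaluation notebook, whose scores may differ.

```python
from collections import Counter


def rouge1(reference, candidate):
    """Simplified ROUGE-1: unigram precision, recall and F1 with clipped counts."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps, per token, the smaller of the two counts.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}
```

For the reference "apple launches new iphone" and the candidate "apple unveils new iphone today", three unigrams overlap, giving precision 3/5 and recall 3/4.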