pari1jay/README.md

Hi, I'm Pari! 👋


Data Analyst

🔗 Portfolio | 🔗 LinkedIn


About Me

I'm a Data Analyst with experience in data engineering, system integration, and cloud-based solutions. I have a Master of Science degree in Applied Data Science from Indiana University, and I am passionate about data analytics, AI and machine learning. I'm actively seeking opportunities to work on impactful projects as a Data Analyst or Data Engineer.


Technical Skills 💻

Data Analysis & Engineering

  • Languages: Python, R, SQL, Java, C
  • Databases: PostgreSQL, MySQL, MongoDB, Snowflake, MS Access
  • Data Visualization: Tableau, Power BI, Plotly, Excel

Project Management & Collaboration

  • Tools: Jira, Confluence, Lucidchart, MS Project, HP ALM
  • Methodologies: Agile, Scrum, Waterfall

Certifications

  • Career Essentials in Data Analysis by Microsoft
  • Microsoft Azure Data Fundamentals
  • Data Analytics with Microsoft Fabric
  • HackerRank SQL, R (Intermediate)
  • Atlassian Agile Project Management Professional

Projects 🚀

1. Efficacy Prediction Model

  • Check efficacy using a pre-processed dataset (CA, CM, CI classes) from Moleculenet.ai.

  • Merge data: Link NSC identifiers across files to combine screening results, EC50/IC50 values, and structures.

  • Filter compounds: Focus on the CA/CM classes to find active candidates.

  • Calculate Selectivity Index (SI): SI = IC50/EC50, used to identify compounds with high efficacy and low toxicity.

  • Data preprocessing:

    • Manage duplicate entries.
    • Resolve mismatched screening conclusions.
    • Interpret the flag signs attached to values.
    • Handle missing data.
  • ML model: Split the data randomly (80% train, 20% test).

  • Extracted molecular descriptors (e.g., logP, Morgan fingerprints, MORSE) from the data.

  • Trained base models and checked them against the test data.

  • Evaluated models using accuracy, F1-score, and Cohen’s kappa, aligning predictive insights with clinical research objectives.
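
As a rough illustration of the merge, filter, SI, and random-split steps listed for this project, here is a minimal Python sketch using only the standard library. It is not the project's actual code: the toy records, the field names (`ec50`, `ic50`, `label`), and the helper functions `merge_by_nsc`, `selectivity_index`, and `train_test_split` are all assumptions made for the example.

```python
import random

# Hypothetical toy records keyed by NSC, the identifier used to link files.
# All values below are invented for illustration only.
screening = {101: "CA", 102: "CM", 103: "CI", 104: "CA"}
potency = {
    101: {"ec50": 0.5, "ic50": 50.0},
    102: {"ec50": 2.0, "ic50": 20.0},
    103: {"ec50": 10.0, "ic50": 12.0},
    # NSC 104 has no potency record, so the inner join drops it.
}


def merge_by_nsc(screening, potency):
    """Inner-join the screening labels and potency values on NSC."""
    return [
        {"nsc": nsc, "label": label, **potency[nsc]}
        for nsc, label in sorted(screening.items())
        if nsc in potency
    ]


def selectivity_index(ic50, ec50):
    """SI = IC50 / EC50, the ratio given in the project description."""
    if ec50 <= 0:
        raise ValueError("EC50 must be positive")
    return ic50 / ec50


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and return (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


merged = merge_by_nsc(screening, potency)

# Keep only the CA/CM classes, then attach the selectivity index.
active = [row for row in merged if row["label"] in {"CA", "CM"}]
for row in active:
    row["si"] = selectivity_index(row["ic50"], row["ec50"])

# Highest SI first: strong efficacy relative to toxicity.
ranked = sorted(active, key=lambda row: row["si"], reverse=True)
```

With this toy data, `ranked` holds NSC 101 (SI 100.0) followed by NSC 102 (SI 10.0). A fixed `seed` keeps the 80/20 split reproducible between runs.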

2. Multi-Class Genre Classification using R Link

  • Automatic genre classification has long captivated researchers in Music Information Retrieval (MIR), who seek techniques to unravel musical diversity.
  • This project performs audio feature extraction and music genre classification using Spotify's rich array of audio features and a diverse dataset.
  • A few other projects exploring concepts in R: Link

3. Consumer Complaints Prediction

  • Tools: Python, NLP, Data Visualization
  • Description: Applied NLP techniques to analyze customer feedback and classify sentiment as positive, negative, or neutral. Achieved 79% accuracy using machine learning models (Naive Bayes, Decision Tree, KNN).
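
To give a feel for one of the models named above, here is a minimal sketch of a multinomial Naive Bayes text classifier with add-one smoothing, written in plain Python. It is an illustration under stated assumptions, not the project's code: the training sentences are invented, and `tokenize`, `train`, and `predict` are hypothetical helper names.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train(samples):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior for the text."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy training data, for illustration only.
samples = [
    ("great service and fast refund", "positive"),
    ("very helpful support team", "positive"),
    ("terrible delay and rude staff", "negative"),
    ("refund denied awful experience", "negative"),
    ("account statement received", "neutral"),
]
model = train(samples)
```

On this toy data, `predict(model, "rude staff and terrible delay")` returns `"negative"` and `predict(model, "great helpful support")` returns `"positive"`. A real pipeline would add proper text cleaning, a held-out test set, and a comparison against the other models.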

4. Real Estate Sales Prediction Web Application

  • Tools: Python, Machine Learning, Streamlit
  • Description: Developed a web app to predict real estate sales using Linear Regression, Random Forest, and Gradient Boosting. Enabled city-specific and overall sales predictions with user input.

5. ETL and Data Pipelines with Shell, Airflow and Kafka

  • Tools: Shell, Airflow and Kafka
  • Description: Designed and implemented ETL pipelines to integrate data from multiple sources into a centralized data warehouse, improving data quality by 25%.
  • Coursera: Link

Experience 💼

Data Engineer/Data Analyst | Netcube Technologies | Bangalore, India | Jan 2019 – Feb 2022

  • Tools: SQL, GCP, Apache Airflow, GitHub, RESTful APIs, Flask, ETL/ELT, NoSQL, Data warehouses

Associate Software Engineer | Tech Mahindra | Bangalore, India | Aug 2016 – Oct 2018

  • Tools: Oracle DB, HP ALM, Python, Automation testing scripts, Data warehouses

Education 🎓

  • Master of Science in Applied Data Science | Indiana University Indianapolis | Jan 2023 – May 2024

    • Coursework: Data Analytics using Python and R, Data Visualization, Deep Learning, Cloud Computing, DBMS, Statistics
    • Dean’s Scholarship Recipient
  • Bachelor of Engineering | Mangalore Institute of Technology and Engineering, VTU, India


Let's Connect!

I'm open to collaborating on interesting projects or discussing new opportunities. Feel free to reach out!


Pinned

  1. Sales-Prediction-using-ML Public

    This project develops a sales prediction web app using the Texas housing dataset ('txhousing'). The goal is to provide insights into real estate sales trends using this dataset. I have used …

    Jupyter Notebook 1 1

  2. Crop-row-detection Public

    Developed a deep learning model in Python to detect crop rows from input images, utilizing U-Net architecture with TensorFlow for image segmentation. Evaluated model performance using the Intersect…

    Jupyter Notebook 1

  3. Customer-sentiment-Analysis Public

    This project focuses on analyzing customer sentiment based on textual data, such as product reviews, feedback, or social media posts. The goal is to classify customer feedback into different sentim…

    Jupyter Notebook 1

  4. Spotify-classification-R Public

    Exploring Audio Features and Genre Classification for Spotify data

    1

  5. Drug-Efficacy-Prediction-Model- Public

    Jupyter Notebook

  6. Midwest-dataset-project-using-R Public

    In this project, I aim to conduct a comprehensive analysis of demographic and socioeconomic data for counties in the Midwest region of the United States. The dataset provides information on variou…

    R