Machine learning pipeline to score DeFi wallets (0–1000) based on Aave V2 interactions. Includes feature engineering, Sybil detection, and behavioral analysis to support trust-driven lending and wallet segmentation.

HmbleCreator/scoreFi

DeFi Credit Scoring System for Aave V2 Protocol

Overview

This project implements a machine learning credit scoring system for DeFi wallets that interact with the Aave V2 protocol. It analyzes transaction patterns, behavioral indicators, and risk management practices to assign each wallet a credit score from 0 to 1000, where a higher score indicates more reliable and responsible usage.

Architecture

Core Components

  1. Data Processing Pipeline: Extracts and flattens nested JSON data from MongoDB exports
  2. Feature Engineering Engine: Creates 15+ behavioral and financial features
  3. ML Scoring Model: Random Forest regression
  4. Risk Assessment Framework: Multi-dimensional risk evaluation
  5. Analysis & Visualization: Comprehensive scoring analysis and insights

Processing Flow

Raw JSON Data → Data Extraction → Feature Engineering → Model Training → Score Calculation → Analysis & Insights
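
The extraction step at the start of this flow can be sketched as a small recursive flattener. This is a minimal illustration, not the project's actual code: the record layout and the field names (userWallet, action, actionData, assetSymbol) are hypothetical stand-ins for whatever the MongoDB export contains.

```python
import json

def flatten(record, parent="", sep="_"):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested objects and prefix their keys.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat

# Hypothetical export record; real field names may differ.
raw = json.loads("""
{
  "userWallet": "0xabc",
  "action": "deposit",
  "timestamp": 1629178166,
  "actionData": {"amount": "2000", "assetSymbol": "USDC"}
}
""")

row = flatten(raw)
print(row)
```

Each flattened row can then be collected into a list and loaded into a pandas DataFrame for feature engineering.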

Feature Categories

1. Transaction Metrics

  • Volume Indicators: Total volume, transaction counts
  • Action Diversity: Unique actions, action ratios

2. Temporal Behavior

  • Consistency: Activity duration
  • Frequency: Transactions per day

3. Portfolio Management

  • Diversification: Unique assets
  • Balance: Deposit/borrow/repay/liquidation ratios

4. Risk Management

  • Leverage Ratios: Borrow-to-deposit, utilization ratios
  • Repayment Behavior: Repay-to-borrow ratios
  • Liquidation Exposure: Liquidation frequency and ratios
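
As a rough sketch of how a few of these features could be derived, the function below aggregates a flat transaction table into one row per wallet. The column names (wallet, action, asset, amount_usd, timestamp), the action label "liquidationcall", and the choice to define each ratio as that action's share of the wallet's transactions are all assumptions made for illustration; the notebook may define them differently.

```python
import pandas as pd

def wallet_features(tx: pd.DataFrame) -> pd.DataFrame:
    """Aggregate one feature row per wallet (illustrative subset only)."""
    # Count each action type per wallet; add zero columns for absent actions.
    counts = pd.crosstab(tx["wallet"], tx["action"])
    for action in ("deposit", "borrow", "repay", "liquidationcall"):
        if action not in counts.columns:
            counts[action] = 0
    total = counts.sum(axis=1)

    # Activity duration in days, from Unix timestamps in seconds.
    span = tx.groupby("wallet")["timestamp"].agg(["min", "max"])
    active_days = (span["max"] - span["min"]) / 86_400

    feats = pd.DataFrame(index=counts.index)
    feats["tx_count"] = total
    feats["total_volume"] = tx.groupby("wallet")["amount_usd"].sum()
    feats["unique_actions"] = (counts > 0).sum(axis=1)
    feats["unique_assets"] = tx.groupby("wallet")["asset"].nunique()
    feats["active_days"] = active_days
    # Treat wallets active for under a day as one day (assumption).
    feats["tx_per_day"] = total / active_days.clip(lower=1)
    # Ratios as share of the wallet's transactions (assumption).
    feats["borrow_ratio"] = counts["borrow"] / total
    feats["repay_ratio"] = counts["repay"] / total
    feats["liquidation_ratio"] = counts["liquidationcall"] / total
    return feats

# Tiny hypothetical dataset.
tx = pd.DataFrame({
    "wallet": ["A", "A", "A", "A", "B", "B"],
    "action": ["deposit", "borrow", "repay", "repay", "deposit", "liquidationcall"],
    "asset": ["USDC", "DAI", "DAI", "DAI", "WETH", "WETH"],
    "amount_usd": [1000.0, 400.0, 200.0, 200.0, 500.0, 500.0],
    "timestamp": [0, 86_400, 172_800, 259_200, 0, 43_200],
})
features_df = wallet_features(tx)
print(features_df)
```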

Scoring Methodology

Base Model

  • Algorithm: Random Forest Regression
  • Target Variable: Negated composite risk score (higher is better):
    • risk_score = (liquidation_ratio * 2) + (1 - repay_ratio) + borrow_ratio (higher means riskier)
    • Model is trained to predict -risk_score (so higher model output = lower risk)
  • Features: 15+ engineered features, min-max scaled to [0, 1] (MinMaxScaler)
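
To make the target concrete, here is a small worked example of the risk formula and its negation. The function names are illustrative, and the inputs are assumed to be ratios in the range [0, 1].

```python
def risk_score(liquidation_ratio: float, repay_ratio: float, borrow_ratio: float) -> float:
    """Composite risk: liquidations count double, missing repayments and borrowing add risk."""
    return (liquidation_ratio * 2) + (1 - repay_ratio) + borrow_ratio

def training_target(liquidation_ratio: float, repay_ratio: float, borrow_ratio: float) -> float:
    """Negated risk, so that a higher target means a safer wallet."""
    return -risk_score(liquidation_ratio, repay_ratio, borrow_ratio)

# A wallet with no liquidations, no borrowing, and a repay ratio of 1.0 has zero risk.
safe = risk_score(0.0, 1.0, 0.0)
# A wallet with half its activity in liquidations and borrowing, and no repayments.
risky = risk_score(0.5, 0.0, 0.5)
print(safe, risky, training_target(0.5, 0.0, 0.5))
```

With ratios in [0, 1], risk_score ranges from 0 to 4, so the training target ranges from -4 to 0.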

Score Adjustments

  • Positive Factors:
    • High repay ratio, low borrow/liquidation ratios
  • Negative Factors:
    • High liquidation or borrow ratio, low repay ratio

Final Score Calculation

raw_prediction = model.predict(features)  # higher = lower risk
lo, hi = raw_prediction.min(), raw_prediction.max()
normalized_score = (raw_prediction - lo) / (hi - lo)  # min-max to [0, 1]
credit_score = normalized_score * 1000  # Scale to 0-1000
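
One edge case the formula above does not cover is a batch where every prediction is identical, which makes the denominator zero. The helper below is a hedged sketch of one way to handle it. The function name, the midpoint fallback, and the use of rounding rather than truncation are assumptions, not the project's behavior.

```python
import numpy as np

def to_credit_scores(raw, lo=0, hi=1000):
    """Min-max scale raw predictions to integer scores in [lo, hi]."""
    raw = np.asarray(raw, dtype=float)
    span = raw.max() - raw.min()
    if span == 0:
        # All predictions equal: fall back to the midpoint (assumption).
        return np.full(raw.shape, (lo + hi) // 2, dtype=int)
    scaled = (raw - raw.min()) / span
    # Round to the nearest integer instead of truncating.
    return np.rint(lo + scaled * (hi - lo)).astype(int)

scores = to_credit_scores([-2.5, -1.0, 0.0])
print(scores)
```

Because the scaling uses the minimum and maximum of the current batch, scores are relative to the wallets scored together and are not directly comparable across separate runs.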

Usage

Prerequisites

pip install pandas numpy scikit-learn matplotlib seaborn joblib

Running the Scorer

# Prepare features_df as shown in the notebook
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import MinMaxScaler

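# ... (feature engineering code)
# X: wallet-level feature matrix built from features_df
# y: training target, -risk_score (see Base Model)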

# Model training and scoring
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_scaled, y)
preds = model.predict(X_scaled)
scores = 1000 * (preds - preds.min()) / (preds.max() - preds.min())
features_df['credit_score'] = scores.astype(int)

Output Files

  • credit_score_model.pkl: Trained model
  • cleaned_wallet_transactions.csv: Cleaned transaction data
  • wallet_features_and_scores.csv: Wallet-level features and scores

Score Interpretation

Score Ranges

  • 900-1000: Excellent credit, highly reliable wallets
  • 700-899: Good credit, responsible usage patterns
  • 500-699: Fair credit, moderate risk indicators
  • 300-499: Poor credit, concerning behaviors
  • 0-299: Very poor credit, high-risk or bot-like behavior
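
The ranges above map directly to a lookup function. The sketch below uses the band labels from the list; the function name and the decision to reject out-of-range input are illustrative choices.

```python
def score_band(score: int) -> str:
    """Return the credit band for a score in the range 0-1000."""
    if not 0 <= score <= 1000:
        raise ValueError(f"score must be between 0 and 1000, got {score}")
    if score >= 900:
        return "Excellent"
    if score >= 700:
        return "Good"
    if score >= 500:
        return "Fair"
    if score >= 300:
        return "Poor"
    return "Very poor"

for s in (950, 720, 510, 300, 42):
    print(s, score_band(s))
```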

Key Indicators

High-Quality Wallets (800+)

  • Consistent, long-term activity
  • Strong repayment history
  • Balanced leverage usage
  • Human-like transaction patterns

Low-Quality Wallets (below 300)

  • Irregular or bot-like patterns
  • High liquidation exposure
  • Poor repayment behavior
  • Excessive leverage usage

Model Validation

Performance Metrics

  • Feature Importance: borrow_ratio, repay_ratio, liquidation_ratio are top drivers
  • Score Distribution: Well spread, with clear separation between risky and responsible wallets
  • Behavioral Consistency: Low-score wallets are riskier, high-score wallets are more responsible

Robustness Checks

  • Feature importance and correlation analysis
  • Behavioral analysis of low/high score wallets
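
A feature importance check along these lines can be sketched with scikit-learn's feature_importances_ attribute. The data below is synthetic and uses only three features, purely so the example runs on its own. It is not the project's dataset, and the resulting numbers say nothing about real wallets.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
names = ["liquidation_ratio", "repay_ratio", "borrow_ratio"]

# Synthetic ratios in [0, 1); stand-ins for real wallet features.
X = rng.random((300, 3))
risk = 2 * X[:, 0] + (1 - X[:, 1]) + X[:, 2]
y = -risk  # same negated-risk target as in the scoring methodology

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X, y)

# Impurity-based importances; they sum to 1 across features.
importance = dict(zip(names, model.feature_importances_))
for name, value in sorted(importance.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:.3f}")
```

Since the target is computed from these same three ratios, they are expected to dominate the importance ranking. This check confirms the model learned the target formula rather than providing independent evidence of predictive power.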

Extensibility

  • Add new features or risk factors easily
  • Support for additional DeFi protocols
  • Real-time scoring capabilities
  • Advanced ML models (e.g., XGBoost)

Technical Considerations

  • Optimized for 100K+ transaction datasets
  • Memory-efficient feature engineering
  • Robust error handling for missing data
  • Feature normalization and scaling

Future Enhancements

  1. Real-time Scoring: Live transaction monitoring
  2. Cross-protocol Analysis: Multi-DeFi platform scoring
  3. Predictive Modeling: Future risk prediction
  4. Market Integration: Price and volatility considerations

Contributing

Contributions are welcome for new features, models, and optimizations.

License

Open source under the MIT License.