This project implements a machine learning-based credit scoring system for DeFi wallets that interact with the Aave V2 protocol. It analyzes transaction patterns, behavioral indicators, and risk management practices to assign each wallet a credit score from 0 to 1000, where a higher score indicates more reliable and responsible usage.
- Data Processing Pipeline: Extracts and flattens nested JSON data from MongoDB exports
- Feature Engineering Engine: Creates 15+ behavioral and financial features
- ML Scoring Model: Random Forest regression
- Risk Assessment Framework: Multi-dimensional risk evaluation
- Analysis & Visualization: Comprehensive scoring analysis and insights
Raw JSON Data → Data Extraction → Feature Engineering → Model Training → Score Calculation → Analysis & Insights
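As an illustration of the data extraction step, the following is a minimal sketch of flattening one nested record using only the standard library. The record layout and field names (`userWallet`, `actionData`, `_id`) are assumptions made for the example, not the confirmed schema of the export, and the notebook may flatten the data differently.

```python
from typing import Any


def flatten(record: dict[str, Any], parent: str = "", sep: str = "_") -> dict[str, Any]:
    """Recursively flatten nested dicts into one level, joining keys with `sep`."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Descend into nested objects, carrying the key prefix along
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Hypothetical record shaped like a MongoDB export (field names are illustrative)
sample = {
    "userWallet": "0xabc",
    "action": "deposit",
    "actionData": {"amount": "2000000000", "assetSymbol": "USDC"},
    "_id": {"$oid": "hypothetical-id"},
}
flat_sample = flatten(sample)
```

Each flattened record can then become one row of the cleaned transaction table.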
- Volume Indicators: Total volume, transaction counts
- Action Diversity: Unique actions, action ratios
- Consistency: Activity duration
- Frequency: Transactions per day
- Diversification: Unique assets
- Balance: Deposit/borrow/repay/liquidation ratios
- Leverage Ratios: Borrow-to-deposit, utilization ratios
- Repayment Behavior: Repay-to-borrow ratios
- Liquidation Exposure: Liquidation frequency and ratios
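A few of the features listed above can be sketched as follows. This is a simplified, count-based version: the action names (`deposit`, `borrow`, `repay`, `liquidationcall`), the `asset` field, and the exact ratio definitions are assumptions for illustration, and the notebook may compute volume-based variants instead.

```python
from collections import Counter


def safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def wallet_features(transactions: list[dict]) -> dict[str, float]:
    """Compute count-based behavioral features for a single wallet."""
    counts = Counter(tx["action"] for tx in transactions)
    total = len(transactions)
    deposits = counts.get("deposit", 0)
    borrows = counts.get("borrow", 0)
    repays = counts.get("repay", 0)
    liquidations = counts.get("liquidationcall", 0)
    return {
        "tx_count": total,
        "unique_actions": len(counts),
        "unique_assets": len({tx["asset"] for tx in transactions}),
        "borrow_ratio": safe_div(borrows, total),
        "liquidation_ratio": safe_div(liquidations, total),
        "repay_to_borrow": safe_div(repays, borrows),
        "borrow_to_deposit": safe_div(borrows, deposits),
    }


# Hypothetical transaction history for one wallet
txs = [
    {"action": "deposit", "asset": "USDC"},
    {"action": "deposit", "asset": "WETH"},
    {"action": "borrow", "asset": "USDC"},
    {"action": "repay", "asset": "USDC"},
]
features = wallet_features(txs)
```

The `safe_div` helper keeps wallets with no borrows or deposits from producing division errors; treating those ratios as 0.0 is itself a modeling choice.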
- Algorithm: Random Forest Regression
- Target Variable: Composite risk score (higher means riskier):
risk_score = (liquidation_ratio * 2) + (1 - repay_ratio) + borrow_ratio
- The model is trained to predict -risk_score, so a higher model output means lower risk
- Features: 15+ engineered features, min-max scaled to [0, 1]
- Positive Factors:
- High repay ratio, low borrow/liquidation ratios
- Negative Factors:
- High liquidation or borrow ratio, low repay ratio
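The composite risk score and the negated training target can be written as two small functions. This sketch follows the formula given above; the function names and the sample inputs are hypothetical.

```python
def risk_score(liquidation_ratio: float, repay_ratio: float, borrow_ratio: float) -> float:
    """Composite risk: liquidations count double, unrepaid borrowing adds risk."""
    return (liquidation_ratio * 2) + (1 - repay_ratio) + borrow_ratio


def training_target(liquidation_ratio: float, repay_ratio: float, borrow_ratio: float) -> float:
    """Negated risk, so that a higher target means a more reliable wallet."""
    return -risk_score(liquidation_ratio, repay_ratio, borrow_ratio)


# Hypothetical wallets: one that repays fully, one that is often liquidated
careful = training_target(liquidation_ratio=0.0, repay_ratio=1.0, borrow_ratio=0.25)
risky = training_target(liquidation_ratio=0.5, repay_ratio=0.0, borrow_ratio=0.5)
```

Here `careful` is -0.25 and `risky` is -2.5, so the wallet with full repayment and no liquidations receives the higher target.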
raw_prediction = model.predict(features)
normalized_score = (raw_prediction - min) / (max - min)
credit_score = normalized_score * 1000 # Scale to 0-1000
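The scaling above divides by `max - min`, which is zero when every prediction is identical. A minimal sketch with a guard for that case is shown below; the function name, the midpoint fallback, and the use of rounding are assumptions for illustration.

```python
def to_credit_scores(raw: list[float], low: int = 0, high: int = 1000) -> list[int]:
    """Min-max scale raw predictions into integer scores in [low, high]."""
    if not raw:
        return []
    lo, hi = min(raw), max(raw)
    if hi == lo:
        # All predictions equal: no ranking information, so use the midpoint
        return [(low + high) // 2] * len(raw)
    span = hi - lo
    return [round(low + (value - lo) / span * (high - low)) for value in raw]


# Hypothetical raw model outputs (negated risk scores)
scores = to_credit_scores([-2.5, -1.0, -0.25])
```

Because the minimum and maximum come from the batch being scored, these scores are relative to that batch: the lowest prediction always maps to 0 and the highest to 1000.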
pip install pandas numpy scikit-learn matplotlib seaborn joblib
# Prepare features_df as shown in the notebook
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import MinMaxScaler
# ... (feature engineering code)
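# Model training and scoring
scaler = MinMaxScaler()  # rescales each feature to [0, 1]
X_scaled = scaler.fit_transform(X)
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_scaled, y)  # y is -risk_score, so higher predictions mean lower risk
# Note: predictions are made on the training data itself (in-sample)
preds = model.predict(X_scaled)
# Min-max scale predictions to 0-1000 (assumes preds are not all identical)
scores = 1000 * (preds - preds.min()) / (preds.max() - preds.min())
features_df['credit_score'] = scores.astype(int)  # astype(int) truncates the fractional part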
- credit_score_model.pkl: Trained model
- cleaned_wallet_transactions.csv: Cleaned transaction data
- wallet_features_and_scores.csv: Wallet-level features and scores
- 900-1000: Excellent credit, highly reliable wallets
- 700-899: Good credit, responsible usage patterns
- 500-699: Fair credit, moderate risk indicators
- 300-499: Poor credit, concerning behaviors
- 0-299: Very poor credit, high-risk or bot-like behavior
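The score ranges above can be turned into a simple lookup. This is a sketch using `bisect` from the standard library; the function name and the short band labels are hypothetical.

```python
from bisect import bisect_right

# Lower bounds of each band above the lowest one, taken from the ranges listed above
BAND_FLOORS = [300, 500, 700, 900]
BAND_LABELS = ["Very poor", "Poor", "Fair", "Good", "Excellent"]


def score_band(score: int) -> str:
    """Map a 0-1000 credit score to its band label."""
    if not 0 <= score <= 1000:
        raise ValueError(f"score must be between 0 and 1000, got {score}")
    # bisect_right counts how many floors the score has reached
    return BAND_LABELS[bisect_right(BAND_FLOORS, score)]


band = score_band(742)
```

For a score of 742, `band` is `"Good"`, since 742 falls in the 700-899 range.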
- Consistent, long-term activity
- Strong repayment history
- Balanced leverage usage
- Human-like transaction patterns
- Irregular or bot-like patterns
- High liquidation exposure
- Poor repayment behavior
- Excessive leverage usage
- Feature Importance: borrow_ratio, repay_ratio, liquidation_ratio are top drivers
- Score Distribution: Well spread, with clear separation between risky and responsible wallets
- Behavioral Consistency: Low-score wallets are riskier, high-score wallets are more responsible
- Feature importance and correlation analysis
- Behavioral analysis of low/high score wallets
- Add new features or risk factors easily
- Support for additional DeFi protocols
- Real-time scoring capabilities
- Advanced ML models (e.g., XGBoost)
- Optimized for 100K+ transaction datasets
- Memory-efficient feature engineering
- Robust error handling for missing data
- Feature normalization and scaling
- Real-time Scoring: Live transaction monitoring
- Cross-protocol Analysis: Multi-DeFi platform scoring
- Predictive Modeling: Future risk prediction
- Market Integration: Price and volatility considerations
Contributions are welcome for new features, models, and optimizations.
Open source under MIT License.