Hi Team,
Thanks for addressing the issue of density estimation for multidimensional data.
I have a few questions as I am trying to implement information theory metrics:
Q1. Is this method suitable for high-dimensional tabular data?
Q2. I have been running RBIG's mutual_info() over tabular data, and the results are exactly the same for every target attribute. I cross-checked with scikit-learn's MI score and got varying results there (results not normalized in either case, scikit-learn or RBIG). I don't understand where the error is; can you help me with this?
Below is the piece of code I used, where:
- X: the feature attributes (attributes not in Y)
- Y: the set of target attributes (attributes not in X), say y1, y2, y3, y4
```python
import numpy as np
import pandas as pd
# MutualInfoRBIG comes from the rbig package (import path may vary by version)
from rbig import MutualInfoRBIG

def calculate_miscore_xa(data, X, Y):
    """Estimate I(X; y) with RBIG for each target attribute y in Y."""
    mis_xy = []
    y_attributes = []
    for y in Y:
        # fit a fresh RBIG model per target attribute
        rbig_model = MutualInfoRBIG(max_layers=10000)
        rbig_model.fit(data[X], data[[y]])
        # unit conversion (multiply by ln 2)
        mi_rbig = rbig_model.mutual_info() * np.log(2)
        mis_xy.append(mi_rbig)
        y_attributes.append(y)
    return pd.DataFrame({'Y': y_attributes, 'I(Xi,Y)': mis_xy})
```
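For context, a hypothetical call on synthetic data (the column names below are illustrative only, not my real attributes):

```python
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(500, 6)),
                  columns=['x1', 'x2', 'y1', 'y2', 'y3', 'y4'])
print(calculate_miscore_xa(df, ['x1', 'x2'], ['y1', 'y2', 'y3', 'y4']))
```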
Basically, the results I am getting are I(X,y1) = I(X,y2) = I(X,y3) = I(X,y4), all exactly the same value.
That seemed unusual, so I cross-checked with https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mutual_info_score.html, and there the results for I(X,y1), I(X,y2), I(X,y3), I(X,y4) all differ.
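The cross-check was along these lines (a rough sketch: mutual_info_score expects two 1D discrete label arrays, so continuous columns have to be binned first, and the binning below is just illustrative):

```python
from sklearn.metrics import mutual_info_score

def sklearn_mi_check(data, x_col, Y, bins=10):
    """Rough pairwise MI via mutual_info_score on binned columns."""
    x_binned = pd.cut(data[x_col], bins=bins, labels=False)
    return {y: mutual_info_score(x_binned, pd.cut(data[y], bins=bins, labels=False))
            for y in Y}
```

(An alternative that accepts a multivariate X directly is sklearn.feature_selection.mutual_info_regression.)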
Can you help me understand if there is anything I am doing wrong?
Also, can the original entropy-based calculation implemented in the information theory notebook be used as a basis for tabular data, by substituting the respective X and Y in 2D format?
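By the entropy-based calculation I mean the standard identity I(X;Y) = H(X) + H(Y) - H(X,Y); for a pair of 1D variables, a rough histogram plug-in version (my own sketch, not the notebook's code) would be:

```python
def mi_via_entropies(x, y, bins=20):
    """Plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y) from a 2D histogram (in nats)."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()   # joint pmf over the bins
    px = pxy.sum(axis=1)    # marginal pmf of x
    py = pxy.sum(axis=0)    # marginal pmf of y
    H = lambda p: -np.sum(p[p > 0] * np.log(p[p > 0]))  # Shannon entropy in nats
    return H(px) + H(py) - H(pxy.ravel())
```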
Thanks and Regards
Surbhi