How to handle nutrition data #131422

ajama01 · 2024-07-08T14:29:46Z



Hello everyone, I'm currently working on data from an FFQ (Food Frequency Questionnaire). I tried running PCA on it, but the results are not really interpretable: the variables and the individuals overlap in the plots. The problem is that some of the food columns are mostly 0, and I don't know how to handle this kind of data. What do you suggest?

MostlyKIGuess · 2024-07-08T14:43:15Z


Log Transformation: log(x+1) handles the zeros and can sometimes stabilize the variance and make the data less skewed.
Normalization/Standardization: PCA is sensitive to the scale of the data, so it might be beneficial to standardize your data (e.g., using z-scores).
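
To see what these two steps do to a zero-heavy column, here is a minimal sketch on made-up intake values (the numbers are illustrative only, not taken from any FFQ):

```python
import numpy as np

# Hypothetical weekly intake for one food: mostly zeros plus one heavy consumer.
intake = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 50.0])

# log1p(x) = log(1 + x): zeros stay at zero, large values are compressed.
logged = np.log1p(intake)

# z-score: subtract the mean and divide by the standard deviation,
# so the column ends up with mean 0 and unit variance.
z = (logged - logged.mean()) / logged.std()

print(logged.round(2))
print(z.round(2))
```

The zeros remain tied at a single value after both steps, so a column that is almost all zeros still carries little information; the transformation mainly tames the few large values.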

import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import numpy as np
import matplotlib.pyplot as plt

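# Load data (assumes every column of ffq_data.csv is a numeric food intake;
# drop ID or text columns first, since np.log1p fails on non-numeric values)
data = pd.read_csv('ffq_data.csv')

# Log transformation: log(1 + x) keeps zeros at zero
data_log_transformed = np.log1p(data)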

# Standardize
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data_log_transformed)

# PCA: keep the first two principal components
pca = PCA(n_components=2)
principal_components = pca.fit_transform(data_scaled)

# Convert to DataFrame
pca_df = pd.DataFrame(data=principal_components, columns=['PC1', 'PC2'])

# Plot it
plt.scatter(pca_df['PC1'], pca_df['PC2'])
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.title('PCA of FFQ Data')
plt.show()
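
If the scatter plot is still hard to read, it may help to check how much variance the two components actually capture and which foods load on them. A minimal sketch, assuming synthetic zero-inflated data in place of ffq_data.csv (the food names and generated values are made up for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for an FFQ table: 200 respondents, 5 hypothetical foods.
rng = np.random.default_rng(0)
foods = ['bread', 'rice', 'fish', 'soda', 'nuts']
amounts = rng.gamma(shape=2.0, scale=3.0, size=(200, len(foods)))
# Zero out roughly 70% of the entries to mimic rarely eaten foods.
eaten = rng.random((200, len(foods))) < 0.3
data = pd.DataFrame(amounts * eaten, columns=foods)

# Same preprocessing as above: log(x + 1), then z-scores.
scaled = StandardScaler().fit_transform(np.log1p(data))

pca = PCA(n_components=2)
pca.fit(scaled)

# Share of the total variance captured by each component.
print(pca.explained_variance_ratio_)

# Loadings: one row per food, one column per component.
loadings = pd.DataFrame(pca.components_.T, index=foods, columns=['PC1', 'PC2'])
print(loadings.round(2))
```

If the two ratios sum to a small fraction, the 2-D plot shows only a small part of the variation, which would be one possible reason for the overlap.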


ajama01 Jul 10, 2024
Author

Thank you!

MostlyKIGuess Jul 10, 2024

Hope that helps! :3

LiteBrite82 · 2024-07-08T20:29:55Z


Thanks for posting in the GitHub Community, @ajama01!

We’ve moved your post to our Programming Help 🧑‍💻 category, which is more appropriate for this type of discussion.

Please review our guidelines about the Programming Help category for more information.
