Incomplete graphs from plot_hill_curves #519
Comments
Hi @AdimDrewnik, Thank you for reaching out to the Google Meridian support team. Also, thanks for bringing this to our attention. We have not seen this bug before and are unable to replicate it. Could you provide some more details:
Feel free to reach out if you have any further questions or suggestions regarding the same. Thank you, Google Meridian Support Team
Yes.
According to the Rhat, yes. I have even changed the number of knots and the number of variables, and increased the number of draws.
No. Default priors, same as priors in the demo colab notebook.
Yes, a lot of warnings (edited for brevity). For example:
But I have consulted many other Meridian users, and they are getting exactly the same warnings as I am.
I have also noticed that the x-axis scaling is not consistent. For some channels it is correct: for example, the maximum empirical value for a channel is 10000 and the x-axis is scaled to that value. For other channels it is not: one channel has a maximum empirical value of 1000 while the x-axis is scaled to 300000, and another has a maximum of 4000 while the x-axis goes to 35000. There seems to be no consistency in this scaling, which may indicate another bug.
Meridian's default is that 50% of the effect is assumed to occur at the median non-zero cost value. But the 50% points drawn by plot_hill_curves do not coincide with this, and the Hill curves do not reach full 100% saturation even at values many times higher than the maximum empirical cost for a given channel. Take this graph as an example: the maximum empirical value for this channel is about 2000, but the x-axis goes above 100000, and 50% saturation is reached at about 50000, which is more than 20 times the empirical maximum cost.
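For reference, here is a minimal sketch of a textbook Hill saturation curve, which may help when reading the plots discussed above. It assumes the common form x^slope / (x^slope + ec^slope); the function name `hill`, the parameter names, and the numbers are illustrative assumptions, not Meridian's actual implementation or fitted values. It shows two properties relevant to this report: the curve reaches exactly 50% at x = ec, and it only approaches 100% asymptotically, so it never reaches full saturation at any finite x.

```python
def hill(x: float, ec: float, slope: float = 1.0) -> float:
    """Textbook Hill curve: fraction of the maximum effect reached at x.

    ec is the half-saturation point and slope controls the steepness.
    This is an illustrative form, not Meridian's internal code.
    """
    if x < 0 or ec <= 0 or slope <= 0:
        raise ValueError("x must be >= 0; ec and slope must be > 0")
    if x == 0:
        return 0.0
    return x**slope / (x**slope + ec**slope)


# Hypothetical half-saturation point, chosen only for illustration.
ec = 2000.0

# The curve is at 50% when x equals ec, and keeps rising slowly after that.
for multiple in (0.5, 1, 2, 10, 50):
    x = multiple * ec
    print(f"x = {x:>9.0f}  saturation = {hill(x, ec):.3f}")
```

Under this form, a 50% point far to the right of the observed data would correspond to a large ec relative to the data range, so comparing the plotted 50% point with the channel's observed values is a reasonable first check.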
Hi @AdimDrewnik, Thank you for providing the additional details. To further investigate the issue with the incomplete graphs and inconsistent x-axis scaling, could you please answer the following questions?
from meridian.analysis import visualizer as viz
mfx = viz.MediaEffects(meridian)
hill_df = mfx.hill_curves_dataframe()
# Let's narrow it down to only one R&F channel (assuming the problem was in R&F chart)
df = hill_df[
    (hill_df.channel == 'Channel4') &  # Use one of your R&F channel names here instead of 'Channel4'
    (hill_df.channel_type == 'rf')
]
# Select a subset of the dataframe that is used to render _only_ the histogram facet.
# * The x-axis is mapped to the `start_interval_histogram` and `end_interval_histogram` column values
# * The y-axis is mapped to `scaled_count_histogram` column values
hist_df = df[['channel', 'distribution', 'channel_type', 'start_interval_histogram', 'end_interval_histogram', 'count_histogram', 'scaled_count_histogram']]
hist_df = hist_df[hist_df.scaled_count_histogram.notnull()]
hist_df
# Then, note the min and max values in the `start_interval_histogram` and `end_interval_histogram` columns: these form the x axis of the histogram chart facet.
hist_df.start_interval_histogram.describe()
hist_df.end_interval_histogram.describe()
# Now, let's take a look at the subset of the dataframe that is used to render _only_ the curve line, and let's use the same channel as above,
# for the **posterior** distribution only.
curve_df = df[df.distribution == 'posterior']
curve_df = curve_df[['channel', 'distribution', 'channel_type', 'media_units', 'mean']]
curve_df
# Then, note the min and max values in the `media_units` column: this is the x axis of the line chart facet.
curve_df.media_units.describe()
In a typical use case, such as our demo dataset, the scale of these two axes is similar. Sharing the min/max values for these columns will help us understand where the two get cut off and further diagnose the problem. Feel free to reach out if you have any further questions or suggestions regarding the same. Thank you, Google Meridian Support Team
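The comparison requested above can be condensed into one small helper. The sketch below is only an illustration: `axis_extent` and `compare_axes` are hypothetical names, not part of Meridian, and the sample numbers are made up to loosely resemble the ranges mentioned in this thread. It takes the values behind the histogram x-axis and the curve x-axis and reports how far apart their maxima are.

```python
import math


def axis_extent(values):
    """Return (min, max) of the finite values, skipping None and NaN."""
    finite = [
        float(v) for v in values
        if v is not None and not math.isnan(float(v))
    ]
    if not finite:
        raise ValueError("no finite values to summarize")
    return min(finite), max(finite)


def compare_axes(hist_edges, curve_units):
    """Summarize how the curve x-axis compares with the histogram x-axis."""
    _, hist_max = axis_extent(hist_edges)
    _, curve_max = axis_extent(curve_units)
    ratio = curve_max / hist_max if hist_max else math.inf
    return {"hist_max": hist_max, "curve_max": curve_max, "ratio": ratio}


# Made-up values for illustration only (not real model output).
report = compare_axes([0, 500, 2500], [0, 50000, 100000])
print(report)
```

With the dataframes built in the snippet above, the call would presumably look like `compare_axes(hist_df.end_interval_histogram, curve_df.media_units)`, since the helper accepts any iterable of numbers. A ratio far above 1 would mean the curve is drawn over a much wider range than the observed data.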
None of my channels are reach and frequency channels. I have run your code, and the results of hist_df.start_interval_histogram.describe() range from 0 to about 2500.
Below is an example of a problematic Hill curve. It seems to be cut off too early. Maybe there is a bug.
Parts of the graph are erased manually to anonymize the graph but the hill curve was not modified in any way.