Incomplete graphs from plot_hill_curves #519

Open
AdimDrewnik opened this issue Feb 16, 2025 · 5 comments

Comments

@AdimDrewnik

AdimDrewnik commented Feb 16, 2025

Below is an example of a problematic Hill curve. It seems to be cut off too early, which may be a bug.
Parts of the graph were erased manually to anonymize it, but the Hill curve itself was not modified in any way.

Image

@dattatreyam23
Collaborator

Hi @AdimDrewnik,

Thank you for reaching out to the Google Meridian support team.

Also, thanks for bringing this to our attention. We have not seen this bug before and are unable to replicate it. Could you provide some more details:

  1. Is the bug replicated upon rerunning the same model in a new colab?
  2. Does the model achieve convergence?
  3. Are you using any custom priors or parameter settings that differ from the default?
  4. Do you receive any warning or error messages when running the model?

Feel free to reach out if you have any further questions or suggestions regarding the same.

Thank you,

Google Meridian Support Team

@AdimDrewnik
Author

Is the bug replicated upon rerunning the same model in a new colab?

Yes.

Does the model achieve convergence?

According to the Rhat values, yes. I have even changed the number of knots and the number of variables and increased the number of draws, and the results persist.

Are you using any custom priors or parameter settings that differ from the default?

No. Default priors, same as priors in the demo colab notebook.

Do you receive any warning or error messages when running the model?

Yes, a lot of warnings (edited for brevity). For example:

  • ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
    dopamine-rl 4.1.2 requires tf-keras>=2.18.0, but you have tf-keras 2.16.0 which is incompatible.
    tensorflow-text 2.18.1 requires tensorflow<2.19,>=2.18.0, but you have tensorflow 2.16.2 which is incompatible.
  • UserWarning: Hierarchical distribution parameters must be deterministically zero for national models. xi_n has been automatically set to Deterministic(0).
  • UserWarning: trace group is not defined in the InferenceData scheme
  • FutureWarning: the convert_dtype parameter is deprecated and will be removed in a future version.

However, I have checked with many other Meridian users, and they get exactly the same warnings as I do.
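For reference, the two pip conflicts listed above come down to simple version-range checks. Below is a minimal sketch, assuming plain dotted numeric versions with no pre-release tags. `parse_version` and `satisfies` are hypothetical helpers written for illustration; they are not part of pip or Meridian.

```python
def parse_version(text):
    """Turn a plain dotted release string such as '2.16.2' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def satisfies(installed, minimum, below=None):
    """Return True if minimum <= installed, and installed < below when given."""
    version = parse_version(installed)
    if version < parse_version(minimum):
        return False
    if below is not None and version >= parse_version(below):
        return False
    return True


# tensorflow-text 2.18.1 requires tensorflow<2.19,>=2.18.0; installed is 2.16.2.
print(satisfies("2.16.2", minimum="2.18.0", below="2.19"))  # False
# dopamine-rl 4.1.2 requires tf-keras>=2.18.0; installed is 2.16.0.
print(satisfies("2.16.0", minimum="2.18.0"))  # False
```

Both checks fail, which matches the pip message. This says nothing about whether the conflicts affect the Hill curves.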

@AdimDrewnik
Author

AdimDrewnik commented Feb 20, 2025

I have also noticed that the x-axis scaling is not consistent. For some channels it is correct: for example, the maximum empirical value for a channel is 10000 and the x-axis is scaled to this value. But for another channel the maximum empirical value is 1000 and the x-axis is scaled to 300000, and for a third the maximum empirical value is 4000 but the x-axis goes to 35000. There seems to be no consistency in this scaling.

This may indicate another bug. The Meridian default is that 50% of the effect is assumed to occur at the median non-zero cost value. But the 50% point in plot_hill_curves does not coincide with this, and the Hill curves do not reach full (100%) saturation even at values many times higher than the maximum empirical cost for a given channel.

Take this graph, for example. The maximum empirical value for this channel is about 2000, but the x-axis goes above 100000, and 50% saturation is reached at about 50000, which is more than 20 times higher than the empirical maximum cost.

Image
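To put the numbers from this graph in context, here is a minimal sketch of a generic Hill saturation function. It assumes the common form 1 / (1 + (x / ec)^(-slope)), where ec is the half-saturation point. The function name `hill`, the slope of 1, and the idea that the plotted x-axis is in the same units as ec are all assumptions for illustration, not taken from Meridian's code.

```python
def hill(x, ec, slope=1.0):
    """Generic Hill saturation: fraction of the asymptotic effect reached at x."""
    if x < 0 or ec <= 0 or slope <= 0:
        raise ValueError("x must be >= 0, ec and slope must be > 0")
    if x == 0:
        return 0.0
    return 1.0 / (1.0 + (x / ec) ** (-slope))


# Approximate values read off the graph above; slope=1 is an assumption.
half_saturation = 50_000
max_empirical = 2_000

print(hill(half_saturation, ec=half_saturation))          # 0.5
print(round(hill(max_empirical, ec=half_saturation), 3))  # 0.038
```

Under these assumptions, the curve would reach only about 4% of its asymptote at the maximum observed value, which is one way to express how far the plotted half-saturation point lies from the data.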

@dattatreyam23
Collaborator

Hi @AdimDrewnik,

Thank you for providing the additional details. To further investigate the issue with the incomplete graphs and inconsistent x-axis scaling, could you please answer the following questions?

  • Do the problematic charts come from R&F channels?
  • Can you share the characteristics of the x-axes between the histogram facet and the curve lines? You can determine the min/max of these columns using the following code (e.g., in a Colab Notebook):
   from meridian.analysis import visualizer as viz

   # `meridian` is assumed to be your fitted Meridian model object.
   mfx = viz.MediaEffects(meridian)
   hill_df = mfx.hill_curves_dataframe()

   # Let's narrow it down to only one R&F channel (assuming the problem was in an R&F chart).
   df = hill_df[
       (hill_df.channel == 'Channel4')  # Use one of your R&F channel names here instead of 'Channel4'.
       & (hill_df.channel_type == 'rf')
   ]

   # Select the subset of the dataframe that is used to render _only_ the histogram facet.
   # * The x-axis is mapped to the `*_interval_histogram` column values.
   # * The y-axis is mapped to the `scaled_count_histogram` column values.
   hist_df = df[[
       'channel', 'distribution', 'channel_type',
       'start_interval_histogram', 'end_interval_histogram',
       'count_histogram', 'scaled_count_histogram',
   ]]
   hist_df = hist_df[hist_df.scaled_count_histogram.notnull()]
   hist_df

   # Then, note the min and max values in the `*_interval_histogram` columns: these form the x-axis of the histogram facet.
   hist_df.start_interval_histogram.describe()

   # Now, let's look at the subset of the dataframe that is used to render _only_ the curve line,
   # using the same channel as above, for the **posterior** distribution only.
   curve_df = df[df.distribution == 'posterior']
   curve_df = curve_df[['channel', 'distribution', 'channel_type', 'media_units', 'mean']]
   curve_df

   # Then, note the min and max values in the `media_units` column: this is the x-axis of the line chart facet.
   curve_df.media_units.describe()

In a typical use case, such as our demo dataset, the scale between these two axes is similar. Sharing the min/max values for these columns will help us understand where the two get cut off and further diagnose the problem.
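The comparison requested above can be condensed into two numbers. Below is a minimal sketch, assuming `hist_df` and `curve_df` were built as in the snippet above. `axis_extent` and `extent_ratio` are hypothetical helpers, and the sample values are made up for illustration.

```python
import math


def axis_extent(values):
    """Return (min, max) over the finite numbers in an iterable, skipping None, NaN and inf."""
    finite = [float(v) for v in values if v is not None and math.isfinite(float(v))]
    if not finite:
        raise ValueError("no finite values to summarize")
    return min(finite), max(finite)


def extent_ratio(hist_values, curve_values):
    """How many times further the curve's x-axis reaches than the histogram's."""
    _, hist_max = axis_extent(hist_values)
    _, curve_max = axis_extent(curve_values)
    if hist_max == 0:
        raise ValueError("histogram maximum is zero; ratio is undefined")
    return curve_max / hist_max


# With the data frames from the snippet above (not run here):
#   extent_ratio(hist_df.end_interval_histogram, curve_df.media_units)

# Made-up values for illustration only.
print(axis_extent([0.0, 100.0, float("nan"), 250.0]))           # (0.0, 250.0)
print(extent_ratio([0.0, 100.0, 250.0], [0.0, 500.0, 1000.0]))  # 4.0
```

A ratio close to 1 would mean the two facets cover a similar range; a much larger ratio would mean the curve's x-axis extends well beyond the observed data.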

Feel free to reach out if you have any further questions or suggestions regarding the same.

Thank you,

Google Meridian Support Team

@AdimDrewnik
Author

None of my channels are reach and frequency channels.

I have run your code. The results from hist_df.start_interval_histogram.describe() range from 0 to about 2500, and the results from curve_df.media_units.describe() range from 0 to about 200 000. So the results are in perfect agreement with what is seen on the last graph I posted in this thread.
