A/B Testing Message Variable Creation (March release) #1206

Open
depietrodeanna opened this issue Jan 30, 2025 · 26 comments
Labels
CCC Priority 2: Issues will be prioritized in the upcoming/next release

Comments

@depietrodeanna
Collaborator

We need a new message variable to capture the denominator: the number of recruits who were sent each A/B message iteration as part of the experiment. The new var will have 6 levels (one for each message option). Each recruit will have one of the six messages sent to them as part of this experiment.

The sites will need to either push the new var as part of the deidentified data they normally send upfront for recruits, or just record it on their side and send us these data in a batch at the end of the experiment.

Question for the Dev team: do all the sites have to send the new var the same way and at the same time (in real time or after the close of the experiment), or can each site decide when to send us these data based on what it prefers? We're unsure whether the timing has implications for data QC on our end. Pending input and decision from @anthonypetersen and @FrogGirl1123.

After the var is created, @hullingsag will add it to the DD and we will give the Concept ID to sites. This will require testing as part of the Feb release work. Question for Dev: in the letter experiment, we asked all participating sites to test sending the new var in Dev and Stage; do we need to do the same for the A/B work for this new message var?

@hullingsag
Collaborator

hullingsag commented Feb 3, 2025

I've created an entry in the data dictionary change log for the new A/B testing message variable. I have a few follow up questions to complete this entry:

  • What is the Workflow, Primary Source, and Secondary Source information for this variable?
    - Resolved on 2/3/25: Workflow = Recruitment, Primary source = Recruitment, Secondary source = Sign-in
  • What are the variable responses for the 6 message options?
  • Will this be a required variable, default variable, or contain PII?

@hullingsag
Collaborator

Adding questions about the A/B power calculations to this thread to discuss at tomorrow's meeting:

  1. What are the outcomes of interest?
    - Total number of responses
    - Conversion rate (e.g., the number of people who click vs. sign in vs. consent vs. verify)

  2. Will we want to stratify or test for differences between groups (e.g., by site location, race/ethnicity, sex, or age group)?
    - If so, we will need larger samples to achieve adequate power

  3. Information needed to calculate parameters:
    - What is the average response rate or conversion rate?
    - Over what period are we calculating it (e.g., the last week, month, or year)?
    - Do we know whether these outcomes differ across the potential comparison groups?
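
To make the parameter questions concrete, here is a rough sketch of a per-arm sample size calculation for comparing a conversion rate between two message arms. It uses the standard normal-approximation formula for a two-sided two-proportion test. The 10% and 12% rates are placeholders, not project numbers, and the Bonferroni adjustment for the 15 pairwise comparisons among 6 messages is only one possible way to handle multiple comparisons.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_control, p_variant, alpha=0.05, power=0.80):
    """Approximate recruits needed per message arm to detect a difference
    between two conversion rates (two-sided test, normal approximation)."""
    if p_control == p_variant:
        raise ValueError("the two rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    n = (z_alpha + z_beta) ** 2 * variance / (p_control - p_variant) ** 2
    return math.ceil(n)


# Placeholder rates (assumptions): 10% baseline vs. 12% for another message.
n_pairwise = sample_size_per_arm(0.10, 0.12)

# Six messages give 15 pairwise comparisons; a Bonferroni-adjusted alpha
# raises the per-arm requirement.
n_adjusted = sample_size_per_arm(0.10, 0.12, alpha=0.05 / 15)

print(n_pairwise, n_adjusted)
```

Stratifying by site or demographic group would apply this requirement within each stratum, which is why the answers to question 2 drive the total sample needed.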

@FrogGirl1123
Collaborator

It's helpful to see this issue after the conversation today. Since this confirms that we do need a new variable to capture the data, then from the analyst perspective, I would like it sent at the same time as the other de-identified demographic variables. @anthonypetersen does this work for you too? @depietrodeanna could you provide @hullingsag with the 6 message types for the response portion of the variable? They can be modified later if needed, but this way we can create the variable and assign concept IDs.

@anthonypetersen
Contributor

That works @FrogGirl1123

To confirm, the de-identified process is when they use the submitParticipantData API, correct?

@FrogGirl1123
Collaborator

Yes, @anthonypetersen

@anthonypetersen
Contributor

@mnataraj92 has a data dictionary change log been made to add variables for the denominator CID as well as the various responses?

@mnataraj92
Collaborator

There's a change log entry but it's not yet complete; once Deanna provides Autumn with the variable responses and Nicole approves the entry, I can work with Autumn to get it into the dictionary.

@FrogGirl1123
Collaborator

FrogGirl1123 commented Feb 6, 2025

Hi @anthonypetersen, we're going to need to make this a State variable if we want the sites to push it with the other de-identified data sent through the submitParticipantData API when a recruit is made active. If we don't care about the timeliness of getting this information, then it can be sent through the update participant data API. I prefer the first option, but I can be flexible if it's a problem.

@depietrodeanna
Collaborator Author

There was a consensus at the last A/B WG meeting around abandoning the creation of this variable in favor of using the token to randomize recruits into 6 even-ish groups for this experiment. Renelle is sending out a finalization email to the group to codify that decision. If we proceed with this approach, we no longer need this new variable and can close this issue. I will update this group ASAP.

@FrogGirl1123
Collaborator

FrogGirl1123 commented Feb 6, 2025

@depietrodeanna, I saw the email, but I'm confused as to how this gets us the denominator. Creation of a token doesn't mean the invite was sent. Also, when we made that decision on this week's call, it was without reference to the previous meeting, because no one seemed to remember what the previous decision was or why.

@depietrodeanna
Collaborator Author

@FrogGirl1123 is there a way for us to ingest all the tokens for recruits that have actually received an invitation? That would get us the denominator, I think. @brotzmanmj adding you here given our discussion this morning.

@FrogGirl1123
Collaborator

FrogGirl1123 commented Feb 6, 2025

@depietrodeanna if we are generating the tokens when the study IDs are sent, then we can use the switch of the flag from not active to active to determine which "special" tokens received invites, and use the last two digits to determine which communication they received. It's not as straightforward, but we could do it. If the sites are generating the tokens, it gets a bit more convoluted.

@brotzmanmj
Collaborator

hi, we would generate the tokens as usual; sites would not generate tokens. we would need to know the start date and end date of the experiment at each site when they go live with the a/b testing. during that time period, for all tokens that become active recruits (known to us by the date the de-identified data were sent), sites would assign a message group based on the last digit of the token. HP would define the randomization plan, and all sites would have to follow it. does that work?
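
For illustration only, the token scheme described above could be sketched as follows. The mapping rule (trailing hexadecimal characters, modulo 6) is an assumption, since the actual randomization plan would be defined by HP. The token format, the `(token, sent_date)` record shape, and the example dates are also hypothetical.

```python
from collections import Counter
from datetime import date

N_GROUPS = 6  # one group per message option


def message_group(token, n_chars=1):
    """Map a token to a message group 0-5 from its trailing characters.
    Assumes the trailing characters are hexadecimal (hypothetical rule)."""
    return int(token[-n_chars:], 16) % N_GROUPS


def denominators(recruits, start, end, n_chars=1):
    """Count active recruits per message group whose de-identified data
    were sent within the experiment window [start, end]."""
    counts = Counter({group: 0 for group in range(N_GROUPS)})
    for token, sent_date in recruits:
        if start <= sent_date <= end:
            counts[message_group(token, n_chars)] += 1
    return dict(counts)


# Made-up tokens and dates, not real data.
recruits = [
    ("9f1c0a3", date(2025, 3, 3)),
    ("77b20f", date(2025, 3, 10)),
    ("c4d5e17", date(2025, 4, 20)),  # outside the window, so not counted
]
counts = denominators(recruits, date(2025, 3, 1), date(2025, 3, 31))
print(counts)
```

Under the hexadecimal assumption, one trailing character yields 16 values, so groups 0 to 3 would each receive 3 of 16 values and groups 4 and 5 only 2 of 16. Using two characters (`n_chars=2`) narrows that to 43 versus 42 of 256, which is closer to the "even-ish" split discussed earlier. Note that this only yields a correct denominator if every active recruit in the window is part of the experiment.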

@FrogGirl1123
Collaborator

Thanks, Michelle! Yes, that works.

@depietrodeanna
Collaborator Author

Thanks, Michelle. Just to note, as we may need to come to a firm decision on this: HP suggested on the call Tuesday that they did not intend to include every active recruit during the timeframe of this experiment in the actual experiment, just a subset. Would that present any challenges (i.e., would we be able to distinguish which active recruit tokens from HP were involved in the experiment or not)?

@brotzmanmj
Collaborator

to my mind that complicates things. we need a clear way to know who was included or not. if we cannot make a straightforward accurate assumption about this for analysis, I suggest we create the variable

@depietrodeanna
Collaborator Author

Makes sense to me. I'll circle back with HP. To use the token scheme, it sounds like all participating sites need to agree to include all active recruits during the timeframe of this experiment; otherwise we need to create the variable.

@depietrodeanna
Collaborator Author

depietrodeanna commented Feb 7, 2025

Decision: we are going to pursue the original plan of creating a new variable. @hullingsag is there a length or format for the 6 response options I should follow? Can they be something like "altruism personal," "altruism general," "cancer connection personal," "cancer connection general," "research personal," and "research general," or similar?

@sonyekere added the "CCC Priority 2" label (Issues will be prioritized in the upcoming/next release) and removed the "CCC Priority 1" label (Issues to be addressed in the current release) on Feb 10, 2025

@sonyekere changed the title from "A/B Testing Message Variable Creation (Feb release)" to "A/B Testing Message Variable Creation (March release)" on Feb 10, 2025

@hullingsag
Collaborator

hullingsag commented Feb 10, 2025

This variable has been added to the data dictionary change log and is awaiting approval.

The responses are coded as:
0 = altruism personal
1 = altruism general
2 = cancer connection personal
3 = cancer connection general
4 = research personal
5 = research general

We are still waiting on the following information:
Is the variable required? (yes/no)
Does it contain PII? (yes/no)
Is it a default variable? (yes/no)

@FrogGirl1123 or @mnataraj92 - could you help with these questions?

@brotzmanmj
Collaborator

Additional requirements from the leads meeting yesterday: send via the submitParticipantData API. It doesn't need to be in real time, and we don't need to set a date based on it. Eventually we would like sites to send it in real time as part of the de-identified data push.

@anthonypetersen
Contributor

just a reminder that we ask / encourage sites to only use the submitParticipantData API once per participant

@brotzmanmj
Collaborator

thanks Tony, so if they are not able to send in real time with the de-id data push, they should send using the updateParticipantData API?

@anthonypetersen
Contributor

I think that's why it was suggested we allow them to send it with either API

@brotzmanmj
Collaborator

perfect, thanks

@depietrodeanna
Collaborator Author

From our perspective, the answers to all three of @hullingsag's questions (is the variable required, does it contain PII, is it a default variable) are no.

@mnataraj92
Collaborator

mnataraj92 commented Feb 28, 2025

CID for A/B testing message responses variable: state_d_956485028

Response CIDs:
562663942 | 0 = Altruism Personal
686986259 | 1 = Altruism General
477331464 | 2 = Cancer Connection Personal
935486262 | 3 = Cancer Connection General
518814501 | 4 = Research Personal
307763550 | 5 = Research General
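
As a convenience for anyone consuming these values, the concept IDs above can be captured in a small lookup. This is only a sketch: the flat record shape (a dict keyed by the variable CID string) and the function names are hypothetical, and the way the value actually arrives in participant data may differ.

```python
from collections import Counter

# Variable CID and response CIDs as listed above.
AB_MESSAGE_CID = "state_d_956485028"

RESPONSE_CIDS = {
    562663942: (0, "Altruism Personal"),
    686986259: (1, "Altruism General"),
    477331464: (2, "Cancer Connection Personal"),
    935486262: (3, "Cancer Connection General"),
    518814501: (4, "Research Personal"),
    307763550: (5, "Research General"),
}


def decode_response(response_cid):
    """Return (code, label) for a response CID, or raise ValueError."""
    try:
        return RESPONSE_CIDS[int(response_cid)]
    except (KeyError, ValueError, TypeError):
        raise ValueError(
            f"unrecognized A/B message response CID: {response_cid!r}"
        ) from None


def tally_messages(records):
    """Count recruits per message label; records without the variable
    are skipped. The record shape here is an assumption."""
    counts = Counter()
    for record in records:
        if AB_MESSAGE_CID in record:
            _, label = decode_response(record[AB_MESSAGE_CID])
            counts[label] += 1
    return dict(counts)


# Made-up records for illustration.
sample = [
    {AB_MESSAGE_CID: 562663942},
    {AB_MESSAGE_CID: "307763550"},
    {AB_MESSAGE_CID: 562663942},
    {},  # recruit without the A/B variable
]
result = tally_messages(sample)
print(result)
```

Rejecting unknown CIDs up front is one way to catch site-side coding errors during the Dev and Stage testing mentioned earlier in the thread.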
