
Add analysis functions for M&E results #13

Open · ntellis wants to merge 8 commits into main
Conversation


@ntellis (Member) commented Aug 13, 2024

Adds a pair of functions for evaluating the results of M&E with respect to expected values and the M&E inputs.

Currently, identify_merged_dropped_orbits does not compare against the expected merged orbits; I'd like to see the tracksubs to get an idea of how that should be structured.

analyze_me_output collects fairly basic metrics at the moment: essentially, it tabulates the number of observations expected in a particular linkage, how many of them landed in which orbit after M&E, and the total observations in the resulting orbits. The implication is that any "extra" obs are bogus, if we treat the expected linkages as the truth state.
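
For concreteness, the tabulation described above produces records shaped like this (a sketch only; the keys are taken from the me_results dict in the diff below):

```python
# Shape of the per-linkage tabulation; one row per
# (expected linkage, resulting orbit) pair.
me_results = {
    "old_orbit_id": [],                 # id of the expected (input) linkage
    "merged_orbit_id": [],              # orbit id it landed in after M&E
    "expected_num_obs": [],             # observations expected in the linkage
    "number_observations_matched": [],  # how many of those M&E recovered
    "total_obs_resulting_linkage": [],  # all obs in the resulting orbit; any
                                        # excess over matched obs is "extra"
}
```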

Currently there's no notion of arc lengths; maybe that would be useful for comparing different runs?

members_expected is currently a FittedOrbitMembers, but it could just as easily be an Associations.

This wants tests, or incorporation into the existing tests (since the purpose is to analyze ipod results, after all).

ntellis changed the title from Nt/me analysis to Add analysis functions for M&E results · Aug 13, 2024
ntellis requested a review from moeyensj · August 13, 2024 16:49
ipod/utils.py (Outdated) · Comment on lines 483 to 512
```python
if len(matching_members.orbit_id.unique()) == 1:
    resulting_orbit_members_complete = members_me.where(
        pc.equal(members_me.column("orbit_id"), matching_members.orbit_id[0])
    )
    me_results["old_orbit_id"].append(orb_id)
    me_results["merged_orbit_id"].append(matching_members.orbit_id[0])
    me_results["expected_num_obs"].append(len(orbit_members))
    me_results["number_observations_matched"].append(len(matching_members))
    me_results["total_obs_resulting_linkage"].append(
        len(resulting_orbit_members_complete)
    )
elif len(matching_members.orbit_id.unique()) > 1:
    # case where an expected linkage was split between several orbits
    for new_id in matching_members.orbit_id.unique():
        resulting_orbit_members_complete = members_me.where(
            pc.equal(members_me.column("orbit_id"), new_id)
        )
        me_results["old_orbit_id"].append(orb_id)
        me_results["merged_orbit_id"].append(new_id)
        me_results["expected_num_obs"].append(len(orbit_members))
        me_results["number_observations_matched"].append(
            len(
                matching_members.where(
                    pc.equal(matching_members.column("orbit_id"), new_id)
                )
            )
        )
        me_results["total_obs_resulting_linkage"].append(
            len(resulting_orbit_members_complete)
        )
```
Contributor:

I think we don't need the conditional and can just use the logic for the case with more than one unique orbit id. We will also need to handle the case where zero orbit members match the expected orbit id.
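
A minimal sketch of that simplification, reusing the names and table interface from the diff above (illustrative only, not code from the PR):

```python
import pyarrow.compute as pc

def tabulate_expected_linkage(
    me_results, orb_id, orbit_members, matching_members, members_me
):
    """Record how one expected linkage (orb_id) maps onto M&E output orbits."""
    unique_ids = matching_members.orbit_id.unique()
    if len(unique_ids) == 0:
        # zero orbit members matched: record the linkage as unrecovered
        me_results["old_orbit_id"].append(orb_id)
        me_results["merged_orbit_id"].append(None)
        me_results["expected_num_obs"].append(len(orbit_members))
        me_results["number_observations_matched"].append(0)
        me_results["total_obs_resulting_linkage"].append(0)
        return
    for new_id in unique_ids:
        # a single loop covers both the one-orbit and split-linkage cases
        resulting = members_me.where(
            pc.equal(members_me.column("orbit_id"), new_id)
        )
        matched = matching_members.where(
            pc.equal(matching_members.column("orbit_id"), new_id)
        )
        me_results["old_orbit_id"].append(orb_id)
        me_results["merged_orbit_id"].append(new_id)
        me_results["expected_num_obs"].append(len(orbit_members))
        me_results["number_observations_matched"].append(len(matched))
        me_results["total_obs_resulting_linkage"].append(len(resulting))
```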

@ntellis (Member, Author) commented Aug 23, 2024

I had mistaken the format of members_expected; I hadn't realized Joachim had provided attributions within those data. I completely rewrote what I had accordingly. The output is now structured around primary designations, and around how well the resulting M&E output mirrors the expected members for each primary designation.

I adopted a sort of "best fit": for each designation I primarily count the orbit coming out of M&E that has the fewest missing observations, and I also note whether there were multiple unmerged orbits containing observations with that designation.

I think the output is now actually useful, and it allows us to rank how well a particular run has done. Attached is a sample CSV with the breakdown for one M&E run.
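
Roughly, the per-designation selection works like this (a sketch only; it assumes plain pandas DataFrames with obs_id, orbit_id, and primary_designation columns, as in the test fixture further down, rather than the actual table types):

```python
import pandas as pd

def best_fit_orbit(expected: pd.DataFrame, members_me: pd.DataFrame, designation: str):
    """For one primary designation, pick the M&E output orbit that best
    recovers its expected observations ("best fit" = fewest missing obs)."""
    expected_obs = set(
        expected.loc[expected["primary_designation"] == designation, "obs_id"]
    )
    # all M&E output orbits containing at least one expected observation
    candidates = members_me.loc[
        members_me["obs_id"].isin(expected_obs), "orbit_id"
    ].unique()
    if len(candidates) == 0:
        return None, len(expected_obs), 0  # designation is unattributed

    def n_missing(orbit_id):
        found = set(members_me.loc[members_me["orbit_id"] == orbit_id, "obs_id"])
        return len(expected_obs - found)

    best = min(candidates, key=n_missing)
    # len(candidates) > 1 flags a designation split across unmerged orbits
    return best, n_missing(best), len(candidates)
```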

sample_analysis.csv

Example stats derived from the attached CSV:

Num designations with a complete attribution: 88
Num designations with extra obs: 9
Num designations with missing obs: 7
Num designations with unmerged orbits: 40
Num designations unattributed: 2
Num designations with misattributed obs: 1
Total extra obs: 337
Total missing obs: 34.0
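
Counts like these can be derived from the per-designation rows with something like the following (the column names here are hypothetical stand-ins; the real schema is whatever sample_analysis.csv contains):

```python
import pandas as pd

df = pd.read_csv("sample_analysis.csv")
# NOTE: num_missing_obs, num_extra_obs, num_orbits, and attributed are
# hypothetical column names, used for illustration only.
complete = ((df["num_missing_obs"] == 0) & (df["num_extra_obs"] == 0)).sum()
print("Num designations with a complete attribution:", complete)
print("Num designations with extra obs:", (df["num_extra_obs"] > 0).sum())
print("Num designations with missing obs:", (df["num_missing_obs"] > 0).sum())
print("Num designations with unmerged orbits:", (df["num_orbits"] > 1).sum())
print("Num designations unattributed:", (~df["attributed"]).sum())
print("Total extra obs:", df["num_extra_obs"].sum())
print("Total missing obs:", df["num_missing_obs"].sum())
```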

ntellis requested a review from akoumjian · August 23, 2024 20:38
Comment on lines +9 to +18
```python
data_expected = {
    "orbit_id": ["orbit1", "orbit1", "orbit2", "orbit2"],
    "obs_id": ["obs1", "obs2", "obs3", "obs4"],
    "primary_designation": [
        "designation1",
        "designation1",
        "designation2",
        "designation2",
    ],
}
```
Member:

How are orbits with overlapping observations handled?

@akoumjian (Contributor) left a comment:

I would like to see a higher-level function which takes each of these analysis_dict results and collapses it to a row that counts the following:

- Given a short arc of a known object, do we extend it to all the observations in NSC?
- Given two non-overlapping sub-arcs, do we merge those?
- Do we merge two overlapping sub-arcs?

It's not clear to me if we have the full information here to determine the above.

Then each summary of the analyzed results becomes a row in a table that looks like

`| p1 | p2 | ... | short_arc_extended | non_overlap_subarcs_merged | overlap_subarcs_merged |`

where p_x are the parameter values and the other columns are completion percentages (e.g. 4/5 short-arc extensions were completed).

Then we can quickly see at a glance which parameter set(s) are the most optimized.
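
A sketch of the shape I mean (all names here are hypothetical; it assumes each per-object analysis result exposes the three outcomes above as booleans where applicable):

```python
from typing import Iterable, Optional

def summarize_run(params: dict, analyses: Iterable[dict]) -> dict:
    """Collapse per-object analysis dicts for one M&E run into a single row."""
    analyses = list(analyses)

    def frac(key: str) -> Optional[float]:
        # fraction of applicable cases where the outcome was achieved,
        # e.g. 4/5 short-arc extensions completed -> 0.8
        applicable = [a for a in analyses if key in a]
        if not applicable:
            return None
        return sum(bool(a[key]) for a in applicable) / len(applicable)

    row = dict(params)  # p1, p2, ... parameter values for this run
    for key in (
        "short_arc_extended",
        "non_overlap_subarcs_merged",
        "overlap_subarcs_merged",
    ):
        row[key] = frac(key)
    return row
```

Each run's row then slots directly into the comparison table sketched above.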
