-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add graph base matching #25
base: main
Are you sure you want to change the base?
Conversation
Thanks for the PR @sgibb . Just some quick response to the discussions:
I would keep the functions separate, |
Dear @sgibb thank you for inviting for the review (and sorry for late reply). Indeed I also think that It is the first time to review code, so I do not know how to use the GitHub-built in tools (I will not use them for the start). I have two points for the beginning:
In my opinion
Thus the "true" value is 300.01+-0.006 and 300.02+-0.006 with an overlap between 300.014-300.016. Thus, there is a joint peak reported.
I get the following error:
On the contrary
works fine Any ideas on that? |
@tnaake thanks for looking into this
I understand your point. I think this isn't a real problem. We have to document how @jorainer and @lgatto what is your opinion here? BTW: I use
|
@jorainer : regarding the common interface. I would like to use two matrices as input and provide a common interface for all joining functions. You use matrices in And I guess in |
@tnaake now I fixed the Now I got: x <- matrix(c(c(100.001, 100.0015, 100.002, 300.01, 300.02),
c(1, 3, 3, 1, 1)), ncol = 2, nrow = 5, byrow = FALSE)
colnames(x) <- c("mz", "intensity")
y <- matrix(c(c(100.0, 200.0, 300.002, 300.025, 300.0255),
c(1, 1, 1, 1, 1)), ncol = 2, nrow = 5, byrow = FALSE)
colnames(y) <- c("mz", "intensity")
joinGraph(x, y, ppm = 20)
# $x
# [1] 1 2 3 NA NA 4 5 NA
#
# $y
# [1] NA 1 NA 2 3 NA 4 5
#
mindiff <- function(x, y)-sum(abs(x[, 2L] - y[, 2L]), na.rm = TRUE)
joinGraph(x, y, ppm = 20, FUN = mindiff)
# $x
# [1] 1 2 3 NA NA 4 5 NA
#
# $y
# [1] 1 NA NA 2 3 NA 4 5
#
.joinGraph(x, y, ppm = 20)
# $x
# mz intensity
# [1,] 100.0010 1
# [2,] 100.0015 3
# [3,] 100.0020 3
# [4,] NA NA
# [5,] NA NA
# [6,] 300.0100 1
# [7,] 300.0200 1
# [8,] NA NA
#
# $y
# mz intensity
# [1,] NA NA
# [2,] 100.0000 1
# [3,] NA NA
# [4,] 200.0000 1
# [5,] 300.0020 1
# [6,] NA NA
# [7,] 300.0250 1
# [8,] 300.0255 1
#
.joinGraph(x, y, ppm = 20, FUN = mindiff)
# $x
# mz intensity
# [1,] 100.0010 1
# [2,] 100.0015 3
# [3,] 100.0020 3
# [4,] NA NA
# [5,] NA NA
# [6,] 300.0100 1
# [7,] 300.0200 1
# [8,] NA NA
#
# $y
# mz intensity
# [1,] 100.0000 1
# [2,] NA NA
# [3,] NA NA
# [4,] 200.0000 1
# [5,] 300.0020 1
# [6,] NA NA
# [7,] 300.0250 1
# [8,] 300.0255 1
# |
Codecov Report
@@ Coverage Diff @@
## master #25 +/- ##
==========================================
- Coverage 100% 99.74% -0.26%
==========================================
Files 20 22 +2
Lines 306 392 +86
==========================================
+ Hits 306 391 +85
- Misses 0 1 +1
Continue to review full report at Codecov.
|
@sgibb , sorry for the late reply. Regarding the use of matrices, if we have to use matrices then go ahead and use them. But you're right, then we have to use matrices also for all other joining functions. I just wanted to keep them as generic as possible to allow also their use if you have only |
Hi @sgibb, many thanks for looking into this. Here are some comments:
It is only a small check, but I am wondering if there can be a upstream function for checking the validity of spectra matrices, i.e. checking them on a list of matrices instead of matrices separately. Running
For 1000 spectra, this will result in 8 s or 0.008 s checks, respectively. It is not a lot of time I would say, but probably can be improved.
I receive the following output:
and for
Why one entry is duplicated, i.e. three peaks are returned instead of two for
|
@tnaake, could you elaborate a little on this? What exactly do you mean here? Changing the |
Dear @jorainer The question now is to which peak(s) do we apply the In my point of view, the I hope it is a bit clearer what I meant by this comment now. |
Thanks @tnaake for the explanation. I completely agree with you. The |
I had a look at the
Regarding the @sgibb , @lgatto and @tnaake what are your thoughts about that? |
I am voting for the 2nd solution. The error should be minimal. I agree that this should be properly documented. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All OK from my side. The problem with travis is strange, especially because in the jomaster branch I don't have it (yet?).
This PR introduces
joinGraph
an alternative implementation for the idea of graph based matching as described by @tnaake in rformassspectrometry/Spectra#49 and proposed in PR #23I am sorry but I don't want to accept @tnaake's PR #23 because of several problems.
(Ok, you could argue that you don't understand my code either.)
There are a few notable difference between my and @tnaake's approach:
igraph
package because the edge list contains already all information we need for an undirected graph.@tnaake: while I won't accept your PR I greatly appreciate your contributions and would be very happy if you would be so kind to test this function with your data and review the code (I will add you to the reviewer as soon as you accept the collaborator invitation).
Toy example:
Because of less memory allocation and vectorisation it is faster.
It also works with real (moderate small) spectra (depending on the number of possible combinations):
But it is much much slower than the other simple join commands (as expected):
Things to discuss:
type = "graph"
tojoin
and removejoinGraph
(this would require that the input arguments
x
andy
are always matrices instead ofnumeric
s forjoin
)dotproduct
,mindiff
, ...)dotproduct2
or.dotproduct
that doesn't run all these checks to save run time.