Added 'get_all_subset_matches()' function to the package
gavieira committed Jan 20, 2024
1 parent 7869cfa commit adc3134
Showing 4 changed files with 109 additions and 0 deletions.
1 change: 1 addition & 0 deletions NAMESPACE
@@ -3,6 +3,7 @@
export("%>%")
export(biblioverApp)
export(biblioverlap)
export(get_all_subset_matches)
export(matching_summary_plot)
export(merge_input_files)
export(merge_results)
1 change: 1 addition & 0 deletions NEWS.md
@@ -7,6 +7,7 @@
* Added package logo to plots
* Added `merge_results`: a function to merge biblioverlap's results into a single dataframe
* Added `merge_input_files`: a function to merge multiple files from the same source into a single file
* Added `get_all_subset_matches`: a function to recover matches that go missing when the data is subset, caused by differences in fields between distinct bibliographic sources


# biblioverlap 1.0.3
49 changes: 49 additions & 0 deletions R/05-biblioverlap.R
@@ -286,3 +286,52 @@ merge_results <- function(db_list, filter = 'none') {

return(df)
}


#' Get all matches from a given subset of biblioverlap's results
#'
#' @param subset_db_list - a subset of the results generated by [`biblioverlap`]
#' @param db_list - the full set of results generated by [`biblioverlap`]
#'
#' @return - the subset data plus any other records outside the subset that have been matched to its documents
#' @importFrom rlang .data
#' @export
#'
#' @examples
#' #Running document-level matching procedure for two datasets
#' biblioverlap_results <- biblioverlap(ufrj_bio_0122[1:2])$db_list
#'
#' #Change the document type of one of the datasets from 'journal article' to
#' #'article' to emulate bibliographical source differences
#' biblioverlap_results[[2]][['Publication Type']] <- gsub('journal article',
#'                                   'article',
#'                                   biblioverlap_results[[2]][['Publication Type']])
#'
#' #Generating Venn diagram for the entire dataset
#' venn_plot(biblioverlap_results)
#'
#' #Obtaining only the subset of records with publication type 'article'
#' biblioverlap_results_subset <- lapply(biblioverlap_results, function(db) {
#'   db[db$'Publication Type' == "article", ] })
#'
#' #Generating Venn diagram for the data subset
#' #Shows how many documents categorized as 'article' are unique to a given
#' #dataset and how many match other documents in the subset
#' #(i.e. documents that are also categorized as 'article', in this example)
#' venn_plot(biblioverlap_results_subset)
#'
#' #Recovering missing matches due to bibliographical source differences
#' #in the subsetting process
#' subset_all_matches <- get_all_subset_matches(biblioverlap_results_subset,
#'                                              biblioverlap_results)
#'
#' #Generating Venn diagram for the data subset plus all its matches
#' #Shows how many documents categorized as 'article' are unique to a given
#' #dataset and how many match any other document, whatever its category
#' venn_plot(subset_all_matches)

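get_all_subset_matches <- function(subset_db_list, db_list) {
  # Collect the UUIDs of every record present in the subset
  filtered_uuids <- unique(unlist(lapply(subset_db_list, function(db) db$UUID )))
  # Keep every record from the full results whose UUID appears in the subset
  no_missing <- lapply(db_list, function(db) dplyr::filter(db, .data$UUID %in% filtered_uuids))
  return(no_missing)
}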
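
The effect of the new function can be sketched on toy data. The example below is an illustration, not part of the commit: it assumes (as the function's logic implies) that matched records carry the same `UUID` across sources, it uses hypothetical data frames with a made-up `type` column, and it swaps `dplyr::filter` for base-R subsetting under the hypothetical name `get_all_subset_matches_sketch` so that it runs without extra packages.

```r
# Base-R stand-in for the function added in this commit (the package version
# uses dplyr::filter). Defined here only so the sketch runs on its own.
get_all_subset_matches_sketch <- function(subset_db_list, db_list) {
  # UUIDs of every record that survived the subsetting step
  filtered_uuids <- unique(unlist(lapply(subset_db_list, function(db) db$UUID)))
  # Keep every record in the full results that carries one of those UUIDs
  lapply(db_list, function(db) db[db$UUID %in% filtered_uuids, , drop = FALSE])
}

# Hypothetical toy results: matched documents share a UUID across sources.
# Record "u2" is labeled differently by the two sources.
db_list <- list(
  source_a = data.frame(UUID = c("u1", "u2", "u3"),
                        type = c("article", "article", "review"),
                        stringsAsFactors = FALSE),
  source_b = data.frame(UUID = c("u1", "u2", "u4"),
                        type = c("article", "journal article", "article"),
                        stringsAsFactors = FALSE)
)

# Subsetting by type drops "u2" from source_b, so its match looks missing
subset_db_list <- lapply(db_list, function(db) {
  db[db$type == "article", , drop = FALSE]
})

# Recover every record in the full results matched to a subset document
recovered <- get_all_subset_matches_sketch(subset_db_list, db_list)
print(recovered$source_b$UUID)
```

In this sketch, `u2` returns to `source_b` because its UUID is present in the `source_a` subset, while `u3` stays excluded because no source labels it as `article`.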
58 changes: 58 additions & 0 deletions man/get_all_subset_matches.Rd
