Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return single value in dcast.data.table(). Required in data.table > 1.5. Fixes #94 #95

Closed
wants to merge 1 commit into from

Conversation

JanMarvin
Copy link
Contributor

This fixes outstanding issue #94 (since this is a reported revdep issue, CRAN might start throwing errors for mschart soon). I've tested it (only) with

ms_linechart(data = iris, x = "Sepal.Length", y = "Sepal.Width", group = "Species")

using data.table from CRAN and github and for the latter can confirm the problem with the current CRAN release of mschart.

As mentioned before, I'm not necessarily the biggest fan of the underlying aggregation function, though I did look at it for a moment longer than necessary. Mainly to find out what is actually going on. Below is a small sample data set. This shows that mschart matches gpplot2.

Unfortunately, both silently ignore variations in the grouping variable. I would have preferred it if - especially in cases where there is a significant deviation in the grouping value - the console would flash up with warnings and/or the aggregation function would default to something like the mean or median.

dat <- data.frame(
  orders  = c(1, 1, 5, 2, 9),
  expense = c(50e6, 50, 50e6, 4, 5),
  sales   = c(1000, 1000, 50, 500, 700)
)

library(mschart)
scatter <-
  ms_scatterchart(
    data = dat, x = "orders",
    y = "sales", group = "expense"
  )
scatter <- chart_settings(scatter, scatterstyle = "marker")

library(officer)
doc <- read_pptx()
doc <- add_slide(doc, layout = "Title and Content", master = "Office Theme")
doc <- ph_with(doc, value = scatter, location = ph_location_fullsize())

print(doc, target = "example.pptx")

system("soffice example.pptx &")
# unlink("example.pptx")

library(ggplot2)
ggplot(dat, aes(x = orders, y = sales, group = expense, col = expense)) +
  geom_point()

Screenshot_2024-04-14_16-20-33

@eli-daniels
Copy link
Collaborator

Hi @JanMarvin, thanks a lot for looking into this. I'm just a bit worried that this method is going to loss some of the user's data. In a case where value pairs of x and group are not unique, this method only takes the last pair. Had you considered this issue? Keen to hear your thoughts.

As an example (and also just me exploring how things work internal), taking the existing test and adding some value pairs of x and y illustrates this issue:

library(officer)
library(mschart) 
library(ggplot2)

dat <- data.frame(
  year = rep (c(2004, 2005, 2006), 3),
  musician=c(rep(c("Robert Wyatt", "John Zorn", "Damon Albarn"),3)),
  count = c(120, 101, 131, 200, 154, 187, 122, 197, 159),
  stringsAsFactors=F)

dat_x_group_duplicates <- data.frame(
  year = c(2004, 2005, 2004, 2005),
  musician=c(rep("Damon Albarn",2), rep("John Zorn",2)),
  count = c(8888, 9999, 8888, 9999),
  stringsAsFactors=F)

dat_with_dups <- rbind(dat, dat_x_group_duplicates)

chart_01 <- ms_linechart(data = dat_with_dups, x = "year", y = "count", group = "musician")

doc <- read_pptx()
doc <- add_slide(doc, layout = "Title and Content", master = "Office Theme")
doc <- ph_with(doc, value = chart_01, location = ph_location_fullsize())

print(doc, target = "chat01_example.pptx") |> browseURL()

# ggplot comparison
ggplot(data = dat_with_dups, aes(x = year, y = count, colour = musician))+
  geom_line()


image

image

It looks like ggplot also struggles with this kind of data, and its true that its not necessarily well structured for this chart type. One option would be to use your fix but add a warning when data is being lost such as: if(any(duplicated(dataset, by = c(x, group)))) warning("x and group value pairs should be unique, otherwise duplicate pairs will not be included."). After all maybe it should be up to the user to structure the data in a way that makes sense.

Noting that the scatter chart type (where duplicated x and group value pairs would have to be compatible) is not effected by a change to dcast_data as it uses a different underlying function to structure the data.

Continuing the above code:

chart_02 <- ms_scatterchart(data = dat_with_dups, x = "year", y = "count", group = "musician")
x <- read_pptx()
x <- add_slide(x, layout = "Title and Content", master = "Office Theme")
x <- ph_with(x, value = chart_02, location = ph_location_fullsize())

print(x, target = "scatter_example.pptx") |> browseURL()

Output:
image

Although we can't see all the points, its because they cover each other, the data is definitely all there:
image

FYI @davidgohel

@eli-daniels
Copy link
Collaborator

So in summary, I think we keep your solution @JanMarvin with a warning:
if(any(duplicated(dataset, by = c(x, group)))) warning("x and group value pairs should be unique, otherwise duplicate pairs will not be included in chart.")

@JanMarvin
Copy link
Contributor Author

Have you looked into this topic? I'm curious to hear your thoughts.

To be honest, I haven't really given it much thought. I saw that the data.table issue was not worked on and wanted to do my part to keep mschart alive.

I don't use the aggregation part of the library for my own tasks in combination with openxlsx2. So I just had a quick look at what was already available in mschart and was initially a bit shocked when I realized that obviously various combinations were ignored, hence my extreme example, but relieved when I saw that at least ggplot2 behaves similarly.

@eli-daniels eli-daniels mentioned this pull request Apr 26, 2024
@davidgohel
Copy link
Member

thank you @JanMarvin @eli-daniels

fixed in da9fb75

@davidgohel davidgohel closed this Apr 26, 2024
@JanMarvin JanMarvin deleted the gh_issue_94 branch April 26, 2024 16:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants