Error in infer.clonal.models: No clonal models for sample #33

Open
hoonghim opened this issue Oct 4, 2019 · 5 comments

Comments

@hoonghim

hoonghim commented Oct 4, 2019

Dear Ha X. Dang,

Hello, I am trying to analyze clonal evolution using PyClone and ClonEvol.

I have two WES samples from one patient.

When I followed the manual, I could not infer clonal models.

Here is my final input file for ClonEvol (it is stored in pyCloneResultMeltDcastDf below).

clonevol_input.txt

This is the original PyClone output:

KRCMC01270.PyClone.loci_results.txt
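PyClone writes loci_results in long format (one row per mutation per sample, with the columns listed in the first comment of the script below), while ClonEvol expects one row per variant and one column per sample. Here is a minimal sketch of that reshape without data.table or reshape2, using stats::reshape() on a made-up two-mutation, two-sample table. All IDs and values are hypothetical; the `<sample>_ccf` / `<sample>_vaf` naming mirrors the script.

```r
# Toy PyClone-style loci table (hypothetical values): one row per mutation per sample.
loci <- data.frame(
  mutation_id = rep(c("TP53_17_7577120", "KRAS_12_25398284"), each = 2),
  sample_id   = rep(c("T1", "T2"), times = 2),
  cluster_id  = rep(c(0, 1), each = 2),
  cellular_prevalence      = c(0.90, 0.85, 0.40, 0.10),
  variant_allele_frequency = c(0.44, 0.41, 0.20, 0.05)
)

# Long -> wide: one row per mutation, one column per sample x measure,
# named e.g. "cellular_prevalence_T1".
wide <- reshape(loci,
                idvar     = c("mutation_id", "cluster_id"),
                timevar   = "sample_id",
                v.names   = c("cellular_prevalence", "variant_allele_frequency"),
                direction = "wide",
                sep       = "_")
rownames(wide) <- NULL

# Rename to "<sample>_ccf" / "<sample>_vaf".
names(wide) <- sub("^cellular_prevalence_(.*)$", "\\1_ccf", names(wide))
names(wide) <- sub("^variant_allele_frequency_(.*)$", "\\1_vaf", names(wide))

# 1-based cluster IDs and percentages, as the script below does.
wide$cluster_id <- wide$cluster_id + 1
pct.cols <- grep("_(ccf|vaf)$", names(wide), value = TRUE)
wide[pct.cols] <- wide[pct.cols] * 100
print(wide)
```

The result is a plain data.frame, which is the kind of object that `df[, cols]`-style column subsetting works on.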

Below is the code I used to run ClonEvol:
#########################################################################

library(data.table)
library(clonevol)
library(reshape2)
library(tidyr)

pyCloneResult <- fread("/Absolute path/KRCMC01270.PyClone.loci_results.txt")

#To change the data frame structure - [mutation_id - sample_id - cluster_id - cellular_prevalence - cellular_prevalence_std - variant_allele_frequency] -> [mutation_id - cluster_id - sample1.vaf - sample2.vaf - sample1.cellular_prevalence - sample2.cellular_prevalence - sample1.cellular_prevalence_std - sample2.cellular_prevalence_std]
#https://stackoverflow.com/questions/11608167/reshape-multiple-value-columns-to-wide-format

pyCloneResultMeltDf <- melt(pyCloneResult, id.vars = c("mutation_id", "cluster_id", "sample_id"))

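# as.data.frame(): the column subsetting below assumes a plain data.frame, not a data.table
pyCloneResultMeltDcastDf <- as.data.frame(dcast(pyCloneResultMeltDf, cluster_id + mutation_id ~ sample_id + variable))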

# Cluster IDs must start from 1 (per the ClonEvol manual), so add 1 to each PyClone cluster ID

pyCloneResultMeltDcastDf$cluster_id <- pyCloneResultMeltDcastDf$cluster_id + 1

# Shorten the column names: "_variant_allele_frequency" -> "_vaf", "_cellular_prevalence" -> "_ccf", "---sampld-WBC" -> ""
# https://stackoverflow.com/questions/28700987/data-table-setnames-combined-with-regex

setnames(pyCloneResultMeltDcastDf, names(pyCloneResultMeltDcastDf), gsub("_variant_allele_frequency", "_vaf", names(pyCloneResultMeltDcastDf)))
setnames(pyCloneResultMeltDcastDf, names(pyCloneResultMeltDcastDf), gsub("_cellular_prevalence", "_ccf", names(pyCloneResultMeltDcastDf)))

# Remove the normal sample from the names ([Tumor---Normal_vaf] -> [Tumor_vaf]);
# "---[^_]+" stops at the next "_", so this assumes the normal sample ID contains no "_"
setnames(pyCloneResultMeltDcastDf, names(pyCloneResultMeltDcastDf), gsub("---[^_]+", "", names(pyCloneResultMeltDcastDf)))

# Replace "-" (hyphen) with "_" (underscore)
setnames(pyCloneResultMeltDcastDf, names(pyCloneResultMeltDcastDf), gsub("-", "_", names(pyCloneResultMeltDcastDf)))

vaf.col.names <- grep('_vaf', colnames(pyCloneResultMeltDcastDf), value = TRUE)
ccf.col.names <- grep('_ccf$', colnames(pyCloneResultMeltDcastDf), value = TRUE)
sample.names <- gsub('_vaf', '', vaf.col.names)

# Store the VAFs as percentages (x 100) in columns named after the samples

pyCloneResultMeltDcastDf[, sample.names] <- pyCloneResultMeltDcastDf[, vaf.col.names] * 100
vaf.col.names <- sample.names

# Convert the CCF columns from proportions to percentages
pyCloneResultMeltDcastDf[, ccf.col.names] <- pyCloneResultMeltDcastDf[, ccf.col.names] * 100

# prepare sample grouping
#sample.groups <- sample.names
sample.groups <- c("C", "M")
names(sample.groups) <- sample.names

# set up the order of clusters to display in various plots (later)
pyCloneResultMeltDcastDf <- pyCloneResultMeltDcastDf[order(pyCloneResultMeltDcastDf$cluster_id), ]

# Add an is.driver column: treat Cancer Gene Census (CGC) genes as driver genes
# Load CGC genes

cgc.file <- file.path("/BiO/Share/Database/COSMIC/grch37/v90/cancer_gene_census.csv")
cgc.df <- read.csv(cgc.file, as.is = TRUE)
cgc.genes <- unique(cgc.df$Gene.Symbol)

# Assumes mutation_id begins with the gene symbol followed by "_"
pyCloneResultMeltDcastDf$is.driver <- sapply(strsplit(as.character(pyCloneResultMeltDcastDf$mutation_id), "_"), function(x) x[1]) %in% cgc.genes

# Choosing colors for the clones
clone.colors <- NULL

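# Visualizing the variant clusters
# (the PDF name is built from the input file path, not from the pyCloneResult table)
outputFile <- gsub(pattern = "loci_results.txt", replacement = "loci_results_jitter.pdf", x = "/Absolute path/KRCMC01270.PyClone.loci_results.txt", fixed = TRUE)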

pdf(outputFile, width = 3, height = 3, useDingbats = FALSE, title = '')
pp <- plot.variant.clusters(pyCloneResultMeltDcastDf,
                            cluster.col.name = 'cluster_id',
                            show.cluster.size = FALSE,
                            cluster.size.text.color = 'blue',
                            vaf.col.names = vaf.col.names,
                            vaf.limits = 70,
                            sample.title.size = 10,
                            violin = FALSE,
                            box = FALSE,
                            jitter = TRUE,
                            jitter.shape = 1,
                            jitter.color = clone.colors,
                            jitter.size = 2,
                            jitter.alpha = 1,
                            jitter.center.method = 'median',
                            jitter.center.size = 1,
                            jitter.center.color = 'darkgray',
                            jitter.center.display.value = 'none',
                            highlight = 'is.driver',
                            highlight.shape = 21,
                            highlight.color = 'blue',
                            highlight.fill.color = 'green',
                            highlight.note.col.name = 'mutation_id',
                            highlight.note.size = 2,
                            order.by.total.vaf = FALSE)
dev.off()

#>> Here is the result
KRCMC01270.PyClone.loci_results_jitter.pdf
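The highlighted driver flag depends on parsing the gene symbol out of mutation_id. Below is a base-R sketch of that step on hypothetical IDs. Like the strsplit() call in the script, it assumes the gene symbol is the text before the first "_". It uses sub() in place of strsplit(), which gives the same result for such IDs, and a made-up two-gene stand-in for the CGC list.

```r
# Hypothetical mutation IDs, assumed to be "<GENE>_<rest>".
mutation_id <- c("TP53_17_7577120", "TTN_2_179390716", "KRAS_12_25398284")
# Stand-in for unique(cgc.df$Gene.Symbol) read from the CGC file.
cgc.genes <- c("TP53", "KRAS")

# Gene symbol = everything before the first underscore.
gene <- sub("_.*$", "", mutation_id)

# Logical flag for the highlight column.
is.driver <- gene %in% cgc.genes
print(data.frame(mutation_id, gene, is.driver))
```

Whatever name the flag column gets in the data frame, it has to match the highlight argument passed to plot.variant.clusters().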

# Plotting mean/median of clusters across samples (cluster flow)
plot.cluster.flow(pyCloneResultMeltDcastDf, vaf.col.names = vaf.col.names,
                  sample.names = sample.names,
                  colors = clone.colors)

Here is the result:
[image: cluster flow plot]

########################################################################
#Inferring clonal evolution trees
y = infer.clonal.models(variants = pyCloneResultMeltDcastDf,
                        cluster.col.name = 'cluster_id',
                        #vaf.col.names = vaf.col.names,
                        ccf.col.names = ccf.col.names,
                        sample.groups = sample.groups,
                        cancer.initiation.model = 'monoclonal',
                        subclonal.test = 'bootstrap',
                        subclonal.test.model = 'non-parametric',
                        num.boots = 1000,
                        founding.cluster = 1,
                        cluster.center = 'mean',
                        ignore.clusters = NULL,
                        clone.colors = clone.colors,
                        min.cluster.vaf = 0.01,
                        # min probability that CCF(clone) is non-negative
                        sum.p = 0.05,
                        # alpha level in confidence interval estimate for CCF(clone)
                        alpha = 0.05)

########################################################################
### The following is the error message

Calculate VAF as CCF/2
Sample 1: KRCMC01270_T1_D_ccf <-- KRCMC01270_T1_D_ccf
Sample 2: KRCMC01270_T2_D_ccf <-- KRCMC01270_T2_D_ccf
Using monoclonal model
Note: all VAFs were divided by 100 to convert from percentage to proportion.
Generating non-parametric boostrap samples...
KRCMC01270_T1_D_ccf : Enumerating clonal architectures...
Determining if cluster VAF is significantly positive...
Exluding clusters whose VAF < min.cluster.vaf=0.01
Non-positive VAF clusters:
KRCMC01270_T1_D_ccf : 0 clonal architecture model(s) found

lab vaf color parent ancestors occupied free free.mean
4 4 0.4168754 #cab2d6 NA - 0 0.4168754 NA
5 5 0.3003359 #ff99ff NA - 0 0.3003359 NA
3 3 0.2887949 #b2df8a NA - 0 0.2887949 NA
9 9 0.2780810 #cf8d30 NA - 0 0.2780810 NA
6 6 0.2759430 #fdbf6f NA - 0 0.2759430 NA
2 2 0.2343575 #a6cee3 NA - 0 0.2343575 NA
8 8 0.2068802 #bbbb77 NA - 0 0.2068802 NA
7 7 0.1714719 #fb9a99 NA - 0 0.1714719 NA
1 1 0.1211232 #cccccc NA - 0 0.1211232 NA
free.lower free.upper free.confident.level free.confident.level.non.negative
4 NA NA NA NA
5 NA NA NA NA
3 NA NA NA NA
9 NA NA NA NA
6 NA NA NA NA
2 NA NA NA NA
8 NA NA NA NA
7 NA NA NA NA
1 NA NA NA NA
p.value num.subclones excluded
4 NA 0 FALSE
5 NA 0 FALSE
3 NA 0 FALSE
9 NA 0 FALSE
6 NA 0 FALSE
2 NA 0 FALSE
8 NA 0 FALSE
7 NA 0 FALSE
1 NA 0 FALSE
ERROR: No clonal models for sample: KRCMC01270_T1_D_ccf
Check data or remove this sample, then re-run.

Also check if founding.cluster was set correctly!
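One possible reading of this log (an interpretation, not a diagnosis confirmed by the ClonEvol authors) goes as follows. With cancer.initiation.model = 'monoclonal', every cluster has to descend from founding.cluster, and a descendant cannot have a larger VAF than its ancestor. Here cluster 1, the founding cluster, has the smallest VAF (0.1211232) of all nine clusters, so no monoclonal tree rooted at cluster 1 fits. Below is a base-R sketch of that check using the cluster VAFs printed in the log. The helper name `violations` is hypothetical, and the check ignores the bootstrap confidence intervals that ClonEvol actually uses.

```r
# Cluster VAFs for sample KRCMC01270_T1_D_ccf, copied from the log above
# (ClonEvol printed them after converting CCF to VAF as CCF/2).
cluster_vaf <- c(`1` = 0.1211232, `2` = 0.2343575, `3` = 0.2887949,
                 `4` = 0.4168754, `5` = 0.3003359, `6` = 0.2759430,
                 `7` = 0.1714719, `8` = 0.2068802, `9` = 0.2780810)

# Hypothetical helper: clusters whose VAF exceeds that of the proposed
# founding cluster. Under a monoclonal model every cluster descends from the
# founder, and a descendant cannot carry a larger VAF than its ancestor.
violations <- function(vaf, founding) {
  names(vaf)[vaf > vaf[[founding]]]
}

too_big   <- violations(cluster_vaf, "1")   # founding.cluster = 1, as in the call
candidate <- names(cluster_vaf)[which.max(cluster_vaf)]
print(too_big)
print(candidate)
```

If clusters are flagged, one option is to set founding.cluster to the cluster with the highest VAF in every sample (cluster 4 for this sample) or to relabel the clusters so that it becomes cluster 1. Passing this check is necessary but not sufficient for ClonEvol to find a model. The log in the next comment shows the same pattern: cluster 1 has the lowest VAF there too (0.03360).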

Could you give me any idea how to solve this problem?

I suspect the PyClone result itself is not very good, because most variants are in cluster 1.

[image]

Thank you in advance for your time

Sincerely,

Seung-hoon

@xmzhuo

xmzhuo commented Mar 22, 2021

I have a similar issue. My input is from PyClone-VI run on WGS data.
The cluster table (number of variants per cluster):
cluster   1     2    3    4     5
count    10  1805  203  116  1471

[image]

# The code I ran

mutli_full_infer = infer.clonal.models(variants = multi_full,
cluster.col.name = 'cluster',ccf.col.names = paste(c('A','B'),'ccf',sep=''), sample.groups = sample_groups,cancer.initiation.model='monoclonal', subclonal.test = 'bootstrap', subclonal.test.model = 'non-parametric',num.boots = 1000, founding.cluster = 1, cluster.center = 'mean', ignore.clusters = NULL, clone.colors = clone.colors, min.cluster.vaf = 0.01, sum.p = 0.05, alpha = 0.05)

#error message
Calculate VAF as CCF/2
Sample 1: Accf <-- Accf
Sample 2: Bccf <-- Bccf
Using monoclonal model
Note: all VAFs were divided by 100 to convert from percentage to proportion.
Generating non-parametric boostrap samples...
Accf : Enumerating clonal architectures...
Determining if cluster VAF is significantly positive...
Exluding clusters whose VAF < min.cluster.vaf=0.01
Non-positive VAF clusters:
Accf : 0 clonal architecture model(s) found

lab vaf color parent ancestors occupied free free.mean free.lower
4 4 0.42025 #cab2d6 NA - 0 0.42025 NA NA
5 5 0.27755 #ff99ff NA - 0 0.27755 NA NA
2 2 0.16680 #a6cee3 NA - 0 0.16680 NA NA
3 3 0.09810 #b2df8a NA - 0 0.09810 NA NA
1 1 0.03360 #cccccc NA - 0 0.03360 NA NA
free.upper free.confident.level free.confident.level.non.negative p.value
4 NA NA NA NA
5 NA NA NA NA
2 NA NA NA NA
3 NA NA NA NA
1 NA NA NA NA
num.subclones excluded
4 0 FALSE
5 0 FALSE
2 0 FALSE
3 0 FALSE
1 0 FALSE
ERROR: No clonal models for sample: Accf
Check data or remove this sample, then re-run.

Also check if founding.cluster was set correctly!

@edceeyuchen

@hoonghim Hello hoonghim, I ran into the same problem. How did you solve it? Any help would be much appreciated!

@seunghoonv

Hi, edceeyuchen

Unfortunately, I couldn't solve the issue, and the author didn't reply to my question (maybe he is busy).

It's been about 4 years, and I still haven't solved this problem.

I think it would help to look for papers that used ClonEvol and shared their custom scripts in their code availability sections.

Sorry for not being helpful.

Seunghoon

@snowvov

snowvov commented Dec 18, 2023

Hello, try using corrected VAF or CCF values.
See: #21
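This thread does not say exactly what "corrected" means in #21. One common correction assumes a heterozygous mutation in a copy-neutral diploid region and a known tumor purity, and computes CCF = 2 × VAF / purity, capped at 1. Below is a minimal base-R sketch with made-up numbers. The function name `ccf_from_vaf` is hypothetical, and real data with copy-number changes needs a copy-number-aware model instead.

```r
# Hypothetical helper: purity-corrected cancer cell fraction, assuming a
# heterozygous mutation in a copy-neutral diploid region.
ccf_from_vaf <- function(vaf, purity) {
  stopifnot(all(purity > 0 & purity <= 1))
  pmin(2 * vaf / purity, 1)   # cap at 1 (100% of cancer cells)
}

vaf    <- c(0.30, 0.15, 0.45)   # made-up observed VAFs (proportions)
purity <- 0.6                   # made-up tumor purity

ccf <- ccf_from_vaf(vaf, purity)   # 1.0, 0.5, 1.0
# Percentages for ccf.col.names (the logs above note that ClonEvol divides by 100):
ccf_pct <- ccf * 100
# Purity-corrected VAF on the 0-50% scale, matching ClonEvol's "VAF as CCF/2":
vaf_corrected_pct <- ccf_pct / 2
print(data.frame(vaf, ccf_pct, vaf_corrected_pct))
```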

@oghzzang

oghzzang commented Jan 9, 2024

OMG. I found something. @edceeyuchen

I got the same error, but after I changed the cancer.initiation.model option from "monoclonal" to "polyclonal", it worked well!
