
[#6262] Error in nj(DM_copy) : missing values are not allowed in the distance matrix Consider using njs()

Submitted by:
Nobody
Date Submitted:
2015-11-16 21:03
Date Closed:
2017-12-13 18:52
Assigned to:
Andrés Pérez-Figueroa (anpefi)
Priority:
5
State:
Attachments:
CleanUp1_2subspeciesBlood.csv
Summary*:
Error in nj(DM_copy) : missing values are not allowed in the distance matrix Consider using njs()

Detailed description
Anonymous message posted by evaughn@email.arizona.edu

I had been running msap successfully with my dataset. I then reduced the number of loci in the dataset (after some data cleanup) and now I am getting the following error:

Error in nj(DM_copy) :
missing values are not allowed in the distance matrix
Consider using njs()

I edited the source code to use njs() but I'm now left with an error that causes my PCoA plotting to fail:

Error in cmdscale(DM, k = length(inds) - 1, eig = T) :
NA values not allowed in 'd'

I assume that some values in my distance matrix are missing, possibly because of the reduction in the size of my dataset? Do you have any advice that might help me remedy this problem? I've attached the data file I am running.

Thanks!


Followup:

Message
Date: 2015-11-18 11:27
Sender: Andrés Pérez-Figueroa

Hi,

Thanks for your report.
The problem with your dataset is that you have very few MSL, with a high proportion of missing data (the 0/0 pattern is treated as missing under the default no.bands="u" option). So, once the MSL matrix is transformed (27 loci in your case) and the distance matrix is built, some pairs of individuals cannot be compared: at every locus, at least one of the two individuals has an NA, which yields an NA in the matrix.
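A minimal Python sketch of how such NA entries arise, assuming a simple mismatch distance computed only over loci scored in both individuals (pairwise deletion). This is not necessarily the distance msap uses, and the function names are hypothetical. A pair with no locus scored in both individuals gets NA (None here), and listing those pairs shows which individuals would need attention before nj() or cmdscale() can run on a complete matrix.

```python
from typing import List, Optional, Sequence, Tuple

# One individual's transformed states per MSL locus; None stands for NA.
Profile = Sequence[Optional[int]]


def mismatch_distance(a: Profile, b: Profile) -> Optional[float]:
    """Proportion of differing states over loci scored in both individuals.

    Returns None (NA) when no locus is scored in both, which is exactly
    the case that leaves a hole in the distance matrix.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    return sum(x != y for x, y in shared) / len(shared)


def distance_matrix(profiles: Sequence[Profile]) -> List[List[Optional[float]]]:
    """Full symmetric matrix of pairwise mismatch distances."""
    n = len(profiles)
    return [[mismatch_distance(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]


def na_pairs(matrix: List[List[Optional[float]]]) -> List[Tuple[int, int]]:
    """Index pairs (i, j), i < j, whose distance is NA."""
    n = len(matrix)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if matrix[i][j] is None]


# Toy data: 3 individuals, 4 loci (2 = methylated, 1 = unmethylated, None = NA).
profiles = [
    [2, None, 1, None],
    [None, 2, None, 1],
    [2, 2, 1, 1],
]
dm = distance_matrix(profiles)
print(na_pairs(dm))  # [(0, 1)]
```

Individuals 0 and 1 are each scored at two loci, but never at the same locus, so their distance is undefined even though both can be compared with individual 2.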
By using njs() instead of nj() you can do the clustering, because njs() is an algorithm designed for incomplete matrices. However, the PCoA cannot be done because, as far as I know, there is no PCoA algorithm that can work with missing data.
I need to think about the best way to address this issue and then implement it, which will take some time. Probably by using a different distance, or some heuristic that assigns a distance to uninformative states. Suggestions about this are welcome.
For your specific dataset, you can use njs() for clustering as you've done, but you should skip the PCoA (do.pcoa=F) and the AMOVA, as they cannot work. You get a CSV file with the transformed profiles for the MSL loci (where 2 stands for the methylated state, 1/0 or 0/1; 1 stands for the unmethylated state, 1/1; and NA is a missing value), and you can then try to analyse them with other software.
Alternatively, if you can assume that there are no (large) genetic differences between individuals across the whole dataset, you could assume that 0/0 patterns are much more likely to be caused by hemimethylation of the target than by a mutation causing loss of the target, and then treat them as methylated states (1) instead of missing (NA). In this case (no.bands="h") you can run the full analysis. However, I don't think this applies to your data, as it seems to come from very different natural populations.
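A minimal Python sketch of the MSL scoring described in the last two paragraphs, assuming each observation is an (HpaII, MspI) band pair with 1 = band present and 0 = band absent, and using the 2/1/NA coding given above. The function name `msl_state` is hypothetical and this is not msap's own code; in particular, giving 0/0 the same code as the other methylated patterns under no_bands="h" is an assumption of this sketch.

```python
from typing import Optional, Tuple

# (HpaII band, MspI band): 1 = present, 0 = absent (assumed order).
Pattern = Tuple[int, int]


def msl_state(pattern: Pattern, no_bands: str = "u") -> Optional[int]:
    """Map a band pattern to a transformed MSL state.

    2 = methylated (1/0 or 0/1), 1 = unmethylated (1/1), None = NA.
    no_bands="u": 0/0 is uninformative and scored as missing.
    no_bands="h": 0/0 is assumed to be methylated (coded 2 here).
    """
    if pattern == (1, 1):
        return 1
    if pattern in ((1, 0), (0, 1)):
        return 2
    if pattern == (0, 0):
        if no_bands == "u":
            return None
        if no_bands == "h":
            return 2
    raise ValueError(f"unexpected pattern {pattern} or no_bands={no_bands!r}")


patterns = [(1, 1), (1, 0), (0, 1), (0, 0)]
print([msl_state(p) for p in patterns])       # [1, 2, 2, None]
print([msl_state(p, "h") for p in patterns])  # [1, 2, 2, 2]
```

Switching from "u" to "h" turns every 0/0 cell from NA into a scored state, which removes the NA entries that come from 0/0 patterns and is why the full analysis can then run.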

I'll keep this issue open until I reach a solution.

Change Log:

Field | Old Value | Date | By
status_id | Deleted | 2017-12-13 18:52 | None
close_date | 2017-12-13 03:07 | 2017-12-13 18:52 | None
status_id | Closed | 2017-12-13 03:07 | None
close_date | 2017-12-12 14:12 | 2017-12-13 03:07 | None
status_id | Open | 2017-12-12 14:12 | None
close_date | None | 2017-12-12 14:12 | None
details | (original detailed description, as reproduced above) | 2015-11-18 11:27 | anpefi
assigned_to | none | 2015-11-18 11:27 | anpefi
priority | 3 | 2015-11-18 11:27 | anpefi
File Added | 5054: CleanUp1_2subspeciesBlood.csv | 2015-11-16 21:03 | None