[#466] phylo4 handling of number-like node labels causes surprises

2009-06-23 06:34
Submitted by:
Jim Regetz (regetz)
Assigned to:
François Michonneau (francois)
Operating System:
phylo4 handling of number-like node labels causes surprises

Detailed description
Because of its current handling of numeric node labels (and characters that can be coerced to numeric), phylo4() can unexpectedly return a phylo4d object rather than simply a phylo4 object:

tr <- phylo4(edge=matrix(data=c(4,1, 4,5, 5,2, 5,3,NA,4), ncol=2,
byrow=TRUE), tip.label=c("spA", "spB", "spC"), node.label=c("1", "2"))
## Warning message:
## In phylo4(edge = matrix(data = c(4, 1, 4, 5, 5, 2, 5, 3, NA, 4), :
## All node labels are numeric values and converted as data.
## [1] "phylo4d"

This leads to surprising errors especially in a programming context, e.g. when calling phylo4d() on phylo objects:

bird.orders$node.label <- 1:Nnode(bird.orders)
## Error in phylo4d(tree,,,, ...) :
## Your object is already a phylo4d object. If you want to modify the
## data attached to it look at the help for tdata()<-

Also note the printed representation after manually assigning number-like character node labels to the tree from above:

nodeLabels(tr) <- c("888", "999")
## [1] "888" "999"
## label node ancestr edge.length node.type labelValues labelValues
## 4 <NA> 4 NA NA root 888 1
## 5 <NA> 5 4 NA internal 999 2
## 1 spA 1 4 NA tip NA NA
## 2 spB 2 5 NA tip NA NA
## 3 spC 3 5 NA tip NA NA


Message  ↓
Date: 2009-06-25 22:15
Sender: Jim Regetz

The problems as reported were resolved (in r444) by allowing phylo4 to drop nodes labels but not convert them to data (and only for the phylo import case, at present). The ability to convert node labels to (numeric) tree data is now reimplemented (in r445) as an optional behavior for phylo4d, again specifically for the phylo import case. Consider opening a new ticket if similar behavior is needed in other cases.

Date: 2009-06-23 18:51
Sender: Jim Regetz

I had planned to start a discussion like this along with the bug report, but ran out of time, so thanks! I certainly appreciate the motivation for this, and have been bitten by non-unique numeric labels with MrBayes trees myself.

I see three cases that have bearing:
1. User doesn't care about node labels when importing a tree, and simply wants phylobase to be able to deal with them as needed if they are non-unique (e.g. drop or modify)
2. User explicitly wants to convert node labels to tree data -- perhaps even in the case of valid and/or non-numeric labels?
3. User has tree with character node labels like "1", "2", etc, and expects that these will be retained as node labels (these _are_ technically valid, no?)

IMHO, options for node label handling should *NOT* be in the main phylo4 constructor, which should use whatever node labels it is given (if any), producing an error if they are invalid. Instead, the job should fall to the relevant import functions, with phylo import being a biggie. One problem is that there is no phylo4("phylo") method, only the phylo->phylo4 coerce method (which doesn't take additional arguments).

So to address the ape phylo case, I just made the following changes locally:
* created a generic function for phylo4, similar to that for phylo4d
* added the old phylo4 function (sans node label replacement) as a method for signature x="matrix"
* created a new phylo4 method for x="phylo"; this function also takes an argument check.node.labels=c("keep", "drop")
* modified the phylo4d method for x="phylo" to take a new argument check.node.labels=c("keep", "drop", "asdata")

With this, phylo4 can be called on ape phylo objects directly, similar to the behavior of phylo4d. Both methods keep node labels as given by default, but can be instructed to drop them. Further, the phylo4d function can be instructed to drop them as labels but keep them in the data table, as you had done previously in the phylo4 function.

How does this sound? I'm not sure why phylo4 wasn't already a generic function, but it seems like a useful change to me. The main user-level change is that the first argument of phylo4 becomes 'x' instead of 'edge', but after tweaking the examples that explicitly used a first argument named 'edge' (and one instance in the R source code), R CMD CHECK passes other than warnings because I haven't documented the new methods.

Then are there other places in phylobase where this sort of handling is desired? Something involving readNexus perhaps?

[Worth bringing this to the attention of the devl list?]

Date: 2009-06-23 13:08
Sender: François Michonneau

I implemented this functionality, because when I import trees from MrBayes, it was rather inconvenient to have posterior probabilities as node labels.

I have been working on various changes that I will commit soon. One of them includes checking that replacing labels with numerical values produces an error message. This will resolve the last issue you reported.

Would you rather have the automatic conversion from phylo4 to phylo4d in the case of numerical node label be optional or better handle labels for not allowing numerical node labels?

As a quick thought, if it's optional it would have to be turned off by default (otherwise people are going to have the same problem as you did with nodeLabels). On the other hand, first time users who want to import a MrBayes in phylobase (and don't know about the option) are going to fail. Indeed, if you have 2 nodes with identical posterior probabilities then the tree can't be imported.

I guess we could add a special paragraph in the vignette about how to import trees from MrBayes.

Attached Files:


Field Old Value Date By
close_date2009-06-25 22:152009-06-25 22:15regetz
ResolutionNone2009-06-25 22:15regetz
status_idOpen2009-06-25 22:15regetz
assigned_tonone2009-06-23 13:08francois
Thanks to:
Vienna University of Economics and Business Powered By FusionForge