SCM

[#2157] required, optional and NA.

Date:
2012-07-19 19:34
Priority:
3
State:
Open
Submitted by:
Jeroen Ooms (jeroenooms)
Assigned to:
Nobody (None)
Hardware:
None
Product:
None
Operating System:
None
Component:
None
Version:
None
Severity:
None
Resolution:
None
URL:
Summary:
required, optional and NA.

Detailed description
Mailed some of this earlier, just reposting it here to make sure it doesn't get lost. Some suggestions for improvements:

(1) It seems that the required / optional flag is largely ignored:

> jeroen <- new(Person)
> jeroen$name
[1] ""
> jeroen$id
[1] 0
> serialize(jeroen, NULL)
raw(0)

Two things are somewhat unexpected:

- Even though the 'name' and 'id' fields have not been set, the getter return actual values "" and 0. Better might be NULL. For booleans a non-set value is indistinguishable from FALSE.
- I would have expected that the object won't serialize until all required properties are set. However there is no warning or error at all when a required field is omitted.


(2) Unlike R, protobuf has no concept of NA for types other than double. Hence when a user tries to serialize an R vector of e.g type Logical or Character to repeated bool or string, it should at the very least give a warning I think. Especially for bool it is unfortunate because a NA gets converted to TRUE. E.g

message Logical {
repeated bool value = 1;
}

> mybuf <- new(Logical, value=c(TRUE, FALSE, NA))
> mybuf$value
[1] TRUE FALSE TRUE



Comments:

Message  ↓
Date: 2013-07-12 08:13
Sender: Murray Stokely

Just wanted to provide an update.

For #1 serialize(jeroen, NULL) now returns a proper R error if the jeroen object is not properly initialized instead of returning an empty raw vector.

The name and ID fields that have not been sent yet will still return "" and 0 by default, but you can use has to see if they are set.

For the new extensions support, where we don't have to worry about breaking backwards compatibility, I made getExtension("foo") return NULL if "foo" hasn't been set yet. Perhaps we can move to this in a future release for fields as well, possibly with an option to control this behavior.

For (2), we've also implemented a check for this now. You get an error message if you try to assign an NA to a proto bool value.

All of these fixes are in the SVN version and we should do a new release for CRAN sometime soon.

Date: 2012-08-09 20:36
Sender: Jeroen Ooms

Thanks for looking at this. The first issue still stands though, the 'getters' return values even if they are not initialized. This seems a bit risky. Better might be to return NULL for non-initialized values.

> test <- new(Person)
> test$isInitialized()
[1] FALSE
> test$name
[1] ""
> test$id
[1] 0

Also is there any methods of finding out which fields exactly are uninitialized? In case of a complex message it might be useful to have a more informative error, when some required fields are missing.

Thanks again, I'm in the middle of some other stuff now but will play with it a bit more later on.

Date: 2012-08-09 00:56
Sender: Murray Stokely

I agree with all of the issues mentioned here, and I've had to work around the logical issue in my own Rcpp-using code by encoding them as ints, because I guess Rcpp's wrap() is not doing the right thing. It also doesn't do the right thing for 64-bit integers, but that is more expected with R.
For the serialization issue, in the C++ it is calling SerializePartial from the protobuf API which allows serialization even if the required fields are not set. The easiest way to fix that is maybe to just add a stopifnot(object$isInitialized()) from the R calling code. I will send a patch to Dirk that does that.

I'll also take a quick look at the easiest way to fix the logical NA case, but no promises there..

Attached Files:

Changes

No Changes Have Been Made to This Item

Thanks to:
Vienna University of Economics and Business Powered By FusionForge