
[#2803] ROC command appears to calculate pvp and pvn incorrectly

Date:
2013-06-01 23:29
Priority:
3
State:
Open
Submitted by:
David Cavallucci (dcava)
Assigned to:
Nobody (None)
Hardware:
None
Product:
None
Operating System:
All
Component:
None
Version:
v1.1
Severity:
None
Resolution:
None
URL:
Summary:
ROC command appears to calculate pvp and pvn incorrectly

Detailed description
I'm not sure whether it is just my use case, but the ROC function currently appears to return incorrect values for the positive and negative predictive values (pvp, pvn).

The current code to calculate pvp/pvn in ROC.R is:

...
pvp <- m[, 2]/m[, 3]
pvn <- (m[nr, 1] - m[, 1])/(m[nr, 3] - m[, 3])
...

My original data is in the form of a k x 3 dataframe

>head(nhiv)
n status test
1 0 12
2 1 12
3 1 12
4 1 12
5 1 12
6 1 12
7 1 12
8 1 12
9 1 12
10 1 12
...

ROC coerces this into this data frame ("m"):
     0  1 Sum
     0  0   0
2  199  0 199
3  274  2 276
4  290  9 299
5  293 18 311
6  296 33 329
11 298 69 367
12 299 90 389
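For anyone trying to reproduce this without the attachment, here is a rough sketch of how a cumulative table of this shape can be built from long-format data. The toy data frame below is made up (it is not long_hiv.csv), and the construction is my assumption about what produces "m", not a copy of what ROC.R actually does.

```r
# Hypothetical toy data in the same long format (NOT the attached long_hiv.csv)
toy <- data.frame(
  status = c(0, 0, 0, 1, 0, 1, 1, 1),
  test   = c(2, 2, 3, 3, 4, 4, 6, 6)
)

# Counts of each status at each distinct test value
tab <- table(toy$test, toy$status)

# Cumulative counts down the rows, with a leading all-zero row,
# then a row-total column named "Sum" (assumed layout, mirroring "m" above)
m <- rbind(0, apply(tab, 2, cumsum))
m <- cbind(m, Sum = rowSums(m))
print(m)
```

Each row then holds the number of status-0 and status-1 subjects with a test value at or below that row's cutoff, and the last row holds the column totals.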

If we define pvp as true pos / sum(test pos) and pvn as true neg / sum(test neg), then
looking at this data frame, I cannot see how the above code would calculate pvp and pvn correctly.

The fix for me is:

pvp <- (m[nr, 2]-m[,2])/((m[nr, 2]-m[,2])+(m[nr,1]-m[,1]))
pvn <- m[,1]/m[,3]
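To make the proposed formulas easier to check by hand, here is a small self-contained sketch that rebuilds "m" from the numbers printed above and derives the four cell counts at each cutoff. It assumes a test is called positive when its value is above the cutoff, which is what the sens and spec columns in this report imply; the names tp, fp, tn and fn are mine and do not come from ROC.R.

```r
# Cumulative table "m" as printed in this report
m <- cbind(
  "0" = c(0, 199, 274, 290, 293, 296, 298, 299),
  "1" = c(0,   0,   2,   9,  18,  33,  69,  90)
)
m <- cbind(m, Sum = m[, "0"] + m[, "1"])
rownames(m) <- c("", 2, 3, 4, 5, 6, 11, 12)
nr <- nrow(m)

# Cell counts at each cutoff, assuming "test positive" means test > cutoff
tp <- m[nr, 2] - m[, 2]  # diseased, above the cutoff
fp <- m[nr, 1] - m[, 1]  # non-diseased, above the cutoff
tn <- m[, 1]             # non-diseased, at or below the cutoff
fn <- m[, 2]             # diseased, at or below the cutoff

sens <- tp / (tp + fn)
spec <- tn / (tn + fp)
pvp  <- tp / (tp + fp)   # same as the proposed pvp formula
pvn  <- tn / (tn + fn)   # same as m[, 1] / m[, 3]

print(round(cbind(sens, spec, pvp, pvn), 7))
```

At cutoff 3, for example, this gives pvp = 88/113 and pvn = 274/276. The NaN entries in the first and last rows come from 0/0, since no test falls on one side of those cutoffs.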

You can see the different results below:
Current code:
        sens      spec         pvp        pvn ehiv$cutoff
   1.0000000 0.0000000         NaN 0.76863753        -Inf
2  1.0000000 0.6655518 0.000000000 0.52631579           2
3  0.9777778 0.9163880 0.007246377 0.22123894           3
4  0.9000000 0.9698997 0.030100334 0.10000000           4
5  0.8000000 0.9799331 0.057877814 0.07692308           5
6  0.6333333 0.9899666 0.100303951 0.05000000           6
11 0.2333333 0.9966555 0.188010899 0.04545455          11
12 0.0000000 1.0000000 0.231362468        NaN          12

My alteration:

        sens      spec       pvp       pvn ehiv$cutoff
   1.0000000 0.0000000 0.2313625       NaN        -Inf
2  1.0000000 0.6655518 0.4736842 1.0000000           2
3  0.9777778 0.9163880 0.7787611 0.9927536           3
4  0.9000000 0.9698997 0.9000000 0.9698997           4
5  0.8000000 0.9799331 0.9230769 0.9421222           5
6  0.6333333 0.9899666 0.9500000 0.8996960           6
11 0.2333333 0.9966555 0.9545455 0.8119891          11
12 0.0000000 1.0000000       NaN 0.7686375          12
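Comparing the two sets of results, the current pvp looks like 1 minus the altered pvn, and the current pvn like 1 minus the altered pvp. The sketch below checks that on the table from this report. If it holds, one possible reading is that the existing formulas treat test <= cutoff as "positive" while sens and spec treat test > cutoff as "positive"; that is an inference from the numbers only and has not been confirmed against the rest of ROC.R.

```r
# Cumulative table "m" as printed in this report (columns: status 0, status 1, Sum)
m <- cbind(
  c(0, 199, 274, 290, 293, 296, 298, 299),
  c(0,   0,   2,   9,  18,  33,  69,  90)
)
m <- cbind(m, rowSums(m))
nr <- nrow(m)

# Formulas currently in ROC.R, as quoted in this report
old_pvp <- m[, 2] / m[, 3]
old_pvn <- (m[nr, 1] - m[, 1]) / (m[nr, 3] - m[, 3])

# Proposed replacements from this report
new_pvp <- (m[nr, 2] - m[, 2]) / ((m[nr, 2] - m[, 2]) + (m[nr, 1] - m[, 1]))
new_pvn <- m[, 1] / m[, 3]

# Compare only rows where both sides are defined (the 0/0 rows give NaN)
ok_p <- is.finite(old_pvp) & is.finite(new_pvn)
ok_n <- is.finite(old_pvn) & is.finite(new_pvp)
complement_pvp <- isTRUE(all.equal(old_pvp[ok_p], 1 - new_pvn[ok_p]))
complement_pvn <- isTRUE(all.equal(old_pvn[ok_n], 1 - new_pvp[ok_n]))
print(c(complement_pvp, complement_pvn))
```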


I would be grateful if someone could check whether this is due to my data structure, to differing definitions, or to a true bug. The whole data set is attached in long format.

Thanks for the package, it is great!
David.

Comments:

No Comments Have Been Posted

Attached Files:

Size    Name           Date               By      Download
4 KiB   long_hiv.csv   2013-06-01 23:29   dcava   long_hiv.csv

Changes

Field        Old Value           Date               By
File Added   543: long_hiv.csv   2013-06-01 23:29   dcava