
[#1723] race condition in build R-2.14.0 on amd64

Date:
2012-01-03 02:55
Priority:
3
State:
Closed
Submitted by:
Steven Trogdon (strogdon)
Assigned to:
Martin Maechler (mmaechler)
Hardware:
Other
Product:
None
Operating System:
Linux
Component:
None
Version:
None
Severity:
normal
Resolution:
Fixed
URL:
https://bugs.gentoo.org/show_bug.cgi?id=395403
Summary:
race condition in build R-2.14.0 on amd64

Detailed description
A race condition occasionally occurs when building R-2.14.0 in parallel on amd64. It is tracked as a Gentoo bug:

https://bugs.gentoo.org/show_bug.cgi?id=395403

where in building CHOLMOD in the provided Matrix tarball a failure often occurs as:

make[4]: Entering directory `/var/tmp/portage/dev-lang/R-2.14.0/temp/RtmpdTJrs4/R.INSTALL6ecfe941/Matrix/src/CHOLMOD'
( cd Lib ; make )
make[4]: Entering directory `/var/tmp/portage/dev-lang/R-2.14.0/temp/RtmpdTJrs4/R.INSTALL6ecfe941/Matrix/src/CHOLMOD'
( cd Lib ; make clean )
make[5]: Entering directory `/var/tmp/portage/dev-lang/R-2.14.0/temp/RtmpdTJrs4/R.INSTALL6ecfe941/Matrix/src/CHOLMOD/Lib'
x86_64-pc-
../Core/cholmod_aat.c -o cholmod_aat.o
x86_64-pc-
../Core/cholmod_add.c -o cholmod_add.o
make[5]: Entering directory `/var/tmp/portage/dev-lang/R-2.14.0/temp/RtmpdTJrs4/R.INSTALL6ecfe941/Matrix/src/CHOLMOD/Lib'
x86_64-pc-
../Core/cholmod_band.c -o cholmod_band.o
make[5]: Leaving directory `/var/tmp/portage/dev-lang/R-2.14.0/temp/RtmpdTJrs4/R.INSTALL6ecfe941/Matrix/src/CHOLMOD/Lib'
make[4]: Leaving directory `/var/tmp/portage/dev-lang/R-2.14.0/temp/RtmpdTJrs4/R.INSTALL6ecfe941/Matrix/src/CHOLMOD'

cholmod_aat.o then cannot be found when make attempts to create CHOLMOD.a:

ar: cholmod_aat.o: No such file or directory
make[5]: *** [../../CHOLMOD.a] Error 1
make[5]: Leaving directory `/var/tmp/portage/dev-lang/R-2.14.0/temp/RtmpdTJrs4/R.INSTALL6ecfe941/Matrix/src/CHOLMOD/Lib'
make[4]: *** [library] Error 2
make[4]: Leaving directory `/var/tmp/portage/dev-lang/R-2.14.0/temp/RtmpdTJrs4/R.INSTALL6ecfe941/Matrix/src/CHOLMOD'
make[3]: *** [sublibraries] Error 1
make[3]: Leaving directory `/var/tmp/portage/dev-lang/R-2.14.0/temp/RtmpdTJrs4/R.INSTALL6ecfe941/Matrix/src'
ERROR: compilation failed for package Matrix
* removing /var/tmp/portage/dev-lang/R-2.14.0/work/R-2.14.0/library/Matrix
make[2]: *** [Matrix.ts] Error 1

The compressed log of the build is attached to the above gentoo bug at

https://bugs.gentoo.org/attachment.cgi?id=296477

Let me know if the build log should be attached directly to this bug report.

Comments:

Date: 2013-01-29 21:41
Sender: Martin Maechler

Thank you, Sebastien,
for the follow up.

Your patch (which looks ugly below, but which I received intact by regular e-mail) seems *very* reasonable, and it will be in the next release of Matrix and the subsequent releases of R (2.15.3, and 3.0.0 ff.).

Best regards,
Martin Maechler

Date: 2013-01-29 18:36
Sender: Sebastien Fabbro

https://bugs.gentoo.org/show_bug.cgi?id=395403#c21

Date: 2013-01-29 03:55
Sender: Sebastien Fabbro

About 10 people can reproduce it on Gentoo, and I can reproduce it on other platforms.

The race condition is not triggered by the Makefiles in SuiteSparse, but by the sublibs target in the Makevars file. With enough parallel jobs (reproducible with make -j8 and above), the cleaning action can actually occur after the object files have been created. The command ar -rcus then tries to archive files that have just been deleted.

Applying this patch should fix the issue:

Index: Makevars
===================================================================
--- Makevars (revision 2863)
+++ Makevars (working copy)
@@ -23,7 +23,7 @@
 ## INSTALL only cleans src/*.o src/*$(SHLIB_EXT) for each arch
 sublibs: subclean sublibraries
 
-sublibraries:
+sublibraries: subclean
 	@for d in $(SUBDIRS); do \
 	(cd $${d} && CFLAGS="$(CFLAGS)" CXXFLAGS="$(CXXFLAGS)" MkInclude="$(MkInclude)" $(MAKE) library) || exit 1; \
 	done
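
The ordering argument behind the patch can be sketched in a few lines of Python. This is only an illustrative model, not part of Matrix or of make: the function name must_precede and the two dictionaries are hypothetical, and the model assumes the usual behaviour of parallel make, where two prerequisites of the same target may run concurrently unless one of them depends on the other.

```python
def must_precede(rules, first, then):
    """Return True if `then` lists `first` as a direct or transitive prerequisite.

    `rules` maps each target name to the list of its prerequisites.
    Only such a dependency forces `first` to finish before `then` starts
    when the build runs with several jobs.
    """
    stack = list(rules.get(then, []))
    seen = set()
    while stack:
        target = stack.pop()
        if target == first:
            return True
        if target not in seen:
            seen.add(target)
            stack.extend(rules.get(target, []))
    return False


# Before the patch: subclean and sublibraries are only siblings under sublibs.
before = {
    "sublibs": ["subclean", "sublibraries"],
    "sublibraries": [],
    "subclean": [],
}

# After the patch: sublibraries itself depends on subclean.
after = {
    "sublibs": ["subclean", "sublibraries"],
    "sublibraries": ["subclean"],
    "subclean": [],
}

print(must_precede(before, "subclean", "sublibraries"))  # False
print(must_precede(after, "subclean", "sublibraries"))   # True
```

In the first rule set nothing orders the two targets, so the clean step is free to run while (or after) the object files are being built; in the second, the added prerequisite forces the clean step to complete first.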


Date: 2012-09-03 07:41
Sender: Martin Maechler

We currently rely heavily on the build / Makefile structure provided by the upstream
code of SuiteSparse by Tim Davis.
We have not seen the bug ourselves,
*and* we updated SuiteSparse in summer 2012.

Before trying to fix this, we definitely need
to be able to reproduce it ourselves with a new version of Matrix (>= 1.0-9, to be released RSN ;-)

Date: 2012-01-11 00:50
Sender: Steven Trogdon

The same race condition can still occur when building R-2.14.1 on 64-bit machines, e.g.:

ar -rucs ../../CHOLMOD.a cholmod_aat.o cholmod_add.o cholmod_band.o cholmod_change_factor.o cholmod_common.o cholmod_complex.o cholmod_copy.o cholmod_dense.o cholmod_error.o cholmod_factor.o cholmod_memory.o cholmod_sparse.o cholmod_transpose.o cholmod_triplet.o cholmod_check.o cholmod_read.o cholmod_write.o cholmod_amd.o cholmod_analyze.o cholmod_colamd.o cholmod_etree.o cholmod_factorize.o cholmod_postorder.o cholmod_rcond.o cholmod_resymbol.o cholmod_rowcolcounts.o cholmod_rowfac.o cholmod_solve.o cholmod_spsolve.o cholmod_drop.o cholmod_horzcat.o cholmod_norm.o cholmod_scale.o cholmod_sdmult.o cholmod_ssmult.o cholmod_submatrix.o cholmod_vertcat.o cholmod_symmetry.o cholmod_rowadd.o cholmod_rowdel.o cholmod_updown.o cholmod_super_numeric.o cholmod_super_solve.o cholmod_super_symbolic.o cholmod_l_aat.o cholmod_l_add.o cholmod_l_band.o cholmod_l_change_factor.o cholmod_l_common.o cholmod_l_complex.o cholmod_l_copy.o cholmod_l_dense.o cholmod_l_error.o cholmod_l_factor.o cholmod_l_memory.o cholmod_l_sparse.o cholmod_l_transpose.o cholmod_l_triplet.o cholmod_l_check.o cholmod_l_read.o cholmod_l_write.o cholmod_l_amd.o cholmod_l_analyze.o cholmod_l_colamd.o cholmod_l_etree.o cholmod_l_factorize.o cholmod_l_postorder.o cholmod_l_rcond.o cholmod_l_resymbol.o cholmod_l_rowcolcounts.o cholmod_l_rowfac.o cholmod_l_solve.o cholmod_l_spsolve.o cholmod_l_drop.o cholmod_l_horzcat.o cholmod_l_norm.o cholmod_l_scale.o cholmod_l_sdmult.o cholmod_l_ssmult.o cholmod_l_submatrix.o cholmod_l_vertcat.o cholmod_l_symmetry.o cholmod_l_rowadd.o cholmod_l_rowdel.o cholmod_l_updown.o cholmod_l_super_numeric.o cholmod_l_super_solve.o cholmod_l_super_symbolic.o
ar: cholmod_aat.o: No such file or directory
make[5]: *** [../../CHOLMOD.a] Error 1
make[5]: Leaving directory `/var/tmp/portage/dev-lang/R-2.14.1/temp/Rtmpjnb6UC/R.INSTALL7d6c18f5a9de/Matrix/src/CHOLMOD/Lib'
make[4]: *** [library] Error 2
make[4]: Leaving directory `/var/tmp/portage/dev-lang/R-2.14.1/temp/Rtmpjnb6UC/R.INSTALL7d6c18f5a9de/Matrix/src/CHOLMOD'
make[3]: *** [sublibraries] Error 1
make[3]: Leaving directory `/var/tmp/portage/dev-lang/R-2.14.1/temp/Rtmpjnb6UC/R.INSTALL7d6c18f5a9de/Matrix/src'
ERROR: compilation failed for package Matrix
* removing /var/tmp/portage/dev-lang/R-2.14.1/work/R-2.14.1/library/Matrix
make[2]: *** [Matrix.ts] Error 1
make[2]: *** Waiting for unfinished jobs....

Attached Files:

Changes

Field        Old Value          Date              By
status_id    Open               2013-08-27 06:50  mmaechler
close_date   None               2013-08-27 06:50  mmaechler
Severity     None               2013-01-29 21:41  mmaechler
Resolution   Awaiting Response  2013-01-29 21:41  mmaechler
assigned_to  none               2012-09-03 07:41  mmaechler
Resolution   None               2012-09-03 07:41  mmaechler