Integer alignment assumption

I've found a serious problem in the Standard package that can cause segmentation faults on many machines and compilers. Many of its functions assume that character strings are aligned on integer boundaries, which is not always the case. I think this is why Sun's newer compilers don't work with Open CASCADE. From this and other code, it also appears that a two's complement representation of signed integer types is assumed. That is another problem, though perhaps a lesser one. Below is an example from Standard_CString.cxx that suffers from the alignment assumption (the ALIGNMENT_BUG branches), along with code that does the right thing. These assumptions hold under Linux, but not in general. I found this by running Open CASCADE with various compilers and machines and stepping into the code with a debugger whenever an alignment or segmentation fault occurred.

//============================================================================
//==== HashCode of a CString with discard of bit 5 (UpperCaseLowerCase)
//     Efficient for Types and MethodNames (without copy of characters)
//     Valid if we have only alphanumeric characters and "_" (unicity)
//     Valid if the Strings address is aligned for Integers
//============================================================================
Standard_Integer HASHCODES (const Standard_CString Value ,
                            const Standard_Integer Len )
{
  Standard_Integer aHashCode = 0 ;
  Standard_Integer i = 0 ;
  Standard_Integer itmp = 0 ;

  if (Value != NULL) {
#ifdef ALIGNMENT_BUG
    // Reads the string through an integer pointer (needs aligned strings).
    // The loop bound was garbled in the post; restored from the indexing.
    for ( i = 1 ; i <= (Len >> 2) ; i++ ) {
      aHashCode = aHashCode ^
                  ( ((Standard_Integer *) Value ) [ i - 1 ] &
                    Standard_Mask_Upper_Lower[4] ) ;
    }
    aHashCode = aHashCode ^
                ( ((Standard_Integer *) Value ) [ i - 1 ] &
                  Standard_Mask_Upper_Lower[ Len & 3 ] ) ;
#else
    // Copies each 4-byte word into an aligned temporary before using it.
    // The loop header was cut off in the post; bounds inferred from the
    // 4-byte copies and the tail handling below.
    for ( i = 0 ; i + 4 <= Len ; i += 4 ) {
      memcpy(&itmp,(const void *)&Value[i],4);
      aHashCode=aHashCode^(itmp&Standard_Mask_Upper_Lower[4]);
    }
    if (Len&3) {
      memcpy(&itmp,(const void *)&Value[i],Len&3);
      aHashCode=aHashCode^(itmp&Standard_Mask_Upper_Lower[Len&3]);
    }
#endif
  }
  return aHashCode ;
}

//============================================================================
// IsEqual : Returns Standard_True if two CString have the same value
//           Comparison is done with discard of bit 5 (UpperCaseLowerCase)
//           Efficient for Types and MethodNames (without copy of characters)
//           Valid if we have only alphanumeric characters and "_" (unicity)
//           Valid if the Strings address are aligned for Integers
//============================================================================
Standard_Boolean ISSIMILAR(const Standard_CString One ,
                           const Standard_Integer LenOne ,
                           const Standard_CString Two )
{
  Standard_Integer i, iOne, iTwo ;

#ifdef ALIGNMENT_BUG
  // Compares the strings through integer pointers (needs aligned strings).
  // The loop bound was garbled in the post; restored from the indexing.
  for ( i = 1 ; i <= (LenOne >> 2) ; i++ ) {
    if ( (((Standard_Integer *) One ) [ i - 1 ] &
          Standard_Mask_Upper_Lower[ 4 ] ) !=
         (((Standard_Integer *) Two ) [ i - 1 ] &
          Standard_Mask_Upper_Lower[ 4 ] ) )
      return Standard_False ;
  }
  if ( (((Standard_Integer *) One ) [ i - 1 ] &
        Standard_Mask_Upper_Lower[ LenOne & 3 ] ) !=
       (((Standard_Integer *) Two ) [ i - 1 ] &
        Standard_Mask_Upper_Lower[ LenOne & 3 ] ) )
    return Standard_False ;
#else
  // Copies each 4-byte word into aligned temporaries before comparing.
  // The loop header was cut off in the post; bounds inferred from the
  // 4-byte copies and the tail handling below.
  for ( i = 0 ; i + 4 <= LenOne ; i += 4 ) {
    memcpy(&iOne,(const void *)&One[i],4);
    memcpy(&iTwo,(const void *)&Two[i],4);
    if ((iOne&Standard_Mask_Upper_Lower[4] ) !=
        (iTwo&Standard_Mask_Upper_Lower[4]))
      return Standard_False;
  }
  if(LenOne&3) {
    memcpy(&iOne,(const void *)&One[i],4);
    memcpy(&iTwo,(const void *)&Two[i],4);
    if ( (iOne&Standard_Mask_Upper_Lower[LenOne&3]) !=
         (iTwo&Standard_Mask_Upper_Lower[LenOne&3]))
      return Standard_False;
  }
#endif
  return Standard_True ;
}
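The copy-into-a-temporary idea above can be tried in isolation. The following is only a minimal sketch, not Open CASCADE code: `kCaseMask`, `load_word` and `hash_ignore_case` are hypothetical names, and the single mask 0xDFDFDFDF (bit 5 cleared in every byte) stands in for the Standard_Mask_Upper_Lower table, whose contents are not shown in this post. Zero-filling the temporary before a short copy makes a separate tail mask unnecessary.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for Standard_Mask_Upper_Lower[4]:
// clears bit 5 (0x20) of every byte so that 'a' and 'A' hash alike.
static const std::uint32_t kCaseMask = 0xDFDFDFDFu;

// Reads n (1..4) bytes starting at p into a zero-filled word.
// memcpy places no alignment requirement on p.
static std::uint32_t load_word(const char* p, std::size_t n) {
    std::uint32_t w = 0;       // bytes that are not copied stay zero
    std::memcpy(&w, p, n);
    return w;
}

// XOR-folds the string four bytes at a time, ignoring letter case.
std::uint32_t hash_ignore_case(const char* s, std::size_t len) {
    std::uint32_t h = 0;
    std::size_t i = 0;
    for (; i + 4 <= len; i += 4)            // whole 4-byte words
        h ^= load_word(s + i, 4) & kCaseMask;
    if (len & 3)                            // 1 to 3 trailing bytes
        h ^= load_word(s + i, len & 3) & kCaseMask;
    return h;
}
```

Calling it on a string that starts at an odd address (for example `buf + 1` inside a char array) should give the same value as calling it on an aligned copy of the same text, which is exactly what the cast-based version cannot promise.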

Jean Rahuel

As noted in the comments, HASHCODES and ISSIMILAR are valid only if the strings are integer aligned. These methods are called only in the map of types used by the methods' tree for the Frontal-Engine architecture, which is not part of Open CASCADE. In that case the strings are allocated with the memory manager of Standard and are long aligned. That was done for optimization: a speedup of up to a factor of 100 in some circumstances ...

As for integers being two's complement (with no negative zero), that is true (according to IEEE?).

Best Regards,

Jean Rahuel
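For the two's complement question, a compile-time check is one way to make the assumption explicit instead of relying on it silently. This is a small sketch using a common idiom; `is_twos_complement` is a hypothetical helper name, not something from Standard.

```cpp
// -1 & 3 is 3 in two's complement, 2 in ones' complement,
// and 1 in sign-magnitude, so this expression tells the three apart.
constexpr bool is_twos_complement() {
    return (-1 & 3) == 3;
}

// Stops the build on a platform where the assumption does not hold.
static_assert(is_twos_complement(),
              "signed integers are not two's complement");
```

From C++20 onward the language itself requires two's complement, so the check only matters for older compilers.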

Robert Boehne

Jean:

I disagree that these strings are integer aligned. Perhaps you could point out the code that ensures this? I found this problem by debugging a core dump, and I think it is a general portability problem. If a compiler is used that doesn't guarantee integer alignment of strings, you're likely to core dump in this method.
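One way to test the alignment claim rather than argue about it is to check the pointer at run time before it is cast. The sketch below is an assumption about how such a check could look; `is_aligned_for` is a hypothetical helper, and it relies on the usual (implementation-defined) mapping of pointers to `std::uintptr_t`.

```cpp
#include <cstdint>

// True if p is suitably aligned to be read through a T*.
template <typename T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}
```

Placing `assert(is_aligned_for<Standard_Integer>(Value))` at the top of the cast-based branch of HASHCODES would show, in a debug build, whether every caller really passes aligned strings.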