How to cope with different precisions on different systems

Hi,

my question is very general, I know that. But since this happened to me when using OCC, and since it seems to be an inherent problem of doing computations with geometry, I dare to ask it here:

How can one cope with different computation precisions on different systems?

Let's assume that I have two curves and I compute whether they intersect. If they intersect, an intersection point is created and the user may define the color of the intersection point. I store the curves and the user-defined color in a document file. Now the user opens the document file with the same application, but on another system (say, the first one being Windows 64-bit and the second one Linux 32-bit). Because the systems are different, the computations happen with different precisions. Perhaps on the Linux system it seems that the curves do not intersect at all. When the application reads in the document, it has to work with data that it cannot interpret, since the data seems wrong.
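
To make the example concrete, here is a minimal sketch of what I mean, assuming 2D curves and OCCT's Geom2dAPI_InterCurveCurve (c1 and c2 stand for the stored curves):

#include <Geom2dAPI_InterCurveCurve.hxx>
#include <Geom2d_Curve.hxx>
#include <gp_Pnt2d.hxx>

// c1, c2: Handle(Geom2d_Curve) restored from the document file
Geom2dAPI_InterCurveCurve inter(c1, c2, 1.0e-6);
if (inter.NbPoints() > 0)
{
    gp_Pnt2d p = inter.Point(1);
    // store p together with the user-defined color; on another
    // system NbPoints() may be 0 for the same curves
}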

Isn't this something that happens for every CAD system?

My first idea was to introduce some tolerance parameter. But this won't help: after all, an intersection is only found if two curves are close to each other within some tolerance. So if their real distance is very close to the tolerance parameter, two systems might have different computation results.
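
The underlying effect can be shown without any geometry at all. This tiny snippet (plain C++, nothing OCC-specific) demonstrates how mathematically equal expressions already differ in the last bits:

double a = (0.1 + 0.2) + 0.3; // 0.6000000000000001
double b = 0.1 + (0.2 + 0.3); // 0.6
bool equal = (a == b);        // false with IEEE 754 doubles
// a predicate like (distance < tolerance) flips in exactly the same
// way when distance lands within a few ulps of the tolerance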

Have you ever met a problem like this? What was your solution?

Thank you very much.
Benjamin

Rodrigo Castro Andrade's picture

Hello Benjamin,

As I understand it, 32-bit systems fall back to memory locations to store the results of larger calculations (e.g. double-precision floating point), somewhat compensating for what a 64-bit system would be able to do. So it seems strange to me that this is causing your problem. But then again, it's your application, so I assume you've done some checking before posting this.
OCC has many classes for doing the same things, and I believe such is the case for curve intersection. Which one are you using? Perhaps checking the source code could help you.
Finally, if you are right and the different systems are causing your problem, you won't have a choice: you will have to introduce a tolerance parameter. That said, I believe there is a way to let your app know whether it's running on a 32-bit or a 64-bit system (using pre-processor definitions). Then you could use one tolerance for a 32-bit system and another for a 64-bit system.
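
A minimal sketch of that idea, with placeholder tolerance values (I do not know which values would suit your application):

#if defined(_WIN64) || defined(__x86_64__) || defined(__LP64__)
const double kDistanceTolerance = 1.0e-9; // placeholder for 64-bit builds
#else
const double kDistanceTolerance = 1.0e-7; // placeholder for 32-bit builds
#endif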
I hope this helps, though it's quite possible it has not.

Good luck,
Rodrigo

Cauchy Ding's picture

I guess it's unrelated to the system; rather, it is about the tolerance stored in your OCC shape. If an intersection operation succeeds on your Windows system but fails on your Linux system, please make sure the tolerances of each vertex/edge are the same on the two systems. My suggestion is to output the tolerances to a file and compare them.
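
Something along these lines could dump the tolerances for comparison (a sketch using BRep_Tool::Tolerance; adapt the output format as needed):

#include <BRep_Tool.hxx>
#include <TopExp_Explorer.hxx>
#include <TopoDS.hxx>
#include <TopoDS_Shape.hxx>
#include <ostream>

void DumpTolerances(const TopoDS_Shape& shape, std::ostream& out)
{
    for (TopExp_Explorer ex(shape, TopAbs_VERTEX); ex.More(); ex.Next())
        out << "vertex " << BRep_Tool::Tolerance(TopoDS::Vertex(ex.Current())) << "\n";
    for (TopExp_Explorer ex(shape, TopAbs_EDGE); ex.More(); ex.Next())
        out << "edge " << BRep_Tool::Tolerance(TopoDS::Edge(ex.Current())) << "\n";
}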

Ding

Benjamin Bihler's picture

Thank you both for your answers. I guess I chose a bad example when I started talking about intersections. I wanted to point out that sometimes you have to extract discrete values (the number of intersections) from floating-point data, and even though the floating-point data differs very little, your discrete values may differ a lot (zero intersections vs. one intersection).

Actually my problem is not an intersection problem. I have points in a certain interval on a curve and I have to sort out points that are close to each other. When I compare their distances, some systems sort out more points than others, and then the point counts differ. But I had the feeling that this might be a general problem that could happen in many cases, so I wanted to give an example that is general and simple.
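
Roughly, the filter looks like this (a simplified sketch of my code; minDist is the minimum allowed distance):

#include <gp_Pnt.hxx>
#include <vector>

std::vector<gp_Pnt> FilterClosePoints(const std::vector<gp_Pnt>& points, double minDist)
{
    std::vector<gp_Pnt> kept;
    for (const gp_Pnt& p : points)
    {
        bool tooClose = false;
        for (const gp_Pnt& q : kept)
            if (p.Distance(q) < minDist) { tooClose = true; break; }
        if (!tooClose)
            kept.push_back(p);
    }
    // when p.Distance(q) lands within a few ulps of minDist, the
    // number of kept points differs between systems
    return kept;
}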

I have another suspicion: it might be that this could also happen on the same system. When I do my computations and store the input data for the computation in an OCAF binary file, it might be that the retrieved floating-point numbers have changed a little bit, because the OCAF binary file stores them with a different precision than the in-memory representation. But I am still doing research on that.

So far my idea has been that, when restoring my OCAF file, I could pass the expected number of points as an input to the computation algorithm. The algorithm could then accept even points that seem "too close to each other". The same could be done in the intersection example: if I want to force a method to find three intersections, it could just take the three points where the curves come closest to each other, no matter whether these "close points" seem to be intersections or not.
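
As a sketch (distanceToGeometry is a hypothetical helper standing in for my actual distance computation):

#include <algorithm>
#include <cstddef>
#include <vector>
#include <gp_Pnt.hxx>

double distanceToGeometry(const gp_Pnt& p); // hypothetical helper, defined elsewhere

// keep exactly nExpected points (read back from the OCAF file), ranked
// by distance instead of filtered by a hard threshold
void KeepExpectedCount(std::vector<gp_Pnt>& points, std::size_t nExpected)
{
    std::sort(points.begin(), points.end(),
              [](const gp_Pnt& a, const gp_Pnt& b)
              { return distanceToGeometry(a) < distanceToGeometry(b); });
    if (points.size() > nExpected)
        points.resize(nExpected);
}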

If anyone has other ideas, I would be eager to learn them.

Benjamin Bihler's picture

My suspicion was wrong: so far I have not found any example where the computation results differ after storing computation input data to a file and retrieving it again.

So my precision problem only happens during point computations on Windows 64-bit and Linux 32-bit systems. There, the same two points have a distance of 24.999999999999999999999mm on one system and of 25.0mm on the other (estimated values). And since two points may not be closer than 25mm, on one system a point is sorted out and on the other it is not.

Benjamin Bihler's picture

For the record: the problem is even worse! It seems that even on the same system I can reproducibly get different computation results depending on whether some algorithms have been run serially or in parallel (with std::async). The difference is of course extremely small (around 1.0e-12), but it is there. And as soon as I extract discrete values from the floating-point data (like computing the result point that is closest to some geometry), these values may differ greatly. It may happen, for example, that two points had equal distances in the first calculation, but in the second calculation one point is closer to the geometry by 1.0e-12 millimeters, and therefore the algorithm takes another point and does something completely different.

I still wonder why I seem to be the only one fighting with such a problem. One reason could be that I do not save all computation results in my document format, but instead rely on being able to reproduce them when opening the document later. Sometimes this fails because of the precision problem.

There has already been some progress in trying to stabilize the computations. Introducing tolerances helps a little bit ("two points are considered equal if they are closer than Precision::Confusion(), and in that case the first point is always considered to be the closest one"). Still, I have the feeling that this just shifts the problem: if two points have a distance of about Precision::Confusion(), then, depending on the computation precision, they are sometimes considered equal and sometimes not. :-(
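
In code, the rule looks roughly like this (my own sketch):

#include <Precision.hxx>

// a candidate only replaces the current best point if it is closer by
// more than the tolerance; otherwise the first point wins
bool IsStrictlyCloser(double distCandidate, double distBest)
{
    return distCandidate < distBest - Precision::Confusion();
}
// the instability is merely moved: when the two distances differ by
// almost exactly Precision::Confusion(), the comparison becomes
// precision-sensitive again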

Some more input on how others deal with this problem would be great. Otherwise it seems advisable not to rely on computation results being reproducible... or to find a way to deal with minor floating-point differences in a stable way. I am still searching for something like that.

Benjamin Bihler's picture

Update:

I have done more research. If I have understood the topic correctly, the results of floating-point operations are not deterministic if they happen on different threads (and therefore in different processor registers), because some registers offer excess precision, which may influence the results of the operations.

It seems that I can provoke this here if I do not call a method as usual, but instead run it with

std::future<ReturnType> result = std::async(std::launch::async, method, ...);

result.wait();

Because of the wait call it is not actually running in parallel, but it still runs on another thread. If I print the floating-point numbers appearing within that method, they differ slightly.
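
A self-contained sketch of the experiment (the summation is just a stand-in for my real method; on my setup the two printed values differ in the last digits):

#include <cstdio>
#include <future>

double Method()
{
    double sum = 0.0;
    for (int i = 1; i <= 1000; ++i)
        sum += 1.0 / i; // the accumulator may live in an x87 register with excess precision
    return sum;
}

int main()
{
    double direct = Method(); // main thread
    std::future<double> result = std::async(std::launch::async, Method);
    result.wait();
    double threaded = result.get(); // same code, other thread
    std::printf("%.17g\n%.17g\n", direct, threaded);
    return 0;
}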

One way to make floating-point operations deterministic seems to be to truncate all floating-point numbers to double precision, even if they are stored in registers offering excess precision. The MSVC compiler flag /fp:precise seems to do that. Isn't it a default flag in OCCT? Might this be the reason why only I seem to be struggling with non-deterministic behaviour? I am using MinGW, and up to now I haven't been using any floating-point precision control compiler flags.

Currently I am playing with such flags. Many forums propose using -ffloat-store with g++, but I still see non-deterministic behaviour. Therefore I am continuing with the proposals from https://gcc.gnu.org/wiki/FloatingPointMath. I will post my results here. If anyone else has already solved this problem with g++, I would greatly appreciate hints about how they did it.
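
One further candidate from that page that I intend to try is forcing SSE arithmetic instead of the x87 FPU (assuming an SSE2-capable target), which reportedly avoids the excess precision entirely:

g++ -msse2 -mfpmath=sse ...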

Benjamin Bihler's picture

The described behaviour has nothing to do with Open CASCADE. It seems to be a peculiarity (bug?) of MinGW (my bug report is here: https://sourceforge.net/p/mingw-w64/bugs/531/). I am sorry for having polluted the forum.

Benjamin Bihler's picture

Again for the record: it is not a bug in MinGW, it is a feature! ;-)

Extract from the documentation in one of the MinGW float.h files:

________________________________________________________

   MSVCRT.dll _fpreset initializes the control register to 0x27f,
   the status register to zero and the tag word to 0FFFFh.
   This differs from asm instruction finit/fninit which set control
   word to 0x37f (64 bit mantissa precison rather than 53 bit).
   By default, the mingw version of _fpreset sets fp control as
   per fninit. To use the MSVCRT.dll _fpreset, include CRT_fp8.o when
   building your application.   

________________________________________________________

Linking the application with CRT_fp8.o seems to solve the problems mentioned before.
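
For example (the exact path to CRT_fp8.o depends on the MinGW installation; this is how I understand the instruction):

g++ main.cpp /path/to/mingw/lib/CRT_fp8.o -o app

An alternative that should have the same effect (my assumption, untested) would be to set the precision control word to 53 bits yourself in every thread, including the main one:

#include <float.h>
_controlfp(_PC_53, _MCW_PC); // 53-bit mantissa precision, as set by the MSVCRT _fpreset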