To avoid introducing 64 bit values to older branches, but to improve the situation on those too, I guess it would be easy to at least turn the value to an unsigned one?
i did some measurement with unsigned int:
desire = (unsigned int) (benefit * attack - loss * vuln) * victim_count * SHIELD_WEIGHTING / (attack + vuln * victim_count);
amount of kill_desire() calls: 20300
amount of overflows with current code: 195 (accounts for 0.96 percent of kill_desire() calls)
amount of fixed overflows with unsigned int: 113 (accounts for 57.95 percent of overflows with current code)
looks pretty good to me.
Do you plan to make a patch yourself?
not yet, because "unsigned int" is no final solution.
so you prefer to have "unsigned int" now and leave a ticket open for a real solution in the future? then i would prepare the PR/patch.
or should we instead work towards a real solution now?
Reply To mortmann
not yet, because "unsigned int" is no final solution. so you prefer to have "unsigned int" now and leave a ticket open for a real solution in the future? then i would prepare the PR/patch. or should we instead work towards a real solution now?
We want to improve situation in stable branches (S3_1 and S3_0 at least, maybe even S2_6). Not sure if "real solution" is going to be acceptable to those, so likely we should go by the two steps approach of first having "unsigned int", and later more extensive one. That also means two tickets; for one can't have ticket listed in resolved tickets in 3.0.6 release notes if it in fact isn't resolved. You can either keep this as the "complete solution" ticket (-> split 'unsigned int' part to new ticket), or the "initial solution" ticket (-> open a new ticket about further changes) - both are fine for me.
We will keep this ticket for the complete solution.
For the split ticket see https://osdn.net/projects/freeciv/ticket/46513
Next question I have (I could have a look myself, but I want opinions) is if that's something we should not be using integer types at all, but a floating point number? Like, can there be desires like 0.1 and 0.9, which current integer using approach makes 0 in both cases (like they were equal, and not one on them nine times higher than the other)
Reply To cazfi
Next question I have (I could have a look myself, but I want opinions) is if that's something we should not be using integer types at all, but a floating point number? Like, can there be desires like 0.1 and 0.9, which current integer using approach makes 0 in both cases (like they were equal, and not one on them nine times higher than the other)
Related: All the callers, except process_attacker_want(), assign value returned from this function to variable of type adv_want, and even process_attacker_want() just has the variable types wrong -> #46530
Keeping S3_0 target for now, though the mitigation patch went in on another ticket.
i vote for float it seems the simplest change, and may fix other similar issues elsewhere in the code.
All current CPU have floating point unit (since 1995 ?)
I had a lot of fun with this !
I discovered that clang can detect runtime overflow with the additional option "-fsanitize=undefined"
On my toto.c test program it gives very nice error message , without killing the program which simply continue
toto.c:78:27: runtime error: signed integer overflow: 5000000 * 1000 cannot be represented in type 'int'
And i have trouble with gcc 11.3 , because it overflows unsigned int, but not int !
I'll try to recompile with clang and note the places where overflow occurs.
I compiled with clang14
mkdir build cd build # CLANG # https://releases.llvm.org/14.0.0/tools/clang/docs/UsersManual.html export CC=clang export CPP=clang-cpp-14 export CXX=clang++ export LD=lld export CFLAGS='-g -O2 -fsanitize=undefined -Wno-cast-align -Wno-string-plus-int' export CXXFLAGS='-g -O2 -fsanitize=undefined -Wno-cast-align -Wno-string-plus-int' export LDFLAGS='-fsanitize=undefined -Wno-cast-align -Wno-string-plus-int' ../autogen.sh ../configure --enable-gitrev --enable-debug=yes --enable-gprof --enable-client=gtk3.22 --program-transform-name=s:freeciv:clangFC:g --enable-ai-static=classic
The interesting options is -fsanitize=undefined (which must be added to link options too if understood correctly)
It quickly gave int overflow in server, while playing several turns from a saved game
FC3clang-serv.log:../../common/city.c:2866:21: runtime error: signed integer overflow: -999999985 * 100 cannot be represented in type 'int' FC3clang-serv.log:../../common/city.c:2873:25: runtime error: signed integer overflow: -999999990 * 100 cannot be represented in type 'int' FC3clang-serv.log:../../common/city.c:2880:37: runtime error: signed integer overflow: -999999985 * 100 cannot be represented in type 'int' FC3clang-serv.log:> ../../../ai/default/daimilitary.c:1489:20: runtime error: signed integer overflow: 100 * 776736900 cannot be represented in type 'int' FC3clang-serv.log:> ../../../ai/default/daimilitary.c:1489:20: runtime error: signed integer overflow: 100 * 128890609 cannot be represented in type 'int'
Tip: Instead of setting those environment variables before running configure, set them as configure parameters (e.g. ../configure CC=clang-14 CXX=clang++-14 ). The benefit is that if some code change causes 'make' to reconfigure the build tree, those parameters are used also in that reconfigure.
Thanks for the tip, i have modified my script.
I launched WW1 scenario , and will let it run for several days (half an hour per turn so 50 turns per day, it uses only one core , even when i tried tex or older threaded ai but this is another topic)
I got another overflow, even with the mitigation of using unsigned int
../../../ai/default/aiunit.c:343:36: runtime error: signed integer overflow: 12260 * 4124961 cannot be represented in type 'int'
I guess we should use float for intermediate computation, and then we will see if the end result overflows.
desire = ( (float) benefit * attack - loss * vuln ) * (float) victim_count * (float) SHIELD_WEIGHTING / (attack + vuln * victim_count);
I found a gcc option : -ftrapv
It will coredump on overflow, thus easy to backtrace, and check values because the problem may be somewhere before with wrong values.
I found the values for city.c overflow very strange : -999999985 -999999990
i attached 2 patches for daimilitary and aiunit for S3_0.
I did nothing for the weird overflow at in city.c with WW1 scenario, because values seems very strange, i'll try to investigate more.
The patch fixes one problem but :
I relaunched WW1 at turn 40 , and i got :
../../../ai/default/daimilitary.c:1489:20: runtime error: 1,94115e+11 is outside the range of representable values of type 'int'
and i have similar message from a saved game (with big map and several nations very strong compared to other, so very dangerous.
../../../ai/default/daimilitary.c:1489:20: runtime error: 7,76737e+10 is outside the range of representable values of type 'int'
another overflow :
../../../ai/default/daimilitary.c:1078:21: runtime error: signed integer overflow: 486 * 4897369 cannot be represented in type 'int'
I deleted my first patch as they don't fix the overflow.
The "best" overflows i got through 38 random games are near 100 * max_int , is also the most frequent one (50 times)
1 > ../../../ai/default/daimilitary.c:1489:20: runtime error: 2.14184e+11 is outside the range of representable values of type 'int' ### ~ 100x max_int !
other are a bit less frequent
1 > ../../../ai/default/daimilitary.c:1492:22: runtime error: signed integer overflow: 100 * 28206721 cannot be represented in type 'int' 1 > ../../../ai/default/daimilitary.c:1078:21: runtime error: signed integer overflow: 758 * 8549776 cannot be represented in type 'int' 1 > ../../../ai/default/aiunit.c:343:56: runtime error: signed integer overflow: 5090 * 640000 cannot be represented in type 'int'
I'll send a patch soon for 3.0 , with a real fix if > 2.109 then =2.109 ...
and a trace in output and log to allow further analyses if needed
for this one
./ai/default/daimilitary.c:1492:22: runtime error: signed integer overflow: 100 * 28206721 cannot be represented in type 'int'I believe the problem is somewhere else The comment in the code several lines above says it is a percentage , so getting more than 109 seems very unexpected.
/* First determine the danger. It is measured in percents of our * defensive strength, capped at 200 + urgency */
several line below the is
CITY_LOG(LOG_DEBUG, pcity, "m_a_c_d urgency=%d danger=%d num_def=%d " "our_def=%d", urgency, danger, num_defenders, our_def);
How can i activate this logging ? ( i commented out line .../server/srv_log.h:88 if (log_do_output_for_level(level)) but i guess there is a cleaner way to do this )
Which branch you are testing on? Just asking as you've been running other tests on S3_0, and not on S3_1 that this ticket is targeted to (and the code changes on the area you've been quoting might make your results relevant for specific branch only - are you finding recent regressions in S3_1 or something that has been effectively fixed there)
I started from the proposed patch for 3.1 , with unsigned int instead of int, and it is not enough, it does not fix the problem
I checked for 3.0, the current latest stable and it obviously need a fix too, probably with the same patch.
I tried to stay on the same ticket, to avoid duplication and confusion
Reply To alain_bkr
I tried to stay on the same ticket
That's exactly what we want - we don't clone the tickets for each branch. I was asking just to know how to interpret your results. Thanks.
This one in S3_1 , compiled with clang
$ git log -1 commit 960c56def698ba8b88a711722d819fcbd51a528f (HEAD -> S3_1, origin/S3_1) Author: Marko Lindqvist <cazfi74@gmail.com> Date: Thu Mar 23 00:29:35 2023 +0200
> Game saved as Sz37-Ai028-Lm15-20230325-100653-T0040-Y-2050-auto.sav.xz > ../../common/city.c:2872:21: runtime error: signed integer overflow: -999999985 * 100 cannot be represented in type 'int' ../../common/city.c:2879:25: runtime error: signed integer overflow: -999999989 * 100 cannot be represented in type 'int' ../../common/city.c:2886:37: runtime error: signed integer overflow: -999999985 * 100 cannot be represented in type 'int' .../... Game saved as Sz37-Ai028-Lm15-20230325-100653-T0196-Y01450-auto.sav.xz > ../../../ai/default/daimilitary.c:1784:20: runtime error: signed integer overflow: 100 * 35414401 cannot be represented in type 'int'
Tests are running , i'll compile more results of all errors i have
3_1 , with clang
i post my test scripts in ticket #47694 ,
I have stepped on these overflows , first column is the count of occurence, i have all saved games
1 ../../../ai/default/aitools.c:378:18: runtime error: signed integer overflow 6 > ../../../ai/default/aiunit.c:344:36: runtime error: signed integer overflow 3 > ../../../ai/default/aiunit.c:344:45: runtime error: signed integer overflow 9 > ../../../ai/default/daimilitary.c:1784:20: runtime error: signed integer overflow 1 > ../../../common/aicore/path_finding.c:175:32: runtime error: signed integer overflow 6 > ../../common/city.c:2872:21: runtime error: signed integer overflow <--- these 3 follow the same pattern as for S3_0 6 ../../common/city.c:2879:25: runtime error: signed integer overflow 6 ../../common/city.c:2886:37: runtime error: signed integer overflow 1 > ../../common/city.c:3156:33: runtime error: signed integer overflow
Attached patches for the original kill_desire() part (i.e. this ticket). Have to check what else in alain_bkr's logs, and create new tickets accordingly.
Change kill_desire() to return adv_want (which is a float) that the callers expect it to be. Should fix problems in both ends of the scale -> distinction between different sub-integer values, and overflows with very high values.
Targeting S3_1 (3.1.0-beta2) for now. We may backport this to S3_0 after 3.0.7 has been released.
3.1 , the patch is not applied (tests are still running )
cat overflow.counted 1 ../../../ai/default/aitools.c:377:33: runtime error: signed integer overflow 5 ../../../ai/default/aitools.c:378:18: runtime error: signed integer overflow 1 ../../../ai/default/aitools.c:378:41: runtime error: signed integer overflow 17 ../../../ai/default/aiunit.c:344:36: runtime error: signed integer overflow 14 ../../../ai/default/aiunit.c:344:45: runtime error: signed integer overflow 4 ../../../ai/default/aiunit.c:344:52: runtime error: signed integer overflow 30 ../../../ai/default/daimilitary.c:1784:20: runtime error: signed integer overflow 3 ../../../ai/default/daimilitary.c:1787:22: runtime error: signed integer overflow 4 ../../../common/aicore/path_finding.c:175:25: runtime error: signed integer overflow 6 ../../../common/aicore/path_finding.c:175:32: runtime error: signed integer overflow 2 ../../../common/aicore/path_finding.c:175:40: runtime error: signed integer overflow 17 ../../common/city.c:2872:21: runtime error: signed integer overflow 17 ../../common/city.c:2879:25: runtime error: signed integer overflow 17 ../../common/city.c:2886:37: runtime error: signed integer overflow 4 ../../common/city.c:3156:33: runtime error: signed integer overflow
I attach the list current of all overflows seen in my tests, it looks like
Game saved as Sz02-Ai002-Lm26-20230326-191857-T0052-Y-3949-auto.sav.xz > ../../../ai/default/daimilitary.c:1784:20: runtime error: signed integer overflow: 100 * 41434969 cannot be represented in type 'int'
I have not yet tested if i can reproduce the overflow by loading the game ...
Reply To alain_bkr
3.1 , the patch is not applied (tests are still running )
I guess you had AI at commit 4363db7ef06d5d768bf2236c057f0c5322cbd701 - can't guess the overall commit, but the AI side has not received many commits recently.
Reply To alain_bkr
3.1 , the patch is not applied (tests are still running )
17 ../../../ai/default/aiunit.c:344:36: runtime error: signed integer overflow
14 ../../../ai/default/aiunit.c:344:45: runtime error: signed integer overflow
4 ../../../ai/default/aiunit.c:344:52: runtime error: signed integer overflow
This is the kill_desire(), and the line numbers show the weakness of my patch (it won't help) - the calculations is still being made in integer math even when we will assign the result to a float.
Reply To alain_bkr
1 ../../../ai/default/aitools.c:377:33: runtime error: signed integer overflow
5 ../../../ai/default/aitools.c:378:18: runtime error: signed integer overflow
1 ../../../ai/default/aitools.c:378:41: runtime error: signed integer overflow
-> #47746
17 ../../../ai/default/aiunit.c:344:36: runtime error: signed integer overflow
14 ../../../ai/default/aiunit.c:344:45: runtime error: signed integer overflow
4 ../../../ai/default/aiunit.c:344:52: runtime error: signed integer overflow
This ticket
30 ../../../ai/default/daimilitary.c:1784:20: runtime error: signed integer overflow
3 ../../../ai/default/daimilitary.c:1787:22: runtime error: signed integer overflow
-> #47747
4 ../../../common/aicore/path_finding.c:175:25: runtime error: signed integer overflow
6 ../../../common/aicore/path_finding.c:175:32: runtime error: signed integer overflow
2 ../../../common/aicore/path_finding.c:175:40: runtime error: signed integer overflow
No ticket about exactly these yet, but #47731 touches the area, and is assumed to end up as a baseline for further fixes.
17 ../../common/city.c:2872:21: runtime error: signed integer overflow
17 ../../common/city.c:2879:25: runtime error: signed integer overflow
17 ../../common/city.c:2886:37: runtime error: signed integer overflow
4 ../../common/city.c:3156:33: runtime error: signed integer overflow
-> #47748
With the now attached patches the calculation should happen with floats from the beginning. Also fixes some accuracy losing float -> int -> float conversions.
Backport to S3_0 attached. I guess that the form this took in the end is sufficient also to that branch, and given effects outside kill_desire() itself even should be included as a bug fix.
Pushed to later branches, S3_0 waiting for 3.0.7 release to get out of the way.
Another example of why we should use long ints for move rate stuff.
AI desire to kill likely weakened by integer overflow in ai/default/aiunit.c:kill_desire()
Depending on input values, instead of returning a big positive value for desire to kill, it returns a negative one.
Even if freeciv is compiled on 64 bit systems, the integer calculation is performed 32 bit.
Random example from a real game:
input values for kill_desire():
Current formula:
desire = ((benefit * attack - loss * vuln) * victim_count * SHIELD_WEIGHTING / (attack + vuln * victim_count));
result: desire = -40270
Ive got 2 ideas how so solve it:
1. Mitigate the overflow, cap the value of (benefit * attack - loss * vuln) * victim_count so that the overflow wont happen, like:
result: desire = 48770
Theoretically the overflow could still happen before multiplying with SHIELD_WEIGHTING, but it would be an improvement over the more easily triggered overflows happening right now. The cases ive seen in real game, would be mitigated.
2. Fix the overflow for systems, that have long to be 64 bit:
desire = (long) (benefit * attack - loss * vuln) * victim_count * SHIELD_WEIGHTING / (attack + vuln * victim_count);
result: desire = 57271
I quickly went over the source code, and i dont see any 64 bit stuff happening. Also i dont know how you handle portability and such in freeciv. Thats why i didnt propose the use of int64_t here. Can you find better solutions?