浮点数比较

2025-11-12 23:58:10

【转】如何比较两个浮点数

https://bot-man-jl.github.io/articles/?post=2020/Comparing-Floating-Point-Numbers

比较两个浮点数是否相等并不是一个简单的问题：

由于浮点数的精度误差，一般不能使用绝对相等比较；基于绝对误差的近似相等比较要求使用者对允许的误差范围有明确的预期，并不通用；基于相对误差的近似相等比较看似通用，但又可能会在 0 附近 “栽跟头”。。。

本文主要参考了 “浮点数专家” Bruce Dawson 写的 Comparing Floating Point Numbers, 2012 Edition，也推荐他关于浮点数的其他文章（最近还一直在更新 👍）。

本文提到的 Equals() / AlmostEqualsAbs() / AlmostEqualsRel() / AlmostEqualsUlp() 函数的具体实现参考 almost_equals.h（在线演示）👈

绝对相等？不靠谱

比较两个数是否相等，最直观的比较方法就是直接使用 == 进行比较：

f1 == f2 // Equals()

根据 IEEE 754 浮点数标准，浮点数 \pm p \cdot b^e±p⋅be（其中 bb 为固定的基数，\pm± 为符号位，pp 为尾数，ee 为指数）表示为符号位-指数-尾数形式：

十进制浮点数（b == 10）就是数学中常用的科学计数法 (scientific notation)二进制浮点数（b = 2）常用于计算机中的表示和存储，其中单精度 (single-precision) 对应 32 位，双精度 (double-precision) 对应 64 位

在 C++ 中，0.1 属于双精度浮点数，0.1f 属于单精度浮点数。

为什么计算机使用二进制浮点数？因为基于现有的 CPU 架构，二进制运算效率更高。

虽然在数学上，实数是连续的 —— 任意十进制数和二进制数可以相互转换，它们的浮点数表示形式也可以相互转换：

十进制定点数十进制浮点数二进制定点数二进制浮点数0.1251.25 \times 10^{-1}1.25×10−10.001(2^{0} + 0) \times 2^{-3}20+0)×2−30.11.0 \times 10^{-1}1.0×10−10.000110011... （0011 循环）(2^{0} + 2^{-1} + 2^{-4} + 2^{-5} + ...) \times 2^{-4}(20+2−1+2−4+2−5+...)×2−4

但是在计算机中，数值是离散的 —— 有限精度的二进制浮点数不能准确表示所有的十进制数（在线转换工具）：

十进制数 0.1 对应的二进制浮点数的尾数 pp 是 1.100110011...（0011 循环）由于 32 位单精度浮点数的尾数 pp 只有 23 位，只能保留小数点后 23 位（即 1.10011001100110011001101）所以，近似表示的二进制数 0.000110011001100110011001101 约等于十进制数 0.10000000149，出现误差

存在问题 —— 有限精度的浮点数转换近似表示会带来误差，而浮点数运算则会放大误差，所以判断两个浮点数（其中至少一个是运算结果）的绝对相等往往是不可靠的：

float sum = 0.0f;

for (int i = 0; i < 10; ++i) { sum += 0.1f; }

assert(Equals(sum, 1.0f) == false); // expect true

为此，编译器会警告浮点数的 ==/!= 比较：

warning: comparing floating point with == or != is unsafe [-Wfloat-equal]

解决办法 —— 由于误差的存在，只能比较两个浮点数是否近似相等：

assert(AlmostEqualsAbs(sum, 1.0f) == true);

assert(AlmostEqualsRel(sum, 1.0f) == true);

assert(AlmostEqualsUlp(sum, 1.0f) == true);

近似相等 —— 绝对误差

最简单的比较方法是直接比较两数差值的绝对值是否小于一个允许的误差值 epsilon：

fabs(f1 - f2) <= epsilon // AlmostEqualsAbs()

其中，epsilon 常用 FLT_EPSILON / DBL_EPSILON，表示在单精度 1.0f / 双精度 1.0 附近的最小误差。

存在问题 —— 由于绝对误差是一个固定值，不一定适用于任意浮点数的比较：

assert(AlmostEqualsAbs(67329.234f, 67329.242f) == false); // expect true

assert(AlmostEqualsAbs(1.2e-32f, 2.4e-32f) == true); // expect false

当比较的两个数远大于 1 时，FLT_EPSILON/DBL_EPSILON 允许的误差过小（例如，虽然 67329.234f 产生误差后会被近似表示为最近的下一个浮点数 67329.242f，但差值远大于 FLT_EPSILON，所以被误判为不相等 🙃）当比较的两个数远小于 1 时，FLT_EPSILON/DBL_EPSILON 允许的误差过大（例如，尽管 1.2e-32f 和 2.4e-32f 相差了一倍，应该被认为不相等，但差值远小于 FLT_EPSILON，所以被误判为近似相等 🙃）

解决办法 —— 需要借助相对误差弥补上述局限性：

assert(AlmostEqualsRel(67329.234f, 67329.242f) == true);

assert(AlmostEqualsUlp(67329.234f, 67329.242f) == true);

assert(AlmostEqualsRel(1.2e-32f, 2.4e-32f) == false);

assert(AlmostEqualsUlp(1.2e-32f, 2.4e-32f) == false);

近似相等 —— 相对误差

a) 一种比较方法是把绝对的 epsilon 放缩到待比较数的量级 (magnitude)，即 epsilon 乘以两数绝对值的较大值：

fabs(f1 - f2) <= epsilon * std::max(fabs(f1), fabs(f2)) // AlmostEqualsRel()

然而，这种比较方法：

一方面可能下溢 (underflow) —— 如果两数的绝对值都很小，那么乘上 epsilon 后可能得到 0.0，导致判断失败另一方面并不高效 —— 需要进行多次浮点数运算（除了求绝对值外，两次比较、一次乘法、一次减法）

所以，有没有更好的比较方法呢？

b) 另一种方法是基于浮点数二进制表示的 ULP (unit in the last place) 进行比较：

根据 IEEE 754 标准，相邻两个浮点数的二进制表示恰好也是相邻的 —— 相邻两个浮点数之间的距离就是 “它们的 ULP”（定义参考：What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg）例如，距离 1.0f 最近的下一个浮点数是 nextafter(1.0f)，恰好等于 1.0f + FLT_EPSILON，所以 ULP(1.0f) 等于 FLT_EPSILON（即 2^{-23}2−23）：

十进制定点数二进制浮点数二进制表示1.0f(2^{0} + 0) \times 2^{0}(20+0)×200 01111111 00000000000000000000000nextafter(1.0f)(2^{0} + 2^{-23}) \times 2^{0}(20+2−23)×200 01111111 00000000000000000000001

直观上，比较方法很简单 —— 比较浮点数的二进制表示的差值的绝对值是否小于一个允许的 ULP 误差 max_ulp：

abs(Biased(Bits(f1)) - Biased(Bits(f2))) <= max_ulp // AlmostEqualsUlp()

首先，将两个单精度/双精度浮点数直接按位转成 32/64 位整数然后，由于这两个整数表示为符号位-数值 (sign-magnitude) 格式，导致 +0/-0 的表示形式不一致，所以需要转换为连续范围上的无符号整数最后，再比较两个无符号整数差值的绝对值是否小于 max_ulp

相对于放缩 epsilon 的方法，这种比较方法：

一方面没有浮点数乘法的下溢问题另一方面在多数现代处理器上，运算效率更高（计算次数更少，只需要进行整数运算，不涉及任何浮点数运算）

根据 gtest-internal.h，一般 max_ulp 取值为 4：

The maximum error of a single floating-point operation is 0.5 units in the last place. On Intel CPU's, all floating-point calculations are done with 80-bit precision, while double has 64 bits. Therefore, 4 should be enough for ordinary use.

另外，在实际代码中，还需要处理 +∞/-∞ 和 NaN (not a number) 等情况。（测试用例参考 gtest_unittest.cc）

再举个例子，4.0f 和它的下一个浮点数 nextafter(4.0f) 之间相差了 4 倍的 FLT_EPSILON（即 ULP(4.0f) == 4 FLT_EPSILON）：

十进制定点数二进制浮点数二进制表示4.0f(2^{0} + 0) \times 2^{2}(20+0)×220 10000001 00000000000000000000000nextafter(4.0f)(2^{0} + 2^{-23}) \times 2^{2}(20+2−23)×220 10000001 00000000000000000000001

所以，无法使用绝对误差判断 4.0f 和 nextafter(4.0f) 是否近似相等，只能使用相对误差：

const float next_after_4 = nextafter(4.0f, FLT_MAX);

assert(AlmostEqualsAbs(4.0f, next_after_4) == false); // expect true

assert(AlmostEqualsRel(4.0f, next_after_4) == true);

assert(AlmostEqualsUlp(4.0f, next_after_4) == true);

存在问题 —— 如果需要比较的数值非常接近 0，就无法使用相对误差和 0 比较：

assert(AlmostEqualsRel(FLT_EPSILON, 0.0f) == false); // expect true

assert(AlmostEqualsUlp(FLT_EPSILON, 0.0f) == false); // expect true

虽然 nextafter(1.0f) 和 1.0f 相差 FLT_EPSILON，两者近似相等但是 FLT_EPSILON（即 nextafter(1.0f) - 1.0f）和 0 相比，并不近似相等 🙃

即使放大允许的误差范围，也不一定可行 —— 因为和 0 比起来，“相对误差” 还是太大了 —— 在这种情况下，基于相对误差的比较方法往往是没有实际意义的：

assert(AlmostEqualsRel(FLT_EPSILON, 0.0f, 0.1f) == false); // expect true

assert(AlmostEqualsUlp(FLT_EPSILON, 0.0f, 1000) == false); // expect true

解决办法 —— 只能使用绝对误差进行 “有意义” 的比较：

assert(AlmostEqualsAbs(FLT_EPSILON, 0.0f) == true);

常量的陷阱

需要注意，从低精度转为高精度（从 float 转为 double）会导致 “巨大” 的误差：

assert(Equals(0.1f, 0.1) == false); // expect true

assert(AlmostEqualsAbs(0.1f, 0.1) == false); // expect true

assert(AlmostEqualsRel(0.1f, 0.1) == false); // expect true

assert(AlmostEqualsUlp(0.1f, 0.1) == false); // expect true

为此，可以通过放大允许的误差，缓解精度丢失的问题：

assert(AlmostEqualsAbs(0.1f, 0.1, static_cast(FLT_EPSILON)) ==

true);

assert(AlmostEqualsRel(0.1f, 0.1, static_cast(FLT_EPSILON)) ==

true);

assert(AlmostEqualsUlp(0.1f, 0.1, FLT_EPSILON / DBL_EPSILON) == true);

相反，从高精度转为低精度（从 double 转为 float）则会抹除细微的误差：

assert(Equals(static_cast(0.1), 0.1f) == true);

另外，由于精度的限制，两个非常接近（相差 0 ULP）的十进制定点数字面量 (literal)，只能表示为相同的二进制浮点数：

assert(Equals(67329.234f, 67329.235f) == true); // expect false

写在最后

Bruce Dawson 的总结是 “没有银弹” —— 在比较两个浮点数是否近似相等时，需要根据具体场景，选择更合适的比较方法：

如果要判断一个浮点数是否接近 0

应该使用绝对误差因为此时的相对误差没有实际意义如果要比较的两个浮点数都不是 0

一般使用更通用的相对误差但如果能提前判断允许的误差范围，也可以使用绝对误差所以，并不存在判断任意两个浮点数的通用方法 🙃

现实世界也一样 ——

对于有钱的人来说，AlmostEquals(百元钞票, 0) == true对于没钱的人来说，AlmostEquals(一元硬币, 0) == false

最后，今年双十一将抽取一名在下方留言的幸运读者，送出一张玛莎拉蒂 5 元代金券 🙃

（图片来自网络）

如果有什么问题，欢迎交流。😄

// https://bot-man-jl.github.io/articles/?post=2020/Comparing-Floating-Point-Numbers

#include

template

bool Equals(Flp f1, Flp f2) {

return f1 == f2;

}

template

bool AlmostEqualsAbs(Flp f1,

Flp f2,

Flp epsilon = std::numeric_limits::epsilon()) {

return fabs(f1 - f2) <= epsilon;

}

template

bool AlmostEqualsRel(Flp f1,

Flp f2,

Flp epsilon = std::numeric_limits::epsilon()) {

return fabs(f1 - f2) <= epsilon * std::max(fabs(f1), fabs(f2));

}

namespace detail {

// Reference:

// https://github.com/google/googletest/blob/master/googletest/include/gtest/internal/gtest-internal.h

template

class FloatingPoint {

public:

FloatingPoint(const Flp& x, size_t max_ulps = 4) : max_ulps_(max_ulps) {

u_.value_ = x;

}

// - returns false if either number is (or both are) NAN.

// - treats really large numbers as almost equal to infinity.

// - thinks +0.0 and -0.0 are 0 DLP's apart.

bool AlmostEquals(const FloatingPoint& rhs) const {

if (is_nan() || rhs.is_nan())

return false;

const Bits biased1 = SignAndMagnitudeToBiased();

const Bits biased2 = rhs.SignAndMagnitudeToBiased();

return (std::max(biased1, biased2) - std::min(biased1, biased2)) <=

max_ulps_;

}

private:

using Bits = std::conditional_t<

sizeof(Flp) == 4,

uint32_t,

std::conditional_t>;

static_assert(sizeof(Bits) == 4 || sizeof(Bits) == 8, "only float or double");

static constexpr size_t kBitCount = 8 * sizeof(Flp);

static constexpr size_t kFractionBitCount =

std::numeric_limits::digits - 1;

static constexpr size_t kExponentBitCount = kBitCount - 1 - kFractionBitCount;

static constexpr Bits kSignBitMask = static_cast(1) << (kBitCount - 1);

static constexpr Bits kFractionBitMask = ~static_cast(0) >>

(kExponentBitCount + 1);

static constexpr Bits kExponentBitMask = ~(kSignBitMask | kFractionBitMask);

bool is_nan() const {

// It's a NAN if the exponent bits are all ones and the fraction

// bits are not entirely zeros.

return ((kExponentBitMask & u_.bits_) == kExponentBitMask) &&

((kFractionBitMask & u_.bits_) != 0);

}

// Converts an integer from the sign-and-magnitude representation to

// the biased representation. More precisely, let N be 2 to the

// power of (kBitCount - 1), an integer x is represented by the

// unsigned number x + N.

// For instance,

// -N + 1 (the most negative number representable using

// sign-and-magnitude) is represented by 1;

// 0 is represented by N; and

// N - 1 (the biggest number representable using

// sign-and-magnitude) is represented by 2N - 1.

// Read http://en.wikipedia.org/wiki/Signed_number_representations

// for more details on signed number representations.

Bits SignAndMagnitudeToBiased() const {

return (kSignBitMask & u_.bits_)

? (~u_.bits_ + 1) // represents a negative number.

: (kSignBitMask | u_.bits_); // represents a positive number.

}

union FloatingPointUnion {

Flp value_;

Bits bits_;

};

FloatingPointUnion u_;

size_t max_ulps_ = 0;

};

} // namespace detail

template

bool AlmostEqualsUlp(Flp lhs, Flp rhs, size_t max_ulps = 4) {

using FloatingPoint = detail::FloatingPoint;

return FloatingPoint(lhs, max_ulps).AlmostEquals(FloatingPoint(rhs));

}

"如果要判断一个浮点数是否接近 0 应该使用绝对误差因为此时的相对误差没有实际意义" 猜测“因为此时的相对误差没有实际意义”中，“没有实际意义”的原因：是不是因为“相对误差”中的“相对”其实是不成立的？因为，0与任何数的减法都是这个数它自己，这个过程没有产生出“相对”，所以，“没有实际意义”。任何数“相对”于0的失效。（包含0自己吗？）

魔兽世界获取飞行能力的条件是什么飞行技能怎么学习
韩国 2025-2026 赛季足球衫

2010世界杯主题曲_世界杯非洲预选赛 - fsyxyy.com

浮点数比较

最新发表

友情链接