Why does adding multiple 'nan' keys to a Python dictionary give multiple entries?
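Example problem:
dc = dict()
dc[float('nan')] = 100  # np.float, a deprecated alias of the built-in float, no longer exists in NumPy >= 1.24
dc[float('nan')] = 200
This creates multiple nan entries: printing dc gives {nan: 100, nan: 200}, and dc.keys() gives dict_keys([nan, nan]).
It should create {nan: 200}.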
The short answer to your question (of why adding NaN keys to a Python dict creates multiple entries) is that floating-point NaN values are unordered, i.e. a NaN value is not equal to, greater than, or less than anything, including itself. This behavior is defined in the IEEE 754 standard for floating-point arithmetic. An explanation of why is given by an IEEE 754 committee member in this answer.
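The IEEE 754 rule is easy to observe from Python. This is a minimal sketch; it assumes CPython, whose float wraps a C double, which is an IEEE 754 double on mainstream platforms:

```python
import math

x = float('nan')

# IEEE 754: NaN compares unequal to everything, itself included
print(x == x)            # False
print(x != x)            # True
print(x < 1.0, x > 1.0)  # False False

# The recommended NaN test is math.isnan, not ==
print(math.isnan(x))     # True
```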
For a longer, Python-specific answer, let's first have a look at how item insertion and key comparison work in CPython dictionaries.
When you do d[key] = val, PyDict_SetItem() for the dictionary d is called, which in turn calls the (internal) insertdict(). That function will either update an existing dictionary item or insert a new item (possibly resizing the hash table as a consequence).
The first step on insert is to look up the key in the hash table of dictionary keys. The general-purpose lookup function that gets called in your case (of non-string keys) is lookdict().
lookdict() uses the key's hash value to locate the key, iterating over the candidate keys with an identical hash value, comparing first by address and then calling the keys' equivalence operator(s) (see the excellent comments in Objects/dictobject.c for more details on hash collision resolution in Python's implementation of open addressing).
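To make the identity-then-equality rule concrete, here is a much-simplified hash table in pure Python. It is a sketch under stated assumptions, not CPython's lookdict(): the class name ToyDict is hypothetical, and it uses plain linear probing over a fixed-size table with no resizing or deletion (CPython uses a perturbation-based probe sequence). Only the key-matching test is modeled on the description above: identity first, then hash, then ==.

```python
class ToyDict:
    """Simplified open-addressing table (hypothetical, for illustration).

    Assumptions: linear probing, fixed size, no resizing, no deletion.
    The matching rule mirrors the one described for lookdict():
    identity first, then hash, then ==.
    """

    def __init__(self, size=8):
        self.slots = [None] * size  # each slot: (hash, key, value) or None

    def _find_slot(self, key):
        h = hash(key)
        i = h % len(self.slots)
        for _ in range(len(self.slots)):
            entry = self.slots[i]
            if entry is None:
                return i  # empty slot: the key is not present
            eh, ek, _ = entry
            # identity check first, then hash, then the (possibly expensive) ==
            if ek is key or (eh == h and ek == key):
                return i
            i = (i + 1) % len(self.slots)  # collision: probe the next slot
        raise RuntimeError("table full")

    def __setitem__(self, key, value):
        i = self._find_slot(key)
        entry = self.slots[i]
        if entry is None:
            self.slots[i] = (hash(key), key, value)  # insert a new item
        else:
            self.slots[i] = (entry[0], entry[1], value)  # update, keep old key

    def __len__(self):
        return sum(entry is not None for entry in self.slots)


# Two distinct NaN objects: never identical, never ==, so two entries
t = ToyDict()
t[float('nan')] = 1
t[float('nan')] = 2
print(len(t))  # 2

# One shared NaN object: the identity check matches, so the value is updated
nan = float('nan')
u = ToyDict()
u[nan] = 1
u[nan] = 2
print(len(u))  # 1
```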
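In CPython before 3.10, every float('nan') has the same hash value, 0. (Since 3.10, a NaN's hash is derived from its object identity, so distinct NaNs usually hash differently; the end result is the same.) Yet each one is a different object (with a different "identity", i.e. memory address), and they're not equal by float value:
>>> a, b = float('nan'), float('nan')
>>> hash(a), hash(b)
(0, 0)
>>> id(a), id(b)
(94753433907296, 94753433907272)
>>> a == b
False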
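The same effect can be reproduced without floats at all. The class NeverEqual below is a hypothetical stand-in for NaN: every instance has the same hash (like NaN before Python 3.10) and __eq__ always returns False. The built-in dict then keeps distinct instances apart, and finds an existing instance only through the identity check.

```python
class NeverEqual:
    """Hypothetical NaN look-alike: constant hash, never equal to anything."""

    def __hash__(self):
        return 0

    def __eq__(self, other):
        return False


d = {}
a, b = NeverEqual(), NeverEqual()
d[a] = 1
d[b] = 2       # same hash, different object, never equal: a new entry
d[a] = 3       # same object: found by the identity check, so it is updated
print(len(d))  # 2
print(d[a])    # 3
```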
So when you say:
d = dict()
d[float('nan')] = 1
d[float('nan')] = 2
lookdict() will search for the second nan by looking at its hash (0), trying to resolve the hash collision by iterating over the keys with the same hash and comparing them by identity/address (they are different), and then invoking the (expensive) PyObject_RichCompareBool/do_richcompare, which in turn calls float_richcompare, which compares floats just as C does:
/* Comparison is pretty much a nightmare.  When comparing float to float,
 * we do it as straightforwardly (and long-windedly) as conceivable, so
 * that, e.g., Python x == y delivers the same result as the platform
 * C x == y when x and/or y is a NaN.
and C behaves according to the IEEE 754 standard (from the GNU C Library docs):
20.5.2 Infinity and NaN
[...]
The basic operations and math functions all accept infinity and NaN and produce sensible output. Infinities propagate through calculations as one would expect: for example, 2 + ∞ = ∞, 4/∞ = 0, atan (∞) = π/2. NaN, on the other hand, infects any calculation that involves it. Unless the calculation would produce the same result no matter what real value replaced NaN, the result is NaN.
In comparison operations, positive infinity is larger than all values except itself and NaN, and negative infinity is smaller than all values except itself and NaN. NaN is unordered: it is not equal to, greater than, or less than anything, including itself. x == x is false if the value of x is NaN. You can use this to test whether a value is NaN or not, but the recommended way to test for NaN is with the isnan function (see Floating Point Classes). In addition, <, >, <=, and >= will raise an exception when applied to NaNs.
So the C comparison returns false for nan == nan.
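The same rules are visible through Python's float operators. One difference from the glibc wording: the "exception" raised by <, >, <= and >= on NaNs is an IEEE 754 floating-point exception (the invalid-operation flag), not a Python exception, so in Python these comparisons simply return False. This sketch loops over the six comparison operators from the standard operator module:

```python
import math
import operator

nan, inf = math.nan, math.inf

ops = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}

# Every comparison involving NaN is False, except != which is True
for name, op in ops.items():
    print(f"nan {name} nan -> {op(nan, nan)}")

# Infinity is larger than every value except itself and NaN
print(inf > 1e308, inf > inf, inf > nan)  # True False False
```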
That's why Python decides the second nan object is worthy of a new dictionary entry. It may have the same hash, but its address and the equivalence test both say it is different from all the other nan objects.
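A short sketch of the practical consequence, using only built-ins: both NaN keys stay in the dictionary, and a fresh NaN created for a lookup matches neither of them, so the stored values can only be reached by iterating.

```python
d = dict()
d[float('nan')] = 1
d[float('nan')] = 2

print(len(d))               # 2
# A third, fresh NaN object is neither identical nor == to either key
print(float('nan') in d)    # False
print(d.get(float('nan')))  # None
# The entries still exist; they are just unreachable by key lookup
print(list(d.values()))     # [1, 2]
```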
However, note that if you use the same nan object (with the same address), you'll get the expected behavior, since the address is tested before float equivalence:
>>> nan = float('nan')
>>> d = dict()
>>> d[nan] = 1
>>> d[nan] = 2
>>> d
{nan: 2}