Why does adding multiple 'nan' keys to a Python dictionary give multiple entries?


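Example problem:

    import numpy as np

    dc = dict()
    # np.float was an alias for the built-in float and was removed in
    # NumPy 1.24; on newer versions use float('nan') instead.
    dc[np.float('nan')] = 100
    dc[np.float('nan')] = 200

It creates multiple entries for nan.

Printing dc gives {nan: 100, nan: 200}, but it should give {nan: 200}.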

The short answer to your question (of why adding NaN keys to a Python dict creates multiple entries) is that floating-point NaN values are unordered, i.e. a NaN value is not equal to, greater than, or less than anything, including itself. This behavior is defined in the IEEE 754 standard for floating-point arithmetic. An explanation of why that is so is given by an IEEE 754 committee member in a separate answer.
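The unordered behavior can be checked directly from Python. This is a minimal sketch using only the standard library; the variable names are chosen here for illustration.

```python
import math

nan = float("nan")

# Every equality and ordering comparison that involves NaN is False...
comparisons = [nan == nan, nan < nan, nan > nan, nan <= 1.0, nan >= 1.0]
print(comparisons)        # all False

# ...and != is the one comparison that is True.
print(nan != nan)         # True

# The reliable NaN test is math.isnan, not an equality check.
print(math.isnan(nan))    # True
```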


For a longer, Python-specific answer, let's first have a look at how item insertion and key comparison work in CPython dictionaries.

When you say d[key] = val, PyDict_SetItem() is called for dictionary d, which in turn calls the (internal) insertdict(), which either updates an existing dictionary item or inserts a new item (possibly resizing the hash table as a consequence).

The first step on an insert is to look up the key in the hash table of dictionary keys. The general-purpose lookup function that gets called in your case (of non-string keys) is lookdict().

lookdict() uses the key's hash value to locate the key, iterating over the list of possible keys with an identical hash value, comparing first by address and then by calling the keys' equivalence operator(s) (see the excellent comments in Objects/dictobject.c for more details on hash collision resolution in Python's implementation of open addressing).
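The lookup order described above (hash first, then identity, then equality) can be modelled in a few lines of Python. This is only a toy sketch: toy_lookup and toy_setitem are hypothetical names, and the flat list of slots stands in for the real hash table without reproducing CPython's layout or probing sequence.

```python
def toy_lookup(slots, key):
    """Return the index of the slot matching `key`, or None.

    `slots` is a list of (hash, key, value) tuples: a simplified stand-in
    for the hash table, not CPython's actual data structure.
    """
    key_hash = hash(key)
    for index, (stored_hash, stored_key, _value) in enumerate(slots):
        if stored_hash != key_hash:
            continue                      # different hash: cannot match
        if stored_key is key:
            return index                  # same object: match, == never runs
        if stored_key == key:
            return index                  # different object, but equal
    return None


def toy_setitem(slots, key, value):
    """Update the matching slot, or append a new one if none matches."""
    index = toy_lookup(slots, key)
    if index is None:
        slots.append((hash(key), key, value))
    else:
        stored_hash, stored_key, _old = slots[index]
        slots[index] = (stored_hash, stored_key, value)


slots = []
toy_setitem(slots, 1, "a")
toy_setitem(slots, 1.0, "b")              # same hash and 1 == 1.0: update
toy_setitem(slots, float("nan"), "c")
toy_setitem(slots, float("nan"), "d")     # never identical or equal: new slot
print(len(slots))                         # 3
```

Because the identity check comes before the equality check, reusing one NaN object updates its slot, while a fresh NaN object always falls through both checks and gets a slot of its own.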

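Every float('nan') has the same hash value (0 in the CPython version used here; since Python 3.10 the hash of a NaN is derived from the object's identity instead), yet each one is a different object (with a different "identity", i.e. memory address), and they're not equal by float value: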

    >>> a, b = float('nan'), float('nan')
    >>> hash(a), hash(b)
    (0, 0)
    >>> id(a), id(b)
    (94753433907296, 94753433907272)
    >>> a == b
    False

So when you say:

    d = dict()
    d[float('nan')] = 1
    d[float('nan')] = 2

lookdict() searches for the second NaN by looking at its hash (0), then tries to resolve the hash collision by iterating over the keys with the same hash and comparing the keys by identity/address (they are different), then invokes the (expensive) PyObject_RichCompareBool/do_richcompare, which in turn calls float_richcompare, which compares floats just as C does:

    /* Comparison is pretty much a nightmare.  When comparing float to float,
     * we do it as straightforwardly (and long-windedly) as conceivable, so
     * that, e.g., Python x == y delivers the same result as the platform
     * C x == y when x and/or y is a NaN.

which behaves according to the IEEE 754 standard (from the GNU C Library docs):

20.5.2 Infinity and NaN

[...]

The basic operations and math functions all accept infinity and NaN and produce sensible output. Infinities propagate through calculations as one would expect: for example, 2 + ∞ = ∞, 4/∞ = 0, atan (∞) = π/2. NaN, on the other hand, infects any calculation that involves it. Unless the calculation would produce the same result no matter what real value replaced NaN, the result is NaN.

In comparison operations, positive infinity is larger than all values except itself and NaN, and negative infinity is smaller than all values except itself and NaN. NaN is unordered: it is not equal to, greater than, or less than anything, including itself. x == x is false if the value of x is NaN. You can use this to test whether a value is NaN or not, but the recommended way to test for NaN is with the isnan function (see Floating Point Classes). In addition, <, >, <=, and >= will raise an exception when applied to NaNs.

and which returns false for NaN == NaN.

That's why Python decides that the second NaN object is worthy of a new dictionary entry. It may have the same hash, but its address and the equivalence test say it is different from all the other NaN objects.

However, note that if you always use the same NaN object (with the same address), then, since the address is tested before float equivalence, you'll get the expected behavior:

    >>> nan = float('nan')
    >>> d = dict()
    >>> d[nan] = 1
    >>> d[nan] = 2
    >>> d
    {nan: 2}
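One way to rely on this in practice is to route every NaN key through a single shared NaN object before it reaches the dictionary. The following is a minimal sketch of that idea, not an established API: CANONICAL_NAN and normalize_key are hypothetical names introduced here for illustration.

```python
import math

# Hypothetical module-level constant: the one NaN object used for every NaN key.
CANONICAL_NAN = float("nan")


def normalize_key(key):
    """Map any float NaN to the shared NaN object; pass other keys through."""
    if isinstance(key, float) and math.isnan(key):
        return CANONICAL_NAN
    return key


d = {}
d[normalize_key(float("nan"))] = 1
d[normalize_key(float("nan"))] = 2         # same object as before: updates
d[normalize_key(0.5)] = 3
print(len(d))                              # 2
print(d[normalize_key(float("nan"))])      # 2
```

This works because the identity check succeeds before any float comparison is attempted, so the fact that NaN is never equal to itself does not come into play.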
