Why does adding multiple 'nan' keys to a Python dictionary give multiple entries?

March 15, 2010

Example problem:

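    # The original used np.float('nan'); np.float was only an alias of the
    # built-in float and was removed in NumPy 1.24, so float('nan') is used here.
    dc = dict()
    dc[float('nan')] = 100
    dc[float('nan')] = 200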

It creates multiple nan entries, like this:

dc produces {nan: 100, nan: 200}, but it should produce {nan: 200}.

The short answer to the question (of why adding nan keys to a Python dict creates multiple entries) is that floating-point NaN values are unordered, i.e. a NaN value is not equal to, greater than, or less than anything, including itself. This behavior is defined in the IEEE 754 standard for floating-point arithmetic. An explanation of why it was defined this way was given by an IEEE 754 committee member in another answer.
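The unordered behavior is easy to observe from Python itself, since a Python float is an IEEE 754 double on mainstream platforms. A short sketch (the variable name nan is only illustrative):

```python
nan = float('nan')

# NaN is unordered: == and every ordering comparison return False,
# even when both operands are the very same object.
print(nan == nan)                                      # False
print(nan < 1.0, nan > 1.0, nan <= nan, nan >= nan)    # False False False False

# != is the only comparison that returns True.
print(nan != nan)                                      # True
```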

For a longer, Python-specific answer, let's first have a look at how item insertion and key comparison work in CPython dictionaries.

When you do d[key] = val, PyDict_SetItem() is called for dictionary d, which in turn calls the (internal) insertdict() to either update an existing dictionary item or insert a new one (possibly resizing the hash table as a consequence).

The first step on insert is to look up the key in the hash table of dictionary keys. The general-purpose lookup function that gets called in this case (for non-string keys) is lookdict().

lookdict() uses the key's hash value to locate the key, iterating over the candidate keys with an identical hash value, comparing first by address and then by calling the keys' equivalence operator(s) (see the excellent comments in Objects/dictobject.c for more details on hash collision resolution in Python's implementation of open addressing).
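To make the order of checks concrete, here is a heavily simplified pure-Python model of that lookup. It is an illustration, not CPython's code: it uses plain linear probing instead of CPython's perturbation-based probe sequence, never resizes, and the names ToyDict and _find_slot are hypothetical. What it does keep is the order of the checks: identity first, then hash, then ==.

```python
class ToyDict:
    """Hypothetical toy open-addressing table (linear probing, fixed size)."""

    def __init__(self, size=8):
        # Each slot is either None (empty) or a (hash, key, value) triple.
        self._slots = [None] * size

    def _find_slot(self, key):
        """Return the index holding `key`, or the first empty index on its probe path."""
        h = hash(key)
        i = h % len(self._slots)
        for _ in range(len(self._slots)):
            slot = self._slots[i]
            if slot is None:
                return i                      # empty slot: key is not present
            slot_hash, slot_key, _ = slot
            if slot_key is key:
                return i                      # 1) identity check, the cheapest test
            if slot_hash == h and slot_key == key:
                return i                      # 2) same hash and == says equal
            i = (i + 1) % len(self._slots)    # collision: probe the next slot
        raise RuntimeError("toy table is full")

    def __setitem__(self, key, value):
        i = self._find_slot(key)
        if self._slots[i] is None:
            self._slots[i] = (hash(key), key, value)   # insert a new item
        else:
            h, old_key, _ = self._slots[i]
            self._slots[i] = (h, old_key, value)       # update, keep the stored key

    def __getitem__(self, key):
        slot = self._slots[self._find_slot(key)]
        if slot is None:
            raise KeyError(key)
        return slot[2]

    def __len__(self):
        return sum(slot is not None for slot in self._slots)


nan1, nan2 = float('nan'), float('nan')
t = ToyDict()
t[nan1] = 1
t[nan2] = 2    # different object and nan1 != nan2: a second entry
t[nan1] = 3    # identity match: the first entry is updated
print(len(t), t[nan1], t[nan2])    # 2 3 2
```

Ordinary keys behave as usual in this model, because for them == succeeds once the hashes match.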

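In CPython before 3.10, every float('nan') has the same hash value (0), yet each one is a different object (with a different "identity", i.e. memory address), and they are not equal as float values. (Since Python 3.10, the hash of a NaN depends on its identity, so two NaN objects usually hash differently; they still compare unequal, so the outcome is the same.)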

    >>> a, b = float('nan'), float('nan')
    >>> hash(a), hash(b)
    (0, 0)
    >>> id(a), id(b)
    (94753433907296, 94753433907272)
    >>> a == b
    False
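The identity-before-equality shortcut is not specific to dictionaries. The Python language reference defines membership in containers such as lists as `any(x is e or x == e for e in y)`, while a bare == goes straight to the float comparison. A small sketch of the difference:

```python
a, b = float('nan'), float('nan')

print(a is b)      # False: two distinct float objects
print(a == a)      # False: == compares the float values, and NaN != NaN
print(a in [a])    # True: membership checks identity first
print(b in [a])    # False: not identical, and b == a is False
```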

When you say:

    d = dict()
    d[float('nan')] = 1
    d[float('nan')] = 2

lookdict() searches for the second nan by looking at its hash (0) and then tries to resolve the hash collision by iterating over the keys with the same hash: it compares the keys by identity/address (they are different), then invokes the (expensive) PyObject_RichCompareBool()/do_richcompare(), which in turn calls float_richcompare(), which compares floats the way C does:
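The same outcome can be reproduced without floats at all. The class below is hypothetical (AlwaysUnequal is an illustrative name, not part of any library); it mimics the two NaN properties that matter here, namely a shared hash and an == that never returns True. A real dict then behaves as described above.

```python
class AlwaysUnequal:
    """Hypothetical key type mimicking NaN: constant hash, never equal to anything."""

    def __hash__(self):
        return 0        # every instance collides, like NaN before Python 3.10

    def __eq__(self, other):
        return False    # like nan == nan, even when other is self


k1, k2 = AlwaysUnequal(), AlwaysUnequal()
d = {}
d[k1] = 1
d[k2] = 2    # same hash, different identity, __eq__ is False: new entry
d[k1] = 3    # same identity: matched before __eq__ is consulted, so update
print(len(d), d[k1], d[k2])    # 2 3 2
```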

    /* Comparison is pretty much a nightmare.  When comparing float to float,
     * we do it as straightforwardly (and long-windedly) as conceivable, so
     * that, e.g., Python x == y delivers the same result as the platform
     * C x == y when x and/or y is a NaN.

which behaves according to the IEEE 754 standard (from the GNU C Library docs):

20.5.2 Infinity and NaN

[...]

The basic operations and math functions all accept infinity and NaN and produce sensible output. Infinities propagate through calculations as one would expect: for example, 2 + ∞ = ∞, 4/∞ = 0, atan (∞) = π/2. NaN, on the other hand, infects any calculation that involves it. Unless the calculation would produce the same result no matter what real value replaced NaN, the result is NaN.

In comparison operations, positive infinity is larger than all values except itself and NaN, and negative infinity is smaller than all values except itself and NaN. NaN is unordered: it is not equal to, greater than, or less than anything, including itself. x == x is false if the value of x is NaN. You can use this to test whether a value is NaN or not, but the recommended way to test for NaN is with the isnan function (see Floating Point Classes). In addition, <, >, <=, and >= will raise an exception when applied to NaNs.

and so float_richcompare() returns False for nan == nan.
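Assuming Python floats are IEEE 754 doubles (true on mainstream platforms, since CPython uses the platform's C double), the quoted rules can be checked directly from Python. Note that Python's <, >, <= and >= simply return False for NaN operands instead of raising; the "exception" in the glibc text appears to refer to the floating-point invalid-operation flag, which Python does not surface here.

```python
import math

inf, nan = math.inf, math.nan

# Infinities propagate as one would expect.
print(2 + inf == inf)                               # True
print(4 / inf == 0.0)                               # True
print(math.isclose(math.atan(inf), math.pi / 2))    # True

# Infinity is ordered against every value except NaN.
print(inf > 1e308, -inf < -1e308)                   # True True
print(inf > nan, inf < nan, inf == nan)             # False False False

# NaN infects arithmetic unless the result cannot depend on it.
print(math.isnan(nan * 0))                          # True
print(nan ** 0)                                     # 1.0

# Two ways to test for NaN; math.isnan is the readable one.
x = nan
print(x != x, math.isnan(x))                        # True True
```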

That's why Python decides that the second nan object is worthy of a new dictionary entry: it may have the same hash, but its address and its equivalence test say it is different from every other nan object.

However, note that if you use the same nan object (with the same address), you'll get the expected behavior, since the address is tested before float equivalence:

    >>> nan = float('nan')
    >>> d = dict()
    >>> d[nan] = 1
    >>> d[nan] = 2
    >>> d
    {nan: 2}
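Building on that observation, one way to make every NaN key collapse into a single entry is to replace each NaN with one shared object before using it as a key. This is a minimal sketch under that assumption: the names canonical_key and _CANONICAL_NAN are hypothetical, and the helper only looks at float instances (including float subclasses).

```python
import math

_CANONICAL_NAN = float('nan')   # hypothetical: the one NaN object all NaN keys map to


def canonical_key(key):
    """Hypothetical helper: map any float NaN to the shared _CANONICAL_NAN object."""
    if isinstance(key, float) and math.isnan(key):
        return _CANONICAL_NAN
    return key


dc = {}
dc[canonical_key(float('nan'))] = 100
dc[canonical_key(float('nan'))] = 200
print(dc)                                  # {nan: 200}
print(dc[canonical_key(float('nan'))])     # 200
```

The trade-off is that every insert and lookup has to go through the helper; a plain dc[float('nan')] lookup with a fresh NaN object will still miss.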
