Ticket #4217 - python speedup suggestion: avoid use of 'if mykey in mydict.keys()'

Post here for help on using FreeCAD's graphical user interface (GUI).
Forum rules
and Helpful information
IMPORTANT: Please click here and read this first, before asking for help

Also, be nice to others! Read the FreeCAD code of conduct!
Post Reply
apelleti
Posts: 1
Joined: Fri Dec 06, 2019 1:04 pm

Ticket #4217 - python speedup suggestion: avoid use of 'if mykey in mydict.keys()'

Post by apelleti »

https://www.freecadweb.org/tracker/view.php?id=4217

I see ~70 occurrences in FreeCAD of a simple python coding problem that causes a big degradation in performance, described below.

Doing a python dictionary hash lookup the recommended way:

Code: Select all

# test if 'somekey' is in dictionary mydict. order-1
if "somekey" in mydict:
    do_something()
Doing a python dictionary hash lookup the wrong way:

Code: Select all

# walk the dictionary to compile an unsorted list of keys: order-N,
# then brute force search for desired key in this unsorted set: order-N
if "somekey" in mydict.keys():
    do_something()

Performance comparison:

Code: Select all

# good way:
import random
mydict={k: random.random() for k in range(200000)}
for i in mydict:
    if i not in mydict:
        print("FAIL!")

Runtime=0.07 seconds to test 200k keys exist in the dictionary of 200k entries.

Code: Select all

# bad way:
import random
mydict={k: random.random() for k in range(200000)}
for i in mydict:
    if i not in mydict.keys():
        print("FAIL!")

Runtime=11min 29sec.....10000 times slower!! :-(


You can do a simple search of the whole codebase for occurrences of this probelm using:

Code: Select all

 find . -name "*.py" | xargs egrep "if.* in .*\.keys"
Last edited by Kunda1 on Tue Jan 14, 2020 8:00 pm, edited 1 time in total.
Reason: Added ticket number to thread title
vocx
Veteran
Posts: 5197
Joined: Thu Oct 18, 2018 9:18 pm

Re: python speedup suggestion: avoid use of 'if mykey in mydict.keys()'

Post by vocx »

apelleti wrote: Fri Dec 06, 2019 2:35 pm ...

You can do a simple search of the whole codebase for occurrences of this probelm using:

Code: Select all

 find . -name "*.py" | xargs egrep "if.* in .*\.keys"
Premature optimization is not a good thing. Sure this could be changed but is it really a problem right now?

Also, since you seem to be interested in efficiency, I don't understand why people use xargs with find. Find can already run commands with the -exec option.

This creates a single command with many arguments, which is efficient.

Code: Select all

find . -name "*.py" -exec egrep "if.* in .*\.keys" {} +
This runs the command for each found file, so it is less efficient.

Code: Select all

find . -name "*.py" -exec egrep "if.* in .*\.keys" {} \;
Always add the important information to your posts if you need help. Also see Tutorials and Video tutorials.
To support the documentation effort, and code development, your donation is appreciated: liberapay.com/FreeCAD.
Michel59263
Posts: 16
Joined: Mon Jul 22, 2019 3:50 pm

Re: python speedup suggestion: avoid use of 'if mykey in mydict.keys()'

Post by Michel59263 »

Hello,

Is it realy a "premature optimisation" ?
May be it's a "bad" use of python which should be corrected and explained.

Michel
openBrain
Veteran
Posts: 9028
Joined: Fri Nov 09, 2018 5:38 pm
Contact:

Re: python speedup suggestion: avoid use of 'if mykey in mydict.keys()'

Post by openBrain »

IMO, as this is a pure Python concern, it is easy to patch & test locally.
So one could easily find the occurrences, patch them and test if it doesn't cause troubles.
User avatar
DeepSOIC
Veteran
Posts: 7896
Joined: Fri Aug 29, 2014 12:45 am
Location: used to be Saint-Petersburg, Russia

Re: python speedup suggestion: avoid use of 'if mykey in mydict.keys()'

Post by DeepSOIC »

there's a remarkable number of these occurrences in importIFC. These for sure must be corrected, I'm almost certain they can become a performance bottleneck on very large projects. As for the rest, I suspect they are insignificant in their impact, but it's worth correcting them too. There are a few obvious likely-false-positives in the list in the bug tracker:

Code: Select all

./src/Mod/Material/materialtools/cardutils.py: if k in registed_cardkeys:
./src/Mod/Material/materialtools/cardutils.py: if (k not in used_and_registered_cardkeys) and (k not in used_and_not_registered_cardkeys):
The rest seem legit at first glance.
User avatar
DeepSOIC
Veteran
Posts: 7896
Joined: Fri Aug 29, 2014 12:45 am
Location: used to be Saint-Petersburg, Russia

Re: python speedup suggestion: avoid use of 'if mykey in mydict.keys()'

Post by DeepSOIC »

openBrain wrote: Fri Dec 13, 2019 10:16 am IMO, as this is a pure Python concern, it is easy to patch & test locally.
So one could easily find the occurrences, patch them and test if it doesn't cause troubles.
That's only for "someone" who knows freecad all through. I've looked over the list, and off the top of my head, I don't know how to trigger any single one of them.
openBrain
Veteran
Posts: 9028
Joined: Fri Nov 09, 2018 5:38 pm
Contact:

Re: python speedup suggestion: avoid use of 'if mykey in mydict.keys()'

Post by openBrain »

DeepSOIC wrote: Fri Dec 13, 2019 12:01 pm there's a remarkable number of these occurrences in importIFC. These for sure must be corrected, I'm almost certain they can become a performance bottleneck on very large projects.
Like this ? https://forum.freecadweb.org/viewtopic.php?f=39&t=41600 :)
There are a few obvious likely-false-positives in the list in the bug tracker:

Code: Select all

./src/Mod/Material/materialtools/cardutils.py: if k in registed_cardkeys:
./src/Mod/Material/materialtools/cardutils.py: if (k not in used_and_registered_cardkeys) and (k not in used_and_not_registered_cardkeys):
I already posted a patched version in the ticket. ;) It will prevent such false-positive.
DeepSOIC wrote: Fri Dec 13, 2019 12:04 pm That's only for "someone" who knows freecad all through. I've looked over the list, and off the top of my head, I don't know how to trigger any single one of them.
Very probably there are more time spent verifying than patching. ;)
Post Reply