Python Performance: Why 'if not list' is 2x Faster Than Using len()

abhi9u@lemmy.world · 4 months ago

thebestaquaman@lemmy.world · 4 months ago

I write a lot of Python. I hate it when people use “X is more pythonic” as some kind of argument for what is a better solution to a problem. I also have a hang-up with people acting like Python has any form of type safety, instead of just embracing duck typing. This lands us at the following:

The article states that “you can check a list for emptiness in two ways: if not mylist or if len(mylist) == 0”. Already here, a fundamental mistake has been made: You don’t know (and shouldn’t care) whether mylist is a list. These two checks are not different ways of doing the same thing, but two different checks altogether. The first checks whether the object is “falsy” and the second checks whether the object has a well-defined length that is zero. The two often (but far from always) overlap. Embrace duck typing: type-safe Python is a myth.
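A minimal sketch of where the two checks agree and where they part ways. The Sensor class and the helper names are hypothetical, made up for illustration; they do not come from the article.

```python
def falsy(obj):
    """Truthiness check: works for any object."""
    return not obj


def zero_length(obj):
    """Length check: raises TypeError if obj does not support len()."""
    return len(obj) == 0


class Sensor:
    """Hypothetical object that has a truth value but no length."""

    def __init__(self, connected):
        self.connected = connected

    def __bool__(self):
        return self.connected


def compare(obj):
    """Return (falsy result, zero-length result or the string 'TypeError')."""
    try:
        length_result = zero_length(obj)
    except TypeError:
        length_result = "TypeError"
    return falsy(obj), length_result


# Lists agree on both checks; None, 0 and Sensor only support the first one.
for candidate in ([], [0], None, 0, Sensor(connected=False)):
    print(repr(candidate), compare(candidate))
```

For lists the two results match. For None, 0 and the Sensor instance, the truthiness check passes silently while the length check raises.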

iAvicenna@lemmy.world · edit-2 4 months ago

Isn’t the expected behaviour exactly identical on any object that has __len__ defined?

“By default, an object is considered true unless its class defines either a __bool__() method that returns False or a __len__() method that returns zero, when called with the object.”

PS: I guess your objection is that we can’t know in advance whether the object has __len__ defined (for example, by being a collection), so this question does not really apply to your post.
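A small sketch of the rule quoted above, using three hypothetical classes. One detail worth noting: when a class defines both methods, __bool__ is consulted first, so even on an object with a length the two checks can disagree.

```python
class Plain:
    """Defines neither __bool__ nor __len__: instances are always truthy."""


class Sized:
    """Defines only __len__: truthy exactly when the length is non-zero."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n


class Both(Sized):
    """Defines both: bool() uses __bool__ and never asks __len__."""

    def __bool__(self):
        return True


print(bool(Plain()))                  # no special methods defined
print(bool(Sized(0)), bool(Sized(3)))  # falls back to __len__
print(bool(Both(0)), len(Both(0)))     # __bool__ wins over __len__
```

For Both(0), not x is False while len(x) == 0 is True, so the checks are only guaranteed to agree when __bool__ is left undefined.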

thebestaquaman@lemmy.world · 4 months ago

Exactly as you said yourself: Checking falsiness does not guarantee that the object has a length. There is considerable overlap between the two, and if it turns out that this check is a performance bottleneck (which I have a hard time imagining) it can be appropriate to check for falsiness instead of zero length. But in that case, don’t be surprised if you suddenly get an obscure bug because of some custom object not behaving the way you assumed it would.

I guess my primary point is that we should be checking for what we actually care about, because that makes intent clear and reduces the chance for obscure bugs.
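Whether the check is a bottleneck can be measured rather than guessed. A rough sketch using the standard timeit module; the iteration count is arbitrary and the numbers will vary by machine and Python version, so no expected output is given.

```python
import timeit

empty = []
n = 200_000  # arbitrary number of repetitions

# timeit.timeit runs the statement n times and returns the total in seconds.
t_not = timeit.timeit("not data", globals={"data": empty}, number=n)
t_len = timeit.timeit("len(data) == 0", globals={"data": empty}, number=n)

print(f"not data:       {t_not / n * 1e9:.1f} ns per check")
print(f"len(data) == 0: {t_len / n * 1e9:.1f} ns per check")
```

Both figures come out per check, which makes it easier to judge whether the difference matters next to the rest of the program.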

PattyMcB@lemmy.world · 4 months ago

I know I’m gonna get downvoted to oblivion for this, but… Serious question: why use Python if you’re concerned about performance?

JustAnotherKay@lemmy.world · edit-2 4 months ago

Honestly most people use Python because it has fantastic libraries. They optimize it because the language is middling, but the libraries are gorgeous

ETA: This might double post because my Internet sucks right now, will fix when I have a chance

ThirdConsul@lemmy.ml · edit-2 4 months ago

Honestly most people use Python because it has fantastic libraries

In C++ if I remember correctly…

Edit: I do https://codefinity.com/blog/Python-Libraries-Written-in-C-plus-plus

JustAnotherKay@lemmy.world · 4 months ago

What do I care what language the library is written in as long as it works for what I need it to do?

ThirdConsul@lemmy.ml · 3 months ago

My point is that the libraries themselves are not written in Python and thus most likely not exclusive to it. This is not an attack on Python, I just find it a bit funny :)

Takapapatapaka@lemmy.world · 4 months ago

You may want to benefit from a little performance boost even though you mostly don’t need it and still need Python’s advantages. Being interested in performance isn’t always about looking for the very best performance available in any language; it can also mean using little tips to go a tiny bit faster when you can.

Randelung@lemmy.world · 3 months ago

It comes down to the question “Is YOUR C++ code faster than Python?” (and of course the reverse).

I’ve built a SCADA from scratch and performance requirements are low to begin with, seeing as it’s all network bound and real world objects take time to react, but I’m finding everything is very timely.

A colleague used SQLAlchemy for a similar task and got abysmal performance. No wonder, it’s constantly querying the DB for single results.

Jerkface (any/all)@lemmy.ca · 4 months ago

Alternatively, why wait twice as long for your python code to execute as you have to?

Sirber@lemmy.ca · edit-2 4 months ago

How does Python know if it’s my list or not?

jj4211@lemmy.world · 3 months ago

else: # not my list, it is ourlist

iAvicenna@lemmy.world · edit-2 4 months ago

Yeah, and then you use “not” with a variable name that does not make it obvious that it is a list, and another person who reads the code thinks it is a bool. Hell, a couple of months later you yourself won’t even remember that it is a list. Moreover, “not” will not throw an error if you pass something that is not a sequence/collection, but len will.

You should not sacrifice code readability and safety for over-optimization. This is Python after all; I don’t think list length checks will be your bottleneck.

Jerkface (any/all)@lemmy.ca · 4 months ago

Strongly disagree that not x implies to programmers that x is a bool.

iAvicenna@lemmy.world · 4 months ago

well it does not imply directly per se since you can “not” many things but I feel like my first assumption would be it is used in a bool context

thebestaquaman@lemmy.world · 4 months ago

I would say it depends heavily on the language. In Python, it’s very common that different objects have some kind of Boolean interpretation, so assuming that an object is a bool because it is used in a Boolean context is a bit silly.

iAvicenna@lemmy.world · edit-2 4 months ago

Well, fair enough, but I still like the fact that len makes the aim and the object more transparent on a quick look through the code, which is what I am trying to get at. The supporting argument on bools wasn’t very to the point, I agree.

That being said, is there an application of “not” on other classes which cannot be replaced by some other more transparent operator (I confess I only know the bool and length context)? I would rather have transparently named operators than have to remember what “not” does on ten different types. I like duck typing as much as the next person, but when it is so opaque (name-wise) as in the case of “not”, I prefer alternatives.

For instance, open or read on different objects really does open or read some data, whereas with not on some object, God knows what it does, and I have to memorise each case.

Jerkface (any/all)@lemmy.ca · edit-2 4 months ago

Truthiness is fundamental: in most languages, all values have a truthiness, whether they are bool or not. Even in C, int x = value(); if (!x) x_is_zero(); is valid and idiomatic.

I appreciate the point that calling a method gives more context cues and potentially aids readability, but in this case I feel like not is the python idiom people expect and reads just fine.

iAvicenna@lemmy.world · 4 months ago

I don’t know, it throws me off but perhaps because I always use len in this context. Is there any generally applicable practical reason why one would prefer “not” over len? Is it just compactness and being pythonic?

Jerkface (any/all)@lemmy.ca · edit-2 4 months ago

It’s very convenient not to have to remember a bunch of different means/methods for performing the same conceptual operation. You might call len(x) == 0 on a list, but next time it’s a dict. Time after that it’s a complex number. The next time it’s an instance. not works in all cases.

thebestaquaman@lemmy.world · 4 months ago

I definitely agree that len is the preferred choice for checking the emptiness of an object, for the reasons you mention. I’m just pointing out that assuming a variable is a bool because it’s used in a Boolean context is a bit silly, especially in Python or other languages where any object can have a truthiness value, and where this is commonly utilised.

iAvicenna@lemmy.world · 4 months ago

It is not “assume” as in a conscious “this is probably a bool I will assume so” but more like a slip of attention by someone who is more used to the bool context of not. Is “not integer” or “not list” really that commonly used that it is even comparable to its usage in bool context?

thebestaquaman@lemmy.world · edit-2 4 months ago

Then I absolutely understand you :)

How common it is depends 100 % on the code base and what practices are preferred. In Python code bases where I have a say in decisions, all Boolean checks should be x is True or x is False if x should be a Boolean. In that sense, if I read if x or if not x, it’s an indicator that x does not need to be a Boolean.

So I could say that my preference is to flip it (in Python): Explicitly indicate/check for a Boolean if you expect/need a Boolean, otherwise use a “truthiness” check.
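A short sketch of that convention, with two hypothetical helper names. The identity check only accepts the bool singleton True, while the truthiness check accepts anything with a true value.

```python
def is_strictly_true(x):
    """Identity check: only the bool singleton True passes."""
    return x is True


def is_truthy(x):
    """Truthiness check: any object with a true value passes."""
    return bool(x)


# Columns: value, strict result, truthy result.
for value in (True, 1, "yes", [0], False, 0, "", []):
    print(repr(value), is_strictly_true(value), is_truthy(value))
```

Values such as 1, "yes" and [0] pass the truthiness check but fail the identity check, which is what makes x is True a signal that a real Boolean is expected.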

Glitchvid@lemmy.world · 4 months ago

if not x then … end is very common in Lua for similar purposes, very rarely do you see hard nil comparisons or calls to typeof (last time I did was for a serializer).

Jerkface (any/all)@lemmy.ca · 4 months ago

deleted by creator

jj4211@lemmy.world · 3 months ago

In context, one can consider it a bool.

Besides, I see C code all the time that treats pointers as bool for the purposes of an if statement. !pointer is very common and no one thinks that means a pointer is exclusively a Boolean concept.

JustAnotherKay@lemmy.world · 4 months ago

Doesn’t matter what it implies. The entire purpose of programming is to make it so a human doesn’t have to go do something manually.

not x tells me I need to go manually check what type x is in Python.

len(x) == 0 tells me that it’s being type-checked automatically

acosmichippo@lemmy.world · edit-2 4 months ago

i haven’t programmed since college 15 years ago and even i know that 0 == false for non bool variables. what kind of professional programmers wouldn’t know that?

LegoBrickOnFire@lemmy.world · 3 months ago

I really dislike using boolean operators on anything that is not a boolean. I recently made an exception to my rule and got punished… Yeah, it is a skill issue on my part that I tried to check that a variable equal to 0 was not None using “if variable…”. But many programming rules are there to avoid bugs caused by this kind of inattention.
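A minimal reconstruction of that kind of bug, assuming hypothetical function names. The truthiness check cannot tell 0 from None, while an explicit comparison against None can.

```python
def describe_buggy(value):
    # Bug: 0 is falsy, so a legitimate 0 is reported as missing.
    if value:
        return f"got {value}"
    return "missing"


def describe_fixed(value):
    # Compare against None explicitly so that 0 counts as a real value.
    if value is not None:
        return f"got {value}"
    return "missing"


print(describe_buggy(0))
print(describe_fixed(0))
print(describe_fixed(None))
```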

acosmichippo@lemmy.world · 4 months ago

if you’re worried about readability you can leave a comment.

thebestaquaman@lemmy.world · 4 months ago

There is no guarantee that the comment is kept up to date with the code. “Self documenting code” is a meme, but clearly written code is pretty much always preferable to unclear code with a comment, largely because you can actually be sure that the code does what it says it does.

Note: You still need to comment your code kids.

iAvicenna@lemmy.world · 4 months ago

If there is an alternative that achieves the same intended effect and is a bit safer (because it verifies that len is implemented), I would prefer that to commenting. Also, having to comment every use of not as a length check sounds quite redundant, as length checks are very common.

Opisek@lemmy.world · 4 months ago

The graph makes no sense. Did a generative AI make it?

pyre@lemmy.world · 3 months ago

yeah I got angry just looking at it

🌶️ - knighthawk@lemmy.ml · 4 months ago

so these are the only 2 ways then? huge if true

ne0n@lemmy.world · 4 months ago

Isn’t “-2x faster” 2x slower?

Randelung@lemmy.world · 3 months ago

Maybe they mean up to?

Harvey656@lemmy.world · 3 months ago

I could have tripped, knocked over my keyboard, cried for 13 straight minutes on the floor, picked my keyboard back up, accidentally hit the enter key making a graph and it would have made more sense than this thing.

-2x faster. What does that even mean?

AnUnusualRelic@lemmy.world · 3 months ago

There’s probably an “import * from relativity” in there somewhere.

Archr@lemmy.world · 3 months ago

I haven’t read the article. But I’d assume this is for the same reason that not not string is faster than bool(string). Which is to say that it has to do with having to look up a global function rather than a known keyword.
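One way to look at this is to disassemble the three spellings with the standard dis module. This is only a sketch: exact instruction names differ between CPython versions, but not is an operator with its own bytecode, while len and bool are ordinary names that have to be looked up and then called.

```python
import dis


def via_not(x):
    return not x


def via_bool(x):
    return bool(x)


def via_len(x):
    return len(x) == 0


def opnames(func):
    """List the bytecode instruction names for func."""
    return [ins.opname for ins in dis.get_instructions(func)]


for func in (via_not, via_bool, via_len):
    print(func.__name__, opnames(func))
```

The listings for via_bool and via_len contain a global name lookup and a call instruction; the listing for via_not contains neither.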

AnUnusualRelic@lemmy.world · edit-2 3 months ago

From that little image, they’re happy it takes a tenth of a fucking second to check if a list is empty?

What kind of dorito chip is that code even running on?

borokov@lemmy.world · 4 months ago

Isn’t it because a list is a linked list, so to get the length it has to iterate over the whole list, whereas to check for emptiness it just has to check whether there is a first element?

I’m too lazy to read the article BTW.

dreugeworst@lemmy.ml · 4 months ago

why comment if you don’t even want to read the article? python lists are not linked lists, they’re contiguous with a smart growth strategy.
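A rough way to see the growth strategy from Python itself, assuming CPython (sys.getsizeof is implementation-specific and may not work on other interpreters). CPython also stores the element count in the list object, so len() reads a field instead of walking the elements.

```python
import sys

items = []
sizes = []
for i in range(1000):
    items.append(i)
    sizes.append(sys.getsizeof(items))

# In CPython the reported size changes only when the backing array is
# reallocated, so there are far fewer distinct sizes than appends.
reallocations = len(set(sizes))
print(f"{len(items)} appends, {reallocations} distinct allocation sizes")
```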

borokov@lemmy.world · 3 months ago

I comment because this is how a social network works, and this is how you keep Lemmy alive. My comment has generated a dozen other comments, so it achieved its goal.

There is not a single question that hasn’t already been answered on the internet, so there is no point in asking anything on social platforms except for the sake of interacting with other people.

Lemmy is not stackoverflow 😉

dreugeworst@lemmy.ml · 3 months ago

If the point of Lemmy is just to generate as many comments as possible with everyone just assuming whatever they want about linked articles without reading them I’ll quickly leave again. I’m here for informed discussion, not for a competition in generating engagement

riodoro1@lemmy.world · edit-2 3 months ago

So… it has to iterate over the whole empty list, is what you’re saying? Like, once for each of the zero items in the list?

borokov@lemmy.world · 4 months ago

I don’t know how lists are implemented in Python. But in a dumb linked list implementation (like C++ std::list), each element has a “next” member that points to the next element. So, to get the list length, you have to do (pseudo code, not actual Python code):

len = 0
elt = list.first
while exist(elt):
    elt = elt.next
    len += 1
return len

Whereas to test if the list is empty, you just have to:

return not exist(list.first)
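The pseudo code above can be turned into a runnable sketch. Node and LinkedList are hypothetical classes written for illustration; this is not how Python’s built-in list works.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


class LinkedList:
    """Hypothetical minimal singly linked list that stores no element count."""

    def __init__(self, values=()):
        self.first = None
        # Build from the back so the nodes end up in the original order.
        for value in reversed(list(values)):
            self.first = Node(value, self.first)

    def __len__(self):
        # O(n): follow the next pointers until the end of the chain.
        count = 0
        node = self.first
        while node is not None:
            node = node.next
            count += 1
        return count

    def is_empty(self):
        # O(1): only look at the head pointer.
        return self.first is None


print(len(LinkedList([1, 2, 3])), LinkedList([1, 2, 3]).is_empty())
print(len(LinkedList()), LinkedList().is_empty())
```

For an empty list the loop in __len__ never runs, so both operations do a constant amount of work in that case.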

riodoro1@lemmy.world · 4 months ago

That’s exactly what I was getting at. Getting the length of an empty list would not even enter the loop.