"lst is a list of strings and numbers. I want to convert the numbers to int but leave the strings alone. How do I do that?"

This immediately gets a half-dozen answers that all do some equivalent of:

    lst = [int(x) if x.isdigit() else x for x in lst]

This has a number of problems, but they all come down to the same two:
- "Numbers" is vague. You can assume it means only integers based on "I want to convert the numbers to int", but does it mean Python integer literals, things that can be converted with the int function with no base, or things that can be converted with the int function with base=0, or something different entirely, like JSON numbers or Excel numbers or the kinds of input you expect your 3rd-grade class to enter?
- Whichever meaning you actually wanted, isdigit() does not test for that.
If it means "things that can be converted with the int function with no base", the right answer, as usual in Python, is to just try to convert with the int function:

    def tryint(x):
        try:
            return int(x)
        except ValueError:
            return x

    lst = [tryint(x) for x in lst]

Of course if you mean something different, that's not the right answer. Even "valid integer literals in Python source" isn't the same rule. (For example, 099 is an invalid literal in both 2.x and 3.x, and 012 is valid in 2.x but is read as octal, which is probably not what you wanted, but int('099') and int('0123') give 99 and 123.)
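To make the difference between those rules concrete, here is a small sketch for Python 3 that runs the same strings through int with no base and int with base=0. The helper names (accepted_by, int_no_base, int_base_zero) are made up for this illustration and are not part of any library.

```python
def accepted_by(convert, text):
    """Return the converted value, or None if convert() rejects the text."""
    try:
        return convert(text)
    except ValueError:
        return None


def int_no_base(text):
    # The rule "things int() converts with no base argument".
    return int(text)


def int_base_zero(text):
    # The rule "things int() converts with base=0", which follows
    # Python's integer-literal prefixes such as 0x and 0o.
    return int(text, 0)


# Print each sample next to what the two rules make of it.
for text in ["99", "099", "0x1f", "0o17", " 7 "]:
    print(repr(text), accepted_by(int_no_base, text), accepted_by(int_base_zero, text))
```

The two rules disagree on leading zeros and on prefixed literals, so the choice between them has to be made before any checking code is written.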
That's why you have to actually decide on a rule that you want to apply; otherwise, you're just assuming that all reasonable rules are the same, which is a patently false assumption. If your rule isn't actually "things that can be converted with the int function with no base", then the isdigit check is wrong, and the int(x) conversion is also wrong.

What specifically is wrong with isdigit?
I'm going to assume that you already thought through what you meant by "number", and the decision was "things that can be converted to int with the int function with no base", and you're just looking for how to LBYL that so you don't have to use a try.

Negative numbers
Obviously, -234 is an integer, but just as obviously, "-234".isdigit() is clearly going to be false, because - is not a digit.

Sometimes people try to solve this by writing all(c.isdigit() or c == '-' for c in x). But, besides being a whole lot slower and more complicated, that's even more wrong. It means that 123-456 now looks like an integer, so you're going to pass it to int without a try, and you're going to get a ValueError from your comprehension.

Of course you can solve that problem with (x[0].isdigit() or x[0] == '-') and x[1:].isdigit(), and now maybe every test you've thought of passes. But it will give you "1" instead of converting that to an integer, and it will raise an IndexError for an empty string.

One of these might be correct for handling negative integer numerals:

    x.isdigit() or x.startswith('-') and x[1:].isdigit()
    re.match(r'-?\d+', x)

But is it obvious that either one is correct? The whole reason you wanted to use isdigit is to have something simple, obviously right, and fast, and you already no longer have that. And we're not even nearly done yet.

Positive numbers
+234 is an integer too. And int will treat it as one. But the code above won't. So now, whatever you did for -, you have to do the same thing for +. Which is pretty ugly if you're using the non-regex solution:

    lst = [int(x) if x.isdigit() or x.startswith(('-', '+')) and x[1:].isdigit() else x for x in lst]
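One way to see how far each hand-written check drifts from int is to run both over the same strings and list the disagreements. The sketch below does that for Python 3; naive_check, signed_check, int_accepts, and disagreements are hypothetical names introduced only for this comparison.

```python
def naive_check(x):
    # The "digits and minus signs anywhere" idea described above.
    return all(c.isdigit() or c == '-' for c in x)


def signed_check(x):
    # The sign-aware check described above, wrapped in a function.
    return x.isdigit() or (x.startswith(('-', '+')) and x[1:].isdigit())


def int_accepts(x):
    # Ground truth for this rule: does int() itself take the string?
    try:
        int(x)
    except ValueError:
        return False
    return True


def disagreements(check, samples):
    # Strings where the look-before-you-leap check and int() differ.
    return [s for s in samples if check(s) != int_accepts(s)]


SAMPLES = ["234", "-234", "+234", "123-456", "", "-", "--5"]
print(disagreements(naive_check, SAMPLES))
print(disagreements(signed_check, SAMPLES))
```

Agreeing on a handful of samples only says the check is right for those samples. Add a string with surrounding spaces such as " 42 " to the list and the sign-aware check disagrees with int again.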
Whitespace
The int function allows the numeral to be surrounded by whitespace. But isdigit does not. So, now you have to add .strip() before the isdigit() call. Except we don't just have one isdigit call; to fix the other problems we've had to go with two isdigit calls and a startswith, and surely you don't want to call strip three times. Or we've switched to a regex. Either way, now we've got:

    lst = [int(x) if x.isdigit() or x.startswith(('-', '+')) and x[1:].isdigit() else x for x in (x.strip() for x in lst)]
    lst = [int(x) if re.match(r'\s*[+-]?\d+\s*', x) else x for x in lst]
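The regex route has a trap of its own: re.match anchors only at the start of the string, so a pattern with no end anchor also accepts trailing junk. A minimal sketch for Python 3.4+ (where re.fullmatch is available); PATTERN, check_match, and check_fullmatch are names invented for this example.

```python
import re

# Optional whitespace, optional sign, digits, optional whitespace.
PATTERN = re.compile(r'\s*[+-]?\d+\s*')


def check_match(x):
    # match() only requires the pattern to match at the beginning of x.
    return PATTERN.match(x) is not None


def check_fullmatch(x):
    # fullmatch() requires the pattern to cover the whole of x.
    return PATTERN.fullmatch(x) is not None


def int_accepts(x):
    try:
        int(x)
    except ValueError:
        return False
    return True


for text in ["12abc", " +42\n", "123-456"]:
    print(repr(text), check_match(text), check_fullmatch(text), int_accepts(text))
```

With match, "12abc" passes the check and then raises a ValueError inside the comprehension, which is exactly the failure the check was supposed to prevent.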
What's a digit?
The isdigit function tests for digit characters. In Python 3.x, that means everything in the Number, Decimal Digit category, which is the set the int function accepts, plus a few digit-like characters such as superscripts: "²".isdigit() is true, but int("²") raises a ValueError. (isdecimal is the method that matches the Decimal Digit category exactly.)

But 2.x doesn't use the same rule. If you're using a unicode, it's not entirely clear what int accepts, but it's not all Unicode digits, at least not in all Python 2.x implementations and versions; if you're using a str encoded in your default encoding, int still accepts the same set of digits, but isdigit only checks ASCII digits.

Plus, if you're using either 2.x or 3.0-3.2, and you've got a "narrow" Python build (like the default builds for Windows from python.org), isdigit is actually checking each UTF-16 code unit, not each character, so for "\N{MATHEMATICAL SANS-SERIF DIGIT ZERO}", isdigit will return False, but int should accept it.

So, if your user types in an Arabic number like ١٠٤, the isdigit check may mean you end up with "١٠٤", or it may mean you end up with the int 104, or it may be one on some platforms and the other on other platforms.

I can't even think of any way to LBYL around this problem except to just say that your code requires 3.3+.
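For anyone who wants to check this on their own interpreter, the sketch below (assuming Python 3.3 or later) reports what isdigit, isdecimal, and int each make of a few non-ASCII digit strings. The report helper and the constant names are made up for this example.

```python
ARABIC_104 = "\u0661\u0660\u0664"   # ARABIC-INDIC DIGITS ONE, ZERO, FOUR
SUPERSCRIPT_TWO = "\u00b2"          # SUPERSCRIPT TWO
MATH_ZERO = "\N{MATHEMATICAL SANS-SERIF DIGIT ZERO}"


def report(text):
    """Return (isdigit, isdecimal, int value or None) for the text."""
    try:
        value = int(text)
    except ValueError:
        value = None
    return (text.isdigit(), text.isdecimal(), value)


for name, text in [("arabic", ARABIC_104),
                   ("superscript", SUPERSCRIPT_TWO),
                   ("math zero", MATH_ZERO)]:
    print(name, report(text))
```

On Python 3.3+ the superscript is the interesting case: isdigit says yes while int says no, so even there isdigit is not an exact stand-in for what int accepts.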
Have I thought of everything?
I don't know. Do you know? If you don't, how are you going to write code that handles the things we haven't thought of?

Other rules might be even more complicated than the int with no base rule. For different use cases, users might reasonably expect 0x1234 or 1e10 or 1.0 or 1+0j or who knows what else to count as integers. The way to test for whatever it is you want to test for is still simple: write a conversion function for that, and see if it fails. Trying to LBYL it means that you have to write most of the same logic twice. Or, if you're relying on int or literal_eval or whatever to provide some or all of that logic, you have to duplicate its logic.