1. Does Python pass by value, or pass by reference?

    Neither.

    If you twist around how you interpret the terms, you can call it either. Java calls the same evaluation strategy "pass by value", while Ruby calls it "pass by reference". But the Python documentation carefully avoids using either term, which is smart.

    (I've been told that Jeff Knupp has a similar blog post called Is Python call-by-value or call-by-reference? Neither. His seems like it might be more accessible than mine, and it covers most of the same information, so maybe go read that if you get confused here, or maybe even read his first and then just skim mine.)

    Variables and values

    The distinction between "pass by value" and "pass by reference" is all about variables. Which is why it doesn't apply to Python.

    In, say, C++, a variable is a typed memory slot. Values live in these slots. If you want to put a value in two different variables, you have to make a copy of the value. Meanwhile, one of the types that variables can have is "reference to a variable of type Foo". Pass by value means copying the value from the argument variable into the parameter variable. Pass by reference means creating a new reference-to-variable value that refers to the argument variable, and putting that in the parameter variable.

    In Python, values live somewhere on their own, and have their own types, and variables are just names for those values. There is no copying, and there are no references to variables. If you want to give a value two names, that's no problem. And when you call a function, that's all that happens—the parameter becomes another name for the same value. It's not the same as pass by value, because you're not copying values. It's not the same as pass by reference, because you're not making references to variables. There are incomplete analogies with both, but really, it's a different thing from either.
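You can watch this happen directly (a minimal sketch; the names are made up):

```python
# Two names for one value; no copying happens anywhere.
a = [1, 2, 3]
b = a              # b becomes another name for the same list
b.append(4)        # mutating the value is visible through both names
print(a)           # [1, 2, 3, 4]
print(a is b)      # True: the same object, not a copy
b = [9]            # rebinding b just points the name at a new value
print(a)           # still [1, 2, 3, 4]
```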

    A simple example

    Consider this Python code:

        def spam(eggs):
            eggs.append(1)
            eggs = [2, 3]
    
        ham = [0]
        spam(ham)
        print(ham)
    

    If Python passed by value, eggs would have a copy of the value [0]. No matter what it did to that copy, the original value, in ham, would remain untouched. So, it would print out [0].

    If Python passed by reference, eggs would be a reference to the variable ham. It could mutate ham's value, and even put a different value in it. So, it would print out [2, 3].

    But in fact, eggs becomes a new name for the same value [0] that ham is a name for. When it mutates that value, ham sees the change. When it rebinds eggs to a different value, ham is still naming the original value. So, it prints out [0, 1].
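If you want to verify this yourself, id() makes the binding and rebinding visible (the same example, instrumented; not part of the original code):

```python
def spam(eggs):
    print(id(eggs))    # same id as ham: just another name for that value
    eggs.append(1)     # mutates the shared value
    eggs = [2, 3]      # rebinds the local name eggs only
    print(id(eggs))    # a different id: eggs now names a new value

ham = [0]
print(id(ham))
spam(ham)
print(ham)             # [0, 1]
```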

    Ways to confuse yourself

    People learn early in their schooling that pass by value and pass by reference are the two options, and when they learn Python, they try to figure out which one of the two it does. If you think about assignment, but not about mutability, Python looks like pass by value. If you think about mutability, but not assignment, Python looks like pass by reference. But it's neither.

    Why does Java call this "pass by value"?

    In Java, there are actually two kinds of values—primitive values, like integers, and object values. With primitive values, when you write "int i = 1", you're creating an int-typed memory slot and copying the number 1 into it. But with object values, when you write "Foo foo = new Foo()", you're creating a new Foo somewhere, and also creating a reference-to-Foo-typed memory slot and copying a reference to that new Foo into it. When you call a function that takes an int argument, Java copies the int value from your variable to the parameter. And when you call a function that takes a Foo argument, Java copies the reference-to-Foo value from your variable to the parameter. So, in that sense, Java is calling by value. (This is basically the same thing that most Python implementations actually do deep under the covers, but you don't normally have to, or want to, think about it. Java puts that right up on the surface.)

    Why does Ruby call this "pass by reference"?

    Basically, Ruby is emphasizing the fact that you can modify things by passing them as arguments.

    This makes more sense for Ruby than for Python, because Ruby is chock full of mutating methods, and there's an in-place way to do almost everything, while Python only has in-place methods for the handful of things where it really matters. For example, in Python, there is no in-place equivalent of filter, but in Ruby, select! is the in-place equivalent of select.
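To make the contrast concrete, here's a sketch of how Python handles it. Slice assignment is the usual workaround when you really do need in-place filtering:

```python
nums = [3, 1, 4, 1, 5]
nums.sort()            # sort is one of the handful of in-place methods
print(nums)            # [1, 1, 3, 4, 5]

# There is no in-place filter method, but slice assignment replaces
# the contents of the existing list without rebinding the name:
nums[:] = [n for n in nums if n > 1]
print(nums)            # [3, 4, 5]
```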

    But really, it's still sloppy terminology in Ruby. If you assign to a parameter inside a function (or block), it does not affect the argument any more than in Python.

    So what should we call it?

    People have tried to come up with good names. Before Python ever existed, Barbara Liskov realized that there was an entirely different evaluation strategy from pass by value and pass by reference, and called it "call by sharing". Others have called it "pass by object". Or, just to confuse novices even further, "pass by value reference". But none of these names ever gained traction; nobody is going to know what these names mean if you use them.

    So, just don't call it anything. Anyone who tries to answer a question by saying, "Well, Python passes by reference [or by value, or by some obscure term no one has ever heard of], so…" is just confusing things, not explaining.

    When Python's parameter evaluation strategy actually matters, you have to describe how it works and how it matters. So just do that.

    When it doesn't matter, don't bring it up.

    Ways to un-confuse yourself

    Instead of trying to get your head around how argument passing works in Python, get your head around how variables work. That's where the big difference from languages like Java and C++ lies, and once you understand that difference, argument passing becomes simple and obvious.

    How do I do real pass by reference, then?

    You don't. If you want mutable objects, use mutable objects. If you want a function to re-bind a variable given to it as an argument, you're almost always making the same mistake as when you try to dynamically create variables by name.

    And that should be a clue to one way you can get around it when "almost always" isn't "always": Just pass the name, as a string. (And, if it's not obvious what namespace to look the name up in, pass that information as well.)

    But usually, it's both simpler and clearer to wrap your state up in some kind of mutable object—a list, a dict, an instance of some class, even a generic namespace object—and pass that around and mutate it.
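For example, here's a minimal sketch using types.SimpleNamespace (the names are invented):

```python
from types import SimpleNamespace

def take_turn(game):
    # The callee can't rebind the caller's variables, but it can
    # mutate shared state that both sides have a name for.
    game.turns -= 1

game = SimpleNamespace(turns=3)
take_turn(game)
print(game.turns)    # 2
```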

    What about closures?

    Yes, using closures can also be a way around needing to pass by reference. This shouldn't be too surprising—closures and objects are dual concepts in a lot of ways.

    I meant how do closures work, if there's no pass by reference?

    Oh, you just had to ask, didn't you. :)

    The simple, but inaccurate, way to think about it is that a closure stores a name as a string and a scope to look it up in. Just like I just told you not to do. But it's happening under the covers. In particular, your Python code never uses the variable name as a string. The fact that the interpreter uses the variable name as a string doesn't matter; it always uses names as strings. If you look at the attributes of a function object and its code object, you'll see the names of any globals you reference (including modules, top-level functions, etc.) any locals you create, the function parameters themselves, etc.

    But if you look carefully (or think about it hard enough), you'll realize that this isn't actually sufficient for closures. Closures are implemented by storing special cell objects in the function object, which are basically references to variables in some nonlocal frame, and there are special bytecodes to load and store those variables. (The code that owns the frame uses those bytecodes to access the variables it exposes to the closure; the code inside the closure uses them to access the variables it has cells for.) You can even see one of these cell objects in Python, and read the value of the referenced variable (func.__closure__[0].cell_contents, if you're dying of curiosity), although you can't write the value this way.
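Here's what that looks like from Python code (a sketch; the function names are made up):

```python
def make_counter():
    count = 0
    def increment():
        nonlocal count
        count += 1
        return count
    return increment

counter = make_counter()
counter()
cell = counter.__closure__[0]   # the cell object holding count
print(cell.cell_contents)       # 1
counter()
print(cell.cell_contents)       # 2: the cell sees every update
```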

    And if you're now thinking, "Aha, with some bytecode hacking, I could fake call by reference": well, yeah, but do you think it'll be simpler than just passing around a 1-element list?

  2. In database apps, you often want to create tables, views, and indices only if they don't already exist, so they do the setup work the first time, but don't blow away all of your data every subsequent time. So SQL has a special "IF NOT EXISTS" clause you can add to the various CREATE statements.

    Occasionally, you want to do the same thing in Python. For example, this StackOverflow user likes to re-run the same script over and over in the interactive session (e.g., by hitting F5 in the IDE). That's kind of an odd thing to do in general, but it's not hard to imagine cases where it makes sense. For example, you might be expanding or debugging part of the script, and want to use the rest of the script while you do so.

    Normally, that wouldn't be a problem. But what if the script created some global variables, or class or function attributes, etc., and you didn't want those to be overwritten?

    That might sound like an anti-pattern, but imagine that you have a function that you've memoized with functools.lru_cache, and it's cached hundreds of expensive values. If you replace it with a new copy of the function, it'll have an empty cache.

    Of course the right thing to do is to factor out the script into separate modules, and have the script import the stable code instead of just including it directly. But you don't always want to take a break from actively hacking on code to refactor it.

    The easy (but ugly) way

    You can always do this:
        @lru_cache()
        def _square(x):
            return x*x
        try:
            square
        except NameError:
            square = _square
    
    And if you only have to do it to one function, maybe that's the best answer. But if you have to do it to a dozen functions, that'll get ugly, and all that repetition is an invitation to copy and paste and screw it up somewhere. So, what you want to do is factor it out into a function.

    But how? What you want is something like this:
        create_if_not_exists(square, _square)
    
    In a language like C++, you'd do that by taking a reference to a function variable as the first parameter, but you can't pass a reference to the square variable into the function, because that doesn't make any sense in Python; variables aren't things you can take references of.

    You might be able to use some horrible frame hacks to pass the value in and have the function figure out the name from the calling frame, but this is already hacky enough. You might be able to do it with MacroPy, but there are probably cooler ways you can solve the original problem once you're using macros.

    Strings as names

    The key thing to notice is that ultimately, a variable name is just a string that gets looked up in the appropriate scope. Any frame hack, macro, etc. would just be getting the name as a string and setting its value by name anyway, so why not make that explicit?

    This is one of those examples that shows that, while usually you don't want to dynamically create variables, occasionally you do.

    So, how do you do it?

    There are three options.

    • Use exec to declare the variable global or nonlocal and then reassign it.
    • Call setattr on the enclosing scope object.
    • Use the globals dict.

    First, using exec for reflection is almost always the wrong answer, so let's just rule that out off the bat.

    The setattr solution is more flexible, but in this case I think that's actually a negative. The whole point of what we're trying to do is to modify the global scope by (re-)executing a script. If it doesn't work when you instead execfile the script in the middle of a function… good!

    The way to create a global variable dynamically is:
        def create(name, value):
            globals()[name] = value
    

    The "if not exists" part


    Of course create('square', _square) does the exact same thing as just square = _square. We wanted to only bind square if it doesn't exist, not rebind it no matter what.

    Once you think of it as dict value assignment, the answer is obvious:
        def create_if_not_exists(name, value):
            globals().setdefault(name, value)
    
    And that's the whole trick.

    Decorators

    Except it's not the whole trick; there's one more thing we can do: Turn it into a decorator.
        def create_if_not_exists(name):
            def wrap(func):
                globals().setdefault(name, func)
                return func
            return wrap
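
A usage sketch, putting the decorator and a decorated function together (the function name is made up):

```python
def create_if_not_exists(name):
    def wrap(func):
        globals().setdefault(name, func)
        return func
    return wrap

@create_if_not_exists('square')
def _square(x):
    return x * x

print(square(4))   # 16; re-running the definition leaves square bound
```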
    

    Getting the name for free

    Remember when I said that you can't get the name of a variable? Well, a decorator is only going to be called on functions or classes, and almost always on a function or class created with the def or class statement (or the result of decorating such a thing), which means it will have a name built in, as its __name__ attribute.

    The problem is, this is the name of the actual implementation function, _square, not the name we want to bind it to, square. But those two names should give you an idea: If you just make that underscore prefix a naming convention (and it already fits in with existing Python naming conventions pretty well), you _can_ get the name to bind to. So:
        def create_if_not_exists(func):
            assert func.__name__[0] == '_'
            globals().setdefault(func.__name__[1:], func)
            return func
    
    And now, all we need to do to prevent the new (decorated) function from overwriting the old one is to attach this decorator:
        @create_if_not_exists
        @lru_cache()
        def _square(x):
            return x*x
    
    And now, we really are done. How much simpler can you get than that?

    When you want to rebind square

    Maybe square was supposed to be part of the safe, static code that you don't want to blow away with every re-run, but then you found a bug in it. How do you load the new version?

    Simple: either del square before hitting F5, or square = _square after hitting F5.

    A brief discursion on recursion

    Renaming functions after they're created doesn't play well with recursive functions. In Python, recursive functions call themselves by global lookup on their own name. So, if you write this:
        @create_if_not_exists
        @lru_cache()
        def _fact(n):
            if n < 2: return 1
            return n * _fact(n-1)
    
    … your original _fact function is recursively calling whatever happens to be named _fact at run time. Which means that, after a re-run, it's going to be calling the new _fact function, with its new and separate cache, which makes the whole cache thing worthless.

    The answer is simple: Call yourself by your public name, not your private name.
        @create_if_not_exists
        @lru_cache()
        def _fact(n):
            if n < 2: return 1
            return n * fact(n-1)
    
    Now, your original _fact function, which you've also bound to fact (and, for that matter, the new _fact that you haven't bound to fact) will call fact, which is still the original function. Tada.

  3. Sometimes you want to write a round-trippable __repr__ method--that is, you want the string representation to be valid Python code that generates an equivalent object to the one you started with.

    First, ask yourself whether you really want this. A round-trippable repr is very nice for playing with your objects in an interactive session--but if you're trying to use repr as a serialization format, don't do that.

    Now that you're sure you want to do it, here are a few simple rules that novices often get wrong.

    What if I can't write a round-trippable repr?

    Sometimes there's some format that's useful for debugging purposes, even though it's not round-trippable. The repr of a BeautifulSoup node is the raw HTML or XML for the node's subtree. The repr of a list is only round-trippable if it doesn't contain itself. The repr of a NumPy array is round-trippable if it's small, but elided if it's too big. And so on. This can be perfectly reasonable.

    In these cases, generally, you should avoid making anything that looks round-trippable, and also avoid anything that looks like the default repr with the angle brackets and address or id. But note that lists and NumPy arrays both violate that guideline.

    If you can't think of anything useful, definitely don't try to fake the default repr with the angle brackets and address or id. Just leave the default. If you have some base class that overrides __repr__ and you want to undo that, just call object.__repr__(self).
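A sketch of that last trick (the class names are invented):

```python
class Noisy:
    def __repr__(self):
        return 'Noisy(...)'        # a base-class repr we don't want

class Quiet(Noisy):
    def __repr__(self):
        # Undo the inherited override by delegating to the default.
        return object.__repr__(self)

print(repr(Quiet()))   # something like <__main__.Quiet object at 0x...>
```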

    A repr should look like a constructor call

    The basic rule is that, for any type that doesn't have a built-in literal representation (and those types already have built-in repr methods), the repr should look like a constructor call. Look at what repr does with various stdlib classes:
        datetime.datetime(2013, 11, 7, 12, 1, 49, 915797)
        bytearray(b'abcdef')
        deque([1, 2, 3], maxlen=4)
    
    That's what you want to do with your classes. So, if you've written this:
        class Breakfast:
            def __init__(self, spam, eggs=0, beans=0):
                self.spam, self.eggs, self.beans = spam, eggs, beans
            def more_spam(self, spam):
                self.spam += spam
            def __repr__(self):
                ???
    
        breakfast = Breakfast(3, beans=10)
        breakfast.more_spam(4)
        print(repr(breakfast))
    
    ... you should get something like this:
        meals.Breakfast(spam=7, eggs=0, beans=10)
    

    What arguments?

    Obviously, the arguments are whichever arguments will generate an equivalent object.

    What's an equivalent object? Well, if you've designed your objects to be comparable, usually the answer is that whatever __eq__ considers equal is what you want. If you haven't, it may be a bit harder to muddle out the right rule, but it's worth doing--and if you can't think of one, you probably shouldn't be generating a round-trippable repr for your type.

    In the case of the Breakfast type, it's obvious: there are three attributes, and the constructor sets them all to the three arguments it gets, so just pass them.
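Tying those together: if __eq__ compares the same three attributes the repr emits, the round trip produces an equal object. A sketch filling in the ??? from the class above:

```python
class Breakfast:
    def __init__(self, spam, eggs=0, beans=0):
        self.spam, self.eggs, self.beans = spam, eggs, beans
    def __eq__(self, other):
        if not isinstance(other, Breakfast):
            return NotImplemented
        return ((self.spam, self.eggs, self.beans) ==
                (other.spam, other.eggs, other.beans))
    def __repr__(self):
        return 'Breakfast(spam={!r}, eggs={!r}, beans={!r})'.format(
            self.spam, self.eggs, self.beans)

breakfast = Breakfast(3, beans=10)
print(repr(breakfast))                      # Breakfast(spam=3, eggs=0, beans=10)
print(eval(repr(breakfast)) == breakfast)   # True: the round trip works
```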

    To keyword, or not to keyword?

    These are both perfectly valid, equivalent representations:
        Breakfast(7, 0, 10)
        Breakfast(spam=7, eggs=0, beans=10)
    
    Which one is better? Well, the first one is more concise, but the second one is more explicit. So they're each better than the other.

    The important thing to keep in mind is that you're doing this for readability and usability. If the parameters are obvious enough that you usually pass the arguments without keywords, generate the arguments without keywords. Otherwise, with. That's really the only factor that goes into the decision.

    Which class?

    It's tempting to hard-code the class name into the string. But what if someone builds a subclass of your class, and doesn't add any new constructor parameters, maybe doesn't even override __init__? Why should they have to override __repr__ as well? If that's at all plausible for your type, use the type name dynamically.

    (Also, if it seems reasonable that someone might rename your class for some reason, e.g., while wrapping it up, use the type name dynamically.)
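Using the type name dynamically looks like this (a sketch; the subclass is invented):

```python
class Breakfast:
    def __init__(self, spam, eggs=0, beans=0):
        self.spam, self.eggs, self.beans = spam, eggs, beans
    def __repr__(self):
        # type(self).__qualname__ picks up the subclass's name automatically
        return '{}(spam={!r}, eggs={!r}, beans={!r})'.format(
            type(self).__qualname__, self.spam, self.eggs, self.beans)

class SecondBreakfast(Breakfast):
    pass

print(repr(SecondBreakfast(2)))   # SecondBreakfast(spam=2, eggs=0, beans=0)
```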

    Finally, if you want the repr to be round-trippable, you want to use the qualified name of the class. But how qualified? Do you want just the __qualname__, or do you want to add the module name as well? Note that datetime and deque in the stdlib examples above made different choices. Ultimately, the best way to decide is whether you expect your users to be typing "datetime.datetime", or just "deque"--that is, is your module intended to be used with "from module import class1, class2", or with "import module"?

    Note that the fully-qualified name of a class with a standard metaclass is the same as this:
        '{}.{}'.format(cls.__module__, cls.__qualname__)
    
    (Don't be tempted to use the class's own repr instead; repr(cls) gives you something like <class 'collections.deque'>, with angle brackets, which isn't evaluable.)

    Always delegate to repr, never str

    A common mistake is to write one of the following:
        def __repr__(self):
            return 'Breakfast({}, {}, {})'.format(self.spam, self.eggs, self.beans)
    
        def __repr__(self):
            return 'Breakfast(%s, %s, %s)' % (self.spam, self.eggs, self.beans)
    
    That's perfectly fine when your members are small integers, but what if they're, say, strings? The str of a string is not a valid string literal. It's even worse if the string has invisible characters, or Unicode characters that your console can't represent. Even worse if it has a comma in the middle of it, or quotes. And if you've got both bytes and unicode strings, calling str on a bytes is a bad idea--and, in 2.x, calling it on a unicode object is an even worse one.

    Don't try to fix this by manually quoting things; if you ever find yourself typing "%s" or "{}", you're probably doing something wrong. If you get to the point where you're trying to figure out how to escape quotes in the middle of it, you're definitely doing something wrong.

    And of course strings aren't the only problem. For example, a datetime's str is an ISO-format string, with or without microseconds, with no timezone information. It's not just something that isn't a valid expression on its own, it's something that's painful to parse even when you know you need to parse it (although 3.4 or 3.5 should make that better).

    The solution is to use the repr of each member, not the str:
        def __repr__(self):
            return 'Breakfast({!r}, {!r}, {!r})'.format(self.spam, self.eggs, self.beans)
    
        def __repr__(self):
            return 'Breakfast(%r, %r, %r)' % (self.spam, self.eggs, self.beans)
    
    Note that the other way around isn't as universally true: sometimes it makes sense for str to delegate to repr (as all of the built-in collections do).

    Putting it all together

    In this silly example, we don't have enough information to decide whether we want the argument names, or whether we want the module name. Even in real life, that can happen. It's generally better to err on the side of being too explicit than too implicit, so let's do that here:
        def __repr__(self):
            return '{}.{}(spam={!r}, eggs={!r}, beans={!r})'.format(
                type(self).__module__, type(self).__qualname__,
                self.spam, self.eggs, self.beans)
    

  4. Many novices notice that, for many types, repr and eval are perfect opposites, and assume that this is a great way to serialize their data:
        def save(path, *things):
            with open(path, 'w') as f:
                for thing in things:
                    f.write(repr(thing) + '\n')
    
        def load(path):
            with open(path) as f:
                return [eval(line) for line in f]
    

    If you get lucky, you start running into problems, because some objects don't have a round-trippable repr. If you don't get lucky, you run into the _real_ problems later on.

    Notice that the same basic problems come up designing network protocols as file formats, with most of the same solutions.

    The obvious problems

    By default, a custom class--like many third-party and even stdlib types--will just have a repr like <spam.Spam object at 0x12345678>, which you can't eval. And these are the most fun kinds of bugs--the save succeeds, with no indication that anything went wrong, until you try to load the data later and find that all your useful information is missing.

    You can add a __repr__ method to your own types (which can be tricky; I'll get to that later), and maybe even subclass or patch third-party types, but eventually you run into something that just doesn't have an obvious round-trippable representation. For example, what string could you eval to re-generate an ElementTree node?

    Besides that, there are types that are often round-trippable, but aren't when they're too big (like NumPy arrays), or in some other cases you aren't likely to run into until you're deep into development (e.g., lists that contain themselves).

    The real problems

    Safety and security

    Let's say you've written a scheduler program. I look at the config file, and there's a line like this (with the quotes):
        "my task"
    
    What do you think will happen if I change it to this?
        __import__("os").system("rm -rf /")
    
    The eval docs explicitly point this out: "See ast.literal_eval() for a function that can safely evaluate strings with expressions containing only literals." Since the set of objects that can be safely encoded with repr and eval is not much wider than the set of objects that can be encoded as literals, this can be a solution to the problem. But it doesn't solve most of the other problems.
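A quick sketch of the difference (the rejected expression is deliberately harmless here):

```python
import ast

# Literals round-trip fine:
print(ast.literal_eval('[1, 2.5, "three", (4,)]'))   # [1, 2.5, 'three', (4,)]

# But anything that isn't a literal--like a function call--is rejected:
try:
    ast.literal_eval('__import__("os").system("echo hi")')
except ValueError as e:
    print('rejected:', e)
```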

    Robustness

    Using repr/eval often leads to bugs that only appear in cases you may not have thought to test for, and that are very hard to track down when they do.

    For example, if you accidentally write an f without quotes when you meant to write "f", that _might_ get an error at eval time... or, if you happen to have a variable named f lying around at eval time, it'll get whatever value is in that variable. (Given the name, chances are it's a file that you expected to be short-lived, which now ends up being kept open for hours, causing some bug half-way across your program...)

    And the fact that repr looks human-readable (as long as the human is the developer) makes such un-caught mistakes even more likely once you start editing the files by hand.

    Using literal_eval solves this problem, but not in the best way. You will usually get an error at read time, instead of successfully reading in garbage. But it would be a lot nicer to get an error at write time, and there's no way to do that with repr.

    Portability

    It would be nice if your data were usable by other programs. Python representations look like they should be pretty portable--they look like JavaScript, and Ruby, and most other "scripting" languages (and in some cases, even like valid initializers for C, C++, etc.).

    But each of those languages is a little bit different. They don't backslash-escape the same characters in strings, and have different rules about what unnecessary/unknown backslashes mean, or what characters are allowed without escaping. They have different rules for quoting things with quotes in them. Almost all scripting languages agree on the basic two collection types (list/array and dict/hash/object), but most of them have at least one other native collection type that the others don't. For example, {1, 2, 3} is perfectly legal JavaScript, but it doesn't mean a set of three numbers.

    Unicode

    Python string literals can have non-ASCII characters in them... but only if you know what encoding they're in. Source files have a coding declaration to specify that. But data files don't (unless you decide you want to incorporate PEP 263 into your data file spec, and write the code to parse the coding declarations, and so on).

    Fortunately, repr will unicode-escape any strings you give it. (Some badly-designed third-party types may not do that properly, so you'll have to fix them.)

    But this means repr is not at all readable for non-English strings. A Chinese name turns into a string of \u1234 sequences.

    What to use instead

    The key is that you want to use a format designed for data storage or interchange, not one that just happens to often work.

    JSON

    JavaScript Object Notation is a subset of JavaScript literal syntax, which also happens to be a subset of Python literal syntax. But JSON has advantages over literal_eval.

    It's a de facto standard language for data interchange. Python comes with a json module; every other language you're likely to use has something similar either built in or easily available; there are even command-line tools to use JSON from the shell. Good text editors understand it.

    There's a good _reason_ it's a de facto standard: It's a good balance between easy for machines to generate, and parse, and validate, and easy for humans to read and edit. It's so simple that a JSON generator can add easy-to-twiddle knobs to let you trade off between compactness and pretty-printing, etc. (and yes, Python's stdlib module has them).

    Finally, it's based on UTF-8 instead of ASCII (or Latin-1 or "whatever"), so it doesn't have to escape any characters except a few special invisible ones; a Chinese name will look like a string of Chinese characters (in a UTF-8-compatible editor or viewer, at least).
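One caveat: Python's json module escapes non-ASCII by default (for safety on ASCII-only channels), so you have to ask for the readable form explicitly. The name here is made up:

```python
import json

name = '张伟'
print(json.dumps(name))                       # "\u5f20\u4f1f" -- escaped by default
print(json.dumps(name, ensure_ascii=False))   # "张伟" -- readable UTF-8
```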

    pickle

    The pickle module provides a Python-specific serialization framework that's incredibly powerful and flexible.

    Most types can be serialized without you even having to think about it. If you need to customize the way a class is pickled, you can, but you usually don't have to. It's robust. Anything that can't be pickled will give you an error at save time, rather than at load time. It's generally fast and compact.

    Pickle even understands references. For example, if you have a list with 20 references to the same list, and you dump it out and restore it with repr/eval, or JSON, you're going to get back a list of 20 separate copies that are equal, but not identical; with Pickle, you get what you started with. This also means that pickle will only use 1/20th as much storage as repr or JSON for that list. And it means pickle can dump things with circular dependencies.
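You can see the difference with a two-element case (a sketch):

```python
import json, pickle

inner = [1, 2]
outer = [inner, inner]      # two references to the same list

via_json = json.loads(json.dumps(outer))
print(via_json[0] is via_json[1])       # False: two equal but separate copies

via_pickle = pickle.loads(pickle.dumps(outer))
print(via_pickle[0] is via_pickle[1])   # True: the sharing is preserved
```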

    But of course it's not a silver bullet.

    Pickle is not meant to be readable, and in fact the default format isn't even text-based.

    Pickle does avoid the kind of accidental problems that make eval unsafe, but it's no more secure against malicious data.

    Pickle is not only Python-specific, but your-program-specific. In order to load an instance of a custom class, pickle has to be able to find the same class in the same module that it used at save time.

    Between JSON and pickle

    Sometimes you need to store data types that JSON can't handle, but you don't want all the flexibility (and insecurity and non-human-readability) of pickle.

    The json module itself can be extended to tell it how to encode your types; simple types can often be serialized as just the class name and the __dict__; complex types can mirror the pickling API. jsonpickle can do a lot of this work for you.
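Here's one common way to do the simple-type case, using the json module's default and object_hook parameters. The tagged-dict convention ('__date__') is just an invented example, not a standard:

```python
import datetime
import json

def encode(obj):
    # Encode otherwise-unserializable types as a tagged dict.
    if isinstance(obj, datetime.date):
        return {'__date__': obj.isoformat()}
    raise TypeError('not JSON serializable: {!r}'.format(obj))

def decode(d):
    # Undo the tagging on the way back in.
    if '__date__' in d:
        return datetime.date.fromisoformat(d['__date__'])
    return d

s = json.dumps({'when': datetime.date(2013, 11, 7)}, default=encode)
print(s)                                   # {"when": {"__date__": "2013-11-07"}}
print(json.loads(s, object_hook=decode))   # {'when': datetime.date(2013, 11, 7)}
```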

    YAML is a superset of JSON that adds a number of basic types (like datetime), and an extensibility model, plus an optional human-friendly (indentation-based) alternative syntax. You can restrict yourself to the safe subset of YAML (no extension types), or you can use it to build something nearly as powerful as pickle (simple types encoded as basically the class name plus the __dict__, complex types using a custom encoding).

    There are also countless other serialization formats out there, some of which fall within this range. Many of them are better when you have a more fixed, static shape for your objects. Some of them, you'll be forced to use because you're talking to some other program that only talks ASN.1 or plist or whatever. Wikipedia has a comparison table, with links.

    Beyond JSON and pickle

    Generally, you don't really need to serialize your objects; you need to serialize your data in such a way that you can create objects as needed.

    For example, let's say you have a set of Polynomial objects, where each Polynomial has a NumPy array of coefficients. While you could pickle that set, there's a much simpler idea: Just store a list instead of a set, and each element as a list of up to N coefficients instead of an object. Now you've got something you can easily store as JSON, or even just CSV.

    Sometimes, you can reduce everything to a single list or dict of strings, or of fixed records (each a list of N strings), or to things that you already know how to serialize to one of those structures. A list of strings without newlines can be serialized as just a plain text file, with each string on its own line. If there are newlines, you can escape them, or you can use something like netstrings. Fixed records are perfect for CSV. If you have a dict instead of a list, use dbm. For many applications, simple tools like that are all you need.

    Often, using one of those trivial formats at the top level, and JSON or pickle for relatively small but flexible objects underneath, is a good solution. The shelve module can automatically pickle simple objects into a dbm for you, but you can build similar things yourself relatively easily.
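A minimal shelve sketch (paths and keys invented; shelve stores pickled values in a dbm file):

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, 'state')
    with shelve.open(path) as db:   # keys are strings, values get pickled
        db['breakfast'] = {'spam': 7, 'beans': 10}
    with shelve.open(path) as db:   # reopening reads the stored values back
        print(db['breakfast'])      # {'spam': 7, 'beans': 10}
```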

    When your data turn out to have more complex relationships, you may find out that you're better off thinking through your data model, separate from "just a bunch of objects". If you have tons of simple entities with complex relations, and need to search in terms of any of those relations, a relational database like sqlite3 is perfect. On the other hand, if your data are best represented as non-trivial documents with simple relations, and you need to search based on document structure, CouchDB might make more sense. Sometimes an object-relational mapper like SQLAlchemy can help you connect your live object model to your data model. And so on.

    But whatever you have, once you've extracted the data model, it'll be a lot easier to decide how to serialize it.