(In Python) Can you save an object that is in memory to disk and reload it at a later time?

spacemanspiffy@lemmy.world · 12 days ago


solrize@lemmy.world · 12 days ago

The quick answer is to use a serialization/deserialization library like pickle. You can’t just dump a binary image and reload it in any simple way.
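A minimal sketch of that round trip with the standard-library pickle module. The sample dict and the file name `data.pickle` are placeholders; any picklable object works the same way.

```python
import os
import pickle
import tempfile

# Any picklable object: dicts, lists, instances of module-level classes, ...
data = {"name": "example", "values": [1, 2, 3], "nested": {"ok": True}}

# Temporary directory for the demo; use your own path in practice.
path = os.path.join(tempfile.mkdtemp(), "data.pickle")

# Serialize the in-memory object to disk ("wb": pickle produces bytes).
with open(path, "wb") as f:
    pickle.dump(data, f)

# Later (e.g. in another run of the program), deserialize it back.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == data)  # True
```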

UnfortunateShort@lemmy.world · 12 days ago

I think pickle is what you want.

Keep in mind that this might have a huge performance impact if you do it all the time: it’s still IO even when it’s not parsing.

spacemanspiffy@lemmy.world · 12 days ago

My idea would be to load one larger file once, without parsing anything, and keep it in memory the entire time, as opposed to what it does now, which is to load the files, parse them, and keep everything in memory.

But three people have responded here so far with “pickle”, so maybe that is the way.
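A sketch of that idea, assuming the source files can be parsed into plain Python data: parse everything once, pickle the combined result into a single cache file, and on later runs load only that file unless a source is newer than the cache. `parse_file`, `load_all`, and the `key=value` file format are hypothetical stand-ins for whatever parsing the program really does.

```python
import os
import pickle
import tempfile


def parse_file(path):
    """Hypothetical stand-in for the expensive parsing step: one key=value per line."""
    result = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                key, _, value = line.partition("=")
                result[key] = value
    return result


def load_all(source_paths, cache_path):
    """Return parsed data for all sources, reusing one pickle cache when it is up to date."""
    if os.path.exists(cache_path):
        cache_mtime = os.path.getmtime(cache_path)
        if all(os.path.getmtime(p) <= cache_mtime for p in source_paths):
            with open(cache_path, "rb") as f:
                return pickle.load(f)  # fast path: one file, no text parsing
    # Slow path: parse every source file, then write the combined result once.
    data = {os.path.basename(p): parse_file(p) for p in source_paths}
    with open(cache_path, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
    return data


# Usage: create two small source files in a temporary directory.
tmp = tempfile.mkdtemp()
sources = []
for name, text in [("a.txt", "x=1\ny=2\n"), ("b.txt", "z=3\n")]:
    p = os.path.join(tmp, name)
    with open(p, "w", encoding="utf-8") as f:
        f.write(text)
    sources.append(p)

cache = os.path.join(tmp, "cache.pickle")
first = load_all(sources, cache)   # parses the sources and writes the cache
second = load_all(sources, cache)  # reads the cache instead of re-parsing
```

The modification-time check is a simple heuristic; hashing the source contents would be more robust if files can change without their mtime advancing.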

UnfortunateShort@lemmy.world · 11 days ago

You can stuff all the info into an object and use it this way, no problem. I just wanted to point out that this doesn’t have zero performance impact compared to what you currently have.

So, depending on how your OS caches files, you might not want to do this, say, twice inside a lambda that you pass to an iterator over a huge slice or something.
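If the data is needed inside a hot loop, one option is to load it once and keep the object in memory. A minimal sketch using `functools.lru_cache`; the file `lookup.pickle` and `load_lookup` are hypothetical, and `load_count` exists only to show how many times the disk is touched.

```python
import functools
import os
import pickle
import tempfile

# Prepare a pickle file to read from (stands in for your real data file).
PATH = os.path.join(tempfile.mkdtemp(), "lookup.pickle")
with open(PATH, "wb") as f:
    pickle.dump({i: i * i for i in range(1000)}, f)

load_count = 0  # counts actual disk reads, for illustration only


@functools.lru_cache(maxsize=None)
def load_lookup():
    """Unpickle the file on the first call; later calls return the cached object."""
    global load_count
    load_count += 1
    with open(PATH, "rb") as f:
        return pickle.load(f)


# Calling load_lookup() inside the loop now reads the file only once.
total = sum(load_lookup()[i % 1000] for i in range(100_000))
print(load_count)  # 1
```

Every caller gets the same cached object, so mutating it changes what later calls see.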

kevincox@lemmy.ml · 10 days ago

“I don’t want the end executable to have to bundle these files and re-parse them each time it gets run.”

No matter how you persist data you will need to re-parse it. The question is really just whether the new format is more efficient to read than the old one. Some formats such as FlatBuffers and Cap'n Proto are designed to have very efficient loading processes.
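One way to answer that question for a specific dataset is to time the loading side of each candidate format. A rough sketch comparing `json.loads` and `pickle.loads` with `timeit`; the sample data is made up, and the results depend on the data's shape and the Python version, so no numbers are claimed here.

```python
import json
import pickle
import timeit

# Made-up sample data; substitute something shaped like your real data.
data = {"rows": [{"id": i, "name": f"item{i}", "score": i * 0.5} for i in range(10_000)]}

as_json = json.dumps(data)
as_pickle = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

# Both formats must round-trip to the same object before speed is worth comparing.
assert json.loads(as_json) == data
assert pickle.loads(as_pickle) == data

# Time only the loading side, since that is what runs on every start-up.
json_s = timeit.timeit(lambda: json.loads(as_json), number=20)
pickle_s = timeit.timeit(lambda: pickle.loads(as_pickle), number=20)
print(f"json.loads:   {json_s:.3f}s for 20 loads")
print(f"pickle.loads: {pickle_s:.3f}s for 20 loads")
```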

(Well, technically you could persist the process image to disk, but this tends to be much larger than the serialized data would be and has issues such as defeating ASLR. This is very rarely done.)

Lots of people are talking about pickle, but it isn’t particularly fast. That being said, with Python you can’t expect much speed to begin with.

CameronDev@programming.dev · 12 days ago

You are describing pickle, but it does come with some serious risks, especially if the file can be modified by a third party.

https://arjancodes.com/blog/python-pickle-module-security-risks-and-safer-alternatives/
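A harmless illustration of the risk: pickle rebuilds objects by calling whatever callable `__reduce__` names, so loading untrusted bytes can run arbitrary code. `side_effect` and `Malicious` are hypothetical; a real attack would name something like `os.system` instead.

```python
import pickle

events = []  # records the side effect so the demo stays harmless


def side_effect(message):
    """Stands in for something an attacker might call, such as os.system."""
    events.append(message)
    return "not the object you expected"


class Malicious:
    # __reduce__ tells pickle how to rebuild the object: "call side_effect(...)".
    def __reduce__(self):
        return (side_effect, ("code ran during pickle.loads",))


payload = pickle.dumps(Malicious())  # nothing runs yet; only a reference is stored

# Loading the bytes calls side_effect; no Malicious instance is ever created.
result = pickle.loads(payload)
print(events)  # ['code ran during pickle.loads']
print(result)  # not the object you expected
```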

I’d suggest using protobuf or similar instead, but it’s a bit more work.
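Protobuf needs a schema and generated code, so as a lighter stand-in (not protobuf itself), here is a sketch of the same idea with the standard library: store only plain data as JSON and rebuild the object explicitly from a `dataclass`. The `Settings` class and its fields are hypothetical.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Settings:
    name: str
    threshold: float
    tags: List[str] = field(default_factory=list)


def save(settings: Settings, path: str) -> None:
    """Write only plain data (strings, numbers, lists); nothing executable is stored."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(asdict(settings), f)


def load(path: str) -> Settings:
    """Rebuild the object explicitly; unknown or missing fields raise TypeError."""
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    return Settings(**raw)


path = os.path.join(tempfile.mkdtemp(), "settings.json")
original = Settings(name="demo", threshold=0.5, tags=["a", "b"])
save(original, path)
loaded = load(path)
print(loaded == original)  # True
```

Because the file holds only strings, numbers, and lists, loading it cannot execute code. Field names are checked by `Settings(**raw)`, but value types are not; a schema-based format like protobuf does more of that checking for you.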

12 days ago

There isn’t anything more efficient than just serialising and deserialising the data you want to load.

I’m sure you could use pickle to store the entire object graph, but it opens you up to all kinds of exploits (arbitrary code execution, etc.).
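For reference, this is what storing the entire object graph means in practice: pickle keeps track of objects it has already written, so shared references and cycles survive the round trip. A small sketch with a hypothetical `Node` class:

```python
import pickle


class Node:
    def __init__(self, name):
        self.name = name
        self.neighbours = []


# Build a small graph with a shared node and a cycle.
a, b, c = Node("a"), Node("b"), Node("c")
a.neighbours = [b, c]
b.neighbours = [c]  # c is reachable from both a and b
c.neighbours = [a]  # cycle back to a

restored_a = pickle.loads(pickle.dumps(a))
restored_b, restored_c = restored_a.neighbours

# The shared node is still a single object, and the cycle is intact.
print(restored_b.neighbours[0] is restored_c)  # True
print(restored_c.neighbours[0] is restored_a)  # True
```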

WolfLink@sh.itjust.works · 12 days ago

What is the “executable” in this context? I’m kinda confused as to what you are looking for.

What’s wrong with parsing the input files at runtime? Is it performance? Do you want one file to load instead of multiple?

Many have suggested pickle, which is kinda what you are asking for, but on some level it’s not much different from parsing the input files. Also, depending on your code, you may have to write custom serialization code as part of getting pickle to work.
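An example of that custom serialization code, assuming a class holds something pickle cannot handle, such as a `threading.Lock`: `__getstate__` drops the lock when saving and `__setstate__` recreates it when loading. The `Counter` class is hypothetical.

```python
import pickle
import threading


class Counter:
    def __init__(self):
        self.count = 0
        self._lock = threading.Lock()  # lock objects cannot be pickled

    def increment(self):
        with self._lock:
            self.count += 1

    def __getstate__(self):
        # Copy the instance attributes and leave out the lock.
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore the saved attributes and give the new instance its own lock.
        self.__dict__.update(state)
        self._lock = threading.Lock()


c = Counter()
c.increment()
c.increment()
restored = pickle.loads(pickle.dumps(c))
restored.increment()  # the recreated lock works as before
print(restored.count)  # 3
```

Without the two methods, `pickle.dumps(c)` fails with a `TypeError` because of the lock attribute.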

Note that pretty much every modern game is a bundle of one or more pieces of executable code alongside a whole bunch of separate assets.

GBU_28@lemm.ee · 11 days ago

Pickle.

But, depending on your needs, writing to SQLite can be blazing fast, and you could store your data as BLOBs where needed.
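A minimal sketch of the BLOB approach using the standard `sqlite3` module, pickling each object into a BLOB column. The table name `objects` and the `put`/`get` helpers are hypothetical, and the same trust caveat about unpickling applies to whatever is stored in the database.

```python
import pickle
import sqlite3

# ":memory:" keeps the demo self-contained; pass a file path to persist to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (key TEXT PRIMARY KEY, payload BLOB)")


def put(key, obj):
    """Serialize obj with pickle and store the bytes in a BLOB column."""
    conn.execute(
        "INSERT OR REPLACE INTO objects (key, payload) VALUES (?, ?)",
        (key, pickle.dumps(obj)),
    )
    conn.commit()


def get(key):
    """Fetch the BLOB for key and unpickle it, or return None if the key is absent."""
    row = conn.execute("SELECT payload FROM objects WHERE key = ?", (key,)).fetchone()
    return pickle.loads(row[0]) if row else None


put("config", {"threshold": 0.75, "tags": ["a", "b"]})
print(get("config"))   # {'threshold': 0.75, 'tags': ['a', 'b']}
print(get("missing"))  # None
```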

jerkface@lemmy.ca · 12 days ago

The Zope Object Database (aka ZODB) exists for more complex persistence use cases. It’s been a long time, though, so there are probably more modern options.
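ZODB is a third-party package; for a much simpler standard-library take on the same "persistent objects by key" idea, `shelve` stores picklable values in a dict-like file. This is a swapped-in alternative, not ZODB, and the base path `store` is a placeholder.

```python
import os
import shelve
import tempfile

# shelve may create one or more files derived from this base name (backend-dependent).
db_path = os.path.join(tempfile.mkdtemp(), "store")

# First "run": store objects under string keys.
with shelve.open(db_path) as db:
    db["user:1"] = {"name": "Ada", "roles": ["admin"]}
    db["counter"] = 41

# A later "run": reopen the same file and read or update the stored objects.
with shelve.open(db_path) as db:
    db["counter"] += 1  # reads, increments, and writes the value back
    user = db["user:1"]
    counter = db["counter"]
```

Without `writeback=True`, mutating a nested value in place (e.g. appending to `db["user:1"]["roles"]`) is not saved; reassign the whole value to the key instead.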

jerkface@lemmy.ca · 12 days ago

I took a closer look at what you are asking for, and no: you cannot hand a reference to a Python structure to a library, have it write the binary data from memory out to disk, and then read that same binary data back into living Python instances later. That’s just not how Python works. For one thing, any such structure is full of pointers, which would be invalid unless you reloaded it at the same address in memory, and that is not practical. You have to serialize and deserialize.
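A quick way to see the consequence with pickle: the reloaded data is a new object with equal contents, built fresh by the deserializer, not the original object at its original address.

```python
import pickle

original = {"points": [(0, 0), (1, 2)], "label": "path"}
restored = pickle.loads(pickle.dumps(original))

print(restored == original)                  # True: same contents
print(restored is original)                  # False: a different object
print(restored["points"] is original["points"])  # False: nested objects are rebuilt too
```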