Re: 'Filing' structure
Francesco Orsenigo wrote:
For a variety of reasons (saving info or sending it to a client) I must
translate this information into a machine-independent (and run-time
independent) format, i.e. I must translate all pointers to integer vector indices
and all integers to either big-endian or little-endian, assuming that chars,
floats and doubles have the same internal representation on all the kinds of
machines my game is likely to run on.
I'd like some suggestions on the topic, as doing this in plain C seems very
boring: for each structure I must write a function to convert it to an
unsigned char array and another to convert it back (checking for possible
invalid values); the smallest change I make to the structure must be applied
to the functions as well...
That's pretty much it.
It's tedious.
V-e-r-y t-e-d-i-o-u-s.
I once wrote a tool that would parse the C header files and auto-generate that
stuff - but parsing arbitrary C code is *HARD* (and doing it for C++ is almost
impossible - you end up writing an entire C++ compiler!) so it wasn't a particularly
reliable process.
Bear in mind another portability issue - which is that the padding of
structures may be different between hardware, OSes or even compilers.
eg:
struct Whatever
{
  char  x ;
  int   y ;
  short z ;
} ;
...this structure might be 7 bytes long on some machines. On others
it might be 8, on others 12, because of alignment requirements for that 'int'.
The compiler may also add padding after the last member so that the
elements of an array stay aligned. That trailing padding is counted in
sizeof(struct Whatever), so an array of 10 of these things is always
10*sizeof(struct Whatever) bytes long - which can be a lot more than
10 times the 7 bytes that the members add up to.
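To see what a particular compiler actually does with this structure, something along these lines prints the real offsets and size. It is only a sketch: the helper names whatever_unpadded_size and print_whatever_layout are made up for illustration, and the numbers it prints will differ from machine to machine.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, size_t */
#include <stdio.h>

struct Whatever
{
  char  x ;
  int   y ;
  short z ;
} ;

/* Sum of the member sizes, ie the size the struct would have with no padding. */
size_t whatever_unpadded_size ( void )
{
  return sizeof(char) + sizeof(int) + sizeof(short) ;
}

/* Print where this compiler placed each member and how big the struct is. */
void print_whatever_layout ( void )
{
  /* Padding can only ever make the struct bigger than its members. */
  assert ( sizeof(struct Whatever) >= whatever_unpadded_size () ) ;

  printf ( "offset of x : %zu\n", offsetof(struct Whatever, x) ) ;
  printf ( "offset of y : %zu\n", offsetof(struct Whatever, y) ) ;
  printf ( "offset of z : %zu\n", offsetof(struct Whatever, z) ) ;
  printf ( "sizeof      : %zu (members alone: %zu)\n",
           sizeof(struct Whatever), whatever_unpadded_size () ) ;
}
```

Calling print_whatever_layout() on each target platform shows quickly whether two builds agree on the layout.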
So, if you try:
struct Whatever my_whatever;
fread ( & my_whatever, sizeof(struct Whatever), 1, fd ) ;
...then things will go horribly wrong as soon as the file was written by
a program with a different structure layout. There isn't a *nice* way to
solve that either. There are broadly four approaches:
1) Always write your structures with the largest elements first (ie put all
the doubles at the top of the structure, then all the ints and floats,
then the shorts and finally the chars). This theoretically allows the
compiler to pack everything without padding - although there is no
guarantee of that.
2) Always insert worst-case padding yourself:
struct Whatever
{
  char  x ;
  char  useless_padding [ 3 ] ;
  int   y ;
  short z ;
  char  even_more_useless_padding [ 2 ] ;
} ;
3) Read and write structures one element at a time:
fread ( & (my_whatever.x), sizeof(char), 1, fd ) ;
fread ( & (my_whatever.y), sizeof(int), 1, fd ) ;
fread ( & (my_whatever.z), sizeof(short), 1, fd ) ;
4) Don't use binary files at all. Use XML or some other ASCII format.
All of those are either horribly tedious to write - or horribly error-prone.
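For approach (2), one way to notice when a compiler adds padding of its own anyway is a compile-time check. The sketch below assumes a C11 compiler (for static_assert) and swaps plain int and short for the fixed-width types from <stdint.h> so that the member sizes are known. The name PaddedWhatever is made up for illustration.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* int32_t, int16_t */

struct PaddedWhatever
{
  char    x ;
  char    useless_padding [ 3 ] ;
  int32_t y ;
  int16_t z ;
  char    even_more_useless_padding [ 2 ] ;
} ;

/* The build fails if the compiler's layout is not the one we wrote down. */
static_assert ( offsetof(struct PaddedWhatever, y) ==  4, "unexpected padding before y" ) ;
static_assert ( offsetof(struct PaddedWhatever, z) ==  8, "unexpected padding before z" ) ;
static_assert ( sizeof(struct PaddedWhatever)      == 12, "unexpected trailing padding" ) ;
```

This only pins down the layout. It does nothing about byte order, so the fields still need swapping when the file crosses between big-endian and little-endian machines.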
I generally do (3) or (4). Since you'll want to byte-swap files written
on a little-endian machine and read on big-endian (or vice-versa), you'll
end up with code for each stupid little field in the structure to swap it.
You might as well put the byte-swap code into the same function that reads
the field - so you end up with code that looks like:
fread_char_and_byte_swap ( & (my_whatever.x), 1, fd ) ;
fread_int_and_byte_swap ( & (my_whatever.y), 1, fd ) ;
fread_short_and_byte_swap ( & (my_whatever.z), 1, fd ) ;
...but it's still tedious and error-prone. :-(
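Here is a rough sketch of what such per-field helpers might look like. Note that it uses a slightly different technique from a conditional byte-swap: it fixes the file format as little-endian and builds each value out of individual bytes with shifts, which gives the same result whatever the byte order of the host. All the function names are invented for this example, it uses the fixed-width types from <stdint.h>, and it assumes 8-bit bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Store v into b[0..3], least significant byte first. */
void put_u32_le ( unsigned char *b, uint32_t v )
{
  b[0] = (unsigned char)(   v         & 0xFFu ) ;
  b[1] = (unsigned char)( ( v >>  8 ) & 0xFFu ) ;
  b[2] = (unsigned char)( ( v >> 16 ) & 0xFFu ) ;
  b[3] = (unsigned char)( ( v >> 24 ) & 0xFFu ) ;
}

/* Rebuild a 32-bit value from 4 little-endian bytes. */
uint32_t get_u32_le ( const unsigned char *b )
{
  return   (uint32_t) b[0]
       | ( (uint32_t) b[1] <<  8 )
       | ( (uint32_t) b[2] << 16 )
       | ( (uint32_t) b[3] << 24 ) ;
}

/* Same idea for 16-bit values. */
void put_u16_le ( unsigned char *b, uint16_t v )
{
  b[0] = (unsigned char)(   v        & 0xFFu ) ;
  b[1] = (unsigned char)( ( v >> 8 ) & 0xFFu ) ;
}

uint16_t get_u16_le ( const unsigned char *b )
{
  return (uint16_t)( b[0] | ( b[1] << 8 ) ) ;
}

/* Write one 32-bit field to the file; returns 1 on success, 0 on failure. */
int fwrite_u32_le ( uint32_t v, FILE *fd )
{
  unsigned char b [ 4 ] ;
  put_u32_le ( b, v ) ;
  return fwrite ( b, 1, sizeof b, fd ) == sizeof b ;
}

/* Read one 32-bit field from the file; returns 1 on success, 0 on failure. */
int fread_u32_le ( uint32_t *v, FILE *fd )
{
  unsigned char b [ 4 ] ;
  assert ( v != NULL ) ;
  if ( fread ( b, 1, sizeof b, fd ) != sizeof b ) return 0 ;
  *v = get_u32_le ( b ) ;
  return 1 ;
}
```

Because the file layout is now defined by these functions rather than by the struct, padding stops mattering too. You still need one call per field, so the tedium doesn't go away.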
For pointers, bear in mind that the SIZE of a pointer is different
between machines (typically 32 bits on 32-bit CPUs, 64 bits on 64-bit CPUs).
For C++ code be VERY CAREFUL about writing out classes that have
virtual member functions because they contain hidden pointers
(typically a pointer to the class's virtual function table).
To keep pointers straight, I typically build up a table of all
the addresses used in the file as I write the file out - converting
pointers into indices into that table. As I read the file, I
recreate the table and use it to convert indices back into addresses.
It's complicated though.
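The core of the table idea can be sketched in a few lines. This is a simplified illustration, not the code I actually use: the Node structure, the NO_INDEX marker for NULL and both function names are invented here, and the linear search would need replacing with something faster for big files.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_INDEX 0xFFFFFFFFu   /* written to the file in place of a NULL pointer */

struct Node
{
  int          value ;
  struct Node *next ;
} ;

/* Writing: turn an address into its position in the table of saved objects.
   Returns NO_INDEX for NULL (and for an address that is not in the table). */
uint32_t pointer_to_index ( struct Node * const *table, uint32_t count,
                            const struct Node *p )
{
  uint32_t i ;
  assert ( table != NULL ) ;
  if ( p == NULL ) return NO_INDEX ;
  for ( i = 0 ; i < count ; i++ )
    if ( table[i] == p ) return i ;
  return NO_INDEX ;
}

/* Reading: turn an index from the file back into an address in the rebuilt
   table. An out-of-range index (corrupt file) comes back as NULL. */
struct Node *index_to_pointer ( struct Node * const *table, uint32_t count,
                                uint32_t index )
{
  assert ( table != NULL ) ;
  if ( index == NO_INDEX || index >= count ) return NULL ;
  return table[index] ;
}
```

When reading, all the objects have to be allocated and entered into the table before any index is converted back, since an index may refer to an object that appears later in the file.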
If you need to do it **FAST**, you might want to consider using a
portable format (like XML/ASCII) and have the program read the file
in from XML - then write it back out in a machine-specific (but fast)
binary format. The first time the program runs, it'll take a long
time - but subsequently it can read the binary format and run much
more quickly.
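In outline, the load function tries the binary cache first and falls back to the portable file. The sketch below makes several simplifying assumptions: the portable file is just three numbers in plain text rather than real XML, the cache is a raw dump of the struct, and every function name is made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

struct Whatever
{
  char  x ;
  int   y ;
  short z ;
} ;

/* Slow path: parse the portable text form, here just "x y z" as numbers. */
int read_text ( const char *path, struct Whatever *w )
{
  int x, y, z, n ;
  FILE *fd = fopen ( path, "r" ) ;
  if ( fd == NULL ) return 0 ;
  n = fscanf ( fd, "%d %d %d", & x, & y, & z ) ;
  fclose ( fd ) ;
  if ( n != 3 ) return 0 ;
  w->x = (char) x ; w->y = y ; w->z = (short) z ;
  return 1 ;
}

/* Fast path: raw struct dump, only valid for the build that wrote it. */
int read_cache ( const char *path, struct Whatever *w )
{
  size_t n ;
  FILE *fd = fopen ( path, "rb" ) ;
  if ( fd == NULL ) return 0 ;
  n = fread ( w, sizeof(struct Whatever), 1, fd ) ;
  fclose ( fd ) ;
  return n == 1 ;
}

int write_cache ( const char *path, const struct Whatever *w )
{
  size_t n ;
  FILE *fd = fopen ( path, "wb" ) ;
  if ( fd == NULL ) return 0 ;
  n = fwrite ( w, sizeof(struct Whatever), 1, fd ) ;
  fclose ( fd ) ;
  return n == 1 ;
}

/* Returns 2 if the cache was used, 1 if the text file was parsed (and the
   cache written for next time), 0 if neither could be read. */
int load_whatever ( const char *text_path, const char *cache_path,
                    struct Whatever *w )
{
  assert ( w != NULL ) ;
  if ( read_cache ( cache_path, w ) ) return 2 ;
  if ( ! read_text ( text_path, w ) ) return 0 ;
  write_cache ( cache_path, w ) ;   /* a failure here only costs speed next run */
  return 1 ;
}
```

A real version would presumably also need some way to throw the cache away when the portable file changes or the program is rebuilt. That is left out here.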
Altogether, this is a miserable process. In an ideal world, we'd
have compiler support for this.
---------------------------- Steve Baker -------------------------
HomeEmail: <sjbaker1@airmail.net> WorkEmail: <sjbaker@link.com>
HomePage : http://www.sjbaker.org
Projects : http://plib.sf.net http://tuxaqfh.sf.net
http://tuxkart.sf.net http://prettypoly.sf.net
-----BEGIN GEEK CODE BLOCK-----
GCS d-- s:+ a+ C++++$ UL+++$ P--- L++++$ E--- W+++ N o+ K? w--- !O M-
V-- PS++ PE- Y-- PGP-- t+ 5 X R+++ tv b++ DI++ D G+ e++ h--(-) r+++ y++++
-----END GEEK CODE BLOCK-----