(This is an early draft!)

Default TLA extension = .bin

Binary Header/Octet-Stream Header/Binary ID Wrapper:

The header is almost identical to the PNG one. The first eight bytes of a BIN file always contain the following values:
(decimal) 137 66 73 78 13 10 26 10

This signature both identifies the file as a BIN file and provides for immediate detection of common file-transfer problems. The first two bytes distinguish BIN files on systems that expect the first two bytes to identify the file type uniquely. The first byte is chosen as a non-ASCII value to reduce the probability that a text file may be misrecognized as a BIN file; it also catches bad file transfers that clear bit 7. Bytes two through four name the format (the ASCII letters "BIN"). The CR-LF sequence catches bad file transfers that alter newline sequences. The control-Z character stops file display under MS-DOS. The final line feed checks for the inverse of the CR-LF translation problem.

After the 8-byte BINID header, the next byte indicates the length (0-255) of the text string that follows. The string itself is followed by a single zero byte (the terminator), which is not counted in the length.

To clarify some more: the text string is the name of the actual format contents. This was chosen instead of using a 4-byte/32-bit binary file ID, or relying on file extensions or equally silly methods. At least this way one probably won't run out of IDs. And it's also human-readable, so any program can simply show the text string to the user and state that the format is unsupported, instead of screaming that it's an unknown format.

After the zero termination, the actual data starts. This data can be anything: the byte order could be Network/Motorola order or Intel order; it could have its own sub-header indicating file size, checksum, etc.; or it could be raw data. In fact it could also be another file format dumped right in, otherwise unchanged.

IMPORTANT! Since the binid is plain text, it's obvious that "MPEG-1 Layer-3 Audio (MP3)" and "mpeg-1 layer-3 audio (mp3)" are the same, humanly speaking. However, to avoid ANY issues or confusion, when creating and checking/verifying an ID string, ALWAYS treat it as a binary ID, i.e. do a basic binary compare. In other words, typos are NOT acceptable.
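As a rough illustration of the layout described above (8-byte signature, one length byte, the ID string, a zero terminator, then the data), here is a minimal Python sketch of a reader. The function names `read_binid` and `is_format` are made up for this example and are not part of the format. Decoding the ID as UTF-8 is done only for display; the comparison itself is a plain byte-for-byte compare, as required.

```python
import io

# The 8-byte signature from the text: 137 'B' 'I' 'N' CR LF Ctrl-Z LF.
BIN_SIGNATURE = bytes([137, 66, 73, 78, 13, 10, 26, 10])


def read_binid(stream):
    """Return the raw ID bytes and leave the stream positioned at the data.

    Hypothetical helper: raises ValueError on a malformed or truncated header.
    """
    if stream.read(8) != BIN_SIGNATURE:
        raise ValueError("not a BIN file: bad signature")
    length_byte = stream.read(1)
    if len(length_byte) != 1:
        raise ValueError("truncated header: missing length byte")
    length = length_byte[0]  # 0-255, terminator not included
    binid = stream.read(length)
    if len(binid) != length:
        raise ValueError("truncated header: ID string too short")
    if stream.read(1) != b"\x00":
        raise ValueError("ID string is not zero terminated")
    return binid


def is_format(binid, expected):
    """Binary compare only: no case folding, no trimming, no normalisation."""
    return binid == expected.encode("utf-8")


# A tiny in-memory file: header for "Cool Test File" followed by some data.
data = BIN_SIGNATURE + bytes([14]) + b"Cool Test File" + b"\x00" + b"payload"
stream = io.BytesIO(data)
binid = read_binid(stream)
payload = stream.read()

# Show the ID to the user even if the format itself is unsupported.
print(binid.decode("utf-8", errors="replace"))
print(is_format(binid, "Cool Test File"))   # exact match
print(is_format(binid, "cool test file"))   # different bytes, so no match
```

A program that does not recognise the ID can still print the decoded string and tell the user the format is unsupported, which is the whole point of keeping the ID as text.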
The only reasons it IS a text string are: to avoid using up a limited 32-bit ID field; to avoid Motorola/Intel byte order issues with ID numbers; and to have something to show the user in those situations where the software doesn't support the file format. This way it's easy to inform the users, and they will be able to search the net or ask the company about the file format. So remember: "MPEG" and "mpeg" mean exactly the same thing to a human, but only ONE of them is acceptable as the ID.

It is also strongly advised not to use version numbers in the ID. Instead, try to keep such things in the content/data area itself (or the wrapped format).

Another interesting thing is that the .bin extension is not really needed. You can call the file .dat or whatever you want. If it's an MP3 file you might want to use this nifty binary header but call it .mp3 instead, which might be useful on a human and filesystem sorting level. A modern OS or piece of software should read the textual binid field and show that instead, though.

The binary header is basically a filetype/ID header stating that this is a binary file (the first 8 bytes), followed by the filetype/content data ID in the form of a null-terminated string (with the string length indicated by a single byte right before the string itself).

So say the filetype is "Cool Test File", which is obviously a non-existent filetype/format. The filetype name length is 14 bytes. The string is zero terminated, so the string actually takes 15 bytes in this case. And thus the entire file format header is 8+1+14+1 for a total of 24 bytes in this example. (BINID header, byte indicating ID length, the UTF-8 filetype name, null terminator.)

An MP3 file, which itself doesn't really have an easily identifiable header, could be just like this example, except that as filetype it would say "MPEG-1 Layer-3 Audio (MP3)" and the text length byte would naturally have the value 26. After the zero termination byte, you would have a typical MP3 file. So the full BINID header would be 8+1+26+1 for a total of 36 bytes before the MP3 begins.
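The arithmetic above can be checked with a small Python sketch that builds the header. The names `build_header` and `wrap` are hypothetical helpers invented for this example. Counting the 8-byte signature, the length byte, the name and the terminator gives 24 bytes for "Cool Test File" and 36 bytes for "MPEG-1 Layer-3 Audio (MP3)".

```python
# The 8-byte signature: 137 'B' 'I' 'N' CR LF Ctrl-Z LF.
BIN_SIGNATURE = bytes([137, 66, 73, 78, 13, 10, 26, 10])


def build_header(name):
    """Build signature + length byte + UTF-8 name + zero terminator."""
    encoded = name.encode("utf-8")
    if len(encoded) > 255:
        # The length is stored in a single byte, so 255 is the ceiling.
        raise ValueError("ID string must fit in one length byte (0-255)")
    return BIN_SIGNATURE + bytes([len(encoded)]) + encoded + b"\x00"


def wrap(name, payload):
    """Put a BIN header in front of otherwise unchanged data."""
    return build_header(name) + payload


cool = build_header("Cool Test File")
mp3 = build_header("MPEG-1 Layer-3 Audio (MP3)")

print(len(cool))  # 8 + 1 + 14 + 1
print(len(mp3))   # 8 + 1 + 26 + 1
print(cool[8], mp3[8])  # the length bytes: 14 and 26
```

Note that the length byte counts encoded bytes, not characters, so a name containing non-ASCII characters takes more bytes than it has letters.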
Features/Why you should use it/cool points:

It is named rather neutrally. It's simply BIN, as you saw earlier. So your customers won't wonder why "your" file format is named after some other company.

The file extension is almost redundant. But it should at least be .bin or the actual filetype extension (i.e. for an MP3 it would be .mp3) to remain familiar to users and compatible.

This file format is FREE to use. No licensing is needed at all (zlib license) and there never will be, and there are no patent issues or anything like that.

This format is FROZEN/LOCKED. Meaning it will never change. Ever. It's just a top layer binid.

Why: just some examples: "Core Dump (Intel Byte Order)", "MPEG-1 Layer-3 Audio (MP3)", "Windows Media Video (WMV)", "JPEG Image", etc. Yeah! The "description" itself becomes the actual BINID.

Hopefully web browsers will catch on quickly, as this would allow a fast and easy way to identify files that don't have a proper MIME type. So in the future the MIME type "application/octet-stream" won't be as mysterious any longer.

Another advantage I forgot to mention is that since it's basically a file ID "slap on" header, it is very easy to add on the fly. I.e. a web server could even add it to the start of a file/stream that is sent to the browser in case the file doesn't originally have a BIN header...

(C) 2010 Roger Hågensen, zlib license.
© Roger Hågensen, EmSai™ 2012