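Opera file formats: file format documentation

Published: 2010-04-25 18:19

This page can no longer be found on the Opera website. I eventually tracked it down in a Google cache snapshot and have copied it here.
Original address: http://www.opera.com/docs/fileformats/
Snapshot: http://74.125.153.132/search?q=cache:pjwwhNh2jUEJ:www.peeep.us/4899d1fa+opera+file+formats+cookies4&cd=5&hl=zh-CN&ct=clnk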


Opera File Formats



Introduction




This document describes the new binary file formats introduced with Opera
version 5 for the various files used in the cache and cookie management.




The new generic format that these files are based on is structured as a
sequence of tagged records with a given length. Each record may contain a
number of different data types such as strings and integers, as well as
arbitrary binary data, such as new records in this format.




The generic format is NOT backwards compatible with Opera 3.x and earlier,
but is intended to be reasonably backwards compatible for version 4.0 and
later. This means that if new fields are added by a future version, older
versions will still be able to read the information that they understand,
while ignoring the fields they do not understand.




There are some limits, but they are mostly concerned with the number of
significant bits in the integers that are used to indicate record lengths.
The formats are presently used to store the disk cache index file
(dcache4.url), the visited links file (vlink4.dat), the download rescue
file (download.dat) and the cookie file (cookies4.dat). The formats are
described in the following sequence:

  1. The Generic Binary Format

  2. The Cache File Formats

  3. The Cookie File Format

The formats for the windows history, news files and global history
files are the same as the ones used by Opera v3.x and are outside
the scope of this document.






The Generic Binary Format



Data types




Integers used in the format are unsigned, and stored in
big-endian/network style (Most Significant Byte first). Integers
stored inside the records are also stored in the big-endian format,
but may be signed, and may be truncated.




The following datatypes are used in this document:

int*                  Signed integer of * bits
uint*                 Unsigned integer of * bits
byte                  8 bit unsigned value
string                Sequence of characters (not null terminated)
time_t                uint32 representing a time value in seconds since 00:00 Jan 1,
                      1970 GMT. The representation may be increased past 32 bit in the
                      future
tag_id_type           Unsigned integer whose size is selected by the idtag_length header
                      field. The application must convert from this type to its internal
                      unsigned integer representation, preferably uint32. For more
                      information, see the file header format.
payload_length_type   Unsigned integer whose size is selected by the length_length header
                      field. The application must convert from this type to its internal
                      unsigned integer representation, preferably uint32. For more
                      information, see the file header format.
record                Separately defined sequence of fields



Data records




The general record format has this form:





struct record
{
    // application specific tag to identify content type
    tag_id_type tag_id;
    // length of payload
    payload_length_type length;
    // Payload/content of the record
    byte payload[length];
};




NOTE: The number of bytes in tag and length may change,
see below.




The fields of each record have the following meanings:




tag_id



The identifier of the record. This value is application specific,
and can be used to indicate the meaning of the payload content.




The actual content type of the record depends on the definitions used
for the actual file or super-record.




Tag_id values in which the MSB (Most Significant Bit) is set to 1 are
reserved for records that implicitly have no length. In such a record the
tag_id field is NOT followed by a length field, nor by a payload buffer.
Such records are used as Boolean flags: True if present, False if not
present.




In the binary storage of a file this means that the MSB of the internal
storage integer must be stored as the MSB of the first byte in the tag
field. This places a limit on how many tags can be used for a given
tag_id integer length. When a file is read into a program, the program
must take care to move the MSB of the binary stored tag to a common
(internal) bit position, such as the MSB of the program's own
unsigned integers.

bytes   Max id available (excluding MSB)
1       0x7f
2       0x7fff
3       0x7fffff
4       0x7fffffff



While it is technically possible to use the same tag id (without
the MSB) for a normal record and a flag record (such as 0x0001 [16 bit
tag] for the payload record and 0x8001 for the flag record), this is
not encouraged.
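
The MSB handling described above can be sketched as follows. This is a
minimal illustration, not Opera's implementation: the function name
normalize_tag and the choice of bit 31 as the common internal position
are assumptions made for the example.

```python
def normalize_tag(raw):
    """Return (tag_id, is_flag) for a tag field as stored in the file.

    raw holds idtag_length bytes in big-endian order. The MSB of the
    first stored byte marks a flag record; it is moved to bit 31 so the
    internal value does not depend on the stored width.
    """
    if not 1 <= len(raw) <= 4:
        raise ValueError("only tag fields of 1 to 4 bytes are handled here")
    value = int.from_bytes(raw, "big")
    stored_msb = 1 << (8 * len(raw) - 1)
    is_flag = bool(value & stored_msb)
    tag_id = value & (stored_msb - 1)      # id without the stored MSB
    if is_flag:
        tag_id |= 0x80000000               # assumed internal MSB position
    return tag_id, is_flag


one_byte_flag = normalize_tag(b"\x8b")      # flag tag 0x0b, 1-byte tags
two_byte_flag = normalize_tag(b"\x80\x01")  # flag tag 0x0001, 2-byte tags
plain_record = normalize_tag(b"\x22")       # normal record tag 0x22
```

With this normalization a 1-byte stored tag 0x8b and a 2-byte stored tag
0x800b both map to the same internal value.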




length

This field is the number of bytes in the payload that immediately
follow the field. It may be zero.


payload



The payload is a sequence of bytes of the length indicated by the
length field.




The meaning of the contents is indicated by the definition for the
given record or file structure. Examples of organization may be an
array of records, unsigned integers, signed integers, or characters.




It is recommended that only records of the types described here are
used if the type of the data varies, as variable (un-tagged) type
formats tend to be inflexible and difficult to maintain across versions,
especially when compatibility with older versions is desired.




Single item integers (signed or unsigned) may be truncated (zero bytes
removed), but arrays of integers must always use a fixed number of
bytes to represent values and derive the number of items from the
payload length. If the number of bytes needed to represent the values
changes in a future version a new tag should be used.
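
As an illustration of the two integer rules above, the following sketch
decodes a single (possibly truncated) integer and a fixed-width array.
The helper names are hypothetical. Treating an empty payload as zero,
and sign-extending a truncated signed value from its stored width, are
assumptions; the text above does not spell these cases out.

```python
def decode_int(payload, signed=False):
    """Decode a single big-endian integer of any stored width.

    Leading zero bytes may have been removed, so the payload can be
    shorter than the nominal type. An empty payload decodes to 0.
    """
    return int.from_bytes(payload, "big", signed=signed)


def decode_uint_array(payload, width):
    """Decode an array of fixed-width unsigned big-endian integers.

    The item count is derived from the payload length.
    """
    if width <= 0 or len(payload) % width:
        raise ValueError("payload length is not a multiple of the item width")
    return [
        int.from_bytes(payload[i:i + width], "big")
        for i in range(0, len(payload), width)
    ]


size = decode_int(b"\x01\x00")                      # uint32 256 stored in 2 bytes
items = decode_uint_array(b"\x00\x01\x00\x02", 2)   # two uint16 items
```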








File Format




These elements are not stored as records but directly in binary:





uint32 file_version_number;
uint32 app_version_number;
// number of bytes in the id tag, presently 1
uint16 idtag_length;
// number of bytes in the length part of a record, presently 2
uint16 length_length;
// array of records, number determined by length of file
struct record items[];




The present version number of the file format (file_version_number) is
0x00001000, where the lower 12 bits (bitmask 0x00000fff) represent the
minor version number and the remaining bits the major version number. A
minor version change must not be used if the file format is changed in
such a manner that older versions of the software cannot read the file
successfully. If the major version number is newer (or older) than the
application can read, it must not read the file.
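
The version rule can be sketched like this. The names SUPPORTED_MAJOR,
split_version and can_read are made up for the example, and a reader
that supports exactly one major version is an assumption.

```python
SUPPORTED_MAJOR = 0x00001000 >> 12   # major version 1, from the text above


def split_version(file_version_number):
    """Split file_version_number into (major, minor)."""
    return file_version_number >> 12, file_version_number & 0x00000FFF


def can_read(file_version_number):
    """A differing major version means the file must not be read."""
    major, _minor = split_version(file_version_number)
    return major == SUPPORTED_MAJOR


current = split_version(0x00001000)
```

A file with a newer minor version (for example 0x00001005) is still
accepted by this check; a file with major version 2 is rejected.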




The integer sizes are absolute for a given major version, and the integer
size for the file version number is fixed in any version.




The "app_version_number" is the version number of the application and is
independent of the file_version_number. It may be used by the application
to determine necessary actions needed to provide forward or backward
compatibility that is outside the scope of the file formats. The
interpretation of the application version number is application dependent.




The "idtag_length" and "length_length" fields give the number of bytes
used in the records for the idtags, as defined by the tag_id_type, and the
payload length fields, as defined by payload_length_type, respectively.




Specifically, the values of these fields define tag_id_type and
payload_length_type as the following integer types:

Value   tag_id_type       payload_length_type
        in idtag_length   in length_length
1       uint8             uint8
2       uint16            uint16
3       uint24            uint24
4       uint32            uint32



The application's internal representation of these types is not defined,
but uint32 is recommended. How an application should handle "idtag_length"
or "length_length" values larger than 4, or values larger than its internal
unsigned integer size, is not defined, but the application should
implement the rules specified in the forward compatibility guide for such
situations.




Present versions of Opera 4.x use idtag_length=1 (uint8) and
length_length=2 (uint16).




After the header, only records follow. The organization of the records and
their interpretation is application specific.
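
Putting the header and record layout together, a reader might look like
the sketch below. It is an illustration under assumptions, not a
reference implementation: the function names are hypothetical, flag tags
are returned with the MSB stripped plus a separate boolean, and the
sample bytes are constructed by hand rather than taken from a real Opera
file.

```python
import struct
from io import BytesIO


def read_header(stream):
    """Read file_version_number, app_version_number, idtag_length, length_length."""
    raw = stream.read(12)
    if len(raw) != 12:
        raise ValueError("file too short for the header")
    return struct.unpack(">IIHH", raw)     # big-endian: 2 x uint32, 2 x uint16


def read_exact(stream, count):
    data = stream.read(count)
    if len(data) != count:
        raise ValueError("truncated record")
    return data


def read_records(stream, idtag_length, length_length):
    """Yield (tag_id, is_flag, payload) until the end of the stream."""
    msb = 1 << (8 * idtag_length - 1)
    while True:
        raw_tag = stream.read(idtag_length)
        if not raw_tag:
            return                          # clean end of file
        if len(raw_tag) != idtag_length:
            raise ValueError("truncated tag")
        tag = int.from_bytes(raw_tag, "big")
        if tag & msb:                       # flag record: no length, no payload
            yield tag & (msb - 1), True, b""
            continue
        length = int.from_bytes(read_exact(stream, length_length), "big")
        yield tag, False, read_exact(stream, length)


# Hand-built sample: header, one string record (tag 0x03), one flag (0x0b).
sample = struct.pack(">IIHH", 0x1000, 0x20000, 1, 2) + b"\x03\x00\x03abc" + b"\x8b"
stream = BytesIO(sample)
header = read_header(stream)
records = list(read_records(stream, header[2], header[3]))
```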






Forward Compatibility




An older version of an application using this file format that
is NOT able to use long integers should nevertheless try to
process the file, but should bypass any record whose tag value
exceeds the version's own integer range, i.e. the integer overflows.
However, if the length of a record exceeds the application's limits
on integers or buffer capabilities, it must not continue to process
the file.




All applications must ignore tag values that they do not
understand.
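
The three rules above (skip on tag overflow, stop on length overflow,
ignore unknown tags) can be sketched as a single decision function. The
names and the default limits are assumptions chosen for the example; a
real application would use its own integer and buffer limits.

```python
class UnreadableFile(Exception):
    """Raised when the application must stop processing the file."""


def decide(tag, length, known_tags, max_tag=0x7FFFFFFF, max_length=0xFFFFFFFF):
    """Return "process" or "skip" for a record, or raise UnreadableFile."""
    if length > max_length:
        # The payload cannot be skipped reliably, so reading must stop.
        raise UnreadableFile("record length exceeds application limits")
    if tag > max_tag:
        return "skip"        # tag overflows the application's integer range
    if tag not in known_tags:
        return "skip"        # unknown tags are ignored
    return "process"


known = {0x03, 0x04}
verdict = decide(0x03, 10, known)
```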



The Cache File Formats




This section details the record tags and formats used for the visited
link file (vlink4.dat), the disk cache index file (dcache4.url) and the
download rescue file (download.dat). The present app_version_number of
these files is 0x00020000 (major version 2, minor 0).




These files use records (of different tag values) which contain a sequence
of records with tags from the same set of tag ids. The different files
use these tags for their records:

File            Tag id   Version number
Disk cache      0x01     0x00020000
Visited Links   0x02     0x00020000
Download        0x41     0x00020000



Each file consists of records of ONLY this type, with the exception
of the Disk cache index file, which also contains a single record with the
id 0x40, which contains a 5 character string used to find the next free
cache file number (oprXXXXX).




Each record is again a sequence of records with the same binary
representation format as the records in the file.






Common Elements Between All Files




These elements are used by all of the cache related files. In the case of
the visited links, these are the only fields presently used.




NOTE: "(0x0001 | MSB_VALUE)" means that the most
significant bit in the local unsigned integer is to be set. If 32 bit
values are used, that means the tag's value is 0x80000001.

Tag ID                 Contents   Meaning
0x0003                 string     The name of the URL, fully qualified
0x0004                 time_t     Last visited
(0x000b | MSB_VALUE)   flag       The URL is a result of a form query
0x0022                 record     Contains the name and last visited time of relative link in the
                                  document. May repeat
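
For the writing direction, a tag written as "(id | MSB_VALUE)" has to be
converted back to the stored width. The sketch below assumes a 32 bit
internal representation, so MSB_VALUE is 0x80000000; encode_tag is a
hypothetical helper name.

```python
MSB_VALUE = 0x80000000   # MSB of the assumed 32 bit internal integer


def encode_tag(tag, idtag_length):
    """Convert an internal tag value to its idtag_length-byte stored form."""
    is_flag = bool(tag & MSB_VALUE)
    tag_id = tag & 0x7FFFFFFF
    stored_msb = 1 << (8 * idtag_length - 1)
    if tag_id >= stored_msb:
        raise ValueError("tag id does not fit in the stored tag width")
    if is_flag:
        tag_id |= stored_msb             # MSB of the first stored byte
    return tag_id.to_bytes(idtag_length, "big")


form_query_flag = encode_tag(0x000B | MSB_VALUE, 1)   # 1-byte tags, as in Opera 4.x
url_tag = encode_tag(0x0003, 1)
```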



Content tags of relative link (tag 0x0022) records

Tag ID   Contents   Meaning
0x0023   string     The name of the relative link
0x0024   time_t     Last visited
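
As an example of how the nested records fit together, the sketch below
decodes one visited-link entry (the payload of a top-level 0x02 record)
using the tags from the two tables above. It assumes the Opera 4.x sizes
(1-byte tags, 2-byte lengths). The helper names, the dictionary layout,
the latin-1 decoding of strings and the sample bytes are all assumptions
for illustration; the source does not state a character encoding.

```python
def parse_records(buf, idtag_length=1, length_length=2):
    """Split buf into (tag_id, is_flag, payload) tuples."""
    msb = 1 << (8 * idtag_length - 1)
    pos, out = 0, []
    while pos < len(buf):
        tag = int.from_bytes(buf[pos:pos + idtag_length], "big")
        pos += idtag_length
        if tag & msb:                        # flag record: tag only
            out.append((tag & (msb - 1), True, b""))
            continue
        length = int.from_bytes(buf[pos:pos + length_length], "big")
        pos += length_length
        out.append((tag, False, buf[pos:pos + length]))
        pos += length
    return out


def decode_visited_link(payload):
    entry = {"url": None, "last_visited": None,
             "form_query": False, "relative_links": []}
    for tag, is_flag, data in parse_records(payload):
        if is_flag:
            if tag == 0x0B:
                entry["form_query"] = True
        elif tag == 0x03:
            entry["url"] = data.decode("latin-1")
        elif tag == 0x04:
            entry["last_visited"] = int.from_bytes(data, "big")
        elif tag == 0x22:                    # nested relative link record
            link = {}
            for sub_tag, sub_flag, sub_data in parse_records(data):
                if sub_flag:
                    continue
                if sub_tag == 0x23:
                    link["name"] = sub_data.decode("latin-1")
                elif sub_tag == 0x24:
                    link["last_visited"] = int.from_bytes(sub_data, "big")
            entry["relative_links"].append(link)
        # any other tag is ignored
    return entry


def rec(tag, payload):
    """Build a sample record with a 1-byte tag and 2-byte length."""
    return bytes([tag]) + len(payload).to_bytes(2, "big") + payload


inner = (rec(0x03, b"http://www.opera.com/")
         + rec(0x04, (956000000).to_bytes(4, "big"))
         + b"\x8b"
         + rec(0x22, rec(0x23, b"#top") + rec(0x24, (956000100).to_bytes(4, "big"))))
top_level = parse_records(rec(0x02, inner))
entry = decode_visited_link(top_level[0][2])
```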





Fields Used by Disk Cache and Download Rescue Files

Tag ID                 Contents   Meaning
0x0005                 time_t     Localtime, when the file was last loaded, not GMT
0x0007                 uint8      Status of load:
                                    2  Loaded
                                    4  Loading aborted
                                    5  Loading failed
0x0008                 uint32     Content size
0x0009                 string     MIME type of content
0x000a                 string     Character set of content
(0x000c | MSB_VALUE)   flag       File is downloaded and stored locally on user's disk, and is not
                                  part of the disk cache directory
0x000d                 string     Name of file (cache files: only local to cache directory)
(0x000f | MSB_VALUE)   flag       Always check if modified
0x0010                 record     Contains the HTTP protocol specific information



Fields used only by the download rescue file

Tag ID   Contents   Meaning
0x0028   time_t     Identifies the time when the loading of the last/previous segment
                    of the downloaded file started.
0x0029   time_t     Identifies the time when the loading of the last/previous segment
                    of the downloaded file was stopped.
0x002A   uint32     How many bytes were in the previous segment of the file being
                    downloaded. If the time the loading ended is not known, this value will
                    be assumed to be zero (0) and the download speed set to zero
                    (unknown).
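
One way these three fields could be combined is sketched below. The
source only says that the speed is set to zero (unknown) when the stop
time is missing; computing the speed as bytes divided by elapsed seconds
is an assumption, as is the function name.

```python
def segment_speed(started, stopped, segment_bytes):
    """Return bytes per second for the previous segment, or 0.0 if unknown.

    started and stopped are time_t values (tags 0x0028 and 0x0029);
    stopped is None when tag 0x0029 is absent. segment_bytes comes
    from tag 0x002A.
    """
    if stopped is None or stopped <= started:
        return 0.0                          # unknown speed
    return segment_bytes / (stopped - started)


speed = segment_speed(1000, 1010, 5000)     # 5000 bytes in 10 seconds
unknown = segment_speed(1000, None, 5000)
```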



Fields used in the HTTP protocol specific record




All methods are by default GET, at present it is not possible to cache
POST requests.

Tag ID                 Contents   Meaning
0x0015                 string     HTTP date header
0x0016                 time_t     Expiry date
0x0017                 string     Last modified date
0x0018                 string     MIME type of document
0x0019                 string     Entity tag
0x001A                 string     Moved to URL (Location header)
0x001B                 string     Response line text
0x001C                 uint32     Response code
0x001D                 string     Refresh URL
0x001E                 uint32     Refresh delta time
0x001F                 string     Suggested file name
0x0020                 string     Content Encodings
0x0021                 string     Content Location
0x0025                 uint32     Together with tag 0x0026 (both must be present) this identifies the
                                  User Agent string last used to load the resource. This value identifies
                                  the User Agent string.
                                  This value is used internally, and should not be modified.
0x0026                 uint32     Together with tag 0x0025 (both must be present) this identifies the
                                  User Agent string last used to load the resource. This value identifies
                                  the User Agent sub version.
                                  This value is used internally, and should not be modified.
(0x0030 | MSB_VALUE)   flag       Reserved for future use
(0x0031 | MSB_VALUE)   flag       Reserved for future use





Cookie File format




This section describes the record tags and formats used for the storage
of cookies (cookies4.dat). The present app_version_number of this file
type is 0x00002000 (major version 2, minor 0).




The cookie file is organized as a tree of domain name components, each
component then holds a tree of path components and each path component
may contain a number of cookies.




NOTE: The components are a sequence of records, terminated
with a flag record, not a single record.






Structure



Domain components




The domain components are used to organize the cookies for each server
and domain for which cookies or cookie filtering capabilities are defined.




A domain component is started with a domain record, which holds the domain
name and some flags for that particular domain. It is then followed by a
path component holding the cookies and subdirectory path components (and
cookies), followed with a path component terminator and any number of
subdomain components before it is terminated by a domain-end flag record.




E.g., cookies for the domain www.opera.com will be stored in this manner:





["com" record]
  ["opera" record]
    ["www" record]
      [cookies]
      [Path components]
      [Path component terminator]
      [other domains]
    [end of domain flag ("www")]
  [end of domain flag ("opera")]
[end of domain flag ("com")]




All names of domain components are non-dotted, except IP addresses. An IP
address can only be stored as the complete address in dotted quad form,
e.g. "10.11.12.13"; it is stored at the top level and cannot contain any
subdomains.
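
To find the place of a host in this tree, its name has to be split into
components in storage order. A small sketch, assuming IPv4 addresses are
recognized with Python's ipaddress module; the function name is
hypothetical.

```python
import ipaddress


def domain_components(host):
    """Return the domain components of host, outermost first."""
    try:
        ipaddress.IPv4Address(host)
    except ValueError:
        # Ordinary host name: "www.opera.com" -> com, opera, www
        return list(reversed(host.split(".")))
    # IP addresses are kept whole, as a single top-level component.
    return [host]


by_name = domain_components("www.opera.com")
by_address = domain_components("10.11.12.13")
```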




A Domain Record uses the tag "0x01" and contains a sequence of these fields:

Tag ID   Contents   Meaning
0x001E   string     The name of the domain part
0x001F   int8       How cookies are filtered for this domain. If not present, the filtering
                    of the parent domain is used.

                      1. All cookies from this domain are accepted.

                      2. No cookies from this domain are accepted.

                      3. All cookies from this server are accepted. Overrides 1 and 2 for
                         higher level domains automatically.

                      4. No cookies from this server are accepted. Overrides 1 and 2 for
                         higher level domains.

                    Domain settings apply to all subdomains, except those with a server
                    specific selection.
0x0021   int8       Handling of cookies that have explicit paths which do not match the
                    URL setting the cookies. If enabled in the privacy preferences the
                    default is to warn the user, but when warning is enabled such cookies
                    can be filtered by their domains: Value 1 indicates reject, and 2 is
                    accept automatically.
0x0025   int8       While in the "Warn about third party cookies" mode, this field can be
                    used to automatically filter such cookies.

                      1. All third party cookies from this domain are accepted.

                      2. No third party cookies from this domain are accepted.

                      3. All third party cookies from this server are accepted. Overrides
                         1 and 2 for higher level domains automatically.

                      4. No third party cookies from this server are accepted. Overrides
                         1 and 2 for higher level domains.

                    Domain settings apply to all subdomains, except those with a server
                    specific selection.





This record can be followed by zero or more path components defining
top-level paths on servers in the domain, and is always terminated by
a path component terminator record. Then zero or more domain components
may follow.




A domain component is terminated by a (0x0004 | MSB_VALUE) flag record.




Path Components




The path components organize the cookies defined for a given directory in
a given domain, as well as any subdirectories of this directory that have
cookies defined.




Except for the path component starting immediately after the
domain component record, each path component always starts with a path
record, and is then followed by any number of cookie records and
subdirectory path components.




The path record uses the record id "0x0002" and contains this field
record:

Tag ID   Contents   Meaning
0x001D   string     The name of the path part



The path component terminator is the (0x0005 | MSB_VALUE) flag record.
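
The domain and path nesting described above can be followed with two
stacks, as in the sketch below. It is an illustration under several
assumptions: 1-byte tags and 2-byte lengths, latin-1 strings, path parts
joined with "/", and a hand-built sample rather than a real
cookies4.dat. Cookie records use tag 0x0003 and the cookie name field
0x0010, as given in the Cookie Records section of the source. All helper
names are hypothetical.

```python
DOMAIN, PATH, COOKIE = 0x01, 0x02, 0x03
END_DOMAIN, END_PATH = 0x04, 0x05            # stored as flag records
DOMAIN_NAME, PATH_NAME, COOKIE_NAME = 0x1E, 0x1D, 0x10


def parse_records(buf, idtag_length=1, length_length=2):
    """Split buf into (tag_id, is_flag, payload) tuples."""
    msb = 1 << (8 * idtag_length - 1)
    pos, out = 0, []
    while pos < len(buf):
        tag = int.from_bytes(buf[pos:pos + idtag_length], "big")
        pos += idtag_length
        if tag & msb:
            out.append((tag & (msb - 1), True, b""))
            continue
        length = int.from_bytes(buf[pos:pos + length_length], "big")
        pos += length_length
        out.append((tag, False, buf[pos:pos + length]))
        pos += length
    return out


def text_field(payload, wanted):
    """Return the first string field with tag `wanted`, or ""."""
    for tag, is_flag, data in parse_records(payload):
        if not is_flag and tag == wanted:
            return data.decode("latin-1")
    return ""


def walk_cookies(buf):
    """Yield (host, path, cookie_name) for every cookie record in buf."""
    domains, paths = [], []
    for tag, is_flag, data in parse_records(buf):
        if is_flag:
            if tag == END_DOMAIN and domains:
                domains.pop()
            elif tag == END_PATH and paths:
                paths.pop()
            # END_PATH with an empty stack closes the implicit root path.
        elif tag == DOMAIN:
            domains.append(text_field(data, DOMAIN_NAME))
        elif tag == PATH:
            paths.append(text_field(data, PATH_NAME))
        elif tag == COOKIE:
            host = ".".join(reversed(domains))
            yield host, "/" + "/".join(paths), text_field(data, COOKIE_NAME)


def rec(tag, payload):
    return bytes([tag]) + len(payload).to_bytes(2, "big") + payload


def named(tag, field_tag, name):
    return rec(tag, rec(field_tag, name))


sample = (
    named(DOMAIN, DOMAIN_NAME, b"com") + b"\x85"
    + named(DOMAIN, DOMAIN_NAME, b"opera") + b"\x85"
    + named(DOMAIN, DOMAIN_NAME, b"www")
    + named(COOKIE, COOKIE_NAME, b"A")
    + named(PATH, PATH_NAME, b"docs")
    + named(COOKIE, COOKIE_NAME, b"B")
    + b"\x85"               # end of the "docs" path component
    + b"\x85"               # end of the root path component of "www"
    + b"\x84\x84\x84"       # end of "www", "opera" and "com"
)
found = list(walk_cookies(sample))
```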




Cookie Records




The cookie entries are stored in records of type "0x0003" and have the
following field records:

Tag ID                 Contents   Meaning
0x0010                 string     The name of the cookie
0x0011                 string     The value of the cookie
0x0012                 time_t     Expiry date
0x0013                 time_t     Last used
0x0014                 string     Comment/Description of use (RFC 2965)
0x0015                 string     URL for Comment/Description of use (RFC 2965)
0x0016                 string     The domain received with version=1 cookies (RFC 2965)
0x0017                 string     The path received with version=1 cookies (RFC 2965)
0x0018                 string     The port limitations received with version=1 cookies (RFC 2965)
(0x0019 | MSB_VALUE)   flag       The cookie will only be sent to HTTPS servers.
0x001A                 int8+      Version number of cookie (RFC 2965)
(0x001B | MSB_VALUE)   flag       This cookie will only be sent to the server that sent it.
(0x001C | MSB_VALUE)   flag       Reserved for delete protection: Not yet implemented
(0x0020 | MSB_VALUE)   flag       This cookie will not be sent if the path is only a prefix of the
                                  URL. If the path is /foo, /foo/bar will match but not /foobar.
(0x0022 | MSB_VALUE)   flag       If true, this cookie was set as the result of a password login
                                  form, or by a URL that was retrieved using a cookie that can be
                                  tracked back to such a cookie.
(0x0023 | MSB_VALUE)   flag       If true, this cookie was set as the result of a HTTP authentication
                                  login, or by a URL that was retrieved using a cookie that can be
                                  tracked back to such a cookie.
(0x0024 | MSB_VALUE)   flag       In "Display Third party cookies" mode this flag will be set if
                                  the cookie was set by a third party server, and only these cookies
                                  will be sent if the URL is a third party. Cookies that were received
                                  when loading a URL from the server directly will not be sent to third
                                  party URLs in this mode. The reverse is NOT true.

                                  NOTE: If a third party server redirects back to the
                                  first party server, the redirected URL is considered third party.
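
The path rule of the (0x0020 | MSB_VALUE) flag in the table above can be
illustrated with a short sketch. The function name is hypothetical, and
the behaviour without the flag (plain string prefix match) is an
assumption; the table only describes the case where the flag is set.

```python
def path_matches(cookie_path, url_path, full_segment_only):
    """Decide whether a cookie path applies to a URL path.

    full_segment_only corresponds to the (0x0020 | MSB_VALUE) flag:
    the cookie path must match whole path segments, not merely be a
    string prefix of the URL path.
    """
    if not url_path.startswith(cookie_path):
        return False
    if not full_segment_only:
        return True                          # assumed: plain prefix match
    if len(url_path) == len(cookie_path) or cookie_path.endswith("/"):
        return True
    return url_path[len(cookie_path)] == "/"


sub_directory = path_matches("/foo", "/foo/bar", True)   # matches
sibling_name = path_matches("/foo", "/foobar", True)     # does not match
```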





Original post: http://qtchina.tk/?q=node/428
