This page can no longer be found on the Opera website. I eventually recovered it from Google's cache and have copied it here.
Original address: http://www.opera.com/docs/fileformats/
Cached snapshot: http://74.125.153.132/search?q=cache:pjwwhNh2jUEJ:www.peeep.us/4899d1fa+opera+file+formats+cookies4&cd=5&hl=zh-CN&ct=clnk
Opera File Formats
Introduction
This document describes the new binary file formats introduced with Opera
version 5 for the various files used in the cache and cookie management.
The new generic format that these files are based on is structured as a
sequence of tagged records with a given length. Each record may contain a
number of different data types such as strings and integers, as well as
arbitrary binary data, such as new records in this format.
The generic format is NOT backwards compatible with Opera 3.x and earlier,
but is intended to be reasonably backwards compatible for version 4.0 and
later. This means that if new fields are added by a future version, older
versions will still be able to read the information that they understand,
while ignoring the fields they do not understand.
There are some limits, but they are mostly concerned with the number of
significant bits in the integers that are used to indicate record lengths.
The formats are presently used to store the disk cache index file
(dcache4.url), the visited links file (vlink4.dat), the download rescue
file (download.dat) and the cookie file (cookies4.dat). The formats are
described in the following sequence: the generic binary format, the cache
file formats, and the cookie file format.
The formats for the windows history, news files and global history
files are the same as the ones used by Opera v3.x and are outside
the scope of this document.
The Generic Binary Format
Data types
Integers used in the format are unsigned, and stored in
big-endian/network style (Most Significant Byte first). Integers
stored inside the records are also stored in the big-endian format,
but may be signed, and may be truncated.
The following datatypes are used in this document:
Type | Description |
---|---|
int* | Signed integer of * bits |
uint* | Unsigned integer of * bits |
byte | 8 bit unsigned value |
string | Sequence of characters (not null terminated) |
time_t | uint32 representing a time value in seconds since 00:00 Jan 1, 1970 GMT. The representation may be increased past 32 bit in the future |
tag_id_type | Unsigned integer whose size is selected by the idtag_length header field. The application must convert from this type to its internal unsigned integer representation, preferably uint32. For more information, see the file header format. |
payload_length_type | Unsigned integer whose size is selected by the length_length header field. The application must convert from this type to its internal unsigned integer representation, preferably uint32. For more information, see the file header format. |
record | Separately defined sequence of fields |
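As an illustration of the types above, the following Python sketch decodes big-endian unsigned integers of any width (including the 3 byte uint24 case and truncated single integers) and converts a time_t payload to a UTC datetime. The names read_uint and read_time are hypothetical, chosen only for this example.

```python
from datetime import datetime, timezone


def read_uint(data: bytes) -> int:
    """Decode an unsigned big-endian integer of any byte length."""
    # int.from_bytes handles 1, 2, 3 (uint24) and 4 byte values alike, so a
    # truncated integer (leading zero bytes removed) decodes to the same value.
    return int.from_bytes(data, byteorder="big", signed=False)


def read_time(data: bytes) -> datetime:
    """Interpret a time_t payload as seconds since 00:00 Jan 1, 1970 GMT."""
    return datetime.fromtimestamp(read_uint(data), tz=timezone.utc)
```

Signed values are deliberately left out of this sketch: the text does not say how the sign of a truncated signed integer is to be recovered.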
Data records
The general record format has this form:
struct record
{
    tag_id_type         tag_id;
    payload_length_type length;
    byte                payload[length];
};
NOTE: The number of bytes in tag and length may change,
see below.
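A minimal Python sketch of how one such record could be serialized is shown here. The function name encode_record is hypothetical; the default widths of 1 byte for the tag and 2 bytes for the length are the values this document reports for Opera 4.x.

```python
def encode_record(tag_id: int, payload: bytes,
                  idtag_length: int = 1, length_length: int = 2) -> bytes:
    """Serialize one record as tag_id, length, payload (all big-endian)."""
    # The top bit of the stored tag is reserved (see the tag_id description),
    # so a normal record must use a tag that fits in the remaining bits.
    if tag_id >> (8 * idtag_length - 1):
        raise ValueError("tag_id does not fit below the reserved top bit")
    # to_bytes raises OverflowError if the payload is too long for the
    # chosen length width.
    return (tag_id.to_bytes(idtag_length, "big")
            + len(payload).to_bytes(length_length, "big")
            + payload)
```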
The fields of each record have the following meanings:
- tag_id
The identifier of the record. This value is application specific,
and can be used to indicate the meaning of the payload content.
The actual content type of the record depends on the definitions used
for the actual file or super-record.
Tag_id values in which the MSB (Most Significant Bit) is set to 1 are
reserved for records that implicitly have no length: the tag_id field is
NOT followed by a length field or a payload buffer. Such records are
used as Boolean flags: True if present, False if not present.
In the binary storage of a file this means that the MSB of the internal
storage integer must be stored as the MSB of the first byte in the tag
field. This places a limit on how many tags can be used for a given
tag_id integer length. When a file is read into a program, the program
must take care to move the MSB of the binary stored tag to a common
(internal) bit position, such as the MSB of the program's own
unsigned integers.
Bytes | Max id available (excluding MSB) |
---|---|
1 | 0x7f |
2 | 0x7fff |
3 | 0x7fffff |
4 | 0x7fffffff |
While it is technically possible to use the same tag id (without
the MSB) for a normal record and a flag record (such as 0x0001 [16 bit
tag] for the payload record and 0x8001 for the flag record), this is
not encouraged.
- length
This field is the number of payload bytes that immediately
follow the field. It may be zero.
- payload
The payload is a sequence of bytes of the length indicated by the
length field.
The meaning of the contents is indicated by the definition for the
given record or file structure. Examples of organization may be an
array of records, unsigned integers, signed integers, or characters.
It is recommended that only records of the types described here are
used if the type of the data varies, as variable (un-tagged) type
formats tend to be inflexible and difficult to maintain across versions,
especially when compatibility with older versions is desired.
Single item integers (signed or unsigned) may be truncated (zero bytes
removed), but arrays of integers must always use a fixed number of
bytes to represent values and derive the number of items from the
payload length. If the number of bytes needed to represent the values
changes in a future version a new tag should be used.
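The record layout and the flag rule described above can be sketched as a small Python generator. This is an illustrative reader, not Opera's implementation: the name iter_records is hypothetical, and instead of moving the MSB to an internal bit position it reports the flag status as a separate boolean, which serves the same purpose.

```python
from typing import Iterator, Optional, Tuple


def iter_records(buf: bytes, idtag_length: int = 1, length_length: int = 2
                 ) -> Iterator[Tuple[int, bool, Optional[bytes]]]:
    """Yield (tag_id, is_flag, payload) for each record in buf."""
    msb = 1 << (8 * idtag_length - 1)  # top bit of the stored tag
    pos = 0
    while pos < len(buf):
        if pos + idtag_length > len(buf):
            raise ValueError("truncated tag")
        raw_tag = int.from_bytes(buf[pos:pos + idtag_length], "big")
        pos += idtag_length
        if raw_tag & msb:
            # Flag record: no length field and no payload follow.
            yield raw_tag & (msb - 1), True, None
            continue
        if pos + length_length > len(buf):
            raise ValueError("truncated length")
        length = int.from_bytes(buf[pos:pos + length_length], "big")
        pos += length_length
        if pos + length > len(buf):
            raise ValueError("truncated payload")
        yield raw_tag, False, buf[pos:pos + length]
        pos += length
```

Because a payload may itself be a sequence of records, the same function can be applied again to a yielded payload.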
The file header consists of the following elements, which are stored
directly in binary rather than as records:
uint32 file_version_number;
uint32 app_version_number;
uint16 idtag_length;
uint16 length_length;
struct record items[];
The present version number of the file format (file_version_number) is
0x00001000, where the lower 12 bits (bitmask 0x00000fff) represent the
minor version number, the rest is the major version number. Changes in the
minor version must not be used if the file format is changed in
such a manner that older versions of the software cannot read the file
successfully. If the major version number is newer (or older) than the
application can read, it must not read the file.
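The version rule can be expressed as a short check. The sketch below is in Python; SUPPORTED_MAJOR and the function names are assumptions made for the example, standing in for whatever major version a given reader understands.

```python
SUPPORTED_MAJOR = 1  # assumption: this reader understands major version 1


def split_file_version(file_version_number: int) -> tuple:
    """Split file_version_number into (major, minor)."""
    # The lower 12 bits are the minor version, the rest is the major version.
    return file_version_number >> 12, file_version_number & 0x00000FFF


def can_read(file_version_number: int) -> bool:
    """A file is readable only if its major version matches exactly."""
    major, _minor = split_file_version(file_version_number)
    return major == SUPPORTED_MAJOR
```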
The integer sizes are absolute for a given major version, and the integer
size for the file version number is fixed in any version.
The "app_version_number" is the version number of the application and is
independent of the file_version_number. It may be used by the application
to determine necessary actions needed to provide forward or backward
compatibility that is outside the scope of the file formats. The
interpretation of the application version number is application dependent.
The "idtag_length" and "length_length" fields give the number of bytes
used in the records for the idtags, as defined by the tag_id_type, and the
payload length fields, as defined by payload_length_type, respectively.
Specifically, the values of these fields define tag_id_type and
payload_length_type as the following integer types:
Value | tag_id_type (value in idtag_length) | payload_length_type (value in length_length) |
---|---|---|
1 | uint8 | uint8 |
2 | uint16 | uint16 |
3 | uint24 | uint24 |
4 | uint32 | uint32 |
The application's internal representation of these types is not defined,
but uint32 is recommended. How an application should handle "idtag_length"
or "length_length" values larger than 4, or values larger than its internal
unsigned integer size, is not defined, but the application should
implement the rules specified in the forward
compatibility guide for such situations.
Present versions of Opera 4.x use idtag_length=1 (uint8) and
length_length=2 (uint16).
After the header, only records follow. The organization of the records and
their interpretation is application specific.
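Reading the 12 byte header could look like the following Python sketch, which uses the standard struct module. FileHeader and parse_header are hypothetical names. Rejecting widths outside 1 to 4 is an assumption of this example, since the text leaves that case undefined.

```python
import struct
from typing import NamedTuple


class FileHeader(NamedTuple):
    file_version_number: int
    app_version_number: int
    idtag_length: int
    length_length: int


# Big-endian: uint32, uint32, uint16, uint16 (12 bytes in total).
HEADER = struct.Struct(">IIHH")


def parse_header(data: bytes) -> FileHeader:
    """Decode the fixed header found at the start of the file."""
    if len(data) < HEADER.size:
        raise ValueError("file too short for header")
    header = FileHeader(*HEADER.unpack_from(data, 0))
    # Assumption: this reader only supports the widths listed in the table.
    if not (1 <= header.idtag_length <= 4 and 1 <= header.length_length <= 4):
        raise ValueError("unsupported tag or length width")
    return header
```

The records start at offset HEADER.size, immediately after the header.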
Forward Compatibility
An older version of an application using this file format that
is NOT able to use long integers should still try to process the file,
but should bypass any record whose tag value exceeds the version's own
integer range, i.e. the integer overflows. However, if the length of a
record exceeds the application's limits on integers or buffer
capabilities, it must not continue to process the file.
All applications must ignore tag values that they do not
understand.
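One simple way to honour the rule about unknown tags is a table of handlers keyed by tag id, sketched below in Python. The function decode_known and the shape of its inputs (already separated tag and payload pairs) are assumptions for this example.

```python
def decode_known(records, handlers):
    """Apply handlers to known tags and silently skip the rest.

    records is an iterable of (tag_id, payload) pairs; handlers maps a
    tag id to a function that decodes that tag's payload.
    """
    result = {}
    for tag_id, payload in records:
        handler = handlers.get(tag_id)
        if handler is None:
            continue  # unknown tag: ignore it, as the text requires
        result[tag_id] = handler(payload)
    return result
```

A reader written this way keeps working when a newer writer adds fields, because the extra tags simply never match a handler.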
The Cache File Formats
This section details the record tags and formats used for the visited
link file (vlink4.dat), the disk cache index file (dcache4.url) and the
download rescue file (download.dat). The present app_version_number of
these files is 0x00020000 (major version 2, minor 0).
These files use records (of different tag values) which contain a sequence
of records with tags from the same set of tag ids. The different files
use these tags for their records:
File | Tag id | Version number |
---|---|---|
Disk cache | 0x01 | 0x00020000 |
Visited Links | 0x02 | 0x00020000 |
Download | 0x41 | 0x00020000 |
Each file consists of records of ONLY this type, with the exception
of the Disk cache index file, which also contains a single record with the
id 0x40, which contains a 5 character string used to find the next free
cache file number (oprXXXXX).
Each record is again a sequence of records with the same binary
representation format as the records in the file.
Common Elements Between All Files
These elements are used by all of the cache related files. In the case of
the visited links, these are the only fields presently used.
NOTE: "(0x0001 | MSB_VALUE)" means that the most
significant bit in the local unsigned integer is to be set. If 32 bit
values are used, that means the tag's value is 0x80000001.
Tag ID | Contents | Meaning |
---|---|---|
0x0003 | string | The name of the URL, fully qualified |
0x0004 | time_t | Last visited |
(0x000b | MSB_VALUE) | flag | The URL is a result of a form query |
0x0022 | record | Contains the name and last visited time of a relative link in the document. May repeat |
Content tags of relative link (tag 0x0022) records
Tag ID | Contents | Meaning |
---|---|---|
0x0023 | string | The name of the relative link |
0x0024 | time_t | Last visited |
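To show how the common tags and the nested relative link records fit together, here is a Python sketch that decodes the payload of one visited link entry. It assumes one byte tags and two byte lengths (the Opera 4.x values given earlier) and decodes strings as Latin-1, which is an assumption because the text does not specify a character encoding. The function names are hypothetical, and bounds checking is omitted for brevity.

```python
def parse_records(buf, idtag_length=1, length_length=2):
    """Return a list of (tag, payload); payload is None for flag records."""
    msb = 1 << (8 * idtag_length - 1)
    out, pos = [], 0
    while pos < len(buf):
        tag = int.from_bytes(buf[pos:pos + idtag_length], "big")
        pos += idtag_length
        if tag & msb:
            out.append((tag, None))  # flag keeps its MSB, e.g. 0x8b
            continue
        length = int.from_bytes(buf[pos:pos + length_length], "big")
        pos += length_length
        out.append((tag, buf[pos:pos + length]))
        pos += length
    return out


def decode_visited_link(payload):
    """Decode the payload of one top-level visited link (tag 0x02) record."""
    entry = {"url": None, "last_visited": None,
             "form_query": False, "relative": []}
    for tag, data in parse_records(payload):
        if tag == 0x03:
            entry["url"] = data.decode("latin-1")  # encoding is an assumption
        elif tag == 0x04:
            entry["last_visited"] = int.from_bytes(data, "big")
        elif tag == 0x8B:  # (0x0b | MSB_VALUE) with one byte tags
            entry["form_query"] = True
        elif tag == 0x22:
            rel = {}
            for sub_tag, sub_data in parse_records(data):
                if sub_tag == 0x23:
                    rel["name"] = sub_data.decode("latin-1")
                elif sub_tag == 0x24:
                    rel["last_visited"] = int.from_bytes(sub_data, "big")
            entry["relative"].append(rel)
    return entry
```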
Fields Used by Disk Cache and Download Rescue Files
Tag ID | Contents | Meaning |
---|---|---|
0x0005 | time_t | Localtime, when the file was last loaded, not GMT |
0x0007 | uint8 | Status of load: 2 = Loaded, 4 = Loading aborted, 5 = Loading failed |
0x0008 | uint32 | Content size |
0x0009 | string | MIME type of content |
0x000a | string | Character set of content |
(0x000c | MSB_VALUE) | flag | File is downloaded and stored locally on user's disk, and is not part of the disk cache directory |
0x000d | string | Name of file (cache files: only local to cache directory) |
(0x000f | MSB_VALUE) | flag | Always check if modified |
0x0010 | record | Contains the HTTP protocol specific information |
Fields used only by the download rescue file
Tag ID | Contents | Meaning |
---|---|---|
0x0028 | time_t | Identifies the time when the loading of the last/previous segment of the downloaded file started. |
0x0029 | time_t | Identifies the time when the loading of the last/previous segment of the downloaded file was stopped. |
0x002A | uint32 | How many bytes were in the previous segment of the file being downloaded. If the time the loading ended is not known, this value will be assumed to be zero (0) and the download speed set to zero (unknown). |
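The three fields above are enough to estimate a transfer rate for the previous segment. The Python sketch below assumes the speed is simply bytes divided by elapsed seconds; the text only states that the speed is zero when the end time is unknown, so the formula and the function name are assumptions.

```python
def previous_segment_speed(start_time, stop_time, segment_bytes):
    """Return bytes per second for the previous segment, or 0.0 if unknown.

    start_time and stop_time are time_t values (tags 0x0028 and 0x0029),
    or None when the field is absent; segment_bytes is tag 0x002A.
    """
    if start_time is None or stop_time is None or stop_time <= start_time:
        return 0.0  # end time unknown (or unusable): speed is reported as zero
    return segment_bytes / (stop_time - start_time)
```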
Fields used in the HTTP protocol specific record
All methods are GET by default; at present it is not possible to cache
POST requests.
Tag ID | Contents | Meaning |
---|---|---|
0x0015 | string | HTTP date header |
0x0016 | time_t | Expiry date |
0x0017 | string | Last modified date |
0x0018 | string | MIME type of document |
0x0019 | string | Entity tag |
0x001A | string | Moved to URL (Location header) |
0x001B | string | Response line text |
0x001C | uint32 | Response code |
0x001D | string | Refresh URL |
0x001E | uint32 | Refresh delta time |
0x001F | string | Suggested file name |
0x0020 | string | Content Encodings |
0x0021 | string | Content Location |
0x0025 | uint32 | Together with tag 0x0026 (both must be present) this identifies the User Agent string last used to load the resource. This value identifies the User Agent string. This value is used internally, and should not be modified. |
0x0026 | uint32 | Together with tag 0x0025 (both must be present) this identifies the User Agent string last used to load the resource. This value identifies the User Agent sub version. This value is used internally, and should not be modified. |
(0x0030 | MSB_VALUE) | flag | Reserved for future use |
(0x0031 | MSB_VALUE) | flag | Reserved for future use |
Cookie File format
This section describes the record tags and formats used for the storage
of cookies (cookies4.dat). The present app_version_number of this file
type is 0x00002000 (major version 2, minor 0).
The cookie file is organized as a tree of domain name components, each
component then holds a tree of path components and each path component
may contain a number of cookies.
NOTE: The components are a sequence of records, terminated
with a flag record, not a single record.
Structure
Domain components
The domain components are used to organize the cookies for each server
and domain for which cookies or cookie filtering capabilities are defined.
A domain component is started with a domain record, which holds the domain
name and some flags for that particular domain. It is then followed by a
path component holding the cookies and subdirectory path components (and
cookies), followed by a path component terminator and any number of
subdomain components before it is terminated by a domain-end flag record.
For example, cookies for the domain www.opera.com will be stored in this manner:
["com" record]
  ["opera" record]
    ["www" record]
      [cookies]
      [Path components]
      [Path component terminator]
      [other domains]
    [end of domain flag ("www")]
  [end of domain flag ("opera")]
[end of domain flag ("com")]
All names of domain components are non-dotted, except IP addresses. An
IP address can only be stored as the complete address in a dotted quad
string, e.g. "10.11.12.13"; it is stored at the top level and cannot
contain any subdomains.
A Domain Record uses the tag "0x01" and contains a sequence of these fields:
Tag ID | Contents | Meaning |
---|---|---|
0x001E | string | The name of the domain part |
0x001F | int8 | How cookies are filtered for this domain. If not present, the filtering of the parent domain is used. 1: All cookies from this domain are accepted. 2: No cookies from this domain are accepted. 3: All cookies from this server are accepted; overrides 1 and 2 for higher level domains. 4: No cookies from this server are accepted; overrides 1 and 2 for higher level domains. Domain settings apply to all subdomains, except those with a server specific selection. |
0x0021 | int8 | Handling of cookies that have explicit paths which do not match the URL setting the cookies. If enabled in the privacy preferences the default is to warn the user, but when warning is enabled such cookies can be filtered by their domains: Value 1 indicates reject, and 2 is accept automatically. |
0x0025 | int8 | While in the "Warn about third party cookies" mode, this field can be used to automatically filter such cookies. 1: All third party cookies from this domain are accepted. 2: No third party cookies from this domain are accepted. 3: All third party cookies from this server are accepted; overrides 1 and 2 for higher level domains. 4: No third party cookies from this server are accepted; overrides 1 and 2 for higher level domains. Domain settings apply to all subdomains, except those with a server specific selection. |
This record can be followed by zero or more path components defining
toplevel paths on servers in the domain and always terminated by
a path component terminator record. Then zero or more domain components
may follow.
A domain component is terminated by a (0x0004 | MSB_VALUE) flag record.
Path Components
The path components organize the cookies defined for a given directory in
a given domain, as well as any subdirectories of this directory that have
cookies defined.
Except for the path component starting immediately after the
domain component record, each path component always starts with a path
record, and is then followed by any number of cookie records and
subdirectory path components.
The path record uses the record id "0x0002" and contains this field
record:
Tag ID | Contents | Meaning |
---|---|---|
0x001D | string | The name of the path part |
The path component terminator is the (0x0005 | MSB_VALUE) flag record.
Cookie Records
The cookie entries are stored in records of type "0x0003" and have the
following field records:
Tag ID | Contents | Meaning |
---|---|---|
0x0010 | string | The name of the cookie |
0x0011 | string | The value of the cookie |
0x0012 | time_t | Expiry date |
0x0013 | time_t | Last used |
0x0014 | string | Comment/Description of use (RFC 2965) |
0x0015 | string | URL for Comment/Description of use (RFC 2965) |
0x0016 | string | The domain received with version=1 cookies (RFC 2965) |
0x0017 | string | The path received with version=1 cookies (RFC 2965) |
0x0018 | string | The port limitations received with version=1 cookies (RFC 2965) |
(0x0019 | MSB_VALUE) | flag | The cookie will only be sent to HTTPS servers. |
0x001A | int8+ | Version number of cookie (RFC 2965) |
(0x001B | MSB_VALUE) | flag | This cookie will only be sent to the server that sent it. |
(0x001C | MSB_VALUE) | flag | Reserved for delete protection: Not yet implemented |
(0x0020 | MSB_VALUE) | flag | This cookie will not be sent if the path is only a prefix of the URL. If the path is /foo, /foo/bar will match but not /foobar. |
(0x0022 | MSB_VALUE) | flag | If true, this cookie was set as the result of a password login form, or by a URL that was retrieved using a cookie that can be tracked back to such a cookie. |
(0x0023 | MSB_VALUE) | flag | If true, this cookie was set as the result of a HTTP authentication login, or by a URL that was retrieved using a cookie that can be tracked back to such a cookie. |
(0x0024 | MSB_VALUE) | flag | In "Display Third party cookies" mode this flag will be set if the cookie was set by a third party server, and only these cookies will be sent if the URL is a third party. Cookies that were received when loading a URL from the server directly will not be sent to third party URLs in this mode. The reverse is NOT true. NOTE: If a third party server redirects back to the first party server, the redirected URL is considered third party. |
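The domain and path nesting described in the Structure section can be followed with two small stacks. The Python sketch below is a simplified illustration under stated assumptions: the top-level records have already been split into (tag, fields) pairs, fields is a dict of decoded sub-records for normal records and None for flag records, and tags are one byte wide, so the two terminators appear as 0x84 and 0x85. The name walk_cookies and this input shape are hypothetical.

```python
def walk_cookies(records):
    """Yield (domain, path, fields) for every cookie record in the sequence."""
    DOMAIN, PATH, COOKIE = 0x01, 0x02, 0x03
    END_DOMAIN, END_PATH = 0x84, 0x85  # (0x04 | MSB), (0x05 | MSB), 1 byte tags
    domains, paths = [], []
    for tag, fields in records:
        if tag == DOMAIN:
            domains.append(fields[0x1E])  # name of the domain part
            paths = []  # a new domain starts at its implicit top-level path
        elif tag == PATH:
            paths.append(fields[0x1D])  # name of the path part
        elif tag == COOKIE:
            # Domain parts are stored top-down ("com" first), so reverse them.
            yield ".".join(reversed(domains)), "/" + "/".join(paths), fields
        elif tag == END_PATH:
            if paths:
                paths.pop()  # otherwise this closes the implicit top-level path
        elif tag == END_DOMAIN:
            domains.pop()
            paths = []
```

For the www.opera.com example, a cookie stored directly under the "www" record is reported with domain "www.opera.com" and path "/".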