1 | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
|
---|
2 | <html>
|
---|
3 | <head>
|
---|
4 |
|
---|
5 | <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15"/>
|
---|
6 | <title>Ogg Vorbis Documentation</title>
|
---|
7 |
|
---|
8 | <style type="text/css">
|
---|
9 | body {
|
---|
10 | margin: 0 18px 0 18px;
|
---|
11 | padding-bottom: 30px;
|
---|
12 | font-family: Verdana, Arial, Helvetica, sans-serif;
|
---|
13 | color: #333333;
|
---|
14 | font-size: .8em;
|
---|
15 | }
|
---|
16 |
|
---|
17 | a {
|
---|
18 | color: #3366cc;
|
---|
19 | }
|
---|
20 |
|
---|
21 | img {
|
---|
22 | border: 0;
|
---|
23 | }
|
---|
24 |
|
---|
25 | #xiphlogo {
|
---|
26 | margin: 30px 0 16px 0;
|
---|
27 | }
|
---|
28 |
|
---|
29 | #content p {
|
---|
30 | line-height: 1.4;
|
---|
31 | }
|
---|
32 |
|
---|
33 | h1, h1 a, h2, h2 a, h3, h3 a {
|
---|
34 | font-weight: bold;
|
---|
35 | color: #ff9900;
|
---|
36 | margin: 1.3em 0 8px 0;
|
---|
37 | }
|
---|
38 |
|
---|
39 | h1 {
|
---|
40 | font-size: 1.3em;
|
---|
41 | }
|
---|
42 |
|
---|
43 | h2 {
|
---|
44 | font-size: 1.2em;
|
---|
45 | }
|
---|
46 |
|
---|
47 | h3 {
|
---|
48 | font-size: 1.1em;
|
---|
49 | }
|
---|
50 |
|
---|
51 | li {
|
---|
52 | line-height: 1.4;
|
---|
53 | }
|
---|
54 |
|
---|
55 | #copyright {
|
---|
56 | margin-top: 30px;
|
---|
57 | line-height: 1.5em;
|
---|
58 | text-align: center;
|
---|
59 | font-size: .8em;
|
---|
60 | color: #888888;
|
---|
61 | clear: both;
|
---|
62 | }
|
---|
63 | </style>
|
---|
64 |
|
---|
65 | </head>
|
---|
66 |
|
---|
67 | <body>
|
---|
68 |
|
---|
69 | <div id="xiphlogo">
|
---|
70 | <a href="https://xiph.org/"><img src="fish_xiph_org.png" alt="Fish Logo and Xiph.Org"/></a>
|
---|
71 | </div>
|
---|
72 |
|
---|
73 | <h1>Ogg logical bitstream framing</h1>
|
---|
74 |
|
---|
75 | <h2>Ogg bitstreams</h2>
|
---|
76 |
|
---|
77 | <p>The Ogg transport bitstream is designed to provide framing, error
|
---|
78 | protection and seeking structure for higher-level codec streams that
|
---|
79 | consist of raw, unencapsulated data packets, such as the Vorbis audio
|
---|
80 | codec or Theora video codec.</p>
|
---|
81 |
|
---|
82 | <h2>Application example: Vorbis</h2>
|
---|
83 |
|
---|
84 | <p>Vorbis encodes short-time blocks of PCM data into raw packets of
|
---|
85 | bit-packed data. These raw packets may be used directly by transport
|
---|
86 | mechanisms that provide their own framing and packet-separation
|
---|
87 | mechanisms (such as UDP datagrams). For stream based storage (such as
|
---|
88 | files) and transport (such as TCP streams or pipes), Vorbis uses the
|
---|
89 | Ogg bitstream format to provide framing/sync, sync recapture
|
---|
90 | after error, landmarks during seeking, and enough information to
|
---|
91 | properly separate data back into packets at the original packet
|
---|
92 | boundaries without relying on decoding to find packet boundaries.</p>
|
---|
93 |
|
---|
94 | <h2>Design constraints for Ogg bitstreams</h2>
|
---|
95 |
|
---|
96 | <ol>
|
---|
97 | <li>True streaming; we must not need to seek to build a 100%
|
---|
98 | complete bitstream.</li>
|
---|
99 | <li>Use no more than approximately 1-2% of bitstream bandwidth for
|
---|
100 | packet boundary marking, high-level framing, sync and seeking.</li>
|
---|
101 | <li>Specification of absolute position within the original sample
|
---|
102 | stream.</li>
|
---|
103 | <li>Simple mechanism to ease limited editing, such as a simplified
|
---|
104 | concatenation mechanism.</li>
|
---|
105 | <li>Detection of corruption, recapture after error and direct, random
|
---|
106 | access to data at arbitrary positions in the bitstream.</li>
|
---|
107 | </ol>
|
---|
108 |
|
---|
109 | <h2>Logical and Physical Bitstreams</h2>
|
---|
110 |
|
---|
111 | <p>A <em>logical</em> Ogg bitstream is a contiguous stream of
|
---|
112 | sequential pages belonging only to the logical bitstream. A
|
---|
113 | <em>physical</em> Ogg bitstream is constructed from one or more
|
---|
114 | logical Ogg bitstreams (the simplest physical bitstream
|
---|
115 | is simply a single logical bitstream). We describe below the exact
|
---|
116 | formatting of an Ogg logical bitstream. Combining logical
|
---|
117 | bitstreams into more complex physical bitstreams is described in the
|
---|
118 | <a href="oggstream.html">Ogg bitstream overview</a>. The exact
|
---|
119 | mapping of raw Vorbis packets into a valid Ogg Vorbis physical
|
---|
120 | bitstream is described in the Vorbis I Specification.</p>
|
---|
121 |
|
---|
122 | <h2>Bitstream structure</h2>
|
---|
123 |
|
---|
124 | <p>An Ogg stream is structured by dividing incoming packets into
|
---|
125 | segments of up to 255 bytes and then wrapping a group of contiguous
|
---|
126 | packet segments into a variable length page preceded by a page
|
---|
127 | header. Both the header size and page size are variable; the page
|
---|
128 | header contains sizing information and checksum data to determine
|
---|
129 | header/page size and data integrity.</p>
|
---|
130 |
|
---|
131 | <p>The bitstream is captured (or recaptured) by looking for the beginning
|
---|
132 | of a page, specifically the capture pattern. Once the capture pattern
|
---|
133 | is found, the decoder verifies page sync and integrity by computing
|
---|
134 | and comparing the checksum. At that point, the decoder can extract the
|
---|
135 | packets themselves.</p>
|
---|
136 |
|
---|
137 | <h3>Packet segmentation</h3>
|
---|
138 |
|
---|
139 | <p>Packets are logically divided into multiple segments before encoding
|
---|
140 | into a page. Note that the segmentation and fragmentation process is a
|
---|
141 | logical one; it's used to compute page header values and the original
|
---|
142 | page data need not be disturbed, even when a packet spans page
|
---|
143 | boundaries.</p>
|
---|
144 |
|
---|
145 | <p>The raw packet is logically divided into [n] 255 byte segments and a
|
---|
146 | last fractional segment of &lt; 255 bytes. A packet may well
|
---|
147 | consist only of the trailing fractional segment, and a fractional
|
---|
148 | segment may be zero length. These values, called "lacing values", are
|
---|
149 | then saved and placed into the header segment table.</p>
|
---|
150 |
|
---|
151 | <p>An example should make the basic concept clear:</p>
|
---|
152 |
|
---|
153 | <pre>
|
---|
154 | <tt>
|
---|
155 | raw packet:
|
---|
156 | ___________________________________________
|
---|
157 | |______________packet data__________________| 753 bytes
|
---|
158 |
|
---|
159 | lacing values for page header segment table: 255,255,243
|
---|
160 | </tt>
|
---|
161 | </pre>
|
---|
162 |
|
---|
163 | <p>We simply add the lacing values for the total size; the last lacing
|
---|
164 | value for a packet is always the value that is less than 255. Note
|
---|
165 | that this encoding both avoids imposing a maximum packet size and
|
---|
166 | imposes minimum overhead on small packets (as opposed to, e.g.,
|
---|
167 | simply using two bytes at the head of every packet and having a max
|
---|
168 | packet size of 32k; small packets (&lt;255, the typical case) would be
|
---|
169 | penalized with twice the segmentation overhead). Using the lacing
|
---|
170 | values as suggested, small packets see the minimum possible
|
---|
171 | byte-aligned overhead (1 byte) and large packets, over 512 bytes or
|
---|
172 | so, see a fairly constant ~.5% overhead on encoding space.</p>
|
---|
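<p>As a minimal sketch of the rule above (not taken from any Ogg library; the function name lacing_values is hypothetical), the lacing values of a packet follow from integer division of the packet size by 255:</p>

```python
from typing import List


def lacing_values(packet_size: int) -> List[int]:
    """Return the lacing values for a packet of packet_size bytes.

    Hypothetical helper for illustration only.
    """
    if packet_size < 0:
        raise ValueError("packet size must be non-negative")
    full_segments, remainder = divmod(packet_size, 255)
    # n full 255-byte segments, then one final value below 255.
    # The final value is 0 when the size is an exact multiple of 255.
    return [255] * full_segments + [remainder]


print(lacing_values(753))
```

<p>For the 753 byte packet in the example, this yields 255, 255, 243. Every packet costs at least one lacing value, which is the 1 byte minimum overhead mentioned above.</p>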
173 |
|
---|
174 | <p>Note that a lacing value of 255 implies that a second lacing value
|
---|
175 | follows in the packet, and a value of &lt; 255 marks the end of the
|
---|
176 | packet after that many additional bytes. A packet of 255 bytes (or a
|
---|
177 | multiple of 255 bytes) is terminated by a lacing value of 0:</p>
|
---|
178 |
|
---|
179 | <pre><tt>
|
---|
180 | raw packet:
|
---|
181 | _______________________________
|
---|
182 | |________packet data____________| 255 bytes
|
---|
183 |
|
---|
184 | lacing values: 255, 0
|
---|
185 | </tt></pre>
|
---|
186 |
|
---|
187 | <p>Note also that a 'nil' (zero length) packet is not an error; it
|
---|
188 | consists of nothing more than a lacing value of zero in the header.</p>
|
---|
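<p>Going the other way, a decoder can recover packet sizes by summing lacing values until it sees one below 255. The following is a sketch under that reading of the text; the name packet_sizes and its return shape are assumptions made for illustration:</p>

```python
def packet_sizes(segment_table):
    """Recover packet sizes from a list of lacing values.

    Returns (sizes, pending): sizes holds the length of every packet that
    ends in this table, and pending is the byte count of a trailing packet
    that has not been terminated yet (0 if there is none).
    Hypothetical helper for illustration only.
    """
    sizes = []
    pending = 0
    for value in segment_table:
        pending += value
        if value < 255:
            # A value below 255 (including 0) terminates the packet.
            sizes.append(pending)
            pending = 0
    return sizes, pending


print(packet_sizes([255, 255, 243, 255, 0, 0]))
```

<p>The table 255, 255, 243, 255, 0, 0 describes three packets of 753, 255 and 0 bytes; the last one is the 'nil' packet case.</p>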
189 |
|
---|
190 | <h3>Packets spanning pages</h3>
|
---|
191 |
|
---|
192 | <p>Packets are not restricted to beginning and ending within a page,
|
---|
193 | although individual segments are, by definition, required to do so.
|
---|
194 | Packets are not restricted to a maximum size, although excessively
|
---|
195 | large packets in the data stream are discouraged; the Ogg
|
---|
196 | bitstream specification strongly recommends a nominal page size of
|
---|
197 | approximately 4-8kB (large packets are foreseen as being useful for
|
---|
198 | initialization data at the beginning of a logical bitstream).</p>
|
---|
199 |
|
---|
200 | <p>After segmenting a packet, the encoder may decide not to place all the
|
---|
201 | resulting segments into the current page; to do so, the encoder places
|
---|
202 | the lacing values of the segments it wishes to belong to the current
|
---|
203 | page into the current segment table, then finishes the page. The next
|
---|
204 | page is begun with the first value in the segment table belonging to
|
---|
205 | the next packet segment, thus continuing the packet (data in the
|
---|
206 | packet body must also correspond properly to the lacing values in the
|
---|
207 | spanned pages. The segment data in the first packet corresponding to
|
---|
208 | the lacing values of the first page belong in that page; packet
|
---|
209 | segments listed in the segment table of the following page must begin
|
---|
210 | the page body of the subsequent page).</p>
|
---|
211 |
|
---|
212 | <p>The last step in spanning a page boundary is to set the header
|
---|
213 | flag in the new page to indicate that the first lacing value in the
|
---|
214 | segment table continues rather than begins a packet; a header flag of
|
---|
215 | 0x01 is set to indicate a continued packet. Although mandatory, it
|
---|
216 | is not actually algorithmically necessary; one could inspect the
|
---|
217 | preceding segment table to determine if the packet is new or
|
---|
218 | continued. Adding the information to the page header flag allows a
|
---|
219 | simpler design (with no overhead) that needs only inspect the current
|
---|
220 | page header after frame capture. This also allows faster error
|
---|
221 | recovery in the event that the packet originates in a corrupt
|
---|
222 | preceding page, implying that the previous page's segment table
|
---|
223 | cannot be trusted.</p>
|
---|
224 |
|
---|
225 | <p>Note that a packet can span an arbitrary number of pages; the above
|
---|
226 | spanning process is repeated for each spanned page boundary. Also a
|
---|
227 | 'zero termination' on a packet size that is an exact multiple of 255
|
---|
228 | must appear even if the lacing value appears in the next page as a
|
---|
229 | zero-length continuation of the current packet. The header flag
|
---|
230 | should be set to 0x01 to indicate that the packet spanned, even though
|
---|
231 | the span is a nil case as far as data is concerned.</p>
|
---|
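<p>The spanning rules can be sketched as follows. This is an illustration only: the function name paginate is hypothetical, and the policy of finishing a page purely when a segment limit is reached is an assumption (a real encoder would also consider the nominal page size). It shows how the 0x01 continued flag falls out of where the page break lands:</p>

```python
def paginate(packet_sizes, max_segments=255):
    """Distribute the lacing values of consecutive packets over pages.

    Returns a list of (continued, segment_table) pairs, where continued
    is True when the page's first lacing value continues a packet begun
    on an earlier page (header flag 0x01).
    Hypothetical helper; the flush policy is an assumption.
    """
    pages = []
    table = []
    continued = False
    for size in packet_sizes:
        full, rest = divmod(size, 255)
        values = [255] * full + [rest]
        for index, value in enumerate(values):
            if len(table) == max_segments:
                # Finish the current page and start a new one.
                pages.append((continued, table))
                table = []
                # The new page continues a packet only if the break
                # fell after the packet's first lacing value.
                continued = index > 0
            table.append(value)
    if table:
        pages.append((continued, table))
    return pages


for continued, table in paginate([753], max_segments=2):
    print(continued, table)
```

<p>With an artificially small limit of two segments per page, the 753 byte packet is split into a page holding 255, 255 and a continued page holding 243. A 255 byte packet with a limit of one segment produces a second page containing only the zero terminator, flagged as continued.</p>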
232 |
|
---|
233 | <p>The encoding looks odd, but is properly optimized for speed and the
|
---|
234 | expected case of the majority of packets being between 50 and 200
|
---|
235 | bytes (note that it is designed such that packets of wildly different
|
---|
236 | sizes can be handled within the model; placing packet size
|
---|
237 | restrictions on the encoder would have only slightly simplified design
|
---|
238 | in page generation and increased overall encoder complexity).</p>
|
---|
239 |
|
---|
240 | <p>The main point behind tracking individual packets (and packet
|
---|
241 | segments) is to allow more flexible encoding tricks that require
|
---|
242 | explicit knowledge of packet size. An example is simple bandwidth
|
---|
243 | limiting, implemented by simply truncating packets in the nominal case
|
---|
244 | if the packet is arranged so that the least sensitive portion of the
|
---|
245 | data comes last.</p>
|
---|
246 |
|
---|
247 | <h3>Page header</h3>
|
---|
248 |
|
---|
249 | <p>The headering mechanism is designed to avoid copying and re-assembly
|
---|
250 | of the packet data (ie, making the packet segmentation process a
|
---|
251 | logical one); the header can be generated directly from incoming
|
---|
252 | packet data. The encoder buffers packet data until it finishes a
|
---|
253 | complete page, at which point it writes the header followed by the
|
---|
254 | buffered packet segments.</p>
|
---|
255 |
|
---|
256 | <h4>capture_pattern</h4>
|
---|
257 |
|
---|
258 | <p>A header begins with a capture pattern that simplifies identifying
|
---|
259 | pages; once the decoder has found the capture pattern it can do a more
|
---|
260 | intensive job of verifying that it has in fact found a page boundary
|
---|
261 | (as opposed to an inadvertent coincidence in the byte stream).</p>
|
---|
262 |
|
---|
263 | <pre><tt>
|
---|
264 | byte value
|
---|
265 |
|
---|
266 | 0 0x4f 'O'
|
---|
267 | 1 0x67 'g'
|
---|
268 | 2 0x67 'g'
|
---|
269 | 3 0x53 'S'
|
---|
270 | </tt></pre>
|
---|
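<p>A minimal sketch of the first capture step, assuming the stream is available as an in-memory byte buffer (the name find_capture is hypothetical). A match is only a candidate page boundary; it still has to be confirmed with the checksum:</p>

```python
CAPTURE_PATTERN = bytes([0x4F, 0x67, 0x67, 0x53])  # 'O', 'g', 'g', 'S'


def find_capture(buf: bytes, start: int = 0) -> int:
    """Return the offset of the next capture pattern at or after start.

    Returns -1 if the pattern does not occur.
    Hypothetical helper for illustration only.
    """
    return buf.find(CAPTURE_PATTERN, start)


print(find_capture(b"\x00\x01OggS\x00"))
```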
271 |
|
---|
272 | <h4>stream_structure_version</h4>
|
---|
273 |
|
---|
274 | <p>The capture pattern is followed by the stream structure revision:</p>
|
---|
275 |
|
---|
276 | <pre><tt>
|
---|
277 | byte value
|
---|
278 |
|
---|
279 | 4 0x00
|
---|
280 | </tt></pre>
|
---|
281 |
|
---|
282 | <h4>header_type_flag</h4>
|
---|
283 |
|
---|
284 | <p>The header type flag identifies this page's context in the bitstream:</p>
|
---|
285 |
|
---|
286 | <pre><tt>
|
---|
287 | byte value
|
---|
288 |
|
---|
289 | 5 bitflags: 0x01: unset = fresh packet
|
---|
290 | set = continued packet
|
---|
291 | 0x02: unset = not first page of logical bitstream
|
---|
292 | set = first page of logical bitstream (bos)
|
---|
293 | 0x04: unset = not last page of logical bitstream
|
---|
294 | set = last page of logical bitstream (eos)
|
---|
295 | </tt></pre>
|
---|
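<p>The three flag bits can be tested with simple masks. The constant and function names below are hypothetical; only the bit values come from the table above:</p>

```python
CONTINUED = 0x01  # first lacing value continues a packet from the previous page
BOS = 0x02        # first page of the logical bitstream
EOS = 0x04        # last page of the logical bitstream


def describe_flags(header_type_flag: int) -> dict:
    """Decode byte 5 of the page header into named booleans.

    Hypothetical helper for illustration only.
    """
    return {
        "continued": bool(header_type_flag & CONTINUED),
        "bos": bool(header_type_flag & BOS),
        "eos": bool(header_type_flag & EOS),
    }


print(describe_flags(0x05))
```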
296 |
|
---|
297 | <h4>absolute granule position</h4>
|
---|
298 |
|
---|
299 | <p>(This is packed in the same way the rest of Ogg data is packed; LSb
|
---|
300 | of LSB first. Note that the 'position' data specifies a 'sample'
|
---|
301 | number (e.g., a CD-quality sample is four octets, 16 bits for left
|
---|
302 | and 16 bits for right; in video it would likely be the frame number).
|
---|
303 | It is up to the specific codec in use to define the semantic meaning
|
---|
304 | of the granule position value.) The position specified is the total
|
---|
305 | samples encoded after including all packets finished on this page
|
---|
306 | (packets begun on this page but continuing on to the next page do not
|
---|
307 | count). The rationale here is that the position specified in the
|
---|
308 | frame header of the last page tells how long the data coded by the
|
---|
309 | bitstream is. A truncated stream will still return the proper number
|
---|
310 | of samples that can be decoded fully.</p>
|
---|
311 |
|
---|
312 | <p>A special value of '-1' (in two's complement) indicates that no packets
|
---|
313 | finish on this page.</p>
|
---|
314 |
|
---|
315 | <pre><tt>
|
---|
316 | byte value
|
---|
317 |
|
---|
318 | 6 0xXX LSB
|
---|
319 | 7 0xXX
|
---|
320 | 8 0xXX
|
---|
321 | 9 0xXX
|
---|
322 | 10 0xXX
|
---|
323 | 11 0xXX
|
---|
324 | 12 0xXX
|
---|
325 | 13 0xXX MSB
|
---|
326 | </tt></pre>
|
---|
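<p>As a sketch, the field can be read as a signed 64-bit little-endian integer at byte offset 6, which makes the special value -1 easy to recognize. The function name is hypothetical, and returning None for "no packet finishes on this page" is a convention chosen for this example:</p>

```python
import struct


def read_granule_position(header: bytes):
    """Read bytes 6-13 of a page header as a signed little-endian integer.

    Returns None when the field holds -1 (no packet finishes on this page).
    Hypothetical helper for illustration only.
    """
    (value,) = struct.unpack_from("<q", header, 6)
    return None if value == -1 else value


# A 27-byte header skeleton with granule position 44100 (illustrative data).
example = bytes(6) + struct.pack("<q", 44100) + bytes(13)
print(read_granule_position(example))
```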
327 |
|
---|
328 | <h4>stream serial number</h4>
|
---|
329 |
|
---|
330 | <p>Ogg allows for separate logical bitstreams to be mixed at page
|
---|
331 | granularity in a physical bitstream. The most common case would be
|
---|
332 | sequential arrangement, but it is possible to interleave pages for
|
---|
333 | two separate bitstreams to be decoded concurrently. The serial
|
---|
334 | number is the means by which physical pages are associated with
|
---|
335 | a particular logical stream. Each logical stream must have a unique
|
---|
336 | serial number within a physical stream:</p>
|
---|
337 |
|
---|
338 | <pre><tt>
|
---|
339 | byte value
|
---|
340 |
|
---|
341 | 14 0xXX LSB
|
---|
342 | 15 0xXX
|
---|
343 | 16 0xXX
|
---|
344 | 17 0xXX MSB
|
---|
345 | </tt></pre>
|
---|
346 |
|
---|
347 | <h4>page sequence no</h4>
|
---|
348 |
|
---|
349 | <p>Page counter; lets us know if a page is lost (useful where packets
|
---|
350 | span page boundaries).</p>
|
---|
351 |
|
---|
352 | <pre><tt>
|
---|
353 | byte value
|
---|
354 |
|
---|
355 | 18 0xXX LSB
|
---|
356 | 19 0xXX
|
---|
357 | 20 0xXX
|
---|
358 | 21 0xXX MSB
|
---|
359 | </tt></pre>
|
---|
360 |
|
---|
361 | <h4>page checksum</h4>
|
---|
362 |
|
---|
363 | <p>32-bit CRC value (direct algorithm, initial value and final XOR = 0,
|
---|
364 | generator polynomial=0x04c11db7). The value is computed over the
|
---|
365 | entire header (with the CRC field in the header set to zero) and then
|
---|
366 | continued over the page. The CRC field is then filled with the
|
---|
367 | computed value.</p>
|
---|
368 |
|
---|
369 | <p>(A thorough discussion of CRC algorithms can be found in
|
---|
370 | <a href="https://zlib.net/crc_v3.txt">"A Painless Guide to CRC Error Detection Algorithms"</a>
|
---|
371 | by Ross Williams.)</p>
|
---|
372 |
|
---|
373 | <pre><tt>
|
---|
374 | byte value
|
---|
375 |
|
---|
376 | 22 0xXX LSB
|
---|
377 | 23 0xXX
|
---|
378 | 24 0xXX
|
---|
379 | 25 0xXX MSB
|
---|
380 | </tt></pre>
|
---|
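<p>A bit-at-a-time sketch of the checksum as described above: most significant bit first, initial value 0, no final XOR, generator polynomial 0x04c11db7. The function names are hypothetical, and a practical implementation would normally use a lookup table for speed:</p>

```python
POLYNOMIAL = 0x04C11DB7


def ogg_crc32(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32: MSB first, initial value 0, no final XOR.

    Hypothetical helper for illustration only.
    """
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ POLYNOMIAL) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def page_crc_ok(page: bytes) -> bool:
    """Verify a complete page (header plus body) against bytes 22-25."""
    stored = int.from_bytes(page[22:26], "little")
    # The checksum is computed with the CRC field itself set to zero.
    zeroed = page[:22] + b"\x00\x00\x00\x00" + page[26:]
    return ogg_crc32(zeroed) == stored


# Build an empty illustrative page (27-byte header, no segments) and sign it.
page = bytearray(b"OggS" + bytes(23))
page[22:26] = ogg_crc32(bytes(page)).to_bytes(4, "little")
print(page_crc_ok(bytes(page)))
```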
381 |
|
---|
382 | <h4>page_segments</h4>
|
---|
383 |
|
---|
384 | <p>The number of segment entries to appear in the segment table. The
|
---|
385 | maximum number of 255 segments (255 bytes each) sets the maximum
|
---|
386 | possible physical page size at 65307 bytes or just under 64kB (thus
|
---|
387 | we know that a header corrupted so as to destroy sizing/alignment
|
---|
388 | information will not cause a runaway bitstream. We'll read in the
|
---|
389 | page according to the corrupted size information that's guaranteed to
|
---|
390 | be a reasonable size regardless, notice the checksum mismatch, drop
|
---|
391 | sync and then look for recapture).</p>
|
---|
392 |
|
---|
393 | <pre><tt>
|
---|
394 | byte value
|
---|
395 |
|
---|
396 | 26 0x00-0xff (0-255)
|
---|
397 | </tt></pre>
|
---|
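<p>The 65307 byte figure can be checked by adding the fixed header fields (bytes 0 through 26), a full segment table, and the largest possible page body. The constant names are chosen for this example only:</p>

```python
FIXED_HEADER_BYTES = 27   # bytes 0-26, up to and including page_segments
MAX_SEGMENTS = 255        # largest value of the one-byte page_segments field
MAX_SEGMENT_BYTES = 255   # largest lacing value

max_page_size = (
    FIXED_HEADER_BYTES
    + MAX_SEGMENTS                      # one lacing value per segment
    + MAX_SEGMENTS * MAX_SEGMENT_BYTES  # page body
)
print(max_page_size)
```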
398 |
|
---|
399 | <h4>segment_table (containing packet lacing values)</h4>
|
---|
400 |
|
---|
401 | <p>The lacing values for each packet segment physically appearing in
|
---|
402 | this page are listed in contiguous order.</p>
|
---|
403 |
|
---|
404 | <pre><tt>
|
---|
405 | byte value
|
---|
406 |
|
---|
407 | 27 0x00-0xff (0-255)
|
---|
408 | [...]
|
---|
409 | n 0x00-0xff (0-255, n=page_segments+26)
|
---|
410 | </tt></pre>
|
---|
411 |
|
---|
412 | <p>Total page size is calculated directly from the known header size and
|
---|
413 | lacing values in the segment table. Packet data segments follow
|
---|
414 | immediately after the header.</p>
|
---|
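<p>Putting the field layout together, a page header can be unpacked in one step and the total page size derived from it. This is a sketch under the byte offsets listed above; the PageHeader type and parse_page_header function are hypothetical names, not part of any Ogg library, and no checksum verification is done here:</p>

```python
import struct
from typing import NamedTuple, Tuple


class PageHeader(NamedTuple):
    """Hypothetical container for the header fields described above."""
    version: int
    header_type_flag: int
    granule_position: int
    serial_number: int
    page_sequence_no: int
    checksum: int
    segment_table: Tuple[int, ...]

    @property
    def header_size(self) -> int:
        return 27 + len(self.segment_table)

    @property
    def body_size(self) -> int:
        return sum(self.segment_table)


def parse_page_header(buf: bytes, offset: int = 0) -> PageHeader:
    # 4-byte capture pattern, version, flags, 64-bit granule position,
    # three 32-bit fields, then the one-byte segment count: 27 bytes.
    (capture, version, flags, granule, serial, sequence, crc,
     segments) = struct.unpack_from("<4sBBqIIIB", buf, offset)
    if capture != b"OggS":
        raise ValueError("capture pattern not found at offset")
    table = tuple(buf[offset + 27:offset + 27 + segments])
    if len(table) != segments:
        raise ValueError("buffer ends inside the segment table")
    return PageHeader(version, flags, granule, serial, sequence, crc, table)


# Illustrative header: first page of a stream, one 753-byte packet.
raw = (b"OggS" + bytes([0x00, 0x02]) + struct.pack("<qIII", 0, 1234, 0, 0)
       + bytes([3, 255, 255, 243]))
header = parse_page_header(raw)
print(header.header_size, header.body_size)
```

<p>The total page size is header_size plus body_size, so a reader that has the header knows exactly how many bytes to read before the next page begins.</p>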
415 |
|
---|
416 | <p>Page headers typically impose a flat .25-.5% space overhead assuming
|
---|
417 | nominal ~8k page sizes. The segmentation table needed for exact
|
---|
418 | packet recovery in the streaming layer adds approximately .5-1%
|
---|
419 | nominal assuming expected encoder behavior in the 44.1kHz, 128kbps
|
---|
420 | stereo encodings.</p>
|
---|
421 |
|
---|
422 | <div id="copyright">
|
---|
423 | The Xiph Fish Logo is a
|
---|
424 | trademark (™) of Xiph.Org.<br/>
|
---|
425 |
|
---|
426 | These pages © 1994 - 2005 Xiph.Org. All rights reserved.
|
---|
427 | </div>
|
---|
428 |
|
---|
429 | </body>
|
---|
430 | </html>
|
---|