1 | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
|
---|
2 | <html>
|
---|
3 | <head>
|
---|
4 |
|
---|
5 | <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15"/>
|
---|
6 | <title>Ogg Documentation</title>
|
---|
7 |
|
---|
8 | <style type="text/css">
|
---|
9 | body {
|
---|
10 | margin: 0 18px 0 18px;
|
---|
11 | padding-bottom: 30px;
|
---|
12 | font-family: Verdana, Arial, Helvetica, sans-serif;
|
---|
13 | color: #333333;
|
---|
14 | font-size: .8em;
|
---|
15 | }
|
---|
16 |
|
---|
17 | a {
|
---|
18 | color: #3366cc;
|
---|
19 | }
|
---|
20 |
|
---|
21 | img {
|
---|
22 | border: 0;
|
---|
23 | }
|
---|
24 |
|
---|
25 | #xiphlogo {
|
---|
26 | margin: 30px 0 16px 0;
|
---|
27 | }
|
---|
28 |
|
---|
29 | #content p {
|
---|
30 | line-height: 1.4;
|
---|
31 | }
|
---|
32 |
|
---|
33 | h1, h1 a, h2, h2 a, h3, h3 a, h4, h4 a {
|
---|
34 | font-weight: bold;
|
---|
35 | color: #ff9900;
|
---|
36 | margin: 1.3em 0 8px 0;
|
---|
37 | }
|
---|
38 |
|
---|
39 | h1 {
|
---|
40 | font-size: 1.3em;
|
---|
41 | }
|
---|
42 |
|
---|
43 | h2 {
|
---|
44 | font-size: 1.2em;
|
---|
45 | }
|
---|
46 |
|
---|
47 | h3 {
|
---|
48 | font-size: 1.1em;
|
---|
49 | }
|
---|
50 |
|
---|
51 | li {
|
---|
52 | line-height: 1.4;
|
---|
53 | }
|
---|
54 |
|
---|
55 | #copyright {
|
---|
56 | margin-top: 30px;
|
---|
57 | line-height: 1.5em;
|
---|
58 | text-align: center;
|
---|
59 | font-size: .8em;
|
---|
60 | color: #888888;
|
---|
61 | clear: both;
|
---|
62 | }
|
---|
63 | </style>
|
---|
64 |
|
---|
65 | </head>
|
---|
66 |
|
---|
67 | <body>
|
---|
68 |
|
---|
69 | <div id="xiphlogo">
|
---|
70 | <a href="http://www.xiph.org/"><img src="fish_xiph_org.png" alt="Fish Logo and Xiph.org"/></a>
|
---|
71 | </div>
|
---|
72 |
|
---|
73 | <h1>Page Multiplexing and Ordering in a Physical Ogg Stream</h1>
|
---|
74 |
|
---|
75 | <p>The low-level mechanisms of an Ogg stream (as described in the Ogg
|
---|
76 | Bitstream Overview) provide means for mixing multiple logical streams
|
---|
77 | and media types into a single linear-chronological stream. This
|
---|
78 | document specifies the high-level arrangement and use of page
|
---|
79 | structure to multiplex multiple streams of mixed media type within a
|
---|
80 | physical Ogg stream.</p>
|
---|
81 |
|
---|
82 | <h2>Design Elements</h2>
|
---|
83 |
|
---|
84 | <p>The design and arrangement of the Ogg container format is governed by
|
---|
85 | several high-level design decisions that form the reasoning behind
|
---|
86 | specific low-level design decisions.</p>
|
---|
87 |
|
---|
88 | <h3>Linear media</h3>
|
---|
89 |
|
---|
90 | <p>The Ogg bitstream is intended to encapsulate chronological,
|
---|
91 | time-linear mixed media into a single delivery stream or file. The
|
---|
92 | design is such that an application can always encode and/or decode a
|
---|
93 | full-featured bitstream in one pass with no seeking and minimal
|
---|
94 | buffering. Seeking to provide optimized encoding (such as two-pass
|
---|
95 | encoding) or interactive decoding (such as scrubbing or instant
|
---|
96 | replay) is not disallowed or discouraged; however, no bitstream feature
|
---|
97 | may require nonlinear operation on the bitstream.</p>
|
---|
98 |
|
---|
99 | <h3>Multiplexing</h3>
|
---|
100 |
|
---|
101 | <p>Ogg bitstreams multiplex multiple logical streams into a single
|
---|
102 | physical stream at the page level. Each page contains an abstract
|
---|
103 | time stamp (the Granule Position) that represents an absolute time
|
---|
104 | landmark within the stream. After the pages representing stream
|
---|
105 | headers (all logical stream headers occur at the beginning of a
|
---|
106 | physical bitstream section before any logical stream data), logical
|
---|
107 | stream data pages are arranged in a physical bitstream in strict
|
---|
108 | non-decreasing order by chronological absolute time as
|
---|
109 | specified by the granule position.</p>
|
---|
110 |
|
---|
111 | <p>The only exception to arranging pages in non-decreasing time order
|
---|
112 | by granule position is those pages that do not set the granule
|
---|
113 | position value. This is a special case when exceptionally large
|
---|
114 | packets span multiple pages; the specifics of handling this special
|
---|
115 | case are described later under 'Continuous and Discontinuous
|
---|
116 | Streams'.</p>
|
---|
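<p>The ordering rule above can be sketched as a merge of per-stream page
sequences. This is a minimal illustration only: the <tt>Page</tt> tuple, its
field names and the <tt>multiplex</tt> helper are hypothetical, each page is
assumed to already carry the absolute time its granule position maps to, and
header pages and pages with an unset granule position are ignored.</p>

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Page(NamedTuple):
    """Hypothetical stand-in for an Ogg data page (not a real library type)."""
    time: float     # absolute time, in seconds, that the granule position maps to
    serial: int     # identifies the logical stream the page belongs to
    payload: bytes


def multiplex(*streams: Iterable[Page]) -> Iterator[Page]:
    """Interleave already-ordered logical streams in non-decreasing time order."""
    return heapq.merge(*streams, key=lambda page: page.time)


audio = [Page(0.5, 1, b"a0"), Page(1.0, 1, b"a1"), Page(1.5, 1, b"a2")]
video = [Page(0.4, 2, b"v0"), Page(1.0, 2, b"v1")]
muxed = list(multiplex(audio, video))
```

<p>Two pages with equal times may legitimately sit next to each other, which
is why the order is non-decreasing rather than strictly increasing.</p>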
117 |
|
---|
118 | <h3>Seeking</h3>
|
---|
119 |
|
---|
120 | <p>Ogg is designed to use an interpolated bisection search to
|
---|
121 | implement exact positional seeking. Interpolated bisection search is
|
---|
122 | a spec-mandated mechanism.</p>
|
---|
123 |
|
---|
124 | <p><i>An index may improve objective performance, but it seldom
|
---|
125 | improves subjective performance outside of a few high-latency use
|
---|
126 | cases and adds no additional functionality as bisection search
|
---|
127 | delivers the same functionality for both one- and two-pass stream
|
---|
128 | types. For these reasons, use of indexes is discouraged, except in
|
---|
129 | cases where an index provides demonstrable and noticeable performance
|
---|
130 | improvement.</i></p>
|
---|
131 |
|
---|
132 | <p>Seek operations are by absolute time; a direct bisection search must
|
---|
133 | find the exact time position requested. Information in the Ogg
|
---|
134 | bitstream is arranged such that all information to be presented for
|
---|
135 | playback from the desired seek point will occur at or after the
|
---|
136 | desired seek point. Seek operations are neither 'fuzzy' nor
|
---|
137 | heuristic.</p>
|
---|
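<p>As a rough sketch of the idea, an interpolated bisection search over page
times might look like the following. It assumes the page times are already
available as a sorted list; a real demuxer would instead bisect over byte
offsets in the physical stream and resynchronize on a page boundary at each
probe. The function name <tt>seek</tt> is hypothetical.</p>

```python
def seek(page_times, target):
    """Return the index of the first page whose time is >= target.

    page_times must be in non-decreasing order, as the multiplexing rules
    require. Returns len(page_times) if the target lies past the last page.
    """
    lo, hi = 0, len(page_times)          # the answer always lies in [lo, hi]
    while lo < hi:
        t_lo, t_hi = page_times[lo], page_times[hi - 1]
        if t_hi > t_lo:
            # Guess where the target sits by linear interpolation on time,
            # clamped so the probe stays inside the current interval.
            frac = (target - t_lo) / (t_hi - t_lo)
            frac = min(max(frac, 0.0), 1.0)
            mid = lo + int(frac * (hi - 1 - lo))
        else:
            mid = lo
        if page_times[mid] < target:
            lo = mid + 1                 # everything up to mid is too early
        else:
            hi = mid                     # mid itself may be the answer
    return lo


times = [0.0, 0.5, 0.5, 1.0, 2.0, 4.0, 8.0]
index = seek(times, 3.0)
```

<p>Every iteration shrinks the interval by at least one page, so the search
terminates at an exact position; nothing about the result is approximate.</p>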
138 |
|
---|
139 | <p><i>Although key frame handling in video appears to be an exception to
|
---|
140 | "all needed playback information lies ahead of a given seek",
|
---|
141 | key frames can still be handled directly within this indexless
|
---|
142 | framework. Seeking to a key frame in video (as well as seeking in other
|
---|
143 | media types with analogous constraints) is handled as two seeks; first
|
---|
144 | a seek to the desired time which extracts state information that
|
---|
145 | decodes to the time of the last key frame, followed by a second seek
|
---|
146 | directly to the key frame. The location of the previous key frame is
|
---|
147 | embedded as state information in the granulepos; this mechanism is
|
---|
148 | described in more detail later.</i></p>
|
---|
149 |
|
---|
150 | <h3>Continuous and Discontinuous Streams</h3>
|
---|
151 |
|
---|
152 | <p>Logical streams within a physical Ogg stream belong to one of two
|
---|
153 | categories, "Continuous" streams and "Discontinuous" streams.
|
---|
154 | Although these are discussed in more detail later, the distinction is
|
---|
155 | important to a high-level understanding of how to buffer an Ogg
|
---|
156 | stream.</p>
|
---|
157 |
|
---|
158 | <p>A stream that provides a gapless, time-continuous media type with a
|
---|
159 | fine-grained timebase is considered to be 'Continuous'. A continuous
|
---|
160 | stream should never be starved of data. Clear examples of continuous
|
---|
161 | data types include broadcast audio and video.</p>
|
---|
162 |
|
---|
163 | <p>A stream that delivers data in a potentially irregular pattern or with
|
---|
164 | widely spaced timing gaps is considered to be 'Discontinuous'. A
|
---|
165 | discontinuous stream may be best thought of as data representing
|
---|
166 | scattered events; although they happen in order, they are typically
|
---|
167 | unconnected data often located far apart. One possible example of a
|
---|
168 | discontinuous stream type would be captioning. Although it's
|
---|
169 | possible to design captions as a continuous stream type, it's most
|
---|
170 | natural to think of captions as widely spaced pieces of text with
|
---|
171 | little happening between.</p>
|
---|
172 |
|
---|
173 | <p>The fundamental design distinction between continuous and
|
---|
174 | discontinuous streams concerns buffering.</p>
|
---|
175 |
|
---|
176 | <h3>Buffering</h3>
|
---|
177 |
|
---|
178 | <p>Because a continuous stream is, by definition, gapless, Ogg buffering
|
---|
179 | is based on the simple premise of never allowing any active continuous
|
---|
180 | stream to starve for data during decode; buffering proceeds ahead
|
---|
181 | until all continuous streams in a physical stream have data ready to
|
---|
182 | decode on demand.</p>
|
---|
183 |
|
---|
184 | <p>Discontinuous stream data may occur on a fairly regular basis, but the
|
---|
185 | timing of, for example, a specific caption is impossible to predict
|
---|
186 | with certainty in most captioning systems. Thus the buffering system
|
---|
187 | should take discontinuous data 'as it comes' rather than working ahead
|
---|
188 | (for a potentially unbounded period) to look for future discontinuous
|
---|
189 | data. As such, discontinuous streams are ignored when managing
|
---|
190 | buffering; their pages simply 'fall out' of the stream when continuous
|
---|
191 | streams are handled properly.</p>
|
---|
192 |
|
---|
193 | <p>Buffering requirements need not be explicitly declared or managed for
|
---|
194 | the encoded stream; the decoder simply reads as much data as is
|
---|
195 | necessary to keep all continuous stream types gapless (also ensuring
|
---|
196 | discontinuous data arrives in time) and no more, resulting in optimum
|
---|
197 | implicit buffer usage for a given stream. Because all pages of all
|
---|
198 | data types are stamped with absolute timing information within the
|
---|
199 | stream, inter-stream synchronization timing is always explicitly
|
---|
200 | maintained without the need for explicitly declared buffer-ahead
|
---|
201 | hinting.</p>
|
---|
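<p>The buffering premise can be illustrated with a small sketch. It assumes
pages arrive as <tt>(serial, payload)</tt> pairs and that the caller already
knows which serial numbers belong to continuous streams; the
<tt>fill_buffers</tt> helper and its arguments are hypothetical names chosen
for this example.</p>

```python
from collections import deque


def fill_buffers(pages, continuous, queues):
    """Read pages until every continuous stream has data queued.

    pages      -- iterator yielding (serial, payload) in physical stream order
    continuous -- set of serial numbers of the continuous logical streams
    queues     -- dict mapping serial -> deque of buffered payloads

    Returns True once no continuous stream is starved, or False if the
    physical stream ends first. Discontinuous pages are queued as they are
    encountered but never cause further reading.
    """
    while not all(queues.get(serial) for serial in continuous):
        page = next(pages, None)
        if page is None:
            return False
        serial, payload = page
        queues.setdefault(serial, deque()).append(payload)
    return True


# Serial 1 = audio, 2 = video (both continuous), 3 = captions (discontinuous).
stream = iter([(3, "cap0"), (1, "a0"), (1, "a1"), (2, "v0"), (1, "a2")])
queues = {}
ready = fill_buffers(stream, {1, 2}, queues)
```

<p>Reading stops as soon as the video stream receives its first page; the
caption page was picked up along the way without being searched for.</p>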
202 |
|
---|
203 | <p>Further details, mechanisms and reasons for the differing arrangement
|
---|
204 | and behavior of continuous and discontinuous streams are discussed
|
---|
205 | later.</p>
|
---|
206 |
|
---|
207 | <h3>Whole-stream navigation</h3>
|
---|
208 |
|
---|
209 | <p>Ogg is designed so that the simplest navigation operations treat the
|
---|
210 | physical Ogg stream as a whole summary of its streams, rather than
|
---|
211 | navigating each interleaved stream as a separate entity.</p>
|
---|
212 |
|
---|
213 | <p>First Example: Seeking to a desired time position in a multiplexed (or
|
---|
214 | unmultiplexed) Ogg stream can be accomplished through a bisection
|
---|
215 | search on time position of all pages in the stream (as encoded in the
|
---|
216 | granule position). More powerful searches (such as a key frame-aware
|
---|
217 | seek within video) are also possible with additional search
|
---|
218 | complexity, but similar computational complexity.</p>
|
---|
219 |
|
---|
220 | <p>Second Example: A bitstream section may consist of three multiplexed
|
---|
221 | streams of differing lengths. The result of multiplexing these
|
---|
222 | streams should be thought of as a single mixed stream with a length
|
---|
223 | equal to the longest of the three component streams. Although it is
|
---|
224 | also possible to think of the multiplexed results as three concurrent
|
---|
225 | streams of different lengths and it is possible to recover the three
|
---|
226 | original streams, it will also become obvious that once multiplexed,
|
---|
227 | it isn't possible to find the internal lengths of the component
|
---|
228 | streams without a linear search of the whole bitstream section.
|
---|
229 | However, it is possible to find the length of the whole bitstream
|
---|
230 | section easily (in near-constant time per section) just as it is for a
|
---|
231 | single-media unmultiplexed stream.</p>
|
---|
232 |
|
---|
233 | <h2>Granule Position</h2>
|
---|
234 |
|
---|
235 | <h3>Description</h3>
|
---|
236 |
|
---|
237 | <p>The Granule Position is a signed 64-bit field appearing in the header
|
---|
238 | of every Ogg page. Although the granule position represents absolute
|
---|
239 | time within a logical stream, its value does not necessarily directly
|
---|
240 | encode a simple timestamp. It may represent frames elapsed (as in
|
---|
241 | Vorbis), a simple timestamp, or a more complex bit-division encoding
|
---|
242 | (such as in Theora). The exact encoding of the granule position is up
|
---|
243 | to a specific codec.</p>
|
---|
244 |
|
---|
245 | <p>The granule position is governed by the following rules:</p>
|
---|
246 |
|
---|
247 | <ul>
|
---|
248 |
|
---|
249 | <li>Granule Position must always increase forward or remain equal from
|
---|
250 | page to page, be unset, or be zero for a header page. The absolute
|
---|
251 | time to which any correct sequence of granule position maps must
|
---|
252 | similarly always increase forward or remain equal. <i>(A codec may
|
---|
253 | make use of data, such as a control sequence, that only affects codec
|
---|
254 | working state without producing data and thus advancing granule
|
---|
255 | position and time. Although the packet sequence number increases in
|
---|
256 | this case, the granule position, and thus the time position, do
|
---|
257 | not.)</i></li>
|
---|
258 |
|
---|
259 | <li>Granule position may only be unset if there is no packet defining a
|
---|
260 | time boundary on the page (that is, if no packet in a continuous
|
---|
261 | stream ends on the page, or no packet in a discontinuous stream begins
|
---|
262 | on the page. This will be discussed in more detail under Continuous
|
---|
263 | and Discontinuous streams).</li>
|
---|
264 |
|
---|
265 | <li>A codec must be able to translate a given granule position value
|
---|
266 | to a unique, deterministic absolute time value through direct
|
---|
267 | calculation. A codec is not required to be able to translate an
|
---|
268 | absolute time value into a unique granule position value.</li>
|
---|
269 |
|
---|
270 | <li>Codecs shall choose a granule position definition that gives the
|
---|
271 | codec a means to seek as directly as possible to an immediately
|
---|
272 | decodable point. For example, the bit-divided granule position encoding of
|
---|
273 | Theora allows the codec to seek efficiently to a key frame without using
|
---|
274 | an index. That is, additional information other than absolute time
|
---|
275 | may be encoded into a granule position value so long as the granule
|
---|
276 | position obeys the above points.</li>
|
---|
277 |
|
---|
278 | </ul>
|
---|
279 |
|
---|
280 | <h4>Example: timestamp</h4>
|
---|
281 |
|
---|
282 | <p>In general, a codec/stream type should choose the simplest granule
|
---|
283 | position encoding that addresses its requirements. The examples here
|
---|
284 | are by no means exhaustive of the possibilities within Ogg.</p>
|
---|
285 |
|
---|
286 | <p>A simple granule position could encode a timestamp directly. For
|
---|
287 | example, a granule position that encoded milliseconds from beginning
|
---|
288 | of stream would allow a logical stream length of over 100,000,000,000
|
---|
289 | days before beginning a new logical stream (to avoid the granule
|
---|
290 | position wrapping).</p>
|
---|
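<p>The figure above follows from the width of the field. A quick check,
assuming only that the largest usable value is the maximum of a signed 64-bit
integer:</p>

```python
MS_PER_DAY = 24 * 60 * 60 * 1000

# Largest value a signed 64-bit granule position can hold.
MAX_GRANULEPOS = 2**63 - 1

# Whole days of millisecond timestamps available before the value would wrap.
max_days = MAX_GRANULEPOS // MS_PER_DAY
```

<p>The result is a little under 107 billion days, comfortably above the
100,000,000,000 quoted.</p>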
291 |
|
---|
292 | <h4>Example: framestamp</h4>
|
---|
293 |
|
---|
294 | <p>A simple millisecond timestamp granule encoding might suit many stream
|
---|
295 | types, but millisecond resolution is inappropriate for, e.g., most
|
---|
296 | audio encodings where exact single-sample resolution is generally a
|
---|
297 | requirement. A millisecond is too large a granule and often does
|
---|
298 | not represent an integer number of samples.</p>
|
---|
299 |
|
---|
300 | <p>In the event that audio frames are always encoded as the same number of
|
---|
301 | samples, the granule position could simply be a linear count of frames
|
---|
302 | since beginning of stream. This has the advantages of being exact and
|
---|
303 | efficient. Position in time would simply be <tt>[granule_position] *
|
---|
304 | [samples_per_frame] / [samples_per_second]</tt>.</p>
|
---|
305 |
|
---|
306 | <h4>Example: samplestamp (Vorbis)</h4>
|
---|
307 |
|
---|
308 | <p>Frame counting is insufficient in codecs such as Vorbis where an audio
|
---|
309 | frame [packet] encodes a variable number of samples. In Vorbis's
|
---|
310 | case, the granule position is a count of the number of raw samples
|
---|
311 | from the beginning of stream; the absolute time of
|
---|
312 | a granule position is <tt>[granule_position] /
|
---|
313 | [samples_per_second]</tt>.</p>
|
---|
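<p>Both the framestamp and the samplestamp mapping are plain arithmetic. The
sketch below uses exact fractions so that no precision is lost; the function
and parameter names are illustrative and are not taken from any codec
API.</p>

```python
from fractions import Fraction


def framestamp_to_seconds(granulepos, samples_per_frame, samples_per_second):
    """Fixed-size frames: granulepos counts frames since beginning of stream."""
    return Fraction(granulepos * samples_per_frame, samples_per_second)


def samplestamp_to_seconds(granulepos, samples_per_second):
    """Variable-size frames: granulepos counts raw samples (as in Vorbis)."""
    return Fraction(granulepos, samples_per_second)


# 100 frames of 1024 samples at 48 kHz.
t_frames = framestamp_to_seconds(100, 1024, 48000)

# One second of audio at 44.1 kHz, then a single sample.
t_second = samplestamp_to_seconds(44100, 44100)
t_sample = samplestamp_to_seconds(1, 44100)
```

<p>A single sample at 44.1 kHz lasts 1/44100 of a second, which is not a whole
number of milliseconds; this is the reason a millisecond granule is too
coarse for sample-exact audio.</p>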
314 |
|
---|
315 | <h4>Example: bit-divided framestamp (Theora)</h4>
|
---|
316 |
|
---|
317 | <p>Some video codecs may be able to use the simple framestamp scheme for
|
---|
318 | granule position. However, most modern video codecs introduce at
|
---|
319 | least the following complications:</p>
|
---|
320 |
|
---|
321 | <ul>
|
---|
322 |
|
---|
323 | <li>video frames are relatively far apart compared to audio samples;
|
---|
324 | for this reason, the point at which a video frame changes to the next
|
---|
325 | frame is usually a strictly defined offset within the frame 'period'.
|
---|
326 | That is, video at 50fps could just as easily define frame transitions
|
---|
327 | <.015, .035, .055...> as at <.00, .02, .04...>.</li>
|
---|
328 |
|
---|
329 | <li>frame rates often include drop-frames, leap-frames or other
|
---|
330 | rational-but-non-integer timings.</li>
|
---|
331 |
|
---|
332 | <li>Decode must begin at a 'key frame' or 'I frame'. Key frames usually
|
---|
333 | occur relatively infrequently.</li>
|
---|
334 |
|
---|
335 | </ul>
|
---|
336 |
|
---|
337 | <p>The first two points can be handled straightforwardly via the fact
|
---|
338 | that the codec has complete control over mapping granule position to
|
---|
339 | absolute time; non-integer frame rates and offsets can be set in the
|
---|
340 | codec's initial header, and the rest is just arithmetic.</p>
|
---|
341 |
|
---|
342 | <p>The third point appears trickier at first glance, but it too can be
|
---|
343 | handled through the granule position mapping mechanism. Here we
|
---|
344 | arrange the granule position in such a way that granule positions of
|
---|
345 | key frames are easy to find. Divide the granule position into two
|
---|
346 | fields; the most-significant bits are an absolute frame counter, but
|
---|
347 | it's only updated at each key frame. The least significant bits encode
|
---|
348 | the number of frames since the last key frame. In this way, each
|
---|
349 | granule position encodes both the absolute time of the current frame
|
---|
350 | and the absolute time of the last key frame.</p>
|
---|
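<p>A minimal sketch of such a two-field layout follows. The width of the low
field (<tt>shift</tt>) is an assumption made for this example; in practice a
codec would declare it in its stream header. The helper names are
hypothetical.</p>

```python
def pack(keyframe_number, frames_since_keyframe, shift):
    """Build a granule position from its two fields.

    The high bits hold the absolute frame number of the last key frame; the
    low `shift` bits hold the number of frames decoded since that key frame.
    """
    if not 0 <= frames_since_keyframe < (1 << shift):
        raise ValueError("offset does not fit in the low field")
    return (keyframe_number << shift) | frames_since_keyframe


def unpack(granulepos, shift):
    """Split a granule position into (keyframe_number, frames_since_keyframe)."""
    return granulepos >> shift, granulepos & ((1 << shift) - 1)


def frame_number(granulepos, shift):
    """Absolute frame number of the current frame."""
    keyframe_number, offset = unpack(granulepos, shift)
    return keyframe_number + offset


SHIFT = 6                        # assumed field width for this example
g = pack(120, 7, SHIFT)          # key frame at frame 120, seven frames later
```

<p>Because the key frame number occupies the most significant bits, granule
positions built this way still never decrease from page to page.</p>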
351 |
|
---|
352 | <p>Seeking to the most recent preceding key frame is then accomplished by
|
---|
353 | first seeking to the original desired point, inspecting the granulepos
|
---|
354 | of the resulting video page, extracting from that granulepos the
|
---|
355 | absolute time of the desired key frame, and then seeking directly to
|
---|
356 | that key frame's page. Of course, it's still possible for an
|
---|
357 | application to ignore key frames and use a simpler seeking algorithm
|
---|
358 | (decode would be unable to present decoded video until the next
|
---|
359 | key frame). Surprisingly many player applications do choose the
|
---|
360 | simpler approach.</p>
|
---|
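<p>The two-step seek can be sketched as follows, under several simplifying
assumptions: one video frame per page, a bit-divided granule position with an
assumed low-field width of 6 bits, and an in-memory list of granule positions
standing in for the physical stream. A real implementation would bisect over
byte offsets rather than build a list of frame numbers first.</p>

```python
import bisect

SHIFT = 6  # assumed width of the "frames since key frame" field


def frame_of(granulepos):
    """Absolute frame number encoded by a bit-divided granule position."""
    return (granulepos >> SHIFT) + (granulepos & ((1 << SHIFT) - 1))


def keyframe_of(granulepos):
    """Absolute frame number of the key frame the page depends on."""
    return granulepos >> SHIFT


def seek_to_keyframe(page_granulepos, target_frame):
    """Return the index of the page holding the key frame needed for target_frame."""
    if not page_granulepos:
        raise ValueError("empty stream")
    frames = [frame_of(g) for g in page_granulepos]   # non-decreasing
    # First seek: land on the page for the requested frame.
    first = min(bisect.bisect_left(frames, target_frame), len(frames) - 1)
    # Read the key frame's position out of that page's granule position.
    key = keyframe_of(page_granulepos[first])
    # Second seek: go directly to the key frame's page.
    return bisect.bisect_left(frames, key)


# Thirty frames, one per page, with a key frame every ten frames.
pages = [(((f // 10) * 10) << SHIFT) | (f % 10) for f in range(30)]
start = seek_to_keyframe(pages, 17)
```

<p>Decoding then begins at the returned page and runs forward, discarding
output until the originally requested frame is reached.</p>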
361 |
|
---|
362 | <h3>Granule position, packets and pages</h3>
|
---|
363 |
|
---|
364 | <p>Although each packet of data in a logical stream theoretically has a
|
---|
365 | specific granule position, only one granule position is encoded
|
---|
366 | per page. It is possible to encode a logical stream such that each
|
---|
367 | page contains only a single packet (so that granule positions are
|
---|
368 | preserved for each packet); however, a one-to-one packet/page mapping
|
---|
369 | is not intended to be the general case.</p>
|
---|
370 |
|
---|
371 | <p>Because Ogg functions at the page, not packet, level, this
|
---|
372 | once-per-page time information provides Ogg with the finest-grained
|
---|
373 | time information it can use. Ogg passes this granule positioning data
|
---|
374 | to the codec (along with the packets extracted from a page); it is the
|
---|
375 | responsibility of codecs to track timing information at granularities
|
---|
376 | finer than a single page.</p>
|
---|
377 |
|
---|
378 | <h3>Start-time and end-time positioning</h3>
|
---|
379 |
|
---|
380 | <p>A granule position represents the <em>instantaneous time location
|
---|
381 | between two pages</em>. However, continuous streams and discontinuous
|
---|
382 | streams differ on whether the granulepos represents the end-time of
|
---|
383 | the data on a page or the start-time. Continuous streams are
|
---|
384 | 'end-time' encoded; the granulepos represents the point in time
|
---|
385 | immediately after the last data decoded from a page. Discontinuous
|
---|
386 | streams are 'start-time' encoded; the granulepos represents the point
|
---|
387 | in time of the first data decoded from the page.</p>
|
---|
388 |
|
---|
389 | <p>An Ogg stream type is declared continuous or discontinuous by its
|
---|
390 | codec. A given codec may support both continuous and discontinuous
|
---|
391 | operation so long as any given logical stream is continuous or
|
---|
392 | discontinuous for its entirety and the codec is able to ascertain which (and
|
---|
393 | inform the Ogg layer) after decoding the initial stream
|
---|
394 | header. The majority of codecs will always be continuous (such as
|
---|
395 | Vorbis) or discontinuous (such as Writ).</p>
|
---|
396 |
|
---|
397 | <p>Start- and end-time encoding do not affect multiplexing sort-order;
|
---|
398 | pages are still sorted by the absolute time a given granulepos maps to
|
---|
399 | regardless of whether that granulepos represents start- or
|
---|
400 | end-time.</p>
|
---|
401 |
|
---|
402 | <h2>Multiplex/Demultiplex Division of Labor</h2>
|
---|
403 |
|
---|
404 | <p>The Ogg multiplex/demultiplex layer provides mechanisms for encoding
|
---|
405 | raw packets into Ogg pages, decoding Ogg pages back into the original
|
---|
406 | codec packets, determining the logical structure of an Ogg stream, and
|
---|
407 | navigating through and synchronizing with an Ogg stream at a desired
|
---|
408 | stream location. Strict multiplex/demultiplex operations are entirely
|
---|
409 | in the Ogg domain and require no intervention from codecs.</p>
|
---|
410 |
|
---|
411 | <p>Implementation of more complex operations does require codec
|
---|
412 | knowledge, however. Unlike other framing systems, Ogg maintains
|
---|
413 | strict separation between framing and the framed bitstream data; Ogg
|
---|
414 | does not replicate codec-specific information in the page/framing
|
---|
415 | data, nor does Ogg blur the line between framing and stream
|
---|
416 | data/metadata. Because Ogg is fully data-agnostic toward the data it
|
---|
417 | frames, operations which require specifics of bitstream data (such as
|
---|
418 | 'seek to key frame') also require interaction with the codec layer
|
---|
419 | (because, in this example, the Ogg layer is not aware of the concept
|
---|
420 | of key frames). This is different from systems that blur the
|
---|
421 | separation between framing and stream data in order to simplify the
|
---|
422 | separation of code. The Ogg system purposely keeps the distinction in
|
---|
423 | data simple so that later codec innovations are not constrained by
|
---|
424 | framing design.</p>
|
---|
425 |
|
---|
426 | <p>For this reason, however, complex seeking operations require
|
---|
427 | interaction with the codecs in order to decode the granule position of
|
---|
428 | a given stream type back to absolute time or in order to find
|
---|
429 | 'decodable points' such as key frames in video.</p>
|
---|
430 |
|
---|
431 | <h2>Unsorted Discussion Points</h2>
|
---|
432 |
|
---|
433 | <p>flushes around key frames? RFC suggestion: repaginating or building a
|
---|
434 | stream this way is nice but not required</p>
|
---|
435 |
|
---|
436 | <h2>Appendix A: multiplexing examples</h2>
|
---|
437 |
|
---|
438 | <div id="copyright">
|
---|
439 | The Xiph Fish Logo is a
|
---|
440 | trademark (™) of Xiph.Org.<br/>
|
---|
441 |
|
---|
442 | These pages © 1994 - 2005 Xiph.Org. All rights reserved.
|
---|
443 | </div>
|
---|
444 |
|
---|
445 | </body>
|
---|
446 | </html>
|
---|