1 | % -*- mode: latex; TeX-master: "Vorbis_I_spec"; -*-
|
---|
2 | %!TEX root = Vorbis_I_spec.tex
|
---|
3 | \section{Embedding Vorbis into an Ogg stream} \label{vorbis:over:ogg}
|
---|
4 |
|
---|
5 | \subsection{Overview}
|
---|
6 |
|
---|
7 | This document describes using Ogg logical and physical transport
|
---|
8 | streams to encapsulate Vorbis compressed audio packet data into file
|
---|
9 | form.
|
---|
10 |
|
---|
11 | The \xref{vorbis:spec:intro} provides an overview of the construction
|
---|
12 | of Vorbis audio packets.
|
---|
13 |
|
---|
14 | The \href{oggstream.html}{Ogg
|
---|
15 | bitstream overview} and \href{framing.html}{Ogg logical
|
---|
16 | bitstream and framing spec} provide detailed descriptions of Ogg
|
---|
17 | transport streams. This specification document assumes a working
|
---|
18 | knowledge of the concepts covered in these named background
|
---|
19 | documents. Please read them first.
|
---|
20 |
|
---|
21 | \subsubsection{Restrictions}
|
---|
22 |
|
---|
23 | The Ogg/Vorbis I specification currently dictates that Ogg/Vorbis
|
---|
24 | streams use Ogg transport streams in degenerate, unmultiplexed
|
---|
25 | form only. That is:
|
---|
26 |
|
---|
27 | \begin{itemize}
|
---|
28 | \item
|
---|
29 | A meta-headerless Ogg file encapsulates the Vorbis I packets
|
---|
30 |
|
---|
31 | \item
|
---|
32 | The Ogg stream may be chained, i.e., contain multiple, contiguous logical streams (links).
|
---|
33 |
|
---|
34 | \item
|
---|
35 | The Ogg stream must be unmultiplexed (only one stream, a Vorbis audio stream, per link)
|
---|
36 |
|
---|
37 | \end{itemize}
|
---|
38 |
|
---|
39 |
|
---|
40 | This does not mean that Vorbis cannot currently be multiplexed
|
---|
41 | with other media types into a multi-stream Ogg file. At the
|
---|
42 | time this document was written, Ogg was becoming a popular container
|
---|
43 | for low-bitrate movies consisting of DivX video and Vorbis audio.
|
---|
44 | However, a 'Vorbis I audio file' is taken to imply Vorbis audio
|
---|
45 | existing alone within a degenerate Ogg stream. A compliant 'Vorbis
|
---|
46 | audio player' is not required to implement Ogg support beyond the
|
---|
47 | specific support of Vorbis within a degenerate Ogg stream (naturally,
|
---|
48 | application authors are encouraged to support full multiplexed Ogg
|
---|
49 | handling).
|
---|
50 |
|
---|
51 |
|
---|
52 |
|
---|
53 |
|
---|
54 | \subsubsection{MIME type}
|
---|
55 |
|
---|
56 | The MIME type of Ogg files depends on the context. Specifically, complex
|
---|
57 | multimedia and applications should use \literal{application/ogg},
|
---|
58 | while visual media should use \literal{video/ogg}, and audio
|
---|
59 | \literal{audio/ogg}. Vorbis data encapsulated in Ogg may appear
|
---|
60 | in any of those types. RTP-encapsulated Vorbis should use
|
---|
61 | \literal{audio/vorbis} + \literal{audio/vorbis-config}.
|
---|
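The mapping above can be sketched as a small selection helper. This is only an illustration of the rule as stated in this section: the function name and its two boolean parameters are hypothetical and are not part of any Ogg or Vorbis library.

```python
def ogg_mime_type(has_video=False, is_complex=False):
    """Pick a MIME type for an Ogg file, following the rule in this section.

    Both parameters are illustrative assumptions: `is_complex` stands for
    "complex multimedia or application content" and `has_video` for
    "visual media".
    """
    if is_complex:
        return "application/ogg"
    if has_video:
        return "video/ogg"
    # Audio-only content, for example a Vorbis I audio file.
    return "audio/ogg"


# RTP-encapsulated Vorbis is not an Ogg file, so it is listed separately.
RTP_VORBIS_MIME_TYPES = ("audio/vorbis", "audio/vorbis-config")
```

A plain Vorbis I audio file, which carries a single audio stream, falls through to the audio case.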
62 |
|
---|
63 |
|
---|
64 | \subsection{Encapsulation}
|
---|
65 |
|
---|
66 | Ogg encapsulation of a Vorbis packet stream is straightforward.
|
---|
67 |
|
---|
68 | \begin{itemize}
|
---|
69 |
|
---|
70 | \item
|
---|
71 | The first Vorbis packet (the identification header), which
|
---|
72 | uniquely identifies a stream as Vorbis audio, is placed alone in the
|
---|
73 | first page of the logical Ogg stream. This results in a first Ogg
|
---|
74 | page of exactly 58 bytes at the very beginning of the logical stream.
|
---|
75 |
|
---|
76 |
|
---|
77 | \item
|
---|
78 | This first page is marked 'beginning of stream' in the page flags.
|
---|
79 |
|
---|
80 |
|
---|
81 | \item
|
---|
82 | The second and third Vorbis packets (comment and setup
|
---|
83 | headers) may span one or more pages beginning on the second page of
|
---|
84 | the logical stream. However many pages they span, the third header
|
---|
85 | packet finishes the page on which it ends. The next (first audio) packet
|
---|
86 | must begin on a fresh page.
|
---|
87 |
|
---|
88 |
|
---|
89 | \item
|
---|
90 | The granule position of these first pages containing only headers is zero.
|
---|
91 |
|
---|
92 |
|
---|
93 | \item
|
---|
94 | The first audio packet of the logical stream begins a fresh Ogg page.
|
---|
95 |
|
---|
96 |
|
---|
97 | \item
|
---|
98 | Packets are placed into Ogg pages in order until the end of stream.
|
---|
99 |
|
---|
100 |
|
---|
101 | \item
|
---|
102 | The last page is marked 'end of stream' in the page flags.
|
---|
103 |
|
---|
104 |
|
---|
105 | \item
|
---|
106 | Vorbis packets may span page boundaries.
|
---|
107 |
|
---|
108 |
|
---|
109 | \item
|
---|
110 | The granule position of pages containing Vorbis audio is in units
|
---|
111 | of PCM audio samples (per channel; a stereo stream's granule position
|
---|
112 | does not increment at twice the speed of a mono stream).
|
---|
113 |
|
---|
114 |
|
---|
115 | \item
|
---|
116 | The granule position of a page represents the end PCM sample
|
---|
117 | position of the last packet \emph{completed} on that
|
---|
118 | page. The 'last PCM sample' is the last complete sample returned by
|
---|
119 | decode, not an internal sample awaiting lapping with a
|
---|
120 | subsequent block. A page that is entirely spanned by a single
|
---|
121 | packet (that completes on a subsequent page) has no granule
|
---|
122 | position, and the granule position is set to '-1'.
|
---|
123 |
|
---|
124 |
|
---|
125 | Note that the last decoded (fully lapped) PCM sample from a packet
|
---|
126 | is not necessarily the middle sample from that block. If, e.g., the
|
---|
127 | current Vorbis packet encodes a "long block" and the next Vorbis
|
---|
128 | packet encodes a "short block", the last decodable sample from the
|
---|
129 | current packet will be at position (3*long\_block\_length/4) -
|
---|
130 | (short\_block\_length/4).
|
---|
131 |
|
---|
132 |
|
---|
133 | \item
|
---|
134 | The granule (PCM) position of the first page need not indicate
|
---|
135 | that the stream started at position zero. Although the granule
|
---|
136 | position belongs to the last completed packet on the page and a
|
---|
137 | valid granule position must be non-negative, by
|
---|
138 | inference it may indicate that the PCM position of the beginning
|
---|
139 | of audio is positive or negative.
|
---|
140 |
|
---|
141 |
|
---|
142 | \begin{itemize}
|
---|
143 | \item
|
---|
144 | A positive starting value simply indicates that this stream begins at
|
---|
145 | some positive time offset, potentially within a larger
|
---|
146 | program. This is a common case when connecting to the middle
|
---|
147 | of a broadcast stream.
|
---|
148 |
|
---|
149 | \item
|
---|
150 | A negative value indicates that
|
---|
151 | output samples preceding time zero should be discarded during
|
---|
152 | decoding; this technique is used to allow sample-granularity
|
---|
153 | editing of the stream start time of already-encoded Vorbis
|
---|
154 | streams. The number of samples to be discarded must not exceed
|
---|
155 | the overlap-add span of the first two audio packets.
|
---|
156 |
|
---|
157 | \end{itemize}
|
---|
158 |
|
---|
159 |
|
---|
160 | In both of these cases in which the initial audio PCM starting
|
---|
161 | offset is nonzero, the second finished audio packet must flush the
|
---|
162 | page on which it appears and the third packet begin a fresh page.
|
---|
163 | This allows the decoder to always be able to perform PCM position
|
---|
164 | adjustments before needing to return any PCM data from synthesis,
|
---|
165 | resulting in correct positioning information without any additional
|
---|
166 | seeking logic.
|
---|
167 |
|
---|
168 |
|
---|
169 | \begin{note}
|
---|
170 | Failure to do so should, at worst, cause a
|
---|
171 | decoder implementation to return incorrect positioning information
|
---|
172 | for seeking operations at the very beginning of the stream.
|
---|
173 | \end{note}
|
---|
174 |
|
---|
175 |
|
---|
176 | \item
|
---|
177 | A granule position on the final page in a stream that indicates
|
---|
178 | less audio data than the final packet would normally return is used to
|
---|
179 | end the stream on other than even frame boundaries. The difference
|
---|
180 | between the actual available data returned and the declared amount
|
---|
181 | indicates how many trailing samples to discard from the decoding
|
---|
182 | process.
|
---|
183 |
|
---|
184 | \end{itemize}
|
---|
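The page size and granule position arithmetic in the list above can be sketched as follows. This is a minimal illustration, not the reference implementation. All function names are hypothetical. The 27-byte fixed Ogg page header and the 30-byte identification header come from the Ogg framing and Vorbis header definitions rather than from this section, and the rule that a packet returns a quarter of the previous blocksize plus a quarter of the current blocksize comes from the Vorbis decode description. The page sketch assumes a stream whose audio begins at PCM position zero.

```python
OGG_PAGE_HEADER_BYTES = 27   # fixed part of an Ogg page header (assumed from Ogg framing)
VORBIS_ID_HEADER_BYTES = 30  # Vorbis I identification header packet (assumed from header spec)


def first_page_size(packet_bytes=VORBIS_ID_HEADER_BYTES):
    """Size of an Ogg page holding a single packet shorter than 255 bytes."""
    if packet_bytes >= 255:
        raise ValueError("this sketch only handles one lacing value")
    # One lacing value means one byte in the segment table.
    return OGG_PAGE_HEADER_BYTES + 1 + packet_bytes


def samples_returned(prev_blocksize, cur_blocksize):
    """Finished PCM samples (per channel) produced by decoding one audio packet."""
    if prev_blocksize is None:
        return 0  # the first audio packet only primes the overlap
    return prev_blocksize // 4 + cur_blocksize // 4


def last_finished_offset(long_blocksize, short_blocksize):
    """Offset of the last decodable sample in a long block followed by a short block."""
    return 3 * long_blocksize // 4 - short_blocksize // 4


def page_granule_positions(pages):
    """Granule position of each audio page.

    `pages` is a list of pages; each page is the list of blocksizes of the
    packets that complete on it. An empty page is one entirely spanned by a
    single packet and therefore gets -1.
    """
    position = 0
    prev = None
    result = []
    for page in pages:
        if not page:
            result.append(-1)
            continue
        for blocksize in page:
            position += samples_returned(prev, blocksize)
            prev = blocksize
        result.append(position)
    return result


def initial_pcm_position(first_page_granule, samples_completed_on_page):
    """Inferred PCM position of the beginning of audio (may be negative)."""
    return first_page_granule - samples_completed_on_page


def trailing_samples_to_discard(decoded_end_position, final_page_granule):
    """Samples to drop when the final page declares less audio than was decoded."""
    return max(0, decoded_end_position - final_page_granule)
```

With the default 30-byte identification header, the first function yields the 58-byte first page stated above. A negative result from the inferred starting position corresponds to the case in which that many leading output samples are discarded.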