Topic:

Sample granularity editing of a Vorbis file; inferred arbitrary sample
length starting offsets / PCM stream lengths

Overview:

Vorbis, like mp3, is a frame-based* audio compression where audio is
broken up into discrete short time segments. These segments are
'atomic'; that is, one must recover the entire short time segment from
the frame packet. There's no way to recover only a part of the PCM time
segment from part of the coded packet without expanding the entire
packet and then discarding a portion of the resulting PCM audio.

* In mp3, the data segment representing a given time period is called
  a 'frame'; the roughly equivalent Vorbis construct is a 'packet'.

Thus, when we edit a Vorbis stream, the finest physical editing
granularity is on these packet boundaries. (The mp3 case is actually
somewhat more complex: mp3 editing is more complicated than just
snipping on a frame boundary because time data can be spread backward
or forward over frames. In Vorbis, packets are all stand-alone.) At
the physical packet level, then, Vorbis is still limited to streams
that contain an integral number of packets.

However, Vorbis streams may still exactly represent and be edited to a
PCM stream of arbitrary length and starting offset without padding the
beginning or end of the decoded stream or requiring that the desired
edit points be packet aligned. Vorbis makes use of Ogg stream
framing, and this framing provides time-stamping data, called a
'granule position'; our starting offset and finished stream length may
be inferred from correct usage of the granule position data.

Time stamping mechanism:

Vorbis packets are bundled into Ogg pages (note that pages do not
necessarily contain integral numbers of packets, but that isn't
important in this discussion. More about Ogg framing can be found in
ogg/doc/framing.html). Each page that contains a packet boundary is
stamped with the absolute sample-granularity offset of the data, that
is, 'complete samples-to-date' up to the last completed packet of that
page. (The same mechanism is used for, e.g., video, where the number
represents complete 2-D frames, and so on.)

(It's possible but rare for a packet to span more than two pages such
that page[s] in the middle have no packet boundary; these pages have
a granule position of '-1'.)

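As a rough illustration of where the granule position lives, the
sketch below unpacks the fixed 27-byte start of an Ogg page header.
The field layout follows the Ogg framing document; the function name
and the dictionary keys are our own inventions for this example and
are not part of any library.

```python
import struct

# Fixed 27-byte start of an Ogg page header (little-endian, no padding):
# capture pattern, version, header type flags, granule position,
# stream serial number, page sequence number, CRC, segment count.
PAGE_HEADER = struct.Struct("<4sBBqIIIB")

FLAG_CONTINUED = 0x01  # first packet on this page continues from the previous page
FLAG_BOS = 0x02        # first page of the logical stream
FLAG_EOS = 0x04        # last page of the logical stream


def read_page_header(data, offset=0):
    """Return the timestamp-related fields of the page header at `offset`."""
    magic, _version, flags, granulepos, _serial, seqno, _crc, nsegs = \
        PAGE_HEADER.unpack_from(data, offset)
    if magic != b"OggS":
        raise ValueError("not at an Ogg page boundary")
    return {
        "granulepos": granulepos,            # signed 64-bit samples-to-date
        "has_timestamp": granulepos != -1,   # -1: no packet completes on this page
        "eos": bool(flags & FLAG_EOS),
        "sequence": seqno,
        "segments": nsegs,
    }
```

The granule position is read as a signed value so that the '-1'
marker comes back as -1 rather than as a very large unsigned number.
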
This granule position mechanism in Ogg is used by Vorbis to indicate
when the PCM data intended to be represented in a Vorbis segment
begins a number of samples into the data represented by the first
packet[s] and/or ends before the physical PCM data represented in the
last packet[s].

File length a non-integral number of frames:

A file to be encoded in Vorbis will probably not encode into an
integral number of packets; such a file is encoded with the last
packet containing 'extra'* samples. These samples are not padding; they
will be discarded in decode.

*(For best results, the encoder should use extra samples that preserve
the character of the last frame. Simply setting them to zero will
introduce a 'cliff' that's hard to encode, resulting in spread-frame
noise. Libvorbis extrapolates the last frame past the end of data to
produce the extra samples. Even simply duplicating the last value is
better than clamping the signal to zero.)

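The simplest alternative mentioned above, duplicating the last value,
can be sketched as follows. This is only an illustration: it assumes a
single channel and a fixed, hypothetical block length, whereas real
Vorbis packets overlap and come in two block sizes, and it is not the
extrapolation that libvorbis performs. The name pad_tail is ours.

```python
def pad_tail(samples, block_len):
    """Extend `samples` to a whole number of blocks by repeating the last value.

    Returns the padded list and the count of 'extra' samples appended.
    `block_len` is a hypothetical fixed block length used for illustration.
    """
    remainder = len(samples) % block_len
    if remainder == 0:
        # Already block aligned (or empty): nothing to add.
        return list(samples), 0
    extra = block_len - remainder
    # Repeat the final value so the signal does not drop to zero abruptly.
    return list(samples) + [samples[-1]] * extra, extra
```

The encoder would still record the original length, not the padded
length, as the final granule position, so the 'extra' samples never
reach the listener.
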
The encoder indicates to the decoder that the file is actually shorter
than all of the samples ('original' + 'extra') by setting the granule
position in the last page to a short value, that is, the last
timestamp is the original length of the file discarding extra samples.
The decoder will see that the number of samples it has decoded in the
last page is too many; it is 'original' + 'extra', whereas the
granulepos says that through the last packet we only have 'original'
number of samples. The decoder then ignores the 'extra' samples.
This behavior is to occur only when the end-of-stream bit is set in
the page (indicating last page of the logical stream).

Note that it is not legal for the granule position of the last page to
indicate that there are more samples in the file than actually exist;
however, implementations should handle such an illegal file gracefully
in the interests of robust programming.

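The end-of-stream rule amounts to a small piece of arithmetic. A
minimal sketch, assuming the decoder remembers the granule position of
the previous page and counts the samples it produces from the last
page (the function and parameter names are hypothetical, and clamping
is just one way to be 'graceful'):

```python
def samples_to_keep_at_end(prev_granulepos, final_granulepos, decoded_on_last_page):
    """Number of samples decoded from the end-of-stream page that should be output.

    prev_granulepos:      samples-to-date through the page before the last one
    final_granulepos:     granule position stamped on the end-of-stream page
    decoded_on_last_page: PCM samples actually produced by that page's packets
    """
    wanted = final_granulepos - prev_granulepos
    # A final granulepos that claims more samples than exist is illegal,
    # as is one that moves backward; clamp into range instead of failing.
    return max(0, min(wanted, decoded_on_last_page))
```

For example, if the previous page ended at sample 88064, the last page
decodes to 1024 samples and its granule position is 88200, the decoder
keeps 88200 - 88064 = 136 samples and discards the remaining 888.
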
Beginning point not on integral packet boundary:

It is possible that we will want the PCM data represented by a Vorbis
stream to begin at a position later than where the decoded PCM data
really begins after an integral packet boundary, a situation analogous
to the above description where the PCM data does not end at an
integral packet boundary. The easiest example is taking a clip out of
a larger Vorbis stream, and choosing a beginning point of the clip
that is not on a packet boundary; we need to ignore a few samples to
get the desired beginning point.

The process of marking the desired beginning point is similar to
marking an arbitrary ending point. If the encoder wishes sample zero
to be some location past the actual beginning of data, it associates a
'short' granule position value with the completion of the second*
audio packet. The granule position is associated with the second
packet simply by making sure the second packet completes its page.

*(We associate the short value with the second packet for two reasons.
a) The first packet only primes the overlap/add buffer. No data is
returned before decoding the second packet; this places the decision
information at the point of decision. b) Placing the short value on
the first packet would make the value negative (as the first packet
normally represents position zero); a negative value would break the
requirement that granule positions increase, since the headers have
position values of zero.)

The decoder sees that on the first page that will return
data from the overlap/add queue, we have more samples than the granule
position accounts for, and discards the 'surplus' from the beginning
of the queue.

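The decoder-side check for a short value at the start of the stream
can be sketched in the same spirit. The names below are hypothetical;
the sketch assumes the decoder has counted how many samples the
overlap/add queue holds through the first data-returning page.

```python
def samples_to_skip_at_start(first_granulepos, decoded_through_page):
    """Number of samples to discard from the front of the overlap/add queue.

    first_granulepos:     granule position on the first page that returns data
    decoded_through_page: PCM samples actually available through that page
    """
    surplus = decoded_through_page - first_granulepos
    # No surplus (or a 'long' granulepos) means nothing is discarded.
    return max(surplus, 0)
```

If the queue holds 1024 samples but the page's granule position is
700, the first 324 samples are dropped and the next sample becomes
sample zero of the stream.
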
Note that short granule values (indicating less than the actually
returned amount of data) are not legal in the Vorbis spec outside of
indicating beginning and ending sample positions. However, decoders
should, at minimum, tolerate inadvertent short values elsewhere in the
stream (just as they should tolerate out-of-order/non-increasing
granulepos values, although this too is illegal).

Beginning point at arbitrary positive timestamp (no 'zero' sample):

It's also possible that the granule position of the first page of an
audio stream is a 'long value', that is, a value larger than the
amount of PCM audio decoded. This implies only that we are starting
playback at some point into the logical stream, a potentially common
occurrence in streaming applications where the decoder may be
connecting into a live stream. The decoder should not treat the long
value specially.

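In other words, a long first value only shifts the timeline. A minimal
sketch of how a player might derive the absolute position of the first
sample it outputs (the function name is hypothetical, and nothing is
discarded in this case):

```python
def first_sample_position(first_granulepos, decoded_through_page):
    """Absolute stream position of the first PCM sample that is output.

    A 'long' first granulepos means playback joins the logical stream
    part-way through; a short or exact value means it starts at zero.
    """
    return max(first_granulepos - decoded_through_page, 0)
```

A decoder that joins a live stream and finds a granule position of
1000000 on a first page yielding 1024 samples would label its first
output sample as position 998976.
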
A long value elsewhere in the stream would normally occur only when a
page is lost or out of sequence, as indicated by the page's sequence
number. A long value under any other situation is not legal; however,
a decoder should tolerate both possibilities.