
Figure 1.1 shows the block diagram of a speech coding system

Work added to samzan.net: 2015-07-10


Question 1  Structure of a Speech Coding System

Figure 1.1 shows the block diagram of a speech coding system. The continuous-time analog speech signal from a given source is digitized by a standard connection of filter (eliminates aliasing), sampler (discrete-time conversion), and analog-to-digital converter (uniform quantization is assumed). The output is a discrete-time speech signal whose sample values are also discretized. This signal is referred to as the digital speech.

Figure 1.1    Block diagram of a speech coding system.

Traditionally, most speech coding systems were designed to support telecommunication applications, with the frequency content limited to between 300 and 3400 Hz. According to the Nyquist theorem, the sampling frequency must be at least twice the bandwidth of the continuous-time signal in order to avoid aliasing. A value of 8 kHz is commonly selected as the standard sampling frequency for speech signals. To convert the analog samples to a digital format using uniform quantization while maintaining toll quality [Jayant and Noll, 1984], meaning that the digital speech is roughly indistinguishable from the bandlimited input, more than 8 bits/sample is necessary. The use of 16 bits/sample provides a quality that is considered high. Throughout this book, the following parameters are assumed for the digital speech signal:

Sampling frequency = 8 kHz; Number of bits per sample = 16.

This gives rise to

Bit-rate = 8 kHz · 16 bits = 128 kbps.

The above bit-rate, also known as the input bit-rate, is what the source encoder attempts to reduce (Figure 1.1). The output of the source encoder represents the encoded digital speech and in general has a substantially lower bit-rate than the input. The linear prediction coding algorithm (Chapter 9), for instance, has an output rate of 2.4 kbps, a reduction of more than 53 times with respect to the input.
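The arithmetic above can be checked with a short sketch. The 8 kHz and 16 bits/sample figures come from the text; the 2.4 kbps output rate is taken as an assumption consistent with the stated reduction of more than 53 times, and the function names are purely illustrative.

```python
# Input bit-rate of the digital speech and the reduction achieved by a coder.
# Helper names are illustrative; they are not part of any library.

def bit_rate_bps(sampling_hz, bits_per_sample):
    """Bit-rate of uniformly quantized digital speech, in bits per second."""
    return sampling_hz * bits_per_sample


def reduction_factor(input_bps, output_bps):
    """How many times the source encoder lowers the bit-rate."""
    return input_bps / output_bps


input_bps = bit_rate_bps(8000, 16)          # 8 kHz, 16 bits/sample
ratio = reduction_factor(input_bps, 2400)   # assumed 2.4 kbps output rate

print(input_bps / 1000, "kbps input, reduced", round(ratio, 1), "times")
```

The same two helpers apply to any other pair of sampling frequency and word length, for example 8 bits/sample at 8 kHz.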

The encoded digital speech data is further processed by the channel encoder, providing error protection to the bit-stream before transmission to the communication channel, where various noise and interference can sabotage the reliability of the transmitted data. Even though in Figure 1.1 the source encoder and channel encoder are separated, it is also possible to jointly implement them so that source and channel encoding are done in a single step.

The channel decoder processes the error-protected data to recover the encoded data, which is then passed to the source decoder to generate the output digital speech signal, having the original rate. This output digital speech signal is converted to continuous-time analog form through standard procedures: digital-to-analog conversion followed by antialiasing filtering.

In this book, the emphasis is on design of the source encoder and source decoder. For simplicity, they are referred to as the encoder and decoder, respectively (Figure 1.2). The input speech (a discrete-time signal having a bit-rate of 128 kbps) enters the encoder to produce the encoded bit-stream, or compressed speech data. The bit-rate of the bit-stream is normally much lower than that of the input speech. The decoder takes the encoded bit-stream as its input to produce the output speech signal, which is a discrete-time signal having the same rate as the input speech. As we will see later in this book, many diverse approaches can be used to design the encoder/decoder pair. Different methods provide differing speech quality and bit-rate, as well as implementational complexity.

Figure 1.2    Block diagram of a speech coder. (Input speech, 128 kbps; bit-stream, <128 kbps; output speech, 128 kbps.)

The encoder/decoder structure represented in Figure 1.2 is known as a speech coder, where the input speech is encoded to produce a low-rate bit-stream. This bit-stream is input to the decoder, which constructs an approximation of the original signal.

Question 3  Desirable Properties of a Speech Coder

The main goal of speech coding is either to maximize the perceived quality at a particular bit-rate, or to minimize the bit-rate for a particular perceptual quality. The appropriate bit-rate at which speech should be transmitted or stored depends on the cost of transmission or storage, the cost of coding (compressing) the digital speech signal, and the speech quality requirements. In almost all speech coders, the reconstructed signal differs from the original one. The bit-rate is reduced by representing the speech signal (or parameters of a speech production model) with reduced precision and by removing inherent redundancy from the signal, resulting therefore in a lossy coding scheme. Desirable properties of a speech coder include:

  •  Low Bit-Rate. The lower the bit-rate of the encoded bit-stream, the less bandwidth is required for transmission, leading to a more efficient system. This requirement is in constant conflict with other good properties of the system, such as speech quality. In practice, a trade-off is found to satisfy the needs of a given application.
  •  High Speech Quality. The decoded speech should have a quality acceptable for the target application. There are many dimensions in quality perception, including intelligibility, naturalness, pleasantness, and speaker recognizability. See Chapter 19 for a thorough discussion on speech quality and techniques to assess it.
  •  Robustness Across Different Speakers / Languages. The underlying technique of the speech coder should be general enough to model different speakers (adult male, adult female, and children) and different languages adequately. Note that this is not a trivial task, since each voice signal has its unique characteristics.
  •  Robustness in the Presence of Channel Errors. This is crucial for digital communication systems where channel errors will have a negative impact on speech quality.
  •  Good Performance on Nonspeech Signals (e.g., telephone signaling). In a typical telecommunication system, other signals might be present besides speech. Signaling tones such as dual-tone multifrequency (DTMF) in keypad dialing and music are often encountered. Even though low bit-rate speech coders might not be able to reproduce all signals faithfully, they should not generate annoying artifacts when facing these alternate signals.
  •  Low Memory Size and Low Computational Complexity. In order for the speech coder to be practicable, costs associated with its implementation must be low; these include the amount of memory needed to support its operation, as well as computational demand. Speech coding researchers spend a great deal of effort to find the most efficient realizations.
  •  Low Coding Delay. In the process of speech encoding and decoding, delay is inevitably introduced, which is the time shift between the input speech of the encoder and the output speech of the decoder. An excessive delay creates problems with real-time two-way conversations, where the parties tend to "talk over" each other. A thorough discussion on coding delay is given next.

Question 4  About Coding Delay

Consider the delay measured using the topology shown in Figure 1.3. The delay obtained in this way is known as coding delay, or one-way coding delay [Chen, 1995], which is given by the elapsed time from the instant a speech sample arrives at the encoder input to the instant when the same speech sample appears at the decoder output. The definition does not consider exterior factors, such as communication distance or equipment, which are not controllable by the algorithm designer. Based on the definition, the coding delay can be decomposed into four major components (see Figure 1.4):

  1.  Encoder Buffering Delay. Many speech encoders require the collection of a certain number of samples before processing. For instance, typical linear prediction (LP)-based coders need to gather one frame of samples ranging from 160 to 240 samples, or 20 to 30 ms, before proceeding with the actual encoding process.

Figure 1.3    System for delay measurement.

Figure 1.4    Illustration of the components of coding delay. (Timeline: buffer input frame, encode, bit transmission, decode, output frame; the coding delay spans the encoder buffering delay, encoder processing delay, transmission delay / decoder buffering delay, and decoder processing delay.)

  2.  Encoder Processing Delay. The encoder consumes a certain amount of time to process the buffered data and construct the bit-stream. This delay can be shortened by increasing the computational power of the underlying platform and by utilizing efficient algorithms. The processing delay must be shorter than the buffering delay, otherwise the encoder will not be able to handle data from the next frame.
  3.  Transmission Delay. Once the encoder finishes processing one frame of input samples, the resultant bits representing the compressed bit-stream are transmitted to the decoder. Many transmission modes are possible and the choice depends on the particular system requirements. For illustration purposes, we will consider only two transmission modes: constant and burst. Figure 1.5 depicts the situations for these modes.

In constant mode the bits are transmitted synchronously at a fixed rate, which is given by the number of bits corresponding to one frame divided by the length of the frame. Under this mode, transmission delay is equal to encoder buffering delay: bits associated with the frame are fully transmitted at the instant when bits of the next frame are available. This mode of operation is dominant for most classical digital communication systems, such as wired telephone networks.

Figure 1.5   Plots of bit-stream transmission pattern (number of bits versus time) for constant mode (top) and burst mode (bottom).

In burst mode all bits associated with a particular frame are completely sent within an interval that is shorter than the encoder buffering delay. In the extreme case, all bits are released right after they become available, leading to a negligibly small transmission delay. This mode is inherent to packetized networks and the Internet, where data are grouped and sent as packets.

Transmission delay is also known as decoder buffering delay, since it is the amount of time that the decoder must wait in order to collect all bits related to a particular frame so as to start the decoding process.

  4.  Decoder Processing Delay. This is the time required to decode in order to produce one frame of synthetic speech. As for the case of the encoder processing delay, its upper limit is given by the encoder buffering delay, since a whole frame of synthetic speech data must be completed within this time frame in order to be ready for the next frame.

As stated earlier, one of the good attributes of a speech coder is measured by its coding delay, given by the sum of the four described components. As an algorithm designer, the task is to reduce the four delay components to a minimum. In general, the encoder buffering delay has the greatest impact: it determines the upper limit for the rest of the delay components. A long encoding buffer enables a more thorough evaluation of the signal properties, leading to higher coding efficiency and hence lower bit-rate. This is the reason why most low bit-rate coders often have high delay. Thus, coding delay in most cases is a trade-off with respect to the achievable bit-rate.

In the ideal case where infinite computational power is available, the processing delays (encoder and decoder) can be made negligible with respect to the encoder buffering delay. Under this assumption, the coding delay is equal to two times the encoder buffering delay if the system is transmitting in constant mode. For burst mode, the shortest possible coding delay is equal to the encoder buffering delay, where it is assumed that all output bits from the encoder are sent instantaneously to the decoder. These values are idealistic in the sense that they are achievable only if the processing delay is zero or the computational power is infinite: the underlying platform can find the results instantly once the required amount of data is collected. These ideal values are frequently used for benchmarking purposes, since they represent the lower bound of the coding delay. In the simplest form of delay comparison among coders, only the encoder buffering delay is cited. In practice, a reasonable estimate of the coding delay is 2.5 to 3 times the frame interval (encoder buffering delay) for constant mode transmission and 1.5 to 2.5 times the frame interval for burst mode transmission.
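As a rough illustration of these rules of thumb, the sketch below turns the multipliers quoted in the text into two helper functions. The function names and the 20 ms frame used in the example are assumptions for illustration only.

```python
# Coding-delay estimates from the frame interval (encoder buffering delay).
# Multipliers are the ones quoted in the text; names are illustrative.

def ideal_coding_delay_ms(frame_ms, mode):
    """Lower bound on coding delay, assuming zero processing delay."""
    if mode == "constant":
        # buffering delay + transmission delay (equal to the buffering delay)
        return 2.0 * frame_ms
    if mode == "burst":
        # buffering delay only; bits are assumed to be sent instantaneously
        return 1.0 * frame_ms
    raise ValueError("mode must be 'constant' or 'burst'")


def practical_coding_delay_ms(frame_ms, mode):
    """(low, high) practical estimate of the coding delay."""
    factors = {"constant": (2.5, 3.0), "burst": (1.5, 2.5)}
    if mode not in factors:
        raise ValueError("mode must be 'constant' or 'burst'")
    low, high = factors[mode]
    return (low * frame_ms, high * frame_ms)


frame_ms = 20.0  # e.g., 160 samples at 8 kHz
for mode in ("constant", "burst"):
    print(mode, ideal_coding_delay_ms(frame_ms, mode),
          practical_coding_delay_ms(frame_ms, mode))
```

For a 20 ms frame this gives an ideal delay of 40 ms in constant mode and 20 ms in burst mode, with practical estimates of 50 to 60 ms and 30 to 50 ms, respectively.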

Question 5  CLASSIFICATION OF SPEECH CODERS

The task of classifying modern speech coders is not simple and is often confusing, due to the lack of clear separation between various approaches. This section presents some existing classification criteria. Readers must bear in mind that this is a constantly evolving area and new classes of coders will be created as alternative techniques are introduced.


TABLE 1.1   Classification of Speech Coders According to Bit-Rate

Category             Bit-Rate Range
High bit-rate        >15 kbps
Medium bit-rate      5 to 15 kbps
Low bit-rate         2 to 5 kbps
Very low bit-rate    <2 kbps

Classification by Bit-Rate

All speech coders are designed to reduce the reference bit-rate of 128 kbps toward lower values. Depending on the bit-rate of the encoded bit-stream, it is common to classify the speech coders according to Table 1.1. As we will see later in this chapter and throughout the book, different coding techniques lead to different bit-rates. A given method works fine at a certain bit-rate range, but the quality of the decoded speech will drop drastically if it is decreased below a certain threshold. The minimum bit-rate that speech coders will achieve is limited by the information content of the speech signal. Judging from the recoverable message rate from a linguistic perspective for typical speech signals, it is reasonable to say that the minimum lies somewhere around 100 bps. Current coders can produce good quality at 2 kbps and above, suggesting that there is plenty of room for future improvement.
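The bit-rate categories of Table 1.1 can be expressed as a small lookup, sketched below. The table does not say which category the boundary values of exactly 2, 5, and 15 kbps belong to; assigning them to the higher category at 2 and 5 kbps and to the medium category at 15 kbps is an assumption made here, and the function name is illustrative.

```python
# Classification of a coder by the bit-rate of its encoded bit-stream,
# following Table 1.1. Boundary handling is an assumption (see lead-in).

def classify_bit_rate(kbps):
    """Return the Table 1.1 category for a bit-rate given in kbps."""
    if kbps <= 0:
        raise ValueError("bit-rate must be positive")
    if kbps > 15:
        return "High bit-rate"
    if kbps >= 5:
        return "Medium bit-rate"
    if kbps >= 2:
        return "Low bit-rate"
    return "Very low bit-rate"


for rate in (32, 8, 2.4, 0.8):
    print(rate, "kbps ->", classify_bit_rate(rate))
```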

Classification by Coding Techniques

Waveform Coders

An attempt is made to preserve the original shape of the signal waveform, and hence the resultant coders can generally be applied to any signal source. These coders are better suited for high bit-rate coding, since performance drops sharply with decreasing bit-rate. In practice, these coders work best at a bit-rate of 32 kbps and higher.

Signal-to-noise ratio (SNR, Chapter 19) can be utilized to measure the quality of waveform coders. Some examples of this class include various kinds of pulse code modulation (PCM, Chapter 6) and adaptive differential PCM (ADPCM).
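A minimal sketch of how SNR can be measured for a waveform coder follows. It uses a plain uniform quantizer as a stand-in for PCM; the test tone, its amplitude, and the quantizer design (midrise, fixed range of ±1) are assumptions chosen only to make the example self-contained.

```python
import math

# SNR of a decoded waveform against the original, with a uniform quantizer
# standing in for a simple PCM coder. All parameters are illustrative.

def snr_db(original, decoded):
    """10*log10(signal energy / error energy), in dB."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, decoded))
    return 10.0 * math.log10(signal / noise)


def uniform_quantize(x, bits, x_max=1.0):
    """Midrise uniform quantizer on [-x_max, x_max] with 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 * x_max / levels
    index = int(math.floor((x + x_max) / step))
    index = min(levels - 1, max(0, index))      # clip to the valid range
    return -x_max + (index + 0.5) * step        # reconstruct at bin center


# 100 ms of a 440 Hz tone sampled at 8 kHz
tone = [0.9 * math.sin(2.0 * math.pi * 440.0 * n / 8000.0) for n in range(800)]
snr_8 = snr_db(tone, [uniform_quantize(x, 8) for x in tone])
snr_4 = snr_db(tone, [uniform_quantize(x, 4) for x in tone])
print("8 bits: %.1f dB, 4 bits: %.1f dB" % (snr_8, snr_4))
```

The drop in SNR when the word length is halved illustrates why the quality of waveform coders degrades quickly as the bit-rate is lowered.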

Parametric Coders

Within the framework of parametric coders, the speech signal is assumed to be generated from a model, which is controlled by some parameters. During encoding, parameters of the model are estimated from the input speech signal, with the parameters transmitted as the encoded bit-stream. This type of coder makes no attempt to preserve the original shape of the waveform, and hence SNR is a useless quality measure. Perceptual quality of the decoded speech is directly related to the accuracy and sophistication of the underlying model. Due to this limitation, the coder is signal specific, having poor performance for nonspeech signals.

There are several proposed models in the literature. The most successful, however, is based on linear prediction. In this approach, the human speech production mechanism is summarized using a time-varying filter (Section 1.3), with the coefficients of the filter found using the linear prediction analysis procedure (Chapter 4). This is the only type of parametric coder considered in this book.

This class of coders works well for low bit-rate. Increasing the bit-rate normally does not translate into better quality, since it is restricted by the chosen model. Typical bit-rate is in the range of 2 to 5 kbps. Example coders of this class include linear prediction coding (LPC, Chapter 9) and mixed excitation linear prediction (MELP, Chapter 17).

Hybrid Coders

As its name implies, a hybrid coder combines the strength of a waveform coder with that of a parametric coder. Like a parametric coder, it relies on a speech production model; during encoding, parameters of the model are located. Additional parameters of the model are optimized in such a way that the decoded speech is as close as possible to the original waveform, with the closeness often measured by a perceptually weighted error signal. As in waveform coders, an attempt is made to match the original signal with the decoded signal in the time domain.

This class dominates the medium bit-rate coders, with the code-excited linear prediction (CELP, Chapter 11) algorithm and its variants the most outstanding representatives. From a technical perspective, the difference between a hybrid coder and a parametric coder is that the former attempts to quantize or represent the excitation signal to the speech production model, which is transmitted as part of the encoded bit-stream. The latter, however, achieves low bit-rate by discarding all detail information of the excitation signal; only coarse parameters are extracted.

A hybrid coder tends to behave like a waveform coder at high bit-rate, and like a parametric coder at low bit-rate, with fair to good quality at medium bit-rate.

Question 6  Origin of Speech Signals

The speech waveform is a sound pressure wave originating from controlled movements of anatomical structures making up the human speech production system. A simplified structural view is shown in Figure 1.7. Speech is basically generated as an acoustic wave that is radiated from the nostrils and the mouth when air is expelled from the lungs with the resulting flow of air perturbed by the constrictions inside the body. It is useful to interpret speech production in terms of acoustic filtering. The three main cavities of the speech production system are nasal, oral, and pharyngeal, forming the main acoustic filter. The filter is excited by the air from the lungs and is loaded at its main output by a radiation impedance associated with the lips.

Figure 1.7  Diagram of the human speech production system.

The vocal tract refers to the pharyngeal and oral cavities grouped together. The nasal tract begins at the velum and ends at the nostrils of the nose. When the velum is lowered, the nasal tract is acoustically coupled to the vocal tract to produce the nasal sounds of speech.

The form and shape of the vocal and nasal tracts change continuously with time, creating an acoustic filter with time-varying frequency response. As air from the lungs travels through the tracts, the frequency spectrum is shaped by the frequency selectivity of these tracts. The resonance frequencies of the vocal tract tube are called formant frequencies or simply formants, which depend on the shape and dimensions of the vocal tract.

Inside the larynx is one of the most important components of the speech production system: the vocal cords. The cords are located at the height of the "Adam's apple", the protrusion in the front of the neck for most adult males. Vocal cords are a pair of elastic bands of muscle and mucous membrane that open and close rapidly during speech production. The speed at which the cords open and close is unique for each individual and defines the feature and personality of the particular voice.

Modeling the Speech Production System

In general terms, a model is a simplified representation of the real world. It is designed to help us better understand the world in which we live and, ultimately, duplicate many of the behaviors and characteristics of real-life phenomena. However, it is incorrect to assume that the model and the real world that it represents are identical in every way. In order for the model to be successful, it must be able to replicate partially or completely the behaviors of the particular object or fact that it intends to capture or simulate. The model may be a physical one (e.g., a model airplane) or it may be a mathematical one, such as a formula.

The human speech production system can be modeled using a rather simple structure: the lungs, which generate the air or energy to excite the vocal tract, are represented by a white noise source. The acoustic path inside the body with all its components is associated with a time-varying filter. The concept is illustrated in Figure 1.9. This simple model is indeed the core structure of many speech coding algorithms, as can be seen later in this book. By using a system identification technique called linear prediction (Chapter 4), it is possible to estimate the parameters of the time-varying filter from the observed signal.

Figure 1.9    Correspondence between the human speech production system and a simplified system based on a time-varying filter. (Labels: lungs, trachea, pharyngeal cavity, nasal cavity, oral cavity, nostril, mouth, output speech.)

The assumption of the model is that the energy distribution of the speech signal in the frequency domain is totally due to the time-varying filter, with the lungs producing an excitation signal that is white noise with a flat spectrum. This model is rather efficient and many analytical tools have already been developed around the concept.
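The model can be tried out numerically: white noise is passed through a filter, and the filter parameters are then estimated from the output alone. The sketch below is a simplified illustration, not the procedure of Chapter 4. It assumes a fixed (not time-varying) second-order all-pole filter with made-up coefficients, and it estimates them with the autocorrelation method and the Levinson-Durbin recursion, a standard way to solve the linear prediction equations.

```python
import random

# White-noise excitation -> all-pole filter -> estimate the filter back
# from the signal. Filter order and coefficients are illustrative.

def synthesize(coeffs, n_samples, seed=0):
    """s[n] = e[n] + sum_k coeffs[k-1] * s[n-k], with e[n] white Gaussian."""
    rng = random.Random(seed)
    s = []
    for n in range(n_samples):
        past = sum(c * s[n - k]
                   for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        s.append(rng.gauss(0.0, 1.0) + past)
    return s


def autocorrelation(s, max_lag):
    """r[lag] = sum_n s[n] * s[n-lag] for lag = 0..max_lag."""
    return [sum(s[n] * s[n - lag] for n in range(lag, len(s)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order] from autocorrelation r."""
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]                       # prediction error energy
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                # reflection coefficient
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err


true_coeffs = [1.3, -0.6]            # hypothetical stable filter
signal = synthesize(true_coeffs, 20000)
estimated, residual_energy = levinson_durbin(autocorrelation(signal, 2), 2)
print("true:", true_coeffs, "estimated:", [round(c, 3) for c in estimated])
```

In a speech coder the same estimation would be repeated frame by frame, which is how the time variation of the filter is tracked.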

NOT A QUESTION. THE DIAGRAM MAY COME UP!!! General Structure of a Speech Coder

Figure 1.12 shows the generic block diagrams of a speech encoder and decoder. For the encoder, the input speech is processed and analyzed so as to extract a number of parameters representing the frame under consideration. These parameters are encoded or quantized with the binary indices sent as the compressed bit-stream (see Chapter 5 for concepts of quantization). As we can see, the indices are packed together to form the bit-stream; that is, they are placed according to a certain predetermined order and transmitted to the decoder.

Figure 1.12    General structure of a speech coder. Top: Encoder (input PCM speech to bit-stream). Bottom: Decoder (bit-stream to synthetic speech).

The speech decoder unpacks the bit-stream, where the recovered binary indices are directed to the corresponding parameter decoder so as to obtain the quantized parameters. These decoded parameters are combined and processed to generate the synthetic speech.
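The packing and unpacking steps can be sketched as follows. The bit allocation (three parameters with 7, 5, and 1 bits) and the index values are hypothetical, and the bit-stream is held as a string of '0'/'1' characters purely for readability; a real coder would pack into bytes.

```python
# Packing quantizer indices into a bit-stream in a predetermined order and
# unpacking them at the decoder. Allocation and values are hypothetical.

def pack(indices, widths):
    """Concatenate the indices, each written with its allotted bit width."""
    bits = ""
    for index, width in zip(indices, widths):
        if not 0 <= index < (1 << width):
            raise ValueError("index does not fit in its bit allocation")
        bits += format(index, "0{}b".format(width))
    return bits


def unpack(bits, widths):
    """Recover the indices by reading the fields in the same order."""
    indices, pos = [], 0
    for width in widths:
        indices.append(int(bits[pos:pos + width], 2))
        pos += width
    return indices


widths = [7, 5, 1]            # bits per parameter for one frame
frame_indices = [93, 17, 1]   # indices produced by the parameter encoders
bitstream = pack(frame_indices, widths)
recovered = unpack(bitstream, widths)
print(bitstream, recovered)
```

Because encoder and decoder share the same `widths` table, no field markers are needed in the bit-stream; the order alone identifies each parameter.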

Block diagrams similar to those in Figure 1.12 will be encountered many times in later chapters. It is the responsibility of the algorithm designer to decide the functionality and features of the various processing, analysis, and quantization blocks. These choices will determine the performance and characteristics of the speech coder.

Question 7  Structure of the Human Auditory System

A simplified diagram of the human auditory system appears in Figure 1.13. The pinna (or informally the ear) is the surface surrounding the canal into which sound is funneled. Sound waves are guided by the canal toward the eardrum, a membrane that acts as an acoustic-to-mechanical transducer. The sound waves are then translated into mechanical vibrations that are passed to the cochlea through a series of bones known as the ossicles. The presence of the ossicles improves sound propagation by reducing the amount of reflection, which is accomplished by the principle of impedance matching.

The cochlea is a rigid snail-shaped organ filled with fluid. Mechanical oscillations impinging on the ossicles cause an internal membrane, known as the basilar membrane, to vibrate at various frequencies. The basilar membrane is characterized by a set of frequency responses at different points along the membrane; a simple modeling technique is to use a bank of filters to describe its behavior. Motion along the basilar membrane is sensed by the inner hair cells and causes neural activities that are transmitted to the brain through the auditory nerve.

The different points along the basilar membrane react differently depending on the frequencies of the incoming sound waves. Thus, hair cells located at different positions along the membrane are excited by sounds of different frequencies. The neurons that contact the hair cells and transmit the excitation to higher auditory centers maintain the frequency specificity. Due to this arrangement, the human auditory system behaves very much like a frequency analyzer, and system characterization is simpler if done in the frequency domain.

Figure 1.13    Diagram of the human auditory system.


Figure 1.14    A typical absolute threshold curve (AT(f) versus frequency f).

Question 8  Absolute Threshold

The absolute threshold of a sound is the minimum detectable level of that sound in the absence of any other external sounds. That is, it characterizes the amount of energy needed in a pure tone such that it can be detected by a listener in a noiseless environment. Figure 1.14 shows a typical absolute threshold curve, where the horizontal axis is frequency measured in hertz (Hz), while the vertical axis is the absolute threshold in decibels (dB), relative to a reference intensity of 10^-12 watts per square meter, a standard quantity for sound intensity measurement.

Note that the absolute threshold curve, as shown in Figure 1.14, reflects only the average behavior; the actual shape varies from person to person and is measured by presenting a tone of a certain frequency to a subject, with the intensity being tuned until the subject no longer perceives its presence. By repeating the measurements using a large number of frequency values, the absolute threshold curve results.

As we can see, human beings tend to be more sensitive toward frequencies in the range of 1 to 4 kHz, while thresholds increase rapidly at very high and very low frequencies. It is commonly accepted that below 20 Hz and above 20 kHz, the auditory system is essentially dysfunctional. These characteristics are due to the structures of the human auditory system: acoustic selectivity of the pinna and canal, mechanical properties of the eardrum and ossicles, elasticity of the basilar membrane, and so on.

We can take advantage of the absolute threshold curve in speech coder design. Some approaches are the following:

  •  Any signal with an intensity below the absolute threshold need not be considered, since it does not have any impact on the final quality of the coder.
  •  More resources should be allocated for the representation of signals within the most sensitive frequency range, roughly from 1 to 4 kHz, since distortions in this range are more noticeable.
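The first approach can be sketched with an analytic approximation of the threshold curve. The formula below is a widely used approximation of the threshold in quiet attributed to Terhardt, giving the level in dB SPL; it is brought in here as an assumption and is not taken from the text above, and the function names are illustrative.

```python
import math

# Approximate absolute threshold of hearing and a simple audibility check.
# The closed-form curve is an assumed approximation (see lead-in).

def absolute_threshold_db(f_hz):
    """Approximate threshold in quiet, in dB SPL, for a tone at f_hz."""
    khz = f_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


def is_audible(level_db, f_hz):
    """True if a tone at this level lies above the absolute threshold."""
    return level_db > absolute_threshold_db(f_hz)


for f in (100, 1000, 3300, 15000):
    print(f, "Hz:", round(absolute_threshold_db(f), 1), "dB")

# A 10 dB tone is above threshold near 3 kHz but below it at 100 Hz,
# so a coder could ignore the latter component.
print(is_audible(10.0, 3000), is_audible(10.0, 100))
```

The curve is lowest in the region of a few kilohertz and rises steeply toward both ends of the audible range, in line with the sensitivity range described above.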

Masking

Masking refers to the phenomenon [...]

[...] filter to weight the error spectrum during encoding; frequency response of the filter is time-varying and depends on the original spectrum of the input signal. The mechanism is highly efficient and is widely applied in practice.

Phase Perception

Modern speech coding technologies rely heavily on the application of perceptual characteristics of the human auditory system in various aspects of a quantizer's design and general architecture. In most cases, however, the focus on perception is largely confined to [...]

[...] chiefly determined by the magnitude spectrum. This latter example was already described in the last section for the design of a rudimentary coder and is the foundation of some early speech coders, such as the linear prediction coding (LPC) algorithm, [...]

Question 9  SPEECH CODING STANDARDS

This book focuses mainly on the study of the foundation and historical evolution of many standardized coders. As a matter of principle, a technique is included only if it is part of some standard. Standards exist because [...] influential and successful ideas in this field of knowledge. Otherwise, we would have to spend an enormous amount of effort to deal with the endless papers, reports, and propositions in the literature; many of these might be immature, incomplete, or, in some [...]

  •  [...]
  •  [...] specific applications. It is part of the American National Standards Institute (ANSI). The TIA has successfully developed standards for North American digital cellular telephony, including time division multiple access (TDMA) and code division multiple access (CDMA) systems.
  •  European Telecommunications Standards Institute (ETSI). The ETSI has memberships from European countries and companies and is mainly an organization of equipment manufacturers. ETSI is organized by application; the most influential group [...]
  •  [...]
  •  [...]

?

?

???????????????????????????????????

?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

?


?

??????????????????????????????????????????????????????

?

?

?

?

?

?

?

?

?

?

?

?

?

?

?

?

?

?

?

?

?

?

????????is described only partially.

b Coder is fully explained.

c Coder is mentioned only briefly without detailed technical descriptions.

However, the major achievements in speech coding for the past thirty years are well represented by the coders on the list.

It is important to mention that the philosophy of this book is to explain the whys and hows of a specific algorithm; most importantly, to justify the selection of a par- ticular technique for an application. Each standardized coder tends to have its own idiosyncrasies and minute operational tricks that might not be important for the understanding of the foundation of the algorithm and hence are often omitted. In order to develop a bit-stream compatible version of the coder, consultation with official docume????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

?


?

?

?

??????????????????????????????????????????????????????????????????????

?

?

?

???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????ous conditions and, in many instances, it is difficult to establish a fair comparison. The data are compiled from various sources and give a rough idea of relative performance among the dif- ferent coders. Delay is reflected by the height of a particular quality/bit-rate coor- dinate and refers to the encoder buffering delay.

Finally, the fact that certain proposed techniques have not become part of a standard does not mean that they are worthless. Sometimes there is a need for refinement; in other instances t????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

?


?

?

?

?

Question 10   PITCH PERIOD ESTIMATION

One of the most important parameters in speech analysis, synthesis, and coding applications is the fundamental frequency, or pitch, of voiced speech. Pitch frequency is directly related to the speaker and sets the unique characteristic of a person. Voicing is generated when the airflow from the lungs is periodically interrupted by movements of the vocal cords. The time between successive vocal cord openings is called the fundamental period, or pitch period.

For men, the??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????


??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

?

?

???????????????????????????

??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

?

?

??

????????????

??????????????

?????????????????????

?

?

reflects the similarity between the frame $s[n]$, $n = m-N+1$ to $m$, and its time-shifted version $s[n-l]$, where $l$ is a positive integer representing a time lag. The range of lag is selected so that it covers a wide range of pitch period values.

??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

????????????????????????????????????????????????????????????????????????????????????????????????

?????????????????????????????????????????????????????????????????????????????????????????find the value of lag associated with the highest autocorrelation representing the pitch period estimate, since, in theory, autocorrelation is maximized when the lag is equal to the pitch period. The method is summarized with the following pseudocode:
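The search just described can be sketched in Python. This is a minimal illustration rather than the source's own pseudocode: it assumes the autocorrelation is the frame sum of $s[n]\,s[n-l]$ over $n = m-N+1, \ldots, m$, and the names `autocorr_pitch`, `lag_min`, and `lag_max` are hypothetical.

```python
import math


def autocorr_pitch(s, m, N, lag_min, lag_max):
    """Return the lag in [lag_min, lag_max] with the highest autocorrelation.

    Assumed definition: R[l, m] = sum of s[n] * s[n - l] for n = m-N+1 .. m.
    """
    if m - N + 1 - lag_max < 0:
        raise ValueError("not enough past samples for the largest lag")
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        r = sum(s[n] * s[n - lag] for n in range(m - N + 1, m + 1))
        if r > best_val:
            best_lag, best_val = lag, r
    return best_lag


# Usage: a synthetic sinusoid with a period of 50 samples.
signal = [math.sin(2 * math.pi * n / 50) for n in range(400)]
print(autocorr_pitch(signal, m=399, N=200, lag_min=20, lag_max=80))
```

The lag range must exclude multiples of the true period other than the one sought; otherwise those lags can tie with it.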

It is important to mention that, in practice, the speech signal is often lowpass filtered before being used as input for pitch period estimation. Since the fundamental frequency associated with voicing is located in the low-frequency region (<500 Hz), lowpass fi????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

?

?

?

?

?

??????????????????????????????

??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????defined by

$$\mathrm{MDF}[l, m] = \sum_{n=m-N+1}^{m} \left| s[n] - s[n-l] \right|. \tag{2.2}$$


For short segments of voiced speech it is reasonable to expect that s[n] s[n l] is small for l ¼ 0, T T, T 2T, ... , with T being the signal’s period. Thus, by computing the magnitude??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

??????????????????????????????????????????????????????????????????????????????????????????????????????
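Equation (2.2) suggests a search that mirrors the autocorrelation method, with a minimum in place of a maximum. The Python sketch below is an illustration under that assumption; the function names and the lag bounds are hypothetical, and lag 0 must be excluded because the difference is trivially zero there.

```python
import math


def mdf(s, m, N, lag):
    """Magnitude difference function of (2.2) for one lag."""
    return sum(abs(s[n] - s[n - lag]) for n in range(m - N + 1, m + 1))


def mdf_pitch(s, m, N, lag_min, lag_max):
    """Return the lag in [lag_min, lag_max] that minimizes the MDF."""
    if lag_min < 1 or m - N + 1 - lag_max < 0:
        raise ValueError("invalid lag range for this frame")
    return min(range(lag_min, lag_max + 1), key=lambda lag: mdf(s, m, N, lag))


# Usage: the same kind of synthetic sinusoid, period 50 samples.
signal = [math.sin(2 * math.pi * n / 50) for n in range(400)]
print(mdf_pitch(signal, m=399, N=200, lag_min=20, lag_max=80))
```

Only additions and absolute values are needed per lag, which is the usual motivation for preferring this measure when multiplications are expensive.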

?

?

????????????????????????

???????????????????????????????????????find integer-valued pitch periods. That is, the resultant period values are multiples of the sampling period $(8\ \mathrm{kHz})^{-1} = 0.125$ ms. In many applications, higher resolution is necessary to achieve good performance. In fact, pitch period of the origi???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

???????????????????????????????????????????????????????????????????????????????????????????????????????????fixed sampling rate. Interpolation, for instance, is a widely used method, where the actual sampling rate is increased. Medan, Yair, and Chazan (1991) published an algorithm for pitch period determination, which is based on a simple linear interpolation technique. The method allows the finding of a real-valued pitch period and can be implemented efficiently in practice. This method is explained in detail as follows.

Optimal Integer-Valued Pitch Period

Figure 2.4    Signal frames in pitch period estimation.

Consider a speech frame that ends at time instant $n = m$, with a length of $N$ (Figure 2.4). The frame can be expressed by

$$s[n] = b \cdot s[n-N] + e[n], \qquad m-N+1 \le n \le m. \tag{2.3}$$

The above equation expresses $\{s[n],\ m-N+1 \le n \le m\}$ as the sum of the product of a coefficient $b$ with the frame $\{s[n-N],\ m-2N+1 \le n \le m-N\}$ and the error signal* $e[n]$. Note from Figure 2.4 that two consecutive frames of length $N$ are involved. The optimal pitch period at time instant $n = m$ can be defined as the??????????????????????????????????????????????????????????????????????????????????????????

?

??

?????????????????????

?????????????????

?

???????????

???

??

??????????????

???????

?????????

?

?

??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

??????????????????????????????????????????????????????

???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

?

??

???????????????

?

?????????????????

????

?????????

?

??

??????????????

???????????

?

?

?

?

?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????ed as the product of the past with a coefficient. Chapter 4 contains further details on the topic.


Substituting (2.6) in (2.4) and manipulating yields

$$J[m, N] = 1 - \frac{\left( \sum_{n=m-N+1}^{m} s[n]\, s[n-N] \right)^{2}}{\sum_{n=m-N+1}^{m} s^{2}[n-N] \; \sum_{n=m-N+1}^{m} s^{2}[n]}. \tag{2.7}$$
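A short Python sketch can show how (2.7) might drive an integer-valued pitch search. Several points are assumptions, because the surrounding definitions are incomplete here: the pitch period is taken as the frame length $N$ that minimizes $J[m, N]$, and candidates whose cross term is not positive are assigned $J = 1$ (equivalent to restricting the coefficient $b$ to nonnegative values). The function names and search bounds are hypothetical.

```python
import math


def normalized_error(s, m, N):
    """J[m, N] of (2.7); returns 1.0 when the frames are not positively correlated."""
    frame = range(m - N + 1, m + 1)
    cross = sum(s[n] * s[n - N] for n in frame)
    past_energy = sum(s[n - N] ** 2 for n in frame)
    cur_energy = sum(s[n] ** 2 for n in frame)
    # Assumption: a non-positive cross term means no useful prediction.
    if cross <= 0.0 or past_energy == 0.0 or cur_energy == 0.0:
        return 1.0
    return 1.0 - cross * cross / (past_energy * cur_energy)


def integer_pitch(s, m, n_min, n_max):
    """Return the candidate N in [n_min, n_max] with the smallest J[m, N]."""
    if m - 2 * n_max + 1 < 0:
        raise ValueError("need two full frames of length n_max before m")
    return min(range(n_min, n_max + 1), key=lambda N: normalized_error(s, m, N))


# Usage: fundamental plus second harmonic, period 50 samples.
signal = [math.sin(2 * math.pi * n / 50) + 0.5 * math.sin(4 * math.pi * n / 50)
          for n in range(400)]
print(integer_pitch(signal, m=399, n_min=20, n_max=80))
```

Because the frame length equals the candidate period, each candidate compares exactly two consecutive frames of length $N$, as in Figure 2.4.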


Question 11 LINEAR PREDICTION

Linear prediction (LP) forms an integral part of almost all modern-day speech coding algorithms. The fundamental idea is that a speech sample can be approximated as a linear combination of past samples. Within a signal frame, the weights used to compute the linear combination are found by minimizing the mean-squared prediction error; the resultant weights, or linear prediction coefficients (LPCs*), are used to represent the particular frame.

Within the core of the LP scheme lies the autoregressive model (Chapter 3). Indeed, linear prediction analysis is an estimation procedure to find the AR para- meters, given samples of the signal. Thus, LP is an identification technique wher?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????define the PSD of the signal itself (Chapter 3). By computing the LPCs of a signal frame, it is possible to generate another signal in such a way that the spectral contents are close to the original one.

LP can also be viewed as a redundancy removal procedu?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

?

?

?

??????????????????????????????????????????????????????????????????????????????????????????????????????

??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????efficient procedures, namely, the Levinson– Durbin algorithm and the Leroux–Gueguen algorithm, are explained. The concept of long-term linear prediction is described, followed by some LP-based speech synthesis models. Practical issues related to speech processing are explained, with an alternative prediction scheme based on the moving average (MA) model given at the end of the chapter. LP is by no means confined to the speech processing arena; in fact, it is widely applied to many diverse areas. Readers are encouraged to consult other sources for additional information on the topic.

THE PROBLEM OF LINEAR PREDICTION

Here, linear prediction is described as a system identification problem, where the parameters of an AR model are estimated from the signal itself. The situation is illustrated in Figure 4.1. The white noise signal $x[n]$ is filtered by the AR process synthesizer to obtain the AR signal $s[n]$, with the AR parameters denoted by $\hat{a}_i$. A linear predictor is used to predict $s[n]$ based on the $M$ past samples; this is done with

$$\hat{s}[n] = -\sum_{i=1}^{M} a_i\, s[n-i], \tag{4.1}$$

where the $a_i$ are the estimates of the AR parameters and are referred to as the linear prediction coefficients (LPCs)*. The constant $M$ is known as the prediction order. Therefore, prediction is based on a linear combination of the $M$ past samples of the signal, and hence the prediction is linear. The prediction error is equal to

$$e[n] = s[n] - \hat{s}[n]. \tag{4.2}$$

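[Figure: white noise $x[n]$ drives the AR process synthesizer $1 / (1 + \sum_{i=1}^{M} \hat{a}_i z^{-i})$ (the signal source) to produce the AR signal $s[n]$; the predictor $-\sum_{i=1}^{M} a_i z^{-i}$ produces the predicted signal $\hat{s}[n]$, and their difference is the prediction error $e[n]$.]

Figure 4.1  Linear prediction as system identification.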

*In some literature, the sign convention for the LPC is reversed.


Figure 4.2    The prediction-error filter.

That is, it is the difference between the actual sample and the predicted one. Figure 4.2 shows the signal flow graph implementation of (4.2) and is known as the prediction-error filter: it takes an AR signal as input to produce the prediction-error signal at its output.
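The prediction-error filter can be illustrated with a few lines of Python. This is a sketch under stated assumptions: samples before the start of the signal are treated as zero, and the function name is hypothetical. It follows the sign convention of the predictor above, so that $e[n] = s[n] + \sum_{i=1}^{M} a_i s[n-i]$.

```python
def prediction_error(s, a):
    """Pass s through the prediction-error filter with LPCs a = [a_1, ..., a_M]."""
    M = len(a)
    e = []
    for n in range(len(s)):
        # Predicted sample: minus the weighted sum of the M past samples.
        predicted = -sum(a[i - 1] * s[n - i] for i in range(1, M + 1) if n - i >= 0)
        e.append(s[n] - predicted)
    return e


# Usage: a first-order AR signal s[n] = 0.9 s[n-1] + x[n], i.e. AR parameter -0.9.
x = [1.0, 0.0, 0.5, -0.25]
s = []
for n, xn in enumerate(x):
    s.append(xn + (0.9 * s[n - 1] if n > 0 else 0.0))
print(prediction_error(s, [-0.9]))  # recovers x up to rounding
```

When the LPC equals the AR parameter, the output reproduces the excitation, which is the situation the system identification view aims for.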

Question 12 Error Minimization

The system identification problem consists of the estimation of the AR parameters $\hat{a}_i$ from $s[n]$, with the estimates being the LPCs. To perform the estimation, a ????????????????????????????????????????????????????????????????????????????????????????

?

?

???

??????????????????

??

??

??

???????????????????

????

?????

??

??

??

??????

?

?

?

???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

?

?

?????????

????

??????????

???

??????????

????

?????

????????????????

???????

?

?

????????????????????????????????????????????????????????????????????????????????????????satisfied, then $a_i = \hat{a}_i$; that is, the LPCs are equal to the AR parameters. Justification of this claim appears at the end of the section. Thus, when the LPCs are found, the system used to generate the AR signal (AR process synthesizer) is uniquely identifi????

?

?

????????????????

?????????????????????????????????????????

?

??

????????????????????????????????????????????????????

????

?

?

???

?

?

?

?

?????????????????????????????

??

?

??

?????????????????????????????

????

?

??????????????????????????????????????

??????????????????????????????

???????????????defines the optimal LPCs in terms of the autocorrelation $R_s[l]$ of the signal $s[n]$. In matrix form, it can be written as

$$\mathbf{R}_s \mathbf{a} = -\mathbf{r}_s, \tag{4.9}$$

where

$$\mathbf{R}_s = \begin{pmatrix} R_s[0] & R_s[1] & \cdots & R_s[M-1] \\ R_s[1] & R_s[0] & \cdots & R_s[M-2] \\ \vdots & \vdots & \ddots & \vdots \\ R_s[M-1] & R_s[M-2] & \cdots & R_s[0] \end{pmatrix}, \tag{4.10}$$

?????????????????????????????

?????????????????????????????????????????

???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

?

????????????????????

???????????????????????????finding of the LPCs if the autocorrelation values of $s[n]$ are known from $l = 0$ to $M$.
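As an illustration, the Python sketch below solves the normal equation (4.9) from the autocorrelation values $R_s[0], \ldots, R_s[M]$. It uses the Levinson-Durbin recursion, which is one possible solver and not necessarily the method intended at this point in the text. It assumes $\mathbf{a} = [a_1, \ldots, a_M]^T$ and $\mathbf{r}_s = [R_s[1], \ldots, R_s[M]]^T$, and the function name is hypothetical.

```python
def levinson_durbin(R, M):
    """Solve sum_i a_i R[|i-k|] = -R[k], k = 1..M, for the LPCs a_1..a_M.

    R holds autocorrelation values R[0]..R[M]. Returns (a, E), where E is the
    minimum prediction-error energy for order M.
    """
    if len(R) < M + 1 or R[0] <= 0.0:
        raise ValueError("need R[0..M] with R[0] > 0")
    a = []          # a[i-1] holds a_i for the current order
    E = R[0]        # zero-order prediction error
    for order in range(1, M + 1):
        acc = R[order] + sum(a[i - 1] * R[order - i] for i in range(1, order))
        k = -acc / E                         # reflection coefficient
        a = [a[i - 1] + k * a[order - i - 1] for i in range(1, order)] + [k]
        E *= 1.0 - k * k
    return a, E


# Usage: autocorrelation of a first-order AR process with R[l] = 0.9 ** l.
lpcs, err = levinson_durbin([1.0, 0.9, 0.81], M=2)
print(lpcs, err)
```

The recursion exploits the Toeplitz structure of (4.10), so it needs on the order of $M^2$ operations instead of the $M^3$ of general matrix inversion.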


Prediction Gain

The prediction gain of a predictor is given by

$$PG = 10 \log_{10} \left( \frac{\sigma_s^2}{\sigma_e^2} \right) = 10 \log_{10} \left( \frac{E\{s^2[n]\}}{E\{e^2[n]\}} \right)$$

?

?

??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

?

?

?

?

POSSIBLE ADDITION TO QUESTION 12: Minimum Mean-Squared Prediction Error

From Figure 4.1 we can see that when $a_i = \hat{a}_i$, $e[n] = x[n]$; that is, the prediction error is the same as the white noise used to generate the AR signal $s[n]$. Indeed, this is the optim?????????????????????????????????????????????????????????????

????????????????????????????????????????

?

???????????????????????????????????????????????????

??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

??????????????????????????????????????????????????????????xcitation signal of the AR process synthesizer. This is a reasonable result since the best that the prediction-error filter can do is to "whiten" the AR signal $s[n]$. Thus, the maximum prediction gain is given by the ratio between the variance of $s[n]$ a?????????????????????????????????????

????????????????????????????????????????????????????????????????????????????????????

?

??

?????????????????????????????????????

????

?

??????????????????????????????????????????????????????????????????????????????????????????????

?

?

??????????????????

??????????

???????

?

???????????????

?

????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

?

????

?

???????????????????????

?????????????????????

????????

?

?

??????????????

?????

Question 13/14 Prediction Schemes

Different prediction schemes are used in various applications and are decided by system requirements. Generally, two main techniques are applied in speech coding: internal prediction and external prediction. Figure 4.3??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

?

?

?

?

???????????????????????????????????

??????????????????????????????????????????????????????????

?

?

?

????

??- N m m

 

The LPCs derived from the

estimated autocorrelation values are used to predict the signal samples within the same interval.

 Interval where the

derived LPCs from the estimated autocorrelat???????????????????????????????????????????????????


?

???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????finite-length window is used to extract the signal samples.

External prediction is prevalently used in those applications where low coding delay is the prime concern. In that case, a much shorter frame must be used (on the order of 20 samples, such as the L????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????coefficients are combined in a specific way and applied to a given interval for the prediction task. We skip the details for now, which are covered thoroughly in Chapter 8, when interpolation of LPCs is introduced.

Prediction Gain

Prediction gain is given here using a similar definition as presented in the last section, with the expectations changed to summations:

$$PG[m] = 10 \log_{10} \left( \frac{\sum_{n=m-N+1}^{m} s^2[n]}{\sum_{n=m-N+1}^{m} e^2[n]} \right), \tag{4.23}$$

where

$$e[n] = s[n] - \hat{s}[n] = s[n] + \sum_{i=1}^{M} a_i[m]\, s[n-i], \qquad n = m-N+1, \ldots, m. \tag{4.24}$$


The LPCs $a_i[m]$ are found from the samples inside the interval $[m-N+1, m]$ for internal prediction, and from samples with $n < m-N+1$ for external prediction. Note that the prediction gain defined in (4.23) is a function of the time variable $m$. In practice, the average performance of a prediction scheme is often measured by the segmental prediction gain, defined with

$$SPG = A\{PG[m]\}, \tag{4.25}$$

which is the time average of the prediction gain for each frame in the decibel domain.
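The two measures can be sketched in Python as follows. The sketch assumes non-overlapping frames of length $N$ and does not handle frames with zero energy (the logarithm is undefined there); the function names are hypothetical.

```python
import math


def prediction_gain_db(s, e, m, N):
    """PG[m] of (4.23) for the frame n = m-N+1 .. m, in decibels."""
    frame = range(m - N + 1, m + 1)
    signal_energy = sum(s[n] ** 2 for n in frame)
    error_energy = sum(e[n] ** 2 for n in frame)
    return 10.0 * math.log10(signal_energy / error_energy)


def segmental_prediction_gain(s, e, N):
    """SPG of (4.25): average of PG[m] over consecutive frames of length N."""
    gains = [prediction_gain_db(s, e, m, N) for m in range(N - 1, len(s), N)]
    return sum(gains) / len(gains)


# Usage: the error has one quarter of the signal energy in every frame.
print(segmental_prediction_gain([2.0] * 8, [1.0] * 8, N=4))
```

Averaging in the decibel domain keeps a few high-energy frames from dominating the result, which is why the segmental figure differs from a gain computed over the whole signal at once.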

Example 4.2 White noise is generated using a random number generator with uniform distribution and unit variance. This signal is then filtered by an AR synthesizer with

???????????????

?

?

??????????????

????????????????

?

?

?????????????

??????

????????????????

???

????????????????

??????

????????????

????

?????????????????????

??????

??????????????????????

???

?????????????????????

??????

??????????????????

???????????????

????????????????

??????

???????????????

?????

????????????????????

?

??????

?

????????????????????

?

????

???????????????????????????????????????????????

??????

??????????????????

????

?????????????????????????

??????

????????????

????

?????????????????????

??????

????????????????????

???

????????????????

??????

??????????????????

???????????????

????????????????????

?

?

?

???????????????????????????

??????

???????????????????????

?????????

???????????????????????????

?

??????

?

????????????

??????

?????????????????????

??

????????????????

??????

???????????????????

?????

????????????????

??????

????????????????

????

????????????????????

?

??????

?

????????

?

????

????????????????????????????????????????????????

??????

???????????????

??????????????????

????????????????

?

?

??????????????????

??????????????????

?

?

???????????

?

?

?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

?

???

?

?

?

?

?

?

?

??????????

?

?

?

?

?

?

?

??

????????

??

?

???????????????????????????????????????????????????????????????????????????????????????????????

?


?

???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????; thus, the general behavior of the linear predictor is not fully revealed. For a more accurate study on the behavior of the signal, a higher number

of sample realizations for the random signal are needed.

Figure 4.5 compares the theoretical PSD (defined with the original AR parameters) with the spectrum estimates found with the LPCs computed from the signal frame using $M = 2$, 10, and 20. For low prediction order, the resultant spectrum is not capable of fitting the original PSD. An excessively high order, on the other hand, leads to overfitting, where undesirable errors are introduced. In the present case, a prediction order of 10 is optimal. Note how the spectrum of the original signal is captured by the estimated LPCs. This is the reason why LP analysis is known as a spectrum estimation technique, specifically a parametric spectrum estimation method, since the process is done through a set of parameters or coefficients.
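To make the idea of a parametric spectrum estimate concrete, the Python sketch below evaluates the spectrum implied by a set of LPCs. It assumes the usual AR model form, in which the spectrum is $g^2 / |1 + \sum_{i=1}^{M} a_i e^{-j\omega i}|^2$; that expression, the scale factor `gain`, and the function name are assumptions for illustration, since the source defers the PSD definition to Chapter 3.

```python
import cmath
import math


def lp_spectrum(a, gain=1.0, num_points=256):
    """Spectrum implied by LPCs a = [a_1, ..., a_M] on num_points frequencies in [0, pi)."""
    spectrum = []
    for k in range(num_points):
        w = math.pi * k / num_points
        # A(e^{jw}) = 1 + sum_i a_i e^{-jwi}
        A = 1.0 + sum(a_i * cmath.exp(-1j * w * (i + 1)) for i, a_i in enumerate(a))
        spectrum.append(gain ** 2 / abs(A) ** 2)
    return spectrum


# Usage: a single LPC of -0.9 gives a lowpass-shaped spectrum.
spec = lp_spectrum([-0.9], gain=1.0, num_points=4)
print([round(v, 3) for v in spec])
```

Only the $M$ coefficients and one gain value are needed to describe the whole curve, which is what distinguishes this approach from nonparametric estimates such as the periodogram.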

Question 15 LONG-TERM LINEAR PREDICTION

Experiments in Section 4.3 using real speech dat?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????fit is indicated by the remaining periodic compo- nent. By increasing the prediction order to include one pitch period, the periodicity in the prediction error has largely disappeared, leading to a rise in prediction gain. High prediction order leads to excessive bit-rate and implementational cost since more bits are required to represent the LPCs, and extra computation is needed dur- ing analysis. Thus, it is desirable to come up with a scheme that is simple and yet able to model the signal with sufficient accuracy.

An important observation is derived from the experimental results of Section 4.3 (Figure 4.9). The increase in prediction gain is due mainly to the first 8 to 10 coefficients, plus the coefficient at the pitch period, equal to 49 in that particular case. The LPCs at orders between 11 and 48 and at orders greater than 49 provide essentially no contribution toward improving the prediction gain. This can be seen from the flat segments from 10 to 49, and beyond 50. Therefore, in principle, the coefficients that do not contribute toward elevating the prediction gain can be eliminated, leading to a more compact and efficient scheme. This is exactly the idea of long-term linear prediction, where a short-term predictor is connected in cascade with a long-term predictor, as shown in Figure 4.16. The short-term predictor is basically the one we have studied so far, with a relatively low prediction order $M$ in the range of 8 to 12. This predictor eliminates the correlation between nearby samples, or is short-term in the temporal sense. The long-term predictor, on the other hand, targets correlation between samples one pitch period apart.

[Figure: the input $s[n]$ passes through the short-term predictor $-\sum_{i=1}^{M} a_i z^{-i}$ to give $e_s[n]$, which passes through the long-term predictor $-b z^{-T}$ to give $e[n]$.]

Figure 4.16   Short-term prediction-error filter connected in cascade to a long-term prediction-error filter.

The long-term prediction-error filter with input $e_s[n]$ and output $e[n]$ has system function

$$H(z) = 1 + b z^{-T}. \tag{4.81}$$

Note that two parameters are required to specify the filter: pitch period $T$ and long-term gain $b$ (also known as long-term LPC or pitch gain). The procedure to determine $b$ and $T$ is referred to as long-term LP analysis. The positions of the predictors in Figure 4.16 can actually be exchanged. However, it was found experimentally that the configuration shown achieves, on average, a higher prediction gain [Ramachandran and Kabal, 1989]. Thus, it is adopted by most speech coding applications.

Long-Term LP Analysis

A long-term predictor predicts the current signal sample from a past sample that is one or more pitch periods apart. Using the notation of Figure 4.16,

$$\hat{e}_s[n] = -b\, e_s[n - T], \qquad (4.82)$$

where b is the long-term LPC, while T is the pitch period or lag. Within a given time interval of interest, we seek to find [...]

[...] find the optimal T. Substituting (4.84) back into (4.83) leads to

$$J = \sum_n e_s^2[n] - \frac{\left( \sum_n e_s[n]\, e_s[n - T] \right)^2}{\sum_n e_s^2[n - T]}. \qquad (4.85)$$

The parameters $T_{\min}$ and $T_{\max}$ in Line 2 define the search range within which the pitch period is determined. The reader must be a[...]

[...] filter is found to be 0.712 dB.
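The search for the pitch period over the range $T_{\min}$ to $T_{\max}$ can be sketched in Python as follows. This is only an illustration under stated assumptions: the gain for each candidate lag is taken as the least-squares value $b = -\sum_n e_s[n] e_s[n-T] / \sum_n e_s^2[n-T]$ implied by (4.82) and (4.85), the lag with the smallest error $J$ is kept, and the function name and argument layout are hypothetical.

```python
def long_term_lp_analysis(es, start, length, t_min, t_max):
    """Exhaustive search for (T, b, J) over the interval [start, start + length).

    es is the short-term prediction error; start must be at least t_max so that
    es[n - T] is always a valid past sample.
    """
    if start < t_max or start + length > len(es):
        raise ValueError("interval must leave t_max past samples and fit in es")
    best = None
    for T in range(t_min, t_max + 1):
        num = 0.0     # sum of es[n] * es[n - T]
        den = 0.0     # sum of es[n - T]^2
        energy = 0.0  # sum of es[n]^2
        for n in range(start, start + length):
            num += es[n] * es[n - T]
            den += es[n - T] * es[n - T]
            energy += es[n] * es[n]
        if den == 0.0:
            continue  # no past energy at this lag, gain undefined
        b = -num / den               # least-squares long-term gain for this lag
        J = energy - num * num / den  # resulting squared error, as in (4.85)
        if best is None or J < best[2]:
            best = (T, b, J)
    return best
```

For an exactly periodic input the search returns the period as the lag, a gain of -1, and zero error; on real speech the error stays positive and the gain magnitude is typically below one.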

The Frame/Subframe Structure

Results of Example 4.6 show that the effectiveness of the long-term predictor in removing long-term correlation is limited. In fact, the overall prediction-error sequence is very much like the [...]

$a_1 = 1.534$, $a_2 = 1$, $a_3 = 0.587$, $a_4 = 0.347$, $a_5 = 0.08$,
$a_6 = -0.061$, $a_7 = -0.172$, $a_8 = -0.156$, $a_9 = -0.157$, $a_{10} = -0.141$

[...] final prediction-error sequence. Compared to the outcome of Example 4.6 (Figure 4.18), it is clear that in the present case the sequence is "whiter," with lower amplitude and periodicity largely removed. A prediction gain of 2.26 dB is registered, which is a substantial improvement with respect to the 0.712 dB obtained in Example 4.6.

Question … SYNTHESIS FILTERS

So far we have focused on analyzing the signal with the purpose of identifying the parameters of a system, based on the assumption that the system itself satisfies the AR constraint. The identification process is done by minimizing the prediction error. If the prediction error is "white" enough, we know that the estimated system is a good fit; therefore, it can be used to synthesize signals having statistical properties similar to those of the original one. In fact, by exciting the synthesis filter with the system function

$$H(z) = \frac{1}{1 + \sum_{i=1}^{M} a_i z^{-i}} \qquad (4.87)$$

using a white noise signal, the filter's output will have a PSD close to that of the original signal as long as the prediction order M is adequate. In (4.87), the $a_i$ are the LPCs found from the original signal. Figure 4.23 shows the block diagram of the synthesis filter, where a unit-variance white noise is generated, scaled by the gain g, and input to the synthesis filter to generate the synthesized speech at the output. Since $x[n]$ has unit variance, $g\,x[n]$ has variance equal to $g^2$. From (4.16) we can readily write


[Block diagram: unit-variance white noise $x[n]$ → gain g → synthesis filter → synthesized speech]

Figure 4.23 The synthesis filter.

Thus, the gain can be found by knowing the LPCs and the autocorrelation values of the original signal. In (4.88), γ is a scaling constant. A scaling constant is needed because the autocorrelation values are normally estimated u[...]

[...] noise with uniform distribution and unit variance is used as input to the synthesis filter. The gain g is found from (4.88) with γ = 1.3. The synthesized speech and periodogram are displayed in Figure 4.24. Compared to the original signal (Figures 4.6 and [...]

[...] inefficient. Thus, many LP-based speech coding algorithms rely on a prediction order between 8 and 12, with order ten being the most widely employed. Since this low prediction order is not sufficient to recreate the PSD for voiced signals, a non-white-noise excitation is utilized as input to the synthesis filter. The choice of excitation is a trade-off [...]

[Block diagram: $x[n]$, unit-variance white noise, at the input; synthesized speech at the output]

Figure 4.25 Long-term and short-term linear prediction model for speech production.

[...]d from the original speech signal. The long-term predictor is responsible for generating correlation between samples that are one pitch period apart. The filter with system function

$$H_P(z) = \frac{1}{1 + b z^{-T}}, \qquad (4.89)$$

describing the effect of the long-term predictor in synthesis, is known as the long-term synthesis filter or pitch synthesis filter. On the other hand, the short-term predictor recreates the correlation present between nearby samples, with a typical prediction order equal to ten. The synthesis filter associated with the short-term predictor, with system function given by (4.87), is also known as the formant synthesis filter since it generates the envelope of the spectrum in a way similar to the vocal tract tube, with resonant frequencies known simply as formants. The gain g in Figure 4.25 is usually found by comparing the power level of the synthesized speech signal to the original level.
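The two synthesis filters can be written as difference equations and chained, as in the following Python sketch. It assumes the excitation is scaled by g, passed through the pitch synthesis filter (4.89), and then through the formant synthesis filter (4.87), with zero initial state; this ordering and the function names are assumptions made for illustration only.

```python
def pitch_synthesis(x, b, T):
    """y[n] = x[n] - b * y[n-T], i.e. H_P(z) = 1 / (1 + b z^{-T})."""
    y = []
    for n in range(len(x)):
        y.append(x[n] - (b * y[n - T] if n >= T else 0.0))
    return y


def formant_synthesis(y, a):
    """s[n] = y[n] - sum_{i=1..M} a[i-1] * s[n-i], i.e. H(z) = 1 / (1 + sum a_i z^{-i})."""
    s = []
    for n in range(len(y)):
        acc = y[n]
        for i in range(1, len(a) + 1):
            if n - i >= 0:
                acc -= a[i - 1] * s[n - i]
        s.append(acc)
    return s


def synthesize(x, g, b, T, a):
    """Scale the excitation by g, then apply the pitch and formant synthesis filters."""
    scaled = [g * v for v in x]
    return formant_synthesis(pitch_synthesis(scaled, b, T), a)
```

Feeding a unit impulse through the pitch synthesis filter alone, for example with b = -0.5 and T = 3, produces decaying pulses every T samples, which is the periodic structure the long-term predictor removes during analysis. A white-noise excitation could be drawn with `random.gauss(0.0, 1.0)` from the standard library.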

Example 4.10 The magnitudes of the transfer functions for the pitch synthesis filter and formant synthesis filter obtained from Example 4.7 are plotted in Figure 4.26. In the same figure, the product of the transfer functions is also plotted. Since the two filters are in cascade, the overall transfer function is equal to their product.

[Plots: three panels (a), (b), (c) showing magnitude on a logarithmic scale versus ω/π from 0 to 1]

Figure 4.26 Magnitude plots of the transfer functions for (a) a pitch synthesis filter, (b) a formant synthesis filter, and (c) a cascade connection between pitch synthesis filter and formant synthesis filter.

Parameters of the filters are

$b = -0.735$, $T = 49$

$a_1 = -1.502$, $a_2 = 1.738$, $a_3 = -2.029$, $a_4 = 1.789$, $a_5 = -1.376$,
$a_6 = 1.255$, $a_7 = -0.693$, $a_8 = 0.376$, $a_9 = -0.08$, $a_{10} = 0.033$

Note that the pitch synthesis filter generates the harmonic structure due to the fundamental pitch frequency, while the formant synthesis filter captures the spectrum envelope. The product of the two recreates the original signal spectrum. Compared to Figure 4.7, we can see that the spectrum due to the synthesis filters has a shape that closely matches the PSD of the original signal.
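The magnitude responses discussed in Example 4.10 can be evaluated numerically with a short Python sketch that uses the parameter values listed in the example. The filters are evaluated on the unit circle, $z = e^{j\omega}$; the function names are hypothetical.

```python
import cmath
import math

# Parameters quoted in Example 4.10.
B = -0.735
T = 49
A = [-1.502, 1.738, -2.029, 1.789, -1.376, 1.255, -0.693, 0.376, -0.08, 0.033]


def pitch_mag(w, b=B, t=T):
    """|H_P(e^{jw})| for H_P(z) = 1 / (1 + b z^{-T})."""
    return 1.0 / abs(1.0 + b * cmath.exp(-1j * w * t))


def formant_mag(w, a=A):
    """|H(e^{jw})| for H(z) = 1 / (1 + sum_{i=1..M} a_i z^{-i})."""
    denom = 1.0 + sum(ai * cmath.exp(-1j * w * (i + 1)) for i, ai in enumerate(a))
    return 1.0 / abs(denom)


def cascade_mag(w):
    """Magnitude of the cascade, the product of the two magnitudes."""
    return pitch_mag(w) * formant_mag(w)


if __name__ == "__main__":
    for k in range(0, 11):
        w = math.pi * k / 10
        print(f"w/pi = {k / 10:.1f}  pitch = {pitch_mag(w):.3f}  "
              f"formant = {formant_mag(w):.3f}  cascade = {cascade_mag(w):.3f}")
```

Because b is negative, the pitch synthesis filter peaks at multiples of $2\pi/T$, where its magnitude is $1/(1 - 0.735)$, and dips halfway between them, where it is $1/(1 + 0.735)$. This is the harmonic structure mentioned in the text.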

Question 16/17 Linear Predictive Coding (LPC)

Basic Principles

LPC starts with the assumption that the speech signal is produced by a buzzer at the end of a tube [...]

[...] Federal Standard 1016. It provides good quality, natural sounding speech at 4800 bits per second.

Question 17/18 The principle of the chosen speech coding method is to extract the main characteristics of speech in the form of filter coefficients, from which the speech can be reconstructed using low-rate quantization. Block diagrams of the speech encoder and decoder are shown in Fig. 2 [4, 5].

The reduction of the rate to 13 kbit/s is achieved in three stages: 1. LPC, linear predictive coding; 2. LTP, long-term prediction; 3. RPE, regular pulse excitation. In the first stage, the input signal is divided into segments of 260 bits, 20 ms each. Then, during LPC analysis, the 8 coefficients r(i) of the digital LPC analysis filter are computed, which are represented as a level, and the dynamic range d of the filtered version is minimized. In the second stage, the dynamic range is reduced further by long-term prediction, in which each segment is aligned to the level of consecutive speech segments. In principle, the LTP filter subtracts the previous period of the signal from the current period. This filter is characterized by the delay parameter N and the gain b. These parameters are computed every 5 ms. The eight coefficients r(i) of the LPC analysis filter and the parameters of the LTP analysis filter are encoded and transmitted at 3.6 kbit/s. To form the excitation sequence, the residual signal is passed through a low-pass filter with a cutoff frequency of 3-4 kHz. Finally, the periodic sequence of fragments is transmitted at 9.4 kbit/s. The total transmission rate is 3.6 + 9.4 = 13 kbit/s. In the decoder, the speech signal is reconstructed from the responses of the regular pulse excitation (RPE) sequence by a two-stage synthesis filter, as shown in Fig. 2. The resulting speech quality matches that of speech transmitted over ISDN and exceeds the speech quality of analog radiotelephone systems.

Theoretically, the delay of the speech signal in the codec equals the segment duration, 20 ms. The actual delay, taking into account channel coding and interleaving as well as the physical execution of the operations considered, is 70-80 ms.
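The bit budget stated above can be checked with a few lines of Python. The sketch only restates the arithmetic from the text (3.6 kbit/s for the filter parameters, 9.4 kbit/s for the excitation, 20 ms segments); the constant and function names are hypothetical.

```python
FRAME_MS = 20               # segment duration in milliseconds
PARAM_RATE_BPS = 3600       # LPC and LTP parameters, 3.6 kbit/s
EXCITATION_RATE_BPS = 9400  # RPE excitation sequence, 9.4 kbit/s


def bits_per_frame(rate_bps, frame_ms=FRAME_MS):
    """Number of bits sent in one segment at the given bit rate."""
    return rate_bps * frame_ms // 1000


param_bits = bits_per_frame(PARAM_RATE_BPS)            # 72
excitation_bits = bits_per_frame(EXCITATION_RATE_BPS)  # 188
total_bits = param_bits + excitation_bits              # 260
total_rate_bps = total_bits * 1000 // FRAME_MS         # 13000
```

The total of 260 bits per 20 ms segment is consistent with the 260-bit figure and the 13 kbit/s rate quoted in the text.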

16. Speech encoding. LPC encoder

Linear predict???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

Overview?

???????????????

?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

LPC coefficient representations?

???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

Applications?

?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

??????????????????????????????????????????????????????????????????????????????????

???????????????????????????????????????????????????????????????????????????????????

???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

???????????????????????????????????????????????

?????????????????????????????????

?????????????????????????????????????????????????????????????????????????????????

?????????????????????????????????????????????????????????????????????????????????

??????????????????????????????????????????????????????????????????????????????????????

????????????????????????????????????????????????????????????????????????????????

????????????????????????????????????????????????????????????????????????????????????

????????????????????????????????????????????????????????????????????????????????????

??????????????

?????????????????

???????????????????????????????????????????????????????????????????????????????

??????????????????????????????????????????????????????????????????????????????????

??????????????????????????????????????????????????????????????????????????????????????

???????????????????????????????????????????????????????????????????????????????????

???????????????????????????????????????????????????????????????????????????????

????????????????????????????????????????????????????????????????????????????

?????????????????????????????????????????????????????????????????????????????????????

?????????????????????????????????????????????????????

????????????????????????????????????????????????????????????????????????????

????????????????????????????????????????????????????????????????????????????

?

?

??????????????????????????????????????

???????????????????????????????

?

????????????????????????????????????????????????????????????????????????????????

????????????????????????????????????

???????????????????????????????????????????????????????????????????????????????????

?????????????????????????????????????????????????????????????????????????????????

???????????????????????????????????????????????????????????????????????????????????

????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

?

?

??????????????????

????????????????????????????????????????????????????????????????????????????????

?????????????????????????????????????????????????????????????????????????????????????

?????????????????????????????????????????????????????????????????????????????????

?????????

?????????????????????????????????????????????????????????????????????????????

??????????????????????????????????????????????????????????????????????????

????????????????????????????????????????????????????????????????????????????????

?????????????????????????????

?????????????????????????????????????????????????????????????????????

??????????????????????????????????????????????????????????????????????????

??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

?????????????????????????????????????????????????????????????????????????

??????????????????????????????????????????????????????????????????????????????

??????????????????????????????????????????????????????????????????????????

??????????????????????????????????????????????????????????????????????????

????????????????????????????????????????????????????????????????????????

?????????????????????????????????????????????????????????????????????

????????????????????????????????????????????????????????????????????????

???????????????????????????????

??????????????????????????????????????????????????????????????????????????

?????????????????????????????????????????????????????????????

?

?

?

??????????????????

???????????????????????????????????????????????????????????????????????????

???????????????????????????????????????????????????????????????????????????????

????????????????????????????????????????????????????????????????????????????????????

????????????????????????????????????????????????????????????????????????????????????

???????????????????????????

?

?

  1.  ??????????????????????????????????????

???????????????

????????????????????????????????????????????????????????????????????????????????

????????????????????????????????????????????????????????????????????????????????

???????????????????????????????????????????????????????????????????????????????????

??????????????????????????????????????????????????????????????????????????????????

??????????????????????????????????????????????????????????????????????????

?????????????????????????????????????????????????????????????????????????????

?????????????????????????????????????????????????????????????????????????????????

??????????????

?

?????????????????????????????????????

???????????????????????????????????????????????????????????????????????????????

????????????????????????????????????????????????????????????????????????????

????????????????????????????????????????????????????????????????????????????????

???????????????????????????????????????????????????????????????????????????????

???????????????????????????????????????????????????????????????????????????

??????????????????????????????????????????????????????????????????????????????

???????????????????????????????????????????????????????????????????????????

???????????????????????????????????????????????????????????????????????????????

??????????????????????????????????????????????????????????????????????????????

?????????????????????????????????????????????????????????????????????????

?????????????

?????????????????????????????????????????????????????????????????????????????????

????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

???????????????????????????????????????????????????????????????????????????????

??????????????????????????????????????????????????????????????????????????

????????????????????????????????????

????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

????????????

???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

??????????????????????????????????????????????????????????????????????????????????

?????????????????????????????????????????????????????????????????????????????????

???????????????????????????????????????????????????????????????????????????????????

?????????????????????????????????????????????????????????????????????????????????

?????????????????????????????????????????????????????????????????????????????





Motion JPEG (M-JPEG or MJPEG) is a video compression format in which each video frame or interlaced field of a digital video sequence is compressed separately as a JPEG image. Originally developed for multimedia PC applications, M-JPEG is now used by video-capture devices and by video editing systems. It continues to enjoy native support in a number of media players, game consoles, and web browsers.
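Because every frame is an independent JPEG image, a raw M-JPEG byte stream can in principle be cut into frames by looking for the JPEG start-of-image (FF D8) and end-of-image (FF D9) markers. The following is a minimal sketch of that idea, not a full demultiplexer: the function name `split_mjpeg_frames` is hypothetical, the input is assumed to be a plain concatenation of JPEG images with no container format around it, and frames that embed a JPEG thumbnail in their metadata would confuse this simple scan.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def split_mjpeg_frames(stream: bytes) -> list:
    """Split a byte stream of concatenated JPEG images into single frames.

    Assumption: each frame runs from an SOI marker to the next EOI marker.
    Bytes outside any SOI..EOI pair and an incomplete last frame are dropped.
    """
    frames = []
    pos = 0
    while True:
        start = stream.find(SOI, pos)
        if start == -1:
            break  # no further frame begins
        end = stream.find(EOI, start + len(SOI))
        if end == -1:
            break  # last frame is truncated; ignore it
        frames.append(stream[start:end + len(EOI)])
        pos = end + len(EOI)
    return frames
```

Each returned element can be handed to any ordinary JPEG decoder on its own, which is what makes M-JPEG easy to edit frame by frame, at the cost of ignoring the redundancy between neighbouring frames.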


MPEG-2 (also known as H.222/H.262, as defined by the ITU) is a standard for "the generic coding of moving pictures and associated audio information". It describes a combination of lossy video compression and lossy audio data compression methods, which permit storage and transmission of movies using currently available storage media and transmission bandwidth.




