When integrating a Voice over IP (VoIP) system into an existing network, it is very important to understand how much bandwidth each call uses. For most people just starting out, the bandwidth calculations can be a daunting task.
But with a basic understanding of the components of a VoIP packet, the process of calculating VoIP bandwidth becomes easier to understand. If the bandwidth allocated to VoIP packets is underestimated, the result can be latency issues and poor application performance.
There are a few key factors in calculating how much bandwidth is required to carry voice on an IP network. Some of the more important factors are:
- Codec (coder/decoder), such as G.711, G.729, G.723, or G.722
- Sample Period
- IP Header
- Header Compression, known as Compressed RTP (cRTP – Compressed Real-Time Transport Protocol – which compresses the RTP + UDP + IP headers)
- Transmission Medium and layer 2 protocol, such as Point-to-Point Protocol (PPP), Frame Relay, or Ethernet
- Silence Suppression or VAD (Voice Activity Detection)
Codec
The function of the codec is typically performed in a hardware device called a DSP (Digital Signal Processor). It is the job of the codec to convert analog waveforms into digital data that can be packetized and transmitted in an IP packet. The codec samples the analog waveform, typically 8,000 times per second, over a fixed time period to create a frame of data. The time period is usually 10 ms, 20 ms, or 30 ms, and depends on the codec deployed.
For example, a G.711 codec with a 20 ms sample period will produce 50 frames per second and a 64 kbps payload stream, based on the following numbers.
160 bytes (payload per 20 ms frame)
x  8 (8 bits in a byte)
= 1,280 bits
x  50 (frames/sec)
= 64,000 bps (64 kbps)
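The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the numbers above. The function name `codec_payload` and its parameters are my own and do not come from any VoIP library.

```python
def codec_payload(bit_rate_bps, sample_period_ms):
    """Return (payload bytes per frame, frames per second) for a codec.

    bit_rate_bps:     nominal codec bit rate, e.g. 64000 for G.711
    sample_period_ms: how much audio goes into each frame, e.g. 20
    """
    frames_per_second = 1000 / sample_period_ms
    # bits produced during one sample period, converted to whole bytes
    payload_bytes = bit_rate_bps * sample_period_ms // 1000 // 8
    return payload_bytes, frames_per_second


payload_bytes, fps = codec_payload(64_000, 20)   # G.711 at 20 ms
payload_bps = payload_bytes * 8 * fps

print(payload_bytes)  # 160
print(fps)            # 50.0
print(payload_bps)    # 64000.0
```

Changing the second argument to 30 shows how a longer sample period produces larger frames but fewer of them per second.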
Frames and Packets
It is also important to realize that more than one frame of VoIP data can be placed into one IP packet. G.729, for example, typically places two 10 ms data frames into each IP packet. It is more efficient to place two frames in each packet because it reduces the transmission overhead by reducing the number of packets sent. This is a tradeoff between latency and packet overhead. Longer sample periods produce higher latency, which reduces the quality of the call. Longer delay makes conversations awkward, with the two people talking over the top of each other. The header size becomes more significant as the sample period becomes smaller because the ratio of overhead to data becomes larger.
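To make the tradeoff concrete, here is a rough sketch for G.729. It assumes 10 bytes of payload per 10 ms frame (G.729 at 8 kbps) and the 40-byte IP/UDP/RTP header covered in the next section. The names `packet_stats`, `HEADER_BYTES`, and `G729_FRAME_BYTES` are illustrative only.

```python
HEADER_BYTES = 40        # IP (20) + UDP (8) + RTP (12)
G729_FRAME_BYTES = 10    # assumption: G.729 at 8 kbps, one 10 ms frame
FRAME_MS = 10


def packet_stats(frames_per_packet):
    """Return (packetization delay in ms, payload bytes, header share of packet)."""
    payload = G729_FRAME_BYTES * frames_per_packet
    delay_ms = FRAME_MS * frames_per_packet
    header_share = HEADER_BYTES / (HEADER_BYTES + payload)
    return delay_ms, payload, header_share


for n in (1, 2, 3):
    delay_ms, payload, share = packet_stats(n)
    print(f"{n} frame(s): {delay_ms} ms delay, {payload} B payload, {share:.0%} header")
# 1 frame(s): 10 ms delay, 10 B payload, 80% header
# 2 frame(s): 20 ms delay, 20 B payload, 67% header
# 3 frame(s): 30 ms delay, 30 B payload, 57% header
```

Each extra frame lowers the share of the packet spent on headers, but adds another 10 ms before the packet can be sent.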
IP Header
The IP header overhead consists of the following three headers:
- IP – Internet Protocol – 20 bytes – is responsible for delivering the packet to the destination host. It is connectionless and does not guarantee delivery or ordering.
- UDP – User Datagram Protocol – 8 bytes – is responsible for delivering the data to the correct destination port. It is connectionless and does not guarantee delivery.
- RTP – Real-Time Transport Protocol – 12 bytes – is responsible for reconstructing the samples in the correct order using sequence numbers and timestamps. Delay and jitter are measured and reported via its companion protocol, RTCP (Real-Time Transport Control Protocol).
Together IP/UDP/RTP add 40 bytes of overhead to the payload of every packet. With a 20 ms sample period (50 packets per second), these headers add 16 kbps (kilobits per second) to every call. The overhead can be reduced to about 10.7 kbps by using a 30 ms sample period instead of 20 ms, because fewer packets are sent each second.
+ 20 (IP Header)
+   8 (UDP Header)
+ 12 (RTP Header)
= 40 bytes L3+ overhead
x   8 (8 bits in a byte)
= 320 bits
x  50 pps (packets per second)
= 16,000 bps (16 kbps) overhead
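The worksheet above can be written as a small Python function, which also makes it easy to compare 20 ms and 30 ms sample periods. This is a sketch, and `header_overhead_bps` is a name I made up for the example.

```python
IP_BYTES, UDP_BYTES, RTP_BYTES = 20, 8, 12


def header_overhead_bps(sample_period_ms,
                        header_bytes=IP_BYTES + UDP_BYTES + RTP_BYTES):
    """Bits per second spent on IP/UDP/RTP headers for one voice stream."""
    packets_per_second = 1000 / sample_period_ms
    return header_bytes * 8 * packets_per_second


print(header_overhead_bps(20))         # 16000.0
print(round(header_overhead_bps(30)))  # 10667
```

At 30 ms only about 33 packets are sent each second, so the same 40-byte header costs roughly a third less bandwidth.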
Compressed RTP (cRTP)
By compressing the IP/UDP/RTP headers with compressed Real-Time Transport Protocol (cRTP), the 40 bytes of overhead are reduced to just 2 or 4 bytes (2 bytes without UDP checksums, 4 bytes with them).
+ 20 (IP Header)
+   8 (UDP Header)
+ 12 (RTP Header)
= 40 bytes, compressed to 2 bytes L3+ overhead
x   8 (8 bits in a byte)
= 16 bits
x  50 pps (packets per second)
= 800 bps overhead
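As a quick check of the savings, the sketch below compares the uncompressed 40-byte header with both cRTP sizes at 50 packets per second. The helper name `overhead_bps` is hypothetical, and the calculation ignores layer 2 framing.

```python
PACKETS_PER_SECOND = 50   # 20 ms sample period


def overhead_bps(header_bytes, pps=PACKETS_PER_SECOND):
    """Header overhead in bits per second for a given header size."""
    return header_bytes * 8 * pps


uncompressed = overhead_bps(40)
for crtp_bytes in (2, 4):
    compressed = overhead_bps(crtp_bytes)
    saved = uncompressed - compressed
    print(f"cRTP {crtp_bytes} B: {compressed} bps overhead, saves {saved} bps")
# cRTP 2 B: 800 bps overhead, saves 15200 bps
# cRTP 4 B: 1600 bps overhead, saves 14400 bps
```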
Layer 2 Overhead
Once the packet moves down the OSI (Open Systems Interconnection) model to layer 2, the layer 3 IP packet is wrapped in layer 2 framing, which we call Layer 2 overhead. Layer 2 can be Ethernet, ATM, Frame Relay, PPP, VPN, or many other layer 2 transmission types. Each of these layer 2 types adds a different amount of overhead to the packet. One of the biggest mistakes people make when calculating their VoIP bandwidth is forgetting to add the layer 2 overhead to the total amount of bandwidth for each RTP packet.
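A minimal sketch of the full per-call calculation, including layer 2, might look like the following. The overhead values in the table are commonly quoted figures that I am assuming here (18 bytes for Ethernet including the 4-byte FCS, 6 bytes for PPP and Frame Relay). Check them against your own link type and vendor documentation. The names `L2_OVERHEAD_BYTES` and `call_bandwidth_bps` are illustrative.

```python
# Assumed layer 2 overhead per packet, in bytes (verify for your own links)
L2_OVERHEAD_BYTES = {
    "ethernet": 18,     # 14-byte header + 4-byte FCS
    "ppp": 6,
    "frame_relay": 6,
}


def call_bandwidth_bps(payload_bytes, sample_period_ms, l2, l3_header_bytes=40):
    """Bandwidth of one voice stream (one direction) in bits per second."""
    packets_per_second = 1000 / sample_period_ms
    packet_bytes = payload_bytes + l3_header_bytes + L2_OVERHEAD_BYTES[l2]
    return packet_bytes * 8 * packets_per_second


# G.711, 160-byte payload, 20 ms sample period
print(call_bandwidth_bps(160, 20, "ethernet"))  # 87200.0
print(call_bandwidth_bps(160, 20, "ppp"))       # 82400.0
```

Passing `l3_header_bytes=2` models the same call with cRTP enabled on the link.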

Silence Suppression
Silence suppression, sometimes called VAD (Voice Activity Detection), suppresses the transmission of data during periods of silence, since usually only one person is talking at a time. Silence suppression is often said to reduce the demand for bandwidth by 30% to 50%. The receiving codec can also generate comfort noise (white noise) while silence suppression is active. Comfort noise reassures the person talking that the listener has not hung up on them when the data stops flowing from the silent side.
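To see what that range means for planning, the sketch below applies a 30% and a 50% saving to a per-call figure. The 87,200 bps starting value is an assumed example (G.711 at 20 ms over Ethernet with 18 bytes of layer 2 overhead), and `bandwidth_with_vad` is a made-up helper. Real savings depend on the conversation, so treat the results as planning estimates only.

```python
def bandwidth_with_vad(per_call_bps, savings_percent):
    """Estimate average per-call bandwidth after silence suppression.

    savings_percent is a whole number from 0 to 100.
    """
    if not 0 <= savings_percent <= 100:
        raise ValueError("savings_percent must be between 0 and 100")
    return per_call_bps * (100 - savings_percent) // 100


per_call_bps = 87_200   # assumed example: G.711, 20 ms, Ethernet
for percent in (30, 50):
    print(percent, bandwidth_with_vad(per_call_bps, percent))
# 30 61040
# 50 43600
```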
On-Line VoIP Bandwidth Calculators
There are many on-line calculators as well as iPhone applications. The best tool I have found yet is the IPEToolbox, found in the Apple App Store. Here is what the voice bandwidth portion of the application looks like.

Calculating VoIP Bandwidth Chart
While teaching Cisco Voice over IP (CVOICE), I needed some way to help my students understand how to calculate VoIP bandwidth and where all the numbers were derived from. Following is the chart I use to help my students better understand this topic.

Author: Paul Stryer
 
  