System Architecture

When we started designing VOCAL, we had three primary goals in mind:

  • Build a distributed architecture.

  • Build a system that was scalable.

  • Ensure no single point of failure.

A distributed architecture suited our aim to open source VOCAL because it provided components that developers from the community could build upon or build into their projects. Scaling the system meant making one type of server, which became known as the Marshal server, the single point of contact for subscribers and allowing duplicates of this server to be added to the system as the subscriber population grew. Our original idea was to achieve load balancing by assigning a specific population of subscribers to each additional Marshal server. Our original plan also called for a multihost system with redundancy for all call control servers to avoid a single point of failure.
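To make the load-balancing idea concrete, here is a minimal sketch of one way to give each Marshal server its own population of subscribers. It is not taken from the VOCAL source: the host names, the `marshal_for` function, and the choice of a stable hash are all assumptions made for illustration.

```python
import hashlib

# Hypothetical Marshal host names; not taken from any VOCAL configuration.
MARSHALS = [
    "marshal-1.example.com",
    "marshal-2.example.com",
    "marshal-3.example.com",
]


def marshal_for(subscriber, marshals=MARSHALS):
    """Return the Marshal assigned to a subscriber.

    A SHA-256 digest is used instead of Python's built-in hash() so that
    the assignment is the same on every host and across restarts.
    """
    digest = hashlib.sha256(subscriber.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(marshals)
    return marshals[index]


if __name__ == "__main__":
    for user in ("sip:alice@example.com", "sip:bob@example.com"):
        print(user, "->", marshal_for(user))
```

With a simple modulo scheme like this one, adding a Marshal changes the assignment of many existing subscribers, so a production system would more likely use a provisioned table or consistent hashing.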

Data Types

In its most basic form, a Voice over IP system is a set of data combined with the capacity to process calls. There is persistent data, such as the provisioning databases of users and server configurations, as well as the dynamic data that is potentially different for each call. The call control servers handle the following types of data:


Registration

When a user connects to her service provider, the system needs to add her address to a list of active endpoints.

Security

The system requires a perimeter to allow qualified users in and keep intruders out. Security is multifaceted and includes, for example:

Authentication

The system needs to ensure that the connecting users are who they say they are, that the contents of the message have not been modified, and that no one else could have sent the same message.

Call admission

The system must determine the types of calling that qualified subscribers are permitted to use.

Routing

One server needs to know who the call is for, in terms of whether the called party is a local subscriber, where to send off-network calls, and how to route the call through the system with respect to features and final destination.

Features

Phone users are used to working with a variety of features, including voice mail, call forwarding, and call blocking. VoIP has long been touted as a possible source for advanced features that are impossible to implement in the PSTN.

Billing

Although this feature is not important for a test or hobby system, commercial VoIP applications that require billing are growing in size and population.

Policy

How do you work with other VoIP networks? How do you share billing and allow access to calls coming from known or unknown systems? These issues are handled by the Policy server.

Having set forth our goals and a generic architecture, the next phase in our planning was evaluating the different protocol stacks available to us.

VoIP Protocol Stacks

In 1997, the only VoIP protocol with any following was H.323, a specification created by the International Telecommunication Union (ITU) for the transport of call signaling over networks. By 1999, when we started Vovida Networks, there were two new options: Media Gateway Control Protocol (MGCP) and SIP.

Then, as now, each protocol had its own set of advantages. H.323, being the first widely available VoIP protocol, enjoyed a head start as developers implemented it in toll-bypass systems as well as in PC-to-phone and video-conferencing applications. The best-known H.323 application was Microsoft NetMeeting. MGCP is well suited for centralized systems that work with dumb endpoints, such as analog phones. The most celebrated use of MGCP is for high-capacity gateways designed to work with traditional telecom equipment. There is also momentum building for a replacement for MGCP called MEGACO/H.248. SIP is an easy-to-use protocol that enables developers to push the intelligence to the edge of the network, implement a distributed architecture, and create advanced features.

We chose to base VOCAL on SIP because it suited our needs for rapid development, and we liked its similarities to Hypertext Transfer Protocol (HTTP, RFC 2616) and Simple Mail Transfer Protocol (SMTP, RFC 2821). At the same time, we provided translating endpoints to help us include H.323 and MGCP developers in our community. Chapters 7 and 8 discuss specific details about SIP and our implementation. The MGCP and H.323 translators are discussed in Chapters 15 and 16, respectively.
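The resemblance to HTTP is easiest to see in the message format: a SIP request is plain text, with a start line, a series of `Name: value` headers, and a blank line before the optional body. The sketch below parses one such request. The INVITE itself, its addresses, and the `parse_request` helper are illustrative assumptions, not output from or code in VOCAL.

```python
# Illustrative INVITE; the addresses and identifiers are made up for this sketch.
RAW_INVITE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP pc1.example.com;branch=z9hG4bK776asdhds\r\n"
    "Max-Forwards: 70\r\n"
    "To: Bob <sip:bob@example.com>\r\n"
    "From: Alice <sip:alice@example.com>;tag=1928301774\r\n"
    "Call-ID: a84b4c76e66710@pc1.example.com\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Contact: <sip:alice@pc1.example.com>\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)


def parse_request(raw):
    """Split a SIP request into (method, request_uri, version, headers).

    Simplifications: repeated headers overwrite each other, and folded
    header lines and compact header names are not handled.
    """
    head, _, _body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, request_uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, request_uri, version, headers


if __name__ == "__main__":
    method, uri, version, headers = parse_request(RAW_INVITE)
    print(method, uri, version)
    print(headers["From"])
```

Anyone who has written an HTTP parser will recognize the structure, which is part of what made rapid development on SIP practical.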

Looking at the different organizations present at recent trade shows such as Voice on the Net (VON), we have seen more and more implementations of VoIP using SIP. One example is Microsoft announcing its decision to drop its H.323-based NetMeeting product in favor of Messenger, a SIP application that integrates voice, video, application sharing, and instant messaging and runs on Microsoft’s operating system. Also, 3G Wireless, the new cellular phone standard from the ITU, has chosen SIP as its VoIP protocol.

Having chosen SIP, let’s look at how the standard describes the roles that different server types play within call processing and then how we implemented our requirements into a SIP-based system.

SIP Architecture Components

RFC 3261 describes the components that are required to develop a SIP-based network. In many implementations, some of these components are combined into the same software modules. As you might suspect, there are also many different ways to achieve the same results. Some implementations may duplicate some components to enable more options for interoperability with other systems.

SIP user agents

RFC 3261 defines the telephony devices as user agents (UAs), which are combinations of user agent clients (UACs) and user agent servers (UASs). The UAC is the only entity on a SIP-based network that is permitted to create an original request. The UAS is one of many server types that are capable of receiving requests and sending back responses. Normally, UAs are discussed without any distinction made between their UAC and UAS components.

SIP UAs can be implemented in hardware such as IP phone sets and gateways or in software such as softphones running on the user’s computer. It is possible for two user agents to make SIP calls to each other with no other software components. When we start talking about message flows, we’ll look at examples that include just two IP phones. Later in the chapter, we will look at the more complex configurations that involve other system components.

SIP servers

Even though the UA contains a server component, when most developers talk about SIP servers, they are referring to server roles usually played by centralized hosts on a distributed network. Here is a description of the four types of SIP servers that are discussed in the RFC:

Location server

Used by a Redirect server or a Proxy to obtain information about a called party’s possible location.

Proxy server

Also referred to as a Proxy. An intermediary program that acts as both a server and a client for the purpose of making requests on behalf of other clients. Requests are serviced internally or transferred to other servers. A proxy interprets and, if necessary, rewrites a request message before forwarding it.

Redirect server

An entity that accepts a SIP request, maps the address into zero or more new addresses, and returns these addresses to the client. Unlike a Proxy, it does not forward the request itself, and unlike a user agent server, it does not accept calls; instead, it generates SIP responses that instruct the UAC to contact another SIP entity.

Registrar server

A server that accepts REGISTER requests. A registrar is typically colocated with a Proxy or Redirect server and may offer location services. The Registrar saves information about where a party can be found.

In VOCAL, the SIP Location, Redirect, and Registrar servers are combined into a single server called the VOCAL Redirect server. SIP servers can provide a security function by authenticating users before permitting their messages to flow through the network. Frequently, all four server types are included in one implementation. Proxies can also provide features such as Call Forward No Answer (CFNA).
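To show how the Registrar, Location, and Redirect roles fit together when they share one process, here is a toy sketch. It is not the VOCAL Redirect server: the `RedirectServer` class, its method names, and the in-memory dictionary are assumptions for illustration, and only the status lines (200 OK, 302 Moved Temporarily, 404 Not Found) come from the SIP specification.

```python
class RedirectServer:
    """Toy stand-in for a combined Registrar, Location, and Redirect role."""

    def __init__(self):
        # Maps an address-of-record to the contact URIs registered for it.
        self._bindings = {}

    def register(self, address_of_record, contact):
        """Registrar role: remember where a party can be found."""
        contacts = self._bindings.setdefault(address_of_record, [])
        if contact not in contacts:
            contacts.append(contact)
        return "SIP/2.0 200 OK"

    def locate(self, address_of_record):
        """Location role: return zero or more known contacts."""
        return list(self._bindings.get(address_of_record, []))

    def redirect(self, request_uri):
        """Redirect role: answer with new addresses instead of forwarding."""
        contacts = self.locate(request_uri)
        if not contacts:
            return "SIP/2.0 404 Not Found", []
        return "SIP/2.0 302 Moved Temporarily", contacts


if __name__ == "__main__":
    server = RedirectServer()
    server.register("sip:bob@example.com", "sip:bob@192.0.2.10")
    print(server.redirect("sip:bob@example.com"))
```

A real registrar would also expire bindings and authenticate the REGISTER request; both are left out here to keep the three roles visible.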

SIP messages

Although the messages are an integral part of the protocol, it is not necessary to understand them before working through the instructions offered in Chapter 2. See Chapter 7 for information about messages including definitions, call flows, and descriptions of the message headers and bodies.

VOCAL Servers

VOCAL contains two types of Proxy servers, the Marshal and the Feature servers. Also, VOCAL has implemented the SIP Redirect, Location, and Registrar servers in the VOCAL Redirect server. Figure 1-1 shows a simplified view of VOCAL’s architecture and how it connects to different types of endpoints.


Figure 1-1. Simplified system overview

Here’s a look at the servers that are included in VOCAL:

Marshal servers

The Marshal servers are front-line devices that receive all incoming signals, authenticate the users, and forward authenticated signals to the Redirect server. The Marshal servers also receive and forward signals from other servers within the VOCAL network. See Chapter 11 for more information. Four types of Marshal servers are found within VOCAL:

Gateway Marshal server

Works with signals coming from and going to the PSTN gateway.

Internetwork Marshal server

Works with signals coming from and going to a known SIP Proxy server on another IP network.

User Agent Marshal server

Works with signals either coming from or going to IP phones that are connected to the network.

Conference Bridge Marshal server

(Not shown in Figure 1-1.) Designed to work with third-party conferencing servers. This Marshal works with calls that are destined for an ad hoc conference call. At this time, we are not aware of any open source conference bridges available on the Web. Perhaps someone in our development community will write one!
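The front-line role that all Marshal types share (authenticate the sender, then forward authenticated signals for routing) can be sketched as follows. This is an illustration only: the `Marshal` class, the shared-secret check, and the `route` callable standing in for the Redirect server are assumptions, and real SIP deployments use digest authentication rather than comparing plain secrets.

```python
class Marshal:
    """Toy front-line server: authenticate, then hand off for routing."""

    def __init__(self, known_users, route):
        # known_users maps a user name to a shared secret (hypothetical scheme).
        self._known_users = dict(known_users)
        # route is any callable standing in for the Redirect server lookup.
        self._route = route

    def handle(self, user, secret, destination):
        """Return ("forwarded", next_hop) or ("rejected", None)."""
        if user not in self._known_users or self._known_users[user] != secret:
            return ("rejected", None)
        return ("forwarded", self._route(destination))


if __name__ == "__main__":
    routes = {"sip:bob@example.com": "sip:bob@192.0.2.10"}
    marshal = Marshal({"alice": "s3cret"}, routes.get)
    print(marshal.handle("alice", "s3cret", "sip:bob@example.com"))
    print(marshal.handle("mallory", "guess", "sip:bob@example.com"))
```

Keeping authentication at the edge means the routing logic behind it only ever sees signals from known senders.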


The graphical user interface screens have been built as Java applets, and you need to install the Java Runtime Environment (JRE) before you install the software. If you plan to make any changes to the Java code, then you will need to install the Java Development Kit (JDK) as well. See Chapters 4 through 6 for more information.

Feature servers

The Feature servers provide enhanced telephony features such as call forwarding and call blocking. See Chapter 13 for more information.

Redirect server

This server keeps track of the users who are registered on the network and provides routing information to help incoming and outgoing calls arrive at their intended destinations. See Chapter 12 for more information.

Other system components

Besides the previous components, there are others, such as the Call Detail Record server, which captures user information for billing or accounting applications, and the Policy server, which enables calling between IP networks owned by separate entities. For more information about billing, policy, and related protocols, see Chapter 18.

Peripheral equipment: endpoints

These items are separate pieces of hardware that talk SIP and must be connected to the VOCAL system to permit calling. For more information about SIP endpoints, see Chapter 10.

IP phone

Internet Protocol phone. This can be any kind of IP telephony device, including a softphone or a hardware IP phone set.

PSTN gateway

Public Switched Telephone Network gateway. This is a device that translates the SIP-based signals into signals that can be understood by traditional phone systems.

Residential gateway

This includes equipment from a variety of manufacturers. It takes the analog signal from a phone and converts it into a digital VoIP signal. The VOCAL system can connect to SIP-based, MGCP-based, or H.323-based residential gateways.

As you can see, all the VOCAL components are revisited in detail throughout this book. As stated several times, we don’t consider VOCAL to be finished and we continue to look to the community to help us upgrade its components.

Writing the Code

Several engineers have asked us why we chose to write the bulk of the code in C++ as opposed to C or Java. We chose C++ because it seemed to offer a good mix of modern object-oriented design support and the performance characteristics that telecom developers accustomed to C felt comfortable with. We could have written the entire application in Java. The reason that we did not has nothing to do with any perceived speed advantage that C++ code may have over Java code; rather, traditional telecom developers feel more comfortable with the performance of C/C++ than they do with Java. We did, however, use Java to write the Provisioning GUI because of its portability.
