> -----------------------------------------------------------------------
> REVIEWER B
> -----------------------------------------------------------------------
> Summary:
>
> Section 2 provides an overview of SIP. Section 3 gives an overview
> of the Intelligent Network model, and describes client-side and server-side
> programming on the WWW. Section 4 discusses the execution of programs
> in SIP servers, and proposes two mechanisms - SIP CGI and CLP. Section 5
> starts from HTTP CGI to talk about SIP CGI. Section 6 describes the
> Call Processing Language for end-user programming.
>
> General comments:
>
> I think the topic of the paper is very timely, interesting and relevant to
> Network's readership, and probably useful to some readers who are working
> in the area of Internet telephony. The organization of the paper is
> straightforward, although it can be improved. The technical
> content is correct as far as I can tell. However, I have reservations
> about the paper for Network. My main concern is the level of
> detail in sections 5 and 6, which comprise a large part of the paper.
> While many Network readers are probably interested to learn more about
> Internet telephony, I expect that very few of them would be that interested
> in the detailed scripts in pages 11 to 20. The paper is not about
> the types of things that might most interest the general Network
> readership, such as any exciting new features that are possible (hinted
> at in the third paragraph), what users might expect to see in the near
> future, or what is implemented and available commercially today. As a
> typical Network reader, I started to read the paper with interest but
> found my interest fading out by section 4. Maybe it is just a personal
> opinion but the paper seemed rather dry. If the paper is accepted,
> the second half could be edited significantly, keeping in mind the
> average reader.
We've significantly edited section 5, removing much of the detail while keeping the basic ideas. We've also added a subsection that describes some of the services that are enabled. The more complex Perl script has been removed, and the other has been heavily commented and moved to an appendix. Section 6 has also been edited, removing parts to shorten it. One of the example scripts has been removed, and the remainder have been commented and moved to an appendix.

>
> Detailed comments:
>
> sec. 1, para 2: define CGI and API. Last sentence "These tools have been
> used.." should be "These tools have been used by.."

Acronyms expanded, sentence fixed.

>
> sec. 1, para 4: it is not clear what timescale "rapidly" means... on the
> fly programming or daily or yearly? Why does programming need to be so
> rapid? Does the sentence refer to a comparison to service providers today?

There is really no need for the word "rapid" here. It's not meant to be a comparison with existing service providers, but just to emphasize that programming should be fast. I've stricken the word.

>
> sec. 2, para 1: it is not clear from the text why RTP, RSVP, and diffserv
> are mentioned here. Sure, these protocols may help to provide QoS
> assurance in the Internet, but it is not explained what their specific
> relations to Internet telephony is.

Some text has been added for each protocol to explain its specific role in IP telephony.

>
> sec. 2.1: SIP is said to be similar to HTTP several times, which brings up
> the natural question of why HTTP isn't adequate... it is not explained.

HTTP is not adequate since it doesn't allow for signaling. You can't set up a call with it. A sentence has been added, saying that SIP adds its own methods and headers to provide the functions needed for IP telephony signaling, which HTTP does not have.

>
> sec. 2.2: this section consists of a single paragraph, and it might be better
> incorporating this paragraph into sec. 2.1, for example, after paragraph
> 5 which talks about call setup.

Done.

>
> sec 3.2: this section is surprisingly brief, especially compared with the
> detailed section 3.1. Either 3.2 should be expanded or 3.1 should be
> abridged if the details there are unnecessary.

Section 3.2 has been expanded. It provides more details on the operation of CGI.

>
> sec 4: the writing style in this section seems more informal and inconsistent
> with the other sections.

I didn't notice any real change in style; it's hard to fix without knowing a specific problem.

>
> sec. 5: the first 3 paragraphs can be condensed into one

Done.

>
> sec. 5.4, sec. 6: I am not sure how many readers will be interested in
> the example scripts.

We feel they will be valuable for the Internet Computing readership. Granted, they need comments, and these have been added. We removed one of the Perl scripts and a few of the XML scripts, and moved the rest to an appendix so uninterested readers can skip them.

>
> References: this is a comprehensive list but unnecessarily long for Network...
> some references, such as [4] [6] [7] [8] [14] [15] [27], might be deleted
> without significant detriment.

[4] has been deleted, as has [8]. People are often confused by the positioning of RTP, QoS protocols, and signaling protocols in the overall solution, so we feel there is value in keeping [6] and [7]. [14] and [15] have been deleted. [27] has been deleted.

>
> ----------------------------------------------------------------------------------
> REVIEWER C
> -----------------------------------------------------------------------
> Overall
> The paper presents a good tutorial of the programming model for
> telephony and the web. It outlines the requirements for programming
> Internet telephony services, and proposes two solutions that meet these
> requirements. The paper, for the most part, is easy to read. I recommend
> acceptance, subject to the following modifications and suggestions.
>
> 1) The paper lacks exposition of a few critical areas. These should
> be rectified. The areas are
> (a) on page 2, a description of the role of signaling.

A few sentences have been added summarizing the role of signaling. We indicate that it is used for setup and teardown of calls, conveying call-related information, invoking services, etc.

> (b) on page 2, an outline of H.323 services, and later in the paper, a
> description as to why they are not adequate for Internet telephony
> services

We've added a more detailed description of the three protocols on page 2. The paragraph which follows indicates why we've chosen SIP. It's not that H.323 doesn't provide telephony services; it does (we state this now). But SIP is simpler, and its clean request-response model simplified developing models for programming services. That is why we chose it.

> (c) on page 6 (section 3.2) an outline of the operation of CGI and
> an example of its use.

We've extended section 6 to provide a more detailed description of CGI.

> (d) page 20, section 6, how well does CPL actually work --
> experience

Implementation experience is in the works; there is not yet a complete implementation (the language itself is still a work in progress).

> and
> (e) page 20, section 7 -- expand the conclusion -- at the moment it
> is far too short.

The conclusion has been rewritten.

>
> 2) It might be a good idea to start the paper with an example of a
> typical Internet telephony service, thus motivating the need for the
> programming models described later

A summary of some of the services described later on is now given in the third paragraph of the introduction.

>
> 3) there seems to be an asymmetry in the paper in the description of H.323
> vs SIP, and the IN model vs. CGI. As mentioned in point 1, this should be
> a bit more balanced.

An exposition on H.323 would require more room than is available. Our own research has focused on SIP, and section 2 gives some motivations for our choice. Since we have used SIP, and since SIP CGI is based on SIP, it makes sense to describe it in the paper, but less sense to describe H.323 in detail.

> 4) Page 3 section 2.1, last para : destroying transaction state before
> completing a transaction seems really weird. The stated motivation is not
> strong enough to justify this design, and the reader is left wondering how
> this affects the transaction in progress. Perhaps this can be better
> explained.

This has been explained better. The idea is really that if a server crashes and reboots mid-transaction, the SIP messages themselves contain sufficient information to allow the transaction to complete correctly. In most cases a server would not destroy state mid-transaction, but since transactions can be unbounded in time, this property allows a server to clean up old state safely.

>
> 5) Section 3.1 is very nicely done. I never really understood how the IN
> worked before I read this material.
>
> 6) Section 4.3 has a bad title. It is more a discussion on what are the
> restrictions placed on a program than how the program is developed.

This has been changed, along with all of the titles in section 4.

>
> 7) You may want to introduce SIP-CGI and CPL just before section 4.
> Otherwise, in this section, you have to make awkward forward references to
> section 5.

There were two forward references. The first has been deleted. The second is at the end of section 4, and it effectively introduces sections 5 and 6, which seems acceptable.

>
> 8) the description in the second last para of Section 5.1 is not
> clear.

All of section 5 has been rewritten, and in the process, this has been clarified. The rewrite was at the request of another reviewer.

> 9) The last para in Section 5.1 is hand-wavy. If a script is going to
> store state on a disk, who will clean away state when a buggy script
> crashes? perhaps the last line can be deleted or made more precise.

This paragraph has been stricken as a result of the rewrite.
>
> 10) The perl scripts are not too enlightening to the reader who does not
> know perl. They should either be replaced by pseudocode or better
> commented.
>
> I would rather see pseudocode myself.

One has been removed, and the other has been heavily commented and moved to an appendix. We believe the actual Perl is better than pseudocode. It accentuates the point that real services can be implemented with real programs with very little code.

>
> 11) How do you plan to verify XML? A comment or two would be useful,
> comparing this approach to that of proof-carrying-code.

The XML is validated against a DTD. Then, the graph represented by the XML is traversed, and we look for cycles. If there are none, the maximum depth of the tree gives a bound on the running time. This has been clarified in section 6.3.

>
> ------------------------------------------------------------------------
> REVIEWER D
> -----------------------------------------------------------------------
> Summary:
>
> This is generally a good paper on a very timely and important topic
> area, authored by the indisputable experts in the area. The lapse in
> properly framing IN with respect to the Web environment (strongly
> implying that they are exclusive rather than orthogonal) and the lack
> of comments in the code blocks must be remedied before publication in
> _Network_.

Our discussion of Web and IN models in section 4 was not at all meant to imply they were exclusive; clearly they are not. The two solve very different problems, and both exist at the same time. Rather, our aim was to show that there are different models in use for each, and that IP telephony will need to take components from both. The section on existing models now tries to make this more explicit.

The code blocks have been heavily commented. One of the Perl scripts has been removed, and the other has been moved to an appendix. One of the CPL scripts has been removed as well, and the others moved to an appendix.
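As an illustration of the check described in the reply to Reviewer C's comment 11 (traverse the graph, look for cycles, and use the maximum depth as a bound on running time), a minimal Python sketch follows. It is not the implementation referred to in the reply: the dict-based graph encoding, the function name max_depth_if_acyclic, and the node labels are assumptions made purely for this example, and DTD validation is assumed to have been done beforehand.

```python
from typing import Dict, List

# Hypothetical encoding: each node label maps to the labels it can lead to.
Graph = Dict[str, List[str]]


def max_depth_if_acyclic(graph: Graph, root: str) -> int:
    """Return the longest path from root, counted in nodes.

    Raises ValueError if a cycle is reachable from root, since a cyclic
    script has no bound on its running time.
    """
    on_path = set()   # nodes on the current depth-first path
    memo = {}         # node -> longest path starting at that node

    def visit(node: str) -> int:
        if node in on_path:
            raise ValueError(f"cycle detected at node {node!r}")
        if node in memo:
            return memo[node]
        on_path.add(node)
        # A leaf has depth 1; otherwise add one to the deepest child.
        depth = 1 + max((visit(n) for n in graph.get(node, [])), default=0)
        on_path.discard(node)
        memo[node] = depth
        return depth

    return visit(root)


# Hypothetical script graph; the labels are illustrative, not CPL tags.
EXAMPLE: Graph = {
    "start": ["check_caller"],
    "check_caller": ["forward", "reject"],
    "forward": ["voicemail"],
    "reject": [],
    "voicemail": [],
}
```

With this encoding, max_depth_if_acyclic(EXAMPLE, "start") returns 4, the number of nodes on the longest path, while a graph such as {"a": ["b"], "b": ["a"]} is rejected with a ValueError.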
>
> Major comments:
>
> In section 4, you seem to be focusing on the sorts of details that
> might come out of a table comparing various things about IN and
> Web-based program environments. This level of abstraction is OK, but
> completely misses the bigger picture. IN has to do with separating
> service functionality from the switch, *and* providing additional
> services at ~layer 3 in the PSTN. The moral equivalent of IN in
> packet networks is *active networks* (AN) which also seek to provide
> enhanced services at layer 3.

While AN is similar, we believe that web programming (à la CGI) is really the "moral equivalent" of IN. As you say, IN has to do with separating the services from the switch. Web programming, especially CGI, is the same: it separates the content creation from the server. Both involve IPC (IN with INAP, CGI with environment variables and standard file descriptors).

We would argue that IN is not for layer 3 services (MTP3 sits at layer 3 of the SS7 stack). IN is really for services offered by the signaling protocols, ISUP in particular, which is above layer 3. The closest equivalent to this in the web would be HTTP. Both reside above the network/transport layer, and both contain the primitive functions for providing the service. IN programs how ISUP is used, and CGI programs how HTTP is used (roughly).

In addition, in AN, the code to be executed is generally carried in the packet to which it applies. That is not true for either IN or the web programming models. The logic in the IN (in the form of service logic) is administratively configured ahead of time. The same is true for web programming. We will mention, however, that AN is related.

> The reason that IN uses a call model
> and not packet filtering is (1) the PSTN is connection oriented, and
> there is a sensible connection state machine on which to trigger
> events, and (2) all signalling is out-of-band and there simply
> *aren't* data packets on which to filter. Broadband ATM IN, if there
> were such a thing, could have *both* a call model and cell/frame
> filtering. AN does indeed generally use packet filtering
> as the mechanism to do this.

Actually, there are packets on which to filter. The entire SS7 network is a packet network. The packets to filter on would be ISUP messages, such as IAM, REL, and ACM, which have rough equivalents in SIP.

>
> You are proposing application (or session) layer implementations to do
> things that could also be done as AN layer 3 implementations, which is
> fine (particularly since there are no widely deployed ANs), but please
> present the whole picture and make these distinctions.

Not really. Layer 3 in the IP world is, well, IP, and it is handled in routers. SIP and IP telephony protocols are application layer protocols. We are proposing mechanisms to control application layer services. This distinction aside, we could not even use application layer AN, since the party providing the logic (the callee) is not the one sending the signaling packets which would need to contain the logic.

>
> In your code examples (Perl and CPL) you *need* comments in the code
> to explain what the major blocks are doing. You can't assume that
> _Network_ readers are proficient at either Perl or XMLish languages,
> and even if they are, they shouldn't have to expend major effort to
> figure out how the code works.

Done, as discussed above.

> Detailed comments:
>
> p.1 para.1 Cap E in "Elemedia"

This text has been stricken as a result of other reviewer comments.

>
> p.3 para.7 A short SIP/SDP example might be useful here

An example SIP/SDP message has been added.

>
> p.4 para.1 How do company.com server know *where* joe currently is and
> to redirect? Is this a relatively dynamic binding to support
> mobility and roaming between home and offices, or a long term static
> binding that says that joe is at the university this year.

Either is possible.
The bindings can be established dynamically using the SIP REGISTER method, or they can be in a more permanent store such as a database or local file. SIP calls the means of determining this binding the "location service". Some text has been added to clarify this.

>
> There are quite a few round trips here, and while session setup
> latency can be a bit long, it would seem prudent to have some sort of
> registry that could resolve to joe's location in one or 2 round trip
> latencies.

The latency for a SIP call setup can be as low as a single round trip (for example, a call to sip:user@machine.network.domain.com would resolve directly to the machine the user is sitting at), or painfully slow if there are tens of layers of redirection and proxying. To scale well, the location of a user must effectively be stored in a distributed database, and this is exactly what is happening in the example. Storing all locations in one worldwide central registry does NOT scale.

Measuring the latency of this call setup in round trip times is not really meaningful anyway, since there are multiple entities involved, and in most cases there is one exchange between any pair. Many of these servers will be quite close together, particularly those in the university. In this case, the end-to-end latency may not be substantially higher than a single round trip time between the entities in the call.

>
> p.4 para.3 "...standards codified by the [ITU]..." They don't
> *develop* anything.

Changed.

>
> p.5 para.2 should the acronym for Physical Plane be "PE"?

Oops. Acronym removed.

>
> p.6 para.1 this is a pretty lame example to use for IN -- at least do
> something INish like 800 number translation, otherwise one wonders why
> all the SCP involvement.

Example changed to indicate that 800 number translation is being used.

>
> p.6 para.2 Be careful here; IN CSs are a *lot* more than just a
> list of supported functionality. The call models are different,
> there are differences in the architecture as well. For n larger
> than 2 it is questionable whether CSn will ever be deployed, in spite
> of ITU standards and WG documents.

This has been reworded to make this explicit.

>
> p.7 para.2 explain here what it is to run "on" a user agent.

Bad use of terminology. By user agent, we just mean the computer of the end user. This has been reworded.

>
> p.7 para.3 "the Web and IN both" -- fix grammar

Fixed.

>
> p.7 para.5 expand and define URI (as opposed to URL which is the only
> thing most Network readers will have heard of)

URI has already been defined in the section on SIP. Some text has been added there to indicate the difference between a URI and a URL.

>
> p.8 para.2 "...the how of..." ? maybe "program development process"

The section names for this whole section have been changed, at the request of another reviewer. This section name is now "Resource Restrictions".

>
> p.8 para.3 antecedent problem, replace "allow them to" with "allow
> users to"

Fixed.

>
> p.8 para.6 again, comparing apples to oranges; call state vs. packet
> content is an issue of connectionless vs. connection oriented and in-
> vs. out-of band signalling, rather than anything inherent in layer 3
> IN vs. layer >4 Web.

As we discuss above, we would argue that the IN information related to call state is application layer information, not network layer (MTP3 is the network layer in the SS7 network). An alternative would have been to pass the entire ISUP message to the SCP. In the web, there is no application layer state per se, so passing the message content itself is the only appropriate thing to do. Some text has been added here along these lines.

>
> p.9 para.2 "...operation of restricted..." replace "of" by "is", I
> assume.

Fixed.

>
> p.10 para.4 close paren needed after "script cookie"

Fixed.

>
> p.10 para.6 is any state cleanup needed ala garbage collection of TTL?
> Even a small amount which is not properly destroyed per session could
> amount to huge amounts of data over some time for reasonably expected
> session transaction rates

This paragraph has been removed as part of a general cleanup and shortening of this section. To answer your question, though, the server will need to clean up old state left around by scripts which never finished. This can be accomplished by a new keepalive mechanism which is being added to the next version of SIP to address this very issue. The mechanism will allow the server to know when the call is definitely over, so it can time out old state.

>
> p.11 para.8 recommend "newline" rather than "carriage return"

These details on script output formatting have been removed.

>
> p.11 para.10 insert "user@host" thusly: "... URI listed user@host."

These details have also been removed.

>
> p.12 sec.5.5 para.2 where is the response stored, and for how long?
> (state explosion issues as above)

The responses are stored in the server. However, they persist only for the duration of the transaction. SIP provides mechanisms so that a server can deem a transaction terminated at any time and destroy all state, including these messages. Some text has been added to state this explicitly.

>
> p.13 para.1 You need to explain briefly *what* a call model is,
> e.g. "a linear representation of the call state machine with event
> triggers at each state" or something similar.

A definition has been added in section 4.

>
> p.13 para.5 Explain exactly *what* a "Call Routing Distribution" is,
> for those that don't have it as a regularly used feature of their phone
> system

A definition has been added in the section on CGI.

>
> p.16. para.5 "macine" spelling

Fixed.

>
> p.16. para.8: why weren't these security concerns mentioned for the
> CGI stuff. Are you assuming IPSEC is enough?

No. The security issues are different. The CPL is provided by end users. These users will not have logins or administrative control over the servers where the scripts reside. Thus, the scripts must be transported to the servers. For CGI, the scripts are meant for administrators, who can put them on the servers directly. No specific transport is required. Should CGI scripts be used by end users (which we don't recommend), the same transport issues would arise. We now mention this in the CPL section.

>
> p.17. para.4: provide XML [reference]

Reference added.

>
> p.19 fig.7: so the forwarding is *static*, with no ability to read a
> database? This is not much better than a .forward file.

For this example, yes. CPL can also have access to databases.

>
> p.20 sec.7 A conclusion that looks like some real thought was put into
> it would be nice.

The conclusion has been rewritten.