Web Server Technology
ISM 3600 Contemporary Issues in Information Technology

Web Server Technology
- 3 weeks x 2 hours
- One (individual) assignment
- Yeager and McGrath, Web Server Technology, Morgan Kaufmann, 1996

Overview
- Web server basics
- The Hypertext Transfer Protocol (HTTP)
- Scripts and forms
- Performance issues
- Emphasis on the general workings rather than specific products.

Popular Web Servers

What is a Web server?
Web server = Platform + Software + Information

What does a Web server do?
- Receive a request
- Decipher the request
- Find the requested object (file)
- Deliver the object

Receive the request
- The Web server program listens to a designated port (e.g. 80).
- It is the operating system that hides all the complexities of the underlying network connections and gives the Web server a simple way to communicate with the clients.

Find the requested object (file)
- An object is requested by its name, which tells the location of the object within the file system of the Web server.
- The Web server totally relies on the operating system to retrieve the requested file.
- A requested object does not necessarily exist.

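The lookup step above can be sketched in Python. This is only an illustration under stated assumptions, not how any particular httpd works: the function name `locate_object` and the `doc_root` parameter are made up for the example, and the check against paths that escape the document root is an added safety measure the slides do not discuss.

```python
from pathlib import Path
from typing import Optional


def locate_object(doc_root: str, request_path: str) -> Optional[Path]:
    """Map a request path onto a file under doc_root, or return None."""
    root = Path(doc_root).resolve()
    # Strip the leading "/" so the path is joined relative to the root.
    candidate = (root / request_path.lstrip("/")).resolve()
    # Refuse paths that escape the document root (e.g. via "..").
    if root not in candidate.parents:
        return None
    # A requested object does not necessarily exist.
    return candidate if candidate.is_file() else None
```

The server itself never interprets the file; it only asks the operating system whether the name resolves to a file it may read.
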
The Hypertext Transfer Protocol (HTTP)
- A set of rules that define how Web servers and browsers communicate with each other over a TCP/IP connection.
- The httpd program

What does a Web server not know?
- Hypertext links between documents.
- Inline images: browsers recognize links within a document and automatically initiate requests for them.
- Which links may point to a document.
- Whether the MIME type assigned to a document is correct.
- Other Web servers.

Multipurpose Internet Mail Extensions (MIME)
- A set of globally recognized data types

The Document Tree = Web documents + Tree organization

Web Documents
- HTML documents
- ASCII text
- Preformatted documents (e.g. PostScript)
- Images
- Sound recordings
- Movies
- Java applets
- ...

Serving different kinds of Web documents
- Server tells the client what kind of document is coming before sending the document.
  - The Content-type header
- Document files have extensions to indicate the kinds of information content.
- Server only knows a document as a sequence of bytes (except for scripts).

File extensions vs. document types
.html, .htm    HTML document
.txt           ASCII
.ps            PostScript
.gif           GIF image
.jpeg          JPEG image
.mpeg          MPEG video
.java          Java applet

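A minimal sketch of how a server might turn the table above into a Content-type header. The MIME strings are the commonly registered ones and are an assumption here, since the slides list only informal type names; `.java` is left out because the slides do not say which MIME type it should map to. The names `CONTENT_TYPES` and `content_type_for` are hypothetical.

```python
import os

# Extension-to-type table following the slide; MIME strings are assumed.
CONTENT_TYPES = {
    ".html": "text/html",
    ".htm": "text/html",
    ".txt": "text/plain",
    ".ps": "application/postscript",
    ".gif": "image/gif",
    ".jpeg": "image/jpeg",
    ".mpeg": "video/mpeg",
}


def content_type_for(filename: str, default: str = "application/octet-stream") -> str:
    """Pick a Content-type from the file extension alone."""
    ext = os.path.splitext(filename)[1].lower()
    return CONTENT_TYPES.get(ext, default)
```

Because the choice depends only on the extension, the server cannot tell whether the type it announces is actually correct for the bytes it sends.
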
The Accept and Content-encoding headers
- A client can optionally send a list of acceptable formats to the server, which will return "None Acceptable" if the type of the document to be served is not in the list.
- The server can also specify how a document is compressed using the Content-encoding header.

Serving HTML documents
- In general, an HTML document contains:
  - text to be displayed
  - anchors
  - links to images and other objects
- It is the browser that recognizes the text, anchors, and links inside an HTML document and takes appropriate actions.
- For each anchor or link, the browser issues a separate request.

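To illustrate the browser's side of this, here is a small sketch that scans an HTML document for anchors and inline images using Python's standard `html.parser`. The class and function names are made up for the example, and a real browser does far more; the point is only that the links are found by the client, not by the server.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect anchor targets and inline image sources from HTML."""

    def __init__(self):
        super().__init__()
        self.anchors = []  # followed only when the user selects them
        self.inline = []   # requested automatically by the browser

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.anchors.append(attributes["href"])
        elif tag == "img" and attributes.get("src"):
            self.inline.append(attributes["src"])


def collect_links(html: str):
    """Return (anchors, inline_images) found in the document text."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.anchors, collector.inline
```

Each entry in either list would become its own request to a server.
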
Scripts
- Sometimes a browser may request a document which is really a program, or script.
- A script is any program that is executed by the Web server.
- In general, a script translates the input from the client, calls other programs, and translates the output(s) for return.

Tree Organization
- How HTML documents link to each other
- How HTML documents are physically organized in the file system(s)

Different Tree Organizations
- One server, one tree
- Multiple servers, one tree
- Multiple servers, multiple replicated trees

Reasons for different tree organizations
- Several working groups
- Too many documents
- Load-balancing (for replicated trees)
- Mirror sites

The Hypertext Transfer Protocol (HTTP)
- Defines a simple request-response conversation, in particular:
  - how to phrase a request
  - how to phrase a response
- Does not define:
  - how the network connection is made or managed
  - how information is actually transmitted

The Request
An HTTP request consists of:
- The method (GET, HEAD, POST, etc.)
- Universal Resource Identifier (URI)
- The protocol version
- Other information (e.g. Accept)

HTTP Methods
GET      Return the object.
HEAD     Return only info. about the object.
POST     Send info. to be stored on the server.
PUT      Send a new copy of an existing object.
DELETE   Delete the object.
Others

HTTP Request: Example
GET /Stuff/Funny/silly.html HTTP/1.0
User-agent: NCSA Mosaic for the X Window System/2.5
Accept: text/plain
Accept: text/html
Accept: application/postscript
Accept: image/gif

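A rough sketch of how a server could decipher a request like the one above: split off the request line, then collect the header lines. The function name `parse_request` is hypothetical, and the sketch ignores details such as continuation lines.

```python
def parse_request(raw: str):
    """Split a request into method, URI, version, and a header dict."""
    lines = raw.replace("\r\n", "\n").split("\n")
    # First line: method, URI, protocol version.
    method, uri, version = lines[0].split()
    headers = {}
    for line in lines[1:]:
        if not line:
            break  # a blank line ends the header section
        name, _, value = line.partition(":")
        # A header such as Accept may appear several times; keep every value.
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return method, uri, version, headers
```

Applied to the example request, `headers["accept"]` would hold the four acceptable types in the order the client sent them.
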
The Response
An HTTP response consists of:
- A status line (HTTP version, status code, reason)
- Meta-information (e.g. Content-Type)
- The actual information requested

HTTP Status Codes
200   Document follows
301   Moved permanently
302   Moved temporarily
304   Not modified
401   Unauthorized
402   Payment required
403   Forbidden
404   Not Found
500   Server Error

HTTP Response: Example
HTTP/1.0 200 Document follows
Server: NCSA/1.4
Date: Tue, 4 Jul, 1997 19:17:05 GMT
Content-type: text/html
Content-length: 5280
Last-modified: Wed, 1 Jan 1997 01:00:02 GMT

(the contents of silly.html)

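The three parts of a response (status line, meta-information, the object itself) can be assembled as in the following sketch. The helper name `build_response` is made up, and only two headers are generated; a real server would add others such as Server and Date.

```python
def build_response(status: int, reason: str, body: bytes, content_type: str) -> bytes:
    """Assemble status line, headers, blank line, and body."""
    head = (
        f"HTTP/1.0 {status} {reason}\r\n"
        f"Content-type: {content_type}\r\n"
        f"Content-length: {len(body)}\r\n"
        "\r\n"  # the blank line separates headers from the object
    )
    return head.encode("ascii") + body
```

Note that the headers are sent first, so the client knows the type and length of the object before any of its bytes arrive.
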
How a Web server works
1. Wait for a new request
2. Request arrives
3. Server parses the request
4. Do the method requested
   - if success, send document
   - if failed, report status
5. Close file, close network connection

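The loop above can be sketched with Python's standard `socket` module. This is a teaching sketch under several assumptions: it supports only GET, reads the request with a single `recv`, and does not guard against paths that leave the document root. The names `handle_connection` and `serve_forever` and the default port 8080 are chosen for the example.

```python
import socket
from pathlib import Path


def handle_connection(conn: socket.socket, doc_root: Path) -> None:
    """Serve one request on an accepted connection, then close it."""
    with conn:  # closes the network connection when the block ends
        request = conn.recv(65536).decode("latin-1")
        try:
            # Parse the request line: method, URI, protocol version.
            method, uri, _version = request.split("\r\n", 1)[0].split()
        except ValueError:
            conn.sendall(b"HTTP/1.0 400 Bad Request\r\n\r\n")
            return
        target = doc_root / uri.lstrip("/")
        if method != "GET":
            conn.sendall(b"HTTP/1.0 501 Not Implemented\r\n\r\n")
        elif target.is_file():
            body = target.read_bytes()  # success: send the document
            head = f"HTTP/1.0 200 Document follows\r\nContent-length: {len(body)}\r\n\r\n"
            conn.sendall(head.encode("ascii") + body)
        else:
            conn.sendall(b"HTTP/1.0 404 Not Found\r\n\r\n")  # failure: report status


def serve_forever(doc_root: str, port: int = 8080) -> None:
    """Wait for requests and serve them strictly one at a time."""
    with socket.create_server(("", port)) as listener:
        while True:
            conn, _addr = listener.accept()  # blocks until a request arrives
            handle_connection(conn, Path(doc_root))
```

Calling `serve_forever("/some/doc/root")` would serve that directory on port 8080 until interrupted. While one request is being handled, every other client waits in the operating system's queue.
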
Exercise
- Start Netscape Navigator
- Browse the College's Web pages
- For each page, check Page Info to see what meta-information is shown.

One request at a time
- Many requests can arrive simultaneously.
- Many requests will be delayed.
- A request could wait for a long time even though it could be served very quickly.
- The queue could build up very quickly.
- Poor utilization of hardware resources.

Handling more than one request at a time
- Forking method
- Multi-threading
- Helper programs

Multi-threading
[Figure: a single httpd running several threads at once: responding to request 1, retrieving for request 2, parsing request 3, receiving request 4, and listening]

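A minimal sketch of the multi-threading approach, assuming one thread per accepted connection (real servers often use a fixed pool instead). The handler here is a placeholder that sends a fixed reply; the names `handle_connection` and `accept_loop` are hypothetical.

```python
import socket
import threading


def handle_connection(conn: socket.socket) -> None:
    """Placeholder handler: read one request, send a fixed reply."""
    with conn:
        if not conn.recv(65536):
            return  # client closed without sending a request
        conn.sendall(b"HTTP/1.0 200 Document follows\r\n\r\nhello\n")


def accept_loop(listener: socket.socket) -> None:
    """Keep listening; hand every accepted connection to its own thread."""
    while True:
        try:
            conn, _addr = listener.accept()
        except OSError:
            return  # the listening socket was closed
        threading.Thread(target=handle_connection, args=(conn,), daemon=True).start()
```

With this structure a slow client ties up only its own thread, so the listener goes straight back to accepting new requests.
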
Helper Programs
[Figure: a request is handed to Helper #1, which processes the request]

More than one Web Service on the same Server
- By default, httpd uses port 80, which requires superuser privilege.
- Other ports, e.g. 8080, 8081, can be operated by users.
- Each httpd on the same platform can have a different tree. They may provide different services.

Virtual Servers
- Multiple Web servers on a single platform, each one with a different IP address and a different domain name.
- Only available where the operating system has virtual host support.
- Low-cost option for a separate domain name.

Problems with HTTP
- Web servers generally deliver information, but have little ability to ensure that it is correct, and that the hyperlinks are correct.
- Each request requires a separate TCP connection.
- HTTP is stateless and does not support sessions.

Some solutions
- Web site management tools, e.g. FrontPage, help ensure the correctness and integrity of Web pages.
- Scripts and helper programs can overcome the lack of sessions in HTTP.
- Changes to HTTP, e.g. Connection: Keep-Alive

Web Scripts, Gateways, and Forms

Customized and Interactive Web Pages
- Make legacy information systems accessible via the Web, e.g. online library catalogs.
- Obtain user inputs.
- Customized pages, e.g. Your News Page.

Web Scripts
- A Web script is a program executed by the server upon request.
- The result of executing a script is returned to the client in HTML format.
- Scripts can:
  - access online databases
  - allow user-server interaction
  - construct Web pages dynamically

Web Scripts (cont.)
- A script may:
  - call other programs
  - contact other servers

Gateways
- A Web script that provides access to an online service, such as an existing database.
- Translates an HTTP request into a database/query language.

The Common Gateway Interface
- A standard which defines how scripts are executed by servers and how data are passed between a script and a server.
- Actually a suite of standards, one for each operating system environment.

What does httpd do with scripts?
- Determine that a request is for a script.
- Locate the script and check permission.
- Start the script and pass the client's input to the script.
- Read the script's output and pass it to the client.
- Error handling.
- Close network connection.

How to distinguish scripts from other Web objects?
According to specific rules laid down by the system administrator, e.g.
- All scripts are contained in a particular directory such as /script
- All files with extension .cgi

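The two example rules can be expressed as a short check. This is only a sketch: the constants mirror the slide's examples, a real server reads such rules from its configuration, and the name `is_script` is made up.

```python
import posixpath

SCRIPT_DIR = "/script"  # directory rule from the slide
SCRIPT_EXT = ".cgi"     # extension rule from the slide


def is_script(request_path: str) -> bool:
    """Apply the administrator's rules to decide whether to execute the object."""
    # Ignore any form input attached after "?" and tidy the path.
    path = posixpath.normpath(request_path.split("?", 1)[0])
    in_script_dir = path.startswith(SCRIPT_DIR + "/")
    return in_script_dir or path.endswith(SCRIPT_EXT)
```

Anything that fails both rules is treated as an ordinary document and sent as a sequence of bytes.
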
When Problems Occur
- A script should be robust, fast, and safe.
- It's useful to include error messages in a script so that it can tell the client when problems occur.

Interpreted vs. Compiled Scripts
- An interpreted script, like a Perl script, is actually executed by an interpreter program which reads and executes the script line by line.
- A compiled script, like a C program, runs faster and takes up less memory.

Costs of Using Scripts
- Whenever a script is called, resource usage at least doubles, because the script runs as a separate process alongside httpd.
- Script outputs are normally parsed by httpd before being sent to clients. httpd ensures that proper headers are there; if not, httpd adds appropriate headers, which is extra overhead.

Scripts and Forms
- An HTML form is just an HTML document with inputs.
- A client requests a form just like any HTML document.
- Once the form is filled in, the client may request a script to process the input by attaching the form data as arguments to the request.

The HTML Form
An HTML form should contain:
- The METHOD (GET or POST)
- The ACTION (the script)
- A SUBMIT button
- Input items:
  - Input boxes
  - Checkboxes
  - Radio buttons
  - etc.

Example: Form for CSO PH query
[Rendered form] Form for CSO PH query
This form will send a PH query to the specified ph server.
- PH Server
- Return name?
- Return phone?
- Return email?
At least one of these fields must be specified:

Form: The GET method
- Input is simply attached to the GET request, preceded by "?".
- At the server, the input is copied to the environment variable QUERY_STRING before the script is called.
- The script gets the input from QUERY_STRING.
- Some browsers attach input data to the pathname of the script.

Example
GET http://www.server.org:80/scripts/direectory_assistance?Jserver=ns.uiuc.edu&doname=yes&dophone=yes&Qname=&Qemailemail@example.com HTTP/1.0

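On the script side, reading GET input might look like the sketch below, which uses `urllib.parse.parse_qs` from the Python standard library. The function name `read_get_input` is hypothetical, and the `environ` parameter exists only so the function can be tried without a server.

```python
import os
from urllib.parse import parse_qs


def read_get_input(environ=os.environ):
    """Decode the form fields that httpd placed in QUERY_STRING."""
    query = environ.get("QUERY_STRING", "")
    # keep_blank_values keeps fields the user left empty, such as "Qname=".
    return parse_qs(query, keep_blank_values=True)
```

For the query portion `Jserver=ns.uiuc.edu&doname=yes&dophone=yes&Qname=` from the example above, the result maps each field name to a list of values, with an empty string for `Qname`.
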
Form: The POST method
- Input is passed to the server as an HTTP object.

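A sketch of how a script could read POST input. It assumes the usual CGI convention, which the slide does not spell out: the server passes the object on the script's standard input and its size in the CONTENT_LENGTH environment variable. The function name and the injectable `stdin` parameter are made up for the example.

```python
import os
import sys
from urllib.parse import parse_qs


def read_post_input(environ=os.environ, stdin=None):
    """Read exactly CONTENT_LENGTH bytes of form data and decode them."""
    stream = stdin if stdin is not None else sys.stdin.buffer
    length = int(environ.get("CONTENT_LENGTH") or 0)
    # Read only the announced number of bytes; the stream may not signal EOF.
    body = stream.read(length).decode("ascii")
    return parse_qs(body, keep_blank_values=True)
```

Unlike GET, the input never appears in the request line, so its length is not limited by what fits in an environment variable.
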
Converting Input and Output
- The script is responsible for parsing the user input and returning the result in HTML or some other suitable format.
- Forms and scripts must use the same set of field names.

Costs of using forms and CGI
- Processes: httpd, script, other programs
- Message passing: one request for the form, one request for the script, one response
- Data conversion (parsing)
- Different platforms execute scripts in different ways.

Web Server Performance
- The Web is built upon many other non-Web components; the performance of a Web server therefore depends heavily on these components.
- Performance evaluation is difficult.
- Web servers can get really busy as the number of clients is potentially huge.

Performance Measurement: What to measure?
- Connections per second
- Bytes per second (throughput)
- Round-trip time
  - The time from when the client begins to set up the connection to the Web server until the last byte of the response is received by the client.
- Performance of non-Web components, e.g. network, disks

How to measure?
- Field tests
- Laboratory experiments (Benchmarks)
- Instrumentation

Field tests
- Extract connections/sec, bytes/sec from server log.
- Round-trip time depends on where the clients are on the network and many other factors.
- Statistics on disks, CPU, and memory usage can be useful.

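As an illustration of a field test, the sketch below derives connections per second and bytes per second from log lines. It assumes the log is in Common Log Format (timestamp in brackets, quoted request, status, byte count), which the slides do not specify; the function name `summarize` and the sample lines in the note are invented.

```python
import re
from datetime import datetime

# Matches: [timestamp] "request" status bytes   (Common Log Format, assumed)
LOG_LINE = re.compile(r'\[(?P<when>[^\]]+)\] "[^"]*" \d{3} (?P<size>\d+|-)')


def summarize(lines):
    """Return (connections per second, bytes per second) for the log lines."""
    times, total_bytes = [], 0
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        times.append(datetime.strptime(match["when"], "%d/%b/%Y:%H:%M:%S %z"))
        if match["size"] != "-":  # "-" means no bytes were sent
            total_bytes += int(match["size"])
    if not times:
        return 0.0, 0.0
    # Avoid dividing by zero when all entries share one timestamp.
    elapsed = max((max(times) - min(times)).total_seconds(), 1.0)
    return len(times) / elapsed, total_bytes / elapsed
```

Three entries spread over ten seconds that transfer 10,000 bytes in total would give 0.3 connections per second and 1,000 bytes per second. Round-trip time cannot be recovered this way, since the log only records the server's view.
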
Laboratory experiments
- A realistic setup is needed.
- User requests can be simulated with Web pingers, which also keep logs.
- RTT can be measured.
- Synthetic workloads, called benchmarks, can be created.
- Stress testing.

Instrumentation
- Insert code into Web servers to keep more detailed logs.
- Inserted code could drain resources and affect server performance.
- Risk: too much (junk) data.

Performance of Web Servers
- httpd itself is simple enough, but httpd often spawns new processes in order to serve requests (e.g. CGI).
- Forking httpd can be expensive.
- CGI scripts could be a source of performance problems.
- Perl scripts are less efficient than compiled C programs.

Performance of Web Servers (continued)
- Data compression and encryption demand a lot of resources.
- Disks can easily be the slowest component of a Web server; caching documents in memory could help.

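One simple way to cache documents in memory is sketched below with `functools.lru_cache` from the Python standard library. This is an assumption about how caching might be done, not a description of any specific server; the function name `read_document` and the cache size of 128 are arbitrary.

```python
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=128)
def read_document(path: str) -> bytes:
    """Read a document from disk once; later calls are served from memory."""
    return Path(path).read_bytes()
```

The trade-off is staleness: if the file changes on disk, the cached copy keeps being served until `read_document.cache_clear()` is called or the entry is evicted.
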
Assignment (15%)
- Groups of one or two students
- Read the article in PC Magazine, May 8 issue: Web Servers
  http://www.zdnet.com/products/content/pcmg/1709/302244.html
- Choose one of the 9 servers reviewed in the article and follow the hyperlinks provided to find out more about the chosen Web server.

Assessment
- Each group will present its findings to the lecturer in a 20-minute session followed by 10 minutes of questions and answers.
- Criteria:
  - Evidence of information gathering.
  - Appreciation of the latest Web server technology and its trends.
  - Understanding of the technical details.
  - Clarity and structure of the presentation.
  - Ability to answer questions.