Decisions, decisions, decisions: standards for evaluating international statistics resources

International Information Update

Amy West
Government Publications Library, 10 Wilson Library, 309-19th Avenue South, University of Minnesota, Minneapolis, MN 55455-0414, USA

Journal of Government Information 29 (2002) 365–370
Available online 29 December 2003
1352-0237/$ - see front matter © 2003 Elsevier Inc. All rights reserved.
doi:10.1016/j.jgi.2003.11.005

1. Introduction

No institution ever has as much money as it could use for collection development. Librarians routinely evaluate new print publications and make purchasing decisions as to what best serves their clientele. However, because of ever-evolving software, new products, and new delivery methods, librarians are less comfortable evaluating electronic statistical publications. Increasing demand for these products and limited resources make this evaluation process critical.

The bulk of this article will focus on resources that are not freely available because the choices are more important and often mean weighing one product against another. There is so much variation in construction and delivery of international statistical resources that articulating a single set of standards by which to judge them would be inappropriate. Instead, potential users should apply a variable list of standards, some of which apply to all resources and some of which apply to only a few. This article describes such a combination of standards, noting resources that exemplify them.

Briefly, the standards groups are: (1) those independent of the resource; (2) those dependent on the resource itself; (3) those dependent on the resource in context of other related resources; and (4) those dependent on the resource in context of the customer's budget. At the end is a select list of resources annotated with their best features and biggest shortcomings.

2. Group 1 standards

2.1. Documentation

There should be two separate types of documentation for any resource. The first is documentation of the interface software, with clear installation instructions (for resources delivered via tangible media), a help file, and contact information in case problems occur during installation or operation. Generally, most resources have enough of this type of documentation to suffice. The second type of documentation covers the data itself. It should indicate where the numbers came from, whether and how they were modified, what calculations were then used to generate the statistics, and what every symbol used in the tables means.

World Development Indicators (WDI) has good documentation. On the CD there are separate files for the index of indicators, acronyms and abbreviations, a bibliography, the groups of economies, the primary data documentation, and the statistical methods. The [documentation] for the UNSTATS database takes advantage of hyperlinks to link individual definitions of indicators with their original source, so that users can view all the indicators contributed by a given source or all the sources for a given indicator.

2.2. Is it compatible with the local computing environment?

When a resource is delivered via a tangible medium, it will have to work with the local operating system while a security program is running. This isn't too much of a problem these days. However, there have been products designed to interface with the hard drive of the computer in such a way as to make the hard drive accessible by users. On a library's networked public workstation this would conflict with security standards.

3. Group 2 standards

Buyers may choose from so many resources with so many different uses, target audiences, and methods of delivery that it would be pointless not to use standards that are specific to each resource. Broadly, this means testing them to see if they live up to their advertising. For example, if a producer says the benefit of a given product is that it will provide remote access via Internet delivery, then the test of the product should be "Will this be really usable by an end-user using a standard dial-up connection to the Internet?" Even if the end-user has a 56k modem and the producer has the biggest, fastest server in the world, a standard connection travels on phone lines and phone lines transmit at 28.8 kbps. In essence, this means one should ask how many graphics are used, how large they are, whether there is behind-the-scenes programming that is invoked every time a page is loaded, and how many clicks it takes to get to the statistics.

UNSTATS is an excellent example of how to provide true remote access. Its interface is very simple and has minimal graphics. The interface pages, as delivered to the end-user, are straightforward HTML and involve no scripting apart from what may be used to initially generate and load the page. As a result, it is very fast.

Resources are also routinely called "easy to use." "Easy" is subjective, but some illustrative examples are available. SourceOECD makes good use of web design standards. The link to the Statistics section is easily spotted on the home page. When it is clicked, the end-user's eye will be drawn to the menu down the left, which contains links to broad subjects, e.g., Agriculture. Given that end-users typically think in broad terms, this makes for a good match. The end-user thinks, "I want stuff on agriculture. Oh, there is agriculture." Then the user clicks on Agriculture. SourceOECD also uses graphics to help orient the end-user, such as smiley faces to indicate whether the end-user's institution has access to a given database. SourceOECD's implementation of the Beyond 20/20 browser is also well done. Because the end-user's operational options are always in view in a menu on the left-hand side of the screen, it is easy to change variables, time periods, countries, or output options.

If a resource is supposed to allow users to download or save the information they've looked up, take a look at the file formats users can choose from. Microsoft Excel is the most widely used spreadsheet in the world and there should be a format compatible with it available to the end-user. One effective instance of this is the WISTAT CD-ROM from the United Nations. Users can interact with the statistics using Beyond 20/20, but there is a separate directory that contains Excel formats of all the tables in the database. Users who already know what they need can go straight to the Excel files, save a copy, and head back to their office to work, while users who do not know what they need can browse with Beyond 20/20.

Ideally, producers would include character-delimited ASCII text file formats in case the end-user's file is too big for a disk, the end-user wants to use her file with some software other than Excel, or the end-user has another need for a nonproprietary file format. WDI and the World Bank Africa Database use the same software and both offer users the option of saving as ASCII text, Excel, SAS, and more.

4. Group 3 standards

The third group of standards depends on the resource in context. What about it makes it worth having: content, querying software, and/or data structure?

This standard can be the hardest to judge. Most resources for international statistics start with the same sources, i.e., they start with data gathered by other international intergovernmental organizations. In the absence of a summary comparing sources, potential buyers have to go to the resources and try to compare them on a series-by-series basis. This is virtually impossible due to the massive size of most resources, the limited time available to buyers and, most importantly, structural differences in databases that mask similarities between sources. The UNSTATS and International Financial Statistics (IFS) databases are a good example of this.

When UNSTATS was introduced, its producers highlighted in particular its inclusion of IMF data otherwise only available on the IFS CD. For potential buyers of UNSTATS who had already bought IFS, it was then important to determine the extent of overlap because IFS is more expensive than UNSTATS. If UNSTATS provided enough data from the IMF, then buyers might decide to discontinue the IFS subscription.

The IFS database is a two-dimensional table in which every row represents a series, defined as a set of statistics for a given country over a period of years. There is a minimum of 30,000 rows in the IFS database. The number of observations would then be 30,000 times about 50 years plus an unknown number of quarterly and monthly periods, i.e., a minimum of 1,500,000 observations. The maximum is harder to calculate. Not every country will have a row for every statistic and IFS treats aggregated groups of countries as if they were individual nations. Also, there are several odd series names that are probably typographical errors, but which inflate the number of rows and thereby the number of series and observations.

On the surface, UNSTATS appears to provide just under 100 series (with the attendant larger number of observations). That implies that very little of the IFS database is captured by UNSTATS. However, it turns out that the database structure underlying UNSTATS is multi-dimensional, not two-dimensional. That means that all of the rows that would belong to, say, capital account credit, and which would be counted individually in IFS as described above, are collapsed in UNSTATS. In UNSTATS there will be a single series, like capital account credit, which has multiple dimensions including time and place. Thus, while there is a content difference between IFS and UNSTATS, it is not as extreme as it might seem, nor is it small enough to justify dropping an IFS subscription.

5. Group 4 standards

Given all of the above, is a resource worth the cost or not? The answer is, of course, "it depends." Certainly, any resource that is cheap will get considered and in all honesty will probably get judged less stringently simply because the financial stakes aren't as high. Conversely, any really expensive resource, even if it appears to be really, really good, could be dismissed out of hand.

WDI on CD-ROM is very reasonably priced, works well, has lots of content, and is fairly easy to use. WDI Online, to the extent that it performs as well as the free Data Query on the World Bank web site, looks to be significantly better. It integrates documentation, effectively exploits hyperlinks, and does not overdo the graphics. However, compared with the cost of both the network license for the CD and a similar web-delivered service such as UNSTATS, the cost is astronomical.

6. Conclusion

In an imperfect world where buyers have limited income, they must critically assess any resource that provides access to international statistical resources. Some of the standards for assessment will be applicable across the board, some will be specific to the resource, and some will be specific to the financial state of the buyer.

One test that all the producers of resources discussed above pass with flying colors is responsiveness to customers.
They have each taken critical comments constructively and moved to address them, and it has been appreciated by their users.

Table of Resources

Eurostat (web site)
  Best feature: [...] content
  Biggest shortcoming: Almost none of it is free

FAOSTAT (web and CD)
  Best feature: Lengthy time series, unique content
  Biggest shortcoming: User doesn't find out web downloads aren't free until after trying to download

Census Bureau International Database (web and downloadable software)
  Best feature: [...] time series of demographic data
  Biggest shortcoming: Labeling and descriptions on web site confusing

Foreign Labor Statistics (web)
  Best feature: Excellent documentation, public data query clearly directs user with numbered steps
  Biggest shortcoming: Public Data Query does have a download option, but it is not explicitly described that way and users could easily end up doing more work than necessary

International Financial Statistics (CD)
  Best feature: [...] content, extremely timely, lengthy time series, lots of series, low maintenance
  Biggest shortcoming: Interface initially confusing to users

LABORSTA (web)
  Best feature: Free, lengthy time series, includes worker injury and strike statistics
  Biggest shortcoming: Interface uses frames which don't meet accessibility standards

SourceOECD (web)
  Best feature: Provides trade by commodity by country by year; Beyond 20/20 implementation is excellent
  Biggest shortcoming: Too many graphics, too long to load each page, too many clicks to get to data, down too often

UNSTATS (web)
  Best feature: Fast, tells user coverage for series as a whole and for each country in each series
  Biggest shortcoming: Putting a link to the Advanced Data Selection on every screen falsely implies a context-sensitive function; user will not expect to have to start over from scratch

UN Demographic Yearbook Historical Supplement (CD)
  Best feature: 50-year time series in many formats, including raw data and sample SPSS data dictionaries
  Biggest shortcoming: Overly complex frames interface that squeezes target information into a very small frame

UNESCO Statistics (web site)
  Best feature: [...] available, stable, easy to use, clear directions for downloading
  Biggest shortcoming: Limited statistics as compared with other sources that draw on UNESCO data

WISTAT (CD)
  Best feature: Unique content that's hard to come by
  Biggest shortcoming: Beyond 20/20 software can be difficult to use on a public workstation that has other titles also using Beyond 20/20

World Bank Africa Database (CD)
  Best feature: Unique content that's hard to come by, uses the same software as World Development Indicators
  Biggest shortcoming: Not as much documentation as on World Development Indicators

World Development Indicators (CD)
  Best feature: [...] years of a huge number of series drawn from many different sources
  Biggest shortcoming: Software is a little clunky, initial results display is confusing
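
A worked illustration of the series-counting comparison made under the Group 3 standards may help buyers who want to repeat it for other products. The sketch below is only an illustration under stated assumptions: the 30,000-row minimum and the roughly 50 annual periods come from the article, while the four sample records, the country labels, and the function names are hypothetical and do not reflect the actual IFS or UNSTATS file layouts. It shows why the same data yields more "series" when each indicator-country pair is a row than when country and year are dimensions of one series.

```python
# Figures taken from the article's lower-bound estimate for IFS.
IFS_MIN_ROWS = 30_000
APPROX_ANNUAL_PERIODS = 50


def min_observations(rows: int, periods: int) -> int:
    """Lower bound on observations: one value per row per annual period."""
    return rows * periods


# Hypothetical sample records: (indicator, country, year, value).
records = [
    ("capital account credit", "Country A", 2000, 1.0),
    ("capital account credit", "Country A", 2001, 1.5),
    ("capital account credit", "Country B", 2000, 2.0),
    ("current account balance", "Country A", 2000, -0.5),
]


def count_series_two_dimensional(recs) -> int:
    """Two-dimensional layout: every indicator-country pair is its own row."""
    return len({(indicator, country) for indicator, country, _, _ in recs})


def count_series_multidimensional(recs) -> int:
    """Multi-dimensional layout: one series per indicator; country and
    year are dimensions of that series rather than separate rows."""
    return len({indicator for indicator, _, _, _ in recs})


print(min_observations(IFS_MIN_ROWS, APPROX_ANNUAL_PERIODS))  # 1500000
print(count_series_two_dimensional(records))                  # 3
print(count_series_multidimensional(records))                 # 2
```

The two counts differ even though both layouts hold the same four observations, which is why a raw series count is a poor proxy for content overlap between two products.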