TABLEDEX: A New Coordinate Indexing Method for Bound Book Form Bibliographies

ROBERT S. LEDLEY
Data Processing Division, National Bureau of Standards, Washington, D.C.

1. Introduction

The conventional bound book form of bibliography is an important class of publication, widely used in the sciences, arts, industry, business, and government. Publishing such a book is routine, and the form has the additional advantages of convenience and familiarity. However, a bibliography of this form has a serious disadvantage: the indexing system commonly used is inadequate for coordinate retrieval. That is, one cannot easily look up all articles¹ in the bibliography that are associated with several (three or more) descriptive words. In many cases this severely limits the usefulness and value of the bibliography as a reference source.

The purpose of this paper is to describe a new method, called Tabledex, for indexing a conventional bound book form of bibliography, which enables easy, rapid, and complete coordinate retrieval. That is, with this new indexing method, all articles listed in the bibliography and associated with any number of descriptive words can be found easily and rapidly. For example, suppose one wishes to look up, in a bibliography of current research reports, all articles on “laboratory aids to the differential diagnosis of hematological diseases.” With a conventional index one might start by looking under “hematological diseases” in the hope of finding “diagnosis”; if this failed, one might then look under “diagnosis” for “hematological diseases,” and most likely the “differential” and “laboratory aids” aspects would be lost entirely. If, on the other hand, the bibliography has a coordinate index of the kind described in this report, it is an easy, straightforward matter to look up all such articles: one automatically finds all articles associated with all of the words laboratory, aids, differential, diagnosis, hematological, and disease.

¹ We use the word “article” generically to mean all titles and references that may be listed in a bibliography.

Background. The coordinate retrieval concept has long been known in the field of indexing, and several methods for carrying out coordinate retrieval manually are well known. All of them require some manipulation of cards or pages, and unfortunately none is suitable for use as an index in a bound book. Coordinate retrieval has been accomplished with marginal punched cards: each card represents an article and each marginal position a word; on each card the words that characterize the corresponding article are recorded by notching the respective marginal positions, and retrieval is performed by separating from the deck the cards notched at the positions of a set of coordinated words. Another technique, often called the Peek-a-boo method, uses a card to represent a word, and the articles are represented by x-y coordinate positions on the card; on each card the articles characterized by the corresponding word are recorded by punching a hole at the respective x-y coordinates, and retrieval is performed by placing together the cards for a set of coordinated words and observing which x-y positions are punched on every selected card. Finally, in the Uniterm split-book method, each half page carries one or more words together with the numbers of the articles characterized by each word; the half pages corresponding to a set of coordinated words are scanned for common numbers. This special book departs far enough from the usual publishing format to cause extra problems for the publisher.
For our purposes the method has the further disadvantage that a single coordination involving more than two words requires referring to several pages and copying onto scrap paper the article numbers found at the intermediate steps.²

Advantages of Tabledex. This new method of indexing has two fundamental advantages. The first is that it keeps the convenience of the conventional book form while providing easy and rapid coordinate retrieval.³ No manipulation of cards is necessary; no artifacts, such as holes in the paper, are needed; it is not necessary to compare two different pages or to copy down intermediate numbers. All the work is done on a single page of the book. Of course, a bound book form of bibliography implies a closed indexing method, because no articles or words can be added to the bibliography once it is published.

² Of course, the Uniterm method is well suited to other purposes, particularly because it is essentially open-ended.
³ Some advantages of non-manipulative coordinate indexes have been discussed in the literature; cf. Charles Bernier, “Correlative Indexes I. Alphabetical Correlative Indexes,” American Documentation, 7 (1956).

Use of computers. The second fundamental advantage is that, with the new method, making a bibliography can be highly mechanized, and the bibliographer’s actual work is reduced to a minimum. A computer (such as a digital electronic computer) can, within a few hours, automatically organize, compile, and print the entire bibliography, including the Tabledex coordinate index; it will do everything described in Section 3 of this paper, down to making up the page formats. The bibliographer need only choose the words associated with each article included in the bibliography, which he must do whatever system is used. Thus the bibliographer’s work is reduced to making human decisions, and the computer does the rest. The process might involve three steps: (1) the bibliographer associates words with each article; (2) these associations are transcribed onto an appropriate medium and read into the computer; (3) within a few hours the computer prints out the pages of the book bibliography, which are then reproduced directly by the photo-offset process and bound into books.

It is difficult to estimate the number of pages such a bibliography will require. However, we might guess that a bibliography containing 10,000 articles and 6000 words (3000 words with an average of 2 synonyms per word; see below), with an average of 10 words associated with each article and print similar to that of Webster’s New Collegiate Dictionary (1953), would run to approximately 300 pages.

2. The new indexing method and its use

THREE PARTS

The book itself would consist of three parts.
Part I is the bibliography proper, or list of articles with the author, journal, etc., and if desired some descriptive material. The article list designates each article with an underlined article number. Part II is the alphabetical list of descriptive words by means of which the retrieval is accomplished. The word list designates each word with a nonunderlined word number. Part III contains the indexing tables. For purposes of illustration, we have reproduced in Section 5 a short bibliography composed of twelve articles in Part I (the article list) and twenty-two words in Part II (the word list). (This illustrative bibliography is obviously incomplete and is used as an example only.)
OCR for page 1224
THE TABLES

The actual work of looking up the desired articles is done with the tables in Part III. These contain two types of numbers: the underlined article numbers down the leftmost column, and the nonunderlined word numbers making up the rows. There is one table for each distinct word of the word list, and each table is numbered with that word’s number. The article numbers in the leftmost column are all the articles associated with that word, and the other numbers in each row are the other words that describe or are significant to that article.

USING THE NEW TABLEDEX INDEXING METHOD

The use of the tables is best shown by a simple example. Suppose one wants to find all articles on the application of nuclear theory, i.e., all articles each of which is associated with all of the following given words: application, nuclear, theory.

(1) From these given words, one first looks up in the word list the number designating each of them, and writes these given numbers in numerical order.
(2) One then takes the smallest of these given numbers and turns to the table of that number.
(3) One inspects this table, searching for the rows that contain all the other given numbers, and check-marks each such row. Since each row is labeled on the left with an underlined article number, the checked rows give all the articles one is looking for, i.e., all articles associated with all the given words.

For example, from the alphabetical word list one finds application 5.1, nuclear 4.3, and theory 4.4, which in numerical order are 4.3, 4.4, and 5.1. One then turns to Table 4.3, which is reproduced for convenience. Only the first two rows of Table 4.3 contain both 4.4 and 5.1, so these rows are checked. Hence articles 2.4 and 3.4, by Seidle and by Tove, are associated with all the given words.
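A minimal Python sketch of the three-step look-up above, for illustration only. The word list and tables are hypothetical (a four-article example, not the bibliography of Section 5), and storing a word number printed as c.i as the tuple (c, i) is an assumption made here so that Python’s tuple ordering matches the numerical order of word numbers.

```python
# Hypothetical Part II: word -> word number, with "c.i" stored as (c, i).
WORD_LIST = {
    "alpha": (2, 1), "beta": (3, 1), "delta": (3, 2), "gamma": (3, 3),
}
# Hypothetical Part III: word number -> [(article number, row), ...].
# Each row holds only word numbers larger than the table's own number.
TABLES = {
    (2, 1): [("1.1", [(3, 1), (3, 3)]), ("1.3", [(3, 2), (3, 3)])],
    (3, 1): [("1.2", [(3, 2), (3, 3)]), ("2.1", [(3, 2)]), ("1.1", [(3, 3)])],
    (3, 2): [("1.2", [(3, 3)]), ("1.3", [(3, 3)]), ("2.1", [])],
    (3, 3): [("1.1", []), ("1.2", []), ("1.3", [])],
}


def retrieve(given_words, word_list=WORD_LIST, tables=TABLES):
    """Return the articles associated with every one of the given words."""
    # (1) Look up the given word numbers and put them in numerical order.
    numbers = sorted(word_list[w] for w in given_words)
    smallest, others = numbers[0], numbers[1:]
    # (2) Turn to the table of the smallest given number.
    # (3) Keep (check) every row containing all the other given numbers;
    #     the row labels are the wanted article numbers.
    return [article for article, row in tables[smallest]
            if all(n in row for n in others)]


print(retrieve(["gamma", "alpha", "beta"]))  # ['1.1']
```

Only one table is read per query, however many words are given; the other given numbers are merely looked for within its rows.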
OCR for page 1225
FURTHER OBSERVATIONS

In Part I, the articles can be listed in any convenient order, e.g., alphabetically by author, title, or journal. (In our illustrative bibliography they are listed by author.) Article numbers need only be assigned sequentially, so that the article designated by a given article number can be located easily. In Part II, the words are listed in alphabetical order. To maximize the chance that a reader finds the word he is looking for in the word list, synonyms of the descriptive words are also included in the list, but each synonym carries the same word number as the word it stands for. In this sense, the word list is a kind of thesaurus. In Part III, as noted above, the article numbers that label the rows of a table correspond to all the articles associated with the word of the table, so a quick glance at a table shows all articles associated with the corresponding word. The word numbers appearing within a row correspond to words associated with the article of that row. For example, Table 4.3 shows that 2.4, 3.4, 3.3, and 2.1 are all the articles associated with word 4.3. It also shows that words 4.4, 5.1, 5.3, and 6.2 are associated with article 2.4. However, as discussed in more detail in the next section, these are not all the words associated with article 2.4.

CHECKING THE ROWS

After turning to the table of the smallest given word number, it is often easiest to check the rows for the other given word numbers one number at a time, as follows. One begins by searching each successive row of the table for the second smallest given word number, and places a check mark next to each row that contains it. One next searches for the third smallest given word number, but only in the rows already check-marked: where this number does not appear, the check is crossed off; where it does appear, another check is added.
One continues this process for each of the remaining given word numbers, each time considering only checked rows whose checks have not been crossed off. When all the remaining given word numbers have been processed, the rows that are checked and whose checks have not been crossed off correspond to the articles associated with all of the given words. In our example, after turning to Table 4.3 (see Section 5), one searches for the second smallest given word number, 4.4, and checks the first three rows. One then searches these rows for the third smallest given word number, 5.1, places another check next to the first two rows, and crosses off the check of the third row. Thus the rows for articles 2.4 and 3.4 (by Seidle and by Tove) are the desired rows, and these are all the articles associated with all three of the given words.

It is important to note that within each row the numbers always increase to the right, which considerably aids the search. Note also that the first word numbers of the rows increase downward; hence, once the first word number of a row is larger than the second given word number, neither this row nor any row below it need be searched.

Tracing paper can aid the process of making check marks and crossing them off. The paper is laid over the page on which the proper table appears and tucked into the binding; the table is read through the tracing paper, and the marks are made directly on it.

3. Compiling the new indexing system: Tabledex

THE MATRIX

A conceptual device that often clarifies retrieval problems is the article-word matrix. The rows of this matrix are labeled by the articles under consideration, and the columns by the descriptive words. An element of the matrix is 1 if the article of the element’s row is associated with the word of the element’s column; otherwise the element is 0. Of course, in practice such a matrix is never actually made, but the concept helps in visualizing many coordinate retrieval methods. The matrix for our illustrative bibliography is given in Fig. 1 to make the description of the system easier to follow.

FIGURE 1. The article-word association matrix. Short bibliography of articles on instrumentation (from the files of the Office of Basic Instrumentation of the National Bureau of Standards).

Marginal punched cards (where a card represents an article and the marginal positions represent words) can be visualized as the rows of this matrix, and Peek-a-boo cards (where a card represents a word and the holes represent articles) as its columns. Either way, given a word, the associated articles can easily be read from the matrix, and, conversely, given an article, the associated words can easily be read from it.

ARTICLE NUMBERS AND WORD NUMBERS

Although we have mentioned article and word numbers above, we have not yet discussed in detail how they are composed. Let us start with the article numbers. As noted above, article numbers need only be assigned sequentially, following the article list in Part I of the bibliography, so that the article designated by a given article number can be located easily. For example, in our illustrative bibliography the article number gives the page and line on which the article appears: article 2.3 is located on page 2, line 3 of the bibliography.

The word numbers, on the other hand, are assigned in a very specific manner, in order to reduce the tables to a minimum size. The part of a word number to the left of the decimal point equals the number of articles in the bibliography that are associated with the word; the part to the right of the decimal point distinguishes between different words associated with the same number of articles. For example, if three different words are each associated with 5 articles, their numbers would be 5.1, 5.2, and 5.3.
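The article-word matrix and the word-number rule just described can be sketched in Python. The bibliography below is hypothetical; the text only requires that the digits after the decimal point tell apart words with equal counts, so the alphabetical tie-breaking used here is an assumption, as is storing c.i as the tuple (c, i).

```python
from collections import Counter, defaultdict

# Hypothetical bibliography: article number (page.line) -> descriptive words.
ASSOCIATIONS = {
    "1.1": {"alpha", "beta", "gamma"},
    "1.2": {"beta", "gamma", "delta"},
    "1.3": {"alpha", "gamma", "delta"},
    "2.1": {"beta", "delta"},
}


def association_matrix(associations):
    """Return (words, rows): rows[article][j] is 1 if the article is
    associated with words[j], otherwise 0."""
    words = sorted(set().union(*associations.values()))
    rows = {article: [1 if w in ws else 0 for w in words]
            for article, ws in associations.items()}
    return words, rows


def assign_word_numbers(associations):
    """Give each word the number c.i, stored as (c, i): c is the number of
    articles associated with the word (its column sum in the matrix) and
    i tells apart words sharing the same c (assumed alphabetical here)."""
    counts = Counter(w for ws in associations.values() for w in ws)
    groups = defaultdict(list)
    for word in sorted(counts):
        groups[counts[word]].append(word)
    return {word: (c, i)
            for c, group in groups.items()
            for i, word in enumerate(group, start=1)}


words, rows = association_matrix(ASSOCIATIONS)
# Reading a column gives the articles for a word (the Peek-a-boo view);
# reading a row gives the words for an article (the marginal-card view).
column = words.index("delta")
print([a for a, row in rows.items() if row[column]])  # ['1.2', '1.3', '2.1']
print(assign_word_numbers(ASSOCIATIONS))
```

Because c is a column sum, each word number also states how many rows its table will have.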
The fact that the word number gives the number of articles associated with the word can often be used directly to great advantage.

MAKING THE TABLES

As already mentioned, there is one table for each word number, and each table has one row for every article associated with the word. Each row is labeled on the left with the article number. For example, consider the table for the word nuclear, which bears the word number 4.3. The matrix shows that this word is associated with articles 2.1, 2.4, 3.3, and 3.4; hence these article numbers label the rows of Table 4.3. The nonunderlined numbers of a row consist of all the word numbers associated with the row’s article that are greater than the word number of the table, arranged in increasing order from left to right. For example, consider the row for article 2.4 in Table 4.3. The matrix shows that this article is associated with word numbers 2.1, 4.2, 4.3, 4.4, 5.1, 5.3, and 6.2. Since the word number of the table is 4.3, however, only word numbers 4.4, 5.1, 5.3, and 6.2 (those greater than 4.3) appear in the row. Finally, the rows of a table are rearranged so that the first column of word numbers increases from top to bottom.

The tables are made in this way in order to reduce them to a minimum size. Since the given word numbers are first arranged in numerical order and the table of the smallest of them is used for retrieval, the word numbers sought within a table are never smaller than the word number of the table. Hence only word numbers larger than the table’s own need be included in its rows. In addition, tables with many rows, i.e., tables for words associated with many articles, will be very narrow, because such tables correspond to words with large word numbers, so their rows contain few word numbers. (See, for example, Table 6.2.) Conversely, tables with many word numbers in a row will probably correspond to small word numbers, and hence will have few rows. (See Table 1.1.) The critical tables are those for words associated with an average number of articles. For these, the search within a row is simplified by arranging the numbers in increasing order, and in many cases the need to search every row is removed by making the numbers in the first column increase from top to bottom.

Making the tables for a particular bibliography might be difficult by manual means, but it is easy with the aid of a high-speed digital electronic computer. The input to the computer need only be the list of articles with their respective associated words. The computer will automatically assign both the article numbers and the word numbers, and form all the tables.
The output from the computer will be the completed bibliography, Parts I, II, and III, printed with the correct page formats and ready for photo-offset duplication and binding into books. All this can be accomplished by the computer within a few hours. In Appendix I we have tried to indicate the mechanical feasibility and advantages of the new technique. In Appendix II we consider an additional problem: what to do if no article in the bibliography is associated with all of the given words. In Appendix III we suggest an important alternative approach to making index tables, using words instead of numbers; this approach is still based on the principles developed in this article. Appendix IV discusses the advantages and disadvantages of population vs. substantive meaning for the entries in the tables.
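The table-construction rules described under MAKING THE TABLES can be sketched in Python. The input (article numbers mapped to word numbers, with c.i stored as (c, i)) is hypothetical, and placing rows that end up with no larger word number at the bottom of their table is an assumption; the text does not say where such rows go.

```python
# Hypothetical input: article number -> its word numbers, "c.i" as (c, i).
ARTICLE_WORDS = {
    "1.1": [(2, 1), (3, 1), (3, 3)],
    "1.2": [(3, 1), (3, 2), (3, 3)],
    "1.3": [(2, 1), (3, 2), (3, 3)],
    "2.1": [(3, 1), (3, 2)],
}


def make_tables(article_words):
    """Build the Part III tables: word number -> [(article, row), ...]."""
    tables = {}
    for article, numbers in article_words.items():
        for number in numbers:
            # A row lists only the article's word numbers that exceed the
            # table's own word number, increasing from left to right.
            row = sorted(n for n in numbers if n > number)
            tables.setdefault(number, []).append((article, row))
    for rows in tables.values():
        # Rearrange rows so the first column of word numbers increases
        # downward; empty rows go last (assumption). list.sort is stable,
        # so rows with equal first numbers keep their article order.
        rows.sort(key=lambda item: (not item[1], item[1][:1]))
    return tables


tables = make_tables(ARTICLE_WORDS)
for number in sorted(tables):
    print(number, tables[number])
```

In the result, the table for word number c.i has exactly c rows, and the largest word numbers yield the narrowest tables, as the text predicts.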
4. Applications of Tabledex

1. For rapidly preparing bibliographies of current periodical literature

The new indexing system would not only provide a means of highly mechanizing the production of such a publication but would also provide a coordinate index. Consider, for example, the monthly “Current List of Medical Literature” compiled by the National Medical Library, in which articles from thousands of journals are indexed each month. Mechanizing some of the bibliographers’ tasks might bring advantages such as further reducing delays in compilation and publication and eliminating human errors. In addition, a coordinate index can often enhance the value and effectiveness of such a publication, for example when one is trying to find articles on subjects such as “laboratory aids to the differential diagnosis of hematological diseases” or “incidence of post-partum hemorrhage due to retained placental fragments.”

2. In handbooks and other compilations

A coordinate index of this type can often enhance the value of a handbook. For example, it might be very difficult to find quickly, within a handbook of mathematical formulas and tables, particular formulas that give “continued fraction expansions of arctan x converging rapidly for 0<x<1,” or a “Fourier series expansion for reciprocals of Jacobian elliptic functions.”

3. As an annual index for a current journal

Many scientific periodicals, such as Chemical Abstracts and The Physical Review, have annual, semiannual, or quarterly cumulative indexes. If a punched card were made when each abstract or article is published, such cumulative indexes could be compiled and printed automatically in coordinate form within a few hours, simply by running the cards through a computer as described above. The cost of preparing such a cumulative index would therefore probably be reduced, while its value would be enhanced by its becoming a complete coordinate index.

4. For a survey of current foreign literature

It is often important simply to know that an article on a particular subject exists. For example, it is claimed that the Russians published several articles about their Sputnik I before it was launched, articles of which the Western World was unaware. A periodical survey or list of current Russian scientific literature that included a coordinate index might have enabled the proper people to know of the existence of these articles even if the articles had not yet been translated. Such a survey and coordinate index is relatively inexpensive and can make United States scientists aware of Russian articles on subjects of particular and specialized interest. Only the titles of the articles would be translated, and the coordinate index would of course be in terms of English words. A scientist who had found, by means of the coordinate index, a Russian article on a subject of particular interest might then have it translated; otherwise the article could have been entirely overlooked. In this way access can be gained to hundreds of Russian journals that may contain important articles but that, owing to their nature and various other circumstances, would in general not be translated at all.

5. In coordinating current research and development contracts

A bound book coordinate index of current research and development contracts, placed on the desks of research scientists, could significantly aid the coordination of a large-scale research program, stimulate a more extensive and healthy exchange of information between scientists, and prevent costly, time-consuming, and unnecessary duplication. Up to now, large research and development programs, such as those of the Department of Defense, have usually kept such indexes on punched cards at a central installation, where they are interrogated on request. Such a central service has serious disadvantages: a scientist must wait in line for his request to come up for processing, the request often involves preparing official correspondence by mail or memorandum, and efficiency is low because the retrieval is done by someone other than the scientist himself. These disadvantages can be eliminated by publishing periodic coordinate indexes in bound book form and sending them to the contractors and research scientists.

6. In conjunction with open-ended card type coordinate files

Open-ended coordinate indexes of current periodical scientific or other literature are very frequently kept and compiled on punched cards by organizations such as the Library of Congress. Each month many new articles or reports are received and added to the index. At the present high rate of research publication, such indexes can rapidly become too large to handle. For example, an index of 100,000 reports may receive over 20,000 additional reports each year, so it is desirable to “close off,” or take out of the current card index, about 20,000 of the older reports each year. With the methods described above, whenever a section of such a current card index is to be closed off, a bibliography that includes the new coordinate indexing system can be printed automatically within a few hours simply by sending the cards directly through a computer. This bound book bibliography can then be mailed to libraries and laboratories throughout the country. In this way (1) the current open-ended punched card coordinate index can be kept to a reasonable size, (2) past information can still be retrieved in coordinate index form from the bound book bibliographies, and (3) the bound book bibliographies can be compiled and printed completely and automatically from the cards within a few hours on a computer.

7. As a rapid method for machine retrieval

Searching literature by means of computing machines is an important aspect of information retrieval. If the information is recorded in our tabular form, the retrieval process can gain great technical advantages. These advantages stem from the fact that one who starts to retrieve from the tables effectively has a head start over other techniques. For example, suppose the article-word associations are recorded on magnetic tape, and retrieval is carried out by searching the tape. Consider how three modes of recording the article-word associations on the tape compare in their effect on retrieval (see Fig. 2).

FIGURE 2. Three modes of recording information on magnetic tape.

First, suppose each article is grouped with its associated words, as in (1) of Fig. 2; then, to retrieve, every group on the tape must be searched, comparing the words of each group with the given words, and the articles associated with all of the given words are recorded. Second, suppose each word is grouped with its associated articles, as in (2) of the figure; then the groups belonging to each of the given words must be searched and the common articles recorded. Finally, suppose the tables themselves were recorded on the tape, as in (3) of the figure. Here only a single table need be searched, namely the one
APPENDIX I

Illustration on an actual-size table

The possible effectiveness of the method is best seen by trying a retrieval on an actual-size table. Although it is difficult to estimate the size of an actual table, say for a bibliography of 10,000 articles and 6000 words, we have attempted to illustrate such a table in Fig. 3. Suppose one wishes to find all articles associated with the given word numbers (already arranged in ascending order) 17.16, 24.03, 35.31, 53.26, 72.07, and 89.03. The first step is to turn to Table 17.16. Next, the nonunderlined numbers of the rows are searched for the second given word number, 24.03. Note that the rows below the indicated arrow, whose first word numbers are greater than the second given word number, need not be searched. The rows that contain 24.03 are then searched for 35.31, and so forth. The check marks as they finally stand are shown in the figure. Only articles 136.03 and 32.02 are associated with all six of the given word numbers.
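The step-by-step checking used here, including the early stop at the arrow, can be sketched in Python. The five-row table is hypothetical (it is not the table of Fig. 3), word numbers c.i are stored as tuples (c, i) by assumption, and the binary search is one way of exploiting the fact that numbers increase to the right within a row.

```python
from bisect import bisect_left

# Hypothetical table for word number 5.1, stored as (5, 1): five rows, one per
# associated article, with the first word numbers increasing downward.
TABLE_5_1 = [
    ("4.2", [(6, 2), (7, 5), (9, 1)]),
    ("7.1", [(6, 2), (9, 1)]),
    ("2.6", [(6, 4), (7, 5), (9, 1)]),
    ("9.3", [(7, 5), (9, 1)]),
    ("5.5", [(8, 1)]),
]


def contains(row, number):
    """Binary search, relying on word numbers increasing to the right."""
    i = bisect_left(row, number)
    return i < len(row) and row[i] == number


def check_rows(table, given):
    """Return (articles still checked, rows examined for the 2nd number)."""
    if len(given) == 1:
        return [article for article, _ in table], 0
    second = given[1]
    checked, examined = [], 0
    for article, row in table:
        # Early stop (the role of the arrow): a row whose first word number
        # exceeds the second given number cannot contain it, nor can any
        # row below it.
        if row and row[0] > second:
            break
        examined += 1
        if contains(row, second):
            checked.append((article, row))
    # Each further given number is sought only in rows still checked;
    # rows lacking it have their checks crossed off.
    for number in given[2:]:
        checked = [(a, r) for a, r in checked if contains(r, number)]
    return [article for article, _ in checked], examined


print(check_rows(TABLE_5_1, [(5, 1), (6, 2), (9, 1)]))  # (['4.2', '7.1'], 2)
```

In the example call only two of the five rows are examined for the second given number before the early stop applies.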
APPENDIX II

Finding articles associated with all except one, or all except two, or all except three, …, or all except n of the given words

It may often happen that, given a list of words, no article in the bibliography is associated with all of them; that is, applying the above procedures to the index tables of a bibliography produces no article associated with all of the given words. In such cases, one of two paths can be taken. The first is to relax the conditions, i.e., drop one or more words from the given word list. In many instances, reconsidering the purposes of the particular retrieval problem shows that some of the given words are less important than others and can be dropped from the list. Of course, the shorter the list of given words, the greater the chance that articles in the bibliography are associated with all of them.

FIGURE 3.

On the other hand, it may often occur that all of the given words seem equally important. In such a case it would be desirable to see whether any articles are associated with all except one of the given words, or with all except two of them, and so forth. To accomplish this, the second path is followed; the procedures for it are described in this section. For example, suppose one wishes to find all articles associated with all of the following given word numbers (see Fig. 3): 17.16, 24.03, 25.09, 42.42, 72.07, and 81.08. It happens that no article is associated with all six of these given word numbers. To determine whether any articles are associated with all except one of them, the straightforward method would be to omit one number at a time in succession; this would require a total of seven “passes” through the tables: six passes, each omitting one number, plus the first pass. It so happens that in the example no article in the bibliography is associated with all except one of these given word numbers. Hence we would continue and try to find all articles associated with all except two of the six given word numbers; this would require fifteen more passes, or a total of twenty-two passes through the tables, which would certainly be laborious.

However, there is another scheme for producing the desired results. With it, only three passes are needed instead of seven, and only six instead of twenty-two. In fact, if there are n given word numbers, the total number of passes P needed to find all articles associated with at least n − k of the given word numbers is

$$P = \sum_{i=0}^{k} \binom{n}{i}$$

by the straightforward successive omission method, but only

$$P = \frac{(k+1)(k+2)}{2}$$

by the new technique to be described. Although the new scheme is rather complicated to describe in words, it is exceedingly simple to perform once the method becomes clear.

The new scheme is best described in terms of the concept of a pass, which we now explain. A pass is associated with the indexing table of the smallest of a particular set of given word numbers. In the following discussion the smallest of the given word numbers is called the first given word number, the next smallest the second given word number, and so on. Hence, for the set of numbers in our example, 17.16 is the first given word number, 24.03 the second, 25.09 the third, and so on. There is a first pass, a second pass, a third pass, and so on, for each table.
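The pass counts quoted above can be checked with a few lines of Python. The successive-omission count follows directly from the text: one pass for every subset of at most k omitted numbers. The count for the new scheme is an interpretation, assuming k + 1 passes are made on the table of the first given word number, k on that of the second, and so on down to 1; it reproduces the figures of three and six passes stated for six given word numbers, but it is a sketch, not the author’s own formula.

```python
from math import comb


def passes_successive_omission(n, k):
    """Passes to find articles matching at least n - k of n given word
    numbers by trying every way of omitting at most k of them."""
    return sum(comb(n, i) for i in range(k + 1))


def passes_new_scheme(k):
    """Assumed count for the pass scheme: (k + 1) + k + ... + 1 passes,
    independent of n."""
    return (k + 1) * (k + 2) // 2


for k in (1, 2):
    print(k, passes_successive_omission(6, k), passes_new_scheme(k))
# 1 7 3
# 2 22 6
```

The gap widens quickly: the successive-omission count grows like a sum of binomial coefficients in n, while the assumed pass count depends only on k.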
The symbol (T 17.16/P2) denotes the second pass for Table 17.16, where 17.16 is understood to be the smallest of a previously given set of word numbers. The first pass for a table differs slightly from subsequent passes, so we begin by describing the first pass by means of an example. The first step is to draw on tracing paper as many vertical lines as there are given word numbers, in our case six, and then to overlay the paper on the indexing table of the first given word number, in our case 17.16, tucking the tracing paper into the binding of the book so that it will not easily slip. The vertical lines are labeled at the top with the successive given word numbers in order, starting with the second given word number. Henceforth these vertical lines will be called the given word number lines (see Fig. 4), and each line will be denoted by its label. Now search each row of the table for the second given word number. For each row that contains it, draw a horizontal line from the second given word number line to the third given word number line. These horizontal lines will henceforth be called the article row lines. Next, in those rows that contained the second given word number, i.e., those rows that now have an article row line, search for the third given word number; where it appears, extend the corresponding article row line from the third given word number line to the fourth given word number line. Next, in those rows whose article row lines have just been extended, search for the fourth given word number; where it appears, extend the article row line another segment. Continue in this manner until no more article row lines can be extended.

FIGURE 4. First pass.

The work for our example, using the table in Fig. 3, is drawn in Fig. 4. Only the rows for articles 136.03, 3.36, 106.21, 32.02, 76.15, and 137.43 include the second given word number 24.03, and hence article row lines are drawn for these articles only. Searching these rows next, we find that the rows of articles 32.02 and 137.43 include the third given word number 25.09; the corresponding article row lines are extended. Searching these two rows, we do not find the given word number 42.42; hence the procedure is concluded. This is the process denoted by the first pass for a table.

Now consider a pass other than the first, for example the second pass for the table in Fig. 3. The first step is to relabel the given word number lines, this time starting with the third given word number (see Fig. 5). Now, in those rows that do not have an article row line (i.e., those rows that did not contain the second given word number), search for the third given word number, and wherever it is found draw the first segment of the article row line, from the newly labeled third given word number line to the newly labeled fourth given word number line. Next, in those rows whose article row line ends on this fourth given word number line, search for the fourth given word number; where it is found, extend the corresponding article row line another segment; and so forth, as before. The procedure for passes other than the first differs from that for the first pass only in that the given word number lines must be relabeled, and initially only those rows that have no article row line from a previous pass are searched. Note that to remember for which given word number a row is to be searched, one merely observes the label on the given word number line where its article row line ends (see Figs. 4 and 5).

FIGURE 5. Second pass.

The work for the second pass appears in Fig. 5. As the figure shows, we first search rows 27.05, 148.16, and 65.38 for the given word number 25.09 and find that only row 65.38 includes it; an article row line is drawn for 65.38 accordingly. Next, searching rows 136.03, 3.36, 106.21, 76.15, and 65.38 for word number 42.42, we find that only rows 3.36, 106.21, and 65.38 include it, and their article row lines are extended. Then, searching rows 3.36, 106.21, 32.02, 65.38, and 137.43 for word number 72.07, we find that rows 32.02 and 137.43 include it, and their article row lines are extended accordingly. Finally, searching rows 32.02 and 137.43 for word number 81.08, we find that it is included in neither row, and the pass is concluded.

The process for a third or fourth pass is similar to that for the second pass. The third pass for our example is illustrated in Fig. 6.

FIGURE 6. Third pass.
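The tracing-paper procedure of Figs. 4 to 6 can be mimicked in a few lines. This is a minimal sketch, not the author's method: the class name `PassDiagram` is hypothetical, each table row is modeled as a set of word numbers, and each article row line is represented only by the index of the vertical line on which it ends. The row contents below hold only the given word numbers, reconstructed from the narrative of Figs. 4 and 5 and the third-pass results stated with Fig. 6; where the text is silent (for example, whether row 136.03 contains 72.07, or row 27.05 contains 42.42), the number is assumed absent.

```python
from typing import Dict, FrozenSet, List, Optional


class PassDiagram:
    """Hypothetical model of the pass diagram laid over one index table.

    rows  -- article number -> set of word numbers listed in that article's row
    given -- the given word numbers; the smallest is the table's own number,
             and the remaining ones label the vertical lines.
    """

    def __init__(self, rows: Dict[str, FrozenSet[str]], given: List[str]) -> None:
        self.rows = rows
        self.given = sorted(given, key=float)
        # end[a] = index of the vertical line where article a's row line ends;
        # None means no article row line has been drawn for that row yet.
        self.end: Dict[str, Optional[int]] = {a: None for a in rows}
        self.passes_done = 0

    def run_pass(self) -> List[str]:
        """Perform the next pass and return the rows drawn past the last labeled line."""
        if self.passes_done >= len(self.given) - 1:
            raise ValueError("this sketch supports passes 1 to n-1 only")
        self.passes_done += 1
        j = self.passes_done
        # Relabeling: in pass j, vertical line i carries the label given[i + j].
        labels = self.given[j:]
        # Rows with no line yet are searched for the label of line 0.
        for a, words in self.rows.items():
            if self.end[a] is None and labels[0] in words:
                self.end[a] = 1
        # Sweep left to right: a row whose line ends on line i is searched for that label.
        for i in range(1, len(labels)):
            for a, words in self.rows.items():
                if self.end[a] == i and labels[i] in words:
                    self.end[a] = i + 1
        done = (a for a, e in self.end.items() if e is not None and e >= len(labels))
        return sorted(done, key=float)


# Hypothetical reconstruction of Table 17.16 (given word numbers only).
rows = {
    "136.03": frozenset({"24.03"}),
    "3.36": frozenset({"24.03", "42.42", "81.08"}),
    "106.21": frozenset({"24.03", "42.42"}),
    "32.02": frozenset({"24.03", "25.09", "72.07"}),
    "76.15": frozenset({"24.03"}),
    "137.43": frozenset({"24.03", "25.09", "72.07"}),
    "27.05": frozenset(),
    "148.16": frozenset(),
    "65.38": frozenset({"25.09", "42.42", "81.08"}),
}
given = ["17.16", "24.03", "25.09", "42.42", "72.07", "81.08"]
diagram = PassDiagram(rows, given)
print(diagram.run_pass())  # first pass:  []
print(diagram.run_pass())  # second pass: []
print(diagram.run_pass())  # third pass:  ['3.36', '32.02', '65.38', '137.43']
```

Keeping the line ends between passes is what makes each later pass cheap: relabeling shifts every label one line to the right, so each row tolerates one more missing word per pass without re-searching the words it has already matched.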
A pass is said to be brought to a successful conclusion if, when the procedure ends, at least one article row line has been drawn past the largest given word number line of that pass. The articles corresponding to these successful article row lines are called the results of the pass. For example, (T 17.16/P3) is brought to a successful conclusion, as illustrated in Fig. 6, and the results of the pass are 3.36, 32.02, 65.38, and 137.43.

Now we are ready to describe the new scheme. Consider again as an example Table 17.16 of Fig. 3 and the given word numbers listed there. The first step is to carry out pass (T 17.16/P1). This pass yields no results, which indicates that no article is associated with all of the given word numbers, and so we proceed to find all articles associated with at least five of the six given words. To do this, we carry out pass (T 17.16/P2) and pass (T 24.03/P1), where for the latter pass the set of given word numbers is understood to be the original set with 17.16 omitted. The results of these passes are the desired answer. If again there are no results, we proceed to find all articles associated with at least four of the six given words. These are found as the results of passes (T 17.16/P3), (T 24.03/P2), and (T 25.09/P1), where for the last pass the set of given word numbers is understood to be the original set with both 17.16 and 24.03 omitted. The general procedure is now clear. For instance, to find all articles associated with at least three of the six given words, we would list the results of the following passes: (T 17.16/P4), (T 24.03/P3), (T 25.09/P2), and (T 42.42/P1) (see Fig. 7).

FIGURE 7.
Articles associated with    Passes whose results are listed
All                         (T 17.16/P1)
All except one              (T 17.16/P2)   (T 24.03/P1)
All except two              (T 17.16/P3)   (T 24.03/P2)   (T 25.09/P1)
All except three            (T 17.16/P4)   (T 24.03/P3)   (T 25.09/P2)   (T 42.42/P1)
…                           …

The second and third passes are, of course, extensions of the first pass; even though they are illustrated here in three different figures, in practice the work is done on the same “pass diagram.”

APPENDIX III
Using words instead of numbers

The preceding discussion was based on the use of numbers to represent the words. There is, however, an alternative method, resting on the same principles, that uses the words themselves. In this method, Part II (the word list) no longer associates numbers with the words; instead, it simply becomes a list of all possible words by which one can look up an article.
Step 1 thus reduces simply to making certain that the given words (the words by which one wishes to retrieve articles) are listed in the word list. Step 2 consists of putting the given words in alphabetical order and turning to the table associated with the first word. The tables are now labeled with the words themselves; there is one table for each word, and the tables are arranged in alphabetical order. Step 3 is similar to the original method, except that words are found in the table where nonunderlined numbers were found before. The words are in alphabetical order within each row, and the first column of words is in alphabetical order.

As an example of the use of the words themselves, an illustrative bibliography based on the matrix of Fig. 1 is given on the following pages. Part I is, of course, the same as that given in Section 5, Part I. Part II omits the synonyms and word numbers. In Part III the tables contain words instead of word numbers. To find all articles associated with the application of nuclear theory, we put the three words in alphabetical order: application, nuclear, theory. We then turn to the application table and look up nuclear. It appears in two rows, 2.4 and 3.4. We look up theory in these two rows and find it in both, so the desired articles are 2.4 and 3.4, by Seidle et al. and by Tove.

The advantage of using words in the tables is twofold: first, looking up words is a familiar process in our society; second, the use of words enables the user to make associations that he might otherwise not think of. If each row of a table listed all the words associated with the article of that row, the associative aspect would be complete. In fact, in this case the set of tables would closely resemble an ordinary index, but one that is exceedingly complete. Such a close resemblance to an ordinary index greatly simplifies the use of the tables for the user.

PART I. THE ARTICLE LIST

PAGE 1
Article No.
1.1 Abrahams, A.P. Autoradiographic determination of radioactivity in rocks. Nucleonics 15:85–86 Mar 1957
1.2 Aravindakshan, C. A simple arrangement for obtaining optical transforms of crystal structures. J. Sci. Instr. 34:250 Jn 1957
1.3 Gasstrom, R.V. A very fast pulse-height analyser with independent uptake, sorting and storage of information. Nuclear Instruments 1:75–79 Mar 1957
1.4 NBS Circular 580. Bibliography on ignition and spark ignition systems. Nov 1 1956

PAGE 2
Article No.
2.1 Nicholls, J. Alpha-scintillation monitor for hands and clothing. Nucleonics 15:80, 81, 83, 84 Mar 1957
2.2 Pope, M.I. An automatically recording vacuum balance. J. Sci. Instr. 34:229–232 Jn 1957
2.3 Powell, D.A. An apparatus giving thermogravimetric and differential thermal curves simultaneously from one sample. J. Sci. Instr. 34:225–227 Jn 1957
2.4 Seidle, F.G.P., et al. Modification of the Brookhaven fast chopper. Nuclear Instruments 1:92–93 Mar 1957

PAGE 3
Article No.
3.1 Senior, D.A. The Kerr cell, a high speed electro-optical shutter, Pt. II. Instr. Pract. 11:471–476 May 1957
3.2 Smith, B.O., and Grimshaw, A.G. A pneumatic level indicator. Instr. Pract. 11:469–470 May 1957
3.3 Stockendal, R., and Bergkvist, K.E. Evaporation device for beta-spectrometer samples. Nuclear Instruments 1:53–54 Jan 1957
3.4 Tove, Per-Arne. Electronic time analyzer applied to the measurement of the half-lives of metastable nuclear states. Nuclear Instruments 1:95–100 Mar 1957
PART II. THE WORD LIST

PAGE 1: analysis, application, concept, counting, design, differentiation
PAGE 2: England, evaluation, gas, hysteresis, instrumentation
PAGE 3: mass, Netherlands, nuclear, theory, thermal, versatility

PART III. THE INDEX TABLES
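The word-based lookup of Appendix III (the application, nuclear, theory example) can be sketched in code. This is a hypothetical illustration: the article identifiers and word assignments are invented, since the matrix of Fig. 1 and the Part III tables are not reproduced here, and we assume that each row of a word's table lists the article's alphabetically later words, which is all that the alphabetical lookup order requires.

```python
from collections import defaultdict
from typing import Dict, List, Set

Tables = Dict[str, Dict[str, List[str]]]  # word -> {article -> later words in its row}


def build_word_tables(articles: Dict[str, Set[str]]) -> Tables:
    """Build one table per word; each row lists the article's alphabetically later words."""
    tables: Tables = defaultdict(dict)
    for article, words in articles.items():
        ordered = sorted(words, key=str.lower)  # case-insensitive, so "England" sorts under e
        for pos, word in enumerate(ordered):
            tables[word][article] = ordered[pos + 1:]
    return dict(tables)


def coordinate_lookup(tables: Tables, given: List[str]) -> List[str]:
    """Steps 2 and 3: open the table of the alphabetically first word, then filter its rows."""
    first, *rest = sorted(given, key=str.lower)
    table = tables.get(first, {})
    hits = list(table)
    for word in rest:
        hits = [a for a in hits if word in table[a]]
    return sorted(hits)


# Invented data for illustration only (not the bibliography's actual matrix).
articles = {
    "art-1": {"application", "nuclear", "theory"},
    "art-2": {"application", "instrumentation", "nuclear"},
    "art-3": {"nuclear", "theory"},
    "art-4": {"application", "design", "nuclear", "theory"},
}
tables = build_word_tables(articles)
print(coordinate_lookup(tables, ["theory", "application", "nuclear"]))  # ['art-1', 'art-4']
```

Because the given words are searched in alphabetical order, a row never needs to list words that sort before its table's word; listing every word in every row, as the text suggests, would only add associative value, not change the results.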
APPENDIX IV
Population vs. substantive meaning of the table entries

The entries in the tables can be chosen either to convey information about the population of articles with respect to the words, or to convey some substantive meaning associated with the words themselves. Thus, in Section 5 of this paper, the numbers assigned to the words reflected how many articles are associated with each word. In Appendix III, on the other hand, the words themselves were used in the tables, so that the table entries conveyed the meaning of the words. In this appendix we discuss the relative advantages and disadvantages of the two approaches.

Population-based entries have the practical advantage that tables with many rows have few columns, and tables with many columns have few rows. As a result, adjacent tables are of relatively similar shape, which is of some advantage in laying out page formats, and the number of entries that need be searched also tends to be reduced. When many words are to be coordinated, it is in general most efficient to choose as the first word for the coordination the one that will eliminate the largest number of possible articles. In the procedure presented above, this is precisely what is accomplished when the table corresponding to the smallest word number is looked up. Continuing, it is best to choose as the second word for the coordination the one that will eliminate the most articles from the possibilities that still remain. In general, the probability of choosing as the second word the one that fills this requirement is maximized by choosing the word with the next smallest word number, and so forth. All of this increased search efficiency depends on the use of population-reflecting numbers for the table entries. Finally, there is a possible psychological advantage in using such numbers: since a number has no substantive meaning, the procedure may be carried out with more precision as a strict routine. When entries with substantive meaning are used, one may be more tempted to deviate from the correct method by changing words midway, and this will give wrong results.

The disadvantage of population-based entries is that, if a bibliography is updated and republished with additional articles, the numbers for the words will change; i.e., the number for a given word will in general differ between the newer and the older versions of the bibliography. If, on the other hand, the table entries have substantive meaning, the entry symbol for each word will be the same in any updated bibliography as it was in the original. Thus a person who has memorized the symbol for a word will not be thrown off when the bibliography is updated. In addition, the substantive symbol may be more familiar to the user of the bibliography, as is the case in Appendix III, where each word was used as its own symbol. Also, when one is looking in a table, substantive symbols found as entries may have suggestive and associative properties that are completely lost when population-based symbols are used.
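The efficiency argument made earlier in this appendix (coordinate first on the word with the fewest articles, then on the next rarest) can be illustrated by counting row searches. This is a minimal sketch under stated assumptions: each word's articles are modeled as a Python set standing in for its table, the set sizes play the role of the population-reflecting word numbers, and both the cost model (one search per surviving row for each additional word) and the sample populations are hypothetical.

```python
from typing import Dict, List, Set


def rows_searched(postings: Dict[str, Set[int]], order: List[str]) -> int:
    """Count row searches when the given words are coordinated in `order`.

    The first word's table supplies the candidate rows; each later word costs one
    search per candidate still remaining, and the candidates shrink to the matches.
    """
    candidates = set(postings[order[0]])
    searched = 0
    for word in order[1:]:
        searched += len(candidates)
        candidates &= postings[word]
    return searched


def rarest_first(postings: Dict[str, Set[int]], words: List[str]) -> List[str]:
    """Order words by population, fewest articles first (smallest word number first)."""
    return sorted(words, key=lambda w: len(postings[w]))


# Hypothetical populations of 2, 5, and 20 articles.
postings = {
    "rare": {1, 2},
    "medium": {1, 2, 3, 4, 5},
    "common": set(range(1, 21)),
}
words = ["common", "medium", "rare"]
print(rows_searched(postings, rarest_first(postings, words)))  # 4
print(rows_searched(postings, words))                          # 25
```

The set of articles finally retrieved is the same in either order; only the amount of searching differs, which is the advantage the population-based numbering secures automatically.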
The kind of entry to be employed in any particular bibliography depends on the purpose and use of the bibliography. If the bibliography is small, is to be updated periodically, and is used extensively and frequently by the same individuals, then entries with substantive meanings should perhaps be used. If, on the other hand, the bibliography is very large and will not be revised, and if in general an individual will not refer to it so often that he unconsciously memorizes the entries for the different words, then population-based entries are probably most efficient.

ACKNOWLEDGMENTS

The author would like to acknowledge the encouragement and advice of Richard Dahl of the Office of the Navy Judge Advocate General, John Scherrod and Charles Gottshalk of the Library of Congress, and Mary Stevens, Joshua Stern, and Robert Elbourn of the National Bureau of Standards. The author would also like to express his appreciation for support from the Information Systems Branch of the Office of Naval Research and the Office of Scientific Information of the National Science Foundation.