
SUMMARY OF DISCUSSION

Introduction

The task of the discussion panel, as outlined by Dr. Gilbert King, was to try to present a coherent picture of the status of developments in information storage and retrieval systems, developments which are in a state of flux. The subject is not yet a science, and there was no attempt to classify the submitted papers in a formal arrangement. Rather, the discussion was divided into two pieces, called the “what we have to do” section and the “how to do it” section.

To a scientist trying to make use of information, such questions arise as, are there really any good retrieval systems, and do scientists in the laboratories really use these systems? Dr. King said that one cannot point to a system which has appeal and stimulation and which has been spontaneously adopted.

One reason is that in present systems the process of putting material into the store or library can be considered a translation process, from the material language of the text into the index language of the system. The user is forced to ask his questions in this primitive language. We must introduce some kind of syntax, so that scientists can communicate with libraries.

The devisers of systems seem to think that scientists do their work by asking specific questions, but this is not so. What the scientist wants from the library is suggestions, parallelisms, and stimulation. To get this, he must be able to ask questions freely. But as already noted, he must learn this peculiar language invented for him by librarians. So one big problem, not discussed in the papers in this area, is an operations analysis of the type of questions which are asked of a library. How can the users learn to speak in the language employed to organize the information in the store?

In Dr. King’s opinion, we shall not make much more progress until we have a rigorous mathematical model of storage and retrieval systems, whether mechanical or manual. In the case of the simpler indexing and classification schemes the model is very simple. Each document is assigned a binary number. Each digit corresponds to some description or index head. The problem of classification is merely a question of how to arrange an arbitrary set of binary numbers. Retrieval is merely the question of searching for binary numbers with “ones” in specified locations. More importantly, we realize immediately that the language of the library and the language that the interrogator is required to use are both the simple language of binary numbers.
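The binary-number model described above can be illustrated with a minimal sketch. The index heads, document identifiers, and function names below are hypothetical and chosen only for illustration; the sketch assumes each index head owns one bit position and that a question is itself a binary number with ones in the required locations.

```python
# Hypothetical index heads; each one owns a single bit position.
INDEX_HEADS = ["iodine", "organic", "synthesis", "spectroscopy"]
BIT = {head: 1 << i for i, head in enumerate(INDEX_HEADS)}


def encode(heads):
    """Return the binary number with a one for every index head assigned."""
    number = 0
    for head in heads:
        number |= BIT[head]
    return number


def retrieve(store, required_heads):
    """Return ids of documents whose numbers have ones in all specified locations."""
    mask = encode(required_heads)
    return [doc_id for doc_id, number in store.items() if number & mask == mask]


# A toy store: document id -> binary number (illustrative data only).
store = {
    "doc-1": encode(["iodine", "organic"]),
    "doc-2": encode(["organic", "synthesis"]),
    "doc-3": encode(["iodine", "organic", "spectroscopy"]),
}

print(retrieve(store, ["iodine", "organic"]))
```

Note that the question passes through the same `encode` step as the documents, which is the point made above: the interrogator is obliged to speak the same language of binary numbers as the library.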



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




Some papers discuss a model of retrieval in which the binary numbers are arranged as a partially ordered set, but even these models reveal the lack of texture in the organization. The relationship of documents can only be expressed by the limited properties of binary numbers. Their environment is not at all subtle and therefore can hardly be expected to provide sophisticated search procedures. It is extremely important that more complex retrieval systems be invented, and formulated in a similar kind of mathematical model.

Types of questions

It is ten years since there was a similar conference and in most disciplines of science there are no more information services available for the scientist now than there were ten years ago. One of the main reasons for this, according to Dr. Dyson, may be that we are thinking on too broad a basis. He said that it would seem advisable to devote ourselves to the solution of a certain number of smaller practical problems in the various disciplines, rather than trying to cover the whole realm of scientific thought.

One of the first things to settle is what kinds of questions we want to answer in new systems, which present systems do not enable us to answer. These questions would be different in the different disciplines. In organic chemistry, for example, we cannot now find all the organic compounds of iodine in regular indexes, without reading through the whole index. This kind of search would be useful if it could be carried out fairly easily. One reason why this type of search has not been made possible is that we have not yet made a sufficient study of language. Nobody has yet ascertained whether a basic language in organic chemistry can be created, to simplify the question of putting vast amounts of knowledge into mechanical records.

Another type of search we would like to make in organic chemistry is the correlative search, where one searches for substances with certain common properties.
The language of new information systems must be designed so that these searches can be made.

Mr. Perry emphasized that we must not fail to consider the importance of the questions to be asked and the form these questions might take. Answering questions is the objective in any system, and economic benefits result from the ability to answer questions. It is the margin of benefit over costs that will be decisive in the selection and operation of new systems. But we must beware of false assumptions that impede or misguide efforts to emphasize the practical side in the development of documentation systems. One assumption is that the use of the literature is not much influenced by the available facilities for getting at it. But even the questions posed in the library are strongly influenced by the available means for getting the questions answered. Many persons ask questions in terms of what they think a conventional subject index can answer, rather than what a machine searching system can answer. To design our new systems on the basis of the kind of questions that now are directed to the library is to overlook what we can do. Of course this does not suggest that we should abolish abstracts and indexes, because there will always be questions for which such tools will be extremely useful.

The forms of questions asked may be very diverse, Mr. Perry said. Often, people ask questions with no idea of what they really need. In mechanized systems, a preliminary search can be run to orient oneself in a field. At the present time and probably for the foreseeable future, we shall be operating in terms of first identifying what is of probable interest. This involves relating the questions to the means established for getting answers: looking under the proper subject headings in an index, using the right class numbers in a classification scheme, devising instructions for programming in a mechanized system. By using the wrong headings or class numbers, or devising a wrong program, one can run a search and report that nothing on the subject exists in the system. This kind of error will not be changed by mechanized systems.

Functions of information systems

Suppose, now, we were suddenly confronted with the ideal retrieval machine, so that the mechanics of retrieval is easy. Would we know what to do with it? In Dr. Swanson’s opinion, the benefit of thinking in terms of such a machine would be the ability to separate the conceptual and intellectual problems from the strictly hardware problems. Then, at some point in the future we will be able to tell the machine designers what is actually wanted in the way of a machine.
We do not know whether the operations we now know how to perform with machines would lead to a better system than the present methods of getting information out of libraries. The operations we now know how to perform can be enumerated: we can search for words; we can give the machine a dictionary and search for synonyms and near synonyms; we can search for logical combinations of words and require that they occur in certain proximity to one another. Even granting that we could do all this, what would be our procedure for evaluating such a system? No soundly designed experiments have been carried out to measure the efficiency of retrieval. One of the critical questions in evaluating a retrieval system, Dr. Swanson stated, is what still remains in the library of interest that was not recovered by any of the known methods of retrieval. We may have a false sense of security that we have retrieved everything just because we get a large response to our question.

We now have the tools for carrying out experimental observations on what we would do once we are presented with the hypothetical ideal machine. Once we study the retrieval process we may decide to keep the separation into indexing and then retrieval operations. It might well be more economical to perform once certain operations which are identical to all questions, and reserve for the retrieval processes those operations which vary from question to question.

In considering the practical aspects of the subject, Dr. Sanford commented that it is interesting to note the competitive nature of the problems discussed in the papers in this area. The scientist requires a fast, exact information service. He also requires a service which will provide bibliographic search. Unfortunately, the best organization of material to serve these two basic needs is different in each case. In the storage of information, choices also have to be made; terms are collected under item codes or item codes are collected under terms. We have the added rivalries of subject matter. Various kinds of scientific subjects require different approaches in analysis and organization.

We now inquire as to what to do, and this is presumably the question for which this conference was called. The members of this panel are sure of this, that an integrated effort will grow and mature only in a climate of active cooperation among us.

Dr. Sanford went on to suggest that in seeking an answer as to what to do, we might consider system compatibility as the first objective. We already have satisfactory answers to a large segment of our problems, but only when each problem is considered alone. When we try to go from one individually good system to another, we find we lack a bridge. Language itself might be used as the missing bridge. Let us explore this use of more direct language.
The need for compatibility is one of our most immediate and fundamental problems. This compatibility must be not only between disciplines but also among the tools for handling information. We must also consider the need for economy, not only economies in cost but also in effort. Many electronic developments are ready to serve us as soon as we know how to use them. Considering machines as additional tools of our trade, their first contribution may be to provide the missing links between our systems. We might be able to continue to use our own separate classification schemes without the penalties of insularity. Machines might also provide the economies of handling which we need. The evolution of regional, national, and international centers and clearinghouses, equipped with electronic devices already existing, could provide bridges between our present specialized disciplines and methods. These centers could bring order out of our present kaleidoscope of compatible problems, if we are willing and able to think and work together.

Mr. Cleverdon said that in establishing cooperation between librarians and those who come into documentation from other fields, the librarians have the right and duty to tell what their problems are and to direct the efforts of these specialists toward the solutions of the problems. The librarians must make sure first that the problems are not of their own making. For example, many work in the vast field of the applied sciences. It appears probable that the useful life of the information in these collections is reasonably short. The time for retaining references in catalogs should be equally short. The librarians should make certain they are not creating their own difficulties by allowing subject catalogues to become large and complicated. Only when all irrelevancies are cleared out can the true problem be posed to those who are trying to help.

We already know enough of the capabilities of machines in retrieval to know that within the limits of the information put into them and if correctly programmed, they can do everything a human can do in a clerical sense, but we are justified in asking what will be the cost. And we can doubt that machines will for many years to come solve all the problems in information storage and retrieval.

For example, Mr. Cleverdon continued, one comment on the papers in this area is that many appear to be fascinated by the ultimate goal and are unwilling to appreciate the efforts required at the input end. Our really hard work is in indexing. This is a chore which should be lightened as far as possible. The indexer has to decide on the subject content of the document. Then the chosen subject must be translated into the language of the indexing system. More work should be done from the input side, for the work of indexing is the most important subject of information retrieval.
The more straightforward this task can be made, the more efficient and satisfactory will be the retrieval.

Dr. O’Connor focused his comments on questions that have been raised about certain difficulties in using mechanical systems. The Crane-Bernier paper, for example, raised these questions about manipulative search systems in general. Certainly subject indexes are easily accessible compared with access to a machine documentation center. Searching in a printed index or through a card catalog has certain virtues in keeping one in immediate contact with the subject matter. But many of the problems suggested for manipulative systems can be solved. Thus it is possible to avoid the danger of using too many search terms in a question, and thereby not finding relevant papers, by a device called sub-searching. By this one searches first for all documents containing all the terms one specifies, then for those containing all but one of the terms, those with all but two of the terms, etc., down to those documents containing just one of the terms. Sub-searches may be done automatically on the IBM 101 machine or on computers. Otherwise the sub-searches can be accomplished by making a number of separate searches. They are also a way of getting related information while looking for relevant documents. The paper by Opler and the work of the Cambridge Language Research Unit and the U.S. Patent Office all touch on this question of relevant browsing in a mechanical system.

Dr. O’Connor said that other questions raised as to the difficulties in using mechanized systems, such as objections as to programming time or delay in getting information, can similarly be answered when one is more experienced in the use of machines, what they can do, and how they can be used to good advantage. As for costs, we do not have enough cost information now to know whether a mechanized search costs more than a non-mechanized search. Even if it should cost more, the information found may be worth it.

Classification and indexing

Research on classification going on in England was presented in some papers. Mr. Farradane said that the workers there define classification as the ordered representation of an idea or a complex of concepts in such a way that a unique symbolization of it is given in an ordered sequence and it is retrievable because it has a position which can be defined. That is, classification is not only a hierarchical division; it can be indexing as long as the index is sufficiently ordered to give a sequence with a unique and recognizable order.

Facet analysis divides a subject into a number of different groups of terms of the same kind. Then ideas are expressed by reassembling these terms in various combinations with considerable freedom. This method has been proved a useful tool in various practical cases in England. Within a special field of knowledge a special classification can be made along faceted lines which gives a considerable degree of flexibility and usefulness in handling the field.
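The facet analysis just described can be sketched in a few lines. This is only an illustration under stated assumptions: the facets, their terms, the preferred facet order, and the function names are all hypothetical and are not taken from any of the British schemes discussed. The sketch shows how a fixed facet order yields one symbolization per subject, whatever order the terms are supplied in.

```python
from itertools import product

# Hypothetical facets: each is a group of terms of the same kind.
FACETS = {
    "material": ["steel", "aluminium"],
    "process": ["welding", "casting"],
    "property": ["fatigue", "corrosion"],
}
# Assumed preferred order in which facets are cited.
FACET_ORDER = ["material", "process", "property"]


def class_mark(**terms):
    """Build the ordered tuple of (facet, term) pairs that represents a subject."""
    for facet, term in terms.items():
        if term not in FACETS.get(facet, []):
            raise ValueError(f"{term!r} is not a term of facet {facet!r}")
    return tuple((facet, terms[facet]) for facet in FACET_ORDER if facet in terms)


def all_subjects():
    """Enumerate every subject formed by taking one term from each facet."""
    choices = [FACETS[facet] for facet in FACET_ORDER]
    return [tuple(zip(FACET_ORDER, combo)) for combo in product(*choices)]


# The same subject gets the same mark regardless of how it is stated.
print(class_mark(property="fatigue", material="steel"))
print(len(all_subjects()))
```

Because the marks are plain tuples built in one fixed order, sorting them gives a single defined position for each subject, which is the sense of "ordered sequence" used above.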
In putting terms in a preferred order, one assumes some reason for that order which is really an implied relation between the terms. This problem, of relational connection between ideas, is a field of study which requires a great deal more attention. All this work requires more basic research, basic experimental investigation of what we are dealing with and what we are trying to do.

When asked what is the ordering relationship practiced by the British workers, Mr. Farradane replied that inclusion is one of the possible relations; others are hard to define but the preferred sequence gives the implied relations between terms.

Dr. O’Connor commented on the differences between some British and American approaches to retrieval systems. In America we tend to have an unorganized list of descriptive terms, or a list organized only to permit the user to search it to find the terms he wants to use in his question. In the British systems, the list of index terms is so organized as to impose an order on the documents to which those terms are assigned. The intent is that the order imposed on the document collection will be useful for searching.

In answer to a question as to how to handle relations with the peek-a-boo retrieval system described in his paper, Dr. Gardin reported on three ways of expressing them. A simple solution is to use inflected terms, specifying whether a term is used as subject or object in the action. That gives a first degree of abstraction useful when you do not have to take into consideration the exact type of action taking place. If you want to indicate what happens between the subject and the object you have to add a third term, usually a verb. To indicate verbal relations, the type of action actually taking place, you can use two very general positions, he said, such as positive action and negative action. There is a third type of relational term which concerns topographical relations, indicating terms such as “to the left,” “above,” and “interweaved.” These three classes of terms are treated just as if they were entities as a being or noun in a lingual system.

Use of large-scale machines

In our efforts to organize information, we have again not yet taken full advantage of present capabilities of automatic data-processing machines in the analysis of material previous to even starting a system. Mr. Luhn pointed out that we can employ machines to great advantage to do this analysis, to give us material on which to build a sound system. We can use the machine to encode this material. A searching procedure may be tested by setting up experiments where a computer simulates conditions under which a system will operate and tests it to find optimum conditions of operation.
This is a saver in time and money for designers of retrieval systems.

We want to assign to machines the run-of-the-mill chores. As we get more experience we can give the machine more and more of what we have to do. There is, however, a residual which may never be turned over to the machine. This is the place where the librarian plays the role of interpreter. The interrogator approaches the librarian, gives him the questions and ideas, and the librarian then communicates with the machine.

Another point brought out by Mr. Luhn was that in our attempts to mechanize indexing and retrieval we are imitating systems designed originally for human capabilities. Machines have entirely new capabilities. Unless we utilize them to the fullest, we will never solve this problem. It will take many years before we learn what machines are and how they would prefer to act, but this is the general direction in which we will have to go.

According to Dr. Moore, in our discussions we can find three levels of what machines can do: present-day capabilities; capabilities after machines have learned to read printed text; and capabilities after machines have learned how to think. Misunderstandings develop unless we are quite specific as to which level we are talking about.

Present-day capabilities require us to do a lot of translating ahead of time into special code. Systems in use at the present time are based on a large amount of clerical transformation and critical analysis of the information to be put into the system. There are some problems definitely amenable to such machines. A good example is the organic structural formula problem, of identifying and searching for compounds with certain fragments, Dr. Moore said. Again the concordance type of analysis, making an exhaustive list of every occurrence of every word, is something that is feasible and easy for a machine to do.

Dr. O’Connor noted that there are ways in which machines can help in the production of a traditional search tool, a printed subject index for example. Or a computer suitably programmed might take an input of index entries and determine what would be the most efficient and the most easily searched arrangement of the index entries.

Machines that can read are possible, in Dr. Moore’s opinion, but their application to scientific material will pose some problems. Thus if you want to read Chemical Abstracts you need to have over 900 symbols at your disposal. A device to read these would be much harder to build than one that can read 50 or 100 symbols.

The machines that “think” are logically possible but whether or not they are economically feasible is another matter.
If we can build a machine that can think and still keep it interested in doing drudgery and the bibliographic jobs we now assign to machines, then we will be on an entirely different level of discussion than we are on now. In the area between the two levels, of machines to which you give explicit codes and those which can read, there is an interesting character recognition problem, that of recognizing bibliographic references—to decide whether or not two references are to the same document. Dr. Moore observed that if the ingenious idea of a citation index is to be carried out economically, it would depend on the machine taking bibliographies of the technical papers, sorting them internally, and comparing to see which are the same references. Then the machine could type out an inverse listing, giving you all the articles which cite a given document. It would seem that this is an easier recognition problem than reading ordinary letters.
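The inverse listing Dr. Moore describes can be sketched as follows. This is a minimal illustration, not a description of any system discussed at the conference: the normalization rule used to decide that two references are the same is a crude assumption, and the article names and references in the sample data are invented.

```python
from collections import defaultdict


def normalize(reference):
    """Reduce a reference to a crude comparison key (assumed rule:
    lowercase, punctuation ignored, runs of spaces collapsed)."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in reference)
    return " ".join(cleaned.split())


def citation_index(bibliographies):
    """Invert article -> references into reference key -> citing articles."""
    index = defaultdict(set)
    for article, references in bibliographies.items():
        for reference in references:
            index[normalize(reference)].add(article)
    # Sort keys and citing articles so the listing has a defined order.
    return {key: sorted(citing) for key, citing in sorted(index.items())}


# Invented bibliographies for illustration only.
bibliographies = {
    "Paper A": ["Smith, J. (1950). J. Hyp. Chem. 12, 34"],
    "Paper B": [
        "SMITH J 1950 J Hyp Chem 12 34",
        "Jones, K. (1952). Hyp. Rev. 3, 7",
    ],
}

for key, citing in citation_index(bibliographies).items():
    print(key, "<-", citing)
```

The hard part in practice is the `normalize` step: real references differ in abbreviation, author order, and pagination, so a rule this simple would miss many matches. The inversion itself is only a sort and a comparison, as noted above.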

On the general subject of mechanized indexing and abstracting, even if it is not feasible on present computers in terms of cost, it might very well be feasible on the next generation of computers, according to Dr. O’Connor. One of the questions that occur on reading of Harris’ and Luhn’s work is how to handle synonyms. A purely formal frequency count might be thrown off as to which are main subject words. And leaving some adjuncts together with the centers might throw off the frequency determination of the main subject somewhat. Mr. Luhn’s thesaurus approach would take care of the synonym problem.

Two further complications arise from a mechanical index. Some articles might deserve as an indexing term a word not contained in the article. By an authority list, the product of the mechanized indexing procedure might have such additional words added to it. Again, an article might use a particular word but the vocabulary of the system might prefer another one. This also can be handled by a mechanized authority list.

Another complication is that mechanized indexing finds in a paper what was important to the author. What happens if there is something in the paper not important to the author but of importance to the indexer? One possibility is to have a list of words and phrases expressing the interests of a particular collection, which the machine looks for in the papers. If this word or phrase occurs even once, it should be picked up as an indexing term.

Research on natural languages

“Why was indexing invented in the first place?” asked Mr. Luhn. It was so that we would not have to scan everything to recognize those things of interest to us. But machines can scan at fantastic speeds today and theoretically do not need indexing systems. (This is the basic assumption of automatic abstracting.)
If we can take the inquiry in the open language of the inquirer and, by means of the proverbial black box, translate this inquiry into the many variations that might occur in the stored text, then we could ask the machine to lead us to similar passages in the collection.

Dr. Swanson suggested that basically, we should give more attention to the storage of and operations on raw natural language texts. The separation into indexing and then retrieval becomes artificial if the capability of our machine is great enough. This does not mean we should abandon study on the organization and classification of documents. But in proportion far less work is now being done on how to handle raw text or natural language. What do we now know about the potential of operating upon natural language? We should conduct more research on natural language text without regard to the division of input and output. A number of papers in this area were addressed to this problem, in particular the papers of Harris and of Masterman et al.

A number of people allege, Dr. Swanson continued, that there exists an analogy between mechanical translation and information retrieval. We can say that they both take as their starting point operations on natural language. In the Harris paper linguistic transformations are discussed from the point of view of using them for the purpose of more compact storage or in the indexing process. They might first be used on natural language recorded in a computer in order to discover the relationship and usefulness of such transformations in the retrieval process.

The paper by Masterman et al. purports to discuss the question of analogy between mechanical translation and information retrieval. Dr. Swanson asked, if one can translate from a foreign language into a pidgin English and then improve the translation in the target language by means of the thesaurus, why it is not easy to translate from English to English using the thesaurus. It would seem to be a worthy objective to establish before attempting translating from a foreign language into English. He also asked if, in translating from one language to another via an interlingual thesaurus, this thesaurus would have to be invariant with respect to language. That is, the thesaurus headings would have to be clearly identifiable in the source language and in the target language, even though the list of synonyms that appears under each would be different in the two languages.

Dr. Swanson went on to say that both mechanical translation and information retrieval take as their starting point operations on natural language and they are both somewhat concerned with the synonym problem. They might have one more attribute in common, namely that there may be certain approaches (e.g., the thesaurus approach) which fail for one as for the other.

In replying to Dr. Swanson’s questions, Miss Masterman said that he is asking that the thesaurus should become a philosopher and should engage in a continual redefinition. And this process of redefinition within a field is just about the most sophisticated, profound, and subtle operation of which the human mind is capable. To require this of what must be at this stage a very primitive thesaurus seems to be philosophically unreasonable. About the other criticism of the interlingualness of the thesaurus, Miss Masterman felt that mechanical translation and the study of natural language cannot contribute to library retrieval unless we can find an interlingual way of doing it.

On the other hand, Dr. Bar-Hillel, of The Hebrew University, Jerusalem, said the hope that analysis of ordinary language would yield anything in the foreseeable future for information retrieval or language translation is totally unjustified.

Partial replacement, at least, of two-stage information retrieval by one-stage retrieval should be investigated, according to Dr. Bar-Hillel. We now go to the literature for reference to documents in which the information we need might conceivably be found. There is advantage in going straight to the information and finding out which sentences in the storage systems are relevant to our problem. But this would require first a transformation of the information from ordinary language into some kind of normalized system. The difficulties of doing this are reduced by the fact that in using scientific language our vocabulary is reduced, and the syntax of scientific writing is essentially less complex than that of ordinary expression.

The U.S. Patent Office is tackling its problem in the non-chemical field by a system of “ruly English.” Mr. Andrews of the Research and Development Division there pointed out that there are many complex relationships which cannot be handled by the syntax useful in the chemical field. The Patent Office is trying to build up a basic understanding of the language we use and speak every day.

By and large, natural languages are with us to stay, as Dr. Oettinger of Harvard said. A long-term view of the information problem requires acquiescence with the retention of natural language in these devices. It is also clear that artificial languages are here to stay inside machines. There are compelling reasons of technology and economy that dictate that inside a machine is a language best for the machine even though not best for humans. Here we are faced with a coupling problem.

The use of kernels as a means for generating abstracts or for picking out significant passages in a text is all well and good once the kernel is formed. What would be of tremendous importance, so far as the direct use of natural language in this area is concerned, is an effective algorithm for generating kernels.
This is one area where some of the research on automatic translation may join with the problems of retrieval, in that this problem (of finding kernels) is also one of automatic translation.

Some technical requirements

Papers in this area have not really covered the whole problem of the design of an information retrieval system, according to Mr. Buckland of Itek Corporation. The user has been left out, or has been paid only lip service all along. The user is an active rather than a passive element. Current machine design should try to make use of his tremendous inductive powers and use the interrogator in a feedback loop. It might amount to giving the man more eyes
and ears so that he might have data presented to him and make the decisions as to whether the data are relevant or not.

The author of an Area 5 paper, Professor Meredith, also observed that one of the stages in the process of communication, that of the user, has not been considered very much in this Conference. Even those who have stressed the importance of the user of information have tended to assume that they are concerned only with those users of information who are specialists. There are also problems of the comprehension of scientific literature and its transmission across disciplinary boundaries. There are obvious conceptual difficulties, and tools are needed which can help to analyze concepts into general terms.

Another author, Dr. Vickery, said that one purpose of this Conference is the improvement of the flow of scientific information at all levels. Although the prospects held out by the use of large computers are fascinating and exciting, we should not neglect the study of smaller systems. It is important to study not only how to design larger and more efficient systems, but also to examine to what extent we can improve the efficiency of all the many small existing systems. The catalog based on faceted classification is admittedly only suitable for a relatively small system. The faceted method of analysis is an attempt to formalize the language, to control the terms and relations between them in constructing an indexing system of this relatively small scale. Improvement of retrieval at all levels and all sizes of systems needs to be studied, he added; we should not merely concentrate on what may indeed prove to be the hope of the future, that is, the direct use of natural language text.

Dr. Moore remarked that there are hopes that when one can read the full text of material it will not be necessary to use a concordance. The method of Mr.
Luhn, of choosing the most frequent words and selecting sentences containing clusters of them as a way of writing abstracts automatically, is a first attempt in this direction. Innovations like machine abstracting and machine indexing provide us with a facility which can give a significant statistical sample for analysis. Mr. Estrin, author of an Area 5 paper, suggested that Mr. Luhn see that the products of the automatic abstracting process are widely distributed to abstractors and linguists, so that they might vigorously attack the deficiencies. If he should get such feedback, he would get clues as to how to make whatever improvements can be made in his particular system.

Scientific writing

Dr. Ranganathan concluded the panel discussion by considering the problems of scientific writing. One important factor we might lose sight of is the quantity of literature being produced that has to be stored and retrieved. The
scientist would prefer to have very few papers and very short ones. But why should so many different papers be produced on the same subject? We must examine whether there is any valid reason for repetition. We must examine the kind of audience as well as the kind of writing necessary to reach the audience. We can recognize six levels of audience and therefore six levels of writing in which any idea or any new thought should be embodied.

There is the master mind that creates ideas and expresses them in ordinary language. His audience consists of the tops among the intellectuals. Their job is to exploit what this genius has discovered. They have no time to write for the benefit of other people. They write for their peers. That is the second level of writing. A third level communicates facts and data. This kind of writing is needed largely by engineers, technologists, and so on. We also want the knowledge to reach the people. It should be socialized. For that purpose we want somebody who can write to catch the attention of the people. There is a need for another kind of writing. We want research to run in series, not in parallel, and we do not want unnecessary repetition. Therefore, we want report writing. We also want the ideas to reach down to those less mature, in both the physiological and the psychological sense.

We want all of these types of writing, Dr. Ranganathan continued. But should we put all of them into storage for retrieval? This Conference is concerned only with the first three classes: seminal writing, knowledge writing, and data writing. But in storing them we have to select, and selection implies rejection. Unfortunately, this Conference has not thought about these techniques. But we should take them into consideration if we are going to make a store of reasonable size.

Dr. Ranganathan then asked where classification comes in, in relation to machinery. In data writing, the data, facts, formulae, and properties are in associative form.
It is quite easy for these data to be fed into machines and to be retrieved as and when we want. Classification will have to do the facet analysis accurately and make it available to be fed into the machine.

In knowledge writings there are two difficulties. We cannot index them merely by words because the messages that they contain are often found not exactly in this word or that, but between the lines, so to speak. If we miss them we miss the document altogether. Machinery might not be able to spot these ideas. The other difficulty has to do with the words we use. When new thoughts emerge, we have no words prepared for them. So we must ask some existing words to take care of the new ideas. This raises the question of the possible efficiency of machines in indexing or abstracting these knowledge writings.

With regard to data documents, we can probably surrender some of the
work to machines. Whether we can do the same with knowledge documents is questionable. We at least have to put into the machine some work already done by the human mind. That is the work of classification, Dr. Ranganathan concluded.

Professor de Grolier remarked that there were a number of papers in Area 5 which fall in the field of the relations of linguistics and semantics to information retrieval. There are two different views on the point. One thesis seems to be that the present state of the language of science is more or less permanent. There is another school which tends to prove that there is something to be changed in the language of science from the point of view of better information retrieval. Professor de Grolier suggested that it would be useful after this Conference to have a conference on linguistics and semantics in relation to information retrieval.

MADELINE M. BERRY, Rapporteur
GILBERT W. KING, Discussion Panel Chairman