Biocomputing Unit | Sequence Analysis Service | EMBL

Autumn 01 Course

Literature retrieval with the Web of Science server

by Toby Gibson, Chenna Ramu and Aidan Budd, November 26th-29th, 2001

In this practical we will investigate the ISI Web of Science server, which provides access to the Science Citation Index. As well as pandering to one's vanity by enabling citations to be counted, the server allows one to retrieve articles that cite a key reference. Sometimes this can be a very useful way of searching the literature. For example, if you are interested in applying a method that you are not familiar with, you may want to look in the literature at how and when that method has been applied, and whether it has been modified and improved by anyone. Keyword searches are not much use for this because abstracts usually do not detail which methods have been applied, so finding articles that cite the method is going to be very helpful. By contrast, if you wanted to catch up on recent work on the giant muscle protein titin, a keyword search would be enough, since the word titin is extremely likely to occur in the abstracts. So whether or not collecting articles by citation is useful depends on the kind of literature retrieval you need to do.
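The difference between the two search styles can be sketched with a toy example. Everything below is invented for illustration: the three records, their abstracts and their reference lists are made up, and `keyword_search` and `cited_by` are hypothetical helpers, not part of any ISI or PubMed interface.

```python
# Toy literature collection: every record here is invented for illustration.
PAPERS = {
    "method2000": {
        "abstract": "We describe a new alignment method for protein sequences.",
        "cites": [],
    },
    "titin2001": {
        "abstract": "The giant muscle protein titin contains many immunoglobulin domains.",
        "cites": ["method2000"],
    },
    "kinase2001": {
        "abstract": "We report a novel family of protein kinases in yeast.",
        "cites": ["method2000"],
    },
}


def keyword_search(papers, word):
    """Return ids of papers whose abstract contains the word (case-insensitive)."""
    word = word.lower()
    return sorted(pid for pid, rec in papers.items() if word in rec["abstract"].lower())


def cited_by(papers, key_paper):
    """Return ids of papers whose reference list includes key_paper."""
    return sorted(pid for pid, rec in papers.items() if key_paper in rec["cites"])


# The keyword search finds only the paper that describes the method ...
print(keyword_search(PAPERS, "alignment"))
# ... while the citation search finds the papers that applied it.
print(cited_by(PAPERS, "method2000"))
# For a topic word such as "titin" the keyword search is enough.
print(keyword_search(PAPERS, "titin"))
```

In this toy collection the two papers that applied the method never mention it in their abstracts, so only the citation search recovers them.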

Getting started

The teaching machines are Intel PCs running Linux. It will take a few moments to get set up.

Exercise 1 Counting citations

You are setting up a new biotech company and want some academic directors on the board. You recently heard Prof. I. W. Mattaj talk at a conference about "nuclear baskets - woven or wickerwork?", a trendy, happening area of cell science. Would he be worth approaching as a director? Has his work been influential or is this just a flash in the pan? You decide to check the citation index...

Citation counts - and especially short-term impact factors - should not be used uncritically. There are many misleading factors and biases, such as how many scientists work in a particular field and how often they publish. Impact factors are calculated annually, so the month of publication can have a big effect on a given paper. Furthermore, impact factors lead to a focus on short-term research goals that might be damaging. As an example, "high impact" Cell and Nature have shorter citation half-lives than NAR (Nucleic Acids Research), so the most cited NAR papers tend to remain valuable to researchers for longer than those in the trendier journals. The true worth of a paper (whatever that is) may not be apparent for many years. An example would be the apoptosis field, which suddenly exploded into action based upon pioneering work over many years by rather few researchers. It would be a shame to construct the scientific enterprise in such a way that there was no room for pioneers any more.
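To make the two measures mentioned above concrete, here is a minimal sketch. It assumes the usual two-year definition of the impact factor (citations received in a year to items published in the two preceding years, divided by the number of items published in those two years) and a simplified citation half-life that counts whole years without the interpolation a real citation report would apply. The function names and all the numbers are invented for illustration and do not describe any real journal.

```python
def two_year_impact_factor(citations_by_pub_year, items_by_pub_year, year):
    """Citations received in `year` to items published in the two preceding
    years, divided by the number of items published in those two years."""
    window = (year - 1, year - 2)
    cites = sum(citations_by_pub_year.get(y, 0) for y in window)
    items = sum(items_by_pub_year.get(y, 0) for y in window)
    if items == 0:
        raise ValueError("no items published in the two-year window")
    return cites / items


def cited_half_life(citations_by_pub_year, year):
    """Smallest number of years, counting back from `year`, that accounts for
    at least half of the citations received in `year` (simplified: whole
    years only, no interpolation)."""
    counts = {y: c for y, c in citations_by_pub_year.items() if y <= year}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no citations recorded")
    running = 0
    for pub_year in range(year, min(counts) - 1, -1):
        running += counts.get(pub_year, 0)
        if 2 * running >= total:
            return year - pub_year + 1


# Invented data: citations received during 2001, keyed by the publication
# year of the cited items. Neither journal is real.
FAST_JOURNAL = {2001: 100, 2000: 400, 1999: 300, 1998: 100, 1997: 60, 1996: 40}
SLOW_JOURNAL = {2001: 20, 2000: 80, 1999: 100, 1998: 100, 1997: 100,
                1996: 100, 1995: 100}
ITEMS = {2000: 100, 1999: 100}  # items published per year, same for both

for name, cites in (("fast", FAST_JOURNAL), ("slow", SLOW_JOURNAL)):
    print(name,
          two_year_impact_factor(cites, ITEMS, 2001),
          cited_half_life(cites, 2001))
```

With these invented numbers the "fast" journal has the higher impact factor but the shorter half-life, which is the pattern described above: a two-year window rewards papers that are cited quickly and ignores citations that keep arriving later.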

Exercise 2 Collecting articles that cite a key paper

Retrieving articles by citation of a key paper is useful any time that one cannot be sure whether searchable terms will be present in the abstracts (as with methods, for example) or when one cannot be sure which keywords are appropriate. The example we will use is the well known "Nuclear Localisation Signal" (NLS), which is the subject of a substantial body of literature.

The problem:

Collecting NLS articles:

Collecting articles via primary NLS work:

Comparing annual citations of a paper:

Take home lessons

The workhorse of literature retrieval is the freely accessible PubMed service run at Entrez/NCBI. PubMed has many useful properties including cross-links from databases like EMBL and SWISS-PROT that will never be practical for a commercial service like ISI. The advantages of ISI are: a larger set of scientific journals; citation counts; finding articles by citation. These features are useful for text mining and complementary to the free PubMed service. But let's end with a warning: don't get too obsessed by citation counts...