by Toby Gibson, Chenna Ramu and Aidan Budd, November 26th-29th, 2001
In this practical we will investigate the ISI Web of Science server which provides access to the Science Citation Index. As well as pandering to one's vanity by enabling citations to be counted, the server allows one to retrieve articles that cite a key reference. Sometimes this can be a very useful way of searching the literature. For example, if you are interested in applying a method that you are not familiar with, you may want to look in the literature at how and when that method has been applied - and whether it has been modified and improved by anyone. Keyword searches are not much use for doing this because abstracts usually do not detail which methods have been applied. So, finding articles which cite the method is going to be very helpful. By contrast, if you wanted to catch up on recent work with the giant muscle protein, titin, then you just need to do the keyword search since the word titin is extremely likely to occur in the abstracts. So, whether or not collecting articles by citation is useful depends on the kind of literature retrieval you need to do.
Getting started
The teaching machines are INTEL PCs running the LINUX OS. It will take a few moments to get set up.
- Login with your EMBL name and password.
- Start the KDE Desktop by typing exec startx.
- Start netscape from the pullup starting icon at the lower left of the screen.
- Load this page into it. (It is a few clicks from EMBL's home page).
- Check that javascript and style sheets are enabled in the netscape preferences in the advanced options.
Exercise 1 Counting citations
You are setting up a new biotech company and want some academic directors on the board. You recently heard Prof. I. W. Mattaj talk at a conference about "nuclear baskets - woven or wickerwork?", a trendy, happening area of cell science. Would he be worth approaching as a director? Has his work been influential or is this just a flash in the pan? You decide to check the citation index...
- Load the Web of Science top page into a navigator window.
- Note the options, then click on the Full Search button.
- The Full Search date and database set up page loads.
- What databases can you search?
- Which is the earliest year you can search?
- Tick the Science citation index box, then click the General Search button.
- Type mattaj i* in the AUTHOR box and click SEARCH. (We'll initially assume there is only one I. Mattaj.)
- Examine the search results page.
- Note the buttons at the top of the page - we will use these later.
- How many articles are listed on the page?
- Is that all the articles found?
- If not, how many articles has the researcher published: < 10, < 50, 100 - 200?
- When was the first article published?
- (Hint - you will have to click to the relevant page.)
- Has the researcher been active enough to warrant our interest?
- Now we want to check if the researcher's more recent work is actively cited.
- Click on the DATE & DB LIMITS button to return to the setup page.
- Click on the Limit search button, tick the boxes for the years 1995 - 2001 and click GENERAL SEARCH.
- Note that your keyword query setup has been remembered!
- You must use the buttons to return to the setup pages.
- Choose Sort results by: Times Cited and click on search.
- By clicking on the links find out:
- If the results are correctly sorted by number of citations.
- Does the most cited paper have <10; 10 - 100; 100 - 500; 500 - 1000 citations?
- Is this paper a review or an experimental article?
- Is it being cited much this year?
- How many reviews are in the top 10?
- Are there any papers in this time period with no citations?
- So-called "high impact" papers are collected by ISI for the two preceding years:
- Use the buttons to refine the query for the years 1999-2001.
- Top journals have an "impact factor" roughly in the range 20-30.
- By this yardstick are any of the recent papers "high impact"?
- You are now in a position to decide whether you want to recruit our subject to the board!
- [Er... citation counts say nothing about personality and appropriateness. If that is our only criterion, we could get a shock when our new colleague turns out to have appalling man-management skills while frittering away venture capital due to his expensive habits, insisting on flying first class, dining in expensive restaurants and staying in top hotels! (Though not in this example of course!)]
Citation counts - and especially short term impact factors - should not be used uncritically. There are many misleading factors and biases such as how many scientists work in a particular field and how often they publish. Impact factors are calculated annually so the month of publication can have a big effect on a given paper. Furthermore, impact factors lead to a focus on short term research goals that might be damaging. As an example, "high impact" Cell and Nature have shorter citation half-lives than NAR - so the most cited NAR papers tend to remain valuable to researchers for longer than those in the trendier journals. The true worth of a paper (whatever that is) may not be apparent for many years. An example would be the apoptosis field which suddenly exploded into action based upon pioneering work over many years by rather few researchers. It would be a shame to so construct the scientific enterprise that there was no room for pioneers any more.
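To see why short-term impact factors are so sensitive to timing, it helps to recall how they are usually defined: the citations a journal receives in a given year to items it published in the two preceding years, divided by the number of citable items it published in those two years. The following is a minimal sketch of that arithmetic only. The function name and the numbers are invented for illustration and are not taken from ISI.

```python
def impact_factor(citations_this_year: int, items_prev_two_years: int) -> float:
    """Two-year impact factor as commonly defined (illustrative sketch).

    citations_this_year: citations received this year to items the journal
        published in the two preceding years.
    items_prev_two_years: number of citable items the journal published in
        those two years.
    """
    if items_prev_two_years <= 0:
        raise ValueError("the journal must have published at least one citable item")
    return citations_this_year / items_prev_two_years


if __name__ == "__main__":
    # Hypothetical journal: 200 citable items over two years, cited 5000 times.
    print(impact_factor(5000, 200))
```

Because only a two-year window counts, a paper published in December has far less time to gather citations that count than one published in January of the same year.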
Exercise 2 Collecting articles that cite a key paper
Retrieving articles by citation of a key paper is useful whenever one cannot be sure that searchable terms will be present in the abstracts (as with methods, for example) or which keywords are appropriate. The example we will use is the well-known "Nuclear Localisation Signal" (NLS), which is the subject of a substantial body of literature.
The problem:
- Retrieving articles that discuss nuclear localisation signals is clearly going to be awkward:
- Will it be referred to as a signal or a sequence, or even as a motif or pattern?
- Is it localisation or localization? - unfortunately it is both.
- Is NLS sometimes used in abstracts without the full definition?
- What if it also stands for nonlinear Schrodinger equation?!
- Will the NLS be mentioned at all in the abstract?
Collecting NLS articles:
- Using your newly gained experience, set up queries of science citation index for all years
- Using these TOPIC keyword combinations:
- Count the number of articles retrieved for:
- nuclear localisation signal
- nuclear localization signal
- nuclear locali* signal* or nuclear locali* sequence* or NLS not schrodinger not nonlinear
- Note the use of logical operators to better specify the query.
- A list of operators and their usage is outlined in the on-line help.
- You might be able to think of even more complex variations but we have probably retrieved most of what we can get by NLS-based keyword search.
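To get a feel for what the wildcard query above will and will not catch, the same logic can be sketched offline in Python with regular expressions. This is only an approximation: it assumes that * stands for any run of letters and that the two not terms apply to the whole or group, which may differ from how the server actually parses the query. The function name and the sample sentences in the usage note are invented.

```python
import re

# Assumed translation of the wildcard terms into regular expressions;
# the server's own matching rules may differ.
PHRASES = re.compile(
    r"\bnuclear locali\w* (signal|sequence)\w*|\bnls\b", re.IGNORECASE
)
# Terms that exclude a record (the "not" part of the query).
EXCLUDED = re.compile(r"\b(schrodinger|nonlinear)\b", re.IGNORECASE)


def matches_nls_query(text: str) -> bool:
    """Return True if the text satisfies the third TOPIC query of the exercise."""
    return bool(PHRASES.search(text)) and not EXCLUDED.search(text)
```

For instance, a sentence containing "nuclear localisation sequences" passes, while "Soliton solutions of the NLS (nonlinear Schrodinger) equation" is rejected, and a sentence that only says "nuclear targeting sequence" is missed altogether.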
Collecting articles via primary NLS work:
- The classical bipartite NLS was clearly described by Dingwall and Laskey in a 1991 TIBS review.
- (Note that the literature goes back several years before that.)
- To get this article, search all years, sorting by Times Cited and entering dingwall and laskey into AUTHORS.
- Is the article their joint best-cited paper?
- What variant of nuclear localisation signal is in the abstract?
- Or do they use another definition?
- Ought we to enlarge our keyword query to include or nuclear target* signal* or nuclear target* sequence*?
- Are there more articles citing this paper than we retrieved with the NLS keyword search?
- Is there a second highly cited nuclear targeting paper from this group?
- Do you think the sets of citing papers will strongly overlap?
- Examine the abstracts of the 10 most recent papers citing Dingwall and Laskey:
- How many of these papers could also be retrieved by the complex NLS keyword search?
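If you note down an identifier for each record in the two result lists, the comparison between keyword retrieval and citation retrieval comes down to simple set arithmetic. A minimal sketch, assuming the identifiers have been collected by hand; the function name and the dictionary keys are made up for illustration.

```python
def overlap_report(keyword_hits, citing_hits):
    """Compare records found by keyword search with records found by citation."""
    keyword_set = set(keyword_hits)
    citing_set = set(citing_hits)
    both = keyword_set & citing_set
    return {
        "both": len(both),
        "keyword_only": len(keyword_set - citing_set),
        "citation_only": len(citing_set - keyword_set),
        # Share of the citing papers that the keyword search would also find.
        "fraction_of_citing_found": len(both) / len(citing_set) if citing_set else 0.0,
    }
```

A large citation_only count means that the citation search is finding papers the keyword search would have missed.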
Comparing annual citations of a paper:
- One problem with using the 1991 Dingwall and Laskey paper to retrieve current articles is that it is now 10 years old and its citation rate may be in decline. We can find out if this is a serious limitation.
- Click to Date & DB Limits and set the year to 1995, then click to CITED REF SEARCH.
- Type Dingwall C in AUTHORS and 1991 in CITED YEAR then click LOOKUP.
- This retrieves a list of papers published in 1991 but cited in 1995.
- How many TIBS articles did Dingwall apparently publish in 1991?
- Is he really that prolific?
- Tick the box for the proper citation and click SEARCH.
- How many articles cited the 1991 paper in 1995?
- Return to Date & DB Limits and set the year to 2001. (Note: untick 1995 too!)
- Repeat the process to find out how many articles cite the 1991 paper in 2001.
- How many articles cited the 1991 paper in 2001?
- Is it more or less than in 1995?
- Is it a big difference?
- One could repeat the process for every year since 1991 to get the citation curve for this paper.
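If the publication years of the citing articles were written down from the results pages, the citation curve could be tallied in a few lines of Python. This is a sketch under that assumption: the server is not queried, and the function name and the example years in the usage note are invented.

```python
from collections import Counter


def citation_curve(citing_years, first_year, last_year):
    """Return (year, number of citing articles) pairs, including years with none."""
    counts = Counter(citing_years)
    return [(year, counts[year]) for year in range(first_year, last_year + 1)]
```

Calling citation_curve([1992, 1992, 1995], 1991, 1995) gives one pair per year from 1991 to 1995, with zeros for the years in which nothing cited the paper, so a decline or plateau is easy to spot.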
The workhorse of literature retrieval is the freely accessible PubMed service run at Entrez/NCBI. PubMed has many useful properties including cross-links from databases like EMBL and SWISS-PROT that will never be practical for a commercial service like ISI. The advantages of ISI are: a larger set of scientific journals; citation counts; finding articles by citation. These features are useful for text mining and complementary to the free PubMed service. But let's end with a warning: don't get too obsessed by citation counts...