Introduction to the Teaching Computers
(and some common bioinformatic tasks)
Niall Haslam and Aidan Budd
Before we begin...
If you are interested in presenting your sequences of interest on the final day, please let us know (stick your hands in the air!)
During today, please try to form groups of two people; each pair will present one of their sequences/systems together on the final afternoon.
We will lead you through some common computational and sequence analysis tasks on the projector. This should help make the rest of the course easier to follow for some of you.
Browsing the web
- Starting a web browser
- Click the "Firefox" icon top left of the screen
- Searching the web
- Let's begin by looking for pictures of Toby...
- Search using the "Google" search window top-right of the browser window
- Opening an additional web browser window
- Opening a web browser tab
- Closing a web browser tab
- Opening a link in a new tab
- Right-Click (on mouse) and choose "Open Link in New Tab"
- Downloading a file via a link
- Let's do that with the PDF of one of the ELM papers, e.g. searching using
toby gibson elm pdf
- Right-Click and choose "Save Link As..."
- This works with lots of different types of files e.g. containing sequences, structures etc.
- Saving a webpage in HTML - and opening the local version of the file
- "File" (from browser toolbar) -> "Save Page As"
- Bookmarking a page
- Creating a bookmark folder
- Firefox Bookmarks->Organise Bookmarks
- Select bookmark collection in left column
- Right click in Right Box and choose "New Folder"
- Changing Desktops
- Use the Desktop Chooser on the bottom panel
- Add more Desktops using
- Konqueror (Icon top-left of screen) Go->Settings->Multiple Desktops
- Creating a new folder
- Right-click on Desktop space->"Create New"->Folder (give it a name e.g. "work_files"). Don't use characters other than A-Z a-z 0-9 _ and .
- Opening this plain-text file in a text editor
- "Save File As..." from the browser, and choose a folder to save the file in (e.g. the folder I've just created)
- Drag and drop the file onto the "kate" text-editor icon on your Desktop
- Alternatively, open "kate" by double-clicking on the icon, and choose File->Open
- What does the same file look like in Word format?
- We view the file using "kate"
- With bioinformatic data ALWAYS save as plain text because:
- the files are smaller than e.g. Word/.doc files
- most software does not accept sequences/data uploaded in Word format
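If you are unsure whether a file really is plain text, a quick check can help. The following is only a minimal sketch, assuming that "plain text" here means ASCII with no NUL bytes; the function name and the example byte strings are made up for illustration.

```python
def looks_like_plain_text(data: bytes) -> bool:
    """Return True if the bytes look like plain ASCII text."""
    # NUL bytes are typical of binary formats such as legacy Word files
    if b"\x00" in data:
        return False
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


# A tiny invented FASTA record, as it would be stored in a plain-text file
fasta_bytes = b">seq1 example\nMKTAYIAKQR\n"
# The first bytes of a legacy .doc (OLE compound) file, padded with NUL bytes
word_like_bytes = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1\x00\x00"

print(looks_like_plain_text(fasta_bytes))
print(looks_like_plain_text(word_like_bytes))
```

To test a real file, read it with open(path, "rb").read() and pass the result to the function.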
- Changing and then saving an edited version of a file in a text editor under a different name
- Open the file using "kate"
- File->Save As
- Then choose the folder to save into e.g. Desktop (choose from list on the left), and then select newly-created folder
- Where it says "location" add the new name for the file e.g. "changed.txt"
- Copying and pasting text using the mouse
- Select text using the left mouse button
- Once you've selected the text you want to copy
- EITHER: Right-mouse click and choose "Copy"
- OR: CTRL-C
- To paste, select location with the mouse and
- EITHER: Right-Click and choose "Paste"
- OR: CTRL-V
- Obtaining FASTA protein sequence format files from Entrez Protein using a simple text query
- We'll look for proteins similar to the human YES tyrosine kinase
- A simple way to do this is to search for records that have "YES_*" (where "*" acts as a wildcard, matching any word that begins with "YES_") somewhere in the text annotation associated with the record, e.g. using the query
yes_* mammal*
- (Note - you can and will later do something similar using SRS at either the EBI or EMBL.)
- Choose a format e.g. Display->GenPept
- EITHER: Send To->Text (and then copy-and-paste into editor)
- OR: Send To->File
- We can quickly search the resulting database files to check out which organisms are represented using the text-search option in the web browser
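The same check can be done with a few lines of Python once the sequences are saved in FASTA format. This is a rough sketch only: it assumes each header line contains a SwissProt-style entry name such as YES_HUMAN, and it counts the species code after the underscore. The headers and sequences below are invented, and the function name is hypothetical.

```python
import re
from collections import Counter

# Assumption: entry names look like NAME_SPECIES, e.g. YES_HUMAN
ENTRY_NAME = re.compile(r"\b[A-Z0-9]{1,10}_([A-Z0-9]{1,5})\b")


def species_codes(fasta_text: str) -> Counter:
    """Count the species codes found in the header lines of FASTA text."""
    counts = Counter()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            match = ENTRY_NAME.search(line)
            if match:
                counts[match.group(1)] += 1
    return counts


# Invented toy records, not real database entries
example = """>toy1 YES_HUMAN invented header
MKTAYIAK
>toy2 YES_MOUSE invented header
MKTAYIAR
>toy3 SRC_HUMAN invented header
MKTAYLAK
"""

for code, count in sorted(species_codes(example).items()):
    print(code, count)
```

Headers that do not follow this naming pattern are silently skipped, so treat the counts as a lower bound.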
- Loading sequences into CLUSTALX
- Click on Desktop CLUSTALX icon
- File->Load Sequences
- Loading sequences into a webpage (e.g. MUSCLE multiple sequence alignment at EBI or BLAST at the NCBI) and downloading the resulting files
- Changing format of sequences
- Visit the BaliBase2 website
- Use the browser's "text search" to find the link to the
1aho reference set
- Download alignment in RSF or MSF format
- View the RSF/MSF format file in a text editor
- Load alignment into CLUSTALX
- Save alignment in FASTA format from CLUSTALX - File->Save Sequences As (choose FASTA)
- View this FASTA format file in a text editor
- Removing gaps from an alignment using CLUSTALX
- Often you want to query a tool with the sequence taken from an alignment - and you'd like to submit the sequence in an ungapped format
- To get this, we usually:
- select all the sequences we want to remove all gaps from (usually all the sequences) in ClustalX
- EITHER: Drag left-clicked mouse over sequence names while holding SHIFT
- OR: Edit->Select all Sequences
- Edit->Remove All Gaps
- Re-save the alignment in a FASTA format file WITH A DIFFERENT NAME TO THE FILE CONTAINING THE ORIGINAL ALIGNMENT!!! (otherwise you'll write over your original alignment that you may have spent hours preparing!)
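For reference, the same gap removal can be sketched in Python. This is an illustration rather than a replacement for the CLUSTALX steps: it assumes the alignment is already in FASTA format and that gaps are written as "-" or ".". The function names and the toy alignment are hypothetical. Opening the output file in mode "x" makes Python refuse to write over an existing file, which guards against the overwriting mistake described above.

```python
import os
import tempfile


def remove_gaps(fasta_text: str, gap_chars: str = "-.") -> str:
    """Return FASTA text with gap characters removed from sequence lines."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(line)  # header lines are kept unchanged
        else:
            stripped = "".join(ch for ch in line if ch not in gap_chars)
            if stripped:  # drop lines that contained only gaps
                out.append(stripped)
    return "\n".join(out) + "\n"


def save_without_overwriting(path: str, text: str) -> None:
    """Write text to path, failing if the file already exists."""
    with open(path, "x") as handle:
        handle.write(text)


# Invented toy alignment
aligned = ">seqA\nMK--TAYI\n>seqB\nMKQ-TA-I\n"
ungapped = remove_gaps(aligned)
print(ungapped, end="")

with tempfile.TemporaryDirectory() as folder:
    new_path = os.path.join(folder, "ungapped.fasta")
    save_without_overwriting(new_path, ungapped)
    try:
        save_without_overwriting(new_path, ungapped)  # second write must fail
        overwrite_blocked = False
    except FileExistsError:
        overwrite_blocked = True
print(overwrite_blocked)
```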
- Open your web browser and visit the following link www.link.to.this.page (by copying and pasting the link into your browser using the mouse and CTRL-C and CTRL-V)
- Open a second browser window and search the internet to find the Gibson group's teaching pages (example query). Open these course pages in a new tab.
- Create a bookmark folder in your browser to store bookmarks to result pages
- Bookmark the Gibson course page and place this bookmark on the browser's "Bookmarks Bar"
Downloading biological sequences from the web
- Using the SRS server available either at the EBI or EMBL, carry out a query to identify a set of proteins similar to SRC_HUMAN
- Select only a few of these sequences (around 10), and download them, storing them locally in a file on your computer, in FASTA format
- Try the same exercise using ENTREZ at the NCBI
- Open the downloaded file in the simple text editor, and alter some of the identifiers of the sequences, saving the file under a different name
- Load the edited FASTA format file into CLUSTALX
Working with sequence formats
- Download the alignment for BaliBase reference set 1pfc_ref1
- Load the downloaded alignment into CLUSTALX and save in FASTA format
- Remove all gaps from the sequences using CLUSTALX and save the ungapped alignment in FASTA format into a file with a new name
- Upload the ungapped sequence alignment into the EBI MUSCLE multiple sequence alignment webserver. Run MUSCLE and download the result file onto your local computer, loading the result file into CLUSTALX.
- Carry out the same exercise, using cut and paste of the FASTA sequences from the text editor into the EBI MUSCLE webpage
Here are several files whose formatting means CLUSTALX will not load them properly. Look at the files using your text editor, and try to identify the problem (by comparison with files that you previously downloaded which DO successfully load into CLUSTALX). Make appropriate changes to the file using a text editor to try to fix the problem, then test these ideas by attempting to load the sequences into CLUSTALX.
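If you prefer a scripted first look, the sketch below flags a few typical formatting slips in FASTA text. The checks are assumptions about common mistakes, not a list of what CLUSTALX actually rejects, and they say nothing about the specific problems in the course files. The function name and the example text are hypothetical.

```python
import re

# Assumption: sequence lines hold only letters, "*", and the gap symbols "-" and "."
VALID_RESIDUES = re.compile(r"^[A-Za-z*\-.]+$")


def fasta_problems(text: str) -> list[str]:
    """Return a list of human-readable problems found in FASTA text."""
    problems = []
    seen_ids = set()
    have_header = False
    if "\r" in text:
        problems.append("file contains carriage returns (non-Unix line endings)")
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.rstrip()
        if not line:
            continue  # blank lines are ignored in this sketch
        if line.startswith(">"):
            have_header = True
            fields = line[1:].split()
            if not fields:
                problems.append(f"line {number}: header has no identifier")
                continue
            ident = fields[0]
            if ident in seen_ids:
                problems.append(f"line {number}: duplicate identifier {ident}")
            seen_ids.add(ident)
        elif not have_header:
            problems.append(f"line {number}: sequence data before first '>' header")
        elif not VALID_RESIDUES.match(line):
            problems.append(f"line {number}: unexpected characters in sequence")
    return problems


# Invented example with three deliberate mistakes
bad = "MKT\n>a\nMK1T\n>a\nMKV\n"
for problem in fasta_problems(bad):
    print(problem)
```

An empty result only means that none of these particular checks fired; the file may still have other problems.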
- The following set of proteins was obtained from a BLAST search with an E. coli alpha-2-macroglobulin-like sequence YFHM_ECOLI using a maximum E-value of 0.01 against the SwissProt database. The proteins are provided here in GenPept and FASTA format. Write a short sed/awk/python/perl script to determine whether any of these sequences come from organisms other than bacteria.
- Using the GenPept file, write a script to extract a non-redundant list of all the PFAM families identified in these sequences
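One possible starting point for both scripting exercises is sketched below in Python. It rests on several assumptions that you should check against the real file: records end with a "//" line, the taxonomic lineage starts on the line directly after the ORGANISM line (so the organism name fits on one line), and Pfam families appear as identifiers of the form PF00000 or pfam00000. The toy records and all function names are invented for illustration.

```python
import re

# Assumption: Pfam families are written as "PF" or "pfam" plus five digits
PFAM_ID = re.compile(r"\b(?:PF|pfam)(\d{5})\b")


def split_records(genpept_text):
    """Split GenPept text into records (lists of lines) at '//' lines."""
    records, current = [], []
    for line in genpept_text.splitlines():
        if line.strip() == "//":
            if current:
                records.append(current)
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):
        records.append(current)
    return records


def locus_name(record):
    """Return the name on the LOCUS line, or 'unknown' if there is none."""
    for line in record:
        if line.startswith("LOCUS") and len(line.split()) > 1:
            return line.split()[1]
    return "unknown"


def top_level_taxon(record):
    """Return the first lineage term, taken from the line after ORGANISM."""
    for i, line in enumerate(record):
        if line.startswith("  ORGANISM") and i + 1 < len(record):
            return record[i + 1].strip().split(";")[0].rstrip(".")
    return None


def non_bacterial(genpept_text):
    """List (locus, taxon) pairs for records whose lineage is not Bacteria."""
    hits = []
    for record in split_records(genpept_text):
        taxon = top_level_taxon(record)
        if taxon != "Bacteria":
            hits.append((locus_name(record), taxon))
    return hits


def pfam_families(genpept_text):
    """Return a sorted, non-redundant list of Pfam identifiers."""
    return sorted({"PF" + m.group(1) for m in PFAM_ID.finditer(genpept_text)})


# Invented toy records; identifiers and annotations are placeholders
toy = """LOCUS       TOY1_ECOLI   100 aa   linear   BCT
  ORGANISM  Escherichia coli
            Bacteria; Proteobacteria; Gammaproteobacteria.
DBSOURCE    xrefs (non-sequence databases): Pfam:PF00207, Pfam:PF01835
//
LOCUS       TOY2_HUMAN   120 aa   linear   PRI
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata.
DBSOURCE    xrefs (non-sequence databases): Pfam:PF00207
//
"""

print(non_bacterial(toy))
print(pfam_families(toy))
```

Using a set for the Pfam identifiers is what makes the list non-redundant; sorting it afterwards just gives a stable order for reading.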
DON'T FORGET TO CHOOSE YOUR MEAL FOR TONIGHT AND THURSDAY DURING THE COFFEE BREAK!!!
- Provide experience and familiarity with using features of tools commonly used in sequence analysis
- Web browsers
- File system navigation
- Inspecting file contents (text editors)
- Provide experience and familiarity with some common sequence analysis tasks