Information Retrieval
Instructor: Muh-Chyun Tang 
Dept. of Library and Information Science,
National Taiwan University

Course description
The course is designed to provide an introduction to the use, design, and evaluation of information retrieval (IR) systems. It covers major components of the IR process, such as search strategies, indexing, IR models, and IR evaluation. Students will also acquire hands-on experience with IR evaluation and with designing a digital library system. Special attention will be given to the comparison of different indexing methods and IR models, and to how they might complement each other.

Course objectives
After this class, you will be able to:
1. Search more effectively (in both professional databases and web search engines)
    Query construction and use of search strategies
2. Carry out IR evaluation
3. Understand the basics of how search engines work
    Automatic indexing
    Ranking algorithms
4. Design and implement an online digital collection using a content management system

Schedule

W1 (9/16): Introduction to syllabus; history of IR; data retrieval vs. information retrieval; information access: browsing, searching, and recommendation

W2 (9/23): Advanced search with PubMed; introduction to search features of PubMed/Ovid/Ebsco/EMBASE; discussion of your search demo project
    Readings: O'Connor, pp. 61-65; Bell, pp. 9-18; browse PubMed help and the PubMed tutorial (see references)

W3 (9/30): Search strategies and tactics; PICO; Camtasia demo (laptop); discussion of your search demo project
    Readings: Hersh (2003), pp. 191-194 (subheadings and explode)

W4 (10/7): Indexing exhaustivity vs. specificity; automatic indexing basics (text analysis, term weighting)
    Readings: Lancaster (2003), pp. 252-258 (natural language vs. controlled vocabulary); Salton & McGill, pp. 59-63; Soergel (1985), pp. 328-338

W5 (10/14): Search feature/command demo due
    Note: apply for a ctext.org account

W6 (10/21): Lab or laptop; TF*IDF tool; ctext

W7 (10/28): IR evaluation; discussion of your second project (IR evaluation)
    Readings: Hersh, pp. 95-113 (relevance-based evaluation)

W8 (11/04): IR models I: Boolean model; term weighting and the vector space model; similarity measures; discussion of your IR evaluation project; homework on automatic indexing due
    Readings: Hersh, pp. 270-272, appendices I-III (vector space); Wickens, pp. 12-13 (basic vector operations)
    Note: PubMed demo presentation due

W9 (11/11): Relevance feedback and query expansion; discussion of your IR evaluation project
    Readings: Hersh, pp. 184-190 (relevance feedback)
    Note: relevance feedback in-class exercise

W10 (11/18): Simulated search evaluation presentation

W11 (11/25): Facet analysis and information architecture; WordPress demo at the computer lab; discussion of your DL project
    Readings: Lin (2006), facet structure

W12 (12/02): IR models II: probabilistic model; discussion of your DL project
    Readings: Manning et al. (2008), pp. 201-211

W13 (12/09): IR models: probabilistic and language models; discussion of your DL project
    Note: query likelihood in-class exercise

W14 (12/16): Lab session for your DL project

W15 (12/23): DL assignment presentation; DL assignment due

W16 (12/30): Web search and link structure
    Readings: Easley & Kleinberg (2010), Link Analysis and Web Search, pp. 351-365

W17 (01/06): Final review

W18 (01/13): Final exam


Assignments and Grading

Attendance at all class sessions is mandatory. Your grade will be based on your participation, homework, and in-class assignments. Group projects will be judged by both the instructor and your classmates.

A. Homework and participation (10%)
There will be one homework assignment on the practice of text tokenization and TF*IDF calculation.
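As a rough illustration of what this homework involves, here is a minimal Python sketch of tokenization and TF*IDF weighting. It assumes the simple raw-tf x log10(N/df) variant; the exact tokenization rules and weighting formula used in class may differ.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())

def tf_idf(docs):
    """Return, per document, a dict of term -> tf * log10(N / df) weights."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter()                      # document frequency of each term
    for toks in tokenized:
        df.update(set(toks))
    weights = []
    for toks in tokenized:
        tf = Counter(toks)              # raw term frequency in this document
        weights.append({t: tf[t] * math.log10(n / df[t]) for t in tf})
    return weights

# toy collection: a term appearing in every document gets weight 0
docs = ["the cat sat on the mat", "the dog sat", "cats and dogs"]
w = tf_idf(docs)
```

Note how "the" (appearing in two of the three documents) receives a lower idf than "cat" (appearing in only one), which is the intuition behind the weighting.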

B. Group projects: Students will form groups of 3 to 5 to carry out three group projects. For each project, in addition to the group report,
*** each group member should prepare a one- to two-paragraph personal report explaining your contributions and what you have learned from the assignment.

1. Search feature/command demo (accounts for 10% of your final grade)
Create and present a video demo that explains a search tactic or function available in Ovid/Medline, Ebsco/Medline, Embase, or Scopus.
See example: PubMed clinical queries

2. Simulated literature search evaluation (30%)
a. To obtain the search topics, interview two users (preferably graduate students or faculty members in the sciences), each on one research topic they are interested in.
Collect from each user a search statement and associated query terms that you both agree best represent her information need. Also try to characterize her information need using attributes such as "topic familiarity" and "uncertainty".
b. For each search topic, submit the queries on the user's behalf to Google Scholar, Microsoft Academic Search, Semantic Scholar, or another major citation database (e.g., Scopus, WOS). Collect the first 30 links from each of the two returned sets.
c. Find the degree of overlap between the two returned sets.
d. Mix the non-duplicate links (30x2 at maximum) together and strip the graphic cues,
so that the user cannot tell which search engine each link came from.
e. For each link, record its origin and rank position.
f. Present the URLs in a Microsoft Word file that allows the users to examine each actual webpage by clicking
its hyperlink. Ask them to judge the relevance (topical as well as situational) of the pages on a 0-4 scale (0 = not relevant at all; 4 = very relevant).
g. Create an Excel or SPSS data file to enter the relevance scores.
h. Compare the performance of the search engines based on 1) mean average precision (MAP) and
2) CG and DCG.
i. Next, submit the same query to Scopus and Web of Science and conduct a domain analysis, identifying the publication trends and the major authors, institutes, journals, countries, and disciplines that have published in this area.
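The computations in steps (c) and (h) can be sketched as below. This is only an illustrative sketch: it assumes that for MAP the 0-4 judgments are binarized at a threshold of 2 (state whatever convention you actually adopt in your report), and it uses the common DCG discount rel_i / log2(i + 1); the function names are ours, not part of the assignment.

```python
import math

def overlap(links_a, links_b):
    """Jaccard overlap between two result lists (step c)."""
    a, b = set(links_a), set(links_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def average_precision(rels, threshold=2):
    """AP for one ranked list of 0-4 judgments; 'relevant' means score >= threshold.
    MAP is the mean of AP over all search topics."""
    hits, total = 0, 0.0
    for i, r in enumerate(rels, start=1):
        if r >= threshold:
            hits += 1
            total += hits / i      # precision at this relevant rank
    return total / hits if hits else 0.0

def cg(rels):
    """Cumulative gain: plain sum of the graded judgments."""
    return sum(rels)

def dcg(rels):
    """Discounted cumulative gain with a log2 rank discount."""
    return sum(r / math.log2(i + 1) for i, r in enumerate(rels, start=1))

# hypothetical ranked judgments for one engine on one topic
judged = [4, 0, 3, 2, 0]
```

Comparing cg(judged) with dcg(judged) shows the effect of the discount: DCG rewards engines that place the highly relevant pages near the top of the ranking.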

3. Digital library construction (30%)
Each group will build a functional online digital library collaboratively using WordPress, Joomla, or the Greenstone Digital Library (GSDL) open-source content management system.
DL_project_example1  DL_project_example2   DL_project_example3
The project consists of three components: the implementation of a digital collection on a topic of your own choosing, a written report (4-6 pages), and an oral presentation of the project.

The digital collection should include:
a. A minimum of 70 documents, representing different document formats such as PDF, Word, and HTML.
b. An index structure that enables browsing of the collection.
c. Faceted and fielded search.
The written report (4-6 pages) should:
d. Explain the aim, purpose, sources, and intended users of the collection and their information needs.
It is best to come up with an institutional context (real or imaginary) for the use of the collection.
e. Define your selection and indexing policies (human and machine indexing components; metadata structure) based on the aim and purpose stated above.
f. Include a graphic presentation of the browsable index structure and the rationale behind your design
(i.e., explain why you chose certain browsable facets and searchable fields to represent your collection).

C. Final exam (30%)
The exam is based on the lecture notes and readings; a review will be given before the exam to help you prepare for it.
 
References
    Bell, S. S. (2006). Librarian's guide to online searching.
    Bhavnani, S. K., Drabenstott, K., & Radev, D. (2000). Towards a unified framework of IR tasks and strategies.
    Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. Cambridge University Press.
    Chowdhury, G. G. (2004). Introduction to modern information retrieval. London: Facet Publishing.
    Hersh, W. R. (1996). Information retrieval: a health and biomedical perspective. New York: Springer-Verlag.
    Salton, G., & McGill, M. J. (1983). Introduction to modern information retrieval. McGraw-Hill.
    Grossman, D. A., & Frieder, O. (2004). Information retrieval: algorithms and heuristics.
    Belew, R. K. (2000). Finding out about: a cognitive perspective on search engine technology and the WWW. Cambridge: Cambridge University Press.
    O'Connor, B. (1996). Explorations in indexing and abstracting.
    Lancaster, F. W. (2003). Indexing and abstracting in theory and practice.
    Evaluation of web-based search engines using user-effort measures. Available online: http://libres.curtin.edu.au/libres13n2/tang.htm
    Witten, I. H., & Bainbridge, D. (2003). How to build a digital library. Amsterdam: Morgan Kaufmann.
    Jannach, D., Zanker, M., Felfernig, A., & Friedrich, G. (2011). Recommender systems: an introduction. Cambridge University Press.
    Soergel, D. (1985). Organizing information: principles of data base and retrieval systems. San Diego, CA: Academic Press Professional.
Camtasia  
Download ; Video tutorial
PubMed
PubMed tutorials, available http://www.nlm.nih.gov/bsd/disted/pubmed.html
OVID SP tutorial from Yale University Library
SCOPUS tutorial
PubMed help Available at Online_help
http://www.ncbi.nlm.nih.gov/books/bookres.fcgi/helppubmed/pubmedhelp.pdf
Greenstone: The software can be downloaded at
http://www.greenstone.org/cgi-bin/library?e=p-en-home-utfZz-8&a=p&p=download
Manuals (the "User's guide" is most relevant to our purpose): http://greenstone.sourceforge.net/wiki/index.php/Manual