Help Page of MedEvi

Help Page


We welcome comments and enquiries about topics that are not covered in the current help page.
  1. How does MedEvi work?
  2. What is the syntax of MedEvi queries?
  3. What are the options in the advanced search required for?
  4. What are query items?
  5. What are the co-occurrence results of a query?
  6. What are the result sets in the output pages?
  7. How are the highlighted terms selected?
  8. What concept variables can be used in the query?
  9. How do I cite MedEvi?

1. How does MedEvi work?

The concordancer works as follows:
  1. Parsing an input query string (see the second FAQ),
  2. Fetching MEDLINE abstracts indexed by Apache Lucene,
  3. Filtering out irrelevant abstracts according to given options (see the third FAQ),
  4. Searching for co-occurrence results of the query in relevant abstracts,
  5. Grouping the co-occurrence results according to the order of query items in the results (see the sixth FAQ), and
  6. Highlighting terms that frequently appear in the contexts of the query items (see the seventh FAQ).

2. What is the syntax of MedEvi queries?

The syntax of MedEvi queries is based on the syntax of Lucene queries, extended with concept variables, which are explained in FAQ 8. Like Lucene, MedEvi allows single terms and phrases, both wildcards (i.e. '*' and '?'), the three boolean operators (i.e. 'AND', 'OR', 'NOT'), and escaping of all special characters, but it does not support field search, fuzzy search, proximity search, range search, or boosting.
However, MedEvi restricts grouping so that it is allowed only for the 'OR' operator (e.g. "(ada OR acrR) AND (activat* OR inhibit*)"), and not for the 'AND' operator (e.g. "(ada AND activat*) OR (acrR AND inhibit*)"). This restriction enables the concordancer to divide a query string into a linear list of query items (e.g. "ada OR acrR", "activat* OR inhibit*"), to group co-occurrence results of the query according to the order of the query items in the results, and to visualize co-occurrence results in each result set by arranging each query item into a separate column (see Figure 1).
Thus, the syntax of MedEvi queries can be represented as follows:
  • Query ::= QueryItem | (QueryItem) | (QueryItem) AND Query
  • QueryItem ::= Term | Term OR QueryItem
  • Term ::= String | "String" | Variable
  • String ::= [A-Za-z0-9]+*? | [A-Za-z0-9]+*? String
  • Variable ::= "[antibody]" | "[cell]" | "[disease]" | "[drug]" | "[gene]" | "[molecularfunction]" | "[mutation]" | "[symptom]" | "[toxicity]" | "[tumor]"

Please see FAQ 8 for details of the Variables.

We can utilize MedEvi for various purposes, for example, for searching for partners of gene regulation and, specifically, of transcription regulation, as follows:
  1. "(ada OR acrR) AND (activat* OR inhibit*)"
  2. "crp AND (bind OR binds OR binding OR bound) AND DNA"
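
Because grouping is allowed only around 'OR', a query can be split into a linear list of query items. The following is a minimal Python sketch of that idea, not MedEvi's actual parser: `split_query_items` is a hypothetical helper, and it assumes whitespace-separated operators and ignores 'NOT', quoted phrases, and escaping.

```python
import re


def split_query_items(query: str) -> list[list[str]]:
    """Split a MedEvi-style query into query items (lists of OR-alternatives).

    Hypothetical helper for illustration only.
    """
    items = []
    # Query items are joined by AND at the top level.
    for part in re.split(r"\s+AND\s+", query.strip()):
        part = part.strip()
        # Parentheses may only wrap a single OR group.
        if part.startswith("(") and part.endswith(")"):
            part = part[1:-1]
        if "(" in part or ")" in part:
            raise ValueError(f"unsupported grouping in query item: {part!r}")
        terms = [t.strip() for t in re.split(r"\s+OR\s+", part) if t.strip()]
        if not terms:
            raise ValueError("empty query item")
        items.append(terms)
    return items


if __name__ == "__main__":
    print(split_query_items("(ada OR acrR) AND (activat* OR inhibit*)"))
    print(split_query_items("crp AND (bind OR binds OR binding OR bound) AND DNA"))
```

A query that groups around 'AND', such as "(ada AND activat*) OR (acrR AND inhibit*)", raises a `ValueError` in this sketch, mirroring the restriction described above.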

3. What are the options in the advanced search required for?

Users can change the options of MedEvi in the advanced search. The options are described below. Default values are given in parentheses.
  • Context in chars (50): The number of characters of the leftmost context in a co-occurrence result (Left), that of the context between two query items (Middle), and that of the rightmost context (Right).
  • Sentence boundary (false): If checked, then the results page shows only matching terms which co-occur within the boundary of a sentence.
  • Start/End pub. date (null): The publication date limits. Fetched MEDLINE abstracts published before the start date or after the end date are filtered out.
  • Maximum occurrences (1000): The maximum number of co-occurrence results fetched in the result set.
  • Maximum document hits (500): The maximum number of MEDLINE abstracts that are fetched with a given query string.
  • Sort by (Relevance score): By default the results are sorted by document relevance. Other sorting options include: variations of query items found in the abstracts (Keyword), the leftmost contexts (Left context), the contexts between query items (Middle context), the rightmost contexts (Right context), and the publication dates of the abstracts (Pub date).
  • Match SwissProt IDs (null): A list of SwissProt IDs. A relevant abstract must contain at least one gene/protein name from the SwissProt entries with these IDs.
  • Only search abstracts with PubMed IDs (null): A list of PubMed IDs. The concordancer looks for co-occurrence results of a query string only in the abstracts with these IDs.
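
For readers who script against these settings, the options and their defaults can be summarised as a small Python data structure. This is only an illustrative sketch: the class and field names are hypothetical, and MedEvi does not necessarily expose them in this form.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchOptions:
    """Hypothetical container mirroring the advanced-search defaults listed above."""

    context_chars: int = 50                    # Context in chars
    sentence_boundary: bool = False            # Sentence boundary
    start_pub_date: Optional[str] = None       # Start pub. date (null)
    end_pub_date: Optional[str] = None         # End pub. date (null)
    max_occurrences: int = 1000                # Maximum occurrences
    max_document_hits: int = 500               # Maximum document hits
    sort_by: str = "Relevance score"           # Sort by
    swissprot_ids: Optional[list[str]] = None  # Match SwissProt IDs (null)
    pubmed_ids: Optional[list[str]] = None     # Only search abstracts with PubMed IDs (null)


if __name__ == "__main__":
    # Override only the options that differ from the defaults.
    print(SearchOptions(sentence_boundary=True, context_chars=80))
```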

4. What are query items?

Query items are the substrings of a query string that are joined by the 'AND' operator. For example, the query string "(ada OR acrR) AND (activat* OR inhibit*)" has two query items (i.e. "ada OR acrR", "activat* OR inhibit*"). They are identical to the strings matched by QueryItem in the syntax representation of FAQ 2. Each query item corresponds to a set of alternative terms, joined by the 'OR' operator, which are arranged into a separate column in an output page.

5. What are co-occurrence results of a query?

A co-occurrence result of a query is a substring of a title or an abstract that contains all query items within the boundary set by the options (i.e. Context in chars). If a query item is a collection of terms joined by the 'OR' operator, exactly one of them must be included in the result.
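
As an illustration of this definition, the sketch below searches a text for spans that contain one term from every query item, with at most `context_chars` characters between adjacent matches. It is a simplified approximation under stated assumptions, not the concordancer's actual algorithm: the function names are hypothetical, wildcards are mapped to word characters, matching is case-insensitive, and sentence boundaries are ignored.

```python
import re
from itertools import product


def term_to_regex(term: str) -> re.Pattern:
    """Translate a term with '*' / '?' wildcards into a whole-word regex."""
    pattern = re.escape(term).replace(r"\*", r"\w*").replace(r"\?", r"\w")
    return re.compile(rf"\b{pattern}\b", re.IGNORECASE)


def find_cooccurrences(text, items, context_chars=50):
    """Return (item_order, snippet) pairs; `items` is a list of OR-alternative lists."""
    # Collect every match of every alternative, tagged with its query item index.
    matches = []
    for index, terms in enumerate(items):
        hits = [(m.start(), m.end(), index)
                for term in terms
                for m in term_to_regex(term).finditer(text)]
        matches.append(hits)

    results = []
    # Try each combination that takes exactly one match per query item.
    for combo in product(*matches):
        ordered = sorted(combo)
        # Adjacent matches must not overlap and must lie within the middle context.
        if all(0 <= b[0] - a[1] <= context_chars
               for a, b in zip(ordered, ordered[1:])):
            start = max(0, ordered[0][0] - context_chars)
            end = min(len(text), ordered[-1][1] + context_chars)
            results.append((tuple(hit[2] for hit in ordered), text[start:end]))
    return results


if __name__ == "__main__":
    items = [["ada", "acrR"], ["activat*", "inhibit*"]]
    for order, snippet in find_cooccurrences(
            "The ada protein activates transcription of alkA.", items):
        print(order, snippet)
```

Each result carries the order in which the query items appear, which is the key used to group results into result sets.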

6. What are the result sets in the output pages?

The co-occurrence results of a given query string are grouped according to the order of query items. Each result set contains the occurrences in which the query items appear in the same order. Since all co-occurrence results in a result set have the same order of query items, the query items are arranged into separate columns as shown in Figure 1.
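
The grouping step can be sketched in a few lines of Python. This is an illustration only: it assumes each co-occurrence result is represented as a pair of (order of query item indices, text snippet), which is a hypothetical representation rather than MedEvi's internal one.

```python
from collections import defaultdict


def group_by_item_order(occurrences):
    """Group (order, snippet) pairs into result sets keyed by query item order."""
    result_sets = defaultdict(list)
    for order, snippet in occurrences:
        result_sets[tuple(order)].append(snippet)
    return dict(result_sets)


if __name__ == "__main__":
    # Item 0 is "ada OR acrR", item 1 is "activat* OR inhibit*".
    occurrences = [
        ((0, 1), "ada protein activates transcription"),
        ((1, 0), "inhibition by acrR"),
        ((0, 1), "acrR inhibits expression"),
    ]
    for order, snippets in group_by_item_order(occurrences).items():
        print(order, snippets)
```

Within each result set, every snippet has its query items in the same positions, which is what allows them to be aligned in columns.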



7. How are the highlighted terms selected?

The concordancer collects single words (nouns, verbs, adjectives, and adverbs) and phrases (noun+noun, adjective+noun, etc.) from the contexts of query items, scores them with a statistical measure called the Z score, and classifies them into three classes: top, middle, and low.
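
This help page does not specify how the Z score is computed or where the class boundaries lie. The sketch below shows one possible reading, purely as an assumption: each candidate term's frequency is standardised against the mean and standard deviation of all candidate frequencies, and the thresholds `top=2.0` and `middle=1.0` are hypothetical values chosen for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def classify_terms(context_terms, top=2.0, middle=1.0):
    """Assign each term to 'top', 'middle' or 'low' by the z-score of its frequency.

    The scoring scheme and thresholds are assumptions, not MedEvi's documented method.
    """
    counts = Counter(context_terms)
    if not counts:
        return {}
    frequencies = list(counts.values())
    mu = mean(frequencies)
    sigma = pstdev(frequencies)
    classes = {}
    for term, n in counts.items():
        # With zero variance every term is equally frequent, so z is set to 0.
        z = (n - mu) / sigma if sigma else 0.0
        if z >= top:
            classes[term] = "top"
        elif z >= middle:
            classes[term] = "middle"
        else:
            classes[term] = "low"
    return classes


if __name__ == "__main__":
    terms = ["binding"] * 10 + [f"term{i}" for i in range(9)]
    print(classify_terms(terms))
```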

8. What concept variables can be used in MedEvi queries?


The variables refer to semantic types of prevalent biomedical entities, which were originally proposed for the Genomics Track of TREC 2007 (http://ir.ohsu.edu/genomics/2007protocol.html). A variable can be used as a QueryItem that virtually consists of all terms belonging to a certain type of biomedical entity. For instance, the query "cancer AND [gene]" will fetch pairs of the string "cancer" and any gene/protein name. Note that a query using variables must have at least one constant QueryItem (e.g. "cancer"). The constant is used to retrieve the documents that contain it, in which the terms of the type referred to by the variable are then identified. A query can have more than one variable (e.g. "[disease] AND centrosom* AND [gene]") as long as it has at least one constant. The table below defines the set of possible variables and the terminological resources used to identify them:

Variable              Description             Source
[antibody]            Antibodies              MeSH
[cell]                Cell or Tissue types    MeSH
[disease]             Diseases                MeSH
[drug]                Drugs                   DrugBank
[gene]                Genes or Proteins       UniProtKB/Swiss-Prot
[molecularfunction]   Molecular functions     Gene Ontology
[mutation]            Mutations               MeSH
[symptom]             Signs or Symptoms       MeSH
[toxicity]            Toxicities              UMLS
[tumor]               Tumor types             MeSH
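
The rule that a query with variables needs at least one constant query item can be checked before a query is submitted. The following Python sketch is illustrative only: `has_constant_item` is a hypothetical helper, and it treats a query item as constant when none of its alternative terms is a variable, which is an assumption about how mixed items would be handled.

```python
# Variable names taken from the table above.
VARIABLES = {
    "[antibody]", "[cell]", "[disease]", "[drug]", "[gene]",
    "[molecularfunction]", "[mutation]", "[symptom]", "[toxicity]", "[tumor]",
}


def has_constant_item(items):
    """Return True if at least one query item contains no concept variable.

    `items` is a list of query items, each a list of OR-alternative terms.
    """
    return any(
        all(term.lower() not in VARIABLES for term in terms)
        for terms in items
    )


if __name__ == "__main__":
    print(has_constant_item([["cancer"], ["[gene]"]]))
    print(has_constant_item([["[disease]"], ["[gene]"]]))
```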


Further details of the term recognition methods used can be found in:
A Jimeno, P Pezik, and D Rebholz-Schuhmann. (2007) Information retrieval and information extraction in TREC Genomics 2007, In proceedings of the TREC Genomics competition 2007, Washington, U.S.A.

9. How do I cite MedEvi?

To reference MedEvi, please cite the following paper:

MedEvi: Retrieving textual evidence of relations between biomedical concepts from Medline. Jung-jae Kim, Piotr Pezik, Dietrich Rebholz-Schuhmann (2008) Bioinformatics 24(11):1410-1412.
