Patent Data Resources

Patent data resources at the EBI contain patent abstracts, patent chemical compounds, patent sequences and patent equivalents. There are various ways of accessing and searching the patent data.

Resources Description Access
Patent Abstracts Patent abstracts contain the biology-related abstracts of patent applications, derived from data products of the European Patent Office (EPO). Patent documents from Europe (EP), USA (US) and World (WO) are included. Patent abstracts can be accessed via EBI Search and Europe PubMed Central. EBI Search
Europe PubMed Central
Patent Chemical Compounds Chemical compounds extracted from patents are available in the SureChEMBL database. SureChEMBL is a publicly available large-scale resource containing compounds extracted from the full text, images and attachments of patent documents. The data are extracted from the patent literature daily by an automated text- and image-mining pipeline. Currently, the database contains 17 million compounds extracted from 14 million patent documents. Patent-compound associations can also be downloaded directly from our FTP page.


Patent Chemical Compounds Patent chemical compounds are also available in the ChEBI database, a dictionary of molecular entities focused on ‘small’ chemical compounds. To search them, use the ChEBI Advanced Search page and restrict the search to the Patent database. ChEBI
ChEBI Advanced
Patent Sequences Multiple sets of patent sequences are available at EBI.
  1. Patent proteins cover sequences of EPO (European Patent Office) proteins, JPO (Japan Patent Office) proteins, KIPO (Korean Intellectual Property Office) proteins and USPTO (United States Patent and Trademark Office) proteins.
  2. Patent nucleotides contain the patent-class data from EMBL-Bank.
  3. Non-redundant patent sequences are provided as two levels of database. In Level-1, sequences that are 100% identical over the same length are grouped together; in Level-2, sequences are grouped only if they are identical and also belong to the same patent family (the same invention).
Patent proteins
Patent nucleotides
Non-redundant patent sequences
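The two non-redundancy levels described above can be illustrated with a minimal Python sketch. This is not the EBI pipeline; the record layout, accessions, sequences and family identifiers below are hypothetical and only show how the two grouping keys differ.

```python
from collections import defaultdict

# Hypothetical records: (accession, sequence, patent_family_id).
# None of these values come from a real database.
records = [
    ("A1", "MKTAYIAK", "FAM1"),
    ("A2", "MKTAYIAK", "FAM1"),
    ("A3", "MKTAYIAK", "FAM2"),
    ("A4", "MKTAYIAKQ", "FAM1"),
]


def level1_clusters(records):
    """Group accessions whose sequences are 100% identical.

    Exact string equality implies identical length as well.
    """
    clusters = defaultdict(list)
    for accession, sequence, _family in records:
        clusters[sequence.upper()].append(accession)
    return dict(clusters)


def level2_clusters(records):
    """Group accessions that are identical AND share a patent family."""
    clusters = defaultdict(list)
    for accession, sequence, family in records:
        clusters[(sequence.upper(), family)].append(accession)
    return dict(clusters)
```

In this toy data, A1, A2 and A3 fall into one Level-1 group, while Level-2 separates A3 because it belongs to a different family.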
Patent Equivalent Data A "patent family" can be defined as all patent equivalents for a single invention. The published patent applications from various countries, together with the patents subsequently granted on that invention, are commonly referred to as patent equivalents. They are not "true equivalents", because each country may have different filing regulations and may interpret the invention differently. A family may include multiple patents in some countries because of differences in patent laws (e.g., how much new technology can be included in a single patent). Equivalent report
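As a rough illustration of the patent family idea, the following Python sketch groups publications by a family identifier and counts how many equivalents each country contributes. The publication numbers, family identifiers and the assumption that the first two characters of a publication number give the country code are all hypothetical and do not reflect the format of the equivalent report.

```python
from collections import Counter, defaultdict

# Hypothetical (publication_number, family_id) pairs, for illustration only.
publications = [
    ("EP0000001", "FAM1"),
    ("US0000001", "FAM1"),
    ("US0000002", "FAM1"),
    ("WO0000001", "FAM2"),
]


def equivalents_by_family(publications):
    """Collect all publication numbers that share a family identifier."""
    families = defaultdict(list)
    for number, family_id in publications:
        families[family_id].append(number)
    return dict(families)


def country_counts(numbers):
    """Count equivalents per country, assuming a two-letter country prefix."""
    return Counter(number[:2] for number in numbers)
```

Here the hypothetical family FAM1 contains one EP document and two US documents, which mirrors the point that a family may hold several patents from the same country.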