The Protein Information Resource (PIR) is an integrated public bioinformatics resource that supports genomic and proteomic research and scientific studies. PIR has provided many protein databases and analysis tools to the scientific community, including the PIR-International Protein Sequence Database (PSD) of functionally annotated protein sequences. Because the protein sequence database activities of PIR, Swiss-Prot, and TrEMBL are now combined to produce the UniProt databases, the PIR-PSD is no longer being updated. Release 80.00 (31-Dec-2004), the final release of PIR-PSD, is available for FTP download and online searching at the PIR website. All PIR-PSD entries have been merged into the UniProt databases, and PIR-PSD identifiers can be used to retrieve and track these sequences in either UniProtKB or UniParc. PIR contributes to the functional annotation of UniProtKB protein sequences. Major ongoing annotation efforts include curation of protein families in the PIRSF (SuperFamily) system; definition of classification-driven rules for the propagation of position-specific features, protein names, and GO terms to protein entries; and bibliographic attribution of experimental features. A new PIR resource, iProLINK, provides multiple annotated literature corpora to facilitate text mining research in the areas of literature-based database curation, named entity recognition, and protein ontology development. PIR continues to enhance iProClass, an integrated database of protein family, function, and structure information that serves as a hub for mapping and integrating protein data from multiple sources, and maintains PIR-NREF, a non-redundant reference database of protein sequences. The PIR website connects data mining and sequence analysis tools to the underlying databases for information retrieval and knowledge discovery, with functionalities for interactive queries, combined sequence and annotation text searches, and sorting and visual exploration of search results.
The FTP site provides free downloads of database releases.
protein sequence, general sequence, sequence analysis