
Machine Learning: Yeast Data Set


Yeast Data Set

Abstract:

Predicting the Cellular Localization Sites of Proteins

Keywords:

Multivariate, Classification, UCI, Yeast

Data format:

TEXT

Data usage:

This data set is used for classification.

Detailed description:

Yeast Data Set Abstract: Predicting the Cellular Localization Sites of Proteins

Source:

Creator and Maintainer:

Kenta Nakai

Institute of Molecular and Cellular Biology

Osaka University

1-3 Yamada-oka, Suita 565 Japan

nakai '@' imcb.osaka-u.ac.jp

http://www.imcb.osaka-u.ac.jp/nakai/psort.html

Donor:

Paul Horton

Data Set Information:

Predicted Attribute: Localization site of the protein (non-numeric).
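
Because the predicted attribute is a categorical label, standard classifiers can be applied to the numeric attribute scores. As a minimal sketch, the following uses a k-nearest-neighbour vote. This is a stand-in chosen for brevity, not the rule-based expert system described in the references; the `knn_predict` helper, the two-dimensional feature vectors, and the site labels are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Sort training items by Euclidean distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical (feature vector, site label) pairs, for illustration only.
train = [
    ((0.10, 0.20), "SITE_A"),
    ((0.15, 0.25), "SITE_A"),
    ((0.20, 0.10), "SITE_A"),
    ((0.80, 0.90), "SITE_B"),
    ((0.85, 0.80), "SITE_B"),
    ((0.90, 0.95), "SITE_B"),
]

print(knn_predict(train, (0.12, 0.22)))
print(knn_predict(train, (0.88, 0.90)))
```

With real records, each feature vector would hold the numeric attribute scores of one protein and each label its localization site.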

The references below describe a predecessor to this dataset and its development. They also give results (not cross-validated) for classification by a rule-based expert system with that version of the dataset.

Reference: "Expert System for Predicting Protein Localization Sites in Gram-Negative Bacteria", Kenta Nakai & Minoru Kanehisa, PROTEINS: Structure, Function, and Genetics 11:95-110, 1991.

Reference: "A Knowledge Base for Predicting Protein Localization Sites in Eukaryotic Cells", Kenta Nakai & Minoru Kanehisa, Genomics 14:897-911, 1992.

Attribute Information:

1. Sequence Name: Accession number for the SWISS-PROT database

2. mcg: McGeoch's method for signal sequence recognition.

3. gvh: von Heijne's method for signal sequence recognition.

4. alm: Score of the ALOM membrane spanning region prediction program.

5. mit: Score of discriminant analysis of the amino acid content of the N-terminal region (20 residues long) of mitochondrial and non-mitochondrial proteins.

6. erl: Presence of "HDEL" substring (thought to act as a signal for retention in the endoplasmic reticulum lumen). Binary attribute.

7. pox: Peroxisomal targeting signal in the C-terminus.

8. vac: Score of discriminant analysis of the amino acid content of vacuolar and extracellular proteins.

9. nuc: Score of discriminant analysis of nuclear localization signals of nuclear and non-nuclear proteins.
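
The attribute list above maps naturally onto a record parser. The following is a minimal sketch, assuming each record is a single line of whitespace-separated fields in the order listed, followed by the localization-site label as a final field. That layout, the `YeastRecord` type, the `parse_yeast_line` helper, and the sample row are assumptions for illustration and are not taken from the data file itself.

```python
from typing import NamedTuple

# Feature names follow the attribute list (attributes 2 to 9).
FEATURES = ("mcg", "gvh", "alm", "mit", "erl", "pox", "vac", "nuc")


class YeastRecord(NamedTuple):
    name: str        # attribute 1: SWISS-PROT accession
    features: dict   # feature name -> numeric score
    site: str        # predicted attribute: localization site (non-numeric)


def parse_yeast_line(line: str) -> YeastRecord:
    """Parse one record, assuming whitespace-separated fields (see lead-in)."""
    fields = line.split()
    expected = len(FEATURES) + 2  # name + eight scores + site label
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}")
    name, *scores, site = fields
    values = {key: float(raw) for key, raw in zip(FEATURES, scores)}
    return YeastRecord(name, values, site)


# Hypothetical row for illustration only; not copied from the data set.
sample = "PROT1_YEAST  0.58  0.61  0.47  0.13  0.50  0.00  0.48  0.22  SITE_A"
record = parse_yeast_line(sample)
print(record.name, record.site, record.features["erl"])
```

Keeping the sequence name separate from the numeric scores makes it easy to exclude the identifier when the scores are passed to a classifier.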

Relevant Papers:

Paul Horton & Kenta Nakai, "A Probablistic Classification System for Predicting the Cellular Localization Sites of Proteins", Intelligent Systems in Molecular Biology, 109-115. St. Louis, USA 1996.

