On-Line Learning of Undirected Sparse n-grams
From Tetherless World Wiki
Citation: Karl Pfleger. (2002) On-Line Learning of Undirected Sparse n-grams. In KSL-02-02, 2002.
| Publication techreport ( Edit ) | |
| type | Technical Report |
| bibtype | techreport |
| Bibtex basics | |
| author | Karl Pfleger |
| title | On-Line Learning of Undirected Sparse n-grams |
| number | KSL-02-02 |
| institution | Knowledge Systems, AI Laboratory |
| year | 2002 |
| Bibtex more | |
| Access Paper | |
| abstract | Intelligent autonomous agents in complex worlds must be able to identify frequent patterns in the massive data streams of their lives. These patterns serve as general foundations for inferring unseen data and as building blocks for higher-level knowledge representation. We address this need with systems based on n-grams, simple learning models considered state-of-the-art in many sequential domains. Standard n-grams suffer from an exponential number of parameters in the model width, n. We introduce an undirected sparse n-gram model, which stores probability estimates only for some(frequent) n-tuples from the unconditional joint distribution and which outperforms narrower exhaustive n-grams with the same number of parameters.Techniques exist for pruning exhaustive n-grams after training but are inappropriate when on-line learning is required (as when data overwhelms storage capacity). We present an unsupervised, on-line learning algorithm for sparse n-grams which induces both model parameters and structure (which patterns to include) despite the challenging a priori uncertainty of which patterns are frequent. The resulting dynamic pattern inclusion complicates probability estimates and inference, but we present several solutions, including Bayesian and EM approaches. We also discuss multiwidth combinations of sparse n-grams, which are capable of hierarchically representing repeated substructure. They learn on-line with no fixed bound on pattern size, but still using less memory than that taken by the training data. Thus, they learn frequent patterns from data streams incrementally and at multiple size scales, without fitting parameters for all possible patterns and without remembering or iterating over training data. |
| KSL Technical Report ID: KSL-02-02 |
Facts about On-Line Learning of Undirected Sparse n-gramsRDF feed
| Abstract | Intelligent autonomous agents in complex w … Intelligent autonomous agents in complex worlds must be able to identify frequent patterns in the massive data streams of their lives. These patterns serve as general foundations for inferring unseen data and as building blocks for higher-level knowledge representation. We address this need with systems based on n-grams, simple learning models considered state-of-the-art in many sequential domains. Standard n-grams suffer from an exponential number of parameters in the model width, n. We introduce an undirected sparse n-gram model, which stores probability estimates only for some(frequent) n-tuples from the unconditional joint distribution and which outperforms narrower exhaustive n-grams with the same number of parameters.Techniques exist for pruning exhaustive n-grams after training but are inappropriate when on-line learning is required (as when data overwhelms storage capacity). We present an unsupervised, on-line learning algorithm for sparse n-grams which induces both model parameters and structure (which patterns to include) despite the challenging a priori uncertainty of which patterns are frequent. The resulting dynamic pattern inclusion complicates probability estimates and inference, but we present several solutions, including Bayesian and EM approaches. We also discuss multiwidth combinations of sparse n-grams, which are capable of hierarchically representing repeated substructure. They learn on-line with no fixed bound on pattern size, but still using less memory than that taken by the training data. Thus, they learn frequent patterns from data streams incrementally and at multiple size scales, without fitting parameters for all possible patterns and without remembering or iterating over training data. membering or iterating over training data. |
| Author | Karl Pfleger + |
| Bibtype | techreport + |
| Has author | Karl Pfleger + |
| Has identifier | KSL-02-02 + |
| Has publishing details | 2002 + |
| Has title | On-Line Learning of Undirected Sparse n-grams + |
| Has where published | KSL-02-02 + |
| Has year | 2002 + |
| Institution | Knowledge Systems, AI Laboratory + |
| Ksl tr id | KSL-02-02 + |
| Number | KSL-02-02 + |
| Process note | NO + |
| Title | On-Line Learning of Undirected Sparse n-grams + |
| Year | 2002 + |
Resource > Thing > Entity > Document > Scientific Document > Publication
Resource > Thing > Entity > Document > Scientific Document > Publication > Technical Report
Resource > Thing > Entity > Document > Scientific Document > Publication > Technical Report > KSL Technical Report
