QuickSearch:   Number of matching entries: 0.

AuthorTitleYearJournal/ProceedingsReftypeDOI/URL
Akutsu, T. Approximate string matching with don't care characters 1995 Information Processing Letters   article URL  
Abstract: This paper presents parallel and serial approximate matching algorithms for strings with don't care characters. They are based on Landau and Vishkin's approximate string matching algorithm and Fisher and Paterson's exact string matching algorithm with don't care characters. The serial algorithm works in O(radickmn log¦Sgr¦ log2 m/k log log m/k) time, and the parallel algorithm works in O(k log m) time using O(radicm/kn log ¦Sgr¦ log m/k log log m/k) Processors on a CRCW-PRAM, where n denotes the length of a text string, m denotes the length of a pattern string, k denotes the maximum number of differences, and sum denotes the alphabet (i.e. the set of characters). Several extensions are also described.
BibTeX:
@article{Aku95,
  author = {T. Akutsu},
  title = {Approximate string matching with don't care characters},
  journal = {Information Processing Letters},
  year = {1995},
  volume = {55},
  number = {5},
  pages = {235--239},
  url = {http://www.springerlink.com/content/f81x53t725041651/}
}
Akutsu, T. Approximate string matching with don't care symbols 1994 Proc. 5th Combinatorial Pattern Matching Conference(Asilomar, CA 1994   inproceedings  
BibTeX:
@inproceedings{Aku94,
  author = {T. Akutsu},
  title = {Approximate string matching with don't care symbols},
  booktitle = {Proc. 5th Combinatorial Pattern Matching Conference(Asilomar, CA 1994},
  publisher = {#svb#},
  year = {1994},
  number = {807},
  pages = {240--249}
}
Allison, L., Powell, D. & Dix, T. I. Compression and approximate matching. 1999 The Computer Journal   article URL  
Abstract: A population of sequences is called non-random if there is a statistical model and an associated compression algorithm that allows members of the population to be compressed, on average. Any available statistical model of a population should be incorporated into algorithms for alignment of the sequences and doing so changes the rank order of possible alignments in general. The model should also be used in deciding if a resulting approximate match between 2 sequences is significant or not. It is shown how to do this for 2 plausible interpretations involving pairs of sequences that might or might not be related. Efficient alignment algorithms are described for quite general statistical models of sequences. The new alignment algorithms are more sensitive to what might be termed "features" of the sequences. A natural significance test is shown to be rarely fooled by apparent similarities between 2 sequences that are merely typical of all or most members of the population, even unrelated members.
BibTeX:
@article{APD99,
  author = {L. Allison and D. Powell and T. I. Dix},
  title = {Compression and approximate matching.},
  journal = {The Computer Journal},
  year = {1999},
  volume = {42},
  number = {1},
  pages = {1--10},
  url = {http://comjnl.oxfordjournals.org/cgi/content/abstract/42/1/1}
}
Alves, C., Caceres, E. & Song, S. A BSP/CGM algorithm for the all-substrings longest common subsequence problem 2003 Parallel and Distributed Processing Symposium, 2003. Proceedings. International   inproceedings DOI  
BibTeX:
@inproceedings{Alves2003,
  author = {Alves, C.E.R. and Caceres, E.N. and Song, S.W.},
  title = {A BSP/CGM algorithm for the all-substrings longest common subsequence problem},
  booktitle = {Parallel and Distributed Processing Symposium, 2003. Proceedings. International},
  year = {2003},
  pages = {8pp.},
  doi = {http://dx.doi.org/10.1109/IPDPS.2003.1213150}
}
Amir, A. Multidimensional and relative approximate pattern matching 1987   techreport  
BibTeX:
@techreport{Ami87,
  author = {A. Amir},
  title = {Multidimensional and relative approximate pattern matching},
  year = {1987},
  number = {UMIACS-TR-87-41 and CS-TR-1912}
}
Amir, A. & Farach, M. Adaptive dictionary matching 1991 Foundations of Computer Science, 1991. Proceedings., 32nd Annual Symposium on   inproceedings DOIURL  
Abstract: Semiadaptive and fully adaptive dictionary matching algorithms are presented. In the fully adaptive algorithm, the dictionary is processed in time O(|D| log |D|). Inserting a new pattern Pk+1 into the dictionary can be done in time O |PK+1| log |D|). A dictionary pattern can be deleted in time O(log |D|). Text scanning is accomplished in time O(|T| log |D|). Also presented is a parallel version of the algorithm with optimal speedup for the dictionary construction and pattern addition phase and a logarithmic overhead in the text scan phase. The method used incorporates a new way of using suffix trees as well as a new data structure in which the suffix tree is embedded for the sequential algorithm
BibTeX:
@inproceedings{Amir1991,
  author = {Amir, A. and Farach, M.},
  title = {Adaptive dictionary matching},
  booktitle = {Foundations of Computer Science, 1991. Proceedings., 32nd Annual Symposium on},
  year = {1991},
  pages = {760--766},
  url = {http://ieeexplore.ieee.org/search/wrapper.jsp?arnumber=185445},
  doi = {http://dx.doi.org/10.1109/SFCS.1991.185445}
}
Amir, A., Farach, M. & Muthukrishnan, S. Alphabet dependence in parameterized matching 1994 Information Processing Letters   article URL  
Abstract: The classical pattern matching paradigm is that of seeking occurrences of one string in another, where both strings are drawn from an alphabet set ?. A recently introduced model is that of parameterized pattern matching. The main motivation for this scheme lies in software maintenance where program fragments are considered «identical» even if variables names are different. Besides the fixed symbols from ?, strings under this model have additional symbols from a variable set ? and occurrences of one string in the other are sought, where renaming of the variables from ? is allowed in a match. In this paper we provide an algorithm to find all occurrences of a pattern string of length m in a text string of length n under the parameterized pattern matching model. Our algorithm takes time O(n log pi), where pi = min(m, |Lamda|), independent of |Sigma|. Our algorithm is optimal since we show that this depedence on |Lambda| is inherent to any algorithms for this problem in the comparison model.
BibTeX:
@article{AFM94,
  author = {A. Amir and M. Farach and S. Muthukrishnan},
  title = {Alphabet dependence in parameterized matching},
  journal = {Information Processing Letters},
  year = {1994},
  volume = {49},
  number = {3},
  pages = {111--115},
  url = {http://citeseer.ist.psu.edu/35642.html}
}
Apostolico, A., Bock, M. & Lonardi, S. Linear global detectors of redundant and rare substrings 1999 Data Compression Conference, 1999. Proceedings. DCC '99   inproceedings DOIURL  
Abstract: We show here that under several accepted measures of deviation from expected frequency, the candidates over- or underrepresented words are restricted to the O(n) words that end at internal nodes of a compact suffix tree, as opposed to the ?(n 2 ) possible substrings. This surprising fact is a consequence of properties in the form that if a word that ends in the middle of an arc is, say, overrepresented, then its extension to the nearest node of the tree is even more so. Based on this, we design global linear detectors of favored and unfavored words for our probabilistic framework, and display the results of some preliminary that apply our constructions to the analysis of genomic sequences.
BibTeX:
@inproceedings{Apostolico1999,
  author = {Apostolico, A. and Bock, M.E. and Lonardi, S.},
  title = {Linear global detectors of redundant and rare substrings},
  booktitle = {Data Compression Conference, 1999. Proceedings. DCC '99},
  year = {1999},
  pages = {168--177},
  url = {http://csdl2.computer.org/persagen/DLAbsToc.jsp?resourcePath=/dl/proceedings/&toc=comp/proceedings/dcc/1999/0096/00/0096toc.xml&DOI=10.1109/DCC.1999.755666},
  doi = {http://dx.doi.org/10.1109/DCC.1999.755666}
}
Apostolico, A., Erdős, P. & Lewenstein, M. Parameterized matching with mismatches 2007 Journal of Discrete Algorithms   article DOIURL  
Abstract: The problem of approximate previous termparameterized stringnext term searching consists of finding, for a given text t=t1t2…tn and pattern p=p1p2…pm over respective alphabets ?t and ?p, the injection ?i from ?p to ?t maximizing the number of matches between ?i(p) and titi+1…ti+m?1 (i=1,2,…,n?m+1). We examine the special case where both previous termstringsnext term are run-length encoded, and further restrict to the case where one of the alphabets is binary. For this case, we give a construction working in time O(n+(rp×rt)?(rt)log(rt)), where rp and rt denote the number of runs in the corresponding encodings for y and x, respectively, and ? is the inverse of the Ackermann's function.
BibTeX:
@article{apostolico2007pmm,
  author = {Apostolico, A. and Erd{\H{o}}s, P.L. and Lewenstein, M.},
  title = {Parameterized matching with mismatches},
  journal = {Journal of Discrete Algorithms},
  publisher = {Elsevier},
  year = {2007},
  volume = {5},
  number = {1},
  pages = {135--140},
  url = {http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B758J-4JW128C-6&_user=10&_coverDate=03%2F31%2F2007&_rdoc=1&_fmt=&_orig=search&_sort=d&view=c&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=a99cf64e4a5c764a08703303ef4b3602},
  doi = {http://www.sinab.unal.edu.co:2053/science?_ob=ArticleURL\&_udi=B758J-4JW128C-6\&_user=1998314\&_coverDate=03%2F31%2F2007\&_alid=649864815\&_rdoc=3\&_fmt=full\&_orig=search\&_cdi=12928\&_sort=d\&_docanchor=\&view=c\&_ct=32\&_acct=C000055778\&_version=1\&_urlVersion=0\&_userid=1998314\&md5=84cbc18b5d3366bc07d6a75ac0e71766}
}
Apostolico, A. & Galil, Z. Pattern Matching Algorithms 1997   book URL  
BibTeX:
@book{apostolico1997pma,
  author = {Apostolico, A. and Galil, Z.},
  title = {Pattern Matching Algorithms},
  publisher = {Oxford University Press US},
  year = {1997},
  url = {http://books.google.com.co/books?id=mFd_grFyiT4C&pg=PA341&dq=stringology&sig=0nDT-tJo0NtDFzfJ46qK41P85Lo#PPP12,M1}
}
Apostolico, A. & Giancarlo, R. Periodicity and Repetitions in Parameterized Strings 2005 Electronic Notes in Discrete Mathematics   article DOIURL  
Abstract: One of the most beautiful and useful notions in the Mathematical Theory of Strings is that of a Period, i.e., an initial piece of a given string that can generate that string by repeating itself at regular intervals. Periods have an elegant mathematical structure and a wealth of applications [F. Mignosi and A. Restivo, Periodicity, Algebraic Combinatorics on Words, in: M. Lothaire (Ed.), Cambridge University Press, Cambridge, pp. 237–274, 2002]. At the hearth of their theory, there are two Periodicity Lemmas: one due to Lyndon and Schutzenberger [The equation aM = bNcP in a free group, Michigan Math. J. 9 (1962) 289–298], referred to as the Weak Version, and the other due to Fine and Wilf [Uniqueness theorems for periodic functions, Proc. Amer. Math. Soc. 16

(1965) 109–114]. In this paper, we investigate the notion of periodicity and the closely related one of repetition in connection with parameterized strings as introduced by Baker [Parameterized pattern matching: algorithms and applications, J. Comput. System Sci. 52(1) (1996) 28–42; Parameterized duplication in strings: algorithms and an application to software maintenance, SIAM J.

Comput. 26(5) (1997) 1343–1362]. In such strings, the notion of pairwise match or “equivalence” of symbols is more relaxed than the usual one, in that it rests on some mapping, rather than identity, of symbols. It seems natural to try and extend notions of periods and periodicities to encompass parameterized strings. However, we know of no previous attempt in this direction. Our preliminary investigation yields results as follows. For periodicity, we get (a) a generalization of theWeakVersion of the Periodicity Lemma for parameterized strings, showing that it is essential that the two mappings inducing the periodicity must commute; (b) a proof that an analogous of the Lemma by Fine andWilf [Uniqueness theorems for periodic functions, Proc. Amer. Math. Soc. 16 (1965) 109–114] cannot hold for parameterized strings, even if the mappings inducing the periodicity “commute”, in a sense to be specified below; (c) a proof that parameterized strings over an alphabet of at least three letters may have a set of periods which differ from those of any binary string of the same length—whereby the parameterized analog of a classic result by Guibas and Odlyzko [String overlaps, pattern matching, and nontransitive games, J. Combin. Theory Ser. A 30 (1981) 183–208] cannot hold. We also derive necessary and sufficient conditions characterizing parameterized repetitions, which are patterns of length at least twice that of the period, and show how the notion of root differs from the standard case, and highlight some of the implications on extending algorithmic criteria previously adopted for string searching, detection of repetitions and the likes. Finally, as a corollary of our main results, we also show that binary parameterized trings behave much in the same way as non-parameterized ones with respect to periodicity and repetitions, while there is a substantial difference for strings over alphabets of at least three symbols.

BibTeX:
@article{apostolico2005par,
  author = {Apostolico, A. and Giancarlo, R.},
  title = {Periodicity and Repetitions in Parameterized Strings},
  journal = {Electronic Notes in Discrete Mathematics},
  publisher = {Elsevier},
  year = {2005},
  volume = {21},
  pages = {227--230},
  url = {http://www.sinab.unal.edu.co:2053/science?_ob=ArticleURL&_udi=B6TYW-4PHJGJ5-3&_user=1998314&_coverDate=08%2F27%2F2007&_alid=649864815&_rdoc=2&_fmt=full&_orig=search&_cdi=5629&_sort=d&_docanchor=&view=c&_ct=32&_acct=C000055778&_version=1&_urlVersion=0&_userid=1998314&md5=c96eab64330fdc882607ac317392b8ad},
  doi = {http://www.sinab.unal.edu.co:2053/science?_ob=ArticleURL\&_udi=B6TYW-4PHJGJ5-3\&_user=1998314\&_coverDate=08%2F27%2F2007\&_alid=649864815\&_rdoc=2\&_fmt=full\&_orig=search\&_cdi=5629\&_sort=d\&_docanchor=\&view=c\&_ct=32\&_acct=C000055778\&_version=1\&_urlVersion=0\&_userid=1998314\&md5=c96eab64330fdc882607ac317392b8ad}
}
Apostolico, A., Landau, G. & Skiena, S. Matching for run-length encoded strings 1997 Compression and Complexity of Sequences 1997. Proceedings   inproceedings DOI  
BibTeX:
@inproceedings{Apostolico1997,
  author = {Apostolico, A. and Landau, G.M. and Skiena, S.},
  title = {Matching for run-length encoded strings},
  booktitle = {Compression and Complexity of Sequences 1997. Proceedings},
  year = {1997},
  pages = {348--356},
  doi = {http://dx.doi.org/10.1109/SEQUEN.1997.666929}
}
ASCACÍBAR, F. J. M. D. P. DOCTOR EN INGENIERÍA INDUSTRIAL 2003 School: UNIVERSIDAD DE LA RIOJA   phdthesis  
Abstract: La búsqueda constante por aumentar la calidad del producto fabricado y reducir los gastos ocasionados por fallos en el proceso de fabricación, son requisitos fundamentales en una planta industrial. Cada vez se buscan métodos y herramientas más eficientes que puedan servir de ayudar en estas tareas. Un ejemplo de ellas, es el Data Mining. Las herramientas de Data Mining y estadística multivariante son útiles cuando se dispone de un volumen de históricos importante y de buena calidad. El análisis de los históricos con estas nuevas técnicas puede ayudar en múltiples facetas: control de calidad, identificación de sistemas, determinación de causas en fallos del proceso, detección de anomalías, prevención de fallos, modelización de sistemas, obtención de reglas y patrones de comportamiento, búsqueda de causas y relaciones entre variables, etc. En esta tesis, se presenta una aplicación de la metodología CRISP-DM [CRI00], para la mejora, dentro de una línea de galvanizado en continuo de bobinas de acero, del tratamiento térmico de la lámina de acero antes de su paso por la inmersión del baño de zinc líquido. El control y planificación de este proceso de recocido es clave para la mejora de las propiedades de la banda y del recubrimiento. A lo largo de esta tesis, se muestran los pasos que han llevado a desarrollar:

Ø Una metodología que, mediante el uso de algoritmos genéticos y redes neuronales, permite la optimización de las curvas de consigna del horno y velocidades de la banda entre bobinas de diferentes dimensiones, reduciendo la diferencia de temperatura esperada de la banda y la real.

Ø Un clasificador de bobinas según la composición de los aceros que ha resultado ser una excelente herramienta para la predicción de roturas de banda o para detectar otro tipo de problemas debidos a bobinas con aceros anómalos.

Ø Un sensor-software que proyecta los puntos de operación del horno y que puede ayudar considerablemente en las tareas de visualización de la tendencia de los puntos de operación del horno.

BibTeX:
@phdthesis{ASCACIBAR2003,
  author = {FRANCISCO JAVIER MARTÍNEZ DE PISÓN ASCACÍBAR},
  title = {DOCTOR EN INGENIERÍA INDUSTRIAL},
  school = {UNIVERSIDAD DE LA RIOJA},
  year = {2003}
}
Ayala-Rincon, M. & Conejo, P. A linear time lower bound on updating algorithms for suffix trees 1998 String Processing and Information Retrieval: A South American Symposium, 1998. Proceedings   inproceedings DOIURL  
Abstract: Suffix trees are the fundamental data structure of combinatorial pattern matching on words. Suffix trees have been used in order to give optimal solutions of a great variety of problems on static words, but for practical situations, such as in a text editor, where the incremental changes of the text make dynamic updating of the corresponding suffix trees necessary, this data structure alone has not been used with success. We prove that, for dynamic modifications of order $O(1)$ of words of length $n$, any suffix tree updating algorithm requires $O(n)$ worst-case running time, as for the full reconstruction of the suffix tree. Consequently, we argue that this data structure alone is not appropriate for the solution of combinatorial problems on words that change dynamically.
BibTeX:
@inproceedings{Ayala-Rincon1998,
  author = {Ayala-Rincon, M. and Conejo, P.D.},
  title = {A linear time lower bound on updating algorithms for suffix trees},
  booktitle = {String Processing and Information Retrieval: A South American Symposium, 1998. Proceedings},
  year = {1998},
  pages = {1--6},
  url = {http://csdl2.computer.org/persagen/DLAbsToc.jsp?resourcePath=/dl/proceedings/&toc=comp/proceedings/spire/1998/8664/00/8664toc.xml&DOI=10.1109/SPIRE.1998.712976},
  doi = {http://dx.doi.org/10.1109/SPIRE.1998.712976}
}
Ayala-Rincon, M., Jacobi, R., Carvalho, L., Llanos, C. & Hartenstein, R. Modeling and prototyping dynamically reconfigurable systems for efficient computation of dynamic programming methods by rewriting-logic 2004 Integrated Circuits and Systems Design, 2004. SBCCI 2004. 17th Symposium on   inproceedings  
BibTeX:
@inproceedings{Ayala-Rincon2004,
  author = {Ayala-Rincon, M. and Jacobi, R.P. and Carvalho, L.G.A. and Llanos, C.H. and Hartenstein, R.W.},
  title = {Modeling and prototyping dynamically reconfigurable systems for efficient computation of dynamic programming methods by rewriting-logic},
  booktitle = {Integrated Circuits and Systems Design, 2004. SBCCI 2004. 17th Symposium on},
  year = {2004},
  pages = {248--253}
}
Baeza-Yates, R. A. String searching algorithms revisited 1989 Proceedings of WADS’89   inproceedings URL  
Abstract: We present bounds for the average case of the Knuth-Morris-Pratt (KMP) algorithm and the Boyer-Moore-Horspool (BMH) algorithm for random text. Experimental results in both random and English text suggests that the bounds are tight. We also present a hybrid algorithm which combines the KMP and BMH algorithms, and which, in practice, is faster than the Boyer-Moore algorithm.
BibTeX:
@inproceedings{Bae89c,
  author = {R. A. Baeza-Yates},
  title = {String searching algorithms revisited},
  booktitle = {Proceedings of WADS’89},
  publisher = {#svb#},
  year = {1989},
  number = {382},
  pages = {75--96},
  url = {http://www.springerlink.com/content/y53321817588m882/}
}
Baeza-Yates, R. A. & Navarro, G. Faster Approximate String Matching 1999 Algorithmica   article URL  
Abstract: We present a new algorithm for on-line approximate string matching. The algorithm is based on the simulation of a nondeterministic finite automaton built from the pattern and using the text as input. This simulation uses bit operations on a RAM machine with word length w = z (log n) bits, where n is the text size. This is essentially similar to the model used in Wu and Manber's work, although we improve the search time by packing the automaton states differently. The running time achieved is O(n) for small patterns (i.e., whenever mk = O(log n)) , where m is the pattern length and kWe then explore other novel techniques to cope with longer patterns. We show how to partition the pattern into short subpatterns which can be searched with less errors using the simple automaton, to obtain an average cost close to O(Ömk/w n) . Moreover, we allow the superimposition of many subpatterns in a single automaton, obtaining near O(Ömk/(sw) n) average complexity (? is the alphabet size).

We perform a complete analysis of all the techniques and show how to combine them in an optimal form, also obtaining new tighter bounds for the probability of an approximate occurrence in random text. Finally, we show experimental results comparing our algorithms against previous work. These experiments show that our algorithms are among the fastest for typical text searching, being the fastest in some cases. Although we aim mainly at text searching, we believe that our ideas can be successfully applied to other areas such as computational biology.

BibTeX:
@article{BYN99,
  author = {R. A. Baeza-Yates and G. Navarro},
  title = {Faster Approximate String Matching},
  journal = {Algorithmica},
  year = {1999},
  volume = {23},
  number = {2},
  pages = {127--158},
  url = {http://www.springerlink.com/content/yuejwcff7ygfkjay/}
}
Baeza-Yates, R. A. & Navarro, G. Fast Two-Dimensional Approximate Pattern Matching 1998 Proceedings of the Third Latin American Symposium on Theoretical Informatics   inproceedings URL  
Abstract: We address the problem of approximate string matching in two dimensions, that is, to find a pattern of size in a text of size with at most k errors (substitutions, insertions and deletions). Although the problem can be solved using dynamic programming in time O(m2 n2), this is in general too expensive for small k. So we design a filtering algorithm which avoids verifying most of the text with dynamic programming. This filter is based on a one-dimensional multi-pattern approximate search algorithm. The average complexity of our resulting algorithm is for , which is optimal and matches the best previous result which allows only substitutions.For higher error levels, we present an algorithm with time complexity (where w is the size in bits of the computer word and is the alphabet size). This algorithm works for , where e = 2.718..., a limit which is not possible to improve. These are the first good expected-case algorithms for the problem. Our algorithms work also for rectangular patterns and rectangular text and can even be extended to the case where each row in the pattern and the text has a different length.
BibTeX:
@inproceedings{BYN98,
  author = {R. A. Baeza-Yates and G. Navarro},
  title = {Fast Two-Dimensional Approximate Pattern Matching},
  booktitle = {Proceedings of the Third Latin American Symposium on Theoretical Informatics},
  publisher = {#svb#},
  year = {1998},
  number = {1380},
  pages = {341--351},
  url = {http://www.springerlink.com/content/21brrxyuppwt7mqu/}
}
Baeza-Yates, R. A. & Navarro, G. New and faster filters for multiple approximate string matching 1998 Random Structures and Algorithms   article URL  
Abstract: We present three new algorithms for on-line multiple string matching allowing errors. These are extensions of previous algorithms that search for a single pattern. The average running time achieved is in all cases linear in the text size for moderate error level, pattern length, and number of patterns. They adapt (with higher costs) to the other cases. However, the algorithms differ in speed and thresholds of usefulness. We theoretically analyze when each algorithm should be used, and show their performance experimentally. The only previous solution for this problem allows only one error. Our algorithms are the first to allow more errors, and are faster than previous work for a moderate number of patterns (e.g. less than 50-100 on English text, depending on the pattern length).
BibTeX:
@article{BYN98b,
  author = {R. A. Baeza-Yates and G. Navarro},
  title = {New and faster filters for multiple approximate string matching},
  journal = {Random Structures and Algorithms},
  year = {1998},
  volume = {1},
  number = {TR/DCC-98-10},
  pages = {1},
  url = {http://www3.interscience.wiley.com/cgi-bin/abstract/88511567/ABSTRACT?CRETRY=1&SRETRY=0}
}
Baeza-Yates, R. A. & Navarro, G. Multiple Approximate String Matching 1997 Proceedings of the5th Workshop on Algorithms and Data Structures   inproceedings URL  
Abstract: We present two new algorithms for on-line multiple approximate string matching. These are extensions of previous algorithms that search for a single pattern. The single-pattern version of the first one is based on the simulation with bits of a non-deterministic finite automaton built from the pattern and using the text as input. To search for multiple patterns, we superimpose their automata, using the result as a filter. The second algorithm partitions the pattern in sub-patterns that are searched with no errors, with a fast exact multipattern search algorithm. To handle multiple patterns, we search the sub-patterns of all of them together. The average running time achieved is in both cases O(n) for moderate error level, pattern length and number of patterns. They adapt (with higher costs) to the other cases. However, the algorithms differ in speed and thresholds of usefulness. We analyze theoretically when each algorithm should be used, and show experimentally that they are faster
BibTeX:
@inproceedings{BYN97,
  author = {R. A. Baeza-Yates and G. Navarro},
  title = {Multiple Approximate String Matching},
  booktitle = {Proceedings of the5th Workshop on Algorithms and Data Structures},
  publisher = {#svb#},
  year = {1997},
  number = {1272},
  pages = {174--184},
  url = {http://citeseer.ist.psu.edu/75471.html}
}
Baeza-Yates, R. A. & Navarro, G. Block-addressing indices for approximate text retrieval 1997 Journal of the American Society for Information Science#proc#ACM CIKM'97   article URL  
Abstract: The issue of reducing the space overhead when indexing large text databases is becoming more and more important, as the text collections grow in size. Another subject, which is gaining importance as text databases grow and get more heterogeneous and error prone, is that of flexible string matching. One of the best tools to make the search more flexible is to allow a limited number of differences between the words found and those sought. This is called approximate text searching, which is becoming more and more popular. In recent years some indexing schemes with very low space overhead have appeared, some of them dealing with approximate searching. These low overhead indices (whose most notorious exponent is Glimpse) are modified inverted files, where space is saved by making the lists of occurrences point to text blocks instead of exact word positions. Despite their existence, little is known about the expected behavior of these block addressing indices, and even less is known when it comes to cope with approximate search. Our main contribution is an analytical study of the space-time trade-offs for indexed text searching. We study the space overhead and retrieval times as functions of the block size. We find that, under reasonable assumptions, it is possible to build an index which is simultaneously sublinear in space overhead and in query time. This surprising analytical conclusion is validated with extensive experiments, obtaining typical performance figures. These results are valid for classical exact queries as well as for approximate searching. We apply our analysis to the Web, using recent statistics on the distribution of the document sizes. We show that pointing to documents instead of to fixed size blocks reduces space requirements but increases search times.
BibTeX:
@article{BYN97b,
  author = {R. A. Baeza-Yates and G. Navarro},
  title = {Block-addressing indices for approximate text retrieval},
  booktitle = {#proc#ACM CIKM'97},
  journal = {Journal of the American Society for Information Science},
  year = {1997},
  volume = {1},
  pages = {1--8},
  url = {http://www3.interscience.wiley.com/cgi-bin/abstract/69500929/ABSTRACT}
}
Baeza-Yates, R. A. & Navarro, G. A faster algorithm for approximate string matching 1996 Combinatorial Pattern Matching (CPM'96), Springer-Verlag LNCS   inproceedings URL  
Abstract: We present a new algorithm for on-line approximate string matching. The algorithm is based on the simulation of a non-deterministic finite automaton built from the pattern and using the text as input. This simulation uses bit operations on a RAM machine with word length O(log n), being n the maximum size of the text. The running time achieved is O(n) for small patterns (i.e. of length m=O(radiclog n)), independently of the maximum number of errors allowed, k. This algorithm is then used to design two general algorithms. One of them partitions the problem into subproblems, while the other partitions the automaton into sub-automata. These algorithms are combined to obtain a hybrid algorithm which on average is O(n) for moderate k/m ratios, O(radicmk/log n n) for medium ratios, and O((m–k)kn/log n) for large ratios. We show experimentally that this hybrid algorithm is faster than previous ones for moderate size of patterns and error ratios, which is the case in text searching.
BibTeX:
@inproceedings{BYN96a,
  author = {R. A. Baeza-Yates and G. Navarro},
  title = {A faster algorithm for approximate string matching},
  booktitle = {Combinatorial Pattern Matching (CPM'96), Springer-Verlag LNCS},
  publisher = {#svb#},
  year = {1996},
  number = {1075},
  pages = {1--23},
  url = {http://www.springerlink.com/content/y2736r088223u552/}
}
Baeza-Yates, R. A. & Navarro, G. A fast heuristic for approximate string matching 1996 Proceedings of the 3rd South American Workshop on String Processing (WSP'96)   inproceedings URL  
Abstract: We study a fast algorithm for on-line approximate string matching. It is based on a non-deterministic finite automaton, which is simulated using bit-parallelism. If the automaton does not fit in a computer word, we partition the problem into subproblems. We show experimentally that this algorithm is the fastest for typical text search. We also show which algorithms are the best in other cases, and derive the fastest known heuristic for on-line approximate string matching, when the pattern is not very large. The focus of this work is mainly practical.
BibTeX:
@inproceedings{BYN96b,
  author = {R. A. Baeza-Yates and G. Navarro},
  title = {A fast heuristic for approximate string matching},
  booktitle = {Proceedings of the 3rd South American Workshop on String Processing (WSP'96)},
  publisher = {#cup#},
  year = {1996},
  pages = {47--63},
  url = {http://citeseer.ist.psu.edu/50985.html}
}
Baeza-Yates, R. A. & Perleberg, C. H. Fast and practical approximate string matching 1996 Information Processing Letters   article URL  
Abstract: We present new algorithms for approximate string matching based in simple, but efficient, ideas. First, we present an algorithm for string matching with mismatches based in arithmetical operations that runs in linear worst case time for most practical cases. This is a new approach to string searching. Second, we present an algorithm for string matching with errors based on partitioning the pattern that requires linear expected time for typical inputs.
BibTeX:
@article{BYP96,
  author = {R. A. Baeza-Yates and C. H. Perleberg},
  title = {Fast and practical approximate string matching},
  journal = {Information Processing Letters},
  year = {1996},
  volume = {59},
  number = {1},
  pages = {21--27},
  url = {http://cat.inist.fr/?aModele=afficheN&cpsidt=3175909}
}
Baeza-Yates, R. & Gonnet, G. A fast algorithm on average for all-against-all sequence matching 1999 String Processing and Information Retrieval Symposium, 1999 and International Workshop on Groupware   inproceedings DOIURL  
Abstract: In this paper we present an algorithm which attempts to align pairs of subsequences from a database of genetic sequences. The algorithm simulates the classical dynamic programming alignment algorithm over a suffix array of the database. We provide a detailed average case analysis which shows that the running time of the algorithm is sub-quadratic with respect to the database size. A similar algorithm solves the approximate string matching problem in sub-linear average time.
BibTeX:
@inproceedings{Baeza-Yates1999,
  author = {Baeza-Yates, R.A. and Gonnet, G.H.},
  title = {A fast algorithm on average for all-against-all sequence matching},
  booktitle = {String Processing and Information Retrieval Symposium, 1999 and International Workshop on Groupware},
  year = {1999},
  pages = {16--23},
  url = {http://csdl2.computer.org/persagen/DLAbsToc.jsp?resourcePath=/dl/proceedings/&toc=comp/proceedings/spire/1999/0268/00/0268toc.xml&DOI=10.1109/SPIRE.1999.796573},
  doi = {http://dx.doi.org/10.1109/SPIRE.1999.796573}
}
Baeza-Yates, R. & Navarro, G. New Models and Algorithms for Multidimensional Approximate Pattern Matching 2000 J. Discret. Algorithms   article URL  
Abstract: We focus on how to compute the edit distance (or similarity) between two images and the problem of approximate string matching in two dimensions, that is, to find a pattern of size m m in a text of size n n with at most k errors (character substitutions, insertions and deletions). Pattern and text are matrices over an alphabet of size . We present new models and give the first sublinear time search algorithms for the new and the existing models
BibTeX:
@article{BYN2000,
  author = {R. Baeza-Yates and G. Navarro},
  title = {New Models and Algorithms for Multidimensional Approximate Pattern Matching},
  journal = {J. Discret. Algorithms},
  year = {2000},
  volume = {1},
  number = {1},
  pages = {21--49},
  url = {http://citeseer.ist.psu.edu/550957.html}
}
Baeza-Yates, R. & Navarro, G. Fast approximate string matching in a dictionary 1998 String Processing and Information Retrieval: A South American Symposium, 1998. Proceedings   article URL  
Abstract: A successful technique to search large textual databases allowing errors relies on an online search in the vocabulary of the text. To reduce the time of that online search, we index the vocabulary as a metric space. We show that with reasonable space overhead we can improve by a factor of two over the fastest online algorithms, when the tolerated error level is low (which is reasonable in text searching)
BibTeX:
@article{baezayates1998fas,
  author = {Baeza-Yates, R. and Navarro, G.},
  title = {Fast approximate string matching in a dictionary},
  journal = {String Processing and Information Retrieval: A South American Symposium, 1998. Proceedings},
  year = {1998},
  volume = {1},
  pages = {14--22},
  url = {http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=712978}
}
Baker, B. S. Parameterized Duplication in Strings: Algorithms and an Application to Software Maintenance 1997 SIAM Journal of Computing   article URL  
Abstract: As an aid in software maintenance, it would be useful to be able to track down duplication in large software systems efficiently. Duplication in code is often in the form of sections of code that are the same except for a systematic change of parameters such as identifiers and constants. To model such parameterized duplication in code, this paper introduces the notions of parameterized strings and parameterized matches of parameterized strings. A data structure called a parameterized suffix tree is defined to aid in searching for parameterized matches. For fixed alphabets, algorithms are given to construct a parameterized suffix tree in linear time and to find all maximal parameterized matches over a threshold length in a parameterized p-string in time linear in the size of the input plus the number of matches reported. The algorithms have been implemented and experimental results show that they perform well on C code.
BibTeX:
@article{baker97parameterized,
  author = {Brenda S. Baker},
  title = {Parameterized Duplication in Strings: Algorithms and an Application to Software Maintenance},
  journal = {SIAM Journal of Computing},
  year = {1997},
  volume = {26(5)},
  pages = {1343–1362},
  url = {citeseer.ist.psu.edu/baker93parameterized.html}
}
Baker, B. S. Parameterized pattern matching: algorithms and applications 1996 Journal Computer System Science   article URL  
Abstract: The problem of finding sections of code that either are identical or are related by the systematic renaming of variables or constants can be modeled in terms of parameterized strings (p-strings) and parameterized matches (pmatches) [Baker93a]. P-strings are strings over two alphabets, one of which represents parameters. Two p-strings are a parameterized match (p-match) if one pstring is obtained by renaming the parameters of the other by a one-to-one function. In this paper, we investigate parameterized pattern matching via parameterized suffix trees (p-suffix trees), defined in [Baker93a]. We give two algorithms for constructing p-suffix trees: one (eager) that runs in linear time for fixed alphabets, and another that uses auxiliary data structures and runs in O(nlog (n) ) time for variable alphabets, where n is input length. We show that using a psuffix tree for a pattern p-string P, it is possible to search for all p-matches of P within a text p-string T in space linear in ïPïand time linear in ïTïfor fixed alphabets, or O(ïTïlog ( min (ïPï,s) ) time and O(ïPï) space for variable alphabets, where s is the sum of the alphabet sizes. The simpler p-suffix tree construction algorithm eager has been implemented, and experiments show it to be practical. Since it runs faster than predicted by the above worst-case bound, we reanalyze the algorithm, and show that eager runs in time O( min (tïSï+m(t,S)ït > 0 ) logs) ), where for an input pstring S, m(t,S) is the number of maximal p-matches of length at least t that occur within S, and s is the sum of the alphabet sizes. Experiments with the author’s program dup [Baker92] for finding all maximal p-matches within a pstring have found m(t,S) to be less than ïSïin practice unless t is small.
BibTeX:
@article{Bak96,
  author = {B. S. Baker},
  title = {Parameterized pattern matching: algorithms and applications},
  journal = {Journal Computer System Science},
  year = {1996},
  volume = {52},
  number = {1},
  pages = {28--42},
  url = {http://citeseer.ist.psu.edu/117724.html}
}
Baker, B. S. Parameterized pattern matching by Boyer-Moore-type algorithms 1995 Proceedings of the Sixth Annual ACM-SIAM Symposium on Discrete Algorithms   inproceedings URL  
Abstract: This paper investigates generalizations of the Boyer-Moore string pattern matching algorithm to parameterized pattern matching. Parameterized pattern matching was invented by Baker [Bak93b] for the purpose of finding sections of code in a software system that are the same except for a systematic change of parameters. We show that for Boyer-Moore-type algorithms that do not save information about previously matched portions of text, straightforward generalizations to parameterized pattern-matching must have a running time of OmegaGamma nmin(m; p)), where n is the text length, m is the pattern length, and p is the number of parameter symbols in the alphabet. However, we describe a parameterized pattern matching algorithm PturboBM that has the same overall structure as the Boyer-Moore algorithm but saves information about previously matched portions of text and runs in time O(n log min(m; p)), with preprocessing time O(m log min(m; p)), where n is the length of the text, m is the lenght of the pattern, and p is the number of distinct parameter symbols. Experiments show that the PTurboBM algorithms has promiseas an efficient means of accomplishing parameteized pattern matching for applications.
BibTeX:
@inproceedings{Bak95,
  author = {B. S. Baker},
  title = {Parameterized pattern matching by {Boyer-Moore-type} algorithms},
  booktitle = {Proceedings of the Sixth Annual ACM-SIAM Symposium on Discrete Algorithms},
  year = {1995},
  pages = {541--550},
  url = {http://citeseer.ist.psu.edu/6775.html}
}
Barman, R., Bolotski, M., Camporese, D. & Little, J. Silt: the bit-parallel approach 1990 Pattern Recognition, 1990. Proceedings., 10th International Conference on   inproceedings DOI  
Abstract: A particular form of parallelism, called bit-parallelism, is introduced. A bit-parallel organization distributes each bit of a data item to a different processor. Bit-parallelism allows computation that is sublinear with word size for such operations as integer addition, arithmetic shifts, and data moves. The implications of bit-parallelism for system architecture are analyzed. An implementation of a bit-parallel architecture based on a mesh with a bypass network is presented. Using a conservative estimate for cycle time, a Silt processor performs 64-b integer additions more than 10 times faster than the Connection Machine-2. Using current CMOS technology, a 16 M processor Silt system would be capable of nearly 500 billion 32-b adds per second. The application of the architecture to low-level vision algorithms is discussed
BibTeX:
@inproceedings{Barman1990,
  author = {Barman, R. and Bolotski, M. and Camporese, D. and Little, J.J.},
  title = {Silt: the bit-parallel approach},
  booktitle = {Pattern Recognition, 1990. Proceedings., 10th International Conference on},
  year = {1990},
  volume = {ii},
  pages = {332--336vol.2},
  doi = {http://dx.doi.org/10.1109/ICPR.1990.119378}
}
Barry, D. K. Web Services and Service-Oriented Architecture: The Savvy Manager's Guide 2003   book  
Abstract: Web services are leading to the use of more packaged software either as an internal service or an external service available over the Internet. These services, which will be connected together to create the information technology systems of the future, will require less custom software in our organizations and more creativity in the connections between the services. This book begins with a high-level example of how an average person in an organization might interact with a service-oriented architecture. As the book progresses, more technical detail is added in a "peeling of the onion" approach. The leadership opportunities within these developing service-oriented architectures are also explained. At the end of the book there is a compendium or "pocket library" for software technology related to service-oriented architectures. Only web services book to cover both data management and software engineering perspectives, excellent resource for ALL members of IT teams Jargon free, highly illustrated, with introduction that anyone can read that then leads into increasing technical detail Provides a set of leadership principles and suggested application for using this technology.
BibTeX:
@book{Barry2003,
  author = {Douglas K. Barry},
  title = {Web Services and Service-Oriented Architecture: The Savvy Manager's Guide},
  publisher = {Morgan Kaufmann},
  year = {2003},
  pages = {245}
}
Bedathur, S. & Haritsa, J. Engineering a fast online persistent suffix tree construction 2004 Data Engineering, 2004. Proceedings. 20th International Conference on   inproceedings DOIURL  
Abstract: Online persistent suffix tree construction has been considered impractical due to its excessive I/O costs. However, these prior studies have not taken into account the effects of the buffer management policy and the internal node structure of the suffix tree on I/O behavior of construction and subsequent retrievals over the tree. We study these two issues in detail in the context of large genomic DNA and protein sequences. In particular, we make the following contributions: (i) a novel, low-overhead buffering policy called TOP-Q which improves the on-disk behavior of suffix tree construction and subsequent retrievals, and (ii) empirical evidence that the space efficient linked-list representation of suffix tree nodes provides significantly inferior performance when compared to the array representation. These results demonstrate that a careful choice of implementation strategies can make online persistent suffix tree construction considerably more scalable - in terms of length of sequences indexed with a fixed memory budget, than currently perceived.
BibTeX:
@inproceedings{Bedathur2004,
  author = {Bedathur, S.J. and Haritsa, J.R.},
  title = {Engineering a fast online persistent suffix tree construction},
  booktitle = {Data Engineering, 2004. Proceedings. 20th International Conference on},
  year = {2004},
  pages = {720--731},
  url = {http://ieeexplore.ieee.org/search/wrapper.jsp?arnumber=1320040},
  doi = {http://dx.doi.org/10.1109/ICDE.2004.1320040}
}
Bergeron, A. & Hamel, S. Vector Algorithms for Approximate String Matching 2002 International Journal of Foundations of Computer Science   article URL  
Abstract: Vector algorithms allow the computation of an output vector r = r 1 r2 : : : rm given an input vector e = e 1 e 2 : : : em in a bounded number of operations, independent of m the length of the vectors. The allowable operations are usually restricted to bit-wise operations available in processors, including shifts and binary addition with carry. These restrictions imply that the existence of a vector algorithm for a particular problem opens the way to extremely fast implementations, using the inherent parallelism of bit-wise operations.
BibTeX:
@article{bergeron02vector,
  author = {Anne Bergeron and Sylvie Hamel},
  title = {Vector Algorithms for Approximate String Matching},
  journal = {International Journal of Foundations of Computer Science},
  year = {2002},
  volume = {13},
  number = {1},
  pages = {53-66},
  url = {http://citeseer.ist.psu.edu/489295.html}
}
Berghel, H., Roach, D. & Talburt, J. Approximate string matching and the automation of word games 1990 Applied Computing, 1990., Proceedings of the 1990 Symposium on   inproceedings DOIURL  
Abstract: It is shown how approximate sting matching can be involved in the automation of various aspects of word game construction or solution. In the discussion, the authors try to identify and explicate the underlying issues in computational linguistics and suggest techniques which have been used to address these issues. It is shown that the two main issues which arise in this context have to do with lexical processing and heuristics, issues which arise in more practical contexts as well
BibTeX:
@inproceedings{Berghel1990,
  author = {Berghel, H. and Roach, D. and Talburt, J.},
  title = {Approximate string matching and the automation of word games},
  booktitle = {Applied Computing, 1990., Proceedings of the 1990 Symposium on},
  year = {1990},
  pages = {209--213},
  url = {http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=82170},
  doi = {http://dx.doi.org/10.1109/SOAC.1990.82170}
}
Bergroth, L., Hakonen, H. & aRaita, T. New approximation algorithms for longest common subsequences 1998 String Processing and Information Retrieval: A South American Symposium, 1998. Proceedings   inproceedings DOI  
BibTeX:
@inproceedings{Bergroth1998,
  author = {Bergroth, L. and Hakonen, H. and aRaita, T.},
  title = {New approximation algorithms for longest common subsequences},
  booktitle = {String Processing and Information Retrieval: A South American Symposium, 1998. Proceedings},
  year = {1998},
  pages = {32--40},
  doi = {http://dx.doi.org/10.1109/SPIRE.1998.712980}
}
Bergroth, L., Hakonen, H. & Raita, T. A survey of longest common subsequence algorithms 2000 String Processing and Information Retrieval, 2000. SPIRE 2000. Proceedings. Seventh International Symposium on   inproceedings DOI  
BibTeX:
@inproceedings{Bergroth2000,
  author = {Bergroth, L. and Hakonen, H. and Raita, T.},
  title = {A survey of longest common subsequence algorithms},
  booktitle = {String Processing and Information Retrieval, 2000. SPIRE 2000. Proceedings. Seventh International Symposium on},
  year = {2000},
  pages = {39--48},
  doi = {http://dx.doi.org/10.1109/SPIRE.2000.878178}
}
Bertino, E. & Martino, L. Tutorial 6: Security in SOA and Web Services 2006 Web Services, 2006. ICWS '06. International Conference on   article URL  
Abstract: Security is today a relevant requirement for any distributed application, and in particular for these enabled by the Web such as e-health, e-commerce, and e-learning. It is thus crucial that the use of Web services, stand-alone or composed, provide strong security guarantees. Web services security encompasses several requirements that can be described along the well known security dimensions, that is: integrity, whereby a message must remain unaltered during transmission; confidentiality, whereby the contents of a message cannot be viewed while in transit, except by authorized services; availability, whereby a message is promptly delivered to the intended recipient, thus ensuring that legitimate users receive the services they are entitled to. Moreover, each Web service must protect its own resources against unauthorized access. This in turn requires suitable means for: identification, whereby the recipient of a message must be able to identify the sender; authentication, whereby the recipient of a message needs to verify the claimed identity of the sender; authorization, whereby the recipient of a message needs to apply access control policies to determine whether the sender has the right to use the required resources. In the tutorial we will first discuss the main security requirements underlying the interactions between clients and Web services and among the Web services themselves. Then we will describe how such security requirements are addressed by standards for Web services security recently developed or under development by various standardizations bodies. Standards that are covered include: WSS, that encompasses a large number of components addressing various security aspects; XACML, that is related to access control and has been recently extended with a profile for Web services access control; WS-Federation, Liberty Alliance and Shibboleth, that address the important problem of identity management in federated organizations. Issues related to the use of th- ese standards are discussed. Then, research approaches to the problem of Web service security will be surveyed, including negotiation-based access control for Web services, and access control for conversation-based Web services
BibTeX:
@article{Bertino2006,
  author = {Elisa Bertino and Lorenzo Martino},
  title = {Tutorial 6: Security in SOA and Web Services},
  journal = {Web Services, 2006. ICWS '06. International Conference on},
  year = {2006},
  pages = {xlv - xlv},
  url = {http://www.sinab.unal.edu.co:2365/xpl/RecentCon.jsp?punumber=4031979}
}
Bertossi, A. A., Luccio, F., Pagli, L. & Lodi, E. A parallel solution to the approximate string matching algorithm problem 1992 The Computer Journal   article URL  
BibTeX:
@article{BLPL92,
  author = {A. A. Bertossi and F. Luccio and L. Pagli and E. Lodi},
  title = {A parallel solution to the approximate string matching algorithm problem},
  journal = {The Computer Journal},
  year = {1992},
  volume = {35},
  number = {5},
  pages = {524--526},
  url = {http://portal.acm.org/citation.cfm?id=141772.146121}
}
Bieganski, P., Riedl, J., Cartis, J. & Retzel, E. Generalized suffix trees for biological sequence data: applications and implementation 1994 System Sciences, 1994. Vol.V: Biotechnology Computing, Proceedings of the Twenty-Seventh Hawaii International Conference on   inproceedings DOIURL  
Abstract: This paper addresses applications of suffix trees and generalized suffix trees (GSTs) to biological sequence data analysis. We define a basic set of suffix trees and GST operations needed to support sequence data analysis. While those definitions are straightforward, the construction and manipulation of disk-based GST structures for large volumes of sequence data requires intricate design. GST processing is fast because the structure is content addressable, supporting efficient searches for all sequences that contain particular subsequences. Instead of laboriously searching sequences stored as arrays, we search by walking down the tree. We present a new GST-based sequence alignment algorithm, called GESTALT. GESTALT finds all exact matches in parallel, and uses best-first search to extend them to produce alignments. Our implementation experiences with applications using GST structures for sequence analysis lead us to conclude that GSTs are valuable tools for analyzing biological sequence data
BibTeX:
@inproceedings{Bieganski1994,
  author = {Bieganski, P. and Riedl, J. and Cartis, J.V. and Retzel, E.F.},
  title = {Generalized suffix trees for biological sequence data: applications and implementation},
  booktitle = {System Sciences, 1994. Vol.V: Biotechnology Computing, Proceedings of the Twenty-Seventh Hawaii International Conference on},
  year = {1994},
  volume = {5},
  pages = {35--44},
  url = {http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=323593},
  doi = {http://dx.doi.org/10.1109/HICSS.1994.323593}
}
Birman, K. P. Reliable Distributed Systems: Technologies, Web Services, and Applications 2005   book  
BibTeX:
@book{Birman2005,
  author = {Kenneth P. Birman},
  title = {Reliable Distributed Systems: Technologies, Web Services, and Applications},
  publisher = {Springer},
  year = {2005},
  pages = {668}
}
Bluthgen, H. & Noll, T. A programmable processor for approximate string matching with high throughput rate 2000 Application-Specific Systems, Architectures, and Processors, 2000. Proceedings. IEEE International Conference on   inproceedings DOIURL  
Abstract: In this paper, we present the algorithm and architecture of a processor for approximate string matching with high throughput rate. The processor is dedicated for multimedia and information retrieval applications working on huge amounts of mass data where short response times are necessary. The algorithm used for the approximate string matching is based on a dynamic programming procedure known as the string-to-string correction problem. It has been extended to fulfill the requirements of full text search in a database system, including string matching with wildcards and handling of idiomatic turns of some languages. The processor has been fabricated in a 0.6-um CMOS technology. It performs a maximum of 8.5 Billion character comparisons per second when operating at the specified clock frequency of 132 MHz.
BibTeX:
@inproceedings{Bluthgen2000,
  author = {Bluthgen, H.-M. and Noll, T.G.},
  title = {A programmable processor for approximate string matching with high throughput rate},
  booktitle = {Application-Specific Systems, Architectures, and Processors, 2000. Proceedings. IEEE International Conference on},
  year = {2000},
  pages = {309--316},
  url = {http://csdl2.computer.org/persagen/DLAbsToc.jsp?resourcePath=/dl/proceedings/&toc=comp/proceedings/asap/2000/0716/00/0716toc.xml&DOI=10.1109/ASAP.2000.862401},
  doi = {http://dx.doi.org/10.1109/ASAP.2000.862401}
}
Boyer, R. S. & Moore, J. S. A fast string searching algorithm 1977 Communications of the ACM   article URL  
Abstract: An algorithm is presented that searches for the location, “il” of the first occurrence of a character string, “pat,” in another string, “string.” During the search operation, the characters of pat are matched starting with the last character of pat. The information gained by starting the match at the end of the pattern often allows the algorithm to proceed in large jumps through the text being searched. Thus the algorithm has the unusual property that, in most cases, not all of the first i characters of string are inspected. The number of characters actually inspected (on the average) decreases as a function of the length of pat. For a random English pattern of length 5, the algorithm will typically inspect i/4 characters of string before finding a match at i. Furthermore, the algorithm has been implemented so that (on the average) fewer than i + patlen machine instructions are executed. These conclusions are supported with empirical evidence and a theoretical analysis of the average behavior of the algorithm. The worst case behavior of the algorithm is linear in i + patlen, assuming the availability of array space for tables linear in patlen plus the size of the alphabet.
BibTeX:
@article{BM77,
  author = {R. S. Boyer and J. S. Moore},
  title = {A fast string searching algorithm},
  journal = {Communications of the ACM},
  year = {1977},
  volume = {20},
  number = {10},
  pages = {762--772},
  url = {http://portal.acm.org/citation.cfm?id=359842.359859}
}
Brodal, G. & Gasieniec, L. Approximate Dictionary Queries 1996 Proceedings of the 7th Annual Symposium on Combinatorial Pattern Matching   inproceedings URL  
Abstract: Given a set of n binary strings of length m each. We consider the problem of answering d-queries. Given a binary query string alfa of length m a d-query is to report if there exists a strin within Hamming distance d of alfa. We presente a data structure of size O9nm) supporting 1-queries in time O(m) and the reporting of all strings within Hammingdistance 1 of alfa in time O(m). The data structure can be constructed in time O(mn). A slightly modified version of the data structure supports the insertion of new strings in amortized time O(m).
BibTeX:
@inproceedings{BG96,
  author = {Brodal, G.S. and Gasieniec, L.},
  title = {Approximate Dictionary Queries},
  booktitle = {Proceedings of the 7th Annual Symposium on Combinatorial Pattern Matching},
  publisher = {#svb#},
  year = {1996},
  number = {1075},
  pages = {65--74},
  url = {http://portal.acm.org/citation.cfm?id=738443}
}
Bunke, H. Fast Approximate Matching of Words Against a Dictionary 1995 Computing   article URL  
Abstract: A new algorithm for string edit distance computation is given. The algorithm assumes that one of the two strings to be compared is a dictionary entry that is known a priori. This dictionary word is converted in an off-line phase into a deterministic finite state automaton. Given an input string and the automaton derived from the dictionary word, the computation of the edit distance between the two strings corresponds to a traversal of the states of the automaton. This procedure needs time which is only linear in the length of the input string. It is independent of the length of the dictionary word. Given not only one butN different dictionary words, their corresponding automata can be combined into a single deterministic finite state automaton. Thus the computation of the edit distance between the input word and each dictionary entry, and the determination of the nearest neighbor in the dictionary need time that is only linear in the length of the input string. However, the number os states of the automation is exponential.
BibTeX:
@article{Bun95,
  author = {H. Bunke},
  title = {Fast Approximate Matching of Words Against a Dictionary},
  journal = {Computing},
  year = {1995},
  volume = {55},
  number = {1},
  pages = {75--89},
  url = {http://www.springerlink.com/content/71621v504j281841/}
}
Cerami, E. Web Services Essentials 2002   book  
Abstract: As a developer new to Web Services, how do you make sense of this emerging framework so you can start writing your own services today? This concise book gives programmers both a concrete introduction and a handy reference to XML web services, first by explaining the foundations of this new breed of distributed services, and then by demonstrating quick ways to create services with open-source Java tools. Web Services make it possible for diverse applications to discover each other and exchange data seamlessly via the Internet. For instance, programs written in Java and running on Solaris can find and call code written in C# that run on Windows XP, or programs written in Perl that run on Linux, without any concern about the details of how that service is implemented. A common set of Web Services is at the core of Microsoft's new .NET strategy, Sun Microsystems's Sun One Platform, and the W3C's XML Protocol Activity Group. In this book, author Ethan Cerami explores four key emerging technologies: XML Remote Procedure Calls (XML-RPC) SOAP - The foundation for most commercial Web Services development Universal Discovery, Description and Integration (UDDI) Web Services Description Language (WSDL) For each of these topics, Web Services Essentials provides a quick overview, Java tutorials with sample code, samples of the XML documents underlying the service, and explanations of freely-available Java APIs. Cerami also includes a guide to the current state of Web Services, pointers to open-source tools and a comprehensive glossary of terms. If you want to break through the Web Services hype and find useful information on these evolving technologies, look no further than Web Services Essentials.
BibTeX:
@book{Cerami2002,
  author = {Ethan Cerami},
  title = {Web Services Essentials},
  publisher = {O'Reilly},
  year = {2002},
  pages = {288}
}
Cha, S., Shin, Y. & Srihari, S. Approximate stroke sequence string matching algorithm for character recognition and analysis 1999 Document Analysis and Recognition, 1999. ICDAR '99. Proceedings of the Fifth International Conference on   inproceedings DOIURL  
Abstract: Given two character images, we would like to measure their similarity or difference. Such a similarity or difference measure facilitates the solution to character recognition and handwriting analysis problems. There is, however, no universal definition for similarity measure satisfying a wide range of characteristics such as the slant, deformation or other invariant constraints. For this reason, we propose a new definition for the character similarity measure. First, the proposed method converts a two-dimensional image into a one-dimensional string. Next, it computes the edit distance by the modified approximate string matching algorithm. We describe how to extract the string information and compute the distance and then present the details of applications in handwriting analysis and both online and offline character recognition
BibTeX:
@inproceedings{Cha1999,
  author = {Sung-Hyuk Cha and Yong-Chul Shin and Srihari, S.N.},
  title = {Approximate stroke sequence string matching algorithm for character recognition and analysis},
  booktitle = {Document Analysis and Recognition, 1999. ICDAR '99. Proceedings of the Fifth International Conference on},
  year = {1999},
  pages = {53--56},
  url = {http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=791723},
  doi = {http://dx.doi.org/10.1109/ICDAR.1999.791723}
}
Chang, W. & Lampe, J. Theoretical and Empirical Comparisons of Approximate String Matching Algorithms 1991   book  
BibTeX:
@book{chang1991tae,
  author = {Chang, W.I. and Lampe, J.},
  title = {Theoretical and Empirical Comparisons of Approximate String Matching Algorithms},
  publisher = {University of California, Berkeley, Computer Science Division},
  year = {1991}
}
Chang, W. & Marr, T. Approximate String Matching with Local Similarity 1994 Proceedings of the 5th Annual Symposium on Combinatorial Pattern Matching   inproceedings URL  
Abstract: The best known rigorous method for biological sequence comparison has been the algorithm of Smith and Waterman. It computes in quadratic time the highest scoring local alignment of two sequences given a (nonmetric) similarity measure and gap penalty. In this paper, we describe how the distance-based sublinear expected time algorithm of Chang and Lawler can be extended to solve efficiently the local similarity problem. We present both a new theoretical result, polynomialspace, constant-fraction-error matching that is provably optimal, and a practical adaptation of it that produces nearly identical results as Smith-Waterman, at speedups of 2X (PAM 120, roughly corresponding to 33% identity) to 10X (PAM 90, 50% identity) or better. Further improvements are anticipated. What makes this possible is the addition of a new constraint on unit score (average score per residue), which filters out both very short alignments and very long alignments with unacceptably low average. This program is part of a package called Genome Analyst that is being developed at CSHL.
BibTeX:
@inproceedings{CM94,
  author = {W.I. Chang and T. Marr},
  title = {Approximate String Matching with Local Similarity},
  booktitle = {Proceedings of the 5th Annual Symposium on Combinatorial Pattern Matching},
  publisher = {#svb#},
  year = {1994},
  number = {807},
  pages = {259--273},
  url = {http://portal.acm.org/citation.cfm?id=647814.738431}
}
Chang, W. I. Approximate pattern matching and biological applications 1991 School: University of California, Berkeley   phdthesis  
BibTeX:
@phdthesis{Cha91,
  author = {W. I. Chang},
  title = {Approximate pattern matching and biological applications},
  school = {University of California, Berkeley},
  year = {1991}
}
Chen, Y. & Lee, S. Parsimony-spaced suffix trees for DNA sequences 2003 Multimedia Software Engineering, 2003. Proceedings. Fifth International Symposium on   inproceedings DOIURL  
Abstract: Bioinformatics has become an important research field because there is more genetic data to be analyzed. The suffix tree is a powerful data structure for string analysis and has many applications in bioinformatics. Its linear construction time, linear construction space and short search time all make it very impressive. However, consuming huge space is a fatal drawback especially while using suffix trees to handle the large number of DNA sequences. We utilize some characteristics of DNA sequences to reduce the space requirement of suffix trees. A new bit layout is proposed for the node of a suffix tree which requires less space than others. We also use an index table, called a "prefix table", which can reduce the number of internal nodes in suffix trees. In addition, we propose a preprocessing technique to improve the construction time based on our data structure. The experiments show that our proposed method is the most space-parsimony implementation of suffix trees for DNA sequences and it also has a good construction time.
BibTeX:
@inproceedings{Chen2003,
  author = {Yun-Ching Chen and Suh-Yin Lee},
  title = {Parsimony-spaced suffix trees for DNA sequences},
  booktitle = {Multimedia Software Engineering, 2003. Proceedings. Fifth International Symposium on},
  year = {2003},
  pages = {250--256},
  url = {http://ieeexplore.ieee.org/search/wrapper.jsp?arnumber=1254449},
  doi = {http://dx.doi.org/10.1109/MMSE.2003.1254449}
}
Chen, Y., Nascimento, M., Ooi, B. C. & Tung, A. SpADe: On Shape-based Pattern Detection in Streaming Time Series 2007 Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on   inproceedings DOI  
BibTeX:
@inproceedings{Chen2007,
  author = {Yueguo Chen and Nascimento, M.A. and Beng Chin Ooi and Tung, A.K.H.},
  title = {SpADe: On Shape-based Pattern Detection in Streaming Time Series},
  booktitle = {Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on},
  year = {2007},
  pages = {786--795},
  doi = {http://dx.doi.org/10.1109/ICDE.2007.367924}
}
Chen;, H. G. B. J. J. W. W. X. N. A framework for application server based Web services management 2005 Software Engineering Conference, 2005. APSEC '05. 12th Asia-Pacific   article DOI  
Abstract: Web services are considered as solution for solving the interoperability problem and the challenge of integration. How to manage Web services more efficiently is a key problem to the applications based on Web services at present. This paper presents a framework for application server based Web services management named FASWSM. The main contribution of FASWSM is Web service adaptation, so that a Web service can be plugged into the application server (AS) as a Web service adapter which enables a Web service to be managed by the application server. Therefore, the Web services management efforts are greatly reduced by leveraging various services provided by the AS. The FASWSM also provides mechanisms for dynamical reconfiguration which improve the extensibility and flexibility of Web services management. The FASWSM has been applied to OnceAS 2.0 application server, and the experiments data show that FASWSM outperforms JAX-RPC in terms of Web service invocation.
BibTeX:
@article{,
  author = {Heqing Guan; Beihong Jin; Jun Wei; Wei Xu; Ningjiang Chen;},
  title = {A framework for application server based Web services management},
  journal = {Software Engineering Conference, 2005. APSEC '05. 12th Asia-Pacific},
  year = {2005},
  doi = {http://dx.doi.org/10.1109/APSEC.2005.8}
}
Cheng, Q. Z. J. Web Services and Service-Oriented Application Provisioning: An Analytical Study of Application Service Strategies 2006 Engineering Management, IEEE Transactions on   article DOI  
Abstract: As a new network computing technology, Web services help reduce application integration cost and have become the foundation for the next generation of service-oriented architecture. While there is ample research on the technological issues of Web services, little has been done thus far on the business implications of this groundbreaking technology. In this paper, we attempt to close this research gap by comparing three application service strategies for providing Web services of complementary functionalities, i.e., independent service vendors (ISVs), joint venture (JV), and strategic alliance (SA). Although Web service integration is analogous to product bundling, the unique features of software integration, such as low integration cost and the possibility of creating an integrated Web service with new functionalities, have led to interesting analytical results not seen in the product bundling literature. We find that the optimal application service strategy is determined by several factors, including the integration cost, the valuations, and market potentials of the individual and integrated Web services. Our study shows that the application service strategy of ISVs is always dominated by the SA, implying that Web service providers benefit from the increased interoperability of Web services. In cases where the integrated Web service is highly valuated and the integration cost is small, JV becomes the optimal application service strategy. In other cases, the Web service providers are better off to form SAs
BibTeX:
@article{Cheng2006,
  author = {Cheng, H.K.; Tang, Q.C.; Zhao, J.L.;},
  title = {Web Services and Service-Oriented Application Provisioning: An Analytical Study of Application Service Strategies},
  journal = {Engineering Management, IEEE Transactions on},
  year = {2006},
  volume = {53},
  pages = {520-533},
  doi = {http://dx.doi.org/10.1109/TEM.2006.883711}
}
Cheng, L., Cheung, D. & Yiu, S. Approximate string matching in DNA sequences 2003 Database Systems for Advanced Applications, 2003. (DASFAA 2003). Proceedings. Eighth International Conference on   inproceedings DOIURL  
Abstract: Approximate string matching on large DNA sequences data is very important in bioinformatics. Some studies have shown that suffix tree is an efficient data structure for approximate string matching. It performs better than suffix array if the data structure can be stored entirely in the memory. However our study find that suffix array is much better than suffix tree for indexing the DNA sequences since the data structure has to be created and stored on the disk due to its size. We propose a novel auxiliary data structure which greatly improves the efficiency of suffix array in the approximate string matching problem in the external memory model. The second problem we have tackled is the parallel approximate matching in DNA sequence. We propose 2 novel parallel algorithms for this problem and implement them on a PC cluster The result shows that when the error allowed is small, a direct partitioning of the array over the machines in the cluster is a more efficient approach. On the other hand, when the error allowed is large, partitioning the data over the machines is a better approach.
BibTeX:
@inproceedings{Cheng2003,
  author = {Lok-Lam Cheng and Cheung, D.W. and Siu-Ming Yiu},
  title = {Approximate string matching in DNA sequences},
  booktitle = {Database Systems for Advanced Applications, 2003. (DASFAA 2003). Proceedings. Eighth International Conference on},
  year = {2003},
  pages = {303--310},
  url = {http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1192395},
  doi = {http://dx.doi.org/10.1109/DASFAA.2003.1192395}
}
Cheng, W., Yen, K., Lin, C., Ching, Y. & Yang, Y. Comparing lanes in the pulsed-field gel electrophoresis (PFGE) images 2001 Engineering in Medicine and Biology Society, 2001. Proceedings of the 23rd Annual International Conference of the IEEE   inproceedings  
BibTeX:
@inproceedings{Cheng2001,
  author = {Wei-Zen Cheng and Kue-Sai Yen and Chih-Yang Lin and Yu-Tai Ching and Yun-Liang Yang},
  title = {Comparing lanes in the pulsed-field gel electrophoresis (PFGE) images},
  booktitle = {Engineering in Medicine and Biology Society, 2001. Proceedings of the 23rd Annual International Conference of the IEEE},
  year = {2001},
  volume = {3},
  pages = {2911--2913vol.3}
}
Chou, L. L. F. Web Services for Service-Oriented Communication 2006 Collaborative Computing: Networking, Applications and Worksharing, 2006. CollaborateCom 2006. International Conference on   article DOI  
Abstract: Service-oriented communication (SOC) is a new development in the industry to enable communication through Web services and SOA. SOC is to make communication as service and provide a service-oriented architecture to integrate communication in business applications. Recent advances of Web service and SOA have made it possible for a full Web service and SOA based communication paradigm over IP. This paper is an overview of the recent development in this area with a focus on key technologies that can be applied to Web service enablement of communication. In particular, we discuss the generic Web service based application session management based on WS-session, the two-way full duplex Web service interaction framework, and the development of Web service initiation protocol (WIP). WIP is a full Web service and SOA based communication protocol for multimedia and voice communication over IP. The generic Web service approach of WIP overcomes many limitations which would be otherwise difficult to achieve in non-Web service based communication methods used today. In addition, we discuss the service composition and orchestration framework based on WS-BPEL, and illustrate the application of this approach through some real use cases
BibTeX:
@article{Chou2006,
  author = {Chou, Wu Li, Li Liu, Feng},
  title = {Web Services for Service-Oriented Communication},
  journal = {Collaborative Computing: Networking, Applications and Worksharing, 2006. CollaborateCom 2006. International Conference on},
  year = {2006},
  pages = {1-8},
  doi = {http://dx.doi.org/10.1109/COLCOM.2006.361895}
}
Clifford, P., Clifford, R. & Iliopulos, C. Faster algorithms for $, $-matching and related problems 2005 Lecture notes in computer science   article URL  
Abstract: We present new faster algorithms for the problems of delta and (delta, gamma)-matching on numeric strings. In both cases the running time of the proposed algorithms is shown to be O(delta n log m), where m is the pattern length, n is the text length and delta a given integer. Our approach makes use of Fourier transform methods and the running times are independent of the alphabet size. $O(nmm)$ algorithms for the gamma-matching and total-difference problems are also given. In all the above cases, we improve existing running time bounds in the literature.
BibTeX:
@article{clifford:fadelta,
  author = {Clifford, P. and Clifford, R. and Iliopulos, C.},
  title = {Faster algorithms for $\delta$, $\gamma$-matching and related problems},
  journal = {Lecture notes in computer science},
  publisher = {Springer},
  year = {2005},
  volume = {3537},
  pages = {68--78},
  url = {http://www.springerlink.com/content/pvadxkuxkk9crqj9/}
}
Cobbs, A. L. Fast Identification of Approximately Matching Substrings 1994 Proceedings of the 5th Annual Symposium on Combinatorial Pattern Matching   inproceedings URL  
Abstract: Let two strings S, T over a finite alphabet Sgr be given, and let M be an arbitrary relation on Sgr×Sgr. Define an approximate match (x,y) of two length m subwords (substrings) x sqsube S, y sqsube T when M(x i ,y i , for all 1leilem. A match implies all the local alignments (without insertions and deletions) which are pairings of specific occurrances of x and y. A match (x,y) is maximal if there exists no longer match (u, v) such that all of the local alignments implied by (x,y) are contained in a local alignment implied by (u,v). We give an efficient algorithm for finding all maximal matches between S and T. The algorithm runs in time bounded by the sum of the lengths of the maximal matches, at worst. O(¦Sgr¦2 n 2). The main application is identifying homologous regions of protein sequences.
BibTeX:
@inproceedings{Cob94,
  author = {A. L. Cobbs},
  title = {Fast Identification of Approximately Matching Substrings},
  booktitle = {Proceedings of the 5th Annual Symposium on Combinatorial Pattern Matching},
  publisher = {#svb#},
  year = {1994},
  number = {807},
  pages = {64--74},
  url = {http://www.springerlink.com/content/70153wgj65035018/}
}
Cole, R. & Hariharan, R. Faster Suffix Tree Construction with Missing Suffix Links 2003 SIAM Journal on Computing   article DOIURL  
Abstract: We consider suffix tree construction for situations with missing suffix links. Two examples of such situations are suffix trees for parameterized strings and suffix trees for two-dimensional arrays. These trees also have the property that the node degrees may be large. We add a new back-propagation component to McCreight's algorithm and also give a high probability hashing scheme for large degrees. We show that these two features enable construction of suffix trees for general situations with missing suffix links in O(n) time, with high probability. This gives the first randomized linear time algorithm for constructing suffix trees for parameterized strings.
BibTeX:
@article{cole:26,
  author = {Richard Cole and Ramesh Hariharan},
  title = {Faster Suffix Tree Construction with Missing Suffix Links},
  journal = {SIAM Journal on Computing},
  publisher = {SIAM},
  year = {2003},
  volume = {33},
  number = {1},
  pages = {26-42},
  url = {http://link.aip.org/link/?SMJ/33/26/1},
  doi = {http://citeseer.ist.psu.edu/375895.html}
}
Cole, R. & Hariharan, R. Approximate String Matching: A Faster Simpler Algorithm 1998 The 1998 9 th Annual ACM SIAM Symposium on Discrete Algorithms   inproceedings URL  
Abstract: We give two algorithms for finding all approximate matches of a pattern in a text, where the edit distance between the pattern and the matching text substring is at most k. The first algorithm, which is quite simple, runs in time O(nk super(3)/m+n+m) on all patterns except mostly periodic strings (defined later). The second algorithm runs in time O(nk super(4)/m+n+m) on mostly periodic patterns. The two classes of patterns are easily distinguished in O(m) time.
BibTeX:
@inproceedings{CH98b,
  author = {R. Cole and R. Hariharan},
  title = {Approximate String Matching: A Faster Simpler Algorithm},
  booktitle = {The 1998 9 th Annual ACM SIAM Symposium on Discrete Algorithms},
  year = {1998},
  pages = {463--472},
  url = {http://scitation.aip.org/getabs/servlet/GetabsServlet?prog=normal&id=SMJCAT000031000006001761000001&idtype=cvips&gifs=yes}
}
Commentz-Walter, B. A String Matching Algorithm Fast on the Average 1979   book  
BibTeX:
@book{commentzwalter1979sma,
  author = {Commentz-Walter, B.},
  title = {A String Matching Algorithm Fast on the Average},
  publisher = {Springer-Verlag London, UK},
  year = {1979}
}
Crochemore, M., Czumaj, A., Gasieniec, L., Jarominek, S., Lecroq, T., Plandowski, W. & Rytter, W. Speeding up two string matching algorithms 1994 Algorithmica   article URL  
Abstract: We show how to speed up two string-matching algorithms: the Boyer-Moore algorithm (BM algorithm), and its version called here the reverse factor algorithm (RF algorithm). The RF algorithm is based on factor graphs for the reverse of the pattern. The main feature of both algorithms is that they scan the text right-to-left from the supposed right position of the pattern. The BM algorithm goes as far as the scanned segment (factor) is a suffix of the pattern. The RF algorithm scans while the segment is a factor of the pattern. Both algorithms make a shift of the pattern, forget the history, and start again. The RF algorithm usually makes bigger shifts than BM, but is quadratic in the worst case. We show that it is enough to remember the last matched segment (represented by two pointers to the text) to speed up the RF algorithm considerably (to make a linear number of inspections of text symbols, with small coefficient), and to speed up the BM algorithm (to make at most 2 ?n comparisons). Only a constant additional memory is needed for the search phase. We give alternative versions of an accelerated RF algorithm: the first one is based on combinatorial properties of primitive words, and the other two use the power of suffix trees extensively. The paper demonstrates the techniques to transform algorithms, and also shows interesting new applications of data structures representing all subwords of the pattern in compact form.
BibTeX:
@article{CCGJLPR94,
  author = {M. Crochemore and A. Czumaj and L. G{\c a}sieniec and S. Jarominek and T. Lecroq and W. Plandowski and W. Rytter},
  title = {Speeding up two string matching algorithms},
  journal = {Algorithmica},
  year = {1994},
  volume = {12},
  number = {4/5},
  pages = {247--267},
  url = {http://www.springerlink.com/content/p783wl7323874635/}
}
Crochemore, M. & Rytter, W. Jewels of Stringology 2002   book URL  
BibTeX:
@book{crochemore2002js,
  author = {Crochemore, M. and Rytter, W.},
  title = {Jewels of Stringology},
  publisher = {World Scientific Publishing Company},
  year = {2002},
  url = {http://www-igm.univ-mlv.fr/~mac/JOS/JOS.html}
}
Cull, P. & Holloway, J. A divide and conquer approach to approximate string matching 1991   techreport  
BibTeX:
@techreport{CH91,
  author = {P. Cull and J. Holloway},
  title = {A divide and conquer approach to approximate string matching},
  year = {1991},
  number = {TR-91-50-1}
}
D. J. Hand, P. S. Principles of Data Mining 2001   book  
Abstract: The first truly interdisciplinary text on data mining, blending the contributions of information science, computer science, and statistics
BibTeX:
@book{D.2001,
  author = {D. J. Hand, Heikki Mannila, Padhraic Smyth},
  title = {Principles of Data Mining},
  publisher = {MIT Press},
  year = {2001},
  pages = {546}
}
Darabont, T. Approximate string matching with $k$ mismatches 1992 School: Faculty of Electrical Engineering, Czech Technical University, Prague, Czech Republic   mastersthesis  
BibTeX:
@mastersthesis{Dar92,
  author = {T. Darabont},
  title = {Approximate string matching with $k$ mismatches},
  school = {Faculty of Electrical Engineering, Czech Technical University, Prague, Czech Republic},
  year = {1992}
}
Das, S., Datta, A. & Pothuru, S. Implementing string-to-string correction and longest common subsequence problems on the Sequent Symmetry multiprocessor 1996 High Performance Computing, 1996. Proceedings. 3rd International Conference on   inproceedings DOI  
BibTeX:
@inproceedings{Das1996,
  author = {Das, S.K. and Datta, A.K. and Pothuru, S.},
  title = {Implementing string-to-string correction and longest common subsequence problems on the Sequent Symmetry multiprocessor},
  booktitle = {High Performance Computing, 1996. Proceedings. 3rd International Conference on},
  year = {1996},
  pages = {330--335},
  doi = {http://dx.doi.org/10.1109/HIPC.1996.565843}
}
Du Mouza, C., Rigaux, P. & Scholl, M. Parameterized pattern queries 2007 Data & Knowledge Engineering   article DOIURL  
Abstract: We introduce parameterized pattern queries as a new paradigm to extend traditional pattern expressions over sequence databases. A parameterized pattern is essentially a string made of constant symbols or variables where variables can be matched against any symbol of the input string.

Parameterized patterns allow a concise and expressive definition of regular expressions that would be very complex to describe without variables. They can also be used to express additional constraints, to “relax” pattern expressions by allowing more freedom, and finally to cluster patterns in order to minimize the number of symbols’ comparisons.

BibTeX:
@article{dumouza2007ppq,
  author = {Du Mouza, C. and Rigaux, P. and Scholl, M.},
  title = {Parameterized pattern queries},
  journal = {Data \& Knowledge Engineering},
  publisher = {Elsevier},
  year = {2007},
  volume = {63},
  number = {2},
  pages = {433--456},
  url = {http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6TYX-4NF4DY8-1&_user=10&_coverDate=11%2F30%2F2007&_rdoc=1&_fmt=&_orig=search&_sort=d&view=c&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=31d96ce23e89dce00de84c2a3cc56cd2},
  doi = {http://www.sinab.unal.edu.co:2053/science?_ob=ArticleURL\&_udi=B6TYX-4NF4DY8-1\&_user=1998314\&_coverDate=11%2F30%2F2007\&_alid=649864815\&_rdoc=1\&_fmt=full\&_orig=search\&_cdi=5630\&_sort=d\&_docanchor=\&view=c\&_ct=32\&_acct=C000055778\&_version=1\&_urlVersion=0\&_userid=1998314\&md5=323eeee648d3b4ed09ce315162de7be7}
}
Du, M. W. & Chang, S. C. An Approach to Designing Very Fast Approximate String Matching Algorithms 1994 IEEE Transactions on Knowledge and Data Engineering   article URL  
Abstract: An approach to designing very fast algorithms for approximate string matching in a dictionary is proposed. Multiple spelling errors corresponding to insert, delete, change, and transpose operations on character strings are considered in the fault model. The design of very fast approximate string matching algorithms through a four-step reduction procedure is described. The final and most effective step uses hashing techniques to avoid comparing the given word with words at large distances. The technique has been applied to a library book catalog textbase. The experiments show that performing approximate string matching for a large dictionary in real-time on an ordinary sequential computer under our multiple fault model is feasible.
BibTeX:
@article{DC94,
  author = {M. W. Du and S. C. Chang},
  title = {An Approach to Designing Very Fast Approximate String Matching Algorithms},
  journal = {IEEE Transactions on Knowledge and Data Engineering},
  year = {1994},
  volume = {6},
  number = {4},
  pages = {620--633},
  url = {http://portal.acm.org/citation.cfm?id=627599}
}
Dumbill, J. J. E. Programming Web Services with XML-RPC 2001   book  
BibTeX:
@book{Dumbill2001,
  author = {Joe Johnston Edd Dumbill},
  title = {Programming Web Services with XML-RPC},
  publisher = {O'Really},
  year = {2001},
  pages = {230}
}
Eisler, M. J. A Research Report on the Design of an Approximate String Matching Device using the LCS Device Approach 1985 School: Univ. Central Florida   mastersthesis  
BibTeX:
@mastersthesis{Eis85,
  author = {M. J. Eisler},
  title = {A Research Report on the Design of an Approximate String Matching Device using the LCS Device Approach},
  school = {Univ. Central Florida},
  year = {1985}
}
El-Mabrouk, N. & Crochemore, M. Boyer-Moore Strategy to Efficient Approximate String Matching 1996 Proceedings of the 7th Annual Symposium on Combinatorial Pattern Matching   inproceedings URL  
Abstract: We propose a simple but efficient algorithm for searching all occurrences of a pattern or a class of patterns (length m) in a text (length n) with at most k mismatches. This algorithm relies on the Shift-Add algorithm of Baeza-Yates and Gonnet [6], which involves representing by a bit number the current state of the search and uses the ability of programming languages to handle bit words. State representation should not, therefore, exceeds the word size !, that is, m(dlog 2 (k + 1)e + 1) !.
BibTeX:
@inproceedings{EC96,
  author = {N. El-Mabrouk and M. Crochemore},
  title = {{Boyer-Moore} Strategy to Efficient Approximate String Matching},
  booktitle = {Proceedings of the 7th Annual Symposium on Combinatorial Pattern Matching},
  publisher = {#svb#},
  year = {1996},
  number = {1075},
  pages = {24--38},
  url = {http://citeseer.ist.psu.edu/153747.html}
}
Farach, M. Optimal suffix tree construction with large alphabets 1997 Foundations of Computer Science, 1997. Proceedings., 38th Annual Symposium on   inproceedings DOIURL  
Abstract: The suffix tree of a string is the fundamental data structure of combinatorial pattern matching. Weiner (1973), who introduced the data structure, gave an O(n)-time algorithm for building the suffix tree of an n-character string drawn from a constant size alphabet. In the comparison model, there is a trivial ?(n log n)-time lower bound based on sorting, and Weiner's algorithm matches this bound trivially. For integer alphabets, a substantial gap remains between the known upper and lower bounds, and closing this gap is the main open question in the construction of suffix trees. There is no super-linear lower bound, and the fastest known algorithm was the O(n log n) time comparison based algorithm. We settle this open problem by closing the gap: we build suffix trees in linear time for integer alphabet
BibTeX:
@inproceedings{Farach1997,
  author = {Farach, M.},
  title = {Optimal suffix tree construction with large alphabets},
  booktitle = {Foundations of Computer Science, 1997. Proceedings., 38th Annual Symposium on},
  year = {1997},
  pages = {137--143},
  url = {http://portal.acm.org/citation.cfm?id=796326},
  doi = {http://dx.doi.org/10.1109/SFCS.1997.646102}
}
Farach, M. & Thorup, M. Optimal evolutionary tree comparison by sparse dynamic programming 1994 Foundations of Computer Science, 1994 Proceedings., 35th Annual Symposium on   inproceedings DOI  
Abstract: In computational biology one is often interested in finding the concensus between different evolutionary trees for the same set of species. A popular formalizations is the Maximum Agreement Subtree Problem (MAST) defined as follows: given a set A and two rooted trees &sub>0and &sub>1leaf-labeled by the elements of A, find a maximum cardinality subset B of A such that the restrictions of &sub>0and &sub>1to B are topologically isomorphic. Polynomial time solutions exist, but they rely on a dynamic program with &n nodes-and &n running time. We sparsify this dynamic program and show that MAST is equivalent to Unary Weighted Bipartite Matching (UWBM) modulo an O(nc/sup &log n/) additive overhead. Applying the best bound for UWBM, we get an O(n.5log n) algorithm for MAST. From our sparsification follows an O(nc/sup &log n/)) time algorithm for the special case of bounded degrees. Also here the best previous bound was &n
BibTeX:
@inproceedings{Farach1994,
  author = {Farach, M. and Thorup, M.},
  title = {Optimal evolutionary tree comparison by sparse dynamic programming},
  booktitle = {Foundations of Computer Science, 1994 Proceedings., 35th Annual Symposium on},
  year = {1994},
  pages = {770--779},
  doi = {http://dx.doi.org/10.1109/SFCS.1994.365716}
}
Fashandi, H. & Moghaddam, A. A new rotation invariant similarity measure for trajectories 2005 Computational Intelligence in Robotics and Automation, 2005. CIRA 2005. Proceedings. 2005 IEEE International Symposium on   inproceedings DOI  
BibTeX:
@inproceedings{Fashandi2005,
  author = {Fashandi, H. and Moghaddam, A.M.E.},
  title = {A new rotation invariant similarity measure for trajectories},
  booktitle = {Computational Intelligence in Robotics and Automation, 2005. CIRA 2005. Proceedings. 2005 IEEE International Symposium on},
  year = {2005},
  pages = {631--634},
  doi = {http://dx.doi.org/10.1109/CIRA.2005.1554347}
}
Ferragina, P. & Manzini, G. Opportunistic data structures with applications 2000 Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium on   inproceedings DOIURL  
Abstract: We address the issue of compressing and indexing data. We devise a data structure whose space occupancy is a function of the entropy of the underlying data set. We call the data structure opportunistic since its space occupancy is decreased when the input is compressible and this space reduction is achieved at no significant slowdown in the query performance. More precisely, its space occupancy is optimal in an information-content sense because text T[1,u] is stored using O(Hk (T))+o(1) bits per input symbol in the worst case, where Hk (T) is the kth order empirical entropy of T (the bound holds for any fixed k). Given an arbitrary string P[1,p], the opportunistic data structure allows to search for the occurrences of P in T in O(p+occlog ϵu) time (for any fixed ϵ>0). If data are uncompressible we achieve the best space bound currently known (Grossi and Vitter, 2000); on compressible data our solution improves the succinct suffix array of (Grossi and Vitter, 2000) and the classical suffix tree and suffix array data structures either in space or in query time or both. We also study our opportunistic data structure in a dynamic setting and devise a variant achieving effective search and update time bounds. Finally, we show how to plug our opportunistic data structure into the Glimpse tool (Manber and Wu, 1994). The result is an indexing tool which achieves sublinear space and sublinear query time complexity.
BibTeX:
@inproceedings{Ferragina2000,
  author = {Ferragina, P. and Manzini, G.},
  title = {Opportunistic data structures with applications},
  booktitle = {Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium on},
  year = {2000},
  pages = {390--398},
  url = {http://ieeexplore.ieee.org/search/wrapper.jsp?arnumber=892127},
  doi = {http://dx.doi.org/10.1109/SFCS.2000.892127}
}
Fredriksson, K. & Grabowski, S. Efficient Bit-parallel Algorithms for $($-matching. 2006 Proc. 5th Workshop on Efficient and Experimental Algorithms (WEA'06)   inproceedings URL  
Abstract: We consider the following string matching problem. Pattern p0 p1 p2 ... pm–1 (?,?)-matches the text substring $t_i_0 t_i_1 t_i_2 ldots t_i_m-1$ , if $|p_j-t_i_j|leq for j ? 0,..., m–1, where 0 < ij+1 – ij ? ? + 1. The task is then to find all text positions im–1 that (?,?)-match the pattern. For a text of length n, the best previously known algorithms for this string matching problem run in time O(nm) and in time O(n?m?/w?), where the former is based on dynamic programming, and the latter on bit-parallelism with w bits in computer word (32 or 64 typically). We improve these to take $O(nn/wrceil m)$ and O(n ?m log(?)/w?), respectively, worst case time using bit-parallelism. On average the algorithms run in $O(n/wrceilalphasigman)$ and O(n) time. Our experimental results show that the algorithms work extremely well in practice. Our algorithms handle general gaps as well, having important applications in computational biology.
BibTeX:
@inproceedings{FGwea06,
  author = {K. Fredriksson and Sz. Grabowski},
  title = {Efficient Bit-parallel Algorithms for $(\delta,\alpha)$-matching.},
  booktitle = {Proc. 5th Workshop on Efficient and Experimental Algorithms (WEA'06)},
  publisher = {Springer--Verlag},
  year = {2006},
  pages = {170--181},
  url = {http://www.springerlink.com/content/p223923610k2k457/}
}
Fredriksson, K. & Grabowski, S. Efficient algorithms for ($, $, $)-matching 2006 Proceedings of PSC'06   inproceedings URL  
Abstract: We propose new algorithms for (?, ?, ?)-matching. In this string matching problem we are given a pattern P=p0p1...pm-1 and a text T=t0t1...tn-1 over some integer alphabet ? = 0...?-1. The pattern symbol pi matches the text symbol tj iff |pi-tj| ? ?. The pattern P (?,?)-matches some text substring tj...tj+m-1 iff for all i it holds that |pi-tj+i| ? ? and ?|pi-tj+i| ? ?. Finally, in (?, ?, ?)-matching we also permit at most ? length gaps (text substrings) between each matching text symbol. The only known previous algorithm runs in O(mn) time. We give several algorithms that improve the average case up to O(n) for small ?, and the worst case to O(minmn,|M|?) or O(mn log (?/w)), where M = (i,j); |pi-tj| ? ? and w is the number of bits in a machine word. We conclude with experimental results showing that the algorithms are very efficient in practice.
BibTeX:
@inproceedings{fredriksson:eadelta,
  author = {Fredriksson, K. and Grabowski, S.},
  title = {Efficient algorithms for ($\delta$, $\gamma$, $\alpha$)-matching},
  booktitle = {Proceedings of PSC'06},
  year = {2006},
  url = {http://www.stringology.org/event/2006/p05.html}
}
Fredriksson, K. & Grabowski, S. Practical and optimal string matching 2005 Lecture notes in computer science   article URL  
Abstract: We develop a new exact bit-parallel string matching algorithm, based on the Shift-Or algorithm (Baeza-Yates & Gonnet, 1992). Assuming that the pattern representation fits into a single computer word, this algorithm has optimal O(n log? m / m) average running time, as well as optimal O(n) worst case running time, where n, m and ? are the sizes of the text, the pattern, and the alphabet, respectively. We also study several implementation details. The experimental results show that our algorithm is the fastest in most of the cases where it can be applied, displacing even the long-standing BNDM (Navarro & Raffinot, 2000) family of algorithms. Finally, we show how to adapt our techniques for the Shift-Add algorithm (Baeza-Yates & Gonnet, 1992), obtaining optimal time for searching under Hamming distance.
BibTeX:
@article{fredriksson:pao,
  author = {Fredriksson, K. and Grabowski, S.},
  title = {Practical and optimal string matching},
  journal = {Lecture notes in computer science},
  publisher = {Springer},
  year = {2005},
  volume = {3772},
  pages = {376--387},
  url = {http://www.springerlink.com/content/y5l41h5503w372wj/}
}
Fredriksson, K. & Mozgovoy, M. Efficient parameterized string matching 2006 Information Processing Letters   article DOIURL  
Abstract: In parameterized string matching the pattern P matches a substring t of the text T if there exist a bijective mapping from the symbols of P to the symbols of t. We give simple and practical algorithms for finding all such pattern occurrences in sublinear time on average. The algorithms work for a single and multiple patterns.
Review: Este articulo da una descripcion del objetivo y aplicaciones de la busqueda parametrizada y presenta dos algoritmos para resolver el problema. Uno de ellos basado en Shift-OR y el otro en backward trie matching. Finalmente hace una comparacion entre los resultados.
BibTeX:
@article{1226010,
  author = {Kimmo Fredriksson and Maxim Mozgovoy},
  title = {Efficient parameterized string matching},
  journal = {Information Processing Letters},
  publisher = {Elsevier North-Holland, Inc.},
  year = {2006},
  volume = {100},
  number = {3},
  pages = {91--96},
  url = {http://portal.acm.org/citation.cfm?id=1226010#},
  doi = {http://dx.doi.org/10.1016/j.ipl.2006.06.009}
}
Fu, K. J. Approximate Pattern Matching in Directed Graphs 1996 Proceedings of the 7th Annual Symposium on Combinatorial Pattern Matching   inproceedings URL  
Abstract: Pattern matching in directed graphs is useful in many areas including type systems, functional languages, regular tree expressions, cyclic term graph rewriting systems and machine translation. We investigate in this paper the problem of directed graph pattern matching allowing some mismatches in labels. Two algorithms for computing the distance between ordered labeled rooted directed graphs, which is the central part of approximate pattern matching, are presented: the first algorithm computes the distance between two graphs P and T in time O(¦E P ¦ ¦V T ¦) and space O(¦E T ¦+¦E P ¦ ¦V T ¦). It is as fast as the best solution for determining whether a directed graph matches another. The second algorithm computes the distance between every subgraph of P and every subgraph of T in time O(¦E P ¦ ¦V T ¦ (¦V P ¦+¦V T ¦)) and space O(¦V P ¦ ¦V T ¦ (¦V P ¦+¦V T ¦)). It is the first solution for this problem.
BibTeX:
@inproceedings{Fu96,
  author = {K. J. Fu},
  title = {Approximate Pattern Matching in Directed Graphs},
  booktitle = {Proceedings of the 7th Annual Symposium on Combinatorial Pattern Matching},
  publisher = {#svb#},
  year = {1996},
  number = {1075},
  pages = {373--383},
  url = {http://www.springerlink.com/content/p18p63um87m681ph/}
}
Fu, L. W. X. Data Mining with Computational Intelligence 2005   book  
BibTeX:
@book{Fu2005,
  author = {Lipo Wang Xiuju Fu},
  title = {Data Mining with Computational Intelligence},
  publisher = {Springer},
  year = {2005},
  pages = {276}
}
Galil, Z. & Giancarlo, R. Data structures and algorithms for approximate string matching 1988 Journal of Complexity   article URL  
BibTeX:
@article{GG88,
  author = {Z. Galil and R. Giancarlo},
  title = {Data structures and algorithms for approximate string matching},
  journal = {Journal of Complexity},
  year = {1988},
  volume = {4},
  number = {1},
  pages = {33--72},
  url = {http://portal.acm.org/citation.cfm?id=49441.49444&dl=GUIDE&dl=ACM}
}
Galil, Z. & Park, K. An improved algorithm for approximate string matching 1990 SIAM Journal on Computing   article URL  
Abstract: Given a text string, a pattern string, and an integer $k$, a new algorithm for finding all occurrences of the pattern string in the text string with at most $k$ differences is presented. Both its theoretical and practical variants improve upon the known algorithms.
BibTeX:
@article{GP90b,
  author = {Z. Galil and K. Park},
  title = {An improved algorithm for approximate string matching},
  journal = {SIAM Journal on Computing},
  year = {1990},
  volume = {19},
  number = {6},
  pages = {989--999},
  url = {http://scitation.aip.org/getabs/servlet/GetabsServlet?prog=normal&id=SMJCAT000019000006000989000001&idtype=cvips&gifs=yes}
}
Garcia, T., Myoupo, J. & Seme, D. A coarse-grained multicomputer algorithm for the longest common subsequence problem 2003 Parallel, Distributed and Network-Based Processing, 2003. Proceedings. Eleventh Euromicro Conference on   inproceedings DOI  
BibTeX:
@inproceedings{Garcia2003,
  author = {Garcia, T. and Myoupo, J.-F. and Seme, D.},
  title = {A coarse-grained multicomputer algorithm for the longest common subsequence problem},
  booktitle = {Parallel, Distributed and Network-Based Processing, 2003. Proceedings. Eleventh Euromicro Conference on},
  year = {2003},
  pages = {349--356},
  doi = {http://dx.doi.org/10.1109/EMPDP.2003.1183610}
}
Giegerich, R., Kurtz, S., Hischke, F. & Ohlebusch, E. A General Technique to Improve Filter Algorithms for Approximate String Matching 1997 Proceedings of the 4th South American Workshop on String Processing (WSP’97)   inproceedings URL  
Abstract: Approximate string matching searches for occurrences of a pattern in a text, where a certain number of character differences (errors) is allowed. Fast methods use filters: A fast preprocessing phase determines regions of the text where a match cannot occur; only the remaining text regions must be scrutinized by the slower approximate matching algorithm. Such filters can be very effective, but they (naturally) degrade at a critical error threshold. We introduce a general technique to improve the efficiency of filters and hence to push out further this critical threshold value. Our technique intermittently reevaluates the possibility of a match in a given region. It combines precise information about the region already scanned with filtering information about the region yet to be searched. We apply this technique to four approximate string matching algorithms published by Chang & Lawler and Sutinen & Tarhio.
BibTeX:
@inproceedings{GKHO97,
  author = {R. Giegerich and S. Kurtz and F. Hischke and E. Ohlebusch},
  title = {A General Technique to Improve Filter Algorithms for Approximate String Matching},
  booktitle = {Proceedings of the 4th South American Workshop on String Processing (WSP’97)},
  publisher = {#cup#},
  year = {1997},
  pages = {38--52},
  url = {http://citeseer.ist.psu.edu/giegerich97general.html}
}
Grossi, R. A fast VLSI solution for the approximate string matching 1992 Integration, the VLSI Journal   article URL  
BibTeX:
@article{Gro92,
  author = {R. Grossi},
  title = {A fast {VLSI} solution for the approximate string matching},
  journal = {Integration, the VLSI Journal},
  year = {1992},
  volume = {13},
  number = {2},
  pages = {195--206},
  url = {http://portal.acm.org/citation.cfm?coll=GUIDE&dl=GUIDE&id=146803}
}
Guimarães, K. & Melo, J. String alignment and approximate matching 1996 Third South American Workshop on String Processing   inproceedings  
BibTeX:
@inproceedings{GM96b,
  author = {K. Guimar{\~a}es and J. Melo},
  title = {String alignment and approximate matching},
  booktitle = {Third South American Workshop on String Processing},
  publisher = {#cup#},
  year = {1996},
  pages = {85--100}
}
Gusfield, D. Suffix trees (and relatives) come of age in bioinformatics 2002 Bioinformatics Conference, 2002. Proceedings. IEEE Computer Society   inproceedings DOIURL  
Abstract: Summary form only given. In the past, several things limited the wider application of suffix trees: large memory requirements; limited locality of reference; the conceptual difficulty of the algorithms; and lack of available code; lack of general exposure in the bioinformatics community (and even the computer science community) to suffix trees. Much has changed since 1997. Suffix trees and close relatives are now widely taught in graduate level courses on computer algorithms and on bioinformatics; there are several good expositions on suffix tree algorithms and uses; the space requirements have been substantially reduced; machine memories have greatly increased; additional variants of suffix trees have been introduced that address some of their deficiencies; and suffix tree code is publicly available. As a result, and to some extent as a cause, there are now many more applications in bioinformatics of suffix trees and related data structures. The author addresses the wider uses of suffix trees in bioinformatics.
BibTeX:
@inproceedings{Gusfield2002,
  author = {Gusfield, D.},
  title = {Suffix trees (and relatives) come of age in bioinformatics},
  booktitle = {Bioinformatics Conference, 2002. Proceedings. IEEE Computer Society},
  year = {2002},
  pages = {3},
  url = {http://ieeexplore.ieee.org/search/wrapper.jsp?arnumber=1039321},
  doi = {http://dx.doi.org/10.1109/CSB.2002.1039321}
}
Gusfield, D. Algorithms on Strings, Trees and Sequences: computer science and computational biology 1997   book URL  
BibTeX:
@book{gusfield1997ast,
  author = {Gusfield, D.},
  title = {Algorithms on Strings, Trees and Sequences: computer science and computational biology},
  publisher = {Cambridge University Press},
  year = {1997},
  url = {http://books.google.com.co/books?hl=es&lr=&id=STGlsyqtjYMC&oi=fnd&pg=PR13&ots=khU9uuK5x4&sig=FgigZ_nhGkV3m36crViNQYD9n3M}
}
Gustavo Alonso, H. K. V. M. Web Services: Concepts, Architectures and Applications 2003   book  
Abstract: Like many other incipient technologies, Web services are still surrounded by a tremendous level of noise. This noise results from the always dangerous combination of wishful thinking on the part of research and industry and of a lack of clear understanding of how Web services came to be. On the one hand, multiple contradictory interpretations are created by the many attempts to realign existing technology and strategies with Web services. On the other hand, the emphasis on what could be done with Web services in the future often makes us lose track of what can be really done with Web services today and in the short term. These factors make it extremely difficult to get a coherent picture of what Web services are, what they contribute, and where they will be applied. Alonso and his co-authors deliberately take a step back. Based on their academic and industrial experience with middleware and enterprise application integration systems, they describe the fundamental concepts behind the notion of Web services and present them as the natural evolution of conventional middleware, necessary to meet the challenges of the Web and of B2B application integration. Rather than providing a reference guide or a "how to write your first Web service" kind of book, they discuss the main objectives of Web services, the challenges that must be faced to achieve them, and the opportunities that this novel technology provides. Established, as well as recently proposed, standards and techniques (e.g., WSDL, UDDI, SOAP, WS-Coordination, WS-Transactions, and BPEL), are then examined in the context of this discussion in order to emphasize their scope, benefits, and shortcomings. Thus, the book is ideallysuited both for professionals considering the development of application integration solutions and for research and students interesting in understanding and contributing to the evolution of enterprise application technologies.
BibTeX:
@book{Gustavo2003,
  author = {Gustavo Alonso, Fabio Casati, Harumi Kuno, Vijay Machiraju},
  title = {Web Services: Concepts, Architectures and Applications},
  publisher = {Springer},
  year = {2003}
}
Habegger, M. Web services for information extraction from the Web 2004 Web Services, 2004. Proceedings. IEEE International Conference on   article DOI  
Abstract: Extracting information from the Web is a complex task with different components which can either be generic or specific to the task, going from downloading a given page, following links, querying a Web-based applications via an HTML form and the HTTP protocol, querying a Web service via the SOAP protocol, etc. Therefore building Web services which proceed to executing an information tasks can not be simply hard coded (i.e. written and compiled once and for all in a given programming language). In order to be able to build flexible information extraction Web Services we need to be able to compose different sub tasks together. We propose a, XML-based language to describe information extraction Web services as the compositions of existing Web services and specific functions. The usefulness the proposed framework is demonstrated by three real world applications. (1) Search engines: we show how to describe a task which queries Google's Web service, retrieves more information on the results by querying their respective HTTP servers, and filters them according to this information. (2) E-commerce sites : an information extraction Web service giving access to an existing HTML-based e-commerce online application such as Amazon is built. (3) Patent extraction: a last example shows how to describe an information extraction Web service which allows to query a Web-based application, extract the set of result links, follow them, and extract the needed information on the result pages. In all three applications the generated description can be easily modified and completed to further respond the user's needs and create value-added Web services.
BibTeX:
@article{Habegger2004,
  author = {Habegger, B. Quafafou, M.},
  title = {Web services for information extraction from the Web},
  journal = {Web Services, 2004. Proceedings. IEEE International Conference on},
  year = {2004},
  pages = {279 - 286},
  doi = {http://dx.doi.org/10.1109/ICWS.2004.1314749}
}
Hall, P. A. V. & Dowling, G. R. Approximate string matching 1980 ACM Computing Surveys   article URL  
Abstract: Approximate matching of strings is reviewed with the aim of surveying techniques suitable for finding an item in a database when there may be a spelling mistake or other error in the keyword. The methods found are classified as either equivalence or similarity problems. Equivalence problems are seen to be readily solved using canonical forms. For sinuiarity problems difference measures are surveyed, with a full description of the wellestablmhed dynamic programming method relating this to the approach using probabilities and likelihoods. Searches for approximate matches in large sets using a difference function are seen to be an open problem still, though several promising ideas have been suggested. Approximate matching (error correction) during parsing is briefly reviewed.
BibTeX:
@article{HD80,
  author = {P. A. V. Hall and G. R. Dowling},
  title = {Approximate string matching},
  journal = {ACM Computing Surveys},
  year = {1980},
  volume = {12},
  number = {4},
  pages = {381--402},
  url = {http://portal.acm.org/citation.cfm?id=356830&dl=GUIDE,}
}
He, D. Using Suffix Tree to Discover Complex Repetitive Patterns in DNA Sequences 2006 Engineering in Medicine and Biology Society, 2006. EMBS '06. 28th Annual International Conference of the IEEE   inproceedings DOIURL  
Abstract: The discovery of repetitive patterns is a fundamental problem in bioinformatics. It remains a challenging open problem because most of the existing methods, such as using annotated repeat database and extracting pairs of maximum repeated regions, can not give a correct definition incorporating both the length and frequency factors of the repetitive patterns. There is an algorithm considering both the pattern length and frequency. However, it could only find the simple "elementary" repeats and is not able to reveal the complex structure of the repetitive patterns. Furthermore, its time complexity O(n2f), where n is the length of the sequence, f is the minimum frequency requirement, could be still too high for long DNA sequences. In this paper, we propose a novel algorithm using suffix tree to reveal the complex structure of the repetitive patterns in DNA sequences. We show that our algorithm achieves an O(n2f2 ) time complexity
BibTeX:
@inproceedings{He2006,
  author = {Dan He},
  title = {Using Suffix Tree to Discover Complex Repetitive Patterns in DNA Sequences},
  booktitle = {Engineering in Medicine and Biology Society, 2006. EMBS '06. 28th Annual International Conference of the IEEE},
  year = {2006},
  pages = {3474--3477},
  url = {http://ieeexplore.ieee.org/search/wrapper.jsp?arnumber=4029323},
  doi = {http://dx.doi.org/10.1109/IEMBS.2006.260445}
}
Herbordt, J., Model, J., Gu, Y., Sukhwani, B. & VanCourt, T. Single Pass, BLAST-Like, Approximate String Matching on FPGAs 2006 Field-Programmable Custom Computing Machines, 2006. FCCM '06. 14th Annual IEEE Symposium on   inproceedings DOIURL  
Abstract: Approximate string matching is fundamental to bioinformatics, and has been the subject of numerous FPGA acceleration studies. We address issues with respect to FPGA implementations of both BLAST- and dynamic programming- (DP) based methods. Our primary contributions are two new algorithms for emulating the seeding and extension phases of BLAST. These operate in a single pass through a database at streaming rate (110 Maa/sec on a VP70 for query sizes up to 600 and 170 Maa/sec on a Virtex4 for query sizes up to 1024), and with no preprocessing other than loading the query string. Further, they use very high sensitivity with no slowdown. While current DP-based methods also operate at streaming rate, generating results can be cumbersome. We address this with a new structure for data extraction. We present results from several implementations
BibTeX:
@inproceedings{Herbordt2006,
  author = {Herbordt, J.C. and Model, J. and Yongfeng Gu and Sukhwani, B. and VanCourt, T.},
  title = {Single Pass, BLAST-Like, Approximate String Matching on FPGAs},
  booktitle = {Field-Programmable Custom Computing Machines, 2006. FCCM '06. 14th Annual IEEE Symposium on},
  year = {2006},
  pages = {217--226},
  url = {http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4020910},
  doi = {http://dx.doi.org/10.1109/FCCM.2006.64}
}
de la Higuera, C. & Casacuberta, F. Topology of strings: Median string is NP-complete 2000 Theoretical Computer Science   article URL  
Abstract: Given a set of strings, the problem of finding a string that minimises its distance to the set is directly related with problems frequently encountered in areas involving Pattern Recognition or Computational Biology. Based on the Levenshtein (or edit) distance, different definitions of distances between a string and a set of strings can be adopted. In particular, if this definition is the sum of the distances to each string of the set
BibTeX:
@article{higuera00topology,
  author = {C. de la Higuera and F. Casacuberta},
  title = {Topology of strings: {Median} string is {NP}-complete},
  journal = {Theoretical Computer Science},
  year = {2000},
  volume = {230},
  number = {1--2},
  pages = {39--48},
  url = {citeseer.ist.psu.edu/delahiguera00topology.html}
}
Holsti, N. & Sutinen, E. Approximate string matching using $q$-gram places 1994 Proc. 7th Finnish Symposium on Computer Science   inproceedings URL  
BibTeX:
@inproceedings{HS94,
  author = {N. Holsti and E. Sutinen},
  title = {Approximate string matching using $q$-gram places},
  booktitle = {Proc. 7th Finnish Symposium on Computer Science},
  year = {1994},
  pages = {23--32},
  url = {http://www.citeulike.org/user/romano/article/558631}
}
Holub, J. Dynamic programming for reduced NFAs for approximate string and sequence matching 1998 Kybernetika, 2002   inproceedings URL  
Abstract: Approximate string and sequence matching is a problem of searching for all occurrences of a pattern (string or sequence) in some text, where the pattern can occur with some limited number of errors given by edit distance.Several methods were designed for the approximate string matching that simulate nondeterministic finite automata (NFA) constructed for this problem. This paper presents reduced NFAs for the approximate string matching usable in case, when we are interested only in occurrences having edit distance less than or equal to a given integer, but we are not interested in exact edit distance of each found occurrence. Then an algorithm based on the dynamic programming that simulates these reduced NFAs is presented. It is also presented how to use this algorithm for the approximate sequence matching.
BibTeX:
@inproceedings{Hol98b,
  author = {J. Holub},
  title = {Dynamic programming for reduced {NFAs} for approximate string and sequence matching},
  booktitle = {Kybernetika, 2002},
  year = {1998},
  pages = {73--82},
  note = {Collaborative Report DC--98--06},
  url = {http://cat.inist.fr/?aModele=afficheN&cpsidt=13867022}
}
Holub, J. Simulation of NFA in Approximate String and Sequence Matching 1997 Proceedings of the Prague Stringology Club Workshop 97   inproceedings URL  
Abstract: We present detailed description of simulation of nondeterministic finite automata (NFA) for approximate string matching. This simulation uses bit parallelism and used algorithm is called Shift-Or algorithm. Using knowledge of simulation of NFA by Shift-Or algorithm we design modification of Shift-Or algorithm for approximate string matching using generalized Levenshtein distance and modification for exact and approximate sequence matching.
BibTeX:
@inproceedings{Hol97,
  author = {J. Holub},
  title = {Simulation of {NFA} in Approximate String and Sequence Matching},
  booktitle = {Proceedings of the Prague Stringology Club Workshop 97},
  year = {1997},
  pages = {39--46},
  note = {Collaborative Report DC--97--03},
  url = {http://www.stringology.org/event/1997/p5.html}
}
Holub, J. Reduced nondeterministic finite automata for approximate string matching 1996 The Prague Stringology Club Workshop '96   inproceedings URL  
Abstract: We will show how to reduce the number of states of nondeterministic finite automata for approximate string matching with k mismatches and nondeterministic finite automata for approximate string matching with k differences in the case when we do not need to know how many mismatches or differences are in the found string. Also we will show impact of this reduction on Shift-Or based algorithms.
BibTeX:
@inproceedings{Hol96a,
  author = {J. Holub},
  title = {Reduced nondeterministic finite automata for approximate string matching},
  booktitle = {The Prague Stringology Club Workshop '96},
  year = {1996},
  pages = {19--27},
  note = {Collaborative Report DC--96--10},
  url = {http://www.stringology.org/event/1996/p3.html}
}
Holub, J. Approximate string matching in text 1996 School: Faculty of Electrical Engineering, Czech Technical University, Prague, Czech Republic   mastersthesis  
BibTeX:
@mastersthesis{Hol96b,
  author = {J. Holub},
  title = {Approximate string matching in text},
  school = {Faculty of Electrical Engineering, Czech Technical University, Prague, Czech Republic},
  year = {1996}
}
Holub, J. & Melichar, B. Approximate string matching using factor automata 2000 Theoretical Computer Science   article URL  
Abstract: Given a text T over alphabet ? and a complete index for T constructed using the finite automaton (called the factor automaton or DA WG) accepting all the substrings (factors) of text T. An answer to the query whether a pattern P occurs in text T with k differences is discussed to be done by an algorithm having the time complexity independent on the length of text T. The algorithm searches for the final state of the finite automaton accepting the intersection of languages Fac(T) (the set of all factors of T) and L[k](P) (the set of all strings having at most k differences with respect to pattern P).
BibTeX:
@article{HM2000,
  author = {J. Holub and B. Melichar},
  title = {Approximate string matching using factor automata},
  journal = {Theoretical Computer Science},
  year = {2000},
  volume = {249},
  number = {2},
  pages = {305--311},
  url = {http://portal.acm.org/citation.cfm?id=358331.358342}
}
Holub, J. & Melichar, B. Implementation of Nondeterministic Finite Automata for Approximate Pattern Matching 1999 Lecture notes in computer science   article URL  
Abstract: There are two ways of using the nondeterministic finite automata ( NFA). The first one is the transformation to the equivalent deterministic finite automaton and the second one is the simulation of the run of NFA. In this paper we discuss the second way. We present an overview of the simulation methods that have been found in the approximate string matching. We generalize these simulation methods and form the rules for the usage of these methods.
BibTeX:
@article{HM98b,
  author = {J. Holub and B. Melichar},
  title = {Implementation of Nondeterministic Finite Automata for Approximate Pattern Matching},
  journal = {Lecture notes in computer science},
  publisher = {#svb#},
  year = {1999},
  volume = {1},
  number = {1660},
  pages = {92--99},
  url = {http://www.springerlink.com/content/p7gbrfhebgyqrnaq/}
}
Hu, W., Schmalz, M. & Ritter, G. Image retrieval using the longest approximate common subsequences 1999 Multimedia Computing and Systems, 1999. IEEE International Conference on   inproceedings DOI  
BibTeX:
@inproceedings{Hu1999,
  author = {Wen-Chen Hu and Schmalz, M.S. and Ritter, G.X.},
  title = {Image retrieval using the longest approximate common subsequences},
  booktitle = {Multimedia Computing and Systems, 1999. IEEE International Conference on},
  year = {1999},
  volume = {2},
  pages = {730--734vol.2},
  doi = {http://dx.doi.org/10.1109/MMCS.1999.778575}
}
Hwang, E. L. C. C. C. On Composing a Reliable Composite Web Service: A Study of Dynamic Web Service Selection 2007 Web Services, 2007. ICWS 2007. IEEE International Conference on   article DOI  
Abstract: Dynamic web service selection refers to determining a subset of component web services to be invoked so as to orchestrate a composite web service. Previous work in web service selection usually assumes the invocations of web service operations to be independent of one another. This assumption however does not hold in practice as both the composite and component web services often impose some orderings on the invocation of their operations. Such orderings constrain the selection of component web services to orchestrate the composite web service. We therefore propose to use finite state machine (FSM) to model the invocation order of web service operations. We define a measure, called aggregated reliability, to measure the probability that a given state in the composite web service will lead to successful execution in the context where each component web service may fail with some probability. We show that the computation of aggregated reliabilities is equivalent to eigenvector computation. The power method is hence adopted to efficiently derive aggregated reliabilities. In orchestrating a composite web service, we propose two strategies to select component web services that are likely to successfully complete the execution of a given sequence of operations. Our experiments on a synthetically generated set of web service operation execution sequences show that our proposed strategies perform better than the baseline random selection strategy.
BibTeX:
@article{Hwang2007,
  author = {Hwang, San-Yih; Lim, Ee-Peng; Lee, Chien-Hsiang; Chen, Cheng-Hung},
  title = {On Composing a Reliable Composite Web Service: A Study of Dynamic Web Service Selection},
  journal = {Web Services, 2007. ICWS 2007. IEEE International Conference on},
  year = {2007},
  pages = {184-191},
  doi = {http://www.sinab.unal.edu.co:2365/iel5/4279552/4279553/04279598.pdf?tp=\&arnumber=4279598\&isnumber=4279553}
}
Hyyro, H. On using two-phase filtering in indexed approximate string matching with application to searching unique oligonucleotides 2001 String Processing and Information Retrieval, 2001. SPIRE 2001. Proceedings.Eighth International Symposium on   inproceedings URL  
Abstract: We discuss using an indexing scheme to accelerate approximate search over a static text in the case of using unit cost edit distance as the measure of similarity between strings. First we generally consider the filtering criteria that can be used as a basis for the index, and then propose using filtering twice before the final checking phase. The last part consists of presenting an indexed approximate string matching application in bioinformatics, which is the search of unique oligonucleotides. We present practical comparisons and results for using different filtering schemes in this application. Our tests have involved a total of 15 different genomes, from which we present some results involving the largest two of these: The genome of Saccharomyces

erevisiae (baker's yeast) and a recent draft of the human genome, the latter being also the main target of the application.

BibTeX:
@inproceedings{Hyyro2001,
  author = {Hyyro, H.},
  title = {On using two-phase filtering in indexed approximate string matching with application to searching unique oligonucleotides},
  booktitle = {String Processing and Information Retrieval, 2001. SPIRE 2001. Proceedings.Eighth International Symposium on},
  year = {2001},
  pages = {84--95},
  url = {http://citeseer.ist.psu.edu/532133.html}
}
Ian H. Witten, E. F. Data Mining: Practical Machine Learning Tools and Techniques 2005   book  
Abstract: As with any burgeoning technology that enjoys commercial attention, the use of data mining is surrounded by a great deal of hype. Exaggerated reports tell of secrets that can be uncovered by setting algorithms loose on oceans of data. But there is no magic in machine learning, no hidden power, no alchemy. Instead there is an identifiable body of practical techniques that can extract useful information from raw data. This book describes these techniques and shows how they work. The book is a major revision of the first edition that appeared in 1999. While the basic core remains the same, it has been updated to reflect the changes that have taken place over five years, and now has nearly double the references. The highlights for the new edition include thirty new technique sections; an enhanced Weka machine learning workbench, which now features an interactive interface; comprehensive information on neural networks; a new section on Bayesian networks; plus much more.* Algorithmic methods at the heart of successful data miningincluding tried and true techniques as well as leading edge methods* Performance improvement techniques that work by transforming the input or output* Downloadable Weka, a collection of machine learning algorithms for data mining tasks, including tools for data pre-processing, classification, regression, clustering, association rules, and visualizationin a new, interactive interface
BibTeX:
@book{Ian2005,
  author = {Ian H. Witten, Eibe
Frank},
  title = {Data Mining: Practical Machine Learning Tools and Techniques},
  publisher = {Academic Press},
  year = {2005},
  pages = {525}
}
Idury, R. & Schaffer, A. Multiple matching of parameterized patterns 1996 Theoretical computer science   article URL  
Abstract: We extend Baker's theory of parameterized pattern matching [Proc. 25th Annual STOC, 1993, pp. 71–80] to algorithms that match multiple patterns in a text. We first consider the case where the patterns are fixed and preprocessed once, and then the case where the pattern set can change by insertions and deletions. Baker's algorithms are based on suffix trees, whereas ours are based on pattern matching automata.
BibTeX:
@article{IS96,
  author = {Idury, R.M. and Schaffer, A.A.},
  title = {Multiple matching of parameterized patterns},
  journal = {Theoretical computer science},
  year = {1996},
  volume = {154},
  number = {2},
  pages = {203--224},
  url = {http://www.springerlink.com/content/p385102m21022625/}
}
Itoh, H. & Tanaka, H. An efficient method for in memory construction of suffix arrays 1999 String Processing and Information Retrieval Symposium, 1999 and International Workshop on Groupware   inproceedings DOIURL  
Abstract: The suffix array is a string-indexing structure and a memory efficient alternative to the suffix tree. It has many advantages for text processing. We propose an efficient algorithm for sorting suffixes. We call this algorithm the two-stage suffix sort. One of our ideas is to exploit the specific relationships between adjacent suffixes. Our algorithm makes it possible to use the suffix array for much larger texts and suggests new areas of application. Our experiments on several text data sets (including 514-MB Japanese newspapers) demonstrate that our algorithm is 4.5 to 6.9 times faster than Quicksort, and 2.5 to 3.6 times faster than K. Sadakane's (1998) algorithm, which is considered to be the fastest algorithm in previous work
BibTeX:
@inproceedings{Itoh1999,
  author = {Itoh, H. and Tanaka, H.},
  title = {An efficient method for in memory construction of suffix arrays},
  booktitle = {String Processing and Information Retrieval Symposium, 1999 and International Workshop on Groupware},
  year = {1999},
  pages = {81--88},
  url = {http://ieeexplore.ieee.org/search/wrapper.jsp?arnumber=796581},
  doi = {http://dx.doi.org/10.1109/SPIRE.1999.796581}
}
Iverson, W. Real World Web Services 2004   book  
Abstract: The core idea behind Real World Web Services is simple: after years of hype, what are the major players really doing with web services? Standard bodies may wrangle and platform vendors may preach, but at the end of the day what are the technologies that are actually in use, and how can developers incorporate them into their own applications? Those are the answers Real World Web Services delivers. It's a field guide to the wild and wooly world of non-trivial deployed web services. The heart of the book is a series of projects, demonstrating the use and integration of Google, Amazon, eBay, PayPal, FedEx, and many more web services. Some of these vendors have been extremely successful with their web service deployments: for example, eBay processes over a billion web service requests a month! The author focuses on building 8 fully worked out example web applications that incorporate the best web services available today. The book thoroughly documents how to add functionality like automating listings for auctions, dynamically calculating shipping fees, automatically sending faxes to your suppliers, using an aggregator to pull data from multiple news and web service feeds into a single format or monitoring the latest weblog discussions and Google searches to keep web site visitors on top of topics of interest-by integrating APIs from popular websites most people are already familiar with. For each example application, the author provides a thorough overview, architecture, and full working code examples. This book doesn't engage in an intellectual debate as to the correctness of web services on a theological level. Instead, it focuses on the practical, real world usage of web services as the latest evolution in distributed computing, allowing for structured communication via Internet protocols. As you ll see, this includes everything from sending HTTP GET commands to retrieving an XML document through the use of SOAP and various vendor SDKs.
BibTeX:
@book{,
  author = {Will Iverson},
  title = {Real World Web Services},
  publisher = {O'Reilly},
  year = {2004},
  pages = {207}
}
James Snell, P. K. Programming Web Services with SOAP 2002   book  
Abstract: The web services architecture provides a new way to think about and implement application-to-application integration and interoperability that makes the development platform irrelevant. Two applications, regardless of operating system, programming language, or any other technical implementation detail, communicate using XML messages over open Internet protocols such as HTTP or SMTP. The Simple Open Access Protocol (SOAP) is a specification that details how to encode that information and has become the messaging protocol of choice for Web services. "Programming Web Services with SOAP" is a detailed guide to using SOAP and other leading web services standards--WSDL (Web Service Description Language), and UDDI (Universal Description, Discovery, and Integration protocol). You'll learn the concepts of the web services architecture and get practical advice on building and deploying web services in the enterprise. This authoritative book decodes the standards, explaining the concepts and implementation in a clear, concise style. You'll also learn about the major toolkits for building and deploying web services. Examples in Java, Perl, C#, and Visual Basic illustrate the principles. Significant applications developed using Java and Perl on the Apache Tomcat web platform address real issues such as security, debugging, and interoperability. Covered topic areas include: The Web Services Architecture SOAP envelopes, headers, and encodings WSDL and UDDI Writing web services with Apache SOAP and Java Writing web services with Perl's SOAP:: Lite Peer-to-peer (P2P) web services Enterprise issues such as authentication, security, and identity Up-and-comingstandards projects for web services "Programming Web Services with SOAP" provides you with all the information on the standards, protocols, and toolkits you'll need to integrate information services with SOAP. You'll find a solid core of information that will help you develop individual Web services or discover new ways to integrate core business processes across an enterprise.
BibTeX:
@book{,
  author = {James Snell, Doug
Tidwell, Pavel
Kulchenko},
  title = {Programming Web Services with SOAP},
  publisher = {O'Reilly},
  year = {2002},
  pages = {244}
}
Jewell, D. C. T. Java Web Services 2002   book  
BibTeX:
@book{,
  author = {David Chappell Tyler Jewell},
  title = {Java Web Services},
  publisher = {O'Really},
  year = {2002},
  pages = {276}
}
Jiawei Han, M. K. Data Mining: Concepts and Techniques 2001   book  
BibTeX:
@book{,
  author = {Jiawei Han, Micheline Kamber},
  title = {Data Mining: Concepts and Techniques},
  publisher = {Morgan Kaufmann},
  year = {2001},
  pages = {550}
}
Johnson, D. A priority queue in which initialization and queue operations takeO (loglogD) time 1981 Theory of Computing Systems   article URL  
Abstract: Many computer algorithms have embedded in them a subalgorithm called a priority queue which produces on demand an element of extreme priority among elements in the queue. Queues on unrestricted priority domains have a running time of THgr(nlogn) for sequences ofn queue operations. We describe a simple priority queue over the priority domain 1,ctdot,N in which initialization, insertion, and deletion takeO(loglogD) time, whereD is the difference between the next lowest and next highest priority elements in the queue. In the case of initialization,D=THgr(N). Finding a least element, greatest element, and the neighbor in priority order of some specified element take constant time. We also consider dynamic space allocation for the data structures used. Space can be allocated in blocks of size THgr(N 1/p ), for small integerp.
BibTeX:
@article{johnson1981pqi,
  author = {Johnson, D.B.},
  title = {A priority queue in which initialization and queue operations takeO (loglogD) time},
  journal = {Theory of Computing Systems},
  publisher = {Springer},
  year = {1981},
  volume = {15},
  number = {1},
  pages = {295--309},
  url = {http://www.springerlink.com/content/v6t081t8r711x438/}
}
Jokinen, P., Tarhio, J. & Ukkonen, E. A comparison of approximate string matching algorithms 1996 Software Practice and Experience   article URL  
Abstract: Experimental comparisons of the running time of approximate string matching algorithms for the k differences problem are presented. Given a pattern string, a text string, and an integer k, the task is to find all approximate occurrences of the pattern in the text with at most k differences (insertions, deletions, changes). We consider seven algorithms based on different approaches including dynamic programming, Boyer-Moore string matching, suffix automata, and the distribution of characters. It turns out that none of the algorithms is the best for all values of the problem parameters, and the speed differences between the methods can be considerable.
BibTeX:
@article{JTU96,
  author = {P. Jokinen and J. Tarhio and E. Ukkonen},
  title = {A comparison of approximate string matching algorithms},
  journal = {Software Practice and Experience},
  year = {1996},
  volume = {26},
  number = {12},
  pages = {1439--1458},
  url = {http://portal.acm.org/citation.cfm?id=246475&dl=ACM&coll=portal}
}
Jokinen, P. & Ukkonen, E. Two algorithms for approximate string matching in static texts 1991 Proceedings Mathematical Foundations of Computer Science 1991   inproceedings URL  
Abstract: The problem of finding all approximate occurrences P~ of a pattern string P in a text string T such that the edit distance between P and pr is _< k is considered. We concentrate on a scheme in which T is first preprocessed to make the subsequent searches with different P fast. Two preprocessing methods and the corresponding search algorithms are described. The first is based suffix automata and is applicable for edit distances with general edit operation costs. The second is a special design for unit cost edit distance and is based on q-gram lists. The preprocessing needs in both cases time and space O(IT]). The search algorithms run in the worst case in time O(IPtlTI) or O(ktTt), and in the best case in time O(IPI).
BibTeX:
@inproceedings{JU91,
  author = {P. Jokinen and E. Ukkonen},
  title = {Two algorithms for approximate string matching in static texts},
  booktitle = {Proceedings Mathematical Foundations of Computer Science 1991},
  publisher = {#svb#},
  year = {1991},
  number = {520},
  pages = {240--248},
  url = {http://www.cs.helsinki.fi/u/ukkonen/MFCS91.pdf}
}
Karkkainen, J., Navarro, G. & Ukkonen, E. Approximate string matching on Ziv-Lempel compressed text 2003 Journal of Discrete Algorithms   article URL  
Abstract: We present the first nontrivial algorithm for approximate pattern matching on compressed text. The format we choose is the Ziv-Lempel family. Given a text of length u compressed into length n, and a pattern of length m, we report all the R occurrences of the pattern in the text allowing up to k insertions, deletions and substitutions. On LZ78/LZW we need O(mkn + R) time in the worst case and O(k2n + mk min(n, (m?)k) + R) on average where ? is the alphabet size. The experimental results show a practical speedup over the basic approach of up to 2X for moderate m and small k. We extend the algorithms to more general compression formats and approximate matching models.
BibTeX:
@article{karkkainen2003asm,
  author = {Karkkainen, J. and Navarro, G. and Ukkonen, E.},
  title = {{Approximate string matching on Ziv-Lempel compressed text}},
  journal = {Journal of Discrete Algorithms},
  publisher = {Elsevier},
  year = {2003},
  volume = {1},
  number = {3-4},
  pages = {313--338},
  url = {http://portal.acm.org/citation.cfm?id=973441}
}
Karloff, H. Fast algorithms for approximately counting mismatches 1993 Information Processing Letters   article URL  
Abstract: Given a text string T ? ?[n] and a pattern string P ? ?[m], for each i = 1, 2,..., n ? m + 1 define f(i) to be the number of mismatches when the pattern is aligned below the text and starts in position i of the text. The fastest known algorithm to compute all the f(i)'s when the alphabet is arbitrary has worst-case time exceeding n?m. For any ? > 0, we give simple randomized and deterministic algorithms that compute g(i) such that f(i) ? g(i) ? f(i)(1 + ?) for all i. The algorithms run in time O((n/?[2]) log[c]m) for a small universal constant c and are easy reductions from the case of arbitrary ? to the case of ? = 0, 1
BibTeX:
@article{Kar93,
  author = {H. Karloff},
  title = {Fast algorithms for approximately counting mismatches},
  journal = {Information Processing Letters},
  year = {1993},
  volume = {48},
  number = {2},
  pages = {53--60},
  url = {http://portal.acm.org/citation.cfm?id=171839.171842}
}
Kim, D. K., Lee, J. S., Park, K. & Cho, Y. Efficient Algorithms for Approximate String Matching with Swaps 1999 Journal of Complexity   article URL  
Abstract: Most research on the edit distance problem and the k-differences problem considered the set of edit operations consisting of changes, insertions, and deletions. In this paper we include the swap operation that interchanges two adjacent characters into the set of allowable edit operations, and we present an O(t min(m, n))-time algorithm for the extended edit distance problem, where t is the edit distance between the given strings, and an O(kn)-time algorithm for the extended k-differences problem. That is, we add swaps into the set of edit operations without increasing the time complexities of previous algorithms that consider only changes, insertions, and deletions for the edit distance and k-differences problems.
BibTeX:
@article{KLPC99,
  author = {D. K. Kim and J. S. Lee and K. Park and Y. Cho},
  title = {Efficient Algorithms for Approximate String Matching with Swaps},
  journal = {Journal of Complexity},
  year = {1999},
  volume = {15},
  pages = {128--147},
  url = {http://portal.acm.org/citation.cfm?id=311299}
}
Kim, J. & Shawe-Taylor, J. An approximate string-matching algorithm 1992 Theoretical Computer Science   article URL  
BibTeX:
@article{KST92,
  author = {J.-Y. Kim and J. Shawe-Taylor},
  title = {An approximate string-matching algorithm},
  journal = {Theoretical Computer Science},
  year = {1992},
  volume = {92},
  number = {1},
  pages = {107--117},
  url = {http://portal.acm.org/citation.cfm?id=135900}
}
Knight, J. R. & Myers, E. W. Approximate regular expression pattern matching with concave gap penalties 1995 Algorithmica   article URL  
Abstract: Given a sequenceA of lengthM and a regular expressionR of lengthP, an approximate regular expression pattern-matching algorithm computes the score of the optimal alignment betweenA and one of the sequencesB exactly matched byR. An alignment between sequencesA=a1a2 ... aM andB=b1b2... bN is a list of ordered pairs, lang(i1,j1), (i2j2), ..., (it,jtt)rang such that ik < ik+1 and jk < jk+1. In this case the alignmentaligns symbols aik and bjk, and leaves blocks of unaligned symbols, orgaps, between them. A scoring schemeS associates costs for each aligned symbol pair and each gap. The alignment's score is the sum of the associated costs, and an optimal alignment is one of minimal score. There are a variety of schemes for scoring alignments. In a concave gap penalty scoring schemeS=delta, w, a function delta(a, b) gives the score of each aligned pair of symbolsa andb, and aconcave function w(k) gives the score of a gap of lengthk. A function w is concave if and only if it has the property that, for allk > 1, w(k + 1) –w(k) lew(k) –w(k –1). In this paper we present an O(MP(logM + log2 P)) algorithm for approximate regular expression matching for an arbitrarydelta and any concavew.
BibTeX:
@article{KM95b,
  author = {J. R. Knight and E. W. Myers},
  title = {Approximate regular expression pattern matching with concave gap penalties},
  journal = {Algorithmica},
  year = {1995},
  volume = {14},
  number = {1},
  pages = {85--121},
  url = {http://www.springerlink.com/content/l5r8735mu11mp216/}
}
Koschke, R., Falke, R. & Frenzel, P. Clone Detection Using Abstract Syntax Suffix Trees 2006 Reverse Engineering, 2006. WCRE '06. 13th Working Conference on   inproceedings DOIURL  
Abstract: Reusing software through copying and pasting is a continuous plague in software development despite the fact that it creates serious maintenance problems. Various techniques have been proposed to find duplicated redundant code (also known as software clones). A recent study has compared these techniques and shown that token-based clone detection based on suffix trees is extremely fast but yields clone candidates that are often no syntactic units. Current techniques based on abstract syntax trees-on the other hand-find syntactic clones but are considerably less efficient. This paper describes how we can make use of suffix trees to find clones in abstract syntax trees. This new approach is able to find syntactic clones in linear time and space. The paper reports the results of several large case studies in which we empirically compare the new technique to other techniques using the Bellon benchmark for clone detectors
BibTeX:
@inproceedings{Koschke2006,
  author = {Koschke, R. and Falke, R. and Frenzel, P.},
  title = {Clone Detection Using Abstract Syntax Suffix Trees},
  booktitle = {Reverse Engineering, 2006. WCRE '06. 13th Working Conference on},
  year = {2006},
  pages = {253--262},
  url = {http://portal.acm.org/citation.cfm?id=1174733},
  doi = {http://dx.doi.org/10.1109/WCRE.2006.18}
}
Krishnamoorthy, K., Maruvada, S. & Balasa, F. Fast evaluation of symmetric-feasible sequence-pairs for analog topological placement 2003 ASIC, 2003. Proceedings. 5th International Conference on   inproceedings DOI  
BibTeX:
@inproceedings{Krishnamoorthy2003,
  author = {Krishnamoorthy, K. and Maruvada, S.C. and Balasa, F.},
  title = {Fast evaluation of symmetric-feasible sequence-pairs for analog topological placement},
  booktitle = {ASIC, 2003. Proceedings. 5th International Conference on},
  year = {2003},
  volume = {1},
  pages = {71--74Vol.1},
  doi = {http://dx.doi.org/10.1109/ICASIC.2003.1277493}
}
Kulchenko, D. T. J. S. P. Programming Web Services with SOAP 2001   book  
BibTeX:
@book{Kulchenko2001,
  author = {Doug Tidwell James Snell Pavel Kulchenko},
  title = {Programming Web Services with SOAP},
  publisher = {O'Really},
  year = {2001},
  pages = {216}
}
Kurtz, S. Approximate string searching under weighted edit distance 1996 Proceedings of the 3rd South American Workshop on String Processing   inproceedings URL  
Abstract: Let p 2 Sigma be a string of length m and t 2 Sigma be a string of length n. The approximate string searching problem is to find all approximate matches of p in t having weighted edit distance at most k from p. We present a new method that preprocesses the pattern into a DFA which scans t online in linear time, thereby recognizing all positions in t where an approximate match ends. We show how to reduce the exponential preprocessing effort and propose two practical algorithms
BibTeX:
@inproceedings{kurtz96approximate,
  author = {S. Kurtz},
  title = {Approximate string searching under weighted edit distance},
  booktitle = {Proceedings of the 3rd South American Workshop on String Processing},
  publisher = {Carleton University Press},
  year = {1996},
  pages = {156--170},
  url = {http://citeseer.ist.psu.edu/kurtz96approximate.html}
}
Landau, G. M. & Vishkin, U. Approximate string searching 1997 Pattern matching algorithms   incollection  
BibTeX:
@incollection{LV97,
  author = {G. M. Landau and U. Vishkin},
  title = {Approximate string searching},
  booktitle = {Pattern matching algorithms},
  publisher = {#oxup#},
  year = {1997},
  pages = {185--199}
}
Landau, G. M. & Vishkin, U. Fast parallel and serial approximate string matching 1989 Journal of Algorithms   article URL  
Abstract: Consider the string matching problem, where differences between characters of the pattern and characters of the text are allowed. Each difference is due to either a mismatch between a character of the text and a character of the pattern, or a superfluous character in the text, or a superfluous character in the pattern. Given a text of length n, a pattern of length m and an integer k, serial and parallel algorithms for finding all occurrences of the pattern in the text with at most k differences are presented. For completeness we also describe an efficient algorithm for preprocessing a rooted tree, so that queries requesting the lowest common ancestor of every pair of verticies in the tree can be processed quickly.
BibTeX:
@article{LV89,
  author = {G. M. Landau and U. Vishkin},
  title = {Fast parallel and serial approximate string matching},
  journal = {Journal of Algorithms},
  year = {1989},
  volume = {10},
  number = {2},
  pages = {157--169},
  url = {http://portal.acm.org/citation.cfm?id=67206}
}
Landau, G. M. & Vishkin, U. Efficient parallel and serial approximate string matching 1986   techreport URL  
Abstract: Consider the string matching problem, where differences between characters of the pattern and characters of the text are allowed. Each difference is due to either a mismatch between a character of the text and a character of the pattern or a superfluous character in the text or a superfluous character in the pattern. Given a text of length n, a pattern of length m and an integer k, we present parallel and serial algorithms for finding all occurrences of the pattern in the text with at most k differences. The first part of the parallel algorithm consists of analysis of the pattern and takes O(log m) time using m 2 processors. The rest of the algorithm consists of handling the text. The text handling part applies the following new approach. This part starts by obtaining a concise characterization of the text which is based solely on substrings of the pattern in O(log m) time using n / log m processors.
BibTeX:
@techreport{LV86b,
  author = {G. M. Landau and U. Vishkin},
  title = {Efficient parallel and serial approximate string matching},
  year = {1986},
  number = {TR-221-U101},
  url = {http://citeseer.ist.psu.edu/landau86efficient.html}
}
Landau, G. M. & Vishkin, U. Introducing efficient parallelism into approximate string matching and a new serial algorithm 1986 Proceedings of the eighteenth annual ACM symposium on Theory of computing   inproceedings URL  
BibTeX:
@inproceedings{LV86c,
  author = {G. M. Landau and U. Vishkin},
  title = {Introducing efficient parallelism into approximate string matching and a new serial algorithm},
  booktitle = {Proceedings of the eighteenth annual ACM symposium on Theory of computing},
  publisher = {#acmp#},
  year = {1986},
  pages = {220--230},
  url = {http://portal.acm.org/citation.cfm?id=12152&dl=}
}
LAROSE, D. T. Data Mining Methods and Models 2006   book  
BibTeX:
@book{LAROSE2006,
  author = {DANIEL T. LAROSE},
  title = {Data Mining Methods and Models},
  publisher = {John Wiley \& Sons, Inc.},
  year = {2006},
  pages = {322}
}
Lee, H. & Ercal, F. RMESH algorithms for parallel string matching 1997 Parallel Architectures, Algorithms, and Networks, 1997. (I-SPAN '97) Proceedings. Third International Symposium on   inproceedings DOI  
BibTeX:
@inproceedings{Lee1997,
  author = {Hsi-Chieh Lee and Ercal, F.},
  title = {RMESH algorithms for parallel string matching},
  booktitle = {Parallel Architectures, Algorithms, and Networks, 1997. (I-SPAN '97) Proceedings. Third International Symposium on},
  year = {1997},
  pages = {223--226},
  doi = {http://dx.doi.org/10.1109/ISPAN.1997.645099}
}
Lee, K. K. & Xu, Y. Boundary modeling in human walking trajectory analysis for surveillance 2004 Robotics and Automation, 2004. Proceedings. ICRA '04. 2004 IEEE International Conference on   inproceedings DOI  
BibTeX:
@inproceedings{Lee2004,
  author = {Ka Keung Lee and Yangsheng Xu},
  title = {Boundary modeling in human walking trajectory analysis for surveillance},
  booktitle = {Robotics and Automation, 2004. Proceedings. ICRA '04. 2004 IEEE International Conference on},
  year = {2004},
  volume = {5},
  pages = {5201--5205Vol.5},
  doi = {http://dx.doi.org/10.1109/ROBOT.2004.1302543}
}
Li, B., Deng, X., Golin, M. J. & Sohraby, K. On the Optimal Placement of Web Proxies in the Internet: The Linear Topology 1998 HPN   inproceedings URL  
Abstract: Let x denote a given nonempty string of length n = jxj. A substring u of x is a cover of x if and only if every position of x lies within an occurrence of u within x. This paper extends the work of Moore and Smyth on computing the covers of a string: our algorithm computes all the covers of every prefix of x in time n). 1 INTRODUCTION Let x denote a nonempty string of length n 1.
BibTeX:
@inproceedings{li98optimal,
  author = {Bo Li and Xin Deng and Mordecai J. Golin and Kazem Sohraby},
  title = {On the Optimal Placement of Web Proxies in the Internet: The Linear Topology},
  booktitle = {{HPN}},
  year = {1998},
  pages = {485-495},
  url = {citeseer.ist.psu.edu/article/li98optimal.html}
}
Lin, H., Lu, M. & Fang, J. An optimal algorithm for the longest common subsequence problem 1991 Parallel and Distributed Processing, 1991. Proceedings of the Third IEEE Symposium on   inproceedings DOI  
BibTeX:
@inproceedings{Lin1991,
  author = {Lin, H. and Lu, M. and Fang, J.},
  title = {An optimal algorithm for the longest common subsequence problem},
  booktitle = {Parallel and Distributed Processing, 1991. Proceedings of the Third IEEE Symposium on},
  year = {1991},
  pages = {630--639},
  doi = {http://dx.doi.org/10.1109/SPDP.1991.218203}
}
Lin, Y. A new systolic algorithm for computing longest common subsequences 1992 TENCON '92. Technology Enabling Tomorrow : Computers, Communications and Automation towards the 21st Century. 1992 IEEE Region 10 International Conference.   inproceedings DOI  
BibTeX:
@inproceedings{Lin1992,
  author = {Lin, Y.-C.},
  title = {A new systolic algorithm for computing longest common subsequences},
  booktitle = {TENCON '92. Technology Enabling Tomorrow : Computers, Communications and Automation towards the 21st Century. 1992 IEEE Region 10 International Conference.},
  year = {1992},
  pages = {126--130vol.1},
  doi = {http://dx.doi.org/10.1109/TENCON.1992.271970}
}
Liu, H., Ma, Z., Zhang, L. & Shao, W. Detecting Duplications in Sequence Diagrams Based on Suffix Trees 2006 Software Engineering Conference, 2006. APSEC 2006. 13th Asia Pacific   inproceedings DOIURL  
Abstract: With the popularity of UML and MDA, models are re- placing source code as core artifacts of software devel- opment and maintenance. But duplications in models re- duce modelsýý maintainability and reusability. To address the problem, we should detect duplications first. As an ini- tial step to address the problem, we propose an approach to detect duplications in sequence diagrams. With special pre- processing, we convert 2-dimensional sequence diagrams into a 1-dimensional array. Then we construct a suffix tree of the array. We revise the traditional construction algo- rithm of suffix trees by proposing a special algorithm to detect common prefixes of suffixes. The algorithm ensures that every duplication detected with the suffix tree can be extracted into a separate reusable sequence diagram. With the suffix tree, duplications are found as refactoring candi- dates. With tool support, the proposed approach has been applied to real industrial projects, and the evaluation re- sults suggest that the approach is effective.
BibTeX:
@inproceedings{Liu2006,
  author = {Liu, Hui and Ma, Zhiyi and Zhang, Lu and Shao, Weizhong},
  title = {Detecting Duplications in Sequence Diagrams Based on Suffix Trees},
  booktitle = {Software Engineering Conference, 2006. APSEC 2006. 13th Asia Pacific},
  year = {2006},
  pages = {269--276},
  url = {http://ieeexplore.ieee.org/search/wrapper.jsp?arnumber=4137427},
  doi = {http://dx.doi.org/10.1109/APSEC.2006.32}
}
Liu, J. C. L. Design and Implementation of an Extended UDDI Registration Center for Web Service Graph 2007 Web Services, 2007. ICWS 2007. IEEE International Conference on   article DOI  
Abstract: UDDI registration center provides a set of management mechanism for Web services providers to publish their Web service and for Web service consumers to inquire what they needs. It solves the problem of Web service description, discovery and Integration. Web services graph is the semantic index established on the UDDI registration center according to the logical relations between web services [1,2]. It can enhance the Web service discovery efficiency. jUDDI is an open source Java implementation of UDDI specification for Web Services. Based on the research on UDDI and Web service graph, this article takes jUDDI as an example to show how the traditional UDDI registry center can be extended so as to supporting Web service graph.
BibTeX:
@article{Liu2007,
  author = {Liu, Jianxun Liu, Jie Chao, Lian},
  title = {Design and Implementation of an Extended UDDI Registration Center for Web Service Graph},
  journal = {Web Services, 2007. ICWS 2007. IEEE International Conference on},
  year = {2007},
  pages = {1174 - 1175},
  doi = {http://dx.doi.org/10.1109/ICWS.2007.74}
}
Liu, W. & Chen, L. A Parallel Algorithm for Solving LCS of Multiple Bioseqences 2006 Machine Learning and Cybernetics, 2006 International Conference on   inproceedings DOI  
BibTeX:
@inproceedings{Liu2006a,
  author = {Wei Liu and Ling Chen},
  title = {A Parallel Algorithm for Solving LCS of Multiple Bioseqences},
  booktitle = {Machine Learning and Cybernetics, 2006 International Conference on},
  year = {2006},
  pages = {4316--4321},
  doi = {http://dx.doi.org/10.1109/ICMLC.2006.259020}
}
Liu, W., Chen, Y., Chen, L. & Qin, L. A Fast Parallel Longest Common Subsequence Algorithm Based on Pruning Rules 2006 Computer and Computational Sciences, 2006. IMSCCS '06. First International Multi-Symposiums on   inproceedings DOI  
BibTeX:
@inproceedings{Liu2006,
  author = {Wei Liu and Yixin Chen and Ling Chen and Ling Qin},
  title = {A Fast Parallel Longest Common Subsequence Algorithm Based on Pruning Rules},
  booktitle = {Computer and Computational Sciences, 2006. IMSCCS '06. First International Multi-Symposiums on},
  year = {2006},
  volume = {1},
  pages = {27--34},
  doi = {http://dx.doi.org/10.1109/IMSCCS.2006.6}
}
Lodi, E. A Parallel Solution to the Approximate String Matching Problem 1992 The Computer Journal   article URL  
Abstract: The approximate string matching problem (ASMP) consists of finding all the occurrences of a string of characters X of length m in another string Y of length n, m<
BibTeX:
@article{Lod92,
  author = {E. Lodi},
  title = {A Parallel Solution to the Approximate String Matching Problem},
  journal = {The Computer Journal},
  year = {1992},
  volume = {35},
  number = {5},
  pages = {524--526},
  url = {http://portal.acm.org/citation.cfm?id=146121}
}
Lopresti, D. Models and algorithms for duplicate document detection 1999 Document Analysis and Recognition, 1999. ICDAR '99. Proceedings of the Fifth International Conference on   inproceedings DOI  
BibTeX:
@inproceedings{Lopresti1999a,
  author = {Lopresti, D.P.},
  title = {Models and algorithms for duplicate document detection},
  booktitle = {Document Analysis and Recognition, 1999. ICDAR '99. Proceedings of the Fifth International Conference on},
  year = {1999},
  pages = {297--300},
  doi = {http://dx.doi.org/10.1109/ICDAR.1999.791783}
}
Lopresti, D. Robust retrieval of noisy text 1996 Research and Technology Advances in Digital Libraries, 1996. ADL '96., Proceedings of the Third Forum on   inproceedings DOI  
BibTeX:
@inproceedings{Lopresti1996,
  author = {Lopresti, D.P.},
  title = {Robust retrieval of noisy text},
  booktitle = {Research and Technology Advances in Digital Libraries, 1996. ADL '96., Proceedings of the Third Forum on},
  year = {1996},
  pages = {76--85},
  doi = {http://dx.doi.org/10.1109/ADL.1996.502518}
}
Lopresti, D. & Wilfong, G. Cross-domain approximate string matching 1999 String Processing and Information Retrieval Symposium, 1999 and International Workshop on Groupware   inproceedings DOI  
BibTeX:
@inproceedings{Lopresti1999,
  author = {Lopresti, D. and Wilfong, G.},
  title = {Cross-domain approximate string matching},
  booktitle = {String Processing and Information Retrieval Symposium, 1999 and International Workshop on Groupware},
  year = {1999},
  pages = {120--127},
  doi = {http://dx.doi.org/10.1109/SPIRE.1999.796586}
}
Luce, G. & Myoupo, J. An efficient linear systolic algorithm for recovering longest common subsequences 1995 Algorithms and Architectures for Parallel Processing, 1995. ICAPP 95. IEEE First ICA/sup 3/PP. IEEE First International Conference on   inproceedings DOI  
BibTeX:
@inproceedings{Luce1995,
  author = {Luce, G. and Myoupo, J.F.},
  title = {An efficient linear systolic algorithm for recovering longest common subsequences},
  booktitle = {Algorithms and Architectures for Parallel Processing, 1995. ICAPP 95. IEEE First ICA/sup 3/PP. IEEE First International Conference on},
  year = {1995},
  volume = {1},
  pages = {20--29vol.1},
  doi = {http://dx.doi.org/10.1109/ICAPP.1995.472166}
}
Luczak, T. & Szpankowski, W. A Suboptimal Lossy Data Compression Based on Approximate Pattern Matching 1997 Information Theory, IEEE Transactions on   article URL  
Abstract: A practical suboptimal (variable source coding) algorithm for lossy data compression is presented. This scheme is based on approximate string matching, and it naturally extends the lossless Lempel-Ziv (1977) data compression scheme. Among others we consider the typical length of an approximately repeated pattern within the first n positions of a stationary mixing sequence where D percent of mismatches is allowed. We prove that there exists a constant r0(D) such that the length of such an approximately repeated pattern converges in probability to 1/r0(D) log n (pr.) but it almost surely oscillates between 1/r-?(D) log n and 2/r1(D) log n, where r -?(D)>r0(D)>r1(D)/2 are some constants. These constants are natural generalizations of Renyi entropies to the lossy environment. More importantly, we show that the compression ratio of a lossy data compression scheme based on such an approximate pattern matching is asymptotically equal to r0(D). We also establish the asymptotic behavior of the so-called approximate waiting time Nl which is defined as the time until a pattern of length C repeats approximately for the first time. We prove that log Nl/l?r0(D) (pr.) as l??. In general, r0(D)>R(D) where R(D) is the rate distortion function. Thus for stationary mixing sequences we settle in the negative the problem investigated by Steinberg and Gutman by showing that a lossy extension of the Wyner-Ziv (1989) scheme cannot be optimal
BibTeX:
@article{LS97,
  author = {T. Luczak and W. Szpankowski},
  title = {A Suboptimal Lossy Data Compression Based on Approximate Pattern Matching},
  journal = {Information Theory, IEEE Transactions on},
  year = {1997},
  volume = {43},
  pages = {1439--1451},
  url = {http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=623143}
}
Luczak, T. & Szpankowski, W. A lossy data compression based on an approximate pattern matching 1995 Information Theory, 1995. Proceedings., 1995 IEEE International Symposium on   inproceedings DOI  
BibTeX:
@inproceedings{Luczak1995,
  author = {Luczak, T. and Szpankowski, W.},
  title = {A lossy data compression based on an approximate pattern matching},
  booktitle = {Information Theory, 1995. Proceedings., 1995 IEEE International Symposium on},
  year = {1995},
  pages = {80},
  doi = {http://dx.doi.org/10.1109/ISIT.1995.531182}
}
Luo, Q. Advancing Knowledge Discovery and Data Mining 2008 Knowledge Discovery and Data Mining, 2008. WKDD 2008. International Workshop on   article DOI  
Abstract: Knowledge discovery and data mining have become areas of growing significance because of the recent increasing demand for KDD techniques, including those used in machine learning, databases, statistics, knowledge acquisition, data visualization, and high performance computing. Knowledge discovery and data mining can be extremely beneficial for the field of Artificial Intelligence in many areas, such as industry, commerce, government, education and so on. The relation between Knowledge and Data Mining, and Knowledge Discovery in Database (KDD) process are presented in the paper. Data mining theory, Data mining tasks, Data Mining technology and Data Mining challenges are also proposed. This is an belief abstract for an invited talk at the workshop.
BibTeX:
@article{Luo2008,
  author = {Luo, Qi},
  title = {Advancing Knowledge Discovery and Data Mining},
  journal = {Knowledge Discovery and Data Mining, 2008. WKDD 2008. International Workshop on},
  year = {2008},
  pages = {3 - 5},
  doi = {http://dx.doi.org/10.1109/WKDD.2008.153}
}
Magnani, M. & Montesi, D. Integration of Patent and Company Databases 2007 Database Engineering and Applications Symposium, 2007. IDEAS 2007. 11th International   inproceedings DOI  
BibTeX:
@inproceedings{Magnani2007,
  author = {Magnani, Matteo and Montesi, Danilo},
  title = {Integration of Patent and Company Databases},
  booktitle = {Database Engineering and Applications Symposium, 2007. IDEAS 2007. 11th International},
  year = {2007},
  pages = {163--171},
  doi = {http://dx.doi.org/10.1109/IDEAS.2007.4318101}
}
Makinen, V., Navarro, G. & Ukkonen, E. Transposition invariant string matching 2005 Journal of Algorithms   article URL  
Abstract: Given strings A = a1a2...am and B = b1b2...bn over an alphabet ? ? U, where U is some numerical universe closed under addition and subtraction, and a distance function d(A, B) that gives the score of the best (partial) matching of A and B, the transposition invariant distance is mint?Ud(A + t, B), where A + t = (a1 + t)(a2 + t)...(am + t). We study the problem of computing the transposition invariant distance for various distance (and similarity) functions d, including Hamming distance, longest common sabsequence (LCS), Levenshtein distance, and their versions where the exact matching condition is replaced by an approximate one. For all these problems we give algorithms whose time complexities are close to the known upper bounds without transposition invariance, and for some we achieve these upper bounds. In particular, we show how sparse dynamic programming can be used to solve transposition invariant problems, and its connection with multidimensional range-minimum search. As a byproduct, we give improved sparse dynamic programming algorithms to compute LCS and Levenshtein distance.
BibTeX:
@article{makinen2005tis,
  author = {Makinen, V. and Navarro, G. and Ukkonen, E.},
  title = {Transposition invariant string matching},
  journal = {Journal of Algorithms},
  publisher = {Academic Press, Inc. Duluth, MN, USA},
  year = {2005},
  volume = {56},
  number = {2},
  pages = {124--153},
  url = {http://portal.acm.org/citation.cfm?id=1096305.1096308}
}
Manber, U. & Myers, G. Suffix arrays: a new method for on-line string searches 1993 SIAM Journal on ComputingSIAM Journal on Computing   article URL  
Abstract: A new and conceptually simple data structure, called a sufsuc array, for on-line string searches is introduced in this paper. Constructing and querying suffix arrays is reduced to a sort and search paradigm that employs novel aIgorithms. The main advantage of suffix arrays over suffix trees is that they are three to five times more space efficient. Suffix arrays permit on-line string searches of the type, “Is W a substring of A?” to be answered in time 0 (Z’ +- IogN), where P is the length of W and N is the length of A, which is competitive with (and in some cases slightly better than) suffix trees. The only drawback is that in those instances where the underlying alphabet is finite and small, suffix trees can be constructed in 0 (N) time in the worst-case versus 0 (NlogN) time for suffix arrays. We show, however, that suffix arrays can be constructed in 0 (N) expected time, regardless of the alphabet size. We believe that suffix arrays will prove to be better

in practice than suffix trees for many applications.

BibTeX:
@article{MM93,
  author = {U. Manber and G. Myers},
  title = {Suffix arrays: a new method for on-line string searches},
  journal = {SIAM Journal on ComputingSIAM Journal on Computing},
  year = {1993},
  volume = {22},
  number = {5},
  pages = {935--948},
  url = {http://portal.acm.org/citation.cfm?id=159918.159921}
}
Manber, U. & Wu, S. Approximate string-matching with arbitrary costs for text and hypertext 1992 Proc. of the IAPR International Workshop on Structural and Syntactic Pattern Recognition   inproceedings  
BibTeX:
@inproceedings{MW92,
  author = {U. Manber and S. Wu},
  title = {Approximate string-matching with arbitrary costs for text and hypertext},
  booktitle = {Proc. of the IAPR International Workshop on Structural and Syntactic Pattern Recognition},
  year = {1992},
  pages = {22--33}
}
Manber, U. & Wu, S. Some assembly required: Approximate pattern matching 1992 Byte Magazine   article URL  
BibTeX:
@article{MW92b,
  author = {U. Manber and S. Wu},
  title = {Some assembly required: Approximate pattern matching},
  journal = {Byte Magazine},
  year = {1992},
  volume = {17},
  number = {12},
  pages = {281--292},
  url = {http://www.math.utah.edu/pub/tex/bib/toc/byte1990.html}
}
Manber, U. & Wu, S. An algorithm for approximate string matching with non uniform costs 1989   techreport URL  
BibTeX:
@techreport{MW89,
  author = {U. Manber and S. Wu},
  title = {An algorithm for approximate string matching with non uniform costs},
  year = {1989},
  number = {TR-89-19},
  url = {http://www.amazon.com/algorithm-approximate-string-matching-uniform/dp/B00071Y0M6}
}
Marshall, K. Web Services on Rails 2006   book  
BibTeX:
@book{Marshall2006,
  author = {Kevin Marshall},
  title = {Web Services on Rails},
  publisher = {O'Reilly},
  year = {2006},
  pages = {32}
}
Martinek, T., Fucik, O., Beck, P. & Lexa, M. Automatic generation of circuits for approximate string matching 2007 Design and Diagnostics of Electronic Circuits and Systems, 2007. DDECS '07. IEEE   inproceedings DOI  
BibTeX:
@inproceedings{Martinek2007,
  author = {Martinek, Tomas and Fucik, Otto and Beck, Patrik and Lexa, Matej},
  title = {Automatic generation of circuits for approximate string matching},
  booktitle = {Design and Diagnostics of Electronic Circuits and Systems, 2007. DDECS '07. IEEE},
  year = {2007},
  pages = {1--6},
  doi = {http://dx.doi.org/10.1109/DDECS.2007.4295281}
}
Martinek, T., Korenek, J., Fucik, O. & Lexa, M. A flexible technique for the automatic design of approximate string matching architectures 2006 Design and Diagnostics of Electronic Circuits and systems, 2006 IEEE   inproceedings  
BibTeX:
@inproceedings{Martinek2006,
  author = {Martinek, T. and Korenek, J. and Fucik, O. and Lexa, M.},
  title = {A flexible technique for the automatic design of approximate string matching architectures},
  booktitle = {Design and Diagnostics of Electronic Circuits and systems, 2006 IEEE},
  year = {2006},
  pages = {81--82}
}
Matsumoto, T., Kida, T., Takeda, M., Shinohara, A. & Arikawa, S. Bit-parallel approach to approximate string matching in compressed texts 2000 String Processing and Information Retrieval, 2000. SPIRE 2000. Proceedings. Seventh International Symposium on   inproceedings DOI  
Abstract: Addresses the problem of approximate string matching on compressed text. We consider this problem for a text string described in terms of a collage system, which is a formal system proposed by T. Kida et al. (1999) that captures various dictionary-based compression methods. We present an algorithm that exploits bit-parallelism, assuming that our problem fits in a single machine word, e.g. (m-k)(k+1)/spl les/L, where m is the pattern length, k is the number of allowed errors and L is the length, in bits, of the machine word. For a class of simple collage systems, the algorithm runs in O(k/sup 2/(/spl par//spl Dscr//spl par/+|/spl Sscr/|)+km) time using O(k/sup 2//spl par//spl Dscr//spl par/) space, where /spl par//spl Dscr//spl par/ is the size of dictionary /spl Dscr/ and |/spl Sscr/| is the number of tokens in the sequence /spl Sscr/. The LZ78 (Lempel-Ziv, 1978) and the LZW (Lempel-Ziv-Welch, 1984) compression methods are covered by this class. Since we can regard n=/spl par//spl Dscr//spl par/+|/spl Sscr/| as the compressed length, the time and space complexities are O(k/sup 2/n+km) and O(k/sup 2/n), respectively. For general k and m, they become O(k/sup 3/mn/L+km) and O(k/sup 3/mn/L). Thus, our algorithm is competitive to the algorithm proposed by J. Ka/spl uml/rkka/spl uml/inen, et al. (2000), which runs in O(km) time using O(kmn) space, when k=O(/spl radic/L).
BibTeX:
@inproceedings{Matsumoto2000,
  author = {Matsumoto, T. and Kida, T. and Takeda, M. and Shinohara, A. and Arikawa, S.},
  title = {Bit-parallel approach to approximate string matching in compressed texts},
  booktitle = {String Processing and Information Retrieval, 2000. SPIRE 2000. Proceedings. Seventh International Symposium on},
  year = {2000},
  pages = {221--228},
  doi = {http://dx.doi.org/10.1109/SPIRE.2000.878198}
}
Melichar, B. Space complexity of linear time approximate string matching 1996 The Prague Stringology Club Workshop '96   inproceedings URL  
Abstract: Approximate string matching is a sequential problem and therefore it is possible to solve it using finite automata. Nondeterministic finite automata are constructed for string matching with k mismatches and k differences. The corresponding deterministic finite automata are base for approximate string matching in linear time. Then the space complexity of both types of deterministic automata is calculated. Moreover, reduced versions of nondeterministic automata are taken into account and the space complexity of their deterministic equivalents is calculated.
BibTeX:
@inproceedings{Mel96a,
  author = {B. Melichar},
  title = {Space complexity of linear time approximate string matching},
  booktitle = {The Prague Stringology Club Workshop '96},
  year = {1996},
  pages = {28--36},
  note = {Collaborative Report DC--96--10},
  url = {http://www.stringology.org/event/1996/p4.html}
}
Melichar, B. String matching with k differences by finite automata 1996 Pattern Recognition, 1996., Proceedings of the 13th International Conference on   inproceedings DOI  
BibTeX:
@inproceedings{Melichar1996,
  author = {Melichar, B.},
  title = {String matching with k differences by finite automata},
  booktitle = {Pattern Recognition, 1996., Proceedings of the 13th International Conference on},
  year = {1996},
  volume = {2},
  pages = {256--260vol.2},
  doi = {http://dx.doi.org/10.1109/ICPR.1996.546828}
}
Melichar, B. Approximate String Matching by Finite Automata 1995 Proceedings of the 6th International Conference on Computer Analysis of Images and Patterns   inproceedings URL  
Abstract: Approximate string matching is a sequential problem and therefore it is possible to solve it using finite automata. A nondeterministic finite automaton is constructed for string matching with k mismatches. It is shown, how ldquodynamic programmingrdquo and ldquoshift- andrdquo based algorithms simulate this nondeterministic finite automaton. The corresponding deterministic finite automaton have $$O(m^k + 1 )$$ states, where m is the length of the pattern and k is the number of mismatches. The time complexity of algorithms based on such deterministic finite automaton is $$O(n)$$ , where n is the length of text.
BibTeX:
@inproceedings{Mel95,
  author = {B. Melichar},
  title = {Approximate String Matching by Finite Automata},
  booktitle = {Proceedings of the 6th International Conference on Computer Analysis of Images and Patterns},
  publisher = {#svb#},
  year = {1995},
  number = {970},
  pages = {342--349},
  url = {http://www.springerlink.com/content/p5005q851n58472m/}
}
Michailidis, P. & Margaritis, K. Implementation of a programmable array processor architecture for approximate string matching algorithms on FPGAs 2006 Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International   inproceedings DOI  
BibTeX:
@inproceedings{Michailidis2006,
  author = {Michailidis, P.D. and Margaritis, K.G.},
  title = {Implementation of a programmable array processor architecture for approximate string matching algorithms on FPGAs},
  booktitle = {Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International},
  year = {2006},
  pages = {4pp.},
  doi = {http://dx.doi.org/10.1109/IPDPS.2006.1639474}
}
Michailidis, P. & Margaritis, K. A programmable array processor architecture for flexible approximate string matching algorithms 2005 Parallel Processing, 2005. ICPP 2005 Workshops. International Conference Workshops on   inproceedings DOI  
BibTeX:
@inproceedings{Michailidis2005,
  author = {Michailidis, P.D. and Margaritis, K.G.},
  title = {A programmable array processor architecture for flexible approximate string matching algorithms},
  booktitle = {Parallel Processing, 2005. ICPP 2005 Workshops. International Conference Workshops on},
  year = {2005},
  pages = {201--209},
  doi = {http://dx.doi.org/10.1109/ICPPW.2005.15}
}
Miyake, S., Tahsato, Y. & Matsuda, H. An application of a pathway alignment method to comparative analysis between genome and pathways 2002 Bioinformatics Conference, 2002. Proceedings. IEEE Computer Society   inproceedings DOI  
BibTeX:
@inproceedings{Miyake2002,
  author = {Miyake, S. and Tahsato, Y. and Matsuda, H.},
  title = {An application of a pathway alignment method to comparative analysis between genome and pathways},
  booktitle = {Bioinformatics Conference, 2002. Proceedings. IEEE Computer Society},
  year = {2002},
  pages = {329},
  doi = {http://dx.doi.org/10.1109/CSB.2002.1039356}
}
Modha, D. Codelet parsing: quadratic-time, sequential, adaptive algorithms for lossy compression 2003 Data Compression Conference, 2003. Proceedings. DCC 2003   inproceedings DOI  
BibTeX:
@inproceedings{Modha2003,
  author = {Modha, D.S.},
  title = {Codelet parsing: quadratic-time, sequential, adaptive algorithms for lossy compression},
  booktitle = {Data Compression Conference, 2003. Proceedings. DCC 2003},
  year = {2003},
  pages = {223--232},
  doi = {http://dx.doi.org/10.1109/DCC.2003.1194013}
}
Monostori, K., Zaslavsky, A. & Schmidt, H. Efficiency of data structures for detecting overlaps in digital documents 2001 Computer Science Conference, 2001. ACSC 2001. Proceedings. 24th Australasian   inproceedings DOIURL  
Abstract: This paper analyses the efficiency of different data structures for detecting overlap in digital documents. Most existing approaches use some hash function to reduce the space requirements for their indices of chunks. Since a hash function can produce the same value for different chunks, false matches are possible. In this paper we propose an algorithm that can be used for eliminating those false matches. This algorithm uses a suffix tree structure, which is space consuming. We define a modified suffix tree that only considers chunks starting at the beginning of words and we show how the algorithm can work on this structure. We can alternatively reduce space requirements of a suffix tree by converting it to a directed acyclic graph. We show that suffix link information can be preserved in this new structure and the matching statistics algorithm still works with those modifications that we propose.
BibTeX:
@inproceedings{Monostori2001,
  author = {Monostori, K. and Zaslavsky, A. and Schmidt, H.},
  title = {Efficiency of data structures for detecting overlaps in digital documents},
  booktitle = {Computer Science Conference, 2001. ACSC 2001. Proceedings. 24th Australasian},
  year = {2001},
  pages = {140--147},
  url = {http://ieeexplore.ieee.org/search/wrapper.jsp?arnumber=906635},
  doi = {http://dx.doi.org/10.1109/ACSC.2001.906635}
}
Mori, T. & Stolfo, S. J. Approximate String Matching on the DADO2 Parallel Computer 1988   techreport  
BibTeX:
@techreport{MS88,
  author = {T. Mori and S. J. Stolfo},
  title = {Approximate String Matching on the {DADO}2 Parallel Computer},
  year = {1988},
  number = {CUCS-361-88}
}
Mukherjee, A. Determining longest common subsequences of two sequences on a linear array of processors 1992 Application Specific Array Processors, 1992. Proceedings of the International Conference on   inproceedings DOI  
BibTeX:
@inproceedings{Mukherjee1992,
  author = {Mukherjee, A.},
  title = {Determining longest common subsequences of two sequences on a linear array of processors},
  booktitle = {Application Specific Array Processors, 1992. Proceedings of the International Conference on},
  year = {1992},
  pages = {526--537},
  doi = {http://dx.doi.org/10.1109/ASAP.1992.218547}
}
Myers, G. A fast bit-vector algorithm for approximate string matching based on dynamic programming 1999 Journal of the ACM (JACM)   article URL  
Abstract: The approximate string matching problem is to find all locations at which a query of lengthm matches a substring of a text of length n with k-or-fewer differences. Simple and practical bit-vector algorithms have been designed for this problem, most notably the one used in agrep. These algorithms compute a bit representation of the current state-set of the k-difference automaton for the query, and asymptotically run in either O(nm/w) or O(nm log &sgr;/w) time where w is the word size of the machine (e.g., 32 or 64 in practice), and &sgr; is the size of the pattern alphabet. Here we present an algorithm of comparable simplicity that requires only O(nm/w) time by virtue of computing a bit representation of the relocatable dynamic programming matrix for the problem. Thus, the algorithm's performance is independent of k, and it is found to be more efficient than the previous results for many choices of k and smallm. Moreover, because the algorithm is not dependent on k, it can be used to rapidly compute blocks of the dynamic programming matrix as in the 4-Russians algorithm of Wu et al.(1996). This gives rise to an O(kn/w) expected-time algorithm for the case where m may be arbitrarily large. In practice this new algorithm, that computes a region of the dynamic progr amming (d.p.) matrx w entries at a time using the basic algorithm as a subroutine is significantly faster than our previous 4-Russians algorithm, that computes the same region 4 or 5 entries at a time using table lookup. This performance improvement yields a code that is either superior or competitive with all existing algorithms except for some filtration algorithms that are superior when k/m is sufficiently small.
BibTeX:
@article{Mye99,
  author = {G. Myers},
  title = {A fast bit-vector algorithm for approximate string matching based on dynamic programming},
  journal = {Journal of the ACM (JACM)},
  year = {1999},
  volume = {46},
  number = {3},
  pages = {395--415},
  url = {http://portal.acm.org/citation.cfm?id=316550}
}
Mäkinen, V., Navarro, G. & Ukkonen, E. Matching Numeric Strings under Noise   misc URL  
Abstract: Numeric string is a sequence of symbols from an alphabet Sigma U, where U is some numerical universe closed under addition and subtraction.
BibTeX:
@misc{makinen-matching,
  author = {Veli M{\"a}kinen and Gonzalo Navarro and Esko Ukkonen},
  title = {Matching Numeric Strings under Noise},
  url = {citeseer.ist.psu.edu/647848.html}
}
Mäkinen, V., Navarro, G. & Ukkonen, E. Transposition Invariant String Matching   misc URL  
Abstract: Given strings A and B over an alphabet &Sigma; &sube; U, where U is some numerical universe closed under addition and subtraction, and a distance function d(A, B) that gives the score of the best (partial) matching of A and B, the transposition invariant distance is min_t&isin;U d(A+t, B), where A+t = (a_1+t)(a_2+t)... (a_m+t).
BibTeX:
@misc{makinen-transposition,
  author = {Veli M{\"a}kinen and Gonzalo Navarro and Esko Ukkonen},
  title = {Transposition Invariant String Matching},
  url = {citeseer.ist.psu.edu/700489.html}
}
Nandan Babu, K. & Saxena, S. Parallel algorithms for the longest common subsequence problem 1997 High Performance Computing, 1997. Proceedings. Fourth International Conference on   inproceedings DOI  
BibTeX:
@inproceedings{Nandan1997,
  author = {Nandan Babu, K. and Saxena, S.},
  title = {Parallel algorithms for the longest common subsequence problem},
  booktitle = {High Performance Computing, 1997. Proceedings. Fourth International Conference on},
  year = {1997},
  pages = {120--125},
  doi = {http://dx.doi.org/10.1109/HIPC.1997.634481}
}
Nasser, S., Vert, G., Nicolescu, M. & Murray, A. Multiple Sequence Alignment using Fuzzy Logic 2007 Computational Intelligence and Bioinformatics and Computational Biology, 2007. CIBCB '07. IEEE Symposium on   inproceedings  
BibTeX:
@inproceedings{Nasser2007,
  author = {Nasser, S. and Vert, G.L. and Nicolescu, M. and Murray, A.},
  title = {Multiple Sequence Alignment using Fuzzy Logic},
  booktitle = {Computational Intelligence and Bioinformatics and Computational Biology, 2007. CIBCB '07. IEEE Symposium on},
  year = {2007},
  pages = {304--311}
}
Navarro, G. A guided tour to approximate string matching 2001 ACM Computing Surveys (CSUR)   article URL  
Abstract: We survey the current techniques to cope with the problem of string matching that allows errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms and their complexities. We present a number of experiments to compare the performance of the different algorithms and show which are the best choices. We conclude with some directions for future work and open problems.
BibTeX:
@article{navarro2001gta,
  author = {Navarro, G.},
  title = {A guided tour to approximate string matching},
  journal = {ACM Computing Surveys (CSUR)},
  publisher = {ACM Press New York, NY, USA},
  year = {2001},
  volume = {33},
  number = {1},
  pages = {31--88},
  url = {http://portal.acm.org/citation.cfm?id=375365}
}
Navarro, G. Improved Approximate Pattern Matching on Hypertext 2000 Theoretical Computer Science   article URL  
Abstract: The problem of approximate pattern matching on hypertext is defined and solved by Amir et al. in time, where m is the length of the pattern, n is the total text size and e is the total number of edges. Their space complexity is O(mn). We present a new algorithm which is O(mk(n+e)) time and needs only O(n) extra space, where k
BibTeX:
@article{Nav2000,
  author = {G. Navarro},
  title = {Improved Approximate Pattern Matching on Hypertext},
  journal = {Theoretical Computer Science},
  year = {2000},
  volume = {237},
  number = {1--2},
  pages = {455--463},
  url = {http://www.springerlink.com/content/r9dnj0c11v5r92hg/}
}
Navarro, G. Very fast and simple approximate string matching 1999 Information Processing Letters   article URL  
Abstract: We improve the fastest known algorithm for approximate string matching, which can be used only for low error levels. By using a new method to verify potential matches and a new optimization technique for biased texts (such as English), the algorithm also becomes the fastest one for medium error levels. This includes most of the interesting cases in this area.
BibTeX:
@article{Nav99,
  author = {G. Navarro},
  title = {Very fast and simple approximate string matching},
  journal = {Information Processing Letters},
  year = {1999},
  volume = {72},
  number = {1--2},
  pages = {65--70},
  url = {http://citeseer.ist.psu.edu/410789.html}
}
Navarro, G. A Partial Deterministic Automaton for Approximate String Matching 1997 Proceedings 4th South American Workshop on String Processing (WSP’97)   inproceedings URL  
Abstract: One of the simplest approaches to approximate string matching is to consider the associated non-deterministic finite automaton and make it deterministic. Besides automaton generation, the search time is O(n) in the worst case, where n is the text size. This solution is mentioned in the classical literature but has not been further pursued, due to the large number of automaton states that may be generated. We study the idea of generating the deterministic automaton on the fly. That is, we only generate the states that are actually reached when the text is traversed. We show that this limits drastically the number of states actually generated. Moreover, the algorithm is competitive, being the fastest one for intermediate error ratios and pattern lengths.
BibTeX:
@inproceedings{Nav97,
  author = {G. Navarro},
  title = {A Partial Deterministic Automaton for Approximate String Matching},
  booktitle = {Proceedings 4th South American Workshop on String Processing (WSP’97)},
  publisher = {#cup#},
  year = {1997},
  pages = {95--111},
  url = {http://citeseer.ist.psu.edu/148605.html}
}
Navarro, G. Multiple Approximate String Matching by Counting 1997 Proceedings of the 4th South American Workshop on String Processing   inproceedings URL  
Abstract: We present a very simple and efficient algorithm for online multiple approximate string matching. It uses a previously known counting-based filter [9] that searches for a single pattern by quickly discarding uninteresting parts of the text. Our multi-pattern algorithm is based on the simulation of many parallel filters using bits of the computer word. Our average complexity to search r patterns of length m is O(rn log m= log n), being n is the text size. We can search patterns of different length, each one with a different number of errors. We show experimentally that our algorithm is competitive with the fastest known algorithms, being the fastest for a wide range of intermediate error ratios. We give the first average-case analysis of the filtering efficiency

of the counting method, applicable also to [9].

BibTeX:
@inproceedings{Nav97b,
  author = {G. Navarro},
  title = {Multiple Approximate String Matching by Counting},
  booktitle = {Proceedings of the 4th South American Workshop on String Processing},
  publisher = {#cup#},
  year = {1997},
  pages = {95--111},
  url = {http://citeseer.ist.psu.edu/navarro97multiple.html}
}
Navarro, G. & Baeza-Yates, R. A Hybrid Indexing Method for Approximate String Matching 2000 Journal of Discrete Algorithms   article URL  
Abstract: We present a new indexing method for the approximate string matching problem. The method is based on a suffix array combined with a partitioning of the pattern. We analyze the resulting algorithm and show that the average retrieval time is O(n^lambda *log n), for some lambda>0 that depends on the error fraction tolerated alpha and the alphabet size sigma. It is shown that lambda<1 for approximately alpha < 1-e/sqrt(sigma), where e=2.718.... The space required is four times the text size, which is quite moderate for this problem. We experimentally show that this index can outperform by far all the existing alternatives for indexed approximate searching. These are also the first experiments that compare the different existing schemes.
BibTeX:
@article{NBY2000,
  author = {G. Navarro and R. Baeza-Yates},
  title = {A Hybrid Indexing Method for Approximate String Matching},
  journal = {Journal of Discrete Algorithms},
  year = {2000},
  volume = {1},
  number = {1},
  pages = {205--239},
  url = {http://citeseer.ist.psu.edu/455944.html}
}
Navarro, G. & Baeza-Yates, R. A New Indexing Method for Approximate String Matching 1999 10th Annual Symposium, CPM 99, Warwick University, UK, July 1999. Proceedings   inproceedings URL  
Abstract: We present a new indexing method for the approximate string matching problem. The method is based on a suffix tree combined with a partitioning of the pattern. We analyze the resulting algorithm and show that the retrieval time is, for , whenever , where is the error level tolerated and is the alphabet size. We experimentally show that this index outperforms by far all other algorithms for indexed approximate searching, also being the first experiments that compare the different existing schemes. We finally show how this index can be implemented using much less space.
BibTeX:
@inproceedings{NBY99a,
  author = {G. Navarro and R. Baeza-Yates},
  title = {A New Indexing Method for Approximate String Matching},
  booktitle = {10th Annual Symposium, CPM 99, Warwick University, UK, July 1999. Proceedings},
  publisher = {#svb#},
  year = {1999},
  number = {1645},
  pages = {163--185},
  url = {http://www.springerlink.com/content/pvbjvdf2nd30kk6n/}
}
Navarro, G. & Baeza-Yates, R. Fast Multi-Dimensional Approximate Pattern Matching 1999 Proceedings of the 10th Annual Symposium on Combinatorial Pattern Matching   inproceedings URL  
Abstract: We address the problem of approximate string matching in d dimensions, that is, to find a pattern of size md in a text of size nd with at most k < md errors (substitutions, insertions and deletions along any dimension). We use a novel and very flexible error model, for which there exists only an algorithm to evaluate the similarity between two elements in two dimensions at O(m4) time. We extend the algorithm to d dimensions, at O(d!m2d) time and O(d!m2d-1) space. We also give the first search algorithm for such model, which is O(d!mdnd) time and O(d!mdnd-1) space. We show how to reduce the space cost to O(d!3dm2d-1) with little time penalty. Finally, we present the first sublinear-time (on average) searching algorithm (i.e. not all text cells are inspected), which is O(knd/md-1) for , where is the alphabet size. After that error level the filter still remains better than dynamic programming for . These are the first search algorithms for the problem. As side-effects we extend to d dimensions an already proposed algorithm for two-dimensional exact string matching, and we obtain a sublinear-time filter to search in d dimensions allowing k mismatches.
BibTeX:
@inproceedings{NBY99b,
  author = {G. Navarro and R. Baeza-Yates},
  title = {Fast Multi-Dimensional Approximate Pattern Matching},
  booktitle = {Proceedings of the 10th Annual Symposium on Combinatorial Pattern Matching},
  publisher = {#svb#},
  year = {1999},
  number = {1645},
  pages = {243--257},
  url = {http://www.springerlink.com/content/qnjpl0whecahh24y/}
}
Navarro, G. & Baeza-Yates, R. Improving an algorithm for approximate pattern matching 1998 Algorithmica   article URL  
Abstract: We study a recent algorithm for fast on-line approximate string matching. This is the problem of searching a pattern in a text allowing errors in the pattern or in the text. The algorithm is based on a very fast kernel which is able to search short patterns using a nondeterministic finite automaton, which is simulated using bit-parallelism. A number of techniques to extend this kernel for longer patterns are presented in that work. However, the techniques can be integrated in many ways and the optimal interplay among them is by no means obvious.

The solution to this problem starts at a very low level, by obtaining basic probabilistic information about the problem which was not previously known, and ends integrating analytical results with empirical data to obtain the optimal heuristic. The conclusions obtained via analysis are experimentally confirmed. We also improve many of the techniques and obtain a combined heuristic which is faster than the original work.

This work shows an excellent example of a complex and theoretical analysis of algorithms used for design and for practical algorithm engineering, instead of the common practice of first designing an algorithm and then analyzing it.

BibTeX:
@article{NBY98,
  author = {G. Navarro and R. Baeza-Yates},
  title = {Improving an algorithm for approximate pattern matching},
  journal = {Algorithmica},
  year = {1998},
  volume = {1},
  number = {TR/DCC-98-5},
  pages = {1},
  url = {http://www.springerlink.com/content/jgycnkarave3hf2x/}
}
Navarro, G., Baeza-Yates, R., Sutinen, E. & Tarhio, J. Indexing Methods for Approximate String Matching 2001 IEEE Data Engineering Bulletin   article URL  
Abstract: Indexing for approximate text searching is a novel problem receiving much attention because of its applications in signal processing, computational biology and text retrieval, to name a few. We classify most indexing methods in a taxonomy that helps understand their essential features. We show that the existing methods, rather than completely different as they are regarded, form a range of solutions whose optimum is usually somewhere in between.
BibTeX:
@article{navarro2001ima,
  author = {Navarro, G. and Baeza-Yates, R.A. and Sutinen, E. and Tarhio, J.},
  title = {{Indexing Methods for Approximate String Matching}},
  journal = {IEEE Data Engineering Bulletin},
  year = {2001},
  volume = {24},
  pages = {19--27},
  url = {http://citeseer.ist.psu.edu/navarro00indexing.html}
}
Navarro, G., Kida, T., Takeda, M., Shinohara, A. & Arikawa, S. Faster Approximate String Matching over Compressed Text 2001 dcc   article DOIURL  
Abstract: Approximate string matching on compressed text was a problem open during almost a decade. The two existing solutions are very recent. Despite that they represent important complexity breakthroughs, in most practical cases they are not useful, in the sense that they are slower than uncompressing the text and then searching the uncompressed text. In this paper we present a different approach, which reduces the problem to multi-pattern searching of pattern pieces plus local decompression and direct verification of candidate text areas. We show experimentally that this solution is 10-30 times faster than previous work and up to three times faster than the trivial approach of uncompressing and searching, thus becoming the first practical solution to the problem.
BibTeX:
@article{10.1109/DCC.2001.917177,
  author = {Gonzalo Navarro and Takuya Kida and Masayuki Takeda and Ayumi Shinohara and Setsuo Arikawa},
  title = {Faster Approximate String Matching over Compressed Text},
  journal = {dcc},
  publisher = {IEEE Computer Society},
  year = {2001},
  volume = {00},
  pages = {0459},
  url = {http://csdl2.computer.org/persagen/DLAbsToc.jsp?resourcePath=/dl/proceedings/&toc=comp/proceedings/dcc/2001/1031/00/1031toc.xml&DOI=10.1109/DCC.2001.917177},
  doi = {http://doi.ieeecomputersociety.org/10.1109/DCC.2001.917177}
}
Navarro, G., Kida, T., Takeda, M., Shinohara, A. & Arikawa, S. Faster approximate string matching over compressed text 2001 Data Compression Conference, 2001. Proceedings. DCC 2001.   inproceedings DOI  
BibTeX:
@inproceedings{Navarro2001,
  author = {Navarro, G. and Kida, T. and Takeda, M. and Shinohara, A. and Arikawa, S.},
  title = {Faster approximate string matching over compressed text},
  booktitle = {Data Compression Conference, 2001. Proceedings. DCC 2001.},
  year = {2001},
  pages = {459--468},
  doi = {http://dx.doi.org/10.1109/DCC.2001.917177}
}
Navarro, G. & Raffinot, M. Flexible Pattern Matching in Strings: Practical On-Line Search Algorithms for Texts and Biological Sequences 2002   book DOIURL  
Abstract: This book presents a practical approach to string matching problems, focusing on the algorithms and implementations that perform best in practice. It covers searching for simple, multiple, and extended strings, as well as regular expressions, exactly and approximately. It includes all of the most significant new developments in complex pattern searching. The clear explanations, step-by-step examples, algorithms pseudo-code, and implementation efficiency maps will enable researchers, professionals, and students in bioinformatics, computer science, and software engineering to choose the most appropriate algorithms for their applications.
BibTeX:
@book{navarro2002fpm,
  author = {Navarro, G. and Raffinot, M.},
  title = {Flexible Pattern Matching in Strings: Practical On-Line Search Algorithms for Texts and Biological Sequences},
  publisher = {Cambridge University Press},
  year = {2002},
  url = {http://books.google.com/books?hl=en&lr=&id=GTTPGJctsMAC&oi=fnd&pg=PA1&ots=N2Asgdp2gl&sig=SH5A0hJlmL4lNyO9GtzuA35ICec#PPP1,M1},
  doi = {http://www3.cambridge.org/uk/catalogue/catalogue.asp?isbn=9780521813075}
}
Navarro, G. & Raffinot, M. Flexible Pattern Matching in Strings -- Practical on-line search algorithms for texts and biological sequences 2002   book URL  
BibTeX:
@book{NRbook02,
  author = {Gonzalo Navarro and Mathieu Raffinot},
  title = {Flexible Pattern Matching in Strings -- Practical on-line search algorithms for texts and biological sequences},
  publisher = {Cambridge University Press},
  year = {2002},
  note = {ISBN 0-521-81307-7. 280 pages.},
  url = {http://www.amazon.com/exec/obidos/tg/detail/-/0521813077/qid=1029796460/sr=8-1/ref=sr_8_1/104-5627967-1826302?s=books&n=507846}
}
Navarro, G. & Raffinot, M. Fast and simple character classes and bounded gaps pattern matching, with application to protein searching 2001 Proceedings of the fifth annual international conference on Computational biology   article URL  
Abstract: The problem of fast searching of a pattern that contains Classes of characters and Bounded size Gaps (CBG) in a text has a wide range of applications, among which a very important one is protein pattern matching (for instance, one PROSITE protein site is associated with the CBG [RK] — x(2, 3) — [DE] — x(2, 3) — Y, where the brackets match any of the letters inside, and x(2, 3) a gap of length between 2 and 3). Currently, the only way to search a CBG in a text is to convert it into a full regular expression (RE). However, a RE is more sophisticated than a CBG, and searching it with a RE pattern matching algorithm complicates the search and makes it slow. This is the reason why we design in this article two new practical CBG matching algorithms that are much simpler and faster than all the RE search techniques. The first one looks exactly once at each text character. The second one does not need to consider all the text characters and hence it is usually faster than the first one, but in bad cases may have to read the same text character more than once. We then propose a criterion based on the form of the CBG to choose a-priori the fastest between both. We performed many practical experiments using the PROSITE database, and all them show that our algorithms are the fastest in virtually all cases.
BibTeX:
@article{navarro2001fas,
  author = {Navarro, G. and Raffinot, M.},
  title = {Fast and simple character classes and bounded gaps pattern matching, with application to protein searching},
  journal = {Proceedings of the fifth annual international conference on Computational biology},
  publisher = {ACM Press New York, NY, USA},
  year = {2001},
  volume = {1},
  pages = {231--240},
  url = {http://portal.acm.org/citation.cfm?id=369220}
}
Navarro, G. & Raffinot, M. Fast and Flexible String Matching by Combining Bit-parallelism and Suffix Automata 1998   techreport URL  
Abstract: this paper we merge bit-parallelism and suffix automata, so that a nondeterministic suffix automaton is simulated using bit-parallelism. The resulting algorithm, called BNDM, obtains the best from both worlds. It is much simpler to implement than BDM and nearly as simple as Shift-Or. It inherits from Shift-Or the ability to handle flexible patterns and from BDM the ability to skip characters. BNDM is 30%-40% faster than BDM and up to 7 times faster than Shift-Or.
BibTeX:
@techreport{navarro98fast,
  author = {G. Navarro and M. Raffinot},
  title = {Fast and Flexible String Matching by Combining Bit-parallelism and Suffix Automata},
  year = {1998},
  number = {TR/DC--98--4},
  url = {citeseer.ist.psu.edu/navarro01fast.html}
}
Nesbit, J. The accuracy of approximate string matching algorithms 1986 Journal of Computer-Based Instruction   article URL  
BibTeX:
@article{Nes86,
  author = {J. Nesbit},
  title = {The accuracy of approximate string matching algorithms},
  journal = {Journal of Computer-Based Instruction},
  year = {1986},
  volume = {13},
  number = {3},
  pages = {80--83},
  url = {http://portal.acm.org/citation.cfm?id=15451.15455}
}
Newcomer, E. Understanding Web Services: XML, WSDL, SOAP, and UDDI 2002   book  
Abstract: The expert introduction to Web Services for every decision-maker and developer!-- Covers all of the most important standards for building Web Services: SOAP, UDDI, WSDL, and ebXML.-- By Eric Newcomer, one of the world's leading experts on Web Services development.-- Includes up-to-the-minute coverage of vendor tools and products for building Web Services.In Understanding Web Services, leading Web Services expert Eric Newcomer systematically addresses the core issues developers and IT professionals need to understand to make intelligent decisions about Web Services. Newcomer explains exactly how Web Services work, reviews each key standard for enabling Web Services, and previews tomorrow's most important products and technologies for Web Services development. Newcomer reviews the key goals and advantages of Web Services, the applications they are best suited for, and today's key standards for describing, sending, receiving, publishing, discovering, and utilizing them. He explains how Web Services are being built upon the foundation of XML technologies, then covers each key Web Services standard in detail: SOAP transport, WSDL services description, UDDI discovery services, and ebXML message exchange. Newcomer concludes with insightful, vendor-independent coverage of today's leading tools and products for Web Services development. For every IT manager, architect, developer, and strategist who wants a thorough understanding of Web Services.
BibTeX:
@book{Newcomer2002,
  author = {Eric Newcomer},
  title = {Understanding Web Services: XML, WSDL, SOAP, and UDDI},
  publisher = {Addison-Wesley},
  year = {2002},
  pages = {332}
}
Newling, M. E. J. A. A. A. S. C. P. C. P. K. M. L. T. Patterns: Service-Oriented Architecture and Web Services 2004   book  
BibTeX:
@book{Newling2004,
  author = {Mark Endrei Jenny Ang Ali Arsanjani Sook Chua Philippe Comte Pål Krogdahl Min Luo Tony Newling},
  title = {Patterns: Service-Oriented Architecture and Web Services},
  publisher = {IBM RedBooks},
  year = {2004},
  pages = {348}
}
Ning, K., Ng, H. & Leong, H. Finding Patterns in Biological Sequences by Longest Common Subsequencesand Shortest Common Supersequences 2006 BioInformatics and BioEngineering, 2006. BIBE 2006. Sixth IEEE Symposium on   inproceedings DOI  
BibTeX:
@inproceedings{Ning2006,
  author = {Ning, K. and Ng, H.K. and Leong, H.W.},
  title = {Finding Patterns in Biological Sequences by Longest Common Subsequencesand Shortest Common Supersequences},
  booktitle = {BioInformatics and BioEngineering, 2006. BIBE 2006. Sixth IEEE Symposium on},
  year = {2006},
  pages = {53--60},
  doi = {http://dx.doi.org/10.1109/BIBE.2006.253315}
}
Noh-sam Park, G. L. Agent-based Web Services Middleware 2003 Global Telecommunications Conference, 2003. GLOBECOM '03. IEEE   article URL  
Abstract: Web services enable the integration of applications in a Web environment. Due to an increased automation of Web service interoperation, intelligent Web services are recommended, such as semantic Web services. In this paper, we first introduce Web services and the concept of intelligent Web services. We then propose an agent-based Web service middleware. We focus on the intelligent Web services platform with existing standards instead of proposing other standards. In particular, we present an agent approach to dynamic Web services, which invokes access points of UDDI registry automatically and returns execution results for Web services. The proposed approach is used to build experimental solutions involving intelligent Web service systems. Finally, we give an example to illustrate a typical scenario in which consumers search Web services using our Web services middleware.
BibTeX:
@article{Noh-sam2003,
  author = {Noh-sam Park, Gil-haeng Lee},
  title = {Agent-based Web Services Middleware},
  journal = {Global Telecommunications Conference, 2003. GLOBECOM '03. IEEE},
  year = {2003},
  volume = {6},
  pages = {3186 - 3190},
  url = {http://www.sinab.unal.edu.co:2365/iel5/8900/28137/01258824.pdf?tp=&arnumber=1258824&isnumber=28137}
}
Nuallain, B. & de Rooij, S. Online suffix trees with counts 2004 Data Compression Conference, 2004. Proceedings. DCC 2004   inproceedings DOIURL  
Abstract: This paper extend Ukkonen's online suffix tree construction algorithm to support substring frequency queries, by adding count fields to the internal nodes of the tree. This has applications in the field of sequential data compression. One major problem is that Ukkonen's online construction algorithm does not maintain explicit end of string markers in the tree. The major part of our work concerns quickly determining where the end markers for a particular edge would be, so that frequencies can be correctly obtained. So a complete characterization of all end markers on leaf edges is given. Furthermore we found that edges between two internal nodes can contain at most one end marker. Using these results, the algorithms are given to update the count fields and do frequency queries correctly. All algorithms have been implemented and tested correct in practice.
BibTeX:
@inproceedings{Nuallain2004,
  author = {Nuallain, B.O. and de Rooij, S.},
  title = {Online suffix trees with counts},
  booktitle = {Data Compression Conference, 2004. Proceedings. DCC 2004},
  year = {2004},
  pages = {555},
  url = {http://ieeexplore.ieee.org/search/wrapper.jsp?arnumber=1281531},
  doi = {http://dx.doi.org/10.1109/DCC.2004.1281531}
}
Oellermann, W. L. Architecting Web Services 2001   book  
BibTeX:
@book{,
  author = {William L. Oellermann},
  title = {Architecting Web Services},
  publisher = {Apress},
  year = {2001},
  pages = {654}
}
Owolabi, O. & McGregor, D. R. Fast approximate string matching 1988 Software—Practice & Experience   article URL  
Abstract: Approximate string matching is an important operation in information systems because an input string is often an inexact match to the strings already stored. Commonly known accurate methods are computationally expensive as they compare the input string to every entry in the stored dictionary. This paper describes a two-stage process. The first uses a very compact n-gram table to preselect sets of roughly similar strings. The second stage compares these with the input string using an accurate method to give an accurately matched set of strings. A new similarity measure based on the Levenshtein metric is defined for this comparison.
BibTeX:
@article{OMG88,
  author = {O. Owolabi and D. R. McGregor},
  title = {Fast approximate string matching},
  journal = {Software—Practice \& Experience},
  year = {1988},
  volume = {18},
  number = {4},
  pages = {387--393},
  url = {http://portal.acm.org/citation.cfm?id=46220.46226}
}
Pan, C., Yang, K. & Lee, T. Approximate string matching in LDAP based on edit distance 2002 Parallel and Distributed Processing Symposium., Proceedings International, IPDPS 2002, Abstracts and CD-ROM   inproceedings  
BibTeX:
@inproceedings{Pan2002,
  author = {Chi-Chien Pan and Kai-Hsiang Yang and Tzao-Lin Lee},
  title = {Approximate string matching in LDAP based on edit distance},
  booktitle = {Parallel and Distributed Processing Symposium., Proceedings International, IPDPS 2002, Abstracts and CD-ROM},
  year = {2002},
  pages = {222--228}
}
Parida, L. Pattern Discovery in Bioinformatics: Theory & Algorithms (Mathematical and Computational Biology) 2007   book URL  
BibTeX:
@book{parida2007pdb,
  author = {Parida, L.},
  title = {Pattern Discovery in Bioinformatics: Theory \& Algorithms (Mathematical and Computational Biology)},
  publisher = {Chapman \& Hall},
  year = {2007},
  pages = {512},
  note = {ISBN-10: 1584885491},
  url = {http://www.amazon.com/Pattern-Discovery-Bioinformatics-Mathematical-Computational/dp/1584885491}
}
Park, J. Reconfigurable parallel approximate string matching on FPGAs 2005 Digital System Design, 2005. Proceedings. 8th Euromicro Conference on   inproceedings DOI  
BibTeX:
@inproceedings{Park2005,
  author = {Park, J.H.},
  title = {Reconfigurable parallel approximate string matching on FPGAs},
  booktitle = {Digital System Design, 2005. Proceedings. 8th Euromicro Conference on},
  year = {2005},
  pages = {214--217},
  doi = {http://dx.doi.org/10.1109/DSD.2005.66}
}
Park, J. H. & Demirdag, B. High performance pattern matching with dynamic load balancing on heterogeneous systems 2006 Parallel, Distributed, and Network-Based Processing, 2006. PDP 2006. 14th Euromicro International Conference on   inproceedings DOI  
BibTeX:
@inproceedings{Park2006,
  author = {Jin Hwan Park and Demirdag, B.A.},
  title = {High performance pattern matching with dynamic load balancing on heterogeneous systems},
  booktitle = {Parallel, Distributed, and Network-Based Processing, 2006. PDP 2006. 14th Euromicro International Conference on},
  year = {2006},
  pages = {6pp.},
  doi = {http://dx.doi.org/10.1109/PDP.2006.41}
}
Park, K. Analysis of Two-Dimensional Approximate Pattern Matching Algorithms 1998 Theoretical Computer Science   article URL  
Abstract: We present a new and more rigorous analysis of the two algorithms for two-dimensional approximate pattern matching due to Kärkkäinen and Ukkonen. We also present modifications of these algorithms that use less space while keeping the same expected time
BibTeX:
@article{Par98,
  author = {K. Park},
  title = {Analysis of Two-Dimensional Approximate Pattern Matching Algorithms},
  journal = {Theoretical Computer Science},
  year = {1998},
  volume = {201},
  number = {1--2},
  pages = {263--273},
  url = {http://portal.acm.org/citation.cfm?id=647815.738439}
}
Paul, W. & Simon, J. Decision trees and random access machines 1980 Logic and algorithmic, int. Symp., Zurich   article  
BibTeX:
@article{paul1980dta,
  author = {Paul, W. and Simon, J.},
  title = {{Decision trees and random access machines}},
  journal = {Logic and algorithmic, int. Symp., Zurich},
  year = {1980},
  volume = {1},
  pages = {1}
}
Ponnalagu, J. R. Aspect-oriented Approach for Non-functional Adaptation of Composite Web Services 2007 Services, 2007 IEEE Congress on   article DOI  
Abstract: We provide a novel approach for specifying and relating non-functional properties for distributed component web services that can be used to adapt a composite web service. Our approach uses distributed aspect-oriented programming (AOP) technology to model an adaptive architecture for web services composition and execution. Existing web service adaptation mechanisms are limited only to the process of web service choreography in terms of web service selection/invocation vis-à-vis pre-specified (Service Level Agreement) SLA constraints. Our system extends this idea by representing the non-functional properties of each web service - composite and component - via AOP. Hence our system models a relation function between the aspects of the composite web service, and the individual aspects of the component web services. This enables mid-flight adaptation of the composite web service - in response to changes in non-functional requirements - via suitable modifications in the individual aspects of the component web service. From the end users' viewpoint, such upfront aspectoriented modeling of non-functional properties enables on-demand composite Web service adaptation with minimal disruption in quality of service.
BibTeX:
@article{Ponnalagu2007,
  author = {Ponnalagu, Karthikeyan Krishnamurthy, Jayatheerthan R.},
  title = {Aspect-oriented Approach for Non-functional Adaptation of Composite Web Services},
  journal = {Services, 2007 IEEE Congress on},
  year = {2007},
  pages = {284 - 291},
  doi = {http://dx.doi.org/10.1109/SERVICES.2007.18}
}
Qianhui Althea Liang Jen-Yao Chung Miller, S. Y. O. Service Pattern Discovery of Web Service Mining in Web Service Registry-Repository 2006 e-Business Engineering, 2006. ICEBE '06. IEEE International Conference on   article DOI  
Abstract: This paper presents and elaborates the concept of Web service usage patterns and pattern discovery through service mining. We define three different levels of service usage data: i) user request level, ii) template level and iii) instance level. At each level, we investigate patterns of service usage data and the discovery of these patterns. An algorithm for service pattern discovery at the template level is presented. We show the system architecture of a service-mining enabled service registry repository. Web service patterns, pattern discovery and pattern mining supports the discovery and composition of complex services, which in turn supports the application of Web services to increasingly complex business processes and applications
BibTeX:
@article{Qianhui2006,
  author = {Qianhui Althea Liang Jen-Yao Chung Miller, S. Yang Ouyang},
  title = {Service Pattern Discovery of Web Service Mining in Web Service Registry-Repository},
  journal = {e-Business Engineering, 2006. ICEBE '06. IEEE International Conference on},
  year = {2006},
  pages = {286 - 293},
  doi = {http://dx.doi.org/10.1109/ICEBE.2006.90}
}
Rahman, C. & Lu, M. A Partition Approach to Find the Length of the Longest Common Subsequence 1993 VLSI Design, 1993. Proceedings. The Sixth International Conference on   inproceedings  
BibTeX:
@inproceedings{Rahman1993,
  author = {Rahman, C.S. and Mi Lu},
  title = {A Partition Approach to Find the Length of the Longest Common Subsequence},
  booktitle = {VLSI Design, 1993. Proceedings. The Sixth International Conference on},
  year = {1993},
  pages = {31--36}
}
Ramanathan, M., Grama, A. & Jagannathan, S. Sieve: A Tool for Automatically Detecting Variations Across Program Versions 2006 Automated Software Engineering, 2006. ASE '06. 21st IEEE/ACM International Conference on   inproceedings DOI  
BibTeX:
@inproceedings{Ramanathan2006,
  author = {Ramanathan, M.K. and Grama, A. and Jagannathan, S.},
  title = {Sieve: A Tool for Automatically Detecting Variations Across Program Versions},
  booktitle = {Automated Software Engineering, 2006. ASE '06. 21st IEEE/ACM International Conference on},
  year = {2006},
  pages = {241--252},
  doi = {http://dx.doi.org/10.1109/ASE.2006.61}
}
Ranganathan, N. & Motamarri, R. A VLSI architecture for computing the optimal correspondence of string subsequences 1997 Computer Architecture for Machine Perception, 1997. CAMP '97. Proceedings Fourth IEEE International Workshop on   inproceedings DOI  
BibTeX:
@inproceedings{Ranganathan1997,
  author = {Ranganathan, N. and Motamarri, R.},
  title = {A VLSI architecture for computing the optimal correspondence of string subsequences},
  booktitle = {Computer Architecture for Machine Perception, 1997. CAMP '97. Proceedings Fourth IEEE International Workshop on},
  year = {1997},
  pages = {290--294},
  doi = {http://dx.doi.org/10.1109/CAMP.1997.632070}
}
Ranganathan, N. & Motamarri, R. A VLSI architecture for computing the optimal correspondence of string subsequences 1997 Computer Architecture for Machine Perception, 1997. CAMP '97. Proceedings Fourth IEEE International Workshop on   inproceedings DOI  
BibTeX:
@inproceedings{Ranganathan1997,
  author = {Ranganathan, N. and Motamarri, R.},
  title = {A VLSI architecture for computing the optimal correspondence of string subsequences},
  booktitle = {Computer Architecture for Machine Perception, 1997. CAMP '97. Proceedings Fourth IEEE International Workshop on},
  year = {1997},
  pages = {290--294},
  doi = {http://dx.doi.org/10.1109/CAMP.1997.632070}
}
Rao Kosaraju, S. Faster algorithms for the construction of parameterized suffix trees 1995 Foundations of Computer Science, 1995. Proceedings., 36th Annual Symposium on   inproceedings DOIURL  
Abstract: Parameterized strings were introduced by Baker to solve the problem of identifying blocks of code that get duplicated in a large software system. Parameter symbols capture the notion of code identity while permitting renaming of variables. The code duplication problem was solved by first constructing a generalized suffix tree for the corresponding parameterized strings. The fastest known generalized suffix tree algorithm has an O(n(|II| + log| capital sigma |)) speed, where n is the length of the input, II is the set of parameters, and capital sigma is the set of fixed symbols. Here an algorithm that has a running time of O(n log|II| log| capital sigma |) is constructed. The algorithm is then improved to another that has a running time of O(n(log|II| + log| capital sigma |)).
BibTeX:
@inproceedings{Rao1995,
  author = {Rao Kosaraju, S.},
  title = {Faster algorithms for the construction of parameterized suffix trees},
  booktitle = {Foundations of Computer Science, 1995. Proceedings., 36th Annual Symposium on},
  year = {1995},
  pages = {631--638},
  url = {http://portal.acm.org/citation.cfm?id=796302},
  doi = {http://csdl2.computer.org/persagen/DLAbsToc.jsp?resourcePath=/dl/proceedings/\&toc=comp/proceedings/focs/1995/7183/00/7183toc.xml\&DOI=10.1109/SFCS.1995.492664}
}
Ren, Y. & Chang, F. ATTEST: A Testing Toolkit for Validating Software Properties 2007 Software Maintenance, 2007. ICSM 2007. IEEE International Conference on   inproceedings DOI  
Abstract: System-level test automation emulates testers?? interactions with a System Under Test (SUT) to verify system properties. It is usually achieved through writing scripts in scripting languages, such as Perl or Tcl, in order to feeding input to and correlating data from various interfaces of SUT. Test scripts, especially ones requiring thorough results analysis, can easily become complicated and hard to maintain as the software system evolves. ATTEST is a toolkit to address problems in test automation and maintenance. It provides easy-to-use mechanisms for helping testers to write and maintain automated test scripts through describing system behaviors at a high abstract level. It includes a Test Behavior Language (TBL) that uses innovative b>parameterized/font> b>patterns/font> to specify and validate trace-based properties abstractly but precisely. A compiler translates TBL specifications into executable scripts. Initial results show that TBL specifications range from 1/2 to 1/5 the size of their script counterparts and can save up to 5 times effort for developing test scripts. TBL demonstrates greater benefit as the complexity of the validation increases.
BibTeX:
@inproceedings{Ren2007,
  author = {Ren, Yansong and Chang, Fangzhe},
  title = {ATTEST: A Testing Toolkit for Validating Software Properties},
  booktitle = {Software Maintenance, 2007. ICSM 2007. IEEE International Conference on},
  year = {2007},
  pages = {469--472},
  doi = {http://dx.doi.org/10.1109/ICSM.2007.4362660}
}
Reznik, Y. On tries, suffix trees, and universal variable-length-to-block codes 2002 Information Theory, 2002. Proceedings. 2002 IEEE International Symposium on   inproceedings DOIURL  
Abstract: We show how a digital tree (or trie) structure can be used for both parsing and encoding (in a variable-length to block (VB) or variable-length to variable-length (VV) fashion) of sequences of symbols from a stochastic source. As an example, we construct a simple VB code based on a fixed database adaptation model, and derive an asymptotic expression for its average redundancy rate for memoryless sources.
BibTeX:
@inproceedings{Reznik2002,
  author = {Reznik, Y.A.},
  title = {On tries, suffix trees, and universal variable-length-to-block codes},
  booktitle = {Information Theory, 2002. Proceedings. 2002 IEEE International Symposium on},
  year = {2002},
  pages = {123},
  url = {http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1023395},
  doi = {http://dx.doi.org/10.1109/ISIT.2002.1023395}
}
Ristad, E., Yianilos, P., Inc, M. & Princeton, N. Learning string-edit distance 1998 Pattern Analysis and Machine Intelligence, IEEE Transactions on   article URL  
Abstract: In many applications, it is necessary to determine the similarity of two strings. A widely-used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one string into the other. In this report, we provide a stochastic model for string-edit distance. Our stochastic model allows us to learn a string-edit distance function from a corpus of examples. We illustrate the utility of our approach by applying it to the difficult problem of learning the pronunciation of words in conversational speech. In this application, we learn a string-edit distance with nearly one-fifth the error rate of the untrained Levenshtein distance. Our approach is applicable to any string classification problem that may be solved using a similarity function against a database of labeled prototypes
BibTeX:
@article{ristad1998lse,
  author = {Ristad, ES and Yianilos, PN and Inc, M.T. and Princeton, NJ},
  title = {Learning string-edit distance},
  journal = {Pattern Analysis and Machine Intelligence, IEEE Transactions on},
  year = {1998},
  volume = {20},
  number = {5},
  pages = {522--532},
  url = {http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=682181}
}
Sadakane, K. A fast algorithm for making suffix arrays and for Burrows-Wheeler 1998 Data Compression Conference, 1998. DCC '98. Proceedings   inproceedings DOIURL  
Abstract: We propose a fast and memory efficient algorithm for sorting suffixes of a text in lexicographic order. It is important to sort suffixes because an array of indexes of suffixes is called a suffix array and it is a memory efficient alternative of the suffix tree. Sorting suffixes is also used for the Burrows-Wheeler (see Technical Report 124, Digital SRC Research Report, 1994) transformation in the block sorting text compression, therefore fast sorting algorithms are desired. We compare algorithms for making suffix arrays of Bentley-Sedgewick (see Proceedings of the 8th Annual ACM-SIAM Symposium on Discrete Algorithms, p.360-9, 1997), Andersson-Nilsson (see 35th Symp. on Foundations of Computer Science, p.714-21, 1994) and Karp-Miller-Rosenberg (1972) and making suffix trees of Larsson (see Data Compression Conference, p.190-9, 1996) on the speed and required memory and propose a new algorithm which is fast and memory efficient by combining them. We also define a measure of difficulty of sorting suffixes: average match length. Our algorithm is effective when the average match length of a text is large, especially for large databases
BibTeX:
@inproceedings{Sadakane1998,
  author = {Sadakane, K.},
  title = {A fast algorithm for making suffix arrays and for Burrows-Wheeler},
  booktitle = {Data Compression Conference, 1998. DCC '98. Proceedings},
  year = {1998},
  pages = {129--138},
  url = {http://ieeexplore.ieee.org/search/wrapper.jsp?arnumber=672139},
  doi = {http://dx.doi.org/10.1109/DCC.1998.672139}
}
Sadeh, I. Universal compression algorithms based on approximate string matching 1995 Information Theory, 1995. Proceedings., 1995 IEEE International Symposium on   inproceedings DOI  
BibTeX:
@inproceedings{Sadeh1995,
  author = {Sadeh, I.},
  title = {Universal compression algorithms based on approximate string matching},
  booktitle = {Information Theory, 1995. Proceedings., 1995 IEEE International Symposium on},
  year = {1995},
  pages = {84},
  doi = {http://dx.doi.org/10.1109/ISIT.1995.531186}
}
Sadeh, I. On approximate string matching 1993 Data Compression Conference, 1993. DCC '93.   inproceedings DOI  
BibTeX:
@inproceedings{Sadeh1993,
  author = {Sadeh, I.},
  title = {On approximate string matching},
  booktitle = {Data Compression Conference, 1993. DCC '93.},
  year = {1993},
  pages = {148--157},
  doi = {http://dx.doi.org/10.1109/DCC.1993.253135}
}
Sagot, M. & Myers, E. W. Identifying Satellites and Periodic Repetitions in Biological Sequences 1998 Journal of Computational Biology   article URL  
Abstract: We present in this paper an algorithm for identifying satellites in DNA sequences. Satellites (simple, micro, or mini) are repeats in number between 30 and as many as 1,000,000 whose lengths vary between 2 and hundreds of base pairs and that appear, with some mutations, in tandem along the sequence. We concentrate here on short to moderately long (up to 30-40 base pairs) approximate tandem repeats where copies may di#er up to # = 15-20% from a consensus model of the repeating unit
BibTeX:
@article{sagot98identifying,
  author = {Marie-France Sagot and Eugene W. Myers},
  title = {Identifying Satellites and Periodic Repetitions in Biological Sequences},
  journal = {Journal of Computational Biology},
  year = {1998},
  volume = {5},
  number = {3},
  pages = {539-554},
  url = {citeseer.ist.psu.edu/article/sagot98identifying.html}
}
Sahinalp, S. & Vishkin, U. Efficient approximate and dynamic matching of patterns using a labeling paradigm 1996 Foundations of Computer Science, 1996. Proceedings., 37th Annual Symposium on   inproceedings DOI  
BibTeX:
@inproceedings{Sahinalp1996,
  author = {Sahinalp, S.C. and Vishkin, U.},
  title = {Efficient approximate and dynamic matching of patterns using a labeling paradigm},
  booktitle = {Foundations of Computer Science, 1996. Proceedings., 37th Annual Symposium on},
  year = {1996},
  pages = {320--328},
  doi = {http://dx.doi.org/10.1109/SFCS.1996.548491}
}
Sanjiva Weerawarana, F. L. T. S. D. F. F. Web Services Platform Architecture: SOAP, WSDL, WS-Policy, WS-Addressing, WS ... 2005   book  
Abstract: A guide to Web services covers such topics as service orientation, UDDI, transactions, security, BPEL, and WS-MetadataExchange
BibTeX:
@book{Sanjiva2005,
  author = {Sanjiva Weerawarana , Francisco Curbera, Frank Leymann, Tony Storey, Donald F. Ferguson},
  title = {Web Services Platform Architecture: SOAP, WSDL, WS-Policy, WS-Addressing, WS ...},
  publisher = {Prentice Hall PTR},
  year = {2005},
  pages = {416}
}
Sastry, R. & Ranganathan, N. A systolic array for approximate string matching 1993 Computer Design: VLSI in Computers and Processors, 1993. ICCD '93. Proceedings., 1993 IEEE International Conference on   inproceedings DOI  
BibTeX:
@inproceedings{Sastry1993,
  author = {Sastry, R. and Ranganathan, N.},
  title = {A systolic array for approximate string matching},
  booktitle = {Computer Design: VLSI in Computers and Processors, 1993. ICCD '93. Proceedings., 1993 IEEE International Conference on},
  year = {1993},
  pages = {402--405},
  doi = {http://dx.doi.org/10.1109/ICCD.1993.393344}
}
Schuller, B., Muller, R., Rigoll, G. & Lang, M. Applying Bayesian belief networks in approximate string matching for robust keyword-based retrieval 2004 Multimedia and Expo, 2004. ICME '04. 2004 IEEE International Conference on   inproceedings  
BibTeX:
@inproceedings{Schuller2004,
  author = {Schuller, B. and Muller, R. and Rigoll, G. and Lang, M.},
  title = {Applying Bayesian belief networks in approximate string matching for robust keyword-based retrieval},
  booktitle = {Multimedia and Expo, 2004. ICME '04. 2004 IEEE International Conference on},
  year = {2004},
  volume = {3},
  pages = {1999--2002Vol.3}
}
Shang, H. & Merrett, T. H. Tries for Approximate String Matching 1996 IEEE Transactions on Knowledge and Data Engineering   article URL  
Abstract: Tries offer text searches with costs which are independent of the size of the document being searched, and so are important for large documents requiring spelling checkers, case insensitivity, and limited approximate regular secondary storage. Approximate searches, in which the search pattern differs from the document by k substitutions, transpositions, insertions or deletions, have hitherto been carried out only at costs linear in the size of the document. We present a trie-based method whose cost is independent of document size. Our experiments show that this new method significantly outperforms the nearest competitor for k = 0 and k = 1, which are arguably the most important cases. The linear cost (in k) of the other methods begins to catch up, for our small files, only at k = 2. For larger files, complexity arguments indicate that tries will outperform the linear methods for larger values of k. Trie indexes combine suffixes and so are compact in storage. When the text itself does not need to be stored, as in a spelling checker, we even obtain negative overhead: 50% compression. We discuss a variety of applications and extensions, including best match (for spelling checkers), case insensitivity, and limited approximate regular expression matching.
BibTeX:
@article{SM96,
  author = {H. Shang and T. H. Merrett},
  title = {Tries for Approximate String Matching},
  journal = {IEEE Transactions on Knowledge and Data Engineering},
  year = {1996},
  volume = {8},
  number = {4},
  pages = {540--547},
  url = {http://csdl2.computer.org/persagen/DLAbsToc.jsp?resourcePath=/dl/trans/tk/&toc=comp/trans/tk/1996/04/k4toc.xml&DOI=10.1109/69.536247}
}
Shi, F. Fast approximate string matching with $q$-blocks sequences 1996 In Proc. 3rd South American Workshop on String Processing   inproceedings URL  
BibTeX:
@inproceedings{Shi96,
  author = {F. Shi},
  title = {Fast approximate string matching with $q$-blocks sequences},
  booktitle = {In Proc. 3rd South American Workshop on String Processing},
  publisher = {#cup#},
  year = {1996},
  pages = {257--271},
  url = {http://www.citeulike.org/user/romano/article/557766}
}
Short, S. Building XML Web Services for the Microsoft .NET Platform 2002   book  
BibTeX:
@book{Short2002,
  author = {Scott Short},
  title = {Building XML Web Services for the Microsoft .NET Platform},
  publisher = {Microsoft Press},
  year = {2002},
  pages = {388}
}
Shuai, D. An approach to solving the longest common subsequences based on string-coding functional and neural network 1992 Systems Engineering, 1992., IEEE International Conference on   inproceedings DOI  
BibTeX:
@inproceedings{Shuai1992,
  author = {Dian-Xun Shuai},
  title = {An approach to solving the longest common subsequences based on string-coding functional and neural network},
  booktitle = {Systems Engineering, 1992., IEEE International Conference on},
  year = {1992},
  pages = {564--567},
  doi = {http://dx.doi.org/10.1109/ICSYSE.1992.236964}
}
Skiena, S. The Algorithm Design Manual 1998   book URL  
BibTeX:
@book{skiena1998adm,
  author = {Skiena, S.S.},
  title = {The Algorithm Design Manual},
  publisher = {TELOS},
  year = {1998},
  url = {http://books.google.com.co/books?hl=es&lr=&id=TrXd-gxPhVYC&oi=fnd&pg=PR9&ots=BwA42az0lO&sig=W8M1wlQkfLEis_R3xiYUlORLuuU#PPR8,M1}
}
Smith, D. & Pierzchala, E. An algorithm and architecture for approximate string matching 1990 Circuits and Systems, 1990., Proceedings of the 33rd Midwest Symposium on   inproceedings DOI  
BibTeX:
@inproceedings{Smith1990,
  author = {Smith, D. and Pierzchala, E.},
  title = {An algorithm and architecture for approximate string matching},
  booktitle = {Circuits and Systems, 1990., Proceedings of the 33rd Midwest Symposium on},
  year = {1990},
  pages = {736--739vol.2},
  doi = {http://dx.doi.org/10.1109/MWSCAS.1990.140825}
}
Smith, D. & Pierzchala, E. An algorithm and architectures for approximate string matching 1990 #proct#33rd Midwest Symposium on Circuits and Systems   inproceedings  
BibTeX:
@inproceedings{SP90,
  author = {D. Smith and E. Pierzchala},
  title = {An algorithm and architectures for approximate string matching},
  booktitle = {#proct#33rd Midwest Symposium on Circuits and Systems},
  year = {1990},
  pages = {736--739}
}
Snav sel, V. Approximate string matching by fuzzy automata 1998 The Prague Stringology Club Workshop '98   inproceedings URL  
Abstract: I explain new ways of construction of search algorithms using fuzzy sets. I define Fuzzy Automata and I discuss the possibilities of use.
BibTeX:
@inproceedings{Sna98,
  author = {V. Sn{\' a}{\v s}el},
  title = {Approximate string matching by fuzzy automata},
  booktitle = {The Prague Stringology Club Workshop '98},
  year = {1998},
  pages = {101},
  note = {Collaborative Report DC--98--06},
  url = {http://www.stringology.org/event/1998/p11.html}
}
Song, T., Xue, Y. & Wang, D. An Algorithm of Large-Scale Approximate Multiple String Matching for Network Security 2006 Communications and Networking in China, 2006. ChinaCom '06. First International Conference on   inproceedings DOI  
BibTeX:
@inproceedings{Song2006,
  author = {Tian Song and Yibo Xue and Dongsheng Wang},
  title = {An Algorithm of Large-Scale Approximate Multiple String Matching for Network Security},
  booktitle = {Communications and Networking in China, 2006. ChinaCom '06. First International Conference on},
  year = {2006},
  pages = {1--5},
  doi = {http://dx.doi.org/10.1109/CHINACOM.2006.344838}
}
Stephen, G. String Searching Algorithms 1994   book URL  
BibTeX:
@book{stephen1994ssa,
  author = {Stephen, G.A.},
  title = {String Searching Algorithms},
  publisher = {World Scientific},
  year = {1994},
  url = {http://books.google.com.co/books?hl=es&lr=&id=rfSjZhDxBtUC&oi=fnd&pg=PA1&dq=approximate-string-matching&ots=TlD9U5xW3m&sig=WMz_cCd9_aGXkMQZG1IBl6mc3gI}
}
Steve Graham, T. B. D. D. G. D. Y. N. R. N. Building Web Services with Java: Making Sense of XML,SOAP, WSDL, and UDDI 2001   book  
BibTeX:
@book{Steve2001,
  author = {Steve Graham, Simeon Simeonov, Toufic Boubez, Doug Davis, Glen Daniels, Yuichi Nakamura, Ryo Neyama},
  title = {Building Web Services with Java: Making Sense of XML,SOAP, WSDL, and UDDI},
  publisher = {Sams Publishing},
  year = {2001},
  pages = {600}
}
Sutinen, E. & Tarhio, J. Filtration with $q$-Samples in Approximate String Matching 1996 Proceedings of the 7th Annual Symposium on Combinatorial Pattern Matching   inproceedings URL  
Abstract: Two filtration schemes are presented for approximate string matching with k differences. In our approach q-samples, which are non-overlapping q-grams, are drawn from the text, and a text area is checked with dynamic programming, if there are enough exact or slightly distorted q-grams of the pattern in the right order in a short sequence of the q-samples. The filtration schemes are applied to searching both in the text itself and in a q-gram index of the text. The results of preliminary experiments support the applicability of the new methods.
BibTeX:
@inproceedings{ST96,
  author = {E. Sutinen and J. Tarhio},
  title = {Filtration with $q$-Samples in Approximate String Matching},
  booktitle = {Proceedings of the 7th Annual Symposium on Combinatorial Pattern Matching},
  publisher = {#svb#},
  year = {1996},
  number = {1075},
  pages = {50--63},
  url = {http://www.springerlink.com/content/f001g04jxu84px76/}
}
Sutinen, E. & Tarhio, J. On using $q$-gram locations in approximate string matching 1995 Proceedings of the Third Annual European Symposium on Algorithms   inproceedings URL  
BibTeX:
@inproceedings{ST95,
  author = {E. Sutinen and J. Tarhio},
  title = {On using $q$-gram locations in approximate string matching},
  booktitle = {Proceedings of the Third Annual European Symposium on Algorithms},
  publisher = {#svb#},
  year = {1995},
  number = {979},
  pages = {327--340},
  url = {http://portal.acm.org/citation.cfm?id=739616}
}
Takaoka, T. Approximate Pattern Matching with Samples 1994 Proceedings of the 5th International Symposium on Algorithms and Computation   inproceedings URL  
Abstract: We simplify in this paper the algorithm by Chang and Lawler for the approximate string matching problem, by adopting the concept of sampling. We have a more general analysis of expected time with the simplified algorithm for the one-dimensional case under a non-uniform probability distribution, and we show that our method can easily be generalized to the two-dimensional approximate pattern matching problem with sublinear expected time.
BibTeX:
@inproceedings{Tak94,
  author = {T. Takaoka},
  title = {Approximate Pattern Matching with Samples},
  booktitle = {Proceedings of the 5th International Symposium on Algorithms and Computation},
  publisher = {#svb#},
  year = {1994},
  number = {834},
  pages = {236--242},
  url = {http://citeseer.ist.psu.edu/takaoka94approximate.html}
}
Takasu, A. Document filtering for fast approximate string matching of erroneous text 2001 Document Analysis and Recognition, 2001. Proceedings. Sixth International Conference on   inproceedings DOI  
BibTeX:
@inproceedings{Takasu2001,
  author = {Takasu, A.},
  title = {Document filtering for fast approximate string matching of erroneous text},
  booktitle = {Document Analysis and Recognition, 2001. Proceedings. Sixth International Conference on},
  year = {2001},
  pages = {916--920},
  doi = {http://dx.doi.org/10.1109/ICDAR.2001.953919}
}
Takasu, A. An approximate string match for garbled text with various accuracy 1997 Document Analysis and Recognition, 1997., Proceedings of the Fourth International Conference on   inproceedings DOI  
BibTeX:
@inproceedings{Takasu1997,
  author = {Takasu, A.},
  title = {An approximate string match for garbled text with various accuracy},
  booktitle = {Document Analysis and Recognition, 1997., Proceedings of the Fourth International Conference on},
  year = {1997},
  volume = {2},
  pages = {957--961vol.2},
  doi = {http://dx.doi.org/10.1109/ICDAR.1997.620652}
}
Tang, X., Tian, R. & Wong, D. Fast evaluation of sequence pair in block placement by longest common subsequence computation 2000 Design, Automation and Test in Europe Conference and Exhibition 2000. Proceedings   inproceedings DOI  
BibTeX:
@inproceedings{Tang2000,
  author = {Xiaoping Tang and Ruiqi Tian and Wong, D.F.},
  title = {Fast evaluation of sequence pair in block placement by longest common subsequence computation},
  booktitle = {Design, Automation and Test in Europe Conference and Exhibition 2000. Proceedings},
  year = {2000},
  pages = {106--111},
  doi = {http://dx.doi.org/10.1109/DATE.2000.840024}
}
Tang, X. & Wong, D. Floorplanning with alignment and performance constraints 2002 Design Automation Conference, 2002. Proceedings. 39th   inproceedings DOI  
BibTeX:
@inproceedings{Tang2002,
  author = {Xiaoping Tang and Wong, D.F.},
  title = {Floorplanning with alignment and performance constraints},
  booktitle = {Design Automation Conference, 2002. Proceedings. 39th},
  year = {2002},
  pages = {848--853},
  doi = {http://dx.doi.org/10.1109/DAC.2002.1012740}
}
Tang, X. & Wong, D. FAST-SP: a fast algorithm for block placement based on sequence pair 2001 Design Automation Conference, 2001. Proceedings of the ASP-DAC 2001. Asia and South Pacific   inproceedings DOI  
BibTeX:
@inproceedings{Tang2001,
  author = {Xiaoping Tang and Wong, D.F.},
  title = {FAST-SP: a fast algorithm for block placement based on sequence pair},
  booktitle = {Design Automation Conference, 2001. Proceedings of the ASP-DAC 2001. Asia and South Pacific},
  year = {2001},
  pages = {521--526},
  doi = {http://dx.doi.org/10.1109/ASPDAC.2001.913361}
}
Tang, X. & Wong, M. On handling arbitrary rectilinear shape constraint 2004 Design Automation Conference, 2004. Proceedings of the ASP-DAC 2004. Asia and South Pacific   inproceedings  
BibTeX:
@inproceedings{Tang2004,
  author = {Tang, X. and Wong, M.D.F.},
  title = {On handling arbitrary rectilinear shape constraint},
  booktitle = {Design Automation Conference, 2004. Proceedings of the ASP-DAC 2004. Asia and South Pacific},
  year = {2004},
  pages = {38--41}
}
Tarhio, J. & Ukkonen, E. Approximate Boyer-Moore string matching 1993 SIAM Journal on Computing   article URL  
Abstract: The Boyer–Moore idea applied in exact string matching is generalized to approximate string matching. Two versions of the problem are considered. The $k$ mismatches problem is to find all approximate occurrences of a pattern string (length $m$) in a text string (length $n$) with at most $k$ mismatches. The generalized Boyer–Moore algorithm is shown (under a mild independence assumption) to solve the problem in expected time $O(kn(1 / (m - k) + (k / c)))$, where $c$ is the size of the alphabet. A related algorithm is developed for the $k$ differences problem, where the task is to find all approximate occurrences of a pattern in a text with $ leqslant k$ differences (insertions, deletions, changes). Experimental evaluation of the algorithms is reported, showing that the new algorithms are often significantly faster than the old ones. Both algorithms are functionally equivalent with the Horspool version of the Boyer–Moore algorithm when $k = 0$.
BibTeX:
@article{TU93,
  author = {J. Tarhio and E. Ukkonen},
  title = {Approximate {Boyer-Moore} string matching},
  journal = {SIAM Journal on Computing},
  year = {1993},
  volume = {22},
  number = {2},
  pages = {243--260},
  url = {http://scitation.aip.org/getabs/servlet/GetabsServlet?prog=normal&id=SMJCAT000022000002000243000001&idtype=cvips&gifs=yes}
}
Tarhio, J. & Ukkonen, E. Boyer-Moore approach to approximate string matching 1990 Proceedings of the 2nd Scandinavian Workshop on Algorithm Theory   inproceedings URL  
Abstract: The Boyer-Moore idea applied in exact string matching is generalized to approximate string matching. Two versions of the problem are considered. The k mismatches problem is to find all approximate occurrences of a pattern string (length m) in a text string (length n) with at most k mismatches. Our generalized Boyer-Moore algorithm solves the problem in expected time O(kn(1/(m – k)+k / c)) where c is the size of the alphabet. A related algorithm is developed for the k differences problem where the task is to find all approximate occurrences of a pattern in a text with le k differences (insertions, deletions, changes).
BibTeX:
@inproceedings{TU90,
  author = {J. Tarhio and E. Ukkonen},
  title = {{Boyer-Moore} approach to approximate string matching},
  booktitle = {Proceedings of the 2nd Scandinavian Workshop on Algorithm Theory},
  publisher = {#svb#},
  year = {1990},
  number = {447},
  pages = {348--359},
  url = {http://www.springerlink.com/content/707704ug186513x8/}
}
Thomas, M. G. G. Modeling of Web services flow 2003 E-Commerce, 2003. CEC 2003. IEEE International Conference on   article  
Abstract: Services such as automatic purchasing, automatic updating of prices, or getting latest information etc, can be provided on the Internet using Web services technology. A client can access these services using the Internet. Web services infrastructure includes some standards, such as simple object access protocol (SOAP), Web services description language (WSDL) and universal description, discovery and integration (UDDI). In this paper we represent distributed Web services by modeling the flow of messages and methods in a Web service transaction. Such a model assists the Web services designer to ensure the correctness of Web flows in terms of deadlock and correct termination of the Web services transaction. WSDL and methods are modeled using Petri Nets. A software tool is implemented for extracting the model from the WSDL description of the Web services flow.
BibTeX:
@article{Thomas2003,
  author = {Thomas, J.P. Thomas, M. Ghinea, G.},
  title = {Modeling of Web services flow},
  journal = {E-Commerce, 2003. CEC 2003. IEEE International Conference on},
  year = {2003},
  pages = {391 - 398}
}
Tiezheng, S. D. Y. G. C. Y. K. Y. N. An effective Web services discovery strategy for Web services composition 2005 Computer and Information Technology, 2005. CIT 2005. The Fifth International Conference on   article DOI  
Abstract: With the popularity of Web services, how to discover suitable Web services to support Web services composition has become a challenge. Traditional keyword search is insufficient due to its lower recall and precision. This paper proposes an effective Web service discovery strategy based on Web services description information. In this paper, the description information for Web services is divided into operation information named as WSDL attributes and profile information, where profile information consists of service function information and service constraint information, and they are represented as general attributes and instance attributes respectively. Based on the information and ontology knowledge, three steps are adopted for discovering Web services. Firstly, according to WSDL attribute, the heterogeneities of service operations are resolved. Secondly, based on general attributes, the Web services to meet the function requirement of customers are found. Thirdly, a matching model is applied based on instance attributes and suitable Web services are obtained and provided for customers. The service discovering approach is simple, available and become highly effective by introducing semantics, moreover, it has been applied to e/spl I.bar/Scope successfully.
BibTeX:
@article{Tiezheng2005,
  author = {Shen Derong Yu Ge Cao Yu Kou Yue Nie Tiezheng},
  title = {An effective Web services discovery strategy for Web services composition},
  journal = {Computer and Information Technology, 2005. CIT 2005. The Fifth International Conference on},
  year = {2005},
  pages = {257 - 263},
  doi = {http://dx.doi.org/10.1109/CIT.2005.68}
}
Ting, A. & Leung, M. Linear layout processing 1998 Pattern Recognition, 1998. Proceedings. Fourteenth International Conference on   inproceedings DOI  
BibTeX:
@inproceedings{Ting1998,
  author = {Ting, A. and Leung, M.K.H.},
  title = {Linear layout processing},
  booktitle = {Pattern Recognition, 1998. Proceedings. Fourteenth International Conference on},
  year = {1998},
  volume = {1},
  pages = {403--405vol.1},
  doi = {http://dx.doi.org/10.1109/ICPR.1998.711166}
}
Tosic, B. On comprehensive contractual descriptions of Web services 2005 e-Technology, e-Commerce and e-Service, 2005. EEE '05. Proceedings. The 2005 IEEE International Conference on   article DOI  
BibTeX:
@article{Tosic2005,
  author = {Tosic, V. Pagurek, B.},
  title = {On comprehensive contractual descriptions of Web services},
  journal = {e-Technology, e-Commerce and e-Service, 2005. EEE '05. Proceedings. The 2005 IEEE International Conference on},
  year = {2005},
  pages = {444 - 449},
  doi = {http://dx.doi.org/10.1109/EEE.2005.94}
}
Ukkonen, E. Approximate string matching over suffix trees 1993 Proceedings of the 4th Annual Symposium on Combinatorial Pattern Matching   inproceedings URL  
Abstract: The classical approximate string-matching problem of finding the locations of approximate occurrences P' of pattern string P in text string T such that the edit distance between P and P' is &le; k

is considered. We concentrate on the special case in which T is available for preprocessing before the searches with varying P and k. It is shown how the searches can be done fast using the suffix tree of T augmented with the suffix links as the preprocessed form of T and applying dynamic programming over the tree. Three variations of the search algorithm are developed with running times O(mq + n), O(mqlog q + size of the output), and O(m&sup2;q + size of the output). Here n = |T|, m = |P|, and q varies depending on the problem instance between 0 and n. In the case of the unit cost edit distance it is shown that q = O(min(n, m^(k+1) |&Sigma;|^k)) where &Sigma; is the alphabet.

BibTeX:
@inproceedings{Ukk93,
  author = {E. Ukkonen},
  title = {Approximate string matching over suffix trees},
  booktitle = {Proceedings of the 4th Annual Symposium on Combinatorial Pattern Matching},
  publisher = {#svb#},
  year = {1993},
  number = {684},
  pages = {228--242},
  url = {http://citeseer.ist.psu.edu/ukkonen93approximate.html}
}
Ukkonen, E. Approximate string matching with $q$-grams and maximal matches 1992 Theoretical Computer Science   article URL  
BibTeX:
@article{Ukk92,
  author = {E. Ukkonen},
  title = {Approximate string matching with {$q$}-grams and maximal matches},
  journal = {Theoretical Computer Science},
  year = {1992},
  volume = {92},
  number = {1},
  pages = {191--212},
  url = {http://portal.acm.org/citation.cfm?id=135908}
}
Ukkonen, E. Algorithms for approximate string matching 1985 Information and Control   article URL  
Abstract: The edit distance between strings a sub(l) multiplied by multiplied by multiplied by a sub(m) and b sub(l) multiplied by multiplied by multiplied by b sub(n) is the minimum cost s of a sequence of editing steps (insertions, deletions, changes) that convert one string into the other. A well-known tabulating method computes s as well as the corresponding editing sequence in time and in space O(mn) (in space O(min(m, n)) if the editing sequence is not required). Starting from this method, the author develops an improved algorithm that works in time and in space O(s multiplied by min(m, n)). Another improvement with time O(s multiplied by min(m, n)) and space O(s multiplied by min(s, m, n)) is given for the special case where all editing steps have the same cost independently of the characters involved.
BibTeX:
@article{Ukk85,
  author = {E. Ukkonen},
  title = {Algorithms for approximate string matching},
  journal = {Information and Control},
  year = {1985},
  volume = {64},
  number = {1--3},
  pages = {100--118},
  url = {http://portal.acm.org/citation.cfm?id=4620.4626}
}
Ukkonen, E. Finding approximate patterns in strings 1985 Journal of Algorithms   article URL  
BibTeX:
@article{Ukk85b,
  author = {E. Ukkonen},
  title = {Finding approximate patterns in strings},
  journal = {Journal of Algorithms},
  year = {1985},
  volume = {6},
  number = {1--3},
  pages = {132--137},
  url = {http://cat.inist.fr/?aModele=afficheN&cpsidt=9264743}
}
Ukkonen, E. & Wood, D. Approximate string matching with suffix automata 1993 Algorithmica   article URL  
Abstract: Theapproximate string matching problem is, given a text string, a pattern string, and an integerk, to find in the text all approximate occurrences of the pattern. An approximate occurrence means a substring of the text with edit distance at mostk from the pattern. We give a newO(kn) algorithm for this problem, wheren is the length of the text. The algorithm is based on the suffix automaton with failure transitions and on the diagonalwise monotonicity of the edit distance table. Some experiments showing that the algorithm has a small overhead are reported.
BibTeX:
@article{UW93,
  author = {E. Ukkonen and D. Wood},
  title = {Approximate string matching with suffix automata},
  journal = {Algorithmica},
  year = {1993},
  volume = {10},
  number = {5},
  pages = {353--364},
  url = {http://www.springerlink.com/content/t720711r28174513/}
}
Ukkonen, E. & Wood, D. A simple on-line algorithm to approximate string matching 1989   techreport  
BibTeX:
@techreport{UW89,
  author = {E. Ukkonen and D. Wood},
  title = {A simple on-line algorithm to approximate string matching},
  year = {1989}
}
Van Court, T. & Herbordt, M. Families of FPGA-based algorithms for approximate string matching 2004 Application-Specific Systems, Architectures and Processors, 2004. Proceedings. 15th IEEE International Conference on   inproceedings DOI  
BibTeX:
@inproceedings{Van2004,
  author = {Van Court, T. and Herbordt, M.C.},
  title = {Families of FPGA-based algorithms for approximate string matching},
  booktitle = {Application-Specific Systems, Architectures and Processors, 2004. Proceedings. 15th IEEE International Conference on},
  year = {2004},
  pages = {354--364},
  doi = {http://dx.doi.org/10.1109/ASAP.2004.1342484}
}
Vlachos, M., Gunopulos, D. & Kollios, G. Robust similarity measures for mobile object trajectories 2002 Database and Expert Systems Applications, 2002. Proceedings. 13th International Workshop on   inproceedings  
BibTeX:
@inproceedings{Vlachos2002,
  author = {Vlachos, M. and Gunopulos, D. and Kollios, G.},
  title = {Robust similarity measures for mobile object trajectories},
  booktitle = {Database and Expert Systems Applications, 2002. Proceedings. 13th International Workshop on},
  year = {2002},
  pages = {721--726}
}
Vlachos, M., Kollios, G. & Gunopulos, D. Discovering similar multidimensional trajectories 2002 Data Engineering, 2002. Proceedings. 18th International Conference on   inproceedings DOI  
BibTeX:
@inproceedings{Vlachos2002a,
  author = {Vlachos, M. and Kollios, G. and Gunopulos, D.},
  title = {Discovering similar multidimensional trajectories},
  booktitle = {Data Engineering, 2002. Proceedings. 18th International Conference on},
  year = {2002},
  pages = {673--684},
  doi = {http://dx.doi.org/10.1109/ICDE.2002.994784}
}
Wan, M., Huang, S. & Yang, J. Finding the longest similar subsequence of thumbprints for intrusion detection 2006 Advanced Information Networking and Applications, 2006. AINA 2006. 20th International Conference on   inproceedings DOI  
BibTeX:
@inproceedings{Wan2006,
  author = {Wan, M.D. and Huang, S.-H.S. and Yang, J.},
  title = {Finding the longest similar subsequence of thumbprints for intrusion detection},
  booktitle = {Advanced Information Networking and Applications, 2006. AINA 2006. 20th International Conference on},
  year = {2006},
  volume = {1},
  pages = {6pp.},
  doi = {http://dx.doi.org/10.1109/AINA.2006.181}
}
Wang, Y. Image indexing and similarity retrieval based on a new spatial relation model 2001 Distributed Computing Systems Workshop, 2001 International Conference on   inproceedings DOI  
BibTeX:
@inproceedings{Wang2001,
  author = {Ying-Hong Wang},
  title = {Image indexing and similarity retrieval based on a new spatial relation model},
  booktitle = {Distributed Computing Systems Workshop, 2001 International Conference on},
  year = {2001},
  pages = {396--401},
  doi = {http://dx.doi.org/10.1109/CDCS.2001.918736}
}
Wang, Y., Xu, Y. & Xu, Z. A two-phase method of approximate string match 2004 Systems, Man and Cybernetics, 2004 IEEE International Conference on   inproceedings DOI  
BibTeX:
@inproceedings{Wang2004,
  author = {Yi Wang and Yang Xu and ZhenMing Xu},
  title = {A two-phase method of approximate string match},
  booktitle = {Systems, Man and Cybernetics, 2004 IEEE International Conference on},
  year = {2004},
  volume = {4},
  pages = {3371--3376vol.4},
  doi = {http://dx.doi.org/10.1109/ICSMC.2004.1400863}
}
Wright, A. H. Approximate String Matching using Within-word Parallelism 1994 Software - Practice and Experience   article URL  
Abstract: Given a text string, a pattern string, and an integer k, the problem of approximate string matching with k differences is to find all substrings of the text string whose edit distance from the pattern

string is less than k. The edit distance between two string is defined as the minimum number of differences, where a difference can be a substitution, insertion, or deletion of a single character.

An implementation of the dynamic programming algorithm for this problem is given that packs several characters and mod-4 integers into a computer word. Thus, it is a parallelization of the

conventional implementation that runs on ordinary processors. Since a small alphabet means that characters have short binary codes, the degree of parallelism is greatest for small alphabets and for processors with long words. For an alphabet of size 8 or smaller and a 64 bit processor, a 21-fold parallelism over the conventional algorithm can be obtained. Empirical comparisons to the basic dynamic programming algorithm, to a version of Ukkonen's algorithm, and to the algorithm of Galil and Park, are given.

BibTeX:
@article{Wri94,
  author = {A. H. Wright},
  title = {Approximate String Matching using Within-word Parallelism},
  journal = {Software - Practice and Experience},
  year = {1994},
  volume = {24},
  number = {4},
  pages = {337--362},
  url = {http://citeseer.ist.psu.edu/202497.html}
}
Wu, L. Suffix Tree Based WEB Information Search System and Optimal Index Algorithms 2006 Machine Learning and Cybernetics, 2006 International Conference on   inproceedings DOIURL  
Abstract: Chinese information search engines always encounter a difficulty in segmentation of Chinese words from an article. In this paper, a suffix tree based searching approach is proposed to avoid the difficulty in segmentation of Chinese words. The suffix tree algorithms are studied and a set of optimal algorithms for index build are proposed. Based on the algorithms, a prototype of Chinese information search system is developed and applied to the Chinese Web Test collection with 100 GB Web pages (CWT-100g). The experimental results show that the system is capable of searching Chinese information without segmentation of Chinese words and the speed of index build is reduced to the theoretical limitation.
BibTeX:
@inproceedings{Wu2006a,
  author = {Lian-Long Wu},
  title = {Suffix Tree Based WEB Information Search System and Optimal Index Algorithms},
  booktitle = {Machine Learning and Cybernetics, 2006 International Conference on},
  year = {2006},
  pages = {4450--4454},
  url = {http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4028855},
  doi = {http://dx.doi.org/10.1109/ICMLC.2006.259157}
}
Wu, S., Chang, H., Hsu, T. & Liu, P. Approximate Keyword Search in Web Search Engines 2006 Digital Information Management, 2006 1st International Conference on   inproceedings DOI  
BibTeX:
@inproceedings{Wu2006,
  author = {Wu, Sun and Chang, Hsien-Tsung and Hsu, Ting-Chao and Liu, Pei-Shin},
  title = {Approximate Keyword Search in Web Search Engines},
  booktitle = {Digital Information Management, 2006 1st International Conference on},
  year = {2006},
  pages = {404--411},
  doi = {http://dx.doi.org/10.1109/ICDIM.2007.369229}
}
Wu, X. Data mining: artificial intelligence in data analysis 2004 Intelligent Agent Technology, 2004. (IAT 2004). Proceedings. IEEE/WIC/ACM International Conference on   article DOI  
Abstract: Summary form only given. Data mining is a fast-growing area. The first Knowledge Discovery in Databases Workshop was held in August 1989, in conjunction with the 1989 International Joint Conference on Artificial Intelligence, and this workshop series became the International Conference on Knowledge Discovery and Data Mining (KDD) in 1995. In 2003, there were a total of 15 data mining conferences, most of which are listed at http://www.kdnuggets.com/meetings/meetings-2OO3-past.html. These 15 conferences do not include various artificial intelligence (AI), statistics and database conferences (and their workshops) that also solicited and accepted data mining related papers, such as DC AI, ICML, ICTAI, COMPSTAT, AI & Statistics, SIGMOD, VLDB, ICDE, and CIKM. Among various data mining conferences, KDD and ICDM (the IEEE International Conference on Data Mining) are arguably (or unarguably) the two premier ones in the field. ICDM was established in 2000, sponsored by the IEEE Computer Society, and had its first annual meeting in 2001. This work reviews the topics of interest from ICDM from an AI perspective, and analyze common topics in data mining and AI, including key AI ideas that have been used in both data mining and machine learning. We also discuss two current research projects on (1) user-centered agents for biological information exploration on the Web, and (2) dynamic classifier selection in dealing with streaming data. Both projects apply data mining techniques for intelligent analysis of large volumes of data.
BibTeX:
@article{Wu2004,
  author = {Xindong Wu},
  title = {Data mining: artificial intelligence in data analysis},
  journal = {Intelligent Agent Technology, 2004. (IAT 2004). Proceedings. IEEE/WIC/ACM International Conference on},
  year = {2004},
  pages = {7},
  doi = {http://dx.doi.org/10.1109/IAT.2004.1342916}
}
Yang, E. & Zhang, Z. The Shortest Common Superstring Problem: Average Case Analysis for Both Exact and Approximate Matching 1999 Information Theory, IEEE Transactions on   article URL  
Abstract: The shortest common superstring problem and its extension to approximate matching are considered in the probability model where each string in a given set has the same length and letters of strings are drawn independently from a finite set. In the exact matching case, several algorithms proposed in the literature are shown to be asymptotically optimal in the sense that the ratio of the savings resulting from the superstring constructed by each of these algorithms, that is the difference between the total length of the strings in the given set and the length of the superstring, to the optimal savings from the shortest superstring approaches in probability to 1 as the number of strings in the given set increases. In the approximate matching case, a modified version of the shortest common approximate matching superstring problem is analyzed; it is demonstrated that the optimal savings in this case is given approximately by nlogn/Il(Q,Q,2D), where n is the number of strings in the given set, Q is the probability distribution governing the selection of letters of strings, Il(Q,Q,2D) is the lower mutual information between Q and Q with respect to 2D, and D&ges;0 is the distortion allowed in approximate matching. In addition, an approximation algorithm is proposed and proved asymptotically optimal
BibTeX:
@article{YZ99,
  author = {E.-h. Yang and Z. Zhang},
  title = {The Shortest Common Superstring Problem: Average Case Analysis for Both Exact and Approximate Matching},
  journal = {Information Theory, IEEE Transactions on},
  year = {1999},
  volume = {45},
  number = {6},
  pages = {1867--1886},
  url = {http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=782108}
}
Yang, J., Wang, W. & Yu, P. BASS: approximate search on large string databases 2004 Scientific and Statistical Database Management, 2004. Proceedings. 16th International Conference on   inproceedings DOIURL  
Abstract: In this paper, we study the problem on how to build an index structure for large string databases to efficiently support various types of string matching without the necessity of mapping the substrings to a numerical space (e.g., string B-tree and MRS-index) nor the restriction of in-memory practice (e.g., suffix tree and suffix array). Towards this goal, we propose a new indexing scheme, BASS-tree, to efficiently support general approximate substring match (in terms of certain symbol substitutions and misalignments) in sublinear time on a large string database. The key idea behind the design is that all positions in each string are grouped recursively into a fully balanced tree according to the similarities of the subsequent segments starting at those positions. Each node is labeled with a regular expression that describes the commonality of the substrings indexed through the subtree. Any search can then be properly directed to the portion in the database with a high potential of matching quickly. With the BASS-tree in place, wild card(s) in the query pattern can also be handled in a seamless way. In addition, search of a long pattern can be decomposed into a series of searches of short segments followed by a process to join the results. It has been demonstrated in our experiments that the potential performance improvement brought by BASS-tree is in an order of magnitude over alternative methods.
BibTeX:
@inproceedings{Yang2004,
  author = {Jiong Yang and Wei Wang and Yu, P.},
  title = {BASS: approximate search on large string databases},
  booktitle = {Scientific and Statistical Database Management, 2004. Proceedings. 16th International Conference on},
  year = {2004},
  pages = {181--190},
  url = {http://ieeexplore.ieee.org/search/wrapper.jsp?arnumber=1311210},
  doi = {http://dx.doi.org/10.1109/SSDM.2004.1311210}
}
Zhang, A., Kleeman, L. & Russell, R. Topological Mapping Inspired by Techniques in DNA Sequence Alignment 2006 Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference on   inproceedings DOI  
BibTeX:
@inproceedings{Zhang2006,
  author = {Zhang, A.M. and Kleeman, L. and Russell, R.A.},
  title = {Topological Mapping Inspired by Techniques in DNA Sequence Alignment},
  booktitle = {Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference on},
  year = {2006},
  pages = {2754--2759},
  doi = {http://dx.doi.org/10.1109/IROS.2006.282002}
}
Zhang, N., Mukherjee, A., Adjeroh, D. & Bell, T. Approximate pattern matching using the Burrows-Wheeler transform 2003 Data Compression Conference, 2003. Proceedings. DCC 2003   inproceedings DOIURL  
Abstract: Summary form only given. The approximate pattern matching on the text transformed by the Burrows-Wheeler transform (BWT) was considered. This is an important first step towards developing a compressed pattern matching algorithm for the BWT based compression system. Algorithms are proposed to solve the K-mismatch problem. Tests were performed on different pattern lengths using 133 selected files from the Canterbury, Calgary, and TREC corpus. The results on the K-mismatch pattern matching show that the running time and storage are superior to the fast suffix tree approach. Thus, once the index arrays are created, for repeated pattern search operations and for long patterns, the proposed algorithms perform significantly better than the agrep and ngrep. Using DFA verification, the search time is almost constant. The amortized cost is lower for multiple patterns search operations.
BibTeX:
@inproceedings{Zhang2003,
  author = {Nan Zhang and Mukherjee, A. and Adjeroh, D. and Bell, T.},
  title = {Approximate pattern matching using the Burrows-Wheeler transform},
  booktitle = {Data Compression Conference, 2003. Proceedings. DCC 2003},
  year = {2003},
  pages = {458},
  url = {http://ieeexplore.ieee.org/search/wrapper.jsp?arnumber=1194077},
  doi = {http://dx.doi.org/10.1109/DCC.2003.1194077}
}
Zhang, Q., Chamberlain, R., Indeck, R., West, B. & White, J. Massively parallel data mining using reconfigurable hardware: approximate string matching 2004 Parallel and Distributed Processing Symposium, 2004. Proceedings. 18th International   inproceedings DOI  
BibTeX:
@inproceedings{Zhang2004,
  author = {Zhang, Q. and Chamberlain, R.D. and Indeck, R.S. and West, B.M. and White, J.},
  title = {Massively parallel data mining using reconfigurable hardware: approximate string matching},
  booktitle = {Parallel and Distributed Processing Symposium, 2004. Proceedings. 18th International},
  year = {2004},
  pages = {259},
  doi = {http://dx.doi.org/10.1109/IPDPS.2004.1303326}
}
Zheng, A. A Web Service Mining Framework 2007 Web Services, 2007. ICWS 2007. IEEE International Conference on   article DOI  
Abstract: We propose a service mining framework for exploring interesting compositions of existing Web services. The framework first screens Web services for composition leads using a "coarse-grained" filtering approach. It then verifies these leads based on runtime conditions. Top candidates are selected from the verified leads and evaluated for their interestingness. We present algorithms to automate the screening phase of the framework. Finally, we study the effects of key variables on lead compositions' interestingness. As a motivating example, we apply these algorithms to the field of biological pathway discovery and rely on knowledge obtained from reverse engineering online resources to assess their effectiveness.
BibTeX:
@article{Zheng2007,
  author = {Zheng, George Bouguettaya, Athman},
  title = {A Web Service Mining Framework},
  journal = {Web Services, 2007. ICWS 2007. IEEE International Conference on},
  year = {2007},
  pages = {1096 - 1103},
  doi = {http://dx.doi.org/10.1109/ICWS.2007.27}
}
Zhong, C., Chen, G. & He, J. Parallel computing for the longest common subsequences in network intrusion detection system 2004 Machine Learning and Cybernetics, 2004. Proceedings of 2004 International Conference on   inproceedings  
BibTeX:
@inproceedings{Zhong2004,
  author = {Cheng Zhong and Guo-Ling Chen and Jia-Hua He},
  title = {Parallel computing for the longest common subsequences in network intrusion detection system},
  booktitle = {Machine Learning and Cybernetics, 2004. Proceedings of 2004 International Conference on},
  year = {2004},
  volume = {4},
  pages = {2553--2557vol.4}
}

Created by JabRef on 12/04/2008.