SHOGUN  6.1.3
SubsequenceStringKernel.h File Reference

Go to the source code of this file.


class  CSubsequenceStringKernel
 class SubsequenceStringKernel that implements String Subsequence Kernel (SSK) discussed by Lodhi et. al.[1]. A subsequence is any ordered sequence of \(n\) characters occurring in the text, though not necessarily contiguous. More formally, string \(u\) is a subsequence of string \(s\), iff there exists indices \(\mathbf{i}=(i_{1},\dots,i_{|u|})\), with \(1\le i_{1} \le \cdots \le i_{|u|} \le |s|\), such that \(u_{j}=s_{i_{j}}\) for \(j=1,\dots,|u|\), written as \(u=s[\mathbf{i}]\). The feature mapping \(\phi\) in this scenario is given by

\[ \phi_{u}(s)=\sum_{\mathbf{i}:u=s[\mathbf{i}]}\lambda^{l(\mathbf{i})} \]

for some \(lambda\le 1\), where \(l(\mathbf{i})\) is the length of the subsequence in \(s\), given by \(i_{|u|}-i_{1}+1\). The kernel here is an inner product in the feature space generated by all subsequences of length \(n\).

\[ K_{n}(s,t)=\sum_{u\in\Sigma^{n}}\langle \phi_{u}(s), \phi_{u}(t)\rangle = \sum_{u\in\Sigma^{n}}\sum_{\mathbf{i}:u=s[\mathbf{i}]} \sum_{\mathbf{j}:u=t[\mathbf{j}]}\lambda^{l(\mathbf{i})+l(\mathbf{j})} \]

Since the subsequences are weighted by the exponentially decaying factor \(\lambda\) of their full length in the text, more weight is given to those occurrences that are nearly contiguous. A direct computation is infeasible since the dimension of the feature space grows exponentially with \(n\). The paper describes an efficient computation approach using a dynamic programming technique. More...


SHOGUN Machine Learning Toolbox - Documentation