Papadopoulos, A., Roy, P. and Pachet, F. Avoiding Plagiarism in Markov Sequence Generation. 28th Conference on Artificial Intelligence (AAAI 2014), pages 2731-2737, Quebec (Canada), July 2014

Sony CSL authors: François Pachet, Pierre Roy


Markov processes are widely used to generate sequences that imitate a given style by means of random walk. A random walk generates a sequence by iteratively appending states to prefixes whose length is at most the given Markov order. However, at higher orders, Markov chains tend to replicate chunks of the corpus that may be longer than the order, a primary form of plagiarism. The Markov order defines a maximum length for training but not for generation. In the framework of constraint satisfaction problems (CSP), we introduce MaxOrder, a global constraint which ensures that generated sequences do not include chunks longer than a given maximum order. We exhibit an automaton that recognises the solution set, with a size linear in the size of the corpus. We propose a linear-time procedure to generate this automaton from a corpus and a given max order. We then use this automaton to achieve generalised arc consistency for the MaxOrder constraint, holding on a sequence of size n, in O(n·T) time, where T is the size of the automaton. We illustrate our approach by generating text sequences from text corpora with a maximum order guarantee, effectively controlling plagiarism.
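To make the problem concrete, here is a minimal Python sketch. It is not the method described in the abstract. Instead of building the automaton and enforcing generalised arc consistency, it uses naive exhaustive backtracking, which can take exponential time in the worst case. The sketch rests on two assumptions:

- a corpus is a list of tokens;
- "chunks longer than the max order" means any run of more than `max_order` consecutive tokens copied verbatim from the corpus.

Every such run contains a corpus chunk of exactly `max_order + 1` tokens, so forbidding those chunks is enough. The names `build_transitions`, `forbidden_chunks` and `generate` are hypothetical.

```python
import random
from collections import defaultdict


def build_transitions(corpus, order):
    """Map each length-`order` context to the tokens that follow it in the corpus.

    Duplicates are kept so that sampling reflects corpus frequencies.
    """
    transitions = defaultdict(list)
    for i in range(len(corpus) - order):
        transitions[tuple(corpus[i:i + order])].append(corpus[i + order])
    return transitions


def forbidden_chunks(corpus, max_order):
    """Return every corpus chunk of length max_order + 1.

    Any copied chunk longer than max_order contains one of these, so
    excluding them also excludes every longer copy.
    """
    n = max_order + 1
    return {tuple(corpus[i:i + n]) for i in range(len(corpus) - n + 1)}


def generate(corpus, order, max_order, length, seed=None):
    """Order-`order` random walk that never copies more than `max_order`
    consecutive corpus tokens. Hypothetical sketch using naive backtracking,
    NOT the automaton / arc-consistency method of the paper. Returns None
    when no sequence of the requested length satisfies the constraint.
    """
    if not 1 <= order < max_order:
        raise ValueError("need 1 <= order < max_order")
    if length < order:
        raise ValueError("length must be at least order")
    rng = random.Random(seed)
    transitions = build_transitions(corpus, order)
    forbidden = forbidden_chunks(corpus, max_order)
    window = max_order + 1

    def extend(seq):
        if len(seq) == length:
            return seq
        followers = transitions.get(tuple(seq[-order:]), [])
        # Shuffle the multiset, then keep first occurrences: continuations
        # are tried once each, in a frequency-weighted random order.
        shuffled = followers[:]
        rng.shuffle(shuffled)
        for token in dict.fromkeys(shuffled):
            candidate = seq + [token]
            # Only the newest window can introduce a forbidden chunk.
            if len(candidate) >= window and tuple(candidate[-window:]) in forbidden:
                continue
            result = extend(candidate)
            if result is not None:
                return result
        return None  # dead end: backtrack

    starts = list(transitions)
    rng.shuffle(starts)
    for start in starts:
        result = extend(list(start))
        if result is not None:
            return result
    return None


if __name__ == "__main__":
    corpus = "a b c a c b a b a c c a".split()
    print(generate(corpus, order=1, max_order=2, length=10, seed=0))
```

At each step, only the newest window of `max_order + 1` tokens needs checking, so each extension costs one set lookup. The drawback is that dead ends are discovered only after the walk reaches them. Enforcing arc consistency, as the abstract describes, would instead prune such continuations in advance.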

Keywords: markov constraints, constraints, plagiarism, style, flow machines



BibTeX entry

@INPROCEEDINGS{papadopoulos:14a,
  ADDRESS = "Quebec (Canada)",
  AUTHOR = "Papadopoulos, A. and Roy, P. and Pachet, F.",
  BOOKTITLE = "28th Conference on Artificial Intelligence (AAAI 2014)",
  MONTH = "July",
  PAGES = "2731--2737",
  TITLE = "Avoiding Plagiarism in Markov Sequence Generation",
  YEAR = "2014",
}