Search This Blog

Sunday, August 24, 2014

Lyndon Words

Lyndon Words Generation

Lyndon words can be used to generate De Bruijn sequence. To generate a De Bruijn sequence concatenating together, in lexicographic order, all the Lyndon words whose length divides n.

An implementation of the Lyndon words generation algo present in wiki is below.
 
using System;
using System.Collections.Generic;
using System.Linq;
namespace Strings
{
    class Program
    {
        static void Main(string[] args)
        {
            int n = 5, k = 4;
            HashSet<string> result = new HashSet<string>();

            for (int i = 1; i <= k; i++)
                result.Add((char)('A' + i - 1) + "");

            for (int i = 2; i <= n; i++)
                DuvalsAlgo(k, i, ref result);

            foreach (String str in result)
                Console.WriteLine(str);
            Console.ReadKey();
        }

       private static void DuvalsAlgo(int k, int n, ref HashSet<string> result)
        {
            int elements = 0;
            while(elements < result.Count)
            {
                string str = result.ElementAt(elements);
                elements++;
                string val = "";

                for (int i = 0; i < n; i++)
                    val = val + str.ElementAt(i % str.Length);

                if (val.ElementAt(val.Length - 1) < 'A' + k - 1)
                {
                    val = val.Substring(0, val.Length - 1) + (char)(val.ElementAt(val.Length - 1) + 1);
                    if (result.Contains(val))
                        continue;
                    result.Add(val);
                }
            }
        }
    }
}

Explanation

The main concept of the algorithm, is to start at Lyndon words of length 1, keep expanding them for larger lengths.

Say we want to generate Lyndon words of length 2,with alphabets {a, b, c}, then Lyndon words of length 1 are a, b and c.

TO get Lyndon words of length 2,

1. Start with 'a' -> expand to 'aa' -> last char is not 'c', so continue -> substitute for last character in word with next character in sequence 'ab', and that's our Lyndon word.

2. Lets consider we start with 'c',
Start with 'c' -> expand to 'cc' -> last char 'c', so remove last char until last char is not 'c'. At end of this step we will get empty string.


Consider that we want to find the Lyndon word of length 3.

 Start with 'b' -> expand to 'bbb' -> last char is not 'c', so continue -> substitute for last character in word with next character in sequence 'bbc', and that's our Lyndon word.


Standard Factorization

We can use factorization to find the lexicographically minimal string rotation.

     1. The standard factorization of some string S = w_1*w_2*w_3*...*w_n (here * means concatenation of strings).
     2. If you have this factorization then your problem is solved, you just take w_n as a start.
     3. Or if there are w_i equal to w_n you should start with the leftmost such place.
    


Code for standard factorization.


using System;
using System.Linq;

namespace LyndonFactorization
{
   

    class Program
    {
        static void Main(string[] args)
        {
            String str = "mississippi";
            while(str.Length > 0)
            {
                str = getFact(str);
            }
            Console.ReadKey();
        }

            public static string getFact(String str)
            {
            int k = 0, m = 1;

            while (k < str.Length && m < str.Length)
            {
                if(str.ElementAt(k) == str.ElementAt(m))
                {
                    k++; m++;
                }
                else if (str.ElementAt(k) < str.ElementAt(m))
                {
                    k = 0; m++;
                }
                else
                {
                    int index = m - k;
                    Console.WriteLine(str.Substring(0, index));
                    str = str.Substring(index);
                    k = 0;  m = 1;
                }
            }
            if (str.Length > 0)
            {
                int index = m - k;
                Console.WriteLine(str.Substring(0, index));
                str = str.Substring(index);
            }
            return str;
        }
    }
}
 


Explanation
The main concept is to split a string into Lyndon words. We achieve this by always comparing each character in the string to the smallest character in the string.

For a string as 'Mississipi', have k = 0 & m = 1.
Compare then both, and if character at position m is less than at position k, then the Lyndon word ends there.

Mark 'm' as the Lyndon word and remaining string is 'ississipi'.

Now start repeating the process as before, k = 0 & m = 1.
Compare then both, and if character at position m is greater than at position k, increment m to 2. Index of k remains constant so that we know the smallest character.

Doing this we will get 'iss' as the next Lyndon word.
Similarly we can factorize the string.





No comments:

Post a Comment