Search This Blog

Tuesday, August 26, 2014

Border and Prefix of a String


Border of String

For string ‘abacabab’, the border is 'ab' as its present at both the start and end of the string.

Explanation

Adjusting the index to reuse identified border of the shorter string str[0..i- 1] when there is a mismatch is important.

For string abacabab i = 6, border/index = 3.

Now to find border at i = 7, since str[3] != str[7], we set index = border[3 - 1] = 1 and find str[1] == str[7]. So we increment index and 2 is the border.

 This works because,
aba is border when i = 6 (aba c aba)
a is border when i = 2 (a b a)
when we cannot expand on aba, we use the border of aba which is a, to expand the matching.


Code

using System;
using System.Linq;
namespace Border
{
    class Program
    {
        static void Main(string[] args)
        {
            String str = " abacabab";
            foreach (int n in getBorder(str))
                Console.WriteLine(n);
            Console.ReadKey();
        }
        private static int[] getBorder(string str)
        {
            int length = str.Length;
            int index = 0;
            int[] borderArr = new int[length];
            for (int i = 1; i < length; i++)
            {
                borderArr[i - 1] = index;
                while (index >= 0 && str.ElementAt(index) != str.ElementAt(i))
                {
                    if (index == 0)
                        index--;
                    else
                        index = borderArr[index - 1];
                }
                index++;
            }
            borderArr[length - 1] = index;
            return borderArr;
        }
    }
}


Prefix array of String

It’s the LCP of a string.

Explanation

Adjusting the index to reuse identified prefix of the shorter string str[0..i- 1] is important.

For string abbabaabbabaa i = 7, backindex = 6 index = 13.

Since the prefix for string starting at i = 6 extended till 13, the prefix for i=7 will be minimum of prefix of string starting at 7-6 --> which is 0, or 13 – 7 which is 6.

Code

using System;
using System.Linq;
 
namespace Prefix
{
   class Program
    {
        static void Main(string[] args)
        {
            String str = "abbabaabbabaaaabbabbaa";
            foreach (int n in getPrefix(str))
                Console.WriteLine(n);
            Console.ReadKey();
        }
 
        private static int[] getPrefix(string str)
        {
            int length = str.Length;
            int index = 0;
            int backindex = 0;
            int[] prefixArr = new int[length];
            prefixArr[0] = length;

            for (int i = 1; i < length; i++)
            {
                if (i < index && prefixArr[i - backindex] != index - i)
                    prefixArr[i] = Math.Min(prefixArr[i - backindex], index - i);
                else
                {
                    index = Math.Max(index, i);
                    backindex = i;
                    while (index < length && str.ElementAt(index) == str.ElementAt(index - backindex))
                        index++;
                    prefixArr[i] = index - backindex;
                }
            }
            return prefixArr;
        }
    }
}

Sunday, August 24, 2014

Lyndon Words

Lyndon Words Generation

Lyndon words can be used to generate De Bruijn sequence. To generate a De Bruijn sequence concatenating together, in lexicographic order, all the Lyndon words whose length divides n.

An implementation of the Lyndon words generation algo present in wiki is below.
 
using System;
using System.Collections.Generic;
using System.Linq;
namespace Strings
{
    class Program
    {
        static void Main(string[] args)
        {
            int n = 5, k = 4;
            HashSet<string> result = new HashSet<string>();

            for (int i = 1; i <= k; i++)
                result.Add((char)('A' + i - 1) + "");

            for (int i = 2; i <= n; i++)
                DuvalsAlgo(k, i, ref result);

            foreach (String str in result)
                Console.WriteLine(str);
            Console.ReadKey();
        }

       private static void DuvalsAlgo(int k, int n, ref HashSet<string> result)
        {
            int elements = 0;
            while(elements < result.Count)
            {
                string str = result.ElementAt(elements);
                elements++;
                string val = "";

                for (int i = 0; i < n; i++)
                    val = val + str.ElementAt(i % str.Length);

                if (val.ElementAt(val.Length - 1) < 'A' + k - 1)
                {
                    val = val.Substring(0, val.Length - 1) + (char)(val.ElementAt(val.Length - 1) + 1);
                    if (result.Contains(val))
                        continue;
                    result.Add(val);
                }
            }
        }
    }
}

Explanation

The main concept of the algorithm, is to start at Lyndon words of length 1, keep expanding them for larger lengths.

Say we want to generate Lyndon words of length 2,with alphabets {a, b, c}, then Lyndon words of length 1 are a, b and c.

TO get Lyndon words of length 2,

1. Start with 'a' -> expand to 'aa' -> last char is not 'c', so continue -> substitute for last character in word with next character in sequence 'ab', and that's our Lyndon word.

2. Lets consider we start with 'c',
Start with 'c' -> expand to 'cc' -> last char 'c', so remove last char until last char is not 'c'. At end of this step we will get empty string.


Consider that we want to find the Lyndon word of length 3.

 Start with 'b' -> expand to 'bbb' -> last char is not 'c', so continue -> substitute for last character in word with next character in sequence 'bbc', and that's our Lyndon word.


Standard Factorization

We can use factorization to find the lexicographically minimal string rotation.

     1. The standard factorization of some string S = w_1*w_2*w_3*...*w_n (here * means concatenation of strings).
     2. If you have this factorization then your problem is solved, you just take w_n as a start.
     3. Or if there are w_i equal to w_n you should start with the leftmost such place.
    


Code for standard factorization.


using System;
using System.Linq;

namespace LyndonFactorization
{
   

    class Program
    {
        static void Main(string[] args)
        {
            String str = "mississippi";
            while(str.Length > 0)
            {
                str = getFact(str);
            }
            Console.ReadKey();
        }

            public static string getFact(String str)
            {
            int k = 0, m = 1;

            while (k < str.Length && m < str.Length)
            {
                if(str.ElementAt(k) == str.ElementAt(m))
                {
                    k++; m++;
                }
                else if (str.ElementAt(k) < str.ElementAt(m))
                {
                    k = 0; m++;
                }
                else
                {
                    int index = m - k;
                    Console.WriteLine(str.Substring(0, index));
                    str = str.Substring(index);
                    k = 0;  m = 1;
                }
            }
            if (str.Length > 0)
            {
                int index = m - k;
                Console.WriteLine(str.Substring(0, index));
                str = str.Substring(index);
            }
            return str;
        }
    }
}
 


Explanation
The main concept is to split a string into Lyndon words. We achieve this by always comparing each character in the string to the smallest character in the string.

For a string as 'Mississipi', have k = 0 & m = 1.
Compare then both, and if character at position m is less than at position k, then the Lyndon word ends there.

Mark 'm' as the Lyndon word and remaining string is 'ississipi'.

Now start repeating the process as before, k = 0 & m = 1.
Compare then both, and if character at position m is greater than at position k, increment m to 2. Index of k remains constant so that we know the smallest character.

Doing this we will get 'iss' as the next Lyndon word.
Similarly we can factorize the string.