Search This Blog

Tuesday, August 26, 2014

Border and Prefix of a String


Border of String

For string ‘abacabab’, the border is 'ab' as its present at both the start and end of the string.

Explanation

Adjusting the index to reuse identified border of the shorter string str[0..i- 1] when there is a mismatch is important.

For string abacabab i = 6, border/index = 3.

Now to find border at i = 7, since str[3] != str[7], we set index = border[3 - 1] = 1 and find str[1] == str[7]. So we increment index and 2 is the border.

 This works because,
aba is border when i = 6 (aba c aba)
a is border when i = 2 (a b a)
when we cannot expand on aba, we use the border of aba which is a, to expand the matching.


Code

using System;
using System.Linq;
namespace Border
{
    class Program
    {
        static void Main(string[] args)
        {
            String str = " abacabab";
            foreach (int n in getBorder(str))
                Console.WriteLine(n);
            Console.ReadKey();
        }
        private static int[] getBorder(string str)
        {
            int length = str.Length;
            int index = 0;
            int[] borderArr = new int[length];
            for (int i = 1; i < length; i++)
            {
                borderArr[i - 1] = index;
                while (index >= 0 && str.ElementAt(index) != str.ElementAt(i))
                {
                    if (index == 0)
                        index--;
                    else
                        index = borderArr[index - 1];
                }
                index++;
            }
            borderArr[length - 1] = index;
            return borderArr;
        }
    }
}


Prefix array of String

It’s the LCP of a string.

Explanation

Adjusting the index to reuse identified prefix of the shorter string str[0..i- 1] is important.

For string abbabaabbabaa i = 7, backindex = 6 index = 13.

Since the prefix for string starting at i = 6 extended till 13, the prefix for i=7 will be minimum of prefix of string starting at 7-6 --> which is 0, or 13 – 7 which is 6.

Code

using System;
using System.Linq;
 
namespace Prefix
{
   class Program
    {
        static void Main(string[] args)
        {
            String str = "abbabaabbabaaaabbabbaa";
            foreach (int n in getPrefix(str))
                Console.WriteLine(n);
            Console.ReadKey();
        }
 
        private static int[] getPrefix(string str)
        {
            int length = str.Length;
            int index = 0;
            int backindex = 0;
            int[] prefixArr = new int[length];
            prefixArr[0] = length;

            for (int i = 1; i < length; i++)
            {
                if (i < index && prefixArr[i - backindex] != index - i)
                    prefixArr[i] = Math.Min(prefixArr[i - backindex], index - i);
                else
                {
                    index = Math.Max(index, i);
                    backindex = i;
                    while (index < length && str.ElementAt(index) == str.ElementAt(index - backindex))
                        index++;
                    prefixArr[i] = index - backindex;
                }
            }
            return prefixArr;
        }
    }
}

Sunday, August 24, 2014

Lyndon Words

Lyndon Words Generation

Lyndon words can be used to generate De Bruijn sequence. To generate a De Bruijn sequence concatenating together, in lexicographic order, all the Lyndon words whose length divides n.

An implementation of the Lyndon words generation algo present in wiki is below.
 
using System;
using System.Collections.Generic;
using System.Linq;
namespace Strings
{
    class Program
    {
        static void Main(string[] args)
        {
            int n = 5, k = 4;
            HashSet<string> result = new HashSet<string>();

            for (int i = 1; i <= k; i++)
                result.Add((char)('A' + i - 1) + "");

            for (int i = 2; i <= n; i++)
                DuvalsAlgo(k, i, ref result);

            foreach (String str in result)
                Console.WriteLine(str);
            Console.ReadKey();
        }

       private static void DuvalsAlgo(int k, int n, ref HashSet<string> result)
        {
            int elements = 0;
            while(elements < result.Count)
            {
                string str = result.ElementAt(elements);
                elements++;
                string val = "";

                for (int i = 0; i < n; i++)
                    val = val + str.ElementAt(i % str.Length);

                if (val.ElementAt(val.Length - 1) < 'A' + k - 1)
                {
                    val = val.Substring(0, val.Length - 1) + (char)(val.ElementAt(val.Length - 1) + 1);
                    if (result.Contains(val))
                        continue;
                    result.Add(val);
                }
            }
        }
    }
}

Explanation

The main concept of the algorithm, is to start at Lyndon words of length 1, keep expanding them for larger lengths.

Say we want to generate Lyndon words of length 2,with alphabets {a, b, c}, then Lyndon words of length 1 are a, b and c.

TO get Lyndon words of length 2,

1. Start with 'a' -> expand to 'aa' -> last char is not 'c', so continue -> substitute for last character in word with next character in sequence 'ab', and that's our Lyndon word.

2. Lets consider we start with 'c',
Start with 'c' -> expand to 'cc' -> last char 'c', so remove last char until last char is not 'c'. At end of this step we will get empty string.


Consider that we want to find the Lyndon word of length 3.

 Start with 'b' -> expand to 'bbb' -> last char is not 'c', so continue -> substitute for last character in word with next character in sequence 'bbc', and that's our Lyndon word.


Standard Factorization

We can use factorization to find the lexicographically minimal string rotation.

     1. The standard factorization of some string S = w_1*w_2*w_3*...*w_n (here * means concatenation of strings).
     2. If you have this factorization then your problem is solved, you just take w_n as a start.
     3. Or if there are w_i equal to w_n you should start with the leftmost such place.
    


Code for standard factorization.


using System;
using System.Linq;

namespace LyndonFactorization
{
   

    class Program
    {
        static void Main(string[] args)
        {
            String str = "mississippi";
            while(str.Length > 0)
            {
                str = getFact(str);
            }
            Console.ReadKey();
        }

            public static string getFact(String str)
            {
            int k = 0, m = 1;

            while (k < str.Length && m < str.Length)
            {
                if(str.ElementAt(k) == str.ElementAt(m))
                {
                    k++; m++;
                }
                else if (str.ElementAt(k) < str.ElementAt(m))
                {
                    k = 0; m++;
                }
                else
                {
                    int index = m - k;
                    Console.WriteLine(str.Substring(0, index));
                    str = str.Substring(index);
                    k = 0;  m = 1;
                }
            }
            if (str.Length > 0)
            {
                int index = m - k;
                Console.WriteLine(str.Substring(0, index));
                str = str.Substring(index);
            }
            return str;
        }
    }
}
 


Explanation
The main concept is to split a string into Lyndon words. We achieve this by always comparing each character in the string to the smallest character in the string.

For a string as 'Mississipi', have k = 0 & m = 1.
Compare then both, and if character at position m is less than at position k, then the Lyndon word ends there.

Mark 'm' as the Lyndon word and remaining string is 'ississipi'.

Now start repeating the process as before, k = 0 & m = 1.
Compare then both, and if character at position m is greater than at position k, increment m to 2. Index of k remains constant so that we know the smallest character.

Doing this we will get 'iss' as the next Lyndon word.
Similarly we can factorize the string.





Friday, October 15, 2010

Kadane's 2D Algorithm

Kadane's 2D Algorithm

Kadane's 2D algorithm uses Kadane's 1D algo. So i expect anyone to know Kadane's 1D algo.

Kadane's 1D algo, finds the maximum contigous sum in an 1D array in O(n) time complexity.

Now to the 2D algo. Lets consider the problem of finding a maximum sub-matrix from a matrix. Consider the below matrix 'm'-

1 2 -1
-3 -1 -4
1 -5 2

the maximum sub matrix in this example is - [1 2]

To do this programatically, do follow the steps below.

Step1
Create a temp matrix temp[3][3]. Any element in this matrix temp[i][j] is the sum of all temp[1][j], temp[2][j]...temp[i][j].

That's temp[i][j] = temp[1][j]+temp[2][j].+..+temp[i][j].

So our temp matrix is

1 2 -1
-2 1 -5
-1 -4 -3
This has a time complexity of n^2

Step2
Now in this 3 x 3 matrix, the row can be grouped as [1],[1,2],[1,2,3],[2],[2,3],[3].

We can find the sum of these above row combinations using the temp matrix we created as below.

[1] = Row 1 in temp matrix - [1,2,-1]
[1,2] =Row 2 in temp matrix - [-2,1,-5]
[1,2,3] = Row 3 in temp matrix - [-1,-4,-3]
[2] = Row 2 - Row 1 in temp matrix - [-3,-1,-4]
[2,3] = Row 3 - Row 1 in temp matrix - [-2,-6,-2]
[3] = Row 3 - Row 2 in temp matrix - [1,-5,2]

This has a time complexity of n^2

Step3
Now i provide each of the above row to Kadane's 1D algo to find the maximum contigous sub-sequence. After the algo processed all the rows the maximum we have is for row [1] - column 0 to column 1.

So the maximum sub matrix is from m[0][0] to m[0][1] thats [1 2].

The time complexity is n^3.
This solution will work for any matrix including positive and negative numbers.

Thursday, October 14, 2010

Josephus Problem

Josephus Problem

In computer science and mathematics, the Josephus problem (or Josephus permutation) is a theoretical problem related to a certain counting-out game.

There are people standing in a circle waiting to be executed. After the first person is executed, certain number of people are skipped and one person is executed. Then again, people are skipped and a person is executed. The elimination proceeds around the circle (which is becoming smaller and smaller as the executed people are removed), until only the last person remains, who is given freedom.

The task is to choose the place in the initial circle so that you are the last one remaining and so survive.

Lets consider 'n' as the number of people and i execute every 'k'th person. What will be the initial position of the survivor?

There are multiple solutions to this. Lets start with the easy on of using a linked list.

Naive solution

1. Create a circular linked list with 'n' nodes each having info about their position in the list.
2. Start at first element.
3. from current element keep moving to next element until you reach 'k'th element.
4. Remove the 'k'th element from the list.
5. Now 'k + 1'th is current element. Repeat steps 3, 4 and 5 until there is only one element in the list.

Now the space complexity is O(n).

You will walk the list 'k' times to one node delete node. Hence the time complexity is O(nk).

More Efficient Solution

A more efficient solution is to use dynamic programming.

We have the recurrence here

f(n,k) = (f(n-1,k) + k)%n with f(1,k) = 0

How does this recurrence work?

Lets consider that i know the survivor position is 's' for a given number of people 'n' and skip count 'k' where i start counting from position 1.

so if the list is as below

1 2 3.......s.....n-1 n

now i want to find the survivor position for n + 1 elements with same skip 'k'.
So let me introduce one more person 'n + 1' as below.

1 2 3.......s.....n-1 n n+1

If i eliminate the 'n + 1'th person first, then, the next time i start counting 'k' from position '1', which would give me the same result when i had only 'n' persons to count - that's 's'.

But here we are starting at position (n + 1 - k). So if i have to start at position '1' then the result will be (s + k)%n.

Thats how we get the above recurrence.

Hope i am clear in explaining this.
You can also have a look at dicussion on Problem - Kidsgame at TopCoder SRM 315 for different solutions to this problem.

I havent yet decided

I did not know what my blog should be about and that confusion became the name of my blog. Long after creating this blog i have decided to post something technical here. Let me see how long i can write...Lots of chances that this wont even take off :)