Strings

A string in computer science is just a list of characters, most often stored in an array. Different languages have different string nuances, so we’ll address them separately.

C

C strings are literally arrays of characters, terminated by a null character (‘\0’), or NULLCHAR. As a result of this null termination requirement, the length of the array must be one greater than the number of characters in your string to accommodate the NULLCHAR.

Because C does not track array size, finding the length of a string is a linear time operation achieved by going down the array and counting the characters until NULLCHAR is reached; in fact, this is how the strlen() library function works. However, you can make it a constant time lookup by storing the size of the string in another variable.

Copying a string is just like copying an array. Assignment (str1 = str2) will not work; instead, copy each character from the original string to the new string. This is how the strcpy() library function works.

Remember to null terminate strings in C; forgetting to is a classic beginner mistake.

C++

C++ allows the use of character arrays as strings but also provides a string class in its standard library (STL), which is the recommended method for string representation.

Java

Java strings are objects of a system class called String. They are a separate type from character arrays, allowing for a greater amount of functionality, although they are stored as character arrays internally. You can convert from String to char[] with the toCharArray() library function, and from char[] to String with the toString() library function in the Java API. Java keeps track of the length of a string for you, which you can access using the length() library function.

Strings are immutable in Java and cannot be modified after construction. Therefore, if you are looking to change a string often, use the StringBuilder or StringBuffer class instead. Both enable mutable strings that can be converted to a String object after you’re done manipulating them. StringBuilder is not synchronized, leading to better performance at the cost of safety in threaded programs, while StringBuffer is synchronized.

In fact, string concatenation (“one” + “two”) in Java is implicitly done using StringBuffer. Although convenient, this can lead to performance issues if you’re unaware of this fact, especially if you are performing string manipulations inside a loop.
ASCII vs Unicode

As the world grows smaller, internationalization (i10n) and localization (l8n) are increasingly important technical issues. With the diversity of languages out there, the traditional ASCII 7-bit character set is no longer adequate to express every language. As a result, Unicode’s most common encoding called UTF-8 is the preferred character set. UTF-8 is popular because it keeps the ASCII characters in a byte and other characters in 4 bytes. This allows the usual English characters to be expressed with minimal overhead (8 bits) and still accommodate more complicated languages with up to 32 bits.

Java and C# have built in support for Unicode character types, and C++ has a standard library class for Unicode strings, whereas C can only handle ASCII characters. Most interviewers will ask you to assume an ASCII character set when solving problems for simplicity, but it is a big plus to know about Unicode.

Practice Question: Escape all % characters in a string; % is the escape character.

Example: “I’d like 2% milk” becomes “I’d like 2%% milk”.

A lot people tend to overcomplicate this problem, but the solution is simple and elegant. Escaping a character usually requires prepending the escape character, which would require that you track the position and then move backwards to prepend. Thankfully, since the escape character and the character to be escaped are the same, you can just append another % character for each % found in the string.

It is usually not be possible to do this in-place since the original unescaped string may not have enough space in the array to store the additional escape characters. Luckily, the problem is lot easier to do with a new string in both Java and C, so it works out quite well to the interviewee’s advantage.

String escapePercent(String input) {
    StringBuilder sb = new StringBuilder();
    char[] str = input.toCharArray();
 
    for (int i = 0; i < input.length(); i++) {
        if (str[i] ==%) sb.append(%);
        sb.append(str[i]);
    }
 
    return new String(sb);
}

Practice Question: Write a function that takes two strings, a needle and a haystack, and returns the index in the haystack where the first instance of the needle occurs. Return -1 if the needle is not found in the haystack.

This problem basically asks you to rewrite the substring method. You should first let the interviewer know that you are aware this functionality exists in most language libraries and would use that implementation normally, but you’d be happy to derive your own solution for the purposes of the interview.

The answer is fairly straightforward; iterate through the haystack until the needle is found by keeping track of how many characters of the needle match so far. Remember to reset the needle counter if a mismatch occurs. Also, once you find the needle, remember to subtract its length + 1 to get the index of the first character of the needle in the haystack.

public static int strtstr(String n, String h) {
    char[] needle = n.toCharArray();
    char[] haystack = h.toCharArray();
    int needleIndex = 0;
 
    for (int i = 0; i < h.length(); i++) {
        if (needle[needleIndex] == haystack[i])
            needleIndex++;
        else {
            // Thanks to Cedric for the correct fix.
            i -= needleIndex;
            needleIndex = 0;
            continue;
        }
 
        if (needleIndex == n.length())
            return i - n.length() + 1;
    }
    return -1;
} 

Practice Question: Remove whitespace from a string in-place, in linear time.

The first idea for this problem would be to create a new string and copy the non-space characters from the original string into it in the order that they’re found from left to right. This algorithm would work great and run in linear time, and would be the accepted answer in Java.

However, if you’re using C, it’s not the most efficient in terms of space since we can do this in-place. In-place string manipulation problems usually involve having two separate array indices referring to two separate positions within the string – this is no different. In fact, it’s the only trick in this question.

So, we need to decide what each array index refers to. Since we’re removing the space character, one array index should reference the current string character and the other index should skip spaces to find the next non-space character. At each increment of the indexes, the character at the ahead index should be placed in the character at the current index. This continues until the end of the string is reached.

void removeWhitespace(char* str) {
    int current = 0;
    int ahead = 0;
 
    while (str[ahead] != '\0') {
        while (str[ahead] == ' ') ahead++;
 
        str[current] = str[ahead];
        current++;
        ahead++;
    }
 
    // Terminate the string with a null char at the new ending.
    str[current] = '\0';
}

Practice Question: Reverse the position of words in a string. Words are separated by spaces only and include punctuation as part of the word.

Example: “Hello world, it’s a beautiful day!” becomes “day! beautiful a it’s world, Hello”.

To start off, we’ll need a token scanner to separate the space-delimited words from the string. This is pretty simple to accomplish by iterating through the string to find each space character.
After you have a token scanner, the easiest solution would be to use it to identify the words, put them on a stack scanning the string from left to right, and then pop them back off to get the words you inserted in reverse order. Even though that solution would work, it requires a lot of additional space, so let’s try to solve this problem in-place.
Reversing the entire string without regard to the words is pretty easy by just swapping characters. Let’s see an example of this: “Hello world, it’s a beautiful day!” becomes “!yad lufituaeb a s’ti ,dlrow olleh”. This is interesting because the words are now in the correct position, just backwards. Perhaps we can use this to our advantage…
It turns out that we just need to reverse each word after reversing the entire string. This can be done easily by using the token scanner to iterate through the string and find the start and end indexes of each word, then reversing those. Separating the reverse string functionality from the reverse word functionality results in a cleaner design.

// Reverse a string in-place in O(n)
void reverseString(char* str, int start, int end)
{
    if (str == NULL) return;
 
    while (start < end) {
        char temp  = str[start];
        str[start]  = str[end];
        str[end] = temp;
 
        start++;
        end--;
    }
}
 
void reverseWords(char* input)
{
    // First reverse the entire string
    reverseString(input, 0, strlen(input) - 1);
 
    // Reverse each word
    int startWord = 0;
    for (int i = 0; i <= strlen(input); i++) {
        if (input[i] == ' ' || input[i] == 0) {
            reverseString(input, startWord, i - 1);
            startWord = i + 1;
        }
    }
}

Practice Question: Convert an ASCII encoded string into an integer.

As usual, you should first mention to the interviewer that this functionality is included in language libraries. In C, this is the atoi() function and in Java, this is the Integer.parseInt() method. This problem is composed of two different sub-problems; each digit in the string must be parsed into an integer from 0-9 and the correct place for it must be calculated.

First, we’ll figure out how to parse a character into an integer. In ASCII, the character 0 is represented as a byte with the value 48. Each successive numeral has a byte value sequential to 48, so 1 is 49, 2 is 50, etc. all the way to 9 being 57. Therefore, to go from character to integer in ASCII, just subtract 48 or ‘0’ from the ASCII byte representation of the character to get the integer value.

Next, we’ll need to figure out how to calculate the correct place value for each number we parse. The ones place starts at the right, followed by the tens place, and so forth, so at first that might seem like the best place to start, from right to left. To accomplish this, multiply the digit by the place value and then multiply the place value by 10 each time you parse the next digit. However, this requires two multiplications – is there a more efficient way?

It turns out the going from left to right not only works, but is more efficient. This is achieved by a loop that multiplies the current value you have scanned so far by 10, then adding the next digit you encounter until all digits have been parsed. For example, parsing 123 yields 1 first. Since there’s another digit, we multiply 1 * 10 = 10 and add 2, the next digit, to get 12. Then, since there’s another digit, we multiply 12 * 10 = 120 and add 3 to get 123! By avoiding having to keep track of the place value, you’ve saved a multiplication per step. This trick is called Homer’s Rule and is often used in computing checksums.

Let’s not forget to handle negative integers. A negative number will have a minus sign, so if the leftmost character in the string is ‘-‘, then we will skip it to avoid parsing it as an integer. Instead, we’ll set a flag that denotes that the number we’re processing is negative, and continue as before. At the end, we’ll make the parsed number negative by multiplying it by -1 if the isNegative flag is set. We need the flag because we can’t multiply by -1 in the beginning, only at the end after the number is fully parsed.

int strToInt (String s) {
    int i = 0; 
    int num = 0;
    int len = str.length();
    boolean isNegative = false;
    char[] str = s.toCharArray();
 
    if (str[0] == '-') {
        isNegative = true;
        i = 1;
    }
 
    while (i < len) {
        num *= 10;
        num += (str[i++] - '0');
    }
 
    if (isNegative) { 
        num *= -1; 
    }
 
    return num;
}

Practice Question: Convert an integer into an ASCII encoded string.

A lot of the same principles from the previous question hold here, too. The C function itoa() and the Java method toString() accomplish this functionality. This problem is composed of two sub-problems; each digit in the integer must be extracted and then parsed into a string.

Parsing a digit into a character is just the reverse of before. Instead of subtracting by 48, just add 48 to the integer value to get the ASCII byte representation of that digit. Again, adding ‘0’ to the integer value will achieve the same effect.

Extracting each digit from the integer is the harder problem. Some people come up with solutions that use logarithms, but since most languages don’t have functions for logs built in, we’ll try to avoid that. Since there’s no way to select a decimal digit from its binary representation in the computer, we’ll have to compute it instead. We’ll try starting from right to left again.

To get the leftmost one’s place of a number, use the modulus operator. For example, 123 modulo 10 yields 3, which is the digit we wanted. Now, how to get the next digit? Integer division seems to do the trick because it truncates: 123 / 10 = 12, which we can then modulo by 10 to get 2, our next digit. However, this method yields the digits in reverse order, so you’ll either have to write the digits backwards using a static sized array or write them and then reverse the string.

Lastly, we again have to deal with negative numbers. The modulus operator doesn’t work correctly with negative values, so we’ll use the same trick as before and just multiply by -1 if the number is negative to make it positive and prepend a ‘-‘ character to the final string.

String intToStr (int num) {
    int i = 0;
    boolean isNegative = false;
 
    char[] tmp = new char[11];
 
    // Check to see if the number is negative.
    if (num < 0) {
        num *= -1;
        isNegative = true;
    }
 
    // Fill buffer with digit characters in reverse order.
    while (num != 0) {
        tmp[i++] = (char) ((num % 10) + '0');
        num /= 10;
    }
 
    StringBuffer b = new StringBuffer();
 
    if (isNegative)
        b.append('-');
 
    while (i > 0) {
        b.append(tmp[--i]);
    }
 
    return b.toString();
}

Practice Question: Print all the possible permutations of a string, including repeats.

[[[ Code Answer ]]]

Practice Question: Given a dictionary of words, print out every word that is an anagram of another word in the dictionary.

First, an anagram is any word that rearranges the letters of another word to form itself. For example, “act” and “cat” are anagrams of each other, as well as “admirer” and “married”.

Next, we need an algorithm to figure out whether two words are anagrams of each other. The obvious solution is to go through all combinations of letters of the first word and compare against the second, but generating all combinations is inefficient. Since anagrams are just rearrangements, the simplest idea seems to be to just sort each word by letter so that if two words are anagrams of each other, their sorted arrangement is identical.

Example: “admirer” sorted by character is adeimrr, and “married” sorted is also adeimrr.

The immediate naïve solution is to compare each word against every other word in the dictionary, checking if they are anagrams of each other by sorting by character and testing for equality. This requires O(n^2) and is a working solution, but we can do a little better.

One place where we can save time is to sort each word only once, saving it into a temporary copy of the dictionary. Then, comparisons are just for equality and don’t require sorting each time, but this algorithm still runs in O(n^2) and now takes twice as much space.

Example dictionary: {cat, dog, sad, act, cab, mat, god }
Copy dictionary and sort by character: { act, dgo, ads, act, abc, amt, dgo }

We can take it one step further and gain real savings from sorting the temporary copy as well, which only costs O(n log n). It then allows you to find anagrams in O(n) time by going down the list and checking neighbors for equality. However, you’ll have to store the index into the original dictionary of the word as well since they’ll get rearranged.

Example dictionary: {cat, dog, sad, act, cab, mat, god }
Copy dictionary along with index and sort by character:
{ (act 0), (dgo 1), (ads 2), (act 3), (abc 4), (amt 5), (dgo 6) }
Sort dictionary: { (abc 4), (act 0), (act 3), (ads 2), (amt 5), (dgo 1), (dgo 6) }
Go through and find matching neighbors, extracting their indices and printing the corresponding word in the dictionary: 0, 3, 1, 6 -> cat act dog god

Now we have an O(n log n) algorithm that requires an extra O(n) of space, so let the interviewer know the performance tradeoffs depending on which is more valuable, space or time.

[[[ Code Answer ]]]

Webfaction is the ULTIMATE hosting platform for any serious developer.
SSH, WordPress, Rails, Django, cronjobs, compile and execute anything.
ProgrammingInterview.com is proudly powered by Webfaction.