Substring with Concatenation of All Words

Problem

You are given a string s and an array of strings words of the same length. Return all starting indices of substring(s) in s that is a concatenation of each word in words exactly once, in any order, and without any intervening characters.

You can return the answer in any order.

Example 1:

Input: s = "barfoothefoobarman", words = ["foo","bar"]
Output: [0,9]
Explanation: Substrings starting at index 0 and 9 are "barfoo" and "foobar" respectively.
The output order does not matter, returning [9,0] is fine too.

Example 2:

Input: s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"]
Output: []

Example 3:

Input: s = "barfoofoobarthefoobarman", words = ["bar","foo","the"]
Output: [6,9,12]

Constraints:

1 <= s.length <= 10⁴
s consists of lower-case English letters.
1 <= words.length <= 5000
1 <= words[i].length <= 30
words[i] consists of lower-case English letters.

Solution

/**
 * @param {string} s
 * @param {string[]} words
 * @return {number[]}
 */
var findSubstring = function(s, words) {
    let wordLen = words[0].length; // length of each word in words
    const res = [];

    const map = {}; // count of each word in words
    for (const w of words) {
        if (!(w in map)) {
            map[w] = 0;
        }
        map[w]++;
    }

    const cache = []; // every possible wordLen word in s
    for (let i = 0; i <= s.length - wordLen; i++) {
        const word = s.substring(i, i + wordLen);
        cache.push(word);
    }

    for (let i = 0; i < s.length; i++) {
        let count = 0;
        let start = i;
        const curMap = {};

        // check if i is valid starting index
        while (count !== words.length) {
            const word = cache[start];
            if (!(word in map) || curMap[word] === map[word]) {
                break;
            }
            if (!(word in curMap)) {
                curMap[word] = 0;
            }
            curMap[word]++;
            count++;
            start += wordLen;
        }
        if (count === words.length) {
            res.push(i);
        }
    }
    return res;
};

We use map to keep track of the word count of each word in words. Next, trying every index i of s as a possible candidate, check if i is a starting index of each word from words concatenated. For optimization, we create cache such that cache[i] contains s[i:wordLen], as this removes the need to use the substring method during our actual check.

Let a be words.length, b be words[0].length, and n be s.length. Our solution takes O(a) time to fill in map, O(nb) time to fill in cache, and O(na) time to fill in res. Thus, our solution has overall runtime of O(na + nb).

Follow-up

Can you come up with an O(a + nb) runtime solution?

Solution

/**
 * @param {string} s
 * @param {string[]} words
 * @return {number[]}
 */
var findSubstring = function(s, words) {
    let wordLen = words[0].length; // length of each word in words
    const res = [];

    // count each word in words
    const map = {};
    for (const w of words) {
        if (!(w in map)) {
            map[w] = 0;
        }
        map[w]++;
    }

    // find every possible wordLen word in s
    const cache = [];
    for (let i = 0; i <= s.length - wordLen; i++) {
        const word = s.substring(i, i + wordLen);
        cache.push(word);
    }

    // sliding window wordLen number of times
    for (let i = 0; i < wordLen; i++) {
        let l = i; // left of window
        let r = i; // right of window
        let curMap = {}; // word count of words in window
        let count = 0; // total count of curMap

        for (; r < s.length; r += wordLen) {
            const rWord = cache[r];
            let lWord = cache[l];

            // current word is not in words (reset variables and shift l)
            if (!(rWord in map)) {
                count = 0;
                curMap = {};
                l = r + wordLen;
                continue;
            }

            // current word is excess word (shift l by one word, don't move r)
            if (curMap[rWord] === map[rWord]) {
                curMap[lWord]--;
                l += wordLen;
                r -= wordLen;
                count--;
                continue;
            }

            // increment curMap count of current word
            if (!(rWord in curMap)) {
                curMap[rWord] = 0;
            }
            curMap[rWord]++;
            count++;

            // window is valid, update res and shift l by one word
            if (count === words.length) {
                res.push(l)
                curMap[lWord]--;
                l += wordLen;
                count--;
            }
        }
    }
    return res;
};

We will implement a sliding window solution. cache and map are defined the same as our previous solution. For our window, we slide wordLen at a time. As we iterate through s, we slide the right of our window r until:

The current word at r is not in words. In this case, we shift l to the word after the current word as well as reset count and curMap (ie. since any window that includes an invalid word is also an invalid window).
The current word at r causes us to have an excess amount of this word (eg. if word appears once in words, our window is invalid if word appears twice in it). Thus, we shift l one word at a time until the previous occurrence of the excess word is skipped over.
The current word at r is the last word that makes our partial window complete (ie. count == words.length). Hence, we add l to res and shift l one word to the right.

Notice that since we slide wordLen at a time, there are wordLen different starting positions we need to try.

Our solution takes O(a) time to fill in map, O(nb) time to fill in cache, and O(bn / b) = O(n) time to fill in res. Thus, our solution has overall runtime of O(a + nb).

Problem​

Solution​

Follow-up​

Solution​

Problem

Solution

Follow-up

Solution