Skip to main content

Substring with Concatenation of All Words

Problem

You are given a string s and an array of strings words of the same length. Return all starting indices of substring(s) in s that is a concatenation of each word in words exactly once, in any order, and without any intervening characters.

You can return the answer in any order.

 

Example 1:

Input: s = "barfoothefoobarman", words = ["foo","bar"]
Output: [0,9]
Explanation: Substrings starting at index 0 and 9 are "barfoo" and "foobar" respectively.
The output order does not matter, returning [9,0] is fine too.

Example 2:

Input: s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"]
Output: []

Example 3:

Input: s = "barfoofoobarthefoobarman", words = ["bar","foo","the"]
Output: [6,9,12]

 

Constraints:

  • 1 <= s.length <= 104
  • s consists of lower-case English letters.
  • 1 <= words.length <= 5000
  • 1 <= words[i].length <= 30
  • words[i] consists of lower-case English letters.

Solution

/**
* @param {string} s
* @param {string[]} words
* @return {number[]}
*/
var findSubstring = function(s, words) {
let wordLen = words[0].length; // length of each word in words
const res = [];

const map = {}; // count of each word in words
for (const w of words) {
if (!(w in map)) {
map[w] = 0;
}
map[w]++;
}

const cache = []; // every possible wordLen word in s
for (let i = 0; i <= s.length - wordLen; i++) {
const word = s.substring(i, i + wordLen);
cache.push(word);
}

for (let i = 0; i < s.length; i++) {
let count = 0;
let start = i;
const curMap = {};

// check if i is valid starting index
while (count !== words.length) {
const word = cache[start];
if (!(word in map) || curMap[word] === map[word]) {
break;
}
if (!(word in curMap)) {
curMap[word] = 0;
}
curMap[word]++;
count++;
start += wordLen;
}
if (count === words.length) {
res.push(i);
}
}
return res;
};

We use map to keep track of the word count of each word in words. Next, trying every index i of s as a possible candidate, check if i is a starting index of each word from words concatenated. For optimization, we create cache such that cache[i] contains s[i:wordLen], as this removes the need to use the substring method during our actual check.

Let a be words.length, b be words[0].length, and n be s.length. Our solution takes O(a) time to fill in map, O(nb) time to fill in cache, and O(na) time to fill in res. Thus, our solution has overall runtime of O(na + nb).

Follow-up

Can you come up with an O(a + nb) runtime solution?

Solution

/**
* @param {string} s
* @param {string[]} words
* @return {number[]}
*/
var findSubstring = function(s, words) {
let wordLen = words[0].length; // length of each word in words
const res = [];

// count each word in words
const map = {};
for (const w of words) {
if (!(w in map)) {
map[w] = 0;
}
map[w]++;
}

// find every possible wordLen word in s
const cache = [];
for (let i = 0; i <= s.length - wordLen; i++) {
const word = s.substring(i, i + wordLen);
cache.push(word);
}

// sliding window wordLen number of times
for (let i = 0; i < wordLen; i++) {
let l = i; // left of window
let r = i; // right of window
let curMap = {}; // word count of words in window
let count = 0; // total count of curMap

for (; r < s.length; r += wordLen) {
const rWord = cache[r];
let lWord = cache[l];

// current word is not in words (reset variables and shift l)
if (!(rWord in map)) {
count = 0;
curMap = {};
l = r + wordLen;
continue;
}

// current word is excess word (shift l by one word, don't move r)
if (curMap[rWord] === map[rWord]) {
curMap[lWord]--;
l += wordLen;
r -= wordLen;
count--;
continue;
}

// increment curMap count of current word
if (!(rWord in curMap)) {
curMap[rWord] = 0;
}
curMap[rWord]++;
count++;

// window is valid, update res and shift l by one word
if (count === words.length) {
res.push(l)
curMap[lWord]--;
l += wordLen;
count--;
}
}
}
return res;
};

We will implement a sliding window solution. cache and map are defined the same as our previous solution. For our window, we slide wordLen at a time. As we iterate through s, we slide the right of our window r until:

  1. The current word at r is not in words. In this case, we shift l to the word after the current word as well as reset count and curMap (ie. since any window that includes an invalid word is also an invalid window).
  2. The current word at r causes us to have an excess amount of this word (eg. if word appears once in words, our window is invalid if word appears twice in it). Thus, we shift l one word at a time until the previous occurrence of the excess word is skipped over.
  3. The current word at r is the last word that makes our partial window complete (ie. count == words.length). Hence, we add l to res and shift l one word to the right.

Notice that since we slide wordLen at a time, there are wordLen different starting positions we need to try.

Our solution takes O(a) time to fill in map, O(nb) time to fill in cache, and O(bn / b) = O(n) time to fill in res. Thus, our solution has overall runtime of O(a + nb).