Substring with Concatenation of All Words
Problem
You are given a string s
and an array of strings words
of the same length. Return all starting indices of substring(s) in s
that is a concatenation of each word in words
exactly once, in any order, and without any intervening characters.
You can return the answer in any order.
Example 1:
Input: s = "barfoothefoobarman", words = ["foo","bar"] Output: [0,9] Explanation: Substrings starting at index 0 and 9 are "barfoo" and "foobar" respectively. The output order does not matter, returning [9,0] is fine too.
Example 2:
Input: s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"] Output: []
Example 3:
Input: s = "barfoofoobarthefoobarman", words = ["bar","foo","the"] Output: [6,9,12]
Constraints:
1 <= s.length <= 104
s
consists of lower-case English letters.1 <= words.length <= 5000
1 <= words[i].length <= 30
words[i]
consists of lower-case English letters.
Solution
/**
* @param {string} s
* @param {string[]} words
* @return {number[]}
*/
var findSubstring = function(s, words) {
let wordLen = words[0].length; // length of each word in words
const res = [];
const map = {}; // count of each word in words
for (const w of words) {
if (!(w in map)) {
map[w] = 0;
}
map[w]++;
}
const cache = []; // every possible wordLen word in s
for (let i = 0; i <= s.length - wordLen; i++) {
const word = s.substring(i, i + wordLen);
cache.push(word);
}
for (let i = 0; i < s.length; i++) {
let count = 0;
let start = i;
const curMap = {};
// check if i is valid starting index
while (count !== words.length) {
const word = cache[start];
if (!(word in map) || curMap[word] === map[word]) {
break;
}
if (!(word in curMap)) {
curMap[word] = 0;
}
curMap[word]++;
count++;
start += wordLen;
}
if (count === words.length) {
res.push(i);
}
}
return res;
};
We use map
to keep track of the word count of each word in words
. Next, trying every index i
of s
as a possible candidate, check if i
is a starting index of each word from words
concatenated. For optimization, we create cache
such that cache[i]
contains s[i:wordLen]
, as this removes the need to use the substring
method during our actual check.
Let a
be words.length
, b
be words[0].length
, and n
be s.length
. Our solution takes O(a)
time to fill in map
, O(nb)
time to fill in cache
, and O(na)
time to fill in res
. Thus, our solution has overall runtime of O(na + nb)
.
Follow-up
Can you come up with an O(a + nb)
runtime solution?
Solution
/**
* @param {string} s
* @param {string[]} words
* @return {number[]}
*/
var findSubstring = function(s, words) {
let wordLen = words[0].length; // length of each word in words
const res = [];
// count each word in words
const map = {};
for (const w of words) {
if (!(w in map)) {
map[w] = 0;
}
map[w]++;
}
// find every possible wordLen word in s
const cache = [];
for (let i = 0; i <= s.length - wordLen; i++) {
const word = s.substring(i, i + wordLen);
cache.push(word);
}
// sliding window wordLen number of times
for (let i = 0; i < wordLen; i++) {
let l = i; // left of window
let r = i; // right of window
let curMap = {}; // word count of words in window
let count = 0; // total count of curMap
for (; r < s.length; r += wordLen) {
const rWord = cache[r];
let lWord = cache[l];
// current word is not in words (reset variables and shift l)
if (!(rWord in map)) {
count = 0;
curMap = {};
l = r + wordLen;
continue;
}
// current word is excess word (shift l by one word, don't move r)
if (curMap[rWord] === map[rWord]) {
curMap[lWord]--;
l += wordLen;
r -= wordLen;
count--;
continue;
}
// increment curMap count of current word
if (!(rWord in curMap)) {
curMap[rWord] = 0;
}
curMap[rWord]++;
count++;
// window is valid, update res and shift l by one word
if (count === words.length) {
res.push(l)
curMap[lWord]--;
l += wordLen;
count--;
}
}
}
return res;
};
We will implement a sliding window solution. cache
and map
are defined the same as our previous solution. For our window, we slide wordLen
at a time. As we iterate through s
, we slide the right of our window r
until:
- The current word at
r
is not inwords
. In this case, we shiftl
to the word after the current word as well as resetcount
andcurMap
(ie. since any window that includes an invalid word is also an invalid window). - The current word at
r
causes us to have an excess amount of this word (eg. ifword
appears once inwords
, our window is invalid ifword
appears twice in it). Thus, we shiftl
one word at a time until the previous occurrence of the excess word is skipped over. - The current word at
r
is the last word that makes our partial window complete (ie.count == words.length
). Hence, we addl
tores
and shiftl
one word to the right.
Notice that since we slide wordLen
at a time, there are wordLen
different starting positions we need to try.
Our solution takes O(a)
time to fill in map
, O(nb)
time to fill in cache
, and O(bn / b) = O(n)
time to fill in res
. Thus, our solution has overall runtime of O(a + nb)
.