0943-最短超级串

Raphael Liu Lv10

给定一个字符串数组 words,找到以 words 中每个字符串作为子字符串的最短字符串。如果有多个有效最短字符串满足题目条件,返回其中
任意一个 即可。

我们可以假设 words 中没有字符串是 words 中另一个字符串的子字符串。

示例 1:

**输入:** words = ["alex","loves","leetcode"]
**输出:** "alexlovesleetcode"
**解释:** "alex","loves","leetcode" 的所有排列都会被接受。

示例 2:

**输入:** words = ["catg","ctaagt","gcta","ttca","atgcatc"]
**输出:** "gctaagttcatgcatc"

提示:

  • 1 <= words.length <= 12
  • 1 <= words[i].length <= 20
  • words[i] 由小写英文字母组成
  • words 中的所有字符串 互不相同

方法一:动态规划

由于没有一个字符串是另一个字符串的子串,因此最优的方法是:将这些字符串排成一行,对于每两个相邻的字符串,将前一个字符串的后缀和后一个字符串的前缀的重复部分合并,得到的长字符串就是最优的答案。也就是说,如果需要最后的字符串最短,就需要让这些字符串的重复部分最长。

假设我们已经选出了若干个字符串将它们排成一行且合并了重复部分,并且最后一个选出的字符串是 A[i],那么如果我们现在选出一个新的字符串 A[j],那么重复部分的长度会增加 overlap(A[i], A[j]),而与在 A[i] 之前选取了哪些字符串无关。

因此我们可以使用动态规划来解决这个问题。设 dp(mask, i) 表示已经选出的字符串为 maskmask 是一个长度为 A.length 的二进制数,它的第 k 位如果为 1,则表示第 k 个字符串已经选出,否则表示第 k 个字符串没有被选出),且最后一个选出的字符串是 A[i] 时的重复部分的最大长度。在状态转移时,我们枚举下一个选出的字符串 j,就有

dp(mask ^ (1 << j), j) = max{overlap(A[i], A[j]) + dp(mask, i)}

当然 dp(mask, i) 只记录了重复部分的最大长度,要得到这个最大长度对应的字符串,我们还需要记录一下每个状态从哪个状态转移得来,最后通过逆推的方式还原这个字符串。

算法

我们的算法包括三个部分:

  • 预先计算出所有的 overlap(A[i], A[j])

  • 使用动态规划计算出所有的 dp(mask, i),并记录每个状态从哪个状态转移得来,记为 parent

  • 通过 parent 还原这个字符串。

[sol1]
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
class Solution {
public String shortestSuperstring(String[] A) {
int N = A.length;

// Populate overlaps
int[][] overlaps = new int[N][N];
for (int i = 0; i < N; ++i)
for (int j = 0; j < N; ++j) if (i != j) {
int m = Math.min(A[i].length(), A[j].length());
for (int k = m; k >= 0; --k)
if (A[i].endsWith(A[j].substring(0, k))) {
overlaps[i][j] = k;
break;
}
}

// dp[mask][i] = most overlap with mask, ending with ith element
int[][] dp = new int[1<<N][N];
int[][] parent = new int[1<<N][N];
for (int mask = 0; mask < (1<<N); ++mask) {
Arrays.fill(parent[mask], -1);

for (int bit = 0; bit < N; ++bit) if (((mask >> bit) & 1) > 0) {
// Let's try to find dp[mask][bit]. Previously, we had
// a collection of items represented by pmask.
int pmask = mask ^ (1 << bit);
if (pmask == 0) continue;
for (int i = 0; i < N; ++i) if (((pmask >> i) & 1) > 0) {
// For each bit i in pmask, calculate the value
// if we ended with word i, then added word 'bit'.
int val = dp[pmask][i] + overlaps[i][bit];
if (val > dp[mask][bit]) {
dp[mask][bit] = val;
parent[mask][bit] = i;
}
}
}
}

// # Answer will have length sum(len(A[i]) for i) - max(dp[-1])
// Reconstruct answer, first as a sequence 'perm' representing
// the indices of each word from left to right.

int[] perm = new int[N];
boolean[] seen = new boolean[N];
int t = 0;
int mask = (1 << N) - 1;

// p: the last element of perm (last word written left to right)
int p = 0;
for (int j = 0; j < N; ++j)
if (dp[(1<<N) - 1][j] > dp[(1<<N) - 1][p])
p = j;

// Follow parents down backwards path that retains maximum overlap
while (p != -1) {
perm[t++] = p;
seen[p] = true;
int p2 = parent[mask][p];
mask ^= 1 << p;
p = p2;
}

// Reverse perm
for (int i = 0; i < t/2; ++i) {
int v = perm[i];
perm[i] = perm[t-1-i];
perm[t-1-i] = v;
}

// Fill in remaining words not yet added
for (int i = 0; i < N; ++i) if (!seen[i])
perm[t++] = i;

// Reconstruct final answer given perm
StringBuilder ans = new StringBuilder(A[perm[0]]);
for (int i = 1; i < N; ++i) {
int overlap = overlaps[perm[i-1]][perm[i]];
ans.append(A[perm[i]].substring(overlap));
}

return ans.toString();
}
}

[sol1]
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
class Solution(object):
def shortestSuperstring(self, A):
N = len(A)

# Populate overlaps
overlaps = [[0] * N for _ in xrange(N)]
for i, x in enumerate(A):
for j, y in enumerate(A):
if i != j:
for ans in xrange(min(len(x), len(y)), -1, -1):
if x.endswith(y[:ans]):
overlaps[i][j] = ans
break

# dp[mask][i] = most overlap with mask, ending with ith element
dp = [[0] * N for _ in xrange(1<<N)]
parent = [[None] * N for _ in xrange(1<<N)]
for mask in xrange(1, 1 << N):
for bit in xrange(N):
if (mask >> bit) & 1:
# Let's try to find dp[mask][bit]. Previously, we had
# a collection of items represented by pmask.
pmask = mask ^ (1 << bit)
if pmask == 0: continue
for i in xrange(N):
if (pmask >> i) & 1:
# For each bit i in pmask, calculate the value
# if we ended with word i, then added word 'bit'.
value = dp[pmask][i] + overlaps[i][bit]
if value > dp[mask][bit]:
dp[mask][bit] = value
parent[mask][bit] = i

# Answer will have length sum(len(A[i]) for i) - max(dp[-1])
# Reconstruct answer:

# Follow parents down backwards path that retains maximum overlap
perm = []
mask = (1<<N) - 1
i = max(xrange(N), key = dp[-1].__getitem__)
while i is not None:
perm.append(i)
mask, i = mask ^ (1<<i), parent[mask][i]

# Reverse path to get forwards direction; add all remaining words
perm = perm[::-1]
seen = [False] * N
for x in perm:
seen[x] = True
perm.extend([i for i in xrange(N) if not seen[i]])

# Reconstruct answer given perm = word indices in left to right order
ans = [A[perm[0]]]
for i in xrange(1, len(perm)):
overlap = overlaps[perm[i-1]][perm[i]]
ans.append(A[perm[i]][overlap:])

return "".join(ans)

复杂度分析

  • 时间复杂度:O(N^2 * (2^N + W)),其中 N 是字符串的数目,W 是字符串的最大长度。

  • 空间复杂度:O(N * (2^N + W))。

 Comments
On this page
0943-最短超级串