187. Repeated DNA Sequences
Description
The DNA sequence is composed of a series of nucleotides abbreviated as 'A', 'C', 'G', and 'T'.
- For example, "ACGAATTCCG" is a DNA sequence.
When studying DNA, it is useful to identify repeated sequences within the DNA.
Given a string s that represents a DNA sequence, return all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.
Example 1:
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output: ["AAAAACCCCC","CCCCCAAAAA"]
Example 2:
Input: s = "AAAAAAAAAAAAA"
Output: ["AAAAAAAAAA"]
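The two examples can be reproduced with a direct count of every 10-letter window. This is only a minimal baseline sketch, not the solution given later; the function name `repeated_windows` and its parameter `k` are hypothetical names chosen for illustration. Note that overlapping windows are counted separately, which is why Example 2 reports a repeat.

```python
from collections import Counter
from typing import List


def repeated_windows(s: str, k: int = 10) -> List[str]:
    """Return every length-k substring of s that occurs more than once."""
    # Count each window; overlapping windows are counted separately.
    counts = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    return [window for window, n in counts.items() if n > 1]


print(repeated_windows("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"))
print(repeated_windows("AAAAAAAAAAAAA"))
```

A string shorter than 10 letters produces no windows, so the function returns an empty list in that case.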
Constraints:
- 1 <= s.length <= 10^5
- s[i] is either 'A', 'C', 'G', or 'T'.
Solution
repeated-dna-sequences.py
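from typing import List


class Solution:
    def findRepeatedDnaSequences(self, s: str) -> List[str]:
        # Map each nucleotide to a base-4 digit (1..4).
        mp = {"A": 1, "C": 2, "G": 3, "T": 4}
        q = (1 << 31) - 1  # modulus for the rolling hash
        h = pow(4, 9, q)  # weight of the oldest letter in a 10-letter window
        seen, res = set(), set()
        t = 0  # rolling hash of the current window
        for i, x in enumerate(s):
            if i >= 10:
                # Remove the letter that just left the window.
                t -= mp[s[i - 10]] * h
            # Shift the window by one digit and append the new letter.
            t = (4 * t + mp[x]) % q
            if i >= 9 and t in seen:
                res.add(s[i - 9: i + 1])
            else:
                seen.add(t)
        return list(res)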
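The loop above keeps `t` equal to the current 10-letter window read as a base-4 number, updating it in constant time per letter. The sketch below checks that reading: it compares the rolling update against a hash recomputed from scratch for every window. The names `CODE`, `K`, `TOP`, `direct_hash`, and `rolling_hashes` are hypothetical and used only for illustration. The modulus is left out here on the assumption that it never changes the value: the largest possible hash, for "TTTTTTTTTT", is 1,398,100, well below 2^31 - 1.

```python
from typing import List

CODE = {"A": 1, "C": 2, "G": 3, "T": 4}  # same digit mapping as mp in the solution
K = 10                                    # window length
TOP = 4 ** (K - 1)                        # weight of the oldest letter (h in the solution)


def direct_hash(window: str) -> int:
    """Hash a window from scratch by reading its letters as base-4 digits."""
    value = 0
    for ch in window:
        value = 4 * value + CODE[ch]
    return value


def rolling_hashes(s: str) -> List[int]:
    """Return the hash of every full window of s, using O(1) updates."""
    hashes = []
    t = 0
    for i, ch in enumerate(s):
        if i >= K:
            t -= CODE[s[i - K]] * TOP  # drop the letter leaving the window
        t = 4 * t + CODE[ch]           # shift one digit and append the new letter
        if i >= K - 1:
            hashes.append(t)
    return hashes


s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
expected = [direct_hash(s[i:i + K]) for i in range(len(s) - K + 1)]
assert rolling_hashes(s) == expected
```

For windows of equal length, subtracting 1 from every digit turns the hash into an ordinary base-4 number plus a fixed constant, so two different windows cannot share a hash. This is why comparing the integers in `seen` is enough, with no follow-up string comparison.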