
187. Repeated DNA Sequences


Description

A DNA sequence is composed of a series of nucleotides abbreviated as 'A', 'C', 'G', and 'T'.

  • For example, "ACGAATTCCG" is a DNA sequence.
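There are only four possible nucleotides, so each one fits in 2 bits and any 10-letter sequence fits in a 20-bit integer (4^10 = 2^20). Below is a minimal sketch of that packing. It assumes the mapping A=0, C=1, G=2, T=3, which is an arbitrary choice and not part of the problem statement, and it uses a hypothetical helper named `encode`:

```python
# Hypothetical mapping: any assignment of the values 0-3 to the four letters works.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(seq: str) -> int:
    """Pack a DNA string into an integer, 2 bits per nucleotide."""
    value = 0
    for ch in seq:
        # Shift the existing bits left by 2 and append the next nucleotide.
        value = (value << 2) | CODE[ch]
    return value


print(encode("ACGAATTCCG"))  # 99286
```

Two strings of the same length get the same code only if they are equal. A set of 10-letter windows can therefore be stored as small integers instead of strings. Leading 'A's contribute zero bits, so this guarantee does not hold across strings of different lengths.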

When studying DNA, it is useful to identify repeated sequences within the DNA.

Given a string s that represents a DNA sequence, return all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.
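Read literally, the task is to count every 10-letter window of s and keep the windows that appear at least twice. The brute-force sketch below follows that reading using `collections.Counter`. The function name `find_repeated_naive` and the `k` parameter are hypothetical. This is a baseline, not the rolling-hash solution given on this page:

```python
from collections import Counter
from typing import List


def find_repeated_naive(s: str, k: int = 10) -> List[str]:
    # Count every length-k substring (window) of s; empty if len(s) < k.
    counts = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    # Keep the windows that occur more than once.
    return [seq for seq, n in counts.items() if n > 1]


print(sorted(find_repeated_naive("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT")))
# ['AAAAACCCCC', 'CCCCCAAAAA']
print(find_repeated_naive("AAAAAAAAAAAAA"))  # ['AAAAAAAAAA']
```

Each window slice copies k characters, so this takes O(k·n) time and keeps up to n - k + 1 strings in memory.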


Example 1:

Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output: ["AAAAACCCCC","CCCCCAAAAA"]

Example 2:

Input: s = "AAAAAAAAAAAAA"
Output: ["AAAAAAAAAA"]


Constraints:

  • 1 <= s.length <= 10^5
  • s[i] is either 'A', 'C', 'G', or 'T'.

Solution

repeated-dna-sequences.py
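from typing import List


class Solution:
    def findRepeatedDnaSequences(self, s: str) -> List[str]:
        # Map each nucleotide to a base-4 digit (1-4).
        mp = {"A": 1, "C": 2, "G": 3, "T": 4}
        q = (1 << 31) - 1    # modulus for the rolling hash
        h = pow(4, 9, q)     # weight of the leftmost character in a 10-letter window
        seen, res = set(), set()
        t = 0                # rolling hash of the current window

        for i, x in enumerate(s):
            if i >= 10:
                # Remove the character that just left the window.
                t -= mp[s[i - 10]] * h
            # Shift left by one base-4 digit and append the new character.
            t = (4 * t + mp[x]) % q

            if i >= 9:  # compare only complete 10-letter windows
                if t in seen:
                    res.add(s[i - 9:i + 1])
                else:
                    seen.add(t)

        return list(res)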
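The following short argument shows why the rolling hash above never produces a false match. Let d_0 through d_9 be the mapped values (1 to 4, from `mp`) of the characters in a 10-letter window. Let d_new be the value of the character entering the window, and let h = 4^9:

```latex
\begin{aligned}
t &= \sum_{j=0}^{9} d_j\,4^{9-j}, \qquad d_j \in \{1,2,3,4\} \\
t - \tfrac{4^{10}-1}{3} &= \sum_{j=0}^{9} (d_j - 1)\,4^{9-j}, \qquad d_j - 1 \in \{0,1,2,3\} \\
t &\in \Bigl[\tfrac{4^{10}-1}{3},\; \tfrac{4\,(4^{10}-1)}{3}\Bigr] = [349525,\ 1398100], \qquad 1398100 < 2^{31}-1 = q \\
t_{\text{next}} &= 4\,\bigl(t - d_0\,4^{9}\bigr) + d_{\text{new}}
\end{aligned}
```

The second line is an ordinary base-4 numeral with digits 0 to 3, so different windows always get different values of t. The third line shows that t stays below q, so `% q` never changes the value. The fourth line is the update the loop performs: subtract the outgoing character times h, multiply by 4, then add the incoming character. A prefix shorter than 10 letters hashes to at most 4(4^9 - 1)/3 = 349524. That is one below the smallest full-window value, so a partial prefix can never match a full window either.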