Crypto-OneTimePad

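One Time Pad (OTP)

What is a One Time Pad

Let's first take a close look at what a One Time Pad is.

Put simply, there is a key string that is as long as the plaintext and the ciphertext. Each position of the plaintext is combined with the corresponding position of the key (by shifting, by XOR, by addition modulo some number, and so on).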
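As a minimal sketch of the XOR variant of this idea (the helper name `xor_bytes` and the sample message are illustrative assumptions, and `secrets.token_bytes` stands in for a truly random pad):

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings position by position."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))

plaintext = b"ALICE"
pad = secrets.token_bytes(len(plaintext))  # fresh key, same length as the message
ciphertext = xor_bytes(plaintext, pad)     # encrypt: plaintext XOR pad
recovered = xor_bytes(ciphertext, pad)     # decrypt: XOR with the same pad again
print(ciphertext.hex(), recovered)
```

Encryption and decryption are the same operation here because XORing with the pad twice cancels out.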

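Security

When encrypting with the Caesar cipher, every letter of the message is shifted along the alphabet by the same amount. The shift can be any number from 1 to 25 (a shift of 26 maps every letter back to itself). For example, if the message we want to encrypt is ALICE, there are at most 26 possible ciphertexts in total, so the message is easy to find by brute-force search.

With a One Time Pad, however, each letter is shifted by a different amount, so each letter can take any of 26 values. The number of possible ciphertexts is therefore 26 to the fifth power, that is 11,881,376. The string formed by these shift amounts is the key for this encryption. It is as long as the ciphertext; in other words, it is a One Time Pad.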
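The contrast can be sketched in a few lines. The function names and the sample pad below are illustrative assumptions, not part of the original text; a real pad would be chosen at random.

```python
import string

ALPHABET = string.ascii_uppercase

def shift_encrypt(message, shifts):
    """Shift each uppercase letter forward by its own amount, modulo 26."""
    return "".join(ALPHABET[(ALPHABET.index(ch) + s) % 26]
                   for ch, s in zip(message, shifts))

def shift_decrypt(cipher, shifts):
    """Undo shift_encrypt by shifting each letter back."""
    return "".join(ALPHABET[(ALPHABET.index(ch) - s) % 26]
                   for ch, s in zip(cipher, shifts))

message = "ALICE"

# Caesar cipher: one shift for the whole message, so at most 26 keys to try.
caesar = shift_encrypt(message, [3] * len(message))
candidates = [shift_decrypt(caesar, [k] * len(caesar)) for k in range(26)]
print(caesar, message in candidates)

# Pad-style encryption: an independent shift per letter (values are illustrative).
pad = [7, 0, 19, 4, 25]
padded = shift_encrypt(message, pad)
print(padded, shift_decrypt(padded, pad))
print(26 ** len(message))  # number of possible 5-letter pads
```

Brute force over the Caesar key always finds the message among 26 candidates, while the pad has 26 choices per letter.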

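This shows that One Time Pad is unconditionally secure: as long as the key is truly random and used only once, a given ciphertext is consistent with every plaintext of the same length, so it reveals nothing except the length.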
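A small sketch of why a single ciphertext tells the attacker nothing: for any guessed plaintext of the right length there is a key that produces exactly that plaintext. The ciphertext bytes and the guesses below are arbitrary values chosen for illustration.

```python
def xor_bytes(a, b):
    """XOR two byte strings position by position."""
    return bytes(x ^ y for x, y in zip(a, b))

ciphertext = bytes.fromhex("1f0a3c5571")  # arbitrary 5 bytes, for illustration only
guesses = [b"ALICE", b"BOBBY", b"HELLO"]

# For every guess there is exactly one key that "decrypts" the ciphertext to it.
keys = {guess: xor_bytes(ciphertext, guess) for guess in guesses}
for guess, key in keys.items():
    print(guess, key.hex(), xor_bytes(ciphertext, key))
```

Without knowing which key was used, none of the guesses can be ruled out.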

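Limitations

Although One Time Pad is the strongest encryption method in theory, it has its own limitations.

There are two best practices for using a One Time Pad. First, each One Time Pad is used only once. Second, the data in each One Time Pad must be truly random.

If a One Time Pad is used more than once, it is at risk of being cracked.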

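Many Time Pad (key reuse)

Reposted from "Many-Time-Pad 攻击" (Many-Time-Pad Attack), ruanx.net

A stream cipher built on plain XOR is insecure unless every key is used only once. This article looks at the case where the same key is used for several encryptions; the plaintexts and the key can then potentially be inferred from the ciphertexts.

Key distribution for a one-time pad is fairly difficult. First, if Alice wants to send Bob a message of length n, she must transmit a key of length n beforehand, which doubles the total amount of data transmitted. Second, although the ciphertext is unconditionally secure, the channel that carries the key may not be; an attacker who eavesdrops on the key can decrypt the ciphertext.

An obvious shortcut suggests itself: Alice creates a fairly long key and passes it to Bob in some very secret way. From then on, every time Alice sends Bob a message, she XORs the plaintext with this agreed string key; when Bob receives the message, he XORs the ciphertext with key and gets the plaintext back. The key only has to be transmitted once, which is very convenient. This approach is called Many-Time-Pad (MTP).

Unfortunately, the MTP approach above is insecure. An attacker who intercepts enough ciphertexts may be able to infer the plaintexts and from them obtain the key. This flaw comes from the properties of the XOR operation.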

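Example

As an example of the MTP attack, consider the following challenge:

BUUCTF: [AFCTF2018]你听过一次一密么？ (Have you heard of the one-time pad?)

(The original challenge has a bug; the author made a few small changes.)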
25030206463d3d393131555f7f1d061d4052111a19544e2e5d54
0f020606150f203f307f5c0a7f24070747130e16545000035d54
1203075429152a7020365c167f390f1013170b1006481e13144e
0f4610170e1e2235787f7853372c0f065752111b15454e0e0901
081543000e1e6f3f3a3348533a270d064a02111a1b5f4e0a1855
0909075412132e247436425332281a1c561f04071d520f0b1158
4116111b101e2170203011113a69001b47520601155205021901
041006064612297020375453342c17545a01451811411a470e44
021311114a5b0335207f7c167f22001b44520c15544801125d40
06140611460c26243c7f5c167f3d015446010053005907145d44
0f05110d160f263f3a7f4210372c03111313090415481d49530f

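Each string Ci above was obtained by XORing a plaintext Mi with some key. Our goal is to recover this key. The plaintexts are known to be English sentences.

Recall the properties of XOR: it is associative and commutative, and every element is its own inverse. These are very nice properties, but they also make things convenient for the attacker, because:

C1⊕C2=(M1⊕key)⊕(M2⊕key)=M1⊕M2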
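The identity can be checked directly. In this sketch the two messages and the random key are made-up values, and `xor_bytes` is a helper introduced only for illustration.

```python
import secrets

def xor_bytes(a, b):
    """XOR two equal-length byte strings position by position."""
    return bytes(x ^ y for x, y in zip(a, b))

m1 = b"attack at dawn"
m2 = b"retreat at six"
key = secrets.token_bytes(len(m1))  # the same key is (wrongly) reused for both messages

c1 = xor_bytes(m1, key)
c2 = xor_bytes(m2, key)

# The key cancels out: the attacker learns M1 xor M2 without knowing the key.
print(xor_bytes(c1, c2) == xor_bytes(m1, m2))
```

Whatever key is drawn, the comparison holds, which is why reusing the key leaks a relation between the plaintexts.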

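This shows that the XOR of two ciphertexts equals the XOR of the corresponding plaintexts. This is a dangerous property: a skilled attacker can break the ciphertexts through frequency analysis. Let's see what we get when C1 is XORed with each of the other ciphertexts. Only English letters are kept below; every other character is replaced with ".".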

....S....N.U.....A..M.N...
...Ro..I...I....SE....P.I.
.E..H...IN..H...........TU
..A.H.R.....E....P......E.
...RT...E...M....M....A.L.
d...V..I..DNEt........K.DU
.......I....K..I.ST...TiS.
.....f...N.I........M.O...
.........N.I...I.S.I..I...
....P....N.OH...SA....Sg..
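The table above can be reproduced with a short dependency-free sketch. The helper names are assumptions made for illustration; the hex strings are the 11 ciphertexts of the challenge, 26 bytes each.

```python
CIPHERTEXTS_HEX = [
    "25030206463d3d393131555f7f1d061d4052111a19544e2e5d54",
    "0f020606150f203f307f5c0a7f24070747130e16545000035d54",
    "1203075429152a7020365c167f390f1013170b1006481e13144e",
    "0f4610170e1e2235787f7853372c0f065752111b15454e0e0901",
    "081543000e1e6f3f3a3348533a270d064a02111a1b5f4e0a1855",
    "0909075412132e247436425332281a1c561f04071d520f0b1158",
    "4116111b101e2170203011113a69001b47520601155205021901",
    "041006064612297020375453342c17545a01451811411a470e44",
    "021311114a5b0335207f7c167f22001b44520c15544801125d40",
    "06140611460c26243c7f5c167f3d015446010053005907145d44",
    "0f05110d160f263f3a7f4210372c03111313090415481d49530f",
]
ciphertexts = [bytes.fromhex(h) for h in CIPHERTEXTS_HEX]

def xor_bytes(a, b):
    """XOR two byte strings position by position."""
    return bytes(x ^ y for x, y in zip(a, b))

def letters_only(data):
    """Keep ASCII letters, replace every other byte with a dot."""
    return "".join(chr(b) if 65 <= b <= 90 or 97 <= b <= 122 else "."
                   for b in data)

# XOR the first ciphertext with each of the other ten.
rows = [letters_only(xor_bytes(ciphertexts[0], other)) for other in ciphertexts[1:]]
print("\n".join(rows))
```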

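We can observe that some columns contain many English letters, while other columns contain none at all. Is this a coincidence?

ASCII table

On Linux, the ASCII table can be viewed with the command man ascii. Its properties include:

0x20 is the space. Everything below 0x20 is a control character with a special purpose; 0x20~0x7E are the printable characters.
0x30~0x39 are the digits 0,1,2...9.
0x41~0x5A are the uppercase letters A-Z; 0x61~0x7A are the lowercase letters a-z.

We can notice a crucial pattern: a lowercase letter XOR a space gives the corresponding uppercase letter, and an uppercase letter XOR a space gives the lowercase letter! So if x⊕y is an English letter, then one of x and y is very likely a space. Now look back at the table of C1 XOR the other ciphertexts, which is also the table of M1 XOR the other plaintexts: if column col contains many English letters, we can guess that M1[col] is a space. The more letters in that column, the more confident we can be.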
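A quick sketch of this rule (the names are illustrative). It also checks a complementary fact: XORing two letters with each other never produces a letter, which is why letters in the table point to a space rather than to a pair of letters. Punctuation can still cause occasional false positives, so the guess is a likelihood, not a certainty.

```python
import string

SPACE = 0x20
LETTERS = string.ascii_letters.encode()  # bytes a-z followed by A-Z

def is_letter(b):
    """True if the byte value b is an ASCII letter."""
    return b in LETTERS

# XOR with a space flips the case of a letter.
flips = {chr(b): chr(b ^ SPACE) for b in b"aZq"}
print(flips)

# No pair of letters XORs to a letter.
letter_pairs_giving_letter = [(x, y) for x in LETTERS for y in LETTERS
                              if is_letter(x ^ y)]
print(len(letter_pairs_giving_letter))
```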

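What use is it to know that position col of M1 is a space? Remember that under XOR, x is its own inverse. So:

Mi[col]=(M1[col]⊕Mi[col])⊕M1[col]=(C1[col]⊕Ci[col])⊕0x20

So as soon as we know that some plaintext has a space at a given position, we can recover that column of every plaintext.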
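A minimal sketch of this column recovery, using three made-up messages and a random key (all names and values here are assumptions for illustration). The first message has a space at column 1, and that single fact is enough to recover column 1 of the others and the key byte at that position.

```python
import secrets

def xor_bytes(a, b):
    """XOR two equal-length byte strings position by position."""
    return bytes(x ^ y for x, y in zip(a, b))

messages = [b"a secret", b"password", b"integers"]
key = secrets.token_bytes(8)              # reused for every message
ciphertexts = [xor_bytes(m, key) for m in messages]

known, col = 0, 1  # assumption: plaintext 0 has a space at column 1

# Mi[col] = (C_known[col] xor Ci[col]) xor 0x20
column = [
    0x20 if i == known else ciphertexts[known][col] ^ c[col] ^ 0x20
    for i, c in enumerate(ciphertexts)
]
key_byte = ciphertexts[known][col] ^ 0x20  # the key at this column falls out too
print(bytes(column), key_byte == key[col])
```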

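Attack

The attack procedure is straightforward: for each ciphertext Ci, XOR it with every other ciphertext. Then count how many English letters appear in each column, and use that count as the score for "Mi has a space at this position".

Once that is done, sort the scores from high to low and, in that order, use each piece of "this plaintext has a space at this position" information to recover that column of every plaintext. If two guesses conflict, discard the one with the lower score. The code is not hard to write: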

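import codecs

import numpy as np
import Crypto.Util.strxor as xo  # from pycryptodome

def isChr(x):
    # True if the byte value x is an ASCII letter
    if ord('a') <= x <= ord('z'): return True
    if ord('A') <= x <= ord('Z'): return True
    return False

def infer(index, pos):
    # Assume plaintext `index` has a space at `pos` and fill in that column.
    # A column that is already filled came from a higher-scoring guess, so keep it.
    if msg[index, pos] != 0:
        return
    msg[index, pos] = ord(' ')
    for x in range(len(c)):
        if x != index:
            msg[x][pos] = xo.strxor(c[x], c[index])[pos] ^ ord(' ')

dat = []

def getSpace():
    # Score every (ciphertext, position) pair by the number of letters
    # that appear at that position when XORed with the other ciphertexts
    for index, x in enumerate(c):
        res = [xo.strxor(x, y) for y in c if x != y]
        f = lambda pos: len(list(filter(isChr, [s[pos] for s in res])))
        for pos in range(len(x)):
            dat.append((f(pos), index, pos))

# Problem.txt holds one hex-encoded ciphertext per line
with open('Problem.txt', 'r') as fh:
    c = [codecs.decode(line.strip().encode(), 'hex') for line in fh if line.strip()]

# msg[i][j] is the recovered byte j of plaintext i (0 means unknown)
msg = np.zeros([len(c), len(c[0])], dtype=int)

getSpace()

# Apply the guesses from the highest score to the lowest
dat = sorted(dat)[::-1]
for w, index, pos in dat:
    infer(index, pos)

print('\n'.join([''.join([chr(v) for v in x]) for x in msg]))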

Running the code gives:

Dear Friend, T%is tim< I u
nderstood my m$stake 8nd u
sed One time p,d encr ptio
n scheme, I he,rd tha- it 
is the only en.ryptio7 met
hod that is ma9hemati:ally
 proven to be #ot cra:ked 
ever if the ke4 is ke)t se
cure, Let Me k#ow if  ou a
gree with me t" use t1is e
ncryption sche e alwa s...

Clearly this is not the final result; a few entries need fixing. Repair "k#ow" to "know" and "alwa s" to "always". The code is as follows:

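def know(index, pos, ch):
    # Set a known plaintext character and recompute that column for every other plaintext
    msg[index, pos] = ord(ch)
    for x in range(len(c)):
        if x != index:
            msg[x][pos] = xo.strxor(c[x], c[index])[pos] ^ ord(ch)

know(10, 21, 'y')  # row 10: "alwa s" -> "always"
know(8, 14, 'n')   # row 8: "k#ow" -> "know"

print('\n'.join([''.join([chr(c) for c in x]) for x in msg]))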

The result is:

Dear Friend, This time I u
nderstood my mistake and u
sed One time pad encryptio
n scheme, I heard that it 
is the only encryption met
hod that is mathematically
 proven to be not cracked 
ever if the key is kept se
cure, Let Me know if you a
gree with me to use this e
ncryption scheme always...

We have successfully recovered the plaintexts! The key is then easy to obtain: just XOR C1 with M1.

key = xo.strxor(c[0], ''.join([chr(c) for c in msg[0]]).encode())
print(key)

# b'afctf{OPT_1s_Int3rest1ng}!'
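As a sanity check, the recovered key can be XORed back onto the ciphertexts without any third-party library. This sketch uses the first and ninth ciphertexts of the challenge; `xor_bytes` is a helper name introduced here for illustration.

```python
def xor_bytes(a, b):
    """XOR two equal-length byte strings position by position."""
    return bytes(x ^ y for x, y in zip(a, b))

key = b"afctf{OPT_1s_Int3rest1ng}!"
c1 = bytes.fromhex("25030206463d3d393131555f7f1d061d4052111a19544e2e5d54")
c9 = bytes.fromhex("021311114a5b0335207f7c167f22001b44520c15544801125d40")

m1 = xor_bytes(c1, key).decode()
m9 = xor_bytes(c9, key).decode()
print(m1)
print(m9)
```

Both lines should match the corresponding rows of the recovered plaintext above.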

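Conclusion

Many-Time-Pad is insecure. The attack shown here needs somewhat demanding conditions: the plaintexts must be English sentences, and enough ciphertexts must be intercepted. But an attacker with enough patience to do word-frequency analysis and to collect a large number of ciphertexts can still mount a very threatening attack. If you are determined to encrypt messages with plain XOR, use a One-Time-Pad, with a fresh key for every message.

Full Python 2 exploit: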

import string
import collections
import sys

# 11 unknown ciphertexts (in hex format), all encrypted with the same key

c1='25030206463d3d393131555f7f1d061d4052111a19544e2e5d'
c2='0f020606150f203f307f5c0a7f24070747130e16545000035d'
c3='1203075429152a7020365c167f390f1013170b1006481e1314'
c4='0f4610170e1e2235787f7853372c0f065752111b15454e0e09'
c5='081543000e1e6f3f3a3348533a270d064a02111a1b5f4e0a18'
c6='0909075412132e247436425332281a1c561f04071d520f0b11'
c7='4116111b101e2170203011113a69001b475206011552050219'
c8='041006064612297020375453342c17545a01451811411a470e'
c9='021311114a5b0335207f7c167f22001b44520c15544801125d'
c10='06140611460c26243c7f5c167f3d015446010053005907145d'
c11='0f05110d160f263f3a7f4210372c03111313090415481d49'
ciphers = [c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11]
# The target ciphertext we want to crack
#target_cipher = "0529242a631234122d2b36697f13272c207f2021283a6b0c7908"

# XORs two string
def strxor(a, b):     # xor two strings (trims the longer input)
    return "".join([chr(ord(x) ^ ord(y)) for (x, y) in zip(a, b)])

def target_fix(target_cipher):
    # To store the final key
    final_key = [None]*150
    # To store the positions we know are broken
    known_key_positions = set()

    # For each ciphertext
    for current_index, ciphertext in enumerate(ciphers):
        counter = collections.Counter()
        # for each other ciphertext
        for index, ciphertext2 in enumerate(ciphers):
            if current_index != index: # don't xor a ciphertext with itself
                for indexOfChar, char in enumerate(strxor(ciphertext.decode('hex'), ciphertext2.decode('hex'))): # Xor the two ciphertexts
                    # If a character in the xored result is an alphabetic character, it means there was probably a space character in one of the plaintexts (we don't know which one)
                    if char in string.printable and char.isalpha(): counter[indexOfChar] += 1 # Increment the counter at this index
        knownSpaceIndexes = []

        # Loop through all positions where a space character was possible in the current_index cipher
        for ind, val in counter.items():
            # If a letter showed up at least 7 times at this index out of the 10 possible XORs, then the space character was likely from the current_index cipher!
            if val >= 7: knownSpaceIndexes.append(ind)
        #print knownSpaceIndexes # Shows all the positions where we now know the key!

        # Now Xor the current_index with spaces, and at the knownSpaceIndexes positions we get the key back!
        xor_with_spaces = strxor(ciphertext.decode('hex'),' '*150)
        for index in knownSpaceIndexes:
            # Store the key's value at the correct position
            final_key[index] = xor_with_spaces[index].encode('hex')
            # Record that we known the key at this position
            known_key_positions.add(index)

    # Construct a hex key from the currently known key, adding in '00' hex chars where we do not know (to make a complete hex string)
    final_key_hex = ''.join([val if val is not None else '00' for val in final_key])
    # Xor the currently known key with the target cipher
    output = strxor(target_cipher.decode('hex'),final_key_hex.decode('hex'))

    print "Fix this sentence:"
    print ''.join([char if index in known_key_positions else '*' for index, char in enumerate(output)])+"\n"

    # MANUAL STEP HERE
    # The output above prints a * wherever the character is not known yet.
    # Fill in the missing characters by hand, e.g. "Let*M**k*ow if *o{*a" -> "cure, Let Me know if you a".
    # If that is too hard, change target_cipher to another ciphertext and try again.
    # The completed sentence then gives the key for the entire text.

    #sys.exit(0) # comment this out and continue once you have a good sentence

    target_plaintext = "cure, Let Me know if you a"
    print "Fixed:"
    print target_plaintext+"\n"

    key = strxor(target_cipher.decode('hex'),target_plaintext)

    print "Decrypted msg:"
    for cipher in ciphers:
        print strxor(cipher.decode('hex'),key)

    print "\nPrivate key recovered: "+key+"\n"
    
# target_plaintext above is the completed plaintext of c9, so c9 is the target
target_fix(c9)