For a long time I assumed PEM simply stored a key; later I found that besides keys it can hold all sorts of other odd information.

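First, briefly, what PEM is: a block of text like the key shown later in this post, which begins with -----BEGIN <TAG>----- and ends with -----END <TAG>-----, with a run of Base64-encoded binary in between, wrapped with a newline every 64 characters (48 bytes after decoding). Decoded, the Base64 body is DER-encoded data following the ASN.1 specification. Roughly speaking, DER is a serialization: it packs the integers, strings, and other fields of a structure into a binary form that is easy to transmit.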
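To make the layout concrete, here is a minimal sketch that strips the BEGIN/END lines and Base64-decodes the body using only the standard library. The helper name pem_to_der and the tiny DEMO block are made up for illustration, and the sketch assumes an unencrypted PEM with no extra header lines.

```python
import base64

def pem_to_der(pem_text):
    """Return (tag, der_bytes) for a single unencrypted PEM block."""
    lines = [ln.strip() for ln in pem_text.strip().splitlines()]
    if not (lines[0].startswith("-----BEGIN ") and lines[-1].startswith("-----END ")):
        raise ValueError("not a PEM block")
    # The tag is whatever sits between "-----BEGIN " and the trailing "-----"
    tag = lines[0][len("-----BEGIN "):-len("-----")]
    der = base64.b64decode("".join(lines[1:-1]))
    return tag, der

# Hypothetical toy block: the DER bytes 30 03 02 01 00 (a SEQUENCE holding INTEGER 0)
sample = "-----BEGIN DEMO-----\nMAMCAQA=\n-----END DEMO-----"
tag, der = pem_to_der(sample)
print(tag, der.hex())  # DEMO 3003020100
```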

Below I use an RSA key pair as the example. It can be generated with PyCrypto, or with openssl (omitted here):

from Crypto.PublicKey import RSA

# Generate a 1024-bit key pair and export both halves as PEM (bytes)
rsa = RSA.generate(1024)
sk = rsa.exportKey()
pk = rsa.publickey().exportKey()

with open('./pub.pem', 'wb') as f:
    f.write(pk)

with open('./priv.pem', 'wb') as f:
    f.write(sk)

RSA private key·

-----BEGIN RSA PRIVATE KEY-----
MIICXQIBAAKBgQCg0VTVv5fED3eXtEgZ0Jxgj6S1w45w2DvBMmcTjG7/TBqs7+Pd
tXHhtB2RHHq2E2z5BJMYlWNFDh9CcMq7xCB8VMTae4SiAxHPu6voK5/mC99IoI1X
g50M35Rk2EJivMBrwwgJWmmH9grQfWaaMStafkEzITeI7s8lhjJIuRNJ7wIDAQAB
AoGAD4JwxJaQO/Pj7EkSRQ8V7cgcsfz0sVRhWu4R+9Qo5k1AK1qNZtX3cDWPPm35
NbMk6NU0nIPXyZKlmCJJoxc0rLHbGcTI2CkmdRS8Hve7++JC1DUPZ6ACpW0z5W0a
lK3HHGjwINw5q30AZMERsWTia6BpjclKA839UW/9lm6HeUkCQQDKl+ScBYI3+W6Z
EYzjg/kZEsuhFj3pI2GB/3VO8+8aJg+sjS2a7oZtUai2g2mDsFz4UOeGKJtoWZJb
yGlfxnxHAkEAyzYwqv/8spYH8IM9x/BcFD7pL63+l12kz2cZ5xImvuclYuhjEyii
XXNRUHqNQ8EpWrbqJCtgoosQkjOpg/QhGQJAG0oypUGotNmIqF3Q2KTiXRpHC7/v
PwRhEh3TM3twbdlKqzepOQGAYiFp1IwHHpIXM+vSBCRcKsZGDM8GQrx96QJBAI2f
RKfII+qqWPor3SC8yM9rUMRj9Ky1HKlW51x87/fXy9x0rKeriAys05zM7CquMg4A
sIlomb5uQKxDyP4nY/ECQQDGfKbZiPU6vqghWUMaFGUSqNlCl41Kj4Py1CbxCV47
8bW5uLHMu60qMcZAGIBEekX14HkCaQYawTtfaPF3fX8H
-----END RSA PRIVATE KEY-----

Above is an RSA private key I generated. The Base64 itself tells you nothing, so first convert it to binary (hex):

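import base64

import libnum  # third-party helper; s2n converts a byte string to an integer

with open('./priv.pem', 'r') as f:
    data = f.read()

# Drop the BEGIN/END lines and join the Base64 body into one string
key_64 = ''.join(data.strip().split('\n')[1:-1])
key_num = libnum.s2n(base64.b64decode(key_64))
key_hex = hex(key_num)[2:]
print(key_hex)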

'''
3082025d02010002818100a0d154d5bf97c40f7797b44819d09c608fa4b5c38e70d83bc13267138c6eff4c1aacefe3ddb571e1b41d911c7ab6136cf90493189563450e1f4270cabbc4207c54c4da7b84a20311cfbbabe82b9fe60bdf48a08d57839d0cdf9464d84262bcc06bc308095a6987f60ad07d669a312b5a7e4133213788eecf25863248b91349ef02030100010281800f8270c496903bf3e3ec4912450f15edc81cb1fcf4b154615aee11fbd428e64d402b5a8d66d5f770358f3e6df935b324e8d5349c83d7c992a5982249a31734acb1db19c4c8d829267514bc1ef7bbfbe242d4350f67a002a56d33e56d1a94adc71c68f020dc39ab7d0064c111b164e26ba0698dc94a03cdfd516ffd966e877949024100ca97e49c058237f96e99118ce383f91912cba1163de9236181ff754ef3ef1a260fac8d2d9aee866d51a8b6836983b05cf850e786289b6859925bc8695fc67c47024100cb3630aafffcb29607f0833dc7f05c143ee92fadfe975da4cf6719e71226bee72562e8631328a25d7351507a8d43c1295ab6ea242b60a28b109233a983f4211902401b4a32a541a8b4d988a85dd0d8a4e25d1a470bbfef3f0461121dd3337b706dd94aab37a9390180622169d48c071e921733ebd204245c2ac6460ccf0642bc7de90241008d9f44a7c823eaaa58fa2bdd20bcc8cf6b50c463f4acb51ca956e75c7ceff7d7cbdc74aca7ab880cacd39cccec2aae320e00b0896899be6e40ac43c8fe2763f1024100c67ca6d988f53abea82159431a146512a8d942978d4a8f83f2d426f1095e3bf1b5b9b8b1ccbbad2a31c6401880447a45f5e0790269061ac13b5f68f1777d7f07
'''

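Staring at the hex does not reveal much either. But since I generated the key with PyCrypto, I can trace the generating code to find out which format it uses. The source file is .../site-packages/Crypto/PublicKey/RSA.py under the Python library path (PS: this post uses pycryptodome 3.9.9). The main logic for exporting the private key is in the export_key function of the RsaKey class (line 225 if you are on the same version as me):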

class RsaKey(object):
    ... ...
    def export_key(self, format='PEM', passphrase=None, pkcs=1,
                   protection=None, randfunc=None):
        ... ...
        # DER format is always used, even in case of PEM, which simply
        # encodes it into BASE64.
        if self.has_private():
            binary_key = DerSequence([0,
                                      self.n,
                                      self.e,
                                      self.d,
                                      self.p,
                                      self.q,
                                      self.d % (self.p-1),
                                      self.d % (self.q-1),
                                      Integer(self.q).inverse(self.p)
                                      ]).encode()
            if pkcs == 1:
                key_type = 'RSA PRIVATE KEY'
                ... ...
            ... ...
        if format == 'PEM':
            from Crypto.IO import PEM

            pem_str = PEM.encode(binary_key, key_type, passphrase, randfunc)
            return tobytes(pem_str)
        ... ...

Reading from the end backwards: the value finally returned comes from PEM.encode, so first see what PEM.encode does. It is the encode function in .../site-packages/Crypto/IO/PEM.py:

def encode(data, marker, passphrase=None, randfunc=None):
    ... ...
    out = "-----BEGIN %s-----\n" % marker
    ... ...

    # Each BASE64 line can take up to 64 characters (=48 bytes of data)
    # b2a_base64 adds a new line character!
    chunks = [tostr(b2a_base64(data[i:i + 48]))
              for i in range(0, len(data), 48)]
    out += "".join(chunks)
    out += "-----END %s-----" % marker
    return out

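PEM.encode merely encodes every 48 bytes into one line of Base64 and attaches the BEGIN and END lines; it is not the key function. What matters is how the input data is produced.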
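The same wrapping can be sketched with the standard library alone. This is only an illustration of the 48-bytes-per-line rule for the unencrypted case, not the pycryptodome implementation; pem_encode and the DEMO marker are names invented here.

```python
import base64

def pem_encode(der, marker):
    """Wrap DER bytes in PEM armor: 48 raw bytes become one 64-character line."""
    lines = [base64.b64encode(der[i:i + 48]).decode("ascii")
             for i in range(0, len(der), 48)]
    return ("-----BEGIN %s-----\n" % marker
            + "\n".join(lines)
            + "\n-----END %s-----" % marker)

# 100 zero bytes split into chunks of 48, 48 and 4 bytes
demo = pem_encode(bytes(100), "DEMO")
body = demo.splitlines()[1:-1]
print([len(line) for line in body])  # [64, 64, 8]
```

Because 48 is a multiple of 3, every full chunk encodes to exactly 64 characters with no padding, so decoding the joined lines gives back the original bytes.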

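So keep looking upward: the input data is produced by DerSequence from the list [0, n, e, d, ...]. If you are familiar with it, this is also the order openssl prints when it reads an RSA private key (try openssl rsa -in priv.pem -text). The order is defined in RFC 3447: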

RSAPrivateKey ::= SEQUENCE {
    version           Version,
    modulus           INTEGER,  -- n
    publicExponent    INTEGER,  -- e
    privateExponent   INTEGER,  -- d
    prime1            INTEGER,  -- p
    prime2            INTEGER,  -- q
    exponent1         INTEGER,  -- d mod (p-1)
    exponent2         INTEGER,  -- d mod (q-1)
    coefficient       INTEGER,  -- (inverse of q) mod p
    otherPrimeInfos   OtherPrimeInfos OPTIONAL
}
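As a small worked example of how the nine fields relate, the sketch below derives them from toy primes. The function name and the numbers (p=61, q=53, e=17) are illustrative only, and d is computed modulo (p-1)(q-1) as an assumption; a real library may derive d differently and would of course use far larger primes.

```python
def rsa_private_fields(p, q, e):
    """Return the RSAPrivateKey fields in RFC 3447 order for a two-prime key."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse, needs Python 3.8+
    return [
        0,               # version: two-prime
        n,               # modulus
        e,               # publicExponent
        d,               # privateExponent
        p,               # prime1
        q,               # prime2
        d % (p - 1),     # exponent1
        d % (q - 1),     # exponent2
        pow(q, -1, p),   # coefficient: (inverse of q) mod p
    ]

fields = rsa_private_fields(61, 53, 17)
print(fields)  # [0, 3233, 17, 2753, 61, 53, 53, 49, 38]
```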

Here a Version of 0 means ordinary two-prime RSA, while 1 means multi-prime RSA:

Version ::= INTEGER { two-prime(0), multi(1) }
    (CONSTRAINED BY
    {-- version must be multi if otherPrimeInfos present --})

Next, trace into DerSequence, in .../site-packages/Crypto/Util/asn1.py (line 344):

class DerSequence(DerObject):
    ... ...
    def encode(self):
        """Return this DER SEQUENCE, fully encoded as a
        binary string.
        """
        self.payload = b''
        for item in self._seq:
            if byte_string(item):
                self.payload += item
            elif _is_number(item):
                self.payload += DerInteger(item).encode()
            else:
                self.payload += item.encode()
        return DerObject.encode(self)

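The encode function sorts each item of the input sequence into three cases. All but the number case should be clear from the code; numbers additionally go through DerInteger(item), so DerInteger needs tracing too, in the same file (line 249):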

class DerInteger(DerObject):
    ... ...
    def encode(self):
        """Return the DER INTEGER, fully encoded as a
        binary string."""

        number = self.value
        self.payload = b''
        while True:
            self.payload = bchr(int(number & 255)) + self.payload
            if 128 <= number <= 255:
                self.payload = bchr(0x00) + self.payload
            if -128 <= number <= 255:
                break
            number >>= 8
        return DerObject.encode(self)
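As a cross-check on the loop above, the content bytes of a non-negative DER INTEGER can be restated in a few lines of standard-library Python. This is a sketch rather than the library code: int_content is a made-up name and negative numbers are deliberately left out.

```python
def int_content(value):
    """Big-endian bytes of a non-negative integer, DER style.

    A leading 0x00 is added when the top bit of the first byte is set,
    so the value is not mistaken for a negative two's-complement number.
    """
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    raw = value.to_bytes(max(1, (value.bit_length() + 7) // 8), "big")
    if raw[0] & 0x80:
        raw = b"\x00" + raw
    return raw

print(int_content(0).hex())       # 00
print(int_content(65537).hex())   # 010001
print(int_content(0xa0d1).hex())  # 00a0d1
```

The extra zero byte is why a 1024-bit modulus whose first byte is 0xa0 takes 0x81 bytes, while a value whose first byte is below 0x80 takes one byte fewer.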

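At a glance this converts a number to bytes: it emits the value big-endian, one byte at a time, and prepends 0x00 when the most significant byte is between 128 and 255 so that a positive number is not read as negative. The final encoding is done by DerObject.encode, and the DerSequence encode above also ends in DerObject.encode, so trace into DerObject.encode, again in the same file (line 165):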

class DerObject(object):
    ... ...
    def encode(self):
        """Return this DER element, fully encoded as a binary byte string."""

        # Concatenate identifier octets, length octets,
        # and contents octets

        output_payload = self.payload

        ... ...

        return (bchr(self._tag_octet) +
                self._definite_form(len(output_payload)) +
                output_payload)

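Just look at what is returned: the format is <tag> + <length> + <payload>. The payload is built by the calling function, so it needs no attention here (already reversed above). The tag is the ASN.1 type tag (see the Microsoft page on encoded tag bytes in the references); for example, 0x30 means Sequence and 0x02 means Integer. The length is the length of the payload, but it is first formatted by _definite_form, so keep tracing, still in the same file (line 156):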

def _definite_form(length):
    """Build length octets according to BER/DER
    definite form.
    """
    if length > 127:
        encoding = long_to_bytes(length)
        return bchr(len(encoding) + 128) + encoding
    return bchr(length)

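Roughly: if the length is at most 127 (that is, the top bit of the byte is not set), it is returned as a single byte. If it exceeds 127, the first byte has its top bit set to 1 and its low 7 bits hold x, the number of bytes needed to store the length; the following x bytes then store the length itself. For example, a length of 0x0100 needs 2 bytes and is encoded as 0x820100; a length of 0xf0 has its top bit set, so it cannot be stored directly, takes 1 byte, and is encoded as 0x81f0.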
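The rule can be checked against those examples with a short standard-library sketch. definite_length is a name chosen here; it mirrors the behavior described above rather than copying the library function.

```python
def definite_length(length):
    """Encode a length in BER/DER definite form."""
    if length < 0x80:
        # Short form: a single byte with the top bit clear
        return bytes([length])
    # Long form: 0x80 | byte count, followed by the big-endian length
    raw = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(raw)]) + raw

print(definite_length(0x7f).hex())    # 7f
print(definite_length(0xf0).hex())    # 81f0
print(definite_length(0x0100).hex())  # 820100
print(definite_length(0x025d).hex())  # 82025d
```

The last value is the length field that follows the leading 30 in the private key dumped earlier.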

Dissecting the RSA private key by hand·

With the reversing above done, we can start tearing the key apart. First, the binary dumped earlier:

3082025d02010002818100a0d154d5bf97c40f7797b44819d09c608fa4b5c38e70d83bc13267138c6eff4c1aacefe3ddb571e1b41d911c7ab6136cf90493189563450e1f4270cabbc4207c54c4da7b84a20311cfbbabe82b9fe60bdf48a08d57839d0cdf9464d84262bcc06bc308095a6987f60ad07d669a312b5a7e4133213788eecf25863248b91349ef02030100010281800f8270c496903bf3e3ec4912450f15edc81cb1fcf4b154615aee11fbd428e64d402b5a8d66d5f770358f3e6df935b324e8d5349c83d7c992a5982249a31734acb1db19c4c8d829267514bc1ef7bbfbe242d4350f67a002a56d33e56d1a94adc71c68f020dc39ab7d0064c111b164e26ba0698dc94a03cdfd516ffd966e877949024100ca97e49c058237f96e99118ce383f91912cba1163de9236181ff754ef3ef1a260fac8d2d9aee866d51a8b6836983b05cf850e786289b6859925bc8695fc67c47024100cb3630aafffcb29607f0833dc7f05c143ee92fadfe975da4cf6719e71226bee72562e8631328a25d7351507a8d43c1295ab6ea242b60a28b109233a983f4211902401b4a32a541a8b4d988a85dd0d8a4e25d1a470bbfef3f0461121dd3337b706dd94aab37a9390180622169d48c071e921733ebd204245c2ac6460ccf0642bc7de90241008d9f44a7c823eaaa58fa2bdd20bcc8cf6b50c463f4acb51ca956e75c7ceff7d7cbdc74aca7ab880cacd39cccec2aae320e00b0896899be6e40ac43c8fe2763f1024100c67ca6d988f53abea82159431a146512a8d942978d4a8f83f2d426f1095e3bf1b5b9b8b1ccbbad2a31c6401880447a45f5e0790269061ac13b5f68f1777d7f07

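30 is the tag of a Sequence, and 82 says that the next two bytes hold the length of this Sequence, i.e. 0x025d bytes, which is everything that remains. The following 020100 is the integer 0: 02 is the Integer tag, 01 says the integer takes 1 byte, and 00 is the value. The same method decodes 02818100a0... and the other integers after it (the generated private-key PEM in fact contains nothing but integers). The result looks roughly like this: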

3082025d  	# Begin Sequence: len=0x025d

0201 # Version: (len=0x01)
00

028181 # n: (len=0x81)
00a0d154d5bf97c40f7797b44819d09c608fa4b5c38e70d83bc13267138c6eff4c1aacefe3ddb571e1b41d911c7ab6136cf90493189563450e1f4270cabbc4207c54c4da7b84a20311cfbbabe82b9fe60bdf48a08d57839d0cdf9464d84262bcc06bc308095a6987f60ad07d669a312b5a7e4133213788eecf25863248b91349ef

0203 # e: (len=0x03)
010001

028180 # d: (len=0x80)
0f8270c496903bf3e3ec4912450f15edc81cb1fcf4b154615aee11fbd428e64d402b5a8d66d5f770358f3e6df935b324e8d5349c83d7c992a5982249a31734acb1db19c4c8d829267514bc1ef7bbfbe242d4350f67a002a56d33e56d1a94adc71c68f020dc39ab7d0064c111b164e26ba0698dc94a03cdfd516ffd966e877949

0241 # p: (len=0x41)
00ca97e49c058237f96e99118ce383f91912cba1163de9236181ff754ef3ef1a260fac8d2d9aee866d51a8b6836983b05cf850e786289b6859925bc8695fc67c47

0241 # q: (len=0x41)
00cb3630aafffcb29607f0833dc7f05c143ee92fadfe975da4cf6719e71226bee72562e8631328a25d7351507a8d43c1295ab6ea242b60a28b109233a983f42119

0240 # d mod (p-1): (len=0x40)
1b4a32a541a8b4d988a85dd0d8a4e25d1a470bbfef3f0461121dd3337b706dd94aab37a9390180622169d48c071e921733ebd204245c2ac6460ccf0642bc7de9

0241 # d mod (q-1): (len=0x41)
008d9f44a7c823eaaa58fa2bdd20bcc8cf6b50c463f4acb51ca956e75c7ceff7d7cbdc74aca7ab880cacd39cccec2aae320e00b0896899be6e40ac43c8fe2763f1

0241 # (inverse of q) mod p: (len=0x41)
00c67ca6d988f53abea82159431a146512a8d942978d4a8f83f2d426f1095e3bf1b5b9b8b1ccbbad2a31c6401880447a45f5e0790269061ac13b5f68f1777d7f07

# End Sequence

Alternatively, you can from Crypto.Util.asn1 import DerSequence, DerInteger and decode with PyCrypto (omitted).
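The manual walk above can also be automated without any third-party library. The sketch below is a standard-library alternative, not the DerSequence route just mentioned; read_tlv and parse_integer_sequence are hypothetical helper names, and it assumes a well-formed SEQUENCE that contains only INTEGERs, which is the case for the private key here.

```python
def read_tlv(buf, pos):
    """Return (tag, value_bytes, next_pos) for the element starting at buf[pos]."""
    tag = buf[pos]
    first = buf[pos + 1]
    pos += 2
    if first & 0x80:
        # Long form: the low 7 bits give the number of length bytes
        count = first & 0x7F
        length = int.from_bytes(buf[pos:pos + count], "big")
        pos += count
    else:
        length = first
    return tag, buf[pos:pos + length], pos + length

def parse_integer_sequence(der):
    """Decode a DER SEQUENCE (tag 0x30) whose members are all INTEGERs (tag 0x02)."""
    tag, body, _ = read_tlv(der, 0)
    if tag != 0x30:
        raise ValueError("expected a SEQUENCE")
    values, pos = [], 0
    while pos < len(body):
        tag, value, pos = read_tlv(body, pos)
        if tag != 0x02:
            raise ValueError("expected an INTEGER")
        values.append(int.from_bytes(value, "big", signed=True))
    return values

# Toy input: SEQUENCE { INTEGER 0, INTEGER 65537 }
toy = bytes.fromhex("30080201000203010001")
print(parse_integer_sequence(toy))  # [0, 65537]
```

Fed the DER bytes of priv.pem, the same function should return the nine integers in the order version, n, e, d, p, q, d mod (p-1), d mod (q-1), coefficient.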

RSA public key·

The public key is similar. Start with .../site-packages/Crypto/PublicKey/RSA.py (line 348):

class RsaKey(object):
    ... ...
    def export_key(self, format='PEM', passphrase=None, pkcs=1,
                   protection=None, randfunc=None):
        ... ...
        if self.has_private():
            ... ...
        else:
            key_type = "PUBLIC KEY"
            binary_key = _create_subject_public_key_info(oid,
                                                         DerSequence([self.n,
                                                                      self.e])
                                                         )
        ... ...

The key part is _create_subject_public_key_info, in .../site-packages/Crypto/PublicKey/__init__.py (line 63):

def _create_subject_public_key_info(algo_oid, secret_key, params=None):
    if params is None:
        params = DerNull()

    spki = DerSequence([
                DerSequence([
                    DerObjectId(algo_oid),
                    params]),
                DerBitString(secret_key)
                ])
    return spki.encode()

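So the key is encoded as nested sequences, and the final DER is the flattened (serialized) spki. For reference, the RFC 3447 definition of the inner [n, e] structure: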

RSAPublicKey ::= SEQUENCE {
    modulus           INTEGER,  -- n
    publicExponent    INTEGER   -- e
}
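To see how the nesting flattens, here is a standard-library sketch that builds the same shape for a toy key (n=3233, e=17). The helpers tlv, der_int and spki are names invented for this illustration, not pycryptodome functions; the OID bytes are the ones that appear in the public-key dump in this post.

```python
def tlv(tag, payload):
    """Tag byte + definite-form length + payload."""
    size = len(payload)
    if size < 0x80:
        length = bytes([size])
    else:
        raw = size.to_bytes((size.bit_length() + 7) // 8, "big")
        length = bytes([0x80 | len(raw)]) + raw
    return bytes([tag]) + length + payload

def der_int(value):
    """DER INTEGER for a non-negative value."""
    raw = value.to_bytes(max(1, (value.bit_length() + 7) // 8), "big")
    if raw[0] & 0x80:
        raw = b"\x00" + raw
    return tlv(0x02, raw)

RSA_OID = bytes.fromhex("2a864886f70d010101")  # 1.2.840.113549.1.1.1

def spki(n, e):
    rsa_public_key = tlv(0x30, der_int(n) + der_int(e))         # SEQUENCE { n, e }
    algorithm = tlv(0x30, tlv(0x06, RSA_OID) + tlv(0x05, b""))  # SEQUENCE { OID, NULL }
    bit_string = tlv(0x03, b"\x00" + rsa_public_key)            # 0 unused bits + key
    return tlv(0x30, algorithm + bit_string)

toy_spki = spki(3233, 17)
print(toy_spki.hex())
# 301b300d06092a864886f70d0101010500030a00300702020ca1020111
```

Note the single 00 byte right after the BIT STRING length: it is the count of unused bits, not part of the inner SEQUENCE.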

Dissecting the RSA public key by hand·

The process is much the same as for the private key, so I will go quickly. First, get the binary:

30819f300d06092a864886f70d010101050003818d0030818902818100a0d154d5bf97c40f7797b44819d09c608fa4b5c38e70d83bc13267138c6eff4c1aacefe3ddb571e1b41d911c7ab6136cf90493189563450e1f4270cabbc4207c54c4da7b84a20311cfbbabe82b9fe60bdf48a08d57839d0cdf9464d84262bcc06bc308095a6987f60ad07d669a312b5a7e4133213788eecf25863248b91349ef0203010001

Then split it into:

30819f 		# Begin Main Sequence: len=0x9f

300d # Begin Sub1 Sequence: len=0x0d

0609 # algo_oid: (1.2.840.113549.1.1.1 - rsaEncryption, PKCS #1)
2a864886f70d010101

0500 # params: (null)


# End Sub1 Sequence

03818d # BitString: len=0x8d ([n, e])

00 # BitString unused-bits count: 0
308189 # Begin Sub2 Sequence: len=0x89

028181 # n:
00a0d154d5bf97c40f7797b44819d09c608fa4b5c38e70d83bc13267138c6eff4c1aacefe3ddb571e1b41d911c7ab6136cf90493189563450e1f4270cabbc4207c54c4da7b84a20311cfbbabe82b9fe60bdf48a08d57839d0cdf9464d84262bcc06bc308095a6987f60ad07d669a312b5a7e4133213788eecf25863248b91349ef

0203 # e:
010001

# End Sub2 Sequence

# End Main Sequence

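As for the hex encoding of algo_oid (OBJECT IDENTIFIER), it still puzzles me a little; see the crypto.stackexchange question listed in the references.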
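For what it is worth, the usual OID rule is that the first byte packs the first two arcs as 40*a + b, and every later arc is written base-128 with the top bit set on all bytes except the last. A minimal sketch of the decoding, with decode_oid as a made-up helper name:

```python
def decode_oid(content):
    """Decode the content bytes of a DER OBJECT IDENTIFIER into dotted form."""
    arcs = []
    value = 0
    for byte in content:
        # Each byte contributes 7 bits; a clear top bit ends the current arc
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            arcs.append(value)
            value = 0
    # The first encoded value combines the first two arcs as 40*a + b
    first = arcs[0]
    if first < 80:
        head = [first // 40, first % 40]
    else:
        head = [2, first - 80]
    return ".".join(str(arc) for arc in head + arcs[1:])

print(decode_oid(bytes.fromhex("2a864886f70d010101")))  # 1.2.840.113549.1.1.1
```

Here 2a is 42 = 40*1 + 2, 8648 decodes to 840, and 86f70d decodes to 113549.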

References·

https://www.shangyang.me/2017/05/24/encrypt-rsa-keyformat/

https://docs.microsoft.com/en-us/windows/win32/seccertenroll/about-introduction-to-asn-1-syntax-and-encoding

https://docs.microsoft.com/en-us/windows/win32/seccertenroll/about-encoded-tag-bytes

https://datatracker.ietf.org/doc/html/rfc3447

https://crypto.stackexchange.com/questions/29115/how-is-oid-2a-86-48-86-f7-0d-parsed-as-1-2-840-113549

https://www.alvestrand.no/objectid/