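For a long time I assumed PEM was just a container for keys, but it turns out a PEM file can hold quite a few other odd pieces of information besides the key itself.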
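First, briefly, what PEM is: a block of text that starts with `-----BEGIN <TAG>-----` and ends with `-----END <TAG>-----`, with Base64-encoded binary data in between, wrapped with a line break after every 64 characters (48 bytes once decoded). Decoding the Base64 yields DER-encoded data that follows ASN.1. Put simply, DER is a serialization format: it packs the integers, strings, and other fields of a structure into a binary blob that is easy to transmit.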
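The layout described above can be sketched in a few lines of Python. This is only an illustration under some assumptions: `pem_to_der` is a hypothetical helper name, the input holds exactly one unencrypted PEM block, and the sample payload is five arbitrary bytes rather than a real key.

```python
import base64


def pem_to_der(pem_text):
    """Return (tag, der_bytes) for a single unencrypted PEM block."""
    lines = [line.strip() for line in pem_text.strip().splitlines()]
    begin, end = lines[0], lines[-1]
    if not (begin.startswith("-----BEGIN ") and begin.endswith("-----")):
        raise ValueError("missing BEGIN line")
    tag = begin[len("-----BEGIN "):-len("-----")]
    if end != "-----END %s-----" % tag:
        raise ValueError("END line does not match BEGIN line")
    # Everything between the two markers is Base64 text split across lines.
    body = "".join(lines[1:-1])
    return tag, base64.b64decode(body)


# Toy input (assumption): the payload is 5 arbitrary bytes, not a real key.
sample = "-----BEGIN DEMO-----\nAQIDBAU=\n-----END DEMO-----\n"
tag, der = pem_to_der(sample)
print(tag, der.hex())  # DEMO 0102030405
```

Encrypted PEM blocks carry extra header lines after the BEGIN marker, which this sketch does not handle.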
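The examples below use an RSA key pair. It can be generated with PyCrypto (the `Crypto` package, pycryptodome in my case) or with openssl (omitted here):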
    from Crypto.PublicKey import RSA

    # Generate a 1024-bit RSA key pair (small, for demonstration only)
    rsa = RSA.generate(1024)
    sk = rsa.exportKey()              # private key as PEM bytes
    pk = rsa.publickey().exportKey()  # public key as PEM bytes

    with open('./pub.pem', 'wb') as f:
        f.write(pk)
    with open('./priv.pem', 'wb') as f:
        f.write(sk)
RSA private key
    -----BEGIN RSA PRIVATE KEY-----
    MIICXQIBAAKBgQCg0VTVv5fED3eXtEgZ0Jxgj6S1w45w2DvBMmcTjG7/TBqs7+Pd
    tXHhtB2RHHq2E2z5BJMYlWNFDh9CcMq7xCB8VMTae4SiAxHPu6voK5/mC99IoI1X
    g50M35Rk2EJivMBrwwgJWmmH9grQfWaaMStafkEzITeI7s8lhjJIuRNJ7wIDAQAB
    AoGAD4JwxJaQO/Pj7EkSRQ8V7cgcsfz0sVRhWu4R+9Qo5k1AK1qNZtX3cDWPPm35
    NbMk6NU0nIPXyZKlmCJJoxc0rLHbGcTI2CkmdRS8Hve7++JC1DUPZ6ACpW0z5W0a
    lK3HHGjwINw5q30AZMERsWTia6BpjclKA839UW/9lm6HeUkCQQDKl+ScBYI3+W6Z
    EYzjg/kZEsuhFj3pI2GB/3VO8+8aJg+sjS2a7oZtUai2g2mDsFz4UOeGKJtoWZJb
    yGlfxnxHAkEAyzYwqv/8spYH8IM9x/BcFD7pL63+l12kz2cZ5xImvuclYuhjEyii
    XXNRUHqNQ8EpWrbqJCtgoosQkjOpg/QhGQJAG0oypUGotNmIqF3Q2KTiXRpHC7/v
    PwRhEh3TM3twbdlKqzepOQGAYiFp1IwHHpIXM+vSBCRcKsZGDM8GQrx96QJBAI2f
    RKfII+qqWPor3SC8yM9rUMRj9Ky1HKlW51x87/fXy9x0rKeriAys05zM7CquMg4A
    sIlomb5uQKxDyP4nY/ECQQDGfKbZiPU6vqghWUMaFGUSqNlCl41Kj4Py1CbxCV47
    8bW5uLHMu60qMcZAGIBEekX14HkCaQYawTtfaPF3fX8H
    -----END RSA PRIVATE KEY-----
That is an RSA private key I generated. The Base64 text itself reveals nothing, so the first step is to convert it to binary (shown as hex):
    import base64

    import libnum

    with open('./priv.pem', 'r') as f:
        data = f.read()

    # Drop the BEGIN/END lines and join the Base64 body into one string
    key_64 = ''.join(data.split('\n')[1:-1])
    key_num = libnum.s2n(base64.b64decode(key_64))
    key_hex = hex(key_num)[2:]
    print(key_hex)
    '''
    3082025d02010002818100a0d154d5bf97c40f7797b44819d09c608fa4b5c38e70d83bc13267138c6eff4c1aacefe3ddb571e1b41d911c7ab6136cf90493189563450e1f4270cabbc4207c54c4da7b84a20311cfbbabe82b9fe60bdf48a08d57839d0cdf9464d84262bcc06bc308095a6987f60ad07d669a312b5a7e4133213788eecf25863248b91349ef02030100010281800f8270c496903bf3e3ec4912450f15edc81cb1fcf4b154615aee11fbd428e64d402b5a8d66d5f770358f3e6df935b324e8d5349c83d7c992a5982249a31734acb1db19c4c8d829267514bc1ef7bbfbe242d4350f67a002a56d33e56d1a94adc71c68f020dc39ab7d0064c111b164e26ba0698dc94a03cdfd516ffd966e877949024100ca97e49c058237f96e99118ce383f91912cba1163de9236181ff754ef3ef1a260fac8d2d9aee866d51a8b6836983b05cf850e786289b6859925bc8695fc67c47024100cb3630aafffcb29607f0833dc7f05c143ee92fadfe975da4cf6719e71226bee72562e8631328a25d7351507a8d43c1295ab6ea242b60a28b109233a983f4211902401b4a32a541a8b4d988a85dd0d8a4e25d1a470bbfef3f0461121dd3337b706dd94aab37a9390180622169d48c071e921733ebd204245c2ac6460ccf0642bc7de90241008d9f44a7c823eaaa58fa2bdd20bcc8cf6b50c463f4acb51ca956e75c7ceff7d7cbdc74aca7ab880cacd39cccec2aae320e00b0896899be6e40ac43c8fe2763f1024100c67ca6d988f53abea82159431a146512a8d942978d4a8f83f2d426f1095e3bf1b5b9b8b1ccbbad2a31c6401880447a45f5e0790269061ac13b5f68f1777d7f07
    '''
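Staring at the hex does not reveal much either. But since I generated the key with PyCrypto, I can trace the generating code to find out what format it uses. The source file is `.../site-packages/Crypto/PublicKey/RSA.py` under the Python library path (PS: the version used in this post is pycryptodome 3.9.9). The main private-key export logic lives in the `export_key` method of the `RsaKey` class (line 225 if you are on the same version as me):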
    class RsaKey(object):
        ...
        ...
        def export_key(self, format='PEM', passphrase=None, pkcs=1,
                       protection=None, randfunc=None):
            ...
            ...
            if self.has_private():
                binary_key = DerSequence([0,
                                          self.n,
                                          self.e,
                                          self.d,
                                          self.p,
                                          self.q,
                                          self.d % (self.p-1),
                                          self.d % (self.q-1),
                                          Integer(self.q).inverse(self.p)
                                          ]).encode()
                if pkcs == 1:
                    key_type = 'RSA PRIVATE KEY'
                    ...
                    ...
            ...
            ...
            if format == 'PEM':
                from Crypto.IO import PEM

                pem_str = PEM.encode(binary_key, key_type, passphrase, randfunc)
                return tobytes(pem_str)
            ...
            ...
Reading from the end backwards: the returned value comes out of `PEM.encode`, so look at what that does first. It is the `encode` function in `.../site-packages/Crypto/IO/PEM.py`:
    def encode(data, marker, passphrase=None, randfunc=None):
        ...
        ...
        out = "-----BEGIN %s-----\n" % marker
        ...
        ...
        chunks = [tostr(b2a_base64(data[i:i + 48]))
                  for i in range(0, len(data), 48)]
        out += "".join(chunks)
        out += "-----END %s-----" % marker
        return out
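All `PEM.encode` really does is Base64-encode every 48 bytes into one line and wrap the result in the `BEGIN` and `END` markers, so it is not the interesting part. What matters is how the input `data` is produced.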
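To make the 48-byte chunking concrete, here is a small standalone re-implementation of the idea using only the standard library. It is a sketch, not the library's code: `pem_encode` is a hypothetical name, passphrase handling is left out, and the payload is 100 dummy bytes.

```python
import base64


def pem_encode(data, marker):
    """Wrap raw bytes in PEM armor, one Base64 line per 48 input bytes."""
    out = "-----BEGIN %s-----\n" % marker
    for i in range(0, len(data), 48):
        # 48 bytes -> exactly 64 Base64 characters, with no padding
        out += base64.b64encode(data[i:i + 48]).decode("ascii") + "\n"
    out += "-----END %s-----" % marker
    return out


# Dummy payload (assumption): 100 bytes split into chunks of 48, 48 and 4.
demo = pem_encode(bytes(range(100)), "DEMO")
lines = demo.split("\n")
print([len(line) for line in lines[1:-1]])  # [64, 64, 8]
```

Because 48 is a multiple of 3, only the last line can contain `=` padding, so joining the lines again gives one valid Base64 string.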
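Going back up, the input `data` is produced by `DerSequence` from `[0, n, e, d, ...]` in that order. If you are familiar with it, this is also the order in which openssl prints an RSA private key; try `openssl rsa -in priv.pem -text -noout`. The order is defined in RFC 3447: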
    RSAPrivateKey ::= SEQUENCE {
        version           Version,
        modulus           INTEGER,  -- n
        publicExponent    INTEGER,  -- e
        privateExponent   INTEGER,  -- d
        prime1            INTEGER,  -- p
        prime2            INTEGER,  -- q
        exponent1         INTEGER,  -- d mod (p-1)
        exponent2         INTEGER,  -- d mod (q-1)
        coefficient       INTEGER,  -- (inverse of q) mod p
        otherPrimeInfos   OtherPrimeInfos OPTIONAL
    }
Here a `Version` of `0` means ordinary two-prime RSA, while `1` means multi-prime RSA:
    Version ::= INTEGER { two-prime(0), multi(1) }
        (CONSTRAINED BY
        {-- version must be multi if otherPrimeInfos present --})
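As a quick illustration of how the nine fields relate to each other, the sketch below computes them for a toy key. The primes are tiny textbook values chosen for readability, `rsa_private_fields` is a hypothetical helper, and `d` is derived from (p-1)(q-1), which is an assumption; a real library may use the least common multiple instead. It needs Python 3.8+ for `pow(x, -1, m)`.

```python
def rsa_private_fields(p, q, e):
    """Return the RSAPrivateKey fields in RFC 3447 order for two primes."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse of e
    return [
        0,              # version: two-prime
        n,              # modulus
        e,              # publicExponent
        d,              # privateExponent
        p,              # prime1
        q,              # prime2
        d % (p - 1),    # exponent1
        d % (q - 1),    # exponent2
        pow(q, -1, p),  # coefficient: (inverse of q) mod p
    ]


# Toy parameters (assumption): far too small for real use.
fields = rsa_private_fields(61, 53, 17)
print(fields)  # [0, 3233, 17, 2753, 61, 53, 53, 49, 38]
```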
Next, trace into `DerSequence`, in `.../site-packages/Crypto/Util/asn1.py` (line 344):
    class DerSequence(DerObject):
        ...
        ...
        def encode(self):
            """Return this DER SEQUENCE, fully encoded as a
            binary string.
            """
            self.payload = b''
            for item in self._seq:
                if byte_string(item):
                    self.payload += item
                elif _is_number(item):
                    self.payload += DerInteger(item).encode()
                else:
                    self.payload += item.encode()
            return DerObject.encode(self)
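The `encode` method sorts each `item` of the input sequence into one of three cases. The code is self-explanatory except for numbers, which additionally go through `DerInteger(item)`, so `DerInteger` is the next thing to trace. It is in the same file (line 249):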
    class DerInteger(DerObject):
        ...
        ...
        def encode(self):
            """Return the DER INTEGER, fully encoded as a
            binary string."""

            number = self.value
            self.payload = b''
            while True:
                self.payload = bchr(int(number & 255)) + self.payload
                if 128 <= number <= 255:
                    self.payload = bchr(0x00) + self.payload
                if -128 <= number <= 255:
                    break
                number >>= 8
            return DerObject.encode(self)
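My guess is that this is an integer-to-bytes conversion (I did not bother to trace it in detail): it writes the number out big-endian and prepends a `0x00` byte when the leading byte has its top bit set. The result is finally encoded by `DerObject.encode`, and the `encode` of `DerSequence` above also ends in `DerObject.encode`, so trace into `DerObject.encode`, again in the same file (line 165):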
    class DerObject(object):
        ...
        ...
        def encode(self):
            """Return this DER element, fully encoded as a binary byte string."""

            output_payload = self.payload
            ...
            ...
            return (bchr(self._tag_octet) +
                    self._definite_form(len(output_payload)) +
                    output_payload)
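Just look at what is returned: the format is `<tag> + <length> + <payload>`. `payload` is built by the calling layer, so it needs no attention here (that part is already traced). `tag` is the ASN.1 type tag (see the reference links at the end of this post); for example, 0x30 means SEQUENCE and 0x02 means INTEGER. `length` is the length of `payload`, but it first passes through `_definite_form`, which formats it. Trace on, still in the same file (line 156):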
    def _definite_form(length):
        """Build length octets according to BER/DER
        definite form.
        """
        if length > 127:
            encoding = long_to_bytes(length)
            return bchr(len(encoding) + 128) + encoding
        return bchr(length)
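Roughly: if the length is at most 127 (the top bit of the byte is still 0), it is returned as a single byte. If it exceeds 127, the first byte of `<length>` has its top bit set to 1 and its low 7 bits hold x, the number of bytes needed to store the length; the following x bytes then store the length itself. For example, a length of 0x0100 needs 2 bytes and is encoded as 0x820100. A length of 0xf0 has its top bit set, so it cannot be stored directly; it takes 1 byte and is encoded as 0x81f0.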
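The tag, length and payload rules traced so far fit into a short standalone sketch. The function names (`der_length`, `der_tlv`, `der_integer`, `der_sequence`) are my own for illustration and are not the library's API; the integer helper is assumed to handle non-negative values only.

```python
def der_length(length):
    """Encode a length in DER definite form."""
    if length > 127:
        raw = length.to_bytes((length.bit_length() + 7) // 8, "big")
        # Top bit set, low 7 bits = number of length bytes that follow
        return bytes([0x80 | len(raw)]) + raw
    return bytes([length])


def der_tlv(tag, payload):
    """Concatenate tag + length + payload."""
    return bytes([tag]) + der_length(len(payload)) + payload


def der_integer(value):
    """Encode a non-negative integer as a DER INTEGER (tag 0x02)."""
    if value < 0:
        raise ValueError("negative values are not handled in this sketch")
    raw = value.to_bytes(max(1, (value.bit_length() + 7) // 8), "big")
    if raw[0] & 0x80:
        raw = b"\x00" + raw  # keep the value from being read as negative
    return der_tlv(0x02, raw)


def der_sequence(encoded_items):
    """Wrap already-encoded items in a DER SEQUENCE (tag 0x30)."""
    return der_tlv(0x30, b"".join(encoded_items))


print(der_length(0x0100).hex())   # 820100
print(der_length(0xf0).hex())     # 81f0
print(der_integer(0).hex())       # 020100
print(der_integer(65537).hex())   # 0203010001
print(der_sequence([der_integer(0), der_integer(65537)]).hex())
# 30080201000203010001
```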
Dissecting the RSA private key by hand
With the tracing above done, we can start dissecting. First, the binary obtained earlier:
    3082025d02010002818100a0d154d5bf97c40f7797b44819d09c608fa4b5c38e70d83bc13267138c6eff4c1aacefe3ddb571e1b41d911c7ab6136cf90493189563450e1f4270cabbc4207c54c4da7b84a20311cfbbabe82b9fe60bdf48a08d57839d0cdf9464d84262bcc06bc308095a6987f60ad07d669a312b5a7e4133213788eecf25863248b91349ef02030100010281800f8270c496903bf3e3ec4912450f15edc81cb1fcf4b154615aee11fbd428e64d402b5a8d66d5f770358f3e6df935b324e8d5349c83d7c992a5982249a31734acb1db19c4c8d829267514bc1ef7bbfbe242d4350f67a002a56d33e56d1a94adc71c68f020dc39ab7d0064c111b164e26ba0698dc94a03cdfd516ffd966e877949024100ca97e49c058237f96e99118ce383f91912cba1163de9236181ff754ef3ef1a260fac8d2d9aee866d51a8b6836983b05cf850e786289b6859925bc8695fc67c47024100cb3630aafffcb29607f0833dc7f05c143ee92fadfe975da4cf6719e71226bee72562e8631328a25d7351507a8d43c1295ab6ea242b60a28b109233a983f4211902401b4a32a541a8b4d988a85dd0d8a4e25d1a470bbfef3f0461121dd3337b706dd94aab37a9390180622169d48c071e921733ebd204245c2ac6460ccf0642bc7de90241008d9f44a7c823eaaa58fa2bdd20bcc8cf6b50c463f4acb51ca956e75c7ceff7d7cbdc74aca7ab880cacd39cccec2aae320e00b0896899be6e40ac43c8fe2763f1024100c67ca6d988f53abea82159431a146512a8d942978d4a8f83f2d426f1095e3bf1b5b9b8b1ccbbad2a31c6401880447a45f5e0790269061ac13b5f68f1777d7f07
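`30` is the SEQUENCE tag. `82` says the next two bytes hold the length of this SEQUENCE, namely `0x025d` bytes, which is everything that remains. The following `020100` is the integer 0: `02` is the INTEGER tag, `01` says the integer takes 1 byte, and `00` is the value. The same method decodes `02818100a0...` and the other integers after it (the generated private-key PEM actually contains nothing but integers). The result looks roughly like this: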
    3082025d   # Begin Sequence: len=0x025d

    0201       # Version: (len=0x01)
    00

    028181     # n: (len=0x81)
    00a0d154d5bf97c40f7797b44819d09c608fa4b5c38e70d83bc13267138c6eff4c1aacefe3ddb571e1b41d911c7ab6136cf90493189563450e1f4270cabbc4207c54c4da7b84a20311cfbbabe82b9fe60bdf48a08d57839d0cdf9464d84262bcc06bc308095a6987f60ad07d669a312b5a7e4133213788eecf25863248b91349ef

    0203       # e: (len=0x03)
    010001

    028180     # d: (len=0x80)
    0f8270c496903bf3e3ec4912450f15edc81cb1fcf4b154615aee11fbd428e64d402b5a8d66d5f770358f3e6df935b324e8d5349c83d7c992a5982249a31734acb1db19c4c8d829267514bc1ef7bbfbe242d4350f67a002a56d33e56d1a94adc71c68f020dc39ab7d0064c111b164e26ba0698dc94a03cdfd516ffd966e877949

    0241       # p: (len=0x41)
    00ca97e49c058237f96e99118ce383f91912cba1163de9236181ff754ef3ef1a260fac8d2d9aee866d51a8b6836983b05cf850e786289b6859925bc8695fc67c47

    0241       # q: (len=0x41)
    00cb3630aafffcb29607f0833dc7f05c143ee92fadfe975da4cf6719e71226bee72562e8631328a25d7351507a8d43c1295ab6ea242b60a28b109233a983f42119

    0240       # d mod (p-1): (len=0x40)
    1b4a32a541a8b4d988a85dd0d8a4e25d1a470bbfef3f0461121dd3337b706dd94aab37a9390180622169d48c071e921733ebd204245c2ac6460ccf0642bc7de9

    0241       # d mod (q-1): (len=0x41)
    008d9f44a7c823eaaa58fa2bdd20bcc8cf6b50c463f4acb51ca956e75c7ceff7d7cbdc74aca7ab880cacd39cccec2aae320e00b0896899be6e40ac43c8fe2763f1

    0241       # (inverse of q) mod p: (len=0x41)
    00c67ca6d988f53abea82159431a146512a8d942978d4a8f83f2d426f1095e3bf1b5b9b8b1ccbbad2a31c6401880447a45f5e0790269061ac13b5f68f1777d7f07

    # End Sequence
Alternatively, you can `from Crypto.Util.asn1 import DerSequence, DerInteger` and decode it with PyCrypto (omitted).
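The manual walk can also be automated without any third-party package. The following is a minimal sketch that only understands what this key needs, a SEQUENCE containing INTEGERs with definite-form lengths; `read_tlv` and `parse_integer_sequence` are hypothetical names, and the input is a tiny hand-built example rather than the key above.

```python
def read_tlv(buf, pos):
    """Parse one tag-length-value at buf[pos:]; return (tag, payload, next_pos)."""
    tag = buf[pos]
    first = buf[pos + 1]
    pos += 2
    if first & 0x80:
        # Long form: low 7 bits give the number of length bytes
        count = first & 0x7F
        length = int.from_bytes(buf[pos:pos + count], "big")
        pos += count
    else:
        length = first
    return tag, buf[pos:pos + length], pos + length


def parse_integer_sequence(der):
    """Return the integers inside a DER SEQUENCE of INTEGERs."""
    tag, body, end = read_tlv(der, 0)
    if tag != 0x30 or end != len(der):
        raise ValueError("expected a single top-level SEQUENCE")
    values = []
    pos = 0
    while pos < len(body):
        tag, payload, pos = read_tlv(body, pos)
        if tag != 0x02:
            raise ValueError("expected INTEGER, got tag 0x%02x" % tag)
        values.append(int.from_bytes(payload, "big", signed=True))
    return values


# Toy input (assumption): SEQUENCE { INTEGER 0, INTEGER 65537 }
toy = bytes.fromhex("30080201000203010001")
print(parse_integer_sequence(toy))  # [0, 65537]
```

Feeding it the Base64-decoded body of `priv.pem` should give the nine values in the order version, n, e, d, p, q, d mod (p-1), d mod (q-1), and the inverse of q mod p.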
RSA public key
The public key works similarly. Start with `.../site-packages/Crypto/PublicKey/RSA.py` (line 348):
    class RsaKey(object):
        ...
        ...
        def export_key(self, format='PEM', passphrase=None, pkcs=1,
                       protection=None, randfunc=None):
            ...
            ...
            if self.has_private():
                ...
                ...
            else:
                key_type = "PUBLIC KEY"
                binary_key = _create_subject_public_key_info(oid,
                                                             DerSequence([self.n,
                                                                          self.e])
                                                             )
            ...
            ...
The main thing to look at is `_create_subject_public_key_info`, in `.../site-packages/Crypto/PublicKey/__init__.py` (line 63):
    def _create_subject_public_key_info(algo_oid, secret_key, params=None):

        if params is None:
            params = DerNull()

        spki = DerSequence([
                    DerSequence([
                        DerObjectId(algo_oid),
                        params]),
                    DerBitString(secret_key)
                    ])
        return spki.encode()
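In other words, the key is encoded as a nested sequence, and the final DER is the flattened (serialized) form of `spki`. For reference, here is the definition from RFC 3447: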
    RSAPublicKey ::= SEQUENCE {
        modulus           INTEGER,  -- n
        publicExponent    INTEGER   -- e
    }
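Dissecting the RSA public key by hand
The process is much the same as for the private key, so I will be brief. First, get the binary: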
    30819f300d06092a864886f70d010101050003818d0030818902818100a0d154d5bf97c40f7797b44819d09c608fa4b5c38e70d83bc13267138c6eff4c1aacefe3ddb571e1b41d911c7ab6136cf90493189563450e1f4270cabbc4207c54c4da7b84a20311cfbbabe82b9fe60bdf48a08d57839d0cdf9464d84262bcc06bc308095a6987f60ad07d669a312b5a7e4133213788eecf25863248b91349ef0203010001
Then split it into:
    30819f     # Begin Main Sequence: len=0x9f

    300d       # Begin Sub1 Sequence: len=0x0d

    0609       # algo_oid: (1.2.840.113549.1.1.1 - rsaEncryption, PKCS #1)
    2a864886f70d010101

    0500       # params: (null)

    # End Sub1 Sequence

    03818d     # BitString: len=0x8d ([n, e])

    00         # BitString: number of unused bits (0)
    308189     # Begin Sub2 Sequence: len=0x89

    028181     # n:
    00a0d154d5bf97c40f7797b44819d09c608fa4b5c38e70d83bc13267138c6eff4c1aacefe3ddb571e1b41d911c7ab6136cf90493189563450e1f4270cabbc4207c54c4da7b84a20311cfbbabe82b9fe60bdf48a08d57839d0cdf9464d84262bcc06bc308095a6987f60ad07d669a312b5a7e4133213788eecf25863248b91349ef

    0203       # e:
    010001

    # End Sub2 Sequence

    # End Main Sequence
Also, the hex encoding of `algo_oid` (OBJECT IDENTIFIER) is still a bit puzzling to me; see the Stack Exchange question in the links below.
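For what it is worth, the OID bytes can be decoded with a few lines of Python. This is a sketch of the standard rule as I understand it: every sub-identifier is stored base-128 with the top bit marking "more bytes follow", and the first sub-identifier packs the first two arcs as 40 * first + second. `decode_oid` is a hypothetical helper name.

```python
def decode_oid(payload):
    """Decode the payload bytes of a DER OBJECT IDENTIFIER to dotted form."""
    subids = []
    value = 0
    for byte in payload:
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:  # top bit clear: last byte of this sub-identifier
            subids.append(value)
            value = 0
    first = subids[0]
    if first < 40:
        arcs = [0, first]
    elif first < 80:
        arcs = [1, first - 40]
    else:
        arcs = [2, first - 80]
    arcs.extend(subids[1:])
    return ".".join(str(arc) for arc in arcs)


print(decode_oid(bytes.fromhex("2a864886f70d010101")))
# 1.2.840.113549.1.1.1
```

Here `2a` is 42 = 40 * 1 + 2, `8648` is 6 * 128 + 72 = 840, and `86f70d` is (6 * 128 + 119) * 128 + 13 = 113549.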
https://www.shangyang.me/2017/05/24/encrypt-rsa-keyformat/
https://docs.microsoft.com/en-us/windows/win32/seccertenroll/about-introduction-to-asn-1-syntax-and-encoding
https://docs.microsoft.com/en-us/windows/win32/seccertenroll/about-encoded-tag-bytes
https://datatracker.ietf.org/doc/html/rfc3447
https://crypto.stackexchange.com/questions/29115/how-is-oid-2a-86-48-86-f7-0d-parsed-as-1-2-840-113549
https://www.alvestrand.no/objectid/