priyom.org

Message Format

This page describes the message format used in the data transfer mode of the DPRK-ARQ modem, when a station has traffic to send; it is not used otherwise.

Transmission header

The transmission header appears first, before any of the messages. It tallies the number of messages and the amount of data intended to be sent in the transmission. (It is not rare for operators to abort a transmission partway, without successfully sending all the messages, for example because of poor propagation conditions or timing constraints.)

In Moscow and relay variants of the DPRK-ARQ protocol, the transmission header takes the following form:

  Field               Hexadecimal  Decimal  Description
  Marker              30                    1-byte constant, always 0x30
  Message count       02           2        1-byte number of messages following in the transmission
  Total message size  5B020000     603      4-byte, little-endian size of all message data
  Total group count   E1000000     225      4-byte, little-endian sum of group counts of all messages
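
The field layout above maps directly onto a fixed 10-byte record. A minimal sketch of reading it in Python follows; the names `RelayHeader` and `parse_relay_header` are hypothetical and only serve this illustration.

```python
import struct
from typing import NamedTuple


class RelayHeader(NamedTuple):
    """Transmission header of the Moscow and relay variants (illustrative name)."""
    message_count: int
    total_size: int
    total_groups: int


def parse_relay_header(data: bytes) -> RelayHeader:
    # "<BBII": marker byte, message count byte, then two little-endian
    # 32-bit integers (total message size, total group count).
    marker, count, size, groups = struct.unpack_from("<BBII", data)
    if marker != 0x30:
        raise ValueError(f"unexpected marker 0x{marker:02X}, expected 0x30")
    return RelayHeader(count, size, groups)


# Sample header bytes from the table above.
header = parse_relay_header(bytes.fromhex("30025B020000E1000000"))
print(header)
```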

The Pyongyang variant of the DPRK-ARQ protocol uses the following alternative format instead:

  Field               Hexadecimal  Decimal  Description
  Marker              30                    1-byte constant, always 0x30
  Transmission ID     087F                  2-byte, random-looking binary value
  Message count       02           2        1-byte number of messages following in the transmission
  Total message size  8405         1412     2-byte, little-endian size of all message data
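
The Pyongyang layout can be read the same way as a fixed 6-byte record. The sketch below keeps the transmission ID as an opaque hex string, since its meaning is not known; `parse_pyongyang_header` is a hypothetical name.

```python
import struct


def parse_pyongyang_header(data: bytes) -> dict:
    # "<B2sBH": marker byte, 2 raw bytes of transmission ID, message count
    # byte, then a little-endian 16-bit total message size.
    marker, tx_id, count, size = struct.unpack_from("<B2sBH", data)
    if marker != 0x30:
        raise ValueError(f"unexpected marker 0x{marker:02X}, expected 0x30")
    return {
        "transmission_id": tx_id.hex().upper(),  # kept opaque on purpose
        "message_count": count,
        "total_size": size,
    }


# Sample header bytes from the table above.
pyongyang_header = parse_pyongyang_header(bytes.fromhex("30087F028405"))
print(pyongyang_header)
```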

The transmission ID further ties this header to the DPRK-ARQ protocol: it also appears as a parameter in the data mode command initiating (or resuming) the transmission. The 0x30 marker might be related, as a counterpart, to the 0x60 end-of-data marker, which ends the data transmission, or replaces it when there is no traffic.

The total message size is the total size in bytes of all messages, including message headers but excluding message block checksums, plus 3 bytes for each message. In other words, it equals the sum of the message length header fields of all messages, minus the combined size of all 6-byte checksum trailers, plus 3 bytes per message (see sections below). The significance of the extra 3 bytes is unknown.
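
As a worked illustration of this rule, the sketch below derives the expected total from a list of message length fields. It assumes the block layout described further down this page (106-byte blocks, each ending in a 6-byte checksum trailer, with a shorter final block); the function names are hypothetical.

```python
BLOCK_SIZE = 106     # 100 data bytes + 6 checksum bytes per full block
CHECKSUM_SIZE = 6
EXTRA_PER_MESSAGE = 3  # unexplained 3 bytes counted for every message


def block_count(message_length: int) -> int:
    # Ceiling division: the last block may be shorter than 106 bytes.
    return -(-message_length // BLOCK_SIZE)


def expected_total_size(message_lengths) -> int:
    return sum(
        length - CHECKSUM_SIZE * block_count(length) + EXTRA_PER_MESSAGE
        for length in message_lengths
    )


# A message whose length field is F8000000 (248 bytes) spans 3 blocks,
# so it contributes 248 - 3 * 6 + 3 = 233 bytes to the total.
print(expected_total_size([248]))  # 233
```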

Message blocks

Message data is wrapped inside consecutive blocks featuring checksum trailers. Every block contains 100 bytes of data (except for the last block of a message, which may contain fewer) followed by three 2-byte checksums.

  Field          Hexadecimal                                       Description
  Block data     B60100001901015FD10D2C019B00000011... 5902162409  Variable-length block data, up to 100 bytes
  0x00 checksum  131B                                              2-byte, little-endian
  0x55 checksum  B123                                              2-byte, little-endian
  0xAA checksum  EB3F                                              2-byte, little-endian

Each checksum field is calculated by applying, to each block data byte, a XOR with the indicated constant (0x00, 0x55 or 0xAA), and then summing the XOR results for all bytes. (XOR with 0x00 being a no-op, the first checksum is effectively the simple sum of all block data bytes.)
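
The calculation can be sketched in a few lines of Python. The function names are hypothetical; no 16-bit truncation is applied because a block of at most 100 bytes cannot sum past 100 × 255 = 25500.

```python
import struct

XOR_KEYS = (0x00, 0x55, 0xAA)


def block_checksums(block_data: bytes) -> bytes:
    """Return the 6-byte trailer: three little-endian 16-bit sums."""
    sums = [sum(byte ^ key for byte in block_data) for key in XOR_KEYS]
    return struct.pack("<3H", *sums)


def block_is_intact(block: bytes) -> bool:
    """Check a full block (data followed by its 6-byte trailer)."""
    data, trailer = block[:-6], block[-6:]
    return block_checksums(data) == trailer


# Decode the sample trailer shown in the table above.
sample_sums = struct.unpack("<3H", bytes.fromhex("131BB123EB3F"))
print(sample_sums)  # (6931, 9137, 16363)
```

Since `b ^ 0x55` and `b ^ 0xAA` are bitwise complements of each other, the second and third checksums of an n-byte block always add up to 255 × n. The sample trailer satisfies this for a full 100-byte block (9137 + 16363 = 25500), which makes for a quick sanity check before comparing individual sums.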

Message header

A message header appears at the beginning of each message in the transmission.

  Field                          Hexa           Decode                  Description
  Message length                 2D010000       301                     4-byte, little-endian total byte count of all message blocks, including checksums
  Unknown                        19                                     Always 0x19, purpose unknown
  Distribution header count      02             2                       Number of optional distribution headers included later in this header
  Message type                   03                                     0x01, 0x03, 0x05 or 0x07 for 5FG messages, 0x00 for cleartext
  Origin/recipient               5F             Distribution list       Embassy "MF" number of message recipient (from Pyongyang), or origin (to Pyongyang); 0x5F for distribution to multiple embassies
  Serial number                  1716           5655                    2-byte, little-endian serial number, incrementing from 1
  Unknown                        2C01           300                     2-byte, little-endian number; almost always 300, rarely 103 or 118; meaning unknown
  Group count                    6800           104                     2-byte, little-endian number of 5FG in message; uncertain in cleartext messages
  Unknown                        0000           0                       2-byte, little-endian number; almost always 0, rarely 1; meaning unknown
  Optional distribution headers  106806 155605  MF16 #1640, MF21 #1366  Optional, variable-number list of 3-byte distribution headers (recipient details)
  Obfuscation                    00             Disabled                0x00 when disabled, 0x01 when enabled

The origin/recipient field can correspond to one of three cases.

  • In messages transmitted in the upstream direction, originating from an embassy and sent towards Pyongyang, this field contains the embassy number of this origin embassy. It decodes to a number which is prepended with "MF", e.g. MF16, to designate the embassy. Known embassy numbers, some identified with their location, are listed here.
  • In messages transmitted in the downstream direction, from Pyongyang and intended for a single embassy as recipient, this field contains the embassy number of this recipient embassy, encoded in the same way as above.
  • However, most downstream messages from Pyongyang are meant for distribution to multiple, often many, embassies. Their recipient field is set to 0x5F (the upper bound of the embassy number range), and separate distribution headers are used instead: the distribution header count is set to more than 0 and a list of recipients is included as 3-byte distribution headers, one per recipient.
    Field          Hexadecimal  Decode  Description
    Recipient      10           MF16    1-byte embassy "MF" number
    Serial number  6806         1640    2-byte, little-endian serial number, incrementing from 1

The distribution headers are always listed in order of increasing embassy number.

The two main hubs, Pyongyang and the Moscow embassy, edit headers of messages for distribution before transmitting them: they include only the distribution headers corresponding to the embassy receiving that copy of the message, and of embassies further downstream behind that one. Other embassies relay messages according to the distribution list included, without modifying it. For these reasons, it is not uncommon to see copies of messages for distribution including only a single distribution header.

Single-recipient message header example (MF17):

F8000000 19 00 01 11 3701 2C01 5500 0000 00

Message header example with a distribution list showing only one of the recipients (MF17):

B6010000 19 01 01 5F D10D 2C01 9B00 0000 11 4704 00

Message header example with distribution headers listing four of the recipients (MF46, MF51, MF53 and MF63):

3B030000 19 04 01 5F 970D 2C01 2C01 0000 2E 5A03 33 8A03 35 A403 3F 9803 00
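
The three examples above can be decoded with a short parser. This is a sketch under the field layout described on this page, with hypothetical names (`parse_message_header` and the dictionary keys); the unknown fields are simply carried through as numbers.

```python
import struct


def parse_message_header(data: bytes) -> dict:
    # Fixed part, 16 bytes: length (u32), constant 0x19, distribution header
    # count, message type, origin/recipient, then four little-endian u16 fields.
    (length, constant, dist_count, msg_type, party,
     serial, unknown_a, group_count, unknown_b) = struct.unpack_from("<IBBBBHHHH", data)
    offset = 16
    recipients = []
    for _ in range(dist_count):
        # Each distribution header: 1-byte embassy number + u16 serial number.
        embassy, dist_serial = struct.unpack_from("<BH", data, offset)
        recipients.append((f"MF{embassy}", dist_serial))
        offset += 3
    return {
        "length": length,
        "constant": constant,
        "message_type": msg_type,
        "party": "distribution" if party == 0x5F else f"MF{party}",
        "serial": serial,
        "unknown_a": unknown_a,
        "group_count": group_count,
        "unknown_b": unknown_b,
        "recipients": recipients,
        "obfuscated": data[offset] == 0x01,
        "header_size": offset + 1,  # where the message body starts
    }


single = parse_message_header(bytes.fromhex(
    "F8000000 19 00 01 11 3701 2C01 5500 0000 00"))
multi = parse_message_header(bytes.fromhex(
    "3B030000 19 04 01 5F 970D 2C01 2C01 0000 2E 5A03 33 8A03 35 A403 3F 9803 00"))
print(single["party"], single["serial"], single["group_count"])
print(multi["recipients"])
```

The returned `header_size` gives the offset at which the message body begins, which is 17 bytes without distribution headers and 3 bytes more for each one present.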

The serial numbers, whether in the main header field or in distribution headers, are particular to the embassy number, the direction of the message (downstream from Pyongyang or upstream to Pyongyang), whether they appear in a single-recipient message header or in a distribution header, and possibly the message type as well. Serial numbers increment with each message within a series defined by these characteristics. All serial numbers are reset to 1 at the beginning of a new year, and can reach into the thousands by the end of the year.

Structure

A full transmission consists of a transmission header, followed by the indicated number of messages, each split into its own message blocks. The messages are sorted by their recipient/origin header field, in increasing order: single-recipient messages with lower embassy numbers appear first, and 0x5F messages for distribution last. For the same recipient/origin header value, messages are further listed in order of increasing main-header serial number.

Decoding the full transcript comes down to:

  • Parsing the fixed-size transmission header.
  • Parsing each message, one by one:
    1. Peeking at the first 4 bytes, which are the message length header.
    2. Splitting the indicated length of data into message blocks, of 106 bytes each, except for the last block whose size can be lower than 106 bytes and is simply whatever amount of data is left for the end of this message.
    3. For each block, stripping the last 6 bytes, which are checksums. They can be checked against the remaining 100 bytes (or fewer) of block data to detect corruption, and possibly to attempt manual data restoration; however, since decoding is an offline process, it is normally too late at this point to request retransmission of a corrupted block, so the message should then be considered incomplete or of unreliable integrity.
    4. Concatenating the data of all the blocks, in transmission order, to reconstitute the message.
    5. Parsing the message header, in particular checking the distribution header count, in order to separate the variable-length headers from the message body.
    6. Decoding the message body depending on its type.
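
Steps 1 to 4 above can be sketched as follows. This is an illustration rather than a reference decoder: `unwrap_message` and `trailer_for` are hypothetical names, and the sketch only reports which blocks fail their checksums instead of attempting any repair.

```python
import struct

BLOCK_DATA = 100
TRAILER = 6


def trailer_for(data: bytes) -> bytes:
    # Three little-endian 16-bit sums of the data XORed with 0x00, 0x55, 0xAA.
    return struct.pack("<3H", *(sum(b ^ k for b in data) for k in (0x00, 0x55, 0xAA)))


def unwrap_message(stream: bytes, offset: int = 0):
    """Return (message bytes, indexes of bad blocks, offset of next message)."""
    # Step 1: peek at the 4-byte message length, which counts all blocks
    # of this message, checksum trailers included.
    (length,) = struct.unpack_from("<I", stream, offset)
    raw = stream[offset:offset + length]
    message = bytearray()
    bad_blocks = []
    # Step 2: split into 106-byte blocks; the last one may be shorter.
    for index, start in enumerate(range(0, len(raw), BLOCK_DATA + TRAILER)):
        block = raw[start:start + BLOCK_DATA + TRAILER]
        # Step 3: strip and verify the 6-byte trailer.
        data, trailer = block[:-TRAILER], block[-TRAILER:]
        if trailer_for(data) != trailer:
            bad_blocks.append(index)
        # Step 4: concatenate block data in transmission order.
        message += data
    return bytes(message), bad_blocks, offset + length
```

Calling `unwrap_message` repeatedly, each time passing the returned offset, walks through all the messages that follow the transmission header. A non-empty list of bad blocks marks the message as incomplete or unreliable.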

5-figure group messages

This is by far the most common message type. Digits are encoded using basic packed binary-coded decimal. Each byte encodes two digits: each 4-bit half-byte encodes one digit according to its natural binary value from 0 to 9. Hexadecimal values between A and F are never used. In case the message contains an odd number of 5-figure groups (and thus an intended odd number of digits), the last byte is completed with a 0 padding digit.
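
A minimal decoding sketch follows. It assumes the high half-byte of each byte holds the earlier digit, which is what reading a hex dump from left to right implies; `decode_groups` is a hypothetical name.

```python
def decode_groups(body: bytes, group_count: int) -> list:
    """Turn packed BCD bytes into a list of 5-digit group strings."""
    # With packed BCD, the hex dump of the body is already the digit string
    # (ASSUMPTION: high nibble first, as in a left-to-right hex dump).
    digits = body.hex()
    if any(ch in "abcdef" for ch in digits):
        raise ValueError("half-byte outside 0-9: not valid packed BCD")
    needed = 5 * group_count
    if len(digits) < needed:
        raise ValueError("body is shorter than the announced group count")
    # Anything past the announced groups is the trailing 0 padding digit.
    return [digits[i:i + 5] for i in range(0, needed, 5)]


# Three groups take 15 digits, so the eighth byte ends with a padding 0.
print(decode_groups(bytes.fromhex("4754727997506750"), 3))
# ['47547', '27997', '50675']
```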

Obfuscation is always disabled, and the message type parameter takes either 0x01, 0x03, 0x05 or 0x07. The purpose of this parameter, which seems to be used only by this type of message, is unknown. Some of the groups usually repeat in two independent identified patterns:

  • The 1st, 2nd and 3rd groups are repeated as the 3 penultimate groups of the message: in other words, the 3-group sequence at the very beginning of the message body is repeated at the end of it, followed by a single, original group ending the message. This behavior seems tied to the unidentified "2C01" field in the headers: the groups repeat when this field's value is 300 ("2C01"), which is most often the case, but do not when it is 103 ("6700") or 118 ("7600").
  • The 1st group is repeated across the 6th and 7th groups, in the last 2 digits of the 6th group and first 3 digits of the 7th group. This behavior is followed most often, and the conditions under which it isn't are not yet understood.
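
Both patterns are easy to test for once a message is decoded into a list of group strings. The helper names below are hypothetical, and the sample list is a shortened stand-in that keeps only the first ten and last five groups of a real message.

```python
def repeats_opening_groups(groups) -> bool:
    # Groups 1-3 reappear as the three groups just before the final group.
    return len(groups) >= 7 and groups[-4:-1] == groups[:3]


def repeats_first_group_across_6_and_7(groups) -> bool:
    # Last 2 digits of group 6 + first 3 digits of group 7 equal group 1.
    return len(groups) >= 7 and groups[5][3:] + groups[6][:3] == groups[0]


sample = ("47547 27997 50675 45783 77046 01247 54723 19203 64656 48260 "
          "39311 47547 27997 50675 14009").split()
print(repeats_opening_groups(sample))              # True
print(repeats_first_group_across_6_and_7(sample))  # True
```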

Example of a 5-figure group message, including headers, following both repeat behaviors and featuring a final 0 padding digit:

F8000000 19 00 01 11 3701 2C01 5500 0000 00
47547 27997 50675 45783 77046 01247 54723 19203 64656 48260
99665 86520 86646 00281 13600 44673 16760 14653 92766 42564
66832 38756 17156 89639 68102 23711 29045 91764 65853 74547
61171 06681 22221 33108 55812 17173 58397 21035 53662 96625
58259 05798 18234 78397 42021 02904 10215 82904 47777 81503
60744 68245 81733 29413 32575 99090 10808 69865 56438 53251
93652 14389 57865 36519 10504 57757 76740 69832 01816 55480
54470 99769 39457 34484 25947 97438 05638 29333 33949 92143
39311 47547 27997 50675 14009 0

Cleartext messages

Cleartext messages encode text using the North Korean KPS 9566 character encoding. Although they have been observed in messages for distribution to multiple embassies, this message type seems to appear mostly in single-recipient messages, and even more frequently in messages sent upstream to Pyongyang, especially coming from certain embassies. Due to lack of data, the interpretation of the group count in the header is uncertain: it could be the message body length in units of 3 or 4 bytes, rounded up.

Most messages of this type have obfuscation enabled. It is performed by prepending a single bit in front of each 8-bit byte, effectively making it appear incoherent when viewed as hex. Every byte that is part of the text has a zero bit in front of it. However, there are also bytes that throw off the KPS 9566 parser; this includes every byte with a one bit prepended, but also some using the zero bit. Unobfuscated messages are simply streams of characters.
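
The 9-bit framing can be undone with a small bit-level helper. The sketch below rests on assumptions that this page does not confirm: that the bit stream is read most significant bit first, and that the 9-bit units start at the first bit of the message body. The function names are hypothetical, and decoding the resulting bytes as KPS 9566 is left out.

```python
def split_nine_bit_units(data: bytes) -> list:
    """Split a bit stream into (flag_bit, byte_value) pairs, 9 bits per unit."""
    total_bits = len(data) * 8
    stream = int.from_bytes(data, "big")  # ASSUMPTION: MSB-first bit order
    units = []
    for index in range(total_bits // 9):
        shift = total_bits - 9 * (index + 1)
        unit = (stream >> shift) & 0x1FF
        units.append((unit >> 8, unit & 0xFF))  # leading bit, then the byte
    return units


def candidate_text_bytes(data: bytes) -> bytes:
    # Keep only bytes whose prepended bit is 0. Some of these may still not
    # belong to the text, so this is a first filter, not a full decoder.
    return bytes(value for flag, value in split_nine_bit_units(data) if flag == 0)


# 0 01000001 | 0 01000010 | padding  ->  bytes 0x41, 0x42
print(split_nine_bit_units(bytes.fromhex("209080")))
```

Leftover bits at the end of the stream that do not fill a whole 9-bit unit are ignored.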

Cleartext messages are usually mentions of North Korea in international press, contact data of presumed visa applicants ("대표부앞"), congratulatory telegrams, and reports on internal affairs of countries the embassies are located in ("내부정세").