This page describes the message format used in the data transfer mode of the DPRK-ARQ modem, when a station has traffic to send; it is not used otherwise.
The transmission header appears first, before any of the messages. It tallies the number of messages and the amount of data intended to be sent in the transmission. (It is not rare for operators to abort the transmission partway through, without successfully sending all the messages, for example because of poor propagation conditions or timing constraints.)
In Moscow and relay variants of the DPRK-ARQ protocol, the transmission header takes the following form:
|Marker||Message count||Total message size||Total group count|
| ||1-byte number of messages following in the transmission||size of all message data||sum of group counts of all messages|
The Pyongyang variant of the DPRK-ARQ protocol uses the following alternative format instead:
|Marker||Transmission ID||Message count||Total message size|
| || ||1-byte number of messages following in the transmission||size of all message data|
The transmission header is further tied to the DPRK-ARQ protocol: the transmission ID also appears as a parameter in the data mode command initiating (or resuming) the transmission. The 0x30 marker might be related, and opposed, to the 0x60 end-of-data marker, which ends the data transmission or replaces it when there is no traffic.
The total message size is the total size in bytes of all messages, including message headers, but excluding message block checksums - plus 3 bytes for each message. It matches the sum of the message length header fields of every message, minus the combined size of all 6-byte checksum portions, plus 3 bytes for each message (see sections below). The significance of the extra 3 bytes is unknown.
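The arithmetic above can be sketched as follows. This is only an illustration: the function names are hypothetical, and the block count is derived from the 106-byte block layout (100 data bytes plus 6 checksum bytes) described in the sections below.

```python
BLOCK_DATA = 100          # data bytes in a full block
CHECKSUM_BYTES = 6        # three 2-byte checksums per block
BLOCK_SIZE = BLOCK_DATA + CHECKSUM_BYTES
EXTRA_PER_MESSAGE = 3     # extra bytes of unknown significance

def data_size(message_length):
    """Message length header value minus the checksum portions of its blocks."""
    blocks = (message_length + BLOCK_SIZE - 1) // BLOCK_SIZE  # ceiling division
    return message_length - blocks * CHECKSUM_BYTES

def total_message_size(message_lengths):
    """Expected 'total message size' for a list of message length header values."""
    return sum(data_size(n) + EXTRA_PER_MESSAGE for n in message_lengths)
```

For instance, a message whose length header reads 248 (hex F8000000) spans three blocks, so it contributes 248 - 18 + 3 = 233 bytes to the total.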
Message data is wrapped inside consecutive blocks featuring checksum trailers. Every block contains 100 bytes of data (except for the last block of a message, which may contain fewer) followed by three 2-byte checksums.
|Block data||Checksum (XOR 0x00)||Checksum (XOR 0x55)||Checksum (XOR 0xAA)|
|Variable-length block data, up to 100 bytes||2-byte, little-endian||2-byte, little-endian||2-byte, little-endian|
Each checksum field is calculated by applying, to each block data byte, a XOR with the indicated constant (0x00, 0x55 or 0xAA), and then summing the XOR results for all bytes. (XOR with 0x00 being a no-op, the first checksum is effectively the simple sum of all block data bytes.)
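A minimal sketch of this calculation follows. The function name is hypothetical, and the trailer order (0x00, then 0x55, then 0xAA) is an assumption based on the order in which the constants are listed above. With at most 100 data bytes the sums cannot exceed 16 bits, so the mask is only a safeguard.

```python
XOR_CONSTANTS = (0x00, 0x55, 0xAA)  # assumed order of the three checksums

def block_checksums(data: bytes) -> bytes:
    """Return the 6-byte trailer expected after the given block data."""
    trailer = bytearray()
    for constant in XOR_CONSTANTS:
        # XOR every data byte with the constant, then sum the results.
        total = sum(byte ^ constant for byte in data) & 0xFFFF
        trailer += total.to_bytes(2, "little")
    return bytes(trailer)
```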
A message header appears at the beginning of each message in the transmission.
|Field||Message length||Unknown||Distribution header count||5FG parameter||Origin/recipient||Serial number||Unknown||Group count||Optional distribution headers||Content type|
|Decode||301|| ||2|| ||Distribution list||5655||300||104||MF16 #1640, MF21 #1366||5FG / cleartext|
|Description||4-byte, little-endian total byte count of all message blocks, including checksums||Always 0x19, purpose unknown||Number of optional distribution headers included later in this header||0x01, 0x03, 0x05 or 0x07 for 5FG messages, 0x00 otherwise||Embassy "MF" number of message recipient (from Pyongyang), or origin (to Pyongyang); 0x5F for distribution to multiple embassies.||2-byte, little-endian serial number, incrementing from 1||Presumed 2-byte, little-endian number; almost always 300, rarely 103 or 118; meaning unknown.||4-byte, little-endian number of 5FG in message; uncertain in cleartext and binary messages||Optional, variable-number list of 3-byte distribution headers (recipient details)||0x00 for 5FG or cleartext messages, 0x01 for binary messages|
The origin/recipient field can correspond to one of three cases.
- In messages transmitted in the upstream direction, originating from an embassy and sent towards Pyongyang, this field contains the embassy number of this origin embassy. It decodes to a number which is prepended with "MF", e.g. MF16, to designate the embassy. Known embassy numbers, some identified with their location, are listed here.
- In messages transmitted in the downstream direction, from Pyongyang and intended for a single embassy as recipient, this field contains the embassy number of this recipient embassy, encoded in the same way as above.
- However, most downstream messages from Pyongyang are meant for distribution to multiple, often many, embassies. Their recipient field is set to 0x5F (the upper bound of the embassy number range), and separate distribution headers are used instead: the distribution header count is set to more than 0 and a list of recipients is included as 3-byte distribution headers, one per recipient.
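Each 3-byte distribution header consists of a 1-byte embassy number followed by a 2-byte, little-endian serial number, incrementing from 1.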
The distribution headers are always listed in order of increasing embassy number.
The two main hubs, Pyongyang and the Moscow embassy, edit headers of messages for distribution before transmitting them: they include only the distribution headers corresponding to the embassy receiving that copy of the message, and of embassies further downstream behind that one. Other embassies relay messages according to the distribution list included, without modifying it. For these reasons, it is not uncommon to see copies of messages for distribution including only a single distribution header.
Single-recipient message header example (MF17):
F8000000 19 00 01 11 3701 2C01 55000000 00
Message header example with a distribution list showing only one of the recipients (MF17):
B6010000 19 01 01 5F D10D 2C01 9B000000 11 4704 00
Message header example with distribution headers listing four of the recipients (MF46, MF51, MF53 and MF63):
3B030000 19 04 01 5F 970D 2C01 2C010000 2E 5A03 33 8A03 35 A403 3F 9803 00
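The header examples above can be decoded with a short sketch like the one below. The function name and dictionary keys are hypothetical. The field sizes follow the header description, with the origin/recipient field and the embassy number of each distribution header taken as single bytes, as the examples suggest.

```python
import struct

def parse_message_header(raw: bytes) -> dict:
    """Parse a message header from the start of a reassembled message."""
    (length, unknown_a, dist_count, fg_param,
     origin_recipient, serial, unknown_b, group_count) = struct.unpack_from(
        "<IBBBBHHI", raw, 0)
    offset = 16  # size of the fixed fields unpacked above
    distribution = []
    for _ in range(dist_count):
        # 3-byte distribution header: embassy number, then serial number
        embassy, dist_serial = struct.unpack_from("<BH", raw, offset)
        distribution.append((embassy, dist_serial))
        offset += 3
    content_type = raw[offset]
    return {
        "length": length, "unknown_a": unknown_a, "fg_param": fg_param,
        "origin_recipient": origin_recipient, "serial": serial,
        "unknown_b": unknown_b, "group_count": group_count,
        "distribution": distribution, "content_type": content_type,
        "header_size": offset + 1,  # the message body starts here
    }

example = bytes.fromhex(
    "3B030000 19 04 01 5F 970D 2C01 2C010000 2E 5A03 33 8A03 35 A403 3F 9803 00")
header = parse_message_header(example)
```

Here `header["distribution"]` lists embassy numbers 46, 51, 53 and 63, matching the MF46, MF51, MF53 and MF63 recipients named above.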
The serial numbers, whether in the main header field or in distribution headers, are particular to the embassy number, the direction of the message (downstream from Pyongyang or upstream to Pyongyang), whether they appear in a single-recipient message header or a distribution header, and possibly the message type too. Serial numbers increment with each message within a series defined by these characteristics. All serial numbers are reset to 1 at the beginning of a new year, and can reach into the thousands by the end of the year.
A full transmission consists of a transmission header, followed by the indicated number of messages, each split into a number of its own message blocks. The messages are sorted by their recipient/origin header field, in increasing order: single-recipient messages with lower embassy numbers appear first, and 0x5F messages for distribution come last. For the same recipient/origin header value, messages are further listed in order of increasing main-header serial number.
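As an illustration, the ordering rule amounts to a two-level sort key. The dictionary keys used here are hypothetical names for the recipient/origin field and the main-header serial number.

```python
def transmission_order(messages):
    """Sort parsed message headers into the order described above."""
    return sorted(messages,
                  key=lambda m: (m["origin_recipient"], m["serial"]))

queue = [
    {"origin_recipient": 0x5F, "serial": 3479},  # for distribution
    {"origin_recipient": 17, "serial": 311},
    {"origin_recipient": 17, "serial": 310},
    {"origin_recipient": 16, "serial": 1640},
]
ordered = transmission_order(queue)
```

Because 0x5F is the upper bound of the embassy number range, messages for distribution naturally sort last.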
Decoding the full transcript comes down to:
- Parsing the fixed-size transmission header.
- Parsing each message, one by one:
  - Peeking at the first 4 bytes, which are the message length header.
  - Splitting the indicated length of data into message blocks of 106 bytes each, except for the last block, which may be shorter than 106 bytes and is simply whatever data is left at the end of this message.
  - For each block, stripping the last 6 bytes, which are checksums. They can be checked against the remaining 100 bytes (or fewer) of block data to detect corruption, and possibly to attempt manual data restoration. However, as this is an offline process, it is normally too late at this point to request retransmission of the corrupted block, so the message should be considered incomplete or of unreliable integrity.
  - Concatenating the data of all the blocks, in transmission order, to reconstitute the message.
  - Parsing the message header, in particular checking the distribution header count, in order to separate the variable-length headers from the message body.
  - Decoding the message body depending on its type.
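The block-level steps above can be sketched as follows. The function names are hypothetical, the input is assumed to be the raw byte stream that follows the transmission header, and the checksum order (XOR constants 0x00, 0x55, 0xAA) is an assumption. Header parsing and body decoding are left out.

```python
BLOCK_SIZE = 106      # 100 data bytes + 6 checksum bytes
CHECKSUM_BYTES = 6

def expected_trailer(data: bytes) -> bytes:
    """Three 2-byte little-endian sums of the data XORed with each constant."""
    return b"".join(
        (sum(b ^ c for b in data) & 0xFFFF).to_bytes(2, "little")
        for c in (0x00, 0x55, 0xAA))

def read_message(stream: bytes, offset: int = 0):
    """Return (message data, all checksums valid, offset of next message)."""
    # Peek at the 4-byte message length, which counts whole blocks.
    length = int.from_bytes(stream[offset:offset + 4], "little")
    raw = stream[offset:offset + length]
    data = bytearray()
    intact = True
    for start in range(0, len(raw), BLOCK_SIZE):
        block = raw[start:start + BLOCK_SIZE]  # the last block may be shorter
        body, trailer = block[:-CHECKSUM_BYTES], block[-CHECKSUM_BYTES:]
        if trailer != expected_trailer(body):
            intact = False  # keep the data, but flag it as unreliable
        data += body
    return bytes(data), intact, offset + length
```

Calling `read_message` repeatedly, feeding each returned offset back in, walks through the messages of a transmission one by one.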
5-figure group messages
This is by far the most common message type. Digits are encoded using basic packed binary-coded decimal. Each byte encodes two digits: each 4-bit half-byte encodes one digit according to its natural binary value from 0 to 9. Hexadecimal values between A and F are never used. In case the message contains an odd number of 5-figure groups (and thus an intended odd number of digits), the last byte is completed with a 0 padding digit.
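A minimal sketch of this packed BCD encoding is given below. The function names are hypothetical, and the digit order within each byte (high half-byte first, as when reading a hex dump) is an assumption.

```python
def encode_groups(groups):
    """Pack 5-figure groups into BCD bytes, padding an odd digit count with 0."""
    digits = "".join(groups)
    if len(digits) % 2:
        digits += "0"  # padding digit completing the last byte
    return bytes.fromhex(digits)

def decode_groups(body: bytes, group_count: int):
    """Unpack BCD bytes into 5-figure groups, dropping any padding digit."""
    digits = body.hex()
    if not digits.isdigit():
        raise ValueError("half-byte value above 9 found in 5FG body")
    digits = digits[:group_count * 5]
    return [digits[i:i + 5] for i in range(0, len(digits), 5)]

packed = encode_groups(["47547", "27997", "50675"])
```

With these three groups, `packed` is 8 bytes long and reads 47 54 72 79 97 50 67 50 in hexadecimal, the final 0 being the padding digit.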
This message type uses content type 0x00 and a 5FG parameter value different from 0x00: either 0x01, 0x03, 0x05 or 0x07. The purpose of this parameter, which seems to be used only by this type of message, is unknown. Messages usually contain at least several tens of groups, although the group count is also often much higher. Some of the groups usually repeat in two independent identified patterns:
- The 1st, 2nd and 3rd groups are repeated as the 3 penultimate groups of the message: in other words, the 3-group sequence at the very beginning of the message body is repeated at the end of it, followed by a single, original group ending the message. This behavior seems tied to the unidentified "2C01" field in the headers: the groups repeat when this field's value is 300 ("2C01") - which is most often - but don't when it is 103 ("6700") or 118 ("7300").
- The 1st group is repeated across the 6th and 7th groups, in the last 2 digits of the 6th group and first 3 digits of the 7th group. This behavior is followed most often, and the conditions under which it isn't are not yet understood.
Example of a 5-figure group message, including headers, following both repeat behaviors and featuring a final 0 padding digit:
F8000000 19 00 01 11 3701 2C01 55000000 00 47547 27997 50675 45783 77046 01247 54723 19203 64656 48260 99665 86520 86646 00281 13600 44673 16760 14653 92766 42564 66832 38756 17156 89639 68102 23711 29045 91764 65853 74547 61171 06681 22221 33108 55812 17173 58397 21035 53662 96625 58259 05798 18234 78397 42021 02904 10215 82904 47777 81503 60744 68245 81733 29413 32575 99090 10808 69865 56438 53251 93652 14389 57865 36519 10504 57757 76740 69832 01816 55480 54470 99769 39457 34484 25947 97438 05638 29333 33949 92143 39311 47547 27997 50675 14009 0
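Both repeat behaviors can be checked mechanically, as in the sketch below. The function names are hypothetical. To keep it short, the list holds only the first seven and last four groups of the example above, which is enough for both checks.

```python
def head_repeated_at_tail(groups):
    """True if groups 1-3 reappear as the three penultimate groups."""
    return len(groups) >= 7 and groups[:3] == groups[-4:-1]

def first_group_straddles_6_and_7(groups):
    """True if group 1 equals the last 2 digits of group 6
    followed by the first 3 digits of group 7."""
    return len(groups) >= 7 and groups[5][3:] + groups[6][:3] == groups[0]

# Abridged from the example message: groups 1-7, then groups 82-85.
abridged = ["47547", "27997", "50675", "45783", "77046", "01247", "54723",
            "47547", "27997", "50675", "14009"]
follows_both = (head_repeated_at_tail(abridged)
                and first_group_straddles_6_and_7(abridged))
```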
Cleartext messages
This rare message type contains a straightforward unencrypted text payload, in Korean. It uses the North Korean KPS 9566 character encoding. The CRCRLF sequence is used to mark new lines. All messages of this type seen so far have followed a similar structure: starting with "대표부앞" (in front of representative department), followed by a person's name and their personal contact information, and finally an indication of a number of days. These messages could be related to visa clearances or the like.
These messages use 0x00 for both content type and 5FG parameter. Due to lack of data, the interpretation of the group count header is uncertain: it could be the message body length in units of 3 or 4 bytes, rounded up.
Binary messages
This occasional message type is characterized by content type 0x01, and uses 0x00 for the 5FG parameter. Little is understood about its contents, which are made of random-looking binary data. The same sequences of a few bytes sometimes appear, with a variable length, across different messages of this type, most visibly being reused as the beginning of the payload. The contents are presumed to be some undetermined kind of compressed or encrypted data.
Although it has also been observed in messages for distribution to multiple embassies, this message type seems to appear mostly in single-recipient messages, and even more frequently in messages sent upstream to Pyongyang, especially coming from certain embassies.