Diffstat (limited to 'users/Profpatsch/netencode/README.md')
-rw-r--r-- users/Profpatsch/netencode/README.md | 28
1 files changed, 25 insertions, 3 deletions
diff --git a/users/Profpatsch/netencode/README.md b/users/Profpatsch/netencode/README.md
index 3058e36eaf..3538a110a6 100644
--- a/users/Profpatsch/netencode/README.md
+++ b/users/Profpatsch/netencode/README.md
@@ -1,6 +1,6 @@
 # netencode 0.1-unreleased
 
-[bencode][] and [netstring][]-inspired pipe format that should be trivial go generate correctly in every context (only requires a `byte_length()` and a `printf()`), easy to parse (100 lines of code or less), mostly human-decipherable for easy debugging, and support nested record and sum types.
+[bencode][] and [netstring][]-inspired pipe format that should be trivial to generate correctly in every context (only requires a `byte_length()` and a `printf()`), easy to parse (100 lines of code or less), mostly human-decipherable for easy debugging, and support nested record and sum types.
 
 
 ## scalars
@@ -73,7 +73,11 @@ A tag (`<`) gives a value a name. The tag is UTF-8 encoded, starting with its le
 ### records (products/records), also maps
 
 A record (`{`) is a concatenation of tags (`<`). It needs to be closed with `}`.
-If tag names repeat the later ones should be ignored. Ordering does not matter.
+
+If tag names repeat, the *earlier* ones should be ignored.
+Using the last tag matches the way most languages convert a list of tuples to a map: a for-loop that calls `Map.insert` without checking the contents first. Otherwise you’d have to reverse the list first or remember which keys you already inserted.
+
+Ordering of tags in a record does not matter.
 
 Similar to text, records start with the length of their *whole encoded content*, in bytes. This makes it possible to treat their contents as opaque bytestrings.
 
@@ -81,7 +85,7 @@ Similar to text, records start with the length of their *whole encoded content*,
 * A record with one empty field, `foo`: `{9:<3:foo|u,}`
 * A record with two fields, `foo` and `x`: `{21:<3:foo|u,<1:x|t3:baz,}`
 * The same record: `{21:<1:x|t3:baz,<3:foo|u,}`
-* The same record (later occurences of fields are ignored): `{28:<1:x|t3:baz,<3:foo|u,<1:x|u,}`
+* The same record (earlier occurrences of fields are ignored): `{28:<1:x|u,<1:x|t3:baz,<3:foo|u,}`
 
 ### sums (tagged unions)
 
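Review note on the hunk above: the "last tag wins" rule and the record length prefix can be checked with a small sketch. This is only an illustration in Python, not code from the netencode repository; the helper names (`unit`, `text`, `tag`, `record`, `to_map`) are hypothetical, and it assumes the `u,`, `t<len>:<bytes>,`, `<<len>:<name>|<value>` and `{<len>:<content>}` encodings shown in the README examples.

```python
# Minimal encoder sketch; all function names are made up for this note.

def unit():
    # The unit value, as used in the README examples.
    return b"u,"

def text(s):
    # Text is prefixed with its length in bytes (not characters).
    raw = s.encode("utf-8")
    return b"t%d:%s," % (len(raw), raw)

def tag(name, value):
    # A tag gives an already-encoded value a name.
    raw = name.encode("utf-8")
    return b"<%d:%s|%s" % (len(raw), raw, value)

def record(fields):
    # A record is the concatenation of its tags, prefixed with
    # the byte length of the whole encoded content.
    body = b"".join(tag(name, value) for name, value in fields)
    return b"{%d:%s}" % (len(body), body)

def to_map(fields):
    # Plain insert loop: a later field overwrites an earlier one,
    # so no reversing or "already seen" bookkeeping is needed.
    result = {}
    for name, value in fields:
        result[name] = value
    return result

fields = [("x", unit()), ("x", text("baz")), ("foo", unit())]
encoded = record(fields)
decoded = to_map(fields)

print(encoded)  # b'{28:<1:x|u,<1:x|t3:baz,<3:foo|u,}'
print(decoded)  # {'x': b't3:baz,', 'foo': b'u,'}
```

The content is 7 + 12 + 9 = 28 bytes, so the length prefix belongs directly after the opening `{`, before the first tag.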
@@ -98,6 +102,24 @@ Similar to records, lists start with the length of their whole encoded content.
 * The list with text `foo` followed by i3 `-42`: `[14:t3:foo,i3:-42,]`
 * The list with `Some` and `None` tags: `[35:<4:Some|t3:foo,<4:None|u,<4:None|u,]`
 
+## parser security considerations
+
+The length field is a decimal number with no limit on its number of digits,
+meaning an attacker could send an extremely long (or never-ending) length,
+thus overflowing your parser if you are not careful.
+
+You should thus put a practical limit on the number of digits in a length field,
+which implicitly enforces a limit on how long the value itself can be.
+
+Start by defining a maximum value length in bytes.
+Then count the number of decimal digits in that number.
+
+So if your max length is 1024 bytes, your length field can be at most `count_digits(1024) == 4` bytes long.
+
+Thus, if you restrict your parser to a length field of 4 bytes,
+it will never parse a value longer than 9999 bytes; to enforce exactly 1024 bytes, also compare the parsed length against the maximum
+(the full encoding adds 1 byte for the type tag, up to 4 bytes for the length, and 2 bytes for the separator & ending character).
+
 ## motivation
 
 TODO
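
Review note on the security section added above: a bounded length-field reader could look roughly like the following. This is a hedged Python sketch, not the project's parser; `MAX_VALUE_LEN`, `MAX_LEN_DIGITS` and `read_length` are hypothetical names, and it assumes the length is a run of ASCII digits terminated by `:` as in the README examples (`t3:foo,`, `{21:...}`). It checks both the digit count and the parsed value, since a four-digit field on its own still admits lengths up to 9999.

```python
MAX_VALUE_LEN = 1024                       # chosen maximum value length in bytes
MAX_LEN_DIGITS = len(str(MAX_VALUE_LEN))   # count_digits(1024) == 4

def read_length(buf, pos=0):
    """Read a decimal length terminated by ':' from buf, starting at pos.

    Returns (length, position of the first byte after the ':').
    Raises ValueError on malformed or oversized length fields.
    """
    digits = 0
    length = 0
    while pos < len(buf) and 48 <= buf[pos] <= 57:  # ASCII '0'..'9'
        digits += 1
        if digits > MAX_LEN_DIGITS:
            # Stop before reading an attacker-controlled, unbounded run of digits.
            raise ValueError("length field has too many digits")
        length = length * 10 + (buf[pos] - 48)
        pos += 1
    if digits == 0:
        raise ValueError("missing length field")
    if pos >= len(buf) or buf[pos] != ord(":"):
        raise ValueError("expected ':' after length field")
    if length > MAX_VALUE_LEN:
        # 4 digits alone would still allow 1025..9999.
        raise ValueError("value longer than the configured maximum")
    return length, pos + 1

print(read_length(b"3:foo,"))  # (3, 2)
```

A caller would strip the one-byte type tag first and then pass the remaining input to `read_length`; the digit check runs before any large number is accumulated, which is what makes the limit useful in languages with fixed-width integers.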