From ed68ba675191bc3da57019b14609deb713247821 Mon Sep 17 00:00:00 2001
From: Profpatsch <mail@profpatsch.de>
Date: Sat, 29 Jan 2022 12:50:19 +0100
Subject: feat(users/Profpatsch/netencode): ignore earlier record entries
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

It turns out that the netencode spec requiring to ignore *later*
entries meant that every parser has to do an extra check for each
element, instead of just overriding the key in the hash map.

This leads to a situation where the simple implementation is the wrong
one, which would lead to very subtle problems in parsers (see also the
infamous “json duplicate record entry” problem which has been used for
various exploits in the past).

To be fair, exploits are still possible, but at least a `Map.fromList`
will be the right implementation (provided it folds from the left) now
instead of the wrong one.

Examples of the trivial implementation being now right:

Python:

    > dict([("foo", 1), ("foo", 2)])
    {'foo': 2}

Rust:

    > println!("{:?}", HashMap::from([
      ("foo", 1),
      ("foo", 2)
    ]));
    {"foo": 2}

Haskell:

    > Data.Map.fromList [ ("foo", 1), ("foo", 2) ]
    fromList [("foo",2)]

Change-Id: Ife9593956f4718e5e720f4f348c227e4f3a71e2d
Reviewed-on: https://cl.tvl.fyi/c/depot/+/5108
Tested-by: BuildkiteCI
Reviewed-by: Profpatsch <mail@profpatsch.de>
Reviewed-by: sterni <sternenseemann@systemli.org>
Autosubmit: Profpatsch <mail@profpatsch.de>
---
 users/Profpatsch/netencode/README.md    | 6 +++++-
 users/Profpatsch/netencode/netencode.rs | 8 +++-----
 2 files changed, 8 insertions(+), 6 deletions(-)

(limited to 'users')

diff --git a/users/Profpatsch/netencode/README.md b/users/Profpatsch/netencode/README.md
index 67cb843a58c7..8dc39f633761 100644
--- a/users/Profpatsch/netencode/README.md
+++ b/users/Profpatsch/netencode/README.md
@@ -73,7 +73,11 @@ A tag (`<`) gives a value a name. The tag is UTF-8 encoded, starting with its le
 ### records (products/records), also maps
 
 A record (`{`) is a concatenation of tags (`<`). It needs to be closed with `}`.
-If tag names repeat the later ones should be ignored. Ordering does not matter.
+
+If tag names repeat the *earlier* ones should be ignored.
+Using the last tag corresponds with the way most languages handle converting a list of tuples to Maps, by using a for-loop and Map.insert without checking the contents first. Otherwise you’d have to revert the list first or remember which keys you already inserted.
+
+Ordering of tags in a record does not matter.
 
 Similar to text, records start with the length of their *whole encoded content*, in bytes. This makes it possible to treat their contents as opaque bytestrings.
 
diff --git a/users/Profpatsch/netencode/netencode.rs b/users/Profpatsch/netencode/netencode.rs
index 0d92cf1ed4ae..bb08dca4aa57 100644
--- a/users/Profpatsch/netencode/netencode.rs
+++ b/users/Profpatsch/netencode/netencode.rs
@@ -405,11 +405,9 @@ pub mod parse {
                     inner_no_empty_string(tag_g(&inner)),
                     HashMap::new(),
                     |mut acc: HashMap<_, _>, Tag { tag, mut val }| {
-                        // ignore duplicated tag names that appear later
+                        // ignore earlier tags with the same name
                         // according to netencode spec
-                        if !acc.contains_key(tag) {
-                            acc.insert(tag, *val);
-                        }
+                        let _ = acc.insert(tag, *val);
                         acc
                     },
                 ),
@@ -633,7 +631,7 @@ pub mod parse {
                 record_t("{25:<1:a|u,<1:b|u,<1:a|i1:-1,}".as_bytes()),
                 Ok((
                     "".as_bytes(),
-                    vec![("a".to_owned(), T::Unit), ("b".to_owned(), T::Unit),]
+                    vec![("a".to_owned(), T::I3(-1)), ("b".to_owned(), T::Unit),]
                         .into_iter()
                         .collect::<HashMap<_, _>>()
                 )),
-- 
cgit 1.4.1