1. 13 Jan, 2020 3 commits
  2. 06 Jan, 2020 2 commits
  3. 31 Dec, 2019 3 commits
    • rust-nodemap: pure Rust example · 00ea75fd5a4
      To run, use `cargo run --release --example nodemap`
      This demonstrates that simple scenarios written entirely
      in Rust can make do with `NodeTree<T>`.
      The example mmaps both the nodemap file and the revlog index.
      We of course had to include an implementation of `RevlogIndex`
      directly, which isn't much at this stage. It felt a bit
      premature to include it in the lib.
      Here are some first performance measurements, obtained with
      this example, on a clone of mozilla-central with 440000 revisions:
        (create) Nodemap constructed in RAM in 153.638305ms
        (query CAE63161B68962) found in 22.362us: Ok(Some(269489))
        (bench) Did 3 queries in 36.418µs (mean 12.139µs)
        (bench) Did 50 queries in 184.318µs (mean 3.686µs)
        (bench) Did 100000 queries in 31.053461ms (mean 310ns)
      To be fair, even between bench runs, results tend to depend on whether
      the file is still in kernel caches, and it's not so easy to
      get back to a real cold start. The worst we've seen was in the
      50µs ballpark.
      In any busy server setting, the pages would always be in RAM.
      We hope it's good enough not to be significantly slower on any
      concrete Mercurial operation than the C nodetree when fully in RAM,
      and of course this implementation has the serious head-start advantage
      of persistence.
      Differential Revision: https://phab.mercurial-scm.org/D7797
      Georges Racinet authored
    • rust-nodemap: input/output primitives · dabc0bed269
      These allow initializing a `NodeTree` from an immutable opaque
      sequence of bytes, which could be passed over from Python
      (extracted from a `PyBuffer`) or directly mmapped from a file.
      Conversely, we can consume
      a `NodeTree`, extracting the bytes that express what
      has been added to the immutable part, together with the
      original immutable part.
      This gives callers the choice of how to start a new `NodeTree`.
      After writing to disk, some would prefer to reread for
      best guarantees (very cheap if mmapping), while others will
      find it more convenient to grow the memory that was considered
      immutable in the `NodeTree` and continue from there.
      In `load_bytes`, we anticipate the final version of the file
      format a bit, allowing an offset for fixed data at the
      beginning of the file.
      This is enough to build examples running on real data and
      start gathering performance hints.
      Differential Revision: https://phab.mercurial-scm.org/D7796
      Georges Racinet authored
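
The load/consume round trip described in this commit message could be sketched as follows. This is a minimal illustration working on raw bytes, not the actual `NodeTree` code: the type name `ByteTree` and the methods `readonly_data`, `append` and `into_readonly_and_added` are hypothetical, and only the name `load_bytes` and its offset parameter come from the message above.

```rust
use std::ops::Deref;

/// Hypothetical stand-in for a tree backed by an immutable byte source
/// (a `Vec<u8>` here, possibly an mmap or a Python buffer in practice)
/// plus a growable part holding what was added since loading.
pub struct ByteTree {
    /// Opaque immutable bytes, including any fixed data at the start.
    readonly: Box<dyn Deref<Target = [u8]>>,
    /// Number of leading bytes to skip (assumed fixed header data).
    offset: usize,
    /// Bytes added after loading; never mixed into `readonly`.
    added: Vec<u8>,
}

impl ByteTree {
    /// Wrap immutable bytes, skipping `offset` bytes of fixed data.
    pub fn load_bytes(
        bytes: Box<dyn Deref<Target = [u8]>>,
        offset: usize,
    ) -> Self {
        assert!(offset <= bytes.len(), "offset past end of data");
        ByteTree { readonly: bytes, offset, added: Vec::new() }
    }

    /// The immutable payload, without the leading fixed data.
    pub fn readonly_data(&self) -> &[u8] {
        &self.readonly[self.offset..]
    }

    /// Record new data in memory only.
    pub fn append(&mut self, data: &[u8]) {
        self.added.extend_from_slice(data);
    }

    /// Consume the tree, handing back the untouched immutable part
    /// together with the bytes that were added on top of it.
    pub fn into_readonly_and_added(
        self,
    ) -> (Box<dyn Deref<Target = [u8]>>, Vec<u8>) {
        (self.readonly, self.added)
    }
}
```

After consuming the tree, a caller could append the added bytes to the file and either reload everything or keep working from the returned parts, which are the two options the message mentions.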
    • rust-nodemap: insert method · a65028b45d9
      In this implementation, we are in direct competition
      with the C version: this Rust version will have a clear
      startup advantage because it will read the data from disk,
      but the insertion happens all in memory for both.
      Differential Revision: https://phab.mercurial-scm.org/D7795
      Georges Racinet authored
  4. 27 Dec, 2019 2 commits
    • rust-nodemap: generic NodeTreeVisitor · 35d302b6657
      This iterator will help avoid code duplication when we
      implement `insert()`, in which we will need to
      traverse the node tree and remember the visited blocks.
      The iterator converts the three variants of `Element` into the
      boolean `leaf` and `Option<Revision>` instead of just emitting the
      variant it's seen. The motivation is to avoid a dead match arm
      in the future `insert()`.
      Differential Revision: https://phab.mercurial-scm.org/D7794
      Georges Racinet authored
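
A visitor of the kind this commit message describes might look like the sketch below. It is an illustration under assumptions, not the code from the changeset: the variant names of `Element`, the 16-slot `Block` layout, the `Visit` struct and the `visit` function are all hypothetical. It only shows how three variants can be flattened into a `leaf` flag and an `Option<Revision>` while the visited block indexes are recorded.

```rust
type Revision = i32;

/// Assumed shape of the three variants mentioned in the message.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Element {
    Rev(Revision),
    Block(usize),
    None,
}

/// Assumed block layout: one slot per hexadecimal digit (nybble).
pub type Block = [Element; 16];

/// One step of the traversal, already reduced to `leaf` and `rev`.
#[derive(Debug, PartialEq)]
pub struct Visit {
    pub block_idx: usize,
    pub nybble: u8,
    pub leaf: bool,
    pub rev: Option<Revision>,
}

pub struct Visitor<'a> {
    blocks: &'a [Block],
    nybbles: &'a [u8],
    next_block: Option<usize>,
    depth: usize,
}

/// Start a traversal at block `root`, following `nybbles` in order.
pub fn visit<'a>(
    blocks: &'a [Block],
    root: usize,
    nybbles: &'a [u8],
) -> Visitor<'a> {
    Visitor { blocks, nybbles, next_block: Some(root), depth: 0 }
}

impl<'a> Iterator for Visitor<'a> {
    type Item = Visit;

    fn next(&mut self) -> Option<Visit> {
        // Stop after a leaf, or when the nybbles are exhausted.
        let block_idx = self.next_block?;
        let nybble = *self.nybbles.get(self.depth)?;
        self.depth += 1;
        let slot = self.blocks[block_idx][(nybble & 0x0f) as usize];
        // The match on the three variants lives here only, so that
        // consumers deal with a boolean and an `Option` instead.
        let (leaf, rev, next) = match slot {
            Element::Block(idx) => (false, None, Some(idx)),
            Element::Rev(r) => (true, Some(r), None),
            Element::None => (true, None, None),
        };
        self.next_block = next;
        Some(Visit { block_idx, nybble, leaf, rev })
    }
}
```

A consumer such as an insertion routine could collect the emitted `block_idx` values to know which blocks need to be copied or updated.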
    • rust-nodemap: mutable NodeTree data structure · cb06a74efd2
      Thanks to the previously introduced indexing abstraction,
      the only difference in the lookup algorithm is that we
      don't need the special case for an empty NodeTree any more.
      We've considered making the mutable root an `Option<Block>`,
      but that leads to unpleasant checks and `unwrap()` unless we
      abstract it as typestate patterns (`NodeTree<Immutable>` and
      `NodeTree<Mutated>`), which seem exaggerated in that case.
      The initial copy of the root block is a very minor
      performance penalty, given that it typically occurs just once
      per transaction.
      Differential Revision: https://phab.mercurial-scm.org/D7793
      Georges Racinet authored
  5. 26 Dec, 2019 2 commits
  6. 27 Dec, 2019 1 commit
    • rust-node: handling binary Node prefix · 9147c562047
      Parallel to the inner signatures of the nodetree functions in
      revlog.c, we'll have to handle prefixes of `Node` in binary form.
      There's a complication due to the fact that we'll sometimes be
      interested in prefixes with an odd number of hexadecimal digits,
      which translate in binary form to a last byte in which only the
      4 most significant bits are considered.
      There are a few candidates for inlining here, but we refrain from
      such premature optimizations, letting the compiler decide.
      Differential Revision: https://phab.mercurial-scm.org/D7790
      Georges Racinet authored
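
The odd-length case described in this commit message can be illustrated with the sketch below. It is not the code from the changeset: the `NodePrefix` name, its fields and its methods are assumptions made for the example. It packs two hexadecimal digits per byte and, for an odd number of digits, compares only the high 4 bits of the last byte.

```rust
/// Hypothetical binary form of a hexadecimal node prefix.
#[derive(Debug, PartialEq)]
pub struct NodePrefix {
    /// Packed digits, two per byte, high nybble first.
    buf: Vec<u8>,
    /// True if the prefix has an odd number of hex digits, in which
    /// case only the high 4 bits of the last byte are meaningful.
    is_odd: bool,
}

impl NodePrefix {
    /// Parse a hex string such as "cae63" into binary form.
    /// Returns `None` if any character is not a hexadecimal digit.
    pub fn from_hex(hex: &str) -> Option<Self> {
        let digits: Vec<u8> = hex
            .chars()
            .map(|c| c.to_digit(16).map(|d| d as u8))
            .collect::<Option<_>>()?;
        let is_odd = digits.len() % 2 == 1;
        let mut buf = Vec::with_capacity((digits.len() + 1) / 2);
        for pair in digits.chunks(2) {
            let high = pair[0] << 4;
            // For an odd length, the missing low nybble is padded
            // with zero and ignored in comparisons.
            let low = if pair.len() == 2 { pair[1] } else { 0 };
            buf.push(high | low);
        }
        Some(NodePrefix { buf, is_odd })
    }

    /// Number of hexadecimal digits in the prefix.
    pub fn len(&self) -> usize {
        self.buf.len() * 2 - if self.is_odd { 1 } else { 0 }
    }

    /// True if the binary `node` starts with this prefix.
    pub fn is_prefix_of(&self, node: &[u8]) -> bool {
        if node.len() < self.buf.len() {
            return false;
        }
        // Number of bytes that must match in full.
        let full = self.len() / 2;
        if self.buf[..full] != node[..full] {
            return false;
        }
        if self.is_odd {
            // Only the high 4 bits of the last byte count.
            return (self.buf[full] & 0xf0) == (node[full] & 0xf0);
        }
        true
    }
}
```

With this layout, the prefix `cae63` is stored as the bytes `ca e6 30` and matches any node whose third byte lies between `30` and `3f`.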
  7. 25 Dec, 2019 1 commit
  8. 27 Dec, 2019 1 commit
  9. 25 Dec, 2019 1 commit
  10. 15 Jan, 2020 3 commits
  11. 14 Jan, 2020 1 commit
  12. 15 Jan, 2020 3 commits
  13. 14 Jan, 2020 7 commits
  14. 23 Dec, 2019 2 commits
    • verify: allow the storage to signal when renames can be tested on `skipread` · b9e174d4ed1
      This applies the new marker in the lfs handler to show it in action, and adds
      the test mentioned at the beginning of the series to show that fulltext isn't
      necessary in the LFS case.
      The existing `skipread` isn't enough, because it is also set if an error occurs
      reading the revlog data, or the data is censored.  It could probably be cleared,
      but then it technically violates the interface contract.  That wouldn't matter
      for the existing verify algorithm, but it isn't clear how that will change as
      alternate storage support is added.
      The flag is probably pretty revlog specific, given the comments in verify.py.
      But there's already filelog specific stuff in there and I'm not sure what future
      storage will bring, so I don't want to over-engineer this.  Likewise, I'm not
      sure that we want the verify method for each storage type to completely drive
      the bus when it comes to detecting renames, so I don't want to go down the
      rabbithole of having verifyintegrity() return metadata hints at this point.
      Differential Revision: https://phab.mercurial-scm.org/D7713
      Matt Harbison authored
    • lfs: don't skip locally available blobs when verifying · 1a6dd50cd0d
      The `skipflags` config was introduced in a2ab9ebcd85b, which specifically calls
      out downloading and storing all blobs as potentially too expensive.  But I don't
      see any reason to skip blobs that are already available locally.  Hashing the
      blob is the only way to indirectly verify the rawdata content stored in the
      revlog.
      (The note in that commit about skipping renamed is still correct, but the reason
      given about needing fulltext isn't.)
      Differential Revision: https://phab.mercurial-scm.org/D7712
      Matt Harbison authored
  15. 20 Dec, 2019 1 commit
  16. 08 Jan, 2020 2 commits
  17. 07 Dec, 2019 2 commits
  18. 14 Jan, 2020 3 commits