Skip to content
  • Pierre-Yves David's avatar
    match: sort patterns before compiling them into a regex · 47686726545d
    Pierre-Yves David authored
    While investigating cripping performance for `hg cat` in some context, I
    discovered that, for large inputs, building a regex from out of order patterns
    result may result in a *much* slower regex and a much slower associated
    matcher's performance.
    
    So we are now sorting the patterns to help the regex engine.
    
    There is more to the story as we rely on regexp more than we should. See the
    next changeset for details.
    
    Benchmarks
    ==========
    
    In the following benchmark we are comparing the `hg cat` and `hg files` run
    time when matching against the full list of files in the repository. They are
    run:
     - without the rust extensions
     - with the standard python enfine (so without re2)
    
    sort vs non-sorted - Before this changeset (3f5137543773)
    ---------------------------------------------------------
    
    ###### hg files ###############################################################
    ### mercurial-2018-08-01-zstd-sparse-revlog
    sorted:      0.230092 seconds
    shuffled:    0.234235 seconds    (+1.80%)
    ### pypy-2018-08-01-zstd-sparse-revlog
    sorted:      0.613567 seconds
    shuffled:    0.801880 seconds   (+30.69%)
    ### mozilla-central-2018-08-01-zstd-sparse-revlog
    sorted:     62.474221 seconds
    shuffled: 1364.180218 seconds (+2083.59%)
    ### netbeans-2018-08-01-zstd-sparse-revlog
    sorted:    21.541828 seconds
    shuffled: 172.759857 seconds   (+701.97%)
    
    ###### hg cat #################################################################
    ### mercurial-2018-08-01-zstd-sparse-revlog
    sorted:      0.764407 seconds
    shuffled:    0.768924 seconds
    ### pypy-2018-08-01-zstd-sparse-revlog
    sorted:      2.065220 seconds
    shuffled:    2.276388 seconds  (+10.22%)
    ### netbeans-2018-08-01-zstd-sparse-revlog
    sorted:     40.967983 seconds
    shuffled:  216.388709 seconds (+428.19%)
    ### mozilla-central-2018-08-01-zstd-sparse-revlog
    sorted:    105.228510 seconds
    shuffled: 1448.722784 seconds (+1276.74%)
    
    sort vs non-sorted - With this changeset
    ----------------------------------------
    
    ###### hg files ###############################################################
    ### mercurial-2018-08-01-zstd-sparse-revlog
    all-list-pattern-sorted:    0.230069
    all-list-pattern-shuffled:  0.231165
    ### pypy-2018-08-01-zstd-sparse-revlog
    all-list-pattern-sorted:    0.616799
    all-list-pattern-shuffled:  0.616393
    ### netbeans-2018-08-01-zstd-sparse-revlog
    all-list-pattern-sorted:   21.586773
    all-list-pattern-shuffled: 21.908197
    ### mozilla-central-2018-08-01-zstd-sparse-revlog
    all-list-pattern-sorted:   61.279490
    all-list-pattern-shuffled: 62.473549
    
    ###### hg cat #################################################################
    ### mercurial-2018-08-01-zstd-sparse-revlog
    sorted:      0.763883 seconds
    shuffled:    0.765848 seconds
    ### pypy-2018-08-01-zstd-sparse-revlog
    sorted:      2.070498 seconds
    shuffled:    2.069197 seconds
    ### netbeans-2018-08-01-zstd-sparse-revlog
    sorted:     41.392423 seconds
    shuffled:   41.648689 seconds
    ### mozilla-central-2018-08-01-zstd-sparse-revlog
    sorted:    103.315670 seconds
    shuffled:  104.369358 seconds
    47686726545d