source: kBuild/vendor/grep/current/ChangeLog

Last change on this file was 3529, checked in by bird, 3 years ago

Imported grep 3.7 from grep-3.7.tar.gz (sha256: c22b0cf2d4f6bbe599c902387e8058990e1eee99aef333a203829e5fd3dbb342), applying minimal auto-props.

1 2021-08-14 Jim Meyering <meyering@fb.com>
2
3 version 3.7
4 * NEWS: Record release date.
5
6 2021-08-09 Jim Meyering <meyering@fb.com>
7
8 tests: provide an awk-based seq replacement
9 ...so we can continue to use seq, but the wrapper when needed.
10 * tests/init.cfg (seq): Some systems lack seq.
11 Provide a replacement.
12 * tests/hash-collision-perf: Use seq once again.
13 * tests/long-pattern-perf: Likewise. And remove a comment about seq.
14
15 2021-08-09 Paul Eggert <eggert@cs.ucla.edu>
16
17 grep: simplify EGexecute
18 * src/dfasearch.c (EGexecute): Remove a label and goto.
19 This also makes the machine code a bit shorter, on x86-64 gcc.
20
21 grep: simplify data movement slightly
22 * src/grep.c (fillbuf): Simplify movement of saved data.
23
24 grep: pointer-integer cast nit
25 * src/grep.c (ALIGN_TO): When converting pointers to unsigned
26 integers, convert to uintptr_t not size_t, as size_t in theory
27 might be too narrow.
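
As a rough illustration of the cast described in the ALIGN_TO entry above, here is a minimal sketch of rounding a pointer up to a power-of-two boundary by way of uintptr_t. The helper name align_up is hypothetical and this is not grep's actual macro; it only shows why uintptr_t is the safer intermediate type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not grep's ALIGN_TO: round P up to the next
   multiple of ALIGNMENT, which must be a power of two.  The pointer is
   converted to uintptr_t rather than size_t, because uintptr_t (where
   it exists) is specified to hold any converted object pointer, while
   size_t only needs to hold object sizes.  */
static char *
align_up (char *p, size_t alignment)
{
  assert (alignment != 0 && (alignment & (alignment - 1)) == 0);
  uintptr_t addr = (uintptr_t) p;
  uintptr_t misalign = addr & (alignment - 1);
  size_t pad = misalign ? alignment - misalign : 0;
  return p + pad;
}
```

The padding is added to the original pointer instead of converting the rounded integer back, so the result stays derived from P.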
28
29 tests: use awk, not seq
30 Portability problem reported by Dagobert Michelsen in:
31 https://lists.gnu.org/r/grep-devel/2021-08/msg00004.html
32 * tests/hash-collision-perf, tests/long-pattern-perf:
33 Don’t assume seq is installed; use awk instead.
34
35 2021-08-08 Jim Meyering <meyering@fb.com>
36
37 build: update gnulib to latest
38
39 build: update gnulib to latest
40
41 2021-08-06 Kevin Locke <kevin@kevinlocke.name>
42
43 doc: usage: --group-separator/--no-group-separator
44 * src/grep.c (usage): Document --group-separator
45 and --no-group-separator.
46
47 doc: man: add --group-separator/--no-group-separator
48 * doc/grep.in.1:
49 Add copy of docs for --group-separator from doc/grep.texi.
50 Add copy of docs for --no-group-separator from doc/grep.texi.
51
52 2021-08-06 Jim Meyering <meyering@fb.com>
53
54 build: update gnulib to latest
55
56 2021-06-19 Mateusz Okulus <mmokulus@gmail.com>
57
58 doc: note that -H is a GNU extension in man page, too
59 * doc/grep.in.1 (-H): Mention that this is a GNU extension.
60
61 2021-06-13 Paul Eggert <eggert@cs.ucla.edu>
62
63 build: update gnulib submodule to latest
64
65 2021-06-11 Paul Eggert <eggert@cs.ucla.edu>
66
67 build: update gnulib submodule to latest
68
69 2021-06-10 Paul Eggert <eggert@cs.ucla.edu>
70
71 doc: improve examples and wording
72 * doc/grep.texi (The Backslash Character and Special Expressions)
73 (Usage): Improve doc (Bug#48948).
74
75 2021-01-31 Jim Meyering <meyering@fb.com>
76
77 doc: man: fix -L description and improve -l's
78 * doc/grep.texi (-L): Remove erroneous sentence about stopping early.
79 With -L, grep cannot stop scanning early.
80 (-l): Tweak existing wording.
81 * doc/grep.in.1: Remove the -L sentence here, too.
82 (-l): Copy the sentence from grep.texi, to clarify: it's only per-file
83 scanning that stops upon match. Reported by Robert Bruntz
84 in http://debbugs.gnu.org/46179
85
86 2021-01-05 Jim Meyering <meyering@fb.com>
87
88 build: avoid long-string warnings in gnulib tests
89 * configure.ac (GNULIB_TEST_WARN_CFLAGS): Add
90 -Woverlength-strings to avoid clang warnings.
91
92 2021-01-01 Paul Eggert <eggert@cs.ucla.edu>
93
94 doc: further clarify regexp structure
95 * doc/grep.texi (Fundamental Structure)
96 (Back-references and Subexpressions, Basic vs Extended):
97 Further clarifications.
98
99 maint: copy bootstrap, tests/init.sh from Gnulib
100
101 doc: update grep.texi cite to 2021
102
103 maint: run "make update-copyright"
104
105 build: update gnulib submodule to latest
106
107 2020-12-30 Jim Meyering <meyering@fb.com>
108
109 build: update gnulib to latest
110 * gnulib: update for clang-10 warning-avoidance
111 fixes in hash and regex-tests.
112
113 maint: add parentheses to avoid new clang-10 warning
114 * src/dfasearch.c (regex_compile): Parenthesize arith-OR vs
115 ternary, to placate clang-10.
116
117 2020-12-29 Paul Eggert <eggert@cs.ucla.edu>
118
119 doc: clarify special chars and }
120 * doc/grep.texi (Fundamental Structure)
121 (Character Classes and Bracket Expressions)
122 (The Backslash Character and Special Expressions, Anchoring)
123 (Basic vs Extended): Clarify which characters are special,
124 and why \ is needed before } in grep even though } is not special.
125 Use Posix terminology for ordinary and special characters and for
126 interval expressions.
127
128 2020-12-29 Marek Suppa <mr@shu.io>
129
130 doc: fix missing right curly brace
131 * doc/grep.texi (Basic vs Extended Regular Expressions): Mention that
132 the right curly brace (}) meta-character must be backslash-escaped.
133 It had been omitted from the list.
134
135 2020-12-25 Jim Meyering <meyering@fb.com>
136
137 build: update gnulib to latest
138
139 grep: use of --unix-byte-offsets (-u) now elicits a warning
140 * NEWS (Change in behavior): Mention this.
141 * src/grep.c (main): Warn about each use of obsolete
142 --unix-byte-offsets (-u).
143 * doc/grep.in.1 (-u): Remove its documentation.
144
145 2020-12-23 Helge Kreutzmann <debian@helgefjell.de>
146
147 doc: adjust man page syntax
148 * doc/grep.in.1: Mark some manual names with B<...>.
149 Mark PATTERNS with I<...>.
150 Drop final period in SEE ALSO.
151 With suggestions from several members of the manpage-l10n
152 translation community. This resolves https://bugs.gnu.org/45353
153
154 2020-11-26 Jim Meyering <meyering@fb.com>
155
156 grep: avoid performance regression with many patterns
157 * src/grep.c (hash_pattern): Switch from PJW to DJB2, to avoid an
158 O(N) to O(N^2) performance regression due to hash collisions with
159 patterns from e.g., seq 500000|tr 0-9 A-J
160 Reported by Frank Heckenbach in https://bugs.gnu.org/44754
161 * NEWS (Bug fixes): Mention it.
162 * tests/hash-collision-perf: New file.
163 * tests/Makefile.am (TESTS): Add it.
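
For readers unfamiliar with the hash named in the entry above, the following is a minimal sketch of the classic DJB2 scheme (seed 5381, multiply by 33, add each byte). The function name djb2_hash and the reduction modulo a bucket count are assumptions for illustration; grep's hash_pattern may differ in its details.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the classic DJB2 hash (h = h * 33 + c, seeded with 5381)
   over a byte buffer.  Hypothetical helper; grep's hash_pattern may
   differ in details such as how the result is reduced.  */
static size_t
djb2_hash (const char *buf, size_t len, size_t n_buckets)
{
  assert (n_buckets != 0);
  size_t h = 5381;
  for (size_t i = 0; i < len; i++)
    h = h * 33 + (unsigned char) buf[i];
  return h % n_buckets;
}
```

Because every byte is folded in with a multiply, inputs that differ only in letter order, such as the A-J strings produced by the tr command above, tend to spread across buckets.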
164
165 build: update gnulib to latest for warning fixes
166 * gnulib: Update submodule to latest.
167 * src/grep.c (printf_errno): Reflect gnulib's renaming: change
168 _GL_ATTRIBUTE_FORMAT_PRINTF to
169 _GL_ATTRIBUTE_FORMAT_PRINTF_STANDARD
170
171 tests: enable warnings for the gnulib-tests subdir
172 * gnulib-tests/Makefile.am (AM_CFLAGS): Enable gnulib
173 warning options for these tests.
174 * configure.ac (GNULIB_TEST_WARN_CFLAGS): Disable the same three
175 warning options that coreutils does, and a few more for GCC11.
176
177 2020-11-08 Jim Meyering <meyering@fb.com>
178
179 maint: post-release administrivia
180 * NEWS: Add header line for next release.
181 * .prev-version: Record previous version.
182 * cfg.mk (old_NEWS_hash): Auto-update.
183
184 version 3.6
185 * NEWS: Record release date.
186
187 2020-11-05 Jim Meyering <meyering@fb.com>
188
189 build: update gnulib to latest for test improvements
190
191 2020-11-03 Jim Meyering <meyering@fb.com>
192
193 build: update gnulib to latest for C++-ready dfa.h and test-verify.c fix
194
195 2020-11-03 Paul Eggert <eggert@cs.ucla.edu>
196
197 grep: remove GREP_OPTIONS
198 * NEWS: Mention this.
199 * doc/grep.in.1:
200 Remove GREP_OPTIONS documentation.
201 * doc/grep.texi (Environment Variables):
202 Move GREP_OPTIONS stuff into a “no longer implemented” paragraph.
203 * src/grep.c (prepend_args, prepend_default_options): Remove.
204 (main): Do not look at GREP_OPTIONS.
205 * tests/Makefile.am (TESTS_ENVIRONMENTS):
206 * tests/init.cfg (vars_): Remove GREP_OPTIONS.
207
208 2020-11-01 Norihiro Tanaka <noritnk@kcn.ne.jp>
209
210 grep: use RE_NO_SUB when calling regex solely to check syntax
211 * src/dfasearch.c (regex_compile): New parameter. All callers changed.
212 (GEAcompile): Move setting syntax for regex into regex_compile()
213 function. This addresses a performance problem exposed by extreme
214 regular expressions, as described in https://bugs.gnu.org/43862 .
215
216 tests: add the test for bugfix in gnulib's dfa
217 * tests/ere.tests: Add new test.
218
219 2020-11-01 Jim Meyering <meyering@fb.com>
220
221 grep: avoid erroneous matches for e.g., a+a+a+
222 * gnulib: Update to latest, for dfa's invalid-merge fix.
223 * NEWS (Bug fixes): Mention this.
224
225 2020-10-11 Jim Meyering <meyering@fb.com>
226
227 grep: -P: report input filename upon PCRE execution failure
228 Without this, it could be tedious to determine which input
229 file evokes a PCRE-execution-time failure.
230 * src/pcresearch.c (Pexecute): When failing, include the
231 error-provoking file name in the diagnostic.
232 * src/grep.c (input_filename): Make extern, since used above.
233 * src/search.h (input_filename): Declare.
234 * tests/filename-lineno.pl: Test for this.
235 ($no_pcre): Factor out.
236 * NEWS (Bug fixes): Mention this.
237
238 2020-10-11 Paul Eggert <eggert@cs.ucla.edu>
239
240 grep: minor kwset cleanups
241 * src/kwsearch.c (Fexecute):
242 Assume C99 to put declarations nearer uses.
243 * src/kwset.c (bmexec): Omit unnecessary test.
244 * src/kwset.h (struct kwsmatch): Make OFFSET and SIZE individual
245 elements, not arrays of size 1 (a revenant of an earlier API).
246 All uses changed.
247
248 2020-10-11 Norihiro Tanaka <noritnk@kcn.ne.jp>
249
250 grep: remove unused code
251 * src/kwsearch.c (Fcompile, Fexecute): Remove unused code. These are no
252 longer used after commit 016e590a8198009bce0e1078f6d4c7e037e2df3c.
253
254 2020-10-05 Paul Eggert <eggert@cs.ucla.edu>
255
256 build: update gnulib submodule to latest
257
258 2020-10-05 Jim Meyering <meyering@fb.com>
259
260 tests: correct filename-lineno.pl
261 * tests/filename-lineno.pl: Remove a stray envvar
262 that somehow slipped into expected output string.
263
264 2020-10-05 Paul Eggert <eggert@cs.ucla.edu>
265
266 tests: fix tests when PCRE is not used
267 * tests/Makefile.am (TESTS_ENVIRONMENT):
268 Set PATH before setting PCRE_WORKS, so that the latter test
269 uses the just-built grep.
270 * tests/filename-lineno.pl (invalid-re-P-paren)
271 (invalid-re-P-star-paren): Adjust non-PCRE case to match
272 recently-changed behavior.
273
274 build: update gnulib submodule to latest
275
276 2020-10-03 Paul Eggert <eggert@cs.ucla.edu>
277
278 doc: document --include/--exclude better
279 Problem reported by John Ruckstuhl (Bug#43782).
280 * doc/grep.texi (File and Directory Selection):
281 Document what happens if contradictory options are given,
282 or if no option matches a file name.
283 * doc/grep.in.1:
284
285 2020-10-01 Jim Meyering <meyering@fb.com>
286
287 maint: add technically-required quotes
288 * configure.ac: Quote args of AC_CONFIG_AUX_DIR, AC_CONFIG_SRCDIR
289 and AC_CHECK_FUNCS_ONCE.
290
291 2020-09-28 Jim Meyering <meyering@fb.com>
292
293 tests: restore deleted -P tests
294 v3.4-almost-45-g8577dda deleted these two -P-using tests because a
295 grep built without PCRE support would fail those tests. This sets
296 an envvar with the equivalent of the result from the require_pcre_
297 function and restores the now-guarded tests. Tested by running this:
298 ./configure --disable-perl-regexp && make check
299 * tests/Makefile.am (PCRE_WORKS): Set this envvar.
300 * tests/filename-lineno.pl: Restore invalid-re-P-paren and
301 invalid-re-P-star-paren, now each with a guard.
302
303 2020-09-27 Jim Meyering <meyering@fb.com>
304
305 maint: post-release administrivia
306 * NEWS: Add header line for next release.
307 * .prev-version: Record previous version.
308 * cfg.mk (old_NEWS_hash): Auto-update.
309
310 version 3.5
311 * NEWS: Record release date.
312
313 maint: avoid autoconf warnings * configure.ac (AC_HEADER_STDC): Remove. It's been assumed for ages. * m4/pcre.m4 (gl_FUNC_PCRE): Use AS_HELP_STRING, not AC_HELP_STRING.
314
315 build: update gnulib to latest
316
317 2020-09-26 Jim Meyering <meyering@fb.com>
318
319 build: update gnulib to latest
320
321 tests: skip stack-overflow test when built with ASAN
322 * tests/stack-overflow: Skip this test when the binary was built
323 with ASAN, to avoid spurious failures.
324
325 2020-09-25 Paul Eggert <eggert@cs.ucla.edu>
326
327 build: update gnulib submodule to latest
328
329 build: update gnulib submodule to latest
330
331 2020-09-24 Jim Meyering <meyering@fb.com>
332
333 tests: fix surrogate-pair test to work on 16-bit wchar_t systems
334 * tests/surrogate-pair: Avoid new failure on systems with
335 16-bit wchar_t. Detect the condition and exit before the
336 otherwise-failing tests. Remove the now-incorrect in-loop
337 test for that alternate failure mode. This was exposed by
338 testing on gcc119.fsffrance.org, a power8 AIX 7.2 system.
339
340 2020-09-23 Paul Eggert <eggert@cs.ucla.edu>
341
342 grep: don't assume PCRE in tests
343 * tests/filename-lineno.pl: Remove invalid-re-P-paren and
344 invalid-re-P-star-paren as they assume PCRE support, which
345 causes a false alarm "grep: Perl matching not supported in a
346 --disable-perl-regexp build" on platforms without PCRE.
347
348 grep: pacify Sun C 5.15
349 This suppresses a false alarm '"grep.c", line 720: warning:
350 initializer will be sign-extended: -1'.
351 * src/grep.c (uword_max): New static constant.
352 (initialize_unibyte_mask): Use it.
353
354 2020-09-23 Paul Eggert <eggert@cs.ucla.edu>
355 Norihiro Tanaka <noritnk@kcn.ne.jp>
356
357 grep: fix more Turkish-eyes bugs
358 Fix more bugs recently uncovered by Norihiro Tanaka (Bug#43577).
359 * NEWS: Mention new bug report.
360 * src/grep.c (ok_fold): New static var.
361 (setup_ok_fold): New function.
362 (fgrep_icase_charlen): Reject single-byte characters
363 if they match some multibyte characters when ignoring case.
364 This part of the patch is partly derived from
365 <https://bugs.gnu.org/43577#14>, which means it is:
366 (main): Call setup_ok_fold if ok_fold might be needed.
367 * src/searchutils.c (kwsinit): With the grep.c changes,
368 this code can now revert to classic 7th Edition Unix style;
369 aborting would be wrong.
370 * tests/turkish-eyes: Add tests for these bugs.
371
372 2020-09-23 Paul Eggert <eggert@cs.ucla.edu>
373
374 build: update gnulib submodule to latest
375 * NEWS: Mention Bug#43577, which this fixes.
376
377 grep: fix recently-introduced performance glitch
378 * src/grep.c (main): Do not double-increment update_patterns.
379 update_patterns increments n_patterns now; do not increment it
380 again, as the incorrect count would hurt performance heuristics later.
381
382 2020-09-22 Paul Eggert <eggert@cs.ucla.edu>
383
384 doc: improve --line-buffered doc
385 * doc/grep.texi (Other Options): Document --line-buffered more
386 carefully, and say what happens when it is not used. Problem
387 reported by Dan Jacobson (Bug#35339).
388
389 tests: port timeout test to Alpine
390 Problem reported by Bruno Haible in:
391 https://lists.gnu.org/r/grep-devel/2020-09/msg00080.html
392 * tests/init.cfg (require_timeout_): Check that ‘timeout 0.01
393 sleep 0.02’ works as expected, to avoid spurious test failure
394 on Alpine.
395
396 2020-09-22 Jim Meyering <meyering@fb.com>
397
398 tests: test for many-regexp N^2 RSS regression
399 * tests/many-regex-performance: New test for this performance
400 regression.
401 * tests/Makefile.am: Add it.
402 * NEWS (Bug fixes): Describe it.
403
404 2020-09-22 Norihiro Tanaka <noritnk@kcn.ne.jp>
405
406 grep: avoid unnecessary regex compilation
407 Grep resorts to using the regex engine when the precision of either
408 -o or --color is required, or when the pattern is not supported by
409 our DFA engine (e.g., backref). Otherwise, grep would perform regex
410 compilation solely to check the syntax. This change makes grep skip
411 that compilation in the common case for which it is unnecessary.
412
413 The compilation we are avoiding is quite costly, consuming O(N^2)
414 RSS for N regular expressions.
415
416 * src/dfasearch.c (GEAcompile): Add new argument, and avoid unneeded
417 compilation of regex.
418 * src/grep.c (compile_fp_t): Update prototype.
419 (main): Update caller.
420 * src/kwsearch.c (Fcompile): Update caller and add new argument.
421 * src/pcresearch.c (Pcompile): Add new argument.
422 * src/search.h (GEAcompile, Fcompile, Pcompile): Update prototype.
423
424 2020-09-22 Jim Meyering <meyering@fb.com>
425
426 build: update gnulib to latest
427
428 tests: skip stack-overflow test on midnightbsd*
429 * tests/stack-overflow: skip_ when run on this OS. See details
430 in https://lists.gnu.org/r/grep-devel/2020-09/msg00062.html
431 * tests/Makefile.am (host_triplet): Export.
432
433 2020-09-21 Paul Eggert <eggert@cs.ucla.edu>
434
435 doc: say how to match chars by code
436 From a suggestion in Bug#41004.
437 * doc/grep.texi (Character Encoding, Matching Non-ASCII):
438 New sections. Move some material from Environment Variables
439 into these sections.
440
441 2020-09-18 Paul Eggert <eggert@cs.ucla.edu>
442
443 * src/dfasearch.c (struct dfa_comp): Fix out-of-date comment.
444
445 grep: "grep '\)'" reports an error again
446 * src/grep.c (try_fgrep_pattern): With -G, pass \) through to
447 GEAcompile so that it can complain. This fixes an unexpected
448 change in behavior from grep 3.4 and earlier.
449 * tests/filename-lineno.pl: Add tests for this sort of thing.
450
451 grep: tweak by using mempcpy
452 * src/grep.c (try_fgrep_pattern): Tweak previous change
453 by using mempcpy.
454
455 2020-09-18 Jim Meyering <meyering@fb.com>
456
457 grep: make echo .|grep '\.' match once again
458 The same applied for many other backslash-escaped bytes, not just
459 metacharacters. The switch to rawmemchr in v3.4-almost-10-g9393b97
460 made some parts of the code require the usually-guaranteed newline
461 sentinel at the end of each pattern. Before, some consumers used a
462 (correct) pattern length and did not care that try_fgrep_pattern could
463 transform a pattern (with sentinel) like "\\.\n" to "..\n", thus
464 violating that assumption.
465 * src/grep.c (try_fgrep_pattern): Preserve the invariant
466 that each regexp is newline-terminated.
467 * tests/backslash-dot: New file. Test for this.
468 * tests/Makefile.am (TESTS): Add it.
469
470 tests: triple-backref: print a reference to glibc bug
471 * tests/triple-backref (MALLOC_CHECK_): And tell glibc not to
472 bother with a core dump. Suggested by Pádraig Brady.
473
474 2020-09-18 Paul Eggert <eggert@cs.ucla.edu>
475
476 grep: be more consistent about diagnostic format
477 * NEWS: Mention this.
478 * bootstrap.conf (gnulib_modules): Remove 'quote'.
479 * src/grep.c: Do not include quote.h.
480 (grep, grepdirent, grepdesc): Put the three unusual diagnostics
481 into the same "grep: FOO: message" form that grep uses elsewhere.
482 * tests/binary-file-matches, tests/in-eq-out-infloop:
483 Adjust tests to match new diagnostic format.
484
485 2020-09-17 Jim Meyering <meyering@fb.com>
486
487 build: update gnulib to latest
488
489 2020-09-17 Paul Eggert <eggert@cs.ucla.edu>
490
491 * tests/triple-backref: Add comment.
492
493 2020-09-17 Jim Meyering <meyering@fb.com>
494
495 tests: make new test executable, to placate distcheck
496 * tests/binary-file-matches: Make this executable.
497
498 tests: add coverage for code that emits the new diagnostic
499 * tests/binary-file-matches: New file.
500 * tests/Makefile.am (TESTS): Add it.
501
502 maint: avoid syntax-check failure
503 * src/grep.c (grep): Lower-case the "B" in "Binary file... matches"
504 diagnostic that we now emit to stderr. This avoids the following
505 when running "make syntax-check":
506 maint.mk: found capitalized error message
507 make: *** [maint.mk:469: sc_error_message_uppercase] Error 1
508
509 2020-09-17 Paul Eggert <eggert@cs.ucla.edu>
510
511 Send "Binary file FOO matches" to stderr
512 * NEWS, doc/grep.texi: Mention this change (Bug#29668).
513 * src/grep.c (grep): Send "Binary file FOO matches" to stderr
514 instead of stdout.
515 * tests/encoding-error, tests/invalid-multibyte-infloop:
516 * tests/null-byte, tests/pcre-count, tests/surrogate-pair:
517 * tests/symlink, tests/unibyte-binary:
518 Adjust tests to match new behavior. In all cases this
519 simplifies the tests, which is a good sign.
520
521 Suppress "Binary file FOO matches" if -I
522 Problem reported by Jason Franklin (Bug#33552).
523 * NEWS: Mention this.
524 * src/grep.c (grep): Do not output "Binary file FOO matches" if -I.
525 * tests/encoding-error: Add test for this bug.
526
527 2020-09-15 Jim Meyering <meyering@fb.com>
528
529 maint: keep two blank lines before each old Noteworthy line.
530 * NEWS: Insert a blank line.
531
532 2020-09-15 Paul Eggert <eggert@cs.ucla.edu>
533
534 build: update gnulib submodule to latest
535
536 2020-09-13 Paul Eggert <eggert@cs.ucla.edu>
537
538 build: update gnulib submodule to latest
539
540 2020-09-12 Paul Eggert <eggert@cs.ucla.edu>
541
542 build: update gnulib submodule to latest
543
544 2020-09-11 Jim Meyering <meyering@fb.com>
545
546 build: update gnulib to latest
547
548 2020-09-09 Paul Eggert <eggert@cs.ucla.edu>
549
550 grep: fix logic for growing PCRE JIT stack
551 * src/pcresearch.c (jit_exec) [PCRE_EXTRA_MATCH_LIMIT_RECURSION]:
552 When growing the match_limit_recursion limit, do not use the old
553 value if ! (flags & PCRE_EXTRA_MATCH_LIMIT_RECURSION), as it is
554 uninitialized in that case.
555
556 grep: fix PCRE JIT test when JIT not available
557 Problem reported by Thomas Deutschmann (Bug#29446#23).
558 * src/pcresearch.c (Pexecute): Diagnose PCRE_ERROR_RECURSIONLIMIT.
559 * tests/pcre-jitstack: Treat recursion limit overflow like stack
560 overflow.
561
562 grep: fix -w bug in UTF-8 locales
563 Problem reported by Mayo Fark (Bug#43225).
564 * src/searchutils.c (wordchar_prev): In a UTF-8 locale, do not
565 assume that an encoding-error byte cannot be part of a word
566 constituent, as this assumption is incorrect for the last byte
567 of a multibyte word constituent.
568 * tests/word-delim-multibyte: Add a test for the bug.
569
570 Distribute a gzip tarball again
571 Requested by Issam E. Maghni in:
572 https://lists.gnu.org/r/grep-devel/2020-09/msg00000.html
573 * configure.ac (AM_INIT_AUTOMAKE): Remove no-dist-gzip.
574
575 * README-prereq: Also mention xz.
576
577 2020-09-07 Paul Eggert <eggert@cs.ucla.edu>
578
579 Prefer rawmemchr to memchr when it’s easy
580 * bootstrap.conf (gnulib_modules): Add rawmemchr.
581 * src/dfasearch.c (GEAcompile, EGexecute):
582 * src/grep.c (update_patterns, prpending, prtext):
583 * src/kwsearch.c (Fcompile, Fexecute):
584 * src/pcresearch.c (Pcompile, Pexecute):
585 Simplify (and presumably speed up a little) by using rawmemchr
586 with a sentinel, instead of using memchr.
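
The sentinel idea in the rawmemchr entry above can be sketched as follows. rawmemchr is a GNU extension that searches without a length bound, so this illustration uses a portable hand-written loop (the hypothetical scan_unbounded) in its place. The helper count_lines is likewise an assumption for illustration and does not appear in grep.

```c
#include <assert.h>
#include <stddef.h>

/* Portable stand-in for rawmemchr: scan for C with no length check.
   The caller must guarantee that C occurs at or after S.  */
static const char *
scan_unbounded (const char *s, char c)
{
  while (*s != c)
    s++;
  return s;
}

/* Count the newline-terminated lines in BUF[0..LEN-1].  BUF must end
   with a newline, which acts as the sentinel that makes the unbounded
   scan safe.  */
static size_t
count_lines (const char *buf, size_t len)
{
  assert (len == 0 || buf[len - 1] == '\n');
  size_t n = 0;
  const char *p = buf;
  const char *end = buf + len;
  while (p < end)
    {
      p = scan_unbounded (p, '\n') + 1;
      n++;
    }
  return n;
}
```

The inner scan needs no remaining-length bookkeeping; the trailing newline guarantees it stops inside the buffer, which is the simplification the entry describes.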
587
588 Simplify pattern_file_name
589 * src/grep.c (pattern_file_name): Make first argument
590 origin-0, not origin-1, as this simplifies both caller and
591 callee. All uses changed.
592
593 Simplify regex_compile
594 * src/dfasearch.c (regex_compile): "" suffices; we don’t need "\0".
595 No need to initialize pat_lineno.
596
597 Omit duplicate regexps
598 Do not pass two copies of the same regexp to the
599 regular-expression engine. Although the engines should
600 perform nearly as well even with the copies, in practice they do not.
601 Problem reported by Luca Borzacchiello (Bug#43040).
602 * bootstrap.conf (gnulib_modules): Add hash.
603 * src/grep.c: Include stdint.h, for SIZE_WIDTH.
604 Include hash.h.
605 (struct patloc, patloc, patlocs_allocated, patlocs_used):
606 Rename from struct FL_pair, fl_pair, n_fl_pair_slots, n_pattern_files,
607 respectively, since the data type is no longer a pair.
608 All uses changed.
609 (struct patloc): New member FILELINE. The lineno member is now
610 ptrdiff_t since nowadays we prefer signed types.
611 (pattern_array, patterns_table): New static vars.
612 (count_nl_bytes, fl_add): Remove; no longer used.
613 (hash_pattern, compare_patterns, update_patterns): New functions.
614 update_patterns does what fl_add used to do, plus remove dups.
615 (pattern_file_name): Adjust to change from fl_pair to patloc.
616 (main): Move some variables to inner blocks for clarity.
617 Maintain the pattern_table hash of all patterns.
618 Update pattern_array to match keys, and use update_patterns
619 instead of fl_add to remove duplicate keys.
620 * tests/filename-lineno.pl (invalid-re-2-files)
621 (invalid-re-2-files2, invalid-re-2e): Ensure regexps are unique in
622 tests so that dups aren’t removed in diagnostics.
623 (invalid-re-line-numbers): New test.
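
The duplicate removal described in the entry above might look roughly like the sketch below. It is not grep's update_patterns: the function name dedup_patterns is hypothetical, and a small open-addressing table of offsets stands in for the gnulib hash module that the entry adds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: remove duplicate newline-terminated patterns
   from BUF[0..LEN-1] in place, keeping the first copy of each, and
   return the new length.  */
static size_t
dedup_patterns (char *buf, size_t len)
{
  assert (len == 0 || buf[len - 1] == '\n');

  /* Count patterns so the table can hold them at under 50% load.  */
  size_t n = 0;
  for (size_t i = 0; i < len; i++)
    n += buf[i] == '\n';
  size_t n_slots = 2 * n + 1;
  /* Each slot holds 1 + the offset of a kept pattern, or 0 if empty.  */
  size_t *slots = calloc (n_slots, sizeof *slots);
  if (!slots)
    return len;  /* On allocation failure, leave the input unchanged.  */

  size_t out = 0;
  for (size_t in = 0; in < len; )
    {
      /* Pattern length, including its terminating newline.  */
      char *nl = memchr (buf + in, '\n', len - in);
      size_t plen = (size_t) (nl - (buf + in)) + 1;

      size_t h = 5381;
      for (size_t i = 0; i < plen; i++)
        h = h * 33 + (unsigned char) buf[in + i];

      /* Linear probing; comparing through the newline means a kept
         pattern of a different length can never compare equal.  */
      size_t s = h % n_slots;
      int dup = 0;
      while (slots[s])
        {
          if (memcmp (buf + slots[s] - 1, buf + in, plen) == 0)
            {
              dup = 1;
              break;
            }
          s = (s + 1) % n_slots;
        }
      if (!dup)
        {
          memmove (buf + out, buf + in, plen);
          slots[s] = out + 1;
          out += plen;
        }
      in += plen;
    }
  free (slots);
  return out;
}
```

Kept patterns are compacted toward the front of the buffer, so the write position never passes the read position and no second buffer is needed.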
624
625 2020-08-23 Jim Meyering <meyering@fb.com>
626
627 build: update gnulib to latest
628 * gnulib: Update submodule to latest.
629 * bootstrap.conf (gnulib_modules): Add explicit dependency on dirname-lgpl.
630 Before, we pulled this in via a dependency.
631 * bootstrap: Update from gnulib.
632
633 build: require autoconf-2.64
634 * configure.ac: Require autoconf-2.64, up from 2.63, to align with gnulib.
635
636 2020-08-22 Paul Eggert <eggert@cs.ucla.edu>
637
638 Revert -L exit status change introduced in grep 3.2
639 Problems reported by Antonio Diaz Diaz in:
640 https://bugs.gnu.org/28105#29
641 * NEWS, doc/grep.texi (Exit Status), src/grep.c (usage):
642 Adjust documentation accordingly.
643 * src/grep.c (grepdesc, main): Go back to old behavior.
644 * tests/skip-read: Adjust tests accordingly.
645
646 2020-01-20 Paul Eggert <eggert@cs.ucla.edu>
647
648 tests: fix permission issue in previous change
649
650 tests: work around GCC -fprofile-generate bug
651 * tests/triple-backref: Add a 10 s timeout to work around
652 what appears to be a GCC bug with -fprofile-generate.
653 Problem reported by Martin Liška, with diagnosis by
654 Andreas Schwab (Bug#21513).
655
656 2020-01-02 Jim Meyering <meyering@fb.com>
657
658 maint: post-release administrivia
659 * NEWS: Add header line for next release.
660 * .prev-version: Record previous version.
661 * cfg.mk (old_NEWS_hash): Auto-update.
662
663 version 3.4
664 * NEWS: Record release date.
665
666 build: update gnulib to latest, for mbrtowc-vs-Irix build fix
667
668 2020-01-02 Paul Eggert <eggert@cs.ucla.edu>
669
670 doc: mention glibc bug 24269
671 * doc/grep.texi (Known Bugs): Mention glibc bug 24269.
672 Merge formatting/URL changes from Gnulib regex.texi.
673
674 doc: fix --exclude description in man page
675 Problem reported by Duncan Moore (Bug#37212).
676 * src/grep.c (usage): Fix incorrect statement about --exclude
677 and directories. Standardize on “that match GLOB” instead
678 of “matching GLOB”.
679
680 doc: fix missing “more” in man page
681 Problem reported by Philippe Schnoebelen (Bug#34078).
682 * doc/grep.in.1: Add missing “more”.
683
684 2020-01-01 Paul Eggert <eggert@cs.ucla.edu>
685
686 doc: add [:blank:] to man page
687 * doc/grep.in.1: Mention [:blank:] (Bug#33291).
688
689 2020-01-01 Jim Meyering <meyering@fb.com>
690
691 maint: update all copyright year number ranges
692 Run "make update-copyright" and then...
693 * gnulib: Update to latest with copyright year adjusted.
694 * tests/init.sh: Sync with gnulib to pick up copyright year.
695 * bootstrap: Likewise.
696 * doc/grep.in.1: Use "-" in copyright year ranges, not \en.
697
698 2019-12-31 Jim Meyering <meyering@fb.com>
699
700 tests: avoid unwarranted failure in a netbsd 8.1 VM
701 * tests/mb-non-UTF8-perf-Fw: Run twice, to avoid first-read penalty.
702 Reported by Nelson H.F. Beebe.
703
704 2019-12-30 Jim Meyering <meyering@fb.com>
705
706 build: update gnulib to latest (for localeinfo perf fix)
707
708 maint: add syntax-check rule to prohibit "backreference" spelling
709 * cfg.mk (sc_prohibit_backref): New rule.
710
711 2019-12-30 Paul Eggert <eggert@cs.ucla.edu>
712
713 maint: remove too-long line from AUTHORS
714 * AUTHORS: Remove URL that’s too long.
715
716 maint: update AUTHORS
717 * AUTHORS: Update to better reflect current authorship.
718
719 2019-12-30 Jim Meyering <meyering@fb.com>
720
721 avoid new syntax-check failures
722 * cfg.mk (old_NEWS_hash): When updating old news, we must also update this.
723
724 2019-12-30 Paul Eggert <eggert@cs.ucla.edu>
725
726 doc: don’t encourage back-references
727 * doc/grep.texi (Usage): Remove palindrome question. Bondioni’s
728 RE makes grep issue a ‘grep: stack overflow’ diagnostic, and we
729 shouldn’t be encouraging fancy back-references anyway, due to all
730 the bugs in this area (Bug#26864). Plus, the allusion to
731 “GNU extensions” doesn't seem to be correct here.
732
733 doc: robustify some examples
734 Prompted by suggestions by Stephane Chazelas (Bug#38792#20).
735 * doc/grep.texi (Usage): Make examples more robust.
736
737 doc: fix bug# typo
738
739 doc: spell "back-reference" more consistently
740
741 doc: mention back-reference bugs
742 Inspired by Bug#26864.
743 * doc/grep.texi (Known Bugs): New section.
744 Mention back-reference issues.
745
746 2019-12-29 Paul Eggert <eggert@cs.ucla.edu>
747
748 doc: Add -- to more-complex example
749 Suggested by Stephane Chazelas (Bug#38792).
750 * doc/grep.in.1, doc/grep.texi: Add ‘--’ to recently-added example.
751
752 doc: improve subsection title (Bug#26132)
753 * doc/grep.in.1: Rename "Matcher Selection" to "Pattern Syntax".
754
755 doc: fix typo in previous patch
756
757 doc: document quoting better
758 Problem reported by Martin Simons (Bug#38792).
759 * doc/grep.texi: Fix quoting used in examples. Say that patterns
760 should be quoted, use quoting more consistently in examples, and
761 give an example illustrating the difference between patterns and
762 globbing. Don’t assume zgrep expertise in example.
763 * doc/grep.in.1: Likewise. Also, reorder sections
764 to match GNU/Linux man-pages style.
765
766 2019-12-26 Jim Meyering <meyering@fb.com>
767
768 maint: tweak NEWS wording
769 * NEWS: Minor wording change.
770
771 build: update gnulib to latest; and sync tests/init.sh
772 * gnulib: update
773 * tests/init.sh: Sync from gnulib (this removes the LC_ALL=C setting).
774
775 tests: avoid spurious failure due to 1-second timeout
776 * tests/grep-dev-null-out: Use a 10-second timeout, rather than
777 a 1-second one. This avoids false failure on slow systems.
778 Reported by Assaf Gordon in
779 https://lists.gnu.org/r/grep-devel/2019-12/msg00018.html
780
781 2019-12-26 Paul Eggert <eggert@cs.ucla.edu>
782
783 build: update gnulib submodule to latest
784
785 maint: adjust surrogate-pair for 16-bit wchar_t
786 * tests/surrogate-pair: Adjust to match fixed behavior
787 on AIX 7.2, where wchar_t is 16 bits and cannot represent
788 the test case data.
789
790 2019-12-25 Jim Meyering <meyering@fb.com>
791
792 tests: fix typo in name of test file
793 * tests/backslash-s-vs-invalid-multitype: Rename to...
794 * tests/backslash-s-vs-invalid-multibyte: ...this.
795 * tests/Makefile.am (TESTS): Reflect renaming.
796
797 tests: ensure we use require_timeout_ when needed
798 * cfg.mk (sc_timeout_prereq): New syntax-check rule.
799
800 tests: require timeout
801 * tests/mb-non-UTF8-perf-Fw: This test uses "timeout",
802 so must first call require_timeout_.
803 This avoids spurious test failure when running with
804 no timeout program. Reported by Bruno Haible in
805 https://lists.gnu.org/r/grep-devel/2019-12/msg00008.html
806
807 2019-12-25 Paul Eggert <eggert@cs.ucla.edu>
808
809 tests: work around AIX 7.2 sh printf bug
810 AIX 7.2 /bin/sh’s printf command mishandles octal escapes
811 in multibyte locales: it treats them as characters, not bytes.
812 * tests/backslash-s-vs-invalid-multitype, tests/encoding-error:
813 Use the C locale when employing the printf command with an octal
814 escape that AIX 7.2 sh might mishandle.
815 * tests/init.sh (setup_): Use the C locale for tests.
816 This has the side benefit of making them more reproducible.
817
818 2019-12-22 Jim Meyering <meyering@fb.com>
819
820 maint: adjust new comments
821 * src/dfasearch.c (possible_backrefs_in_pattern): Remove a
822 duplicate "a", insert a "be" and a comma, and reformat.
823
824 build: update gnulib to latest
825 * gnulib: Update submodule to latest.
826 * bootstrap: Copy from gnulib.
827 * tests/init.sh: Likewise.
828
829 2019-12-22 Paul Eggert <eggert@cs.ucla.edu>
830
831 grep: fix some bugs in pattern-grouping speedup
832 This fixes some bugs in the previous commit,
833 and should finish the fix for Bug#33249.
834 * NEWS: Mention fix for Bug#33249.
835 * src/dfasearch.c (possible_backrefs_in_pattern, regex_compile)
836 (GEAcompile): In new code, prefer ptrdiff_t to size_t when either
837 will do, since ptrdiff_t has better error checking. At some point
838 we should adjust the old code too.
839 (possible_backrefs_in_pattern): Rename from
840 find_backref_in_pattern. New arg BS_SAFE. All uses changed.
841 Fix false negative if a multibyte character ends in a single
842 '\\' byte, followed by the two bytes '\\', '1'.
843 (regex_compile): Simplify.
844 (GEAcompile): Avoid quadratic behavior when reallocating growing
845 buffers. Fix a couple of bugs in copying pattern data involving
846 backreferences. Fix another bug in copying pattern metadata
847 involving backreferences, by removing the need to copy it.
848
8492019-12-22 Norihiro Tanaka <noritnk@kcn.ne.jp>
850
851 grep: grouping of a pattern with multiple lines
852 When grep uses regex, it splits a multi-line pattern at newline
853 characters into fragments, then compiles and executes each fragment
854 separately. That causes a slowdown. With this change, fragments are
855 divided into groups by whether they include back references.
856 A fragment with back references constitutes a group, and all fragments
857 that lack back references also constitute a group.
858
859 This change greatly speeds up the following case.
860
861 $ seq -f '%040g' 0 9999 | sed '1s/$/\\(0\\)\\1/' >pat
862 $ yes 00000000000000000000000000000000000000000x | head -10000 >in
863 $ time -p env LC_ALL=C src/grep -f pat in
864
865 * src/dfasearch.c (find_backref_in_pattern, regex_compile):
866 New functions.
867 (GEAcompile): Use the new functions to group fragments
868 as mentioned above.
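The grouping criterion can be approximated in shell as below. This is a simplified sketch, not the logic in src/dfasearch.c: the function name classify_fragments is hypothetical, and unlike grep it ignores escaped backslashes and multibyte characters, so it can report false positives.

```shell
# Hypothetical sketch: label each pattern line by whether it may contain
# a back-reference, i.e. a backslash followed by a digit from 1 to 9.
classify_fragments() {
  while IFS= read -r fragment; do
    case $fragment in
      *\\[1-9]*) printf 'backref: %s\n' "$fragment" ;;
      *)         printf 'plain:   %s\n' "$fragment" ;;
    esac
  done
}

# Only the second pattern is flagged as possibly using a back-reference.
printf '%s\n' 'abc' '\(a\)\1' 'x.*y' | classify_fragments
```

Fragments labeled "plain" are the ones that could be compiled together; those labeled "backref" need separate handling.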
869
8702019-12-19 Paul Eggert <eggert@cs.ucla.edu>
871
872 maint: add NEWS for Bug#34951 fix
873 * NEWS: Mention Bug#34951.
874
8752019-12-19 Norihiro Tanaka <noritnk@kcn.ne.jp>
876
877 dfa: separate parse and compile phase
878 DFAMUST() must be called after parsing and before the token reordering
879 introduced in commit 5c7a0371823876cca7a1347fa09ca26bbbff0c98, but both
880 are executed in the compilation phase.
881
882 * lib/dfa.c (dfaparse): Make it a global function.
883 (dfacomp): If the first argument is NULL, skip parsing.
884 * lib/dfa.h (dfaparse): Add a prototype.
885
8862019-12-19 Paul Eggert <eggert@cs.ucla.edu>
887
888 build: update gnulib submodule to latest
889
8902019-12-19 Norihiro Tanaka <noritnk@kcn.ne.jp>
891
892 grep: speed up multiple word matching
893 grep uses its KWset matcher for multiple word matching, but that is
894 very slow when most of the substrings that match a pattern are not words.
895 So, if the first match for a pattern is not a word, use the grep matcher
896 to match that line.
897
898 Note that when START_PTR is set, the grep matcher uses the regex matcher,
899 which is very slow to match words. Therefore, we use the grep matcher
900 only when START_PTR is NULL.
901
902 * src/kwsearch.c (Fexecute): If an initial match is incomplete because
903 not on a word boundary, use the grep matcher to find a matching line.
904
9052019-12-18 Jim Meyering <meyering@fb.com>
906
907 maint: sort test names
908 * tests/Makefile.am (TESTS): Alphabetize the new addition,
909 mb-non-UTF8-perf-Fw to placate syntax-check's sc_sorted_tests.
910
9112019-12-18 Paul Eggert <eggert@cs.ucla.edu>
912
913 maint: adjust to recent Gnulib change
914 * po/POTFILES.in: Remove lib/xstrtol-error.c.
915
9162019-12-17 Paul Eggert <eggert@cs.ucla.edu>
917
918 grep: do not match invalid UTF-8
919 Update Gnulib to latest. Also:
920 * src/dfasearch.c (EGexecute): Use ptrdiff_t, not size_t,
921 to match new Gnulib API.
922 * tests/Makefile.am (TESTS): Add dfa-invalid-utf8.
923 * tests/dfa-invalid-utf8: New file.
924
9252019-11-30 Jim Meyering <meyering@fb.com>
926
927 tests: add test that would have detected -Fw perf regression
928 * tests/mb-non-UTF8-perf-Fw: New file. Detect v3.3-22-g090a4db's
929 performance regression.
930 * tests/Makefile.am (TESTS): Add it.
931
9322019-11-29 Jim Meyering <meyering@fb.com>
933
934 maint: fix test comment
935 * tests/mb-non-UTF8-word-boundary: Also correct "introduced-in"
936 version number in a comment here.
937
9382019-11-25 Jim Meyering <meyering@fb.com>
939
940 maint: correct NEWS blurb
941 * NEWS (Bug fixes): Correction: the -Fw bug was introduced
942 in 2.28, not in 3.0. Reported by Paul Eggert.
943
9442019-11-17 Norihiro Tanaka <noritnk@kcn.ne.jp>
945
946 grep: improve grep -Fw performance in non-UTF8 multibyte locales
947 * src/searchutils.c (mb_goback): New parameter. All callers changed.
948 * src/search.h (mb_goback): Update prototype.
949 * src/kwsearch.c (Fexecute): Use mb_goback's MBCLEN to detect a
950 word-boundary even more efficiently.
951
952 grep: fix performance regression with previous patch
953 * src/kwsearch.c (Fexecute): Avoid unnecessary back-up in non-UTF8
954 multibyte locales.
955
9562019-11-16 Jim Meyering <meyering@fb.com>
957
958 maint: rename a variable: bol -> nl
959 * src/kwsearch.c (Fexecute): Change misleading name: s/bol/nl/
960
961 build: update gnulib to latest
962
963 maint: correct and clarify a comment
964 * src/kwsearch.c (Fexecute): Logic was reversed.
965
966 grep: avoid false -Fw match in non-UTF8 multibyte locales
967 For example, this command would erroneously print its input line:
968 echo ab | LC_CTYPE=ja_JP.eucjp grep -Fw b
969 This arose when the "memrchr" search for a preceding newline failed:
970 in that case, MB_START was not adjusted and was initially the same
971 as BEG, so wordchar_prev mistakenly returned 0.
972 * src/kwsearch.c (Fexecute): Set MB_START also when there is no
973 preceding newline.
974 * NEWS (Bug fixes): Mention it.
975 * tests/mb-non-UTF8-word-boundary: New file. Test for the bug.
976 * tests/Makefile.am (TESTS): Add it.
977 Reported by NIDE, Naoyuki in https://bugs.gnu.org/38223.
978
9792019-11-08 Jim Meyering <meyering@fb.com>
980
981 build: update gnulib to latest
982 * po/POTFILES.in: Add lib/argmatch.h.
983
9842019-11-05 Paul Eggert <eggert@cs.ucla.edu>
985
986 grep: new --no-ignore-case option
987 Suggested by Karl Berry and mostly implemented by Arnold Robbins
988 (Bug#37907).
989 * NEWS:
990 * doc/grep.in.1:
991 * doc/grep.texi (Matching Control):
992 * src/grep.c (usage):
993 Document the new option.
994 * src/grep.c (NO_IGNORE_CASE_OPTION): New constant.
995 (long_options, main): Support new option.
996
997 grep: simplify previous patch
998 * src/grep.c (main): Use an int rather than an enum for a local
999 var; an enum is overkill here.
1000
1001 grep: further simplify out_file handling
1002 * src/grep.c (print_filenames): Make this a local variable instead
1003 of static. Rename it to filename_option, to avoid confusion with
1004 the print_filename function, and rename the enum values for the
1005 same reason. All uses changed.
1006 (out_file): Now -1, 0, 1 to represent unknown, false, true.
1007 All uses changed.
1008 (single_command_line_arg): Remove. This static variable’s
1009 function is now accomplished by a local variable ‘num_operands’.
1010 (grepdesc): Simplify adjustment of out_file accordingly.
1011 (main): Initialize out_file to -1 if not known yet.
1012
10132019-11-05 Zev Weiss <zev@bewilderbeest.net>
1014
1015 grep: simplify out_file handling
1016 * src/grep.c (print_filenames): New tristate enum (-H, -h, or
1017 neither); supplants with_filenames and no_filenames.
1018 (single_command_line_arg): New variable indicating if grep was run
1019 with a single command-line argument.
1020 (no_filenames): Remove variable.
1021 (grepdirent): Don't twiddle out_file back and forth during recursion.
1022 (grepdesc): Turn off out_file on 'grep -r foo nondirectory'.
1023 (main): Replace with_filenames and no_filenames with print_filenames.
1024 Enable out_file when both -r/-R and multiple arguments are given.
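The user-visible side of this logic, whether output lines carry a file-name prefix, can be sketched as below. The function name demo_filename_prefix and the file names a and b are illustrative only; the sketch shows documented -H/-h behavior, not the internals of out_file.

```shell
# Illustrative demo of when grep prefixes output lines with file names.
demo_filename_prefix() {
  dir=$(mktemp -d) || return 1
  printf 'foo\n' > "$dir/a"
  printf 'foo\n' > "$dir/b"
  (
    cd "$dir" || exit 1
    grep foo a        # one file operand: no prefix, prints "foo"
    grep foo a b      # several operands: prints "a:foo" and "b:foo"
    grep -H foo a     # -H forces the prefix: prints "a:foo"
    grep -h foo a b   # -h suppresses it: prints "foo" twice
  )
  rm -rf "$dir"
}

demo_filename_prefix
```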
1025
10262019-10-12 Paul Eggert <eggert@cs.ucla.edu>
1027
1028 grep: fix ‘grep -L ... >/dev/null’ bug
1029 Problem reported by Adam Sampson (Bug#37716).
1030 * NEWS: Mention this.
1031 * src/grep.c (grepdesc): Don’t assume that stdout being /dev/null
1032 means list_files == LISTFILES_NONE.
1033 (main): Do not change list_files merely because stdout is /dev/null.
1034 * tests/skip-read: Test for this bug.
1035
10362019-10-03 Paul Eggert <eggert@cs.ucla.edu>
1037
1038 grep: tighten -i doc
1039 * doc/grep.in.1:
1040 * doc/grep.texi (Matching Control):
1041 * src/grep.c (usage):
1042 Make it clearer that -i affects patterns and data, but not
1043 file names (Bug#37604).
1044
10452019-03-10 Paul Eggert <eggert@cs.ucla.edu>
1046
1047 maint: fix “/src/grep: No such file or directory”
1048 Problem reported by Jim Meyering in:
1049 https://lists.gnu.org/r/grep-devel/2019-02/msg00000.html
1050 * NEWS: Mention the change.
1051 * configure.ac (fn_grep): Remove. This old attempt to fix
1052 <https://savannah.gnu.org/bugs/?31646> wasn’t working anyway,
1053 since subprograms didn’t grok fn_grep. People building on Solaris
1054 will need a working grep, which is reasonably standard nowadays.
1055 (GREP, EGREP): Do not override. This way, we test the
1056 newly-built grep only when running ‘make test’ and suchlike.
1057 Instead, output a hopefully-helpful diagnostic if the
1058 system 'grep' does not work.
1059
10602019-02-18 Jim Meyering <meyering@fb.com>
1061
1062 tests: avoid false positive upon stack overflow
1063 * tests/pcre-jitstack: Don't let a stack overflow evoke a false
1064 failure. This test is to ensure there is no internal PCRE error.
1065 Reported by Andreas Schwab in http://bugs.gnu.org/34370
1066
10672019-02-16 Jim Meyering <meyering@fb.com>
1068
1069 build: avoid build failure with --enable-gcc-warnings
1070 * src/kwset.c (bmexec_trans): Define with _GL_ATTRIBUTE_PURE,
1071 per suggestion from recent gcc snapshot.
1072
10732019-02-03 Paul Eggert <eggert@cs.ucla.edu>
1074
1075 doc: clarify --exclude globbing
1076 Problem reported by Paul Jackson.
1077 * doc/grep.in.1:
1078 * doc/grep.texi (File and Directory Selection):
1079 Clarify how --exclude globbing works.
1080
1081 grep: parse --color arg independent of locale
1082 This is a better fix for Bug#34285.
1083 * bootstrap.conf (gnulib_modules): Add c-strcase.
1084 * src/grep.c: Include c-strcase.h, not strings.h.
1085 (main): Use c_strcasecmp, not strcasecmp.
1086
10872019-02-02 Paul Eggert <eggert@cs.ucla.edu>
1088
1089 grep: fix grep.c includes
1090 * src/grep.c: Include strings.h; problem reported by David
1091 Monniaux (Bug#34285). Do not include fcntl.h, as system.h does
1092 that for us.
1093
1094 build: update gnulib submodule to latest
1095
10962019-01-20 Jim Meyering <meyering@fb.com>
1097
1098 build: ensure no VLA is used
1099 Cause developer builds to fail for any use of a VLA.
1100 VLAs (variable length arrays) limit portability.
1101 * configure.ac (nw): Remove -Wvla from the list of disabled warnings,
1102 thus enabling the warning when configured with --enable-gcc-warnings.
1103 (GNULIB_NO_VLA): Define, disabling use of VLAs in gnulib. This commit
1104 is functionally equivalent to coreutils' v8.30-44-gd26dece5d.
1105
1106 build: update gnulib to latest
1107
11082019-01-20 Paul Eggert <eggert@cs.ucla.edu>
1109
1110 doc: --binary-files update in man page
1111 * doc/grep.in.1: Adjust --binary-files description to match that
1112 in doc/grep.texi. When I updated the documentation in
1113 2016-09-09T01:33:14!eggert@cs.ucla.edu I forgot to update the man
1114 page accordingly (Bug#33898).
1115
1116 grep: simplify pcresearch.c ifdefs
1117 This fixes a warning if PCRE is not used (Bug#34054).
1118 * configure.ac (USE_PCRE): New conditional.
1119 * src/Makefile.am (grep_SOURCES) [!USE_PCRE]: Omit pcresearch.c.
1120 * src/grep.c (matchers) [!HAVE_LIBPCRE]: Omit perl matcher.
1121 (setmatcher) [!HAVE_LIBPCRE]: If helpful, mention
1122 --disable-perl-regexp in diagnostic.
1123 * src/pcresearch.c: Simplify by assuming HAVE_LIBPCRE.
1124
11252019-01-01 Jim Meyering <meyering@fb.com>
1126
1127 maint: update all copyright dates via "make update-copyright"
1128 * gnulib: Also update submodule for its copyright updates.
1129
11302018-12-20 Jim Meyering <meyering@fb.com>
1131
1132 doc: fix the bug-introduced version in 3.3's announcement
1133 * NEWS: Correct bug-introduced version (s/2.3/3.2/).
1134 * cfg.mk (old_NEWS_hash): Updating old news, we must also update this.
1135
1136 maint: post-release administrivia
1137 * NEWS: Add header line for next release.
1138 * .prev-version: Record previous version.
1139 * cfg.mk (old_NEWS_hash): Auto-update.
1140
1141 version 3.3
1142 * NEWS: Record release date.
1143
1144 grep: fix \b DFA-bug in C locale
1145 Under some conditions, \b would mistakenly fail to match, e.g.
1146 echo 123-x|LC_ALL=C grep '.\bx'
1147 * NEWS (Bug fixes): Mention it.
1148 * gnulib: Update to latest, for DFA regression fix.
1149 * tests/word-delim-multibyte: Add a test for the dfa.c regression.
1150
11512018-12-20 Paul Eggert <eggert@cs.ucla.edu>
1152
1153 grep: fit --version authorship into 80
1154 * src/grep.c (AUTHORS): Remove.
1155 (main): Output the authorship info ourselves instead of having
1156 version_etc do it. This is better for i18n anyway.
1157
1158 build: update gnulib submodule to latest
1159
11602018-12-20 Jim Meyering <meyering@fb.com>
1161
1162 maint: post-release administrivia
1163 * NEWS: Add header line for next release.
1164 * .prev-version: Record previous version.
1165 * cfg.mk (old_NEWS_hash): Auto-update.
1166
1167 version 3.2
1168 * NEWS: Record release date.
1169
11702018-12-18 Jim Meyering <meyering@fb.com>
1171
1172 build: update gnulib for c-stack fix
1173
11742018-12-17 Bruno Haible <bruno@clisp.org>
1175
1176 tests: stack-overflow: avoid unwarranted test failure on some hosts
1177 * tests/stack-overflow: Use ulimit to limit stack size. Otherwise,
1178 at least on gcc113, grep would fail to overflow its stack, so this
1179 test would fail to find the required diagnostic and would fail.
1180
11812018-12-16 Jim Meyering <meyering@fb.com>
1182
1183 tests: reenable the surrogate-pair test
1184 This reverts commit bdb98cec2e7bf255e1d00eaf8be16299f7bf571e,
1185 but adding the comment changes suggested by Bruno Haible in
1186 https://lists.gnu.org/r/grep-devel/2018-12/msg00037.html
1187 * tests/surrogate-pair: New file.
1188 * tests/Makefile.am (TESTS): List it.
1189
11902018-12-16 Bruno Haible <bruno@clisp.org>
1191
1192 tests: stackoverflow: fix test failure on HardenedBSD 11
1193 * tests/stack-overflow: Try up to 10 million opening parentheses.
1194
11952018-12-16 Jim Meyering <meyering@fb.com>
1196
1197 tests: remove stale surrogate-pair test
1198 The cygwin-specific code for surrogate pairs was first disconnected
1199 via v2.21-62-g936c904 and later removed as part of a then-unused
1200 function via v2.24-12-g704de87. So now I'm removing the test, too.
1201 If someone thinks it important and would like to revive it, please do.
1202 * tests/surrogate-pair: Remove file.
1203 * tests/Makefile.am (TESTS): Remove it.
1204
12052018-12-16 Paul Eggert <eggert@cs.ucla.edu>
1206
1207 build: update gnulib submodule to latest
1208
12092018-12-15 Jim Meyering <meyering@fb.com>
1210
1211 tests: stack-overflow: handle the case of success without the diagnostic
1212 * tests/stack-overflow: Do not always require a stack
1213 overflow diagnostic.
1214
1215 build: update gnulib to latest
1216 * gnulib: Update to latest, to pull in code that now compensates for
1217 a bug in glibc-2.27 and prior.
1218
1219 build: make the autoconf-2.63 requirement explicit
1220 * configure.ac: AC_PREREQ: Require 2.63, not 2.59. And quote properly.
1221 Autoconf-2.63 has been required for some time via gnulib.
1222 This merely makes it explicit.
1223
12242018-12-15 Paul Eggert <eggert@cs.ucla.edu>
1225
1226 tests: fix diagnostic typo
1227 Fix by Bruno Haible in:
1228 https://lists.gnu.org/r/grep-devel/2018-12/msg00003.html
1229 * tests/init.cfg (envvar_check_fail): Fix diagnostic.
1230
12312018-11-24 Jim Meyering <meyering@fb.com>
1232
1233 tests: stack-overflow: avoid false failure
1234 * tests/stack-overflow: This test would fail to elicit a stack overflow
1235 diagnostic on some OS X systems. Rewrite to iterate, gradually increasing
1236 the size of the input regex, stopping when grep emits the desired diagnostic
1237 or the size reaches a reasonable limit.
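The iterate-and-grow strategy can be sketched as follows. This is not the test's actual code: try_size is a hypothetical stand-in for "run grep on a regex of this size and look for the diagnostic", and doubling is an assumed growth step.

```shell
# Hypothetical stand-in for the real check. Here it simply "triggers"
# once the size reaches 5000, so the sketch runs without invoking grep.
try_size() {
  [ "$1" -ge 5000 ]
}

# Grow the size until try_size succeeds or the limit is exceeded.
find_trigger_size() {
  size=$1
  limit=$2
  while [ "$size" -le "$limit" ]; do
    if try_size "$size"; then
      echo "$size"
      return 0
    fi
    size=$((size * 2))
  done
  return 1   # limit reached without triggering
}

find_trigger_size 1000 1000000   # tries 1000, 2000, 4000, 8000; prints 8000
```

Bounding the loop keeps the test from running indefinitely on hosts where the diagnostic never appears.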
1238
12392018-10-16 Jim Meyering <meyering@fb.com>
1240
1241 tests: reduce the sole failing test
1242 * tests/backref-alt: Significantly reduce abort-inducing input.
1243
1244 build: update gnulib to latest; also update bootstrap and init.sh
1245
12462018-10-13 Jim Meyering <meyering@fb.com>
1247
1248 doc: NEWS: mention performance improvements
1249 * NEWS (Improvements): Mention them.
1250
12512018-10-13 Jim Meyering <meyering@fb.com>
1252
1253 grep: triple initial buffer size: 32k->96k
1254 Changing 32k to 96k gives a 3-23% performance improvement.
1255 All timings ran with this diff on top of commit v3.1-39-g7179b21:
1256
1257 for n in 32 64 96 128; do
1258   echo n=$n
1259   perl -pi -e 's/(INITIAL_BUFSIZE =) \d+/$1 '$n/ src/grep.c &&
1260   make AM_CFLAGS=-O3 WERROR_CFLAGS= >& makerr-$n &&
1261   for needle in 1f2 1f298lkjskjhahjklkj34; do
1262     echo " needle=$needle"
1263     for i in $(seq 10); do
1264       env MALLOC_PERTURB_= time -qf%e src/grep $needle w2000
1265     done 2>&1 |sort -g | tee >(head -1|sed 's/^/ /') > .time-${n}KB-$needle
1266   done
1267 done
1268
1269 Tested searches: search for a short literal pattern that is not
1270 present in a 9.3GB file containing 2000 copies of /usr/share/dict/words,
1271 created via this:
1272 ln -s /usr/share/dict/words k && cat $(yes k|head -2000) > w2000
1273 I ran this command:
1274 env MALLOC_PERTURB_= time src/grep 1f2 w2000
1275 old(32k) vs new elapsed time, best of 10 trials (gcc-9.0.0 20180831, -O3):
1276   32k   64k   96k(%incr)  128k   CPU
1277   1.25  1.18  1.16( 7.2)  1.20   i7-4770S@3.10GHz cache=8MB
1278   1.21  1.16  1.17( 3.3)  1.19   Xeon(R) E3-1505M v5 @ 2.80GHz cache=8MB
1279   2.36  2.29  2.29( 3.0)  2.36   Xeon(R) E5-2680 v4 @ 2.40GHz cache=32MB
1280   1.40  1.32  1.31( 6.4)  1.33   i5-6260U @ 1.80GHz cache=4MB
1281   1.31  1.26  1.24( 5.3)  1.23   AMD FX(tm)-4100 cache=2MB (with only 1000 copies)
1282
1283 Searching for a longer string: 1f298lkjskjhahjklkj34
1284   2.03  1.76  1.61(20.7)  1.53   i7-4770S@3.10GHz cache=8MB
1285   1.95  1.70  1.56(20.0)  1.51   Xeon(R) E3-1505M v5 @ 2.80GHz
1286   3.27  2.98  2.84(13.1)  3.02   Xeon(R) E5-2680 v4 @ 2.40GHz
1287   2.48  2.12  1.91(23.0)  1.80   i5-6260U @ 1.80GHz cache=4MB
1288   1.72  1.54  1.46(15.1)  1.41   AMD FX(tm)-4100 cache=2MB
1289
1290 * src/grep.c (INITIAL_BUFSIZE): Triple it: 32kB -> 96kB
1291
12922018-09-28 Barret Rhoden <brho@cs.berkeley.edu> (tiny change)
1293
1294 maint: fix cross-compiling problem
1295 * cfg.mk (PATH): Omit if cross-compiling (Bug#32866).
1296
12972018-09-28 Paul Eggert <eggert@cs.ucla.edu>
1298
1299 build: update gnulib submodule to latest
1300
1301 grep: fix usage 80-column glitch
1302 * src/grep.c (usage): Do not go over 80 columns in the source
1303 code, to pacify "make dist".
1304
13052018-09-19 Paul Eggert <eggert@cs.ucla.edu>
1306
1307 maint: update bootstrap
1308 * bootstrap: Copy from Gnulib.
1309
1310 maint: fix build failure
1311 Problem found by OpenCSW buildbot; the bug also occurs on GNU/Linux
1312 build platforms. The symptom is “system.h:26:24: fatal error:
1313 configmake.h: No such file or directory”. See:
1314 https://buildfarm.opencsw.org/buildbot/builders/ggrep-solaris10-sparc/builds/107
1315 * bootstrap.conf: Add configmake, a dependency that was formerly brought
1316 in only by accident.
1317
13182018-09-18 Paul Eggert <eggert@cs.ucla.edu>
1319
1320 build: update gnulib submodule to latest
1321
13222018-08-09 Paul Eggert <eggert@cs.ucla.edu>
1323
1324 tests: fix comment
1325
1326 tests: backref-alt works with glibc 2.28
1327 Problem reported by Jaroslav Skarvada (Bug#32409).
1328 * tests/Makefile.am (XFAIL_TESTS) [!USE_INCLUDED_REGEX]:
1329 Don’t add backref-alt, since this bug is fixed in glibc 2.28.
1330
13312018-05-11 Paul Eggert <eggert@cs.ucla.edu>
1332
1333 doc: “pattern” vs “patterns”
1334 * doc/grep.in.1, doc/grep.texi, src/grep.c (usage): Be more
1335 careful about saying that an argument or option specifies one or
1336 more patterns, not just a single pattern. Problem reported by Kaz
1337 Kylheku (Bug#31400).
1338
1339 build: update gnulib submodule to latest
1340
13412018-04-21 Jim Meyering <meyering@fb.com>
1342
1343 maint: fix new syntax-check (sc_long_lines) failure
1344 * HACKING: Shorten line by one byte to fit in 80 columns.
1345
1346 build: update gnulib to latest
1347
13482018-04-21 Paul Eggert <eggert@cs.ucla.edu>
1349
1350 doc: fix font typo
1351
1352 maint: update URLs
1353 Mostly this is just changing http: to https:.
1354 In one or two places it removes no-longer-useful URLs.
1355
1356 doc: man-page format fixes
1357 * doc/grep.in.1: Fix minor formatting glitches, e.g., extra
1358 space after [...] because groff thought it was a sentence end.
1359 Problem reported by Ingo Schwarze (Bug#31228#11).
1360
13612018-04-20 Paul Eggert <eggert@cs.ucla.edu>
1362
1363 doc: mention encoding errors
1364 This attempts to document the encoding-error problem more
1365 precisely (Bug#30326).
1366 * doc/grep.in.1, doc/grep.texi: Mention that the behavior of
1367 patterns like ‘.’ is not specified on encoding errors.
1368
1369 doc: port better to mandoc
1370 * doc/grep.in.1: Check for groff and its macro packages
1371 independently, as groff can be used with non-groff macro packages.
1372 Use an-ext style macros rather than www.tmac style, as this should
1373 be more portable to mandoc. Problem reported by Laura Morales and
1374 Ingo Schwarze (Bug#31228).
1375
13762018-02-16 Jim Meyering <meyering@fb.com>
1377
1378 maint: avoid new syntax-check failure
1379 * cfg.mk (old_NEWS_hash): Update, to accommodate v3.1-20-g63d4174's
1380 typo fix.
1381
1382 doc: clarify that PCRE support is here to stay
1383 * doc/grep.texi (grep Programs): Clarify: it's not PCRE support
1384 that is experimental, but its combination with --null-data (-z).
1385
13862018-02-05 Paul Eggert <eggert@cs.ucla.edu>
1387
1388 maint: fix typo
1389
13902018-01-06 Jim Meyering <meyering@fb.com>
1391
1392 maint: update gnulib and copyright dates for 2018
1393 * gnulib: Update to latest.
1394 * all files: Run "make update-copyright".
1395 * bootstrap: Update from gnulib.
1396
13972017-12-17 Jim Meyering <meyering@fb.com>
1398
1399 build: link with -lsigsegv, when c-stack module requires it
1400 * src/Makefile.am (grep_LDADD): Add $(LIBCSTACK).
1401 Otherwise, on at least Debian and Arch-based systems, linking would
1402 fail with diagnostics like these:
1403 c-stack.c:207: undefined reference to `stackoverflow_install_handler'
1404 c-stack.c:216: undefined reference to `sigsegv_install_handler'
1405 Reported by Jeremy Feusi.
1406
1407 build: suppress sig-handler.h's -Wcast-function-type warning
1408 * configure.ac (WERROR_CFLAGS): Add -Wno-cast-function-type
1409 to suppress warning about sig-handler.h's sa_handler_t cast:
1410 sig-handler.h: In function 'get_handler':
1411 sig-handler.h:47:12: error: cast between incompatible function\
1412 types from 'void (* const)(int, siginfo_t *, void *)'\
1413 {aka 'void (* const)(int, struct <anonymous> *, void *)'}\
1414 to 'void (*)(int)' [-Werror=cast-function-type]
1415 return (sa_handler_t) a->sa_sigaction;
1416
14172017-12-16 Jim Meyering <meyering@fb.com>
1418
1419 grep: diagnose stack overflow rather than segfaulting
1420 * bootstrap.conf (gnulib_modules): Add c-stack.
1421 * src/grep.c: Include "c-stack.h".
1422 (main): Call c_stack_action (NULL);
1423 * tests/stack-overflow: New file.
1424 * tests/Makefile.am (TESTS): Add name of new file.
1425 * NEWS (Improvements): Mention it.
1426 Interestingly, this bug does not afflict grep-2.5.4 or prior,
1427 so it appeared to have been introduced with grep-2.6. However,
1428 the origin is in glibc's regexp compiler, and I tracked it to
1429 stack-aware parsing that was removed from glibc's regexp in 2002.
1430 However, grep-2.5.4 was released in 2009. That version worked
1431 (and still works, now) because it included and (by default) used
1432 an old copy of glibc's regexp code.
1433 Jeremy Feusi reported the grep segfault in https://bugs.gnu.org/29666.
1434 I reported the glibc regexp bug in
1435 https://sourceware.org/bugzilla/show_bug.cgi?id=22620
1436
14372017-11-26 Stephan T. Lavavej <stl@nuwen.net>
1438
1439 grep: fix directory recursion on MS-Windows
1440 gnulib recently gained a module, windows-stat-inodes, that fixes
1441 directory recursion on MS-Windows. No changes to grep's C sources are
1442 required; grep simply needs to request the module during configuration.
1443
1444 When grep requests this module, its configure script will gain the
1445 behavior that was implemented in windows-stat-inodes.m4. This detects
1446 mingw and sets WINDOWS_STAT_INODES=1. All other platforms are
1447 unaffected, setting WINDOWS_STAT_INODES=0 (which is what's happening
1448 in the absence of this patch).
1449
1450 * bootstrap.conf (gnulib_modules): Add windows-stat-inodes.
1451 * NEWS (Bug fixes): Mention it.
1452 Thanks to Pär Björklund who diagnosed the problem as involving inodes,
1453 and thanks to Václav Haisman who provided the bootstrap.conf patch.
1454
14552017-11-25 Paul Eggert <eggert@cs.ucla.edu>
1456
1457 grep: port better to Adélie GNU/Linux 64-bit ppc
1458 Problem reported by A. Wilcox (Bug#29446).
1459 * src/pcresearch.c (PCRE_EXTRA_MATCH_LIMIT_RECURSION)
1460 (PCRE_STUDY_EXTRA_NEEDED): Default to 0.
1461 (jit_exec): If we run up against the recursion limit,
1462 double it (if possible) and try again.
1463 (Pcompile): Also specify PCRE_STUDY_EXTRA_NEEDED so that
1464 pc->extra is not null.
1465
14662017-11-03 Paul Eggert <eggert@cs.ucla.edu>
1467
1468 grep: omit a dup 'const'
1469 * src/grep.c (matchers): Omit duplicate 'const'.
1470
14712017-10-13 Bernhard Voelker <mail@bernhard-voelker.de>
1472
1473 doc: document the option delimiter '--'
1474 * doc/grep.texi (Other options): Do the above.
1475 Reported in https://lists.opensuse.org/opensuse/2017-03/msg00411.html
1476 This addresses http://bugs.gnu.org/26139
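A short illustration of the delimiter, using made-up input: everything after '--' is treated as an operand, so a pattern that starts with a hyphen is not parsed as an option.

```shell
# Search for the literal pattern "-v". Without "--", grep would parse
# "-v" as the --invert-match option instead of as the pattern.
printf '%s\n' '-v' 'plain' | grep -- -v   # prints "-v"
```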
1477
14782017-08-21 Paul Eggert <eggert@cs.ucla.edu>
1479
1480 build: update gnulib submodule to latest
1481
1482 Pacify GCC 5.4
1483 * src/grep.c (grepdesc): Rework to pacify GCC 5.4 warning
1484 about logical not.
1485
14862017-08-20 Paul Eggert <eggert@cs.ucla.edu>
1487
1488 build: update gnulib submodule to latest
1489
14902017-08-17 Paul Eggert <eggert@cs.ucla.edu>
1491
1492 grep: -L exits with status 0 if a file is selected
1493 Problem reported by Anthony Sottile (Bug#28105).
1494 * NEWS, doc/grep.texi (Exit Status), src/grep.c (usage): Document this.
1495 * src/grep.c (grepdesc): Implement it.
1496 * tests/skip-read: Test it.
1497
1498 build: update gnulib submodule to latest
1499
15002017-08-13 Jim Meyering <meyering@fb.com>
1501
1502 maint: avoid newly-introduced syntax-check failure
1503 * src/grep.c (usage): Shorten --help line to 80, so
1504 "make syntax-check" passes once again.
1505
15062017-08-03 Paul Eggert <eggert@cs.ucla.edu>
1507
1508 doc: improve -o help
1509 * src/grep.c (usage): Document that -o outputs only nonempty
1510 matches (Bug#27931).
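A small illustration with made-up input: the pattern 'X*' can match the empty string at every position, yet -o prints only the matches that contain at least one character.

```shell
# Empty matches (before "a" and before "b") are skipped by -o;
# only the nonempty runs of X are printed, one per line.
printf 'aXbXX\n' | grep -o 'X*'   # prints "X" then "XX"
```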
1511
15122017-07-26 Paul Eggert <eggert@cs.ucla.edu>
1513
1514 tests: add Bug#27838 test case
1515 * tests/backref-alt: New test case from a fuzzer.
1516
15172017-07-25 Paul Eggert <eggert@cs.ucla.edu>
1518
1519 doc: distinguish -w from \<...\>
1520 * doc/grep.texi (Matching Control):
1521 Give example of why -w differs from \<...\> (Bug#27813).
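The difference can be seen with a pattern that is not a word constituent. The sketch below uses '@' as the pattern; whether this matches the exact wording added to doc/grep.texi is not verified here.

```shell
# -w only requires that the match not be adjacent to word characters,
# so a line consisting of "@" is selected.
echo '@' | grep -w '@'                       # prints "@"

# \< and \> must sit at the start and end of a word, and "@" is not
# a word constituent, so this pattern cannot match.
echo '@' | grep '\<@\>' || echo 'no match'   # prints "no match"
```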
1522
15232017-07-11 Paul Eggert <eggert@cs.ucla.edu>
1524
1525 doc: define Dt string in man page
1526 Problem reported by Bjarni I. Gislason via Santiago R.R. (Bug#27651).
1527 * doc/grep.in.1 (dT): New macro.
1528 (Dt): Define this string.
1529
15302017-07-02 Jim Meyering <meyering@fb.com>
1531
1532 maint: post-release administrivia
1533 * NEWS: Add header line for next release.
1534 * .prev-version: Record previous version.
1535 * cfg.mk (old_NEWS_hash): Auto-update.
1536
1537 version 3.1
1538 * NEWS: Record release date.
1539
15402017-07-01 Jim Meyering <meyering@fb.com>
1541
1542 tests: avoid false failures when run in qemu user mode
1543 * tests/filename-lineno.pl: Derive the program name that grep
1544 will use in diagnostics, based on a suggestion from Assaf Gordon.
1545 * tests/in-eq-out-infloop: Similar: accept an arbitrary "command_name: "
1546 prefix on checked diagnostics, rather than requiring "grep: ".
1547 * tests/reversed-range-endpoints: Likewise.
1548 * tests/write-error-msg: Likewise.
1549 Reported by Bruno Haible in http://bugs.gnu.org/27532
1550
15512017-06-25 Jim Meyering <meyering@fb.com>
1552
1553 gnulib: update to latest
1554 * gnulib: Update to latest for these portability fixes:
1555 - stat: port to xlc 12.01
1556 - xalloc-oversized: port to icc
1557
1558 doc: fix another typo
1559 * doc/grep.texi (File and Directory Selection): Fix typo: s/afer/after/
1560
15612017-06-24 Jim Meyering <meyering@fb.com>
1562
1563 doc: stop calling --perl-regexp (-P) "highly" experimental
1564 Use wording that is less likely to make readers think that
1565 support for -P may be removed.
1566 * doc/grep.in.1: s/highly experimental/experimental/
1567 * doc/grep.texi: Likewise.
1568 Suggested by Evan Sheahan.
1569
15702017-06-21 Jim Meyering <meyering@fb.com>
1571
1572 doc: correct typo
1573 * doc/grep.texi (Performance): s/suprisingly/surprisingly/
1574
1575 gnulib: update to latest
1576
15772017-06-21 Paul Eggert <eggert@cs.ucla.edu>
1578
1579 grep: -m no longer cuts off trailing context
1580 Problem reported by Markus Jochim (Bug#26254).
1581 * NEWS, doc/grep.texi (General Output Control): Document this.
1582 * src/grep.c (prpending): Selected lines no longer cut off context.
1583 (usage): Say "selected" instead of "matching", where appropriate.
1584 * tests/foad1, tests/max-count-vs-context, tests/yesno:
1585 Adjust to match new behavior.
1586
15872017-05-31 Paul Eggert <eggert@cs.ucla.edu>
1588
1589 Document grep performance
1590 * doc/grep.texi (Performance): New section.
1591
1592 build: update gnulib submodule to latest
1593
15942017-05-21 Jim Meyering <meyering@fb.com>
1595
1596 maint: make the announcement template Cc the devel- list
1597 * cfg.mk (announcement_Cc_): Define.
1598
1599 gnulib: update to latest; and update tests/init.sh
1600
1601 maint: accommodate GCC7's -Werror=duplicated-branches
1602 * src/system.h (IGNORE_DUPLICATE_BRANCH_WARNING): Define.
1603 * src/grep.c (grepfile): Use it.
1604 * src/kwset.c (bmexec, acexec): Use it.
1605
1606 maint: update to work with GCC7's -Werror=implicit-fallthrough=
1607 * src/system.h (FALLTHROUGH): Define.
1608 * src/grep.c (context_length_arg): Use new FALLTHROUGH macro in place
1609 of comments.
1610 (fgrep_to_grep_pattern, try_fgrep_pattern, main): Likewise.
1611
16122017-05-13 Jim Meyering <meyering@fb.com>
1613
1614 gnulib: update to latest and adapt src/kwset.c
1615 * gnulib: Update to latest.
1616 * src/kwset.c: Include "verify.h" for use of assume.
1617
16182017-03-22 Jim Meyering <meyering@fb.com>
1619
1620 gnulib: update to latest for dfa [0-9] performance improvement
1621 This pulls in the following change that is very relevant to grep:
1622
1623 commit 6afba02d7869d39ed7f61981045ddbdcb2814101
1624 Author: Paul Eggert <eggert@cs.ucla.edu>
1625 dfa: make [0-9] faster in non-C locales
1626
1627 * gnulib: Update to latest.
1628 * NEWS (Improvements): Describe the effect on grep.
1629
16302017-03-05 Jim Meyering <meyering@fb.com>
1631
1632 build: use $(builddir), not $(srcdir)
1633 * cfg.mk (PATH): Use $(builddir), so this also takes effect
1634 in a non-srcdir build. Also, switch ${PATH} syntax to $(PATH).
1635
16362017-03-05 Juan Manuel Guerrero <juan.guerrero@gmx.de>
1637
1638 build: use $(PATH_SEPARATOR), not ":" to augment PATH
1639 * cfg.mk (PATH): Use $(PATH_SEPARATOR), for those systems that
1640 use something other than ":".
1641 * THANKS.in: Remove name, to avoid syntax-check failure due to
1642 the duplicate, now that there is this commit.
1643
16442017-02-17 Jim Meyering <meyering@fb.com>
1645
1646 maint: fix distcheck failure: remove stale dosbuf.c reference
1647 * src/Makefile.am (EXTRA_DIST): Do not attempt to distribute
1648 the recently deleted file, dosbuf.c.
1649
1650 maint: fix new syntax-check errors
1651 * po/POTFILES.in: Add lib/xbinary-io.c.
1652 * cfg.mk (FILTER_LONG_LINES): Add TODO to the list of exempt files.
1653
16542017-02-16 Paul Eggert <eggert@cs.ucla.edu>
1655
1656 Fix up recent -U patches
1657 Inspired by a suggestion by Eric Blake (Bug#25707#17).
1658 * bootstrap.conf (gnulib_modules): Add xbinary-io,
1659 and remove binary-io and xfreopen.
1660 * doc/grep.texi (Other Options):
1661 Fix typo and reword to be a bit more general.
1662 * src/grep.c: Include xbinary-io.h instead of xfreopen.h.
1663 (grepfile): Open with O_BINARY if binary.
1664 (grepdesc): No need for set_binary_mode now.
1665 (grep_command_line_arg, main): Set stdin to binary mode if binary.
1666 (main): Avoid unnecessary test of stdin == NULL.
1667 Use xsetmode instead of xfreopen.
1668 * src/system.h: Do not include binary-io.h.
1669
1670 build: update gnulib submodule to latest
1671
1672 Simplify -U on MS-Windows by removing guesswork
1673 Suggested by Eric Blake (Bug#25707#11).
1674 * NEWS, doc/grep.texi: Document this.
1675 * src/dosbuf.c: Remove.
1676 * bootstrap.conf (gnulib_modules): Add xfreopen.
1677 * src/grep.c: Include xfreopen.h, not dosbuf.c.
1678 (fillbuf, print_line_head): Do not undossify input.
1679 (binary): New static var.
1680 (grepdesc): Apply BINARY to input file.
1681 (usage): Remove -u help.
1682 (main): Set BINARY if -U, and apply it to stdout. Do nothing if -u.
1683 With -f, apply BINARY to input file.
1684
16852017-02-16 Eric Blake <eblake@redhat.com>
1686
1687 grep: don't forcefully strip carriage returns
1688 Commit 5c92a54 made the mistaken assumption that using fopen("rt")
1689 on platforms where O_TEXT is non-zero makes sense. However, POSIX
1690 already requires fopen("r") to open a file in text mode, vs.
1691 fopen("rb") when binary mode is wanted, and at least on Cygwin,
1692 where it is possible to control whether a mount point is binary
1693 or text by default (using just "r"), the use of fopen("rt") actively
1694 breaks assumptions on a binary mount by silently corrupting any
1695 carriage returns that are supposed to be preserved.
1696
1697 * src/grep.c (main): Never use fopen("rt") (Bug#25707).
1698
16992017-02-13 Paul Eggert <eggert@cs.ucla.edu>
1700
1701 Update TODO and doc
1702 * TODO: Bring up-to-date and fix formatting glitches.
1703 * doc/grep.in.1, doc/grep.texi: Fix minor glitches.
1704 The above patches should address the same problems that recent
1705 Debian doc patches address, albeit in a different way.
1706
17072017-02-12 Paul Eggert <eggert@cs.ucla.edu>
1708
1709 doc: clarify default input (Bug#25651)
1710 * doc/grep.in.1:
1711 * src/grep.c (usage): Clarify default input when -r.
1712 * src/grep.c (usage): Do not bother documenting egrep and fgrep;
1713 the manual is enough.
1714
17152017-02-09 Jim Meyering <meyering@fb.com>
1716
1717 maint: post-release administrivia
1718 * NEWS: Add header line for next release.
1719 * .prev-version: Record previous version.
1720 * cfg.mk (old_NEWS_hash): Auto-update.
1721
1722 version 3.0
1723 * NEWS: Record release date.
1724
17252017-02-08 Paul Eggert <eggert@cs.ucla.edu>
1726
1727 grep: do not mishandle \. in multiple patterns
1728 Problem reported by Lars Wendler (Bug#25655).
1729 * NEWS: Document this.
1730 * src/grep.c (try_fgrep_pattern): Fix typo that prevented
1731 keys from being properly updated.
1732 * tests/foad1: Test for the bug.
1733
17342017-02-07 Paul Eggert <eggert@cs.ucla.edu>
1735
1736 Do not assume PCRE 8.20 or later
1737 Problem reported by Zube (Bug#25647).
1738 * NEWS: Document this.
1739 * src/pcresearch.c (struct pcre_comp.jit_stack):
1740 Declare only if PCRE_STUDY_JIT_COMPILE.
1741
17422017-02-06 Jim Meyering <meyering@fb.com>
1743
1744 maint: post-release administrivia
1745 * NEWS: Add header line for next release.
1746 * .prev-version: Record previous version.
1747 * cfg.mk (old_NEWS_hash): Auto-update.
1748
1749 version 2.28
1750 * NEWS: Record release date.
1751
17522017-02-02 Jim Meyering <meyering@fb.com>
1753
1754 gnulib: update to latest
1755
17562017-02-01 Paul Eggert <eggert@cs.ucla.edu>
1757
1758 grep: tune to avoid memchr2 sometimes
1759 Problem noted by Norihiro Tanaka in:
1760 http://lists.gnu.org/archive/html/grep-devel/2017-01/msg00027.html
1761 Although not enough to restore all the previous performance in the
1762 case he noted, it helps significantly.
1763 * src/kwset.c (memchr_kwset): Bring back small_heuristic,
1764 in a somewhat different form.
1765
17662017-01-29 Jim Meyering <meyering@fb.com>
1767
1768 gnulib: update to latest
1769
17702017-01-23 Paul Eggert <eggert@cs.ucla.edu>
1771
1772 grep: simplify recent kwset change
1773 * src/kwset.c (acexec_trans): Simplify.
1774
17752017-01-23 Jim Meyering <meyering@fb.com>
1776
1777 tests: really add the new test name
1778 * tests/Makefile.am (TESTS): Add fgrep-longest.
1779
17802017-01-21 Norihiro Tanaka <noritnk@kcn.ne.jp>
1781
1782 grep -Fo could report a match that is not the longest
1783 * src/kwset.c (acexec): Fix it.
1784 * tests/fgrep-longest: New test.
1785 * tests/Makefile.am: Add the test.
1786 * NEWS: Mention it.
1787
17882017-01-18 Paul Eggert <eggert@cs.ucla.edu>
1789
1790 grep: speed up Aho-Corasick when at most 2 bytes
1791 When using Aho-Corasick and all matched strings either begin with
1792 the same byte, or begin with one of at most two bytes, use memchr2
1793 to search for these matching bytes and apply the Aho-Corasick
1794 algorithm only when a memchr2 match is found. On my platform,
1795 this speeds up 'grep -F -e aa -e ba in' by a factor of 7, where
1796 the file 'in' was created by 'seq -f %040.0f 10000000 >in'.
1797 * src/kwset.c (struct kwset.gc1): Now int, not char.
1798 If negative, there is no single terminal byte. All uses changed.
1799 (struct kwset.gc1help): Now int, not char.
1800 If negative, memchr2 cannot be used.
1801 (kwsprep): Set up gc1 and gc1help from kwset->next, with
1802 the new (slightly changed) interpretation.
1803 (memchr_kwset): Use memchr2 if possible.
1804 Adjust to match new meaning of gc1, gc1help.
1805 (memoff2_kwset): Remove; no longer needed.
1806 (acexec_trans): Use memchr_kwset when possible, for speed.
1807 It now supersedes memoff2_kwset.
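
 The prefilter idea in this entry can be sketched roughly as below.
 This is not the kwset.c code: find_either and prefiltered_search are
 hypothetical names, the two-byte scan is written out by hand rather
 than calling gnulib's memchr2, and a plain memcmp over a keyword list
 stands in for the Aho-Corasick step.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a two-byte scan: return a pointer to the
   first occurrence of C1 or C2 in the N bytes at S, or NULL if neither
   byte occurs.  */
const char *
find_either (const char *s, int c1, int c2, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (s[i] == (char) c1 || s[i] == (char) c2)
      return s + i;
  return NULL;
}

/* Return the offset of the first position in TEXT[0..N-1] at which any
   of the NKEYS keywords in KEYS occurs, or -1 if there is none.  Every
   keyword must be nonempty and must begin with C1 or C2.  */
ptrdiff_t
prefiltered_search (const char *text, size_t n,
                    const char *const *keys, size_t nkeys, int c1, int c2)
{
  const char *end = text + n;
  const char *p = text;
  while ((p = find_either (p, c1, c2, (size_t) (end - p))) != NULL)
    {
      /* The expensive matcher runs only at candidate positions.  */
      for (size_t k = 0; k < nkeys; k++)
        {
          size_t len = strlen (keys[k]);
          assert (0 < len);
          if (len <= (size_t) (end - p) && memcmp (p, keys[k], len) == 0)
            return p - text;
        }
      p++;
    }
  return -1;
}
```

 With keywords "aa" and "ba", the cheap scan skips every byte that is
 neither 'a' nor 'b', which is where most of the time would otherwise
 go on input like the 'seq' output mentioned above.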
1808
1809 grep: remove Commentz-Walter code
1810 This code was not being used, and complicated maintenance.
1811 We can bring it back from the repository if it turns out
1812 to be useful later.
1813 * src/kwset.c (struct kwset.reverse): Remove. All uses of
1814 FOO->reverse replaced by (FOO->kwsexec == bmexec).
1815 (kwsalloc): Remove 'reverse' arg, as callers outside this
1816 module do not care about algorithm choice. All callers changed.
1817 (kwsprep): When deciding whether to use Boyer-Moore, do not worry
1818 about being called twice on the same kwset, as that is not allowed.
1819 (cwexec): Remove; it was never called. All uses removed.
1820
18212017-01-17 Jim Meyering <meyering@fb.com>
1822
1823 maint: avoid new syntax-check failures
1824 * src/kwset.c (struct kwset): Split a line longer than 80.
1825 * bootstrap: Update from gnulib. This fixes a new syntax-check
1826 failure due to its use of "time stamp".
1827
18282017-01-17 Paul Eggert <eggert@cs.ucla.edu>
1829
1830 * NEWS: Fix typo.
1831
1832 * src/kwset.c: Fix comment typo.
1833
1834 Improve -i performance in typical UTF-8 searches
1835 Currently ‘grep -i i’ is slow in a UTF-8 locale, because ‘i’ in
1836 the pattern matches the two-byte character 'ı' (U+0131, LATIN
1837 SMALL LETTER DOTLESS I) in data, and kwset handles only
1838 single-byte character translations, so grep falls back on a slower
1839 DFA-based search for all searches. Improve -i performance in the
1840 typical case by using kwset when data are free of troublesome
1841 characters like 'ı', falling back on the DFA only when data
1842 contain troublesome characters.
1843 * src/dfasearch.c (GEAcompile):
1844 * src/grep.c (compile_fp_t):
1845 * src/kwsearch.c (Fcompile):
1846 * src/pcresearch.c (Pcompile):
1847 Pattern arg is now char *, not char const *, since Fcompile
1848 now reallocates it sometimes.
1849 * src/grep.c (all_single_byte_after_folding): Remove.
1850 All callers removed.
1851 (fgrep_icase_charlen): New function.
1852 (fgrep_icase_available, try_fgrep_pattern):
1853 Use it, for more-generous semantics.
1854 (fgrep_to_grep_pattern): Now extern.
1855 (main): Do not free keys, since Fexecute may use them.
1856 * src/kwsearch.c (struct kwsearch): New struct.
1857 (Fcompile): Return it. If -i, be more generous about patterns.
1858 (Fexecute): Use it. Fall back on DFA when the data contain
1859 troublesome characters; this should be rare in practice.
1860 * src/kwset.c, src/kwset.h (kwswords): New function.
1861
1862 build: update gnulib submodule to latest
1863
18642017-01-15 Paul Eggert <eggert@cs.ucla.edu>
1865
1866 dfa: prefer ptrdiff_t to size_t
1867 The code already cannot handle objects with size greater than
1868 SIZE_MAX / 2, so be more honest about it and use ptrdiff_t instead
1869 of size_t. ptrdiff_t arithmetic is signed, which allows for more
1870 checking via -fsanitize=undefined. It also makes the code a tad
1871 smaller on x86-64, since it can test for < 0 rather than for ==
1872 SIZE_MAX.
1873 * src/dfasearch.c (struct dfa_comp.kwset_exact_matches):
1874 (kwsmusts, EGexecute):
1875 * src/kwsearch.c (Fcompile, Fexecute):
1876 * src/kwset.c (struct kwset.kwsexec, kwsincr, memchr_kwset)
1877 (memoff2_kwset, bmexec_trans, bmexec, cwexec, acexec_trans)
1878 (acexec, kwsexec):
1879 * src/kwset.h (struct kwsmatch.index, .offset, .size):
1880 Prefer ptrdiff_t to size_t where either will do.
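
 A minimal sketch of the signed-sentinel convention described above,
 assuming a hypothetical helper find_byte that is not part of grep:

```c
#include <assert.h>
#include <stddef.h>

/* Return the offset of byte C in BUF[0..SIZE-1], or -1 if it is absent.
   SIZE must be nonnegative.  A size_t version would have to return
   SIZE_MAX for "not found", and callers would compare against SIZE_MAX
   instead of testing for a negative value.  */
ptrdiff_t
find_byte (const char *buf, ptrdiff_t size, char c)
{
  assert (0 <= size);
  for (ptrdiff_t i = 0; i < size; i++)
    if (buf[i] == c)
      return i;
  return -1;
}
```

 A caller then writes 'if (find_byte (buf, size, c) < 0)', and signed
 arithmetic on the result can be checked by -fsanitize=undefined.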
1881
18822017-01-11 Paul Eggert <eggert@cs.ucla.edu>
1883
1884 grep: improve comments, mostly in kwset
1885 Remove kwset.h comments that are obsolete and seemingly not
1886 maintained anyway; people can look in kwset.c instead.
1887 Update comments to reflect current behavior better.
1888 Cite Faro & Lecroq 2013. Use GNU style for end-of-sentence.
1889
18902017-01-01 Jim Meyering <meyering@fb.com>
1891
1892 maint: update gnulib and copyright dates for 2017
1893 * gnulib: Update to latest.
1894 * all files: Run "make update-copyright".
1895
18962016-12-31 Paul Eggert <eggert@cs.ucla.edu>
1897
1898 grep: speed up -x with many patterns
1899 * src/kwsearch.c (Fcompile): Improve buffer allocation overhead
1900 with -x and multiple patterns. In the common case where '\n' is
1901 the end-of-line byte, avoid copying other than the first and last
1902 patterns.
1903
19042016-12-31 Jim Meyering <meyering@fb.com>
1905
1906 gnulib: update to latest, fixing a parallel getopt test failure
1907
19082016-12-29 Paul Eggert <eggert@cs.ucla.edu>
1909
1910 maint: space before paren
1911
1912 grep: int cleanup in kwset.c
1913 This should affect only theoretical bugs with very large inputs.
1914 On my platform, this patch shrinks the grep text by 136 bytes.
1915 * src/kwset.c: Include intprops.h, for INT_MULTIPLY_WRAPV.
1916 (struct trie, struct kwset, kwsalloc, kwsincr, treedelta, kwsprep)
1917 (bm_delta2_search, bmexec_trans, cwexec): Prefer ptrdiff_t to int
1918 when counts can exceed INT_MAX in large inputs, at least in theory.
1919 (hasevery): Use bool for booleans.
1920 (bmexec_trans): Avoid undefined behavior on integer overflow.
1921
19222016-12-27 Norihiro Tanaka <noritnk@kcn.ne.jp>
1923
1924 grep: improve performance with multiple patterns
1925 * src/grep.c (main): Avoid fgrep-to-grep conversion for word matching
1926 with multiple patterns in single byte locales.
1927
19282016-12-27 Paul Eggert <eggert@cs.ucla.edu>
1929
1930 * NEWS: Fix typo.
1931
1932 grep: fix bug with '... | grep pat >> /dev/null'
1933 Problem reported by Benno Fünfstück (Bug#25283).
1934 * NEWS: Document this.
1935 * src/grep.c (drain_input) [SPLICE_F_MOVE]:
1936 Don't assume /dev/null is always acceptable output to splice.
1937 * tests/grep-dev-null-out: Test for the bug.
1938
19392016-12-26 Paul Eggert <eggert@cs.ucla.edu>
1940
1941 grep: minor performance tweak for pure functions
1942 * src/search.h (wordchars_size, wordchar_next, wordchar_prev):
1943 Declare to be pure.
1944
19452016-12-25 Zev Weiss <zev@bewilderbeest.net>
1946
1947 grep: move localeinfo to grep.c
1948 It's not really dfasearch-specific, and grep.c initializes it, so it
1949 seems like the most appropriate "owner".
1950
1951 * src/dfasearch.c (localeinfo): Remove.
1952 * src/grep.c (localeinfo): Add.
1953 * src/search.h (localeinfo): Move to new commented section.
1954
19552016-12-25 Zev Weiss <zev@bewilderbeest.net>
1956
1957 pcresearch: thread safety
1958 * src/pcresearch.c (pcre_comp): New struct to hold previously-global
1959 state.
1960 (jit_exec): Operate on a pcre_comp parameter instead of global state.
1961 (Pcompile): Allocate and return a pcre_comp instead of setting global
1962 variables.
1963 (Pexecute): Operate on a pcre_comp parameter instead of global state.
1964
1965 kwsearch: thread safety
1966 * src/kwsearch.c (Fcompile): Return a kwset_t instead of setting a
1967 global variable.
1968 (Fexecute): Use a passed-in kwset_t instead of a global variable.
1969 (kwset): Remove global variable.
1970
1971 dfasearch: thread safety
1972 * src/dfasearch.c (struct dfa_comp): New struct to hold
1973 previously-global variables.
1974 (dfawarn): Remove static variable.
1975 (kwsmusts): Operate on a dfa_comp parameter instead of global
1976 variables.
1977 (GEAcompile): Allocate and return a dfa_comp struct instead of setting
1978 global variables.
1979 (EGexecute): Operate on a dfa_comp parameter instead of global
1980 variables.
1981 * src/searchutils.c (kwsinit): Replace a static array with a
1982 dynamically-allocated one.
1983
19842016-12-25 Zev Weiss <zev@bewilderbeest.net>
1985
1986 grep: prepare search backends for thread-safety
1987 To facilitate removing mutable global state from search backends,
1988 compile() functions will return an opaque pointer to backend-specific
1989 data, which must then be passed back into the corresponding execute()
1990 function. This is merely a preparatory step changing function
1991 signatures and call sites, so the pointers passed & returned are
1992 dummies for now and not (yet) actually used.
1993
1994 * src/grep.c (compile_fp_t): Now returns an opaque pointer (the
1995 compiled pattern).
1996 (execute_fp_t): Now passed the pointer returned by a compile_fp_t.
1997 All call sites updated accordingly.
1998 (compiled_pattern): New static variable.
1999 * src/dfasearch.c (GEAcompile): Return a void pointer (dummy NULL).
2000 (EGexecute): Receive a void pointer argument (unused).
2001 * src/kwsearch.c (Fcompile): Return a void pointer (dummy NULL).
2002 (Fexecute): Receive a void pointer argument (unused).
2003 * src/pcresearch.c (Pcompile): Return a void pointer (dummy NULL).
2004 (Pexecute): Receive a void pointer argument (unused).
2005 * src/search.h: Update compile/execute function prototypes.
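
 The shape of the interface described in this entry might look like
 the sketch below. The signatures are simplified assumptions (the real
 compile_fp_t and execute_fp_t take more arguments), and the literal_*
 backend, struct backend and literal_backend are hypothetical names
 invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, assumed signatures: compile returns an opaque pointer,
   which the caller hands back to the matching execute function.  */
typedef void *(*compile_fn) (const char *pattern, size_t size);
typedef ptrdiff_t (*execute_fn) (void *compiled, const char *buf,
                                 size_t size);

/* Backend-private state that would otherwise live in global variables.  */
struct literal_comp
{
  char *pattern;
  size_t size;
};

/* Copy PATTERN into freshly allocated backend state, or return NULL on
   allocation failure.  */
void *
literal_compile (const char *pattern, size_t size)
{
  struct literal_comp *lc = malloc (sizeof *lc);
  char *copy = malloc (size + 1);
  if (!lc || !copy)
    {
      free (lc);
      free (copy);
      return NULL;
    }
  memcpy (copy, pattern, size);
  copy[size] = '\0';
  lc->pattern = copy;
  lc->size = size;
  return lc;
}

/* Return the offset of the first match in BUF[0..SIZE-1], or -1.  */
ptrdiff_t
literal_execute (void *compiled, const char *buf, size_t size)
{
  const struct literal_comp *lc = compiled;
  assert (lc != NULL);
  for (size_t i = 0; i + lc->size <= size; i++)
    if (memcmp (buf + i, lc->pattern, lc->size) == 0)
      return (ptrdiff_t) i;
  return -1;
}

/* Release state returned by literal_compile.  */
void
literal_free (void *compiled)
{
  struct literal_comp *lc = compiled;
  if (lc)
    free (lc->pattern);
  free (lc);
}

/* A caller sees only the function pointers and the opaque pointer.  */
struct backend
{
  compile_fn compile;
  execute_fn execute;
};

const struct backend literal_backend = { literal_compile, literal_execute };
```

 Because all state hangs off the returned pointer, two compiled
 patterns can coexist, which is the property a later threaded caller
 would need.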
2006
20072016-12-24 Jim Meyering <meyering@fb.com>
2008
2009 maint: fix "syntax-check" failure
2010 * src/grep.c (SEP_STR_GROUP): Declare "static".
2011
20122016-12-23 Paul Eggert <eggert@cs.ucla.edu>
2013
2014 grep: fix comment in searchutils.c
2015
2016 grep: improve word checking with UTF-8
2017 * src/searchutils.c: Do not include <verify.h>.
2018 (word_start): Remove, replacing with ...
2019 (sbwordchar): New static var. All uses changed.
2020 (wordchar_prev): Return size_t, not bool, as this generates
2021 slightly better code. Go back faster if UTF-8.
2022
2023 grep: standardize on localeinfo.multibyte
2024 * src/dfasearch.c (EGexecute):
2025 * src/grep.c (main):
2026 * src/kwsearch.c (Fexecute):
2027 * src/pcresearch.c (Pcompile):
2028 Prefer localeinfo.multibyte to (MB_CUR_MAX > 1).
2029
2030 grep: speed up -wf in C locale
2031 Problem reported by Norihiro Tanaka (Bug#22357#100).
2032 This patch improves the performance on that benchmark on my
2033 platform so that grep is now only about 2x slower than grep 2.26,
2034 which means it is considerably faster than grep 2.25 and earlier.
2035 * src/kwsearch.c (Fexecute):
2036 Use wordchars_size to boost performance for this case.
2037 * src/search.h, src/searchutils.c (wordchars_size): New function.
2038
2039 grep: specialize word-finding functions
2040 This improves performance a bit.
2041 * src/dfasearch.c, src/kwsearch.c (wordchar):
2042 Remove; now in searchutils.c.
2043 * src/grep.c (main): Call wordinit if -w.
2044 * src/search.h: Adjust.
2045 * src/searchutils.c: Include verify.h.
2046 (word_start): New static var.
2047 (wordchar): Move here from dfasearch.c and kwsearch.c.
2048 (wordinit, wordchars_count, wordchar_next, wordchar_prev):
2049 New functions.
2050 (mb_prev_wc, mb_next_wc): Remove.
2051 All callers changed to use the new functions instead.
2052
2053 grep: simplify Fexecute
2054 * src/kwsearch.c (Fexecute): Avoid the need for a 'try' local or
2055 for a 'goto success'. Update mb_start to reflect newline found.
2056
2057 grep: remove C label
2058 * src/kwsearch.c (Fexecute): Remove label.
2059
2060 maint: rewrite to avoid some macros
2061 These days, the dangerous powers of C macros are not needed if
2062 constants or functions will do just as well.
2063 * src/grep.c (SEP_CHAR_SELECTED, SEP_CHAR_REJECTED, SEP_STR_GROUP)
2064 (INITIAL_BUFSIZE):
2065 * src/kwset.c (DEPTH_SIZE):
2066 Now constants, not macros.
2067 * src/kwset.c (link): Remove macro. Instead, rename local vars
2068 from 'link' to 'cur'.
2069 (malloc) [GREP]: Remove macro. All uses of malloc changed to xmalloc.
2070 Omit double-inclusion of xalloc.h. Do not depend on 'GREP'.
2071 (U): Now a function, not a macro.
2072 * src/kwset.c, src/searchutils.c (NCHAR): Move this macro to ...
2073 * src/system.h: ... here, and make it a constant.
2074
20752016-12-20 Paul Eggert <eggert@cs.ucla.edu>
2076
2077 grep: fix performance with multiple patterns
2078 Problem reported by Jaroslav Skarvada (Bug#22357).
2079 * NEWS: Document this and other recent performance fixes.
2080 * src/grep.c (E_MATCHER_INDEX): New constant.
2081 (all_single_byte_after_folding):
2082 New function, split out from fgrep_icase_available.
2083 (fgrep_icase_available): Use it.
2084 (try_fgrep_pattern): New function, which also uses it.
2085 (main): With two or more patterns, use try_fgrep_pattern to fix
2086 performance regression. The number "two" here is just a heuristic.
2087
2088 grep: simplify matcher configuration
2089 * src/grep.c (matcher, compile): Remove static vars.
2090 (compile_fp_t): Now takes a 3rd syntax argument.
2091 (Gcompile, Ecompile, Acompile, GAcompile, PAcompile): Remove.
2092 (struct matcher): Now nameless, since it is used only once.
2093 Make 'name' a bit shorter. New member 'syntax'.
2094 (matchers): Initialize it, and change removed functions to GEAcompile.
2095 (F_MATCHER_INDEX, G_MATCHER_INDEX): New constants.
2096 (setmatcher): New arg MATCHER, and return new matcher index.
2097 Avoid unnecessary call to strcmp.
2098 (main): Keep matcher as a local int, not a global pointer.
2099 * src/kwsearch.c (Fcompile):
2100 * src/pcresearch.c (Pcompile): Ignore the 3rd syntax argument.
2101
2102 grep: simplify line counting in patterns
2103 * src/grep.c (n_patterns): Rename from patfile_lineno,
2104 as it is now origin-zero. Now size_t, not uintmax_t.
2105 (count_nl_bytes, fl_add): Simplify to just buffer and size.
2106 All callers changed.
2107
21082016-12-19 Paul Eggert <eggert@cs.ucla.edu>
2109
2110 build: update gnulib submodule to latest
2111
21122016-12-18 Paul Eggert <eggert@cs.ucla.edu>
2113
2114 build: update gnulib submodule to latest
2115
2116 build: update gnulib submodule to latest
2117
21182016-12-13 Jim Meyering <meyering@fb.com>
2119
2120 tests: use just-built grep in more places
2121 * cfg.mk (PATH): Prepend $(srcdir)/src, so that we use the just-
2122 built grep also when running commands like those of "make distcheck".
2123 This would have avoided the recently-luckily-noticed infloop bug.
2124 Tested by running this in a just-built directory:
2125 f=src/grep; printf '%s\n' '#!/bin/sh' 'sleep 9h' > $f; chmod a+x $f
2126 and then verifying that nearly every "make syntax-check" rule hangs.
2127
2128 maint: tell "syntax-check" not to worry about the NEWS update
2129 Whenever we change "old" NEWS, we have to update this checksum.
2130 Otherwise, a "make syntax-check" test that guards against a class
2131 of logical merge conflicts will fail.
2132 * cfg.mk (old_NEWS_hash): Update this hash to accommodate the
2133 recent clarification of a 2.27 NEWS entry.
2134
21352016-12-13 Arnold D. Robbins <arnold@skeeve.com>
2136
2137 build: update gnulib submodule to latest
2138 * src/dfasearch.c (GEAcompile): Remove use of flag, RE_ICASE covers it.
2139
21402016-12-12 Paul Eggert <eggert@cs.ucla.edu>
2141
2142 grep: work around proc lseek glitch
2143 Problem reported by Andreas Schwab (Bug#25180).
2144 * NEWS: Document this.
2145 * src/grep.c (finalize_input): Ignore EINVAL lseek failures.
2146 * tests/Makefile.am (TESTS): Add proc.
2147 * tests/proc: New file.
2148
21492016-12-07 Paul Eggert <eggert@cs.ucla.edu>
2150
2151 grep: simplify finalize_input
2152 * src/grep.c (finalize_input): Simplify without changing behavior.
2153 It's still a bit of a rat's-nest, but it's a cozier rat's-nest.
2154
2155 maint: clarify early-exit news for 2.27
2156 * NEWS: Mention early-exit options to avoid confusion. See:
2157 http://lists.gnu.org/archive/html/grep-devel/2016-12/msg00007.html
2158
21592016-12-06 Jim Meyering <meyering@fb.com>
2160
2161 maint: post-release administrivia
2162 * NEWS: Add header line for next release.
2163 * .prev-version: Record previous version.
2164 * cfg.mk (old_NEWS_hash): Auto-update.
2165
2166 version 2.27
2167 * NEWS: Record release date.
2168
21692016-11-29 Jim Meyering <meyering@fb.com>
2170
2171 grep: fix DFA-induced infloop
2172 * gnulib: Update to latest, for the DFA infloop fix.
2173 * tests/dfa-infloop: New test, to trigger an infinite loop
2174 in the DFA matcher.
2175 * tests/Makefile.am (TESTS): Add it.
2176
21772016-11-28 Jim Meyering <meyering@fb.com>
2178
2179 tests: use "returns_ N env VAR=val ..."
2180 rather than "VAR=val returns_ N ..."
2181 Some shells do not propagate envvar settings through our use
2182 of the "returns_" function, so set any envvar via use of "env".
2183 This was an issue at least on Ubuntu and Debian-based systems,
2184 presumably due to their common use of "dash" as /bin/sh.
2185 Reported by Assaf Gordon.
2186 * tests/char-class-multibyte: As above.
2187 * tests/euc-mb: Likewise.
2188 * tests/false-match-mb-non-utf8: Likewise.
2189 * tests/pcre-infloop: Likewise.
2190 * tests/pcre-jitstack: Likewise.
2191 * tests/sjis-mb: Likewise.
2192 * tests/warn-char-classes: Likewise.
2193
21942016-11-28 Paul Eggert <eggert@cs.ucla.edu>
2195
2196 tests: revert check for unibyte French range bug
2197 The test wasn't portable, as it assumed that rational ranges
2198 were not in effect. Problem reported by Eric Blake (Bug#25048#8).
2199 There doesn't seem to be a portable way to do the test, so omit it.
2200 * tests/init.cfg, tests/unibyte-bracket-expr:
2201 Revert previous change.
2202
2203 build: update gnulib submodule to latest
2204
22052016-11-27 Jim Meyering <meyering@fb.com>
2206
2207 grep: avoid false matches in non-UTF8 multibyte locales
2208 * gnulib: Update to latest, for the dfa.c fix.
2209 * NEWS (Bug fixes): Mention it.
2210 * tests/false-match-mb-non-utf8: New file, with tests for this.
2211 Based on tests from Stephane Chazelas.
2212 * tests/Makefile.am (TESTS): Add it.
2213 Introduced by commit v2.18-54-g3ef4c8e, a change that made grep use
2214 its DFA matcher more aggressively. The malfunction arises only with
2215 the DFA matcher, not with regex.
2216 Reported by Stephane Chazelas in https://bugs.gnu.org/24975
2217
22182016-11-20 Paul Eggert <eggert@cs.ucla.edu>
2219
2220 tests: check for unibyte French range bug
2221 Problem reported by Stephane Chazelas (Bug#24973).
2222 This bug was fixed in Gnulib.
2223 * NEWS: Document the fix.
2224 * tests/init.cfg (require_ru_RU_koi8_r): Remove.
2225 * tests/unibyte-bracket-expr: Add a test for the bug.
2226 Call get-mb-cur-max directly instead of bothering with
2227 require_ru_RU_koi8_r.
2228
2229 build: update gnulib submodule to latest
2230
22312016-11-19 Paul Eggert <eggert@cs.ucla.edu>
2232
2233 grep: further -P performance fix
2234 Problem reported by Stephane Chazelas in:
2235 http://bugs.gnu.org/22655#103
2236 * src/pcresearch.c (Pexecute): Set the subject to the start of
2237 each line as it is found.
2238
2239 grep: -P no longer uses PCRE_MULTILINE
2240 This reverts commit f6603c4e1e04dbb87a7232c4b44acc6afdf65fef,
2241 as the extra performance is not worth the trouble for PCRE users.
2242 Problem reported by Stephane Chazelas in:
2243 http://bugs.gnu.org/22655#103
2244 * NEWS: Document this and the next patch.
2245 * src/dfasearch.c (EGexecute):
2246 * src/grep.c (execute_fp_t):
2247 * src/kwsearch.c (Fexecute):
2248 * src/pcresearch.c (Pexecute):
2249 First arg is now a const pointer again.
2250 * src/grep.c (buf_has_encoding_errors): Now static.
2251 * src/grep.h (buf_has_encoding_errors): Remove decl.
2252 * src/search.h: Adjust decls.
2253 * src/pcresearch.c (reflags): Remove. All uses removed.
2254 (Pcompile, Pexecute): Do not use PCRE_MULTILINE.
2255
22562016-11-19 Jim Meyering <meyering@fb.com>
2257
2258 doc: fix a doubled "the"
2259 * doc/grep.texi (--perl-regexp): s/the\nthe/the/
2260
22612016-11-19 Paul Eggert <eggert@cs.ucla.edu>
2262
2263 grep: fix -zxP bug
2264 * NEWS: Document this.
2265 * src/pcresearch.c (Pcompile): Search a line at a time if -x is
2266 used, since -x uses ^ and $.
2267 * tests/pcre: Test this.
2268
2269 grep: simplify by using PRIuMAX
2270 * configure.ac (HAVE_PRINTF_C99_SIZES): Remove; no longer needed.
2271 * src/grep.c (print_offset): Simplify (Bug#24451).
2272
2273 grep: -T now adjusts number widths for worst case
2274 * NEWS, doc/grep.texi (Output Line Prefix Control):
2275 Document this (Bug#24451).
2276 * src/grep.c (offset_width): New static var.
2277 (print_offset): Use it instead of arg. All callers changed.
2278 (grep): Set it.
2279 * tests/initial-tab: Test this.
2280
2281 grep: -T no longer outputs BS
2282 * NEWS: Document this (Bug#24451).
2283 * src/grep.c (print_line_head): Do not attempt to backspace output.
2284 * tests/initial-tab: New test.
2285 * tests/Makefile.am (TESTS): Add it.
2286
2287 grep: document -oz better
2288 * doc/grep.texi (General Output Control, Usage): Tweak (Bug#24961).
2289
2290 grep: fix performance typo with -P
2291 Reported by Zev Weiss in: http://bugs.gnu.org/22655#88
2292 * src/pcresearch.c (Pcompile): Initialize reflags.
2293
2294 tests: use "returns_" rather than "$?"
2295 * tests/grep-dev-null-out: Use "returns_ 124" rather than testing
2296 $? = 124.
2297
2298 grep -f /dev/null -L PAT FILE outputs FILE
2299 * NEWS: Document this.
2300 * src/grep.c (main): Do not exit right away with -L.
2301 * tests/skip-read: Test for the fix.
2302
2303 grep: tune -f /dev/null
2304 * src/grep.c (main): Do the -f /dev/null early-exit checks before
2305 more-expensive tests that involve syscalls.
2306
2307 grep: treat -f /dev/null like -m0
2308 * NEWS: Document this.
2309 * src/grep.c (main): With -f /dev/null, don't bother to read the
2310 input. This is what FreeBSD grep does.
2311 * tests/Makefile.am (TESTS): Add skip-read.
2312 * tests/skip-read: New file.
2313
2314 grep: avoid O(N**2) buffer reallocation
2315 * src/grep.c (main): Use x2realloc to avoid O(N**2) performance as
2316 pattern buffers grow.
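
 The effect of doubling the allocation can be sketched with plain
 realloc. This is an assumption-laden illustration, not grep's code:
 struct growbuf and growbuf_append are hypothetical, the initial size
 of 64 is arbitrary, and gnulib's x2realloc (which exits on failure)
 is replaced by an error return.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* A growable byte buffer; zero-initialize before first use.  */
struct growbuf
{
  char *data;
  size_t used;
  size_t alloc;
};

/* Append the N bytes at SRC to B.  When more room is needed, double the
   allocation rather than growing by exactly N, so that appending N
   bytes in total costs O(N) copying instead of O(N**2).  Return 0 on
   success, -1 on allocation failure or size overflow.  */
int
growbuf_append (struct growbuf *b, const char *src, size_t n)
{
  assert (b->used <= b->alloc);
  if (b->alloc - b->used < n)
    {
      size_t want = b->alloc ? b->alloc : 64;
      while (want - b->used < n)
        {
          if (SIZE_MAX / 2 < want)
            return -1;
          want *= 2;
        }
      char *p = realloc (b->data, want);
      if (!p)
        return -1;
      b->data = p;
      b->alloc = want;
    }
  if (n)
    memcpy (b->data + b->used, src, n);
  b->used += n;
  return 0;
}
```

 Growing by a fixed amount per pattern would force realloc to copy the
 whole buffer each time; doubling bounds the number of reallocations
 by the logarithm of the final size.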
2317
2318 grep: avoid unnecessary gettext call
2319 Translate "(standard input)" lazily.
2320 * src/grep.c (input_filename): New function.
2321 (suppressible_error): Remove 1st arg, since it is always
2322 input_filename (). All callers changed.
2323 (suppressible_error, print_filename, grep, grepdesc): Use it.
2324 (grep_command_line_arg): Set filename to NULL if standard
2325 input has no label. Often, this avoids all calls to gettext,
2326 which can be a win as the first call can be expensive.
2327
2328 grep: drain the input pipe faster
2329 * src/grep.c (dev_null_output): Now static.
2330 (drain_input): New function, using 'splice' if that makes sense.
2331 (finalize_input): Use it.
2332 (main): Omit now-unnecessary initialization.
2333
2334 grep: scale back /dev/null speedup
2335 The performance improvement when output is /dev/null (commit
2336 af6af288eac28951b5eee1eaaf373e22b2193b7b dated 2016-05-01)
2337 breaks scripts that run "PROGRAM | grep PATTERN >/dev/null"
2338 where PROGRAM dies when writing into a broken pipe.
2339 Suppress the improvement if standard input is not seekable.
2340 Problem reported by Gary Johnson (Bug#24941).
2341 * NEWS: Document this.
2342 * src/grep.c (seek_failed): New static var.
2343 (seek_data_failed): Move decl earlier, to be next to seek_failed.
2344 (file_must_have_nulls): Skip useless syscalls if seek_failed.
2345 Lessen source-code nesting.
2346 (reset): Set seek_failed and seek_data_failed.
2347 Try lseek even on non-regular files.
2348 (grep): New arg INEOF. All callers changed.
2349 Do not clear seek_data_failed here, since 'reset' now does this.
2350 (finalize_input): New static function.
2351 (grepdesc): Use it.
2352 (main): Do not exit on first match merely because output is
2353 /dev/null.
2354 * tests/grep-dev-null-out: Adjust to new behavior.
2355
2356 grep: improve diagnostic on lseek failure
2357 * src/grep.c (reset): Mention the file name in the (unlikely)
2358 chance of an lseek failure.
2359
2360 grep: avoid unnecessary isatty calls
2361 This fixes an inefficiency that was mistakenly introduced a while
2362 back, when the macro SET_BINARY became defined on all platforms.
2363 * src/grep.c (grepdesc, main): Do not unnecessarily call isatty on
2364 POSIXish platforms.
2365
2366 grep: -Pz no longer rejects ^, $
2367 Problem reported by Stephane Chazelas (Bug#22655).
2368 * NEWS: Document this.
2369 * doc/grep.texi (grep Programs): Warn about -Pz.
2370 * src/pcresearch.c (reflags): New static var.
2371 (multibyte_locale): Remove static var; now local to Pcompile.
2372 (Pcompile): Check for (? and (* too. Set reflags instead of
2373 dying when problematic operators are found.
2374 (Pexecute): Use reflags to decide whether searches should
2375 be multiline.
2376 * tests/pcre: Test new behavior.
2377
23782016-11-14 Jim Meyering <meyering@fb.com>
2379
2380 tests: use "returns_" rather than explicit comparison with "$?"
2381 * tests/sjis-mb (encode): Rearrange to emit desired input into
2382 a file, rather than piping directly into grep. That permits
2383 the use of returns_ 1 to verify timeout's exit status.
2384 * tests/euc-mb: Use "returns_ 1" rather than testing $? = 1
2385 * tests/char-class-multibyte: Likewise.
2386 * tests/dfa-heap-overrun: Likewise.
2387 * tests/encoding-error: Likewise.
2388 * tests/fedora: Likewise.
2389 * tests/grep-dev-null: Likewise.
2390 * tests/init.cfg (envvar_check_fail): Likewise.
2391 * tests/kwset-abuse: Likewise.
2392 * tests/mb-non-UTF8-overrun: Likewise.
2393 * tests/multibyte-white-space: Likewise.
2394 * tests/pcre-infloop: Likewise.
2395 * tests/surrogate-pair: Likewise.
2396 * tests/warn-char-classes: Likewise.
2397 Do the same for other values:
2398 * tests/backref-multibyte-slow: Likewise.
2399 * tests/euc-mb: Likewise.
2400 * tests/pcre-abort: Likewise.
2401 * tests/pcre-jitstack: Likewise.
2402 * tests/repetition-overflow: Likewise.
2403 * tests/reversed-range-endpoints: Likewise.
2404 * tests/warn-char-classes: Likewise.
2405
24062016-10-26 Jim Meyering <meyering@fb.com>
2407
2408 doc: grep builds on HP-UX once again
2409 * NEWS (Bug fixes): Mention the HP-UX fix.
2410
2411 gnulib: update to latest, for getprogname HPUX port
2412
24132016-10-22 Mark Veltzer <mark.veltzer@gmail.com>
2414
2415 ignore coverage generated files
2416
2417 ignore ar-lib in build-aux
2418
24192016-10-20 Zev Weiss <zev@bewilderbeest.net>
2420
2421 grep: use 'j' intmax_t printf length modifier if supported
2422 * configure.ac: Use gl_PRINTF_SIZES_C99 to test printf and
2423 (conditionally) define HAVE_PRINTF_C99_SIZES.
2424 * src/grep.c (print_offset): Use printf("%j...") for printing
2425 [u]intmax_t if HAVE_PRINTF_C99_SIZES is defined; otherwise continue
2426 using the existing hand-rolled loop.
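
A minimal sketch of the two code paths described in this entry, not the actual print_offset code. HAVE_PRINTF_C99_SIZES is defined by hand here as a stand-in for the configure result, and format_offset is a hypothetical helper name. The 'j' length modifier itself is standard C99.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the configure-time macro; define it to 0 to exercise
   the fallback path.  */
#ifndef HAVE_PRINTF_C99_SIZES
# define HAVE_PRINTF_C99_SIZES 1
#endif

/* Format N in decimal into BUF, which has room for BUFSIZE bytes,
   and return BUF.  */
char *
format_offset (uintmax_t n, char *buf, size_t bufsize)
{
  assert (bufsize > 0);
#if HAVE_PRINTF_C99_SIZES
  /* C99 path: the 'j' length modifier matches [u]intmax_t.  */
  snprintf (buf, bufsize, "%ju", n);
#else
  /* Fallback: collect the digits by hand, least significant first,
     then copy them out in the right order.  */
  char tmp[64];
  size_t len = 0;
  do
    tmp[len++] = '0' + n % 10;
  while ((n /= 10) != 0);
  size_t i = 0;
  while (len > 0 && i + 1 < bufsize)
    buf[i++] = tmp[--len];
  buf[i] = '\0';
#endif
  return buf;
}
```

Both paths produce the same digits; the fallback exists only for C libraries whose printf does not understand "%j".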
2427
24282016-10-15 Jim Meyering <meyering@fb.com>
2429
2430 build: distribute new file, die.h, so "make distcheck" passes
2431 * src/Makefile.am (grep_SOURCES): Add die.h.
2432 Also, sort these file names.
2433
24342016-10-10 Paul Eggert <eggert@cs.ucla.edu>
2435
2436 build: update gnulib submodule to latest
2437
24382016-10-09 Jim Meyering <meyering@fb.com>
2439
2440 maint: die.h: add the "#define ..." part of double inclusion guard
2441 * src/die.h (DIE_H): Define to 1.
2442
24432016-10-04 Paul Eggert <eggert@cs.ucla.edu>
2444
2445 grep: don't assume stdbool.h before die call
2446 * src/die.h: Include stdbool.h, since 'die' uses 'false'.
2447
2448 grep: die more systematically
2449 * src/die.h: New file.
2450 * src/dfasearch.c, src/grep.c, src/pcresearch.c: Include die.h.
2451 * src/dfasearch.c (dfaerror):
2452 * src/grep.c (context_length_arg, add_count, prline, setmatcher, main):
2453 * src/pcresearch.c (jit_exec, Pcompile, Pexecute):
2454 Use 'die' instead of 'error' when exiting.
2455 * src/pcresearch.c: Do not include verify.h.
2456 (die): Remove; now in die.h.
2457 * src/search.h: Do not include error.h here, since this file does
2458 not use anything defined in error.h. Instead, dfasearch.c, which
2459 uses error.h's symbols, now includes error.h directly.
2460
24612016-10-02 Jim Meyering <meyering@fb.com>
2462
2463 maint: post-release administrivia
2464 * NEWS: Add header line for next release.
2465 * .prev-version: Record previous version.
2466 * cfg.mk (old_NEWS_hash): Auto-update.
2467
2468 version 2.26
2469 * NEWS: Record release date.
2470
24712016-10-01 Jim Meyering <meyering@fb.com>
2472
2473 gnulib: update to latest; for getprogname fix
2474
24752016-10-01 Paul Eggert <eggert@cs.ucla.edu>
2476
2477 tests/grep-dir: port to Solaris 10
2478 * tests/grep-dir: Port to Solaris 10 'cat', which
2479 exits with status 0 even after 'read' fails from a directory.
2480
24812016-09-28 Jim Meyering <meyering@fb.com>
2482
2483 build: placate GCC 7's -Wimplicit-fallthrough
2484 * src/pcresearch.c (die): New macro.
2485 (Pexecute): Use it in place of offending uses of error,
2486 to placate GCC 7's -Wimplicit-fallthrough.
2487 Include verify.h. Since this is grep's first explicit use of this
2488 gnulib module, ...
2489 * bootstrap.conf (gnulib_modules): Add verify.
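
A rough illustration of why a 'die' that cannot return avoids the fall-through warning. This is a sketch under stated assumptions: the real macro builds on gnulib's error and verify modules, while die_sketch and check_result below are hypothetical names that use only the standard library.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Print a printf-style message and exit with STATUS.  Declared
   _Noreturn so the compiler knows control never continues past a
   call, which a plain error (status, ...) call does not promise.  */
_Noreturn void
die_sketch (int status, char const *fmt, ...)
{
  /* A zero status would mean success, which is not what die is for.  */
  assert (status != 0);
  va_list ap;
  va_start (ap, fmt);
  vfprintf (stderr, fmt, ap);
  va_end (ap);
  fputc ('\n', stderr);
  exit (status);
}

/* Map a hypothetical matcher return code to 0 (no match) or 1.  */
int
check_result (int rc)
{
  switch (rc)
    {
    case -1:
      return 0;
    case -2:
      die_sketch (2, "matcher: internal error");
      /* No fall-through here: die_sketch cannot return.  */
    default:
      return 1;
    }
}
```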
2490
2491 gnulib: update to latest; for ...
2492 This includes the following:
2493 - a getprogname-vs-openbsd-5.1 portability fix
2494 - "fallthru" comment-adding changes for dfa and unistr/u8-uctomb-aux.c
2495 - another getprogname fix to avoid breaking newer glibc
2496
24972016-09-27 Paul Eggert <eggert@cs.ucla.edu>
2498
2499 build: reword .git old-GCC warning
2500 * configure.ac (gl_gcc_warnings): Reword diagnostic.
2501 Suggested by Assaf Gordon in:
2502 http://lists.gnu.org/archive/html/grep-devel/2016-09/msg00024.html
2503
2504 build: port .git builds to newer GCC
2505 * configure.ac (gl_gcc_warnings): Omit duplicate copy of 'main'.
2506 Problem reported by Assaf Gordon in:
2507 http://lists.gnu.org/archive/html/grep-devel/2016-09/msg00024.html
2508
2509 build: port .git builds to older GCC
2510 Problem reported by Dagobert Michelsen in:
2511 http://lists.gnu.org/archive/html/grep-devel/2016-09/msg00018.html
2512 * configure.ac (gl_gcc_warnings): Default to false if .git
2513 exists but GCC is too old.
2514
25152016-09-27 Jim Meyering <meyering@fb.com>
2516
2517 tests/long-pattern-perf: avoid false-failure due to cache speed
2518 * tests/long-pattern-perf: This test would fail semi-consistently
2519 on some systems, probably because the smaller regexp fit well
2520 within cache, yet the larger one did not. In that case, there
2521 was a relative speed difference greater than 20x and the test
2522 would fail. Quadruple the sizes, to make that less likely.
2523 Also, construct the 10x larger regexp directly from the smaller,
2524 rather than relying on seq with endpoints to induce that
2525 approximate size ratio. Reported by Bruce Dubbs in
2526 https://lists.gnu.org/archive/html/grep-devel/2016-09/msg00013.html
2527
25282016-09-24 Jim Meyering <meyering@fb.com>
2529
2530 build: avoid "./configure && make dist" missing-dep. failure
2531 * Makefile.am (run-syntax-check): Depend on "all", to avoid a
2532 parallel build failure due to a missing dependency. Reported by
2533 Paul Eggert in https://bugs.gnu.org/24256#50
2534
25352016-09-24 Paul Eggert <eggert@cs.ucla.edu>
2536
2537 build: update gnulib submodule to latest
2538
25392016-09-24 Jim Meyering <meyering@fb.com>
2540
2541 tests/fmbtest: avoid false-failure due to reliance on MB-correct sed
2542 * tests/fmbtest: Several of these tests would mistakenly fail due to
2543 postprocessing with a combination of sed and locale support that failed
2544 to handle some multibyte characters in the cs_CZ.UTF-8 locale. Instead
2545 of relying on sed's multibyte support or anything locale-related to
2546 perform this simple filtering, just use this: tr -cs '0-9' '[ *]'
2547 Also, rather than exporting LC_ALL, just set it for each command.
2548 Reported by Nelson H. F. Beebe.
2549 https://bugs.gnu.org/24534
2550
2551 tests: revamp multibyte-white-space test to be more permissive
2552 This test elicits too many failures. Whether a system has accurate
2553 Unicode "whitespace" attributes should not influence whether grep's
2554 test suite passes. In many cases, now you will see a warning that
2555 some multibyte characters do not pass whitespace-related tests, but
2556 this test no longer fails. However, if you run this test on a modern
2557 enough system, it does require that \s and \S do work properly with
2558 most of the listed characters.
2559 * tests/multibyte-white-space: Confirm that Fedora 24's locale
2560 tables still declare those four Unicode code points *not* whitespace.
2561 Honor a new column telling how to handle failure. Provide more
2562 information in each diagnostic.
2563 Reported by Nelson H. F. Beebe.
2564 https://bugs.gnu.org/24530
2565
2566 tests: avoid erroneous failure of pcre-jitstack test
2567 On some systems (*BSD), 'ulimit -s unlimited' would fail, yet the
2568 test for that mistakenly masked the failure, so the following grep
2569 command ended up failing with a segfault.
2570 * tests/pcre-jitstack: Don't mask the ulimit failure.
2571 Reported privately by Nelson H. F. Beebe.
2572 https://bugs.gnu.org/24524
2573
25742016-09-23 Jim Meyering <meyering@fb.com>
2575
2576 grep: avoid unwarranted "input file 'F' is also the output" on *BSD
2577 On *BSD systems, any command like "echo y | grep x", where grep reads
2578 from a pipe and writes to standard output, would mistakenly emit this:
2579 grep: input file '(standard input)' is also the output
2580 * src/grep.c (grepdesc): Ensure that the file descriptor we're
2581 reading is a regular one before using SAME_INODE to test whether
2582 it is the same as the descriptor open on standard output.
2583 Nelson Beebe reported privately that the foad1 tests failed on many
2584 BSD systems. Exposed by commit v2.25-2-gaf6af28.
2585 https://bugs.gnu.org/24522
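
The check described above can be sketched as follows. This is an illustration only, not the grepdesc code: input_is_output is a hypothetical name, and the device and inode comparison is written out by hand in place of the SAME_INODE macro.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>

/* Return true only if IN describes a regular file with the same
   device and inode numbers as OUT.  A pipe or other non-regular
   input never counts, whatever its inode numbers happen to be.  */
bool
input_is_output (struct stat const *in, struct stat const *out)
{
  assert (in && out);
  return (S_ISREG (in->st_mode)
          && in->st_dev == out->st_dev
          && in->st_ino == out->st_ino);
}
```

A caller would typically fill both structures with fstat, on the input descriptor and on standard output, before making the comparison.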
2586
2587 tests: avoid backref-multibyte-slow false failure
2588 * tests/backref-multibyte-slow (max_seconds): If we calculate
2589 a max duration of 1 second, use 5. Otherwise, on high-latency
2590 systems, it would be way too easy for the duration of the final
2591 test run to exceed that limit. Reported by Nelson H. F. Beebe.
2592 http://bugs.gnu.org/24516
2593
25942016-09-22 Jim Meyering <meyering@fb.com>
2595
2596 gnulib: update to latest; for getprogname-vs-AIX fix
2597
25982016-09-18 Norihiro Tanaka <noritnk@kcn.ne.jp>
2599
2600 grep: add news entry for fix to bug#24233
2601 * NEWS (Bug fixes): Add an entry describing bug#24233.
2602 The bug was fixed by commit v2.25-77-gad468bb, by chance.
2603
26042016-09-15 Jim Meyering <meyering@fb.com>
2605
2606 gnulib: update to latest
2607
26082016-09-10 Jim Meyering <meyering@fb.com>
2609
2610 dfa: reflect move of grep's DFA code to gnulib
2611 Now that the core DFA code and tests reside in gnulib,
2612 remove the copies here and use what gnulib provides.
2613 * bootstrap.conf: Use the dfa module.
2614 * cfg.mk: Remove settings involving files that have moved.
2615 (_gl_TS_unmarked_extern_functions): Add dfaerror and dfawarn.
2616 It is wrong/ugly to have to define these global symbols to use
2617 the dfa module, but we'll adjust that separately.
2618 * po/POTFILES.in: Apply s/src/lib/ to src/dfa.c.
2619 * src/Makefile.am: Remove mention of dfa.[ch] and localeinfo.[ch].
2620 * tests/Makefile.am: Remove mention of the tests that we have
2621 moved to the gnulib module.
2622 * src/dfa.c: Remove file.
2623 * src/dfa.h: Likewise.
2624 * src/localeinfo.c: Likewise.
2625 * src/localeinfo.h: Likewise.
2626 * tests/dfa-match: Likewise.
2627 * tests/dfa-match-aux.c: Likewise.
2628 * tests/invalid-char-class: Likewise.
2629
2630 gnulib: update to latest, for new dfa module
2631
26322016-09-08 Paul Eggert <eggert@cs.ucla.edu>
2633
2634 grep: encoding errors suppress just their line
2635 From a suggestion by Marcello Perathoner (Bug#22838).
2636 * NEWS, doc/grep.texi (File and Directory Selection): Document this.
2637 * src/grep.c (print_line_head): Do not suppress later output lines
2638 merely because an earlier output line would have had an encoding error.
2639 * tests/encoding-error: Test for the new behavior.
2640
26412016-09-08 Jim Meyering <meyering@fb.com>
2642
2643 gnulib: update to latest, for getprogname fixes
2644
26452016-09-08 Norihiro Tanaka <noritnk@kcn.ne.jp>
2646
2647 dfa: additional change for new anchored-search option
2648 * src/dfa.c (dfaexec_main): Do it.
2649
26502016-09-07 Paul Eggert <eggert@cs.ucla.edu>
2651
2652 doc: define "context lines"
2653 Reported by Igor Bogomazov via Santiago Ruano Rincón (Bug#24024).
2654 * doc/grep.texi (Context Line Control): Define "context lines".
2655
2656 build: update gnulib submodule to latest
2657
26582016-09-05 Jim Meyering <meyering@fb.com>
2659
2660 maint: switch from gnulib's progname to getprogname module
2661 * gnulib: Update to latest, for its new getprogname module.
2662 * bootstrap.conf (avoided_gnulib_modules): Include the getprogname
2663 module rather than the now-obsolescent progname.
2664 * src/grep.c: Include "getprogname.h" rather than "progname.h"
2665 and remove any use of set_program_name.
2666 * tests/dfa-match-aux.c (main): Likewise.
2667 * tests/get-mb-cur-max.c (main): Likewise.
2668 * src/grep.c (usage, main): Use getprogname() in place of program_name.
2669
26702016-09-02 Paul Eggert <eggert@cs.ucla.edu>
2671
2672 dfa: minor cleanup of previous change
2673 * src/dfa.c (dfaexec_main): Omit redundant code and reindent.
2674
26752016-09-02 Norihiro Tanaka <noritnk@kcn.ne.jp>
2676
2677 dfa: additional change for new anchored-search option
2678 * src/dfa.c (dfaexec_main): Do it.
2679
2680 dfa: use single-byte algorithm even in non-UTF-8
2681 * src/dfa.c (dfaexec_main): Do it. (This was inadvertently
2682 omitted in a recent patch.)
2683
26842016-09-02 Paul Eggert <eggert@cs.ucla.edu>
2685
2686 dfa: merge xalloc.h changes from Gawk
2687 * src/dfa.h (_GL_ATTRIBUTE_MALLOC): Define here, as other
2688 Gnulib .h files do. This is more consistent with Gawk.
2689 * src/dfa.c: Include xalloc.h, since dfa.h no longer does so.
2690 Include localeinfo.h later; we don't care about order, but Gawk does.
2691
26922016-09-02 Arnold Robbins <arnold@skeeve.com>
2693
2694 dfa: port to C90
2695 * src/dfa.c (dfamust): Avoid declarations after statement (Bug#21486).
2696
26972016-09-02 Paul Eggert <eggert@cs.ucla.edu>
2698
2699 dfa: new option for anchored searches
2700 This follows up on a suggestion by Norihiro Tanaka (Bug#24262).
2701 * src/dfa.c (struct regex_syntax): New member 'anchor'.
2702 (char_context): Use it.
2703 (dfasyntax): Change signature to specify it, along with the old
2704 FOLD and EOL args, as a single DFAOPTS arg. All uses changed.
2705 * src/dfa.h (DFA_ANCHOR, DFA_CASE_FOLD, DFA_EOL_NUL): New constants
2706 for dfasyntax new last arg.
2707
27082016-09-02 Norihiro Tanaka <noritnk@kcn.ne.jp>
2709
2710 dfa: simplify and optimize at initial state in execution
2711 * src/dfa.c (skip_remains_mb): Remove argument *pwc. Update caller.
2712 (dfaexec_main): Simplify and optimize at initial state (Bug#24261).
2713
2714 dfa: simplify to find state index for state 0
2715 * src/dfa.c (dfastate): Simplify to find state index for state 0.
2716
27172016-09-01 Norihiro Tanaka <noritnk@kcn.ne.jp>
2718
2719 tests: add a new test for SJIS locale
2720 * tests/sjis-mb: Add a new test. It fails in grep-2.25 or prior.
2721
27222016-09-01 Paul Eggert <eggert@cs.ucla.edu>
2723
2724 grep: update NEWS
2725 * NEWS: Describe previous change.
2726
2727 grep: use regex fastmap unless -i
2728 This builds on a suggestion by Norihiro Tanaka (Bug#24009).
2729 * src/dfasearch.c (GEAcompile): Use a fastmap unless -i.
2730 This improves performance 20x for me using the first benchmark
2731 given in Bug#24009.
2732
2733 grep: improve dfasearch storage management
2734 This patch is mostly refactoring, with a bit of performance tweaking.
2735 It is done in preparation for a fix for Bug#24009.
2736 * src/dfasearch.c (patterns): Now of type struct re_pattern_buffer *
2737 instead of an anonymous struct pointer, since there is no longer
2738 any need to keep regs here. All uses changed.
2739 (GEAcompile): Use patlim instead of a hard-to-follow "total".
2740 Use x2nrealloc to avoid potential O(N**2) reallocation algorithm.
2741 Initialize just the pattern members that need clearing.
2742 (EGexecute): Put regs into a static variable, as this code did
2743 before 2001-02-18, as there is no need to have a separate set of
2744 regs for each pattern. Explain the "Q@#%!#" comment better.
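
To illustrate the reallocation point: growing a buffer by a fixed amount per append copies O(N**2) bytes in total, whereas geometric growth keeps the number of resizes logarithmic. The sketch below uses plain realloc with doubling. The doubling factor, the struct, and the function names are assumptions made for illustration; gnulib's x2nrealloc also checks for size overflow, which this sketch omits.

```c
#include <assert.h>
#include <stdlib.h>

/* A growable array of ints that counts its own resizes.  */
struct int_vec
{
  int *items;
  size_t used;
  size_t allocated;
  size_t reallocs;
};

/* Append VALUE, doubling the capacity when the buffer is full.
   Return 0 on success, -1 if memory is exhausted.  */
int
int_vec_push (struct int_vec *v, int value)
{
  assert (v->used <= v->allocated);
  if (v->used == v->allocated)
    {
      size_t n = v->allocated ? 2 * v->allocated : 16;
      int *p = realloc (v->items, n * sizeof *p);
      if (!p)
        return -1;
      v->items = p;
      v->allocated = n;
      v->reallocs++;
    }
  v->items[v->used++] = value;
  return 0;
}

/* Push N values and report how many resizes that took.  */
size_t
resizes_for (size_t n)
{
  struct int_vec v = { 0 };
  for (size_t i = 0; i < n; i++)
    if (int_vec_push (&v, (int) i) != 0)
      break;
  size_t r = v.reallocs;
  free (v.items);
  return r;
}
```

With a starting capacity of 16, a thousand appends need only seven resizes (16, 32, ..., 1024).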
2745
27462016-09-01 Norihiro Tanaka <noritnk@kcn.ne.jp>
2747
2748 dfa: remove separation by context in transition in non-UTF8 multibyte locales
2749 * src/dfa.c (struct dfa): Remove member curr_dependent. All uses
2750 removed.
2751
27522016-09-01 Paul Eggert <eggert@cs.ucla.edu>
2753
2754 dfa: document previous change
2755 * NEWS: Adjust to match previous change.
2756
27572016-09-01 Norihiro Tanaka <noritnk@kcn.ne.jp>
2758
2759 dfa: avoid invalid character matching period
2760 * src/dfa.c (transit_state): Avoid invalid character matching period.
2761
2762 dfa: use single-byte algorithm even in non-UTF-8
2763 Even in non-UTF8 locales, if the current input character
2764 is single byte, we can use CSET to match ANYCHAR.
2765 * src/dfa.c (struct dfa): New member canychar.
2766 Cache index of CSET for ANYCHAR.
2767 (lex): Make CSET for ANYCHAR.
2768 (state_index): Simplify.
2769 (dfastate): Consider CSET for ANYCHAR.
2770 (transit_state_singlebyte, transit_state): Remove handling for eolbyte,
2771 as we assume that eolbyte does not appear at current position.
2772 (dfaexec_main): Use algorithm for single byte character to any single
2773 byte character in input text always.
2774 (dfasyntax): Initialize canychar.
2775
27762016-09-01 Paul Eggert <eggert@cs.ucla.edu>
2777
2778 grep: avoid code duplication with -iF
2779 This follows up on the -iF performance improvement (Bug#23752).
2780 * NEWS: Simplify description of -iF improvement.
2781 * src/dfa.c: Do not include wctype.h.
2782 (lonesome_lower, case_folded_counterparts): Move to localeinfo.c.
2783 (CASE_FOLDED_BUFSIZE): Move to localeinfo.h.
2784 * src/grep.c: Do not include wctype.h.
2785 (lonesome_lower): Remove.
2786 (fgrep_icase_available): Use case_folded_counterparts instead.
2787 Do not call it for the same character twice.
2788 Return false on wcrtomb failures (which should never happen).
2789 (fgrep_to_grep_pattern, main): Simplify. Let fgrep_to_grep’s
2790 caller fiddle with the global variables.
2791 * src/localeinfo.c: Include <wctype.h>
2792 (lonesome_lower, case_folded_counterparts):
2793 Move here from src/dfa.c. Return int, not unsigned int.
2794 Verify that CASE_FOLDED_BUFSIZE is big enough.
2795 * src/localeinfo.h (CASE_FOLDED_BUFSIZE): Now 32, so that
2796 we don’t expose lonesome_lower’s size.
2797 * src/searchutils.c (kwsinit): Return new kwset instead of
2798 storing it via a pointer. All callers changed. Simplify a bit.
2799
28002016-09-01 Norihiro Tanaka <noritnk@kcn.ne.jp>
2801
2802 grep: speed up -iF in multibyte locales
2803 In a multibyte locale, if a pattern is composed of only single byte
2804 characters and all their counterparts are also single byte characters
2805 and the pattern does not have invalid sequences, grep -iF uses the
2806 fgrep matcher, the same as in a single byte locale (Bug#23752).
2807 * NEWS: Mention it.
2808 * src/grep.c (lonesome_lower): New constant.
2809 (fgrep_icase_available): New function.
2810 (fgrep_to_grep_pattern): Simplify it.
2811 (main): Use them.
2812 * src/searchutils.c (kwsinit): New arg MB_TRANS; all uses changed.
2813 Try fgrep matcher for case insensitive matching by grep -F in multibyte
2814 locale.
2815
28162016-08-31 Paul Eggert <eggert@cs.ucla.edu>
2817
2818 build: update gnulib submodule to latest
2819
28202016-08-31 Jim Meyering <meyering@fb.com>
2821
2822 maint: avoid new 'make syntax-check' failure
2823 * src/dfa.c (using_simple_locale): Prefer STREQ(a,b) over
2824 strcmp(a,b) == 0.
2825
2826 gnulib: update to latest
2827
28282016-08-31 Paul Eggert <eggert@cs.ucla.edu>
2829
2830 dfa: make dfa.c fully thread-safe
2831 This follows up on Zev Weiss’s recent patches to make the DFA code
2832 thread-safe (Bug#24249). It removes the remaining static
2833 variables used by dfa.c. These variables are locale-dependent, so
2834 they would cause problems in multithreaded code where different
2835 threads are in different locales (e.g., via uselocale). I
2836 abstracted most of the variables into a new localeinfo module.
2837 * src/Makefile.am (grep_SOURCES): Add localeinfo.c.
2838 (noinst_HEADERS): Add localeinfo.h.
2839 * src/dfa.c: Include localeinfo.h.
2840 (struct dfa): Remove multibyte member, as it is now part of
2841 localeinfo. New members simple_locale and localeinfo.
2842 Put locale-related members at the end.
2843 (mbrtowc_cache): Remove; now part of dfa->localeinfo.
2844 (charclass_index): Rename back from dfa_charclass_index,
2845 since it's private.
2846 (unibyte_word_constituent): New arg DFA; use its sbctowc member.
2847 (using_utf8, dfa_using_utf8, init_mbrtowc_cache, check_utf8):
2848 Remove; now done by localeinfo members. All uses changed.
2849 (dfasyntax): New localeinfo arg. Move to end to avoid forward decls.
2850 Initialize the entire DFA.
2851 (unibyte_c, check_unibyte_c): Remove; now in simple_locale member.
2852 (using_simple_locale): Now takes bool instead of DFA.
2853 Do the locale check here, rather than in the caller,
2854 as the result is now cached in dfa->simple_locale.
2855 (dfaalloc): Just allocate the DFA. dfasyntax now initializes it.
2856 * src/dfa.h: Add forward decl of struct localeinfo.
2857 Adjust to new dfa.c API.
2858 * src/dfasearch.c (localeinfo): New var, replacing former static
2859 vars like mbrtowc_cache.
2860 * src/localeinfo.c, src/localeinfo.h: New files.
2861 * src/search.h: Include localeinfo.h.
2862 (localeinfo): New decl.
2863 * src/searchutils.c (mbclen_cache, build_mbclen_cache):
2864 Remove. All uses changed to localeinfo.
2865 * tests/Makefile.am (dfa_match_aux_LDADD): Add localeinfo.o.
2866 * tests/dfa-match-aux.c: Include localeinfo.h.
2867 (main): Adjust to changes in DFA API.
2868
28692016-08-28 Paul Eggert <eggert@cs.ucla.edu>
2870
2871 build: update gnulib submodule to latest
2872 This should fix Bug#24323 reported by Dennis Clarke, where grep
2873 does not build on Solaris 10 when compiled with Solaris Studio 12.4.
2874
28752016-08-23 Paul Eggert <eggert@cs.ucla.edu>
2876
2877 dfa: minor thread-safety cleanups
2878 * src/dfa.c (struct lexer_state): Rename lexptr to ptr and lexleft
2879 to left, for brevity. All uses changed.
2880 (struct dfa): Rename lexstate to lex and parsestate to parse,
2881 for brevity. All uses changed.
2882 (using_simple_locale): Simplify boolean expression.
2883 (FETCH_WC): Parenthesize uses of dfa macro arg.
2884 (FETCH_WC, parse_bracket_exp, addtok_mb): Prefer suffix operators
2885 on structure members when possible, for clarity.
2886 (parse_bracket_exp): Check for buffer exhaustion before
2887 dereferencing buffer pointer.
2888 (struct lexptr): New type.
2889 (push_lex_state, pop_lex_state): Use it. Change from macros
2890 PUSH_LEX_STATE and POP_LEX_STATE to static functions, and add
2891 parameters to make them proper C functions. All uses changed.
2892 (lex): Simplify tests for \) and \|. Avoid some string
2893 duplication by using &"^..."[boolean].
2894 (dfaalloc): Use xzalloc, not xcalloc with 1.
2895
28962016-08-21 Paul Eggert <eggert@cs.ucla.edu>
2897
2898 grep: minor tweaks of initial buffer alloc
2899 * src/grep.c (main): Allocate input buffer only when about
2900 to do I/O. Avoid int overflow on systems with 2 GiB pages.
2901 Fix size_t overflow check.
2902
29032016-08-20 Zev Weiss <zev@bewilderbeest.net>
2904
2905 dfa: constify some function parameters
2906 * src/dfa.c (char_context): Mark dfa parameter const.
2907 (charclass_context): Likewise.
2908
2909 dfa: thread-safety: initialize mbrtowc_cache in dfa_init
2910 * src/dfa.c (dfasyntax): Remove initialization of mbrtowc_cache.
2911 (init_mbrtowc_cache): New function.
2912 (dfa_init): Call it.
2913 http://bugs.gnu.org/24259
2914
2915 dfa: thread-safety: eliminate static local variables
2916 * src/dfa.c: Replace utf8 and unibyte_c static local variables with
2917 static globals initialized by a new function dfa_init() which must be
2918 called before any other dfa*() functions.
2919 (dfa_using_utf8): Rename using_utf8() to dfa_using_utf8() for
2920 consistency with other exported functions.
2921 * src/dfa.h (dfa_using_utf8): Rename using_utf8() to dfa_using_utf8();
2922 also add _GL_ATTRIBUTE_PURE.
2923 (dfa_init): New function.
2924 * src/grep.c (main), tests/dfa-match-aux.c (main): Call dfa_init().
2925 * src/dfasearch.c (EGexecute): Replace using_utf8 with dfa_using_utf8.
2926 * src/kwsearch.c (Fexecute): Likewise.
2927 * src/pcresearch.c (Pcompile): Likewise.
2928 http://bugs.gnu.org/24259
2929
2930 dfa: thread-safety: move regex syntax configuration into struct dfa
2931 * src/dfa.c: move global variables holding regex syntax configuration
2932 into a new struct (`struct regex_syntax') and add an instance of it to
2933 struct dfa. All references to the globals are replaced with
2934 references to the dfa struct's new member. As a side effect, a
2935 `struct dfa' must be allocated with dfaalloc() and passed to
2936 dfasyntax().
2937 * src/dfa.h (dfasyntax): Add new struct dfa* parameter.
2938 * src/dfasearch.c (GEAcompile): Allocate `dfa' earlier and pass it to
2939 dfasyntax().
2940 * tests/dfa-match-aux.c (main): Pass `dfa' to dfasyntax().
2941 http://bugs.gnu.org/24259
2942
2943 dfa: thread-safety: move parser state into struct dfa
2944 * src/dfa.c: move global variables holding parser state (`tok' and
2945 `depth') into a new struct (`struct parser_state') and add an instance
2946 of it to struct dfa. All references to the globals are replaced by
2947 references to the dfa struct's new member.
2948 http://bugs.gnu.org/24259
2949
2950 dfa: thread-safety: move lexer state into struct dfa
2951 * src/dfa.c: move global variables holding lexer state into a new
2952 struct (`struct lexer_state') and add an instance of this struct to
2953 struct dfa. All references to the globals are replaced with
2954 references to the dfa struct's new member.
2955 http://bugs.gnu.org/24259
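
The pattern behind this series of thread-safety changes can be sketched in miniature: state that used to live in file-scope variables moves into a struct that every function receives explicitly, so two scans can no longer interfere with each other. The struct layout and function names below are hypothetical, not the actual dfa.c definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lexer state.  Keeping it in a struct, rather than in
   file-scope variables, gives each caller its own private copy.  */
struct lexer_sketch
{
  char const *ptr;   /* next unread byte of the pattern */
  size_t left;       /* number of bytes remaining */
};

/* Start scanning PATTERN, which is LEN bytes long.  */
void
lexer_init (struct lexer_sketch *lx, char const *pattern, size_t len)
{
  assert (lx != NULL);
  lx->ptr = pattern;
  lx->left = len;
}

/* Return the next byte as an unsigned char value, or -1 at the end.  */
int
lexer_next (struct lexer_sketch *lx)
{
  assert (lx != NULL);
  if (lx->left == 0)
    return -1;
  lx->left--;
  return (unsigned char) *lx->ptr++;
}
```

Two lexers initialized on different patterns can be advanced in any interleaving, from the same thread or from different ones, because neither touches shared data.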
2956
29572016-08-19 Zev Weiss <zev@bewilderbeest.net>
2958
2959 dfa: thread-safety: remove dfa.c's "dfa" global
2960 Remove the global dfa struct. Instead, add a struct dfa pointer
2961 parameter to each function that had been using the global.
2962 * src/dfa.c (dfa): Remove file-scoped global.
2963 (charclass_index): Remove now-unnecessary function.
2964 (using_simple_locale): Add a dfa parameter and update all callers.
2965 (FETCH_WC, parse_bracket_exp, lex, addtok_mb, addtok): Likewise.
2966 (addtok_wc, add_utf8_anychar, atom, nsubtoks, copytoks): Likewise.
2967 (closure, branch, regexp): Likewise.
2968 (dfaparse): No longer set the global.
2969 http://bugs.gnu.org/24260
2970
29712016-08-18 Paul Eggert <eggert@cs.ucla.edu>
2972
2973 grep: tune list_files conversion to enum
2974 * src/grep.c (grepdesc): Use a slightly more-efficient way to test
2975 list_files.
2976
2977 grep: prefer bitwise to short-circuit when shorter
2978 * src/grep.c (skip_devices, initialize_unibyte_mask, fillbuf, main)
2979 * src/kwsearch.c (Fexecute): Prefer bitwise to short-circuit ops
2980 when they are logically equivalent and the bitwise ops generate
2981 shorter code on GCC 6.1 x86-64.
2982 * src/grep.c (get_nondigit_option, parse_grep_colors):
2983 Use c_isdigit instead of spelling it out with a short-circuit op.
2984
29852016-08-17 Paul Eggert <eggert@cs.ucla.edu>
2986
2987 dfa: use 64-bit when ulong is at least that wide
2988 * src/dfa.c (charclass_word): Now unsigned long instead of unsigned.
2989 (CHARCLASS_WORD_BITS): Now 64 on 64-bit platforms.
2990 (CHARCLASS_PAIR, CHARCLASS_INIT): New macros.
2991 (CHARCLASS_WORD_MASK): Now a static const, since it no longer
2992 needs to be a macro.
2993 (equal): Open-code rather than calling memcmp.
2994 (add_utf8_anychar): Use CHARCLASS_INIT.
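
A sketch of a character-class bitset built from unsigned long words, in the spirit of this entry. All names are hypothetical, and deriving the word width from sizeof is an assumption of the sketch rather than a copy of how CHARCLASS_WORD_BITS is computed.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

/* One word of the set; wider words mean fewer iterations per set.  */
typedef unsigned long cc_word;
enum { CC_WORD_BITS = sizeof (cc_word) * CHAR_BIT };
enum { CC_NCHARS = UCHAR_MAX + 1 };
enum { CC_WORDS = (CC_NCHARS + CC_WORD_BITS - 1) / CC_WORD_BITS };

/* A set of byte values, one bit per possible byte.  */
typedef struct { cc_word w[CC_WORDS]; } cc_bits;

void
cc_clear (cc_bits *s)
{
  assert (s);
  memset (s, 0, sizeof *s);
}

void
cc_set_bit (cc_bits *s, unsigned char b)
{
  s->w[b / CC_WORD_BITS] |= (cc_word) 1 << b % CC_WORD_BITS;
}

bool
cc_test_bit (cc_bits const *s, unsigned char b)
{
  return s->w[b / CC_WORD_BITS] >> b % CC_WORD_BITS & 1;
}

/* Open-coded word-by-word comparison instead of a memcmp call.  */
bool
cc_equal (cc_bits const *a, cc_bits const *b)
{
  cc_word diff = 0;
  for (int i = 0; i < CC_WORDS; i++)
    diff |= a->w[i] ^ b->w[i];
  return diff == 0;
}
```

With 64-bit words a 256-bit set needs four words; with 32-bit words it needs eight, so each whole-set operation loops twice as often.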
2995
2996 dfa: avoid uninitialized constants
2997 Some compilers warn about 'static int const x;' on the grounds
2998 that X should have an initializer. Instead of worrying about
2999 this, rewrite to avoid this sort of thing.
3000 * src/dfa.c (emptyset): New function.
3001 (parse_bracket_exp): Use it instead of 'equal' and a zero constant.
3002 * src/dfasearch.c (struct patterns): Remove tag 'patterns'.
3003 (patterns0): Remove zero constant.
3004 (GEAcompile): Use memset instead of the zero constant.
3005
30062016-08-17 Jim Meyering <meyering@fb.com>
3007
3008 maint: avoid new "make syntax-check" failure
3009 * src/dfa.c: Adjust comment not to go past column 80.
3010
3011 tests: pcre-jitstack: avoid false failure without base64 -d support
3012 * tests/pcre-jitstack: Try harder to find a base64 decoder:
3013 try 'base64 -d', 'base64 -D', 'openssl base64 -d' and perl's
3014 MIME::Base64 decode_base64. The old code would fail at least on
3015 OS X, for which base64 expects -D or --decode.
3016 Reported by Jack Howarth in http://bugs.gnu.org/24243.
3017
30182016-08-16 Paul Eggert <eggert@cs.ucla.edu>
3019
3020 dfa: minor refactoring and doc fixes
3021 * NEWS: Improve description of recent change.
3022 * src/dfa.c: Improve commentary. Indent new code (and some
3023 long-existing howlers) more in GNU style.
3024 (dfa_state): Reorder members to make struct smaller on x86.
3025 mb_trindex member is now state_num, not size_t, so that -1 is more
3026 natural; all uses changed.
3027 (struct dfa): Similarly for mb_trcount member.
3028 (state_index): Compute values for new state components before
3029 allocating the state, to make the code easier to understand.
3030 (state_index, dfastate): Prefer A & ~B to other forms like (A & B)
3031 != A.
3032 (dfastate, build_state, transit_state): In new code, prefer i++ to
3033 ++i in for-loop control.
3034 (build_state, transit_state): In new code, prefer < to >.
3035 (transit_state): Add to *PP in one assignment, rather than in a
3036 loop. Prefer !x to x == NULL. Use xmalloc instead of xnmalloc,
3037 since the size is a constant. Do the size calculation as a signed
3038 integer constant expression, so that the compiler diagnoses any
3039 overflow.
3040 (transit_state, free_mbdata): Tune by looping from -1 to N - 1,
3041 rather than from 0 to N - 1 with a separate instance for -1.
3042 (dfaexec_main): Rewrite to avoid side effects in if-part.
3043 (free_mbdata): Simplify.
3044
3045 dfa: port to C90
3046 * src/dfa.c (transit_state, dfa_supported, dfamust):
3047 Don't use declarations after statements.
3048 If I recall correctly, gawk still wants to port to C90.
3049
3050 dfa: fix context newline confusion
3051 * src/dfa.c (transit_state): Fix "... & ~0" that was evidently
3052 intended to be "... & ~1". Do index calculation in a simpler way,
3053 that uses just addition (Bug#21486).
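
For readers unfamiliar with the idiom, the difference between the two masks is shown below. This is a generic illustration with hypothetical function names. It does not reproduce the transit_state expression, and what the low bit means there is not stated in this entry.

```c
#include <assert.h>

/* ~0 has every bit set, so X & ~0 is always X: a no-op.  */
unsigned int
mask_with_not_zero (unsigned int x)
{
  return x & ~0u;
}

/* ~1 has every bit set except bit 0, so X & ~1 clears the lowest
   bit and keeps all the others.  */
unsigned int
mask_with_not_one (unsigned int x)
{
  return x & ~1u;
}

/* Check both identities for a given X.  */
void
check_masks (unsigned int x)
{
  assert (mask_with_not_zero (x) == x);
  assert ((mask_with_not_one (x) & 1u) == 0);
}
```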
3054
30552016-08-16 Norihiro Tanaka <noritnk@kcn.ne.jp>
3056
3057 dfa: improve leading "." with non-UTF8 multibyte
3058 In non-UTF8 multibyte locales, matching the dot expression is very
3059 slow, as the next state is calculated on demand. This change caches
3060 the result for the typical case (Bug#21486).
3061
3062 Compare the run times of this command before and after this change,
3063 on an i5-4570 CPU @ 3.20GHz using rawhide (~fedora 22) and compiled
3064 with gcc 5.1.1 20150618:
3065 yes "$(printf 'a%38db\n' 0)" | head -1000000 >in
3066 env LC_ALL=ja_JP.eucJP time -p \
3067 src/grep .......................................... in
3068 Before: 19.10
3069 After : 0.55
3070
3071 * NEWS: Document this.
3072 * src/dfa.c (struct dfa_state): New members curr_dependent, mb_trindex.
3073 (MAX_TRCOUNT): New constant.
3074 (struct dfa): New members mb_trans, mb_trcount.
3075 (state_index): Initialize new members of struct dfa_state and calculate
3076 dependency on context of next character for positions for dot.
3077 (dfastate): Calculate follows positions for dot if enabled.
3078 (realloc_trans_if_necessary): Allocate transition tables.
3079 (build_state): Use new constant and reset transition tables.
3080 (transit_state): Use cache for transition from a state with the dot
3081 expression.
3082 (free_mbdata): Deallocate transition tables.
3083
30842016-08-06 Jim Meyering <meyering@fb.com>
3085
3086 tests: standardize on 10-second timeouts to avoid rare false failure
3087 In a parallel test run, it is not unusual to exceed a timeout of
3088 1-3 seconds. Increase several from 3 or fewer to 10 seconds.
3089 * tests/skip-device: Increase timeout from 2 to 10 seconds.
3090 * tests/grep-dev-null-out: Likewise, but s/1/10/.
3091 * tests/pcre-invalid-utf8-input: Likewise, but s/3/10/.
3092 * tests/dfa-match: Likewise.
3093 * tests/pcre-invalid-utf8-infloop: Likewise.
3094 * tests/pcre-infloop: Likewise.
3095 * tests/max-count-overread: Likewise.
3096 * tests/invalid-multibyte-infloop: Likewise.
3097 Prompted by http://bugs.gnu.org/24159.
3098
3099 tests/backref-multibyte-slow: avoid false positive
3100 * tests/backref-multibyte-slow: When redirecting the "fast" LC_ALL=C
3101 run's output to /dev/null, we got an artificially low timing (of 0),
3102 due to grep's own stdout-vs-/dev/null optimization. With an initial
3103 timing of 0 on that first run, the derived timeout for the UTF-8 run
3104 (which redirects to a file) would be a mere 1 second. The fix: also
3105 redirect that first run's output to a file, not to /dev/null.
3106
31072016-08-05 Norihiro Tanaka <noritnk@kcn.ne.jp>
3108
3109 dfa: minor fix for whether dfa is "fast"
3110 * src/dfa.c (dfaoptimize): When a UTF-8 optimization succeeds for
3111 a DFA (it can use single-byte code paths), record that by setting
3112 its ->fast flag.
3113
31142016-07-25 Jim Meyering <meyering@fb.com>
3115
3116 grep: print "filename:lineno:" in invalid-regex diagnostic
3117 Determining the file name and line number is a little tricky because
3118 of the way the regular expressions are all concatenated onto a newline-
3119 separated list. By the time grep would compile regular expressions,
3120 the <filename,lineno> origin of each regexp was no longer available.
3121 This patch adds a list of filename,first_lineno pairs, one per input
3122 source, by which we can then map the ordinal regexp number to a
3123 filename,lineno pair for the diagnostic.
3124
3125 * src/dfasearch.c (GEAcompile): When diagnosing an invalid regexp
3126 specified via -f FILE, include the "FILENAME:LINENO: " prefix.
3127 Also, when there are two or more lines with compilation failures,
3128 diagnose all of them, rather than stopping after the first.
3129 * src/grep.h (pattern_file_name): Declare it.
3130 * src/grep.c (struct FL_pair): Define type.
3131 (fl_pair, n_fl_pair_slots, n_pattern_files, patfile_lineno):
3132 Define globals.
3133 (fl_add, pattern_file_name): Define functions.
3134 (main): Call fl_add for each type of the following: -e argument,
3135 -f argument, command-line-specified (without -e) regexp.
3136 * tests/filename-lineno.pl: New file.
3137 * tests/Makefile.am (TESTS): Add it.
3138 * NEWS (Improvements): Mention this.
3139 Initially reported by Gunnar Wolf in https://bugs.debian.org/525214
3140 Forwarded to grep's bug list by Santiago Ruano Rincón as
3141 http://debbugs.gnu.org/23965
3142
31432016-07-24 Jim Meyering <meyering@fb.com>
3144
3145 tests: add coreutils' perl-driven test framework
3146 * configure.ac: Set the AM_CONDITIONAL variable, HAVE_PERL.
3147 * tests/Coreutils.pm: New file.
3148 * tests/CuSkip.pm: New file.
3149 * tests/CuTmpdir.pm: New file.
3150 * tests/no-perl: New file.
3151 * tests/Makefile.am: Set up to use .pl tests:
3152 (TEST_EXTENSIONS, TESTSUITE_PERL, TESTSUITE_PERL_OPTIONS): Define.
3153 (SH_LOG_COMPILER, PL_LOG_COMPILER): Define.
3154 (EXTRA_DIST): Add the four new file names.
3155
3156 doc: omit an excess word in HACKING
3157
31582016-07-21 Norihiro Tanaka <noritnk@kcn.ne.jp>
3159
3160 grep: always match single line only with DFA superset
3161 \n cannot occur inside a multibyte character, so a match found
3162 with the DFA superset is always confined to a single line.
3163
3164 * src/dfasearch.c (EGexecute): Simplify it with above.
3165
31662016-07-15 Norihiro Tanaka <noritnk@kcn.ne.jp>
3167
3168 dfa: fix whitespace problems
3169 * src/dfa.c: Use GNU style for pointer decls.
3170
31712016-07-15 Paul Eggert <eggert@cs.ucla.edu>
3172
3173 maint: modernize HACKING a bit
3174 * HACKING: Remove some ancient history to simplify maintenance.
3175
31762016-07-14 Paul Eggert <eggert@cs.ucla.edu>
3177
3178 grep: minor style changes for -F crash fix
3179 * src/kwset.c (memoff2_kwset): Use ?: instead of if-else.
3180
31812016-07-14 Norihiro Tanaka <noritnk@kcn.ne.jp>
3182
3183 grep: fix -F crash when alternating duplicates
3184 grep -F crashes with a pattern like 0\n0.
3185 This bug was introduced in 966f6586fbce3081ce6e5e2f9b55301b0ec3d2b4.
3186
3187 * src/kwset.c (memoff2_kwset): If two characters are the same,
3188 use memchr instead of memchr2.
3189 * tests/two-chars: New test.
3190 * tests/Makefile.am (TESTS): Add it.
3191
31922016-07-07 Paul Eggert <eggert@cs.ucla.edu>
3193
3194 dfa: fix comments to match code better
3195 * src/dfa.c: Fix comments.
3196
31972016-07-06 Paul Eggert <eggert@cs.ucla.edu>
3198
3199 dfa: don't treat null bytes specially
3200 * src/dfa.c (transit_state): Do not treat null byte specially
3201 when eolbyte == '\n'.
3202
32032016-07-06 Norihiro Tanaka <noritnk@kcn.ne.jp>
3204
3205 dfa: don't distinguish letters in non-POSIX locales
3206 For non-POSIX locales, dfa does not support word delimiters,
3207 so remove the distinction between letters and non-letters.
3208 * src/dfa.c (struct dfa): Remove members initstate_letter,
3209 initstate_others. All uses removed. New member initstate_notbol.
3210 (dfaanalyze, dfaexec_main): Replace old members with new member.
3211 (wchar_context): Remove. Update callers.
3212
32132016-07-06 Paul Eggert <eggert@cs.ucla.edu>
3214
3215 dfa: minor cleanups for non-POSIX simplification
3216 * src/dfa.c (transit_state_singlebyte): Remove unnecessary 'const'
3217 from arg; we usually don't bother with 'const' on locals.
3218 (transit_state_singlebyte): Omit '!= NULL' in boolean context.
3219 Use assert rather than abort.
3220
32212016-07-06 Norihiro Tanaka <noritnk@kcn.ne.jp>
3222
3223 dfa: simplify for non-POSIX locales
3224 Simplify the dfa code, since it no longer supports ranges,
3225 collating elements, and equivalence classes in non-POSIX locales.
3226 * src/dfa.c (struct dfa): Remove mb_match_lens.
3227 (enum status_transit_state, match_anychar)
3228 (check_matching_with_multibyte_ops, transit_state_consume_1char):
3229 (State_transition): Remove.
3230 (transit_state_singlebyte): Accept a pointer-to-pointer position
3231 instead of a pointer, and no longer accept a pointer to the next state.
3232 Return the next state instead of status_transit_state. All callers
3233 changed.
3234 (transit_state_singlebyte, transit_state): Simplify.
3235 (dfaexec_main): Now transit_state is called only when the next
3236 character matches ANYCHAR.
3237
32382016-06-14 Paul Eggert <eggert@cs.ucla.edu>
3239
3240 doc: propagate more changes from grep.texi
3241 Problem reported by Björn Voigt in: http://bugs.gnu.org/23763#27
3242 * doc/grep.in.1: Fix more inconsistencies with grep.texi.
3243
32442016-06-13 Paul Eggert <eggert@cs.ucla.edu>
3245
3246 doc: remove obsolete MS-DOS mention
3247 * doc/grep.in.1: Remove obsolete discussion of MS-DOS heuristics.
3248 Problem reported by Björn Voigt in: http://bugs.gnu.org/23763
3249
32502016-06-09 Zev Weiss <zev@bewilderbeest.net>
3251
3252 grep: do pagesize initialization and buffer allocation earlier
3253 * src/grep.c (reset, main): We're going to need pagesize and buffer
3254 initialized anyway, so we might as well do so unconditionally early on
3255 rather than checking on every call to reset().
3256 http://bugs.gnu.org/23717
3257
3258 grep: remove unnecessary dirdesc variable.
3259 * src/grep.c (grepdirent): Remove dirdesc variable and just use
3260 fts_cwd_fd directly, since the fts_options test was guaranteed to
3261 succeed (and fts_cwd_fd was already being used directly in fstatat()
3262 anyway). http://bugs.gnu.org/23716
3263
3264 grep: convert list_files to an enum
3265 * src/grep.c: Make list_files a tristate enum instead of an int.
3266 http://bugs.gnu.org/23715
3267
3268 grep: correct a stale comment and remove dead code
3269 * src/grep.c (grepdesc): The `grep()' function no longer has
3270 special-case negative return values, since it no longer handles
3271 directories, so don't bother checking for them.
3272 http://bugs.gnu.org/23714
3273
3274 maint: replace bitwise with logical OR
3275 * src/grep.c (main): replace bitwise ORs with logical ORs where it
3276 makes sense (when dealing with boolean conditions as opposed to
3277 bitmasks). http://bugs.gnu.org/23713
3278
3279 maint: mark a couple of static variables const
3280 * src/dfa.c (parse_bracket_exp): mark zeroclass const.
3281 * src/dfasearch.c: mark patterns0 const.
3282 http://bugs.gnu.org/23712
3283
32842016-06-03 Paul Eggert <eggert@cs.ucla.edu>
3285
3286 tests: fix similar bug in exit status test
3287 * tests/grep-dir (status_range): New shell function.
3288 Use it to fix bug where $? was not saved properly.
3289
32902016-06-03 Zev Weiss <zev@bewilderbeest.net>
3291
3292 tests: fix bug in exit status test
3293 When checking $? against multiple values, save its value in another
3294 variable and check that so as to avoid tests beyond the first seeing a
3295 $? clobbered by earlier ones.
3296
3297 * tests/status: save $? in a temporary variable before testing it.
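
The pitfall behind this fix can be sketched in plain POSIX shell. The function names below are hypothetical and are not taken from tests/status; each function is meant to accept an exit status of 0 or 1 and reject anything else.

```shell
# Buggy pattern: the first `test` overwrites $?, so the second `test`
# compares the status of that failed `test` (1), not the original status.
buggy_check() {
  (exit "$1")                      # stand-in for the command under test
  test $? -eq 0 || test $? -eq 1
}

# Fixed pattern: save $? once, then test the saved copy.
fixed_check() {
  (exit "$1")                      # stand-in for the command under test
  status=$?
  test "$status" -eq 0 || test "$status" -eq 1
}
```

With an exit status of 2, buggy_check wrongly succeeds because its second test sees 1, while fixed_check fails as intended.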
3298
32992016-06-02 Norihiro Tanaka <noritnk@kcn.ne.jp>
3300
3301 dfa: more simplification of dfaexec_main
3302 * src/dfa.c (dfaexec_main): After a newline, failure at an acceptable
3303 position and a demand to build a state are unlikely, so go to the next
3304 loop iteration without checking for them. No semantic change.
3305
33062016-06-02 Paul Eggert <eggert@cs.ucla.edu>
3307
3308 maint: correct attribution
3309 * build-aux/git-log-fix: Fix attribution of primary Aho-Corasick patch
3310
33112016-06-02 Paul Eggert <eggert@cs.ucla.edu>
3312
3313 grep: simplify -F Aho-Corasick a bit
3314 This removes some tuning that complicates the code without providing
3315 performance benefits that I could measure (GCC 6.1, x86-64).
3316 (acexec_trans): Do not hand-unroll. Unduplicate the code for a
3317 transition step.
3318
3319 * src/kwset.c (struct kwset.kwsexec, bmexec, acexec_trans, acexec)
3320
33212016-06-02 Paul Eggert <eggert@cs.ucla.edu>
3322
3323 grep: minor cleanups for -F Aho-Corasick
3324 * NEWS: Don't claim 7x, as the value seems to be system-dependent.
3325 * src/kwset.c (struct kwset.kwsexec, bmexec, acexec_trans, acexec):
3326 * src/kwset.c, src/kwset.h (kwsalloc, kwsexec):
3327 Don't put 'const' into the declaration when that is irrelevant to
3328 the API. More generally, don't bother with 'const' when it's only
3329 a local so it is reasonably obvious to a reader that it is 'const'
3330 anyway. It would be overkill to add 'const' to all locals that
3331 never change.
3332 * src/kwset.c (U): Avoid unnecessary parens.
3333 (treefails, memoff2_kwset, bmexec_trans, bmexec, cwexec, acexec_trans):
3334 Prefer SIZE_MAX to (size_t) -1.
3335 (bmexec_trans, cwexec, acexec_trans):
3336 Remove attributes for static functions that no longer seem needed.
3337 (memoff2_kwset): Rename from memchr2_kwset, since it returns
3338 an offset, not a pointer. All uses changed.
3339 (cwexec, acexec_trans) [lint]: Remove initialization that is no
3340 longer needed; at least, GCC 6.1 x86-64 does not need it.
3341 (acexec_trans): Clarify code by using nesting rather than 'continue'.
3342
33432016-06-02 Norihiro Tanaka <noritnk@kcn.ne.jp>
3344
3345 grep: use memchr2 for two patterns of a character
3346 * src/kwset.c (memchr2_kwset): New function. grep uses memchr2 to
3347 search for just two letters.
3348 (cwexec, acexec_trans): Use it.
3349
3350 grep: -F multiword longest match not always needed
3351 When searching for multiple fixed words, grep now returns immediately,
3352 without looking for the longest match, if that is not needed. Previously,
3353 grep tried the longest match for multiple words even when not needed.
3354 * src/kwset.c (kwsexec, acexec, cwexec, bmexec): Add a bool argument
3355 for whether longest match is needed. All callers changed.
3356 * src/kwset.h (kwsexec): Update prototype.
3357
33582016-06-02 Norihiro Tanaka <noritnk@kcn.ne.jp>
3359
3360 grep: use Aho-Corasick algorithm to search multiple fixed words
3361 When searching for multiple fixed words, grep used the Commentz-Walter
3362 algorithm, but this was O(m*n) and was very slow in the worst case.
3363 For example:
3364
3365 - input: yes `printf %040d` | head -10000000
3366 - word1: x0000000000000000000
3367 - word2: x
3368
3369 This change instead uses the Aho-Corasick algorithm to search multiple
3370 fixed words. It uses a high-quality trie-building function that is
3371 already defined for Commentz-Walter in kwset.c.
3372
3373 I see 7x speed-up even for a typical case on Fedora 21 with a 3.2GHz i5
3374 by this change. Using best-of-5 trials for the benchmark:
3375
3376 find /usr/share/doc/ -type f |
3377 LC_ALL=C time -p xargs.sh src/grep -Ff /usr/share/dict/linux.words >/dev/null
3378
3379 The results were:
3380
3381 real 11.37 user 11.03 sys 0.24 [without the change]
3382 real 1.49 user 1.31 sys 0.15 [with the change]
3383
3384 * src/kwset.c (struct kwset): Add a new member 'mode'.
3385 (kwsalloc): Use it.
3386 All callers are changed.
3387 (kwsincr): Using Aho-Corasick algorithm, build tries in normal order.
3388 (acexec_trans, acexec): Add a new function.
3389 (kwsexec): Use it.
3390 * src/kwset.h (kwsalloc): Update a prototype.
3391 * NEWS (Improvements): Mention it.
3392
33932016-05-13 Jim Meyering <meyering@fb.com>
3394
3395 maint: do not let a LANGUAGE envvar setting perturb tests
3396 E.g., running "LANGUAGE=eo make check" would provoke a failure
3397 of the encoding-error test, on systems that mistakenly let that
3398 envvar trump the setting of LC_ALL.
3399 * tests/envvar-check: New file, copied from coreutils.
3400 * tests/Makefile.am (EXTRA_DIST): Add it.
3401 (TESTS_ENVIRONMENT): Source it.
3402 Also select TMPDIR as we do for coreutils tests.
3403 Reported by Benno Schulenberg in http://bugs.gnu.org/23527.
3404
34052016-05-02 Jim Meyering <meyering@fb.com>
3406
3407 maint: avoid NEWS syntax-check failure
3408 * NEWS: Move the mention of the /dev/null speed-up from the
3409 block for 2.25 into the current, in-preparation block.
3410
34112016-05-01 Paul Eggert <eggert@cs.ucla.edu>
3412
3413 dfa: prefer bool for boolean
3414 * src/dfa.c (syntax_bits_set, dfasyntax, using_utf8, FETCH_WC)
3415 (POP_LEX_STATE, State_transition):
3416 * src/dfa.h (using_utf_8):
3417 Use bool for boolean.
3418
34192016-05-01 Norihiro Tanaka <noritnk@kcn.ne.jp>
3420
3421 dfa: stop exporting internal functions
3422 * src/dfa.c, src/dfa.h (dfaparse, dfaanalyze, dfastate, dfainit):
3423 Now static.
3424
3425 dfa: prefer bool at DFA interfaces
3426 * src/dfa.c (struct dfa, dfasyntax, dfaanalyze, dfaexec_main)
3427 (dfaexec_mb, dfaexec_sb, dfaexec_noop, dfaexec, dfacomp):
3428 * src/dfa.h (dfasyntax, dfacomp, dfaexec, dfaanalyze):
3429 * src/dfasearch.c (EGexecute):
3430 Use bool for boolean.
3431
34322016-05-01 Paul Eggert <eggert@cs.ucla.edu>
3433
3434 dfa: speed up checking for character boundary
3435 This should help performance with gawk; not so much with grep.
3436 Suggested by Norihiro Tanaka in: http://bugs.gnu.org/18777
3437 * src/dfa.c (never_trail): New static var.
3438 (dfasyntax): Initialize it.
3439 (skip_remains_mb): Use it to speed up a common case in Gawk.
3440
3441 grep: /dev/null output speedup
3442 This sped up 'seq 10000000000 | grep . >/dev/null' by a factor of
3443 380,000 on my platform (Fedora 23, x86-64, AMD Phenom II X4 910e,
3444 en_US.UTF-8 locale).
3445 * NEWS: Document this.
3446 * src/grep.c (grepbuf): exit_on_match no longer implies that -q
3447 was specified, so when a match is found, exit with exit_failure if
3448 an error was also found.
3449 (grepdesc): Omit unnecessary S_ISREG and st_ino checks.
3450 out_stat.st_ino is zero if stdout is not a regular file,
3451 and this cannot possibly equal st->st_ino.
3452 (main): Omit duplicate initialization of exit_failure. Do not
3453 bother with isatty unless -q is not used and stdout is a character
3454 special file and --color=auto and TERM says colorization is
3455 possible. Most importantly, set exit_on_match if the output is
3456 /dev/null.
3457 * tests/grep-dev-null-out: New test.
3458 * tests/Makefile.am (TESTS): Add it.
3459 * tests/status: Do not require grep to actually read all the input
3460 files when the output is /dev/null and a matching line has been
3461 found.
3462
34632016-04-21 Jim Meyering <meyering@fb.com>
3464
3465 maint: post-release administrivia
3466 * NEWS: Add header line for next release.
3467 * .prev-version: Record previous version.
3468 * cfg.mk (old_NEWS_hash): Auto-update.
3469
3470 version 2.25
3471 * NEWS: Record release date.
3472
34732016-04-19 Paul Eggert <eggert@cs.ucla.edu>
3474
3475 dfa: remove dependency on btowc
3476 MirOS BSD btowc is a macro that (when GCC is being used) hardcodes
3477 btowc (0x80) == WEOF regardless of locale, which contradicts
3478 future POSIX in the C locale. Instead of bothering to develop a
3479 Gnulib workaround for the btowc incompatibility, use mbrtowc,
3480 which we are using elsewhere and fixing anyway, and are caching so
3481 it is fast here. Problem reported by Nelson H. F. Beebe via Jim
3482 Meyering in: http://bugs.gnu.org/23269#14
3483 * bootstrap.conf (gnulib_modules): Remove btowc.
3484 * src/dfa.c (struct dfa): Remove mbrtowc_cache member, replacing with ...
3485 (mbrtowc_cache): ... this new static var. All uses changed.
3486 (dfambcache): Remove; now done by setsyntax. Call removed.
3487 (is_valid_unibyte_character): Remove.
3488 (IS_WORD_CONSTITUENT): Remove this macro, replacing it with ...
3489 (unibyte_word_constituent): ... this new function. It uses
3490 mbrtowc_cache rather than btowc.
3491 (dfasyntax): Initialize mbrtowc_cache before using it.
3492
34932016-04-10 Paul Eggert <eggert@cs.ucla.edu>
3494
3495 grep: minor doc tweaks inspired by Debian
3496 Problem reported by Santiago Ruano Rincón in: http://bugs.gnu.org/22911
3497 * doc/grep.in.1:
3498 * doc/grep.texi (Matching Control, grep Programs)
3499 (Regular Expressions):
3500 Document -e, -f, and PCRE more carefully.
3501
35022016-04-10 Jim Meyering <meyering@fb.com>
3503
3504 maint: remove unused mbtoupper function
3505 * src/searchutils.c (mbtoupper): Remove now-unused function.
3506 Also remove inclusion of <assert.h>, since this change removed
3507 the final use of assert.
3508 * src/search.h (mbtoupper): Remove declaration.
3509
35102016-04-10 Paul Eggert <eggert@cs.ucla.edu>
3511
3512 grep: in C locale, all bytes are valid characters
3513 This works around glibc bug 19932:
3514 https://sourceware.org/bugzilla/show_bug.cgi?id=19932
3515 The actual bug fix was the update to the current version of Gnulib.
3516 grep problem reported by Björn Jacke in: http://bugs.gnu.org/23234
3517 * NEWS: Mention this.
3518 * doc/grep.texi (File and Directory Selection): Crossref to LC_*
3519 section. Suggest why -a or LC_ALL=C might be useful.
3520 (Environment Variables): Mention 'locale -a'.
3521 Say that LC_CTYPE also specifies encoding, and that every
3522 byte is a valid character in the C or POSIX locale.
3523 * tests/c-locale: New test.
3524 * tests/Makefile.am (TESTS): Add it.
3525
3526 build: update gnulib submodule to latest
3527
35282016-04-05 Paul Eggert <eggert@cs.ucla.edu>
3529
3530 Give another example of binary file processing
3531 Problem reported by Shlomi Fish
3532 * doc/grep.texi (File and Directory Selection):
3533 Document that 'q$' might match 'q' followed by a NUL
3534 if --binary-files=binary is in effect.
3535
35362016-04-03 Paul Eggert <eggert@cs.ucla.edu>
3537
3538 tests: test egrep/fgrep help only if our grep
3539 Problem reported by Christian Weisgerber in: http://bugs.gnu.org/23146
3540 * tests/Makefile.am (TESTS_ENVIRONMENT):
3541 Test egrep and fgrep only if they use our grep.
3542
35432016-03-29 Jim Meyering <meyering@fb.com>
3544
3545 tests: remove spurious test of egrep
3546 * tests/reversed-range-endpoints: Do not test egrep here.
3547 There is already a test of grep -E.
3548 Prompted by http://bugs.gnu.org/23146
3549
35502016-03-23 Paul Eggert <eggert@cs.ucla.edu>
3551
3552 grep: -Pz no longer misdiagnoses [^a]
3553 Problem reported by Michael Jess.
3554 * NEWS: Document this.
3555 * src/pcresearch.c (Pcompile): Do not diagnose [^ when [ is unescaped.
3556 * tests/pcre: Test for the bug.
3557
35582016-03-22 Jim Meyering <meyering@fb.com>
3559
3560 maint: move new 'Improvements' blurb into proper section
3561 * NEWS (Improvements): Move this new section from within the block
3562 for the already-released 2.24 into the proper "next-release" block.
3563 Also, retain the 2-blank-line separator between blocks.
3564
35652016-03-18 Jim Meyering <meyering@fb.com>
3566
3567 maint: avoid spurious "binary file ... matches" in generated THANKS
3568 * Makefile.am (THANKS): Don't apply grep to a stream containing
3569 NUL bytes. Sync this rule from the one in coreutils: it was missing
3570 some improvements.
3571 Reported by Bailes Magio in http://bugs.gnu.org/22899
3572
35732016-03-18 Paul Eggert <eggert@cs.ucla.edu>
3574
3575 grep: -oz now outputs null bytes, not newlines
3576 * NEWS: Document this.
3577 * doc/grep.texi (Other Options): Clarify that -z affects output
3578 as well as input data.
3579 * src/grep.c (print_line_middle): Output eolbyte, not newline, if -o.
3580 * tests/null-byte: Test -o too.
3581 * tests/pcre-context: Adjust test to match new behavior.
3582
35832016-03-17 Paul Eggert <eggert@cs.ucla.edu>
3584
3585 grep: use errno consistently in write diagnostics
3586 Feature request and initial version reported by Assaf Gordon in:
3587 http://bugs.gnu.org/23031
3588 * NEWS: Document this.
3589 * src/grep.c: Include <stdarg.h>.
3590 (stdout_errno): New static var.
3591 (write_error_seen): Remove; superseded by stdout_errno.
3592 All uses changed.
3593 (putchar_errno, fputs_errno, printf_errno, fwrite_errno)
3594 (fflush_errno): New static functions.
3595 (print_filename, print_sep, print_offset, print_line_head)
3596 (print_line_middle, print_line_tail, prline, prtext, grep)
3597 (grepdesc): Use them.
3598 * tests/write-error-msg: New file.
3599 * tests/Makefile.am (TESTS): Add it.
3600
36012016-03-10 Jim Meyering <meyering@fb.com>
3602
3603 maint: post-release administrivia
3604 * NEWS: Add header line for next release.
3605 * .prev-version: Record previous version.
3606 * cfg.mk (old_NEWS_hash): Auto-update.
3607
3608 version 2.24
3609 * NEWS: Record release date.
3610
36112016-02-28 Jim Meyering <meyering@fb.com>
3612
3613 maint: add dist-check.mk
3614 This file augments "make distcheck" rules.
3615 * dist-check.mk: New file, from coreutils via gzip.
3616 * Makefile.am (EXTRA_DIST): Add it.
3617 * cfg.mk: Include it.
3618
36192016-02-21 Paul Eggert <eggert@cs.ucla.edu>
3620
3621 grep: -Pz is incompatible with ^ and $
3622 Problem reported by Sergei Trofimovich in: http://bugs.gnu.org/22655
3623 * NEWS: Document this.
3624 * src/pcresearch.c (Pcompile): Warn with -Pz and anchors.
3625 * tests/pcre: Test new behavior.
3626
36272016-02-21 Jim Meyering <meyering@fb.com>
3628
3629 tests: test cleanup
3630 * tests/z-anchor-newline: Remove test artifact that would write
3631 to /t/x.
3632
36332016-02-20 Jim Meyering <meyering@fb.com>
3634
3635 grep -z: avoid erroneous match with regexp anchor and \n in text
3636 * src/dfasearch.c (EGexecute): Clear the newline_anchor bit when
3637 eolbyte is not '\n'.
3638 * tests/z-anchor-newline: New file.
3639 * tests/Makefile.am (TESTS): Add it.
3640 * NEWS (Bug fixes): Describe it.
3641 Originally reported by Ulrich Mueller in
3642 https://bugs.gentoo.org/show_bug.cgi?id=574662
3643 Reported to us by Sergei Trofimovich as http://debbugs.gnu.org/22655
3644
3645 tests: convert "cmd && fail=1" to "returns_ 1 cmd || fail=1"
3646 The latter is robust, while the former can silently ignore
3647 failure due to signals.
3648 * cfg.mk (sc_prohibit_and_fail_1): New rule, copied from coreutils.
3649 * tests/long-pattern-perf: Perform the above substitution.
3650 * tests/mb-non-UTF8-performance: Likewise.
3651 * tests/help-version: Merge from coreutils.
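
A minimal sketch of why the second idiom is more robust. The returns_ function below is a simplified stand-in written for this illustration (the assumption being that the real helper in the test framework does more bookkeeping), and killed_cmd is a hypothetical command that dies from a signal instead of exiting with status 1.

```shell
# Simplified stand-in for the test framework's returns_ helper:
# run the command and succeed only if it exits with the expected status.
returns_() {
  expected=$1
  shift
  "$@"
  test $? -eq "$expected"
}

# Hypothetical command under test: killed by a signal, so its exit
# status is neither 0 nor 1.
killed_cmd() { sh -c 'kill -s KILL $$'; }

# Old idiom: any nonzero status counts as "failed as expected",
# so death by signal goes unnoticed and fail stays 0.
fail_old=0
killed_cmd && fail_old=1

# New idiom: only an exit status of exactly 1 is accepted,
# so the signal death is reported through fail=1.
fail_new=0
returns_ 1 killed_cmd || fail_new=1
```

After this runs, fail_old is still 0 even though the command was killed, while fail_new is 1.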
3652
36532016-02-09 Jim Meyering <meyering@fb.com>
3654
3655 maint: add a check-very-expensive target
3656 * Makefile.am (check-very-expensive): New convenience rule,
3657 currently merely equivalent to check-expensive.
3658
36592016-02-04 Jim Meyering <meyering@fb.com>
3660
3661 maint: post-release administrivia
3662 * NEWS: Add header line for next release.
3663 * .prev-version: Record previous version.
3664 * cfg.mk (old_NEWS_hash): Auto-update.
3665
3666 version 2.23
3667 * NEWS: Record release date.
3668
36692016-02-02 Jim Meyering <meyering@fb.com>
3670
3671 gnulib: update to latest
3672 Update for this "make distcheck"-fixing change:
3673 > verify-tests: also remove stray test-verify.Tpo
3674
36752016-02-01 Jim Meyering <meyering@fb.com>
3676
3677 tests/null-byte: test another code path
3678 * tests/null-byte: Also exercise the case in which there is
3679 a match in the block along with the NUL byte.
3680
36812016-01-31 Paul Eggert <eggert@cs.ucla.edu>
3682
3683 Omit excess "Binary file ... matches"
3684 Problem reported in: http://bugs.gnu.org/22461
3685 * src/grep.c (grep): Don't report "Binary file ... matches"
3686 merely because the file contained both matches and binary data.
3687 Insist that the binary data contained a match.
3688 * tests/null-byte: Add a test for this.
3689
36902016-01-28 Jim Meyering <meyering@fb.com>
3691
3692 gnulib: update to latest
3693
36942016-01-23 Jim Meyering <meyering@fb.com>
3695
3696 gnulib: update to latest
3697
3698 maint: fix typo in NEWS: s/a/an/
3699
37002016-01-15 Paul Eggert <eggert@cs.ucla.edu>
3701
3702 grep: -x now supersedes -w more consistently
3703 * NEWS, doc/grep.texi (Matching Control): Mention this.
3704 * src/dfasearch.c (EGexecute):
3705 * src/pcresearch.c (Pcompile):
3706 Don't get confused by -w if -x is also present.
3707 * src/pcresearch.c (Pcompile): Remove misleading comment about
3708 non-UTF-8 multibyte locales, as PCRE doesn't support them.
3709 Calculate buffer sizes more carefully; the old method
3710 allocated a buffer slightly too big, seemingly due to luck.
3711 * tests/backref-word, tests/pcre: Add tests for this bug.
3712
3713 tests: omit update-copyright-tests
3714 This test does not check how 'grep' itself operates, so it is
3715 out of place for grep's 'make check'. Problem reported by Sam Razavi in:
3716 http://bugs.gnu.org/22376
3717 * bootstrap.conf (avoided_gnulib_modules): Add update-copyright-tests.
3718
37192016-01-11 Jim Meyering <meyering@fb.com>
3720
3721 tests: do use "yes" but via an AWK replacement
3722 Also, use sed Nq in place of head -N
3723 * tests/init.cfg (yes): Define.
3724 Thanks to Paul Eggert for this definition.
3725 * tests/max-count-overread: Revert to using "yes".
3726 * tests/mb-non-UTF8-performance: Likewise, and use
3727 "sed Nq" in place of head -N.
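
A sketch of the two substitutions described above, under stated assumptions: awk_yes is a hypothetical name chosen for this illustration, and its body is not necessarily the definition added to tests/init.cfg.

```shell
# Hypothetical awk-based stand-in for yes(1).
awk_yes() {
  # Pass the text through the environment so awk does not reinterpret
  # backslashes in it; default to "y" when no argument is given.
  line=${*-y} awk 'BEGIN { for (;;) print ENVIRON["line"] }'
}

# "sed 3q" prints the first three lines and then quits, like "head -3"
# but without relying on non-POSIX option syntax.
first_three=$(awk_yes abc | sed 3q)
```

Here first_three holds three lines, each containing "abc".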
3728
37292016-01-11 Paul Eggert <eggert@cs.ucla.edu>
3730
3731 * tests/pcre-count: Don't assume the page size is 32kB.
3732
37332016-01-08 Paul Eggert <eggert@cs.ucla.edu>
3734
3735 tests: port to other POSIXish platforms
3736 I tested this on Solaris 10 and AIX 7.1.
3737 * tests/max-count-overread:
3738 * tests/mb-non-UTF8-performance:
3739 Don't assume 'yes' exists, as 'yes' is not in POSIX.
3740 * tests/mb-non-UTF8-performance:
3741 Don't rely on 'head -1000', as that option syntax is not POSIX.
3742 * tests/pcre-count: Don't rely on "printf '\x0'".
3743 * tests/unibyte-binary: Don't assume \200 is an encoding error
3744 in every unibyte locale.
3745
37462016-01-08 Jim Meyering <meyering@fb.com>
3747
3748 tests: fix encoding-error test failure due to use of printf '\xHH'
3749 * tests/encoding-error: Don't rely on printf having support for \xHH
3750 hexadecimal. That is not portable. Use \OOO octal, instead.
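
As an illustration of the portable form (a sketch, not the code in tests/encoding-error), the byte 0x80 can be written with an octal escape. The variable names are assumptions made for this example.

```shell
# \200 is an octal escape, which POSIX requires printf to understand;
# '\x80' is a common extension that some printf implementations lack.
# Count the bytes written: one byte (0x80) plus the newline.
byte_count=$(printf '\200\n' | wc -c | tr -d ' ')

# Dump the single byte in octal to confirm what was written.
octal_dump=$(printf '\200' | od -An -to1 | tr -d ' \n')
```

byte_count is 2 and octal_dump is 200, regardless of whether the local printf supports \xHH.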
3751
3752 maint: fix typo in NEWS: s/a/an/
3753
37542016-01-07 Jim Meyering <meyering@fb.com>
3755
3756 mb-non-UTF8-performance: avoid FP test failure on fast hardware
3757 * tests/mb-non-UTF8-performance: Don't use a fixed size.
3758 Otherwise, on a fast system, the fixed-size unibyte test
3759 would complete in a nominal 0 ms, which might well be
3760 smaller than 1/30 of the multibyte duration, provoking
3761 a false positive test failure. Instead, increase the
3762 size of the input until we obtain a unibyte duration of
3763 at least 10ms.
3764
37652016-01-07 Paul Eggert <eggert@cs.ucla.edu>
3766
3767 doc: mention unibyte encoding fix
3768 * NEWS: Document recent fix for encoding errors in unibyte locales.
3769
3770 grep: improve unibyte -P performance
3771 This is a follow-on to the recent changes prompted by Bug#20526.
3772 In <http://bugs.gnu.org/bug=20526#86> Norihiro Tanaka pointed out
3773 that grep mistakenly assumed that unibyte locales cannot have
3774 encoding errors. Here, the mistake hurt performance significantly.
3775 On Fedora 23 x86-64 in the C locale, this patch improved grep's
3776 performance by a factor of 7 when run as "grep -P 'z.*a'" on the
3777 output of "yes $(printf '\200\n') | head -n 1000000000".
3778 * src/pcresearch.c (multibyte_locale) [HAVE_LIBPCRE]: New static var.
3779 (Pcompile): Set it.
3780 (Pexecute): Use it to avoid the need to call
3781 buf_has_encoding_errors in unibyte locales.
3782
37832016-01-06 Paul Eggert <eggert@cs.ucla.edu>
3784
3785 Improve on fix for Bug#22181
3786 * src/pcresearch.c (Pexecute): Update subject when skipping past
3787 easily-determined encoding errors, as this is faster than letting
3788 pcre_exec skip them. On my platform this improves performance
3789 4.7x on a benchmark created via "yes $(printf '\200\200\200\200
3790 \200\200\200\200\200\200\200\200\200\200\200\200\200\200\200\200x\n')
3791 | head -n 1000000 >j; grep -oP y j" in a UTF-8 locale. Rework
3792 code that deals with PCRE_ERROR_BADUTF8 return, to avoid an
3793 incorrect (albeit currently harmless) 'bol = false' assignment.
3794
3795 grep: restore -P optimization (followup fix)
3796 * src/search.h (EGexecute, Fexecute, Pexecute):
3797 Change decls to match new implementations.
3798 I forgot to add this file to the previous commit.
3799
3800 grep: restore -P PCRE_NO_UTF8_CHECK optimization
3801 On my platform in the en_US.utf8 locale, this makes 'grep -P "z.*a" k'
3802 220x faster, where k is created by the shell command:
3803 yes 'abcdefg hijklmn opqrstu vwxyz' | head -n 10000000 >k
3804 * src/dfasearch.c (EGexecute):
3805 * src/grep.c (execute_fp_t):
3806 * src/kwsearch.c (Fexecute):
3807 * src/pcresearch.c (Pexecute):
3808 First arg is now char *, not char const *, since Pexecute now
3809 temporarily modifies this argument.
3810 * src/grep.c, src/grep.h (buf_has_encoding_errors): Now extern.
3811 * src/pcresearch.c (Pexecute): Use it. If the input is free of
3812 encoding errors, use a multiline search and the PCRE_NO_UTF8_CHECK
3813 option, as this is typically way faster. This restores an
3814 optimization that was removed with the recent changes for binary
3815 file detection.
3816
38172016-01-05 Paul Eggert <eggert@cs.ucla.edu>
3818
3819 Fix calculation of unibyte_mask
3820 * src/grep.c (initialize_unibyte_mask): The old method worked for
3821 UTF-8 and other typical encodings, but did not work for weird
3822 encodings, e.g., one where all bytes other than 0x7f and 0x80 are
3823 unibyte characters.
3824
38252016-01-01 Paul Eggert <eggert@cs.ucla.edu>
3826
3827 grep: fix bug with invalid unibyte sequence
3828 This was introduced by the recent binary-data-detection changes.
3829 Problem reported by Norihiro Tanaka in: http://bugs.gnu.org/20526#86
3830 * src/grep.c (HIBYTE, easy_encoding, init_easy_encoding): Remove,
3831 replacing with ...
3832 (uword_max, unibyte_mask, initialize_unibyte_mask): ... this new
3833 constant, static var, and function. All uses changed. The
3834 unibyte_mask var generalizes the old local var hibyte_mask, which
3835 worked only for encodings where every byte with 0x80 turned off is
3836 a single-byte character.
3837 (buf_has_encoding_errors): Return false immediately if
3838 unibyte_mask is zero, not whether the current encoding is unibyte.
3839 The old test was incorrect in unibyte locales in which some bytes
3840 were encoding errors.
3841 * tests/pcre-z: Require UTF-8 locale, since the grep -z . test now
3842 needs this. Use printf \0 rather than tr. Port the 'grep -z .'
3843 test to platforms where the C locale says '\200' is an encoding
3844 error. Use cmp rather than compare, as the file is binary and
3845 so non-GNU diff might not work.
3846 * tests/unibyte-binary: New file.
3847 * tests/Makefile.am (TESTS): Add it.
3848
38492016-01-01 Jim Meyering <meyering@fb.com>
3850
3851 maint: update copyright year, bootstrap, init.sh
3852 Run "make update-copyright" and then...
3853
3854 * gnulib: Update to latest.
3855 * tests/init.sh: Update from gnulib.
3856 * bootstrap: Likewise.
3857
38582015-12-31 Paul Eggert <eggert@cs.ucla.edu>
3859
3860 doc: clarify text vs binary match output
3861 * NEWS:
3862 * doc/grep.texi (File and Directory Selection):
3863 Make it clearer that grep can now output matching text before
3864 reporting a binary match. Problem reported by Norihiro Tanaka in:
3865 http://bugs.gnu.org/20526#83
3866
3867 doc: minor clarifications
3868 * doc/grep.in.1, doc/grep.texi: Minor clarifications suggested by
3869 Debian documentation patches. Problem reported by Santiago Ruano
3870 Rincón in: http://bugs.gnu.org/18651
3871
3872 grep: fix -l --line-buffered bug
3873 Problem reported by Louis Sautier in: http://bugs.gnu.org/18750
3874 * NEWS: Document this.
3875 * src/grep.c (grep, grepdesc): If --line-buffered, flush
3876 stdout after outputting newline (or null byte, if applicable).
3877
38782015-12-30 Paul Eggert <eggert@cs.ucla.edu>
3879
3880 grep: remove duplicate init
3881 * src/grep.c (print_line_middle): Remove duplicate initialization.
3882
3883 grep: report line-buffered write error right away
3884 * src/grep.c (prline): When line buffered, if there is a write
3885 error, report it immediately rather than waiting until the next
3886 line of output.
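
 The line-buffered behavior described in the entries above can be
 sketched as follows. This is a minimal illustration, not grep's prline
 code; put_line and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper.  Write LINE[0..LEN) followed by the terminator
   EOL (a newline, or a null byte for -Z style output).  When
   LINE_BUFFERED, flush at once so that the reader sees the line
   immediately and so that a write error shows up now rather than when
   the stdio buffer is next drained.  Return false on write error.  */
static bool
put_line (FILE *out, char const *line, size_t len, char eol,
          bool line_buffered)
{
  assert (out != NULL);
  fwrite (line, 1, len, out);
  putc (eol, out);
  if (line_buffered)
    fflush (out);
  return !ferror (out);
}
```

 A caller would typically stop and issue its diagnostic as soon as
 put_line returns false.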
3887
3888 grep: -c should keep counting after binary data
3889 Problem and fix reported by Jaroslav Škarvada, and test case
3890 reported by Norihiro Tanaka, in: http://bugs.gnu.org/22028
3891 * NEWS: Document this.
3892 * src/grep.c (grep): Don't stop counting merely because nulls seen.
3893 * tests/pcre-count: New file.
3894 * tests/Makefile.am (TESTS): Add it.
3895
3896 dfa: port to tinycc
3897 * src/dfa.c (add_utf8_anychar): Put 'const' after type.
3898 Problem reported by Aharon Robbins in:
3899 http://bugs.gnu.org/22260
3900
3901 grep: be less picky about encoding errors
3902 This fixes a longstanding problem introduced in grep 2.21,
3903 which is overly picky about binary files.
3904 * NEWS:
3905 * doc/grep.texi (File and Directory Selection): Document this.
3906 * src/grep.c (input_textbin, textbin_is_binary, buffer_textbin)
3907 (file_textbin):
3908 Remove. All uses removed.
3909 (encoding_error_output): New static var.
3910 (buf_has_encoding_errors, buf_has_nulls, file_must_have_nulls):
3911 New functions, which reuse bits
3912 and pieces of the removed functions.
3913 (lastout, print_line_head, print_line_middle, print_line_tail, prline)
3914 (prpending, prtext, grepbuf):
3915 Avoid use of const, now that we have
3916 functions that require modifying a sentinel.
3917 (print_line_head): New arg LEN. All uses changed.
3918 (print_line_head, print_line_tail):
3919 Return indicator whether the output line was printed.
3920 All uses changed.
3921 (print_line_middle): Exit early on encoding error.
3922 (grep): Use new method for determining whether file is binary.
3923 * src/grep.h (enum textbin, TEXTBIN_BINARY, TEXTBIN_UNKNOWN)
3924 (TEXTBIN_TEXT, input_textbin): Remove decls. All uses removed.
3925 * src/pcresearch.c (Pexecute): Remove multiline optimization,
3926 since the main program no longer checks for encoding errors on input.
3927 * tests/encoding-error: New file.
3928 * tests/Makefile.am (TESTS): Add it.
3929
39302015-12-29 Jim Meyering <meyering@fb.com>
3931
3932 maint: correct (make sorted) order of test file names
3933 * tests/Makefile.am (TESTS): Insert new test name in sorted order.
3934
39352015-12-28 Paul Eggert <eggert@cs.ucla.edu>
3936
3937 grep: --exclude matches trailing parts of args
3938 Problem reported by Vincent Lefevre in:
3939 http://bugs.gnu.org/22144
3940 * NEWS:
3941 * doc/grep.texi (File and Directory Selection): Document this.
3942 * src/grep.c (excluded_patterns, excluded_directory_patterns):
3943 Now 2-element arrays, with one element for subfiles and another
3944 for command-line args. All uses changed. This implements the change.
3945 (exclude_options): New function.
3946 * tests/include-exclude: Test the change.
3947
39482015-12-18 Jim Meyering <meyering@fb.com>
3949
3950 grep -oP: don't infloop when processing invalid UTF8 preceding a match
3951 * src/pcresearch.c (Pexecute): When advancing SUBJECT past an
3952 encoding error, don't blindly set P to that new value, since we
3953 will soon compute SEARCH_OFFSET = P - SUBJECT, and mistakenly
3954 making that difference too small would allow us to match some
3955 previously-processed text, resulting in an infinite loop.
3956 * NEWS (Bug fixes): Mention it.
3957 * THANKS.in: Add Christian's name and email address.
3958 * tests/pcre-invalid-utf8-infloop: New file.
3959 * tests/Makefile.am (TESTS): Add it.
3960 Reported by Christian Boltz in http://debbugs.gnu.org/22181
3961 Introduced by commit v2.21-37-g14f8e48.
3962
39632015-11-04 Jim Meyering <meyering@fb.com>
3964
3965 tests: mark performance-related tests as expensive
3966 These performance-related tests are slightly failure prone due to
3967 varying system load during the two runs.
3968 Marking these tests as "expensive" makes it so they are no longer run
3969 via "make check". You can still run them via "make check-expensive".
3970 This makes them less likely to be run by regular users.
3971 * tests/long-pattern-perf: Use expensive_.
3972 * tests/mb-non-UTF8-performance: Likewise.
3973 Reported by Jaroslav Skarvada in http://debbugs.gnu.org/21826
3974 and by Andreas Schwab in http://debbugs.gnu.org/21812.
3975
39762015-11-01 Jim Meyering <meyering@fb.com>
3977
3978 maint: post-release administrivia
3979 * NEWS: Add header line for next release.
3980 * .prev-version: Record previous version.
3981 * cfg.mk (old_NEWS_hash): Auto-update.
3982
3983 version 2.22
3984 * NEWS: Record release date.
3985
3986 tests: pcre-jitstack: upon failure, retry with no stack size limit
3987 * tests/pcre-jitstack: Don't let an example that provokes inordinate
3988 stack space use cause a test failure. Thanks to reports from and
3989 analysis by Bruce Dubbs; see http://debbugs.gnu.org/21755
3990
39912015-10-27 Jim Meyering <meyering@fb.com>
3992
3993 maint: update THANKS.in
3994 * THANKS.in: Add name+email of those who found and reported
3995 the bug that made grep -E '^x|x$' match any "x".
3996
39972015-10-25 Zev Weiss <zev@bewilderbeest.net>
3998
3999 dfa: plug a memory leak in dfamust
4000 * src/dfa.c (dfamust): Ensure MP is freed, by refraining
4001 from returning early when, at "done:" *RESULT is NULL.
4002
40032015-10-25 Jim Meyering <meyering@fb.com>
4004
4005 gnulib: update to latest
4006 * gnulib: Pull in one more portability fix:
4007 stdalign: port to Sun C 5.9
4008
40092015-10-24 Jim Meyering <meyering@fb.com>
4010
4011 gnulib: update to latest, for portability fixes
4012 * gnulib: Pull in changes like these:
4013 fts: port to C11 alignof
4014 stdalign: work around pre-4.9 GCC x86 bug
4015
4016 maint: NEWS: correct/amend
4017 * NEWS: Move the long-regexp-performance-improvement from
4018 "Bug fixes" to "Improvements." Say more and include an example.
4019 The -Fw degradation was introduced in commit v2.18-125-g94555dd.
4020
4021 tests: avoid spurious failure on OpenBSD 5.8
4022 * tests/fedora: Don't rely on "diff - FILE" reading from stdin.
4023 Reported privately by Nelson Beebe.
4024
40252015-10-17 Jim Meyering <meyering@fb.com>
4026
4027 gnulib: update to latest; also bootstrap and tests/init.sh
4028 * bootstrap: Update from gnulib.
4029 * tests/init.sh: Likewise.
4030 * gnulib: Update submodule to latest.
4031
4032 build: avoid spurious bootstrap failure involving pkg.m4
4033 Running ./bootstrap could fail mistakenly at the very end in
4034 its attempt to obtain a copy of pkg.m4. It would search only
4035 $(aclocal --print-ac-dir) and some other directories, but not
4036 those listed in $(aclocal --print-ac-dir)/dirlist.
4037 * bootstrap.conf (bootstrap_post_import_hook): Also search the
4038 directories named in $(aclocal --print-ac-dir)/dirlist when that
4039 file exists with nonzero size.
4040
40412015-10-16 Paul Eggert <eggert@cs.ucla.edu>
4042
4043 maint: add news item
4044 * NEWS: Document grep -Fw speedup.
4045
4046 grep: simplify previous change
4047 * src/grep.c (main): Simplify recently-changed grep -Fw test.
4048
40492015-10-16 Norihiro Tanaka <noritnk@kcn.ne.jp>
4050
4051 grep: use grep matcher for grep -Fw when unibyte
4052 In single byte locales with grep -Fw, prefer the grep matcher to the
4053 kwset matcher, as the former uses KWset and a DFA, whereas the latter
4054 calls kwsexec many times until it matches a word.
4055 * src/grep.c (main): Change pattern for fgrep into grep for grep -Fw in
4056 single byte locales.
4057
40582015-10-16 Paul Eggert <eggert@cs.ucla.edu>
4059
4060 grep: use memchr/memrchr
4061 * src/kwsearch.c (Fexecute): Prefer memchr and memrchr to doing it
4062 by hand.
4063
40642015-10-16 Norihiro Tanaka <noritnk@kcn.ne.jp>
4065
4066 grep: improve performance of grep -Fw
4067 * src/kwsearch.c (Fexecute): grep -Fw used to check whether the
4068 character preceding a potential match is a word character by
4069 scanning from the head of the buffer, which was extremely slow. Now,
4070 when grep finds a potential match, it looks for the previous newline and examines from there.
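
 A simplified sketch of the "start from the previous newline" approach,
 assuming single-byte characters and ASCII word constituents (letters,
 digits, underscore). The names line_start, is_word_byte and
 is_whole_word are hypothetical. The backward scan is written by hand
 here only because memrchr is a GNU extension, not standard C.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Return the start of the line containing POS, searching backward
   from POS but never before BUF.  */
static char const *
line_start (char const *buf, char const *pos, char eol)
{
  assert (buf <= pos);
  while (buf < pos && pos[-1] != eol)
    pos--;
  return pos;
}

static bool
is_word_byte (unsigned char c)
{
  return isalnum (c) || c == '_';
}

/* Return true if the candidate match BUF[OFF..OFF+LEN) is a whole
   word.  Context to the left is examined only within the current
   line, not from the head of the buffer.  */
static bool
is_whole_word (char const *buf, size_t size, size_t off, size_t len)
{
  assert (off <= size && len <= size - off);
  char const *match = buf + off;
  char const *beg = line_start (buf, match, '\n');
  bool left_ok = match == beg || !is_word_byte ((unsigned char) match[-1]);
  bool right_ok = (off + len == size
                   || !is_word_byte ((unsigned char) match[len]));
  return left_ok && right_ok;
}
```

 In a multibyte locale the left-hand check has to walk forward from a
 known character boundary, which is why the nearest line start matters;
 with single bytes, as assumed here, that walk is trivial.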
4071
40722015-10-13 Jim Meyering <meyering@fb.com>
4073
4074 maint: use single quote rather than UTF-8 multi-byte version
4075 * tests/backref-alt: Translate unnecessary non-ASCII in comment.
4076
40772015-10-13 Paul Eggert <eggert@cs.ucla.edu>
4078
4079 dfa: make the executable a bit smaller
4080 * src/dfa.c (dfamust): Hoist MB_CUR_MAX calculation out of loops.
4081
40822015-10-13 Norihiro Tanaka <noritnk@kcn.ne.jp>
4083
4084 dfa: fix bug in alternate of sub-patterns that differ only in constraints
4085 Fix a bug where a line incorrectly matches alternates of sub-patterns
4086 that differ only in the constraints, e.g., the ERE '^a|a$'.
4087 Reported by Greg Boyd in: http://debbugs.gnu.org/21670
4088 * src/dfa.c (dfamust): For a pattern with constraints, check that it is
4089 matched including the constraints, to judge whether it is exact.
4090
4091 dfa: fix off-by-one error
4092 * src/dfa.c (dfamust): Fix off-by-one error in computing 'must' length,
4093 which caused the 'must' to be too short. See:
4094 http://bugs.gnu.org/21670#28
4095
40962015-10-12 Jim Meyering <meyering@fb.com>
4097
4098 doc: NEWS: mention a bug fix
4099 * NEWS (Bug fixes): Describe it.
4100 This bug was introduced by commit v2.18-85-g2c94326
4101 and fixed by commit v2.21-51-g256a4b4.
4102
41032015-10-11 Paul Eggert <eggert@cs.ucla.edu>
4104
4105 tests: add test case for Bug#21670
4106 * tests/options: Add test #4 to catch Bug#21670.
4107 Also, do not overescape # in shell strings.
4108
41092015-09-19 Paul Eggert <eggert@cs.ucla.edu>
4110
4111 Add test for pop_fail_stack bug
4112 Problem reported by Hanno Böck in: http://bugs.gnu.org/21513
4113 If you use --with-included-regex the bug fix is in gnulib, here:
4114 http://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=5513b40999149090987a0341c018d05d3eea1272
4115 If you use glibc, the bug fix has not been installed yet.
4116 * tests/Makefile.am (XFAIL_TESTS): Add backref-alt if system matcher.
4117 (TESTS): Add backref-alt.
4118 * tests/backref-alt: New file.
4119 * tests/triple-backref: Remove unused var.
4120 Don't skip if tested with glibc, as Makefile.am now handles this.
4121
4122 build: update gnulib submodule to latest
4123
41242015-08-19 Norihiro Tanaka <noritnk@kcn.ne.jp>
4125
4126 grep: avoid use of uninitialized variable
4127 EGexecute would use "backref" uninitialized.
4128 While that could have no bearing on correctness, it could
4129 impact performance, via an unnecessary use of regexp.
4130 * src/dfasearch.c (EGexecute): Initialize backref.
4131 Reported as http://debbugs.gnu.org/21273
4132 Introduced by commit v2.21-55-gea0ebaa.
4133
41342015-08-12 Norihiro Tanaka <noritnk@kcn.ne.jp>
4135
4136 grep: remove fgrep code for case insensitive match
4137 The fgrep matcher is no longer called in case insensitive matching,
4138 so remove the code to support it.
4139 * src/kwsearch.c (mb_case_map_apply): Remove function.
4140 (Fexecute): Remove now-unused code.
4141
41422015-08-12 Paul Eggert <eggert@cs.ucla.edu>
4143
4144 dfa: optimize [x-x]
4145 * src/dfa.c (parse_bracket_exp): Treat [x-x] as if it were [x].
4146 This also pacifies GCC, which otherwise complains about wc2
4147 being set but not used.
4148
41492015-08-12 Norihiro Tanaka <noritnk@kcn.ne.jp>
4150
4151 dfa: remove unused multibyte support
4152 Ranges, collating elements, and equivalence classes in non-POSIX
4153 locales are now handled by regex, so remove the code that supported these features.
4154 * dfa.c (struct mb_char_classes): Remove members ch_classes,
4155 nch_classes, ranges, nranges, equivs, nequivs, coll_elems, ncoll_elems.
4156 All uses removed.
4157 (match_mb_charset): Remove function.
4158
41592015-08-01 Jim Meyering <meyering@fb.com>
4160
4161 tests: mb-non-UTF8-performance: use new function
4162 * tests/mb-non-UTF8-performance: Rewrite to use
4163 the user-time measuring function in init.cfg.
4164
4165 tests: long-pattern-perf: measure user time, not elapsed
4166 Measuring user time makes this test less prone to false
4167 positive failure, and also lets us use a tighter bound.
4168 * tests/long-pattern-perf: Measure elapsed user time rather than
4169 wall-clock time, to permit a tighter bound on the ratio of
4170 N-to-10N timings. Suggested by Giuseppe Ottaviano.
4171 Also, use regexps built from mostly 5-digit numbers, so that the 10:1
4172 ratio applies to lines of "seq" output as well as to total bytes.
4173
4174 tests: new function to measure elapsed user time
4175 * tests/init.cfg (user_time_): New function.
4176
41772015-07-25 Norihiro Tanaka <noritnk@kcn.ne.jp>
4178
4179 dfa: remove word delimiter support for multibyte locales
4180 DFA supports word delimiter expressions, but it does not behave
4181 correctly for multibyte locales. Even if it were to be fixed,
4182 the DFA matcher's performance would be no better than that of regex.
4183 Thus, this change removes DFA support for word delimiter expressions
4184 in multibyte locales.
4185
4186 * src/dfa.c (dfa_supported): Return false also when a pattern uses any
4187 word delimiter expression in a multibyte locale.
4188
41892015-07-25 Norihiro Tanaka <noritnk@kcn.ne.jp>
4190
4191 dfa: avoid execution for a pattern including an unsupported expression
4192 If a pattern includes a construct unsupported by the DFA matcher,
4193 the DFA search would fail in most cases. Make dfaexec immediately
4194 return for any such pattern.
4195
4196 * src/dfa.c (struct dfa_state) [has_backref, has_mbcset]: Remove members
4197 and all uses.
4198 (dfaexec_main): Remove 'backref' parameter. Update callers.
4199 (dfaexec_noop): New function.
4200 (dfa_supported): New function.
4201 (dfassbuild): Remove now-unused code.
4202 (dfacomp): When a pattern uses a DFA-unsupported construct, do not
4203 waste time performing any further analysis.
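
 The "return immediately for an unsupported pattern" arrangement can be
 pictured as choosing the executor once, at compile time. The sketch
 below is a toy model under that assumption: struct toy_dfa, the
 toy_* functions and the one-byte "pattern" are all hypothetical, and
 the no-op executor's convention of setting *backref so that the caller
 falls back to another matcher is an assumption about the design, not a
 copy of dfa.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct toy_dfa;
typedef char const *(*toy_exec_fn) (struct toy_dfa const *,
                                    char const *begin, char const *end,
                                    bool *backref);

struct toy_dfa
{
  char literal;      /* Toy "pattern": a single byte.  */
  toy_exec_fn exec;  /* Executor selected when the pattern is compiled.  */
};

/* Real search: find the first occurrence of the literal.  */
static char const *
toy_exec_main (struct toy_dfa const *d, char const *begin,
               char const *end, bool *backref)
{
  *backref = false;
  return memchr (begin, d->literal, (size_t) (end - begin));
}

/* No-op search: do no work, and tell the caller that a slower,
   more general matcher must examine the text starting at BEGIN.  */
static char const *
toy_exec_noop (struct toy_dfa const *d, char const *begin,
               char const *end, bool *backref)
{
  (void) d;
  (void) end;
  *backref = true;
  return begin;
}

/* Pick the executor once, so unsupported patterns never pay for
   state construction or scanning.  */
static void
toy_compile (struct toy_dfa *d, char literal, bool supported)
{
  d->literal = literal;
  d->exec = supported ? toy_exec_main : toy_exec_noop;
}
```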
4204
42052015-07-19 Norihiro Tanaka <noritnk@kcn.ne.jp>
4206
4207 dfa: DEBUG: print detail of DFA states
4208 When compiled with -DDEBUG, grep outputs tokens etc.
4209 With this change, also print DFA states and transitions.
4210 This change is very useful when debugging those.
4211
4212 * src/dfa.c (prtok) [DEBUG]: Change `%c' to `%02x' in printf format.
4213 (state_index) [DEBUG]: Print detail of new state.
4214 (dfastate) [DEBUG]: Print detail of DFA states.
4215 Reported as http://debbugs.gnu.org/18707
4216
42172015-07-18 Norihiro Tanaka <noritnk@kcn.ne.jp>
4218
4219 tests: sjis-mb: accept two more locales
4220 * tests/sjis-mb: Accept the ja_JP.SJIS and ja_JP.PCK locales
4221 as well as ja_JP.SHIFT_JIS, so this test is less likely to
4222 be skipped unnecessarily. Reported as http://bugs.gnu.org/18983
4223
42242015-07-18 Jim Meyering <meyering@fb.com>
4225
4226 tests: add a test for the performance fix
4227 * tests/long-pattern-perf: New file.
4228 * tests/Makefile.am (TESTS): Add it.
4229
42302015-07-18 Norihiro Tanaka <noritnk@kcn.ne.jp>
4231
4232 dfa: speed up handling of long pattern
4233 DFA tries to find a long sequence of characters that must appear
4234 in any matching line. However, when a pattern is long (length N),
4235 it is very slow, because it makes O(N^2) strstr calls.
4236 This change reduces that to O(N) by processing each sequence of
4237 adjacent "regular" characters as a group.
4238
4239 Compare the run times of this command before and after this change:
4240 (on an i7-4770S CPU @ 3.10GHz using rawhide (~fedora 22) and compiled
4241 with gcc 6.0.0 20150627)
4242 : | env time -f %e grep -f <(seq -s '' 9999)
4243 Before: 0.85
4244 After: 0.02
4245
4246 * src/dfa.c (dfamust): Process each string of concatenated normal
4247 characters as a unit.
4248 * NEWS (Improvement): Mention it.
4249 Prompted by a bug report and patch by Ivan Yanikov
4250 in http://bugs.gnu.org/15191#5
4251
42522015-07-17 Jim Meyering <meyering@fb.com>
4253
4254 tests: fix mis-applied patch.
4255 * tests/include-exclude: I applied "|sort" to the wrong creation
4256 of "out", and didn't push the same patch that I'd tested.
4257
4258 tests: avoid FS-dependent false-positive failure
4259 * tests/include-exclude: Sort file name list, so that this test
4260 is not sensitive to the order in which those names are returned
4261 via readdir. I noticed the failure on a Fedora 21 system using ext4.
4262 Also fix a typo: s/framework_failure+/framework_failure_/
4263
42642015-07-13 Paul Eggert <eggert@cs.ucla.edu>
4265
4266 grep: fix bug with --exclude-dir and command line
4267 Reported by Aron Griffis in: http://bugs.gnu.org/21027
4268 * NEWS: Document this.
4269 * src/grep.c (grepdirent): Don't check whether the file is skipped
4270 when on the command line, as that's the caller's responsibility.
4271 (main): Anchor the exclude patterns.
4272 * tests/include-exclude: Adjust test case to match fixed behavior.
4273 Add some more test cases.
4274
4275 tests: fix $? typo in null-byte
4276 * tests/null-byte: Don't assume $? survives an invocation of 'test'.
4277
42782015-07-05 Jim Meyering <meyering@fb.com>
4279
4280 maint: dfa: use unsigned types where appropriate
4281 * src/dfa.c (case_folded_counterparts): Return unsigned int, not int.
4282 Change type of two locals to unsigned int, to reflect that their
4283 values are never negative.
4284 (parse_bracket_exp): Adjust type of result at each use, as well
4285 as that of related index variables.
4286
42872015-07-04 Norihiro Tanaka <noritnk@kcn.ne.jp>
4288
4289 dfa: build struct dfamust on demand
4290 If we won't use KWset, do not build a "struct dfamust".
4291 Now it is built only when needed.
4292 * src/dfa.c (struct dfa) [musts]: Remove member.
4293 (dfacomp): Don't build dfamust here.
4294 (dfamustfree): New function to free a struct dfamust.
4295 (dfamust): Make it a global function, and make it return a pointer
4296 to a malloc'd struct dfamust.
4297 (dfamusts): Remove it.
4298 * src/dfa.h (struct dfamust) [next]: Remove member.
4299 In the implementation preceding this patch, there was
4300 never more than one of these in a given "struct dfa".
4301 (dfamustfree, dfamust): Add prototypes.
4302 (dfamusts): Remove prototype.
4303 (dfaalloc): Declare with _GL_ATTRIBUTE_MALLOC.
4304 To make that symbol usable there, move the inclusion
4305 of "xalloc.h" from dfa.c to this file, dfa.h.
4306 * src/dfasearch.c (kwsmusts): Adapt to use the new interface.
4307 Update the comments to reflect reality.
4308 This addresses http://bugs.gnu.org/17715
4309
43102015-07-04 Paul Eggert <eggert@cs.ucla.edu>
4311
4312 grep: use recent gnulib syntax bits
4313 * src/grep.c (Gcompile, Ecompile): Use plain RE_SYNTAX_GREP
4314 and RE_SYNTAX_EGREP, now that we assume a recent-enough gnulib.
4315
4316 maint: ignore gendocs_template_min
4317 * doc/.gitignore: Add '/gendocs_template_min'.
4318
4319 build: update gnulib submodule to latest
4320
4321 dfa: '.' and '[^x]' now consistently match newline
4322 * src/dfa.c (parse_bracket_exp, lex, add_utf8_anychar)
4323 (match_anychar): RE_DOT_NEWLINE and RE_HAT_LISTS_NOT_NEWLINE
4324 are about LF, not about eolbyte. This patch does not affect
4325 'grep', but may affect other users of dfa.c.
4326
4327 grep: -z '[^x]' now consistently matches newline
4328 Problem reported by Norihiro Tanaka in: http://bugs.gnu.org/20974#19
4329 * NEWS: Document this.
4330 * src/grep.c (Gcompile, Ecompile): Clear RE_HAT_LISTS_NOT_NEWLINE.
4331 * tests/utf8-bracket: Test this.
4332
43332015-07-03 Paul Eggert <eggert@cs.ucla.edu>
4334
4335 grep: -z '.' now consistently matches newline
4336 Problem reported by Balazs Kezes in: http://bugs.gnu.org/20974
4337 * NEWS: Document this.
4338 * tests/utf8-bracket: New file, to test for this bug.
4339 * src/grep.c (Gcompile, Ecompile): Also specify RE_DOT_NEWLINE.
4340 * tests/Makefile.am (TESTS): Add it.
4341
4342 grep: simplify print_line_middle slightly
4343 * src/grep.c (print_line_middle): Simplify.
4344
4345 grep: don't mishandle left context in -P
4346 http://bugs.gnu.org/20957
4347 * src/pcresearch.c (jit_exec): New arg SEARCH_OFFSET.
4348 Caller changed.
4349 (Pexecute): Pass the left context to pcre_exec, so that PCRE
4350 regular-expression matching can see it.
4351 * tests/pcre-context: New file, to test for this bug.
4352 * tests/Makefile.am (TESTS): Add it.
4353
43542015-06-28 Jim Meyering <meyering@fb.com>
4355
4356 tests/case-fold-backref: factor test
4357
43582015-06-26 Paul Eggert <eggert@cs.ucla.edu>
4359
4360 grep: don't hang on command-line fifo if -D skip
4361 * NEWS: Document this.
4362 * src/grep.c (skip_devices):
4363 New function, with code taken from grepdirent.
4364 (grepdirent): Use it. Avoid an unnecessary initialization.
4365 (grepfile): If skipping devices, open files with O_NONBLOCK.
4366 Throw in O_NOCTTY while we're at it.
4367 (grepdesc): Skip devices here, too. Not only does this fix the
4368 bug, it fixes an unlikely race condition if some other process
4369 renames a device between fstatat and openat.
4370 * tests/skip-device: Add a test for this bug.
4371
4372 grep: minor tweaks
4373 * src/grep.c (main): Change recently-added static vars to be
4374 constants, which makes them sharable. Prefer 'return' to 'exit'
4375 when returning/exiting from 'main'. Move decl closer to first use
4376 and rename local from 'ok' (which was confusing) to 'status'.
4377 Prefer named constant STDOUT_FILENO to unnamed constant 1.
4378
43792015-06-26 Jim Meyering <meyering@fb.com>
4380
4381 maint: unify three argv-processing calls
4382 * src/grep.c (main): Unify three calls to grep_commandline_arg.
4383
4384 maint: alphabetize anonymous enum member names
4385
43862015-05-30 Paul Eggert <eggert@cs.ucla.edu>
4387
4388 test: tighten tests for bracket exprs
4389 * tests/posix-bracket: Test '[a-a[.-.]--]'.
4390 Also, test that failures are with status 1
4391 (nonmatching data), not status 2 (invalid expressions).
4392
43932015-04-26 Jim Meyering <meyering@fb.com>
4394
4395 maint: update bootstrap from gnulib
4396 * bootstrap: Update from gnulib.
4397
4398 maint: reword a diagnostic not to trigger leading capital check
4399 * src/pcresearch.c: Reword diagnostic to avoid "make syntax-check"
4400 failure.
4401
4402 maint: sort test names in tests/Makefile.am and add syntax-check rule
4403 * cfg.mk (sc_sorted_tests): New rule.
4404 * tests/Makefile.am (TESTS): Alphabetize.
4405
44062015-04-25 Norihiro Tanaka <noritnk@kcn.ne.jp>
4407
4408 dfa: make find_pred return NULL for an invalid predicate
4409 This could never happen when invoked via grep, but could have triggered
4410 a bug if dfa.c's find_pred function were invoked by some other program.
4411 * src/dfa.c (find_pred): Return NULL for an invalid predicate.
4412 * tests/invalid-char-class: New file to test for this.
4413 * tests/Makefile.am (TESTS): Add that new file name to the list.
4414 This addresses http://debbugs.gnu.org/18631
4415
44162015-04-06 Paul Eggert <eggert@cs.ucla.edu>
4417
4418 build: improve pkg-config doc and error handling
4419 Error-handling improvement suggested by Mike Frysinger in:
4420 http://bugs.gnu.org/16757#29
4421 * NEWS: Document pkg-config changes.
4422 * README-prereq: pkg-config is now a prereq when building from
4423 repository.
4424 * m4/pcre.m4 (gl_FUNC_PCRE): Report an error if pcre is explicitly
4425 requested but not available. Defer to user-supplied PCRE_CFLAGS
4426 and PCRE_LIBS.
4427
4428 build: remove typo and don't bother with /usr/include/pcre
4429 Problem reported by Holger Bruenjes.
4430 * m4/pcre.m4: Remove test for /usr/include/libpng (a typo).
4431 Come to think of it, don't bother worrying about
4432 /usr/include/pcre, as hosts with that problem can use pkg-config
4433 or configure with CFLAGS by hand.
4434
4435 build: use pkg-config (if available) to configure libpcre
4436 Problem reported by Mike Frysinger in: http://bugs.gnu.org/16757
4437 * bootstrap.conf (bootstrap_post_import_hook):
4438 Copy pkg-config's pkg.m4.
4439 * configure.ac: Invoke PKG_PROG_PKG_CONFIG.
4440 * m4/pcre.m4 (gl_FUNC_PCRE): Rewrite to use pkg-config if
4441 available, and to test that pcre_compile can be linked to.
4442 * src/Makefile.am (AM_CFLAGS): Add PCRE_CFLAGS.
4443 (grep_LDADD): Add PCRE_LIBS.
4444 * src/pcresearch.c: Simply include <pcre.h> if HAVE_LIBPCRE,
4445 since 'configure' arranges for the appropriate -I option now.
4446
44472015-03-11 Paul Eggert <eggert@cs.ucla.edu>
4448
4449 grep: output "." file name in diagnostic
4450 This is bug C as reported by David Grayson in:
4451 http://bugs.gnu.org/16444#18
4452 This bug occurs only in obscure circumstances, and I didn't see
4453 how to write a reasonable test case for it.
4454 * src/grep.c (filename_prefix_len): Remove, replacing with ...
4455 (omit_dot_slash): New static var. All uses of the former replaced
4456 with uses of the latter.
4457 (grepdirent): Don't add 2 if the filename is just ".".
4458
4459 egrep, fgrep: just use what's in PATH
4460 * src/egrep.sh: Don't monkey with PATH; just use whatever 'grep'
4461 is in the path. This is simpler, and lets the user specify
4462 default options with a script for only grep, with no need for
4463 egrep and fgrep scripts.
4464 Fixes: bug#19998
4465
4466 doc: give a script wrapper example
4467 * doc/grep.texi (Environment Variables): Give an example of a
4468 wrapper script, as an alternative to using GREP_OPTIONS.
4469 Fixes: bug#19998
4470
4471 doc: clarify how -a matches
4472 * doc/grep.in.1, doc/grep.texi (File and Directory Selection):
4473 Give an example of how non-text bytes affect pattern matching in
4474 binary files.
4475 Fixes: bug#20080
4476
44772015-02-23 Paul Eggert <eggert@cs.ucla.edu>
4478
4479 Cover the non-INSTALL case
4480 * README: Mention what to do if there is no INSTALL file.
4481 Fixes: bug#19928
4482
44832015-02-11 Jim Meyering <meyering@fb.com>
4484
4485 maint: use ASAN-poisoning more carefully
4486 The ASAN-poisoning instituted by commit v2.21-14-g1555185 was
4487 incomplete, since the poisoned tail of the read buffer could well
4488 be the target of a legitimate follow-on read. To accommodate that,
4489 we must unpoison each such region just before beginning fillbuf's
4490 read loop.
4491 * src/grep.c [HAVE_ASAN] (asan_poison): Define.
4492 (clear_asan_poison): Define.
4493 (fillbuf): Clear before reading, since we are likely to read
4494 into memory that was poisoned on the preceding iteration.
4495 * tests/two-files: New file, to test for this.
4496 * tests/Makefile.am (TESTS): Add it.
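
 The unpoison-then-poison discipline can be sketched as follows. The
 two __asan_* functions are the ASAN runtime's interface; everything
 else (poison_region, unpoison_region, fill and its parameters) is a
 hypothetical stand-in for grep's buffer code, with memcpy standing in
 for the read loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifndef __has_feature
# define __has_feature(x) 0
#endif

#if defined __SANITIZE_ADDRESS__ || __has_feature (address_sanitizer)
/* Provided by the ASAN runtime.  */
void __asan_poison_memory_region (void const volatile *addr, size_t size);
void __asan_unpoison_memory_region (void const volatile *addr, size_t size);
# define poison_region(p, n) __asan_poison_memory_region (p, n)
# define unpoison_region(p, n) __asan_unpoison_memory_region (p, n)
#else
/* Without ASAN, the operations are no-ops.  */
# define poison_region(p, n) ((void) (p), (void) (n))
# define unpoison_region(p, n) ((void) (p), (void) (n))
#endif

/* Copy up to CAP bytes from SRC[0..AVAIL) into BUF and return the
   number of bytes stored.  */
static size_t
fill (char *buf, size_t cap, char const *src, size_t avail)
{
  /* The tail poisoned by the preceding call is a legitimate target
     now, so clear the poison before writing.  */
  unpoison_region (buf, cap);
  size_t n = avail < cap ? avail : cap;
  memcpy (buf, src, n);
  /* Mark the unused tail so that, under ASAN, any access beyond the
     data just stored aborts the program.  */
  poison_region (buf + n, cap - n);
  return n;
}
```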
4497
44982015-02-10 Paul Eggert <eggert@cs.ucla.edu>
4499
4500 Grow the JIT stack if it becomes exhausted
4501 Problem reported by Oliver Freyermuth in: http://bugs.gnu.org/19833
4502 * NEWS: Document the fix.
4503 * tests/Makefile.am (TESTS): Add pcre-jitstack.
4504 * tests/pcre-jitstack: New file.
4505 * src/pcresearch.c (NSUB): Move decl earlier, since it's needed
4506 earlier now.
4507 (jit_stack_size) [PCRE_STUDY_JIT_COMPILE]: New static var.
4508 (jit_exec): New function.
4509 (Pcompile): Initialize jit_stack_size.
4510 (Pexecute): Use new jit_exec function. Report a useful diagnostic
4511 if the error is PCRE_ERROR_JIT_STACKLIMIT.
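
 The grow-and-retry pattern behind this change can be sketched without
 the PCRE API. Everything below is hypothetical: the toy_* names, the
 status codes, and the doubling policy are illustrative assumptions,
 not the calls or constants that pcresearch.c uses.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_OK = 0, TOY_STACK_LIMIT = -1 };

/* A matcher that is run with a given stack size and returns
   TOY_STACK_LIMIT if that stack was too small.  */
typedef int (*toy_match_fn) (void *ctx, size_t stack_size);

/* Run MATCH, doubling *STACK_SIZE and retrying while the matcher
   reports stack exhaustion, up to MAX_SIZE.  Return the final status;
   if it is still TOY_STACK_LIMIT the caller should issue a useful
   diagnostic.  */
static int
exec_growing_stack (toy_match_fn match, void *ctx, size_t *stack_size,
                    size_t max_size)
{
  assert (*stack_size > 0);
  for (;;)
    {
      int rc = match (ctx, *stack_size);
      if (rc != TOY_STACK_LIMIT || *stack_size > max_size / 2)
        return rc;
      *stack_size *= 2;
    }
}

/* Toy matcher for demonstration: CTX points to the number of bytes
   of stack that the "pattern" needs.  */
static int
toy_match (void *ctx, size_t stack_size)
{
  size_t needed = *(size_t *) ctx;
  return stack_size < needed ? TOY_STACK_LIMIT : TOY_OK;
}
```

 Keeping the grown size in *STACK_SIZE means later searches start with
 the larger stack instead of rediscovering the limit each time.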
4512
45132015-02-01 Jim Meyering <meyering@fb.com>
4514
4515 maint: reference CVE-2015-1345 from NEWS
4516 * NEWS: Mention the CVE that was addressed by v2.21-13-g83a95bd,
4517 "grep -F: fix a heap buffer (read) overrun".
4518
45192015-01-18 Jim Meyering <meyering@fb.com>
4520
4521 maint: convert "goto" to "continue" and remove now-spurious label
4522 * src/kwset.c (bmexec_trans): Using "goto big_advance" here is
4523 equivalent to using "continue". Make that change and remove
4524 the now-unused label.
4525
45262015-01-10 Jim Meyering <meyering@fb.com>
4527
4528 tests: add support for ASAN memory poisoning
4529 This lets us reliably detect with ASAN some UMR bugs
4530 that would otherwise be detectable only some of the time
4531 with MSAN. Use __asan_poison_memory_region to mark the unused
4532 portion of a read buffer as inaccessible. Then, with ASAN,
4533 any attempt to access those bytes results in an ASAN abort.
4534 * src/system.h: Include "ignore-value.h".
4535 (__has_feature): Define.
4536 (HAVE_ASAN): Define when address sanitizer is enabled.
4537 [HAVE_ASAN]: Declare these two __asan_* symbols.
4538 [!HAVE_ASAN] (__asan_poison_memory_region): Define stub.
4539 [!HAVE_ASAN] (__asan_unpoison_memory_region): Likewise.
4540 * src/grep.c: Use __asan_poison_memory_region.
4541
45422015-01-09 Yuliy Pisetsky <ypisetsky@fb.com>
4543
4544 grep -F: fix a heap buffer (read) overrun
4545 grep's read buffer is often filled to its full size, except when
4546 reading the final buffer of a file. In that case, the number of
4547 bytes read may be far less than the size of the buffer. However, for
4548 certain unusual pattern/text combinations, grep -F would mistakenly
4549 examine bytes in that uninitialized region of memory when searching
4550 for a match. With carefully chosen inputs, one can cause grep -F to
4551 read beyond the end of that buffer altogether. This problem arose via
4552 commit v2.18-90-g73893ff with the introduction of a more efficient
4553 heuristic using what is now the memchr_kwset function. The use of
4554 that function in bmexec_trans could leave TP much larger than EP,
4555 and the subsequent call to bm_delta2_search would mistakenly access
4556 beyond the end of the main input read buffer.
4557
4558 * src/kwset.c (bmexec_trans): When TP reaches or exceeds EP,
4559 do not call bm_delta2_search.
4560 * tests/kwset-abuse: New file.
4561 * tests/Makefile.am (TESTS): Add it.
4562 * THANKS.in: Update.
4563 * NEWS (Bug fixes): Mention it.
4564
4565 Prior to this patch, this command would trigger a UMR:
4566
4567 printf %0360db 0 | valgrind src/grep -F $(printf %019dXb 0)
4568
4569 Use of uninitialised value of size 8
4570 at 0x4142BE: bmexec_trans (kwset.c:657)
4571 by 0x4143CA: bmexec (kwset.c:678)
4572 by 0x414973: kwsexec (kwset.c:848)
4573 by 0x414DC4: Fexecute (kwsearch.c:128)
4574 by 0x404E2E: grepbuf (grep.c:1238)
4575 by 0x4054BF: grep (grep.c:1417)
4576 by 0x405CEB: grepdesc (grep.c:1645)
4577 by 0x405EC1: grep_command_line_arg (grep.c:1692)
4578 by 0x4077D4: main (grep.c:2570)
4579
4580 See the accompanying test for how to trigger the heap buffer overrun.
4581
4582 Thanks to Nima Aghdaii for testing and finding numerous
4583 ways to break early iterations of this patch.
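
 The kind of guard this fix adds can be illustrated with a much simpler
 search than kwset's. The sketch below is not bmexec_trans; it is a
 hypothetical find_literal that uses memchr to skip ahead to the
 pattern's last byte, and it shows the invariant that matters: once the
 scan pointer TP reaches the end pointer EP, no further bytes are
 examined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the offset of the first occurrence of PAT[0..LEN) in
   TEXT[0..SIZE), or -1 if there is none.  */
static ptrdiff_t
find_literal (char const *text, size_t size, char const *pat, size_t len)
{
  assert (len > 0);
  if (size < len)
    return -1;
  char const *ep = text + size;     /* One past the last valid byte.  */
  char const *tp = text + len - 1;  /* Where a match could first end.  */
  char last = pat[len - 1];
  /* The loop condition is the guard: never search at or past EP.  */
  while (tp < ep)
    {
      char const *hit = memchr (tp, last, (size_t) (ep - tp));
      if (!hit)
        return -1;
      tp = hit;
      /* TP never precedes TEXT + LEN - 1, so the comparison below
         stays inside the buffer.  */
      if (memcmp (tp - (len - 1), pat, len) == 0)
        return tp - (len - 1) - text;
      tp++;
    }
  return -1;
}
```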
4584
45852015-01-08 Jim Meyering <meyering@fb.com>
4586
4587 grep: avoid false-positive UMR
4588 For some inputs, valgrind would report an uninitialized
4589 memory read error, but it was harmless.
4590 * src/grep.c (fillbuf): Initialize those trailing bytes.
4591
45922015-01-01 Jim Meyering <meyering@fb.com>
4593
4594 gnulib: update to latest
4595
4596 maint: update copyright year ranges to include 2015
4597 Run "make update-copyright". Also, ...
4598 * grep.texi: Update manually, converting each "--" to "-".
4599
46002014-12-15 Paul Eggert <eggert@cs.ucla.edu>
4601
4602 doc: document binary-data heuristic better
4603 Problem reported by Martin Hoch in: http://bugs.gnu.org/19388
4604 * doc/grep.texi (File and Directory Selection):
4605 Document what non-text bytes are.
4606 (Usage): Fix cross reference.
4607
46082014-12-12 Jim Meyering <meyering@fb.com>
4609
4610 maint: fix a new "make syntax-check" failure
4611 * tests/dfa-match-aux.c: s/can not/cannot/
4612
46132014-12-12 Norihiro Tanaka <noritnk@kcn.ne.jp>
4614
4615 build: avoid build failure with --enable-gcc-warnings and no PCRE
4616 * src/pcresearch.c [HAVE_LIBPCRE] (empty_match): Guard the declaration
4617 of this PCRE-only variable.
4618
46192014-12-07 Paul Eggert <eggert@cs.ucla.edu>
4620
4621 tests: port fmbtest to CentOS 6 and earlier
4622 * tests/fmbtest: Port to platforms where the 'sed' pattern
4623 '[^0-9]' does not match every non-digit character. Problem
4624 reported by Norihiro Tanaka in: http://bugs.gnu.org/19293
4625
46262014-12-06 Norihiro Tanaka <noritnk@kcn.ne.jp>
4627
4628 dfa: simplify dfaexec
4629 * src/dfa.c (dfaexec): Simplify by rearrangement of IF conditions.
4630 This commit induces no semantic change, and reverts part of commit
4631 v2.5.4-144-gbafa134.
4632
46332014-12-06 Norihiro Tanaka <noritnk@kcn.ne.jp>
4634
4635 dfa: avoid invalid match or infinite loop in unused matching mode
4636 Neither grep nor gawk uses this DFA code in its matching mode,
4637 since each always calls dfacomp with a nonzero final argument.
4638 However, when used in that mode, it had a bug:
4639 After failing to match in matching mode, it should return NULL,
4640 but instead would either report a false match or enter an
4641 infinite loop.
4642
4643 * src/dfa.c (dfaexec_main): After failing to match in matching mode
4644 return NULL, rather than transitioning to the next state.
4645 * tests/dfa-match: Add a new test.
4646 * tests/dfa-match-aux.c: Add a new program to exercise this
4647 otherwise-unused part of dfa.c.
4648 * tests/Makefile.am: Add a rule to build new test.
4649 (check_PROGRAMS): Add dfa-match-aux.
4650 (AM_CPPFLAGS): Add -I$(top_srcdir)/src.
4651 (TESTS): Add dfa-match.
4652 * cfg.mk (exclude_file_name_regexp--sc_bindtextdomain):
4653 (exclude_file_name_regexp--sc_prohibit_atoi_atof):
4654 Exempt the new test file from some syntax-check rules.
4655
46562014-12-04 Santiago Ruano Rincón <santiago@debian.org>
4657
4658 doc: document grep-2.11 change in behavior of -r, --recursive
4659 * doc/grep.texi (--recursive, -r): Mention the new behavior
4660 of recursively searching "." when there is no FILE argument.
4661 * doc/grep.in.1: Likewise.
4662 That change first appeared in grep-2.11, released on 2012-03-02.
4663
46642014-11-24 Jim Meyering <meyering@fb.com>
4665
4666 maint: correct for four Author: name misspellings
4667 * .mailmap: Correct for misspelling in Norihiro Tanaka's last name
4668 as listed in four commit Author: fields: s/Norihirio/Norihiro/
4669
46702014-11-23 Jim Meyering <meyering@fb.com>
4671
4672 maint: post-release administrivia
4673 * NEWS: Add header line for next release.
4674 * .prev-version: Record previous version.
4675 * cfg.mk (old_NEWS_hash): Auto-update.
4676
4677 version 2.21
4678 * NEWS: Record release date.
4679
46802014-11-21 Jim Meyering <meyering@fb.com>
4681
4682 tests: sjis-mb: remove now-obsolete and failing sub-tests
4683 * tests/sjis-mb: Commit v2.18-123-geb3292b changed how grep
4684 handles patterns with encoding errors. These SJIS tests are
4685 skipped so often that we didn't notice until now that there were
4686 two tests of that changed behavior, and that on any system with
4687 the ja_JP.SHIFT_JIS locale, they would always fail. Remove those
4688 two tests, since this functionality is well tested separately,
4689 via tests/prefix-of-multibyte.
4690
46912014-11-20 Norihiro Tanaka <noritnk@kcn.ne.jp>
4692
4693 grep -F could erroneously fail to match in non-UTF8 multibyte locales
4694 This fixes a bug that can strike only when using a non-UTF8 multibyte
4695 locale like ja_JP.SHIFT_JIS.
4696
4697 Consider this example: it would mistakenly fail to match before
4698 this patch:
4699
4700 printf '\203AA\n'|LC_ALL=ja_JP.SHIFT_JIS src/grep -F A
4701
4702 When searching for a single byte that happens to be the latter
4703 byte of a multibyte character, and the target byte also follows
4704 that multibyte character, grep -F would advance an internal pointer
4705 by one byte too many, thus missing the target byte. A test case
4706 for this bug is already included in tests/sjis-mb.
4707
4708 * src/kwsearch.c (Fexecute): Skip one byte less after matching in the
4709 middle of a multibyte character. Introduced by commit v2.18-119-gfb7d538.
4710
47112014-11-17 Jim Meyering <meyering@fb.com>
4712
4713 tests: big-match: disable OOM-provoking subtest
4714 * tests/big-match: Our application of this regexp '^.*x\(\)\1'
4715 to a file containing a single matching line of length 2GiB+2
4716 would cause inordinate memory consumption (over 100GB) via
4717 regexec.c, but no leak. That would cause disruption on most
4718 systems, so remove this subtest. Reported by Assaf Gordon.
4719
47202014-11-16 Norihiro Tanaka <noritnk@kcn.ne.jp>
4721
4722 dfa: avoid undefined behavior
4723 * src/dfa.c (dfassbuild): Don't call memcpy with a second
4724 argument of NULL, even when the size (3rd argument) is 0.
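
The rule behind this fix can be sketched as a small guard. Through C17, the pointer arguments of memcpy must be valid even when the size is 0, so a null source is undefined behavior regardless of length. The helper name copy_bytes below is hypothetical and is not taken from dfa.c; it only illustrates the pattern of skipping the call when there is nothing to copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from dfa.c): copy N bytes from SRC to DST,
   but never hand memcpy a null pointer, even when N is 0.  */
void *
copy_bytes (void *dst, void const *src, size_t n)
{
  /* A null pointer is acceptable only when nothing is copied.  */
  assert (n == 0 || (dst != NULL && src != NULL));

  if (n != 0)
    memcpy (dst, src, n);
  return dst;
}
```

With such a guard, a call like copy_bytes (buf, NULL, 0) leaves buf untouched and never reaches memcpy.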
4725
47262014-11-14 Jim Meyering <meyering@fb.com>
4727
4728 gnulib: update to latest
4729
47302014-11-14 Norihiro Tanaka <noritnk@kcn.ne.jp>
4731
4732 grep -F -x -o PAT would print an extra newline for each match
4733 * src/kwsearch.c (Fexecute): Correctly compute the length of a match
4734 by subtracting 2 (not 1) when match_lines is set. With -x, we augment
4735 the "line" by both prepending and appending an EOLBYTE to the search
4736 pattern. Here, we must correct for that. However, to compensate,
4737 when we are using -x (--line-regexp) and start_ptr is NULL, we have
4738 to add 1 to the length so that we still print the trailing EOLBYTE.
4739 Introduced by commit v2.18-85-g2c94326.
4740 * tests/match-lines: Add a new test.
4741 * tests/Makefile.am (TESTS): Add it.
4742 * NEWS (Bug fixes): Mention it.
4743
47442014-11-11 Paul Eggert <eggert@cs.ucla.edu>
4745
4746 tests: port to Darwin
4747 The 'sed' command 's/.//' does not delete all bytes in the C locale.
4748 Problem reported by Nelson H. F. Beebe.
4749 * tests/fmbtest: Don't assume that sed treats bytes with the
4750 top bit set as valid characters in the C locale, as this is not
4751 true for Darwin. Use the cs_CZ.UTF-8 locale instead, and
4752 simplify the sed script.
4753
4754 tests: fix recently-introduced stray output
4755 * tests/init.cfg (require_pcre_): Remove stray debugging output.
4756
4757 build: port to GCC 4.6.4 + glibc 2.5
4758 On platforms this old, building with _FORTIFY_SOURCE equal to 2
4759 results in duplicate definitions of standard library functions.
4760 Problem reported by Nelson H. F. Beebe.
4761 * configure.ac (_FORTIFY_SOURCE): Sort after GNULIB_PORTCHECK.
4762 By default, do not enable this unless GNULIB_PORTCHECK is defined.
4763 This better matches the original intent, which as I recall was to
4764 enable these extra checks only with --enable-gcc-warnings.
4765
4766 tests: port to libpcre sans UTF-8 support
4767 Problem reported by Nelson H. F. Beebe.
4768 * tests/pcre-infloop, tests/pcre-invalid-utf8-input, tests/pcre-utf8:
4769 Skip the test unless PCRE works in an en_US.UTF-8 locale.
4770
47712014-11-09 Jim Meyering <meyering@fb.com>
4772
4773 tests: do not fail when the zh_CN.UTF-8 locale is not installed
4774 * tests/word-multibyte: This test would fail on a system with
4775 no zh_CN.UTF-8 locale. Use it only if it is installed.
4776
4777 tests: avoid hex_printf_ portability problems
4778 * tests/init.cfg (hex_printf_): Spell out a-f and A-F, for
4779 non-C locales, ensure that the input to sed is newline-terminated,
4780 and quote the final octal format string.
4781 Suggestions from Paul Eggert.
4782
47832014-11-08 Jim Meyering <meyering@fb.com>
4784
4785 tests: avoid a multibyte tr portability problem
4786 * tests/init.cfg (tr): New wrapper function.
4787 See comments for details. Reported by Norihiro Tanaka
4788 in http://debbugs.gnu.org/18991
4789
4790 maint: remove spurious LC_ALL setting from one test
4791 * tests/word-multibyte: Remove unnecessary setting of LC_ALL.
4792
4793 tests: fix typo in previous change
4794 * tests/init.cfg (hex_printf_): Fix typo s/A-f/A-F/.
4795 For the record, I introduced that error, not Norihiro.
4796
47972014-11-08 Norihiro Tanaka <noritnk@kcn.ne.jp>
4798
4799 tests: avoid awk+printf+\xHH portability trap
4800 * tests/init.cfg (hex_printf_): Rewrite in terms of printf and sed.
4801 Using awk's printf with \xHH in the format string was not portable
4802 to the awk of Solaris 10, AIX 7 or HP-UX 11.23, as reported in
4803 http://debbugs.gnu.org/18987.
4804 * tests/word-multibyte: Use printf rather than hex_printf_,
4805 and give the character we're printing a name: e_acute (rather
4806 than A-grave), since that is used in other tests.
4807 a trailing \n in the format string, adjust by removing it, and
4808 instead invoking echo.
4809 * tests/multibyte-white-space: Simply remove each trailing \n.
4810 They were not needed.
4811
48122014-11-07 Jim Meyering <meyering@fb.com>
4813
4814 tests: avoid printf+\xHH portability trap
4815 * tests/word-multibyte: Using the Bourne shell's printf function
4816 with strings like "\xHH\xHH" happens to work for most interactive
4817 shells, but not for dash. That is not portable. Use our hex_printf_
4818 awk wrapper instead. Without this change, this test would fail on
4819 a Debian system for which /bin/sh is configured to be "dash".
4820
4821 maint: move helper function, hex_printf to init.cfg
4822 * tests/init.cfg (hex_printf_): New function, from ...
4823 * tests/multibyte-white-space: ... here. Reflect the
4824 s/hex_print/hex_printf_/ renaming.
4825
48262014-11-02 Paul Eggert <eggert@cs.ucla.edu>
4827
4828 grep: port O_NOFOLLOW errno checking to NetBSD
4829 Problem reported by Assaf Gordon in: http://bugs.gnu.org/18892
4830 * NEWS: Document it.
4831 * src/grep.c (open_symlink_nofollow_error):
4832 New function, which does the right thing on NetBSD.
4833 (grepfile): Use it.
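
As a rough illustration of the portability issue, the errno check can be sketched as follows. The function name symlink_nofollow_errno is hypothetical and the body is an assumption about how such a check could look, not a copy of open_symlink_nofollow_error. It relies on POSIX specifying ELOOP for this case, and on NetBSD and FreeBSD reporting EFTYPE and EMLINK respectively.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: return true if ERR is an errno value that
   open() with O_NOFOLLOW may set when the final file name component
   is a symbolic link.  */
bool
symlink_nofollow_errno (int err)
{
  if (err == ELOOP)             /* POSIX */
    return true;
#ifdef EMLINK
  if (err == EMLINK)            /* FreeBSD */
    return true;
#endif
#ifdef EFTYPE
  if (err == EFTYPE)            /* NetBSD; EFTYPE is not defined everywhere */
    return true;
#endif
  return false;
}
```

A caller would test errno with this helper right after a failed open, before any other call can overwrite errno.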
4834
48352014-10-31 Jim Meyering <meyering@fb.com>
4836
4837 build: generate man pages even when existing targets are read-only
4838 * doc/Makefile.am (grep.1): Use mv -f to move temporary to target,
4839 in case the target is read-only. Also, always make the generated
4840 files read-only.
4841 (egrep.1 fgrep.1): Likewise.
4842 This avoids a build failure reported by Eric Blake in
4843 http://lists.gnu.org/archive/html/bug-grep/2014-10/msg00112.html
4844
48452014-10-30 Jim Meyering <meyering@fb.com>
4846
4847 tests: avoid false-positive failure due to some zh_CN.* locales
4848 On some systems, and for some zh_CN.* locales (e.g., OpenBSD5.5) the
4849 E-acute pair of bytes do not qualify as a word-constituent character.
4850 * tests/word-multibyte: Use zh_CN.UTF-8, rather than "zh_CN".
4851 Reported by Assaf Gordon and Bruce Dubbs in
4852 http://debbugs.gnu.org/18892
4853
48542014-10-29 Jim Meyering <meyering@fb.com>
4855
4856 gnulib: update to latest; bootstrap, too
4857 * gnulib: Update to latest.
4858 * bootstrap: Copy latest from gnulib.
4859
48602014-10-28 Jim Meyering <meyering@fb.com>
4861
4862 tests: make new test script executable
4863 * tests/word-multibyte: Make this file executable.
4864
48652014-10-28 Norihiro Tanaka <noritnk@kcn.ne.jp>
4866
4867 dfa: make \w and \W work in multibyte locales
4868 Reported by Jaroslav Skarvada in: http://bugs.gnu.org/18817
4869 Now, \w and \W are supported not only in single-byte locales but also
4870 in multibyte locales.
4871
4872 * src/dfa.c (PUSH_LEX_STATE, POP_LEX_STATE): Move definitions "up",
4873 so they are not within the function.
4874 (lex): Make \w and \W work in a multibyte locale, the same way
4875 we made \s and \S work.
4876 * tests/word-multibyte: New test for this change.
4877 * tests/Makefile.am: Add a rule to build new test.
4878 * NEWS (Bug fixes): Mention it.
4879
48802014-10-26 Norihiro Tanaka <noritnk@kcn.ne.jp>
4881
4882 dfa: avoid false match in a non-UTF8 multibyte locale
4883 This command should print nothing:
4884
4885 printf '\263\244\263\244\n' \
4886 | LC_ALL=ja_JP.eucJP grep -E "$(printf '^x|\244\263')"
4887
4888 Before this patch, it would print its sole input line.
4889 * src/dfa.c (struct dfa): Add new members: min_trcount,
4890 initstate_letter, initstate_others.
4891 (dfaanalyze): Build states with not only a newline context but others.
4892 (build_state): Don't release initial states.
4893 (skip_remains_mb): Add a parameter.
4894 Add a comment describing all parameters.
4895 (dfaexec_main): When there are multiple start states, we are about
4896 to transition from one state to another and the current byte is not
4897 the first byte of a multibyte character, first advance past the
4898 current multibyte character.
4899 * tests/euc-mb: Add a new test.
4900 * NEWS (Bug fixes): Mention it.
4901 This addresses http://debbugs.gnu.org/18685
4902
49032014-10-25 Paul Eggert <eggert@cs.ucla.edu>
4904
4905 tests: work around older libpcre bugs when testing -P and UTF-8
4906 * tests/pcre-invalid-utf8-input: Add require_timeout_ and
4907 require_compiled_in_MB_support. Put a timeout of 3 seconds on
4908 grep, to avoid having this test case loop forever with older
4909 versions of libpcre, such as those found on RHEL 6.5.
4910 Reported by Jim Meyering in: http://bugs.gnu.org/18806#34
4911
49122014-10-24 Norihiro Tanaka <noritnk@kcn.ne.jp>
4913
4914 tests: add test for grep -P fix
4915 * tests/pcre-o: New test for this change.
4916 * tests/Makefile.am (TESTS): Add it.
4917
49182014-10-24 Paul Eggert <eggert@cs.ucla.edu>
4919
4920 grep: fix grep -P crash
4921 Reported by Shlomi Fish in: http://bugs.gnu.org/18806
4922 Commit 9fa500407137f49f6edc3c6b4ee6c7096f0190c5 (2014-09-16) is a
4923 hack that I put in to speed up 'grep -P'. Unfortunately, not only
4924 is it a violation of modularity, it's also a bug magnet, as we have
4925 found out with Bug#18738 and Bug#18806. Remove the optimization
4926 instead of applying more bandaids. Perhaps we can think of a
4927 better way of doing the optimization, or perhaps we can just live
4928 with a slower grep -P (as -P is inherently slower anyway...).
4929 * src/grep.c, src/grep.h (validated_boundary):
4930 Remove. All uses removed.
4931 * src/pcresearch.c (Pexecute): Do not worry about validated_boundary.
4932
49332014-10-19 Norihiro Tanaka <noritnk@kcn.ne.jp>
4934
4935 dfa: remove two erroneous clauses from a now-unused function
4936 RE_DOT_NEWLINE and RE_DOT_NOT_NULL apply only to a dot that
4937 matches any character. Do not consider them when matching
4938 with a bracket expression.
4939
4940 * src/dfa.c (match_mb_charset): Remove tests for RE_DOT_NEWLINE
4941 and RE_DOT_NOT_NULL.
4942
49432014-10-19 Norihiro Tanaka <noritnk@kcn.ne.jp>
4944
4945 dfa: process all MBCSET constructs via glibc's matcher
4946 The DFA matcher does not support collating symbols or equivalence
4947 classes, so ensure that any MBCSET reference is handled by the glibc
4948 matcher. dfa.c already handled this in one case, but not the other,
4949 so that a command like "printf '\0' |src/grep -aE '^\s?$'" would
4950 mistakenly end up using dfa.c's match_mb_charset function rather
4951 than glibc's matcher.
4952
4953 * src/dfa.c (dfaexec_main): Move that code into the
4954 State_transition macro. This renders the match_mb_charset
4955 unused by grep.
4956 * tests/multibyte-white-space: Add a test to exercise the
4957 just-rendered-inaccessible code path.
4958
49592014-10-15 Norihiro Tanaka <noritnk@kcn.ne.jp>
4960
4961 grep: initialize validation_boundary properly before use
4962 * src/grep.c (main): Initialize validation_boundary before pre-searching
4963 for an empty line.
4964
49652014-10-15 Paul Eggert <eggert@cs.ucla.edu>
4966
4967 grep: fix off-by-one bug in -P optimization
4968 Reported by Norihiro Tanaka in: http://bugs.gnu.org/18738
4969 * src/pcresearch.c (Pexecute): Fix off-by-one bug with
4970 validation_boundary.
4971 * tests/init.cfg (envvar_check_fail): Catch off-by-one bug.
4972
49732014-10-08 Norihiro Tanaka <noritnk@kcn.ne.jp>
4974
4975 dfa: fix a theoretical bug
4976 * src/dfa.c (dfaexec_main): After searching for a match from
4977 the initial state, set the previous state, S1, to 0.
4978 So far, we have found no case in which this fix makes a difference.
4979 See http://debbugs.gnu.org/18645
4980
49812014-10-07 Paul Eggert <eggert@cs.ucla.edu>
4982
4983 doc: modernize and simplify man page
4984 * doc/grep.in.1 (Tx, Id): Remove. All uses removed.
4985 (MTO, URL): New macros, used for email and URL.
4986 Use them when appropriate.
4987 In main text, omit chatty discussions of other implementations;
4988 the full manual suffices for this sort of thing.
4989
4990 doc: clarify exit status
4991 Reported by Santiago Ruano Rincón in: http://bugs.gnu.org/18651
4992 * doc/grep.in.1 (EXIT STATUS):
4993 * doc/grep.texi (Exit Status): Clarify.
4994
49952014-10-07 Norihiro Tanaka <noritnk@kcn.ne.jp>
4996
4997 dfa: test for just-fixed bug
4998 * tests/mb-dot-newline: New file.
4999 * tests/Makefile.am (TESTS): Add it.
5000 * NEWS (Bug fixes): Mention it.
5001 Bisection suggests that the bug was introduced by
5002 commit v2.18-123-geb3292b. Also see
5003 http://debbugs.gnu.org/cgi/bugreport.cgi?msg=17;bug=18580
5004
50052014-10-05 Norihiro Tanaka <noritnk@kcn.ne.jp>
5006
5007 dfa: factor out a new nontrivial block of duplicated code
5008 * src/dfa.c (State_transition): New macro.
5009 (dfaexec_main): Use it twice.
5010
5011 dfa: check end of input buffer after transition in non-UTF8 multibyte locale
5012 * src/dfa.c (dfaexec_main): Check for end of input buffer after each
5013 transition in a non-UTF8 multibyte locale.
5014 * tests/mb-non-UTF8-overrun: New test.
5015 * tests/Makefile.am (TESTS): Add it.
5016 * src/grep.c (main): With this fix, we no longer need the fourth
5017 byte of "eolbytes".
5018
50192014-10-04 Jim Meyering <meyering@fb.com>
5020
5021 grep: avoid stack buffer read-underrun and overrun
5022 Testing binaries built with -fsanitize=address caused aborts due
5023 to stack underrun and overrun.
5024 * src/grep.c (main): Allocate a larger buffer for eolbytes:
5025 one byte before the beginning and one more after the end.
5026 For details, see http://debbugs.gnu.org/18580#44.
5027
50282014-10-04 Norihiro Tanaka <noritnk@kcn.ne.jp>
5029
5030 grep: fix subscript error when testing whether empty lines match
5031 * src/grep.c (grep): When testing whether an empty line matches,
5032 make the input buffer one byte longer, as dfaexec uses that
5033 for a sentinel.
5034
50352014-09-27 Paul Eggert <eggert@cs.ucla.edu>
5036
5037 dfa: minor tweaks, mostly to remove __attribute__ ((noinline))
5038 That attribute isn't portable, and I found a way to get similar
5039 performance with standard C features.
5040 * NEWS: Document the recently-installed performance improvement.
5041 * src/dfa.c (struct dfa): New member dfaexec.
5042 (dfaexec_main): Remove unnecessary 'const'.
5043 (dfaexec_mb, dfaexec_sb): Remove __attribute__ ((noinline));
5044 no longer needed.
5045 (dfaexec): Use new dfaexec member.
5046 (dfainit, dfaoptimize, dfassbuild): Initialize it.
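
The general technique of replacing a non-portable noinline attribute with a function pointer chosen at initialization time can be sketched as below. This is a toy miniature under stated assumptions: struct matcher, count_main and the other names are hypothetical and do not come from dfa.c, and the loop body (counting bytes that are not UTF-8 continuation bytes) merely stands in for the real matcher.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature; none of these names come from dfa.c.  */
struct matcher
{
  bool multibyte;
  /* Specialized entry point, selected once by matcher_init.  */
  size_t (*exec) (struct matcher const *, char const *, size_t);
};

/* Shared body.  MULTIBYTE is a constant in each wrapper below, so the
   compiler is free to specialize the loop for each case.  */
static inline size_t
count_main (char const *buf, size_t len, bool multibyte)
{
  size_t n = 0;
  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = buf[i];
      /* In the multibyte variant, skip UTF-8 continuation bytes.  */
      if (multibyte && (c & 0xC0) == 0x80)
        continue;
      n++;
    }
  return n;
}

static size_t
count_mb (struct matcher const *m, char const *buf, size_t len)
{
  (void) m;
  return count_main (buf, len, true);
}

static size_t
count_sb (struct matcher const *m, char const *buf, size_t len)
{
  (void) m;
  return count_main (buf, len, false);
}

/* Choose the specialized function once, at initialization.  */
void
matcher_init (struct matcher *m, bool multibyte)
{
  m->multibyte = multibyte;
  m->exec = multibyte ? count_mb : count_sb;
}

/* Public entry point: a single indirect call, no per-call branching.  */
size_t
matcher_exec (struct matcher const *m, char const *buf, size_t len)
{
  return m->exec (m, buf, len);
}
```

Because the two wrappers are only reached through the pointer, each keeps its own specialized copy of the loop using standard C alone.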
5047
50482014-09-27 Norihiro Tanaka <noritnk@kcn.ne.jp>
5049
5050 dfa: separate dfaexec function to help optimization by compiler
5051 * src/dfa.c (dfaexec_main): Rename from dfaexec, add inline attribute.
5052 (dfaexec_mb): New function. Run it when d->multibyte is true. For this
5053 function inlining must be avoided.
5054 (dfaexec_sb): New function. Run it when d->multibyte is false. For this
5055 function inlining must be avoided.
5056 (dfaexec): Call dfaexec_mb or dfaexec_sb according to d->multibyte.
5057
50582014-09-27 Norihiro Tanaka <noritnk@kcn.ne.jp>
5059
5060 dfa: speed-up at initial state
5061 The DFA state is always 0 until a potential match has been found, so
5062 improve matching there by continuing to use the transition table.
5063
5064 * src/dfa.c (skip_remains_mb): New function.
5065 (dfaexec): Speed-up at initial state.
5066
50672014-09-27 Paul Eggert <eggert@cs.ucla.edu>
5068
5069 maint: generalize the -Wcast-align fix
5070 * src/grep.c (CAST_ALIGNED): New macro.
5071 (skip_easy_bytes): Use it.
5072
50732014-09-27 Jim Meyering <meyering@fb.com>
5074
5075 maint: suppress a false-positive -Wcast-align warning
5076 Building with --enable-gcc-warnings and gcc-4.9.1 would provoke this:
5077 grep.c:499:12: error: cast from 'const char *' to 'const uword *'\
5078 (aka 'const unsigned long *') increases required alignment from\
5079 1 to 8 [-Werror,-Wcast-align]
5080 for (s = (uword const *) p; ! (*s & hibyte_mask); s++)
5081 ^~~~~~~~~~~~~~~~~
5082 * src/grep.c (skip_easy_bytes): Use a pragma to suppress
5083 gcc's false-positive cast-alignment warning.
5084
50852014-09-26 Paul Eggert <eggert@cs.ucla.edu>
5086
5087 grep: don't check extensively for invalid prefix bytes unless -P
5088 Problem reported by Jim Meyering in: http://bugs.gnu.org/18454#56
5089 * src/grep.c (grep): After the first buffer is checked, leave the
5090 file-type checker in TEXTBIN_UNKNOWN state only when -P is used.
5091 Only the -P matcher has performance problems with checking binary
5092 data that make it worthwhile to check every prefix input byte so
5093 the -P matcher's TEXTBIN_UNKNOWN optimizations can come into play.
5094 Other matchers can simply check the data directly, and using
5095 TEXTBIN_UNKNOWN with them slows 'grep' down for no benefit.
5096
5097 grep: scan for valid multibyte strings more quickly
5098 Scan valid multibyte strings more quickly in the common case of
5099 encodings that are upward compatible with ASCII, such as UTF-8.
5100 You'd think there'd be a fast standard way to do this nowadays,
5101 but nooooo....
5102 Problem reported by Jim Meyering in: http://bugs.gnu.org/18454#56
5103 * src/grep.c (HIBYTE): New constant.
5104 (easy_encoding): New static var.
5105 (init_easy_encoding, skip_easy_bytes): New functions.
5106 (uword): New type.
5107 (buffer_textbin): Skip easy bytes quickly.
5108 Don't bother with mb_clen here, since skip_easy_bytes typically
5109 captures the easy cases; just use mbrlen directly.
5110 (buffer_textbin, file_textbin): First arg is no longer a const
5111 pointer, since the byte past the end is now an overwritten sentinel.
5112 (fillbuf): Make room for a uword after the buffer, for skip_easy_bytes.
5113 (main): Call init_easy_encoding.
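
The idea of skipping "easy" bytes a word at a time can be sketched as follows. This is not the code from src/grep.c: the names uword_t, HIBYTE_MASK and skip_ascii are hypothetical, and the sketch uses memcpy loads and an explicit length instead of the aligned cast and end-of-buffer sentinel that the entry describes. It also assumes 8-bit bytes.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

typedef unsigned long uword_t;  /* hypothetical word type */

/* A word with the top bit set in every byte, e.g. 0x8080...80.
   Assumes CHAR_BIT == 8.  */
#define HIBYTE_MASK ((uword_t) -1 / UCHAR_MAX * 0x80)

/* Return a pointer to the first byte in BUF[0..LEN-1] that has its
   high bit set, or BUF + LEN if every byte is plain ASCII.  */
char const *
skip_ascii (char const *buf, size_t len)
{
  char const *p = buf;
  char const *end = buf + len;

  /* Test one word at a time; memcpy sidesteps alignment concerns.  */
  while ((size_t) (end - p) >= sizeof (uword_t))
    {
      uword_t w;
      memcpy (&w, p, sizeof w);
      if (w & HIBYTE_MASK)
        break;                  /* some byte in this word is non-ASCII */
      p += sizeof w;
    }

  /* Finish, or locate the exact byte, one byte at a time.  */
  while (p < end && !((unsigned char) *p & 0x80))
    p++;
  return p;
}
```

In encodings that are upward compatible with ASCII, every byte skipped this way is a complete single-byte character, so only the remainder needs a call such as mbrlen.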
5114
51152014-09-17 Paul Eggert <eggert@cs.ucla.edu>
5116
5117 grep: speed up processing of holes before EOF on Solaris
5118 * src/grep.c (fillbuf): If SEEK_DATA fails with errno == ENXIO,
5119 skip over the hole at EOF.
5120
5121 grep: port to platforms lacking SEEK_DATA
5122 Reported by Norihiro Tanaka in: http://bugs.gnu.org/18454#38
5123 * src/grep.c (SEEK_DATA): Default to SEEK_SET if not defined.
5124 (SEEK_HOLE): Move to top level, and default it to SEEK_SET.
5125 (file_textbin): Adjust to new default.
5126 (fillbuf): Don't bother with SEEK_DATA if it defaults to SEEK_SET.
5127
5128 grep: skip past holes efficiently
5129 Take advantage of the relaxed rules for treating non-text bytes in
5130 binary data, by efficiently skipping past holes on platforms
5131 supporting lseek's SEEK_DATA flag.
5132 On one test on a circa-2008 Sun Fire V40z running Solaris 11.2,
5133 'grep x' took 0.009 real-time seconds to scan a holey file of size
5134 9,223,372,036,854,775,802 bytes, for a nominal scan rate of 1 ZB/s.
5135 grep 2.20's scan rate on this platform was 843 MB/s, so this is a
5136 speedup by a factor of 1.2 trillion. The speedup factor is not
5137 as great on GNU/Linux hosts, due to what appear to be SEEK_DATA
5138 inefficiencies, but presumably this will be cleared up in time.
5139 * NEWS: Document this.
5140 * src/grep.c, src/grep.h (eolbyte): Now char, not unsigned char.
5141 This is for compatibility with the rest of the code.
5142 The old (performance?) reasons for 'unsigned char' are now moot.
5143 * src/grep.c (skip_nuls, skip_empty_lines, seek_data_failed):
5144 New static vars.
5145 (totalnl): Move up, since it's about input, not output, and
5146 fillbuf now uses it.
5147 (add_count): Move up, since fillbuf now uses it.
5148 (all_zeros): New function.
5149 (fillbuf): Use SEEK_DATA to skip past holes efficiently,
5150 on systems that support this.
5151 (grep, main): Set the new static vars.
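
A minimal sketch of hole skipping with lseek follows, assuming a platform where SEEK_DATA may or may not be defined. The helper name next_data is hypothetical, and the fallback policy (treating the file as all data when SEEK_DATA is unavailable or fails for a reason other than ENXIO) is an assumption made for illustration, not a description of fillbuf.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE            /* exposes SEEK_DATA on glibc */
#endif
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: position FD at the next data at or after OFF
   and return that offset.  Return -1 with errno == ENXIO if only a
   hole remains before end of file.  If SEEK_DATA is unavailable, or
   fails for any other reason, treat the file as all data and seek to
   OFF itself.  Note that this moves the file offset of FD.  */
off_t
next_data (int fd, off_t off)
{
#ifdef SEEK_DATA
  off_t d = lseek (fd, off, SEEK_DATA);
  if (d >= 0 || errno == ENXIO)
    return d;
#endif
  return lseek (fd, off, SEEK_SET);
}
```

A reader loop could call this before each read and account for the skipped range as a run of zero bytes.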
5152
5153 grep: improve -P performance in typical cases
5154 * src/grep.c, src/grep.h (enum textbin): Move to grep.h.
5155 (input_textbin, validated_boundary): New vars.
5156 * src/grep.c (grepbuf, grep): Initialize them.
5157 * src/pcresearch.c (Pexecute): Do a multiline search
5158 when the input is known to be free of encoding errors.
5159 Quickly discard bytes that are obviously encoding errors.
5160 Quickly match empty strings.
5161
5162 grep: minor -P speedup with jit_stack
5163 * src/pcresearch.c (jit_stack): No longer static.
5164
5165 grep: non-text bytes in binary data may be treated as line ends
5166 * NEWS, doc/grep.texi (File and Directory Selection):
5167 Document this change.
5168 * src/grep.c (zap_nuls): New function.
5169 (grep): Use it.
5170 * tests/null-byte: Relax to allow new behavior.
5171
5172 grep: -z no longer considers '\200' to be binary data
5173 This avoids a problem when using grep -z in a Windows-1252 locale.
5174 Plus, it lets 'grep -z' run a bit faster.
5175 * NEWS: Document this.
5176 * src/grep.c (buffer_textbin): Don't look for '\200' if -z.
5177 * tests/pcre-z: Test for new behavior.
5178
5179 grep: refactor binary-vs-unknown-vs-text flags for clarity
5180 * src/grep.c (enum textbin): New enum.
5181 (textbin_is_binary): New function.
5182 (buffer_textbin, file_textbin, grep): Use them, for clarity.
5183
51842014-09-16 Paul Eggert <eggert@cs.ucla.edu>
5185
5186 grep: fix -P speedup bug with empty match
5187 * src/pcresearch.c (NSUB): New top-level constant, replacing
5188 'nsub' within Pexecute.
5189 (Pcompile, Pexecute): Use it.
5190 (Pexecute): Don't assume sub[1] is zero after a PCRE_ERROR_BADUTF8
5191 match failure.
5192 * tests/pcre-invalid-utf8-input: Test for this bug.
5193
5194 grep: port -P speedup to hosts lacking PCRE_STUDY_JIT_COMPILE
5195 * src/pcresearch.c (Pcompile): Do not assume that
5196 PCRE_STUDY_JIT_COMPILE is defined.
5197 (empty_match): Define on all platforms.
5198
5199 grep: use mbclen cache in one more place
5200 * src/grep.c (fgrep_to_grep_pattern): Use mb_clen here, too.
5201
5202 grep: avoid false alarms for mb_clen and to_uchar
5203 * cfg.mk (_gl_TS_unmarked_extern_functions): New var,
5204 to bypass the tight_scope false alarms on mb_clen and to_uchar.
5205
5206 grep: use mbclen cache more effectively
5207 * src/grep.c (buffer_textbin, contains_encoding_error):
5208 Use mb_clen for speed.
5209 (buffer_textbin): Bypass mb_clen in unibyte locales.
5210 (main): Always initialize the cache, since it's sometimes used in
5211 unibyte locales now. Initialize it before contains_encoding_error
5212 might be called.
5213 * src/search.h (SEARCH_INLINE): New macro.
5214 (mbclen_cache): Now extern decl.
5215 (mb_clen): New inline function.
5216 * src/searchutils.c (SEARCH_INLINE, SYSTEM_INLINE): Define.
5217 (mbclen_cache): Now extern.
5218 (build_mbclen_cache): Put 1 into the cache when mbrlen returns 0.
5219 (mb_goback): Use mb_len for speed, and rely on it returning nonzero.
5220 * src/system.h (SYSTEM_INLINE): New macro.
5221 (to_uchar): Use it.
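
The caching idea can be sketched as a 256-entry table indexed by a character's first byte. The names len_cache, build_len_cache and cached_len are hypothetical, and the encoding of the two mbrlen error results as -1 and -2 is an assumption made for this illustration. The one detail taken from the entry above is storing 1 when mbrlen returns 0, so that a caller advancing by the cached length always makes progress past a NUL byte.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical cache: for each possible first byte, the result of
   mbrlen on that single byte.  1 means a complete one-byte character,
   -1 an invalid byte, and -2 the start of a longer character.  */
static signed char len_cache[UCHAR_MAX + 1];

void
build_len_cache (void)
{
  for (int i = 0; i <= UCHAR_MAX; i++)
    {
      char c = (char) i;
      mbstate_t st;
      memset (&st, 0, sizeof st);       /* start in the initial state */
      size_t n = mbrlen (&c, 1, &st);

      if (n == (size_t) -1)
        len_cache[i] = -1;              /* encoding error */
      else if (n == (size_t) -2)
        len_cache[i] = -2;              /* incomplete: more bytes needed */
      else
        len_cache[i] = n == 0 ? 1 : (signed char) n;  /* NUL counts as 1 */
    }
}

/* Look up the cached length for first byte B.  */
int
cached_len (unsigned char b)
{
  return len_cache[b];
}
```

The table depends on the current locale, so it would need to be built after any setlocale call and before the first lookup.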
5222
5223 grep: improve performance for older glibc
5224 glibc has a bug where mbrlen and mbrtowc mishandle length-0 inputs.
5225 Working around it in gnulib slows grep down, so disable the tests for it
5226 and make sure grep works even if the bug is present.
5227 * bootstrap.conf (avoided_gnulib_modules): Add mbrtowc-tests.
5228 * configure.ac (gl_cv_func_mbrtowc_empty_input): Assume yes.
5229 * src/searchutils.c (mb_next_wc): Don't invoke mbrtowc on empty input.
5230
5231 grep: treat a file as binary if its prefix contains encoding errors
5232 * NEWS:
5233 * doc/grep.texi (File and Directory Selection):
5234 Document this.
5235 * src/grep.c (buffer_encoding, buffer_textbin): New functions.
5236 (file_textbin): Rename from file_is_binary. Now returns 3-way value.
5237 All callers changed.
5238 (file_textbin, grep): Check the input more carefully for text vs
5239 binary data.
5240 (contains_encoding_error): Remove; use replaced by buffer_encoding.
5241 * tests/backref-multibyte-slow:
5242 * tests/high-bit-range:
5243 * tests/invalid-multibyte-infloop:
5244 Use -a, since the input is now considered to be binary.
5245 * tests/invalid-multibyte-infloop: Add a check for new behavior.
5246
5247 grep: use bool for boolean in grep.c
5248 * src/grep.c (show_version, suppress_errors, only_matching)
5249 (align_tabs, match_icase, match_words, match_lines, errseen)
5250 (write_error_seen, is_device_mode, usable_st_size)
5251 (file_is_binary, skipped_file, reset, fillbuf, out_quiet)
5252 (out_line, out_byte, count_matches, no_filenames, line_buffered)
5253 (done_on_match, exit_on_match, print_line_head, prline, grep)
5254 (grepdirent, grepfile, grepdesc, grep_command_line_arg)
5255 (get_nondigit_option, main): Use bool for boolean.
5256 (print_line_head, prline): Use char for byte.
5257 * src/grep.h: Include <stdbool.h>, and adjust decls to match
5258 changes in grep.c.
5259
5260 grep: speed up -P on files containing many multibyte errors
5261 * src/pcresearch.c (empty_match): New var.
5262 (Pcompile): Set it.
5263 (Pexecute): Use it.
5264
5265 grep: remove/refactor unnecessary code about line splitting
5266 * src/grep.c (do_execute): Remove. Caller now uses 'execute'.
5267 * src/pcresearch.c (Pexecute): Improve comment about this.
5268
52692014-09-12 Paul Eggert <eggert@cs.ucla.edu>
5270
5271 grep: diagnose -P in non-UTF-8 multibyte locale
5272 * src/pcresearch.c (Pcompile):
5273 libpcre supports only unibyte and UTF-8 locales,
5274 so report an error and exit if used in other locales.
5275 * NEWS: Mention this.
5276 * tests/euc-mb: Test this.
5277
52782014-09-12 Jim Meyering <meyering@fb.com>
5279
5280 doc: move NEWS note about GREP_OPTIONS into proper section
5281 * NEWS (Changes in behavior): Move the note about GREP_OPTIONS
5282 from the 2.20 section into the section for the upcoming release.
5283
52842014-09-12 Paul Eggert <eggert@cs.ucla.edu>
5285
5286 grep: make GREP_OPTIONS obsolescent
5287 * NEWS:
5288 * doc/grep.in.1 (ENVIRONMENT_VARIABLES):
5289 * doc/grep.texi (Environment Variables):
5290 Document that GREP_OPTIONS is obsolescent now.
5291 * src/grep.c (main): Warn if GREP_OPTIONS is used.
5292 * tests/r-dot, tests/skip-device: Don't use GREP_OPTIONS.
5293
52942014-09-11 Paul Eggert <eggert@cs.ucla.edu>
5295
5296 doc: bug tracker has moved to debbugs.gnu.org
5297 * README (KNOWN BUGS):
5298 * doc/grep.in.1:
5299 * doc/grep.texi (Reporting Bugs): Document this.
5300
5301 grep: fix false matches with -P '...$' and invalid UTF-8
5302 * tests/pcre-invalid-utf8-input: Add a test for that.
5303
5304 grep: fix false matches with -P '...$' and invalid UTF-8
5305 * src/pcresearch.c (Pexecute): Use PCRE_NOTEOL when matching
5306 initial substrings of a line.
5307
53082014-09-10 Jim Meyering <meyering@fb.com>
5309
5310 tests: add expect-to-fail test for a glibc regexp bug
5311 * tests/triple-backref: New file.
5312 * tests/Makefile.am (TESTS): Add it.
5313 (XFAIL_TESTS): List it as a known, always-failing test.
5314 Based on the bug report from Paul Eggert:
5315 https://sourceware.org/bugzilla/show_bug.cgi?id=17356
5316
5317 maint: avoid distcheck failure
5318 * Makefile.am (EXTRA_DIST): Add .mailmap.
5319
53202014-09-10 Paul Eggert <eggert@cs.ucla.edu>
5321
5322 grep: port recent fix to older pcre version
5323 * src/pcresearch.c (Pexecute): Don't assume that a pcre_exec
5324 that returns PCRE_ERROR_NOMATCH leaves its sub argument alone.
5325 This assumption is false for libpcre-3 version 8.31-2ubuntu2.
5326
53272014-09-09 Paul Eggert <eggert@cs.ucla.edu>
5328
5329 grep: -P now treats invalid UTF-8 input as non-matching
5330 Problem reported by Santiago Vila in: http://bugs.gnu.org/18266
5331 * NEWS: Mention this.
5332 * src/pcresearch.c (Pexecute): Treat UTF-8 encoding errors
5333 as non-matching data, instead of exiting 'grep'.
5334 * tests/pcre-infloop: grep now exits with status 1, not 2.
5335 * tests/pcre-invalid-utf8-input: grep now exits with status 0, not 2.
5336
53372014-08-14 Paul Eggert <eggert@cs.ucla.edu>
5338
5339 grep: fix integer-width bugs in undossify_input etc.
5340 undossify_input bug reported by Vincent Lefevre in:
5341 http://bugs.gnu.org/18269
5342 * src/dosbuf.c (undossify_input): Return size_t, not int.
5343 * src/grep.c (fillbuf): Work portably even if safe_read returns a
5344 value greater than SSIZE_MAX, e.g., if there's an I/O error.
5345
53462014-08-03 Paul Eggert <eggert@cs.ucla.edu>
5347
5348 doc: document LANGUAGE
5349 Reported by Benno Schulenberg in: http://bugs.gnu.org/18185
5350 * doc/grep.texi (Environment Variables): Document LANGUAGE.
5351
5352 doc: prefer @env to @code
5353 Reported by Benno Schulenberg in: http://bugs.gnu.org/18184
5354 * doc/grep.texi: Avoid @code in favor of @env, or of nothing at all.
5355
53562014-07-11 Paul Eggert <eggert@cs.ucla.edu>
5357
5358 doc: Document -r vs --exclude more carefully.
5359 Problem reported by Hugues Andreux in: http://bugs.gnu.org/17763
5360 * doc/grep.texi (File and Directory Selection): Be more careful
5361 about documenting the interaction between recursive searching,
5362 --include, --exclude, and --exclude-dir.
5363
53642014-06-27 Jim Meyering <meyering@fb.com>
5365
5366 maint: split long lines, and enforce the 80-column limit
5367 * cfg.mk (sc_long_lines): New rule, from coreutils; exempt tests/*
5368 * src/grep.c (usage): Tweak -F wording to shorten a line.
5369 Correct grammar in a comment.
5370 Split the --exclude-file=... description to fit within 80 columns.
5371 Use emit_bug_reporting_address, eliminating another long line.
5372 * src/dfa.c: Split long lines. No semantic change.
5373 * doc/grep.texi: Likewise.
5374 * tests/include-exclude: Split a long line.
5375 * tests/backref: Split long lines.
5376 * tests/empty: Likewise.
5377 * tests/fmbtest: Likewise.
5378
5379 doc: update HACKING
5380 * HACKING: Update from coreutils.
5381
5382 maint: generate distributed THANKS from VC'd THANKS.in
5383 * Makefile.am (THANKS): New rule.
5384 * THANKS.in: New file.
5385 * THANKS: Remove. Now it's generated from the combination of
5386 THANKS.in and git logs.
5387 * .mailmap: New file.
5388 * cfg.mk (sc_THANKS_in_duplicates): New syntax-check rule, from
5389 coreutils.
5390 * .gitignore: Add THANKS.
5391 * thanks-gen: New file, from coreutils.
5392
53932014-06-27 Paul Eggert <eggert@cs.ucla.edu>
5394
5395 grep: with -E, unmatched ')' matches itself
5396 Problem reported by Nathan Weeks in: http://bugs.gnu.org/17856
5397 * src/grep.c (Ecompile): Also specify RE_UNMATCHED_RIGHT_PAREN_ORD.
5398 * doc/grep.texi (Fundamental Structure), NEWS: Document this.
5399 * tests/ere.tests: Add a couple of tests for this.
5400 * tests/spencer1.tests: Fix exit status.
5401
54022014-06-17 Paul Eggert <eggert@cs.ucla.edu>
5403
5404 build: avoid -Wstack-protector
5405 This allows the use of --enable-gcc-warnings on Gentoo and Ubuntu.
5406 See: http://bugs.gnu.org/17793
5407 * configure.ac (WERROR_CFLAGS): Avoid -Wstack-protector.
5408
5409 This can be worked around, but the cure is worse than the disease.
5410
54112014-06-17 Paul Eggert <eggert@cs.ucla.edu>
5412
5413 build: don't make output files read-only
5414 This led to problems, such as the prompt "mv: try to overwrite
5415 'egrep', overriding mode 0555 (r-xr-xr-x)? " during a build.
5416 It can be worked around, but the cure is worse than the disease;
5417 making output files read-only is more trouble than it's worth.
5418 * doc/Makefile.am (grep.1, egrep.1, fgrep.1):
5419 * lib/Makefile.am (colorize.c):
5420 * src/Makefile.am (egrep fgrep):
5421 Don't make output files read-only. Prefer separate commands to
5422 '&&' when either will do.
5423
54242014-06-08 Paul Eggert <eggert@cs.ucla.edu>
5425
5426 maint: remove grep.spec
5427 * grep.spec: Remove; obsolete and evidently not used.
5428
54292014-06-07 Paul Eggert <eggert@cs.ucla.edu>
5430
5431 doc: use gnulib fdl module
5432 * bootstrap.conf (gnulib_modules): Add fdl.
5433 * doc/fdl.texi: Remove, as this now comes from gnulib.
5434 * doc/.gitignore: Update to match current sources.
5435
54362014-06-06 Jim Meyering <meyering@fb.com>
5437
5438 build: improve rule to generate egrep+fgrep scripts
5439 * src/Makefile.am (egrep fgrep): chmod a=rx generated files,
5440 and remove $@-t before attempting to redirect to it, in case it
5441 is read-only.
5442
5443 build: don't redirect directly to $@
5444 * lib/Makefile.am (colorize.c): Don't redirect directly to target, $@.
5445 Otherwise, we could create a corrupt colorize.c file with a
5446 timestamp that indicates it is up to date.
5447 Also, make the generated file read-only.
5448
54492014-06-05 Paul Eggert <eggert@cs.ucla.edu>
5450
5451 grep: undo part of previous change
5452 * src/dfa.c (enlist): Undo part of previous change that doesn't
5453 look correct and doesn't help performance much anyway.
5454
5455 grep: use system strstr if available and fast
5456 Problem reported by Norihiro Tanaka in: http://bugs.gnu.org/17700
5457 * NEWS: Document this.
5458 * bootstrap.conf (gnulib_modules): Add strstr.
5459 * src/dfa.c (istrstr): Remove.
5460 (enlist): Use strstr instead. Wait until we need memory before
5461 allocating it; this can save an unnecessary allocate and free.
5462
5463 build: update gnulib submodule to latest
5464
54652014-06-03 Jim Meyering <meyering@fb.com>
5466
5467 maint: post-release administrivia
5468 * NEWS: Add header line for next release.
5469 * .prev-version: Record previous version.
5470 * cfg.mk (old_NEWS_hash): Auto-update.
5471
5472 version 2.20
5473 * NEWS: Record release date.
5474
54752014-05-30 Jim Meyering <meyering@fb.com>
5476
5477 grep: fix --max-count=N (-m N) to stop reading after Nth match
5478 With --max-count=N (-m N), grep is supposed to stop reading input
5479 after it has found the Nth match. However, a recent context-
5480 related change made it so grep would always read to end of file.
5481 * src/grep.c (prtext): Don't let a negative "out_after" value
5482 make "pending" line count negative.
5483 * tests/max-count-overread: New test, for this.
5484 * tests/Makefile.am (TESTS): Add it.
5485 * NEWS (Bug fixes): Mention it.
5486 * THANKS: Add names of two recent bug reporters.
5487 This bug was introduced by commit v2.18-139-g5122195.
5488 Reported by Marc Aldorasi in http://bugs.gnu.org/17640.
5489
54902014-05-29 Jim Meyering <meyering@fb.com>
5491
5492 dfa: fix off-by-one under-allocation from recent change
5493 Commit v2.19-10-gc32ff67 mistakenly made this change:
5494 -realloc_trans_if_necessary (d, 1);
5495 +realloc_trans_if_necessary (d, 0);
5496 which led to a heap buffer overflow.
5497 * src/dfa.c (dfaexec): Allocate space for one state, as before.
5498
54992014-05-28 Norihiro Tanaka <noritnk@kcn.ne.jp>
5500
5501 dfa: fix bug with regex containing multiple begin/end-line constraints
5502 grep -E 'a(b$|c$)' would mistakenly match "aa".
5503 * src/dfa.c (dfamust): When resetting 'is' in OR, also reset
5504 'begline' and 'endline' of 'must'.
5505 * NEWS (Bug fixes): Mention it.
5506 This bug was introduced via commit v2.18-85-g2c94326.
5507 Reported by Péter Radics in <http://bugs.gnu.org/17617>.
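
The intended semantics of the pattern in this entry can be cross-checked against the C library's POSIX matcher. This is only a sketch under stated assumptions: it uses regcomp and regexec from <regex.h> rather than grep's DFA code, and the helper name ere_matches is made up for illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return true if the POSIX extended regular
   expression PATTERN matches anywhere in LINE.  This exercises the
   C library's matcher, not grep's DFA code.  */
static bool
ere_matches (const char *pattern, const char *line)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;               /* treat a compile failure as "no match" */
  bool matched = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return matched;
}
```

With this helper, 'a(b$|c$)' should match "ab" and "ac" but not "aa", which is the behavior the fix restores in grep -E.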
5508
55092014-05-26 Norihiro Tanaka <noritnk@kcn.ne.jp>
5510
5511 dfa: simplify building initial state
5512 build_state_zero doesn't need the struct dfa to be initialized,
5513 so remove the initialization and simplify.
5514 * src/dfa.c (build_state_zero): Remove.
5515 (dfaexec): Call realloc_trans_if_necessary and build_state directly.
5516
5517 dfa: revert "grep: do not count newline before the start of buffer"
5518 This reverts commit 5dc3af2806d21455b818be3f9da26c372e4a7f8d.
5519 The previous change renders that commit unnecessary.
5520
5521 dfa: do not clear the first state of a transition table
5522 If number of DFA states reaches 1024, build_state clears transition
5523 tables to save memory. However, the initial state is always used,
5524 so clearing it just wastes time.
5525 * src/dfa.c (build_state): Do not clear the initial state's
5526 transition and failure tables.
5527
5528 grep: remove unnecessary argument
5529 * src/grep.c (do_execute): Remove argument 'start_ptr'. It's always null.
5530 All uses changed.
5531
55322014-05-24 Paul Eggert <eggert@cs.ucla.edu>
5533
5534 grep: --exclude-dir=FOO/ now ignores the trailing slash
5535 Problem reported by Khaled Ziyaeen; see: http://bugs.gnu.org/17481
5536 * NEWS, doc/grep.texi (File and Directory Selection): Document this.
5537 * src/grep.c (main): Implement this.
5538 * tests/include-exclude: Test this.
5539
5540 dist: don't distribute lib/colorize.c
5541 'configure' creates this file, so it shouldn't be distributed; see:
5542 http://bugs.gnu.org/17480
5543 * configure.ac (COLORIZE_SOURCE): New macro.
5544 Don't use AC_CONFIG_LINKS for lib/colorize.c.
5545 * lib/Makefile.am (nodist_libgreputils_a_SOURCES): New macro.
5546 (libgreputils_a_SOURCES): Remove colorize.c.
5547 (CLEANFILES): Add colorize.c
5548 (colorize.c): New rule.
5549
55502014-05-23 behoffski <behoffski@grouse.com.au>
5551
5552 maint: uncapitalize first letter of two dfaerror message strings
5553 * dfa.c (lex): Make two message strings consistent with all of
5554 the others: do not capitalize the first letter of the first word.
5555
55562014-05-23 Jim Meyering <meyering@fb.com>
5557
5558 maint: revert "grep: port mb_next_wc to RHEL 6.5 x86-64"
5559 This reverts commit v2.18-148-ga6ae68d.
5560 Now that we have gnulib change v0.1-131-g2a045bc, "mbrlen, mbrtowc:
5561 fix bug with empty input", this work-around is no longer needed.
5562
5563 gnulib: update, for mbrlen/mbrtowc empty input bug fix
5564
55652014-05-22 Jim Meyering <meyering@fb.com>
5566
5567 maint: post-release administrivia
5568 * NEWS: Add header line for next release.
5569 * .prev-version: Record previous version.
5570 * cfg.mk (old_NEWS_hash): Auto-update.
5571
5572 version 2.19
5573 * NEWS: Record release date.
5574
55752014-05-21 Jim Meyering <meyering@fb.com>
5576
5577 maint: avoid new false-positive syntax-check failure
5578 * cfg.mk (exclude_file_name_regexp--sc_prohibit_doubled_word):
5579 Exempt new test file that contains legitimate use of "in in".
5580
55812014-05-17 Norihiro Tanaka <noritnk@kcn.ne.jp>
5582
5583 tests: add test case for newline-count fix
5584 * tests/count-newline: New test.
5585 * tests/Makefile.am (TESTS): Add it.
5586
55872014-05-16 Norihiro Tanaka <noritnk@kcn.ne.jp>
5588
5589 grep: do not count newline before the start of buffer
5590 * src/dfa.c (build_state): When checking whether the previous
5591 character was a newline, do not count any newline before the
5592 start of the buffer.
5593
55942014-05-15 Paul Eggert <eggert@cs.ucla.edu>
5595
5596 grep: port mb_next_wc to RHEL 6.5 x86-64
5597 * src/searchutils.c (mb_next_wc): Work around glibc bug 16950; see:
5598 https://sourceware.org/bugzilla/show_bug.cgi?id=16950
5599 This bug was masked in the other GNU/Linux tests I made. It was
5600 exposed on RHEL 6.5 x86-64, where the compiler (GCC Red Hat 4.4.7-4)
5601 happened to use temporaries in a different way.
5602 Also see recent changes to the Gnulib documentation in this area:
5603 http://lists.gnu.org/archive/html/bug-gnulib/2014-05/msg00013.html
5604
5605 tests: port mb-non-UTF8-performance to RHEL 6.5
5606 * tests/mb-non-UTF8-performance (timeout): Use an integer,
5607 as 'timeout 1.234' doesn't work in EUC locales.
5608
56092014-05-12 Paul Eggert <eggert@cs.ucla.edu>
5610
5611 egrep, fgrep: port to Solaris 10 /bin/sh
5612 This old shell doesn't grok ${0%/*}; see: http://bugs.gnu.org/17471
5613 * src/Makefile.am (egrep fgrep): Don't assume the shell does substrings.
5614 * src/egrep.sh (dir): New var, so that the substring calculation is
5615 done only once (which is a small win even with newer shells),
5616 and so that the calculation is easier to edit on older shells.
5617
56182014-05-10 Jim Meyering <meyering@fb.com>
5619
5620 maint: NEWS: adjust wording to reflect move
5621 * NEWS (Improvements): Correct direction-relative wording,
5622 now that the referent is below, not above.
5623
5624 maint: NEWS: move "Improvements" to the top
5625 * NEWS: Move the small "Improvements" section to precede
5626 the longer "Bug fixes" one.
5627
5628 gnulib: update submodule to latest, and bootstrap
5629 * gnulib: Update submodule.
5630 * bootstrap: Update from gnulib.
5631
56322014-05-10 Paul Eggert <eggert@cs.ucla.edu>
5633
5634 dfa: omit double includes
5635 * src/dfa.c: Don't include stddef.h or stdbool.h, as dfa.h includes
5636 them already, and it's the same module as we are.
5637 Suggested by Aharon Robbins in: http://bugs.gnu.org/17458
5638
5639 dfa: fix bug with \< etc in multibyte locales
5640 Problem reported by Stephane Chazelas in: http://bugs.gnu.org/16867
5641 * NEWS: Document the fix.
5642 * src/dfa.c (dfaoptimize): Remove any superset if changing from
5643 UTF-8 to unibyte, and if the pattern has no backreferences.
5644 (dfassbuild): In multibyte locales, treat \< \> \b \B as
5645 backreferences in the DFA, since the DFA relies on unibyte
5646 tests to check them.
5647 (dfacomp): Optimize after building the superset, so that
5648 dfassbuild can depend on d->multibyte. A downside is that
5649 dfaoptimize must remove supersets that are likely slower than the
5650 DFA after optimization, but that's been done in the
5651 above-described change.
5652 * tests/Makefile.am (XFAIL_TESTS): Remove word-delim-multibyte,
5653 since the test works now.
5654
5655 tests: add test case for -C 0 change
5656 * tests/context-0: New test.
5657 * tests/Makefile.am (TESTS): Add it.
5658
5659 grep: -A 0, -B 0, -C 0 now output a separator
5660 Problem reported by Dan Jacobson in: http://bugs.gnu.org/17380
5661 * NEWS:
5662 * doc/grep.texi (Context Line Control): Document this.
5663 * src/grep.c (prtext): Output a separator even if context is zero.
5664 (main): Default context is now -1, not 0.
5665
56662014-05-09 Paul Eggert <eggert@cs.ucla.edu>
5667
5668 grep: minor improvements to retry-DFA-superset patch
5669 * src/dfasearch.c (EGexecute): Avoid unnecessary test in a context
5670 where memrchr cannot return a null pointer.
5671
56722014-05-09 Norihiro Tanaka <noritnk@kcn.ne.jp>
5673
5674 grep: retry DFA superset after matching multiple lines
5675 * src/dfasearch.c (EGexecute): Without this patch, the code reverts
5676 to KWset when the DFA superset matches multiple lines.
5677 However, if the DFA superset matches multiple lines, it most likely
5678 also matches a single line, and reverting to KWset means dfafast
5679 won't work effectively. Change the code so that it retries the DFA
5680 superset immediately after it matches multiple lines. On my platform
5681 this improves the performance of "LC_ALL=C grep '\(ab\)cd\1d' k" from
5682 3.48 to 2.14 seconds realtime, where k contains the output of
5683 "yes abcdabc | head -50000000".
5684
5685 dfa: fix inconsistency in multibyte locales
5686 * src/dfa.c (dfaexec): Use the same exit condition in multibyte
5687 locales as in unibyte.
5688
56892014-05-08 Jim Meyering <meyering@fb.com>
5690
5691 maint: mark some breakless cases with /* fallthrough */ comment
5692 * src/dfa.c (addtok_mb, dfaanalyze): Add comment so that it is
5693 clear that the "break" statement is deliberately omitted.
5694
56952014-05-08 Paul Eggert <eggert@cs.ucla.edu>
5696
5697 dfa: assume C89 for CHAR_BIT
5698 * src/dfa.c (CHARBITS): Remove. All uses replaced by CHAR_BIT.
5699 (NOTCHAR): Now an enum, since it need not be a macro.
5700
5701 dfa: don't assume unsigned int is exactly 32 bits wide
5702 Sun C 5.12 (sparc) warns of the potential unportability.
5703 * src/dfa.c (charclass_word): New type, for clarity.
5704 All relevant uses of 'unsigned' changed.
5705 (CHARCLASS_WORD_BITS): Rename from INTBITS. All uses changed.
5706 Now an enum, since it needn't be a macro.
5707 (CHARCLASS_WORD_MASK): New macro.
5708 (CHARCLASS_WORDS): Rename from CHARCLASS_INTS. All uses changed.
5709 (setbit, clrbit): Cast 1 to charclass_word, for clarity.
5710 (notset, add_utf8_anychar, dfastats):
5711 Don't assume unsigned int is exactly 32 bits wide.
5712 (dfastate): Don't rely on implementation-defined conversion of
5713 greater-than-INT_MAX unsigned to int. Change bit test to resemble
5714 tstbit more.
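
The idea behind this entry can be sketched as a small bit set whose word width is derived from the type instead of being assumed to be 32. The names follow the entry above, but the definitions are assumptions made for illustration and may differ from the real src/dfa.c.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

/* Illustrative definitions only; the real dfa.c may differ.  */
typedef unsigned int charclass_word;

/* One more than the largest unsigned char value.  */
enum { NOTCHAR = 1 << CHAR_BIT };

/* Bits per word, computed from the type rather than hard-coded.  */
enum { CHARCLASS_WORD_BITS = sizeof (charclass_word) * CHAR_BIT };

/* Number of words needed to hold NOTCHAR bits, rounded up.  */
enum
{
  CHARCLASS_WORDS
    = (NOTCHAR + CHARCLASS_WORD_BITS - 1) / CHARCLASS_WORD_BITS
};

typedef charclass_word charclass[CHARCLASS_WORDS];

/* Return true if bit B is set in C.  */
static bool
tstbit (unsigned int b, const charclass_word *c)
{
  return c[b / CHARCLASS_WORD_BITS] >> b % CHARCLASS_WORD_BITS & 1;
}

/* Set bit B in C.  The cast keeps the shift in the word type.  */
static void
setbit (unsigned int b, charclass_word *c)
{
  c[b / CHARCLASS_WORD_BITS] |= (charclass_word) 1 << b % CHARCLASS_WORD_BITS;
}

/* Clear bit B in C.  */
static void
clrbit (unsigned int b, charclass_word *c)
{
  c[b / CHARCLASS_WORD_BITS]
    &= ~((charclass_word) 1 << b % CHARCLASS_WORD_BITS);
}
```

Because every index and shift is expressed through CHARCLASS_WORD_BITS, the same code keeps working if charclass_word is later changed to a wider or narrower unsigned type.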
5715
5716 maint: fix indenting to pacify 'prohibit_tab_based_indentation'
5717 * src/dfa.c: Use spaces and not tabs to indent some lines.
5718
5719 grep: simplify and clarify invert-related code
5720 * src/grep.c (out_invert, prtext): Use bool for booleans.
5721 (prline): Remove unnecessary '!!' on a value that is always 0 or 1.
5722 (prtext): Remove last arg NLINESP; use !out_invert instead. All uses
5723 changed. Move decls to nearer uses, since we can assume C99 here.
5724 Update 'outleft' and 'after_last_match' here; it's simpler.
5725 (grepbuf): Compute return value by subtracting new from old 'outleft',
5726 rather than by keeping a separate running total. Avoid code duplication
5727 by arranging for prtext to be called from one place, not three.
5728
57292014-05-08 Norihiro Tanaka <noritnk@kcn.ne.jp>
5730
5731 grep: improve performance of -v when combined with -L, -l or -q
5732 Problem reported by Jörn Hees in: http://bugs.gnu.org/17427
5733 * src/grep.c (grepbuf, grep): When -v is combined with -L, -l, or -q,
5734 don't read data unnecessarily after a non-match is found.
5735
57362014-05-06 Paul Eggert <eggert@cs.ucla.edu>
5737
5738 doc: mention performance changes
5739 * NEWS: Discuss recent performance improvements and downgrades.
5740
5741 dfa: clarify use of "if"
5742 The phrase "Y is true if X" is logically equivalent to "X implies Y",
5743 but often "X if and only if Y" was intended.
5744 * src/dfa.c, src/dfa.h: Reword to avoid the incorrect use of "if".
5745
5746 dfa: minor performance improvement for previous change
5747 * src/dfa.c (struct dfa): New member 'fast'. Remove 'has_backref'.
5748 All uses changed.
5749
57502014-05-06 Norihiro Tanaka <noritnk@kcn.ne.jp>
5751
5752 dfa: speed up 'dfaisfast'
5753 * src/dfa.c (struct dfa): New member 'has_backref'.
5754 (addtok_mb): Set it.
5755 (dfaisfast): Use it.
5756
57572014-05-05 Paul Eggert <eggert@cs.ucla.edu>
5758
5759 grep: fix -w match next to a multibyte letter
5760 * NEWS: Document this.
5761 * src/dfasearch.c, src/kwsearch.c (WCHAR): Remove.
5762 (wordchar): New static function.
5763 * src/dfasearch.c (EGexecute):
5764 * src/kwsearch.c (Fexecute): Use the new functions, so that the
5765 code works correctly if a multibyte character adjacent to the
5766 match has two or more bytes.
5767 * src/search.h, src/searchutils.c (mb_prev_wc, mb_next_wc):
5768 New functions.
5769 * tests/word-delim-multibyte: Add a test for grep -w (which now
5770 passes), and a test for \> (which still fails). The \< test also
5771 still fails.
5772
5773 grep: improve internal API for multibyte boundary
5774 * src/search.h, src/searchutils.c (mb_goback): Rename from
5775 is_mb_middle. Omit last arg. Return number of bytes to go back,
5776 not just a boolean. All uses changed.
5777 * src/dfasearch.c (EGexecute):
5778 * src/kwsearch.c (Fexecute): Adjust to API change.
5779 * src/kwsearch.c (Fexecute): Eliminate common subexpression.
5780
5781 grep: fix encoding-error incompatibilities among regex, DFA, KWset
5782 This follows up to http://bugs.gnu.org/17376 and fixes a different
5783 set of incompatibilities, namely between the regex matcher and the
5784 other matchers, when the pattern contains encoding errors.
5785 The GNU regex matcher is not consistent in this area: sometimes
5786 an encoding error matches only itself, and sometimes it
5787 matches part of a multibyte character. There is no documentation
5788 for grep's behavior in this area and users don't seem to care,
5789 and it's simpler to defer to the regex matcher for problematic
5790 cases like these.
5791 * NEWS: Document this.
5792 * src/dfa.c (ctok): Remove. All uses removed.
5793 (parse_bracket_exp, atom): Use BACKREF if a pattern contains
5794 an encoding error, so that the matcher will revert to regex.
5795 * src/dfasearch.c, src/grep.c, src/pcresearch.c, src/searchutils.c:
5796 Don't include dfa.h, since search.h now does that for us.
5797 * src/dfasearch.c (EGexecute):
5798 * src/kwsearch.c (Fexecute): In a UTF-8 locale, there's no need to
5799 worry about matching part of a multibyte character.
5800 * src/grep.c (contains_encoding_error): New static function.
5801 (main): Use it, so that grep -F is consistent with plain fgrep
5802 when the pattern contains an encoding error.
5803 * src/search.h: Include dfa.h, so that kwsearch.c can call using_utf8.
5804 * src/searchutils.c (is_mb_middle): Remove UTF-8-specific code.
5805 Callers now ensure that we are in a non-UTF-8 locale.
5806 The code was clearly wrong, anyway.
5807 * tests/fgrep-infloop, tests/invalid-multibyte-infloop:
5808 * tests/prefix-of-multibyte:
5809 Do not require that grep have a particular behavior for this test.
5810 It's OK to match (exit status 0), not match (exit status 1), or
5811 report an error (exit status 2), since the pattern contains an
5812 encoding error and grep's behavior is not specified for such
5813 patterns. Test only that KWset, DFA, and regex agree.
5814 * tests/prefix-of-multibyte: Add tests for ABCABC and __..._ABCABC___.
5815
58162014-05-04 Paul Eggert <eggert@cs.ucla.edu>
5817
5818 dfa: minor simplification
5819 * src/dfa.c (parse_bracket_exp): Use enum, not macro, and move var
5820 to just the scope it's needed.
5821
5822 grep: simplify and fix problems with KWset-DFA agreement patch
5823 * src/dfa.c (dfambcache, parse_bracket_exp): Simplify.
5824 (mbs_to_wchar, wctok, FETCH_WC, match_anychar, match_mb_charset)
5825 (check_matching_with_multibyte_ops, transit_state_consume_1char)
5826 (transit_state, dfaexec): Use wint_t, not wchar_t, so that
5827 WEOF is treated correctly on platforms where WEOF is not a valid
5828 wchar_t value.
5829 (ctok, lex): Use int, not unsigned int, for characters,
5830 so that EOF is treated more naturally.
5831 (parse_bracket_exp): Use NOTCHAR to mark uninitialized char, since
5832 FETCH_WC can now set the char to EOF.
5833 (lex): Remove unnecessary test for EOF.
5834 (parse_bracket_exp, atom): Swap then and else parts, to put
5835 the small one first; this is more readable here.
5836 * src/searchutils.c (is_mb_middle): Simplify.
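
The reason for preferring wint_t in this entry is that WEOF is a wint_t value and need not fit in a wchar_t. A minimal sketch of a decoder that reports invalid input as WEOF follows. The helper name decode_char and its exact contract are assumptions for illustration, not the mbs_to_wchar function in src/dfa.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper.  Decode one character from the N bytes at S
   using conversion state *MBS.  Store the character in *PWC, or WEOF
   if the bytes do not form a complete valid character.  Return the
   number of bytes consumed, which is at least 1 when N > 0.  */
static size_t
decode_char (wint_t *pwc, const char *s, size_t n, mbstate_t *mbs)
{
  if (n == 0)
    {
      /* Handle empty input here instead of relying on mbrtowc.  */
      *pwc = WEOF;
      return 0;
    }

  wchar_t wc;
  size_t nbytes = mbrtowc (&wc, s, n, mbs);
  if (nbytes == (size_t) -1 || nbytes == (size_t) -2)
    {
      /* Encoding error or truncated character: reset the state,
         report WEOF, and consume a single byte.  */
      memset (mbs, 0, sizeof *mbs);
      *pwc = WEOF;
      return 1;
    }

  *pwc = wc;
  return nbytes == 0 ? 1 : nbytes;  /* a NUL byte still consumes one byte */
}
```

Storing the result in a wint_t lets the caller compare against WEOF directly, without depending on whether WEOF is representable as a wchar_t on the platform.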
5837
5838 tests: improve coverage for prefix-of-multibyte
5839 * tests/prefix-of-multibyte: Also test the regex version.
5840
58412014-05-04 Norihiro Tanaka <noritnk@kcn.ne.jp>
5842
5843 grep: make KWset and DFA agree about invalid sequences in patterns
5844 See: http://bugs.gnu.org/17376
5845 * src/dfa.c (dfambcache): Don't cache invalid sequences, because they can't be
5846 represented by wide characters.
5847 (dfambcache, mbs_to_wchar): Return WEOF for invalid sequences.
5848 (ctok): New global variable.
5849 (parse_bracket_exp, atom, match_anychar, match_mb_charset): Don't allow WEOF.
5850 (lex): Set 'ctok'.
5851 * src/kwsearch.c (Fexecute):
5852 * src/searchutils.c (is_mb_middle): Don't check here.
5853 * tests/invalid-multibyte-infloop: Adjust to fixed behavior.
5854 * tests/prefix-of-multibyte: Add test cases for this bug.
5855
58562014-05-03 Jim Meyering <meyering@fb.com>
5857
5858 maint: make ChangeLog generation more robust
5859 * Makefile.am (gen-ChangeLog): Sync changes from GNU coreutils,
5860 to ensure exit status is propagated, and to support an optional
5861 git-log-fix file.
5862
58632014-05-03 Paul Eggert <eggert@cs.ucla.edu>
5864
5865 grep: clarify EGexecute slightly
5866 * src/dfasearch.c (EGexecute): Change if-then-else to !if-else-then.
5867
58682014-05-03 Norihiro Tanaka <noritnk@kcn.ne.jp>
5869
5870 grep: fix the bug in previous patch.
5871 * src/dfasearch.c (EGexecute): Do it.
5872
58732014-04-30 Paul Eggert <eggert@cs.ucla.edu>
5874
5875 grep: simplify EGexecute further
5876 * src/dfa.c, src/dfa.h (dfasuperset): Arg is now const pointer.
5877 Now pure.
5878 * src/dfasearch.c (EGexecute): Coalesce some duplicate code.
5879 Don't worry about memrchr returning NULL when that's impossible.
5880
58812014-04-30 Norihiro Tanaka <noritnk@kcn.ne.jp>
5882
5883 grep: adjust timing back to kwset when dfaisfast is true
5884 * src/dfasearch.c (EGexecute): If DFA fails after kwset succeeds,
5885 the code doesn't return to kwset until it reaches the end of the buffer
5886 or finds a match. Because of this, although some cases speed up,
5887 others slow down.
5888
5889 Adjust the heuristic for switching to the DFA, so that it
5890 is more likely to switch at the right times.
5891
58922014-04-30 Norihiro Tanaka <noritnk@kcn.ne.jp>
5893
5894 grep: simplify superset
5895 * src/dfa.h (dfahint): Remove decl.
5896 (dfasuperset): New decl.
5897 * src/dfa.c (dfahint): Remove.
5898 (dfassbuild): Rename from dfasuperset.
5899 (dfasuperset): New function. It returns the superset of D.
5900 * src/dfasearch.c: Use dfasuperset instead of dfahint, and simplify.
5901
5902 dfa: optimize memory allocation
5903 * src/dfa.c (epsclosure): get the value of 'visited' from the argument.
5904 (dfaanalyze): Define and allocate variable 'visited'.
5905 (dfastate): Use not 'insert' but 'merge' to insert positions for
5906 state 0 of DFA.
5907
59082014-04-29 Norihiro Tanaka <noritnk@kcn.ne.jp>
5909
5910 kwset: improve performance by inlining tr
5911 Without this change, older versions of GCC won't inline 'tr', and this
5912 can hurt performance significantly. See: http://bugs.gnu.org/17229#64
5913 * src/kwset.c (tr): Make it inline.
5914
59152014-04-27 Jim Meyering <meyering@fb.com>
5916
5917 gnulib: update to latest
5918 * gnulib: This fixes a bug whereby running bootstrap
5919 would remove our build-aux/git-log-fix file.
5920
59212014-04-27 Paul Eggert <eggert@cs.ucla.edu>
5922
5923 kwset: improve performance by inlining more
5924 Problem reported by Norihiro Tanaka in <http://bugs.gnu.org/17229#55>.
5925 * src/kwset.c (bmexec_trans): Rename from bmexec, and make it inline.
5926 (bmexec): New implementation, which calls bmexec_trans. This helps
5927 GCC inline more aggressively with the default optimization, and
5928 improves performance 25% with the reported benchmark on my host.
5929
59302014-04-26 Paul Eggert <eggert@cs.ucla.edu>
5931
5932 kwset: speed up by using memchr2
5933 Idea suggested by Eric Blake in: http://bugs.gnu.org/17229#43
5934 * bootstrap.conf (gnulib_modules): Add memchr2.
5935 * src/kwset.c: Include stdint.h, for uintptr_t. Include memchr2.h.
5936 (struct kwset): New members gc1, gc2, gc1help.
5937 (tr): Move earlier, so it can be used earlier.
5938 (kwsprep): Initialize struct kwset's new members.
5939 (memchr_kwset): Rename from memchr_trans. Combine C and TRANS args into
5940 new arg KWSET. All uses changed. Use memchr2 when appropriate.
5941 (bmexec): Use new members instead of recomputing their values.
5942 Increase advance_heuristic; it's just a guess, but memchr2 probably
5943 makes it reasonable to increase it.
5944
5945 kwset: improve performance when large Boyer-Moore key doesn't match
5946 * src/kwset.c (bmexec): As a heuristic, prefer memchr to seeking
5947 by delta1 only when the latter doesn't advance much.
5948
5949 dfa: fix index bug in previous patch, and simplify
5950 * src/dfa.c, src/dfa.h (dfaisfast): Arg is const pointer.
5951 * src/dfa.c (dfaisfast): Simplify, since supersets never contain BACKREF.
5952 * src/dfa.h (dfaisfast): Declare to be pure.
5953 * src/dfasearch.c (EGexecute): Fix typo that could cause buffer
5954 read overrun when !dfafast. Hoist duplicate computation out
5955 of an if's then and else parts.
5956
59572014-04-26 Norihiro Tanaka <noritnk@kcn.ne.jp>
5958
5959 grep: speed up for a case to repeat failure in DFA after success in kwset
5960 A DFA is typically much faster if it is unibyte and does not set BACKREF.
5961 Skip kwset if the DFA is fast. For example:
5962
5963 yes abcdabc | head -50000000 >k
5964 env LC_ALL=C time -p src/grep -i 'abcd.bd' k
5965
5966 This improved real-time from 4.86 to 1.34 s.
5967
5968 * src/dfa.c, src/dfa.h (dfaisfast): New function.
5969 * src/dfasearch.c (EGexecute): Use it.
5970
59712014-04-24 Paul Eggert <eggert@cs.ucla.edu>
5972
5973 dfa: fix recently-introduced memory leak
5974 Problem reported by Aharon Robbins in: http://bugs.gnu.org/17341
5975 * src/dfa.c (dfasuperset): free after dfafree.
5976
5977 misc: fix doc and test bugs re grep -z
5978 Problem reported by Stephane Chazelas in: http://bugs.gnu.org/16871
5979 * doc/grep.texi (Usage): Remove incorrect example with -P.
5980 * tests/pcre: Improve test so that it actually tests whether \s
5981 matches a newline.
5982
5983 dfa: minor simplification of dfaexec
5984 * src/dfa.c (dfaexec): Streamline updating of returned values.
5985 Don't bother to check d->multibyte before updating mbp.
5986 Avoid duplicate p > end test.
5987
59882014-04-24 Paul Eggert <eggert@cs.ucla.edu>
5989
5990 dfa: simplify and be more consistent about MB_CUR_MAX
5991 * src/dfa.c (struct dfa): New member 'multibyte',
5992 replacing 'mb_cur_max'. All uses changed. Use this new member
5993 consistently, instead of sometimes referring to MB_CUR_MAX directly.
5994
5995 dfa: fix comment
5996 * src/dfa.c (maybe_realloc): Fix comment to match behavior better.
5997
59982014-04-24 Norihiro Tanaka <noritnk@kcn.ne.jp>
5999
6000 grep: skip checking of multibyte character boundary, reaching at eolbyte
6001 * src/dfa.c (dfaexec): Skip checking of multibyte character boundary,
6002 reaching at eolbyte.
6003
60042014-04-24 Paul Eggert <eggert@cs.ucla.edu>
6005
6006 dfa: fix incorrect comment that led to heap overrun
6007 * dfa.c (maybe_realloc): Fix comment to match behavior.
6008
6009 dfa: minor tuneup of dfamust memory savings patch
6010 * src/dfa.c (allocmust): Use xmalloc, not xzalloc.
6011 Initialize the must completely, so that the caller need not
6012 invoke resetmust. All callers changed.
6013 (dfamust): Omit asserts that aren't needed on typical machines
6014 where dereferencing NULL dumps core. Don't leak memory if the
6015 pattern contains a NUL byte.
6016
60172014-04-24 Norihiro Tanaka <noritnk@kcn.ne.jp>
6018
6019 grep: avoid wasting memory for large patterns in dfamust
6020 * src/dfa.c (struct must): New member 'prev'. It points to the
6021 previous must.
6022 (allocmust): New function.
6023 (freemust): New function.
6024 (dfamust): Use it.
6025
60262014-04-24 Jim Meyering <meyering@fb.com>
6027
6028 grep: fix new heap write buffer overrun
6029 * src/dfa.c (parse_bracket_exp): Fix off-by-one allocation error.
6030 Exposed by running the tests with an ASAN-enabled binary (i.e.,
6031 created using gcc's -fsanitize=address option). Introduced by
6032 commit v2.18-70-gd3d9612, "dfa: simplify range char allocation".
6033
60342014-04-24 Paul Eggert <eggert@cs.ucla.edu>
6035
6036 build: suppress unsafe-loop-optimizations warnings
6037 I ran into one of these while trying out GCC 4.9.0's new
6038 -fsanitize=undefined option. The warning told me that GCC didn't
6039 do an unsafe optimization, but in 'grep' this is not typically a
6040 symptom of a programming error.
6041 * configure.ac (WERROR_CFLAGS): Suppress -Wunsafe-loop-optimizations.
6042
60432014-04-23 Paul Eggert <eggert@cs.ucla.edu>
6044
6045 dfa: fix memory leak reintroduced by previous patch
6046 Reported by Norihiro Tanaka in <http://bugs.gnu.org/17328#16>.
6047 * src/dfa.c (dfaexec): Allocate mb_match_lens and mb_follows only
6048 if not already allocated.
6049 (free_mbdata): Null out mb_match_lens to mark it as being freed.
6050
60512014-04-23 Jim Meyering <meyering@fb.com>
6052
6053 tests: use consistent spelling for locale name, en_US.UTF-8
6054 * tests/pcre-infloop: Spell locale name, en_US.UTF-8, consistently,
6055 converting this one use from "en_US.utf8", which would provoke a
6056 test failure on OS/X.
6057
60582014-04-23 Paul Eggert <eggert@cs.ucla.edu>
6059
6060 dfa: omit static variables that limited dfaexec to one struct dfa
6061 Problem reported by Aharon Robbins in: http://bugs.gnu.org/17328
6062 * src/dfa.c (struct dfa): New member mbs.
6063 mb_follows is now a position_set, not a pointer to one;
6064 this simplifies memory allocation. All uses changed.
6065 (mbs_to_wchar): Put DFA arg at the end, in place of the mbstate_t *arg,
6066 since the DFA now contains an mbstate_t. All uses changed.
6067 (mbs): Remove static variable.
6068 (dfaexec): Remove static bool that attempted to optimize memory
6069 allocation, as this wasn't correct for Gawk. Perhaps we can think
6070 of a better way to optimize memory.
6071
60722014-04-22 Paul Eggert <eggert@cs.ucla.edu>
6073
6074 kwset: simplify and speed up Boyer-Moore unibyte -i in some cases
6075 This improves the performance of, for example,
6076 yes jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj | head -10000000 | grep -i jk
6077 in a unibyte locale.
6078 * src/kwset.c (memchr_trans): New function.
6079 (bmexec): Use it. Simplify the code and remove some of the
6080 confusing gotos and breaks and labels. Do not treat glibc memchr
6081 as a special case; if non-glibc memchr is slow, that is lower
6082 priority and I suppose we can try to work around the problem in
6083 gnulib.
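
 The idea behind a translating memchr can be sketched as follows. This is
 only an illustration, not the code in src/kwset.c: the names
 memchr_trans_sketch and init_upper_table are hypothetical, and the
 translation table is assumed to be a 256-entry array indexed by
 unsigned char.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Fill TAB so that TAB[b] is the uppercase form of byte B in the
   current (unibyte) locale.  Hypothetical helper for the sketch.  */
void
init_upper_table (char tab[256])
{
  for (int i = 0; i < 256; i++)
    tab[i] = (char) toupper (i);
}

/* Return a pointer to the first byte in S[0..N-1] whose translation
   equals C, or NULL if there is none.  TRANS is a 256-entry
   translation table, or NULL for no translation.  */
char const *
memchr_trans_sketch (char const *s, char c, size_t n, char const *trans)
{
  if (!trans)
    return memchr (s, c, n);    /* Fast path: plain byte search.  */

  for (size_t i = 0; i < n; i++)
    if (trans[(unsigned char) s[i]] == c)
      return s + i;
  return NULL;
}
```

 With a case-folding table, the caller passes the already-translated
 (here: uppercase) form of the byte it is looking for.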
6084
60852014-04-22 Norihiro Tanaka <noritnk@kcn.ne.jp>
6086
6087 grep: speed-up by using memchr() in Boyer-Moore searching
6088 glibc's memchr() is faster than seeking by delta1 on some platforms.
6089 On those platforms, use it when no match has been possible for a while.
6090 * src/kwset.c (bmexec): Use memchr() in Boyer-Moore searching.
6091
60922014-04-22 Paul Eggert <eggert@cs.ucla.edu>
6093
6094 kwset: simplify Boyer-Moore with unibyte -i
6095 This change doesn't significantly affect performance on my platform,
6096 and should make the code easier to maintain.
6097 * src/kwset.c (BM_DELTA2_SEARCH, LAST_SHIFT, TRANS): New macros.
6098 Remove these macros, in favor of ...
6099 (tr, bm_delta2_search): New functions. All uses changed.
6100 The latter function is inline because this improves code size and
6101 runtime CPU slightly on x86-64 with gcc -O2 (GCC 4.9.0).
6102 (bmexec): Prefer tr when that's simpler.
6103
61042014-04-22 Norihiro Tanaka <noritnk@kcn.ne.jp>
6105
6106 grep: may also use Boyer-Moore algorithm for case-insensitive matching
6107 * src/kwset.c (BM_DELTA2_SEARCH, LAST_SHIFT, TRANS): New macro.
6108 (bmexec): Use character translation table.
6109 (kwsexec): Call bmexec for case-insensitive matching.
6110 (kwsprep): Change the `if' condition.
6111
61122014-04-21 Paul Eggert <eggert@cs.ucla.edu>
6113
6114 grep: -P now rejects invalid input sequences in UTF-8 locales
6115 See <http://bugs.gnu.org/17245> and <http://bugs.exim.org/1468>.
6116 * NEWS: Document this.
6117 * src/pcresearch.c (Pexecute): Do not use PCRE_NO_UTF8_CHECK,
6118 as this leads to undefined behavior when the input is not UTF-8.
6119 * tests/pcre-infloop, tests/pcre-invalid-utf8-input:
6120 Exit status is now 2, not 1, when grep -P is given invalid UTF-8
6121 data in a UTF-8 locale.
6122
6123 dfa: minor improvements to previous patch
6124 * src/dfa.c (dfamust): Use &=, not if-then.
6125 * src/dfa.h (struct dfamust):
6126 * src/dfasearch.c (begline, hwsmusts): Use bool for boolean.
6127 * src/dfasearch.c (kwsmusts):
6128 * src/kwsearch.c (Fcompile): Prefer decls after statements.
6129 * src/dfasearch.c (kwsmusts): Avoid conditional branch.
6130 * src/kwsearch.c (Fcompile): Unify the two calls to kwsincr.
6131
61322014-04-21 Norihiro Tanaka <noritnk@kcn.ne.jp>
6133
6134 grep: speed-up for exact matching with begline and endline constraints.
6135 dfamust turns on the flag when a state exactly matches the proposed one.
6136 However, when the state has begline and/or endline constraints, it
6137 turns the flag off.
6138
6139 This patch makes it possible to match a state exactly, even if the state
6140 has begline and/or endline constraints. If an exact string has one of
6141 these constraints, the string with eolbyte added at its head and/or foot
6142 is passed to kwsincr(). In addition, if it has a begline constraint,
6143 searching starts from just before the position of the text.
6144
6145 * src/dfa.c (variable must): New members `begline' and `endline'.
6146 (dfamust): Take begline and endline constraints into account.
6147 * src/dfa.h (struct dfamust): New members `begline' and `endline'.
6148 * src/dfasearch.c (kwsmusts): If an exact string has a begline constraint,
6149 start searching from just before the position of the text.
6150 (EGexecute): Same as above.
6151 * src/kwsearch.c (Fexecute): Same as above.
6152
61532014-04-20 Paul Eggert <eggert@cs.ucla.edu>
6154
6155 dfa: fix bug that caused NUL to be mishandled in patterns
6156 This bug was introduced in the early-2012 patches that fixed some
6157 context-handling bugs. Bisecting found commit
6158 d8951d3f4e1bbd564809aa8e713d8333bda2f802 (2012-02-05 18:00:43 +0100),
6159 but it appears the underlying problem was introduced in commit
6160 8b47c4cf6556933f59226c234b0fe984f6c77dc7 (2012-01-03 11:22:09 +0100).
6161 * NEWS: Mention bug fix.
6162 * src/dfa.c (char_context): Consider NUL to be a newline only if -z.
6163 * tests/Makefile.am (TESTS): Add null-byte.
6164 * tests/null-byte: New file.
6165
61662014-04-19 Jim Meyering <meyering@fb.com>
6167
6168 build: reenable some compiler warning options
6169
61702014-04-18 Paul Eggert <eggert@cs.ucla.edu>
6171
6172 dfa: fix pointer type conversion bug
6173 The code converted between size_t * and ptrdiff_t *, which wasn't
6174 diagnosed by modern x86-64 GCC but isn't portable. Problem
6175 reported by Norihiro Tanaka in <http://bugs.gnu.org/17136#31>.
6176 * configure.ac (WERROR_CFLAGS): Don't add -Wno-pointer-sign.
6177 We want GCC to diagnose pointer signedness problems, as they
6178 violate the C standard and other compilers no doubt complain too.
6179 * src/dfa.c (struct dfa): Change type of salloc to size_t.
6180 (realloc_trans_if_necessary): Convert signed value to size_t before
6181 passing its address to x2nrealloc. Changing the type of tralloc
6182 to size_t might have led to problems elsewhere.
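
 The "convert before passing the address" fix can be illustrated with a
 small sketch. Everything here is hypothetical: grow_sketch merely
 stands in for an allocator that updates a size_t count through a
 pointer (as x2nrealloc does), it returns NULL on failure instead of
 exiting, and it omits overflow checks for brevity.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for an allocator that doubles *PN and resizes P to hold
   *PN objects of size S.  Returns NULL (leaving *PN alone) on failure.  */
void *
grow_sketch (void *p, size_t *pn, size_t s)
{
  size_t n = *pn ? *pn * 2 : 1;
  void *q = realloc (p, n * s);
  if (q)
    *pn = n;
  return q;
}

/* The caller keeps its count in a signed ptrdiff_t.  Rather than
   casting 'ptrdiff_t *' to 'size_t *', which is not portable, copy
   the value into a size_t, pass that object's address, and copy the
   result back.  */
int *
grow_signed_sketch (int *p, ptrdiff_t *palloc)
{
  size_t n = (size_t) *palloc;
  int *q = grow_sketch (p, &n, sizeof *p);
  if (q)
    *palloc = (ptrdiff_t) n;
  return q;
}
```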
6183
61842014-04-18 Jim Meyering <meyering@fb.com>
6185
6186 maint: Revert "dfa: avoid new NULL dereference"
6187 This reverts commit 5190041fe515743ef4545abf287d243bc025c701.
6188 It was only a bug if one neglected to update to the latest gnulib.
6189 With the newer x2nrealloc, there is no problem.
6190
6191 dfa: avoid new NULL dereference
6192 * src/dfa.c (dfa_charclass_index): Restore a "+ 1" mistakenly omitted
6193 during recent improvements. Introduced in v2.18-66-g6a60fd5.
6194
61952014-04-17 Paul Eggert <eggert@cs.ucla.edu>
6196
6197 dfa: minor cleanup
6198 * src/dfa.c (MAX): Remove; no longer used.
6199
62002014-04-17 Norihiro Tanaka <noritnk@kcn.ne.jp>
6201
6202 dfa: speed up by checking multibyte characters on demand
6203 If dfaexec() runs in non-UTF8 locales, length and wide character
6204 representation are checked for all characters of a line in an input
6205 string. However, if matched early in the line, results for remaining
6206 characters are wasted.
6207
6208 This patch checks multibyte characters on demand. It should work
6209 faster for early matches, and reduces memory requirements.
6210
6211 * src/dfa.c (struct dfa): Remove members mblen_buf, nmblen_buf,
6212 inputwcs, ninputwcs. All uses removed.
6213 (buf_begin, buf_end, prepare_wc_buf): Remove. All uses removed.
6214 (SKIP_REMAINS_MB_IF_INITIAL_STATE): Remove. This is now expanded
6215 when used.
6216 (match_anychar, match_mb_charset, check_matching_with_multibyte_ops):
6217 New arg wc, mbclen. Remove arg idx. All uses changed.
6218 (transit_state_consume_1char): New arg wc. All uses changed.
6219 (transit_state): New arg 'end'. All uses changed.
6220
62212014-04-17 Paul Eggert <eggert@cs.ucla.edu>
6222
6223 dfa: trans reallocation microoptimization
6224 * src/dfa.c (realloc_trans_if_necessary):
6225 Help the compiler avoid unnecessary reloads.
6226
6227 dfa: simplify dfamust initialization
6228 * src/dfa.c (dfamust): Don't initialize musts twice.
6229 Use zcalloc, not xmalloc followed by zeroing.
6230 Make result a const pointer.
6231
6232 dfa: simplify freelist
6233 * src/dfa.c (freelist): Don't null out array while freeing its
6234 pointers; the caller can do that if needed.
6235 (resetmust): Null out zeroth entry of array.
6236
6237 dfa: avoid duplicate strlen when allocating memory
6238 * src/dfa.c (dfamust): Use xstrdup, not strlen (twice) + xmemdup.
6239
6240 dfa: simplify memory allocation
6241 * src/dfa.c (icatalloc, freelist, enlist, comsubs, addlists, inboth)
6242 (dfamust): Don't worry about null arguments or results,
6243 as memory allocators no longer can return null pointers.
6244 (dfamust): Invoke malloc just once when building a concatenated string.
6245
6246 dfa: simplify position set and element count allocation
6247 * src/dfa.c (dfaanalyze): Allocate position set info all in one go,
6248 and similarly for element count info.
6249
6250 dfa: simplify multibyte_prop allocation
6251 * src/dfa.c (struct dfa): Simplify by removing nmultibyte_prop;
6252 it should always be the same as talloc. All uses changed.
6253
6254 dfa: simplify range char allocation
6255 * src/dfa.c (struct dfa): Simplify by allocating one array of ranges
6256 rather than one for range starts and another for range ends.
6257 All uses changed.
6258
6259 dfa: simplify transition table allocation
6260 * src/dfa.c (struct dfa): Remove member 'realtrans', as it can
6261 be computed from 'trans'. All uses changed.
6262 (realloc_trans_if_necessary): Move earlier, to avoid a forward decl.
6263 Use x2nrealloc to compute new size, rather than doing it by hand,
6264 which omits a check for unlikely overflow.
6265 (realloc_trans_if_necessary, dfafree): Adjust to the fact that
6266 d->trans now might be either NULL, or 1 + the pointer to free.
6267 (build_state, build_state_zero): Use realloc_trans_if_necessary
6268 instead of duplicating its code.
6269
6270 dfa: better size-overflow check
6271 * src/dfa.c (dfasuperset): Let xnmalloc do the multiplication,
6272 to check for size arithmetic overflow better.
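
 Why letting the allocator do the multiplication helps can be shown with
 a minimal sketch of an overflow-checked array allocator. The name
 checked_nmalloc is hypothetical. Unlike gnulib's xnmalloc, which does
 not return on failure, this version returns NULL so that the check is
 easy to observe.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate an array of N objects of SIZE bytes each.  Return NULL if
   N * SIZE would overflow size_t or if malloc fails.  */
void *
checked_nmalloc (size_t n, size_t size)
{
  if (size != 0 && n > SIZE_MAX / size)
    return NULL;                /* N * SIZE would wrap around.  */
  return malloc (n * size);
}
```

 A caller that computes n * size itself and passes the product to
 malloc gets no such check: the product silently wraps and the
 allocation is too small.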
6273
6274 dfa: avoid unnecessary work and other initialization
6275 * src/dfa.c (dfaanalyze, dfainit):
6276 Don't bother allocating when x2nrealloc will do it for us.
6277 (dfastate): Allocate grps and labels on the stack, as their
6278 size is known at compile time.
6279 (build_state): Use xmalloc, not xnmalloc, since the multiplication
6280 can be done at compile-time.
6281
6282 dfa: clarify memory allocation and port to IRIX
6283 This change was prompted by a porting problem:
6284 IRIX defines its own MALLOC macro, which clashes with ours.
6285 More generally, the MALLOC etc. macros are confusing, as they
6286 look like functions but do not have C-function semantics.
6287 A functional style makes the code easier to read, and though
6288 it lengthens the code a bit here it'll make other
6289 simplifications easier.
6290 * src/dfa.c (XNMALLOC, XCALLOC, CALLOC, MALLOC, REALLOC): Remove.
6291 All uses replaced by xnmalloc etc.
6292 (REALLOC_IF_NECESSARY): Remove; all uses replaced by ....
6293 (maybe_realloc): New function.
6294 (copy, merge): Free and allocate rather than realloc, as we
6295 needn't save the contents.
6296
62972014-04-14 Jim Meyering <meyering@fb.com>
6298
6299 tests: detect an infloop-inducing bug in grep -P (pcre-8.35)
6300 * tests/pcre-infloop: New test.
6301 * tests/Makefile.am (TESTS): Add it.
6302
63032014-04-12 Paul Eggert <eggert@cs.ucla.edu>
6304
6305 build: update gnulib submodule to latest
6306
63072014-04-11 Paul Eggert <eggert@cs.ucla.edu>
6308
6309 grep: improvements for the open-CSET patch
6310 * src/dfa.c (dfamust): Simplify by removing some duplicate code.
6311 Optimize patterns like [aaa] even when not case-folding.
6312 Avoid an unnecessary copy of the charclass.
6313
63142014-04-11 Norihiro Tanaka <noritnk@kcn.ne.jp>
6315
6316 grep: open CSET and transform into uppercase when MB_CUR_MAX == 1
6317 In unibyte locales with -i, kwset matching isn't helpful, because
6318 dfamust doesn't extract the CSET entries. Fix dfamust so that it
6319 does that, which makes it possible to take out a longer fixed string
6320 from tokens.
6321 * src/dfa.c (dfamust): open CSET and transform into uppercase
6322 when MB_CUR_MAX == 1.
6323
63242014-04-11 Paul Eggert <eggert@cs.ucla.edu>
6325
6326 grep: cleanup for HAS_DOS_FILE_CONTENTS issue
6327 While cleaning up the empty-string fix, I noticed that one part of
6328 the code worried about CRLF in pattern files whereas another part
6329 did not. Fix this by using the same approach in both places,
6330 and make the CRLF code more modular in the process.
6331 * src/dosbuf.c (dos_binary, dos_unix_byte_offsets): New functions.
6332 (undossify_input, dossified_pos): Do nothing if ! O_BINARY.
6333 * src/grep.c: Always include dosbuf.c so that the code is
6334 checked statically even on non-DOS hosts.
6335 (dos_binary, dos_unix_byte_offsets): New decls.
6336 (undossify_input): Declare unconditionally.
6337 * src/grep.c (fillbuf, print_line_head, main):
6338 * src/kwsearch.c (Fcompile):
6339 Simplify by not worrying about HAVE_DOS_FILE_CONTENTS.
6340 * src/grep.c (main): fopen with "rt" if O_TEXT; this is simpler
6341 than worrying about HAVE_DOS_FILE_CONTENTS elsewhere.
6342 * src/system.h (HAVE_DOS_FILE_CONTENTS): Remove.
6343
6344 grep: cleanup for empty-string fix
6345 * NEWS: Document it.
6346 * src/dfasearch.c (GEAcompile):
6347 * src/kwsearch.c (Fcompile):
6348 Use C99-style decls to simplify. Avoid duplicate code.
6349 * tests/empty-line: Add some more tests like this.
6350
63512014-04-11 Norihiro Tanaka <noritnk@kcn.ne.jp>
6352
6353 grep: no match for the empty string included in multiple patterns
6354 * src/dfasearch.c (GEAcompile): Fix it.
6355 * src/kwsearch.c (Fcompile): Fix it.
6356
63572014-04-08 Paul Eggert <eggert@cs.ucla.edu>
6358
6359 grep: remove bool_bf
6360 The extra complexity of this microoptimization wasn't ever much help,
6361 and currently it generated bigger code with gcc -O2 (x86-64).
6362 * src/dfa.c (bool_bf): Remove. All uses replaced by plain 'bool',
6363 without a bitfield.
6364
63652014-04-08 Jim Meyering <meyering@fb.com>
6366
6367 maint: avoid sc_po_check syntax-check failure (kwset.c)
6368 * po/POTFILES.in: Remove kwset.c from this list, since it
6369 no longer contains a translatable diagnostic.
6370
63712014-04-08 Paul Eggert <eggert@cs.ucla.edu>
6372
6373 grep: port better to hosts with nonstandard nl_langinfo
6374 On some hosts, nl_langinfo returns strings other than "UTF-8" when
6375 UTF-8 is used, and (worse) returns "UTF-8" even if the encoding is
6376 single-byte. Work around these problems by trying a sample
6377 character instead.
6378 * src/dfa.c, src/pcresearch.c, src/searchutils.c:
6379 Don't include <langinfo.h>.
6380 * src/dfa.c (using_utf8): Test for UTF-8 by trying a character
6381 rather than by invoking nl_langinfo (CODESET); this is more
6382 portable in practice, and removes a dependency on
6383 HAVE_LANGINFO_CODESET.
6384 * src/pcresearch.c: Include dfa.h, for using_utf8.
6385 (Pcompile): Use using_utf8 rather than nl_langinfo.
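
 Testing for UTF-8 "by trying a character" can be sketched as below.
 This is an illustration under stated assumptions, not necessarily the
 exact code in src/dfa.c: the function name is hypothetical, and the
 comparison with 0x100 assumes that wchar_t values are Unicode code
 points in UTF-8 locales.

```c
#include <stdbool.h>
#include <string.h>
#include <wchar.h>

/* Return true if the current locale's encoding appears to be UTF-8.
   Instead of asking nl_langinfo (CODESET), decode the two-byte UTF-8
   encoding of U+0100 and check that it yields that character.  */
bool
locale_looks_like_utf8 (void)
{
  wchar_t wc;
  mbstate_t mbs;
  memset (&mbs, 0, sizeof mbs);         /* Initial conversion state.  */
  return mbrtowc (&wc, "\xc4\x80", 2, &mbs) == 2 && wc == 0x100;
}
```

 The result depends on the locale in effect, so a program would call
 setlocale (LC_ALL, "") before using it.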
6386
63872014-04-07 Paul Eggert <eggert@cs.ucla.edu>
6388
6389 grep: prefer bool in DFA internals
6390 * src/dfa.c (bool_bf): New type.
6391 (dfa_state): Use it, as this seems to generate slightly better
6392 code with GCC.
6393 (struct mb_char_classes, struct dfa, equal, case_fold, dfasyntax)
6394 (laststart, parse_bracket_exp, lex, dfaparse, dfaanalyze, dfastate)
6395 (match_mb_charset, dfamust):
6396 Use bool for boolean.
6397 (using_utf8) [!HAVE_LANGINFO_CODESET]: Tune.
6398 (dfaanalyze): Prefer & to && and | to || on booleans; it's simpler here.
6399 (dfastate): Simplify charclass nonzero testing. Redo has_mbcset
6400 test so that the compiler's more likely to optimize it.
6401
64022014-04-07 Norihiro Tanaka <noritnk@kcn.ne.jp>
6403
6404 grep: prefer regex to DFA for ANYCHAR in multibyte locales
6405 * src/dfa.c (dfa_state): New member has_mbcset.
6406 Rename backref to has_backref, and make it of type bool too.
6407 All uses changed.
6408 (state_index, dfastate): Initialize new member.
6409 (dfaexec): Prefer regex to DFA for ANYCHAR in multibyte locales.
6410
64112014-04-07 Paul Eggert <eggert@cs.ucla.edu>
6412
6413 grep: remove trivial_case_ignore
6414 This optimization is no longer needed, given the other
6415 optimizations recently installed. Derived from a patch by
6416 Norihiro Tanaka; see <http://bugs.gnu.org/17019>.
6417 * bootstrap.conf (gnulib_modules): Remove assert-h.
6418 * src/dfa.c (CASE_FOLDED_BUFSIZE): Move here from dfa.h.
6419 Remove now-unnecessary static assert.
6420 (case_folded_counterparts): Now static.
6421 * src/dfa.h (CASE_FOLDED_BUFSIZE, case_folded_counterparts):
6422 Remove decls; no longer public.
6423 * src/dfasearch.c (kwsmusts): Use kwset even if MB_CUR_MAX > 1
6424 and case-insensitive.
6425 * src/grep.c (MBRTOWC, WCRTOMB): Remove.
6426 (fgrep_to_grep_pattern): Use mbrtowc, not MBRTOWC.
6427 (trivial_case_ignore): Remove; this optimization is no longer needed.
6428 All uses removed.
6429
6430 grep: simplify memory allocation in kwset
6431 * src/kwset.c: Include kwset.h first, to check its prereqs.
6432 Include xalloc.h, for xmalloc.
6433 (kwsalloc): Use xmalloc, not malloc, so that the caller need not
6434 worry about memory allocation failure.
6435 (kwsalloc, kwsincr, kwsprep): Do not worry about obstack_alloc
6436 returning NULL, as that's not possible.
6437 (kwsalloc, kwsincr, kwsprep, bmexec, cwexec, kwsexec, kwsfree):
6438 Omit unnecessary conversion between struct kwset * and kwset_t.
6439 (kwsincr, kwsprep): Return void since memory-allocation failure is
6440 not possible now. All uses changed.
6441 * src/kwset.h: Include <stddef.h>, for size_t, so that this
6442 include file doesn't require other files to be included first.
6443
6444 grep: minor cleanups for Galil speedups
6445 * src/kwset.c: Update citations.
6446 Include stdbool.h.
6447 (kwsincr, kwsprep): Clarify by using C99 decls after statements.
6448 (kwsprep): Clarify by using MIN. Avoid a couple of buffer copies
6449 when !TRANS.
6450 (bmexec): Use bool for boolean. Prefer "continue;" to ";".
6451
64522014-04-07 Norihiro Tanaka <noritnk@kcn.ne.jp>
6453
6454 grep: use the Galil rule for Boyer-Moore algorithm in KWSet
6455 The Boyer-Moore algorithm is O(m*n), which means it may be much
6456 slower than the DFA. Its Galil rule variant is O(n) and increases
6457 efficiency in the typical case; it skips sections that are known
6458 to match and does not compare more than once for a position in the text.
6459 To use the Galil rule, look for the delta2 shift at each position
6460 from the trie instead of the 'mind2' value.
6461 * src/kwset.c (struct kwset): Replace member 'mind2' with 'shift'.
6462 (kwsprep): Look for the delta2 shift.
6463 (bmexec): Use it.
6464
64652014-04-06 Paul Eggert <eggert@cs.ucla.edu>
6466
6467 grep: cleanup DFA superset optimization
6468 * src/dfa.c (dfa_charclass_index): New function, with body of
6469 old dfa_charclass but with an extra parameter D.
6470 (charclass_index): Reimplement in terms of dfa_charclass_index.
6471 (dfahint): Clarify.
6472 (dfasuperset): Do not assign to 'dfa' static variable. Instead,
6473 use a local, and use the new dfa_charclass_index function. This
6474 doesn't fix any bugs, but it's clearer. Initialize a few more
6475 members, to simplify dfafree. Copy the charclasses with
6476 just one memcpy call. Don't assign nonnull to D->superset until
6477 it's known to be valid; that's simpler.
6478 (dfafree, dfaalloc): Simplify based on dfasuperset initializations.
6479 * src/dfa.h (dfahint): Add comment.
6480 * src/dfasearch.c (EGexecute): Simplify use of memchr.
6481 Simplify by using memrchr. Fix typo that could cause a buffer
6482 read overrun.
6483
64842014-04-06 Norihiro Tanaka <noritnk@kcn.ne.jp>
6485
6486 grep: optimization with the superset of DFA
6487 The superset of a DFA is like the DFA, except that for speed
6488 ANYCHAR, MBCSET and BACKREF are replaced by (CSET full bits) STAR,
6489 and mb_cur_max is 1. For example, for 'a\(b\)c\1':
6490 original: a b CAT c CAT BACKREF CAT
6491 superset: a b CAT c CAT CSET STAR CAT (The CSET has all bits set.)
6492 If a string matches a DFA, it matches the DFA's superset.
6493 Using the superset to filter can dramatically improve performance,
6494 over 200x in some cases. See <http://bugs.gnu.org/16966>.
6495 * src/dfa.c (struct dfa): New member 'superset'.
6496 (dfahint, dfasuperset): New functions.
6497 (dfacomp): Create and analyze the superset.
6498 (dfafree): Free only non-NULL items.
6499 (dfaalloc): Initialize superset member.
6500 (dfaoptimize): If succeed in optimization for UTF-8 locale, don't use
6501 the superset.
6502 * src/dfa.h (dfahint): New decl.
6503 * src/dfasearch.c (EGexecute): Use dfahint.
6504
65052014-04-06 Jim Meyering <meyering@fb.com>
6506
6507 build: avoid OS X 10.8.5 build failure due to lack of static_assert
6508 * bootstrap.conf (gnulib_modules): Add assert-h, to accommodate the
6509 new use of static_assert on systems lacking support for that construct.
6510 Without this change, compilation of dfa.c failed on OS X 10.8.5 with
6511 gcc-4.9.0 20140324. We should be using gnulib's assert-h module,
6512 regardless, for its nominal improved portability, since grep includes
6513 assert.h and uses assert.
6514
65152014-04-05 Norihiro Tanaka <noritnk@kcn.ne.jp>
6516
6517 grep: fix performance bug with regex in line-by-line mode
6518 * src/dfasearch.c (EGexecute): Match line-by-line with regex.
6519
65202014-04-05 Paul Eggert <eggert@cs.ucla.edu>
6521
6522 grep: minor improvements to previous patch
6523 * src/dfa.c (MAX): New macro.
6524 (match_anychar, match_mb_charset, transit_state_consume_1char):
6525 Use it to simplify assignments.
6526 (SKIP_REMAINS_MB_IF_INITIAL_STATE): Prefer != 0 for unsigned.
6527 (free_mbdata): Omit an unnecessary 'free'.
6528
65292014-04-05 Norihiro Tanaka <noritnk@kcn.ne.jp>
6530
6531 grep: reuse multibyte DFA buffers in non-UTF8 locales
6532 * src/dfa.c (struct dfa): New members 'mblen_buf', 'nmblen_buf',
6533 'inputwcs', 'ninputwcs', 'mb_follows' and 'mb_match_lens'.
6534 (mblen_buf, inputwcs): Remove static vars.
6535 (SKIP_REMAINS_MB_IF_INITIAL_STATE, match_anychar, match_mb_charset)
6536 (transit_state_consume_1char, transit_state, prepare_wc_buf):
6537 Use new members instead of global variables.
6538 (check_matching_with_multibyte_ops): Use new members
6539 instead of new allocation.
6540 (dfaexec): Initialize new members.
6541 (free_mbdata): Free new members.
6542
65432014-04-05 Paul Eggert <eggert@penguin.cs.ucla.edu>
6544
6545 grep: simplify dfa.c by having it not include mbsupport.h directly
6546 * src/mbsupport.h: Remove.
6547 * src/Makefile.am (noinst_HEADERS): Remove mbsupport.h.
6548 * src/dfa.c, src/grep.c, src/search.h: Don't include mbsupport.h.
6549 * src/dfa.c: Include wchar.h and wctype.h unconditionally, as
6550 this simplifies the use of dfa.c in grep, and it does no harm
6551 in gawk.
6552 (setlocale, static_assert): Remove gawk-specific hacks, as
6553 gawk now does these itself.
6554 (struct dfa, dfambcache, mbs_to_wchar)
6555 (is_valid_unibyte_character, setbit_wc, using_utf8, FETCH_WC)
6556 (addtok_wc, add_utf8_anychar, atom, state_index, epsclosure)
6557 (dfaanalyze, dfastate, prepare_wc_buf, dfaoptimize, dfafree, dfamust):
6558 * src/dfasearch.c (EGexecute):
6559 * src/grep.c (main):
6560 * src/searchutils.c (mbtoupper):
6561 Assume MBS_SUPPORT.
6562
65632014-04-01 Norihiro Tanaka <noritnk@kcn.ne.jp>
6564
6565 dfa: avoid re-building a state built previously
6566 * src/dfa.c (dfaexec): Avoid rebuilding a state built previously.
6567
65682014-03-28 Paul Eggert <eggert@cs.ucla.edu>
6569
6570 dfa: improve port to freestanding DJGPP
6571 Suggested by Aharon Robbins (Bug#17056).
6572 * src/dfa.c (setlocale) [!LC_ALL]: Return NULL, not "C",
6573 reverting part of a recent change.
6574 (using_simple_locale): Return true if setlocale returns null.
6575
65762014-03-28 Jim Meyering <meyering@fb.com>
6577
6578 tests: placate "make syntax-check" re compare arg ordering
6579 * tests/euc-mb: Reverse order of arguments to compare.
6580 Be consistent in ordering compare arguments: expected followed
6581 by actual.
6582
65832014-03-28 Paul Eggert <eggert@cs.ucla.edu>
6584
6585 dfa: avoid an indirection and port wint_t usage
6586 * src/dfa.c (struct dfa): Put mbrtowc_cache directly into struct dfa
6587 rather than having a pointer; this saves a malloc and an indirection.
6588 All uses changed.
6589 (dfambcache): Port to hosts where wint_t * can't be cast to wchar_t *.
6590
65912014-03-28 Norihiro Tanaka <noritnk@kcn.ne.jp>
6592
6593 grep: take mbrtowc_cache into new member of struct dfa
6594 When more than one struct dfa is in use at the same time, their mbrtowc
6595 caches may conflict. So, move mbrtowc_cache into a new member of struct
6596 dfa, giving each DFA its own mbrtowc cache.
6597
6598 * src/dfa.c (struct dfa): New member `mbrtowc_cache'.
6599 (dfambcache): Rename from build_mbrtowc_cache. Add dependency on struct dfa.
6600 (mbs_to_wchar): Add dependency on struct dfa.
6601 (FETCH_WC): Use it.
6602 (prepare_wc_buf): Use it. Add dependency on struct dfa.
6603 (dfacomp): Call it.
6604 (dfafree): Release it.
6605
66062014-03-28 Paul Eggert <eggert@cs.ucla.edu>
6607
6608 dfa: cache results of mbrtowc for speed
6609 Idea suggested by Norihiro Tanaka in Bug#16842.
6610 * src/dfa.c (mbrtowc_cache): New static var.
6611 (build_mbrtowc_cache, mbs_to_wchar): New functions.
6612 (FETCH_WC) [MBS_SUPPORT]: Speed up by using mbs_to_wchar
6613 instead of mbrtowc and wctob.
6614 (FETCH_WC) [!MBS_SUPPORT]: Rewrite in terms of old FETCH macro.
6615 (FETCH): Remove; no longer used.
6616 (lex): Simplify by avoiding the need for FETCH.
6617 (prepare_wc_buf) [MBS_SUPPORT]: Speed up by using mbs_to_wchar.
6618 Simplify the loop.
6619 (dfacomp): Initialize the cache.
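
 A per-byte mbrtowc cache of this kind can be sketched as follows. The
 function name and the table layout are assumptions made for
 illustration. The table maps each byte that forms a complete character
 on its own to its wide character, and every other byte to WEOF.

```c
#include <limits.h>
#include <string.h>
#include <wchar.h>

/* Fill CACHE so that CACHE[b] is the wide character for the single
   byte B if B is a complete character by itself in the current
   locale, and WEOF otherwise.  */
void
build_byte_cache (wint_t cache[UCHAR_MAX + 1])
{
  for (int i = 0; i <= UCHAR_MAX; i++)
    {
      char c = (char) i;
      wchar_t wc;
      mbstate_t mbs;
      memset (&mbs, 0, sizeof mbs);     /* Fresh state for each byte.  */
      size_t r = mbrtowc (&wc, &c, 1, &mbs);
      /* R is 0 for NUL, 1 for any other one-byte character, and
         (size_t) -1 or (size_t) -2 for invalid or incomplete input.  */
      cache[i] = r <= 1 ? (wint_t) wc : WEOF;
    }
}
```

 A converter can then index the table with the next input byte and
 call mbrtowc only when the entry is WEOF, which avoids the library
 call for the common single-byte case.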
6620
66212014-03-27 Norihiro Tanaka <noritnk@kcn.ne.jp>
6622
6623 grep: perform the kwset-helping DFA match in narrower range
6624 When kwsexec gives us the offset of a potential match, we compute
6625 line begin/end and then run the DFA matcher to see if there really
6626 is a match on that line. When the beginning of the line, BEG, is
6627 not on a multibyte character boundary, advance BEG until it is on such
6628 a boundary, before running the DFA search.
6629 * src/dfasearch.c (EGexecute): As above. Add a comment.
6630 * tests/euc-mb: Add a test case that exercises this code.
6631 This addresses http://debbugs.gnu.org/17095.
6632
66332014-03-26 Jim Meyering <meyering@fb.com>
6634
6635 maint: fix "make dist"
6636 * src/Makefile.am (egrep fgrep): Specify egrep.sh via
6637 $(srcdir)/egrep.sh, so non-srcdir builds work once again.
6638
66392014-03-26 Paul Eggert <eggert@penguin.cs.ucla.edu>
6640
6641 dfa: improve port to freestanding DJGPP
6642 * src/dfa.c (setlocale) [!LC_ALL]: Return "C", not NULL (Bug#17056).
6643 (using_simple_locale): Store setlocale result in a ptr-to-const.
6644
6645 egrep, fgrep: improve diagnostics from shell scripts
6646 This should fix Bug#17098.
6647 * src/Makefile.am (EXTRA_DIST): Add egrep.sh.
6648 (egrep fgrep): Depend on egrep.sh and Makefile.
6649 Build from new file egrep.sh, as this makes the build process
6650 easier to follow. Arrange for $0 to look nicer in subgrep.
6651 * src/egrep.sh: New file.
6652
66532014-03-23 Paul Eggert <eggert@cs.ucla.edu>
6654
6655 dfa: avoid undefined behavior
6656 * src/dfa.c (FETCH_WC, addtok_wc): Don't rely on undefined behavior
6657 when converting an out-of-range value to 'int'.
6658 (FETCH_WC, prepare_wc_buf): Don't rely on conversion state after
6659 mbrtowc returns a special value, as it's undefined for (size_t) -1.
6660 (prepare_wc_buf): Simplify test for valid character.
6661
6662 grep: fix and simplify grep -iF optimization
6663 * src/grep.c (check_any_alphabets): Remove.
6664 (fgrep_to_grep_pattern): Fix problems when mbrtowc returns -1 or -2.
6665 Simplify a bit.
6666 (main): Don't bother optimizing 'grep -iF PAT' when PAT contains no
6667 alphabetics; it's so rare it's not worth the complexity.
6668
66692014-03-23 Norihiro Tanaka <noritnk@kcn.ne.jp>
6670
6671 grep: optimize fgrep by changing the matcher to the grep matcher
6672 The fgrep matcher uses only the kwset engine. However, that is very slow
6673 for case-insensitive matching in multibyte locales.
6674
6675 So, if the matcher is fgrep, matching is case-insensitive, and the keys
6676 include any alphabetic characters, switch to the grep matcher by escaping
6677 the keys. OTOH, if no key is alphabetic, turn the match_icase flag off.
6678
6679 I prepared the following input to measure the performance.
6680
6681 yes $(printf '%078dm' 0)| head -1000000 | tr 0 a > in
6682 A=`printf '\xef\xbc\xa1'` # FULLWIDTH LATIN CAPITAL LETTER A
6683
6684 I run three tests with this patch (best-of-5 trials):
6685
6686 env LC_ALL=en_US.UTF-8 time -p src/fgrep -i "$A" in
6687 real 8.54 user 7.13 sys 1.16
6688
6689 Back out that commit (temporarily), recompile, and rerun the experiment:
6690
6691 env LC_ALL=en_US.UTF-8 time -p src/fgrep -i "$A" in
6692 real 0.07 user 0.02 sys 0.05
6693
6694 * src/fgrep.c (Gcompile): New function.
6695 * src/main.c (check_any_alphabets): New function.
6696 (fgrep_to_grep_pattern): New function.
6697 (main): Use them.
6698
66992014-03-23 Paul Eggert <eggert@cs.ucla.edu>
6700
6701 egrep, fgrep: go back to shell scripts
6702 Although egrep's and fgrep's switch from shell scripts to
6703 executables may have made sense in 2005, it complicated
6704 maintenance and recently has caused subtle performance bugs.
6705 Go back to the old way of doing things, as it's simpler and more
6706 easily separated from the mainstream implementation. This should
6707 be good enough nowadays, as POSIX has withdrawn egrep/fgrep and
6708 portable applications should be using -E/-F anyway.
6709 * po/POTFILES.in: Remove src/egrep.c, src/fgrep.c, src/main.c.
6710 * src/Makefile.am (bin_PROGRAMS): Remove egrep, fgrep.
6711 (bin_SCRIPTS): New macro.
6712 (grep_SOURCES): Move searchutils.c, dfa.c, dfasearch.c, kwset.c,
6713 kwsearch.c, pcresearch.c here from libgrep_a_SOURCES.
6714 (egrep_SOURCES, fgrep_SOURCES, noinst_LIBRARIES, libgrep_a_SOURCES):
6715 Remove.
6716 (LDADD): Remove libgrep.a.
6717 (egrep, fgrep): New rules.
6718 (CLEANFILES): New macro.
6719 * src/grep.c: Rename from src/main.c.
6720 (usage, setmatcher, main):
6721 Simplify, since there's now just one executable.
6722 (Gcompile, Ecompile, Acompile, GAcompile, PAcompile, matchers):
6723 Move here from the (removed) src/grep.c.
6724 (compile_fp_t, execute_fp_t, struct matcher, matchers):
6725 Move here from src/grep.h, as they no longer need to be public.
6726 (struct matcher.name): Avoid one level of indirection/relocation.
6727 (do_execute, main): Fix a performance bug when it was compiled
6728 as 'fgrep', due to confusion about which matcher was which.
6729 (main): Fix a performance bug with -P, likewise.
6730 * src/grep.h (before_options, after_options): Remove.
6731 * src/egrep.c, src/fgrep.c, src/grep.c: Remove.
6732
6733 dfa: port to freestanding DJGPP (Bug#17056)
6734 * src/dfa.c (setlocale) [!LC_ALL]: Define a dummy.
6735
67362014-03-16 Jim Meyering <meyering@fb.com>
6737
6738 tests: avoid false-positive failure on some AMD CPUs
6739 * tests/mb-non-UTF8-performance: Avoid false-positive failure
6740 when run on certain AMD processors.
6741
67422014-03-10 Jim Meyering <meyering@fb.com>
6743
6744 tests: make a performance-measuring test less system-sensitive
6745 Andreas Schwab reported in http://debbugs.gnu.org/16941
6746 that this test would timeout and fail on m68k-suse-linux.
6747 Rather than testing absolute duration with a limit tuned
6748 to today's hardware, compare performance of grep with LC_ALL=C
6749 against that same command using LC_ALL=ja_JP.eucJP.
6750 * tests/init.cfg (require_hi_res_time_): New function.
6751 * tests/mb-non-UTF8-performance: Rewrite to use it:
6752 record absolute duration D of the first (normally much faster)
6753 command, and set a timeout of 8*D for the command running in
6754 an affected locale.
6755
67562014-03-09 Paul Eggert <eggert@cs.ucla.edu>
6757
6758 maint: pacify 'make dist'
6759 * src/dfa.c (parse_bracket_exp): Reindent with spaces.
6760 * src/dfa.h (case_folded_counterparts): Prefix decl with 'extern'.
6761 * src/main.c: Don't include assert.h.
6762
67632014-03-07 Paul Eggert <eggert@cs.ucla.edu>
6764
6765 fgrep: fix case-fold incompatibility with plain 'grep'
6766 fgrep converted to lowercase, whereas the regex code converted
6767 to uppercase. The resulting behaviors don't agree in offbeat
6768 cases like Greek sigmas and Turkish Is. Fix this by changing
6769 fgrep to agree with the regex code.
6770 * src/kwsearch.c (Fcompile, Fexecute):
6771 * src/searchutils.c (kwsinit, mbtoupper):
6772 Convert to uppercase, not to lowercase, for compatibility with
6773 plain 'grep'.
6774 * src/search.h, src/searchutils.c (mbtoupper):
6775 Rename from mbtolower, since it now converts to uppercase.
6776 All uses changed.
6777 * tests/case-fold-titlecase: Add tests for this.
6778
6779 grep: fix case-fold mismatches between DFA and regex
6780 The DFA code and the regex code didn't use the same semantics for
6781 case-folding. The regex code says that the data char d matches
6782 the pattern char p if uc (d) == uc (p). POSIX is unclear in this
6783 area; the simplest fix for now is to change the DFA code to agree
6784 with the regex code. See <http://bugs.gnu.org/16919>.
6785 * src/dfa.c (static_assert): New macro, if not already defined.
6786 (setbit_case_fold_c): Assume MB_CUR_MAX is 1 and that case_fold
6787 is nonzero; all callers changed.
6788 (setbit_case_fold_c, parse_bracket_exp, lex, atom):
6789 Case-fold like the regex code does.
6790 (lonesome_lower): New constant.
6791 (case_folded_counterparts): New function.
6792 (parse_bracket_exp): Prefer plain setbit when case-folding is
6793 not needed.
6794 * src/dfa.h (CASE_FOLDED_BUFSIZE): New constant.
6795 (case_folded_counterparts): New function decl.
6796 * src/main.c (trivial_case_ignore): Case-fold like the regex code does.
6797 (main): Try to improve comment re trivial_case_ignore.
6798 * tests/case-fold-titlecase: Add lots more test cases.
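
The matching rule quoted above, uc (d) == uc (p), can be sketched as below. This is only an illustration and not the code in src/dfa.c; the name fold_match is hypothetical, and towupper stands in for the "uc" of the entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: report whether data char D matches pattern
   char P under the "compare uppercased forms" rule.  */
bool
fold_match (wint_t d, wint_t p)
{
  assert (d != WEOF && p != WEOF);
  return towupper (d) == towupper (p);
}
```

Under this rule, two characters that share an uppercase form match each other even if their lowercase forms differ, which is where the Greek sigma and Turkish I corner cases come from.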
6799
68002014-03-06 Paul Eggert <eggert@cs.ucla.edu>
6801
6802 build: update gnulib submodule to latest
6803
6804 doc: do not overpromise --ignore-case's behavior
6805 * NEWS: Omit vague statement about titlecase that could be
6806 misinterpreted, and is more trouble than it's worth.
6807 * doc/grep.texi: Add @documentencoding. Fix copyright range to
6808 use endash not hyphen.
6809 (Matching Control): Do not overpromise what --ignore-case will do.
6810 Give examples of corner cases where the documentation does not
6811 specify behavior.
6812
68132014-03-05 Paul Eggert <eggert@cs.ucla.edu>
6814
6815 maint: remove differences from gnulib regex code
6816 These don't seem to be needed with GCC 4.8.2, and are making
6817 maintenance harder. If we need to disable warnings with older
6818 compilers, we can add pragmas to the gnulib versions. See
6819 <http://bugs.gnu.org/16911#24>.
6820 * gl/lib/regcomp.c.diff, gl/lib/regex_internal.c.diff:
6821 * gl/lib/regex_internal.h.diff, gl/lib/regexec.c.diff:
6822 Remove.
6823 * cfg.mk (exclude_file_name_regexp--sc_prohibit_tab_based_indentation):
6824 Don't mention gl/* files.
6825
68262014-03-03 Paul Eggert <eggert@cs.ucla.edu>
6827
6828 grep: fix comment
6829 * src/main.c (trivial_case_ignore): Fix comment typo.
6830
68312014-03-03 Norihiro Tanaka <noritnk@kcn.ne.jp>
6832
6833 grep: avoid adding the same character to a bracket expression twice
6834 * src/main.c (trivial_case_ignore): Add the uppercase and/or lowercase
6835 counterpart to the new pattern only when it differs from the original character.
6836
68372014-03-02 Paul Eggert <eggert@cs.ucla.edu>
6838
6839 grep: fix some unlikely bugs in trivial_case_ignore
6840 * src/main.c (MBRTOWC, WCRTOMB): Reformat as per usual GNU style.
6841 (trivial_case_ignore): Don't overrun buffer in the unusual case
6842 when a character has both lowercase and uppercase counterparts.
6843 Don't rely on undefined behavior when assigning out-of-range value
6844 to an 'int'. Simplify by avoiding unnecessary buffer copies.
6845 Work even with shift encodings, by using mbsinit to
6846 disable the optimization if we are not in the initial state
6847 when we replace B by [BCD].
6848
68492014-03-02 Norihiro Tanaka <noritnk@kcn.ne.jp>
6850
6851 grep: revert removal of trivial_case_ignore
6852 Revive trivial_case_ignore function in order to be able to use kwset.
6853
6854 * src/main.c (MBRTOWC, WCRTOMB): New macros.
6855 (trivial_case_ignore): New function.
6856 (main): Use it.
6857
68582014-03-02 Norihiro Tanaka <noritnk@kcn.ne.jp>
6859
6860 grep: optimization of bracket expression for non-UTF8 locales
6861 * src/dfa.c (addtok): Replace an MBCSET with a CSET even in
6862 non-UTF8 locales, and even when it has individual characters.
6863
68642014-03-01 Paul Eggert <eggert@cs.ucla.edu>
6865
6866 doc: describe titlecase fix better
6867 * NEWS: Document behavior on lowercase text too.
6868 Suggested by Eric Blake in <http://bugs.gnu.org/16911#10>.
6869 * doc/grep.texi (Matching Control): Specify behavior of -i
6870 more precisely.
6871
68722014-02-28 Paul Eggert <eggert@cs.ucla.edu>
6873
6874 grep: minor tuning for mb_case_map_apply
6875 * src/kwsearch.c (mb_case_map_apply): Avoid unnecessary widening of
6876 size_t to intmax_t. Avoid unnecessary reinitialization of k.
6877
6878 grep: avoid 'inline' when it doesn't matter
6879 These days, compilers generally do just fine without advice from
6880 users about 'inline', and there's little need for 'static inline',
6881 just as there's little need for 'register'.
6882 * src/dfa.c (to_uchar):
6883 * src/dosbuf.c (guess_type, undossify_input, dossified_pos):
6884 * src/main.c (undossify_input):
6885 No longer inline.
6886 * src/search.h (mb_case_map_apply): Move from here ...
6887 * src/kwsearch.c (mb_case_map_apply): ... to here, and
6888 make it no longer 'inline'.
6889
6890 grep: fix bugs with -i and titlecase
6891 * NEWS: Document this.
6892 * src/dfa.c (setbit_wc): Simplify.
6893 (setbit_c): Remove; no longer used.
6894 (setbit_case_fold_c, parse_bracket_exp, atom):
6895 Don't mishandle titlecase. For 'atom', this removes the need for
6896 the refactoring of Bug#16729.
6897 (lex): Use the slower approach only for letters that have a
6898 differing case.
6899 * tests/case-fold-titlecase: New file.
6900 * tests/Makefile.am (TESTS): Add it.
6901
6902 grep: remove lint
6903 * src/main.c (MBRTOWC, WCRTOMB): Remove no-longer-used macros.
6904
69052014-02-28 Norihiro Tanaka <noritnk@kcn.ne.jp>
6906
6907 grep: remove trivial_case_ignore
6908 * src/main.c (trivial_case_ignore): Remove.
6909 (main): Remove its use; this optimization is no longer needed.
6910
6911 grep: don't match line-by-line for case-insensitive with grep and awk
6912 * src/main.c (matcher): Move decl up.
6913 (do_execute): With the grep or awk matchers,
6914 no need to match line by line.
6915
69162014-02-27 Jim Meyering <meyering@fb.com>
6917
6918 maint: dfa: pass NULL, not 0, as 2nd arg to setlocale
6919 * src/dfa.c (using_simple_locale): Use NULL, not 0.
6920
69212014-02-27 Paul Eggert <eggert@cs.ucla.edu>
6922
6923 * src/dfa.c (prednames): POSIX allows [[:xdigit:]] to match multibyte chars.
6924
6925 * src/dfa.c (parse_bracket_exp): Parenthesize.
6926
6927 grep: fix multiple bugs with bracket expressions
6928 * NEWS: Document this.
6929 * src/dfa.c (using_simple_locale): New function.
6930 (parse_bracket_exp): Handle bracket expressions like [a-[.z.]]
6931 correctly. Don't assume that dfaexec handles expressions like
6932 [^a-z] correctly, as they can match multiple characters in some
6933 locales.
6934 * tests/posix-bracket: New file.
6935 * tests/Makefile.am (TESTS): Add it.
6936
69372014-02-25 Stephane Chazelas <stephane.chazelas@gmail.com>
6938
6939 align grep -Pw with grep -w
6940 For the -w option, with -P, we used to look for the pattern surrounded by
6941 word boundaries. That's different from what grep -w does and what the
6942 documentation describes. Now align with grep -w and the documentation by
6943 using PCRE look-behind and look-ahead operators to match the pattern if
6944 it is not surrounded by word constituents.
6945 * src/pcresearch.c (Pcompile): Use (?<!\w)(?:...)(?!\w) rather than
6946 \b(?:...)\b.
6947 * NEWS (Bug fixes): Mention it.
6948 * tests/pcre-w: New file.
6949 * tests/Makefile.am (TESTS): Add it.
6950 This complements the fix for http://debbugs.gnu.org/16865
6951
69522014-02-24 Stephane Chazelas <stephane.chazelas@gmail.com>
6953
6954 grep -P: fix it so backreferences now work with -w and -x
6955 To implement -w and -x, we bracket the search term with parentheses.
6956 However, that set of parentheses had the default semantics of
6957 "capturing", i.e., creating a backreferenceable matched quantity.
6958 Instead, use (?:...), to create a non-capturing group.
6959 * src/pcresearch.c (Pcompile): Use (?:...) rather than (...).
6960 * NEWS (Bug fixes): Mention it.
6961 * tests/pcre-wx-backref: New file.
6962 * tests/Makefile.am (TESTS): Add it.
6963 This addresses http://debbugs.gnu.org/16865
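
The wrapping described in the two entries above might look roughly like this. It is a sketch, not the body of Pcompile: the function name wrap_pcre_pattern is hypothetical, and the "^(?:...)$" form for -x and the precedence of -x over -w are assumptions, since the entries only spell out the -w form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of PAT wrapped for -w or
   -x, using a non-capturing group so that backreferences inside PAT
   keep their numbering.  Return NULL if allocation fails.  */
char *
wrap_pcre_pattern (const char *pat, bool match_words, bool match_lines)
{
  assert (pat != NULL);
  const char *prefix = "";
  const char *suffix = "";

  if (match_lines)              /* assumed form for -x */
    {
      prefix = "^(?:";
      suffix = ")$";
    }
  else if (match_words)         /* form given in the entry for -w */
    {
      prefix = "(?<!\\w)(?:";
      suffix = ")(?!\\w)";
    }

  size_t len = strlen (prefix) + strlen (pat) + strlen (suffix) + 1;
  char *out = malloc (len);
  if (out)
    snprintf (out, len, "%s%s%s", prefix, pat, suffix);
  return out;
}
```

With a capturing group "(...)" in place of "(?:...)", a user pattern such as (a)\1 would have its \1 refer to the wrapper group instead of (a), which is the bug the 2014-02-24 entry fixes.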
6964
69652014-02-20 Jim Meyering <meyering@fb.com>
6966
6967 maint: post-release administrivia
6968 * NEWS: Add header line for next release.
6969 * .prev-version: Record previous version.
6970 * cfg.mk (old_NEWS_hash): Auto-update.
6971
6972 version 2.18
6973 * NEWS: Record release date.
6974
6975 tests: test for the non-UTF8 multi-byte performance regression
6976 Test for the just-fixed performance regression.
6977 With a 100-200x differential, it is reasonable to expect that
6978 a very slow system will be able to complete the designated
6979 task in a few seconds, while with the bug, even a very fast
6980 system would exceed the timeout.
6981 * tests/mb-non-UTF8-performance: New file.
6982 * tests/Makefile.am (TESTS): Add it.
6983 * tests/init.cfg (require_JP_EUC_locale_): New function.
6984
6985 grep -i: avoid a performance regression in multibyte non-UTF8 locales
6986 * src/main.c: Include dfa.h.
6987 (trivial_case_ignore): Perform this optimization only for UTF8 locales.
6988 This rectifies a 100-200x performance regression in non-UTF8 multi-byte
6989 locales like ja_JP.eucJP. The regression was introduced by the 10x
6990 UTF8/grep-i speedup, commit v2.16-4-g97318f5.
6991 * NEWS (Bug fixes): Mention it.
6992 Reported by Norihiro Tanaka in http://debbugs.gnu.org/16232#50
6993
6994 maint: give dfa.c's using_utf8 function external scope
6995 * src/dfa.c (using_utf8): Remove "static inline".
6996 * src/dfa.h (using_utf8): Declare it.
6997 * src/searchutils.c (is_mb_middle): Use using_utf8 rather than
6998 rolling our own.
6999
70002014-02-20 Paul Eggert <eggert@cs.ucla.edu>
7001
7002 tests: test [^^-^] in unibyte locales
7003 This is a bug in the current dfa.c, which was reintroduced by the
7004 recent reversion from RRI.
7005 * tests/unibyte-negated-circumflex: New file.
7006 * tests/Makefile.am (TESTS): Add it.
7007 * tests/init.cfg (require_unibyte_locale): New function.
7008
7009 grep: fix bug with patterns like [^^-~] in unibyte locales
7010 * NEWS: Document this.
7011 * src/dfa.c (parse_bracket_exp): Escape patterns like [^^-~], or
7012 Awk patterns like [\^-\]], so that they are not misinterpreted by
7013 the system regex library. Check for system regex failure due to
7014 memory exhaustion.
7015
70162014-02-17 Jim Meyering <meyering@fb.com>
7017
7018 maint: post-release administrivia
7019 * NEWS: Add header line for next release.
7020 * .prev-version: Record previous version.
7021 * cfg.mk (old_NEWS_hash): Auto-update.
7022
7023 version 2.17
7024 * NEWS: Record release date.
7025
70262014-02-17 Paolo Bonzini <bonzini@gnu.org>
7027
7028 revert "grep: DFA now uses rational ranges in unibyte locales"
7029 The correct course of action for grep is to defer range interpretation
7030 to regex, because otherwise you can get mismatches between regexes with
7031 backreferences and those without.
7032
7033 For example, [A-Z]. will use RRI but ([A-Z])\1 won't, with the confusing
7034 result that the first regex won't match a superset of the language
7035 described by the second regex.
7036
7037 The source of the confusion is that, even though grep's dfa.c was changed
7038 to use range checking instead of strcoll, that code is only invoked if
7039 dfaexec is called with backref = NULL, and that never happens for grep!
7040
7041 In the end, all that's needed for RRI is compiling --with-included-regex,
7042 and in that case the patch is almost a no-op. Almost, because there
7043 are corner cases that aren't handled correctly (e.g. [a-[.e.]], or
7044 regular expressions that include a NUL character), but this can be
7045 handled separately.
7046
7047 * NEWS: Revert paragraph introduced by commit v2.16-7-g1078b64.
7048 * src/dfa.c (parse_bracket_exp): Revert back to regcomp/regexec.
7049
70502014-02-16 Mike Frysinger <vapier@gentoo.org>
7051
7052 maint: ignore configure.lineno
7053 * .gitignore: Add configure.lineno.
7054
70552014-02-11 Benno Schulenberg <bensberg@justemail.net>
7056
7057 help: remove surplus newline
7058 * src/main.c (usage): Remove inconsistent \n introduced by previous
7059 patch.
7060
70612014-02-10 Benno Schulenberg <bensberg@justemail.net>
7062
7063 help: fix a line ending, and use the same word for similar things
7064 * src/main.c (usage): Change a stray 'n' to a newline, and use
7065 the word "display" for showing version info as for help text.
7066
70672014-02-09 Norihiro Tanaka <noritnk@kcn.ne.jp>
7068
7069 speed up mb-boundary-detection after each preliminary match
7070 After each kwsexec or dfaexec match, we must determine whether
7071 the tentative match falls in the middle of a multi-byte character.
7072 That is what our is_mb_middle function does, but it was expensive,
7073 even when most input consisted of single-byte characters. The main
7074 cost was for each call to mbrlen. This change constructs and uses
7075 a cache of the lengths returned by mbrlen for unibyte values.
7076 The largest speed-up (3x to 7x, CPU-dependent) is when most
7077 lines contain a match, yet few are printed, e.g., when using
7078 grep -v common-pattern ... to filter out all but a few lines.
7079
7080 * src/search.h (build_mbclen_cache): Declare it.
7081 * src/main.c: Include "search.h".
7082 [MBS_SUPPORT] (main): Call build_mbclen_cache in a multibyte locale.
7083 * src/searchutils.c [HAVE_LANGINFO_CODESET]: Include <langinfo.h>.
7084 (mbclen_cache): New global.
7085 (build_mbclen_cache): New function.
7086 (is_mb_middle) [HAVE_LANGINFO_CODESET]: Use it.
7087 * NEWS (Improvements): Mention it.
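
A cache of this kind can be sketched as follows. The names mbclen_cache and build_mbclen_cache come from the entry, but the bodies here are a guess at the idea and not the code in src/searchutils.c; cached_mbclen is a hypothetical accessor added for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <wchar.h>

/* One mbrlen result per possible byte value, computed once.  */
static size_t mbclen_cache[UCHAR_MAX + 1];

/* Fill the cache by calling mbrlen on each byte value in isolation,
   starting from the initial shift state each time.  */
void
build_mbclen_cache (void)
{
  for (int i = 0; i <= UCHAR_MAX; i++)
    {
      char c = (char) i;
      mbstate_t mbs;
      memset (&mbs, 0, sizeof mbs);
      mbclen_cache[i] = mbrlen (&c, 1, &mbs);
    }
}

/* Hypothetical accessor: the cached mbrlen result for byte value B.  */
size_t
cached_mbclen (int b)
{
  assert (0 <= b && b <= UCHAR_MAX);
  return mbclen_cache[b];
}
```

A caller would run build_mbclen_cache once after setlocale, then consult the table for each byte and fall back to a real mbrlen call only when the cached value shows the byte is not a complete character by itself.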
7088
70892014-02-01 Jim Meyering <meyering@fb.com>
7090
7091 maint: use to_uchar function rather than explicit casts
7092 * src/system.h (to_uchar): Define function.
7093 * src/kwsearch.c (Fexecute): Use to_uchar twice in place of casts.
7094 * src/dfasearch.c (EGexecute): Likewise.
7095 * src/main.c (prepend_args): Likewise.
7096 * src/kwset.c (U): Define in terms of to_uchar.
7097 * src/dfa.c (match_mb_charset): Use to_uchar, not an explicit cast.
7098
70992014-01-27 Jim Meyering <meyering@fb.com>
7100
7101 maint: remove vestiges of support for long-disabled --mmap option
7102 This option was disabled in March of 2010, and began to elicit a
7103 warning in January of 2012. Its time has come.
7104 * doc/grep.in.1: Remove mention.
7105 * doc/grep.texi: Likewise.
7106 * src/main.c (GROUP_SEPARATOR_OPTION, usage, MMAP_OPTION)
7107 (long_options, main): Remove all traces.
7108 * tests/Makefile.am (check_PROGRAMS): Remove mention of ignore-mmap.
7109 * tests/ignore-mmap: Remove file.
7110 * NEWS (Maintenance): Mention it.
7111
71122014-01-26 Jim Meyering <meyering@fb.com>
7113
7114 maint: move two local variable declarations
7115 * src/dfasearch.c (kwsmusts): Move one declaration down to the point
7116 of definition. Move another into the sole scope where it is used.
7117
71182014-01-26 Norihiro Tanaka <noritnk@kcn.ne.jp>
7119
7120 dfasearch: skip kwset optimization when multi-byte+case-insensitive
7121 Now that DFA searching works with multi-byte locales, the only remaining
7122 reason to case-convert the searched input is the kwset optimization.
7123 But multi-byte case-conversion is so expensive that it's not
7124 worthwhile even to attempt that optimization.
7125
7126 * src/dfasearch.c (kwsmusts): Skip this function in ignore-case mode
7127 when the locale is multi-byte.
7128 (EGexecute): Now that this code need not handle multi-byte case-ignoring
7129 matches, remove the expensive copy/case-conversion code.
7130 With no case-converted buffer, there is no longer any need to call
7131 mb_case_map_apply, so remove it and associated code.
7132 (kwsincr_case): Remove function. Now, every use of this function
7133 is equivalent to a use of kwsincr. Replace all uses.
7134 * tests/turkish-eyes: Test all of -E, -F and -G.
7135
71362014-01-25 Norihiro Tanaka <noritnk@kcn.ne.jp>
7137
7138 dfa: remove GREP-ifdef'd code in favor of code used by gawk
7139 For many years, gawk and grep have used different #ifdef'd bits of
7140 code relating to how the DFA matcher matches multibyte characters.
7141 Remove the GREP-specific code in favor of the code gawk uses. This
7142 permits us to avoid still more cases in which grep must resort to
7143 the expensive process of copying/case-converting each input line
7144 before matching against a case-converted regexp.
7145 * src/dfa.c (parse_bracket_exp, atom): As above.
7146
71472014-01-25 Jim Meyering <meyering@fb.com>
7148
7149 gnulib: update to latest
7150
71512014-01-17 Paul Eggert <eggert@cs.ucla.edu>
7152
7153 grep: DFA now uses rational ranges in unibyte locales
7154 Problem reported by Aharon Robbins in <http://bugs.gnu.org/16481>.
7155 * NEWS:
7156 * doc/grep.texi (Environment Variables)
7157 (Character Classes and Bracket Expressions):
7158 Document this.
7159 * src/dfa.c (parse_bracket_exp): Treat unibyte locales like multibyte.
7160
71612014-01-17 Aharon Robbins <arnold@skeeve.com>
7162
7163 grep: add undocumented '-X gawk' and '-X posixawk' options
7164 See <http://bugs.gnu.org/16481>.
7165 * src/grep.c (GAcompile, PAcompile): New functions.
7166 (const): Use them.
7167
71682014-01-10 Pádraig Brady <P@draigBrady.com>
7169
7170 tests: remove superfluous uses of printf
7171 * tests/turkish-eyes: Remove unnecessary uses of printf.
7172
71732014-01-09 Jim Meyering <meyering@fb.com>
7174
7175 grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales
7176 These days, nearly everyone uses a multibyte locale, and grep is often
7177 used with the --ignore-case (-i) option, but that option imposes a very
7178 high cost in order to handle some unusual cases in just a few multibyte
7179 locales. This change gets most of the performance of using LC_ALL=C
7180 without eliminating the ability to search for multibyte strings.
7181
7182 With the following example, I see an 11x speed-up with a 2.3GHz i7:
7183 Generate a 10M-line file, with each line consisting of 40 'j's:
7184
7185 yes jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj | head -10000000 > k
7186
7187 Time searching it for the simple/nonexistent string "foobar",
7188 first with this patch (best-of-5 trials):
7189
7190 LC_ALL=en_US.UTF-8 env time src/grep -i foobar k
7191 1.10 real 1.03 user 0.07 sys
7192
7193 Back out that commit (temporarily), recompile, and rerun the experiment:
7194
7195 git log -1 -p|patch -R -p1; make
7196 LC_ALL=en_US.UTF-8 env time src/grep -i foobar k
7197 12.50 real 12.41 user 0.08 sys
7198
7199 The trick is to realize that for some search strings, it is easy
7200 to convert to an equivalent one that is handled much more efficiently.
7201 E.g., convert this command:
7202
7203 grep -i foobar k
7204
7205 to this:
7206
7207 grep '[fF][oO][oO][bB][aA][rR]' k
7208
7209 That allows the matcher to search in buffer mode, rather than having to
7210 extract/case-convert/search each line separately. Currently, we perform
7211 this conversion only when search strings contain neither '\' nor '['.
7212 See the comments for more detail.
7213
7214 * src/main.c (trivial_case_ignore): New function.
7215 (main): When possible, transform the regexp so we can drop the -i.
7216 * tests/turkish-eyes: New file.
7217 * tests/Makefile.am (TESTS): Use it.
7218 * NEWS (Improvements): Mention it.
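
The foobar -> [fF][oO][oO][bB][aA][rR] conversion above can be sketched for plain ASCII as below. This is not the trivial_case_ignore function itself, which has to deal with multibyte characters; the name case_fold_to_brackets is hypothetical and the code assumes single-byte letters only.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, ASCII only: rewrite PAT so that every letter
   becomes a two-character bracket expression, e.g. "ab1" -> "[aA][bB]1".
   Return a malloc'd string, or NULL if PAT contains '\\' or '[' (the
   cases the entry says are skipped) or if allocation fails.  */
char *
case_fold_to_brackets (const char *pat)
{
  assert (pat != NULL);
  if (strpbrk (pat, "\\["))
    return NULL;

  /* Worst case: each byte expands to the 4 bytes "[xX]".  */
  char *out = malloc (4 * strlen (pat) + 1);
  if (!out)
    return NULL;

  char *p = out;
  for (; *pat; pat++)
    {
      unsigned char c = (unsigned char) *pat;
      if (isalpha (c) && tolower (c) != toupper (c))
        {
          *p++ = '[';
          *p++ = (char) tolower (c);
          *p++ = (char) toupper (c);
          *p++ = ']';
        }
      else
        *p++ = (char) c;
    }
  *p = '\0';
  return out;
}
```

When the helper returns NULL, the caller would simply keep the original pattern and the -i flag, so the transformation stays a pure optimization.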
7219
72202014-01-07 Paul Eggert <eggert@cs.ucla.edu>
7221
7222 tests: port Solaris 10 /bin/sh patch back to GNU/Linux
7223 Problem reported by Jim Meyering.
7224 * tests/bre, tests/ere, tests/spencer1-locale:
7225 Prefer re_shell, not re_shell_.
7226 * tests/init.sh (re_shell): New var, which is exported instead of
7227 re_shell_.
7228
7229 Port to Solaris 10 /bin/sh.
7230 Problem reported by Dagobert Michelsen in <http://bugs.gnu.org/16380>.
7231 * tests/bre, tests/ere, tests/spencer1-locale:
7232 Prefer re_shell_ to SHELL, if re_shell_ is set.
7233 * tests/init.sh (re_shell_): Export if it's used.
7234
72352014-01-01 Jim Meyering <meyering@fb.com>
7236
7237 maint: post-release administrivia
7238 * NEWS: Add header line for next release.
7239 * .prev-version: Record previous version.
7240 * cfg.mk (old_NEWS_hash): Auto-update.
7241
7242 version 2.16
7243 * NEWS: Record release date.
7244
7245 gnulib: update to latest, for maint.mk fix
7246
7247 maint: update copyright dates for 2014
7248 Do that by running "make update-copyright".
7249
7250 gnulib: update to latest
7251
72522013-12-31 Jim Meyering <meyering@fb.com>
7253
7254 pcre: use PCRE_NO_UTF8_CHECK properly
7255 In order to obtain the behavior we want, i.e., to disable
7256 error-on-invalid-UTF-in-input, apply this PCRE option in
7257 pcre_exec, not when compiling.
7258 * src/pcresearch.c (Pexecute): Use PCRE_NO_UTF8_CHECK here, ...
7259 (Pcompile): ...rather than here.
7260 * tests/pcre-invalid-utf8-input: Adjust test case to test for this.
7261
72622013-12-26 Jim Meyering <meyering@fb.com>
7263
7264 maint: fix inconsistent spacing in expression
7265 * src/main.c (prline): Fix inconsistent spacing in expression:
7266 s/ / /.
7267
72682013-12-26 behoffski <behoffski@grouse.com.au>
7269
7270 maint: fix a garbled comment
7271 * src/dfa.c (XNMALLOC, etc.): Fix garbled comment wording.
7272
72732013-12-23 Jim Meyering <meyering@fb.com>
7274
7275 maint: fix/improve a comment
7276 * src/main.c (prline): Replace untrue FIXME comment with one
7277 telling how the hard-to-reach code can be exercised.
7278
72792013-12-21 Santiago Ruano Rincón <santiago@debian.org>
7280
7281 pcre: tell grep -P to relax its stance on invalid multibyte chars
7282 Do not exit-2 for invalid UTF-8 characters. Just prior to this
7283 change, this command would match no lines and fail like this:
7284 $ printf 'j\x82\nj\n'|LC_ALL=en_US.UTF-8 grep -P j|cat -A; echo $?
7285 grep: invalid UTF-8 byte sequence in input
7286 2
7287 After this change, the same command matches both lines, and succeeds:
7288 jM-^B$
7289 j$
7290 0
7291 * src/pcresearch.c (Pcompile): Use PCRE_NO_UTF8_CHECK, too, and
7292 add a comment.
7293 * tests/pcre-utf8: Add a test and a comment.
7294 This change did not work with Debian unstable pcre-8.31-2
7295 or with some 8.33 and 8.34-based versions, but does work with
7296 Fedora 20's 8.33 and with a built-from-latest source library.
7297 Based on a patch by Santiago Ruano Rincón.
7298 See http://bugs.gnu.org/15758/
7299
73002013-12-21 Jim Meyering <meyering@fb.com>
7301
7302 tests: avoid FP failure due to exhausted memory
7303 * tests/long-line-vs-2GiB-read: Don't declare the test "failed"
7304 when running out of memory. In that case, skip it.
7305
73062013-12-18 Jim Meyering <meyering@fb.com>
7307
7308 maint: add comments and split some long lines
7309 * src/main.c (do_execute): Add a comment.
7310 Split some lines longer than 80 bytes.
7311
7312 pcre: avoid a nominal leak
7313 * src/pcresearch.c (Pcompile)[HAVE_LIBPCRE && !PCRE_STUDY_JIT_COMPILE]:
7314 We would leak "re" if built with HAVE_LIBPCRE but without
7315 PCRE_STUDY_JIT_COMPILE. Move the free out one level.
7316
7317 maint: indent cpp directives to reflect nesting
7318 * src/pcresearch.c: Insert spaces after a few "#", to indent
7319 cpp directives to reflect their nesting.
7320
7321 grep: handle lines longer than INT_MAX on more systems
7322 When trying to exercise some long-line-handling code, I ran these
7323 commands:
7324 $ dd bs=1 seek=2G of=big < /dev/null; grep -l x big; echo $?
7325 grep: big: Invalid argument
7326 2
7327 grep should not have issued that diagnostic, and it should
7328 have exited with status 1, not 2. What happened?
7329 grep read the 2GiB of NULs, doubled its buffer size,
7330 copied the 2GiB into the new 4GiB buffer, and proceeded
7331 to call "read" with a byte-count argument of 2^32.
7332 On at least Darwin 12.5.0, that makes read fail with EINVAL.
7333 The solution is to use gnulib's safe_read wrapper.
7334 * src/main.c: Include "safe-read.h"
7335 (fillbuf): Use safe_read, rather than bare read. The latter
7336 cannot handle a read size of 2^32 on some systems.
7337 * bootstrap.conf (gnulib_modules): Add safe-read.
7338 * tests/long-line-vs-2GiB-read: New file.
7339 * tests/Makefile.am (TESTS): Add it.
7340 * NEWS (Bug fixes): Mention it.
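
The idea behind such a wrapper can be sketched as below. This is not gnulib's safe_read; clamped_read and READ_MAX are hypothetical names, and the cap value is an assumption chosen only to stay below INT_MAX.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical cap on a single read request, just under INT_MAX.  */
enum { READ_MAX = INT_MAX >> 20 << 20 };

/* Hypothetical wrapper: like read, but never ask for more than
   READ_MAX bytes at once, and retry if interrupted by a signal.  */
ssize_t
clamped_read (int fd, void *buf, size_t count)
{
  assert (buf != NULL);
  if (count > (size_t) READ_MAX)
    count = READ_MAX;

  for (;;)
    {
      ssize_t n = read (fd, buf, count);
      if (n >= 0 || errno != EINTR)
        return n;
    }
}
```

A short read is already something every caller of read must handle, so clamping the request does not change the contract; it only avoids handing the kernel a count it may reject with EINVAL.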
7341
73422013-11-25 Jim Meyering <meyering@fb.com>
7343
7344 tests: port to non-GNU sed
7345 * tests/multibyte-white-space (utf8_space_characters): The generation
7346 of test inputs relied on GNU sed's interpretation of \<, but that is
7347 not portable, and caused spurious test failures. Adjust the sed regexp
7348 to work on all versions.
7349 Reported by Karl Dubost in http://bugs.gnu.org/15953.
7350
73512013-11-22 Jim Meyering <meyering@fb.com>
7352
7353 maint: minor cleanup: xmalloc+strcpy -> xmemdup
7354 * src/main.c (main): Replace an xmalloc+strcpy combination
7355 with an equivalent use of xmemdup.
7356
73572013-11-21 Jim Meyering <meyering@fb.com>
7358 Paul Eggert <eggert@cs.ucla.edu>
7359
7360 dfa: avoid undefined behavior of "1 << 31"
7361 * src/dfa.c (charclass): Change type from "int" to "unsigned int".
7362 (tstbit): Rather than shifting "1" left to form a mask, shift the
7363 LHS bits to the right and use "1" as the mask. Also, return bool, rather
7364 than "int".
7365 (setbit, clrbit, dfastate): Don't shift "1" (aka (int)1) left by 31 bits.
7366 Instead, use "1U" as the operand, to avoid undefined behavior.
7367 Spotted by gcc's new -fsanitize=undefined.
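
The bit operations described above can be sketched as follows. The names follow the entry, but the definitions are a reconstruction for illustration and may differ in detail from src/dfa.c; NOTCHAR and CHARCLASS_INTS are assumed here to size the array.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

enum { INTBITS = CHAR_BIT * sizeof (unsigned int) };
enum { NOTCHAR = 1 << CHAR_BIT };
enum { CHARCLASS_INTS = (NOTCHAR + INTBITS - 1) / INTBITS };

/* A set of byte values, one bit per value, stored in unsigned words.  */
typedef unsigned int charclass[CHARCLASS_INTS];

/* Test bit B: shift the word right and mask with 1, so no "1 << 31"
   on a signed int is ever formed.  */
static bool
tstbit (unsigned int b, charclass const c)
{
  assert (b < NOTCHAR);
  return (c[b / INTBITS] >> (b % INTBITS)) & 1;
}

/* Set bit B, shifting the unsigned constant 1U.  */
static void
setbit (unsigned int b, charclass c)
{
  assert (b < NOTCHAR);
  c[b / INTBITS] |= 1U << (b % INTBITS);
}

/* Clear bit B, again shifting 1U rather than a signed 1.  */
static void
clrbit (unsigned int b, charclass c)
{
  assert (b < NOTCHAR);
  c[b / INTBITS] &= ~(1U << (b % INTBITS));
}
```

With a 32-bit int, bit 31 is the case that matters: 1 << 31 overflows a signed int, while 1U << 31 is well defined.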
7368
73692013-11-02 Jim Meyering <meyering@fb.com>
7370
7371 grep: fix regression with -P vs. invalid UTF-8 input
7372 * src/pcresearch.c (Pexecute): Don't abort upon unexpected
7373 PCRE-specific error code. Explicitly handle PCRE_ERROR_BADUTF8,
7374 and change the default to print a diagnostic including the unhandled
7375 integer PCRE error code and exit with status 2.
7376 * tests/pcre-invalid-utf8-input: New file.
7377 * tests/Makefile.am (TESTS): Add it.
7378 * NEWS (Bug fixes): Mention it.
7379 * THANKS: Update.
7380 Reported by Dave Reisner in http://bugs.gnu.org/15758.
7381
7382 grep: fix regression involving \s and \S
7383 Commit v2.14-40-g01ec90b made \s and \S work with multi-byte
7384 characters, but it also made any use with a repetition operator,
7385 such as \s*, \s+, \s?, or \s{3}, malfunction in a multi-byte locale.
7386 * src/dfa.c (lex): Also reset laststart.
7387 * tests/backslash-s-and-repetition-operators: New file.
7388 * tests/Makefile.am (TESTS): Add it.
7389 * NEWS (Bug fixes): Mention it.
7390 * THANKS: Update.
7391 Reported by Mirraz Mirraz in http://bugs.gnu.org/15773.
7392
73932013-11-01 Jim Meyering <meyering@fb.com>
7394
7395 maint: NEWS: document a release-related bug fix
7396 * NEWS (Bug fixes): Add an entry for a fix pulled from gnulib.
7397
73982013-10-26 Jim Meyering <meyering@fb.com>
7399
7400 build: update gnulib submodule to latest
7401 This pulls in a gnulib fix for maint.mk that ensures the procedure
7402 described in README-release actually does what we want. Before this
7403 change, that procedure resulted in a grep-2.15 tarball that would
7404 lead to a grep binary whose --version-reported version number was
7405 2.14.51... rather than the expected 2.15.
7406
7407 maint: avoid automake deprecation warning re ACLOCAL_AMFLAGS
7408 * Makefile.am (ACLOCAL_AMFLAGS): Don't use this deprecated variable.
7409 * configure.ac (AC_CONFIG_MACRO_DIRS): Use this instead.
7410 (AUTOMAKE_OPTIONS): Require automake-1.12.
7411
7412 maint: post-release administrivia
7413 * NEWS: Add header line for next release.
7414 * .prev-version: Record previous version.
7415 * cfg.mk (old_NEWS_hash): Auto-update.
7416
7417 version 2.15
7418 * NEWS: Record release date.
7419
74202013-10-25 Paul Eggert <eggert@cs.ucla.edu>
7421
7422 build: port to AIX
7423 Problem reported by Pavel Kharitonov in <http://bugs.gnu.org/15690#68>.
7424 * src/Makefile.am (LDADD): Add $(LIBTHREAD).
7425
7426 build: avoid duplicate -funit-at-a-time etc. options
7427 * configure.ac (WERROR_CFLAGS): Don't add -fdiagnostics-show-option
7428 and -funit-at-a-time, as Gnulib does that for us now, and we're
7429 merely piling on duplicates.
7430
74312013-10-24 Jim Meyering <meyering@fb.com>
7432
7433 tests: port more tests to bourne shells with hex-challenged printf
7434 * tests/pcre-utf8: Convert the hex \xHH literals for the euro symbol
7435 to octal \OOO.
7436 * tests/turkish-I: Likewise for "I with dot".
7437 * tests/turkish-I-without-dot: Likewise for another Turkish I: U+0131.
7438
7439 maint: clean up an ugly 'while' condition
7440 * src/main.c (get_nondigit_option): Separate a slightly baroque
7441 "while" expression into two separate statements, both inside the loop.
7442
74432013-10-23 Jim Meyering <meyering@fb.com>
7444
7445 tests: port to bourne shells whose printf doesn't grok hex
7446 Use octal escapes, not hex, in printf(1) format strings,
7447 and in one case, use $AWK's printf so we can continue
7448 to use the table of hex values.
7449 * tests/char-class-multibyte: Use printf octal escapes, not hex,
7450 for portability to shells like dash and Solaris 10's /bin/sh.
7451 * tests/backslash-s-vs-invalid-multitype: Likewise.
7452 * tests/surrogate-pair: Likewise.
7453 * tests/unibyte-bracket-expr: Count in decimal and convert to octal.
7454 * tests/multibyte-white-space (hex_printf): New function.
7455 Use it in place of printf so we can retain the table of hex digits
7456 without hitting the limitation of some bourne shells.
7457 Reported by Paul Eggert in http://bugs.gnu.org/15690#11
7458
74592013-10-21 Jim Meyering <meyering@fb.com>
7460
7461 gnulib: update to latest
7462
7463 maint: remove now-unused wcscoll module
7464 * bootstrap.conf (gnulib_modules): Remove wcscoll; no longer used.
7465
74662013-10-20 Paul Eggert <eggert@cs.ucla.edu>
7467
7468 build: avoid chatter from Automake 1.14
7469 * configure.ac (AM_INIT_AUTOMAKE): Add subdir-objects.
7470
7471 build: port shell pattern to Solaris 10
7472 * configure.ac: Don't use unquoted '^' in a pattern, as this
7473 breaks 'configure' on Solaris 10, whose /bin/sh complains about it,
7474 which causes 'configure' to exit even before it finds a decent shell.
7475 Unix 7th edition shell accepted '^' as an alias for '|'.
7476
7477 build: port to platforms that predefine _FORTIFY_SOURCE
7478 Problem reported by Brenton Hoff (Bug#15663).
7479 * configure.ac (_FORTIFY_SOURCE): Don't define if already defined.
7480 This is what Emacs does.
7481
74822013-10-20 Jim Meyering <meyering@fb.com>
7483
7484 build: update gnulib submodule to latest
7485
74862013-10-19 Jim Meyering <meyering@fb.com>
7487
7488 tests: extend the multibyte-white-space test
7489 * tests/multibyte-white-space (utf8_space_characters): Add more
7490 single-byte whitespace characters. Align RHS hex values and
7491 make the sed substitution less rigid, to accommodate.
7492 Also, ensure that grep '\S' exits with status 1.
7493
7494 maint: update bootstrap to latest from gnulib
7495 * bootstrap: Update from gnulib.
7496
7497 maint: fix typo in NEWS
7498 * NEWS: Fix/improve example commands in most recent entry.
7499 The LC_ALL envvar setting goes before grep, not before printf.
7500 Don't reference src/ in the second example command, and do specify
7501 the locale.
7502
75032013-10-09 Jim Meyering <meyering@fb.com>
7504
7505 tests: add a test for better coverage of some tricky code
7506 * tests/spencer1.tests: Add a non-range bracket expression representing the
7507 same regexp, to cover the alternate code path, the one that does not require
7508 a regcomp/exec call to interpret the regexp.
7509
75102013-10-01 Jim Meyering <meyering@fb.com>
7511
7512 tests: ensure neither \s nor \S matches an invalid multibyte character
7513 * tests/backslash-S-vs-invalid-multitype: New file.
7514 Prompted by the bug report from Roman at
7515 http://savannah.gnu.org/bugs/?40009
7516 * tests/Makefile.am (TESTS): Add it.
7517
7518 dfa: fix \s and \S to work for multibyte
7519 * src/dfa.c (lex): In multibyte mode, we can't treat \s and \S as we do
7520 in single-byte mode. Map them to [[:space:]] and [^[:space:]] respectively,
7521 to make the DFA matcher use the regex-matcher for this term.
7522 * tests/multibyte-white-space: New file. Test for the bug.
7523 * tests/Makefile.am (TESTS): Add it.
7524 This bug was introduced with the addition of DFA support
7525 for \s and \S in commit v2.5.4-112-gf979ca0.
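
The mapping can be illustrated with the POSIX regex API. This is only a
sketch: the helper names are hypothetical, and dfa.c performs the
translation inside its lexer rather than through strings like these.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: the bracket expression that stands in for
   \s or \S; any other escape letter yields NULL.  */
static const char *
space_escape_to_bracket (char c)
{
  return c == 's' ? "[[:space:]]" : c == 'S' ? "[^[:space:]]" : NULL;
}

/* Return true if TEXT contains a match for the POSIX regexp PATTERN.
   A pattern that fails to compile is treated as "no match".  */
static bool
matches (const char *pattern, const char *text)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_NOSUB) != 0)
    return false;
  bool found = regexec (&re, text, 0, NULL, 0) == 0;
  regfree (&re);
  return found;
}
```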
7526
75272013-09-30 Jim Meyering <meyering@fb.com>
7528
7529 maint: change all references: s/POSIX\.2/POSIX/
7530 There is no longer any point in referring to POSIX.N.
7531 POSIX is sufficient.
7532 * doc/grep.in.1: As above.
7533 * src/main.c (main): Likewise.
7534 * tests/file: Likewise.
7535 * tests/options: Likewise.
7536 * ChangeLog: Likewise.
7537 * NEWS: Likewise.
7538 * cfg.mk: Update, to match changed NEWS.
7539 Inspired by Glenn Golden's suggestion in http://bugs.gnu.org/15486
7540
75412013-09-22 Jim Meyering <meyering@fb.com>
7542
7543 dfa: remove dead disjunct
7544 * src/dfa.c (parse_bracket_exp): Remove dead disjunct.
7545 At that point, we know MB_CUR_MAX <= 1, so the test,
7546 MB_CUR_MAX > 1 && ... is always false. Remove the disjunct.
7547
7548 maint: dfa: improve comments and formatting
7549 * src/dfa.c (add_utf8_anychar): Correct wording/alignment of a comment.
7550 (dfaexec): Add curly braces around multi-line while statement within
7551 a "then" block.
7552 (ANYCHAR): Clarify comment: "." does not match an invalid UTF8 character.
7553 (parse_bracket_exp): Improve comment.
7554
75552013-09-08 Jim Meyering <meyering@fb.com>
7556
7557 dfa: appease a static analyzer, and save 95 stack bytes
7558 * src/dfa.c (MAX_BRACKET_STRING_LEN): Rename from BRACKET_BUFFER_SIZE
7559 and decrease from 128 to 32.
7560 (parse_bracket_exp): Add one byte more than MAX_BRACKET_STRING_LEN
7561 to the length of "str" buffer, to avoid appearance that we may store
7562 the trailing NUL beyond the end of buffer. A string of length 32
7563 or greater is rejected by earlier processing, so would never reach
7564 this code. Addresses http://bugs.gnu.org/15307
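
A sketch of the sizing rule, assuming a helper named copy_class_name
(hypothetical). The buffer shrinks from 128 to 32 + 1 = 33 bytes, which
accounts for the 95 bytes saved.

```c
#include <stdbool.h>
#include <string.h>

enum { MAX_BRACKET_STRING_LEN = 32 };

/* Copy the SRCLEN bytes at SRC into DST and NUL-terminate the result.
   Names of MAX_BRACKET_STRING_LEN bytes or more are rejected first, so
   the terminator always lands inside the buffer.  */
static bool
copy_class_name (char dst[MAX_BRACKET_STRING_LEN + 1],
                 const char *src, size_t srclen)
{
  if (srclen >= MAX_BRACKET_STRING_LEN)
    return false;
  memcpy (dst, src, srclen);
  dst[srclen] = '\0';
  return true;
}
```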
7565
75662013-09-01 Corinna Vinschen <vinschen@redhat.com>
7567
7568 fix Cygwin UTF-16 surrogate-pair handling with -i
7569 grep -i would segfault on systems using UTF-16-based wchar_t (Cygwin)
7570 when converting an input string containing certain 4-byte UTF-8
7571 sequences to lower case. The conversions to wchar_t and back to
7572 a UTF-8 multibyte string did not take surrogate pairs into account.
7573 * src/searchutils.c (mbtolower) [__CYGWIN__]: Detect and handle
7574 surrogate pairs when converting.
7575 * NEWS (Bug fixes): Mention it.
7576 * tests/surrogate-pair: New test.
7577 * tests/Makefile.am (TESTS): Add it.
7578 Reported by: Jim Burwell
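
For reference, UTF-16 surrogate pairs can be detected and combined as
sketched below. The function names are hypothetical and this is not the
code added to mbtolower; it only shows the arithmetic involved.

```c
#include <stdbool.h>
#include <stdint.h>

static bool
is_high_surrogate (uint16_t u)
{
  return 0xD800 <= u && u <= 0xDBFF;
}

static bool
is_low_surrogate (uint16_t u)
{
  return 0xDC00 <= u && u <= 0xDFFF;
}

/* Combine a valid high/low pair into the code point it encodes.  */
static uint32_t
surrogates_to_code_point (uint16_t hi, uint16_t lo)
{
  return 0x10000 + (((uint32_t) hi - 0xD800) << 10)
         + ((uint32_t) lo - 0xDC00);
}
```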
7579
75802013-08-19 Paul Eggert <eggert@cs.ucla.edu>
7581
7582 doc: mention how to use the latest gnulib
7583 * README-hacking: Steal some text from coreutils/README-hacking.
7584
75852013-08-10 Jim Meyering <meyering@fb.com>
7586
7587 build: update gnulib-related code
7588 * gnulib: Update submodule to latest.
7589 * bootstrap: Update from gnulib.
7590 * gl/lib/regex_internal.h.diff: Update to reflect gnulib changes.
7591 * bootstrap.conf: Partial sync from coreutils.
7592
75932013-08-09 Jim Meyering <meyering@fb.com>
7594
7595 tests: simplify and factor newest test
7596 * tests/char-class-multibyte2: Simplify file names.
7597 Factor out $e_acute, so that the grep argument representation
7598 is ascii (though the value is still UTF8).
7599
7600 doc: NEWS: mention the DFA segfault fix
7601 * NEWS (Bug fixes): List the DFA segfault fix.
7602
76032013-07-05 Paul Eggert <eggert@cs.ucla.edu>
7604
7605 Redo comments and white space to better approach GNU style.
7606
76072013-07-05 Paolo Bonzini <bonzini@gnu.org>
7608
7609 tests: add testcase for previous change
7610 * tests/Makefile.am (TESTS): Add char-class-multibyte2.
7611 * tests/char-class-multibyte2: New file.
7612
76132013-07-05 Mike Haertel <mike@ducky.net>
7614
7615 dfa: fix multibyte character in brackets with repetition
7616 Let FOO stand for any multibyte character (e.g. a CJK character) in the regexp.
7617 It turns out the following much simpler regexp:
7618 ([^.]*[FOO]){1,2}
7619 is sufficient to cause the crash.
7620
7621 In the first step of its parsing, DFA transforms regexp from human
7622 readable syntax into reverse-polish form. For regexps of the form a{m,n}
7623 repeat counts, it simply builds repeated copies of the representation
7624 of a, with appropriate inserted CAT and QMARK operators. For the above
7625 example with a regexp of the form a{1,2} it would build:
7626
7627 <RPN representation for a>
7628 <RPN representation for a>
7629 QMARK
7630 CAT
7631
7632 When building repeated copies of RPN representations, additional
7633 copies of the RPN representations are made by calling a function
7634 copytoks() with arguments consisting of the start position and
7635 length of the original copy.
7636
7637 The problem is that the current code for copytoks() is simply
7638 incorrect. It operates by calling addtok() for each individual
7639 token in the source range being copied. But, in the particular
7640 case that the token being added is MBCSET, addtok():
7641
7642 (1) incorrectly assumes that the character set being added is the one
7643 most recently created (addtok has no argument to indicate which cset is
7644 being added, so it just uses the latest one)
7645
7646 (2) attempts to do some token sequence expansion into more primitive
7647 operators so things like [FOO] are matched efficiently.
7648
7649 Both of these assumptions are incorrect in the case that addtok()
7650 is being called from copytoks(): (1) is simply not true, and
7651 (2) is redundant--the expansion has already been done in the token sequence
7652 being copied, so there is no need to do the expansion again.
7653
7654 The correct function to add exactly one token, without further expansion,
7655 is addtok_mb(). So here is my proposed fix, which is that copytoks()
7656 should never call addtok(), but instead directly call addtok_mb()
7657 (which is what addtok() eventually calls).
7658
7659 * src/dfa.c (copytoks): Rewrite using addtok_mb directly.
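
The a{1,2} expansion described above can be sketched with a toy token
array. All names here are hypothetical; append_token plays the role the
entry assigns to addtok_mb (add exactly one token, no expansion), and
the real dfa.c data structures are more involved.

```c
#include <assert.h>
#include <stddef.h>

enum { QMARK = 256, CAT = 257, MAX_TOKENS = 64 };

struct rpn
{
  int tok[MAX_TOKENS];
  size_t len;
};

/* Append exactly one token, with no further expansion.  */
static void
append_token (struct rpn *r, int t)
{
  assert (r->len < MAX_TOKENS);
  r->tok[r->len++] = t;
}

/* Append a verbatim copy of the N tokens starting at START.  */
static void
copy_tokens (struct rpn *r, size_t start, size_t n)
{
  for (size_t i = 0; i < n; i++)
    append_token (r, r->tok[start + i]);
}

/* Turn the operand held in the last N tokens into operand{1,2}:
   operand operand QMARK CAT.  */
static void
repeat_one_or_two (struct rpn *r, size_t n)
{
  copy_tokens (r, r->len - n, n);
  append_token (r, QMARK);
  append_token (r, CAT);
}
```

Because copy_tokens never re-expands what it copies, the second copy is
token-for-token identical to the first.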
7660
76612013-05-28 Jim Meyering <meyering@fb.com>
7662
7663 maint: align backslashes consistently
7664 * tests/Makefile.am: Most backslashes were aligned with TABs,
7665 so adjust the few that used spaces to conform.
7666
7667 grep -F: avoid an infinite loop with invalid multi-byte search string
7668 * src/kwsearch.c (Fexecute): Avoid an infinite loop when processing
7669 a fixed (-F) multibyte search string that is an invalid byte sequence
7670 in the current locale and that matches the bytes of the input twice
7671 on a line. Reported by Daisuke GOTO in
7672 http://thread.gmane.org/gmane.comp.gnu.grep.bugs/4773
7673 * tests/invalid-multibyte-infloop: New test.
7674 * tests/Makefile.am (TESTS): Add it.
7675 * NEWS (Bug fixes): Mention it.
7676
76772013-04-18 Paul Eggert <eggert@cs.ucla.edu>
7678
7679 * cfg.mk (old_NEWS_hash): Update.
7680
7681 doc: document EREs like a{,10}
7682 Problem reported by Eric Blake in
7683 <http://lists.gnu.org/archive/html/bug-grep/2013-04/msg00005.html>.
7684 * NEWS: Document the bug fix.
7685 * doc/grep.in.1: Restore documentation for this feature, but mention
7686 that it is a GNU extension.
7687 * doc/grep.texi (Fundamental Structure): Mention that this feature
7688 is a GNU extension.
7689
76902013-04-02 Paul Eggert <eggert@cs.ucla.edu>
7691
7692 build: make dfa.c closer to Gawk's
7693 * src/dfa.c: Include <stddef.h>, not <sys/types.h>.
7694 stddef.h is smaller and is all we need and is portable nowadays.
7695 Include <wchar.h> and <wctype.h> only if MBS_SUPPORT.
7696
76972013-01-15 Paul Eggert <eggert@cs.ucla.edu>
7698
7699 grep: make dfa.h standalone
7700 Problem reported by Aharon Robbins in
7701 <http://lists.gnu.org/archive/html/bug-grep/2013-01/msg00007.html>.
7702 * src/dfa.c: Include dfa.h first, so that it's tested standalone.
7703 No need to include <regex.h>, since we are in charge of dfa.h and
7704 know that it includes <regex.h>.
7705 * src/dfa.h: Include <regex.h> and <stddef.h>, so that it's standalone.
7706
77072013-01-11 Stefano Lattarini <stefano.lattarini@gmail.com>
7708
7709 build: update gettext version to 0.18.2
7710 * configure.ac (AM_GNU_GETTEXT_VERSION): Update to 0.18.2.
7711 This is necessary to make the gettext-provided m4 files use
7712 AC_PROG_MKDIR_P rather than AM_PROG_MKDIR_P. This latter macro,
7713 planned to disappear in Automake 1.14, has already been removed
7714 in the development version of Automake, so that, without this
7715 change, grep fails to bootstrap with bleeding-edge Automake.
7716
77172013-01-11 Paul Eggert <eggert@cs.ucla.edu>
7718
7719 build: update gnulib submodule to latest
7720
77212013-01-11 Stefano Lattarini <stefano.lattarini@gmail.com>
7722
7723 build: remove redundant use of $(INCLUDES)
7724 * lib/Makefile.am (INCLUDES): Remove. Automake automatically adds
7725 $(srcdir) and $(top_builddir) to the C preprocessor search path.
7726 INCLUDES is deprecated in Automake 1.13 (causing a runtime
7727 warning), and will be removed in Automake 1.14.
7728
77292013-01-04 Jim Meyering <jim@meyering.net>
7730
7731 build: update gnulib submodule to latest
7732
7733 maint: update all copyright year number ranges
7734 Run "make update-copyright".
7735
77362012-11-20 Paul Eggert <eggert@cs.ucla.edu>
7737
7738 grep: normalize diagnostics
7739 * src/pcresearch.c (Pcompile): Format diagnostics the same way as
7740 elsewhere, and translate them.
7741
77422012-11-19 Paul Eggert <eggert@cs.ucla.edu>
7743
7744 grep: diagnose read errors from -f dir, porting to Solaris
7745 Problem reported by Dennis Clarke for Solaris 10 in
7746 <http://lists.gnu.org/archive/html/bug-grep/2012-11/msg00009.html>.
7747 * src/main.c (main): For -f F, diagnose any read errors
7748 encountered when reading F.
7749 * tests/Makefile.am (XFAIL_TESTS): Remove grep-dir.
7750 * tests/grep-dir: Don't assume that directories cannot be read
7751 via fread, as POSIX allows this and it can happen on Solaris.
7752
77532012-11-09 Paolo Bonzini <bonzini@gnu.org>
7754
7755 pcre: add PCRE-JIT support for grep
7756 * NEWS: Document new feature.
7757 * src/pcresearch.c [PCRE_STUDY_JIT_COMPILE] (jit_stack): New.
7758 [PCRE_STUDY_JIT_COMPILE] (Pcompile): JIT-compile the regular expression
7759 and allocate a stack for it. Based on a patch from Zoltan Herczeg.
7760 * THANKS: Add Zoltan to the list.
7761
77622012-10-24 Paul Eggert <eggert@cs.ucla.edu>
7763
7764 build: go back to AC_PROG_CC
7765 * configure.ac: Go back to using AC_PROG_CC rather than AC_PROG_CC_STDC,
7766 as the latter is obsolescent and the Autoconf bug involving the former
7767 has been fixed.
7768
77692012-10-24 Jim Meyering <jim@meyering.net>
7770
7771 build: use AC_PROG_CC_STDC rather than AC_PROG_CC
7772 * configure.ac: Use AC_PROG_CC_STDC rather than AC_PROG_CC,
7773 to accommodate autoconf-2.69-37+.
7774
7775 build: update gnulib submodule to latest
7776
77772012-10-23 Eric Blake <eblake@redhat.com>
7778
7779 build: default to --enable-gcc-warnings in a git tree
7780 Anyone building from cloned sources can be assumed to have a new
7781 enough environment, such that enabling gcc warnings by default will
7782 be useful. Tarballs still default to no warnings, and the default
7783 can still be overridden with --disable-gcc-warnings.
7784 * configure.ac (gl_gcc_warnings): Set default based on environment.
7785
77862012-10-03 Jim Meyering <meyering@redhat.com>
7787
7788 maint: factor out STREQ definition
7789 * src/main.c (STREQ): Remove definition.
7790 * src/pcresearch.c (STREQ): Likewise.
7791 * src/system.h (STREQ): Define it here instead.
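
A macro of this kind is conventionally written as below. This is the
usual form, shown as an assumption rather than quoted from system.h.

```c
#include <string.h>

/* True if the NUL-terminated strings A and B are equal.  */
#define STREQ(a, b) (strcmp (a, b) == 0)
```

Writing STREQ (a, b) instead of a bare strcmp call avoids the easy
mistake of treating strcmp's zero-on-equality result as "false".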
7792
7793 maint: correct syntax-check failures; adjust NEWS
7794 * tests/pcre-utf8: Reverse order of compare arguments.
7795 Remove all copyright year numbers except 2012.
7796 Use skip_ "diagnostic...", rather than a bare "exit 77".
7797 * NEWS: Start with a concise description of the bug.
7798 * src/pcresearch.c (STREQ): Define, so that we can...
7799 (Pcompile): use STREQ, not strcmp.
7800
78012012-10-03 Paolo Bonzini <bonzini@gnu.org>
7802
7803 tests: include UTF-8 testcases for grep -P
7804 * tests/Makefile.am (TESTS): Add pcre-utf8.
7805 * tests/pcre-utf8: New file.
7806
78072012-10-03 Petr Pisar <ppisar@redhat.com>
7808
7809 pcresearch: set UTF-8 flag correctly for UTF-8 locales
7810 Otherwise, Unicode properties (\p{XXX}) do not work with characters
7811 outside the 7-bit ASCII character set.
7812
7813 * src/pcresearch.c (Pcompile): Look for UTF-8 locales and set PCRE_UTF8
7814 if one is found.
7815
78162012-10-03 Jaroslav Škarvada <jskarvad@redhat.com>
7817
7818 doc: fix a formatting bug in grep.1 template
7819 * doc/grep.in.1: Insert .TP before the paragraph describing
7820 --dereference-recursive (-R).
7821
78222012-10-03 Jim Meyering <meyering@redhat.com>
7823
7824 maint: placate gcc's -Wjump-misses-init warning
7825 * src/kwsearch.c (Fexecute): Replace a "goto" and "return" with
7826 a simple return statement, eliminating the label, since that was
7827 the sole use.
7828 * src/dfasearch.c (EGexecute): Likewise.
7829
78302012-09-01 Jim Meyering <meyering@redhat.com>
7831
7832 build: update gnulib submodule to latest
7833
78342012-09-01 Eric Blake <eblake@redhat.com>
7835
7836 build: work with new glibc when not optimizing
7837 Starting with glibc 2.15, the system headers refuse to compile
7838 unconditional use of FORTIFY_SOURCE if optimization is disabled
7839 but -Werror is in effect.
7840
7841 * configure.ac (FORTIFY_SOURCE): Make conditional.
7842
78432012-08-19 Jim Meyering <meyering@redhat.com>
7844
7845 maint: post-release administrivia
7846 * NEWS: Add header line for next release.
7847 * .prev-version: Record previous version.
7848 * cfg.mk (old_NEWS_hash): Auto-update.
7849
7850 version 2.14
7851 * NEWS: Record release date.
7852
78532012-08-07 Jim Meyering <meyering@redhat.com>
7854
7855 build: update gnulib and bootstrap
7856
7857 tests: test for bug with -i and ^$ in a multi-byte locale
7858 * tests/empty-line-mb: New file.
7859 * tests/Makefile.am (TESTS): Add it.
7860
7861 grep -i '^$' in a multi-byte locale could report a false match
7862 * src/dfasearch.c (EGexecute): Do not match the sentinel "newline"
7863 that is appended to each buffer.
7864 This bug may sound like a big deal (it certainly surprised me), but
7865 realize that only the empty-line-matching regular expression '^$'
7866 can trigger it, and then only when you add the unnecessary (and
7867 arguably superfluous) -i, *and* run the command in a multi-byte
7868 locale. Using a multi-byte locale for such a regular expression
7869 is also pointless, and hurts performance.
7870 * NEWS (Bug fixes): Mention it.
7871 Reported by Alexander Katassonov <katasso@gmx.de>
7872
78732012-08-06 Jim Meyering <meyering@redhat.com>
7874
7875 tests: fix a skip diagnostic that mentioned the wrong locale
7876 * tests/init.cfg (require_tr_utf8_locale_): s/en_US/tr_TR/
7877
78782012-08-02 Jim Meyering <meyering@redhat.com>
7879
7880 tests: skip failing test on FS/systems that lack SEEK_HOLE support
7881 * tests/big-hole: Test for SEEK_HOLE support. If not available,
7882 skip this test. Hence, this test is now skipped on linux-3.5.0 with
7883 ext4 or tmpfs. The test runs (and passes) with at least btrfs, xfs,
7884 or ocfs2.
7885 * bootstrap.conf (gnulib_modules): Use the perl module.
7886
78872012-07-30 Jim Meyering <meyering@redhat.com>
7888
7889 maint: optimize long-line processing
7890 * src/main.c (grep): Use memrchr rather than an open-coded loop,
7891 reducing the cost of the replaced code by 50% when processing very
7892 long lines. If there were a rawmemrchr function (analogous to glibc's
7893 rawmemchr), then the performance improvement would be even greater.
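
The search that memrchr performs is a backward scan for a byte. Since
memrchr is a GNU extension, the sketch below uses a portable stand-in
named last_byte (hypothetical), together with a hypothetical caller that
finds where the last line of a buffer begins.

```c
#include <stddef.h>

/* Portable stand-in for memrchr: return a pointer to the last
   occurrence of byte C in the N bytes at S, or NULL if absent.  */
static const char *
last_byte (const char *s, int c, size_t n)
{
  const unsigned char *p = (const unsigned char *) s + n;
  while (n--)
    if (*--p == (unsigned char) c)
      return (const char *) p;
  return NULL;
}

/* Return the start of the last, possibly incomplete, line in the
   N bytes at BUF.  */
static const char *
last_line_start (const char *buf, size_t n)
{
  const char *nl = last_byte (buf, '\n', n);
  return nl ? nl + 1 : buf;
}
```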
7894
78952012-07-27 Paul Eggert <eggert@cs.ucla.edu>
7896
7897 maint: remove stat-size
7898 * bootstrap.conf (gnulib_modules): Remove stat-size.
7899 * src/main.c: Don't include stat-size.h; no longer needed.
7900
7901 grep: don't falsely report compressed text files as binary
7902 * NEWS: Document this.
7903 * src/main.c (file_is_binary): Remove the heuristic based on
7904 st_blocks, as it does not work for compressed file systems.
7905 On Solaris, it'd be cheap to test whether the file system is known
7906 to be uncompressed, which would allow the heuristic, but Solaris has
7907 SEEK_HOLE so there's little point.
7908
7909 grep: don't falsely report tiny text files as binary
7910 * NEWS: Document this.
7911 * src/main.c (file_is_binary): When we are already at apparent
7912 EOF, skip the file-size check, as some servers use zero blocks
7913 to store binary files. Reported by Martin Carroll in
7914 <http://lists.gnu.org/archive/html/bug-grep/2012-07/msg00016.html>.
7915
79162012-07-26 Paul Eggert <eggert@cs.ucla.edu>
7917
7918 doc: document -r/-R in man page
7919 * doc/grep.in.1: Document -r vs. -R.
7920
79212012-07-21 Jim Meyering <meyering@redhat.com>
7922
7923 tests: avoid false positive upon kernel OOM-kill
7924 * tests/big-match (skip_diagnostic): Handle case of 139 (SIGKILL)
7925 with no diagnostic.
7926
7927 build: update gnulib and bootstrap
7928
7929 maint: fix misspellings in old ChangeLog
7930 * ChangeLog-2009: Fix typos.
7931
79322012-07-19 Paul Eggert <eggert@cs.ucla.edu>
7933
7934 grep: fix ptrdiff/size_t clash
7935 Reported by Jaroslav Škarvada in <http://savannah.gnu.org/bugs/?36883>.
7936 * src/dfasearch.c (EGexecute): Use size_t, not ptrdiff_t, for lengths.
7937 Use regoff_t to store re_match's output, and test it before converting
7938 it to size_t.
7939
79402012-07-06 Jim Meyering <meyering@redhat.com>
7941
7942 maint: correct log typo, to reflect in generated ChangeLog
7943 * Makefile.am (gen-ChangeLog): Use --amend, now that we must
7944 make our first log correction.
7945 * build-aux/git-log-fix: New file.
7946
79472012-07-04 Jim Meyering <meyering@redhat.com>
7948
7949 maint: post-release administrivia
7950 * NEWS: Add header line for next release.
7951 * .prev-version: Record previous version.
7952 * cfg.mk (old_NEWS_hash): Auto-update.
7953
7954 version 2.13
7955 * NEWS: Record release date.
7956
7957 build: update gnulib submodule, bootstrap, init.sh
7958
79592012-06-17 Jim Meyering <meyering@redhat.com>
7960
7961 tests: add another turkish-I-related test case
7962 * tests/turkish-I-without-dot: Also exercise the case in which
7963 the original string and the lower-case buffer have precisely
7964 the same length (22 bytes here), yet internal offsets do differ.
7965
79662012-06-16 Jim Meyering <meyering@redhat.com>
7967
7968 grep -i: work also when converting to lower-case inflates byte count
7969 Commit v2.12-16-g7aa698d addressed the case in which the lower-case
7970 representation of an input byte occupies fewer bytes than the original.
7971 However, even with commit v2.12-20-g074842d, grep -i would still
7972 misbehave when converting a character to lower-case increased its
7973 byte count. The map-manipulation code assumed that the case conversion
7974 could only shrink the byte count. With the consideration that it may
7975 also inflate it, the deltas recorded in the map array must be signed,
7976 and we must account for the one-to-two-or-more mapping when the
7977 original-to-lower-case conversion causes the byte count to increase.
7978 * src/searchutils.c (mbtolower): When a lower-case character occupies
7979 more than one byte, set its remaining map slots to zero. Change the
7980 type of the map to be signed, and compute the change in character
7981 byte count as new_length - old_length.
7982 * src/search.h: Include <stdint.h>, for decl of intmax_t.
7983 (mb_case_map_apply): Adjust for signed increments:
7984 each map entry is now signed.
7985 (mb_len_map_t): Define type. Thanks to Paul Eggert for noticing
7986 in review that using a bare "char" as the base type would be wrong on
7987 systems for which it is an unsigned type (as with gcc's -funsigned-char).
7988 * src/kwsearch.c (Fcompile, Fexecute): Likewise.
7989 * src/dfasearch.c (kwsincr_case, EGexecute): Likewise.
7990 * tests/turkish-I-without-dot: New test. Thanks to Paolo Bonzini
7991 for the tip that in the tr_TR.utf8 locale, mapping "I" to lower case
7992 increases the character's byte count.
7993 * tests/Makefile.am (TESTS): Add it.
7994 * tests/init.cfg (require_tr_utf8_locale_): New function.
7995 * NEWS (Bug fixes): Expand the existing entry.
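
A sketch of how a signed per-byte map can translate a match found in the
lower-case buffer back to the original buffer. The type and function
names are hypothetical. The sketch assumes each delta is the original
character's byte length minus the lower-case character's byte length,
stored at the first byte of the lower-case character, with zeros in any
remaining bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* One signed delta per byte of the lower-case buffer.  */
typedef signed char len_delta;

/* *OFF and *LEN describe a match in the lower-case buffer.  Adjust
   them so that they describe the same text in the original buffer.  */
static void
map_apply (const len_delta *map, size_t *off, size_t *len)
{
  intmax_t off_incr = 0;
  intmax_t len_incr = 0;
  size_t k;
  for (k = 0; k < *off; k++)
    off_incr += map[k];
  for (; k < *off + *len; k++)
    len_incr += map[k];
  *off += off_incr;
  *len += len_incr;
}
```

With a two-byte original character that lowers to one byte, the map for
"x?y" is {0, 1, 0}; with a one-byte original that lowers to two bytes,
it is {0, -1, 0, 0}. The second case is why the deltas must be signed.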
7996
79972012-06-12 Paul Eggert <eggert@cs.ucla.edu>
7998
7999 grep: handle -i when chars differ in length but line does not
8000 * src/searchutils.c (mbtolower): Return the map back to the caller
8001 if any input character's length differs from the corresponding output
8002 character's, not merely if the total string length differs.
8003 Problem reported by Johannes Meixner in
8004 <http://lists.gnu.org/archive/html/bug-grep/2012-06/msg00029.html>.
8005
80062012-06-07 Jim Meyering <meyering@redhat.com>
8007
8008 tests: extend coverage of dfa.c's match_mb_charset
8009 Add a test case to increase test coverage of part of dfa.c (the DFA
8010 matcher used by grep and gawk). While thinking about removing the few
8011 remaining uses of strncpy in dfa.c, I found that none of the existing
8012 tests covered the 40+ lines of code at the end of match_mb_charset,
8013 so I constructed this test case to demonstrate that it's not dead code.
8014 * tests/dfa-coverage: New test, for improved coverage.
8015 * tests/Makefile.am (TESTS): Add it.
8016
80172012-06-05 Jim Meyering <meyering@redhat.com>
8018
8019 build: fix a subtly twisted "make distcheck" failure
8020 "make distcheck" would fail when, during a test build,
8021 an attempt to overwrite the deliberately-write-protected
8022 $(srcdir)/grep.pot file would fail.
8023 * bootstrap.conf (bootstrap_epilogue): Don't let the existence of
8024 a large sparse file in the build directory induce "make distcheck"
8025 failure. The existence of a large sparse test file named 8T-or-so
8026 would make po/Makefile.in.in's use of grep (to search for "GNU grep"
8027 as an indication that this is a GNU package) exit 2 without generating
8028 any output, which made the first xgettext use --package-name=grep,
8029 while that same search for "GNU grep" would succeed when run
8030 from a pristine from-tarball build, thus making the second
8031 xgettext invocation use --package-name='GNU grep'.
8032 That mismatch:
8033 -"Project-Id-Version: grep 2.12.18-1080\n"
8034 +"Project-Id-Version: GNU grep 2.12.18-1080\n"
8035 led to the attempt by Makefile.in.in's grep.pot-update rule to
8036 overwrite ../../grep.pot in the read-only po/ source directory.
8037
80382012-06-03 Jim Meyering <meyering@redhat.com>
8039
8040 build: update gnulib submodule, bootstrap and init.sh
8041 cfg.mk: Exempt dfa.c from the new no-strncpy test, for now.
8042
80432012-06-02 Jim Meyering <meyering@redhat.com>
8044
8045 grep: fix how -i works with a match containing the Turkish I-with-dot
8046 Fix a long-standing problem in the way grep's -i interacts with
8047 data whose byte count changes when we convert it to lower case.
8048 For example, the UTF-8 Turkish I-with-dot (İ) occupies two bytes,
8049 but its lower case analog, i, occupies just one byte. The code
8050 converts both search string and the haystack data to lower case,
8051 and then searches for the modified string in the modified buffer.
8052 The trouble arose when using a lowercase buffer <offset,length>
8053 pair to manipulate the original (longer) buffer.
8054
8055 The solution is to change mbtolower to return additional information:
8056 a malloc'd mapping vector. With that, the caller maps the lowercase-
8057 relative <offset,length> to numbers that refer to the original buffer.
8058 This mapping is used only when lengths actually differ, so the cost
8059 in general should be small.
8060
8061 * src/searchutils.c (mbtolower): Add the new map parameter.
8062 * src/search.h (mb_case_map_apply): New function.
8063 * src/kwsearch.c (Fexecute): Update mbtolower caller, and upon
8064 success, apply the new map.
8065 * src/dfasearch.c (EGexecute): Likewise.
8066 * tests/Makefile.am (XFAIL_TESTS): Remove turkish-I from this list;
8067 that test is no longer expected to fail.
8068 * NEWS (Bug fixes): Mention it.
8069 Reported by Ilya Basin in
8070 http://thread.gmane.org/gmane.comp.gnu.grep.bugs/3413 and later
8071 by Strahinja Kustudic in http://savannah.gnu.org/bugs/?36567
8072
80732012-06-01 Paul Eggert <eggert@cs.ucla.edu>
8074
8075 grep: remove unnecessary "what-if-signal?" code
8076 * src/main.c (fillbuf): Don't worry about EINTR when closing --
8077 not possible, since we're not catching signals.
8078
80792012-05-16 Paul Eggert <eggert@cs.ucla.edu>
8080
8081 grep: avoid nominal integer overflow
8082 * src/dfa.c (add_utf8_anychar): Avoid signed integer overflow.
8083 Although this works on all platforms we know about, strictly
8084 speaking the behavior is undefined, and Sun C 5.8 warns about it.
8085
80862012-05-15 Jim Meyering <meyering@redhat.com>
8087
8088 maint: avoid nit-picky syntax-check test failure; tweak big-hole test
8089 * NEWS: Restore deleted newline in "old" NEWS, to fix a syntax-check
8090 test failure.
8091 * tests/big-hole: Use awk, rather than a shell loop: saves 3000 lines
8092 of verbose shell output in the .log file.
8093
80942012-05-15 Paul Eggert <eggert@cs.ucla.edu>
8095
8096 grep: sparse files are now considered binary
8097 * NEWS: Document this.
8098 * doc/grep.texi (File and Directory Selection): Likewise.
8099 * bootstrap.conf (gnulib_modules): Add stat-size.
8100 * src/main.c: Include stat-size.h.
8101 (usable_st_size): New function, mostly stolen from coreutils.
8102 (fillbuf): Use it.
8103 (file_is_binary): New function, which looks for holes too.
8104 (grep): Use it.
8105 * tests/Makefile.am (TESTS): Add big-hole.
8106 * tests/big-hole: New file.
8107
81082012-05-06 Paul Eggert <eggert@cs.ucla.edu>
8109
8110 maint: quote 'like this' or "like this", not `like this'
8111 See <http://lists.gnu.org/archive/html/bug-grep/2012-01/msg00125.html>.
8112 * ChangeLog-2009, HACKING, NEWS, README-hacking, cfg.mk, configure.ac:
8113 * lib/colorize-w32.c, m4/pcre.m4:
8114 * src/Makefile.am, src/dfa.c, src/dosbuf.c, src/main.c:
8115 * tests/backref, tests/help-version, tests/tests:
8116 In commentary, quote 'like this' or "like this" rather than
8117 `like this' or ``like this''.
8118 * cfg.mk (old_NEWS_hash): Update due to changed old NEWS.
8119 * doc/grep.texi (General Output Control): Quote sample text
8120 with @samp, not with `...'.
8121 * src/main.c (usage):
8122 * tests/help-version: Quote 'like this' rather than `like this'
8123 in diagnostics.
8124
8125 exclude: process exclude and include directives in order
8126 Also, change exclude and include directives so that they apply to
8127 command-line arguments too. This restores the pre-2.6 behavior,
8128 and fixes a bug reported by Quentin Arce in
8129 <http://lists.gnu.org/archive/html/bug-grep/2012-04/msg00056.html>.
8130 * NEWS: Document this.
8131 * src/main.c (included_patterns): Remove. All uses removed.
8132 (skipped_file): New function.
8133 (grepdirent): New arg command_line; all callers changed. This is
8134 needed because non-command-line files can invoke fts_open, and
8135 their directory entries need to be distinguished from top-level
8136 directory entries. Move code into the new skipped_file function.
8137 (grepdesc): Check whether a command-line argument should be skipped.
8138 (main): --include and --exclude options now share excluded_patterns
8139 rather than having separate variables included_patterns and
8140 excluded_patterns.
8141 * tests/include-exclude: Add a test to detect the fixed bug.
8142
8143 build: update gnulib submodule to latest
8144
81452012-04-30 Jim Meyering <meyering@redhat.com>
8146
8147 cosmetic: binary operator goes *after* the newline, when split
8148 * src/dfa.c (match_mb_charset): Join split lines.
8149 (parse_bracket_exp): Move "||" from end of first split line
8150 to the beginning of the continued line.
8151 * src/dosbuf.c (dossified_pos): Likewise, but for "&&".
8152
8153 grep: -K is not an option: remove it from list
8154 The presence of "K" in the short-option string meant that
8155 an erroneous "grep -K ..." would fail with a bare Usage/Try...
8156 message, without the usual "invalid option -- 'K'". With this
8157 removal, now grep prints the expected invalid option diagnostic.
8158 * src/main.c (short_options): Remove "K".
8159 Reported by Петр Досычев in
8160 http://thread.gmane.org/gmane.comp.gnu.grep.bugs/4488
8161
81622012-04-29 Paolo Bonzini <bonzini@gnu.org>
8163
8164 dfa: small fixes to single-byte range computation
8165 * src/dfa.c (parse_bracket_exp): Do not call regexec with an invalid
8166 subject. Move declarations before all statements.
8167
81682012-04-27 Paolo Bonzini <bonzini@gnu.org>
8169
8170 dfa: do not use hard-locale
8171 * bootstrap.conf (gnulib_modules): Remove hard-locale.
8172 * src/dfa.c (hard_LC_COLLATE): Remove.
8173 (dfaparse): Do not initialize it.
8174 (parse_bracket_exp): Always go through system regex matcher to find
8175 single byte characters matching a range.
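
One way to find the single bytes that a bracket expression matches is to
ask the system regex matcher about each byte in turn. The sketch below
assumes a single-byte locale and a hypothetical helper name; it is not
the code in parse_bracket_exp.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Set MEMBER[b] for every nonzero byte b that BRACKET (for example
   "[a-c]") matches on its own in the current locale.  Return false
   if BRACKET does not compile.  */
static bool
bracket_members (const char *bracket, bool member[256])
{
  regex_t re;
  if (regcomp (&re, bracket, REG_NOSUB) != 0)
    return false;
  member[0] = false;
  for (int b = 1; b < 256; b++)
    {
      char subject[2] = { (char) b, '\0' };
      member[b] = regexec (&re, subject, 0, NULL, 0) == 0;
    }
  regfree (&re);
  return true;
}
```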
8176
8177 drop support for Makefile.boot
8178 * Makefile.am: Do not distribute README-boot and Makefile.boot.
8179 * NEWS: Mention this change.
8180 * README-alpha: Do not mention README-boot and Makefile.boot.
8181 * Makefile.boot: Remove.
8182 * README-boot: Remove.
8183
81842012-04-27 Aharon Robbins <arnold@skeeve.com>
8185
8186 dfa: do not use strcoll to match multibyte characters in ranges
8187 This does not affect the behavior of grep, which always defers
8188 to glibc or gnulib when matching ranges.
8189 * src/dfa.c (match_mb_charset): Compare wc directly to the range
8190 endpoints.
8191
8192 dfa: include stdbool.h explicitly
8193 * src/dfa.c: Include stdbool.h explicitly.
8194
81952012-04-23 Jim Meyering <meyering@redhat.com>
8196
8197 maint: post-release administrivia
8198 * NEWS: Add header line for next release.
8199 * .prev-version: Record previous version.
8200 * cfg.mk (old_NEWS_hash): Auto-update.
8201
8202 version 2.12
8203 * NEWS: Record release date.
8204
8205 build: update gnulib submodule to latest
8206
8207 tests: skip annoyingly long gnulib lock tests
8208 * bootstrap.conf (avoided_gnulib_modules): Define.
8209 (gnulib_tool_option_extras): Use it.
8210
82112012-04-22 Jim Meyering <meyering@redhat.com>
8212
8213 tests: avoid spurious quote-mismatch failure on OS/X
8214 * tests/in-eq-out-infloop: Simplify expected error output, eliminating
8215 expected quotes altogether, thus avoiding spurious OS/X-specific
8216 failure due to mismatch of multi-byte vs. single-byte quotes.
8217
82182012-04-17 Jim Meyering <meyering@redhat.com>
8219
8220 build: update gnulib submodule to latest
8221 * bootstrap: Also update this file.
8222
82232012-04-17 Jim Meyering <meyering@redhat.com>
8224
8225 grep: fix --devices=ACTION (-D) so stdin is once again exempt
8226 An oversight in the 2.11 changes made it so "echo x|grep x" would
8227 fail for those who set GREP_OPTIONS=--devices=skip.
8228
8229 * src/main.c (grepdesc): Ignore skip-related options when reading
8230 from standard input.
8231 * tests/skip-device: New file. Test for the above.
8232 * tests/Makefile.am (TESTS): Add it.
8233 * doc/grep.texi (File and Directory Selection): Clarify this point,
8234 documenting the stdin exemption.
8235 * NEWS (Bug fixes): Mention it, and add a few "[fixed in ...]" notes.
8236 Reported by Tino Keitel in http://bugs.debian.org/669084,
8237 and forwarded to bug-grep by Aníbal Monsalve Salazar.
8238
82392012-04-13 Jim Meyering <meyering@redhat.com>
8240
8241 maint: dfa: correct bogus formatting
8242 * src/dfa.c (transit_state, dfaexec): s/++ * VAR/++*VAR/
8243
8244 maint: dfa: add/improve comments
8245 * src/dfa.c (transit_state_consume_1char): Note always-ignored
8246 return value.
8247 Fix typos: s/equivalent class/equivalence class/.
8248
8249 maint: dfa: avoid unnecessary uses of strcpy/strncpy
8250 * src/dfa.c (icatalloc): Use memcpy, not strcpy, given the length.
8251 (dfamust): Combine MALLOC+strcpy into cleaner xmemdup.
8252 (parse_bracket_exp): Likewise, but replace a use of strncpy.
8253
8254 grep: handle symlinked directory loops as usual
8255 * src/main.c (grepfile): Treat EMLINK just like ELOOP, for
8256 systems like FreeBSD 9.0 on which we would otherwise report
8257 "Too many links" rather than ignoring that type of failure.
8258 E.g., "mkdir d; cd d; ln -s . a; grep -r ^" would print
8259 "grep: a: Too many links" and would exit with status 2.
8260 Now, it prints nothing and exits with status 1, as before.
8261 Reported by Nelson H. F. Beebe.
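
The errno handling described in this entry can be sketched roughly as follows. This is an illustrative C fragment, not the code in src/main.c; the helper name is_symlink_loop_errno is hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: return true if ERR is one of the errno values
   that an open() of a self-referential symlink may produce.  ELOOP is
   the usual value; per the entry above, some systems report EMLINK
   instead, so both are treated alike.  */
bool
is_symlink_loop_errno (int err)
{
  return err == ELOOP || err == EMLINK;
}
```

A caller would consult such a predicate after a failed open and, for a symlink loop, skip the usual diagnostic.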
8262
8263 tests: avoid spurious failure of the symlink test
8264 * tests/symlink: Ignore spurious "Binary file d matches" on
8265 systems for which reading from a directory actually succeeds.
8266 Reported by Bruno Haible and Nelson Beebe.
8267
82682012-04-09 Jim Meyering <meyering@redhat.com>
8269
8270 tests: avoid syntax-check failure: reverse compare arguments
8271 * tests/repetition-overflow: Fix reversed compare arguments.
8272
8273 build: update gnulib submodule to latest
8274
82752012-03-18 Paul Eggert <eggert@cs.ucla.edu>
8276
8277 grep: report overflow for ERE a{1000000000}
8278 * NEWS: Document this.
8279 * src/dfa.c (MIN): New macro.
8280 (lex): Lexically analyze the repeat-count operator once, not
8281 twice; the double-scan complicated the code and made it harder to
8282 understand and fix. Adjust the repeat-count parsing so that it
8283 better matches the behavior of the regex code, in three ways:
8284 1. Diagnose too-large repeat counts rather than treating them as
8285 literal characters. 2. Use RE_INVALID_INTERVAL_ORD, not
8286 RE_NO_BK_BRACES, to decide whether to treat invalid-syntax {...}s
8287 as literals. 3. Use the same wording for {...}-related
8288 diagnostics that the regex code uses.
8289 * tests/bre.tests, tests/ere.tests, tests/repetition-overflow:
8290 Adjust to match new behavior, and add a few tests.
8291 * cfg.mk (exclude_file_name_regexp--sc_error_message_uppercase):
8292 New macro, since the diagnostics start with uppercase letters.
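
One way to diagnose too-large repeat counts without overflowing the accumulator is to clamp while scanning, which is the kind of job a MIN-style macro is suited to. The following is a minimal sketch under stated assumptions, not the code in src/dfa.c: the names DUP_LIMIT, parse_repeat_count and repeat_count_too_large are hypothetical, and the limit value is assumed for illustration.

```c
/* Hypothetical upper bound on a repeat count, standing in for the
   regex library's limit (assumed value, for illustration only).  */
enum { DUP_LIMIT = 0x7fff };

/* Parse the decimal digits at *SP and advance *SP past them.
   Return -1 if there are no digits.  Otherwise return the value,
   clamped to DUP_LIMIT + 1 so the accumulator can never overflow.  */
int
parse_repeat_count (const char **sp)
{
  const char *p = *sp;
  int count = -1;

  for (; '0' <= *p && *p <= '9'; p++)
    {
      int digit = *p - '0';
      count = (count < 0 ? 0 : count) * 10 + digit;
      if (count > DUP_LIMIT)
        count = DUP_LIMIT + 1;  /* remember "too large", keep scanning */
    }

  *sp = p;
  return count;
}

/* Return nonzero if COUNT must be diagnosed as too large.  */
int
repeat_count_too_large (int count)
{
  return count > DUP_LIMIT;
}
```

With this shape, a pattern such as a{1000000000} yields a count that is flagged as too large instead of wrapping around or being treated as literal text.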
8293
82942012-03-14 Paul Eggert <eggert@cs.ucla.edu>
8295
8296 grep: -r no longer follows symlinks; use fts
8297 Change -r to follow only command-line symlinks, and by default to
8298 read only devices named on the command line. This is a simple
8299 way to get a more-useful behavior when searching random
8300 directories; the idea is to use 'find' if you want something fancy.
8301 -R acts as before and gets a new alias --dereference-recursive.
8302 The code now uses fts internally, so it is more robust and
8303 faster with large hierarchies.
8304 * .gitignore: Remove lib/savedir.c, lib/savedir.h.
8305 * tests/symlink: New file.
8306 * Makefile.boot (LIB_OBJS_core): Remove isdir.o, savedir.o.
8307 Perhaps other changes are needed too, but I'm not sure what
8308 this makefile is for.
8309 * NEWS: Document changes.
8310 * doc/grep.texi (File and Directory Selection): Likewise.
8311 * bootstrap.conf (gnulib_modules): Remove dirent, dirname, isdir, open.
8312 Add fstatat, fts, openat-safer.
8313 * lib/Makefile.am (libgreputils_a_SOURCES): Remove savedir.c, savedir.h.
8314 * lib/savedir.c, lib/savedir.h: Remove.
8315 * po/POTFILES.in: Add lib/openat-die.c.
8316 * src/main.c: Include fcntl-safer.h, fts_.h. Don't include
8317 isdir.h, savedir.h.
8318 (struct stats, stats_base): Remove.
8319 (long_options, usage, main): Add --dereference-recursive and
8320 implement -r vs -R.
8321 (filename_prefix_len, fts_options): New static vars.
8322 (basic_fts_options, READ_COMMAND_LINE_DEVICES): New constants.
8323 (devices): Now defaults to READ_COMMAND_LINE_DEVICES.
8324 (reset, grep): Now takes just struct stat rather than file name and
8325 struct stats. All callers changed.
8326 (fillbuf): Now takes struct stat rather than struct stats.
8327 All callers changed.
8328 (grep): Don't worry about recursing too deeply; fts and grepdesc
8329 handle this now.
8330 (is_device_mode, grepdirent, grepdesc, grep_command_line_args):
8331 New functions.
8332 (grepfile): New args DIRDESC, FOLLOW, COMMAND_LINE. Remove struct stats
8333 arg. All callers changed. Use openat_safer rather than open.
8334 Use desc == STDIN_FILENO to tell whether we're reading "-".
8335 Don't worry about EINTR when closing -- not possible, since we're
8336 not catching signals.
8337 * tests/Makefile.am (TESTS): Add symlink.
8338 * tests/symlink: New file.
8339
83402012-03-12 Paul Eggert <eggert@cs.ucla.edu>
8341
8342 tests: port big-match to non-GNU dd
8343 * tests/big-match: Don't assume GNU dd extension "bs=1M".
8344
8345 tests: test for bug with -r --exclude-dir and no file operand
8346 * tests/include-exclude: Test for the bug and fix.
8347
83482012-03-12 Allan McRae <allan@archlinux.org>
8349
8350 grep: fix segfault with -r --exclude-dir and no file operand
8351 * src/main.c (grepdir): Don't invoke excluded_file_name on NULL.
8352 * NEWS (Bug fixes): Mention it.
8353
83542012-03-09 Jim Meyering <meyering@redhat.com>
8355
8356 tests: exercise two recently-fixed bugs
8357 * tests/repetition-overflow: New test for bugs fixed by commit
8358 v2.10-82-gcbbc1a4.
8359 * tests/Makefile.am (TESTS): Add it.
8360
83612012-03-03 Jim Meyering <meyering@redhat.com>
8362
8363 maint: use an optimal-for-grep xz compression setting
8364 * cfg.mk (XZ_OPT): Use -6e (determined empirically, see comments).
8365 This sacrifices a meager 60 bytes of compressed tarball size for a
8366 55-MiB decrease in the memory required during decompression. I.e.,
8367 using -9e would shave off only 60 bytes from the tar.xz file, yet
8368 would force every decompression process to use 55 MiB more memory.
8369
8370 build: update gnulib submodule to latest
8371
83722012-03-02 Jim Meyering <meyering@redhat.com>
8373
8374 maint: post-release administrivia
8375 * NEWS: Add header line for next release.
8376 * .prev-version: Record previous version.
8377 * cfg.mk (old_NEWS_hash): Auto-update.
8378
8379 version 2.11
8380 * NEWS: Record release date.
8381
8382 tests: avoid failure when using Solaris 10's sed
8383 * tests/reversed-range-endpoints: Use a simpler sed expression to
8384 sanitize actual output, so it also works with Solaris 10's /bin/sed.
8385
83862012-03-01 Jim Meyering <meyering@redhat.com>
8387
8388 maint: manually correct formatting in dfa.c's cpp definitions
8389 * src/dfa.c: Adjust formatting in cpp definitions.
8390
8391 maint: indent dfa.c
8392 * src/dfa.c: Filter through indent like this:
8393 HOME=. indent -Tsize_t -l79 --leave-preprocessor-space \
8394 --dont-format-comments --no-tabs < dfa.c > k && mv k dfa.c
8395
8396 doc: correct grep.1's descriptions of \w and \W (they omitted "_")
8397 * doc/grep.in.1: Fix descriptions of \w and \W.
8398 They did not mention "_".
8399 * doc/grep.texi (The Backslash Character and Special Expressions):
8400 [\w, \W]: List the "_" before the char class, not after: [_[:alnum:]],
8401 for readability and to be consistent with the man page.
8402
84032012-03-01 Paul Eggert <eggert@cs.ucla.edu>
8404
8405 maint: spelling fixes
8406
8407 grep: fix integer-overflow issues in main program
8408 * NEWS: Document this.
8409 * bootstrap.conf (gnulib_modules): Add inttypes, xstrtoimax.
8410 Remove xstrtoumax.
8411 * src/main.c: Include <inttypes.h>, for INTMAX_MAX, PRIdMAX.
8412 (context_length_arg, prtext, grepbuf, grep, grepfile)
8413 (get_nondigit_option, main):
8414 Use intmax_t, not int, for line counts.
8415 (context_length_arg, main): Silently ceiling line counts
8416 to maximum value, since there's no practical difference between
8417 doing that and using infinite-precision arithmetic.
8418 (out_before, out_after, pending): Now intmax_t, not int.
8419 (max_count, outleft): Now intmax_t, not off_t.
8420 (prepend_args, prepend_default_options, main):
8421 Use size_t, not int, for sizes.
8422 (prepend_default_options): Check for int and size_t overflow.
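
The "silently ceiling" behavior for line counts can be illustrated with the standard strtoimax function, which already returns INTMAX_MAX when the input is too large. This is a hedged sketch rather than the code in src/main.c; parse_line_count is a hypothetical name.

```c
#include <inttypes.h>
#include <stdbool.h>

/* Hypothetical helper: parse STR as a nonnegative line count.
   On success store the value in *OUT and return true.  A count too
   large for intmax_t is silently stored as INTMAX_MAX, because
   strtoimax returns that value on positive overflow.  */
bool
parse_line_count (const char *str, intmax_t *out)
{
  char *end;
  intmax_t value = strtoimax (str, &end, 10);

  /* Reject empty input, trailing junk, and negative counts.  */
  if (end == str || *end != '\0' || value < 0)
    return false;

  *out = value;
  return true;
}
```

Since no input can contain more than INTMAX_MAX lines in practice, saturating gives the same observable result as exact arithmetic would.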
8423
8424 grep: avoid mishandling of long lines
8425 * src/pcresearch.c (Pexecute): Do not pass a line longer than
8426 INT_MAX to pcre_exec, since its API does not permit that.
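
When an API takes a length as int, the caller has to check the range before narrowing. A minimal sketch of that guard follows; length_to_int is a hypothetical helper, not a function from src/pcresearch.c.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: convert LEN to int only if it is representable.
   Return false, leaving *OUT untouched, when LEN exceeds INT_MAX.  */
bool
length_to_int (size_t len, int *out)
{
  if (len > (size_t) INT_MAX)
    return false;

  *out = (int) len;
  return true;
}
```

A caller that gets false back can report the line as too long instead of passing a truncated length on.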
8427
8428 grep: remove no-longer-used setrlimit code
8429 This code has been unused and obsolescent ever since the regex
8430 code stopped using the stack for large regular expressions.
8431 * src/main.c [HAVE_SETRLIMIT]: Do not include <sys/time.h> or
8432 <sys/resource.h>; no longer needed.
8433 (set_rlimits): Remove. All callers changed.
8434
8435 grep: fix some core dumps with long lines etc.
8436 These problems mostly occur because the code attempts to stuff
8437 sizes into int or into unsigned int; this doesn't work on most
8438 64-bit hosts and the errors can lead to core dumps.
8439 * NEWS: Document this.
8440 * src/dfa.c (token): Typedef to ptrdiff_t, since the enum's
8441 range could be as small as -128 .. 127 on practical hosts.
8442 (position.index): Now size_t, not unsigned int.
8443 (leaf_set.elems): Now size_t *, not unsigned int *.
8444 (dfa_state.hash, struct mb_char_classes.nchars, .nch_classes)
8445 (.nranges, .nequivs, .ncoll_elems, struct dfa.cindex, .calloc, .tindex)
8446 (.talloc, .depth, .nleaves, .nregexps, .nmultibyte_prop, .nmbcsets)
8447 (.mbcsets_alloc): Now size_t, not int.
8448 (dfa_state.first_end): Now token, not int.
8449 (state_num): New type.
8450 (struct mb_char_classes.cset): Now ptrdiff_t, not int.
8451 (struct dfa.utf8_anychar_classes): Now token[5], not int[5].
8452 (struct dfa.sindex, .salloc, .tralloc): Now state_num, not int.
8453 (struct dfa.trans, .realtrans, .fails): Now state_num **, not int **.
8454 (struct dfa.newlines): Now state_num *, not int *.
8455 (prtok): Don't assume 'token' is no wider than int.
8456 (lexleft, parens, depth): Now size_t, not int.
8457 (charclass_index, nsubtoks)
8458 (parse_bracket_exp, addtok, copytoks, closure, insert, merge, delete)
8459 (state_index, epsclosure, state_separate_contexts)
8460 (dfaanalyze, dfastate, build_state, realloc_trans_if_necessary)
8461 (transit_state_singlebyte, match_anychar, match_mb_charset)
8462 (check_matching_with_multibyte_ops, transit_state_consume_1char)
8463 (transit_state, dfaexec, free_mbdata, dfaoptimize, dfafree)
8464 (freelist, enlist, addlists, inboth, dfamust):
8465 Don't assume indexes fit in 'int'.
8466 (lex): Avoid overflow in string-to-{hi,lo} conversions.
8467 (dfaanalyze): Redo indexing so that it works with size_t values,
8468 which cannot go negative.
8469 * src/dfa.h (dfaexec): Count argument is now size_t *, not int *.
8470 (dfastate): State numbers are now ptrdiff_t, not int.
8471 * src/dfasearch.c: Include "intprops.h", for TYPE_MAXIMUM.
8472 (kwset_exact_matches): Now size_t, not int.
8473 (EGexecute): Don't assume indexes fit in 'int'.
8474 Check for overflow before converting a ptrdiff_t to a regoff_t,
8475 as regoff_t is narrower than ptrdiff_t in 64-bit glibc (contra POSIX).
8476 Check for memory exhaustion in re_search rather than treating
8477 it merely as failure to match; use xalloc_die () to report any error.
8478 * src/kwset.c (struct trie.accepting): Now size_t, not unsigned int.
8479 (struct kwset.words): Now ptrdiff_t, not int.
8480 * src/kwset.h (struct kwsmatch.index): Now size_t, not int.
8481
8482 tests: test for problems with long matches
8483 The new test is expensive, so add a category of expensive tests,
8484 which are normally not run, and put the new test in this new
8485 category. The idea of having expensive tests is taken from coreutils.
8486 * HACKING: Mention RUN_EXPENSIVE_TESTS and similar env vars.
8487 * Makefile.am (check-expensive): New rule.
8488 * tests/Makefile.am (TESTS): Add big-match.
8489 * tests/init.cfg (expensive_): New function, from coreutils.
8490 * tests/big-match: New file.
8491
84922012-02-29 Paul Eggert <eggert@cs.ucla.edu>
8493
8494 maint: use gnulib _Noreturn rather than __attribute__ ((noreturn))
8495 * src/grep.h (__attribute__): Remove.
8496 * src/dfa.h (__attribute__): Likewise.
8497 (dfaerror): Use noreturn rather than __attribute__ ((noreturn)).
8498 * src/main.c (usage): Likewise.
8499
85002012-02-26 Jim Meyering <meyering@redhat.com>
8501
8502 build: update submodule, bootstrap, tests/init.sh from gnulib
8503 * gl/lib/regcomp.c.diff: Adjust.
8504 * bootstrap: Update from gnulib.
8505 * tests/init.sh: Update from gnulib.
8506
85072012-02-26 Paolo Bonzini <bonzini@gnu.org>
8508
8509 dfa: merge calls to SUCCEEDS_IN_CONTEXT
8510 * src/dfa.c (state_index): Use a single call to SUCCEEDS_IN_CONTEXT.
8511
8512 dfa: fix a subtle constraint encoding bug
8513 * src/dfa.c (SUCCEEDS_IN_CONTEXT, PREV_NEWLINE_DEPENDENT,
8514 PREV_LETTER_DEPENDENT): Rewrite to handle all 3*3=9 possible
8515 combinations of previous and next character contexts.
8516 (MATCHES_NEWLINE_CONTEXT, MATCHES_LETTER_CONTEXT): Remove.
8517 (NO_CONSTRAINT, BEGLINE_CONSTRAINT, ENDLINE_CONSTRAINT,
8518 BEGWORD_CONSTRAINT, ENDWORD_CONSTRAINT, LIMWORD_CONSTRAINT,
8519 NOTLIMWORD_CONSTRAINT): Switch to new encoding.
8520 * NEWS: Document resulting bugfix.
8521 * tests/spencer1.tests: Add regression test.
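
To see why 3*3=9 combinations matter, a constraint can be modeled as a 9-bit mask with one bit per (previous, current) context pair. The sketch below is only an illustration of that idea. It does not reproduce the bit layout or macro names used in src/dfa.c, and every identifier in it is hypothetical.

```c
/* Hypothetical context indices: neither letter nor newline, a
   word-constituent letter, or a newline.  */
enum ctx { CTX_IDX_NONE = 0, CTX_IDX_LETTER = 1, CTX_IDX_NEWLINE = 2 };

/* One bit for each of the nine (previous, current) pairs.  */
#define PAIR_BIT(prev, curr) (1u << ((prev) * 3 + (curr)))

/* Illustrative constraints built from the pairs they accept.  */
#define AT_LINE_START (PAIR_BIT (CTX_IDX_NEWLINE, CTX_IDX_NONE) \
                       | PAIR_BIT (CTX_IDX_NEWLINE, CTX_IDX_LETTER) \
                       | PAIR_BIT (CTX_IDX_NEWLINE, CTX_IDX_NEWLINE))
#define AT_WORD_START (PAIR_BIT (CTX_IDX_NONE, CTX_IDX_LETTER) \
                       | PAIR_BIT (CTX_IDX_NEWLINE, CTX_IDX_LETTER))

/* Return nonzero if CONSTRAINT accepts the pair (PREV, CURR).  */
int
pair_allowed (unsigned constraint, enum ctx prev, enum ctx curr)
{
  return (constraint & PAIR_BIT (prev, curr)) != 0;
}
```

Because each pair has its own bit, a constraint such as "start of word" can accept a letter after a newline while rejecting a letter after another letter, which an encoding with fewer distinct cases cannot express.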
8522
8523 dfa: do not use MATCHES_*_CONTEXT directly
8524 * src/dfa.c (dfastate): Use SUCCEEDS_IN_CONTEXT.
8525
8526 dfa: change meaning of a state context
8527 * src/dfa.c (MATCHES_NEWLINE_CONTEXT, MATCHES_LETTER_CONTEXT): New.
8528 (state_separate_contexts): Remove second argument.
8529 (state_index): Do not mask away CTX_NONE.
8530 (dfaanalyze): Adjust call to state_index and state_separate_contexts.
8531 (dfastate): Adjust calls to state_index and state_separate_contexts.
8532
85332012-02-13 Paul Eggert <eggert@cs.ucla.edu>
8534
8535 tests: fix loop in epipe test
8536 * tests/epipe: Don't loop forever if the bug is present.
8537 Problem reported by Jaroslav Skarvada.
8538
85392012-02-08 Paul Eggert <eggert@cs.ucla.edu>
8540
8541 tests: work portably even if SIGPIPE is ignored
8542 * tests/epipe: Don't rely on "trap - PIPE"; that's not portable.
8543 Problem reported by Eric Blake in
8544 <http://lists.gnu.org/archive/html/bug-grep/2012-02/msg00017.html>.
8545 Also, use "ls -al" rather than "echo", in case "echo" is done by a
8546 buggy shell that ignores write errors. And close grep's fd 3, as
8547 a sanity check.
8548
85492012-02-07 Paul Eggert <eggert@cs.ucla.edu>
8550
8551 tests: work even if SIGPIPE is ignored
8552 * tests/epipe: Do not infinite-loop if SIGPIPE is already ignored.
8553 It could be that the invoker of 'make check' ignores SIGPIPE,
8554 for example.
8555
85562012-02-05 Jim Meyering <meyering@redhat.com>
8557
8558 build: accommodate -Wshadow and -Werror=suggest-attribute=pure
8559 * src/dfa.c (state_separate_contexts): Add _GL_ATTRIBUTE_PURE.
8560 (dfaexec): Rename parameter, s/newline/allow_nl/, to avoid
8561 shadowing the global.
8562
85632012-02-05 Paolo Bonzini <bonzini@gnu.org>
8564
8565 dfa: refactor common context computations
8566 * src/dfa.c (CTX_ANY, charclass_context, state_separate_contexts): New.
8567 (dfaanalyze): Use state_separate_contexts.
8568 (dfastate): Use charclass_context and state_separate_contexts. Rename
8569 prev_context to separate_contexts.
8570
8571 dfa: change newline/letter to a single context value
8572 * src/dfa.c (MATCHES_NEWLINE_CONTEXT, MATCHES_LETTER_CONTEXT,
8573 SUCCEEDS_IN_CONTEXT, ACCEPTS_IN_CONTEXT): Take a single context value
8574 for prev and curr.
8575 (struct dfa_state): Replace newline and letter with context.
8576 (wchar_context): New.
8577 (state_index): Replace newline and letter with context. Compare
8578 context values in the state struct. Adjust calls to pass contexts.
8579 (wants_newline): Replace with wanted_context. Adjust calls to pass
8580 contexts.
8581 (dfastate): Replace wants_newline and wants_letter with wanted_context.
8582 Adjust calls to pass contexts.
8583 (build_state): Adjust calls to pass contexts.
8584 (match_anychar, match_mb_charset, transit_state): Use wchar_context.
8585 Adjust calls to pass contexts.
8586
85872012-02-05 Paolo Bonzini <bonzini@gnu.org>
8588
8589 dfa: introduce contexts for the values in d->success
8590 Also initialize all tables in a single place in dfasyntax.
8591
8592 * src/dfa.c (CTX_NONE, CTX_LETTER, CTX_NEWLINE, char_context): New.
8593 (sbit, letters, newline): New.
8594 (dfasyntax): Fill them.
8595 (dfastate): Remove letters, newline, initialized.
8596 (build_state): Use CTX_* constants.
8597 (dfaexec): Remove sbit and sbit_init.
8598
85992012-02-05 Paolo Bonzini <bonzini@gnu.org>
8600
8601 dfa: remove useless check
8602 * src/dfa.c (state_index): There is nothing that is a newline *and*
8603 a letter. Remove redundant call to SUCCEEDS_IN_CONTEXT.
8604
86052012-01-22 Jim Meyering <meyering@redhat.com>
8606
8607 build: update bootstrap from gnulib and adapt
8608 * bootstrap: Update from gnulib.
8609 * tests/init.sh: Update from gnulib.
8610 * bootstrap.conf (bootstrap_epilogue): Remove now-unnecessary
8611 snippet that edited gnulib-tests/gnulib.mk.
8612 (gnulib_tool_option_extras): Add both --symlink and
8613 --makefile-name=gnulib.mk. Remove use of $bt.
8614 * lib/Makefile.am: Initialize numerous automake variables so that
8615 generated code in gnulib.mk may use += to append to them.
8616
8617 maint: convert `this' to 'this' quoting style in diagnostics
8618 Now that gnulib's quote and quotearg modules use 'this' style,
8619 change the few explicit uses in diagnostics to conform.
8620 * src/egrep.c (after_options): Use 'this' style of quotes.
8621 * src/fgrep.c (after_options): Likewise.
8622 * src/grep.c (after_options): Likewise.
8623 * src/main.c (usage): Likewise.
8624
8625 build: update gnulib to latest; adjust quoting in tests
8626 * gnulib: Update.
8627 * tests/in-eq-out-infloop: Convert expected diagnostics to match
8628 new quoting.
8629
86302012-01-22 Paul Eggert <eggert@cs.ucla.edu>
8631
8632 doc: document recent diagnostics-related changes
8633 * NEWS: Document changes re diagnostics related to GREP_COLORS,
8634 directory loops, -s, "write error".
8635
8636 grep: be quiet about GREP_COLORS syntax
8637 * src/main.c (struct color_cap): fct now returns void,
8638 since there's no longer need to use what it returns.
8639 (color_cap_mt_fct, color_cap_rv_fct, color_cap_ne_fct): Return void.
8640 (parse_grep_colors): Do not output diagnostics and then exit with
8641 status 0. Instead, ignore errors in GREP_COLORS. This is more
8642 consistent with programs that (e.g.) ignore errors in termcap entries,
8643 and it's more internally-consistent as some GREP_COLORS errors
8644 were ignored but not others.
8645
8646 grep: exit with nonzero status if directory loop
8647 * src/main.c (grepdir): Exit with status 2 if a directory loop is
8648 found, since the output might not be "right" (i.e., infinite...).
8649
8650 grep: suppress read errors if -s
8651 * src/main.c (reset, grep, grepfile): Do not report an input error
8652 if -s is given.
8653
8654 grep: don't say "write error" over and over
8655 Problem reported by Travis Gummels in
8656 <https://bugzilla.redhat.com/show_bug.cgi?id=741452>.
8657 * src/main.c (write_error_seen): New static var.
8658 (clean_up_stdout): New function.
8659 (prline): Do not output 'write error' more than once; exit
8660 after the first one. Use the same wording for the diagnostic
8661 that close_stdout uses.
8662 (main): Clean up with clean_up_stdout, not close_stdout, so that
8663 grep doesn't output multiple "write error" diagnostics.
8664 * tests/Makefile.am (TESTS): Add epipe.
8665 * tests/epipe: New file.
8666
86672012-01-12 Paul Eggert <eggert@cs.ucla.edu>
8668
8669 dfa: non-glibc word-constituent unibyte fix
8670 * src/dfa.c (is_valid_unibyte_character): Fix typo that caused
8671 this to incorrectly return 0 on unibyte non-glibc systems.
8672 Problem reported by Aharon Robbins in
8673 <http://lists.gnu.org/archive/html/bug-grep/2012-01/msg00084.html>.
8674
86752012-01-04 Paul Eggert <eggert@cs.ucla.edu>
8676
8677 doc: document empty pattern better
8678 * doc/grep.texi (Top, Fundamental Structure, Usage):
8679 Explain how grep deals with the empty pattern.
8680 Problem spotted by Bernhard Voelker in
8681 <http://lists.gnu.org/archive/html/bug-grep/2012-01/msg00050.html>.
8682
8683 grep: with no args, search "." only if command-line -r
8684 * NEWS: Document this.
8685 * doc/grep.texi (Environment Variables, grep Programs): Likewise.
8686 * src/main.c (usage): Likewise.
8687 (main): Implement this.
8688 (prepend_default_options): Return a count of prepended options.
8689 * tests/r-dot: Test the above.
8690
86912012-01-03 Jim Meyering <meyering@redhat.com>
8692
8693 tests: adjust test to match code, now that --mmap writes to stderr
8694 * tests/ignore-mmap: Separate stdout and stderr; test both.
8695
8696 deprecate the --mmap option
8697 * src/main.c (main): Deprecate the --mmap option: issue a warning
8698 when it is used.
8699 (usage): Change description.
8700 * doc/grep.texi (Other Options): Document the new behavior.
8701 * NEWS (Changes in behavior): Mention it.
8702
87032012-01-03 Paolo Bonzini <bonzini@gnu.org>
8704
8705 dfa: fix incorrect comment
8706 * src/dfa.c (dfastate): Fix comment for newline.
8707
8708 dfa: fix rebase conflict
8709 * src/dfa.c (dfaanalyze): Fix reference to nalloc.
8710
8711 dfa: automatically resize position_sets
8712 * src/dfa.c (insert, copy, merge): Resize arrays here.
8713 (dfaanalyze): Do not track number of allocated elements here.
8714 (dfastate): Allocate mbps with only one element.
8715
8716 dfa: change position_set nelem to size_t
8717 * src/dfa.c (REALLOC_IF_NECESSARY): Disable assertion, to avoid
8718 warnings from -Wtype-limits.
8719 (position_set): Change nelem to a size_t.
8720
8721 dfa: move nalloc to position_set structure
8722 * src/dfa.c (position_set): Add alloc.
8723 (alloc_position_set): Initialize it.
8724 (dfaanalyze): Use it instead of the nalloc array or nelem.
8725
8726 dfa: remove dead assignment
8727 * src/dfa.c (transit_state): transit_state_consume_1char will clear follows,
8728 do not do this ourselves.
8729
8730 dfa: introduce alloc_position_set
8731 * src/dfa.c (alloc_position_set): New function, use it throughout.
8732
8733 dfa: use a more compact data type for grps
8734 * src/dfa.c (leaf_set): New.
8735 (dfastate): Use the smaller type, leaf_set, for grps. Its prior type
8736 contained an unused constraint field.
8737
8738 dfa: use MALLOC/REALLOC always
8739 * src/dfa.c (dfastate, enlist, dfamust): Use MALLOC and REALLOC.
8740
8741 dfa: remove unnecessary braces
8742 * src/dfa.c (dfastate): Remove unnecessary braces.
8743
8744 dfa: x2nrealloc starting from a NULL pointer works
8745 * src/dfa.c (parse_bracket_exp): Do not MALLOC mbcset parts the first time
8746 they are encountered. Initialize chars_al correctly.
8747
87482012-01-03 Jim Meyering <meyering@redhat.com>
8749
8750 build: avoid build failure with --enable-gcc-warnings and recent gcc
8751 * lib/colorize-posix.c: Disable -Wsuggest-attribute=const, to avoid
8752 warning about this empty init_colorize function.
8753
87542012-01-03 Paolo Bonzini <bonzini@gnu.org>
8755
8756 remove lib/ms/
8757 * configure.ac: Create lib/colorize.c as a symbolic link.
8758 * lib/colorize-posix.c: New name of lib/colorize-impl.c.
8759 * lib/colorize-w32.c: New name of lib/ms/colorize-impl.c.
8760 * lib/colorize.c: Delete.
8761 * lib/Makefile.am (EXTRA_DIST): Adjust.
8762 * .gitignore: Adjust.
8763 * cfg.mk: Adjust syntax-check exclusions.
8764
8765 unify colorize.h headers
8766 * lib/Makefile.am (EXTRA_DIST): Adjust.
8767 * lib/colorize.h: Remove inline functions.
8768 * lib/colorize-impl.c: Move them here as functions.
8769 * lib/ms/colorize.h: Remove.
8770 * src/Makefile.am (DEFAULT_HEADERS): Remove.
8771
87722012-01-02 Paolo Bonzini <bonzini@gnu.org>
8773
8774 colorize: use isatty module
8775 * bootstrap.conf: Add isatty module.
8776 * gnulib: Update to latest.
8777 * lib/colorize.h: Remove argument from should_colorize.
8778 * lib/ms/colorize.h: Likewise.
8779 * lib/colorize-impl.c: Factor isatty call out of here...
8780 * lib/ms/colorize-impl.c: ... and here...
8781 * src/main.c: ... into here.
8782
87832012-01-02 Jim Meyering <meyering@redhat.com>
8784
8785 tests: avoid minor "make check" failure
8786 * tests/r-dot: Make executable, to avoid triggering a failed
8787 consistency test in "make check".
8788
87892012-01-02 Paul Eggert <eggert@cs.ucla.edu>
8790
8791 grep: -r with no args now searches "."
8792 This is a patch I've been meaning to put in for years.
8793 When I added support for "grep -r", I forgot to have "grep -r PAT"
8794 search the working directory by default, instead of searching
8795 standard input (which makes no sense, even if stdin is a directory).
8796 This is not an upward compatible change, since "grep -r PAT <file"
8797 will no longer search standard input, but that's OK; nobody should
8798 be using "grep -r" that way anyway.
8799 * NEWS: Document this.
8800 * doc/grep.texi (File and Directory Selection, grep Programs, Usage):
8801 Likewise.
8802 * src/main.c (usage): Likewise.
8803 (grepdir): If DIR is null, search the working directory, but do
8804 not prepend "./" to the file names.
8805 (main): If recursing and no operands are given, search ".".
8806 * tests/Makefile.am (TESTS): Add r-dot.
8807 * tests/r-dot: New file.
8808
8809 grep: prefer fputs to printf, _ to gettext
8810 * lib/colorize.h (print_end_colorize):
8811 * lib/ms/colorize-impl.c (print_end_colorize):
8812 Use fputs instead of printf.
8813 * src/main.c (usage): Likewise. Use _ instead of gettext.
8814
88152012-01-01 Paul Eggert <eggert@cs.ucla.edu>
8816
8817 grep: check stdin like other files
8818 * NEWS: Document this.
8819 * src/main.c (grepfile): Revamp tests for input files so that
8820 standard input is tested like other files. For example, report
8821 an error if standard input equals standard output.
8822 Prefer open+fstat to stat+open if possible, as open+fstat is
8823 usually a bit faster and avoids a race condition.
8824 * tests/in-eq-out-infloop: Add tests for cases like
8825 'grep pat <file >>file'.
8826
88272012-01-01 Jim Meyering <meyering@redhat.com>
8828
8829 maint: update all copyright year number ranges
8830 Run "make update-copyright".
8831
88322011-12-31 Paul Eggert <eggert@cs.ucla.edu>
8833
8834 grep: lower-case function names
8835 These names used to be macros, but they're functions now.
8836 All callers changed.
8837 * src/main.c (pr_sgr_start): Rename from PR_SGR_START.
8838 (pr_sgr_end): Rename from PR_SGR_END.
8839 (pr_sgr_start_if): Rename from PR_SGR_START_IF.
8840 (pr_sgr_end_if): Rename from PR_SGR_END_IF.
8841
8842 ms: move Microsoft-specific stuff to lib/ms
8843 * cfg.mk (exclude_file_name_regexp--sc_prohibit_strcmp)
8844 (exclude_file_name_regexp--sc_require_config_h)
8845 (exclude_file_name_regexp--sc_require_config_h_first):
8846 New rules.
8847 * lib/colorize.c, lib/colorize.h, lib/colorize-impl.c:
8848 * lib/ms/colorize.h, lib/ms/colorize-impl.c: New files.
8849 * configure.ac (GREP_SRC_INCLUDES): New macro.
8850 * lib/Makefile.am (libgreputils_a_SOURCES): Add colorize.[ch].
8851 (EXTRA_DIST): New macro.
8852 * src/Makefile.am (DEFAULT_INCLUDES): New macro.
8853 * src/main.c: Include colorize.h.
8854 (PR_SGR_START, PR_SGR_END, PR_SGR_START_IF, PR_SGR_END_IF):
8855 Now static functions, not macros.
8856 (hstdout, norm_attr, w32_console_init, w32_sgr2attr)
8857 (w32_clreol) [__MINGW32__]: Move to lib/ms/colorize-impl.c.
8858 (pr_sgr_start, pr_sgr_end): Remove; callers changed to use new
8859 print_start_colorize, print_end_colorize from colorize.h.
8860 (init_colorize): Rename from w32_console_init and move to
8861 colorize module; caller changed.
8862 (should_colorize): Move to colorize module.
8863
8864 grep: do input==output check more like dir loop check
8865 * src/main.c (grepfile): Just use SAME_INODE; don't bother
8866 with SAME_REGULAR_FILE. This works better on properly-working
8867 POSIX hosts, since it handles the case where the file is changing
8868 as we grep it. It works worse on hosts that don't support st_ino
8869 properly, but in practice this isn't that much of a problem here.
8870 * src/system.h (same_file_attributes, SAME_REGULAR_FILE):
8871 Remove; no longer needed.
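
An inode-based identity test of the kind this entry relies on compares the device and inode numbers from two struct stat results. The following is a minimal sketch, not the code in src/main.c; the function names are hypothetical.

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical helper: true if A and B describe the same file, judged
   only by device and inode number.  */
bool
same_inode (const struct stat *a, const struct stat *b)
{
  return a->st_ino == b->st_ino && a->st_dev == b->st_dev;
}

/* Hypothetical helper: true if both descriptors can be stat'ed and
   refer to the same file.  */
bool
fds_refer_to_same_file (int fd1, int fd2)
{
  struct stat s1, s2;

  if (fstat (fd1, &s1) != 0 || fstat (fd2, &s2) != 0)
    return false;

  return same_inode (&s1, &s2);
}
```

Comparing only st_dev and st_ino keeps working while the file grows during the search, which is the case the entry calls out. The cost is that it depends on the host reporting st_ino correctly.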
8872
8873 build: update gnulib submodule to latest
8874
88752011-12-28 Paul Eggert <eggert@cs.ucla.edu>
8876
8877 maint: remove now-unused/obsolete files
8878 * README.DOS: Remove file.
8879 * m4/djgpp.m4: Likewise.
8880 * .gitignore: Remove reference to m4/djgpp.m4.
8881
88822011-12-28 Jim Meyering <meyering@redhat.com>
8883
8884 maint: distribute ChangeLog-2009
8885 * Makefile.am (EXTRA_DIST): Add ChangeLog-2009.
8886 Spotted by Eli Zaretskii.
8887
88882011-12-28 Jim Meyering <meyering@redhat.com>
8889
8890 main.c: add some 'const' directives
8891 * src/main.c (color_dict, fg_color, bg_color, cap): Declare const.
8892
8893 No semantic change.
8894
88952011-12-28 Jim Meyering <meyering@redhat.com>
8896
8897 main.c: correct indentation and formatting style
8898 * src/main.c: Correct many formatting inconsistencies.
8899 No semantic change.
8900
8901 avoid new syntax-check failures
8902 * cfg.mk (old_NEWS_hash): Update, to accommodate old NEWS modification.
8903 * src/main.c: Indent solely with spaces, never with TABs.
8904 (should_colorize): Remove useless parens in #if directive.
8905
89062011-12-28 Eli Zaretskii <eliz@gnu.org>
8907
8908 Fix whitespace, indentation and documentation
8909 * src/main.c (parse_grep_colors): Fix indentation.
8910 (usage): Mention MS-Windows in help text for -U and -u options.
8911
8912 update NEWS for MS-Windows changes
8913 * NEWS: Mention MS-Windows related bugfixes and enhancements.
8914
8915 Fix the test suite for MS-Windows.
8916 * tests/include-exclude: Use --directories=skip, to avoid
8917 gratuitous failures on systems that cannot grep directories.
8918 * tests/reversed-range-endpoints: Don't reject program names with
8919 leading directories and drive letters.
8920 * tests/warn-char-classes: Likewise.
8921
8922 Support color highlighting on MS-Windows
8923 * src/main.c (SGR_START, SGR_END, PR_SGR_FMT, PR_SGR_FMT_IF): Remove.
8924 (PR_SGR_START, PR_SGR_START_IF): Replace with pr_sgr_start.
8925 (PR_SGR_END, PR_SGR_END_IF): Replace with pr_sgr_end.
8926 (pr_sgr_start, pr_sgr_end, should_colorize): New functions.
8927 (w32_console_init, w32_sgr2attr, w32_clreol) [__MINGW32__]: New functions.
8928 (main): Use should_colorize. Invoke w32_console_init.
8929
89302011-12-24 Paul Eggert <eggert@cs.ucla.edu>
8931
8932 don't ignore errors when reading a directory
8933 grep no longer silently suppresses errors when reading a directory
8934 as if it were a text file. For example, "grep x ." now reports a
8935 read error on most systems; formerly, it ignored the error.
8936 Problem reported as an aside by Bob Proulx (Bug#10355).
8937 * NEWS: Document this.
8938 * src/main.c (grep, grepfile): Implement this. Simplify the code
8939 considerably.
8940 * src/system.h (is_EISDIR): Remove; no longer needed.
8941
8942 --include etc. now work on command-line args more consistently
8943 --include and --exclude apply only to non-directories and
8944 --exclude-dir applies only to directories. "-" (standard input)
8945 is never excluded, since it is not a file name.
8946 This bug was discovered while fixing a read-directory bug (Bug#10355).
8947 * NEWS: Document this.
8948 * src/main.c (main): Implement this.
8949 * tests/include-exclude: Test for it.
8950
89512011-12-24 Jim Meyering <meyering@redhat.com>
8952
8953 build: update gnulib submodule to latest
8954
89552011-12-12 Arnold D. Robbins <arnold@skeeve.com>
8956
8957 doc: improve grep.texi
8958 * doc/grep.texi: General editing for improved aesthetics.
8959 Also fix a few problems.
8960
8961 2011-12-12 Jim Meyering <meyering@redhat.com>
8962
8963 build: use gnulib's iswctype and wcscoll
8964 * bootstrap.conf (gnulib_modules): Add iswctype and wcscoll.
8965 * configure.ac: Remove explicit checks for those functions.
8966 * src/mbsupport.h (MBS_SUPPORT): Define to 1 if not already defined.
8967 Remove the conditional, now that we're guaranteed by gnulib to have
8968 wcscoll and iswctype.
8969 Suggested by Alan Hourihane in http://savannah.gnu.org/bugs/?34930
8970
8971 disable the new input==output guard for additional options
8972 * src/main.c (grepfile): Do not reject input == output also
8973 when using a few other options.
8974 * tests/in-eq-out-infloop: Test these new cases.
8975 * NEWS (Bug fixes): Mention it.
8976
8977 2011-12-11 Nicolas Vigier <boklm@mars-attacks.org>
8978
8979 do not reject "grep -qr . > out"
8980 The recent fix to avoid an infinite disk-filling loop, commit 5e20a38a,
8981 introduced a minor regression. If you use grep with -q and -r, and
8982 redirect output to a file that will be traversed, then grep would
8983 reject the command, even though it will generate no output.
8984 In that case, there is no risk of an infinite loop.
8985 * src/main.c (grepfile): Do not reject input == output when
8986 using --quiet/--silent (-q).
8987 Reported by J H Wilson in http://bugs.mageia.org/show_bug.cgi?id=3501
8988 forwarded by Nicolas Vigier to https://savannah.gnu.org/bugs/?34917
8989
8990 2011-11-29 Arnold Robbins <arnold@skeeve.com>
8991
8992 dfa: do not call nl_langinfo in !MBS_SUPPORT mode
8993 * src/dfa.c (using_utf8) [!MBS_SUPPORT]: Remove erroneous "defined"
8994 in cpp test for MBS_SUPPORT. Since commit a163349d, MBS_SUPPORT is 0/1.
8995 This error caused trouble only in the !MBS_SUPPORT case.
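The difference between the two preprocessor tests can be shown with a minimal sketch. It assumes only what the entry states, namely that MBS_SUPPORT is always defined to 0 or 1; the macro is defined locally here and the two function names are hypothetical.

```c
/* Assumption for this sketch: the macro always exists and is 0 or 1.
   Here it is set to 0 to model the !MBS_SUPPORT build.  */
#define MBS_SUPPORT 0

/* Hypothetical helper: "defined" is true whenever the macro exists,
   so this returns 1 even though multibyte support is disabled.  */
int
tested_with_defined (void)
{
#if defined MBS_SUPPORT
  return 1;
#else
  return 0;
#endif
}

/* Hypothetical helper: testing the value gives the intended answer.  */
int
tested_by_value (void)
{
#if MBS_SUPPORT
  return 1;
#else
  return 0;
#endif
}
```

With the macro set to 0, only the value test selects the non-multibyte branch, which is why the stray "defined" mattered only in the !MBS_SUPPORT case.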
8996
8997 dfa: avoid warning from deficient compiler in !MBS_SUPPORT mode
8998 * src/dfa.c (setbit_wc) [!MBS_SUPPORT]: Add explicit "return false;"
8999 after "abort ();", to avoid a warning from deficient compilers.
9000
9001 2011-11-29 Jim Meyering <meyering@redhat.com>
9002
9003 tests: use "compare exp out", not "compare out exp"
9004 Likewise, when an empty file is expected, use "compare /dev/null out",
9005 not "compare out /dev/null". I.e., specify the expected/desired contents
9006 via the first file name. Prompted by a suggestion from Bruno Haible
9007 in http://thread.gmane.org/gmane.comp.gnu.grep.bugs/4020/focus=29154
9008
9009 Run these commands:
9010
9011 git grep -l -E 'compare [^ ]+ exp' \
9012 |xargs perl -pi -e 's/(compare) (\S+) (exp\S*)/$1 $3 $2/'
9013 git grep -l -E 'compare [^ ]+ /dev/null' \
9014 |xargs perl -pi -e 's/(compare) (\S+) (\/dev\/null)/$1 $3 $2/'
9015
9016 2011-11-29 Jim Meyering <meyering@redhat.com>
9017
9018 build: update gnulib submodule to latest
9019
9020 2011-11-28 Jim Meyering <meyering@redhat.com>
9021
9022 build: accommodate -Werror=suggest-attribute=pure
9023 Now that we're using the latest manywarnings module from gnulib,
9024 accommodate gcc's -Werror=suggest-attribute=pure option by marking
9025 suggested functions with gnulib-defined _GL_ATTRIBUTE_PURE.
9026 * src/kwset.c (hasevery): Mark function with pure attribute.
9027 (bmexec): Likewise.
9028 * src/dfa.c (nsubtoks, istrstr, find_pred, dfamusts): Likewise.
9029 * configure.ac: Disable (for lib/) options that seem not to be worth
9030 the trouble: -Wunsuffixed-float-constants and -Wformat-nonliteral.
9031
9032 2011-11-21 Bruno Haible <bruno@clisp.org>
9033
9034 build: fix "make check" error on OSF/1
9035 * tests/Makefile.am (TESTS_ENVIRONMENT): Test the value of the variable
9036 BASH_VERSION, not the literal ASH_VERSION.
9037
9038 2011-11-21 Jim Meyering <meyering@redhat.com>
9039
9040 portability: work consistently on *BSD systems
9041 * src/dfa.c (is_valid_unibyte_character): Define.
9042 (IS_WORD_CONSTITUENT): Use it here, to make grep work consistently
9043 even on *BSD systems, which use different tables for ctype macros
9044 like isalpha. http://thread.gmane.org/gmane.comp.gnu.grep.bugs/4022
9045 With help from Bruno Haible.
9046
9047 2011-11-20 Jim Meyering <meyering@redhat.com>
9048
9049 maint: consistently use NULL, not 0, when comparing pointers
9050 * src/dfa.c (dfaanalyze): Compare trans[s] with NULL, not 0.
9051
9052 maint: remove an avoidable #ifdef/#endif pair
9053 * src/dfa.c (dfaanalyze): Remove avoidable #ifdef around "{".
9054
9055 tests: fix typo in last change
9056 * tests/word-delim-multibyte: Use double quotes around $e_acute,
9057 not single quotes. Spotted by Bruno Haible.
9058 This and the preceding change do not resolve the XPASS failure
9059 on OpenBSD 4.9 after all. See the explanation at
9060 http://thread.gmane.org/gmane.comp.gnu.grep.bugs/4022
9061
9062 tests: avoid unwarranted test failure on *BSD-based systems
9063 * tests/word-delim-multibyte (e_acute): Use a more portable
9064 representation of e-acute. Reported by Bruno Haible.
9065
9066 2011-11-19 Jim Meyering <meyering@redhat.com>
9067
9068 maint: accommodate -Wdeclaration-after-statement, but only in dfa.c,
9069 and because doing so does not impact readability/maintainability.
9070 This is solely to accommodate gawk users who are stuck with ancient gcc.
9071 This is no excuse to change any other code in grep.
9072 * src/dfa.c (dfaoptimize, parse_bracket_exp): Move declaration
9073 to precede first statement in block.
9074
9075 2011-11-16 Jim Meyering <meyering@redhat.com>
9076
9077 maint: post-release administrivia
9078 * NEWS: Add header line for next release.
9079 * .prev-version: Record previous version.
9080 * cfg.mk (old_NEWS_hash): Auto-update.
9081
9082 version 2.10
9083 * NEWS: Record release date.
9084
9085 build: update gnulib submodule to latest
9086
9087 2011-11-13 Jim Meyering <meyering@redhat.com>
9088
9089 maint: update bootstrap and init.sh from gnulib
9090 * tests/init.sh: Update from gnulib.
9091 * bootstrap: Likewise.
9092
9093 2011-11-12 Jim Meyering <meyering@redhat.com>
9094
9095 build: update gnulib for exclude-test fixes
9096
9097 tests: make our "export" replacement efficient with modern shells
9098 * tests/Makefile.am (TESTS_ENVIRONMENT): Use a trivial and efficient
9099 implementation with a shell that supports "export var=val".
9100 Use the sed-invoking replacement only when necessary.
9101 Improved by Stefano Lattarini.
9102
9103 tests: make the replacement export function more robust
9104 * tests/Makefile.am (sed_quote_value): Also quote single quotes.
9105 Remove sed's -e options. Not needed.
9106
9107 2011-11-12 Bruno Haible <bruno@clisp.org>
9108
9109 tests: fix test suite execution failure on OSF/1 5.1
9110 * tests/Makefile.am (TESTS_ENVIRONMENT): Use a shell function to
9111 ensure that we use only the portable form of the 'export' shell
9112 built-in.
9113
9114 tests: don't assume that /bin/bash exists
9115 * tests/fedora: Run using /bin/sh, not /bin/bash.
9116
9117 tests: avoid unwarranted failures due to SATAN's timeout
9118 * tests/init.cfg (require_timeout_): Also ensure that
9119 timeout exits with its child's exit status.
9120
9121 build: fix compilation error on MSVC 9 due to Pexecute() declaration
9122 * src/pcresearch.c (WITHOUT_PCRE_NORETURN): Remove macro.
9123 (Pexecute): Replace abort() call with code that does not trigger GCC
9124 warnings.
9125
9126 tests: fix high-bit-range test failure on OSF/1 5.1
9127 * tests/high-bit-range: Use octal escape instead of hexadecimal escape
9128 sequence.
9129
9130 2011-11-11 Jim Meyering <meyering@redhat.com>
9131
9132 build: update gnulib for solaris test fix
9133
9134 2011-11-10 Jim Meyering <meyering@redhat.com>
9135
9136 build: update gnulib submodule to latest
9137
9138 maint: adjust the URL that will appear in the generated announcement
9139 * cfg.mk (url_dir_list): Use this http://ftp.gnu.org/gnu/$(PACKAGE)
9140 for the first link listed in the generated announcement.
9141 announce-gen now provides the faster mirror link automatically.
9142
9143 2011-11-06 Jim Meyering <meyering@redhat.com>
9144
9145 build: stop distributing gzip'd releases; xz is enough
9146 * configure.ac (AM_INIT_AUTOMAKE): Add no-dist-gzip.
9147 * NEWS (Build-related): Mention that we're dropping .tar.gz.
9148
9149 build: update gnulib submodule to latest
9150
9151 2011-10-14 Stefano Lattarini <stefano.lattarini@gmail.com>
9152
9153 distcheck: ensure dist-hook fails if syntax-check fails
9154 * Makefile.am (run-syntax-check): Fix logic, to ensure that
9155 the recipe of this target returns a non-zero exit status if
9156 "make syntax-check" fails.
9157
9158 2011-10-12 Jim Meyering <meyering@redhat.com>
9159
9160 build: update gnulib submodule to latest
9161 This should fix a few portability problems, including one on HP-UX
9162 and a test-float failure on PPC, reported by Andreas Metzler.
9163
9164 2011-10-10 Stefano Lattarini <stefano.lattarini@gmail.com>
9165
9166 gitignore: merge top-level and tests/ .gitignore files
9167 * tests/.gitignore: Remove; what little remained of its
9168 contents has been moved ...
9169 * .gitignore: ... here.
9170
9171 tests: tiny simplification in TESTS_ENVIRONMENT definition
9172 * tests/Makefile.am (TESTS_ENVIRONMENT): Remove redundant use of
9173 `export'.
9174
9175 2011-10-10 Stefano Lattarini <stefano.lattarini@gmail.com>
9176
9177 tests: support development version of automake too
9178 This change implements a more correct and idiomatic use of the
9179 features of the Automake-provided 'parallel-tests' harness.
9180 Moreover, this change is required in order for the testsuite to
9181 continue to work with the new testsuite harness that is planned
9182 to be introduced in Automake 1.12 (which, as of the writing date,
9183 is still under development and in late alpha state).
9184
9185 * tests/Makefile.am (TESTS_ENVIRONMENT): The development version of
9186 automake does not support setting the interpreter delegated to run
9187 the test scripts in this variable; instead, use ...
9188 (LOG_COMPILER): ... this variable.
9189 * .gitignore: Ignore `.trs' files in directory `tests/'.
9190 * build-aux/.gitignore: Ignore `test-driver' script.
9191
9192 2011-10-03 Eli Zaretskii <eliz@gnu.org>
9193
9194 dfa: don't mishandle high-bit bytes in a regexp with signed-char
9195 This appears to arise only on systems for which "char" is signed.
9196 * src/dfa.c (FETCH_WC, FETCH): Produce an unsigned value, rather
9197 than a sign-extended one. Fixes a bug on MS-Windows with compiling
9198 patterns that include characters with the 8-th bit set.
9199 (to_uchar): Define. From coreutils.
9200 Reported by David Millis <tvtronix@yahoo.com>.
9201 See http://thread.gmane.org/gmane.comp.gnu.grep.bugs/3893
9202 * NEWS (Bug fixes): Mention it.
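A minimal sketch of the sign-extension problem and the to_uchar remedy follows. The to_uchar body is the usual one-line conversion and is assumed here rather than copied from dfa.c; fetch_byte is a hypothetical stand-in for the FETCH macros.

```c
/* Convert a char to unsigned char.  On systems where "char" is
   signed, a byte such as 0xE9 is negative as a char; converting it
   to unsigned char yields 233 rather than a sign-extended value.  */
unsigned char
to_uchar (char ch)
{
  return ch;
}

/* Hypothetical stand-in for FETCH: return the byte at P as a value
   in the range 0..255.  A plain "int c = *p;" could instead produce
   a negative value for high-bit bytes when char is signed.  */
int
fetch_byte (char const *p)
{
  return to_uchar (*p);
}
```

The result is the same whether the platform's char is signed or unsigned, which is the property the fix relies on.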
9203
9204 2011-09-16 Jim Meyering <meyering@redhat.com>
9205
9206 maint: dfa: simplify multi-byte-related conditionals
9207 * src/dfa.c (setbit_case_fold_c, parse_bracket_exp, lex):
9208 (addtok_mb, dfaparse): Change each "MBS_SUPPORT && MB_CUR_MAX > 1"
9209 test to just "MB_CUR_MAX > 1".
9210 * src/dfasearch.c (kwsincr_case, EGexecute): Likewise.
9211 * src/kwsearch.c (Fcompile, Fexecute): Likewise.
9212 * src/searchutils.c (kwsinit): Likewise.
9213 * src/dfa.c (parse_bracket_exp): Convert
9214 "if (!MBS_SUPPORT || MB_CUR_MAX == 1)" to
9215 "if (MB_CUR_MAX == 1)" and do this:
9216 - assert(!MBS_SUPPORT || MB_CUR_MAX == 1);
9217 + assert(MB_CUR_MAX == 1);
9218
9219 maint: dfa: simplify several expressions
9220 * src/dfa.c (dfainit): Set d->mb_cur_max unconditionally, now
9221 that MB_CUR_MAX is always usable. With that, simplify all
9222 "MBS_SUPPORT && d->mb_cur_max > 1" to simply "d->mb_cur_max > 1".
9223 (dfastate, dfaexec, dfainit, dfafree): Simplify, removing each
9224 now-unnecessary "MBS_SUPPORT &&".
9225
9226 maint: dfa: avoid in-function "#if MBS_SUPPORT" tests
9227 * src/dfa.c (setbit_case_fold_c): Remove "#if MBS_SUPPORT" in favor
9228 of simple "if (MBS_SUPPORT ...".
9229 (dfaexec, addtok): Likewise.
9230
9231 maint: ensure that MB_CUR_MAX is defined even when !MBS_SUPPORT
9232 * src/mbsupport.h [!MBS_SUPPORT] (MB_CUR_MAX): Define to 1.
9233
9234 build: fix compilation failure when MBS_SUPPORT is 0
9235 * src/dfa.c (add_utf8_anychar): Always compile this function,
9236 but when MBS_SUPPORT is 0, give it an empty body.
9237 (prepare_wc_buf): Likewise.
9238 [! MBS_SUPPORT] (setbit_wc): Define to always abort.
9239
9240 maint: dfa: simplify dfaoptimize
9241 * src/dfa.c (dfaoptimize): Simplify.
9242 (dfacomp): Remove now-redundant "if (MBS_SUPPORT)" guard,
9243 since dfaoptimize does nothing if !MBS_SUPPORT.
9244
9245 maint: dfa: remove some #if MBS_SUPPORT guards
9246 * src/dfa.c: Replace a few "#if MBS_SUPPORT" directives with
9247 "if (MBS_SUPPORT)". Remove some altogether.
9248
9249 maint: dfa: convert #if-MBS_SUPPORT (dfastate)
9250 * src/dfa.c (dfastate): Use regular "if", not #if MBS_SUPPORT.
9251
9252 maint: dfa: convert #if-MBS_SUPPORT (dfastate)
9253 * src/dfa.c (dfastate): Use regular "if", not #if MBS_SUPPORT.
9254
9255 maint: dfa: convert #if-MBS_SUPPORT (state_index)
9256 * src/dfa.c (state_index): Use regular "if", not #if MBS_SUPPORT.
9257
9258 maint: dfa: convert #if-MBS_SUPPORT (dfaparse)
9259 * src/dfa.c (dfaparse): Use regular "if", not #if MBS_SUPPORT.
9260
9261 maint: dfa: convert #if-MBS_SUPPORT (copytoks)
9262 * src/dfa.c (copytoks): Use regular "if", not #if MBS_SUPPORT.
9263
9264 maint: dfa: convert #if-MBS_SUPPORT (lex)
9265 * src/dfa.c (lex): Use regular "if", not #if MBS_SUPPORT.
9266
9267 maint: dfa: convert #if-MBS_SUPPORT (parse_bracket_exp)
9268 * src/dfa.c (parse_bracket_exp): Use regular "if", not #if MBS_SUPPORT.
9269
9270 maint: dfa: convert #if-MBS_SUPPORT (parse_bracket_exp)
9271 * src/dfa.c (parse_bracket_exp): Use regular "if", not #if MBS_SUPPORT.
9272
9273 maint: dfa: convert #if-MBS_SUPPORT (parse_bracket_exp)
9274 * src/dfa.c (parse_bracket_exp): Use regular "if", not #if MBS_SUPPORT.
9275
9276 maint: dfa: convert #if-MBS_SUPPORT (dfaexec)
9277 * src/dfa.c (dfaexec): Use regular "if", not #if MBS_SUPPORT.
9278
9279 maint: dfa: convert #if-MBS_SUPPORT (dfaexec)
9280 * src/dfa.c (dfaexec): Use regular "if", not #if MBS_SUPPORT.
9281 Also add curly braces around multi-line if/else blocks.
9282
9283 maint: dfa: remove #if-MBS_SUPPORT (free_mbdata)
9284 * src/dfa.c (free_mbdata): Remove the #if guard altogether.
9285
9286 maint: dfa: convert #if-MBS_SUPPORT (dfaoptimize, dfacomp)
9287 * src/dfa.c (dfaoptimize, dfacomp): Use regular "if",
9288 not #if MBS_SUPPORT.
9289
9290 maint: dfa: convert #if-MBS_SUPPORT (dfafree)
9291 * src/dfa.c (dfafree): Use regular "if", not #if MBS_SUPPORT.
9292
9293 maint: dfa: convert #if-MBS_SUPPORT (parse_bracket_exp, part1)
9294 * src/dfa.c (parse_bracket_exp): Remove in-function #if MBS_SUPPORT.
9295
9296 maint: remove #if-MBS_SUPPORT declaration guards
9297 * src/search.h: Don't bother to #if-out declarations.
9298
9299 maint: convert #if-MBS_SUPPORT (EGexecute)
9300 * src/dfasearch.c (EGexecute): Remove in-function #if MBS_SUPPORT.
9301
9302 maint: convert #if-MBS_SUPPORT (kwsincr_case)
9303 * src/dfasearch.c (kwsincr_case): Remove in-function #if MBS_SUPPORT.
9304 Move decl's down.
9305
9306 maint: convert #if-MBS_SUPPORT (Fcompile, etc.)
9307 * src/kwsearch.c (Fcompile, Fexecute): Remove in-function #if MBS_SUPPORT.
9308 (Fcompile): Rearrange some declarations. No semantic change.
9309
9310 maint: convert #if-MBS_SUPPORT (kwsinit)
9311 * src/searchutils.c (kwsinit): Remove in-function #if MBS_SUPPORT.
9312
9313 maint: dfa: remove case-guarding #if-MBS_SUPPORT
9314 * src/dfa.c [DEBUG] (prtok): Remove now-useless #if-MBS_SUPPORT.
9315
9316 2011-09-15 Jim Meyering <meyering@redhat.com>
9317
9318 maint: remove #if MBS_SUPPORT around member declaration
9319 * src/dfa.c (dfastate): Don't #ifdef-out "mbps" position_set member.
9320
9321 maint: dfa: remove #if MBS_SUPPORT around struct definition
9322 * src/dfa.c (struct mb_char_classes): Don't #ifdef-out declarations.
9323
9324 build: avoid compilation failure when building without PCRE support
9325 * src/pcresearch.c [!HAVE_LIBPCRE] (WITHOUT_PCRE_NORETURN): Define
9326 to _Noreturn, not obsoleted-by-gnulib _GL_ATTRIBUTE_NORETURN.
9327 Reported by Eric Blake.
9328
9329 tests: stop using skip_test_; use skip_ instead
9330 * tests/init.cfg (skip_test_): Remove definition. Use the improved
9331 skip_ function from init.sh, now that it has the same feature.
9332 * tests/euc-mb: s/skip_test_/skip_/
9333 * tests/sjis-mb: Likewise.
9334 * tests/fmbtest: Likewise.
9335
9336 tests: skip tests that require MBS support
9337 * tests/init.cfg (require_compiled_in_MB_support): New function.
9338 * tests/char-class-multibyte: Use it here, since this test cannot
9339 succeed without MBS support.
9340 * tests/equiv-classes: Likewise.
9341 * tests/euc-mb: Likewise.
9342 * tests/fgrep-infloop: Likewise.
9343 * tests/init.cfg: Likewise.
9344 * tests/prefix-of-multibyte: Likewise.
9345 * tests/turkish-I: Likewise.
9346 * tests/sjis-mb: Likewise.
9347
9348 tests: make fmbtest explain (to stderr, not log) why it is skipped
9349 * tests/fmbtest: Use skip_ and fail_ to give better diagnostics.
9350
9351 maint: dfa: improve comments
9352 * src/dfa.c (match_mb_charset, match_anychar): Improve comments.
9353
9354 2011-09-14 Jim Meyering <meyering@redhat.com>
9355
9356 build: update gnulib submodule to newer
9357
9358 maint: correct indentation
9359 * src/dfa.c (dfaexec): Reposition curly braces to match indentation style.
9360 Remove useless comment.
9361
9362 maint: move declaration "down" to inner scope where it is used
9363 * src/dfa.c (dfaexec): Move decl of local down into scope where used.
9364
9365 2011-09-07 Jim Meyering <meyering@redhat.com>
9366
9367 doc: use "file name" consistently in grep's --help output
9368 * src/main.c (usage): Use "file name", not "filename" in descriptions
9369 of --with-filename (-H), --no-filename (-h) and --label=LABEL.
9370 Suggested by Sequoia McDowell.
9371
9374 2011-08-31 Matthew Burgess <matthew@linuxfromscratch.org>
9375
9376 tests: remove debug code that would cp to /t
9377 * tests/unibyte-bracket-expr: Remove debug artifact introduced
9378 by 2011-06-02 commit de5f7000, "tests: exercise a uni-byte [...]
9379 bug: requires ru_RU.KOI8-R". [bug introduced in grep-2.9]
9380
9381 2011-08-20 Jim Meyering <meyering@redhat.com>
9382
9383 build: use largefile module and update to latest gnulib
9384 * configure.ac: Remove AC_SYS_LARGEFILE, subsumed by ...
9385 * bootstrap.conf (gnulib_modules): ...this. Use largefile module.
9386 * gnulib: Update to latest.
9387
9388 maint: clean up and plug a leak-on-OOM
9389 * src/dfa.c (icatalloc): Clean up; use xrealloc in place of malloc
9390 and realloc; remove conditionals that are unnecessary, now that
9391 failed allocation results in exit.
9392 (enlist): Use xrealloc in place of realloc; remove conditional.
9393 (comsubs): Avoid leak upon failed enlist call.
9394 (dfamust): Use xmalloc in place of malloc.
9395 Remove conditionals, now that icpyalloc and icatalloc never return NULL.
9396
9397 maint: use x2nrealloc, not xrealloc
9398 * src/main.c (main): Use x2nrealloc, not xrealloc.
9399
9400 2011-07-24 Jim Meyering <meyering@redhat.com>
9401
9402 tests: add a test to trigger the bug
9403 * tests/Makefile.am (TESTS): Add it.
9404 * tests/in-eq-out-infloop: Exercise the bug/fix.
9405
9406 exit 2 (rather than infloop) when an input file is also on stdout
9407 This avoids a potential "infinite" disk-filling loop.
9408 Reported in http://savannah.gnu.org/patch/?5316
9409 and http://savannah.gnu.org/bugs/?17457.
9410 * src/main.c: Include "quote.h".
9411 (out_stat): New global.
9412 (grepfile): Compare each regular file's dev/ino/etc.
9413 with those from the file on stdout (if it too is regular).
9414 (main): Set out_stat, if stdout is a regular file.
9415 * src/system.h: Include "same-inode.h".
9416 (same_file_attributes): Define. From diffutils.
9417 (SAME_REGULAR_FILE): Define.
9418 * bootstrap.conf (gnulib_modules): Use quote, not quotearg.
9419 Use same-inode.
9420 * NEWS (Bug fixes): Mention it.
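The input == output guard can be sketched as follows. This is a simplified illustration, not grep's SAME_REGULAR_FILE: the entry says grep compares "dev/ino/etc.", while this sketch compares only device and inode numbers, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical helper: return true if FD1 and FD2 both refer to the
   same regular file.  Only st_dev and st_ino are compared here; the
   real check reportedly looks at further attributes as well.  */
bool
fds_same_regular_file (int fd1, int fd2)
{
  struct stat a, b;

  /* If either descriptor cannot be examined, treat them as different.  */
  if (fstat (fd1, &a) != 0 || fstat (fd2, &b) != 0)
    return false;

  return (S_ISREG (a.st_mode) && S_ISREG (b.st_mode)
          && a.st_dev == b.st_dev && a.st_ino == b.st_ino);
}
```

A caller in the spirit of this entry would stat standard output once at startup and compare each regular input file against it before reading.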
9421
9422 2011-07-15 Reuben Thomas <rrt@sc3d.org>
9423
9424 doc: improve documentation of character classes in the man page
9425 * doc/grep.in.1: Reword documentation of character classes.
9426
9427 2011-07-12 Jim Meyering <meyering@redhat.com>
9428
9429 dfa: remove unnecessary inclusion of verify.h
9430 * src/dfa.c: Don't include "verify.h".
9431
9432 dfa: simplify use of *ALLOC macros
9433 * src/dfa.c (XNMALLOC, XCALLOC): Redefine without outer cast-to-(t *).
9434 (CALLOC, MALLOC, REALLOC): Remove type "t" parameter and adjust callers.
9435
9436 dfa: change semantics of REALLOC_IF_NECESSARY's 3rd parameter
9437 * src/dfa.c (REALLOC_IF_NECESSARY): Change meaning of 3rd param,
9438 from "maximum index" to 1 greater than that: the required number
9439 of *P-sized elements. Note that only some of the uses of
9440 REALLOC_IF_NECESSARY needed to be adjusted; the others had already
9441 required an extra element.
9442
9443 dfa: rename REALLOC_IF_NECESSARY param/local for clarity
9444 * src/dfa.c (REALLOC_IF_NECESSARY): Rename nalloc and new_nalloc
9445 to n_alloc and new_n_alloc.
9446
9447 dfa: prepare for a semantic change in REALLOC_IF_NECESSARY
9448 * src/dfa.c (REALLOC_IF_NECESSARY): Remove "t" (type) parameter.
9449 Use (*p) instead. Adjust all callers.
9450
9451 dfa: add braces to REALLOC_IF_NECESSARY definition
9452 * src/dfa.c (REALLOC_IF_NECESSARY): Add curly braces; use TABs
9453 to right-indent.
9454
9455 2011-06-28 Paolo Bonzini <bonzini@gnu.org>
9456
9457 doc: improve documentation of character classes
9458 * doc/grep.texi (Character classes): Mention explicitly when
9459 examples refer to the C locale, explain better the general
9460 meaning of character classes.
9461
9462 2011-06-28 Jim Meyering <meyering@redhat.com>
9463
9464 dfa: fix the root cause of the heap overrun
9465 dfa's "insert" function was supposed to be maintaining the position
9466 list sorted on *decreasing* index, but since the 2009-12-09 "Speed
9467 up insert" commit, 62458291, it was using code that assumed the data
9468 were sorted on *increasing* index. As such, sometimes it would no
9469 longer merge constraints (not finding a match) and would append
9470 entries that normally would have matched and been merged. Those
9471 erroneous append operations resulted in the heap overrun fixed by
9472 2011-06-17 commit 0b91d692 by doubling the array size.
9473 * src/dfa.c (insert): Fix the comparison.
9474 (dfaanalyze): Now that that's fixed, revert commit 0b91d692,
9475 allocating space for only d->nleaves entries, not double that.
9476 As far as I can tell, this change has no effect other than
9477 decreased memory usage, although it may improve performance
9478 slightly, since the resulting list of positions is half as long
9479 as it used to be.
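The invariant described above (a position list sorted on decreasing index, with constraints merged on a matching index) can be illustrated with a small sketch. The types, the fixed capacity, and the linear scan are simplifications assumed for this example and are not the code in src/dfa.c.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for dfa.c's position types.  */
typedef struct { size_t index; unsigned int constraint; } position;
typedef struct { position elems[16]; size_t nelem; } position_set;

/* Insert P into S, keeping S sorted on decreasing index.  If an
   element with the same index already exists, OR the constraints
   together rather than adding a duplicate entry.
   Return 0 on success, -1 if the fixed-size set is full.  */
int
insert (position p, position_set *s)
{
  size_t i = 0;

  /* Skip the elements whose index is greater than P's.  */
  while (i < s->nelem && s->elems[i].index > p.index)
    i++;

  /* Same index: merge constraints; the list does not grow.  */
  if (i < s->nelem && s->elems[i].index == p.index)
    {
      s->elems[i].constraint |= p.constraint;
      return 0;
    }

  if (s->nelem == sizeof s->elems / sizeof s->elems[0])
    return -1;

  /* Shift the tail down by one slot and store P at position I.  */
  for (size_t j = s->nelem; j > i; j--)
    s->elems[j] = s->elems[j - 1];
  s->elems[i] = p;
  s->nelem++;
  return 0;
}
```

If the scan assumed increasing order instead, the equality test would often be missed and matching entries would be appended instead of merged, which is the kind of list growth the entry describes.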
9480
9481 2011-06-28 Paolo Bonzini <bonzini@gnu.org>
9482
9483 dfa: use memcpy to copy position_sets
9484 * src/dfa.c (copy): Use memcpy.
9485
9486 dfa: use copyset to copy charclasses
9487 * src/dfa.c (add_utf8_anychar): Change memcpy to copyset.
9488
9489 gnulib: Update
9490 Fixes mmap-anon.m4 conflict with fn_grep, reported by Rainer Orth.
9491
9492 2011-06-21 Jim Meyering <meyering@redhat.com>
9493
9494 maint: update bootstrap from gnulib
9495 * bootstrap: Update to latest, so it no longer inserts empty lines
9496 in .gitignore files.
9497 * .gitignore: Let bootstrap move "!..." lines to end of file.
9498
9499 post-release administrivia
9500 * NEWS: Add header line for next release.
9501 * .prev-version: Record previous version.
9502 * cfg.mk (old_NEWS_hash): Auto-update.
9503
9504 version 2.9
9505 * NEWS: Record release date.
9506
9507 build: avoid a warning when building with --disable-perl-regexp...
9508 and --enable-gcc-warnings.
9509 * src/pcresearch.c (WITHOUT_PCRE_NORETURN): Define.
9510 Remove the unreachable return statement.
9511 Reported by Eric Blake.
9512
9513 tests: ensure that each test script is executable
9514 This adds a rule run at "make check" time to ensure that
9515 test scripts are consistently executable.
9516 This change is not required for "make check", but makes it easier
9517 for people to run scripts manually. Doing so is discouraged, however,
9518 because it makes it easy to omit important variable settings that
9519 are normally provided via TESTS_ENVIRONMENT.
9520 This change also makes each of the existing TESTS executable.
9521 * tests/Makefile.am (check_executable_TESTS): New rule.
9522 (check): Depend on it.
9523 * tests/{all_scripts}: chmod 755.
9524 Prompted by a report from Eric Blake.
9525
9526 maint: update bootstrap from gnulib
9527 * bootstrap: Update from gnulib.
9528
9529 maint: update po/POTFILES.in
9530 * po/POTFILES.in: Remove dfasearch.c, now that it no longer
9531 contains a translatable diagnostic.
9532
9533 tests: include-exclude: avoid false positive failure on FreeBSD
9534 * tests/include-exclude: Avoid false-positive failure due to
9535 matching "a" in a directory on FreeBSD, when searching a directory
9536 without "-r". Search for '^aaa$' rather than just 'a'.
9537 Adjust test inputs and expected output files accordingly.
9538
9539 dfa: remove some useless casts
9540 * src/dfa.c (icatalloc): Change type of "old" parameter
9541 from "char const *" to "char *".
9542 Don't cast-away const on realloc argument.
9543 Remove now-unnecessary const-discarding cast.
9544 Don't (void)-cast strcpy result.
9545 * src/dosbuf.c (undossify_input): Remove anachronistic
9546 cast-to-"char *" of realloc argument.
9547
9548 dfa: more heap-allocation-related overflow protection
9549 * src/dfa.c (enlist): Use xnrealloc, not realloc.
9550 Also, remove unnecessary cast-to-(char *).
9551 (dfamust): Use xnmalloc, not malloc. Before, this code would
9552 return upon malloc failure (xnmalloc exits upon failure), but
9553 later, via the *ALLOC macros, it could already exit, so this
9554 new potential exit point is nothing new. The same applies
9555 to enlist, since it is called only through dfamust.
9556
9557 tests: update init.sh; simplify TESTS_ENVIRONMENT
9558 * tests/init.sh: Update from coreutils.
9559 * tests/Makefile.am (TESTS_ENVIRONMENT): Remove shell_or_perl_
9560 function. Instead, just use $(SHELL), since grep has no test
9561 that starts with #!/usr/bin/perl.
9562
9563 2011-06-20 Jim Meyering <meyering@redhat.com>
9564
9565 build: update gnulib submodule to latest
9566
9567 build: avoid configure/gnulib-related errors
9568 * bootstrap.conf: Remove now-unnecessary code to exclude
9569 gettext/intl-related m4 tests.
9570
9571 2011-06-19 Jim Meyering <meyering@redhat.com>
9572
9573 maint: tighten up superfluous code
9574 * src/main.c (parse_grep_colors): Use xstrdup in place of xmalloc,
9575 a useless test, strlen, and strcpy.
9576
9577 2011-06-19 Paul Eggert <eggert@cs.ucla.edu>
9578
9579 dfa: avoid possibility of overflow
9580 * src/dfa.c (REALLOC_IF_NECESSARY, CALLOC, MALLOC, REALLOC):
9581 Use functions from xalloc.h to avoid overflow.
9582 * src/dfasearch.c (GEAcompile): Use xnrealloc rather than realloc.
9583 * src/pcresearch.c (Pcompile): Use xnmalloc, not xmalloc.
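The overflow being avoided is in the multiplication "count * element size" that precedes an allocation. A minimal sketch of that check follows; checked_nmalloc is a hypothetical name, and unlike the xalloc.h functions (which exit on failure) it returns NULL.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate room for N objects of S bytes each.
   Return NULL instead of letting N * S wrap around when the product
   does not fit in size_t.  */
void *
checked_nmalloc (size_t n, size_t s)
{
  if (s != 0 && n > SIZE_MAX / s)
    return NULL;
  return malloc (n * s);
}
```

Without the check, a wrapped product could yield a small allocation that later code then overruns.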
9584
9585 2011-06-17 Jim Meyering <meyering@redhat.com>
9586
9587 build: update gnulib submodule to latest
9588
9589 dfa: correct two uses of btowc
9590 * src/dfa.c (setbit_c, setbit_case_fold_c): Compare the btowc
9591 return value against WEOF, not EOF. Suggested by Eli Zaretskii.
9592 On a system like MinGW with unsigned wint_t, comparing a btowc
9593 return value against EOF (-1) would always be false.
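The btowc point can be shown with a minimal sketch. The function name is hypothetical; the only library facts relied on are that btowc returns WEOF on failure and that WEOF has type wint_t.

```c
#include <stdbool.h>
#include <stdio.h>   /* EOF, for contrast with WEOF */
#include <wchar.h>   /* btowc, WEOF */

/* Hypothetical helper: return true if byte B is a complete
   single-byte character in the current locale.  btowc signals
   failure with WEOF, so that is the value to compare against;
   where wint_t is unsigned, a comparison with EOF (-1) is not a
   reliable failure test.  */
bool
byte_is_character (int b)
{
  return btowc (b) != WEOF;
}
```

In the default "C" locale, an ASCII byte such as 'a' is accepted, while passing EOF itself is rejected because btowc returns WEOF for it.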
9594
9595 dfa: don't overrun a malloc'd buffer for certain regexps
9596 * src/dfa.c (dfaanalyze): Allocate space for twice as many
9597 positions as there are leaves. Before this change, for some
9598 regular expressions, DFA analysis would have inserted far more
9599 "positions" than dfa->nleaves (up to double).
9600 Reported by Raymond Russell in http://savannah.gnu.org/bugs/?33547
9601 * tests/dfa-heap-overrun: Trigger the overrun.
9602 * tests/Makefile.am (TESTS): Add it.
9603 * NEWS (Bug fixes): Mention it.
9604
9605 2011-06-08 Jim Meyering <meyering@redhat.com>
9606
9607 tests: don't ignore sjis-mb test failure
9608 I made changes that caused grep to segfault during "make check" --
9609 as seen in dmesg output -- yet no test failed(!), and there was no
9610 trace of the segfault in the logs.
9611 * tests/sjis-mb (test_grep_reject): Ensure that output is empty.
9612 Don't ignore test failure.
9613
9614 2011-06-07 Paolo Bonzini <bonzini@gnu.org>
9615
9616 dfa: optimize wide characters in a bracket expression
9617 * src/dfa.c (addtok): Compile characters to an alternation. Handle the
9618 case when nothing else remains in the MBCSET.
9619
9620 dfa: refactor to prepare for upcoming optimizations
9621 * src/dfa.c (parse_bracket_exp): Move optimization of MBCSET from here...
9622 (addtok): ... to here.
9623
9624 2011-06-07 Paolo Bonzini <bonzini@gnu.org>
9625
9626 dfa: correct handling of single-byte character ranges
9627 This provides a better fix for the unibyte-bracket-expr and high-bit-range
9628 testcases, and fixes the latent bug tested by bogus-wctob.
9629
9630 * src/dfa.c (setbit_case_fold): Remove, replace with...
9631 (setbit_wc, setbit_c, setbit_case_fold_c): ... these.
9632 (parse_bracket_exp): Use setbit_case_fold_c when iterating over
9633 single-byte sequences. Use setbit_wc for multi-byte character sets,
9634 and setbit_case_fold_c for single-byte character sets.
9635 (lex): Use setbit_case_fold_c for single-byte character sets.
9636
9637 2011-06-07 Paolo Bonzini <bonzini@gnu.org>
9638
9639 tests: exercise latent bug in character ranges
9640 * tests/bogus-wctob: New.
9641 * Makefile.am (TESTS): Add it.
9642
9643 2011-06-07 Jim Meyering <meyering@redhat.com>
9644
9645 tests: exercise a uni-byte [...] bug: requires ru_RU.KOI8-R
9646 * tests/unibyte-bracket-expr: New file.
9647 * tests/Makefile.am (TESTS): Add it.
9648 * init.cfg (require_ru_RU_koi8_r): New function.
9649
9650 fix the [...] bug also for relatively unusual uni-byte encodings
9651 * src/dfa.c (setbit_case_fold): Also handle uni-byte locales
9652 like the one mentioned in the original report: see 2011-05-07
9653 commit d98338eb. Re-reported by Santiago Ruano Rincón.
9654 Note that most uni-byte locales are not affected.
9655 * NEWS (Bug fixes): Mention it.
9656
9657 tests: use skip_test_, not skip_
9658 Use skip_test_, not skip_. The former prints its message both to
9659 the log file and to FD 9 (redirected to tty via tests/Makefile.am),
9660 while skip_ prints only to stderr, which goes to the log file.
9661 * tests/init.cfg (skip_test_): New function.
9662 Use skip_test_ in place of skip_ everywhere.
9663 * tests/fmbtest: s/skip_/skip_test_/
9664 * tests/sjis-mb: Likewise.
9665 * tests/euc-mb: Likewise.
9666
9667 tests: fmbtest: factor
9668 * tests/fmbtest: Factor out locale-name duplication.
9669
9670 tests: fix skip-inducing typo in fmbtest
9671 * tests/fmbtest: Fix locale name typo (s/cz_CZ/cs_CZ/)
9672 that would cause this test to be skipped every time.
9673
9674 2011-06-07 Paolo Bonzini <bonzini@gnu.org>
9675
9676 gnulib: adjust included modules
9677 * bootstrap.conf (gnulib_modules): Drop strtoul, rename wctype to
9678 wctype-h.
9679
9680 2011-05-21 Jim Meyering <meyering@redhat.com>
9681
9682 grep -P: don't abort upon exceeding PCRE's backtracking limit
9683 * src/pcresearch.c (Pexecute): Handle PCRE_ERROR_MATCHLIMIT.
9684 * tests/Makefile.am (XFAIL_TESTS): Remove pcre-abort.
9685 * tests/pcre-abort: Expect failure, no output, and increase
9686 the length of the input string, in case the backtracking limit
9687 is ever raised. Adjust comment.
9688 * NEWS (Bug fixes): Mention it.
9689
9690 tests: show how to make grep -P abort
9691 * tests/pcre-abort: New file.
9692 Minimal testcase by Paolo Bonzini, derived from a report
9693 by www.beaver@list.ru.
9694 * tests/Makefile.am (TESTS): Add it.
9695 (XFAIL_TESTS): Add it here, too, since this test always fails, for now.
9696
9697 tests: fix oddities in pcre-z
9698 * tests/pcre-z: Redirect stderr inside $(), not outside.
9699 Remove double quotes around $REGEX (which is just 'a') within
9700 double-quoted "$(...)". Split a long line.
9701
9702 tests: factor out a new require_pcre_ function
9703 * tests/init.cfg (require_pcre_): New function, factored out of...
9704 * tests/pcre-z: ...here. Use the function.
9705 * tests/pcre: Likewise.
9706
9707 tests: clean up pcre
9708 * tests/pcre: Skip (don't pass) the test when PCRE support is disabled.
9709 Don't redirect so much to /dev/null, now that all test output goes to
9710 pcre.log. Remove unnecessary braces and diagnostic about failing test.
9711
97122011-05-13 Jim Meyering <meyering@redhat.com>
9713
9714 post-release administrivia
9715 * NEWS: Add header line for next release.
9716 * .prev-version: Record previous version.
9717 * cfg.mk (old_NEWS_hash): Auto-update.
9718
9719 version 2.8
9720 * NEWS: Record release date.
9721
9722 build: update gnulib, for fixed getcwd test
9723
9724 build: update gnulib submodule to latest
9725
9726 maint: remove syntax-checking sc_tight_scope rule
9727 * src/Makefile.am (sc_tight_scope): Remove rule.
9728 Now it's provided via gnulib's maint.mk.
9729 * cfg.mk (sc_tight_scope): Likewise.
9730
97312011-05-08 Jim Meyering <meyering@redhat.com>
9732
9733 maint: use consistent declaration syntax
9734 * src/grep.h (matchers): Declare consistently, so the sc_tight_scope
9735 rule detects this as an extern-marked variable.
9736
97372011-05-07 Jim Meyering <meyering@redhat.com>
9738
9739 maint: use gnulib's new readme-release module
9740 * bootstrap.conf (gnulib_modules): Add readme-release.
9741 (bootstrap_epilogue): Add the recommended perl one-liner.
9742 * README-release: Remove file; it is now generated from gnulib.
9743 * .gitignore: Add it.
9744 * gnulib: Update submodule to latest.
9745
9746 tests: exercise bug with 0x80..0xff in [...]
9747 * tests/high-bit-range: New test, inspired by an example in the
9748 report by Igor O. Ladygin: http://bugs.debian.org/624387,
9749 via Santiago Ruano Rincón's http://savannah.gnu.org/bugs/?33198
9750 * tests/Makefile.am (TESTS): Add it.
9751
9752 fix a bug whereby echo c|grep '[c]' would fail for any c in 0x80..0xff
9753 * src/dfa.c (setbit_case_fold) [MBS_SUPPORT]: Set the bit also
9754 when wctob returns EOF.
9755 * NEWS (Bug fixes): Mention it.
9756
97572011-05-02 Reuben Thomas <rrt@sc3d.org>
9758
9759 doc: correct comment about mmap
9760 * doc/grep.texi (Other Options) [--mmap]: This option is now
9761 ignored, so using it can have no effect on performance.
9762
97632011-05-02 Arnold D. Robbins <arnold@skeeve.com>
9764
9765 build: move add_utf8_anychar into MBS ifdef
9766
97672011-05-01 Arnold D. Robbins <arnold@skeeve.com>
9768
9769 maint: remove GAWK ifndef; no longer needed
9770
97712011-05-01 Jim Meyering <meyering@redhat.com>
9772
9773 maint: remove now-unnecessary use of gnulib's strtol module
9774 * bootstrap.conf (gnulib_modules): Remove now-obsolete "strtol".
9775
97762011-04-29 Jim Meyering <meyering@redhat.com>
9777
9778 maint: tweak README-release
9779 * README-release: Add note to check the NixOS/Hydra autobuilder results.
9780
97812011-04-28 Jim Meyering <meyering@redhat.com>
9782
9783 build: update gnulib submodule to latest
9784
9785 maint: add the tight_scope syntax-checking rule
9786 This ensures that the only externally scoped symbols are ones
9787 that are explicitly marked as "extern" or white-listed like "main".
9788 * src/Makefile.am (sc_tight_scope): New rule, copied from coreutils.
9789 * cfg.mk (sc_tight_scope): Define, to hook to it from the top level.
9790
9791 maint: mark some function declarations as extern
9792 * src/search.h: Add "extern" keyword to each function declaration.
9793
97942011-04-23 Jim Meyering <meyering@redhat.com>
9795
9796 maint: fix doubled-word typos in comments
9797 * src/dfa.c (SUCCEEDS_IN_CONTEXT): Remove doubled "a".
9798 * src/dfa.c (BACKREF): s/it it/it is/
9799
98002011-04-09 Jim Meyering <meyering@redhat.com>
9801
9802 maint: fix typos in comments: s/can not/cannot/
9803 * src/dfa.c (check_matching_with_multibyte_ops, dfastate): As above.
9804
98052011-03-19 Jim Meyering <meyering@redhat.com>
9806
9807 maint: stop using .x-sc_* files to list syntax-check exemptions
9808 Instead, use the new mechanism: in cfg.mk, set a variable (whose
9809 name is derived from the rule name) to an ERE matching the
9810 exempted file names.
9811 * gnulib: Update to latest, to get maint.mk that implements this.
9812 * .x-sc_bindtextdomain: Remove file.
9813 * .x-sc_prohibit_tab_based_indentation: Likewise.
9814 * .x-sc_prohibit_xalloc_without_use: Likewise.
9815 * .x-sc_space_tab: Likewise.
9816 * cfg.mk: Define variables to exempt the same files.
9817
9818 build: correct my change of 2011-01-28
9819 Do not override original dist-hook rule.
9820 * Makefile.am (run-syntax-check): Rename from overriding dist-hook.
9821 (dist-hook): Depend on run-syntax-check.
9822
98232011-02-27 Jim Meyering <meyering@redhat.com>
9824
9825 maint: update from gnulib
9826 * bootstrap: Update from gnulib.
9827 * tests/init.sh: Likewise.
9828 * gnulib: Update to latest.
9829
98302011-01-27 Jim Meyering <meyering@redhat.com>
9831
9832 build: update gnulib submodule to latest
9833
9834 build: run syntax-check rules as part of "make dist"
9835 * Makefile.am (dist-hook): Depend on syntax-check.
9836 Suggested by Reuben Thomas.
9837
98382011-01-26 Jim Meyering <meyering@redhat.com>
9839
9840 maint: remove unneeded #include directives
9841 * lib/savedir.c: Don't include <stddef.h>. Not needed.
9842 * src/dfa.c: Likewise.
9843
98442011-01-22 Jim Meyering <meyering@redhat.com>
9845
9846 build: avoid new syntax-check failures
9847 * .x-sc_bindtextdomain: New file, used to avoid a spurious
9848 failure from the new syntax-check rule.
9849 * NEWS: Remove a trailing space.
9850
98512011-01-19 Jim Meyering <meyering@redhat.com>
9852
9853 tests: add a known-to-fail test
9854 * tests/turkish-I: New test.
9855 * tests/Makefile.am (TESTS): Add it.
9856 (XFAIL_TESTS): Add here, too.
9857 Reported by Ilya Basin.
9858
9859 maint: sort test names in Makefile.am
9860 * tests/Makefile.am (TESTS): Sort test names.
9861
98622011-01-05 Jim Meyering <meyering@redhat.com>
9863
9864 doc: remove erroneous "{,m}" item from grep man page
9865 * doc/grep.in.1: Remove item describing bogus {,m} regex notation.
9866 Reported by Fernando Basso.
9867
98682011-01-03 Jim Meyering <meyering@redhat.com>
9869
9870 maint: update copyright year ranges to include 2011
9871 Run "make update-copyright", so "make syntax-check" works in 2011.
9872
9873 build: update gnulib submodule to latest
9874
98752010-12-20 Paolo Bonzini <bonzini@gnu.org>
9876
9877 main: fix exit status on xmalloc failures
9878 * NEWS: Update.
9879 * src/main.c (main): Set exit_failure. Reported by Guy Shaw.
9880
9881 add comment above fn_grep
9882 * configure.ac (fn_grep): Add comment suggested by Bruno Haible.
9883
98842010-11-14 Paolo Bonzini <bonzini@gnu.org>
9885
9886 grep: add include guards
9887 * src/system.h: Add multiple inclusion guards.
9888 * src/grep.h: Likewise.
9889
9890 configure: fix M4 quotation
9891 * configure.ac: Add extra brackets around [...] patterns.
9892
9893 configure: remove dependency on grep that supports long lines and -e
9894 * configure.ac (fn_grep): New. Set GREP and EGREP to it, replace
9895 with newly-built grep before AC_OUTPUT. Reported by Florin Iucha
9896 <http://savannah.gnu.org/bugs/?31646>.
9897
98982010-11-04 Jim Meyering <meyering@redhat.com>
9899
9900 build: update gnulib to latest
9901
9902 tests: don't hard-code a 5-second timeout; that's not always enough
9903 Instead, time the command in the C locale and use 10 times that
9904 duration -- rounded up to whole seconds -- as the timeout when running
9905 it in the UTF-8 locale.
9906 * tests/backref-multibyte-slow: Compute a performance-relative timeout.
9907 Reported by Gilles Espinasse, regarding an imac 400. For more details,
9908 see http://thread.gmane.org/gmane.comp.gnu.grep.bugs/3360
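
The timeout rule in the entry above (ten times the C-locale run time, rounded up to whole seconds) is plain integer arithmetic. The test itself is a shell script; the C helper below is only an illustrative sketch, and the name relative_timeout_secs and the millisecond unit are assumptions, not taken from tests/backref-multibyte-slow.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: given a baseline run time in milliseconds,
   return ten times that duration, rounded up to whole seconds.  */
unsigned long
relative_timeout_secs (unsigned long baseline_ms)
{
  /* Guard the multiplication and the rounding term against overflow.  */
  assert (baseline_ms <= (ULONG_MAX - 999) / 10);
  return (10 * baseline_ms + 999) / 1000;
}
```

For example, a baseline of 1234 ms gives 12340 ms, which rounds up to a 13-second timeout.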
9909
99102010-10-09 Jim Meyering <meyering@redhat.com>
9911
9912 maint: describe policy on copyright year number ranges
9913 * README: Mention coreutils' long-standing policy on use of M-N
9914 ranges in copyright year lists. Requested by Richard Stallman.
9915
99162010-10-04 Dmitry V. Levin <ldv@altlinux.org>
9917
9918 build: compile gnulib without -Wcast-align to avoid warnings on ARM
9919 * configure.ac (GNULIB_WARN_CFLAGS): Remove -Wcast-align.
9920
99212010-09-30 Jim Meyering <meyering@redhat.com>
9922
9923 maint: don't define a gpg_key_ID; now it's obtained automatically
9924 * cfg.mk (gpg_key_ID): Remove definition. No longer needed.
9925
99262010-09-23 Paolo Bonzini <bonzini@gnu.org>
9927
9928 tests: add testcase for previous fix
9929 * tests/inconsistent-ranges: New.
9930 * tests/Makefile.am (TESTS): Add it.
9931
99322010-09-23 Paolo Bonzini <bonzini@gnu.org>
9933
9934 dfa: process range expressions consistently with system regex
9935 The actual meaning of range expressions in glibc is not exactly strcoll,
9936 which makes the behavior of grep hard to predict when compiled with the
9937 system regex. Leave to the system regex matcher the decision of which
9938 single-byte characters are matched by a range expression.
9939
9940 This partially reverts a change made in commit 0d38a8bb (which made
9941 sense at the time, but not now that src/dfa.c is not doing multibyte
9942 character set matching anymore).
9943
9944 * src/dfa.c (in_coll_range): Remove.
9945 (parse_bracket_exp): Use system regex to find which single-char
9946 bytes match a range expression.
9947
99482010-09-23 Bruno Haible <bruno@clisp.org>
9949
9950 build: fix link error on systems that have libiconv but not libintl
9951 * src/Makefile.am (LDADD): Add $(LIBICONV).
9952
99532010-09-21 Jim Meyering <meyering@redhat.com>
9954
9955 build: avoid compilation failure on the Hurd
9956 * src/dfasearch.c (dfawarn): Rename enum symbols to use DW_ prefix,
9957 so as not to collide with "GNU", which is defined by the Hurd.
9958 Reported by Matthias Lanzinger in http://savannah.gnu.org/bugs/?31096
9959
99602010-09-20 Jim Meyering <meyering@redhat.com>
9961
9962 maint: avoid obsolete gnulib modules
9963 * bootstrap.conf (gnulib_modules): Don't use obsolete atexit module.
9964 Use malloc-gnu and realloc-gnu -- malloc and realloc are obsolete.
9965
9966 maint: update README-release
9967 * README-release: Reflect changes in coreutils' version of this file.
9968
99692010-09-20 Aharon Robbins <arnold@skeeve.com>
9970
9971 dfa: fix compilation when not using MBS
9972 * src/dfa.c (prepare_wc_buf) [!MBS_SUPPORT]: Do not compile this
9973 function.
9974
99752010-09-16 Jim Meyering <meyering@redhat.com>
9976
9977 post-release administrivia
9978 * NEWS: Add header line for next release.
9979 * .prev-version: Record previous version.
9980 * cfg.mk (old_NEWS_hash): Auto-update.
9981
9982 version 2.7
9983 * NEWS: Record release date.
9984
99852010-09-13 Paolo Bonzini <bonzini@gnu.org>
9986
9987 tests: add equiv-classes
9988 * configure.ac (USE_INCLUDED_REGEX): Add Automake conditional.
9989 * tests/equiv-classes: New test.
9990 * tests/Makefile.am (TESTS): Add it.
9991 (XFAIL_TESTS) [USE_INCLUDED_REGEX]: Mark it as expected failure.
9992
99932010-09-13 Paolo Bonzini <bonzini@gnu.org>
9994
9995 dfa: fall back to glibc matcher if a MBCSET is found
9996 This patch enables full support of equivalence classes and multicharacter
9997 collation symbols. It can also improve performance in some
9998 cases for multibyte grep. Both of these changes, however, depend on the
9999 glibc version installed in the system.
10000
10001 For UTF-8 it will trigger only in the presence of MBCSET, e.g. [a-z].
10002 For other character sets all brackets and `.` as well will trigger it.
10003
10004 * NEWS: Document this.
10005 * src/dfa.c (dfaexec): Fall back to glibc for multibyte matches,
10006 if possible.
10007
100082010-09-13 Paolo Bonzini <bonzini@gnu.org>
10009
10010 build: update gnulib submodule to latest
10011 This is done to include commit "regex: Pass the system regex if its only
10012 problem is 32-bit regoff_t".
10013
10014 * gnulib: Update to e2b0e1a.
10015
100162010-09-12 Jim Meyering <meyering@redhat.com>
10017
10018 build: update gnulib submodule to latest
10019
10020 tests: update init.sh from gnulib
10021 * tests/init.sh: Update from gnulib.
10022
100232010-09-08 Patrick Boyd <pboyd04@gmail.com>
10024
10025 dfa: reduce stack usage
10026 * src/dfa.c (dfaanalyze): Allocate GRPS and LABELS arrays from heap,
10027 not on the stack. With this change, grep can now run in these UEFI
10028 simulators:
10029 http://sourceforge.net/apps/mediawiki/tianocore/index.php?title=EDK
10030 http://sourceforge.net/apps/mediawiki/tianocore/index.php?title=EDK2
10031
100322010-09-08 Jim Meyering <meyering@redhat.com>
10033
10034 tests/portability: avoid spurious failure with OpenBSD's /bin/sh
10035 * tests/warn-char-classes: Don't use "set -x" here. It causes
10036 a spurious test failure on openbsd 4.7 when using its /bin/sh,
10037 since the command, /bin/sh -xc 'P=1 : 2> err' emits "P=1" into err.
10038 To enable set -x, run the test with "VERBOSE=yes", e.g.,
10039 make check -C tests TESTS=warn-char-classes VERBOSE=yes
10040
100412010-09-07 Jim Meyering <meyering@redhat.com>
10042
10043 build: update gnulib submodule to latest
10044
100452010-09-03 Jim Meyering <meyering@redhat.com>
10046
10047 tests: remove .sh suffix from remaining test scripts.
10048 * tests/backref: Rename from backref.sh.
10049 * tests/bre: Rename from bre.sh.
10050 * tests/ere: Rename from ere.sh.
10051 * tests/file: Rename from file.sh.
10052 * tests/khadafy: Rename from khadafy.sh.
10053 * tests/options: Rename from options.sh.
10054 * tests/pcre: Rename from pcre.sh.
10055 * tests/spencer1: Rename from spencer1.sh.
10056 * tests/spencer2: Rename from spencer2.sh.
10057 * tests/status: Rename from status.sh.
10058 * tests/yesno: Rename from yesno.sh.
10059 * tests/Makefile.am: Reflect renamings.
10060
10061 tests: convert remaining tests to use init.sh
10062 * tests/file.sh: Use init.sh. Use Exit, not exit. Use grep, not ${GREP}.
10063 * tests/khadafy.sh: Likewise.
10064 * tests/options.sh: Likewise.
10065 * tests/spencer1.sh: Likewise.
10066 * tests/spencer2.sh: Likewise.
10067 * tests/status.sh: Likewise.
10068 * tests/spencer1.awk: Use grep, not ${GREP}.
10069 Don't ignore failure to generate intermediate shell script.
10070 * tests/Makefile.am (CLEANFILES): Remove altogether, now that
10071 all tests use init.sh.
10072 (TESTS_ENVIRONMENT): Don't set GREP. It's no longer used.
10073
10074 tests: remove warning.sh
10075 * tests/warning.sh: Remove file. All it did was print a warning.
10076 * tests/Makefile.am (TESTS): Remove warning.sh.
10077
10078 tests: convert pcre.sh to use init.sh
10079 * tests/pcre.sh: Use init.sh. Use Exit, not exit. Use grep, not ${GREP}.
10080
10081 tests: convert bre.sh to use init.sh
10082 * tests/bre.sh: Use init.sh.
10083 Use Exit, not exit.
10084 Use "$abs_top_srcdir/tests/", not "$srcdir/" to specify inputs.
10085 Source generated bre.script, rather than invoking $SHELL.
10086 * tests/ere.sh: Likewise.
10087 * tests/bre.awk: Use grep, not ${GREP}.
10088 * tests/ere.awk: Likewise.
10089 * tests/Makefile.am (CLEANFILES): Remove bre.script and ere.script.
10090
10091 tests: convert to use init.sh
10092 * tests/yesno.sh: Use init.sh.
10093 Use Exit, not exit.
10094 Use grep, not $GREP.
10095 * tests/backref.sh: Likewise.
10096 * tests/Makefile.am (CLEANFILES): Remove yesno.txt.
10097
10098 build: update gnulib submodule to latest
10099
10100 build: update build/test tools from gnulib
10101 * bootstrap: Update from gnulib.
10102 * tests/init.sh: Likewise.
10103
101042010-09-01 Jim Meyering <meyering@redhat.com>
10105
10106 maint: add lib/version-etc.c to the list in POTFILES.in
10107 * po/POTFILES.in: Add lib/version-etc.c.
10108
101092010-09-01 Jim Meyering <meyering@redhat.com>
10110
10111 grep: diagnose and exit-2 for bogus REs like [:space:], [:digit:], etc.
10112 When I make a mistake like this:
10113 grep '[:lower:]' ...
10114 be it in a script or on the command line, I want to know about
10115 it as soon as possible. I don't want grep to print a mere warning
10116 that it is interpreting this suspicious and almost guaranteed-wrong
10117 regular expression as a set of just 6 bytes. And I certainly don't
10118 want grep to silently do the wrong thing, even if that would be
10119 officially standards-conforming. It's obvious that I intended
10120 [[:lower:]], and I want my error to be diagnosed in a way that is
10121 most likely to get my attention. Thus, with this change, grep now
10122 prints a diagnostic and exits with status 2 the moment it
10123 encounters an offending [:char_class:] construct.
10124
10125 This changes the way grep works by default, rather than
10126 putting this new behavior on an option. A new option
10127 would seldom be used in scripts (not portable), and would
10128 probably be used only rarely by those who need it the most.
10129 This new functionality provides a valuable safety measure
10130 and incurs truly negligible risk.
10131
10132 For strict POSIX compliance, set POSIXLY_CORRECT in
10133 your environment. That disables this new feature.
10134
10135 Revert the changes from commit 2cd3bcea, "grep: add
10136 --warnings={always,never,auto}.", and then do the following:
10137
10138 * src/dfasearch.c (dfawarn): Call getenv("POSIXLY_CORRECT") here;
10139 Remove "warning: " from the diagnostic, now that it's more than
10140 a warning, and exit with status 2.
10141 * NEWS (New features): Describe the new semantics.
10142 * tests/warn-char-classes: Adjust one test to accommodate this change.
10143 * doc/grep.texi (Character Classes and Bracket Expressions): Document.
10144 (Environment Variables): Cross-reference it.
10145 Remove reference to obsolete getopt illegal vs. invalid difference.
10146 Thanks to Paul Eggert for suggestions and an initial prod.
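
A minimal sketch of the check described in the entry above, assuming a simplified view of bracket parsing. It is not the code in src/dfa.c or src/dfasearch.c: bare_char_class and bare_class_is_fatal are hypothetical names, and the real parser tracks more state. The only details taken from the entry are the [:name:] shape, the POSIXLY_CORRECT test via getenv, and exit status 2.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Return true if the bracket expression starting at S (S[0] == '[')
   has the suspicious form "[:name:]": a colon right after the opening
   bracket, one or more letters, and a colon right before the closing
   bracket.  */
bool
bare_char_class (const char *s)
{
  assert (s[0] == '[');
  if (s[1] != ':')
    return false;
  const char *p = s + 2;
  while (isalpha ((unsigned char) *p))
    p++;
  return p > s + 2 && p[0] == ':' && p[1] == ']';
}

/* Return true when such a construct should be a hard error (the caller
   would print a diagnostic and exit with status 2).  Setting
   POSIXLY_CORRECT in the environment disables the check.  */
bool
bare_class_is_fatal (void)
{
  return getenv ("POSIXLY_CORRECT") == NULL;
}
```

With this sketch, "[:lower:]" is flagged, while "[[:lower:]]" is not, because its outer bracket is followed by '[' rather than ':'.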
10147
101482010-08-30 Jim Meyering <meyering@redhat.com>
10149
10150 maint: use gnulib's standard --version-printing code
10151 This includes author names and keeps the copyright year up to date.
10152 * bootstrap.conf (gnulib_modules): Add propername and version-etc-fsf.
10153 * src/main.c (AUTHORS): Define.
10154 (main): Use version_etc, rather than hard-coding the copyright text.
10155 Prompted by a patch from Paolo Bonzini.
10156
101572010-08-27 Paolo Bonzini <bonzini@gnu.org>
10158
10159 dfa: warn on [:space:] and similar
10160 * src/dfa.c (parse_bracket_exp): Warn on regular expressions such as
10161 [:space:].
10162 * src/dfa.h (dfawarn): New prototype.
10163 * src/dfasearch.c (dfawarn): New.
10164 * NEWS: Document.
10165
10166 tests: add test for warnings
10167 * tests/Makefile.am (TESTS): Add warn-char-class.
10168 * tests/warn-char-class: New.
10169
10170 grep: add --warnings={always,never,auto}.
10171 * src/grep.h (no_warnings): New declaration.
10172 * src/main.c (no_warnings): New.
10173 (WARNINGS_OPTION): Add to enum.
10174 (main): Add --warnings. Handle color_option == 2 together with it.
10175
10176 tests: add failing test for grep from a directory
10177 * tests/Makefile.am (TESTS, XFAIL_TESTS): Add grep-dir.
10178 * tests/grep-dir: New.
10179
10180 tests: add test for previous commit
10181 * tests/Makefile.am (TESTS): Add grep-dev-null.
10182 * tests/grep-dev-null: New.
10183
10184 search: fix "grep -Fif /dev/null"
10185 * bootstrap.conf: Include gnulib module minmax.
10186 * src/searchutils.c (mbtolower): Handle *N == 0 case.
10187 * src/system.h: Include minmax.h from gnulib.
10188
101892010-08-27 Adam Katz <savannah@kopis.com>
10190
10191 Remove declaration after statement in dfa.c
10192 * dfa.c (dfaexec): Declare saved_end at the beginning of the function.
10193
101942010-08-13 Jim Meyering <meyering@redhat.com>
10195
10196 make --include=FILE work once again
10197 The semantics of excluded_file_name changed (when operating on
10198 an "included" file name list).
10199 * src/main.c (main): Adjust for changed semantics of excluded_file_name
10200 simply by removing a negation.
10201 * NEWS (Bug fixes): Mention this fix.
10202 * tests/include-exclude: Add a test for this.
10203 Reported by Joe Perches in http://savannah.gnu.org/bugs/?29876.
10204
102052010-07-16 Paolo Bonzini <bonzini@gnu.org>
10206
10207 doc: document \s and \S
10208 * doc/grep.texi (The Backslash Character and Special Expressions):
10209 Document \s and \S escapes.
10210
102112010-05-29 Karl Berry <karl@gnu.org>
10212
10213 doc: discuss matches that span two or more lines
10214 * doc/grep.texi (Usage): Discuss matching across lines.
10215 (Character Classes and Bracket Expressions) <[:space:]>: refer to it.
10216
102172010-05-25 Jim Meyering <meyering@redhat.com>
10218
10219 build: use latest gettext: 0.18
10220 * configure.ac: Use gettext-0.18.
10221 * bootstrap.conf (gnulib_modules): Use gettext-h, not gettext,
10222 since the latter drags in a dependency on gettext 0.18.
10223 Suggested by Bruno Haible.
10224
10225 maint: update helper scripts from gnulib
10226 * tests/init.sh: Update from gnulib.
10227 * bootstrap: Likewise.
10228
10229 build: update gnulib submodule to latest
10230
10231 maint: don't emit an extra newline in each of two diagnostics
10232 * src/main.c (context_length_arg, grepdir): Remove a stray \n in
10233 each of two diagnostics.
10234
102352010-05-24 Bruno Haible <bruno@clisp.org>
10236
10237 search: Avoid out-of-bounds access.
10238 * src/dfasearch.c (EGexecute): Avoid access beyond end of buffer
10239 that could happen if start != beg - buf.
10240
102412010-05-23 Aharon Robbins <arnold@skeeve.com>
10242
10243 dfa: fix signedness warnings
10244 * src/dfa.c (dfaexec): Cast p when passing it to prepare_wc_buf.
10245
102462010-05-09 Jim Meyering <meyering@redhat.com>
10247
10248 tests: update init.sh
10249 * tests/init.sh: Update from gnulib.
10250
10251 tests: normalize init.sh-sourcing code
10252 * tests/backref-multibyte-slow: Use one-line idiom.
10253 * tests/backref-word: Likewise.
10254 * tests/case-fold-backref: Likewise.
10255 * tests/case-fold-backslash-w: Likewise.
10256 * tests/case-fold-char-class: Likewise.
10257 * tests/case-fold-char-range: Likewise.
10258 * tests/case-fold-char-type: Likewise.
10259 * tests/char-class-multibyte: Likewise.
10260 * tests/dfaexec-multibyte: Likewise.
10261 * tests/empty: Likewise.
10262 * tests/euc-mb: Likewise.
10263 * tests/fedora: Likewise.
10264 * tests/fgrep-infloop: Likewise.
10265 * tests/fmbtest: Likewise.
10266 * tests/foad1: Likewise.
10267 * tests/ignore-mmap: Likewise.
10268 * tests/include-exclude: Likewise.
10269 * tests/max-count-vs-context: Likewise.
10270 * tests/pcre-z: Likewise.
10271 * tests/prefix-of-multibyte: Likewise.
10272 * tests/reversed-range-endpoints: Likewise.
10273 * tests/sjis-mb: Likewise.
10274 * tests/spencer1-locale: Likewise.
10275 * tests/word-delim-multibyte: Likewise.
10276 * tests/word-multi-file: Likewise.
10277
10278 tests: update help-version
10279 * tests/help-version: Update from coreutils.
10280
102812010-05-06 Jim Meyering <meyering@redhat.com>
10282
10283 tests: enable glibc's malloc-perturbing option
10284 * tests/Makefile.am (MALLOC_PERTURB_): Define, in case it's not already
10285 set in your environment.
10286 (TESTS_ENVIRONMENT): Propagate MALLOC_PERTURB_ setting to test scripts.
10287
102882010-05-06 Paolo Bonzini <bonzini@gnu.org>
10289
10290 dfa: speed up [[:digit:]] and [[:xdigit:]]
10291 There's no "multibyte pain" in these two classes, since POSIX
10292 and ISO C99 mandate their contents.
10293
10294 Time for "./grep -x '[[:digit:]]' /usr/share/dict/linux.words"
10295 Before: 1.5s, after: 0.07s. (sed manages only 0.5s).
10296
10297 * src/dfa.c (predicates): Declare struct dfa_ctype separately
10298 from definition. Add sb_only.
10299 (find_pred): Return const struct dfa_ctype *.
10300 (parse_bracket_exp): Return const struct dfa_ctype *. Do
10301 not fill MBCSET for sb_only character types.
10302
103032010-05-05 Jim Meyering <meyering@redhat.com>
10304
10305 tests: readability: use awk rather than obfuscated sed
10306 * tests/backref-multibyte-slow: Generate input using an awk for-loop
10307 rather than expensive and harder-to-read sed pipes.
10308 Remove stray "set -x" and "wc -l in".
10309
10310 dfa: avoid segfault when processing an invalid multi-byte sequence
10311 * src/dfa.c (dfaexec): Handle the cases in which mbrtowc returns
10312 (size_t)-1 or (size_t)-2, rather than setting mblen_buf[i] to an
10313 outrageously large value.
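
As a rough illustration of the dfaexec fix above (not the code in src/dfa.c), the sketch below treats either error return from mbrtowc as a one-byte character instead of storing the huge value as a length. The names safe_char_len and count_chars are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Return the length in bytes of the character starting at P, with N
   bytes available.  An invalid sequence ((size_t) -1) or a truncated
   one ((size_t) -2) is treated as a single byte.  */
size_t
safe_char_len (const char *p, size_t n, mbstate_t *st)
{
  wchar_t wc;
  size_t len = mbrtowc (&wc, p, n, st);
  if (len == (size_t) -1 || len == (size_t) -2)
    {
      /* The conversion state is unusable after an error; reset it
         and consume exactly one byte.  */
      memset (st, 0, sizeof *st);
      return 1;
    }
  /* mbrtowc returns 0 for the NUL character, which still occupies
     one byte in the buffer.  */
  return len == 0 ? 1 : len;
}

/* Count the characters in BUF[0..N-1], never stepping by more than
   the number of bytes actually consumed.  */
size_t
count_chars (const char *buf, size_t n)
{
  assert (buf != NULL || n == 0);
  mbstate_t st;
  memset (&st, 0, sizeof st);
  size_t count = 0;
  for (size_t i = 0; i < n; i += safe_char_len (buf + i, n - i, &st))
    count++;
  return count;
}
```

Because every step advances by at least one byte and by no more than the remaining length, the loop cannot run past the end of the buffer even on malformed input.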
10314
103152010-05-05 Paolo Bonzini <bonzini@gnu.org>
10316
10317 grep: remove redundant syntax bit
10318 * grep.c (Gcompile): Remove RE_HAT_LISTS_NOT_NEWLINE.
10319
10320 tests: add test for newly-fixed performance problem
10321 * tests/backref-multibyte-slow: New.
10322 * tests/Makefile.am: Add it.
10323
103242010-05-05 Paolo Bonzini <bonzini@gnu.org>
10325
10326 dfa: convert to wide character line-by-line
10327 This provides a nice speedup for -m in general, but especially
10328 it avoids quadratic complexity in case we have to go to glibc.
10329
10330 * NEWS: Document change.
10331 * src/dfa.c (prepare_wc_buf): Extract out of dfaexec. Convert
10332 only up to the next newline.
10333 (dfaexec): Exit multibyte processing loop if past buf_end.
10334 Call prepare_wc_buf again after processing a newline.
10335
103362010-05-01 Jim Meyering <meyering@redhat.com>
10337
10338 maint: remove useless #if HAVE_STDLIB_H
10339 * src/mbsupport.h: Don't test HAVE_STDLIB_H.
10340
103412010-04-20 Jim Meyering <meyering@redhat.com>
10342
10343 dfa: don't #ifdef-out member declarations
10344 * src/dfa.c (struct dfa): Remove "#if MBS_SUPPORT" guard that made
10345 several member declarations conditional on this cpp definition.
10346 (token): Likewise.
10347 Reported by Anders Wallin.
10348
10349 tests: ensure that the --mmap option is ignored
10350 * tests/ignore-mmap: New file.
10351 * tests/Makefile.am (TESTS): Add it.
10352 Reported by Jaroslav Škarvada in <http://savannah.gnu.org/bugs/?29614>
10353
103542010-04-20 Paolo Bonzini <bonzini@gnu.org>
10355
10356 dfa: honor RE_DOT_NEWLINE and RE_DOT_NOT_NULL in UTF-8 period optimization
10357 * src/dfa.c (add_utf8_anychar): Check for RE_DOT_NEWLINE and
10358 RE_DOT_NOT_NULL.
10359
10360 grep: fix --mmap not being ignored
10361 * NEWS: Document bugfix.
10362 * main.c (main): Ignore MMAP_OPTION.
10363
103642010-04-19 Jim Meyering <meyering@redhat.com>
10365
10366 maint: avoid syntax-check failure due to indentation via TABs
10367 * src/dfa.c (atom): Expand TABs in indentation.
10368
10369 build: update gnulib submodule to latest
10370
10371 maint: restrict scope of two globals to dfasearch.c
10372 * src/dfasearch.c (patterns, pcount): Declare these file-scoped
10373 globals to be static.
10374
103752010-04-19 Paolo Bonzini <bonzini@gnu.org>
10376
10377 dfa: optimize UTF-8 period
10378 * NEWS: Document improvement.
10379 * src/dfa.c (struct dfa): Add utf8_anychar_classes.
10380 (add_utf8_anychar): New.
10381 (atom): Simplify if/else nesting. Call add_utf8_anychar for ANYCHAR
10382 in UTF-8 locales.
10383 (dfaoptimize): Abort on ANYCHAR.
10384
10385 dfa: drop ORTOP
10386 * src/dfa.c (token, prtok, addtok_mb, nsubtoks, dfaanalyze, dfamust):
10387 Remove ORTOP.
10388 (regexp): Remove parameter, always add OR at the end, adjust callers.
10389 (atom): Adjust caller.
10390 (dfaparse): Adjust caller. Always add OR at the end.
10391
10392 dfa: fix {0,0}
10393 * NEWS: Document change.
10394 * src/dfa.c (struct dfa): Remove "broken" field.
10395 (lex): Do not set it.
10396 (closure): On {0,0}, back up and lex another closure without
10397 adding a CAT.
10398 (dfabroken): Remove.
10399 * src/dfa.h (dfabroken): Remove.
10400 * tests/spencer1.tests: Add testcases for {m,n}.
10401
10402 dfa: simplify dfainit
10403 * src/dfa.c (dfainit): Use memset.
10404
104052010-04-17 Jim Meyering <meyering@redhat.com>
10406
10407 doc: fix a nit in HACKING
10408 * HACKING: Correct size of .git/ dir: 9MB, not 30MB.
10409
10410 tests: add an expected-to-fail test using \< in a multi-byte locale
10411 * tests/word-delim-multibyte: New test. Currently failing.
10412 * tests/Makefile.am (TESTS): Add it.
10413 (XFAIL_TESTS): Define, temporarily.
10414 Reported by Jaroslav Škarvada in http://savannah.gnu.org/bugs/?29537.
10415
104162010-04-16 Paolo Bonzini <bonzini@gnu.org>
10417
10418 test: cover just-fixed bug
10419 * tests/empty: Test -Fw too.
10420
10421 grep: fix matching the empty string with grep -Fw
10422 * NEWS: Document fix.
10423 * src/kwsearch.c (Fexecute): The empty string is a valid match if it is
10424 a whole word.
10425
104262010-04-15 Jim Meyering <meyering@redhat.com>
10427
10428 maint: update init.sh and HACKING
10429 * HACKING: Sync from coreutils.
10430 * tests/init.sh: Update from gnulib.
10431
104322010-04-13 Jim Meyering <meyering@redhat.com>
10433
10434 build: update gnulib submodule to latest; adapt
10435 * COPYING: Remove empty line.
10436 * README: Likewise.
10437 * doc/fdl.texi: Likewise.
10438 * tests/backref-word: Likewise.
10439
104402010-04-11 Stefano Lattarini <stefano.lattarini@gmail.com>
10441
10442 tests: accept the Debian timeout program
10443 * tests/init.cfg: test timeout with `timeout 10s true'
10444
104452010-04-08 Jim Meyering <meyering@redhat.com>
10446
10447 dfa: convert "cannot happen" code/comment to use assert
10448 * src/dfa.c (dfamust): There were numerous "cannot happen" comments,
10449 some associated with "if (expr) goto done;". Replace each with an
10450 equivalent "assert (!expr);".
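
The conversion described above can be sketched as follows. This is an invented example, not an excerpt from dfamust: top_old and top_new are hypothetical functions that show the same before/after pattern on a trivial stack lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Old style: quietly bail out on a condition believed impossible.  */
int
top_old (const int *stack, size_t depth)
{
  int result = -1;
  if (depth == 0)               /* "cannot happen" */
    goto done;
  result = stack[depth - 1];
 done:
  return result;
}

/* New style: state the invariant.  A violation now aborts with a
   diagnostic instead of being silently papered over.  */
int
top_new (const int *stack, size_t depth)
{
  assert (depth != 0);
  return stack[depth - 1];
}
```

The assert form documents the invariant at the point where it matters and removes a label and a goto, at the cost of aborting (when NDEBUG is not defined) if the assumption ever proves wrong.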
10451
10452 build: use gnulib's isblank module
10453 * bootstrap.conf (gnulib_modules): Use gnulib's isblank module,
10454 now that we rely on the function by that name.
10455
10456 maint: undo TAB-conversion change to gl/lib/*.c.diff
10457 This fixes a bootstrap failure due to the patches not applying.
10458 * .x-sc_prohibit_tab_based_indentation: Add ^gl/lib/.*\.c\.diff$
10459 * gl/lib/regcomp.c.diff: Revert today's TAB->space change.
10460 * gl/lib/regex_internal.c.diff: Likewise.
10461 * gl/lib/regexec.c.diff: Likewise.
10462
104632010-04-08 Arnold D. Robbins <arnold@skeeve.com>
10464
10465 dfa: fix declaration of dfabroken in dfa.h
10466 * dfa.h (dfabroken) [GAWK]: Fix declaration to match that in dfa.c.
10467
104682010-04-08 Jim Meyering <meyering@redhat.com>
10469
10470 maint: add syntax-check rule to enforce the new no-leading-TABs policy
10471 * cfg.mk (sc_prohibit_tab_based_indentation): New rule, from coreutils.
10472 (sc_prohibit_emacs__indent_tabs_mode__setting): Likewise.
10473 (old_NEWS_hash): Update.
10474 * .x-sc_prohibit_tab_based_indentation: List exempt files.
10475
104762010-04-08 Jim Meyering <meyering@redhat.com>
10477
10478 convert all TABs to equivalent spaces in indentation
10479 Using this file,
10480
10481 cat > leading-blank.exempt <<\EOF
10482 (?:^|\/)ChangeLog[^/]*$
10483 (?:^|\/)(?:GNU)?[Mm]akefile[^/]*$
10484 \.(?:am|mk)$
10485 EOF
10486
10487 run this command to convert all non-conforming leading white
10488 space to be all spaces:
10489
10490 git ls-files \
10491 | pcregrep -vf leading-blank.exempt \
10492 | xargs pcregrep -l '^ *\t' \
10493 | xargs perl -MText::Tabs -ni -le \
10494 '$m=/^( *\t[ \t]*)(.*)/; print $m ? expand($1) . $2 : $_'
10495
104962010-04-08 Jim Meyering <meyering@redhat.com>
10497
10498 build: include cfg.mk in the distribution tarball
10499 * Makefile.am (EXTRA_DIST): Add cfg.mk.
10500
105012010-04-08 Jim Meyering <meyering@redhat.com>
10502
10503 maint: Makefile.am tweak (no semantic change)
10504 * Makefile.am (EXTRA_DIST): List one per line. Sort.
10505
10506 build: include cfg.mk in the distribution tarball
10507 * Makefile.am (EXTRA_DIST): Add cfg.mk.
10508
105092010-04-08 Jim Meyering <meyering@redhat.com>
10510
10511 dfa: move definition of __attribute__ back into dfa.h
10512 * src/dfa.c (__attribute__): Move definition back to...
10513 * src/dfa.h: ... this file. It is essential for non-gcc compilers.
10514 Reported by Arnold Robbins.
10515
105162010-04-07 Arnold D. Robbins <arnold@skeeve.com>
10517
10518 dfa: move internals from dfa.h to dfa.c
10519 * src/dfa.h: Move internals into dfa.c.
10520 * src/dfa.c: The dfa internals are now totally local to this file.
10521 (dfaalloc, dfamusts, dfabroken): New functions to access features.
10522 * src/dfasearch.c (dfa): Change this global variable from struct to pointer.
10523 Adapt to that change, and use new functions, dfamusts and dfaalloc.
10524
105252010-04-07 Jim Meyering <meyering@redhat.com>
10526
10527 mbtolower: avoid potential NULL-dereference
10528 * src/searchutils.c: Include <assert.h>.
10529 (mbtolower): Assert that 0 < *n, to avoid possibility of NULL-deref.
10530 Remove dead increment.
10531
10532 maint: tell git to ignore more build products
10533 * .gitignore: Also ignore results of "make ID" and "make tags".
10534
10535 build: update gnulib submodule to latest
10536
10537 tests: use init.sh consistently
10538 * tests/euc-mb: Call "path_prepend_ ." on a line by itself,
10539 and with a comment. This makes it so all of the srcdir/init.sh
10540 lines are consistent, project-wide, and so that the addition of "."
10541 to PATH for this test is properly documented.
10542 * tests/sjis-mb: Likewise.
10543
10544 maint: avoid new syntax-check failure, ...
10545 ...now that the sole use of xmalloc no longer matches the
10546 regular expression used by the syntax-check rule.
10547 * .x-sc_prohibit_xalloc_without_use: Exempt src/kwset.c.
10548
10549 grep: make kwset's obstack use xmalloc, not malloc
10550 This insidious bug could make grep fail to diagnose a failed malloc,
10551 and then proceed to dereference the resulting NULL pointer.
10552 Note that this bug was unlikely ever to cause real trouble; without
10553 the fix, grep would segfault upon OOM, now it exits with a diagnostic.
10554 * src/kwset.c (malloc) [GREP]: Define without the "(s)" macro
10555 parameter, so that unadorned uses of malloc are also mapped to xmalloc.
10556 One such use is in the expansion of obstack_init.
10557 Report and patch by Nelson H. F. Beebe, in
10558 http://thread.gmane.org/gmane.comp.gnu.grep.bugs/2995
10559
10560 tests: improve help-version (sync from gzip's version)
10561 * tests/help-version: Cross-check $VERSION and --version output.
10562 * tests/Makefile.am (TESTS_ENVIRONMENT): Export VERSION=$(VERSION).
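The cross-check can be sketched as a small shell function. The name version_matches_ and the assumption that the version is the last word on the first line of --version output are illustrative only; the real tests/help-version may do this differently.

```shell
# version_matches_ PROG VERSION: hypothetical helper.
# Succeeds if the first line of "PROG --version" ends with " VERSION".
version_matches_() {
  first_line=$("$1" --version 2>/dev/null | sed 1q)
  case $first_line in
    *" $2") return 0 ;;
    *) return 1 ;;
  esac
}
```

A test could then run something like: version_matches_ grep "$VERSION" || fail=1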
10563
105642010-04-06 Jim Meyering <meyering@redhat.com>
10565
10566 doc: update THANKS
10567 * THANKS: Update.
10568
105692010-04-06 Aharon Robbins <arnold@skeeve.com>
10570
10571 build: avoid conflict with WCHAR definition from Cygwin's <windows.h>
10572 * src/dfa.h (enum token): Remove the definition from this file.
10573 Replace with a declaration and typedef. Moved to ...
10574 * src/dfa.c (enum token): ... here.
10575 Reported by Corinna Vinschen.
10576
105772010-04-06 Jim Meyering <meyering@redhat.com>
10578
10579 doc: add HACKING
10580 * HACKING: New file. Copied from coreutils, with s/coreutils/grep/
10581 and a few minor edits.
10582
105832010-04-05 Jim Meyering <meyering@redhat.com>
10584
10585 tests: pull fixed init.sh from gnulib
10586 * tests/init.sh: Update from gnulib.
10587
10588 maint: fix new argmatch-related syntax-check failures
10589 * configure.ac (ARGMATCH_DIE): Use usage(EXIT_FAILURE), not exit(1).
10590 * po/POTFILES.in: Add lib/argmatch.c.
10591
10592 maint: update cfg.mk to work with gnulib's newer "make syntax-check"
10593 * cfg.mk: Update to use new _sc_search_regexp interface. Run this:
10594 perl -pi -e 's/\b_prohibit_regexp\b/_sc_search_regexp/;'
10595 -e 's/\bmsg=/halt=/; s/\bre=/prohibit=/;' cfg.mk
10596 and then adjust backslashes so they still line up.
10597
10598 maint: update tests/init.sh from gnulib
10599 This ensures that the explanation for any skipped or failed test
10600 is printed on stderr, not buried in each .log file.
10601 * tests/init.sh: Update from gnulib.
10602 * tests/init.cfg (stderr_fileno_): Define to 9, to match the
10603 literal 2>&9 in tests/Makefile.am.
10604
10605 build: update gnulib submodule to latest
10606
106072010-04-04 Jim Meyering <meyering@redhat.com>
10608
10609 maint: use argmatch, for better --directories=INVAL diagnostics
10610 Before, you'd see this:
10611 grep: unknown directories method
10612
10613 Now, you'll see this:
10614 grep: invalid argument `INVAL' for `--directories'
10615 Valid arguments are:
10616 - `read'
10617 - `recurse'
10618 - `skip'
10619 Usage: src/grep [OPTION]... PATTERN [FILE]...
10620 Try `src/grep --help' for more information.
10621
10622 * bootstrap.conf: Add argmatch.
10623 * configure.ac: Define ARGMATCH_DIE and ARGMATCH_DIE_DECL.
10624 * src/main.c (directories_type): Define.
10625 (directories_args, directories_types): Define.
10626 All of the above so we can...
10627 (main): Use XARGMATCH.
10628 (usage): Declare extern, now that argmatch calls it via ARGMATCH_DIE.
10629
106302010-04-04 Jim Meyering <meyering@redhat.com>
10631
10632 dfa.c: const correctness; and remove useless casts of realloc and malloc
10633 * src/dfa.c (icatalloc, icpyalloc, istrstr, enlist): As above.
10634 (inboth, dfamust, comsubs): Likewise.
10635
10636 dfa.c: use a better (unsigned) type for an index: int->unsigned int
10637 * src/dfa.c (dfaexec): Use "unsigned int" for a logically unsigned index.
10638
10639 maint: style: use sizeof VAR, rather than sizeof TYPE, where possible
10640 * src/dfa.c (copyset, zeroset): Prefer sizeof EXPR, over sizeof TYPE,
10641 for improved readability/maintainability.
10642 (equal, parse_bracket_exp, addtok_wc, dfaparse, dfaexec): Likewise.
10643
106442010-04-02 Jim Meyering <meyering@redhat.com>
10645
10646 dfa.c: use a better (unsigned) type for an index: int->size_t
10647 * src/dfa.c (parse_bracket_exp): Use size_t as type of index, not int.
10648
10649 maint: const-correctness
10650 * src/dfa.c (tstbit, copyset, equal, charclass_index): Declare read-only
10651 "charclass" parameters to be "const". No semantic change.
10652
10653 maint: include <wchar.h> and <wctype.h> unconditionally
10654 * src/main.c: Include <wchar.h> and <wctype.h> unconditionally.
10655 Their presence/usefulness are assured by gnulib.
10656 * src/dfa.c: Likewise.
10657 * src/search.h: Likewise.
10658
10659 maint: MBS_SUPPORT: define to 0/1, not undef/1
10660 Prepare to remove many of these #ifdefs.
10661 * src/mbsupport.h (MBS_SUPPORT): Define to 0/1, not undef/1.
10662 Change each "#ifdef MBS_SUPPORT" to "#if MBS_SUPPORT". Use this:
10663 perl -pi -e 's/ifdef (MBS_SUPPORT)/if $1/' $(g grep -l ifdef.MBS_SUPPO)
10664 * src/dfa.c: s/#ifdef MBS_SUPPORT/#if MBS_SUPPORT/
10665 * src/dfa.h: Likewise.
10666 * src/dfasearch.c: Likewise.
10667 * src/kwsearch.c: Likewise.
10668 * src/main.c: Likewise.
10669 * src/search.h: Likewise.
10670 * src/searchutils.c: Likewise.
10671
106722010-04-02 Jim Meyering <meyering@redhat.com>
10673
10674 maint: use STREQ in place of strcmp
10675 perl -pi -e 's/\bstrcmp *\((.*?)\) == 0/STREQ ($1)/' src/main.c
10676 perl -pi -e 's/\bstrcmp *\((.*?)\) != 0/!STREQ ($1)/' src/main.c
10677
10678 * src/dfa.c (STREQ): Define.
10679 Use it instead of strcmp.
10680 * src/main.c (STREQ): Likewise.
10681 * cfg.mk (local-checks-to-skip): Remove sc_prohibit_strcmp,
10682 to enable the strcmp-prohibition.
10683
106842010-04-02 Jim Meyering <meyering@redhat.com>
10685
10686 maint: enable the useless_cpp_parens syntax check
10687 * cfg.mk (local-checks-to-skip): Remove sc_useless_cpp_parens.
10688 * src/main.c (devices, fillbuf, exit_on_match): Remove useless parens.
10689 (print_line_head, grepfile, set_limits, main): Likewise.
10690 * src/vms_fab.h: Likewise.
10691 * vms/config_vms.h: Likewise.
10692 * src/mbsupport.h: Likewise.
10693
10694 cleanup and improvement: parse command line arguments consistently
10695 * src/main.c: Include c-ctype.h, for this:
10696 (prepend_args): Use c_isspace, not ISSPACE.
10697 This is important so that we parse arguments consistently,
10698 and independently of the current locale.
10699 * bootstrap.conf (gnulib_modules): Add c-ctype.
10700 * src/system.h: Remove IS* definitions here, too.
10701 * src/dfasearch.c (WCHAR): Use isalnum, not ISALNUM.
10702 * src/kwsearch.c (WCHAR): Likewise.
10703 * src/searchutils.c (kwsinit): Use tolower, not TOLOWER.
10704
10705 cleanup: rely on gnulib's ctype.h functions; remove IS* macros and is_*
10706 * src/dfa.c (setbit_case_fold, prednames): Use official names.
10707 (IS_WORD_CONSTITUENT, lex): Likewise.
10708 (ISALNUM, ISALPHA, ISCNTRL, ISDIGIT, ISGRAPH): Remove definitions.
10709 (ISLOWER, ISPRINT, ISPUNCT, ISSPACE, ISUPPER, ISXDIGIT): Likewise.
10710 (is_alnum, is_alpha, is_blank, is_cntrl, is_digit, is_graph): Likewise.
10711 (is_lower, is_print, is_punct, is_space, is_upper, is_xdigit): Likewise.
10712 (isgraph): Likewise.
10713
10714 build: update gnulib submodule to latest, and adjust
10715 * src/main.c (parse_grep_colors): Adjust diagnostics not to trigger
10716 the sc_error_message_period and sc_error_message_uppercase
10717 syntax-check rules.
10718
10719 maint: remove all VMS-related code
10720 * configure.ac (AC_CONFIG_FILES): Remove vms/Makefile.
10721 * Makefile.am (SUBDIRS): Remove vms.
10722 * src/Makefile.am (EXTRA_DIST): Remove vms_fab.c and vms_fab.h.
10723 * src/vms_fab.c, src/vms_fab.h, vms/make.com: Remove files.
10724 * vms/Makefile.am, vms/README, vms/config_vms.h: Likewise.
10725
10726 post-release administrivia
10727 * NEWS: Add header line for next release.
10728 * .prev-version: Record previous version.
10729 * cfg.mk (old_NEWS_hash): Auto-update.
10730
10731 version 2.6.3
10732 * NEWS: Record release date.
10733
107342010-04-02 Jim Meyering <meyering@redhat.com>
10735
10736 grep: avoid used-undefined error with truncated multibyte input
10737 * src/dfa.c (addtok_wc): Don't use buf[0] (it's undefined) when
10738 wcrtomb returns <= 0.
10739
10740 MBS_SUPPORT-removal: * src/dfa.c (dfastate):
10741
107422010-04-01 Jim Meyering <meyering@redhat.com>
10743
10744 maint: avoid unnecessary 2nd getenv("TERM")
10745 * src/main.c (main): Don't call getenv("TERM") twice -- in the same
10746 expression, even.
10747
10748 tests: remove all unportable uses of echo
10749 * src/main.c: Use printf rather than echo -ne in a comment.
10750 * tests/fedora: Use printf (not echo) also in ok/fail functions.
10751 * cfg.mk (sc_prohibit_echo_minus_en): New rule, to prohibit
10752 any future introduction.
10753
10754 tests: add explicit requirement for en_US.UTF-8
10755 * tests/char-class-multibyte: Use require_en_utf8_locale_,
10756 rather than open-coding it.
10757 * tests/prefix-of-multibyte: Require the locale explicitly.
10758 * tests/fgrep-infloop: Likewise.
10759 This fixes test failures that would arise on systems without
10760 that particular locale. Reported by Ludovic Courtès.
10761
10762 tests: new function, to require an en_US UTF8 locale
10763 * tests/init.cfg (require_en_utf8_locale_): New function.
10764
10765 tests: use printf, not echo -n, echo -e, or any combination
10766 * tests/fedora: Using printf is more portable.
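A small illustration of why printf is the safer choice: depending on the shell, echo may treat a leading -n or -e as an option or expand backslash escapes in its arguments. The helper name say_ is made up for this sketch.

```shell
# say_: hypothetical helper; prints its arguments verbatim, then a newline.
# Because the text is passed as data to '%s', neither a leading "-n"
# nor embedded backslashes are interpreted.
say_() {
  printf '%s\n' "$*"
}
```

When escapes are wanted, put them in the format string instead, e.g. printf 'a\tb\n'.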
10767
10768 grep: remove unnecessary code
10769 * src/main.c (print_line_middle): Now that we use RE_ICASE
10770 (enabled in commit 70e23616, "dfa: rewrite handling of multibyte
10771 case_fold lexing"), this case-conversion code is useless and wasteful.
10772 Remove it.
10773
10774 doc: fix typo: s/AM_V_AT/AM_V_at/
10775 * doc/Makefile.am (egrep.1 fgrep.1): The former has case consistent
10776 with its sister variable, AM_V_GEN, but the latter is the one that
10777 actually works.
10778
10779 doc: generated files are best made read-only, ...
10780 ...to minimize risk of accidentally modifying the generated file
10781 rather than its template. These are tiny, so no risk, but it's
10782 good to be consistent, so generated files are easier to spot.
10783 * doc/Makefile.am (egrep.1 fgrep.1): When generating these files,
10784 ensure that they too are created read-only.
10785
10786 doc: generate grep.1 from template
10787 * doc/Makefile.am (grep.1): New rule.
10788 (CLEANFILES): Add grep.1 to the list.
10789 * .gitignore: Add /doc/grep.1
10790 * doc/grep.in.1: Replace hard-coded "2.5.1-cvs" with @VERSION@.
10791 Update copyright year list.
10792 Omit the line-splitting \(co directive so that update-copyright
10793 will perform future updates automatically.
10794 Egmont Koblinger reported the outdated version string
10795 and copyright year list in the man page:
10796 http://savannah.gnu.org/bugs/?29390
10797
10798 doc: prepare to generate grep.1
10799 * doc/grep.1: Rename to...
10800 * doc/grep.in.1: ...this.
10801
108022010-03-31 Eric Blake <eblake@redhat.com>
10803
10804 build: avoid another warning
10805 Noticed on cygwin:
10806 get-mb-cur-max.c: In function 'main':
10807 get-mb-cur-max.c:27: error: unused parameter 'argc' [-Wunused-parameter]
10808
10809 * tests/get-mb-cur-max.c (main): Use argc.
10810
108112010-03-31 Paolo Bonzini <bonzini@gnu.org>
10812
10813 tests: fix on systems with broken sh
10814 * tests/Makefile.am (TESTS_ENVIRONMENT): Adjust coreutils remnants.
10815 * tests/bre.sh: Invoke script with $SHELL if defined.
10816 * tests/ere.sh: Likewise.
10817 * tests/spencer1-locale: Likewise.
10818 * tests/spencer1.sh: Likewise.
10819
10820 tests: improve empty test
10821 * tests/empty: Add more tests, note expected failure.
10822
10823 tests: improve empty test with respect to locales
10824 * tests/empty: Add tests for multiple locales.
10825
10826 grep: fix grep -F against empty string
10827 * src/searchutils.c (is_mb_middle): Do not return true for empty matches
10828 when p == buf.
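For reference, an empty fixed-string pattern is expected to match every input line, including empty ones. A quick check, assuming some grep is on PATH (the function name is made up here):

```shell
# count_empty_pattern_matches_: hypothetical helper.
# Counts the stdin lines matched by the empty fixed string.
count_empty_pattern_matches_() {
  grep -F -c ''
}
```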
10829
10830 tests: rename empty.sh to empty
10831 * tests/empty.sh: Rename to...
10832 * tests/empty: ... this.
10833 * tests/Makefile.am (TESTS): Adjust.
10834
10835 tests: convert empty.sh to new style
10836 * tests/empty.sh: Convert to init.sh, add 10-second timeout.
10837
10838 tests: use get-mb-cur-max in char-class-multibyte
10839 * tests/char-class-multibyte: Use get-mb-cur-max to detect UTF-8 support.
10840 Rewrite previous locale detection code as a grep test.
10841
10842 tests: fix -Wformat failure
10843 * tests/get-mb-cur-max.c (main): Cast MB_CUR_MAX to int.
10844
108452010-03-30 Jim Meyering <meyering@redhat.com>
10846
10847 doc: add a "Reply-To" to the suggested announcement mail header
10848 * README-release: Add "Reply-To" with the list address,
10849 to minimize risk of replies to the other announcement recipients.
10850 Suggestion from Eric Blake.
10851
108522010-03-29 Jim Meyering <meyering@redhat.com>
10853
10854 build: avoid compiler warning when building test program
10855 * tests/Makefile.am (AM_CPPFLAGS, AM_CFLAGS, AM_LDFLAGS): Define,
10856 so that all the usual C compile-and-link machinery comes into play.
10857 * tests/get-mb-cur-max.c: Include "progname.h".
10858 Remove unnecessary inclusion of <ctype.h>.
10859 Mike Frysinger reported the "implicit decl of set_program_name" warning.
10860
10861 build: detect PCRE support also when <pcre/pcre.h> is the header
10862 * m4/pcre.m4: Also check for <pcre/pcre.h>.
10863 * src/pcresearch.c: Include <pcre/pcre.h>, if needed.
10864 Guard inclusions with HAVE_PCRE_H and HAVE_PCRE_PCRE_H, not HAVE_LIBPCRE.
10865 * NEWS (Bug fixes): Mention it.
10866 Dmitry V. Levin reported that PCRE support was not detected
10867 on systems with <pcre.h> not in the default include path.
10868
10869 post-release administrivia
10870 * NEWS: Add header line for next release.
10871 * .prev-version: Record previous version.
10872 * cfg.mk (old_NEWS_hash): Auto-update.
10873
10874 version 2.6.2
10875 * NEWS: Record release date.
10876
108772010-03-29 Eric Blake <eblake@redhat.com>
10878
10879 build: avoid warnings on cygwin
10880 * lib/savedir.c (isdir): Avoid shadowing a declaration.
10881 * src/main.c (get_nondigit_option): Cast away const to avoid
10882 compiler warning.
10883
10884 maint: ignore new test executable
10885 * .gitignore: Enhance.
10886
108872010-03-29 Jim Meyering <meyering@redhat.com>
10888
10889 doc: consolidate redundant-looking entries
10890 * NEWS: Consolidate the two --include/exclude-related entries.
10891 Suggested by Eric Blake.
10892
108932010-03-29 Paolo Bonzini <bonzini@gnu.org>
10894
10895 tests: use $(...) consistently
10896 * tests/backref.sh: Use `...' instead of ``...'' in comments.
10897 * tests/bre.awk: Use $(...) instead of `...`.
10898 * tests/ere.awk: Use $(...) instead of `...`.
10899 * tests/euc-mb: Use $(...) instead of `...`.
10900 * tests/fmbtest: Use $(...) instead of `...`.
10901 * tests/foad1: Use $(...) instead of `...`.
10902 * tests/pcre-z: Use $(...) instead of `...`. Quote output of grep.
10903 * tests/spencer1-locale.awk: Use $(...) instead of `...`.
10904 * tests/spencer1.awk: Use $(...) instead of `...`.
10905 * tests/yesno.sh: Use $(...) instead of `...`.
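One practical reason to prefer $(...) is that it nests without extra backslash quoting. A minimal sketch; parent_dir_name_ is an illustrative name only.

```shell
# parent_dir_name_ PATH: hypothetical helper; prints the name of the
# directory that contains PATH.
parent_dir_name_() {
  # With backticks, the inner substitution would need escaped backticks.
  basename "$(dirname "$1")"
}
```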
10906
109072010-03-29 Jim Meyering <meyering@redhat.com>
10908
10909 build: make doc/Makefile.am cleaner and more robust
10910 * doc/Makefile.am (egrep.1 fgrep.1): Generate robustly, i.e.,
10911 do not redirect directly to $@.
10912 Use $(AM_V_GEN).
10913 Do not distribute intermediate files like fgrep.man and egrep.man.
10914 Likewise, do not use them to generate their %.1 images.
10915 Instead, generate the .1 files directly.
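The idea of not redirecting straight into the target can be sketched in plain shell: write to a temporary name and rename only on success, so a failed generator never leaves a truncated target behind. The function name and the "-t" temporary suffix are assumptions for illustration, not the rule's actual text.

```shell
# generate_atomically_ TARGET COMMAND [ARG...]: hypothetical helper.
# Runs COMMAND with stdout going to a temporary file, then renames
# that file to TARGET only if COMMAND succeeded.
generate_atomically_() {
  target=$1
  shift
  tmp=$target-t
  rm -f "$tmp"
  if "$@" > "$tmp"; then
    mv -f "$tmp" "$target"
  else
    rm -f "$tmp"
    return 1
  fi
}
```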
10916
109172010-03-29 Paolo Bonzini <bonzini@gnu.org>
10918
10919 tests: add program to detect locales
10920 * tests/Makefile.am (check_PROGRAMS): Add get-mb-cur-max.
10921 * tests/get-mb-cur-max.c: New.
10922 * tests/euc-mb: Use it. Fail if the former detection test fails.
10923 * tests/sjis-mb: Use it. Fail if the former detection test fails. Expand
10924 comments.
10925
109262010-03-29 Paolo Bonzini <bonzini@gnu.org>
10927
10928 tests: add tests for SJIS character sets
10929 The attached test will be skipped unless (on a glibc system) you run
10930 something like
10931
10932 mkdir /usr/lib/locale/ja_JP.SHIFT_JIS
10933 zcat /usr/share/i18n/charmaps/SHIFT_JIS.gz | \
10934 localedef \
10935 -f - \
10936 -i /usr/share/i18n/locales/ja_JP \
10937 /usr/lib/locale/ja_JP.SHIFT_JIS
10938
10939 * tests/Makefile.am: Add sjis-mb.
10940 * tests/sjis-mb: New.
10941
109422010-03-29 Paolo Bonzini <bonzini@gnu.org>
10943
10944 grep -F: fix a bug with SJIS character sets
10945 Commit db9d6 would erroneously skip matches in SJIS character sets. In
10946 this character set low bytes (i.e. ASCII bytes) are also valid second
10947 bytes in a double-byte character, so you have to continue looking for
10948 a match, even if you match in the middle of a double-byte character.
10949
10950 * src/kwsearch.c: Ensure that beg is advanced by at least one byte,
10951 but do not fail immediately after matching in the middle of a double-byte
10952 character.
10953
109542010-03-28 Bruno Haible <bruno@clisp.org>
10955
10956 build: update after change in gnulib's lib-ignore module
10957 * src/Makefile.am (AM_LDFLAGS): Define. Use gnulib's new
10958 $(IGNORE_UNUSED_LIBRARIES_CFLAGS).
10959
109602010-03-28 Jim Meyering <meyering@redhat.com>
10961
10962 tests: disable new texinfo-acronym syntax-check from gnulib
10963 * cfg.mk (local-checks-to-skip): Add new sc_texinfo_acronym, to skip it.
10964
109652010-03-28 Norihiro Tanaka <noritnk@kcn.ne.jp>
10966
10967 tests: exercise fix for improper match of incomplete MB char prefix
10968 * tests/prefix-of-multibyte: New file.
10969 * tests/Makefile.am (TESTS): Add it.
10970
109712010-03-28 Jim Meyering <meyering@redhat.com>
10972
10973 grep -F: fix a multi-byte erroneous-match-in-middle bug
10974 Just as Perl prints nothing in this case,
10975 printf '\357\274\241\n' | perl -CIO -lne '/\357/ and print'
10976
10977 grep should also print nothing when used as follows.
10978 However, these would mistakenly match with grep prior to 2.6.2:
10979 printf '\357\274\241\n' | LC_ALL=en_US.UTF-8 src/grep -F $'\357'
10980 printf '\357\274\241\n' | LC_ALL=en_US.UTF-8 src/grep -F $'\357\274'
10981
10982 * src/searchutils.c (is_mb_middle): New parameter: the length of the
10983 match, in bytes, as determined by kwsexec. Use this to detect when
10984 the nominal match found by kwsexec must be skipped because it is for
10985 an incomplete multi-byte character that is a prefix of a character
10986 in the input.
10987 * src/dfasearch.c (EGexecute): Update caller.
10988 * src/kwsearch.c (Fexecute): Likewise.
10989 * src/search.h: Update prototype.
10990 * NEWS (Bug fixes): Mention it.
10991 Report and analysis by Norihiro Tanaka.
10992
109932010-03-28 Norihiro Tanaka <noritnk@kcn.ne.jp>
10994
10995 tests: add tests for the fgrep-infloop bug
10996 * tests/init.cfg (require_timeout_): New function.
10997 * tests/fgrep-infloop: New file. Test for the above fix.
10998 * tests/Makefile.am (TESTS): Add it.
10999
110002010-03-28 Jim Meyering <meyering@redhat.com>
11001
11002 grep -F: avoid infinite loop when searching for incomplete MB character
11003 Searching for an incomplete non-prefix of a multi-byte character
11004 should find no match.
11005
11006 Just as these print nothing,
11007 printf '\357\274\241\357\274\241\n' \
11008 | perl -CIO -ne '/\241\357/ and print'
11009 printf '\357\274\241\n' | perl -CIO -ne '/\274\241/ and print'
11010 printf '\357\274\241\n' | perl -CIO -ne '/\241/ and print'
11011 printf '\357\274\241\n' | perl -CIO -ne '/\274/ and print'
11012
11013 These should also print nothing, but with grep-2.6 and grep-2.6.1,
11014 they would infloop:
11015 printf '\357\274\241\n' | LC_ALL=en_US.UTF-8 src/grep -F $'\241'
11016 printf '\357\274\241\n' | LC_ALL=en_US.UTF-8 src/grep -F $'\274'
11017 printf '\357\274\241\n' | LC_ALL=en_US.UTF-8 src/grep -F $'\274\241'
11018
11019 * src/kwsearch.c (Fexecute): Don't infloop when searching for
11020 an incomplete non-prefix part of a multi-byte character.
11021 * NEWS (Bug fixes): Mention it.
11022 Reported and diagnosed by Norihiro Tanaka.
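The three bytes in these examples, \357 \274 \241, are the UTF-8 encoding of a single character (U+FF21). The sketch below labels each input byte by its role, which shows why \241, \274 and \274\241 are non-prefix fragments: they consist solely of continuation bytes. The function name and the simplified classification (every byte >= 192 is treated as a lead byte) are assumptions for illustration.

```shell
# utf8_byte_roles_: hypothetical helper.
# Reads bytes on stdin and prints one word per byte:
#   ascii (0-127), cont (128-191) or lead (192-255).
utf8_byte_roles_() {
  roles=
  for b in $(od -An -v -tu1); do
    if [ "$b" -lt 128 ]; then
      role=ascii
    elif [ "$b" -lt 192 ]; then
      role=cont
    else
      role=lead
    fi
    roles="${roles:+$roles }$role"
  done
  printf '%s\n' "$roles"
}
```

For example, piping printf '\357\274\241\n' through it labels the first byte as a lead byte, the next two as continuation bytes, and the newline as ASCII.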
11023
110242010-03-28 Jim Meyering <meyering@redhat.com>
11025
11026 tests: rename: fmbtest.sh -> fmbtest
11027 * tests/fmbtest.sh: Rename to ...
11028 * tests/fmbtest: ...this, dropping the .sh suffix.
11029 * tests/Makefile.am (TESTS): Reflect renaming.
11030
11031 tests: convert fmbtest.sh to use init.sh
11032 * tests/fmbtest.sh: Use init.sh and adapt accordingly:
11033 Use "grep", not ${GREP}. Use Exit, not exit.
11034
11035 tests: also exercise the --include + glob path
11036 * tests/include-exclude: Exercise Javier's fix.
11037
110382010-03-28 Javier Villavicencio <the_paya@gentoo.org>
11039
11040 grep -r: fix --include with globs, too
11041 The previous fix addressed only the non-glob case.
11042 * src/main.c (main): Use add_exclude's EXCLUDE_WILDCARDS option,
11043 to enable the use of fnmatch with --include=GLOB.
11044 gnulib: Update to latest, for the fixed exclude.c.
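fnmatch and the shell's case patterns follow the same POSIX pattern-matching notation, so the effect of --include=GLOB on a file name can be approximated in shell. This is a rough analogue, not grep's implementation; it assumes the glob is applied to the file's base name, and included_ is a made-up name.

```shell
# included_ PATH GLOB: hypothetical helper.
# Succeeds if the base name of PATH matches GLOB.
included_() {
  case ${1##*/} in
    $2) return 0 ;;
    *) return 1 ;;
  esac
}
```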
11045
110462010-03-28 Jim Meyering <meyering@redhat.com>
11047
11048 grep -r: fix --include with non-globs
11049 * lib/savedir.c (savedir): Fix logic error. Introduced by commit
11050 bf3bd92c, "build: adapt to the newer exclude API we now get from gnulib"
11051 * tests/include-exclude: Test for this bug by exercising --include, too.
11052 * NEWS (Bug fixes): Mention it.
11053 Reported by Philipp Kohlbecher in http://savannah.gnu.org/bugs/?29358
11054
110552010-03-27 Jim Meyering <meyering@redhat.com>
11056
11057 kwset: correct comments; require non-NULL kwsmatch argument
11058 * src/kwset.c (kwsexec): Correct comments. This function has been
11059 returning an offset, not a pointer, for 9 years.
11060 Do not test for kwsmatch == NULL. All callers pass non-NULL.
11061 (cwexec): Likewise.
11062 * src/kwset.h (kwsexec): Mark the 4th parameter, kwsmatch, as non-NULL.
11063 Include "arg-nonnull.h".
11064
11065 build: add -I$(top_builddir)/lib so we also find generated .h files
11066 * src/Makefile.am (AM_CPPFLAGS): Rename from INCLUDES to avoid
11067 warning from automake -Wall.
11068 Add -I$(top_builddir)/lib, so we find generated .h files like
11069 getopt.h in a non-srcdir build.
11070
11071 build: remove superfluous LOCALEDIR definition
11072 * src/Makefile.am (INCLUDES): Remove unnecessary definition of
11073 LOCALEDIR here. Now, it's defined via gnulib's configmake.h.
11074 * src/system.h: Include "configmake.h" for its LOCALEDIR definition.
11075
11076 grep: don't segfault upon use of --include or --exclude* options
11077 * lib/savedir.c (isdir1): Fix fatal typo: deref "dir" argument,
11078 not the global (initially-NULL) "path". Reported by Standish Parsley.
11079 * tests/include-exclude: New file.
11080 * tests/Makefile.am (TESTS): Add it.
11081 * NEWS (Bug fixes): Mention it.
11082
110832010-03-26 Jim Meyering <meyering@redhat.com>
11084
11085 tests: rename: foad1.sh -> foad1
11086 * tests/foad1.sh: Rename to ...
11087 * tests/foad1: ...this, dropping the .sh suffix.
11088 * tests/Makefile.am (TESTS): Reflect renaming.
11089
11090 tests: convert foad1.sh to use init.sh
11091 This fixes a spurious test failure when "make check" is run with
11092 certain envvars set, e.g., "make check GREP_COLOR=always"
11093 * tests/foad1.sh: Use init.sh and adapt accordingly:
11094 Use "grep", not ${GREP}. Test VERBOSE against "yes", not "1",
11095 to be consistent with init.sh.
11096 Use Exit, not exit.
11097 Reported by Nelson H. F. Beebe.
11098
11099 tests: insulate tests from envvar settings
11100 * tests/init.cfg (vars_): Unset each envvar that can affect how
11101 grep works. This protects only those tests that have been
11102 converted to use init.sh.
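A sketch of the insulation step. The variable list here is an assumption (three environment variables that grep documents); the actual contents of vars_ in tests/init.cfg may differ, and scrub_env_ is a made-up name.

```shell
# Assumed list of grep-affecting variables; illustrative only.
vars_='GREP_COLOR GREP_COLORS GREP_OPTIONS'

# scrub_env_: hypothetical helper; unsets each listed variable so that
# settings inherited from the caller cannot change test results.
scrub_env_() {
  for v in $vars_; do
    unset "$v"
  done
}
```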
11103
111042010-03-25 Eric Blake <eblake@redhat.com>
11105
11106 maint: ignore 'make dist pdf' droppings
11107 * .gitignore: Add more exemptions.
11108
111092010-03-25 Jim Meyering <meyering@redhat.com>
11110
11111 tests: avoid spurious test failure due to lack of a French UTF8 locale
11112 * tests/init.cfg: New file. If either $LOCALE_FR or $LOCALE_FR_UTF8
11113 is set to "none", reset it to the empty string.
11114 Reported by Mike Frysinger and Sven Joachim.
11115 * tests/Makefile.am (EXTRA_DIST): Add init.cfg.
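The reset described for tests/init.cfg might look roughly like this. The function name is invented for the sketch, and the real file may simply run the equivalent statements at top level.

```shell
# reset_none_locales_: hypothetical helper.
# If LOCALE_FR or LOCALE_FR_UTF8 is the literal string "none",
# replace it with the empty string; other values are left alone.
reset_none_locales_() {
  for v in LOCALE_FR LOCALE_FR_UTF8; do
    eval "val=\${$v-}"
    if test "$val" = none; then
      eval "$v="
    fi
  done
}
```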
11116
11117 build: do not use pkg-config to test for PCRE support
11118 * configure.ac: Do not use PKG_PROG_PKG_CONFIG or PKG_CHECK_MODULES.
11119 Do not modify CPPFLAGS; that belongs to those who invoke make.
11120 Instead, use autoconf's AC_CHECK_HEADERS and AC_SEARCH_LIBS via the
11121 new macro, gl_FUNC_PCRE, defined in...
11122 * m4/pcre.m4 (gl_FUNC_PCRE): New macro, to handle pcre-related
11123 configure-time tests.
11124 * src/Makefile.am (grep_LDADD): Use LIB_PCRE, not PCRE_LIBS.
11125 * src/pcresearch.c: Test HAVE_LIBPCRE via "#if", not "#ifdef".
11126 All other cpp tests of this symbol used "#if".
11127 Prompted by a suggestion from Bruno Haible.
11128 * NEWS (Build-related): Mention this.
11129
11130 doc: correct and amend NEWS entries for 2.6.1
11131 * NEWS (Bug fixes): Correct character ranges bug description.
11132 Add an example from Dmitry V. Levin.
11133 Add that the word-with-backref bug was introduced in 2.5.1.
11134 * cfg.mk (old_NEWS_hash): Update to match.
11135
11136 post-release administrivia
11137 * NEWS: Add header line for next release.
11138 * .prev-version: Record previous version.
11139 * cfg.mk (old_NEWS_hash): Auto-update.
11140
11141 version 2.6.1
11142 * NEWS: Record release date.
11143
111442010-03-25 Tony Abou-Assaleh <taa@acm.org>
11145
11146 tests: use awk's -v option more portably
11147 * tests/spencer1-locale: Add a space between awk's "-v" option and
11148 the following VAR=value string, to avoid test failure on Mac OS X.
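A minimal illustration of the portable spelling; greet_ is a made-up function name.

```shell
# greet_ NAME: hypothetical helper.
greet_() {
  # Portable form: "-v" and "name=value" are separate arguments.
  # Joining them into one word (-vname=value) is the form that the
  # entry above reports as failing on Mac OS X.
  awk -v name="$1" 'BEGIN { print "hello, " name }'
}
```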
11149
111502010-03-25 Norihiro Tanaka <noritnk@kcn.ne.jp>
11151
11152 dfa/grep: fix compilation with MBS_SUPPORT
11153 * src/dfa.c (cur_mb_len): Initialize to 1 and always make it available.
11154 (setbit_case_fold): Do not use wint_t in prototype if !MBS_SUPPORT.
11155 (parse_bracket_exp): Fix compilation with !MBS_SUPPORT.
11156 * src/kwsearch.c (kwsinit): Do not use mbtolower and MB_CUR_MAX
11157 if !MBS_SUPPORT.
11158 * src/searchutils.c (kwsinit): Do not refer to MB_CUR_MAX if !MBS_SUPPORT.
11159
11160 * tests/char-class-multibyte: Skip if UTF-8 matching does not work.
11161 * tests/fmbtest.sh: Likewise.
11162
111632010-03-25 Jim Meyering <meyering@redhat.com>
11164
11165 build: avoid warnings about unnecessary use of "return"
11166 * src/grep.c (Gcompile, Ecompile, Acompile): Do not "return X"
11167 from a function returning void, not even when X itself is a
11168 function returning void. This avoids warnings from Sun Studio 11
11169 reported by Dagobert Michelsen.
11170 * src/egrep.c (Ecompile): Likewise.
11171
111722010-03-25 Norihiro Tanaka <noritnk@kcn.ne.jp>
11173
11174 grep: fix printing when -w is used and regex is needed for matching
11175 * NEWS: Document bugfix.
11176 * src/dfasearch.c (EGexecute): After assess_pattern_match, len is either
11177 invalid or end-beg; jump to success.
11178 * tests/Makefile.am (TESTS): Add new test.
11179 * tests/backref-word: New.
11180
111812010-03-25 Paolo Bonzini <bonzini@gnu.org>
11182
11183 dfa: fix single byte character ranges
11184 * src/dfa.c (in_coll_range): Fix ordering for second strcoll. Reported
11185 by Dmitry V. Levin.
11186 * tests/spencer1-locale.awk: Also test single-byte character sets.
11187 * NEWS: Add a note about this bugfix.
11188 * THANKS: Add Dmitry.
11189
111902010-03-25 Norihiro Tanaka <noritnk@kcn.ne.jp>
11191
11192 grep: reset state after truncated or invalid multibyte sequences
11193 * src/searchutils.c (is_mb_middle): When treating an invalid sequence
11194 or a truncated multibyte character as a single byte character, reset
11195 mbstate.
11196
11197 grep: do lowercase conversion in print_line_middle only for single-byte case
11198 * src/main.c (print_line_middle): Restrict match_icase code
11199 to MB_CUR_MAX == 1. Adjust comments.
11200
112012010-03-25 Jim Meyering <meyering@redhat.com>
11202
11203 tests: provide framework_failure_ function
11204 The shell function "framework_failure" was called in the unusual
11205 event that some fundamental test set-up operation would fail.
11206 However it was not defined. Define it, but with a trailing underscore
11207 to impinge less on the test writer's name space. Adjust all uses.
11208 * tests/init.sh (framework_failure_): New function.
11209 * tests/case-fold-backref: s/framework_failure/framework_failure_/
11210 * tests/case-fold-char-class: Likewise.
11211 * tests/case-fold-char-range: Likewise.
11212 * tests/case-fold-char-type: Likewise.
11213 * tests/char-class-multibyte: Likewise.
11214 * tests/dfaexec-multibyte: Likewise.
11215 * tests/max-count-vs-context: Likewise.
11216 * tests/word-multi-file: Likewise.
11217
112182010-03-24 Jim Meyering <meyering@redhat.com>
11219
11220 doc: tweak THANKS
11221 * THANKS: Update Arnold's name and address, per request.
11222
11223 portability: use gnulib's lseek wrapper
11224 * bootstrap.conf (gnulib_modules): Use gnulib's lseek wrapper,
11225 for improved portability. lseek does not fail with ESPIPE on
11226 pipes on some systems.
11227
11228 build: avoid link failure on Solaris 8
11229 * bootstrap.conf (gnulib_modules): Add wctob.
11230 * NEWS (Portability): Mention this.
11231 Reported by Dagobert Michelsen in <http://sv.gnu.org/bugs/?29325>.
11232
112332010-03-24 Petr Písař <petr.pisar@atlas.cz>
11234
11235 doc: translate new --help message
11236 * src/main.c: Translate "after_options".
11237
112382010-03-24 Jim Meyering <meyering@redhat.com>
11239
11240 doc: NEWS make it clear that the bug was introduced in 2.6
11241 * NEWS: Clarify.
11242
112432010-03-24 Paolo Bonzini <bonzini@gnu.org>
11244
11245 tests: fix char-class-multibyte
11246 * tests/char-class-multibyte: Make it pass.
11247
112482010-03-23 Jim Meyering <meyering@redhat.com>
11249
11250 build: avoid compilation failure when MBS_SUPPORT not defined
11251 * src/dfa.c (setbit_case_fold) [!MBS_SUPPORT]: Fix curly brace mismatch.
11252
112532010-03-23 Paolo Bonzini <bonzini@gnu.org>
11254
11255 dfa: fix sigsegv on multibyte character classes
11256 Reported by Jaroslav Škarvada <jskarvad@redhat.com>. This is
11257 unfortunate. grep needs an automatic testcase generator.
11258
11259 * NEWS: Document bug.
11260 * THANKS: Mention reporter.
11261 * src/dfa.c (setbit_case_fold): Change type of first argument for
11262 self-documentation.
11263 (parse_bracket_exp): Fix call.
11264 * tests/Makefile.am: Add new testcase.
11265 * tests/char-class-multibyte: New testcase.
11266
112672010-03-23 Jim Meyering <meyering@redhat.com>
11268
11269 post-release administrivia
11270 * NEWS: Add header line for next release.
11271 * .prev-version: Record previous version.
11272 * cfg.mk (old_NEWS_hash): Auto-update.
11273
11274 version 2.6
11275 * NEWS: Record release date.
11276
11277 build: avoid warnings: tell gcc and clang that dfaerror never returns
11278 * src/dfa.h (__attribute__): Define.
11279 (dfaerror): Declare with the "noreturn" attribute.
11280 * src/dfasearch.c (dfaerror): Add an unreachable use of abort.
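A minimal sketch of the idea in the entry above, not the code from src/dfa.h or src/dfasearch.c. The names DEMO_NORETURN, demo_report, demo_fatal and checked_divide are hypothetical. The reporting routine is one the compiler cannot prove never returns, so the trailing abort call keeps the "noreturn" promise honest.

```c
#include <stdio.h>
#include <stdlib.h>

#if defined __GNUC__ || defined __clang__
# define DEMO_NORETURN __attribute__ ((noreturn))
#else
# define DEMO_NORETURN
#endif

/* Hypothetical stand-in for a reporting routine that exits only
   when STATUS is nonzero, so it is not itself "noreturn".  */
static void
demo_report (int status, const char *msg)
{
  fprintf (stderr, "error: %s\n", msg);
  if (status != 0)
    exit (status);
}

/* Hypothetical stand-in for dfaerror: declared as never returning.  */
static void demo_fatal (const char *msg) DEMO_NORETURN;

static void
demo_fatal (const char *msg)
{
  demo_report (EXIT_FAILURE, msg);
  abort ();   /* Not reached in practice; satisfies the attribute.  */
}

/* With the attribute, the compiler knows the division below is
   never reached when DEN is zero.  */
static int
checked_divide (int num, int den)
{
  if (den == 0)
    demo_fatal ("division by zero");
  return num / den;
}
```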
11281
112822010-03-22 Eric Blake <eblake@redhat.com>
11283
11284 build: fix cygwin build
11285 Portions of gnulib depend on -lintl, and cygwin does not allow
11286 lazy linking.
11287
11288 * src/Makefile.am (LDADD): Include libraries in correct order.
11289
112902010-03-22 Paolo Bonzini <bonzini@gnu.org>
11291
11292 grep: remove --mmap
11293 mmap is a bad idea for sequentially accessed files because it causes
11294 a page fault for every page read. Just consider it a failed experiment,
11295 and ignore --mmap while accepting it for backwards compatibility.
11296
11297 * configure.ac (AC_FUNC_MMAP): Remove.
11298 * doc/grep.texi (Other options): Say --mmap is ignored.
11299 * src/grep.c (mmap_option): Remove.
11300 (long_options): Do not reference it.
11301 (bufmapped, initial_bufoffset): Remove.
11302 (reset, fillbuf): Remove HAVE_MMAP code.
11303 (grepfile): Remove bufmapped reference.
11304 (usage): Say --mmap is ignored.
11305
113062010-03-22 Paolo Bonzini <bonzini@gnu.org>
11307
11308 grep: rename files for intuitiveness
11309 * Makefile.am (libgrep_a_SOURCES, grep_SOURCES, egrep_SOURCES,
11310 fgrep_SOURCES): Adjust.
11311 * grep.c: Rename to main.c.
11312 * esearch.c: Rename to egrep.c.
11313 * fsearch.c: Rename to fgrep.c.
11314 * gsearch.c: Rename to grep.c.
11315
11316 grep: kill GREP_PROGRAM/EGREP_PROGRAM/FGREP_PROGRAM
11317 * NEWS: Document slight semantic change.
11318 * TODO: #ifdefs are gone.
11319 * po/POTFILES.in: Update.
11320 * src/Makefile.am (grep_SOURCES, egrep_SOURCES, fgrep_SOURCES): Remove
11321 grep.c/egrep.c/fgrep.c.
11322 (noinst_LIBRARIES): Change libsearch.a to libgrep.a.
11323 (libsearch_a_SOURCES): Rename to libgrep_a_SOURCES, add grep.c
11324 (LDADD): Change libsearch.a to libgrep.a.
11325 * src/esearch.c: Add before_options and after_options.
11326 * src/fsearch.c: Likewise.
11327 * src/gsearch.c: Likewise.
11328 * src/grep.c (short_options, long_options): Remove GREP_PROGRAM
11329 special-casing.
11330 (usage): Use before_options and after_options, look at matchers.
11331 (setmatcher): Merge with install_matcher.
11332 (main): Call setmatcher (NULL) instead of install_matcher.
11333 * src/grep.h (GREP_PROGRAM): Remove.
11334 (before_options, after_options): Add.
11335
11336 thank Eric Blake
11337 * THANKS: Add Eric Blake, who reported the warning fixed by 774d0ee.
11338
11339 grep: libify *search.c
11340 * src/Makefile.am (libsearch_a_SOURCES): Add dfasearch.c, kwsearch.c,
11341 pcresearch.c.
11342 * src/esearch.c, src/fsearch.c, src/gsearch.c: Only include search.h.
11343 * src/dfasearch.c (GEAcompile, EGexecute): Export.
11344 * src/kwsearch.c (Fcompile, Fexecute): Export.
11345 * src/pcresearch.c (Pcompile, Pexecute): Export.
11346 * src/search.h: Add new exported functions.
11347
11348 grep: prepare for libification of *search.c
11349 * src/dfasearch.c (Ecompile): Remove.
11350 * src/esearch.c: Place it here...
11351 * src/gsearch.c: ... and here.
11352
11353 grep: split search.c
11354 * po/POTFILES.in: Update.
11355 * src/Makefile.am (grep_SOURCES, egrep_SOURCES, fgrep_SOURCES): Move
11356 kwset.c and dfa.c to libsearch.a. Add searchutils.c there too.
11357 * src/search.h, src/dfasearch.c, src/pcresearch.c, src/kwsearch.c,
11358 src/searchutils.c: New files, split out of src/search.c.
11359 * src/esearch.c, src/fsearch.c: Include the new files instead of search.c.
11360 * src/gsearch.c: Likewise, plus move Gcompile/Acompile here.
11361
11362 grep: remove one #ifdef
11363 * search.c (GEAcompile) [EGREP_PROGRAM]: Use common code. Inline IF_BK.
11364
113652010-03-22 Paolo Bonzini <bonzini@gnu.org>
11366
11367 grep: eliminate {COMPILE,EXECUTE}_{RET,ARGS,FCT}
11368 Modern compilers warn about type mismatches.
11369
11370 * src/grep.c (do_execute): Write full declaration.
11371 * src/grep.h (COMPILE_RET, COMPILE_ARGS, COMPILE_FCT, EXECUTE_RET,
11372 EXECUTE_ARGS, EXECUTE_FCT): Remove.
11373 (compile_fp_t, execute_fp_t): Write full declaration.
11374 * src/search.c (GEAcompile, Gcompile, Acompile, Ecompile, EGexecute,
11375 Fcompile, Fexecute, Pcompile, Pexecute): Write full declaration.
11376
113772010-03-22 Paolo Bonzini <bonzini@gnu.org>
11378
11379 grep: make egrep/fgrep use struct matcher
11380 * Makefile.am (grep_SOURCES): Add gsearch.c.
11381 (EXTRA_DIST): Add search.c.
11382 * esearch.c (matchers): New.
11383 * fsearch.c (matchers): New.
11384 * gsearch.c: New.
11385 * search.c (matchers): Remove.
11386 * grep.c: Always compile most !GREP_PROGRAM sections.
11387 (main): Use first matcher if none is explicitly provided. Remove
11388 "default" matcher.
11389 * grep.h (struct matcher): Adjust comments.
11390
11391 grep: change struct matcher termination
11392 * src/grep.c (setmatcher): Look for NULL matchers[i].name.
11393 * src/grep.h (struct matcher): Change name to pointer. Adjust comments.
11394 * src/search.c (matchers): Terminate with three NULLs.
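A sketch of a name table terminated by a NULL name, as the entry above describes. The struct here is a simplified, hypothetical one (demo_matcher with an integer id). grep's struct matcher has more members, which is presumably why the real table ends with three NULLs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified table entry.  */
struct demo_matcher
{
  const char *name;   /* NULL marks the end of the table.  */
  int id;
};

static const struct demo_matcher demo_matchers[] = {
  { "grep", 1 },
  { "egrep", 2 },
  { "fgrep", 3 },
  { NULL, 0 }
};

/* Return the entry named NAME, or NULL once the sentinel is reached.  */
static const struct demo_matcher *
find_matcher (const char *name)
{
  for (size_t i = 0; demo_matchers[i].name != NULL; i++)
    if (strcmp (demo_matchers[i].name, name) == 0)
      return &demo_matchers[i];
  return NULL;
}
```

Because the loop stops at the sentinel, adding a matcher only requires a new row above the final entry.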
11395
11396 grep: remove one #ifdef
11397 * search.c (Ecompile): Always go through GEAcompile to use same code path
11398 for both grep and egrep.
11399
11400 grep: remove getpagesize.h
11401 * src/getpagesize.h: Remove.
11402 * src/Makefile.am (noinst_HEADERS): Remove getpagesize.h.
11403
114042010-03-21 Jim Meyering <meyering@redhat.com>
11405
11406 build: use the fcntl-h module, not "fcntl"
11407 * bootstrap.conf (gnulib_modules): We might need fcntl.h somewhere,
11408 but don't use the fcntl function. Reported by Bruno Haible.
11409
11410 build: avoid link failure on systems using gnulib's fcntl but not open
11411 * bootstrap.conf (gnulib_modules): Using gnulib's fcntl module
11412 and including <fcntl.h>, but not also using gnulib's "open" module
11413 would result in link failure due to references to rpl_open
11414 on systems requiring the replacement (e.g., Cygwin and Darwin).
11415
11416 build: avoid compilation failure on systems using rpl_open
11417 This new build failure has arisen as a result of using gnulib's
11418 "fcntl" module. Now that an inadequate "open" syscall is replaced
11419 by gnulib's wrapper, it is essential to include <fcntl.h>.
11420 * src/grep.c: Include <fcntl.h>.
11421 This is required, for grepfile's use of open, at least on
11422 Cygwin and Darwin.
11423
11424 maint: use gnulib's fcntl module, just in case
11425 * bootstrap.conf (gnulib_modules): Add fcntl.
11426 Grep uses at least O_BINARY, which may be defined therein.
11427
11428 maint: remove TYPE_* definitions from src/system.h
11429 * src/system.h (TYPE_MAXIMUM, TYPE_MINIMUM, TYPE_SIGNED): Remove
11430 definitions. They are provided by intprops.h.
11431 * src/grep.c: Include "intprops.h"
11432 * bootstrap.conf (gnulib_modules): Add intprops.
11433
11434 maint: alphabetize #include directives
11435 * src/grep.c: Alphabetize #include directives.
11436
114372010-03-20 Jim Meyering <meyering@redhat.com>
11438
11439 build: stop using gnulib's memmove module
11440 * bootstrap.conf (gnulib_modules): Remove obsolete module: memmove
11441
11442 build: reinstate gnulib's fcntl-h-tests
11443 * bootstrap.conf (gnulib_tool_option_extras): Do not avoid
11444 the fcntl-h-tests. I cannot reproduce the failure.
11445
114462010-03-20 Eric Blake <eblake@redhat.com>
11447
11448 build: allow compilation on cygwin
11449 Gnulib is incompatible with -Wunused-macros. Additionally,
11450 cygwin 1.7.1 coupled with --enable-gcc-warnings tripped on:
11451
11452 grep.c: In function 'print_line_middle':
11453 grep.c:805: error: array subscript has type 'char' [-Wchar-subscripts]
11454 grep.c: In function 'main':
11455 grep.c:1833: error: 'optarg' redeclared without dllimport attribute: previous dllimport ignored [-Wattributes]
11456 grep.c:1834: error: 'optind' redeclared without dllimport attribute after being referenced with dll linkage
11457
11458 * configure.ac (GNULIB_WARN_FLAGS): Disable -Wunused-macros.
11459 * src/grep.c (print_line_middle): Use correct type to tolower.
11460 (main): Drop useless redeclarations.
11461 * .gitignore: Ignore more built files.
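The -Wchar-subscripts error quoted above typically arises when a plain char, which may be signed, is passed to a ctype function implemented as a table lookup. A minimal sketch of the usual remedy, assuming a hypothetical helper named lowercase_bytes (this is not the print_line_middle code):

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: lower-case LEN bytes of S in place.  */
static void
lowercase_bytes (char *s, size_t len)
{
  for (size_t i = 0; i < len; i++)
    {
      /* Convert to unsigned char first, so the argument stays in
         the range for which tolower is defined.  */
      unsigned char uc = (unsigned char) s[i];
      s[i] = (char) tolower (uc);
    }
}
```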
11462
114632010-03-20 Jim Meyering <meyering@redhat.com>
11464
11465 tests: ensure that all programs handle [b-a] consistently
11466 * tests/reversed-range-endpoints: New test.
11467 * tests/Makefile.am (TESTS): Add it.
11468
114692010-03-20 Jim Meyering <meyering@redhat.com>
11470
11471 build: update gnulib submodule to latest
11472 This pulls in the latest regex module from gnulib, including a fix
11473 to make it honor the RE_NO_EMPTY_RANGES syntax bit.
11474
11475 tests: temporarily disable irrelevant-to-grep failing C++ fcntl-h-tests
11476 * bootstrap.conf (gnulib_tool_option_extras): Temporarily add
11477 --avoid=fcntl-h-tests, until the C++ part of that test is fixed.
11478
114792010-03-20 Jim Meyering <meyering@redhat.com>
11480
11481 reject reversed-endpoint ranges, with all regex variants
11482 * src/search.c: Add RE_NO_EMPTY_RANGES to the syntax bits
11483 in three places, so that all of grep, egrep, and grep -E reject
11484 a range with reversed endpoints like '[b-a]'. This is required,
11485 when using the latest version of gnulib's regex module, since it
11486 now honors the RE_NO_EMPTY_RANGES flag, rather than acting as if
11487 it were always set.
11488 Based on a change by Matthew Burgess.
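A rough illustration of the behavior from the caller's side, using POSIX regcomp rather than the re_set_syntax-based setup in src/search.c. The helper compile_status is hypothetical. Whether a reversed range is rejected depends on the regex implementation; glibc's regcomp is expected to report an error for '[b-a]'.

```c
#include <regex.h>

/* Return 0 when PATTERN compiles as a POSIX extended regexp,
   otherwise the nonzero regcomp error code.  */
static int
compile_status (const char *pattern)
{
  regex_t re;
  int rc = regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB);
  if (rc == 0)
    regfree (&re);
  return rc;
}
```

On such an implementation, compile_status ("[a-b]") returns 0 and compile_status ("[b-a]") returns a nonzero code.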
11489
114902010-03-19 Jim Meyering <meyering@redhat.com>
11491
11492 maint: correct macro parameter parentheses
11493 * src/dfa.c (FETCH_WC, FETCH): Parenthesize macro parameters.
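Why parenthesizing macro parameters matters, shown with two hypothetical macros (DOUBLE_BAD and DOUBLE_GOOD are illustrations, not the FETCH or FETCH_WC macros):

```c
/* Without parentheses, operator precedence applies to the
   expanded text, not to the argument as a unit.  */
#define DOUBLE_BAD(x)  (x * 2)     /* (1 + 2 * 2) for the call below */
#define DOUBLE_GOOD(x) ((x) * 2)   /* ((1 + 2) * 2) for the call below */

static int
bad_result (void)
{
  return DOUBLE_BAD (1 + 2);
}

static int
good_result (void)
{
  return DOUBLE_GOOD (1 + 2);
}
```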
11494
114952010-03-19 Paolo Bonzini <bonzini@gnu.org>
11496
11497 tests: change help-version to per-program functions
11498 * help-version: Change each *_args variable to a *_setup function.
11499
11500 dfa: fix wchar_t/wint_t type mismatch
11501 * src/dfa.c (FETCH_WC): Pass a local wchar_t variable to mbrtowc.
11502 (FETCH): Rename temporary second argument to FETCH_WC.
11503 (parse_bracket_exp): Always use FETCH_WC.
11504
115052010-03-19 Jim Meyering <meyering@redhat.com>
11506
11507 doc: add README-prereq, referenced from README-hacking
11508 * README-prereq: New file. Cloned from coreutils, s/coreutils/grep/
11509 Reported by Tony Abou-Assaleh.
11510
115112010-03-19 Arnold Robbins <arnold@skeeve.com>
11512
11513 maint: sync dfa comments from gawk
11514 * src/dfa.h (struct dfa) [newlines]: Amend comment.
11515 * src/dfa.c: Update copyright year list to include gawk's.
11516
115172010-03-17 Jim Meyering <meyering@redhat.com>
11518
11519 maint: remove obsolete "cvs-clean" make target
11520 * Makefile.am (cvs-clean): Remove obsolete target.
11521
115222010-03-17 Paolo Bonzini <bonzini@gnu.org>
11523
11524 dfa: initialize struct mbcset using memset
11525 * src/dfa.c (parse_bracket_exp): Use memset to initialize workmbc.
11526
11527 dfa: spell out "unsigned int"
11528 * dfa.c (setbit, tstbit, clrbit, setbit_case_fold, lex, dfaoptimize,
11529 free_mbdata): Put "int" after unsigned.
11530 * dfa.h (struct position, struct dfa): Likewise.
11531
115322010-03-17 Paolo Bonzini <bonzini@gnu.org>
11533
11534 dfa: optimize simple character sets under UTF-8 charsets
11535 Only use a bitset when possible without involving MBCSET. Testcase:
11536 yes 'the quick brown fox jumps over the lazy dog' | sed 100000q | \
11537 time grep -c [ABCDEFGHIJKLMNOPQRSTUVWXYZ,]
11538
11539 Before: 51ms (best of three runs); after: 16ms (best of three runs).
11540
11541 * src/dfa.c (parse_bracket_exp): For simple bracket expressions
11542 under UTF-8, use a CSET.
11543
115442010-03-17 Paolo Bonzini <bonzini@gnu.org>
11545
11546 dfa: speed up handling of brackets
11547 This patch has two sides. One is to fold the parsing of brackets in the
11548 single- and multi-byte cases. The second is to leverage this change,
11549 and use a bitset to test for single-byte characters in the charset.
11550 Splitting the two would be very hard.
11551
11552 Testcase:
11553 yes 'the quick brown fox jumps over the lazy dog' | sed 100000q | \
11554 time grep -c [ABCDEFGHIJKLMNOPQRSTUVWXYZ,]
11555
11556 Before: 59ms (best of three runs); after: 51ms (best of three runs).
11557 Nice, but mostly providing infrastructure for the next patch.
11558
11559 * src/dfa.c (setbit_case_fold): Try applying towlower/towupper.
11560 (looking_at): Remove.
11561 (FETCH_WC): New.
11562 (fetch_wc): Merge into FETCH_WC [MBS_SUPPORT].
11563 (FETCH) [MBS_SUPPORT]: Call FETCH_WC.
11564 (prednames, find_pred, is_blank and other predicates): Move above,
11565 remove K&R syntax support.
11566 (parse_bracket_exp): New name of parse_bracket_exp_mb, rewritten to
11567 include single-byte character set parsing of brackets.
11568 (lex): Adjust for fetch_wc->FETCH_WC change, remove single-byte
11569 character set parsing of brackets.
11570 (match_mb_charset): Test against work_mbc->cset.
11571 * src/dfa.h (struct mb_char_classes): Add cset.
11572
115732010-03-17 Paolo Bonzini <bonzini@gnu.org>
11574
11575 syntax-check: remove space-tab exception
11576 * .x-sc_space_tab: Remove.
11577 * src/dfa.c: Fix space-tab occurrence.
11578
11579 THANKS: fix Jim Meyering's email address
11580 * THANKS: Jim is now with Red Hat.
11581
11582 dfa: add missing function
11583 * src/dfa.c (using_utf8): New.
11584 (addtok_wc, free_mbdata, dfaoptimize) [!MBS_SUPPORT]: Do not define.
11585 (dfacomp) [!MBS_SUPPORT]: Do not call dfaoptimize.
11586
11587 tests: fix typo
11588 * fedora: Fix typo.
11589
11590 tests: use Exit
11591 * euc-mb: exit with "Exit 0".
11592
11593 grep: remove more register keywords
11594 * dosbuf.c: Remove register keywords.
11595 * grep.c: Remove register keywords.
11596 * kwset.c: Remove register keywords.
11597 * search.c: Remove register keywords.
11598
115992010-03-17 Paolo Bonzini <bonzini@gnu.org>
11600
11601 dfa: run simple UTF-8 regexps as a single-byte character set
11602 This provides a speedup whenever fgrep is "almost" sufficient but
11603 not quite (e.g. grep ^abc). This affects test cases such as
11604 https://savannah.gnu.org/bugs/?29117, which are already worked around
11605 by the line-by-line matching patch c32c04; without that patch the
11606 speedup can reach 1000x even on non-contrived testcases.
11607
11608 * src/dfa.c (dfaoptimize): New.
11609 (dfacomp): Call it.
11610
116112010-03-17 Paolo Bonzini <bonzini@gnu.org>
11612
11613 tests: fix syntax-check failures
11614 * tests/case-fold-backref: Use "foo" instead of "the".
11615 * tests/dfaexec-multibyte: Remove trailing blanks.
11616
116172010-03-17 Paolo Bonzini <bonzini@gnu.org>
11618
11619 grep: remove check_multibyte_string, fix non-UTF8 missed match
11620 Avoid precomputing something that can be computed lazily just as efficiently
11621 (or more efficiently in the case of UTF-8, though this is left as TODO).
11622 At the same time, "soften" the rejection condition for matching in the
11623 middle of a multibyte sequence to fix bug 23814.
11624
11625 Multibyte "grep -i" would still be very slow if it wasn't for the workaround
11626 patch c32c042 (grep: match multibyte charsets line-by-line when using -i,
11627 2010-03-08).
11628
11629 * NEWS: Document bugfix.
11630 * src/search.c (check_multibyte_string): Rewrite as...
11631 (is_mb_middle): ... this.
11632 (EGexecute, Fexecute): Adjust.
11633 * tests/Makefile.am (TESTS): Add euc-mb.
11634 * tests/euc-mb: New testcase.
11635
116362010-03-17 Paolo Bonzini <bonzini@gnu.org>
11637
11638 dfa: cache MB_CUR_MAX for dfaexec
11639 * src/dfa.c (state_index, dfaexec): Use d->mb_cur_max.
11640 (dfainit): Initialize it.
11641 (free_mbdata): New, extracted out of dfafree.
11642 (dfafree): Use it.
11643
11644 dfa: improve documentation of struct dfa
11645 * src/dfa.h (struct dfa): Reword some comments.
11646
11647 tests: factor name of output files into a variable
11648 * tests/case-fold-backref, tests/case-fold-char-class,
11649 tests/case-fold-char-range, tests/case-fold-char-type,
11650 tests/dfaexec-multibyte: Use a variable for the output filename,
11651 as it is common to the grep and compare invocations.
11652
11653 tests: use different output files to simplify reading failed .log files
11654 * tests/case-fold-backref, tests/case-fold-char-class,
11655 tests/case-fold-char-range, tests/case-fold-char-type: Use a different
11656 name for each output file from grep.
11657 * tests/dfaexec-multibyte: Likewise, and merge some grep invocations.
11658
11659 tests: add another grep -i testcase, from bug 16179
11660 * tests/case-fold-backref: New.
11661 * tests/Makefile.am (TESTS): Add it.
11662
116632010-03-16 Paolo Bonzini <bonzini@gnu.org>
11664
11665 dfa: rewrite handling of multibyte case_fold lexing
11666 Let dfacomp do the folding to lowercase of multibyte input strings,
11667 and remove it from grep.c. Input strings to kwset.c are still folded
11668 outside kwset.c, so we still need to do mbtolower in search.c.
11669
11670 * NEWS: Document bugfixes.
11671 * .x-sc_cast_of_argument_to_free: Remove.
11672 * src/dfa.c (wctok, addtok_wc): New.
11673 (cur_mb_index, update_mb_len_index): Remove.
11674 (FETCH): Do not call it.
11675 (parse_bracket_exp_mb) [GREP]: Disable case-folding of ranges and
11676 characters.
11677 (addtok): Extract part to...
11678 (addtok_mb): ... this new function.
11679 (lex): Call fetch_wc in the main loop for MB_CUR_MAX > 1. Return WCHAR
11680 for normal characters if MB_CUR_MAX > 1.
11681 (atom): Handle WCHAR instead of treating multibyte characters specially.
11682 Do case folding of multibyte characters here.
11683 (dfacomp): Remove case_fold special casing.
11684 * src/dfa.h (WCHAR): New.
11685 * src/grep.c (mb_icase_keys): Remove.
11686 (main): Do not call it.
11687 * src/search.c (kwsinit): Init transition table only for MB_CUR_MAX == 1.
11688 (mbtolower): New.
11689 (kwsincr_case): New.
11690 (kwsmusts): Call it instead of kwsincr.
11691 (check_multibyte_string): Remove.
11692 (check_multibyte_string_no_icase): Rename to check_multibyte_string.
11693 (GEAcompile, EGexecute, Fcompile): Use mbtolower instead of the old
11694 check_multibyte_string.
11695 * tests/Makefile.am (TESTS): Add case-fold-backslash-w.
11696 * tests/foad1.sh: Enable fixed tests.
11697 * tests/case-fold-backslash-w: New.
11698
116992010-03-16 Paolo Bonzini <bonzini@gnu.org>
11700
11701 grep: match multibyte charsets line-by-line when using -i
11702 The turtle combination -i + MB_CUR_MAX>1 requires case conversion ahead
11703 of time. Avoid doing this repeatedly when many matches succeed. Together
11704 with the previous changes, this fixes https://savannah.gnu.org/bugs/?29117
11705 and https://savannah.gnu.org/bugs/?14472.
11706
11707 * NEWS: Document new speedup.
11708 * src/grep.c (do_execute): New.
11709 (grepbuf): Use it.
11710
117112010-03-15 Paolo Bonzini <bonzini@gnu.org>
11712
11713 dfa: fix handling of ranges in multibyte character sets
11714 * src/dfa.c (parse_bracket_exp_mb): Add separate ranges for
11715 lowercase and uppercase endpoints if folding case.
11716 * tests/Makefile.am (TESTS): Add case-fold-char-range.
11717 * tests/case-fold-char-range: New.
11718
11719 tests: add more UTF-8 test cases
11720 * tests/Makefile.am (TESTS): Add spencer1-locale.
11721 (EXTRA_DIST): Add spencer1-locale.awk.
11722 * tests/spencer1-locale.awk: New.
11723 * tests/spencer1-locale: New.
11724
117252010-03-15 Jim Meyering <meyering@redhat.com>
11726
11727 tests: complete the renaming fedora.sh -> fedora
11728 * tests/Makefile.am (TESTS): Rename fedora.sh -> fedora here, too.
11729
117302010-03-15 Jim Meyering <meyering@redhat.com>
11731
11732 * tests/fedora.sh: Rename to...
11733 * tests/fedora: ...this, to reflect new convention:
11734 Use the lack of a suffix to indicate we've converted to the new
11735 init.sh-using test framework.
11736
11737 tests: adjust fedora.sh to handle traps more portably
11738
117392010-03-15 Jim Meyering <meyering@redhat.com>
11740
11741 tests: adjust fedora.sh to handle traps more portably
11742 * tests/fedora.sh: Use "Exit", not "exit".
11743
11744 tests: for each test, set an envvar to its name
11745 * tests/Makefile.am (TESTS_ENVIRONMENT): Set GREP_TEST_NAME for
11746 each test. This is used to help make the output of hundreds of
11747 independent, often-parallel valgrind runs more manageable.
11748
117492010-03-14 Jim Meyering <meyering@redhat.com>
11750
11751 tests: clean up fedora.sh
11752 * tests/fedora.sh: Use "grep", not ${GREP}.
11753 Use init.sh.
11754 Use timeout 10, not sleep 1 (three times).
11755 The latter would always sleep for 3 seconds, and the test would
11756 fail with a false positive on a slow system or with a heavily
11757 instrumented (valgrind) executable.
11758
117592010-03-12 Jim Meyering <meyering@redhat.com>
11760
11761 build: avoid build failure with --enable-gcc-warnings
11762 * src/dfa.c: Don't include <assert.h>, now that it is not used.
11763 [DEBUG]: Remove #ifdef block.
11764
117652010-03-12 Paolo Bonzini <bonzini@gnu.org>
11766
11767 syntax-check: enable space-tab
11768 * cfg.mk (local-checks-to-skip): Enable space-tab.
11769 * .x-sc_space_tab: Add exceptions.
11770 * tests/status.sh: Fix occurrence.
11771
11772 syntax-check: enable m4-quote-check
11773 * cfg.mk (local-checks-to-skip): Enable m4-quote-check.
11774 * configure.ac: Fix occurrence.
11775
11776 syntax-check: enable makefile-TAB-only-indentation
11777 * cfg.mk (local-checks-to-skip): Enable makefile-TAB-only-indentation.
11778 * Makefile.am: Fix only occurrence.
11779
11780 grep: fix error-message-uppercase
11781 * cfg.mk (local-checks-to-skip): Enable error-message-uppercase.
11782 * src/dfa.c (parse_bracket_exp_mb, lex, dfaparse): Fix occurrences.
11783 * src/search.c (Pcompile, Pexecute): Fix occurrences.
11784
11785 dfa, grep: cleanup if-before-free and cast-of-argument-to-free
11786 * .x-sc_avoid_if_before_free: Remove.
11787 * .x-sc_cast_of_alloca_return_value: Remove.
11788 * .x-sc_cast_of_x_alloc_return_value: Remove.
11789 * .x-sc_cast_of_argument_to_free: Temporarily add src/search.c.
11790 * cfg.mk (local-checks-to-skip): Remove sc_cast_of_argument_to_free.
11791 * src/dfa.c (ifree): Remove.
11792 (dfamust, build_state, transit_state, dfafree): Do not do if-before-free,
11793 do not cast free argument to ptr_t or char *.
11794 (freelist): Call free instead of ifree.
11795 * src/dfa.h (ptr_t): Remove.
11796
117972010-03-12 Paolo Bonzini <bonzini@gnu.org>
11798
11799 dfa: remove CRANGE dead code
11800 The only use of CRANGE was removed by commit 193830d. In theory it is
11801 more correct to do what CRANGE did, but in practice it seems like it did
11802 not work.
11803
11804 * src/dfa.h (token): Remove CRANGE.
11805 * src/dfa.c (atom): Do not handle CRANGE.
11806 (prtok): Likewise.
11807
118082010-03-12 Paolo Bonzini <bonzini@gnu.org>
11809
11810 dfa: get rid of x*alloc
11811 * src/dfa.c: Include xalloc.h.
11812 (xmalloc, xrealloc, xcalloc): Remove.
11813
11814 grep: cleanup one const cast
11815 * src/search.c (GEAcompile): Do not reuse motif when operating on the
11816 (const) pattern, so we can make it non-const. Remove cast from free.
11817
11818 kwset/system: remove ptr_t
11819 * src/kwset.h: Declare kwset using an incomplete struct type.
11820 * src/system.h (ptr_t): Remove.
11821
118222010-03-12 Jim Meyering <meyering@redhat.com>
11823
11824 tests: add test cases for dfaexec bug
11825 * tests/dfaexec-multibyte: New test.
11826 * tests/Makefile.am (TESTS): Add it.
11827 Reported by Paolo Bonzini in http://bugzilla.redhat.com/544407
11828 and http://bugzilla.redhat.com/544406 .
11829
118302010-03-12 Jim Meyering <meyering@redhat.com>
11831
11832 dfa: manually merge gawk's dfaexec
11833 * src/dfa.c (dfaexec): Adjust API: return pointer, not offset, and
11834 take an "end" pointer parameter, rather than integral "size".
11835 Adjust comment accordingly.
11836 (build_state): Maintain d->newlines.
11837 (copytoks): Update multibyte_prop indices.
11838 (SKIP_REMAINS_MB_IF_INITIAL_STATE): Update a cast.
11839 Return NULL, rather than (size_t) -1.
11840 (realloc_trans_if_necessary): Realloc d->newlines.
11841 * src/dfa.h (struct dfa): New member, "newlines".
11842 (struct dfa) [GAWK]: New member, "broken".
11843 (dfaexec): Update prototype and copy the new comment from dfa.c.
11844
11845 dfa: make search.c use the new dfaexec API
11846
11847 * src/search.c: Adjust to new dfaexec API.
11848 Now, dfaexec returns a pointer, not an integer,
11849 and the third parameter is END, not buffer size.
11850 * src/dfa.c (dfaexec): Rewrite the function's comment.
11851 Don't just clobber *END. While doing that happens to be
11852 fine for gawk's usage, in grep, *END usually points to the
11853 first byte of the next buffer. Save the initial value,
11854 and restore it just before returning.
11855 * src/dfa.h (dfaexec): Update comment; include parameter names.
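The save-and-restore of *END described above can be sketched with a much simpler scanner. The function find_byte_with_sentinel is hypothetical and only illustrates the pattern; it is not dfaexec.

```c
#include <stddef.h>

/* Hypothetical scanner: find byte C in [BEGIN, END).  *END must be
   writable; it serves as a temporary sentinel so the loop needs no
   bounds check.  Return a pointer to the match, or NULL.  */
static char *
find_byte_with_sentinel (char *begin, char *end, char c)
{
  char saved = *end;   /* May be the first byte of the next buffer.  */
  char *p = begin;

  *end = c;            /* Plant the sentinel.  */
  while (*p != c)
    p++;
  *end = saved;        /* Restore it just before returning.  */

  return p < end ? p : NULL;
}
```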
11856
118572010-03-12 Jim Meyering <meyering@redhat.com>
11858
11859 dfa: appease static analyzers
11860 * src/dfa.c (transit_state_singlebyte): Call abort rather
11861 than returning in a "can't happen" scenario.
11862 This stops clang from emitting a false-positive report (I think it
11863 was used-uninitialized) about a caller.
11864
118652010-03-11 Jim Meyering <meyering@redhat.com>
11866
11867 dfa: do not accept [[:UPPER:]] or [[:LOWER:]] internally
11868 * src/dfa.c (parse_bracket_exp_mb): Those class names are not
11869 valid, and rejected elsewhere, so there is no point in allowing
11870 upper or mixed-case versions here.
11871
118722010-03-11 Jim Meyering <meyering@redhat.com>
11873
11874 maint: remove a trailing space
11875 * src/search.c (EXECUTE_FCT): Remove trailing space.
11876
11877 maint: remove all uses of PARAMS
11878 Remove most with this:
11879 git grep -lw PARAMS |xargs perl -pi -e 's/\bPARAMS *\((.*)\);/$1;/'
11880 Remove the remainder manually.
11881
118822010-03-11 Jim Meyering <meyering@redhat.com>
11883
11884 maint: remove all uses of PARAMS
11885 * lib/savedir.h (PARAMS): Remove definitions manually.
11886 Remove the remaining ones via this command:
11887 git grep -l define.PARAMS |xargs perl -ni -e '/define PARAMS/ or print'
11888 * src/dfa.h (PARAMS): Remove definitions.
11889 * src/system.h (PARAMS): Likewise.
11890 Remove most uses with this:
11891 git grep -lw PARAMS |xargs perl -pi -e 's/\bPARAMS *\((.*)\);/$1;/'
11892 Remove the remainder manually.
11893
11894 maint: remove now-useless prototypes
11895 * src/dfa.c: Remove the prototype of each static, non-recursive
11896 function whose definition precedes first use.
11897
11898 grep: plug an inconsequential leak
11899 * src/grep.c (main): Plug a leak: free "keys".
11900
11901 grep: avoid useless allocations for empty GREP_OPTIONS
11902 * src/grep.c (prepend_default_options): Ignore GREP_OPTIONS
11903 when it's empty, not just when it's undefined.
11904 There are still relatively harmless leaks when GREP_OPTIONS
11905 is set and non-empty. We'll address those, eventually.
11906
119072010-03-09 Jim Meyering <meyering@redhat.com>
11908
11909 build: record build-from-clone tool requirements
11910 * bootstrap.conf (buildreq): This makes bootstrap fail with
11911 a clear explanation of the problem. Otherwise, you'd get into
11912 the build process and fail with something far more cryptic.
11913
11914 dfa: remove a trailing blank
11915 * src/dfa.c (dfaexec): No trailing blanks allowed.
11916
11917 dfa: sync a tiny change from gawk
11918 * src/dfa.c (state_index) [MBS_SUPPORT]: Initialize .mbps.nelem member
11919 unconditionally. Also initialize .mbps.elems.
11920
11921 dfa: avoid a leak (work_mbc->chars)
11922 * src/dfa.c (parse_bracket_exp_mb): Remove useless (and leaked) MALLOC.
11923
11924 doc+bootstrap: document build-from-git-clone process
11925 * bootstrap: Update from coreutils/gnulib.
11926 * README-hacking: New file, nearly identical to the one in coreutils.
11927
119282010-03-08 Paolo Bonzini <bonzini@gnu.org>
11929
11930 more work on TODO
11931 * TODO: More work on the first section. Use clearer section headers.
11932
119332010-03-08 Reuben Thomas <rrt@sc3d.org>
11934
11935 bring TODO up-to-date
11936 * TODO: merge with TODO section of http://www.gnu.org/software/grep/devel.html
11937 and remove done items. Some small bits of tidying also.
11938
119392010-03-07 Paolo Bonzini <bonzini@gnu.org>
11940
11941 simplify parsing of [a-z]
11942 * src/dfa.c (in_coll_range): New.
11943 (lex): Use it instead of regcomp/regexec.
11944
11945 Small refactoring in src/dfa.c
11946 * src/dfa.c (parse_bracket_exp_mb): Return MBCSET.
11947 (lex): Assign return value of parse_bracket_exp_mb to lasttok, return it.
11948
11949 use do...while(0) idiom
11950 * dfa.c (FETCH): Wrap with do...while(0).
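The do...while(0) idiom makes a multi-statement macro expand to exactly one statement. A small sketch with a hypothetical SWAP_INT macro (not the FETCH macro itself):

```c
/* Hypothetical two-step macro wrapped in do...while (0).  */
#define SWAP_INT(a, b) \
  do                   \
    {                  \
      int tmp_ = (a);  \
      (a) = (b);       \
      (b) = tmp_;      \
    }                  \
  while (0)

/* Put *LO and *HI in ascending order.  Return 1 if they were
   swapped, 0 if they were already ordered.  The unbraced if/else
   parses as intended only because SWAP_INT is a single statement.  */
static int
order_pair (int *lo, int *hi)
{
  if (*lo > *hi)
    SWAP_INT (*lo, *hi);
  else
    return 0;
  return 1;
}
```

With a bare { ... } block instead, the semicolon after the macro call would end the if statement and leave the else without a matching if.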
11951
119522010-03-06 Paolo Bonzini <bonzini@gnu.org>
11953
11954 extract common code from if/else
11955 * dfa.c (dfaexec): Simplify logic for MB_CUR_MAX > 1 case.
11956
11957 remove register variable hacks
11958 * dfa.c (dfaexec): We can extract the address of a variable without fearing
11959 performance problems, modern compilers know better.
11960
11961 remove register keywords
11962 * dfa.c (dfaexec): Modern compilers just ignore them.
11963
11964 allow grep -Pz
11965 * NEWS: Document grep -P improvements.
11966 * src/search.c (Pcompile): Remove restriction on grep -Pz.
11967 * tests/pcre-z: New.
11968 * tests/Makefile.am (TESTS): Add pcre-z.
11969
11970 fix cross-line matching in PCRE backend
11971 * search.c (Pexecute): Split the buffer in lines and match each line
11972 separately.
11973 * tests/fedora.sh: Add regression testsuite.
11974
11975 fix formatting of NEWS
11976 * NEWS: fix formatting of 2.6 entries.
11977
11978 fix a bug in handling of -i and character type
11979 * dfa.c (parse_bracket_exp_mb): Convert [[:lower:]] and [[:upper:]] to
11980 [[:alpha:]] when folding case.
11981 * tests/case-fold-char-type: New file. Test for the bug.
11982 * tests/Makefile.am (TESTS): Add it.
11983 * NEWS (Bug fixes): Mention it.
11984
11985 fix previous test case change
11986 * tests/case-fold-char-class: Do not reset fail to 0 after first test.
11987
119882010-03-06 Mike Frysinger <vapier@gentoo.org>
11989
11990 grep(1) man page: touchup --label option
11991 * doc/grep.1 (--label): Don't italicize ending period. Point to -H
11992 option.
11993
119942010-03-06 Paolo Bonzini <bonzini@gnu.org>
11995
11996 augment case-fold-char-class test case
11997 * tests/case-fold-char-class: Test matching lowercase against uppercase
11998 as well as vice versa.
11999
120002010-03-05 Reuben Thomas <rrt@sc3d.org>
12001
12002 doc: improve the discussion of PCRE
12003 * doc/grep.1: Add a sentence about Perl regular expressions,
12004 and point to pcresyntax(3) and pcrepattern(3).
12005 * doc/grep.texi: Likewise.
12006
120072010-03-05 Jim Meyering <meyering@redhat.com>
12008
12009 maint: dfa-sync: comment and dead-to-grep code: no semantic change
12010 * src/dfa.c: Sync a comment and some #ifdef GAWK code.
12011
12012 maint: dfa-sync: don't malloc zero
12013 * src/dfa.c (dfacomp): Skip case_fold logic when length is zero.
12014 This is probably "no semantic change", but does improve efficiency in
12015 a degenerate case.
12016
12017 maint: dfa-sync: use CALLOC rather than equiv. MALLOC+initialize-loop
12018 * src/dfa.c (dfaanalyze): Sync from gawk. No semantic change.
12019
12020 dfa.c: add support for \s and \S
12021 * src/dfa.c (lex): Sync from gawk's dfa.c.
12022
12023 maint: dfa-sync: add omitted array initializer
12024 * src/dfa.c (prednames): Add a "0" to final initializer.
12025 No semantic change.
12026
12027 fix a bug in handling of -i and character classes
12028 * dfa.c (parse_bracket_exp_mb): Sync one part of this function
12029 from gawk's dfa.c, which was patched by Arnold D. Robbins.
12030 * tests/case-fold-char-class: New file. Test for the bug.
12031 * tests/Makefile.am (TESTS): Add it.
12032 (TESTS_ENVIRONMENT): Propagate LOCALE_FR and LOCALE_FR_UTF8
12033 definitions into tests.
12034 * NEWS (Bug fixes): Mention it.
12035
12036 2010-03-05 Paolo Bonzini <pbonzini@redhat.com>
12037
12038 Fedora Grep regression test suite
12039 * tests/Makefile.am (TESTS): Add fedora.sh.
12040 (CLEANFILES): Add several new files.
12041 * tests/fedora.sh: New file, originally by Lubomir Rintel but somewhat
12042 rewritten to avoid bashisms.
12043
12044 2010-03-05 Paolo Bonzini <bonzini@gnu.org>
12045
12046 convert AUTHORS file to UTF-8
12047 * AUTHORS: Convert to UTF-8.
12048
12049 eliminate invalid "ptr += (ptr2 - ptr1)"
12050 * lib/savedir.c (savedir): new_name_space and name_space do not point into
12051 the same object, so computing their difference is invalid. Similarly,
12052 summing the difference to namep is invalid because namep and the result
12053 point into different objects. Avoid this.
12054
12055 fix for bug 21276
12056 * lib/savedir.c (isdir1): Use realloc instead of calloc. Remove
12057 dead code.
12058 (savedir): Do not leak name_space if allocation of new_name_space fails.
12059
12060 2010-03-04 Jim Meyering <meyering@redhat.com>
12061
12062 tests: add a test based on an example from Paolo Bonzini
12063 * tests/word-multi-file: New test.
12064 * tests/Makefile.am (TESTS): Add it.
12065
12066 doc: document release procedure
12067 * README-release: New file.
12068
12069 build: update gnulib submodule to latest
12070
12071 2010-02-22 Paolo Bonzini <bonzini@gnu.org>
12072
12073 add --group-separator=FOO and --no-group-separator
12074 * src/grep.c (group_separator): New.
12075 (long_options): Add --group-separator=FOO and --no-group-separator.
12076 (prtext): Print group_separator instead of SEP_STR_GROUP. Optionally
12077 suppress the separator altogether.
12078 (main): Handle GROUP_SEPARATOR_OPTION.
12079 * doc/grep.texi (Context control): Document it.
12080 * NEWS: Mention it.
12081 * tests/yesno.sh: Add testcases.
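The effect of the two options can be sketched as follows, assuming a GNU grep that provides them; the five-line input and the variable names are hypothetical.

```shell
# Lines 1 and 5 match "x"; -A1 adds one line of trailing context per match,
# so the output forms two groups that are not adjacent in the input.
input=$(printf 'x\n1\n2\n3\nx\n')

default_sep=$(printf '%s\n' "$input" | grep -A1 x)
custom_sep=$(printf '%s\n' "$input" | grep -A1 --group-separator='***' x)
no_sep=$(printf '%s\n' "$input" | grep -A1 --no-group-separator x)

printf 'default:\n%s\n\ncustom:\n%s\n\nnone:\n%s\n' \
  "$default_sep" "$custom_sep" "$no_sep"
```

By default the groups are separated by a line containing "--"; the first option replaces that line with the given string, and the second suppresses it.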
12082
12083 2010-02-21 Jim Meyering <meyering@redhat.com>
12084
12085 tests: don't use "echo -n"
12086 * tests/foad1.sh: Use printf, not echo -n. The latter is not portable.
12087 Reported by Daniel Richman.
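A brief illustration of the portability point, using only POSIX utilities; the sample string and variable name are arbitrary. printf emits exactly the bytes requested, whereas some echo implementations print a literal "-n".

```shell
# printf adds no trailing newline unless the format asks for one.
bytes=$(printf 'foo' | wc -c | tr -d ' ')
echo "bytes written: $bytes"
```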
12088
12089 2010-02-08 Jim Meyering <meyering@redhat.com>
12090
12091 remove useless DJGPP-specific code
12092 * src/grep.c (grepfile): Remove now-useless DJGPP-specific code.
12093 Now, all S_IS* macros are guaranteed to be defined via gnulib.
12094
12095 2010-02-07 Jim Meyering <meyering@redhat.com>
12096
12097 tests: add help-version sanity tests from coreutils
12098 * tests/help-version: New test, from coreutils.
12099 * tests/Makefile.am (TESTS): Add it.
12100 (TESTS_ENVIRONMENT) [built_programs]: Define it.
12101
12102 tests: correct TESTS_ENVIRONMENT's PATH setting
12103 * tests/Makefile.am (TESTS_ENVIRONMENT): Set PATH to start with
12104 $(abs_top_builddir)/src, so that we test the programs we've just built.
12105
12106 grep: use the correct exit status (2) upon write failure, not 1
12107 * src/grep.c (main): Initialize exit_failure to EXIT_TROUBLE.
12108 * NEWS (Bug fixes): Mention this fix.
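One way to observe the corrected status, sketched under the assumption of a Linux system where /dev/full rejects every write with ENOSPC; the variable name is illustrative.

```shell
# grep finds a match, but writing it to /dev/full fails,
# so grep should report trouble (2) rather than "no match" (1).
status=0
printf 'x\n' | grep x > /dev/full 2>/dev/null || status=$?
echo "exit status: $status"
```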
12109
12110 maint: enable the prohibit_magic_number_exit syntax check
12111 * cfg.mk (local-checks-to-skip): Remove sc_prohibit_magic_number_exit,
12112 to enable that check.
12113 * src/system.h (EXIT_TROUBLE): Define.
12114 * src/grep.c: Use symbolic names, EXIT_SUCCESS, EXIT_FAILURE, and
12115 EXIT_TROUBLE, not 0, 1, 2.
12116 * src/search.c: Likewise.
12117 * src/vms_fab.c (string): Likewise.
12118
12119 2010-02-04 Jim Meyering <meyering@redhat.com>
12120
12121 doc: adjust NEWS item
12122 * NEWS: Correct a description.
12123
12124 2010-02-03 Jim Meyering <meyering@redhat.com>
12125
12126 tests: exercise surprising -m1 vs. --context behavior
12127 * tests/max-count-vs-context: New test. Exercise the surprising,
12128 but documented, behavior reported by Markus Jochim in
12129 http://savannah.gnu.org/bugs/?28588.
12130 * tests/Makefile.am (TESTS): Add it.
12131
12132 tests: use init.sh from gnulib
12133 * tests/init.sh: New file, from gnulib.
12134 * tests/Makefile.am (EXTRA_DIST): Add it.
12135 (TESTS_ENVIRONMENT): Add variables and features.
12136 (VERBOSE): Define.
12137
12138 maint: remove unused Makefile rule
12139 * tests/Makefile.am (dist-hook): Remove rule. No longer needed.
12140
12141 maint: adjust formatting in tests/Makefile.am
12142 * tests/Makefile.am (TESTS, CLEANFILES): Align and sort.
12143
12144 build: avoid warnings in gnulib-supplied regex files
12145 Now that we enable more warnings in lib/, we choose
12146 to avoid some via patches applied by bootstrap, using
12147 files in the gl/ hierarchy. Other, less-important
12148 warnings are avoided simply by turning off the
12149 -Wold-style-definition option and using a slightly
12150 relaxed set of warnings $(GNULIB_WARN_CFLAGS) in lib/.
12151 * gl/lib/regcomp.c.diff: Avoid warnings.
12152 * gl/lib/regex_internal.c.diff: Likewise.
12153 * gl/lib/regex_internal.h.diff: Likewise.
12154 * gl/lib/regexec.c.diff: Likewise.
12155 * configure.ac (GNULIB_PORTCHECK): Disable only -Wold-style-definition.
12156 * lib/Makefile.am (AM_CFLAGS): Use $(GNULIB_WARN_CFLAGS) rather
12157 than the slightly more strict $(WARN_CFLAGS).
12158
12159 tests: adjust spencer #37 to pass with gnulib's regex code
12160 * tests/spencer1.tests: Change #37 to expect an exit status of 2, not 1.
12161 grep 'a[b-a]' reports "Invalid range end".
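A minimal check of the status this test now expects, assuming a grep built on the gnulib (or glibc) regex code; the variable name is illustrative. An invalid pattern is reported as trouble (2), not as "no match" (1).

```shell
# The range b-a is empty, so the pattern is rejected at compile time.
status=0
grep 'a[b-a]' /dev/null 2>/dev/null || status=$?
echo "exit status: $status"
```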
12162
12163 maint: use regex from gnulib, rather than our bit-rotting one
12164 * bootstrap.conf (gnulib_modules): Add regex.
12165 * configure.ac: Don't use jm_INCLUDED_REGEX.
12166 Update use of cache variable.
12167 * lib/regex.c: Remove file.
12168 * lib/regex.h: Likewise.
12169 * m4/regex.m4: Likewise.
12170 * POTFILES.in: Update to match.
12171
12172 build: update gnulib submodule to latest
12173
12174 2010-01-28 Jim Meyering <meyering@redhat.com>
12175
12176 maint: update to latest gnulib; adjust cfg.mk
12177 * gnulib: Update submodule to latest.
12178 * cfg.mk (old_NEWS_hash): Update to reflect NEWS Copyright line change.
12179
12180 2010-01-06 Jim Meyering <meyering@redhat.com>
12181
12182 maint: avoid old jm_* macros
12183 There were jm_* macros here, until very recently.
12184 * cfg.mk (sc_prohibit_jm_in_m4): New rule, from coreutils.
12185
12186 maint: remove decl.m4
12187 * m4/decl.m4: Remove unused file.
12188
12189 maint: rely on gnulib's new isdir.h
12190 * src/grep.c: Include "isdir.h".
12191 * src/system.h: Remove declaration of isdir.
12192
12193 build: rename local to avoid shadowing global, dfa
12194 * src/dfa.c (dfamust): Rename parameter: s/dfa/d/.
12195
12196 build: avoid warning from -Wmissing-prototypes
12197 * src/dfa.c (match_mb_charset): Declare to be static.
12198
12199 build: avoid shadowing warning for "link"
12200 * src/kwset.c (link): Define to kwset_link, to avoid shadowing
12201 the function.
12202
12203 build: avoid shadowing warning for unused "rs"
12204 * src/dfa.c (transit_state): Remove dead stores;
12205 move a declaration "down".
12206 Ignore transit_state_consume_1char return value.
12207
12208 build: avoid shadowing warnings
12209 * src/dfa.c (match_mb_charset): Rename parameter: s/index/idx/.
12210 (check_matching_with_multibyte_ops, match_anychar): Likewise.
12211
12212 build: avoid warning about unused definition of N_
12213 * src/dfa.c (N_): Remove unused definition.
12214
12215 build: avoid format-string warnings
12216 * src/search.c (dfaerror): Use literal "%s" as format string.
12217 (kwsmusts, GEAcompile): Likewise.
12218 (Pcompile): Likewise.
12219
12220 build: add configure-time --enable-gcc-warnings option; avoid warnings
12221 * bootstrap.conf (gnulib_modules): Add "manywarnings" module.
12222 * configure.ac: Add --enable-gcc-warnings, derived from code in bison.
12223 * src/Makefile.am (AM_CFLAGS): Set to $(WARN_CFLAGS) $(WERROR_CFLAGS).
12224 * lib/Makefile.am (AM_CFLAGS): Likewise, but append.
12225
12226 build: remove now-useless -I../intl option
12227 * src/Makefile.am (INCLUDES): Remove -I../intl, now that intl is gone.
12228
12229 maint: avoid more warnings
12230 * src/grep.c (MAX): Remove definition of unused macro.
12231 (usage): Declare with __attribute__ ((noreturn)).
12232 Split long strings into chunks of length < 509.
12233
12234 fix a possible bug: remove errant semicolon
12235 * src/grep.c (prline): Remove erroneous semicolon-after-if-expr.
12236
12237 maint: avoid compilation warnings
12238 * bootstrap.conf (gnulib_modules): Add ignore-value.
12239 * src/search.c (check_multibyte_string_no_icase): A variant of
12240 check_multibyte_string that does *not* convert case, and hence
12241 does not modify its BUF parameter.
12242 (check_multibyte_string): Use xcalloc in place of xmalloc+memset.
12243 Use ignore_value to ignore the return value from wcrtomb. This is
12244 ok, since we know the input is a valid upper case wide character.
12245 (Fexecute, EGexecute): Update callers of check_multibyte_string
12246 to use both it and check_multibyte_string_no_icase.
12247
12248 maint: avoid warnings about unused fwrite return value
12249 * bootstrap.conf (gnulib_modules): Add unlocked-io.
12250 * src/system.h: Include "unlocked-io.h".
12251
12252 maint: remove {m4,lib}/.gitignore; they were undergoing too much churn
12253 * .gitignore: Ignore all of m4/* except m4/djgpp.m4
12254 and all of lib/* except Makefile.am, savedir.c and savedir.h.
12255 * m4/.gitignore: Remove file.
12256 * lib/.gitignore: Remove file.
12257
12258 2010-01-05 Jim Meyering <meyering@redhat.com>
12259
12260 build: run gnulib's tests, too
12261 * Makefile.am (SUBDIRS): Add gnulib-tests.
12262 * gnulib-tests/Makefile.am: New file.
12263 * bootstrap.conf (bootstrap_epilogue): New function, from coreutils.
12264 (gnulib_tool_option_extras): Define.
12265 * configure.ac: Add gnulib-tests/Makefile.
12266
12267 2010-01-03 Jim Meyering <meyering@redhat.com>
12268
12269 maint: record update-copyright options for this package
12270 * cfg.mk: Next time, just run "make update-copyright".
12271
12272 2010-01-01 Jim Meyering <meyering@redhat.com>
12273
12274 maint: update all FSF copyright year lists to include 2010
12275 Use this command:
12276 git ls-files |grep -vE '^(\..*|COPYING|gnulib)$' |xargs \
12277 env UPDATE_COPYRIGHT_USE_INTERVALS=1 build-aux/update-copyright
12278
12279 2009-12-23 Jim Meyering <meyering@redhat.com>
12280
12281 fix multi-byte-locale read-beyond-end-of-buffer error
12282 Avoid read-beyond-end-of-buffer errors, evoked by running this:
12283 LC_ALL=en_US.UTF-8 valgrind src/grep -f <(printf 'a\nb\n') <(echo c)
12284
12285 Conditional jump or move depends on uninitialised value(s)
12286 at 0x78136D: __gconv_transform_utf8_internal (in /lib/libc-2.11.so)
12287 by 0x7E7232: mbrtowc (in /lib/libc-2.11.so)
12288 by 0x8055773: dfaexec (dfa.c:2816)
12289 by 0x804D7B0: EGexecute (search.c:353)
12290 by 0x804ACD8: grepbuf (grep.c:1036)
12291 by 0x804B023: grep (grep.c:1156)
12292 by 0x804B460: grepfile (grep.c:1287)
12293 by 0x804CF0D: main (grep.c:2282)
12294
12295 Conditional jump or move depends on uninitialised value(s)
12296 at 0x7E7248: mbrtowc (in /lib/libc-2.11.so)
12297 by 0x8055773: dfaexec (dfa.c:2816)
12298 by 0x804D7B0: EGexecute (search.c:353)
12299 by 0x804ACD8: grepbuf (grep.c:1036)
12300 by 0x804B023: grep (grep.c:1156)
12301 by 0x804B460: grepfile (grep.c:1287)
12302 by 0x804CF0D: main (grep.c:2282)
12303
12304 * src/dfa.c (dfaexec) [MBS_SUPPORT]: Do not access one byte beyond
12305 end of buffer.
12306
12307 2009-12-23 Jim Meyering <meyering@redhat.com>
12308
12309 build: update gnulib submodule to latest
12310
12311 2009-12-23 Paolo Bonzini <bonzini@gnu.org>
12312
12313 Speed up insert.
12314 Suggested by Johan Walles <johan.walles@gmail.com> (bug 23354).
12315
12316 * src/dfa.c (insert): Use binary search.
12317
12318 2009-12-23 Johan Walles <johan.walles@gmail.com>
12319
12320 Decrease epsclosure memory usage
12321 Fixes bug 23321.
12322
12323 * src/dfa.c (epsclosure): Make visited an array of char.
12324
12325 2009-12-22 Paolo Bonzini <bonzini@gnu.org>
12326
12327 Make 'grep -1 -2' and 'grep -1v2' equivalent to grep -2
12328 Fixes bug 12128.
12329
12330 * src/grep.c (get_nondigit_option): Reset the buffer every time
12331 a non-digit option is found or a new argument is started.
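The intended result can be sketched as follows, assuming a grep that includes this fix; the nine-line input and variable names are hypothetical. Digits from separate arguments are no longer concatenated (which would have meant a context of 12), so the later count simply wins.

```shell
# Only l5 matches; a context of 2 selects l3 through l7.
input=$(printf '%s\n' l1 l2 l3 l4 l5 l6 l7 l8 l9)
two_options=$(printf '%s\n' "$input" | grep -1 -2 l5)
one_option=$(printf '%s\n' "$input" | grep -2 l5)
printf '%s\n' "$two_options"
```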
12332
12333 2009-12-22 Paolo Bonzini <bonzini@gnu.org>
12334
12335 Improve description of --label
12336 Fixes bug 22681.
12337
12338 * doc/grep.1 (--label): Use -H in the example, improve wording.
12339 * doc/grep.texi (Output Line Prefix Control): Likewise.
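For reference, a short sketch of how --label combines with -H when reading standard input; the label "input.txt" and the variable name are arbitrary choices.

```shell
# -H forces the file-name prefix; --label supplies the name used for stdin.
labeled=$(printf 'foo\n' | grep -H --label=input.txt foo)
printf '%s\n' "$labeled"
```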
12340
12341 2009-12-22 Paolo Bonzini <bonzini@gnu.org>
12342
12343 Avoid using an invalid memchr result.
12344 Related to bug 13161. I cannot find a testcase, but it is better to be
12345 defensive, considering that such bugs were found in the past.
12346
12347 * src/search.c (EGexecute, Fexecute): Check for memchr return values.
12348
12349 2009-12-11 Jim Meyering <meyering@redhat.com>
12350
12351 build: update gnulib submodule to latest
12352
12353 2009-12-04 Jim Meyering <meyering@redhat.com>
12354
12355 maint: enable prohibit_have_config_h check
12356 * cfg.mk (local-checks-to-skip): Enable sc_prohibit_have_config_h.
12357 * lib/regex.c: Remove useless cpp test of HAVE_CONFIG_H.
12358 * lib/savedir.c: Likewise.
12359 * src/grep.c: Likewise.
12360 * src/kwset.c: Likewise.
12361 * src/search.c: Likewise.
12362
12363 maint: enable cast_of_x_alloc_return_value check
12364 * cfg.mk (local-checks-to-skip): Enable sc_cast_of_x_alloc_return_value.
12365 * .x-sc_cast_of_x_alloc_return_value:
12366 * src/dfa.c (CALLOC, MALLOC, REALLOC): Remove casts.
12367 * src/dosbuf.c (undossify_input): Likewise.
12368 * src/grep.c (print_line_middle, prepend_default_options): Likewise.
12369
12370 maint: enable cast_of_alloca_return_value check
12371 * cfg.mk (local-checks-to-skip): Enable sc_cast_of_alloca_return_value.
12372 * .x-sc_cast_of_alloca_return_value: New file.
12373
12374 2009-12-04 Paolo Bonzini <bonzini@gnu.org>
12375
12376 fix "grep -Ff" on CRLF-terminated files
12377 * src/search.c (Fcompile) [HAVE_DOS_FILE_CONTENTS]: Recognize \r\n as
12378 a line terminator.
12379
12380 fix compilation with included regex
12381 * Makefile.am (libgreputils_a_DEPENDENCIES): New.
12382
12383 switch to pkg-config for PCRE detection
12384 * configure.ac: Use pkg-config to detect PCRE.
12385 * src/Makefile.am (grep_LDADD): Link grep with PCRE_LIBS.
12386
12387 2009-12-04 Jim Meyering <meyering@redhat.com>
12388
12389 maint: remove "missing" script
12390 * missing: Remove now-unused file.
12391
12392 maint: make .gitignore ignore more
12393 * .gitignore: Ignore more.
12394
12395 maint: enable useless-if-before-free check
12396 * cfg.mk (local-checks-to-skip): Enable sc_avoid_if_before_free.
12397 * .x-sc_avoid_if_before_free: New file. Exempt regex.c and dfa.c,
12398 in case anyone ever tries to merge their contents with other versions.
12399 * src/grep.c (print_line_middle, grepdir): Remove useless if-before-free.
12400 * src/search.c (IF_BK, EXECUTE_FCT): Likewise.
12401
12402 maint: enable po-check
12403 * cfg.mk (local-checks-to-skip): Enable sc_po_check.
12404 * po/POTFILES.in: Sort and update.
12405
12406 2009-12-03 Paolo Bonzini <bonzini@gnu.org>
12407
12408 update gnulib, fixing missing inclusion of stdbool.h
12409 * gnulib: Update.
12410
12411 2009-11-30 Jim Meyering <meyering@redhat.com>
12412
12413 maint: enable two checks
12414 * cfg.mk (local-checks-to-skip): Enable two:
12415 sc_prohibit_xalloc_without_use sc_two_space_separator_in_usage
12416 * src/grep.c (usage): Conform: use two spaces, not 1.
12417 * src/kwset.c (malloc): Define as a function-macro so that the
12418 syntax-check rule sees that we are indeed using xmalloc here.
12419
12420 maint: enable makefile_path_separator check
12421 * cfg.mk (local-checks-to-skip): Enable sc_makefile_path_separator_check,
12422 now that the sole offender, an old po/Makefile.in.in, is gone.
12423
12424 maint: remove now-generated file: po/Makefile.in.in
12425 * po/Makefile.in.in: Remove file, now generated via bootstrap.
12426
12427 maint: enable makefile @...@ check
12428 * cfg.mk (local-checks-to-skip): Enable sc_makefile_check.
12429 * lib/Makefile.am (libgreputils_a_LIBADD): Use $(...), rather than
12430 anachronistic @...@ notation.
12431 * src/Makefile.am (LDADD): Likewise.
12432 * tests/Makefile.am (AWK): Remove definition.
12433
12434 maint: enable trailing_blank check
12435 * cfg.mk (local-checks-to-skip): Enable sc_trailing_blank.
12436 * AUTHORS: Remove trailing blanks.
12437 * COPYING: Likewise.
12438 * README: Likewise.
12439 * README-alpha: Likewise.
12440 * README-boot: Likewise.
12441 * THANKS: Likewise.
12442 * TODO: Likewise.
12443 * src/dfa.c: Likewise.
12444 * src/mbsupport.h: Likewise.
12445 * tests/backref.sh: Likewise.
12446 * tests/file.sh: Likewise.
12447 * tests/options.sh: Likewise.
12448 * tests/tests: Likewise.
12449 * vms/README: Likewise.
12450 * vms/make.com: Likewise.
12451
12452 maint: enable unmarked_diagnostics check
12453 * cfg.mk (local-checks-to-skip): Enable sc_unmarked_diagnostics.
12454 * src/grep.c (fillbuf): Mark a diagnostic for translation.
12455 (reset): Likewise.
12456
12457 maint: enable require_config_h checks
12458 * cfg.mk (local-checks-to-skip): Enable sc_require_config_h
12459 and sc_require_config_h_first.
12460 * src/dosbuf.c: Include <config.h>.
12461 * src/vms_fab.c: Likewise.
12462 * .x-sc_require_config_h: New file: list the exceptions.
12463 * .x-sc_require_config_h_first: Likewise.
12464
12465 maint: use gnulib's progname module; enable set_program_name check
12466 * bootstrap.conf (gnulib_modules): Add progname.
12467 * src/grep.c: Include "progname.h".
12468 (program_name): Remove declaration.
12469 (main): Call set_program_name.
12470 * cfg.mk (local-checks-to-skip): Add sc_program_name.
12471
12472 maint: enable "file system" check
12473 * cfg.mk (local-checks-to-skip): Enable sc_file_system.
12474 * lib/savedir.c (savedir): Tweak spelling. Remove trailing blanks.
12475
12476 maint: enable immutable_NEWS check
12477 * NEWS: Move copyright to the bottom.
12478 Use the format required by release-related tools.
12479 * .prev-version: New file.
12480 * cfg.mk (old_NEWS_hash): Define.
12481 (local-checks-to-skip): Enable check: sc_immutable_NEWS.
12482
12483 maint: disable the many failing syntax-checks
12484 * cfg.mk: New file.
12485 (local-checks-to-skip): Define to the list of disabled rules.
12486 Subsequent change-sets will enable them, one by one.
12487
12488 build: require automake-1.11, enable silent-rules, parallel tests, xz
12489 * configure.ac (AM_INIT_AUTOMAKE): Create xz-compressed tarballs,
12490 not bzip2-compressed ones. Enable automake's silent-rules,
12491 parallel tests, and test PASS/FAIL coloring options.
12492 Use AC_CONFIG_HEADERS, not AM_CONFIG_HEADER. Quote the argument.
12493
12494 build: use git-version-gen for inter-release version strings
12495 * configure.ac (AC_INIT): Use git-version-gen.
12496
12497 build: add several build- and release-related gnulib modules
12498 * bootstrap.conf (gnulib_modules): Add announce-gen update-copyright
12499 do-release-commit-and-tag git-version-gen gnu-web-doc-update
12500 gnupload maintainer-makefile useless-if-before-free
12501
12502 build: adapt to the newer closeout module from gnulib
12503 * src/grep.c: Include "exitfail.h".
12504 (main) [-q]: Set the global variable, exit_failure, rather than
12505 calling the now-removed close_stdout_set_file_name function.
12506
12507 build: adapt to the newer exclude API we now get from gnulib
12508 * src/grep.c (main): Adapt to newer exclude.c: add EXCLUDE_WILDCARDS as
12509 the new "option" argument in calls to add_exclude and add_exclude_file.
12510
12511 build: get more lib/* files from gnulib, adjust savedir
12512 * bootstrap.conf (gnulib_modules): Add the following:
12513 closeout exclude hard-locale isdir strtoumax.
12514 * lib/.gitignore, m4/.gitignore: Update.
12515 * lib/closeout.c, lib/closeout.h: Remove.
12516 * lib/exclude.c, lib/exclude.h: Remove.
12517 * lib/hard-locale.c, lib/hard-locale.h: Remove.
12518 * lib/strtoumax.c: Remove.
12519 * lib/isdir.c: Remove.
12520 * lib/Makefile.am: Remove here, too.
12521 * lib/savedir.c: Adapt to new exclude module:
12522 s/excluded_filename/excluded_file_name/ and remove 3rd argument.
12523
12524 build: update gnulib submodule to latest
12525
12526 maint: generate ChangeLog from git logs
12527 * Makefile.am (dist-hook, gen-ChangeLog): New rules.
12528 * bootstrap.conf (gnulib_modules): Add gitlog-to-changelog.
12529 Ensure that ChangeLog exists.
12530 * ChangeLog-2009: Rename from ChangeLog.
12531 * ChangeLog: Remove file.
12532 * .gitignore: Add ChangeLog.
12533
12534 maint: list gnulib modules one per line
12535 * bootstrap.conf (gnulib_modules): List them one per line.
12536
12537 2009-11-29 Tony Abou-Assaleh <taa@acm.org>
12538
12539 Acknowledge new maintainers, update README-alpha
12540 * AUTHORS: Add new maintainers.
12541 * THANKS: Likewise.
12542 * README-alpha: Change CVS references to Git.