ucar.unidata.ui.test.Diff

public class Diff extends Object

diff: Text file difference utility.

Copyright 1987, 1989 by Donald C. Lindsay, School of Computer Science, Carnegie Mellon University. Copyright 1982 by Symbionics. Use without fee is permitted when not for direct commercial advantage, and when credit to the source is given. Other uses require specific permission.

Conversion is NOT FULLY TESTED.

USAGE: diff oldfile newfile

This program assumes that "oldfile" and "newfile" are text files. The program writes to stdout a description of the changes which would transform "oldfile" into "newfile".

The printout is in the form of commands, each followed by a block of text. The text is delimited by the commands, which are:

DELETE AT n
..deleted lines

INSERT BEFORE n
..inserted lines

n MOVED TO BEFORE n
..moved lines

n CHANGED FROM
..old lines
CHANGED TO
..newer lines

The line numbers all refer to the lines of the oldfile, as they are numbered before any commands are applied. The text lines are printed as-is, without indentation or prefixing. The commands are printed in upper case, with a prefix of ">>>>", so that they will stand out. Other schemes may be preferred.

Files which contain more than MAXLINECOUNT lines cannot be processed. This can be fixed by changing "symbol" to a Vector.

The algorithm is taken from Communications of the ACM, Apr78 (21, 4, 264-), "A Technique for Isolating Differences Between Files." Ignoring I/O, and ignoring the symbol table, it should take O(N) time. This implementation takes fixed space, plus O(U) space for the symbol table (where U is the number of unique lines). Methods exist to change the fixed space to O(N) space.

Note that this is not the only interesting file-difference algorithm. In general, different algorithms draw different conclusions about the changes that have been made to the oldfile. This algorithm is sometimes "more right", particularly since it does not consider a block move to be an insertion and a (separate) deletion. However, on some files it will be "less right". This is a consequence of the fact that files may contain many identical lines (particularly if they are program source). Each algorithm resolves the ambiguity in its own way, and the resolution is never guaranteed to be "right". However, it is often excellent.

This program is intended to be pedagogic. Specifically, this program was the basis of the Literate Programming column which appeared in the Communications of the ACM (CACM), in the June 1989 issue (32, 6, 740-755). By "pedagogic", I do not mean that the program is gracefully worded, or that it showcases language features or its algorithm. I also do not mean that it is highly accessible to beginners, or that it is intended to be read in full, or in a particular order.
Rather, this program is an example of one professional's style of keeping things organized and maintainable.

The program would be better if the "print" variables were wrapped into a struct. In general, grouping related variables in this way improves documentation, and adds the ability to pass the group in argument lists.

This program is a de-engineered version of a program which uses less memory and less time. The article points out that the "symbol" arrays can be implemented as arrays of pointers to arrays, with dynamic allocation of the subarrays. (In C, macros are very useful for hiding the two-level accesses.) In Java, a Vector would be used. This allows an extremely large value for MAXLINECOUNT, without dedicating fixed arrays. (The "other" array can be allocated after the input phase, when the exact sizes are known.)

The only slow piece of code is the "strcmp" in the tree descent: it can be sped up by keeping a hash in the tree node, and only using "strcmp" when two hashes happen to be equal.
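The matching idea behind the cited CACM technique can be sketched in a few lines. This is a minimal illustration and not the code of this class: the class name `UniqueLineMatchSketch`, the `Entry` record and the `match` method are hypothetical, and a `HashMap` stands in for the tree-based symbol table described above. The sketch counts how often each line occurs in each file, pairs lines that occur exactly once in both, and then extends those pairs to identical neighbouring lines. It does not produce the DELETE/INSERT/MOVED/CHANGED commands.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class UniqueLineMatchSketch {

    /** Symbol-table record for one distinct line of text (hypothetical layout). */
    static final class Entry {
        int oldCount;      // occurrences in the old file
        int newCount;      // occurrences in the new file
        int oldLine = -1;  // index of the last occurrence seen in the old file
    }

    /** Returns an array where result[i] is the old-file index paired with new line i, or -1. */
    static int[] match(String[] oldLines, String[] newLines) {
        Map<String, Entry> table = new HashMap<>();

        // Count occurrences of every distinct line in both files.
        for (String s : newLines) {
            table.computeIfAbsent(s, k -> new Entry()).newCount++;
        }
        for (int j = 0; j < oldLines.length; j++) {
            Entry e = table.computeIfAbsent(oldLines[j], k -> new Entry());
            e.oldCount++;
            e.oldLine = j;
        }

        int[] newToOld = new int[newLines.length];
        int[] oldToNew = new int[oldLines.length];
        Arrays.fill(newToOld, -1);
        Arrays.fill(oldToNew, -1);

        // Lines that occur exactly once in each file are treated as the same line.
        for (int i = 0; i < newLines.length; i++) {
            Entry e = table.get(newLines[i]);
            if (e.oldCount == 1 && e.newCount == 1) {
                newToOld[i] = e.oldLine;
                oldToNew[e.oldLine] = i;
            }
        }

        // Forward sweep: extend each pair to the following lines when they are identical.
        for (int i = 0; i + 1 < newLines.length; i++) {
            int j = newToOld[i];
            if (j >= 0 && j + 1 < oldLines.length
                    && newToOld[i + 1] < 0 && oldToNew[j + 1] < 0
                    && newLines[i + 1].equals(oldLines[j + 1])) {
                newToOld[i + 1] = j + 1;
                oldToNew[j + 1] = i + 1;
            }
        }

        // Backward sweep: the same extension towards the preceding lines.
        for (int i = newLines.length - 1; i > 0; i--) {
            int j = newToOld[i];
            if (j > 0 && newToOld[i - 1] < 0 && oldToNew[j - 1] < 0
                    && newLines[i - 1].equals(oldLines[j - 1])) {
                newToOld[i - 1] = j - 1;
                oldToNew[j - 1] = i - 1;
            }
        }
        return newToOld;
    }

    public static void main(String[] args) {
        String[] oldLines = {"a", "x", "b", "x"};
        String[] newLines = {"x", "a", "x", "b"};
        // "a" and "b" are unique in both files; the "x" after "a" is paired by the forward sweep.
        System.out.println(Arrays.toString(match(oldLines, newLines))); // prints [-1, 0, 1, 2]
    }
}
```

Each pass visits every line once, which is where the O(N) estimate comes from if symbol-table lookups are ignored. `HashMap` compares hash codes before it calls `equals`, so it applies the same shortcut the text suggests for the tree descent: full string comparison happens only when two hashes are equal.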

Change Log
----------
1Jan97 Ian F. Darwin: first working rewrite in Java, based entirely on D.C.Lindsay's reasonable C version. Changed comments from /***************** to /**, shortened, added whitespace, used tabs more, etc.

6jul89 D.C.Lindsay, CMU: fixed portability bug. Thanks, Gregg Wonderly. Just changed "char ch" to "int ch". Also added comment about way to improve code.

10jun89 D.C.Lindsay, CMU: posted version created. Copyright notice changed to ACM style, and Dept. is now School. ACM article referenced in docn.

26sep87 D.C.Lindsay, CMU: publication version created. Condensed all 1982/83 change log entries. Removed all command line options, and supporting code. This simplified the input code (no case reduction etc). It also simplified the symbol table, which was capable of remembering offsets into files (instead of strings), and trusting (!) hash values to be unique. Removed dynamic allocation of arrays: now fixed static arrays. Removed speed optimizations in symtab package. Removed string compression/decompression code. Recoded to Unix standards from old Lattice/MSDOS standards. (This affected only the #include's and the IO.) Some renaming of variables, and rewording of comments.

1982/83 D.C.Lindsay, Symbionics: created.

Version: Java version 0.9, 1997
Author: Ian F. Darwin (Java version); D. C. Lindsay (C version, 1982-1987)

Field Summary

Fields

Modifier and Type    Field      Description
static final int     change
static final int     delete
static final int     idle
static final int     insert
static final int     movenew
static final int     moveold
static final int     same

Constructor Summary

Constructors

Constructor        Description
Diff(String id)    Construct a Diff object.

Method Summary

Modifier and Type    Method                                                  Description
boolean              doDiff(Reader oldFile, Reader newFile)
boolean              doDiff(Reader oldFile, Reader newFile, Writer out)
boolean              doDiff(String data1, String data2)
boolean              doDiff(String data1, String data2, Writer out)
static void          main(String[] argstrings)                               main - entry point when used standalone.
void                 println(String s)                                       Convenience wrapper for println

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- idle
  
  public static final int idle
  See Also:
  
  Constant Field Values
- delete
  
  public static final int delete
  See Also:
  
  Constant Field Values
- insert
  
  public static final int insert
  See Also:
  
  Constant Field Values
- movenew
  
  public static final int movenew
  See Also:
  
  Constant Field Values
- moveold
  
  public static final int moveold
  See Also:
  
  Constant Field Values
- same
  
  public static final int same
  See Also:
  
  Constant Field Values
- change
  
  public static final int change
  See Also:
  
  Constant Field Values
Constructor Details
- Diff
  
  public Diff(String id)
  
  Construct a Diff object.
Method Details
- main
  
  public static void main(String[] argstrings) throws Exception
  
  main - entry point when used standalone. NOTE: no routines return error codes or throw any local exceptions. Instead, any routine may complain to stderr and then exit with error to the system.
  
  Throws:
  
  Exception
- doDiff
  
  public boolean doDiff(Reader oldFile, Reader newFile, Writer out) throws Exception
  
  Throws:
  
  Exception
- doDiff
  
  public boolean doDiff(Reader oldFile, Reader newFile) throws Exception
  
  Throws:
  
  Exception
- doDiff
  
  public boolean doDiff(String data1, String data2, Writer out) throws Exception
  
  Throws:
  
  Exception
- doDiff
  
  public boolean doDiff(String data1, String data2) throws Exception
  
  Throws:
  
  Exception
- println
  
  public void println(String s)
  
  Convenience wrapper for println
