Updates as of 2011-06-04
The following documentation is very similar to the documentation that can be found in the README file in the JAR file.
Skip forward to the download section for instructions on downloading JCols if you'd rather not read this.
This utility parses text files by applying a user-specified expression to each line. The line may first be split into columns by whitespace (the default) or by some regular expression (the -r option). The name "JCols" is simply an abbreviation of "Java Columns".
This utility is an alternative to tools like AWK. It has only a command-line interface. If you're interested in an application with a GUI for parsing text files, consider something like FlexText or File Parse, although I have not tried either of those programs. If you want a Java implementation of AWK, look into jawk. Finally, in some cases the cut command, or even bash/sh's set $line to extract fields from a line, may be sufficient.
A future version of JCols may attempt to sandbox the expressions by using something like policy files in Java. Until then, don't run untrusted expressions.
The latest JAR file may be downloaded by clicking here. The source code is included in the JAR file.
Copy the downloaded JAR file to wherever you like to keep JAR files. Optionally, rename the file to simply jcols.jar, make the JAR executable if your system supports executable JAR files, and make sure the executable JAR file is in your path. The Eclipse project file is included in the JAR file along with the source code.
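Since JCols depends on Rhino, Rhino must be made available to it. There are at least three different ways to do this: install the rhino package with the system package manager, copy rhino.jar into a jcols-lib directory created alongside jcols.jar, or copy rhino.jar into the JRE's lib/ext directory.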
> yum install rhino
> ls -1 jcols.jar
jcols.jar
> mkdir jcols-lib
> cp rhino.jar jcols-lib
> cp -i rhino.jar /usr/lib/jvm/java-1.6.0-openjdk-184.108.40.206.x86_64/jre/lib/ext
jcols.jar should be used as a pipe (it takes text as standard input and writes text to standard output). Ideally the system should support executable JAR files, in which case jcols.jar can simply be invoked as shown in the examples. If not, jcols.jar should be replaced with java -jar jcols.jar. The usage, which can be seen by running jcols.jar without any arguments, is:
Additionally, the following shortcut means of accessing columns is supported (column numbers start at zero for the leftmost column):
The following examples use a simple data file:
> cat some-table
aa1 bb1 cc1
aa2 bb2 cc2
aa3 bb3 cc3
Print out the line number and length of each line separated by a tab:
> jcols.jar 'n + "\t" + l.length' < some-table
1 11
2 11
3 11
Print out only the first and third columns with a ":" separating them, and convert the result to upper case. Also, specify an input file (-f) rather than reading from stdin:
> jcols.jar -f some-table '(s_0 + ":" + s_2).toUpperCase()'
AA1:CC1
AA2:CC2
AA3:CC3
For the first two columns, concatenate the column with its old value (the value from the previous line). Note that "null" is used for the first row, where there is no old value. Also note that any column reference may be prefixed with "o" to get the old value:
> jcols.jar -j 'String.format("%s:%s %s:%s", s_0, os_0, s_1, os_1)' < \
  some-table
aa1:null bb1:null
aa2:aa1 bb2:bb1
aa3:aa2 bb3:bb2
Note: See the final example for a demonstration of how the join (-n) option allows for a succinct alternative to the "String.format" used in the previous example.
Use the -v (verbose) and -k (keep temporary files) options to see and inspect the generated Java source for the previous example:
> jcols.jar -jkv 'String.format("%s:%s %s:%s", s_0, os_0, s_1, os_1)' < \
  some-table | head -2
Classpath URL : file:/tmp/JCols-103762802/
Generated Source : /tmp/JCols-103762802/org/selliott/jcols/LFilter.java
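The "old value" references (os_0, os_1) used in the String.format example can be pictured as the columns of the previous line, with null standing in before the first row. The following is only a minimal sketch of that idea in plain Java. The class and method names (OldValueSketch, pairWithOld, col) are hypothetical and are not taken from the JCols source or from the generated LFilter.java.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the code that JCols generates.
final class OldValueSketch {
    // Returns column i, or null when there is no such column
    // (including when there is no previous line at all).
    static String col(String[] cols, int i) {
        return (cols != null && i < cols.length) ? cols[i] : null;
    }

    // Pairs the first two columns of each line with their values
    // from the previous line.
    static List<String> pairWithOld(List<String> lines) {
        List<String> out = new ArrayList<>();
        String[] old = null; // nothing precedes the first row
        for (String line : lines) {
            String[] cols = line.trim().split("\\s+");
            out.add(String.format("%s:%s %s:%s",
                    col(cols, 0), col(old, 0), col(cols, 1), col(old, 1)));
            old = cols; // current columns become the "old" values
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> table =
                Arrays.asList("aa1 bb1 cc1", "aa2 bb2 cc2", "aa3 bb3 cc3");
        for (String row : pairWithOld(table)) {
            System.out.println(row);
        }
    }
}
```

String.format renders a null argument as the text "null", which is consistent with the first row of the output shown above.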
> jcols.jar -r '\S+?(\d+)\s+\S+?(\d+)\s+' '"num_col_2=" + s_0 + " accum_col_2=" + a_0 + \
  " num_col_3=" + s_0 + " accum_col_3=" + a_0' < some-table
num_col_2=1 accum_col_2=1 num_col_3=1 accum_col_3=1
num_col_2=2 accum_col_2=3 num_col_3=2 accum_col_3=3
num_col_2=3 accum_col_2=6 num_col_3=3 accum_col_3=6
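Judging from the output above, a_0 behaves like a running total of the number captured by the first regular expression group (1, then 3, then 6). The sketch below reproduces that behaviour in plain Java using the same regular expression. It is an interpretation of the example output, not a description of how JCols implements accumulators, and the names AccumulatorSketch and runningTotals are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not taken from the JCols source.
final class AccumulatorSketch {
    // Same expression as the -r argument in the example above.
    private static final Pattern GROUPS =
            Pattern.compile("\\S+?(\\d+)\\s+\\S+?(\\d+)\\s+");

    // For each matching line, reports the first captured number and
    // the running total of that number over all lines seen so far.
    static List<String> runningTotals(List<String> lines) {
        List<String> out = new ArrayList<>();
        long accum = 0;
        for (String line : lines) {
            Matcher m = GROUPS.matcher(line);
            if (!m.find()) {
                continue; // this sketch simply skips non-matching lines
            }
            long value = Long.parseLong(m.group(1));
            accum += value;
            out.add("num=" + value + " accum=" + accum);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> table =
                Arrays.asList("aa1 bb1 cc1", "aa2 bb2 cc2", "aa3 bb3 cc3");
        for (String row : runningTotals(table)) {
            System.out.println(row);
        }
    }
}
```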
Split some-table on letters and whitespace (-F), leaving only the numbers, and then print out the first two non-blank columns (as with AWK, a -F pattern that matches the prefix of a line results in an initial blank column):
> jcols.jar -jF '[A-Za-z\s]+' '"|" + s_1 + ":" + s_2 + "|"' < some-table
|1:1|
|2:2|
|3:3|
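The initial blank column can be reproduced with Java's own String.split, which keeps a leading empty string when the separator matches at the start of the input. This is only a sketch of the effect. Whether JCols itself splits with String.split is an assumption, and the names SplitSketch and splitFields are hypothetical.

```java
// Hypothetical illustration only; not taken from the JCols source.
final class SplitSketch {
    // Splits a line on a separator regex. A match at the very start
    // of the line yields an empty first field.
    static String[] splitFields(String line, String separatorRegex) {
        return line.split(separatorRegex);
    }

    public static void main(String[] args) {
        String[] f = splitFields("aa1 bb1 cc1", "[A-Za-z\\s]+");
        // f[0] is the blank column, so the numbers start at f[1].
        System.out.println("|" + f[1] + ":" + f[2] + "|");
    }
}
```

Here the fields are "", "1", "1" and "1", which is why the example above uses s_1 and s_2 rather than s_0 and s_1.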
Attempt to convert the first two columns to integers. Since some-table is not made up of valid integers, failed attempts to parse an integer are first mapped to a default (-d) value of 0. Finally, ignore (-i) any exceptions by quietly continuing to the next line:
> jcols.jar -jdi 'n_0 + " " + n_1' < some-table
0 0
0 0
0 0
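The effect of the default value can be pictured as a parse that falls back to 0 instead of failing. The following is a minimal sketch of that pattern in plain Java. It is not the code JCols uses for -d, and the names ParseDefaultSketch and parseOrDefault are hypothetical.

```java
// Hypothetical illustration only; not taken from the JCols source.
final class ParseDefaultSketch {
    // Parses s as an integer, returning dflt when s is not a valid integer.
    static int parseOrDefault(String s, int dflt) {
        try {
            return Integer.parseInt(s.trim());
        } catch (NumberFormatException e) {
            return dflt;
        }
    }

    public static void main(String[] args) {
        String[] cols = "aa1 bb1 cc1".split("\\s+");
        // Neither "aa1" nor "bb1" is a valid integer, so both map to 0.
        System.out.println(parseOrDefault(cols[0], 0) + " " + parseOrDefault(cols[1], 0));
    }
}
```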
Illustrate calling into the Java (-j) standard library by prefixing each line with the current time in nanoseconds:
> jcols.jar -j 'System.nanoTime() + " " + l' < some-table
161436268625924 aa1 bb1 cc1
161436268713208 aa2 bb2 cc2
161436268725456 aa3 bb3 cc3
Effectively grep out a subset of the lines by returning the entire line (l) if some criterion is met (here, the line ends with a "2"). Return null for lines that don't meet the criterion:
> jcols.jar -j 'l.endsWith("2") ? l : null' < some-table
aa2 bb2 cc2
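The "null means skip this line" convention can be sketched in plain Java as a loop that applies an expression to each line and drops null results. This is an illustration of the convention as described above, not the JCols implementation, and the names FilterSketch and applyToLines are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; not taken from the JCols source.
final class FilterSketch {
    // Applies expr to every line and keeps only the non-null results.
    static List<String> applyToLines(List<String> lines, Function<String, String> expr) {
        List<String> out = new ArrayList<>();
        for (String l : lines) {
            String result = expr.apply(l);
            if (result != null) {
                out.add(result); // null results produce no output line
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> table =
                Arrays.asList("aa1 bb1 cc1", "aa2 bb2 cc2", "aa3 bb3 cc3");
        // Same criterion as the example above.
        for (String row : applyToLines(table, l -> l.endsWith("2") ? l : null)) {
            System.out.println(row);
        }
    }
}
```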
Take an expression that consists of a comma-separated list of strings and join those strings with a join string (-n) equal to the tab character. Also trim off any leading and trailing whitespace (-t) from the line before any additional processing is done on it:
> jcols.jar -tn \\t 's_0, s_1, s_2' < some-table
aa1 bb1 cc1
aa2 bb2 cc2
aa3 bb3 cc3
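For comparison, trimming a line and joining its columns with a tab looks roughly like this in plain Java. This is only a sketch of the trim-then-join idea. It joins every whitespace-separated column rather than an explicit list such as s_0, s_1, s_2, and the names JoinSketch and trimAndJoin are hypothetical.

```java
// Hypothetical illustration only; not taken from the JCols source.
final class JoinSketch {
    // Trims the line, splits it on whitespace, and joins all of the
    // resulting columns with the given join string.
    static String trimAndJoin(String line, String joiner) {
        String[] cols = line.trim().split("\\s+");
        return String.join(joiner, cols);
    }

    public static void main(String[] args) {
        System.out.println(trimAndJoin("  aa1 bb1 cc1  ", "\t"));
    }
}
```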
Although it is possible to use JCols programmatically from Java programs, additional work is needed to make this easier. Consult org.selliott.jcols.JCols.main() for an example of creating an instance of JCols and then calling the process() method on that instance.
This utility is covered by the GPL, version 2 or later. A copy of the license is included in the jcols.jar file. The GPL version 2 license can also be found at http://www.gnu.org/licenses/gpl-2.0.txt
Feedback may be given by posting comments here. The latest version of this code should always be here.