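Ingram Olkin wanted the following statistics from the CIS files for the period 1985-1994: the number of single-author papers, and the number of multiple-author papers with authors listed alphabetically and non-alphabetically, counted separately for journals and proceedings.
We present a quick-and-dirty Perl hack to compute them.
The program is structured as follows.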
<*>=
#! /usr/local/bin/perl
<Copyright>
<Local variables>
<Process files>
<Print results>
<Copyright>= (<-U)
#
# $Revision: 1.1 $
#
# Copyright (C) 1997, B. Narasimhan (naras@stat.Stanford.EDU)
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
#
#
We need to know where the CIS files are located. Let
$cis_directory denote the directory and $cis_ext denote
the extension of the file names.
<Local variables>= (<-U) [D->]
$cis_directory = '/usr/local/lib/cis/cis95/files';
$cis_ext = '.v95';
Defines $cis_directory, $cis_ext.
<Local variables>+= (<-U) [<-D->]
@years = (85..94);
Defines @years.
Some variables for counts which we initialize to zero.
<Local variables>+= (<-U) [<-D]
$sj = 0;   # No of single author journal articles.
$sp = 0;   # No of single author proceedings articles.
$mj = 0;   # No of multiple author journal articles.
$mja = 0;  # No of multiple author alphabetic journal articles.
$mp = 0;   # No of multiple author proceedings articles.
$mpa = 0;  # No of multiple author alphabetic proceedings articles.
Defines $mj, $mja, $mp, $mpa, $sj, $sp.
For each year, we check that the file exists and is a text file, then open it and process its records one line at a time.
<Process files>= (<-U)
YEAR:
foreach $year (@years) {
    $filename = $cis_directory . '/' . 'cis' . $year . $cis_ext;
    next YEAR unless -T $filename;
    if (!open(FH, $filename)) {
        print STDERR "Can't open $filename---continuing...\n";
        next YEAR;
    }
    while (<FH>) {
        <Split fields>
        <Skip irrelevant stuff>
        <Determine no of authors>
        <Update counts>
    }
}
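The file name construction used in the loop can be tried on its own. The following is a minimal sketch; the helper name `cis_filename` is an assumption introduced here for illustration, while the directory, prefix, and extension are the ones defined in the chunks above.

```perl
use strict;
use warnings;

# Hypothetical helper: build the path of the CIS file for a two-digit year,
# exactly as the loop does with $cis_directory, 'cis', $year and $cis_ext.
sub cis_filename {
    my ($dir, $year, $ext) = @_;
    return $dir . '/' . 'cis' . $year . $ext;
}

my $filename = cis_filename('/usr/local/lib/cis/cis95/files', 85, '.v95');
print "$filename\n";

# -T is Perl's "looks like a text file" test; a missing file fails it too,
# which is why the loop needs no separate existence check.
print((-T $filename) ? "text file\n" : "missing or not a text file\n");
```

Because `-T` is false for a nonexistent file, years with no file are skipped silently rather than reported on STDERR.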
Splitting the fields is trivial once we know the format of the records. Notice that we ignore the trailing 'garbage' in the records.
<Split fields>= (<-U)
($null,$field1,$title,$authors) = split('#');
Defines $authors, $field1, $null, $title.
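To see what the split produces, here is a small self-contained sketch. The record contents are invented for illustration (the actual CIS record layout is not reproduced in this document), and `split_record` is a hypothetical helper name; only the leading `#`, the `#` separators, and the order of the first three fields are taken from the chunk above.

```perl
use strict;
use warnings;

# Hypothetical record: field contents are made up for illustration only.
my $record = "#0001J#A title#Doe, J.;Roe, R.#trailing#stuff\n";

sub split_record {
    my ($line) = @_;
    # The record begins with '#', so the first element is the empty string.
    # Fields after the fourth are simply not assigned to anything.
    my ($null, $field1, $title, $authors) = split /#/, $line;
    return ($field1, $title, $authors);
}

my ($field1, $title, $authors) = split_record($record);
print "$field1 | $title | $authors\n";
```

Assigning only the first four elements of the list is what discards the trailing fields.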
Skipping irrelevant stuff: we skip books, electronic publications, and other administrative records.
<Skip irrelevant stuff>= (<-U) [D->]
$cite_ind = substr($field1,-1,1);
# Skip books, electronic literature or administrative records
next if ($cite_ind =~ /[Bb]/) || ($cite_ind =~ /C/) || ($cite_ind =~ /Z/);
Defines $cite_ind.
Oh! There are a bunch of entries for reviews of books, which we should skip too.
<Skip irrelevant stuff>+= (<-U) [<-D]
# Skip reviews of books.
next if ($authors =~ /\(Rev\)/);
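The two skip rules can be collected into a single predicate, which makes them easy to check in isolation. This is a sketch, not part of the original program: `skip_record` is a hypothetical name, and the sample field values in the loop are invented.

```perl
use strict;
use warnings;

# Hypothetical helper: true if a record should be skipped under the two
# rules above. The citation indicator is the last character of $field1.
sub skip_record {
    my ($field1, $authors) = @_;
    my $cite_ind = substr($field1, -1, 1);
    # Books, electronic literature or administrative records.
    return 1 if $cite_ind =~ /[BbCZ]/;
    # Reviews of books carry "(Rev)" in the author field.
    return 1 if $authors =~ /\(Rev\)/;
    return 0;
}

# Invented sample values, for illustration only.
for my $pair (['0001J', 'Doe, J.'], ['0002B', 'Doe, J.'], ['0003J', 'Roe, R. (Rev)']) {
    my ($field1, $authors) = @$pair;
    printf "%s / %s: %s\n", $field1, $authors,
        skip_record($field1, $authors) ? 'skip' : 'keep';
}
```

The single character class `[BbCZ]` matches the same one-character indicators as the three separate tests in the chunk.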
Multiple authors are separated by semicolons in the author field.
<Determine no of authors>= (<-U)
@authors = split(';', $authors);
$no_authors = $#authors + 1;
Defines $no_authors.
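As a quick check of the author count, the sketch below wraps the same split in a hypothetical helper, `count_authors`; the author names are invented. Evaluating the array in scalar context gives the same value as `$#authors + 1`.

```perl
use strict;
use warnings;

# Hypothetical helper: number of authors in a semicolon-separated field.
sub count_authors {
    my ($authors) = @_;
    my @authors = split /;/, $authors;
    return scalar @authors;    # same value as $#authors + 1
}

print count_authors('Doe, J.'), "\n";
print count_authors('Doe, J.;Roe, R.'), "\n";
```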
So we are now down to the last part: updating counts. To detect if the
authors are listed alphabetically, we sort the @authors array and
pack it back into a string with semicolons, just as the field would
look if the authors were listed in alphabetic order. So, if the newly
constructed string matches the field obtained from the file, the
authors are indeed in alphabetic order! We only need to take care
to update the relevant counters.
<Update counts>= (<-U)
if ($no_authors > 1) {
    $sorted_authors = join(';', sort @authors);
}
if ($cite_ind =~ /[Jj]/) { # article in Journal
    if ($no_authors > 1) {
        $mj++;
        if ($sorted_authors eq $authors) {
            $mja++;
        }
    } else {
        $sj++;
    }
} else { # article in proceedings or edited book
    if ($no_authors > 1) {
        $mp++;
        if ($sorted_authors eq $authors) {
            $mpa++;
        }
    } else {
        $sp++;
    }
}
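The sort-and-rejoin test can be exercised by itself. The following is a minimal sketch with a hypothetical helper, `is_alphabetical`, and invented author names. Note that Perl's default `sort` compares strings character by character, so the test is case-sensitive and any space after a semicolon takes part in the comparison.

```perl
use strict;
use warnings;

# Hypothetical helper: true if a multiple-author field is already in the
# order that sorting its semicolon-separated entries would produce.
sub is_alphabetical {
    my ($authors) = @_;
    my @authors = split /;/, $authors;
    return 0 unless @authors > 1;    # only meaningful for multiple authors
    my $sorted_authors = join(';', sort @authors);
    return $sorted_authors eq $authors ? 1 : 0;
}

# Invented author fields, for illustration only.
for my $field ('Adams, A.;Baker, B.', 'Baker, B.;Adams, A.') {
    printf "%-22s %s\n", $field,
        is_alphabetical($field) ? 'alphabetic' : 'non-alphabetic';
}
```

Comparing the rejoined string with the original field avoids an explicit pairwise comparison of neighbouring names.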
Printing the results is straightforward.
<Print results>= (<-U)
print "Statistics from CIS for years @years \n";
printf "Single author papers in Journals: %d\n", $sj;
printf "Multiple author papers in Journals (alph): %d\n", $mja;
printf "Multiple author papers in Journals (Non-alph): %d\n", $mj - $mja;
printf "Single author papers in Proceedings: %d\n", $sp;
printf "Multiple author papers in Proceedings (alph): %d\n", $mpa;
printf "Multiple author papers in Proceedings (Non-alph): %d\n", $mp - $mpa;