John R. Skoyles (1992) Ftp Internet Data Archiving: a Cousin for Psycoloquy. Psycoloquy: 3(29) Data Archive (1)

PSYCOLOQUY (ISSN 1055-0143) is sponsored by the American Psychological Association (APA).

FTP INTERNET DATA ARCHIVING: A Cousin for PSYCOLOQUY
Target Article on Data-Archive

John R. Skoyles
Department of Psychology
University College London
WC1E 6BT, UK

ucjtprs@ucl.ac.uk

Abstract

American Psychological Association (APA) journals do not publish raw data, hence the data are effectively inaccessible. I propose that authors of research papers should transfer their data to an Internet site so that they can be accessed over the Internet by anonymous ftp. I suggest that such data archiving would (1) make fraud easier to detect, (2) encourage scientific criticism and (3) aid the scientific process in general. Nor should it be difficult to implement.

Keywords

data archiving, deception, electronic retrieval, error detection, ftp, fraud, meta-analysis, statistics
1.1 Experimental data are rarely published. Usually we are happy with their authors' own statistical treatment. But not always. Researchers do not always fully analyse their data; sometimes editors restrict their publication space; and sometimes we have an idea we would like to try out on those data. It would be nice if the experimental data we read about were easy to access. I suggest that the near-universal use of computers and the Internet mail and file transfer system have made this possible. PSYCOLOQUY is archived and easily accessed through anonymous ftp: There is no reason why archived research data should not be equally accessible. Though there are several potential problems with ftp archiving of published data, the benefits would, I believe, vastly outweigh them.

2.1 Here follows a case for the ftp archiving of data published in APA (American Psychological Association) journals. I then raise a few objections and finally consider how it might be implemented. Note that when I refer to ftp this also applies to other forms of electronic data transfer.

3.1 First, electronic data archiving should be easy to implement and will become increasingly so. Most researchers now (unlike, say, even two years ago) would have little trouble archiving their data upon publication. Most Results sections are based upon ASCII data files analyzed by computer (usually with a statistical package such as SPSS or BMDP). Most researchers should have their raw data stored in a form (i.e., with file and subdirectory names) which makes it easy for other researchers to use. The commands and procedures for transferring them to a central data archive will be familiar to most psychologists (if not, most departments have people who will help). Of course, all the details about the research will be contained in the published paper, so these need not be stored. Indeed, the names of journals, their volume and issue numbers, make a convenient directory and subdirectory structure for organising the archive. There is something self-evident about what data are contained in /JEPHPP/18/1/SMITH/EXP1. And just as it is easy to MSEND data to an archive, so it is easy to MGET them for reanalysis.
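
The directory convention in 3.1 can be illustrated with a short Python sketch. It assumes the path components are journal, volume, issue, first author and experiment, as in the /JEPHPP/18/1/SMITH/EXP1 example. The function names, the `host` argument and the assumption that the remote directory holds only plain files are illustrative, not part of the proposal.

```python
from ftplib import FTP
from pathlib import Path, PurePosixPath


def archive_path(journal, volume, issue, author, experiment):
    """Build an archive path such as /JEPHPP/18/1/SMITH/EXP1."""
    parts = [str(p).strip().upper()
             for p in (journal, volume, issue, author, experiment)]
    # Reject empty components or ones that would add extra directory levels.
    if any(not p or "/" in p for p in parts):
        raise ValueError("each path component must be non-empty and contain no '/'")
    return PurePosixPath("/", *parts)


def fetch_all(host, remote_dir, local_dir="."):
    """Rough equivalent of an interactive 'mget *' over anonymous ftp.

    Assumes `host` is an ftp server that allows anonymous login and that
    `remote_dir` contains only plain files (no subdirectories).
    """
    local = Path(local_dir)
    local.mkdir(parents=True, exist_ok=True)
    with FTP(host) as ftp:
        ftp.login()                      # no arguments means anonymous login
        ftp.cwd(str(remote_dir))
        for name in ftp.nlst():          # file names in the remote directory
            with open(local / name, "wb") as out:
                ftp.retrbinary(f"RETR {name}", out.write)


print(archive_path("JEPHPP", 18, 1, "Smith", "Exp1"))  # /JEPHPP/18/1/SMITH/EXP1
```

A reader would pass the result of `archive_path` as `remote_dir` to `fetch_all`, together with whatever host name the archive eventually used.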

3.2.1 Second, the scientific ethic is to make error correction as easy as possible. Scientists are not always entirely competent or honest. Numerous cases of fraud and intellectual dishonesty have occurred in psychology (as elsewhere in science). Researchers are subject to enormous pressures to publish but unfortunately this normally requires positive findings. This puts pressure on researchers to rerun analyses (changing criteria for categorising data, excluding subjects, treating missing data, etc.) when only negative findings turn up. It is not clear how many researchers resist these pressures on the integrity of data analysis. At present, it is difficult to check. In a recent case reported in Science, two psychologists were only able to check the data analysis of another psychologist through the intervention of lawyers (Palca 1991).

3.2.2 There is public disquiet in the US Congress (notably, on the part of Congressman John Dingell) concerning fraud and intellectual dishonesty in science. Research on published fraudulent papers has revealed many defects (Stewart & Feder 1987). It is likely that any archived data would contain even more accessible and noticeable defects (in their data distributions, treatment and analysis). Archiving data would thus make it easier to detect both fraud and intellectual dishonesty.

3.3 Third, much honestly obtained and analyzed data is incompetently handled, yet many legitimate criticisms never arise because of difficulties accessing data. At present, if you suspect that a researcher's own analysis gives only part of the story or is misleading, you face an involved process of contacting them for the original data (something inconvenient to all concerned). Archiving data would increase the opportunities for legitimate criticism of published work.

3.4 Fourth, researchers ask different questions. Sometimes a researcher may wish to reanalyse data to answer questions the original authors ignored. People carrying out meta-analyses will often want to check the quality of the work they are using. At present this is not possible.

3.5 Fifth, students could gain much by examining real research papers and then "playing around" with their data, seeing the effects of different data-analytic strategies. They might even find things overlooked by their authors.

3.6 Sixth, much data is accidentally lost (despite APA's requirement that authors retain their data for a number of years). An ftp archive would make a convenient data backup.

3.7 Seventh, scientific papers are printed on paper -- this, not the nature of science, is the reason data are not normally made accessible at this time. Science is about open communication that maximally exposes ideas and arguments to criticism (one legitimate criticism of an idea is the way its data are handled). Printed paper is a convenient means for opening written ideas to criticism, but it is unsuitable for making data accessible to criticism (it limits the quantity which can be published and communicates in a form that is inconvenient for computer reanalysis). Print has until recently been the only means for disseminating scientific ideas and data. Hence the tradition has arisen of limiting the dissemination of data. We should recognise the opportunity that electronic archives provide for breaking with this.

4.0 There are some reasons against ftp archiving:

4.1 Certain classes of data (e.g., clinical data) may have to be excluded to preserve the confidentiality and privacy of those from whom they are collected. This constraint does not apply to large portions of psychology, however, such as research on animals, reaction time studies on student subjects, or computer simulations.

4.2 Researchers certainly have the right to the "first go" at their data. However, the fact of publication, unless contrary notice is given, usually signifies that the data have already been substantially analyzed, and frequently no further analysis is intended.

4.3 There is another entirely invalid objection. Many researchers will be uncomfortable with their data being ftp archived because none of us are perfect. If our data can be reanalyzed we may be shown to have carried out, quite unintentionally, inappropriate or misleading analysis. To some extent the present state of affairs is quite convenient for hiding the fact that many researchers could be better statisticians and could keep better records.

5.0 Since impracticability may be an objection, I describe how an ftp archive might work:

5.1 The archive would have to be moderated by an archivist. Journal editors, for example, could contact the archivist, who would in turn contact the paper's chief author, providing a password and a temporary directory into which raw data files could be transferred. Researchers would be free to create the subdirectories they felt best organised the data and to write a brief contents file. The archivist would transfer the files to a permanent directory. A standard note on the front page of the published paper would state whether its data had been archived.
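
The archivist's step in 5.1, moving a submission from its temporary directory to a permanent one, might be sketched as follows. This is only one possible arrangement on a local file system. The contents file name `CONTENTS`, the function name and the refusal to overwrite an existing entry are assumptions made for illustration.

```python
import shutil
from pathlib import Path

CONTENTS_FILE = "CONTENTS"  # hypothetical name for the author's brief contents file


def publish_submission(staging_dir, archive_root, relative_path):
    """Copy a submission from its temporary directory into the permanent archive.

    `relative_path` is the permanent location below `archive_root`,
    for example "JEPHPP/18/1/SMITH". The author's own subdirectories
    are preserved as they were uploaded.
    """
    staging = Path(staging_dir)
    if Path(relative_path).is_absolute():
        raise ValueError("relative_path must be relative to the archive root")
    target = Path(archive_root) / relative_path

    # Require the brief contents file before accepting the submission.
    if not (staging / CONTENTS_FILE).is_file():
        raise FileNotFoundError(f"submission lacks a {CONTENTS_FILE} file")
    # Never silently replace data that has already been archived.
    if target.exists():
        raise FileExistsError(f"{target} is already archived")

    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(staging, target)   # copy first, so a failure leaves staging intact
    shutil.rmtree(staging)             # then clear the temporary directory
    return target
```

Copying before deleting means an interrupted transfer leaves the author's upload untouched, at the cost of briefly holding two copies.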

5.2 I suggest that not only the raw data be stored but also the statistical and data analysis programs (SPSS or BMDP; or uncompiled Basic, Pascal or C) used to analyse them. Without these programs, tracing the transformation of the raw data into the reported statistical findings would be much more difficult.

5.3 Parallel to the archive there should be a directory for comments by people who have accessed the data, to record their findings. Anyone wanting to reexamine anyone's data would be interested in any previous reanalyses, good and bad.
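
One way to read "parallel" in 5.3 is that the comments directory mirrors the data directory under a separate root, so that anyone who knows where a data set lives also knows where to look for reanalyses. The sketch below assumes a root called /COMMENTS, which is a hypothetical name chosen for illustration.

```python
from pathlib import PurePosixPath

COMMENTS_ROOT = PurePosixPath("/COMMENTS")  # hypothetical root for the comments tree


def comments_path(data_path):
    """Map a data directory to its mirror location in the comments tree."""
    data = PurePosixPath(data_path)
    if not data.is_absolute():
        raise ValueError("data_path must be an absolute archive path")
    # Strip the leading "/" and re-root the same components under COMMENTS_ROOT.
    return COMMENTS_ROOT / data.relative_to("/")


print(comments_path("/JEPHPP/18/1/SMITH/EXP1"))  # /COMMENTS/JEPHPP/18/1/SMITH/EXP1
```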

5.4 There is no reason such a data archive could not grow to cover non-APA journals, theses, and nonpublished data (for example, unpublished negative findings).

5.5 Such a system would of course involve some cost and effort, perhaps even some inconvenience. However, with the public and congressional concern about whether scientists are maximally ensuring the integrity of their data, an ftp archive would show a commitment from the psychological community to ensuring honesty in published psychological research.

REFERENCES.

Palca, J. (1991). News and Comment: Get-the-lead-out guru challenged. Science 253: 842-844.

Stewart, W. W. & Feder, N. (1987). The integrity of the scientific literature. Nature 325: 207-214.

