Monday, October 31, 2016

Required qualities of scientific writing

In this first post, I want to visit, in a non-technical manner, some qualities of scientific works that highlight potential pitfalls in research. Keep in mind that, from now on, wherever I feel a reference/citation makes sense in the text, I either use a link or a number in brackets (e.g. [1]), which points to a numbered reference you can find at the end of the text (similar to an endnote).
 
In my years as a researcher, I have been called upon to perform experiments, support or reject hypotheses and review my work, as well as the work of others. Trying to decipher what is acceptable in science and as science, I pursued a first understanding of how things work by reading published articles. My early understanding was that a piece of scientific writing, describing some research, should have the following qualities (non-exhaustively):
  1. it should be well motivated (i.e. should try to solve a meaningful problem or add to a clear theoretic question/line of thought);
  2. it should be clear (i.e. understandable without too much effort);
  3. it should be concise (i.e. not bloated with unneeded information or - to the other extreme - lacking significant information);
  4. it should be self-sufficient (i.e. introducing the terms used, as well as the context of the problem, in an adequate manner, such that minimal information beyond the text is required);
  5. it should be innovative and correctly positioned with respect to the related work (i.e. showing what missing piece of knowledge it offers the world);
  6. it should be as unbiased as possible with respect to the experimental setup and findings (whether the latter be positive or negative);
  7. it should be useful and reusable, in that it should offer insights about things we do not fully know, allowing the findings to be reused to further pursue scientific (or applied) goals;
  8. it should provide enough information on the experiments to make them repeatable.
Unfortunately, many of the (mind you: published) papers I read held only some of the above qualities. Thus, I started to understand that not all is well in science.

My second source of scientific-method know-how was other scientists. During my PhD I received, or simply overheard, a number of pieces of "advice" on how to perform research, essentially endorsing practices that oppose the above qualities. I provide a few examples below:
  • Add complex math formulas to your presentations/articles: people like to see things they do not understand. They feel your work is worth more. Opposing qualities: clarity, conciseness
  • A dissertation should be at least/at most X pages. Opposing qualities: conciseness / self-sufficiency
  • Has anyone else tried using method M for your setting? Everyone uses method M these days! Opposing qualities: motivation, innovation, unbiased approach.
  • It is no big deal if you hack the numbers a bit. No one will notice. Opposing qualities: unbiased approach, usefulness and reusability.
  • Make your own data for testing and see how well you are doing there. This is enough, as long as you get a nice p-value in the experiments. Opposing qualities: innovation and correct positioning, unbiased approach, usefulness and reusability, repeatability.
  • Do not refer to any failures of the method; just show the strong points. Opposing qualities: usefulness and reusability, correct positioning, unbiased approach. 
Are the above pieces of advice meaningful? Do they help? Let us see what I found out myself during my practice as a researcher/reviewer/supervisor/professor, with respect to the above "advice" and how other scientists react to those who follow them:
  • When, as a reviewer, I find complex formulas (or any other unclear statement, no matter how scientific-looking) in an article without textual/intuitive support, I comment negatively and reduce the "clarity" grade.
  • When students create long dissertations without good reason, I have them rewrite the text. And that can hurt a lot...
  • When a student, or a paper I review, starts with "others used method M in other settings, so we use it in our setting", I clearly state that the positioning and the motivation of the paper are problematic and (guess what!) I reduce its "technical quality" grade.
    NOTE: I will try to offer some insight regarding scientific "hype" in another post. This hype is a very common cause of such badly motivated works.
  • When I submitted my first scientific journal paper [1], one of the reviewers actually repeated ALL my experiments and contacted me to validate the findings. Thankfully, my method was simply good, so the findings were true and validated. In other words, I found out early enough that being honest is the only thing that makes sense. If you lie, sooner or later you will be found out and, probably, humiliated, no matter the scientific status you may have attained (cf. here and here for cases of false evidence and their outcomes).
  • Working on one set of data, made by yourself, is usually not enough to get a work accepted (in most established conferences and journals). Even if your work does get accepted, the lack of reusable data minimizes its impact (i.e. few people will actually cite it). It is much better practice to put effort into creating a shareable dataset and get it out there for use by others. This is one of the best ways to see your work being reused and cited.
  • Do not count on p-values too much: they have been heavily debated and criticized lately [2,3]. This is due to a simple fact: the p-value is usually not what we expect it to be.
    NOTE: I will try to cover the "reproducibility crisis" and the problems of statistical significance in later posts.
  • Omitting the downsides of a method allows others to criticize your work as insufficient. Such debates have arisen in the past, with no good outcome. Once again, see the identification of downsides as an opportunity for your next publication (a.k.a. future work).
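To make the p-value point above concrete, here is a minimal simulation sketch (standard library only; the normal approximation to the two-sample test, and all parameter values, are my own assumptions for brevity). The same experiment, with the same true effect, is repeated many times; the resulting p-values scatter widely, which is exactly why relying on a single "nice" p-value is risky:

```python
import math
import random

random.seed(0)

def two_sample_p(n=20, effect=0.5):
    """One simulated experiment: two groups of n points, with a true
    mean difference of `effect`. Returns an approximate two-sided
    p-value (normal approximation, for illustration only)."""
    a = [random.gauss(0.0, 1.0) for _ in range(n)]
    b = [random.gauss(effect, 1.0) for _ in range(n)]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / (n - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (n - 1)
    z = (mean_b - mean_a) / math.sqrt(var_a / n + var_b / n)
    # Two-sided tail probability: P(|Z| > z) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))

# Repeat the identical experiment 1000 times: the effect never changes,
# yet the p-values range from "highly significant" to "nothing there".
p_values = [two_sample_p() for _ in range(1000)]
print("min p:", min(p_values), "max p:", max(p_values))
print("fraction 'significant' (p < 0.05):",
      sum(p < 0.05 for p in p_values) / len(p_values))
```

Running this, only some fraction of the identical experiments cross the conventional 0.05 threshold: the p-value is a property of the sample, not of the underlying truth.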
Based on the above, it was quite clear from my early research steps that "standing on the shoulders of giants" may not be enough, if the giants have clay legs. I will venture a strong claim (such claims are to be avoided in scientific writing):
Science is simply a guess of what reality is about. It also appears to be the best one we have, when it comes to measurable and observable phenomena.

This is what makes science amazingly useful. This is also why we need to be ready to surpass every claim science makes, to reach further towards the truth, when new evidence surfaces to open new ways and indicate new challenges.


References:
[1] Giannakopoulos, George, et al. "Summarization system evaluation revisited: N-gram graphs." ACM Transactions on Speech and Language Processing (TSLP) 5.3 (2008): 5.
[2] Wasserstein, Ronald L., and Nicole A. Lazar. "The ASA's statement on p-values: context, process, and purpose." The American Statistician 70.2 (2016): 129-133.
[3] Halsey, Lewis G., et al. "The fickle P value generates irreproducible results." Nature methods 12.3 (2015): 179-185.
