Artifact Website
Q&A forums are today an important tool to assist developers in programming tasks.
Unfortunately, contributions to Q&A forums are often unclear and incomplete as developers typically adopt a liberal style when posting questions and answers.
This paper evaluates the feasibility of using Docker to address that problem.
Docker is an increasingly popular lightweight solution for environment virtualization: a developer can encapsulate the reproduction of an operating environment within a “container”.
Our study is organized around four dimensions of interest — adoption resistance, technology maturity, effort, and developer’s ability.
To conduct this study, we involved professional developers, active users of StackOverflow, and students with basic training in Docker and web development, the domain of the posts we focused on.
In summary, our study indicates that Docker is most useful for a category of posts that is not uncommon: configuration posts of medium and high difficulty.
Overall, results suggest that integrating reproduction scripts in Q&A forums should be encouraged.
Given our limited resources, we restricted our analysis to one data source; we used StackOverflow for its recognized popularity. To select questions, we used Data Explorer, a service provided by Stack Exchange, a network of Q&A forums such as StackOverflow, MathOverflow, and Ask Ubuntu. The query we used is publicly available here; the result set was obtained on January 18, 2017 and is available to download here.
The query applies three selection criteria. (i) We only selected questions tagged with both the name of the framework (line 16) and the name of the programming language (line 9) we provided. We found that the framework name alone was insufficient to filter the corresponding questions, as posts related to different tools with similar names would also be captured.
. . .
EXISTS(
  SELECT * FROM PostTags
  WHERE PostTags.PostId = Posts.Id AND PostTags.TagId = @LanguageTagId
)
AND EXISTS(
  SELECT * FROM PostTags
  WHERE PostTags.PostId = Posts.Id AND PostTags.TagId = @FrameworkTagId
)
. . .
(ii) We only selected questions not marked as closed. For example, a question can be closed (by the community or the StackOverflow staff) because it appears to be a duplicate.
. . .
SELECT TOP 100 Posts.Id AS [Post Link], Answer.Score AS [Answer Score]
FROM Posts
INNER JOIN #PostIds ON Posts.Id = #PostIds.Id
INNER JOIN Posts Answer ON Answer.Id = Posts.AcceptedAnswerId
WHERE (
  -- Valid Question
  Posts.ClosedDate IS NULL AND Posts.DeletionDate IS NULL
)
ORDER BY [Answer Score] DESC
. . .
(iii) We only selected questions for which the owner of the question had accepted an answer. As we need humans to analyze questions, we set a bound of a hundred questions per framework. We prioritized the questions obtained from our result sets according to a quality estimator previously used in other StackOverflow mining studies not mentioned here. More specifically, we sorted the questions in each result set in descending order of score (the score of the accepted answer, per the ORDER BY clause of the query) and extracted the first hundred entries.
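The three selection criteria and the ranking step can be sketched in Python over an in-memory list of question records. This only illustrates the filtering logic and is not the query we ran on Data Explorer: the `Question` record, its field names, the `select_questions` helper, and the tag names in the usage example are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Question:
    """Hypothetical in-memory stand-in for a question row of the Posts table."""
    post_id: int
    tags: Set[str]
    closed: bool                          # True when ClosedDate is set
    deleted: bool                         # True when DeletionDate is set
    accepted_answer_score: Optional[int]  # None when no answer was accepted


def select_questions(questions: List[Question], language_tag: str,
                     framework_tag: str, limit: int = 100) -> List[Question]:
    """Apply criteria (i)-(iii), then keep the `limit` best-scored questions."""
    candidates = [
        q for q in questions
        if language_tag in q.tags and framework_tag in q.tags  # (i) both tags
        and not q.closed and not q.deleted                     # (ii) valid question
        and q.accepted_answer_score is not None                # (iii) accepted answer
    ]
    # Rank by the accepted answer's score, highest first (ORDER BY ... DESC).
    candidates.sort(key=lambda q: q.accepted_answer_score, reverse=True)
    return candidates[:limit]  # analogue of SELECT TOP 100


if __name__ == "__main__":
    both = {"some-language", "some-framework"}  # placeholder tag names
    sample = [
        Question(1, both, False, False, 12),
        Question(2, {"some-framework"}, False, False, 40),  # lacks the language tag
        Question(3, both, True, False, 55),                 # closed
        Question(4, both, False, False, None),              # no accepted answer
        Question(5, both, False, False, 30),
    ]
    selected = select_questions(sample, "some-language", "some-framework")
    print([q.post_id for q in selected])  # prints [5, 1]
```

Question 2 illustrates why the framework tag alone is not enough: it would be captured by a framework-only filter even though it lacks the language tag.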
To evaluate adoption resistance (RQ1), we prepared a survey and anonymized every author. We do not provide the raw e-mails, as they may contain information about the paper author(s) (e.g., name or institution) or about the Stack Overflow user (i.e., a signature that may include the name and e-mail address). However, we provide the survey metadata gathered during the process of RQ1; those data are available to download here.
This section is a work in progress; however, we make available all containers developed for the paper by both researchers and developers: the General and the Configuration-related containers. More details on how to build and run those containers will be available as soon as possible. In general, those containers can be built using
docker build -t my_container /source/path
and run using
docker run -it --rm my_container
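When several containers have to be rebuilt, the two commands above can be scripted. The following is a minimal sketch, assuming Docker is installed and available on the PATH; the helper names are hypothetical and not part of the artifact, and `my_container` and `/source/path` are the same placeholders used above.

```python
import subprocess
from typing import List


def build_command(tag: str, source_path: str) -> List[str]:
    """Return the argument list for `docker build -t <tag> <source_path>`."""
    return ["docker", "build", "-t", tag, source_path]


def run_command(tag: str) -> List[str]:
    """Return the argument list for `docker run -it --rm <tag>`."""
    # -it expects an interactive terminal; --rm removes the container on exit.
    return ["docker", "run", "-it", "--rm", tag]


def build_and_run(tag: str, source_path: str) -> None:
    """Build the image, then start a container; raise if either step fails."""
    subprocess.run(build_command(tag, source_path), check=True)
    subprocess.run(run_command(tag), check=True)
```

Calling `build_and_run("my_container", "/source/path")` from a terminal would run the two commands in sequence and stop at the first failure, since `check=True` raises `subprocess.CalledProcessError` on a non-zero exit code.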
For double-blind review purposes, the names of the author(s) are not revealed.