A Feasibility Study on the Use of Docker to Assist Q&A Forum Users: the Web Frameworks Case

Introduction

Q&A forums are today an important tool for assisting developers in programming tasks. Unfortunately, contributions to Q&A forums are often unclear and incomplete, as developers typically adopt a liberal style when posting questions and answers. This paper evaluates the feasibility of using Docker to address that problem. Docker is an increasingly popular lightweight solution for environment virtualization: a developer can encapsulate the reproduction of an operating environment within a “container”.
Our study is organized around four dimensions of interest: adoption resistance, technology maturity, effort, and developer’s ability. To conduct this study, we involved professional developers, active users of StackOverflow, and students with basic training in Docker and web development, the domain of the posts we focused on. In summary, our study indicates that Docker is most useful for a not uncommon category of posts: configuration posts of medium and high difficulty. Overall, the results suggest that integrating reproduction scripts in Q&A forums should be encouraged.

Data source

Given our limited resources, we restricted our analysis to one data source; we used StackOverflow for its recognized popularity. To select questions, we used Data Explorer, a service provided by Stack Exchange, a network of Q&A forums such as StackOverflow, MathOverflow, and Ask Ubuntu. The query we used is publicly available here; the result set was obtained on January 18, 2017 and is available to download here.

To explain our query: (i) we only selected questions tagged with both the name of the framework (line 16) and the name of the programming language (line 9) that we provided. We found that the framework name alone was insufficient to filter the corresponding questions, as posts related to different tools with similar names would also be captured.

                                  .
                                  .
                                  .
      EXISTS(
        SELECT * FROM PostTags 
        WHERE 
          PostTags.PostId = Posts.Id 
          AND PostTags.TagId = @LanguageTagId
      )
      AND
      EXISTS(
        SELECT * FROM PostTags 
        WHERE 
          PostTags.PostId = Posts.Id 
          AND PostTags.TagId = @FrameworkTagId
      )
                                  .
                                  .
                                  .
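The effect of the two EXISTS clauses can also be pictured outside SQL as a simple set check. The Python snippet below is only an illustrative sketch, not part of our query: the dictionary layout and the tag names `some-language`, `some-framework`, and `other-tool` are placeholders we assume for the example, not the tags used in the study.

```python
def has_both_tags(question_tags, language_tag, framework_tag):
    """Return True only if the question carries both tags (mirrors the two EXISTS clauses)."""
    tags = set(question_tags)
    return language_tag in tags and framework_tag in tags


# Hypothetical questions; the tag names are placeholders, not the ones used in the study.
questions = [
    {"id": 1, "tags": ["some-language", "some-framework"]},
    {"id": 2, "tags": ["some-framework", "other-tool"]},  # framework tag only
    {"id": 3, "tags": ["some-language"]},                 # language tag only
]

# Keep only the questions that carry the language tag AND the framework tag.
selected_ids = [
    q["id"]
    for q in questions
    if has_both_tags(q["tags"], "some-language", "some-framework")
]
print(selected_ids)
```

Only the first question is kept; a question that carries the framework tag without the language tag is dropped, which is how posts about similarly named tools are filtered out.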

(ii) We only selected questions not marked as closed. For example, a question can be closed (by the community or the StackOverflow staff) because it appears to be a duplicate.

                                  .
                                  .
                                  .
SELECT
  TOP 100
  Posts.Id AS [Post Link],
  Answer.Score as [Answer Score]
  FROM Posts INNER JOIN #PostIds ON Posts.Id = #PostIds.Id
  INNER JOIN Posts Answer ON Answer.Id = Posts.AcceptedAnswerId
  WHERE
    ( -- Valid Question
      Posts.ClosedDate IS NULL
      AND Posts.DeletionDate IS NULL
    )
  ORDER BY
    [Answer Score] DESC
                                  .
                                  .
                                  .

(iii) We only selected questions for which the owner of the question selected a preferred (accepted) answer; in the query, this follows from the inner join on the accepted answer. As we need humans to analyze questions, we set a bound of a hundred questions per framework. We prioritized the questions obtained from our result sets according to a quality estimator previously used in other StackOverflow mining studies not mentioned here. More specifically, we sorted the questions in the result sets in descending order of the score of their accepted answer and extracted the first hundred entries.
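Steps (ii) and (iii) can be summarized in a few lines of Python. This is a minimal sketch under stated assumptions, not the code we ran: the `Question` record and its field names are hypothetical stand-ins for the columns `ClosedDate`, `DeletionDate`, and the score of the accepted answer, and the sample data is made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Question:
    # Hypothetical record; the fields stand in for columns used by the query.
    id: int
    closed: bool                           # ClosedDate IS NOT NULL
    deleted: bool                          # DeletionDate IS NOT NULL
    accepted_answer_score: Optional[int]   # None when no answer was accepted


def select_top_questions(questions: List[Question], limit: int = 100) -> List[Question]:
    """Keep open, non-deleted questions with an accepted answer, best answers first."""
    valid = [
        q for q in questions
        if not q.closed and not q.deleted and q.accepted_answer_score is not None
    ]
    # Sort by the accepted answer's score, highest first, then cut at the bound.
    ranked = sorted(valid, key=lambda q: q.accepted_answer_score, reverse=True)
    return ranked[:limit]


# Made-up sample data, for illustration only.
sample = [
    Question(1, closed=False, deleted=False, accepted_answer_score=12),
    Question(2, closed=True, deleted=False, accepted_answer_score=40),    # closed
    Question(3, closed=False, deleted=False, accepted_answer_score=None), # no accepted answer
    Question(4, closed=False, deleted=False, accepted_answer_score=57),
    Question(5, closed=False, deleted=True, accepted_answer_score=99),    # deleted
]

top = select_top_questions(sample, limit=2)
print([q.id for q in top])
```

One difference worth noting: Python's `sorted` is stable, so ties keep their input order, whereas the SQL query does not specify an order among answers with equal scores.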

Survey Metadata

To evaluate adoption resistance (RQ1), we prepared a survey and anonymized every author. We do not provide the raw e-mails, as they may contain information about the paper author(s) (e.g., name or institution) or the Stack Overflow user information (i.e., the signature, which may include the name and e-mail address). However, we provide the survey metadata gathered during the process of RQ1; those data are available to download here.

Containers

This section is a work in progress. Nevertheless, we make available all containers developed for the paper by both researchers and developers: General and Configuration-related containers. More details on how to build and run those containers will be available as soon as possible. In general, those containers can be built using

docker build -t my_container /source/path

and you can run using

docker run -it --rm my_container
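When several containers have to be built and run in a row, the two commands can be assembled programmatically. The Python sketch below is only a suggestion under our own assumptions: the helper names are hypothetical, and `my_container` and `/source/path` are the same placeholders used above. In the commands, `-t` names the image, `-it` attaches an interactive terminal, and `--rm` removes the container once it exits.

```python
import shlex
from typing import List


def docker_build_command(image_tag: str, source_path: str) -> List[str]:
    """Argument list for building an image from the Dockerfile in source_path."""
    return ["docker", "build", "-t", image_tag, source_path]


def docker_run_command(image_tag: str) -> List[str]:
    """Argument list for an interactive run that removes the container on exit."""
    return ["docker", "run", "-it", "--rm", image_tag]


def as_shell_line(command: List[str]) -> str:
    """Render an argument list as a shell-safe command line, for display only."""
    return " ".join(shlex.quote(part) for part in command)


build_cmd = docker_build_command("my_container", "/source/path")
run_cmd = docker_run_command("my_container")

print(as_shell_line(build_cmd))
print(as_shell_line(run_cmd))

# To actually execute the commands (requires Docker and, for -it, a terminal):
# import subprocess
# subprocess.run(build_cmd, check=True)
# subprocess.run(run_cmd, check=True)
```

Passing the commands as argument lists rather than as a single string avoids quoting problems when a source path contains spaces.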

Copyright and license

For double-blind purposes, the name(s) of the author(s) will not be revealed.
