In order to examine the feasibility of using VisArchive for searching, browsing and exploring information in project archives of different domains, two case studies were conducted. The case studies involved (1) a construction project archive, and (2) a software project archive (software defect tracking). Specifically, the design principles implemented in VisArchive were examined in relation to the three context-specific design features: (1) the temporal context shows timeline based visualizations; (2) the search-relevance context shows visual indications of search relevance and matched keywords; and (3) the usage context shows visual representations of user activity in shared repositories. The evaluation focused on the feasibility of the prototype to resolve complex use scenarios, rather than simple search using known file names or IDs.
The interface of VisArchive was revised and modified based on these case studies, but the core features of the tools described in Section “System design details, reasoning structure and implementation” remained stable. Since a file contains more information than users often need, the interface for the construction project case study was designed to include a description viewer that allows users to view the details (e.g. file description or file path) when they click on a file from the information browser. The information browser for the software project case study was modified to display the summary for each software defect. Users can identify a software defect item by viewing its summary.
Case study 1: construction project archives
In this case study, VisArchive was used to search construction project archives of an educational building project. The project archive contained more than 800 files that were created during its design development phase by different individuals involved in the project. We created a project archive using 300 files that were selected based on the information available for testing the prototype. These files were stored and shared as digital copies in a central hosting server with a variety of file types such as PDF, DOC and TXT. The information, such as ID, name, path, and description of each file, was archived as searchable meta-data into a database system for archive searching, data processing and visualization by VisArchive.
This case study includes components for visualizing file access history in the archive. Since prior file access history data was not provided by the construction project archive, a synthetic file access history for a number of files in the database was created for demonstration purposes. The testing focused on searching and exploring files in the archive that users might never have accessed before, or about which they lacked specific information.
Searching files that match all the search keywords
The testing considered a real scenario in which a project manager (PM) wanted to find all “electrical”, “mechanical” and “structural” documents that an engineer (named hereafter as “Mike”) had worked on. The PM had to share them with another engineer who came on board after Mike left the company. Traditional search methods present search results as a list of files from top to bottom, with information such as file name, size, last-modified date, etc. Although existing search solutions such as Buzzsaw® place the most relevant files at the top of the list, the PM could not readily tell which keywords, or how many of them, each file matched. Accordingly, the PM might need to open each file to evaluate how relevant it is to the search keywords. It would also have been difficult for the PM to understand how these files had been produced over the course of the project. This is important because the PM needs to find files that were produced in a certain period in the project’s history.
For the scenario described above, VisArchive enables the PM to easily identify the files matching all the search keywords (“electrical,” “mechanical,” “structural,” and “Mike”) on ten different dates. These files contain all the search keywords entered and are therefore considered the most relevant to the PM’s search. Thus, VisArchive helps the PM to view the most relevant files first, more quickly than is possible when inspecting a result list manually. Finding the most relevant information efficiently is particularly critical in a large-scale construction project archive. The blue arrows in the timelines not only indicate the files in the archive, if any, that match all the search keywords, but also give users a visual overview of when these files were created during the specific project stage (Fig. 6). To help the PM retrieve the most relevant file from the search results, the information browser lets the PM view and access detailed information about each file (Fig. 7).
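The full-match search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not VisArchive's actual implementation: the record fields (`name`, `description`, `created`), the sample files and dates, and the helper names `matched_keywords` and `full_matches_by_date` are all hypothetical, and matching is simplified to a case-insensitive substring search over the file meta-data.

```python
from collections import defaultdict
from datetime import date

# Hypothetical meta-data records; names, descriptions and dates are illustrative only.
FILES = [
    {"name": "E-101.pdf",
     "description": "Electrical, mechanical and structural coordination notes by Mike",
     "created": date(2010, 3, 4)},
    {"name": "S-201.pdf",
     "description": "Structural framing plan reviewed by Mike",
     "created": date(2010, 3, 4)},
    {"name": "M-301.doc",
     "description": "Mechanical schedule",
     "created": date(2010, 4, 12)},
]

def matched_keywords(record, keywords):
    """Return the keywords found (case-insensitively) in the record's name or description."""
    text = f"{record['name']} {record['description']}".lower()
    return [kw for kw in keywords if kw.lower() in text]

def full_matches_by_date(records, keywords):
    """Group records that match every keyword by creation date.

    Each date in the result corresponds to a point on the timeline where a
    full-match marker (the blue arrow in the text) would be drawn.
    """
    by_date = defaultdict(list)
    for record in records:
        if len(matched_keywords(record, keywords)) == len(keywords):
            by_date[record["created"]].append(record["name"])
    return dict(by_date)

KEYWORDS = ["electrical", "mechanical", "structural", "Mike"]
arrows = full_matches_by_date(FILES, KEYWORDS)
```

With this sample data only `E-101.pdf` contains all four keywords, so `arrows` holds a single date. Substring matching is a deliberate simplification; a production archive would more likely rely on the full-text search of its database.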
Exploring files relevant to the search keywords
In VisArchive, the color-coded stacked bar charts in the timelines and the visual support in the information browser are designed to enable users to explore files with different levels of relevance to the search keywords. For example, the PM in the above-mentioned scenario, besides searching for the files that match all the search keywords (“electrical,” “mechanical,” “structural,” and “Mike”), was further interested in exploring other files that matched one or more of these keywords. In particular, the PM needed information about other “electrical” related files in which Mike was also involved. Existing search solutions for construction project archives make it difficult for individuals to explore the files by their relevance to the search keywords, as they generate a long list of files. Since there is no visualization of the search results, individuals must view the textual meta-information of a file to identify its creation date and matched keywords. While the file list can be arranged by either “Date” or “Relevance”, individuals cannot easily explore and browse the relevant files in the project timeline and query for information such as the following: Are there any files that match a certain number of the search keywords? When were these files created in the project timeline? Which month contains more relevant files than the others? The color-scaled visual support in both the timeline and the information browser of VisArchive allows the PM to identify these relevance-ranked files and to identify the level of relevance for each file in the information browser. For example, two files (Fig. 7 (a) and (b)) are shown as less relevant than the most relevant file (Fig. 7 (c)), but they are highlighted as more relevant than the other files found in the search results.
The associated color-coded visual panes for search keywords in the information browser allow users to distinguish files with the same relevance but different matched keywords. For example, when the PM wanted to explore electrical documents with which Mike was involved (files containing “electrical” and “Mike”), files matching other pairs of search keywords (e.g. files containing “structural” and “Mike”) are shown with the same relevance level, as in Fig. 7. With existing solutions, users would need to read extra meta-information of each file in order to differentiate between files with the same relevance level.
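The distinction between relevance level and matched keywords can be made concrete with a small sketch. It assumes that the relevance level of a file is simply the number of search keywords it matches, and that the keyword panes reflect which keywords matched; the function names and sample descriptions are hypothetical and do not come from VisArchive.

```python
from collections import defaultdict

def matched_set(text, keywords):
    """Return the set of keywords found (case-insensitively) in the text."""
    lowered = text.lower()
    return frozenset(kw for kw in keywords if kw.lower() in lowered)

def group_by_matched_set(descriptions, keywords, level):
    """Among files matching exactly `level` keywords, group file names by which keywords matched."""
    groups = defaultdict(list)
    for name, text in descriptions.items():
        hits = matched_set(text, keywords)
        if len(hits) == level:
            groups[hits].append(name)
    return dict(groups)

# Hypothetical file descriptions, for illustration only.
DESCRIPTIONS = {
    "panel-schedule.pdf": "Electrical panel schedule prepared by Mike",
    "beam-calcs.doc": "Structural beam calculations, checked by Mike",
    "duct-layout.pdf": "Mechanical duct layout",
}

KEYWORDS = ["electrical", "mechanical", "structural", "Mike"]
groups = group_by_matched_set(DESCRIPTIONS, KEYWORDS, level=2)
```

Both `panel-schedule.pdf` and `beam-calcs.doc` sit at relevance level 2, but they fall into different groups because their matched keyword sets differ. This is the information that a ranked list hides and that a per-keyword pane makes visible.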
Besides searching and exploring files in the project archive, the PM wanted to explore other information, for example, identifying the time periods in which the project archive was more active (i.e. when more files were created). The color-coded stacked bar charts on the timelines show the density of file creation, and the density of files relevant to the search keywords throughout the life of the project. Because the timelines convey information about the various activities and file types created during the project (e.g. documents such as different layout plans may have been created most frequently earlier on in the project), VisArchive can help to narrow down the time periods and the intensity of project activities in order to find the most relevant documents.
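The data behind such a stacked bar chart can be approximated as a count of files per month and per relevance level. The sketch below is an assumption about how this binning could be done, not a description of VisArchive's code: monthly bins, relevance measured as the number of matched keywords, and the sample records are all illustrative choices.

```python
from collections import Counter, defaultdict
from datetime import date

def relevance(text, keywords):
    """Number of distinct search keywords found in the text (case-insensitive)."""
    lowered = text.lower()
    return sum(1 for kw in keywords if kw.lower() in lowered)

def monthly_relevance_counts(records, keywords):
    """Map (year, month) to {relevance level: number of files created in that month}.

    Level 0 counts files that match no keyword, so the total height of a bar
    reflects overall file-creation density for the month.
    """
    bins = defaultdict(Counter)
    for created, text in records:
        bins[(created.year, created.month)][relevance(text, keywords)] += 1
    return {month: dict(counts) for month, counts in bins.items()}

# Hypothetical (creation date, description) pairs, for illustration only.
RECORDS = [
    (date(2010, 3, 4), "Electrical and structural notes by Mike"),
    (date(2010, 3, 18), "Site photos"),
    (date(2010, 3, 25), "Mechanical schedule"),
    (date(2010, 4, 2), "Structural framing plan reviewed by Mike"),
]

bars = monthly_relevance_counts(RECORDS, ["electrical", "mechanical", "structural", "Mike"])
```

Each entry of `bars` corresponds to one stacked bar: the keys are the segments (relevance levels) and the values are the segment heights. Comparing entries answers questions such as which month contains more relevant files than the others.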
Exploring file access history
The access history information of files in the construction archive is extremely important for design coordination and development. The architect on the project had placed the latest and updated version of the architectural design into a shared construction archive but was not sure whether the consultants had accessed it. The color-coded visualization of access history provides much of this information, such as the number of times the file was accessed and the type of access (e.g., opening or modifying a file). In the access history viewer (Fig. 8), each type of access is assigned a color code to improve the users’ ability to identify the file they are looking for (e.g., the one they accessed the day before or the one that was modified most recently). The access records can be filtered by the individuals who have created, accessed, and/or modified the file.
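A minimal sketch of such an access log and its filters is given below. The `AccessEvent` record, the three access types, the user names, and the helper functions are assumptions made for illustration; the case study itself used a synthetic access history whose schema is not specified here.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class AccessEvent:
    file_id: int
    user: str
    kind: str        # assumed access types: "create", "open", "modify"
    when: datetime

def filter_events(events, file_id, users=None, kinds=None):
    """Keep events for one file, optionally restricted to given users and access types."""
    return [
        e for e in events
        if e.file_id == file_id
        and (users is None or e.user in users)
        and (kinds is None or e.kind in kinds)
    ]

def opened_since_last_change(events, file_id, users):
    """Return the users who opened the file at or after its latest create/modify event."""
    changes = filter_events(events, file_id, kinds={"create", "modify"})
    if not changes:
        return set()
    cutoff = max(e.when for e in changes)
    opens = filter_events(events, file_id, users=users, kinds={"open"})
    return {e.user for e in opens if e.when >= cutoff}

# Synthetic log, for illustration only.
LOG = [
    AccessEvent(42, "architect", "create", datetime(2010, 5, 3, 9, 0)),
    AccessEvent(42, "consultant_b", "open", datetime(2010, 5, 4, 14, 0)),
    AccessEvent(42, "architect", "modify", datetime(2010, 5, 10, 16, 30)),
    AccessEvent(42, "consultant_a", "open", datetime(2010, 5, 11, 10, 15)),
    AccessEvent(7, "consultant_b", "open", datetime(2010, 5, 11, 11, 0)),
]

consultants = {"consultant_a", "consultant_b"}
type_counts = Counter(e.kind for e in filter_events(LOG, 42))
seen_by = opened_since_last_change(LOG, 42, consultants)
```

Here `type_counts` gives the per-type totals that a color-coded viewer could display, and `seen_by` answers the architect's question: in this synthetic log only `consultant_a` opened file 42 after its latest modification.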
Case study 2: defect tracking of the Mozilla Thunderbird project
The second case study expands the application of VisArchive beyond the construction project archive to explore software defect tracking in the open-source software development domain. Unlike the files in the construction project archive, the software defects in this case study were not structured into directories as digital files. Compared to the previous case study, this case study provides a test environment for searching and visualizing larger amounts of unstructured data.
The Mozilla project was started in 1998 and was intended to develop open-source software projects using the power of thousands of programmers all over the world (Mozilla project 2014). Thunderbird is the Mozilla Foundation’s next-generation email client. As the software is being used all over the world by thousands of users, software defects and issues can be found and reported by using a web-based defect tracking tool called Bugzilla (Bugzilla 2014) that also allows developers to track these issues and, eventually, fix them. The Thunderbird project archived more than 5000 software defects in Bugzilla from the beginning of the project in 2004. Around 1000 defect records from Bugzilla under the Thunderbird project between 2006 and 2007 were used for testing. Compared to the construction case study, a modified and simplified version of the VisArchive interface was used. For each software defect, the defect ID, date, and summary were used as meta-data within VisArchive.
Bugzilla allows users to search the defect archives by entering keywords and using advanced filters that are similar to the search mechanism of VisArchive. Finding relevant information over thousands of defect records in Bugzilla is a tedious process. Moreover, the search results are represented in a conventional list of defect information (Fig. 9). Although users can reorder search results alphabetically by attributes, it is difficult for users to view the relationships among different defects, and especially, to explore defects that are partially relevant to the search keywords.
The defect summary was described by the defect finder, and contained crucial information needed by a software developer or tester to recreate the defect. This kind of meta-data is hard to categorize and filter using the original Bugzilla interface. Therefore, emphasis in this case study was on finding relevant software defects by searching and exploring through the summary of software defects.
Searching defects in the software defect archives
In order to fix issues and improve the quality of software, developers need to search and find the software defects from the archives that correspond to their expertise or responsibility. In addition, time information, such as when the issues were filed, is also useful for developers to prioritize and fix them. Figure 10 shows an example of search results and visualization support that is provided by VisArchive for the software defect project. The software developer can take advantage of the timeline visualization of search results to easily identify software defects along with the date that these defects were created. With VisArchive, the developer can navigate to the time range containing the earliest and most relevant defects found (Fig. 10(a)) in the timeline and view the defect summary of the most relevant defect (Fig. 10(b)) in the information browser. Regardless of the size of the archive, the blue arrows always provide awareness of the most relevant results and accordingly offer visual cues to the users.
Exploring the defect archive and relevant software defects
If none of the defects matches the user’s requirements or users want to explore other software defects with less relevance, users may refer to the stacked bar charts in the timeline and visual panes to identify the most relevant defects in the archive and where these defects occur on the archive timelines.
Figure 11 shows an example in which a software developer searches the software defect archive with more keywords than in the example in Section “Searching defects in the software defect archives” (e.g. searching “compose,” “window,” “file” and “attachment”). The developer can easily identify the results (Fig. 11 (a)) matching all the search keywords (e.g. defects might be about “file attachment” in “compose window”) by finding the blue arrow in the timeline and the blue highlighted tickets in the defect browser. The developer may also want to explore other defects that are partially relevant to the search keywords. For example, the developer may be interested in other defects relevant to “compose window” or “file attachment” (e.g. the email “compose window” might have other issues besides those in the “file attachment” function, and the developer may want to fix those as well). Other keyword combinations (e.g. “compose attachment”, “window file”) are not terms that the developer is concerned with in this case, and thus these software defects can be ignored. By glancing at the stacked bar chart over the timeline, the developer can easily perceive how relevant the defects in the archive are to the search keywords and how these defects are distributed over the timeline in the archive. The developer can browse and explore the defects that partially match keywords by visually scanning the color panes of keywords in the defect browser, instead of reading the summary of defects (e.g. the defects containing “compose window” are interesting items (Figs. 11 (b) and 12(a)), whereas the defects containing “window file” may be disregarded (Fig. 12(b))).
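The triage that the developer performs by scanning the color panes can be expressed as a simple rule over the set of matched keywords. The sketch below is illustrative only: the defect IDs and summaries are invented rather than taken from Bugzilla, the three labels are hypothetical, and word-level matching on the summary is an assumption about how keywords are compared.

```python
import re

def matched_keywords(summary, keywords):
    """Return the keywords that occur as whole words in the defect summary."""
    words = set(re.findall(r"[a-z0-9]+", summary.lower()))
    return {kw for kw in keywords if kw.lower() in words}

def classify(summary, keywords, combos_of_interest):
    """Label a defect 'full', 'interesting', or 'ignored'.

    'full'        : every search keyword matches
    'interesting' : the matched keywords include at least one combination of interest
    'ignored'     : anything else, e.g. only "window" and "file" match
    """
    hits = matched_keywords(summary, keywords)
    if hits == set(keywords):
        return "full"
    if any(combo <= hits for combo in combos_of_interest):
        return "interesting"
    return "ignored"

KEYWORDS = ["compose", "window", "file", "attachment"]
COMBOS = [{"compose", "window"}, {"file", "attachment"}]

# Invented defect records (ID -> summary), not real Bugzilla entries.
DEFECTS = {
    1001: "Cannot add file attachment in compose window",
    1002: "Compose window loses focus after sending",
    1003: "Main window fails to open file from disk",
}

labels = {bug_id: classify(text, KEYWORDS, COMBOS) for bug_id, text in DEFECTS.items()}
```

Defect 1003 matches two keywords, the same count as defect 1002, yet it is ignored because “window” and “file” do not form a combination the developer cares about. Relevance count alone cannot separate the two; the matched keyword set can.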
Similar to the construction case study, the timelines of the defect archive give the developer an overview of how many defects in the archive match the search keywords and how closely they match. The developer is able to determine visually the dates that contain the defects that are more or less relevant to the search keywords. When used for logging new software issues, VisArchive enables software testers to search and explore whether there are similar or related issues existing in the archive. If relevant defects matching the same keywords are found in the archive, timelines enable users to determine and explore their relationship over time. As an example, a tester searches for a system bug summarized as “Removed account stays in ‘recent’ folder view.” The search keywords for this bug include “folder” and “preferences,” and the results show the bug was logged in August. The timeline visualization reveals that another defect, labelled “Unmarking folder as favourite in ‘Favourite Folders’ view doesn’t remove folder,” is also associated with the keywords of “favourites,” “folder” and “preferences.” The tester sees that this bug was logged in May and that it has since been resolved. This information could help the developer to explore whether the resolution of the earlier bug logged in May could help to resolve the similar bug detected in August.