<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>One Building</title>
<link rel="stylesheet" href="style/scatter.css" charset="utf-8">
</head>
<body>
<header id="main-header">
<div class="container">
<h1>How Many Files for One Building? <br> <a href="index.html">Data</a>, <a href="#">Method</a>, <a href="contact.html">Contact</a>.</h1>
</div>
</header>
<div class="container">
<section id="main">
<br>
<p>Most people realize that architecture is no longer designed on the drafting board with pencil and paper. Nowadays, almost all buildings are designed in a digital environment. But what exactly does that mean? How many files does it take to design one building? What kind of software is involved? How many emails does it take, and when are they sent? How much storage space does the project take up, and which files are the largest? How many files were carried over from other projects or installed with the software? How do these metrics change over the various phases of the design and construction process? <br><br>
The purpose of this project is to help people visualize and explore the files involved in creating one building. Museums and archives are only now beginning to collect born-digital architectural projects, and issues may arise (and opportunities may be lost) as institutions adapt methods and policies designed for non-digital architectural material to born-digital work. It is important for collecting institutions not only to grasp the scope of a project’s files in order to make better-informed appraisal decisions, but also to leverage and preserve the metadata automatically generated by the digital process, creating novel ways of accessing, analyzing, and understanding archival material.
<br><br>
<strong>METHODS AND RATIONALE</strong>
<br><br>
<i>Which Project?</i>
<br>
The building represented by this data is <a href="https://www.morphosis.com/architecture/127/">Emerson College Los Angeles</a>, designed by <a href="https://www.morphosis.com/">Morphosis Architects</a> and completed in 2014. As the archivist for Morphosis Architects, I have unique access to the project material and files in the state in which they were created.
This building was chosen because it was local, complex in terms of form and engineering systems, and “significant” in that it received multiple design awards, thus representing the type of project that might be collected by a museum or archive. The project was also ideal because Morphosis Architects served as the “prime” contractor, leading all phases of design and construction and coordinating the work of all sub-consulting engineers. Because Morphosis is based in Los Angeles, the firm did not need to involve a local architect for this project. Together, these conditions mean that Morphosis would hold most of the project files generated during the design and construction administration of this building, so the project offers the most comprehensive view available of a project’s size and number of files.
<br><br>
<i>And What Interface?</i>
<br>
It was important that the visualization and data be easily accessible, ideally through a web browser, with no unusual plug-ins, downloads, or account creation required. It was also important that the visualization be interactive or scalable in some way, to facilitate exploration and understanding of what I anticipated would be a large amount of data. Ideally, the material would be represented as a scatterplot, with each file charted against important values such as date and file size.
<br><br>
<i>Creating the Dataset</i>
<br>
The project files for Emerson College Los Angeles are held together on an active “archive” server at Morphosis Architects. To create a dataset for analysis and visualization, PowerShell scripts were run on the project folder from a PC connected to the server. These scripts retrieved basic metadata about each file, including file name, extension, “length” (file size in bytes), and last write date, and wrote this information to a comma-separated values (CSV) file. Altogether, the scripts identified 147,961 files in the project folder.
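<br><br>The PowerShell scripts themselves are not reproduced here. As a rough illustration of the same harvesting step, the Python sketch below walks a folder tree and writes one CSV row per file containing the four fields listed above. The function name harvest_metadata and the column headers (borrowed from PowerShell’s Name, Extension, Length, and LastWriteTime properties) are assumptions for illustration, not the scripts actually used.

```python
import csv
import os
from datetime import datetime


def harvest_metadata(root: str, out_csv: str) -> int:
    """Walk every folder under root and write one CSV row per file.

    Hypothetical helper: columns mirror the four fields described in the
    text (name, extension, length in bytes, last write time).
    Returns the number of files recorded.
    """
    count = 0
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["Name", "Extension", "Length", "LastWriteTime"])
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                info = os.stat(os.path.join(dirpath, name))
                # splitext keeps the leading dot, e.g. ".dwg"
                ext = os.path.splitext(name)[1]
                # st_mtime is the last modification (write) time, in local time here
                modified = datetime.fromtimestamp(info.st_mtime)
                writer.writerow(
                    [name, ext, info.st_size, modified.isoformat(timespec="seconds")]
                )
                count += 1
    return count
```

The output CSV should be written outside the folder being scanned; otherwise the CSV file itself is created before the walk begins and gets counted as a project file.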
<br><br>
<i>Refining the Dataset</i>
<br>
Most of the project data consisted of metadata generated automatically by the computer during file creation and use. Data for file size, extension, and last write date was uniform and did not need to be “cleaned.” However, Morphosis determined that, for the dataset hosted on the web, file names were sensitive information and should be replaced by unique identifiers, each an arbitrary, unique number combined with the label “ID.” Since file names were not important to this project, every file name was replaced in this way. An offline spreadsheet linking the unique IDs to the original file names was kept, should this be useful in future iterations of the project.
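<br><br>The exact ID format used for the web dataset is not given here. The Python sketch below assumes a hypothetical format (the prefix “ID” followed by a zero-padded counter) to show the general idea: every file row receives a fresh identifier, and the ID-to-name key is returned separately so that it can be kept offline. The function name pseudonymize is likewise hypothetical.

```python
def pseudonymize(
    rows: list[dict], id_prefix: str = "ID"
) -> tuple[list[dict], dict[str, str]]:
    """Replace each row's Name with a unique identifier.

    Returns (public_rows, key): public_rows are safe to publish, while key
    maps each new ID back to the original file name and stays offline.
    The prefix-plus-six-digit-counter format is a hypothetical choice.
    """
    public_rows = []
    key = {}
    for n, row in enumerate(rows, start=1):
        new_id = f"{id_prefix}{n:06d}"
        key[new_id] = row["Name"]
        # Copy the row so the caller's original data is left untouched
        public = dict(row)
        public["Name"] = new_id
        public_rows.append(public)
    return public_rows, key
```

IDs are assigned per file rather than per distinct name, so two files that share a name in different folders still receive different identifiers.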
<br><br>Several approaches to visualizing the data were explored, shaped by the capabilities of different visualization libraries and the researcher’s limited web development skills. Each approach placed different requirements on the dataset. Ultimately, the JavaScript library Data-Driven Documents (D3) was chosen for its interactivity, attractive graphics, and gentler learning curve. D3 scripts can load data in CSV or JSON format and use it to draw informational charts and graphics as Scalable Vector Graphics (SVG). Because SVG is a vector format, the lines, labels, points, and shapes of a chart can scale to different zoom levels and window sizes and still remain crisp, which makes for an attractive and interactive user experience.
However, D3 charts are limited in the number of data points they can render smoothly. Datasets with more than 10,000 points begin to slow down browser loading and transitions. The raw dataset for Emerson College Los Angeles, with nearly 150,000 files, would therefore be impractical to represent file by file using D3. The data needed to be consolidated in a way that retained the essential information while producing far fewer data points.
<br><br>
<i>Refining the Data Model</i>
<br>
The initial data model for this project followed directly from the information that could be collected by harvesting metadata from the project files. However, to create a smaller dataset that still conveyed the same overall picture, the data model needed to evolve to consolidate the data in meaningful ways. By testing different PivotTable configurations of the initial dataset, it was determined that summing, for each extension type on each day, both the number of files created and their total size would produce a meaningful chart while reducing the dataset to 8,692 rows.
Based on this data model, a new dataset was created recording how many files of each extension type were created each day. This new dataset was produced by filtering the initial PivotTable by extension type and copying the results into the new dataset, which was filtered by extension type as well. To copy the visible cells of one filtered table into only the visible cells of another filtered table, an Excel macro was written that placed data only in empty visible cells.
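<br><br>The same per-day, per-extension summary can also be expressed in a few lines of code. This is a Python sketch, not the Excel PivotTable and macro workflow described above; it assumes each input row has Extension, Length, and LastWriteTime fields, with LastWriteTime stored as an ISO 8601 timestamp such as 2013-05-01T09:00:00. The function name daily_totals is hypothetical.

```python
from collections import defaultdict


def daily_totals(rows: list[dict]) -> list[dict]:
    """Sum file count and total bytes for each (day, extension) pair.

    Produces one output row per extension per day instead of one row
    per file, sorted by date and then extension.
    """
    # (day, extension) -> [file count, total bytes]
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        day = row["LastWriteTime"][:10]  # "YYYY-MM-DD" prefix of an ISO timestamp
        # Assumption: lowercase so ".PDF" and ".pdf" are counted together
        ext = row["Extension"].lower()
        totals[(day, ext)][0] += 1
        totals[(day, ext)][1] += int(row["Length"])
    return [
        {"Date": day, "Extension": ext, "Files": count, "Bytes": size}
        for (day, ext), (count, size) in sorted(totals.items())
    ]
```

Because the output already has one row per extension per day, this variant needs no separate filter-and-copy step. The lowercasing of extensions is an extra assumption and could yield a different row count than the 8,692 reported for the project.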
<br> <br> To augment this information, two additional columns were added to the table: file format and file type. These columns translated the “extension” column into more meaningful information for users. File format was determined using Siegfried, a command-line file format identification tool for archivists, which analyzes the signatures and extensions of files to identify their formats. Its results can be written to a CSV file; they were added to the new, smaller dataset by matching on file extension. The second column, “file type,” was determined by researching each file format online to identify the group of files the extension typically falls under: images, 2D/3D CAD files, emails, vector drawings, spreadsheets or databases, audio/visual files, text documents, or program and system files. PDFs were given their own category, both because they were so numerous in the project and because a PDF might contain a text document, vector drawing, or 2D/3D CAD file, so PDFs did not fit easily into any single group. The small number of file types provided a useful dimension of the data that could be encoded by color in the D3 chart.
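<br><br>Attaching the two descriptive columns amounts to a lookup keyed on extension. In the Python sketch below, format_by_extension stands in for Siegfried’s results after they have been reduced to one format name per extension, and the extension-to-group table is an illustrative subset written for this example, not the table compiled for the project. The names FILE_TYPE_BY_EXTENSION and annotate, and the “Unknown” fallback, are hypothetical.

```python
# Illustrative subset only, using the groups named in the text;
# the project's actual extension-to-group table is not reproduced here.
FILE_TYPE_BY_EXTENSION = {
    ".jpg": "Images",
    ".tif": "Images",
    ".dwg": "2D/3D CAD Files",
    ".3dm": "2D/3D CAD Files",
    ".msg": "Emails",
    ".ai": "Vector Drawings",
    ".xlsx": "Spreadsheets or Databases",
    ".mov": "Audio/Visual Files",
    ".docx": "Text Documents",
    ".dll": "Program and System Files",
    ".pdf": "PDFs",  # PDFs kept as their own category
}


def annotate(rows: list[dict], format_by_extension: dict[str, str]) -> list[dict]:
    """Return copies of the aggregated rows with "Format" and "Type" added.

    Both columns are looked up by lowercased extension; extensions missing
    from either table are labelled "Unknown" (a hypothetical placeholder).
    """
    annotated = []
    for row in rows:
        ext = row["Extension"].lower()
        annotated.append(
            {
                **row,
                "Format": format_by_extension.get(ext, "Unknown"),
                "Type": FILE_TYPE_BY_EXTENSION.get(ext, "Unknown"),
            }
        )
    return annotated
```

Keeping the group table small is what makes the “Type” column usable as a color encoding: each value maps to one color in the chart’s legend.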
</p>
</section>
</div>
<footer id="mainfooter">
<p><a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/80x15.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>. 2018 Nicole Meyer</p>
</footer>
</body>
</html>