Data
Skills Lab 4 has been exploring how to bring outside resources into your visualizations. This has included fonts, images, and the user themselves. In this next tutorial, we will bring in external data. For this exercise, I am providing code which draws a visualization of fuel economy. Each section of this tutorial will dissect one piece of this code. Note that this may be a little verbose given the simplicity of the graphic, but I want to demonstrate some patterns you might consider for your work!
Motivation
If we think back to Lecture 5, we used fuel economy standards as a way to redo an example 3D visualization a few different ways. One option focused on reducing chartjunk and increasing the data-ink ratio. This involved making a scatterplot with a twist: we used text instead of circles as a form of direct labeling. In this tutorial, we are going to build that graphic again in Sketchingpy. This will help demonstrate not just how to bring data into our work but also what is gained by taking control of the drawing as opposed to using a pre-built chart.
Prepare
We need a few resources to make this work:
- We will again use PublicSans-Regular as an OTF file.
- We will add in fuel economy standard targets as a CSV file.
Alternatives to CSV
There's no shortage of file formats out there. We are going to stay focused on CSV as it is likely the most common type you will run into. However, we will explore data which doesn't fit into a table easily later in this course. In particular, we will spend some time with JSON with complex nested data as well as geospatial information for making maps.
Load Data
This tutorial will use Sketchingpy for file access. Let's take a look at how my sketch does it.

```python
data_layer = sketch.get_data_layer()
data_raw = data_layer.get_csv('fuel_standards.csv')
changes = [parse_fuel_standard_change(x) for x in data_raw]
```

We open a CSV file by providing its name and then call parse_fuel_standard_change on each row to make sure the data I'm expecting to see are there with the right attributes and types (number, string, etc). We will take a closer look at that in a moment. In case this notation is unfamiliar, I could have rewritten the last line, which is a list comprehension, like so:
```python
changes = []
for datum in data_raw:
    parsed_datum = parse_fuel_standard_change(datum)
    changes.append(parsed_datum)
```

List comprehensions, as well as map, filter, and reduce, might be worth exploring. They are commonly used tools for data manipulation in visualizations, and these patterns are also available in many other programming languages.
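To make map, filter, and reduce a little more concrete, here is a small sketch that runs on its own. The rows are illustrative stand-ins shaped like the dictionaries get_csv returns. They are not the contents of fuel_standards.csv. parse_standard is a hypothetical simplified parser that returns a (year, mpg) tuple instead of a FuelStandardChange.

```python
from functools import reduce

# Illustrative rows: string keys mapping to string values, like a parsed CSV.
data_raw = [
    {'year': '1978', 'fuelStandardMpg': '18.0'},
    {'year': '1985', 'fuelStandardMpg': '27.5'},
]


def parse_standard(raw_datum):
    """Hypothetical stand-in parser returning a (year, mpg) tuple."""
    return (int(raw_datum['year']), float(raw_datum['fuelStandardMpg']))


# List comprehension: one parsed value per row.
parsed = [parse_standard(x) for x in data_raw]

# map applies the same function to every row; list() collects the lazy result.
parsed_with_map = list(map(parse_standard, data_raw))

# filter keeps only the items for which the function returns True.
after_1980 = list(filter(lambda pair: pair[0] > 1980, parsed))

# reduce folds the list into a single value, here the highest standard seen.
highest = reduce(lambda best, pair: max(best, pair[1]), parsed, 0.0)
```

In Python, the comprehension form is usually preferred for simple transformations. map and filter are worth recognizing because the same names appear in JavaScript and many other languages.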
Alternatives to Sketchingpy for data access
If you have worked with Python prior to this course, you probably have loaded data from files before. Perhaps you used csv from the standard library or maybe you used an external library like Pandas. You are welcome to use those tools in this course but there is a catch.
One challenge for interactive data visualization is that your work might run across multiple platforms. This might mean different operating systems like Linux, but it also includes running as a component inside the sketchbook, embedded within a web page, or outside the browser altogether as a standalone app. Accessing files can be a little different in each case. For example, when running embedded within a web page outside the sketchbook, you might have to make requests through the network to access resources instead of just calling open or pandas.read_csv.
Sketchingpy takes care of all of this complexity by determining where the sketch is running and accessing files accordingly. This means that the same code which runs as a standalone "desktop application" can also run from inside a browser without modification. You certainly don't need to use Sketchingpy to open files, but it is still recommended.
Within the context of this class, this Sketchingpy approach means that your final project can be run from a web browser more easily. This can help you send your work to your friends or include it on your own personal website for prospective employers! If you use something else, you may have to add a little extra code to handle network stuff when your code either leaves the sketchbook (which emulates a local file system for you) or when your sketch departs a desktop application for the web. However, we can worry about all of those details when we get to the final.
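For comparison, here is a minimal sketch of the standard library route mentioned above. csv.DictReader produces rows shaped like the dictionaries parse_fuel_standard_change expects, with column names mapping to string values. So that it runs anywhere, it reads from an in-memory string. In a desktop sketch you would pass an open file instead, and that is exactly the part that may not carry over to the web. The sample text is illustrative and is not the actual fuel_standards.csv.

```python
import csv
import io

# Illustrative CSV text using the column names read by parse_fuel_standard_change.
SAMPLE_CSV = """year,fuelStandardMpg
1978,18.0
1985,27.5
"""


def read_rows(text_stream):
    """Read CSV rows as dictionaries mapping column name to string value."""
    return list(csv.DictReader(text_stream))


# io.StringIO stands in for something like open('fuel_standards.csv').
rows = read_rows(io.StringIO(SAMPLE_CSV))
```

Note that every value comes back as a string, which is why the casts in the next section are needed.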
Holding Data
Let's next look at how I parse each data point:

```python
def parse_fuel_standard_change(raw_datum):
    """Parse a raw CSV row as a FuelStandardChange.

    Args:
        raw_datum: Raw CSV row to parse.

    Returns:
        The row represented as a FuelStandardChange.
    """
    year = int(raw_datum['year'])
    standard = float(raw_datum['fuelStandardMpg'])
    return FuelStandardChange(year, standard)
```

Each row of the table inside the CSV file is represented as a dictionary mapping from string keys to string values. We do a "cast" here (int and float), which converts from a string to a number, because we want our years like 1985 to be integers and our fuel standards like 27.5 mpg to be floating point values with decimal components. Finally, let's look at the class I'm using to hold each data point:
```python
class FuelStandardChange:
    """Object representing a change in fuel standards.

    Record of a change in fuel standards with a year in which that standard
    came into force and the standard level.
    """

    def __init__(self, year, standard):
        """Make a new record of a standard change.

        Args:
            year: The year like 1985 in which the standard came into force.
            standard: The standard starting in the year as miles per gallon.
        """
        self._year = year
        self._standard = standard

    def get_year(self):
        """Get the year in which the standard came into force.

        Returns:
            The year like 1985 in which the standard came into force.
        """
        return self._year

    def get_standard(self):
        """Get the new fuel economy standard.

        Returns:
            The standard starting in this year as miles per gallon.
        """
        return self._standard
```

I should mention that some developers wouldn't add this class and would instead draw from the dictionaries directly. While the choice in your own code is up to you, I personally like having custom classes because:
- I can add docstrings describing what each attribute of the data is.
- I can check that all of the expected attributes are found in the file.
- I can signal how the data should be used. For example, because I only have get methods (and not set methods), I can help ensure that those data aren't accidentally changed somewhere in my code.
- I can perform calculations on data (like unit conversions) if needed.
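As one way to act on the second point, here is a hypothetical variant of the parser that checks for the expected columns before building the record. The original parse_fuel_standard_change would already raise a KeyError on a missing column. This version just fails with a clearer message. REQUIRED_KEYS and parse_fuel_standard_change_checked are illustrative names, and the class is a trimmed copy of the one above so the block runs on its own.

```python
class FuelStandardChange:
    """Trimmed copy of the read-only record class above."""

    def __init__(self, year, standard):
        self._year = year
        self._standard = standard

    def get_year(self):
        return self._year

    def get_standard(self):
        return self._standard


# Hypothetical constant listing the columns each CSV row must have.
REQUIRED_KEYS = ('year', 'fuelStandardMpg')


def parse_fuel_standard_change_checked(raw_datum):
    """Hypothetical parser that validates a raw CSV row before parsing it."""
    missing = [key for key in REQUIRED_KEYS if key not in raw_datum]
    if missing:
        raise ValueError('Row is missing columns: %s' % ', '.join(missing))

    # Cast from strings to numbers, as in parse_fuel_standard_change.
    return FuelStandardChange(
        int(raw_datum['year']),
        float(raw_datum['fuelStandardMpg'])
    )
```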
More on software architecture
If you have taken software architecture coursework, you may recognize FuelStandardChange as a "data model object" or the Model (M) in the Model-View-Presenter (MVP) pattern. In this context, the name I use for the FuelStandardVizPresenter class might make more sense. The "view" would actually be the Sketchingpy Sketch2D itself.
Separately, you may have seen namedtuple or data classes in other resources. These can also be great options though I still prefer having custom objects for three reasons:
- Through the methods on my class, I can tell those using my objects (like other developers on my project) that they can get values but can't set or change them.
- I can provide docstrings for each variable through the method used to get it. Though there are options to attach annotations to instance variables, they aren't as common and there are some systems that won't support them.
- I can build in calculations that need to be done on values when someone calls a getter. For example, we could have a get_as_mpg for miles per gallon and get_as_kml for kilometers per litre with one performing the units conversion before returning a value. We will see an example of this later in the course.
These three objectives are sometimes more difficult to achieve with these other tools. All that said, these are just ideas for you to consider. For our homework, any of these approaches are welcome!
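Here is a minimal sketch of the get_as_mpg / get_as_kml idea from the third point. This is a hypothetical extension: the class used in this tutorial only has get_standard. It assumes the stored value is in miles per US gallon, and it converts using 1.609344 km per mile and 3.785411784 litres per US gallon.

```python
# Kilometers per litre equivalent to one mile per US gallon.
KML_PER_MPG = 1.609344 / 3.785411784


class FuelStandardChange:
    """Hypothetical record with getters in two different units."""

    def __init__(self, year, standard_mpg):
        self._year = year
        self._standard_mpg = standard_mpg

    def get_year(self):
        """Get the year like 1985 in which the standard came into force."""
        return self._year

    def get_as_mpg(self):
        """Get the standard in miles per US gallon, as stored."""
        return self._standard_mpg

    def get_as_kml(self):
        """Get the standard in kilometers per litre, converted on request."""
        return self._standard_mpg * KML_PER_MPG


change = FuelStandardChange(1985, 27.5)
```

Callers never need to know which unit is stored internally, and there is still no setter, so the record stays read-only by convention.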
Position
Now that we have our data, we need to position it graphically on the page. We have two scales:
- The horizontal scale represents different years.
- The vertical scale represents miles per gallon.
```python
def _get_x(self, year):
    """Get horizontal position at which a year's results should be drawn.

    Get horizontal position at which a year's results should be drawn
    where the minimum year is at the left side (LEFT_PAD) and the maximum
    year is at the right side (WIDTH - RIGHT_PAD) which corresponds to the
    minimum and maximum x coordinate of the chart body respectively.

    Args:
        year: The integer year for which an x position is requested.

    Returns:
        The x position in pixels at which the year should be drawn.
    """
    year_range = END_YEAR - START_YEAR
    percent_offset = (year - START_YEAR) / year_range

    working_width = WIDTH - LEFT_PAD - RIGHT_PAD
    pixel_offset = percent_offset * working_width

    return LEFT_PAD + pixel_offset
```

The logic for both the x and y positioning is similar.
- First, we figure out how far into the scale a value is in terms of a percentage. In other words, how far between the minimum value for a scale and a maximum value for a scale is the value we are trying to place?
- Then, we convert that percentage back to pixels. In other words, if a value we are placing is 75% of the way from the minimum value to the maximum value, we place it 75% of the way between the minimum pixel coordinate and the maximum pixel coordinate.
- Finally, we offset depending on where the scale starts (see LEFT_PAD).
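The three steps above can be collected into one general-purpose helper. map_value is a hypothetical name used here for illustration; Processing and P5 offer a similar built-in called map. The numbers in the worked example are made up: a domain of 1980 to 1990 drawn between pixels 100 and 500.

```python
def map_value(value, domain_min, domain_max, range_min, range_max):
    """Linearly map value from [domain_min, domain_max] to [range_min, range_max]."""
    # Step 1: how far through the domain is the value, as a fraction from 0 to 1?
    percent_offset = (value - domain_min) / (domain_max - domain_min)

    # Step 2: scale that fraction to the size of the pixel range.
    pixel_offset = percent_offset * (range_max - range_min)

    # Step 3: shift by where the pixel range starts (like adding LEFT_PAD).
    return range_min + pixel_offset


# 1985 is halfway through 1980-1990, so it lands halfway between 100 and 500.
halfway = map_value(1985, 1980, 1990, 100, 500)

# 1987.5 is 75% of the way through, so it lands 75% of the way: 100 + 300.
three_quarters = map_value(1987.5, 1980, 1990, 100, 500)
```

With these names, _get_x is roughly map_value(year, START_YEAR, END_YEAR, LEFT_PAD, WIDTH - RIGHT_PAD).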
```python
def _get_y(self, standard):
    """Get vertical position at which a standard's results should be drawn.

    Get vertical position at which a standard's results should be drawn
    where the minimum standard is at the bottom side (HEIGHT - BOTTOM_PAD)
    and the maximum standard is at the top side (TOP_PAD) which corresponds
    to the maximum and minimum y coordinate of the chart body respectively.
    Note that, to have larger values at higher positions, the smallest
    standards in mpg are at the largest y coordinates.

    Args:
        standard: The float standard (mpg) for which a y position is
            requested.

    Returns:
        The y position in pixels at which the standard should be drawn.
    """
    standard_range = MAX_STANDARD - MIN_STANDARD
    percent_offset = (standard - MIN_STANDARD) / standard_range
    percent_offset_reverse = 1 - percent_offset

    working_height = HEIGHT - TOP_PAD - BOTTOM_PAD
    pixel_offset = percent_offset_reverse * working_height

    return TOP_PAD + pixel_offset
```

The complication here is that y coordinates increase as you go from the top to the bottom of the sketch's canvas. So, we reverse the percentage for miles per gallon: instead of going from 0% to 100%, it goes from 100% to 0%. In other words, if a value we are placing is 75% of the way from the minimum value to the maximum value, we place it 25% (100% - 75%) of the way between the minimum pixel coordinate and the maximum pixel coordinate.
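Another way to see the reversal is as swapping the pixel endpoints. Mapping onto "bottom to top" instead of "top to bottom" gives the same answer as computing 1 - percent. This standalone sketch checks that equivalence. The function names and the 15 to 30 mpg range drawn between y = 50 and y = 350 are hypothetical.

```python
def map_value(value, domain_min, domain_max, range_min, range_max):
    """Linearly map value from one range to another (range may run backwards)."""
    percent_offset = (value - domain_min) / (domain_max - domain_min)
    return range_min + percent_offset * (range_max - range_min)


def get_y_reversed(standard, min_standard, max_standard, top, bottom):
    """Mirror of _get_y: reverse the percentage, then offset from the top."""
    percent_offset = (standard - min_standard) / (max_standard - min_standard)
    percent_offset_reverse = 1 - percent_offset
    return top + percent_offset_reverse * (bottom - top)


def get_y_swapped(standard, min_standard, max_standard, top, bottom):
    """Same result by mapping onto the pixel range from bottom to top."""
    return map_value(standard, min_standard, max_standard, bottom, top)
```

Either form works. Swapping the endpoints is handy when you already have a generic scale helper.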
Axes and Title
Since we are direct labeling, we don't have much "chrome" to draw for our visualization. This helps us improve our data-ink ratio! However, it is important that we tell the user where our y scale starts and ends so that they know we aren't doing anything funny.

```python
def _draw_axis(self, sketch):
    """Draw the left side axis which clarifies start / end standards.

    Args:
        sketch: The sketch in which to draw the axis.
    """
    sketch.push_style()

    sketch.set_text_font(FONT, SMALL_SIZE)

    # Label and tick mark for the bottom of the scale.
    min_y = self._get_y(MIN_STANDARD)

    sketch.clear_stroke()
    sketch.set_fill(LIGHT_COLOR)
    sketch.set_text_align('left', 'top')
    min_str = MIN_ANNOTATION % MIN_STANDARD
    sketch.draw_text(LEFT_PAD, min_y, min_str)

    sketch.set_stroke(LIGHT_COLOR)
    sketch.clear_fill()
    sketch.draw_line(LEFT_PAD, min_y, LEFT_PAD + 20, min_y)

    # Label and tick mark for the top of the scale (the goal).
    max_y = self._get_y(MAX_STANDARD)

    sketch.clear_stroke()
    sketch.set_fill(LIGHT_COLOR)
    sketch.set_text_align('left', 'bottom')
    goal_str = GOAL_ANNOTATION % MAX_STANDARD
    sketch.draw_text(LEFT_PAD, max_y - 1, goal_str)

    sketch.set_stroke(LIGHT_COLOR)
    sketch.clear_fill()
    sketch.draw_line(LEFT_PAD, max_y, LEFT_PAD + 20, max_y)

    sketch.pop_style()
```

We push the style at the start of the method and pop it at the end so that our style changes only apply within this method. Note that we also take this opportunity to further contextualize the data by describing why the scale ends at 27.5 mpg: we identify it as the legislative target set by the EPCA. Finally, we can put a chart title up at the top.
```python
def _draw_title(self, sketch):
    """Draw the visualization title.

    Args:
        sketch: The sketch in which the title should be drawn.
    """
    sketch.push_style()

    sketch.clear_stroke()
    sketch.set_fill(DARK_COLOR)
    sketch.set_text_align('left', 'bottom')
    sketch.set_text_font(FONT, HUGE_SIZE)
    sketch.draw_text(LEFT_PAD, TOP_PAD - 18, TITLE)

    sketch.pop_style()
```

The title contextualizes the chart. It is pretty neutral right now but, depending on the purpose of the graphic, we could use it to offer additional insight with something like "Pathway to 27.5 mpg".
Other technologies beyond Sketchingpy
The patterns you see here with Sketchingpy are common to similar technologies like Processing, P5, or HTML5 Canvas. These other technologies also have set_fill, set_stroke, draw_line, draw_text, or equivalents. Likewise, pushing at the start of a function and popping at the end is a pattern you'll find in code outside Python too. I also want to mention that this example does not use translate and rotate, but those are common in other libraries as well.
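To see why pairing every push with a pop matters, here is a toy model of a style stack in plain Python. StyleStack and its attributes are hypothetical. They only illustrate the idea and are not how Sketchingpy, Processing, or Canvas implement it. Canvas, for example, exposes the same concept through save and restore on its drawing context.

```python
class StyleStack:
    """Toy model of push_style / pop_style (not Sketchingpy's implementation)."""

    def __init__(self):
        self.fill = 'black'
        self.stroke = 'black'
        self._saved = []

    def push_style(self):
        # Remember the current settings so they can be restored later.
        self._saved.append((self.fill, self.stroke))

    def pop_style(self):
        # Restore whatever was active at the matching push.
        self.fill, self.stroke = self._saved.pop()

    def set_fill(self, color):
        self.fill = color

    def clear_stroke(self):
        self.stroke = None


def draw_title(sketch):
    """Changes style freely, but only between its own push and pop."""
    sketch.push_style()
    sketch.clear_stroke()
    sketch.set_fill('#333333')
    # ... drawing calls would go here ...
    sketch.pop_style()


sketch = StyleStack()
draw_title(sketch)
```

After draw_title returns, the caller's fill and stroke are exactly as they were before, so the next method does not inherit a surprise style.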
Data Glyphs
We are finally ready to draw the actual data points themselves.

```python
def _draw_change(self, sketch, change, align, highlight, annotation):
    """Draw an individual change in fuel economy standards.

    Args:
        sketch: The sketch in which the change should be drawn.
        change: The FuelStandardChange to draw.
        align: String describing the horizontal alignment to use when
            drawing this change.
        highlight: Flag that indicates if this year should be drawn in high
            contrast "highlight" styling. True if highlight should be used
            and False otherwise.
        annotation: The annotation to display next to this year's results
            or None if no annotation should be added.
    """
    sketch.push_style()

    sketch.clear_stroke()
    if highlight:
        sketch.set_fill(DARK_COLOR)
    else:
        sketch.set_fill(LIGHT_COLOR)

    year = change.get_year()
    standard = change.get_standard()

    x = self._get_x(year)
    y = self._get_y(standard)

    # Year label, bottom-aligned above the large standard value.
    sketch.set_text_font(FONT, SMALL_SIZE)
    sketch.set_text_align(align, 'bottom')
    sketch.draw_text(x, y - LARGE_SIZE / 2 - 2, str(year))

    # The standard itself, vertically centered on its y position.
    sketch.set_text_font(FONT, LARGE_SIZE)
    sketch.set_text_align(align, 'center')
    sketch.draw_text(x, y, '%.1f' % standard)

    # Optional annotation, top-aligned a little below the y position.
    if annotation:
        sketch.set_text_font(FONT, SMALL_SIZE)
        sketch.set_text_align(align, 'top')
        sketch.draw_text(x, y + SMALL_SIZE / 2 + 2, annotation)

    sketch.pop_style()
```

There are a lot of different options available for drawing these data points: align, highlight, and annotation. However, this is also where we see the power of building graphics from the ground up with something like Sketchingpy: we can configure each piece of the graphic very precisely.
Using transformation instead
Note that it is common for some designers to use translate so, for example, instead of this:
```python
sketch.draw_ellipse(get_x(year) - 5, get_y(val) - 5, 5, 5)
sketch.draw_ellipse(get_x(year) + 5, get_y(val) + 5, 5, 5)
```
You might instead see this:
```python
sketch.push_transform()
sketch.translate(get_x(year), get_y(val))
sketch.draw_ellipse(-5, -5, 5, 5)
sketch.draw_ellipse(5, 5, 5, 5)
sketch.pop_transform()
```

This can really simplify code where you have to do a lot of drawing relative to a particular coordinate.
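Conceptually, translate adds the same offset to every coordinate drawn until the transform is popped. The plain-Python sketch below checks that the two snippets above place the ellipses at the same spots. translate_points is a hypothetical helper, not a Sketchingpy function, and the center (120, 80) stands in for (get_x(year), get_y(val)).

```python
def translate_points(origin, relative_points):
    """Convert points relative to origin into absolute canvas coordinates."""
    origin_x, origin_y = origin
    return [(origin_x + x, origin_y + y) for (x, y) in relative_points]


center = (120, 80)  # hypothetical result of (get_x(year), get_y(val))

# First snippet: the offset is added by hand in every call.
absolute = [(center[0] - 5, center[1] - 5), (center[0] + 5, center[1] + 5)]

# Second snippet: draw relative to (0, 0) and shift everything once.
translated = translate_points(center, [(-5, -5), (5, 5)])
```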
Putting numbers into text
One thing I want to point out here is the use of format strings like %.2f. I am using the "old style" because, for those with prior programming background, it's more likely you will have seen something like %.2f, as it exists in many different programming languages. That will do great for this course! However, I encourage you to look at additional information about string formatting in Python. There are other alternatives available!
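For a quick comparison, here are three equivalent ways to format the standard with one decimal place, as _draw_change does. These are standard Python features. f-strings require Python 3.6 or newer.

```python
standard = 27.5

old_style = '%.1f' % standard               # printf-style, used in this tutorial
format_method = '{:.1f}'.format(standard)   # str.format
f_string = f'{standard:.1f}'                # f-string (Python 3.6+)

# The number after the dot controls how many decimal places are shown.
two_places = '%.2f' % standard
```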
Conditional Formatting
Our _draw_change method takes quite a few parameters for highlighting and annotation that change based on the attributes of the data being drawn. This is sometimes called conditional formatting. Let's take a closer look at how those are determined.

```python
def draw(self, sketch):
    """Draw the visualization.

    Args:
        sketch: The sketch in which to draw the visualization.
    """
    self._draw_title(sketch)
    self._draw_axis(sketch)

    for change in self._changes:
        year = change.get_year()
        align = self._determine_align(year)
        highlight = self._determine_highlight(year)
        annotation = self._determine_annotation(year)
        self._draw_change(sketch, change, align, highlight, annotation)
```

For our purposes, all of this conditional formatting and labeling is based on the year. Specifically, we want to show additional information on the start and end years. Admittedly, it might be a little excessive to have this logic outside of _draw_change itself. However, I find that pulling this logic out can improve readability even if the resulting methods are small. Furthermore, if the logic for annotation gets more complicated later, it may be helpful to have it in its own space to avoid cluttering the main draw methods too much.
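_determine_align and _determine_highlight are not shown in this tutorial, so the following is only a guess at what they might look like. They are written as standalone functions with hypothetical START_YEAR and END_YEAR values. The idea behind the alignment rule is also an assumption: right-aligning the last year would keep its text from running off the right edge of the canvas.

```python
START_YEAR = 1978  # hypothetical values for illustration
END_YEAR = 1985


def determine_align(year):
    """Hypothetical: right-align the final year so its label stays on canvas."""
    return 'right' if year == END_YEAR else 'left'


def determine_highlight(year):
    """Hypothetical: draw only the first and last years in high contrast."""
    return year in (START_YEAR, END_YEAR)
```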
Using dictionaries for conditional formatting
Let's take a closer look at one of these determine methods:
```python
def _determine_annotation(self, year):
    """Determine what annotation if any should be added for the given year.

    Args:
        year: The year for which an annotation should be returned.

    Returns:
        Annotation text to display next to results for the given year or
        None if no annotation should be added.
    """
    if year == START_YEAR:
        return START_YEAR_ANNOTATION
    elif year == END_YEAR:
        return END_YEAR_ANNOTATION
    else:
        return None
```
One pattern you may encounter is changing this to a dictionary which can help condense the code:
```python
def _determine_annotation(self, year):
    """Determine what annotation if any should be added for the given year.

    Args:
        year: The year for which an annotation should be returned.

    Returns:
        Annotation text to display next to results for the given year or
        None if no annotation should be added.
    """
    options = {
        START_YEAR: START_YEAR_ANNOTATION,
        END_YEAR: END_YEAR_ANNOTATION
    }
    return options.get(year, None)
```
Here, the second parameter to get is the default value to use if the key is not found.
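Here is a standalone, runnable version of the dictionary approach. The constant values and annotation text are hypothetical. One small design choice differs from the method above: the dictionary is built once at module level instead of on every call. Also note that dict.get already returns None when the key is missing, so passing None as the default is optional, though it makes the intent explicit.

```python
START_YEAR = 1978  # hypothetical constants for illustration
END_YEAR = 1985
START_YEAR_ANNOTATION = 'Standards begin'
END_YEAR_ANNOTATION = 'Legislative target'

# Built once rather than every time the function is called.
ANNOTATIONS = {
    START_YEAR: START_YEAR_ANNOTATION,
    END_YEAR: END_YEAR_ANNOTATION
}


def determine_annotation(year):
    """Return the annotation for a year or None if there isn't one."""
    return ANNOTATIONS.get(year, None)
```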
Constants
We use constants throughout the code. As far as Python is concerned, these all-caps variables are the same as any other variable. By convention, however, developers understand that all caps means the value of these variables is not expected to change during the execution of a program. I like keeping a big list of constants so that I can quickly tweak values and see how different options feel. That said, sometimes it can be a good idea to derive these values from the data themselves. For example:

```python
years = [change.get_year() for change in fuel_standard_changes]
start_year = min(years)
end_year = max(years)
```

If you go this route, you might consider doing something similar for min_standard and max_standard.
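Extending that snippet to both scales might look like the following sketch. get_extents is a hypothetical helper, the class is a trimmed copy of FuelStandardChange so the block runs on its own, and the three records are illustrative.

```python
class FuelStandardChange:
    """Trimmed copy of the read-only record class above."""

    def __init__(self, year, standard):
        self._year = year
        self._standard = standard

    def get_year(self):
        return self._year

    def get_standard(self):
        return self._standard


def get_extents(changes):
    """Compute (start_year, end_year, min_standard, max_standard) from data."""
    years = [change.get_year() for change in changes]
    standards = [change.get_standard() for change in changes]
    return min(years), max(years), min(standards), max(standards)


fuel_standard_changes = [
    FuelStandardChange(1978, 18.0),
    FuelStandardChange(1981, 22.0),
    FuelStandardChange(1985, 27.5),
]
start_year, end_year, min_standard, max_standard = get_extents(fuel_standard_changes)
```

You may still want to adjust values derived this way, for example by rounding the minimum down so the axis label starts at a round number.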
Personal advice for constants
I tend to go through a few visualizations before I find one that fits well. With that in mind, I will often keep a lot in constants at first while I explore different forms, and I only make the code responsive to the data once I know there's a need. After all, sometimes the graphic won't be redrawn with new data! However, this isn't universally accepted advice.
Reflection
Go ahead and run the sketch if you haven't already. There is a question worth asking as we finish up: when is taking this more manual approach to drawing visualizations the right choice? The answer depends a bit on the context:
- How would you produce the chart in something like your spreadsheet software?
- Would using a pre-built chart be faster or, alternatively, at what point of overriding defaults is it easier to build it from the ground up?
- Does the pre-built chart force you to make a certain design choice that you don't think is right?
- Common charts: Some charts you will make are very common. While the default settings might not always be the best, often going beyond the pre-built graphic isn't worth the extra time when you just need something like a simple scatterplot. Instead, you can use your knowledge from this course to decide on the right representations and to make key tweaks to the pre-built graphics.
- Uncommon charts: Still, some charts you build won't have been made before, so there won't be a pre-built option! Alternatively, some graphics like a slopegraph might not be well supported in the graphing software you are using, so they might require custom drawing regardless.
- High value visualizations: Even if there is a pre-built chart, this manual approach might give you the control you need to both consider and tailor each part of the graphic to exactly the right configuration. This could be necessary for graphics that really matter. In particular, when I use pre-built graphics for these "high value charts," I find myself overriding so many defaults to reduce chartjunk or improve the data-ink ratio that I end up going to Sketchingpy to get the chart how I want it.
- Customization labor: Some of the tricks we explore around direct labeling or certain annotations might be difficult to configure in some packages. We will soon see more tricks that, while possible, can sometimes take more work to achieve in pre-built graphics than doing it on our own. This can affect how much data density you can achieve or whether the chart is easy to read.
- Interactivity: When we get into interactive visualizations in particular, you might find that creating specific user actions is difficult in pre-built charts. What should happen when a user hovers on a data point in a scatterplot depends a lot on the context of the chart and what other charts are nearby. The original developer of the pre-built graphic might have had a different use case in mind when they developed the default interactivity.
Other graphics and creative programming
We are focused on data visualization in this course. However, a lot of what you learn could translate to other forms of creative coding from video games to visual art. In particular, some of the patterns we are using in our code are common to other forms of graphical programming, including different libraries and programming languages. The ideas and structures of Sketchingpy are commonly seen elsewhere. Furthermore, some of our techniques can cross over into other design disciplines. These skills may be relevant in your other endeavors!
Next
This concludes Skills Lab 4. We will return to regular in-person instruction for Lecture 11. See you then for a favorite: Cleveland and McGill!
Citations
- A. Pottinger, "Sketchingpy." Sketchingpy Project, 2024. [Online]. Available: https://sketchingpy.org/
- New York Times, "Economic Scene," New York Times Corporation, Aug 8, 1978. [Online]. Available: https://www.nytimes.com/1978/08/09/archives/economic-scene-a-collision-course-on-energy-policy.html
- USWDS, "Public Sans," General Services Administration, 2026. [Online]. Available: https://public-sans.digital.gov
- E. Tufte, "The Visual Display of Quantitative Information," Graphics Press, 2001.
- Buggy Programmer, "List Comprehension in Python Explained for Beginners," Free Code Camp, 2021. [Online]. Available: https://www.freecodecamp.org/news/list-comprehension-in-python/
- M. Khalid, "Map, Filter and Reduce," Python Tips, 2017. [Online]. Available: https://book.pythontips.com/en/latest/map_filter.html
- Python Software Foundation and Python Contributors, "Python," Python Software Foundation, 2026. [Online]. Available: https://python.org
- Pandas Contributors, "Pandas," Pydata, 2026. [Online]. Available: https://pandas.pydata.org
- Processing Contributors, "Processing," Processing Foundation, 2026. [Online]. Available: https://processing.org
- P5js Contributors, "P5js," Processing Foundation, 2026. [Online]. Available: https://p5js.org
- MDN Contributors, "Canvas API," Mozilla, 2024. [Online]. Available: https://developer.mozilla.org/en-US/docs/Web/API/Canvas_API
- U. Petri and H. Gutmann, "PyFormat," PyFormat, 2016. [Online]. Available: https://pyformat.info
- B. Vijayan, "Architectural Pattern - Model–view–presenter (MVP)," DEV Community, 2024. [Online]. Available: https://dev.to/binoy123/architectural-pattern-model-view-presenter-mvp-28hl
- M. Cisneros, "What is a slopegraph?" Storytelling with Data, 2025. [Online]. Available: https://www.storytellingwithdata.com/blog/2020/7/27/what-is-a-slopegraph
- Periscopic, "Periscopic: Do good with data," Periscopic, 2025. [Online]. Available: https://periscopic.com/
- Fathom Information Design, "Fathom Information Design," Fathom Information Design. [Online]. Available: https://fathom.info
- Stamen, "Stamen," Stamen. [Online]. Available: https://stamen.com
- N. Felton, "Feltron," The Office of Feltron. [Online]. Available: http://feltron.com
- J. Harris, "Jonathan Jennings Harris," Jonathan Jennings Harris. [Online]. Available: https://jjh.org