SETlib

Reducing worksheet preparation time by 81% by centralizing 17 years of course content

Role: Product Design Intern
Team: 3 Designers, 5 Engineers, Project Manager
Timeline: June 2025 – January 2026
Tools: Figma, React

100,000+ files organized by date, but faculty think in topics. I redesigned how CS facilitators find and assemble course materials across two releases.

Facilitators at UW Tacoma spend 3–5 hours weekly pulling problems from 17 years of archived content that is organized by quarter and week, not by concept. I owned the facilitator experience end-to-end across two releases. V1 was built around an unreliable parser, and V2 was redesigned after production revealed its limits.

Facilitators spent hours per week hunting for problems.

Facilitators are undergraduate TAs with only six hours outside the classroom, split between meetings and creating unique weekly worksheets. Much of that time goes to pulling problems from Google Drive folders organized chronologically by quarter and week.

But facilitators think in concepts: "I need a binary tree problem" or "I need something on recursion." The system's organization didn't match anyone's mental model, so finding the right problems meant browsing folders until something looked right.

Current Facilitator Workflow (3–5 hours weekly)
1. Open Google Drive: navigate to the course archive.
2. Browse by quarter: Fall 2023 → Winter 2022 → Spring 2021...
3. Open folder, scan files: read filenames, guess at content.
4. Wrong topic: the file doesn't match what they need. (Friction)
5. Go back, try another folder: repeat across quarters and weeks. (Friction, repeat)
Failure: Give up searching. The cost of searching exceeds the cost of rewriting.
Failure: Rewrite from scratch. Duplicate work that already exists somewhere in the archive.

The parser meant to automate content migration was unreliable, and that shaped every design decision.

This was the defining technical constraint. The other structural problem: facilitators and professors had no communication loop. No approval process and no shared visibility into what was being created.

Every facilitator I interviewed had given up searching for content.

I interviewed 8 facilitators and 2 professors, focusing on where time was lost and what prevented them from finding the right problems.

Facilitators were abandoning searches because the time spent finding a problem exceeded the cost of rewriting one from scratch. This reframed the problem from "make search faster" to "make search usable."

Facilitators (Undergraduate TAs). Priority: Speed
1. Find problems: by topic, not by date
2. Build worksheets: assemble and organize quickly
3. Submit for review: move on to next task
Pain: 3–5 hours/week spent searching, often giving up entirely

Professors (Curriculum Leads). Priority: Oversight
1. Review submissions: ensure quality and accuracy
2. Track what's taught: visibility across all sections
3. Approve content: feedback loop with facilitators
Pain: No visibility into what facilitators create, no approval workflow

Same system, different priorities → Role-based design

I built a React prototype to understand where the parser breaks.

Engineering had shared that the parser struggled with complex formatting, but I needed concrete data to design effective failure states. Rather than wait for the full implementation, I built a functional frontend in React and connected it to the backend API to test the parser before committing to design decisions.

Figure: React prototype integrated with parser API (V1)

Basic LaTeX parsed reliably, but complex notation, math diagrams, and anything with embedded structure broke consistently. This told me failure states couldn't be edge cases but had to be the primary design consideration in V1.

Sharing the failure data also gave design and engineering a shared understanding. Instead of abstract concerns about parser reliability, we had concrete examples of what broke and why.

The parser failed unpredictably, so V1 was built to expose problems rather than hide them.

Every decision in V1 traces back to one question: when the system gets something wrong, how does the user know, and what can they do about it?

The facilitator-facing experience (search, assembly, review, dashboards) shipped as the MVP, while the parsing and content insertion tools ran as a separate internal interface. This let us deliver value to facilitators immediately while engineering continued developing the parser in parallel.

Side-by-side validation gave users full transparency into what the parser got right and wrong.

Every uploaded problem required manual review before it could enter the system. I designed a split-view that placed the original file next to the output so users could instantly spot where content was misinterpreted. Errors were flagged inline and manual correction tools let users fix issues directly.

This was a deliberate trade-off: mandatory review slowed down every upload, even successful ones. But with the parser failing this frequently, trust mattered more than speed.

Figure: Side-by-side validation (V1)

Search was reorganized around concepts using patterns facilitators already understood.

Instead of navigating folders labeled "Fall 2019, Week 7," facilitators could filter by concepts like Data Structures, Algorithms, or Recursion. I validated these categories against past curriculum and alongside professors to make sure the taxonomy matched real mental models.

Difficulty filtering (Easy, Medium, Hard) was inspired by LeetCode, whose labels CS students already use to judge problem difficulty. The overall interaction followed an e-commerce pattern (browse, filter, add to cart) to keep the flow familiar and intuitive.
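
As a rough illustration of concept-and-difficulty filtering, the sketch below shows one way such a filter could work. The `Problem` and `ProblemFilter` shapes, the `filterProblems` function, and the sample records are all assumptions made for this example; the case study does not describe SETlib's actual data model.

```typescript
// Hypothetical shapes: the case study names the filters but not the data model.
type Difficulty = "Easy" | "Medium" | "Hard";

interface Problem {
  id: string;
  title: string;
  concepts: string[]; // e.g. "Recursion", "Data Structures"
  difficulty: Difficulty;
}

interface ProblemFilter {
  concept?: string;
  difficulty?: Difficulty;
}

// Keep only the problems that satisfy every filter that is set.
// An unset filter field matches everything.
function filterProblems(problems: Problem[], filter: ProblemFilter): Problem[] {
  return problems.filter((p) => {
    const conceptOk =
      filter.concept === undefined || p.concepts.some((c) => c === filter.concept);
    const difficultyOk =
      filter.difficulty === undefined || p.difficulty === filter.difficulty;
    return conceptOk && difficultyOk;
  });
}

// Illustrative sample data, not real archive content.
const sampleProblems: Problem[] = [
  { id: "p1", title: "Invert a binary tree", concepts: ["Data Structures", "Recursion"], difficulty: "Easy" },
  { id: "p2", title: "Shortest path in a grid", concepts: ["Algorithms"], difficulty: "Medium" },
  { id: "p3", title: "Count tree leaves", concepts: ["Recursion"], difficulty: "Hard" },
];

const easyRecursion = filterProblems(sampleProblems, { concept: "Recursion", difficulty: "Easy" });
console.log(easyRecursion.map((p) => p.id)); // ["p1"]
```

Treating an unset field as "match everything" mirrors the browse-then-narrow flow: a facilitator can start from the full catalog and add filters one at a time.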

Figure: Problem database with filters (V1)

Assembly and review mirrored real facilitator behavior and closed the communication gap with professors.

Once problems were selected, facilitators moved into assembly where they could reorder and finalize their worksheet. I added a difficulty breakdown so they could check whether the worksheet felt balanced before sending it off, as well as a built-in place for facilitator-to-professor context that hadn't existed before.
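
A difficulty breakdown like the one described can be as simple as a tally per level. The sketch below is a minimal version under assumed names (`WorksheetItem`, `difficultyBreakdown`) with made-up worksheet entries; it is not the shipped implementation.

```typescript
// Hypothetical types for illustration only.
type WorksheetDifficulty = "Easy" | "Medium" | "Hard";

interface WorksheetItem {
  title: string;
  difficulty: WorksheetDifficulty;
}

// Count how many selected problems fall into each difficulty level.
function difficultyBreakdown(items: WorksheetItem[]): Record<WorksheetDifficulty, number> {
  const counts: Record<WorksheetDifficulty, number> = { Easy: 0, Medium: 0, Hard: 0 };
  for (const item of items) {
    counts[item.difficulty] += 1;
  }
  return counts;
}

// Illustrative worksheet, not real course content.
const draftWorksheet: WorksheetItem[] = [
  { title: "Reverse a linked list", difficulty: "Easy" },
  { title: "Balanced parentheses", difficulty: "Easy" },
  { title: "Binary tree traversal", difficulty: "Medium" },
  { title: "Memoized recursion", difficulty: "Hard" },
];

console.log(difficultyBreakdown(draftWorksheet)); // { Easy: 2, Medium: 1, Hard: 1 }
```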

Figure: Assembly view (V1)

Prep time dropped from 4.2 hours to 0.8 hours, but production exposed a critical limitation.

We validated V1 with 8 facilitators over 4 weeks using identical tasks. The results confirmed the core concept worked: an 81% reduction in prep time, with facilitators completing worksheets in a single sitting instead of spreading work across multiple days.

81% Prep Time Reduction: From 4.2 hours to 0.8 hours per worksheet
Projected Annual Savings: Reduced labor costs from streamlined preparation
8 Facilitators Validated: 4-week controlled study with identical tasks
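
For reference, the 81% figure follows directly from the two reported prep times (4.2 hours before, 0.8 hours after):

```latex
\[
\text{reduction} = \frac{4.2 - 0.8}{4.2} = \frac{3.4}{4.2} \approx 0.81 = 81\%
\]
```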

Facilitators told us prep finally felt predictable. Professors said seeing all pending submissions in one place removed years of back-and-forth. But once V1 went into production for fall quarter, we saw something our controlled tests hadn't revealed.

90% of parser failures traced back to embedded images, the most common content type in CS problems.

Our prototype used small, text-only samples. Production was different. Real CS problems are full of tree diagrams, graph visualizations, and annotated figures. The parser couldn't interpret any of them.

Production Content vs. Parser Capability (90% failure rate)
What the parser handled: plain text problems, basic LaTeX notation, simple formatting
What production contained: tree diagrams, graph visualizations, annotated figures, embedded images, math diagrams

Mandatory review became the default experience rather than the safety net. Facilitators started bypassing it entirely. One faculty member simply asked: "Can I just insert problems manually?"

I brought production data to engineering and it directly shaped their priorities for the new parser.

Step 1, what I found: 90% of failures from embedded images
Step 2, what I advocated: Parser must handle images as primary content type
Step 3, what engineering built: Image interpretation via hybrid LLM

Step 1, what I found: Users wanted to bypass parsing entirely
Step 2, what I advocated: Manual entry as a first-class workflow, not a workaround
Step 3, what engineering built: Manual insertion flow with live preview

Step 1, what I found: Mandatory review created friction on successful parses
Step 2, what I advocated: Confidence scoring to make review conditional
Step 3, what engineering built: Accuracy scoring API with tiered thresholds

A hybrid LLM parser changed what was possible, so I redesigned the workflow around trust instead of caution.

Engineering's new parser combined rule-based parsing with AI-driven interpretation. It handled PDFs, Word files, and mixed formats, interpreted embedded images, and surfaced specific diagnostics about what it struggled with and why.

This fundamentally changed the design problem. V1 asked "how do we help users recover from failure?" V2 asked "how do we help users know when they don't need help?"

I designed this framework, and engineering built the API output around it.

CONFIDENCE SCORING FRAMEWORK
File Uploaded → Parser Analyzes → Confidence Score

High (95–100%)
System signal: Content preserved, structure validated, metadata extracted
User action: Trust & move on. Save immediately, editing optional.

Medium (70–94%)
System signal: Partial issues detected, most content preserved
User action: Review flagged items. Specific issues highlighted, quick fixes available.

Low (below 70%)
System signal: Significant failures, content unreliable
User action: Full review or manual entry. Diagnostic breakdown, guided next steps, manual fallback.
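
The tiers above can be read as a simple threshold mapping. The sketch below is one possible encoding, not the actual API engineering built: the names `classifyConfidence` and `TierGuidance` are hypothetical, and treating fractional scores between 94 and 95 as medium is an assumption, since the framework only lists whole-number ranges.

```typescript
// Hypothetical names; the real accuracy scoring API is not described in detail.
type ConfidenceTier = "high" | "medium" | "low";
type ReviewMode = "optional" | "flagged-items" | "full-or-manual";

interface TierGuidance {
  tier: ConfidenceTier;
  review: ReviewMode;
  userAction: string;
}

// Map a parser confidence score (0 to 100) to a tier and the matching user action.
// Thresholds follow the framework: high is 95 and up, medium is 70 up to 95, low is below 70.
function classifyConfidence(scorePercent: number): TierGuidance {
  if (!isFinite(scorePercent) || scorePercent < 0 || scorePercent > 100) {
    throw new RangeError("scorePercent must be between 0 and 100");
  }
  if (scorePercent >= 95) {
    return { tier: "high", review: "optional", userAction: "Trust & move on" };
  }
  if (scorePercent >= 70) {
    return { tier: "medium", review: "flagged-items", userAction: "Review flagged items" };
  }
  return { tier: "low", review: "full-or-manual", userAction: "Full review or manual entry" };
}

console.log(classifyConfidence(97).userAction); // "Trust & move on"
console.log(classifyConfidence(82).userAction); // "Review flagged items"
console.log(classifyConfidence(40).userAction); // "Full review or manual entry"
```

Keeping the thresholds in one function makes the trade-off explicit: raising the high-confidence cutoff pushes more uploads into the medium tier, which costs some review time but protects trust in the "skip review" path.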

When the parser is highly confident, the interface gets out of the user’s way.

The confidence score makes the system’s reliability visible, and the parsed content becomes the default view instead of a side-by-side comparison. Users can still edit inline if they want, but they’re no longer forced to validate content that doesn’t actually need correction.

Figure: High-confidence parse (V2)

The Trade-off

If a high-confidence score ever let a badly parsed problem through, users would stop trusting the system and revert to reviewing everything. I set the bar for "high confidence" high enough that users who skipped review would rarely encounter errors, even if that meant more uploads landed in the medium-confidence tier than necessary.

When the parser struggled, V2 showed exactly why and gave users clear paths forward.

Low-confidence parses surfaced a breakdown with plain-language explanations rather than generic error messages. Recommended actions gave users concrete next steps: edit directly, try a different file format, or switch to manual insertion.

Figure: Low-confidence parse (V2)

Manual insertion became a key workflow, not just a fallback.

This came from V1 feedback. I designed a complete manual entry flow: type problem content, attach images, set metadata, and see a live preview. Positioning this as a deliberate workflow rather than an emergency exit was important.

Figure: Manual problem entry with live preview (V2)

V2 eliminated the friction that held V1 back.

Fewer Mandatory Reviews: Uploads processed without manual edits
Content Accuracy: Maintained after removing mandatory review
Facilitator Adoption: Professors mandated the switch within one quarter

V1 saw inconsistent usage during fall quarter because facilitators didn't trust the parser. After V2 shipped for winter quarter, professors mandated the switch. The system they once avoided became the one they enforced.

When your backend is unreliable, transparency becomes the product.

V1 taught me that users will tolerate imperfect systems if they can see what's happening and fix what's wrong. V2 taught me the opposite: when the system becomes reliable, the fastest thing you can do is get out of the user's way. Trust isn't a feature you add. It's something you recalibrate every time the technology changes.

Getting close to engineering made me a better designer.

Building the React prototype, stress-testing the parser, and defining the confidence scoring framework all required understanding the system at a technical level. That proximity changed which questions I asked, which constraints I pushed back on, and which ones I designed around. The most impactful decisions came from understanding the backend well enough to know what was possible and what was worth advocating for.