Gauge R&R Explained for Quality Engineers: How to Run One, Read the Results, and Keep the Records Auditors Want
A practical Gauge R&R guide for quality engineers — how to run a study, what %GRR means, parts selection mistakes, and the audit findings that keep showing up.
A Tier 2 machining supplier in Michigan submitted a PPAP package on a new bracket. Capability looked clean — Ppk of 1.78 on the critical bore diameter. Two months later, the customer's incoming inspection started rejecting parts for the same characteristic. The supplier ran another capability study from production and got Ppk = 0.94. Same operators, same machine, same gauge. The original PPAP had been done with a gauge that had a %GRR of 41%. Nobody had checked. The capability number on the submission wasn't measuring the process. It was measuring the noise of the gauge.
That's the version of measurement system analysis that keeps producing failed PPAPs and capability claims that can't be defended in front of an auditor. Gauge R&R isn't a side activity. It's the foundation that everything else — capability indices, control charts, inspection decisions — has to stand on. If the measurement system can't tell good parts from bad, none of the data downstream of it means what you think it means.
This guide covers what Gauge R&R actually is, how to run one without making the mistakes that show up in audit findings, how to read the output, and what records you need to keep so the study holds up under scrutiny. It's written for quality engineers who need the working knowledge to run a study and defend it — not the textbook treatment.
What Gauge R&R is and what it isn't
Gauge R&R is a measurement system study. It quantifies how much of the variation you observe in your measurement data comes from the measurement system itself, as opposed to actual differences between the parts.
The "R&R" part is two distinct sources of measurement noise. Repeatability is the variation you get when the same operator measures the same part with the same gauge multiple times. It's the gauge talking to itself. Reproducibility is the variation you get when different operators measure the same part with the same gauge. It's the operators disagreeing with each other through the gauge.
Combine the two (they add as variances, root-sum-square, not a straight sum) and you get gauge R&R — the total measurement system variation. Compare it to the total variation in the data, and you get %GRR, which tells you what fraction of your observed variability is the gauge and what fraction is the actual process.
What Gauge R&R is not: it is not a calibration. Calibration verifies that a gauge reads correctly against a known standard — it answers "is this gauge accurate to its spec?" Gauge R&R answers "can this gauge tell parts apart well enough to make decisions about them?" A perfectly calibrated gauge can fail a Gauge R&R if its resolution is too coarse for the tolerance you're measuring. The two studies serve different purposes. Auditors and customer reps know the difference. Confusing them is one of the most reliable ways to get a finding written.
The other thing Gauge R&R is not: a one-time activity. IATF 16949 Clause 7.1.5.1.1 requires statistical studies on each type of measurement system identified in the control plan, and customer-specific requirements often add re-study triggers — gauge repair, change of operator population, new process, periodic interval. Doing it once at PPAP and never revisiting is the failure mode that shows up in surveillance audits constantly.
How to run a Variable Gauge R&R study
The standard study, the one most quality engineers will run dozens of times in their career, is a crossed Variable Gauge R&R using the AIAG MSA Manual approach. Three operators, ten parts, three trials each. That's 90 measurements total.
The structure matters and most of the technical findings come from getting it wrong:
Pick ten parts that represent the actual process variation. This is the single most consequential decision in the study and the one most often done badly. The parts you measure need to span the range you'd expect to see in production — not ten parts pulled from one shift's output, not ten parts that all came off the same fixture position, not ten "golden" samples that have been verified by an external lab. The AIAG and many customer CSRs are explicit: parts should represent the expected process variation, ideally with a roughly even distribution across the range of part-to-part variation. A common formulation customers use is something like 25% near the low end of process spread, 25% near the high end, and 50% around nominal.
If you grab ten parts from the same production lot with no spread, the math breaks. Expressed against study variation, %GRR will look falsely bad: with almost no part-to-part variation, nearly everything the study observed is gauge noise. Expressed against tolerance, it can look falsely good, because the gauge was never exercised across the range it has to discriminate in production. Either way, the result is meaningless. Auditors who know what they're looking at will ask where the parts came from, and the answer "the operator grabbed them from the bin" is an immediate problem.
Pick operators who actually use the gauge. The reproducibility component is measuring operator-to-operator variation. If the three people you select are the quality engineer, the lab tech, and the line lead — none of whom run the gauge in production — the study isn't measuring reproducibility of the actual measurement system. Pick the operators who do the measuring on the floor.
Randomize the part order. Each operator measures all ten parts once (Trial 1), then all ten again (Trial 2), then all ten a third time (Trial 3). The part order in each trial should be randomized, and the operator should not know which numbered part they're measuring. If the operator can see they're on "Part 4" and remembers measuring 50.012 last time, they'll bias their next reading toward that number. The result is artificially good repeatability — the gauge looks better than it is.
Use the same gauge under the same conditions for the entire study. Same fixture, same probe tip, same datum setup. If the study mixes data from two gauges, or if calibration drifts mid-study, the variation gets folded into something the math can't separate cleanly.
Identify the parts uniquely but blind the operators. Mark each part with an internal identifier the engineer can read but the operator can't see while measuring. Adhesive labels on a face the operator doesn't reference, or a position-coded fixture map. The point is that you can identify which physical part each measurement corresponds to without giving the operator a memory cue.
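The collection discipline above (every operator measures every part in every trial, with the part order re-randomized each trial) can be sketched as a run-sheet generator. A minimal sketch; the operator names and part ID format are illustrative, not from any AIAG template:

```python
import random

def build_run_sheet(operators, part_ids, trials=3, seed=None):
    """Generate a randomized measurement order for a crossed study.

    Each operator measures every part once per trial, and the part order
    is shuffled independently for every trial so the operator cannot
    anticipate which part comes next.
    """
    rng = random.Random(seed)
    run_sheet = []
    for operator in operators:
        for trial in range(1, trials + 1):
            order = part_ids[:]   # copy, then shuffle this trial's order
            rng.shuffle(order)
            for part in order:
                run_sheet.append({"operator": operator,
                                  "trial": trial,
                                  "part": part})
    return run_sheet

sheet = build_run_sheet(["Op A", "Op B", "Op C"],
                        [f"P{i:02d}" for i in range(1, 11)],
                        trials=3, seed=42)
print(len(sheet))  # 3 operators x 3 trials x 10 parts = 90 rows
```

The engineer keeps the sheet; the operator sees only which physical part to pick up next, which preserves the blinding described above.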
That's the data collection. The analysis is what most software does for you, but understanding what the software is doing keeps you from accepting a misleading result.
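What the software is doing can be made concrete. This is a minimal sketch of the AIAG average-and-range method for exactly the 3-operator, 10-part, 3-trial layout above; the K1/K2/K3 constants are the published AIAG values for that layout, and the function name and data array are illustrative:

```python
from statistics import mean

# AIAG constants: K1 for 3 trials, K2 for 3 operators, K3 for 10 parts.
K1, K2, K3 = 0.5908, 0.5231, 0.3146

def gauge_rr(data):
    """data[operator][part] -> list of 3 trial readings (3 x 10 x 3)."""
    n_parts, n_trials = len(data[0]), len(data[0][0])

    # Repeatability (EV): average within-cell range times K1.
    r_bar = mean(max(cell) - min(cell) for op in data for cell in op)
    ev = r_bar * K1

    # Reproducibility (AV): range of operator averages times K2,
    # corrected for the repeatability buried in those averages.
    op_means = [mean(x for cell in op for x in cell) for op in data]
    x_diff = max(op_means) - min(op_means)
    av = max(0.0, (x_diff * K2) ** 2 - ev ** 2 / (n_parts * n_trials)) ** 0.5

    grr = (ev ** 2 + av ** 2) ** 0.5

    # Part variation (PV): range of part averages times K3.
    part_means = [mean(x for op in data for x in op[p])
                  for p in range(n_parts)]
    pv = (max(part_means) - min(part_means)) * K3

    tv = (grr ** 2 + pv ** 2) ** 0.5   # total variation
    return {"EV": ev, "AV": av, "GRR": grr, "PV": pv,
            "%GRR": 100 * grr / tv, "ndc": int(1.41 * pv / grr)}

# Illustrative data: strong part-to-part differences, tiny gauge noise.
data = [[[10 * p + 0.01 * o + 0.001 * t for t in range(3)]
         for p in range(10)] for o in range(3)]
result = gauge_rr(data)   # %GRR well under 10, ndc far above 5
```

Note the last line of the return: ndc is 1.41 times the ratio of part variation to gauge R&R, which is why a gauge can post a passable %GRR and still fail ndc when part variation is weak.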
What %GRR actually means and the AIAG thresholds
The output of an AIAG Gauge R&R is several numbers. The two that matter most are:
%GRR (% Study Variation, or % of Tolerance, depending on which the study is set up for). This is gauge R&R variation as a percentage of either the total observed variation in the study or the tolerance width on the characteristic. The acceptance thresholds are:
- ≤ 10% — measurement system is acceptable
- 10% to 30% — conditionally acceptable depending on application, cost of the measurement device, criticality of the characteristic, and customer-specific requirements
- > 30% — not acceptable, the measurement system has to be improved
The 10/30 split is the AIAG convention and it's the number most CSRs reference. Some OEMs are stricter — they require ≤ 10% on critical and safety characteristics with no conditional zone. Some industries use different conventions. But for automotive PPAP and most general manufacturing, the AIAG numbers are the default.
Number of Distinct Categories (ndc). This is the number of distinct groups the measurement system can reliably resolve within the part-to-part variation. The AIAG threshold is ndc ≥ 5. If your ndc is 1, the gauge is only telling you "the parts are about the same" — it can't separate them into meaningful groups. Below 5, the gauge is acting more like an attribute gauge than a variable one, regardless of what the digital readout says.
A practical interpretation gap that trips engineers up: %GRR can pass while ndc fails, or vice versa. The two numbers describe related but distinct properties of the system. AIAG considers both, and a competent auditor will look at both. Reporting only %GRR and ignoring ndc is a frequent gap in PPAP packages.
The other thing worth understanding: %GRR can be expressed as a percent of study variation (%SV) or as a percent of tolerance (%Tol). They are not the same number. %SV uses the total observed variation in the ten parts you measured. %Tol uses the engineering tolerance width. If your parts in the study have very tight spread but your tolerance is wide, %Tol will pass while %SV fails. If your parts span the whole tolerance and your gauge has decent resolution, both will tell a similar story. Customer requirements often specify which to report — Ford and Stellantis CSRs are explicit on this — and using the wrong one can get a PPAP rejected on a technicality.
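The %SV vs %Tol distinction is just a change of denominator, which is easy to see in a sketch. The numbers here are illustrative, and the 6-sigma spread is the current AIAG convention (older studies sometimes used 5.15):

```python
def pct_sv(grr_sd, total_sd):
    """%GRR as a percent of study variation."""
    return 100 * grr_sd / total_sd

def pct_tol(grr_sd, tol_width, k=6.0):
    """%GRR as a percent of tolerance (AIAG uses a 6-sigma spread)."""
    return 100 * (k * grr_sd) / tol_width

# Same gauge, same GRR sigma -- but the study parts had tight spread
# relative to a wide tolerance, so the two metrics disagree.
grr_sd, total_sd, tol_width = 0.004, 0.010, 0.200
print(round(pct_sv(grr_sd, total_sd), 1))    # 40.0 -> fails as %SV
print(round(pct_tol(grr_sd, tol_width), 1))  # 12.0 -> conditional as %Tol
```

Same measurement system, two very different verdicts, which is exactly why the customer CSR's choice of metric matters.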
Attribute Gauge R&R for go/no-go and visual checks
Not every characteristic is measured with a variable gauge. Visual inspection, go/no-go gauges, hi-pot pass/fail, and similar attribute checks need a different study. The AIAG calls this an Attribute Agreement Analysis or attribute Gauge R&R.
The structure is different in three ways:
- More parts. The standard recommendation is 50 parts (some customer CSRs require more), selected to span the decision boundary — roughly 20% clearly accept, 20% clearly reject, and 60% near the spec limit where calls are hard.
- The same operators measure each part multiple times (typically 3 trials), and the analysis looks at agreement: operator vs. operator, operator vs. self (repeatability), and operator vs. reference (the master decision made by an authority — quality engineer, customer rep, or laboratory).
- The output is percent agreement and Kappa coefficients rather than %GRR. The interpretation thresholds are different: Kappa above 0.75 is generally acceptable, while below 0.40 the system is poor.
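The Kappa number itself is simple enough to sanity-check by hand. A minimal sketch of Cohen's Kappa for one operator's calls against the reference decisions, two categories; the part calls are illustrative:

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa: agreement between two sets of categorical calls,
    corrected for the agreement expected by chance alone."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Observed agreement: fraction of parts where the calls match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal call rates.
    p_e = sum((rater_a.count(c) / n) * (rater_b.count(c) / n)
              for c in categories)
    return (p_o - p_e) / (1 - p_e)

operator  = ["accept", "accept", "reject", "accept", "reject", "reject"]
reference = ["accept", "accept", "reject", "reject", "reject", "reject"]
kappa = cohens_kappa(operator, reference)  # ~0.67: between 0.40 and 0.75
```

Five of six calls agree (83% raw agreement), yet Kappa lands in the marginal zone, because with only two categories a lot of that agreement is expected by chance. That gap between raw agreement and Kappa is the whole point of the statistic.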
Two findings show up over and over in attribute Gauge R&R audits. First, the parts in the study were all easy calls — clearly good or clearly bad — so the operators agreed 100% but the study didn't actually test the borderline cases where the disagreements live. Second, no reference standard was established, so there's no way to evaluate whether the operators are agreeing with each other but consistently calling marginal parts wrong.
The fix in both cases is in the parts selection. The hard parts have to be in the study. Without them, the data doesn't tell you anything useful about how the system performs in practice.
The mistakes that produce audit findings
The pattern of Gauge R&R findings on quality forums and audit reports is consistent enough that you can predict most of them:
Studies done once at PPAP and never refreshed. A finding gets written for not having a current study, the supplier produces the original from three years ago, and the surveillance auditor finds the same study still in the file the next year. Re-study triggers — gauge repair, fixture rework, change of operator population, scheduled interval — need to be defined in the procedure and actually executed.
Capability studies referencing gauges with no MSA on file. The quality engineer pulls a Cpk study during the audit. The auditor asks for the Gauge R&R on the instrument. The folder doesn't have one. This is the most direct path to a major finding because it invalidates the capability claim entirely.
Inadequate gauge resolution. The rule of thumb is that gauge resolution should be 10% or better of the tolerance width. For a ±0.005" tolerance (0.010" total width), that means a gauge that reads to 0.001" or better. A study run on a gauge that doesn't have the resolution to resolve the variation will produce flat data and meaningless statistics. Auditors who pull the data sheets behind a marginal study will catch this.
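The 10% rule is a one-line check worth building into gauge selection before the study is ever run. A sketch; the function name is illustrative and the 10% ratio is the common rule of thumb, not a hard AIAG limit:

```python
def resolution_adequate(gauge_resolution, tol_low, tol_high, ratio=0.10):
    """True if the gauge's smallest increment is within the given
    fraction (default 10%) of the total tolerance width."""
    return gauge_resolution <= ratio * (tol_high - tol_low)

# +/-0.005" tolerance -> 0.010" total width; a 0.001" gauge sits right
# at the 10% boundary, so the examples below stay clear of it.
print(resolution_adequate(0.0005, -0.005, 0.005))  # True
print(resolution_adequate(0.005, -0.005, 0.005))   # False -- too coarse
```

A gauge that fails this check shouldn't go into a Gauge R&R study at all; the study will only document the inadequacy in more expensive form.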
Wrong study type for the characteristic. Running a variable Gauge R&R on a go/no-go gauge, or running attribute analysis on continuous measurement data. The study has to match the data type the gauge produces.
Studies that ignored ndc. %GRR was 22%, the supplier reported it as conditionally acceptable, and the customer rep ran the same data and got ndc = 2. The system doesn't have the resolution to separate the ten study parts into meaningful groups. The %GRR alone hid the problem.
Records that don't capture what the operators did. The Gauge R&R worksheet shows the numbers. It doesn't show who measured which part on which day, in what order, with what gauge serial number, under what environmental conditions. When an auditor asks "show me how this study was actually run," the answer needs to be a record, not a recollection.
Conflating calibration and Gauge R&R. A calibration certificate is presented as evidence the gauge is "good for the application." Calibration says the gauge reads accurately against a standard. It says nothing about whether the gauge can resolve the variation in your actual production parts at your actual tolerance. Auditors who know the difference will not accept calibration in place of an MSA study.
Ignoring a failed study. This one shows up on Elsmar threads with surprising regularity. The %GRR comes back at 38%, the team has an audit in three weeks, and the decision is to file the study and hope nobody pulls it. Auditors do pull them. A failed study with no follow-up corrective action is a worse finding than no study at all, because now you have documented evidence that the measurement system is incapable and that the organization knew and did nothing.
Where the records have to live
The structural problem with Gauge R&R isn't the math. The math is fixed by the AIAG manual and embedded in any decent statistical software. The problem is that a defensible Gauge R&R study has to link several pieces of evidence together:
- the data sheet (raw measurements)
- the analysis output (%GRR, ndc, ANOVA tables)
- the parts identification (which physical parts were used and where they came from)
- the operators involved
- the gauge calibration certificate that was current at the time of the study
- the reason the study was triggered (PPAP, gauge repair, scheduled interval)
- the disposition (accepted, accepted conditionally, rejected with action)
- the link to the corrective action record if the study failed
Most quality teams have all of these pieces. Few have them connected. The data sheet is in one Excel file, the analysis is in a Minitab project, the calibration record is in the gauge management system, the operator training records are in HR, the corrective action is in the CAPA log, and the only thing connecting them is the memory of whoever ran the study. When that person leaves, or when the audit happens three years later, the connection is gone.
The result is a Gauge R&R that exists on paper but can't be defended in front of an auditor who walks the trail. The %GRR number is in the PPAP. The auditor asks for the parts list — it's somewhere. The auditor asks for the calibration cert from that date — that takes a phone call. The auditor asks where the corrective action went after the study failed and was redone — that one's harder. By the time the trail goes cold twice in a row, the finding writes itself.
This is the same structural pattern that shows up in SPC reaction plans, document control records, and CAPA tracking — compliance activities that require a linked, traceable, tamper-evident history of who did what and when, running on a stack of disconnected spreadsheets and folders. That's the gap SheetLckr is built to close: compliance-grade spreadsheets with built-in version history, approval workflows, and tamper-evident audit trail, so the data sheet, the analysis, the parts source, the operators, the calibration link, and the disposition all live in one place and hold up to a registrar. The study is only as good as the records around it. Most failures are records failures, not measurement failures.
Gauge R&R is one of the few quality activities where doing it correctly the first time saves you from a long chain of downstream problems — bad capability data, bad PPAP submissions, bad inspection decisions, bad corrective actions. The math is approachable, the AIAG manual is the reference, and the failure modes are well-documented enough that you can avoid most of them by paying attention to parts selection, operator selection, and re-study triggers. What separates a study that holds up at audit from one that doesn't is rarely the calculation. It's the records around the calculation, and whether they tell a coherent story when somebody outside your organization comes asking.
Stop patching Excel. Run audits with confidence.
SheetLckr gives quality teams a spreadsheet with built-in audit trails, version locking, approvals, and CAPA tracking — so you're always audit-ready, not scrambling the week before.