J. Edward Guthrie

Monday, November 13, 2017

Hey Alabama

I couldn't tell you how many different Alabama Sunday School classrooms I've attended. Probably a couple dozen in Scottsboro alone, and I went to each several times. Each had different layouts and levels of lighting. Different colors or paintings on the walls. Some knotty pine siding. They had different materials on their tables, from old pencils with no erasers that made it impossible to cover up your mistakes on the mazes that led Joshua and the Israelites to the Promised Land, to brand new, scented markers for all of the many colors in Joseph's coat.

So it was easy to picture any of them, or each of them, as I read for the first time about the bombing of the 16th Street Baptist Church. I pictured it happening here, thirty years later. I think we all did. At First Baptist, just off the Square, or First Methodist, just down Broad Street. At Calvary. At Eastside. At Cumberland Presbyterian. At Trinity Baptist or Trinity Lutheran. As I looked at the 2x2 grid of the faces of Addie Mae Collins, Carol Denise McNair, Carole Robertson, and Cynthia Wesley in that third-grade Alabama History textbook, in my mind I put different faces over each, depending on which church it was. Rob's, Kate's, Doug's, Josh's, Scott's, Amanda's, the other Amanda's. My own. I think we all did. We were Alabamians. We were those girls. 

And I identified with those girls, and not the KKK, because I believed in the goodness of Alabama. I think we all did. And when our teacher had to add that one of the men had already died in prison, because we were still using textbooks printed before Robert Edward Chambliss was ever sentenced, I let our teacher's knowledge and emotion speak for the quality of our Alabama education, not the copyright date and loose pages of the textbook. I think we all did. And from the implications of that addendum, I felt more pride in men like William Baxley, a student at Bama at the time of the bombing who reopened the closed case after he rose to state Attorney General in 1971, than I did shame for the scuttled evidence and derelictions of duty that had delayed justice for so long.

It was the year I graduated from Scottsboro High School, 2001, that the surviving "Cahaba Boys" were finally tried for the murder of those four girls. Thanks to a website I'd founded to aggregate Alabama prep track results, I spent much of that spring reading local papers from across Alabama, and that meant following the trials of Thomas Blanton and Bobby Cherry. I followed those trials with a lot of anger at the memory of what those two men did, and I retched at how far the defense attorneys tried to stretch the meaning of the word "circumstantial." Still, I felt certain that this ugliness, along with that from forty years past, would soon be punished. I think we all did.

It was Doug Jones who prosecuted the case, 38 years after the bombing, arguing, "It's never too late for the truth to be told. It's never too late for a man to be held accountable for his crimes. It's never too late for justice." My optimism about my home state, perhaps unwarranted, was affirmed by the Blanton and Cherry convictions. Still, I shuddered to think how easily those men, nothing but scourges to Alabama, could have walked free forever. As easily as there not being a William Baxley, or a Doug Jones. I wanted all of America to know Alabama for men like Baxley and Jones, not Cherry, Blanton, Chambliss, or the countless corrupt, unnamed officials who either actively conspired or turned blind eyes to evidence to let them escape justice for so long. I think we all did.

Next month, that's a possibility. Doug Jones is running for US Senate, and he can represent Alabama to the rest of the country on Capitol Hill for the next three years and beyond. Born in Alabama, educated in Alabama, he's been working in Alabama his whole life to help write a state history we can be more proud of, the kind of state history that I know I want for Alabama. I think we all do.  

Please, on December 12, be counted. As one of our great native sons wrote, there can't be more of them than us.

Tuesday, June 07, 2016

Heart rate monitor test, Pt 2

In yesterday's post I compared readings from the wrist-based Fitbit Surge and a Garmin chest-strap heart rate monitor connected to a motoACTV. The activity was a base mileage run at a pretty steady effort that should have kept my heart rate around 155bpm, which it mostly did according to both devices. Though the two weren't quite identical, the differences seemed more random than systematically biased toward one being always higher or lower than the other. There was one point in that activity, coming up a hill near the end of the run, where the Garmin read above 170 but the Fitbit stayed around 155bpm. That was potentially nothing, but also potentially an early warning that Fitbit would either be slower to recognize sharp elevations in my heart rate or wouldn't recognize them at all.

Today I repeated the experiment during a more intense workout: a one-mile warm-up, then 2 x 2 miles at about race pace with four minutes of rest in between, and a one-mile cool-down.

The results suggest that the late hill yesterday was an early warning:

The readings are drastically different during both of the intense intervals (the taller plateaus in the middle). Not only are the Fitbit readings lower than the Garmin's, they're even lower than the readings from the warm-up and cool-down. It's as if the more intense intervals completely discombobulated the wrist monitor.

If we trust the Garmin readings as the truth, since they make more sense, my HR went up to about 160-165bpm almost immediately during the intervals and then gradually inched up to about 170 over the two miles. But starting from the recovery low of about 120bpm, the Fitbit took about a quarter of a mile to get up to 140 during the first interval and almost the full two miles to get there on the second, and it never topped about 145. That puts the Fitbit at least 40bpm below the Garmin during the first mile of the second interval, and never less than about 30bpm below during either interval.
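If you want to put numbers on that gap instead of eyeballing the chart, here's a minimal sketch, assuming each device's readings sit in a pandas DataFrame with time and hr columns like the ones from yesterday's merge (the garmin and fitbit names are just placeholders). Since the two devices don't sample on the same clock, it pairs each Garmin reading with the nearest Fitbit reading within a few seconds:

    import pandas as pd

    # Align the two series on nearest timestamps (within 5s), then difference.
    aligned = pd.merge_asof(
        garmin.sort_values("time"),   # DataFrame with 'time' and 'hr' columns
        fitbit.sort_values("time"),
        on="time",
        suffixes=("_garmin", "_fitbit"),
        direction="nearest",
        tolerance=pd.Timedelta("5s"),
    ).dropna()

    aligned["gap"] = aligned["hr_garmin"] - aligned["hr_fitbit"]
    print(aligned["gap"].describe())  # a positive mean means the Fitbit reads low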

Interestingly, the two devices are almost identical during the recovery from the second interval and the cool-down mile. There's a weird spike in the Fitbit during the walk to my car after the cool-down, but otherwise they were in sync there. 

The differences during the workout are massive and potentially (though not definitively) significant. I'd say this goes beyond "discrepancy" and into the territory of full-blown malfunction during highest-intensity exercise. But I'd say that only matters if you're interested in actively managing your heart rate during exercise, such as trying to hit a target zone or prevent over-exertion. That's it. In terms of calorie measurement, considering the short length of time you're likely to be in this peak HR zone, the readings won't make much of a difference for total daily caloric burn---for this workout it was a difference of only 20 calories. Considering the advantages I see in having a (slightly) lighter device and not needing to wear a chest strap, the potential to get an extra 20 calories of "credit" for the day isn't worth it. Plus, those 20 extra calories captured by a chest strap aren't going to be part of a daily total because no one wears a chest strap all day; anyone doing daily totals is already relying on the wrist watch. And to them, I say don't worry about the difference in readings. But again, if you have heart issues or are otherwise interested in fine-tuning your exertion during workouts, Fitbit is probably not reliable enough and you should wear a chest strap.
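For a sense of where a number like that 20 calories comes from: Fitbit's calorie model is proprietary, so as a stand-in here's a sketch using the published Keytel et al. (2005) heart-rate regression (male version), with placeholder weight and age rather than my real numbers:

    # Estimate kcal/min from heart rate (Keytel et al., 2005, male equation).
    # The weight and age defaults below are placeholders.
    def kcal_per_min(hr, weight_kg=70.0, age=33):
        return (-55.0969 + 0.6309 * hr + 0.1988 * weight_kg + 0.2017 * age) / 4.184

    # How much burn does a 30bpm under-read hide?
    gap = kcal_per_min(170) - kcal_per_min(140)   # ~4.5 kcal/min
    print(gap * 5)                                # sustained for 5 minutes: ~22 kcal

By this formula, a 30bpm under-read sustained for five minutes works out to roughly 20 calories, the same ballpark as what I saw, so the "credit" at stake really is small.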

The rest of my training week is easy runs. Saturday I'll do a long run, and unless I do it super early it'll be a hot one. I'll do the same experiment for those runs but won't update this again until Sunday night.

Monday, June 06, 2016

Heart rate monitor test, Pt 1

After last week's news of a class action lawsuit against Fitbit over inaccuracies in its heart rate monitoring, I pulled out my old Garmin chest strap for the first time in about eight months to do a comparison. I'd gotten a Fitbit Surge (as a gift; thanks Anne!) back in November. With its wrist-based monitoring, I effectively retired the much less comfortable chest strap, so it took some digging around to find where I'd put it.


I've become pretty invested in the constant monitoring data the Fitbit offers, even spending time trying to deduce whether my slowing resting HR was the result of my heart strengthening as I increased my mileage throughout the spring, or part of my metabolism's effort to keep or regain the pounds I was shedding as a result. If the monitor's inaccurate, it calls all of that into question. 

As I've learned more about the suit, I'm less worried about resting HR accuracy, since the claims focus on discrepancies between Fitbit and ECG readings during moderate-to-high-intensity exercise. I haven't read beyond what's in the popular press, but I wonder if there's a constant error percentage, with the larger bpm differences during exercise simply the result of the larger multiplier provided by elevated heart rates (a 5% error is only 3bpm at a resting 60bpm, but about 8-9bpm at 170). And using my HR as a check against overexertion, especially as we get into warmer temperatures, is probably a much more important concern than whether the resting HR data supports a new pet theory.

So, as a first test, I ran five miles at a normal effort for my non-workout days with both the Fitbit Surge and the Garmin chest strap linked to my motoACTV (maybe still the best fitness watch out there, and it's now six years old). I also let each record a walking cool-down of about half a mile to capture how they measured my HR as it fell back down to normal. Downloading the GPX file from Fitbit and the TCX file from motoACTV, I linked up the times and associated HR readings, added identifiers for which device each reading came from, and then sorted them all on time. Graphing HR against time by device identifier gives the comparison below.
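In case anyone wants to reproduce that merge, here's a minimal sketch. It assumes the Fitbit GPX stores heart rate in the Garmin TrackPointExtension and the motoACTV TCX follows the standard TrainingCenterDatabase schema; exact tags can vary by export version, so inspect your own files first. The file names are placeholders:

    import xml.etree.ElementTree as ET
    import pandas as pd
    import matplotlib.pyplot as plt

    GPX_NS = {
        "gpx": "http://www.topografix.com/GPX/1/1",
        "tpx": "http://www.garmin.com/xmlschemas/TrackPointExtension/v1",
    }
    TCX_NS = {"tcx": "http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"}

    def read_fitbit_gpx(path):
        # One row per GPX trackpoint that carries a heart rate extension.
        rows = []
        for pt in ET.parse(path).getroot().iter(f"{{{GPX_NS['gpx']}}}trkpt"):
            t = pt.find("gpx:time", GPX_NS)
            hr = pt.find(".//tpx:hr", GPX_NS)
            if t is not None and hr is not None:
                rows.append((pd.Timestamp(t.text), int(hr.text)))
        return pd.DataFrame(rows, columns=["time", "hr"]).assign(device="fitbit")

    def read_motoactv_tcx(path):
        # One row per TCX trackpoint with a HeartRateBpm reading.
        rows = []
        for pt in ET.parse(path).getroot().iter(f"{{{TCX_NS['tcx']}}}Trackpoint"):
            t = pt.find("tcx:Time", TCX_NS)
            hr = pt.find("tcx:HeartRateBpm/tcx:Value", TCX_NS)
            if t is not None and hr is not None:
                rows.append((pd.Timestamp(t.text), int(hr.text)))
        return pd.DataFrame(rows, columns=["time", "hr"]).assign(device="garmin")

    # Stack the readings, sort on time, and plot HR by device.
    df = pd.concat([read_fitbit_gpx("run.gpx"), read_motoactv_tcx("run.tcx")])
    df = df.sort_values("time")
    for name, grp in df.groupby("device"):
        plt.plot(grp["time"], grp["hr"], label=name)
    plt.ylabel("HR (bpm)")
    plt.legend()
    plt.show()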



There are differences throughout the activity, but not a consistent pattern in terms of one being either higher or lower than the other. The Garmin reading has more drastic swings, so there's a chance the Fitbit is slower to recognize changes in HR. I've definitely noticed that it's slower to recognize changes in live pace readings during runs, so a lag in HR wouldn't surprise me.
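If it is a lag, one plausible (though totally unverified) mechanism is heavier smoothing on the wrist readings. Here's a quick sketch of how a smoothed signal trails a sudden jump in true HR; the smoothing factor is made up, and whatever Fitbit actually does internally is unknown:

    # True HR jumps from 140 to 170; an exponential moving average lags it.
    alpha = 0.05                          # hypothetical per-second smoothing
    true_hr = [140] * 30 + [170] * 90     # one reading per second
    smoothed, s = [], 140.0
    for x in true_hr:
        s = alpha * x + (1 - alpha) * s
        smoothed.append(s)

    # 15s after the jump the smoothed reading is still ~14bpm low (~156 vs 170),
    # a lot like the hill discrepancy; it takes over a minute to catch up.
    print(round(smoothed[44], 1), round(smoothed[119], 1))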

In the first quarter of the run, Fitbit is consistently higher than Garmin, usually by around 10-15bpm. For most of the rest of the run they stay pretty close. There is one point where the Garmin starts registering an HR over 170 while the Fitbit is still around 155. That's worth flagging because a 175bpm reading is around the point where I take the cue to start consciously backing off; I would have missed that cue with only the Fitbit.

On the recovery, it's the opposite of what I would expect based on the rest of the chart--the Fitbit seems to recognize my steady "walking HR" of about 115 before the Garmin does, with the Garmin reading about 15bpm higher for that first part of my cooldown.

Nothing really stands out to me here overall. I don't consider any of the differences large enough to be concerned about, but I may also be predisposed to simply not worrying about it. The spike the Garmin caught that the Fitbit didn't might be something to watch for, though.

Tomorrow I'll run the same test during a more intense workout. It's a 2x2mile with a rest in between, so I'll be pushing my HR higher on both intervals and have a warmup-workout-rest-workout-recovery cycle to check out. 

If you use a Fitbit to track your HR during exercise, and especially if you've noticed a difference going from another monitor to Fitbit, let me know about your experience.


Friday, May 27, 2016

Flat scores, Pt 2: Crunching the numbers

This is a follow-up to my last post about how the lack of growth in US high school scores on the NAEP and decreases in the SAT could be a product of lower dropout rates and increased college opportunities. This would be a pin to the bubble of mostly conservative education reform advocates who are taking last fall's release of SAT scores as a new mandate for reform efforts focused on US high schools:





But high schools have been very successful in increasing graduation rates. We know that this has a negative effect on test scores: students most likely to drop out of high school are among the lowest performers on achievement tests, so test scores go up when these students drop out instead of taking the test and go down when they stay in school and take it. The question is whether this is enough to account for the overall stagnation in the scores of 17-year-olds on the NAEP.

Here's the table of NAEP results from Michael Petrilli's Fordham article:



The table shows that from 2004 to 2012, math achievement increased by 8 points for nine-year-olds (fourth grade), 5 points for thirteen-year-olds (eighth grade), but just 1 point for seventeen-year-olds (twelfth grade).
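As a first pass at the arithmetic, here's the composition effect in miniature. Every number below is an illustrative assumption, not an estimate from NAEP data: suppose every seventeen-year-old actually improved as much as the thirteen-year-olds did, that would-be dropouts score about 25 scale points below students who would have stayed anyway, and that falling dropout rates added 7 percentage points of would-be dropouts to the tested population.

    true_gain = 5       # points everyone actually improved (matching age 13)
    dropout_gap = 25    # points would-be dropouts trail would-be stayers
    added_share = 0.07  # new share of would-be dropouts in the tested group

    observed_gain = true_gain - added_share * dropout_gap
    print(observed_gain)  # 3.25: composition alone eats ~1.75 points

Under those guesses, composition alone erases about a third of the true gain; push the gap toward a full standard deviation or the added share higher, and it can swallow nearly all of the 4-point difference between the age groups. Whether the real parameters are that large is the question to settle with actual enrollment and score data.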

Friday, April 29, 2016

The Myth of the Ineffective Teacher

Earlier this month a California appeals court overturned the 2014 ruling in Vergara v. California that would have largely eliminated teachers' tenure protections in the state. The reversal has upset education reformers who celebrated Judge Treu's original ruling. An excerpt from Daniel Weisberg's opinion piece on the74million.com summarizes these sentiments:

"Thousands of teachers in schools across California — a small percentage but still a huge number — are not up to the job. These grossly ineffective teachers are derailing their students’ academic futures. Poor and minority students are more likely than others to be assigned to one of these teachers. And all of this is happening because of state laws that make it practically impossible for schools to replace the relatively few teachers who shouldn’t be there."


It's not a myth that there are ineffective teachers---there certainly are. Nor is it a myth that poor and minority students are more likely to have ineffective teachers---they are. The myth is in the last sentence: that the existence of ineffective teachers, and their assignment to particular schools and students, results from tenure protections. It's the myth of ineffective teachers as permanently entrenched, perpetually under-performing check collectors. That idea, along with the notion that replacing existing tenure laws with performance-based retention would fix the problem, rests on a number of not-so-obvious assumptions, each of which breaks down under further examination. I list those assumptions below and conclude with findings from my own research.

1. "Ineffective teachers" will be back next year unless we replace them.
This is part of a more broad principle (often ignored in discussions of teacher retention) that any benefits to a retention/dismissal decision occur after that decision is made. If Mr. Farine's performance with the graduation class of 2018 is deemed unacceptable and he is replaced, it is the classes of 2019 and 2020 who stand to benefit from that decision; for the class of 2018 it is too late. But if Mr. Farine was going to quit before the next school year anyway, labeling him ineffective has no benefit to anyone--it doesn't change a thing. The classes of 2019 and 2020 would have had a different teacher anyway. 

Research shows that teachers identified as ineffective by any measure---principal ratings, value-added, student surveys---already quit voluntarily at much higher rates than teachers higher in the performance distribution. This is even more true in the types of schools Vergara attempts to highlight, meaning that those low-income and minority students who are more likely to be assigned to an ineffective teacher are also more likely to be assigned to a teacher who won't be back next year. And, once again, if they're not coming back next year anyway, retention reform doesn't benefit anyone.

2. "Ineffective teachers" will be ineffective again next year.

Not only does Mr. Farine have to intend to return next year, we also have to assume that he would be ineffective next year, too. If he would have returned and performed well, the classes of 2019 and 2020 may actually suffer as a result of a dismissal decision. The assumption that teacher effectiveness is a static trait is essential to the excerpt above, but it's a problematic view. The two biggest factors making teacher effectiveness more dynamic are the returns to experience of early-career teachers and statistical noise in performance measures based on test scores. Both research and anecdote are consistent in saying that teachers improve dramatically as they climb the learning curve during those first few years on the job. Many of the teachers identified as "ineffective" by performance measures are first-year teachers who perform better the following years. If Mr. Farine is a first-year teacher with the class of 2018, the classes of 2019 and 2020 are likely to learn much more in his class the following years than the cohort before them. If he's replaced, there's a good chance that they, too, will have a first-year teacher instead, and the reform designed to help them will have the opposite effect, contributing to an already problematic cycle of teacher churn.

3. "Ineffective teachers" have prior performance measures proving they're ineffective.

This is largely related to the points above, but it separately highlights the fact that performance-based retention policy or tenure reform relies on prior performance measures that either don't exist for many teachers---especially those in their first year on the job---or never identified them as ineffective in the first place.

4. "Ineffective teachers" can easily be replaced with higher-performing replacements.

This assumption is also most problematic in the settings the Vergara plaintiffs attempt to highlight. Just as teacher attrition is higher in schools with high-minority and high-poverty student populations, recruitment of qualified candidates in these schools is much more difficult. Ironically, job security for teachers willing to work in these settings is one of the least expensive benefits that can be offered to offset the job's less attractive features, and taking it away in all schools is likely to disproportionately harm recruitment and retention efforts in the lowest-performing schools. It's certainly difficult to imagine that promising less would somehow make these jobs more attractive to effective teachers.

A forthcoming research article I've been working on examines each of these assumptions empirically using a longitudinal database of teacher performance in North Carolina. Here's what I find with "ineffective teachers." For this exercise, I define "ineffective" as those teachers falling in the bottom 5% in terms of contributions to student learning, and "adequate" as those at or above the 40th percentile.

Out of 100 ineffective teachers...
1) 25 are going to leave on their own; 75 will come back
2) 17 will be at least adequate the following year
3) 24 will fall under the same definition of "ineffective" again
4) Of those 24, just 10 would be replaced by a teacher who is adequate

The big takeaway comes from focusing on #2 and #4 together: A performance-based retention policy is twice as likely to deny access to adequate instruction as it is to create it.
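For anyone who wants that arithmetic spelled out, here it is as a toy calculation using the counts above (per 100 teachers rated ineffective in year one):

    leaves_anyway = 25          # gone next year regardless of any policy
    adequate_if_retained = 17   # returnees who would be at least adequate
    ineffective_again = 24      # returnees ineffective a second time
    adequate_replacements = 10  # of those 24, slots refilled adequately

    # Dismissing everyone removes 17 classrooms that would have had adequate
    # instruction while creating roughly 10 that otherwise wouldn't have.
    print(adequate_if_retained / adequate_replacements)  # 1.7, i.e. about twice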



Tuesday, September 08, 2015

Could "flat scores" in US high schools be a sign of their success?

The College Board released its annual report on SAT scores for the high school class of 2015 last week. Nick Anderson of The Washington Post reported on the data with an article titled "SAT scores at lowest level in 10 years, fueling worries about high schools."

The problem with attributing lower SAT scores to high schools is that SAT takers are a self-selected sample of students who want to go to college, think they can go to college, or have been talked into taking the test by a parent, teacher, counselor, or friend. That group used to include only the top high school students in the country, but the number and percentage of students who take the SAT have been going up. That's a good thing.

Rising numbers of SAT takers reflect increasing graduation rates (the US dropout rate has been cut in half since 1980), expanded college opportunities, and perhaps greater success by school personnel in encouraging students to take the SAT. Maybe for every 100 SAT takers we induce, another 10 go to college. It might be higher than that, but the idea is to eliminate barriers and make the opportunities seem more real for students who just need that little nudge.

But these are not students who we would expect to raise the average score, or even match the performance of the self-selected cohorts of previous years. So a drop in average scores might be a product of the success of US high schools, not a sign of their failure. 

And the article does admit, "the lower the participation, generally, the higher the scores." They apply this caveat to caution against comparisons of scores between schools, districts, or states, but miss that the same caution should be applied to comparisons of scores over time. Instead they conclude, "the steady decline in SAT scores and generally stagnant results from high schools on federal tests and other measures reflect a troubling shortcoming of education-reform efforts." 

Here is where red flags start coming up. "Fueling worries," as the headline reads, is fair even if the story behind the data is that high schools are doing a better job of keeping kids in school and knocking down barriers for college. But building or advancing a policy platform off of this is problematic. That's a solution in search of a problem, and it makes that flimsy interpretation of the data seem more opportunistic than careless. And it looks like the reform gadflies are beginning to swarm around this "problem." 

Michael Petrilli of the Fordham Institute is quoted in the WaPo article and wrote a piece the same day titled, "Why is high school achievement flat?" in which he cites relatively flat scores for high schoolers on NAEP along with falling SAT scores over the past 10 years and contrasts this stagnation to growth in elementary and middle school scores. Petrilli addresses the selection issue more directly and honestly than Anderson, writing something similar to my description of the SAT sample in regard to NAEP: 

"Students who would have previously dropped out are now staying in school and remaining in the NAEP sample, thereby dragging down the scores."

If you take a moment to appreciate what Petrilli is saying here, you may understand why I have such a problem with what he says next. Graduation rates are definitely going up. That is a good thing. It's a sign that high schools are succeeding at one of their most important jobs, if not the most important one. The collective efforts of district leadership, school administrators, teachers, and counselors have been effective in keeping kids in school and helping them graduate. And it's possible (I'd even say likely) that this success is entirely responsible for the phenomenon of flat scores among high schoolers. Honestly, that high school scores have held steady despite taking on so many would-be dropouts who are the very definition of "at-risk students" might be a bigger triumph than the rise in graduation rates itself!

But Petrilli then presents the kind of strident call for reform that could only be justified by iron-clad evidence that high schools are failing:

"We simply haven’t done much to reform our high schools. We are holding them accountable for boosting graduation rates, but not much else. Most charter schools operate at the elementary or middle school level. Voucher programs don’t offer enough money for top-notch secondary schools. We’ve killed off much of our CTE system. And we pulled the plug on the small schools movement just as it was starting to show results.

If we want to stop seeing flat scores at the twelfth-grade level, we need a spike in high school reform efforts."

This comes at the end of the article, but it's the way Fordham is promoting the piece:



And other prominent education policy commentators are rallying behind the same point:



Here are just some of the problems with this conclusion:

1) It ignores the undeniable success of our high schools in raising the graduation rates. If you consider achievement and graduation rates together, high schools are improving on one and doing no worse than maintaining on the other. 
2) High schools might also be doing better than ever (and better than elementary and middle schools) in terms of achievement, but these gains are masked by changes in student composition due to decreased dropout.
3) We have focused high school accountability on graduation rates for precisely this reason! If we focused on achievement instead, schools would benefit from higher dropout rates because only higher-achieving students would be left to take the test. Now that they have succeeded in raising graduation rates, we cannot punish them for flat test scores.
4) Charter schools and vouchers pull the rug out from under the very district and school personnel who have contributed to increased graduation rates.
5) If the obsolescence of career tech--the chief purpose of which is to make school more relevant to students who may otherwise see no point in sticking around--has coincided with marked increases in graduation rates, then there's not much here to justify keeping it at all, much less expanding it.

Now, Petrilli did call on "budding education policy scholars" to test the hypothesis that would-be dropouts are depressing overall score growth. Coming up in my next post, I'll estimate the plausibility that decreased dropout accounts for the observed flattening of NAEP scores and the decreases in SAT scores. I can't "prove it empirically," as Petrilli says, but I think I can make a case for it being the probable explanation.


Friday, August 14, 2015

The Problem We All Live With, Pt. 2

Finally listened to part II of the TAL feature on school segregation. Another compelling episode, this time focusing on the challenges of realizing integration given the current legal and sociological climate.

The part at the end, about the insufficiency of Race to the Top's "compensatory" model for school turnaround, didn't seem consistent with the rest of the feature, though. Take the Hartford segment, for example: the story insisted that minority students in the magnet schools thrived because of the influx of resources. From the transcript:

"And if [the magnet schools] stop meeting the quota [of white students], they don't get money. If they don't get money, they can't create new schools to reach more kids. In fact, the magnet schools that already exist will lose the money that makes them such great schools. The stars will stop shining in the planetarium."

If that's the takeaway from Hartford--a connection between money and being a "great school" that could not be any more direct--am I wrong to see it as self-contradictory to close the show by condemning a program that has given unprecedented financial assistance to high-poverty, high-minority schools? Instead, the hosts dismiss the approach:

"And it's more comfortable to say that it's not an issue about racism. It's just an issue about high poverty schools that need help and need more money and need more resources."

Good Schools

This article, together with last week's episode of This American Life, heartbreakingly demonstrates how the words "good schools" usually work as nothing more than coded language for segregationists. In both cases it's very clear from white parents' responses that their idea of what made their schools "good" had absolutely nothing to do with anything educators were doing. In my experience, people usually don't spell out what they mean by "good schools," but their revealed preferences tell the story:

"The role of race in choosing schools was so pronounced that parents actually put their kids in lower-performing schools rather than enroll them in a higher-performing school with large numbers of minority students."


Image via flickr/dcjohn