Monday, March 28, 2016

MySQL, Counting chain of continuous records based on a condition


In this post we’re talking about how to get the maximum count of continuous records that satisfy a predefined property.

The problem

Let's imagine that we want to make a query for a school that gets the student names with the maximum number of continuous semesters when the student got a GPA that’s less than 2 with the following conditions

  • If the student's GPA for the semester is less than 2, count it in.
  • if it’s a summer term with a GPA < 2, it’s not counted.
  • if it’s a summer term with GPA > 2 it breaks the chain and counting starts over for that student.

Here’s an example

Let's consider that semesters are numbered as 1, 2 and 3, such that 3 is for the summer one.
If we have a table structure like that

student
year
semester
gpa
John
2012
1
1.2
John
2012
2
1.5
John
2012
3
1.0
John
2013
1
2.0

John’s count here is 2.
explanation : The first two semesters of 2012 are with GPA that's less than 2, so they’re counted in, summer term is not counted as per the second condition above and the first semester of 2013 is not counted because it’s GPA >= 2.0 and we are interested only in GPAs that’s < 2.

The solution

For something with that kind of complicated conditions, I thought it would be reasonable to use MySQL variables, don’t be scared, I promise they are way easier and natural than you first thought, let’s review what we want to do with them.
  • The variable needs to increment whenever a non-summer semester with a GPA < 2.0 is encountered.
  • The variable needs to reset whenever a semester of GPA >= 2.0 is encountered or the student has changed.

So, let’s start with the query


 SET @counter=0;
SET @student="";
SELECT student, max(semesterCount) as maxSemesterCount FROM (
 SELECT student,
   ( @counter:=IF(@student!=student, 0, @counter) ) AS counterReset,
   ( @student:=IF(@student!=student, student, @student) ) AS studentChange,
   ( @counter:=IF(gpa<2 AND semester!=3, @counter+1,
                 IF(gpa<2 AND semester!=3, @counter,
                 0)
               )
   ) AS semesterCount
 FROM (
   SELECT * FROM student_gpas
      ORDER BY student, educationalYear, semester
 ) q2
) q1
GROUP BY student;
Okay, let's break things down below and it’ll be a lot clearer, let’s examine the query line by line.

  • We start by initializing our two variables to some initial values, these are the variables that will be used to track things down inside the query.
  • Next we select the students and the maximum streak of semesters.
  • Inside the braces of the FROM clase is where all the magic happens (I know, there isn’t any, but yeah), we start off with selecting the student, then check if the student has changed since last row (think of the query as a loop), if yes then the counter variable is reset, otherwise it’s not changed (conditions that change it based on the GPA are coming next, we’re just separating things here for more clarity).
  • Next, same check for the student, if changed since last record, we assign the new value to the student variable
  • Check if it’s a GPA less than 2.0 with a non-summer term, if so, increment the counter variable, otherwise no change
  • Else if it’s a GPA less than 2.0 but a summer term, no change for the variable
  • Else it’s definitely a GPA  that’s more than 2.0, so reset the counter variable.

That would give something like this


student
maxSemesterCount
John
2
Sarah
5
Christopher
3

A couple of things to note here


  • This query can very much be optimized from many aspects, that’s the barebones version, the least that you can do is to reduce the number of records that’s being counted inside the query.
  • Notice that we didn’t get the values directly from the student_gpas table, the reason for that is that we want the semesters to be consecutive to be able to count them properly, otherwise numbers would differ with each run.
  • Views in MySQL don’t accept subqueries or variables, so if you want to use this query in a report designer or run from code there’s a silly workaround that works, create a table with that structure (student, maxSemesterCount fields) and insert these values in it at the time you open the report and make the report get its data from that table, I’m going to think of a better solution or if one of you guys may have been in the same situation and got to a real solution, please let me know, but for now, this works with a performance hit indeed.


That's it thank you for reading I hope that helps.

No comments:

Post a Comment