COVID Dashboard Upgrade – DZone Web Dev

Back in 2020, I wrote a COVID dashboard using Rails 6.0.2 and Ruby 2.5.1. Given a date range, it displays line graphs of cumulative and incremental data for the number of confirmed, deceased, and recovered cases. Its data sources are:

  1. COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University: This GitHub repository provides one CSV file per day with country-level cumulative data.

  2. covid19india.org: This website used to provide state-wise statistics for India but has unfortunately ceased operations. State-level data are available only through 30-Oct-2021 and can be obtained in JSON format through a REST API.

CSSE provides cumulative data every day, whereas covid19india.org provides daily incremental data.

I recently upgraded the application. In this article, I describe the motivation and the upgrade details, and share code snippets.

Goal

The system had two Python scripts and two Ruby scripts to generate datasets in CSV format. The data are inserted into the database that the Rails dashboard accesses.

The four generator scripts are:

  • gdc.py: GDC stands for global daily cumulative. The script takes a CSV file as a command-line argument, extracts the data for each country, and prints them out. The output is redirected to a file in the datasets/global_daily_cumulative folder. The command to run it in a console is:
python3 gdc.py ../COVID-19/csse_covid_19_data/csse_covid_19_daily_reports/04-25-2020.csv > datasets/global_daily_cumulative/04-25-2020.csv

  • gdd.py: GDD stands for global daily delta. The script calculates the delta numbers for one day by subtracting the previous day’s cumulative numbers from that day’s cumulative numbers. The output is redirected to a file in the datasets/global_daily_delta folder. The command to run it in a console is:
python3 gdd.py datasets/global_daily_cumulative/04-25-2020.csv > datasets/global_daily_delta/04-25-2020.csv

  • idd.rb: IDD stands for India daily delta. The script extracts the data for one day, the date of which is supplied as a command-line argument. The output is redirected to a file in the datasets/india_daily_delta folder. The command to run it in a console is:
ruby idd.rb 25-Apr-20 > datasets/india_daily_delta/04-25-2020.csv

Note that covid19india.org uses the date format dd-Mon-yy (for example, 25-Apr-20).

  • idc.rb: IDC stands for India daily cumulative. The script calculates the cumulative data for a day by adding that day’s incremental (delta) numbers to the previous day’s cumulative numbers. The output is redirected to a file in the datasets/india_daily_cumulative folder. The command to run it in a console is:
ruby idc.rb 04-25-2020 > datasets/india_daily_cumulative/04-25-2020.csv
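
The delta and cumulative datasets are related by simple arithmetic, which can be sketched as follows. The method names and the sample figures are invented for illustration; this is not the code of gdd.py or idc.rb.

```ruby
# Each dataset is modelled as a Hash: place => [confirmed, deaths, recovered].
# Hypothetical helpers with invented sample numbers.

# Delta for a day = that day's cumulative minus the previous day's cumulative.
# A place missing from the previous day is treated as starting from zero.
def daily_delta(today_cumulative, yesterday_cumulative)
  today_cumulative.to_h do |place, counts|
    previous = yesterday_cumulative.fetch(place, [0, 0, 0])
    [place, counts.zip(previous).map { |now, before| now - before }]
  end
end

# Cumulative for a day = previous day's cumulative plus that day's delta.
# In this sketch, places with no delta row for the day are not carried forward.
def daily_cumulative(yesterday_cumulative, today_delta)
  today_delta.to_h do |place, counts|
    previous = yesterday_cumulative.fetch(place, [0, 0, 0])
    [place, counts.zip(previous).map { |delta, before| before + delta }]
  end
end

yesterday = { "Australia" => [100, 5, 40] }
today     = { "Australia" => [130, 6, 55] }

delta = daily_delta(today, yesterday)
puts delta["Australia"].join(",")              # prints 30,1,15
puts daily_cumulative(yesterday, delta) == today # prints true
```

The two operations are inverses of each other, which is why the global data (published as cumulative) and the India data (published as deltas) can both end up in cumulative and delta tables.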

The daily_all.rb program reads all the CSV files in each dataset sub-directory and collates their content into one aggregated file per dataset. For example, the content of all files in the global_daily_cumulative folder is collated into global_daily_cumulative.csv. Each line in an aggregated file has the data’s date as its first field. The file names are:

  • global_daily_cumulative.csv
  • global_daily_delta.csv
  • india_daily_cumulative.csv
  • india_daily_delta.csv

The generated daily files and the four aggregated data files all have the same data format: date, place, confirmed, deaths, recovered.
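
The collation step could look roughly like the sketch below. It assumes that each daily file is named mm-dd-yyyy.csv, has no header row, and already carries the date in its first field (as the shared format above suggests); the method name collate_daily_files and the sample rows are hypothetical, not taken from daily_all.rb.

```ruby
require 'csv'
require 'date'
require 'tmpdir'

# Concatenate all daily files in `folder` into `master_path`, in date order.
# File names are parsed as dates because a plain string sort of mm-dd-yyyy
# names would put 01-01-2021 before 12-31-2020.
# `master_path` is expected to live outside `folder`.
def collate_daily_files(folder, master_path)
  daily_files = Dir[File.join(folder, '*.csv')].sort_by do |path|
    Date.strptime(File.basename(path, '.csv'), '%m-%d-%Y')
  end

  CSV.open(master_path, 'w') do |master|
    daily_files.each do |path|
      CSV.foreach(path) { |row| master << row }
    end
  end
  daily_files.size
end

# Usage with throwaway files and invented numbers.
Dir.mktmpdir do |dir|
  folder = File.join(dir, 'global_daily_cumulative')
  Dir.mkdir(folder)
  File.write(File.join(folder, '01-01-2021.csv'), "2021-01-01,Australia,20,2,10\n")
  File.write(File.join(folder, '12-31-2020.csv'), "2020-12-31,Australia,10,1,5\n")

  master = File.join(dir, 'global_daily_cumulative.csv')
  collate_daily_files(folder, master)
  puts File.read(master) # the 2020-12-31 row comes first
end
```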

The problem was that I had to run all four programs for each day’s data. Of course, I could bunch a week’s worth of commands in a script file and run that, but preparing it still took manual effort.

I used to insert the data into PostgreSQL by using psql passing the CSV file as an argument. A typical command was as follows (on Windows):

"C:\Program Files\PostgreSQL\13\bin\psql.exe" -h localhost -U postgres -d covid19 -c "SET client_encoding TO 'UTF8';" -c "copy global_daily_cumulative(date, place, confirmed, deaths, recovered) FROM 'E:CodeCorona2020covid19global_daily_cumulative.csv' DELIMITER ',' CSV;"

On Linux, the command would be:

psql -d covid19 -c "SET client_encoding TO 'UTF8';" -c "copy india_daily_delta(date, place, confirmed, deaths, recovered) FROM '/var/www/datasets/covid19/india_daily_delta.csv' DELIMITER ',' CSV;"  

The complete workflow is shown in the following diagram:

There are three manual steps. My goal was end-to-end automation: do a git clone or pull to my computer and run one Ruby file. That’s it. I also intended to upgrade the tech stack of the dashboard.

Implementation

Ruby is my language of choice, so I ported the two Python programs to Ruby as well. The four generator scripts became Ruby interactor classes, with the business logic in the call method. Similarly, I encapsulated the psql commands in an interactor called InsertCovidDataInDb. It inserts the data in the files global_daily_cumulative.csv, global_daily_delta.csv, india_daily_cumulative.csv, and india_daily_delta.csv into the tables global_daily_cumulative, global_daily_delta, india_daily_cumulative, and india_daily_delta, respectively.
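
As a rough illustration of wrapping the psql commands in Ruby, the sketch below builds the argument list for one psql call per table. The helper psql_copy_args and the constants are hypothetical, and the file location is borrowed from the Linux command shown earlier; the actual InsertCovidDataInDb code may differ.

```ruby
# Columns shared by all four CSV files and tables.
COLUMNS = %w[date place confirmed deaths recovered].freeze
TABLES  = %w[global_daily_cumulative global_daily_delta
             india_daily_cumulative india_daily_delta].freeze

# Build the argument list for one psql invocation (hypothetical helper).
# Passing the arguments to Kernel#system as an array bypasses the shell,
# so the embedded quotes need no extra escaping.
def psql_copy_args(table, csv_path, database: 'covid19')
  copy_sql = "copy #{table}(#{COLUMNS.join(', ')}) " \
             "FROM '#{csv_path}' DELIMITER ',' CSV;"
  ['psql', '-d', database,
   '-c', "SET client_encoding TO 'UTF8';",
   '-c', copy_sql]
end

TABLES.each do |table|
  args = psql_copy_args(table, "/var/www/datasets/covid19/#{table}.csv")
  puts args.last
  # system(*args)  # uncomment to actually run psql against your database
end
```

Each aggregated file shares its base name with its target table, so one loop over the table names covers all four inserts.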

An interactor is a Ruby class that includes the Interactor module (from the interactor gem) and defines a call method that holds the business logic. You invoke call on the class itself, without creating an object first; the usage is akin to a static method in Java. The file gdc.rb has the interactor class GenerateGdcData, gdd.rb has GenerateGddData, idd.rb has GenerateIddData, and idc.rb has GenerateIdcData. As an example, the code for GenerateGdcData is given below:

require 'interactor'
require 'date'
require 'csv'

$indexes_to_read = {}
$indexes_to_read['format1'] = [1,3,4,5]
$indexes_to_read['format2'] = [3,7,8,9]
   
class GenerateGdcData
    include Interactor
           
    def call
        country_data_hash = {}
        Dir[context.folder + '/*.csv'].each do |file_path|
            file_name = File.basename(file_path, ".*")
           
            # if file_name is before 03-22-2020, file_format = format1 else format2
            file_date     = Date.new(file_name[6..9].to_i, file_name[0..1].to_i, file_name[3..4].to_i)
            file_format   = file_date < Date.new(2020, 03, 22) ? "format1" : "format2"
            file_date_str = file_date.to_s
           
            country_index     = $indexes_to_read[file_format][0]
            confirmed_index   = $indexes_to_read[file_format][1]
            deaths_index      = $indexes_to_read[file_format][2]
            recovered_index   = $indexes_to_read[file_format][3]

            CSV.foreach(file_path, headers: true) do |row|
                country = row[country_index]
                if country == "Mainland China"
                    country = "China"
                elsif country == "Korea, North"
                    country = "North Korea"
                elsif country == "Korea, South"
                    country = "South Korea"
                elsif country == "Gambia, The"
                    country = "Gambia"
                elsif country == "Bahamas, The"
                    country = "Bahamas"
                elsif country == "The Bahamas"
                    country = "Bahamas"
                elsif country == "The Gambia"
                    country = "Gambia"
                end

                row[confirmed_index] = row[confirmed_index] ? row[confirmed_index] : 0
                row[deaths_index]    = row[deaths_index]    ? row[deaths_index]    : 0
                row[recovered_index] = row[recovered_index] ? row[recovered_index] : 0
               
                confirmed = row[confirmed_index].to_i
                deaths    = row[deaths_index].to_i
                recovered = row[recovered_index].to_i
               
                if file_format == "format2"
                    row[0] = row[0] ? row[0] : 0
                    row[5] = row[5] ? row[5] : 0.0
                    row[6] = row[6] ? row[6] : 0.0
                end

                # Look up with the same key that is used for insertion (date string, not
                # Date object); otherwise province rows overwrite instead of adding up.
                if country_data_hash.has_key? [file_date_str,country]
                    country_data_hash[[file_date_str,country]][0] += confirmed
                    country_data_hash[[file_date_str,country]][1] += deaths
                    country_data_hash[[file_date_str,country]][2] += recovered
                else
                    country_data_hash[[file_date_str,country]] = [confirmed, deaths, recovered]
                end
            end
        end
        context.gdc = country_data_hash
    end
end
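
The core of the class is the roll-up of province-level rows into one total per date and country. Here is a minimal sketch of that step in isolation, using an invented, simplified CSV layout and a hypothetical method name (aggregate_by_country), with columns addressed by header name instead of index:

```ruby
require 'csv'

# Invented sample: one row per province, blank cells where data is missing.
SAMPLE_ROWS = <<~ROWS
  Province_State,Country_Region,Confirmed,Deaths,Recovered
  New South Wales,Australia,10,1,4
  Victoria,Australia,20,2,
  ,India,7,0,3
ROWS

# Sum confirmed, deaths and recovered per [date, country] key.
def aggregate_by_country(csv_text, date_str)
  # Unknown keys start at [0, 0, 0], so no has_key? check is needed.
  totals = Hash.new { |hash, key| hash[key] = [0, 0, 0] }
  CSV.parse(csv_text, headers: true) do |row|
    counts = totals[[date_str, row['Country_Region']]]
    counts[0] += row['Confirmed'].to_i
    counts[1] += row['Deaths'].to_i
    counts[2] += row['Recovered'].to_i # nil.to_i is 0, so blank cells add nothing
  end
  totals
end

aggregate_by_country(SAMPLE_ROWS, '2020-04-25').each do |(date, country), counts|
  puts [date, country, *counts].join(',')
end
# prints:
# 2020-04-25,Australia,30,3,4
# 2020-04-25,India,7,0,3
```
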

The main program is in a new file, generate_covid19_data.rb. Its functionality is straightforward: call the five interactors.
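
A minimal sketch of what such a main program could look like is given below. To keep it runnable without the interactor gem, it uses a small stand-in module (MiniInteractor, hypothetical) that only mimics the class-level call; the five step classes are placeholders that merely record their names and are not the actual code of generate_covid19_data.rb.

```ruby
# Stand-in for the pattern only; this is NOT the interactor gem's implementation.
module MiniInteractor
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Class-level entry point: build an instance, run #call, return the context.
    def call(context = {})
      new(context).tap(&:call).context
    end
  end

  attr_reader :context

  def initialize(context)
    @context = context
  end
end

# Order matters: the delta/cumulative steps build on the step before them,
# and the database insert runs last.
STEP_NAMES = %w[GenerateGdcData GenerateGddData GenerateIddData
                GenerateIdcData InsertCovidDataInDb].freeze

# Placeholder classes; each one just records that it ran.
steps = STEP_NAMES.map do |name|
  Class.new do
    include MiniInteractor
    define_method(:call) { context[:ran] << name }
  end
end

context = { ran: [] }
steps.each { |step| step.call(context) }
puts context[:ran].join(' -> ')
```

With the real gem, each class would include Interactor instead and read its inputs from context, as GenerateGdcData does with context.folder.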


Along with the automation, I upgraded the stack: Ruby from 2.5.1 to 3.1.1; Rails from 6.0.2 to 7.0.2.3 with importmap and Tailwind; Chartkick from 3.4.2 to 4.1.3; and PostgreSQL from 12 to 14.

Whenever I upgrade an existing application to Rails 7, my approach is to generate a sample application and copy the non-functional files over to the existing application. Specifically, these are the bin, lib, and app/javascript folders. From there, I make other changes depending on which JavaScript packaging method I have chosen. In this case, I changed from Webpacker to importmap, so I needed to make sure that the importmap files (bin/importmap and config/importmap.rb in a standard importmap-rails setup) were present and that application.html.erb contained javascript_importmap_tags.

How to Run

Clone CSSE GitHub repo.

$ git clone https://github.com/CSSEGISandData/COVID-19.git

Clone my repository.

$ git clone https://github.com/mh-github/covid19.git

Create a database named covid19 in PostgreSQL and create the four tables in it. The commands are available from line 92 onwards on GitHub.

Update covid19/rails/dashboard/config/database.yml with your database username, password, and port number. Enter the same values in the file insert_covid_data_in_db.rb as well.

Go to the project root folder:

$ cd covid19

You will need Ruby 3.1.1. If you use rvm, install and select this version:

$ rvm install 3.1.1
$ rvm use 3.1.1

Install the httparty, pg, and interactor gems if you don’t already have them on your system.

$ gem install httparty pg interactor

Insert data into the database.

$ ruby generate_covid19_data.rb path/to/COVID-19

Go to the dashboard folder:

$ cd rails/dashboard

Switch to Ruby 3.1.1.

$ rvm use 3.1.1

Install the required gems.

$ bundle install

Run the server.

$ rails server

Access the dashboard at http://localhost:3000 in your browser. Here’s a screenshot showing graphs for Australia.

I added code to capture data insertion times. psql seems pretty fast. Here are the numbers:

table                     rows     time (seconds)
global_daily_cumulative   171502   3.09
global_daily_delta        171502   3.087
india_daily_cumulative    21677    0.335
india_daily_delta         21195    0.323

These are not bad considering that my infrastructure is a desktop PC with an i5-4570 CPU @ 3.20GHz, 20 GB RAM, Windows 10 Pro v21H1, and WSL2. I run PostgreSQL on Windows and my main program in WSL-Ubuntu.

Conclusion

Regardless of project size, there are always opportunities for refactoring and automation. Even if your program is a basic tutorial-type project, upgrading its tech stack to the latest versions gives you practice and learning that you can apply to your work applications.
