todo-tracker - JavaScript

In this log, I'll go through specific coding challenges and solutions I ran into while developing todo-tracker, highlighting the nuances of file traversal, multi-threading, and working out how tasks changed between scans.

Setting Up and Reading Files

Initially, I set up the project to handle JSON files, which are crucial for comparing tasks. Here's how I approached reading and parsing the JSON config file that the program loads at startup:

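// load config from file
import args from "./args";

// read using Bun's file API; args.config holds the path to the config file
const file = Bun.file(args.config);
if (!(await file.exists())) throw new Error(`Config file not found: ${args.config}`);
// parse the contents as JSON (top-level await works in Bun modules)
const config = await file.json();

if (!config.tags) throw new Error("No tags found in config file");

// ignore is optional and defaults to an empty list
const { tags, ignore = [] } = config;

// export
export default { tags, ignore };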

This small module lays the groundwork for the rest of the program: every later step depends on the tags and ignore patterns it exports. Because we're using JavaScript and not TypeScript, I decided not to do any input validation here beyond checking that tags exists. I was mainly focused on moving as fast as possible with this project, and I'll iron out the details in the TypeScript rewrite.
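
If a bit more safety is wanted before that rewrite, a few runtime checks go a long way. It's worth knowing that TypeScript's types alone wouldn't catch a malformed config either, since the JSON only exists at runtime. Below is a minimal sketch; validate_config is a hypothetical helper, and the rules it enforces (tags is a non-empty array of strings, ignore is an optional array of valid regex strings) are my assumptions about what the config should contain.

```javascript
// Hypothetical helper: validates the parsed config object at runtime.
// The exact rules are assumptions about what todo-tracker's config should hold.
function validate_config(config) {
    if (config === null || typeof config !== "object" || Array.isArray(config)) {
        throw new Error("Config must be a JSON object");
    }

    const { tags, ignore = [] } = config;

    // tags: a non-empty array of non-empty strings, e.g. ["TODO", "FIXME"]
    const tags_ok = Array.isArray(tags) && tags.length > 0 &&
        tags.every((t) => typeof t === "string" && t.length > 0);
    if (!tags_ok) {
        throw new Error("Config 'tags' must be a non-empty array of strings");
    }

    // ignore: optional array of regex source strings
    if (!Array.isArray(ignore) || !ignore.every((p) => typeof p === "string")) {
        throw new Error("Config 'ignore' must be an array of strings");
    }

    // Compile each pattern now so a typo fails at startup with the
    // offending pattern in the message, instead of inside the file filter
    const ignore_regexes = ignore.map((pattern) => {
        try {
            return new RegExp(pattern);
        } catch (error) {
            throw new Error(`Invalid ignore pattern "${pattern}": ${error.message}`);
        }
    });

    return { tags, ignore, ignore_regexes };
}

// Usage with an in-memory object standing in for the parsed config file
const checked = validate_config({ tags: ["TODO", "FIXME"], ignore: ["(^|/)node_modules/"] });
```

If adopted, the loader could export the result of validate_config(config) instead of relying on the single tags check, and the file filter could reuse the precompiled ignore_regexes.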

Multi-threading with Bun: File Processing

One of the core features is multi-threading using Bun's Worker API, aimed at speeding up the processing of the files found during traversal. Here's a snippet showing how I set up and used these workers:

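// Split files array into chunks for each worker
const num_workers = 4; // Or any number you find appropriate
const chunk_size = Math.ceil(files.length / num_workers);

// Drop empty chunks (fewer files than workers) so no worker starts with nothing to do
const chunks = Array.from({ length: num_workers }, (_, i) =>
    files.slice(i * chunk_size, (i + 1) * chunk_size)
).filter(chunk => chunk.length > 0);

const workers = [];
const promises = [];

for (const chunk of chunks) {
    // Each worker runs file_worker.js on its own thread
    const worker = new Worker(new URL('./file_worker.js', import.meta.url));
    workers.push(worker);
    const promise = new Promise((resolve, reject) => {
        // The worker posts its results back once it has scanned its chunk
        worker.onmessage = (message) => {
            resolve(message.data);
        };
        // Reject rather than only logging; otherwise Promise.all never settles
        worker.onerror = (error) => {
            console.error(`Worker error: ${error.message}`);
            reject(error);
        };
    });
    promises.push(promise);
    // Send the chunk, the tags to look for and the root directory (DIR is defined elsewhere)
    worker.postMessage({ chunk, config: config.tags, directory: DIR });
}

let results;
try {
    // Wait for all workers to complete, then merge their result arrays
    results = (await Promise.all(promises)).flat();
} finally {
    // Clean up, even if a worker failed
    workers.forEach(worker => worker.terminate());
}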

In this segment, the file list is divided into chunks and each chunk is handed to its own worker, so files are scanned in parallel instead of one after another. The workers don't need to communicate with each other, since each one only READs its own files and sends back its results; with no shared state to coordinate, this kind of work is a natural fit for multi-threading (I'm learning about this at University right now!). How much faster it is than a single-threaded scan will depend on the number of CPU cores and on whether disk reads or text scanning dominate.
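
file_worker.js itself isn't shown, so here's a minimal sketch of the per-file scan a worker could run. Everything in it is an assumption: find_tasks and escape_regex are hypothetical names, tags are treated as plain words like TODO or FIXME matched case-sensitively on word boundaries, and each task comes back as { file, line, tag, text }, the fields the diff logic later compares. It's plain JavaScript with no Bun or Node APIs, so it can be tested outside a worker.

```javascript
// Hypothetical helper: escape characters that have special meaning in a RegExp
function escape_regex(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical per-file scan. Assumes tags are plain words (letters, digits, _)
// and matches them case-sensitively on word boundaries, so "TODOS" is skipped.
function find_tasks(file, contents, tags) {
    // One pattern for every tag, e.g. \b(TODO|FIXME)\b followed by the task text;
    // an optional ":" and whitespace after the tag are not part of the text
    const tag_regex = new RegExp(`\\b(${tags.map(escape_regex).join("|")})\\b[:\\s]*(.*)$`);

    const tasks = [];
    // Split on \n or \r\n so Windows line endings don't leak into the text
    contents.split(/\r?\n/).forEach((line_text, index) => {
        const match = tag_regex.exec(line_text);
        if (match) {
            // Line numbers are 1-based, like in an editor
            tasks.push({ file, line: index + 1, tag: match[1], text: match[2].trim() });
        }
    });
    return tasks;
}

const source = "let a = 1;\n// TODO: fix the parser\nconst TODOS = [];\n# FIXME(sam) refactor\n";
const found = find_tasks("src/app.js", source, ["TODO", "FIXME"]);
```

Inside file_worker.js, the message handler would then read each path in chunk (relative to directory), run find_tasks on its contents, and postMessage the combined array back to the main thread, which is what the onmessage handler in the snippet above resolves with.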

Handling Directory Traversal

Directory traversal relies on Node's readdir function, with the results then filtered to drop files we don't care about. The implementation looked like this:

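import { readdir } from "node:fs/promises";

// List everything under DIR (the root being scanned); recursive: true walks
// subdirectories and returns paths relative to DIR (directories are listed too)
const crawl_files = await readdir(DIR, { recursive: true });

// Compile each ignore pattern from the config into a RegExp once
const ignore_regexes = config.ignore.map((i) => new RegExp(i));

// Keep only paths that match none of the ignore patterns
const files = crawl_files.filter((f) => {
    return !(ignore_regexes.some((r) => r.test(f)));
});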

Here, readdir lists every entry under the root directory, and the regular expressions from the project's config remove any files and directories the user marked as irrelevant, so the workers only scan files that matter. One improvement I would've loved to figure out is using the .gitignore file to decide what to skip, but I couldn't find an easy and quick solution that handled both recursively searching the file tree while respecting .gitignore files, and converting the .gitignore entries into something we can compare against the file paths.
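
For simple cases, the conversion half can be approximated. Here's a rough sketch: gitignore_to_regex is a hypothetical helper, not a library function, and it turns one .gitignore line into a RegExp tested against forward-slash paths relative to the repository root, so its output could sit alongside the regexes from the config. It deliberately skips negation (!), escaped characters, bracket expressions and .gitignore files in subdirectories; for full compatibility, a dedicated gitignore-matching library or asking git itself (for example with git check-ignore) would be safer.

```javascript
// Hypothetical helper, NOT a full .gitignore implementation: converts one
// .gitignore line into a RegExp for forward-slash paths relative to the repo
// root. Negation ("!"), escapes, [...] classes and nested .gitignore files
// are not supported.
function gitignore_to_regex(line) {
    let pattern = line.trim();
    // Blank lines and comments match nothing; negation is skipped in this sketch
    if (pattern === "" || pattern.startsWith("#") || pattern.startsWith("!")) return null;

    // A trailing slash means the pattern only matches directories
    const dir_only = pattern.endsWith("/");
    if (dir_only) pattern = pattern.slice(0, -1);

    // A slash at the start or in the middle anchors the pattern to the root;
    // without one it can match at any depth
    const anchored = pattern.includes("/");
    if (pattern.startsWith("/")) pattern = pattern.slice(1);

    // Translate glob syntax into regex syntax, escaping everything else
    let body = "";
    for (let i = 0; i < pattern.length; i++) {
        const c = pattern[i];
        if (c === "*" && pattern[i + 1] === "*") {
            if (pattern[i + 2] === "/") {
                body += "(?:.*/)?"; // "**/" matches zero or more directories
                i += 2;
            } else {
                body += ".*"; // any other "**" (e.g. "abc/**") matches anything, slashes included
                i += 1;
            }
        } else if (c === "*") {
            body += "[^/]*"; // "*" never crosses a directory boundary
        } else if (c === "?") {
            body += "[^/]";
        } else {
            body += c.replace(/[.+^${}()|[\]\\]/g, "\\$&");
        }
    }

    const prefix = anchored ? "^" : "(?:^|/)";
    // Directory patterns need something inside the directory after them;
    // other patterns match a whole path segment (file or directory)
    const suffix = dir_only ? "/" : "(?:/|$)";
    return new RegExp(prefix + body + suffix);
}

// Usage: turn a .gitignore file's contents into a single predicate
const gitignore_text = "# dependencies\nnode_modules/\n*.log\n/build\ndocs/**/*.md\n";
const gitignore_regexes = gitignore_text.split(/\r?\n/).map(gitignore_to_regex).filter(Boolean);
const is_git_ignored = (path) => gitignore_regexes.some((r) => r.test(path));
```

On Windows, readdir returns backslash-separated paths, so they would need converting to forward slashes before being tested against these regexes.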

In the future, .gitignore support would be a big optimisation to make, but for now I decided to add an array of regex patterns to the config that the user can set. This should be flexible enough for most use cases, and when deploying this project I could pre-bake some template config files for quick setup.
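
As an illustration of such a template, here's a hypothetical config for a JavaScript project; the specific patterns are my own picks, not ones todo-tracker ships with. Two details are easy to trip over. In config.json the patterns are JSON strings, so a literal dot is written with a doubled backslash, exactly as in the JavaScript string literals below. And an unanchored, unescaped pattern matches far more than intended.

```javascript
// Hypothetical template config for a JavaScript project. In config.json these
// would be JSON strings; backslashes are doubled there exactly as they are here.
const js_template = {
    tags: ["TODO", "FIXME"],
    ignore: [
        "(^|/)node_modules/", // any node_modules directory, at any depth
        "(^|/)\\.git/",       // the .git directory; the dot is escaped
        "(^|/)dist/",         // build output
        "\\.min\\.js$",       // minified bundles
    ],
};

// Same filtering idea as the readdir snippet: compile once, test every path
const template_regexes = js_template.ignore.map((i) => new RegExp(i));
const is_ignored = (path) => template_regexes.some((r) => r.test(path));

// Pitfall: "." matches any character and nothing anchors the match,
// so this also matches ordinary source files such as "src/digits.js"
const naive_git = new RegExp(".git");
```

Anchoring on (^|/) and requiring a trailing slash means a pattern only fires on a whole directory name, so a file like .gitignore or my_node_modules_notes.md is left alone.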

Diving into the Diff Logic

The core of todo-tracker is its ability to work out how tasks changed between a base set of tasks and a new one. The diff.js script handles this:

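// same_text, base_tasks and new_tasks are defined elsewhere in diff.js
const diffs = [];
// Base tasks already paired with a new task; whatever is left was deleted
const matched = new Set();

new_tasks.forEach(new_task => {
    // Same text as an unpaired base task: the task stayed put or moved
    const same_text_result = base_tasks.find(bt => !matched.has(bt) && same_text(bt, new_task));
    if (same_text_result) {
        matched.add(same_text_result);
        if (same_text_result.line !== new_task.line || same_text_result.file !== new_task.file) {
            diffs.push({ ...new_task, type: 'MOVE' });
        } else {
            diffs.push({ ...new_task, type: 'SAME' });
        }
    } else {
        // Same file, line and tag but different text: edited in place
        const same_line_result = base_tasks.find(bt =>
            !matched.has(bt) && bt.file === new_task.file && bt.line === new_task.line && bt.tag === new_task.tag
        );
        if (same_line_result) {
            matched.add(same_line_result);
            diffs.push({ ...new_task, type: 'UPDATE' });
        } else {
            diffs.push({ ...new_task, type: 'NEW' });
        }
    }
});

// Any base task that was never paired has been removed
base_tasks.forEach(base_task => {
    if (!matched.has(base_task)) {
        diffs.push({ ...base_task, type: 'DELETE' });
    }
});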

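This code sorts each task into one of five categories. A new task whose text matches a base task is 'SAME' if its file and line are unchanged, or 'MOVE' if either differs. Otherwise, if a base task with the same tag sits on the same line, the text must have been edited, so it's an 'UPDATE'; failing that, it's 'NEW'. Finally, every base task with no counterpart among the new tasks is marked 'DELETE'.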
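
To check the categories by hand, here's the same decision order wrapped in a pure function with some sample data. It's a sketch under assumptions: diff_tasks is a hypothetical name, tasks are plain objects shaped { file, line, tag, text }, "same text" means equal text fields, and each base task can be paired with at most one new task (tracked in a Set), so leftovers become DELETE without relying on ids.

```javascript
// Hypothetical pure version of the diff loop. Assumes tasks look like
// { file, line, tag, text } and that "same text" means equal `text` fields.
function diff_tasks(base_tasks, new_tasks) {
    const same_text = (a, b) => a.text === b.text;
    const matched = new Set(); // base tasks already paired with a new task
    const diffs = [];

    for (const new_task of new_tasks) {
        const same_text_result = base_tasks.find((bt) => !matched.has(bt) && same_text(bt, new_task));
        if (same_text_result) {
            matched.add(same_text_result);
            const moved = same_text_result.line !== new_task.line || same_text_result.file !== new_task.file;
            diffs.push({ ...new_task, type: moved ? "MOVE" : "SAME" });
            continue;
        }
        const same_line_result = base_tasks.find((bt) =>
            !matched.has(bt) && bt.file === new_task.file && bt.line === new_task.line && bt.tag === new_task.tag
        );
        if (same_line_result) {
            matched.add(same_line_result);
            diffs.push({ ...new_task, type: "UPDATE" });
        } else {
            diffs.push({ ...new_task, type: "NEW" });
        }
    }

    // Base tasks nobody claimed were removed
    for (const base_task of base_tasks) {
        if (!matched.has(base_task)) diffs.push({ ...base_task, type: "DELETE" });
    }
    return diffs;
}

const example_base = [
    { file: "a.js", line: 3, tag: "TODO", text: "fix parser" },
    { file: "a.js", line: 10, tag: "TODO", text: "add tests" },
    { file: "b.js", line: 1, tag: "FIXME", text: "handle null" },
    { file: "b.js", line: 7, tag: "TODO", text: "remove me" },
];
const example_new = [
    { file: "a.js", line: 3, tag: "TODO", text: "fix parser" },        // unchanged
    { file: "a.js", line: 12, tag: "TODO", text: "add tests" },        // shifted down
    { file: "b.js", line: 1, tag: "FIXME", text: "handle undefined" }, // edited
    { file: "c.js", line: 5, tag: "TODO", text: "brand new" },         // added
];
const example_diffs = diff_tasks(example_base, example_new);
```

The Set-based pairing is the key design choice: each base task can be claimed once, so two identical TODOs in the new scan don't both map onto a single old one, and deletions fall out as whatever was never claimed.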

Reflections and Conclusions

Throughout the development of todo-tracker, I tackled multi-threading, efficient directory traversal, and the logic for telling task changes apart. While the project presented multiple challenges, especially in excluding irrelevant files and interpreting task changes, the solutions I implemented add up to a tool that scans a codebase for tasks across multiple worker threads and reports how those tasks changed.

This journey has underscored the importance of careful planning, especially when dealing with file systems and multi-threading in JavaScript. Future enhancements will focus on refining these mechanisms, ensuring even greater efficiency and user-friendliness. This documentation serves as a testament to the challenges and victories in software development, offering insights and code examples for those navigating similar paths.