30 Apr 2025

Towards Auto-Generated ERT Unit Tests

Rigorous testing clearly benefits software projects, yet many Emacs Lisp packages have minimal tests. You might think manual testing during development is enough—but that only works if the code never changes and has no evolving dependencies. Automated tests, however, give you the confidence to modify code without fear of unintentionally breaking functionality. They quickly catch issues caused by changing dependencies, and coverage tools highlight tested and untested functionality.

Benefits aside, writing test cases can feel like a chore. As an enterprise software developer, I disliked it as much as anyone. But now, as I occasionally work on Emacs Lisp packages mostly for personal use, I'm finding that a lack of automated tests costs me valuable time. We've all experienced making seemingly harmless changes, only to discover obscure bugs weeks later that automated tests might have caught immediately.

I want automated tests for my Emacs Lisp code—whether it’s a published package or just a personal library of functions—but I'd rather not write them manually. I've long dreamed of using LLMs to generate test cases. So, is this approach already viable, particularly for Emacs Lisp unit tests? Writing unit tests feels like an ideal scenario for current LLMs that may lack extensive Emacs Lisp training: unit tests are simpler than integration or performance tests, less sensitive to hallucinations, and easy to adjust or discard if problematic.

At this stage, I'm not aiming for a sleek Emacs integration. I just want to see if the approach works. Using ChatGPT in a browser with some simple copy-pasting is enough. I started by asking ChatGPT (using the o3-mini-high and 4o models) to help set up ERT tests for my personal library functions, loaded from my init.el. My goal was to run the tests externally, in batch mode, separate from my main Emacs instance.

ChatGPT performed reasonably well. After a few iterations, partly due to peculiarities in my init.el configuration, I ended up with a test file containing a dummy test that I could successfully run externally using:

emacs --batch -Q \
      -l test/test-pp-lib.el \
      -f ert-run-tests-batch-and-exit
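
A dummy test file of this kind can be very small. The sketch below is illustrative rather than the actual file from this setup: the test name pp-lib-dummy-test is a placeholder, and it assumes nothing beyond the ert library that ships with Emacs.

```elisp
;;; test-pp-lib.el --- Minimal ERT smoke test (illustrative sketch)  -*- lexical-binding: t; -*-

(require 'ert)

;; Hypothetical placeholder test: it exercises no library code and only
;; confirms that the file loads and that ERT can run a test from it.
(ert-deftest pp-lib-dummy-test ()
  "Trivial test used to verify the batch test setup."
  (should (= 2 (+ 1 1))))

;;; test-pp-lib.el ends here
```

Saved as test/test-pp-lib.el, a file like this is what the command above would load; ert-run-tests-batch-and-exit then runs every defined test and exits with a non-zero status if any of them fail.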

Next, I gave ChatGPT the code for my /org-next-visible-link function (see my previous blog post) and asked it to generate a complete suite of unit tests aimed at maximizing coverage. The generated tests looked reasonable, but several failed due to small, silly issues. Some failures were caused by missing double backslashes to properly escape [ in looking-at patterns. Others were due to ChatGPT "misunderstanding" the behavior of the function: if point is already at the beginning of a link, /org-next-visible-link will skip to the next one. It was easy enough to fix these manually.
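
The escaping problem is easy to reproduce outside of Org. The sketch below uses a hypothetical helper, my/matches-at-start-p (not part of my library), to show that an unescaped "[[A]]" is read as a character set followed by a literal ], so it does not match the link text, while the doubly escaped pattern and regexp-quote both do.

```elisp
;; Hypothetical helper for illustration only; it is not part of the library.
(defun my/matches-at-start-p (text regexp)
  "Return non-nil if REGEXP matches at the very start of TEXT."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (looking-at-p regexp)))

;; Unescaped: "[[A]" is a character set containing "[" and "A",
;; followed by a literal "]".  That cannot match the text "[[A]]".
(my/matches-at-start-p "[[A]]" "[[A]]")

;; Escaped: inside a Lisp string, each regexp backslash must be doubled.
(my/matches-at-start-p "[[A]]" "\\[\\[A\\]\\]")

;; Alternative: let `regexp-quote' do the escaping.
(my/matches-at-start-p "[[A]]" (regexp-quote "[[A]]"))
```

The first form returns nil; the other two return non-nil. Using regexp-quote in generated tests would sidestep the backslash counting entirely, at the cost of slightly longer assertions.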

However, one test kept failing. It involved text visibility, which is the core aspect that /org-next-visible-link is supposed to handle. I pasted the ERT error and backtrace into ChatGPT and asked it to find the problem. It claimed to have identified and fixed the issue, but the test failed again—a common pattern when LLMs hallucinate fixes that don’t actually work.

Next, I tried executing the test steps manually in a buffer. The test failed for me too! That’s when I realized the problem might not be in the test, but in my code. I asked ChatGPT to help me find the bug. I gave it a big hint: during manual testing, I noticed the failure occurred in a specific corner case, when a link was inside a folded section with no text following it. I explained that adding any text after the link would make the test pass. What happened next was quite impressive.

ChatGPT took its time to reason through the problem, spending about three minutes analyzing the function code, the test code, and the backtrace of the failed test. When it responded, it correctly identified a bug in my code, explained the underlying problem, and suggested a specific code change to fix it. I applied the fix, and the test passed.

No more excuses: Every non-trivial function deserves a unit test.

For completeness, here is the code of the function under test (with the fix), and the unit tests generated by ChatGPT.

;; Adapted from org-next-link to only consider visible links
(defun /org-next-visible-link (&optional search-backward)
  "Move forward to the next visible link.
When SEARCH-BACKWARD is non-nil, move backward."
  (interactive)
  (let ((pos (point))
        (search-fun (if search-backward #'re-search-backward
                      #'re-search-forward)))
    ;; Tweak initial position: make sure we do not match current link.
    (cond
     ((and (not search-backward) (looking-at org-link-any-re))
      (goto-char (match-end 0)))
     (search-backward
      (pcase (org-in-regexp org-link-any-re nil t)
        (`(,beg . ,_) (goto-char beg)))))
    (catch :found
      (while (funcall search-fun org-link-any-re nil t)
        (let ((folded (org-invisible-p (match-beginning 0) t)))
          (when (or (not folded) (eq folded 'org-link))
            (let ((context (save-excursion
                             (unless search-backward (forward-char -1))
                             (org-element-context))))
              (pcase (org-element-lineage context '(link) t)
                (link
                 (goto-char (org-element-property :begin link))
                 (throw :found t)))))))
      (goto-char pos)
      ;; No further link found
      nil)))

;;; test-pp-lib.el --- tests for pp-lib.el

;; Ensure test directory is on load-path so we can require test-helper
(add-to-list 'load-path
             (file-name-directory (or load-file-name buffer-file-name)))

(require 'test-helper)
(require 'ert)
(require 'org)     ;; for org-mode, org-element, org-link-any-re
(require 'cl-lib)  ;; for cl-letf

;;; tests for `/org-next-visible-link'

(ert-deftest org-next-visible-link-forward-basic ()
  "Move to the first link in forward direction and return non-nil."
  (with-temp-buffer
    (insert "foo [[A]] bar [[B]] baz")
    (org-mode)
    (goto-char (point-min))
    (should (/org-next-visible-link))
    (should (looking-at "\\[\\[A\\]\\]"))))

(ert-deftest org-next-visible-link-forward-second-link ()
  "Subsequent `org-next-visible-link' should find the next link."
  (with-temp-buffer
    (insert "foo [[A]] bar [[B]] baz")
    (org-mode)
    (goto-char (point-min))
    (/org-next-visible-link)
    (should (/org-next-visible-link))
    (should (looking-at "\\[\\[B\\]\\]"))))

(ert-deftest org-next-visible-link-forward-skip-current ()
  "When point is at the beginning of a link, skip it and find the next."
  (with-temp-buffer
    (insert "[[A]] [[B]]")
    (org-mode)
    (goto-char (point-min))
    (should (/org-next-visible-link))
    (should (looking-at "\\[\\[B\\]\\]"))))

(ert-deftest org-next-visible-link-backward-basic ()
  "Move backward to the nearest previous link and return non-nil."
  (with-temp-buffer
    (insert "first [[A]] then [[B]] then [[C]]")
    (org-mode)
    (goto-char (point-max))
    (should (/org-next-visible-link t))
    (should (looking-at "\\[\\[C\\]\\]"))))

(ert-deftest org-next-visible-link-backward-second-link ()
  "Second backward invocation finds the prior link."
  (with-temp-buffer
    (insert "[[A]] [[B]]")
    (org-mode)
    (goto-char (point-max))
    (/org-next-visible-link t)
    (should (/org-next-visible-link t))
    (should (looking-at "\\[\\[A\\]\\]"))))

(ert-deftest org-next-visible-link-backward-skip-current ()
  "When point is on a link, backward skips it and finds the previous."  
  (with-temp-buffer
    (insert "[[X]] [[Y]]")
    (org-mode)
    ;; position right at the beginning of Y
    (goto-char (point-min))
    (/org-next-visible-link) ; forward to [[X]]
    ;; now move to Y
    (/org-next-visible-link)
    ;; test backward skip
    (goto-char (point))
    (should (/org-next-visible-link t))
    (should (looking-at "\\[\\[X\\]\\]"))))

(ert-deftest org-next-visible-link-no-link ()
  "With no links, returns nil and point does not move."
  (with-temp-buffer
    (insert "no links here")
    (org-mode)
    (goto-char (point-min))
    (should-not (/org-next-visible-link))
    (should (= (point) (point-min)))))

(ert-deftest org-next-visible-link-skip-in-folded-headline ()
  "Skip links that reside in a folded headline body."
  (with-temp-buffer
    (org-mode)
    (insert "* Heading1\n[[SKIP]]\n* Heading2\n[[FIND]]\n")
    ;; Fold the first subtree so its body (and the [[SKIP]] link) is hidden
    (goto-char (point-min))
    (org-cycle)  ;; this folds the subtree under Heading1
    ;; Now search forward: should skip [[SKIP]] and land on [[FIND]]
    (goto-char (point-min))
    (should (/org-next-visible-link))
    (should (looking-at "\\[\\[FIND\\]\\]"))
    ;; And then no more
    (should-not (/org-next-visible-link))))

(ert-deftest org-next-visible-link-allow-org-link-invisible ()
  "Find links hidden with `invisible='org-link` overlays."
  (with-temp-buffer
    (insert "foo [[HIDDEN]] [[VISIBLE]]")
    (org-mode)
    ;; hide first link with 'org-link
    (goto-char (point-min))
    (re-search-forward org-link-any-re)
    (let ((ov (make-overlay (match-beginning 0) (match-end 0))))
      (overlay-put ov 'invisible 'org-link))
    (goto-char (point-min))
    ;; should still hit HIDDEN first
    (should (/org-next-visible-link))
    (should (looking-at "\\[\\[HIDDEN\\]\\]"))
    ;; then hit VISIBLE
    (should (/org-next-visible-link))
    (should (looking-at "\\[\\[VISIBLE\\]\\]"))
    (should-not (/org-next-visible-link))))

;;; test-pp-lib.el ends here

I can confidently say that even the current generation of ChatGPT models can generate useful ERT unit tests. Is it perfect? Of course not. For example, the org-next-visible-link-backward-skip-current test contains a bug that, only by chance, doesn't cause a failure: because point starts on [[X]], the first forward call already lands on [[Y]], and the second call finds no further link and leaves point where it was, so the backward search still happens to start from [[Y]]. It also includes a useless (goto-char (point)) call. Future LLMs will only get better: they'll be able to fix or improve existing tests and generate new ones to increase code coverage. And tighter integration with Emacs, eventually reaching the point where tests are fully auto-generated, is just a matter of time.
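
In the generated backward-skip-current test, point reaches [[Y]] only as a side effect of the two forward calls. One possible rewrite, sketched here under the assumption that /org-next-visible-link is already loaded, positions point explicitly instead. The test name with the -fixed suffix is hypothetical, and the skip-unless guard is only there so the snippet can be evaluated on its own.

```elisp
(require 'ert)
(require 'org)

(ert-deftest org-next-visible-link-backward-skip-current-fixed ()
  "With point at the start of [[Y]], backward search should land on [[X]]."
  ;; Assumption: `/org-next-visible-link' is loaded; skip the test otherwise.
  (skip-unless (fboundp '/org-next-visible-link))
  (with-temp-buffer
    (insert "[[X]] [[Y]]")
    (org-mode)
    ;; Place point explicitly at the beginning of [[Y]].
    (goto-char (point-min))
    (search-forward "[[Y]]")
    (goto-char (match-beginning 0))
    (should (looking-at "\\[\\[Y\\]\\]"))
    ;; The backward search must skip the current link and find [[X]].
    (should (/org-next-visible-link t))
    (should (looking-at "\\[\\[X\\]\\]"))))
```

Asserting the starting position before calling the function under test makes the precondition part of the test, so a wrong setup fails loudly instead of passing by accident.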

There's no need to wait, though. I am sure existing tools like gptel.el and aider.el could already be used to provide a tighter integration experience, if desired. Experimenting with other LLM providers and their models might also yield even better unit test generation results.

Meanwhile, I'm off to work on improving my ERT setup to streamline running tests and debugging my code. From now on, all the Emacs Lisp code I work on will be accompanied by unit tests. I encourage you to do the same.

Enjoy the malleability of Emacs and the freedom it gives you!
