Bachelor Thesis by Tarek Alakmeh
Supervisor: Prof. Dr. Thomas Fritz
Advisor: Prof. Dr. Lena Jäger
Advisor: David Reich
2022, University of Zurich
Eye-Tracking Code Example:
Aggregated fixations of participants reading code
(green = understanding, purple = non-understanding)
Predict Code Comprehension and
the difficulty thereof.
Existing code complexity metrics
are outdated and flawed.
e.g. Cyclomatic Complexity [McCabe, 1976], Halstead complexity measures [Halstead, 1977]
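To make the limitation concrete, here is a minimal sketch of how cyclomatic complexity can be approximated for Python code with the standard `ast` module (1 plus the number of decision points). The set of counted node types is a simplifying assumption, and the two snippets `FLAT` and `NESTED` are hypothetical illustrations, not the snippets used in the thesis.

```python
import ast

# Node types counted as decision points in this simplified sketch (assumption).
_DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp,
                   ast.ExceptHandler, ast.comprehension)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" contributes two additional branches
            complexity += len(node.values) - 1
    return complexity


# Hypothetical snippet 1: three flat, independent checks.
FLAT = """
def sign_name(x):
    if x < 0:
        return "negative"
    if x == 0:
        return "zero"
    if x < 10:
        return "small"
    return "large"
"""

# Hypothetical snippet 2: two nested loops plus a condition.
NESTED = """
def find(grid, target):
    for row in grid:
        for cell in row:
            if cell == target:
                return True
    return False
"""

print(cyclomatic_complexity(FLAT), cyclomatic_complexity(NESTED))
```

Both snippets receive the same score although one is a flat sequence of checks and the other nests control flow, which is the kind of mismatch with human comprehension that motivates this work.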
Example of two code snippets with equal cyclomatic complexity but vastly different human comprehensibility.
Source: SonarSource
A solution using Neural Networks & Eye-Tracking
1. Source Code Snippets
2. Setup & Conduct Eye-Tracking Experiment
3. Analyze Data & Train Neural Network
1. Source Code Snippets
Diverse & Real-World Code
Previous Studies
Text-Book Examples
Open Source Repositories
Proprietary Industry Code
Diverse & Real-World Code from Open Source Repositories
50'000 most popular Python Code Snippets
Initial Phase (Retrieval)
20'000 Code Snippets
2nd Phase (filter out too long/short snippets)
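The length filter of the 2nd phase could look roughly like the sketch below. The slides do not state the actual thresholds, so `min_lines` and `max_lines` are hypothetical placeholder values, and counting non-blank lines is an assumption about how length was measured.

```python
def within_length_bounds(source: str, min_lines: int = 5, max_lines: int = 40) -> bool:
    """Keep a snippet only if its number of non-blank lines is within the bounds.

    The default bounds are placeholders, not the values used in the thesis.
    """
    n_lines = sum(1 for line in source.splitlines() if line.strip())
    return min_lines <= n_lines <= max_lines


def filter_snippets(snippets: list[str], min_lines: int = 5, max_lines: int = 40) -> list[str]:
    """Drop snippets that are too short or too long."""
    return [s for s in snippets if within_length_bounds(s, min_lines, max_lines)]
```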
3rd Phase (segment data into 4 categories)
AST Walker Segmentation Random Sampling: NONE, LOOP, RECURSION, BOTH (25% each)
Alternative: Raw Random Sampling
25 Random Snippets per Category
3rd Phase (sampling from segments)
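The segmentation and sampling steps can be sketched with Python's `ast` module as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: `categorize` and `sample_per_category` are hypothetical names, only `for`/`while` statements count as loops (comprehensions do not), and only direct recursion (a function calling its own name) is detected.

```python
import ast
import random
from collections import defaultdict


def categorize(source: str) -> str:
    """Return NONE, LOOP, RECURSION or BOTH for one snippet."""
    has_loop = False
    has_recursion = False
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.For, ast.While, ast.AsyncFor)):
            has_loop = True
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Direct recursion only: the function body calls the function's own name.
            for inner in ast.walk(node):
                if (isinstance(inner, ast.Call)
                        and isinstance(inner.func, ast.Name)
                        and inner.func.id == node.name):
                    has_recursion = True
    if has_loop and has_recursion:
        return "BOTH"
    if has_loop:
        return "LOOP"
    if has_recursion:
        return "RECURSION"
    return "NONE"


def sample_per_category(snippets, per_category=25, seed=0):
    """Bucket snippets by category, then draw up to `per_category` from each bucket."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for source in snippets:
        buckets[categorize(source)].append(source)
    return {category: rng.sample(items, min(per_category, len(items)))
            for category, items in buckets.items()}
```

Sampling the same number of snippets from every category keeps rarer constructs such as recursion represented, whereas raw random sampling follows whatever distribution the repository data happens to have.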
Manual Inspection & Validation Survey
9 Final Code Snippets (4 + 5) from AST Walker Segmented Sampling and Raw Sampling
2. Setup & Conduct Eye-Tracking Experiment
3. Analyze Data & Train Neural Network
Reading behavior
Some participants relied only on function names and comments to understand the code,
while others examined the code line by line in a bottom-up fashion.
Using eye fixations and their attributes as input to the neural network
INPUT LAYER
INPUT LAYER
INPUT LAYER
DROPOUT LAYER
NORMALIZATION LAYER
DENSE LAYER
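The layer sequence named above (input, dropout, normalization, dense) can be illustrated with a framework-free forward pass. This is only a sketch under assumptions: the slides do not specify layer sizes, the dropout rate, the kind of normalization, or the activation, so the per-vector normalization, the single sigmoid output unit, and all parameter values here are hypothetical.

```python
import math
import random


def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return list(x)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def normalize(x, eps=1e-5):
    """Normalize one feature vector to zero mean and unit variance (assumed variant)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def dense_sigmoid(x, weights, bias):
    """Single output unit: sigmoid(w . x + b), read as a probability of understanding."""
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def predict(features, weights, bias, rate=0.2, training=False, seed=0):
    """Forward pass: input -> dropout -> normalization -> dense."""
    rng = random.Random(seed)
    hidden = dropout(features, rate, rng, training)
    hidden = normalize(hidden)
    return dense_sigmoid(hidden, weights, bias)


# Hypothetical fixation attributes for one trial, e.g. mean duration, count, regressions.
print(predict([0.25, 12.0, 3.0], weights=[0.5, -0.25, 0.1], bias=0.0))
```

Dropout is active only during training; at prediction time the features pass through unchanged before normalization.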
Cross Validation over New Participants / New Code Snippets
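Evaluating on new participants or new code snippets means that no participant (or snippet) may appear in both the training and the test split. A minimal sketch of such a grouped split is shown below; `group_k_fold` is a hypothetical helper written for illustration, and the round-robin assignment of groups to folds is an assumption.

```python
def group_k_fold(groups, n_splits):
    """Yield (train_idx, test_idx) pairs so that no group appears on both sides."""
    unique = sorted(set(groups))
    if n_splits > len(unique):
        raise ValueError("n_splits cannot exceed the number of distinct groups")
    for fold in range(n_splits):
        # Assign every n_splits-th group to the held-out set of this fold.
        held_out = set(unique[fold::n_splits])
        test_idx = [i for i, g in enumerate(groups) if g in held_out]
        train_idx = [i for i, g in enumerate(groups) if g not in held_out]
        yield train_idx, test_idx


# One entry per trial: (participant id, snippet id); the ids are made up.
trials = [("p1", "s1"), ("p1", "s2"), ("p2", "s1"),
          ("p2", "s2"), ("p3", "s1"), ("p3", "s2")]

# New-participant setting: group by participant.
for train_idx, test_idx in group_k_fold([p for p, _ in trials], n_splits=3):
    print("train", train_idx, "test", test_idx)
```

Grouping by the snippet id instead of the participant id gives the new-code-snippet setting.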
Model Descriptions
Cyclomatic Complexity vs. Our Comprehensibility Score
def fibonacci(n: int) -> int:
    """
    return F(n)
    >>> [fibonacci(i) for i in range(13)]
    [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]
    """
    if n < 0:
        raise ValueError("Negative arguments are not supported")
    return _fib(n)[0]


# returns (F(n), F(n+1))
def _fib(n: int) -> tuple[int, int]:
    if n == 0:  # (F(0), F(1))
        return (0, 1)
    # F(2n) = F(n)[2F(n+1) − F(n)]
    # F(2n+1) = F(n+1)^2+F(n)^2
    a, b = _fib(n // 2)
    c = a * (b * 2 - a)
    d = a * a + b * b
    return (d, c + d) if n % 2 else (c, d)
Eye-Tracking Data mapped onto Code
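Mapping eye-tracking data onto code requires assigning each fixation to the code element shown at that screen position. The sketch below does this at line granularity, assuming the code is rendered with a fixed line height; the function names, the `(x, y, duration_ms)` tuple layout, and the layout parameters are hypothetical and not taken from the thesis.

```python
from collections import Counter


def fixation_to_line(y, top_margin, line_height, n_lines):
    """Return the 1-based code line under a fixation, or None if it is outside the code."""
    index = int((y - top_margin) // line_height)
    return index + 1 if 0 <= index < n_lines else None


def fixation_time_per_line(fixations, top_margin, line_height, n_lines):
    """Sum fixation durations (ms) per code line.

    `fixations` is an iterable of (x, y, duration_ms) tuples in screen pixels.
    """
    totals = Counter()
    for _x, y, duration_ms in fixations:
        line = fixation_to_line(y, top_margin, line_height, n_lines)
        if line is not None:
            totals[line] += duration_ms
    return dict(totals)


# Made-up fixations on a snippet whose first line starts 50 px from the top.
fixations = [(100, 55, 200), (120, 58, 100), (90, 95, 300), (50, 10, 400)]
print(fixation_time_per_line(fixations, top_margin=50, line_height=20, n_lines=19))
```

The fixation at `y = 10` lies above the code area and is discarded; a token-level mapping would additionally use `x` and the character width.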
Final goal: predict comprehensibility using only the code as input,
e.g. by using a Transformer architecture