
AI developer tools: Q1/Q2 2024 detailed comparison

Last updated: May 16, 2024


Artificial intelligence has forever changed the way software is developed. Automated tasks and improved teamwork, fueled by AI, are fundamentally rewriting the software lifecycle.

This article explores AI tools for developers. We’ll look at their key features, capabilities, benefits, limitations, and areas needing improvement. Understanding these technologies’ current state helps developers and managers decide whether to integrate them into their workflows.

The article covers:

  • AI agents,
  • AI code reviewers,
  • AI code assistants.

Our aim is to provide developers, managers, and tech leaders with insights to make informed tool integration decisions for greater efficiency, productivity, and innovation.

AI Developer Tools Q1-Q2 overview

Glossary

Article: A scientific publication about a particular tool.

Recommendation:

  • Ignore: The tool has been discontinued or is no longer supported or developed, it is not useful in a commercial project setting, or a competing tool does the job better.
  • Promising: The tool has potential, but has not been sufficiently tested in a commercial project setting. Testing in a commercial project is required.
  • Developing: Initial research has been done, the tool may be useful in the future, but at present it still has a number of shortcomings that the authors need to improve.
  • Recommended: The tool has been tested in various environments and cases and has a proven positive impact on productivity.

Project status: Is the project still under development, or has support ended and this is the final product?

MetaGPT

Website: https://docs.deepwisdom.ai/main/en/

Article: https://arxiv.org/abs/2308.00352 

Other resources: https://www.1001epochs.ch/blog/metagpt-for-future-of-work 

GitHub: https://github.com/geekan/MetaGPT

Last reviewed: Apr 17, 2024

Recommendation: Ignore

Project status: Work in progress

About MetaGPT

MetaGPT is a multi-agent framework based on Large Language Models (LLMs) that aims to redefine the paradigms of task execution, collaboration, and decision-making in the workplace. It consists of two primary layers:

  • Foundational components layer: Provides the essential building blocks for individual agent operations, including environment, roles, tools, and actions.
  • Collaboration layer: Breaks down complex tasks, assigns them to appropriate agents, and ensures adherence to guidelines while fostering data sharing and a shared knowledge base.

Key features of MetaGPT include role definitions, quick learning, knowledge sharing, and a human-centric approach. It offers benefits such as automation, integration of human SOPs, creative program generation, and enhanced performance through multiple AI agents.

Benefits

Compared to other LLM-based frameworks, MetaGPT stands out in terms of scalability, customizability, and consistent performance across diverse benchmarks. Its development philosophy emphasizes adaptability, user-centricity, and a collaborative ecosystem.

Limitations

However, MetaGPT is still under development and may not be ideal for highly intricate projects. Its capabilities are also restricted to its training data, necessitating frequent updates for accuracy.

Key points

  • Concept of MetaGPT: MetaGPT is designed to address the limitations of existing LLM-based multi-agent systems which often produce inconsistent logic due to cascading errors. It incorporates human-like workflows to streamline and standardize the development process, thus reducing errors and improving efficiency.
  • Standardized Operating Procedures (SOPs): The framework utilizes SOPs to guide the interactions and responsibilities among agents. SOPs help in breaking down complex tasks into simpler subtasks and defining clear roles for each agent.
  • Role-based system: MetaGPT assigns specific roles and responsibilities to different agents, such as Product Manager, Architect, Engineer, etc. Each role has defined inputs and outputs, which are strictly adhered to, ensuring a coherent workflow.
  • Communication protocols: To avoid miscommunications that commonly occur in unstructured natural language interactions, MetaGPT employs structured communication interfaces. Agents communicate through specific, structured outputs like flowcharts, design artifacts, and documented requirements, reducing the risk of information loss or distortion.
  • Executable feedback mechanism: An innovative aspect of MetaGPT is its executable feedback mechanism, which allows continuous code verification and debugging during runtime, thereby enhancing the quality of the generated code.
  • Empirical validation: The article reports that MetaGPT has been tested against benchmarks like HumanEval and MBPP, showing superior performance in terms of task completion rates and code quality compared to existing systems.
  • Collaborative software engineering: MetaGPT has proven particularly effective in collaborative software engineering scenarios, showing its capability to manage complex software development tasks with multiple agents involved.

Overall, MetaGPT represents a significant advancement in the field of automated programming and multi-agent systems by incorporating human-like problem-solving strategies and structured workflows into the capabilities of large language models.
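
The role-based handoff with structured outputs described in the key points can be illustrated with a minimal Python sketch. This is not MetaGPT's API: the role functions, the dataclasses, and the stubbed return values are all hypothetical, and a real agent would call an LLM where the stubs return fixed data.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Structured output of the (hypothetical) Product Manager role."""
    goal: str
    user_stories: list[str]


@dataclass
class Design:
    """Structured output of the (hypothetical) Architect role."""
    modules: list[str]
    requirements: Requirements


@dataclass
class Task:
    """One unit of work handed to the (hypothetical) Engineer role."""
    module: str
    description: str


def product_manager(idea: str) -> Requirements:
    # A real agent would call an LLM here; this stub returns fixed data.
    return Requirements(goal=idea, user_stories=["create item", "list items"])


def architect(reqs: Requirements) -> Design:
    # One module per user story, purely for illustration.
    modules = [story.replace(" ", "_") for story in reqs.user_stories]
    return Design(modules=modules, requirements=reqs)


def engineer(design: Design) -> list[Task]:
    return [Task(module=m, description=f"implement {m}") for m in design.modules]


def run_pipeline(idea: str) -> list[Task]:
    # Each stage accepts only the previous stage's typed output,
    # so a malformed handoff fails at the boundary instead of cascading.
    return engineer(architect(product_manager(idea)))


tasks = run_pipeline("todo app")
```

The point of the typed artifacts is that each role has a defined input and output, which is the property the paper credits with reducing information loss between agents.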

 

Tests

Test: Simple TODO Ktor CRUD application, basic prompt (attempts with 5, 10, and 15 rounds)

Prompt: create simple  todo crud application in Ktor with jwt authentication, and serialization

Result: Failure

Comment:

  • Negative: missing classes, build files, authentication, or content negotiation; some classes generated in another language.
  • Positive: proper dependencies used.

Test: Simple TODO Ktor CRUD application, advanced prompt (30 rounds)

Prompt:

Create a simple TODO CRUD application in Ktor with JWT authentication and serialization.

**Requirements:**

– Use Ktor for building the server-side application

– Implement a CRUD functionality for managing TODO items (Create, Read, Update, Delete)

– Include JWT authentication for securing the endpoints

– Use Kotlin serialization for handling JSON data

– Include a `build.gradle` file for managing dependencies

Feel free to ask if you need any help or further clarification.

Result: Failure

Comment:

  • Negative: missing classes, build files, authentication, or content negotiation; some classes generated in another language.
  • Positive: proper dependencies used.

ChatDev (whitelist access only)

Website: –

Article: https://arxiv.org/pdf/2307.07924.pdf

GitHub: https://github.com/OpenBMB/ChatDev?tab=readme-ov-file

Recommendation: Ignore

Project status: Work in progress

About ChatDev

ChatDev is a framework that leverages large language models (LLMs) to streamline the entire software development process through natural language communication.

Key points

  • ChatDev is a virtual chat-powered software development company that mirrors the waterfall model, dividing the process into four stages: designing, coding, testing, and documenting.
  • At each stage, ChatDev recruits “software agents” with different roles, such as programmers, reviewers, and testers, who engage in collaborative dialogue to propose and validate solutions.
  • The chat chain breaks down each stage into atomic subtasks, enabling dual roles to discuss and resolve specific issues through context-aware communication.
  • To address code hallucination challenges, ChatDev introduces a “thought instruction” mechanism where an instructor explicitly provides guidance to the assistant programmer on code modifications.
  • Experiments show ChatDev’s efficiency and cost-effectiveness, with the ability to complete the entire software development process in under 7 minutes and at a cost of less than $1.
  • The framework demonstrates the potential of integrating LLMs into software development, streamlining key processes and promoting effective collaboration among diverse roles.
 

Devin AI (whitelist access only)

Website: https://www.cognition-labs.com/introducing-devin

Article: –

GitHub: –

Recommendation: Ignore

Project status: Work in progress

About Devin

According to the vendor’s announcement: “Devin is a tireless, skilled teammate, equally ready to build alongside you or independently complete tasks for you to review. With Devin, engineers can focus on more interesting problems and engineering teams can strive for more ambitious goals.”

GPT Pilot

Website: –

Article: –

GitHub: https://github.com/Pythagora-io/gpt-pilot

Last reviewed: Apr 18, 2024

Recommendation: Promising

Project status: Work in progress

About GPT Pilot

Here’s how GPT Pilot builds apps, according to the project’s GitHub README:

  1. You enter the app name and the description.
  2. Product Owner agent like in real life, does nothing. 🙂
  3. Specification Writer agent asks a couple of questions to understand the requirements better if project description is not good enough.
  4. Architect agent writes up technologies that will be used for the app and checks if all technologies are installed on the machine and installs them if not.
  5. Tech Lead agent writes up development tasks that the Developer must implement.
  6. Developer agent takes each task and writes up what needs to be done to implement it. The description is in human-readable form.
  7. Code Monkey agent takes the Developer’s description and the existing file and implements the changes.
  8. Reviewer agent reviews every step of the task and if something is done wrong Reviewer sends it back to Code Monkey.
  9. Troubleshooter agent helps you to give good feedback to GPT Pilot when something is wrong.
  10. Debugger agent hate to see him, but he is your best friend when things go south.
  11. Technical Writer agent writes documentation for the project.
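
The send-back step between Reviewer and Code Monkey in the list above can be sketched as a bounded review loop. This is a minimal Python illustration, not GPT Pilot's code: `review_loop`, the stub agents, and the `max_rounds` cap are hypothetical, and the stubs stand in for LLM calls.

```python
from typing import Callable


def review_loop(
    implement: Callable[[str, str], str],
    review: Callable[[str], str],
    task: str,
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Ask `implement` for code and send it back while `review` returns feedback.

    `review` returns an empty string to approve. Returns (code, rounds_used).
    Raises RuntimeError if no version is approved within `max_rounds`.
    """
    feedback = ""
    for round_number in range(1, max_rounds + 1):
        code = implement(task, feedback)
        feedback = review(code)
        if not feedback:
            return code, round_number
    raise RuntimeError(f"not approved after {max_rounds} rounds: {feedback}")


# Stub agents standing in for LLM calls (hypothetical).
def stub_code_monkey(task: str, feedback: str) -> str:
    body = "return a + b" if feedback else "pass"
    return f"def add(a, b):\n    {body}"


def stub_reviewer(code: str) -> str:
    return "" if "return" in code else "function does not return a value"


code, rounds = review_loop(stub_code_monkey, stub_reviewer, "add two numbers")
```

Capping the number of rounds matters in practice because every round is a paid model call; an uncapped loop can keep spending without converging.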

Tests

Test: Simple TODO Ktor CRUD application

Result: Failure

Comment: Quite promising. It took GPT Pilot 2 hours, some assistance, and manual intervention to complete a basic app with only one endpoint. Despite this, the overall process shows potential. Most issues stemmed from dependency management, import errors, and missing code sections. The total cost of this experiment was around $15.

Gorilla

Website: https://gorilla.cs.berkeley.edu/

Article: https://arxiv.org/pdf/2305.15334.pdf

GitHub: https://github.com/ShishirPatil/gorilla

Last reviewed: Apr 21, 2024

Recommendation: Ignore

Project status: Work in progress

About Gorilla

According to the project’s own description: “Gorilla enables LLMs to use tools by invoking APIs. Given a natural language query, Gorilla comes up with the semantically- and syntactically- correct API to invoke. With Gorilla, we are the first to demonstrate how to use LLMs to invoke 1,600+ (and growing) API calls accurately while reducing hallucination. We also release APIBench, the largest collection of APIs, curated and easy to be trained on! Join us, as we try to expand the largest API store and teach LLMs how to write them! Hop on our Discord, or open a PR, or email us if you would like to have your API incorporated as well.”

 

Korbit

Website: https://www.korbit.ai/

Article: –

GitHub: –

Last reviewed: Apr 21, 2024

Recommendation: Ignore

Project status: Work in progress

About Korbit

Korbit is an AI-powered tool designed for automatic pull request review.

  • It generates a large number of comments. Some are useful, but the sheer volume makes them hard to find.
  • Korbit is capable of handling both small and large code diffs.
 

Tests

Test: 200-line MR

Result: Failure

Comment: 9 comments, focused on the changed lines; useless in the context of the whole project.

Test: 700-line MR

Result: Failure

Comment: 19 comments

  • Potential bugs: Simple issues like hardcoded values.
  • Invalid comment: One comment incorrectly references an IndexOutOfBoundsException.
  • Unnecessary comments: Many comments provide no useful information.

Test: 1800-line MR

Result: Failure

Comment: 54 comments

  • The code contained the same simple bugs, but the tool didn’t find them.
  • The large number of useless comments makes reading through them all demotivating.

AI Code Review Action

Website: –

Article: –

GitHub: https://github.com/marketplace/actions/ai-code-review-action

Last reviewed: Apr 21, 2024

Recommendation: Ignore

Project status: Work in progress

About AI Code Review Action

  • This tool is integrated into the GitHub Actions workflow.
  • Similar to Korbit, it generates a substantial number of comments, many of which are redundant or unhelpful. Its comments also overlap with Korbit’s, which suggests the two tools use a similar prompting strategy.
  • It struggles with larger code diffs, which limits its effectiveness in complex projects.
  • AI Code Review Action uses the publicly available GPT-3.5 Turbo model.
 
 

Tests

Test: 200-line MR

Result: Failure

Comment: 27 comments, focused on the changed lines; useless in the context of the whole project. Most comments focus on test naming, but these are invalid.

Test: 700-line MR

Result: Failure

Comment: 52 comments

  • Some comments are not about the diff code.
  • Too many comments to get some value from them.

Test: 1800-line MR

Result: Failure

Comment: Action failed. Context is too big for GPT-3.5.
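
The 1800-line failure came from a diff that exceeded the model’s context. One common workaround is to split the diff per file and send it in batches. We did not test this, and it is not a description of how AI Code Review Action works. The Python sketch below assumes a unified diff with `diff --git` headers and a rough estimate of 4 characters per token; all function names are hypothetical.

```python
def split_diff_by_file(diff: str) -> list[str]:
    """Split a unified diff into one string per file section."""
    sections: list[str] = []
    current: list[str] = []
    for line in diff.splitlines(keepends=True):
        if line.startswith("diff --git ") and current:
            sections.append("".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("".join(current))
    return sections


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token.
    # A real tokenizer would give a more accurate count.
    return max(1, len(text) // 4)


def batch_sections(sections: list[str], budget: int) -> list[list[str]]:
    """Greedily group file sections so each batch stays within `budget` tokens.

    A single section larger than the budget still gets its own batch;
    it would need further splitting (for example by hunk) before sending.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    used = 0
    for section in sections:
        cost = estimate_tokens(section)
        if current and used + cost > budget:
            batches.append(current)
            current, used = [], 0
        current.append(section)
        used += cost
    if current:
        batches.append(current)
    return batches


sample_diff = (
    "diff --git a/a.py b/a.py\n+print('a')\n"
    "diff --git a/b.py b/b.py\n+print('b')\n"
    "diff --git a/c.py b/c.py\n+print('c')\n"
)
sections = split_diff_by_file(sample_diff)
batches = batch_sections(sections, budget=20)
```

Batching per file keeps each request small, but it also means the model never sees cross-file context, which is the same weakness noted in the comments on the smaller MRs.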

CodeRabbit

Website: https://coderabbit.ai/

Article: –

GitHub: https://github.com/marketplace/actions/ai-code-review-action

Last reviewed: Apr 21, 2024

Recommendation: Promising

Project status: Work in progress

About CodeRabbit

Data, privacy, and security: CodeRabbit states that it does not use data collected during code reviews to train or influence its models, that queries to the Large Language Models (LLMs) are ephemeral with zero retention, and that neither CodeRabbit nor its LLM providers share any data collected during the code review process with third parties.

  • CodeRabbit takes a broader approach, focusing not only on code review but also on suggesting best practices.
  • In addition to reviewing code changes, it generates comprehensive overviews and patch notes, providing valuable insights for developers.
  • CodeRabbit places a strong emphasis on security, ensuring the protection of sensitive information.
 

Tests

Test: 200-line MR

Result: Success

Comment: Walkthrough analysis of code changes; 1 comment regarding a TODO in the code.

Test: 700-line MR

Result: Success

Comment: Walkthrough analysis of code changes.

Test: 1800-line MR

Result: Success

Comment: Walkthrough analysis of code changes; 4 comments, all of them guiding towards good practices.

Supermaven

Website: https://supermaven.com/

Article: –

GitHub: –

Last reviewed: Apr 17, 2024

Recommendation: Ignore

Project status: Work in progress

About Supermaven

  • Supermaven offers code completion suggestions, but they often lack context and relevance to the project at hand.
  • It primarily focuses on providing completions for single lines of code.
  • While Supermaven boasts fast reaction times, the code completions may contain typos and mistakes.
 

Tests

Test: Endpoint definition on an already built domain (Java)

Result: Failure

Comment:

  • Suggestions without domain knowledge.
  • Incomplete suggestions.

Gemini Code Assist

Website: https://cloud.google.com/code

Article: –

GitHub: –

Last reviewed: Apr 19, 2024

Recommendation: Developing

Project status: Work in progress

About Gemini Code Assist (Google Cloud Code)

  • Positioned as a viable alternative to GitHub Copilot, Google Cloud Code offers similar capabilities.
  • It features a Gemini chat interface and generally provides slower code completions compared to GitHub Copilot.
  • Google Cloud Code does not appear to support every language and framework; Flutter, for example, was not supported in our test.
  • Limited information is available regarding data privacy measures implemented by this tool.
 

Tests

Test: Endpoint definition on an already built domain (Java)

Result: Success

Comment:

  • Suggestions with domain context.
  • Suggestions including good practices.
  • Quick generation of response objects that follows the conventions of existing ones.

Test: Screen implementation (Flutter)

Result: Failure

Comment: Flutter is currently not supported.

GitHub Copilot

Website: https://github.com/features/copilot

Article: –

GitHub: –

Last reviewed: Apr 21, 2024

Recommendation: Recommended

Project status: Work in progress

About GitHub Copilot

  • Widely regarded as a top choice among code assistant tools, GitHub Copilot stands out for its advanced features.
  • It leverages a GPT-based chat interface that incorporates project context, resulting in more relevant and accurate code suggestions.
  • GitHub Copilot allows users to configure data privacy settings, addressing potential concerns about sensitive information.
 

Tests

Test: Endpoint definition on an already built domain (Java)

Result: Success

Comment: Suggestions with domain context.

Test: Screen implementation (Flutter)

Result: Success

Comment: Useful suggestions for widgets displaying data based on objects.

AI developer tools: Conclusions

A tl;dr version of our research.

Automated code review agents: Comparative conclusion

  • Usefulness of comments: While both Korbit and AI Code Review Action on GitHub generate numerous comments, the sheer volume can make it challenging to identify genuinely useful feedback. On the other hand, CodeRabbit’s approach of providing comprehensive overviews and patch notes may be more effective in conveying meaningful insights.
  • Scope of review: CodeRabbit stands out by not only reviewing code but also suggesting best practices, which can be invaluable for maintaining high-quality code and adhering to industry standards.
  • Data privacy: While data privacy is a critical aspect of any code review tool, CodeRabbit explicitly prioritizes the protection of sensitive information, giving it an advantage in security-conscious environments.

Honest opinion

I currently see 3 use cases for these tools:

  1. Private projects
  2. Projects with a single developer
  3. Developers with little commercial experience

AI-powered code assistants: Comparative analysis

  • Code completion accuracy: GitHub Copilot and Google Cloud Code provide accurate and context-aware code completions, thanks to their ability to understand the project’s codebase.
  • Data privacy: GitHub Copilot offers configurable data privacy settings, allowing users to control the level of information shared with the tool. On the other hand, Google Cloud Code lacks transparency regarding its data privacy practices.

AI agents: Comparative analysis

MetaGPT incorporates human-like workflows and standardized operating procedures (SOPs) to address the limitations of existing LLM-based approaches. It assigns specific roles and responsibilities to different agents, promoting a coherent and structured development process. MetaGPT’s features include an executable feedback mechanism for continuous code verification and debugging, as well as a focus on knowledge sharing and collaboration.

Similarly, GPT Pilot takes a step-by-step approach, with each agent (e.g., specification writer, architect, developer) playing a distinct role in the software development process. This structured workflow helps to mitigate the risk of cascading errors and inconsistencies.

While these AI agent-based frameworks demonstrate the potential of integrating LLMs into software development, they are still in their early stages of development and not yet ready for widespread production use. In our tests, they were unable to generate a complete and functional TODO application with Ktor, JWT authentication, and serialization, highlighting the need for further refinement and maturation before they can be reliably used for complex software projects.


AI developer tools FAQ

Frequently asked questions about AI tools used in software development.

 

What are AI developer tools?

AI developer tools are a collection of software applications and libraries that assist developers in building, testing, and deploying artificial intelligence functionalities within their software. These tools can streamline workflows and improve the efficiency of AI development.

Who can benefit from AI developer tools?

AI developer tools are beneficial for various developers, including those with experience in machine learning, data science, and traditional software development. Even beginners can leverage user-friendly tools to integrate basic AI features.

What are the common use cases for AI developer tools?

  • Training and deploying machine learning models for tasks like image or speech recognition, natural language processing, and anomaly detection.
  • Automating repetitive coding tasks and generating code snippets based on developer intent.
  • Optimizing software performance and identifying potential bugs through AI-powered analysis.

What types of AI developer tools are available?

AI developer tools come in many flavors, designed to assist programmers at various stages of the development workflow. Here's a breakdown of some common types:

  • Code completion and assistants: These tools use AI to predict the next line of code, suggest code snippets, or even generate entire functions. Examples include Tabnine, JetBrains AI assistant, and aiXcoder.

  • Code review and debugging: Tools in this category can analyze code for errors, suggest improvements, and even help with debugging complex problems. Some examples include Codium, Stepsize AI, and Sourcery.

  • Documentation generation: These AI-powered tools can automatically generate documentation from your code, saving developers time and effort. Rewind.ai is a popular example.

  • General AI assistants: Some development environments like Replit include built-in chatbots powered by AI that can answer questions, provide suggestions, and even help with debugging.

  • UI/UX Design assistants: There are AI tools that can help with designing user interfaces by generating mockups or suggesting layouts based on user data. While these aren't strictly code-focused, they can be valuable for developers involved in the entire application creation process.

How do you use AI in your development process?

At Pragmatic Coders, we use AI tools to generate code, brainstorm, or streamline daily tasks.

Learn more: AI in software development: how we’re saving clients’ time & money

Why do you need to integrate AI developer tools into your product lifecycle ASAP?

Artificial intelligence is crucial for doing things faster: experimenting, making mistakes, and learning from them.

Joe Justice, ex-Tesla employee and Agile coach, shared with us his observations on AI implementation:

  • I think companies that aren’t using AI are behind, and those that aren’t using their own AI have missed the opportunity to start training it.
  • Once you start training your own AI, you see which types of data and datasets are most useful. This realization starts to change how you gather information and even change how you work to make it easier to gather information. They haven’t even started that learning curve yet.

Most importantly, AI is crucial to innovate, which you most probably want to do if you're building digital products.

Learn more: How Elon Musk’s innovation strategy can fuel your app’s success

Research authors

AI Research + Development Team

  • Jakub Pruszyński, Senior Mobile Developer (#mobile, #android, #climbing 🧗)
  • Sebastian Druciak, Java Developer (#Java, #ScrumMaster, #Gym 💪)