Local LLMs: Balancing Efficiency in Coding with Privacy and Security Concerns
Large Language Models (LLMs), like GPT-4, enhance software development by aiding code generation and debugging, cutting down the time spent on routine coding tasks. They offer code suggestions, identify errors, and propose fixes, making coding more efficient. Integrated into development tools, LLMs can automate documentation, manage complex data structures, and generate test cases, boosting development speed and code quality. LLMs also improve team collaboration by summarizing discussions and automating developer support responses.
Due to privacy and security concerns, however, not all companies can use public LLMs. Sensitive sectors require private models to keep their data secure. While large firms can afford to develop proprietary models, smaller ones face prohibitive costs and complexity. Open-source LLMs can fill this gap, providing the benefits of advanced language models while ensuring regulatory compliance and data security. This democratizes access to AI technology and fosters collaboration around model improvement and customization.
In this blog post, we will use Meta’s Large Language Model specifically designed for coding, named “Code Llama”. To streamline deployment, we will run it with “Ollama”, a tool for serving LLMs locally that is also distributed as a Docker image. This setup makes getting the LLM operational quick and painless.
You can find the Docker image on Docker Hub at the following URL: https://hub.docker.com/r/ollama/ollama
Let’s execute the following command to pull the image and start the container. It runs detached (-d), persists downloaded models in a named volume (-v), and exposes the Ollama API on port 11434 (-p):
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
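To check that the server came up, you can query the root endpoint; it should reply with a short health message (“Ollama is running”):

curl http://localhost:11434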
Depending on your setup, you may want to pass some additional parameters.
If you have an NVIDIA GPU, first install and configure the NVIDIA Container Toolkit (see https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#configuration), then add this flag so the container can access the GPU:
--gpus=all
To ensure the container cannot communicate with the outside world, you can cut it off from all networking by adding this parameter. Keep in mind that without a network, the published HTTP port is no longer reachable from the host, so you would then interact with the model exclusively through docker exec.
--network none
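Putting it together, here is what a GPU-enabled run looks like (the same command as before, with the GPU flag added):

docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama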
Now that everything is up and running, we have two ways to interact with it.
You can directly run Code Llama in a shell by using the following command:
docker exec -it ollama ollama run codellama
Let’s give it a try.
% docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
50b40a942366f48d6158c66fd2c24c9f2a489cd26c9fdba0a0f5293840c8045d
% docker exec -it ollama ollama run codellama
>>> How do I remove a docker container?
To remove a Docker container, you can use the `docker rm` command. The general syntax is:
docker rm <container-name or ID>
For example:
docker rm my_container
You can also use the `--force` option to forcefully remove the container, without prompting for confirmation. For example:
docker rm --force my_container
If you want to remove a stopped container, you can add the `-f` or `--force` flag to the command. This will not prompt for
confirmation and will immediately remove the container. For example:
docker rm -f my_container
It's important to note that removing a container will also remove any data stored in volumes associated with the container.
If you want to preserve the data, you should first stop the container and then remove it.
Alternatively, you can use the `docker system prune` command to remove all stopped containers, networks, and images that are
not tagged or referenced by any other resource. This will also remove any data stored in volumes associated with the
container. For example:
docker system prune
This will display a summary of the resources that were removed and any error messages that occurred during the cleanup
process.
>>> Send a message (/? for help)
Neat!
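By the way, you can leave the interactive session at any time by typing /bye (or pressing Ctrl+D) at the prompt.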
To use it from a different location, such as a plugin in your favorite IDE, or to host it on a central server rather than your local machine, you might want to use it via HTTP.
Below, I demonstrate how to send a request to Ollama, specifying the model and the prompt:
curl http://localhost:11434/api/generate -d '{
"model": "codellama",
"prompt":"Write a shell script to remove a docker container."
}'
Here is a complete run.
% curl http://localhost:11434/api/generate -d '{
"model": "codellama",
"prompt":"Write a shell script to remove a docker container."
}'
{"model":"codellama","created_at":"2024-04-10T07:38:44.080588634Z","response":"\n","done":false}
{"model":"codellama","created_at":"2024-04-10T07:38:44.232310384Z","response":"Here","done":false}
{"model":"codellama","created_at":"2024-04-10T07:38:44.385408092Z","response":" is","done":false}
{"model":"codellama","created_at":"2024-04-10T07:38:44.538753134Z","response":" an","done":false}
{"model":"codellama","created_at":"2024-04-10T07:38:44.694064134Z","response":" example","done":false}
...
{"model":"codellama","created_at":"2024-04-10T07:38:49.140863636Z","response":" of","done":false}
{"model":"codellama","created_at":"2024-04-10T07:38:49.29585672Z","response":" the","done":false}
{"model":"codellama","created_at":"2024-04-10T07:38:49.449533595Z","response":" container","done":false}
{"model":"codellama","created_at":"2024-04-10T07:38:49.603687428Z","response":" to","done":false}
{"model":"codellama","created_at":"2024-04-10T07:38:49.759067845Z","response":" be","done":false}
{"model":"codellama","created_at":"2024-04-10T07:38:49.915728095Z","response":" removed","done":false}
...
I terminated the output early as it was becoming overly lengthy. As you can see, the API streams one JSON object per token, which is handy for interactive clients but noisy to read. We can convert this output into a more digestible format.
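If you prefer a single JSON object containing the complete answer, you can disable streaming in the request body; "stream": false is a documented option of Ollama’s /api/generate endpoint:

curl http://localhost:11434/api/generate -d '{
  "model": "codellama",
  "prompt": "Write a shell script to remove a docker container.",
  "stream": false
}'

Alternatively, assuming you have jq installed, you can stitch the streamed fragments back together on the fly; the -j flag joins each .response field without adding newlines:

curl -s http://localhost:11434/api/generate -d '{
  "model": "codellama",
  "prompt": "Write a shell script to remove a docker container."
}' | jq -j '.response'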
If you have any further questions, need specific code examples, or require additional assistance, please don’t hesitate to leave a comment or send me a message.