Related Articles

Benjamin Marie · October 21, 2024

A selection of related articles published in my other newsletter, The Kaitchup:

- Quantize and Run Llama 3.3 70B Instruct on Your GPU (December 9, 2024)
- Fast Speculative Decoding with Llama 3.2 and vLLM (October 14, 2024)
- Train and Serve an AI Chatbot Based on Llama 3.2 (October 17, 2024)
- QLoRA with AutoRound: Cheaper and Better LLM Fine-tuning on Your GPU (August 19, 2024)
- Run Llama 3.1 70B Instruct on Your GPU with ExLlamaV2 (2.2, 2.5, 3.0, and 4.0-bit) (August 29, 2024)
- How to Set Up a PEFT LoraConfig (September 27, 2024)
- DoRA vs. LoRA: Better and Faster than LoRA? (March 11, 2024)
- Google's Gemma: Fine-tuning, Quantization, and Inference on Your Computer (February 26, 2024)
- Run Llama 3 70B on Your GPU with ExLlamaV2 (May 6, 2024)
- SqueezeLLM: Better 3-bit and 4-bit Quantization for Large Language Models (February 12, 2024)
- QLoRA: Fine-Tune a Large Language Model on Your GPU (May 30, 2023)