That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models are something I can easily play with myself. That's where the new distilled models come in.

To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.

Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to appear. So far I've tried just one of those: unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF, released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

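If you don't already have those tools, installing them looks roughly like this (Homebrew for Ollama and pip for LLM are my assumption here; other install routes exist):

```bash
# Install Ollama (or download the app from https://ollama.com)
brew install ollama

# Install Simon Willison's LLM CLI tool
pip install llm
```
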
First I fetched and ran the model using Ollama itself:

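The command itself hasn't survived above; given the Unsloth model name and the roughly 8.5GB download mentioned next, it was presumably the Q8_0 quantization, along these lines:

```bash
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```
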
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it too. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.

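Those logged prompts and responses can be pulled back out later; a couple of useful commands (my sketch; check llm logs --help for the exact options):

```bash
llm logs -n 3    # show the three most recent logged prompts/responses
llm logs path    # print the location of the SQLite database
```
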
I installed the plugin and ran it like this:

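Reconstructing those two steps: install the plugin, then address the Ollama-fetched model by name (the model ID follows from the ollama run command above; the prompt is the one quoted below):

```bash
llm install llm-ollama

llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```
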
Each response starts with a <think>...</think> pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.

First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're often found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.

[...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript, and here's the output at the end, after the </think> tag:

The Joke:

A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place saucers on the table, causing a few spills.

After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"

Explanation:

- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comical chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.

So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.

This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.

Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:

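Presumably something like this, if the 34GB figure corresponds to a roughly 3-bit quantization of the Unsloth 70B GGUF (the exact quantization tag is my guess):

```bash
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```
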
Can it draw a pelican?

I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It did not do very well:

It looked to me like it got the order of the elements wrong, so I followed up with:

the background ended up covering the rest of the image

It thought some more and gave me this:

As with the earlier joke, the chain of thought in the transcript was much more interesting than the end result.

Other ways to try out DeepSeek-R1

If you want to try the model out without installing anything you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.

DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:

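A plausible reconstruction of that configuration, following the format LLM documents for OpenAI-compatible endpoints (the model ID and key name here match the commands that follow):

```yaml
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```
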
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.

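Spelled out:

```bash
llm keys set deepseek
# (paste your DeepSeek API key when prompted)

llm -m deepseek-reasoner 'your prompt here'
```
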
This won't show you the reasoning tokens, sadly. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.

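If you want the reasoning tokens today you can hit the API directly. A minimal sketch using the OpenAI Python client; DeepSeek's documentation describes a reasoning_content field on the response message:

```python
from openai import OpenAI

# DeepSeek's endpoint is OpenAI-compatible, so the standard client works
client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "a joke about a pelican and a walrus"}],
)

message = response.choices[0].message
print(message.reasoning_content)  # the chain-of-thought tokens
print(message.content)            # the final response
```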